CSC/ECE 506 Spring 2012/1c cl
Introduction
In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture in which multiple processing elements execute from different instruction streams, and data is passed from one processing element to the next. The requirement that data is passed from one processing element to the next means that it is restricted to a certain type of computations, but is hard to apply in general. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.
Examples of parallel computer with MISD architechture
Few actual examples of this class of parallel computer have ever existed. One example of this machine is the systolic array, such as the CMU iWarp[Borkaret al., 1990]. Another prominent example of MISD in computing are the Space Shuttle flight control computers. The research on MISD concentrates on some conceivable usage: Multiple frequency filters operating on a single signal stream, Multiple cryptography algorithms, and Attempting to crack a single coded message.
Application
===in Network===
A number of important system services need strong scaling, where a fixed workload is executed more quickly with increased core counts. This is particularly true for data-intensive, network-oriented system services because network interface card (NIC) line rates continue to increase but the individual cores that service them are not getting any faster. Replicated work approach based on a multiple-instruction/single-data (MISD) execution model seems to be a promising solution to take advantage of unexploited parallelization opportunities in multi-core systems. This approach is particularly important in the network stack, as it is needed to enable strongly scalable network services.
A multiple instruction/single data (MISD) execution model (i.e. replicated work) is used to provide ne-grained data consistency between cores and to eliminate expensive inter-core interactions. In this approach, shown in Figure 1(a), state is replicated into domains on separate cores, and requests that modify shared state are broadcast to every domain using a ringbuffer -based channel abstraction. The first replica that dequeues a request becomes the primary replica for that request and is responsible for fully processing it, including any updates with globally visible side effects (e.g. data delivery, packet reconstruction, acknowledgment generation). Other replicas that process the request will, on the other hand, only partially process each request to maintain state consistency. This approach is particularly appropriate for many network services because it parallelizes expensive per-byte processing (e.g. data copies, encryption/decryption, and error correction) and replicates per-header processing costs that are known to be small. It also retains the reduced locking and caching costs of systems like Barrelfish while adding the ability to perform fine-grained updates to logically shared state.
To examine the potential for this approach compared to lock-based and atomic instruction-based MIMD approaches, A simple synthetic test has been constructed. In this test, processing each request requires some amount of work that can be done by one core without synchronization or replication (i.e. parallelizable work), and some work that must be synchronized or replicated. Specially, the parallelizable work is a set of memory updates done on per-core memory, while the synchronized work is the set of memory up-dates that must be performed: (a) on every replica; (b) while holding a shared lock; or (c) by using atomic memory update instructions. Figure 1(b) shows the potential performance advantages of this approach using a 10:1 ratio of parallelizable to replicated work; we chose this ratio based on our studies of production TCP/IP stacks. As can be seen, the lock-based model scales remarkably poorly and the atomic instruction approach is only slightly better.