CSC/ECE 506 Spring 2012/1c 12
MISD
Michael J. Flynn introduced the idea of an MISD (Multiple Instruction, Single Data) computer architecture in his original taxonomy in 1966.[1]
The MISD Architecture
Dr. Yan Solihin defines MISD as "..an architecture in which multiple processing elements execute from different instruction streams, and data is passed from one processing element to the next."[2] He also notes that MISD architectures are restricted to certain types of computations because data must be passed between processing elements.[2] Each processing element executes different instructions on the data stream.[3] Because every processing element transforms the data it receives, one can argue that the data is no longer the original data introduced at the start of the stream.[4]
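The data-passing behavior in this definition can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real MISD machine: the name misd_pipeline and the three example stages are assumptions made for the sketch, and each processing element is reduced to a single function standing in for its own instruction stream.

```python
from typing import Callable, Iterable, Iterator, List

# Each processing element (PE) is modelled as a function that stands in for
# its own private instruction stream (an assumption made for this sketch).
ProcessingElement = Callable[[int], int]


def misd_pipeline(stream: Iterable[int],
                  elements: List[ProcessingElement]) -> Iterator[int]:
    """Pass every datum of the single data stream through each PE in order."""
    for datum in stream:
        for pe in elements:
            # The output of one PE becomes the input of the next PE.
            datum = pe(datum)
        yield datum


# Hypothetical three-stage chain: scale, offset, then square.
stages = [lambda x: x * 2, lambda x: x + 3, lambda x: x * x]
results = list(misd_pipeline([1, 2, 3], stages))
print(results)  # [25, 49, 81]
```

Note that the values leaving the last stage are no longer the values that entered the first one, which is the point made at the end of the paragraph above.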
From this image, we see that the data stream has one clear entrance into, and one clear exit from, the system. What remains unclear is whether the processing elements share a collective instruction store or each one has its own embedded instruction store. Depending on the specific system described, each processing element is generally function-specific or predetermined; but in some systems (similar to iWarp), each processing element may be quite advanced.
For this image, Flynn describes each processing element as an independent virtual machine that operates on independent program sequences. He explicitly states that each processing element has its own private instruction memory, which makes the data stream the only interaction between instruction streams.[14]
In this image, Flynn demonstrates a version of MISD in which the data stream is a forced forwarding of operands between the execution units. The instruction that any individual execution unit sees can be fixed (flexible setup of units), semi-fixed (one pass of a data file), or variable (the stream of instructions could operate on any point of the data stream).[14]
MISD Computers
While it is widely believed that no actual MISD computer exists in practice, it is controversially argued that a systolic array is the most common example of MISD[6].
Some argue that pipelined vector processors could be considered an example of MISD architecture because a different operation is performed on the data stream as it flows from stage to stage[6]. The argument against this idea is that the individual processing elements in each stage do not technically fetch their operations from an instruction cache[6], but are more similar to a function-specific chip, or ASIC.
One application of MISD VLSI architectures is multiple pattern matching in large data streams that lack any preprocessed indexes for lookups[8]. This research presents a set of recursive query semantics: "concatenation of basic strings and patterns, alphanumerical comparisons of simple strings, boolean operations on subexpressions, and hamming distance filtering"[12], and then explains that the recursion process of the semantics is best understood as a "..recursion tree where the result is found by propagating the results from the leaf nodes...to the root of the tree"[12].
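The recursion-tree idea quoted above can be illustrated with a small Python sketch. It is not the architecture from the cited paper: the Leaf and Node classes, the evaluate function, and the choice to report a hit at the position where a match ends are all assumptions made for illustration. Only two of the quoted semantics are modelled, Hamming-distance filtering at the leaves and boolean operations on subexpressions at the inner nodes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    pattern: str
    max_distance: int = 0  # Hamming-distance threshold (0 means exact match)


@dataclass
class Node:
    op: str  # "and" or "or"
    left: "Query"
    right: "Query"


Query = Union[Leaf, Node]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def evaluate(query: Query, text: str) -> List[bool]:
    """Return one flag per text position: does the query hit ending there?"""
    if isinstance(query, Leaf):
        m = len(query.pattern)
        hits = []
        for end in range(len(text)):
            start = end - m + 1
            ok = start >= 0 and hamming(text[start:end + 1],
                                        query.pattern) <= query.max_distance
            hits.append(ok)
        return hits
    # Inner node: results propagate upward from the two subtrees.
    left = evaluate(query.left, text)
    right = evaluate(query.right, text)
    if query.op == "and":
        return [l and r for l, r in zip(left, right)]
    if query.op == "or":
        return [l or r for l, r in zip(left, right)]
    raise ValueError(f"unknown operator: {query.op}")


# Hypothetical query: ("abc" within distance 1) AND (exactly "d").
query = Node("and", Leaf("abc", max_distance=1), Leaf("d"))
hits = evaluate(query, "abcabd")
print([i for i, h in enumerate(hits) if h])  # [5]
```

In hardware the leaves and inner nodes would all work on the stream at the same time; the recursive function calls here only mimic the order in which results flow from the leaves to the root.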
Recently, Stanford University and Maxeler Technologies have been working on acceleration methodologies that benefit from combining different computer architectures. One of the proposed methodologies based on FPGA arrays uses SIMD for multiple data strips until the pin bandwidth limits the acceleration, then switches to an MISD-style pipeline of the FPGA arrays until acceleration is limited by circuit limitations[13].
Systolic Array
What is a Systolic Array?
"A systolic array is an arrangement of processors in an array (often rectangular) where data flows synchronously across the array between neighbors."[7] Systolic arrays have data processing units (DPUs) arranged in the form of a matrix such that they are connected to their neighbors in the form of a mesh[9].
The two models of systolic arrays are shown below:
The above diagram represents a systolic array in which each DPU performs a specific operation on the data. The data can come from, or go to, an external source in the case of embedded systems, or it can be generated by an auto-sequencing memory unit. Each DPU performs a different computation based on the instruction set given to it; it takes in data from the top or the left and outputs it to its right or below. A systolic array may or may not have local memory, depending on the application it is used for.
link:http://coefs2.njit.edu/labmanuals/ece459/images/array.gif
One example application of a systolic array is matrix multiplication. A 4x4 mesh can multiply two 4x4 matrices: the rows and columns to be multiplied are fed into the array as input streams, and each DPU multiplies the pair of numbers arriving on its inputs and adds the product to the value it has previously stored, if any. The final output, the resultant matrix, consists of the values stored in the DPUs.
Systolic arrays can make algorithms that involve a lot of parallel computation much easier to implement. "Systolic array processors have been known to be extremely efficient at executing the class of algorithms that exhibit tight coupling between neighboring cells via data transfers in an N-dimensional model space" [11]. The size of the array does affect performance, however. Small systolic arrays present timing issues, limitations on bus width and chip pins, and pipeline drain caused by interruptions in the data flow. Approaches discussed to resolve these issues include problem partitioning (of either the specified algorithm or the data array), array emulation (time-sharing a small array's processors to mimic a larger array better suited to the specified algorithm), and software-based scheduling programs[15].
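The matrix multiplication described above can be simulated cycle by cycle. The following Python sketch assumes one common arrangement that the text does not spell out: rows of the first matrix enter from the left, columns of the second enter from the top, and each row or column is delayed by one cycle relative to the previous one so that matching operands meet in the right cell. The function name systolic_matmul and the register layout are illustrative only.

```python
def systolic_matmul(a, b):
    """Multiply two n x n matrices on a simulated n x n systolic mesh."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]    # value stored in each DPU
    a_reg = [[0] * n for _ in range(n)]  # operands travelling to the right
    b_reg = [[0] * n for _ in range(n)]  # operands travelling downward

    # The last operands reach the bottom-right DPU at cycle 3n - 3.
    for t in range(3 * n - 2):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i  # row i is delayed by i cycles
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # from the left neighbor
                if i == 0:
                    k = t - j  # column j is delayed by j cycles
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # from the neighbor above
                # Multiply the incoming pair and add to the stored value.
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
direct = [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]
print(systolic_matmul(a, b) == direct)
```

Every DPU runs the same multiply-accumulate step here, so this particular example shows the data movement of a systolic array rather than a mix of different instruction streams.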
Applications of Systolic Arrays
Complex problems that can be solved efficiently using systolic arrays include:
- Fast Fourier Transforms
- Convolutions
- Dynamic Time Warping
- Video Filtering
- Data Compression
- Image Processing
- Signal Processing
- Differential Equations
Fast Fourier Transforms
Convolutions
In 2003, the idea of a super-systolic array was introduced by Jae-Jin Lee and Gi-Yong Song. A super-systolic array makes the cells of a systolic array systolic arrays themselves[16]. They defined the use of a super-systolic array for convolution as "..a logical systolic array in a sense that the array assumes all operation to complete in a unit delay to maintain rhythmic data flow"[16]. For their convolution problem, each cell in the systolic array is capable of performing multiplication and addition. It is the multiplication process that benefits most from systolization, so it is implemented as a systolic array in itself. Converting the cells of a systolic array into systolic arrays themselves results in higher degrees of concurrency and lower resource consumption[16].
Dynamic Time Warping
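As a rough illustration of the outer, logical array only, the Python sketch below simulates a linear systolic array for convolution in which every cell completes one multiply-add per unit delay. It does not reproduce Lee and Song's super-systolic design, where the multiplier inside each cell is itself a systolic array. The function name and the data-flow choice (partial sums move one cell per cycle, inputs move at half that speed) are assumptions made for the sketch.

```python
def systolic_convolve(x, w):
    """Convolve x with w on a simulated linear systolic array.

    Cell c holds weight w[c]. Partial sums advance one cell per cycle and
    inputs advance one cell every two cycles, so the sum for output index n
    meets x[n - c] in cell c.
    """
    k = len(w)
    n_out = len(x) + k - 1
    x_regs = [[0, 0] for _ in range(k)]  # two-stage input delay per cell
    y_regs = [0] * k                     # partial-sum register per cell
    out = []
    for t in range(n_out + k - 1):
        x_in = x[t] if t < len(x) else 0
        new_x = []
        new_y = []
        for c in range(k):
            # Read the values the left neighbor held at the previous cycle.
            xin = x_in if c == 0 else x_regs[c - 1][1]
            yin = 0 if c == 0 else y_regs[c - 1]
            new_y.append(yin + w[c] * xin)     # one multiply-add per cycle
            new_x.append([xin, x_regs[c][0]])  # shift the input delay line
        x_regs, y_regs = new_x, new_y
        if t >= k - 1:
            out.append(y_regs[k - 1])  # a finished sum leaves the last cell
    return out


print(systolic_convolve([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```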
Video Filtering
Data Compression
Image Processing
Signal Processing
Differential Equations
Systolic Array vs. MISD
In relation to an MISD architecture, a systolic array is "..a network of small computing elements connected in a regular grid. All the elements are controlled by a global clock. On each cycle, an element will read a piece of data from one of its neighbors, perform a simple operation (e.g. add the incoming element to a stored value), and prepare a value to be written to a neighbor on the next step"[6]. This relates to the idea that the inner processing units (or nodes) of a systolic array typically do not access memory or buffers, but pass the data from node to node via registers. This matches Flynn's taxonomy of MISD architecture in that only the first and last processing units (nodes) access the original data stream.
If a systolic array is not designed with a local memory, the individual processing units are generally designed to be non-programmable (basically ASICs), and data is introduced to the systolic array through an outside shared memory controller (perhaps even through buffers to ease memory traffic)[15]. Designed without a local memory, then, a systolic array does not fit Flynn's taxonomy of MISD architecture: the data stream is still the only means of interaction between the individual processing units, but those processing units lack individual instruction memories. It could be argued to mimic SISD (single instruction, single data) at this point, since the processing units themselves are generally application-specific. If a systolic array is designed with a local memory, it gains the ability to fetch not only data from that local memory[15], but also instructions. It could: a) have a local instruction memory, from which a program fetches and transmits instructions to each individual processing unit (on the assumption that each of those units is more than application-specific), or b) attempt to mimic Flynn's MISD taxonomy and grant each of those individual processing units an individual instruction memory. The second option introduces considerable overhead and, unless the application or algorithm requires it, will be quite costly for no apparent reason.
Jin, Zhang, and Li summarize the limitation of existing systolic designs and their proposed remedy as follows[17]: "..All are algorithm based, that is one design is only for solving one specific problem. In this paper, the special purpose systolic architecture has been extended into a reconfigurable one and a systematic design approach to mapping two or more algorithms into a single reconfigurable systolic array is presented. First multiple algorithms are mapped into a reconfigurable systolic array that is able to compute one algorithm at a time with proper control settings. Second the reconfigurable systolic array is extended by using time or space redundancy so that it can compute multiple algorithms simultaneously. In addition, the optimal mapping, which minimizes the total hardware cost and computation time, is explored and the necessary condition of the transformation for computing multiple problem instances is also proposed. According to this condition, the search space of finding the optimal mapping can be significantly reduced."
Consistently, we see that one major design characteristic that either places a systolic array in the MISD category or forces it to stretch the definition is application specificity. The "true" MISD architecture Flynn describes consists of individual processing units that are typically identical to their neighbors, all guided by an ISA (instruction set architecture) capable of identifying and executing multiple operations. To achieve a systolic array that is more purely MISD, it must be re-configurable in some way. One such way is to design the systolic array architecture so that two or more algorithms can be mapped to a single re-configurable systolic array. This is done in two steps: 1) "..compute one algorithm at a time with proper control settings," then 2) extend the array using time or space redundancy to compute multiple algorithms simultaneously[17].
References
- Flynn, M. (1972). "Some Computer Organizations and Their Effectiveness". IEEE Trans. Comput. C-21: 948.
- Solihin, Y. (2008). "Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems". Solihin Publishing & Consulting LLC. C-1: 12.
- CSC 8383 Lecture 5
- MISD wiki
- ECE506 Spring 2012 Lecture 1
- 3.1.3 MISD Computers
- Laiq Hasan, Yahya M. Khawaja, Abdul Bais, "A Systolic Array Architecture for the Smith-Waterman Algorithm with High Performance Cell Design," in IADIS European Conference Data Mining, 2008, pp. 37.
- Arne Halaas, Børge Svingen, Magnar Nedland, Pål Sætrom, Ola Snøve, Jr., Olaf René Birkelan, "A Recursive MISD Architecture for Pattern Matching," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 7, July 2004, pp. 727.
- Systolic array
- Systolic array architecture
- Robert E. Morley, Jr., Thomas J. Sullivan, "A Massively Parallel Systolic Array Processor System," in Electronic Systems and Signals Research Laboratory, Department of Electrical Engineering, Washington University, pp. 217.
- Arne Halaas, Børge Svingen, Magnar Nedland, Pål Sætrom, Ola Snøve, Jr., Olaf René Birkelan, "A Recursive MISD Architecture for Pattern Matching," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 7, July 2004, pp. 728.
- Michael Flynn, R. Dimond, O. Mencer, O. Pell, "Finding Speedup in Parallel Processors," in International Symposium on Parallel and Distributed Computing, 2008, pp. 3.
- Michael J. Flynn, "Very High-Speed Computing Systems," Proceedings of the IEEE, vol. 54, no. 12, December 1966, pp. 1908.
- Henry Y. H. Chuang, Ling Chen, "A General Approach to Solving Arbitrarily Large Problems in a Fixed Size Systolic Array," in IEEE TH0212-1/88/0000/0195, 1988, pp. 195-204.
- Jae-Jin Lee, Gi-Yong Song, "Implementation of the Super-Systolic Array for Convolution," in IEEE 8-7803-7659-5/03, 2003, pp. 491-494.
- Wei Jin, Cang N. Zhang, Hua Li, "Mapping Multiple Algorithms into a Reconfigurable Systolic Array," in IEEE 978-1-4244-1643-1/08, 2008, pp. 1187-1191.