CSC/ECE 506 Fall 2007/wiki1 10 aj

From Expertiza_Wiki
Jump to navigation Jump to search

( Section 1.2.6)

New developments in dataflow and systolic architectures

In the 1980s, a steady progress of parallel computing gave way for various unique parallel architectures that underwent extensive research but were not truly implemented to its full extent as when compared to architectures such as Shared Address Programming, Message Passing and Data Parallel Processing. Such architectures are still in its infancy and though its concepts have been utilized in various approaches for increasing processing efficiency; the actual architectural implementation is not widely employed for the sake of parallel computing.

Two such architectures are Dataflow Architecture and Systolic Architecture.

Dataflow Architecture

Developments in Dataflow Architecture

Dataflow architectures (from a hardware standpoint) was an important research topic in the 1970s and early 1980s. Interest in the field has subsided in recent years due to the inability to resolve certain inherent problems in the Dataflow architectural model. To understand, this, the Dataflow concept is summarized below.


Summary of the Dataflow Idea

In its essence, a Dataflow architecture executes instructions based on whether the input arguments to the instructions are available. There is no program counter as present in a Von Neumann computer. To indicate the dependency of instructions, tags were used. When the tags contained simple memory addresses, the design was a static Dataflow machine. However, static designs couldn’t allow for multiple instances of a routine.

Dynamic Dataflow machines used Content-Addressable memory (CAM)(i.e. the tags were stored in memory) to solve this problem. The programs were loaded in the CAM. When the tagged operands of an instruction became available, the CAM would send it to an execution unit. After execution, the output data and it’s tags were sent back to the CAM (as a data token). The CAM would then execute the next instruction whose dependencies had been satisfied.

Since, the CAM could identify instructions whose tags were not dependant on any unexecuted instruction, parallelization was possible. There were however, major problems related to -

1) broadcasting the data tokens in a massively parallel system efficiently.

2) dispatching instruction tokens in a massively parallel system efficiently.

3) A real program had a huge number of dependencies. Building a CAM large enough for this proved difficult.

Due to these issues, new developments in this area were largely stagnant.


New Developments in the Dataflow Architectural Model

1) A subset of the Dataflow model - Out of order execution is widely used in present day architecture. It uses the conventional Von Neumann Architecture to run Execution Windows in the usual order. Within the Window however, instructions are run according to the Dataflow Paradigm thus enabling parallelization and efficient utilization of CPU cycles.


2)The WaveScalar Instruction Set Architecture and Execution Model, currently being developed at the University of Washington attempts at building an architectural model that can work with current Imperative languages.


3)Calstrom and Boden propose a packet instruction set computer (PISC) architecture which would employ dataflow architecture for network processors. They detailed a 40 gb/s network processor using Dataflow architecture.


4) Data flow architecture is also being evaluated for DSP processors as outlined in the referenced paper by Lee et al. Their proposed design was for a static Dataflow processor with 9 parallel processors for the multi-standard video codec processor.

Systolic Architecture

Emergence of Systolic Arrays

During the 1980s, tremendous technological advances were being made in the field of VLSI circuits, which led to the birth of several different concepts of highly parallel computing systems. One particular approach to such parallelism was using Systolic arrays, which constituted of a high number of processing elements known as cells, which were interconnected for local communication. This model utilizes a pipe-network arrangement of data processing units. Whenever a data object arrives, an operation is triggered in a data processing unit, unlike a central processing unit which has program counters. Several data streams are sent and received by a systolic array and several data counters are required to propagate data streams. This simple methodology facilitates parallelism.


Definition, Function and Applications

They are called systolic arrays; because data, similar to human blood circulation, circulates within the array and also may interact with other data. Some results that have been computed in certain cells are pumped into new cells as input, similar to the heart (Hence the analogy to a systolic mechanism of parallelism). H. T. Kung and Charles E. Leiserson invented the concept of systolic arrays being utilized as agents for parallel processing.

The Systolic Architecture is a mesh-like, interconnected set of processing elements, or cells, which interact with each other and pass on results that become inputs for the next computational time-step. Such proximity of high performance processors provide a highly required space efficiency, which is usually the main factor in parallel architecture. The structure of such systolic arrays may be non linear and multi-directional pathways may exist between processing elements. Even though this architecture is quite similar to Single-Instruction-Multiple-Data, unlike SIMD, each processing element is capable of processing a unique, different operation. The main feature of systolic architecture is that its highly specialized computations can be performed on simple parallel processors and localized communication patterns.

The systolic array is an architectural concept that is highly suited for very high integration technology. Communication between these mesh-connected processors occurs only amongst neighbors. Both control and data flow are local. Each processor has a communication register that can be read by any of its neighbors.


A Systolic array is a network of processing elements that has features like

• Modularity : each systolic array element is a module

• Synchrony : a global clock times the data through the network

• Locality (both spatial and temporal) : cells have local communication and a unit time delay

• Regularity : processing elements are homogeneously interconnected

• Parallel Computing and Pipelinability : architecture achieves high speed


The two main Systolic Architectural Features are

• Balancing Computation with I/O.

• Concurrency and Communication.


But other notable features of the architecture are its cost effectiveness, its regular & simple design, the modularity in arrays is quite high, the number of processors working simultaneously is high and also the local communication is fast – making overall execution time much faster.


Types of Systolic Arrays

1. Single Dimensional Input/Output Linear Array

       - This is usually utilized for single input/output
       


2. Two Dimensional Input/Output Linear Array

       - This allows for more input/output and control over a linear array
       


3. Perimeter Input/Output Planar Array

       - This topology allows input/output through its boundary or perimeter cells only
       
 

4. Three Dimensional Input/Output Focal Plane Array

       - This facilitates input/output to each and every systolic cell in the array
       	 


A few applications of Systolic Arrays

• Matrix Multiplication

• Solution of linear equations to problems in image processing

• Computer graphics

• Cryptography

• Computer tomography

• Matrix Inversion and Decomposition

• Convolution

• Polynomial Evaluation

• Lattice filters for seismic and speech signal processing

• Artificial Neural Networks


For a complete explanation on the invention of systolic arrays, utilize the following link : http://www.patentstorm.us/patents/4727503-fulltext.html


New Developments

iWarp

In 1989, the Intel Corporation launched the iWarp as a product and produced more than 1500 iWarp systems. The underlying architecture was based on systolic arrays and data streams being utilized for sending and receiving data between elements on the system. Even though the system was well suited for several applications, Intel stopped actively marketing the product in the mid-90s. It is still a useful tool for research in the parallel computing direction and it was one of the first successful steps towards true parallelism in the early ages of parallel computing.

Image Processing and Morphology

A two-dimensional systolic array consisting of cells can be implemented, based on parallel modules which provide internal pipeline operation where every cell can be configured according to a control word. Also, the array can be provided with a group of image buffers, that can reduce the number of accesses to data memory may also extend the array capabilities, allowing the possibility of chaining interconnection of multiple processing blocks.

Digital Signal Processors

Low cost, systolic architectural microprocessors utilizes a DSP systolic array, which makes it possible to build low cost ICs capable of providing high speed digital video services. This requires high processing capabilities, and parallelism is a requirement in such areas. Systolic architectures are better suited for such purposes as they are of much lower cost due to their space-efficiency and interconnectivity.

Tilera Tile64

Arranged in a two dimensional grid, the Tilera 64 is a tile that has 64 processing elements or cells which are interconnected through a five way switch and it provides a two level hierarchical cache. The architecture is completely based on systolic arrays and this sixty-four element multi-core processor is being released in September 2007. It is already predicted to be utilized extensively in the field of embedded systems for advanced networking, unified thread management, digital video, web services, etc.


Why Systolic Architecture has not truly evolved with time (to the extent of other architectures, is because)

1. Global Synchronization takes more time due to signal delays within the network

2. Bandwidth requirements are tremendous for both processor and memory

3. Fault tolerance is very poor as interconnection protocols are lacking

Also, a certain ‘parallel overhead’ point occurs in Systolic Arrays where beyond that point, adding more processors does not increase the efficiency of processing as it would normally be expected. There may even be a point beyond which adding more processors may lead to slower execution time for the architecture.


References

1. Ideas presented in the Wikipedia articles at - http://en.wikipedia.org/wiki/Dataflow_architecture and http://en.wikipedia.org/wiki/Out-of-order_execution

2. University of Washington WaveScalar ISA - http://wavescalar.cs.washington.edu/

3. Synchronous Dataflow Architecture for Network Processors - Jakob Calstrom and Thomas Boden http://portal.acm.org/citation.cfm?id=1032450 & http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=1332593&isnumber=29428

4. Data flow processor for multi-standard video codec - Lee, B.W et al http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel2/3036/8629/00379757.pdf?isnumber=&arnumber=379757

5. Ideas presented in the Wikipedia article at - http://en.wikipedia.org/wiki/Systolic_array

6. Systolic Algorithms: Concepts, Synthesis and Evolution - Siang W.Song, University of Sau Paulo, Department Of Computer Science http://citeseer.ist.psu.edu/cache/papers/cs/30734/http:zSzzSzwww.ime.usp.brzSz~songzSzpaperszSzcimpa.pdf/systolic-algorithms-concepts-synthesis.pdf

7. Instruction Systolic Array (ISA) - H.W. Lang, FH Flensburg http://www.iti.fh-flensburg.de/lang/papers/isa/isa1.htm

8. Systolic Arrays : Patent Description http://www.patentstorm.us/patents/4727503-fulltext.html

9. iWarp http://www.cs.cmu.edu/~iwarp/

10. Tilera Tile64 http://www.tilera.com/products/processors.php

11. FPGA-Based Customizable Systolic Architecture for Image Processing Applications : Proceedings of the 2005 International Conference on Reconfigurable Computing and FPGAs (ReConFig'05) on Reconfigurable Computing and FPGAs - Griselda Saldana, Miguel Arias-Estrada (IEEE Computer Society)

12. Summary of Systolic Architectures (Slides) - Syeda Mohsina Afroze http://web.cecs.pdx.edu/~mperkows/temp/May13/070-Systolic-Processors.pdf