CSC/ECE 506 Fall 2007/wiki1 9 vr

Definition

Array Processing is a CPU design concept that uses multiple interconnected processing elements to execute the same instruction on different data items at the same time. A control processor dispatches a single instruction stream to the processing elements, each of which contains a processor and a memory. The processing elements communicate with one another over an interconnection network.

Vector Processing can be described as an alternative model for exploiting Instruction Level Parallelism (ILP), in which multiple data elements held in vector registers are processed by pipelined vector functional units. A vector register is a linear array of data elements, each of a fixed width. The pipelined units perform arithmetic operations on these elements in parallel.

History

Flynn’s Taxonomy classifies computers by the number of concurrent instruction and data streams available for execution. Array and Vector Processing fall under the category SIMD (Single Instruction, Multiple Data), in which a single instruction stream operates on multiple data streams.

During the 1960s, fear of performance stagnation pushed computer architects to look for smarter alternatives to increase throughput. Daniel Slotnick, a professor in the Computer Science Department at the University of Illinois, proposed a conceptual SIMD machine called “SOLOMON”, with 1,024 1-bit processing elements, each having memory capacity for 128 32-bit values. The machine was never built, but the design was the starting point for a more advanced computer, the ILLIAC-IV. It had 64 processing elements, each with a memory capacity of 2,048 words of 64 bits. These elements communicated with each other through an interconnect that resembled a ring: each element had a direct data path to four other elements, its immediate right and left neighbors and the neighbors spaced eight elements away. The interconnection structure wraps around, so that PE 63 (Processing Element 63) is directly connected to PE 0 (Processing Element 0).

The first successful implementation of a vector processing architecture was the CDC STAR-100, from Control Data Corporation. It streamed vector operands directly between memory and its pipelines (a memory-to-memory design); later machines such as the Cray-1 instead used vector registers to hold multiple data elements. Vector architectures exhibited SIMD behavior by having operations that applied to all elements of a vector.

These parallel computing architectures tried to exploit inherent data parallelism in programs.

Description

An array processor usually has multiple processing elements, each capable of performing arithmetic/logical operations and storing the result in its own memory. Parallelism is achieved by operating on a stream of data rather than on a single element. A control processor is responsible for fetching and broadcasting the instructions that the PEs (Processing Elements) execute. Array processing provides more throughput than serial computing.

For example, a computation that performs some operation on every element of an array would require a serial computer to process one element at a time. An array processor instead distributes the array elements among the PEs; each PE may be assigned a single element or a set of rows. The instruction dispatched by the control unit is executed by the PEs, which communicate with one another as needed. This computing technique is well suited to matrix multiplication and other array operations, which are used extensively in statistical analysis, numerical linear algebra, the numerical solution of partial differential equations, and digital signal processing. The serial-versus-parallel contrast is sketched below.
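In the minimal C sketch below, OpenMP threads stand in for processing elements: each thread executes the same operation on its own slice of the array. The function names and the scaling operation are illustrative choices, not taken from any particular machine, and a real array processor would use its own instruction set rather than threads.

    /* Serial version: a scalar processor handles one element per step. */
    void scale_serial(const double *a, double *b, long n, double k)
    {
        for (long i = 0; i < n; i++)
            b[i] = k * a[i];
    }

    /* Data-parallel version: each OpenMP thread plays the role of a
       processing element, applying the same operation to its own slice
       of the array (compile with -fopenmp). */
    void scale_parallel(const double *a, double *b, long n, double k)
    {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            b[i] = k * a[i];
    }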

Vector Processing, as the name suggests, operates on vectors, i.e., series of values or elements, rather than on scalars, i.e., single values. Vector processors typically have:

• Vector Registers

• Vector Functional Units

• Scalar Units with registers and data paths

• Vector Load/Store Units

• An Interconnect used for communication

Each vector register can hold multiple data elements of a fixed width, and a typical system has a number of such registers. The load/store units are responsible for fetching operands from memory and writing results back to it. The pipelined functional units perform arithmetic and logical operations. All of these operate on series of values residing either in main memory or in the registers. The Cray Y-MP, a supercomputer built by Cray Research, used vector processing to increase performance.

A block diagram of the Cray-1 supercomputer is available at http://research.microsoft.com/users/gbell/CrayTalk/sld063.htm.

Consider the multiplication of two arrays on a vector processor, as sketched below. The operands, which are elements of the input arrays, are loaded into two vector registers. Corresponding elements of these vector registers are fed into a pipelined multiplication unit, which performs the operation and stores the results in a third vector register. Because the functional units are pipelined, a new pair of elements can enter the multiplier every cycle, keeping overall execution time low. The results are written back to main memory by the load/store unit.
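The same idea survives in the short-vector SIMD extensions of modern CPUs. The following minimal C sketch assumes x86 SSE intrinsics, whose 128-bit registers hold four 32-bit floats rather than the 64 elements of a classic Cray vector register, and assumes n is a multiple of 4 for brevity:

    #include <xmmintrin.h>   /* x86 SSE intrinsics */

    /* Element-wise multiplication of two float arrays, four lanes at
       a time: vector load, pipelined multiply, vector store. */
    void vec_mul(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);   /* load four elements of a */
            __m128 vb = _mm_loadu_ps(&b[i]);   /* load four elements of b */
            __m128 vc = _mm_mul_ps(va, vb);    /* four multiplies at once */
            _mm_storeu_ps(&c[i], vc);          /* store four results      */
        }
    }

On a classic vector machine even this loop largely disappears: a single vector instruction, governed by a vector length register, covers a whole strip of elements.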

More information is available at http://www.pcc.qub.ac.uk/tec/courses/cray/ohp/CRAY-slides_3.html.

New Trends in Vector and Array Processing

The National Energy Research Scientific Computing Center (NERSC) in Berkeley, California collaborates with the computer and computational science departments of several universities. This organization also helps rank the most powerful supercomputers in the world based on Rmax (the maximal performance achieved on the Linpack benchmark); the listings are available at http://www.top500.org. Horst D. Simon, the Director of the NERSC Center, gave a presentation on trends in supercomputing in December 2003. As indicated in his presentation, global climate modeling and earth simulation are among the computationally intensive problems that need supercomputers. The Earth Simulator, developed for the Japan Aerospace Exploration Agency, is a highly parallel vector supercomputer with 640 processor nodes connected by 640x640 single-stage crossbar switches. Each node consists of 8 vector-type arithmetic processors and 16 GB of memory, with a peak performance of 8 Gflops per vector processor. It could run holistic simulations of the atmosphere and oceans down to a resolution of 10 km.

Apart from the traditional problems of supercomputing, vector processing is finding applications in various other domains, such as:

a. Gaming

Most game code is a mixture of integer, floating-point, and vector calculations, which is best handled by a CPU with a vector unit. Dot products are especially critical to games, since they are used to compute vector lengths, projections, and transformations; vector processing is well suited to them, as the sketch below shows.
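As a minimal sketch, again assuming x86 SSE intrinsics (a console vector unit such as AltiVec offers analogous lane-wise operations under different names), a four-element dot product needs only one vector multiply; the horizontal sum is written in scalar code for clarity:

    #include <xmmintrin.h>   /* x86 SSE intrinsics */

    /* Dot product of two 4-element vectors, the building block for
       lengths, projections, and transformations. */
    float dot4(const float *a, const float *b)
    {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
        float p[4];
        _mm_storeu_ps(p, prod);            /* spill the four lane products */
        return p[0] + p[1] + p[2] + p[3];  /* horizontal sum */
    }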

Sony's third-generation PlayStation, the PS3, uses the PowerPC-based Cell processor, which includes AltiVec vector processing units.

b. Image Processing

Image processing is a challenging domain: high computing power is required to calculate image adjustments in real time. This typically involves processing arrays of pixels and performing mathematical transforms (such as Fast Fourier Transforms) on them. Vector processing helps accelerate such applications; a sketch of a simple per-pixel operation appears below.

Apple's computers use the fourth generation of PowerPC processors developed by Motorola (the MPC7400 series), which incorporates the AltiVec unit and is one of the most widely used embedded processors in this realm.
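As a minimal sketch of such a per-pixel adjustment, assuming x86 SSE2 intrinsics rather than AltiVec (AltiVec's saturating adds work the same lane-wise way), brightening an 8-bit grayscale image touches sixteen pixels per instruction; n is assumed to be a multiple of 16 for brevity:

    #include <emmintrin.h>   /* x86 SSE2 intrinsics */

    /* Brighten an 8-bit grayscale image in place: sixteen pixels per
       vector instruction, with saturating adds so values clamp at 255
       instead of wrapping around. Assumes n is a multiple of 16. */
    void brighten(unsigned char *pix, int n, unsigned char delta)
    {
        __m128i vdelta = _mm_set1_epi8((char)delta);
        for (int i = 0; i < n; i += 16) {
            __m128i v = _mm_loadu_si128((__m128i *)&pix[i]);
            v = _mm_adds_epu8(v, vdelta);            /* saturating add */
            _mm_storeu_si128((__m128i *)&pix[i], v);
        }
    }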

c. Signal processing applications like RADAR and SONAR

These processors can be harnessed well for signal processing applications. SONAR and RADAR are computationally intensive embedded applications. Vector processing architectures like AltiVec (from IBM, Motorola, and Apple) deliver very well in the areas of graphics and multimedia, which can stretch even current superscalar CPUs.

Future

Computational simulation is one of the areas that require supercomputing. Scientific challenges like understanding, detecting, and predicting human influence on the climate, and modeling the Earth system, including the atmosphere, ocean, land, and their interactions, require the aid of supercomputers.

Several companies, such as IBM, Cray Inc., and SGI, are doing pioneering research in supercomputing, which continues to scale to new heights.

Bibliography

Array and Vector Processing

  • Parallel Computer Architecture: A Hardware/Software Approach by David Culler, J.P. Singh, Anoop Gupta.

Recent Trends