CSC/ECE 506 Fall 2007/wiki1 9 arubha

'''''Array Processing''''' is a computer architectural concept that was first put to use in the early 1960s. As scientific computing evolved, the need to process large amounts of data using a common algorithm became important. Computers with an array of processing elements (PEs), controlled by a common [http://en.wikipedia.org/wiki/Control_unit control unit] (CU), were built. The PEs were usually [http://en.wikipedia.org/wiki/Arithmetic_logic_unit ALUs], capable of performing simple mathematical operations. The [http://en.wikipedia.org/wiki/Central_processing_unit CPU] itself would perform the job of the CU.


As computer architectures evolved, a new concept called '''''vector processing''''' was developed during the 1970s. In ''vector processing'', a PE usually consists of a collection of functional units that operate on vectors of data. Since vector architectures provide instructions that perform the same operation on multiple sets of corresponding operands, a single instruction exposes a large amount of parallelism to the hardware. This greatly simplifies the interconnections and reduces data dependency, compared to ''array processing''.
The basic idea of vectorization is that when performing an operation for the first iteration of a loop, we might as well perform the same operation for several of the following iterations too. By the time we reach the bottom of the loop, we have accomplished work that would require multiple passes on a scalar architecture. Vectorization is used extensively in MATLAB; for more details, refer to [http://web.cecs.pdx.edu/~gerry/MATLAB/programming/performance.html MATLAB_VECTORS].
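To make the idea concrete, here is a minimal C sketch (an illustration added for this write-up, not part of the original text; the function names and the assumption that ''n'' is a multiple of four are invented). The scalar loop performs one addition per pass, while the SSE-intrinsics version performs four additions per pass, which is the same principle a vector machine applies with much longer vectors:

<pre>
#include <xmmintrin.h>   /* SSE intrinsics: 128-bit registers holding four floats */

/* Scalar form: one elemental addition per loop iteration. */
void add_scalar(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Vectorized form: four elemental additions per loop iteration.
   For brevity this sketch assumes n is a multiple of 4.          */
void add_vector(float *c, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* load four consecutive elements   */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);    /* one instruction, four additions  */
        _mm_storeu_ps(&c[i], vc);          /* store four results at once       */
    }
}
</pre>

Each pass through the second loop accomplishes the work of four passes through the first, just as the paragraph above describes.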
 
== Advantages of Vector Processing ==
* Pipelining is readily applicable to vector instructions, since all the elemental operations are independent
* Once the compiler has done the job of converting loops into vector form, the hardware retains a lot of flexibility in executing the elemental operations as data become available (see the sketch after this list)
* Vectorized code possesses better readability compared to software-pipelined VLIW code
* Vectorized code is optimized to fully exploit large processor bandwidths and interleaved memory systems
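As a hypothetical illustration of the second point (the file name, function name, and build command below are assumptions for this sketch, not anything prescribed by the article), a vectorizing compiler can do the loop-to-vector conversion on its own:

<pre>
/* saxpy.c -- a loop that a vectorizing compiler can convert to vector form.
   A typical build would be something like:
       gcc -O3 -c saxpy.c
   since GCC enables its loop vectorizer (-ftree-vectorize) at -O3.          */
void saxpy(float *y, const float *x, float a, int n)
{
    /* Every iteration is independent, so once the loop is in vector form the
       hardware can schedule the elemental multiply-adds as data arrive.     */
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
</pre>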
 
== Pitfalls ==
* Not every software loop is amenable to vectorization (see the sketch after this list). However, the data dependency requirements for vectorization are weaker than those for full parallelization.
* Some of the code will still have to run on the normal scalar instruction set, so a vector machine still has to have a high-performance scalar microarchitecture with speculation, branch prediction, pipelining, multiple issue, etc.
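To make the first pitfall concrete, here is a small hypothetical C sketch (added for illustration, not from the original text). The first loop vectorizes because its iterations are independent; the second carries a dependence from one iteration to the next and therefore resists straightforward vectorization:

<pre>
/* Vectorizable: each iteration reads and writes only its own elements,
   so several iterations can be executed together.                       */
void scale(float *y, const float *x, float k, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = k * x[i];
}

/* Not directly vectorizable: iteration i reads the value written by
   iteration i-1 (a loop-carried dependence), so the elemental
   operations are no longer independent.                                 */
void prefix_sum(float *s, const float *x, int n)
{
    s[0] = x[0];
    for (int i = 1; i < n; i++)
        s[i] = s[i - 1] + x[i];
}
</pre>

A vectorizing compiler would typically convert the first loop and fall back to ordinary scalar code for the second.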
 


[http://en.wikipedia.org/wiki/Vector_processor Vector processors] and array processors form the basic building blocks of some of the earliest and most successful supercomputers. Vector and array processing techniques are used extensively by applications such as ocean mapping, [http://en.wikipedia.org/wiki/3D_modeling 3D modeling], molecular modeling, [http://en.wikipedia.org/wiki/Weather_forecasting weather forecasting], and wind tunnel simulation. The Airbus [http://en.wikipedia.org/wiki/Airbus_A380 A380] project made use of [http://www.nec.co.jp/hpc/sx-e/sx-world/no27/e10_11.pdf NEC SX-5] scalable vector supercomputers to run simulations and fine-tune the design even before the aircraft's maiden flight.


== Trends ==
The earliest array processors were used to operate on matrix-like data. The CU would load all the ALUs with a common instruction. The ALUs would get their data inputs from an array of memory locations containing different values from the matrix. This concept of using separate ALUs for each data element, but performing the same operation, is classified as Single-Instruction-Multiple-Data ([http://en.wikipedia.org/wiki/SIMD SIMD]) under [http://en.wikipedia.org/wiki/Flynn's_Taxonomy Flynn's taxonomy].




The first computer system based on vector processing was the CDC STAR-100, developed by the Control Data Corporation (CDC) in the early 1970s and designed to perform 100 million floating-point operations per second (MFLOPS). The STAR-100 combined scalar and vector computations. Though it was able to achieve a peak performance of 20 MFLOPS when fully loaded, its performance on real-life data sets was much lower.

The first system to fully exploit the vector processing architecture was the Cray-1, developed by Seymour Cray's Cray Research, which overcame some of the pitfalls encountered during the STAR-100 project. The STAR-100 spent a lot of time decoding vector instructions and also had to re-fetch data every time an instruction asked for it. The Cray-1 moved away from that memory-memory architecture and introduced a set of CPU registers that held pre-fetched, frequently used data as well as successive instructions, thus introducing pipelining. This enabled the Cray-1 to work on more flexible data sets and also improved its instruction fetch and decode times. But the registers imposed a limit on vector sizes and also made the system more expensive. Even with these drawbacks, the Cray-1 managed to perform at 80 MFLOPS.

Several supercomputers based on vector processing principles followed the Cray-1. Companies such as Fujitsu with the VP100 and VP200, Hitachi with the S810, and NEC with the SX series entered the fray. Floating Point Systems (FPS) came up with the AP-120B, an array coprocessor for minicomputers that performed floating-point operations exclusively, making the conventional minicomputer faster at floating-point work and also less expensive. With advances in technology, computers based on vector and array processing models kept getting faster, owing to higher clock speeds and faster switching gates.

== Horizons ==

Unlike in the past, vector architectures are now being targeted at environments other than supercomputing. As integration levels continue to increase, it becomes more feasible and attractive to put some processing power on the same die as memory circuits. The amount of bandwidth inside a memory chip is astounding, but very little of it makes it out to the pins. When processors are put on the same die as memory circuits, there will be a premium on die area (read: simple logic), heat dissipation (read: simple logic again), and bandwidth exploitation (read: ILP). Vectors are the natural solution.

== Links ==

== Bibliography ==