CSC/ECE 506 Fall 2007/wiki1 9 arubha: Difference between revisions

Revision as of 16:52, 5 September 2007

Array Processing is a computer architectural concept that was first put to use in the early 1960s. As scientific computing evolved, the need to process large amounts of data using a common algorithm became important. Computers with an array of processing elements (PEs), controlled by a common control unit (CU) were built. The PEs were usually ALUs, capable of performing simple mathematical operations. The CPU itself would perform the job of the CU.

As computer architectures evolved, a new concept called the Vector processing was developed during the 1970s. In vector processing, a PE usually consists of a collection of functional units that operate on vectors of data. This greatly simplifies the interconnections and reduces data dependency, compared to array processing.

Vector processors and array processors form the basic building blocks of some of the early and most successful supercomputers. Vector and array processing techniques are extensively used by applications like ocean mapping, 3D modeling, molecular modeling, weather forecasting, wind tunnel simulations. The Airbus A380 project made use of the NEC SX-5, scalable vector processor architecture supercomputers, to run simulations and fine tune the design even before the aircraft's maiden flight.

Trends

The first implementation of a vector processing based computre system was the CDC Star-100. It was developed by the Control Data Corporation (CDC) in the early 1970s and was capable of performing 100 million floating point operations (MFLOPS)

Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was re-started at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. It should be noted that the ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing.

The first successful implementation of vector processing appears to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data related tasks they could keep up while being much smaller and less expensive. However the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.

The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight "vector registers" which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition the design had completely separate pipelines for different instructions, for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – a respectable number even today.

Other examples followed. CDC tried to re-enter the high-end market again with its ETA-10 machine, but it sold poorly and they took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically being slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, later building their own minisupercomputers. However Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. However, recognizing the benefits of vector processing IBM developed Virtual Vector Architecture for use in supercomputers coupling several scalar processors to act as a vector processor.

Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar CPU, and is fed data from programs that know it is there.

Horizons

Links

Bibliography

Parallel computer architecture : a hardware/software approach / David E. Culler, Jaswinder Pal Singh, with Anoop Gupta. ISBN:1558603433
Lecture notes by Professor David A. Patterson http://www.cs.berkeley.edu/~pattrsn/252S98/Lec07-vector.pdf
Lecture notes by Professor David E. Culler http://www.cs.berkeley.edu/~culler/cs252-s02/slides/Lec20-vector.pdf

CSC/ECE 506 Fall 2007/wiki1 9 arubha: Difference between revisions

Revision as of 16:52, 5 September 2007

Contents

Trends

Horizons

Links

Bibliography

Navigation menu

@@ Line 6: / Line 6: @@
 == Trends ==
+The first implementation of a vector processing based computre system was the [http://en.wikipedia.org/wiki/CDC_STAR-100 CDC Star-100]. It was developed by the [http://en.wikipedia.org/wiki/Control_Data_Corporation Control Data Corporation] (CDC) in the early 1970s and was capable of performing 100 million floating point operations ([http://en.wikipedia.org/wiki/MFLOPS MFLOPS])
+Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was re-started at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. It should be noted that the ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing.
+The first successful implementation of vector processing appears to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data related tasks they could keep up while being much smaller and less expensive. However the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up.
+The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight "vector registers" which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition the design had completely separate pipelines for different instructions, for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – a respectable number even today.
+Other examples followed. CDC tried to re-enter the high-end market again with its ETA-10 machine, but it sold poorly and they took that as an opportunity to leave the supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically being slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, later building their own minisupercomputers. However Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. However, recognizing the benefits of vector processing IBM developed Virtual Vector Architecture for use in supercomputers coupling several scalar processors to act as a vector processor.
+Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar CPU, and is fed data from programs that know it is there.
 == Horizons ==

CSC/ECE 506 Fall 2007/wiki1 9 arubha: Difference between revisions

Revision as of 16:52, 5 September 2007

Trends

Horizons

Links

Bibliography

Navigation menu

Search