CSC/ECE 506 Spring 2011/ch2 JR

=Supplement to Chapter 2: The Data Parallel Programming Model=
= History =
As computer architectures have evolved, so have parallel programming models. The earliest advancements in parallel computers took advantage of bit-level parallelism. These computers used vector processing, which required a shared memory programming model. As performance returns from this architecture diminished, the emphasis shifted to instruction-level parallelism, and the message passing model began to dominate. Most recently, with the move to cluster-based machines, there has been an increased emphasis on thread-level parallelism, which has corresponded to an increased interest in the data parallel programming model.
== Bit-level parallelism in the 1970s ==
The major performance improvements in computers during this period came from the ability to operate on 32-bit words in a single operation [Culler, Singh, and Gupta, ''Parallel Computer Architecture: A Hardware/Software Approach'', Morgan Kaufmann, 1998, p. 15]. The dominant supercomputers of the time, such as the Cray-1 and the ILLIAC IV, were mainly Single Instruction Multiple Data (SIMD) architectures and used a shared memory programming model, though each used a different form of vector processing [Culler, Singh, and Gupta, p. 21].
Development of the ILLIAC IV began in 1964, and the machine was not completed until 1975 [http://en.wikipedia.org/wiki/ILLIAC_IV]. A central processor was connected to main memory and delegated tasks to individual processing elements (PEs), each of which had its own local memory [http://archive.computerhistory.org/resources/text/Burroughs/Burroughs.ILLIAC%20IV.1974.102624911.pdf p. 4]. Each PE could operate on an 8-, 32-, or 64-bit operand at a given time [http://archive.computerhistory.org/resources/text/Burroughs/Burroughs.ILLIAC%20IV.1974.102624911.pdf p. 4].
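To make the SIMD idea concrete, here is a minimal sketch in C (an assumed illustration only; actual ILLIAC IV programs were written in machine-specific languages): a single instruction stream is applied in lockstep across many data elements, with each array index standing in for one PE.

<pre>
/* A minimal sketch (not actual ILLIAC IV code) of the SIMD idea:
   one instruction stream, applied in lockstep to many data elements.
   The loop body plays the role of the single broadcast instruction,
   and each array index plays the role of one PE. */
#include <stdio.h>

#define NUM_PES 64   /* the ILLIAC IV had 64 processing elements */

int main(void)
{
    double a[NUM_PES], b[NUM_PES], c[NUM_PES];

    for (int i = 0; i < NUM_PES; i++) {  /* load each PE's local memory */
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* The "SIMD step": conceptually, all 64 PEs execute this one
       add at the same time, each on its own local operands. */
    for (int i = 0; i < NUM_PES; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %.1f, c[63] = %.1f\n", c[0], c[NUM_PES - 1]);
    return 0;
}
</pre>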
The Cray-1, built by Cray Research, was installed at Los Alamos National Laboratory in 1976 and offered performance comparable to the ILLIAC IV [http://en.wikipedia.org/wiki/ILLIAC_IV]. Rather than an array of individual processing elements like the ILLIAC IV, the Cray-1 relied heavily on registers: the processor was connected to main memory and performed its operations in a set of 64-bit registers [http://www.eecg.toronto.edu/~moshovos/ACA05/read/cray1.pdf p. 65].
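The register-based approach can be sketched the same way (again an assumed illustration, not Cray assembly): operands are loaded from memory into a 64-element "vector register," operated on, and stored back, one strip at a time.

<pre>
/* A minimal sketch of register-based vector processing in the style
   of the Cray-1, whose vector registers each held 64 words. The
   local arrays below are stand-ins for hardware vector registers. */
#include <stdio.h>

#define VLEN 64      /* elements per vector register */
#define N    256

int main(void)
{
    double a[N], b[N], c[N];
    double va[VLEN], vb[VLEN], vc[VLEN];   /* "vector registers" */

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = i + 1;
    }

    /* Strip-mined loop: each pass fills the registers, performs one
       vector add, and writes the result back to memory. */
    for (int base = 0; base < N; base += VLEN) {
        for (int j = 0; j < VLEN; j++) { va[j] = a[base + j]; vb[j] = b[base + j]; } /* vector load  */
        for (int j = 0; j < VLEN; j++) vc[j] = va[j] + vb[j];                        /* vector add   */
        for (int j = 0; j < VLEN; j++) c[base + j] = vc[j];                          /* vector store */
    }

    printf("c[%d] = %.1f\n", N - 1, c[N - 1]);
    return 0;
}
</pre>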
== Move to instruction-level parallelism in the 1980s ==
Increasing the word size beyond 32 bits offered diminishing returns in performance [Culler, Singh, and Gupta, p. 15]. In the mid-1980s the emphasis therefore shifted from bit-level parallelism to instruction-level parallelism, that is, increasing the number of instructions that could be executed at one time [Culler, Singh, and Gupta, p. 15]. The message passing model gave programmers a way to divide a computation among separate cooperating processes, each executing its own instructions on its own data, in order to take advantage of this architecture.
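As an illustration of the message passing model, the following C sketch uses MPI, which standardized the model in the 1990s, after the machines described above. Each process owns its data, and all sharing happens through explicit send and receive calls. Compile with mpicc and run with, e.g., mpirun -np 2.

<pre>
/* A minimal message passing sketch using MPI. Rank 0 packages work
   into an explicit message; rank 1 computes on it and replies.
   Requires exactly two processes (e.g., mpirun -np 2 ./a.out). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Process 0 sends its data as an explicit message. */
        int data[4] = {1, 2, 3, 4}, sum;
        MPI_Send(data, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&sum, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("sum received from rank 1: %d\n", sum);
    } else if (rank == 1) {
        /* Process 1 receives the data, computes, and replies. */
        int data[4], sum = 0;
        MPI_Recv(data, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < 4; i++)
            sum += data[i];
        MPI_Send(&sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
</pre>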
== Thread-level parallelism ==
The move to cluster-based machines over the past decade has added another layer of complexity to parallelism. Because the cooperating computers may be located across a network from one another, more emphasis falls on software acting as the bridge between them [http://cobweb.ecn.purdue.edu/~pplinux/ppcluster.html]. This has led to a greater emphasis on thread- or task-level parallelism [http://en.wikipedia.org/wiki/Thread-level_parallelism] and to the addition of the data parallel programming model alongside the existing message passing and shared memory models [http://en.wikipedia.org/wiki/Thread-level_parallelism].
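A minimal sketch of thread-level parallelism, using OpenMP in C: the iterations of a data parallel loop are divided among threads that share memory, rather than executed by lockstep hardware PEs or by separate processes exchanging messages. Compile with, e.g., gcc -fopenmp.

<pre>
/* A minimal sketch of thread-level data parallelism using OpenMP:
   the same kind of elementwise loop as before, but its iterations
   are split across threads that share one address space. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], c[N];   /* static: kept off the stack */
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Each thread gets a chunk of iterations; the reduction clause
       safely combines the per-thread partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        c[i] = 2.0 * a[i];
        sum += c[i];
    }

    printf("threads available: %d, sum = %.0f\n",
           omp_get_max_threads(), sum);
    return 0;
}
</pre>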
= Data Parallel Model =
== Description and Example ==
== Comparison with Message Passing and Shared Memory ==
= Task Parallel Model =
== Description and Example ==
= Data Parallel Model vs Task Parallel Model =
= Definitions =
= References =