CSC/ECE 506 Fall 2007/wiki1 4 la: Difference between revisions

Revision as of 06:14, 9 September 2007

Update section 1.1.3: Architectural Trends

Microprocessor Design Trends

The textbook discusses that, up to 1986, advances in microprocessors were dominated by bit-level parallelism. Designs started with 4-bit datapaths, followed by 8-bit, 16-bit and 32-bit datapaths. In server design, 64-bit has been the norm since the start of the millennium. 128-bit datapaths are rarely mentioned for general-purpose microprocessors. Graphics processors, however, have been using 128-bit and 256-bit datapaths, and an increase to 512-bit datapaths may follow soon, driven by advances in computer graphics, animation and gaming.

Instruction-level parallelism took off as advances in bit-level parallelism tapered off. After all, the benefits of a wider datapath are limited to the ability to address more storage space and the ability to do more work in a single cycle. So far, the latter benefit has mostly meant more precise floating-point calculation, although some microprocessors can bundle a couple of instructions into one.
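The "do more in a single cycle" benefit can be illustrated with a minimal Python sketch. It assumes that a narrow datapath handles a wide addition one chunk at a time while propagating a carry, so that each chunk addition stands in for one pass through the datapath. The function name add_on_datapath and its parameters are hypothetical and exist only for this illustration; it is not a model of any particular processor.

```python
def add_on_datapath(a, b, operand_bits, datapath_bits):
    """Add two unsigned operand_bits-wide integers in datapath_bits-wide chunks.

    Returns (sum modulo 2**operand_bits, number of chunk additions performed).
    Each chunk addition is an assumed stand-in for one pass through the datapath.
    """
    if operand_bits % datapath_bits != 0:
        raise ValueError("operand width must be a multiple of the datapath width")
    mask = (1 << datapath_bits) - 1
    result = 0
    carry = 0
    steps = 0
    for shift in range(0, operand_bits, datapath_bits):
        chunk_a = (a >> shift) & mask          # next chunk of the first operand
        chunk_b = (b >> shift) & mask          # next chunk of the second operand
        total = chunk_a + chunk_b + carry      # chunk add with incoming carry
        result |= (total & mask) << shift      # keep the low datapath_bits bits
        carry = total >> datapath_bits         # carry into the next chunk
        steps += 1
    return result, steps


if __name__ == "__main__":
    # The same 32-bit addition on progressively wider (hypothetical) datapaths.
    for width in (4, 8, 16, 32):
        print(width, add_on_datapath(0xFFFF, 1, 32, width))
```

Under these assumptions, a 32-bit addition needs eight chunk additions on a 4-bit datapath but only one on a 32-bit datapath, which is the gain that bit-level parallelism delivered until operand widths stopped growing.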

The 1980's and 1990's set the stage for the modern microprocessor. Superscalar microprocessors were created, incorporating branch predictors, out-of-order execution, deeper and larger levels of on-chip cache, cache coherency protocols and on-chip support for communicating with other microprocessors. Research done in the 1990's and early 2000's set the stage for the next level of parallelism to be exploited: thread-level parallelism.

Two technologies appeared in the 2000's that altered the microprocessor performance race. The first is Multi-Core on chip and the second is Simultaneous Multi-Threading (SMT) (marketed by Intel as Hyper-Threading). At this point the industry moved away from using clock speed as the performance metric, since a microprocessor's performance depends on many more intertwined technologies than a faster clock cycle. The industry first saw two cores on a single chip, and then cores taking advantage of SMT. Both the number of cores per chip and the number of threads per core keep increasing. Dual-core processors and dual-thread processors already exist, with the promise of merging both technologies so that each core supports two threads. Microprocessors with four and eight cores exist today, and sixteen cores on a single chip are expected within a matter of months.

Clock Speed and Parallelism

In the PC world, throughout the 1990's and early 2000's, increasing chip clock speed was the standard way to increase system performance. Desktop processors topped 1 GHz in 2000, 2 GHz in 2001, and 3 GHz in 2002. Power demands and heat concerns have since ended this trend. Design obstacles, especially in laptop computers, meant that other methods had to be pursued to increase processing power without losing efficiency. This ushered in the Multi-Core era in the PC world. In the spring of 2005, dual-core chips were introduced by Intel and then by AMD. Quad-core processors have reached the market, and eight-core processors may follow by 2009.

In 2001, Intel released the Itanium microprocessor, which takes advantage of explicit instruction-level parallelism. The compiler, rather than the hardware, decides which instructions to execute in parallel, allowing the processor to execute up to six instructions per clock cycle. Although the original (and several subsequent) Itanium processors contained a single core, Intel released a dual-core Itanium microprocessor in 2006. The Itanium family is expected to follow the trend of most other microprocessors, in that thread-level parallelism will be exploited via multi-core chips.

Silicon Technologies

In 1998, IBM announced its first PowerPC microprocessor built with copper wiring, claiming that the technology boosted performance by up to a third. In 2004, it announced chips built with Silicon-On-Insulator (SOI) technology, which saved a significant amount of power. In 2007, Intel and IBM announced that they were able to produce a high-K material and metal electrodes (instead of polysilicon) that will enable the mass production of chips in 45nm technology. Dual-core and dual-threaded microprocessors have already been designed in 65nm technology. Moving to 45nm technology will make room for more cores and cache on the chip, among other features. Coupled with the technologies mentioned earlier, this will increase performance while keeping power consumption at bay, continuing the legacy of Moore's Law.

System Design Trends

PC Direction

The number of microprocessors supported in a computer is ever increasing. Since the mid-2000's, the norm has increasingly been to support more than one processor in a desktop computer (with laptops following closely behind). Intel and AMD are in a constant race to provide a stronger chip that delivers higher performance (with multiple cores) and higher bandwidth (with faster electrical signaling, wider datapaths, pipelined protocols, multiple paths and software support).

Server Direction

Figure 1 shows the number of processors that have been supported on a shared bus this decade. A commonality between the systems appearing this decade and in the last is that servers supported either single-core or dual-core microprocessors. The industry has been inching towards supporting 100 microprocessors on a single shared bus. Because the bus has a fixed bandwidth, this approach was bound to reach a dead end unless new levels of indirection were exploited. Indeed, new technologies have made supporting more microprocessors on a shared bus more feasible, among them multiple cores per chip, deeper levels of caching and better addressing schemes. Consider a multi-core microprocessor as a node: nodes communicate over the bus, and arbitration between cores is left to the microprocessor itself, relieving the shared bus of that strain. With the constant improvements in multi-core support within a chip, servers with over two hundred cores may appear before the end of this decade.

http://upload.wikimedia.org/wikipedia/commons/3/32/Procs.JPG


Figure 1. Number of processors in fully configured commercial bus-based shared-memory multiprocessors.
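The dead end created by a fixed bus bandwidth, and the relief offered by deeper caching, can be sketched with simple arithmetic. The Python sketch below assumes that every processor generates the same memory traffic and that only cache misses reach the bus. The function name max_processors_on_bus and all figures used with it are hypothetical and are not taken from Figure 1 or from any real system.

```python
def max_processors_on_bus(bus_bandwidth_mb_s, demand_per_processor_mb_s, miss_percent):
    """Estimate how many processors a shared bus can feed before saturating.

    bus_bandwidth_mb_s        -- fixed bandwidth of the shared bus (MB/s)
    demand_per_processor_mb_s -- memory traffic each processor generates (MB/s)
    miss_percent              -- integer share (1..100) of that traffic that
                                 misses in the caches and must use the bus
    All inputs are illustrative assumptions; integer arithmetic avoids rounding.
    """
    if not 1 <= miss_percent <= 100:
        raise ValueError("miss_percent must be between 1 and 100")
    if demand_per_processor_mb_s <= 0:
        raise ValueError("demand_per_processor_mb_s must be positive")
    # Bus traffic per processor is demand * miss_percent / 100, so the count is
    # bandwidth / (demand * miss_percent / 100), rounded down.
    return (bus_bandwidth_mb_s * 100) // (demand_per_processor_mb_s * miss_percent)


if __name__ == "__main__":
    # Hypothetical 1000 MB/s bus, 100 MB/s of memory traffic per processor.
    for miss in (100, 25, 10):
        print(miss, max_processors_on_bus(1000, 100, miss))
```

With these made-up numbers, the same bus feeds ten processors when every access goes to the bus, but one hundred when caches absorb 90 percent of the traffic, which is why caching and on-chip arbitration extend the life of a shared bus.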

A different class of servers is emerging that is neither an SMP nor a cluster: ccNUMA (Cache-Coherent Non-Uniform Memory Access). Each processor in such a server has faster access to its local memory than to memory attached to other processors, while the different copies of the same data are kept up to date through cache-coherency protocols. This technology is supported by Intel and AMD. Another manufacturer supporting it is SGI, whose Origin 350 server supports up to 32 microprocessors.
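The effect of non-uniform access times can be sketched as a weighted average. The Python sketch below assumes just two latency classes, local and remote, and a known share of accesses that stay local. The function name average_access_time_ns and the latencies used with it are hypothetical and do not describe the Origin 350 or any other product.

```python
def average_access_time_ns(local_percent, local_latency_ns, remote_latency_ns):
    """Average memory access time on a two-level NUMA machine.

    local_percent     -- integer share (0..100) of accesses served by local memory
    local_latency_ns  -- assumed latency of a local access, in nanoseconds
    remote_latency_ns -- assumed latency of a remote access, in nanoseconds
    """
    if not 0 <= local_percent <= 100:
        raise ValueError("local_percent must be between 0 and 100")
    remote_percent = 100 - local_percent
    # Weight each latency by the share of accesses that experience it.
    weighted = local_percent * local_latency_ns + remote_percent * remote_latency_ns
    return weighted / 100


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 300 ns remote.
    for share in (50, 80, 95):
        print(share, average_access_time_ns(share, 100, 300))
```

The more of a program's data sits in the memory local to the processor that uses it, the closer the average gets to the local latency, which is why data placement matters on ccNUMA servers.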

Shared Memory Bus Direction

As microprocessors become faster, and more of them (all sharing a common bus) are added to a system, the bandwidth of the bus becomes ever more critical. As shown in Figure 2, the shared bus bandwidth of commercial multiprocessors has increased with time. Various technologies and techniques have been implemented to increase bus bandwidth, such as faster electrical signaling, wider datapaths, pipelined protocols, and multiple paths. In 2001, HyperTransport (HT), a bidirectional serial/parallel, high-bandwidth, low-latency point-to-point link, was introduced. HT runs at clock rates from 200 MHz to 2.6 GHz. It is used in many processors and in high-performance computing. HT has also been used as an interconnect for NUMA multiprocessor systems (see above).
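How clock rate and datapath width combine into link bandwidth can be sketched with a short calculation. The Python sketch below assumes a link that moves two transfers per clock cycle and uses link widths of 2 and 32 bits; neither assumption is stated in the text above, and both should be checked against the HyperTransport specification. The function name link_bandwidth_mb_per_s is hypothetical.

```python
def link_bandwidth_mb_per_s(clock_mhz, link_width_bits, transfers_per_clock=2):
    """Peak one-directional bandwidth of a point-to-point link, in MB/s.

    clock_mhz           -- link clock in MHz (the text gives 200 to 2600 for HT)
    link_width_bits     -- width of the link in one direction (assumed value)
    transfers_per_clock -- transfers per clock cycle (assumed to be 2)
    """
    if clock_mhz <= 0 or link_width_bits <= 0 or transfers_per_clock <= 0:
        raise ValueError("all parameters must be positive")
    # Million transfers per second times bits per transfer, converted to bytes.
    return clock_mhz * transfers_per_clock * link_width_bits / 8


if __name__ == "__main__":
    # Slowest clock with a narrow (assumed 2-bit) link, fastest clock with a
    # wide (assumed 32-bit) link.
    print(link_bandwidth_mb_per_s(200, 2))
    print(link_bandwidth_mb_per_s(2600, 32))
```

The calculation shows why faster signaling and wider datapaths are listed together: peak bandwidth scales with the product of the two.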

Techniques have also been implemented to alleviate the strain put on the bus. With the Pentium 4, Intel introduced the PAUSE instruction, a hint for spin-wait loops that is designed to reduce the contention caused when spin-lock code repeatedly tries to test and set a memory location.

http://upload.wikimedia.org/wikipedia/commons/5/5e/Bandwidth.JPG


Figure 2. Bandwidth of the shared memory bus in commercial multiprocessors.

References

Culler DE, Singh JP, Gupta A. Parallel Computer Architecture: A Hardware/Software Approach. San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1999.
http://compoundsemiconductor.net/articles/news/11/1/25
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power/ppc_arch.html
http://www-05.ibm.com/se/news/sv/2007/05/power-timeline.html
http://www.demandtech.com/Resources/Papers/Multiprocessor%20scalability.pdf
http://www.endian.net/details.aspx?ItemNo=655
http://www.hypertransport.org/
http://www.mbipr.com/whitepaper5.pdf
http://www.sgi.com/products/remarketed/offering.html
http://www.sun.com/processors/
http://www.theinquirer.net/?article=9235