CSC/ECE 506 Fall 2007/wiki1 4 la: Difference between revisions


Revision as of 03:17, 6 September 2007

Update section 1.1.3: Architectural Trends

Microprocessor Design Trends

The textbook discusses that up to 1986, advancements in microprocessors were dominated by bit-level parallelism. It started with 4-bit datapaths, followed by 8-bit, 16-bit, and 32-bit datapaths. In server design, 64-bit has been the norm since the start of the millennium. A 128-bit datapath is rarely mentioned for general-purpose microprocessors. However, graphics processors have been using 128-bit and 256-bit wide datapaths, and an increase to 512-bit datapaths may come soon, especially with the advancements in computer graphics, animation, and gaming.

Instruction-level parallelism took off as advancements in bit-level parallelism receded. After all, the benefits of wider datapaths are limited to the ability to address more storage space and the ability to do more in a single cycle. The latter benefit has mostly meant more precise floating-point calculation, although some microprocessors can bundle a couple of instructions into one.

The 1980s and 1990s indeed set the stage for the modern microprocessor. Superscalar microprocessors were created, incorporating branch predictors, out-of-order execution, deeper and larger on-chip caches, cache-coherence protocols, and the ability to communicate with other microprocessors. Research done in the 1990s and early 2000s set the stage for the next level of parallelism to be exploited: thread-level parallelism.

Two technologies appeared in the 2000s that altered the microprocessor performance race: multiple cores on a chip, and Simultaneous Multi-Threading (SMT), also known as Hyper-Threading. At this point industry refrained from using clock speed as the performance metric, since a microprocessor now encompassed many intertwined technologies beyond merely a faster clock cycle. The industry first saw two cores on a single chip, then cores taking advantage of SMT. The number of cores, and the number of threads each core can support, is ever increasing: dual-core and dual-thread processors already exist, with the promise of merging both technologies so each core supports two threads. Microprocessors with four and eight cores exist today, with the promise of sixteen cores on a single chip within a matter of months.

Clock Speed and Parallelism

In the PC world, throughout the 1990s and early 2000s, increasing chip clock speed was the standard way to increase system performance. Desktop processors topped 1 GHz in 2000, 2 GHz in 2001, and 3 GHz in 2002. Due to power demands and heat concerns, however, this trend has since been discontinued. Design obstacles, especially in laptop computers, meant that other methods had to be pursued to increase processing power without losing efficiency. The multi-core era was then introduced to the PC world: in the spring of 2005, dual-core chips were introduced by Intel and then by AMD. Quad-core processors have reached the market, and eight-core chips may follow by 2009.

In 2001, Intel released the Itanium microprocessor, which takes advantage of explicit instruction-level parallelism: the compiler decides which instructions to execute in parallel, allowing the processor to execute up to six instructions per clock cycle. Although the original (and several subsequent) Itanium processors contained a single core, in 2006 Intel released a dual-core Itanium. The future of the Itanium family will follow the trend of most other microprocessors, in that thread-level parallelism will be exploited via multi-core chips.

Silicon Technologies

In 1998, IBM announced its first PowerPC microprocessor designed with copper wiring, claiming that this technology boosted performance by up to a third. In 2004, it announced chips built with Silicon-On-Insulator (SOI) technology, which saved a significant amount of power. Finally, in 2007, Intel and IBM announced that they could produce a high-k dielectric and metal gate electrodes (instead of polysilicon), enabling the mass production of chips at 45 nm. Dual-core, dual-threaded microprocessors have already been designed at 65 nm; moving to 45 nm will allow more cores and cache on the chip, among other features. Coupled with the technologies mentioned earlier, performance will increase while power consumption is kept at bay.

System Design Trends

PC Direction

The number of supported processors in a computer is ever increasing. Since the mid-2000s, the norm has increasingly been to support more than one processor in a desktop computer, with laptops following closely behind. Intel and AMD are in a constant race to provide a stronger chip delivering higher performance (with multiple cores) and higher bandwidth (with faster electrical signaling, wider datapaths, pipelined protocols, multiple paths, and software support).

Server Direction

Figure 1 shows the number of processors supported on a shared bus this decade. A commonality between this decade's technology and the last's is that servers in both periods supported either single-core or dual-core microprocessors. The industry has been inching toward supporting 100 microprocessors on a single shared bus. Because the bus has a fixed bandwidth, such an approach was bound to reach a dead end if new levels of indirection were not exploited. Indeed, new technologies have made supporting more microprocessors on a shared bus more feasible, among them multiple cores per chip, deeper levels of caching, and better addressing schemes. Consider a microprocessor with multiple cores as a node: nodes communicate over the bus, and the microprocessor arbitrates among its own cores, relieving the shared bus of this addressing strain. With the constant improvements in multi-core support within a chip, servers with over two hundred cores may appear as soon as this decade.

http://upload.wikimedia.org/wikipedia/commons/3/32/Procs.JPG


Figure 1. Number of processors in fully configured commercial bus-based shared-memory multiprocessors.

A different class of servers is emerging which is neither an SMP nor a cluster: ccNUMA, or Cache-Coherent Non-Uniform Memory Access. Such servers provide better access time to local memory, while cache-coherence protocols keep the different copies of the same data up to date. This technology is supported by Intel and AMD; another server manufacturer supporting it is SGI, whose Origin 350 server supports up to 32 microprocessors.

Shared Memory Bus Direction

As processors become faster, and more and more processors (all sharing a common bus) are added to a system, the bandwidth of the bus becomes ever more critical. As shown in Figure 2, the shared bus bandwidth of commercial multiprocessors has increased with time. Various technologies and techniques have been implemented to increase bus bandwidth, such as faster electrical signaling, wider datapaths, pipelined protocols, and multiple paths. In 2001, HyperTransport (HT), a bidirectional serial/parallel high-bandwidth, low-latency point-to-point link, was introduced. HT is used in many processors and in high-performance computing systems, and has also served as an interconnect for NUMA multiprocessor systems (see above).

Techniques have also been implemented to alleviate the strain put on the bus. With the Pentium III, Intel introduced the PAUSE instruction, designed to reduce bus contention: it curbs the bus transactions generated when spin-lock code repeatedly tries to test and set a memory location.

http://upload.wikimedia.org/wikipedia/commons/5/5e/Bandwidth.JPG


Figure 2. Bandwidth of the shared memory bus in commercial multiprocessors.

References

Culler DE, Singh JP, Gupta A. Parallel Computer Architecture: A Hardware/Software Approach. San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1999.
http://www.endian.net/details.aspx?ItemNo=655
http://www-05.ibm.com/se/news/sv/2007/05/power-timeline.html
http://www-03.ibm.com/servers/eserver/pseries/hardware/whitepapers/power/ppc_arch.html
http://www.theinquirer.net/?article=9235
http://www.sun.com/processors/
http://www.sgi.com/products/remarketed/offering.html
http://compoundsemiconductor.net/articles/news/11/1/25
http://www.mbipr.com/whitepaper5.pdf
http://www.demandtech.com/Resources/Papers/Multiprocessor%20scalability.pdf