Chapter 1: Nick Nicholls, Albert Chu
Since 2006, parallel computers have continued to evolve. Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence. These include Chip Multi-Processors, cluster computing, and mobile processors.
At the most fundamental level of parallel computing development is the transistor count. According to the text, the number of transistors on a chip increased from 2,300 in 1971 to 167 million in 2006. By 2011, the transistor count had grown further to 2.6 billion, a roughly 1,130,434x increase over 1971. Clock frequency has also continued to rise, though more slowly since 2006. In 2006, clock speeds were around 2.4 GHz, roughly 3,200 times the 750 kHz of 1971. Now the high-end clock speed of a processor is in the 3.3 GHz range.
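The growth factors quoted above can be checked with a little arithmetic. A minimal sketch, using only the figures from the text (1971 baseline: 2,300 transistors, 750 kHz clock):

```python
# Growth factors from the figures quoted above.
transistors_1971 = 2_300
transistors_2011 = 2_600_000_000

clock_1971_hz = 750e3   # 750 kHz
clock_2006_hz = 2.4e9   # 2.4 GHz

print(transistors_2011 // transistors_1971)   # -> 1130434
print(clock_2006_hz / clock_1971_hz)          # -> 3200.0
```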
Evolution of Intel Processors
|2000||Pentium IV||1.4-3GHz, 55M transistors||hyper-pipelining, SMT|
|2006||Xeon||64-bit, 2GHz, 167M transistors, 4MB L2 cache on chip||Dual core, virtualization support|
|2007||Core 2 Allendale||1.8-2.6 GHz, 167M transistors, 2MB L2 cache||2 CPUs on one die, Trusted Execution Technology|
|2008||Xeon||2.5-2.83 GHz, 820M transistors, 6MB L3 cache|| |
|2009||Core i7 Lynnfield||2.66-2.93 GHz, 774M transistors, 8MB L3 cache||2-channel DDR3|
|2010||Core i7 Gulftown||3.2 GHz, 1.17B transistors||One of the new 32 nm processors|
|2011||Core i7 Sandy Bridge E||3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache, 2.27B transistors||Up to 8 cores|
With the growing sophistication of processors and increasing clock speeds, effort was placed on parallelism. A high clock speed allowed execution to be broken into a deep pipeline, and this deep pipeline enabled big performance gains through instruction-level parallelism (ILP): executing multiple instructions at the same time. ILP is implemented within a single core, with a different instruction occupying each pipeline stage in each clock cycle. By the 1970s the gains from ILP were significant enough that uni-processor systems could, after only a few years, match the performance of contemporary parallel computers. This inhibited the adoption of multi-processors, which were costly and not yet needed. Of course, the performance gains of ILP were soon limited. Once branch prediction reached a success rate of about 90%, there was little room for further improvement. At that point, the main way of increasing performance was to raise the clock speed, which also meant more power consumption.
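The pipelining payoff described above follows from the standard pipeline timing model: n instructions on a k-stage pipeline take k + n - 1 cycles instead of n·k, so the ideal speedup approaches k. A minimal sketch:

```python
def pipelined_cycles(n_instructions, k_stages):
    # Once the pipeline is full, one instruction completes per cycle;
    # the first instruction needs k cycles to fill the pipeline.
    return k_stages + n_instructions - 1

def speedup(n, k):
    # Ratio of unpipelined time (n * k cycles) to pipelined time.
    return (n * k) / pipelined_cycles(n, k)

# For a long instruction stream the speedup approaches the stage count k,
# which is why high clock speeds paired with deep pipelines paid off.
print(round(speedup(1_000_000, 20), 2))   # -> 20.0
```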
As the diminishing returns and power inefficiencies of ILP mounted, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP: a multicore processor can execute multiple tasks simultaneously, while each core still exploits ILP through pipelining. Driven by the gains of multi-processors, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing up to 16-core processors for servers.
|Aspects||Intel Sandy Bridge||AMD Valencia||IBM POWER7|
|Core Type||OOO Superscalar||OOO Superscalar||SIMD|
|Caches||8MB L3||8MB L3||32MB L3|
|Chip Power||95 Watts||95 Watts||650 Watts for the whole system|
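The combination described above (task parallelism across cores, ILP within each core) can be sketched with a worker pool. This is an illustrative example, not tied to any specific chip; the task function is a hypothetical stand-in for an independent unit of work:

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Stand-in for an independent unit of work; on a multicore chip each
    # such task can be scheduled on its own core.
    return sum(i * i for i in range(n))

# Four independent tasks submitted at once: task parallelism on top of
# whatever ILP each core extracts from the individual instruction streams.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, [10, 100, 1000, 10000]))

print(results)   # -> [285, 328350, 332833500, 333283335000]
```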
The '90s saw the rise of cluster computers, or distributed supercomputers. These systems take advantage of the power of individual processors, combining them into a unified, powerful system. Originally, cluster computers used only uniprocessors, but they have since adopted multi-processors. Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware. In 2011 the fastest supercomputer was Japan's K computer, a cluster computer built by the information technology company Fujitsu. The K computer contains 88,128 nodes and can perform 10.51 petaflops, making it 4 times as fast as the previous record holder, while doing so at a computing efficiency of 93%. The processor used at each node is the SPARC64 VIIIfx.
|2009||SPARC64 VIIIfx||2 GHz, 8 cores||45 nm, Made by Fujitsu|
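The 93% computing-efficiency figure can be reproduced by comparing the measured 10.51 petaflops against the machine's theoretical peak. A sketch, assuming 8 double-precision flops per core per cycle for the SPARC64 VIIIfx (an assumption not stated in the text, but consistent with the chip's commonly quoted 128 GFLOPS peak):

```python
# Peak vs. sustained performance of the K computer, from the figures above.
nodes = 88_128
cores_per_node = 8
clock_hz = 2e9                # 2 GHz SPARC64 VIIIfx
flops_per_core_cycle = 8      # assumption: 8 DP flops/core/cycle

rpeak = nodes * cores_per_node * clock_hz * flops_per_core_cycle
rmax = 10.51e15               # measured LINPACK performance, 10.51 petaflops

print(round(rpeak / 1e15, 2))     # theoretical peak in petaflops -> 11.28
print(round(100 * rmax / rpeak))  # computing efficiency, percent -> 93
```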
One of the newer innovations in cluster computing is high availability. These types of clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to route traffic away from a failed node. To function, a high-availability cluster must be able to check and change the status of running applications. The applications must also use shared storage, operating in a way that protects their data from corruption.
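The failover behavior described above can be sketched as a routing loop over redundant nodes. All names here are hypothetical; a real cluster would use a proper health-check protocol rather than a simple predicate:

```python
def route(request, nodes, is_healthy):
    # Route to the first healthy node, skipping nodes whose check fails.
    for node in nodes:
        if is_healthy(node):
            return node, request
    raise RuntimeError("no healthy nodes available")

nodes = ["node-a", "node-b", "node-c"]
down = {"node-a"}                        # simulate a failed component
healthy = lambda n: n not in down

print(route("GET /status", nodes, healthy)[0])   # -> node-b
```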
Due to the popularity of smartphones, there has been significant development of mobile processors. This category of processor is specifically designed for low power use. To conserve power, these processors use dynamic frequency scaling, which allows the processor to run at varying clock frequencies based on the current load.
|Aspects||Intel Atom N2800||ARM Cortex-A9|
|Cache||1MB L2||4MB L2|
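Dynamic frequency scaling can be sketched as a simple governor that maps observed load to an operating point. The frequency steps and thresholds below are illustrative only, not taken from any specific chip:

```python
# Illustrative operating points for an "ondemand"-style governor (MHz).
FREQ_STEPS_MHZ = [600, 1000, 1400, 1860]

def pick_frequency(load):
    """Map utilization in [0, 1] to the lowest step that covers the load."""
    for freq in FREQ_STEPS_MHZ:
        if load <= freq / FREQ_STEPS_MHZ[-1]:
            return freq
    return FREQ_STEPS_MHZ[-1]

print(pick_frequency(0.2))   # light load -> 600 (low clock, low power)
print(pick_frequency(0.9))   # heavy load -> 1860 (top clock)
```

A real governor would also smooth the load signal over time to avoid oscillating between frequency steps on bursty workloads.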