Main Page/CSC 456 Fall 2013/1a bc: Difference between revisions
Since 2006, parallel computers have continued to evolve. Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence. These include Chip Multi-Processors, cluster computing, and mobile processors. | Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu | ||
Since 2006, parallel computers have continued to evolve. Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence. These include Chip Multi-Processors, cluster computing, and mobile processors. | |||
==Transistor Count== | ==Transistor Count== | ||
At the most fundamental level of parallel computing development is the transistor count. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006. By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971. The clock frequency has also continued to rise | At the most fundamental level of parallel computing development is the transistor count | ||
<ref name="transcount">http://en.wikipedia.org/wiki/Transistor_count | |||
{{cite web | |||
| url = http://en.wikipedia.org/wiki/Transistor_count | |||
| title = Transistor Count | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate = October 1, 2013 | |||
| separator = , | |||
}} | |||
</ref> | |||
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006. By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971. The clock frequency has also continued to rise. In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range. | |||
====Evolution of Intel Processors==== | ====Evolution of Intel Processors==== | ||
{| class="wikitable" | {| class="wikitable" | ||
|+ Table 1.1: Evolution of Intel Processors | |+ Table 1.1: Evolution of Intel Processors | ||
<ref name="intelspecs">http://ark.intel.com/ | |||
{{cite web | |||
| url = http://ark.intel.com/ | |||
| title = Intel Processor Specifications | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate = October 1, 2013 | |||
| separator = , | |||
}} | |||
</ref> | |||
|- | |- | ||
! From | ! From | ||
Line 47: | Line 83: | ||
| Core i7 Gulftown | | Core i7 Gulftown | ||
| 1.17 Billion | | 1.17 Billion | ||
| 3.2 GHz | | 3.2 GHz | ||
| | | 32 nm | ||
|- | |- | ||
| 2011 | | 2011 | ||
| Core i7 Sandy Bridge | | Core i7 Sandy Bridge EP4 | ||
| 2 | | 1.2 Billion | ||
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache | | 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache | ||
| Up to 8 cores | | Up to 8 cores | ||
|- | |||
|2012 | |||
| Core i7 Ivy Bridge | |||
| 1.2 Billion | |||
| 2.5-3.7 GHz | |||
| 22 nm, 3D Tri-gate transistors | |||
|- | |||
|2013 | |||
| Core Haswell | |||
| 1.4 Billion | |||
| 2.5-3.7 GHz | |||
| Fully integrated voltage regulator | |||
|} | |} | ||
==Chip Multi-Processors== | ==Chip Multi-Processors== | ||
With the sophistication of processors and | With the increasing sophistication of processors and limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instructions could be broken down into a large pipeline. The larger pipeline allowed big performance gains with Instruction Level Parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time. This would be implemented in a single core, with each stage of the pipeline being executed in each clock cycle. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the level of performance of parallel computers after only a few years. This inhibited adoption of multi-processor systems since single-processor systems achieved relative performance while being less costly. Over time, the effort to gain improvements from ILP began to have diminishing returns. In single-processor systems, the primary way of increasing performance was to increase the clock speed. As clock speeds increase, power consumption also increases. With parallelism, as long as the instructions are parallelizable, performance can be increased with an increase in processors. | ||
<ref name="cpuperf">http://en.wikipedia.org/wiki/Central_processing_unit#Performance | |||
{{cite web | |||
| url = http://en.wikipedia.org/wiki/Central_processing_unit#Performance | |||
| title = CPU Performance | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate = October 1, 2013 | |||
| separator = , | |||
}} | |||
</ref> | |||
As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn | As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn toward on-chip multi-processors (i.e. multi-core architectures). These systems allowed task parallelism in addition to ILP. For example, one processor can simultaneously execute multiple tasks and each core can use ILP with pipelining. Driven by the performance gains of multi-processors, the amount of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing up to 16-core processors. | ||
{| class="wikitable" | {| class="wikitable" | ||
|+ Table 1.2: Examples of | |+ Table 1.2: Examples of multi-core processors | ||
|- | |- | ||
! Aspects | ! Aspects | ||
Line 97: | Line 161: | ||
==Cluster Computers== | ==Cluster Computers== | ||
The | The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system. Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors. Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware. | ||
One of the newer innovations in cluster computers is high availability. These clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to reroute traffic when a node fails. In order to function, high-availability clusters must be able to check and change the status of running applications. The applications must also use shared storage, while operating in such a way that their data is protected from corruption. | |||
{| class="wikitable" | {| class="wikitable" | ||
|+ | |+ Top500.org Cluster computers 2008 - 2013<ref name="top500list">http://www.top500.org/lists/2013/06/ | ||
{{cite web | |||
| url = http://www.top500.org/lists/2013/06/ | |||
| title = Top500.org Supercomputer List | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = June 2013 | |||
| accessdate = October 3, 2013 | |||
| separator = , | |||
}} | |||
</ref> | |||
|- | |- | ||
! | ! Date of #1 Rank | ||
! Name | ! Name | ||
! Number of Cores/Nodes | |||
! Specifications | ! Specifications | ||
! | ! Peak Performance | ||
|- | ! Power Usage | ||
| 2009 | ! Information | ||
| SPARC64 VIIIfx | |- valign="top" | ||
| 2 | | 2009 Jun | ||
| | | Roadrunner | ||
| | |||
* 129,600 Cores | |||
* 6,480 computing nodes | |||
| | |||
* AMD Opteron 2210 2-core | |||
* IBM PowerXCell8i 8+1 cores | |||
* 104 Terabytes RAM | |||
* Infiniband interconnect | |||
* OS - RHEL and Fedora Linux
| 1.46 Petaflops | |||
| 2.5 Megawatts | |||
| Built by IBM, housed in NM, US | |||
|- valign="top" | |||
| 2010 Jun | |||
| Jaguar | |||
| | |||
* 224,162 Cores | |||
* 18,688 computing nodes | |||
| | |||
* AMD Opteron 2435 6-core | |||
* AMD Opteron 1354 4-core | |||
* 360 Terabytes RAM | |||
* Cray Seastar2+, Infiniband interconnects | |||
* OS - Cray Linux | |||
| 2.33 Petaflops | |||
| 7.0 Megawatts | |||
| Built by Cray, housed in Tennessee, US | |||
|- valign="top" | |||
| 2010 Nov | |||
| Tianhe-1A | |||
| | |||
* 186,368 Cores | |||
* 7,168 computing nodes | |||
| | |||
* 2 Xeon X5670 6-core CPUs per node | |||
* 1 Nvidia M2050 GPU per node | |||
* 262 Terabytes RAM | |||
* Arch interconnect (NUDT) | |||
* OS - Linux variant | |||
| 4.7 Petaflops | |||
| 4.0 Megawatts | |||
| Built by NUDT, China | |||
|- valign="top" | |||
| 2011 Nov | |||
| K Computer | |||
| | |||
* 705,024 Cores | |||
* 96 computing nodes per rack
| | |||
* 2.0GHz 8-core SPARC64 VIIIfx | |||
* 6 I/O nodes per rack
* Using Message Passing Interface | |||
* Tofu 6-dimensional torus interconnect | |||
* OS - Linux variant | |||
| 11.28 Petaflops | |||
| 9.89 Megawatts | |||
| Built by Fujitsu, housed in Japan | |||
|- valign="top" | |||
| 2012 Jun | |||
| Sequoia | |||
| | |||
* 1,572,864 Cores | |||
* 98,304 computing nodes | |||
| | |||
* 16-core PowerPC A2, Blue Gene/Q | |||
* 1.5 Petabytes RAM | |||
* 5-dimensional torus interconnect | |||
* OS - Linux variant | |||
| 20.13 Petaflops | |||
| 7.9 Megawatts | |||
| Built by IBM, housed in California, US | |||
|- valign="top" | |||
| 2012 Nov | |||
| Titan | |||
| | |||
* 560,640 computing cores | |||
| | |||
* AMD Opteron CPUs
* Nvidia Tesla GPUs | |||
* 693 Terabytes RAM (CPU + GPU) | |||
* Cray Gemini interconnect | |||
* OS - Cray Linux | |||
| 27.11 Petaflops | |||
| 8.2 Megawatts | |||
| Built by Cray, housed in Tennessee, US
|- valign="top" | |||
| 2013 Jun | |||
| Tianhe-2 | |||
| | |||
* 3,120,000 Cores | |||
* 16,000 nodes | |||
| | |||
* 2 Intel Xeon IvyBridge per node | |||
* 3 Intel Xeon Phi per node | |||
* 1.34 Petabytes RAM | |||
* TH Express-2 fat tree topology (NUDT) | |||
* OS - NUDT Kylin Linux | |||
| 54.9 Petaflops | |||
| 17.6 Megawatts | |||
| Built by NUDT, China | |||
|} | |} | ||
===Trends=== | |||
In 2011 the fastest supercomputer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance about thirty-five percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a roughly one-hundred percent increase in performance. | |||
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs. | |||
====Graphical trends for super computers 2008-2013<ref name="top500stats">http://www.top500.org/statistics/sublist/ | |||
{{cite web | |||
| url = http://www.top500.org/statistics/sublist | |||
| title = CPU Performance | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate = October 1, 2013 | |||
| separator = , | |||
}} | |||
</ref>==== | |||
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]] - In recent years, 8-core processors have been gaining a large portion of the market share, with 16-core processors a recent entrant. Single-core processors have seen only minor use since 2008.
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]] - 8-core systems hold the largest performance share of the supercomputer market. 16-core systems are a very close second, with 12-core systems in third place. In total, these three categories make up 85% of the top performance among supercomputers.
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]] - InfiniBand interconnect technology makes up the largest portion of the supercomputer arena. Interconnects based on Gigabit Ethernet make up the next largest portion.
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]] - IBM and HP make up nearly half of the supercomputer market. HP and Cray appear to have been gaining market share in recent years.
==Mobile Processors== | ==Mobile Processors== | ||
Line 140: | Line 347: | ||
| .5W-1.9W | | .5W-1.9W | ||
|} | |} | ||
==Sources== | ==Sources== | ||
====References==== | |||
<references/> | |||
====Other sources==== | |||
<ol> | <ol> | ||
<li>http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html</li> | <li>http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html</li> | ||
<li>http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive</li> | <li>http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive</li> | ||
<li>http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/</li> | <li>http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/</li> | ||
<li>http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review</li> | <li>http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review</li> | ||
<li>http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html</li> | <li>http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html</li> | ||
<li>http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html</li> | <li>http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html</li> | ||
<li>http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series</li> | <li>http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series</li> | ||
<li>http://www.arm.com/products/processors/cortex-a/cortex-a9.php</li> | <li>http://www.arm.com/products/processors/cortex-a/cortex-a9.php</li> | ||
<li>http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx</li> | <li>http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx</li> | ||
<li>http://en.wikipedia.org/wiki/High-availability_cluster</li> | <li>http://en.wikipedia.org/wiki/High-availability_cluster</li> | ||
</ol> | </ol> |
Latest revision as of 02:14, 8 October 2013
Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu
Since 2006, parallel computers have continued to evolve. Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence. These include Chip Multi-Processors, cluster computing, and mobile processors.
Transistor Count
At the most fundamental level of parallel computing development is the transistor count <ref name="transcount">http://en.wikipedia.org/wiki/Transistor_count
</ref>. According to the text, the number of transistors on a chip increased from 2,300 in 1971 to 167 million in 2006. By 2011, the transistor count had reached 2.6 billion, roughly 1,130,434 times the 1971 figure. Clock frequency has also continued to rise. In 2006, the clock speed was around 2.4 GHz, 3,200 times the 750 kHz of 1971. By 2011, high-end processors ran in the 3.3 GHz range.
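The growth factors quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures from the paragraph; the doubling-period estimate is an added illustration that assumes steady exponential growth between 1971 and 2011, which is a simplification.

```python
import math

# Figures quoted in the paragraph above.
TRANSISTORS_1971 = 2_300
TRANSISTORS_2011 = 2_600_000_000
CLOCK_HZ_1971 = 750_000          # 750 kHz
CLOCK_HZ_2006 = 2_400_000_000    # 2.4 GHz


def growth_factor(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


transistor_growth = growth_factor(TRANSISTORS_2011, TRANSISTORS_1971)
clock_growth = growth_factor(CLOCK_HZ_2006, CLOCK_HZ_1971)

# Assumption: steady exponential growth, so the number of doublings is log2 of the factor.
doublings = math.log2(transistor_growth)
years_per_doubling = (2011 - 1971) / doublings

print(f"Transistor growth 1971-2011: {transistor_growth:,.0f}x")
print(f"Clock growth 1971-2006: {clock_growth:,.0f}x")
print(f"Approximate years per transistor doubling: {years_per_doubling:.2f}")
```

Under that assumption the transistor count doubles about every two years, which is consistent with the usual statement of Moore's law.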
Evolution of Intel Processors
From | Procs | Transistors | Specifications | New Features |
---|---|---|---|---|
2000 | Pentium IV | 55 Million | 1.4-3GHz | hyper-pipelining, SMT |
2006 | Xeon | 167 Million | 64-bit, 2GHz, 4MB L2 cache on chip | Dual core, virtualization support |
2007 | Core 2 Allendale | 167 Million | 1.8-2.6 GHz, 2MB L2 cache | 2 CPUs on one die, Trusted Execution Technology |
2008 | Xeon | 820 Million | 2.5-2.83 GHz, 6MB L3 cache | |
2009 | Core i7 Lynnfield | 774 Million | 2.66-2.93 GHz, 8MB L3 cache | 2-channel DDR3 |
2010 | Core i7 Gulftown | 1.17 Billion | 3.2 GHz | 32 nm |
2011 | Core i7 Sandy Bridge EP4 | 1.2 Billion | 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache | Up to 8 cores |
2012 | Core i7 Ivy Bridge | 1.2 Billion | 2.5-3.7 GHz | 22 nm, 3D Tri-gate transistors |
2013 | Core Haswell | 1.4 Billion | 2.5-3.7 GHz | Fully integrated voltage regulator |
Chip Multi-Processors
With the increasing sophistication of processors and limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instruction execution could be broken down into a long pipeline, and the longer pipeline allowed large performance gains through Instruction Level Parallelism (ILP). ILP is the overlapped execution of multiple instructions at the same time. It is implemented within a single core: in each clock cycle, every stage of the pipeline works on a different instruction. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the performance of parallel computers after only a few years. This inhibited adoption of multi-processor systems, since single-processor systems achieved comparable performance at lower cost. Over time, the effort to gain improvements from ILP began to yield diminishing returns. In single-processor systems, the primary way of increasing performance was to increase the clock speed, but as clock speeds increase, power consumption also increases. With parallelism, as long as the work is parallelizable, performance can be increased by adding processors. <ref name="cpuperf">http://en.wikipedia.org/wiki/Central_processing_unit#Performance
</ref>
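The condition "as long as the work is parallelizable" can be made concrete with Amdahl's law. The source does not name this model; it is brought in here as a standard way to estimate the limit, and the function name and sample fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Estimate speedup under Amdahl's law (not named in the source; used as a standard model).

    parallel_fraction: share of the run time that can be spread across processors (0.0 to 1.0).
    processors: number of processors sharing the parallel part (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads on an 8-core chip.
for fraction in (0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel on 8 cores: {amdahl_speedup(fraction, 8):.2f}x")
```

Only a fully parallel workload scales linearly with the processor count; any serial portion caps the achievable gain.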
As the diminishing returns and power inefficiencies of ILP grew, manufacturers began to turn toward on-chip multi-processors (i.e. multi-core architectures). These systems allow task parallelism in addition to ILP. For example, one chip can execute multiple tasks simultaneously, one per core, while each core still exploits ILP through pipelining. Driven by the performance gains of multi-processors, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing processors with up to 16 cores.
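As a rough illustration of task parallelism, the sketch below submits two independent tasks to a pool of two workers. The task functions are hypothetical and chosen only for the example. Note the assumption: in standard CPython, threads do not run Python bytecode on several cores at once, so this shows the structure of task-parallel code rather than a measured multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n):
    """Hypothetical task 1: sum of i*i for i in 0..n-1."""
    return sum(i * i for i in range(n))


def count_multiples(n, k):
    """Hypothetical task 2: how many integers in 0..n-1 are multiples of k."""
    return sum(1 for i in range(n) if i % k == 0)


def run_tasks_concurrently():
    """Submit both independent tasks to a two-worker pool and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        squares = pool.submit(sum_of_squares, 1000)
        multiples = pool.submit(count_multiples, 1000, 7)
        return squares.result(), multiples.result()


if __name__ == "__main__":
    print(run_tasks_concurrently())
```

Because the two tasks share no data, they can be scheduled on different cores without coordination, which is the property multi-core chips exploit.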
Aspects | Intel Sandy Bridge | AMD Valencia | IBM POWER7 |
---|---|---|---|
# Cores | 4 | 8 | 8 |
Clock Freq. | 3.5GHz | 3.3GHz | 3.55GHz |
Core Type | OOO Superscalar | OOO Superscalar | SIMD |
Caches | 8MB L3 | 8MB L3 | 32MB L3 |
Chip Power | 95 Watts | 95 Watts | 650 Watts for the whole system |
Cluster Computers
The 1990s saw a rise in the use of cluster computers, or distributed supercomputers. These systems take advantage of the power of individual processors and combine them to create a powerful unified system. Originally, cluster computers used only uniprocessors, but they have since adopted multi-processors. Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.
One of the newer innovations in cluster computers is high availability. These clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to reroute traffic when a node fails. In order to function, high-availability clusters must be able to check and change the status of running applications. The applications must also use shared storage, while operating in such a way that their data is protected from corruption.
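The rerouting behavior described above can be sketched as a round-robin balancer that skips nodes marked as failed. This is a minimal illustration, not how any particular cluster manager works; the class name, method names, and node labels are all hypothetical.

```python
class FailoverRouter:
    """Hypothetical round-robin router that skips nodes marked as failed."""

    def __init__(self, nodes):
        self.order = list(nodes)
        self.healthy = {node: True for node in self.order}
        self.next_index = 0

    def mark_failed(self, node):
        """Record that a health check found the node down."""
        self.healthy[node] = False

    def mark_recovered(self, node):
        """Record that the node is serving again."""
        self.healthy[node] = True

    def route(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.order)):
            node = self.order[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.order)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    router = FailoverRouter(["node-a", "node-b", "node-c"])
    router.mark_failed("node-b")
    # Traffic alternates between the two remaining nodes.
    print([router.route() for _ in range(4)])
```

A real high-availability cluster would also need the status checks and shared-storage protections mentioned above; the sketch covers only the routing decision.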
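Date of #1 Rank | Name | Number of Cores/Nodes | Specifications | Peak Performance | Power Usage | Information |
---|---|---|---|---|---|---|
2009 Jun | Roadrunner | 129,600 cores; 6,480 computing nodes | AMD Opteron 2210 2-core; IBM PowerXCell8i 8+1 cores; 104 Terabytes RAM; Infiniband interconnect; OS - RHEL and Fedora Linux | 1.46 Petaflops | 2.5 Megawatts | Built by IBM, housed in NM, US |
2010 Jun | Jaguar | 224,162 cores; 18,688 computing nodes | AMD Opteron 2435 6-core; AMD Opteron 1354 4-core; 360 Terabytes RAM; Cray Seastar2+, Infiniband interconnects; OS - Cray Linux | 2.33 Petaflops | 7.0 Megawatts | Built by Cray, housed in Tennessee, US |
2010 Nov | Tianhe-1A | 186,368 cores; 7,168 computing nodes | 2 Xeon X5670 6-core CPUs per node; 1 Nvidia M2050 GPU per node; 262 Terabytes RAM; Arch interconnect (NUDT); OS - Linux variant | 4.7 Petaflops | 4.0 Megawatts | Built by NUDT, China |
2011 Nov | K Computer | 705,024 cores; 96 computing nodes per rack | 2.0GHz 8-core SPARC64 VIIIfx; 6 I/O nodes per rack; Using Message Passing Interface; Tofu 6-dimensional torus interconnect; OS - Linux variant | 11.28 Petaflops | 9.89 Megawatts | Built by Fujitsu, housed in Japan |
2012 Jun | Sequoia | 1,572,864 cores; 98,304 computing nodes | 16-core PowerPC A2, Blue Gene/Q; 1.5 Petabytes RAM; 5-dimensional torus interconnect; OS - Linux variant | 20.13 Petaflops | 7.9 Megawatts | Built by IBM, housed in California, US |
2012 Nov | Titan | 560,640 computing cores | AMD Opteron CPUs; Nvidia Tesla GPUs; 693 Terabytes RAM (CPU + GPU); Cray Gemini interconnect; OS - Cray Linux | 27.11 Petaflops | 8.2 Megawatts | Built by Cray, housed in Tennessee, US |
2013 Jun | Tianhe-2 | 3,120,000 cores; 16,000 nodes | 2 Intel Xeon IvyBridge per node; 3 Intel Xeon Phi per node; 1.34 Petabytes RAM; TH Express-2 fat tree topology (NUDT); OS - NUDT Kylin Linux | 54.9 Petaflops | 17.6 Megawatts | Built by NUDT, China |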
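The peak performance and power columns of the table can be combined into a rough energy-efficiency figure. The sketch below divides the listed peak performance by the listed power usage; it assumes the two columns are directly comparable, which the source does not state, so the results are indicative only.

```python
# (peak petaflops, megawatts) as listed in the table above.
SYSTEMS = {
    "Roadrunner": (1.46, 2.5),
    "Jaguar": (2.33, 7.0),
    "Tianhe-1A": (4.7, 4.0),
    "K Computer": (11.28, 9.89),
    "Sequoia": (20.13, 7.9),
    "Titan": (27.11, 8.2),
    "Tianhe-2": (54.9, 17.6),
}


def gflops_per_watt(petaflops, megawatts):
    """Convert petaflops and megawatts into gigaflops per watt."""
    flops = petaflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


efficiency = {name: gflops_per_watt(pf, mw) for name, (pf, mw) in SYSTEMS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} GFLOPS per watt")
```

On these figures, efficiency does not rise monotonically with rank date: some later systems deliver fewer gigaflops per watt than the systems they displaced.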
Trends
In 2011 the fastest supercomputer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance about thirty-five percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a roughly one-hundred percent increase in performance.
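The generation-to-generation increases quoted above can be reproduced approximately from the peak performance figures in the table. This is a small worked calculation; only the helper name is introduced for the example.

```python
# Peak performance in petaflops, in order of taking the #1 rank (from the table above).
PEAK_PETAFLOPS = [
    ("K Computer", 11.28),
    ("Sequoia", 20.13),
    ("Titan", 27.11),
    ("Tianhe-2", 54.9),
]


def percent_increase(old, new):
    """Return the percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


increases = {}
for (prev_name, prev_peak), (name, peak) in zip(PEAK_PETAFLOPS, PEAK_PETAFLOPS[1:]):
    increases[name] = percent_increase(prev_peak, peak)
    print(f"{name} over {prev_name}: {increases[name]:.1f}% increase")
```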
Since 2008, supercomputers have trended towards multi-core processors. As of 2013, according to Top500.org data, the trend is towards processors with eight or more cores. Most systems use computing nodes with multiple multi-core CPUs.
====Graphical trends for super computers 2008-2013<ref name="top500stats">http://www.top500.org/statistics/sublist/
</ref>====
- Top500.org Cores per socket - In recent years, 8-core processors have been gaining a large portion of the market share, with 16-core processors a recent entrant. Single-core processors have seen only minor use since 2008.
- Top500.org Performance for cores per socket - 8-core systems hold the largest performance share of the supercomputer market. 16-core systems are a very close second, with 12-core systems in third place. In total, these three categories make up 85% of the top performance among supercomputers.
- Top500.org Interconnects used for super computers - InfiniBand interconnect technology makes up the largest portion of the supercomputer arena. Interconnects based on Gigabit Ethernet make up the next largest portion.
- Top500.org Vendor trends of super computers - IBM and HP make up nearly half of the supercomputer market. HP and Cray appear to have been gaining market share in recent years.
Mobile Processors
Due to the popularity of smartphones, there has been significant development of mobile processors. This category of processors is designed specifically for low power use. To conserve power, these processors use dynamic frequency scaling, which allows the processor to run at varying clock frequencies depending on the current load.
Aspects | Intel Atom N2800 | ARM Cortex-A9 |
---|---|---|
# Cores | 2 | 2 |
Clock Freq | 1.86GHz | 800MHz-2000MHz |
Cache | 1MB L2 | 4MB L2 |
Power | 6.5 W | .5W-1.9W |
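The dynamic frequency scaling described in this section can be sketched as a simple policy that picks the lowest clock step able to cover the current demand. This is an illustrative model only: the frequency steps, the utilization target, and the function name are assumptions, chosen to fall within the 800MHz-2000MHz range listed for the Cortex-A9 above, and real governors are more elaborate.

```python
# Hypothetical frequency steps within the 800-2000 MHz range listed in the table.
AVAILABLE_MHZ = (800, 1200, 1600, 2000)


def pick_frequency(load, current_mhz, available=AVAILABLE_MHZ, target_utilization=0.8):
    """Choose the lowest frequency that keeps utilization at or below the target.

    load: fraction of time the core was busy (0.0 to 1.0) while running at current_mhz.
    current_mhz: the frequency at which `load` was measured.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    demand_mhz = load * current_mhz  # cycles per microsecond the workload actually needs
    for frequency in sorted(available):
        if demand_mhz <= frequency * target_utilization:
            return frequency
    return max(available)  # demand exceeds every step, so run as fast as possible


if __name__ == "__main__":
    print(pick_frequency(0.10, 2000))  # light load measured at full speed
    print(pick_frequency(1.00, 800))   # saturated at the lowest step
```

Running at the lowest sufficient frequency is what saves power: a lightly loaded core drops to a slow step, and a saturated one steps up only as far as the demand requires.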
Sources
References
<references/>
Other sources
- http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html
- http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive
- http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/
- http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review
- http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html
- http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html
- http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series
- http://www.arm.com/products/processors/cortex-a/cortex-a9.php
- http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx
- http://en.wikipedia.org/wiki/High-availability_cluster