CSC 456 Spring 2012/11b AB
Latest revision as of 04:57, 26 April 2012
Large-Scale Multiprocessors
Advances in technology have made large-scale multiprocessors (LSMs) more prevalent. Considerable research has gone into network topologies for connecting the processors, and several methods have been devised to ensure coherence. Numerous manufacturers also make the components needed to build LSMs. This article gives examples of each.
Manufacturers
To build an LSM, you need to choose the right processors and the most appropriate cabinet(s) to house them. Several manufacturers make processors and cabinets that can be used in LSM configurations. For example, Fujitsu's K computer (ranked number one on TOP500's November 2011 list) uses 88,128 SPARC64 VIIIfx processors with 8 cores each, for a total of 705,024 cores.<ref name="k computer"/> Additional examples of processors used in LSMs can be found in Table 1.
Manufacturer | Processor | Year | Cores | Clock Rate | Architecture |
---|---|---|---|---|---|
Fujitsu | SPARC64 VIIIfx<ref name="fujitsu proc"/> | 2009 | 8 | 2.0 GHz | SPARC |
Intel | Xeon 7500<ref name="intel proc"/> | 2010 | 8 | 1.733-2.667 GHz | Nehalem |
IBM | POWER7<ref name="ibm proc"/> | 2010 | 8 | 2.4-4.25 GHz | Power ISA v.2.06 |
AMD | Opteron 6100<ref name="amd proc"/> | 2010 | 12 | 1.7-2.4 GHz | Direct Connect 2.0 |
As with processors, manufacturers offer a variety of cabinet and server types, such as IBM's BladeCenter HT. This model uses IBM's CoolBlue technology, a set of tools that gives the user greater control over cooling and power use. There are also standard cabinet frames, such as 19-inch racks, which take their name from the 19-inch panels used in their design. These racks typically allow easy installation and removal of processors and servers.<ref name="19 inch rack"/>
Manufacturer | Cabinet | Blade Count |
---|---|---|
SuperMicro | MP Superserver 8046B-TRLF<ref name="supermicro chassis"/> | 4 |
HP | Integrity Superdome 2<ref name="hp chassis"/> | 32 |
IBM | BladeCenter HT<ref name="ibm chassis"/> | 12 |
Assembling
Network Topology
There are many ways to connect the processors in a network. Each network type has different values for its diameter, bisection bandwidth, and degree. The diameter of a network is the largest number of hops on a shortest path between any pair of nodes. Bisection bandwidth is the minimum number of links that must be cut to divide the network into two equal halves. The degree of a network is the number of in/out links on each node. The following figure displays some examples.
[Figure: An example of possible network structures.<ref name="topology"/>]
The following table, Table 3: Network Properties, gives more detail on the characteristics of some network types. "p" is the number of nodes, "d" is the number of dimensions, and "k" is the number of nodes in each dimension. The mesh row gives the two-dimensional case (d = 2, k = sqrt(p)).
Topology | Diameter | Bisection Bandwidth | Degree | Example(s) |
---|---|---|---|---|
Ring | p/2 | 2 | 2 | KSR-1, NUMA-chine <ref name="ring"/> |
k-ary d Mesh | 2(sqrt(p) - 1) | sqrt(p) | 4 | Intel Paragon, Cray T3D <ref name="mesh"/> |
Butterfly | log_2(p) | p/2 | 4 | BBN Butterfly<ref name="butterfly"/> |
k-ary Fat Tree | 2 x log_k(p) | p/2 | k+1 | Xtreme-X<ref name="xtreme"/> |
Hypercube | log_2(p) | p/2 | log_2(p) | nCUBE 1<ref name="ncube"/> |
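The ring, mesh, and hypercube rows of the table can be checked on small networks. The following is a minimal Python sketch, not part of the original article: the function names (`ring`, `mesh_2d`, `hypercube`, `measure`) are hypothetical, the mesh is assumed to be two-dimensional without wraparound links, and the bisection is found by brute force, so it is only practical for small p.

```python
from collections import deque
from itertools import combinations
from math import isqrt


def ring(p):
    """Ring of p nodes: each node links to its two neighbours."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}


def mesh_2d(p):
    """sqrt(p) x sqrt(p) mesh without wraparound links (p must be a perfect square)."""
    k = isqrt(p)
    adj = {n: set() for n in range(k * k)}
    for r in range(k):
        for c in range(k):
            n = r * k + c
            if c + 1 < k:  # link to the right-hand neighbour
                adj[n].add(n + 1)
                adj[n + 1].add(n)
            if r + 1 < k:  # link to the neighbour below
                adj[n].add(n + k)
                adj[n + k].add(n)
    return adj


def hypercube(p):
    """Hypercube of p nodes (p must be a power of two): flip one address bit per link."""
    d = p.bit_length() - 1
    return {n: {n ^ (1 << b) for b in range(d)} for n in range(p)}


def diameter(adj):
    """Largest shortest-path hop count over all node pairs, via BFS from every node."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(n) for n in adj)


def degree(adj):
    """Maximum number of links on any node."""
    return max(len(nbrs) for nbrs in adj.values())


def bisection_width(adj):
    """Minimum number of links cut by any split into two equal halves (brute force)."""
    nodes = sorted(adj)
    edges = [(u, v) for u in nodes for v in adj[u] if u < v]
    best = len(edges)
    # Pin the first node to one half so each split is examined only once.
    for others in combinations(nodes[1:], len(nodes) // 2 - 1):
        half = {nodes[0], *others}
        cut = sum((u in half) != (v in half) for u, v in edges)
        best = min(best, cut)
    return best


def measure(adj):
    """Return (diameter, bisection width, degree) for an adjacency map."""
    return diameter(adj), bisection_width(adj), degree(adj)


if __name__ == "__main__":
    p = 16
    for name, build in [("ring", ring), ("2-D mesh", mesh_2d), ("hypercube", hypercube)]:
        print(name, measure(build(p)))
```

For p = 16 the measured values can be compared against the table: p/2, 2, 2 for the ring; 2(sqrt(p) - 1), sqrt(p), 4 for the mesh; and log_2(p), p/2, log_2(p) for the hypercube.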
Coherence
For LSMs that use a Distributed Shared Memory (DSM) architecture, cache coherence is an important issue. In 1990, researchers at the Massachusetts Institute of Technology showed that it was possible to build a coherent LSM using a directory-based approach with the Alewife multiprocessor.<ref name="alewife"/> A modern example is the Pittsburgh Supercomputing Center's Blacklight, a supercomputer with hardware-enabled shared coherent memory.<ref name="blacklight"/>
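To illustrate the general idea of a directory-based approach, here is a minimal Python sketch of a directory that tracks, for each memory block, which nodes hold a cached copy. It is a simplified illustration under stated assumptions, not a model of Alewife's or Blacklight's actual protocol: the class names, the three states (UNCACHED, SHARED, EXCLUSIVE), and the message names in the log are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Per-block bookkeeping kept by the directory (hypothetical layout)."""
    state: str = "UNCACHED"  # UNCACHED, SHARED, or EXCLUSIVE
    sharers: set = field(default_factory=set)


class Directory:
    """Toy directory: records the messages it would send instead of sending them."""

    def __init__(self):
        self.entries = {}
        self.log = []

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read(self, node, block):
        """A node asks for a readable copy of a block."""
        entry = self._entry(block)
        if entry.state == "EXCLUSIVE" and node not in entry.sharers:
            # Another node owns the block: ask it to give up exclusive access.
            (owner,) = entry.sharers
            self.log.append(("downgrade", owner, block))
            entry.state = "SHARED"
        elif entry.state == "UNCACHED":
            entry.state = "SHARED"
        entry.sharers.add(node)
        return entry

    def write(self, node, block):
        """A node asks for exclusive (writable) access to a block."""
        entry = self._entry(block)
        # Every other cached copy must be invalidated before the write proceeds.
        for other in sorted(entry.sharers - {node}):
            self.log.append(("invalidate", other, block))
        entry.state = "EXCLUSIVE"
        entry.sharers = {node}
        return entry


if __name__ == "__main__":
    directory = Directory()
    directory.read(0, "A")
    directory.read(1, "A")
    directory.write(2, "A")  # nodes 0 and 1 lose their copies
    directory.read(0, "A")   # node 2 is downgraded to a sharer
    for message in directory.log:
        print(message)
```

The point of the sketch is that invalidations go only to the nodes recorded as sharers rather than being broadcast to every processor, which is what makes the directory approach attractive at large scale.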
On the other hand, some LSMs use distributed memory systems, in which each processor has its own private memory, making cache coherence a non-issue. Fujitsu's K computer is an example of such a system.<ref name="k computer"/>
Another memory design used by LSMs is Non-Uniform Memory Access (NUMA). NUMA has a coherent variant, called cache coherent NUMA (ccNUMA), in which data and memory are accessed globally.<ref name="ccNUMA"/> The 2008 IBM Roadrunner supercomputer, which has 6,480 Opteron processors and 12,960 IBM Cell processors, uses ccNUMA.<ref name="roadrunner"/>
References
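<references>
<ref name="fujitsu proc">[http://en.wikipedia.org/wiki/SPARC64_VIIIfx Fujitsu SPARC64 VIIIfx Processor]</ref>
<ref name="intel proc">[http://en.wikipedia.org/wiki/Xeon#6500.2F7500-series_.22Beckton.22 Intel Xeon Processor]</ref>
<ref name="ibm proc">[http://en.wikipedia.org/wiki/Power7 IBM Power 7 Processor]</ref>
<ref name="amd proc">[http://en.wikipedia.org/wiki/Opteron#Opteron_.2845_nm_SOI.29 AMD Opteron Processor]</ref>
<ref name="supermicro chassis">[http://www.supermicro.com/products/system/4U/8046/SYS-8046B-TRLF.cfm Supermicro Chassis]</ref>
<ref name="hp chassis">[http://h20341.www2.hp.com/integrity/us/en/high-end/integrity-high-end-servers-superdome2.html HP Chassis]</ref>
<ref name="ibm chassis">[http://www-03.ibm.com/systems/bladecenter/hardware/chassis/bladeht/index.html IBM Chassis]</ref>
<ref name="k computer">[http://top500.org/lists/2011/11/press-release K Computer]</ref>
<ref name="19 inch rack">[http://en.wikipedia.org/wiki/19-inch_rack 19 inch Rack]</ref>
<ref name="alewife">[http://webcache.googleusercontent.com/search?q=cache:-oLJbStOeAEJ:groups.csail.mit.edu/cag/pub/papers/chaiken-thesis.ps.Z+&cd=1&hl=en&ct=clnk&gl=us&client=firefox-a Cache Coherence Protocols for Large Scale Multiprocessors]</ref>
<ref name="blacklight">[http://www.psc.edu/machines/sgi/uv/blacklight.php Blacklight Multiprocessor]</ref>
<ref name="topology">[http://en.wikibooks.org/wiki/Communication_Networks/Network_Topologies Network Topologies]</ref>
<ref name="k computer image">[http://www.top500.org/files/systems/k.jpg K Computer Image]</ref>
<ref name="ring">[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.6894&rep=rep1&type=pdf Ring Network Example]</ref>
<ref name="mesh">[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.4149&rep=rep1&type=pdf Mesh Network Example]</ref>
<ref name="ccNUMA">[http://www.top500.org/2007_overview_recent_supercomputers/ccnuma_machines ccNuma Machines]</ref>
<ref name="roadrunner">[http://www.leb.eei.uni-erlangen.de/winterakademie/2009/report/content/course02/pdf/0211.pdf Supercomputer Architecture]</ref>
<ref name="butterfly">[http://en.wikipedia.org/wiki/BBN_Butterfly BBN Butterfly Supercomputer]</ref>
<ref name="ncube">[http://en.wikipedia.org/wiki/NCUBE nCUBE 1]</ref>
<ref name="xtreme">[http://www.appro.com/products/supercomputers/xtreme-x_supercomputer/ Xtreme-X]</ref>
</references>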