CSC 456 Spring 2012/11a NC
Revision as of 02:30, 16 April 2012
Large-Scale Multiprocessor Examples
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and the mordred2 cluster operated by Kerlabs.
K Computer
Made by Fujitsu, the K Computer consists of 88,128 processors spread across 864 cabinets. Each cabinet contains 96 compute nodes (plus 6 I/O nodes), and each compute node contains one processor and 16 GB of memory. <ref name="kprocs"/>
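The arithmetic behind these figures can be checked with a short sketch. It assumes 6 I/O nodes per cabinet alongside the 96 compute nodes, which is what makes the per-cabinet count agree with the 88,128 total; the constant and function names are illustrative only.

```python
# Figures quoted in the text.
CABINETS = 864
COMPUTE_NODES_PER_CABINET = 96
MEMORY_PER_NODE_GB = 16

# Assumption: each cabinet also holds 6 I/O nodes, one processor each.
IO_NODES_PER_CABINET = 6


def k_computer_totals():
    """Return node, processor, and memory totals derived from the per-cabinet figures."""
    compute_nodes = CABINETS * COMPUTE_NODES_PER_CABINET
    io_nodes = CABINETS * IO_NODES_PER_CABINET
    return {
        "compute_nodes": compute_nodes,
        "io_nodes": io_nodes,
        # One processor per node, compute and I/O alike (assumption).
        "processors": compute_nodes + io_nodes,
        "compute_memory_gb": compute_nodes * MEMORY_PER_NODE_GB,
    }


if __name__ == "__main__":
    for name, value in k_computer_totals().items():
        print(f"{name}: {value:,}")
```

Counting only the 96 compute nodes per cabinet gives 82,944 processors; the remaining 5,184 come from the assumed I/O nodes.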
The nodes are networked through point-to-point (direct) connections. Fujitsu uses its own proprietary network, known as the "Tofu Interconnect", which has a six-dimensional mesh/torus topology. Each set of 12 nodes is called a "node group" and is the unit of job allocation. Each node group is connected to adjacent node groups via a three-dimensional torus network. Additionally, the nodes within each node group are connected to one another via their own three-dimensional mesh/torus. <ref name="kpdf"/><ref name="ktofu"/><ref name="knetwork"/>
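To make the six-dimensional addressing concrete, the following is a minimal sketch of neighbor lookup, not Fujitsu's implementation. It assumes each node has coordinates (x, y, z, a, b, c), that the 12 nodes of a group are arranged 2 x 3 x 2 along (a, b, c) with only the b axis wrapping around, and that the x, y, z axes form the torus between node groups. The torus dimensions passed in are hypothetical.

```python
# Assumption: the 12 nodes of a node group form a 2 x 3 x 2 arrangement
# along axes (a, b, c); a and c are meshes, b is a torus (wraps around).
GROUP_SHAPE = (2, 3, 2)
GROUP_WRAPS = (False, True, False)


def axis_neighbors(value, size, wraps):
    """Return the distinct neighboring positions of `value` along one axis."""
    positions = set()
    for step in (-1, 1):
        nxt = value + step
        if wraps:
            nxt %= size
        if 0 <= nxt < size and nxt != value:
            positions.add(nxt)
    return sorted(positions)


def neighbors(node, xyz_shape):
    """Return the nodes directly linked to `node` = (x, y, z, a, b, c).

    `xyz_shape` is the (hypothetical) size of the 3-D torus of node groups.
    """
    shape = tuple(xyz_shape) + GROUP_SHAPE
    wraps = (True, True, True) + GROUP_WRAPS
    linked = []
    for axis, (size, wrap) in enumerate(zip(shape, wraps)):
        for pos in axis_neighbors(node[axis], size, wrap):
            other = list(node)
            other[axis] = pos
            linked.append(tuple(other))
    return linked


if __name__ == "__main__":
    for n in neighbors((0, 0, 0, 0, 0, 0), xyz_shape=(4, 4, 4)):
        print(n)
```

Under these assumptions every node has a small, fixed number of links (six between groups, four within the group), so the link count grows linearly with the node count rather than quadratically.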
The K Computer is not a distributed shared memory (DSM) machine, in which physically separate nodes are addressed as one logically shared address space. Instead, programs on the K Computer use the Message Passing Interface (MPI), with nodes exchanging explicit messages as needed.
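The message-passing style can be illustrated with a toy sketch. This is not MPI: threads stand in for nodes and `queue.Queue` objects stand in for network links, and all names are hypothetical. The point it shows is that each "node" keeps its data private and combines results only through explicit send and receive operations.

```python
import queue
import threading


def node(rank, inboxes, local_value, result_box):
    """One simulated node: private data, communication only via messages."""
    if rank == 0:
        total = local_value
        # Receive one message from every other node (blocking receive).
        for _ in range(len(inboxes) - 1):
            _sender, value = inboxes[0].get()
            total += value
        result_box.put(total)
    else:
        # Explicit send to node 0; no memory is shared for the data itself.
        inboxes[0].put((rank, local_value))


def message_passing_sum(values):
    """Sum one value per simulated node by passing messages to node 0."""
    inboxes = [queue.Queue() for _ in values]
    result_box = queue.Queue()  # test harness only, not part of the model
    threads = [
        threading.Thread(target=node, args=(rank, inboxes, value, result_box))
        for rank, value in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result_box.get()


if __name__ == "__main__":
    print(message_passing_sum([1, 2, 3, 4]))
```

On a DSM machine the same reduction could instead read the other nodes' values directly through the shared address space, leaving the underlying system to move the data.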
Tianhe-1A
The Tianhe-1A, sponsored by the National University of Defense Technology in China, has a theoretical peak performance of 4.701 petaFLOPS. It comprises 14,336 Xeon X5670 processors and 7,168 Nvidia general-purpose GPUs (GPGPUs). In addition to the Xeon and Nvidia chips, there are 2,048 FeiTeng 1000 processors.
The system occupies 112 compute cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. Each compute cabinet holds 4 racks, each with 8 blades and a 16-port switch. A single blade contains 2 compute nodes, each with 2 Xeon processors and 1 Nvidia GPU. This comes to a total of 3,584 blades and 7,168 compute nodes. The nodes are connected by a high-speed interconnect called Arch, which has a bandwidth of 160 Gbps.
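The totals follow directly from the per-cabinet layout. The sketch below multiplies out the figures given in the text; the constant and function names are illustrative.

```python
# Layout figures quoted in the text.
COMPUTE_CABINETS = 112
RACKS_PER_CABINET = 4
BLADES_PER_RACK = 8
NODES_PER_BLADE = 2
XEONS_PER_NODE = 2
GPUS_PER_NODE = 1


def tianhe_totals():
    """Derive blade, node, CPU, and GPU counts from the cabinet layout."""
    blades = COMPUTE_CABINETS * RACKS_PER_CABINET * BLADES_PER_RACK
    nodes = blades * NODES_PER_BLADE
    return {
        "blades": blades,
        "nodes": nodes,
        "xeons": nodes * XEONS_PER_NODE,
        "gpus": nodes * GPUS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in tianhe_totals().items():
        print(f"{name}: {value:,}")
```

The derived Xeon and GPU counts match the 14,336 processors and 7,168 GPUs quoted for the system.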
The Arch interconnect uses point-to-point connections in a hybrid fat tree configuration.
The system uses message passing rather than shared memory, so neither a system-wide cache coherency protocol nor a memory consistency protocol is necessary.
mordred2 (Kerlabs)
mordred2 is one of several clusters operated by Kerlabs. It is a distributed shared memory system whose shared memory is provided in software by Kerrighed, an open-source extension to Linux. The cluster contains 110 nodes, each with 2 dual-core AMD Opteron processors and 4 GB of memory.<ref name="mordred2"/> Kerrighed provides sequential consistency, process migration between nodes, and checkpointing (the ability to return to a previous application state in case of failure).<ref name="kerrighed"/>
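The idea of checkpointing can be shown with a toy, application-level sketch. It does not reflect how Kerrighed implements checkpoints; the function name, the checkpoint interval, and the simulated failure are all hypothetical. The program saves a copy of its state periodically and, after a failure, resumes from the last saved copy instead of starting over.

```python
import copy


def run_with_recovery(values, fail_at, checkpoint_every=2):
    """Sum `values`, surviving one simulated failure at index `fail_at`.

    Returns (total, number_of_restores).
    """
    state = {"next": 0, "total": 0}
    saved = copy.deepcopy(state)  # initial checkpoint
    failed_once = False
    restores = 0

    while state["next"] < len(values):
        i = state["next"]
        try:
            state["total"] += values[i]  # state is now partially updated
            if i == fail_at and not failed_once:
                failed_once = True
                raise RuntimeError("simulated node failure")
            state["next"] = i + 1
        except RuntimeError:
            # Discard the inconsistent state and return to the checkpoint.
            state = copy.deepcopy(saved)
            restores += 1
            continue
        if state["next"] % checkpoint_every == 0:
            saved = copy.deepcopy(state)

    return state["total"], restores


if __name__ == "__main__":
    print(run_with_recovery([1, 2, 3, 4, 5], fail_at=2))
```

Without the rollback, the failed step's partial update would be counted twice when the step is retried.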
References
<references>
<ref name="kpdf">http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf</ref>
<ref name="ktofu">http://www.fujitsu.com/global/about/tech/k/whatis/network/</ref>
<ref name="kprocs">http://en.wikipedia.org/wiki/K_computer</ref>
<ref name="knetwork">http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf</ref>
<ref name="kerrighed">http://en.wikipedia.org/wiki/Kerrighed</ref>
<ref name="mordred2">http://kerrighed.org/php/clusterview.php?id=29</ref>
</references>