CSC 456 Spring 2012/11a NC
Motivation For Article
This article briefly outlines the architecture of current large-scale multiprocessor systems. It describes several such systems in detail, including their manufacturer, physical composition, interconnection network, memory consistency model, and how data is kept coherent within the system.
Large-Scale Multiprocessor Examples
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and mordred2 from Kerlabs.
K Computer
Made by Fujitsu, the K Computer consists of 88,128 processors spread across 864 cabinets. Each cabinet contains 96 compute nodes plus 6 I/O nodes (864 × 102 = 88,128), and each compute node contains one processor and 16 GB of memory. <ref name="kprocs"/>
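The cabinet and node counts can be cross-checked with a few lines of arithmetic. The sketch below assumes, following the cited K computer article, that each cabinet holds 6 I/O nodes in addition to its 96 compute nodes; the constant and function names are illustrative only.

```python
# Cross-check of the K Computer node counts quoted in this section.
CABINETS = 864
COMPUTE_NODES_PER_CABINET = 96
IO_NODES_PER_CABINET = 6        # assumption taken from the cited article
MEMORY_PER_COMPUTE_NODE_GB = 16


def k_computer_totals():
    """Return (compute nodes, all nodes, compute-node memory in GB)."""
    compute_nodes = CABINETS * COMPUTE_NODES_PER_CABINET
    all_nodes = CABINETS * (COMPUTE_NODES_PER_CABINET + IO_NODES_PER_CABINET)
    memory_gb = compute_nodes * MEMORY_PER_COMPUTE_NODE_GB
    return compute_nodes, all_nodes, memory_gb


if __name__ == "__main__":
    compute_nodes, all_nodes, memory_gb = k_computer_totals()
    print(f"compute nodes:       {compute_nodes:,}")
    print(f"compute + I/O nodes: {all_nodes:,}")
    print(f"compute-node memory: {memory_gb:,} GB")
```

With one processor per node, the compute nodes alone account for 82,944 processors; the quoted total of 88,128 is reached only when the I/O nodes are counted as well.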
The nodes are connected by point-to-point (direct) links. Fujitsu uses its own proprietary network, known as the "Tofu Interconnect", which has a six-dimensional mesh/torus topology. Each set of 12 nodes is called a "node group" and is the unit of job allocation. Each node group is connected to adjacent node groups via a three-dimensional torus network, and the nodes within each node group are connected to one another via their own three-dimensional mesh/torus. <ref name="kpdf"/><ref name="ktofu"/><ref name="knetwork"/>
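To make the mesh/torus distinction concrete, the following sketch enumerates the direct neighbors of a node in a network where each axis either wraps around (torus) or stops at the edge (mesh). It is an illustration, not Fujitsu's routing code. The 2 × 3 × 2 shape of a node group and the choice of which axes wrap are assumptions consistent with the 12-node group described above, and the 4 × 4 × 4 outer torus is a made-up size for the example.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def neighbors(coord: Coord, sizes: Sequence[int], wraps: Sequence[bool]) -> List[Coord]:
    """Return the nodes one hop away from `coord`.

    sizes[i] is the length of axis i; wraps[i] is True for a torus axis
    (edges wrap around) and False for a mesh axis (edges are dead ends).
    """
    found = set()
    for axis, (size, wrap) in enumerate(zip(sizes, wraps)):
        for step in (-1, 1):
            pos = coord[axis] + step
            if wrap:
                pos %= size
            elif not 0 <= pos < size:
                continue  # mesh axis: no link past the edge
            candidate = coord[:axis] + (pos,) + coord[axis + 1:]
            if candidate != coord:  # an axis of length 1 has no neighbor
                found.add(candidate)
    return sorted(found)


# Assumed node-group shape: 2 x 3 x 2 = 12 nodes, middle axis wraps.
GROUP_SIZES = (2, 3, 2)
GROUP_WRAPS = (False, True, False)

# Hypothetical 4 x 4 x 4 torus of node groups, combined with the group axes.
FULL_SIZES = (4, 4, 4) + GROUP_SIZES
FULL_WRAPS = (True, True, True) + GROUP_WRAPS

if __name__ == "__main__":
    print(len(neighbors((0, 0, 0), GROUP_SIZES, GROUP_WRAPS)), "links inside a node group")
    print(len(neighbors((0,) * 6, FULL_SIZES, FULL_WRAPS)), "links in the six-dimensional network")
```

Under these assumptions each node has four links inside its node group and six links to the same position in adjacent node groups, for ten links in total.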
The K Computer uses the Message Passing Interface (MPI), which lets the nodes exchange data by explicitly sending messages to one another as needed.
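The message-passing style can be illustrated with a small single-process simulation. This is not MPI and does not use any MPI library; the `Node`, `send`, `recv`, and `ring_sum` names are hypothetical. Each simulated node keeps a private value that no other node can read directly, and a sum is formed by forwarding a partial result around a ring.

```python
from collections import deque


class Node:
    """A simulated node with private memory and an inbox for messages."""

    def __init__(self, rank: int):
        self.rank = rank
        self.local_value = rank + 1  # private data, never read by other nodes
        self.inbox = deque()


def send(nodes, dest: int, payload: int) -> None:
    """Deliver a message to the destination node's inbox."""
    nodes[dest].inbox.append(payload)


def recv(node: Node) -> int:
    """Take the oldest pending message from this node's inbox."""
    return node.inbox.popleft()


def ring_sum(num_nodes: int) -> int:
    """Sum every node's private value by passing a partial sum around a ring."""
    nodes = [Node(rank) for rank in range(num_nodes)]
    # Rank 0 starts the ring with its own value.
    send(nodes, 1 % num_nodes, nodes[0].local_value)
    for rank in range(1, num_nodes):
        partial = recv(nodes[rank])
        send(nodes, (rank + 1) % num_nodes, partial + nodes[rank].local_value)
    # The completed sum arrives back at rank 0.
    return recv(nodes[0])


if __name__ == "__main__":
    print(ring_sum(4))
```

Because data moves only through explicit sends and receives, no node ever depends on another node's memory being kept up to date behind the scenes.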
Tianhe-1A
The Tianhe-1A, sponsored by the National University of Defense Technology in China, has a theoretical peak performance of 4.701 petaFLOPS. It comprises 14,336 Xeon X5670 processors and 7,168 Nvidia GPGPUs, along with 2,048 FeiTeng 1000 processors.
The system is housed in 112 compute cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. Each compute cabinet holds 4 racks with 8 blades each and a 16-port switch. A single blade contains 2 compute nodes, each with 2 Xeon processors and 1 Nvidia GPU, for a total of 3,584 blades. The nodes are connected by a high-speed interconnect called Arch, which has a bandwidth of 160 Gbps.
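The blade, node, and processor totals follow directly from the per-cabinet figures, as the short check below shows. All inputs come from the text above; the constant and function names are illustrative only.

```python
# Cross-check of the Tianhe-1A component counts quoted in this section.
COMPUTE_CABINETS = 112
RACKS_PER_CABINET = 4
BLADES_PER_RACK = 8
NODES_PER_BLADE = 2
XEONS_PER_NODE = 2
GPUS_PER_NODE = 1


def tianhe_totals():
    """Return (blades, nodes, Xeon processors, GPUs) for the compute cabinets."""
    blades = COMPUTE_CABINETS * RACKS_PER_CABINET * BLADES_PER_RACK
    nodes = blades * NODES_PER_BLADE
    xeons = nodes * XEONS_PER_NODE
    gpus = nodes * GPUS_PER_NODE
    return blades, nodes, xeons, gpus


if __name__ == "__main__":
    blades, nodes, xeons, gpus = tianhe_totals()
    print(f"blades: {blades:,}  nodes: {nodes:,}  Xeons: {xeons:,}  GPUs: {gpus:,}")
```

The results match the 3,584 blades, 14,336 Xeon processors, and 7,168 GPUs stated above.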
The Arch interconnect uses point-to-point connections in a hybrid fat tree configuration.
The system also uses message passing (MPI), so neither a system-wide cache coherence protocol nor a system-wide memory consistency model is necessary.<ref name="tianhe_mpi"/>
This cluster computer cost $88 million to build and costs an additional $20 million per year in electricity and operating expenses.<ref name="tianhe" />
mordred2 (Kerlabs)
mordred2 is one of several clusters operated by Kerlabs. It is a distributed shared memory system in which the shared memory is provided at the software level by Kerrighed, an open-source extension to Linux. The cluster contains 110 nodes, each with 2 dual-core AMD Opteron processors and 4 GB of memory, making it the largest known cluster running Kerrighed.<ref name="mordred2"/> Kerrighed provides sequential consistency, migration of processes between nodes, and checkpointing (the ability to return to a previous application state in case of failure).<ref name="kerrighed"/>
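Sequential consistency means that every execution behaves as if the operations of all processors were interleaved into one global order that respects each processor's own program order. The sketch below illustrates the definition with a standard two-processor test; it is a general illustration and says nothing about how Kerrighed implements the model. The programs `P0` and `P1` and all function names are made up for the example.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Each processor writes one shared variable, then reads the other one.
P0 = [("write", "x", 1), ("read", "y", "r0")]
P1 = [("write", "y", 1), ("read", "x", "r1")]


def run(schedule):
    """Execute one global order against a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, var, arg in schedule:
        if op == "write":
            memory[var] = arg
        else:
            registers[arg] = memory[var]
    return registers["r0"], registers["r1"]


def sc_outcomes():
    """Collect every (r0, r1) result allowed under sequential consistency."""
    return {run(schedule) for schedule in interleavings(P0, P1)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

The result `(0, 0)` never appears: in any single global order, at least one of the writes precedes the read on the other processor. A weaker memory model could allow that outcome, which is why the guarantee matters to programs written for shared memory.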
References
<references>
<ref name="kpdf">http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf</ref>
<ref name="ktofu">http://www.fujitsu.com/global/about/tech/k/whatis/network/</ref>
<ref name="kprocs">http://en.wikipedia.org/wiki/K_computer</ref>
<ref name="knetwork">http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf</ref>
<ref name="kerrighed">http://en.wikipedia.org/wiki/Kerrighed</ref>
<ref name="mordred2">http://kerrighed.org/php/clusterview.php?id=29</ref>
<ref name="tianhe">http://en.wikipedia.org/wiki/Tianhe-1</ref>
<ref name="tianhe_mpi">http://software.intel.com/file/39450</ref>
</references>