CSC 456 Spring 2012/11a NC
Large-Scale Multiprocessor Examples
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputing Center in Tianjin, China, and IBM's Blue Gene family of systems.
K Computer
Made by Fujitsu, the K Computer consists of 88,128 processors spread across 864 cabinets. Each cabinet contains 96 nodes, each of which holds one processor and 16 GBytes of memory. <ref name="kprocs"/>
The system is networked via point-to-point (direct) connections using Fujitsu's proprietary "Tofu Interconnect," a six-dimensional mesh/torus topology. Each set of 12 nodes is called a "node group" and is treated as the unit of job allocation. Node groups are connected to adjacent node groups through a three-dimensional torus network, and the nodes within each node group are connected to one another through their own three-dimensional mesh/torus. <ref name="kpdf"/><ref name="ktofu"/><ref name="knetwork"/>
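To illustrate how adjacency works in a torus network, the following sketch (not Fujitsu code; the axis lengths are made-up placeholders, and the real Tofu network uses six dimensions) computes the neighbor coordinates of a node in a small three-dimensional torus, where every axis wraps around modularly so each node has exactly two neighbors per dimension.

<pre>
#include <stdio.h>

/* Hypothetical axis lengths for a small 3-D torus; the real Tofu
 * interconnect uses six dimensions with sizes chosen by Fujitsu. */
#define DIM_X 4
#define DIM_Y 4
#define DIM_Z 4

/* Wrap-around (modular) step along one axis. */
static int wrap(int coord, int delta, int size) {
    return (coord + delta + size) % size;
}

int main(void) {
    int x = 0, y = 3, z = 2;   /* coordinates of some node */

    /* In a torus, nodes at the "edge" still have two neighbors per
     * axis, because the axis wraps around to the other side. */
    printf("x neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
           wrap(x, +1, DIM_X), y, z, wrap(x, -1, DIM_X), y, z);
    printf("y neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
           x, wrap(y, +1, DIM_Y), z, x, wrap(y, -1, DIM_Y), z);
    printf("z neighbors: (%d,%d,%d) and (%d,%d,%d)\n",
           x, y, wrap(z, +1, DIM_Z), x, y, wrap(z, -1, DIM_Z));
    return 0;
}
</pre>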
The K Computer is not a distributed shared memory (DSM) machine, in which physically separate nodes would be addressed as one logically shared address space. Instead, it relies on the Message Passing Interface (MPI), allowing nodes to exchange messages with one another as needed.
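As a rough sketch of this message-passing style (standard MPI calls, not K Computer-specific code), the example below has one process send an integer to another with explicit MPI_Send/MPI_Recv calls rather than reading from a shared address space.

<pre>
#include <mpi.h>
#include <stdio.h>

/* Minimal point-to-point message passing: rank 0 sends an integer
 * to rank 1.  Each rank has its own private memory; data moves only
 * through explicit messages, never through a shared address space. */
int main(int argc, char *argv[]) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
</pre>

Compiled with mpicc and launched with at least two processes (e.g., mpirun -np 2), the two ranks communicate only through the interconnect, which is the programming model used on machines like the K Computer.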
Tianhe-1A
The Tianhe-1A, sponsored by the National University of Defense Technology in China, is capable of 4.701 petaFLOPS. It comprises 14,336 Xeon X5670 processors and 7,168 Nvidia GPGPUs, along with 2,048 FeiTeng-1000 processors.
All of these processors are housed in 112 computer cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. Each computer cabinet holds 4 racks with 8 blades each and a 16-port switch, and a single blade contains 2 compute nodes, each with 2 Xeon processors and 1 Nvidia GPU, for a total of 3,584 blades (112 × 4 × 8). These nodes are connected by a high-speed interconnect called Arch, which provides a bandwidth of 160 Gbps.
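The processor totals follow directly from this cabinet/rack/blade hierarchy; the short check below (plain arithmetic using only the figures already given above) reproduces the blade, node, Xeon, and GPU counts.

<pre>
#include <stdio.h>

int main(void) {
    /* Figures taken from the description above. */
    int cabinets = 112, racks_per_cabinet = 4, blades_per_rack = 8;
    int nodes_per_blade = 2, xeons_per_node = 2, gpus_per_node = 1;

    int blades = cabinets * racks_per_cabinet * blades_per_rack; /* 3584 */
    int nodes  = blades * nodes_per_blade;                       /* 7168 */

    printf("blades: %d\n", blades);
    printf("nodes:  %d\n", nodes);
    printf("Xeons:  %d\n", nodes * xeons_per_node);  /* 14336 */
    printf("GPUs:   %d\n", nodes * gpus_per_node);   /* 7168  */
    return 0;
}
</pre>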
The Arch interconnect uses point-to-point connections in a hybrid fat tree configuration.
The system uses message passing rather than shared memory, so neither a system-wide cache coherence protocol nor a memory consistency protocol needs to be enforced across nodes.
References
<references> <ref name="kpdf">http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf</ref> <ref name="ktofu">http://www.fujitsu.com/global/about/tech/k/whatis/network/</ref> <ref name="kprocs">http://en.wikipedia.org/wiki/K_computer</ref> <ref name="knetwork">http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf</ref> </references>