CSC 456 Spring 2012/11a NC: Difference between revisions


Revision as of 17:10, 23 April 2012

Motivation For Article

This article briefly outlines the architecture of current large-scale multiprocessor systems. It describes several current systems in detail, including their manufacturer, physical composition and interconnection network, memory consistency model, and how data is kept coherent within the system.

Large-Scale Multiprocessor Examples

Examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and mordred2 from Kerlabs. In large clusters, message passing is far more common than distributed shared memory.

K Computer

Made by Fujitsu, the K Computer consists of 88,128 processors spread across 864 cabinets. Each cabinet contains 96 compute nodes and 6 I/O nodes, and each node contains one processor; each compute node has 16 GB of memory. <ref name="kprocs"/>
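The per-cabinet figures can be cross-checked against the processor total. The following is a minimal sketch of that arithmetic, using only the numbers quoted above; the function and key names are illustrative. It shows how many processors 96 nodes per cabinet account for and how the remainder divides across the cabinets. Reading that remainder as additional non-compute nodes in each cabinet is an interpretation, not something the code verifies.

```python
# Figures quoted in the text above.
CABINETS = 864
NODES_PER_CABINET = 96
TOTAL_PROCESSORS = 88_128
MEMORY_PER_NODE_GB = 16


def k_computer_counts():
    """Cross-check the cabinet, node, and processor figures (illustrative helper)."""
    nodes = CABINETS * NODES_PER_CABINET            # one processor per node
    gap = TOTAL_PROCESSORS - nodes                  # processors not yet accounted for
    extra_per_cabinet, remainder = divmod(gap, CABINETS)
    return {
        "nodes_from_cabinets": nodes,
        "unaccounted_processors": gap,
        "extra_per_cabinet": extra_per_cabinet,
        "remainder": remainder,
        "memory_gb": nodes * MEMORY_PER_NODE_GB,    # memory across the 96-per-cabinet nodes
    }


if __name__ == "__main__":
    for key, value in k_computer_counts().items():
        print(f"{key}: {value}")
```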

The nodes are connected by direct, point-to-point links. Fujitsu uses its own proprietary network, the "Tofu Interconnect", which has a six-dimensional mesh/torus topology. Each set of 12 nodes is called a "node group" and is the unit of job allocation. Each node group is connected to adjacent node groups via a three-dimensional torus network, and the nodes within each node group are connected to one another via their own three-dimensional mesh/torus. Because each node links only to its neighbors, the number of links grows linearly with the number of nodes rather than quadratically. <ref name="kpdf"/><ref name="ktofu"/><ref name="knetwork"/>
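To make the torus idea concrete, here is a small sketch of neighbor lookup in a generic torus with wrap-around links. It is not a model of the Tofu Interconnect itself: the 4 x 4 x 4 shape and the function names are assumptions chosen for illustration. It compares the number of links in such a torus with the number a fully connected network of the same size would need.

```python
from itertools import product


def torus_neighbors(coord, dims):
    """Return the nodes one hop away from `coord` in a torus of shape `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size   # wrap around at the edges
            neighbors.add(tuple(n))
    neighbors.discard(tuple(coord))             # an axis of size 1 wraps onto itself
    return sorted(neighbors)


def count_links(dims):
    """Count the distinct bidirectional links in a torus of shape `dims`."""
    links = set()
    for coord in product(*(range(size) for size in dims)):
        for n in torus_neighbors(coord, dims):
            links.add(frozenset((coord, n)))
    return len(links)


def full_mesh_links(num_nodes):
    """Links needed if every node were wired directly to every other node."""
    return num_nodes * (num_nodes - 1) // 2


if __name__ == "__main__":
    DIMS = (4, 4, 4)  # hypothetical shape, not the K Computer's real dimensions
    print(torus_neighbors((0, 0, 0), DIMS))
    print(count_links(DIMS), "torus links vs", full_mesh_links(4 * 4 * 4), "full-mesh links")
```

Each node in a three-dimensional torus has at most six neighbors regardless of system size, which is why direct networks of this kind scale to tens of thousands of nodes.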

The K Computer is not a distributed shared memory (DSM) machine, in which physically separate nodes are addressed as one logically shared address space. Instead, it uses the Message Passing Interface (MPI), with nodes explicitly sending messages to one another as needed.
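The message-passing style can be sketched without MPI itself. In the toy example below, threads stand in for nodes and standard-library queues stand in for the network; this is an assumption made for illustration and does not use any real MPI API. Each "node" keeps its value private and node 0 learns the others' values only through explicit sends and receives.

```python
import queue
import threading


def node(rank, inboxes, results):
    """One simulated node: private data, communication only via mailboxes."""
    local_value = (rank + 1) * 10                  # private to this node
    if rank == 0:
        total = local_value
        for _ in range(len(inboxes) - 1):
            _sender, value = inboxes[0].get()      # blocking receive
            total += value
        results["total"] = total                   # handed back to the test harness only
    else:
        inboxes[0].put((rank, local_value))        # explicit send to node 0


def run(num_nodes=4):
    """Start the simulated nodes and return the sum gathered at node 0."""
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results = {}
    threads = [
        threading.Thread(target=node, args=(rank, inboxes, results))
        for rank in range(num_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


if __name__ == "__main__":
    print(run(4))
```

Because no node reads another node's memory directly, there is no shared state for the hardware to keep coherent; all sharing is visible in the program as messages.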

Tianhe-1A

The Tianhe-1A, sponsored by the National University of Defense Technology in China, has a theoretical peak performance of 4.701 petaFLOPS. It comprises 14,336 Xeon X5670 processors and 7,168 Nvidia GPGPUs, plus 2,048 FeiTeng 1000 processors.

These processors are housed in 112 compute cabinets, alongside 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. Each compute cabinet holds 4 racks, each with 8 blades and a 16-port switch. A single blade contains 2 compute nodes, and each node contains 2 Xeon processors and 1 Nvidia GPU. This comes to a total of 3,584 blades. The individual nodes are connected by a high-speed interconnect called Arch, which has a bandwidth of 160 Gbps.
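The blade, node, and processor totals follow directly from the per-cabinet figures. As a quick check, the sketch below multiplies them out using only the numbers given in the text; the function and key names are illustrative.

```python
# Figures quoted in the text above.
COMPUTE_CABINETS = 112
RACKS_PER_CABINET = 4
BLADES_PER_RACK = 8
NODES_PER_BLADE = 2
XEONS_PER_NODE = 2
GPUS_PER_NODE = 1


def tianhe_counts():
    """Derive system-wide totals from the per-cabinet layout (illustrative helper)."""
    blades = COMPUTE_CABINETS * RACKS_PER_CABINET * BLADES_PER_RACK
    nodes = blades * NODES_PER_BLADE
    return {
        "blades": blades,
        "nodes": nodes,
        "xeons": nodes * XEONS_PER_NODE,
        "gpus": nodes * GPUS_PER_NODE,
    }


if __name__ == "__main__":
    for key, value in tianhe_counts().items():
        print(f"{key}: {value}")
```

The results agree with the 3,584 blades, 14,336 Xeon processors, and 7,168 GPUs stated for the system.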

The Arch interconnect uses point-to-point connections in a hybrid fat-tree configuration.

The system uses message passing rather than shared memory, so it needs neither a system-wide cache coherence protocol nor a system-wide memory consistency model.

The cluster cost $88 million to build and costs an additional $20 million per year in electricity and operating expenses.<ref name="tianhe" />

mordred2 (Kerlabs)

mordred2 is one of several clusters operated by Kerlabs. The cluster contains 110 nodes, each with 2 dual-core AMD Opteron processors and 4 GB of memory, making it the largest known cluster running Kerrighed.<ref name="mordred2"/> Unlike the systems above, it is a distributed shared memory system: the shared memory is provided at the software level by Kerrighed, an open-source extension to Linux. Kerrighed provides sequential consistency, migration of processes between nodes, and checkpointing (the ability to return to a previous application state after a failure).<ref name="kerrighed"/>
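Sequential consistency means that the result of any execution is the same as if all processors' operations had run in some single interleaved order that preserves each processor's own program order. The sketch below illustrates this on a classic two-processor test; it is a generic illustration and not Kerrighed code, and the programs `P0` and `P1` and all function names are hypothetical. It enumerates every legal interleaving and collects the possible results.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Each operation is (kind, shared variable, value to write or register to fill).
P0 = [("write", "x", 1), ("read", "y", "r1")]
P1 = [("write", "y", 1), ("read", "x", "r2")]


def sc_outcomes(p0, p1):
    """Return every (r1, r2) pair reachable under sequential consistency."""
    outcomes = set()
    for order in interleavings(p0, p1):
        memory = {"x": 0, "y": 0}      # shared memory starts zeroed
        regs = {}
        for kind, var, arg in order:
            if kind == "write":
                memory[var] = arg
            else:
                regs[arg] = memory[var]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


if __name__ == "__main__":
    print(sorted(sc_outcomes(P0, P1)))
```

The outcome where both reads return 0 never appears, because in any single interleaving at least one write precedes both reads. A system with a weaker consistency model could allow that outcome.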

References

<references>
<ref name="kpdf">http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf</ref>
<ref name="ktofu">http://www.fujitsu.com/global/about/tech/k/whatis/network/</ref>
<ref name="kprocs">http://en.wikipedia.org/wiki/K_computer</ref>
<ref name="knetwork">http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf</ref>
<ref name="kerrighed">http://en.wikipedia.org/wiki/Kerrighed</ref>
<ref name="mordred2">http://kerrighed.org/php/clusterview.php?id=29</ref>
<ref name="tianhe">http://en.wikipedia.org/wiki/Tianhe-1</ref>
</references>