Main Page/CSC 456 Fall 2013/1a bc: Difference between revisions

Revision as of 21:17, 24 September 2013

Since 2006, parallel computers have continued to evolve. Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence. These include Chip Multi-Processors, cluster computing, and mobile processors.

Transistor Count

At the most fundamental level of parallel computing development is the transistor count. According to the text, the number of transistors on a chip increased from 2,300 in 1971 to 167 million in 2006. By 2011, the transistor count had further increased to 2.6 billion, roughly a 1,130,000-fold increase from 1971. Clock frequency has also continued to rise, though more slowly since 2006. In 2006, the clock speed was around 2.4GHz, 3,200 times the 750KHz of 1971. By 2011, high-end processors ran at around 3.3GHz.
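The growth factors quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in this section; the helper name `growth_factor` and the dictionary layout are illustrative choices, not part of any source.

```python
def growth_factor(old, new):
    """Return how many times larger `new` is than `old`."""
    return new / old

# Figures taken from the paragraph above.
transistors = {1971: 2_300, 2006: 167_000_000, 2011: 2_600_000_000}
clock_hz = {1971: 750e3, 2006: 2.4e9, 2011: 3.3e9}

for year in (2006, 2011):
    t = growth_factor(transistors[1971], transistors[year])
    c = growth_factor(clock_hz[1971], clock_hz[year])
    print(f"1971 -> {year}: transistors x{t:,.0f}, clock x{c:,.0f}")
```

The comparison makes the point of the paragraph concrete: over the same period, transistor counts grew by a factor in the millions while clock frequency grew by a factor in the thousands.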

Evolution of Intel Processors

Table 1.1: Evolution of Intel Processors
From | Procs | Transistors | Specifications | New Features
2000 | Pentium IV | 55 Million | 1.4-3GHz | hyper-pipelining, SMT
2006 | Xeon | 167 Million | 64-bit, 2GHz, 4MB L2 cache on chip | Dual core, virtualization support
2007 | Core 2 Allendale | 167 Million | 1.8-2.6 GHz, 2MB L2 cache | 2 CPUs on one die, Trusted Execution Technology
2008 | Xeon | 820 Million | 2.5-2.83 GHz, 6MB L3 cache |
2009 | Core i7 Lynnfield | 774 Million | 2.66-2.93 GHz, 8MB L3 cache | 2-channel DDR3
2010 | Core i7 Gulftown | 1.17 Billion | 3.2 GHz | 32 nm
2011 | Core i7 Sandy Bridge EP4 | 1.2 Billion | 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache | Up to 8 cores
2012 | Core i7 Ivy Bridge | 1.2 Billion | 2.5-3.7 GHz | 22 nm, 3D Tri-gate transistors
2013 | Core Haswell | 1.4 Billion | 2.5-3.7 GHz | Fully integrated voltage regulator

Chip Multi-Processors

As processors grew more sophisticated and clock speeds increased, effort was placed on parallelism. Instruction execution could be broken down into a deep pipeline, and this pipeline allowed large performance gains through instruction level parallelism (ILP), the execution of multiple instructions at the same time. ILP is implemented within a single core, with every stage of the pipeline working on a different instruction in each clock cycle. By the 1970s the gains from ILP were significant enough to allow uni-processor systems to reach the performance level of parallel computers after only a few years. This inhibited the adoption of multi-processors, which were costly and not needed. Of course, the performance gains of ILP were soon limited. Once branch prediction had a success rate of 90%, there was little room for further improvement. At this point, the main way of increasing performance was to increase the clock speed. This also meant more power consumption.
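To illustrate why pipelining yields such gains, here is a minimal sketch of the textbook idealized pipeline model. It assumes every stage takes one clock cycle and that there are no stalls, hazards, or branch mispredictions, which real processors do not achieve; the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: fill the pipeline once, then finish one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

def ideal_speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts under the ideal model."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)

# Example: 1000 instructions on a 5-stage pipeline.
print(unpipelined_cycles(1000, 5), pipelined_cycles(1000, 5), ideal_speedup(1000, 5))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, which is why deeper pipelines were attractive until mispredicted branches and power limits eroded the benefit.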

As the returns from ILP diminished and its power inefficiencies grew, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP. For example, one chip can execute multiple tasks simultaneously, one on each core, while each core still exploits ILP through pipelining. Driven by the gains of multi-processors, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing up to 16-core processors.
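Task parallelism as described above can be sketched in software by handing independent tasks to a pool of workers. The example below is only an illustration: `count_primes` and `run_tasks` are hypothetical names, and it uses a thread pool for simplicity. In CPython, threads share a global interpreter lock, so CPU-bound tasks like this one would need a process pool (for example `concurrent.futures.ProcessPoolExecutor`) to actually run on several cores at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """Count the primes below `limit` by trial division (one independent task)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_tasks(limits, workers=None):
    """Run one count_primes task per entry in `limits` on a pool of workers."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(count_primes, limits))

print(run_tasks([10, 20, 100]))
```

The tasks share no data, so they can be distributed across cores without coordination; that independence is what makes this style of parallelism a natural fit for chip multi-processors.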

Table 1.2: Examples of current multicore processors
Aspects | Intel Sandy Bridge | AMD Valencia | IBM POWER7
# Cores | 4 | 8 | 8
Clock Freq. | 3.5GHz | 3.3GHz | 3.55GHz
Core Type | OOO Superscalar | OOO Superscalar | SIMD
Caches | 8MB L3 | 8MB L3 | 32MB L3
Chip Power | 95 Watts | 95 Watts | 650 Watts for the whole system

Cluster Computers

The 1990s saw a rise in the use of cluster computers, or distributed supercomputers. These systems take advantage of the power of individual processors and combine them to create a powerful unified system. Originally, cluster computers used only uniprocessors, but they have since adopted multi-processors. Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.

One of the newer innovations in cluster computers is high availability. High-availability clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to reroute traffic when a node fails. In order to function, high-availability clusters must be able to check and change the status of running applications. The applications must also use shared storage and operate in such a way that their data is protected from corruption.
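The failover behaviour described above can be sketched as a round-robin router that skips nodes marked as failed. This is a toy model under stated assumptions, not how any particular cluster manager works: the class `HACluster`, its methods, and the node names are all hypothetical, and a real system would detect failures through health checks rather than explicit calls.

```python
class HACluster:
    """Toy high-availability router: round-robin over nodes, skipping failed ones."""

    def __init__(self, nodes):
        self.healthy = {name: True for name in nodes}
        self._next = 0  # index of the next node to try

    def mark_failed(self, name):
        self.healthy[name] = False

    def mark_recovered(self, name):
        self.healthy[name] = True

    def route(self):
        """Return the next healthy node, or raise if every node is down."""
        names = list(self.healthy)
        for _ in range(len(names)):
            name = names[self._next % len(names)]
            self._next += 1
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy nodes available")

cluster = HACluster(["n1", "n2", "n3"])
print([cluster.route() for _ in range(3)])  # traffic spread over all nodes
cluster.mark_failed("n2")
print([cluster.route() for _ in range(4)])  # n2 is skipped after the failure
```

Because requests simply move to the remaining nodes, a single failure reduces capacity but does not interrupt service, which is the purpose of the redundancy.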


Cluster computers 2008 - 2013
Date of #1 Rank | Name | Number of Cores/Nodes | Specifications | Peak Performance | Power Usage | Information
2010 Nov | Tianhe-1 | 186,368 cores; 7,168 computing nodes | 2 Xeon X5670 6-core CPUs per node; 1 Nvidia M2050 GPU per node; 262 Terabytes RAM; Arch interconnect (NUDT); Linux | 4.7 Petaflops | 4.0 Megawatts | Built by NUDT, China
2011 Nov | K Computer | 705,024 cores; 96 computing nodes per cabinet | 2.0GHz 8-core SPARC64 VIIIfx; 6 I/O nodes per cabinet; Message Passing Interface; Tofu 6-dimensional torus interconnect; OS - Linux variant | 11.28 Petaflops | 9.89 Megawatts | Built by Fujitsu, housed in Japan
2012 Jun | Sequoia | 1,572,864 cores; 98,304 computing nodes | 16-core PowerPC A2, Blue Gene/Q; 1.5 Petabytes RAM; 5-dimensional torus interconnect; OS - Linux variant | 20.13 Petaflops | 7.9 Megawatts | Built by IBM, housed in California, US
2012 Nov | Titan | 560,640 computing cores | AMD Opteron CPUs; Nvidia Tesla GPUs; 693 Terabytes RAM (CPU + GPU); Cray Gemini interconnect; OS - Cray Linux | 27.11 Petaflops | 8.2 Megawatts | Built by Cray, housed in Tennessee, US
2013 Jun | Tianhe-2 | 3,120,000 cores; 16,000 nodes | 2 Intel Xeon Ivy Bridge per node; 3 Intel Xeon Phi per node; 1.34 Petabytes RAM; TH Express-2 fat tree topology (NUDT); OS - NUDT Kylin Linux | 54.9 Petaflops | 17.6 Megawatts | Built by NUDT, China

Trends

In November 2011 the fastest supercomputer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance about 35% greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with an increase in performance of roughly one hundred percent.
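The percentage increases in this paragraph follow directly from the peak performance figures in the table. The short sketch below recomputes them; the list `peak_petaflops` and the helper `percent_increase` are illustrative names, and the values are the peak figures quoted in this section.

```python
# Peak performance in petaflops, in order of reaching the #1 rank (from the table above).
peak_petaflops = [
    ("K Computer", 11.28),
    ("Sequoia", 20.13),
    ("Titan", 27.11),
    ("Tianhe-2", 54.9),
]

def percent_increase(old, new):
    """Relative increase from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100

# Compare each system with the one it displaced.
for (prev_name, prev), (name, cur) in zip(peak_petaflops, peak_petaflops[1:]):
    print(f"{prev_name} -> {name}: +{percent_increase(prev, cur):.1f}%")
```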

Mobile Processors

Due to the popularity of smartphones, there has been significant development in mobile processors. This category of processors is specifically designed for low power use. To conserve power, these processors use dynamic frequency scaling, a technology that allows the processor to run at varying clock frequencies based on the current load.
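Dynamic frequency scaling can be sketched as a policy that picks the lowest clock frequency able to cover the current load. The example below is a simplified illustration, not the algorithm of any real processor or operating system: the frequency steps are hypothetical values chosen within an 800MHz-2000MHz range, and `load` is assumed to be the fraction of full-speed capacity currently in demand.

```python
# Hypothetical frequency steps in MHz; real processors expose their own sets.
FREQ_STEPS_MHZ = [800, 1200, 1600, 2000]

def pick_frequency(load, steps=FREQ_STEPS_MHZ):
    """Return the lowest frequency that covers `load` (0.0 to 1.0 of full speed)."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    demand_mhz = load * steps[-1]  # work expressed as MHz of the top frequency
    for freq in steps:
        if freq >= demand_mhz:
            return freq
    return steps[-1]

for load in (0.1, 0.5, 0.75, 1.0):
    print(f"load {load:.2f} -> {pick_frequency(load)} MHz")
```

Running at the lowest sufficient frequency is what saves power: a lightly loaded phone stays at the bottom step and only climbs when demand rises.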

Examples of current mobile processors
Aspects | Intel Atom N2800 | ARM Cortex-A9
# Cores | 2 | 2
Clock Freq. | 1.86GHz | 800MHz-2000MHz
Cache | 1MB L2 | 4MB L2
Power | 6.5W | 0.5W-1.9W

Sources

  1. http://en.wikipedia.org/wiki/Transistor_count
  2. http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29
  3. http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html
  4. http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive
  5. http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/
  6. http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review
  7. http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon
  8. http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html
  9. http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html
  10. http://ark.intel.com/products/61275
  11. http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series
  12. http://www.arm.com/products/processors/cortex-a/cortex-a9.php
  13. http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)
  14. http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx
  15. http://en.wikipedia.org/wiki/High-availability_cluster