CSC/ECE 506 Fall 2007/wiki1 1 11


Sections 1.1 and 1.1.2: Update performance trends in multiprocessors.


1.1 Why parallel architecture?

The number of processors adds a new dimension to the design space. The demand is for performance at acceptable cost. Factors impacting performance gain:

Microprocessor, minicomputer, mainframe, supercomputer – performance trend over time.

Single-chip microprocessors came to dominate in the 1990s.

Technological and architectural trends strive to meet application demand for increased performance.


1.1.2 Technology trends

Scientific & engineering computing, commercial computing

Processors: it is difficult to wait for a single processor to get fast enough. The critical issues in parallel computer architecture are fundamentally similar to those in sequential computers: how to allocate resources among functional units, caches (to exploit locality), and wires (to provide communication bandwidth).

1. Reduction in the basic VLSI feature size makes transistors, gates, and circuits faster and smaller, so more fit in the same area.
2. Useful die size is growing, so there is more area to use.
3. Clock rate improves roughly in proportion to the improvement in feature size (see 1 and 2). Use of many transistors at once (parallelism) is expected.
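The three trends above can be put into rough numbers. The following is a minimal sketch, not a process model: it assumes that clock rate scales linearly with the feature-size improvement (as the notes state) and that transistor count scales with the square of that improvement times any growth in die area (a geometric assumption not stated in the notes). The function name `scale_trends` and its parameters are hypothetical.

```python
def scale_trends(shrink: float, die_growth: float = 1.0) -> dict:
    """Estimate first-order gains from a process shrink.

    shrink     -- factor by which the feature size improves (2.0 = half the size)
    die_growth -- factor by which the usable die area grows (1.0 = unchanged)
    """
    if shrink <= 0 or die_growth <= 0:
        raise ValueError("scaling factors must be positive")
    # Assumption from the notes: clock rate improves in proportion to feature size.
    clock_gain = shrink
    # Assumption (geometry): devices per unit area grow as the square of the
    # shrink, and a larger die multiplies the total further.
    transistor_gain = shrink ** 2 * die_growth
    return {"clock_gain": clock_gain, "transistor_gain": transistor_gain}


if __name__ == "__main__":
    # Halving the feature size while growing the die by 50%.
    print(scale_trends(2.0, 1.5))  # {'clock_gain': 2.0, 'transistor_gain': 6.0}
```

Under these assumptions the transistor budget grows faster than the clock rate, which is why using many transistors at once is expected to matter more over time.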

The performance of microprocessors has been increasing at a much greater rate than clock frequency. Benchmarks for measuring workstation performance include SPEC and LINPACK. Processors are getting faster in large part by making more effective use of an even larger volume of computing resources.

Basic single-chip building block: 100 million transistors by the year 2000. This raises the possibility of placing more of the computer system on the chip, including memory and I/O support, and of placing multiple processors on a chip. This is already evident commercially in system-on-a-chip designs for embedded systems.

Dual-core processors: System designers are moving toward multi-core processor architectures rather than higher-frequency devices to enable higher system performance while minimizing increases in power consumption. Dual-core microprocessors, originally conceived for computationally intensive applications such as servers, are now being designed and deployed across a range of embedded applications. Many applications are better suited to thread-level parallelism (TLP) methods, and using multiple independent CPUs is one common way to increase a system's overall TLP. The combination of increased available space due to refined manufacturing processes and the demand for increased TLP is the logic behind the creation of multi-core CPUs.
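To illustrate what thread-level parallelism looks like from the software side, here is a minimal sketch that splits one job into independent pieces and hands each piece to its own worker thread. The function names `count_matches` and `parallel_count` are hypothetical, and the default of two workers simply mirrors a dual-core part. Note that in standard CPython builds the global interpreter lock keeps threads from running Python bytecode on several cores at once, so this shows the decomposition rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, target):
    """Work done by one thread: count occurrences of target in its chunk."""
    return sum(1 for item in chunk if item == target)


def parallel_count(data, target, workers=2):
    """Split data into one chunk per worker and combine the partial counts."""
    # Ceiling division so that every element lands in some chunk.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so the threads need no synchronization
        # until the partial results are summed.
        partial_counts = pool.map(count_matches, chunks, [target] * len(chunks))
        return sum(partial_counts)


if __name__ == "__main__":
    samples = [1, 3, 3, 7, 3, 9, 3, 2]
    print(parallel_count(samples, 3))  # 4
```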

Memory technology: there is a divergence between capacity and speed. Capacity has increased 1000 times, while cycle time has improved only by a factor of 2. The gap between processor cycle time and memory cycle time is growing wider, and the memory bandwidth demanded by the processor is growing rapidly. Latency (access time) is addressed with one or two levels of cache on chip plus an additional level of external cache. For multiprocessor design, the question is how to organize the collection of caches.
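The effect of a cache hierarchy on latency can be estimated with the standard average memory access time recurrence, in which each level contributes its hit time plus its miss rate times the cost of going one level further out. The sketch below is illustrative only: the function name `average_access_time` is hypothetical, and the cycle counts and miss rates in the usage example are made-up values, not measurements.

```python
def average_access_time(levels, memory_latency):
    """Average access time in cycles for a cache hierarchy.

    levels         -- list of (hit_time, miss_rate) pairs, closest cache first
    memory_latency -- cycles to reach main memory after the last cache misses
    """
    penalty = memory_latency
    # Work from the outermost cache inward: the cost of a miss at one level
    # is the average access time of the next level out.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


if __name__ == "__main__":
    # Hypothetical numbers: on-chip L1 and L2, then main memory.
    hierarchy = [(1, 0.05), (10, 0.20)]
    print(average_access_time(hierarchy, memory_latency=100))
    print(average_access_time([], memory_latency=100))  # no caches at all
```

With these made-up values the two cache levels bring the average access from 100 cycles down to a few cycles, which is the motivation for adding levels as the processor-memory gap widens.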

DDR2 - Like all SDRAM implementations, DDR2 stores data in memory cells that are activated with the use of a clock signal to synchronize their operation with an external data bus. Like DDR before it, DDR2 transfers data on both the rising and falling edges of the clock (a technique called double pumping). The key difference between DDR and DDR2 is that in DDR2 the bus is clocked at twice the speed of the memory cells, so four words of data can be transferred per memory cell cycle. Thus, without speeding up the memory cells themselves, DDR2 can effectively operate at twice the bus speed of DDR. http://en.wikipedia.org/wiki/DDR2
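The arithmetic behind the DDR versus DDR2 comparison can be sketched as follows. The function name `peak_transfer` is hypothetical, and the 64-bit bus width and 100 MHz memory cell clock are assumptions chosen for illustration; only the ratios (bus clocked at 1x or 2x the cell clock, two transfers per bus clock) come from the description above.

```python
def peak_transfer(cell_clock_mhz, bus_multiplier, bus_width_bits=64):
    """Peak figures for a double-pumped SDRAM interface.

    cell_clock_mhz -- clock of the memory cells in MHz
    bus_multiplier -- bus clock / cell clock (1 for DDR, 2 for DDR2)
    bus_width_bits -- width of the data bus (64 is an assumption)
    """
    bus_clock_mhz = cell_clock_mhz * bus_multiplier
    # Double pumping: one transfer on the rising edge, one on the falling edge.
    mega_transfers = bus_clock_mhz * 2
    words_per_cell_cycle = 2 * bus_multiplier
    bandwidth_mb_s = mega_transfers * bus_width_bits // 8
    return {
        "bus_clock_mhz": bus_clock_mhz,
        "mega_transfers_per_s": mega_transfers,
        "words_per_cell_cycle": words_per_cell_cycle,
        "bandwidth_mb_s": bandwidth_mb_s,
    }


if __name__ == "__main__":
    # Same 100 MHz memory cells in both cases.
    print("DDR :", peak_transfer(100, bus_multiplier=1))
    print("DDR2:", peak_transfer(100, bus_multiplier=2))
```

For the same cell clock, the DDR2 row shows four words per memory cell cycle and twice the peak bandwidth of the DDR row.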

On-chip memory controllers are reducing processor-to-memory latency by a factor of 3 to 4.

Disks: Parallel disk storage systems (RAID) are becoming the norm. RAID (Redundant Array of Independent Disks) combines physical hard disks into a single logical unit by using either special hardware or software. The main aims of using RAID are to improve reliability and speed. http://en.wikipedia.org/wiki/RAID

Large multilevel caches for files / disk blocks are predominant.

DMA – Direct Memory Access: A DMA transfer essentially copies a block of memory from one device to another. While the CPU initiates the transfer, it does not execute it. For so-called "third party" DMA, as is normally used with the ISA bus, the transfer is performed by a DMA controller, which is typically part of the motherboard chipset. More advanced bus designs such as PCI typically use bus mastering DMA, where the device takes control of the bus and performs the transfer itself. A typical use of DMA is copying a block of memory between system RAM and a buffer on the device. Such an operation does not stall the processor, which as a result can be scheduled to perform other tasks. http://en.wikipedia.org/wiki/Direct_memory_access