Introduction

This article features and answers the following queries:

Any changes in the organization of address spaces in the last 10 years?
Are the interconnection structures different in new computers now than they were 10 years ago?
What is the size and capacity of current SMPs?
How have supercomputers evolved since the Cray T3E?

Shared Address Space

In the parallel computing world, this is the range of Memory addresses accessed / shared by multiple processors. "Shared Memory Multi-Processors" is a class of parallel machines which use Shared Address Space for parallelisation.

Trends in Organisation of address spaces

Typical usage of the address space is by a single processor to store data and instructions. As the need for faster processing of the data and instructions grew, Processors became more and more powerful, along with better organisation of the available memory space. Soon, we had multiple processors sitting on the same chip along with the either dedicated memories or with shared memories. To utilise memory better, different memory accessing schemes evolved ; Real Memory and Virtual Memory. Real memory access is a sequential access of the memory, where memory addresses are accessed one after the other. Such a scheme allowed utilisation of only the available address space. Virtual Memory has also evolved over the years. In the initial years, Memory access was done at two-levels. One of RAM and the other of either the hard-disk or the tape-drive using a technique called overlaying. Overlaying is a technique of replacing an unwanted block in RAM with a block which is required for the current execution.

An improvement of the overlaying technique is what is today called Paging. An intermediate technique called Segmentation also existed, but Segmentation had major drawbacks, especially when the segments were too large to handle.

All current virtual memory principles are based on Paging. Paging is a technique of breaking down the data into 4K "Pages", and then loading them onto the RAM from the lower memory as and when required. A "Translation Lookaside Buffer" (TLB) takes care of all the virtual to physical memory mappings, and using this table, the required data is transferred into the Main memory for access.

Virtual memory, by way of combining the main memory with the lower memories like the hard-drive and the flash-drives, gives the user the feel of having virtually "infinite" memory addresses, though the primary memory might be limited to a particular Memory size.

Pentium can access about 4GB of physical Memory and about 64TB of virtual Memory Physical access has now become virtual access, where the user will feel that the memory to be accessed is infinite, since Virtual Memory mechanism includes the lower elements in the memory hierarchy for expanding memory addresses.

If an application is run on a 32-bit OS, the maximum number of Address locations that can be accessed for a process created is about 4GB. But the physical memory in a typical Desktop might be about 1GB of RAM. To better utilise the full spectrum of address space values accessible by the OS, Virtual Memory concept is put into use, where the remaining 3GB address space is "borrowed" from one or more of the lower memories like the hard-disk.

Interconnection structure

Interconnection structures have changed over the last many years. Bus-type interconnects have replaced the conventional cross-bar switch type interconnects, giving the best of performance and cost.

Current SMPs (Symmetric Multiprocessing)

A Computer system containing 2 or more processors in the same box, with shared memory, but containing just one OS running on them is termed as a Symmetric multiprocessor system. The downtime of such a system is dependent on the weakest link, ie. the single processors; if one processor is down, the whole system is said to be down.

SMPs rely strongly on a good and well-designed Operating System to take care of load-balancing between the multiprocessors, since the assignment of tasks is done solely by the OS. The better the OS, better is the load-balancing, better is the efficient operation of the SMP.

One big advantage of an SMP System is its scalability ; additional processors can be added as needed. Having said that, processors themselves are the biggest disadvantage as well in such a system. If one of the many processors shuts down for some reason, there is no way for the OS to get to know of this failure, and it continues to assign tasks to the "dead" processor, and slows down the execution.

Earlier Computer mother-boards had two separate places where two CPUs could sit, sharing a single main memory, forming an SMP System. But, most of today's laptops and desktops are based on a "multi-core" SMP System, where multiple processors reside in a single package.

The current speeds in SMP Systems have clock speeds exceeding 3.7 GHz, with matching bus speeds of about 1333MHz with an 8MB cache. There are instances of cache in the range of 24MB as well, but the clock speeds are kept low at such cache values.

Supercomputer evolution since Cray T3E

Cray T3E was the first real world Supercomputer to be able to sustain 1TFLOPS in a real world application. Designed and developed by Cray and launched in 1995. It is a "Massively Parallel Processing System" (MPPP).

The next models of Cray used the Direct Connected Processor (DCP) Architecture in which the processor is fused into the interconnect structure. This arrangement eliminated memory contention and also optimised message-passing applications by directly linking processors to each other through a high-performance interconnect fabric.

Majority of the supercomputers run on some flavour of Unix or Linux. It has been predominantly Linux since 2004.

There are special purpose Supercomputers available to solve specific problems ;Astrophysics computation and codebreaking ; Molecular Dynamics ; Deepblue for the game of chess and so on.

Measure of computational speed has gone up to TFLOPS (Tera Floating Point Operations Per Second) and has been moving towards PFLOPS (Peta Floating Point Operations Per Second)

Clustered supercomputing, where a cluster of MIMD Multiprocessors are connected togethere, with each cluster's processor being a SIMD.

Current supercomputers can go beyond 300 TFLOPS.While the Cray T3E was about 4 GFLOPS, which was actually the first one to break the 1GFLOPS barrier.

References

Computer User: http://www.computeruser.com/resources/dictionary/definition.html?lookup=7776
Foldoc : http://foldoc.org/index.cgi?symmetric+multiprocessing
Searchdatacenter : http://searchdatacenter.techtarget.com/sDefinition/0,,sid80_gci214218,00.html
Wikipedia / Supercomputers: http://en.wikipedia.org/wiki/Supercomputers
Intel: http://www.intel.com/cd/ids/developer/asmo-na/eng/95581.htm?page=2

CSC/ECE 506 Fall 2007/wiki1 7 2281

Contents