CSC/ECE 506 Spring 2012/1c cl: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
No edit summary
No edit summary
Line 118: Line 118:
The iWarp project was started in 1988 to investigate issues involved in building and using high performance computer systems with powerful communication support. The project lead to the construction of the iWarp machines, jointly developed by Carnegie Mellon University and Intel Corporation.  
The iWarp project was started in 1988 to investigate issues involved in building and using high performance computer systems with powerful communication support. The project lead to the construction of the iWarp machines, jointly developed by Carnegie Mellon University and Intel Corporation.  


[[File:Cl7.jpg]] [[File:Cl8.jpg]]
[[File:Cl8.jpg]]


The basic building block of the iWarp system is a full custom VSLI component integrating a LIW microprocessor, a network interface and a switching node into one single chip of 1.2cm x 1.2cm silicon.  
The basic building block of the iWarp system is a full custom VSLI component integrating a LIW microprocessor, a network interface and a switching node into one single chip of 1.2cm x 1.2cm silicon.  


Intel Corporation has announced the iWarp systems as product in 1989 and built iWarp systems with over 1500 nodes since then. The first iWarp prototype system was delivered to Carnegie Mellon in Summer 1990 and in Fall CMU received the first 64 cell systems. All three full speed production systems were delivered in 1991. With the creation of the Intel Supercomputing Systems Division in Summer 1992 the iWarp know how was merged with the iPSC product line.  
Intel Corporation has announced the iWarp systems as product in 1989 and built iWarp systems with over 1500 nodes since then. The first iWarp prototype system was delivered to Carnegie Mellon in Summer 1990 and in Fall CMU received the first 64 cell systems. All three full speed production systems were delivered in 1991. With the creation of the Intel Supercomputing Systems Division in Summer 1992 the iWarp know how was merged with the iPSC product line.  


[[File:Cl91.jpg]] [[File:Cl10.jpg]]
[[File:Cl91.jpg]] [[File:Cl10.jpg]]
[[File:Cl7.jpg]]


Intel kept iWarp as a product but stopped actively marketing it. As of today, the start of 1995 all three iWarp systems at CMU are in still in daily use. Surprisingly there are a few applications (e.g. in real time vision) for which iWarp is still the best machine, 3 years after it has been delivered. The high speed static memory and the high performance low latency communication system make iWarp a very well suited target for research efforts and many "proof of concept" applications.
Intel kept iWarp as a product but stopped actively marketing it. As of today, the start of 1995 all three iWarp systems at CMU are in still in daily use. Surprisingly there are a few applications (e.g. in real time vision) for which iWarp is still the best machine, 3 years after it has been delivered. The high speed static memory and the high performance low latency communication system make iWarp a very well suited target for research efforts and many "proof of concept" applications.

Revision as of 01:47, 31 January 2012

Introduction to MISD

In computing, MISD (multiple instruction, single data) is a type of parallel computing architecture in which multiple processing elements execute from different instruction streams, and data is passed from one processing element to the next. The requirement that data is passed from one processing element to the next means that it is restricted to a certain type of computations, but is hard to apply in general. There are some bottle neck of MISD architecture:

1, Low level of parallelism ; 2, High synchronizations; 3, High bandwidth required ; 4, CISC bottleneck ; 5, High complexity

Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.

Here is a simple MISD coding sample

Stream #1 
Load R0,%1 
Add  $1,R0 
Store R1,%1 
Stream #2 
Load R0,%1 
MUL %1,R0 
Store R1,%1

MISD 
ADD_MUL_SUB $1,$4,$7,%1 
SISD 
Load R0,%1 
ADD $1,R0 
MUL $4,R0 
STORE %1,R0

Multiprocessing and Classification

Multiprocessing

Multiprocessing is the use of two or more central processing units) CPUs (within a single computer system .The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them .There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined.

Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system as opposed to a single process at any one instant .However, the terms multitasking or multiprogramming are more appropriate to describe this concept, which is implemented mostly in software, whereas multiprocessing is more appropriate to describe the use of multiple hardware CPUs .A system can be both multiprocessing and multiprogramming, only one of the two, or neither of the two.

Types of Multiprocessing

Processor Symmetry

In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes .A combination of hardware and operating-system software design considerations determine the symmetry )or lack thereof (in a given system .For example, hardware or software considerations may require that only one CPU respond to all hardware interrupts, whereas all other work in the system may be distributed equally among CPUs; or execution of kernel-mode code may be restricted to only one processor )either a specific processor, or only one processor at a time(, whereas user-mode code may be executed in any combination of processors .Multiprocessing systems are often easier to design if such restrictions are imposed, but they tend to be less efficient than systems in which all CPUs are utilized.

Systems that treat all CPUs equally are called symmetric multiprocessing)SMP (systems .In systems where all CPUs are not equal, system resources may be divided in a number of ways, including asymmetric multiprocessing )ASMP(, non-uniform memory access)NUMA(multiprocessing, and clustered multiprocessing.

Processor coupling

Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus level .These CPUs may have access to a central shared memory )SMP or UMA(, or may participate in a memory hierarchy with both local and shared memory )NUMA.( Chip multiprocessors, also known as multi-core computing, involves more than one processor placed on a single chip and can be thought of the most extreme form of tightly-coupled multiprocessing .Mainframe systems with multiple processors are often tightly-coupled.

Loosely-coupled multiprocessor systems )often referred to as clusters (are based on multiple standalone single or dual processor commodity computers interconnected via a high speed communication system )Gigabit Ethernet is common .(A Linux Beowulf cluster is an example of a loosely-coupled system.

Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems, but have historically required greater initial investments and may depreciate rapidly; nodes in a loosely-coupled system are usually inexpensive commodity computers and can be recycled as independent machines upon retirement from the cluster.

Power consumption is also a consideration .Tightly-coupled systems tend to be much more energy efficient than clusters .This is because considerable economies can be realized by designing components to work together from the beginning in tightly-coupled systems, whereas loosely-coupled systems use components that were not necessarily intended specifically for use in such systems.


Flynn's taxonomy for Instruction and Data Streams

In multiprocessing, the processors can be used to execute a single sequence of instructions in multiple contexts )single-instruction, multiple-data or SIMD, often used in vector processing(, multiple sequences of instructions in a single context )multiple-instruction, single-data or MISD(, or multiple sequences of instructions in multiple contexts )multiple-instruction, multiple-data or MIMD). Flynn's taxonomy is a classification to describe instruction and data streams for parallel computing, proposed by Michael J. Flynn in 1966. The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) and data streams available in the architecture:

Single Instruction, Single Data stream (SISD) A sequential computer which exploits no parallelism in either the instruction or data streams. Single control unit (CU) fetches single Instruction Stream (IS) from memory. The CU then generates appropriate control signals to direct single processing element (PE) to operate on single Data Stream (DS) i.e. one operation at a time. Examples of SISD architecture are the traditional uniprocessor machines like a PC (currently manufactured PCs have multiple processors) or old mainframes.

Single Instruction, Multiple Data streams (SIMD) A computer which exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelized. For example, an array processor or GPU.

Multiple Instruction, Single Data stream (MISD) Multiple instructions operate on a single data stream. Uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.

Multiple Instruction, Multiple Data streams (MIMD) Multiple autonomous processors simultaneously executing different instructions on different data. Distributed systems are generally recognized to be MIMD architectures; either exploiting a single shared memory space or a distributed memory space.

MISD multiprocessing

In computing, MISD)Multiple Instruction, Single Data (is a type of parallel computing architecture where many functional units perform different operations on the same data .Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline.

MISD multiprocessing offers mainly the advantage of redundancy, since multiple processing units perform the same tasks on the same data, reducing the chances of incorrect results if one of the units fails .MISD architectures may involve comparisons between processing units to detect failures .Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.

Apart from the redundant and fail-safe character of this type of multiprocessing, it has few advantages, and it is very expensive .It does not improve performance. Not many instances of this architecture exist, as MIMD and SIMD are often more appropriate for common data parallel techniques .Specifically, they allow better scaling and use of computational resources than MISD does.



MISD Architecture Application

Few actual examples of this class of parallel computer have ever existed. One example of this machine is the systolic array, such as the CMU iWarp[Borkaret al., 1990]. Another prominent example of MISD in computing are the Space Shuttle flight control computers. The research on MISD concentrates on some conceivable usage: Multiple frequency filters operating on a single signal stream, Multiple cryptography algorithms, and Attempting to crack a single coded message.


Network application with MISD architecture

A number of important system services need strong scaling, where a fixed workload is executed more quickly with increased core counts. This is particularly true for data-intensive, network-oriented system services because network interface card (NIC) line rates continue to increase but the individual cores that service them are not getting any faster. Replicated work approach based on a multiple-instruction/single-data (MISD) execution model seems to be a promising solution to take advantage of unexploited parallelization opportunities in multi-core systems. This approach is particularly important in the network stack, as it is needed to enable strongly scalable network services.

A multiple instruction/single data (MISD) execution model (i.e. replicated work) is used to provide ne-grained data consistency between cores and to eliminate expensive inter-core interactions. In this approach, shown in Figure 1(a), state is replicated into domains on separate cores, and requests that modify shared state are broadcast to every domain using a ringbuffer -based channel abstraction.

The replica that dequeues a request becomes the primary replica for that request and is responsible for fully processing it, including any updates with globally visible side effects (e.g. data delivery, packet reconstruction, acknowledgment generation). Other replicas that process the request will, on the other hand, only partially process each request to maintain state consistency.

MISD approach is particularly appropriate for many network services because it parallelizes expensive per-byte processing (e.g. data copies, encryption/decryption, and error correction) and replicates per-header processing costs that are known to be small. It also retains the reduced locking and caching costs of systems like Barrelfish while adding the ability to perform fine-grained updates to logically shared state.

To examine the potential for this approach compared to lock-based and atomic instruction-based MIMD approaches, A simple synthetic test has been constructed. In this test, processing each request requires some amount of work that can be done by one core without synchronization or replication (i.e. parallelizable work), and some work that must be synchronized or replicated. Specially, the parallelizable work is a set of memory updates done on per-core memory, while the synchronized work is the set of memory up-dates that must be performed. Figure 1(b) shows the potential performance advantages of this approach using a 10:1 ratio of parallelizable to replicated work; we chose this ratio based on our studies of production TCP/IP stacks. As can be seen, the lock-based model scales remarkably poorly and the atomic instruction approach is only slightly better.


Iwarp project by CMU

The iWarp project was started in 1988 to investigate issues involved in building and using high performance computer systems with powerful communication support. The project lead to the construction of the iWarp machines, jointly developed by Carnegie Mellon University and Intel Corporation.


The basic building block of the iWarp system is a full custom VSLI component integrating a LIW microprocessor, a network interface and a switching node into one single chip of 1.2cm x 1.2cm silicon.

Intel Corporation has announced the iWarp systems as product in 1989 and built iWarp systems with over 1500 nodes since then. The first iWarp prototype system was delivered to Carnegie Mellon in Summer 1990 and in Fall CMU received the first 64 cell systems. All three full speed production systems were delivered in 1991. With the creation of the Intel Supercomputing Systems Division in Summer 1992 the iWarp know how was merged with the iPSC product line.

Intel kept iWarp as a product but stopped actively marketing it. As of today, the start of 1995 all three iWarp systems at CMU are in still in daily use. Surprisingly there are a few applications (e.g. in real time vision) for which iWarp is still the best machine, 3 years after it has been delivered. The high speed static memory and the high performance low latency communication system make iWarp a very well suited target for research efforts and many "proof of concept" applications.


References

http://en.wikipedia.org/wiki/Multiprocessing

http://www.itswtech.org/Lec/Manal%28system%20programming%29/simeners_A/Multiprocessing_System_Cimenar.pdf

http://en.wikipedia.org/wiki/Parallel_computing

http://en.wikipedia.org/wiki/Flynn%27s_taxonomy

http://www.cse.unsw.edu.au/~cs4211/04s1/seminars/rompo.ppt

http://www.cs.ucf.edu/courses/cot4810/fall04/presentations/Parallel_Computing.ppt

http://www.codeproject.com/Articles/146414/Basics-of-Single-Instruction-Multiple-Data-SIMD

https://computing.llnl.gov/tutorials/parallel_comp/

Exploiting MISD Performance Opportunities in Multi-core Systems. Patrick G. Bridges, Donour Sizemore, and Scott Levy. Department of Computer Science, University of New Mexico.

http://www.cs.cmu.edu/~iwarp/