CSC/ECE 506 Spring 2013/1c ad: Difference between revisions

Revision as of 07:38, 5 February 2013

Overview

This wiki article explores the Multiple Instruction Single Data architecture of multi processors as classified by Flynn’s Taxonomy. The article starts with a description of Flynn’s Taxonomy and its classification followed by the MISD architecture and its implementation. It also talks about the authors' and researchers' comments about the real-world examples of architecture and ends by providing examples of the architecture.

Parallel Architecture<ref>http://en.wikipedia.org/wiki/Parallel_computing</ref>

"Parallel operation has many different forms within a computer system. Using a model based on the different streams used in the computation process, we represent some of the different kinds of parallelism available. A stream is a sequence of objects such as data, or of actions such as instructions. Each stream is independent of all other streams, and each element of a stream can consist of one or more objects or actions. We thus have four combinations that describe most familiar parallel architectures:

(1) SISD: single instruction, single data stream. This is the traditional uniprocessor

(2)SIMD: single instruction, multiple data stream. This includes vector processors as well as massively parallel processors

(3) MISD: multiple instruction, single data stream. These are typically systolic arrays

(4) MIMD: multiple instruction, multiple data stream. This includes traditional multiprocessors as well as the newer networks of workstations

Each of these combinations characterizes a class of architectures and a corresponding type of parallelism." Flynn's Taxonomy

Flynn’s Taxonomy of Parallel Computers<ref>http://en.wikipedia.org/wiki/Flynn's_taxonomy</ref><ref>http://www.phy.ornl.gov/csep/ca/node11.html</ref>

Flynn defined the taxonomy of parallel computers [Flynn, 1972] based on the number of instruction streams and data streams.

• An Instruction stream is a sequence of instructions followed from a single program counter

• A Data stream is an address in memory which the instruction operates on.

A control unit fetches instructions from a single program counter, decodes them, and issues them to the processing element. The processing element is assumed to be a functional unit. Instruction and data are both supplied from the memory.

The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) and data streams available in the architecture are<ref>https://computing.llnl.gov/tutorials/parallel_comp/#Flynn</ref>

Multiple Instructions Single Data (MISD)

MISD (multiple instruction, single data) is an architecture in which multiple processing elements execute from different instruction streams, and data is passed from one processing element to the next. It is a type of parallel computing architecture where many functional units perform different operations on the same data.

What actually MISD architecture means

As explained in detail by Flynn <ref>http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1447203&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F5%2F31091%2F01447203 Flynn's paper</ref>: "It basically employs the high bandwidth dedicated execution unit. This unit is then shared by n virtual machines operating on program sequences independent of one another. Each virtual machine has access to the execution hardware once per cycle. Each virtual machine, of course, has its own private instruction memory and interaction between instruction streams occurs only via the common data memory."

"Presumably, if there are N instruction units then the bandwidth of the common data storage must be N times greater than the individual instruction storage. This requirement could be substantially reduced by use of a modified version of Tomasulo’s tag-forwarding algorithm and a separate common forwarding bus."

"Another version of this would force forwarding of operands. Thus the data stream presented to Execution Unit 2 is the resultant of Execution Unit 1 operating its instruction on the source data stream. The instruction that any unit performs may be fixed (specialized such that the interconnection (or setup) of units must be flexible), semifixed (such that the function of any unit is fixed for one pass of a data file) or variable (so that the streamof instructions operates at any point on thes ingle data stream). Under such an arrangement, only the first execution unit sees the source data stream and while it is processing the ith operand the ith execution unit is processing the ith derivation of the first operand of the source stream."

Implementing MISD<ref>http://public.callutheran.edu/~reinhart/CSC521MSCS/Week5/FlynnTaxonomies.pdf</ref>

"Although it is easy to both envision and design MISD processors, there has been little interest in this type of parallel architecture. The reason, so far anyway, is that no ready programming constructs easily map programs into the MISD organization.

Abstractly, the MISD is a pipeline of multiple independently executing functional units operating on a single stream of data, forwarding results from one functional unit to the next. On the microarchitecture level, this is exactly what the vector processor does. However, in the vector pipeline the operations are simply fragments of an assembly- level operation, as distinct from being a complete operation in themselves.

Surprisingly, some of the earliest attempts at computers in the 1940s could be seen as the MISD concept. They used plug boards for programs, where data in the form of a punched card was introduced into the first stage of a multistage processor. A sequential series of actions was taken in which the intermediate results were forwarded from stage to stage until at the final stage a result was punched into a new card."

Implementations of MISD architecture

Systolic Array

A systolic array is an arrangement of processors in an array where data flows synchronously across the array between neighbors, usually with different data flowing in different directions. Each Processor at each step takes in data from one or more neighbors, processes it and, in the next step, outputs results in the opposite direction.

The systolic array paradigm, data-stream-driven by data counters, is the counterpart of the von Neumann paradigm, instruction-stream-driven by a program counter. Because a systolic array usually sends and receives multiple data streams, and multiple data counters are needed to generate these data streams, it supports data parallelism. The name derives from analogy with the regular pumping of blood by the heart.<ref>http://en.wikipedia.org/wiki/Systolic_array</ref>

Type of Systolic Arrays<ref>http://home.engineering.iastate.edu/~zambreno/classes/cpre583/documents/JohHur93A.pdf General Purpose Systolic Arrays </ref>

Special-purpose systolic array

Figure 6: The algorithm for the sum of a scalar product, computed in systolic element [5]

Figure 7: The systolic product of two 3x3 matrices [5]

An array of hardwired systolic processing elements tailored for a specific application. Typically, many tens or hundreds of cells fit on a single chip. One of the major applications of special-purpose systolic array is in matrix operations. Figure 6 illustrates the algorithm for the sum of a scalar product, computed in a single systolic element. Here, a’s and b’s are synchronously shifted through the processing element to be available for next element. These data synchronously exits the processing element unmodified for the next element. The sum of the products is then shifted out of the accumulator.

This principle easily extends to a matrix product as shown in Figure 7 and the animation created with the help from the link. The only difference between single-element processing and array processing is that the latter delays each additional column and row by one cycle so that the columns and rows line up for a matrix multiply. The product matrix is shifted out after completion of processing.

General-purpose systolic array

It is an array of systolic processing elements, which gets adapted to a variety of applications via programming or reconfiguration. Array topologies can be either programmable or reconfigurable. Likewise, array cells are either programmable or reconfigurable. This is referred to as Systolic topologies.

A programmable systolic architecture is a collection of interconnected, general-purpose systolic cells, each of which is either programmable or reconfigurable. Programmable systolic cells are flexible processing elements specially designed to meet the computational and I/O requirements of systolic arrays. Programmable systolic architectures can be classified according to their cell inter-connection topologies: fixed or programmable.

Reconfigurable systolic architectures capitalize on FPGA technology, which allows the user to configure a low-level logic circuit for each cell. Reconfigurable arrays also have either fixed or reconfigurable cell interconnections. The user configures an array’s topology by means of a switch lattice. Any general-purpose array that is not conventionally programmable is usually considered reconfigurable. All FPGA re-configuring is static due to technology limitations.

Hybrid models make use of both VLSI and FPGA technology. They usually consist of VLSI circuits embedded in an FPGA-reconfigurable interconnection network.

Programmable Systolic Array

It is an array of programmable systolic elements that operates either in SIMD or MIMD fashion. Either the arrays interconnect or each processing unit is programmable and a program controls dataflow through the elements. Programmable systolic arrays are programmable either at a high level or a low level. At either level, programmable arrays can be categorized as either SIMD or MIMD machines.

SIMD (Single Instruction Multiple Data)

Figure 8: General organization of SIMD programmable linear systolic arrays 5

In SIMD systolic machines (Figure 8) the host workstation preloads a controller and a memory, which are external to the array, with the instructions and data for the application. The systolic cells store no programs or instructions. As soon as the workstation enables execution, the controller sequences through the external memory thereby delivering instructions and data to the systolic array. Within the array, instructions are broadcast and all cells perform the same operationon different data. Adjacent cells may share memory, but generally nomemory is shared by theentire array. After exiting the array, data is collected in the external buffer memory.

This architecture can also be classified based on the number of instruction and data streams as Single Instruction Single Data (SISD) architecture as all the PEs are fed from the same instruction stream and the single data stream passes through all the PEs.

MISD (Multiple Instruction Single Data)

Figure 9: General organization of MIMD programmable linear systolic arrays 5

The workstation downloads a program to each MISD (Figure 9) systolic cell. Each cell may be loaded with a different program, or all the cells in the array may be loaded with the same program. Each cell's architecture is somewhat similar to the conventional von Neumann architecture: It contains a control unit, an ALU, and local memory. MIMD systolic cells have more local memory than their SIMD counterparts to support the von Neumann-style organization.

This architecture is defined as Multiple Instruction Multiple Data (MIMD) architecture in 5. The architecture has multiple instruction streams for the PEs and a single data stream passing through all the PEs. Thus, it can also be defined as Multiple Instruction Single Data (MISD) architecture. The architecture of Systolic array configuration are controversial as explained in the section 4.1.2.

Reconfigurable Systolic Array

Figure 10: Block Diagram of the RSA Architecture 5

It is an array of systolic elements that can be programmed at the lowest level. Recent gate density advances in FPGA technology have produced a low-level, reconfigurable systolic array architecture that bridges the gap between special-purpose arrays and the more versatile, programmable general-purpose arrays. The FPGA architecture is unusual because a single hardware platform can be logically reconfigured as an exact duplicate of a special-purpose systolic array.

The RSA circuit design is based on systolic array architecture consisting of PEs interconnected via SWs as depicted in Figure 10. The homogeneous characteristic of the Reconfigurable Systolic Array (RSA) architecture, where each reconfigurable processing element (PE) cell is connected to its nearest neighbors via configurable switch (SW) elements, enables array expansion for parallel processing and facilitates time sharing computation of high-throughput data by individual PEs. Both the PEs and SWs can be reconfigured dynamically with the former as an arithmetic processor and the latter as a flexible router linking the neighboring PE cells. The RSA shifts reconfiguration and input signals into the PEs and SWs on separate data bus which enables the circuit to continue its operation while the reconfiguration is in process.

Architecture of systolic arrays as against MISD architecture

Figure 11.Comparison between Architecture of systolic arrays and MISD

As from the above mentioned configurations of the Systolic Arrays, it is seen that generally the configurations have multiple processing elements executing different instructions from dedicated instruction streams for each processing element. There is a single data stream that connects the adjacent PEs. Thus, systolic array can be defined as an MISD architecture.

Many authors say that as the data read as input by one processing element is processed data output of the adjacent PE. The data stream cannot be considered as single because all the data paths do not carry the same data to all the PEs. Figure 11 shows the difference between the Data Stream for Systolic Arrays and the MISD architecture. Thus the systolic array should be considered as “Multiple Data” architecture and not Single Data architecture.

Fault Tolerant Systems<ref>http://en.wikipedia.org/wiki/Fault-tolerant_computer_system#Types_of_fault_tolerance</ref>

The fault tolerant systems are designed to handle the possible failures in software, hardware or interfaces. The hardware faults include hard disk failures, input or output device failures, etc. and the software and interface faults include driver failures; operator errors, installing unexpected software etc. The hardware faults can be detected and identified by implementing redundant hardware and multiple backups. The software faults can be tolerable by removing the program errors by executing the software redundantly or by implementing small programs that take over the tasks that crash or generate errors.

History:<ref>http://en.wikipedia.org/wiki/Fault-tolerant_computer_system#History</ref>

Figure 12 MISD as fault tolerant architecture

The first known fault-tolerant computer was SAPO, built in 1951 in Czechoslovakia by Antonin Svoboda. Its basic design was magnetic drums connected via relays, with a voting method of memory error detection.

They separated into three distinct categories:

machines that would last a long time without any maintenance
computers that were very dependable but required constant monitoring
computers with a high amount of runtime which would be under heavy use

Voting was another initial method with multiple redundant backups operating constantly and checking each other's results and reporting the component with non-matching result as faulty. This is called M out of N majority voting.

Historically, motion has always been to move further from N-model and more to M out of N due to the fact that the complexity of systems and the difficulty of ensuring the transitive state from fault-negative to fault-positive did not disrupt operations.

In computer systems, the SIMD, MISD and MIMD architectures facilitate the implementation of the fault tolerance systems by multiple instruction streams or multiple data streams or both. Fault tolerance on computations can be implemented by multiple processors (likely with different architectures) executing the algorithms on the same set of data. The output of each processor is compared with that of the others and M out of N majority voting method is used to determine the faulty processor. Thus MISD architecture is utilized to get the fault tolerance on critical computations.

There are various examples of MISD architecture being used as fault tolerant architecture. The major examples being flight control systems, nuclear power plants, satellite systems, super collider experiment systems, etc. Here, the flight control system is explained as an example of MISD architecture.

The Flight Control System – MISD Example for fault tolerance

A Fly-By-Wire system is used to replace the manual flight control by an electronic control interface. The movements of the flight control in the cockpit are converted to electronic signals and are transmitted to the actuators by wires. The control computers use the feedback from the sensors to compute and control the movement of the actuators to provide the expected response. These computers also perform the task to stabilize the aircraft and perform other tasks without the knowledge of the pilot. Flight control systems must meet extremely high levels of accuracy and functional integrity.

There are redundant flight control computers present in the flight control system. If one of the flight-control computers crashes, gets damaged or is affected by electromagnetic pulses, the other computer can overrule the faulty one and hence the flight of the aircraft is unharmed.

Figure 13: Architecture of triple redundant 777 primary flight computer 6

The number of redundant flight control computers is generally more than two, so that any computer whose results disagree with the others is ruled out to be faulty and is either ignored or rebooted.

Figure 14: PFC with instruction and data streams

Multiple Processors Implementation in Boeing 777<ref>http://www.citemaster.net/getdoc/8767/R8.pdf Y.C. (Bob) Yeh, Boeing Commercial Airplane Group, "Triple-Triple Redundant 777 Primary Flight Computer" </ref>

In modern computers, the redundant flight control computations are carried out by multiprocessor systems. The triple redundant 777 primary flight computer, has the architecture as shown in Figure 13.

The system has three primary flight control computers, each of them having three lanes with different processors. The flight control program is compiled for each of the processors which get the input data from the same data bus but drive the output on their individual control bus. Thus each processor executes different instructions but they process the same data. Thus, it is the best suited example of Multiple Instruction Single Data (MISD) architecture.

The three processors selected for the flight control system of Boeing 777 were Intel 80486, Motorola 68040 and AMD 29050. The dissimilar processors lead to dissimilar interface hardware circuits and compilers. Each lane of the flight control computer is data synchronized with the other lanes so that all of the lanes read the same frame of data from the flight sensors. As the outputs of each lane can be different, the median value of the outputs is used to select the output of the lane to be considered. The lane which has the median value select hardware selected is said to be in “command mode” whereas the other lanes are said to be in “monitoring mode”. It receives the data from the other Primary Flight Computer (PFC) lanes and performs a median select of the outputs. This provides a fault blocking mechanism before the fault detection and identification by the cross-lane monitoring system. Thus, the MISD based multi computer architecture is capable of detecting generic errors in compilers or in complex hardware devices providing assurance beyond reasonable doubt of the dependability of the Fly-By-Wire system.

The above mentioned system clearly has individual Instruction Streams as the architecture of each processor is different, thus different instruction sets and different instruction streams. These processors have frame synchronized input data which means they have same set of data to work upon which is fed from a single data stream. Thus the flight control system can be classified under MISD architecture.

References

Glossary

CMU: Carnegie Mellon University

CU: Control Unit

DS: Data Stream

Fly-By-Wire: System that replaces the conventional manual flight controls of an aircraft with an electronic interface

FPGA: Field Programmable Gate Array

Heterogeneous Systems: A multiprocessor system with different kind of processors

Homogeneous System: A multiprocessor system with same kind of processors

ILP: Instruction Level Parallelism

IS: Instruction Stream

MIMD: Multiple Instruction Multiple Data

MISD: Multiple Instruction Single Data

PE: Processing Element

PFC: Primary Flight Computer

SAPO: Short for Samočinný počítač

SIMD: Single Instruction Multiple Data

SISD: Single Instruction Single Data

VLSI: Very Large Scale Integration

CSC/ECE 506 Spring 2013/1c ad: Difference between revisions

Revision as of 07:38, 5 February 2013

Contents

Overview

Parallel Architecture<ref>http://en.wikipedia.org/wiki/Parallel_computing</ref>

Flynn’s Taxonomy of Parallel Computers<ref>http://en.wikipedia.org/wiki/Flynn's_taxonomy</ref><ref>http://www.phy.ornl.gov/csep/ca/node11.html</ref>