CSC/ECE 506 Fall 2007/wiki1 1.3.3 1.3.4 chase2007

Introduction to Parallel Computer Architecture -> Fundamental Design Issues -> Communication and Replication (section 1.3.3)

Communication and Replication

Communication and replication are inherently related. To be clear on their relationship, it is important to distinguish several concepts. When a program performs a write, it binds a data value to an address; the data resides in some physical storage element in the machine. A data transfer occurs whenever data in one storage element is moved into another. This does not necessarily change the bindings of addresses and values. The same data may reside in multiple physical locations, as it does in the uniprocessor storage hierarchy, but the one nearest to the processor is the only one the processor can observe. If that copy is updated, the other hidden replicas, including the actual memory location, must eventually be updated as well. Copying data binds a new set of addresses to the same set of values; this will generally cause data transfers, but once the copy has been made, the two sets of bindings are completely independent. Replication, then, is the creation of a local copy of data to help enable parallelism, and it helps to avoid unnecessary communication.


Communication occurs when data written by one process is read by another. This may cause a data transfer within the machine, either on the write or on the read, or the data transfer may occur for other reasons. Communication may or may not involve establishing a new binding, depending on the particular communication abstraction.
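To make the distinction concrete, here is a minimal Python sketch (the names and the remote_fetch counter are invented for illustration, not taken from the text above) that contrasts repeated communication with replication: without a local copy, every read of a value held elsewhere causes a data transfer; with replication, a single transfer creates a new binding of the same value and later reads observe only the copy.

 # Hypothetical illustration of communication vs. replication.
 # remote_fetch() stands in for any data transfer between processors.
 remote_memory = {"x": 42}
 transfers = 0

 def remote_fetch(addr):
     global transfers
     transfers += 1              # every call models one data transfer
     return remote_memory[addr]

 # Without replication: every read of "x" causes communication.
 total = 0
 for _ in range(1000):
     total += remote_fetch("x")
 print("transfers without replication:", transfers)   # 1000

 # With replication: one transfer creates a local copy (a new binding
 # of the same value); later reads observe the copy, not the original.
 transfers = 0
 local_x = remote_fetch("x")
 total = sum(local_x for _ in range(1000))
 print("transfers with replication:", transfers)       # 1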

Introduction to Parallel Computer Architecture -> Fundamental Design Issues -> Performance (section 1.3.4)

Computer Performance:

Computer performance is a measure of the output of a computer with respect to the time and resources used.

Performance metrics:

The following are the important metrics used to measure a computer's performance; a small worked example follows the list:

1. Latency: the time taken to perform an operation.
2. Bandwidth: the rate at which operations are performed.
3. Cost: the impact these operations have on the execution time of the program.
4. Saturation point: the point at which a parallel system can no longer keep up with the load placed on it, and response times grow without bound. It is reached when the load completely utilizes the scarcest resource, often the CPU.
5. Scalability: a parallel system is often rated not by its speed alone but by how it performs under varying loads. A scalable system shows only a gradual increase in response times until it reaches saturation.
6. Load: the measure of the work per unit time demanded of the system. It depends on the number of actions taking place, the time between those actions, and the complexity of each action.
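Here is the worked example mentioned above, as a short Python sketch. The numbers are assumed, and response time is estimated with a simple open-queueing approximation R = S / (1 - U); it is only meant to show the qualitative behaviour near saturation, not to model a real machine.

 # Latency, bandwidth, and saturation with assumed numbers.
 # Response time uses the simple approximation R = S / (1 - U),
 # which is valid only below saturation (U < 1).
 service_time = 0.002              # seconds of CPU per operation (scarcest resource)
 bandwidth = 1.0 / service_time    # operations per second it can sustain (500 ops/s)

 for load in (100, 250, 400, 490, 499):        # offered operations per second
     utilization = load / bandwidth
     response = service_time / (1.0 - utilization)
     print(f"load={load:3d} ops/s  utilization={utilization:.3f}  response={response*1000:7.1f} ms")

 # As the offered load approaches the bandwidth of the scarcest resource,
 # utilization approaches 1 and response time grows without bound: the
 # saturation point. A scalable system keeps this growth gradual until then.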

In the context of parallel computers, communication and synchronization between the processors become very important considerations. Communication between processors occurs mostly in the form of data transfers, so the performance of a parallel computer is modeled using the following criteria:

1. Data Transfer Time:

It is the sum of the time taken to initiate a data transfer and the time required for the actual transfer, and it can be given as:

Transfer Time(n) = T + (n/B)
where
n = amount of data (in bytes)
B = transfer rate of the component moving the data (bytes per second)
T = start-up cost, a constant

This is a very convenient model, and it is used to describe a diverse collection of operations, including messages, memory accesses, bus transactions, and vector operations. For message passing, the start-up cost can be thought of as the time for the first bit to reach the destination. For memory operations, it is essentially the access time. For bus transactions, it reflects the bus arbitration and command phases. For any sort of pipelined operation, including pipelined instruction processing or vector operations, it is the time to fill the pipeline.
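As a quick sanity check of the model, the short Python sketch below plugs in assumed values for T and B (they are not from the text above) and reports, for several transfer sizes n, both the transfer time and the effective bandwidth n / Transfer Time(n). It shows the usual consequence of the linear model: effective bandwidth reaches half of the peak rate B when n = T*B and approaches B only for much larger transfers.

 # Linear transfer-time model: Transfer Time(n) = T + n / B
 # T and B are illustrative assumptions, not measured values.
 T = 10e-6     # start-up cost: 10 microseconds
 B = 100e6     # transfer rate: 100 MB/s

 def transfer_time(n):
     """Seconds to move n bytes under the linear cost model."""
     return T + n / B

 for n in (8, 1_000, 100_000, 10_000_000):     # bytes
     t = transfer_time(n)
     effective_bw = n / t                      # delivered bytes per second
     print(f"n={n:>10} B  time={t*1e6:10.2f} us  effective bandwidth={effective_bw/1e6:7.1f} MB/s")

 # With these numbers T*B = 1000 bytes, so a 1000-byte transfer achieves
 # only 50 MB/s (half of B); small transfers are dominated by the start-up cost.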


2. Overhead and Occupancy:

Data transfer operations are initiated by the processor through the communication assist. The essential components of such an operation can be described by the following simple model, which is very generic and can be used to explain data transfers in many places in modern, highly pipelined computer systems:

Communication Time = Overhead + Occupancy + Network Delay

The overhead is the time the processor spends initiating the transfer of data. This may be a fixed cost, if the processor simply has to tell the communication assist to start, or it may grow linearly with the transfer size, if the processor has to copy the data into the assist. The key point is that this is time during which the processor is busy with the communication event; it cannot do other useful work or initiate other communication during this time. The remaining portion of the communication time is considered network latency; it is the part that can be hidden by other processor operations.

The occupancy is the time it takes for the data to pass through the slowest component on the communication path. The data will also occupy other resources, including buffers, switches, and the communication assist; often the communication assist is the bottleneck that determines the occupancy. The occupancy limits how frequently communication operations can be initiated: the next data transfer must wait until the critical resource is no longer occupied before it can use that resource. If there is buffering between the processor and the bottleneck, the processor may be able to issue a burst of transfers at a frequency greater than 1/occupancy; however, once this buffer is full, the processor must slow to the rate set by the occupancy, and a new transfer can start only when an older one finishes.

The remaining communication time is lumped into the network delay, which includes the time for a bit to be routed across the actual network as well as many other factors, such as the time to get through the communication assists.
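Putting the three terms together, the following Python sketch (all values assumed for illustration) computes the communication time for a single transfer, separates out the part that keeps the processor busy, and shows how the occupancy of the bottleneck resource bounds the sustained rate at which transfers can be issued.

 # Communication Time = Overhead + Occupancy + Network Delay
 # The three components below are assumed values, not measurements.
 overhead = 2e-6        # processor busy initiating the transfer (s)
 occupancy = 5e-6       # time occupying the slowest component on the path (s)
 network_delay = 10e-6  # remaining routing and assist-traversal time (s)

 comm_time = overhead + occupancy + network_delay
 print(f"communication time per transfer: {comm_time*1e6:.1f} us")

 # Only the overhead ties up the processor; the rest can be hidden
 # behind other useful work.
 print(f"processor busy time:             {overhead*1e6:.1f} us")

 # The occupancy bounds how often transfers can be issued in steady state,
 # no matter how much buffering sits in front of the bottleneck.
 print(f"max sustained rate: {1.0/occupancy:,.0f} transfers/s")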

3. Communication Cost:

It is the time the processor spends communicating with other processors, and it can be given by the following:

Communication Cost = Frequency of Communication * (Communication Time - Overlap)


Frequency of Communication = Number of communications per unit of work

Communication Time = Overhead + Occupancy + Network Delay

Overlap = The portion of the communication operation that is performed concurrently with other useful work.

The frequency of communication depends on many programming factors and many hardware design factors. In particular, hardware may limit the transfer size and thereby determine the minimum number of messages, or it may automatically replicate data or migrate it to where it is used. However, a certain amount of communication is inherent to parallel execution, since data must be shared and processors must coordinate their work. In general, for a machine to support programs with a high communication frequency, the other parts of the communication cost equation must be small: low overhead, low network delay, and small occupancy.

The overlap is the portion of the communication operation that is performed concurrently with other useful work, including computation or other communication. The reduction of the effective cost is possible because much of the communication time involves work done by components of the system other than the processor, such as the communication assist, the bus, the network, or the remote processor or memory. Overlapping communication with other work is a form of small-scale parallelism, like the instruction-level parallelism exploited by fast microprocessors. In effect, some of the available parallelism is invested to hide the actual cost of communication.
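The pieces above can be combined into a back-of-the-envelope estimate. In the Python sketch below, all inputs (communications per unit of work, component times, overlap, computation time) are assumed values chosen only to illustrate how overlap reduces the effective cost of communication.

 # Communication Cost = Frequency * (Communication Time - Overlap)
 # All inputs are assumed, per unit of work (e.g. per grid point updated).
 frequency = 0.05        # communication events per unit of work
 comm_time = 17e-6       # overhead + occupancy + network delay (s)
 overlap = 12e-6         # portion hidden behind computation or other messages (s)
 compute_time = 1e-6     # useful computation per unit of work (s)

 comm_cost = frequency * (comm_time - overlap)       # seconds per unit of work
 total = compute_time + comm_cost
 print(f"communication cost per unit of work: {comm_cost*1e6:.2f} us")
 print(f"fraction of time spent communicating: {comm_cost/total:.0%}")

 # With no overlap the same program would pay frequency * comm_time
 # = 0.85 us per unit of work, only slightly less than the 1 us of
 # computation it is meant to support.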


References:

Parallel Computer Architecture: A Hardware/Software Approach by David E. Culler, Jaswinder Pal Singh, and Anoop Gupta

http://en.wikipedia.org/wiki/Computer_performance