CSC/ECE 506 Spring 2010/ch 2 aj/Data Parallel Programming

From Expertiza_Wiki
Revision as of 02:25, 4 February 2010 by Zzhang14 (talk | contribs) (→‎Example)
Jump to navigation Jump to search

Data-Parallel Programming Model

Introduction

Data-parallel processing is when multiple processing elements perform an action separately on different sets of data and exchange information globally before processing more code synchronously. Even though this model is offered in contrast to the shared memory model and the message passing model, data-parallel processing can be accomplished by actually passing messages or sharing memory at a deeper level. The main distinction between the data-parallel model and the other two models has to do with the outcome of the individual steps instead of the method of communication.

Importance

Data-parallel processing has been found to be effective in situations where computations involve every element of a matrix in a uniform way. This allows the processing to be divided spatially over memories. Originally, the need for this model was discovered in scientific calculations and is still used today.

Example

This section shows a simple example illustrating the data-parallel programming model. Each of the codes below are written in pseudo-code style.

Suppose we want to perform the following task on an array a: updating each element of a by the product of itself and its index, and adding together the elements of a into the variable sum. The corresponding code is shown below.

// a simple code
sum = 0;
for (i = 0; i < a.length; i++)
{
   a[i] = a[i] * i;
   sum = sum + a[i];
}

When we implement the task using data-parallel programming model, the program can be divided into two parts. The first part performs the same operations on separate elements of the array for each processing element, and the second part reorganizes data among all processing elements (In our example data reorganization is summing up values across different processing elements). Since data-parallel programming model only defines the overall effects of parallel steps, the second part can be accomplished either through shared memory or message passing. The three code fragments below are examples for the first part of the program, shared-memory version of the second part, and message passing for the second part, respectively.

// data parallel programming: let each PE perform the same task on different pieces of distributed data
pe_id = getid();
my_sum = 0;
for (i = pe_id; i < a.length; i += number_of_pe)         //separate elements of the array are assigned to each PE 
{
   a[i] = a[i] * i;
   my_sum = my_sum + a[i];                               //all PEs accumulate elements that are assigned to them in parallel
}
// data reorganization via shared memory
shared sum;
lock();                                                  //prevent race condition
sum = sum + my_sum;                                      //each PE adds up their local my_sum to shared variable sum 
unlock();
barrier;
// data reorganization via message passing
if (pe_id != 0) send_msg (0, my_sum);                    //PE other than PE0 send their local my_sum
else                                                     //PE0 does this
{
   for (i = 1; i < number_of_pe; i++)                    //for each other PE
   {
      recv_msg (i, temp);                                //receive local sum from other PEs
      my_sum = my_sum + temp;                            //accumulate into total
   }
   sum = my_sum;
}
Illustration of data parallel programming

As illustrated above, each processing element performs actions on the array's separate elements which are identified using the processing element's id. In our example, elements assigned to each processing element are interleaved instead of continuous. For example, if the system contains only two process elements, then one of them would operate on all the odd entries and the other the even entries of the array.

Recent changes

As mentioned previously, the data-parallel programming model in itself is a contrast to the shared memory programming model and the message passing programming model. Now, with the evolution of SIMD to single-program-multiple data (SPMD) machines, multiple processing elements run identical copies of the same program. In addition, SPMD now indeed uses a form of shared memory and message-passing to communicate between processing elements. As a result, even though each of the three parallel programming models have distinct features, it appears the trends are for a convergence that integrates multiple models into a single system.

References

1) David E. Culler, Jaswinder Pal Singh, and Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Gulf Professional Publishing, August 1998.

2) Yan Solihin, Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems, Solihin Books, August 2009.