CSC/ECE 506 Spring 2010/ch 2 maf: Difference between revisions
Line 9: | Line 9: | ||
// Message passing code, adapted from Solohin (2008), p. 27. | // Message passing code, adapted from Solohin (2008), p. 27. | ||
id = getmyid(); // | id = getmyid(); // Assume id = 0 for thread 0, id = 1 for thread 1 | ||
local_iter = 4; | local_iter = 4; | ||
start_iter = id * local_iter; | start_iter = id * local_iter; |
Revision as of 04:11, 29 January 2010
Supplement to Chapter 2: The Data Parallel Programming Model
Chapter 2 of Solihin (2008) covers the shared memory and message passing parallel programming models. However, it does not address the data parallel model, another commonly recognized parallel programming model covered in other treatments like Foster (1995) and Culler (1999). The goal of this supplement is to provide a treatment of the data parallel model which complements Chapter 2 of Solihin (2008).
Overview
Whereas the shared memory and message passing models focus on how parallel tasks access common data, the data parallel model focuses on how to divide up work into parallel tasks. Data parallel algorithms exploit parallelism by dividing a problem into a number of identical tasks which execute on different subsets of common data. An example of a data parallel code can be seen in Code 2.5 from Solohin (2008) which is reproduced with some annotations below.
// Message passing code, adapted from Solohin (2008), p. 27. id = getmyid(); // Assume id = 0 for thread 0, id = 1 for thread 1 local_iter = 4; start_iter = id * local_iter; end_iter = start_iter + local_iter; if (id == 0) send_msg(P1, b[4..7], c[4..7]); else recv_msg(P0, b[4..7], c[4..7]); // Begin data parallel section for (i = start_iter; i < end_iter; i++) a[i] = b[i] + c[i]; local_sum = 0; for (i = start_iter; i < end_iter; i++) if (a[i] > 0) local_sum = local_sum + a[i]; // End data parallel section if (id == 0) { recv_msg(P1, &local_sum1); sum = local_sum + local_sum1; Print sum; } else send_msg(P0, local_sum);
In this code the three 8 element arrays are divided into two 4 element chunks which are processed concurrently by two threads.
The logical opposite of data parallel is task parallel, in which a number of distinct tasks operate on common data. Since each parallel task is unique, a major limitation of task parallel algorithms is that the maximum degree of parallelism attainable is limited to the number of tasks that have been formulated. This is in contrast to data parallel algorithms, for which the number of parallel tasks may be increased by partitioning the data into a greater number of smaller subsets. Haveraaen (2000) also notes that task parallel algorithms are inherently more complex, requiring more elaborate methods of communication and synchronization. Haveraaen (2000) claims that data parallel algorithms on the other hand appear much like sequential algorithms and are therefore easier to write and to read.
int id = getmyid(); // assume id = 0 for thread 0, id = 1 for thread 1 if (id == 0) { for (i = 0; i < 8; i++) { a[i] = b[i] + c[i]; send_msg(P1, a[i]); } } else { sum = 0; for (i = 0; i < 8; i++) { recv_msg(P0, a[i]); if (a[i] > 0) sum = sum + a[i]; } Print sum; }
The shared memory and message passing models do not, in general, impose constraints on what kind of work is to be performed by each task, and the data parallel model does not necessarily require a particular methodology for controlling access to common data.
In fact, the data parallel model can be used in conjunction with either the shared memory or the message passing model, as explored by Klaiber (1994).
Aspects | Shared Memory | Message Passing | Data Parallel |
---|---|---|---|
Communication | implicit (via loads/stores) | explicit messages | implicit |
Synchronization | explicit | implicit (via messages) | typically implicit |
Hardware support | typically required | none | |
Development effort | lower | higher | higher |
Tuning effort | higher | lower |
A Code Example
// Simple sequential code from Solihin 2008, page 25. for (i = 0; i < 8; i++) a[i] = b[i] + c[i]; sum = 0; for (i = 0; i < 8; i++) if (a[i] > 0) sum = sum + a[i]; Print sum;
// Data parallel implementation in C++ with OpenMP. #include <omp.h> #include <iostream> int main(void) { double a[8], b[8], c[8], localSum[2]; long s = 4; int id, i; #pragma omp parallel for private(id, i) reduction(+:s) for (i = 0; i < 8; i++) { a[i] = b[i] + c[i]; } for (i = 0; i < 2; i++) localSum[i] = 0; #pragma omp parallel for private(id, i) reduction(+:s) for (i = 0; i < 8; i++) { id = omp_get_thread_num(); if (a[i] > 0) localSum[id] = localSum[id] + a[i]; } double sum = localSum[0] + localSum[1]; std::cout << sum << std::endl; }
Hardware Examples
Definitions
- Data parallel
- Task parallel
References
- David E. Culler, Jaswinder Pal Singh, and Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan-Kauffman, 1999.
- Ian Foster, Designing and Building Parallel Programs, Addison-Wesley, 1995.
- Magne Haveraaen, "Machine and collection abstractions for user-implemented data-parallel programming," Scientific Programming, 8(4):231-246, 2000.
- W. Daniel Hillis and Guy L. Steele, Jr., "Data parallel algorithms," Communications of the ACM, 29(12):1170-1183, December 1986.
- Alexander C. Klaiber and Henry M. Levy, "A comparison of message passing and shared memory architectures for data parallel programs," in Proceedings of the 21st Annual International Symposium on Computer Architecture, April 1994, pp. 94-105.
- Yan Solihin, Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems, Solihin Books, 2008.