Supplement to Chapter 2: The Data Parallel Programming Model
Overview
Comparing the Data Parallel Model with the Shared Memory and Message Passing Models
Comparison between shared memory, message passing, and data parallel programming models (adapted from Solihin 2008, page 22).
| Aspects | Shared Memory | Message Passing | Data Parallel |
| Communication | implicit (via loads/stores) | explicit messages | implicit |
| Synchronization | explicit | implicit (via messages) | implicit for SIMD; explicit for SPMD |
| Hardware support | typically required | none | |
| Development effort | lower | higher | higher |
| Tuning effort | higher | lower | |
A Code Example
// Simple sequential code from Solihin 2008, page 25.
for (i = 0; i < 8; i++)
    a[i] = b[i] + c[i];
sum = 0;
for (i = 0; i < 8; i++)
    if (a[i] > 0)
        sum = sum + a[i];
printf("%f\n", sum);
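// Data parallel implementation in C for CUDA.
#include <iostream>
#include <cuda_runtime.h>

__global__ void kernel(
    double* a,
    const double* b,
    const double* c,
    double* local_sum)
{
    // Each thread handles a contiguous chunk of local_iter elements.
    int id = threadIdx.x;
    int local_iter = 4;
    int start_iter = id * local_iter;
    int end_iter = start_iter + local_iter;
    for (int i = start_iter; i < end_iter; i++)
        a[i] = b[i] + c[i];
    // Accumulate only the positive elements, as in the sequential code.
    local_sum[id] = 0;
    for (int i = start_iter; i < end_iter; i++)
        if (a[i] > 0)
            local_sum[id] = local_sum[id] + a[i];
}

int main()
{
    // Managed memory is accessible from both the host and the device.
    double *a, *b, *c, *local_sum;
    cudaMallocManaged(&a, 8 * sizeof(double));
    cudaMallocManaged(&b, 8 * sizeof(double));
    cudaMallocManaged(&c, 8 * sizeof(double));
    cudaMallocManaged(&local_sum, 2 * sizeof(double));
    // Sample input values, chosen here for illustration.
    for (int i = 0; i < 8; i++) {
        b[i] = i;
        c[i] = 1.0;
    }
    kernel<<<1, 2>>>(a, b, c, local_sum);
    cudaDeviceSynchronize();  // wait for the kernel before reading results
    double sum = local_sum[0] + local_sum[1];
    std::cout << sum << std::endl;
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    cudaFree(local_sum);
    return 0;
}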
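The index arithmetic in the kernel can be checked without a GPU. The following is a minimal host-only sketch in plain C, not part of the original example: it runs the per-thread body once for each id in turn and compares the combined partial sums with the sequential loop. The function names `sequential_sum`, `worker`, and `emulated_parallel_sum` are hypothetical, and the sketch assumes the array length equals `num_workers * local_iter`. The `worker` body includes the `a[i] > 0` test from the sequential code.

```c
#include <stddef.h>

/* Sequential reference: a[i] = b[i] + c[i], then sum the positive a[i]. */
double sequential_sum(double *a, const double *b, const double *c, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)
            sum += a[i];
    return sum;
}

/* Work done by one worker. This mirrors the kernel's chunking, with
   threadIdx.x replaced by an explicit id argument (an assumption made
   so the body can run on the host). */
void worker(int id, int local_iter, double *a, const double *b,
            const double *c, double *local_sum)
{
    int start_iter = id * local_iter;
    int end_iter = start_iter + local_iter;
    for (int i = start_iter; i < end_iter; i++)
        a[i] = b[i] + c[i];
    local_sum[id] = 0.0;
    for (int i = start_iter; i < end_iter; i++)
        if (a[i] > 0)
            local_sum[id] += a[i];
}

/* Runs every worker one after another, then combines the partial sums.
   Assumes the arrays hold num_workers * local_iter elements and that
   local_sum holds num_workers elements. */
double emulated_parallel_sum(double *a, const double *b, const double *c,
                             int num_workers, int local_iter,
                             double *local_sum)
{
    double sum = 0.0;
    for (int id = 0; id < num_workers; id++)
        worker(id, local_iter, a, b, c, local_sum);
    for (int id = 0; id < num_workers; id++)
        sum += local_sum[id];
    return sum;
}
```

Calling `emulated_parallel_sum(a, b, c, 2, 4, local_sum)` corresponds to the launch `kernel<<<1, 2>>>` with 4 elements per thread. Because the workers write disjoint chunks of `a` and distinct entries of `local_sum`, the order in which they run should not change the result, which is what makes the decomposition safe to execute in parallel.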
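! DATA PARALLEL IMPLEMENTATION IN FORTRAN
PROGRAM DATA_PARALLEL
  IMPLICIT NONE
  INTEGER, PARAMETER :: LOCAL_ITER = 4
  REAL :: A(8), B(8), C(8), LOCAL_SUM(2), TOTAL
  INTEGER :: ID, I
  ! SAMPLE INPUT VALUES, CHOSEN HERE FOR ILLUSTRATION.
  B = (/ (REAL(I - 1), I = 1, 8) /)
  C = 1.0
  ! EACH ID OWNS ELEMENTS (ID-1)*LOCAL_ITER+1 THROUGH ID*LOCAL_ITER.
  FORALL (ID = 1:2, I = 1:LOCAL_ITER)
    A((ID-1)*LOCAL_ITER + I) = B((ID-1)*LOCAL_ITER + I) + C((ID-1)*LOCAL_ITER + I)
  END FORALL
  ! FORALL BODIES MAY NOT CONTAIN DO LOOPS, SO EACH PARTIAL SUM USES THE
  ! SUM INTRINSIC WITH A MASK THAT KEEPS ONLY THE POSITIVE ELEMENTS.
  FORALL (ID = 1:2)
    LOCAL_SUM(ID) = SUM(A((ID-1)*LOCAL_ITER+1 : ID*LOCAL_ITER), &
                        MASK = A((ID-1)*LOCAL_ITER+1 : ID*LOCAL_ITER) > 0)
  END FORALL
  TOTAL = LOCAL_SUM(1) + LOCAL_SUM(2)
  WRITE(*,*) TOTAL
END PROGRAM DATA_PARALLEL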
Hardware Examples