CSC/ECE 506 Spring 2011/ch4a bm
Overview
Many algorithms can be parallelized effectively, and some can even be parallelized under several different parallel programming models. Gaussian elimination is one such algorithm: it can be implemented in the Data Parallel, Shared Memory, and Message Passing models. This article discusses implementations of Gaussian elimination in all three models, using High Performance FORTRAN (HPF), OpenMP, and MPI.
Gaussian Elimination
FORTRAN Background
Parallel Implementations
Data Parallel
Message Passing
The following section of code implements Gaussian elimination via message passing, using MPI. It was taken from a paper by S.F. McGinn and R.E. Shaw of the University of New Brunswick, Canada.[1]
 1: root = 0
 2: chunk = n**2/p
 3: ! main loop
 4: do pivot = 1, n-1
 5:   ! root maintains communication
 6:   if (my_rank.eq.0) then
 7:     ! adjust the chunk size
 8:     if (MOD(pivot, p).eq.0) then
 9:       chunk = chunk - n
10:     endif
11:
12:     ! calculate chunk vectors
13:     rem = MOD((n**2-(n*pivot)),chunk)
14:     tmp = 0
15:     do i = 1, p
16:       tmp = tmp + chunk
17:       if (tmp.le.(n**2-(n*pivot))) then
18:         a_chnk_vec(i) = chunk
19:         b_chnk_vec(i) = chunk / n
20:       else
21:         a_chnk_vec(i) = rem
22:         b_chnk_vec(i) = rem / n
23:         rem = 0
24:       endif
25:     continue
26:
27:     ! calculate displacement vectors
28:     a_disp_vec(1) = (pivot*n)
29:     b_disp_vec(1) = pivot
30:     do i = 2, p
31:       a_disp_vec(i) = a_disp_vec(i-1) + a_chnk_vec(i-1)
32:       b_disp_vec(i) = b_disp_vec(i-1) + b_chnk_vec(i-1)
33:     continue
34:
35:     ! fetch the pivot equation
36:     do i = 1, n
37:       pivot_eqn(i) = a(n-(i-1),pivot)
38:     continue
39:
40:     pivot_b = b(pivot)
41:   endif ! my_rank.eq.0
42:
43:   ! distribute the pivot equation
44:   call MPI_BCAST(pivot_eqn, n,
45:                  MPI_DOUBLE_PRECISION,
46:                  root, MPI_COMM_WORLD, ierr)
47:
48:   call MPI_BCAST(pivot_b, 1,
49:                  MPI_DOUBLE_PRECISION,
50:                  root, MPI_COMM_WORLD, ierr)
51:
52:   ! distribute the chunk vector
53:   call MPI_SCATTER(a_chnk_vec, 1, MPI_INTEGER,
54:                    chunk, 1, MPI_INTEGER,
55:                    root, MPI_COMM_WORLD, ierr)
56:
57:   ! distribute the data
58:   call MPI_SCATTERV(a, a_chnk_vec, a_disp_vec,
59:                     MPI_DOUBLE_PRECISION,
60:                     local_a, chunk,
61:                     MPI_DOUBLE_PRECISION,
62:                     root, MPI_COMM_WORLD, ierr)
63:
64:   call MPI_SCATTERV(b, b_chnk_vec, b_disp_vec,
65:                     MPI_DOUBLE_PRECISION,
66:                     local_b, chunk/n,
67:                     MPI_DOUBLE_PRECISION,
68:                     root, MPI_COMM_WORLD, ierr)
69:
70:   ! forward elimination
71:   do j = 1, (chunk/n)
72:     xmult = local_a((n-(pivot-1)),j) / pivot_eqn(pivot)
73:     do i = (n-pivot), 1, -1
74:       local_a(i,j) = local_a(i,j) - (xmult * pivot_eqn(n-(i-1)))
75:     continue
76:
77:     local_b(j) = local_b(j) - (xmult * pivot_b)
78:   continue
79:
80:   ! restore the data to root
81:   call MPI_GATHERV(local_a, chunk,
82:                    MPI_DOUBLE_PRECISION,
83:                    a, a_chnk_vec, a_disp_vec,
84:                    MPI_DOUBLE_PRECISION,
85:                    root, MPI_COMM_WORLD, ierr)
86:
87:   call MPI_GATHERV(local_b, chunk/n,
88:                    MPI_DOUBLE_PRECISION,
89:                    b, b_chnk_vec, b_disp_vec,
90:                    MPI_DOUBLE_PRECISION,
91:                    root, MPI_COMM_WORLD, ierr)
92: continue ! end of main loop
93:
94: ! backwards substitution done in parallel (not shown)
This code omits some of the variable declarations, but most of the variables are self-explanatory. The code also attempts some load balancing via the chunk variable. chunk is also used to determine how much data to send: as the elimination proceeds, fewer rows remain active, so the amount of data needed in each step gets progressively smaller. Shrinking chunk accordingly therefore decreases the amount of time spent in communication, yielding better runtimes. The other variable of note is root, which refers to the root processor, the processor that coordinates the rest of the processors.
The code effectively begins its parallel section at line 4. In lines 5-41, the root processor sets the chunk size and prepares the data to be passed to the other processors. In lines 43-68, the root processor sends the necessary data to the other processors. The functions MPI_BCAST, MPI_SCATTER, and MPI_SCATTERV serve as either a "send" or a "receive", depending on which processor is executing them: on the root they act as a send, while on all other processors they act as a receive[3]. In lines 70-78, each processor performs the forward elimination on its chunk of data. Finally, the data from each processor is sent back to the root processor using the MPI_GATHERV function, which likewise acts as either a "send" or a "receive", except that the root processor is now the receiver and the other processors are the senders. All of this code is executed once for each pivot point in the matrix. Backwards substitution is then done sequentially on the root processor.
The key elements of Message Passing in this code example are the communication via the MPI_ functions and the root processor performing, on its own, the set-up of the data to be passed.
Definitions
References
1. S.F. McGinn and R.E. Shaw, University of New Brunswick, Parallel Gaussian Elimination Using OpenMP and MPI
2. Ian Foster, Argonne National Laboratory, Case Study: Gaussian Elimination
3. MPI: A Message-Passing Interface Standard