CSC/ECE 506 Spring 2011/ch4a bm

== Message Passing ==
The following section of code implements Gaussian Elimination via message passing, using MPI. It is taken from a paper by S. F. McGinn and R. E. Shaw of the University of New Brunswick, Canada.<sup><span id="1body">[[#1foot|[1]]]</span></sup> <br>


   1: root = 0
   2: chunk = n**2/p
   3: ! main loop
   4: do pivot = 1, n-1
   5:     ! root maintains communication
   6:     if (my_rank.eq.0) then
   7:        ! adjust the chunk size
   8:        if (MOD(pivot, p).eq.0) then
   9:           chunk = chunk - n
  10:        endif
  11: 
  12:        ! calculate chunk vectors
  13:        rem = MOD((n**2-(n*pivot)),chunk)
  14:        tmp = 0
  15:        do i = 1, p
  16:           tmp = tmp + chunk
  17:           if (tmp.le.(n**2-(n*pivot))) then
  18:              a_chnk_vec(i) = chunk
  19:              b_chnk_vec(i) = chunk / n
  20:           else
  21:              a_chnk_vec(i) = rem
  22:              b_chnk_vec(i) = rem / n
  23:              rem = 0
  24:           endif
  25:        continue
  26: 
  27:        ! calculate displacement vectors
  28:        a_disp_vec(1) = (pivot*n)
  29:        b_disp_vec(1) = pivot
  30:        do i = 2, p
  31:           a_disp_vec(i) = a_disp_vec(i-1) + a_chnk_vec(i-1)
  32:           b_disp_vec(i) = b_disp_vec(i-1) + b_chnk_vec(i-1)
  33:        continue
  34: 
  35:        ! fetch the pivot equation
  36:        do i = 1, n
  37:           pivot_eqn(i) = a(n-(i-1),pivot)
  38:        continue
  39: 
  40:        pivot_b = b(pivot)
  41:     endif ! my_rank.eq.0
  42: 
  43:     ! distribute the pivot equation
  44:     call MPI_BCAST(pivot_eqn, n,
  45:                    MPI_DOUBLE_PRECISION,
  46:                    root, MPI_COMM_WORLD, ierr)
  47: 
  48:     call MPI_BCAST(pivot_b, 1,
  49:                    MPI_DOUBLE_PRECISION,
  50:                    root, MPI_COMM_WORLD, ierr)
  51: 
  52:     ! distribute the chunk vector
  53:     call MPI_SCATTER(a_chnk_vec, 1, MPI_INTEGER,
  54:                      chunk, 1, MPI_INTEGER,
  55:                      root, MPI_COMM_WORLD, ierr)
  56: 
  57:     ! distribute the data
  58:     call MPI_SCATTERV(a, a_chnk_vec, a_disp_vec,
  59:                       MPI_DOUBLE_PRECISION,
  60:                       local_a, chunk,
  61:                       MPI_DOUBLE_PRECISION,
  62:                       root, MPI_COMM_WORLD, ierr)
  63: 
  64:     call MPI_SCATTERV(b, b_chnk_vec, b_disp_vec,
  65:                       MPI_DOUBLE_PRECISION,
  66:                       local_b, chunk/n,
  67:                       MPI_DOUBLE_PRECISION,
  68:                       root, MPI_COMM_WORLD, ierr)
  69: 
  70:     ! forward elimination
  71:     do j = 1, (chunk/n)
  72:        xmult = local_a((n-(pivot-1)),j) / pivot_eqn(pivot)
  73:        do i = (n-pivot), 1, -1
  74:           local_a(i,j) = local_a(i,j) - (xmult * pivot_eqn(n-(i-1)))
  75:        continue
  76: 
  77:        local_b(j) = local_b(j) - (xmult * pivot_b)
  78:     continue
  79: 
  80:     ! restore the data to root
  81:     call MPI_GATHERV(local_a, chunk,
  82:                      MPI_DOUBLE_PRECISION,
  83:                      a, a_chnk_vec, a_disp_vec,
  84:                      MPI_DOUBLE_PRECISION,
  85:                      root, MPI_COMM_WORLD, ierr)
  86: 
  87:     call MPI_GATHERV(local_b, chunk/n,
  88:                      MPI_DOUBLE_PRECISION,
  89:                      b, b_chnk_vec, b_disp_vec,
  90:                      MPI_DOUBLE_PRECISION,
  91:                      root, MPI_COMM_WORLD, ierr)
  92:  continue ! end of main loop
  93: 
  94:  ! backwards substitution done in parallel (not shown)
<br>
This code omits the declarations for most of its variables, but the names are largely self-explanatory (a possible sketch of the missing declarations is given below).  The code also attempts some load balancing via the <code>chunk</code> variable, which determines how much data each processor receives.  Because the amount of data still needing elimination shrinks with each pivot, <code>chunk</code> is reduced as the computation proceeds, which decreases the time spent in communication and yields better runtimes.  The other variable of note is <code>root</code>, which identifies the root processor, the processor that coordinates the rest of the processors.
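
A minimal sketch of the omitted declarations and MPI start-up might look like the following.  This is not taken from the paper: the <code>nmax</code> bound, the static array sizes, and the program name are assumptions made purely for illustration.

 program gauss_mpi
    use mpi                                        ! MPI constants and interfaces
    implicit none
 
    integer, parameter :: nmax = 1024              ! assumed maximum problem size
    integer :: n, p, my_rank, root, ierr
    integer :: pivot, chunk, rem, tmp, i, j
    double precision :: a(nmax,nmax), b(nmax)      ! coefficient matrix and right-hand side
    double precision :: local_a(nmax,nmax), local_b(nmax)
    double precision :: pivot_eqn(nmax), pivot_b, xmult
    integer :: a_chnk_vec(nmax), b_chnk_vec(nmax)  ! per-processor element counts
    integer :: a_disp_vec(nmax), b_disp_vec(nmax)  ! per-processor displacements
 
    call MPI_INIT(ierr)                                ! start the MPI runtime
    call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)  ! this processor's rank
    call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)        ! total number of processors
 
    ! ... read or generate n, a, and b on the root here ...
    ! ... main loop from the listing above ...
 
    call MPI_FINALIZE(ierr)
 end program gauss_mpi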
 
The code effectively begins its parallel section at line 4, where every processor enters the main loop over pivots.  In lines 5-41 the root processor adjusts the chunk size and sets up the count and displacement vectors and the pivot equation to be passed to the other processors.  In lines 43-68 the root processor distributes this data.  The functions <code>MPI_BCAST</code>, <code>MPI_SCATTER</code>, and <code>MPI_SCATTERV</code> serve as either a "send" or a "receive" depending on which processor executes them: on the root they act as a send, while on all other processors they act as a receive<sup><span id="3body">[[#3foot|[3]]]</span></sup>.  In lines 70-78 each processor performs forward elimination on its own chunk of the data.  Finally, in lines 80-91 the data is returned to the root processor with <code>MPI_GATHERV</code>, which likewise acts as either a "send" or a "receive", except that the root processor is now the receiver and the other processors are the senders.  All of this is executed once for each pivot in the matrix.  Backward substitution is then done sequentially on the root processor.
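
The send-or-receive behaviour of the collective calls can be seen in isolation with a small toy program, sketched below.  It is not part of the Gaussian elimination code; the value <code>42.0</code> and the program name are purely illustrative.

 program bcast_demo
    use mpi
    implicit none
    integer :: my_rank, p, ierr
    double precision :: pivot_b
 
    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
 
    ! Only the root gives pivot_b a meaningful value ...
    if (my_rank .eq. 0) then
       pivot_b = 42.0d0
    else
       pivot_b = 0.0d0
    endif
 
    ! ... yet every processor makes the identical MPI_BCAST call.
    ! On the root the call behaves as a send; on every other
    ! processor it behaves as a receive.
    call MPI_BCAST(pivot_b, 1, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
 
    ! Every processor now prints the root's value.
    print *, 'rank', my_rank, 'has pivot_b =', pivot_b
 
    call MPI_FINALIZE(ierr)
 end program bcast_demo

Run with, for example, <code>mpirun -np 4 ./bcast_demo</code>, every rank reports 42.0 even though only rank 0 assigned it.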
 
The key elements of message passing in this example are the explicit communication through the <code>MPI_</code> routines and the fact that the root processor performs the data partitioning and set-up on its own before each exchange.
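
To make the role of the chunk and displacement vectors concrete, the sketch below scatters an array in unequal pieces and gathers it back, mirroring in simplified form what the listing does with <code>a_chnk_vec</code> and <code>a_disp_vec</code>.  It is an illustrative example, not code from the paper; the array length of 10 and the limit of 64 processors are arbitrary assumptions.

 program scatterv_demo
    use mpi
    implicit none
    integer, parameter :: n = 10
    integer :: my_rank, p, ierr, i, my_count
    integer :: counts(64), displs(64)        ! assumes at most 64 processors
    double precision :: a(n), local_a(n)
 
    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
 
    ! The root prepares the full array; every rank computes the
    ! count and displacement vectors (analogous to a_chnk_vec and
    ! a_disp_vec) so it knows the size of its own piece.
    if (my_rank .eq. 0) then
       do i = 1, n
          a(i) = dble(i)
       end do
    endif
    do i = 1, p
       counts(i) = n / p
       if (i .le. MOD(n, p)) counts(i) = counts(i) + 1   ! spread the remainder
    end do
    displs(1) = 0
    do i = 2, p
       displs(i) = displs(i-1) + counts(i-1)
    end do
    my_count = counts(my_rank + 1)
 
    ! Unequal pieces go out to every rank ...
    call MPI_SCATTERV(a, counts, displs, MPI_DOUBLE_PRECISION, &
                      local_a, my_count, MPI_DOUBLE_PRECISION, &
                      0, MPI_COMM_WORLD, ierr)
 
    ! ... each rank works on its own piece ...
    do i = 1, my_count
       local_a(i) = 2.0d0 * local_a(i)
    end do
 
    ! ... and the pieces are reassembled on the root.
    call MPI_GATHERV(local_a, my_count, MPI_DOUBLE_PRECISION, &
                     a, counts, displs, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, ierr)
 
    if (my_rank .eq. 0) print *, a
 
    call MPI_FINALIZE(ierr)
 end program scatterv_demo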


= Definitions =

= References =

<span id="1foot">1.</span> S. F. McGinn and R. E. Shaw, University of New Brunswick, ''Parallel Gaussian Elimination Using OpenMP and MPI''
<span id="2foot">2.</span> Ian Foster, Argonne National Laboratory, ''Case Study: Gaussian Elimination''
<span id="3foot">3.</span> ''MPI: A Message-Passing Interface Standard''