CSC/ECE 506 Spring 2011/ch4a ob

Serial and Parallel Implementations of the Nelder-Mead Algorithm

The Nelder-Mead algorithm [CITE] is a simplex method for function minimization developed by J. A. Nelder and R. Mead in 1965. The method minimizes a function of n variables by comparing the function values at the n+1 vertices of a general simplex and then replacing the vertex with the highest function value by another point. The algorithm, by itself, is highly serial in nature.

In this article, we discuss the serial Nelder-Mead algorithm and its parallelization at the algorithm level and at the code/task level.


Non-Parallel Algorithm (Serial Simplex Method):

For an n-variable function, we start with a simplex of n+1 arbitrary points in an n-dimensional space. Let these points be X0 … Xn and the corresponding function values be Y0 … Yn. Let the maximum function value among these n+1 points be Yh at point Xh, and the minimum be Yl at point Xl. Let Xcen be the centroid of all points except the worst point Xh.

Single Step of the Nelder-Mead Algorithm:

At each step of the algorithm we perform some of the following operations: ordering, centroid calculation, reflection, expansion, contraction, and reduction.

Ordering:

We sort the simplex in ascending order by function value, i.e., X0 has the lowest function value and Xn the highest.

Centroid:

In this step we calculate the centroid of all but the worst point (Xh) in the current simplex.
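
With the points sorted as above, the centroid of all points except Xh (consistent with the serial code below) is

  Xcen = (1/n) * (X0 + X1 + ... + X(n-1))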


Reflection:
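
With reflection coefficient alpha (alpha > 0), the worst point is reflected through the centroid (again consistent with the serial code below):

  Xr = (1 + alpha)*Xcen - alpha*Xh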


After calculating the reflection point, we evaluate the function value Yr at this point. If it is an improvement over the minimum function value, we continue in the same direction and calculate the expansion point.

Expansion:
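
With expansion coefficient gamma (gamma > 1), we move further along the reflection direction:

  Xe = gamma*Xr + (1 - gamma)*Xcen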


If the expansion point is an improvement over the reflection point, then Xe replaces the worst point Xh and a new simplex is formed. If the expansion point is not an improvement, then Xr replaces Xh and the new simplex is calculated.

Contraction:

If Yr is not an improvement over the minimum function value, then we go in the other direction and calculate the contraction point.
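
With contraction coefficient beta (0 < beta < 1), the contraction point lies between the worst point and the centroid:

  Xc = beta*Xh + (1 - beta)*Xcen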


If Yc is an improvement over the minimum function value, then Xc replaces Xh to form the new simplex.

Reduction:

If the contraction point is not an improvement either, then we shrink (reduce) the entire simplex toward the best point X0.
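
With shrink coefficient sigma (0 < sigma < 1), every point except the best is pulled toward X0:

  Xi = X0 + sigma*(Xi - X0),   for i = 1 ... n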


Widely used Coefficient Values:
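
The values suggested in Nelder and Mead's original paper, and still the common defaults, are alpha = 1 (reflection), beta = 0.5 (contraction), gamma = 2 (expansion), and sigma = 0.5 (shrink).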



This process is repeated until we get diminishing returns, i.e., there is no significant improvement in the minimum function value found in the simplex, or until the condition for the maximum allowed value of a minimum (Y_max_allowed in the code below) is satisfied.

Serial Code:

int N;                   // number of variables (dimensions)
float X[N+1][N];         // N+1 points, each with N coordinates
float Y[N+1];            // function values for the N+1 points (Y is a function of X)
float Y_max_allowed;     // termination condition
float alpha, beta, gamma, sigma;  // reflection, contraction, expansion, shrink coefficients
float Ycen, Xcen[N];     // centroid
float Yr, Xr[N];         // reflection point
float Yc, Xc[N];         // contraction point
float Ye, Xe[N];         // expansion point

main()
{
  Nelder_Mead();
}

Nelder_Mead()
{
  done = 0;
  Sort_Simplex();
  while (!done)
  {
    Calculate_Centroid();
    Calculate_new_point();
    Sort_Simplex();
    Check_done();
  }
}

Sort_Simplex()
{
  // Comparison sort of the N+1 points,
  // ascending order by function value
}

Calculate_Centroid()
{
  for j = 0 thru N-1 do       // for all dimensions
    Xcen[j] = 0;
    for i = 0 thru N-1 do     // centroid of all but the worst point X[N]
      Xcen[j] += X[i][j];
    end for
    Xcen[j] /= N;
  end for
}

Calculate_new_point()
{
  // reflection
  for j = 0 thru N-1 do
    Xr[j] = (1 + alpha)*Xcen[j] - alpha*X[N][j];
  // expansion
  for j = 0 thru N-1 do
    Xe[j] = gamma*Xr[j] + (1 - gamma)*Xcen[j];
  // contraction
  for j = 0 thru N-1 do
    Xc[j] = beta*X[N][j] + (1 - beta)*Xcen[j];
  // accept one of the points based on their function values.
  // If none of them is an improvement, shrink the simplex toward the best point X[0]
  for i = 1 thru N do
    for j = 0 thru N-1 do
      X[i][j] = X[0][j] + sigma*(X[i][j] - X[0][j]);
    end for
  end for
}

Check_done()
{
  // done = 1 when the best function value satisfies Y_max_allowed
  // or when the improvement between iterations becomes insignificant
}

Parallelization at Algorithm Level (Parallel Simplex Method):

Each processor is assigned the parameters corresponding to one point in the simplex. The processors then conduct simplex search steps (as in the serial version) to find a better point. On completion, the processors communicate the new points to form a new simplex. Let us assume that the degree of parallelization is P; bear in mind that P may or may not be equal to the number of available processors. For simplicity, let us assume that there are P processors. With one processor, this algorithm is equivalent to the serial algorithm.

We first create a simplex of N+1 points, assign each processor one of the P worst points, and distribute the rest of the (N+1-P) points across all the processors. Each processor then goes through an entire step of the Nelder-Mead algorithm. The P processors return P improved points, which replace the P worst points in the original simplex. The new simplex is again sorted and divided in the same way. Lee & Wiswall [CITE] provide adequate proof that the parallel algorithm converges as well as the serial algorithm does and that the performance gain saturates beyond P = N/2. Note that this does not take task-level parallelism into account.
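
A sketch of one iteration of this scheme, in the same pseudocode style as the serial code (Serial_NM_Step is an illustrative name for the per-processor serial step, not a routine from Lee & Wiswall):

Parallel_Nelder_Mead()
{
  done = 0;
  Sort_Simplex();                  // ascending order by function value
  while (!done)
  {
    for_all p = 0 thru P-1 do      // one task per processor
      // Processor p runs a full serial step (centroid, reflection,
      // expansion, contraction) to improve the p-th worst point.
      X[N-p] = Serial_NM_Step(p);
    end for_all
    Sort_Simplex();                // merge the P new points and re-sort
    Check_done();
  }
}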


Parallelization at Task Level:

Data Parallel Implementation:

It can clearly be seen from the serial code that some statements, such as the initial function evaluation for every point and the calculation of each coordinate of the centroid and of the other points, exhibit DOALL data parallelism, i.e., they are independent of one another and can all be performed at once. The corresponding code, obtained by replacing the for loops with for_all loops, is as follows (main(), Nelder_Mead(), Sort_Simplex(), and Check_done() are unchanged from the serial version):

Calculate_Centroid()
{
  for_all j = 0 thru N-1 do           // for all dimensions
    for_all i = 0 thru N-1 do         // centroid of all but the worst point X[N]
      myXcen[j] += X[i][j];           // private variable for each accumulate
    end for_all
    REDUCE(myXcen[j], Xcen[j], ADD);  // reduction operator for the centroid
    Xcen[j] /= N;
  end for_all
}

Calculate_new_point()
{
  // reflection
  for_all j = 0 thru N-1 do
    Xr[j] = (1 + alpha)*Xcen[j] - alpha*X[N][j];
  // expansion
  for_all j = 0 thru N-1 do
    Xe[j] = gamma*Xr[j] + (1 - gamma)*Xcen[j];
  // contraction
  for_all j = 0 thru N-1 do
    Xc[j] = beta*X[N][j] + (1 - beta)*Xcen[j];
  // accept one of the points based on their function values.
  // If none is an improvement, shrink the simplex toward X[0]
  for_all i = 1 thru N do
    for_all j = 0 thru N-1 do
      X[i][j] = X[0][j] + sigma*(X[i][j] - X[0][j]);
    end for_all
  end for_all
}

Shared Memory Implementation:

In the shared memory implementation of the algorithm, we use lock() and unlock() primitives to make sure that multiple threads do not write to the same memory location at the same time; only one thread can be in the critical section of the program at a given time. The N+1 points are assigned manually, roughly (N+1)/P per processor, accesses to shared variables are protected by locks, and a barrier is needed after each step, including one before the termination check, to make sure all the points have been calculated before the simplex is re-sorted.
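
A minimal sketch of what the centroid step might look like under this scheme, in the same pseudocode style (pid, my_points, cen_lock, LOCK/UNLOCK, and BARRIER are generic, illustrative names rather than a specific library's API):

Calculate_Centroid()                // executed by every one of the P threads
{
  if (pid == 0)
    for j = 0 thru N-1 do
      Xcen[j] = 0;
    end for
  BARRIER;                          // Xcen cleared before accumulation
  for j = 0 thru N-1 do
    myXcen[j] = 0;
    for each i in my_points do      // this thread's share of X[0]..X[N-1]
      myXcen[j] += X[i][j];
    end for
    LOCK(cen_lock);                 // critical section: one writer at a time
    Xcen[j] += myXcen[j];
    UNLOCK(cen_lock);
  end for
  BARRIER;                          // all partial sums in before dividing
  if (pid == 0)
    for j = 0 thru N-1 do
      Xcen[j] /= N;
    end for
  BARRIER;                          // centroid ready before the next step
}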

Message Passing Implementation:

In the message passing implementation there is no shared address space. The N+1 points are again assigned manually, roughly (N+1)/P per processor, and the values that were shared variables in the shared memory version (the partial centroid sums, the newly calculated points, and the sorted simplex) are instead exchanged between processors with explicit messages after each step.
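
A minimal sketch of the centroid step in this style, with processor 0 acting as an assumed coordinator (SEND, RECEIVE, pid, and my_points are generic, illustrative names, not a specific library's API):

Calculate_Centroid()                 // executed by every one of the P processors
{
  for j = 0 thru N-1 do
    myXcen[j] = 0;
    for each i in my_points do       // this processor's share of X[0]..X[N-1]
      myXcen[j] += X[i][j];
    end for
  end for
  if (pid != 0)
  {
    SEND(myXcen, to 0);              // ship partial sums to the coordinator
    RECEIVE(Xcen, from 0);           // wait for the finished centroid
  }
  else
  {
    Xcen = myXcen;
    for p = 1 thru P-1 do
      RECEIVE(otherXcen, from p);    // collect partial sums
      for j = 0 thru N-1 do
        Xcen[j] += otherXcen[j];
      end for
    end for
    for j = 0 thru N-1 do
      Xcen[j] /= N;
    end for
    for p = 1 thru P-1 do
      SEND(Xcen, to p);              // broadcast the result
    end for
  }
}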


References: