CSC/ECE 506 Spring 2010/ch 3 jb/Parallel Programming Model Support

From Expertiza_Wiki
Revision as of 21:01, 23 February 2010 by Wjfisher (talk | contribs) (→‎References)
Jump to navigation Jump to search

Supplement to Chapter 3: Support for parallel-programming models. Discuss how DOACROSS, DOPIPE, DOALL, etc. are implemented in packages such as Posix threads, Intel Thread Building Blocks, OpenMP 2.0 and 3.0.


Posix threads

POSIX thread, also referred to as a pthread is used in shared address space architectures for parallel programs. Through the use of the pthread API, various functions can be used to create and manage pthreads. In order to fully, understand how pthreads can be used to exploit DOACROSS, DOPIPE, and DOALL parallelism a brief introduction to creating/terminating pthreads, mutexes, and conditional variables are necessary.

Creating and Terminating Pthreads

In order to create a pthread, the API provides the pthread_create() function. The pthread function accepts 4 arguments: thread, attr, start_routine, and arg. The thread argument is used to provide a unique identifier for the thread you are creating. The attr argument is used to specify a threads attribute object, or use default attributes by passing NULL. For the examples discussed the default attributes will be sufficient; for more information on setting thread attributes please see references. The start_routine argument is the program subroutine that will be executed by the thread being created. The arg argument is used to pass an argument to the subroutine that is being executed by the thread being created (value can be set to NULL if no argument is being pass to the subroutine).

In order to terminate pthreads, the API provides the pthread_exit() function. In order to terminate a pthread, the thread being run simply has to call this function (even the main thread). NOTE: there are alternate methods for terminating pthreads not discussed here for convenience and simplicity.

Mutexes

A mutex variable is a variable that must only be accessed by a single thread (mutex is short for mutual exclusion). The API provides the pthread_mutex_t data type in order statically create a mutex variable, and the pthread_mutex_init() function to create it dynamically.

Mutex variables are used to implement locks, so that multiple pthreads do not access critical data in a program. The API provides the pthread_mutex_lock() and pthread_mutex_unlock() functions. These functions simply lock or unlock the mutex variable specified.

Conditional Variables

Conditional variables allow for point-to-point synchronization between threads. The API provides a few useful functions in order to synchronize threads: pthread_cond_wait() and pthread_cond_signal(). The pthread_cond_wait() blocks all threads until the specified condition is satisfied. The pthread_cond_signal() wakes up another thread that is waiting on the condition to be satisfied. These functions are the pthread specific functions that are analogous to the general wait() and post() function discussed in the Sohlihin text.

DOACROSS Parallelism

In order to exploit DOACROSS parallelism using pthreads, conditions are needed in order to synchronize the threads. Since instructions are executed across iterations, and data dependencies exist across iterations are assumed (See Sohlin Text), the conditional variables shown above are used to ensure the correct execution of the code.

Lets take a simple example where each thread calculates A[i] = A[i-1] + B[i]. Point-to-point synchronization is necessary in order to make sure A[i-1] is not read before its value is written. This is where the pthread’s conditional variable comes is useful. We put pthread_cond_wait() and pthread_cond_signal() around the instruction above, in order to make sure the previous thread has signaled that its completed before the current thread performs its own computation.

DOPIPE Parallelism

In order to exploit DOPIPE parallelism, conditions are also needed in order to synchronize threads. Instead of instructions being implemented across threads, instructions are implemented on a single thread (i.e. instruction 1 is executed by thread 1, instruction 2 is executed by thread 2, etc). However, there are loop independent data dependencies which require the conditional variables.

Here we use the pthreads differently from the DOACROSS parallelism. The DOPIPE parallelism has each thread call a different function. Each function may have some loop independent dependence with some other function. Lets say function 2 depends on function 1, so function 1 will call pthread_cond_signal() once it is finished, and function 2 will call pthread_cond_wait().

The differences between DOPIPE and DOACROSS are that DOPIPE executes different functions on each thread, and it has the signal and wait functions called from different functions. Whereas DOACROSS executes the same function on each thread (it just uses different data), and it has the signal and wait functions called from the same function.

DOALL Parallelism

Since DOALL parallelism just means that all iterations are executed in parallel, and no dependences exits. All that is necessary is that the threads have to be created.

Intel Threading Building Blocks

According the to Intel® Software site, Intel® Threading Building Blocks (Intel® TBB) is a C++ template library that abstracts thread to tasks to create reliable, portable, and scalable parallel applications.

Parallel Loops

A DOALL parallel construct can be specified using the parallel_for() construct. This construct takes two parameters. The first is the range of indices of the loop that can be run in parallel. The second is the a solid chunk of operations that can be processed as a units that are safe to run concurrently. For a DOALL loop, this second parameter should include all possible loop indices. Also, an optional third parameter can be specified to define the chunk size of the loop and information about cache affinity.

OpenMP 2.0

OpenMP 3.0

Some features such as tasks, synchronization primitives among a few others were included in the upgrades from OpenMP 2.0 but nothing significant was upgraded that fits the scope of this article.

References

  • Yan Solihin, Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems, Solihin Books, August 2009.