CSC/ECE 506 Spring 2013/3a bs: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
 
(295 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Overview: ==
Current page address: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/3a_bs
 
Page started with: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2011/ch3_ab
 
== Overview ==


The main goal of this wiki is to explain which architectural mechanisms are used by library functions for DOALL, DOACROSS, and DOPIPE parallelism, reduction, and functional parallelism in various architectures.
The main goal of this wiki is to explain which architectural mechanisms are used by library functions for DOALL, DOACROSS, and DOPIPE parallelism, reduction, and functional parallelism in various architectures.


== Synchronization: ==
== Synchronization ==
 
When using any [http://en.wikipedia.org/wiki/Parallel_programming_model parallel programming model], [http://en.wikipedia.org/wiki/Synchronization_(computer_science) synchronization]<ref>Rahman, M.M.; , "Process synchronization in multiprocessor and multi-core processor," Informatics, Electronics & Vision (ICIEV), 2012 International Conference on , vol., no., pp.554-559, 18-19 May 2012</ref> is needed to guarantee accuracy of the overall program. The following are a few example situations where synchronization<ref>http://people.csail.mit.edu/rinard/paper/cpande99.pdf</ref> will be needed.


== Libraries: ==
 
•  The code following the parallelized loop requires that all of the parallel [http://en.wikipedia.org/wiki/Process_(computer_science)  processes] be completed before advancing. It cannot be triggered simply by one of the processes completing.
 
•  A portion of code in the middle of a parallelized section MUST be executed in a very particular order so that [http://en.wikipedia.org/wiki/Global_variable global variables] used across processes get read and written in the proper order. This is known as the [http://en.wikipedia.org/wiki/Critical_section critical section].
 
•  Multiple processes must update a global variable in such a way that one process does not overwrite the updates of a different process. (i.e. SUM = SUM + <process update>).
 
 
These are just a few examples. Every architecture implements synchronization in a unique way using different types of mechanisms. The subsequent section will highlight the following that are used to achieve synchronization:
 
1. Semaphore.h
 
2. Pthread.h
 
== Libraries<ref>http://linux.die.net/man/3/</ref> ==


    
    
=== Semaphore.h ===
=== Semaphore.h ===
-----------------------
   
   
A semaphore is special variable that acts similar to a lock. If the semaphore can be acquired then the process can proceed into the critical section. If the semaphore cannon be acquired, then the process is “put to sleep” and the processor is then used for another process. This means the processes cache is saved off in a place where it can be retrieved when the process is “woken up”. Once the semaphore is available the “sleeping” process is woken up and obtains the semaphore and proceeds in to the critical section.A simple way to execute a semaphore would be to use the following functions[[1]];
A [http://en.wikipedia.org/wiki/Semaphore_(programming) semaphore] is special [http://en.wikipedia.org/wiki/Variable_(computer_science) variable] that acts similar to a [http://en.wikipedia.org/wiki/Lock_(computer_science) lock]. For a process to enter into the critical section it must be able to acquire the semaphore. If the semaphore cannot be acquired, then the process is “put to [http://en.wikipedia.org/wiki/Sleep_(system_call) sleep]” and the processor is then used for another process. This means the processes [http://en.wikipedia.org/wiki/CPU_cache cache] is saved off in a place where it can be retrieved when the process is “woken up”. Once the semaphore is available the “sleeping” process is woken up and obtains the semaphore and proceeds in to the critical section.A simple way to execute a semaphore would be to use the following functions for various operations on semaphores;




'''I Initializing a semaphore:'''
====Initializing a semaphore====


'''int sem_init(sem_t *sem, int pshared, unsigned int value);''': sem_init() initializes the unnamed semaphore at the address pointed to by sem. The value argument specifies the initial value for the semaphore. The pshared argument indicates whether this semaphore is to be shared between the threads of a process, or between processes. If pshared has the value 0, then the semaphore is shared between the threads of a process, and should be located at some address that is visible to all threads (e.g., a global variable, or a variable allocated dynamically on the heap).
'''•  ''int sem_init(sem_t *sem, int pshared, unsigned int value):'''''
If pshared is nonzero, then the semaphore is shared between processes, and should be located in a region of shared memory (see shm_open(3), mmap(2), and shmget(2)). (Since a child created by fork(2) inherits its parent's memory mappings, it can also access the semaphore.) Any process that can access the shared memory region can operate on the semaphore using sem_post(3), sem_wait(3), etc.
This function [http://en.wikipedia.org/wiki/Initialization_(programming) initializes] the unnamed semaphore at the address pointed to by sem. Value is the initial value of the semaphore. The variable pshared is indicative of whether the semaphore is shared between threads or processes. If its value is zero it is shared between [http://en.wikipedia.org/wiki/Threads_(computer_science) threads] else it is shared between [http://en.wikipedia.org/wiki/Process_(computing) processes].
Initializing a semaphore that has already been initialized results in undefined behavior.
 
'''Return Value:'''
Return Value:
sem_init() returns 0 on success; on error, -1 is returned, and errno is set to indicate the error.
sem_init() returns 0 on success; on error, -1 is returned, and errno is set to indicate the error.


====Locking the semaphore====


'''II Locking the semaphore:'''
'''•  ''int sem_wait(sem_t *sem):''''' This function decrements (locks) the semaphore pointed to by sem. Decrement will only proceed if the value of the semaphore is greater than zero. If its value is zero, then the call blocks till the value of the semaphore becomes positive so that it can acquire it or a [http://en.wikipedia.org/wiki/Signal_handler signal handler] interrupts the call.


'''1. int sem_wait(sem_t *sem):'''sem_wait() decrements (locks) the semaphore pointed to by sem. If the semaphore's value is greater than zero, then the decrement proceeds, and the function returns, immediately. If the semaphore currently has the value zero, then the call blocks until either it becomes possible to perform the decrement (i.e., the semaphore value rises above zero), or a signal handler interrupts the call.
'''•  ''int sem_trywait(sem_t *sem):'''''
This function is the same as sem_wait(), except that if the decrement cannot be immediately performed, then call returns an error (errno set to EAGAIN<ref>http://www-numi.fnal.gov/computing/minossoft/releases/R2.3/WebDocs/Errors/unix_system_errors.html</ref>) instead of [http://en.wikipedia.org/wiki/Blocking_(computing) blocking].


'''2. int sem_trywait(sem_t *sem);'''
'''•  ''int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout):'''''
sem_trywait() is the same as sem_wait(), except that if the decrement cannot be immediately performed, then call returns an error (errno set to EAGAIN) instead of blocking.
This function is also the same as sem_wait(), except that abs_timeout specifies a limit on the amount of time that the call should block if the decrement cannot be immediately performed. The abs_timeout argument points to a structure that specifies an absolute timeout in seconds and nanoseconds. This structure is defined as follows:
 
'''3. int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout);'''
sem_timedwait() is the same as sem_wait(), except that abs_timeout specifies a limit on the amount of time that the call should block if the decrement cannot be immediately performed. The abs_timeout argument points to a structure that specifies an absolute timeout in seconds and nanoseconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). This structure is defined as follows:
        struct timespec {
      struct timespec  
    time_t tv_sec;     /* Seconds */
      {
    long  tv_nsec;     /* Nanoseconds [0 .. 999999999] */};
      time_t tv_sec;         /* Seconds */
      long  tv_nsec;         /* Nanoseconds [0 .. 999999999] */
      };


If the timeout has already expired by the time of the call, and the semaphore could not be locked immediately, then sem_timedwait() fails with a timeout error (errno set toETIMEDOUT).
If the timeout has already expired by the time of the call, and the semaphore could not be locked immediately, then sem_timedwait() fails with a timeout error (errno set to ETIMEDOUT<ref>http://pubs.opengroup.org/onlinepubs/7908799/xsh/errors.html</ref>). If the operation can be performed immediately, then sem_timedwait() never fails with a timeout error, regardless of the value of abs_timeout. Furthermore, the validity of abs_timeout is not checked in this case.
If the operation can be performed immediately, then sem_timedwait() never fails with a timeout error, regardless of the value of abs_timeout. Furthermore, the validity ofabs_timeout is not checked in this case.


'''Return Value:'''
Return Value:
All of these functions return 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno is set to indicate the error.
All of these functions return 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno<ref>http://www.kernel.org/doc/man-pages/online/pages/man3/errno.3.html</ref> is set to indicate the error.


==== Releasing the semaphore ====


'''III Releasing the semaphore:'''
'''•  ''int sem_post(sem_t *sem):''''' This function increments (unlocks) the semaphore pointed to by sem, thus making the value of the semaphore positive. Some other process can now acquire it.


'''int sem_post(sem_t *sem);'''sem_post() increments (unlocks) the semaphore pointed to by sem. If the semaphore's value consequently becomes greater than zero, then another process or thread blocked in a sem_wait(3) call will be woken up and proceed to lock the semaphore.
Return Value: sem_post() returns 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno is set to indicate the error.


'''Return value:'''
===='''Pseudo Code'''====
sem_post() returns 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno is set to indicate the error.
 
'''Pseudo Code:'''


      <code>
       sem_t *sem;  
       sem_t *sem;  
       int pshared;
       int pshared;
       unsigned int value;
       unsigned int value;
       int i = sem_init(sem, pshared, value); /*initialize the semaphore*/
       int i = sem_init(sem, pshared, value);                           /*initialize the semaphore*/
       int wait=sem_wait(sem); /*will decrement the value of the semaphore i.e. acquire the lock */
       int wait=sem_wait(sem);                                           /*will decrement the value of the semaphore i.e. acquire the lock */
     
       if(wait==-1)
       if(wait==-1)
      printf(“Error occurred, the value of the semaphore was not decremented”);
          printf(“Error occurred, the value of the semaphore was not decremented”);
     
       /*critical section;*/
       /*critical section;*/
       int post=sem_post(sem); /*will increment the value of the semaphore i.e. release the lock*/
       int post=sem_post(sem);                                           /*will increment the value of the semaphore i.e. release the lock*/
       if(post==-1)
       if(post==-1)
      printf(“Error occurred, the value of the semaphore was not incremented”);
          printf(“Error occurred, the value of the semaphore was not incremented”);
 
      </code>
 


=== Pthread.h ===
----------------------------


POSIX threads<ref>http://maxim.int.ru/bookshelf/PthreadsProgram/toc.html</ref>, usually referred to as [http://en.wikipedia.org/wiki/Pthread pthreads] defines a set of programming language [http://en.wikipedia.org/wiki/Data_type types],[http://en.wikipedia.org/wiki/Function_(computer_science) functions] and constants.It is implemented with a pthread.h [http://en.wikipedia.org/wiki/Header_file header file] and a thread library. The pthread library provides the following synchronization mechanisms:


1. Mutexes


2. Joins


=== Pthread.h ===
3. Conditional Variables


The pthread library provides three synchronization mechanisms:
4. Barriers




==== Mutexes ====
==== Mutexes ====


'''Mutual Exclusion Lock''', mutex in short are used to protect a shared resource from a race condition. Mutexes are used to prevent operations by multiple threads on the same memory location at the same time or when an order of operation is expected which would lead to data inconsistencies. It blocks access to variables by other threads.
'''''Mutual Exclusion Lock'''''<ref>http://docs.oracle.com/cd/E19963-01/html/821-1601/sync-28983.html</ref>, mutex in short is another synchronization method and is used to avoid [http://en.wikipedia.org/wiki/Race_condition race conditions]. In cases leading to data inconsistencies,like when multiple threads are to be prevented from operating on the same memory location simultaneously or when a specific order of operation is expected, mutexes are used.It blocks access to variables by other threads.
Mutexes are in particular used to protect a critical region (“a segment of memory”) from other threads. Mutexes work only between threads in a single process and donot work between processes as do semaphores.
Mutexes are in particular used to protect a critical region (“a segment of memory”) from other threads<ref>Raghunathan, S.; , "Extending Inter-process Synchronization with Robust Mutex and Variants in Condition Wait," Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on , vol., no., pp.121-128, 8-10 Dec. 2008</ref>.
 
The following are the functions for managing mutexes:
 
 
'''I. Initialising the mutex:'''
 
'''pthread_mutex_init (pthread_mutex_t *restrict mutex,const pthread_mutexattr_t *restrict attr);
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER:'''
This function is used to initialize the mutex referenced by the mutex with attributes specified by the attribute. The default mutex attributes are used if attribute passed is NULL which is same as passing the address of a default mutex. The state of the mutex is  initialized  and unlocked, upon successful initialization.
A destroyed mutex object can be reinitialized using pthread_mutex_init().Attempting to initialize an already initialized mutex results in undefined behavior.
In cases where default mutex attributes are appropriate, the macro PTHREAD_MUTEX_INITIALIZER can be used to initialize mutexes that are statically allocated. It is dynamic initialization by a call to pthread_mutex_init() with parameter attr specified as NULL, except that no error checks are performed.
 


'''II. Destroying the mutex:'''
The following are the functions for using mutexes:<ref>http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.genprogc%2Fdoc%2Fgenprogc%2Fmutexes.htm</ref>


'''pthread_mutex_destroy (pthread_mutex_t *mutex): '''
=====Initialising the mutex =====
This function is used to destroy a mutex which is no longer needed. The function destroys the mutex object referenced by mutex and hence the mutex object becomes uninitialized. pthread_mutex_destroy() may set the object referenced by mutex to an invalid  value.
A  destroyed  mutex object can be reinitialized using pthread_mutex_init().A mutex cannot be referenced after it is destroyed. It shall be safe to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.


Only mutex itself may be used for performing synchronization. The result of referring to copies of mutex in calls to pthread_mutex_lock(), pthread_mutex_trylock(), pthread_mutex_unlock(), and pthread_mutex_destroy() is undefined.
'''•  ''pthread_mutex_init (pthread_mutex_t *restrict mutex,const pthread_mutexattr_t *restrict attr):'''''
The mutex referenced by the mutex is initialised with the attributes specified by attr using this function. The default mutex attributes are used if attribute passed is NULL.Passing a NULL attribute implies passing the address of a default mutex. The state of the mutex is initialized and unlocked,upon successful initialization.


'''Return Value'''
The pthread_mutex_init can be used to reinitialize an already destroyed mutex,but attempting to initialize an already initialized mutex results in undefined behavior.
If successful, the pthread_mutex_destroy() and pthread_mutex_init() functions shall return zero; otherwise, an error number shall be returned to indicate the error.The [EBUSY] and [EINVAL] error checks, if implemented, act as if they were performed immediately at the beginning of processing for the function and shall cause an error return prior to modifying the state of the mutex specified by mutex.
The [http://en.wikipedia.org/wiki/Macro_(computer_science) macro] [http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.basetechref%2Fdoc%2Fbasetrf1%2FPTHREAD_MUTEX_INITIALIZER.htm PTHREAD_MUTEX_INITIALIZER] is used to initialize mutexes that are statically allocated in cases where default mutex attributes are appropriate. It is dynamic initialization by a call with parameter attr specified as NULL to pthread_mutex_init() but in this case no error checks are performed.


Return Value: The function returns zero on success; otherwise, an error number is returned to indicate the error.


'''III. Locking the mutex:'''
===== Destroying the mutex=====


'''pthread_mutex_lock (pthread_mutex_t *mutex):''' A mutex object referenced by the mutex can be locked using this function. If the  mutex  is already locked, the  calling thread  shall  block until the mutex becomes available.This operation shall return with the mutex object referenced by mutex in the locked  state with the calling thread as its owner.If the mutex type is PTHREAD_MUTEX_NORMAL, deadlock detection shall not be provided. Attempting to relock the mutex causes deadlock. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.
'''•  ''pthread_mutex_destroy (pthread_mutex_t *mutex):'''''
This function is used to destroy a mutex which is no longer needed. The function destroys the mutex object passed by mutex attribute and hence the mutex object becomes uninitialized. pthread_mutex_destroy() sets the object referenced to an invalid value.
A  destroyed  mutex object can be reinitialized using pthread_mutex_init().A mutex cannot be referenced after it is destroyed.It is advised to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.


Error checking is provided if the mutex is of type PTHREAD_MUTEX_ERRORCHECK. Error shall be returned if a thread attempts to relock a mutex that it has already locked or if a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked.
Return Value: The function returns zero on success; otherwise, an error number is returned to indicate the error.


PTHREAD_MUTEX_RECURSIVE, mutexes of this type shall maintain the concept of a lock count. The lock count shall be set to one when a thread successfully acquires a mutex for the first time. And the lock count shall be incremented by one every time a thread relocks this mutex. Each time the thread unlocks the mutex, the lock count shall be decremented by one. When the lock count reaches zero, the mutex shall become available for other threads to acquire. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error shall be returned.
=====Locking the mutex=====


If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock the mutex or attempting to unlock the mutex if it was not locked by the calling thread or attempting to unlock the mutex if it is not locked results in undefined behavior.
'''•  ''pthread_mutex_lock (pthread_mutex_t *mutex):'''''
This function is used to lock the mutex passed. If the mutex is already locked by another process, the calling thread blocks until the mutex is unlocked by the other process.This operation returns the mutex object referenced by mutex in the locked state with the calling thread as its owner.Error checking is provided if the mutex is of type PTHREAD_MUTEX_ERRORCHECK. Error shall be returned if a thread attempts to relock a mutex that it has already locked or if a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked.


The pthread_mutex_trylock() function shall be equivalent to pthread_mutex_lock(), except that if the mutex object referenced by mutex is currently locked (by any thread, including the current thread), the call shall return immediately. If the mutex type is PTHREAD_MUTEX_RECURSIVE and the mutex is currently owned by the calling thread, the mutex lock count shall be incremented by one and the pthread_mutex_trylock() function shall immediately return success.(In the case of PTHREAD_MUTEX_RECURSIVE mutexes, the mutex shall become available when the count reaches zero and the calling thread no longer has any locks on this mutex.)
Mutexes of type,PTHREAD_MUTEX_RECURSIVE<ref>http://developer.apple.com/library/ios/#documentation/System/Conceptual/ManPages_iPhoneOS/man3/pthread_mutexattr_settype.3.html</ref> maintain a lock count. The lock count is set to one when a mutex is acquired by a thread successfully for the first time. And the lock count is incremented in units of one every time a thread relocks this mutex. Each time the thread unlocks the mutex, the lock count is decremented by one. When the lock count reaches zero, the mutex becomes available for other threads to acquire.An error is returned if a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked.


If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.
If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock the mutex or attempting to unlock the mutex if it was not locked by the calling thread or attempting to unlock the mutex if it is not locked results in undefined behavior.If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.


Return Value: The functions return a zero on success; otherwise, an error number is returned to indicate the error.


'''IV. Unlocking the mutex:'''
===== Unlocking the mutex=====


'''pthread_mutex_unlock (pthread_mutex_t *mutex):''' This function is used to release a mutex that is previously locked.The pthread_mutex_unlock() function shall release the mutex object referenced by  mutex. The  manner  in which a mutex is released is dependent upon the mutex's type attribute.If  there  are  threads  blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex. An error is returned if mutex is already unlocked or owned by another thread.
'''•  ''pthread_mutex_unlock (pthread_mutex_t *mutex):''''' This function is used to release a mutex that is previously locked.The function shall release the mutex object referenced by  mutex. The  manner  in which a mutex is released is dependent upon the mutex's type attribute.If  there  are  threads  blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the [http://en.wikipedia.org/wiki/Scheduling_(computing) scheduling] policy shall determine which thread shall acquire the mutex. An error is returned if mutex is already unlocked or owned by another thread.
   
   
'''Return Value:'''Only mutex itself may be used for performing synchronization. The result of referring to copies of mutex in calls to pthread_mutex_lock(), pthread_mutex_trylock(),pthread_mutex_unlock(), and pthread_mutex_destroy() is undefined.
Return Value: The functions return a zero on success; otherwise, an error number is returned to indicate the error.
 
'''Pseudo code:'''


====='''Pseudo code'''=====
    <code>
     pthread_mutex_t *mutex,
     pthread_mutex_t *mutex,
     const pthread_mutexattr_t *attr;
     const pthread_mutexattr_t *attr;
     int p = pthread_mutex_init(mutex, attr);
     int p, p1, pu, pd;
    p = pthread_mutex_init(mutex, attr);
   
     if(p!=0)
     if(p!=0)
    printf(“Error occurred mutex was not created”);
          printf("Error occurred mutex was not created”);
     int pl = pthread_mutex_lock(mutex);
      
    pl = pthread_mutex_lock(mutex);    
     if(pl!=0)
     if(pl!=0)
    printf(“Error occurred mutex was not locked”);
          printf("Error occurred mutex was not locked”);
     //critical section
   
     int pu = pthread_mutex_unlock(mutex);
     /*critical section*/
     pu = pthread_mutex_unlock(mutex);
     if(pu!=0)
     if(pu!=0)
    printf(“Error occurred mutex was not unlocked”);
          printf("Error occurred mutex was not unlocked”);
     int pd = pthread_mutex_destroy(mutex);
      
    pd = pthread_mutex_destroy(mutex);
     if(pd!=0)
     if(pd!=0)
    printf(“Error occurred mutex was not destroyed”);
          printf("Error occurred mutex was not destroyed”);
    </code>


==== Joins ====
====='''Avoiding Deadlock<ref>http://www2.chrishardick.com:1099/Notes/Computing/C/pthreads/mutexes.html</ref>'''=====


A join is performed when one wants to wait for a thread to finish. A thread calling routine may launch multiple threads and then wait for them to finish to get the results.
[http://en.wikipedia.org/wiki/Deadlock Deadlocks] occur when the program holds more than one mutex. A classic example of deadlock is;


pthread_join(pthread_t thread,void **value_ptr) function is useful for determining if a thread has completed before starting another task.Argument,thread refers to the thread id and value_ptr refers to the value passed from pthread_exit.This function suspends the  execution of  the calling thread until the target thread terminates, unless the target thread has already terminated.On return from a successful pthread_join() call with a non-NULL value_ptr argument, the value passed to pthread_exit() by the terminating thread shall be made available in the location  referenced by value_ptr.When a pthread_join() returns successfully, the target thread has been terminated. The results of multiple simultaneous calls  to  pthread_join()  specifying  the same target thread are undefined. If the thread calling pthread_join() is canceled, then the  target thread shall not be detached.
'''Thread 1:'''


    lock mutex_a
    |
    lock mutex_b
    -blocked forever waiting for mutex_b


====  Conditional Variables ====
'''Thread 2:'''


    lock mutex_b
    |
    lock mutex_a
    -blocked forever waiting for mutex a


Condition variables are synchronization objects that allow threads to wait for certain events (conditions) to occur. Condition variables are slightly more complex than mutexes, and the correct use of condition variables requires the thread to co-operatively use a specific protocol in order to ensure safe and consistent serialization. The protocol for using condition variables includes a mutex, a boolean predicate (true/false expression) and the condition variable itself. The threads that are cooperating using condition variables can wait for a condition to occur, or can wake up other threads that are waiting for a condition.


The following are the functions used in conjunction with the conditional variable:
Few common techniques<ref>http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-deadlock.pdf</ref> exist which can be used to avoid deadlocks, which are;


'''i. Creating/Destroying:'''
1. Establish a locking hierarchy.<ref>http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/inspectorxe/lin/ug_docs/GUID-38B68BDA-257C-4D4E-9AC9-B0B2698AD950.htm</ref>


'''int pthread_cond_init(pthread_cond_t *restrict cond, const pthread_condattr_t *restrict attr);'''
2. [http://en.wikipedia.org/wiki/Spinlock Spinlock.]
The pthread_cond_init() function is used to initialize the  conditional variable referenced by cond with attributes referenced by attr. Default condition variable attributes shall be used if attribute is NULL.It is same as passing the address of a default condition variable attributes object.Upon successful initialization, the conditional variable state is initialized. Attempting to initialize an  already  initialized condition variable results in undefined behavior.


'''pthread_cond_destroy(pthread_cond_t *cond);'''The  pthread_cond_destroy()  function shall destroy the given condition variable specified by cond; the object becomes, in  effect,  uninitialized. An  implementation  may  cause pthread_cond_destroy() to set the object referenced by cond to an invalid value.  A  destroyed  condition variable object can  be  reinitialized using pthread_cond_init(); the results of otherwise referencing the object after it has been destroyed are undefined.
3. Chaining.


It  shall  be  safe  to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy a condition variable upon which other threads are currently blocked results in undefined behavior.


'''1. Establish a locking hierarchy:'''


'''Return Value''' If successful, the pthread_cond_destroy() and pthread_cond_init() functions  shall  return zero; otherwise, an error number shall be returned to indicate the error.
• A hierarchy needs to be established to hold multiple [http://en.wikipedia.org/wiki/Lock_(computer_science) locks].
A condition variable can be destroyed immediately after all the threads
that  are blocked on it are awakened.  


For example, consider the following code:
• A rule could be established such that in order to lock mutex_a and mutex_b, mutex_a must be locked before mutex_b.


          struct list {
• If unnecessary then both locks should not be held simultaneously.


  pthread_mutex_t lm;
• Order of locking the mutexes must be maintained.


  ...
• Mutexes can be unlocked in any order that is preferred by the program, because unlocking doesn't create deadlocks.


          }
• Another function must be formulated in order to unlock the set of mutexes.


                  struct elt {


  key k;
'''Thread 1:'''
    lock mutex_a
    lock mutex_b
   
    //perform processing
   
    unlock mutex_a
    unlock mutex_b
    ...


  int busy;
'''Thread 2:'''
    lock mutex_a(blocked)
   
    wake up(obtained mutex_a)
    lock mutex_b
   
    //perform processing
   
    unlock mutex_a
    unlock mutex_b
    ...


  pthread_cond_t notbusy;
'''2. Spin lock:'''


  ...
• The first mutex can be locked without any [http://en.wikipedia.org/wiki/Constraint constraints]. But from here on to lock additional mutexes non-blocking mutexes<ref>http://docs.oracle.com/cd/E19683-01/806-6867/sync-36993/index.html</ref> must be used (pthread_mutex_trylock())


          }
• In case of a failure in any lock, unlock in reverse order and try again.


                  /* Find a list element and reserve it. */
• This method of unlocking in the reverse order reduces the spinning for other threads.


          struct elt *


          list_find(struct list *lp, key k)
'''Thread 1:'''


          {
    lock mutex_a
    try-lock mutex_b
    |
    perform processing
    |
    unlock mutex_b
    unlock mutex_a 
    |       
    ...


  struct elt *ep;
'''Thread 2:'''


                  pthread_mutex_lock(&lp->lm);
    lock mutex_a (blocked)
   
    wake up (obtained mutex_a)
    try-lock mutex_b
    |
    perform processing
    |
    unlock mutex_b
    unlock mutex_a


  while ((ep = find_elt(l, k) != NULL) && ep->busy)
'''3. Chaining:'''


  pthread_cond_wait(&ep->notbusy, &lp->lm);
• Traversing [http://en.wikipedia.org/wiki/Linked_list linked lists] and other [http://en.wikipedia.org/wiki/Tree_(data_structure)tree data structures].


  if (ep != NULL)
• A locking hierarchy is to be created, like Lock1, Lock2, Lock3, Unlock1, Lock4.


  ep->busy = 1;
'''Thread 1'''


  pthread_mutex_unlock(&lp->lm);
    lock head node 
    head node processing 
    |                   
    traverse branch locking first node
    unlock head node
    first node processing


  return(ep);
==== Joins ====


          }
A join is performed when one wants to wait for a thread to finish. A thread calling routine may launch multiple threads and then wait for them to finish to get the results.


          delete_elt(struct list *lp, struct elt *ep)
'''•  ''pthread_join(pthread_t thread,void **value_ptr):''''' This function determines if a thread has completed before starting another task. The argument,thread refers to the thread id and value_ptr refers to the value passed from pthread_exit. This function suspends the execution of the calling thread until the target thread terminates, unless the target thread has already terminated.On return from a successful pthread_join() call with a non-NULL value_ptr argument, the value passed to [http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_exit.3.html pthread_exit()] by the terminating thread shall be made available in the location  referenced by value_ptr. When a pthread_join() returns successfully, the target thread has been terminated. The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined. If the thread calling pthread_join() is canceled, then the target thread shall not be detached.


          {
Return Value: The function returns zero on success; otherwise, the error number is returned to indicate the error.


  pthread_mutex_lock(&lp->lm);
====  Conditional Variables ====


  assert(ep->busy);
  ... remove ep from list ...
  ep->busy = 0; /* Paranoid. */
          (A) pthread_cond_broadcast(&ep->notbusy);
  pthread_mutex_unlock(&lp->lm);
          (B) pthread_cond_destroy(&rp->notbusy);
  free(ep);
          }


In this example, the condition variable and its list element may be freed (line  B) immediately after all threads waiting for it are awakened (line A), since the mutex and the code ensure that no other thread can touch the element to be deleted.
Conditional variables<ref>http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp</ref> are a kind of synchronization objects that are used to allow threads to wait for certain events to occur and are slightly more complex than mutexes. In order to ensure a safe and consistent [http://en.wikipedia.org/wiki/Serialization serialization], usage of condition variables requires the thread to cooperatively use a specific protocol which includes a mutex, a boolean [http://en.wikipedia.org/wiki/Predicate_(mathematical_logic) predicate] and the condition variable itself. The threads that are cooperating using condition variables can wait for a condition to occur, or can wake up other threads that are waiting for a condition.


'''ii. Waiting on condition:'''
The following are the functions used in conjunction with the conditional variable:


'''int pthread_cond_timedwait(pthread_cond_t *restrict cond,pthread_mutex_t *restrict mutex, const struct timespec *restrict abstime); and
=====Creating/Destroying=====
int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex);''' These functions are used to shall block  on  a condition variable. They shall be called with mutex locked
by the calling thread or undefined behavior results.These functions atomically release mutex and cause the  calling thread to block on the condition variable cond; atomically here means "atomi-
cally with respect to access by another thread to the  mutex  and  then the  condition variable". That is, if another thread is able to acquire the mutex after the about-to-block thread has released it, then a  subsequent call to pthread_cond_broadcast() or pthread_cond_signal() in that thread shall behave as if it were issued after the about-to-block thread has blocked.


Upon successful return, the mutex shall have been locked and shall be owned by the calling thread.
'''•  ''int pthread_cond_init(pthread_cond_t *restrict cond, const pthread_condattr_t *restrict attr):'''''
This function is used to initialize the conditional variable referenced by 'cond' with attributes referenced by 'attr'. Default condition variable attributes shall be used if attribute is NULL.It is same as passing the address of a default condition variable attributes object.Upon successful initialization, the conditional variable state is initialized. Attempting to initialize an already initialized condition variable results in undefined behavior.


'''iii. Waking thread based on condition:'''
Return Value: If successful the function returns zero; otherwise, an error number is returned to indicate the error.


'''pthread_cond_broadcast(pthread_cond_t *cond); and pthread_cond_signal(pthread_cond_t *cond);''' These  functions shall unblock threads blocked on a condition variable.The pthread_cond_broadcast() function shall unblock  all threads  currently blocked on the specified condition variable cond.


The pthread_cond_signal()  function shall unblock at least one of the threads that are blocked on the specified condition variable cond  (if any threads are blocked on cond).
'''• ''pthread_cond_destroy(pthread_cond_t *cond):''''' This function destroys the given condition variable specified by cond; the object becomes, in effect, uninitialized.An implementation may cause pthread_cond_destroy() to set the object referenced by cond to an invalid value.An already destroyed condition variable object can be reinitialized using pthread_cond_init(); the results of otherwise referencing the object after it has been destroyed are undefined.
It shall be safe  to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy condition variable upon which other threads are currently blocked results in an undefined behavior.


If more than one thread  is blocked on a condition variable, the scheduling policy shall determine  the order  in  which  threads  are unblocked.When each thread unblocked as a result of  a pthread_cond_broadcast() or pthread_cond_signal() returns from its call to  pthread_cond_wait()  or  pthread_cond_timedwait(), the thread shall own the  mutex with which it called pthread_cond_wait()  or pthread_cond_timedwait().The thread(s) that are unblocked shall contend for the mutex according to the scheduling policy and as if each had called pthread_mutex_lock().
Return Value: If successful the function returns zero; otherwise, an error number is returned to indicate the error.


====  Barriers  ====


A barrier is a type of synchronization method. A barrier for a group of threads or processes in the source code means any thread/process must stop at this point and cannot proceed until all other threads/processes reach this barrier.
For example<ref>http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#BASICS</ref>, consider the following code:


<code>
  struct list
  {
    pthread_mutex_t lm;
    ...
  }
  struct elt
  {
    key k;
    int busy;
    pthread_cond_t notbusy;
    ...
  }
  /* Find a list element and reserve it. */
  struct elt * list_find(struct list *lp, key k)
  {
    struct elt *ep;
      pthread_mutex_lock(&lp->lm);
      while ((ep = find_elt(l, k) != NULL) && ep->busy)
      pthread_cond_wait(&ep->notbusy, &lp->lm);
      if (ep != NULL)
      ep->busy = 1;
      pthread_mutex_unlock(&lp->lm);
      return(ep);
  }
  delete_elt(struct list *lp, struct elt *ep)
  {
    pthread_mutex_lock(&lp->lm);
    assert(ep->busy);
    ... remove ep from list ...
    ep->busy = 0; /* Paranoid. */
      (A) pthread_cond_broadcast(&ep->notbusy);
        pthread_mutex_unlock(&lp->lm);
      (B) pthread_cond_destroy(&rp->notbusy);
        free(ep);
  }
</code>


'''I.Barrier Wait:'''
In this example, the condition variable and its list element may be freed (line  B) immediately after all threads waiting for it are awakened (line A), since the mutex and the code ensure that no other thread can touch the element to be deleted.


'''int pthread_barrier_wait(pthread_barrier_t *barrier);''' The pthread_barrier_wait() function shall synchronize participating threads at the barrier referenced by barrier. The calling thread shall block until the required number of threads have called pthread_barrier_wait() specifying the barrier.
===== Waiting on condition=====


When the required number of threads have called pthread_barrier_wait() specifying the barrier, the constant PTHREAD_BARRIER_SERIAL_THREAD shall be returned to one unspecified thread and zero shall be returned to each of the remaining threads. At this point, the barrier shall be reset to the state it had as a result of the most recent pthread_barrier_init() function that referenced it.
'''•  ''int pthread_cond_timedwait(pthread_cond_t *restrict cond,pthread_mutex_t *restrict mutex, const struct timespec *restrict abstime);'' and ''int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex):'''''
These functions are used to block a condition variable. They are called with mutex locked by the calling thread or undefined behavior results.These functions atomically release mutex and cause the calling thread to block on the condition variable cond; atomically here means atomically with respect to access by another thread to the  mutex and then the condition variable. That is, if another thread is able to acquire the mutex after the about-to-block thread has released it, then a subsequent call to pthread_cond_broadcast() or pthread_cond_signal() in that thread shall behave as if it were issued after the about-to-block thread has blocked.The mutex would be locked and be owned by the calling thread upon successful return.


The constant PTHREAD_BARRIER_SERIAL_THREAD is defined in <pthread.h> and its value shall be distinct from any other value returned by pthread_barrier_wait().The results are undefined if this function is called with an uninitialized barrier.
===== Waking thread based on condition=====


If a signal is delivered to a thread blocked on a barrier, upon return from the signal handler the thread shall resume waiting at the barrier if the barrier wait has not completed (that is, if the required number of threads have not arrived at the barrier during the execution of the signal handler); otherwise, the thread shall continue as normal from the completed barrier wait. Until the thread in the signal handler returns from it, it is unspecified whether other threads may proceed past the barrier once they have all reached it.
'''•  ''pthread_cond_broadcast(pthread_cond_t *cond): and pthread_cond_signal(pthread_cond_t *cond):''''' These  functions unblock threads blocked on a condition variable.The pthread_cond_broadcast() function unblocks  all threads  currently blocked on the specified condition variable cond. The pthread_cond_signal() function unblocks at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond).


A thread that has blocked on a barrier shall not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources shall be determined by the scheduling policy.
If  more than one thread is blocked on a condition variable, the scheduling policy determines the order in which threads are unblocked.When each thread unblocked as a result of a pthread_cond_broadcast() or pthread_cond_signal() returns from its call to pthread_cond_wait() or pthread_cond_timedwait(), the thread owns the mutex with which it called pthread_cond_wait() or pthread_cond_timedwait().The thread(s) that are unblocked contend for the mutex according to the scheduling policy and as if each had called pthread_mutex_lock().


'''Return Value:'''Upon successful completion, the pthread_barrier_wait() function shall return PTHREAD_BARRIER_SERIAL_THREAD for a single (arbitrary) thread synchronized at the barrier and zero for each of the other threads. Otherwise, an error number shall be returned to indicate the error.
====  Barriers  ====


A barrier<ref>http://www.cs.utah.edu/~retrac/papers/jpdc05.pdf</ref> is a type of synchronization method which is used for a group of threads or processes in the source code. It basically stops any thread/process at a certain point till all other threads/processes reach it. Only then are the processes are allowed to proceed.


'''II.Destroying the Barrier:'''
===== Initialising the Barrier=====


'''int pthread_barrier_destroy(pthread_barrier_t *barrier);'''The pthread_barrier_destroy() function shall destroy the barrier referenced by barrier and release any resources used by the barrier. The effect of subsequent use of the barrier is undefined until the barrier is reinitialized by another call to pthread_barrier_init(). An implementation may use this function to set barrier to an invalid value. The results are undefined if pthread_barrier_destroy() is called when any thread is blocked on the barrier, or if this function is called with an uninitialized barrier.
'''•  ''int pthread_barrier_init(pthread_barrier_t *restrict barrier, const pthread_barrierattr_t *restrict attr, unsigned count):''''' The init function initializes the barrier with the specified attributes and reserves any resources required to use the barrier. Attempting to initialize an already initialized barrier or initializing a barrier when any thread is blocked on the barrier or using an uninitialized barrier would lead to undefined results.
The count argument specified the number of threads that must call before any of the them successfully return from the call and hence it should be a positive number greater than zero.Failure of the init function results in non initialization of the barrier and the contents of barrier are undefined.Only the object referenced by barrier may be used for performing synchronization. The result of referring to copies of that object in calls to pthread_barrier_destroy() or pthread_barrier_wait() is undefined.


Return Value: Upon successful completion, these functions shall return zero; otherwise, an error number shall be returned to indicate the error.


'''III. Initialising the Barrier:'''
===== Barrier Wait =====


'''int pthread_barrier_init(pthread_barrier_t *restrict barrier, const pthread_barrierattr_t *restrict attr, unsigned count);'''The pthread_barrier_init() function shall allocate any resources required to use the barrier referenced by barrier and shall initialize the barrier with attributes referenced by attr. If attr is NULL, the default barrier attributes shall be used; the effect is the same as passing the address of a default barrier attributes object. The results are undefined if pthread_barrier_init() is called when any thread is blocked on the barrier (that is, has not returned from the pthread_barrier_wait() call). The results are undefined if a barrier is used without first being initialized. The results are undefined if pthread_barrier_init() is called specifying an already initialized barrier.
'''•  ''int pthread_barrier_wait(pthread_barrier_t *barrier):'''''  The wait function is used to synchronize parallel threads.Until a required number of threads call pthread_barrier_wait() referring the barrier, the calling thread blocks.When the required number of threads call the barrier referenced, a zero value is returned to all the threads, except for one.The constant PTHREAD_BARRIER_SERIAL_THREAD is returned to one unspecified thread.And then it is sent to the state it has as a result of the most recent init function.


The count argument specifies the number of threads that must call pthread_barrier_wait() before any of them successfully return from the call. The value specified by count must be greater than zero.
When the required number of threads have arrived at the barrier during the execution of a signal handler, it marks the completion of barrier wait.If a signal is delivered to a thread blocked on a barrier, upon return from the signal handler the thread resumes waiting at the barrier if the barrier wait has not completed; otherwise, the thread continues as normal from the completed barrier wait. Until the thread in the signal handler returns from it, it is unspecified whether other threads may proceed past the barrier once they have all reached it.A thread that has blocked on a barrier does not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources is determined by the scheduling policy.


If the pthread_barrier_init() function fails, the barrier shall not be initialized and the contents of barrier are undefined.
Return Value: Upon successful completion, the function shall return [http://www.lindevdoc.org/doku/libc:pthread_barrier_serial_thread PTHREAD_BARRIER_SERIAL_THREAD] for an arbitrary thread synchronized at the barrier and zero for each of the other threads. Otherwise, an error number shall be returned to indicate the error.


Only the object referenced by barrier may be used for performing synchronization. The result of referring to copies of that object in calls to pthread_barrier_destroy() or pthread_barrier_wait() is undefined.
===== Destroying the Barrier =====


'''Return Value:'''Upon successful completion, these functions shall return zero; otherwise, an error number shall be returned to indicate the error.
'''•  ''int pthread_barrier_destroy(pthread_barrier_t *barrier):'''''This function is used to destroy the barrier passed by the barrier attribute and also releases any resources used by the barrier. A destroyed barrier can be reused when reinitialized by another call to pthread_barrier_init().The destroyed barrier is set to an invalid value. The results are undefined if pthread_barrier_destroy() is called when any thread is blocked on the barrier, or if this function is called with an uninitialized barrier.


'''Pseudo code:'''
Return Value: Upon successful completion, the functions returns zero; otherwise, an error number is returned to indicate the error.


====='''Pseudo code'''=====
    <code>
     pthread_barrier_t *barrier;
     pthread_barrier_t *barrier;
     pthread_barrierattr_t *attr;
     pthread_barrierattr_t *attr;
     unsigned int count;
     unsigned int count;
     int i = pthread_barrier_init(barrier, attr, count); // initialize the barrier
     int i = pthread_barrier_init(barrier, attr, count);             // initialize the barrier
   
     if(i!=0)
     if(i!=0)
    printf(“Error occurred barrier was not initialized”):
      printf(“Error occurred barrier was not initialized”):
     int b = pthread_barrier_wait(barrier); //synchronize participating threads
   
     int b = pthread_barrier_wait(barrier);                         //synchronize participating threads
     if(b!=0)
     if(b!=0)
    printf(“Error occurred in synchronizing threads”);
      printf(“Error occurred in synchronizing threads”);
     // critical section
   
     int d = pthread_barrier_destroy(barrier); //destroy the barrier
     /* critical section */
   
     int d = pthread_barrier_destroy(barrier);                       //destroy the barrier
     if(d!=0)
     if(d!=0)
    printf(“Error occurred barrier was not destroyed”):
      printf(“Error occurred barrier was not destroyed”):
    </code>
 
== References ==
 
<References>http://linux.die.net/man/3/
 
http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp
 
http://maxim.int.ru/bookshelf/PthreadsProgram/toc.html
 
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#BASICS
 
</References>
 
== Quiz ==
 
1. What happens if a semaphore cannot be acquired?(More than one options could be correct)
 
a. The process is woken up.
 
b. The process cache is saved off and can be retrieved when the process is woken up.
 
c. The process is put to sleep.
 
d. The process proceeds into the critical section.
 
 
2. Which one of the following attempts is valid?
 
a. Multiple simultaneous calls to pthread_join using the same target.
 
b. Initializing an already initialized conditional variable.
 
c. Initializing an already destroyed mutex.
 
d. Destroying a locked mutex
 
 
3. Upon successful completion what does an int pthread_barrier_wait(pthread_barrier_t *barrier) function return?(More than one options could be correct)         
 
a. Zero for all the threads.
 
b. PTHREAD_BARRIER_SERIAL_THREAD for an arbitrary thread synchronized at the barrier.
 
c. Zero for each of the threads,except for one.
 
d. None of the above.
 
 
4. Which of the following functions are used to lock a semaphore?(More than one options could be correct)
 
a. sem_post
 
b. sem_wait
 
c. sem_trywait
 
d. sem_timedwait
 
5. What parameters does the method pthread_join() take? (More than one options could be correct)
 
a. Null
 
b. value_ptr
 
c. thread_id
 
d. value passed from pthread exit
 
 
6.  Which of the following is true about mutexes and semaphores?
 
a. Both semaphore and mutexes are used to prevent multiple threads from operating on a single memory location.
 
b. Mutexes work between processes and semaphores work only between threads of a single process.
 
c. Mutexes work only between threads in a process and semaphores work between processes.
 
d. None of the above.
 
 
7. Which of the following functions would unlock threads blocked on a conditional variable?(More than one options could be correct)
 
a. int pthread_cond_timedwait
 
b. int pthread_cond_init
 
c. pthread_cond_broadcast
 
d. pthread_cond_signal
 
 
8. What is the significance of ''abs_timeout'' in sem_timedwait function?
 
a. It specifies to return an error if immediate decrement cannot be performed.
 
b. It specifies a limit on the amount of time that the call should block.
 
c. It specifies a timeout error.
 
d. None of the above
 
 
9. Which of the following type of mutexes maintain a lock count?
 
a. ERRORCHECK
 
b. DEFAULT
 
c. RECURSIVE
 
d. INITIALIZER
 


== References: ==
10. When a mutex is initialized with NULL attributes


[1] Linux Manual pages
a. Error is returned


[2]http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Fapis%2Fusers_68.html
b. The mutex referenced is not initialized.


[3]http://maxim.int.ru/bookshelf/PthreadsProgram/toc.html
c. Default mutex attributes are used to initialize the mutex.


[4]http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#BASICS
d. The referenced mutex is destroyed.

Latest revision as of 03:02, 21 February 2013

Current page address: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2013/3a_bs

Page started with: http://wiki.expertiza.ncsu.edu/index.php/CSC/ECE_506_Spring_2011/ch3_ab

Overview

The main goal of this wiki is to explain which architectural mechanisms are used by library functions for DOALL, DOACROSS, and DOPIPE parallelism, reduction, and functional parallelism in various architectures.

Synchronization

When using any parallel programming model, synchronization<ref>Rahman, M.M.; , "Process synchronization in multiprocessor and multi-core processor," Informatics, Electronics & Vision (ICIEV), 2012 International Conference on , vol., no., pp.554-559, 18-19 May 2012</ref> is needed to guarantee accuracy of the overall program. The following are a few example situations where synchronization<ref>http://people.csail.mit.edu/rinard/paper/cpande99.pdf</ref> will be needed.


• The code following the parallelized loop requires that all of the parallel processes be completed before advancing. It cannot be triggered simply by one of the processes completing.

• A portion of code in the middle of a parallelized section MUST be executed in a very particular order so that global variables used across processes get read and written in the proper order. This is known as the critical section.

• Multiple processes must update a global variable in such a way that one process does not overwrite the updates of a different process. (i.e. SUM = SUM + <process update>).


These are just a few examples. Every architecture implements synchronization in a unique way using different types of mechanisms. The subsequent section will highlight the following that are used to achieve synchronization:

1. Semaphore.h

2. Pthread.h

Libraries<ref>http://linux.die.net/man/3/</ref>

Semaphore.h


A semaphore is special variable that acts similar to a lock. For a process to enter into the critical section it must be able to acquire the semaphore. If the semaphore cannot be acquired, then the process is “put to sleep” and the processor is then used for another process. This means the processes cache is saved off in a place where it can be retrieved when the process is “woken up”. Once the semaphore is available the “sleeping” process is woken up and obtains the semaphore and proceeds in to the critical section.A simple way to execute a semaphore would be to use the following functions for various operations on semaphores;


Initializing a semaphore

int sem_init(sem_t *sem, int pshared, unsigned int value): This function initializes the unnamed semaphore at the address pointed to by sem. Value is the initial value of the semaphore. The variable pshared is indicative of whether the semaphore is shared between threads or processes. If its value is zero it is shared between threads else it is shared between processes.

Return Value: sem_init() returns 0 on success; on error, -1 is returned, and errno is set to indicate the error.

Locking the semaphore

int sem_wait(sem_t *sem): This function decrements (locks) the semaphore pointed to by sem. Decrement will only proceed if the value of the semaphore is greater than zero. If its value is zero, then the call blocks till the value of the semaphore becomes positive so that it can acquire it or a signal handler interrupts the call.

int sem_trywait(sem_t *sem): This function is the same as sem_wait(), except that if the decrement cannot be immediately performed, then call returns an error (errno set to EAGAIN<ref>http://www-numi.fnal.gov/computing/minossoft/releases/R2.3/WebDocs/Errors/unix_system_errors.html</ref>) instead of blocking.

int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout): This function is also the same as sem_wait(), except that abs_timeout specifies a limit on the amount of time that the call should block if the decrement cannot be immediately performed. The abs_timeout argument points to a structure that specifies an absolute timeout in seconds and nanoseconds. This structure is defined as follows:

      struct timespec 
      {
      time_t tv_sec;          /* Seconds */
      long   tv_nsec;         /* Nanoseconds [0 .. 999999999] */
      };

If the timeout has already expired by the time of the call, and the semaphore could not be locked immediately, then sem_timedwait() fails with a timeout error (errno set to ETIMEDOUT<ref>http://pubs.opengroup.org/onlinepubs/7908799/xsh/errors.html</ref>). If the operation can be performed immediately, then sem_timedwait() never fails with a timeout error, regardless of the value of abs_timeout. Furthermore, the validity of abs_timeout is not checked in this case.

Return Value: All of these functions return 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno<ref>http://www.kernel.org/doc/man-pages/online/pages/man3/errno.3.html</ref> is set to indicate the error.

Releasing the semaphore

int sem_post(sem_t *sem): This function increments (unlocks) the semaphore pointed to by sem, thus making the value of the semaphore positive. Some other process can now acquire it.

Return Value: sem_post() returns 0 on success; on error, the value of the semaphore is left unchanged, -1 is returned, and errno is set to indicate the error.

Pseudo Code

     
     sem_t *sem; 
     int pshared;
     unsigned int value;
     int i = sem_init(sem, pshared, value);                            /*initialize the semaphore*/
     int wait=sem_wait(sem);                                           /*will decrement the value of the semaphore i.e. acquire the lock */
     
     if(wait==-1)
         printf(“Error occurred, the value of the semaphore was not decremented”);
     
     /*critical section;*/
     int post=sem_post(sem);                                           /*will increment the value of the semaphore i.e. release the lock*/
     if(post==-1)
         printf(“Error occurred, the value of the semaphore was not incremented”);
     

Pthread.h


POSIX threads<ref>http://maxim.int.ru/bookshelf/PthreadsProgram/toc.html</ref>, usually referred to as pthreads defines a set of programming language types,functions and constants.It is implemented with a pthread.h header file and a thread library. The pthread library provides the following synchronization mechanisms:

1. Mutexes

2. Joins

3. Conditional Variables

4. Barriers


Mutexes

Mutual Exclusion Lock<ref>http://docs.oracle.com/cd/E19963-01/html/821-1601/sync-28983.html</ref>, mutex in short is another synchronization method and is used to avoid race conditions. In cases leading to data inconsistencies,like when multiple threads are to be prevented from operating on the same memory location simultaneously or when a specific order of operation is expected, mutexes are used.It blocks access to variables by other threads. Mutexes are in particular used to protect a critical region (“a segment of memory”) from other threads<ref>Raghunathan, S.; , "Extending Inter-process Synchronization with Robust Mutex and Variants in Condition Wait," Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on , vol., no., pp.121-128, 8-10 Dec. 2008</ref>.

The following are the functions for using mutexes:<ref>http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.genprogc%2Fdoc%2Fgenprogc%2Fmutexes.htm</ref>

Initialising the mutex

pthread_mutex_init (pthread_mutex_t *restrict mutex,const pthread_mutexattr_t *restrict attr): The mutex referenced by the mutex is initialised with the attributes specified by attr using this function. The default mutex attributes are used if attribute passed is NULL.Passing a NULL attribute implies passing the address of a default mutex. The state of the mutex is initialized and unlocked,upon successful initialization.

The pthread_mutex_init can be used to reinitialize an already destroyed mutex,but attempting to initialize an already initialized mutex results in undefined behavior. The macro PTHREAD_MUTEX_INITIALIZER is used to initialize mutexes that are statically allocated in cases where default mutex attributes are appropriate. It is dynamic initialization by a call with parameter attr specified as NULL to pthread_mutex_init() but in this case no error checks are performed.

Return Value: The function returns zero on success; otherwise, an error number is returned to indicate the error.

Destroying the mutex

pthread_mutex_destroy (pthread_mutex_t *mutex): This function is used to destroy a mutex which is no longer needed. The function destroys the mutex object passed by mutex attribute and hence the mutex object becomes uninitialized. pthread_mutex_destroy() sets the object referenced to an invalid value. A destroyed mutex object can be reinitialized using pthread_mutex_init().A mutex cannot be referenced after it is destroyed.It is advised to destroy an initialized mutex that is unlocked. Attempting to destroy a locked mutex results in undefined behavior.

Return Value: The function returns zero on success; otherwise, an error number is returned to indicate the error.

Locking the mutex

pthread_mutex_lock (pthread_mutex_t *mutex): This function is used to lock the mutex passed. If the mutex is already locked by another process, the calling thread blocks until the mutex is unlocked by the other process.This operation returns the mutex object referenced by mutex in the locked state with the calling thread as its owner.Error checking is provided if the mutex is of type PTHREAD_MUTEX_ERRORCHECK. Error shall be returned if a thread attempts to relock a mutex that it has already locked or if a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked.

Mutexes of type,PTHREAD_MUTEX_RECURSIVE<ref>http://developer.apple.com/library/ios/#documentation/System/Conceptual/ManPages_iPhoneOS/man3/pthread_mutexattr_settype.3.html</ref> maintain a lock count. The lock count is set to one when a mutex is acquired by a thread successfully for the first time. And the lock count is incremented in units of one every time a thread relocks this mutex. Each time the thread unlocks the mutex, the lock count is decremented by one. When the lock count reaches zero, the mutex becomes available for other threads to acquire.An error is returned if a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked.

If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock the mutex or attempting to unlock the mutex if it was not locked by the calling thread or attempting to unlock the mutex if it is not locked results in undefined behavior.If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.

Return Value: The functions return a zero on success; otherwise, an error number is returned to indicate the error.

Unlocking the mutex

pthread_mutex_unlock (pthread_mutex_t *mutex): This function is used to release a mutex that is previously locked.The function shall release the mutex object referenced by mutex. The manner in which a mutex is released is dependent upon the mutex's type attribute.If there are threads blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex. An error is returned if mutex is already unlocked or owned by another thread.

Return Value: The functions return a zero on success; otherwise, an error number is returned to indicate the error.

Pseudo code
   
    pthread_mutex_t *mutex,
    const pthread_mutexattr_t *attr;
    int p, p1, pu, pd;
    p = pthread_mutex_init(mutex, attr);
    
    if(p!=0)
         printf("Error occurred mutex was not created”);
    
    pl = pthread_mutex_lock(mutex);     
    if(pl!=0)
         printf("Error occurred mutex was not locked”);
    
    /*critical section*/
    pu = pthread_mutex_unlock(mutex);
    if(pu!=0)
         printf("Error occurred mutex was not unlocked”);
    
    pd = pthread_mutex_destroy(mutex);
    if(pd!=0)
         printf("Error occurred mutex was not destroyed”);
    
Avoiding Deadlock<ref>http://www2.chrishardick.com:1099/Notes/Computing/C/pthreads/mutexes.html</ref>

Deadlocks occur when the program holds more than one mutex. A classic example of deadlock is;

Thread 1:

   lock mutex_a
   |
   lock mutex_b
   -blocked forever waiting for mutex_b

Thread 2:

   lock mutex_b
   |
   lock mutex_a
   -blocked forever waiting for mutex a


Few common techniques<ref>http://pages.cs.wisc.edu/~remzi/Classes/537/Fall2011/Book/threads-deadlock.pdf</ref> exist which can be used to avoid deadlocks, which are;

1. Establish a locking hierarchy.<ref>http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/inspectorxe/lin/ug_docs/GUID-38B68BDA-257C-4D4E-9AC9-B0B2698AD950.htm</ref>

2. Spinlock.

3. Chaining.


1. Establish a locking hierarchy:

• A hierarchy needs to be established to hold multiple locks.

• A rule could be established such that in order to lock mutex_a and mutex_b, mutex_a must be locked before mutex_b.

• If unnecessary then both locks should not be held simultaneously.

• Order of locking the mutexes must be maintained.

• Mutexes can be unlocked in any order that is preferred by the program, because unlocking doesn't create deadlocks.

• Another function must be formulated in order to unlock the set of mutexes.


Thread 1:

   lock mutex_a
   lock mutex_b
   
   //perform processing
   
   unlock mutex_a
   unlock mutex_b
   ...

Thread 2:

   lock mutex_a(blocked)
   
   wake up(obtained mutex_a)
   lock mutex_b
   
   //perform processing
   
   unlock mutex_a
   unlock mutex_b
   ...

2. Spin lock:

• The first mutex can be locked without any constraints. But from here on to lock additional mutexes non-blocking mutexes<ref>http://docs.oracle.com/cd/E19683-01/806-6867/sync-36993/index.html</ref> must be used (pthread_mutex_trylock())

• In case of a failure in any lock, unlock in reverse order and try again.

• This method of unlocking in the reverse order reduces the spinning for other threads.


Thread 1:

   lock mutex_a
   try-lock mutex_b
   |
   perform processing 
   | 
   unlock mutex_b
   unlock mutex_a  
   |        
   ...

Thread 2:

   lock mutex_a (blocked)
   
   wake up (obtained mutex_a)
   try-lock mutex_b
   |
   perform processing
   |
   unlock mutex_b
   unlock mutex_a

3. Chaining:

• Traversing linked lists and other data structures.

• A locking hierarchy is to be created, like Lock1, Lock2, Lock3, Unlock1, Lock4.

Thread 1

   lock head node  
   head node processing  
   |                    
   traverse branch locking first node 
   unlock head node 
   first node processing

Joins

A join is performed when one wants to wait for a thread to finish. A thread calling routine may launch multiple threads and then wait for them to finish to get the results.

pthread_join(pthread_t thread,void **value_ptr): This function determines if a thread has completed before starting another task. The argument,thread refers to the thread id and value_ptr refers to the value passed from pthread_exit. This function suspends the execution of the calling thread until the target thread terminates, unless the target thread has already terminated.On return from a successful pthread_join() call with a non-NULL value_ptr argument, the value passed to pthread_exit() by the terminating thread shall be made available in the location referenced by value_ptr. When a pthread_join() returns successfully, the target thread has been terminated. The results of multiple simultaneous calls to pthread_join() specifying the same target thread are undefined. If the thread calling pthread_join() is canceled, then the target thread shall not be detached.

Return Value: The function returns zero on success; otherwise, the error number is returned to indicate the error.

Conditional Variables

Conditional variables<ref>http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp</ref> are a kind of synchronization objects that are used to allow threads to wait for certain events to occur and are slightly more complex than mutexes. In order to ensure a safe and consistent serialization, usage of condition variables requires the thread to cooperatively use a specific protocol which includes a mutex, a boolean predicate and the condition variable itself. The threads that are cooperating using condition variables can wait for a condition to occur, or can wake up other threads that are waiting for a condition.

The following are the functions used in conjunction with the conditional variable:

Creating/Destroying

int pthread_cond_init(pthread_cond_t *restrict cond, const pthread_condattr_t *restrict attr): This function is used to initialize the conditional variable referenced by 'cond' with attributes referenced by 'attr'. Default condition variable attributes shall be used if attribute is NULL.It is same as passing the address of a default condition variable attributes object.Upon successful initialization, the conditional variable state is initialized. Attempting to initialize an already initialized condition variable results in undefined behavior.

Return Value: If successful the function returns zero; otherwise, an error number is returned to indicate the error.


pthread_cond_destroy(pthread_cond_t *cond): This function destroys the given condition variable specified by cond; the object becomes, in effect, uninitialized.An implementation may cause pthread_cond_destroy() to set the object referenced by cond to an invalid value.An already destroyed condition variable object can be reinitialized using pthread_cond_init(); the results of otherwise referencing the object after it has been destroyed are undefined. It shall be safe to destroy an initialized condition variable upon which no threads are currently blocked. Attempting to destroy condition variable upon which other threads are currently blocked results in an undefined behavior.

Return Value: If successful the function returns zero; otherwise, an error number is returned to indicate the error.


For example<ref>http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#BASICS</ref>, consider the following code:

 struct list 
 {
    pthread_mutex_t lm;
    ...
 }
 struct elt 
 {
    key k;
    int busy;
    pthread_cond_t notbusy;
    ...
 }
 /* Find a list element and reserve it. */
 struct elt * list_find(struct list *lp, key k)
 {
    struct elt *ep;
      pthread_mutex_lock(&lp->lm);
      while ((ep = find_elt(l, k) != NULL) && ep->busy)
      pthread_cond_wait(&ep->notbusy, &lp->lm);
      if (ep != NULL)
      ep->busy = 1;
      pthread_mutex_unlock(&lp->lm);
      return(ep);
 }
 delete_elt(struct list *lp, struct elt *ep)
 {
    pthread_mutex_lock(&lp->lm);
    assert(ep->busy);
    ... remove ep from list ...
    ep->busy = 0;	 /* Paranoid. */
      (A) pthread_cond_broadcast(&ep->notbusy);
       pthread_mutex_unlock(&lp->lm);
      (B) pthread_cond_destroy(&rp->notbusy);
       free(ep); 
  }

In this example, the condition variable and its list element may be freed (line B) immediately after all threads waiting for it are awakened (line A), since the mutex and the code ensure that no other thread can touch the element to be deleted.

Waiting on condition

int pthread_cond_timedwait(pthread_cond_t *restrict cond,pthread_mutex_t *restrict mutex, const struct timespec *restrict abstime); and int pthread_cond_wait(pthread_cond_t *restrict cond, pthread_mutex_t *restrict mutex): These functions are used to block a condition variable. They are called with mutex locked by the calling thread or undefined behavior results.These functions atomically release mutex and cause the calling thread to block on the condition variable cond; atomically here means atomically with respect to access by another thread to the mutex and then the condition variable. That is, if another thread is able to acquire the mutex after the about-to-block thread has released it, then a subsequent call to pthread_cond_broadcast() or pthread_cond_signal() in that thread shall behave as if it were issued after the about-to-block thread has blocked.The mutex would be locked and be owned by the calling thread upon successful return.

Waking thread based on condition

pthread_cond_broadcast(pthread_cond_t *cond): and pthread_cond_signal(pthread_cond_t *cond): These functions unblock threads blocked on a condition variable.The pthread_cond_broadcast() function unblocks all threads currently blocked on the specified condition variable cond. The pthread_cond_signal() function unblocks at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond).

If more than one thread is blocked on a condition variable, the scheduling policy determines the order in which threads are unblocked.When each thread unblocked as a result of a pthread_cond_broadcast() or pthread_cond_signal() returns from its call to pthread_cond_wait() or pthread_cond_timedwait(), the thread owns the mutex with which it called pthread_cond_wait() or pthread_cond_timedwait().The thread(s) that are unblocked contend for the mutex according to the scheduling policy and as if each had called pthread_mutex_lock().

Barriers

A barrier<ref>http://www.cs.utah.edu/~retrac/papers/jpdc05.pdf</ref> is a type of synchronization method which is used for a group of threads or processes in the source code. It basically stops any thread/process at a certain point till all other threads/processes reach it. Only then are the processes are allowed to proceed.

Initialising the Barrier

int pthread_barrier_init(pthread_barrier_t *restrict barrier, const pthread_barrierattr_t *restrict attr, unsigned count): The init function initializes the barrier with the specified attributes and reserves any resources required to use the barrier. Attempting to initialize an already initialized barrier or initializing a barrier when any thread is blocked on the barrier or using an uninitialized barrier would lead to undefined results.

The count argument specified the number of threads that must call before any of the them successfully return from the call and hence it should be a positive number greater than zero.Failure of the init function results in non initialization of the barrier and the contents of barrier are undefined.Only the object referenced by barrier may be used for performing synchronization. The result of referring to copies of that object in calls to pthread_barrier_destroy() or pthread_barrier_wait() is undefined.

Return Value: Upon successful completion, these functions shall return zero; otherwise, an error number shall be returned to indicate the error.

Barrier Wait

int pthread_barrier_wait(pthread_barrier_t *barrier): The wait function is used to synchronize parallel threads.Until a required number of threads call pthread_barrier_wait() referring the barrier, the calling thread blocks.When the required number of threads call the barrier referenced, a zero value is returned to all the threads, except for one.The constant PTHREAD_BARRIER_SERIAL_THREAD is returned to one unspecified thread.And then it is sent to the state it has as a result of the most recent init function.

When the required number of threads have arrived at the barrier during the execution of a signal handler, it marks the completion of barrier wait.If a signal is delivered to a thread blocked on a barrier, upon return from the signal handler the thread resumes waiting at the barrier if the barrier wait has not completed; otherwise, the thread continues as normal from the completed barrier wait. Until the thread in the signal handler returns from it, it is unspecified whether other threads may proceed past the barrier once they have all reached it.A thread that has blocked on a barrier does not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources is determined by the scheduling policy.

Return Value: Upon successful completion, the function shall return PTHREAD_BARRIER_SERIAL_THREAD for an arbitrary thread synchronized at the barrier and zero for each of the other threads. Otherwise, an error number shall be returned to indicate the error.

Destroying the Barrier

int pthread_barrier_destroy(pthread_barrier_t *barrier):This function is used to destroy the barrier passed by the barrier attribute and also releases any resources used by the barrier. A destroyed barrier can be reused when reinitialized by another call to pthread_barrier_init().The destroyed barrier is set to an invalid value. The results are undefined if pthread_barrier_destroy() is called when any thread is blocked on the barrier, or if this function is called with an uninitialized barrier.

Return Value: Upon successful completion, the functions returns zero; otherwise, an error number is returned to indicate the error.

Pseudo code
   
   pthread_barrier_t *barrier;
   pthread_barrierattr_t *attr;
   unsigned int count;
   int i = pthread_barrier_init(barrier, attr, count);             // initialize the barrier
   
   if(i!=0)
      printf(“Error occurred barrier was not initialized”):
   
   int b = pthread_barrier_wait(barrier);                          //synchronize participating threads
   if(b!=0)
      printf(“Error occurred in synchronizing threads”);
   
   /* critical section */
   
   int d = pthread_barrier_destroy(barrier);                       //destroy the barrier
   if(d!=0)
      printf(“Error occurred barrier was not destroyed”):
   

References

<References>http://linux.die.net/man/3/

http://publib.boulder.ibm.com/infocenter/iseries/v7r1m0/index.jsp
http://maxim.int.ru/bookshelf/PthreadsProgram/toc.html
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html#BASICS

</References>

Quiz

1. What happens if a semaphore cannot be acquired?(More than one options could be correct)

a. The process is woken up.

b. The process cache is saved off and can be retrieved when the process is woken up.

c. The process is put to sleep.

d. The process proceeds into the critical section.


2. Which one of the following attempts is valid?

a. Multiple simultaneous calls to pthread_join using the same target.

b. Initializing an already initialized conditional variable.

c. Initializing an already destroyed mutex.

d. Destroying a locked mutex


3. Upon successful completion what does an int pthread_barrier_wait(pthread_barrier_t *barrier) function return?(More than one options could be correct)

a. Zero for all the threads.

b. PTHREAD_BARRIER_SERIAL_THREAD for an arbitrary thread synchronized at the barrier.

c. Zero for each of the threads,except for one.

d. None of the above.


4. Which of the following functions are used to lock a semaphore?(More than one options could be correct)

a. sem_post

b. sem_wait

c. sem_trywait

d. sem_timedwait


5. What parameters does the method pthread_join() take? (More than one options could be correct)

a. Null

b. value_ptr

c. thread_id

d. value passed from pthread exit


6. Which of the following is true about mutexes and semaphores?

a. Both semaphore and mutexes are used to prevent multiple threads from operating on a single memory location.

b. Mutexes work between processes and semaphores work only between threads of a single process.

c. Mutexes work only between threads in a process and semaphores work between processes.

d. None of the above.


7. Which of the following functions would unlock threads blocked on a conditional variable?(More than one options could be correct)

a. int pthread_cond_timedwait

b. int pthread_cond_init

c. pthread_cond_broadcast

d. pthread_cond_signal


8. What is the significance of abs_timeout in sem_timedwait function?

a. It specifies to return an error if immediate decrement cannot be performed.

b. It specifies a limit on the amount of time that the call should block.

c. It specifies a timeout error.

d. None of the above


9. Which of the following type of mutexes maintain a lock count?

a. ERRORCHECK

b. DEFAULT

c. RECURSIVE

d. INITIALIZER


10. When a mutex is initialized with NULL attributes

a. Error is returned

b. The mutex referenced is not initialized.

c. Default mutex attributes are used to initialize the mutex.

d. The referenced mutex is destroyed.