CSC/ECE 506 Spring 2010/ch 9
In addition to a proper cache coherency model, it is also important for a multiprocessor system to provide support at the hardware level for synchronization. The most common types of synchronization are locks and barriers, which are discussed in this chapter.
Lock Implementations
Locks are an important concept when programming for multi core systems. The basic concept of a lock is to protect the code inside the lock. That is to be sure that while a certain thread X has entered the critical section, another thread Y is not also inside of the critical section and possibly modifying critical values. When a lock is being used, while thread X has entered into the critical section, thread Y must wait until X has exited before entering. This can be accomplished in a variety of ways of varying complexity and performance.
== Performance Evaluation
The Solihin text gives 4 methods to determine the performance of a lock implementation.
- Acquisition Latency - How much time does it take to acquire the lock?
- Traffic - How much bus traffic is generated by threads attempting to acquire the lock?
- Fairness - FIFO vs. Luck
- Storage - How much storage is needed compared to the number of threads?
Atomic Instructions
Since a multiprocessor system cannot disable interupts as an effective method to execute a bit of code atomically, there must be hardware support for atomic operations. [1]
Being able to execute an atomic instruction is a requirement for most lock implementations. It is important that when a processor attempts to set a lock either the lock is fully set and the thread is able to enter into the critical section, or the lock is not set, and it appears that none of the instructions required to set the lock have executed.
In the x86 instruction set the opcode CMPXCHG (meaning compare and exchange) can be used in a lock implementation in order to guarantee atomicity. This function works by sending a destination and a source. The accumulator is compared to the destination and if they are equal loaded with the source. If they are NOT equal the accumulator is loaded with the destination value. In order to assure that this is executed atomically the opcode must be issued with the LOCK prefix. This is useful in implementing some locks, such as ticket locks. [2]
Hand-off Lock
Another type of lock that is not discussed in the text is known as the "Hand-off" lock. In this lock the first thread acquires the lock if no other thread is currently locked (since it is the first thread). When another thread attempts to gain the lock it will see that the lock is in use and adds itself to the queue. Once done this thread can sleep until called by the thread with the lock. Once the thread in the lock is finished, it will pass the lock to the next thread in the queue. [www.cs.duke.edu/~chase/cps110/slides/threads3.ppt]
Avoiding Locks
There are many reasons why a programmer should attempt to write programs in such a way as to avoid locks if possible. There are many problems that can arise with the use of locks. [3]
One of the must well known issue is deadlock. This can occur when the threads are waiting to acquire the lock, but the lock will never be unlocked. For example if 2 threads are both spinning on a lock that is locked, they will continue to spin forever, as each thread 'thinks' that the other is inside the critical section.
Another problem with using locks is that the performance is not optimal, as often a lock is used when there is only a chance of conflict. This approach to programming yields slower performance than what might be possible with other methods. This also leads to questions of granularity, that is how much of the code should be protected under the critical section. The programmer must decide between many small (fine grain) locks or fewer, more encompassing locks. This decision can greatly effect the performance of the program. Lock (computer science) [4]
Barrier Implementations
Partners Work Not Received