CSC/ECE 506 Fall 2007/wiki 11 e4

From Expertiza_Wiki
Jump to navigation Jump to search

Sections 1.3.1 and 1.3.2: Communication and programming model.

How have reordering strategies evolved to accommodate larger multicomputers?

Reordering strategies in the book include:

Implicit ordering – Operations in a thread are in program order. When multiple threads access the same data, there is no guarantee that they will not reach the data too late/too soon.

Mutual Exclusion – Certain operations on certain data are performed by only one thread/process at a time. The processes are sequential, but without a specific order or events.

Events – Specific events allow other processes to start. An event may trigger a single process or a group of processes.

In more recent years, advances have been made to accommodate larger multicomputers. These advances include developments in reordering both processes and the data being used by the processes.

Process Ordering

Explicit Reordering – Programs avoid overhead by explicitly assigning and ordering processes to take advantage of data locality. This approach works well when there is complicated data dependency.

Data Reordering:

Reverse Cuthill-McKee (RCM) Reordering - RCM is a typical level set reordering method. Elements of a level set are traversed from the nodes of the highest degree to those of the lowest degree, according to dependency relationships. Degree refers to the number of nodes connected to each node.

DJDS reordering- Reorders data to produce arrays of coefficients with continuous memory access. Vector programming is made more efficient by providing sufficient length for innermost loops. Descending-order jagged diagonal storage (DJDS) involves permuting rows into an order of decreasing number of non-zeros. Can be modified to split and permute arrays to be distributed across an SMP node (parallel DJDS or PDJDS)

[1] provides a good explanation and comparison of RCM/PDJDS re-ordering techniques.


Have new kinds of synchronization operations been developed?

There is a small collection of tools that are commonly used for synchronization in parallel architectures. These include:

Locks - Locks limit access to shared resources by identifying each resource as being available or unavailable. Successful acquisition of a lock means that your code now has the right to use the protected resource.

Semaphores - A semaphore is a flag that signals the presence of a specific condition in your program. Semaphores offer a way to synchronize the execution of code.

Critical regions" - A critical region is a section of code that can be executed by only one thread at a time. You might designate a block of code as a critical region to prevent modification of a common data structure or to prevent race conditions as two threads execute the same code.

Barriers - Deadlock occurs whenever a thread is blocked waiting on a resource of another thread that will never become available.

Atomic operations - The x86 bit test and set is an example of Instruction Set Architecture support for synchronization. They simplify synchronization by eliminating the need to locks altogether.

This list is short, has not changed much in the recent past and is unlikely to change any time soon. Any new mechanisms like that of a monotonic counter(see references) only build on existing process synchronization constructs. Most of the effort and research seems to be going in the way of building threading libraries that will allow software developers to build scalable applications that can optimally utilize massively parallel architectures. These goal of these libraries is to reduce the amount of time spent on dealing with the low-level without compromising on the performance of the applications. Intel, for example, has gone public with an open source threading building block library with promises that it will become the STL(Standard Template Library) for multi-threaded applications. These libraries allow rapid application development of new applications while hiding the low-level thread synchronization mechanics. A few years ago, distributed computing was the talk of the town but with the movement of PC architecture toward multiple cores per processor, these threading libraries will have a big role to play in getting the best performance out of these new architectures.

References

1. John Thornley, K. Mani Chandy, "Monotonic Counters: A New Mechanism for Thread Synchronization," ipdps, p. 573, 14th International Parallel and Distributed Processing Symposium (IPDPS'00), 2000

2. Intel Threading Building Blocks 2.0, http://www.intel.com/cd/software/products/asmo-na/eng/294797.htm?cid=cim:ggl%7Cspd_us_tbb%7Ck7A3E%7Cs

3. Apple Threading Libraries, http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/articles/ThreadLocking.html