CSC/ECE 506 Fall 2007/wiki 11 e4: Difference between revisions
Line 4: | Line 4: | ||
== Ordering == | == Ordering == | ||
In parallel programming models, ordering is key to coordinating the activity of all threads. Ordering will ensure that dependencies are maintained and that specified threads remain synchronized via explicit synchronization operations. | In parallel programming models, ordering is key to coordinating the activity of all threads. Ordering will ensure that dependencies are maintained and that specified threads remain synchronized via explicit synchronization operations. | ||
Parallel Programming models have developed to take advantage of shared memory multiprocessors and distributed memory systems. Nanothreads Programming Model (NPM) is one such programming model, and exploits multiple levels of loop and functional parallelism. This model results in a runtime environment that has dynamic program adaptability, allowing the program to adjust the granularity of the generated parallelism to the available resources. The main ordering and scheduling objective of this model is that application scheduling, at the user level, and virtual process scheduling, at the kernel level, be tightly coordinated. The result is high performance. | |||
Parallel Programming models have developed to take advantage of shared memory multiprocessors and distributed memory systems. Nanothreads Programming Model (NPM) is one such programming model | Other programming models combine loop directives and message passing to maintain order and facilitate reordering. An example of a program using this method is the GeoFEM on the Earth Simulator Supercomputer in Japan. This is part of a parallel finite-element platform for solid earth simulation being studied. A special reordering technique was used to create parallel iterative solvers with localized preconditioning. This allows GeoFEM to attain concurrent local operations, no global dependency, continuous memory access, and sufficiently long innermost loops to take advantage of the vector processors on Earth Simulator. | ||
Other programming models combine loop directives and message passing to maintain order and facilitate reordering. An example of a program using this method is the GeoFEM on the Earth Simulator Supercomputer in Japan. This is part of a parallel finite-element platform for solid earth simulation being studied. A special reordering technique was used to create parallel iterative solvers with localized preconditioning. This allows GeoFEM to attain concurrent local operations, no global | |||
== Synchronization == | == Synchronization == |
Revision as of 16:52, 10 September 2007
Sections 1.3.1 and 1.3.2: Communication and programming model.
Ordering
In parallel programming models, ordering is key to coordinating the activity of all threads. Ordering will ensure that dependencies are maintained and that specified threads remain synchronized via explicit synchronization operations. Parallel Programming models have developed to take advantage of shared memory multiprocessors and distributed memory systems. Nanothreads Programming Model (NPM) is one such programming model, and exploits multiple levels of loop and functional parallelism. This model results in a runtime environment that has dynamic program adaptability, allowing the program to adjust the granularity of the generated parallelism to the available resources. The main ordering and scheduling objective of this model is that application scheduling, at the user level, and virtual process scheduling, at the kernel level, be tightly coordinated. The result is high performance. Other programming models combine loop directives and message passing to maintain order and facilitate reordering. An example of a program using this method is the GeoFEM on the Earth Simulator Supercomputer in Japan. This is part of a parallel finite-element platform for solid earth simulation being studied. A special reordering technique was used to create parallel iterative solvers with localized preconditioning. This allows GeoFEM to attain concurrent local operations, no global dependency, continuous memory access, and sufficiently long innermost loops to take advantage of the vector processors on Earth Simulator.
Synchronization
There is a small collection of tools that are commonly used for synchronization in parallel architectures. These include:
Locks - Locks limit access to shared resources by identifying each resource as being available or unavailable. Successful acquisition of a lock means that your code now has the right to use the protected resource.
Semaphores - A semaphore is a flag that signals the presence of a specific condition in your program. Semaphores offer a way to synchronize the execution of code.
Critical regions" - A critical region is a section of code that can be executed by only one thread at a time. You might designate a block of code as a critical region to prevent modification of a common data structure or to prevent race conditions as two threads execute the same code.
Barriers - Deadlock occurs whenever a thread is blocked waiting on a resource of another thread that will never become available.
Atomic operations - The x86 bit test and set is an example of Instruction Set Architecture support for synchronization. They simplify synchronization by eliminating the need to locks altogether.
This list is short, has not changed much in the recent past and is unlikely to change any time soon. Any new mechanisms like that of a monotonic counter(see references) only build on existing process synchronization constructs. Most of the effort and research seems to be going in the way of building threading libraries that will allow software developers to build scalable applications that can optimally utilize massively parallel architectures. These goal of these libraries is to reduce the amount of time spent on dealing with the low-level without compromising on the performance of the applications. Intel, for example, has gone public with an open source threading building block library with promises that it will become the STL(Standard Template Library) for multi-threaded applications. These libraries allow rapid application development of new applications while hiding the low-level thread synchronization mechanics. A few years ago, distributed computing was the talk of the town but with the movement of PC architecture toward multiple cores per processor, these threading libraries will have a big role to play in getting the best performance out of these new architectures.
References
1. John Thornley, K. Mani Chandy, "Monotonic Counters: A New Mechanism for Thread Synchronization," ipdps, p. 573, 14th International Parallel and Distributed Processing Symposium (IPDPS'00), 2000
2. Intel Threading Building Blocks 2.0, http://www.intel.com/cd/software/products/asmo-na/eng/294797.htm?cid=cim:ggl%7Cspd_us_tbb%7Ck7A3E%7Cs
3. Apple Threading Libraries, http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/articles/ThreadLocking.html