CSC/ECE 506 Spring 2014/4a ad
Automatic Parallelization
Automatic parallelization is the technique of automatically converting sequential code into multi-threaded or vectorized (or both) code. The resulting multi-threaded code can then run on a shared-memory multiprocessor and obtain better performance. Automatic parallelization also relieves programmers of the tedious and error-prone manual parallelization process.
Introduction
With increasing transistor densities, multi-core systems have become the most popular means of delivering performance, but an application cannot realize the potential of a multi-core environment unless it is parallelized efficiently. Parallelizing sequential programs correctly and efficiently is a challenge. It is widely accepted that manual parallelization by expert programmers leads to the most streamlined parallel implementation, but it is also a costly and time-consuming process. If the compiler can instead perform the parallelization itself while ensuring the formal correctness of the resulting parallel code, time-to-market and cost can be greatly reduced.
Approaches
Programs generally spend a high fraction of their execution time inside loops of one form or another, so the initial emphasis of automatic parallelization was on loop-level parallelization. There are two main approaches to parallelizing loops: pipelined multi-threading and cyclic multi-threading.
Cyclic multithreading
A parallelizing compiler that uses cyclic multi-threading tries to split a loop in such a way that each iteration can be executed concurrently on a different processor. Before actually parallelizing the code, the compiler makes two analysis passes over it, answering the following two questions respectively.
- a) How safe is it to parallelize the loop?
Accurate dependence analysis and alias analysis are required to answer this question. The compiler must first determine whether each iteration of the loop can be executed independently of the others. Some forms of data dependence can be handled, but they may incur additional overhead in the form of message passing, synchronization of shared memory, or some other method of processor communication. A sketch of how aliasing can block this analysis appears after this list.
- b) Can we gain an appreciable performance enhancement if we parallelize the loop?
To estimate this, the compiler compares the theoretical execution time of the parallelized code with the sequential execution time. This matters because the overhead of spawning threads and coordinating work across multiple processors can sometimes dominate the gain from parallel execution; in such cases the actual speedup is much smaller than the potential speedup.
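As an illustration of question (a), consider the following minimal C sketch (the function names are hypothetical). If the compiler cannot prove that z does not overlap x or y, it must conservatively assume a cross-iteration dependence and keep the loop sequential; the C99 restrict qualifier is one way a programmer can supply the missing guarantee.

/* Without more information, the compiler must assume that z may alias
   x or y; writing z[i] could then change a value read by a later
   iteration, so the loop cannot safely be parallelized. */
void add_may_alias(int n, double *z, const double *x, const double *y)
{
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* With restrict, the programmer asserts that the arrays do not
   overlap, so every iteration is provably independent. */
void add_no_alias(int n, double *restrict z,
                  const double *restrict x, const double *restrict y)
{
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}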
Example:
for(i=0;i<n;i++) z[i] = x[i] + y[i];
The above code exhibits DOALL parallelism: each iteration is independent of the others, and the final contents of z[] do not depend on the order in which the iterations execute. The compiler can therefore auto-parallelize it safely.
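The transformation a cyclic multi-threading compiler applies here is equivalent to what a programmer would express with an OpenMP work-sharing directive. The following is only a sketch of the intent, not any particular compiler's actual output.

/* Compiled with OpenMP enabled (e.g. gcc -fopenmp). Each thread
   receives a chunk of the iteration space; since every iteration
   touches a distinct z[i], any distribution of iterations is safe. */
void vector_add(int n, double *z, const double *x, const double *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}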
for(i=2;i<n;i++) z[i] = z[i-1]*2;
This loop, by contrast, cannot be auto-parallelized as written: each iteration depends on the result of the previous one (a loop-carried dependence), so the iterations cannot safely execute concurrently.
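This does not mean the computation itself is inherently sequential. The recurrence has the closed form z[i] = z[1] * 2^(i-1), so it could in principle be rewritten with independent iterations, as in the sketch below (assuming z is an array of doubles and the hypothetical function name is ours). Current parallelizing compilers, however, are generally not capable of discovering such rewrites automatically.

#include <math.h>

/* Equivalent DOALL form of the recurrence z[i] = z[i-1]*2:
   every iteration computes z[i] directly from the seed z[1],
   so the iterations no longer depend on one another. */
void unroll_recurrence(int n, double *z)
{
    double z1 = z[1];                 /* seed read once, before the loop */
    #pragma omp parallel for
    for (int i = 2; i < n; i++)
        z[i] = z1 * pow(2.0, i - 1); /* z[i] = z[1] * 2^(i-1) */
}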