CSC/ECE 506 Spring 2014/4a ad: Difference between revisions

Revision as of 22:33, 25 February 2014

Automatic Parallelization

Auto parallelization is the technique of automatically converting a sequential code into multi-threaded or vectorized (or even both) code. Therefore one can take advantage of a shared memory multi processor environment to run the multi threaded code and obtain a better performance. This also serves the goal of relieving programmers from the tedious and error-prone manual parallelization process.

Introduction

With increasing transistor densities, multi-core computing systems are being touted as the most popular means of delivering performance. Unless the application is parallelized efficiently, one cannot realize the potential of multi core environment. It is a challenge to efficiently parallelize sequential programs without any errors. It is widely accepted that manual parallelization of code, by expert programmers, leads to the most streamlined parallel implementation. But it is also a costly and time consuming process. If one can parallelize the compiler technology to itself while ensuring the formal correctness of the resulting parallel code, time-to-market and cost can be greatly reduced.

Approaches

Abhishek

Generally, a high fraction of a program’s execution time takes place in one form of loop or the other. So the initial emphasis of auto parallelization was on loop level parallelization.There are two main approaches to parallelization of loops: pipelined multi-threading and cyclic multi-threading.

Cyclic multithreading

A parallelizing compiler using cyclic multithreading tries to split a loop in a such a way that each iteration of a loop can be executed concurrently on a different processor.Before actually parallelizing the code, the compiler does two passes analyzing the following in each pass the code respectively.

a) How safe is it to parallelize the loop?

Accurate dependency analysis and alias analysis is required to answer the above question. The compiler needs to first determine whether each iteration of the loop can be executed independently of others. Some forms of data dependencies can be addressed but they might incur additional overhead in the form of message passing or synchronization of shared memory or some other method of processor communication.

b) Can we gain an appreciable performance enhancement if we parallelize the loop?

In order to estimate it, the compiler compares the theoretical execution time of code after parallelization to the sequential execution time. This has to be done because sometimes the overhead of parallelization and execution over multiple processors might dominate the actual enhancement obtained due to parallelization. In such cases the actual speedup will be less than the potential speedup.

Example:

        for(i=0;i<n;i++)
	    z[i] = x[i] + y[i];

The above code exhibits DOALL parallelism because each iteration is independent of the others, and the final result of z[] will not depend on the execution order of the other iterations. Therefore it can be auto parallelized by the compiler because each iteration is independent of the others, and the final result of array z will be correct regardless of the execution order of the other iterations.

        for(i=2;i<n;i++)
	    z[i] = z[i-1]*2;

The above for loop cannot be auto-parallelized because z[i] depends on the previous iteration value z[i-1].But it doesn’t mean that it cannot be parallelized. The above loop is equal to the following

        for(i=2;i<n;i++)
	    z[i] = z[1]*2^(i-1);

But current parallelizing compilers are not usually capable of bringing out these parallelisms automatically, and one cannot say for sure that this piece of code will benefit from parallelization.

@@ Line 35: / Line 35: @@
 </pre>
+<p>The above for loop cannot be auto-parallelized because z[i] depends on the previous iteration value z[i-1].But it doesn’t mean that it cannot be parallelized. The above loop is equal to the following</p>
+<pre>
+        for(i=2;i<n;i++)
+	    z[i] = z[1]*2^(i-1);
+</pre>
+<p>But current parallelizing compilers are not usually capable of bringing out these parallelisms automatically, and one cannot say for sure that this piece of code will benefit from parallelization.</p>
 === Dipali ===

CSC/ECE 506 Spring 2014/4a ad: Difference between revisions

Revision as of 22:33, 25 February 2014

Contents

Automatic Parallelization

Introduction

Approaches

Abhishek

Cyclic multithreading

Dipali

Examples of Auto Parallel Compilers

Limitations

References

Navigation menu

CSC/ECE 506 Spring 2014/4a ad: Difference between revisions

Revision as of 22:33, 25 February 2014

Automatic Parallelization

Introduction

Approaches

Abhishek

Cyclic multithreading

Dipali

Examples of Auto Parallel Compilers

Limitations

References

Navigation menu

Search