Chapter 2b: Data parallelism in GPUs
Revision as of 00:21, 31 January 2012
Take a modern GPU architecture, and use it as an example in explaining how data-parallel programming is done. Do this in a discussion similar to the discussion of the hypothetical array processor in Lecture 3. That is, describe the problem, then describe the instructions of the GPU, and show code for how the problem can be solved efficiently using GPU instructions. You might want to use multiple examples to illustrate different facilities of a GPU instruction set.
Introduction
Terminology
Basics of CUDA GPU
Architecture overview
Instruction set overview
C Runtime overview
Problem
Consider the problem of adding two vectors element by element and storing the result in a third vector. A standard C program to do this would look like this:
<pre>
int main()
{
    float A[1000], B[1000], C[1000];
    .....
    //Some initializations
    .....
    for(int i = 0; i < 1000; i++)
        C[i] = A[i] + B[i];
    ......
}
</pre>
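The loop above is a natural candidate for data parallelism because each iteration reads only A[i] and B[i] and writes only C[i]. The following is a minimal sketch in plain C that makes this independence explicit by pulling the loop body into a per-index function. The names vec_add_element and vec_add are hypothetical and not part of the original program; on a GPU, the per-index function would roughly correspond to the work done by one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the body of the original loop for a single index.
   It reads a[i] and b[i] and writes only c[i], so no call depends on
   the result of any other call. */
void vec_add_element(const float *a, const float *b, float *c, size_t i)
{
    c[i] = a[i] + b[i];
}

/* Hypothetical sequential driver: visits the indices one after another,
   as a CPU would. Because the calls are independent, they could be made
   in any order, or all at the same time, with the same result. */
void vec_add(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        vec_add_element(a, b, c, i);
}
```

Calling vec_add(A, B, C, 1000) on the arrays from the program above computes the same result as the original loop. Only the structure changes: the unit of work is now one element, which is the granularity at which a data-parallel machine can distribute the computation.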