CSC/ECE 506 Spring 2013/2b ks
Introduction
Processor clock speeds have plateaued in the last several years, creating demand for alternative ways to achieve performance gains. Parallelism has been at the heart of many of these gains. Today's computers are designed to take advantage of parallel processing to produce results more quickly; even off-the-shelf personal computers contain CPUs with several cores that execute in parallel.
While multi-core is a fairly new concept for the CPU, it has been a focus of GPU design for much longer. GPUs have had the luxury of being purposed for very specialized tasks related to graphics computation and rendering, and these kinds of computations require a great deal of parallelism to be efficient. In Understanding the Parallelism of GPUs, the author uses the example of blending two images together: the GPU must perform a blending operation on the pixels of both images, and that operation is for the most part the same, only the data points differ. The intense amount of floating-point calculation, combined with this focused purpose, has led GPUs to rely heavily on data parallelism, and that focus has produced very different engineering choices as GPUs have matured. Now that CPU clock speeds are no longer improving, the scientific and technology communities are looking to GPUs for continued efficiency gains.
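The following is a minimal sketch of the blending example described above, written as a CUDA kernel and assuming the images are 8-bit grayscale buffers stored as flat arrays; the names blendKernel, alpha, and numPixels are illustrative rather than taken from the cited source. Each GPU thread applies the same blend operation to a different pixel, which is exactly the data-parallel pattern GPUs are built for.

// Illustrative CUDA kernel: blend two grayscale images pixel by pixel.
// One thread handles one pixel; every thread runs the same operation
// on a different data point (data parallelism).
__global__ void blendKernel(const unsigned char *imgA,
                            const unsigned char *imgB,
                            unsigned char *out,
                            int numPixels, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numPixels) {
        // Weighted average of the two source pixels.
        out[i] = (unsigned char)(alpha * imgA[i] + (1.0f - alpha) * imgB[i]);
    }
}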
Given the raw processing power these engineering choices produced, developers have long wanted to harness GPUs to build more responsive applications. Doing so, however, was daunting for all but the most highly trained and experienced programmers. Seeing any real benefit required strong knowledge of the low-level details, and a developer who lacked expertise in the processing they were trying to accelerate could end up with exactly the opposite effect. Because of these challenges, software alternatives slowly began to appear to make the task of leveraging this power less daunting. In this article we look at General Purpose Graphics Processing Unit (GPGPU) programming. We examine language abstractions such as CUDA and OpenCL to see how accessible this approach is becoming, and we consider the possible performance gains and how they compare to evolving solutions that still rely on CPUs. Lastly, we give our thoughts on the future of this area of study based on our assessments and our expectation for its place in the industry.
History
Software
In the nineties, three-dimensional rendering was in high demand as the next step in graphical performance. In response, programming interfaces called OpenGL and DirectX emerged. OpenGL is a language-independent, platform-independent collection of functions that a client application can call to render 2D and 3D graphics. Because OpenGL is language independent, many third parties write extensions to it or interface with it through their own APIs (JavaScript is one example). DirectX is Microsoft's platform-specific version of the same concept; it made its debut as part of Windows 95 to help make the Windows platform attractive to game developers, who at the time were more likely to write games for DOS because it allowed direct access to the graphics hardware. Both technologies were built with rendering power and efficiency as their central focus.
At the turn of the 21st century, Nvidia and ATI emerged as the two major companies in the GPU industry, and both made great strides in giving developers access to the graphics rendering hardware. Nvidia began allowing developers to insert small chunks of code into the graphics pipeline, giving them programmable control over shaders. ATI contributed the next step in this evolution by being the first to support floating-point calculations in hardware, which opened the door to more realistic graphics rendering and modeling. Even with these notable steps, developers remained tightly restricted in how many instructions they could write and which functions were supported; it was enough to entice them, but they were still quite limited.
Even with this progress, developers still needed extensive knowledge of how to manipulate the graphics hardware and of the algorithmic calculations required to accomplish their goals. In short, GPGPU was possible, but it was not very accessible. In 2006, Nvidia introduced CUDA (Compute Unified Device Architecture), which revolutionized development for GPUs and made GPGPU a practical reality for the technology industry. CUDA introduced a true parallel computing platform that gave developers access to the GPU's memory and computational processes[wiki], and it exposed that access through extensions to the C programming language, giving the development community an interface it was already familiar with. The drawback to Nvidia's CUDA is that it is not hardware agnostic; it was built specifically for Nvidia GPUs. This limitation was addressed in 2008 when OpenCL was released. Originally a project at Apple, it was submitted to the Khronos Group for completion. It is a completely open standard that has been adopted by many high-profile technology leaders, including Intel and AMD. OpenCL is a collection of APIs that provides data parallelism and is hardware agnostic.
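To give a sense of what those C extensions look like in practice, the following is an illustrative host-side sketch that launches the blendKernel shown earlier. It assumes the CUDA runtime API; the buffer names, the fixed block size of 256 threads, and the blendOnGpu wrapper are hypothetical choices made for this example, not part of any specific CUDA tutorial or the cited sources.

// Illustrative host code using CUDA's C extensions: device memory
// management via the runtime API and the <<<grid, block>>> launch syntax.
#include <cuda_runtime.h>

void blendOnGpu(const unsigned char *hostA, const unsigned char *hostB,
                unsigned char *hostOut, int numPixels)
{
    unsigned char *devA, *devB, *devOut;
    size_t bytes = numPixels * sizeof(unsigned char);

    // Allocate device memory and copy the input images to the GPU.
    cudaMalloc((void **)&devA, bytes);
    cudaMalloc((void **)&devB, bytes);
    cudaMalloc((void **)&devOut, bytes);
    cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice);

    // Launch one thread per pixel; <<<blocks, threadsPerBlock>>> is the
    // kernel-launch extension CUDA adds to C.
    int threadsPerBlock = 256;
    int blocks = (numPixels + threadsPerBlock - 1) / threadsPerBlock;
    blendKernel<<<blocks, threadsPerBlock>>>(devA, devB, devOut, numPixels, 0.5f);

    // Copy the blended result back to the host and release device memory.
    cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devOut);
}

The sketch shows why CUDA felt familiar to C programmers: aside from the explicit memory transfers and the launch syntax, the code reads like ordinary C.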