CSC 456 Fall 2013/1d vb

From Expertiza_Wiki
Latest revision as of 16:24, 17 September 2013

Trends in Pipelining

Introduction

Computing architectures have changed greatly over the relatively short span of a few decades. In pursuit of good performance at an economical cost, processor architectures have taken many forms. Over the years there have been many trends in specific processor characteristics such as pipeline length. Some changes were driven by the technological limitations of their time, while others were based on theoretical and real-world performance research.

Factors Favoring Longer Pipeline Length

With each tick of the clock, every instruction in the pipeline advances by one stage. A longer pipeline divides the same work into more stages, so each individual stage can be very short. Because each stage does less work and therefore takes less time, the clock period can be shorter and the clock speed much higher.
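The link between pipeline depth and clock speed can be sketched with a simple delay model. This is a minimal illustration, assuming that a fixed amount of logic delay is split evenly across the stages and that every stage adds a constant latch overhead; the constants `TOTAL_LOGIC_PS` and `LATCH_OVERHEAD_PS` are hypothetical values, not measurements of any real processor.

```python
# Simplified model (an assumption): a fixed amount of logic is divided evenly
# across p stages, and every stage also pays a fixed latch (register) overhead.
# Both delay values are hypothetical, chosen only for illustration.
TOTAL_LOGIC_PS = 3000.0   # assumed total logic delay of one instruction, in picoseconds
LATCH_OVERHEAD_PS = 50.0  # assumed per-stage latch overhead, in picoseconds


def cycle_time_ps(stages: int) -> float:
    """Clock period when the logic is split into `stages` equal pieces."""
    return TOTAL_LOGIC_PS / stages + LATCH_OVERHEAD_PS


def clock_ghz(stages: int) -> float:
    """Clock frequency in GHz (1000 ps per ns)."""
    return 1000.0 / cycle_time_ps(stages)


for p in (5, 10, 20, 40):
    print(f"{p:2d} stages: period {cycle_time_ps(p):6.1f} ps, clock {clock_ghz(p):.2f} GHz")
```

With these assumed values the clock period falls from 650 ps at 5 stages to 125 ps at 40 stages. Doubling the depth from 20 to 40 stages, however, raises the clock only from 5 GHz to 8 GHz, because the fixed latch overhead becomes a growing share of each cycle.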

Longer pipelines have a secondary effect as well: the resulting higher clock speed can be used as a marketing point. The average user does not understand the metrics of raw processor power, but comparing two numbers such as 2.9 GHz vs. 3.4 GHz is a simple way in which many people attempt to understand different processors.

Factors Favoring Shorter Pipeline Length

The main drawback of a longer pipeline is the cost of incorrect branch predictions. When a branch goes a different way than predicted, the instructions fetched down the wrong path must be discarded, and the longer the pipeline, the more stages of work are wasted. Decreasing the pipeline length has resulted in lower clock frequencies but equal or better IPC (instructions per cycle): a shorter pipeline loses less work on every bad prediction, so overall performance can improve. As with all processor properties, there is no single "best" pipeline length; performance follows a bell-shaped curve whose peak marks the most effective pipeline length for a given design.
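The trade-off between a faster clock and a costlier misprediction can be illustrated with a simplified analytic model. It is in the spirit of the pipeline-depth studies cited here but is not the exact model of any of them: it assumes the logic delay is split evenly across p stages, each stage adds a fixed latch overhead, and every misprediction flushes roughly p stages. `TOTAL_LOGIC_PS`, `LATCH_OVERHEAD_PS` and `MISPREDICTS_PER_INSTR` are hypothetical values.

```python
import math

# Simplified model (an assumption, not the exact model of any cited paper):
#   cycle time    = TOTAL_LOGIC_PS / p + LATCH_OVERHEAD_PS
#   CPI           = 1 + MISPREDICTS_PER_INSTR * p   (each misprediction flushes ~p stages)
#   time per inst = cycle time * CPI
# All parameter values are hypothetical.
TOTAL_LOGIC_PS = 1800.0
LATCH_OVERHEAD_PS = 100.0
MISPREDICTS_PER_INSTR = 0.02  # e.g. 20% of instructions are branches, 10% of those mispredicted


def time_per_instruction_ps(p: int) -> float:
    """Average time per instruction for a pipeline with p stages."""
    cycle = TOTAL_LOGIC_PS / p + LATCH_OVERHEAD_PS
    cpi = 1.0 + MISPREDICTS_PER_INSTR * p
    return cycle * cpi


def best_depth(max_stages: int = 100) -> int:
    """Depth with the lowest time per instruction, found by brute-force sweep."""
    return min(range(1, max_stages + 1), key=time_per_instruction_ps)


# Expanding the product gives a/p + a*h + b + b*h*p; setting its derivative
# -a/p**2 + b*h to zero yields p* = sqrt(a / (b * h)).
analytic_opt = math.sqrt(TOTAL_LOGIC_PS / (LATCH_OVERHEAD_PS * MISPREDICTS_PER_INSTR))

for p in (10, 30, 60):
    print(f"{p:2d} stages: {time_per_instruction_ps(p):.0f} ps per instruction")
print("sweep optimum:", best_depth(), "analytic optimum:", round(analytic_opt, 1))
```

With these values the sweep and the closed form agree on 30 stages. Time per instruction is about 336 ps at 10 stages, 256 ps at 30 and 286 ps at 60, so performance (its reciprocal) rises and then falls, which is the bell-shaped behaviour described above.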

Latch delays also play a role as pipeline length increases. According to Magoshi and Murakami, when a processor already has more than 90% of the performance-optimal number of stages, adding further stages does not increase performance unless the increasable latch overhead time is less than the overhead time. [5]
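How flat performance becomes near the optimal depth, and how latch overhead shifts that optimum, can be illustrated with a simplified analytic model. This is a hypothetical sketch, not a reproduction of Magoshi and Murakami's analysis: it assumes time per instruction is the cycle time `TOTAL_LOGIC_PS / p + latch_ps` multiplied by a CPI of `1 + MISPREDICTS_PER_INSTR * p`, and all constants are invented for illustration.

```python
import math

# Hypothetical parameters for illustration only.
TOTAL_LOGIC_PS = 1800.0       # assumed total logic delay per instruction
MISPREDICTS_PER_INSTR = 0.02  # assumed mispredictions per instruction


def time_per_instruction_ps(p: float, latch_ps: float) -> float:
    """Cycle time multiplied by CPI, assuming each misprediction costs about p cycles."""
    return (TOTAL_LOGIC_PS / p + latch_ps) * (1.0 + MISPREDICTS_PER_INSTR * p)


def optimal_depth(latch_ps: float) -> float:
    """Closed-form minimum of time_per_instruction_ps over p."""
    return math.sqrt(TOTAL_LOGIC_PS / (latch_ps * MISPREDICTS_PER_INSTR))


for latch_ps in (50.0, 100.0, 200.0):
    p_opt = optimal_depth(latch_ps)
    t_opt = time_per_instruction_ps(p_opt, latch_ps)
    # Time per instruction at 90% of the optimal depth, relative to the optimum.
    t_90 = time_per_instruction_ps(0.9 * p_opt, latch_ps)
    gain_pct = (t_90 / t_opt - 1.0) * 100.0
    print(f"latch {latch_ps:5.1f} ps: optimum {p_opt:4.1f} stages, "
          f"{t_opt:5.1f} ps/instr, gain from 90% to 100% of optimum {gain_pct:.2f}%")
```

In this sketch, going from 90% of the optimal depth to the optimum improves time per instruction by only about a quarter of a percent for every latch overhead tried, while halving the latch overhead from 100 ps to 50 ps deepens the optimum from 30 to about 42 stages and cuts time per instruction from 256 ps to about 171 ps.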

Power Consumption vs. Performance

Along with performance, power is another concern in microarchitectural design, and multiple studies have examined the optimal pipeline depth when both performance and power are considered. Srinivasan stated that the majority of power is consumed by latches, through both clocking and per-latch leakage, and that the number of latches grows superlinearly with the number of pipeline stages. Because power therefore rises faster than performance as stages are added, the combined power/performance metric peaks at a shallower depth than performance alone, as illustrated in the graph.
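The effect of superlinear latch growth on the power-aware optimum can be sketched with a toy model. Everything here is an assumption for illustration, not Srinivasan's model: latch count grows as `p ** LATCH_GROWTH_EXP`, latch power is clock-proportional switching plus leakage, and designs are ranked both by raw BIPS and by BIPS^3/W, a power-performance metric often used for high-performance processors. All constants are hypothetical.

```python
# Hypothetical parameters; none of these values come from the cited studies.
TOTAL_LOGIC_PS = 1800.0       # total logic delay per instruction
LATCH_OVERHEAD_PS = 100.0     # delay added by each stage's latch
MISPREDICTS_PER_INSTR = 0.02  # each misprediction is assumed to cost about p cycles
LATCH_GROWTH_EXP = 1.3        # latch count grows as p**1.3 (superlinear, an assumption)
FIXED_POWER_W = 20.0          # power not tied to pipeline latches
DYN_W_PER_GHZ = 0.1           # clocking/switching power per latch unit per GHz
LEAK_W = 0.05                 # leakage power per latch unit


def clock_ghz(p: int) -> float:
    """Clock frequency for p stages: 1000 / cycle time in ps."""
    return 1000.0 / (TOTAL_LOGIC_PS / p + LATCH_OVERHEAD_PS)


def bips(p: int) -> float:
    """Billions of instructions per second = clock (GHz) / CPI."""
    cpi = 1.0 + MISPREDICTS_PER_INSTR * p
    return clock_ghz(p) / cpi


def power_w(p: int) -> float:
    """Fixed power plus latch switching (scales with clock) and latch leakage."""
    latch_units = p ** LATCH_GROWTH_EXP
    return FIXED_POWER_W + latch_units * (DYN_W_PER_GHZ * clock_ghz(p) + LEAK_W)


def bips3_per_watt(p: int) -> float:
    return bips(p) ** 3 / power_w(p)


depths = range(1, 61)
perf_opt = max(depths, key=bips)
power_perf_opt = max(depths, key=bips3_per_watt)
print("performance-only optimum:", perf_opt, "stages")
print("BIPS^3/W optimum:", power_perf_opt, "stages")
```

With these made-up numbers the performance-only optimum is 30 stages while the BIPS^3/W optimum lands in the mid-teens: going from 15 to 30 stages raises BIPS by roughly 12% but roughly doubles power.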

Examples of Pipeline changes in Different Processors

Pipeline Specifications of Selected Systems
Year | Name | Pipeline Length (stages) | Number of Pipelines
1976 | Cray 1 | 3 [2] | 12
2006 | IBM Cell BE | 23 [6] | 16
2012 | Cray XK7 | 12 (scalar), 17 (vector) [3] | 6

A visual example of the Cell pipeline is available at http://www.ibm.com/developerworks/power/library/pa-cellperf/figure2.gif

Sources

  1. The AMD Opteron Processor for Multiprocessor Servers. Chetana N. Keltcher, Kevin J. McGrath, Ardsher Ahmed, Pat Conway.
  2. Cray 1. Wikipedia.
  3. Cray XK7. Wikipedia.
  4. The Optimum Pipeline Depth for a Microprocessor. A. Hartstein, Thomas R. Puzak.
  5. An Analysis about Increasable Latch Overhead Time for Processor Pipeline Depth Increase. Magoshi & Murakami.
  6. Cell Broadband Engine Architecture. Dr. Thomas Chen, Dr. Ram Raghavan, Jason Dale, Eiji Iwata.