CSC 456 Spring 2012/ch4b: Difference between revisions

Revision as of 19:42, 19 March 2012

Gustafson's Law

In 1985, IBM scientist Alan Karp issued a challenge to anyone who could demonstrate a speedup of over 200 times. "Karp's Challenge", as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell raised the stakes by offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to achieve a speedup close to ten times, and expected that it would be difficult to advance beyond that.

John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumes that the serial fraction of a problem stays constant regardless of problem size. Gustafson realized that when the problem size is scaled up in proportion to the number of processors, the non-parallelizable fraction of the work decreases: big machines are used for big problems, and in a bigger problem the serial code is a smaller portion of the total work, which leaves more work for the processors to parallelize. This observation provided the basis of what became known as "Gustafson's Law".

SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9CKarp+Challenge%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9CKarp%20Challenge%E2%80%9D&f=false
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182

Derivation from Amdahl's Law

Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -> Assumes a fixed problem size (T1 = 1)
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -> Assumes a fixed execution time (Tparallel = 1)
How to calculate T1?

Examine the work graph:
Tparallel =
[s][1-s  ]
   [1-s  ]
     ...
   [1-s  ]
Total execution time: s+(1-s) = 1 = Tparallel
Serial fraction: s = 0.3 (3 of 10 units)

T1 =
[s][1-s  ][1-s  ] ... [1-s  ]
By inspection, the execution time is a single serial portion + p parallel portions.
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p

Gustafson's speedup(p) = (s + p*(1-s)) / (s+(1-s)) = s + p*(1-s) = p + s - p*s
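
As a rough numerical check of the two formulas above, the following Python sketch evaluates both for the example serial fraction s = 0.3. The function names are illustrative only and do not come from the sources.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Fixed problem size: T1 = 1, Tparallel = s + (1 - s) / p."""
    return 1.0 / (s + (1.0 - s) / p)


def gustafson_speedup(s: float, p: int) -> float:
    """Fixed execution time: Tparallel = 1, T1 = s + p * (1 - s)."""
    return s + p * (1.0 - s)


if __name__ == "__main__":
    s = 0.3  # serial fraction taken from the work graph example
    for p in (1, 4, 16, 64):
        print(f"p={p:3d}  Amdahl={amdahl_speedup(s, p):6.2f}  "
              f"Gustafson={gustafson_speedup(s, p):6.2f}")
```

Under Amdahl's assumption the speedup can never exceed 1/s (about 3.33 for s = 0.3), while under Gustafson's assumption it keeps growing linearly with p.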

Superlinear Speedup

If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counterintuitive, and is a controversial topic known as "superlinear speedup".

Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the entire working set can be held in cache, so the program runs much faster while doing the same amount of work.
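
The cache effect can be illustrated with a deliberately crude cost model. In this sketch every number (a 256 KB private cache per processor, a 1024 KB working set, and average access costs of 1.0 and 1.25 time units) is a hypothetical value chosen for illustration, and the all-or-nothing "fits in cache" rule is an assumption rather than a description of real hardware.

```python
def run_time(accesses: int, working_set_kb: float, p: int,
             cache_kb: float = 256.0, hit_cost: float = 1.0,
             miss_cost: float = 1.25) -> float:
    """Toy model: each processor handles 1/p of the accesses and of the data.

    If a processor's share of the working set fits in its private cache,
    every access costs hit_cost; otherwise every access costs miss_cost.
    """
    share_kb = working_set_kb / p
    cost = hit_cost if share_kb <= cache_kb else miss_cost
    return (accesses / p) * cost


def speedup(accesses: int, working_set_kb: float, p: int) -> float:
    """Speedup of p processors over one processor under the toy model."""
    return run_time(accesses, working_set_kb, 1) / run_time(accesses, working_set_kb, p)


if __name__ == "__main__":
    for p in (1, 2, 4):
        print(p, speedup(1_000_000, 1024.0, p))
```

With these assumed numbers, two processors still spill out of cache and reach a speedup of 2, while four processors each hold their 256 KB share in cache and reach a speedup of 5 while performing the same number of accesses in total.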

Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than execution on a uniprocessor system. This can be achieved through clever use of algorithms that reduce the problem size, resulting in less total work.
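
One commonly cited illustration of "less total work" is a search that stops as soon as any processor finds the target. The sketch below simulates this in plain Python with lock-step workers rather than real threads; the helper names, the data, and the target position are all assumptions made for the example, and it is not taken from the sources.

```python
def serial_steps(data, target):
    """Comparisons made by a left-to-right serial scan until the target is found."""
    for steps, value in enumerate(data, start=1):
        if value == target:
            return steps
    return len(data)


def parallel_steps(data, target, p):
    """Simulated lock-step scan: p workers each scan one contiguous chunk.

    Returns the number of time steps until any worker finds the target,
    at which point all workers are assumed to stop.
    """
    chunk = -(-len(data) // p)  # ceiling division
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    for step in range(chunk):
        for c in chunks:
            if step < len(c) and c[step] == target:
                return step + 1
    return chunk


if __name__ == "__main__":
    data = list(range(1000))
    target = 760  # hypothetical position, near the start of the last chunk
    t1 = serial_steps(data, target)
    t4 = parallel_steps(data, target, 4)
    print(t1, t4, t1 / t4)
```

Here the serial scan needs 761 comparisons, while four workers stop after 11 steps (at most 44 comparisons in total), so the measured speedup is far above 4. The effect depends entirely on where the target lies: if it sits at index 0, both versions finish in one step and there is no speedup at all.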

Lack of a Serial Equivalent

However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache large enough to hold a problem's working set, the cache-based route to superlinear speedup relies on parallel execution, which a single processor is incapable of. Also, the monetary cost of such a cache would be so high as to make this implementation practically infeasible.

Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. For example, this serial algorithm could be parallelized: both versions would do the same amount of work (less than the original problem size), but the parallel version would still finish much faster, achieving the superlinear speedup.

Sources:
[1] http://books.google.com/books?id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9Cpublished+speedups%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9Cpublished%20speedups%E2%80%9D
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182