CSC 456 Spring 2012/ch4b: Difference between revisions

Revision as of 18:03, 19 March 2012

Gustafson's Law

In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. "Karp's Challenge", as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.

John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as "Gustafson's Law".

SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9CKarp+Challenge%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9CKarp%20Challenge%E2%80%9D&f=false SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182

Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -> Assumes a fixed problem size (T1 = 1)
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -> Assumes a fixed execution time (Tparallel = 1)
How to calculate T1?

Examine the work graph:
Tparallel =
[s][1-s  ]
   [1-s  ]
     ...
   [1-s  ]
Total execution time: s+(1-s) = 1 = Tparallel
Serial fraction: s = 0.3 (3 of 10 units)

T1 =
[s][1-s  ][1-s  ] ... [1-s  ]
By inspection, the execution time is a single serial portion + p parallel portions.
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p

Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s

Superlinear Speedup

If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as "superlinear speedup".

Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.

Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work.

Sources:
[1] http://books.google.com/books? id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9Cpublished+speedups%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9Cpublished%20speedups%E2%80%9D
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182

@@ Line 30: / Line 30: @@
 </pre>
 Superlinear Speedup
+If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as "superlinear speedup".
+Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.
+Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work.
 Sources:<br>
 [1] http://books.google.com/books? id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9Cpublished+speedups%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9Cpublished%20speedups%E2%80%9D<br>
 [2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182

CSC 456 Spring 2012/ch4b: Difference between revisions

Revision as of 18:03, 19 March 2012

Navigation menu

Search