CSC 456 Spring 2012/ch4b: Difference between revisions
No edit summary |
(the source is copyright me, pzwong, 2012) |
||
(3 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
==Gustafson's Law== | ==Gustafson's Law== | ||
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.<ref name="karp" /> "Karp's Challenge", as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold | In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.<ref name="karp" /> "Karp's Challenge", as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold <ref name="published speedup" />, and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that. | ||
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program.<ref name="IBM" /> He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). Gustafson referred to this as [http://en.wikipedia.org/wiki/Gustafson%27s_law scaled speedup]. This provided the basis of what became known as "Gustafson's Law". | John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program.<ref name="IBM" /> He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). Gustafson referred to this as [http://en.wikipedia.org/wiki/Gustafson%27s_law scaled speedup]. This provided the basis of what became known as "Gustafson's Law". | ||
===Derivation from Amdahl's Law=== | ===Derivation from Amdahl's Law=== | ||
Amdahl's law assumes that the problem size is constant. A different expression, Gustafson's law, can be derived if the total time used to execute a workload is kept constant instead.<ref name="derivation" /> This source contains the derivation. | |||
==Superlinear Speedup== | ==Superlinear Speedup== | ||
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial | If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as [http://en.wikipedia.org/wiki/Speedup#Super_linear_speedup superlinear speedup]. This controversy is the product of differences in interpretations of the basis of what constitutes speedup, making this an ideological disagreement.<ref name = "controversy" /> | ||
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.<ref name="SS" /> | Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.<ref name="SS" /> |
Latest revision as of 18:09, 26 March 2012
Gustafson's Law
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.<ref name="karp" /> "Karp's Challenge", as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold <ref name="published speedup" />, and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program.<ref name="IBM" /> He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). Gustafson referred to this as scaled speedup. This provided the basis of what became known as "Gustafson's Law".
Derivation from Amdahl's Law
Amdahl's law assumes that the problem size is constant. A different expression, Gustafson's law, can be derived if the total time used to execute a workload is kept constant instead.<ref name="derivation" /> This source contains the derivation.
Superlinear Speedup
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as superlinear speedup. This controversy is the product of differences in interpretations of the basis of what constitutes speedup, making this an ideological disagreement.<ref name = "controversy" />
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.<ref name="SS" />
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work.
Lack of a Serial Equivalent
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.
References
<references> <ref name="karp">http://books.google.com/books?id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9CKarp+Challenge%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9CKarp%20Challenge%E2%80%9D&f=false</ref> <ref name="IBM">http://techresearch.intel.com/ResearcherDetails.aspx?Id=182</ref> <ref name="published speedup">http://books.google.com/books?id=Hm6LaufVKFEC&pg=PA55&lpg=PA55&dq=%E2%80%9Cpublished+speedups%E2%80%9D&source=bl&ots=uCAOgSzfmR&sig=KpvmL85rJHqoFuBZlXNL_e_thbs&hl=en&sa=X&ei=ZNRgT4HxL4KatweYz5y7BQ&ved=0CFAQ6AEwBw#v=onepage&q=%E2%80%9Cpublished%20speedups%E2%80%9D</ref> <ref name="SS">http://en.wikipedia.org/wiki/Speedup#Super_linear_speedup</ref> <ref name="derivation">http://expertiza.csc.ncsu.edu/wiki/images/5/5f/Ohai.png</ref> <ref name = "controversy">http://www.sciencedirect.com/science/article/pii/0167819187900536</ref> </references>