<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ccoffey</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ccoffey"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Ccoffey"/>
	<updated>2026-05-13T16:18:23Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61873</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61873"/>
		<updated>2012-04-13T19:42:16Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two] [[How 'bout IBM's large systems--Blue Gene, etc]].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. Fujitsu has their own proprietary network, known as the &amp;quot;Tofu Interconnect&amp;quot;. It is a six-dimensional [http://en.wikipedia.org/wiki/Mesh_topology mesh]/[http://en.wikipedia.org/wiki/Torus_interconnect torus] topology. Each set of 12 nodes is called a &amp;quot;node group&amp;quot; and is considered the unit of job allocation. Each node group is connected to adjacent node groups via a three-dimensional torus network. Additionally, the nodes within each node group are adjacently connection via their own three-dimensional mesh/torus. &amp;lt;ref name=&amp;quot;kpdf&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;ktofu&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;  [[What topology?  Surely not 95^2 links!]]&lt;br /&gt;
&lt;br /&gt;
The K Computer is not a [http://en.wikipedia.org/wiki/Distributed_shared_memory distributed shared memory] (DSM) machine in which the physically separate nodes are addressed as one logically shared address space. Instead, the K Computer utilizes a [http://en.wikipedia.org/wiki/Message_Passing_Interface message passing interface] (MPI), allowing the nodes to pass messages to one another as needed.&lt;br /&gt;
&lt;br /&gt;
==Tianhe-1A==&lt;br /&gt;
The Tianhe-1A, sponsored by the National University of Defense Technology in China, is capable of 4.701 petaFLOPS. It is comprised of 14,336 Xeon X5670 processors and 7,168 Nvidia GP-GPUs. In addition to the Xeon and Nvidia chips, there are 2048 FeiTeng 1000 processors.&lt;br /&gt;
&lt;br /&gt;
All of these processors are contained in 112 computer cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. In each computer cabinet are 4 racks with 8 blades each and a 16 port switch. A single blade contains 2 computer nodes each containing 2 Xeon processors and 1 Nvidia GPU. This comes to a total of 3584 blades. These individual nodes are connected using a high-speed interconnect called Arch, which has a bandwidth of 160 Gbps.&lt;br /&gt;
&lt;br /&gt;
The Arch interconnect uses point-to-point connections in a hybrid fat tree configuration.&lt;br /&gt;
&lt;br /&gt;
The system uses message passing rather than shared memory, so neither a system-wide cache coherency protocol nor a memory consistency protocol is necessary.&lt;br /&gt;
&lt;br /&gt;
[[Maybe you could make a table of characteristics of these supercomputers ... you could use top500 as a starting point, and add more detailed info on architecture ... though that might be hard to obtain for some.]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kpdf&amp;quot;&amp;gt;http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ktofu&amp;quot;&amp;gt;http://www.fujitsu.com/global/about/tech/k/whatis/network/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61870</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61870"/>
		<updated>2012-04-13T19:35:56Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two] [[How 'bout IBM's large systems--Blue Gene, etc]].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. Fujitsu has their own proprietary network, known as the &amp;quot;Tofu Interconnect&amp;quot;. It is a six-dimensional [http://en.wikipedia.org/wiki/Mesh_topology mesh]/[http://en.wikipedia.org/wiki/Torus_interconnect torus] topology. &amp;lt;ref name=&lt;br /&gt;
&amp;quot;kpdf&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;ktofu&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;  [[What topology?  Surely not 95^2 links!]]&lt;br /&gt;
&lt;br /&gt;
The K Computer is not a [http://en.wikipedia.org/wiki/Distributed_shared_memory distributed shared memory] (DSM) machine in which the physically separate nodes are addressed as one logically shared address space. Instead, the K Computer utilizes a [http://en.wikipedia.org/wiki/Message_Passing_Interface message passing interface] (MPI), allowing the nodes to pass messages to one another as needed.&lt;br /&gt;
&lt;br /&gt;
==Tianhe-1A==&lt;br /&gt;
The Tianhe-1A, sponsored by the National University of Defense Technology in China, is capable of 4.701 petaFLOPS. It is comprised of 14,336 Xeon X5670 processors and 7,168 Nvidia GP-GPUs. In addition to the Xeon and Nvidia chips, there are 2048 FeiTeng 1000 processors.&lt;br /&gt;
&lt;br /&gt;
All of these processors are contained in 112 computer cabinets, 12 storage cabinets, 6 communication cabinets, and 8 I/O cabinets. In each computer cabinet are 4 racks with 8 blades each and a 16 port switch. A single blade contains 2 computer nodes each containing 2 Xeon processors and 1 Nvidia GPU. This comes to a total of 3584 blades. These individual nodes are connected using a high-speed interconnect called Arch, which has a bandwidth of 160 Gbps.&lt;br /&gt;
&lt;br /&gt;
The Arch interconnect uses point-to-point connections in a hybrid fat tree configuration.&lt;br /&gt;
&lt;br /&gt;
The system uses message passing rather than shared memory, so neither a system-wide cache coherency protocol nor a memory consistency protocol is necessary.&lt;br /&gt;
&lt;br /&gt;
[[Maybe you could make a table of characteristics of these supercomputers ... you could use top500 as a starting point, and add more detailed info on architecture ... though that might be hard to obtain for some.]]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kpdf&amp;quot;&amp;gt;http://www.fujitsu.com/downloads/TC/sc10/interconnect-of-k-computer.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ktofu&amp;quot;&amp;gt;http://www.fujitsu.com/global/about/tech/k/whatis/network/&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61771</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61771"/>
		<updated>2012-04-11T15:53:58Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The K Computer is not a [http://en.wikipedia.org/wiki/Distributed_shared_memory distributed shared memory] (DSM) machine in which the physically separate nodes are addressed as one logically shared address space. Instead, the K Computer utilizes a [http://en.wikipedia.org/wiki/Message_Passing_Interface message passing interface] (MPI), allowing the nodes to pass messages to one another as needed.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61770</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61770"/>
		<updated>2012-04-11T15:44:10Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61769</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61769"/>
		<updated>2012-04-11T15:44:02Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61768</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61768"/>
		<updated>2012-04-11T15:43:05Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61767</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61767"/>
		<updated>2012-04-11T15:42:48Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Comp&lt;br /&gt;
uter consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61766</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61766"/>
		<updated>2012-04-11T15:42:18Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
The system is networked together via [http://en.wikipedia.org/wiki/Point-to-point_(network_topology)#Point-to-point point-to-point], or direct, connection. &amp;lt;ref name=&amp;quot;knetwork&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;knetwork&amp;quot;&amp;gt;http://www.riken.jp/engn/r-world/info/release/pamphlet/aics/pdf/2010_09.pdf&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61765</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61765"/>
		<updated>2012-04-11T15:33:18Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61764</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61764"/>
		<updated>2012-04-11T15:33:04Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
==K Computer==&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;kprocs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/K_computer&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61763</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61763"/>
		<updated>2012-04-11T15:31:04Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
=K Computer=&lt;br /&gt;
&lt;br /&gt;
Made by [http://www.fujitsu.com/global/ Fujitsu], the K Computer consists of 88,128 processors between 864 cabinets. Each cabinet contains 96 nodes which, in turn, each contain one processor and 16 GBytes of memory. &amp;lt;ref name=&amp;quot;kprocs&amp;quot;/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61761</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61761"/>
		<updated>2012-04-11T15:28:57Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
=K Computer=&lt;br /&gt;
Maker: [http://www.fujitsu.com/global/ Fujitsu]&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61760</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61760"/>
		<updated>2012-04-11T15:28:40Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
=K Computer=&lt;br /&gt;
Maker: [[http://www.fujitsu.com/global/ Fujitsu]]&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61759</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61759"/>
		<updated>2012-04-11T15:27:37Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessor Examples==&lt;br /&gt;
&lt;br /&gt;
Some examples of large-scale multiprocessor systems include Fujitsu's K Computer, the Tianhe-1A from the National Supercomputer Center in Tianjin, China, and [another example or two].&lt;br /&gt;
&lt;br /&gt;
=K Computer=&lt;br /&gt;
Maker: Fujitsu&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61758</id>
		<title>CSC 456 Spring 2012/11a NC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/11a_NC&amp;diff=61758"/>
		<updated>2012-04-11T15:23:38Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: Created page with &amp;quot;==Large-Scale Multiprocessors==&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Large-Scale Multiprocessors==&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=61757</id>
		<title>CSC 456 Spring 2011</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=61757"/>
		<updated>2012-04-11T15:23:09Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Chapter 1: Nick Nicholls, Albert Chu]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 4a: Brandon Chisholm, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Joshua Mohundro, Patrick Wong]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Allison Hamann, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch1 BC]] &amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 MN]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 AA]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch4b]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/10a AJ]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/11a AB]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/11a NC]]&amp;lt;br/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60066</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60066"/>
		<updated>2012-03-19T19:56:25Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Gustafson's Law==&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.&amp;lt;ref name=&amp;quot;karp&amp;quot; /&amp;gt; &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [&amp;lt;ref name=&amp;quot;published speedup&amp;quot; /&amp;gt;, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program.&amp;lt;ref name=&amp;quot;IBM&amp;quot; /&amp;gt; He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
===Derivation from Amdahl's Law===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Superlinear Speedup==&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&amp;lt;ref name=&amp;quot;SS&amp;quot; /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
===Lack of a Serial Equivalent===&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;karp&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;IBM&amp;quot;&amp;gt;http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;published speedup&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;SS&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Speedup#Super_linear_speedup&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60065</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60065"/>
		<updated>2012-03-19T19:54:01Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Gustafson's Law==&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.&amp;lt;ref name=&amp;quot;karp&amp;quot; /&amp;gt; &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [&amp;lt;ref name=&amp;quot;published speedup&amp;quot; /&amp;gt;, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program.&amp;lt;ref name=&amp;quot;IBM&amp;quot; /&amp;gt; He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
===Derivation from Amdahl's Law===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Superlinear Speedup==&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
===Lack of a Serial Equivalent===&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;karp&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;IBM&amp;quot;&amp;gt;http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;published speedup&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60064</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60064"/>
		<updated>2012-03-19T19:49:59Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Gustafson's Law==&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times.&amp;lt;ref name=&amp;quot;karp&amp;quot; /&amp;gt; &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&amp;lt;ref name=&amp;quot;published speedup&amp;quot; /&amp;gt;&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
===Derivation from Amdahl's Law===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Superlinear Speedup==&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
===Lack of a Serial Equivalent===&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;karp&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;published speedup&amp;quot;&amp;gt;http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60063</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60063"/>
		<updated>2012-03-19T19:42:57Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Gustafson's Law==&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
===Derivation from Amdahl's Law===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Superlinear Speedup==&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup can most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
===Lack of a Serial Equivalent===&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60062</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60062"/>
		<updated>2012-03-19T19:17:12Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Superlinear Speedup&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Additionally, a serial algorithm could be constructed that would reduce the total problem size, but it would be much slower than its parallel counterpart. E.g., you could take this serial algorithm and parallelize it and both instances would do the same amount of work (less than the original problem size), but the parallel version would still do it much faster, achieving the superlinear speedup.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60061</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60061"/>
		<updated>2012-03-19T19:05:32Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Superlinear Speedup&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm. While it is possible for one processor to have a cache size large enough to encompass a problem's working set, the usage of cache manipulation in achieving superlinear speedup relies on parallel execution, which a single processor is incapable of doing. Also, the monetary cost of such a cache would be so high as to make this implementation practically unfeasible.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60060</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60060"/>
		<updated>2012-03-19T18:51:28Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Superlinear Speedup&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60059</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60059"/>
		<updated>2012-03-19T18:50:40Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/booksid=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Superlinear Speedup&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60058</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=60058"/>
		<updated>2012-03-19T18:49:34Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold [1, first paragraph, second column, first page], and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/booksid=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Regular speedup(p) = T1 / Tparallel = 1/(s+(1-s)/p) -&amp;gt; Assumes a fixed problem size (T1 = 1)&lt;br /&gt;
Gustafson's speedup(p) = T1 / Tparallel = (T1)/(s+(1-s)) = (T1) -&amp;gt; Assumes a fixed execution time (Tparallel = 1)&lt;br /&gt;
How to calculate T1?&lt;br /&gt;
&lt;br /&gt;
Examine the work graph:&lt;br /&gt;
Tparallel =&lt;br /&gt;
[s][1-s  ]&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
     ...&lt;br /&gt;
   [1-s  ]&lt;br /&gt;
Total execution time: s+(1-s) = 1 = Tparallel&lt;br /&gt;
Serial fraction: s = 0.3 (3 of 10 units)&lt;br /&gt;
&lt;br /&gt;
T1 =&lt;br /&gt;
[s][1-s  ][1-s  ] ... [1-s  ]&lt;br /&gt;
By inspection, the execution time is a single serial portion + p parallel portions.&lt;br /&gt;
Total execution time: s (serial) + p*(1-s) (parallel) = 0.3 + p*(1-0.3) = 0.3+0.7p&lt;br /&gt;
&lt;br /&gt;
Gustafson's speedup(p) = s + p*(1-s) / (s+(1-s)) = p + s - p*s&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Superlinear Speedup&lt;br /&gt;
&lt;br /&gt;
If a problem were 100% parallelizable, then under ideal circumstances one would expect the speedup for a 4-processor system running the same problem to be 4. However, there are cases where such a system might achieve a speedup of, say, 4.3, or 5. This seems counter intuitive, and is a controversial topic known as &amp;quot;superlinear speedup&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Superlinear speedup most easily be attained by taking advantage of the combined cache size of all the processors. If the total cache size is greater than the problem's total working set, the problem can be placed inside the cache and executed much more quickly, allowing faster execution while doing the same amount of work.&lt;br /&gt;
&lt;br /&gt;
Another explanation for superlinear speedup is that the parallel execution of the problem does less total work than a uniprocessor system. This can be done by clever usage of algorithms such that the problem size is reduced, resulted in less total work. &lt;br /&gt;
&lt;br /&gt;
However, it is not possible to serialize the parallel algorithm used in achieving superlinear speedup in order to get a better serial algorithm.&lt;br /&gt;
&lt;br /&gt;
Sources:&amp;lt;br&amp;gt;&lt;br /&gt;
[1] http://books.google.com/books? id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9Cpublished+speedups%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9Cpublished%20speedups%E2%80%9D&amp;lt;br&amp;gt;&lt;br /&gt;
[2] http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=59973</id>
		<title>CSC 456 Spring 2011</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=59973"/>
		<updated>2012-03-19T15:40:00Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Chapter 1: Nick Nicholls, Albert Chu]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 4a: Brandon Chisholm, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Joshua Mohundro, Patrick Wong]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Allison Hamann, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch1 BC]] &amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 MN]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 AA]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch4b]]&amp;lt;br/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=59972</id>
		<title>CSC 456 Spring 2011</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2011&amp;diff=59972"/>
		<updated>2012-03-19T15:39:45Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Chapter 1: Nick Nicholls, Albert Chu]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 4a: Brandon Chisholm, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Joshua Mohundro, Patrick Wong]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[Chapter 6: Allison Hamann, Chris Barile]]&amp;lt;br /&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch1 BC]] &amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 MN]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch7 AA]]&amp;lt;br/&amp;gt;&lt;br /&gt;
[[CSC 456 Spring 2012/ch4b PC]]&amp;lt;br/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=59971</id>
		<title>CSC 456 Spring 2012/ch4b</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch4b&amp;diff=59971"/>
		<updated>2012-03-19T15:37:41Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Gustafson's Law&lt;br /&gt;
&lt;br /&gt;
In 1985, IBM scientist Alan Karp issued a challenge to anyone who could produce a speedup of over 200 times. &amp;quot;Karp's Challenge&amp;quot;, as it became known, highlighted the limitations of Amdahl's Law. Prevailing speedups at the time were less than tenfold, and were for applications with little real-world value. C. Gordon Bell decided to up the ante, offering a $1000 award for the same challenge, issued annually to the winner, but only if the speedup was at least twice that of the previous award. He initially expected the first winner to have a speedup close to ten times, and that it would be difficult to advance beyond that.&lt;br /&gt;
  &lt;br /&gt;
John Gustafson won the 1988 Gordon Bell prize by demonstrating a 1000x speedup on a parallel program. He noticed a limitation in Amdahl's Law, which assumed a constant serial fraction of the problem, regardless of problem size. Gustafson realized that when you scale the problem size up proportional to the number of processors, the non-parallelizable fraction of work decreases (i.e., big machines do big problems, bigger problems means smaller portions of serial code, which means that there is more room for processors to parallelize). This provided the basis of what became known as &amp;quot;Gustafson's Law&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
SOURCE http://books.google.com/books?id=Hm6LaufVKFEC&amp;amp;pg=PA55&amp;amp;lpg=PA55&amp;amp;dq=%E2%80%9CKarp+Challenge%E2%80%9D&amp;amp;source=bl&amp;amp;ots=uCAOgSzfmR&amp;amp;sig=KpvmL85rJHqoFuBZlXNL_e_thbs&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=ZNRgT4HxL4KatweYz5y7BQ&amp;amp;ved=0CFAQ6AEwBw#v=onepage&amp;amp;q=%E2%80%9CKarp%20Challenge%E2%80%9D&amp;amp;f=false&lt;br /&gt;
SOURCE http://techresearch.intel.com/ResearcherDetails.aspx?Id=182&lt;br /&gt;
&lt;br /&gt;
speedup(p) = p - s(p-1)&lt;br /&gt;
&lt;br /&gt;
Superlinear Speedup&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch1_BC&amp;diff=58533</id>
		<title>CSC 456 Spring 2012/ch1 BC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch1_BC&amp;diff=58533"/>
		<updated>2012-02-13T01:53:57Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;From 2006-2012 the increase in the number of transistors on a chip has grown from 167 million to 2.6 billion, a 15x increase.&amp;lt;ref name=&amp;quot;trans count&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From 2006-2012 the clock frequency has increased from 2.4ghz to 5.2, a 2.2x increase.&amp;lt;ref name=&amp;quot;proc chrono&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM now has the 16-core processor Power PC A2, Intel has the 10 core Xeon E7, AMD has the 16 Opteron Interlagos, and Sun has the 8-core Niagara.&amp;lt;ref name=&amp;quot;xeon&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;powerpc&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;amd&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;sun&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Evolution of Intel Processors&amp;lt;ref name=&amp;quot;intel procs&amp;quot;/&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!From&lt;br /&gt;
!Procs&lt;br /&gt;
!Specifications&lt;br /&gt;
!New Features&lt;br /&gt;
|-&lt;br /&gt;
|1971&lt;br /&gt;
|4004&lt;br /&gt;
|740KHz, 2300 transistors, 10 micrometers, 640B addressable memory, 4KB program memory&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|1978&lt;br /&gt;
|8086&lt;br /&gt;
|16-bit, 5-10MHz, 29000 transistors at 3 micrometers, 1MB addressable memory&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|1982&lt;br /&gt;
|80286&lt;br /&gt;
|8-12.5MHz&lt;br /&gt;
|Virtual memory and protection mode&lt;br /&gt;
|-&lt;br /&gt;
|1985&lt;br /&gt;
|386&lt;br /&gt;
|32-bit, 16-33MHz, 275K transistors, 4GB addressable memory&lt;br /&gt;
|Pipelining&lt;br /&gt;
|-&lt;br /&gt;
|1989&lt;br /&gt;
|486&lt;br /&gt;
|25-100MHz, 1.5M transistors&lt;br /&gt;
|FPU integration&lt;br /&gt;
|-&lt;br /&gt;
|1993&lt;br /&gt;
|Pentium&lt;br /&gt;
|60-200MHz&lt;br /&gt;
|On-chip L1 caches and SMP suport&lt;br /&gt;
|-&lt;br /&gt;
|1995&lt;br /&gt;
|Pentium Pro&lt;br /&gt;
|16KB L1 caches, 5.5M transistors&lt;br /&gt;
|OOO execution&lt;br /&gt;
|-&lt;br /&gt;
|1997&lt;br /&gt;
|Pentium MMX&lt;br /&gt;
|233-450MHz, 32KB L1 cache, 4.5M transistors&lt;br /&gt;
|Dynamic branch prediction, MMX instruction sets&lt;br /&gt;
|-&lt;br /&gt;
|1999&lt;br /&gt;
|Pentium III&lt;br /&gt;
|450-1400MHz, 256KB L2 cache on chip, 28M transistors&lt;br /&gt;
|SSE instruction sets&lt;br /&gt;
|-&lt;br /&gt;
|2000&lt;br /&gt;
|Pentium IV&lt;br /&gt;
|1.4-3GHz, 55M transistors&lt;br /&gt;
|Hyperpipelining and SMT&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|2006&lt;br /&gt;
|Xeon&lt;br /&gt;
|64-bit, 2GHz, 167M transistors, 4MB L2 cache on chip&lt;br /&gt;
|Dual-core and virtualization support&lt;br /&gt;
|-&lt;br /&gt;
|'''2008'''&lt;br /&gt;
|'''Intel Core i7'''&lt;br /&gt;
|'''64-bit, 3.2GHz, 730M transistors, 4 core'''&lt;br /&gt;
|&lt;br /&gt;
|-	&lt;br /&gt;
|'''2010'''&lt;br /&gt;
|'''Intel Xeon &amp;quot;Nehalem-EX&amp;quot;'''&lt;br /&gt;
|'''64-bit, 2.66GHz, 2300M transistors, 8 core'''&lt;br /&gt;
|&lt;br /&gt;
|-	&lt;br /&gt;
|'''2011'''&lt;br /&gt;
|'''Intel Xeon E7'''&lt;br /&gt;
|'''64-bit, 2.67GHz, 2600M transistors, 10 core'''&lt;br /&gt;
|'''First Intel chip with 10 processors'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Examples of Current Multicore Processors&amp;lt;ref name=&amp;quot;z196&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;xeon&amp;quot;/&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!Name&lt;br /&gt;
!# Cores&lt;br /&gt;
!Clock Freq&lt;br /&gt;
!Clock Type&lt;br /&gt;
!Caches&lt;br /&gt;
!Chip Power&lt;br /&gt;
|-&lt;br /&gt;
|IBM z196&lt;br /&gt;
|4 cores&lt;br /&gt;
|5.3GHz&lt;br /&gt;
|OOO Superscalar&lt;br /&gt;
|128KB L1, 1.5MB L2, 24MB L3, 192MB L4&lt;br /&gt;
|1800W&lt;br /&gt;
|-&lt;br /&gt;
|Intel Xeon E&lt;br /&gt;
|10 cores&lt;br /&gt;
|2.67GHz&lt;br /&gt;
|SIMD&lt;br /&gt;
|64KB L1, 256KB L2, 30MB L3&lt;br /&gt;
|130W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;trans count&amp;quot;&amp;gt; http://en.wikipedia.org/wiki/Transistor_count &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;proc chrono&amp;quot;&amp;gt; http://en.wikipedia.org/wiki/Microprocessor_chronologyref &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intel procs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/List_of_Intel_microprocessors &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;z196&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/IBM_z196_(microprocessor) &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;xeon&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)#Westmere &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;powerpc&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/PowerPC_A2 &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;amd&amp;quot;&amp;gt;http://www.tomshardware.com/news/interlagos-bulldozer-opteron-16-core-valencia,13984.html &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;sun&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/UltraSPARC_T1 &amp;lt;/ref&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch1_BC&amp;diff=58532</id>
		<title>CSC 456 Spring 2012/ch1 BC</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Spring_2012/ch1_BC&amp;diff=58532"/>
		<updated>2012-02-13T01:52:24Z</updated>

		<summary type="html">&lt;p&gt;Ccoffey: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;From 2006-2012 the increase in the number of transistors on a chip has grown from 167 million to 2.6 billion, a 15x increase.&amp;lt;ref name=&amp;quot;trans count&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From 2006-2012 the clock frequency has increased from 2.4ghz to 5.2, a 2.2x increase.&amp;lt;ref name=&amp;quot;proc chrono&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IBM now has the 16-core processor Power PC A2, Intel has the 10 core Xeon E7, AMD has the 16 Opteron Interlagos, and Sun has the 8-core Niagara.&amp;lt;ref name=&amp;quot;xeon&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;powerpc&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;amd&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;sun&amp;quot;/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Evolution of Intel Processors&amp;lt;ref name=&amp;quot;intel procs&amp;quot;/&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!From&lt;br /&gt;
!Procs&lt;br /&gt;
!Specifications&lt;br /&gt;
!New Features&lt;br /&gt;
|-&lt;br /&gt;
|1971&lt;br /&gt;
|4004&lt;br /&gt;
|740KHz, 2300 transistors, 10 micrometers, 640B addressable memory, 4KB program memory&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|1978&lt;br /&gt;
|8086&lt;br /&gt;
|16-bit, 5-10MHz, 29000 transistors at 3 micrometers, 1MB addressable memory&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|1982&lt;br /&gt;
|80286&lt;br /&gt;
|8-12.5MHz&lt;br /&gt;
|Virtual memory and protection mode&lt;br /&gt;
|-&lt;br /&gt;
|1985&lt;br /&gt;
|386&lt;br /&gt;
|32-bit, 16-33MHz, 275K transistors, 4GB addressable memory&lt;br /&gt;
|Pipelining&lt;br /&gt;
|-&lt;br /&gt;
|1989&lt;br /&gt;
|486&lt;br /&gt;
|25-100MHz, 1.5M transistors&lt;br /&gt;
|FPU integration&lt;br /&gt;
|-&lt;br /&gt;
|1993&lt;br /&gt;
|Pentium&lt;br /&gt;
|60-200MHz&lt;br /&gt;
|On-chip L1 caches and SMP suport&lt;br /&gt;
|-&lt;br /&gt;
|1995&lt;br /&gt;
|Pentium Pro&lt;br /&gt;
|16KB L1 caches, 5.5M transistors&lt;br /&gt;
|OOO execution&lt;br /&gt;
|-&lt;br /&gt;
|1997&lt;br /&gt;
|Pentium MMX&lt;br /&gt;
|233-450MHz, 32KB L1 cache, 4.5M transistors&lt;br /&gt;
|Dynamic branch prediction, MMX instruction sets&lt;br /&gt;
|-&lt;br /&gt;
|1999&lt;br /&gt;
|Pentium III&lt;br /&gt;
|450-1400MHz, 256KB L2 cache on chip, 28M transistors&lt;br /&gt;
|SSE instruction sets&lt;br /&gt;
|-&lt;br /&gt;
|2000&lt;br /&gt;
|Pentium IV&lt;br /&gt;
|1.4-3GHz, 55M transistors&lt;br /&gt;
|Hyperpipelining and SMT&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|2006&lt;br /&gt;
|Xeon&lt;br /&gt;
|64-bit, 2GHz, 167M transistors, 4MB L2 cache on chip&lt;br /&gt;
|Dual-core and virtualization support&lt;br /&gt;
|-&lt;br /&gt;
|'''2008'''&lt;br /&gt;
|'''Intel Core i7'''&lt;br /&gt;
|'''64-bit, 3.2GHz, 730M transistors, 4 core'''&lt;br /&gt;
|&lt;br /&gt;
|-	&lt;br /&gt;
|'''2010'''&lt;br /&gt;
|'''Intel Xeon &amp;quot;Nehalem-EX&amp;quot;'''&lt;br /&gt;
|'''64-bit, 2.66GHz, 2300M transistors, 8 core'''&lt;br /&gt;
|&lt;br /&gt;
|-	&lt;br /&gt;
|'''2011'''&lt;br /&gt;
|'''Intel Xeon E7'''&lt;br /&gt;
|'''64-bit, 2.67GHz, 2600M transistors, 10 core'''&lt;br /&gt;
|'''First Intel chip with 10 processors'''&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!Examples of Current Multicore Processors&amp;lt;ref name=&amp;quot;z196&amp;quot;/&amp;gt;&amp;lt;ref name=&amp;quot;xeon&amp;quot;/&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!Name&lt;br /&gt;
!# Cores&lt;br /&gt;
!Clock Freq&lt;br /&gt;
!Clock Type&lt;br /&gt;
!Caches&lt;br /&gt;
!Chip Power&lt;br /&gt;
|-&lt;br /&gt;
|IBM z196&lt;br /&gt;
|4 cores&lt;br /&gt;
|5.3GHz&lt;br /&gt;
|OOO Superscalar&lt;br /&gt;
|128KB L1, 1.5MB L2, 24MB L3, 192MB L4&lt;br /&gt;
|1800W&lt;br /&gt;
|-&lt;br /&gt;
|Intel Xeon E&lt;br /&gt;
|10 cores&lt;br /&gt;
|2.67GHz&lt;br /&gt;
|SIMD&lt;br /&gt;
|64KB L1, 256KB L2, 30MB L3&lt;br /&gt;
|130W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;trans count&amp;quot;&amp;gt; http://en.wikipedia.org/wiki/Transistor_count&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;proc chrono&amp;quot;&amp;gt; http://en.wikipedia.org/wiki/Microprocessor_chronologyref&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intel procs&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/List_of_Intel_microprocessors&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;z196&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;xeon&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)#Westmere&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;powerpc&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/PowerPC_A2&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;amd&amp;quot;&amp;gt;http://www.tomshardware.com/news/interlagos-bulldozer-opteron-16-core-valencia,13984.html&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;sun&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/UltraSPARC_T1&amp;gt;&lt;br /&gt;
&amp;lt;/references&amp;gt;&lt;/div&gt;</summary>
		<author><name>Ccoffey</name></author>
	</entry>
</feed>