<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cmbeverl</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Cmbeverl"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Cmbeverl"/>
	<updated>2026-05-16T20:11:30Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82528</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82528"/>
		<updated>2013-11-19T17:15:14Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Weather Modeling */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load balancing is used to break up and distribute the workload among individual processors in order to make effective use of processor time. When the workload is divided at compile time, the load is said to be ''statically'' balanced; dividing the workload at run time balances the load ''dynamically''. Static load balancing has lower overhead because the work is divided before the program runs. Dynamic load balancing assigns work as processors become idle, so it incurs greater overhead; however, it can improve overall performance because work is assigned to a processor the moment it becomes idle, reducing the total idle time of the processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
===Static Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which distributes tasks evenly across the available processors. The processors are lined up, and each is handed a task in turn until the assignment wraps back around to the first processor; visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that no consideration is given to job size or processor performance, which can create problems if a processor is unlucky and is repeatedly assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
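A minimal sketch of round-robin assignment in C++ (the function and variable names here are illustrative, not taken from any cited source):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Hand out tasks to processors in circular order, ignoring task size,&lt;br /&gt;
 // much like a dealer passing one card at a time around the table.&lt;br /&gt;
 std::vector&amp;lt;std::size_t&amp;gt; round_robin_assign(std::size_t num_tasks, std::size_t num_processors)&lt;br /&gt;
 {&lt;br /&gt;
    std::vector&amp;lt;std::size_t&amp;gt; assignment(num_tasks);&lt;br /&gt;
    for (std::size_t t = 0; t &amp;lt; num_tasks; ++t)&lt;br /&gt;
    {&lt;br /&gt;
       assignment[t] = t % num_processors;   // processor that receives task t&lt;br /&gt;
    }&lt;br /&gt;
    return assignment;&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;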
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the expectation that, given enough time, workloads are spread evenly by chance. Random assignment is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the generator is called so many times that any bias will have a large effect. Random assignment suffers from the same drawbacks as round robin, though: there is always the chance that a certain processor is picked unusually often, or is handed several large tasks in a short period of time, leaving it overloaded while other processors wait.&lt;br /&gt;
&lt;br /&gt;
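As an illustration only, random assignment can be sketched in C++ using the standard random library (the names are hypothetical):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Send each task to a uniformly random processor; over enough tasks the&lt;br /&gt;
 // expected load evens out, but a short run can still overload one processor.&lt;br /&gt;
 std::vector&amp;lt;std::size_t&amp;gt; random_assign(std::size_t num_tasks, std::size_t num_processors)&lt;br /&gt;
 {&lt;br /&gt;
    std::mt19937 gen(std::random_device{}());&lt;br /&gt;
    std::uniform_int_distribution&amp;lt;std::size_t&amp;gt; pick(0, num_processors - 1);&lt;br /&gt;
    std::vector&amp;lt;std::size_t&amp;gt; assignment(num_tasks);&lt;br /&gt;
    for (std::size_t t = 0; t &amp;lt; num_tasks; ++t)&lt;br /&gt;
    {&lt;br /&gt;
       assignment[t] = pick(gen);&lt;br /&gt;
    }&lt;br /&gt;
    return assignment;&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;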
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects one processor to act as the &amp;quot;central node&amp;quot; that handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. The overhead of this method is distributed differently from the previous schemes: instead of communication among all processors, communication occurs solely between the central node and the other processors. A drawback of the central manager scheme is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds complexity, and a central control node can be inundated by messages from its child nodes, locking up the system and causing large drops in performance. The central manager policy has the advantage that it requires fewer messages to be sent in order to facilitate load balancing, and it greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
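A simplified sketch of the central manager policy, assuming the central node keeps an estimated load per slave processor (all names are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 #include &amp;lt;algorithm&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // The central node tracks an estimated load per processor and hands each&lt;br /&gt;
 // new task to whichever processor currently has the least load.&lt;br /&gt;
 struct CentralManager&lt;br /&gt;
 {&lt;br /&gt;
    std::vector&amp;lt;double&amp;gt; load;   // one entry per slave processor&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    explicit CentralManager(std::size_t num_processors) : load(num_processors, 0.0) {}&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    std::size_t assign(double task_cost)&lt;br /&gt;
    {&lt;br /&gt;
       auto least = std::min_element(load.begin(), load.end());&lt;br /&gt;
       *least += task_cost;      // the chosen processor absorbs this task's cost&lt;br /&gt;
       return static_cast&amp;lt;std::size_t&amp;gt;(least - load.begin());&lt;br /&gt;
    }&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;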
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue workload management, also called distributed workload management, each processor is responsible for maintaining a sufficient workload of its own. When a processor's load drops below a threshold, its load manager sends a request for work to the workload manager of another, randomly chosen processor. The remote load manager receiving the request examines its own workload and, if it has a sufficient surplus, sends work back to the requesting load manager. This scheme is fault tolerant: if any processor fails, the other nodes can continue working, since each still holds its own workload and can still exchange work with the remaining processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communication to maintain a satisfactory workload at every processor.&lt;br /&gt;
&lt;br /&gt;
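A rough sketch of the local queue idea, with the message passing abstracted away and hypothetical threshold values; a manager asks a randomly chosen peer for work when its own queue runs low:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;deque&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { /* application-specific work item */ };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Each processor's load manager keeps a local queue; when the queue runs low&lt;br /&gt;
 // it requests work from a random peer, and the peer donates work only if it&lt;br /&gt;
 // has a comfortable surplus of its own.&lt;br /&gt;
 struct LocalQueueManager&lt;br /&gt;
 {&lt;br /&gt;
    std::deque&amp;lt;Task&amp;gt; queue;&lt;br /&gt;
    static constexpr std::size_t kLowWater  = 4;    // ask for work below this&lt;br /&gt;
    static constexpr std::size_t kHighWater = 16;   // donate work above this&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    bool needs_work() const { return queue.size() &amp;lt; kLowWater; }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    // Runs on the remote manager that received the work request.&lt;br /&gt;
    bool try_donate(Task &amp;amp;out)&lt;br /&gt;
    {&lt;br /&gt;
       if (queue.size() &amp;lt;= kHighWater) return false;   // nothing to spare&lt;br /&gt;
       out = queue.back();&lt;br /&gt;
       queue.pop_back();&lt;br /&gt;
       return true;&lt;br /&gt;
    }&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;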
====Central Queue====&lt;br /&gt;
Under the central queue algorithm, a centralized workload manager is responsible for distributing work to the processors and is aware of all work remaining to be distributed. When a processor's load falls below a threshold, it sends a request for more work to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to satisfy it. In systems with large numbers of processors, the processors can be grouped into clusters, each with its own centralized workload manager, and a top-level workload manager distributes work to the cluster managers. This scheme has lower fault tolerance, as the whole system is at risk of being brought down if the central load manager stops working; likewise, an entire cluster could stop producing work if its own central load manager were to fail.&lt;br /&gt;
&lt;br /&gt;
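A sketch of the central queue policy, in which the central manager holds all pending work and buffers any request it cannot satisfy immediately (the dispatch callback and all names are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;deque&amp;gt;&lt;br /&gt;
 #include &amp;lt;queue&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { /* application-specific work item */ };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // The central workload manager holds all pending work; a processor whose load&lt;br /&gt;
 // falls below its threshold sends a request, and requests that cannot be met&lt;br /&gt;
 // right away are buffered until new work arrives.&lt;br /&gt;
 struct CentralQueue&lt;br /&gt;
 {&lt;br /&gt;
    std::deque&amp;lt;Task&amp;gt; work;                 // pending tasks&lt;br /&gt;
    std::queue&amp;lt;std::size_t&amp;gt; waiting;       // ids of processors awaiting work&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    void submit(Task t, void (*dispatch)(std::size_t, Task))&lt;br /&gt;
    {&lt;br /&gt;
       if (!waiting.empty()) { dispatch(waiting.front(), t); waiting.pop(); }   // serve a buffered request first&lt;br /&gt;
       else { work.push_back(t); }&lt;br /&gt;
    }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
    void request(std::size_t processor_id, void (*dispatch)(std::size_t, Task))&lt;br /&gt;
    {&lt;br /&gt;
       if (!work.empty()) { dispatch(processor_id, work.front()); work.pop_front(); }&lt;br /&gt;
       else { waiting.push(processor_id); }   // buffer the request until work arrives&lt;br /&gt;
    }&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;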
==Comparisons of Static versus Dynamic==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table: Comparison of Load Balancing Algorithms&amp;lt;ref name=&amp;quot;complb&amp;quot;&amp;gt;http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
 |      title = Performance Analysis of Load Balancing Algorithms&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = November 19, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! Parameters&lt;br /&gt;
! Round Robin&lt;br /&gt;
! Random&lt;br /&gt;
! Central Manager&lt;br /&gt;
! Local Queue&lt;br /&gt;
! Central Queue&lt;br /&gt;
|-&lt;br /&gt;
| Dynamic/Static&lt;br /&gt;
| Static&lt;br /&gt;
| Static&lt;br /&gt;
| Static&lt;br /&gt;
| Dynamic&lt;br /&gt;
| Dynamic&lt;br /&gt;
|-&lt;br /&gt;
| Overload Rejection&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Fault Tolerant&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Forecasting Accuracy&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
|-&lt;br /&gt;
| Stability&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Small&lt;br /&gt;
| Small&lt;br /&gt;
|-&lt;br /&gt;
| Centralized/Decentralized&lt;br /&gt;
| D&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
|-&lt;br /&gt;
| Cooperative&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Process Migration&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| No&lt;br /&gt;
|-&lt;br /&gt;
| Resource Utilization&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real-World Applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
Load balancing methods play a large role in weather modeling, as the amount of data that must be processed is very large and the computations are intensive. Many models construct their own data structures and use variations on static and dynamic load balancing to achieve satisfactory performance.&lt;br /&gt;
&lt;br /&gt;
[http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10web.pdf Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4 ]&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in Action==&lt;br /&gt;
&lt;br /&gt;
The following server load balancing pseudocode, taken from the Hypertable load balancing design, repeatedly moves tasks (&amp;quot;ranges&amp;quot;) off the most heavily loaded server onto the most lightly loaded one until no server's deviation from the average load exceeds a set threshold.&amp;lt;ref name=&amp;quot;pseudocode&amp;quot;&amp;gt;http://code.google.com/p/hypertable/wiki/LoadBalancing&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://code.google.com/p/hypertable/wiki/LoadBalancing&lt;br /&gt;
 |      title = Load Balancing Design&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = December 28, 2010&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 //sort the load data and store it in different orders for use later&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 //While the deviation is too high, iterate through the nodes&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD)&lt;br /&gt;
 {&lt;br /&gt;
   //get the tasks for node [0], and sort them&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   i=0;&lt;br /&gt;
   //iterates through the past load data for this node&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp; i &amp;lt; range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     &amp;amp;nbsp;&lt;br /&gt;
     //If a given swap results in a lesser deviation&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation)&lt;br /&gt;
     {&lt;br /&gt;
        //swap and update load balance data related to the load swap&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //re-balance the load before iterating again on the next node&lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
====References====&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Other Sources====&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82520</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82520"/>
		<updated>2013-11-19T17:06:10Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Comments on third draft */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05645456&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://cisl.ucar.edu/dir/CAS2K11/Presentations/panetta/jairo.panetta.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10web.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10paper.pdf&lt;br /&gt;
&lt;br /&gt;
== Comments on first draft ==&lt;br /&gt;
&lt;br /&gt;
Good organization; look forward to the text.  I also suggest this paper&lt;br /&gt;
&lt;br /&gt;
http://paper.ijcsns.org/07_book/201006/20100619.pdf A guide to dynamic load balancing in distributed computer systems&lt;br /&gt;
AM Alakeel - International Journal of Computer Science and …, 2010 - paper.ijcsns.org&lt;br /&gt;
&lt;br /&gt;
== Comments on second draft ==&lt;br /&gt;
&lt;br /&gt;
Generally well written; would like to see you extend it to describe situations in which each strategy works best.  If you can find empirical results to support those guidelines, so much the better.&lt;br /&gt;
&lt;br /&gt;
I think you need a better delineation of static vs. dynamic.  Since Central Manager assigns each new task to the processor with the least work, it sounds like it is dividing the work at run time.&lt;br /&gt;
&lt;br /&gt;
The load-balancing pseudocode needs to be accompanied by a prose explanation.&lt;br /&gt;
&lt;br /&gt;
== Comments on third draft ==&lt;br /&gt;
&lt;br /&gt;
-&amp;quot;Work load&amp;quot; --&amp;gt; &amp;quot;Workload&amp;quot;&lt;br /&gt;
&lt;br /&gt;
-In the first section, by &amp;quot;increased performance&amp;quot; do you mean &amp;quot;improved performance&amp;quot;?&lt;br /&gt;
&lt;br /&gt;
Giving a description of the various strategies isn't really sufficient.  I'd like to see you tell which strategy works best in various circumstances, preferably backed up by some numbers.&lt;br /&gt;
&lt;br /&gt;
Is Central Manager a static or dynamic strategy?  If it assigns work to the processor with the lowest current load, that certainly sounds like a dynamic strategy.  How is it differentiated from Central Queue?&lt;br /&gt;
&lt;br /&gt;
In Central Manager, you say, &amp;quot;different overhead than usual.&amp;quot;  What is &amp;quot;usual&amp;quot;?  Perhaps you mean to say that the overheads are distributed differently than with the other strategies encountered so far.&lt;br /&gt;
&lt;br /&gt;
When you say &amp;quot;fewer messages to be sent in order to facilitate load balancing,&amp;quot; can you quantify that?&lt;br /&gt;
&lt;br /&gt;
-In &amp;quot;Local Queue&amp;quot;, can you quantify the large amount of interprocessor communication, and compare it with other strategies (Central Manager, Central Queue)?&lt;br /&gt;
&lt;br /&gt;
In Central Queue, can you give a diagram of the cluster arrangement?  It sounds like strictly hierarchical workload managers.&lt;br /&gt;
&lt;br /&gt;
Please give some real-world applications, filling out the sections for which there are headings.&lt;br /&gt;
&lt;br /&gt;
-Still, more than pseudocode is needed for the example.  A description is also needed.&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82518</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82518"/>
		<updated>2013-11-19T17:01:42Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Sources */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue workload management, also called distributed workload management, each processor is responsible for maintaining a sufficient workload. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor workload manager to send work. The remote load manager receiving the request examines its own workload and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their workload and can still manage workloads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory workload at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized workload manager is responsible for distributing workload to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until there enough work is available to meet the request. In systems with large numbers of processors, clusters can be formed of groups of processors with each cluster have a centralized workload manager. One workload manager would be in charge of distributing workloads to each cluster workload manager. This scheme has a lower fault tolerance as the system can be at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing of its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Comparisons of Static versus Dynamic==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table: Comparison of Load Balancing Algorithms&amp;lt;ref name=&amp;quot;complb&amp;quot;&amp;gt;http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
 |      title = Performance Analysis of Load Balancing Algorithms&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate November 19, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! Parameters&lt;br /&gt;
! Round Robin&lt;br /&gt;
! Random&lt;br /&gt;
! Central Manager&lt;br /&gt;
! Local Queue&lt;br /&gt;
! Central Queue&lt;br /&gt;
|-&lt;br /&gt;
| Overload Rejection&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Fault Tolerant&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Forecasting Accuracy&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
|-&lt;br /&gt;
| Stability&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Small&lt;br /&gt;
| Small&lt;br /&gt;
|-&lt;br /&gt;
| Centralized/Decentralized&lt;br /&gt;
| D&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
|-&lt;br /&gt;
| Dynamic/Static&lt;br /&gt;
| S&lt;br /&gt;
| S&lt;br /&gt;
| S&lt;br /&gt;
| D&lt;br /&gt;
| D&lt;br /&gt;
|-&lt;br /&gt;
| Cooperative&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Process Migration&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| No&lt;br /&gt;
|-&lt;br /&gt;
| Resource Utilization&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode [1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 //sort the load data and store it in different orders for use later&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 //While the deviation is too high, iterate through the nodes&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD)&lt;br /&gt;
 {&lt;br /&gt;
   //get the tasks for node [0], and sort them&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   i=0;&lt;br /&gt;
   //iterates through the past load data for this node&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp; i &amp;lt; range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     &amp;amp;nbsp;&lt;br /&gt;
     //If a given swap results in a lesser deviation&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation)&lt;br /&gt;
     {&lt;br /&gt;
        //swap and update load balance data related to the load swap&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //re-balance the load before iterating again on the next node&lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
====References====&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Other Sources====&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82517</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82517"/>
		<updated>2013-11-19T17:01:08Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue workload management, also called distributed workload management, each processor is responsible for maintaining a sufficient workload. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor workload manager to send work. The remote load manager receiving the request examines its own workload and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their workload and can still manage workloads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory workload at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized workload manager is responsible for distributing workload to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until there enough work is available to meet the request. In systems with large numbers of processors, clusters can be formed of groups of processors with each cluster have a centralized workload manager. One workload manager would be in charge of distributing workloads to each cluster workload manager. This scheme has a lower fault tolerance as the system can be at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing of its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Comparisons of Static versus Dynamic==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table: Comparison of Load Balancing Algorithms&amp;lt;ref name=&amp;quot;complb&amp;quot;&amp;gt;http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
 |      title = Performance Analysis of Load Balancing Algorithms&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate November 19, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! Parameters&lt;br /&gt;
! Round Robin&lt;br /&gt;
! Random&lt;br /&gt;
! Central Manager&lt;br /&gt;
! Local Queue&lt;br /&gt;
! Central Queue&lt;br /&gt;
|-&lt;br /&gt;
| Overload Rejection&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Fault Tolerant&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Forecasting Accuracy&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
|-&lt;br /&gt;
| Stability&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Large&lt;br /&gt;
| Small&lt;br /&gt;
| Small&lt;br /&gt;
|-&lt;br /&gt;
| Centralized/Decentralized&lt;br /&gt;
| D&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
| D&lt;br /&gt;
| C&lt;br /&gt;
|-&lt;br /&gt;
| Dynamic/Static&lt;br /&gt;
| S&lt;br /&gt;
| S&lt;br /&gt;
| S&lt;br /&gt;
| D&lt;br /&gt;
| D&lt;br /&gt;
|-&lt;br /&gt;
| Cooperative&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Process Migration&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| No&lt;br /&gt;
| Yes&lt;br /&gt;
| No&lt;br /&gt;
|-&lt;br /&gt;
| Resource Utilization&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| Less&lt;br /&gt;
| More&lt;br /&gt;
| Less&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode [1]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 //sort the load data and store it in different orders for use later&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 //While the deviation is too high, iterate through the nodes&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD)&lt;br /&gt;
 {&lt;br /&gt;
   //get the tasks for node [0], and sort them&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   i=0;&lt;br /&gt;
   //iterates through the past load data for this node&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp; i &amp;lt; range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     &amp;amp;nbsp;&lt;br /&gt;
     //If a given swap results in a lesser deviation&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation)&lt;br /&gt;
     {&lt;br /&gt;
        //swap and update load balance data related to the load swap&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //re-balance the load before iterating again on the next node&lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82503</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82503"/>
		<updated>2013-11-19T16:42:09Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
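&lt;br /&gt;
To make the mechanism concrete, a minimal round-robin dispatcher is sketched below in C++. The Task and Processor types are placeholders invented for the example; the point is simply that the assignment depends on nothing but a rotating index.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Placeholder task and processor types, used only for illustration.&lt;br /&gt;
 struct Task { int id; };&lt;br /&gt;
 struct Processor { std::vector&amp;lt;Task&amp;gt; queue; };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Round robin: hand each incoming task to the next processor in line,&lt;br /&gt;
 // wrapping back to the first one, with no regard for task size or load.&lt;br /&gt;
 class RoundRobinBalancer {&lt;br /&gt;
 public:&lt;br /&gt;
     explicit RoundRobinBalancer(std::size_t nprocs) : procs_(nprocs), next_(0) {}&lt;br /&gt;
     void assign(const Task&amp;amp; t) {&lt;br /&gt;
         procs_[next_].queue.push_back(t);&lt;br /&gt;
         next_ = (next_ + 1) % procs_.size();  // rotate to the next processor&lt;br /&gt;
     }&lt;br /&gt;
 private:&lt;br /&gt;
     std::vector&amp;lt;Processor&amp;gt; procs_;&lt;br /&gt;
     std::size_t next_;&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;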
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, work loads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
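&lt;br /&gt;
A random policy looks almost the same in code; the sketch below simply replaces the rotating index with a draw from a uniform distribution (std::mt19937 is used as one reasonable generator, since a biased source of randomness would skew the spread of work as noted above). The Task and Processor placeholders are again invented for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { int id; };&lt;br /&gt;
 struct Processor { std::vector&amp;lt;Task&amp;gt; queue; };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Random: pick the target processor uniformly at random for every task.&lt;br /&gt;
 class RandomBalancer {&lt;br /&gt;
 public:&lt;br /&gt;
     explicit RandomBalancer(std::size_t nprocs)&lt;br /&gt;
         : procs_(nprocs), pick_(0, nprocs - 1) {}&lt;br /&gt;
     void assign(const Task&amp;amp; t) {&lt;br /&gt;
         procs_[pick_(rng_)].queue.push_back(t);  // any bias here skews the load&lt;br /&gt;
     }&lt;br /&gt;
 private:&lt;br /&gt;
     std::vector&amp;lt;Processor&amp;gt; procs_;&lt;br /&gt;
     std::mt19937 rng_{std::random_device{}()};&lt;br /&gt;
     std::uniform_int_distribution&amp;lt;std::size_t&amp;gt; pick_;&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;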
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
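&lt;br /&gt;
One way to picture the central node is as a dispatcher that keeps an estimated load per slave and always picks the least loaded one. The sketch below is only illustrative: tracking load as the sum of outstanding task costs, and the completed() callback, are assumptions rather than a description of any particular system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { int id; long cost; };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Central manager: one node tracks the load of every slave and assigns&lt;br /&gt;
 // each new task to the slave that currently has the least load.&lt;br /&gt;
 class CentralManager {&lt;br /&gt;
 public:&lt;br /&gt;
     explicit CentralManager(std::size_t nslaves) : load_(nslaves, 0) {}&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     // Returns the slave chosen for this task and updates its tracked load.&lt;br /&gt;
     std::size_t assign(const Task&amp;amp; t) {&lt;br /&gt;
         std::size_t target = 0;&lt;br /&gt;
         for (std::size_t s = 1; s &amp;lt; load_.size(); ++s)&lt;br /&gt;
             if (load_[s] &amp;lt; load_[target]) target = s;&lt;br /&gt;
         load_[target] += t.cost;  // bookkeeping lives on the central node&lt;br /&gt;
         return target;&lt;br /&gt;
     }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     // Called when a slave reports that it has finished a task.&lt;br /&gt;
     void completed(std::size_t slave, const Task&amp;amp; t) { load_[slave] -= t.cost; }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 private:&lt;br /&gt;
     std::vector&amp;lt;long&amp;gt; load_;  // estimated outstanding load per slave&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;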
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their work load and can still manage work loads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory work load at all processors.&lt;br /&gt;
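&lt;br /&gt;
A rough C++ sketch of the local queue idea is given below. The threshold, the donation rule (a peer keeps handing over tasks while it is above its own threshold and still busier than the requester), and the modelling of request messages as direct method calls are all simplifying assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 #include &amp;lt;deque&amp;gt;&lt;br /&gt;
 #include &amp;lt;random&amp;gt;&lt;br /&gt;
 #include &amp;lt;vector&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { int id; };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // One local work load manager per processor.&lt;br /&gt;
 class LocalQueue {&lt;br /&gt;
 public:&lt;br /&gt;
     explicit LocalQueue(std::size_t threshold) : threshold_(threshold) {}&lt;br /&gt;
     void add(const Task&amp;amp; t) { tasks_.push_back(t); }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     // If this processor has dropped below its threshold, ask a random peer for work.&lt;br /&gt;
     void maybe_request_work(std::vector&amp;lt;LocalQueue&amp;gt;&amp;amp; peers, std::mt19937&amp;amp; rng) {&lt;br /&gt;
         if (tasks_.size() &amp;gt;= threshold_ || peers.empty()) return;&lt;br /&gt;
         std::uniform_int_distribution&amp;lt;std::size_t&amp;gt; pick(0, peers.size() - 1);&lt;br /&gt;
         peers[pick(rng)].donate_to(*this);  // stands in for a request message&lt;br /&gt;
     }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     // A peer keeps donating while it is above its own threshold and busier than the requester.&lt;br /&gt;
     void donate_to(LocalQueue&amp;amp; requester) {&lt;br /&gt;
         while (tasks_.size() &amp;gt; threshold_ &amp;amp;&amp;amp; tasks_.size() &amp;gt; requester.tasks_.size() + 1) {&lt;br /&gt;
             requester.tasks_.push_back(tasks_.back());&lt;br /&gt;
             tasks_.pop_back();&lt;br /&gt;
         }&lt;br /&gt;
     }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 private:&lt;br /&gt;
     std::size_t threshold_;&lt;br /&gt;
     std::deque&amp;lt;Task&amp;gt; tasks_;&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;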
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having a centralized work load manager. One work load manager would be in charge of distributing work loads to each cluster work load manager. This scheme has a lower fault tolerance, as the whole system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
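&lt;br /&gt;
The central queue can be sketched as a single manager that owns all undistributed work plus a list of buffered requests from processors that asked while the queue was empty. The deliver() stub and the shape of the request interface below are assumptions made only to keep the sketch self-contained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 #include &amp;lt;cstddef&amp;gt;&lt;br /&gt;
 #include &amp;lt;deque&amp;gt;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 struct Task { int id; };&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 // Central queue: one manager owns all undistributed work. Idle processors ask&lt;br /&gt;
 // it for work; if none is available the request is buffered and answered later.&lt;br /&gt;
 class CentralQueue {&lt;br /&gt;
 public:&lt;br /&gt;
     void submit(const Task&amp;amp; t) {&lt;br /&gt;
         if (!waiting_.empty()) {  // answer a buffered request first&lt;br /&gt;
             std::size_t proc = waiting_.front();&lt;br /&gt;
             waiting_.pop_front();&lt;br /&gt;
             deliver(proc, t);&lt;br /&gt;
         } else {&lt;br /&gt;
             work_.push_back(t);&lt;br /&gt;
         }&lt;br /&gt;
     }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     // Called when a processor's load falls below its threshold.&lt;br /&gt;
     void request(std::size_t proc) {&lt;br /&gt;
         if (work_.empty()) {&lt;br /&gt;
             waiting_.push_back(proc);  // buffer the request until work arrives&lt;br /&gt;
         } else {&lt;br /&gt;
             deliver(proc, work_.front());&lt;br /&gt;
             work_.pop_front();&lt;br /&gt;
         }&lt;br /&gt;
     }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 private:&lt;br /&gt;
     // Stand-in for sending the task to the given processor over the interconnect.&lt;br /&gt;
     void deliver(std::size_t proc, const Task&amp;amp; t) { (void)proc; (void)t; }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
     std::deque&amp;lt;Task&amp;gt; work_;&lt;br /&gt;
     std::deque&amp;lt;std::size_t&amp;gt; waiting_;&lt;br /&gt;
 };&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;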
&lt;br /&gt;
==Comparisons of Static versus Dynamic==&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 //sort the load data and store it in different orders for use later&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
 //While the deviation is too high, iterate through the nodes&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD)&lt;br /&gt;
 {&lt;br /&gt;
   //get the tasks for node [0], and sort them&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   i=0;&lt;br /&gt;
   //iterates through the past load data for this node&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp; i &amp;lt; range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     &amp;amp;nbsp;&lt;br /&gt;
     //If a given swap results in a lesser deviation&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation)&lt;br /&gt;
     {&lt;br /&gt;
        //swap and update load balance data related to the load swap&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
   {&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   }&lt;br /&gt;
 &amp;amp;nbsp;&lt;br /&gt;
   //re-balance the load before iterating again on the next node&lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82499</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82499"/>
		<updated>2013-11-19T16:34:37Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Sources */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, work loads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their work load and can still manage work loads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory work load at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having a centralized work load manager. One work load manager would be in charge of distributing work loads to each cluster work load manager. This scheme has a lower fault tolerance, as the whole system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82486</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82486"/>
		<updated>2013-11-19T16:13:37Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, work loads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their work load and can still manage work loads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory work load at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having a centralized work load manager. One work load manager would be in charge of distributing work loads to each cluster work load manager. This scheme has a lower fault tolerance, as the whole system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx SIMULATION OF STATIC LOAD BALANCING ALGORITHMS ON HOMOGENEOUS AND HETEROGENEOUS CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82419</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82419"/>
		<updated>2013-11-12T17:01:05Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05645456&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://cisl.ucar.edu/dir/CAS2K11/Presentations/panetta/jairo.panetta.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10web.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10paper.pdf&lt;br /&gt;
&lt;br /&gt;
== Comments on first draft ==&lt;br /&gt;
&lt;br /&gt;
Good organization; look forward to the text.  I also suggest this paper&lt;br /&gt;
&lt;br /&gt;
http://paper.ijcsns.org/07_book/201006/20100619.pdf A guide to dynamic load balancing in distributed computer systems&lt;br /&gt;
AM Alakeel - International Journal of Computer Science and …, 2010 - paper.ijcsns.org&lt;br /&gt;
&lt;br /&gt;
== Comments on second draft ==&lt;br /&gt;
&lt;br /&gt;
Generally well written; would like to see you extend it to describe situations in which each strategy works best.  If you can find empirical results to support those guidelines, so much the better.&lt;br /&gt;
&lt;br /&gt;
I think you need a better delineation of static vs. dynamic.  Since Central Manager assigns each new task to the processor with the least work, it sounds like it is dividing the work at run time.&lt;br /&gt;
&lt;br /&gt;
The load-balancing pseudocode needs to be accompanied by a prose explanation.&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82418</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82418"/>
		<updated>2013-11-12T16:54:40Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05645456&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
weather modeling: http://cisl.ucar.edu/dir/CAS2K11/Presentations/panetta/jairo.panetta.pdf&lt;br /&gt;
weather modeling: http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10web.pdf&lt;br /&gt;
&lt;br /&gt;
== Comments on first draft ==&lt;br /&gt;
&lt;br /&gt;
Good organization; look forward to the text.  I also suggest this paper&lt;br /&gt;
&lt;br /&gt;
http://paper.ijcsns.org/07_book/201006/20100619.pdf A guide to dynamic load balancing in distributed computer systems&lt;br /&gt;
AM Alakeel - International Journal of Computer Science and …, 2010 - paper.ijcsns.org&lt;br /&gt;
&lt;br /&gt;
== Comments on second draft ==&lt;br /&gt;
&lt;br /&gt;
Generally well written; would like to see you extend it to describe situations in which each strategy works best.  If you can find empirical results to support those guidelines, so much the better.&lt;br /&gt;
&lt;br /&gt;
I think you need a better delineation of static vs. dynamic.  Since Central Manager assigns each new task to the processor with the least work, it sounds like it is dividing the work at run time.&lt;br /&gt;
&lt;br /&gt;
The load-balancing pseudocode needs to be accompanied by a prose explanation.&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82402</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82402"/>
		<updated>2013-11-12T16:39:40Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to increased performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, work loads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their work load and can still manage work loads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory work load at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having a centralized work load manager. One work load manager would be in charge of distributing work loads to each cluster work load manager. This scheme has a lower fault tolerance, as the whole system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx SIMULATION OF STATIC LOAD BALANCING ALGORITHMS ON HOMOGENEOUS AND HETEROGENEOUS CPUs ] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82293</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=82293"/>
		<updated>2013-10-31T16:22:07Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05645456&lt;br /&gt;
&lt;br /&gt;
== Comments on first draft ==&lt;br /&gt;
&lt;br /&gt;
Good organization; look forward to the text.  I also suggest this paper&lt;br /&gt;
&lt;br /&gt;
http://paper.ijcsns.org/07_book/201006/20100619.pdf A guide to dynamic load balancing in distributed computer systems&lt;br /&gt;
AM Alakeel - International Journal of Computer Science and …, 2010 - paper.ijcsns.org&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82286</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82286"/>
		<updated>2013-10-31T16:07:51Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Real World applications of Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break-up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to increased performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static Vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their work load and can still manage work loads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory work load at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having a centralized work load manager. One work load manager would be in charge of distributing work loads to each cluster work load manager. This scheme has a lower fault tolerance, as the whole system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
====Weather Modeling====&lt;br /&gt;
&lt;br /&gt;
====Visible Human Project====&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82284</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82284"/>
		<updated>2013-10-31T16:04:41Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Central Queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break-up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to increased performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static Vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another randomly chosen processor's work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work, will send work to the requesting load manager. This scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working, since they still have their own work loads and can still exchange work with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communication to maintain a satisfactory work load at all processors.&lt;br /&gt;
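&lt;br /&gt;
A minimal sketch of this request-driven exchange (Python; the node names, threshold, and work items are hypothetical, not taken from the cited sources):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 import random&lt;br /&gt;
 &lt;br /&gt;
 THRESHOLD = 2      # hypothetical minimum local queue length&lt;br /&gt;
 &lt;br /&gt;
 class Node:&lt;br /&gt;
     def __init__(self, name, work):&lt;br /&gt;
         self.name = name&lt;br /&gt;
         self.queue = list(work)      # the local work queue this node manages&lt;br /&gt;
 &lt;br /&gt;
     def maybe_request_work(self, peers):&lt;br /&gt;
         # When the local queue runs low, ask one randomly chosen peer for work.&lt;br /&gt;
         if len(self.queue) &amp;lt; THRESHOLD:&lt;br /&gt;
             donor = random.choice(peers)&lt;br /&gt;
             self.queue.extend(donor.donate())&lt;br /&gt;
 &lt;br /&gt;
     def donate(self):&lt;br /&gt;
         # Give away surplus work only if this node stays above the threshold itself.&lt;br /&gt;
         surplus = self.queue[THRESHOLD:]&lt;br /&gt;
         self.queue = self.queue[:THRESHOLD]&lt;br /&gt;
         return surplus&lt;br /&gt;
 &lt;br /&gt;
 nodes = [Node('P0', range(8)), Node('P1', [])]&lt;br /&gt;
 nodes[1].maybe_request_work([nodes[0]])&lt;br /&gt;
 print([(n.name, len(n.queue)) for n in nodes])    # P0 keeps 2 items, P1 receives 6&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;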
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it. In systems with large numbers of processors, clusters can be formed from groups of processors, with each cluster having its own centralized work load manager. One top-level work load manager would then be in charge of distributing work loads to each cluster's work load manager. This scheme has lower fault tolerance, as the system is at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing work if its central load manager were to stop functioning.&lt;br /&gt;
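&lt;br /&gt;
A minimal sketch of the central queue protocol, including the buffering of unmet requests (Python; the worker names and work items are hypothetical, not taken from the cited sources):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 from collections import deque&lt;br /&gt;
 &lt;br /&gt;
 class Worker:&lt;br /&gt;
     def __init__(self, name):&lt;br /&gt;
         self.name, self.items = name, []&lt;br /&gt;
     def receive(self, item):&lt;br /&gt;
         self.items.append(item)&lt;br /&gt;
 &lt;br /&gt;
 class CentralQueueManager:&lt;br /&gt;
     def __init__(self, work):&lt;br /&gt;
         self.work = deque(work)      # all undistributed work lives here&lt;br /&gt;
         self.waiting = deque()       # buffered requests while the queue is empty&lt;br /&gt;
     def request_work(self, worker):&lt;br /&gt;
         # Hand out work if any is available; otherwise buffer the request.&lt;br /&gt;
         if self.work:&lt;br /&gt;
             worker.receive(self.work.popleft())&lt;br /&gt;
         else:&lt;br /&gt;
             self.waiting.append(worker)&lt;br /&gt;
     def add_work(self, item):&lt;br /&gt;
         # New work first satisfies any buffered request, then joins the queue.&lt;br /&gt;
         if self.waiting:&lt;br /&gt;
             self.waiting.popleft().receive(item)&lt;br /&gt;
         else:&lt;br /&gt;
             self.work.append(item)&lt;br /&gt;
 &lt;br /&gt;
 manager = CentralQueueManager(['a', 'b'])&lt;br /&gt;
 idle = Worker('P0')&lt;br /&gt;
 manager.request_work(idle)&lt;br /&gt;
 manager.request_work(idle)&lt;br /&gt;
 manager.request_work(idle)     # queue is empty, so this request is buffered&lt;br /&gt;
 manager.add_work('c')          # new work satisfies the buffered request&lt;br /&gt;
 print(idle.items)              # ['a', 'b', 'c']&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;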
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82281</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82281"/>
		<updated>2013-10-31T16:00:34Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Local Queue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile time, the balance is said to be ''statically'' balanced. Dividing the work load up during run time is ''dynamically'' balancing the load. Static load balancing has reduced overhead because the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can improve overall performance because work can be assigned to a processor as soon as it becomes idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
==='''Static Load balancing'''===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin, though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
Central manager is a load balancing scheme which selects a certain processor to act as the &amp;quot;central node&amp;quot;, which handles the balancing. This method changes the communication pattern: instead of intercommunication between all processors, communication exists solely between the central node and the other processors. A drawback of the central manager scheme is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its child nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage in that it requires fewer messages to be sent in order to facilitate load balancing.&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue work load management, also called distributed work load management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another randomly chosen processor's work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work, will send work to the requesting load manager. This scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working, since they still have their own work loads and can still exchange work with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communication to maintain a satisfactory work load at all processors.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82270</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82270"/>
		<updated>2013-10-31T15:50:45Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Dynamic Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile time, the balance is said to be ''statically'' balanced. Dividing the work load up during run time is ''dynamically'' balancing the load. Static load balancing has reduced overhead because the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can improve overall performance because work can be assigned to a processor as soon as it becomes idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
===Static Load balancing===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good &amp;quot;random&amp;quot; values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin, though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also occasionally assign multiple large tasks to a single processor, which would also lead to uneven load balancing.&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
Under local queue management, each processor is responsible for maintaining a sufficient work load. When a load drops below a threshold, the load manager for the processor fires off a request to another randomly chosen processor's work load manager to send work. The remote load manager receiving the request examines its own work load and, if it has sufficient extra work, will send work to the requesting load manager.&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
A centralized work load manager is responsible for distributing work load to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until enough work is available to meet it.&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82257</id>
		<title>CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC_456_Fall_2013/4a_bc&amp;diff=82257"/>
		<updated>2013-10-31T15:21:42Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Load Balancing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Load Balancing=&lt;br /&gt;
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile time, the balance is said to be ''statically'' balanced. Dividing the work load up during run time is ''dynamically'' balancing the load. Static load balancing has reduced overhead because the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can improve overall performance because work can be assigned to a processor as soon as it becomes idle, reducing the overall idle time of processors.&lt;br /&gt;
&lt;br /&gt;
==Static vs. Dynamic Techniques==&lt;br /&gt;
&lt;br /&gt;
===Static Load balancing===&lt;br /&gt;
&lt;br /&gt;
====Round Robin====&lt;br /&gt;
&lt;br /&gt;
====Random====&lt;br /&gt;
&lt;br /&gt;
====Central Manager====&lt;br /&gt;
&lt;br /&gt;
===Dynamic Load Balancing===&lt;br /&gt;
&lt;br /&gt;
====Local Queue====&lt;br /&gt;
&lt;br /&gt;
====Central Queue====&lt;br /&gt;
&lt;br /&gt;
==Real World applications of Load Balancing==&lt;br /&gt;
&lt;br /&gt;
==Examples of Load Balancing in action==&lt;br /&gt;
&lt;br /&gt;
Server Load balancing pseudocode&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 server_load_vec_desc = sort_descending(server_load_vec);&lt;br /&gt;
 server_load_vec_asc = sort_ascending(server_load_vec);&lt;br /&gt;
 while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD) {&lt;br /&gt;
   populate_range_load_vector(server_load_vec_desc[0].server_name);&lt;br /&gt;
   sort descending range_load_vec;&lt;br /&gt;
   i=0;&lt;br /&gt;
   while (server_load_vec_desc[0].deviation &amp;gt; DEVIATION_THRESHOLD &amp;amp;&amp;amp;&lt;br /&gt;
             i &amp;lt; range_load_vec.size()) {&lt;br /&gt;
     if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) {&lt;br /&gt;
        add range_load_vec[i] to balance plan&lt;br /&gt;
        partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;&lt;br /&gt;
        server_load_vec_desc[0].loadavg -= partial_deviation;&lt;br /&gt;
        server_load_vec_desc[0].deviation -= partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].loadavg += partial_deviation;&lt;br /&gt;
        server_load_vec_asc[0].deviation += partial_deviation;&lt;br /&gt;
        server_load_vec_asc = sort_ascending(server_load_vec_asc); &lt;br /&gt;
     }&lt;br /&gt;
     i++;&lt;br /&gt;
   }&lt;br /&gt;
   if (i == range_load_vec.size())&lt;br /&gt;
     remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc  &lt;br /&gt;
   server_load_vec_desc = sort_descending(server_load_vec_desc);&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[http://code.google.com/p/hypertable/wiki/LoadBalancing Load Balancing PseudoCode and other information]  &amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80140</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80140"/>
		<updated>2013-10-08T16:20:29Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80139</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80139"/>
		<updated>2013-10-08T16:20:06Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB Performance: http://www.google.com/url?sa=t&amp;amp;rct=j&amp;amp;q=&amp;amp;esrc=s&amp;amp;source=web&amp;amp;cd=3&amp;amp;ved=0CFgQFjAC&amp;amp;url=http%3A%2F&amp;lt;br&amp;gt;%2Fwww.cs.ucr.edu%2F~bhuyan%2FCS213%2Fload_balancing.ps&amp;amp;ei=VDBUUtj4HYr29gSLh4GADA&amp;amp;usg=AFQjCNFo08VxZ0irGr6e-ejmr1TXDDL7hQ&amp;amp;bvm=bv.53537100,d.eWU&amp;amp;cad=rja&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80138</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80138"/>
		<updated>2013-10-08T16:11:53Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://research.ijcaonline.org/ccsn2012/number4/ccsn1040.pdf&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80137</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80137"/>
		<updated>2013-10-08T16:11:04Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&amp;lt;br&amp;gt;&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80136</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80136"/>
		<updated>2013-10-08T16:10:52Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&lt;br /&gt;
weather modelling: http://cdac.in/HTML/pdf/ECMWF.pdf&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80135</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80135"/>
		<updated>2013-10-08T16:04:56Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80134</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80134"/>
		<updated>2013-10-08T16:04:38Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static and dynamic LB: http://www.advanceresearchlibrary.com/temp/downloads/jct/may2013/v2.pdf&lt;br /&gt;
LB performance: http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80133</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80133"/>
		<updated>2013-10-08T16:01:00Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80132</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80132"/>
		<updated>2013-10-08T16:00:42Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Possible example topics:&lt;br /&gt;
human-slice project data: http://lspwww.epfl.ch/publications/gigaserver/piiiaapa.pdf&lt;br /&gt;
mapreduce applications: http://en.wikipedia.org/wiki/MapReduce&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80125</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80125"/>
		<updated>2013-10-08T15:50:23Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;lt;br&amp;gt;&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&amp;lt;br&amp;gt;&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80123</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80123"/>
		<updated>2013-10-08T15:48:58Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling: http://www.ics.uci.edu/~cs237/reading/parallel.pdf&lt;br /&gt;
static load-balancing: http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx&lt;br /&gt;
dynamic load-balancing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.2736&amp;amp;rep=rep1&amp;amp;type=pdf&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80112</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80112"/>
		<updated>2013-10-08T15:37:53Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;dynamic scheduling - http://www.ics.uci.edu/~cs237/reading/parallel.pdf&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80111</id>
		<title>Talk:CSC 456 Fall 2013/4a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:CSC_456_Fall_2013/4a_bc&amp;diff=80111"/>
		<updated>2013-10-08T15:35:37Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: Created page with &amp;quot;http://www.ics.uci.edu/~cs237/reading/parallel.pdf&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;http://www.ics.uci.edu/~cs237/reading/parallel.pdf&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79998</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79998"/>
		<updated>2013-10-08T02:14:54Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count&lt;br /&gt;
&amp;lt;ref name=&amp;quot;transcount&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
 |      title = Transistor Count&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intelspecs&amp;quot;&amp;gt;http://ark.intel.com/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://ark.intel.com/&lt;br /&gt;
 |      title = Intel Processor Specifications&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
With the increasing sophistication of processors and limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instructions could be broken down into a large pipeline. The larger pipeline allowed big performance gains with Instruction Level Parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time. This would be implemented in a single core, with each stage of the pipeline being executed in each clock cycle. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the level of performance of parallel computers after only a few years. This inhibited adoption of multi-processor systems since single-processor systems achieved relative performance while being less costly. Over time, the effort to gain improvements from ILP began to have diminishing returns. In single-processor systems, the primary way of increasing performance was to increase the clock speed. As clock speeds increase, power consumption also increases. With parallelism, as long as the instructions are parallelizable, performance can be increased with an increase in processors.&lt;br /&gt;
&amp;lt;ref name=&amp;quot;cpuperf&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Central_processing_unit#Performance&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Central_processing_unit#Performance&lt;br /&gt;
 |      title = CPU Performance&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn toward on-chip multi-processors (i.e., multi-core architectures). These systems allow task parallelism in addition to ILP: one processor can simultaneously execute multiple tasks, and each core can still exploit ILP through pipelining. Driven by the performance gains of multi-processors, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing up to 16-core processors for servers.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of multi-core processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors and combine them to create a powerful unified system.  Originally, cluster computers used only uniprocessors, but they have since adopted multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to route traffic away from a node that has failed.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way that protects their data from corruption.&lt;br /&gt;
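&lt;br /&gt;
A toy sketch of this kind of failure-aware routing (Python; the node names and health flags are hypothetical placeholders, not drawn from any particular cluster product):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 import itertools&lt;br /&gt;
 &lt;br /&gt;
 nodes = {'node-a': True, 'node-b': False, 'node-c': True}   # name: is_healthy (hypothetical)&lt;br /&gt;
 rotation = itertools.cycle(sorted(nodes))&lt;br /&gt;
 &lt;br /&gt;
 def route(request):&lt;br /&gt;
     # Round-robin over the nodes, but skip any node whose health check failed.&lt;br /&gt;
     for _ in range(len(nodes)):&lt;br /&gt;
         candidate = next(rotation)&lt;br /&gt;
         if nodes[candidate]:&lt;br /&gt;
             return candidate&lt;br /&gt;
     raise RuntimeError('no healthy node available for ' + request)&lt;br /&gt;
 &lt;br /&gt;
 print([route('req%d' % i) for i in range(4)])   # node-b is never chosen&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;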
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&amp;lt;ref name=&amp;quot;top500list&amp;quot;&amp;gt;http://www.top500.org/lists/2013/06/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://www.top500.org/lists/2013/06/&lt;br /&gt;
 |      title = Top500.org Supercomputer List&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = June 2013&lt;br /&gt;
 | accessdate = October 3, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 88,128 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu.  Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance thirty-four percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a roughly one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
====Graphical trends for super computers 2008-2013&amp;lt;ref name=&amp;quot;top500stats&amp;quot;&amp;gt;http://www.top500.org/statistics/sublist/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://www.top500.org/statistics/sublist&lt;br /&gt;
 |      title = CPU Performance&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;====&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]] - In recent years, 8-core processors have been gaining a large portion of the market share, with 16-core systems a recent player in the market. Single-processor systems have seen only minor use since 2008. &lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]] - 8-core systems have the most performance share of the super computer market. 16-core systems place into a very close second place with 12-core systems bringing up third place. In total, these three categories make up 85% of the top performance among super computers.&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]] - Infiniband's interconnect technology makes up the largest portion of the super computer arena. Interconnect systems utilizing gigabit ethernets make up the next largest portion.&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]] - IBM and HP make up nearly half of the super computer market. HP and Cray appear to be on the trend of gaining market share in recent years.&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 6.5 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
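&lt;br /&gt;
The dynamic frequency scaling described above can be sketched as a simple load-driven governor (Python; the frequency steps and load thresholds are made-up illustration values, not the behavior of any specific vendor's governor):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
 FREQS_MHZ = [800, 1300, 1700, 2000]      # hypothetical available clock steps&lt;br /&gt;
 &lt;br /&gt;
 def pick_frequency(load_percent):&lt;br /&gt;
     # Scale the clock up under heavy load and back down when mostly idle.&lt;br /&gt;
     if load_percent &amp;gt; 80:&lt;br /&gt;
         return FREQS_MHZ[3]&lt;br /&gt;
     if load_percent &amp;gt; 50:&lt;br /&gt;
         return FREQS_MHZ[2]&lt;br /&gt;
     if load_percent &amp;gt; 20:&lt;br /&gt;
         return FREQS_MHZ[1]&lt;br /&gt;
     return FREQS_MHZ[0]&lt;br /&gt;
 &lt;br /&gt;
 print([pick_frequency(p) for p in (5, 35, 65, 95)])   # [800, 1300, 1700, 2000]&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;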
&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
====References====&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====Other sources====&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79284</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79284"/>
		<updated>2013-10-04T00:41:57Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count&lt;br /&gt;
&amp;lt;ref name=&amp;quot;transcount&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
 |      title = Transistor Count&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range.&lt;br /&gt;
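The growth factors quoted above follow directly from those figures; the short Python sketch below is illustrative only and simply reproduces the arithmetic with the numbers already cited.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Reproduce the growth factors quoted in the text.&lt;br /&gt;
transistors_1971, transistors_2011 = 2300, 2.6e9&lt;br /&gt;
clock_1971_hz, clock_2006_hz = 750e3, 2.4e9&lt;br /&gt;
print('transistor growth: %.0fx' % (transistors_2011 / transistors_1971))  # roughly 1.13 million&lt;br /&gt;
print('clock-speed growth: %.0fx' % (clock_2006_hz / clock_1971_hz))       # 3200&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;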
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intelspecs&amp;quot;&amp;gt;http://ark.intel.com/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://ark.intel.com/&lt;br /&gt;
 |      title = Intel Processor Specifications&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
With the increasing sophistication of processors and the limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instruction execution could be broken down into a long pipeline, and the longer pipeline allowed large performance gains through Instruction Level Parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time; it is implemented within a single core, with a different stage of the pipeline advancing in each clock cycle. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the performance level of parallel computers within only a few years. This inhibited adoption of multi-processor systems, since single-processor systems achieved comparable performance at lower cost. Over time, improvements from ILP began to show diminishing returns. In single-processor systems, the primary remaining way of increasing performance was to increase the clock speed, but as clock speeds increase, power consumption also increases. With parallelism, as long as the work is parallelizable, performance can instead be increased by adding processors.&lt;br /&gt;
&amp;lt;ref name=&amp;quot;cpuperf&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Central_processing_unit#Performance&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Central_processing_unit#Performance&lt;br /&gt;
 |      title = CPU Performance&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP grew, manufacturers began to turn toward on-chip multi-processors (i.e. multi-core architectures). These systems allow task parallelism in addition to ILP: one processor can execute multiple tasks simultaneously, while each core can still exploit ILP through pipelining. Driven by the performance gains of multi-processors, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing up to 16-core processors for servers.&lt;br /&gt;
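To make the distinction between task parallelism and ILP concrete, the following minimal Python sketch (the work function and task sizes are invented purely for illustration) spreads independent tasks across the available cores, while the hardware remains free to exploit ILP inside each core's instruction stream:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Task parallelism sketch: independent tasks run on separate cores via a process pool.&lt;br /&gt;
from multiprocessing import Pool, cpu_count&lt;br /&gt;
&lt;br /&gt;
def work(n):&lt;br /&gt;
    # stand-in for an independent, CPU-bound task&lt;br /&gt;
    return sum(i * i for i in range(n))&lt;br /&gt;
&lt;br /&gt;
if __name__ == '__main__':&lt;br /&gt;
    tasks = [100000, 200000, 300000, 400000]&lt;br /&gt;
    with Pool(processes=cpu_count()) as pool:&lt;br /&gt;
        print(pool.map(work, tasks))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;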
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of multi-core processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Core Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computers is high availability. These clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to reroute traffic when a node fails. In order to function, high-availability clusters must be able to check and change the status of running applications. The applications must also use shared storage, while operating in a way such that their data is protected from corruption.&lt;br /&gt;
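As a rough illustration only (the node addresses and the health check are hypothetical placeholders, not details of any system listed below), a high-availability front end might probe its nodes and route new work away from any node that stops responding:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Toy failover sketch: requests are routed round-robin, skipping nodes that fail a health check.&lt;br /&gt;
import itertools&lt;br /&gt;
&lt;br /&gt;
nodes = ['node-a:8080', 'node-b:8080', 'node-c:8080']  # hypothetical placeholders&lt;br /&gt;
&lt;br /&gt;
def is_healthy(node):&lt;br /&gt;
    # placeholder: a real cluster would ping the node or query its service here&lt;br /&gt;
    return True&lt;br /&gt;
&lt;br /&gt;
def route(requests):&lt;br /&gt;
    ring = itertools.cycle(nodes)&lt;br /&gt;
    for req in requests:&lt;br /&gt;
        for _ in range(len(nodes)):&lt;br /&gt;
            node = next(ring)&lt;br /&gt;
            if is_healthy(node):&lt;br /&gt;
                yield (req, node)&lt;br /&gt;
                break&lt;br /&gt;
&lt;br /&gt;
print(list(route(['req-1', 'req-2', 'req-3'])))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;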
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&amp;lt;ref name=&amp;quot;top500list&amp;quot;&amp;gt;http://www.top500.org/lists/2013/06/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://www.top500.org/lists/2013/06/&lt;br /&gt;
 |      title = Top500.org Supercomputer List&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = June 2013&lt;br /&gt;
 | accessdate = October 3, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011, the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as the number one system in November 2012, with performance thirty-four percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
====Graphical trends for super computers 2008-2013&amp;lt;ref name=&amp;quot;top500stats&amp;quot;&amp;gt;http://www.top500.org/statistics/sublist/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://www.top500.org/statistics/sublist&lt;br /&gt;
 |      title = CPU Performance&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;====&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]] - In recent years, 8-core processors have gained a large portion of the market share, with 16-core systems a recent entrant. Single-processor systems have seen only minor use since 2008.&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]] - 8-core systems hold the largest performance share of the super computer market, with 16-core systems a very close second and 12-core systems third. In total, these three categories make up 85% of the top performance among super computers.&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]] - Infiniband interconnects make up the largest portion of the super computer arena, with Gigabit Ethernet interconnects the next largest portion.&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]] - IBM and HP make up nearly half of the super computer market, while HP and Cray appear to have been gaining market share in recent years.&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development of mobile processors. This category of processors is specifically designed for low-power use. To conserve power, these processors use dynamic frequency scaling, which allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| 0.5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Talk:MainPage/CSC_456_Fall_2013/1a_bc&amp;diff=79283</id>
		<title>Talk:MainPage/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Talk:MainPage/CSC_456_Fall_2013/1a_bc&amp;diff=79283"/>
		<updated>2013-10-04T00:27:26Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Round 3 reply ==&lt;br /&gt;
&lt;br /&gt;
Since thumbnail creation appears to be broken, I'll leave it as links for the time being, as the PNGs are rather large in dimension.&lt;br /&gt;
&lt;br /&gt;
== Round 3 comments ==&lt;br /&gt;
&lt;br /&gt;
Please add citations in your running text, e.g., for what you say about chip multiprocessors.&lt;br /&gt;
&lt;br /&gt;
Table 1.2 needs dates of introduction.&lt;br /&gt;
&lt;br /&gt;
Please add narration about the trends shown in your Top 500 graphs.  Also, there should be a way to embed the graphs in the page, rather than linking to a .jpg.&lt;br /&gt;
&lt;br /&gt;
For mobile processors, some discussion of dynamic frequency scaling would be helpful, with links to further descriptions.&lt;br /&gt;
&lt;br /&gt;
== Round 2 comments == &lt;br /&gt;
&lt;br /&gt;
Please insert a link to the previous (2012) version of the page.&lt;br /&gt;
&lt;br /&gt;
The Cluster Computers table has most of the information in the Specifications column.  Consider splitting it into multiple columns, e.g., processor chip, interconnect, OS.&lt;br /&gt;
&lt;br /&gt;
Last time, we talked about looking at trends from Top500.org.  You could include graphs (being careful to cite the source!) and discuss what they show.&lt;br /&gt;
&lt;br /&gt;
== Other comments ==&lt;br /&gt;
&lt;br /&gt;
Observe trends, pulling data from Top500&lt;br /&gt;
Architecture statistics&lt;br /&gt;
-mpp, constellations gone 2007&lt;br /&gt;
Number of cores&lt;br /&gt;
Interconnects&lt;br /&gt;
&lt;br /&gt;
Of interest? IBM Blue Gene/Q design uses 18-core processor, 16 computer cores, 1 OS core, 1 non-functional manufacturing spare.&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79282</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79282"/>
		<updated>2013-10-04T00:16:19Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count&lt;br /&gt;
&amp;lt;ref name=&amp;quot;transcount&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
 |      title = Transistor Count&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intelspecs&amp;quot;&amp;gt;http://ark.intel.com/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://ark.intel.com/&lt;br /&gt;
 |      title = Intel Processor Specifications&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
With the increasing sophistication of processors and limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instructions could be broken down into a large pipeline. The larger pipeline allowed big performance gains with Instruction Level Parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time. This would be implemented in a single core, with each stage of the pipeline being executed in each clock cycle. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the level of performance of parallel computers after only a few years. This inhibited adoption of multi-processor systems since single-processor systems achieved relative performance while being less costly. Over time, the effort to gain improvements from ILP began to have diminishing returns. Once branch prediction had a success rate of 90%, there was little room for further improvement. In single-processor systems, the primary way of increasing performance was to increase the clock speed. As clock speeds increase, power consumption also increases.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn toward on-chip multi-processors (i.e. multi-core architectures). These systems allowed task parallelism in addition to ILP. For example, one processor can simultaneously execute multiple tasks and each core can use ILP with pipelining. Driven by the performance gains of multi-processors, the amount of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing up to 16-core processors.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multi-core processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computers is high-availability. These types of clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to route traffic when a node fails.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that its data is protected from corruption.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011, the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as the number one system in November 2012, with performance thirty-four percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
Graphical trends for super computers 2008-2013:&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]]&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]]&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]]&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]]&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79228</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79228"/>
		<updated>2013-10-01T16:21:49Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count&lt;br /&gt;
&amp;lt;ref name=&amp;quot;transcount&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
 |      title = Transistor Count&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
&amp;lt;ref name=&amp;quot;intelspecs&amp;quot;&amp;gt;http://ark.intel.com/&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://ark.intel.com/&lt;br /&gt;
 |      title = Intel Processor Specifications&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
With the increasing sophistication of processors and limitations of Silicon on Chip designs, design efforts shifted to parallelism. Instructions could be broken down into a large pipeline. The larger pipeline allowed big performance gains with Instruction Level Parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time. This would be implemented in a single core, with each stage of the pipeline being executed in each clock cycle. By the 1970s, the gains from ILP were significant enough to allow uni-processor systems to reach the level of performance of parallel computers after only a few years. This inhibited adoption of multi-processors as single processor systems achieved relative performance while being less costly. Of course, the performance gains of ILP were soon limited. Once branch prediction had a success rate of 90%, there was little room for further improvement. At this point, the main way of increasing performance was to increase the clock speed. This also meant more power consumption.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allowed task parallelism in addition to ILP. For example, one processor can execute multiple tasks simultaneously, and each core can use ILP with pipelining. Driven by the gains of multi-processors, the amount of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing up to 16-core processors.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computers is high-availability. These types of clusters operate with redundant nodes to minimize downtime when components fail. Such a system uses automated load-balancing algorithms to route traffic when a node fails.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that its data is protected from corruption.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011, the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as the number one system in November 2012, with performance thirty-four percent greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
Graphical trends for super computers 2008-2013:&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]]&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]]&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]]&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]]&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79223</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=79223"/>
		<updated>2013-10-01T16:06:50Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: added basic layout for reference structure&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by [http://en.wikipedia.org/wiki/Moore%27s_law Moore's law]), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count&lt;br /&gt;
&amp;lt;ref name=&amp;quot;transcount&amp;quot;&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
{{cite web&lt;br /&gt;
 |        url = http://en.wikipedia.org/wiki/Transistor_count&lt;br /&gt;
 |      title = Transistor Count&lt;br /&gt;
 |      last1 = &lt;br /&gt;
 |     first1 = &lt;br /&gt;
 |    middle1 = &lt;br /&gt;
 |      last2 = &lt;br /&gt;
 |     first2 = &lt;br /&gt;
 |    middle2 = &lt;br /&gt;
 |   location = &lt;br /&gt;
 |       date = &lt;br /&gt;
 | accessdate = October 1, 2013&lt;br /&gt;
 |  separator = ,&lt;br /&gt;
 }}&lt;br /&gt;
&amp;lt;/ref&amp;gt;&lt;br /&gt;
. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz from 1971. By 2011, the high end clock speed of a processor was in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
With the sophistication of processors and increasing clock speeds, effort was placed on parallelism. The high clock speed could be broken down into a large pipeline; this large pipeline allowed big performance gains with instruction level parallelism (ILP). Instruction level parallelism is the act of executing multiple instructions at the same time. This would be implemented in a single core, with each stage of the pipeline being executed in each clock cycle. By the 1970s the gains from ILP were significant enough to allow uni-processor systems to reach the level of performance in parallel computers after only a few years. This inhibited adoption of multi-processors as it was costly and not needed. Of course, the performance gains of ILP were soon limited. Once branch prediction had a success rate of 90%, there was little room for further improvement. At this point, the main way of increasing performance was to increase the clock speed. This also meant more power consumption.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP progressed, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allowed task parallelism in addition to ILP. For example, one processor can execute multiple tasks simultaneously, and each core can use ILP with pipelining. Driven by the gains of multi-processors, the amount of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors. For servers, AMD was producing up to 16-core processors.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Core Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to route traffic away from a failed node.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that their data is protected from corruption.&lt;br /&gt;
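&lt;br /&gt;
The sketch below is a deliberately simplified, hypothetical illustration (the node names and the heartbeat check are made up): a monitor pass marks nodes dead when their health check fails, and the balancer then routes new requests only to nodes still marked alive.&lt;br /&gt;
&lt;br /&gt;
 /* Hypothetical sketch of failover-aware round-robin routing. */&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 struct node { const char *name; int alive; };&lt;br /&gt;
 &lt;br /&gt;
 /* Stub: a real check would ping the node or probe its service port. */&lt;br /&gt;
 static int heartbeat_ok(const struct node *n) { return n-&amp;gt;alive; }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     struct node nodes[] = { {&amp;quot;node-a&amp;quot;, 1}, {&amp;quot;node-b&amp;quot;, 0}, {&amp;quot;node-c&amp;quot;, 1} };&lt;br /&gt;
     int n = 3, next = 0;&lt;br /&gt;
     for (int i = 0; i &amp;lt; n; i++)          /* monitor pass: refresh liveness */&lt;br /&gt;
         nodes[i].alive = heartbeat_ok(&amp;amp;nodes[i]);&lt;br /&gt;
     for (int r = 0; r &amp;lt; 5; r++) {        /* route around the failed node */&lt;br /&gt;
         while (!nodes[next % n].alive) next++;&lt;br /&gt;
         printf(&amp;quot;request %d -&amp;gt; %s\n&amp;quot;, r, nodes[next % n].name);&lt;br /&gt;
         next++;&lt;br /&gt;
     }&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;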
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 88,128 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu.  Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance thirty-four percent greater than its predecessor. In June 2013, Tianhe-2 displaced Titan with roughly a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in their architectures. As of 2013, according to Top500.org data, the trend has been towards processors with a high number of cores (eight or more), and most systems use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
Graphical trends for super computers 2008-2013:&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]]&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]]&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]]&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]]&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development of mobile processors. This category of processors is specifically designed for low power use. To conserve power, these processors use dynamic frequency scaling, which allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
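&lt;br /&gt;
As a small sketch assuming a Linux system that exposes the standard cpufreq sysfs interface (an assumption, not something stated in the original text), the snippet below reads CPU 0's current frequency-scaling governor and clock frequency, which is one way dynamic frequency scaling can be observed in practice.&lt;br /&gt;
&lt;br /&gt;
 /* Assumes the Linux cpufreq sysfs files exist for cpu0. */&lt;br /&gt;
 #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
 &lt;br /&gt;
 static void show(const char *path) {&lt;br /&gt;
     char buf[64];&lt;br /&gt;
     FILE *f = fopen(path, &amp;quot;r&amp;quot;);&lt;br /&gt;
     if (f &amp;amp;&amp;amp; fgets(buf, sizeof buf, f))&lt;br /&gt;
         printf(&amp;quot;%s: %s&amp;quot;, path, buf);   /* sysfs values end with a newline */&lt;br /&gt;
     if (f) fclose(f);&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 int main(void) {&lt;br /&gt;
     show(&amp;quot;/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor&amp;quot;);&lt;br /&gt;
     show(&amp;quot;/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq&amp;quot;);&lt;br /&gt;
     return 0;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;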
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| 0.5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_vendors.jpg&amp;diff=79222</id>
		<title>File:Top500 vendors.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_vendors.jpg&amp;diff=79222"/>
		<updated>2013-10-01T15:48:08Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78919</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78919"/>
		<updated>2013-09-24T22:31:55Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Edited from http://wiki.expertiza.ncsu.edu/index.php/Chapter_1:_Nick_Nicholls,_Albert_Chu&lt;br /&gt;
&lt;br /&gt;
Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise, if a bit slower since 2006.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz in 1971. By 2011, the high end clock speed of a processor is in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
As processors grew more sophisticated and clock speeds increased, effort was placed on parallelism. Execution could be broken down into a deep pipeline, and this deep pipeline allowed big performance gains through instruction-level parallelism (ILP). Instruction-level parallelism is the act of executing multiple instructions at the same time; it is implemented within a single core, with every stage of the pipeline active in each clock cycle. By the 1970s the gains from ILP were significant enough that uni-processor systems could reach the performance of parallel computers after only a few years, which inhibited the adoption of multi-processors, since they were costly and not needed. Eventually, though, the performance gains from ILP ran out: once branch prediction reached a success rate of about 90%, there was little room for further improvement. At that point, the main way of increasing performance was to raise the clock speed, which also meant more power consumption.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP mounted, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP: one chip can execute multiple tasks simultaneously, and each core can still exploit ILP through pipelining. Driven by these gains, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing server processors with up to 16 cores.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to route traffic away from a failed node.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that their data is protected from corruption.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu.  Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance thirty-four percent greater than its predecessor. In June 2013, Tianhe-2 displaced Titan with roughly a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
Graphical trends for super computers 2008-2013:&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]]&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]]&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]]&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]]&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78918</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78918"/>
		<updated>2013-09-24T22:26:16Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Trends */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise, if a bit slower since 2006.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz in 1971. By 2011, the high end clock speed of a processor is in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
As processors grew more sophisticated and clock speeds increased, effort was placed on parallelism. Execution could be broken down into a deep pipeline, and this deep pipeline allowed big performance gains through instruction-level parallelism (ILP). Instruction-level parallelism is the act of executing multiple instructions at the same time; it is implemented within a single core, with every stage of the pipeline active in each clock cycle. By the 1970s the gains from ILP were significant enough that uni-processor systems could reach the performance of parallel computers after only a few years, which inhibited the adoption of multi-processors, since they were costly and not needed. Eventually, though, the performance gains from ILP ran out: once branch prediction reached a success rate of about 90%, there was little room for further improvement. At that point, the main way of increasing performance was to raise the clock speed, which also meant more power consumption.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP mounted, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP: one chip can execute multiple tasks simultaneously, and each core can still exploit ILP through pipelining. Driven by these gains, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing server processors with up to 16 cores.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to route traffic away from a failed node.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that their data is protected from corruption.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu.  Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance thirty-four percent greater than its predecessor. In June 2013, Tianhe-2 displaced Titan with roughly a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
Since 2008, super computers have trended towards using multi-core processors in the architecture. As of 2013, according to Top500.org data, trends have been to use processors with a high number of cores, eight or more. Most use computing nodes with multiple multi-core CPUs.&lt;br /&gt;
&lt;br /&gt;
Graphical trends for super computers 2008-2013:&lt;br /&gt;
* [[Media:Top500_cores-per-socket.png|Top500.org Cores per socket]]&lt;br /&gt;
* [[Media:Top500_cores-per-socket-performance.png|Top500.org Performance for cores per socket]]&lt;br /&gt;
* [[Media:Top500 interconnect-family.png|Top500.org Interconnects used for super computers]]&lt;br /&gt;
* [[Media:Top500 vendors.png|Top500.org Vendor trends of super computers]]&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_vendors.png&amp;diff=78916</id>
		<title>File:Top500 vendors.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_vendors.png&amp;diff=78916"/>
		<updated>2013-09-24T22:14:12Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: Top500.org Trend 2008-2013
Vendors of super computers&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Top500.org Trend 2008-2013&lt;br /&gt;
Vendors of super computers&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_interconnect-family.png&amp;diff=78915</id>
		<title>File:Top500 interconnect-family.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_interconnect-family.png&amp;diff=78915"/>
		<updated>2013-09-24T22:13:54Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: Top500.org Trend 2008-2013
Interconnects used&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Top500.org Trend 2008-2013&lt;br /&gt;
Interconnects used&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_cores-per-socket-performance.png&amp;diff=78914</id>
		<title>File:Top500 cores-per-socket-performance.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_cores-per-socket-performance.png&amp;diff=78914"/>
		<updated>2013-09-24T22:13:36Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: Top500.org Trend 2008-2013
Performance of cores per processor&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Top500.org Trend 2008-2013&lt;br /&gt;
Performance of cores per processor&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_cores-per-socket.png&amp;diff=78913</id>
		<title>File:Top500 cores-per-socket.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Top500_cores-per-socket.png&amp;diff=78913"/>
		<updated>2013-09-24T22:12:51Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: Top500.org Trend 2008-2013
Number of cores per processor&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Top500.org Trend 2008-2013&lt;br /&gt;
Number of cores per processor&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78909</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78909"/>
		<updated>2013-09-24T21:49:00Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Cluster Computers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise, if a bit slower since 2006.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz in 1971. By 2011, the high end clock speed of a processor is in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
====Evolution of Intel Processors====&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
|-&lt;br /&gt;
! From&lt;br /&gt;
! Procs&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
As processors grew more sophisticated and clock speeds increased, effort was placed on parallelism. Execution could be broken down into a deep pipeline, and this deep pipeline allowed big performance gains through instruction-level parallelism (ILP). Instruction-level parallelism is the act of executing multiple instructions at the same time; it is implemented within a single core, with every stage of the pipeline active in each clock cycle. By the 1970s the gains from ILP were significant enough that uni-processor systems could reach the performance of parallel computers after only a few years, which inhibited the adoption of multi-processors, since they were costly and not needed. Eventually, though, the performance gains from ILP ran out: once branch prediction reached a success rate of about 90%, there was little room for further improvement. At that point, the main way of increasing performance was to raise the clock speed, which also meant more power consumption.&lt;br /&gt;
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP mounted, manufacturers began to turn towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP: one chip can execute multiple tasks simultaneously, and each core can still exploit ILP through pipelining. Driven by these gains, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing server processors with up to 16 cores.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Clock Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed super computers. These systems take advantage of the power of individual processors, and combine them to create a powerful unified system.  Originally, cluster computers only used uniprocessors, but have since adopted the use of multi-processors.  Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to route traffic away from a failed node.  In order to function, high-availability clusters must be able to check and change the status of running applications.  The applications must also use shared storage, while operating in a way such that their data is protected from corruption.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in NM, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 96 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon IvyBridge per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest super computer was Japan's K Computer, a cluster computer built by Fujitsu.  Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance 34% greater than its predecessor. In June 2013, Tianhe-2 displaced Titan with roughly a one-hundred percent increase in performance.&lt;br /&gt;
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 35 W&lt;br /&gt;
| .5W-1.9W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78905</id>
		<title>Main Page/CSC 456 Fall 2013/1a bc</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Main_Page/CSC_456_Fall_2013/1a_bc&amp;diff=78905"/>
		<updated>2013-09-24T21:46:23Z</updated>

		<summary type="html">&lt;p&gt;Cmbeverl: /* Cluster Computers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Since 2006, parallel computers have continued to evolve.  Besides the increasing number of transistors (as predicted by Moore's law), other designs and architectures have increased in prominence.  These include Chip Multi-Processors, cluster computing, and mobile processors.&lt;br /&gt;
&lt;br /&gt;
==Transistor Count==&lt;br /&gt;
At the most fundamental level of parallel computing development is the transistor count. According to the text, since 1971 the number of transistors on a chip has increased from 2,300 to 167 million in 2006.  By 2011, the transistor count had further increased to 2.6 billion, a 1,130,434x increase from 1971.  The clock frequency has also continued to rise, if a bit slower since 2006.  In 2006, the clock speed was around 2.4GHz, 3,200 times the speed of 750KHz in 1971. By 2011, the high end clock speed of a processor is in the 3.3GHz range.&lt;br /&gt;
&lt;br /&gt;
===Evolution of Intel Processors===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.1: Evolution of Intel Processors&lt;br /&gt;
|-&lt;br /&gt;
! Year&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistors&lt;br /&gt;
! Specifications&lt;br /&gt;
! New Features&lt;br /&gt;
|-&lt;br /&gt;
| 2000&lt;br /&gt;
| Pentium IV&lt;br /&gt;
| 55 Million&lt;br /&gt;
| 1.4-3GHz&lt;br /&gt;
| hyper-pipelining, SMT&lt;br /&gt;
|-&lt;br /&gt;
| 2006 &lt;br /&gt;
| Xeon&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 64-bit, 2GHz, 4MB L2 cache on chip&lt;br /&gt;
| Dual core, virtualization support&lt;br /&gt;
|-&lt;br /&gt;
| 2007&lt;br /&gt;
| Core 2 Allendale&lt;br /&gt;
| 167 Million&lt;br /&gt;
| 1.8-2.6 GHz, 2MB L2 cache&lt;br /&gt;
| 2 CPUs on one die, Trusted Execution Technology&lt;br /&gt;
|-&lt;br /&gt;
| 2008&lt;br /&gt;
| Xeon&lt;br /&gt;
| 820 Million&lt;br /&gt;
| 2.5-2.83 GHz, 6MB L3 cache&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| 2009&lt;br /&gt;
| Core i7 Lynnfield&lt;br /&gt;
| 774 Million&lt;br /&gt;
| 2.66-2.93 GHz, 8MB L3 cache&lt;br /&gt;
| 2-channel DDR3&lt;br /&gt;
|-&lt;br /&gt;
| 2010&lt;br /&gt;
| Core i7 Gulftown&lt;br /&gt;
| 1.17 Billion&lt;br /&gt;
| 3.2 GHz&lt;br /&gt;
| 32 nm&lt;br /&gt;
|-&lt;br /&gt;
| 2011&lt;br /&gt;
| Core i7 Sandy Bridge EP4&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 3.2-3.3 GHz, 32 KB L1 cache per core, 256 KB L2 cache, 20 MB L3 cache&lt;br /&gt;
| Up to 8 cores&lt;br /&gt;
|-&lt;br /&gt;
|2012&lt;br /&gt;
| Core i7 Ivy Bridge&lt;br /&gt;
| 1.2 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| 22 nm, 3D Tri-gate transistors&lt;br /&gt;
|-&lt;br /&gt;
|2013&lt;br /&gt;
| Core Haswell&lt;br /&gt;
| 1.4 Billion&lt;br /&gt;
| 2.5-3.7 GHz&lt;br /&gt;
| Fully integrated voltage regulator&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Chip Multi-Processors==&lt;br /&gt;
&lt;br /&gt;
As processors grew more sophisticated and clock speeds increased, effort was placed on parallelism. High clock speeds were achieved by breaking execution into a deep pipeline, and that pipeline enabled large performance gains through instruction-level parallelism (ILP), the execution of multiple instructions at the same time. ILP is exploited within a single core, with different instructions occupying different pipeline stages in each clock cycle. For years, the gains from ILP were significant enough that a uni-processor system could reach the performance of a parallel computer after only a few years, which inhibited the adoption of multi-processors, since they were costly and not needed. The performance gains from ILP were eventually limited, however: once branch prediction reached a success rate of about 90%, there was little room for further improvement. At that point, the main way to increase performance was to raise the clock speed, which also meant more power consumption.&lt;br /&gt;
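&lt;br /&gt;
As a rough illustration of ILP (a sketch, not tied to any particular processor), the two functions below sum the same array: the first forms one long dependence chain, while the second keeps four independent accumulators that an out-of-order superscalar core can add in parallel.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#define N 1000000&lt;br /&gt;
&lt;br /&gt;
/* One long dependence chain: each add must wait for the previous result. */&lt;br /&gt;
double sum_chain(const double *a) {&lt;br /&gt;
    double s = 0.0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
        s += a[i];&lt;br /&gt;
    return s;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
/* Four independent accumulators: several adds can be in flight at once. */&lt;br /&gt;
double sum_ilp(const double *a) {&lt;br /&gt;
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i += 4) {&lt;br /&gt;
        s0 += a[i];&lt;br /&gt;
        s1 += a[i + 1];&lt;br /&gt;
        s2 += a[i + 2];&lt;br /&gt;
        s3 += a[i + 3];&lt;br /&gt;
    }&lt;br /&gt;
    return s0 + s1 + s2 + s3;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
    double *a = malloc(N * sizeof *a);&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
        a[i] = 1.0;&lt;br /&gt;
    printf(&amp;quot;chain: %.0f  ilp: %.0f\n&amp;quot;, sum_chain(a), sum_ilp(a));&lt;br /&gt;
    free(a);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;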
&lt;br /&gt;
As the diminishing returns and power inefficiencies of ILP grew, manufacturers turned towards chip multi-processors (i.e. multicore architectures). These systems allow task parallelism in addition to ILP: a single chip can execute multiple tasks simultaneously, one per core, while each core still exploits ILP through pipelining. Driven by these gains, the number of cores on a chip has continued to increase since 2006. By 2011, Intel and IBM were producing 8-core processors, and AMD was producing up to 16-core processors for servers.&lt;br /&gt;
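&lt;br /&gt;
A minimal sketch of this combination, assuming OpenMP as the programming API (compiled with, e.g., gcc -fopenmp): the loop iterations are divided among the cores, while each core's own work still benefits from pipelining and ILP.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;omp.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#define N 4000000&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
    double *a = malloc(N * sizeof *a);&lt;br /&gt;
    double sum = 0.0;&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
        a[i] = 0.5;&lt;br /&gt;
&lt;br /&gt;
    /* Task parallelism: the iteration space is split across the cores. */&lt;br /&gt;
    #pragma omp parallel for reduction(+:sum)&lt;br /&gt;
    for (int i = 0; i &amp;lt; N; i++)&lt;br /&gt;
        sum += a[i];&lt;br /&gt;
&lt;br /&gt;
    printf(&amp;quot;cores available: %d, sum = %.1f\n&amp;quot;, omp_get_max_threads(), sum);&lt;br /&gt;
    free(a);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;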
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Table 1.2: Examples of current multicore processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Sandy Bridge&lt;br /&gt;
! AMD Valencia&lt;br /&gt;
! IBM POWER7&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 4&lt;br /&gt;
| 8&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq.&lt;br /&gt;
| 3.5GHz&lt;br /&gt;
| 3.3GHz&lt;br /&gt;
| 3.55GHz&lt;br /&gt;
|-&lt;br /&gt;
! Core Type&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| OOO Superscalar&lt;br /&gt;
| SIMD&lt;br /&gt;
|-&lt;br /&gt;
! Caches&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 8MB L3&lt;br /&gt;
| 32MB L3&lt;br /&gt;
|-&lt;br /&gt;
! Chip Power&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 95 Watts&lt;br /&gt;
| 650 Watts for the whole system&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Cluster Computers==&lt;br /&gt;
The 1990s saw a rise in the use of cluster computers, or distributed supercomputers. These systems combine the power of many individual processors into a single, more powerful unified system. Originally, cluster computers used only uniprocessor nodes, but they have since adopted multi-processors. Unfortunately, the cost advantage mentioned by the book has largely dissipated, as many current implementations use expensive, high-end hardware.&lt;br /&gt;
&lt;br /&gt;
One of the newer innovations in cluster computing is high availability. These clusters operate with redundant nodes to minimize downtime when components fail, using automated load-balancing algorithms to reroute traffic away from a failed node. To function, a high-availability cluster must be able to check and change the status of running applications; the applications must also use shared storage and operate in a way that protects their data from corruption.&lt;br /&gt;
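&lt;br /&gt;
The failure-detection side of this can be sketched as a simple heartbeat check; the node names and timeout below are hypothetical, and real high-availability software does considerably more.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
#include &amp;lt;time.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#define TIMEOUT_SECS 5 /* hypothetical heartbeat timeout */&lt;br /&gt;
#define NODES 3&lt;br /&gt;
&lt;br /&gt;
struct node {&lt;br /&gt;
    const char *name;&lt;br /&gt;
    time_t last_heartbeat; /* when the node last reported in */&lt;br /&gt;
    int alive;&lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
/* Mark a node failed when its heartbeat is stale, so the load balancer&lt;br /&gt;
   stops routing new traffic to it. */&lt;br /&gt;
void check_heartbeats(struct node *nodes, int n, time_t now) {&lt;br /&gt;
    for (int i = 0; i &amp;lt; n; i++) {&lt;br /&gt;
        nodes[i].alive = (now - nodes[i].last_heartbeat) &amp;lt;= TIMEOUT_SECS;&lt;br /&gt;
        if (!nodes[i].alive)&lt;br /&gt;
            printf(&amp;quot;node %s failed: rerouting its traffic\n&amp;quot;, nodes[i].name);&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
    time_t now = time(NULL);&lt;br /&gt;
    struct node cluster[NODES] = {&lt;br /&gt;
        { &amp;quot;node-a&amp;quot;, now, 1 },      /* healthy */&lt;br /&gt;
        { &amp;quot;node-b&amp;quot;, now - 2, 1 },  /* healthy */&lt;br /&gt;
        { &amp;quot;node-c&amp;quot;, now - 30, 1 }  /* stale heartbeat, treated as failed */&lt;br /&gt;
    };&lt;br /&gt;
    check_heartbeats(cluster, NODES, now);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;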
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Top500.org Cluster computers 2008 - 2013&lt;br /&gt;
|-&lt;br /&gt;
! Date of #1 Rank&lt;br /&gt;
! Name&lt;br /&gt;
! Number of Cores/Nodes&lt;br /&gt;
! Specifications&lt;br /&gt;
! Peak Performance&lt;br /&gt;
! Power Usage&lt;br /&gt;
! Information&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2009 Jun&lt;br /&gt;
| Roadrunner&lt;br /&gt;
|&lt;br /&gt;
* 129,600 Cores&lt;br /&gt;
* 6,480 computing nodes &lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2210 2-core&lt;br /&gt;
* IBM PowerXCell8i 8+1 cores&lt;br /&gt;
* 104 Terabytes RAM&lt;br /&gt;
* Infiniband interconnect&lt;br /&gt;
* OS - RHEL and Fedora Linux&lt;br /&gt;
| 1.46 Petaflops&lt;br /&gt;
| 2.5 Megawatts&lt;br /&gt;
| Built by IBM, housed in New Mexico, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Jun&lt;br /&gt;
| Jaguar&lt;br /&gt;
|&lt;br /&gt;
* 224,162 Cores&lt;br /&gt;
* 18,688 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron 2435 6-core&lt;br /&gt;
* AMD Opteron 1354 4-core&lt;br /&gt;
* 360 Terabytes RAM&lt;br /&gt;
* Cray Seastar2+, Infiniband interconnects&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 2.33 Petaflops&lt;br /&gt;
| 7.0 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2010 Nov&lt;br /&gt;
| Tianhe-1A&lt;br /&gt;
|&lt;br /&gt;
* 186,368 Cores&lt;br /&gt;
* 7,168 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Xeon X5670 6-core CPUs per node&lt;br /&gt;
* 1 Nvidia M2050 GPU per node&lt;br /&gt;
* 262 Terabytes RAM&lt;br /&gt;
* Arch interconnect (NUDT)&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 4.7 Petaflops&lt;br /&gt;
| 4.0 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2011 Nov&lt;br /&gt;
| K Computer&lt;br /&gt;
|&lt;br /&gt;
* 705,024 Cores&lt;br /&gt;
* 88,128 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 2.0GHz 8-core SPARC64 VIIIfx&lt;br /&gt;
* 6 I/O nodes&lt;br /&gt;
* Using Message Passing Interface &lt;br /&gt;
* Tofu 6-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 11.28 Petaflops&lt;br /&gt;
| 9.89 Megawatts&lt;br /&gt;
| Built by Fujitsu, Housed in Japan&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Jun&lt;br /&gt;
| Sequoia&lt;br /&gt;
|&lt;br /&gt;
* 1,572,864 Cores&lt;br /&gt;
* 98,304 computing nodes&lt;br /&gt;
|&lt;br /&gt;
* 16-core PowerPC A2, Blue Gene/Q&lt;br /&gt;
* 1.5 Petabytes RAM&lt;br /&gt;
* 5-dimensional torus interconnect&lt;br /&gt;
* OS - Linux variant&lt;br /&gt;
| 20.13 Petaflops&lt;br /&gt;
| 7.9 Megawatts&lt;br /&gt;
| Built by IBM, Housed in California, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2012 Nov&lt;br /&gt;
| Titan&lt;br /&gt;
|&lt;br /&gt;
* 560,640 computing cores&lt;br /&gt;
|&lt;br /&gt;
* AMD Opteron CPUs&lt;br /&gt;
* Nvidia Tesla GPUs&lt;br /&gt;
* 693 Terabytes RAM (CPU + GPU)&lt;br /&gt;
* Cray Gemini interconnect&lt;br /&gt;
* OS - Cray Linux&lt;br /&gt;
| 27.11 Petaflops&lt;br /&gt;
| 8.2 Megawatts&lt;br /&gt;
| Built by Cray, housed in Tennessee, US&lt;br /&gt;
|- valign=&amp;quot;top&amp;quot;&lt;br /&gt;
| 2013 Jun&lt;br /&gt;
| Tianhe-2&lt;br /&gt;
|&lt;br /&gt;
* 3,120,000 Cores&lt;br /&gt;
* 16,000 nodes&lt;br /&gt;
|&lt;br /&gt;
* 2 Intel Xeon Ivy Bridge CPUs per node&lt;br /&gt;
* 3 Intel Xeon Phi per node&lt;br /&gt;
* 1.34 Petabytes RAM&lt;br /&gt;
* TH Express-2 fat tree topology (NUDT)&lt;br /&gt;
* OS - NUDT Kylin Linux&lt;br /&gt;
| 54.9 Petaflops&lt;br /&gt;
| 17.6 Megawatts&lt;br /&gt;
| Built by NUDT, China&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Trends===&lt;br /&gt;
In 2011 the fastest supercomputer was Japan's K Computer, a cluster computer built by Fujitsu. Six months later, Sequoia replaced the K Computer as the top-ranking cluster computer with a peak performance of 20.13 petaflops, a seventy-eight percent increase. Titan replaced Sequoia as number one in November 2012, with performance roughly 34% greater than its predecessor. The June 2013 leader, Tianhe-2, displaced Titan with roughly a one-hundred percent increase in performance.&lt;br /&gt;
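&lt;br /&gt;
These percentages follow directly from the peak-performance column of the table above, as a quick check shows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
    /* peak performance in petaflops, from the table above */&lt;br /&gt;
    double k = 11.28, sequoia = 20.13, titan = 27.11, tianhe2 = 54.9;&lt;br /&gt;
    printf(&amp;quot;K to Sequoia: +%.0f%%\n&amp;quot;, (sequoia / k - 1.0) * 100.0);        /* ~78%  */&lt;br /&gt;
    printf(&amp;quot;Sequoia to Titan: +%.0f%%\n&amp;quot;, (titan / sequoia - 1.0) * 100.0); /* ~35%  */&lt;br /&gt;
    printf(&amp;quot;Titan to Tianhe-2: +%.0f%%\n&amp;quot;, (tianhe2 / titan - 1.0) * 100.0); /* ~103% */&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;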
&lt;br /&gt;
==Mobile Processors==&lt;br /&gt;
Due to the popularity of smart phones, there has been significant development on mobile processors. This category of processors has been specifically designed for low power use. To conserve power, these types of processors use dynamic frequency scaling. This technology allows the processor to run at varying clock frequencies based on the current load.&lt;br /&gt;
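&lt;br /&gt;
On Linux, the effect of dynamic frequency scaling can be observed through the cpufreq files in sysfs; the sketch below reads the current governor and frequency of CPU 0 (these are the standard cpufreq paths, though not every system exposes them).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
/* Print one line from a sysfs file, if it exists. */&lt;br /&gt;
static void show(const char *path, const char *label) {&lt;br /&gt;
    char buf[64];&lt;br /&gt;
    FILE *f = fopen(path, &amp;quot;r&amp;quot;);&lt;br /&gt;
    if (f == NULL)&lt;br /&gt;
        return;&lt;br /&gt;
    if (fgets(buf, sizeof buf, f))&lt;br /&gt;
        printf(&amp;quot;%s: %s&amp;quot;, label, buf); /* buf already ends with a newline */&lt;br /&gt;
    fclose(f);&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
int main(void) {&lt;br /&gt;
    /* the governor selects the scaling policy; cur_freq is the current clock in kHz */&lt;br /&gt;
    show(&amp;quot;/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor&amp;quot;, &amp;quot;governor&amp;quot;);&lt;br /&gt;
    show(&amp;quot;/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq&amp;quot;, &amp;quot;frequency (kHz)&amp;quot;);&lt;br /&gt;
    return 0;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;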
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Examples of current mobile processors&lt;br /&gt;
|-&lt;br /&gt;
! Aspects&lt;br /&gt;
! Intel Atom N2800&lt;br /&gt;
! ARM Cortex-A9&lt;br /&gt;
|-&lt;br /&gt;
! # Cores&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
! Clock Freq&lt;br /&gt;
| 1.86GHz&lt;br /&gt;
| 800MHz-2000MHz&lt;br /&gt;
|-&lt;br /&gt;
! Cache&lt;br /&gt;
| 1MB L2&lt;br /&gt;
| 4MB L2&lt;br /&gt;
|-&lt;br /&gt;
! Power&lt;br /&gt;
| 6.5 W&lt;br /&gt;
| 0.5 W - 1.9 W&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Sources==&lt;br /&gt;
&amp;lt;ol&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/Transistor_count&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/52220/Intel-Core-i3-2310M-Processor-%283M-Cache-2_10-GHz%29&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/news/intel-ivy-bridge-22nm-cpu-3d-transistor,14093.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5091/intel-core-i7-3960x-sandy-bridge-e-review-keeping-the-high-end-alive&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.chiplist.com/Intel_Core_2_Duo_E4xxx_series_processor_Allendale/tree3f-subsection--2249-/&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.pcper.com/reviews/Processors/Intel-Lynnfield-Core-i7-870-and-Core-i5-750-Processor-Review&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.intel.com/pressroom/kits/quickreffam.htm#Xeon&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.tomshardware.com/reviews/core-i7-980x-gulftown,2573-2.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.fujitsu.com/global/news/pr/archives/month/2011/20111102-02.html&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/61275&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.anandtech.com/show/5096/amd-releases-opteron-4200-valencia-and-6200-interlagos-series&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://www.arm.com/products/processors/cortex-a/cortex-a9.php&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://ark.intel.com/products/58917/Intel-Atom-Processor-N2800-(1M-Cache-1_86-GHz)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/SPARC64_VI#SPARC64_VIIIfx&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;http://en.wikipedia.org/wiki/High-availability_cluster&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;/div&gt;</summary>
		<author><name>Cmbeverl</name></author>
	</entry>
</feed>