CSC 456 Fall 2013/4a bc: Difference between revisions
(51 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
=Load Balancing= | =Load Balancing= | ||
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be ''statically'' balanced. Dividing the work load up during run-time is ''dynamically'' balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors. | |||
==Static | ==Static vs. Dynamic Techniques== | ||
===Static Load balancing=== | ==='''Static Load balancing'''=== | ||
====Round Robin==== | |||
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind. | |||
====Random==== | |||
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good "random" values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing. | |||
====Central Manager==== | |||
Central manager is a load balancing scheme which selects a certain processor to act as the "central node", which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle. | |||
===Dynamic Load Balancing=== | ===Dynamic Load Balancing=== | ||
====Local Queue==== | |||
Under local queue workload management, also called distributed workload management, each processor is responsible for maintaining a sufficient workload. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor workload manager to send work. The remote load manager receiving the request examines its own workload and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their workload and can still manage workloads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory workload at all processors. | |||
====Central Queue==== | |||
A centralized workload manager is responsible for distributing workload to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until there enough work is available to meet the request. In systems with large numbers of processors, clusters can be formed of groups of processors with each cluster have a centralized workload manager. One workload manager would be in charge of distributing workloads to each cluster workload manager. This scheme has a lower fault tolerance as the system can be at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing of its central load manager were to stop functioning. | |||
==Comparisons of Static versus Dynamic== | |||
{| class="wikitable" | |||
|+ Table: Comparison of Load Balancing Algorithms<ref name="complb">http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf | |||
{{cite web | |||
| url = http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf | |||
| title = Performance Analysis of Load Balancing Algorithms | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate November 19, 2013 | |||
| separator = , | |||
}} | |||
</ref> | |||
|- | |||
! Parameters | |||
! Round Robin | |||
! Random | |||
! Central Manager | |||
! Local Queue | |||
! Central Queue | |||
|- | |||
| Dynamic/Static | |||
| Static | |||
| Static | |||
| Static | |||
| Dynamic | |||
| Dynamic | |||
|- | |||
| Overload Rejection | |||
| No | |||
| No | |||
| No | |||
| Yes | |||
| Yes | |||
|- | |||
| Fault Tolerant | |||
| No | |||
| No | |||
| Yes | |||
| Yes | |||
| Yes | |||
|- | |||
| Forecasting Accuracy | |||
| More | |||
| More | |||
| More | |||
| Less | |||
| Less | |||
|- | |||
| Stability | |||
| Large | |||
| Large | |||
| Large | |||
| Small | |||
| Small | |||
|- | |||
| Centralized/Decentralized | |||
| D | |||
| D | |||
| C | |||
| D | |||
| C | |||
|- | |||
| Cooperative | |||
| No | |||
| No | |||
| Yes | |||
| Yes | |||
| Yes | |||
|- | |||
| Process Migration | |||
| No | |||
| No | |||
| No | |||
| Yes | |||
| No | |||
|- | |||
| Resource Utilization | |||
| Less | |||
| Less | |||
| Less | |||
| More | |||
| Less | |||
|} | |||
==Real World applications of Load Balancing== | ==Real World applications of Load Balancing== | ||
====Weather Modeling==== | |||
Loading balancing methods play a large role in weather modeling as the amount of data that needs to be processed is quite large and computationally intensive. Many models construct their own data structures and use variations on static and dynamic load balancing to achieve satisfactory performance. | |||
[http://wwwpub.zih.tu-dresden.de/~mlieber/publications/para10web.pdf Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4 ] | |||
====Visible Human Project==== | |||
==Examples of Load Balancing in action== | ==Examples of Load Balancing in action== | ||
Server Load balancing pseudocode | Server Load balancing pseudocode <ref name="pseudocode">http://code.google.com/p/hypertable/wiki/LoadBalancing | ||
{{cite web | |||
| url = http://code.google.com/p/hypertable/wiki/LoadBalancing | |||
| title = Load Balancing Design | |||
| last1 = | |||
| first1 = | |||
| middle1 = | |||
| last2 = | |||
| first2 = | |||
| middle2 = | |||
| location = | |||
| date = | |||
| accessdate December 28, 2010 | |||
| separator = , | |||
}} | |||
</ref> | |||
<code> | <code> | ||
server_load_vec_desc = sort_descending(server_load_vec); | //sort the load data and store it in different orders for use later | ||
server_load_vec_asc = sort_ascending(server_load_vec); | server_load_vec_desc = sort_descending(server_load_vec); | ||
while (server_load_vec_desc[0].deviation > DEVIATION_THRESHOLD) { | server_load_vec_asc = sort_ascending(server_load_vec); | ||
| |||
//While the deviation is too high, iterate through the nodes | |||
while (server_load_vec_desc[0].deviation > DEVIATION_THRESHOLD) | |||
{ | |||
//get the tasks for node [0], and sort them | |||
populate_range_load_vector(server_load_vec_desc[0].server_name); | |||
sort descending range_load_vec; | |||
| |||
i=0; | |||
//iterates through the past load data for this node | |||
while (server_load_vec_desc[0].deviation > DEVIATION_THRESHOLD && i < range_load_vec.size()) | |||
{ | |||
| |||
//If a given swap results in a lesser deviation | |||
if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation) | |||
{ | |||
//swap and update load balance data related to the load swap | |||
add range_load_vec[i] to balance plan | |||
partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate; | |||
} | server_load_vec_desc[0].loadavg -= partial_deviation; | ||
server_load_vec_desc[0].deviation -= partial_deviation; | |||
server_load_vec_asc[0].loadavg += partial_deviation; | |||
server_load_vec_asc[0].deviation += partial_deviation; | |||
server_load_vec_asc = sort_ascending(server_load_vec_asc); | |||
} | |||
i++; | |||
} | |||
| |||
//if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed | |||
if (i == range_load_vec.size()) | |||
{ | |||
remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc | |||
} | |||
| |||
//re-balance the load before iterating again on the next node | |||
server_load_vec_desc = sort_descending(server_load_vec_desc); | |||
} | |||
</code> | </code> | ||
==Sources== | |||
====References==== | |||
<references/> | |||
====Other Sources==== | |||
<ol> | |||
<li>[http://paper.ijcsns.org/07_book/201006/20100619.pdf A Guide to Dynamic Load Balancing in Distributed Computer Systems] </li> | |||
<li>[http://www.ics.uci.edu/~cs237/reading/parallel.pdf Strategies for Dynamic Load Balancing on Highly Parallel Computers] </li> | |||
<li>[http://www.vsrdjournals.com/CSIT/Issue/2013_05_May/Web/1_Jagdeep_Singh_1670_Research_Article_VSRDIJCSIT_May_2013.docx Simulation of Static Load Balancing Algorithms on Homogeneous and Heterogeneous CPUs ] </li> | |||
<li>[http://masters.donntu.edu.ua/2010/fknt/babkin/library/article11.pdf Performance Analysis of Load Balancing Algorithms]</li> | |||
</ol> |
Latest revision as of 17:15, 19 November 2013
Load Balancing
In multi-processor systems, load-balancing is used to break up and distribute the work load to individual processors in order to make effective use of processor time. When the work load is divided up at compile-time, the balance is said to be statically balanced. Dividing the work load up during run-time is dynamically balancing the load. Static load balancing has reduced overhead as the work is divided before run time. Dynamic load balancing assigns work as processors become idle, so there is greater overhead. However, dynamic balancing can lead to improved performance of load balancing due to being able to assign work to a processor when it does become idle, reducing the overall idle time of processors.
Static vs. Dynamic Techniques
Static Load balancing
Round Robin
Round robin is a load balancing technique which evenly distributes tasks across available processors. Each processor is lined up, and given a task one after the other until it loops around again back to the first processor. Visualize a dealer in a casino passing out cards to each player in a circle, one at a time. The advantage is that this is a very simple load balancing technique to implement, with very little overhead. A disadvantage is that there is no care given to the job size or performance. This can create problems if a processor is unlucky and is continually assigned large tasks, causing it to fall behind.
Random
Random load balancing relies on the hope that over the course of enough time, workloads are evenly spread by random chance. Random is fairly easy to implement with little overhead. Generating good "random" values is one challenge, because the function is called so many times that any bias will have a large effect. Random suffers from the same drawbacks as round robin though. There is always the chance that a certain processor is randomly picked in an unusually frequent fashion, leading to wait times for other processors. Random could also assign multiple large tasks to a single processor in a short period of time, which would also lead to uneven load balancing.
Central Manager
Central manager is a load balancing scheme which selects a certain processor to act as the "central node", which handles the balancing. The central node assigns each new task to the slave processor which currently has the least load. This method has a different overhead than usual. Before there would be intercommunication between all processors, where as with central load balancing, the communication exists solely between the central node and the other processors. A drawback of the Central Management is that it usually works best with smaller networks of processors. A hierarchy of master central nodes controlling lesser central nodes is possible, but adds more complexity. It is possible for a central control node to be inundated by messages from its children nodes, locking up the system and causing great drops in performance. The Central Manager policy has an advantage because it requires fewer messages to be sent in order to facilitate load balancing. This method also greatly reduces the chance that any one processor is overworked or left idle.
Dynamic Load Balancing
Local Queue
Under local queue workload management, also called distributed workload management, each processor is responsible for maintaining a sufficient workload. When a load drops below a threshold, the load manager for the processor fires off a request to another random processor workload manager to send work. The remote load manager receiving the request examines its own workload and, if it has sufficient extra work load, will send work to the requesting load manager. This algorithm scheme is fault tolerant in that if any processor were to fail, the other nodes would be able to continue working as they still have their workload and can still manage workloads with other processors. Unfortunately, this scheme generally requires a relatively large amount of inter-processor communications to maintain a satisfactory workload at all processors.
Central Queue
A centralized workload manager is responsible for distributing workload to processors under the central queue algorithm. The central manager is aware of all work to be distributed to the processors. When a processor's load falls below a threshold, a request for more work is sent to the central load manager, which then distributes more work. If there is not enough work in the central queue to meet the demand, the request is buffered until there enough work is available to meet the request. In systems with large numbers of processors, clusters can be formed of groups of processors with each cluster have a centralized workload manager. One workload manager would be in charge of distributing workloads to each cluster workload manager. This scheme has a lower fault tolerance as the system can be at risk of being brought down if the central load manager were to stop working. Also, an entire cluster could stop producing of its central load manager were to stop functioning.
Comparisons of Static versus Dynamic
Parameters | Round Robin | Random | Central Manager | Local Queue | Central Queue |
---|---|---|---|---|---|
Dynamic/Static | Static | Static | Static | Dynamic | Dynamic |
Overload Rejection | No | No | No | Yes | Yes |
Fault Tolerant | No | No | Yes | Yes | Yes |
Forecasting Accuracy | More | More | More | Less | Less |
Stability | Large | Large | Large | Small | Small |
Centralized/Decentralized | D | D | C | D | C |
Cooperative | No | No | Yes | Yes | Yes |
Process Migration | No | No | No | Yes | No |
Resource Utilization | Less | Less | Less | More | Less |
Real World applications of Load Balancing
Weather Modeling
Loading balancing methods play a large role in weather modeling as the amount of data that needs to be processed is quite large and computationally intensive. Many models construct their own data structures and use variations on static and dynamic load balancing to achieve satisfactory performance.
Highly Scalable Dynamic Load Balancing in the Atmospheric Modeling System COSMO-SPECS+FD4
Visible Human Project
Examples of Load Balancing in action
Server Load balancing pseudocode <ref name="pseudocode">http://code.google.com/p/hypertable/wiki/LoadBalancing
</ref>
//sort the load data and store it in different orders for use later
server_load_vec_desc = sort_descending(server_load_vec);
server_load_vec_asc = sort_ascending(server_load_vec);
//While the deviation is too high, iterate through the nodes
while (server_load_vec_desc[0].deviation > DEVIATION_THRESHOLD)
{
//get the tasks for node [0], and sort them
populate_range_load_vector(server_load_vec_desc[0].server_name);
sort descending range_load_vec;
i=0;
//iterates through the past load data for this node
while (server_load_vec_desc[0].deviation > DEVIATION_THRESHOLD && i < range_load_vec.size())
{
//If a given swap results in a lesser deviation
if (moving range_load_vec[i] from server_load_vec_desc[0] to server_load_vec_asc[0] reduces deviation)
{
//swap and update load balance data related to the load swap
add range_load_vec[i] to balance plan
partial_deviation = range_load_vec[i].loadestimate * loadavg_per_loadestimate;
server_load_vec_desc[0].loadavg -= partial_deviation;
server_load_vec_desc[0].deviation -= partial_deviation;
server_load_vec_asc[0].loadavg += partial_deviation;
server_load_vec_asc[0].deviation += partial_deviation;
server_load_vec_asc = sort_ascending(server_load_vec_asc);
}
i++;
}
//if true, then the entire load has been processed for this node, and entry [0] which is the current node can be removed
if (i == range_load_vec.size())
{
remove server_load_vec_desc[0] and corresponding entry in server_load_vec_asc
}
//re-balance the load before iterating again on the next node
server_load_vec_desc = sort_descending(server_load_vec_desc);
}
Sources
References
<references/>