CSC/ECE 506 Spring 2012/12a mt: Difference between revisions
No edit summary |
No edit summary |
||
(9 intermediate revisions by the same user not shown) | |||
Line 8: | Line 8: | ||
=='''Unpopular Topologies'''== | =='''Unpopular Topologies'''== | ||
Today, many in the parallel computer programming world have come to realize that depending on the type of task needed to be accomplished, certain interconnection networks are more useful than others. Furthermore, the networks that are most in favor today: use hundreds (if not thousands) of [http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12#Types_of_Network_Topologies nodes], utilize [http://en.wikipedia.org/wiki/Crossbar_switch crossbar switches] and sub-switch buffers, adapting routing that avoids [http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12#Deadlock deadlocks], etc. However, from the 1980s to the early 1990s, the mindset and the views on interconnection networks of programmers were different. | Today, many in the parallel computer programming world have come to realize that depending on the type of task needed to be accomplished, certain interconnection networks are more useful than others. Furthermore, the networks that are most in favor today: use hundreds (if not thousands) of [http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12#Types_of_Network_Topologies nodes], utilize [http://en.wikipedia.org/wiki/Crossbar_switch crossbar switches] and sub-switch buffers, adapting routing that avoids [http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12#Deadlock deadlocks], etc<ref>http://www.csm.ornl.gov/workshops/IAA-IC-Workshop-08/documents/wiki/dally_iaa_workshop_0708.pdf</ref>. However, from the 1980s to the early 1990s, the mindset and the views on interconnection networks of programmers were different. | ||
===Introduction to Interconnection Network Architecture=== | ===Introduction to Interconnection Network Architecture=== | ||
Line 29: | Line 29: | ||
===Brief History=== | ===Brief History=== | ||
Ever since inter-processor interconnection technologies were first introduced in the 1960s, many interconnection topologies fads have come and gone. By the 1980s, the topologies that were in high favor were [http://en.wikipedia.org/wiki/VLSI VLSI] ('''V'''ery-'''L'''arge-'''S'''cale '''I'''ntegration) | Ever since inter-processor interconnection technologies were first introduced in the 1960s, many interconnection topologies fads have come and gone. By the 1980s, the topologies that were in high favor were [http://en.wikipedia.org/wiki/VLSI VLSI] ('''V'''ery-'''L'''arge-'''S'''cale '''I'''ntegration) communication networks. VLSI systems are considered to be "wire-limited". "The cost of these systems is predominately that of connecting devices, and the performance is limited by the delay of these interconnections. Thus...the network must make efficient use of the available wire"<ref>Dally, William J. "Performance Analysis of k-ary n-cube Interconnection Networks." ''IEEE Transactions On Computers''. 39.6: (1990), 775-785. http://www.ai.mit.edu/projects/aries/papers/networks/k-ary_n-cube.pdf</ref>. | ||
Also, many topologies were created and suggested to be used during this time, but the one used most frequently was the ''k''-ary ''n''-cube, where there were ''n'' dimensions and ''k'' nodes in each dimension. In fact, most parallel (or concurrent) computers at the time utilized either the ''k''-ary ''n''-cube topology or those similar to it (i.e. mesh, ring, omega network, etc.). The following is the formula that shows the relationship between ''k'', ''n'', and the number of nodes in the network (N): | |||
N = k^n | |||
===The ''PROBLEM''=== | ===The ''PROBLEM''=== | ||
As mentioned previously, a VLSI network is wire-limited; as a result: network complexity is limited by wire delay, the program's run speed is limited by wire density, and the majority of the power is consumed by drive wires. Therefore, in order to achieve a low latency and high bandwidth communication, network architects have make sure that their designs are "wire-efficient". To do this, the wires need to be kept as short as possible in order to limit latency and diameter. Furthermore, when architects are brainstorming ideas for their networks, they have to ensure the physical organization (the wires) matches the logical organization (function of the program) so that it produces the most efficient and optimized results. | |||
However, "networks have traditionally been analyzed under the assumption of constant channel bandwidth"[3]. As it so happens, constant channel bandwidth favors high-dimensional networks; unfortunately, low-dimensional networks are far more efficient than high-dimensional ones under VLSI. The reason is due to VLSI's wire-limited nature. As more processors (and subsequently more dimensions) are added to the network, the length and number of wires needed increase. Due to the number of wires increasing, the number of [http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2012/12a_mt#Glossary links] necessary for messages to cross to reach their destination increase, thereby increasing the latency. Furthermore, the increase in length of the wires means that more materials are needed to build the network, thereby increasing the cost. By the early 1990s, network architects began to abandon VLSI networks like the 'k''-ary ''n''-cube, and began to research networks that would be more favorable to high-dimensional networks. | |||
=='''Recent Studies'''== | =='''Recent Studies'''== | ||
Line 43: | Line 49: | ||
====Background==== | ====Background==== | ||
Among the recent developments of various multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many of its attractive properties: regularity, symmetry, small diameter, strong connectivity, recursive construction, | Among the recent developments of various multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many of its attractive properties: regularity, symmetry, small diameter, strong connectivity, recursive construction, partition ability, and relatively small link complexity. Additionally, there have been several variations of the Hypercube, including (but not limited to) the Varietal Hypercube (VH) and Balanced Hypercube (BH). The VH has a lower degree, average distance, and cost when compared to several other versions of Hypercube like the Folded Hypercube, Twisted Hypercube, Crossed Hypercube, etc. On the other hand, the BH has proven to be better in terms of fault tolerance (which improves system reliability as the number of processors increase) compared to the regular Hypercube. | ||
In interconnection network design, some of the most important performance parameters to consider are: communication efficiency, reliability, fault tolerance, and cost effectiveness. Thus, in order to meet the demands for such a network with a large number of processors, a new design that combined the basic structures of the VH and BH was designed known simply as the Balanced Varietal Hypercube (BVH). It "has got a reduced diameter, optimal average distance with less cost", and the inherited merits of fault tolerance from the BH. The following exhibits the BVH's topological properties for an n-dimensional BVH: | In interconnection network design, some of the most important performance parameters to consider are: communication efficiency, reliability, fault tolerance, and cost effectiveness. Thus, in order to meet the demands for such a network with a large number of processors, a new design that combined the basic structures of the VH and BH was designed known simply as the Balanced Varietal Hypercube (BVH). It "has got a reduced diameter, optimal average distance with less cost", and the inherited merits of fault tolerance from the BH. The following exhibits the BVH's topological properties for an n-dimensional BVH: | ||
Line 64: | Line 70: | ||
===Omega Network=== | ===Omega Network=== | ||
A crucial design attribute of a large network system is its reliability. When the system size grows, the probability of having all system nodes fault free during a given operation period falls quickly and could reach an unacceptable low point. It is, thus, necessary to incorporate redundancy into the system design to guarantee proper continuous operation even after the failure of some nodes | A crucial design attribute of a large network system is its reliability. When the system size grows, the probability of having all system nodes fault free during a given operation period falls quickly and could reach an unacceptable low point. It is, thus, necessary to "incorporate redundancy into the system design to guarantee proper continuous operation even after the failure of some nodes"<ref>Bataineh, Sameer, and Ghassan E. Quanzu'a. "Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems." The Computer Journal. 46.5 (2003): 467-475. Print. http://comjnl.oxfordjournals.org/content/46/5/467.abstract</ref>. | ||
====Background==== | ====Background==== | ||
Line 101: | Line 107: | ||
<references></references> | <references></references> | ||
===See Also=== | |||
http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12 | |||
Line 116: | Line 126: | ||
2) What is a more simplified way to calculate the ''cost'' of a | 2) What is a more simplified way to calculate the ''cost'' of a Balanced Varietal Hypercube? | ||
a) 3n | a) 3n | ||
Line 125: | Line 135: | ||
d) 2n^2 + 2n | d) 2n^2 + 2n | ||
3) True or False: The current FTΩ can tolerate more than one node malfunction at a time. | 3) True or False: The current FTΩ can tolerate more than one node malfunction at a time. | ||
a) True | |||
b) False | |||
4) How many nodes are in a binary 10-node cube? (Hint: the prefix ''bi-'' in Latin means "two". | |||
a) 20 | |||
b) 100 | |||
c) 1024 | |||
d) 2048 | |||
5) If there are 2187 nodes in a tertiary (''k'' = 3) cube, what is the value of ''n''? | |||
a) 7 | |||
b) 8 | |||
c) 9 | |||
d) 10 | |||
6) For review, which of the following is '''NOT''' one of the three ingredients necessary for an efficient shared memory multiprocessor? | |||
a) synchronization primitives | |||
b) cache coherence | |||
c) interconnection network | |||
d) memory consistency | |||
7) When were inter-processor interconnection technologies were ''first'' introduced? | |||
a) 1950s | |||
b) 1960s | |||
c) 1970s | |||
d) 1980s | |||
8) Which of the following is '''NOT''' a type of Hypercube? | |||
a) Flexed | |||
b) Twisted | |||
c) Crossed | |||
d) Folded | |||
9) Why is it important for a interconnection network to have ''high'' bandwidth communication? | |||
a) It isn't important, only achieving low latency is important. | |||
b) It means the network is fast. | |||
c) It means the network is correct. | |||
d) It means the network is efficient. | |||
10) True or False: The reason the VLSI computer networks were abandoned was due to their "wire-limited" characteristic? | |||
a) True | |||
b) False | |||
==='''''Answer Key'''''=== | ==='''''Answer Key'''''=== | ||
1) ''d''; 2) ''b''; | 1) ''d''; 2) ''b''; 3) ''b''; 4) ''c''; 5) ''a''; 6) ''c''; 7) ''b''; 8) ''a''; 9) ''d''; 10) ''a''; |
Latest revision as of 02:55, 26 April 2012
Interconnection Network Topologies
Introduction
In a multiprocessor environment, the terms interconnection network usually refers to the links between multiple independent processors. There are several different network topologies that are selected based upon the unique characteristics for which a system requires. For example, in a shared memory multiprocessor, messages generally are: short, frequent, and make it hard for processors to hide the message communication delay; therefore, shared memory multiprocessors prefer interconnection networks that have low latency and high bandwidth. While latency and bandwidth are two very important factors an interconnection network, there are plenty of other factors to consider: coherence protocol, memory consistency, communication protocols, etc. In order to better understand the unique aspects and characteristics that are considered for an interconnection network, we will discuss certain examples of interconnection topologies that were researched then abandoned, as well as recent developments in networks for large scale multiprocessor systems.
Unpopular Topologies
Today, many in the parallel computer programming world have come to realize that depending on the type of task needed to be accomplished, certain interconnection networks are more useful than others. Furthermore, the networks that are most in favor today: use hundreds (if not thousands) of nodes, utilize crossbar switches and sub-switch buffers, adapting routing that avoids deadlocks, etc<ref>http://www.csm.ornl.gov/workshops/IAA-IC-Workshop-08/documents/wiki/dally_iaa_workshop_0708.pdf</ref>. However, from the 1980s to the early 1990s, the mindset and the views on interconnection networks of programmers were different.
Introduction to Interconnection Network Architecture
In order to better understand the reason behind the different viewpoints and change of opinions in the 1990s it is better to understand a bit about interconnection architecture. By now you should know (at least I hope you do) about the three key elements necessary to build an efficient shared memory multiprocessor (SMP): cache coherence, memory consistency, and synchronization primitives (i.e. barrier, locks, etc). Assuming you know all about these by now (and if you don't, you can look it up later), then you should also know that for a SMP to be efficient, it must also be able to pass messages between the multiple processors. With only a small number of processors, a shared bus is all that is required; but as I mentioned earlier, nowadays programmers want to utilize thousands of processors at a time, so a shared bus would be infeasible. To that end, the purpose of the interconnection network is to create a reliable communication network between all the processors; in other words, an interconnection network "glues multiple processors together to form a shared multiprocessor system"<ref>Solihin, Yan. Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems. Raleigh: Solihin Books, 2008-2009. Print.</ref>.
There are plenty of concepts one should familiarize with to better understand a network. But for the sake of focus, I will only concentrate on the ones that describe important metrics which programmers consider when designing the network's topology (the rest are in the Glossary below).
- Bandwidth - a rate of measure depicting the available or consumed data communication resources.
- Latency - the amount of time needed to send a message from the 'sender' node to the 'receiver' node.
- Diameter - the longest distance between any pair of nodes in a network.
- Average Distance - the sum of distances between all pairs of nodes in a network divided by the number of node pairs.
- Bisection Bandwidth - the minimum number of links that must be cut in order to partition the network into two equal halves.
- Degree - the number of in/out links connecting to each node (or to be more specific, the node's router).
The above terms are metrics that describe and evaluate the characteristics of an interconnection network topology. The first two are the most important, as the goal of any network topology design is to achieve low latency and high bandwidth communication between nodes. To clarify, the former means that messages should be delivered as quickly as possible, while the latter means that messages should be delivered as efficiently as possible.
Brief History
Ever since inter-processor interconnection technologies were first introduced in the 1960s, many interconnection topologies fads have come and gone. By the 1980s, the topologies that were in high favor were VLSI (Very-Large-Scale Integration) communication networks. VLSI systems are considered to be "wire-limited". "The cost of these systems is predominately that of connecting devices, and the performance is limited by the delay of these interconnections. Thus...the network must make efficient use of the available wire"<ref>Dally, William J. "Performance Analysis of k-ary n-cube Interconnection Networks." IEEE Transactions On Computers. 39.6: (1990), 775-785. http://www.ai.mit.edu/projects/aries/papers/networks/k-ary_n-cube.pdf</ref>.
Also, many topologies were created and suggested to be used during this time, but the one used most frequently was the k-ary n-cube, where there were n dimensions and k nodes in each dimension. In fact, most parallel (or concurrent) computers at the time utilized either the k-ary n-cube topology or those similar to it (i.e. mesh, ring, omega network, etc.). The following is the formula that shows the relationship between k, n, and the number of nodes in the network (N):
N = k^n
The PROBLEM
As mentioned previously, a VLSI network is wire-limited; as a result: network complexity is limited by wire delay, the program's run speed is limited by wire density, and the majority of the power is consumed by drive wires. Therefore, in order to achieve a low latency and high bandwidth communication, network architects have make sure that their designs are "wire-efficient". To do this, the wires need to be kept as short as possible in order to limit latency and diameter. Furthermore, when architects are brainstorming ideas for their networks, they have to ensure the physical organization (the wires) matches the logical organization (function of the program) so that it produces the most efficient and optimized results.
However, "networks have traditionally been analyzed under the assumption of constant channel bandwidth"[3]. As it so happens, constant channel bandwidth favors high-dimensional networks; unfortunately, low-dimensional networks are far more efficient than high-dimensional ones under VLSI. The reason is due to VLSI's wire-limited nature. As more processors (and subsequently more dimensions) are added to the network, the length and number of wires needed increase. Due to the number of wires increasing, the number of links necessary for messages to cross to reach their destination increase, thereby increasing the latency. Furthermore, the increase in length of the wires means that more materials are needed to build the network, thereby increasing the cost. By the early 1990s, network architects began to abandon VLSI networks like the 'k-ary n-cube, and began to research networks that would be more favorable to high-dimensional networks.
Recent Studies
As I mentioned earlier, the views on preferred interconnection topologies and technologies have changed significantly over the past couple of decades. The following shows a couple of examples of recent endeavors made in the field of interconnection technology to strive for the most optimal network available:
Balanced Varietal Hypercube<ref>Tripathy, C. R., and N. Adhikari. "On A New Multicomputer Interconnection Topology For Massively Parallel Systems." International Journal of Distributed and Parallel Systems. 2.4 (2011): 162-178. Print. http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf</ref>
Background
Among the recent developments of various multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many of its attractive properties: regularity, symmetry, small diameter, strong connectivity, recursive construction, partition ability, and relatively small link complexity. Additionally, there have been several variations of the Hypercube, including (but not limited to) the Varietal Hypercube (VH) and Balanced Hypercube (BH). The VH has a lower degree, average distance, and cost when compared to several other versions of Hypercube like the Folded Hypercube, Twisted Hypercube, Crossed Hypercube, etc. On the other hand, the BH has proven to be better in terms of fault tolerance (which improves system reliability as the number of processors increase) compared to the regular Hypercube.
In interconnection network design, some of the most important performance parameters to consider are: communication efficiency, reliability, fault tolerance, and cost effectiveness. Thus, in order to meet the demands for such a network with a large number of processors, a new design that combined the basic structures of the VH and BH was designed known simply as the Balanced Varietal Hypercube (BVH). It "has got a reduced diameter, optimal average distance with less cost", and the inherited merits of fault tolerance from the BH. The following exhibits the BVH's topological properties for an n-dimensional BVH:
Parameters
- Degree = 2n
- Number of Nodes = 2^2n
- Number of Edges = n*2^2n
- Diameter: if (n=1), 2; if (n>1), n + n/2
- Average distance: (1/(2^2n))* sum of all distances
- Cost = degree * diameter = [2n * (n + n/2)]
Performance
Based on tests and comparisons to other hypercube networks, the BVH has certainly proven to be a compromise between the two extremes of BH and VH. For instance, in terms of topological parameters like diameter and cost, VH has the least, BH has the most, and BVH patterns out between the two. Furthermore, in terms of reliability and cost effectiveness, BVH is far more efficient than BH.
Omega Network
A crucial design attribute of a large network system is its reliability. When the system size grows, the probability of having all system nodes fault free during a given operation period falls quickly and could reach an unacceptable low point. It is, thus, necessary to "incorporate redundancy into the system design to guarantee proper continuous operation even after the failure of some nodes"<ref>Bataineh, Sameer, and Ghassan E. Quanzu'a. "Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems." The Computer Journal. 46.5 (2003): 467-475. Print. http://comjnl.oxfordjournals.org/content/46/5/467.abstract</ref>.
Background
In real-time applications, fault tolerance and reliability are said to be two of the most important factors to consider. As a result, the main focus of the Omega Network's design was to be able to maintain its network structure despite any faults that may occur.
Description
In order to maintain the network's structure, there are additional, redundant nodes and links added to the structure to act as back-ups for the primary nodes. For instance, in the proposed design, a control switch is added between nodes and four additional switches are added to each node. In a fault-free state, everything works as it should and the nodes are considered to be in the "X" state. However, in the event of a node malfunction, the node switches to the "V" state and the four switches are used to redirect traffic flow away from the node toward its replacement. Its replacement is quite probably one of the extra nodes. Due to the presence of the extra links and control switches, this particular network is known as the FTΩ (fault-tolerance omega).
Each node has four links attached to them: two for heading forward away from the node, and two coming from behind into the node. Furthermore, the FTΩ design only has one extra node per primary node. As a result, a data path for this network can tolerate one fault, but no more than that. To be specific, when a node malfunction occurs, a node replaces it via an extra link to transfer the traffic; however, extra links are limited, and when one is in use, this makes it impossible for certain adjacent (or nearby) nodes to be used in case of any subsequent faults; these unusable nodes are called critical nodes. In order to improve the current design, it would have to incorporate another set of extra nodes.
Glossary
Term | Definition |
---|---|
node | A unit containing a processor(s), its cache, memory, communication assist, interface logic, and router. |
router | The connector that links up with other nodes (via their routers). |
link | the physical wire between two individual routers. |
unidirectional | the data travels only in ONE direction on link. |
bidirectional | the data can travel in BOTH directions on a link. |
topology | the shape of the network (i.e. mesh, ring, tree, butterfly, etc). |
network hops | the number of links a message must travel to in order to reach its destination. |
References
<references></references>
See Also
http://expertiza.csc.ncsu.edu/wiki/index.php/CSC/ECE_506_Spring_2011/ch12
Pop Quiz
1) If a Balanced Varietal Hypercube has 4 dimensions (n = 4), than how many nodes and edges will the network have, respectively?
a) 8, 512
b) 8, 1024
c) 256, 512
d) 256, 1024
2) What is a more simplified way to calculate the cost of a Balanced Varietal Hypercube?
a) 3n
b) 3n^2
c) 2n^2 + n
d) 2n^2 + 2n
3) True or False: The current FTΩ can tolerate more than one node malfunction at a time.
a) True
b) False
4) How many nodes are in a binary 10-node cube? (Hint: the prefix bi- in Latin means "two".
a) 20
b) 100
c) 1024
d) 2048
5) If there are 2187 nodes in a tertiary (k = 3) cube, what is the value of n?
a) 7
b) 8
c) 9
d) 10
6) For review, which of the following is NOT one of the three ingredients necessary for an efficient shared memory multiprocessor?
a) synchronization primitives
b) cache coherence
c) interconnection network
d) memory consistency
7) When were inter-processor interconnection technologies were first introduced?
a) 1950s
b) 1960s
c) 1970s
d) 1980s
8) Which of the following is NOT a type of Hypercube?
a) Flexed
b) Twisted
c) Crossed
d) Folded
9) Why is it important for a interconnection network to have high bandwidth communication?
a) It isn't important, only achieving low latency is important.
b) It means the network is fast.
c) It means the network is correct.
d) It means the network is efficient.
10) True or False: The reason the VLSI computer networks were abandoned was due to their "wire-limited" characteristic?
a) True
b) False
Answer Key
1) d; 2) b; 3) b; 4) c; 5) a; 6) c; 7) b; 8) a; 9) d; 10) a;