<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jjohn</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jjohn"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Jjohn"/>
	<updated>2026-04-16T19:25:13Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62721</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62721"/>
		<updated>2012-04-25T19:08:03Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in supercomputing, overcoming technological barriers to reach high levels of performance. The most efficient way to achieve parallelism is to employ a multicomputer system. The success of a multicomputer system relies completely on the underlying interconnection network, which provides the communication medium among the processors and determines the overall performance of the system in terms of execution speed and efficiency. The suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting, throughput and ease of implementation. Among recent developments in multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to its many attractive properties: regularity, symmetry, small diameter, strong connectivity, recursive construction, partitionability and relatively small link complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it copes with failures.&lt;br /&gt;
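This definition can be checked by brute force on a small network: delete every candidate set of nodes and test whether what remains stays connected. The sketch below (the helper names are our own, not from any cited paper) does this for the 3-dimensional hypercube, whose node connectivity equals its degree of 3.&lt;br /&gt;

```python
from itertools import combinations

def cube_neighbors(v, n):
    # In an n-cube, neighbours differ in exactly one bit.
    return [v ^ (1 << i) for i in range(n)]

def is_connected(nodes, adj):
    # Depth-first search restricted to the surviving nodes.
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def node_connectivity(n):
    # Smallest k such that removing some k nodes partitions the n-cube.
    all_nodes = range(2 ** n)
    adj = {v: cube_neighbors(v, n) for v in all_nodes}
    for k in range(1, 2 ** n - 1):
        for removed in combinations(all_nodes, k):
            rest = set(all_nodes) - set(removed)
            if len(rest) >= 2 and not is_connected(rest, adj):
                return k

print(node_connectivity(3))  # 3: the three neighbours of any node form a minimum cut
```

Removing fewer than three nodes never disconnects the 3-cube, which is why its connectivity matches its degree.&lt;br /&gt;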
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e. the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
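Since the diameter is a worst-case shortest-path length, it can be computed with a breadth-first search from every node. The following sketch (function names are illustrative) confirms that a 4-dimensional hypercube with 16 nodes has diameter 4.&lt;br /&gt;

```python
from collections import deque

def diameter(adj):
    # Maximum over all source nodes of the farthest BFS distance.
    def eccentricity(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        return max(dist.values())
    return max(eccentricity(v) for v in adj)

n = 4
cube = {v: [v ^ (1 << i) for i in range(n)] for v in range(2 ** n)}
print(diameter(cube))  # 4: one hop per differing bit, n bits in the worst case
```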
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
Narrowness is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, with Na and Nb processors respectively, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
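For a small network the narrowness can be computed exactly by enumerating every two-way partition, as the definition prescribes. In this sketch (names are our own) a 4-node ring has narrowness 1.0, reached when the ring is split into two adjacent pairs.&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    # Maximise Nb / I over all partitions (A, B) with Nb <= Na.
    nodes = list(nodes)
    best = 0.0
    for size_b in range(1, len(nodes) // 2 + 1):
        for group_b in combinations(nodes, size_b):
            b = set(group_b)
            crossing = sum(1 for u, v in edges if (u in b) != (v in b))
            if crossing:  # skip partitions with no crossing links (disconnected graph)
                best = max(best, len(b) / crossing)
    return best

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring))  # 1.0: B = {0, 1} has 2 nodes and 2 outgoing links
```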
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube, i.e. 8 nodes.&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. The 2^n nodes of the hypercube are each uniquely represented by a binary sequence bn-1 bn-2 ... b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; Journal of Computers, Vol. 3, No. 10, October 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
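The adjacency rule has a convenient bit-level form: two labels are adjacent exactly when their XOR is a power of two, and the shortest-path length between any two nodes is their Hamming distance. A brief sketch:&lt;br /&gt;

```python
def adjacent(u, v):
    # Labels differ in exactly one bit iff u XOR v is a power of two.
    x = u ^ v
    return x != 0 and (x & (x - 1)) == 0

def hops(u, v):
    # Shortest-path length equals the Hamming distance of the labels.
    return bin(u ^ v).count("1")

print(adjacent(0b000, 0b100), adjacent(0b000, 0b011))  # True False
print(hops(0b000, 0b111))  # 3, the diameter of the 3-cube
```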
&lt;br /&gt;
[[File:1d-cube.jpg|center|Fig 1.|Basic Hypercube Structures ]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses several desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of every node must change whenever the number of nodes grows: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 1, January 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC acts as a communication processor handling intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is thus a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Vol. 46, No. 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs: in a load balanced graph G = (V, E), with V the node set and E the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other passive. If node v fails, we simply shift the tasks of v to v’ by activating their copies on v’. All tasks running on other nodes need not be reassigned, and the adjacency property is preserved: two tasks that were adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we obtain a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if every node in G has a matching node, i.e. the two nodes have the same adjacent nodes; they are then called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
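The matching-pair condition can be tested mechanically. One caveat: "the same adjacent nodes" is read here as the two neighbourhoods agreeing once each node is removed from the other's neighbour set, which is our interpretation, but it reproduces the claims above: the complete graph K4 is load balanced while the ordinary 3-cube is not.&lt;br /&gt;

```python
def is_load_balanced(adj):
    # adj: dict node -> set of neighbours. Every node v needs a match w
    # whose neighbourhood agrees with v's (ignoring v and w themselves).
    for v in adj:
        if not any(w != v and adj[v] - {w} == adj[w] - {v} for w in adj):
            return False
    return True

k4 = {v: {w for w in range(4) if w != v} for v in range(4)}   # complete graph on 4 nodes
q3 = {v: {v ^ (1 << i) for i in range(3)} for v in range(8)}  # ordinary 3-cube
print(is_load_balanced(k4), is_load_balanced(q3))  # True False
```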
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects to 2n other nodes, which are of two types, (a) inner nodes and (b) outer nodes, through hyperlinks &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;On a New Multicomputer Interconnection Topology for Massively Parallel Systems&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;. The neighbours of a node &amp;lt;a0, a1, ..., an-1&amp;gt; are as follows.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0 = 0, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 = 1, 2 and ai = 0, 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: When a0 = 0, 1:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: When a0 = 2, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n*2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n = 1&lt;br /&gt;
  ii. ceil(n + n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance from node 0 is given by (1/2^2n) ∑k d(0, k).&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor for an interconnection network: the topology with minimum cost is treated as the best candidate. The cost factor of a network is the product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
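The counts above can be bundled into a small calculator; the formulas are taken directly from this section, and the function name is our own. For n = 2 it gives 16 nodes, degree 4, 32 edges, diameter 3 and cost 12.&lt;br /&gt;

```python
import math

def bvh_properties(n):
    # Properties of an n-dimensional balanced varietal hypercube (BVHn),
    # using the formulas stated in the text.
    degree = 2 * n
    diam = 2 * n if n == 1 else math.ceil(n + n / 2)
    return {
        "nodes": 2 ** (2 * n),
        "degree": degree,
        "edges": n * 2 ** (2 * n),
        "diameter": diam,
        "cost": degree * diam,   # cost = degree x diameter
    }

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```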
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing (quaternary) digit position; the digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A neighbour d of ur that is also a neighbour of vr is chosen so that ur = vr; in the next step d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r := rightmost digit position at which u and v differ&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
    route directly to v&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
    choose d such that the d-neighbour of u agrees with v in position r; route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
    choose d1 such that the d1-neighbour agrees with v in position r-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
    repeat until u = v&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming concurrent communication through all ports of each processor is possible, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u, n):&lt;br /&gt;
&lt;br /&gt;
Step 1: u sends the message to its 2n neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 2: one of these 2n nodes sends the message to its 2n-1 uninformed neighbours; then n of the remaining nodes send the message to their 2n-2 uninformed neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue as in step 2 until all nodes have the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
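The BVH broadcast above follows the same informed-nodes-forward pattern as the classic recursive-doubling broadcast on an ordinary hypercube. The sketch below simulates that classic hypercube version (not the BVH-specific step schedule) to show how every node is informed in n steps.&lt;br /&gt;

```python
def hypercube_broadcast(source, n):
    # Step i: every informed node forwards the message across dimension i,
    # doubling the informed set each step.
    informed = {source}
    for i in range(n):
        informed |= {v ^ (1 << i) for v in informed}
    return informed

reached = hypercube_broadcast(0, 4)
print(len(reached))  # 16: all 2^4 nodes informed after 4 steps
```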
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing. Since every node has a fixed size, the system can easily be scaled to larger configurations. The omega network belongs to the class of multistage interconnection networks (MINs). Because the omega network is designed for large systems, reliability is critical, so we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, Vol. 46, No. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor has size&lt;br /&gt;
*N = L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes in 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In the Figure 1 above ''L=8, N=32, n=3, and x=0-15'' and has a four bit representation&lt;br /&gt;
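Because σn moves bit bn to position 0 and shifts the remaining bits up, it is a one-bit left rotation of the (n+1)-bit label. A small sketch of the connection rule (variable names are ours):&lt;br /&gt;

```python
def sigma(x, n):
    # Left-rotate the (n+1)-bit label {bn,...,b1,b0} to {bn-1,...,b0,bn}.
    width = n + 1
    return ((x << 1) | (x >> n)) & ((1 << width) - 1)

L = 8                 # levels, as in Figure 1
n = 3                 # log2(L)
print(L * (n + 1))    # 32 switches: N = L x (log2(L) + 1)
print(sigma(0b1000, n), sigma(0b0101, n))  # 1 10
```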
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability, so that a faulted node can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows.&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node; that dual is in turn replaced by its own dual, and so on, until a spare node completes the chain of replacements.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if a second node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega demonstrates higher reliability over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
=K-Ary n-cube Interconnection networks=&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to examine why k-ary n-cube interconnection networks in high-dimensional form were never really suitable for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, Vol. 39, No. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  K-ary n-cube networks implemented in VLSI are communication limited rather than processing limited, meaning that performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better in low-dimensional form than in high-dimensional form, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on wire delay.  Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimensional networks because they have more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency versus dimension for varying numbers of nodes, demonstrating that low-dimensional networks provide lower latency than high-dimensional networks under a constant delay model.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and achieve lower latency, whereas the high-dimensional networks do not benefit from locality and are forced to carry longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 9 and 10 further demonstrate how latency increases with higher dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
Throughput is another metric of network performance: the total number of messages the network can handle per unit of time, best estimated by calculating the total capacity of the given network.  When traffic is actually applied, the latency of low-dimensional networks increases more slowly than that of high-dimensional networks.  Low-dimensional networks handle contention better because they use fewer channels at higher bandwidth, and so achieve better throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Hot-Spot Throughput===&lt;br /&gt;
Hot-spot throughput describes the situation where traffic is not uniform but is instead concentrated on a pair of nodes responsible for higher traffic.  Hot-spot traffic causes congestion and can hurt throughput.  As with uniform throughput, low-dimensional networks have higher bandwidth per channel and thus better hot-spot throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Conclusion===&lt;br /&gt;
Low-dimensional k-ary n-cube networks have lower latency, less contention, and higher hot-spot throughput than high-dimensional networks.  This demonstrates that the high-dimensional form of this network does not scale well and is unsuitable for larger machines.&lt;br /&gt;
&lt;br /&gt;
=Quiz=&lt;br /&gt;
'''Question 1 : Two nodes in the hypercube are adjacent if and only if they differ at how many bit positions?'''&lt;br /&gt;
    a) Two&lt;br /&gt;
    b) One&lt;br /&gt;
    c) Three&lt;br /&gt;
    d) Zero&lt;br /&gt;
&lt;br /&gt;
'''Question 2 : An n-dimensional balanced varietal hypercube consists of how many nodes?'''&lt;br /&gt;
    a) 2^n&lt;br /&gt;
    b) 2^n - 1&lt;br /&gt;
    c) 2^2n&lt;br /&gt;
    d) 2^2n - 1&lt;br /&gt;
&lt;br /&gt;
'''Question 3 :The maximum number of links that must be traversed to send a message to any node is the'''&lt;br /&gt;
    a) Network Narrowness&lt;br /&gt;
    b) Network Increments&lt;br /&gt;
    c) Network Diameter&lt;br /&gt;
    d) Network Connectivity&lt;br /&gt;
&lt;br /&gt;
'''Question 4 :It is difficult to design and fabricate the nodes of the hypercube because'''&lt;br /&gt;
    a) Large fan out&lt;br /&gt;
    b) Increased Heat Dissipation&lt;br /&gt;
    c) Lesser number of neighbours&lt;br /&gt;
    d) Small Diameter&lt;br /&gt;
&lt;br /&gt;
'''Question 5 : A good reason for choosing Balanced Hypercube is'''&lt;br /&gt;
    a) Large Diameter&lt;br /&gt;
    b) Fault Tolerance&lt;br /&gt;
    c) Increased Heat Dissipation&lt;br /&gt;
    d) Average Distance&lt;br /&gt;
&lt;br /&gt;
Solutions :&lt;br /&gt;
1-b&lt;br /&gt;
2-c&lt;br /&gt;
3-c&lt;br /&gt;
4-a&lt;br /&gt;
5-b&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62720</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62720"/>
		<updated>2012-04-25T19:06:39Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing ,broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B where the number of processors in each group is Na and Nb and assume Nb &amp;lt; = Na. Now count the number of interconnections between A and B call this I. Find the maximum value of Nb / I for all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high ( Nb &amp;gt; I) then if the group B processors want to send messages to group A congestion in the network will be high ( since there are fewer links than processors )&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3-dimensional hypercube, i.e. 8 nodes.&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
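&lt;br /&gt;
Because adjacency means differing in a single bit, routing amounts to correcting the differing bits one at a time (the XOR of the two labels marks them). A minimal Python sketch; the function names are illustrative, not from the cited paper:&lt;br /&gt;

```python
def neighbors(node, n):
    """Nodes adjacent to `node` in an n-dimensional hypercube: flip one bit."""
    return [node ^ (2 ** i) for i in range(n)]

def route(src, dst, n):
    """Greedy hypercube routing: correct differing bits from least significant up.
    The path length equals the Hamming distance between src and dst."""
    path = [src]
    cur = src
    for i in range(n):
        if ((cur ^ dst) // 2 ** i) % 2:   # bit i still differs from dst
            cur = cur ^ 2 ** i
            path.append(cur)
    return path

print(neighbors(0, 3))    # prints [1, 2, 4]
print(route(0, 5, 3))     # prints [0, 1, 5]
```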
&lt;br /&gt;
[[File:1d-cube.jpg|center|Fig 1.|Basic Hypercube Structures ]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. Each time the dimension of the hypercube is increased by one, another link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, with the end nodes of each missing edge marked by arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must be changed as the number of nodes grows: every node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules, each consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via their 2^k NC’s, forming a k-cube among the NC’s. The EH is thus a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and the EH can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46, Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embeddings; balanced hypercubes belong to a special type of load balanced graphs. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of those tasks in v’. The tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if every node in G has a matching node, i.e., the two nodes have the same adjacent nodes; they are then called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
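&lt;br /&gt;
The matching-pair condition can be tested directly on small graphs. In the sketch below the neighbour sets are compared ignoring the two candidate nodes themselves; that treatment of a possible edge between the pair is an assumption made for illustration:&lt;br /&gt;

```python
def is_load_balanced(adj):
    """True if every node v has a partner v2 whose neighbour set,
    ignoring v and v2 themselves, equals that of v (one reading of the
    matching-pair definition)."""
    for v in adj:
        found = False
        for v2 in adj:
            if v2 == v:
                continue
            if adj[v] - {v2} == adj[v2] - {v}:
                found = True
                break
        if not found:
            return False
    return True

# A complete graph on 4 nodes is load balanced ...
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
# ... while the 3-dimensional hypercube is not.
q3 = {v: {v ^ 2 ** i for i in range(3)} for v in range(8)}
print(is_load_balanced(k4), is_load_balanced(q3))   # prints True False
```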
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
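&lt;br /&gt;
The two inner-node cases can be written down directly. In this hedged sketch a node is a tuple (a0, a1, ..., an-1) of digits taken modulo 4, which is one interpretation of the notation above:&lt;br /&gt;

```python
def inner_neighbors(node):
    """Inner neighbours of a balanced-varietal-hypercube node (a0, a1, ..., an-1):
    only a0 changes, modulo 4, following the two cases above."""
    a0 = node[0]
    rest = node[1:]
    if a0 % 2 == 0:                      # Case I: a0 even
        return [((a0 + 1) % 4,) + rest, ((a0 - 2) % 4,) + rest]
    return [((a0 - 1) % 4,) + rest, ((a0 + 2) % 4,) + rest]   # Case II: a0 odd

print(inner_neighbors((0, 1)))   # prints [(1, 1), (2, 1)]
print(inner_neighbors((3, 1)))   # prints [(2, 1), (1, 1)]
```

Note that the relation is symmetric: each node appears in the inner-neighbour lists of its own inner neighbours.&lt;br /&gt;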
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0 = 0, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 = 1, 2 and ai = 0, 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: When a0 = 0, 1&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: When a0 = 2, 3&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n*2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network in practice better than the diameter. The average distance of the network is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) * Σ d(0, k), where the sum runs over all nodes k and 0 denotes the node (0, 0, ..., 0).&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
with the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie within the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A choice d is made such that the d-neighbour of ur is also a neighbour of vr, i.e., d is chosen so that ur = vr. In the next step d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that d(ur) = vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour; else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that d1(ur-1) = vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming that concurrent communication through all ports of each processor is possible, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its (2n-1) neighbours. Then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
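&lt;br /&gt;
Since the BVH-specific steps are only sketched above, the idea can be illustrated with the classic one-to-all broadcast on an ordinary hypercube, where each round doubles the set of informed nodes by sweeping one dimension (an analogue for illustration, not the BVH algorithm itself):&lt;br /&gt;

```python
def broadcast_rounds(n):
    """One-to-all broadcast on an n-dimensional hypercube by dimension sweeps:
    in round i, every informed node forwards across dimension i."""
    informed = {0}
    rounds = []
    for i in range(n):
        new = {u ^ 2 ** i for u in informed}
        informed = informed | new
        rounds.append(sorted(informed))
    return rounds

# After round i, 2**(i+1) nodes are informed; 3 rounds cover all 8 nodes.
for step in broadcast_rounds(3):
    print(step)
```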
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled up.  The omega network belongs to the class of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical; an enhanced, fault-tolerant version of the omega network is therefore discussed as well &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
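&lt;br /&gt;
The size formula is easy to check against Figure 1. A small Python sketch; the function name is illustrative:&lt;br /&gt;

```python
import math

def omega_switch_count(levels):
    """Total switches in the omega design: N = L * (log2(L) + 1)."""
    stages = int(math.log2(levels)) + 1
    return levels * stages

print(omega_switch_count(8))   # prints 32, matching the 32-node, 8-level Figure 1
```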
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number if the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges over ''0-15'', with a four-bit representation&lt;br /&gt;
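&lt;br /&gt;
The permutation ''σn(x)'' is a one-bit left rotation of the (n+1)-bit label, which can be sketched as:&lt;br /&gt;

```python
def sigma(x, n):
    """Shuffle link permutation: rotate the (n+1)-bit number
    x = (bn, bn-1, ..., b0) left by one, giving (bn-1, ..., b0, bn)."""
    top = x // 2 ** n               # the leading bit bn
    return (x % 2 ** n) * 2 + top   # shift the rest up and append bn

print(sigma(9, 3))   # prints 3: binary 1001 rotates to 0011
```

Applying the rotation n+1 times returns the original label, as expected for a cyclic shift.&lt;br /&gt;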
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
A more reliable design must ensure that the system maintains its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  The dual node is in turn replaced by its own dual node, until a spare node completes the replacement chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2(l % (L/2)) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can suffer one failure, after which the other nodes perform the Replacement procedure.  If another node then fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
===Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
=K-ary n-cube Interconnection Networks=&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to demonstrate why high-dimensional k-ary n-cube interconnection networks were not really suitable for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of k-ary n-cubes are communication limited rather than processing limited, which means that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better as low-dimensional networks than as high-dimensional networks, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimensional networks because they need more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency versus dimension for varying numbers of nodes.  It demonstrates that low-dimensional networks provide lower latency than high-dimensional networks under constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and therefore have lower latency, whereas the high-dimensional networks do not benefit from locality and are forced to use longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 9 and 10 further demonstrate how latency increases with higher dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
Throughput is another metric of network performance: the total number of messages the network can handle per unit of time.  It can best be estimated by calculating the total capacity of the given network.  The latency of low-dimensional networks increases more slowly under applied traffic than that of high-dimensional networks.  Low-dimensional networks handle contention better because they use fewer channels at higher bandwidth, and so achieve better throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Hot-Spot Throughput===&lt;br /&gt;
Hot-spot throughput describes the situation where traffic is not uniform but is instead concentrated in pairs of nodes responsible for higher traffic.  Hot-spot traffic causes congestion and can hurt throughput.  As with uniform throughput, low-dimensional networks have higher bandwidth per channel and thus better hot-spot throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Conclusion===&lt;br /&gt;
Low-dimensional k-ary n-cube networks have lower latency, less contention, and higher hot-spot throughput than high-dimensional networks.  This demonstrates why the high-dimensional variant is not suitable for larger machines: it does not scale well.&lt;br /&gt;
&lt;br /&gt;
=Quiz=&lt;br /&gt;
'''Question 1 : Two nodes in the hypercube are adjacent if and only if they differ at how many bit positions?'''&lt;br /&gt;
    a) Two&lt;br /&gt;
    b) One&lt;br /&gt;
    c) Three&lt;br /&gt;
    d) Zero&lt;br /&gt;
&lt;br /&gt;
'''Question 2 : An n-dimensional balanced varietal hypercube consists of how many nodes?'''&lt;br /&gt;
    a) 2^n&lt;br /&gt;
    b) 2^n - 1&lt;br /&gt;
    c) 2^2n&lt;br /&gt;
    d) 2^2n - 1&lt;br /&gt;
&lt;br /&gt;
'''Question 3 : The maximum number of links that must be traversed to send a message to any node is the'''&lt;br /&gt;
    a) Network Narrowness&lt;br /&gt;
    b) Network Increments&lt;br /&gt;
    c) Network Diameter&lt;br /&gt;
    d) Network Connectivity&lt;br /&gt;
&lt;br /&gt;
'''Question 4 :It is difficult to design and fabricate the nodes of the hypercube because'''&lt;br /&gt;
    a) Large fan out&lt;br /&gt;
    b) Increased Heat Dissipation&lt;br /&gt;
    c) Lesser number of neighbours&lt;br /&gt;
    d) Small Diameter&lt;br /&gt;
&lt;br /&gt;
'''Question 5 : A good reason for choosing Balanced Hypercube is'''&lt;br /&gt;
    a) Large Diameter&lt;br /&gt;
    b) Fault Tolerance&lt;br /&gt;
    c) Increased Heat Dissipation&lt;br /&gt;
    d) Average Distance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62715</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62715"/>
		<updated>2012-04-25T04:57:30Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network, i.e. its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e. the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The maximum value of Nb / I over all partitionings of the network is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3-dimensional hypercube, i.e. 8 nodes.&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg|center|Fig 1.|Basic Hypercube Structures ]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. Each time the dimension of the hypercube is increased by one, another link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, with the end nodes of each missing edge marked by arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must be changed as the number of nodes grows: every node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules, each consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via their 2^k NC’s, forming a k-cube among the NC’s. The EH is thus a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and the EH can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46, Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that belongs to a special class of load balanced graphs [10] and can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V the node set and E the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects, through hyperlinks, to 2n nodes of two types: (a) inner nodes and (b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;On a New Multicomputer Interconnection Topology for Massively Parallel Systems&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d(00...0, k), where the sum runs over all nodes k and d(00...0, k) is the distance from node 00...0 to node k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n * ceil(n + n/2).&lt;br /&gt;
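These closed-form properties can be tabulated with a short script (an illustrative sketch; the function name bvh_properties is my own, and the formulas are exactly those stated above):&lt;br /&gt;

```python
import math

def bvh_properties(n: int) -> dict:
    """Closed-form properties of the n-dimensional balanced varietal
    hypercube, taken directly from the formulas stated in the text."""
    nodes = 2 ** (2 * n)                     # 2^2n nodes
    degree = 2 * n                           # every node has 2n links
    edges = n * 2 ** (2 * n)                 # = nodes * degree / 2
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter                 # cost = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```

Note that edges = nodes * degree / 2, so the stated edge count n * 2^2n is consistent with the degree 2n.&lt;br /&gt;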
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm performs a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A digit d is chosen such that the d-neighbour of ur is also a neighbour of vr, making ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of u has ur = vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour; else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
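The scan-and-correct idea behind the procedure can be illustrated with a generic digit-correction routing sketch (hypothetical code; the real BVH step must pick the d-neighbour from the case analysis above, which this simplified version does not model):&lt;br /&gt;

```python
def greedy_route(u, v):
    """Illustrative digit-correction routing sketch: repeatedly fix
    the rightmost (highest-index) differing digit of the address,
    moving one step closer to the destination on every hop."""
    path = [tuple(u)]
    u = list(u)
    v = list(v)
    while u != v:
        # rightmost position where the addresses still differ
        r = max(i for i in range(len(u)) if u[i] != v[i])
        u[r] = v[r]        # step to a neighbour agreeing with v at r
        path.append(tuple(u))
    return path

print(greedy_route((1, 0, 2), (1, 3, 0)))
# [(1, 0, 2), (1, 0, 0), (1, 3, 0)]
```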
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u.&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end.&lt;br /&gt;
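A minimal sketch of the all-port broadcast pattern (illustrative only; broadcast_steps and the toy topology below are my own stand-ins, not the BVHn neighbour function):&lt;br /&gt;

```python
from collections import deque

def broadcast_steps(neighbors, source):
    """Generic one-to-all broadcast sketch: in each step, every
    informed node forwards the message to all of its uninformed
    neighbours (all-port model).  Returns the number of steps until
    every reachable node is informed.  `neighbors` maps each node to
    its adjacency list; a BVH neighbour function derived from the
    case analysis above would be plugged in here."""
    informed = {source}
    frontier = deque([source])
    steps = 0
    while frontier:
        next_frontier = deque()
        for node in frontier:
            for nb in neighbors[node]:
                if nb not in informed:
                    informed.add(nb)
                    next_frontier.append(nb)
        if next_frontier:
            steps += 1
        frontier = next_frontier
    return steps

# 2-dimensional hypercube as a stand-in topology (a 4-cycle)
cube2 = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(broadcast_steps(cube2, 0))  # 2 steps: {0} -> {0,1,2} -> all
```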
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing. Since every node in the system has a fixed size, the system can easily be scaled up to larger systems. The omega network belongs to the class of multistage interconnection networks (MINs). Because the omega network is designed for large systems, reliability is critical, so we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
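A quick check of the size formula (an illustrative helper; omega_switch_count is a name of my choosing):&lt;br /&gt;

```python
import math

def omega_switch_count(levels: int) -> int:
    """Total number of switches N in the omega design:
    N = L * (log2(L) + 1), i.e. L switches in each of the
    log2(L) + 1 stages."""
    stages = int(math.log2(levels)) + 1
    return levels * stages

# The Figure 1 example: L = 8 levels gives N = 32 switches
print(omega_switch_count(8))  # 32
```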
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges from 0 to 15, with a four-bit representation&lt;br /&gt;
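The stage-to-stage permutation ''σn'' is a one-position left rotation of an (n+1)-bit link number, which can be sketched as follows (illustrative helper; the name sigma is my own):&lt;br /&gt;

```python
def sigma(x: int, n: int) -> int:
    """One-position left rotation of the (n+1)-bit number x:
    {b_n, b_{n-1}, ..., b_1, b_0} becomes {b_{n-1}, ..., b_0, b_n}.
    Assumes x fits in n+1 bits."""
    msb = x // 2 ** n                    # extract b_n
    return (x * 2) % (2 ** (n + 1)) + msb  # shift left, wrap b_n to b_0

# Figure 1 parameters: n = 3, four-bit link numbers
print(sigma(0b1000, 3))  # 8 -> 0b0001 = 1 (the MSB wraps around)
print(sigma(0b0101, 3))  # 5 -> 0b1010 = 10
```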
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, we must ensure that the system maintains its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  That dual node is in turn replaced by its own dual node, and so on, until a spare node replaces the last node in the chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega demonstrates greater reliability over time than the original omega design, even with the added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to examine the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  k-ary n-cube networks implemented in VLSI are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better in low-dimensional configurations than in high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on wire delay.  Systems are therefore required to organize their nodes logically and physically to keep the wires as short as possible.  Networks with higher dimensions cost more due to the fact that there are more and longer wires, compared to low-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance, where it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs dimension with varying nodes.  Figure 8 demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with a constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and have lower latency.  The high-dimensional networks, however, do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 9 and 10 further demonstrate how latency increases with higher dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
Throughput is another metric of network performance: the total number of messages the network can handle per unit of time.  It can best be estimated by calculating the total capacity of the given network.  The latency of low-dimensional networks increases more slowly under applied traffic than that of high-dimensional networks.  Low-dimensional networks handle contention better because they use fewer channels at higher bandwidth, and so achieve better throughput performance than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Hot-Spot Throughput===&lt;br /&gt;
Hot-spot throughput describes the situation where traffic is not uniform but is instead concentrated in a pair of nodes responsible for higher traffic.  Hot-spot traffic causes congestion and can hurt throughput.  As with uniform throughput, low-dimensional networks have higher bandwidth and thus better hot-spot throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Conclusion===&lt;br /&gt;
Low-dimensional k-ary n-cube networks have lower latency, less contention, and higher hot-spot throughput than high-dimensional networks.  High-dimensional configurations are therefore not suitable for larger machines, because they do not scale well.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62714</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62714"/>
		<updated>2012-04-25T04:57:16Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
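For small networks, narrowness can be computed by brute force directly from this definition (an illustrative sketch; narrowness is my own helper name):&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness of a small network: over all
    bipartitions (A, B) with B no larger than A, maximise
    Nb / I, where I is the number of links crossing the cut."""
    best = 0.0
    n = len(nodes)
    for size_b in range(1, n // 2 + 1):
        for group_b in combinations(nodes, size_b):
            b = set(group_b)
            crossing = sum(1 for u, v in edges
                           if (u in b) != (v in b))
            if crossing:
                best = max(best, len(b) / crossing)
    return best

# 4-node linear array 0-1-2-3: the worst cut splits it in half,
# giving Nb = 2 processors behind a single crossing link
line = [(0, 1), (1, 2), (2, 3)]
print(narrowness(range(4), line))  # 2.0
```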
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
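The one-bit-difference adjacency rule can be sketched as follows (illustrative helpers; the function names are my own):&lt;br /&gt;

```python
def hypercube_neighbors(node: int, n: int) -> list:
    """Neighbours of a node in an n-dimensional hypercube:
    flip each of the n address bits in turn (XOR with a power of 2)."""
    return [node ^ (2 ** i) for i in range(n)]

def are_adjacent(u: int, v: int) -> bool:
    """Two nodes are adjacent iff their labels differ in exactly
    one bit position."""
    return bin(u ^ v).count("1") == 1

# 3-dimensional hypercube (8 nodes), node 0b101 = 5
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
print(are_adjacent(5, 4))         # True  (differ only in bit 0)
print(are_adjacent(5, 6))         # False (differ in two bits)
```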
&lt;br /&gt;
[[File:1d-cube.jpg|thumb|center|Fig 1.|Basic Hypercube Structures ]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses several desirable features for interconnection networks. Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, in which the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 1, January 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is thus a truly expandable, recursive structure with a constant, predefined building block: the number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46, Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that belongs to a special class of load balanced graphs [10] and can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V the node set and E the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects, through hyperlinks, to 2n nodes of two types: (a) inner nodes and (b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;On a New Multicomputer Interconnection Topology for Massively Parallel Systems&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d(00...0, k), where the sum runs over all nodes k and d(00...0, k) is the distance from node 00...0 to node k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
that possesses the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
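These closed-form properties can be collected in a short helper. The sketch below simply transcribes the formulas above (it does not re-derive them); the function name is ours.&lt;br /&gt;

```python
from math import ceil

def bvh_properties(n):
    """Closed-form properties of an n-dimensional balanced varietal
    hypercube, transcribed from the formulas above."""
    nodes = 2 ** (2 * n)                      # 2^2n nodes
    degree = 2 * n                            # every node has degree 2n
    edges = n * 2 ** (2 * n)                  # = degree * nodes / 2
    diameter = 2 * n if n == 1 else ceil(n + n / 2)
    cost = degree * diameter                  # cost = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}
```

For n = 2 this gives 16 nodes, degree 4, 32 edges, diameter 3 and cost 12.&lt;br /&gt;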
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the right-most differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of ur is chosen such that routing to it makes ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: right-most differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that dur = vr&lt;br /&gt;
&lt;br /&gt;
route to d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to r-neighbour (if u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
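The scan structure of the procedure can be sketched as follows. This is only a schematic: the BVH-specific neighbour selection is abstracted behind a caller-supplied pick_neighbour function (a hypothetical helper, not part of the original algorithm), and the demo rule below simply corrects digit r directly rather than applying the BVH adjacency cases.&lt;br /&gt;

```python
def route(u, v, pick_neighbour):
    """Greedy digit-by-digit routing skeleton.

    u, v: node addresses as tuples of quaternary digits (a0, ..., an-1).
    pick_neighbour(cur, v, r): returns a neighbour of cur that agrees
    with v at the right-most differing digit position r.
    """
    path = [u]
    cur = u
    while cur != v:
        # right-most position where the addresses still differ
        r = max(i for i in range(len(v)) if cur[i] != v[i])
        cur = pick_neighbour(cur, v, r)
        path.append(cur)
    return path

def demo_pick(cur, v, r):
    # stand-in rule: set digit r to the destination's digit
    return cur[:r] + (v[r],) + cur[r + 1:]
```

route((0, 1, 2), (0, 3, 0), demo_pick) corrects digit 2, then digit 1, reaching the destination in two hops.&lt;br /&gt;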
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours. Then n of the remaining nodes send the message to their 2n-2 neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
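The step count of such an all-port broadcast can be checked with a small synchronous-flooding simulation. This sketch works on any adjacency list, not just the BVH; the function name is ours.&lt;br /&gt;

```python
def broadcast_steps(adj, source):
    """All-port one-to-all broadcast: in each step, every informed node
    sends the message to all of its neighbours simultaneously.
    Returns the number of steps until every node is informed."""
    informed = {source}
    steps = 0
    while len(informed) < len(adj):
        informed |= {w for v in informed for w in adj[v]}
        steps += 1
    return steps

# example: a 3-dimensional binary hypercube (8 nodes, neighbours differ in one bit)
cube3 = {v: [v ^ (1 << i) for i in range(3)] for v in range(8)}
```

Broadcasting from node 0 of the 3-cube takes 3 steps, matching its diameter.&lt;br /&gt;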
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled up.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above ''L=8, N=32, n=3, and x=0-15'', and ''x'' has a four-bit representation&lt;br /&gt;
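The permutation ''σn'' is a one-bit left rotation of the (n+1)-bit link number, and can be sketched directly (the function name is ours):&lt;br /&gt;

```python
def sigma(x, n):
    """One-bit left rotation of the (n+1)-bit link number x:
    {b_n, b_n-1, ..., b1, b0} -> {b_n-1, ..., b1, b0, b_n}.
    Output link x on stage g feeds input link sigma(x, n) on stage g+1."""
    width = n + 1
    msb = (x >> n) & 1                        # bit b_n
    return ((x << 1) & ((1 << width) - 1)) | msb
```

With n = 3 (four-bit links, as in Figure 1), sigma(0b1000, 3) gives 0b0001 and sigma(0b0101, 3) gives 0b1010; rotating four times returns the original link number.&lt;br /&gt;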
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  Each dual node is in turn replaced by its own dual node, until a spare node replaces the last one in the chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
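The first dual rule can be sketched as below. Note the assumption: we read ''bn'' in the formula as the most significant bit of the node's n-bit level index ''l'', which makes the new level index a one-bit left rotation of ''l''; the function name is ours.&lt;br /&gt;

```python
def dual(g, l, L):
    """Dual of node (g, l), for g at most n-1, via (g+1, 2*(l % (L/2)) + b_n).
    Assumes b_n is the most significant bit of the n-bit level index l."""
    n = L.bit_length() - 1                    # n = log2(L)
    msb = (l >> (n - 1)) & 1
    return (g + 1, 2 * (l % (L // 2)) + msb)
```

For L = 8, node (0, 5) (level 101 in binary) has its dual at (1, 3) (level 011).&lt;br /&gt;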
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to demonstrate how the outdated k-ary n-cube interconnection networks were never really suited to large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI systems, such as those built on k-ary n-cubes, are communication limited rather than processing limited.  This means that their performance is dependent on the wires used to connect the system.  This section will show, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks work better in low-dimensional configurations than in high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited, meaning that the speed at which they can run depends on the wire delay.  This means that systems are required to organize their nodes logically and physically to keep the wires as short as possible.  Networks with higher dimensions cost more, due to the fact that they have more and longer wires, compared to low-dimension networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs dimension for varying node counts, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with a constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and have lower latency.  However, the high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 9 and 10 further demonstrate how latency increases with higher dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
Throughput is another metric of network performance.  Throughput is the total number of messages the network can handle per unit of time, and can best be estimated by calculating the total capacity of the given network.  The latency of low-dimensional networks increases more slowly when traffic is actually applied than that of high-dimensional networks.  The low-dimensional networks handle contention better because they use fewer channels at higher bandwidth, and so get better throughput than the high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Hot-Spot Throughput===&lt;br /&gt;
Hot-spot throughput describes the situation where traffic is not uniform but is instead concentrated in a pair of nodes responsible for higher traffic.  Hot-spot traffic causes congestion and can hurt the throughput.  As with normal throughput, low-dimensional networks have better bandwidth and thus better hot-spot throughput than the high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Conclusion===&lt;br /&gt;
Low-dimensional k-ary n-cube networks have lower latency, less contention, and higher hot-spot throughput than the high-dimensional networks.  This demonstrates why this outdated network technique is not suitable for larger machines: it does not scale well.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62713</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62713"/>
		<updated>2012-04-25T04:56:54Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. The maximum value of Nb / I over all partitionings of the network is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
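This definition can be checked by brute force on tiny networks. The sketch below (function name ours) enumerates all partitions with Nb no larger than Na, so it is exponential in the node count and only meant for toy examples:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(adj):
    """Max over all partitions (A, B) with |B| <= |A| of Nb / I,
    where I is the number of links between A and B."""
    nodes = list(adj)
    best = 0.0
    for k in range(1, len(nodes) // 2 + 1):
        for B in combinations(nodes, k):
            Bset = set(B)
            # count links leaving B (each cross link counted once, from its B endpoint)
            I = sum(1 for v in B for w in adj[v] if w not in Bset)
            if I:
                best = max(best, len(B) / I)
    return best
```

For a 4-node ring the maximum is reached by cutting off two adjacent nodes: Nb = 2, I = 2, so the narrowness is 1.0.&lt;br /&gt;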
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel to reach its final destination between any two nodes. For a hypercube network the diameter is identical to the degree of a node n = log2N. There are 2n nodes contained in the hypercube; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
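The label-based adjacency rule makes neighbour generation and adjacency checks trivial; a minimal sketch (function names ours):&lt;br /&gt;

```python
def neighbours(node, n):
    """Neighbours of `node` in an n-dimensional hypercube: flip each of the n bits."""
    return [node ^ (1 << i) for i in range(n)]

def adjacent(a, b):
    """Adjacent iff the labels differ in exactly one bit position."""
    d = a ^ b
    return d != 0 and d & (d - 1) == 0        # d is a nonzero power of two
```

In a 3-cube, node 5 (101) has neighbours 4 (100), 7 (111) and 1 (001).&lt;br /&gt;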
&lt;br /&gt;
[[File:1d-cube.jpg|thumb|center|400px|Fig 1.|Basic Hypercube Structures ]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small systems with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is found to be most suited for implementing a class of highly parallel algorithms, and it can emulate the binary hypercube in implementing a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of those tasks on v’. All the tasks running on other nodes need not be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures above, labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is obtained by summing the distances of all nodes from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) &#8721; d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
that possesses the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
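These closed-form properties can be collected in a short helper. The sketch below simply transcribes the formulas above (it does not re-derive them); the function name is ours.&lt;br /&gt;

```python
from math import ceil

def bvh_properties(n):
    """Closed-form properties of an n-dimensional balanced varietal
    hypercube, transcribed from the formulas above."""
    nodes = 2 ** (2 * n)                      # 2^2n nodes
    degree = 2 * n                            # every node has degree 2n
    edges = n * 2 ** (2 * n)                  # = degree * nodes / 2
    diameter = 2 * n if n == 1 else ceil(n + n / 2)
    cost = degree * diameter                  # cost = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}
```

For n = 2 this gives 16 nodes, degree 4, 32 edges, diameter 3 and cost 12.&lt;br /&gt;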
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the right-most differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of ur is chosen such that routing to it makes ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: right-most differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that dur = vr&lt;br /&gt;
&lt;br /&gt;
route to d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to r-neighbour (if u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
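The scan structure of the procedure can be sketched as follows. This is only a schematic: the BVH-specific neighbour selection is abstracted behind a caller-supplied pick_neighbour function (a hypothetical helper, not part of the original algorithm), and the demo rule below simply corrects digit r directly rather than applying the BVH adjacency cases.&lt;br /&gt;

```python
def route(u, v, pick_neighbour):
    """Greedy digit-by-digit routing skeleton.

    u, v: node addresses as tuples of quaternary digits (a0, ..., an-1).
    pick_neighbour(cur, v, r): returns a neighbour of cur that agrees
    with v at the right-most differing digit position r.
    """
    path = [u]
    cur = u
    while cur != v:
        # right-most position where the addresses still differ
        r = max(i for i in range(len(v)) if cur[i] != v[i])
        cur = pick_neighbour(cur, v, r)
        path.append(cur)
    return path

def demo_pick(cur, v, r):
    # stand-in rule: set digit r to the destination's digit
    return cur[:r] + (v[r],) + cur[r + 1:]
```

route((0, 1, 2), (0, 3, 0), demo_pick) corrects digit 2, then digit 1, reaching the destination in two hops.&lt;br /&gt;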
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours. Then n of the remaining nodes send the message to their 2n-2 neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
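The step count of such an all-port broadcast can be checked with a small synchronous-flooding simulation. This sketch works on any adjacency list, not just the BVH; the function name is ours.&lt;br /&gt;

```python
def broadcast_steps(adj, source):
    """All-port one-to-all broadcast: in each step, every informed node
    sends the message to all of its neighbours simultaneously.
    Returns the number of steps until every node is informed."""
    informed = {source}
    steps = 0
    while len(informed) < len(adj):
        informed |= {w for v in informed for w in adj[v]}
        steps += 1
    return steps

# example: a 3-dimensional binary hypercube (8 nodes, neighbours differ in one bit)
cube3 = {v: [v ^ (1 << i) for i in range(3)] for v in range(8)}
```

Broadcasting from node 0 of the 3-cube takes 3 steps, matching its diameter.&lt;br /&gt;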
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled up.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above ''L=8, N=32, n=3, and x=0-15'', and ''x'' has a four-bit representation&lt;br /&gt;
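The permutation ''σn'' is a one-bit left rotation of the (n+1)-bit link number, and can be sketched directly (the function name is ours):&lt;br /&gt;

```python
def sigma(x, n):
    """One-bit left rotation of the (n+1)-bit link number x:
    {b_n, b_n-1, ..., b1, b0} -> {b_n-1, ..., b1, b0, b_n}.
    Output link x on stage g feeds input link sigma(x, n) on stage g+1."""
    width = n + 1
    msb = (x >> n) & 1                        # bit b_n
    return ((x << 1) & ((1 << width) - 1)) | msb
```

With n = 3 (four-bit links, as in Figure 1), sigma(0b1000, 3) gives 0b0001 and sigma(0b0101, 3) gives 0b1010; rotating four times returns the original link number.&lt;br /&gt;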
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  Each dual node is in turn replaced by its own dual node, until a spare node replaces the last one in the chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
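The first dual rule can be sketched as below. Note the assumption: we read ''bn'' in the formula as the most significant bit of the node's n-bit level index ''l'', which makes the new level index a one-bit left rotation of ''l''; the function name is ours.&lt;br /&gt;

```python
def dual(g, l, L):
    """Dual of node (g, l), for g at most n-1, via (g+1, 2*(l % (L/2)) + b_n).
    Assumes b_n is the most significant bit of the n-bit level index l."""
    n = L.bit_length() - 1                    # n = log2(L)
    msb = (l >> (n - 1)) & 1
    return (g + 1, 2 * (l % (L // 2)) + msb)
```

For L = 8, node (0, 5) (level 101 in binary) has its dual at (1, 3) (level 011).&lt;br /&gt;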
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability ''R(t)'' of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to show why high-dimensional k-ary n-cube interconnection networks are poorly suited to large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  Interconnection networks implemented in VLSI are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section will show, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better at low dimension than at high dimension, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically to keep the wires as short as possible.  Higher-dimensional networks cost more due to the fact that they need more and longer wires than low-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks under constant wire delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and thus have lower latency.  The high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases in higher-dimensional networks, using logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
===Throughput===&lt;br /&gt;
Throughput is another metric of network performance: the total number of messages the network can handle per unit of time, best estimated by calculating the total capacity of the given network.  When traffic is actually applied, the latency of low-dimensional networks increases more slowly than that of high-dimensional networks.  Low-dimensional networks handle contention better because they use fewer channels at higher bandwidth, and so achieve better throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Hot-Spot Throughput===&lt;br /&gt;
Hot-spot throughput describes the situation where traffic is not uniform but is instead concentrated on a pair of nodes responsible for higher traffic.  Hot-spot traffic causes congestion and can hurt throughput.  As with uniform throughput, low-dimensional networks have higher channel bandwidth and thus better hot-spot throughput than high-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
===Conclusion===&lt;br /&gt;
Low-dimensional k-ary n-cube networks have lower latency, less contention, and higher hot-spot throughput than high-dimensional networks.  This demonstrates that the high-dimensional approach is not suitable for larger machines, because it does not scale well.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62429</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62429"/>
		<updated>2012-04-17T03:38:18Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Extended Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
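As a concrete illustration (a minimal Python sketch of our own, not taken from the cited papers), the diameter can be computed by running a breadth-first search from every node and taking the largest shortest-path distance found:&lt;br /&gt;

```python
from collections import deque

def diameter(adj):
    # Maximum over all nodes of the BFS eccentricity (longest shortest path).
    def ecc(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

# A 3-dimensional hypercube: labels 0..7, edges between labels differing in one bit.
cube3 = {u: [u ^ 2 ** b for b in range(3)] for u in range(8)}
print(diameter(cube3))  # 3
```

The 3-cube's diameter of 3 matches the hypercube rule given later, that the diameter equals log2 of the node count.&lt;br /&gt;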
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the number of processors in each group is Na and Nb, and assume Nb ≤ Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
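The calculation above can be sketched by brute force over all partitions (an illustration of our own; the helper name and example graph are assumptions, and the enumeration is exponential, so it is only feasible for tiny networks):&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    # Maximise Nb / I over partitions (A, B) with Nb at most Na,
    # where I counts the links crossing between A and B.
    nodes = list(nodes)
    best = 0.0
    for k in range(1, len(nodes) // 2 + 1):
        for B in combinations(nodes, k):
            group_b = set(B)
            I = sum(1 for u, v in edges if (u in group_b) != (v in group_b))
            if I:
                best = max(best, len(group_b) / I)
    return best

# A 4-node ring: every half/half cut crosses 2 links, so narrowness is 2/2 = 1.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring))  # 1.0
```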
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The 2^n nodes contained in the hypercube are each uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
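The one-bit-difference property makes both adjacency testing and routing simple to express. The sketch below is our own illustration (the function names are assumptions, not from the cited paper); routing greedily repairs one differing bit per hop, so a message takes at most n = log2(N) hops:&lt;br /&gt;

```python
def is_adjacent(u, v):
    # Two hypercube nodes are adjacent iff their labels differ in exactly one bit.
    return bin(u ^ v).count("1") == 1

def route(src, dst):
    # Greedy e-cube-style routing: fix the lowest differing bit at each hop.
    path = [src]
    cur = src
    while cur != dst:
        bit = 1
        while ((cur ^ dst) // bit) % 2 == 0:   # locate the lowest differing bit
            bit *= 2
        cur ^= bit
        path.append(cur)
    return path

print(is_adjacent(0b0101, 0b0111))  # True
print(route(0b000, 0b101))          # [0, 1, 5]
```

Each intermediate label differs from its predecessor in a single bit, so every hop follows a real hypercube link.&lt;br /&gt;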
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few configuration changes. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, with the end nodes of each missing edge marked by arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46, Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure belonging to a special class of load balanced graphs that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v', such that v and v' have the same adjacent nodes. Such a pair of nodes v and v' is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v' in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of these tasks on v'. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative: we can have an active task running on node v with its backup on node v', while having another active task running on node v' with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
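Matching pairs can be detected directly from the definition. The Python sketch below is our own illustration (function names and the 4-cycle labeling are assumptions); the 4-cycle is the smallest balanced hypercube, where opposite nodes share identical neighbour sets:&lt;br /&gt;

```python
def matching_pairs(adj):
    # Pairs of nodes with identical neighbour sets, per the definition above.
    nodes = list(adj)
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
            if set(adj[u]) == set(adj[v])]

def is_load_balanced(adj):
    # Load balanced iff every node belongs to some matching pair.
    paired = {x for pair in matching_pairs(adj) for x in pair}
    return paired == set(adj)

# A 4-cycle 0-1-2-3-0: nodes 0 and 2 share neighbours {1, 3}, as do 1 and 3.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(matching_pairs(cycle4))    # [(0, 2), (1, 3)]
print(is_load_balanced(cycle4))  # True
```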
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects to 2n other nodes, of two types, (a) inner nodes and (b) outer nodes, through hyperlinks &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii) &amp;lt;(a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The average distance is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by 1/(2^2n) ∑ d((0,0,...,0), k), summed over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n * ceil(n + n/2).&lt;br /&gt;
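The stated properties can be collected into one helper. This is a sketch of our own (`bvh_properties` is an assumed name); it simply evaluates the node-count, degree, edge-count, diameter, and cost formulas given above:&lt;br /&gt;

```python
import math

def bvh_properties(n):
    # Properties of an n-dimensional balanced varietal hypercube, per the text.
    nodes = 2 ** (2 * n)
    degree = 2 * n
    edges = n * 2 ** (2 * n)
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter          # cost = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```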
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the right-most differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A choice d is made such that the d-neighbour of ur is also a neighbour of vr, making ur=vr; in the next step d1 is chosen so that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: right-most differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that the d1-neighbour of ur-1 equals vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega Interconnected Networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical; therefore we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size:&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number if the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' runs from 0 to 15, so each link number has a four-bit representation&lt;br /&gt;
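The shuffle connection ''σn(x)'' is a one-bit left rotation of the (n+1)-bit link number. A small Python sketch (our own illustration, with an assumed function name) checks it against the Figure 1 parameters:&lt;br /&gt;

```python
def sigma(x, n):
    # Left-rotate the (n + 1)-bit label {bn, ..., b1, b0} into {bn-1, ..., b0, bn}.
    top = (x // 2 ** n) % 2               # the high bit bn
    return (x * 2) % 2 ** (n + 1) + top

# Figure 1: L = 8 levels, n = log2(L) = 3, N = L * (n + 1) = 32 switches.
L, n = 8, 3
print(L * (n + 1))       # 32
print(sigma(0b0110, n))  # 12: output link 6 on stage g feeds input link 12 on g+1
print(sigma(0b1001, n))  # 3: the high bit wraps around to the low position
```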
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node fails it can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows:&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  Each dual node is in turn replaced by its own dual node, until a spare node ultimately replaces the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability ''R(t)'' of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to show why high-dimensional k-ary n-cube interconnection networks are poorly suited to large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  Interconnection networks implemented in VLSI are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section will show, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better at low dimension than at high dimension, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically to keep the wires as short as possible.  Higher-dimensional networks cost more due to the fact that they need more and longer wires than low-dimensional networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks under constant wire delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and thus have lower latency.  The high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases in higher-dimensional networks, using logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62427</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62427"/>
		<updated>2012-04-17T03:35:26Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
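As an illustration, the diameter of a small network can be computed directly with a breadth-first search from every node (a sketch; the adjacency-list format is an assumption made for the example):&lt;br /&gt;

```python
from collections import deque

def diameter(adj):
    """Maximum over all source nodes of the longest shortest-path hop count."""
    def eccentricity(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(eccentricity(v) for v in adj)

# 8-node linear array: the two end nodes are 7 hops apart.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
print(diameter(line))  # 7
```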
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
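The definition can be turned into a brute-force computation for small networks: enumerate every bipartition with Nb &amp;lt;= Na, count the crossing links I, and take the maximum of Nb / I (an illustrative sketch; the search is exponential in the number of processors):&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Max over bipartitions (A, B) with |B| <= |A| of |B| / (links crossing the cut)."""
    nodes = list(nodes)
    worst = 0.0
    for nb in range(1, len(nodes) // 2 + 1):
        for b in combinations(nodes, nb):
            b = set(b)
            cut = sum(1 for u, v in edges if (u in b) != (v in b))
            if cut:  # a zero cut means the network was already disconnected
                worst = max(worst, nb / cut)
    return worst

# 4-node linear array 0-1-2-3: B = {0, 1} gives 2 nodes behind a single cut link.
edges = [(0, 1), (1, 2), (2, 3)]
print(narrowness(range(4), edges))  # 2.0
```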
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to exactly the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3-dimensional hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. There are 2^n nodes in the hypercube, each uniquely represented by a binary sequence b_{n-1}b_{n-2}...b_0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
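The one-bit-difference adjacency rule is easy to state in code: the n neighbours of a node are obtained by flipping each bit of its label in turn (a minimal sketch):&lt;br /&gt;

```python
def hypercube_neighbours(v, n):
    """Neighbours of node v in an n-cube: flip each of the n label bits in turn."""
    return [v ^ (1 << b) for b in range(n)]

n = 4
for v in range(2 ** n):
    for w in hypercube_neighbours(v, n):
        # adjacency holds exactly when the labels differ in one bit position
        assert bin(v ^ w).count("1") == 1

print(hypercube_neighbours(0b0000, 4))  # [1, 2, 4, 8]
```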
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. The tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects through hyperlinks to 2n other nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
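The inner-node rule above can be stated compactly in code; a node address is taken here to be a tuple (a0, a1, ..., an-1) of quaternary digits, and the function name is illustrative:&lt;br /&gt;

```python
def inner_neighbours(addr):
    """Inner neighbours of a BVH node address (a0, a1, ..., a_{n-1}), each digit in 0..3."""
    a0, rest = addr[0], addr[1:]
    if a0 % 2 == 0:                       # Case I: a0 even
        firsts = ((a0 + 1) % 4, (a0 - 2) % 4)
    else:                                 # Case II: a0 odd
        firsts = ((a0 - 1) % 4, (a0 + 2) % 4)
    # only the first digit changes; the remaining digits are carried over unchanged
    return [(f,) + rest for f in firsts]

print(inner_neighbours((0, 2)))  # [(1, 2), (2, 2)]
print(inner_neighbours((3, 2)))  # [(2, 2), (1, 2)]
```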
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n*2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(3n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network in practice better than the diameter does. The average distance is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the balanced varietal hypercube the average distance from node (0,...,0) is given by (1/2^2n) ∑ d((0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned; the topology&lt;br /&gt;
with the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(3n/2).&lt;br /&gt;
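Putting the stated properties together (and writing ceil(n + n/2) as the equivalent ceil(3n/2)), the parameters of a BVH of dimension n can be tabulated (a sketch; the function name is illustrative):&lt;br /&gt;

```python
import math

def bvh_parameters(n):
    """Degree, node count, edge count, diameter, and cost of an n-dimensional BVH."""
    degree = 2 * n
    nodes = 2 ** (2 * n)
    edges = n * 2 ** (2 * n)          # consistent with degree * nodes / 2
    diameter = 2 if n == 1 else math.ceil(3 * n / 2)
    return {"degree": degree, "nodes": nodes, "edges": edges,
            "diameter": diameter, "cost": degree * diameter}

print(bvh_parameters(2))
# {'degree': 4, 'nodes': 16, 'edges': 32, 'diameter': 3, 'cost': 12}
```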
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A neighbour d is chosen such that routing through it makes ur = vr; in the next step, d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position of u and v&lt;br /&gt;
&lt;br /&gt;
if there is a neighbour d whose move makes ur = vr then&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
else route to the r-neighbour (u and v are then adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
choose d1 such that the move makes ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming that concurrent communication through all ports of each processor is possible, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u.&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their 2n-2 neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end.&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing. Since every node in the system is of a fixed size, the system can easily be scaled up. The omega network belongs to the class of multistage interconnection networks (MINs). Since the omega network is designed for large systems, reliability is critical, so we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes in 8 levels. Each node has four links connecting it to the previous and next stages. The connections go as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number if the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In the Figure 1 above ''L=8, N=32, n=3, and x=0-15'' and has a four bit representation&lt;br /&gt;
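The stage-to-stage wiring rule σn(x), a one-bit left rotation of the (n+1)-bit link number, can be sketched as follows (names are illustrative):&lt;br /&gt;

```python
def sigma(x, n):
    """Left-rotate the (n+1)-bit link number x: {b_n,...,b_1,b_0} -> {b_{n-1},...,b_0,b_n}."""
    width = n + 1
    msb = (x >> n) & 1
    return ((x << 1) & ((1 << width) - 1)) | msb

# For L = 8 (n = 3) the links per stage are numbered with 4 bits, x = 0..15.
n = 3
wiring = {x: sigma(x, n) for x in range(2 ** (n + 1))}
print(wiring[0b0110])  # 12, i.e. 0b1100

# A rotation is a bijection, so every input link maps to a distinct output link.
assert sorted(wiring.values()) == list(range(16))
```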
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault. The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability; this way, a faulted node can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed. However, if an active node fails, it must be replaced by its dual node. That dual node is in turn replaced by its own dual node, and so on, until a spare node completes the replacement of the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
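A sketch of the dual-node rule, under the assumption (not stated explicitly above, so treat it as hypothetical) that bn is the most significant bit of the n-bit level number l:&lt;br /&gt;

```python
def dual_node(g, l, L):
    """Dual of node (g, l) in an L-level reliable omega network.
    Assumes b_n is the top bit of l's n-bit representation (an illustration only)."""
    n = L.bit_length() - 1        # n = log2(L) for L a power of two
    b_n = (l >> (n - 1)) & 1
    if g <= n - 1:
        return (g + 1, 2 * (l % (L // 2)) + b_n)
    return (n + 1, l)             # otherwise fall through to the spare stage

print(dual_node(0, 5, 8))  # (1, 3): 2 * (5 % 4) + 1
```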
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can suffer one failure, after which all other nodes perform the Replacement procedure. However, if another node fails, then all nodes after it participate in the Reconfiguration procedure.&lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time. They show that the fault-tolerant omega is more reliable over time than the original omega design, even with the added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to analyze the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of k-ary n-cubes are communication limited rather than processing limited, which means that performance depends on the wires used to connect the system. This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better as low-dimensional networks than as high-dimensional networks, assuming constant bisection width.&lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay. Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible. Networks of higher dimension cost more due to the fact that they have more, and longer, wires compared to low-dimension networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: the sum of the latency due to the network and the latency due to the processing node. Figure 8 below shows average network latency versus dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency. The high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases in higher-dimensional networks, using logarithmic and linear delay models. With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
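The trade-off can be illustrated with a deliberately simplified latency model in the spirit of this analysis: assume N = k^n nodes, unit wire delay, an average hop count of n(k-1)/2, and a channel width of k/2 under constant bisection width, so latency is hop delay plus serialization delay. All of these modelling choices are assumptions made for the sketch, not the paper's exact model.&lt;br /&gt;

```python
def avg_latency(n, N=4096, L=150):
    """Simplified average latency of a k-ary n-cube with N = k**n nodes:
    hop delay n*(k-1)/2 plus serialization delay L/W with channel width W = k/2."""
    k = round(N ** (1.0 / n))
    hops = n * (k - 1) / 2
    width = k / 2
    return hops + L / width

# 4096 nodes, 150-bit messages: compare candidate dimensions.
for n in (1, 2, 3, 6, 12):
    print(n, round(avg_latency(n), 1))
```

Under this model the minimum falls at a low dimension (n = 2 or 3): n = 1 suffers from its enormous hop count, while high dimensions pay a serialization penalty through their narrow channels.&lt;br /&gt;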
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62426</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62426"/>
		<updated>2012-04-17T03:35:09Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to exactly the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3-dimensional hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. There are 2^n nodes in the hypercube, each uniquely represented by a binary sequence b_{n-1}b_{n-2}...b_0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
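The matching-pair condition can be checked mechanically. Below is a minimal Python sketch (the function name and adjacency representation are our own); note that a matching pair is allowed to be mutually adjacent, which is why a completely connected graph with an even number of nodes qualifies:

```python
def is_load_balanced(adj):
    """True iff every node v has a matching node w != v that is adjacent
    to the same nodes (ignoring any edge between v and w themselves)."""
    return all(
        any((adj[v] - {w}) == (adj[w] - {v}) for w in adj if w != v)
        for v in adj
    )

# BH1 is the 4-cycle 0-1-2-3-0: opposite nodes form matching pairs.
bh1 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
# A 3-node path 0-1-2 is not load balanced: node 1 has no match.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_load_balanced(bh1), is_load_balanced(path))  # True False
```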
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;On a New Multicomputer Interconnection Topology for Massively Parallel Systems&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), Vol. 2, No. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2,....., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2,....., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2,....., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2,....., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is obtained by summing the distances of all nodes k from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance is given by 1/(2^2n) ∑ d((0,0,....,0), k)&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n * ceil(n + n/2).&lt;br /&gt;
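The formulas above can be collected into one helper; a small Python sketch (names are illustrative):

```python
import math

def bvh_metrics(n):
    """Size, degree, diameter, and cost of an n-dimensional balanced
    varietal hypercube, using the formulas stated above."""
    nodes = 2 ** (2 * n)
    degree = 2 * n
    edges = n * nodes                       # n * 2^(2n)
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter                # cost = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}

print(bvh_metrics(2))
# n=2: 16 nodes, degree 4, 32 edges, diameter ceil(3) = 3, cost 12
```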
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm performs a left-to-right scan of the source and destination addresses. Let r be the right-most differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie within the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr; a neighbour d of ur that is also a neighbour of vr is chosen so that ur=vr. In the next step d1 is chosen so that ur-1=vr-1, and the process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: right-most differing digit position&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route directly to v&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour makes ur=vr&lt;br /&gt;
&lt;br /&gt;
route to d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1=vr-1&lt;br /&gt;
&lt;br /&gt;
route to d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
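The digit-correction idea is easiest to see on the ordinary binary hypercube, where it becomes classic dimension-order (e-cube) routing; a minimal Python sketch (function name is ours, not from the paper):

```python
def ecube_route(u, v, n):
    """Dimension-order (e-cube) routing on an n-dimensional binary
    hypercube: correct differing address bits one dimension at a time.
    Returns the list of nodes visited, starting at u and ending at v."""
    path = [u]
    for bit in range(n):          # scan bit positions low to high
        if (u ^ v) & (1 << bit):  # addresses differ in this dimension
            u ^= 1 << bit         # hop to the neighbour across it
            path.append(u)
    return path

# Route from node 000 to node 101 in a 3-cube: two hops.
print(ecube_route(0b000, 0b101, 3))  # [0, 1, 5]
```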
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat Step 2 until all nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
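The claim that broadcast finishes in a number of rounds tied to the diameter can be sanity-checked by simulation. The sketch below runs an all-port broadcast (every informed node forwards to all neighbours each round) on an ordinary binary n-cube, not a BVH, where it completes in n rounds:

```python
def allport_broadcast_rounds(adj, source):
    """Simulate one-to-all broadcast where, in each round, every informed
    node sends to all of its neighbours; return the number of rounds."""
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        informed |= {w for v in informed for w in adj[v]}
        rounds += 1
    return rounds

# Binary 3-cube: nodes 0..7, edges between labels differing in one bit.
n = 3
cube = {v: {v ^ (1 << b) for b in range(n)} for v in range(2 ** n)}
print(allport_broadcast_rounds(cube, 0))  # 3 rounds = the dimension
```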
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node has a fixed size, the system can easily be scaled up.  The omega network belongs to the family of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical, so we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
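As a quick check of the size formula, a one-line helper (name is illustrative):

```python
import math

def omega_switches(levels):
    """Number of switches N in an omega network with L levels:
    N = L * (log2(L) + 1), i.e. L switches in each of log2(L)+1 stages."""
    return levels * (int(math.log2(levels)) + 1)

print(omega_switches(8))  # 32, matching the 32-node, 8-level Figure 1
```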
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes in 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x=0-15'' has a four-bit representation&lt;br /&gt;
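The rule ''σn(x)'' above is simply a one-bit left rotation of the (n+1)-bit link number; a small sketch (function name is ours):

```python
def sigma(x, n):
    """Perfect-shuffle permutation: rotate the (n+1)-bit number x left
    by one bit, so {bn,...,b1,b0} becomes {bn-1,...,b1,b0,bn}."""
    width = n + 1
    top = (x >> n) & 1                       # the high bit bn
    return ((x << 1) & ((1 << width) - 1)) | top

# With n=3 (four-bit link numbers, as in Figure 1):
print(sigma(0b1000, 3), sigma(0b0011, 3))  # 1 6
```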
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, a faulty node can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can simply be bypassed.  However, if an active node fails, it must be replaced by its dual node.  That dual node is in turn replaced by its own dual node, and so on, until a spare node replaces the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to its right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if a second node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to show, through a performance analysis of k-ary n-cube interconnection networks, why low-dimensional networks are preferable to high-dimensional ones for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  In VLSI systems, k-ary n-cube networks are communication limited rather than processing limited, meaning that performance depends on the wires used to connect the system.  This section shows, in terms of latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better as low-dimensional networks than as high-dimensional networks, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on wire delay.  Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible.  Networks with higher dimensions cost more, due to the fact that they have more and longer wires, compared to low-dimension networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency versus dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks under constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency, whereas the high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases with higher-dimensional networks, using logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
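The low- versus high-dimension trade-off can be illustrated numerically. The sketch below uses a rough zero-load latency proxy (average hop count n(k-1)/2 plus serialization of the message over a channel of width W), taking W = k/2 under the constant-bisection assumption; the message length and the scaling rule are illustrative assumptions, not the paper's exact model:

```python
def latency_proxy(k, n, msg_bits=150):
    """Rough zero-load latency for a k-ary n-cube under constant bisection
    width: average hops n*(k-1)/2 plus serialization over W = k/2 wires
    (W = 1 for the binary case). Constants are illustrative only."""
    width = max(k // 2, 1)
    hops = n * (k - 1) / 2
    return hops + msg_bits / width

# 4096 nodes as a 2-D 64-ary torus vs. a 12-D binary hypercube:
print(latency_proxy(64, 2) < latency_proxy(2, 12))  # True: low-D wins
```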
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62425</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62425"/>
		<updated>2012-04-17T03:34:35Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing ,broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B where the number of processors in each group is Na and Nb and assume Nb &amp;lt; = Na. Now count the number of interconnections between A and B call this I. Find the maximum value of Nb / I for all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high ( Nb &amp;gt; I) then if the group B processors want to send messages to group A congestion in the network will be high ( since there are fewer links than processors )&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2n nodes arranged in a k dimensional hypercube. The nodes are numbered 0 , 1, ....2n -1 and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel to reach its final destination between any two nodes. For a hypercube network the diameter is identical to the degree of a node n = log2N. There are 2n nodes contained in the hypercube; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. Most Important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration. Also, the increase in system size is expected to result in an increase in performance to the extent of the increase in size. a major drawback of the hypercube network is its lack of scalability, which limits its use in building large size systems out of small size systems with little changes in the configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2n nodes and n2n-1 edges. It possesses some desirable features for interconnection networks . Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube. A complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter is shown below&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. an incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;. Complicated routing algorithms are necessary for the incomplete hypercube. Extended Hypercube(EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC) as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2k such basic modules can be interconnected via 2k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expansive, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is found to be most suited for implementing a class of highly parallel algorithms. The EH can emulate the binary hypercube in implementing a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. 
Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the rule of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
A n dimensional, balanced varietal hypercube consists of 2^2n nodes. Every node connects 2n nodes, which  are of 2 types :a)Inner nodes b)Outer nodes through hyperlinks &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 a2,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 a2,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n* 22n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The summation of distance of all nodes from a given node over the total number of nodes determines the average distance of the network. In the Balanced varietal hypercube the average distance is given by 1/(2^2n)∑[d(0,0,0),k]&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n* ceiling(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In routing process, each processor along the path considers itself as the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left to right scan of source and destination address. Let r be the right most differing bit (quarternary) position. The numbers to the right of ur is not to be considered as they lie on the same BVHr . Since the diameter of BVH1 is 2 there is atleast one vertex which is a common neighbour of ur and vr. If d is an element such that d neighbour of ur is also a neighbour of vr.Then d is choosen such that ur=vr. Then in the next step d1 is choosen such that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm:Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: right most differing bit position&lt;br /&gt;
&lt;br /&gt;
d:choice such that dur=vr&lt;br /&gt;
&lt;br /&gt;
route to d-neighbour else&lt;br /&gt;
&lt;br /&gt;
route to r-neighbour (k and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1=choice that dur-1=dvr-1&lt;br /&gt;
&lt;br /&gt;
route to d1 neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously.The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step1: send message to 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of 2n nodes sends message to its 2n-1 neighbours. Then n nodes from the rest nodes&lt;br /&gt;
&lt;br /&gt;
send message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system is a fixed size, the system can be easily scaled to larger systems.  The omega network belongs to the multistage interconnection networks (MINs).  Since the omega network is designed for larger systems, it depends heavily on reliability, therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan.; , &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Compter Journal, 2003. vol. 46, no. 5, pp.467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor has size:&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows:&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges from 0 to 15 with a four-bit representation&lt;br /&gt;
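&lt;br /&gt;
In code, the connection rule ''σn(x)'' is a one-bit left rotation of the (n+1)-bit link number. The sketch below is illustrative only (the function name is ours, not the paper's):&lt;br /&gt;

```python
def sigma(x: int, n: int) -> int:
    """One-bit left rotation of the (n+1)-bit number x:
    {b_n, b_n-1, ..., b_1, b_0} -> {b_n-1, ..., b_1, b_0, b_n}."""
    width = n + 1
    msb = (x >> n) & 1                        # extract bit b_n
    return ((x << 1) & ((1 << width) - 1)) | msb

# With L = 8 and n = 3 as in Figure 1, output link 5 (0101) on one
# stage feeds input link 10 (1010) on the next stage.
print(sigma(5, 3))  # 10
```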
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows:&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  That dual node is in turn replaced by its own dual, and so on, until a spare node takes the place of the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
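&lt;br /&gt;
A sketch of the first replacement rule, with ''l'' taken as an n-bit level number and ''bn'' read as its most significant bit (that reading of ''bn'' is our assumption for illustration; the paper defines it precisely):&lt;br /&gt;

```python
def dual_node(g: int, l: int, L: int) -> tuple:
    """Dual of node (g, l) under the rule (g+1, 2*(l % (L/2)) + b_n),
    valid for g <= n - 1.  b_n is read here as the top bit of the
    n-bit level number l -- an assumption for illustration."""
    n = L.bit_length() - 1          # n = log2(L) for power-of-two L
    bn = (l >> (n - 1)) & 1         # most significant bit of l
    return (g + 1, 2 * (l % (L // 2)) + bn)

# For L = 8: level 5 (binary 101) maps to level 3 (011), i.e. a
# one-bit left rotation of the level number.
print(dual_node(0, 5, 8))  # (1, 3)
```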
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  If another node fails, however, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to examine the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI systems such as the k-ary n-cube are communication-limited rather than processing-limited, which means that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that low-dimensional k-ary n-cube networks perform better than high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  This means that systems must organize their nodes logically and physically to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimension networks due to the fact that they have more and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes.  It demonstrates that, for constant delay, low-dimensional networks provide lower latency than high-dimensional networks.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency.  The high-dimensional networks, however, do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases with higher-dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62424</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62424"/>
		<updated>2012-04-17T03:34:11Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Extended Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the number of processors in each group is Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
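&lt;br /&gt;
For small networks the definition above can be brute-forced directly over all bipartitions. A minimal sketch (illustrative only; function and variable names are ours):&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force the narrowness of a small network: the maximum over
    all bipartitions (A, B) with |B| <= |A| of |B| / I, where I is the
    number of links crossing between A and B."""
    nodes = list(nodes)
    n = len(nodes)
    best = 0.0
    for k in range(1, n // 2 + 1):          # |B| <= |A| means k <= n/2
        for B in combinations(nodes, k):
            Bset = set(B)
            I = sum(1 for u, v in edges if (u in Bset) != (v in Bset))
            if I:
                best = max(best, len(Bset) / I)
    return best

# 4-node ring: any 2-node group is reached by at least 2 links,
# so the narrowness is 2/2 = 1.
ring4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring4))  # 1.0
```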
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The hypercube contains 2^n nodes; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
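&lt;br /&gt;
The differ-in-exactly-one-bit adjacency rule makes hypercube tests a single XOR away; a minimal sketch:&lt;br /&gt;

```python
def hypercube_adjacent(a: int, b: int) -> bool:
    """Two hypercube nodes are linked iff their binary labels differ
    in exactly one bit, i.e. a XOR b is a power of two."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0

def hypercube_distance(a: int, b: int) -> int:
    """Shortest-path length = Hamming distance between the labels."""
    return bin(a ^ b).count("1")

# In a 4-cube (n = 4, N = 16): 0b0000 and 0b0100 are neighbours,
# while 0b0000 and 0b1111 are a full diameter (4 hops) apart.
print(hypercube_adjacent(0b0000, 0b0100))   # True
print(hypercube_distance(0b0000, 0b1111))   # 4
```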
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield an increase in performance proportional to the increase in size. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n · 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, where the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is thus a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, Volume 46, Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects, through hyperlinks, to 2n nodes of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0 = 0, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 = 1, 2 and ai = 0, 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: When a0 = 0, 1:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: When a0 = 2, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n · 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n + n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The average distance of the network is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n · ceil(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing (quaternary) digit position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A d is chosen such that the d-neighbour of ur is also a neighbour of vr, so that after routing to it ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to r-neighbour&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur is also a neighbour of vr&lt;br /&gt;
&lt;br /&gt;
route to d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
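&lt;br /&gt;
As a sanity check on step counts, one-to-all broadcast under the all-port assumption can be simulated as a breadth-first flood in which every informed node forwards to all of its neighbours each round. The sketch below is a generic illustration on an ordinary 3-cube, not the BVH-specific schedule above:&lt;br /&gt;

```python
from collections import deque

def broadcast_rounds(adj, source):
    """Simulate one-to-all broadcast in the all-port model: each round,
    every informed node forwards to all of its neighbours, so the round
    count equals the BFS depth (eccentricity) of the source."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

# 3-cube: neighbours differ in one bit; broadcast from node 0 takes 3 rounds.
n = 3
adj = {u: [u ^ (1 << i) for i in range(n)] for u in range(2 ** n)}
print(broadcast_rounds(adj, 0))  # 3
```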
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled to larger sizes.  The omega network belongs to the class of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical, so we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor has size:&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows:&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges from 0 to 15 with a four-bit representation&lt;br /&gt;
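&lt;br /&gt;
In code, the connection rule ''σn(x)'' is a one-bit left rotation of the (n+1)-bit link number. The sketch below is illustrative only (the function name is ours, not the paper's):&lt;br /&gt;

```python
def sigma(x: int, n: int) -> int:
    """One-bit left rotation of the (n+1)-bit number x:
    {b_n, b_n-1, ..., b_1, b_0} -> {b_n-1, ..., b_1, b_0, b_n}."""
    width = n + 1
    msb = (x >> n) & 1                        # extract bit b_n
    return ((x << 1) & ((1 << width) - 1)) | msb

# With L = 8 and n = 3 as in Figure 1, output link 5 (0101) on one
# stage feeds input link 10 (1010) on the next stage.
print(sigma(5, 3))  # 10
```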
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows:&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  That dual node is in turn replaced by its own dual, and so on, until a spare node takes the place of the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
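&lt;br /&gt;
A sketch of the first replacement rule, with ''l'' taken as an n-bit level number and ''bn'' read as its most significant bit (that reading of ''bn'' is our assumption for illustration; the paper defines it precisely):&lt;br /&gt;

```python
def dual_node(g: int, l: int, L: int) -> tuple:
    """Dual of node (g, l) under the rule (g+1, 2*(l % (L/2)) + b_n),
    valid for g <= n - 1.  b_n is read here as the top bit of the
    n-bit level number l -- an assumption for illustration."""
    n = L.bit_length() - 1          # n = log2(L) for power-of-two L
    bn = (l >> (n - 1)) & 1         # most significant bit of l
    return (g + 1, 2 * (l % (L // 2)) + bn)

# For L = 8: level 5 (binary 101) maps to level 3 (011), i.e. a
# one-bit left rotation of the level number.
print(dual_node(0, 5, 8))  # (1, 3)
```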
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  If another node fails, however, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to examine the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI systems such as the k-ary n-cube are communication-limited rather than processing-limited, which means that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that low-dimensional k-ary n-cube networks perform better than high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  This means that systems must organize their nodes logically and physically to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimension networks due to the fact that they have more and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes.  It demonstrates that, for constant delay, low-dimensional networks provide lower latency than high-dimensional networks.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency.  The high-dimensional networks, however, do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases with higher-dimensional networks, using logarithmic and linear delay.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62423</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62423"/>
		<updated>2012-04-17T03:31:12Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
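The definition above can be sketched in code. The following is a minimal brute-force illustration, assuming a tiny undirected network given as an edge list (the function name and the example graph are ours, not from the text):

```python
# Hypothetical sketch: brute-force narrowness of a small network,
# following the definition above (illustration only).
from itertools import combinations

def narrowness(nodes, edges):
    """Max over partitions (A, B), with len(B) at most len(A), of len(B)/I,
    where I counts the links crossing between A and B."""
    best = 0.0
    n = len(nodes)
    # enumerate every candidate group B with 1..n//2 members
    for size in range(1, n // 2 + 1):
        for b in combinations(nodes, size):
            bset = set(b)
            crossing = sum(1 for (u, v) in edges
                           if (u in bset) != (v in bset))
            if crossing:
                best = max(best, len(bset) / crossing)
    return best

# 4-node ring 0-1-2-3-0: the worst split {0,1} vs {2,3} crosses 2 links
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness([0, 1, 2, 3], ring))  # 1.0
```

Enumerating all partitions is exponential, so this is only practical for toy networks; it is meant to make the Nb / I definition concrete.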
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The hypercube contains 2^n nodes; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt; &lt;br /&gt;
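The bit-label adjacency rule lends itself to a short sketch (the helper names are illustrative, not from the text):

```python
# Minimal sketch of hypercube adjacency: two nodes are neighbours when
# their labels differ in exactly one bit, so flipping each bit of a label
# enumerates its neighbours, and the hop count is the Hamming distance.
def neighbors(node, n):
    """All nodes whose label differs from `node` in exactly one bit."""
    return [node ^ 2 ** i for i in range(n)]

def hops(u, v):
    """Shortest-path length = number of differing bits (Hamming distance)."""
    return bin(u ^ v).count("1")

# 3-cube: node 0 (000) is adjacent to 001, 010, 100
print(sorted(neighbors(0, 3)))  # [1, 2, 4]
print(hops(0b000, 0b111))       # 3, the diameter of the 3-cube
```

Routing follows the same idea: at each step, forward the message to the neighbour that corrects one of the differing bits, so a message never needs more than n hops.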
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and the EH can emulate the binary hypercube in implementing a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding: balanced hypercubes belong to a special type of load balanced graph. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects, through hyperlinks, to 2n nodes of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n*2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^(2n)) ∑ d(0,k), summed over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned: the topology&lt;br /&gt;
with the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
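The closed-form properties above can be collected in a small illustrative function (the name bvh_properties and the dictionary layout are ours; the formulas are taken directly from the text):

```python
# Sketch of the properties listed above for an n-dimensional balanced
# varietal hypercube: 2^(2n) nodes, degree 2n, n*2^(2n) edges,
# diameter 2 for n=1 and ceil(n + n/2) otherwise, cost = degree * diameter.
import math

def bvh_properties(n):
    nodes = 2 ** (2 * n)          # 2^(2n) nodes
    degree = 2 * n                # each node has 2n links
    edges = n * 2 ** (2 * n)      # equals nodes * degree / 2, as expected
    diameter = 2 if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter      # cost factor = degree x diameter
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": cost}

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```

Note the internal consistency check: the edge count n*2^(2n) is exactly (number of nodes) x (degree) / 2, as it must be for any regular graph.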
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of ur that is also a neighbour of vr is chosen, so that after routing to d we have ur = vr. In the next step d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that d(ur) = vr; route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that d1(ur-1) = vr-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, it depends heavily on reliability; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan.; , &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, 2003. vol. 46, no. 5, pp.467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
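As a quick check of the size formula above (the function name is ours, for this sketch only):

```python
# N = L x (log2(L) + 1): L levels, each spanning log2(L) + 1 stages.
import math

def omega_switches(L):
    """Number of switches in an omega network with L levels."""
    return L * (int(math.log2(L)) + 1)

print(omega_switches(8))   # 32, matching Figure 1 (32 nodes, 8 levels)
```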
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3, and x=0-15'', which has a four-bit representation&lt;br /&gt;
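The inter-stage connection rule above amounts to rotating the (n+1)-bit link number left by one bit, moving the top bit to the bottom. A minimal sketch (the function name sigma is ours):

```python
# sigma_n(x): {bn, bn-1, ..., b1, b0} maps to {bn-1, ..., b1, b0, bn},
# i.e. a one-bit left rotation of an (n+1)-bit link number.
def sigma(x, n):
    top = x // 2 ** n      # bn, the most significant bit of x
    rest = x % 2 ** n      # {bn-1, ..., b1, b0}
    return rest * 2 + top  # shift the rest left, append bn at the bottom

# n = 3 as in Figure 1: x = 0b1010 (10) maps to 0b0101 (5)
print(sigma(10, 3))  # 5
print(sigma(5, 3))   # 10; applying sigma n+1 = 4 times returns x itself
```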
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must be able to maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  Each dual node is in turn replaced by its own dual node, until a spare node completes the chain of replacements.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
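A minimal sketch of the dual-node rule above, assuming bn denotes the most significant bit of the level number l, with n = log2(L) (the function name is ours):

```python
# Dual of node (g, l): (g + 1, 2 * (l mod L/2) + bn), for g at most n - 1.
# Assumption (not stated explicitly in the text): bn is the top bit of l.
def dual(g, l, L):
    half = L // 2
    bn = l // half                 # most significant bit of l
    return (g + 1, 2 * (l % half) + bn)

# L = 8 levels: node (0, 5) = (0, 0b101) has its dual at (1, 0b011) = (1, 3)
print(dual(0, 5, 8))  # (1, 3)
```

Note that the level mapping is the same one-bit rotation used by the inter-stage shuffle, which is what lets the dual node preserve the omega structure.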
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can absorb one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to analyze the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; , &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp.775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI systems such as the k-ary n-cube are communication limited rather than processing limited, meaning that performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better as low-dimensional networks than as high-dimensional networks, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on wire delay. Systems must therefore organize their nodes logically and physically to keep the wires as short as possible. Higher-dimensional networks cost more than low-dimensional networks because they have more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes, demonstrating that low-dimensional networks provide lower latency than high-dimensional networks under a constant-delay model.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and have lower latency.  However, the high-dimensional networks do not benefit from locality and are forced to use longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases with higher-dimensional networks, under logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62422</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62422"/>
		<updated>2012-04-17T03:28:53Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Extended Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel to reach its final destination between any two nodes. For a hypercube network the diameter is identical to the degree of a node n = log2N. There are 2n nodes contained in the hypercube; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
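The one-bit-difference rule makes both adjacency testing and routing simple bit operations. The sketch below illustrates this with dimension-order (e-cube) routing, a standard hypercube routing scheme; the function names are our own:

```python
def hypercube_adjacent(u, v):
    """Two hypercube nodes are adjacent iff their labels differ
    in exactly one bit position."""
    return bin(u ^ v).count("1") == 1

def ecube_route(src, dst):
    """Dimension-order (e-cube) routing: correct the differing
    bits one at a time, from the lowest dimension upward."""
    path = [src]
    cur = src
    bit = 0
    while cur != dst:
        if (cur ^ dst) >> bit & 1:
            cur ^= 1 << bit  # flip one differing bit per hop
            path.append(cur)
        bit += 1
    return path

# In a 4-dimensional hypercube (N = 16), route from 0b0000 to 0b1011.
print(ecube_route(0b0000, 0b1011))  # [0, 1, 3, 11]
```

The hop count equals the number of differing bits (the Hamming distance), which is why the diameter is exactly n.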
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, where the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because the hardware configuration of every node must change as the number of nodes grows, since the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited for implementing a class of highly parallel algorithms, and can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v', such that v and v' have the same adjacent nodes. Such a pair of nodes v and v' is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v' in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of these tasks in v'. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative: we can have an active task running on node v with its backup on node v', while having another active task running on node v' with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
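The matching-pair definition can be tested mechanically. The sketch below treats two nodes as a matching pair when their neighbourhoods agree once the pair's own mutual edge is ignored, which is one way to reconcile the definition with the complete-graph example above; the adjacency-dictionary representation and examples are our own:

```python
def is_load_balanced(adj):
    """adj: dict mapping node -> set of neighbours.  A graph is
    load balanced iff every node v has a matching node w != v whose
    neighbourhood agrees with v's (ignoring v and w themselves)."""
    def matches(v, w):
        return (adj[v] - {w}) == (adj[w] - {v})
    return all(any(w != v and matches(v, w) for w in adj) for v in adj)

# 4-cycle 0-1-2-3-0: node 0 matches 2, node 1 matches 3 -> load balanced.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
# 3-cube Q3: no two nodes share a neighbourhood -> not load balanced.
q3 = {v: {v ^ (1 << b) for b in range(3)} for v in range(8)}
print(is_load_balanced(c4), is_load_balanced(q3))  # True False
```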
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects to 2n nodes through hyperlinks, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0 = 0, 3:&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 = 1, 2 and ai = 0, 3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: When a0 = 0, 1&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: When a0 = 2, 3&lt;br /&gt;
&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n*2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑k d(0, k), where d(0, k) is the distance from a fixed reference node 0 to node k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned: the topology with the minimum cost is treated as the best candidate. The cost factor of a network is the product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
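These closed-form properties are easy to tabulate. A small Python helper, assuming exactly the formulas stated above (the function name is our own):

```python
import math

def bvh_metrics(n):
    """Degree, node count, edge count, diameter, and cost of an
    n-dimensional balanced varietal hypercube, per the formulas above."""
    degree = 2 * n
    nodes = 2 ** (2 * n)
    edges = n * 2 ** (2 * n)
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter  # cost = degree x diameter
    return degree, nodes, edges, diameter, cost

print(bvh_metrics(2))  # (4, 16, 32, 3, 12)
```

For n = 2 this gives degree 4, 16 nodes, 32 edges, diameter 3, and cost 12.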
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left to right scan of the source and destination addresses. Let r be the rightmost differing (quaternary) digit position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A neighbour d is chosen such that, after routing to it, ur = vr; in the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r : rightmost differing (quaternary) digit position&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d : choice such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1 : choice such that ur-1 = vr-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
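The step-by-step forwarding idea, where each node hands the message to a neighbour one step closer to the destination, can be sketched for an arbitrary network by precomputing distances with a BFS from the destination. This is a generic illustration of greedy shortest-path forwarding, not the BVH-specific algorithm:

```python
from collections import deque

def greedy_route(adj, src, dst):
    """Each node forwards the message to a neighbour that is one
    step closer to the destination (distances computed via BFS)."""
    # BFS distances from the destination.
    dist = {dst: 0}
    q = deque([dst])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    path = [src]
    while path[-1] != dst:
        cur = path[-1]
        # Forward to the neighbour with the smallest distance to dst.
        path.append(min(adj[cur], key=lambda v: dist[v]))
    return path

# Toy 4-cycle standing in for a larger network.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(greedy_route(adj, 0, 2))  # [0, 1, 2]
```

In a real multicomputer the distance information is implicit in the node labels, so no global BFS is needed; the BFS here only stands in for that label arithmetic.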
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u, n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue Step 2 until all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
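The all-port flooding idea behind this broadcast can be simulated on any topology: in each step every informed node sends to all of its neighbours, and the simulation counts how many steps it takes for every node to be informed (which for flooding equals the root's eccentricity). The sketch and its 3-cube example are our own, not the BVH-specific schedule:

```python
def broadcast_steps(adj, root):
    """All-port flooding: in each step every informed node sends the
    message to all of its neighbours.  Returns the number of steps
    needed to inform every node in the network."""
    informed = {root}
    steps = 0
    while len(informed) < len(adj):
        informed |= {v for u in informed for v in adj[u]}
        steps += 1
    return steps

# 3-cube: flooding from node 0 takes diameter = 3 steps.
q3 = {v: [v ^ (1 << b) for b in range(3)] for v in range(8)}
print(broadcast_steps(q3, 0))  # 3
```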
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node has a fixed size, the system can be easily scaled to larger sizes.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan.; , &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, 2003. vol. 46, no. 5, pp.467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections go as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above ''L=8, N=32, n=3'', and ''x'' ranges over 0-15 with a four bit representation&lt;br /&gt;
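The shuffle connection ''σn(x)'' defined above is a one-bit left rotation of the (n+1)-bit label, which can be written directly in code (a sketch; the function name is our own):

```python
def sigma(x, n):
    """Perfect-shuffle connection: left-rotate the (n+1)-bit label
    x = (bn, bn-1, ..., b0) by one position, giving
    (bn-1, ..., b1, b0, bn)."""
    width = n + 1
    top = (x >> n) & 1                          # extract bn
    return ((x << 1) & ((1 << width) - 1)) | top  # shift and wrap

# With n = 3 (four-bit labels, as in Figure 1):
print(bin(sigma(0b1001, 3)))  # 0b11  (i.e. 1001 -> 0011)
```

So output link 9 (1001) of one stage feeds input link 3 (0011) of the next.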
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, then it must be replaced by its dual node.  That dual node is in turn replaced by its own dual node, and so on, until a spare node completes the chain of replacements.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'', the spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can suffer one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega demonstrates higher reliability over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
=K-ary n-cube Interconnection Networks=&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to demonstrate why low-dimensional k-ary n-cube interconnection networks are preferable to high-dimensional ones for large scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; , &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp.775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of k-ary n-cube networks are communication limited rather than processing limited, meaning that performance depends on the wires used to connect the system.  This section will show, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better in low-dimensional form than in high-dimensional form, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  This means that systems are required to organize their nodes logically and physically to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimension networks due to the fact that they have more and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs. dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and have lower latency.  However, the high-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases in higher dimensional networks, using logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher dimensional networks at a disadvantage.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
4. J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 1, January 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; Journal of Computers, Vol. 3, No. 10, October 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62421</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62421"/>
		<updated>2012-04-17T03:27:26Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing ,broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B where the number of processors in each group is Na and Nb and assume Nb &amp;lt; = Na. Now count the number of interconnections between A and B call this I. Find the maximum value of Nb / I for all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high ( Nb &amp;gt; I) then if the group B processors want to send messages to group A congestion in the network will be high ( since there are fewer links than processors )&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2n nodes arranged in a k dimensional hypercube. The nodes are numbered 0 , 1, ....2n -1 and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel to reach its final destination between any two nodes. For a hypercube network the diameter is identical to the degree of a node n = log2N. There are 2n nodes contained in the hypercube; each is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. Most Important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration. Also, the increase in system size is expected to result in an increase in performance to the extent of the increase in size. a major drawback of the hypercube network is its lack of scalability, which limits its use in building large size systems out of small size systems with little changes in the configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2n nodes and n2n-1 edges. It possesses some desirable features for interconnection networks . Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube. A complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter is shown below&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. an incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place. Complicated routing algorithms are necessary for the incomplete hypercube. Extended Hypercube(EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC) as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2k such basic modules can be interconnected via 2k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expansive, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is found to be most suited for implementing a class of highly parallel algorithms. The EH can emulate the binary hypercube in implementing a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the rule of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; such a pair is called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
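The matching-pair test lends itself to a direct check. The following Python sketch (illustrative only, not from the cited papers) compares closed neighbourhoods, i.e. a node together with its neighbours, which is one reading of the definition under which the complete-graph example above comes out load balanced:&lt;br /&gt;

```python
# Sketch: test the load-balanced (matching-pair) property of a small
# undirected graph given as an adjacency dict of node -> neighbour set.
def closed_neighbourhood(adj, v):
    s = set(adj[v])
    s.add(v)
    return s

def is_load_balanced(adj):
    # every node must have a distinct partner with the same neighbourhood
    return all(
        any(u != v and closed_neighbourhood(adj, u) == closed_neighbourhood(adj, v)
            for u in adj)
        for v in adj
    )

# K4, the complete graph on 4 nodes (even order): load balanced
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
# P3, a 3-node path: not load balanced
p3 = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_load_balanced(k4), is_load_balanced(p3))
```

Under this reading the 3-node path fails the test, as do meshes, trees, and hypercubes.&lt;br /&gt;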
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects through hyperlinks to 2n neighbouring nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
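As a concrete illustration of the inner-node rules above, here is a hypothetical Python sketch in which a node is encoded as a tuple of quaternary digits (a0, a1, ..., an-1); the outer-node cases are omitted:&lt;br /&gt;

```python
# Hypothetical sketch of the inner-node adjacency rules quoted above.
# Only the first digit a0 changes; neighbours reached through the
# outer-node links are not covered here.
def inner_neighbours(node):
    a0 = node[0]
    rest = node[1:]
    if a0 % 2 == 0:            # Case I: a0 even
        deltas = (1, -2)
    else:                      # Case II: a0 odd
        deltas = (-1, 2)
    return [((a0 + d) % 4,) + rest for d in deltas]

# a0 = 0 (even): the inner neighbours change a0 to 1 and to 2
print(inner_neighbours((0, 3)))
```

Note that (a0-2) mod 4 and (a0+2) mod 4 coincide, so each inner link is symmetric: if w is an inner neighbour of u, then u is an inner neighbour of w.&lt;br /&gt;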
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
In practice, the average distance conveys the actual performance of the network better than the diameter. It is obtained by summing the distances of all nodes from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned: the topology with the minimum cost is treated as the best candidate. The cost factor of a network is the product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(n + n/2).&lt;br /&gt;
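The figures above can be tabulated for small n with a short script (2^(2n) is written as 4 ** n):&lt;br /&gt;

```python
import math

# Degree, edge count, diameter, and cost (degree x diameter) of BVH_n,
# following the formulas quoted in this section.
def bvh_properties(n):
    nodes = 4 ** n                            # 2^(2n) nodes
    degree = 2 * n
    edges = n * nodes                         # n * 2^(2n) edges
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    return nodes, degree, edges, diameter, degree * diameter

for n in (1, 2, 3):
    print(n, bvh_properties(n))
```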
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing quaternary digit position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of ur that is also a neighbour of vr is chosen so that ur=vr; in the next step, d1 is chosen so that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
  r := rightmost differing (quaternary) digit position&lt;br /&gt;
&lt;br /&gt;
  if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
      route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
  else&lt;br /&gt;
&lt;br /&gt;
      d := choice such that the d-neighbour of ur is also a neighbour of vr; route to the d-neighbour so that ur = vr&lt;br /&gt;
&lt;br /&gt;
      d1 := choice such that ur-1 = vr-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming that concurrent communication through all ports of each processor is possible, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
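The round structure of a one-to-all broadcast can be illustrated in a few lines of Python. The sketch below is generic (it is not the BVHn-specific procedure): with all-port communication, every informed node forwards to all of its neighbours in each round, so the number of rounds equals the eccentricity of the source.&lt;br /&gt;

```python
# Count all-port broadcast rounds on an arbitrary adjacency dict.
def broadcast_rounds(adj, source):
    informed = {source}
    rounds = 0
    while len(informed) != len(adj):
        # every informed node forwards to all neighbours simultaneously
        frontier = {u for v in informed for u in adj[v]} - informed
        if not frontier:
            raise ValueError("graph is disconnected")
        informed |= frontier
        rounds += 1
    return rounds

# 4-cycle example: two rounds reach all nodes from node 0
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(broadcast_rounds(c4, 0))
```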
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled up.  The omega network belongs to the class of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical; we therefore also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor has the following size:&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
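The size formula can be checked directly (a trivial sketch):&lt;br /&gt;

```python
import math

# N = L x (log2(L) + 1): switch count of an omega network with L levels.
def omega_switches(levels):
    stages = int(math.log2(levels)) + 1
    return levels * stages

print(omega_switches(8))    # Figure 1's configuration: 8 levels, 32 switches
```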
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges over ''0-15'' with a four-bit representation&lt;br /&gt;
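The permutation σn is a one-bit left rotation of the (n+1)-bit label, which can be sketched as:&lt;br /&gt;

```python
# sigma_n rotates an (n+1)-bit number left by one position, moving the
# top bit b_n to the bottom, as in the connection rule above.
def sigma(x, n):
    width = n + 1
    top = (x >> n) % 2                 # extract b_n
    return ((x * 2) % (2 ** width)) + top

# n = 3 (four-bit labels, as in Figure 1): 0110 becomes 1100
print(sigma(0b0110, 3))
```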
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must be guaranteed to maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node fails, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node; each dual node is in turn replaced by its own dual, until a spare node completes the chain of replacements.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
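One reading of the dual-node rule, with b_n taken as the most significant bit of the level (so that the rule amounts to a one-bit left rotation of the level bits), can be sketched as follows; that decomposition is an assumption, not spelled out in the text above:&lt;br /&gt;

```python
import math

# Hypothetical reading of the dual-node rule for stage g, level l in an
# omega network with L levels (L a power of two, n = log2 L).
def dual_level(l, L):
    n = int(math.log2(L))
    msb = (l >> (n - 1)) % 2           # most significant bit of the level
    return 2 * (l % (L // 2)) + msb    # rotate the level bits left by one

def dual_node(g, l, L):
    return (g + 1, dual_level(l, L))

print(dual_node(0, 5, 8))   # level 101 rotates to 011
```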
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, with all other nodes performing the Replacement procedure.  However, if a second node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to examine how k-ary n-cube interconnection networks perform in large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of these networks are communication limited rather than processing limited, meaning that performance depends on the wires connecting the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better in low-dimensional configurations than in high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
Since VLSI systems are wire-limited, the speed at which they can run depends on wire delay.  Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible.  Networks with higher dimensions cost more than low-dimension networks, due to the fact that they have more and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance, given by the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows average network latency versus dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks under constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency.  The high-dimensional networks, however, do not benefit from locality and are forced to carry longer message lengths.&lt;br /&gt;
&lt;br /&gt;
[[File:Latency_Fig9.png]]&lt;br /&gt;
[[File:Latency_Fig10.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 9 and 10 further demonstrate how latency increases with higher-dimensional networks, using logarithmic and linear delay models.  With linear delay, latency is determined solely by bandwidth and physical distance, putting the higher-dimensional networks at a disadvantage.&lt;br /&gt;
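The hop-count side of this trade-off is easy to quantify. The sketch below (a back-of-the-envelope illustration, not Dally's full latency model) compares the maximum hop count of k-ary n-cubes of equal size: higher dimensions need fewer hops, but under constant bisection width their narrower channels and longer wires offset that advantage, which is why the low-dimensional networks win overall.&lt;br /&gt;

```python
# Maximum hop count of a k-ary n-cube with torus links: at most
# floor(k/2) hops per dimension, across n dimensions.
def max_hops(k, n):
    return n * (k // 2)

# Four networks of identical size N = 4096 = k^n
for k, n in ((16, 3), (8, 4), (4, 6), (2, 12)):
    assert k ** n == 4096
    print(f"{k}-ary {n}-cube: {max_hops(k, n)} hops")
```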
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing, vol. 46, no. 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&lt;br /&gt;
&lt;br /&gt;
4. J. Mohan Kumar; L. M. Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 1, January 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; Journal of Computers, vol. 3, no. 10, October 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&lt;br /&gt;
&lt;br /&gt;
6. C. R. Tripathy; N. Adhikari; &amp;quot;On a New Multicomputer Interconnection Topology for Massively Parallel Systems&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS), vol. 2, no. 4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62417</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62417"/>
		<updated>2012-04-17T03:25:30Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
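For small networks, narrowness can be computed by brute force over all partitionings, as in this illustrative Python sketch:&lt;br /&gt;

```python
from itertools import combinations

# Narrowness: the maximum of Nb / I over all splits into groups A and B
# with Nb at most Na, where I counts the links crossing the cut.
def narrowness(nodes, edges):
    nodes = list(nodes)
    best = 0.0
    for size in range(1, len(nodes) // 2 + 1):
        for group_b in combinations(nodes, size):
            bset = set(group_b)
            crossing = sum(1 for u, v in edges if (u in bset) != (v in bset))
            if crossing:
                best = max(best, size / crossing)
    return best

# 4-node ring: splitting off an adjacent pair cuts only 2 links, giving 2/2
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring))
```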
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. The 2^n nodes of the hypercube are each uniquely represented by a binary sequence bn-1bn-2...b0 of length n, and two nodes are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
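The adjacency rule translates directly into bit operations, as this brief sketch shows:&lt;br /&gt;

```python
# Hypercube labels are n-bit integers; flipping any single bit of a label
# yields a neighbour, so node v has exactly n neighbours.
def hypercube_neighbours(v, n):
    return [v ^ (2 ** i) for i in range(n)]

def hops(u, v):
    # shortest-path length = Hamming distance between the labels
    return bin(u ^ v).count("1")

print(hypercube_neighbours(0b000, 3))   # [1, 2, 4]
print(hops(0b000, 0b111))               # 3: opposite corners of the 3-cube
```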
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportionate increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few configuration changes. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure; balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v’ in such a way that one copy is active and the other is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. None of the tasks running on other nodes need to be reassigned, and the adjacency property is preserved, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative: we can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; such a pair is called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects through hyperlinks to 2n neighbouring nodes, which are of two types: a) inner nodes and b) outer nodes &amp;lt;ref name=&amp;quot;ext-var&amp;quot; /&amp;gt;&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
In practice, the average distance conveys the actual performance of the network better than the diameter. It is obtained by summing the distances of all nodes from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned: the topology with the minimum cost is treated as the best candidate. The cost factor of a network is the product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing quaternary digit position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of ur that is also a neighbour of vr is chosen so that ur=vr; in the next step, d1 is chosen so that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
  r := rightmost differing (quaternary) digit position&lt;br /&gt;
&lt;br /&gt;
  if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
      route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
  else&lt;br /&gt;
&lt;br /&gt;
      d := choice such that the d-neighbour of ur is also a neighbour of vr; route to the d-neighbour so that ur = vr&lt;br /&gt;
&lt;br /&gt;
      d1 := choice such that ur-1 = vr-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming that concurrent communication through all ports of each processor is possible, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Because every node in the system has a fixed size, the system can easily be scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical, so we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In the Figure 1 above ''L=8, N=32, n=3, and x=0-15'' and has a four bit representation&lt;br /&gt;
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  The dual node is in turn replaced by its own dual node, and so on, until a spare node takes over at the end of the chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, with all other nodes performing the Replacement procedure.  However, if another node fails, all nodes after it participate in the Reconfiguration procedure.&lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
This section examines k-ary n-cube interconnection networks and why high-dimensional configurations were rarely used for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of k-ary n-cubes are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube networks perform better in lower-dimensional configurations than in high-dimensional ones, assuming constant bisection width.&lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically to keep the wires as short as possible.  Higher-dimensional networks cost more than low-dimensional networks because they have more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows average network latency versus dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with constant wire delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
Low-dimensional networks are able to capitalize on locality and have lower latency.  High-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62416</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62416"/>
		<updated>2012-04-17T03:23:09Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion in the network will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
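The narrowness definition above can be checked directly by exhaustive search on small networks. The sketch below (plain Python; the function name is illustrative) enumerates every 2-way partition, counts the crossing links I, and maximizes Nb / I:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness: maximum of size(B) / I over all partitions
    (A, B) with B the smaller side, where I counts edges crossing the cut."""
    best = 0.0
    n = len(nodes)
    # enumerate the smaller side B of every 2-way partition
    for size in range(1, n // 2 + 1):
        for B in combinations(nodes, size):
            Bset = set(B)
            # an edge crosses the cut when exactly one endpoint is in B
            I = sum(1 for (u, v) in edges if (u in Bset) != (v in Bset))
            if I > 0:
                best = max(best, len(Bset) / I)
    return best
```

For example, a 4-node ring has narrowness 1.0 (two processors on each side, two crossing links), while a 4-node linear array has narrowness 2.0, reflecting its single bottleneck link.&lt;br /&gt;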
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, the maximum number of links (or hops) a message has to travel along a shortest path between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. The hypercube contains 2^n nodes, each uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
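The one-bit-difference adjacency rule makes hypercube properties easy to verify computationally. A short sketch (helper names are our own) generates each node's neighbours and confirms by breadth-first search that the diameter equals the dimension n:&lt;br /&gt;

```python
from collections import deque

def hypercube_neighbors(v, n):
    # nodes adjacent to v differ from it in exactly one bit position
    return [v ^ 2**i for i in range(n)]

def hypercube_diameter(n):
    # breadth-first search from node 0; the farthest node is the all-ones label
    dist = {0: 0}
    q = deque([0])
    while q:
        u = q.popleft()
        for w in hypercube_neighbors(u, n):
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return max(dist.values())
```

For example, hypercube_neighbors(0, 3) gives [1, 2, 4], and hypercube_diameter(n) returns n for any dimension, matching the diameter n = log2 N stated above.&lt;br /&gt;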
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small systems with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, with the end nodes of each missing edge marked by arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
Hypercube networks are not truly expandable, because the hardware configuration of all the nodes must be changed whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and the EH can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the rule of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
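The matching-pair condition can be tested mechanically. The sketch below reads "same adjacent nodes" as the closed neighbourhood N[v] (the node together with its neighbours); that reading is our interpretation, chosen because it makes the complete-graph claim above come out true:&lt;br /&gt;

```python
def is_load_balanced(adj):
    # adj: dict node -> set of adjacent nodes (undirected graph)
    # A graph is load balanced iff every node has a distinct matching
    # node with the same closed neighbourhood N[v] = {v} union N(v).
    closed = {v: set(nbrs) | {v} for v, nbrs in adj.items()}
    return all(
        any(u != v and closed[u] == closed[v] for u in adj)
        for v in adj
    )
```

Under this check the complete graph K4 is load balanced (every pair of nodes matches), while the 2-dimensional hypercube, a 4-cycle, is not, consistent with the claims in the text that complete graphs with an even number of nodes are load balanced and hypercubes are not.&lt;br /&gt;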
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n*2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional balanced varietal hypercube is&lt;br /&gt;
  i. 2 for n = 1&lt;br /&gt;
  ii. ceil(n + n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the balanced varietal hypercube the average distance is given by (1/2^(2n)) * ∑k d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n* ceiling(n + n/2).&lt;br /&gt;
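The counts above fit together arithmetically: with 2^(2n) nodes each of degree 2n, the edge total is (nodes x degree) / 2 = n*2^(2n), and cost is degree times diameter. A small sketch (function names are illustrative):&lt;br /&gt;

```python
import math

def bvh_nodes(n):
    return 4**n                # 2^(2n) nodes

def bvh_degree(n):
    return 2 * n               # every node has 2n neighbours

def bvh_edges(n):
    # each edge is counted twice when summing degrees over all nodes
    return bvh_nodes(n) * bvh_degree(n) // 2

def bvh_diameter(n):
    return 2 if n == 1 else math.ceil(n + n / 2)

def bvh_cost(n):
    # cost factor = degree x diameter
    return bvh_degree(n) * bvh_diameter(n)
```

For n = 2 this gives 16 nodes, degree 4, and 32 edges, matching n*2^(2n); for n = 3 the diameter is 5 and the cost is 30.&lt;br /&gt;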
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path treats itself as the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing (quaternary) digit position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A node d is chosen such that the d-neighbour of ur is also a neighbour of vr; routing to it makes ur=vr. In the next step d1 is chosen such that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to r-neighbour&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur is also a neighbour of vr; route to d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1; route to d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
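The full BVH adjacency is involved, so the sketch below illustrates only the digit-scan idea of Procedure Route on plain quaternary addresses: the current node repeatedly corrects the most significant still-differing digit, taking at most two steps per digit (echoing the diameter-2 BVH1 blocks). It is an illustration of the scanning scheme, not the exact BVH neighbour rules:&lt;br /&gt;

```python
def route(u, v):
    # u, v: equal-length tuples of quaternary digits (0..3)
    path = [u]
    cur = list(u)
    for i in reversed(range(len(u))):       # correct one digit position at a time
        while cur[i] != v[i]:
            d = (v[i] - cur[i]) % 4
            step = 1 if d in (1, 2) else 3  # +1 or -1 mod 4, whichever is shorter
            cur[i] = (cur[i] + step) % 4
            path.append(tuple(cur))
    return path
```

For example, route((0, 0), (3, 2)) reaches the destination in three hops, never spending more than two steps on any one digit.&lt;br /&gt;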
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
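With all-port communication, the round count of such a flooding broadcast equals the source's eccentricity, which can be checked by simulation. A generic, topology-independent sketch (names are our own):&lt;br /&gt;

```python
def broadcast_rounds(adj, source):
    # adj: dict node -> iterable of neighbours; in each round every
    # informed node forwards through all of its ports simultaneously
    informed = {source}
    rounds = 0
    while len(informed) != len(adj):
        informed = informed | {w for v in informed for w in adj[v]}
        rounds += 1
    return rounds
```

On the 3-dimensional binary hypercube this simulator reports 3 rounds (the diameter); the BVHn algorithm above is stated to complete in (n+1) steps.&lt;br /&gt;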
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Because every node in the system has a fixed size, the system can easily be scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical, so we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In the Figure 1 above ''L=8, N=32, n=3, and x=0-15'' and has a four bit representation&lt;br /&gt;
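The size formula and the shuffle connection σn(x) can both be written down directly. In the sketch below (function names are our own), σn performs a left rotation of the (n+1)-bit link number, matching the bit pattern above:&lt;br /&gt;

```python
import math

def omega_switch_count(L):
    # N = L x (log2(L) + 1): L levels, log2(L) + 1 stages
    return L * (int(math.log2(L)) + 1)

def sigma(x, n):
    # left-rotate the (n+1)-bit number x = {bn, bn-1, ..., b0}
    # into {bn-1, ..., b1, b0, bn}
    hi = x // 2**n       # the top bit bn
    low = x % 2**n       # the remaining n bits
    return low * 2 + hi
```

For Figure 1, omega_switch_count(8) gives the 32 switches shown, and with n = 3, sigma(8, 3) == 1 (binary 1000 rotates to 0001).&lt;br /&gt;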
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node.  The dual node is in turn replaced by its own dual node, and so on, until a spare node takes over at the end of the chain.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
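The first dual-node rule can be sketched as below. Reading bn as the most significant bit of the level number l is an assumption on our part; the source's exact bit convention may differ:&lt;br /&gt;

```python
def dual_node(g, l, L, n):
    # dual of node (g, l) for g at most n-1: (g+1, 2*(l mod L/2) + bn)
    if g + 1 > n:
        raise ValueError("stage-n nodes use the spare node (n+1, l) instead")
    bn = l // (L // 2)              # assumed: bn is the top bit of l
    return (g + 1, 2 * (l % (L // 2)) + bn)
```

With L = 8 levels (so n = 3), node (0, 5) would have its dual at (1, 3) under this reading.&lt;br /&gt;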
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, with all other nodes performing the Replacement procedure.  However, if another node fails, all nodes after it participate in the Reconfiguration procedure.&lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
This section examines k-ary n-cube interconnection networks and why high-dimensional configurations were rarely used for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI implementations of k-ary n-cubes are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube networks perform better in lower-dimensional configurations than in high-dimensional ones, assuming constant bisection width.&lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically to keep the wires as short as possible.  Higher-dimensional networks cost more than low-dimensional networks because they have more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance: the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows average network latency versus dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with constant wire delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
Low-dimensional networks are able to capitalize on locality and have lower latency.  High-dimensional networks do not benefit from locality and are forced to have longer message lengths.&lt;br /&gt;
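A back-of-envelope model shows why constant bisection width favours low dimensions: fewer dimensions mean more hops, but wider channels cut serialization time. The channel-width scaling W = k/2 and the 150-bit message length are illustrative assumptions, a rough sketch in the spirit of the analysis rather than the paper's exact latency model:&lt;br /&gt;

```python
def latency_estimate(k, n, msg_bits=150):
    # average hop count in a k-ary n-cube torus: roughly k/4 per dimension
    hops = n * k / 4
    # channel width assumed to grow with k so bisection width stays constant
    W = k / 2
    return hops + msg_bits / W   # hop latency plus serialization latency
```

For 256 nodes, this estimate favours the 16-ary 2-cube (26.75 cycle units) over the 4-ary 4-cube (79) and the binary 8-cube (154), consistent with the section's conclusion.&lt;br /&gt;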
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62415</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62415"/>
		<updated>2012-04-17T03:22:29Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e. the maximum number of links that must be traversed along a shortest path to send a message between any two nodes.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
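The diameter defined above can be computed directly by running breadth-first search from every node. A minimal Python sketch; the diameter helper and the 3-cube example are illustrative, not part of the source:&lt;br /&gt;

```python
from collections import deque

def diameter(adj):
    # BFS from every node; the diameter is the largest
    # shortest-path distance found over all source nodes.
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

# A 3-dimensional hypercube: nodes 0..7, two labels adjacent
# when they differ in exactly one bit; its diameter is 3.
cube = {u: [u ^ 2 ** b for b in range(3)] for u in range(8)}
```

Running this on the 3-cube confirms that the hypercube diameter equals its dimension n.&lt;br /&gt;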
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, with Na and Nb processors respectively, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
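For small networks, the narrowness defined above can be found by brute force over all partitions. A hedged sketch; the narrowness function and the 4-node ring example are illustrative assumptions:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(adj):
    # Try every partition (A, B) with B the smaller group,
    # count the links I crossing the cut, and return the
    # maximum value of Nb / I over all partitionings.
    nodes = sorted(adj)
    n = len(nodes)
    best = 0.0
    for size in range(1, n // 2 + 1):
        for group in combinations(nodes, size):
            b = set(group)
            cut = sum(1 for u in b for v in adj[u] if v not in b)
            if cut:
                best = max(best, len(b) / cut)
    return best

# A 4-node ring: the worst partition puts two adjacent
# nodes in B with two crossing links, so narrowness is 1.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```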
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
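The one-bit adjacency rule also yields a simple greedy routing procedure: correct one differing bit per hop. A minimal sketch; the function names are illustrative, not from the source:&lt;br /&gt;

```python
def adjacent(u, v):
    # Hypercube nodes are neighbours iff their binary labels
    # differ at exactly one bit position.
    return bin(u ^ v).count("1") == 1

def route(u, v, n):
    # Greedy hypercube routing: scan the n bit positions and
    # correct one differing bit per hop.
    path = [u]
    for b in range(n):
        if (u ^ v) // (2 ** b) % 2:
            u = u ^ 2 ** b
            path.append(u)
    return path
```

The number of hops equals the number of differing bits, so no route is longer than the diameter n.&lt;br /&gt;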
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportionate increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with little change in the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and fault diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because the hardware configuration of all the nodes must change whenever the number of nodes grows, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube for a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v' in such a way that one copy is active and the other passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of these tasks on v'. All tasks running on other nodes need not be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative: we can have an active task running on node v with its backup on node v', while another active task runs on node v' with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
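The matching-pair condition can be checked mechanically by comparing neighbour sets. A hedged sketch, using a 4-cycle (which satisfies the property: opposite nodes share both neighbours) as an illustrative example:&lt;br /&gt;

```python
def matching_pair_of(v, adj):
    # Return a node with exactly the same adjacent nodes as v,
    # or None when v has no matching pair.
    for w in adj:
        if w != v and set(adj[w]) == set(adj[v]):
            return w
    return None

def is_load_balanced(adj):
    # Load balanced: every node has a matching pair.
    return all(matching_pair_of(v, adj) is not None for v in adj)

# In a 4-cycle, nodes 0 and 2 both neighbour {1, 3}, and
# nodes 1 and 3 both neighbour {0, 2}.
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```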
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes&amp;lt;ref name=&amp;quot;ext-var&amp;quot; /&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
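Of the neighbour rules above, the inner-node rule is the most compact: only the first digit a0 changes, by +1 and -2 (mod 4) when a0 is even, and by -1 and +2 (mod 4) when a0 is odd. A hedged sketch of just this rule; the outer-node cases are omitted and the function name is illustrative:&lt;br /&gt;

```python
def inner_neighbours(label):
    # Inner neighbours of a balanced varietal hypercube node
    # label (a0, a1, ..., an-1) with each ai in {0, 1, 2, 3}:
    # only a0 changes, and the offsets depend on its parity.
    a0 = label[0]
    steps = (1, -2) if a0 % 2 == 0 else (-1, 2)
    return [((a0 + s) % 4,) + label[1:] for s in steps]
```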
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network in practice better than the diameter. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance from node (0,0,...,0) is (1/2^2n) ∑k d((0,0,...,0), k)&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor for an interconnection network; the topology with minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(n + n/2).&lt;br /&gt;
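The cost formula follows directly from the degree and diameter properties listed above. A minimal sketch combining them; the helper name is illustrative:&lt;br /&gt;

```python
import math

def bvh_cost(n):
    # Cost = degree x diameter for an n-dimensional balanced
    # varietal hypercube: degree is 2n, diameter is 2 for
    # n = 1 and ceil(n + n/2) otherwise.
    degree = 2 * n
    diameter = 2 if n == 1 else math.ceil(n + n / 2)
    return degree * diameter
```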
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A choice d is made such that the d-neighbour of ur is also a neighbour of vr; then d is chosen so that ur = vr. In the next step, d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position of u and v&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour makes ur = vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
if (u and v are still not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that ur-1 = vr-1; route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes send the message to their (2n-2) neighbours&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have the message&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled to larger sizes.  The omega network belongs to the class of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, reliability is critical; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
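The size formula above can be evaluated directly; plugging in L = 8 reproduces the 32 switches of Figure 1. A minimal sketch (the function name is illustrative):&lt;br /&gt;

```python
import math

def omega_switches(levels):
    # N = L x (log2(L) + 1): L switches per stage and
    # log2(L) + 1 stages in the original omega design.
    stages = int(math.log2(levels)) + 1
    return levels * stages
```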
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' = 0-15 has a four-bit representation&lt;br /&gt;
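The permutation σn above is a left rotation of the (n+1)-bit label, so applying it n+1 times returns the original label. A minimal sketch of this rotation (the function name is illustrative):&lt;br /&gt;

```python
def sigma(x, n):
    # Shuffle permutation: left-rotate the (n+1)-bit label
    # x = (bn, ..., b1, b0) to (bn-1, ..., b1, b0, bn).
    top = x // 2 ** n          # most significant bit bn
    return (x % 2 ** n) * 2 + top
```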
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails, the system must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node; that dual node is in turn replaced by its own dual node, and so on, until a spare node completes the chain of replacements.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
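The dual-node rules above can be sketched in code. This is a hedged sketch under a stated assumption: bn is taken to be the most significant bit of the log2(L)-bit level label l, which the text does not spell out; the function name is illustrative.&lt;br /&gt;

```python
def dual_node(g, l, levels, n):
    # Dual of active node (g, l), per the rules above.
    # Assumption: bn is the most significant bit of the
    # log2(levels)-bit level label l.
    half = levels // 2
    if g != n:
        # (g, l) pairs with (g+1, 2*(l mod L/2) + bn)
        return (g + 1, 2 * (l % half) + l // half)
    # stage n pairs with the spare node to its right
    return (n + 1, l)
```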
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can suffer one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
===Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to analyze the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  K-ary n-cube networks implemented in VLSI are communication limited rather than processing limited: performance depends on the wires used to connect the system.  This section will show, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better in low-dimensional form than in high-dimensional form, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which a system can run depends on the wire delay.  Systems are therefore required to organize their nodes, logically and physically, so as to keep the wires as short as possible.  Higher-dimensional networks cost more than low-dimensional networks because they require more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency versus dimension for varying numbers of nodes.  It demonstrates that, for constant wire delay, low-dimensional networks provide lower latency than high-dimensional networks.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and achieve lower latency.  The high-dimensional networks, however, do not benefit from locality and are forced to use longer message lengths.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62412</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62412"/>
		<updated>2012-04-17T03:19:36Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e. the maximum number of links that must be traversed along a shortest path to send a message between any two nodes.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, with Na and Nb processors respectively, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportionate increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with little change in the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because the hardware configuration of all the nodes must change whenever the number of nodes grows, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
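As a rough illustration of the two-level construction described above, the node counts can be computed as follows (a sketch under our reading of the text; the function name is ours):&lt;br /&gt;

```python
def eh_counts(k):
    # Two-level extended hypercube built from k-cube basic modules:
    # each module holds 2**k processor elements (PEs) plus one
    # network controller (NC), and the 2**k NCs form a k-cube.
    modules = 2 ** k
    pes = modules * (2 ** k)   # PEs across all modules
    ncs = modules              # one NC per module
    return pes, ncs
```

With k = 3 this gives 64 PEs managed by 8 NCs, while every PE and NC keeps a fixed number of ports.&lt;br /&gt;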
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v', such that v and v' have the same adjacent nodes. Such a pair of nodes v and v' is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v', with one copy active and the other passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of these tasks on v'. The tasks running on all other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative: we can have an active task running on node v with its backup on node v', while another active task runs on node v' with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we obtain a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., the two nodes have the same adjacent nodes; such nodes are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
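The matching-pair condition can be checked mechanically. The following Python sketch (our own helper, taking the graph as an adjacency dictionary and using the literal same-neighbourhood definition above) tests whether a graph is load balanced:&lt;br /&gt;

```python
def is_load_balanced(adj):
    # adj maps each node to the set of its neighbours.  The graph is
    # load balanced iff every node v has a matching node w (w != v)
    # with exactly the same neighbourhood.
    return all(
        any(w != v and adj[w] == adj[v] for w in adj)
        for v in adj
    )

# A 4-cycle 0-1-2-3-0: nodes 0 and 2 both neighbour {1, 3}, and
# nodes 1 and 3 both neighbour {0, 2}, so every node has a match.
c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

# A path 0-1-2 is not load balanced: node 1's neighbourhood {0, 2}
# is unique, so node 1 has no matching node.
p3 = {0: {1}, 1: {0, 2}, 2: {1}}
```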
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii) &amp;lt;(a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n*2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n = 1&lt;br /&gt;
  ii. ceil(n + n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. For the balanced varietal hypercube the average distance is (1/2^(2n)) * ∑ d(0, k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned; the topology with the minimum cost is treated as the best candidate. The cost factor of a network is the product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(n + n/2).&lt;br /&gt;
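These formulas can be collected into a small Python helper (a sketch; the dictionary keys are our own naming):&lt;br /&gt;

```python
import math

def bvh_metrics(n):
    # Parameters of an n-dimensional balanced varietal hypercube,
    # following the formulas stated above.
    nodes = 2 ** (2 * n)
    degree = 2 * n
    edges = n * 2 ** (2 * n)
    diameter = 2 * n if n == 1 else math.ceil(n + n / 2)
    cost = degree * diameter   # cost = degree * diameter
    return dict(nodes=nodes, degree=degree, edges=edges,
                diameter=diameter, cost=cost)
```

For n = 2 this gives 16 nodes of degree 4, 32 edges, diameter 3, and cost 12.&lt;br /&gt;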
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. If there is a choice d such that the d-neighbour of ur is also a neighbour of vr, then d is chosen so that ur = vr. In the next step d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
if such a d exists, route to the d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that the d1-neighbour of ur-1 equals vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm for BVHn, assuming concurrent communication through all ports of each processor, consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its (2n-1) neighbours. Then n nodes from the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
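A generic level-by-level broadcast, in which every informed node forwards to all of its neighbours in each round, can be sketched as follows (this counts rounds on an arbitrary adjacency structure and is not the BVH-specific schedule above):&lt;br /&gt;

```python
from collections import deque

def broadcast_rounds(adj, source):
    # Breadth-first broadcast: in each round every informed node
    # forwards the message through all its ports at once.  Returns
    # the number of rounds until every node is informed, i.e. the
    # eccentricity of the source in the graph.
    dist = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return max(dist.values())
```

On a 4-cycle, for example, a broadcast from any node reaches the farthest node after 2 rounds.&lt;br /&gt;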
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing. Since every node in the system is of a fixed size, the system can be easily scaled to larger systems. The omega network belongs to the class of multistage interconnection networks (MINs). Because the omega network is designed for large systems, it depends heavily on reliability; therefore we also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
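The size formula can be evaluated directly (a minimal sketch; the function name is ours):&lt;br /&gt;

```python
import math

def omega_switches(levels):
    # Total number of switching nodes in the original omega network:
    # N = L * (log2(L) + 1), where L is the number of levels and
    # log2(L) + 1 is the number of stages.
    stages = int(math.log2(levels)) + 1
    return levels * stages
```

The 32-node network of Figure 1 corresponds to L = 8 levels with log2(8) + 1 = 4 stages.&lt;br /&gt;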
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes in 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x=0-15'' has a four-bit representation&lt;br /&gt;
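The permutation ''σn(x)'' is a one-position left rotation of the (n+1)-bit label, which can be written as follows (a sketch with our own function name):&lt;br /&gt;

```python
def sigma(x, n):
    # Left-rotate the (n+1)-bit label x by one position:
    # {bn, bn-1, ..., b1, b0} becomes {bn-1, ..., b1, b0, bn}.
    top = x // (2 ** n)    # the high bit bn
    rest = x % (2 ** n)    # the low n bits
    return rest * 2 + top
```

For n = 3, label 5 (0101) maps to 10 (1010), and label 8 (1000) maps to 1 (0001).&lt;br /&gt;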
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails, the system must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails it does not require replacement and can simply be bypassed.  However, if an active node fails, it must be replaced by its dual node.  Each dual node is in turn replaced by its own dual node until a spare node finally stands in for the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to show why k-ary n-cube interconnection networks favour low-dimensional configurations in large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  VLSI systems, in which k-ary n-cubes are built, are communication limited rather than processing limited: their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that k-ary n-cube interconnection networks perform better at low dimension than at high dimension, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems are therefore required to organize their nodes logically and physically so as to keep the wires as short as possible.  Networks of higher dimension cost more than low-dimension networks because they have more, and longer, wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
Latency is a measure of network performance; it is the sum of the latency due to the network and the latency due to the processing node.  Figure 8 below shows the average network latency vs dimension for varying numbers of nodes, and demonstrates that low-dimensional networks provide lower latency than high-dimensional networks with a constant delay.&lt;br /&gt;
[[File:Latency_Fig8.png]]&lt;br /&gt;
The low-dimensional networks are able to capitalize on locality and so have lower latency, whereas the high-dimensional networks do not benefit from locality and are forced to carry longer message lengths.&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [[http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62407</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62407"/>
		<updated>2012-04-17T03:16:01Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The maximum value of Nb / I over all partitionings of the network is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
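For small networks the narrowness can be computed by brute force over all partitions (an illustrative sketch, exponential in the number of nodes; the function name is ours):&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    # Over all splits of the nodes into groups A and B, with the size
    # of B not exceeding the size of A, count the links I crossing the
    # cut and take the maximum of Nb / I.
    best = 0.0
    nodes = list(nodes)
    half = len(nodes) // 2
    for k in range(1, half + 1):
        for b in combinations(nodes, k):
            bset = set(b)
            crossing = sum(1 for (u, v) in edges
                           if (u in bset) != (v in bset))
            if crossing:
                best = max(best, len(bset) / crossing)
    return best
```

For a 4-node linear array the worst split is {0, 1} versus {2, 3}, with Nb = 2 processors and I = 1 link, giving narrowness 2.&lt;br /&gt;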
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter: the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The hypercube contains 2^n nodes; each is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because the hardware configuration of all the nodes must change whenever the number of nodes grows, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v', such that v and v' have the same adjacent nodes. Such a pair of nodes v and v' is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled on both v and v', with one copy active and the other passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of these tasks on v'. The tasks running on all other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative: we can have an active task running on node v with its backup on node v', while another active task runs on node v' with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we obtain a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., the two nodes have the same adjacent nodes; such nodes are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node is connected through hyperlinks to 2n nodes, which are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii) &amp;lt;(a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
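The inner-node rules above can be sketched in code. The following Python function is illustrative only (the function name and tuple representation are our assumptions, not from the source): it returns the two inner neighbours of a node given its quaternary address.&lt;br /&gt;

```python
def inner_neighbours(addr):
    """Inner-node neighbours of a balanced varietal hypercube node.

    addr is a tuple (a0, a1, ..., a_{n-1}) of quaternary digits.
    Illustrative sketch of Cases I and II above.
    """
    a0, rest = addr[0], addr[1:]
    if a0 % 2 == 0:  # Case I: a0 even
        return [((a0 + 1) % 4,) + rest, ((a0 - 2) % 4,) + rest]
    # Case II: a0 odd
    return [((a0 - 1) % 4,) + rest, ((a0 + 2) % 4,) + rest]
```

For example, node &amp;lt;0,1&amp;gt; has inner neighbours &amp;lt;1,1&amp;gt; and &amp;lt;2,1&amp;gt;.&lt;br /&gt;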
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the summation of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by 1/(2^2n) ∑_k d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n* ceiling(n + n/2).&lt;br /&gt;
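As a quick sanity check of the formulas above, the following sketch (the function name is ours) computes the nodes, degree, edges, diameter and cost of a balanced varietal hypercube for a given dimension n.&lt;br /&gt;

```python
from math import ceil

def bvh_metrics(n):
    """Metrics of an n-dimensional balanced varietal hypercube,
    per the formulas above (illustrative sketch)."""
    diameter = 2 * n if n == 1 else ceil(n + n / 2)
    return {
        "nodes": 2 ** (2 * n),        # 2^2n nodes
        "degree": 2 * n,              # every node has degree 2n
        "edges": n * 2 ** (2 * n),    # n * 2^2n edges
        "diameter": diameter,
        "cost": (2 * n) * diameter,   # cost = degree x diameter
    }
```

For n = 2 this gives 16 nodes, degree 4, 32 edges, diameter 3 and cost 12.&lt;br /&gt;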
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm performs a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of u is chosen such that the move makes ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position of u and v&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (which is v)&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of u satisfies ur = vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that the next step satisfies ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
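The digit-correction idea behind this procedure can be sketched generically. Here `neighbour_fixing` is a hypothetical helper (not from the source) standing in for the BVH adjacency rules: given a node, a digit position, and a target digit, it returns a neighbour whose digit at that position matches the target.&lt;br /&gt;

```python
def route(u, v, neighbour_fixing):
    """Greedy digit-correction routing sketch (illustrative only).

    Corrects one digit position at a time, hopping via the assumed
    helper until the current node equals the destination v.
    """
    path = [u]
    cur = u
    for r in range(len(u) - 1, -1, -1):  # correct each digit in turn
        while cur[r] != v[r]:
            cur = neighbour_fixing(cur, r, v[r])
            path.append(cur)
    return path
```

With a helper that corrects a digit per hop, routing from (0,1,2) to (3,1,0) visits (0,1,0) on the way.&lt;br /&gt;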
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 remaining neighbours. Then n of the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system is a fixed size, the system can be easily scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for larger systems, reliability is critical; we will therefore also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
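The size formula can be evaluated directly; the sketch below (the function name is ours) assumes L is a power of two.&lt;br /&gt;

```python
from math import log2

def omega_size(L):
    """N = L * (log2(L) + 1): switch count of the original
    omega network with L levels (L a power of two)."""
    stages = int(log2(L)) + 1  # number of stages
    return L * stages
```

For L = 8 this yields the 32 nodes of Figure 1.&lt;br /&gt;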
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above ''L=8, N=32, n=3'', and ''x'' ranges over ''0-15'', each with a four-bit representation&lt;br /&gt;
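The permutation σn(x) is a one-bit left rotation of the (n+1)-bit number x, which can be sketched as:&lt;br /&gt;

```python
def sigma(x, n):
    """Left-rotate the (n+1)-bit number x = {b_n, ..., b_1, b_0}
    by one bit, giving sigma_n(x) = {b_{n-1}, ..., b_1, b_0, b_n}."""
    mask = (1 << (n + 1)) - 1       # keep only n+1 bits
    return ((x << 1) | (x >> n)) & mask
```

With n = 3, sigma maps 1000 (8) to 0001 (1) and 0101 (5) to 1010 (10).&lt;br /&gt;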
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design has extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node fails it can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, then it must be replaced by its dual node.  That dual node is in turn replaced by its own dual node, and so on, until a spare node fills the final vacancy.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
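A sketch of the first dual-node rule, under the assumption (ours, since the source leaves b_n implicit) that b_n denotes the most significant bit of l's log2(L)-bit representation, which makes the level mapping a one-bit left rotation of l:&lt;br /&gt;

```python
def dual_node(g, l, L):
    """Dual of active node (g, l): (g+1, 2*(l % (L/2)) + msb(l)).

    Assumes L is a power of two and that b_n is the top bit of l's
    log2(L)-bit form (an interpretation, not stated in the source).
    """
    n = L.bit_length() - 1      # n = log2(L)
    msb = (l >> (n - 1)) & 1    # most significant bit of l
    return (g + 1, 2 * (l % (L // 2)) + msb)
```

For L = 8, node (0, 5) (level 101 in binary) would map to (1, 3) (level 011), one stage to the right.&lt;br /&gt;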
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to analyze the performance of k-ary n-cube interconnection networks for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube Interconnection Networks&amp;quot; IEEE Transactions on Computers, vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  K-ary n-cubes implemented in VLSI are communication limited rather than processing limited, meaning that performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that low-dimensional k-ary n-cube networks perform better than high-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  This means systems must organize their nodes logically and physically to keep the wires as short as possible.  Networks with higher dimensions cost more, due to the fact that they need more and longer wires, compared to low-dimension networks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
3. Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008  &amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 &amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
6. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62399</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62399"/>
		<updated>2012-04-17T03:12:23Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Referecences */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing ,broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it can cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, with Na and Nb processors respectively, and assume Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
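For very small networks, narrowness can be computed by brute force over all partitions. The following sketch (the function name is ours) enumerates every possible group B:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness: max over partitions (A, B) with
    Nb <= Na of Nb / I, where I counts links between A and B.
    Only feasible for tiny networks (illustrative)."""
    nodes = list(nodes)
    best = 0.0
    for k in range(1, len(nodes) // 2 + 1):   # sizes of the smaller group B
        for B in combinations(nodes, k):
            Bset = set(B)
            # count edges with exactly one endpoint in B
            I = sum(1 for u, v in edges if (u in Bset) != (v in Bset))
            if I > 0:
                best = max(best, len(Bset) / I)
    return best
```

For example, a 4-node ring has narrowness 1.0: splitting it into two adjacent pairs gives Nb = 2 and I = 2.&lt;br /&gt;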
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
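The adjacency rule lends itself to a direct sketch: with labels as integers, two nodes are adjacent iff their XOR is a power of two.&lt;br /&gt;

```python
def hypercube_neighbours(label, n):
    """The n neighbours of a node in an n-dimensional hypercube:
    flip each bit of the label in turn."""
    return [label ^ (1 << i) for i in range(n)]

def are_adjacent(a, b):
    """True iff labels a and b differ in exactly one bit position,
    i.e. a XOR b is a nonzero power of two."""
    diff = a ^ b
    return diff != 0 and diff & (diff - 1) == 0
```

In a 3-cube, node 000 has neighbours 001, 010 and 100.&lt;br /&gt;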
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to yield a proportionate increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below, where the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the number of nodes grows: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the rule of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node is connected through hyperlinks to 2n neighbouring nodes, which are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the summation of the distances of all nodes from a given node, divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by 1/(2^2n) ∑_k d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is given by 2n* ceiling(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm performs a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d of u is chosen such that the move makes ur = vr. In the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position of u and v&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (which is v)&lt;br /&gt;
&lt;br /&gt;
else&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of u satisfies ur = vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
&lt;br /&gt;
d1: choice such that the next step satisfies ur-1 = vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 remaining neighbours. Then n of the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system is a fixed size, the system can be easily scaled to larger systems.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for larger systems, reliability is critical; we will therefore also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above ''L=8, N=32, n=3'', and ''x'' ranges over ''0-15'', each with a four-bit representation&lt;br /&gt;
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  Therefore the proposed design has extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node fails it can be replaced.&lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, then it must be replaced by its dual node.  That dual node is in turn replaced by its own dual node, and so on, until a spare node fills the final vacancy.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability [R(t)] of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to demonstrate why high-dimensional k-ary n-cube interconnection networks were never really adopted for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  k-ary n-cube networks implemented in VLSI are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that lower-dimensional k-ary n-cube networks perform better than higher-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems must therefore organize their nodes logically and physically so as to keep the wires as short as possible.  Higher-dimensional networks cost more than low-dimensional networks due to their more numerous and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 3, NO. 1, JANUARY 1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008 [http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011 [http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62395</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62395"/>
		<updated>2012-04-17T03:08:39Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Referecences */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the number of processors in each group is Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
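As an illustrative sketch (the function name and the 4-node ring example are ours, not from the cited papers), the narrowness of a small network can be computed by brute force over all partitions:

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Max over partitions (A, B) with |B| <= |A| of |B| / I,
    where I is the number of links crossing the partition."""
    best = 0.0
    n = len(nodes)
    # Enumerate every candidate smaller group B.
    for size in range(1, n // 2 + 1):
        for b in combinations(nodes, size):
            b_set = set(b)
            crossing = sum(1 for (u, v) in edges
                           if (u in b_set) != (v in b_set))
            if crossing:
                best = max(best, len(b_set) / crossing)
    return best

# 4-node ring 0-1-2-3-0: splitting it into two halves cuts 2 links,
# so the narrowness is 2 / 2 = 1.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness([0, 1, 2, 3], ring))  # 1.0
```

A narrowness above 1 signals that some group of processors has fewer outgoing links than members, so cross-partition traffic will congest.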
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
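The one-bit-difference property can be checked directly. A small sketch (names are ours) verifies that both the degree and the diameter of a 4-dimensional hypercube equal n = log2(N):

```python
def adjacent(u, v):
    """Two hypercube nodes are adjacent iff their labels differ in one bit."""
    return bin(u ^ v).count("1") == 1

n = 4                      # dimension
N = 2 ** n                 # number of nodes
# Every node has exactly n neighbours (degree n)...
assert all(sum(adjacent(u, v) for v in range(N)) == n for u in range(N))
# ...and routing flips one differing bit per hop, so the distance between
# u and v is popcount(u ^ v) and the diameter is n.
diameter = max(bin(u ^ v).count("1") for u in range(N) for v in range(N))
print(diameter)  # 4
```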
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because the hardware configuration of all the nodes must be changed whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects through hyperlinks to 2n nodes, which are of two types: (a) inner nodes and (b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n * 2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The sum of the distances of all nodes from a given node, divided by the total number of nodes, gives the average distance of the network. In the Balanced varietal hypercube the average distance is given by (1/2^(2n)) ∑ d((0,0,...,0), k), summed over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned; the topology&lt;br /&gt;
that possesses the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore given by 2n * ceil(n + n/2).&lt;br /&gt;
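Pulling the formulas above together, a small helper (the function name is ours) tabulates the parameters of the n-dimensional balanced varietal hypercube:

```python
from math import ceil

def bvh_properties(n):
    """(nodes, degree, edges, diameter, cost) of an n-dimensional
    balanced varietal hypercube, per the formulas above."""
    nodes = 2 ** (2 * n)                      # 2^(2n) nodes
    degree = 2 * n                            # each node has 2n links
    edges = n * 2 ** (2 * n)                  # n * 2^(2n) edges
    diameter = 2 * n if n == 1 else ceil(n + n / 2)
    cost = degree * diameter                  # cost = degree * diameter
    return nodes, degree, edges, diameter, cost

print(bvh_properties(3))  # (64, 6, 192, 5, 30)
```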
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself as the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A choice d is made such that the d-neighbour of ur is also a neighbour of vr, so that ur=vr; in the next step d1 is chosen such that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm:Procedure Route(u,v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur is vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: choice such that the d1-neighbour of ur-1 is vr-1&lt;br /&gt;
&lt;br /&gt;
route to d1 neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its (2n-1) neighbours. Then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: continue step 2 till all the nodes get the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled up.  The omega network belongs to the family of multistage interconnection networks (MINs).  Because the omega network is designed for large systems, its reliability is critical; we therefore also discuss an enhanced version of the omega network&amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan; &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
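The size formula can be checked against Figure 1's parameters (the function name is ours):

```python
from math import log2

def omega_switches(L):
    """N = L x (log2(L) + 1): L levels times the number of stages."""
    stages = int(log2(L)) + 1
    return L * stages

# Figure 1: L = 8 levels -> 4 stages -> 32 switches.
print(omega_switches(8))  # 32
```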
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1'' bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x=0-15'' has a four-bit representation&lt;br /&gt;
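The permutation σn(x) above rotates the n+1 bits of the link number one position to the left. A minimal sketch under Figure 1's parameters (the function name is ours):

```python
def sigma(x, n):
    """Perfect-shuffle permutation sigma_n on an (n+1)-bit link number:
    {b_n, b_n-1, ..., b_1, b_0} -> {b_n-1, ..., b_1, b_0, b_n},
    i.e. a one-position left rotation."""
    width = n + 1
    msb = (x >> n) & 1                        # the old top bit b_n
    return ((x << 1) & ((1 << width) - 1)) | msb

# Figure 1: n = 3, so x is a four-bit link number 0-15.
print(sigma(8, 3))  # 1  (binary 1000 -> 0001)
print([sigma(x, 3) for x in range(4)])  # [0, 2, 4, 6]
```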
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must maintain its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node faults, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails, the system must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can simply be bypassed.  However, if an active node fails, it must be replaced by its dual node.  That dual node is in turn replaced by its own dual, and so on, until a spare node finally replaces the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'', the spare node to its right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega remains more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
The purpose of this section is to demonstrate why high-dimensional k-ary n-cube interconnection networks were never really adopted for large-scale multiprocessor systems.&amp;lt;ref name=&amp;quot;k-ary&amp;quot;&amp;gt;Dally, William J.; &amp;quot;Performance Analysis of k-ary n-cube&lt;br /&gt;
Interconnection Networks&amp;quot; IEEE Transactions on Computers. vol. 39, no. 6, pp. 775-785, June 1990 [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=53599&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D53599]&amp;lt;/ref&amp;gt;  k-ary n-cube networks implemented in VLSI are communication limited rather than processing limited, meaning that their performance depends on the wires used to connect the system.  This section shows, through latency, throughput, and hot-spot throughput, that lower-dimensional k-ary n-cube networks perform better than higher-dimensional ones, assuming constant bisection width.  &lt;br /&gt;
&lt;br /&gt;
===VLSI Complexity===&lt;br /&gt;
VLSI systems are wire-limited: the speed at which they can run depends on the wire delay.  Systems must therefore organize their nodes logically and physically so as to keep the wires as short as possible.  Higher-dimensional networks cost more than low-dimensional networks due to their more numerous and longer wires.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Latency===&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;br /&gt;
&amp;lt;ref name=&amp;quot;Otis-Hypercube&amp;quot;&amp;gt;Basel A. Mahafzah; Bashira A. Jaradat; &amp;quot;The load balancing problem in OTIS-Hypercube interconnection networks&amp;quot; The Journal of Supercomputing Volume 46 Issue 3, December 2008 [http://www.springerlink.com.prox.lib.ncsu.edu/content/vux017112073p8w7/fulltext.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ext-hypercube&amp;quot;&amp;gt;J.Mohan Kumar; L.M.Patnaik; &amp;quot;Extended Hypercube: A Hierarchical Interconnection Network  of  Hypercubes&amp;quot; IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL.  3, NO.  1, JANUARY  1992 [http://eprints.iisc.ernet.in/6842/1/12.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;hyp-ext&amp;quot;&amp;gt;Liu Youyao; Han Jungang; Du Huimin; &amp;quot;A Hypercube-based Scalable Interconnection Network for Massively Parallel Computing&amp;quot; JOURNAL OF COMPUTERS, VOL. 3, NO. 10, OCTOBER 2008[http://www.academypublisher.com/jcp/vol03/no10/jcp0310058065.pdf]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ref name=&amp;quot;ext-var&amp;quot;&amp;gt;C. R. Tripathy; N. Adhikari; &amp;quot;ON A NEW MULTICOMPUTER INTERCONNECTION TOPOLOGY FOR MASSIVELY PARALLEL SYSTEMS&amp;quot; International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.4, July 2011[http://arxiv.org/ftp/arxiv/papers/1108/1108.1462.pdf]&amp;lt;/ref&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62354</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62354"/>
		<updated>2012-04-17T00:55:27Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Reliable Omega interconnected networks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the number of processors in each group is Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and an increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, in which the end nodes of each missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects, through hyperlinks, to 2n nodes, which are of two types: (a) inner nodes and (b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The sum of the distances of all nodes from a given node, divided by the total number of nodes, gives the average distance of the network. In the balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,...,0), k), the sum being taken over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
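The counts above can be collected into a small calculator. This is a sketch of the stated formulas only; the function name and dictionary layout are illustrative assumptions:

```python
import math

def bvh_properties(n):
    """Node count, degree, edge count, diameter and cost of an
    n-dimensional balanced varietal hypercube, per the formulas above."""
    nodes = 2 ** (2 * n)                      # 2^(2n) nodes
    degree = 2 * n                            # every node has 2n links
    edges = n * 2 ** (2 * n)                  # n * 2^(2n) edges
    diameter = 2 if n == 1 else math.ceil(n + n / 2)
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": degree * diameter}

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```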
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A d is chosen such that the d-neighbour of ur is also a neighbour of vr, making ur = vr. Then in the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: chosen such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: chosen such that the d1-neighbour of ur-1 equals vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can easily be scaled to larger sizes.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is essential; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan, &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems,&amp;quot; The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The original omega distributed-memory multiprocessor is of size...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1''-bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges from 0 to 15, so each link label has a four-bit representation&lt;br /&gt;
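The permutation σn(x) = {bn-1,...,b1,b0,bn} defined above is a one-position left rotation of the (n+1)-bit link label. A minimal sketch of that rotation (the function name is an illustrative assumption):

```python
def sigma(x, n):
    """Perfect-shuffle connection: rotate the (n+1)-bit label x left
    by one, moving the most significant bit b_n to the LSB position."""
    bits = n + 1
    msb = (x >> n) & 1                       # extract b_n
    return ((x << 1) | msb) & ((1 << bits) - 1)

# Figure 1 parameters: L = 8 levels, so n = 3 and labels are 4-bit numbers
print(bin(sigma(0b1010, 3)))  # 0b101
```

Output link ''x'' on stage ''g'' therefore feeds input link `sigma(x, n)` on stage ''g+1''.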
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To create a more reliable design, the system must be able to maintain its structure after a fault.  Therefore the proposed design adds extra links between nodes and an extra stage at the end to improve reliability.  This way, if a node fails, it can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as such...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails in the system it must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails then it does not require replacement and can easily be bypassed.  However, if an active node fails, then it must be replaced by its dual node.  The dual node is in turn replaced by its own dual node, until a spare node replaces the original active node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'' if that is a spare node to the right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can suffer one failure, after which all other nodes perform the Replacement procedure.  However, if another node fails, then all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figures 7 and 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62353</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62353"/>
		<updated>2012-04-17T00:54:38Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Reliable Omega interconnected networks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e. connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e. the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
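For a concrete network the diameter can be found by breadth-first search from every node and taking the largest shortest-path distance. A minimal Python sketch (names and adjacency-dict representation are illustrative assumptions):

```python
from collections import deque

def diameter(adj):
    """Diameter of a connected network: BFS from every node and return
    the largest shortest-path distance found. adj maps node -> neighbours."""
    best = 0
    for start in adj:
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# 3-cube: neighbours differ in exactly one bit; diameter = log2(8) = 3
cube = {u: [u ^ (1 << i) for i in range(3)] for u in range(8)}
print(diameter(cube))  # 3
```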
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B and call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e. flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. Each of the 2^n nodes is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n, and two nodes are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and an increase in system size is expected to yield a corresponding increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. It therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, in which the end nodes of each missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node connects, through hyperlinks, to 2n nodes, which are of two types: (a) inner nodes and (b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n * 2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. The sum of the distances of all nodes from a given node, divided by the total number of nodes, gives the average distance of the network. In the balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,...,0), k), the sum being taken over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex which is a common neighbour of ur and vr. A d is chosen such that the d-neighbour of ur is also a neighbour of vr, making ur = vr. Then in the next step d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: chosen such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
route to the d-neighbour, else&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
&lt;br /&gt;
d1: chosen such that the d1-neighbour of ur-1 equals vr-1&lt;br /&gt;
&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
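&lt;br /&gt;
For comparison, all-port flooding on an ordinary hypercube informs every node in n rounds (one per dimension). A small simulation of that simpler case (not the BVH procedure itself):&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import count

def broadcast_rounds(n, source=0):
    """All-port flooding broadcast on an n-cube: every informed node forwards
    to all n neighbours each round; returns rounds until all 2^n nodes know."""
    informed = {source}
    for rounds in count(0):
        if len(informed) == 2 ** n:
            return rounds
        # each informed node sends through all ports concurrently
        informed |= {u ^ (1 << b) for u in informed for b in range(n)}

print(broadcast_rounds(4))  # 4: the 4-cube is fully informed after 4 rounds
```
The round count equals the diameter, which is the lower bound for any one-to-all broadcast.&lt;br /&gt;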
&lt;br /&gt;
=Reliable Omega interconnected networks=&lt;br /&gt;
===Introduction===&lt;br /&gt;
The omega network topology supports both one-to-one routing and broadcast routing.  Since every node in the system has a fixed size, the system can be easily scaled to larger sizes.  The omega network belongs to the class of multistage interconnection networks (MINs).  Since the omega network is designed for large systems, reliability is critical; therefore we will also discuss an enhanced version of the omega network &amp;lt;ref name=&amp;quot;omega&amp;quot;&amp;gt;Bataineh, Sameer; Qanzu'a, Ghassan, &amp;quot;Reliable Omega Interconnected Network for Large-Scale Multiprocessor Systems&amp;quot;, The Computer Journal, vol. 46, no. 5, pp. 467-475, 2003 [http://comjnl.oxfordjournals.org/content/46/5/467.abstract]&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The size of the original omega distributed-memory multiprocessor is given by...&lt;br /&gt;
*N=L x (log2(L) + 1)&lt;br /&gt;
** N = number of switches&lt;br /&gt;
** L = number of levels&lt;br /&gt;
** (log2(L) + 1) = number of stages&lt;br /&gt;
Each link between nodes is used bidirectionally to send data.&lt;br /&gt;
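&lt;br /&gt;
The size formula can be checked directly; for L = 8 levels it gives the 32 switches of the network discussed below:&lt;br /&gt;
&lt;br /&gt;
```python
from math import log2

def omega_switch_count(levels):
    """Total switches in an omega network: L levels x (log2(L) + 1) stages."""
    stages = int(log2(levels)) + 1
    return levels * stages

print(omega_switch_count(8))  # 8 levels x 4 stages = 32 switches
```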
&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
Figure 1 above has 32 nodes with 8 levels.  Each node has four links connecting it to the previous and next stages.  The connections are as follows.&lt;br /&gt;
*Output link ''x'' on stage ''g'' connects to input link ''σn(x)'' on stage ''g+1''&lt;br /&gt;
**0 ≤ g &amp;lt; log2(L)&lt;br /&gt;
**''x'' is an ''n+1''-bit number of the form ''x = {bn,bn-1,...,b1,b0}''&lt;br /&gt;
**''σn(x) = {bn−1,...,b2,b1,b0,bn}''&lt;br /&gt;
**In Figure 1 above, ''L=8, N=32, n=3'', and ''x'' ranges from 0 to 15, so ''x'' has a four-bit representation&lt;br /&gt;
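&lt;br /&gt;
The inter-stage wiring ''σn(x)'' defined above is simply a one-bit left rotation of the (n+1)-bit link number. A short sketch:&lt;br /&gt;
&lt;br /&gt;
```python
def sigma(x, n):
    """Perfect-shuffle permutation: rotate the (n+1)-bit number x left by one,
    so {bn, bn-1, ..., b1, b0} becomes {bn-1, ..., b1, b0, bn}."""
    width = n + 1
    mask = (1 << width) - 1
    return ((x << 1) | (x >> n)) & mask

# With n = 3 (four-bit link numbers): output link 8 = 1000 on one stage
# feeds input link 1 = 0001 on the next stage.
print(sigma(0b1000, 3))  # 1
```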
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
To be more reliable, the design must ensure that the system maintains its structure after a fault.  The proposed design therefore adds extra links between nodes and an extra stage at the end, so that a faulty node can be replaced.  &lt;br /&gt;
&lt;br /&gt;
The extra links are added as follows...&lt;br /&gt;
# [[File:Proposed_Link_1.png]] where ''0 ≤ g&amp;lt;n−1'' and '' n = log2(L)''&lt;br /&gt;
#[[File:Proposed Link 2.png]]&lt;br /&gt;
#[[File:Proposed_Link_3.png]] if ''l'' is even and ''0 &amp;lt; g ≤ n''&lt;br /&gt;
#Two links for [[File:Proposed_Link_4.png]]&lt;br /&gt;
&lt;br /&gt;
The added links and nodes are represented in Figure 2 below.&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|550px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
When a node fails, the system must be reconfigured to replace it and maintain the omega topology.&lt;br /&gt;
&lt;br /&gt;
====Replacement====&lt;br /&gt;
If a spare node fails, it does not require replacement and can easily be bypassed.  However, if an active node fails, it must be replaced by its dual node; that dual node is in turn replaced by its own dual, and so on, until the chain of replacements ends at a spare node.&lt;br /&gt;
&lt;br /&gt;
The dual of a node is determined by...&lt;br /&gt;
*''(g,l)'' has a dual node at ''(g+1, 2 ( l%L/2) + bn)'', where ''g ≤ n − 1''&lt;br /&gt;
*... or ''(n,l)'' has a dual node at ''(n + 1, l)'', the spare node to its right in the same level&lt;br /&gt;
&lt;br /&gt;
====Reconfiguration====&lt;br /&gt;
A node path can tolerate one failure, after which the other nodes perform the Replacement procedure.  However, if a second node fails, all nodes after it participate in the Reconfiguration procedure. &lt;br /&gt;
&lt;br /&gt;
The reconfiguration procedure is as follows...&lt;br /&gt;
*The failed node is bypassed&lt;br /&gt;
*The node ''(k,l)'' in the node path carries out the following steps.&lt;br /&gt;
[[File:Reconfig_1.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_2.png|300px|border]]&lt;br /&gt;
[[File:Reconfig_3.png|300px|border]]&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
[[File:Reliability_Fig7.png]][[File:Reliability_Fig8.png]]&lt;br /&gt;
&lt;br /&gt;
Figure 7 and Figure 8 above graph the reliability R(t) of the omega, fault-tolerant butterfly, and fault-tolerant omega topologies over time.  They show that the fault-tolerant omega is more reliable over time than the original omega design, even with its added complexity.&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62321</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62321"/>
		<updated>2012-04-16T23:10:11Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Broadcast in Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the number of processors in each group is Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
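&lt;br /&gt;
The definition can be checked exhaustively on a tiny graph. A brute-force sketch (exponential in network size, so for illustration only):&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness: max over partitions (A, B) with |B| <= |A|
    of |B| / I, where I counts the edges crossing the cut."""
    nodes = list(nodes)
    best = 0.0
    for size in range(1, len(nodes) // 2 + 1):
        for b in combinations(nodes, size):
            bset = set(b)
            crossing = sum(1 for u, v in edges if (u in bset) != (v in bset))
            if crossing:
                best = max(best, len(bset) / crossing)
    return best

# 4-node ring 0-1-2-3-0: the worst cut is a pair of adjacent nodes (Nb=2, I=2).
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring))  # 1.0
```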
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The hypercube contains 2^n nodes; each is uniquely represented by a binary sequence b(n-1)b(n-2)...b(0) of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
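&lt;br /&gt;
The one-bit-difference adjacency rule can be expressed directly on node labels. A minimal check:&lt;br /&gt;
&lt;br /&gt;
```python
def is_adjacent(u, v):
    """Two hypercube nodes share a link iff their binary labels
    differ in exactly one bit position."""
    return bin(u ^ v).count("1") == 1

# Node 5 = 101 and node 7 = 111 differ only in the middle bit, so they are
# adjacent; node 5 and node 6 = 110 differ in two bits and are not.
print(is_adjacent(5, 7), is_adjacent(5, 6))  # True False
```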
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of each missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10]. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects through hyperlinks to 2n nodes, which are of two types: (a) inner nodes and (b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n* 22n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n + n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the balanced varietal hypercube the average distance is given by (1/2^(2n)) * Σk d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
with the minimum cost is treated as the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and its diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(n + n/2).&lt;br /&gt;
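&lt;br /&gt;
The formulas above can be tabulated for a given dimension; a small sketch (with the diameter case split as stated):&lt;br /&gt;
&lt;br /&gt;
```python
from math import ceil

def bvh_properties(n):
    """Parameters of an n-dimensional balanced varietal hypercube, from the
    formulas above: 2^(2n) nodes, degree 2n, n*2^(2n) edges, diameter 2n for
    n = 1 and ceil(n + n/2) for n > 1, and cost = degree x diameter."""
    nodes = 2 ** (2 * n)
    degree = 2 * n
    edges = n * 2 ** (2 * n)
    diameter = 2 * n if n == 1 else ceil(n + n / 2)
    return {"nodes": nodes, "degree": degree, "edges": edges,
            "diameter": diameter, "cost": degree * diameter}

print(bvh_properties(2))
# {'nodes': 16, 'degree': 4, 'edges': 32, 'diameter': 3, 'cost': 12}
```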
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path treats itself as the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm performs a left-to-right scan of the source and destination addresses. Let r be the rightmost (quaternary) digit position at which they differ. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A digit d is chosen such that the d-neighbour of ur is also a neighbour of vr, so that after routing ur = vr. In the next step, d1 is chosen such that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
&lt;br /&gt;
{&lt;br /&gt;
&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
&lt;br /&gt;
d: choice such that the d-neighbour of ur equals vr&lt;br /&gt;
&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
&lt;br /&gt;
route to the r-neighbour&lt;br /&gt;
&lt;br /&gt;
else route to the d-neighbour;&lt;br /&gt;
&lt;br /&gt;
then choose d1 such that ur-1 = vr-1 and route to the d1-neighbour&lt;br /&gt;
&lt;br /&gt;
repeat until u = v&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
&lt;br /&gt;
Step 1: send the message to the 2n neighbours of u&lt;br /&gt;
&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its (2n-1) neighbours; then n of the remaining nodes&lt;br /&gt;
&lt;br /&gt;
send the message to their (2n-2) neighbours.&lt;br /&gt;
&lt;br /&gt;
Step 3: repeat Step 2 until all the nodes have received the message.&lt;br /&gt;
&lt;br /&gt;
Step 4: end&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
===Introduction===&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|500px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62319</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62319"/>
		<updated>2012-04-16T23:09:51Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Routing in Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components; i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the number of processors in each group is Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). The hypercube contains 2^n nodes; each is uniquely represented by a binary sequence b(n-1)b(n-2)...b(0) of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of each missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable because we have to change the hardware configuration of all the nodes whenever the number of nodes grows exponentially, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embedding. Balanced hypercubes belong to a special type of load balanced graphs [10]. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks in v’. All the other tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of the processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^2n nodes. Every node is connected through links to 2n neighbouring nodes, which are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n*2^2n edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(3n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node divided by the total number of nodes. In the Balanced varietal hypercube the average distance is given by (1/2^2n) ∑ d((0,0,....,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned; the topology with minimum cost is treated as the best candidate. The cost of a network is the product of its degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n * ceil(3n/2).&lt;br /&gt;
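The closed-form properties above can be tabulated directly. The helper below is an illustrative sketch that simply evaluates the formulas stated in this section (degree, node count, edge count, diameter, and cost) for small dimensions:

```python
import math

# Properties of an n-dimensional balanced varietal hypercube, per this
# section: degree 2n, 2^(2n) nodes, n*2^(2n) edges, diameter 2 for n = 1
# and ceil(3n/2) for n > 1, and cost = degree * diameter.
def bvh_properties(n):
    degree = 2 * n
    nodes = 2 ** (2 * n)
    edges = n * 2 ** (2 * n)
    diameter = 2 if n == 1 else math.ceil(3 * n / 2)
    return {"degree": degree, "nodes": nodes, "edges": edges,
            "diameter": diameter, "cost": degree * diameter}

for n in (1, 2, 3):
    print(n, bvh_properties(n))
```

Note that the edge count is consistent with the degree: 2n links per node over 2^2n nodes, with each link counted twice, gives n*2^2n edges.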
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself as the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie on the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. A neighbour d is chosen so that ur becomes equal to vr; in the next step d1 is chosen so that ur-1 = vr-1. This process continues until u = v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u, v)&lt;br /&gt;
{&lt;br /&gt;
r: rightmost differing (quaternary) digit position&lt;br /&gt;
if there is a choice d such that the d-neighbour of u agrees with v in digit r then&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
else&lt;br /&gt;
route to the r-neighbour (u and v are adjacent)&lt;br /&gt;
if (u and v are not adjacent) then&lt;br /&gt;
choose d1 such that the next hop agrees with v in digit r-1&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u, n):&lt;br /&gt;
Step 1: u sends the message to its 2n neighbours.&lt;br /&gt;
Step 2: one of the 2n nodes sends the message to its 2n-1 remaining neighbours; then n of the remaining nodes send the message to their 2n-2 remaining neighbours.&lt;br /&gt;
Step 3: continue as in step 2 until all the nodes have received the message.&lt;br /&gt;
Step 4: end&lt;br /&gt;
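Under the all-port model assumed above, each informed node can forward to all of its neighbours in one round, so the round in which a node is first informed equals its breadth-first distance from the source, and the total number of rounds equals the source's eccentricity. The sketch below is illustrative only; the 4-node cycle is a stand-in graph (consistent with the n = 1 case of 4 nodes, degree 2, diameter 2), not a construction from the article:

```python
from collections import deque

# All-port one-to-all broadcast: in every round, each informed node sends to
# all of its neighbours, so broadcast time equals the BFS depth from the source.
def broadcast_rounds(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())  # rounds needed until every node is informed

# A 4-node cycle: two rounds suffice from any source.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(broadcast_rounds(cycle4, 0))  # 2
```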
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
===Introduction===&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
[[File:Figure_2_Proposed_Omega.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62315</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62315"/>
		<updated>2012-04-16T23:08:55Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput, and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability, and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message between any two nodes.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
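For small networks the definition above can be evaluated by brute force over all bipartitions. The sketch below is illustrative only (the example graphs are made up); it enumerates every choice of the smaller group B and tracks the maximum Nb / I:

```python
from itertools import combinations

# Narrowness: the maximum, over all partitions (A, B) with B no larger than A,
# of |B| / I, where I is the number of links crossing between A and B.
def narrowness(nodes, edges):
    nodes = list(nodes)
    best = 0.0
    for size_b in range(1, len(nodes) // 2 + 1):
        for b in combinations(nodes, size_b):
            b_set = set(b)
            crossing = sum(1 for u, v in edges if (u in b_set) != (v in b_set))
            if crossing:  # skip degenerate partitions with no crossing links
                best = max(best, size_b / crossing)
    return best

# A 4-node cycle: every 2-node group is crossed by at least 2 links, so the
# narrowness is 1. A 4-node linear array is narrower at its middle link: 2.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
line_edges = [(0, 1), (1, 2), (2, 3)]
print(narrowness(range(4), cycle_edges))  # 1.0
print(narrowness(range(4), line_edges))   # 2.0
```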
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g., an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ...., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes to reach its final destination. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. There are 2^n nodes contained in the hypercube; each is uniquely represented by a binary sequence bn-1bn-2....b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
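The one-bit-difference rule makes neighbour generation and distance computation one-liners. The following sketch (illustrative, not part of the original page) builds a node's neighbours by flipping each address bit in turn and verifies that the diameter of a small hypercube equals the degree n:

```python
# In an n-dimensional hypercube, a node's neighbours are obtained by flipping
# each of its n address bits; the distance between two nodes is the number of
# bit positions where their labels differ (the Hamming distance).
def neighbours(node, n):
    return [node ^ (2 ** d) for d in range(n)]

def distance(u, v):
    return bin(u ^ v).count("1")

n = 4                 # 4-dimensional hypercube, N = 2^4 = 16 nodes
N = 2 ** n
print(neighbours(0, n))  # [1, 2, 4, 8]
diameter = max(distance(u, v) for u in range(N) for v in range(N))
print(diameter)          # 4, identical to the degree n
```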
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to result in an increase in performance to the extent of the increase in size. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in the configuration. As the dimension of the hypercube is increased by one, one more link needs to be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks. Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of a missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of all the nodes must change whenever the network grows, as the nodes have to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is found to be most suited for implementing a class of highly parallel algorithms, and it can emulate the binary hypercube in implementing a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that supports consistently recoverable embeddings. Balanced hypercubes belong to a special class of load balanced graphs [10]. In a load balanced graph G = (V, E), with V the node set and E the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
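The matching-pair definition above can be checked mechanically on a small graph. Below is a minimal sketch; the function names and example graphs are illustrative, not from the article. A graph is load balanced when every node has a partner with exactly the same neighbour set.&lt;br /&gt;

```python
# Check the load-balanced property: every node v must have a partner u
# (u != v) whose set of adjacent nodes equals that of v.
# Graph encoding and names are illustrative assumptions.

def matching_pair(adj, v):
    """Return a node u != v with the same neighbour set as v, or None."""
    for u, nbrs in adj.items():
        if u != v and nbrs == adj[v]:
            return u
    return None

def is_load_balanced(adj):
    return all(matching_pair(adj, v) is not None for v in adj)

# A 4-cycle 0-1-2-3-0: opposite nodes share both neighbours, so it is
# load balanced, with matching pairs (0, 2) and (1, 3).
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

On cycle4, matching_pair(cycle4, 0) returns 2 and is_load_balanced(cycle4) is True; a 3-node path fails the test because the middle node has no partner.&lt;br /&gt;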
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects to 2n other nodes through hyperlinks; these neighbours are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
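The inner-node rule above (Cases I and II) can be transcribed directly. The sketch below is a literal reading of those two cases only, with nodes written as tuples of quaternary digits (a0, a1, ..., an-1); the helper name is illustrative.&lt;br /&gt;

```python
# Inner-node neighbours of a balanced varietal hypercube node, as quoted
# above: only the first digit a0 changes, modulo 4.  A direct
# transcription of Cases I and II, not a verified construction.

def inner_neighbours(node):
    a0, rest = node[0], node[1:]
    if a0 % 2 == 0:                       # Case I: a0 even
        return [((a0 + 1) % 4,) + rest,
                ((a0 - 2) % 4,) + rest]
    return [((a0 - 1) % 4,) + rest,       # Case II: a0 odd
            ((a0 + 2) % 4,) + rest]
```

For example, node (0, 1) has inner neighbours (1, 1) and (2, 1).&lt;br /&gt;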
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n·2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is obtained by summing the distances of all nodes from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance from node (0,0,...,0) is (1/2^(2n)) ∑ d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n·ceil(n + n/2).&lt;br /&gt;
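The properties above can be collected into one helper for quick reference. This is a direct transcription of the stated formulas (nodes 2^(2n), degree 2n, edges n·2^(2n), diameter 2 for n = 1 and ceil(3n/2) otherwise, cost = degree times diameter); the function name is illustrative.&lt;br /&gt;

```python
import math

# Properties of an n-dimensional balanced varietal hypercube, transcribed
# from the formulas stated above.
def bvh_properties(n):
    degree = 2 * n
    nodes = 2 ** (2 * n)
    edges = n * 2 ** (2 * n)
    diameter = 2 if n == 1 else math.ceil(3 * n / 2)  # ceil(n + n/2)
    cost = degree * diameter                          # degree x diameter
    return {"degree": degree, "nodes": nodes, "edges": edges,
            "diameter": diameter, "cost": cost}
```

For n = 2 this gives 16 nodes, 32 edges, degree 4, diameter 3, and cost 12.&lt;br /&gt;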
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
In the routing process, each processor along the path considers itself the source and forwards the message to a neighbouring node one step closer to the destination. The algorithm consists of a left-to-right scan of the source and destination addresses. Let r be the rightmost differing digit (quaternary) position. The digits to the right of ur need not be considered, as they lie in the same BVHr. Since the diameter of BVH1 is 2, there is at least one vertex that is a common neighbour of ur and vr. If there is a choice d such that the d-neighbour of ur is also a neighbour of vr, then d is chosen so that ur=vr. In the next step d1 is chosen so that ur-1=vr-1. This process continues until u=v.&lt;br /&gt;
&lt;br /&gt;
Algorithm: Procedure Route(u,v)&lt;br /&gt;
{&lt;br /&gt;
r: rightmost differing digit position&lt;br /&gt;
if (u and v are adjacent) then&lt;br /&gt;
route to the r-neighbour&lt;br /&gt;
else&lt;br /&gt;
choose d such that the d-neighbour satisfies ur=vr&lt;br /&gt;
route to the d-neighbour&lt;br /&gt;
choose d1 such that ur-1=vr-1&lt;br /&gt;
route to the d1-neighbour&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
==Broadcast in Balanced Varietal Hypercube==&lt;br /&gt;
Broadcasting refers to a method of transferring a message to all recipients simultaneously. The broadcast primitive finds wide application in the control of distributed systems and in parallel computing. An optimal one-to-all broadcast algorithm is presented for BVHn, assuming that concurrent communication through all ports of each processor is possible. It consists of (n+1) steps, as shown below.&lt;br /&gt;
&lt;br /&gt;
Procedure Broadcast(u,n):&lt;br /&gt;
Step 1: u sends the message to its 2n neighbours.&lt;br /&gt;
Step 2: one of those 2n nodes sends the message to its (2n-1) remaining neighbours; then n of the remaining nodes send the message to their (2n-2) neighbours.&lt;br /&gt;
Step 3: repeat Step 2 until all nodes have the message.&lt;br /&gt;
Step 4: end&lt;br /&gt;
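Under the stated all-port assumption, every informed node can forward the message on all of its links in one step, so a one-to-all broadcast finishes in as many steps as the distance from the source to the farthest node. The sketch below is topology-independent; the neighbour function is a stand-in for whichever network is being studied, and the names are illustrative.&lt;br /&gt;

```python
# All-port broadcast simulation: in each step every informed node forwards
# the message to all of its neighbours at once.  Returns the number of
# steps until all num_nodes nodes are informed.

def broadcast_steps(source, neighbours, num_nodes):
    informed = {source}
    steps = 0
    while len(informed) != num_nodes:
        frontier = set()
        for v in informed:
            frontier.update(neighbours(v))
        informed |= frontier
        steps += 1
    return steps
```

On a 6-node ring, for instance, broadcasting from node 0 takes 3 steps, matching the ring's eccentricity.&lt;br /&gt;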
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
===Introduction===&lt;br /&gt;
[[File:Figure_1_Omega_Design.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
===Proposed Reliable Omega Design===&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62308</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62308"/>
		<updated>2012-04-16T22:53:47Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Varietal Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
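For small networks the narrowness can be computed by brute force straight from this definition: enumerate every partition in which group B is no larger than group A, count the crossing links, and take the largest ratio. The sketch below does exactly that; function and variable names are illustrative.&lt;br /&gt;

```python
from itertools import combinations

# Brute-force narrowness: max over partitions (A, B), with B no larger
# than A, of Nb / I, where I is the number of links crossing the cut.
def narrowness(nodes, edges):
    nodes = list(nodes)
    best = 0.0
    for k in range(1, len(nodes) // 2 + 1):   # k = Nb, never exceeds Na
        for group_b in combinations(nodes, k):
            bset = set(group_b)
            crossing = sum(1 for u, v in edges
                           if (u in bset) != (v in bset))
            if crossing:                      # skip cuts with no links
                best = max(best, k / crossing)
    return best
```

A 4-node ring has narrowness 1.0 (cut it into two adjacent pairs: Nb = 2 and I = 2).&lt;br /&gt;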
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ...., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). There are 2^n nodes contained in the hypercube; each is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
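The one-bit rule makes neighbour tests trivial in code: two labels are adjacent exactly when their XOR contains a single 1 bit. A minimal sketch (helper names are illustrative):&lt;br /&gt;

```python
# Hypercube adjacency: nodes 0 .. 2^n - 1 are connected exactly when
# their binary labels differ in one bit, i.e. the XOR of the labels
# has population count 1.

def hypercube_adjacent(u, v):
    return bin(u ^ v).count("1") == 1

def hypercube_neighbours(u, n):
    # Flip each of the n bits of u in turn.
    return [u ^ (2 ** i) for i in range(n)]
```

In a 3-cube, node 0 (binary 000) has neighbours 1, 2, and 4.&lt;br /&gt;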
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration. Also, the increase in system size is expected to result in an increase in performance to the extent of the increase in size. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with little change in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n·2^(n-1) edges. It possesses some desirable features for interconnection networks. Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube. A complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
Hypercube networks are not truly expandable, because the hardware configuration of every node must change whenever the number of nodes grows exponentially: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms become necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to a class of highly parallel algorithms and can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects to 2n other nodes through hyperlinks; these neighbours are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0+1)mod 4, a1,a2..... an-1&amp;gt; &lt;br /&gt;
(ii)&amp;lt; (a0-2)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd:&lt;br /&gt;
&lt;br /&gt;
(i) &amp;lt;(a0-1)mod 4, a1,a2..... an-1&amp;gt;&lt;br /&gt;
(ii)&amp;lt; (a0+2)mod 4 ,a1,a2..... an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0=0,3:&lt;br /&gt;
&lt;br /&gt;
(i) For ‘ai’ = 0&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ‘ai’ = 3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: when a0=1,2 and ai= 0,3:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case III: when a0=0,1&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case IV: when a0=2,3&lt;br /&gt;
&lt;br /&gt;
(i) For ai=1&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai-1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai=2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;(a0+1) mod 4 , a1,....,(ai+1)mod 4 ,....,an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4 , a1,....,(ai+2)mod 4,....,an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional Balanced varietal hypercube has n·2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional Balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n=1&lt;br /&gt;
  ii. ceil(n+n/2) for n&amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is obtained by summing the distances of all nodes from a given node and dividing by the total number of nodes. In the Balanced varietal hypercube the average distance from node (0,0,...,0) is (1/2^(2n)) ∑ d((0,0,...,0), k), where the sum runs over all nodes k.&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned. The topology&lt;br /&gt;
which possesses minimum cost is treated as the best candidate. Cost factor of a network is the&lt;br /&gt;
product of degree and diameter. The cost of an n-dimensional balanced varietal hypercube is therefore 2n·ceil(n + n/2).&lt;br /&gt;
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Omega Topology===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62307</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62307"/>
		<updated>2012-04-16T22:51:04Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity for a network the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. Find the maximum value of Nb / I over all partitionings of the network. This is the narrowness of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable i.e. it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments since this allows you to upgrade your network to the size you require ( i.e. flexibility ) within a particular budget.&lt;br /&gt;
E.g. an 8 node linear array can be expanded in increments of 1 node but a 3 dimensional hypercube can be expanded only by adding another 3D hypercube. (i.e. 8 nodes)&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ...., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). There are 2^n nodes contained in the hypercube; each is uniquely represented by a binary sequence b(n-1)b(n-2)...b0 of length n. Two nodes in the hypercube are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network of a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration. Also, the increase in system size is expected to result in an increase in performance to the extent of the increase in size. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with little change in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n·2^(n-1) edges. It possesses some desirable features for interconnection networks. Its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube. A complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
Hypercube networks are not truly expandable, because the hardware configuration of every node must change whenever the number of nodes grows exponentially: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms become necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PE’s) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NC’s, forming a k-cube among the NC’s. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to a class of highly parallel algorithms and can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the Hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure. Balanced hypercubes belong to a special type of load balanced graphs [10] that can support consistently recoverable embedding. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’, such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift tasks of v to v’ by activating copies of these tasks in v’. All the tasks running on other nodes do not need to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while having another active task running on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can have a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced  if and only if for every node in G there exists another node matching it, i.e., these two nodes have the same adjacent nodes. Hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures shown labeled as BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
=Balanced Varietal Hypercube=&lt;br /&gt;
An n-dimensional balanced varietal hypercube consists of 2^(2n) nodes. Every node connects to 2n other nodes through hyperlinks; these neighbours are of two types: a) inner nodes and b) outer nodes.&lt;br /&gt;
&lt;br /&gt;
a) Inner node:&lt;br /&gt;
Case I: When a0 is even,&lt;br /&gt;
(i) &amp;lt;(a0+1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0-2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 is odd,&lt;br /&gt;
(i) &amp;lt;(a0-1) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
(ii) &amp;lt;(a0+2) mod 4, a1, a2, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
b) Outer node:&lt;br /&gt;
Case I: When a0 = 0, 3:&lt;br /&gt;
(i) For ai = 0&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(ii) For ai = 3&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Case II: When a0 = 1, 2 and ai = 0, 3&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
Case III: When a0 = 0, 1&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
Case IV: When a0 = 2, 3&lt;br /&gt;
(i) For ai = 1&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai-1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
(ii) For ai = 2&lt;br /&gt;
&amp;lt;(a0+1) mod 4, a1, ..., (ai+1) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
&amp;lt;(a0-1) mod 4, a1, ..., (ai+2) mod 4, ..., an-1&amp;gt;&lt;br /&gt;
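As a minimal sketch, the inner-node rules (the two cases on a0) can be transcribed directly into code; the function name and the tuple representation of a node are illustrative, not from the original:&lt;br /&gt;

```python
def inner_neighbours(node):
    """Inner-node links of a balanced varietal hypercube, following the
    two inner-node cases above; node is a tuple (a0, a1, ..., an-1)
    with each coordinate in {0, 1, 2, 3}."""
    a0, rest = node[0], node[1:]
    if a0 % 2 == 0:                       # Case I: a0 even
        return [((a0 + 1) % 4,) + rest,
                ((a0 - 2) % 4,) + rest]
    return [((a0 - 1) % 4,) + rest,       # Case II: a0 odd
            ((a0 + 2) % 4,) + rest]

print(inner_neighbours((0, 3)))   # a0 even
print(inner_neighbours((1, 3)))   # a0 odd
```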
&lt;br /&gt;
[[File:Evh.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Properties of a Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
The degree of any node in the Balanced varietal hypercube of dimension n is equal to 2n.&lt;br /&gt;
&lt;br /&gt;
An n-dimensional balanced varietal hypercube has n * 2^(2n) edges.&lt;br /&gt;
&lt;br /&gt;
The diameter of an n-dimensional balanced varietal hypercube is&lt;br /&gt;
  i. 2n for n = 1&lt;br /&gt;
  ii. ceil(3n/2) for n &amp;gt; 1.&lt;br /&gt;
&lt;br /&gt;
The average distance conveys the actual performance of the network better in practice. It is the sum of the distances of all nodes from a given node, divided by the total number of nodes. In the balanced varietal hypercube the average distance from node &amp;lt;0, 0, ..., 0&amp;gt; is (1/2^(2n)) * the sum, over all nodes k, of d(&amp;lt;0, 0, ..., 0&amp;gt;, k).&lt;br /&gt;
&lt;br /&gt;
Cost is an important factor as far as an interconnection network is concerned; the topology&lt;br /&gt;
with the minimum cost is the best candidate. The cost factor of a network is the&lt;br /&gt;
product of its degree and diameter, so the cost of an n-dimensional balanced varietal hypercube is 2n * ceil(3n/2).&lt;br /&gt;
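The closed-form parameters above (degree, edge count, piecewise diameter, and cost as degree times diameter) can be collected in a small helper; the function name is illustrative:&lt;br /&gt;

```python
import math

def bvh_properties(n):
    """Parameters of an n-dimensional balanced varietal hypercube as
    stated above: degree 2n, n * 2^(2n) edges, diameter 2n when n
    equals 1 and ceil(3n/2) otherwise, cost = degree * diameter."""
    degree = 2 * n
    edges = n * 4 ** n                    # n * 2^(2n)
    diameter = 2 * n if n == 1 else math.ceil(3 * n / 2)
    return degree, edges, diameter, degree * diameter

print(bvh_properties(1))   # (degree, edges, diameter, cost)
print(bvh_properties(3))
```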
&lt;br /&gt;
==Routing in Balanced Varietal Hypercube==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
===Introduction===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===The Omega Topology===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Node Replacement Policy and Reconfiguration===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Reliability Analysis===&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Evh.jpg&amp;diff=62306</id>
		<title>File:Evh.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Evh.jpg&amp;diff=62306"/>
		<updated>2012-04-16T22:50:19Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62240</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62240"/>
		<updated>2012-04-16T18:48:36Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
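The procedure above can be sketched as a brute-force computation over all two-way partitions. It is exponential in the number of nodes, so it is only meant for tiny example networks, and the function name is illustrative:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness: over every split into groups A and B
    (B the smaller side), take the largest value of Nb / I, where I is
    the number of links crossing the cut."""
    nodes = list(nodes)
    best = 0.0
    # enumerate the smaller side B of every 2-way partition
    for size in range(1, len(nodes) // 2 + 1):
        for b in combinations(nodes, size):
            b = set(b)
            crossing = sum(1 for u, v in edges if (u in b) != (v in b))
            if crossing:
                best = max(best, len(b) / crossing)
    return best

# 4-node ring: every cut is crossed by at least 2 links
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(narrowness(range(4), ring))
```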
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n, and two nodes are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
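The one-bit adjacency test follows directly from the labeling described above: XOR the two labels and count the set bits. A minimal sketch (the function name is illustrative):&lt;br /&gt;

```python
def hamming_adjacent(x, y):
    """Two hypercube nodes are linked exactly when their binary labels
    differ in one bit: XOR the labels and count the set bits."""
    return bin(x ^ y).count("1") == 1

# in a 3-cube (nodes 0..7): 5 = 101 and 7 = 111 differ in one bit,
# while 5 = 101 and 6 = 110 differ in two
print(hamming_adjacent(5, 7), hamming_adjacent(5, 6))
```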
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to increase performance in proportion. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of every node must change as the number of nodes grows exponentially: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that supports consistently recoverable embedding. Balanced hypercubes belong to a special class of load balanced graphs [10]. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. None of the tasks running on other nodes needs to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
&lt;br /&gt;
A graph G is load balanced if and only if for every node in G there exists another node matching it, i.e., the two nodes have the same adjacent nodes; hence they are called a matching pair. A completely connected graph with an even number of nodes is load balanced, while several other commonly used graph structures, such as meshes, trees, and hypercubes, are not. In the figures, labeled BHn, n is called the dimension of the balanced hypercube.&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62239</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62239"/>
		<updated>2012-04-16T18:47:18Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Balanced Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n, and two nodes are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to increase performance in proportion. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
The hypercube networks are not truly expandable, because the hardware configuration of every node must change as the number of nodes grows exponentially: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties which make the hypercube attractive in the first place, and complicated routing algorithms are necessary for the incomplete hypercube. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built using basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC is used as a communication processor to handle intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is essentially a truly expandable, recursive structure with a constant predefined building block. The number of I/O ports for each PE (and NC) is fixed and is independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure can be expected to be quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide an efficient reconfiguration by keeping the recovery time small. It is highly desirable that the reconfiguration process is a decentralized and local one, so fault status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the Hypercube structure that supports consistently recoverable embedding. Balanced hypercubes belong to a special class of load balanced graphs [10]. In a load balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v’ such that v and v’ have the same adjacent nodes. Such a pair of nodes v and v’ is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext1.jpg]]&lt;br /&gt;
&lt;br /&gt;
In a load balanced graph, a task can be scheduled to both v and v’ in such a way that one copy is active and the other one is passive. If node v fails, we can simply shift the tasks of v to v’ by activating the copies of these tasks on v’. None of the tasks running on other nodes needs to be reassigned to keep the adjacency property, i.e., two tasks that are adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v’ as primary and backup are relative. We can have an active task running on node v with its backup on node v’, while another active task runs on node v’ with its backup on node v. With a sufficient number of tasks and a suitable load balancing approach, we can achieve a balanced use of processors in the system.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext2.jpg]]&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext2.jpg&amp;diff=62238</id>
		<title>File:Ext2.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext2.jpg&amp;diff=62238"/>
		<updated>2012-04-16T17:53:54Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext1.jpg&amp;diff=62237</id>
		<title>File:Ext1.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext1.jpg&amp;diff=62237"/>
		<updated>2012-04-16T17:52:05Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62236</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62236"/>
		<updated>2012-04-16T17:51:39Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Extended Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operation despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better the network is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance i.e. it is the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, where the numbers of processors in the groups are Na and Nb, and assume Nb &amp;lt;= Na. Now count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all partitionings of the network.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then if the group B processors want to send messages to group A, congestion in the network will be high (since there are fewer links than processors).&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of 1 node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ by exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology is its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2 N. Each of the 2^n nodes in the hypercube is uniquely represented by a binary sequence bn-1bn-2...b0 of length n, and two nodes are adjacent if and only if they differ at exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change in the existing configuration, and the increase in system size is expected to increase performance in proportion. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes in configuration. As the dimension of the hypercube is increased by one, one more link must be added to every node in the network. Therefore, it becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n * 2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half of those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5, where the end nodes of a missing edge are marked with arrows labeled with the same letter, is shown below.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
Hypercube networks are not truly expandable, because the hardware configuration of every node must change whenever the number of nodes grows: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms become necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is thus a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Balanced Hypercube=&lt;br /&gt;
As the number of processors in a system increases, the probability of system failure becomes quite high unless specific measures are taken to tolerate faults within the system. Therefore, a major goal in the design of such a system is fault tolerance. These systems are made fault-tolerant by providing redundant or spare processors and/or links. When an active processor (one where tasks are running) fails, its tasks are dynamically transferred to spare components. The objective is to provide efficient reconfiguration while keeping the recovery time small. It is highly desirable that the reconfiguration process be decentralized and local, so that fault-status exchange and task migration can be reduced or eliminated.&lt;br /&gt;
The Balanced Hypercube is a variation of the hypercube structure that can support consistently recoverable embeddings. Balanced hypercubes belong to a special class of load-balanced graphs [10]. In a load-balanced graph G = (V, E), with V as the node set and E as the edge set, for each node v there exists another node v', such that v and v' have the same adjacent nodes. Such a pair of nodes v and v' is called a matching pair.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In a load-balanced graph, a task can be scheduled on both v and v' in such a way that one copy is active and the other passive. If node v fails, we can simply shift the tasks of v to v' by activating the copies of those tasks on v'. Tasks running on other nodes need not be reassigned, and the adjacency property is preserved: two tasks that were adjacent are still adjacent after a system reconfiguration. Note that the roles of v and v' as primary and backup are relative. We can have an active task running on node v with its backup on node v', while another active task runs on node v' with its backup on node v. With a sufficient number of tasks and a suitable load-balancing approach, we can achieve balanced use of the processors in the system.&lt;br /&gt;
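The failover scheme described above can be sketched in a few lines of Python. This is an illustrative model only (the `Task` class and `fail_node` function are hypothetical, not from the Balanced Hypercube literature): each task has an active copy on one node of a matching pair and a passive copy on the other, and a node failure activates only the passive copies of that node's tasks, leaving every other task in place.&lt;br /&gt;

```python
class Task:
    """A task placed on a matching pair (primary, backup) of nodes."""
    def __init__(self, name, primary, backup):
        self.name = name
        self.primary = primary
        self.backup = backup
        self.active_on = primary  # active copy starts on the primary

def fail_node(tasks, failed):
    """Local recovery: activate the passive copy of every task whose
    active node failed; all other tasks keep their placement, so the
    adjacency of tasks is preserved."""
    for t in tasks:
        if t.active_on == failed:
            t.active_on = t.backup if t.active_on == t.primary else t.primary
    return tasks
```

Because the roles of the pair are relative, two tasks can use the same pair of nodes in opposite directions; when one node fails, both tasks end up active on the survivor.&lt;br /&gt;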
&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62235</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62235"/>
		<updated>2012-04-16T17:48:46Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Twisted Cubes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. The network also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput, and ease of implementation. Among recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the greatest popularity due to many&lt;br /&gt;
of its attractive properties: regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability, and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function, albeit with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network, i.e., its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint subnetworks.&lt;br /&gt;
The larger the connectivity of a network, the better it can cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
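The procedure above can be computed directly by brute force on a small graph. The sketch below (the function name `narrowness` and the example ring are illustrative choices, not from the source) enumerates every partition with |B| &amp;lt;= |A|, counts the crossing edges I, and keeps the maximum Nb / I:&lt;br /&gt;

```python
from itertools import combinations

def narrowness(nodes, edges):
    """Brute-force narrowness: max over partitions (A, B) with
    |B| <= |A| of |B| / I, where I counts edges crossing the cut."""
    best = 0.0
    n = len(nodes)
    for k in range(1, n // 2 + 1):   # |B| <= |A|  =>  |B| <= n // 2
        for B in combinations(nodes, k):
            Bset = set(B)
            I = sum(1 for u, v in edges if (u in Bset) != (v in Bset))
            if I > 0:
                best = max(best, len(Bset) / I)
    return best

# Example: a 4-node ring 0-1-2-3-0.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

For the 4-node ring, the worst partition puts two neighboring nodes in B (two crossing links), giving a narrowness of 2/2 = 1; splitting opposite corners into B gives only 2/4.&lt;br /&gt;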
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message must travel between any two nodes. For a hypercube network the diameter equals the degree of a node, n = log_2 N. The hypercube contains 2^n nodes, each uniquely represented by a binary sequence b_(n-1)b_(n-2)...b_0 of length n. Two nodes in the hypercube are adjacent if and only if their labels differ in exactly one bit position. This property greatly simplifies the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network; it therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n·2^(n-1) edges. It possesses several desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Extended Hypercube=&lt;br /&gt;
Hypercube networks are not truly expandable, because the hardware configuration of every node must change whenever the number of nodes grows: each node has to be provided with additional ports. An incompletely populated hypercube lacks some of the properties that make the hypercube attractive in the first place, and complicated routing algorithms become necessary for it. The Extended Hypercube (EH) retains the attractive features of the hypercube topology to a large extent. The EH is built from basic modules consisting of a k-cube of processor elements (PEs) and a Network Controller (NC), as shown in the figure below. The NC serves as a communication processor that handles intermodule communication; 2^k such basic modules can be interconnected via 2^k NCs, forming a k-cube among the NCs. The EH is thus a truly expandable, recursive structure with a constant, predefined building block. The number of I/O ports for each PE (and NC) is fixed and independent of the size of the network. The EH structure is well suited to implementing a class of highly parallel algorithms, and it can emulate the binary hypercube on a large class of algorithms with insignificant degradation in performance. The utilization factor of the EH is higher than that of the hypercube.&lt;br /&gt;
&lt;br /&gt;
[[File:Ext.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext.jpg&amp;diff=62234</id>
		<title>File:Ext.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Ext.jpg&amp;diff=62234"/>
		<updated>2012-04-16T17:48:17Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62233</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62233"/>
		<updated>2012-04-16T17:41:22Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Twisted Cubes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. The network also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput, and ease of implementation. Among recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the greatest popularity due to many&lt;br /&gt;
of its attractive properties: regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability, and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function, albeit with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network, i.e., its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint subnetworks.&lt;br /&gt;
The larger the connectivity of a network, the better it can cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message must travel between any two nodes. For a hypercube network the diameter equals the degree of a node, n = log_2 N. The hypercube contains 2^n nodes, each uniquely represented by a binary sequence b_(n-1)b_(n-2)...b_0 of length n. Two nodes in the hypercube are adjacent if and only if their labels differ in exactly one bit position. This property greatly simplifies the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network; it therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n·2^(n-1) edges. It possesses several desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it. The five-dimensional twisted cube TQ5 is shown below; the end nodes of each missing edge are marked with arrows labeled with the same letter.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62232</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62232"/>
		<updated>2012-04-16T17:40:42Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Twisted Cubes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. The network also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput, and ease of implementation. Among recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the greatest popularity due to many&lt;br /&gt;
of its attractive properties: regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability, and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function, albeit with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network, i.e., its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint subnetworks.&lt;br /&gt;
The larger the connectivity of a network, the better it can cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message must travel between any two nodes. For a hypercube network the diameter equals the degree of a node, n = log_2 N. The hypercube contains 2^n nodes, each uniquely represented by a binary sequence b_(n-1)b_(n-2)...b_0 of length n. Two nodes in the hypercube are adjacent if and only if their labels differ in exactly one bit position. This property greatly simplifies the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network; it therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n·2^(n-1) edges. It possesses several desirable features for interconnection networks: its diameter, wide diameter, and faulty diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it.&lt;br /&gt;
&lt;br /&gt;
[[File:Twsit-1.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:Twsit-1.jpg&amp;diff=62231</id>
		<title>File:Twsit-1.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:Twsit-1.jpg&amp;diff=62231"/>
		<updated>2012-04-16T17:37:15Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62230</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62230"/>
		<updated>2012-04-16T17:36:58Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system relies completely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. The network also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput, and ease of implementation. Among recent developments in&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the greatest popularity due to many&lt;br /&gt;
of its attractive properties: regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability, and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function, albeit with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network, i.e., its ability to continue operating despite disabled components: connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint subnetworks.&lt;br /&gt;
The larger the connectivity of a network, the better it can cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed along a shortest path to send a message to any node.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then when the group B processors want to send messages to group A, congestion in the network will be high, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade the network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
For example, an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3-D hypercube (i.e., 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message must travel between any two nodes. For a hypercube network the diameter equals the degree of a node, n = log_2 N. The hypercube contains 2^n nodes, each uniquely represented by a binary sequence b_(n-1)b_(n-2)...b_0 of length n. Two nodes in the hypercube are adjacent if and only if their labels differ in exactly one bit position. This property greatly simplifies the routing of messages through the network. &lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network; it therefore becomes more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
=Twisted Cubes=&lt;br /&gt;
Twisted cubes are variants of hypercubes. The n-dimensional twisted cube has 2^n nodes and n*2^(n-1) edges. It possesses some desirable features for interconnection networks: its diameter, wide diameter, and fault diameter are about half those of the n-dimensional hypercube, and a complete binary tree can be embedded into it.&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62229</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62229"/>
		<updated>2012-04-16T17:13:16Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
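The narrowness procedure described above can be sketched as a brute-force search over bipartitions. This is an illustrative, exponential-time sketch (function names are my own, not from the article), practical only for small networks: for every group B with size(B) at most size(A), it counts the links I crossing the cut and maximizes Nb / I.

```python
# Brute-force sketch of the narrowness measure: maximize
# size(B) / crossing_links over all bipartitions (A, B).
from itertools import combinations

def narrowness(nodes, links):
    best = 0.0
    n = len(nodes)
    for k in range(1, n // 2 + 1):       # size(B) ranges 1 .. n//2
        for b in combinations(nodes, k):
            bset = set(b)
            # links with exactly one endpoint in B cross the cut
            crossing = sum(1 for (u, v) in links
                           if (u in bset) != (v in bset))
            if crossing:                 # skip disconnected cuts
                best = max(best, k / crossing)
    return best
```

For a 4-node ring the worst cut puts two adjacent nodes in B (2 processors, 2 crossing links, ratio 1.0), while a 4-node linear array is narrower: cutting the middle link leaves 2 processors behind 1 link, ratio 2.0.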
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes is uniquely represented by a binary sequence b(n-1)b(n-2)...b(0) of length n, and two nodes are adjacent if and only if their labels differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network. This large fan-out makes the nodes of a high-dimensional hypercube more difficult to design and fabricate.&lt;br /&gt;
&lt;br /&gt;
[[File:4d-hyper.jpg]]&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:4d-hyper.jpg&amp;diff=62228</id>
		<title>File:4d-hyper.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:4d-hyper.jpg&amp;diff=62228"/>
		<updated>2012-04-16T17:12:13Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62227</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62227"/>
		<updated>2012-04-16T17:08:47Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^n nodes arranged in an n-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^n - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
The attractiveness of the hypercube topology lies in its small diameter, which is the maximum number of links (or hops) a message has to travel between any two nodes. For a hypercube network the diameter is identical to the degree of a node, n = log2(N). Each of the 2^n nodes is uniquely represented by a binary sequence b(n-1)b(n-2)...b(0) of length n, and two nodes are adjacent if and only if their labels differ in exactly one bit position. This property greatly facilitates the routing of messages through the network.&lt;br /&gt;
&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
&lt;br /&gt;
In addition, the regular and symmetric nature of the network provides fault tolerance. The most important parameters of an interconnection network for a multicomputer system are its scalability and modularity. Scalable networks have the property that the size of the system (e.g., the number of nodes) can be increased with minor or no change to the existing configuration, and the increase in system size is expected to yield a proportional increase in performance. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large systems out of small ones with few changes to the configuration. Each time the dimension of the hypercube is increased by one, one more link must be added to every node in the network. This large fan-out makes the nodes of a high-dimensional hypercube more difficult to design and fabricate.&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62226</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62226"/>
		<updated>2012-04-16T17:00:08Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^k nodes arranged in a k-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^k - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
[[File:1d-cube.jpg]]&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=File:1d-cube.jpg&amp;diff=62225</id>
		<title>File:1d-cube.jpg</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=File:1d-cube.jpg&amp;diff=62225"/>
		<updated>2012-04-16T16:58:39Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62224</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62224"/>
		<updated>2012-04-16T16:52:47Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Hypercube */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
Hypercube networks consist of N = 2^k nodes arranged in a k-dimensional hypercube. The nodes are numbered 0, 1, ..., 2^k - 1, and two nodes are connected if their binary labels differ in exactly one bit.&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62223</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62223"/>
		<updated>2012-04-16T16:49:34Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* New Interconnection Topologies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
=Hypercube=&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62222</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62222"/>
		<updated>2012-04-16T16:48:47Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* Metrics Interconnection Networks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of a&lt;br /&gt;
multicomputer system relies entirely on the underlying interconnection network, which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments in&lt;br /&gt;
multicomputer networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics of Interconnection Networks==&lt;br /&gt;
===Network Connectivity===&lt;br /&gt;
&lt;br /&gt;
Network nodes and communication links sometimes fail and must be removed from service for repair. When components do fail, the network should continue to function with reduced capacity.&lt;br /&gt;
Network connectivity measures the resiliency of a network and its ability to continue operating despite disabled components, i.e., connectivity is the minimum number of nodes or links that must fail to partition the network into two or more disjoint networks.&lt;br /&gt;
The larger the connectivity of a network, the better it is able to cope with failures.&lt;br /&gt;
&lt;br /&gt;
===Network Diameter===&lt;br /&gt;
&lt;br /&gt;
The diameter of a network is the maximum internode distance, i.e., the maximum number of links that must be traversed to send a message to any node along a shortest path.&lt;br /&gt;
The lower the diameter of a network, the shorter the time to send a message from one node to the node farthest away from it.&lt;br /&gt;
&lt;br /&gt;
===Narrowness===&lt;br /&gt;
This is a measure of congestion in a network and is calculated as follows:&lt;br /&gt;
Partition the network into two groups of processors, A and B, containing Na and Nb processors respectively, with Nb &amp;lt;= Na. Count the number of interconnections between A and B; call this I. The narrowness of the network is the maximum value of Nb / I over all such partitionings.&lt;br /&gt;
The idea is that if the narrowness is high (Nb &amp;gt; I), then congestion will be high when the group B processors send messages to group A, since there are fewer links than processors.&lt;br /&gt;
&lt;br /&gt;
===Network Expansion Increments===&lt;br /&gt;
A network should be expandable, i.e., it should be possible to create larger and more powerful multicomputer systems by simply adding more nodes to the network.&lt;br /&gt;
For reasons of cost, it is better to have the option of small increments, since this allows you to upgrade your network to the size you require (i.e., flexibility) within a particular budget.&lt;br /&gt;
E.g. an 8-node linear array can be expanded in increments of one node, but a 3-dimensional hypercube can be expanded only by adding another 3D hypercube (i.e. 8 nodes).&lt;br /&gt;
&lt;br /&gt;
==Hypercube==&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references /&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62163</id>
		<title>User:Mdcotter</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=User:Mdcotter&amp;diff=62163"/>
		<updated>2012-04-15T21:37:23Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=New Interconnection Topologies=&lt;br /&gt;
&lt;br /&gt;
==Introduction==&lt;br /&gt;
Parallel processing has assumed a crucial role in the field of supercomputing. It has overcome&lt;br /&gt;
the various technological barriers and achieved high levels of performance. The most efficient&lt;br /&gt;
way to achieve parallelism is to employ a multicomputer system. The success of the&lt;br /&gt;
multicomputer system completely relies on the underlying interconnection network which&lt;br /&gt;
provides a communication medium among the various processors. It also determines&lt;br /&gt;
the overall performance of the system in terms of speed of execution and efficiency. The&lt;br /&gt;
suitability of a network is judged in terms of cost, bandwidth, reliability, routing, broadcasting,&lt;br /&gt;
throughput and ease of implementation. Among the recent developments of various&lt;br /&gt;
multicomputing networks, the Hypercube (HC) has enjoyed the highest popularity due to many&lt;br /&gt;
of its attractive properties. These properties include regularity, symmetry, small&lt;br /&gt;
diameter, strong connectivity, recursive construction, partitionability and relatively small link&lt;br /&gt;
complexity.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Metrics for Interconnection Networks==&lt;br /&gt;
&lt;br /&gt;
==Hypercube==&lt;br /&gt;
&lt;br /&gt;
==Reliable Omega interconnected networks==&lt;br /&gt;
&lt;br /&gt;
==K-Ary n-cube Interconnection networks==&lt;br /&gt;
&lt;br /&gt;
==Quiz==&lt;br /&gt;
&lt;br /&gt;
==References==&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/3b_sk&amp;diff=58998</id>
		<title>CSC/ECE 506 Spring 2012/3b sk</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/3b_sk&amp;diff=58998"/>
		<updated>2012-02-21T01:02:25Z</updated>

		<summary type="html">&lt;p&gt;Jjohn: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
MapReduce is a software framework introduced by Google in 2004 to support [http://publib.boulder.ibm.com/infocenter/txformp/v6r0m0/index.jsp?topic=%2Fcom.ibm.cics.te.doc%2Ferziaz0015.htm distributed computing] on large data sets on clusters of computers. &lt;br /&gt;
The MapReduce programming model consists of two major steps: map and reduce. In the map step, the problem being solved is divided into a series of sub-problems that are distributed to different workers. After collecting results from the workers, the computation enters the reduce step to combine them and produce the final result.&lt;br /&gt;
&lt;br /&gt;
= Programming Model =&lt;br /&gt;
[[File:Mapreduce.png|thumbnail|MapReduce for a Shape Counter]]&amp;lt;br&amp;gt;&lt;br /&gt;
The MapReduce programming model is inspired by [http://enfranchisedmind.com/blog/posts/what-is-a-functional-programming-language/ functional languages] and targets data-intensive computations. The input data format is application-specific, and is specified by the user. The output is a set of &amp;lt;key,value&amp;gt; pairs. The user expresses an algorithm using two functions, Map and Reduce. The Map function is applied on the input data and produces a list of intermediate &amp;lt;key,value&amp;gt; pairs. The Reduce function is applied to all intermediate pairs with the same key. It typically performs some kind of merging operation and produces zero or more output pairs. Finally, the output pairs are sorted by their key value. In the simplest form of MapReduce programs, the programmer provides just the Map function. All other functionality, including the grouping of the intermediate pairs which have the same key and the final sorting, is provided by the runtime.&lt;br /&gt;
The main benefit of this model is simplicity. The programmer provides a simple description of the algorithm that focuses on functionality and not on parallelization. The actual parallelization and the details of concurrency management are left to the runtime system. Hence the program code is generic and easily portable across systems. Nevertheless, the model provides sufficient high-level information for parallelization. The Map function can be executed in parallel on non-overlapping portions of the input data and the Reduce function can be executed in parallel on each set of intermediate pairs with the same key. Similarly, since it is explicitly known which pairs each function will operate upon, one can employ pre-fetching or other scheduling optimizations for locality.&lt;br /&gt;
&lt;br /&gt;
= Examples =&lt;br /&gt;
Below are a few simple examples of programs that can be easily expressed as MapReduce computations.&lt;br /&gt;
*Distributed [http://unixhelp.ed.ac.uk/CGI/man-cgi?grep Grep]: The map function emits a line if it matches a given pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output. &amp;lt;br&amp;gt;&lt;br /&gt;
*Count of URL Access Frequency: The map function processes logs of web page requests and outputs &amp;lt;URL, 1&amp;gt;. The reduce function adds together all values for the same URL and emits a &amp;lt;URL, total count&amp;gt; pair. &amp;lt;br&amp;gt;&lt;br /&gt;
*[http://books.google.com/books?id=gJrmszNHQV4C&amp;amp;pg=PA376&amp;amp;lpg=PA376&amp;amp;dq=what+is+reverse+web+link+graph&amp;amp;source=bl&amp;amp;ots=rLQ2yuV6oc&amp;amp;sig=wimcG_7MR7d9g-ePGXkEK1ANmws&amp;amp;hl=en&amp;amp;sa=X&amp;amp;ei=BtxBT5HkN42DtgefhbXRBQ&amp;amp;ved=0CEwQ6AEwBg#v=onepage&amp;amp;q=what%20is%20reverse%20web%20link%20graph&amp;amp;f=false Reverse Web-Link Graph]: The map function outputs &amp;lt;target, source&amp;gt; pairs for each link to a target URL found in a page named &amp;quot;source&amp;quot;. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair: &amp;lt;target, list(source)&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
*Term-Vector per Host: A term vector summarizes the most important words that occur in a document or a set of documents as a list of &amp;lt;word, frequency&amp;gt; pairs. The map function emits a &amp;lt;hostname, term vector&amp;gt; pair for each input document (where the hostname is extracted from the URL of the document). The reduce function is passed all per-document term vectors for a given host. It adds these term vectors together, throwing away infrequent terms, and then emits a final &amp;lt;hostname, term vector&amp;gt; pair.&amp;lt;br&amp;gt;&lt;br /&gt;
*[http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html Inverted Index]: The map function parses each document, and emits a sequence of &amp;lt;word, document ID&amp;gt; pairs. The reduce function accepts all pairs for a given word, sorts the corresponding document IDs and emits a &amp;lt;word, list(document ID)&amp;gt; pair. The set of all output pairs forms a simple inverted index. It is easy to augment this computation to keep track of word positions.&lt;br /&gt;
&lt;br /&gt;
= Sample Code =&lt;br /&gt;
The following pseudocode shows the basic structure of a&lt;br /&gt;
MapReduce program that counts the number of occurrences&lt;br /&gt;
of each word in a collection of documents.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
//Input : a Document&lt;br /&gt;
//Intermediate Output: key = word, value = 1&lt;br /&gt;
Map(void * input){&lt;br /&gt;
   for each word w in input&lt;br /&gt;
       EmitIntermediate(w, 1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
//Intermediate Output: key = word, value = 1&lt;br /&gt;
//Output: key = word, value = occurrences&lt;br /&gt;
Reduce(String key, Iterator values){&lt;br /&gt;
   int result = 0;&lt;br /&gt;
   for each v in values&lt;br /&gt;
       result += v&lt;br /&gt;
   Emit(key, result)&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
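The pseudocode above maps directly onto a sequential Python sketch of the same data flow (a hypothetical illustration, not a distributed runtime: the function names, the whitespace tokenization, and the example documents are assumptions, and the shuffle step that the runtime normally hides is shown explicitly):

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit an intermediate (word, 1) pair for each word.
    return [(word, 1) for word in document.split()]

def reduce_fn(key, values):
    # Reduce: sum the occurrence counts for one word.
    return (key, sum(values))

def mapreduce(documents):
    # Shuffle: group intermediate values by key (done by the runtime).
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Sorted output, as produced by the final sorting pass.
    return sorted(reduce_fn(k, vs) for k, vs in groups.items())

print(mapreduce(["the quick fox", "the lazy dog"]))
# [('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```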
&lt;br /&gt;
= Runtime System =&lt;br /&gt;
The MapReduce runtime is responsible for parallelization and concurrency control. To parallelize the Map function, it splits the input pairs into units that are processed concurrently on multiple nodes. Next, the runtime partitions the intermediate pairs using a scheme that keeps pairs with the same key in the same unit. The partitions are processed in parallel by Reduce tasks running on multiple nodes. In both steps, the runtime must decide on factors such as the size of the units, the number of nodes involved, how units are assigned to nodes dynamically, and how buffer space is allocated. The decisions can be fully automatic or guided by the programmer given application specific knowledge. These decisions allow the runtime to execute a program efficiently across a wide range of machines and dataset scenarios without modifications to the source code. Finally, the runtime must merge and sort the output pairs from all Reduce tasks.&lt;br /&gt;
&lt;br /&gt;
= Implementations =&lt;br /&gt;
Many different implementations of the MapReduce interface are possible. The right choice depends on the environment. For example, one implementation may be suitable for a small shared-memory machine, another for a large [http://msdn.microsoft.com/en-us/library/ms178144.aspx NUMA] multi-processor, and yet another for an even larger collection of networked machines. Phoenix implements MapReduce for shared-memory systems. Hadoop and Google's MapReduce implement map reduce for large clusters of commodity PCs connected together with switched Ethernet. Mars is a MapReduce framework on graphics processors ([http://www.nvidia.com/object/gpu.html GPUs]).&lt;br /&gt;
&lt;br /&gt;
== Google's MapReduce ==&lt;br /&gt;
Google's MapReduce implements MapReduce for large clusters of commodity PCs connected together with switched Ethernet.&lt;br /&gt;
&lt;br /&gt;
=== Execution Overview ===&lt;br /&gt;
The Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. The input splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function (e.g., hash(key) mod R). The number of partitions (R) and&lt;br /&gt;
the partitioning function are specified by the user.&lt;br /&gt;
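The hash(key) mod R partitioning mentioned above can be sketched as follows (a hypothetical illustration: CRC32 stands in for whatever stable hash a real implementation would use, and R and the intermediate pairs are assumed values):

```python
import zlib

def partition(key, R):
    # hash(key) mod R, using a stable hash so that every run (and
    # every worker) agrees on which reduce task a key belongs to.
    return zlib.crc32(key.encode("utf-8")) % R

R = 4
intermediate = [("the", 2), ("fox", 1), ("dog", 1), ("lazy", 1)]
buckets = {r: [] for r in range(R)}
for key, value in intermediate:
    buckets[partition(key, R)].append((key, value))

# Each bucket becomes the input of one reduce task; the same key
# always lands in the same bucket.
print(sum(len(b) for b in buckets.values()))  # 4
```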
&lt;br /&gt;
[[File:Google Map Reduce.jpg]] &amp;lt;br&amp;gt;&lt;br /&gt;
The figure above shows the overall flow of a MapReduce operation in Google's implementation. When the user program calls the MapReduce function, the following sequence of actions occurs (the numbered labels in the figure above correspond to the numbers in the list below):&lt;br /&gt;
&lt;br /&gt;
1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 megabytes to 64 megabytes (MB) per piece (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines.&lt;br /&gt;
&lt;br /&gt;
2. One of the copies of the program is special: the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.&lt;br /&gt;
&lt;br /&gt;
3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory. &lt;br /&gt;
&lt;br /&gt;
4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.&lt;br /&gt;
&lt;br /&gt;
5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.&lt;br /&gt;
&lt;br /&gt;
6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.&lt;br /&gt;
&lt;br /&gt;
7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.&lt;br /&gt;
After successful completion, the output of the MapReduce execution is available in the R output files (one per reduce task, with file names as specified by the user). Typically, users do not need to combine these R output files into one file; they often pass these files as input to another MapReduce call, or use them from another distributed application that is able to deal with input that is partitioned into multiple files.&lt;br /&gt;
&lt;br /&gt;
=== Master Data Structures ===&lt;br /&gt;
&lt;br /&gt;
The master keeps several data structures. For each map task and reduce task, it stores the state (idle, in-progress, or completed), and the identity of the worker machine (for non-idle tasks).&lt;br /&gt;
The master is the conduit through which the location of intermediate file regions is propagated from map tasks to reduce tasks. Therefore, for each completed map task, the master stores the locations and sizes of the R intermediate file regions produced by the map task. Updates to this location and size information are received as map tasks are completed. The information is pushed incrementally to workers that have in-progress reduce tasks.&lt;br /&gt;
&lt;br /&gt;
=== Fault Tolerance ===&lt;br /&gt;
&lt;br /&gt;
Since the MapReduce library is designed to help process very large amounts of data using hundreds or thousands of machines, the library must tolerate machine failures gracefully.&lt;br /&gt;
&lt;br /&gt;
==== Worker Failure ====&lt;br /&gt;
&lt;br /&gt;
The master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state, and therefore become eligible for scheduling&lt;br /&gt;
on other workers. Similarly, any map task or reduce task in progress on a failed worker is also reset to idle and becomes eligible for rescheduling. Completed map tasks are re-executed on a failure because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be re-executed since their output is stored in a global file system. When a map task is executed first by worker A and then later executed by worker B (because A failed), all workers executing reduce tasks are notified of the re-execution. Any reduce task that has not already read the data from worker A will read the data from worker B. MapReduce is resilient to large-scale worker failures.&lt;br /&gt;
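The task-reset rules above can be sketched as follows (a hypothetical illustration of the master's bookkeeping, not Google's actual code; the task records and worker names are assumptions):

```python
# States tracked by the master for each task (see Master Data Structures).
IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

def handle_worker_failure(tasks, failed_worker):
    """Reset tasks on a failed worker: map tasks are re-executed even
    if completed, because their output lived on the failed machine's
    local disk; completed reduce tasks keep their output in the global
    file system and are left alone."""
    for task in tasks:
        if task["worker"] != failed_worker:
            continue
        if task["kind"] == "map" or task["state"] == IN_PROGRESS:
            task["state"] = IDLE
            task["worker"] = None

tasks = [
    {"kind": "map",    "state": COMPLETED,   "worker": "A"},
    {"kind": "reduce", "state": COMPLETED,   "worker": "A"},
    {"kind": "reduce", "state": IN_PROGRESS, "worker": "A"},
    {"kind": "map",    "state": COMPLETED,   "worker": "B"},
]
handle_worker_failure(tasks, "A")
print([t["state"] for t in tasks])
# ['idle', 'completed', 'idle', 'completed']
```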
&lt;br /&gt;
==== Master Failure ====&lt;br /&gt;
&lt;br /&gt;
It is easy to make the master write periodic checkpoints of the master data structures described above. If the master task dies, a new copy can be started from the last checkpointed state. However, given that there is only a single master, its failure is unlikely; therefore Google's current implementation aborts the MapReduce computation if the master fails. Clients can check for this condition and retry the MapReduce operation if they desire.&lt;br /&gt;
&lt;br /&gt;
== Phoenix ==&lt;br /&gt;
Phoenix implements MapReduce for shared-memory systems. Its goal is to support efficient execution on multiple cores without burdening the programmer with concurrency management. Phoenix consists of a simple API that is visible to application programmers and an efficient runtime that handles parallelization, resource management, and fault recovery.&lt;br /&gt;
&lt;br /&gt;
=== Phoenix API ===&lt;br /&gt;
The current Phoenix implementation provides an application-programmer interface (API) for C and C++. The API consists of two sets of functions. The first set is provided by Phoenix and is used by the programmer’s application code to initialize the system and emit output pairs (1 required and 2 optional functions). The second set includes the functions that the programmer defines (3 required and 2 optional functions). Apart from the Map and Reduce functions, the user provides functions that partition the data before each step and a function that implements key comparison. The API is type agnostic. The function arguments are declared as void pointers wherever possible to provide flexibility in their declaration and fast use without conversion overhead. The data structure used to communicate basic function information and buffer allocation between the user code and the runtime is of type scheduler_args_t. There are additional data structure types to facilitate communication between the Splitter, Map, Partition, and Reduce functions. These types use pointers whenever possible to implement communication without actually copying significant amounts of data.&lt;br /&gt;
The Phoenix API does not rely on any specific compiler options and does not require a parallelizing compiler. However, it assumes that its functions can freely use stack-allocated and heap-allocated structures for private data. It also assumes that there is no communication through shared-memory structures other than the input/output buffers for these functions. For C/C++, these assumptions cannot be checked statically for arbitrary programs. Although there are stringent checks within the system to ensure valid data are communicated between user and runtime code, it is ultimately the task of the user to provide functionally correct code.&lt;br /&gt;
&lt;br /&gt;
[[File:Phoenix.jpg]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Phoenix runtime was developed on top of [http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html Pthreads], but can be easily ported to other shared memory thread packages. The figure above shows the basic data flow for the runtime system. The runtime is controlled by the scheduler, which is initiated by user code. The scheduler creates and manages the threads that run all Map and Reduce tasks. It also manages the buffers used for task communication. The programmer provides the scheduler with all the required data and function pointers through the scheduler_args_t structure. After initialization, the scheduler determines the number of cores to use for this computation. For each core, it spawns a worker thread that is dynamically assigned some number of Map and Reduce tasks.&lt;br /&gt;
To start the Map stage, the scheduler uses the Splitter to divide input pairs into equally sized units to be processed by the Map tasks. The Splitter is called once per Map task and returns a pointer to the data the Map task will process. The Map tasks are allocated dynamically to workers and each one emits intermediate &amp;lt;key,value&amp;gt; pairs. The Partition function splits the intermediate pairs into units for the Reduce tasks. The function ensures all values of the same key go to the same unit. Within each buffer, values are ordered by key to assist with the final sorting. At this point, the Map stage is over. The scheduler must wait for all Map tasks to complete before initiating the Reduce stage.&lt;br /&gt;
Reduce tasks are also assigned to workers dynamically, similar to Map tasks. The one difference is that, while with Map tasks there is complete freedom in distributing pairs across tasks, with Reduce all values for the same key must be processed in one task. Hence, the Reduce stage may exhibit higher imbalance across workers and dynamic scheduling is more important. The output of each Reduce task is already sorted by key. As the last step, the final output from all tasks is merged into a single buffer, sorted by keys.&lt;br /&gt;
&lt;br /&gt;
=== Buffer Management ===&lt;br /&gt;
Two types of temporary buffers are necessary to store data between the various stages. All buffers are allocated in shared memory but are accessed in a well-specified way by a few functions. To re-arrange buffers (e.g., to split them across tasks), pointers are manipulated instead of the actual pairs, which may be large in size. The intermediate buffers are not directly visible to user code. Map-Reduce buffers are used to store the intermediate output pairs. Each worker has its own set of buffers. The buffers are initially sized to a default value and then resized dynamically as needed. At this stage, there may be multiple pairs with the same key. To accelerate the Partition function, the Emit intermediate function stores all values for the same key in the same buffer. At the end of the Map task, each buffer is sorted by key order. Reduce-Merge buffers are used to store the outputs of Reduce tasks before they are sorted. At this stage, each key has only one value associated with it. After sorting, the final output is available in the user-allocated Output data buffer.&lt;br /&gt;
&lt;br /&gt;
== Map Reduce on Graphics Processors ==&lt;br /&gt;
&lt;br /&gt;
Compared with CPUs, the hardware architecture of GPUs differs significantly. For instance, current GPUs have over one hundred [http://encyclopedia.jrank.org/articles/pages/6904/SIMD-Single-Instruction-Multiple-Data-Processing.html SIMD (Single Instruction Multiple Data)] processors whereas current multi-core CPUs offer a much smaller number of cores. Moreover, most GPUs do not support atomic operations or locks. Due to these architectural differences, there are three technical challenges in implementing the MapReduce framework on the GPU. First, the synchronization overhead in the runtime system of the framework must be low so that the system can scale to hundreds of processors. Second, due to the lack of dynamic thread scheduling on current GPUs, it is essential to allocate work evenly across threads on the GPU to exploit its massive thread parallelism. Third, the core tasks of MapReduce programs, including string processing, file manipulation and concurrent reads and writes, are unconventional for GPUs and must be handled efficiently.&lt;br /&gt;
&lt;br /&gt;
Mars, a MapReduce framework on the GPU, was designed and implemented with these challenges in mind. Mars provides a small set of APIs that are similar to those of CPU-based MapReduce. The runtime system utilizes a large number of GPU threads for Map or Reduce tasks, and automatically assigns each thread a small number of key/value pairs to work on. As a result, the massive thread parallelism on the GPU is well utilized. To avoid conflicts between concurrent writes, Mars uses a lock-free scheme with low runtime overhead on the massive thread parallelism of the GPU. This scheme guarantees the correctness of parallel execution with little synchronization overhead.&lt;br /&gt;
&lt;br /&gt;
=== Mars API ===&lt;br /&gt;
Mars provides a small set of APIs. Similar to the existing MapReduce frameworks, Mars has two kinds of APIs: the user-implemented APIs, which the users implement, and the system-provided APIs, which the users can use as library calls. Mars has the following user-implemented APIs. These APIs are implemented in C/C++. The void* type is used so that the developer can manipulate strings and other complex data types conveniently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
//MAP_COUNT counts result size of the map function.&lt;br /&gt;
void MAP_COUNT(void *key, void *val, int keySize, int valSize);&lt;br /&gt;
//The map function.&lt;br /&gt;
void MAP(void *key, void* val, int keySize, int valSize);&lt;br /&gt;
//REDUCE_COUNT counts result size of the reduce function.&lt;br /&gt;
void REDUCE_COUNT(void* key, void* vals, int keySize, int valCount);&lt;br /&gt;
//The reduce function.&lt;br /&gt;
void REDUCE(void* key, void* vals, int keySize, int valCount);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Mars has the following four system-provided APIs. The emit functions are used in user-implemented map and reduce functions to output the intermediate/final results.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
//Emit the key size and the value size in MAP_COUNT.&lt;br /&gt;
void EMIT_INTERMEDIATE_COUNT(int keySize, int valSize);&lt;br /&gt;
//Emit an intermediate result in MAP.&lt;br /&gt;
void EMIT_INTERMEDIATE(void* key, void* val, int keySize, int valSize);&lt;br /&gt;
//Emit the key size and the value size in REDUCE_COUNT.&lt;br /&gt;
void EMIT_COUNT(int keySize, int valSize);&lt;br /&gt;
//Emit a final result in REDUCE.&lt;br /&gt;
void EMIT(void *key, void* val, int keySize, int valSize);&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Overall, the APIs in Mars are similar to those in the existing MapReduce frameworks such as Hadoop and Phoenix. The major difference is that Mars needs two APIs to implement the functionality of each CPU-based API: one counts the size of the results, and the other outputs the results. This is because the GPU does not support atomic operations, and the Mars runtime uses a two-step design for the result output.&lt;br /&gt;
&lt;br /&gt;
=== Implementation Details ===&lt;br /&gt;
&lt;br /&gt;
Since the GPU does not support dynamic memory allocation on the device memory during the execution of the GPU code, arrays are used as the main data structure. The input data, the intermediate result and the final result are stored in three kinds of arrays, i.e., the key array, the value array and the directory index. The directory index consists of an entry of &amp;lt;key offset, key size, value offset, value size&amp;gt; for each key/value pair. Given a directory index entry, the key or the value at the corresponding offset in the key array or the value array is fetched. With the array structure, space on the device memory is allocated for the input data as well as for the result output before executing the GPU program. However, the sizes of the output from the map and the reduce stages are unknown. The output scheme for the map stage is similar to that for the reduce stage.&lt;br /&gt;
&lt;br /&gt;
First, each map task outputs three counts, i.e., the number of intermediate results, the total size of keys (in bytes) and the total size of values (in bytes) generated by the map task. Based on the key sizes (or value sizes) of all map tasks, the runtime system computes a prefix sum on these sizes and produces an array of write locations. A write location is the start location in the output array at which the corresponding map task writes. Based on the number of intermediate results, the runtime system computes a prefix sum and produces an array of start locations in the output directory index for the corresponding map task. Through these prefix sums, the sizes of the arrays for the intermediate results are also known. Thus, the runtime allocates arrays in the device memory with the exact sizes for storing the intermediate results.&lt;br /&gt;
&lt;br /&gt;
Second, each map task outputs the intermediate key/value pairs to the output array and updates the directory index. Since each map has its deterministic and non-overlapping positions to write to, the write conflicts are avoided. This two-step scheme does not require the hardware support of atomic functions. It is suitable for the massive thread parallelism on the GPU. However, it doubles the map computation in the worst case. The overhead of this scheme is application dependent, and is usually much smaller than that in the worst case.&lt;br /&gt;
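The two-step scheme above can be illustrated with an exclusive prefix sum over per-task counts (a plain-Python stand-in for the GPU kernels; the per-task counts are assumed values):

```python
from itertools import accumulate

# Step 1: each map task reports how many intermediate results it will
# produce (on the GPU these counts would come from MAP_COUNT).
counts = [3, 0, 2, 4]

# An exclusive prefix sum gives each task its start location in the
# shared output array, so writes are deterministic and non-overlapping.
write_locations = [0] + list(accumulate(counts))[:-1]
total = sum(counts)

# Step 2: task i writes its results starting at write_locations[i],
# with no locks or atomic operations needed.
output = [None] * total
for task, (start, count) in enumerate(zip(write_locations, counts)):
    for j in range(count):
        output[start + j] = (task, j)

print(write_locations)  # [0, 3, 3, 5]
```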
&lt;br /&gt;
=== Optimization Techniques ===&lt;br /&gt;
==== Memory Optimizations ====&lt;br /&gt;
Two memory optimizations are used to reduce the number of memory requests in order to improve the memory bandwidth utilization. &lt;br /&gt;
===== Coalesced accesses =====&lt;br /&gt;
The GPU feature of coalesced accesses is utilized to improve the memory performance. The memory accesses of each thread to the data arrays are designed according to the coalesced access pattern when applicable. Suppose there are T threads in total and the number of key/value pairs in the map stage is N. Thread i processes the (i + T·k)-th (k = 0, ..., N/T) key/value pair. Due to the SIMD property of the GPU, the memory addresses from the threads within a thread group are consecutive and these accesses are coalesced into one. The figure below illustrates the map stage with and without the coalesced access optimization.&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:Mars.jpg]]&amp;lt;br&amp;gt;&lt;br /&gt;
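The strided assignment described above can be illustrated as follows (T and N are assumed values; in each round k, threads 0..T-1 touch the consecutive pairs kT..kT+T-1, which is what makes the accesses coalesce on the GPU):

```python
# Thread i processes pairs i, i+T, i+2T, ...; consecutive threads
# therefore touch consecutive array positions in every round.
T, N = 4, 10  # illustrative thread count and number of pairs
assignment = {i: list(range(i, N, T)) for i in range(T)}
print(assignment)
# {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```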
&lt;br /&gt;
===== Accesses using built-in vector types =====&lt;br /&gt;
Accessing the values in the device memory can be costly, because the data values are often&lt;br /&gt;
of different sizes and the accesses are hardly coalesced. Fortunately, GPUs such as G80 support built-in vector types such as char4 and int4. Reading built-in vectors fetches the entire vector&lt;br /&gt;
in a single memory request. Compared with reading char or int, the number of memory requests is greatly reduced and the memory performance is improved.&lt;br /&gt;
&lt;br /&gt;
==== Thread parallelism ====&lt;br /&gt;
The thread configuration, i.e., the number of thread groups and the number of threads per thread group, is related to multiple factors, including (1) the hardware configuration, such as the number of multiprocessors and the on-chip computation resources such as the number of registers on each multiprocessor, and (2) the computation characteristics of the map and the reduce tasks, e.g., whether they are memory- or computation-intensive. Since the map and the reduce functions are implemented by the developer, and their costs are unknown to the runtime system, it is difficult to find the optimal thread configuration at run time.&lt;br /&gt;
&lt;br /&gt;
==== Handling variable-sized types ==== &lt;br /&gt;
The variable-sized types are supported with the directory index. If two key/value pairs need to be swapped, their corresponding entries in the directory index are swapped without modifying the key and the value arrays. This choice is to save the swapping cost since the directory entries are typically much smaller than the key/value pairs. Even though swapping changes the order of entries in the directory index, the array layout is preserved and therefore accesses to the directory index can still be coalesced after swaps. Since strings are a typical variable-sized type, and string processing is common in web data analysis tasks, a GPU-based string manipulation library was developed for Mars. The operations in the library include strcmp, strcat, memset and so on. The APIs of these operations are consistent with those in C/C++ library on the CPU. The difference is that simple algorithms for these GPU-based string operations were used, since they usually handle small strings within a map or a reduce task. In addition, char4 is used to implement strings to optimize the memory performance.&lt;br /&gt;
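The directory-index swap can be sketched as follows (the entry layout of &amp;lt;key offset, key size, value offset, value size&amp;gt; follows the description above; the concrete keys are assumptions):

```python
# Key array (never moved) and one directory entry per key/value pair:
# (key offset, key size, value offset, value size).
keys = bytearray(b"applebanana")
directory = [(0, 5, 0, 4), (5, 6, 4, 4)]

def swap_pairs(directory, i, j):
    # Swap only the small fixed-size directory entries, not the
    # variable-sized keys and values themselves.
    directory[i], directory[j] = directory[j], directory[i]

swap_pairs(directory, 0, 1)
off, size, _, _ = directory[0]
print(keys[off:off + size].decode())  # banana
```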
&lt;br /&gt;
==== Hashing ====&lt;br /&gt;
Hashing is used in the sort algorithm to store results with the same key value consecutively; in that case the results need not be in strictly ascending or descending key order. A hashing technique that hashes a key into a 32-bit integer is used, and the records are sorted according to their hash values. When two records are compared, their hash values are compared first. Only when their hash values are the same are their keys fetched and compared. Given a good hash function, the probability of having to compare the keys is low.&lt;br /&gt;
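The hash-first comparison can be sketched as follows (a hypothetical Python illustration; CRC32 stands in for the 32-bit hash, and the records are assumed values):

```python
import zlib

def sort_key(record_key):
    # Compare 32-bit hashes first; the key bytes are consulted only
    # to break ties, so full key comparisons are rare.
    return (zlib.crc32(record_key), record_key)

records = [b"banana", b"apple", b"apple", b"cherry"]
grouped = sorted(records, key=sort_key)

# Equal keys end up consecutive even though the overall order follows
# the hash values rather than strict lexicographic order.
positions = [i for i, r in enumerate(grouped) if r == b"apple"]
print(positions[1] == positions[0] + 1)  # True
```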
&lt;br /&gt;
==== File manipulation ====&lt;br /&gt;
Currently, the GPU cannot directly access data on the hard disk. Thus, file manipulation is performed in three phases with the assistance of the CPU. First, file I/O is performed on the CPU and the file data is loaded into a buffer in main memory; to reduce the I/O stall, multiple threads perform the I/O task. Second, the buffered data is preprocessed to obtain the input key/value pairs. Finally, the input key/value pairs are copied to the GPU device memory.&lt;br /&gt;
&lt;br /&gt;
= Summary =&lt;br /&gt;
Google’s MapReduce runtime implementation targets large clusters of Linux PCs connected through Ethernet switches. Tasks are forked using remote procedure calls, and buffering and communication occur by reading and writing files on a distributed file system. The locality optimizations focus mostly on avoiding remote file accesses. While such a system is effective for distributed computing, it leads to very high overheads if used with shared-memory systems, which facilitate communication through memory and are typically of much smaller scale.&lt;br /&gt;
&lt;br /&gt;
Phoenix, a shared-memory implementation of MapReduce, minimizes the overheads of task spawning and data communication. With Phoenix, the programmer provides a simple, functional expression of the algorithm and leaves parallelization and scheduling to the runtime system. Phoenix achieves scalable performance on both multi-core chips and conventional symmetric multiprocessors, and it automatically handles key scheduling decisions during parallel execution. Despite runtime overheads, results show that the performance of Phoenix is nearly the same as that of parallel code written directly with the Pthreads API. Nevertheless, there are also applications that do not fit naturally into the MapReduce model, for which Pthreads code performs significantly better.&lt;br /&gt;
&lt;br /&gt;
Graphics processors have emerged as a commodity platform for parallel computing. However, developing GPU applications requires knowledge of the GPU architecture and considerable effort, and this difficulty is compounded for complex, performance-centric tasks such as web data analysis. Since MapReduce has been successful in easing the development of web data analysis tasks, a GPU-based MapReduce can be used for these applications. With the GPU-based framework, developers write their code using the simple and familiar MapReduce interfaces, while the runtime on the GPU is completely hidden from them by the framework.&lt;br /&gt;
&lt;br /&gt;
= References =&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|1.]]&amp;lt;/span&amp;gt; [http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters. Sanjay Ghemawat, Jeffrey Dean] &amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|2.]]&amp;lt;/span&amp;gt; [http://www.ece.rutgers.edu/~parashar/Classes/08-09/ece572/readings/mars-pact-08.pdf Mars: A MapReduce Framework on Graphics Processors. Bingsheng He, Wenbin Fang, Qiong Luo, Naga K. Govindaraju, Tuyong Wang] &amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|3.]]&amp;lt;/span&amp;gt; [http://csl.stanford.edu/~christos/publications/2007.cmp_mapreduce.hpca.pdf Evaluating MapReduce for Multi-core and Multiprocessor Systems. Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis] &amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|4.]]&amp;lt;/span&amp;gt; [http://hadoop.apache.org/mapreduce/ Hadoop MapReduce] &amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|5.]]&amp;lt;/span&amp;gt; [http://code.google.com/edu/parallel/mapreduce-tutorial.html Google MapReduce] &amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;span id=&amp;quot;1foot&amp;quot;&amp;gt;[[#1body|6.]]&amp;lt;/span&amp;gt; [http://csl.stanford.edu/~christos/publications/2011.phoenixplus.mapreduce.pdf Phoenix++: Modular MapReduce for Shared Memory Systems. Justin Talbot, Richard M.Yoo Christos Kozyrakis ] &amp;lt;br&amp;gt;&lt;/div&gt;</summary>
		<author><name>Jjohn</name></author>
	</entry>
</feed>