<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Smalexa2</id>
	<title>Expertiza_Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.expertiza.ncsu.edu/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Smalexa2"/>
	<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=Special:Contributions/Smalexa2"/>
	<updated>2026-06-03T21:05:27Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62781</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62781"/>
		<updated>2012-04-26T05:52:53Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* CHIPPER (Cheap-Interconnect Partially Permuting Router) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever-increasing performance gains from single-core architectures to leveraging the power of multiple cores per die.  This creates new challenges not present in single-core systems.  A multi-core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, die area used, and the speed at which messages are delivered.  As physical wire widths decrease and the number of wires increases, the gap between gate delay and wire delay widens.[[#References|[14]]]  To address these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural consequence of the high integration levels that multiprocessor designs now reach. Moore's law predicted that the number of transistors in an integrated circuit would double roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, with multi-core (more than one core) and many-core (so many cores that the traditional multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and Intel has more recently coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to be of limited use for communication among the many cores inside a single chip. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the subject of this article, there are at least three different acronyms referring to the same concept. These are new technologies, and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
Certain metrics help classify and compare the different topology types. Some of them are defined in Chapter 12 of Solihin's textbook [[#References|[3]]].&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count over all pairs of nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
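&lt;br /&gt;
The metrics above can be checked on a small network. The following is a minimal sketch for a k x k 2-D mesh without wraparound links; the function names and the (x, y) coordinate convention are assumptions made for this illustration, and a hop is counted here as one link traversal. The bisection helper counts the links that cross the middle vertical cut, which for an even k is the known minimum.&lt;br /&gt;

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, k):
    """Return the neighbors of (x, y) in a k x k 2-D mesh (no wraparound links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # Membership in range(k) filters out coordinates that fall off the die edge.
    return [(cx, cy) for cx, cy in candidates if cx in range(k) and cy in range(k)]


def degree(node, k):
    """Degree: number of nodes reachable from this node in one hop."""
    return len(mesh_neighbors(node, k))


def hop_counts_from(source, k):
    """Breadth-first search giving the minimum hop count from source to every node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def diameter(k):
    """Diameter: the largest minimum hop count over all pairs of nodes."""
    nodes = product(range(k), repeat=2)
    return max(max(hop_counts_from(n, k).values()) for n in nodes)


def middle_cut_links(k):
    """Links crossing the vertical cut between columns k//2 - 1 and k//2 (even k)."""
    half = k // 2
    return sum(1 for y in range(k) if (half, y) in mesh_neighbors((half - 1, y), k))
```

For a 4 x 4 mesh this gives a degree of 2 at a corner, 3 on an edge and 4 in the interior, which is the edge asymmetry that the 2-D Mesh section discusses.&lt;br /&gt;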
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called its neighbouring nodes. Each node contains a network interface that acts as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D mesh. An example of an indirect topology is the butterfly, the network from which the flattened butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are listed here only for completeness. Related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks]&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology resembles the physical die layout, which makes it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, since the network requires two channels in each dimension (2n in general), although some ports on edge switches are unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
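&lt;br /&gt;
As a sanity check on the hop-count expressions above, the average can also be computed by brute force. This sketch assumes that the average is taken over all ordered pairs of nodes, including a node paired with itself, and that the minimum hop count in a mesh is the sum of the coordinate differences; the function names are hypothetical. Under that counting convention the brute-force value equals the odd-k expression, and for an even k it differs from nk/3 by 1/(3k) per dimension.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n):
    """Exact average minimum hop count of a k-ary n-dimensional mesh.

    Averaged over all ordered (source, destination) pairs, self pairs included.
    """
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


def closed_form_odd_k(k, n):
    """The odd-k entry of the table above: n(k/3 - 1/(3k))."""
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


if __name__ == "__main__":
    for k in (3, 5):
        print(k, average_min_hops(k, 2), closed_form_odd_k(k, 2))
```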
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important when scaling the solution. However, it is not as scalable as it might seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It supports non-minimal routing, which improves load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels (MECS) was proposed by Grot and Keckler in [[#References|[1]]]. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth per channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled in MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 Teraflops), accomplished with 80 simple cores, each containing 2 floating-point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs each contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect is named the Element Interconnect Bus (EIB), and it allows the different components of the Cell to communicate with one another and with external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core which manages eight synergistic processing engines (SPEs) that can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but it can suffer from packet corruption, so a scheme to handle this must be implemented.[[#References|[7]]]&lt;br /&gt;
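&lt;br /&gt;
The latency difference between the two families can be illustrated with a common first-order model for an unloaded network (no contention). This is a sketch under stated assumptions rather than a model taken from the cited sources: the parameter names are hypothetical, time is measured in cycles, and every hop is assumed to have the same router delay and channel width.&lt;br /&gt;

```python
def serialization_cycles(packet_bits, channel_bits_per_cycle):
    """Cycles needed to push a whole packet across one channel."""
    return packet_bits / channel_bits_per_cycle


def store_and_forward_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    """Each hop waits for the entire packet before forwarding it."""
    per_hop = router_delay + serialization_cycles(packet_bits, channel_bits_per_cycle)
    return hops * per_hop


def cut_through_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    """The header is forwarded immediately, so serialization is paid only once."""
    return hops * router_delay + serialization_cycles(packet_bits, channel_bits_per_cycle)


if __name__ == "__main__":
    # Illustrative numbers only: 4 hops, 2-cycle routers, 128-bit packet, 32-bit channels.
    print(store_and_forward_latency(4, 2, 128, 32))
    print(cut_through_latency(4, 2, 128, 32))
```

In this model the store-and-forward penalty grows with the product of hop count and packet length, which is why that scheme is more attractive when packets are small relative to the channel width.&lt;br /&gt;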
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing, both resulting in packets never reaching their destination.  They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.  This is analogous to the [http://en.wikipedia.org/wiki/Dining_philosophers_problem Dining Philosophers Problem]: each deadlocked message is waiting on the result of another deadlocked message, and none are able to reach their destination.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]  This is similar to deadlock in that the message never reaches its destination, but the message is still able to travel through portions of the network, making hops but never reaching its target.  This is analogous to a process spinning while waiting: the process does no useful work, but it is still active.  &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating their size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can therefore determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet with a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - This variation models up to two future hops before deciding where to send the packet next.    &lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - This variation adds packet multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - This variation adds the ability to broadcast messages to only certain regions (segments) of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
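&lt;br /&gt;
The deflection step described above can be sketched for a single router. This is an illustration only and not the arbitration logic of the cited paper: it assumes an interior router of a 2-D mesh with four output ports, a coordinate convention in which E increases x and N increases y, and an oldest-first priority among contending flits. Flits addressed to the local node are assumed to have been ejected already. All names are hypothetical.&lt;br /&gt;

```python
import math

PORTS = ("N", "E", "S", "W")


def productive_ports(position, destination):
    """Output ports that move a flit one hop closer to its destination."""
    (x, y), (dest_x, dest_y) = position, destination
    ports = []
    if dest_x != x:
        ports.append({1: "E", -1: "W"}[int(math.copysign(1, dest_x - x))])
    if dest_y != y:
        ports.append({1: "N", -1: "S"}[int(math.copysign(1, dest_y - y))])
    return ports


def assign_output_ports(position, flits):
    """Give every incoming flit its own output port; losers are deflected.

    flits is a list of (age, destination) tuples, at most one per input port.
    Returns a dict mapping the flit index to the port it leaves on.
    """
    free = list(PORTS)
    assignment = {}
    # Assumed priority rule: the oldest flit chooses first.
    order = sorted(range(len(flits)), key=lambda i: flits[i][0], reverse=True)
    for i in order:
        _, destination = flits[i]
        wanted = [p for p in productive_ports(position, destination) if p in free]
        # No productive port left: deflect to any free port instead of buffering.
        port = wanted[0] if wanted else free[0]
        free.remove(port)
        assignment[i] = port
    return assignment
```

Because the router has at least as many output ports as input ports, the list of free ports can never be empty while a flit is still waiting, which is the first of the two constraints stated above.&lt;br /&gt;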
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirement so that only the highest-priority flit is guaranteed to obtain the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  It does this by choosing a single packet and prioritizing that packet globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered (the Golden Packet scheme).[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each direction is chosen in order and routed completely before switching to the next direction.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
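&lt;br /&gt;
Both examples above can be written down in a few lines. This is a minimal sketch with hypothetical function names: mesh nodes are assumed to be (x, y) tuples, hypercube nodes are assumed to be integers whose bits are the node address, and bit positions are visited from least to most significant.&lt;br /&gt;

```python
import math


def xy_route(source, destination):
    """Nodes visited when routing fully in X first and then in Y (2-D mesh)."""
    x, y = source
    dest_x, dest_y = destination
    path = [(x, y)]
    step_x = int(math.copysign(1, dest_x - x))
    for next_x in range(x + step_x, dest_x + step_x, step_x):
        path.append((next_x, y))
    step_y = int(math.copysign(1, dest_y - y))
    for next_y in range(y + step_y, dest_y + step_y, step_y):
        path.append((dest_x, next_y))
    return path


def hypercube_route(source, destination, dimensions):
    """Nodes visited when correcting differing address bits in a fixed order."""
    node = source
    path = [node]
    for bit in range(dimensions):
        mask = 2 ** bit
        # Flip this bit only if the current node and the destination differ in it.
        if ((node ^ destination) // mask) % 2 == 1:
            node ^= mask
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(hypercube_route(0b000, 0b101, 3))
```

Because the order of dimensions is fixed, the same pair of nodes always yields the same path, which is what makes the scheme deterministic.&lt;br /&gt;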
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been conducting extensive research on a photonic layer inside the CMP that is used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[6]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC for a given application and at run time, in order to react to congestion or, more generally, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the  [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M., [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R., [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F., [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S., [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Source routing&lt;br /&gt;
&lt;br /&gt;
a) includes information in the packet about the destination node&lt;br /&gt;
&lt;br /&gt;
b) uses routing information calculated by the sending node&lt;br /&gt;
&lt;br /&gt;
c) all of the above&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
10. Store and forward routing&lt;br /&gt;
&lt;br /&gt;
a) requires the entire message to be broken into regular sized pieces and sent over the network&lt;br /&gt;
&lt;br /&gt;
b) is an optimal routing protocol&lt;br /&gt;
&lt;br /&gt;
c) buffers the entire message in each node along the route before sending it to the next node&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62780</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62780"/>
		<updated>2012-04-26T05:51:30Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Logic Based Distributed Routing (LBDR) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[14]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural consequence of the high integration levels now reached in multiprocessor designs. Moore's law predicts that the number of transistors in an integrated circuit doubles roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to be of limited use for communication among the many cores inside a single chip. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are much less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words, the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count between any pair of nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
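&lt;br /&gt;
As a rough illustration of the metrics above, the following minimal Python sketch evaluates them for a k x k 2D mesh. The function names are hypothetical, hop count is taken here as the number of links traversed (the Manhattan distance), and the bisection width is counted for a cut through the middle of the mesh, which assumes an even k.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product


def mesh_neighbors(node, k):
    """Return the nodes one hop away from `node` in a k x k 2D mesh."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if cx in range(k) and cy in range(k)]


def hop_count(src, dst):
    """Minimum hop count between two mesh nodes (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_metrics(k):
    """Degree range, diameter and bisection width of a k x k mesh (even k assumed)."""
    nodes = list(product(range(k), repeat=2))
    degrees = [len(mesh_neighbors(n, k)) for n in nodes]
    diameter = max(hop_count(a, b) for a in nodes for b in nodes)
    # One horizontal link per row crosses the cut between the two middle columns.
    bisection = sum(1 for (x, y) in nodes if x == k // 2 - 1)
    return {
        "min_degree": min(degrees),
        "max_degree": max(degrees),
        "diameter": diameter,
        "bisection_width": bisection,
    }


print(mesh_metrics(4))
```
&lt;br /&gt;
For a 4x4 mesh the corner nodes have the lowest degree and the interior nodes the highest, which is the asymmetry discussed in the 2-D Mesh section.&lt;br /&gt;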
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the butterfly, from which the Flattened Butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks]&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
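&lt;br /&gt;
The closed-form expressions in the table can be checked against a brute-force average. The sketch below is only an illustration: the function names are hypothetical, and the brute-force version averages the Manhattan distance over all ordered (source, destination) pairs, including pairs where source and destination coincide, which is an assumption about how the average is defined.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction
from itertools import product


def formula_avg_hops(k, n):
    """Average minimum hop count of a k-ary n-dimensional mesh, per the table."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def brute_force_avg_hops(k, n):
    """Average Manhattan distance over all ordered (source, destination) pairs."""
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


for k in (3, 4, 5):
    print(k, formula_avg_hops(k, 2), brute_force_avg_hops(k, 2))
```
&lt;br /&gt;
Under this definition the odd-k expression matches the brute-force value exactly (16/9 for a 3x3 mesh), while the even-k expression behaves as a close approximation (8/3 against a brute-force value of 5/2 for a 4x4 mesh).&lt;br /&gt;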
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
For the disadvantages, it has high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k which is the network radix (nodes/dimension) and c (concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
Maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k in the case of the Concentrated Mesh, which makes flattened butterfly and MECS the better solutions, with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The number of bisection channels is 1 for CMesh in both cases, but it doubles and even quadruples between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, getting attenuated in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built around an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 teraflops), achieved with 80 simple cores, each containing 2 floating-point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use the Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs each contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect is called the Element Interconnect Bus (EIB) and allows communication among the different components of the Cell, as well as between them and the external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core which manages eight synergistic processing engines (SPEs) that can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption, so a scheme to handle corrupted packets must be implemented.[[#References|[7]]]&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing, both resulting in packets never reaching their destination.  They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.  This is analogous to the [http://en.wikipedia.org/wiki/Dining_philosophers_problem Dining Philosophers Problem]: each deadlocked message is waiting on the result of another deadlocked message, and none are able to reach their destination.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]  This is similar to deadlock in that the message never reaches its destination, but the message is still able to travel through portions of the network, making hops but never reaching its target.  This is analogous to a process spinning while waiting: the process itself is doing meaningless work, but it is still active.  &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating their size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the network and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - This variation models up to two future hops before deciding where to send the packet next.    &lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - This variation adds packet multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - This variation adds the ability to broadcast messages to only certain regions (segments) of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
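&lt;br /&gt;
The port allocation described above can be sketched for a single router as follows. This is only an illustration under stated assumptions: the Flit class, its fields and the assign_ports function are hypothetical names, oldest-first arbitration is assumed as the priority rule, and a deflected flit simply takes the first free port. It relies on the constraint above that a router has at least as many output ports as incoming flits.&lt;br /&gt;
&lt;br /&gt;
```python
from dataclasses import dataclass


@dataclass
class Flit:
    name: str
    age: int          # cycles spent in the network (hypothetical priority field)
    wanted_port: str  # output port on the productive path


def assign_ports(flits, ports):
    """Give every incoming flit an output port; contention losers are deflected."""
    free = list(ports)
    assignment = {}
    # Oldest-first arbitration is an assumption made for this sketch.
    for flit in sorted(flits, key=lambda f: f.age, reverse=True):
        if flit.wanted_port in free:
            port = flit.wanted_port
        else:
            port = free[0]  # deflection: take any remaining free port
        free.remove(port)
        assignment[flit.name] = port
    return assignment


incoming = [Flit("a", 5, "N"), Flit("b", 3, "N"), Flit("c", 1, "E")]
print(assign_ports(incoming, ["N", "E", "S", "W"]))
```
&lt;br /&gt;
In this example flit a wins the contested north port, flit b is deflected to the east port, and that in turn deflects flit c. No flit is ever buffered or dropped, which is the point of the bufferless design.&lt;br /&gt;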
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme (the Golden Packet scheme): a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This is extensible to higher-dimensional networks as well; for example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
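&lt;br /&gt;
Both cases described above can be sketched in a few lines. The function names are hypothetical; nodes of the 2D mesh are assumed to be addressed by (x, y) coordinates and hypercube nodes by integers whose bits are the per-dimension coordinates, with the lowest bit position routed first.&lt;br /&gt;
&lt;br /&gt;
```python
def step_toward(current, target):
    """Move one position along a dimension toward `target` (current != target)."""
    delta = target - current
    return current + delta // abs(delta)


def xy_route(src, dst):
    """Nodes visited when routing fully in X first, then in Y, on a 2D mesh."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = step_toward(x, dst[0])
        path.append((x, y))
    while y != dst[1]:
        y = step_toward(y, dst[1])
        path.append((x, y))
    return path


def ecube_route(src, dst, dims):
    """Dimension-order route in a hypercube, correcting one differing bit at a time."""
    node = src
    path = [node]
    for bit in range(dims):
        mask = 2 ** bit
        if (node // mask) % 2 != (dst // mask) % 2:
            node ^= mask  # flip this bit: one hop along this dimension
            path.append(node)
    return path


print(xy_route((0, 0), (2, 1)))
print(ecube_route(0b000, 0b101, 3))
```
&lt;br /&gt;
Because the order of dimensions is fixed, the same source and destination pair always yields the same path, which is what makes the scheme deterministic.&lt;br /&gt;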
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles like [[#References|[6]]] which propose a new topology created for optical on-chip interconnects. They refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application and, during run-time, to react to congestion problems or, in general, adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the  [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters/articles on emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Source routing&lt;br /&gt;
&lt;br /&gt;
a) includes information in the packet about the destination node&lt;br /&gt;
&lt;br /&gt;
b) uses routing information calculated by the sending node&lt;br /&gt;
&lt;br /&gt;
c) all of the above&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
10. Store and forward routing&lt;br /&gt;
&lt;br /&gt;
a) requires the entire message to be broken into regular sized pieces and sent over the network&lt;br /&gt;
&lt;br /&gt;
b) is an optimal routing protocol&lt;br /&gt;
&lt;br /&gt;
c) buffers the entire message in each node along the route before sending it to the next node&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62779</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62779"/>
		<updated>2012-04-26T05:43:05Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Deadlock and Livelock */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever-increasing performance gains from single-core architectures to leveraging the power of multiple cores per die.  This creates new challenges not present in single-core systems.  A multi-core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths decrease and the number of wires increases, the gap between gate delay and wire delay widens.[[#References|[14]]]  To address these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to be of limited use for communication among the many cores inside a single chip. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
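&lt;br /&gt;
As a minimal sketch, the first three metrics can be computed by brute force for a k x k 2D mesh. The function names are illustrative only, a hop is counted here as one link traversed, and the mesh is assumed to have no wrap-around links.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product


def mesh_nodes(k):
    """All (x, y) coordinates of a k x k 2D mesh."""
    return list(product(range(k), repeat=2))


def neighbors(node, k):
    """Nodes reachable from `node` in one hop (no wrap-around links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if cx in range(k) and cy in range(k)]


def degree(node, k):
    """Degree is the number of one-hop neighbors."""
    return len(neighbors(node, k))


def hop_count(src, dst):
    """Minimal hop count in a mesh is the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def diameter(k):
    """Diameter is the maximum minimal hop count over all node pairs."""
    nodes = mesh_nodes(k)
    return max(hop_count(a, b) for a in nodes for b in nodes)
```
&lt;br /&gt;
For a 4x4 mesh, a corner node such as (0, 0) has degree 2, an interior node such as (1, 1) has degree 4, and the diameter is 6 (corner to opposite corner).&lt;br /&gt;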
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks]&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, because the network requires two channels in each dimension (2n), although some ports on the edges will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio results in a lower bisection channel count, but this can be mitigated by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each computing node, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 teraflops), achieved with 80 simple cores, each containing 2 floating-point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs each contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect is called the Element Interconnect Bus (EIB); it allows the different components of the Cell to communicate with each other and with external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core which manages eight synergistic processing engines (SPEs) that can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoC's], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
Both protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but it can suffer from packet corruption and must implement a scheme to handle this.[[#References|[7]]]&lt;br /&gt;
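&lt;br /&gt;
The latency difference between the two schemes can be sketched with a simplified zero-load model. This model is an assumption for illustration and is not taken from the cited references: it ignores contention, assumes every router adds the same fixed delay, and uses hypothetical parameter names.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    """Cycles to deliver a packet when every hop buffers the whole packet."""
    serialization = packet_bits / channel_bits_per_cycle
    # The full serialization time is paid again at every hop.
    return hops * (router_delay + serialization)


def cut_through_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    """Cycles to deliver a packet when the header is forwarded immediately."""
    serialization = packet_bits / channel_bits_per_cycle
    # Only the header pays the per-hop delay; the body is pipelined behind it.
    return hops * router_delay + serialization
```
&lt;br /&gt;
Under these assumptions, a 512-bit packet crossing 4 hops over 64-bit channels with a 2-cycle router delay takes 40 cycles with store and forward and 16 cycles with cut-through. The gap shrinks as the packet size approaches the channel width, which matches the remark above about small packets.&lt;br /&gt;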
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing, both resulting in packets never reaching their destination.  They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.  This is analogous to the [http://en.wikipedia.org/wiki/Dining_philosophers_problem Dining Philosophers Problem]: each deadlocked message is waiting on the result of another deadlocked message, and none are able to reach their destination.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]  This is similar to deadlock in that the message never reaches its destination, but the message is still able to travel through portions of the network, making hops but never reaching its target.  This is analogous to a process spinning while waiting: the process is doing meaningless work, but it is still active.  &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating their size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
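&lt;br /&gt;
A single arbitration step of a deflection router can be sketched as follows. This is an illustration only, not the published BLESS implementation: the oldest-first priority rule, the tuple layout, and the function name are assumptions made for the example. The only property taken from the text above is that there must be at least as many output ports as incoming flits, so that every flit leaves the router in the same cycle.&lt;br /&gt;
&lt;br /&gt;
```python
def arbitrate(flits, ports):
    """Assign every incoming flit to a distinct output port.

    flits: list of (flit_id, age, desired_port) tuples.
    ports: list of output port names, at least as many as there are flits.
    Returns a dict mapping flit_id to (assigned_port, was_deflected).
    """
    if len(flits) not in range(len(ports) + 1):
        raise ValueError("a bufferless router needs one output port per flit")
    free = list(ports)
    assignment = {}
    # Assumed priority rule: oldest flit first, ties broken by flit id.
    for flit_id, age, desired in sorted(flits, key=lambda f: (-f[1], f[0])):
        if desired in free:
            port, deflected = desired, False
        else:
            # Contention lost: deflect to the first free port instead.
            port, deflected = free[0], True
        free.remove(port)
        assignment[flit_id] = (port, deflected)
    return assignment
```
&lt;br /&gt;
With ports N, E, S, W and two flits requesting E, the older flit keeps E and the younger one is deflected to N, from where it must be routed again by the next router.&lt;br /&gt;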
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  By relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port, livelock can be prevented.  This is done with a novel scheme (the Golden Packet scheme): a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each direction is chosen in order and routed completely before switching to the next direction.  For example, in a 2D mesh, dimension order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This is extensible to higher order connections as well, for example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of different bit positions of the source and destination address, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
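&lt;br /&gt;
The two cases described above can be sketched as follows. The function names are illustrative, nodes of the mesh are assumed to be (x, y) tuples, hypercube nodes are assumed to be integers whose bits are the coordinates, and correcting the lowest bit first is an assumed ordering.&lt;br /&gt;
&lt;br /&gt;
```python
def sign(value):
    """Return -1, 0 or +1 according to the sign of an integer."""
    return 0 if value == 0 else value // abs(value)


def xy_route(src, dst):
    """Nodes visited in a 2D mesh when X is fully routed before Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += sign(dst[0] - x)
        path.append((x, y))
    while y != dst[1]:
        y += sign(dst[1] - y)
        path.append((x, y))
    return path


def ecube_route(src, dst, dimensions):
    """Nodes visited in a hypercube, correcting one address bit at a time."""
    node = src
    path = [node]
    for bit in range(dimensions):
        mask = 2 ** bit
        if (node // mask) % 2 != (dst // mask) % 2:
            node ^= mask  # cross the link in this dimension
            path.append(node)
    return path
```
&lt;br /&gt;
For example, routing from (0, 0) to (2, 1) visits (0, 0), (1, 0), (2, 0), (2, 1), and routing from node 0 to node 5 in a 3-dimensional hypercube visits 0, 1, 5. Because the path depends only on the source and destination, the scheme is deterministic.&lt;br /&gt;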
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, there are many lines of NoC research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[6]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application and, at run time, to react to congestion problems or, in general, adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Source routing&lt;br /&gt;
&lt;br /&gt;
a) includes information in the packet about the destination node&lt;br /&gt;
&lt;br /&gt;
b) uses routing information calculated by the sending node&lt;br /&gt;
&lt;br /&gt;
c) all of the above&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
10. Store and forward routing&lt;br /&gt;
&lt;br /&gt;
a) requires the entire message to be broken into regular sized pieces and sent over the network&lt;br /&gt;
&lt;br /&gt;
b) is an optimal routing protocol&lt;br /&gt;
&lt;br /&gt;
c) buffers the entire message in each node along the route before sending it to the next node&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62778</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62778"/>
		<updated>2012-04-26T05:31:44Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[14]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To enable communication among these many cores inside a single chip, traditional off-chip network designs have proved to be of limited use. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is defined as the number of neighbors of a node, or in other words, the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
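&lt;br /&gt;
To make these metrics concrete, the following minimal sketch computes node degree and diameter for a k x k 2D mesh using a breadth-first search. The function names are illustrative, and the sketch assumes that each link traversal counts as one hop.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque

def mesh_neighbors(node, k):
    # Neighbors of node (x, y) in a k x k 2D mesh (no wraparound links).
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]

def hop_counts(source, k):
    # Breadth-first search: minimum hop count from source to every node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def diameter(k):
    # Diameter: the largest minimum hop count over all pairs of nodes.
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)

k = 4
print(len(mesh_neighbors((0, 0), k)))  # degree of a corner node: 2
print(len(mesh_neighbors((1, 1), k)))  # degree of an interior node: 4
print(diameter(k))                     # 6, corner to opposite corner
```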
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks].&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
For the disadvantages, it has high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k which is the network radix (nodes/dimension) and c (concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
Maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k in the case of Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, while the count for the flattened butterfly is double or even quadruple that of MECS. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, getting attenuated in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh, and two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 Teraflops), accomplished with 80 simple cores, each containing 2 floating point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect is named the Element Interconnect Bus (EIB); it allows communication among the different components of the Cell and between them and the external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core which manages eight synergistic processing engines (SPEs) that can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption and must implement a scheme to handle this.[[#References|[7]]]&lt;br /&gt;
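&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a first-order model. The sketch below is a common textbook approximation, not a formula taken from the references of this article; it ignores contention, and the parameter names (hop count, per-hop router delay in cycles, packet size in bits, channel width in bits per cycle) are assumptions chosen for the example.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    # The whole packet is received (serialized) at every hop before moving on.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_delay + serialization)

def cut_through_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    # The header is forwarded at once, so serialization is paid only one time.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * router_delay + serialization

# Hypothetical numbers: 4 hops, 2-cycle routers, 512-bit packet, 64-bit channels.
print(store_and_forward_latency(4, 2, 512, 64))  # 40.0 cycles
print(cut_through_latency(4, 2, 512, 64))        # 16.0 cycles
```
&lt;br /&gt;
Under these assumptions, the gap grows with both the hop count and the packet size, which is consistent with store and forward being most attractive when packets are small relative to the channel width.&lt;br /&gt;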
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating their size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
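&lt;br /&gt;
The contrast between source routing and distributed routing can be sketched on a hypothetical three-router line network (A, B, C). The table contents and function names below are assumptions made for this example only.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical line network: A -- B -- C.
# ROUTING_TABLES[router][destination] gives the next router on the path.
ROUTING_TABLES = {
    'A': {'B': 'B', 'C': 'B'},
    'B': {'A': 'A', 'C': 'C'},
    'C': {'A': 'B', 'B': 'B'},
}

def distributed_route(src, dst):
    # Each router looks up only the next hop; the header holds just dst.
    path = [src]
    while path[-1] != dst:
        path.append(ROUTING_TABLES[path[-1]][dst])
    return path

def source_route(src, hops):
    # The sender precomputed every hop; routers only consume the header.
    path = [src]
    header = list(hops)
    while header:
        path.append(header.pop(0))
    return path

print(distributed_route('A', 'C'))       # ['A', 'B', 'C']
print(source_route('A', ['B', 'C']))     # ['A', 'B', 'C']
```
&lt;br /&gt;
Both calls produce the same path, but the cost falls in different places: source routing carries the full hop list in every packet header, while distributed routing needs a table entry per destination in every router.&lt;br /&gt;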
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
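&lt;br /&gt;
The contention rule described above can be sketched in a few lines of Python. The sketch assumes an oldest-first priority and a router with four output ports; the function name allocate_ports, the flit encoding, and the tie-breaking order are illustrative assumptions rather than the exact arbitration used in [[#References|[10]]].&lt;br /&gt;
&lt;br /&gt;
```python
def allocate_ports(flits, ports=("N", "E", "S", "W")):
    """Assign every incoming flit to a distinct output port.

    flits: list of (flit_id, age, desired_port) tuples (hypothetical encoding).
    Older flits choose first; a flit whose desired port is already taken is
    deflected to a remaining free port, so no flit is ever buffered.
    """
    if len(flits) > len(ports):
        # Violates the constraint that outputs are at least as many as inputs.
        raise ValueError("more flits than output ports")
    free = list(ports)
    assignment = {}
    # Highest age first; ties are broken by flit_id for determinism.
    for flit_id, age, desired in sorted(flits, key=lambda f: (-f[1], f[0])):
        port = desired if desired in free else free[0]
        free.remove(port)
        assignment[flit_id] = port
    return assignment
```
&lt;br /&gt;
Because every flit leaves on some port in the same cycle, the router needs no input buffers; the cost is that deflected flits travel extra hops.&lt;br /&gt;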
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by requiring only that the highest-priority flit obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and a packet is routed completely along one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-order networks as well. For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
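&lt;br /&gt;
The two examples above can be sketched as follows. This is a minimal illustration, not a router implementation: the function names, the coordinate encoding, and the choice to correct hypercube address bits from lowest to highest are assumptions made for this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst in a 2D mesh,
    routing fully along X before moving along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def ecube_route(src, dst, dims):
    """Return the nodes visited in a hypercube with `dims` dimensions,
    correcting differing address bits from lowest to highest (assumed order)."""
    node = src
    path = [node]
    for bit in range(dims):
        mask = 2 ** bit
        if (node ^ dst) // mask % 2 == 1:   # addresses differ in this bit
            node ^= mask
            path.append(node)
    return path
```
&lt;br /&gt;
For instance, xy_route((0, 0), (2, 1)) visits (0, 0), (1, 0), (2, 0), (2, 1), and ecube_route(0, 5, 3) visits nodes 0, 1, 5. Since the path depends only on the source and destination, the scheme is deterministic.&lt;br /&gt;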
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, there are several active lines of NoC research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has carried out extensive research on a photonic layer inside the CMP that is used not only to connect several cores but also to route traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computation, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[6]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to reconfigure the NoC at run time, depending on the application, in order to react to congestion or, more generally, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, vary the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
9. Source routing&lt;br /&gt;
&lt;br /&gt;
a) includes information in the packet about the destination node&lt;br /&gt;
&lt;br /&gt;
b) uses routing information calculated by the sending node&lt;br /&gt;
&lt;br /&gt;
c) all of the above&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
10. Store and forward routing&lt;br /&gt;
&lt;br /&gt;
a) requires the entire message to be broken into regular sized pieces and sent over the network&lt;br /&gt;
&lt;br /&gt;
b) is an optimal routing protocol&lt;br /&gt;
&lt;br /&gt;
c) buffers the entire message in each node along the route before sending it to the next node&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62777</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62777"/>
		<updated>2012-04-26T05:28:26Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[14]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited applicability for communication among the many cores inside a single chip. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still pose challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of nodes that are neighbors of a given node, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to reach the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
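&lt;br /&gt;
To make the degree and diameter metrics concrete, the following Python sketch computes them for a k x k 2D mesh by breadth-first search. The function names are hypothetical, and the sketch counts each link traversal as one hop, which is an assumption about how hops are counted.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    """Neighbors of node (x, y) in a k x k 2D mesh without wraparound links."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(src, k):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(k):
    """Maximum over all node pairs of the minimum hop count."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)
```
&lt;br /&gt;
In a 4x4 mesh, a corner node has degree 2, an interior node has degree 4, and the diameter is 6 (corner to opposite corner).&lt;br /&gt;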
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks]&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube , where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]], provide the following information on parameters for a mesh as defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
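&lt;br /&gt;
The average minimum hop count formulas above can be evaluated with a short Python sketch. The function name avg_min_hops is hypothetical, and the odd-k term is read as 1/(3k), which is an interpretation of the formula as written. Exact fractions are used to avoid rounding.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction


def avg_min_hops(k, n):
    """Average minimum hop count of a mesh described as a k-ary n-cube,
    using nk/3 for even k and n(k/3 - 1/(3k)) for odd k."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))
```
&lt;br /&gt;
For a 4x4 2D mesh (k = 4, n = 2) this gives 8/3 hops, and for a 3x3 2D mesh (k = 3, n = 2) it gives 16/9 hops.&lt;br /&gt;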
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is less scalable than it might seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This yields a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k which is the network radix (nodes/dimension) and c (concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh. This makes the flattened butterfly and MECS better solutions, with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 teraflops), achieved with 80 simple cores, each containing 2 floating-point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera's iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs each contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect, named the Element Interconnect Bus (EIB), allows the different components of the Cell to communicate with one another and with the external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core which manages eight synergistic processing engines (SPEs) that can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoC's], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
In these two protocols, the switch examines the flit header, decides where to send the message, and then starts forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption, so a scheme to handle corrupted packets must be implemented.[[#References|[7]]]&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in every packet, which increases packet size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies relative to itself.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
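&lt;br /&gt;
The contention rule described above can be sketched for a single router and a single cycle. This is a minimal illustration, not the arbitration logic of [[#References|[10]]]: the oldest-first priority rule, the choice of the first free port for a deflected flit, and all names are assumptions made for this example. It relies on the constraint stated above that a router has at least as many output ports as incoming flits.&lt;br /&gt;

```python
def deflect(flits, ports):
    """Assign every incoming flit to a distinct output port.

    flits: list of (age, wanted_port) tuples, one per incoming flit.
    ports: list of output port names; needs at least as many ports as flits.
    Returns a dict mapping each flit index to the port it is sent on.
    """
    free = list(ports)
    assignment = {}
    # Oldest flit first (assumed priority rule); ties broken by arrival index.
    order = sorted(range(len(flits)), key=lambda i: (-flits[i][0], i))
    for i in order:
        wanted = flits[i][1]
        port = wanted if wanted in free else free[0]  # deflect on contention
        free.remove(port)
        assignment[i] = port
    return assignment


# Two flits both want "E"; the older one wins and the other is deflected.
result = deflect([(3, "E"), (7, "E"), (1, "N")], ["N", "E", "S", "W"])
print(result)
```

In this run the deflected flit takes port N, which in turn forces the youngest flit, which wanted N, onto another port. No flit is ever buffered or dropped, but a deflection can displace other flits from their preferred ports.&lt;br /&gt;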
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  By limiting the requirements so that only the highest-priority flit is guaranteed to obtain its requested port, livelock can be prevented.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well. For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
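&lt;br /&gt;
Both cases described above can be sketched in a few lines. This is an illustrative sketch only: the coordinate convention for the mesh, the low-to-high bit order for the hypercube, and the function names are assumptions of this example rather than details taken from [[#References|[9]]].&lt;br /&gt;

```python
def sign(d):
    """Return -1, 0 or 1 according to the sign of the integer d."""
    return d // abs(d) if d else 0


def xy_route(src, dst):
    """Nodes visited in a 2D mesh when X is fully routed before Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += sign(dst[0] - x)
        path.append((x, y))
    while y != dst[1]:
        y += sign(dst[1] - y)
        path.append((x, y))
    return path


def ecube_route(src, dst, dimensions):
    """Nodes visited in a hypercube, fixing differing address bits low to high."""
    node = src
    path = [node]
    for bit in range(dimensions):
        mask = 2 ** bit
        if (node ^ dst) // mask % 2:  # addresses differ in this bit position
            node ^= mask
            path.append(node)
    return path


print(xy_route((0, 0), (2, 1)))
print(ecube_route(0b000, 0b101, 3))
```

Because the order of dimensions is fixed, the same source and destination always produce the same path, which is what makes the scheme deterministic.&lt;br /&gt;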
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[6]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application and, during run-time, to react to congestion problems or, in general, adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters and articles on emerging research issues in the ANoC field.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
9. Source routing&lt;br /&gt;
&lt;br /&gt;
a) includes information in the packet about the destination node&lt;br /&gt;
&lt;br /&gt;
b) uses routing information calculated by the sending node&lt;br /&gt;
&lt;br /&gt;
c) all of the above&lt;br /&gt;
&lt;br /&gt;
10.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62776</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62776"/>
		<updated>2012-04-26T05:26:38Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[14]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
For communication among these many cores inside a single chip, traditional off-chip network designs have proved to have limited applicability. According to [[#References|[2]]], off-chip designs suffered from I/O bottlenecks, which are a diminished problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of neighbors of a node, in other words, the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count over all pairs of nodes, taking the shortest path for each pair&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
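&lt;br /&gt;
As an illustration of the degree and diameter metrics, the following sketch computes them for a small 2D mesh using breadth-first search. It is a minimal example under stated assumptions: hop count is measured here as the number of links traversed on a shortest path, the 4x4 size is chosen arbitrarily, and all names are hypothetical.&lt;br /&gt;

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, k):
    """Neighbors of node (x, y) in a k x k 2D mesh without wraparound links."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(source, k):
    """Breadth-first search giving the minimum hop count to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


k = 4
nodes = list(product(range(k), repeat=2))
degrees = {n: len(mesh_neighbors(n, k)) for n in nodes}
diameter = max(max(hop_counts(n, k).values()) for n in nodes)
print(degrees[(0, 0)], degrees[(1, 1)], diameter)
```

For the 4x4 case this reports a degree of 2 for a corner node, a degree of 4 for an interior node, and a diameter of 6, the distance between opposite corners.&lt;br /&gt;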
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks].&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube , where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
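&lt;br /&gt;
The average minimum hop count expressions above can be evaluated and compared with a direct enumeration. This is a sketch under stated assumptions: the odd-k entry is read as n(k/3 - 1/(3k)), the enumeration averages the Manhattan distance over all ordered pairs of nodes including a node paired with itself, and the function names are hypothetical.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product


def average_hops_formula(k, n):
    """Closed form quoted above for a k-ary n-dimensional mesh."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def average_hops_enumerated(k, n):
    """Mean Manhattan distance over all ordered node pairs, self-pairs included."""
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


print(average_hops_formula(3, 2), average_hops_enumerated(3, 2))
print(average_hops_formula(4, 2), average_hops_enumerated(4, 2))
```

For k = 3 and n = 2 both functions give 16/9. For k = 4 and n = 2 the closed form gives 8/3 while the enumeration under this convention gives 5/2, so the even-k expression is best read as an approximation.&lt;br /&gt;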
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&lt;sup&gt;2&lt;/sup&gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k (the network radix, in nodes per dimension) and c (the concentration factor, with 1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS and is reduced in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and communication is carried out through message passing. Its name comes from its performance of one trillion mathematical calculations per second (1 Teraflops), achieved with 80 simple cores, each containing 2 floating-point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use the Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
The tiles that make up the Tilera designs each contain a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect is named the Element Interconnect Bus (EIB), and it allows communication among the different components of the Cell and with external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core that manages eight Synergistic Processing Elements (SPEs), which can be used for floating-point calculations. This floating-point throughput is what makes the chip attractive for gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  A cut-through protocol lowers latency, but because forwarding starts before the whole packet has arrived, a corrupted packet can be passed on, so a scheme to handle this must be implemented.[[#References|[7]]]&lt;br /&gt;
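&lt;br /&gt;
The latency difference between the two families can be illustrated with a simplified zero-load model. This is only a sketch: it assumes no contention, a fixed per-hop router delay, and a link that moves a fixed number of bits per cycle. The function and parameter names are illustrative and do not come from the cited references.&lt;br /&gt;

```python
# Zero-load latency sketch (in cycles); all names are illustrative assumptions.

def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    # Every hop waits for the whole packet before forwarding it,
    # so the serialization time is paid once per hop.
    serialization = packet_bits / link_bits_per_cycle
    return hops * (router_cycles + serialization)


def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    # Only the header pays the per-hop router delay; the rest of the
    # packet is pipelined behind it, so serialization is paid once.
    serialization = packet_bits / link_bits_per_cycle
    return hops * router_cycles + serialization


if __name__ == "__main__":
    # 4 hops, 512-bit packet, 32-bit links, 2-cycle routers
    print(store_and_forward_latency(4, 512, 32, 2))  # 72.0
    print(cut_through_latency(4, 512, 32, 2))        # 24.0
```

Under these assumptions the store and forward latency grows with the product of hop count and packet length, while the cut-through latency grows with their sum.&lt;br /&gt;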
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  Because this route information is sent in every packet, it increases the packet size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, a packet can be routed with a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
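&lt;br /&gt;
The port assignment at a single bufferless router can be sketched as follows. This is a minimal illustration, not the arbitration logic of [10]: it assumes that flits are ranked oldest first, that each flit carries one preferred output port, and that a flit that loses contention is deflected to the first free port. The names assign_ports, flits, and ports are hypothetical.&lt;br /&gt;

```python
# Sketch of one arbitration step in a bufferless deflection router.
# Assumptions (not from the cited paper): oldest-first priority, a single
# preferred port per flit, and deflection to the first free port.

def assign_ports(flits, ports):
    # flits: list of (age, preferred_port) tuples, one per incoming flit
    # ports: list of output port names; there must be at least as many
    #        output ports as flits, so every flit always gets some port
    free = list(ports)
    assignment = {}
    # Visit flits from oldest to youngest.
    order = sorted(range(len(flits)), key=lambda i: -flits[i][0])
    for i in order:
        age, preferred = flits[i]
        if preferred in free:
            chosen = preferred   # productive hop
        else:
            chosen = free[0]     # deflected to some other free port
        free.remove(chosen)
        assignment[i] = chosen
    return assignment


if __name__ == "__main__":
    incoming = [(5, "E"), (9, "E"), (1, "N")]
    print(assign_ports(incoming, ["N", "E", "S", "W"]))
```

In this example the oldest flit wins the contested port E, the second flit is deflected to N, and that deflection in turn pushes the youngest flit to S. No flit is ever stored, which is why the number of output ports must be at least the number of input ports.&lt;br /&gt;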
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Requiring only that the highest-priority flit obtain its requested port is enough to prevent livelock.  In the case of contention, arbitration logic chooses a winning flit using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher order connections as well.  For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
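&lt;br /&gt;
Both cases described above can be sketched in a few lines. This is an illustrative sketch only: nodes of the 2D mesh are assumed to be addressed by (x, y) coordinates, hypercube nodes by integers whose bits are the coordinates, and the function names xy_route and ecube_route are hypothetical.&lt;br /&gt;

```python
# Dimension-order routing sketches; names and addressing are assumptions.

def xy_route(src, dst):
    # 2D mesh: route completely along X, then completely along Y.
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:
        x += (dst_x - x) // abs(dst_x - x)   # step of +1 or -1
        path.append((x, y))
    while y != dst_y:
        y += (dst_y - y) // abs(dst_y - y)   # step of +1 or -1
        path.append((x, y))
    return path


def ecube_route(src, dst, dims):
    # Hypercube: correct the differing address bits one at a time,
    # starting from the lowest bit position.
    node = src
    path = [node]
    for bit in range(dims):
        weight = 2 ** bit
        if (node // weight) % 2 != (dst // weight) % 2:
            node ^= weight                   # flip this bit: one hop
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))   # [(0, 0), (1, 0), (2, 0), (2, 1)]
    print(ecube_route(0, 5, 3))       # [0, 1, 5]
```

Because the order of dimensions is fixed, the same pair of nodes always yields the same path, which is what makes the scheme deterministic.&lt;br /&gt;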
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, there are several active lines of NoC research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[6]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application, and during run-time, to react to congestion problems or, in general, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the  [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters on emerging research issues in the ANoC field.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
8. Dimension Order routing&lt;br /&gt;
&lt;br /&gt;
a) is only possible with 2D mesh-based topologies.&lt;br /&gt;
&lt;br /&gt;
b) attempts to route all packets in one dimension before starting another.&lt;br /&gt;
&lt;br /&gt;
c) uses routing tables to find the packet destination.&lt;br /&gt;
&lt;br /&gt;
9.&lt;br /&gt;
&lt;br /&gt;
10.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62775</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62775"/>
		<updated>2012-04-26T05:24:36Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Quiz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[14]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited application for communication among many cores inside a single chip. According to [[#References|[2]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still have challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[3]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
There are many different topologies that could be introduced in this section. Some of the missing topologies include but are not limited to:&lt;br /&gt;
&lt;br /&gt;
* Hypercube&lt;br /&gt;
* Shuffle-exchange&lt;br /&gt;
* Torus&lt;br /&gt;
* Trees&lt;br /&gt;
&lt;br /&gt;
They are cited here only for completeness; related information can be found at [http://www.cs.cf.ac.uk/Parallel/Year2/section5.html Interconnection Networks]&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|upright=0.75|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[2]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
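&lt;br /&gt;
The hop-count expressions quoted above can be evaluated directly. The sketch below simply encodes the two cases as listed, reading the odd case as n(k/3 minus 1/(3k)); the function name mesh_avg_min_hops is hypothetical and is not taken from [2].&lt;br /&gt;

```python
# Average minimum hop count of a k-ary n-dimensional mesh, using the
# two expressions quoted above. The function name is illustrative.

def mesh_avg_min_hops(k, n):
    if k % 2 == 0:
        return n * k / 3                   # k even
    return n * (k / 3 - 1 / (3 * k))       # k odd


if __name__ == "__main__":
    print(mesh_avg_min_hops(8, 2))   # 8x8 2D mesh
    print(mesh_avg_min_hops(3, 2))   # 3x3 2D mesh
```

For the 8x8 2D mesh the expression gives 16/3, a little over five hops on average, which grows linearly with k.&lt;br /&gt;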
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|upright=0.75|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. But it is not as scalable as it might seem, as its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[4]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|upright=0.75|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|upright=0.75|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k which is the network radix (nodes/dimension) and c (concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled in MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, getting attenuated in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication. There is a 5-port router inside each of the computing nodes, and the communication is carried out through message-passing. Its name comes from its performance of one trillion mathematical calculations per second (1 Teraflops), achieved with 80 simple cores, each containing 2 floating point units, while consuming only 62 watts (less than many other processors).&lt;br /&gt;
 &lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth. This design contains 48 fully functional cores and consumes only 25 watts. This newer model is more complete than the Teraflops Research model. It is fully programmable and is used for research by academia and private companies.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
Each tile in a Tilera design contains a complete processor with L1 and L2 caches. Each tile can run an operating system independently, or several tiles together can run a single operating system such as SMP Linux.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[5]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
The 3 building blocks can be used to create the specific design needed, with the input/output ports that the application requires. The blocks can be configured and stored in a library for creating the design. In the picture on the right, the example contains 2 of the building blocks (router and network interface) and a third undisclosed block.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional 16B-wide data rings, two in each direction. The interconnect, named the Element Interconnect Bus (EIB), allows communication among the different components of the Cell and with external I/O. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3]. The Cell consists of a PowerPC core that manages eight synergistic processing elements (SPEs), which can be used for floating-point calculations. These calculations provide the engine for better gaming systems.&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[7]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but the protocol can suffer from packet corruption and must implement a scheme to handle it.[[#References|[7]]]&lt;br /&gt;
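&lt;br /&gt;
As a rough illustration of why buffering a whole message at every node costs latency, the sketch below compares the two approaches under a simple no-contention model. The model is an assumption of this example and is not taken from [[#References|[7]]]: every router adds a fixed delay, and a packet needs a fixed number of cycles to cross one link. All function names, parameters, and numbers are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def serialization_cycles(packet_bits, link_bits_per_cycle):
    """Cycles needed to push a whole packet across one link."""
    return math.ceil(packet_bits / link_bits_per_cycle)


def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_cycles=1):
    """Every hop waits for the entire packet before forwarding it."""
    per_hop = router_cycles + serialization_cycles(packet_bits, link_bits_per_cycle)
    return hops * per_hop


def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_cycles=1):
    """The header is forwarded at once, so serialization is paid only once."""
    return hops * router_cycles + serialization_cycles(packet_bits, link_bits_per_cycle)


if __name__ == "__main__":
    # Hypothetical example: a 512-bit packet, 64-bit links, 4 hops.
    print("store and forward:", store_and_forward_latency(4, 512, 64))
    print("cut-through:", cut_through_latency(4, 512, 64))
```
&lt;br /&gt;
In this model the store-and-forward cost grows with the product of hop count and packet length, while the cut-through cost grows with their sum. Without contention, wormhole routing behaves like cut-through in this respect.&lt;br /&gt;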
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[8]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[8]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in every packet, inflating packet size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop to take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[7]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which saves resources compared with a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[10]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[10]]]  &lt;br /&gt;
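&lt;br /&gt;
The contention rule described above can be sketched as follows. This is a minimal illustration and not the port-allocation logic from [[#References|[10]]]: the port names, the assumption that flits arrive already sorted by priority, and the choice of the first free port for a deflected flit are all hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def assign_output_ports(requests, ports):
    """Assign one output port to every incoming flit, without buffering.

    requests: list of (flit_id, wanted_port) pairs, highest priority first.
    ports: list of output port names; there must be at least as many
           ports as flits, mirroring the first constraint in the text.
    Returns a dict mapping flit_id to (port, was_deflected).
    """
    free = list(ports)
    result = {}
    for flit_id, wanted in requests:
        if wanted in free:
            port, deflected = wanted, False
        else:
            # The wanted port went to a higher-priority flit:
            # deflect to some free port instead of waiting.
            port, deflected = free[0], True
        free.remove(port)
        result[flit_id] = (port, deflected)
    return result


if __name__ == "__main__":
    flits = [("a", "N"), ("b", "N"), ("c", "E")]
    for flit_id, outcome in assign_output_ports(flits, ["N", "E", "S", "W"]).items():
        print(flit_id, outcome)
```
&lt;br /&gt;
In the example run, flit b loses the contention for port N and is deflected to E, which in turn pushes flit c away from its preferred port. This kind of knock-on deflection is one reason a packet may take an undesirable route.&lt;br /&gt;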
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  By limiting the requirement so that only the highest-priority flit must obtain its requested port, livelock can still be prevented.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered (the Golden Packet scheme).[[#References|[11]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by correcting the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[9]]]&lt;br /&gt;
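&lt;br /&gt;
Both cases can be sketched in a few lines. The code below is an illustration under stated assumptions rather than an implementation from [[#References|[9]]]: mesh nodes are (x, y) tuples, hypercube nodes are integers, and the hypercube bits are corrected from the lowest position upward. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def _step(current, target):
    """Return +1 or -1, the direction from current toward target."""
    return (target - current) // abs(target - current)


def xy_route(src, dst):
    """Nodes visited in a 2D mesh when routing fully in X, then in Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += _step(x, dst[0])
        path.append((x, y))
    while y != dst[1]:
        y += _step(y, dst[1])
        path.append((x, y))
    return path


def hypercube_route(src, dst, dimensions):
    """Nodes visited in a hypercube, fixing differing address bits in order."""
    node = src
    path = [node]
    for bit in range(dimensions):
        mask = 2 ** bit
        if (node // mask) % 2 != (dst // mask) % 2:
            node ^= mask  # traverse the link in this dimension
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(hypercube_route(0b000, 0b101, 3))
```
&lt;br /&gt;
Because the order of dimensions is fixed, the same source and destination always produce the same path, which is what makes the scheme deterministic.&lt;br /&gt;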
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundant technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[6]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that describe adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to reconfigure the NoC depending on the application and at run time, in order to react to congestion problems or, in general, adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[12]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[13]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
 &lt;br /&gt;
[1] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[2] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[3] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[4] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[5] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[3] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[4] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[5] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[6] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[7] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[8] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[9] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[10] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[11] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[12] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[13] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[14] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;br /&gt;
&lt;br /&gt;
== Quiz ==&lt;br /&gt;
1. Advantage of 2-D Mesh&lt;br /&gt;
&lt;br /&gt;
a) simple design&lt;br /&gt;
&lt;br /&gt;
b) cumbersome design&lt;br /&gt;
&lt;br /&gt;
c) degree is the same for all nodes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Diameter is&lt;br /&gt;
&lt;br /&gt;
a) minimum hop count&lt;br /&gt;
&lt;br /&gt;
b) maximum hop count&lt;br /&gt;
&lt;br /&gt;
c) number of neighbors &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. SOC stands for&lt;br /&gt;
&lt;br /&gt;
a) System of Chips&lt;br /&gt;
&lt;br /&gt;
b) Switch of Cores&lt;br /&gt;
&lt;br /&gt;
c) System on a Chip&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. In a direct topology,&lt;br /&gt;
&lt;br /&gt;
a) each node contains a network interface acting as a router in order to transfer information&lt;br /&gt;
&lt;br /&gt;
b) there are nodes that act as routers&lt;br /&gt;
&lt;br /&gt;
c) only one node is a computational node&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. The Single-Chip Cloud Computer contains &lt;br /&gt;
&lt;br /&gt;
a) an 8x10 mesh&lt;br /&gt;
&lt;br /&gt;
b) a 64-router mesh network&lt;br /&gt;
&lt;br /&gt;
c) a 24-router mesh network&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. A deterministic routing scheme uses algorithms to determine the most advantageous path to the target node.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
7. Livelock is necessary to maintain coherence in routing protocols.&lt;br /&gt;
&lt;br /&gt;
a) True&lt;br /&gt;
&lt;br /&gt;
b) False&lt;br /&gt;
&lt;br /&gt;
8.&lt;br /&gt;
&lt;br /&gt;
9.&lt;br /&gt;
&lt;br /&gt;
10.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62091</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62091"/>
		<updated>2012-04-15T15:59:24Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever-increasing performance gains from single-core architectures to leveraging the power of multiple cores per die.  This creates new challenges not present in single-core systems.  A multi-core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths decrease and the number of wires increases, the difference between gate delay and wire delay is exacerbated.[[#References|[19]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited applicability for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is defined as the number of nodes that are neighbors of a given node, or in other words, that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
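&lt;br /&gt;
To make the first three metrics concrete, the sketch below computes them for a k x k 2D mesh. It is only an illustration under stated assumptions: nodes are (x, y) tuples, there are no wraparound links, and a hop is counted as one link traversal. The function names are hypothetical and do not come from [[#References|[7]]].&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    """Neighbors of (x, y) in a k x k 2D mesh without wraparound links."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def degree(node, k):
    """Number of nodes reachable from `node` in one hop."""
    return len(mesh_neighbors(node, k))


def hop_counts(source, k):
    """Breadth-first search: minimum hop count from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in mesh_neighbors(current, k):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def diameter(k):
    """Largest minimum hop count over all source and destination pairs."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)


if __name__ == "__main__":
    print("corner degree:", degree((0, 0), 4))
    print("center degree:", degree((1, 1), 4))
    print("diameter of a 4x4 mesh:", diameter(4))
```
&lt;br /&gt;
The exhaustive search is only practical for small meshes, but it keeps the definitions visible in the code.&lt;br /&gt;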
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
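&lt;br /&gt;
The closed-form expressions quoted in the list above can be evaluated directly. The sketch below simply transcribes them, reading the odd-k term as 1/(3k); the function names are hypothetical, and exact fractions are used to avoid rounding.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction


def avg_min_hops_mesh(k, n):
    """Average minimum hop count quoted for a mesh viewed as a k-ary n-cube."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def uniform_bisection_load(k):
    """Channel load across the bisection under uniform random traffic."""
    if k % 2 != 0:
        raise ValueError("the quoted k/4 figure is stated for even k only")
    return Fraction(k, 4)


if __name__ == "__main__":
    print("4x4 mesh, average hops:", avg_min_hops_mesh(4, 2))
    print("3x3 mesh, average hops:", avg_min_hops_mesh(3, 2))
    print("8x8 mesh, bisection channel load:", uniform_bisection_load(8))
```
&lt;br /&gt;
For example, an 8x8 mesh (k = 8, n = 2) gives an average of 16/3 hops and a bisection channel load of 2.&lt;br /&gt;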
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh uses a router-to-core ratio of 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It achieves low network latency thanks to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh. This makes flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption and must implement a scheme to handle this.[[#References|[12]]]&lt;br /&gt;
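&lt;br /&gt;
The latency difference between store-and-forward and cut-through switching can be sketched with a simple zero-load model. This is a common textbook approximation rather than something taken from the cited references, and the function and parameter names below are hypothetical: every hop costs a fixed router delay, and a packet needs packet_bits / channel_bits_per_cycle cycles to cross one channel.&lt;br /&gt;
&lt;br /&gt;
```python
# Zero-load latency sketch (an assumed textbook model, no contention).
# All names and parameters here are illustrative, not from the article.

def store_and_forward_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    # The whole packet is received at every hop before it is forwarded,
    # so the serialization time is paid once per hop.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    # The header is forwarded as soon as it has been examined,
    # so the serialization time is paid only once overall.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * router_delay + serialization


if __name__ == '__main__':
    # 4 hops, 512-bit packet, 64-bit channel, 2-cycle router delay
    print(store_and_forward_latency(4, 512, 64, 2))
    print(cut_through_latency(4, 512, 64, 2))
```
&lt;br /&gt;
In this model the two schemes converge when a packet fits in a single channel transfer, which is consistent with the remark that store and forward works well for packets that are small relative to the channel width.&lt;br /&gt;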
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  Because this route information is carried in every packet, it inflates the packet size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of a packet lies relative to itself.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
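&lt;br /&gt;
The deflection idea can be sketched as a per-router port allocation step. The sketch below is a simplified illustration, not the BLESS implementation: it assumes an oldest-first priority among contending flits, and the function name, the tuple layout, and the port labels are all hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Simplified deflection sketch (assumption: oldest flit wins contention).
# Names and data layout are illustrative, not taken from the BLESS paper.

def allocate_ports(flits, ports):
    # flits: list of (age, preferred_port) tuples, one per arriving flit
    # ports: list of output port labels of this router
    # A bufferless router needs at least as many outputs as arriving flits.
    assert min(len(flits), len(ports)) == len(flits)
    free = list(ports)
    assignment = {}
    # Serve flits from oldest to youngest.
    order = sorted(range(len(flits)), key=lambda i: flits[i][0], reverse=True)
    for i in order:
        preferred = flits[i][1]
        # Take the preferred port if it is still free, otherwise deflect
        # the flit to any remaining free port.
        port = preferred if preferred in free else free[0]
        free.remove(port)
        assignment[i] = port
    return assignment


if __name__ == '__main__':
    arriving = [(5, 'N'), (9, 'N'), (1, 'E')]
    print(allocate_ports(arriving, ['N', 'E', 'S', 'W']))
```
&lt;br /&gt;
Every arriving flit always leaves on some port, which is why the router needs no buffers and why the number of output ports must be at least the number of input ports.&lt;br /&gt;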
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirement so that only the highest-priority flit is guaranteed to obtain the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme (the Golden Packet scheme): a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well; for example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
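&lt;br /&gt;
The 2D mesh case can be sketched in a few lines. This is a minimal illustration that assumes nodes are addressed by (x, y) coordinates; the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# XY dimension-order routing sketch for a 2D mesh.
# Assumption: nodes are identified by integer (x, y) coordinates.

def step_toward(a, b):
    # +1 if b lies above a, -1 otherwise (callers guarantee a != b).
    return 1 if b - a == abs(b - a) else -1


def xy_route(src, dst):
    # Returns the list of nodes visited, source and destination included.
    x, y = src
    path = [(x, y)]
    # Route completely in the X dimension first...
    while x != dst[0]:
        x += step_toward(x, dst[0])
        path.append((x, y))
    # ...and only then in the Y dimension.
    while y != dst[1]:
        y += step_toward(y, dst[1])
        path.append((x, y))
    return path


if __name__ == '__main__':
    print(xy_route((0, 0), (2, 1)))
```
&lt;br /&gt;
Because the path depends only on the source and destination coordinates, the same pair of nodes always uses the same route, which is what makes the scheme deterministic.&lt;br /&gt;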
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies used in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC at run time, depending on the application, to react to congestion problems or, more generally, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[17]]], the authors propose a design based on the properties of the  [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[18]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[17] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[18] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[19] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62090</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62090"/>
		<updated>2012-04-15T15:58:43Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
The current trend in microprocessor design has shifted from extracting ever increasing performance gains from single core architecture to leveraging the power of multiple cores per die.  This creates new challenges not present in single core systems.  A multi core processor must have a method of passing information between processing cores that is efficient in terms of power consumed, space used on die, and the speed at which messages are delivered.  As physical wire widths are decreased and the number of wires is increased, the difference between gate delay and wire delay is exacerbated.[[#References|[8]]]  To combat these challenges, much research has been done in the area of on-chip networks.&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To make communication among these many cores feasible inside a single chip, traditional off-chip network designs have proved to have limited applicability. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words the nodes that can be reached from it in one hop.&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to reach its destination.&lt;br /&gt;
*'''Diameter''' is the maximum hop count.&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube , where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on the parameters of a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3-1/3k)|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
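&lt;br /&gt;
The average hop count expressions in the list above can be checked numerically. The sketch below is only an illustration under stated assumptions: it measures hops as Manhattan distance and averages over all ordered source/destination pairs, including pairs where source and destination coincide, and the function names are hypothetical. Under those assumptions the odd-k expression matches a brute-force count exactly, while nk/3 for even k behaves as a close approximation.&lt;br /&gt;
&lt;br /&gt;
```python
# Average minimum hop count of a k-ary n-cube mesh: formula vs. brute force.
# Assumptions: hops are Manhattan distance; all ordered node pairs are
# averaged, including a node paired with itself.
from fractions import Fraction
from itertools import product


def mesh_avg_hops_formula(k, n):
    # Expressions from the list above: nk/3 for even k,
    # n(k/3 - 1/(3k)) for odd k.
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def mesh_avg_hops_brute_force(k, n):
    # Enumerate every node of the mesh and sum all pairwise distances.
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(u, v))
        for u in nodes
        for v in nodes
    )
    return Fraction(total, len(nodes) ** 2)


if __name__ == '__main__':
    for k in (3, 4, 5):
        print(k, mesh_avg_hops_formula(k, 2), mesh_avg_hops_brute_force(k, 2))
```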
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The concentration mesh changes the ratio to 4:1, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. But it is not as scalable as it might seem, as its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh. This makes flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
In both of these protocols, the switch examines the header flit, decides where to send the message, and then starts forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, so the whole packet accumulates in a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but because forwarding starts before the whole packet has arrived, corrupted packets can be propagated, and a scheme must be implemented to handle this.[[#References|[12]]]&lt;br /&gt;
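&lt;br /&gt;
The latency difference between the two schemes can be made concrete with a simple zero-load model. The sketch below is only an illustration under stated assumptions: the function and parameter names are hypothetical, contention is ignored, and every hop is assumed to add the same fixed router delay.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    # Assumed model: every hop waits for the whole packet before forwarding it,
    # so the serialization time is paid once per hop.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, packet_bits, channel_bits_per_cycle, router_delay):
    # Assumed model: only the header pays the per-hop router delay; the rest of
    # the packet is pipelined behind it, so serialization is paid once.
    serialization = packet_bits / channel_bits_per_cycle
    return hops * router_delay + serialization


if __name__ == '__main__':
    # Hypothetical numbers: 4 hops, 512-bit packet, 32-bit channel, 2-cycle router.
    print(store_and_forward_latency(4, 512, 32, 2))
    print(cut_through_latency(4, 512, 32, 2))
```
&lt;br /&gt;
In this model the gap between the two schemes grows with both the hop count and the packet length, which is why store and forward is mainly attractive when packets are small relative to the channel width.&lt;br /&gt;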
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  Because this route information is sent in every packet, it increases the packet size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
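&lt;br /&gt;
The contrast between source routing and distributed routing can be sketched on a toy network. Everything in the example below is an assumption made for illustration: the four-node linear network, the port names 'E', 'W' and 'L', and all function names are hypothetical and do not come from any specific NoC.&lt;br /&gt;
&lt;br /&gt;
```python
NUM_NODES = 4  # hypothetical linear network: 0 - 1 - 2 - 3


def port_toward(current, destination):
    # 'E' leads to the next higher node, 'W' to the next lower one,
    # and 'L' ejects the packet at the local node.
    if destination == current:
        return 'L'
    return 'E' if destination > current else 'W'


def build_source_route(source, destination):
    # Source routing: the source computes every hop up front and the
    # resulting list travels in the packet header.
    route, node = [], source
    while node != destination:
        port = port_toward(node, destination)
        route.append(port)
        node += 1 if port == 'E' else -1
    return route


def build_routing_tables():
    # Distributed routing: each node stores one next-hop entry per destination,
    # so total table storage grows with the square of the node count.
    return {node: {dest: port_toward(node, dest) for dest in range(NUM_NODES)}
            for node in range(NUM_NODES)}


def forward_distributed(tables, source, destination):
    # The header carries only the destination; each node consults its own table.
    path, node = [source], source
    while tables[node][destination] != 'L':
        node += 1 if tables[node][destination] == 'E' else -1
        path.append(node)
    return path
```
&lt;br /&gt;
For a packet from node 0 to node 3, the source-routed header carries three port entries, while the distributed version carries only the destination but needs a table with one entry per destination at every node.&lt;br /&gt;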
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the architecture and can therefore determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
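&lt;br /&gt;
The deflection step can be sketched as a per-router port allocation. The example below is a minimal illustration, not the router design from [[#References|[15]]]: the oldest-first ranking, the tuple layout, and the function name are assumptions, and a deflected flit simply takes the first free port.&lt;br /&gt;
&lt;br /&gt;
```python
def allocate_ports(flits, output_ports):
    # flits: list of (flit_id, age, preferred_port) tuples arriving in one cycle.
    # Deflection routing needs at least as many output ports as incoming flits,
    # because no flit can be buffered.
    if len(flits) > len(output_ports):
        raise ValueError('more flits than output ports')
    free = list(output_ports)
    assignment = {}
    # Assumed arbitration policy: older flits choose first.
    for flit_id, age, preferred in sorted(flits, key=lambda f: -f[1]):
        # Take the preferred port when it is still free, otherwise deflect
        # to the first port that is left.
        port = preferred if preferred in free else free[0]
        free.remove(port)
        assignment[flit_id] = port
    return assignment
```
&lt;br /&gt;
With flits ('a', 5, 'N'), ('b', 9, 'N') and ('c', 1, 'E') and ports N, E, S, W, the oldest flit 'b' wins port N, 'a' is deflected to E, and 'c' in turn is deflected to S. Every flit still leaves the router in the same cycle, which is what removes the need for buffers.&lt;br /&gt;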
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by requiring only that the highest-priority flit obtains the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  It does this with the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each direction is chosen in order and routed completely before switching to the next direction.  For example, in a 2D mesh, dimension order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  The approach extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by correcting the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
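&lt;br /&gt;
Both cases described above can be sketched in a few lines. The code is a minimal illustration under assumptions: mesh nodes are (x, y) tuples, hypercube nodes are integers whose binary digits are the node address, bits are corrected from the lowest position upward, and the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def xy_route(source, destination):
    # 2D mesh: route completely along X first, then along Y.
    x, y = source
    dest_x, dest_y = destination
    path = [(x, y)]
    while x != dest_x:
        x += 1 if dest_x > x else -1
        path.append((x, y))
    while y != dest_y:
        y += 1 if dest_y > y else -1
        path.append((x, y))
    return path


def ecube_route(source, destination, dimensions):
    # Hypercube: visit the address bits in a fixed order and flip each bit
    # in which the current node and the destination differ.
    node, path = source, [source]
    for bit in range(dimensions):
        mask = 2 ** bit
        if (node // mask) % 2 != (destination // mask) % 2:
            node ^= mask
            path.append(node)
    return path
```
&lt;br /&gt;
Because the order of dimensions is fixed, the same source and destination always produce the same path, which is what makes the scheme deterministic.&lt;br /&gt;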
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies used in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application and at run-time, in order to react to congestion problems or, in general, adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[17]]], the authors propose a design based on the properties of the  [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[18]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[17] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[18] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[19] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62089</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62089"/>
		<updated>2012-04-15T15:56:54Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit would double every two years. This prediction has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to have limited applicability for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of neighbors of a node, in other words the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
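&lt;br /&gt;
Some of these metrics can be computed directly for a small network. The sketch below does so for a k x k 2D mesh without wraparound links. It is an illustration only: the function names are hypothetical, and hops are counted as links traversed on a shortest path, which is one possible reading of the hop count definition above.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    # Neighbors of (x, y) in a k x k 2D mesh; edge nodes have fewer neighbors.
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts_from(source, k):
    # Breadth-first search yields the minimum hop count to every other node.
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def diameter(k):
    # Diameter: the largest minimum hop count over all pairs of nodes.
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts_from(n, k).values()) for n in nodes)
```
&lt;br /&gt;
The degree of a node is then the length of the list returned by mesh_neighbors, which is 2 for a corner node and 4 for an interior node of the mesh.&lt;br /&gt;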
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
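&lt;br /&gt;
The odd-k entry of the average minimum hop count can be checked by brute force on small meshes. The sketch below reads that entry as n(k/3 - 1/(3k)) and assumes that the average is taken over all ordered source and destination pairs, including pairs where both are the same node. Both the averaging convention and the function names are assumptions made for this illustration, and the check is limited to odd k.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n):
    # Average Manhattan distance over all ordered (source, destination) pairs
    # of a mesh with k nodes in each of n dimensions. Pairs with
    # source == destination are included (assumed convention).
    nodes = list(product(range(k), repeat=n))
    total = sum(sum(abs(a - b) for a, b in zip(s, d)) for s in nodes for d in nodes)
    return Fraction(total, len(nodes) ** 2)


def closed_form_odd(k, n):
    # Odd-k entry of the table, read as n(k/3 - 1/(3k)).
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))
```
&lt;br /&gt;
Exact fractions are used instead of floating point so that the brute-force result and the closed form can be compared for equality.&lt;br /&gt;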
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. But it is not as scalable as it might seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. Its low diameter results in low network latency.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which gives the flattened butterfly and MECS lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it doubles and even quadruples for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
In both of these protocols, the switch examines the header flit, decides where to send the message, and then starts forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, so the whole packet accumulates in a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but because forwarding starts before the whole packet has arrived, corrupted packets can be propagated, and a scheme must be implemented to handle this.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing.  They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where a set of activities (e.g., messages) are each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, none of the messages can make progress and they are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
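&lt;br /&gt;
One way to picture deadlock is as a cycle in a graph that records which message is waiting for which.  The Python sketch below assumes such a wait-for graph is available as a dictionary; the function name and the representation are hypothetical and are not taken from [13].&lt;br /&gt;
&lt;br /&gt;
```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle.

    waits_for maps each message to the list of messages it is waiting on.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # came back to a message still on the current chain
        visiting.add(node)
        found = any(visit(nxt) for nxt in waits_for.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in waits_for)
```
&lt;br /&gt;
Three messages that each wait for the next one in a ring are reported as deadlocked, while a simple chain of waiting messages is not.&lt;br /&gt;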
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating its size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
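&lt;br /&gt;
The contrast between source routing and distributed routing can be sketched in a few lines of Python.  The packet layout (a dictionary with route, dest and payload fields) and the function names are assumptions made for illustration only.&lt;br /&gt;
&lt;br /&gt;
```python
def source_route_header(path_ports, payload):
    """Source routing: the sender stores the whole list of output ports."""
    return {"route": list(path_ports), "payload": payload}


def source_routed_hop(packet):
    """Each switch just pops the next output port from the header."""
    return packet["route"].pop(0)


def distributed_hop(routing_table, packet):
    """Distributed routing: the switch looks up the destination in its own table."""
    return routing_table[packet["dest"]]
```
&lt;br /&gt;
In the first case the header grows with the path length, while in the second case the per-switch table grows with the number of destinations.&lt;br /&gt;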
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
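&lt;br /&gt;
The basic idea of routing with a comparator, a few routing bits and a few connectivity bits can be sketched in Python as follows.  This is a simplified illustration in the spirit of LBDR, not the exact logic of [12]: the bit names, the coordinate convention (y grows to the south) and the function name are assumptions of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
from operator import gt, lt


def lbdr_ports(cur, dst, routing_bits, connectivity_bits):
    """Return the set of output ports a packet may take at router cur.

    cur, dst: (x, y) coordinates; y grows to the south (assumption).
    routing_bits: routing_bits["NE"] is True when a packet leaving
        through N may turn E at the next router, and so on.
    connectivity_bits: connectivity_bits["N"] is True when the N link exists.
    """
    (cx, cy), (dx, dy) = cur, dst
    # First stage: comparators tell in which quadrant the destination lies.
    want = {"N": lt(dy, cy), "S": gt(dy, cy), "E": gt(dx, cx), "W": lt(dx, cx)}
    if not any(want.values()):
        return {"LOCAL"}
    sideways = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}
    ports = set()
    # Second stage: a port is allowed if the destination lies straight ahead,
    # or if the turn needed at the next router is permitted by a routing bit.
    for port, (a, b) in sideways.items():
        straight = want[port] and not want[a] and not want[b]
        turn_a = want[port] and want[a] and routing_bits[port + a]
        turn_b = want[port] and want[b] and routing_bits[port + b]
        if (straight or turn_a or turn_b) and connectivity_bits[port]:
            ports.add(port)
    return ports
```
&lt;br /&gt;
With routing bits that allow only turns from the X-dimension into the Y-dimension, this sketch behaves like XY routing: a packet whose destination lies to the north-east is offered only the E port.&lt;br /&gt;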
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
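&lt;br /&gt;
The deflection step can be sketched in Python as a port assignment in which every incoming flit always leaves on some output port.  The oldest-first priority, the tuple layout and the function name are assumptions of this sketch; see [15] for the actual arbitration rules.&lt;br /&gt;
&lt;br /&gt;
```python
def assign_ports(flits, ports):
    """Assign every incoming flit to a distinct output port.

    flits: list of (flit_id, age, preferred_port) tuples.
    ports: list of output port names; needs at least len(flits) entries.
    Older flits choose first (an assumed oldest-first priority rule);
    a flit whose preferred port is taken is deflected to a free port.
    """
    free = list(ports)
    assignment = {}
    for flit_id, _age, preferred in sorted(flits, key=lambda f: -f[1]):
        chosen = preferred if preferred in free else free[0]
        free.remove(chosen)
        assignment[flit_id] = chosen
    return assignment
```
&lt;br /&gt;
Because the number of output ports is at least the number of input ports, the list of free ports never runs out, which is why no buffering is needed.&lt;br /&gt;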
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  By relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port, livelock can still be prevented.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme, called Golden Packet: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]&lt;br /&gt;
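&lt;br /&gt;
The rotation of the special status among packets can be sketched in Python as follows.  The epoch length, the size of the packet ID space, the flit layout and the tie-break between non-golden flits are all assumptions of this sketch and should be checked against [16].&lt;br /&gt;
&lt;br /&gt;
```python
def golden_packet_id(cycle, epoch_length, num_packet_ids):
    """Packet ID that holds golden status during the given cycle.

    Golden status rotates through all IDs, one epoch at a time, so every
    router can compute it locally from the cycle count.
    """
    return (cycle // epoch_length) % num_packet_ids


def arbitrate(flit_a, flit_b, cycle, epoch_length, num_packet_ids):
    """Pick the winner of a two-flit contention; a golden flit beats a non-golden one."""
    golden = golden_packet_id(cycle, epoch_length, num_packet_ids)
    if flit_b["packet_id"] == golden and flit_a["packet_id"] != golden:
        return flit_b
    return flit_a  # otherwise keep flit_a (an arbitrary tie-break assumed for this sketch)
```
&lt;br /&gt;
Since the golden ID depends only on the cycle count, all routers agree on it without exchanging any messages.&lt;br /&gt;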
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
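&lt;br /&gt;
Both examples can be written down as a minimal Python sketch.  The function names, the (x, y) coordinate tuples and the choice to start from the lowest bit position are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def xy_route(src, dst):
    """Nodes visited in a 2D mesh when X is fully routed before Y."""
    def steps(start, end):
        if start == end:
            return []
        step = (end - start) // abs(end - start)  # +1 or -1
        return list(range(start + step, end + step, step))

    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    path += [(x, sy) for x in steps(sx, dx)]
    path += [(dx, y) for y in steps(sy, dy)]
    return path


def hypercube_route(src, dst, dimensions):
    """Nodes visited in a hypercube, fixing differing address bits from bit 0 up."""
    path, cur = [src], src
    for bit in range(dimensions):
        if ((src ^ dst) // 2 ** bit) % 2 == 1:
            cur ^= 2 ** bit
            path.append(cur)
    return path
```
&lt;br /&gt;
For a given source and destination the returned path is always the same, which is what makes the scheme deterministic.&lt;br /&gt;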
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application, and at run-time to react to congestion problems or, in general, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[17]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[18]]] presents a collection of chapters/articles on emerging research issues in the ANoC field of application.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on, pp. 19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on, pp. 267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on, pp. 97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp. 13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, pp. 173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[17] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[18] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;br /&gt;
&lt;br /&gt;
[19] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1016885&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62088</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62088"/>
		<updated>2012-04-15T15:19:07Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing Protocols in Soc's */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have kept racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
For communication among these many cores inside a single chip, traditional off-chip network designs have proved to have limited applicability. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still have some challenges that need further study. Among these issues are power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors or, in other words, that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to go to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
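&lt;br /&gt;
Some of these metrics can be computed directly for a small network.  The Python sketch below does so for a k x k 2D mesh using a breadth-first search; the function names are hypothetical, and hop count is taken here as the number of links traversed, which is an assumption of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    """Neighbors of (x, y) in a k x k 2D mesh (no wrap-around links)."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(src, k):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(k):
    """Maximum hop count over all node pairs."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)
```
&lt;br /&gt;
The degree of a node is the length of the list returned by mesh_neighbors, so corner nodes have a lower degree than central nodes.&lt;br /&gt;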
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as its network requires two channels in each dimension (2n), although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
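&lt;br /&gt;
The formulas quoted above can be evaluated with a short Python sketch.  The function names are hypothetical, and the odd-k case reads the term 1/3k as 1/(3k), which is an assumption about the intended notation.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction


def average_min_hops(k, n):
    """Average minimum hop count of a mesh, using the two cases quoted above."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def bisection_channel_load(k):
    """Channel load across the bisection under uniform random traffic (even k)."""
    return Fraction(k, 4)


if __name__ == "__main__":
    # Hypothetical example: an 8x8 2D mesh (k = 8, n = 2).
    print(average_min_hops(8, 2), bisection_channel_load(8))
```
&lt;br /&gt;
Exact fractions are used so that the results can be compared with hand calculations without rounding.&lt;br /&gt;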
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The concentration mesh changes the ratio to 4:1, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because the degree of concentration is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
Concentration lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
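&lt;br /&gt;
The sizes implied by the k-ary n-fly notation can be computed with a small Python sketch.  The function name and the dictionary keys are hypothetical and chosen only for this example.&lt;br /&gt;
&lt;br /&gt;
```python
def butterfly_size(k, n):
    """Component counts of a k-ary n-fly, following the description above."""
    routers_per_stage = k ** (n - 1)
    return {
        "terminals": k ** n,
        "stages": n,
        "routers_per_stage": routers_per_stage,
        "total_routers": n * routers_per_stage,
        "router_degree": 2 * k,
    }


if __name__ == "__main__":
    # Hypothetical example: a 2-ary 3-fly.
    print(butterfly_size(2, 3))
```
&lt;br /&gt;
For a 2-ary 3-fly this gives 8 network nodes served by 3 stages of 4 routers each.&lt;br /&gt;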
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels (MECS) was proposed by Grot and Keckler in [[#References|[1]]]. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Its authors define MECS as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k in the case of Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
These two protocols use the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but because forwarding starts before the whole packet has arrived, corrupted packets can be propagated, and a scheme must be implemented to handle this.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing.  They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where a set of activities (e.g., messages) are each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, none of the messages can make progress and they are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoC's===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating its size.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
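&lt;br /&gt;
The contention rule described above can be sketched for a single router cycle in Python. This is only an illustration: the function name assign_ports, the tuple layout of a flit, and the oldest-first priority rule are assumptions made for the sketch, not details taken from this article.&lt;br /&gt;

```python
def assign_ports(flits, ports):
    """Assign every incoming flit to a distinct output port in one router cycle.

    flits: list of (flit_id, age, preferred_port) tuples (hypothetical format).
    ports: list of output port names; assumed to be at least as long as flits.
    Returns a dict mapping flit_id to the port the flit leaves on.
    """
    free = list(ports)
    assignment = {}
    # Assumed priority rule for this sketch: the oldest flit arbitrates first.
    for flit_id, age, preferred in sorted(flits, key=lambda f: -f[1]):
        if preferred in free:
            port = preferred   # the flit wins its desired output port
        else:
            port = free[0]     # the flit is deflected to some other free port
        free.remove(port)
        assignment[flit_id] = port
    return assignment
```

Every flit leaves on some port, so nothing is buffered or dropped. This is why the topology must offer at least as many output ports as input ports per router.&lt;br /&gt;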
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  The arbitration relies on a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  The scheme extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by correcting, one bit position at a time and in a fixed order, the bit positions in which the source and destination addresses differ.[[#References|[14]]]&lt;br /&gt;
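&lt;br /&gt;
A minimal Python sketch of the 2D-mesh case follows. The function name xy_route and the use of (x, y) integer tuples as node addresses are assumptions made for illustration; the sketch only lists the nodes a packet would visit when the X dimension is fully routed before the Y dimension.&lt;br /&gt;

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst under XY routing.

    Hypothetical helper: nodes are addressed by integer mesh coordinates.
    """
    x, y = src
    path = [(x, y)]
    # Route completely in the X dimension first.
    while x != dst[0]:
        x += (dst[0] - x) // abs(dst[0] - x)  # step of +1 or -1 toward dst
        path.append((x, y))
    # Only then route in the Y dimension.
    while y != dst[1]:
        y += (dst[1] - y) // abs(dst[1] - y)
        path.append((x, y))
    return path
```

For instance, xy_route((0, 0), (2, 1)) visits (0, 0), (1, 0), (2, 0), and (2, 1). The same source and destination always produce the same path, which is what makes the strategy deterministic.&lt;br /&gt;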
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that describe adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to reconfigure the NoC for each application and at run time, in order to react to congestion problems or, more generally, to adapt to the traffic load. &lt;br /&gt;
&lt;br /&gt;
In [[#References|[17]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, vary the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although the design aims to minimize it.&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions. &lt;br /&gt;
&lt;br /&gt;
[[#References|[18]]] presents a collection of chapters and articles on emerging research issues in the ANoC field.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models.] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture.] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow.] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology.] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 97-104, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing.] Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 173-177, 5-7 Feb. 2003.&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;br /&gt;
&lt;br /&gt;
[17] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.&lt;br /&gt;
&lt;br /&gt;
[18] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62032</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62032"/>
		<updated>2012-04-15T02:33:19Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached with multiprocessor integration. Moore's law predicts that the number of transistors in an integrated circuit doubles roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, using multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and Intel has lately coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to be of limited use for communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count between any two nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
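&lt;br /&gt;
As an illustration, degree and diameter can be computed by brute force for a small 2D mesh. The Python sketch below is a hedged example, not taken from the cited textbook: the helper names are hypothetical, nodes are (x, y) tuples, and a hop is counted as one link traversed, which is an assumption of the sketch.&lt;br /&gt;

```python
from collections import deque

def mesh_neighbors(node, k):
    """Neighbors of (x, y) in a k x k 2D mesh (no wraparound links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if cx in range(k) and cy in range(k)]

def hop_counts(src, k):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def mesh_diameter(k):
    """Maximum, over all node pairs, of the minimum hop count."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(src, k).values()) for src in nodes)
```

In a 4x4 mesh, a corner node has degree 2 and an interior node has degree 4, and the longest shortest path runs between opposite corners.&lt;br /&gt;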
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
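&lt;br /&gt;
The average minimum hop count entries listed above can be evaluated with a short Python sketch. The function name mesh_avg_min_hops is hypothetical, and the odd-k entry is read here as n(k/3 - 1/(3k)), which is an interpretation of the table's notation.&lt;br /&gt;

```python
from fractions import Fraction

def mesh_avg_min_hops(k, n):
    """Average minimum hop count of a k-ary n-dimensional mesh, per the table."""
    if k % 2 == 0:
        return Fraction(n * k, 3)                      # nk/3 for even k
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))   # n(k/3 - 1/(3k)) for odd k
```

Exact fractions are used so that results such as 8/3 for a 4x4 mesh (k = 4, n = 2) are not rounded.&lt;br /&gt;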
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it may seem, because its degree is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It does non-minimal routing for load balancing improvement in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh. This makes flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built as an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
===General Routing Schemes===&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]==== &lt;br /&gt;
This routing scheme has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]  This approach can be quite effective when the average packet size is small in comparison with the channel widths.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====&lt;br /&gt;
In both of these protocols, the switch examines the flit header, decides where to send the message, and then starts forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption, so a scheme to handle corruption must be implemented.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====&lt;br /&gt;
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====&lt;br /&gt;
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
====Deadlock and Livelock====&lt;br /&gt;
&lt;br /&gt;
Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation in which a set of activities (e.g., messages) are each waiting for another to finish something.[[#References|[13]]] Since none of the waiting activities can finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Routing Protocols in SoCs===&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
==== Source Routing ====&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  Because this route information is carried in every packet, packet size increases.&lt;br /&gt;
&lt;br /&gt;
==== Distributed Routing ====&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
==== Logic Based Distributed Routing (LBDR) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the architecture and can therefore determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
==== Bufferless Deflection Routing (BLESS protocol) ====&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
&lt;br /&gt;
==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  The arbitration relies on a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]&lt;br /&gt;
&lt;br /&gt;
==== Dimension-order Routing ====&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  The scheme extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by correcting, one bit position at a time and in a fixed order, the bit positions in which the source and destination addresses differ.[[#References|[14]]]&lt;br /&gt;
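&lt;br /&gt;
The hypercube case can be sketched in Python as follows. The function name hypercube_route, the use of integers as node addresses, and the choice to correct the lowest bit position first are all assumptions made for this illustration.&lt;br /&gt;

```python
def hypercube_route(src, dst, dims):
    """Return the node addresses visited from src to dst in a dims-dimensional hypercube.

    Hypothetical helper: nodes are integers whose binary digits are the coordinates.
    """
    path = [src]
    current = src
    differing = src ^ dst  # bit positions where the two addresses differ
    for bit in range(dims):  # fixed dimension order: lowest bit first (an assumption)
        mask = 2 ** bit
        if (differing // mask) % 2 == 1:
            current ^= mask  # traverse the link in this dimension
            path.append(current)
    return path
```

For example, routing from node 0 (binary 000) to node 5 (binary 101) in a 3-dimensional hypercube corrects bit 0 and then bit 2, so the packet visits nodes 0, 1, and 5. The number of hops equals the number of differing address bits.&lt;br /&gt;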
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, NoCs are the subject of several lines of research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption; the more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models.] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture.] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow.] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology.] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 97-104, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs.] Computer Architecture Letters, vol. 7, no. 1, pp. 13-16, Jan. 2008.&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J. [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model.] IEEE Transactions on Computers, vol. 52, no. 9, pp. 1154-1169, Sept. 2003.&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing.] Proceedings of the Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 173-177, 5-7 Feb. 2003.&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T.; Mutlu, O. [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks.] ACM SIGARCH Computer Architecture News, vol. 37, no. 3, June 2009.&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62031</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62031"/>
		<updated>2012-04-15T02:12:11Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached with multiprocessor integration. Moore's law predicts that the number of transistors in an integrated circuit doubles approximately every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, using multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip network designs have proved to be of limited use for communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the particular theme of this article, there are at least three different acronyms referring to the same concept. These are new technologies, and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
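&lt;br /&gt;
As an illustration of these metrics, the following sketch computes node degree and diameter for a k x k 2D mesh using breadth-first search. The function names are hypothetical, and hops are counted here as links traversed, which is an assumption about the counting convention.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    """Neighbors of (x, y) in a k x k 2D mesh (no wraparound links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(src, k):
    """Breadth-first search: hops (links traversed) from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(k):
    """Maximum hop count over all source and destination pairs."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)


# Degree of a corner node and of an interior node, then the diameter.
print(len(mesh_neighbors((0, 0), 4)), len(mesh_neighbors((1, 1), 4)), diameter(4))
```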
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
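&lt;br /&gt;
The odd-k expression for the average minimum hop count quoted above can be checked numerically. The sketch below is a brute-force comparison, assuming that the average is taken over all ordered pairs of nodes, including pairs where source and destination coincide; the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction
from itertools import product


def average_hops_brute_force(k, n):
    """Exact mean Manhattan distance over all ordered node pairs of a mesh
    with k nodes in each of n dimensions."""
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


def average_hops_formula_odd(k, n):
    """Closed form quoted above for odd k: n(k/3 - 1/(3k))."""
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


# For a 3x3 mesh both values should agree.
print(average_hops_brute_force(3, 2), average_hops_formula_odd(3, 2))
```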
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The concentration mesh changes the router-to-core ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it may seem, because the degree of concentration is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
Concentration lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It supports non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
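&lt;br /&gt;
The port counts listed above can be reproduced by enumeration. This is a sketch under assumptions: the network is a k x k grid, every other node in the same row or column reaches a given node through one of its own multidrop channels, and a direction with no nodes in it needs no output channel. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def mecs_input_ports(node, k):
    """Count the channels that can drop a flit at `node` in a k x k MECS network.

    Assumption: each other node in the same row or column contributes
    exactly one input port.
    """
    x, y = node
    same_row = [(a, y) for a in range(k) if a != x]
    same_col = [(x, b) for b in range(k) if b != y]
    return len(same_row) + len(same_col)


def mecs_output_ports(node, k):
    """One output channel per direction that contains at least one node."""
    x, y = node
    # Number of nodes on each of the four sides of `node` within its row and column.
    nodes_per_direction = [x, k - 1 - x, y, k - 1 - y]
    return sum(1 for count in nodes_per_direction if count != 0)


# For k = 4 the input port count should equal 2(k - 1) = 6 at every node.
print(mecs_input_ports((2, 1), 4), mecs_output_ports((1, 1), 4))
```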
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it doubles and even quadruples between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Cut-through protocols lower latency, but because forwarding starts before the whole packet has been received and checked, corrupted packets can be propagated, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
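&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a first-order model that ignores contention. The parameter names (hops, router_delay in cycles, packet_bits, bandwidth in bits per cycle) are assumptions introduced for this sketch and do not come from the cited references.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, router_delay, packet_bits, bandwidth):
    """Each hop waits for the whole packet to arrive before forwarding it."""
    return hops * (router_delay + packet_bits / bandwidth)


def cut_through_latency(hops, router_delay, packet_bits, bandwidth):
    """Only the header pays the per-hop delay; the body is pipelined behind it."""
    return hops * router_delay + packet_bits / bandwidth


# Example with hypothetical numbers: 4 hops, 2-cycle routers,
# a 128-bit packet and 16 bits per cycle of channel bandwidth.
print(store_and_forward_latency(4, 2, 128, 16))
print(cut_through_latency(4, 2, 128, 16))
```
&lt;br /&gt;
In this model the serialization term is paid once per hop with store and forward, but only once in total with cut-through, which is why the gap grows with hop count and packet length.&lt;br /&gt;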
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating its size.&lt;br /&gt;
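&lt;br /&gt;
A minimal sketch of the idea, assuming the header holds the complete list of output ports and each switch consumes one entry; the names make_packet and forward are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def make_packet(route, payload):
    """The source node stores the full list of output ports in the header."""
    return {"route": list(route), "payload": payload}


def forward(packet):
    """At each switch: remove the next port from the header and return it."""
    return packet["route"].pop(0)


# The source computes the path east, east, north before injecting the packet.
packet = make_packet(["E", "E", "N"], "data")
first_port = forward(packet)
print(first_port, packet["route"])
```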
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that a packet will take toward its destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
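&lt;br /&gt;
The table lookup can be sketched as follows for a hypothetical three-node line network 0 - 1 - 2. The table contents and the names ROUTING_TABLES, NEXT_NODE and deliver are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
# One table per switch, mapping a destination to an output port
# (hypothetical contents for the line network 0 - 1 - 2).
ROUTING_TABLES = {
    0: {1: "east", 2: "east"},
    1: {0: "west", 2: "east"},
    2: {0: "west", 1: "west"},
}

# Which node each (node, port) pair leads to.
NEXT_NODE = {(0, "east"): 1, (1, "east"): 2, (1, "west"): 0, (2, "west"): 1}


def deliver(src, dst):
    """Follow per-switch table lookups until the destination is reached."""
    path = [src]
    node = src
    while node != dst:
        port = ROUTING_TABLES[node][dst]  # the header carries only dst
        node = NEXT_NODE[(node, port)]
        path.append(node)
    return path


print(deliver(0, 2))
```
&lt;br /&gt;
Each table needs one entry per possible destination, which illustrates why this approach grows poorly with the number of nodes.&lt;br /&gt;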
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
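&lt;br /&gt;
The per-cycle port assignment described above can be sketched as below. Ranking flits oldest-first is an assumed priority rule, and the name assign_ports and the tuple layout are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def assign_ports(flits, ports):
    """Give every incoming flit an output port in the same cycle.

    `flits` is a list of (age, desired_port) tuples; older flits win
    (an assumed priority rule). Losers are deflected to any free port,
    which is always possible when there are at least as many output
    ports as incoming flits.
    """
    free = list(ports)
    assignment = {}
    ranked = sorted(range(len(flits)), key=lambda i: flits[i][0], reverse=True)
    for i in ranked:
        desired = flits[i][1]
        # Take the desired port if still free, otherwise deflect.
        port = desired if desired in free else free[0]
        free.remove(port)
        assignment[i] = port
    return assignment


# Two flits contend for the north port; the older one (age 9) wins.
print(assign_ports([(5, "N"), (9, "N"), (1, "E")], ["N", "E", "S", "W"]))
```
&lt;br /&gt;
No flit is ever buffered or dropped in this sketch: every flit leaves on some port each cycle, which is why the output port count must be at least the input port count.&lt;br /&gt;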
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[15]]]&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and a packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well: for example, hypercubes can be routed in dimension order by correcting the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
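&lt;br /&gt;
The 2D mesh case can be sketched as below, assuming nodes are addressed by (x, y) coordinates; the names step_toward and xy_route are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def step_toward(current, target):
    """Return +1 or -1, the unit step from current toward target.

    Must only be called when current != target.
    """
    diff = target - current
    return diff // abs(diff)


def xy_route(src, dst):
    """Return the nodes visited from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += step_toward(x, dst[0])
        path.append((x, y))
    while y != dst[1]:
        y += step_toward(y, dst[1])
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```
&lt;br /&gt;
Because the same pair of nodes always yields the same path, this is an instance of deterministic routing.&lt;br /&gt;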
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, NoCs are the subject of several lines of research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption; the more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models.] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture.] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 97-104, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;br /&gt;
&lt;br /&gt;
[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62030</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62030"/>
		<updated>2012-04-15T02:09:05Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural consequence of the high integration levels now reached in multiprocessor design. Moore's law predicts that the number of transistors in an integrated circuit doubles roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, using multi-core (more than one core) and many-core (so many cores that traditional multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and Intel has more recently coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved poorly suited to communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are much less of a problem on chip because the internal wiring provides far higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (systems-on-a-chip): chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (multiprocessor systems-on-chip): SoCs that use multi-core technology.&lt;br /&gt;
For the particular theme of this article, at least three different acronyms refer to the same concept. These are new technologies, and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
Several metrics help classify and compare the different topology types. Some of them are defined in chapter 12 of Solihin's textbook [[#References|[7]]].&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count between any pair of nodes, measured along a shortest path&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
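&lt;br /&gt;
The metrics above can be made concrete with a small sketch. The Python below is an illustration only: it assumes a k x k 2D mesh, counts one hop per link traversed, and uses hypothetical helper names (mesh_neighbors, hop_counts, shortest_path_count and so on) that are not taken from [[#References|[7]]].&lt;br /&gt;

```python
from collections import deque


def mesh_neighbors(k):
    """Adjacency of a k x k 2D mesh; nodes are (x, y) tuples."""
    adj = {}
    for x in range(k):
        for y in range(k):
            candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            # Keep only the candidates that fall inside the mesh.
            adj[(x, y)] = [(i, j) for i, j in candidates
                           if i in range(k) and j in range(k)]
    return adj


def hop_counts(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree(adj, node):
    """Number of one-hop neighbors of a node."""
    return len(adj[node])


def diameter(adj):
    """Largest minimum hop count over all pairs of nodes."""
    return max(max(hop_counts(adj, s).values()) for s in adj)


def shortest_path_count(adj, src, dst):
    """Path diversity: number of distinct shortest paths from src to dst."""
    dist = hop_counts(adj, src)
    ways = {src: 1}
    # Visit nodes in order of distance so predecessors are counted first.
    for node in sorted(dist, key=dist.get):
        for nxt in adj[node]:
            if dist[nxt] == dist[node] + 1:
                ways[nxt] = ways.get(nxt, 0) + ways[node]
    return ways[dst]
```

For a 4 x 4 mesh this gives a degree of 2 at a corner and 4 in the interior, a diameter of 6, and 20 shortest paths between opposite corners.&lt;br /&gt;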
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct or indirect.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called its neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through dedicated elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D mesh. An example of an indirect topology is the butterfly, from which the flattened butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology resembles the physical die layout, making it well suited to tiled architectures. The combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh described as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n in general), although some ports on edge nodes are unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
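&lt;br /&gt;
The average minimum hop count can be checked by brute force. The sketch below is only an illustration under stated assumptions: it averages the Manhattan distance over all ordered source and destination pairs, including a node paired with itself, and the function names are hypothetical. Exact fractions are used so that the comparison with the closed form is not affected by rounding.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n):
    """Exact mean Manhattan distance over all ordered node pairs
    of a mesh with k nodes in each of n dimensions."""
    nodes = list(product(range(k), repeat=n))
    total = sum(abs(a - b)
                for s in nodes
                for d in nodes
                for a, b in zip(s, d))
    return Fraction(total, len(nodes) ** 2)


def closed_form(k, n):
    """n(k/3 - 1/(3k)), kept as an exact fraction."""
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))
```

Under this averaging convention the brute-force value for k = 3, n = 2 is 16/9, which equals n(k/3 - 1/(3k)). For k = 4, n = 2 it is 5/2, slightly below nk/3 (about 2.67), so the even-k entry behaves as a close approximation under these assumptions.&lt;br /&gt;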
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling. However, it is less scalable than it may seem, because its degree of concentration is limited by the crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
Concentration lowers the bisection channel count, but this can be offset by adding express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-multipoint unidirectional links. This provides a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh. This makes the flattened butterfly and MECS better solutions, with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The number of bisection channels is 1 for CMesh in both cases, but it is doubled, and even quadrupled, between MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera's iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, gathering the whole packet into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Cut-through protocols lower latency, but because forwarding starts before the whole packet has arrived, corrupted packets can be propagated, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
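&lt;br /&gt;
The latency difference between the two approaches can be sketched with a simple zero-load model. This is an assumption-laden illustration, not a result from [[#References|[12]]]: it ignores contention, and the parameter names (hops, router_delay, packet_bits, channel_bits_per_cycle) are hypothetical. Store and forward pays the full serialization time at every hop, whereas cut-through and wormhole pay it only once.&lt;br /&gt;

```python
def store_and_forward_latency(hops, router_delay, packet_bits,
                              channel_bits_per_cycle):
    """Zero-load latency in cycles when every hop buffers the whole packet."""
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, router_delay, packet_bits,
                        channel_bits_per_cycle):
    """Zero-load latency in cycles when the header is forwarded at once
    and the rest of the packet streams behind it."""
    serialization = packet_bits / channel_bits_per_cycle
    return hops * router_delay + serialization
```

With 4 hops, a 2-cycle router delay, a 512-bit packet and 64-bit channels, this model gives 40 cycles for store and forward and 16 cycles for cut-through.&lt;br /&gt;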
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since none of the waiting activities can finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but never reaches its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  Because this route information is carried in every packet, packet size increases.&lt;br /&gt;
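&lt;br /&gt;
A minimal sketch of the idea follows, assuming a 2D mesh in which east and north are the directions of increasing x and y. The packet layout and the function names (build_source_route, make_packet, forward) are hypothetical and chosen only for illustration. The source writes the complete list of output ports into the header, and each router simply consumes the first entry.&lt;br /&gt;

```python
def build_source_route(src, dst):
    """Source node precomputes the whole path as a list of output ports
    (all X hops first, then all Y hops)."""
    (sx, sy), (dx, dy) = src, dst
    x_port = "E" if dx > sx else "W"
    y_port = "N" if dy > sy else "S"
    return [x_port] * abs(dx - sx) + [y_port] * abs(dy - sy)


def make_packet(src, dst, payload):
    """The header carries one route entry per hop, so it grows with distance."""
    return {"route": build_source_route(src, dst), "payload": payload}


def forward(packet):
    """Executed at each router: pop the next output port from the header.
    Returns 'LOCAL' once the route is exhausted (packet has arrived)."""
    if not packet["route"]:
        return "LOCAL"
    return packet["route"].pop(0)
```

A packet from (0, 0) to (2, 1) carries the route E, E, N, which shows how the header length depends on the number of hops.&lt;br /&gt;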
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables at the switches to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
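&lt;br /&gt;
The table-based variant can be sketched as follows. This is an illustration under assumptions: a k x k mesh, tables filled in X-then-Y order, and hypothetical names (build_routing_table, deliver, STEP). Each router holds one entry per destination, so the table size equals the number of nodes, which is the scaling cost mentioned above.&lt;br /&gt;

```python
# Coordinate change produced by leaving a router through each output port.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def build_routing_table(router, k):
    """One entry per destination of a k x k mesh: the output port to use."""
    rx, ry = router
    table = {}
    for dx in range(k):
        for dy in range(k):
            if dx != rx:
                table[(dx, dy)] = "E" if dx > rx else "W"
            elif dy != ry:
                table[(dx, dy)] = "N" if dy > ry else "S"
            else:
                table[(dx, dy)] = "LOCAL"
    return table


def deliver(src, dst, tables):
    """Follow per-router lookups from src to dst.
    The packet header carries only dst; every hop is decided locally."""
    path = [src]
    while tables[path[-1]][dst] != "LOCAL":
        step_x, step_y = STEP[tables[path[-1]][dst]]
        x, y = path[-1]
        path.append((x + step_x, y + step_y))
    return path
```

For a 3 x 3 mesh every router stores 9 entries, and a packet from (0, 0) to (2, 1) visits (0, 0), (1, 0), (2, 0) and (2, 1).&lt;br /&gt;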
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, the packet can be routed with a small number of bits and a few logic gates per router, which saves area compared with a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]]  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]  &lt;br /&gt;
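&lt;br /&gt;
The deflection step at a single router can be sketched as below. This is a simplified illustration rather than the exact arbitration of [[#References|[15]]]: it assumes flits are ranked by age (oldest first), that each flit lists the output ports that bring it closer to its destination, and that the router has at least as many output ports as arriving flits. The function name and the tuple layout are hypothetical.&lt;br /&gt;

```python
def assign_output_ports(flits, ports):
    """flits: list of (flit_id, age, preferred_ports) tuples.
    ports: list of output port names.
    Returns a dict mapping each flit_id to the port it is sent to."""
    if len(flits) > len(ports):
        raise ValueError("more flits than output ports")
    free = list(ports)
    assignment = {}
    # Oldest flit gets first pick of the ports.
    for flit_id, age, preferred in sorted(flits, key=lambda f: -f[1]):
        productive = [p for p in preferred if p in free]
        # No buffering: if no preferred port is free, the flit is
        # deflected to whatever free port remains.
        chosen = productive[0] if productive else free[0]
        free.remove(chosen)
        assignment[flit_id] = chosen
    return assignment
```

Every flit leaves the router in the same cycle, which is why the number of output ports must be at least the number of input ports.&lt;br /&gt;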
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  This is done with the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[15]]]&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well. For example, hypercubes can be routed in dimension order by correcting the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
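&lt;br /&gt;
Both cases can be sketched in a few lines. The code is an illustration only: the function names are hypothetical, mesh nodes are (x, y) tuples, hypercube nodes are integers whose bits are the coordinates, and the hypercube dimensions are assumed to be corrected from bit 0 upward.&lt;br /&gt;

```python
def xy_route(src, dst):
    """Nodes visited in a 2D mesh when X is fully corrected before Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hypercube_route(src, dst, dimensions):
    """Nodes visited in a binary hypercube when the differing address
    bits are corrected one at a time, starting from bit 0."""
    node = src
    path = [node]
    for bit in range(dimensions):
        if ((src ^ dst) >> bit) % 2 == 1:
            node ^= 2 ** bit  # flip this address bit: one hop
            path.append(node)
    return path
```

Because the order of dimensions is fixed, the same source and destination always produce the same path, which is what makes the scheme deterministic.&lt;br /&gt;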
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the abundance of technologies in commercial designs, there are many lines of NoC research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only to connect several cores but also to route traffic; see [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] 2011 Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 97-104, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62029</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62029"/>
		<updated>2012-04-15T02:07:22Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural consequence of the high integration levels now reached in multiprocessor design. Moore's law predicts that the number of transistors in an integrated circuit doubles roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, using multi-core (more than one core) and many-core (so many cores that traditional multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and Intel has more recently coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved poorly suited to communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are much less of a problem on chip because the internal wiring provides far higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the particular theme of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
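&lt;br /&gt;
The metrics above can be made concrete with a small sketch. The following Python example is illustrative only: it assumes a k x k 2D mesh without wrap-around links, counts one hop per link traversed, and uses hypothetical helper names (mesh_neighbors, hop_counts, degree, diameter) that do not come from the cited textbook.&lt;br /&gt;

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, k):
    """Neighbors of (x, y) in a k x k 2D mesh (assumed: no wrap-around links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(src, k):
    """Breadth-first search giving the minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        for nxt in mesh_neighbors(cur, k):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def degree(node, k):
    """Degree: number of nodes reachable from node in one hop."""
    return len(mesh_neighbors(node, k))


def diameter(k):
    """Diameter: the largest minimum hop count over all pairs of nodes."""
    nodes = product(range(k), repeat=2)
    return max(max(hop_counts(n, k).values()) for n in nodes)


if __name__ == "__main__":
    print(degree((0, 0), 4), degree((1, 1), 4), diameter(4))
```

In this sketch a corner node of a 4x4 mesh has a lower degree than an interior node, and the diameter is the corner-to-corner distance.&lt;br /&gt;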
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
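&lt;br /&gt;
The average minimum hop count expressions listed above can be evaluated directly. The short Python sketch below only transcribes those two expressions using exact fractions; the function name avg_min_hops is a hypothetical choice for illustration, and the odd case is read as n(k/3 − 1/(3k)).&lt;br /&gt;

```python
from fractions import Fraction


def avg_min_hops(k, n):
    """Average minimum hop count of a mesh described as a k-ary n-cube.

    Transcribes the two cases quoted in the text:
    n*k/3 for even k, and n*(k/3 - 1/(3*k)) for odd k.
    """
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


if __name__ == "__main__":
    # A 4x4 2D mesh (k = 4, n = 2) and a 3x3 2D mesh (k = 3, n = 2).
    print(avg_min_hops(4, 2), avg_min_hops(3, 2))
```

For example, the 4x4 2D mesh used earlier (k = 4, n = 2) gives 8/3, that is, between two and three hops on average.&lt;br /&gt;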
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. Its low diameter results in low network latency.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k which is the network radix (nodes/dimension) and c (concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The number of bisection channels is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built as an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption, so a scheme to handle corruption must be implemented.[[#References|[12]]]&lt;br /&gt;
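&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a deliberately simplified model. The Python sketch below is not taken from the cited references; it assumes that every link carries one flit per cycle, that routers add no extra delay, and that there is no contention. The function names are hypothetical.&lt;br /&gt;

```python
def store_and_forward_cycles(hops, flits):
    """Each node receives the whole packet (flits cycles) before forwarding it."""
    return hops * flits


def cut_through_cycles(hops, flits):
    """The head flit advances one hop per cycle and the rest follow in a pipeline."""
    return hops + flits - 1


if __name__ == "__main__":
    # An 8-flit packet crossing 4 hops under the assumptions stated above.
    print(store_and_forward_cycles(4, 8), cut_through_cycles(4, 8))
```

Under these assumptions the store and forward latency grows with the product of hop count and packet length, while the cut-through latency grows with their sum.&lt;br /&gt;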
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating its size.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies relative to it.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
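&lt;br /&gt;
The port assignment step described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the published BLESS arbitration logic: it assumes that older flits win contention (an oldest-first priority chosen here for illustration), that a losing flit is deflected to the first free port, and that the router has at least as many output ports as incoming flits. The name assign_ports is hypothetical.&lt;br /&gt;

```python
def assign_ports(flits, ports):
    """Assign every incoming flit to an output port, deflecting the losers.

    flits: list of (age, desired_port) tuples.
    ports: list of distinct output port names.
    Returns a list of (age, desired_port, granted_port, deflected) tuples,
    ordered from highest to lowest priority.
    """
    # Bufferless operation needs a port for every flit (outputs cover inputs).
    if len(flits) not in range(len(ports) + 1):
        raise ValueError("more flits than output ports")
    free = list(ports)
    result = []
    # Assumption: the oldest flit has the highest priority.
    for age, desired in sorted(flits, key=lambda f: f[0], reverse=True):
        granted = desired if desired in free else free[0]
        free.remove(granted)
        result.append((age, desired, granted, granted != desired))
    return result


if __name__ == "__main__":
    incoming = [(5, "N"), (9, "N"), (1, "E")]
    for entry in assign_ports(incoming, ["N", "E", "S", "W"]):
        print(entry)
```

No flit is ever buffered or dropped in this sketch: every flit leaves on some port in the same step, which is why the two constraints on the topology are needed.&lt;br /&gt;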
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit must obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme (the Golden Packet scheme): a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each direction is chosen in order and routed completely before switching to the next direction.  For example, in a 2D mesh, dimension order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This is extensible to higher order connections as well, for example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of different bit positions of the source and destination address, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
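&lt;br /&gt;
The 2D mesh case described above can be sketched in a few lines. The Python example below is a minimal illustration, assuming nodes are addressed by (x, y) coordinates and that the X-dimension is routed first; the function names are hypothetical.&lt;br /&gt;

```python
def step_toward(current, target):
    """Return +1 or -1, the unit step from current toward target (must differ)."""
    delta = target - current
    return delta // abs(delta)


def xy_route(src, dst):
    """Dimension-order route in a 2D mesh: finish X completely, then route Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += step_toward(x, dst[0])
        path.append((x, y))
    while y != dst[1]:
        y += step_toward(y, dst[1])
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because the path depends only on the source and destination coordinates, the same pair of nodes always uses the same route, which is what makes the strategy deterministic.&lt;br /&gt;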
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing,] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;br /&gt;
&lt;br /&gt;
[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62028</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62028"/>
		<updated>2012-04-15T01:59:32Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Logic Based Distributed Routing (LBDR) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached in multiprocessor design. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This prediction has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and Intel has recently coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to have limited application for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the particular theme of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It can use non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking the whole packet into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but corrupted packets may already be forwarded before they are detected, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
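&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a simple contention-free model. This sketch is not taken from [[#References|[12]]]; the function names, the fixed per-hop router delay, and the example numbers are assumptions chosen only for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    """Zero-load latency in cycles when every hop buffers the whole packet.

    Assumption: no contention, identical links, fixed router delay per hop.
    """
    serialization = packet_bits / link_bits_per_cycle
    # The full serialization delay is paid again at every hop.
    return hops * (router_cycles + serialization)


def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    """Zero-load latency in cycles when forwarding starts after the header.

    Assumption: the head is never blocked, so serialization is paid once.
    """
    serialization = packet_bits / link_bits_per_cycle
    return hops * router_cycles + serialization


if __name__ == "__main__":
    # Hypothetical example: 4 hops, 512-bit packet, 64-bit links, 2-cycle routers.
    print(store_and_forward_latency(4, 512, 64, 2))
    print(cut_through_latency(4, 512, 64, 2))
```
&lt;br /&gt;
In this model the store and forward latency grows with the product of hop count and packet length, while the cut-through latency grows with their sum.&lt;br /&gt;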
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  This route information is carried in every packet, which increases packet size.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine the direction of the packet's destination relative to itself.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]]  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
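&lt;br /&gt;
The deflection idea can be sketched for a single router and a single cycle. This is a simplified illustration rather than the BLESS arbitration itself: the oldest-first priority, the port names, and the function name are assumptions. The check on the number of flits mirrors the constraint above that a router has at least as many output ports as input ports.&lt;br /&gt;
&lt;br /&gt;
```python
def allocate_ports(flits, ports=("N", "E", "S", "W")):
    """Assign every incoming flit to a distinct output port.

    flits: list of (flit_id, age, preferred_port) tuples.
    Returns {flit_id: (assigned_port, was_deflected)}.
    Assumption: older flits win contention (oldest-first priority).
    """
    if len(flits) > len(ports):
        raise ValueError("more incoming flits than output ports")
    free = list(ports)
    assignment = {}
    for flit_id, _age, preferred in sorted(flits, key=lambda f: f[1], reverse=True):
        if preferred in free:
            port = preferred   # the flit gets the productive port it asked for
        else:
            port = free[0]     # the flit is deflected to some other free port
        free.remove(port)
        assignment[flit_id] = (port, port != preferred)
    return assignment


if __name__ == "__main__":
    incoming = [("a", 5, "N"), ("b", 9, "N"), ("c", 1, "E")]
    print(allocate_ports(incoming))
```
&lt;br /&gt;
No flit is ever buffered or dropped in this sketch: every flit leaves on some port in the same cycle, which is why a deflected flit may in turn displace a younger one.&lt;br /&gt;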
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port.  This is done with the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well. For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
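&lt;br /&gt;
Both cases described above can be sketched in a few lines. The function names and the lowest-bit-first order for the hypercube are assumptions made for illustration; any fixed dimension order gives a deterministic route.&lt;br /&gt;
&lt;br /&gt;
```python
def xy_route(src, dst):
    """Nodes visited on a 2D mesh when routing fully in X, then in Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def ecube_route(src, dst, dimensions):
    """Nodes visited on a hypercube when differing address bits are
    corrected one at a time, lowest bit position first (an assumption)."""
    node = src
    path = [node]
    for bit in range(dimensions):
        weight = 2 ** bit
        if (node // weight) % 2 != (dst // weight) % 2:
            node ^= weight  # flip this address bit, i.e. cross this dimension
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(ecube_route(0b000, 0b101, 3))
```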
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Beyond the many technologies used in commercial designs, there are several active lines of NoC research. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores but also for routing traffic ([http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics]). This technology was actually used in the IBM Cell chip that was mentioned in above sections. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate or length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing ] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62027</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62027"/>
		<updated>2012-04-15T01:58:11Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Source Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited applicability for communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
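&lt;br /&gt;
Some of these metrics can be computed directly for a small network. The sketch below uses a k x k 2D mesh as a hypothetical example and counts hops as link traversals found by breadth-first search; the function names are assumptions, not taken from [[#References|[7]]].&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(node, k):
    """Neighbors of a node in a k x k 2D mesh (no wrap-around links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts_from(src, k):
    """Minimal hop count from src to every node, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mesh_diameter(k):
    """Maximum minimal hop count over all pairs of nodes."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts_from(n, k).values()) for n in nodes)


if __name__ == "__main__":
    print(len(mesh_neighbors((0, 0), 4)))  # degree of a corner node
    print(len(mesh_neighbors((1, 1), 4)))  # degree of an interior node
    print(mesh_diameter(4))
```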
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One is that the nodes along the edges have a lower degree than the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, because the network requires two channels in each dimension (2n in general), although some ports on the edge nodes will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need for a 1:1 relationship between the number of routers and the number of cores. The concentration mesh reduces the router-to-core ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important when scaling the solution. However, it is less scalable than it might seem, because the degree of concentration is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
Concentration lowers the bisection channel count, but this can be mitigated by adding express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It can use non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption and must implement a scheme to handle this.[[#References|[12]]]&lt;br /&gt;
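&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a first-order model. The sketch below is only an illustration under stated assumptions: it ignores contention, and the function and parameter names (hops, packet_bits, link_bits_per_cycle, router_cycles) are hypothetical and do not come from the cited references.&lt;br /&gt;
&lt;br /&gt;
```python
def serialization_cycles(packet_bits, link_bits_per_cycle):
    """Cycles needed to push a whole packet across one link (ceiling division)."""
    return -(-packet_bits // link_bits_per_cycle)


def store_and_forward_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    """Every hop receives the complete packet before forwarding it."""
    per_hop = router_cycles + serialization_cycles(packet_bits, link_bits_per_cycle)
    return hops * per_hop


def cut_through_latency(hops, packet_bits, link_bits_per_cycle, router_cycles):
    """Only the header pays the router delay per hop; the body streams behind it."""
    body = serialization_cycles(packet_bits, link_bits_per_cycle)
    return hops * router_cycles + body


if __name__ == "__main__":
    # Assumed example: 4 hops, 512-bit packet, 64-bit links, 2-cycle routers.
    print(store_and_forward_latency(4, 512, 64, 2))
    print(cut_through_latency(4, 512, 64, 2))
```
&lt;br /&gt;
In this model the serialization delay is paid once per hop for store and forward but only once in total for cut-through, which is where the latency advantage comes from.&lt;br /&gt;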
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.  The extra route information is sent in each packet, inflating the packet size.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
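&lt;br /&gt;
The deflection step described above can be sketched in Python. This is a minimal illustration, not the BLESS implementation: the function name, the port labels, and the assumption that the requests arrive already sorted by priority are hypothetical, and the sketch relies on the stated constraint that a router has at least as many output ports as incoming flits.&lt;br /&gt;
&lt;br /&gt;
```python
def deflect_arbitrate(requests, ports):
    """Assign one output port to every incoming flit in a single cycle.

    requests: list of (flit_id, desired_port), highest priority first (assumed).
    ports:    list of output port labels; must be at least as long as requests.
    """
    free = list(ports)
    assignment = {}
    losers = []
    for flit, wanted in requests:
        if wanted in free:
            # The highest-priority contender wins its desired port.
            assignment[flit] = wanted
            free.remove(wanted)
        else:
            losers.append(flit)
    for flit in losers:
        # A losing flit is deflected to any port that is still free.
        assignment[flit] = free.pop(0)
    return assignment


if __name__ == "__main__":
    incoming = [("a", "N"), ("b", "N"), ("c", "E")]
    print(deflect_arbitrate(incoming, ["N", "E", "S", "W"]))
```
&lt;br /&gt;
In this sketch every flit leaves the router in the same cycle, which is why no buffering is needed; a losing flit simply takes whichever port is left.&lt;br /&gt;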
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  By relaxing the requirements so that only the highest-priority flit is guaranteed to obtain its requested port, livelock can still be prevented.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered (the Golden Packet scheme).&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each dimension is chosen in order and routed completely before switching to the next dimension.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This is extensible to higher-order networks as well.  For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
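&lt;br /&gt;
Both cases described above can be sketched in a few lines of Python. This is a minimal illustration, assuming mesh nodes are addressed as (x, y) tuples and hypercube nodes as integers; the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def _sign(delta):
    """Return +1 or -1 for a non-zero integer delta."""
    return delta // abs(delta)


def xy_route(src, dst):
    """Nodes visited in a 2D mesh when X is fully routed before Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += _sign(dst[0] - x)
        path.append((x, y))
    while y != dst[1]:
        y += _sign(dst[1] - y)
        path.append((x, y))
    return path


def hypercube_route(src, dst, dims):
    """Nodes visited in a hypercube, correcting differing address bits in order."""
    node = src
    path = [node]
    for bit in range(dims):
        mask = 2 ** bit
        if ((node ^ dst) // mask) % 2 == 1:
            node ^= mask  # cross the link along this dimension
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(hypercube_route(0b000, 0b101, 3))
```
&lt;br /&gt;
Because the order of dimensions is fixed, the same source and destination pair always produces the same path, which is what makes the scheme deterministic.&lt;br /&gt;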
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundant technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores but also for routing traffic ([http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics]). This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption; the more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing ] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62026</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62026"/>
		<updated>2012-04-15T01:56:37Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Source Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To make communication among these many cores feasible within a single chip, traditional off-chip network designs have proved to be of limited use. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is defined as the number of nodes that are neighbors of a node, in other words, that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes a message must pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
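&lt;br /&gt;
The parameters listed above can be evaluated for a concrete mesh with a short Python sketch. It simply encodes the formulas as quoted here from [[#References|[5]]], using exact fractions; the function names are hypothetical, and the channel-load formula is only applied for even k, as stated above.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction


def mesh_switch_degree(n):
    """Two channels per dimension, so 2n ports (some unused on the edges)."""
    return 2 * n


def mesh_avg_min_hops(k, n):
    """Average minimum hop count of a k-ary n-cube mesh, per the table above."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def mesh_bisection_channel_load(k):
    """Channel load across the bisection under uniform random traffic, even k only."""
    if k % 2 != 0:
        raise ValueError("the quoted formula k/4 is stated for even k")
    return Fraction(k, 4)


if __name__ == "__main__":
    # A 4-ary 2-cube, i.e. a 4x4 2D mesh.
    print(mesh_switch_degree(2), mesh_avg_min_hops(4, 2), mesh_bisection_channel_load(4))
```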
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels were proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels are defined by their authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  The design is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k (the network radix, in nodes per dimension) and c (the concentration factor, 1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built as an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption and must implement a scheme to handle this.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
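&lt;br /&gt;
The contention rule above can be sketched as follows. This is a minimal illustration rather than the BLESS arbiter itself: the oldest-first ordering, the tuple layout, and the name assign_ports are assumptions made for the example.&lt;br /&gt;

```python
def assign_ports(flits, ports):
    '''Assign one output port to every flit that arrives in a cycle.

    flits: list of (age, flit_id, wanted_port) tuples (hypothetical layout).
    ports: list of output port names; there must be at least as many
           ports as flits, so that every flit can leave the router.
    Returns a dict mapping flit_id to (port, deflected_flag).
    '''
    free = list(ports)
    result = {}
    # Oldest flit first: this priority order is an assumption of the sketch.
    for age, flit_id, wanted in sorted(flits, reverse=True):
        if wanted in free:
            port, deflected = wanted, False
        else:
            # The desired port is taken, so the flit is deflected
            # to some other free port instead of being buffered.
            port, deflected = free[0], True
        free.remove(port)
        result[flit_id] = (port, deflected)
    return result


flits = [(5, 'a', 'N'), (9, 'b', 'N'), (1, 'c', 'E')]
allocation = assign_ports(flits, ['N', 'E', 'S', 'W'])
```

In this example flit b wins the contested port N, flit a is deflected to E, and flit c is in turn deflected to S, which illustrates how one deflection can cause further undesirable routing.&lt;br /&gt;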
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  The requirement is relaxed so that only the highest-priority flit is guaranteed to obtain its requested port, which is enough to prevent livelock.  Priority is assigned with a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by moving packets along the dimensions that correspond to the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
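&lt;br /&gt;
The 2D mesh case can be sketched as follows. The function name xy_route and the representation of nodes as (x, y) tuples are assumptions made for illustration only.&lt;br /&gt;

```python
def xy_route(src, dst):
    '''Return the list of nodes visited from src to dst on a 2D mesh.

    Nodes are (x, y) tuples. The packet is routed completely in the
    X dimension first, and only then in the Y dimension.
    '''
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += max(-1, min(1, dst[0] - x))
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst[1]:
        y += max(-1, min(1, dst[1] - y))
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because the path depends only on the source and destination, the same pair of nodes always uses the same path, which is what makes the strategy deterministic.&lt;br /&gt;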
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate or length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing ] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62025</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62025"/>
		<updated>2012-04-15T01:54:30Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip network designs have proved to have limited applicability to communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of neighbors of a node, in other words, the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message must pass to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
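&lt;br /&gt;
As a rough illustration of these metrics, the sketch below computes degree, hop count, and diameter for a small 2D mesh by breadth-first search. The function names are hypothetical, and hop count is taken here as the number of links traversed, which is an assumption of the example.&lt;br /&gt;

```python
from collections import deque


def mesh_neighbors(node, k):
    '''Neighbors of (x, y) in a k by k 2D mesh without wraparound links.'''
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(src, k):
    '''Breadth-first search: minimal hop count from src to every node.'''
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree(node, k):
    '''Number of nodes reachable from node in one hop.'''
    return len(mesh_neighbors(node, k))


def diameter(k):
    '''Maximum hop count over all pairs of nodes.'''
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)
```

For a 4x4 mesh, degree((0, 0), 4) gives the degree of a corner node and diameter(4) gives the corner-to-corner hop count.&lt;br /&gt;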
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
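&lt;br /&gt;
The odd-k expression for the average minimum hop count can be checked by enumeration on small meshes. The sketch below is only a sanity check under stated assumptions: the average is taken over all ordered source/destination pairs, including a node paired with itself, hop count is the number of links traversed, and the function names are hypothetical. Only the odd-k expression is checked here.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n):
    '''Average minimal hop count of a k-ary n-dimensional mesh, by enumeration.'''
    nodes = list(product(range(k), repeat=n))
    # Without wraparound links the minimal hop count is the Manhattan distance.
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


def formula_odd_k(k, n):
    '''Closed form n(k/3 - 1/(3k)) listed above for odd k.'''
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


matches = all(
    average_min_hops(k, 2) == formula_odd_k(k, 2) for k in (3, 5)
)
```

Exact fractions are used instead of floating point so that the comparison with the closed form is not affected by rounding.&lt;br /&gt;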
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is less scalable than it might seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k in the case of the Concentrated Mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it gets doubled and even quadrupled between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and is reduced in the case of the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching worm hole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In worm hole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption and must implement a scheme to handle this.[[#References|[12]]]&lt;br /&gt;
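&lt;br /&gt;
The latency difference between the two approaches can be illustrated with a simple cycle-count model. This is a sketch under strong assumptions that are not taken from the cited sources: one flit crosses one link per cycle, there is no contention, and routers add no extra delay. The function names are hypothetical.&lt;br /&gt;

```python
def store_and_forward_cycles(hops, flits):
    '''Every hop waits for the whole message before forwarding it.'''
    return hops * flits


def cut_through_cycles(hops, flits):
    '''The header advances one hop per cycle and the body streams behind it.'''
    return hops + flits - 1


saf = store_and_forward_cycles(4, 8)
ct = cut_through_cycles(4, 8)
```

Under these assumptions an 8-flit message crossing 4 hops takes 32 cycles with store and forward and 11 cycles with cut-through, because the transfer time is paid once instead of once per hop.&lt;br /&gt;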
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies relative to itself.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  The requirement is relaxed so that only the highest-priority flit is guaranteed to obtain its requested port, which is enough to prevent livelock.  Priority is assigned with a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
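&lt;br /&gt;
One way to picture the rotation of the special status is the sketch below. It is an illustration under assumptions, not the CHIPPER hardware: the epoch length, the packet ID space, the flit layout, the arbitrary tie-break, and the function names are all hypothetical.&lt;br /&gt;

```python
def golden_id(cycle, epoch_length, num_packet_ids):
    '''Packet ID that holds the golden status during the given cycle.

    The status moves to the next ID every epoch_length cycles and wraps
    around, so every ID becomes golden again after a bounded time.
    '''
    return (cycle // epoch_length) % num_packet_ids


def pick_winner(flit_a, flit_b, cycle, epoch_length, num_packet_ids):
    '''Arbitrate between two contending flits given as (packet_id, seq) pairs.'''
    gold = golden_id(cycle, epoch_length, num_packet_ids)
    if flit_b[0] == gold and flit_a[0] != gold:
        return flit_b
    # Otherwise flit_a wins; this tie-break is arbitrary in the sketch.
    return flit_a


winner = pick_winner((1, 0), (2, 0), 25, 10, 4)
```

With an epoch of 10 cycles and 4 packet IDs, packet 2 is golden at cycle 25, so its flit wins the arbitration regardless of which input it arrives on.&lt;br /&gt;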
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next.  For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This extends to higher-dimensional networks as well.  For example, hypercubes can be routed in dimension order by moving packets along the dimensions that correspond to the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
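&lt;br /&gt;
The hypercube case can be sketched as follows. The function name ecube_route, the use of integer node addresses, and the choice to visit bit positions from lowest to highest are assumptions made for illustration.&lt;br /&gt;

```python
def ecube_route(src, dst, dims):
    '''Return the node addresses visited from src to dst in a hypercube.

    Addresses are integers with dims bits. Bit positions in which the
    current node and dst differ are corrected one at a time.
    '''
    path = [src]
    node = src
    for bit in range(dims):
        mask = 2 ** bit
        # Compare this bit of the current node and of the destination.
        if (node // mask) % 2 != (dst // mask) % 2:
            node = node ^ mask  # flip the bit: one hop along this dimension
            path.append(node)
    return path


route = ecube_route(0b000, 0b101, 3)
```

In this example the addresses differ in bit positions 0 and 2, so the packet takes exactly two hops, one per differing bit.&lt;br /&gt;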
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate or length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;br /&gt;
&lt;br /&gt;
[14] Veselovsky, G.; Batovski, D.A.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=1183584&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing ] 2003 Parallel, Distributed and Network-Based Processing, 2003. Proceedings. Eleventh Euromicro Conference on, vol., no., pp.173-177, 5-7 Feb. 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62024</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62024"/>
		<updated>2012-04-15T01:51:39Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip network designs have proved to have limited application to communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the butterfly, from which the Flattened Butterfly described below is derived.&lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.&lt;br /&gt;
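&lt;br /&gt;
The lower degree of edge nodes can be checked with a short Python sketch. The function names mesh_degree and degree_histogram and the (x, y) coordinate convention are assumptions made for this illustration; they do not come from the cited references.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def mesh_degree(x, y, k):
    """Count the neighbors of node (x, y) in a k x k 2D mesh (no wraparound links)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # A candidate is a real neighbor only if both coordinates stay inside the mesh.
    return sum(1 for (nx, ny) in candidates if nx in range(k) and ny in range(k))


def degree_histogram(k):
    """Map each degree value to the number of nodes that have it."""
    return Counter(mesh_degree(x, y, k) for x in range(k) for y in range(k))


if __name__ == "__main__":
    print(sorted(degree_histogram(4).items()))
```
&lt;br /&gt;
For a 4x4 mesh this reports four corner nodes of degree 2, eight other edge nodes of degree 3, and four central nodes of degree 4.&lt;br /&gt;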
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh would be 4, as its network requires two channels in each dimension or 2n, although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration Mesh reduces the router-to-core ratio to 1:4, i.e. each router serves four computing nodes.&lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel.&lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration).&lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k in the case of Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and is reduced in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built around an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but the protocol can suffer from packet corruption and must implement a scheme to handle this.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the packet's destination lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which saves resources compared with a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by requiring only that the highest-priority flit obtains the port it requests.  In the case of contention, arbitration logic chooses a winning flit using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
=== Dimension-order Routing ===&lt;br /&gt;
&lt;br /&gt;
This protocol is a deterministic strategy for multidimensional networks.  Each direction is chosen in order and routed completely before switching to the next direction.  For example, in a 2D mesh, dimension order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension.  This is extensible to higher order connections as well, for example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of different bit positions of the source and destination address, one bit position at a time.[[#References|[14]]]&lt;br /&gt;
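&lt;br /&gt;
The two cases described above can be sketched in Python as follows. The function names, the (x, y) coordinate convention for the mesh, and the choice to correct the lowest address bit first in the hypercube are assumptions made for this illustration, not details taken from the cited references.&lt;br /&gt;
&lt;br /&gt;
```python
def step_toward(current, target):
    """Return +1 or -1, the unit step from current toward target (they must differ)."""
    delta = target - current
    return delta // abs(delta)


def xy_route(src, dst):
    """List the (x, y) nodes visited in a 2D mesh, finishing X before starting Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # route completely in the X dimension first
        x += step_toward(x, dst[0])
        path.append((x, y))
    while y != dst[1]:  # only then route in the Y dimension
        y += step_toward(y, dst[1])
        path.append((x, y))
    return path


def hypercube_route(src, dst, dims):
    """List the node addresses visited in a hypercube, fixing one address bit at a time."""
    node = src
    path = [node]
    for bit in range(dims):  # lowest bit position first (an arbitrary fixed order)
        mask = 2 ** bit
        if ((node ^ dst) // mask) % 2 == 1:  # source and destination differ in this bit
            node ^= mask
            path.append(node)
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(hypercube_route(0b000, 0b101, 3))
```
&lt;br /&gt;
For example, xy_route((0, 0), (2, 1)) visits (0, 0), (1, 0), (2, 0), and (2, 1), and hypercube_route(0b000, 0b101, 3) visits nodes 0, 1, and 5. Because the path depends only on the source and destination, the same pair of nodes always yields the same route.&lt;br /&gt;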
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores but also for routing traffic ([http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics]). This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate or length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62023</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62023"/>
		<updated>2012-04-15T01:45:13Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip network designs have proved to have limited application to communication among the many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the subject of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count over the shortest paths between any two nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
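&lt;br /&gt;
As a rough illustration of these metrics, the following Python sketch builds a k x k 2-D mesh and measures them by brute force. The function names are hypothetical, hop count is taken here as the number of links traversed, and the bisection is assumed to be a straight cut between the two middle columns of a mesh with even k.&lt;br /&gt;

```python
from collections import deque
from math import comb

def mesh_neighbors(node, k):
    # Neighbours of (x, y) in a k x k mesh without wrap-around links.
    x, y = node
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    candidates = [(x + dx, y + dy) for dx, dy in steps]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]

def degree(node, k):
    # Number of nodes reachable from this node in one hop.
    return len(mesh_neighbors(node, k))

def hop_counts(source, k):
    # Breadth-first search: minimum number of links from source to every node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in mesh_neighbors(current, k):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def diameter(k):
    # Largest minimum hop count over all pairs of nodes.
    nodes = [(x, y) for x in range(k) for y in range(k)]
    return max(max(hop_counts(n, k).values()) for n in nodes)

def path_diversity(src, dst):
    # Shortest paths in a mesh: choose which of the hops go in the x dimension.
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)

def bisection_width(k):
    # Links crossing a vertical cut between columns k//2 - 1 and k//2 (even k assumed).
    left = k // 2 - 1
    return sum(1 for y in range(k) if (left + 1, y) in mesh_neighbors((left, y), k))
```

For a 4 x 4 mesh this gives a degree of 2 at a corner and 4 in the centre, a diameter of 6, and a bisection width of 4.&lt;br /&gt;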
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through dedicated elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D mesh. An example of an indirect topology is the butterfly, from which the flattened butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]The 2-D mesh has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology resembles the physical die layout, which makes it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
The topology also has drawbacks. One is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n in general), although some ports on the edge routers are unused&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
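&lt;br /&gt;
The average minimum hop count quoted above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the enumeration averages the Manhattan distance over all ordered source and destination pairs, including pairs where both coincide, which is an assumption about the averaging convention.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product

def avg_min_hops_formula(k, n):
    # Closed form quoted above for a mesh described as a k-ary n-cube.
    if k % 2 == 0:
        return Fraction(n * k, 3)
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))

def avg_min_hops_enumerated(k, n):
    # Average Manhattan distance over all ordered (source, destination) pairs,
    # including pairs where source and destination coincide (an assumption).
    nodes = list(product(range(k), repeat=n))
    total = sum(sum(abs(a - b) for a, b in zip(s, d)) for s in nodes for d in nodes)
    return Fraction(total, len(nodes) ** 2)
```

For a 3-ary 2-cube both functions return 16/9, about 1.78 hops. Under this particular averaging convention the enumeration for even k comes out slightly below nk/3, so the two only agree exactly for odd k.&lt;br /&gt;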
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is a lower average hop count, which is important for scaling. However, it is not as scalable as it may seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
Reducing the number of routers lowers the bisection channel count, but this can be compensated for by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
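&lt;br /&gt;
The counts implied by the k-ary n-fly notation can be written out as a small Python helper. This is only a sketch of the arithmetic stated above; the function name and dictionary keys are hypothetical.&lt;br /&gt;

```python
def butterfly_parameters(k, n):
    # Counts for a k-ary n-fly as stated in the text.
    return {
        'terminal_nodes': k ** n,
        'stages': n,
        'routers_per_stage': k ** (n - 1),
        'total_routers': n * k ** (n - 1),
        'router_degree': 2 * k,
    }
```

For example, a 2-ary 3-fly connects 8 nodes through 3 stages of 4 routers each, 12 routers in total, and each router has degree 4.&lt;br /&gt;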
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It can use non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
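&lt;br /&gt;
The parameters listed above depend only on the radix k, so they can be tabulated with a short Python sketch. The function names and dictionary keys are hypothetical, and the second helper simply restates the flattened butterfly channel count quoted earlier in this article for comparison.&lt;br /&gt;

```python
def mecs_parameters(k):
    # Per-router and per-row/column figures for MECS as listed above.
    return {
        'bisection_channels_per_row': k,
        'diameter': 2,
        'min_nodes_per_channel': 1,
        'max_nodes_per_channel': k - 1,
        'output_ports_per_direction': 1,
        'input_ports': 2 * (k - 1),
    }

def fbfly_channels_per_row(k):
    # Flattened butterfly channel count per row/column, k^2 / 2 (even k assumed).
    return k * k // 2
```

With k = 4, a MECS router has 6 input ports and each row has 4 bisection channels, while the flattened butterfly channel count per row is 8; with k = 8 the corresponding figures are 14, 8 and 32.&lt;br /&gt;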
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, whereas the flattened butterfly needs two to four times as many bisection channels as MECS. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency but can suffer from packet corruption, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
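&lt;br /&gt;
Dimension-order (XY) routing on a 2-D mesh is one common example of a deterministic scheme. It is not named in the text above and is used here only as an illustration; the function names are hypothetical. The sketch below always finishes the hops in the x dimension before starting on the y dimension, so a given pair of nodes always yields the same path.&lt;br /&gt;

```python
def step_towards(current, target):
    # Returns +1, -1 or 0 depending on where target lies relative to current.
    delta = target - current
    return 0 if delta == 0 else delta // abs(delta)

def xy_route(src, dst):
    # Dimension-order routing: all X hops first, then all Y hops.
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += step_towards(x, dst[0])
        path.append((x, y))
    while y != dst[1]:
        y += step_towards(y, dst[1])
        path.append((x, y))
    return path
```

Routing from (0, 0) to (2, 1) visits (0, 0), (1, 0), (2, 0) and (2, 1) every time, regardless of network load.&lt;br /&gt;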
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop to take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine what direction it is from the destination of the packet.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which saves over a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
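&lt;br /&gt;
The deflection step at a single router can be sketched in Python as follows. This is a simplified illustration, not the BLESS arbitration logic itself: the function name and tuple layout are hypothetical, and giving priority to the oldest flit is an assumed rule, since the text above does not fix one.&lt;br /&gt;

```python
def assign_ports(flits, ports):
    # flits: list of (age, flit_id, preferred_port) tuples.
    # ports: list of output port names; needs at least as many ports as flits.
    free = list(ports)
    assignment = {}
    # Oldest flit first (assumed priority rule; the text does not fix one).
    for age, flit_id, preferred in sorted(flits, key=lambda f: -f[0]):
        # A flit keeps its preferred port if still free, else it is deflected.
        port = preferred if preferred in free else free[0]
        free.remove(port)
        assignment[flit_id] = port
    return assignment
```

If flits a (age 5) and b (age 9) both request port N while c (age 1) requests E, with ports listed as N, E, S, W, then b wins N, a is deflected to E, and c is in turn deflected to S. No flit is buffered or dropped, which is why the router needs at least as many output ports as input ports.&lt;br /&gt;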
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit must obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does so using a novel scheme called Golden Packet: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;br /&gt;
&lt;br /&gt;
[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62022</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62022"/>
		<updated>2012-04-15T01:41:27Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural consequence of the high integration levels now reached in multiprocessor design. Moore's law predicts that the number of transistors in an integrated circuit doubles roughly every two years. This expectation has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores on a single chip, using multi-core (more than one core) and many-core (so many cores that the traditional multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited applicability for communication among so many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that, for the subject of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count over the shortest paths between any two nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through dedicated elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D mesh. An example of an indirect topology is the butterfly, from which the flattened butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]The 2-D mesh has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology resembles the physical die layout, which makes it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
The topology also has drawbacks. One is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n in general), although some ports on the edge routers are unused&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is a lower average hop count, which is important for scaling. However, it is not as scalable as it may seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
Reducing the number of routers lowers the bisection channel count, but this can be compensated for by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It can use non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which gives a high degree of connectivity with fewer bisection channels and higher bandwidth per channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction.&lt;br /&gt;
*The input port count is 2(k − 1).&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes serialization delay, and its low diameter yields low network latencies.&lt;br /&gt;
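&lt;br /&gt;
As a rough illustration of the figures quoted for the flattened butterfly and MECS, the following Python sketch tabulates them for a given radix k. The function names are hypothetical, and the formulas are simply the ones stated in this article (k&lt;sup&gt;2&lt;/sup&gt;/2 channels per row/column for the flattened butterfly, k for MECS), not an independent derivation.&lt;br /&gt;
&lt;br /&gt;
```python
def fbfly_bisection_channels(k):
    """Flattened butterfly: bisection channels per row/column, quoted as k^2 / 2."""
    return k * k // 2


def mecs_parameters(k):
    """MECS parameters for radix k, copied from the parameter list in this section."""
    return {
        "bisection_channels_per_row": k,
        "diameter": 2,
        "max_nodes_per_channel": k - 1,
        "output_ports_per_direction": 1,
        "input_ports": 2 * (k - 1),
    }


# Compare the two topologies for two example radices (assumed values).
for k in (4, 8):
    mecs = mecs_parameters(k)
    print(
        f"k={k}: flattened butterfly channels={fbfly_bisection_channels(k)}, "
        f"MECS channels={mecs['bisection_channels_per_row']}, "
        f"MECS input ports={mecs['input_ports']}"
    )
```
&lt;br /&gt;
The quadratic growth of the flattened butterfly channel count against the linear growth for MECS is what the comparison in this section turns on.&lt;br /&gt;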
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS the lower-latency solutions.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
In this example, the bandwidth per channel is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but it can suffer from packet corruption, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
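&lt;br /&gt;
The latency difference between the two approaches can be sketched with a simplified zero-load model. This is an assumption-laden illustration rather than a result from the cited paper: it ignores contention, and the parameter names (hops, router_delay, packet_flits, flits_per_cycle) are hypothetical. Store and forward pays the full serialization time at every hop, while cut-through and wormhole pay it only once.&lt;br /&gt;
&lt;br /&gt;
```python
def store_and_forward_latency(hops, router_delay, packet_flits, flits_per_cycle=1.0):
    """Zero-load latency in cycles: the whole packet is received at each hop before moving on."""
    serialization = packet_flits / flits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, router_delay, packet_flits, flits_per_cycle=1.0):
    """Zero-load latency in cycles: the header is forwarded at once, the body is pipelined behind it."""
    serialization = packet_flits / flits_per_cycle
    return hops * router_delay + serialization


# Example values (assumed): 4 hops, 2-cycle routers, 8-flit packet, 1 flit per cycle.
print(store_and_forward_latency(4, 2, 8))
print(cut_through_latency(4, 2, 8))
```
&lt;br /&gt;
With these example values the model gives 40 cycles for store and forward and 16 cycles for cut-through, and the gap widens as the hop count or the packet length grows.&lt;br /&gt;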
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing] describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.&lt;br /&gt;
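&lt;br /&gt;
A common example of a deterministic scheme on a 2D mesh is dimension-order (XY) routing, in which a packet first travels along the X dimension and then along the Y dimension. The article does not name a specific algorithm at this point, so the choice of XY routing, the coordinate convention, and the function name xy_route are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, correcting X first and then Y."""
    x, y = src
    path = [(x, y)]
    # Move along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then move along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```
&lt;br /&gt;
Because the path depends only on the source and destination coordinates, repeated calls for the same pair always return the same path, which is the defining property of deterministic routing.&lt;br /&gt;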
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing] describes a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria.  Adaptive routing is intended to provide as many routes as possible to reach the destination.&lt;br /&gt;
&lt;br /&gt;
When considering routing protocols, it is important to consider whether deadlock or livelock can occur.&lt;br /&gt;
&lt;br /&gt;
''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.&lt;br /&gt;
&lt;br /&gt;
''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
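&lt;br /&gt;
A minimal sketch of this idea on a 2D mesh is given below. It assumes, purely for illustration, that the header is a list of output-port names (E, W, N, S), that the source picks an X-then-Y path, and that each router strips the first entry before forwarding; none of these details come from the article, and the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
# Coordinate change caused by leaving a router through each port (assumed convention).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def build_route_header(src, dst):
    """At the source: precompute the full list of output ports, X first and then Y."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    header = ["E" if dx > 0 else "W"] * abs(dx)
    header += ["N" if dy > 0 else "S"] * abs(dy)
    return header


def forward(node, header):
    """At a router: consume the first header entry and return (next_node, remaining_header)."""
    port, rest = header[0], header[1:]
    move_x, move_y = STEP[port]
    return (node[0] + move_x, node[1] + move_y), rest


# Walk a packet from (0, 0) to (2, -1); the routers never compute a route themselves.
node, header = (0, 0), build_route_header((0, 0), (2, -1))
while header:
    node, header = forward(node, header)
print(node)
```
&lt;br /&gt;
The header holds one entry per hop, which makes the overhead described above concrete: longer paths mean more routing information carried through the network.&lt;br /&gt;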
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the architecture and can therefore determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which saves resources compared with a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
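&lt;br /&gt;
The following Python sketch shows how a handful of bits and simple logic can replace a routing table in a 2D mesh. It loosely follows the structure described for LBDR in [[#References|[12]]], but the bit names, the dictionary encoding, and the coordinate convention (y grows to the north) are assumptions made for illustration, so it should be read as a sketch rather than as the reference logic.&lt;br /&gt;
&lt;br /&gt;
```python
def lbdr_output_ports(cur, dst, routing_bits, connectivity_bits):
    """Return the candidate output ports at router cur for a packet headed to dst."""
    xc, yc = cur
    xd, yd = dst
    # First stage: compare coordinates to find the direction(s) of the destination.
    n1, s1 = yd > yc, yc > yd
    e1, w1 = xd > xc, xc > xd
    r, c = routing_bits, connectivity_bits
    # Second stage: a port qualifies if the destination lies straight ahead, or if
    # the turn needed at the next router is permitted by the routing bits.
    n = (n1 and not e1 and not w1) or (n1 and e1 and r["ne"]) or (n1 and w1 and r["nw"])
    e = (e1 and not n1 and not s1) or (e1 and n1 and r["en"]) or (e1 and s1 and r["es"])
    w = (w1 and not n1 and not s1) or (w1 and n1 and r["wn"]) or (w1 and s1 and r["ws"])
    s = (s1 and not e1 and not w1) or (s1 and e1 and r["se"]) or (s1 and w1 and r["sw"])
    # Connectivity bits mask out ports that have no neighbor attached.
    candidates = {"N": n and c["N"], "E": e and c["E"], "W": w and c["W"], "S": s and c["S"]}
    return sorted(port for port, ok in candidates.items() if ok)


# Bits that encode XY routing: turns from X to Y are allowed, turns from Y to X are not.
XY_BITS = {"en": True, "es": True, "wn": True, "ws": True,
           "ne": False, "nw": False, "se": False, "sw": False}
ALL_CONNECTED = {"N": True, "E": True, "W": True, "S": True}

print(lbdr_output_ports((1, 1), (3, 2), XY_BITS, ALL_CONNECTED))
```
&lt;br /&gt;
With the XY bit setting, a packet whose destination lies to the north-east is offered only the east port; setting all routing bits to true would offer both minimal directions instead.&lt;br /&gt;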
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
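&lt;br /&gt;
The port allocation at a single router can be sketched as follows. The sketch assumes that flits are ranked oldest first and that a deflected flit simply takes the first free port; these choices, the tuple layout, and the function name are illustrative assumptions and are not taken from the article. It relies on the first constraint above: there must be at least as many output ports as arriving flits.&lt;br /&gt;
&lt;br /&gt;
```python
def allocate_ports(flits, ports):
    """Assign every arriving flit to a distinct output port without buffering.

    flits: list of (age, flit_id, desired_port) tuples.
    ports: list of output port names; must be at least as long as flits.
    """
    free = list(ports)
    assignment = {}
    # Oldest flits choose first (assumed ranking); the rest may be deflected.
    for age, flit_id, desired in sorted(flits, key=lambda flit: -flit[0]):
        port = desired if desired in free else free[0]  # deflect if the desired port is taken
        free.remove(port)
        assignment[flit_id] = port
    return assignment


arrivals = [(5, "a", "E"), (9, "b", "E"), (1, "c", "N")]
print(allocate_ports(arrivals, ["N", "E", "S", "W"]))
```
&lt;br /&gt;
In this example flits a and b both want the east port; the older flit b wins it and a is deflected to another free port, so no flit ever needs to be stored in the router.&lt;br /&gt;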
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
Besides the many technologies used in commercial designs, there are many lines of research on NoCs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs are delivered per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that describe adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,], 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62006</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62006"/>
		<updated>2012-04-14T21:33:02Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To make communication among these many cores feasible inside a single chip, traditional off-chip network designs have proved to be of limited use. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still pose challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of neighbors of a node, in other words the number of nodes that can be reached from it in one hop.&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message must pass through to reach its destination.&lt;br /&gt;
*'''Diameter''' is the maximum hop count.&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, since the network requires two channels in each dimension (2n), although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
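&lt;br /&gt;
The average minimum hop count listed above can be evaluated directly. The short Python sketch below just encodes the two cases as given in this section; the function name and the example sizes are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def mesh_avg_min_hops(k, n):
    """Average minimum hop count of a k-ary n-cube mesh, using the two cases listed above."""
    if k % 2 == 0:
        return n * k / 3
    return n * (k / 3 - 1 / (3 * k))


# Example sizes (assumed): a 4x4 mesh (even k) and a 3x3 mesh (odd k).
print(mesh_avg_min_hops(4, 2))
print(mesh_avg_min_hops(3, 2))
```
&lt;br /&gt;
For a 4x4 mesh the formula gives 8/3 hops on average, and for a 3x3 mesh it gives 16/9, so the average distance grows roughly linearly with k.&lt;br /&gt;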
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is a lower average hop count, which matters when scaling the design. However, the concentrated mesh scales less well than it might seem, because its degree is limited by crossbar complexity.[[#References|[1]]]&lt;br /&gt;
&lt;br /&gt;
Reducing the ratio lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&lt;sup&gt;2&lt;/sup&gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which gives a high degree of connectivity with fewer bisection channels and higher bandwidth per channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction.&lt;br /&gt;
*The input port count is 2(k − 1).&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes serialization delay, and its low diameter yields low network latencies.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS the lower-latency solutions.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled or even quadrupled for MECS and the flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
In this example, the bandwidth per channel is higher for CMesh and MECS and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
A variety of routing protocols can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages.  They can be broadly classified in several different ways.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the whole packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency, but it can suffer from packet corruption, so a scheme to handle this must be implemented.[[#References|[12]]]&lt;br /&gt;
&lt;br /&gt;
Dynamic Routing&lt;br /&gt;
&lt;br /&gt;
Static Routing&lt;br /&gt;
&lt;br /&gt;
The specific routing protocols below are built using the ideas from the classes of protocols previously described.&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the architecture and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet with a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
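&lt;br /&gt;
The idea of routing with a few bits and gates per router can be sketched for a single output port.  The code below is loosely modeled on the structure described in [[#References|[12]]] and is not a faithful reproduction of it: the bit names r_ne, r_nw and c_n, the coordinate convention (y grows towards north) and the function name are all assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def lbdr_north(cur, dst, r_ne, r_nw, c_n):
    """Decide whether the north output port may be used for this packet.

    r_ne / r_nw: assumed routing bits, true if a packet leaving north may
                 later turn east / west.
    c_n:         assumed connectivity bit, true if a north neighbor exists.
    """
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    # First stage: compare coordinates to find the destination quadrant.
    north = dy != 0 and dy == abs(dy)
    east = dx != 0 and dx == abs(dx)
    west = dx != abs(dx)
    # Second stage: a few gates combine the quadrant with the routing bits.
    straight = north and not east and not west
    via_ne = north and east and r_ne
    via_nw = north and west and r_nw
    return bool(c_n and (straight or via_ne or via_nw))


if __name__ == "__main__":
    print(lbdr_north((1, 1), (2, 3), r_ne=True, r_nw=False, c_n=True))   # True
    print(lbdr_north((1, 1), (2, 3), r_ne=False, r_nw=False, c_n=True))  # False
```
&lt;br /&gt;
The same pattern would be repeated for the east, west and south ports, so the per-router state stays at a handful of bits regardless of the network size.&lt;br /&gt;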
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
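&lt;br /&gt;
The deflection step at one router can be sketched as below.  This is a simplified illustration rather than the published BLESS arbitration: the oldest-first priority, the dictionary keys id, age and wants, and the choice of the first free port for a deflected flit are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
def allocate_ports(flits, ports):
    """Assign every incoming flit an output port; losers are deflected.

    flits: list of dicts with hypothetical keys "id", "age" and "wants".
    ports: list of output port names, at least as many as there are flits.
    """
    free = list(ports)
    assignment = {}
    # Assumed priority: older flits choose first.
    for flit in sorted(flits, key=lambda f: f["age"], reverse=True):
        port = flit["wants"] if flit["wants"] in free else free[0]
        free.remove(port)
        assignment[flit["id"]] = port
    return assignment


if __name__ == "__main__":
    incoming = [
        {"id": "a", "age": 5, "wants": "E"},
        {"id": "b", "age": 3, "wants": "E"},
    ]
    print(allocate_ports(incoming, ["N", "E", "S", "W"]))  # {'a': 'E', 'b': 'N'}
```
&lt;br /&gt;
Because no flit is ever buffered, the loop can only succeed if a free port always exists, which is exactly the first constraint above: at least as many output ports as input ports.&lt;br /&gt;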
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  The arbitration uses a novel scheme, called Golden Packet: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
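&lt;br /&gt;
The rotation of the special status can be sketched as follows.  This is an illustration under stated assumptions, not the router's actual logic: the epoch length, the flat packet ID space, the tuple layout (packet_id, flit_seq) and the tie-break between two flits of equal status are all hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def golden_id(cycle, epoch_cycles, num_packet_ids):
    """Packet ID that holds golden status during the given cycle."""
    return (cycle // epoch_cycles) % num_packet_ids


def pick_winner(flit_a, flit_b, golden):
    """Arbitrate between two contending flits, given as (packet_id, flit_seq)."""
    if flit_a[0] == golden and flit_b[0] != golden:
        return flit_a
    if flit_b[0] == golden and flit_a[0] != golden:
        return flit_b
    # Assumed tie-break for illustration: lower flit sequence number first.
    return min(flit_a, flit_b, key=lambda f: f[1])


if __name__ == "__main__":
    g = golden_id(cycle=250, epoch_cycles=100, num_packet_ids=8)
    print(g)                               # 2
    print(pick_winner((2, 1), (5, 0), g))  # (2, 1)
```
&lt;br /&gt;
Since the golden ID steps through every possible packet ID, each packet is eventually the one that cannot lose arbitration, which is the delivery guarantee described above.&lt;br /&gt;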
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.;, [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;amp;arnumber=4407676&amp;amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,], 2008 Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62005</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62005"/>
		<updated>2012-04-14T21:30:07Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
The traditional off-chip network has proved to have limited applications for communication between these many cores inside a single chip. According to [[#References|[5]]], the off-chip designs suffered from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, the on-chip designs still have some challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of neighbors of a node, in other words, the number of nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes through which a message needs to pass to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
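&lt;br /&gt;
These metrics can be computed for a small example.  The sketch below assumes a k x k 2D mesh with even k, counts hops as links traversed, and uses a hypothetical function name; it is an illustration and is not taken from [[#References|[7]]].&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product


def mesh_metrics(k):
    """Degree range, diameter and bisection width of a k x k mesh (k even)."""
    nodes = list(product(range(k), repeat=2))

    def degree(node):
        x, y = node
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return sum(1 for n in candidates if n[0] in range(k) and n[1] in range(k))

    degrees = [degree(n) for n in nodes]
    # In a mesh the hop count between two nodes is their Manhattan distance.
    diameter = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a in nodes for b in nodes)
    # Cutting between the two middle columns severs one link per row.
    bisection_width = k
    return min(degrees), max(degrees), diameter, bisection_width


if __name__ == "__main__":
    print(mesh_metrics(4))  # (2, 4, 6, 4)
```
&lt;br /&gt;
For the 4x4 case the corner nodes have degree 2, the interior nodes have degree 4, the diameter is the corner-to-corner distance of 6, and four links cross the bisection.&lt;br /&gt;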
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube , where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n in general), although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3-1/3k)|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
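&lt;br /&gt;
The expressions listed above can be evaluated directly.  The sketch below simply encodes the formulas as they are quoted here from [[#References|[5]]]; the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def avg_min_hops(k, n):
    """Average minimum hop count of a k-ary n-cube mesh, per the table above."""
    if k % 2 == 0:
        return n * k / 3
    return n * (k / 3 - 1 / (3 * k))


def bisection_channel_load(k):
    """Channel load across the bisection under uniform random traffic, even k."""
    if k % 2 != 0:
        raise ValueError("the k/4 expression is stated for even k only")
    return k / 4


if __name__ == "__main__":
    print(round(avg_min_hops(4, 2), 3))  # 2.667
    print(round(avg_min_hops(3, 2), 3))  # 1.778
    print(bisection_channel_load(8))     # 2.0
```
&lt;br /&gt;
Both quantities grow linearly with k, which is one way to see why a plain mesh becomes less attractive as the number of nodes per dimension increases.&lt;br /&gt;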
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important when scaling the solution. It is not as scalable as it might seem, however, because its degree is limited by the crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
For the disadvantages, it has high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k in the case of the Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it gets doubled and even quadrupled between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, getting attenuated in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] uses an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five 8x8 independent mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, because each node must completely buffer a message before forwarding it. &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, collecting the flits of the packet in a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, each switch needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables at each switch to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its own position in the architecture and can determine in which direction the destination of a packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet with a small number of bits and a few logic gates per router, which costs less than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit is guaranteed to obtain the port it requests.  In the case of contention, arbitration logic chooses a winning flit.  The arbitration uses a novel scheme, called Golden Packet: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[12] Flich, J.; Duato, J.; , [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&amp;arnumber=4407676&amp;url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs,] Computer Architecture Letters, vol. 7, no. 1, pp.13-16, Jan 2008&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62004</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62004"/>
		<updated>2012-04-14T21:24:58Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that multiprocessor designs now reach. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This prediction has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to have limited applicability for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
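&lt;br /&gt;
As a rough illustration of these metrics, the following Python sketch computes the degree and the diameter of a small 2-D mesh by breadth-first search. The function names (mesh_neighbors, hop_counts, mesh_metrics) are hypothetical, and hop count is taken here as the number of links traversed, which is an assumption about the definition above.&lt;br /&gt;

```python
from collections import deque


def mesh_neighbors(x, y, k):
    """Neighbors of node (x, y) in a k x k mesh without wraparound links."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if cx in range(k) and cy in range(k)]


def hop_counts(src, k):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node[0], node[1], k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mesh_metrics(k):
    """Return (minimum degree, maximum degree, diameter) of a k x k mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    degrees = [len(mesh_neighbors(x, y, k)) for x, y in nodes]
    # Assumption: hop count = number of links on a shortest path.
    diameter = max(max(hop_counts(n, k).values()) for n in nodes)
    return min(degrees), max(degrees), diameter


print(mesh_metrics(4))  # (2, 4, 6)
```

For a 4x4 mesh the sketch reports a degree of 2 at the corners, a degree of 4 in the interior, and a diameter of 6.&lt;br /&gt;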
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the butterfly, from which the Flattened Butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on the parameters of a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, because the network requires two channels in each dimension (2n in general), although some ports on the edge routers are unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3||for even k&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))||for odd k&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh changes the ratio to 4:1, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it might seem, because the degree of concentration is limited by the crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the number of routers lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It can use non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&lt;sup&gt;2&lt;/sup&gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it gets doubled and even quadrupled between MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is made of an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each having different advantages and disadvantages.&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it. &lt;br /&gt;
&lt;br /&gt;
In contrast, a [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] protocol uses the switch to examine the flit header, decide where to send the message, and then start forwarding it immediately.  True cut-through routing lets the tail continue when the head is blocked, stacking message packets into a single switch (which requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches).  Using a cut-through protocol lowers latency compared with store and forward routing, because a switch does not wait for the whole message before forwarding it.&lt;br /&gt;
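&lt;br /&gt;
The latency difference can be illustrated with a simple first-order model, given here as a sketch rather than a measurement. All parameter names and values (hops, router_delay, packet_bits, link_bits_per_cycle) are hypothetical, and contention in the network is ignored.&lt;br /&gt;

```python
def store_and_forward_latency(hops, router_delay, packet_bits, link_bits_per_cycle):
    """Every hop waits for the whole packet before forwarding it."""
    serialization = packet_bits / link_bits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, router_delay, packet_bits, link_bits_per_cycle):
    """Only the header pays the per-hop delay; the body is pipelined behind it."""
    serialization = packet_bits / link_bits_per_cycle
    return hops * router_delay + serialization


# Hypothetical example: 4 hops, 2-cycle routers, 512-bit packet, 64-bit links.
print(store_and_forward_latency(4, 2, 512, 64))  # 40.0 cycles
print(cut_through_latency(4, 2, 512, 64))        # 16.0 cycles
```

In this model the serialization time is paid once per hop with store and forward routing, but only once in total with cut-through or wormhole routing.&lt;br /&gt;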
&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, routing is achieved by each router knowing its position in the architecture and being able to determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which saves resources compared with a table or a buffer.&lt;br /&gt;
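&lt;br /&gt;
The position comparison behind this idea can be sketched as follows. This is a simplified illustration only: it compares the router coordinates with the destination coordinates in a 2D mesh and then applies plain XY ordering, and it does not model the small number of per-router bits mentioned above. The function names, the port names, and the coordinate convention (y grows toward the south) are assumptions.&lt;br /&gt;

```python
def next_port(cur, dst):
    """Pick an output port by comparing router and destination coordinates.

    cur and dst are (x, y) tuples; y is assumed to grow toward the south.
    XY ordering (an assumption): the X offset is resolved before the Y offset.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"
    if cx > dx:
        return "WEST"
    if dy > cy:
        return "SOUTH"
    if cy > dy:
        return "NORTH"
    return "LOCAL"  # the packet has arrived


def route(src, dst):
    """Follow next_port hop by hop and return the list of ports taken."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "SOUTH": (0, 1), "NORTH": (0, -1)}
    ports, cur = [], src
    while True:
        port = next_port(cur, dst)
        if port == "LOCAL":
            return ports
        ports.append(port)
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])


print(route((0, 0), (2, 1)))  # ['EAST', 'EAST', 'SOUTH']
```

Each decision needs only the local coordinates and the destination carried in the header, so no routing table is consulted.&lt;br /&gt;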
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop by modeling up to the next two potential hops&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
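&lt;br /&gt;
The deflection step can be sketched for a single router and a single cycle as follows. The ranking rule (older flits choose first) and all names are assumptions made for this illustration; the text above only requires that one contending flit wins and the others are deflected.&lt;br /&gt;

```python
def allocate_ports(flits, ports):
    """Assign every incoming flit to a distinct output port, without buffering.

    flits: list of (age, desired_port) tuples; older flits choose first
           (this ranking is an assumption of the sketch).
    ports: list of output port names; there must be at least as many
           output ports as flits, as the constraint above requires.
    Returns a list of (age, desired_port, granted_port, deflected).
    """
    free = list(ports)
    result = []
    for age, desired in sorted(flits, key=lambda f: f[0], reverse=True):
        if desired in free:
            granted = desired
        else:
            granted = free[0]  # contention lost: deflect to any free port
        free.remove(granted)
        result.append((age, desired, granted, granted != desired))
    return result


flits = [(5, "N"), (9, "N"), (2, "S")]
for row in allocate_ports(flits, ["N", "E", "S", "W"]):
    print(row)
```

In this example the two flits that want the north port contend; the older one is granted it and the younger one is deflected to the east port, while the third flit gets its desired port.&lt;br /&gt;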
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit must obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does this using a novel scheme, the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research besides the abundance of technologies in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in above sections. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per Watt it delivers:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F.; , [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S.; , [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on , vol., no., pp.97-104, 1-4 May 2011&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62003</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62003"/>
		<updated>2012-04-14T21:15:25Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that multiprocessor designs now reach. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This prediction has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry keep racing to provide more cores integrated in a chip, with the multi-core (more than one core) and many-core (multi-core with so many cores that the historical multi-core techniques are not efficient any longer) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor) and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to have limited applicability for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the butterfly, from which the Flattened Butterfly described below is derived.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following information on the parameters of a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, because the network requires two channels in each dimension (2n in general), although some ports on the edge routers are unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3||for even k&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))||for odd k&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh changes the ratio to 4:1, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it might seem, because the degree of concentration is limited by the crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The reduction in the ratio introduces a lower bisection channel count, but it can be avoided by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It supports non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
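&lt;br /&gt;
As a rough illustration, the MECS parameters listed above can be tabulated next to the k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 channel count quoted for the flattened butterfly. This is only a sketch: the function and key names are hypothetical, the radix values 4 and 8 are picked for illustration, and an even k is assumed so that k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 is an integer.&lt;br /&gt;

```python
def mecs_parameters(k):
    """MECS figures for radix k, exactly as listed in the text."""
    return {
        "bisection_channels": k,          # per row/column
        "diameter_hops": 2,               # maximum hop count
        "nodes_per_channel": (1, k - 1),  # smallest and largest reach
        "output_ports_per_direction": 1,
        "input_ports": 2 * (k - 1),
    }


def fbfly_channel_count(k):
    """Flattened butterfly channel count per row/column (k^2 / 2), even k assumed."""
    return k * k // 2


for k in (4, 8):
    mecs = mecs_parameters(k)
    ratio = fbfly_channel_count(k) // mecs["bisection_channels"]
    print(k, mecs["input_ports"], fbfly_channel_count(k), ratio)
```

With these formulas the flattened butterfly needs twice as many channels per row/column as MECS at k = 4 and four times as many at k = 8.&lt;br /&gt;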
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, whereas the flattened butterfly has twice, and even four times, as many bisection channels as MECS. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built as an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs].  [http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  &lt;br /&gt;
&lt;br /&gt;
In contrast, in [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] the switch examines the header, decides where to send the message, and then starts forwarding it immediately.  Cut-through routing lets the tail continue when the head is blocked, stacking the message packets into a single switch (this requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over the network, potentially blocking other messages (this needs only enough buffer space to store the piece of the packet that is sent between switches). &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet with a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
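&lt;br /&gt;
The deflection step described above can be sketched as a per-cycle port assignment inside one router. This is a minimal sketch, not the BLESS implementation: the tuple layout, the port names and the oldest-first tie-breaking rule are assumptions made for illustration. It relies on the constraint stated above that a router has at least as many output ports as input ports, so every flit always leaves on some port.&lt;br /&gt;

```python
def assign_ports(flits, ports):
    """Assign every incoming flit to an output port in the same cycle.

    flits: list of (flit_id, age, desired_port) tuples (hypothetical layout).
    ports: list of output port names.
    Returns {flit_id: (port, was_deflected)}.
    """
    free = list(ports)
    result = {}
    # Assumption: the oldest flit wins a contested port (oldest-first arbitration).
    for flit_id, age, desired in sorted(flits, key=lambda f: f[1], reverse=True):
        if desired in free:
            port, deflected = desired, False
        elif free:
            # Contention lost: deflect to any port that is still free.
            port, deflected = free[0], True
        else:
            raise ValueError("more flits than output ports; nothing can be buffered")
        free.remove(port)
        result[flit_id] = (port, deflected)
    return result


flits = [("a", 5, "N"), ("b", 9, "N"), ("c", 1, "E")]
print(assign_ports(flits, ["N", "E", "S", "W"]))
```

In this example flit b is the oldest and takes the contested N port, while a and c are deflected to other free ports and must find their way to the destination from the neighbouring routers.&lt;br /&gt;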
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  In the case of contention, arbitration logic chooses a winning flit.  Livelock is prevented by relaxing the requirement so that only the highest-priority flit must obtain the port it requests.  This is done with the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP that is used not only for connecting several cores, but also for routing traffic ([http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics]). This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption; the more power-efficient the computation, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on, pp. 19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on, pp. 267-268, 1-4 May 2011&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Networks on Chip (NoCS), 2011 Fifth IEEE/ACM International Symposium on, pp. 97-104, 1-4 May 2011&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62002</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=62002"/>
		<updated>2012-04-14T21:14:43Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels that nowadays are reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit doubles every two years. This assumption has driven the integration of on-chip components and continues to show the way in the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
Traditional off-chip networks have proved to be of limited use for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are a lesser problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same term. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), this is the most common term and also used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' is the number of nodes that are neighbors of a given node, in other words, that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two halves&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer the traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube , where k is the number of nodes on each dimension, and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D Mesh asymmetrical along the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree of a 2D mesh is 4, because the network requires two channels in each dimension (2n in general), although some ports on edge switches will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it does not scale as well as it might seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
Reducing the ratio lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It supports non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Its advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per each row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh, which makes the flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, whereas the flattened butterfly has twice, and even four times, as many bisection channels as MECS. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and lower for the flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built as an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs].  [http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing] has been used since the early days of telecommunications.  It requires that the entire message be received at a node before it is propagated to the next node.  &lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Cut-through_switching Cut-through routing] breaks the message into [http://en.wikipedia.org/wiki/Frame_%28networking%29 packets] and forwards each piece toward the destination for reassembly.  Because store and forward routing requires buffer space and increases latency, cut-through routing is favored in multiprocessors. &lt;br /&gt;
&lt;br /&gt;
In [http://en.wikipedia.org/wiki/Cut-through_switching cut-through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching wormhole routing] the switch examines the header, decides where to send the message, and then starts forwarding it immediately.  Cut-through routing lets the tail continue when the head is blocked, stacking the message packets into a single switch (this requires a buffer large enough to hold the largest packet).  In wormhole routing, when the head of the message is blocked the message stays strung out over the network, potentially blocking other messages (this needs only enough buffer space to store the piece of the packet that is sent between switches). &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take toward the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires routing tables to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine in which direction the destination of the packet lies.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, it is possible to route the packet with a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit must obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does this using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundant technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 97-104, 1-4 May 2011.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=61998</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=61998"/>
		<updated>2012-04-14T20:48:09Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit would double roughly every two years. This prediction has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To enable communication among these many cores inside a single chip, traditional off-chip network designs have proved to have limited applicability. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip), the most common term and the one used in this article&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of nodes that are its neighbors, in other words, the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes that a message needs to go through to get to the destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes.&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires you have to cut to separate the network into two halves&lt;br /&gt;
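&lt;br /&gt;
As a rough illustration, the metrics above can be computed for a small 2D mesh. The following is a minimal sketch, not taken from the cited textbook: the function names are hypothetical, hop count is taken here as the number of links traversed, and the bisection count assumes that the vertical cut through the middle of an even-sized mesh is a minimal one.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import deque


def mesh_neighbors(k):
    """Adjacency lists of a k x k 2D mesh; nodes are (x, y) tuples."""
    adj = {}
    for x in range(k):
        for y in range(k):
            candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            # Keep only the candidates that fall inside the mesh.
            adj[(x, y)] = [(a, b) for (a, b) in candidates
                           if a in range(k) and b in range(k)]
    return adj


def hop_counts(adj, src):
    """Breadth-first search: minimum number of links from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree(adj, node):
    """Number of neighbors reachable in one hop."""
    return len(adj[node])


def diameter(adj):
    """Maximum hop count over all source/destination pairs."""
    return max(max(hop_counts(adj, s).values()) for s in adj)


def links_across_middle_cut(adj, k):
    """Links crossing the cut between columns k//2 - 1 and k//2 (k even)."""
    left = {n for n in adj if n[0] in range(k // 2)}
    return sum(1 for n in left for m in adj[n] if m not in left)
```
&lt;br /&gt;
For a 4x4 mesh this sketch reports a degree of 2 at a corner and 4 in the center, which is the edge asymmetry discussed for the 2-D mesh.&lt;br /&gt;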
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, there are nodes that are not computational but act as switches to transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D mesh. An example of an indirect topology is the flattened butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology is similar to the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; therefore, from the networking perspective, there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n), although some ports on the edge will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 - 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
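&lt;br /&gt;
The k-odd entry for the average minimum hop count can be checked by brute force on a small mesh. This is only a sketch under stated assumptions: the function names are hypothetical, minimal routing is taken to be the Manhattan distance, and the average is taken over all ordered source/destination pairs, including a node sending to itself.&lt;br /&gt;
&lt;br /&gt;
```python
from fractions import Fraction
from itertools import product


def average_min_hops(k, n):
    """Brute-force average Manhattan distance in a mesh with k nodes per
    dimension and n dimensions, over all ordered pairs of nodes."""
    nodes = list(product(range(k), repeat=n))
    total = sum(sum(abs(a - b) for a, b in zip(s, d))
                for s in nodes for d in nodes)
    return Fraction(total, len(nodes) ** 2)


def formula_k_odd(k, n):
    """n(k/3 - 1/(3k)), the k-odd entry listed above."""
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))
```
&lt;br /&gt;
Exact fractions are used so that the comparison does not depend on floating-point rounding.&lt;br /&gt;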
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh reduces the ratio to 1:4, i.e. each router serves four computing nodes. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count. This is important in terms of scaling the solution. However, it is not as scalable as it might seem, because the degree of concentration is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
The concentration lowers the bisection channel count, but this can be offset by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and that it has lower latency and better throughput than the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
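&lt;br /&gt;
Two of the parameters above, the input port count and the diameter of two, can be reproduced with a small model. This is a hedged sketch rather than the authors' design: the function names are hypothetical, and it assumes that a packet first uses a row channel to reach the destination column and then a column channel to reach the destination row.&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product


def mecs_input_ports(k):
    """One input from every other node in the same row and the same column."""
    return 2 * (k - 1)


def mecs_route(src, dst):
    """Stops visited after leaving src: at most one row hop, then at most
    one column hop, because a channel can drop a packet at any node it passes."""
    hops = []
    if src[0] != dst[0]:
        hops.append((dst[0], src[1]))
    if src[1] != dst[1]:
        hops.append(dst)
    return hops


def mecs_diameter(k):
    """Maximum number of hops over all pairs of a k x k network."""
    nodes = list(product(range(k), repeat=2))
    return max(len(mecs_route(s, d)) for s in nodes for d in nodes)
```
&lt;br /&gt;
For k = 4 the model gives 6 input ports per node and a maximum of two hops between any pair of nodes.&lt;br /&gt;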
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per-channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It offers low network latency due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This data is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The information in this table compares three of the topologies described above for two combinations of k, the network radix (nodes/dimension), and c, the concentration factor (1 being no concentration). &lt;br /&gt;
&lt;br /&gt;
Maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes flattened butterfly and MECS better solutions with lower network latency.&lt;br /&gt;
&lt;br /&gt;
The bisection channel count is 1 for CMesh in both cases, but it is doubled and even quadrupled for MECS and flattened butterfly. &lt;br /&gt;
&lt;br /&gt;
The bandwidth per channel in this example is better for CMesh and MECS, and is reduced in the case of flattened butterfly.&lt;br /&gt;
&lt;br /&gt;
=== Examples of topologies in current NoCs ===&lt;br /&gt;
&lt;br /&gt;
==== Intel ====&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built on an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.&lt;br /&gt;
&lt;br /&gt;
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.&lt;br /&gt;
&lt;br /&gt;
==== Tilera ====&lt;br /&gt;
&lt;br /&gt;
The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera’s iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.&lt;br /&gt;
&lt;br /&gt;
==== ST Microelectronics ====&lt;br /&gt;
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]&lt;br /&gt;
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]]. &lt;br /&gt;
&lt;br /&gt;
The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.&lt;br /&gt;
&lt;br /&gt;
==== IBM ====&lt;br /&gt;
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s. &lt;br /&gt;
&lt;br /&gt;
As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].&lt;br /&gt;
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
=== Source Routing ===&lt;br /&gt;
&lt;br /&gt;
The source node computes the path and stores the information in the packet header.  The header becomes additional information that must be transferred through the network.&lt;br /&gt;
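&lt;br /&gt;
A minimal sketch of source routing on a 2D mesh follows. The names, the (x, y) coordinates, the direction letters, and the choice of X-then-Y ordering are all assumptions made for illustration; the text above does not prescribe a particular path computation.&lt;br /&gt;
&lt;br /&gt;
```python
def build_header(src, dst):
    """The source computes the whole path as a list of output directions
    (all X moves first, then all Y moves) and places it in the header."""
    (sx, sy), (dx, dy) = src, dst
    horizontal = ['E' if dx > sx else 'W'] * abs(dx - sx)
    vertical = ['N' if dy > sy else 'S'] * abs(dy - sy)
    return horizontal + vertical


def forward(header):
    """A router consumes the first direction and forwards the remainder."""
    return header[0], header[1:]
```
&lt;br /&gt;
The header length grows with the distance between source and destination, which is the extra information that has to travel through the network.&lt;br /&gt;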
&lt;br /&gt;
=== Distributed Routing ===&lt;br /&gt;
&lt;br /&gt;
Each switch in the network computes the next hop that the packet will take towards the destination.  The packet header contains only the destination information, reducing its size compared to source routing.  This approach requires a routing table in each switch to direct the packet from node to node, which does not scale well as the number of nodes increases.&lt;br /&gt;
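&lt;br /&gt;
The table-based variant can be sketched as one lookup table per router. This is an illustrative model only: the function name and port labels are hypothetical, and the tables are filled with X-then-Y next hops as an assumed routing function.&lt;br /&gt;
&lt;br /&gt;
```python
def build_tables(k):
    """One routing table per router of a k x k mesh. Each table maps a
    destination to the output port for the next hop."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    tables = {}
    for (rx, ry) in nodes:
        table = {}
        for (dx, dy) in nodes:
            if dx != rx:
                table[(dx, dy)] = 'E' if dx > rx else 'W'
            elif dy != ry:
                table[(dx, dy)] = 'N' if dy > ry else 'S'
            else:
                table[(dx, dy)] = 'LOCAL'
        tables[(rx, ry)] = table
    return tables
```
&lt;br /&gt;
With N routers, each table holds N entries, so the total storage in this model grows as N squared (256 entries for a 4x4 mesh), which illustrates the scaling concern.&lt;br /&gt;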
&lt;br /&gt;
=== Logic Based Distributed Routing (LBDR) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each router knows its position in the architecture and can determine the direction of the packet's destination relative to itself.  It is most commonly used in 2D meshes, but it can be applied to other topologies as well.  Using this position information, a packet can be routed with a small number of bits and a few logic gates per router, which costs less than a table or a buffer.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
There are several variations of LBDR:&lt;br /&gt;
&lt;br /&gt;
''' LBDRe ''' - Uses extended path information when calculating the current routing hop, modeling up to the next two potential hops.&lt;br /&gt;
&lt;br /&gt;
''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol.&lt;br /&gt;
&lt;br /&gt;
''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.&lt;br /&gt;
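&lt;br /&gt;
The position-based idea behind LBDR can be sketched with a comparator stage followed by a few boolean terms per output port. This is a simplified illustration and not the published circuit: the bit names, their encoding, and the convention that north means increasing y are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
# Routing bits configured for X-then-Y routing: a packet may turn from an
# X port (east/west) to a Y port (north/south) but never the reverse.
# rbits['en'] reads as: after leaving through east, a turn north is allowed.
XY_BITS = {'en': True, 'es': True, 'wn': True, 'ws': True,
           'ne': False, 'nw': False, 'se': False, 'sw': False}
ALL_CONNECTED = {'N': True, 'E': True, 'S': True, 'W': True}


def lbdr_ports(cur, dst, rbits, cbits):
    """Candidate output ports from coordinates and a handful of bits."""
    if cur == dst:
        return {'LOCAL'}
    (cx, cy), (dx, dy) = cur, dst
    # Comparator stage: in which direction does the destination lie?
    n1, s1 = dy > cy, cy > dy
    e1, w1 = dx > cx, cx > dx
    # Logic stage: a port qualifies if it leads straight to the destination
    # or if the turn needed after taking it is permitted by the routing bits.
    want = {
        'N': (n1 and not e1 and not w1) or (n1 and e1 and rbits['ne'])
             or (n1 and w1 and rbits['nw']),
        'E': (e1 and not n1 and not s1) or (e1 and n1 and rbits['en'])
             or (e1 and s1 and rbits['es']),
        'S': (s1 and not e1 and not w1) or (s1 and e1 and rbits['se'])
             or (s1 and w1 and rbits['sw']),
        'W': (w1 and not n1 and not s1) or (w1 and n1 and rbits['wn'])
             or (w1 and s1 and rbits['ws']),
    }
    # Connectivity bits mask out ports that have no link attached.
    return {port for port in want if want[port] and cbits[port]}
```
&lt;br /&gt;
Changing the routing bits changes the routing function without any table: each router stores only the eight routing bits and four connectivity bits assumed here.&lt;br /&gt;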
&lt;br /&gt;
=== Bufferless Deflection Routing (BLESS protocol) ===&lt;br /&gt;
&lt;br /&gt;
In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths.  Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router.  This may result in undesirable routing, but the packets will eventually reach the destination.  This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.  &lt;br /&gt;
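&lt;br /&gt;
One router cycle of deflection routing can be sketched as follows. The ranking rule is an assumption (oldest flit first, which the text above does not specify), as are the function name and the tuple layout; the sketch relies on the first constraint above, namely that there are at least as many output ports as incoming flits.&lt;br /&gt;
&lt;br /&gt;
```python
def assign_ports(flits, ports):
    """flits: list of (age, flit_id, desired_port) tuples.
    Returns {flit_id: (assigned_port, was_deflected)}.
    Every flit leaves the router in the same cycle; none is buffered."""
    free = list(ports)
    result = {}
    # Highest age first, so the oldest flit gets its desired port.
    for age, flit_id, want in sorted(flits, reverse=True):
        # Take the desired port if still free, otherwise any free port.
        port = want if want in free else free[0]
        free.remove(port)
        result[flit_id] = (port, port != want)
    return result
```
&lt;br /&gt;
In this model two flits that both request the north port are resolved in favor of the older one, and the younger one is sent out through another free port.&lt;br /&gt;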
&lt;br /&gt;
=== CHIPPER (Cheap-Interconnect Partially Permuting Router) ===&lt;br /&gt;
&lt;br /&gt;
This protocol was designed to address inefficient port allocation in the BLESS protocol.  A permutation network directs deflected flits to free output ports.  Livelock is prevented by relaxing the requirements so that only the highest-priority flit must obtain its requested port.  In the case of contention, arbitration logic chooses a winning flit.  It does this using the Golden Packet scheme: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured.  Every packet in the system eventually receives this special status, so every packet is eventually delivered.&lt;br /&gt;
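&lt;br /&gt;
The rotation of the special status among packets can be sketched as below. This is a hedged illustration: the parameters epoch_len and num_ids, the integer packet identifiers, and the fallback rule for two non-golden flits are hypothetical stand-ins, not details taken from the CHIPPER design.&lt;br /&gt;
&lt;br /&gt;
```python
def golden_id(cycle, epoch_len, num_ids):
    """Packet identifier that holds golden status during this cycle.
    The status moves to the next identifier every epoch_len cycles."""
    return (cycle // epoch_len) % num_ids


def arbitrate(flit_a, flit_b, cycle, epoch_len, num_ids):
    """Pick the winner between two contending flits (given by packet id)."""
    gold = golden_id(cycle, epoch_len, num_ids)
    if flit_a == gold:
        return flit_a
    if flit_b == gold:
        return flit_b
    # Neither is golden: any simple rule works for this sketch.
    return min(flit_a, flit_b)
```
&lt;br /&gt;
Because the golden identifier cycles through every possible value, each packet is eventually the globally highest-priority one, provided epoch_len is long enough for a prioritized packet to be delivered.&lt;br /&gt;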
&lt;br /&gt;
== Lines of Research ==&lt;br /&gt;
From the NoC perspective, there are many lines of research beyond the abundant technologies found in commercial designs. Some of them are presented in this section.&lt;br /&gt;
&lt;br /&gt;
=== Optical on-chip interconnects ===&lt;br /&gt;
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip that was mentioned in the sections above. The main advantages are reliability and power efficiency.&lt;br /&gt;
&lt;br /&gt;
This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing, the more FLOPs per watt:&lt;br /&gt;
&lt;br /&gt;
{| {{table}}&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Electronics'''&lt;br /&gt;
| align=&amp;quot;center&amp;quot; style=&amp;quot;background:#f0f0f0;&amp;quot;|'''Photonics'''&lt;br /&gt;
|-&lt;br /&gt;
| Electronic network ~500W||Optic network &amp;lt;80W&lt;br /&gt;
|-&lt;br /&gt;
| power = bandwidth x length||power does not depend on bitrate nor length&lt;br /&gt;
|-&lt;br /&gt;
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx&lt;br /&gt;
|-&lt;br /&gt;
| ||switching fabric has almost no power dissipation&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In academia, there are articles such as [[#References|[11]]], which proposes a new topology created for optical on-chip interconnects. The authors refer to previous papers that cite adaptations of well-known electronic designs, but highlight the need to provide a &amp;quot;scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths.&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
=== Reconfigurable NoC ===&lt;br /&gt;
&lt;br /&gt;
http://async.org.uk/async2008/async-nocs-slides/Tuesday/NoCS-2/ReNoC2008.pdf&lt;br /&gt;
&lt;br /&gt;
=== Bio NoC ===&lt;br /&gt;
&lt;br /&gt;
http://ssl.kaist.ac.kr/2007/data/journal/%5BJSSC2009%5DKHKIM.pdf&lt;br /&gt;
&lt;br /&gt;
http://www.crcpress.com/product/isbn/9781439829110&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;br /&gt;
&lt;br /&gt;
[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 267-268, 1-4 May 2011.&lt;br /&gt;
&lt;br /&gt;
[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&amp;amp;arnumber=5948588&amp;amp;isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology,] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS 2011), pp. 97-104, 1-4 May 2011.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=61846</id>
		<title>CSC/ECE 506 Spring 2012/12b sb</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/12b_sb&amp;diff=61846"/>
		<updated>2012-04-13T04:52:27Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Routing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;On-chip interconnects&lt;br /&gt;
__TOC__ &lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
On-chip interconnects are a natural extension of the high integration levels now reached with multiprocessor integration. Moore's law predicted that the number of transistors in an integrated circuit would double roughly every two years. This prediction has driven the integration of on-chip components and continues to guide the semiconductor industry.&lt;br /&gt;
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]&lt;br /&gt;
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and lately Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].&lt;br /&gt;
&lt;br /&gt;
To enable communication among these many cores inside a single chip, traditional off-chip network designs have proved to have limited applicability. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are less of a problem for on-chip technologies because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.&lt;br /&gt;
&lt;br /&gt;
=== Terminology ===&lt;br /&gt;
Some common terms:&lt;br /&gt;
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.&lt;br /&gt;
It is interesting to note that for the particular theme of this article, there are at least three different acronyms referring to the same concept. These are new technologies and different researchers have adopted different nomenclature. The acronyms are:&lt;br /&gt;
* NoC (network-on-chip)&lt;br /&gt;
* OCIN (on-chip interconnection network) &lt;br /&gt;
* OCN (on-chip network)&lt;br /&gt;
&lt;br /&gt;
== Topologies ==&lt;br /&gt;
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.&lt;br /&gt;
Certain metrics help to classify and compare the different topology types. Some of them are defined in chapter 12 of Solihin's textbook [[#References|[7]]].&lt;br /&gt;
&lt;br /&gt;
*'''Degree''' of a node is the number of its neighbors, in other words the nodes that can be reached from it in one hop&lt;br /&gt;
*'''Hop count''' is the number of nodes a message has to pass through to reach its destination&lt;br /&gt;
*'''Diameter''' is the maximum hop count between any two nodes&lt;br /&gt;
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes&lt;br /&gt;
*'''Bisection width''' is the smallest number of wires that have to be cut to separate the network into two halves&lt;br /&gt;
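&lt;br /&gt;
The metrics above can be checked on a small example. The following is a minimal sketch, assuming a k x k 2-D mesh without wraparound links and counting a hop as one link traversal; the function names are illustrative and do not come from the cited sources. It finds minimum hop counts by breadth-first search and counts the links that cross the middle cut, which for a mesh with even k equals the bisection width.&lt;br /&gt;

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, k):
    """Return the neighbors of (x, y) in a k x k 2-D mesh (no wraparound)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if a in range(k) and b in range(k)]


def hop_counts(source, k):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mesh_metrics(k):
    """Return (min degree, max degree, diameter, middle-cut links) for even k."""
    nodes = list(product(range(k), repeat=2))
    degrees = [len(mesh_neighbors(n, k)) for n in nodes]
    diameter = max(max(hop_counts(n, k).values()) for n in nodes)
    # Links crossing the cut between columns k/2 - 1 and k/2: one per row.
    left = k // 2 - 1
    cut_links = sum(
        1 for y in range(k) if (left + 1, y) in mesh_neighbors((left, y), k)
    )
    return min(degrees), max(degrees), diameter, cut_links


print(mesh_metrics(4))
```

For a 4x4 mesh this reports corner nodes of degree 2, central nodes of degree 4, a diameter of 6 hops, and 4 links across the bisection.&lt;br /&gt;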
&lt;br /&gt;
&lt;br /&gt;
Topologies can be classified as direct and indirect topologies.&lt;br /&gt;
In a direct topology, each node is connected to other nodes, which are called neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.&lt;br /&gt;
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.&lt;br /&gt;
&lt;br /&gt;
An example of a direct topology is the 2-D Mesh. An example of an indirect topology is the Flattened Butterfly.  &lt;br /&gt;
&lt;br /&gt;
=== 2-D Mesh ===&lt;br /&gt;
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.&lt;br /&gt;
Another advantage is that this topology matches the physical die layout, making it well suited to tiled architectures. For reference, the combination of a switch and a processor is called a ''tile''.&lt;br /&gt;
&lt;br /&gt;
This topology also has drawbacks. One is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetric at the edges, so from the networking perspective there is less demand for edge channels than for central channels.&lt;br /&gt;
&lt;br /&gt;
Jerger and Peh [[#References|[5]]] provide the following parameters for a mesh defined as a k-ary n-cube:&lt;br /&gt;
*the switch degree for a 2D mesh is 4, as the network requires two channels in each dimension (2n in general), although some ports on the edge routers will be unused.&lt;br /&gt;
*average minimum hop count: &lt;br /&gt;
:{| {{table}}&lt;br /&gt;
| nk/3|| ||k even&lt;br /&gt;
|-&lt;br /&gt;
| n(k/3 − 1/(3k))|| ||k odd&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4&lt;br /&gt;
*meshes provide diversity of paths for routing messages&lt;br /&gt;
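&lt;br /&gt;
The average minimum hop count can be checked by brute force. The sketch below is only an illustration under one assumed convention: it averages the Manhattan distance over all ordered source/destination pairs, including a node paired with itself. The function names are hypothetical. Under that convention the result matches n(k/3 − 1/(3k)) exactly for odd k; for even k the quoted nk/3 differs slightly from this all-pairs average (8/3 versus 5/2 for k = 4, n = 2), so there it is best read as an approximation.&lt;br /&gt;

```python
from fractions import Fraction
from itertools import product


def average_hop_count(k, n):
    """Exact mean Manhattan distance over all ordered node pairs of a k-ary n-dimensional mesh."""
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)


def closed_form_odd(k, n):
    """n(k/3 - 1/(3k)), the expression quoted for odd k."""
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


for k in (3, 5):
    print(k, average_hop_count(k, 2), closed_form_odd(k, 2))
```

Exact fractions are used instead of floating point so that the brute-force value and the closed form can be compared for equality.&lt;br /&gt;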
&lt;br /&gt;
=== Concentration Mesh ===&lt;br /&gt;
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need to have a 1:1 relationship between the number of cores and the number of switches/routers. The Concentration mesh uses one router for every four computing nodes, i.e. a router-to-core ratio of 1:4. &lt;br /&gt;
&lt;br /&gt;
The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by crossbar complexity [[#References|[1]]].&lt;br /&gt;
&lt;br /&gt;
Using fewer routers lowers the bisection channel count, but this can be compensated for by introducing express channels, as demonstrated in [[#References|[8]]].&lt;br /&gt;
&lt;br /&gt;
Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.&lt;br /&gt;
&lt;br /&gt;
=== Flattened Butterfly ===&lt;br /&gt;
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k&amp;lt;sup&amp;gt;n&amp;lt;/sup&amp;gt; network nodes with n stages of k&amp;lt;sup&amp;gt;n−1&amp;lt;/sup&amp;gt; k × k intermediate routing nodes. The degree of each intermediate router is 2k.  &lt;br /&gt;
&lt;br /&gt;
The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.&lt;br /&gt;
&lt;br /&gt;
Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.&lt;br /&gt;
&lt;br /&gt;
Its disadvantages are a high channel count (k&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt;/2 per row/column), low channel utilization, and increased control complexity.&lt;br /&gt;
&lt;br /&gt;
=== Multidrop Express Channels (MECS) ===&lt;br /&gt;
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring. &lt;br /&gt;
Multidrop Express Channels is defined by its authors as a &amp;quot;one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner.&amp;quot;  It is based on point-to-point unidirectional links, which gives a high degree of connectivity with fewer bisection channels and higher bandwidth per channel. &lt;br /&gt;
&lt;br /&gt;
Some of the parameters calculated for MECS are:&lt;br /&gt;
*Bisection channel count per row/column is equal to k.&lt;br /&gt;
*Network diameter (maximum hop count) is two.&lt;br /&gt;
*The number of nodes accessible through each channel ranges from 1 to k − 1.&lt;br /&gt;
*A node has 1 output port per direction&lt;br /&gt;
*The input port count is 2(k − 1)&lt;br /&gt;
&lt;br /&gt;
The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.&lt;br /&gt;
&lt;br /&gt;
=== Comparison of topologies ===&lt;br /&gt;
This information is taken from the analysis done in [[#References|[1]]]. &lt;br /&gt;
&lt;br /&gt;
[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]&lt;br /&gt;
&lt;br /&gt;
The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration). &lt;br /&gt;
&lt;br /&gt;
The maximum hop count is 2 for the flattened butterfly and MECS, whereas it is directly proportional to k for the Concentrated Mesh, which makes the flattened butterfly and MECS the better solutions in terms of network latency.&lt;br /&gt;
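&lt;br /&gt;
To make the scaling argument concrete, the sketch below tabulates the figures quoted in this article for a k x k array of routers with even k. It is an illustration, not a reproduction of the table in [[#References|[1]]]: the function name and dictionary keys are made up here, and the mesh diameter 2(k − 1) is the standard value for a k x k grid rather than a number taken from that paper.&lt;br /&gt;

```python
def topology_summary(k):
    """Figures quoted in the text for a k x k router array (k even)."""
    return {
        # A mesh needs up to (k - 1) hops in each of its two dimensions.
        "mesh": {"max_hops": 2 * (k - 1)},
        # Flattened butterfly: k^2 / 2 bisection channels per row/column.
        "flattened_butterfly": {
            "max_hops": 2,
            "bisection_channels_per_row": k * k // 2,
        },
        # MECS: k bisection channels per row/column, 2(k - 1) input ports.
        "mecs": {
            "max_hops": 2,
            "bisection_channels_per_row": k,
            "input_ports": 2 * (k - 1),
        },
    }


for k in (4, 8):
    print(k, topology_summary(k))
```

Doubling k from 4 to 8 raises the mesh diameter from 6 to 14 hops while the other two stay at 2; the flattened butterfly's bisection channels per row grow from 8 to 32, against 4 to 8 for MECS.&lt;br /&gt;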
&lt;br /&gt;
== Routing ==&lt;br /&gt;
&lt;br /&gt;
The routing used on a network also has an important effect on the speed at which a packet reaches its destination.  Routing can be implemented as [http://en.wikipedia.org/wiki/Source_routing source routing] or [http://en.wikipedia.org/wiki/List_of_ad_hoc_routing_protocols distributed routing].&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.&lt;br /&gt;
&lt;br /&gt;
[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 &amp;quot;An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models,&amp;quot;] Digital System Design Architectures, Methods and Tools, 2007. DSD 2007. 10th Euromicro Conference on , vol., no., pp.19-26, 29-31 Aug. 2007&lt;br /&gt;
&lt;br /&gt;
[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R.; , [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 &amp;quot;A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture,&amp;quot;] Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International , vol., no., pp. 10 pp., 25-29 April 2006&lt;br /&gt;
&lt;br /&gt;
[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.&lt;br /&gt;
&lt;br /&gt;
[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.&lt;br /&gt;
&lt;br /&gt;
[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006&lt;br /&gt;
&lt;br /&gt;
[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.&lt;br /&gt;
&lt;br /&gt;
[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.&lt;br /&gt;
&lt;br /&gt;
[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58201</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58201"/>
		<updated>2012-02-07T03:21:48Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* A quick primer on current manufacturing techniques */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Instead of giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: while the cost to produce each transistor on a chip may go down, the costs of manufacturing, testing, and research and development go up.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several inventions contributed to this; had they not been developed, the growth could have slowed or plateaued.  One of these is the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]).  This is a type of random access memory that stores each bit in a separate capacitor on an integrated circuit.  The main advantage of DRAM over its predecessor, SRAM, is that only one transistor and a capacitor are required per bit, compared to four or six transistors with SRAM.  Others are most certainly complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology and the invention of the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E) &lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
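&lt;br /&gt;
The two-year doubling can be compared against a few rows of the table. The sketch below is a rough illustration, assuming an idealized exponential N(t) = N0 · 2^((t − t0)/2) anchored at the Intel 4004 row; the function name and the choice of sample rows are arbitrary.&lt;br /&gt;

```python
def projected_transistors(start_count, start_year, year, doubling_period=2.0):
    """Transistor count predicted by doubling every doubling_period years."""
    return start_count * 2 ** ((year - start_year) / doubling_period)


# Table rows used as data points: (processor, transistor count, year).
rows = [
    ("Intel 4004", 2_300, 1971),
    ("Intel 80486", 1_180_000, 1989),
    ("Pentium 4", 42_000_000, 2000),
    ("10-Core Xeon Westmere-EX", 2_600_000_000, 2011),
]

base_name, base_count, base_year = rows[0]
for name, actual, year in rows:
    predicted = projected_transistors(base_count, base_year, year)
    print(f"{year} {name}: actual {actual:,}, projected {predicted:,.0f}")
```

Starting from 2,300 transistors in 1971, twenty doublings give about 2.4 billion for 2011, close to the 2.6 billion listed for the 10-Core Xeon Westmere-EX.&lt;br /&gt;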
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors embedded on a wafer of silicon.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. The stencil is then placed over a silicon wafer that is sensitive to ultraviolet light.  The light passes through the gaps of the stencil and exposes the silicon wafer, which is then bathed in acid, carving the outlines of the circuits and the design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to smaller and smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 [http://en.wikipedia.org/wiki/Nanometre nanometers], the smallest transistor that can be etched is about thirty atoms across.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Photolithography&amp;lt;/ref&amp;gt;  Due to physical limitations, this process cannot go on forever. At some point, it will be physically impossible to etch smaller effective transistors, and Moore’s law as we understand it will finally collapse.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980s, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every 2 years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, these advanced designs will continue the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, but other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problems more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.iue.tuwien.ac.at/phd/wittmann/node6.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that relates electrical charge and magnetic flux.  As current flows in one direction through the circuit, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its current state.  This type of structure allows for both data storage and data processing (logic gate construction).  Currently, it is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
Some advanced theoretical capabilities of memristors are the ability to store more than one state, which can lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, with new designs, processes, and breakthroughs continuing to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
On another note, if we relax the definition of Moore's Law to include computational performance gains, we open a whole new avenue by which to measure computing power.  Most of the easy gains in performance related to transistor counts have been realized, but new designs of how basic computing is performed can theoretically yield large increases in performance without doubling of transistor counts or extremely high power requirements.&amp;lt;ref&amp;gt;http://abcnews.go.com/Technology/story?id=4006166&amp;page=1#.TzAbDcVA_H4&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;  The era of the traditional transistor is not quite over yet, but the relevance of transistor counts may be nearing its end.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58040</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58040"/>
		<updated>2012-02-06T18:28:21Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Conclusions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce transistors on a chip may go down, costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
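&lt;br /&gt;
Stated as a formula (a sketch in notation introduced here for illustration, where &amp;lt;math&amp;gt;C_0&amp;lt;/math&amp;gt; is the cost of a fabrication plant today and &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is measured in years), a four-year doubling means&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;C(t) = C_0 \cdot 2^{t/4}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so that over 20 years the cost would grow by a factor of &amp;lt;math&amp;gt;2^{5} = 32&amp;lt;/math&amp;gt;, assuming the trend held exactly.&lt;br /&gt;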
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random-access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
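As a rough arithmetic check on the two-year doubling (a sketch that assumes an idealized exponential and uses only the first and last rows of the table above), the transistor count after &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; years can be modeled as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N(t) = N_0 \cdot 2^{t/2}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Starting from the Intel 4004 (&amp;lt;math&amp;gt;N_0 = 2{,}300&amp;lt;/math&amp;gt; in 1971) and projecting 40 years forward to 2011 gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;N(40) = 2{,}300 \cdot 2^{20} \approx 2.4 \times 10^{9}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
which is close to the 2.6 billion transistors listed for the 10-Core Xeon Westmere-EX.  The symbols &amp;lt;math&amp;gt;N_0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; are introduced here for illustration only; Moore himself gave no such formula.&lt;br /&gt;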
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers of silicon, which is sensitive to light. Ultraviolet light is then focused on the stencil and penetrates through the gaps in the stencil to expose the silicon wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors the size of atoms in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Current estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer chip will be about five atoms across.  At that size, quantum effects will become dominant, and without other technological advances transistors will not function as they currently do.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980s, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every 2 years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, these advanced designs will continue the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, while other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problem more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.iue.tuwien.ac.at/phd/wittmann/node6.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that relates electric charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  Currently, it is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than a binary on/off state, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to double every two years on average.  With the traditional transistor approach, physical limits are expected to be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would hit &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the number of transistors that fit in a given space.  If the past is used as a predictor of future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
On another note, if we relax the definition of Moore's Law to include computational performance gains, we open a whole new avenue by which to measure computing power.  Most of the easy gains in performance related to transistor counts have been realized, but new approaches to how basic computing is performed can theoretically yield large increases in performance without doubling transistor counts or requiring extremely high power.&amp;lt;ref&amp;gt;http://abcnews.go.com/Technology/story?id=4006166&amp;amp;page=1#.TzAbDcVA_H4&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;  The era of the traditional transistor is not quite over yet, but the relevance of transistor counts may be nearing its end.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58038</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58038"/>
		<updated>2012-02-06T18:27:57Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Conclusions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce transistors on a chip may go down, costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random-access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions, and now billions, of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) is first made containing the outlines of millions of transistors.  This is placed over a silicon wafer made up of many layers and coated with a light-sensitive material.  Ultraviolet light is then focused on the stencil, passes through its gaps, and exposes the wafer.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch still smaller transistors in this way: Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980s, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every two years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every two years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law per se, these advanced designs will continue to increase the computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, but other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problem more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.iue.tuwien.ac.at/phd/wittmann/node6.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store multiple states rather than a single on/off value, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit &amp;quot;qubit&amp;quot;] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
On another note, if we relax the definition of Moore's Law to include performance gains, we open a whole new avenue by which to measure computing power.  Most of the easy gains in performance related to transistor counts have been realized, but new designs of how basic computing is performed can theoretically yield large increases in performance without doubling of transistor counts or extremely high power requirements.&amp;lt;ref&amp;gt;http://abcnews.go.com/Technology/story?id=4006166&amp;amp;page=1#.TzAbDcVA_H4&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;  The era of the traditional transistor is not quite over yet, but the relevance of transistor counts may be nearing its end.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58030</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58030"/>
		<updated>2012-02-06T18:19:23Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Do Transistor Counts Matter? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current state of two years &amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: while the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every two years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  They include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
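&lt;br /&gt;
As a rough check of the two-year doubling against the table above, the following minimal sketch projects the Intel 4004's transistor count forward to 2011 and compares it with the 10-Core Xeon Westmere-EX.  The function name and its parameters are illustrative assumptions, not part of any cited source.&lt;br /&gt;

```python
def projected_transistors(base_count, base_year, target_year, doubling_years=2.0):
    """Project a transistor count assuming one doubling every `doubling_years`.

    Hypothetical helper for illustration only.
    """
    doublings = (target_year - base_year) / doubling_years
    return base_count * 2 ** doublings


# Table values: Intel 4004 (2,300 transistors, 1971) and
# 10-Core Xeon Westmere-EX (2,600,000,000 transistors, 2011).
projected_2011 = projected_transistors(2300, 1971, 2011)
actual_2011 = 2_600_000_000
ratio = actual_2011 / projected_2011

print(f"projected for 2011: {projected_2011:,.0f}")
print(f"actual / projected: {ratio:.2f}")
```

Forty years at one doubling every two years is twenty doublings, so the projection lands within roughly ten percent of the actual 2011 figure in the table.&lt;br /&gt;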
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions, and now billions, of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) is first made containing the outlines of millions of transistors.  This is placed over a silicon wafer made up of many layers and coated with a light-sensitive material.  Ultraviolet light is then focused on the stencil, passes through its gaps, and exposes the wafer.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch still smaller transistors in this way: Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980s, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every two years.&lt;br /&gt;
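&lt;br /&gt;
To see how far apart the two claims drift, the sketch below compares an 18-month doubling (House's performance figure) with a 24-month doubling (Moore's transistor figure) over one decade.  The helper name is a hypothetical choice made for this illustration.&lt;br /&gt;

```python
def growth_factor(years, doubling_months):
    """Return the multiplicative growth after `years` at a given doubling period.

    Hypothetical helper for illustration only.
    """
    return 2 ** (years * 12 / doubling_months)


decade_transistors = growth_factor(10, 24)  # Moore's Law: transistor count
decade_performance = growth_factor(10, 18)  # House's statement: chip performance

print(f"24-month doubling over 10 years: {decade_transistors:.0f}x")
print(f"18-month doubling over 10 years: {decade_performance:.0f}x")
```

Over ten years the 24-month rate gives a 32-fold increase, while the 18-month rate gives roughly a 100-fold increase, so treating the two figures as interchangeable overstates Moore's Law by about a factor of three per decade.&lt;br /&gt;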
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every two years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law per se, these advanced designs will continue to increase the computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, but other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problem more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.iue.tuwien.ac.at/phd/wittmann/node6.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
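&lt;br /&gt;
The resistance behavior described above can be illustrated with a toy model.  This is only a sketch under simplifying assumptions: the class name, the linear charge-to-resistance relation, and all numeric values are hypothetical and do not describe HP's actual device.&lt;br /&gt;

```python
class ToyMemristor:
    """Toy model: resistance drifts with the charge that has flowed through it.

    All parameter values are made up for illustration.
    """

    def __init__(self, r_min=100.0, r_max=16000.0, ohms_per_coulomb=1000.0):
        self.r_min = r_min
        self.r_max = r_max
        self.ohms_per_coulomb = ohms_per_coulomb
        self.resistance = r_min

    def apply_current(self, amps, seconds):
        """Positive current raises resistance, negative lowers it, zero keeps it."""
        charge = amps * seconds
        updated = self.resistance + self.ohms_per_coulomb * charge
        # Clamp the result to the device's allowed resistance range.
        self.resistance = min(self.r_max, max(self.r_min, updated))
        return self.resistance


m = ToyMemristor()
forward = m.apply_current(2.0, 1.0)    # current in one direction
held = m.apply_current(0.0, 5.0)       # no current: the state is remembered
reverse = m.apply_current(-1.0, 1.0)   # reversed current
print(forward, held, reverse)
```

Because the resistance persists when no current flows, the same element can act as memory, and reading its resistance level recovers the stored value.&lt;br /&gt;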
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store multiple states rather than a single on/off value, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit &amp;quot;qubit&amp;quot;] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
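&lt;br /&gt;
The idea of a register holding many states at once can be sketched with a small classical simulation.  The code below applies a Hadamard gate to each of three qubits, starting from all zeros, and shows that all eight basis states end up equally likely.  The function name is hypothetical, and a classical simulation like this needs memory that grows as 2 to the power of the number of qubits, which is exactly the cost a real quantum computer avoids.&lt;br /&gt;

```python
import math


def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a state vector.

    `state` is a list of amplitudes indexed by basis state.
    Hypothetical helper for illustration only.
    """
    h = 1 / math.sqrt(2)
    stride = 2 ** target
    new_state = [0.0] * len(state)
    for index, amplitude in enumerate(state):
        bit = (index // stride) % 2
        zero_index = index - bit * stride   # same index with the target bit cleared
        one_index = zero_index + stride     # same index with the target bit set
        sign = -1.0 if bit else 1.0         # the 1 component picks up a minus sign
        new_state[zero_index] += h * amplitude
        new_state[one_index] += sign * h * amplitude
    return new_state


num_qubits = 3
state = [0.0] * (2 ** num_qubits)
state[0] = 1.0  # start with every qubit in the 0 state
for qubit in range(num_qubits):
    state = apply_hadamard(state, qubit)

probabilities = [round(a * a, 6) for a in state]
print(probabilities)
```

Each of the eight outcomes has probability one eighth; useful quantum algorithms then apply further gates so that the desired answer becomes the most probable outcome.&lt;br /&gt;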
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58029</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58029"/>
		<updated>2012-02-06T18:17:20Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Do Transistor Counts Matter? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula for the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
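&lt;br /&gt;
The two doubling periods can be compared with a short calculation.  The following is a minimal Python sketch; the function name growth_factor and the ten-year window are illustrative choices, not part of either law's statement.&lt;br /&gt;
&lt;br /&gt;
```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` when a quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


# Over one decade: transistor counts (2-year doubling, Moore's law)
# versus fabrication plant cost (4-year doubling, Rock's law).
transistor_factor = growth_factor(10, 2)  # 2**5 = 32
fab_cost_factor = growth_factor(10, 4)    # 2**2.5, roughly 5.66

print(round(transistor_factor), round(fab_cost_factor, 2))
```
&lt;br /&gt;
Under these assumptions, transistor counts grow about 32-fold in a decade while plant cost grows a little under 6-fold.&lt;br /&gt;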
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  These include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
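&lt;br /&gt;
As a rough check of the two-year figure against the table, the average doubling period can be estimated from its first and last rows (Intel 4004, 1971; 10-Core Xeon Westmere-EX, 2011).  The following is a minimal Python sketch; the function name doubling_time is an illustrative choice, and a two-point estimate ignores all the scatter in between.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def doubling_time(year0, count0, year1, count1):
    """Average doubling period (in years) implied by two (year, count) points."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


# Endpoints taken from the table: Intel 4004 and 10-Core Xeon Westmere-EX.
period = doubling_time(1971, 2_300, 2011, 2_600_000_000)
print(f"{period:.2f} years per doubling")
```
&lt;br /&gt;
The result comes out close to two years, in line with the revised 1975 prediction.&lt;br /&gt;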
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers of silicon, which is sensitive to light. Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the silicon wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned so that its wavelength is smaller and smaller, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this way as they approach the size of atoms.   Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every 2 years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, these advanced designs will continue the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, but other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problems more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that relates electrical charge and magnetic flux.  As current flows in one direction through the circuit, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its current state.  This type of structure allows for both data storage and data processing (logic gate construction).  Currently, it is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
Some advanced theoretical capabilities of memristors are the ability to store more than one state, which can lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
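&lt;br /&gt;
The memory behavior described above (resistance rises with current in one direction, falls with current in the other, and holds when the current stops) can be illustrated with a toy model.  The following Python sketch is only an illustration: the class name ToyMemristor, the linear charge-to-resistance rule, and every numeric parameter are assumptions made for the example, not a description of HP's device.&lt;br /&gt;
&lt;br /&gt;
```python
class ToyMemristor:
    """Toy model: resistance tracks the net charge that has flowed through it."""

    def __init__(self, r_min=100.0, r_max=16_000.0, ohms_per_coulomb=1.0e6):
        # All three parameters are illustrative values, not measured ones.
        self.r_min = r_min
        self.r_max = r_max
        self.ohms_per_coulomb = ohms_per_coulomb
        self.resistance = r_min

    def apply_current(self, amps, seconds):
        """Pass a constant current for a while; the sign of `amps` sets direction."""
        charge = amps * seconds
        updated = self.resistance + self.ohms_per_coulomb * charge
        # Clamp to the assumed range of the device.
        self.resistance = min(self.r_max, max(self.r_min, updated))
        return self.resistance


m = ToyMemristor()
m.apply_current(+1e-3, 2.0)   # forward current: resistance rises
high = m.resistance
m.apply_current(0.0, 5.0)     # no current: the state is remembered
held = m.resistance
m.apply_current(-1e-3, 1.0)   # reverse current: resistance falls
print(high, held, m.resistance)
```
&lt;br /&gt;
Because the stored value is a continuous resistance rather than a single on/off bit, a model like this also hints at how more than one state per element could be represented.&lt;br /&gt;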
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
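&lt;br /&gt;
To make the idea of representing multiple states at once concrete, the following Python sketch classically simulates a small register: it applies a Hadamard gate to every qubit of an all-zeros register and returns the resulting amplitudes.  The function name hadamard_all is an illustrative choice, and a classical simulation like this needs memory that grows as 2 to the power of the number of qubits, so it only demonstrates the bookkeeping, not any quantum speedup.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def hadamard_all(n_qubits):
    """Amplitudes after applying a Hadamard gate to each qubit of the all-zeros state."""
    size = 2 ** n_qubits
    state = [0.0] * size
    state[0] = 1.0
    h = 1 / math.sqrt(2)
    for qubit in range(n_qubits):
        stride = 2 ** qubit
        new_state = [0.0] * size
        for i, amp in enumerate(state):
            if amp == 0.0:
                continue
            bit = (i // stride) % 2          # value of this qubit in basis state i
            partner = i - stride if bit else i + stride
            zero_idx, one_idx = (partner, i) if bit else (i, partner)
            # H maps 0 to (0 + 1)/sqrt(2) and 1 to (0 - 1)/sqrt(2).
            new_state[zero_idx] += h * amp
            new_state[one_idx] += (-h if bit else h) * amp
        state = new_state
    return state


amps = hadamard_all(3)
print(len(amps))                            # 8 basis states from 3 qubits
print(round(sum(a * a for a in amps), 6))   # probabilities sum to 1.0
```
&lt;br /&gt;
Three qubits already carry eight amplitudes, each with equal weight; every additional qubit doubles that count.&lt;br /&gt;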
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10-year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58028</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58028"/>
		<updated>2012-02-06T18:16:49Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula for the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  These include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions to billions of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) is first made containing the outlines of millions of transistors.  This is placed over a silicon wafer that consists of many layers and is coated with a light-sensitive film.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer.  The wafer is then bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch ever smaller transistors in this way, and Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
===A Common Misconception===&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  &lt;br /&gt;
&lt;br /&gt;
===Do Transistor Counts Matter?===&lt;br /&gt;
Moore's Law concerns only the doubling of transistors on the same die space every 2 years.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, per se, these advanced designs will lead to a continuation of the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.  Novel approaches such as 3-D transistor manufacturing will allow for greater densities, while other approaches, such as quantum computing, operate differently from the traditional transistor to solve the same problems more efficiently.&amp;lt;ref&amp;gt;http://www.monolithic3d.com/2/post/2011/09/is-there-a-fundamental-limit-to-miniaturizing-cmos-transistors1.html&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100 GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store a range of states rather than just two, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990's, with new designs, processes, and breakthroughs which continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three dimensional chip architecture would allow for years of continued growth in transistor counts and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58017</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58017"/>
		<updated>2012-02-06T17:44:07Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* A quick primer on current manufacturing techniques */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
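&lt;br /&gt;
Although Moore stated the trend in prose rather than as a formula, the doubling rule can be sketched numerically.  The following is a minimal illustration in Python, assuming idealized exponential growth; the function name, its parameters, and the starting values are hypothetical and chosen only for round numbers.&lt;br /&gt;

```python
def projected_transistors(start_count, start_year, target_year, doubling_years=2.0):
    """Project a transistor count assuming one doubling every doubling_years."""
    elapsed = target_year - start_year
    return start_count * 2 ** (elapsed / doubling_years)


# Hypothetical starting point (not a real chip): 1,000 transistors in 2000.
# Ten years at one doubling every two years is five doublings.
two_year_rate = projected_transistors(1000, 2000, 2010)

# The same span at the 1965 rate of one doubling every 12 months.
one_year_rate = projected_transistors(1000, 2000, 2010, doubling_years=1.0)

print(two_year_rate, one_year_rate)
```

Comparing the two results shows how strongly the projection depends on the assumed doubling period, which is why the 1975 revision from 12 months to two years matters.&lt;br /&gt;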
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several contributing factors, had they not been developed, could have slowed or plateaued Moore's law.  These include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
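&lt;br /&gt;
The claimed two-year doubling can be checked against the first and last rows of the table above.  The following Python sketch is illustrative only: it assumes steady exponential growth between the two endpoints, and the function name is hypothetical.&lt;br /&gt;

```python
import math

# Endpoints copied from the table: Intel 4004 and 10-Core Xeon Westmere-EX.
first_year, first_count = 1971, 2300
last_year, last_count = 2011, 2600000000


def doubling_time_years(year0, count0, year1, count1):
    """Average years per doubling between two (year, transistor count) points."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


estimate = doubling_time_years(first_year, first_count, last_year, last_count)
print(round(estimate, 2))
```

With these two rows the estimate comes out just under two years per doubling, in line with the trend described above.  Choosing other rows as endpoints gives somewhat different values, since individual chips scatter around the trend.&lt;br /&gt;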
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions to billions of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) is first made containing the outlines of millions of transistors.  This is placed over a silicon wafer that consists of many layers and is coated with a light-sensitive film.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer.  The wafer is then bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch ever smaller transistors in this way, and Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.&amp;lt;ref&amp;gt;http://computer.howstuffworks.com/small-cpu2.htm&amp;lt;/ref&amp;gt;  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==A Common Misconception==&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, per se, these advanced designs will lead to a continuation of the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100 GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store a range of states rather than just two, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990's, with new designs, processes, and breakthroughs which continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three dimensional chip architecture would allow for years of continued growth in transistor counts and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58016</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58016"/>
		<updated>2012-02-06T17:40:19Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Quantum Computing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
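&lt;br /&gt;
The doubling rule stated above can be written as a one-line projection.  The sketch below is only an illustration: the function name and the starting cost of 1 billion dollars in the year 2000 are hypothetical values chosen for the example, not figures from the cited source.&lt;br /&gt;
&lt;br /&gt;
```python
def fab_cost(base_cost, base_year, year, doubling_years=4.0):
    """Project plant cost assuming it doubles every `doubling_years` years."""
    return base_cost * 2 ** ((year - base_year) / doubling_years)


# Hypothetical starting point: a plant costing 1 billion dollars in 2000.
for year in (2000, 2004, 2008, 2012):
    print(year, fab_cost(1.0e9, 2000, year))
```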
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  They include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
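&lt;br /&gt;
As a quick check of the two-year doubling against the table, the doubling period implied by any two rows can be estimated from their transistor counts and dates.  The sketch below uses the first and last rows (Intel 4004 and 10-Core Xeon Westmere-EX); the function name is an illustrative choice, and with these two endpoints the result comes out at roughly two years.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def doubling_time(count_start, year_start, count_end, year_end):
    """Years per doubling implied by two (year, transistor count) points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


# Endpoints taken from the table: Intel 4004 (1971) and Westmere-EX (2011).
years = doubling_time(2300, 1971, 2600000000, 2011)
print(round(years, 2))
```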
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer that carries many layers of material and a light-sensitive coating. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the wafer. The wafer is then bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to smaller and smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch ever-smaller transistors in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==A Common Misconception==&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
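&lt;br /&gt;
The gap between the two doubling periods mentioned above compounds quickly.  The sketch below compares them over a ten-year span; the function name and the choice of a decade are assumptions made for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def growth_factor(years, doubling_years):
    """Multiplicative growth after `years` at one doubling per `doubling_years`."""
    return 2 ** (years / doubling_years)


decade = 10
# Transistor count at a two-year doubling: 2 ** 5 = 32 times.
transistor_growth = growth_factor(decade, 2.0)
# Performance at an 18-month doubling: a little over 100 times.
performance_growth = growth_factor(decade, 1.5)
print(transistor_growth, performance_growth)
```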
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, per se, these advanced designs will lead to a continuation of the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that relates electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance at its last value.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt;, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
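&lt;br /&gt;
The resistance behavior described above (rising with current in one direction, falling with the reverse, and holding when the flow stops) can be mimicked with a very small state model.  This is a toy sketch, not a physical device model: the class name, the resistance limits, and the ohms-per-coulomb constant are all hypothetical values chosen for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
class ToyMemristor:
    """Toy state model: resistance drifts with the charge that has flowed."""

    def __init__(self, r_min=100.0, r_max=16000.0, ohms_per_coulomb=1.0e6):
        # All three constants are made-up illustration values.
        self.r_min = r_min
        self.r_max = r_max
        self.ohms_per_coulomb = ohms_per_coulomb
        self.resistance = r_min

    def apply_current(self, amps, seconds):
        # Positive current raises resistance, negative current lowers it,
        # and zero current leaves the stored state untouched.
        delta = self.ohms_per_coulomb * amps * seconds
        clamped = max(self.r_min, self.resistance + delta)
        self.resistance = min(self.r_max, clamped)
        return self.resistance


m = ToyMemristor()
print(m.apply_current(0.001, 1.0))   # forward current: resistance rises
print(m.apply_current(0.0, 5.0))     # no current: resistance is remembered
print(m.apply_current(-0.001, 0.5))  # reverse current: resistance falls
```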
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two states, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into [http://en.wikipedia.org/wiki/Superposition_principle superposition].  Using this superposition, each [http://en.wikipedia.org/wiki/Qubit qubit] can be entangled with other qubits to represent multiple states at once.  By using [http://en.wikipedia.org/wiki/Quantum_gate quantum logic gates], the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the [http://en.wikipedia.org/wiki/Discrete_logarithm_problem discrete logarithm problem], upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
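&lt;br /&gt;
Superposition and entanglement can be illustrated on a classical machine by tracking the four amplitudes of a two-qubit state.  The sketch below is a plain state-vector simulation, not a quantum program: it applies a Hadamard gate to the first qubit and then a controlled-NOT, leaving the pair in an entangled state where only the outcomes 00 and 11 have nonzero probability.  The function names are illustrative choices.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def hadamard_on_first(state):
    """Apply a Hadamard gate to the first qubit of a two-qubit state.

    The state is a list of four amplitudes ordered as 00, 01, 10, 11.
    """
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def cnot_first_controls_second(state):
    """Flip the second qubit when the first qubit is 1 (swaps 10 and 11)."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


state = [1.0, 0.0, 0.0, 0.0]                # both qubits start in 0
state = hadamard_on_first(state)            # first qubit enters superposition
state = cnot_first_controls_second(state)   # the two qubits become entangled
probabilities = [abs(a) ** 2 for a in state]
print(probabilities)
```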
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts have continued to double roughly every two years.  With the traditional transistor approach, physical limits are expected to be reached around the 16 nm process because of [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs have continued to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture could allow for years of continued growth in transistor counts, and exotic designs could further increase the number of transistors that fit in a given space.  If the past is a predictor of future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58015</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58015"/>
		<updated>2012-02-06T17:35:10Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Beyond Moore's Law */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula for the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  They include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer that carries many layers of material and a light-sensitive coating. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the wafer. The wafer is then bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to smaller and smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch ever-smaller transistors in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==A Common Misconception==&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.  While some of these new technologies deal directly with adding more transistors into the same amount of space, others take a different approach to boost overall computational performance.  While not strictly following Moore's Law, per se, these advanced designs will lead to a continuation of the increase in computational power that can be harnessed from hardware.  They are included in the discussion to illustrate that performance is not necessarily dependent on the number of transistors that can be placed on a die.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose resistance links electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow will decrease the resistance and stopping the flow will leave the resistance in its present state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each &amp;quot;qubit&amp;quot; can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58014</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58014"/>
		<updated>2012-02-06T17:16:13Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* A Common Misconception */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years &amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
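The doubling rule can be written as N(t) = N0 * 2^((t - t0) / T), where T is the doubling period.  The following is a minimal sketch of that arithmetic in Python; the function name and its parameters are illustrative choices, and the starting point (Intel 4004, 1971, 2,300 transistors) is an assumption borrowed from the processor table later on this page.&lt;br /&gt;

```python
def projected_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Project a transistor count assuming a fixed doubling period.

    The defaults (Intel 4004, 1971, 2,300 transistors) are an assumed
    starting point; doubling_years=2.0 is the 1975 form of Moore's law.
    """
    elapsed = year - base_year
    return base_count * 2 ** (elapsed / doubling_years)


if __name__ == "__main__":
    # Print the projection once per decade.
    for year in (1971, 1981, 1991, 2001, 2011):
        print(year, f"{projected_transistors(year):,.0f}")
```

Under these assumptions the projection for 2011 is roughly 2.4 billion transistors, the same order of magnitude as the largest 2011 entries in the table.&lt;br /&gt;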
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
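To see how the two laws compare, the sketch below computes both growth factors side by side.  It assumes each law holds exactly with its stated doubling period; the function names are hypothetical and the output is relative growth, not real cost data.&lt;br /&gt;

```python
def relative_fab_cost(years_elapsed, doubling_years=4.0):
    """Return fab cost relative to the start, assuming Rock's law holds exactly."""
    return 2 ** (years_elapsed / doubling_years)


def relative_transistor_count(years_elapsed, doubling_years=2.0):
    """Return transistor count relative to the start, assuming Moore's law holds exactly."""
    return 2 ** (years_elapsed / doubling_years)


if __name__ == "__main__":
    for years in (4, 8, 12, 20):
        cost = relative_fab_cost(years)
        count = relative_transistor_count(years)
        print(f"after {years:2d} years: fab cost x{cost:.0f}, transistors x{count:.0f}")
```

Over 20 years this idealized model gives a 32-fold increase in plant cost against a 1024-fold increase in transistor count.&lt;br /&gt;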
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the invention of the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
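As a rough check of the table against the two-year claim, the doubling period implied by any two rows can be computed directly.  The sketch below uses the first and last rows (Intel 4004 and 10-Core Xeon Westmere-EX) as endpoints; the function name is illustrative, and picking only two endpoints is a simplification rather than a proper fit to all rows.&lt;br /&gt;

```python
import math


def implied_doubling_years(count_start, year_start, count_end, year_end):
    """Return the doubling period implied by two (year, count) data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # Endpoints taken from the table above: Intel 4004 and Xeon Westmere-EX.
    period = implied_doubling_years(2300, 1971, 2600000000, 2011)
    print(f"implied doubling period: {period:.2f} years")
```

For these two rows the implied period comes out just under two years, in line with the 1975 form of Moore's law.&lt;br /&gt;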
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers, coated with a material that is sensitive to light. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the silicon wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned so that its wavelength is smaller and smaller, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), this means that the smallest transistor that you can etch is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch ever smaller transistors in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==A Common Misconception==&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980's, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt; while the number of transistors continues to double roughly every 2 years.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose resistance links electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow will decrease the resistance and stopping the flow will leave the resistance in its present state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
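The behavior described above (resistance rises with current in one direction, falls with reversed current, and is retained when the current stops) can be illustrated with a toy model.  This is only a sketch: the class name, the linear charge-to-resistance relation, and every numeric parameter are assumptions made for illustration and are not taken from HP's device.&lt;br /&gt;

```python
class ToyMemristor:
    """Toy model: resistance tracks the net charge that has flowed through."""

    def __init__(self, r_min=100.0, r_max=16000.0, ohms_per_coulomb=1000.0):
        # All three values are made-up illustration parameters.
        self.r_min = r_min
        self.r_max = r_max
        self.ohms_per_coulomb = ohms_per_coulomb
        self.resistance = r_min

    def apply_current(self, amps, seconds):
        """Pass a current for a time; positive raises resistance, negative lowers it."""
        charge = amps * seconds
        updated = self.resistance + self.ohms_per_coulomb * charge
        # Clamp to the allowed range of this toy device.
        self.resistance = min(self.r_max, max(self.r_min, updated))
        return self.resistance


if __name__ == "__main__":
    m = ToyMemristor()
    print(m.apply_current(1.0, 2.0))    # forward current: resistance rises
    print(m.apply_current(0.0, 5.0))    # no current: state is retained
    print(m.apply_current(-1.0, 1.0))   # reverse current: resistance falls
```

Because the resistance persists with no current applied, reading it back later recovers the stored state, which is the property that makes the device usable as memory.&lt;br /&gt;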
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each &amp;quot;qubit&amp;quot; can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a re-design of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58013</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=58013"/>
		<updated>2012-02-06T17:15:17Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years &amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the invention of the integrated circuit itself.  Moore's law isn't only responsible for making larger and faster chips, but also smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) containing the outlines of millions of transistors is made first.  The stencil is placed over a silicon wafer that is made up of many layers and coated with a light-sensitive material.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Because the wafer consists of many conducting and semiconducting layers, the acid cuts into it at different depths and patterns, so circuits of enormous complexity can be created.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch transistors this way, because they would have to be the size of atoms.  Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Current estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects become dominant, and without other technological advances transistors will not function as they currently do.&lt;br /&gt;
&lt;br /&gt;
==A Common Misconception==&lt;br /&gt;
Moore's Law is often linked to performance improvements as measured in CPU clock speeds.  In the 1980s, former Intel executive David House stated that chip performance would double every 18 months.&amp;lt;ref&amp;gt;http://news.cnet.com/Myths-of-Moores-Law/2010-1071_3-1014887.html&amp;lt;/ref&amp;gt;  This is a consequence of Moore's Law, but it is not what Moore's Law actually claims.  In fact, due to heat dissipation issues&amp;lt;ref&amp;gt;http://www.gotw.ca/publications/concurrency-ddj.htm&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://techtalk.pcpitstop.com/2007/08/06/cpu-clock-speeds/&amp;lt;/ref&amp;gt;, performance as measured in clock speed has remained flat since 2005.&amp;lt;ref&amp;gt;http://www.kmeme.com/2010/09/clock-speed-wall.html&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett-Packard, the memristor is a new type of circuit element that links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This behavior allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1,000 times greater than currently available.  HP has reported the ability to fit 100 GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt;, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could enable analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each &amp;quot;qubit&amp;quot; can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to double every two years on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, with new designs, processes, and breakthroughs continuing to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a given space.  If the past is used as a predictor of future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57785</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57785"/>
		<updated>2012-02-01T16:43:58Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Conclusions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
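&lt;br /&gt;
Although Moore gave no formula, the two-year doubling is often summarized as N(t) = N0 * 2^((t - t0) / T), where T is the doubling period.  The following Python sketch is a minimal illustration of that relation, not something taken from Moore's paper; the function name and the starting figure of 1,000 transistors are hypothetical.&lt;br /&gt;

```python
def projected_transistors(n0, year0, year, doubling_years=2.0):
    """Project a transistor count, assuming one doubling every `doubling_years`."""
    return n0 * 2 ** ((year - year0) / doubling_years)

# Hypothetical chip with 1,000 transistors in year 0, projected ten years ahead.
two_year_rule = projected_transistors(1000, 0, 10)        # 1975 revision: 5 doublings
one_year_rule = projected_transistors(1000, 0, 10, 1.0)   # 1965 prediction: 10 doublings
print(two_year_rule, one_year_rule)
```

Over a decade, the original 12-month rule predicts 32 times as many transistors as the two-year rule, which shows how much the 1975 revision changed the forecast.&lt;br /&gt;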
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce transistors on a chip may go down, costs instead flow toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  These include the invention of dynamic random-access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
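&lt;br /&gt;
As a rough check on the two-year figure, the doubling period implied by the table can be estimated from its first and last rows (Intel 4004, 2,300 transistors in 1971; 10-Core Xeon Westmere-EX, 2,600,000,000 in 2011).  This is a back-of-the-envelope Python sketch with a hypothetical helper name; using only two data points ignores the scatter of the rows in between.&lt;br /&gt;

```python
import math

def implied_doubling_years(count_start, year_start, count_end, year_end):
    """Average doubling period implied by two (year, transistor count) points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings

# First and last rows of the table above.
period = implied_doubling_years(2300, 1971, 2600000000, 2011)
print(round(period, 2))  # close to two years
```

The result is close to two years, consistent with the doubling rate stated above.&lt;br /&gt;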
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, a “stencil” (a photomask) containing the outlines of millions of transistors is made first.  The stencil is placed over a silicon wafer that is made up of many layers and coated with a light-sensitive material.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Because the wafer consists of many conducting and semiconducting layers, the acid cuts into it at different depths and patterns, so circuits of enormous complexity can be created.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers.  Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever.  At some point it will be physically impossible to etch transistors this way, because they would have to be the size of atoms.  Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Current estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects become dominant, and without other technological advances transistors will not function as they currently do.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett-Packard, the memristor is a new type of circuit element that links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This behavior allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1,000 times greater than currently available.  HP has reported the ability to fit 100 GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt;, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
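&lt;br /&gt;
The step from 100 GB per square centimeter to a petabyte can be made concrete with some arithmetic.  The Python sketch below is illustrative only: the 10 cm² footprint and the decimal definition 1 PB = 1,000,000 GB are assumptions, and the function name is hypothetical.&lt;br /&gt;

```python
import math

GB_PER_PB = 1_000_000  # decimal petabyte (assumption)

def layers_needed(target_gb, gb_per_cm2_per_layer, footprint_cm2):
    """Number of stacked layers needed to reach `target_gb` on a given footprint."""
    gb_per_layer = gb_per_cm2_per_layer * footprint_cm2
    return math.ceil(target_gb / gb_per_layer)

# 100 GB per cm^2 per layer (the HP figure quoted above) on an assumed 10 cm^2 device.
print(layers_needed(GB_PER_PB, 100, 10))  # 1000 layers
```

Under these assumptions a petabyte would need about 1,000 layers, the same order of magnitude as the thousand-fold density gain postulated above.&lt;br /&gt;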
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could enable analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each &amp;quot;qubit&amp;quot; can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
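&lt;br /&gt;
To give a sense of why superposition is powerful, a register of n qubits is described by 2^n amplitudes.  The following plain-Python sketch is a classical illustration with hypothetical names, not a quantum simulator from any particular library; it builds the uniform superposition and checks that its probabilities sum to one.&lt;br /&gt;

```python
import math

def uniform_superposition(n_qubits):
    """Amplitudes of an n-qubit register with every basis state equally likely."""
    dim = 2 ** n_qubits
    amplitude = 1 / math.sqrt(dim)
    return [amplitude] * dim

state = uniform_superposition(3)
print(len(state))                   # 8 basis states for 3 qubits
print(sum(a * a for a in state))    # total probability, approximately 1
```

Each added qubit doubles the number of amplitudes, which is why simulating even modest quantum registers on classical hardware quickly becomes expensive.&lt;br /&gt;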
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to double every two years on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, with new designs, processes, and breakthroughs continuing to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a given space.  If the past is used as a predictor of future trends, it is safe to say that the end of Moore's Law &amp;quot;is about 10 years away&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57784</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57784"/>
		<updated>2012-02-01T16:43:03Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors were introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: while the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
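&lt;br /&gt;
The two doubling periods can be compared with a short calculation.  The following is only a minimal sketch in Python: the helper name growth_factor is hypothetical, and it assumes idealized, perfectly steady doubling every two years for transistor counts and every four years for fabrication plant cost.&lt;br /&gt;

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years` for a quantity that doubles once per period."""
    return 2 ** (years / doubling_period_years)


# Doubling periods as stated in the text; steady doubling is an idealization.
MOORE_PERIOD_YEARS = 2  # transistor count
ROCK_PERIOD_YEARS = 4   # fabrication plant cost

if __name__ == "__main__":
    for years in (4, 8, 12):
        transistors = growth_factor(years, MOORE_PERIOD_YEARS)
        fab_cost = growth_factor(years, ROCK_PERIOD_YEARS)
        print(f"{years:2d} years: transistors x{transistors:.0f}, fab cost x{fab_cost:.0f}")
```

Under these assumptions, plant cost grows as the square root of the transistor count, so the two quantities rise together but at different rates.&lt;br /&gt;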
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
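As a rough check of the two-year figure, the average doubling time implied by the table can be computed from its first and last rows (the Intel 4004 and the 10-Core Xeon Westmere-EX).  The following is only a sketch in Python: the function name doubling_time_years is illustrative, and it assumes that these two endpoints are representative of the whole trend.&lt;br /&gt;

```python
import math

# Endpoints copied from the first and last rows of the table.
FIRST = (1971, 2_300)          # Intel 4004
LAST = (2011, 2_600_000_000)   # 10-Core Xeon Westmere-EX


def doubling_time_years(first, last):
    """Average doubling period implied by two (year, transistor count) points."""
    (year0, count0), (year1, count1) = first, last
    doublings = math.log2(count1 / count0)  # how many times the count doubled
    return (year1 - year0) / doublings


if __name__ == "__main__":
    print(f"Implied doubling time: {doubling_time_years(FIRST, LAST):.2f} years")
```

With these endpoints the result comes out close to two years, in line with the revised form of the law; other pairs of rows give somewhat different values, since individual chips sit above or below the trend.&lt;br /&gt;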
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions, and now billions, of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers, coated with a material that is sensitive to light. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned so that its wavelength is smaller and smaller, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this way, because they would have to be the size of atoms.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true for traditional transistor technology.  Transistors will be so small that quantum effects will begin to take over and electrons will &amp;quot;leak&amp;quot; out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett-Packard, the memristor is a new type of circuit element whose behavior relates electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt;, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store a range of states rather than a single binary value, which could enable analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each quantum bit, or &amp;quot;qubit&amp;quot;, can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which make quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law is about 10 years away.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57783</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57783"/>
		<updated>2012-02-01T16:08:33Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Conclusions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: while the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions, and now billions, of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers, coated with a material that is sensitive to light. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned so that its wavelength is smaller and smaller, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this way, because they would have to be the size of atoms.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true.  Transistors will be so small that quantum effects take over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett-Packard, the memristor is a new type of circuit element whose behavior relates electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt;, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store a range of states rather than a single binary value, which could enable analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each quantum bit, or &amp;quot;qubit&amp;quot;, can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which make quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;  New technologies that leverage three-dimensional chip architecture would allow for years of continued growth in transistor counts, and exotic designs could further increase the theoretical capacity of transistors in a particular space.  If the past is used as a predictor for future trends, it is safe to say that the end of Moore's Law is about 10 years away.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57782</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57782"/>
		<updated>2012-02-01T15:58:21Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
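&lt;br /&gt;
As a rough illustration of what a two-year doubling implies, the following minimal Python sketch projects a transistor count forward from a known starting point.  The function name and its parameters are hypothetical and chosen only for this example; the starting figures (2,300 transistors for the Intel 4004 in 1971) are taken from the table on this page.&lt;br /&gt;

```python
def projected_transistors(base_count, base_year, year, doubling_years=2.0):
    """Project a transistor count, assuming one doubling every doubling_years.

    Illustrative helper only (a hypothetical name, not from any library).
    """
    doublings = (year - base_year) / doubling_years
    return base_count * 2 ** doublings


# Intel 4004 (1971, 2,300 transistors) projected 40 years forward to 2011:
# 20 doublings, so 2,300 * 2**20.
estimate = projected_transistors(2300, 1971, 2011)
print(f"{estimate:,.0f}")  # prints 2,411,724,800
```

The projected figure of roughly 2.4 billion is of the same order as the 2.6 billion listed for the 10-Core Xeon Westmere-EX (2011), which suggests how closely the trend has tracked the simple doubling rule.&lt;br /&gt;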
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, the costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers and coated with a light-sensitive material. Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the silicon wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this way as they approach the size of atoms.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true.  Transistors will be so small that quantum theory or atomic physics takes over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
Among the advanced theoretical capabilities of memristors is the ability to store more than one state, which could enable analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by allowing the available bits to enter superposition.  Through this superposition, each quantum bit (&amp;quot;qubit&amp;quot;) can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range.&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]&lt;br /&gt;
&lt;br /&gt;
==Conclusions==&lt;br /&gt;
&lt;br /&gt;
The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average.  With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]&amp;lt;ref&amp;gt;http://news.cnet.com/2100-1008-5112061.html&amp;lt;/ref&amp;gt;.  If this is true, the current pace of innovation would lead to hitting &amp;quot;Moore's Wall&amp;quot; around 2022, or in about 10 years.  This &amp;quot;10 year horizon&amp;quot; for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.&amp;lt;ref&amp;gt;http://arxiv.org/pdf/astro-ph/0404510v2.pdf&amp;lt;/ref&amp;gt;&amp;lt;ref&amp;gt;http://java.sys-con.com/node/557154&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57781</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57781"/>
		<updated>2012-02-01T15:45:13Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Beyond Moore's Law */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after microprocessors were introduced that were less dense, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable indicator of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, the costs instead shift toward manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, a “stencil” containing the outlines of millions of transistors is made first.  The stencil is placed over a silicon wafer that is made up of many layers and coated so that it is sensitive to light.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer beneath.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to shorter and shorter wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this small.  Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true.  Transistors will be so small that quantum theory or atomic physics takes over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than one state, which can lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each quantum bit, or &amp;quot;qubit&amp;quot;, can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.  &lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Other Technologies===&lt;br /&gt;
&lt;br /&gt;
The arena of research to produce an alternative to the traditional transistor includes many novel approaches.  They include (but are not limited to):&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]&lt;br /&gt;
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57780</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57780"/>
		<updated>2012-02-01T15:22:05Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Beyond Moore's Law */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its own success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
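&lt;br /&gt;
Although Moore gave no formula, the doubling rule can be sketched as N(t) = N0 * 2^((t - t0) / T) with T = 2 years.  The Python snippet below is a minimal, idealized illustration rather than a fit to real data; the function name and its parameters are hypothetical, and the starting point (2,300 transistors in 1971) is the Intel 4004 entry from the table later on this page.&lt;br /&gt;
&lt;br /&gt;
```python
def projected_transistors(base_count, base_year, year, doubling_years=2.0):
    """Idealized Moore's-law projection: the count doubles every doubling_years."""
    elapsed = year - base_year
    return base_count * 2 ** (elapsed / doubling_years)


# Starting from the Intel 4004 (2,300 transistors in 1971):
print(projected_transistors(2300, 1971, 1973))  # one doubling period: 4600.0
print(projected_transistors(2300, 1971, 1981))  # five doubling periods: 73600.0
```
&lt;br /&gt;
Real products scatter around this curve, so the output is best read as an order-of-magnitude guide rather than a prediction for any particular chip.&lt;br /&gt;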
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
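&lt;br /&gt;
To make the relationship concrete, the two doubling periods can be compared side by side.  The sketch below is a hypothetical Python illustration: the helper name growth_factor is not taken from any source, and it assumes idealized doubling periods of two years for transistor counts and four years for fabrication plant cost, as stated on this page.&lt;br /&gt;
&lt;br /&gt;
```python
def growth_factor(years, doubling_years):
    """Multiplicative growth after the given number of years of steady doubling."""
    return 2 ** (years / doubling_years)


span = 8  # years; an arbitrary span chosen for illustration
transistor_growth = growth_factor(span, 2.0)  # Moore's law: doubles every 2 years
fab_cost_growth = growth_factor(span, 4.0)    # Rock's law: doubles every 4 years
print(transistor_growth, fab_cost_growth)     # 16.0 4.0
```
&lt;br /&gt;
Under these assumptions, plant cost grows as the square root of the growth in transistor count over any span of years.&lt;br /&gt;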
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several contributing developments, had they not occurred, could have slowed or plateaued Moore's law.  These include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
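&lt;br /&gt;
As a rough check of the two-year figure against the table, the average doubling time implied by any two rows can be computed directly.  The Python sketch below is illustrative only: the function name is hypothetical, and it uses just the first and last rows of the table (Intel 4004 and 10-Core Xeon Westmere-EX), so it says nothing about how closely the rows in between follow the trend.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def doubling_time_years(count_a, year_a, count_b, year_b):
    """Average doubling time implied by two (transistor count, year) data points."""
    doublings = math.log2(count_b / count_a)
    return (year_b - year_a) / doublings


# Endpoints from the table: Intel 4004 (1971) and 10-Core Xeon Westmere-EX (2011).
print(round(doubling_time_years(2300, 1971, 2_600_000_000, 2011), 2))  # 1.99
```
&lt;br /&gt;
The result, just under two years per doubling, agrees with the approximate figure quoted for Moore's law.&lt;br /&gt;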
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, a “stencil” containing the outlines of millions of transistors is made first.  The stencil is placed over a silicon wafer that is made up of many layers and coated so that it is sensitive to light.  Ultraviolet light is then focused on the stencil; it passes through the gaps in the stencil and exposes the wafer beneath.  The wafer is then bathed in acid, which carves the outlines of the circuits and creates the intricate design of millions of transistors.  Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to shorter and shorter wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors this small.  Moore’s law will finally collapse when transistor sizes approach the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Currently, estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true.  Transistors will be so small that quantum theory or atomic physics takes over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element whose behavior links electrical charge and magnetic flux.  As current flows in one direction through the device, resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its most recent state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than one state, which can lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===&lt;br /&gt;
&lt;br /&gt;
Quantum computing works by essentially allowing all available bits to enter into superposition.  Using this superposition, each quantum bit, or &amp;quot;qubit&amp;quot;, can be entangled with other qubits to represent multiple states at once.  By using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states.  This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based.  Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.  &lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===&lt;br /&gt;
&lt;br /&gt;
Another promising avenue is a redesign of the traditional transistor.  Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1.  The theoretical speed of these transistors is in the terahertz range&amp;lt;ref&amp;gt;http://www.rochester.edu/news/show.php?id=2585&amp;lt;/ref&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57779</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57779"/>
		<updated>2012-02-01T14:46:59Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: /* Beyond Moore's Law */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years&amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a benchmark and a measurable gauge of its own success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead flow towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several contributing developments, had they not occurred, could have slowed or plateaued Moore's law.  These include the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones. &lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer built up of many layers and coated with a light-sensitive material (photoresist). Ultraviolet light is then focused on the stencil, passes through its gaps, and exposes the coated wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors the size of atoms in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Current estimates predict that Moore’s law will gradually cease to hold true around 2020 or soon afterward.  Transistors will be so small that quantum effects take over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref&amp;gt;http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
&lt;br /&gt;
===Quantum Computing===&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57778</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57778"/>
		<updated>2012-02-01T14:41:13Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: Beyond section added.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years &amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula for the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
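&lt;br /&gt;
As a rough cross-check of the table above, the following Python sketch projects a transistor count under the assumption of one clean doubling every two years, starting from the Intel 4004 row.  The function name and the fixed doubling period are illustrative assumptions, not a formula given by Moore.&lt;br /&gt;

```python
def projected_transistors(base_count, base_year, target_year, doubling_years=2.0):
    """Project a transistor count assuming one doubling every `doubling_years`."""
    doublings = (target_year - base_year) / doubling_years
    return base_count * 2 ** doublings


# Intel 4004 row from the table: 2,300 transistors in 1971.
estimate = projected_transistors(2300, 1971, 2011)
print(f"{estimate:,.0f}")  # prints 2,411,724,800
```

The projection of roughly 2.4 billion transistors for 2011 is of the same order as the 2.6 billion listed for the 10-Core Xeon Westmere-EX in the table.&lt;br /&gt;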
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of Moore's Law is the transistor.  Computer chips contain hundreds of millions of transistors on a silicon wafer.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer built up of many layers and coated with a light-sensitive material (photoresist). Ultraviolet light is then focused on the stencil, passes through its gaps, and exposes the coated wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors the size of atoms in this way.  Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
&lt;br /&gt;
Current estimates predict that Moore’s law will gradually cease to hold true around 2020 or soon afterward.  Transistors will be so small that quantum effects take over and electrons will leak out of the wires.  For example, the thinnest layer inside a computer will be about five atoms across.  At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
==Beyond Moore's Law==&lt;br /&gt;
&lt;br /&gt;
There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.&lt;br /&gt;
&lt;br /&gt;
===[http://en.wikipedia.org/wiki/Memristor The Memristor]===&lt;br /&gt;
&lt;br /&gt;
Currently being developed by Hewlett Packard, the memristor is a new type of circuit element that links electrical charge and magnetic flux.  As current flows in one direction through the device, its resistance increases.  Reversing the flow decreases the resistance, and stopping the flow leaves the resistance in its last state.  This type of structure allows for both data storage and data processing (logic gate construction).  It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available.  HP has reported the ability to fit 100GB in a square centimeter&amp;lt;ref name=&amp;quot;EETimes&amp;quot;&amp;gt;{{Citation |last=Johnson |first=R. C. |date=30 April 2008 |title='Missing link' memristor created |url=http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks- |work=[[EE Times]] |accessdate=2008-04-30 }}&amp;lt;/ref&amp;gt; and, with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].&lt;br /&gt;
&lt;br /&gt;
One advanced theoretical capability of memristors is the ability to store more than two distinct states, which could lead to analog computing.  Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.&lt;br /&gt;
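&lt;br /&gt;
The resistance behaviour described in this section can be sketched with a deliberately simplified Python model.  The class name, the linear dependence on charge, and every constant below are assumptions chosen for illustration; this is not HP's device model.&lt;br /&gt;

```python
class ToyMemristor:
    """Toy charge-controlled resistor; all constants are illustrative assumptions."""

    def __init__(self, resistance=8000.0, r_min=100.0, r_max=16000.0,
                 ohms_per_coulomb=1000.0):
        self.resistance = resistance
        self.r_min = r_min
        self.r_max = r_max
        self.ohms_per_coulomb = ohms_per_coulomb

    def apply_current(self, amps, seconds):
        """Pass a constant current for a while and return the new resistance."""
        charge = amps * seconds  # signed charge moved through the device
        updated = self.resistance + self.ohms_per_coulomb * charge
        # Keep the resistance inside the assumed range of the device.
        self.resistance = min(self.r_max, max(self.r_min, updated))
        return self.resistance


device = ToyMemristor()
print(device.apply_current(1.0, 2.0))   # forward current: resistance rises
print(device.apply_current(-1.0, 1.0))  # reversed current: resistance falls
print(device.apply_current(0.0, 5.0))   # no current: resistance is retained
```

Because the resistance persists when no current flows, the same element can act as a stored value, and the continuous range between the two limits is what makes multi-state or analog use conceivable.&lt;br /&gt;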
&lt;br /&gt;
===Quantum Computing===&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57777</id>
		<title>CSC/ECE 506 Spring 2012/1b as</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_506_Spring_2012/1b_as&amp;diff=57777"/>
		<updated>2012-02-01T07:19:24Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: Added manufacture info&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==What is Moore's Law?==&lt;br /&gt;
[[Image:TransCount59-75.png|right]]&lt;br /&gt;
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years&amp;lt;ref&amp;gt;http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html&amp;lt;/ref&amp;gt;.  The original prediction in 1965 stated a doubling every 12 months, but in 1975, after less dense microprocessors had been introduced, he slowed the rate of doubling to its current value of two years &amp;lt;ref&amp;gt;http://arstechnica.com/hardware/news/2008/09/moore.ars&amp;lt;/ref&amp;gt;.  Rather than giving an empirical formula for the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses.  This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success.  Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Moore's_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==The lesser known second law==&lt;br /&gt;
Also known as Rock's law, this law is a direct consequence of Moore's law: although the cost to produce each transistor on a chip may go down, costs instead shift towards manufacturing, testing, and research and development.  The law states that the cost of a semiconductor chip fabrication plant doubles every four years.  Simply put, in order for Moore's law to hold, Rock's law must also hold.&amp;lt;ref&amp;gt;http://en.wikipedia.org/wiki/Rock%27s_law&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Moore's law, past to present==&lt;br /&gt;
[[Image:Mooreslaw.png|right|thumb|350px|]]&lt;br /&gt;
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years.  Several developments contributed to this; had they not occurred, Moore's law could have slowed or plateaued.  Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the integrated circuit itself.  Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
! Processor&lt;br /&gt;
! Transistor count&lt;br /&gt;
! Date of introduction&lt;br /&gt;
! Manufacturer&lt;br /&gt;
! Process&lt;br /&gt;
! Area&lt;br /&gt;
|-&lt;br /&gt;
|Intel 4004&lt;br /&gt;
|2,300&lt;br /&gt;
|1971&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|12&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8008&lt;br /&gt;
|3,500&lt;br /&gt;
|1972&lt;br /&gt;
|Intel&lt;br /&gt;
|10&amp;amp;nbsp;µm&lt;br /&gt;
|14&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|MOS Technology 6502&lt;br /&gt;
|3,510&lt;br /&gt;
|1975&lt;br /&gt;
|MOS Technology&lt;br /&gt;
|&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6800&lt;br /&gt;
|4,100&lt;br /&gt;
|1974&lt;br /&gt;
|Motorola&lt;br /&gt;
|&lt;br /&gt;
|16&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8080&lt;br /&gt;
|4,500&lt;br /&gt;
|1974&lt;br /&gt;
|Intel&lt;br /&gt;
|6 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|RCA 1802&lt;br /&gt;
|5,000&lt;br /&gt;
|1974&lt;br /&gt;
|RCA&lt;br /&gt;
|5 μm&lt;br /&gt;
|27&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8085&lt;br /&gt;
|6,500&lt;br /&gt;
|1976&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|20&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Zilog Z80&lt;br /&gt;
|8,500&lt;br /&gt;
|1976&lt;br /&gt;
|Zilog&lt;br /&gt;
|4 μm&lt;br /&gt;
|18&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 6809&lt;br /&gt;
|9,000&lt;br /&gt;
|1978&lt;br /&gt;
|Motorola&lt;br /&gt;
|5 μm&lt;br /&gt;
|21&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8086&lt;br /&gt;
|29,000&lt;br /&gt;
|1978&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 8088&lt;br /&gt;
|29,000&lt;br /&gt;
|1979&lt;br /&gt;
|Intel&lt;br /&gt;
|3 μm&lt;br /&gt;
|33&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80186&lt;br /&gt;
|55,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|Motorola 68000&lt;br /&gt;
|68,000&lt;br /&gt;
|1979&lt;br /&gt;
|Motorola&lt;br /&gt;
|4 μm&lt;br /&gt;
|44&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80286&lt;br /&gt;
|134,000&lt;br /&gt;
|1982&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|49&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80386&lt;br /&gt;
|275,000&lt;br /&gt;
|1985&lt;br /&gt;
|Intel&lt;br /&gt;
|1.5&amp;amp;nbsp;µm&lt;br /&gt;
|104&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Intel 80486&lt;br /&gt;
|1,180,000&lt;br /&gt;
|1989&lt;br /&gt;
|Intel&lt;br /&gt;
|1&amp;amp;nbsp;µm&lt;br /&gt;
|160&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Pentium&lt;br /&gt;
|3,100,000&lt;br /&gt;
|1993&lt;br /&gt;
|Intel&lt;br /&gt;
|0.8&amp;amp;nbsp;µm&lt;br /&gt;
|294&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K5&lt;br /&gt;
|4,300,000&lt;br /&gt;
|1996&lt;br /&gt;
|AMD&lt;br /&gt;
|0.5&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium II&lt;br /&gt;
|7,500,000&lt;br /&gt;
|1997&lt;br /&gt;
|Intel&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|195&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6&lt;br /&gt;
|8,800,000&lt;br /&gt;
|1997&lt;br /&gt;
|AMD&lt;br /&gt;
|0.35&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium III&lt;br /&gt;
|9,500,000&lt;br /&gt;
|1999&lt;br /&gt;
|Intel&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K6-III&lt;br /&gt;
|21,300,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K7&lt;br /&gt;
|22,000,000&lt;br /&gt;
|1999&lt;br /&gt;
|AMD&lt;br /&gt;
|0.25&amp;amp;nbsp;µm&lt;br /&gt;
|-&lt;br /&gt;
|Pentium 4&lt;br /&gt;
|42,000,000&lt;br /&gt;
|2000&lt;br /&gt;
|Intel&lt;br /&gt;
|180&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Atom&lt;br /&gt;
|47,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Barton&lt;br /&gt;
|54,300,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K8&lt;br /&gt;
|105,900,000&lt;br /&gt;
|2003&lt;br /&gt;
|AMD&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2&lt;br /&gt;
|220,000,000&lt;br /&gt;
|2003&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Cell&lt;br /&gt;
|241,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Sony/IBM/Toshiba&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core 2 Duo&lt;br /&gt;
|291,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|463,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|AMD&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|AMD K10&lt;br /&gt;
|758,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
&amp;lt;!--C2Q is double die product - two C2D dies in a single package |-&lt;br /&gt;
|Core 2 Quad&lt;br /&gt;
|582,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|65 nm --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Itanium 2 with 9MB cache&lt;br /&gt;
|592,000,000&lt;br /&gt;
|2004&lt;br /&gt;
|Intel&lt;br /&gt;
|130&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|Core i7 (Quad)&lt;br /&gt;
|731,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|263&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Xeon 7400&lt;br /&gt;
|1,900,000,000&lt;br /&gt;
|2008&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|-&lt;br /&gt;
|POWER6&lt;br /&gt;
|789,000,000&lt;br /&gt;
|2007&lt;br /&gt;
|IBM&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|341&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Opteron 2400&lt;br /&gt;
|904,000,000&lt;br /&gt;
|2009&lt;br /&gt;
|AMD&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|346&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|16-Core SPARC T3&lt;br /&gt;
|1,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Sun/Oracle&lt;br /&gt;
|40&amp;amp;nbsp;nm&lt;br /&gt;
|377&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Gulftown)&lt;br /&gt;
|1,170,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|240&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-core POWER7&lt;br /&gt;
|1,200,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|567&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Quad-core z196&lt;br /&gt;
|1,400,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|IBM&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Dual-Core Itanium 2&lt;br /&gt;
|1,700,000,000&lt;br /&gt;
|2006&lt;br /&gt;
|Intel&lt;br /&gt;
|90&amp;amp;nbsp;nm&lt;br /&gt;
|596&amp;amp;nbsp;mm²&lt;br /&gt;
&amp;lt;!-- Magny-Cours Opteron 6100 is double die product - two dies in a single package|-&lt;br /&gt;
|Twelve-Core Opteron 6100&lt;br /&gt;
|1,810,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|AMD&lt;br /&gt;
|45 nm&lt;br /&gt;
|692&amp;amp;nbsp;mm²  --&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|Quad-Core Itanium Tukwila&lt;br /&gt;
|2,000,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|65&amp;amp;nbsp;nm&lt;br /&gt;
|699&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|Six-Core Core i7 (Sandy Bridge-E)&lt;br /&gt;
|2,270,000,000 &lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|434&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|8-Core Xeon Nehalem-EX&lt;br /&gt;
|2,300,000,000&lt;br /&gt;
|2010&lt;br /&gt;
|Intel&lt;br /&gt;
|45&amp;amp;nbsp;nm&lt;br /&gt;
|684&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|10-Core Xeon Westmere-EX&lt;br /&gt;
|2,600,000,000&lt;br /&gt;
|2011&lt;br /&gt;
|Intel&lt;br /&gt;
|32&amp;amp;nbsp;nm&lt;br /&gt;
|512&amp;amp;nbsp;mm²&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==A quick primer on current manufacturing techniques==&lt;br /&gt;
&lt;br /&gt;
At the heart of this revolution is the computer chip, which can contain hundreds of millions of transistors on a silicon wafer the size of your fingernail. Inside your laptop there is a chip whose transistors can be seen only under a microscope.  To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer built up of many layers and coated with a light-sensitive material (photoresist). Ultraviolet light is then focused on the stencil, passes through its gaps, and exposes the coated wafer.&lt;br /&gt;
Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.&lt;br /&gt;
&lt;br /&gt;
One reason Moore’s law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light can have a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched is about thirty atoms across.&lt;br /&gt;
But this process cannot go on forever. At some point, it will be physically impossible to etch transistors the size of atoms in this way. Moore’s law will finally collapse when transistor sizes are reduced to the size of individual atoms.&lt;br /&gt;
Current estimates predict that around 2020 or soon afterward, Moore’s law will gradually cease to hold true. Transistors will be so small that quantum theory or atomic physics takes over and electrons will leak out of the wires. For example, the thinnest layer inside a computer will be about five atoms across. At that size, quantum effects will become dominant and transistors will not function as they currently do without other technological advances.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
	<entry>
		<id>https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2011/ch6_6c_sm&amp;diff=56286</id>
		<title>CSC/ECE 517 Fall 2011/ch6 6c sm</title>
		<link rel="alternate" type="text/html" href="https://wiki.expertiza.ncsu.edu/index.php?title=CSC/ECE_517_Fall_2011/ch6_6c_sm&amp;diff=56286"/>
		<updated>2011-11-26T17:46:12Z</updated>

		<summary type="html">&lt;p&gt;Smalexa2: Added links&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;p style=&amp;quot;font-size: 24px&amp;quot;&amp;gt;'''Generics'''&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=='''Introduction'''==&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Generic_programming Generic programming] is a style of programming, defined from many different angles, that addresses one fundamental problem: writing general-purpose algorithms that work with a variety of system-defined or user-defined types.  The ultimate goal of generic programming is to avoid code duplication while retaining [http://en.wikipedia.org/wiki/Compile-time_type_checking compile-time type checking], which in turn supports [http://en.wikipedia.org/wiki/Type_safety run-time type safety].&lt;br /&gt;
&lt;br /&gt;
[http://en.wikipedia.org/wiki/Ada_(programming_language) Ada], standardized in 1983, was one of the first widely used languages to offer generics as a way to reduce code duplication. Since then, more and more languages have adopted the idea. Generics are available in Java, C#, F#, and Visual Basic .NET; in C++, generic programming is achieved with templates.&lt;br /&gt;
&lt;br /&gt;
For example, the concept of a “[http://en.wikipedia.org/wiki/Linked_list linked list]” is pervasive throughout software design, with each node holding an object that may range from a simple int to a complex structure or an instantiated object.  In each case, the underlying “list” needs to know how to handle the object at its node.  Consider the following C++ class:&lt;br /&gt;
&lt;br /&gt;
 class Point {&lt;br /&gt;
    &lt;br /&gt;
   public:&lt;br /&gt;
   Point();&lt;br /&gt;
   Point(int x, int y, int z);&lt;br /&gt;
   void setCoord(int x, int y, int z);&lt;br /&gt;
   // Other functions&lt;br /&gt;
   &lt;br /&gt;
   private:&lt;br /&gt;
     int x;&lt;br /&gt;
     int y;&lt;br /&gt;
     int z;&lt;br /&gt;
 };&lt;br /&gt;
&lt;br /&gt;
Perhaps this “point” representation is the basis of a larger project that will use thousands of “points” to plot some meaningful data or to support more complex analysis.  To do this, a list of points can be built by first defining a list node:&lt;br /&gt;
&lt;br /&gt;
 class PointNode  {&lt;br /&gt;
     &lt;br /&gt;
     public:&lt;br /&gt;
     PointNode();&lt;br /&gt;
     void setPoint(int x, int y, int z);&lt;br /&gt;
     // Other functions&lt;br /&gt;
     &lt;br /&gt;
     private:&lt;br /&gt;
       Point _point;       // The point stored in this node&lt;br /&gt;
       PointNode *_next;   // The next node in the list&lt;br /&gt;
 };&lt;br /&gt;
&lt;br /&gt;
And a list wrapper class to manage them all:&lt;br /&gt;
&lt;br /&gt;
 class PointList  {&lt;br /&gt;
     &lt;br /&gt;
     public:&lt;br /&gt;
     PointList();&lt;br /&gt;
     void add(Point &amp;amp;a);&lt;br /&gt;
     void remove(Point &amp;amp;a);&lt;br /&gt;
     int pointTotal();&lt;br /&gt;
     // Other functions necessary&lt;br /&gt;
     private:&lt;br /&gt;
       PointNode *head;&lt;br /&gt;
       int count;&lt;br /&gt;
 };&lt;br /&gt;
&lt;br /&gt;
Now suppose that in addition to “point” data with three dimensions, there is another class of data dealing with rectangular areas that can be arranged in a list and processed.  Consider the new data type:&lt;br /&gt;
&lt;br /&gt;
 class Rectangle  {&lt;br /&gt;
     &lt;br /&gt;
     public:&lt;br /&gt;
     Rectangle(Point p1, Point p2, Point p3);&lt;br /&gt;
     // Other accessors, etc...&lt;br /&gt;
     &lt;br /&gt;
     private:&lt;br /&gt;
       Point p1, p2, p3;&lt;br /&gt;
 };&lt;br /&gt;
&lt;br /&gt;
Since Rectangle is a different type, the Point list will not accept Rectangle objects.  The list code above could be copied and modified to handle this new node type, which would accomplish the goal in the short run, but it leads to the following problem:&lt;br /&gt;
&lt;br /&gt;
If there are A basic algorithms (sort, iteration, etc.) and T data types (int, bool, user-defined types), a total of A*T implementations need to be written.  &lt;br /&gt;
&lt;br /&gt;
Generic Programming is a method of programming that reuses the implementation of “List” and other concepts and abstracts away the notion of “type”.  &lt;br /&gt;
&lt;br /&gt;
Consider the following two lists that take advantage of the C++ STL:&lt;br /&gt;
&lt;br /&gt;
 list&amp;lt;Point&amp;gt; plist;&lt;br /&gt;
 list&amp;lt;Rectangle&amp;gt; rlist;&lt;br /&gt;
&lt;br /&gt;
That’s it!  The above two lines each instantiated a “list” of the appropriate type.  Given this flexibility, only A + T implementations need to be written, one implementation for the underlying algorithm (list in this case) and one for the underlying data type.  &lt;br /&gt;
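&lt;br /&gt;
As a worked illustration with assumed numbers (chosen only for this example), suppose there are A = 3 algorithms (sort, search, iterate) and T = 4 data types:&lt;br /&gt;

```latex
\[
A \times T = 3 \times 4 = 12
\qquad \text{versus} \qquad
A + T = 3 + 4 = 7
\]
```

The gap widens as either count grows, because the first total is a product and the second is a sum.&lt;br /&gt;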
&lt;br /&gt;
=='''Generics/Templates in Java, C# and C++'''==&lt;br /&gt;
&lt;br /&gt;
==='''C#'''===&lt;br /&gt;
&lt;br /&gt;
In C#, the following can be declared:&lt;br /&gt;
&lt;br /&gt;
 List&amp;lt;Point&amp;gt; plist = new List&amp;lt;Point&amp;gt;();&lt;br /&gt;
&lt;br /&gt;
The compiler will prevent anything that is not a “Point” from being added to this list.  At run time, the .NET JIT compiler builds the specific code needed for this list, which is similar to writing a special “behind the scenes” list class to handle Points.  The benefits are faster execution, no behind-the-scenes casting, and the ability to use [http://en.wikipedia.org/wiki/Reflection_(computer_programming) Reflection] to tell that this list contains only “Point” objects.&lt;br /&gt;
&lt;br /&gt;
==='''Java'''===&lt;br /&gt;
&lt;br /&gt;
In Java, the following can be declared:&lt;br /&gt;
&lt;br /&gt;
 ArrayList&amp;lt;Point&amp;gt; plist = new ArrayList&amp;lt;Point&amp;gt;();&lt;br /&gt;
&lt;br /&gt;
The declaration looks the same, but the underlying implementation is different.  The compiler will still prevent any non-Point objects from being added to the list, but instead of generating a special class for Points as in C#, Java reuses the single ArrayList class and performs “Type Erasure”: the compiler “throws away” the type argument, treats the elements as type Object, and inserts the proper casts into the compiled byte-code.  In effect, a call such as plist.get(1) is compiled as if it had been written with an explicit cast -- Point p1 = (Point) plist.get(1);  Since the casting is still being done, its performance cost remains.  The benefit is that existing code that knows how to manipulate ArrayList can manipulate the list of “Points”.  The drawback (besides the performance penalty of casting) is that, with the type erased, Reflection cannot tell that the ArrayList contains “Points”, only that it is an ArrayList.&lt;br /&gt;
&lt;br /&gt;
==='''C++'''===&lt;br /&gt;
&lt;br /&gt;
In C++, the following can be declared:&lt;br /&gt;
&lt;br /&gt;
 list&amp;lt;Point&amp;gt; *plist = new list&amp;lt;Point&amp;gt;();&lt;br /&gt;
&lt;br /&gt;
Once again, the syntax is similar to both Java and C#, but the implementation is very different.  Both C# and Java produce code that is consumed by a Virtual Machine (VM), which does the translation to specific hardware instructions.  C++, however, produces native machine code (e.g. x86 instructions).  Not everything in C++ is an object, and there is no underlying VM that needs to know about the “Point” class.  Because of this, C++ does not restrict what can be done with templates.  In C# and Java, by contrast, the compiler needs to know what methods are available for a type parameter so that it can communicate this to the VM, which is accomplished through the implementation of interfaces.  For example:&lt;br /&gt;
&lt;br /&gt;
 int averageXcoord&amp;lt;T&amp;gt;( T one, T two ) { return ( one.getX() + two.getX() ) / 2; }&lt;br /&gt;
&lt;br /&gt;
This code will not compile in Java or C# because the compiler cannot verify that type T has a getX() method.  To get it to compile, an interface has to be created, implemented, and then named as a constraint on T in the function declaration.  C++ takes a different path: if the types used to instantiate the template have a getX() method at compile time, the code compiles; if they don’t, the compiler reports an error.&lt;br /&gt;
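&lt;br /&gt;
The C++ behaviour described above can be sketched with a C++14 generic lambda, whose call operator is an implicit template.  GridPoint and Pixel are hypothetical types invented for this illustration, and the integer division mirrors the snippet above; this is a minimal sketch under those assumptions, not a definitive implementation.&lt;br /&gt;

```cpp
// Hypothetical types for illustration: they share no base class or interface,
// but both happen to provide a getX() member function.
struct GridPoint {
    int x;
    int getX() const { return x; }
};

struct Pixel {
    int column;
    int getX() const { return column; }
};

// C++14 generic lambda: each 'auto' parameter makes the call operator an
// implicit function template, so no interface has to be declared up front.
// The body is only checked against getX() when a call instantiates it.
const auto averageXcoord = [](auto one, auto two) {
    return (one.getX() + two.getX()) / 2;
};
```

Calling averageXcoord(GridPoint{2}, Pixel{6}) yields 4, even though the two argument types are unrelated.  Passing a type with no getX() member, such as a plain int, is rejected when the call is compiled.&lt;br /&gt;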
&lt;br /&gt;
=='''Comparison of Generics and Templates'''==&lt;br /&gt;
&lt;br /&gt;
Although Generics and Templates basically perform the same function, there are major differences between them.&lt;br /&gt;
* Templates are expanded into concrete methods at compile time, while Generics (as implemented in C#) resolve the actual types at run time. In C++, the compiler checks the type specification and creates a separate copy of the template for each actual type that is specified while compiling.  This approach consumes more space, because every copy of the method occupies a certain amount of space.&lt;br /&gt;
&lt;br /&gt;
* The way errors are handled is also very different. When an argument of an inappropriate type is passed to a method, C++ templates report a compile-time error, because the compiler knows the concrete types used in each instantiation.  With Generics, the compiler can check for illegal assignments at compile time, but since the actual types are not dealt with until [http://en.wikipedia.org/wiki/Just-in-time_compilation JIT] compilation, a situation may arise where the types passed to a method are incompatible with the operations that the method performs.  To avoid this, a programmer can use [http://rosettacode.org/wiki/Constrained_genericity constraints] so that the appropriate compile-time errors are issued.&lt;br /&gt;
&lt;br /&gt;
=='''Advantages of Using Generics'''==&lt;br /&gt;
&lt;br /&gt;
There are several important reasons to use generics.&lt;br /&gt;
* The most important reason is that type safety is easily achieved by using Generics.&lt;br /&gt;
* Programmers can avoid having multiple implementations of methods. All they need to do is create a generic type or method and specify the type argument when they use it. This significantly reduces duplication and improves the quality of the code. &lt;br /&gt;
* Generics can also improve the performance of the code since you don't need to [http://stackoverflow.com/questions/1028520/use-cases-for-boxing-a-value-type-in-c box] the value types.  &lt;br /&gt;
* Generics also have some limitations, but for most applications the benefits are great and the disadvantages few.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=='''Language Support'''==&lt;br /&gt;
&lt;br /&gt;
Many languages support some implementation of Generic Programming.  The following list is not intended to be exhaustive:&lt;br /&gt;
&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Ada_(programming_language) Ada]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/C%2B%2B C++]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/C_Sharp_(programming_language) C#]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/D_(programming_language) D]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Eiffel_(programming_language) Eiffel]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Java_(programming_language) Java]&lt;br /&gt;
*[http://delphi.wikia.com/wiki/Delphi_Wiki Delphi]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Haskell_(programming_language) Haskell]&lt;br /&gt;
*[http://en.wikipedia.org/wiki/Scheme_(programming_language) Scheme]&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
{{reflist}}&lt;br /&gt;
*http://math.hws.edu/javanotes/c10/s1.html&lt;br /&gt;
*http://www.artima.com/cppsource/type_erasure.html&lt;br /&gt;
*http://www2.research.att.com/~bs/bs_faq.html#generic&lt;br /&gt;
*http://download.oracle.com/javase/1,5.0/docs/guide/language/generics.html&lt;br /&gt;
*http://lcsd05.cs.tamu.edu/papers/dos_reis_et_al.pdf&lt;br /&gt;
*http://en.wikipedia.org/wiki/Generic_programming&lt;br /&gt;
*http://www.generic-programming.org/languages/cpp/techniques.php&lt;br /&gt;
*https://docs.google.com/a/ncsu.edu/View?id=dcsvntt2_61htsg5kg8&lt;br /&gt;
*http://stackoverflow.com/questions/1028520/use-cases-for-boxing-a-value-type-in-c&lt;br /&gt;
*http://en.wikipedia.org/wiki/Ada_(programming_language)&lt;br /&gt;
*http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx&lt;/div&gt;</summary>
		<author><name>Smalexa2</name></author>
	</entry>
</feed>