On-chip interconnects
__TOC__

== Introduction ==

== Background ==
On-chip interconnects are a natural consequence of the high integration levels reached today in multiprocessor chips. Moore's law predicts that the number of transistors in an integrated circuit doubles approximately every two years. This trend has driven the integration of on-chip components and continues to guide the semiconductor industry.
[[File:Itr MIC image 920x460.png|thumb|c|right|Intel® MIC]]
In recent years, the main players in the chip industry have been racing to integrate more cores in a chip, with multi-core (more than one core) and many-core (so many cores that the historical multi-core techniques are no longer efficient) technologies. This integration is known as [http://en.wikipedia.org/wiki/Multi-core_(computing) CMP] (chip multiprocessor), and more recently Intel has coined the term [http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html Intel® Many Integrated Core (Intel® MIC)].

Traditional off-chip network designs have proved to be of limited use for communication among these many cores inside a single chip. According to [[#References|[5]]], off-chip designs suffer from I/O bottlenecks, which are much less of a problem on chip because the internal wiring provides much higher bandwidth and avoids the delay associated with external traffic. Nevertheless, on-chip designs still face challenges that need further study, among them power consumption and space constraints.

=== Terminology ===
Some common terms:
* [http://en.wikipedia.org/wiki/System_on_a_chip SoCs] (Systems-on-a-chip), which commonly refer to chips that are made for a specific application or domain area.
* [http://en.wikipedia.org/wiki/MPSoC MPSoCs] (Multiprocessor systems-on-chip), referring to a SoC that uses multi-core technology.
For the subject of this article, at least three different acronyms refer to the same concept, because these are new technologies and different researchers have adopted different nomenclature. The acronyms are:
* NoC (network-on-chip), this is the most common term and also used in this article
* OCIN (on-chip interconnection network)
* OCN (on-chip network)

== Topologies ==
Topology refers to the layout or arrangement of interconnections among the processing elements. In general, a good topology aims to minimize network latency and maximize throughput.
There are certain metrics that help with the classification and comparison of the different topology types. Some of them are defined in Solihin's [[#References|[7]]] textbook in chapter 12.

*'''Degree''' is the number of neighbors of a node, in other words, the number of nodes that can be reached from it in one hop
*'''Hop count''' is the number of hops (links) that a message traverses to get from the source to the destination
*'''Diameter''' is the maximum hop count over all pairs of nodes
*'''Path diversity''' is useful for the routing algorithm and is given by the number of shortest paths that a topology offers between two nodes
*'''Bisection width''' is the smallest number of wires that must be cut to separate the network into two equal halves
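
The first three metrics can be checked on a small example. The following is a minimal Python sketch, assuming a k x k 2D mesh without wraparound links and counting each traversed link as one hop; the function names are illustrative and do not come from the cited textbook.

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, k):
    """Return the neighbors of `node` in a k x k 2D mesh (no wraparound links)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < k and 0 <= b < k]


def hop_counts(source, k):
    """Breadth-first search: minimum hop count from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(k):
    """Maximum, over all node pairs, of the minimum hop count."""
    nodes = product(range(k), repeat=2)
    return max(max(hop_counts(n, k).values()) for n in nodes)


k = 4
corner_degree = len(mesh_neighbors((0, 0), k))  # degree of a corner node
center_degree = len(mesh_neighbors((1, 1), k))  # degree of an interior node
print(corner_degree, center_degree, diameter(k))
```

For the 4 x 4 case, a corner node has fewer neighbors than an interior node, and the longest shortest path runs between opposite corners.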

Topologies can be classified as direct and indirect topologies.
In a direct topology, each node is connected to other nodes, which are named neighbouring nodes. Each node contains a network interface acting as a router in order to transfer information.
In an indirect topology, some nodes are not computational but act as switches that transfer traffic among the rest of the nodes, including other switches. It is called indirect because packets are switched through specific elements that are not part of the computational nodes themselves.

An example of direct topologies is 2-D Mesh. An example of indirect topology is Flattened Butterfly.

=== 2-D Mesh ===
[[File:Mesh.png|thumb|c|right|2D Mesh]]This has been a very popular topology due to its simple design and its low layout and router complexity. It is often described as a k-ary n-cube, where k is the number of nodes in each dimension and n is the number of dimensions. For example, a 4-ary 2-cube is a 4x4 2D mesh.
Another advantage is that this topology is similar to the physical die layout, making it more suitable to implement in tiled architectures. For reference, the combination of the switch and a processor is named ''tile''.

This topology also has drawbacks. One of them is that the degree of the nodes along the edges is lower than the degree of the central nodes. This makes the 2D mesh asymmetrical along the edges; from the networking perspective, there is therefore less demand for edge channels than for central channels.

Jerger and Peh [[#References|[5]]] provide the following information on the parameters of a mesh defined as a k-ary n-cube:
*the switch degree is 2n, because the network requires two channels in each dimension; for a 2D mesh this gives 4, although some ports on the edge will be unused.
*average minimum hop count:
:{| {{table}}
| nk/3|| ||k even
|-
| n(k/3 − 1/(3k))|| ||k odd
|-
|}

*the channel load across the bisection of a mesh under uniform random traffic with an even k is k/4
*meshes provide diversity of paths for routing messages
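
The parameters listed above can be evaluated for concrete values of k and n. The sketch below simply transcribes the quoted formulas into Python; the function names are hypothetical, and the term written as 1/3k in the source is read here as 1/(3k), which is an assumption about the intended notation.

```python
from fractions import Fraction


def mesh_switch_degree(n):
    """Switch degree of a k-ary n-cube mesh: two channels per dimension."""
    return 2 * n


def mesh_avg_min_hops(k, n):
    """Average minimum hop count, using the quoted even-k and odd-k formulas."""
    if k % 2 == 0:
        return Fraction(n * k, 3)
    # Assumption: "1/3k" in the text means 1/(3k).
    return n * (Fraction(k, 3) - Fraction(1, 3 * k))


def mesh_bisection_channel_load(k):
    """Bisection channel load under uniform random traffic (quoted for even k)."""
    if k % 2 != 0:
        raise ValueError("the quoted k/4 figure assumes an even k")
    return Fraction(k, 4)


print(mesh_switch_degree(2))           # 2D mesh
print(mesh_avg_min_hops(4, 2))         # 4-ary 2-cube, k even
print(mesh_avg_min_hops(5, 2))         # 5-ary 2-cube, k odd
print(mesh_bisection_channel_load(4))  # 4-ary mesh
```

Exact fractions are used so that the results can be compared with the closed-form expressions without rounding.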

=== Concentration Mesh ===
[[File:Concentratedmesh.png|thumb|c|right|Concentration Mesh]] This is an evolution of the mesh topology. There is no real need for a 1:1 relationship between the number of cores and the number of switches/routers. The concentration mesh uses one router for every four computing nodes.

The advantage over the simple mesh is the decrease in the average hop count, which is important for scaling the solution. However, it is not as scalable as it might seem, because its degree is limited by the crossbar complexity [[#References|[1]]].

Using fewer routers lowers the bisection channel count, but this can be compensated for by introducing express channels, as demonstrated in [[#References|[8]]].

Another drawback is that the port bandwidth can become a bottleneck in periods of high traffic.

=== Flattened Butterfly ===
[[File:Flbfly.png|thumb|c|right|Flattened butterfly]]A butterfly topology is often described as a k-ary n-fly, which implies k<sup>n</sup> network nodes with n stages of k<sup>n−1</sup> k × k intermediate routing nodes. The degree of each intermediate router is 2k.

The flattened butterfly is made by flattening (i.e. combining) the routers in each row of a butterfly topology while preserving the inter-router connections. It uses non-minimal routing to improve load balancing in the network.

Some advantages are that the maximum distance between nodes is two hops and it has lower latency and better throughput than that of the mesh topology.

Its disadvantages are a high channel count (k<sup>2</sup>/2 per row/column), low channel utilization, and increased control complexity.

=== Multidrop Express Channels (MECS) ===
[[File:Mecs.png|thumb|c|right|MECS]] Multidrop Express Channels was proposed in [[#References|[1]]] by Grot and Keckler. Their motivation was that performance and scalability should be obtained by managing wiring.
Multidrop Express Channels is defined by its authors as a "one-to-many communication fabric that enables a high degree of connectivity in a bandwidth-efficient manner." It is based on unidirectional point-to-multipoint links. This makes for a high degree of connectivity with fewer bisection channels and higher bandwidth for each channel.

Some of the parameters calculated for MECS are:
*Bisection channel count per row/column is equal to k.
*Network diameter (maximum hop count) is two.
*The number of nodes accessible through each channel ranges from 1 to k − 1.
*A node has 1 output port per direction
*The input port count is 2(k − 1)

The low channel count and the high degree of connectivity provided by each channel increase per channel bandwidth and wire utilization. At the same time, the design minimizes the serialization delay. It presents low network latencies due to its low diameter.

=== Comparison of topologies ===
This data is taken from the analysis done in [[#References|[1]]].

[[File:Topologycomp.png|thumbnail|center|upright=5|Comparison of CMesh, Flattened Butterfly, and MECS]]

The table compares three of the topologies described above for two combinations of k, the network radix (nodes per dimension), and c, the concentration factor (1 meaning no concentration).

The maximum hop count is 2 for flattened butterfly and MECS, whereas it is directly proportional to k for the concentrated mesh. This makes flattened butterfly and MECS better solutions, with lower network latency.

The bisection channel count is 1 for CMesh in both cases, but going from MECS to flattened butterfly it doubles in one configuration and quadruples in the other.
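
The doubling and quadrupling follow from the per-row/column bisection channel counts quoted in the sections above: k for MECS and k<sup>2</sup>/2 for flattened butterfly. The short Python sketch below evaluates both; the radix values 4 and 8 are assumptions chosen for illustration, since the compared configurations appear only in the figure.

```python
def mecs_bisection_channels(k):
    """Bisection channels per row/column for MECS, as quoted above."""
    return k


def fbfly_bisection_channels(k):
    """Bisection channels per row/column for flattened butterfly (k^2 / 2)."""
    return k * k // 2


# Assumed radix values; the source table is only available as an image.
ratios = {}
for k in (4, 8):
    mecs = mecs_bisection_channels(k)
    fbfly = fbfly_bisection_channels(k)
    ratios[k] = fbfly / mecs
    print(f"k={k}: MECS={mecs}, flattened butterfly={fbfly}, ratio={ratios[k]}")
```

In general the ratio is k/2, so the gap between the two topologies grows with the network radix.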

The bandwidth per channel in this example is better for CMesh and MECS, getting attenuated in the case of flattened butterfly.

=== Examples of topologies in current NoCs ===

==== Intel ====
The [http://techresearch.intel.com/ProjectDetails.aspx?Id=151 Intel Teraflops Research Chip] is built around an 8x10 mesh with two 38-bit unidirectional links per channel. It has a bisection bandwidth of 380 GB/s, which includes data and sideband communication.

The [http://techresearch.intel.com/ProjectDetails.aspx?Id=1 Single-Chip Cloud Computer] contains a 24-router mesh network with 256 GB/s bisection bandwidth.

==== Tilera ====

The [http://www.tilera.com/products/processors Tilera TileGx, TilePro, and Tile64] use Tilera's iMesh™ on-chip network. The iMesh™ consists of five independent 8x8 mesh networks with two 32-bit unidirectional links per channel. It provides a bisection bandwidth of 320 GB/s.

==== ST Microelectronics ====
[[File:Spidergon.png|thumb|c|right|upright=1.5|Example of Spidergon design]]
ST Microelectronics created the Spidergon design for the STNoC [[#References|[10]]].

The Spidergon is a pseudo-regular topology with a design that is composed of three building blocks: network interface, router, and physical link. These building blocks make the design ready to be tailored to the needs of the application. Each router building block has a degree of 3.

==== IBM ====
The IBM [http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.innovation.html Cell] project uses an interconnect with four unidirectional rings, two in each direction. The total network bisection bandwidth is 307.2 GB/s.

As a curiosity, the Cell processor was jointly developed with Sony and Toshiba, and is [http://en.wikipedia.org/wiki/Cell_(microprocessor) used] in the [http://news.cnet.com/PlayStation-3-chip-has-split-personality/2100-1043_3-5566340.html?tag=nl Sony PlayStation 3].

== Routing ==

There are a variety of routing protocols that can be used for [http://en.wikipedia.org/wiki/System_on_a_chip SoCs], each with different advantages and disadvantages. They can be broadly classified in several different ways.

===General Routing Schemes===

====[http://en.wikipedia.org/wiki/Store_and_forward Store and forward routing]====
This routing scheme has been used since the early days of telecommunications. It requires that the entire message be received at a node before it is propagated to the next node. This protocol suffers from a high storage requirement and high latency, due to the need to completely buffer a message before forwarding it.[[#References|[12]]] This approach can be quite effective when the average packet size is small in comparison with the channel widths.

====[http://en.wikipedia.org/wiki/Cut-through_switching Cut-Through routing] or [http://en.wikipedia.org/wiki/Wormhole_switching Worm Hole routing]====
These two protocols use the switch to examine the header flit, decide where to send the message, and then start forwarding it immediately. True cut-through routing lets the tail continue when the head is blocked, accumulating the whole packet in a single switch (which requires a buffer large enough to hold the largest packet). In wormhole routing, when the head of the message is blocked, the message stays strung out over multiple nodes in the network, potentially blocking other messages (however, this needs only enough buffer space to store the piece of the packet that is sent between switches). Using a cut-through protocol lowers latency, but because forwarding starts before the whole packet has been received, corrupted packets can propagate, and a scheme to handle this must be implemented.[[#References|[12]]]
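
The latency difference between the two families can be illustrated with a common first-order, zero-load model. This model is a textbook simplification and is not taken from the sources cited in this article; the parameter names and the numeric values are assumptions for illustration, and contention is ignored.

```python
def store_and_forward_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    """Each hop waits for the whole packet: H * (t_r + L / b) cycles."""
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_delay + serialization)


def cut_through_latency(hops, router_delay, packet_bits, channel_bits_per_cycle):
    """The header is forwarded at once; serialization is paid once: H * t_r + L / b."""
    serialization = packet_bits / channel_bits_per_cycle
    return hops * router_delay + serialization


# Assumed example: 6 hops, 2-cycle routers, 512-bit packet, 128-bit channels.
saf = store_and_forward_latency(6, 2, 512, 128)
ct = cut_through_latency(6, 2, 512, 128)
print(saf, ct)
```

Under these assumptions the serialization delay is paid at every hop with store and forward but only once with cut-through or wormhole routing, which is why the gap widens as packets get longer or paths get deeper.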

====[http://en.wikipedia.org/wiki/Deterministic_routing Deterministic routing]====
This describes a routing scheme where, if we are given a pair of nodes, the same path will always be used between those nodes.

====[http://en.wikipedia.org/wiki/Adaptive_routing Adaptive routing]====
This is a routing scheme where the underlying routers may alter the path of packet flow in response to system conditions or other algorithm criteria. Adaptive routing is intended to provide as many routes as possible to reach the destination.

====Deadlock and Livelock====

Deadlock and livelock are two separate situations that may occur during routing. They are defined as follows:

''' Deadlock ''' is defined as a situation where there are activities (e.g., messages) each waiting for another to finish something.[[#References|[13]]] Since a waiting activity cannot finish, the messages are deadlocked.

''' Livelock ''' is defined as a situation where a message can move from node to node but will never reach its destination node.[[#References|[13]]]

----

===Routing Protocols in SoCs===

The specific routing protocols below are built using the ideas from the classes of protocols previously described.

==== Source Routing ====

The source node partially or totally computes the path a packet will take through the network and stores the information in the packet header. The extra route information is sent in each packet, inflating their size.

==== Distributed Routing ====

Each switch in the network computes the next hop that a packet will take towards the destination. The packet header contains only the destination information, reducing its size compared to source routing. This approach requires routing tables to be present to direct the packet from node to node, which does not scale well when the number of nodes increases.

==== Logic Based Distributed Routing (LBDR) ====

In this protocol, each router knows its own position in the architecture and can determine in which direction the destination of a packet lies. It is most commonly used in 2D meshes, but it can be applied to other topologies as well.[[#References|[12]]] Using this position information, it is possible to route the packet based on a small number of bits and a few logic gates per router, which is cheaper than a routing table or a buffer.
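
The idea can be sketched in Python as two stages: comparators that locate the destination relative to the current router, followed by a few gates driven by per-router configuration bits. This is a simplified illustration and not the exact logic of the cited paper; the bit names (for example <code>"en"</code>, meaning a packet leaving east may later turn north), the coordinate convention (north is decreasing y), and the configuration shown are all assumptions.

```python
def lbdr_outputs(cur, dst, routing_bits, connectivity_bits):
    """Return the set of output ports ('N', 'E', 'S', 'W', or 'L' for local)."""
    (xc, yc), (xd, yd) = cur, dst

    # Stage 1: comparators give the direction(s) in which the destination lies.
    n1, s1 = yd < yc, yd > yc  # assumption: row 0 is the northern edge
    e1, w1 = xd > xc, xd < xc
    if not (n1 or s1 or e1 or w1):
        return {"L"}  # packet has arrived; deliver to the local core

    # Stage 2: a port is usable if the destination lies straight ahead in that
    # direction, or if the turn needed afterwards is allowed by a routing bit.
    r, c = routing_bits, connectivity_bits
    usable = {
        "N": n1 and ((not e1 and not w1) or (e1 and r["ne"]) or (w1 and r["nw"])),
        "E": e1 and ((not n1 and not s1) or (n1 and r["en"]) or (s1 and r["es"])),
        "W": w1 and ((not n1 and not s1) or (n1 and r["wn"]) or (s1 and r["ws"])),
        "S": s1 and ((not e1 and not w1) or (e1 and r["se"]) or (w1 and r["sw"])),
    }
    # Connectivity bits mask out ports with no link (for example at mesh edges).
    return {port for port, ok in usable.items() if ok and c[port]}


# Assumed configuration that reproduces X-then-Y routing: turns from the
# X dimension into Y are allowed, turns from Y back into X are not.
xy_bits = {"en": True, "es": True, "wn": True, "ws": True,
           "ne": False, "nw": False, "se": False, "sw": False}
all_links = {"N": True, "E": True, "S": True, "W": True}

print(lbdr_outputs((1, 1), (3, 0), xy_bits, all_links))  # destination is north-east
print(lbdr_outputs((3, 1), (3, 0), xy_bits, all_links))  # destination is due north
```

Changing the configuration bits changes which turns are permitted, which is how a scheme of this kind can support different routing algorithms without a table.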

There are several variations of LBDR:

''' LBDRe ''' - Uses extended path information when calculating routing hops by modeling up to the next two potential hops

''' uLBDR (Universal LBDR) ''' - Adds multicast support to the protocol

''' bLBDR ''' - Adds the ability to broadcast messages to only certain regions of the network.

==== Bufferless Deflection Routing (BLESS protocol) ====

In this protocol, each flit of a packet is routed independently of every other flit through the network, and different flits from the same packet may take different paths. Any contention between multiple flits results in one flit taking the desired path and the other flit being “deflected” to some other router. This may result in undesirable routing, but the packets will eventually reach the destination.[[#References|[15]]] This type of routing is feasible on every network topology that satisfies the following two constraints: Every router has at least the same number of output ports as the number of its input ports, and every router is reachable from every other router.[[#References|[15]]]
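
A single routing step of a deflection router can be sketched as follows. This is a minimal Python illustration, assuming that contention is resolved oldest flit first and that a losing flit takes the first free port; both choices, as well as the function and parameter names, are assumptions made for the example and are not presented as the exact arbitration of the cited work.

```python
def deflection_route(flits, output_ports):
    """Assign every incoming flit to a distinct output port in one cycle.

    flits: list of (age, flit_id, preferred_ports) tuples.
    Returns a dict mapping flit_id to the assigned output port.
    """
    # The first constraint from the text: at least as many outputs as inputs,
    # so no flit ever needs to be buffered.
    if len(flits) > len(output_ports):
        raise ValueError("more flits than output ports")

    free = list(output_ports)
    assignment = {}
    # Assumed arbitration policy: the oldest flit chooses first.
    for age, flit_id, preferred in sorted(flits, key=lambda f: -f[0]):
        # Take a preferred (productive) port if one is free, else be deflected.
        port = next((p for p in preferred if p in free), free[0])
        free.remove(port)
        assignment[flit_id] = port
    return assignment


flits = [(5, "a", ["E"]), (9, "b", ["E"]), (1, "c", ["N"])]
result = deflection_route(flits, ["N", "E", "S", "W"])
print(result)
```

In this example two flits want the east port; the older one wins, and the younger one is deflected, which in turn displaces the youngest flit from its preferred port.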

==== CHIPPER (Cheap-Interconnect Partially Permuting Router) ====

This protocol was designed to address inefficient port allocation in the BLESS protocol. A permutation network directs deflected flits to free output ports. Livelock is prevented by relaxing the requirement so that only the highest-priority flit is guaranteed to obtain its requested port. In the case of contention, arbitration logic chooses a winning flit. It does this with a novel scheme, called Golden Packet: a single packet is picked and prioritized globally above all other packets for long enough that its delivery is ensured. Every packet in the system eventually receives this special status, so every packet is eventually delivered.[[#References|[16]]]

==== Dimension-order Routing ====

This protocol is a deterministic strategy for multidimensional networks. The dimensions are taken in a fixed order, and the packet is routed completely in one dimension before switching to the next. For example, in a 2D mesh, dimension-order routing could be implemented by completely routing the packet in the X-dimension before beginning to route in the Y-dimension. This extends to higher-order networks as well. For example, hypercubes can be routed in dimension order by routing packets along the dimensions in the order of the bit positions in which the source and destination addresses differ, one bit position at a time.[[#References|[14]]]
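
Both examples from the paragraph above can be sketched in Python. The functions below are a minimal illustration; their names, the coordinate representation, and the choice to resolve hypercube dimensions from the lowest bit upward are assumptions.

```python
def xy_route(src, dst):
    """Dimension-order route in a 2D mesh: finish X completely, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hypercube_route(src, dst, dimensions):
    """Dimension-order route in a binary hypercube, one bit position at a time."""
    path = [src]
    cur = src
    for bit in range(dimensions):
        if (cur ^ dst) & (1 << bit):  # addresses differ in this dimension
            cur ^= 1 << bit           # traverse the link for this dimension
            path.append(cur)
    return path


print(xy_route((0, 0), (2, 1)))
print(hypercube_route(0b000, 0b101, 3))
```

Because the order of dimensions is fixed, the same source and destination always produce the same path, which is what makes the scheme deterministic.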

== Lines of Research ==
From the NoC perspective, there are many lines of research beyond the abundance of technologies found in commercial designs. Some of them are presented in this section.

=== Optical on-chip interconnects ===
IBM has been performing extensive research on a photonic layer inside the CMP, used not only for connecting several cores but also for routing traffic: [http://researcher.ibm.com/view_project.php?id=2757 Silicon Integrated Nanophotonics.] This technology was actually used in the IBM Cell chip mentioned in the sections above. The main advantages are reliability and power efficiency.

This [http://www.research.ibm.com/photonics/publications/ecoc_tutorial_2008.pdf tutorial] explains some differences between electronics and photonics in terms of power consumption. The more power-efficient the computing is, the more FLOPs per watt it delivers:

{| {{table}}
| align="center" style="background:#f0f0f0;"|'''Electronics'''
| align="center" style="background:#f0f0f0;"|'''Photonics'''
|-
| Electronic network ~500W||Optic network <80W
|-
| power = bandwidth x length||power does not depend on bitrate nor length
|-
| buffer on chip that rx and re-tx every bit at every switch||rx (modulate) data once, without having to re-tx
|-
| ||switching fabric has almost no power dissipation
|}

In academia, articles such as [[#References|[11]]] propose new topologies created for optical on-chip interconnects. The authors refer to previous papers that adapt well-known electronic designs, but highlight the need to provide a "scalable all-optical NoC, referred to as 2D-HERT, with passive routing of optical data streams based on their wavelengths."

=== Reconfigurable NoC ===
Another field of study is software-reconfigurable on-chip networks. They are commonly based on the 2D mesh topology. The main idea is to be able to reconfigure the NoC depending on the application and at run time, in order to react to congestion problems or, more generally, to adapt to the traffic load.

In [[#References|[17]]], the authors propose a design based on the properties of the [http://en.wikipedia.org/wiki/Field-programmable_gate_array field-programmable gate array (FPGA)]. It can dynamically implement circuit-switching channels, perform variations in the topology, and reconfigure routing tables. One of the main drawbacks is the overhead that this reconfiguration introduces, although it is designed to minimize it.

=== Bio NoC ===
Bio NoC or ANoC (Autonomic Network-on-Chip) is based on the concept of the human autonomic nervous system or the human biological immune system. The intention is to provide a NoC with self-organization, self-configuration, and self-healing to dynamically control networking functions.

[[#References|[18]]] presents a collection of chapters/articles from emerging research issues in the ANoC field of application.

== References ==

[1] B. Grot and S. W. Keckler. [http://www.cs.utexas.edu/~bgrot/docs/CMP-MSI_08.pdf Scalable on-chip interconnect topologies.] 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.

[2] Mirza-Aghatabar, M.; Koohi, S.; Hessabi, S.; Pedram, M. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4341445 "An Empirical Investigation of Mesh and Torus NoC Topologies Under Different Routing Algorithms and Traffic Models."] 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007), pp. 19-26, 29-31 Aug. 2007.

[3] Ying Ping Zhang; Taikyeong Jeong; Fei Chen; Haiping Wu; Nitzsche, R.; Gao, G.R. [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1639301 "A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture."] 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 10 pp., 25-29 April 2006.

[4] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal. 2007. [http://www.eecg.toronto.edu/~enright/tilera.pdf On-Chip Interconnection Architecture of the Tile Processor.] IEEE Micro 27, 5 (September 2007), 15-31.

[5] Natalie Enright Jerger and Li-Shiuan Peh. [http://www.morganclaypool.com/doi/abs/10.2200/S00209ED1V01Y200907CAC008?journalCode=cac On-Chip Networks.] Synthesis Lectures on Computer Architecture. 2009, 141 pages. Morgan and Claypool Publishers.

[6] D. N. Jayasimha, B. Zafar, Y. Hoskote. [http://blogs.intel.com/wp-content/mt-content/com/research/terascale/ODI_why-different.pdf On-chip interconnection networks: why they are different and how to compare them.] Technical Report, Intel Corp, 2006

[7] Yan Solihin. (2008). [http://www.cesr.ncsu.edu/solihin/Main.html Fundamentals of parallel computer architecture.] Solihin Pub.

[8] James Balfour and William J. Dally. 2006. [http://www.cs.berkeley.edu.prox.lib.ncsu.edu/~kubitron/courses/cs258-S08/handouts/papers/jbalfour_ICS.pdf Design tradeoffs for tiled CMP on-chip networks.] In Proceedings of the 20th annual international conference on Supercomputing (ICS '06). ACM, New York, NY, USA, 187-198.

[9] John Kim, James Balfour, and William Dally. [http://cva.stanford.edu/publications/2007/MICRO_FBFLY.pdf Flattened butterfly topology for on-chip networks.] In Proceedings of the 40th International Symposium on Microarchitecture, pages 172–182, December 2007.

[10] Dubois, F.; Cano, J.; Coppola, M.; Flich, J.; Petrot, F. [http://www.comcas.eu/publications/Spidergon_STNoC_Design.pdf Spidergon STNoC design flow.] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 267-268, 1-4 May 2011.

[11] Koohi, S.; Abdollahi, M.; Hessabi, S. [http://ieeexplore.ieee.org.prox.lib.ncsu.edu/stamp/stamp.jsp?tp=&arnumber=5948588&isnumber=5948548 All-optical wavelength-routed NoC based on a novel hierarchical topology.] Fifth IEEE/ACM International Symposium on Networks on Chip (NoCS), pp. 97-104, 1-4 May 2011.

[12] Flich, J.; Duato, J. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4407676&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4407676 Logic-Based Distributed Routing for NoCs.] Computer Architecture Letters, vol. 7, no. 1, pp. 13-16, Jan 2008.

[13] Wu, J.; [http://www.cse.fau.edu/~jie/research/publications/Publication_files/ieeetc0309.pdf A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model,] 2003 IEEE Transactions on Computers, Vol. 52, No. 9, pp.1154-1169, Sept 2003

[14] Veselovsky, G.; Batovski, D.A. [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1183584&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1183584 A study of the permutation capability of a binary hypercube under deterministic dimension-order routing.] Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 173-177, 5-7 Feb. 2003.

[15] Moscibroda, T; Mutlu, O.; [http://research.microsoft.com/pubs/80241/isca_2009-bless.pdf A Case for Bufferless Routing in On-Chip Networks,] ACM SIGARCH Computer Architecture News, Volume 37 Issue 3, June 2009

[16] Fallin, C.; Craik, C.; Mutlu, O.; [http://www.ece.cmu.edu/~safari/pubs/chipper_hpca2011.pdf CHIPPER: A Low-complexity Bufferless Deflection Router,] Proceedings of the 17th IEEE International Symposium on High Performance Computer Architecture (HPCA 2011), San Antonio, TX, February 2011.

[17] V. Rana, et al., [http://infoscience.epfl.ch/record/130661/files/paperM2B-VLSI-SoC2008%5b1%5d.pdf A Reconfigurable Network-on-Chip Architecture for Optimal Multi-Processor SoC Communication,] in VLSI-SoC, 2009.

[18] Cong-Vinh, P. (December 2011). [http://www.crcpress.com/product/isbn/9781439829110 Autonomic networking-on-chip: Bio-inspired specification, development, and verification.] CRC Press.

[19] S. Kumar, et al., [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1016885&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1016885 A Network on Chip Architecture and Design Methodology,] VLSI on Annual Symposium, IEEE Computer Society ISVLSI 2002.


==What is Moore's Law?==
[[Image:TransCount59-75.png|right]]
Moore's law, named after Gordon Moore, co-founder of Intel, states that the number of transistors that can be placed on an [http://en.wikipedia.org/wiki/Integrated_circuit integrated circuit] will double approximately every two years<ref>http://www.computerhistory.org/semiconductor/timeline/1965-Moore.html</ref>. The original prediction in 1965 stated a doubling every 12 months, but in 1975, after the introduction of microprocessors that were less dense, he slowed the rate of doubling to its current value of two years<ref>http://arstechnica.com/hardware/news/2008/09/moore.ars</ref>. Instead of giving an empirical formula predicting the rate of increase, Moore used prose, graphs, and images to convey these predictions and observations to the masses. This in some ways increased the staying power of Moore's law, allowing the industry to use it as a measurable benchmark of its success. Virtually all digital devices are in some way fundamentally linked to the growth set in place by Moore's law.<ref>http://en.wikipedia.org/wiki/Moore's_law</ref>

==The lesser known second law==
Also known as Rock's law, this law is a direct consequence of Moore's law: while the cost to produce transistors on a chip may go down, costs instead flow towards manufacturing, testing, and research and development. The law states that the cost of a semiconductor chip fabrication plant doubles every four years. Simply put, in order for Moore's law to hold, Rock's law must also hold.<ref>http://en.wikipedia.org/wiki/Rock%27s_law</ref>

==Moore's law, past to present==
[[Image:Mooreslaw.png|right|thumb|350px|]]
Reviewing data from the inception of Moore's law to the present shows that, consistent with Moore's prediction, the number of transistors on a chip has doubled approximately every 2 years. Several developments contributed to this; had they not occurred, the growth could have slowed or plateaued. Among them are the invention of dynamic random access memory ([http://en.wikipedia.org/wiki/DRAM DRAM]), complementary metal-oxide-semiconductor ([http://en.wikipedia.org/wiki/CMOS CMOS]) technology, and the invention of the integrated circuit itself. Moore's law is responsible not only for larger and faster chips, but also for smaller, cheaper, and more efficient ones.
{| class="wikitable sortable" style="text-align:center"
! Processor
! Transistor count
! Date of introduction
! Manufacturer
! Process
! Area
|-
|Intel 4004
|2,300
|1971
|Intel
|10 µm
|12 mm²
|-
|Intel 8008
|3,500
|1972
|Intel
|10 µm
|14 mm²
|-
|MOS Technology 6502
|3,510
|1975
|MOS Technology
|
|21 mm²
|-
|Motorola 6800
|4,100
|1974
|Motorola
|
|16 mm²
|-
|Intel 8080
|4,500
|1974
|Intel
|6 μm
|20 mm²
|-
|RCA 1802
|5,000
|1974
|RCA
|5 μm
|27 mm²
|-
|Intel 8085
|6,500
|1976
|Intel
|3 μm
|20 mm²
|-
|Zilog Z80
|8,500
|1976
|Zilog
|4 μm
|18 mm²
|-
|Motorola 6809
|9,000
|1978
|Motorola
|5 μm
|21 mm²
|-
|Intel 8086
|29,000
|1978
|Intel
|3 μm
|33 mm²
|-
|Intel 8088
|29,000
|1979
|Intel
|3 μm
|33 mm²
|-
|Intel 80186
|55,000
|1982
|Intel
|
|-
|Motorola 68000
|68,000
|1979
|Motorola
|4 μm
|44 mm²
|-
|Intel 80286
|134,000
|1982
|Intel
|1.5 µm
|49 mm²
|-
|Intel 80386
|275,000
|1985
|Intel
|1.5 µm
|104 mm²
|-
|Intel 80486
|1,180,000
|1989
|Intel
|1 µm
|160 mm²
|-
|Pentium
|3,100,000
|1993
|Intel
|0.8 µm
|294 mm²
|-
|AMD K5
|4,300,000
|1996
|AMD
|0.5 µm
|-
|Pentium II
|7,500,000
|1997
|Intel
|0.35 µm
|195 mm²
|-
|AMD K6
|8,800,000
|1997
|AMD
|0.35 µm
|-
|Pentium III
|9,500,000
|1999
|Intel
|0.25 µm
|-
|AMD K6-III
|21,300,000
|1999
|AMD
|0.25 µm
|-
|AMD K7
|22,000,000
|1999
|AMD
|0.25 µm
|-
|Pentium 4
|42,000,000
|2000
|Intel
|180 nm
|-
|Atom
|47,000,000
|2008
|Intel
|45 nm
|-
|Barton
|54,300,000
|2003
|AMD
|130 nm
|-
|AMD K8
|105,900,000
|2003
|AMD
|130 nm
|-
|Itanium 2
|220,000,000
|2003
|Intel
|130 nm
|-
|Cell
|241,000,000
|2006
|Sony/IBM/Toshiba
|90 nm
|-
|Core 2 Duo
|291,000,000
|2006
|Intel
|65 nm
|-
|AMD K10
|463,000,000
|2007
|AMD
|65 nm
|-
|AMD K10
|758,000,000
|2008
|AMD
|45 nm
|-
|Itanium 2 with 9MB cache
|592,000,000
|2004
|Intel
|130 nm
|-
|Core i7 (Quad)
|731,000,000
|2008
|Intel
|45 nm
|263 mm²
|-
|Six-Core Xeon 7400
|1,900,000,000
|2008
|Intel
|45 nm
|-
|POWER6
|789,000,000
|2007
|IBM
|65 nm
|341 mm²
|-
|Six-Core Opteron 2400
|904,000,000
|2009
|AMD
|45 nm
|346 mm²
|-
|16-Core SPARC T3
|1,000,000,000
|2010
|Sun/Oracle
|40 nm
|377 mm²
|-
|Six-Core Core i7 (Gulftown)
|1,170,000,000
|2010
|Intel
|32 nm
|240 mm²
|-
|8-core POWER7
|1,200,000,000
|2010
|IBM
|45 nm
|567 mm²
|-
|Quad-core z196
|1,400,000,000
|2010
|IBM
|45 nm
|512 mm²
|-
|Dual-Core Itanium 2
|1,700,000,000
|2006
|Intel
|90 nm
|596 mm²

|-
|Quad-Core Itanium Tukwila
|2,000,000,000
|2010
|Intel
|65 nm
|699 mm²
|-
|Six-Core Core i7 (Sandy Bridge-E)
|2,270,000,000
|2011
|Intel
|32 nm
|434 mm²
|-
|8-Core Xeon Nehalem-EX
|2,300,000,000
|2010
|Intel
|45 nm
|684 mm²
|-
|10-Core Xeon Westmere-EX
|2,600,000,000
|2011
|Intel
|32 nm
|512 mm²
|-
|}
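
The doubling rule can be checked against the first and last rows of the table above. The following Python sketch projects the transistor count of the Intel 4004 (2,300 transistors in 1971) forward to 2011 under an idealized doubling every two years; the function name and the use of a fixed doubling period are assumptions made for illustration.

```python
def projected_transistors(base_count, base_year, year, doubling_period_years=2.0):
    """Idealized Moore's-law projection: the count doubles every fixed period."""
    doublings = (year - base_year) / doubling_period_years
    return base_count * 2 ** doublings


# Start from the Intel 4004 row of the table and project to 2011.
projection = projected_transistors(2300, 1971, 2011)

# Count listed in the table for the 10-Core Xeon Westmere-EX (2011).
listed_2011 = 2_600_000_000

print(f"projected: {projection:,.0f}")
print(f"listed:    {listed_2011:,}")
print(f"listed / projected = {listed_2011 / projection:.2f}")
```

Forty years correspond to twenty doublings, so the projection is 2,300 × 2<sup>20</sup>, which lands within about ten percent of the count listed for 2011.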

==A quick primer on current manufacturing techniques==

At the heart of Moore's Law is the transistor. Computer chips contain hundreds of millions of transistors on a silicon wafer. To make these chips, first a “stencil” is made containing the outlines of millions of transistors. This is placed over a silicon wafer containing many layers of silicon, which is sensitive to light. Ultraviolet light is then focused on the stencil; it passes through the gaps of the stencil and exposes the silicon wafer. Then the wafer is bathed in acid, carving the outlines of the circuits and creating the intricate design of millions of transistors. Since the wafer consists of many conducting and semiconducting layers, the acid cuts into the wafer at different depths and patterns, so one can create circuits of enormous complexity.

One reason Moore's law has relentlessly increased the power of chips is that UV light can be tuned to ever smaller wavelengths, making it possible to etch increasingly tiny transistors onto silicon wafers. Since UV light has a wavelength as small as 10 nanometers (a nanometer is a billionth of a meter), the smallest transistor that can be etched this way is about thirty atoms across.
But this process cannot go on forever. At some point it becomes physically impossible to etch ever smaller transistors in this way: Moore's law will finally collapse when transistor sizes approach the size of individual atoms.
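
The "thirty atoms across" figure can be reproduced with a one-line calculation. The sketch below assumes an effective atomic spacing of about 0.33 nm; that value is not given in the text and is chosen here only because it matches the figure quoted above. The function name <code>atoms_across</code> is likewise hypothetical.

```python
# Assumed effective width of one atom in nanometers (illustrative value,
# picked so that a 10 nm feature comes out at roughly thirty atoms).
ATOM_SPACING_NM = 0.33


def atoms_across(feature_nm, spacing_nm=ATOM_SPACING_NM):
    """Return roughly how many atoms span a feature of the given width."""
    return round(feature_nm / spacing_nm)


if __name__ == "__main__":
    # Process nodes from the table above, plus the 10 nm wavelength limit.
    for feature_nm in (90, 45, 32, 10):
        print(f"{feature_nm} nm is roughly {atoms_across(feature_nm)} atoms across")
```

Under this assumption the count shrinks in direct proportion to the feature size, which is why each process shrink brings the atomic limit noticeably closer.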

Current estimates predict that around 2020 or soon afterward, Moore's law will gradually cease to hold. Transistors will be so small that quantum effects take over and electrons leak out of the wires. For example, the thinnest layer inside a computer chip will be about five atoms across. At that size, quantum effects become dominant, and transistors will not function as they currently do without further technological advances.

==Beyond Moore's Law==

There are a few new technologies that have the potential to change the underlying architecture of processors and extend performance gains past the theoretical limits of traditional transistors.

===[http://en.wikipedia.org/wiki/Memristor The Memristor]===

Currently being developed by Hewlett Packard, the memristor is a new type of circuit element (not a transistor in the conventional sense) whose behavior links electrical charge and magnetic flux. As current flows in one direction through the device, its resistance increases. Reversing the flow decreases the resistance, and stopping the flow leaves the resistance at its last value. This behavior allows for both data storage and data processing (logic gate construction). It is postulated that memristors could be layered in three dimensions on silicon, yielding data and transistor densities up to 1000 times greater than currently available. HP has reported the ability to fit 100GB in a square centimeter<ref>http://www.eetimes.com/electronics-news/4076910/-Missing-link-memristor-created-Rewrite-the-textbooks</ref>, and with the ability to layer memristors, this could lead to pocket devices with a capacity of over 1 [http://en.wikipedia.org/wiki/Petabyte petabyte].
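
The resistance behavior described above (rising with current in one direction, falling when the current is reversed, and holding when the current stops) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class <code>ToyMemristor</code>, its linear drift rule, and every numeric value are hypothetical and are not taken from HP's device.

```python
class ToyMemristor:
    """Toy model: resistance drifts in proportion to the charge passed through.

    All parameter values are illustrative assumptions, not device data.
    """

    def __init__(self, r_min=100.0, r_max=16000.0, r_start=8000.0,
                 ohms_per_coulomb=1.0e6):
        self.r_min = r_min                        # lowest reachable resistance (ohms)
        self.r_max = r_max                        # highest reachable resistance (ohms)
        self.resistance = r_start                 # current state (ohms)
        self.ohms_per_coulomb = ohms_per_coulomb  # assumed linear drift rate

    def apply_current(self, current_amps, seconds):
        """Pass a current for some time and return the new resistance.

        Positive current raises the resistance, negative current lowers it,
        and zero current leaves the state unchanged.
        """
        charge = current_amps * seconds
        new_r = self.resistance + self.ohms_per_coulomb * charge
        # Clamp to the physical range of the (hypothetical) device.
        self.resistance = min(self.r_max, max(self.r_min, new_r))
        return self.resistance


if __name__ == "__main__":
    m = ToyMemristor()
    print(m.apply_current(1e-3, 2.0))    # forward current: resistance rises
    print(m.apply_current(-1e-3, 1.0))   # reversed current: resistance falls
    print(m.apply_current(0.0, 5.0))     # no current: state is retained
```

The retained state after the current stops is what makes the device usable as memory: the stored value is read back as a resistance rather than as a held charge.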

One advanced theoretical capability of memristors is the ability to hold more than two distinct states, which could lead to analog computing. Memristor technology may also provide an excellent architecture for synaptic modeling and self-learning systems.

===[http://en.wikipedia.org/wiki/Quantum_computer Quantum Computing]===

Quantum computing works by allowing bits to enter a superposition of states. Each quantum bit ("qubit") can be entangled with other qubits, so that a group of qubits represents multiple states at once. Using quantum logic gates, the qubits can be manipulated to find the desired state among the superposition of states. This has great potential for drastically shortening the time necessary to solve several important problems, including integer factorization and the discrete logarithm problem, upon which much current encryption is based. Quantum computing faces several technical issues, including [http://en.wikipedia.org/wiki/Quantum_decoherence decoherence], which makes quantum computers difficult to construct and maintain.
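
A minimal classical simulation can make the idea of superposition more concrete. The sketch below represents a single qubit as a pair of amplitudes and applies a Hadamard gate, a standard quantum logic gate that turns a definite 0 into an equal superposition of 0 and 1. The helper names (<code>hadamard</code>, <code>probabilities</code>, <code>state_size</code>) are chosen for this illustration; the code does not model entanglement or decoherence.

```python
import math


def hadamard(state):
    """Apply a Hadamard gate to a single-qubit state (amp0, amp1)."""
    amp0, amp1 = state
    s = 1 / math.sqrt(2)
    return (s * (amp0 + amp1), s * (amp0 - amp1))


def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return tuple(abs(amp) ** 2 for amp in state)


def state_size(n_qubits):
    """Number of amplitudes needed to describe n qubits classically."""
    return 2 ** n_qubits


if __name__ == "__main__":
    zero = (1.0, 0.0)                 # qubit definitely in state 0
    superposed = hadamard(zero)       # equal superposition of 0 and 1
    print(probabilities(superposed))  # both probabilities are close to 0.5
    print(state_size(30))             # amplitudes needed for 30 qubits
```

The exponential growth reported by <code>state_size</code> is the reason a classical simulation like this one does not scale, and it hints at where the potential advantage of quantum hardware comes from.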

===[http://en.wikipedia.org/wiki/Ballistic_transistor Ballistic Deflection Transistors]===

Another promising avenue is a redesign of the traditional transistor. Essentially, single electrons are passed through a transistor and deflected into one path or the other, thus delivering a 0 or a 1. The theoretical speed of these transistors is in the terahertz range.<ref>http://www.rochester.edu/news/show.php?id=2585</ref>

===Other Technologies===

The arena of research to produce an alternative to the traditional transistor includes many novel approaches. They include (but are not limited to):

*[http://en.wikipedia.org/wiki/Optical_computer Optical Computing]
*[http://en.wikipedia.org/wiki/DNA_computing DNA Computing]
*[http://en.wikipedia.org/wiki/Molecular_electronics Molecular Electronics]
*[http://researchnews.osu.edu/archive/hybridspin.htm Spintronics]
*[http://en.wikipedia.org/wiki/Chemical_computer Chemical Computing]
*[http://en.wikipedia.org/wiki/Artificial_neural_network Artificial Neural Networks]
*[http://en.wikipedia.org/wiki/Unconventional_computing Unconventional Computing]

==Conclusions==

The demise of Moore's Law has been predicted several times during the past 40 years, but transistor counts continue to follow a two-year doubling on average. With the traditional transistor approach, inevitable physical limits will be reached around the 16 nm process, due to [http://en.wikipedia.org/wiki/Quantum_tunnelling quantum tunneling]<ref>http://news.cnet.com/2100-1008-5112061.html</ref>. If this is true, the current pace of innovation would lead to hitting "Moore's Wall" around 2022, or in about 10 years. This "10-year horizon" for Moore's Law has existed since the early 1990s, as new designs, processes, and breakthroughs continue to extend the timeline.<ref>http://arxiv.org/pdf/astro-ph/0404510v2.pdf</ref><ref>http://java.sys-con.com/node/557154</ref>
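
The two-year doubling can be written as a simple extrapolation. The sketch below starts from the 10-Core Xeon Westmere-EX entry in the table above (2,600,000,000 transistors in 2011) and projects forward. The function name <code>projected_transistors</code> is hypothetical, and the output is a naive extrapolation of the trend, not a prediction for any real chip.

```python
def projected_transistors(base_count, base_year, target_year, doubling_years=2.0):
    """Extrapolate a transistor count assuming a fixed doubling period."""
    elapsed = target_year - base_year
    return base_count * 2 ** (elapsed / doubling_years)


if __name__ == "__main__":
    base_count = 2_600_000_000   # 10-Core Xeon Westmere-EX, from the table
    base_year = 2011
    for year in (2013, 2017, 2022):
        count = projected_transistors(base_count, base_year, year)
        print(f"{year}: about {count:.2e} transistors")
```

Changing <code>doubling_years</code> shows how sensitive the projection is to the assumed pace: a slightly longer doubling period gives a much lower count at the "Moore's Wall" date of 2022.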

==References==
<references/>

CSC/ECE 506 Spring 2012/1b as

2012-02-01T15:45:13Z