Cache Sizes in Multi-Core Architectures

Multi-Core Processors

A multi-core processor has more than one execution core packaged on a single die (a single integrated circuit). A dual-core processor contains two cores in one socket, whereas a quad-core processor contains four. Communication between cores in the same socket is much faster than in a multi-processor system built from single-core chips. Single-core processors are reaching their physical limits of speed and complexity because of heat dissipation and data-synchronization problems, which is the main motivation for multi-core designs.

Communication between multiple cores on the same die is also much faster than between single-core CPUs on separate dies, which benefits cache coherence in multiprocessing: packaging the cores on one die allows the cache-coherency circuitry to operate at a much higher clock rate than communication with off-chip processors would permit.
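What that coherency circuitry has to track for each cache line is defined by the coherence protocol (MESI on the Intel parts and MOESI on the AMD parts in the table below; MOESI adds an Owned state so a Modified line can be shared without first being written back). As a rough illustration only, and not any vendor's actual implementation, this sketch shows how a line's MESI state changes on local and snooped accesses:

```python
# Minimal sketch of the MESI states tracked per cache line (illustrative model only).

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def local_write(state):
    # Writing a line requires exclusive ownership: the writer's copy becomes
    # Modified, and a snoop/invalidate message makes every other copy Invalid.
    return MODIFIED

def snooped_remote_write(state):
    # Another core took ownership of the line, so our copy is no longer valid.
    return INVALID

def snooped_remote_read(state):
    # Another core read the line: a Modified or Exclusive copy is downgraded to Shared.
    return SHARED if state in (MODIFIED, EXCLUSIVE) else state
```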

Cache Organization in Multi-Core Processors

A cache is a small, high-speed memory, usually static RAM (SRAM), that holds the most recently accessed pieces of main memory; its basic purpose is to minimize the latency of access to frequently used data. Cache organization deals with the number of levels in the cache hierarchy and with the size, associativity, latency, and bandwidth parameters at each level. Cache policies determine how blocks are accessed, allocated, and evicted so that the on-chip cache resources are used effectively.
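As a quick illustration of how these parameters interact, the sketch below (not from the original article; the 48-bit address width is an assumption, and the example sizes are borrowed from the table later in this page) derives the number of sets and the address-bit breakdown of a set-associative cache from its size, line size, and associativity:

```python
# Sketch: derive basic cache geometry from size, line size, and associativity.

def cache_geometry(size_bytes, line_bytes, ways, addr_bits=48):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache."""
    lines = size_bytes // line_bytes           # total number of cache lines
    sets = lines // ways                       # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1  # bits selecting a byte within a line
    index_bits = sets.bit_length() - 1         # bits selecting a set
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

# 32 KB, 64-byte lines, 8-way (Core 2 Duo L1 data cache from the table below)
print(cache_geometry(32 * 1024, 64, 8))          # -> (64, 6, 6, 36)

# 4 MB, 64-byte lines, 16-way (Core 2 Duo shared L2 from the table below)
print(cache_geometry(4 * 1024 * 1024, 64, 16))   # -> (4096, 6, 12, 30)
```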


In a single-core cache hierarchy, we minimize latency by moving cache blocks closer and closer to the core through the levels of the hierarchy. The same approach applies to multi-core cache hierarchies, but additional design questions arise: some levels of cache can be private to a core while others are shared, so we have to decide whether cores share a given level of the hierarchy, and whether a level is implemented as a single physical block or as multiple physically distributed banks with non-uniform access latency to each bank.


In recent multi-core processors, the first one or two cache levels are private to each core. However, whether a level should be private to a core or shared among cores can depend on the kind of parallelism being exploited. As the figure below depicts, under data-level parallelism both the first- and second-level caches are private to each core, whereas under thread-level parallelism the second level is shared among the cores.


[Figure: private vs. shared cache levels under data-level parallelism (DLP) and thread-level parallelism (TLP)]


Cache Table

The following table shows the caches used in recent multi-core architectures, together with the parameters by which cache performance can be measured: cache size at each level, line size at each level, latency of each level, associativity of each level, whether each level is private or shared, and the coherence protocol used. The latency of each level is cumulative, i.e. it includes the latency cycles of the previous levels (a worked example follows the table).


| Processor                          | Level | Cache Size                      | Shared | Line Size | Latency (cycles) | Associativity | Coherence Protocol |
|------------------------------------|-------|---------------------------------|--------|-----------|------------------|---------------|--------------------|
| Quad-Core AMD Opteron 2000 Series  | L1    | 64 KB data + 64 KB instruction  | No     | 64 bytes  | N/A              | 2-way         | MOESI              |
| Quad-Core AMD Opteron 2000 Series  | L2    | 512 KB per core                 | No     | 64 bytes  | N/A              | 16-way        | MOESI              |
| Quad-Core AMD Opteron 2000 Series  | L3    | 2 MB                            | Yes    | 64 bytes  | N/A              | 32-way        | MOESI              |
| Intel Core 2 Duo                   | L1    | 32 KB data + 32 KB instruction  | No     | 64 bytes  | 3                | 8-way         | MESI               |
| Intel Core 2 Duo                   | L2    | 4 MB                            | Yes    | 64 bytes  | 14               | 16-way        | MESI               |
| Intel Core Duo                     | L1    | 32 KB                           | No     | 64 bytes  | 3                | 8-way         | MESI               |
| Intel Core Duo                     | L2    | 2 MB                            | Yes    | 64 bytes  | 14               | 8-way         | MESI               |
| AMD Athlon 64                      | L1    | 64 KB                           | No     | 64 bytes  | 3                | 2-way         | MOESI              |
| AMD Athlon 64                      | L2    | 512 KB x 2                      | Yes    | 64 bytes  | 20               | 16-way        | MOESI              |
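To make the latency column concrete, here is a small worked sketch of average memory access time using the cumulative Core 2 Duo latencies from the table. The hit rates and the main-memory latency are illustrative assumptions, not measured values:

```python
# Sketch: average memory access time (AMAT) from the cumulative latencies in the table.
# Intel Core 2 Duo: L1 = 3 cycles, L2 = 14 cycles (the 14 already includes the L1 cycles).
# Hit rates and the 200-cycle memory latency are assumptions for illustration only.

def amat(l1_latency, l2_latency, mem_latency, l1_hit, l2_hit):
    """Expected access latency when each level's latency already includes the levels above it."""
    return (l1_hit * l1_latency
            + (1 - l1_hit) * l2_hit * l2_latency
            + (1 - l1_hit) * (1 - l2_hit) * mem_latency)

# Assume 95% of accesses hit in L1 and 90% of L1 misses hit in L2.
print(round(amat(3, 14, 200, 0.95, 0.90), 1))  # -> 4.5 cycles on average
```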

Dual Processor vs. Dual Core

To compare the performance of these two multiprocessing architectures, we can look at the following selection of benchmarks. First are the results of an AMD dual-core system versus an AMD dual-processor system.

[Benchmark charts: AMD dual-core system vs. AMD dual-processor system (source: http://www.pugetsystems.com/pic_disp.php?id=7888)]

An Intel performance comparison can also be seen below.

[Benchmark charts: Intel dual-core vs. dual-processor performance comparison]

Both AMD and Intel have launched many single-core multiprocessors in the past and spent considerable effort improving performance by shrinking individual gates, but the physical limits of semiconductor-based microelectronics became a major design concern. Multiple cores on a single die gave them a new way to improve processing power and to go beyond the physical limits of single-core processors.


References:

http://www.intel.com/technology/itj/2007/v11i3/1-integration/5-cache-heirarchy.htm

http://en.wikipedia.org/wiki/Multi-core_(computing)

http://www.intel.com/performance/desktop/digoffice/index.htm

http://pages.cs.wisc.edu/~isca2005/papers/07A-01.PDF

http://www.pugetsystems.com/articles.php?id=23
