CSC/ECE 506 Fall 2007/wiki2 5 as


Cache size and characteristics of multi-core processors

The following tables summarize the caches used in current multicore processors.

Create a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used. Compare this with two or three recent single-core designs.


AMD

  • Opteron (server/workstation)

Dual core, introduced on 4/22/2005

L1: 64K data + 64K instruction per core; 2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency; 2-way associative instruction cache

L2: 1 MB per core, 16-way associative

  • Athlon64 X2 family

Dual core, introduced on 5/13/2005

L1: 64K data + 64K instruction per core; 2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency; 2-way associative instruction cache

L2: up to 1 MB per core, 16-way associative

  • Athlon64 FX

High-performance desktop

L1: 64K data + 64K instruction per core; 2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency; 2-way associative instruction cache

L2: up to 1 MB per core, 16-way associative

  • Turion64 X2

For laptops

L1: 256KB total (128K per core)

L2: 1MB total (512K per core)

ARM

  • MPCore

Container for ARM9 & ARM11 cores; high-performance embedded and entertainment

Broadcom

  • SiByte SB1250

Two scalable MIPS cores, MESI coherence protocol

L1: 32K data + 32K instruction, 32-byte cache line, L1-to-L1 latency: 28~36 cycles

L2: 512K shared, ECC, 4-way associative, 32-byte cache line
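The size, associativity and line size listed for the SB1250 L2 determine how an address is split into offset and index bits. The following is a minimal sketch of that arithmetic, assuming "512K" means 512 * 1024 bytes and that the set count is a power of two; the function name `cache_geometry` is made up for this illustration.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes the line size and the resulting set count are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits


# SB1250 L2 as listed above: 512K, 4-way associative, 32-byte line
print(cache_geometry(512 * 1024, 4, 32))  # (4096, 5, 12)
```

The same function can be applied to any other entry on this page that gives size, associativity and line size.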

  • SB1255

L1: 32K data + 32K instruction, 32-byte cache line, L1-to-L1 latency: 28~36 cycles

L2: 512K shared, ECC, 4-way associative, 32-byte cache line

  • SB1455

L1: 32K data + 32K instruction

L2: 1MB shared, 8-way associative, ECC protected

Cradle Technology

Multi-core DSPs (CT3400, CT3600)

  • CT3400

8 32-bit DSPs, 6 RISC-like CPUs, 32K instruction cache, 64K local data memory

  • CT3600

2 quad DSPs, 8 DSPs per quad, 32KB instruction cache per quad, 125K data memory per quad

Cavium Networks

  • Octeon

16 MIPS cores

CN38XX/CN36XX: 4 to 16 MIPS64 cores

L1: 32K instruction cache, 8K data cache, 2K write buffer per MIPS core

L2: 1M ECC-protected shared (CN38XX), 512K (CN36XX)

IBM

  • Cell

Used in the PlayStation 3. PowerPC based, 8 cores optimized for vector operations

  • Power4

First dual core, 2000

L1: 64K instruction cache per CPU (128-byte line, direct mapped, LRU)

    32K data cache per CPU (2-way, 128-byte line, LRU)

L2: 1440K, 8-way, 128-byte line

L3: 128M, 8-way, 512-byte line

  • Power4+

L1: 32K data (2-way set associative), 64K instruction (direct mapped)

L2: 3 x 0.5M shared by the two cores, 40B/cycle per port

L3: 32M

  • Power5

Dual core

L1: 64K instruction (2-way, LRU), 32K data (4-way, LRU)

L2: three 0.625M slices, 10-way, LRU, 128-byte line

L3: 36M, 12-way
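The Power5 L2 is listed as three slices rather than one array, so both the total capacity and the per-slice set count follow from simple arithmetic. This is a small sketch of that calculation, assuming "0.625M" means 0.625 * 1024 KB (640 KB) per slice; the constant names are chosen only for this example.

```python
SLICE_BYTES = 640 * 1024   # assumption: 0.625M read as 640 KB per slice
SLICES = 3                 # "three 0.625M" as listed above
WAYS = 10
LINE_BYTES = 128

# Total L2 capacity across all slices, in KB
total_l2_kb = SLICES * SLICE_BYTES // 1024

# Sets in one slice: slice size / (associativity * line size)
sets_per_slice = SLICE_BYTES // (WAYS * LINE_BYTES)

print(total_l2_kb, sets_per_slice)  # 1920 512
```

Under these assumptions the three slices add up to 1920 KB (1.875 MB), and each slice holds 512 sets.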

  • PowerPC 970MP

Dual core, used in the Apple PowerMac

L1: 32K data (2-way associative with parity protection), 64K instruction (direct mapped)

L2: 1M per core, ECC

Cache coherency: snooping protocol

Intel

  • Core2 Quad
  • Xeon Quad core

12/13/2006

  • Xeon 5000

L1: 16KB data cache per core + 12K µops trace cache per core

L2: 2MB per core

  • Xeon MP7000

L1: 16KB data cache per core + 12K µops trace cache per core

L2: 1MB per core or 2x2MB per core

  • Xeon MP7300

L1: 32K data per core, 32K instruction per core

L2: up to 8MB, snoop filter, 4M shared L2 per die

Core Duo

  • Core2 Duo

Xeon (x1xx) Dual core

  • Xeon 5100

L1: 32KB data + 32KB instruction per core

L2: 4MB shared

  • Xeon 7100

L1: 16K data

L2: 2M, 8-way, ECC

L3: up to 16M, 16-way, ECC

  • Itanium 2

Multi-core (Montecito)

L1: 32KB

L2: 256KB

L3: up to 9MB

PA-RISC

  • PA8800

L1: 1.5M data (4-way set associative), 1.5M instruction (4-way set associative), 2-cycle latency

L2: 32M

Stream Processors

  • Storm-1 family

40 to 80 ALUs

Models: SP16HP-G220, SP16-G160, SP8-G80

L1: 32K data, 16K instruction; 96K VLIW instruction memory

Sun Microsystems

  • UltraSPARC IV

Dual core

L1: 64K data, 32K instruction

L2: up to 16MB external, 8M per core, 2-way set associative, LRU replacement policy, ECC. The cache line size was changed from 512 to 128 bytes to reduce the data contention associated with the sub-blocked cache.

Write cache: 2K, hash-indexed

Prefetch cache: 2K
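The effect of shrinking the line size from 512 to 128 bytes can be illustrated by checking whether two nearby addresses fall into the same cache line. This is only a sketch of the general line-size effect, not of the UltraSPARC IV sub-blocking mechanism itself; the addresses and the helper `line_number` are hypothetical.

```python
def line_number(addr, line_bytes):
    """Return the index of the cache line that contains addr."""
    return addr // line_bytes


# Two hypothetical addresses 128 bytes apart, e.g. data touched by two cores
a, b = 0x1000, 0x1080

same_line_512 = line_number(a, 512) == line_number(b, 512)
same_line_128 = line_number(a, 128) == line_number(b, 128)

print(same_line_512, same_line_128)  # True False
```

With 512-byte lines both addresses land in one line and can contend for it; with 128-byte lines they land in different lines.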

  • UltraSPARC IV+

L2: 2M on-chip

L3: 32M external

  • UltraSPARC T1

8 cores

L1: 16K instruction cache per core, 4-way set associative, parity protected; 8K data cache per core, 4-way set associative, parity protected

L2: 3M, 12-way, 4 banks, ECC
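One way to compare the shared-L2 designs listed above is to divide the shared capacity by the number of cores. The sketch below does this for three entries from this page. The class `SharedL2` and its field names are invented for the example, sizes are taken as binary KB, and treating the UltraSPARC T1 L2 as shared by all 8 cores is an assumption not stated in the entry itself.

```python
from dataclasses import dataclass


@dataclass
class SharedL2:
    name: str
    cores: int
    l2_kb: int  # total shared L2 capacity in KB


entries = [
    SharedL2("SiByte SB1250", 2, 512),        # two cores, 512K shared L2
    SharedL2("Xeon 5100", 2, 4 * 1024),       # dual core, 4MB shared L2
    SharedL2("UltraSPARC T1", 8, 3 * 1024),   # assumption: 3M shared by 8 cores
]


def l2_per_core_kb(entry):
    """Shared L2 capacity divided evenly across cores, in KB."""
    return entry.l2_kb / entry.cores


for e in entries:
    print(f"{e.name:15s} {e.cores} cores  {l2_per_core_kb(e):6.0f} KB L2 per core")
```

The per-core figure is only a rough comparison aid, since a shared cache does not partition its capacity evenly at run time.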

Cache size and characteristics of single-core processors

AMD

  • Athlon 64


Conclusion

References

[1] http://www.amd.com/us-en/Processors/ProductInformation

[2] http://www.broadcom.com/products/Enterprise-Networking/Communications-Processors/BCM1250

[3] http://www-01.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_970MP_Microprocessor

[4] http://www.intel.com/products/processor/xeon7000/documentation.htm?iid=products_xeon7000+tab_techdocs#datasheets

[5] http://www.sun.com/processors/UltraSPARC-IV/

[6] http://www.sun.com/processors/UltraSPARC-IV+/

[7] http://www.sun.com/processors/UltraSPARC-T1/specs.xml

[8] http://www.streamprocessors.com/streamprocessors/Home/Products/Storm-1Family.html

[9] http://www.netlib.org/utk/papers/advanced-computers/pa-risc.html

[10] http://www.netlib.org/utk/papers/advanced-computers/power4.html

[11] http://www.netlib.org/utk/papers/advanced-computers/power5.html