CSC/ECE 506 Fall 2007/wiki2 5 as: Difference between revisions

== Cache size and characteristics of multi-core processors ==


Wiki: Cache sizes in multicore architectures
Create a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used. Compare this with two or three recent single-core designs.
AMD Opteron Server/workstation
2 cores, 4/22/2005
L1: 64KB (Data) + 64KB (Instruction) per core
2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency
2-way associative instruction cache
L2: 1 MB per core, 16-way associative
Athlon 64 X2 family 2 cores
5/13/2005
L1: 64KB (Data) + 64KB (Instruction) per core
2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency
2-way associative instruction cache
L2: up to 1 MB per core, 16-way associative
Athlon 64 FX High performance desktop
L1: 64KB (Data) + 64KB (Instruction) per core
2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency
2-way associative instruction cache
L2: up to 1 MB per core, 16-way associative
Athlon64 ### L1: 64K Data + 64K Instr.
2-way associative data cache, two 64-bit operations per cycle, 3-cycle latency
2-way associative instruction cache
L2: 512K, 16-way associative
Turion 64 X2 Laptop
L1: Total 256KB (128K per core)
L2: Total 1MB (512K per core)
ARM MPCore container for ARM9 & ARM11 High-performance embedded and entertainment
Broadcom SiByte SB1250
Two scalable MIPS cores, MESI coherence protocol
L1: 32K data + 32K instruction
Cache line (block) size: 32 bytes
L1-to-L1 latency: 28 to 36 cycles
L2: 512K shared, ECC, 4-way associative
32-byte cache line
SB1255
L1: 32K data + 32K instruction
Cache line (block) size: 32 bytes
L1-to-L1 latency: 28 to 36 cycles
L2: 512K shared, ECC, 4-way associative
32-byte cache line
SB1455
L1: 32K data + 32K instruction
L2: 1MB shared, 8-way associative, ECC protected
Cradle Technology CT3400
CT3600 Multi-core DSP
CT3400
Eight 32-bit DSPs, six RISC-like CPUs
32K instruction cache, 64K local data memory
CT3600
2 quad DSPs, 8 DSPs per quad, 32KB instruction cache per quad, 125K data memory per quad
Cavium Networks Octeon 16 MIPS cores
CN38XX/CN36XX 4 to 16 MIPS64 cores
1M ECC protected shared L2 (CN38XX), 512K (CN36XX)
32K instruction cache / 8K data cache / 2K write buffer per MIPS core
IBM Cell In the PlayStation 3
PowerPC based
8 cores optimized for vector operations
Power4 1st dual core
2000
Power4
L1: 64K per CPU instruction cache (128-byte line, direct-mapped, LRU)
    32K per CPU data cache (2-way, 128-byte line, LRU)
L2: 1440K, 8-way, 128-byte line
L3: 128M, 8-way, 512-byte line
Power4+
L1: data 32K (2-way set associative), instruction 64K (direct-mapped)
L2: 3 x 0.5M shared by dual core, 40B/cycle per port
L3: 32M
Power5 Dual core
L1: I-64K (2-way, LRU), D-32K (4-way, LRU)
L2: three 0.625M, 10-way, LRU, 128-byte line
L3: 36M, 12-way
PowerPC 970MP Dual
In the Apple PowerMac
L1: 32K data (2-way associative with parity protection), 64K instruction (direct-mapped)
L2: 1M per core, ECC
Cache-coherency snooping protocol
Intel Core2 Quad
Xeon Quad core
12/13/2006
Xeon 5000
L1: 16KB (data cache per core) + 12K µops (trace cache per core)
L2: 2MB per core
Xeon MP7000
L1: 16KB (data cache per core) + 12K µops (trace cache per core)
L2: 1MB per core or 2x2MB per core
Xeon MP7300
L1: 32K data per core, 32K instruction per core
L2: up to 8MB, snoop filter, 4M shared L2 per die
Core Duo
Core2 Duo
Xeon (x1xx) Dual core
Xeon 5100
L1: 32KB (Data) + 32KB (Instruction) per core
L2: 4MB (Shared)
Xeon 7100
L1: 16K data
L2: 2M, 8-way, ECC
L3: up to 16M, 16-way, ECC
Itanium 2 Multi-core
Montecito
Itanium2
L1: 32KB
L2: 256KB
L3: up to 9MB
PA-RISC PA8800 PA8800
L1: 1.5M data, 4-way set associative; 1.5M instruction, 4-way set associative; 2 cycles
L2: 32M
Stream Processor Storm-1 family 40 to 80 ALUs
SP16HP-G220, SP16-G160, SP8-G80
L1: 32K data, 16K instruction
96K VLIW instruction memory
Sun Microsystems UltraSPARC IV
UltraSPARC IV+ UltraSPARC IV
dual
L1: 64K data, 32K instruction
L2: up to 16MB, external, 8M 2-way set associative per core
Cache line size changed from 512 to 128 bytes to reduce data contention associated with the sub-blocked cache; LRU replacement policy; ECC
Write cache: hash-indexed, 2K
Prefetch cache: 2K
UltraSPARC IV+
L2: 2M on-chip
L3: 32M external
UltraSPARC T1 8 cores
L1: 16K instruction cache per core, 4-way set associative, parity protected
8K data cache, 4-way set associative, parity protected
L2: 3M, 12-way, 4 banks, ECC
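
The size, associativity, and line-size figures listed above together fix the number of sets in each cache: sets = size / (ways × line size). The following is a minimal sketch of that arithmetic for a few entries that give all three parameters. The `CacheLevel` class and its field names are illustrative and not part of the original page; the sketch assumes "K" means 1024 bytes and treats each listed capacity as a single array, which may not match how the hardware is actually banked.

```python
from dataclasses import dataclass

KIB = 1024  # assumption: "K" in the table means 1024 bytes


@dataclass(frozen=True)
class CacheLevel:
    """Hypothetical record for one cache level taken from the table."""
    name: str
    size_bytes: int
    ways: int        # associativity (1 would mean direct-mapped)
    line_bytes: int  # cache line (block) size

    def num_sets(self) -> int:
        # sets = capacity / (associativity * line size)
        sets, remainder = divmod(self.size_bytes, self.ways * self.line_bytes)
        if remainder:
            raise ValueError(f"{self.name}: size is not a whole number of sets")
        return sets


# Entries copied from the table; only rows listing size, ways, and line size.
EXAMPLES = [
    CacheLevel("SB1250 L2", 512 * KIB, ways=4, line_bytes=32),
    CacheLevel("Power4 L1 data", 32 * KIB, ways=2, line_bytes=128),
    CacheLevel("Power4 L2", 1440 * KIB, ways=8, line_bytes=128),
    CacheLevel("Power5 L2 (one 0.625M slice)", 640 * KIB, ways=10, line_bytes=128),
]

if __name__ == "__main__":
    for cache in EXAMPLES:
        print(f"{cache.name}: {cache.num_sets()} sets")
```

Rows that omit the line size or associativity (for example the Turion 64 X2 entry) cannot be evaluated this way without further data.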


== Cache size and characteristics of single-core processors ==


== Conclusion ==

Revision as of 22:03, 24 September 2007
