CSC/ECE 506 Fall 2007/wiki 2 5 2281
Objective
To compile a table of current multi-core processors and their internal cache specifications.
Cache Sizes in multicore architectures
The table below lists current multi-core processor architectures with their cache details: number of cache levels, line size, cache size, associativity, latency, whether a level is shared, and the coherence protocol used.
Multi-core Architecture | Number of levels | Line Size | Cache Size | Associativity | Latency | Is the Level shared | Coherence Protocol used |
---|---|---|---|---|---|---|---|
AMD Opteron Processor | 2 | - | 64 KByte L1 Cache (separate Data and Instruction Caches); 1024 KByte L2 | 2-Way Associative ECC-Protected L1 Data Cache & Parity-Protected L1 Instruction Cache; 16-Way Associative Parity-Protected L2 Cache | Two 64-bit operations per cycle, 3-cycle latency | No | Exclusive cache architecture |
AMD Athlon X2 Dual Core | 2 | - | 64 KByte L1 Cache (separate Data and Instruction Caches); 1024 KByte L2 | 2-Way Associative ECC-Protected L1 Data Cache & Parity-Protected L1 Instruction Cache; 16-Way Associative Parity-Protected L2 Cache | Two 64-bit operations per cycle, 3-cycle latency | No | Exclusive cache architecture |
AMD Turion 64 Mobile | 2 | - | 64 KByte L1; up to 1 MByte L2, with 512 KByte options | 2-Way Associative ECC-Protected L1 Data Cache & Parity-Protected L1 Instruction Cache; 16-Way Associative ECC-Protected L2 Cache | Two 64-bit operations per cycle, 3-cycle latency, with advanced branch prediction | No | Exclusive cache architecture |
AMD Sempron Processor | 2 | - | 64 KByte ECC-Protected L1 Data Cache & Parity-Protected L1 Instruction Cache; 256 KByte ECC-Protected L2 Cache | 2-Way Associative L1 Cache; 16-Way Associative L2 Cache | Two 64-bit operations per cycle, 3-cycle latency | No | Exclusive cache architecture |
AMD Athlon Duron Processor | 2 | - | Integrated 128 KByte L1 Cache and an exclusive 64 KByte L2 Cache | - | - | No | Exclusive cache architecture |
AMD Palermo Processor | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 128 or 256 KByte L2 Cache | - | - | No | Inclusive |
AMD Thoroughbred (TBRED) | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 256 KByte full-speed L2 Cache | - | - | No | Inclusive |
AMD Barton Processor | 2 | - | 64 KByte L1 Data Cache & L1 Instruction Cache; Unified 512 KByte L2 Cache | - | - | No | - |
AMD Thunderbird | 2 | - | - | 16-Way | - | - | - |
Cell Processor (PlayStation 3 processor), manufactured by Toshiba, IBM and Sony | The PowerPC core at the centre of the Cell has 2 levels; each of the surrounding SPEs has just one level of cache | - | 32 KByte Data Cache + 32 KByte Instruction Cache in the PowerPC core; each surrounding SPE has a 256 KByte Unified Cache | - | - | The L2 Cache of the PowerPC core is shared by the surrounding SPEs | - |
AMD Athlon 64 X2 Dual Core - 4600+ | 2 | - | 128 KByte L1 Unified Cache ; 512 KByte L2 Unified Cache | - | - | No | - |
Storm-1 Family by Stream Processors | 1 | - | 16 KByte L1 Data / Instruction Cache | - | 533 MHz Data Rate between L1 and DDR Memory | No | NA |
UltraSPARC IV | 2 | 128 bytes to 512 bytes | 64 KByte Dual L1 Data Cache; 32 KByte L2 extendable up to 16 MByte | 2-Way Set Associative per core | - | No | - |
UltraSPARC IV+ | 2 | - | 32 MByte L2 on-chip ; 32 MByte L3 External | - | - | No | - |
UltraSPARC T1 | 2 | 4 Banks for L2 | 8 KByte Data Cache; 16 KByte Instruction Cache; 3 MByte L2 Cache | 4-Way Set Associative for L1; 12-Way Set Associative for L2 Cache | - | No | - |
Intel Pentium D Series | 3 (L1 + L2 + Execution Trace Cache holding the decoded micro-ops) | - | 2 * 16 KByte of L1 Data Cache; 2 * 2 MByte L2 Unified Cache | - | - | Yes | - |
Intel Itanium 2 | 3 | - | 16 KByte L1 Instruction Cache; 16 KByte L1 Data Cache; 256 KByte L2 Unified Cache; 3 - 9 MByte of L3 Unified Cache | 4-Way Set Associative L1 Cache; 8-Way Set Associative L2 Cache | 6.4 - 2.1 GB/s transfer rate between L3 and External Memory | L2 and L3 Cache is Shared | - |
CRAY X1 | 2 | - | 32 KByte Core L1 Unified Cache; 512 MByte of L2 Unified Cache | - | 76 GB/s - 50 GB/s for Loads and 26 GB/s for stores between the L2 Cache and the Memory | Yes | Integrated Vector Cache is used for Coherence and helps tolerate memory latency |
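
The Cache Size, Line Size and Associativity columns are related: a set-associative cache has (cache size) / (associativity * line size) sets. The sketch below illustrates that arithmetic using the 64 KByte 2-way L1 and 1024 KByte 16-way L2 figures from the AMD Opteron row. The table leaves Line Size blank for that row, so the 64-byte line size is an assumption made only for this example, and the names `num_sets` and `ASSUMED_LINE_BYTES` are illustrative.

```python
def num_sets(cache_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache."""
    bytes_per_set = ways * line_bytes
    if cache_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return cache_bytes // bytes_per_set


KIB = 1024
# Assumption: the table does not give a line size for this row;
# 64 bytes is used here purely for illustration.
ASSUMED_LINE_BYTES = 64

l1_sets = num_sets(64 * KIB, 2, ASSUMED_LINE_BYTES)     # 64 KByte, 2-way L1
l2_sets = num_sets(1024 * KIB, 16, ASSUMED_LINE_BYTES)  # 1024 KByte, 16-way L2

print("L1 sets:", l1_sets)
print("L2 sets:", l2_sets)
```

With a different line size the set counts scale inversely, which is why the Line Size column matters when comparing rows.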
References
http://www.interfacebus.com/Controllers.html
http://www-03.ibm.com/servers/eserver/opteron/pdf/IBM_dualcore_whitepaper.pdf
http://arstechnica.com/news.ars/post/20061102-8135.html
http://compreviews.about.com/od/cpus/a/dualcore.htm
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2343
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118,00.html
http://www.broadcom.com/
http://www.via.com.tw/en/products/processors/eden/
http://www.centtech.com/
http://www.ibm.com/us/en/
http://www.streamprocessors.com/
http://www.streamprocessors.com/streamprocessors/resources/
http://www.netlib.org/utk/papers/advanced-computers/