CSC/ECE 506 Fall 2007/wiki2 5 as
Latest revision as of 02:01, 29 September 2007
Introduction
Modern processors use caches to bridge the speed gap between the CPU and main memory. Because a cache's structure affects its performance, choosing an appropriate structure is an important and difficult problem. For example, a larger cache generally performs better. However, due to cache pollution, performance shows diminishing returns as the cache grows, so the cache size has to be chosen carefully. It is therefore worthwhile to look at the cache characteristics of modern processors.
A cache's structure is determined by a few parameters: cache size, replacement algorithm, associativity, and cache line size. With the introduction of multi-core processors, cache coherence has also become an issue, and the choice of coherence protocol, such as MESI or MOESI, affects performance. This page lists several cache parameters for modern multicore processors as well as for a few single-core processors.
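As a rough illustration of how these parameters interact, the sketch below derives the number of sets and the tag/index/offset split of an address from cache size, line size, and associativity. The class and method names are hypothetical, and the code assumes power-of-two sizes and a simple physically indexed cache; the example values are those listed on this page for the Athlon 64 L1 data cache (64 KB, 64-byte lines, 2-way).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper; assumes all three parameters are powers of two."""
    size_bytes: int   # total capacity of the cache
    line_bytes: int   # cache line (block) size
    ways: int         # associativity (lines per set)

    @property
    def num_sets(self) -> int:
        # capacity = sets * ways * line size
        return self.size_bytes // (self.line_bytes * self.ways)

    @property
    def offset_bits(self) -> int:
        # log2(line_bytes) for a power of two
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # log2(num_sets) for a power of two
        return self.num_sets.bit_length() - 1

    def split(self, address: int) -> tuple[int, int, int]:
        """Split an address into (tag, set index, byte offset)."""
        offset = address & (self.line_bytes - 1)
        index = (address >> self.offset_bits) & (self.num_sets - 1)
        tag = address >> (self.offset_bits + self.index_bits)
        return tag, index, offset


athlon_l1d = CacheGeometry(size_bytes=64 * 1024, line_bytes=64, ways=2)
print(athlon_l1d.num_sets, athlon_l1d.offset_bits, athlon_l1d.index_bits)  # 512 6 9
print(athlon_l1d.split(0x12345678))  # (9320, 345, 56)
```

Doubling the associativity at a fixed capacity halves the number of sets, which is one reason size, line size, and associativity cannot be tuned independently.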
Cache sizes in multicore architectures
Topic - Create a table of caches used in current multicore architectures, including such parameters as number of levels, line size, size and associativity of each level, latency of each level, whether each level is shared, and coherence protocol used. Compare this with two or three recent single-core designs.
Processor Name | Number of Levels | Line Size | Cache Size | Associativity | Coherence Protocol |
---|---|---|---|---|---|
Multicore Processors | | | | | |
AMD Athlon 64 X2 | 2 | 64 bytes (both L1 and L2) | L1: 64 KB data + 64 KB instruction per core; L2: 512 KB to 1 MB per core | L1: 2-way (data and instruction); L2: 16-way | MOESI |
AMD Athlon 64 FX | 2 | 64 bytes (both L1 and L2) | L1: 64 KB data + 64 KB instruction per core; L2: 1 MB per core | L1: 2-way (data and instruction); L2: 16-way | MOESI |
AMD Athlon Opteron (marketed for servers) | 2 | 64 bytes (both L1 and L2) | L1: 64 KB data + 64 KB instruction per core; L2: 1 MB per core | L1: 2-way (data and instruction); L2: 16-way | MOESI |
Intel Pentium D | 2 | L1: 64 bytes; L2: 128 bytes | L1: 16 KB data only (a "150KB trace cache" is used instead of an instruction cache); L2: 1 MB or 2 MB per core | L1: 4-way; L2: 8-way | MESI |
Intel Pentium Dual Core | 2 | L1: 64 bytes; L2: 64 bytes | L1: 32 KB (both data and instruction caches); L2: 1 MB or 2 MB per core | L1: 4-way; L2: 8-way | MESI |
Intel Core 2 Duo | 2 | L1: 64 bytes; L2: 64 bytes | L1: 32 KB each for data and instruction; L2: 2 MB or 4 MB | L1: 4-way; L2: 8-way | MESI |
Broadcom SiByte SB1250 | 2 | L1: 32 bytes; L2: 32 bytes | L1: 32 KB each for data and instruction; L2: 512 KB | L1: 2-way; L2: 4-way | MESI |
Sun Microsystems UltraSPARC IV | 2 | L1: 128 bytes; L2: 128 bytes | L1: 64 KB data, 32 KB instruction; L2: up to 16 MB | L2: 2-way | MOESI |
IBM Cell Processor | 2 | Not available | L1: 32 KB each for data and instruction; L2: 512 KB | L1: 2-way instruction, 4-way data; L2: 8-way | MESI |
Singlecore Processors | | | | | |
AMD Athlon 64 | 2 | L1: 64 bytes; L2: 64 bytes | L1: 64 KB each for data and instruction; L2: 512 KB | L1: 2-way; L2: 16-way | MOESI |
AMD K6 / K6 III | 2 | L1: 32 bytes; L2: 32 bytes | L1: 32 KB data, 32 KB instruction; L2: 256 KB | L1: 2-way; L2: 4-way | MESI |
Intel Pentium 4 | 2 | L1: 64 bytes; L2: 128 bytes | L1: 8 KB data only (a "150KB trace cache" is used instead of an instruction cache); L2: 256 KB, 512 KB or 1 MB | L1: 4-way; L2: 8-way | MESI |
Intel Pentium III 600 | 2 | L1: 32 bytes; L2: 32 bytes | L1: 16 KB data, 16 KB instruction; L2: 256 KB | L1: 4-way; L2: 8-way | MESI |

MESI stands for Modified, Exclusive, Shared, Invalid; MOESI adds an Owned state.
Conclusion
Most of the processors listed above have 32 KB or 64 KB L1 data and instruction caches that are 2-way or 4-way set-associative, although other sizes and associativities are possible. L2 caches range from 2-way to 16-way. Cache lines are 32, 64 or 128 bytes. MESI and MOESI are the prevailing cache coherence protocols.
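To make the difference between the two protocols concrete, here is a minimal sketch of the state transitions for a single cache line in one cache under a snooping bus. It is a simplified textbook model, not the implementation of any processor in the table: the event names and the `next_state` function are hypothetical, and bus transactions, write-backs, and data transfers are left out. The only difference modeled is that under MOESI a Modified line observed by a remote read moves to Owned (it stays dirty and keeps being shared), whereas under MESI it is written back and moves to Shared.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"      # used only when moesi=True
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(state, event, *, others_have_copy=False, moesi=False):
    """Return the next state of one line in one cache (simplified model).

    event is one of the hypothetical names:
      "local_read", "local_write"   - requests from this cache's own core
      "remote_read", "remote_write" - requests snooped from another core
    """
    if event == "local_read":
        if state is State.I:
            # A miss: Shared if another cache holds the line, else Exclusive.
            return State.S if others_have_copy else State.E
        return state  # read hit, no state change
    if event == "local_write":
        # Any local write ends in Modified (other copies get invalidated).
        return State.M
    if event == "remote_read":
        if state is State.M:
            # MOESI keeps the dirty line and shares it; MESI writes it back.
            return State.O if moesi else State.S
        if state is State.E:
            return State.S
        return state  # S, O and I are unchanged by a remote read
    if event == "remote_write":
        # Another core wants exclusive ownership: drop our copy.
        return State.I
    raise ValueError(f"unknown event: {event}")


print(next_state(State.M, "remote_read"))              # State.S
print(next_state(State.M, "remote_read", moesi=True))  # State.O
```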
From the table above, we find little difference between the cache specifications of multi-core and single-core processors.
References
[1] http://www.amd.com/us-en/Processors/ProductInformation
[2] http://www.broadcom.com/products/Enterprise-Networking/Communications-Processors/BCM1250
[3] http://www-01.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_970MP_Microprocessor
[4] http://www.intel.com/products/processor
[5] http://www.sun.com/processors/UltraSPARC-IV/
[6] http://www.sun.com/processors/UltraSPARC-IV+/
[7] http://www.sun.com/processors/UltraSPARC-T1/specs.xml
[8] http://www.streamprocessors.com/streamprocessors/Home/Products/Storm-1Family.html
[9] http://www.netlib.org/utk/papers/advanced-computers/pa-risc.html
[10] http://www.netlib.org/utk/papers/advanced-computers/power4.html
[11] http://www.netlib.org/utk/papers/advanced-computers/power5.html
[12] http://en.wikipedia.org/wiki/Cell_microprocessor
[13] http://techreport.com/articles.x/8236/2
[14] http://www.hardwaresecrets.com/article/481/9