CSC 456 Fall 2013/1c wa
Trends in cache size and organization
Introduction
Cache size has varied over the years. Intuitively, one would expect cache sizes to keep growing, following some law similar to Moore’s Law. In practice, however, L1 cache sizes have all but maxed out for an individual processor, and the data show that they have sometimes even decreased from one processor generation to the next. Cache associativity has varied over the years as well. While no cache organization is optimal for every situation, certain organizations perform better for the average task on certain systems. This wiki analyzes data on cache sizes and associativities to gain some insight into the trends and reasoning behind vendors’ choices in cache size and organization over the years, specifically from the late 80’s / early 90’s to the early 2000’s.
Cache Associativity
This table shows cache associativities found in some mainstream processors from the late 80’s to the early 2000’s, with one processor from 1968 included for reference. As the data show, processors of the late 80’s and early 90’s tended towards set-associative caches with around four ways. In the mid-90’s designs tended towards lower associativity and direct mapping. Then in the late 90’s and early 2000’s they moved back towards higher associativities with larger set sizes.
L1, L2, L3 Associativity
System | Year | L1 Associativity | L2 Associativity | L3 Associativity | Notes: |
IBM 360/85 | 1968 | Sector | N/A | N/A | First processor with a cache, clock speed 12.5MHz |
Intel 80486 | 1989 | 4-way associative | N/A | N/A | First x86 processor with an on-chip cache |
SuperSPARC | 1992 | 4 & 5 way set | N/A | N/A | Used to render Toy Story, Core @ 40MHz |
Alpha 21064(DEC) | 1992 | Direct | Direct | N/A | |
UltraSPARC | 1995 | 2-Way & Direct | Direct | N/A | 64-bit w/ Core@200MHz |
Alpha 21164(DEC) | 1995 | Direct | 3 way set | N/A | |
K6-III | 1999 | 2-way | 4-way | N/A | |
Pentium 4 | 10/2000 | 4-way | 8-way | N/A | |
UltraSPARC III | 2001 | 4-way | N/A | N/A | |
Itanium 2 | 2002 | 4-way | 8-way | 12-way | |
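To make the associativity column more concrete, the sketch below shows how a cache's size, associativity, and line size determine its number of sets and how an address splits into tag, index, and offset bits. The function name `cache_geometry`, the 32-bit address width, and the 64-byte line size are assumptions chosen for illustration; the table above does not list line sizes.

```python
def cache_geometry(size_bytes, ways, line_bytes, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    A direct-mapped cache is the special case ways == 1.
    All sizes are assumed to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


# Hypothetical 8 KB caches with 64-byte lines (the line size is an assumption).
four_way = cache_geometry(8 * 1024, ways=4, line_bytes=64)
direct = cache_geometry(8 * 1024, ways=1, line_bytes=64)
print("4-way :", four_way)
print("direct:", direct)
```

For the same capacity, higher associativity means fewer sets and a wider tag, so each lookup has to compare more tags in parallel. That is the cost side of the trade-off discussed in this section.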
Cache Size
L1, L2, L3 Size by Year
Processor | System Type | Year | L1 size | L2 size | L3 size |
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | — |
PDP-11/70 | Minicomputer | 1975 | 1 KB | — | — |
VAX 11/780 | Minicomputer | 1978 | 16 KB | — | — |
IBM 3033 | Mainframe | 1978 | 64 KB | — | — |
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | — |
Intel 80486 | PC | 1989 | 8 KB | — | — |
SuperSPARC | PC | 1992 | 16 KB/20 KB | 0 to 2 MB | — |
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | — |
PowerPC 601 | PC | 1993 | 32 KB | — | — |
UltraSPARC | PC | 1995 | 16 KB/16 KB | 512 KB to 4 MB | — |
PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | — |
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB |
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB |
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | — |
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | — |
IBM SP | High-end server | 2000 | 64 KB/32 KB | 8 MB | — |
CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | — |
UltraSPARC III | PC | 2001 | 32 KB/64 KB | 2 to 8 MB | — |
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB |
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | — |
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB |
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB |
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | — |
Main Memory Specs
DRAM Memory Standards:
Standard | Mem Clock | Cycle time | I/O Bus Clock | Module Name | Peak Transfer Rate | Prefetch | Latency | Year |
DDR-333 | 166 MHz | 6 ns | ||||||
DDR2-400 | 100 MHz | 10 ns | 200 MHz | PC2-3200 | 3200 MB/s | 4n | 4-6 Bus CC | 2003 |
DDR2-533 | 133 MHz | 7.5 ns | 266 MHz | PC2-4200 | 4266 MB/s | 4n | ||
DDR2-667 | 166 MHz | 6 ns | 333 MHz | PC2-5300 | 5333 MB/s | 4n | ||
DDR2-800 | 200 MHz | 5 ns | 400 MHz | PC2-6400 | 6400 MB/s | 4n | ||
DDR2-1066 | 266 MHz | 3.75 ns | 533 MHz | PC2-8500 | 8533 MB/s | 4n | ||
DDR3-800 | 100 MHz | 10 ns | 400 MHz | PC3-6400 | 6400 MB/s | 8n | 5-9 ns (7 avg.) | 2007 |
DDR3-1066 | 133 MHz | 7.5 ns | 533 MHz | PC3-8500 | 8533 MB/s | 8n | ||
DDR3-1333 | 166 MHz | 6 ns | 667 MHz | PC3-10600 | 10667 MB/s | 8n | ||
DDR3-1600 | 200 MHz | 5 ns | 800 MHz | PC3-12800 | 12800 MB/s | 8n | ||
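The columns of the DRAM table are related by simple arithmetic, which the following sketch reproduces. It assumes a 64-bit (8-byte) module data bus and two transfers per I/O bus clock, which is what the double data rate designation implies; the function names are hypothetical and used only for illustration.

```python
def cycle_time_ns(mem_clock_mhz):
    """Cycle time of the memory array clock in nanoseconds."""
    return 1000.0 / mem_clock_mhz


def io_bus_clock_mhz(mem_clock_mhz, prefetch):
    """I/O bus clock implied by the prefetch depth.

    With two transfers per I/O clock, a prefetch of 4n needs an I/O clock
    of 2x the memory clock, and 8n needs 4x.
    """
    return mem_clock_mhz * prefetch / 2


def peak_transfer_mb_s(io_bus_mhz, bus_bytes=8):
    """Peak rate in MB/s: two transfers per I/O clock, bus_bytes per transfer.

    bus_bytes=8 assumes a 64-bit module data bus.
    """
    return io_bus_mhz * 2 * bus_bytes


# DDR2-800 and DDR3-1600 both use a 200 MHz memory clock.
for name, prefetch in (("DDR2-800", 4), ("DDR3-1600", 8)):
    io_clock = io_bus_clock_mhz(200, prefetch)
    print(name, cycle_time_ns(200), "ns", io_clock, "MHz",
          peak_transfer_mb_s(io_clock), "MB/s")
```

Rows such as DDR2-533 list rounded clocks (133 MHz and 266 MHz stand for 133.33 MHz and 266.67 MHz), so plugging in the rounded values gives a slightly lower rate than the 4266 MB/s shown in the table.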
Conclusion
Theory: Cache associativity decreased as cache size became larger because searching the ways of a set on every access became too expensive once the cache was large. Also, the larger the cache is as a percentage of main memory, the less need there is for associativity. But while caches and main memory have both grown, main memory grew faster in the 2000’s. When the ratio of cache size to main memory size goes down, associativity needs to increase.
The Pentium Pro (1995) was the first Intel processor to include the L2 cache in the processor package. Before this, the L2 cache was an optional add-on to the motherboard. [1]
Systems to consider in table
Pentium
AMD
MIPS
Sun Microsystems: SPARC
IBM: PowerPC
DEC: Alpha
Miss penalty to main memory: < 100 before 2000
After 2000 the penalty to reach main memory started to increase
Penalty < 20: one cache level is fine
Penalty <= 100: two levels
Penalty >= 200: three levels
Miss rates as reported for SPEC benchmarks
References
Itanium Specs(p.20)
Pentium Pro
Intel Processors
First on-board L1
Cache Trend Table
Sector Caches
DDR2/3 Speeds
Memory Wall