Chapter 6: Joshua Mohundro, Patrick Wong
Sectored Cache
History
One of the first commercially available computers to use a cache, the IBM System/360 Model 85 mainframe, used a sectored cache. The primary reason for choosing a sectored cache was that, at the time of the 360/85, it was easier to build than current non-sectored designs. However, the sectored design proved to be much less efficient than the non-sectored designs of the time and thus largely disappeared.
How they work
A sectored cache is broken up into sectors (hence the name), each of which has an address tag associated with it. Each sector is further broken down into subsectors, each of which has a "valid" bit, allowing some subsectors to remain empty while others are full. When there is a miss to a sector, a resident sector is evicted, its address tag is set to point to the missed sector, and a single subsector is fetched. When a subsector is missing but the sector "containing" it is present, only that subsector needs to be fetched.
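The behaviour described above can be sketched as a small simulation. This is only an illustrative model, not the 360/85's actual design: the class name SectoredCache, the fully associative lookup, and the first-in-first-out choice of which sector to evict are all assumptions made for the example.

```python
class SectoredCache:
    """Toy sectored cache (assumed: fully associative, FIFO sector eviction)."""

    def __init__(self, num_sectors, subsectors_per_sector, subsector_bytes):
        self.num_sectors = num_sectors
        self.subsectors_per_sector = subsectors_per_sector
        self.subsector_bytes = subsector_bytes
        self.sector_bytes = subsectors_per_sector * subsector_bytes
        # Maps address tag -> list of valid bits, one per subsector.
        # A dict keeps insertion order, so the first key is the oldest sector.
        self.sectors = {}

    def access(self, address):
        tag = address // self.sector_bytes
        sub = (address % self.sector_bytes) // self.subsector_bytes
        valid = self.sectors.get(tag)
        if valid is None:
            # Sector miss: evict a resident sector if the cache is full,
            # claim a tag for the new sector, and fetch a single subsector.
            if len(self.sectors) >= self.num_sectors:
                oldest_tag = next(iter(self.sectors))
                del self.sectors[oldest_tag]
            valid = [False] * self.subsectors_per_sector
            valid[sub] = True
            self.sectors[tag] = valid
            return "sector miss"
        if not valid[sub]:
            # Sector is present but this subsector is empty: fetch only it.
            valid[sub] = True
            return "subsector miss"
        return "hit"


if __name__ == "__main__":
    cache = SectoredCache(num_sectors=2, subsectors_per_sector=4, subsector_bytes=16)
    for addr in (0, 4, 16, 64, 128, 0):
        print(addr, cache.access(addr))
```

Note that when a sector is evicted and later brought back, all of its valid bits start out cleared again, so every subsector has to be refetched one miss at a time.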
Victim Cache
A victim cache, in architectures that have one, stores lines just evicted from another level of cache. It is usually highly associative and has very few entries, but it solves one of the pathological cases for direct-mapped caches: an alternating memory access pattern in which two addresses map to the same cache line and keep evicting each other. In effect, for accesses that would otherwise be conflict misses, it extends the associativity by the number of entries in the victim cache at very low cost.
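The alternating-access case can be illustrated with a minimal sketch of a direct-mapped cache backed by a small victim cache. The class name DirectMappedWithVictim, the fully associative victim store, and its least-recently-inserted eviction are assumptions for the example, not a description of any particular processor.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Toy direct-mapped cache with an optional fully associative victim cache."""

    def __init__(self, num_lines, victim_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # index -> line address held there
        self.victim_entries = victim_entries
        self.victim = OrderedDict()       # line address -> None, oldest first

    def access(self, line_addr):
        index = line_addr % self.num_lines
        if self.lines[index] == line_addr:
            return "hit"
        evicted = self.lines[index]
        if line_addr in self.victim:
            # The line was evicted recently: swap it back into the main cache.
            del self.victim[line_addr]
            result = "victim hit"
        else:
            result = "miss"
        self.lines[index] = line_addr
        if evicted is not None:
            self._store_victim(evicted)
        return result

    def _store_victim(self, line_addr):
        if self.victim_entries == 0:
            return
        self.victim[line_addr] = None
        while len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the oldest victim


if __name__ == "__main__":
    pattern = [0, 4, 0, 4, 0, 4]  # lines 0 and 4 collide when num_lines == 4
    for entries in (0, 1):
        cache = DirectMappedWithVictim(num_lines=4, victim_entries=entries)
        print(entries, [cache.access(a) for a in pattern])
```

Without a victim cache every access in the pattern misses. With a single victim entry, only the first two accesses go to the next level of the hierarchy and the rest are served by swapping with the victim cache.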
x86 architectures that implement a victim cache include the Transmeta Efficeon and the AMD K7, K8, and K10.
AMD has traditionally implemented an exclusive cache hierarchy, in which a line is held in only one level at a time so that data is not duplicated between levels. A victim cache is therefore a natural development of an exclusive cache: lines evicted from one level are simply placed in the next.
In the K7, the L2 cache sat on a very slow external bus. The victim cache acted as a buffer between lines evicted from the L1 cache and the slow L2 cache.
The K10's "victim cache" deserves a closer look: at 2-6 MB, it is an order of magnitude larger than most victim cache implementations. It is more of a buffer for an efficient implementation of AMD's exclusive cache hierarchy. It is possible that AMD decided that the L3 cache was fast enough to act as a victim cache.
To look into: non-x86 victim caches.
To look into: the actual implications of victim caches for inclusive vs. exclusive cache hierarchies.