CSC/ECE 506 Spring 2011/ch6b ab: Difference between revisions

Latest revision as of 04:37, 1 March 2011

Overview

Cache addressing has a significant impact on the performance of the cache, determining cache latency and when the cache must be flushed. Since a cache is designed purely to improve performance, the addressing scheme must be a prime consideration.

Cache Addressing

The data in a CPU cache is addressed using an index and a tag. The index selects the cache line (or set of lines) where the block containing the requested data might be stored, and the tag determines whether any block stored there actually holds that data. Each of these two lookups can use either the physical or the virtual address, which gives four possible cache addressing schemes.
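
As a rough illustration of the index and tag fields described above, the sketch below splits an address into tag, index, and block offset. The cache geometry (64-byte blocks, 128 sets) and the name split_address are assumptions made for this example only; they do not come from any particular processor.

```python
# Hypothetical cache geometry, chosen only for illustration.
BLOCK_SIZE = 64      # bytes per block -> 6 offset bits
NUM_SETS = 128       # sets in the cache -> 7 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64) = 6
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(128) = 7


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (BLOCK_SIZE - 1)                  # low bits: byte within the block
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)    # middle bits: which set to look in
    tag = addr >> (OFFSET_BITS + INDEX_BITS)          # remaining high bits: identify the block
    return tag, index, offset


tag, index, offset = split_address(0x12345)
print(tag, index, offset)
```

Whether addr here is a virtual or a physical address is exactly the choice that distinguishes the four schemes below.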

Virtually Indexed, Virtually Tagged

In a cache that uses the virtual address for both the index and the tag (VIVT), no address translation is required on a cache hit, so the TLB and page table are consulted only on a cache miss. This allows fast retrieval of the requested data, since no translation lookup occurs and the address operand of the load or store instruction can be used as-is. However, after a context switch the same virtual addresses can refer to completely different data, so the cache must be flushed on a context switch, or at the very least the conflicting lines must be flushed. Another issue with a VIVT cache is that the same data may have different virtual addresses if it is shared among threads or processes. Such data would be stored in multiple places in the cache even though it originates from a single memory location. [1]
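
The context-switch problem can be seen in a toy model. The sketch below is a simplification under stated assumptions: the class VIVTCache, the one-word lines, and the page tables that map individual addresses are all hypothetical and stand in for real hardware structures.

```python
class VIVTCache:
    """Toy virtually indexed, virtually tagged cache (direct-mapped, one word per line)."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.lines = {}          # index -> (virtual tag, data)

    def read(self, vaddr, page_table, memory):
        index = vaddr % self.num_lines
        tag = vaddr // self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]                  # hit: no translation needed
        data = memory[page_table[vaddr]]     # miss: translate, then fetch from memory
        self.lines[index] = (tag, data)
        return data

    def flush(self):
        self.lines.clear()


memory = {100: "A's data", 200: "B's data"}
page_table_a = {0x10: 100}   # process A: virtual 0x10 -> physical 100
page_table_b = {0x10: 200}   # process B: virtual 0x10 -> physical 200

cache = VIVTCache()
first = cache.read(0x10, page_table_a, memory)   # process A runs
stale = cache.read(0x10, page_table_b, memory)   # switch to B without flushing
cache.flush()
fresh = cache.read(0x10, page_table_b, memory)   # switch to B after a flush
print(first, "|", stale, "|", fresh)
```

Without the flush, process B hits on the line that process A left behind and reads the wrong data; after the flush it misses and fetches its own.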

Physically Indexed, Physically Tagged

A lookup in this type of cache (PIPT) requires that address translation be the first step of any memory access. Thus the TLB must be large enough to hold translations for the data in the cache; otherwise the translation would require a main memory access even on a cache hit, defeating the purpose of caching the value in the first place. The time to translate the address through the TLB is still non-negligible and is added in front of the latency of the cache lookup itself. After the address translation is complete, the cache uses the resulting physical address to find the line and check the tag. No flushing is necessary on a context switch because the physical address alone determines where any cache-block-sized piece of memory can reside, and the physical address is compared with the tag to determine whether the data for the requested memory location is indeed present on that cache line. If multiple virtual addresses correspond to a single physical address, they will all find the same cache block when they do a cache lookup. The tag stores the physical address bits that are not used for the index or block offset, so its width depends on the size of the physical address space rather than the virtual one. [1]
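
A minimal sketch of this behavior, assuming a hypothetical PIPTCache class with one word per line and page tables that map individual addresses: translation happens before the lookup, two processes never confuse each other's data, and two virtual addresses that share a physical address hit the same line.

```python
class PIPTCache:
    """Toy physically indexed, physically tagged cache (direct-mapped, one word per line)."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.lines = {}   # index -> (physical tag, data)
        self.hits = 0
        self.misses = 0

    def read(self, vaddr, page_table, memory):
        paddr = page_table[vaddr]            # translation comes first (the TLB, in hardware)
        index = paddr % self.num_lines
        tag = paddr // self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            self.hits += 1
            return entry[1]
        self.misses += 1
        data = memory[paddr]
        self.lines[index] = (tag, data)
        return data


memory = {100: "shared", 200: "private to B"}
page_table_a = {0x10: 100}
page_table_b = {0x10: 200, 0x30: 100}   # 0x30 in B aliases the block A calls 0x10

pipt = PIPTCache()
r1 = pipt.read(0x10, page_table_a, memory)   # A: miss, loads physical 100
r2 = pipt.read(0x10, page_table_b, memory)   # B, no flush: different physical address, miss
r3 = pipt.read(0x30, page_table_b, memory)   # B: alias of physical 100, hits A's line
print(r1, "|", r2, "|", r3, "|", pipt.hits, pipt.misses)
```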

Virtually Indexed, Physically Tagged

This cache type (VIPT) allows the virtual address to be used right away to begin the lookup of the cache line. While this is going on, the TLB lookup for the physical address can occur in parallel. When both lookups are complete, the physical address returned from the TLB is compared with the tags of the cache blocks on that line to determine whether the requested data is present. This hides the latency of address translation (assuming it takes approximately as long as retrieving the cache line) and obviates the need to flush the cache on a context switch, because the physical address is used for the final tag check. [1]
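
The lookup order can be sketched as follows. This is a sequential toy model, so the parallelism of real hardware appears only in the comments; the class VIPTCache, the one-word lines, and the use of the whole physical address as the tag are assumptions made for brevity.

```python
class VIPTCache:
    """Toy virtually indexed, physically tagged cache (direct-mapped, one word per line)."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.lines = {}   # virtual index -> (physical address used as tag, data)

    def read(self, vaddr, page_table, memory):
        index = vaddr % self.num_lines       # can start immediately from the virtual address
        paddr = page_table[vaddr]            # in hardware, the TLB lookup runs in parallel
        entry = self.lines.get(index)
        if entry is not None and entry[0] == paddr:   # final check uses the physical address
            return entry[1], True
        data = memory[paddr]
        self.lines[index] = (paddr, data)
        return data, False


memory = {100: "A's data", 200: "B's data"}
page_table_a = {0x10: 100}
page_table_b = {0x10: 200}

vipt = VIPTCache()
a_read = vipt.read(0x10, page_table_a, memory)    # A: miss
b_read = vipt.read(0x10, page_table_b, memory)    # B, no flush: same index, tag mismatch, miss
b_again = vipt.read(0x10, page_table_b, memory)   # B: hit
print(a_read, b_read, b_again)
```

Because the tag comparison uses the physical address, process B never sees process A's line as a hit, even though both use the same virtual address and no flush took place.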

Physically Indexed, Virtually Tagged

This is essentially a "worst of both worlds" approach. Address translation must still be performed to find the index, which increases latency, and the cache must still be flushed on a context switch because virtual tagging leaves the potential for a tag conflict. [1]

TLB Coherence

The information in one processor's TLB can be made stale by another processor that changes the permissions on a page or swaps a page out to disk. There must therefore be some method of updating the TLB with fresher information when this (somewhat rare) scenario occurs. [2]

Virtually Addressed Caches

If the cache is virtually addressed and the miss rate is sufficiently low, the TLB can be omitted entirely without much performance impact, because translation would only be needed on a cache miss, when a memory access is already required. A TLB that does not exist need not be kept coherent. [2]

TLB Shootdown

The processor making changes to the page table sends an interrupt to the other processors, alerting them that a change has been made. The other processors consult a shared memory region to determine which page table entries have changed and either invalidate or update their TLBs accordingly. [2]
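
The sequence above can be sketched in a few lines. This is a simplified model, not an operating system's actual implementation: the Processor class, the change_mapping function, and the list standing in for the shared memory region are hypothetical, and the interrupt is modeled as a direct method call.

```python
class Processor:
    """Toy processor with a private TLB (virtual page -> physical frame)."""

    def __init__(self, name):
        self.name = name
        self.tlb = {}

    def handle_shootdown_interrupt(self, changed_pages):
        # Consult the shared list and invalidate any matching TLB entries.
        for vpage in changed_pages:
            self.tlb.pop(vpage, None)


def change_mapping(initiator, others, page_table, vpage, new_frame):
    """Update the page table, then shoot down stale entries on the other processors."""
    page_table[vpage] = new_frame
    initiator.tlb.pop(vpage, None)           # drop the initiator's own stale entry
    shared_changed_pages = [vpage]           # stands in for the shared memory region
    for cpu in others:                       # stands in for inter-processor interrupts
        cpu.handle_shootdown_interrupt(shared_changed_pages)


page_table = {1: 10, 2: 20}
cpu0, cpu1 = Processor("cpu0"), Processor("cpu1")
cpu0.tlb = dict(page_table)
cpu1.tlb = dict(page_table)

change_mapping(cpu0, [cpu1], page_table, vpage=1, new_frame=99)
print(cpu0.tlb, cpu1.tlb)
```

This sketch invalidates rather than updates the remote entries; the next access to page 1 on either processor would miss in the TLB and reload the new mapping from the page table.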

Address Space Identifiers

This is a concept similar to process tagging in a virtually addressed cache. The software maintains control of the TLB and marks each of the entries with an address space identifier denoting which process the buffered translation belongs to. The OS can use these identifiers to manage TLB coherence by updating or invalidating other processors' TLBs, or by flushing only the entries that belong to the process whose page table is changing. The MIPS architecture uses this strategy. [2]
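
A minimal sketch of an identifier-tagged TLB, assuming a hypothetical AsidTLB class whose entries are keyed by the pair (address space identifier, virtual page); it is not modeled on the MIPS TLB format.

```python
class AsidTLB:
    """Toy TLB whose entries are tagged with an address space identifier (ASID)."""

    def __init__(self):
        self.entries = {}   # (asid, virtual page) -> physical frame

    def insert(self, asid, vpage, frame):
        self.entries[(asid, vpage)] = frame

    def lookup(self, asid, vpage):
        return self.entries.get((asid, vpage))   # None means a TLB miss

    def flush_asid(self, asid):
        # Remove only the entries that belong to one address space.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != asid}


tlb = AsidTLB()
tlb.insert(asid=1, vpage=0x10, frame=100)   # process 1
tlb.insert(asid=2, vpage=0x10, frame=200)   # process 2, same virtual page

before = (tlb.lookup(1, 0x10), tlb.lookup(2, 0x10))
tlb.flush_asid(1)                           # process 1's page table changed
after = (tlb.lookup(1, 0x10), tlb.lookup(2, 0x10))
print(before, after)
```

Both processes can keep a translation for the same virtual page at once, and a change to one process's page table leaves the other process's entries in place.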

Write Invalidate

This protocol takes advantage of the fact that the processors already implement a cache coherence protocol by snooping the bus and responding to the commands and data that cross it. When a processor changes a page table entry, it issues a command on the bus, similar to the BusUpgr used in cache coherence, that tells the snooping processors to invalidate that entry in their TLBs. The PowerPC architecture uses this approach to maintain TLB coherence. [2]
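
The snooping behavior can be sketched as below. The Bus and SnoopingProcessor classes and the command name "TLB_INVALIDATE" are hypothetical, chosen for this example; they are not the PowerPC bus protocol.

```python
class Bus:
    """Toy shared bus that delivers each command to every attached snooper but the sender."""

    def __init__(self):
        self.snoopers = []

    def attach(self, snooper):
        self.snoopers.append(snooper)

    def broadcast(self, sender, command, vpage):
        for snooper in self.snoopers:
            if snooper is not sender:
                snooper.snoop(command, vpage)


class SnoopingProcessor:
    def __init__(self, bus):
        self.tlb = {}       # virtual page -> physical frame
        self.bus = bus
        bus.attach(self)

    def snoop(self, command, vpage):
        if command == "TLB_INVALIDATE":      # hypothetical command name
            self.tlb.pop(vpage, None)

    def update_page_table(self, page_table, vpage, new_frame):
        page_table[vpage] = new_frame
        self.tlb[vpage] = new_frame
        self.bus.broadcast(self, "TLB_INVALIDATE", vpage)


bus = Bus()
p0, p1, p2 = SnoopingProcessor(bus), SnoopingProcessor(bus), SnoopingProcessor(bus)
shared_page_table = {7: 70, 8: 80}
for p in (p0, p1, p2):
    p.tlb = dict(shared_page_table)

p0.update_page_table(shared_page_table, vpage=7, new_frame=71)
print(p0.tlb, p1.tlb, p2.tlb)
```

Unlike the shootdown sketch, no interrupt handler runs on the other processors here: the invalidation is a side effect of observing the bus command.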

Other Contemporary Issues

The growing prevalence of virtualization has driven many architectural changes in newer x86 processors. The TLBs were formerly managed entirely by the hardware, but to cope better with virtual machines, both Intel and AMD have added address space identifiers to the TLB so that the entire TLB need not be flushed on every context switch. [3]


References

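1: Linux Journal Article on Caching, http://www.linuxjournal.com/article/7105?page=0,1

2: Parallel Computer Architecture: A Hardware/Software Approach, by David E. Culler, http://books.google.com/books?id=g82fofiqa5IC&pg=PA440&lpg=PA440&dq=tlb+coherence&source=bl&ots=COtleqdaUp&sig=fCU_8vD9_PhadrY62lneWUMG57g&hl=en&ei=VWhsTfatNMH7lwek9e3-BA&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDYQ6AEwBQ#v=onepage&q=tlb%20coherence&f=false

3: Wikipedia on the TLB, http://en.wikipedia.org/wiki/Translation_lookaside_buffer#Virtualization_and_x86_TLB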