CSC 456 Spring 2012/10a AJ
Prefetching and Consistency Models
Memory consistency models ensure that memory operations are performed in a correct order, but enforcing that order can hinder efficiency. Because the consistency model dictates the order of execution, an operation is often delayed until the operations before it in program order have completed. Prefetching lets such an operation complete more quickly once its turn comes by bringing the necessary data into the cache before it is needed. It is a hardware optimization technique in which the processor automatically prefetches ownership for any write operation that is delayed by the program order requirement. Under sequential consistency, the latency of a delayed write is then partially overlapped with the operations that precede it in program order. This technique is only applicable to cache-based systems that use an invalidation-based protocol, and it is suitable for statically scheduled processors.
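The overlap described above can be illustrated with a rough timing model. This is only a sketch under stated assumptions, not a model of any particular machine: the function name `completion_time`, the 100-cycle miss latency, the 1-cycle hit latency, and the rule that one prefetch can be issued per cycle are all hypothetical values chosen for illustration.

```python
def completion_time(accesses, miss_latency=100, hit_latency=1, prefetch=False):
    """Cycles until the last access completes under sequential consistency.

    accesses: list of booleans in program order, True if the access misses.
    An access may only complete after the one before it has completed.
    Assumption: with prefetch=True, the request for access i is sent at
    cycle i, and requests are pipelined in the memory system.
    """
    time = 0
    for issue_cycle, is_miss in enumerate(accesses):
        if not is_miss:
            time += hit_latency
        elif prefetch:
            # The block (or its ownership) arrives miss_latency cycles after
            # the prefetch was sent, overlapping with the earlier accesses.
            arrival = issue_cycle + miss_latency
            time = max(time + hit_latency, arrival)
        else:
            # No prefetch: the miss starts only when its turn comes.
            time += miss_latency
    return time


# Three misses followed by one hit, all in program order.
accesses = [True, True, True, False]
print(completion_time(accesses))                 # 301
print(completion_time(accesses, prefetch=True))  # 103
```

In this model the order in which accesses complete is unchanged; only the waiting time of the delayed misses is hidden behind the accesses that precede them.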
Fixed vs. Adaptive Sequential Prefetching
Fixed sequential prefetching refers to prefetching that occurs at a constant rate over time. Because the amount of prefetching remains constant throughout execution, it is the simplest method of prefetching to implement. When block n is read, a fixed number (K) of consecutive blocks following it are prefetched if they are not already in the cache, regardless of whether the read was a miss or a hit. Finding the best value for K remains tricky, however. K must be large enough to lower the cold miss rate, yet small enough that useless data is not loaded into the cache, causing unnecessary traffic.
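The trade-off in choosing K can be seen in a minimal simulation. The sketch below makes several simplifying assumptions that are not from the source: the cache is unbounded, prefetched blocks arrive instantly, and the function name `simulate_fixed_prefetch` is hypothetical.

```python
def simulate_fixed_prefetch(block_trace, k):
    """Return (read_misses, prefetches_issued) for a trace of block numbers.

    Assumptions: unbounded cache, prefetches complete instantly.
    """
    cache = set()
    misses = 0
    prefetches = 0
    for n in block_trace:
        if n not in cache:
            misses += 1
            cache.add(n)
        # On every read, hit or miss, prefetch the next k blocks
        # that are not already cached.
        for b in range(n + 1, n + k + 1):
            if b not in cache:
                cache.add(b)
                prefetches += 1
    return misses, prefetches


sequential = list(range(16))   # blocks 0, 1, 2, ... 15
strided = [0, 8, 16, 24]       # no two reads touch neighboring blocks

print(simulate_fixed_prefetch(sequential, 0))  # (16, 0)
print(simulate_fixed_prefetch(sequential, 4))  # (1, 19)
print(simulate_fixed_prefetch(strided, 4))     # (4, 16)
```

On the sequential trace, K = 4 removes all but the first miss. On the strided trace the same K issues 16 prefetches and removes no misses, which is the unnecessary traffic a large K risks.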
Adaptive sequential prefetching, on the other hand, changes the rate of prefetching over time. The prefetching rate is increased or decreased based on the number of past successful prefetches (prefetched blocks that were actually used). The rate is increased when the ratio of useful prefetches to total prefetches rises above a particular threshold, and decreased when the same ratio falls below a second threshold. While both methods improve efficiency, adaptive sequential prefetching is the more efficient of the two but also the more costly. (Dahlgren)
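The two-threshold adjustment can be sketched as follows. This is an illustrative simplification, not the mechanism from the cited paper: the class name `AdaptivePrefetcher`, the threshold values, the window of 8 prefetches, the step size of 1, and the lower bound of 1 on the prefetch degree are all assumptions, as are the unbounded cache and instant prefetches.

```python
class AdaptivePrefetcher:
    """Sequential prefetcher whose degree k follows the useful-prefetch ratio.

    All parameter values are hypothetical and chosen for illustration.
    """

    def __init__(self, k=2, k_min=1, k_max=8, high=0.75, low=0.40, window=8):
        self.k, self.k_min, self.k_max = k, k_min, k_max
        self.high, self.low, self.window = high, low, window
        self.cache = set()    # assumption: unbounded cache
        self.tagged = set()   # prefetched blocks not yet read
        self.issued = 0       # prefetches issued in the current window
        self.useful = 0       # prefetched blocks read in the current window

    def read(self, n):
        """Read block n; return True on a hit."""
        hit = n in self.cache
        if hit and n in self.tagged:
            self.tagged.discard(n)
            self.useful += 1  # a prefetched block was actually used
        self.cache.add(n)
        for b in range(n + 1, n + self.k + 1):
            if b not in self.cache:
                self.cache.add(b)
                self.tagged.add(b)
                self.issued += 1
        if self.issued >= self.window:
            self._adapt()
        return hit

    def _adapt(self):
        ratio = self.useful / self.issued
        if ratio > self.high:
            self.k = min(self.k + 1, self.k_max)   # prefetches are paying off
        elif ratio < self.low:
            self.k = max(self.k - 1, self.k_min)   # mostly useless traffic
        self.issued = self.useful = 0


seq = AdaptivePrefetcher()
for block in range(64):            # sequential reads: k grows
    seq.read(block)

strided = AdaptivePrefetcher()
for block in range(0, 512, 16):    # widely spaced reads: k shrinks
    strided.read(block)

print(seq.k, strided.k)
```

Keeping k at or above 1 is a shortcut so that this sketch can always observe whether prefetches are useful; a real design that lets the degree reach zero needs some other way to detect when prefetching should resume.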
Exclusive-Mode Prefetching
Exclusive-mode prefetching helps to reduce both the miss latencies and the message traffic associated with writes. Unlike read misses, which directly stall the processor for their entire duration, write misses affect performance
Where they stand now
Prefetching began to fade because of its many disadvantages and because memory speeds started to catch up with processor transfer rates. One issue with prefetching is the increased complexity and overhead of handling the prefetching algorithms. There is a real risk that this overhead outweighs any benefit if the prefetching algorithm is not accurate and fetches too early or too late to be effective. Unless prefetching improves performance significantly, the added overhead and complexity are wasted.
Another problem arises with the introduction of multicore architectures. In a single-core architecture, all prefetch requests come from one core. With multiple cores, prefetch requests can originate from any of the cores. This puts additional stress on memory, which must handle not only regular requests but also prefetch requests from several sources, and it greatly increases the overhead and complexity of the logic. Coherence algorithms must account not only for sequential consistency issues, but also for data changing in one more location: wherever the prefetched data is held. Flushing data becomes significantly more complicated.
If prefetched data is stored in the data cache, cache conflicts can become a significant problem, because the current and the predicted sets of data must exist in the cache at the same time. Without prefetching, that space would simply serve as additional cache capacity. One solution is to add extra hardware that acts as a separate buffer for prefetched data, so that it does not occupy cache space.
- Kourosh Gharachorloo, Anoop Gupta, and John Hennessy of Stanford University, "Two techniques to enhance the performance of memory consistency models," International Conference on Parallel Processing, 1991
- Fredrik Dahlgren, Michel Dubois, and Per Stenstrom, "Sequential hardware prefetching in shared-memory multiprocessors," IEEE Trans. Parallel and Distributed Systems, July 1995
- Mark Hill, "Multiprocessors should support simple memory consistency models," IEEE Computer, August 1998
- Amir Kamil, Jimmy Su, and Katherine Yelick, "Making sequential consistency practical in Titanium," Supercomputing, SC-2005