CSC 456 Spring 2012/10a AJ: Difference between revisions
Revision as of 13:41, 16 April 2012
Prefetching and Consistency Models
Intro
Memory consistency models ensure that memory operations appear to execute in the correct order, but the ordering they impose can hinder efficiency. Because a consistency model dictates the order in which operations may complete, prefetching helps an operation finish sooner once its turn comes by bringing the necessary data into the cache before it is needed. In the hardware form of this optimization, the processor automatically prefetches ownership for any write operations that are delayed by the program order requirement (e.g., by issuing prefetch-exclusive requests for any writes delayed in the write buffer), thus partially overlapping the service of the delayed writes with the operations preceding them in program order. The technique applies only to cache-based systems that use an invalidation-based protocol, and it is suitable for statically scheduled processors.
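The overlap described above can be illustrated with a rough timing sketch. Everything here is an assumption made for illustration rather than a model of any particular machine: the function name `drain_time`, a fixed ownership latency, one ownership request issued per cycle, one cycle to perform a write once the line is owned, and no invalidation arriving between a prefetch and the write that uses it.

```python
def drain_time(num_writes, miss_latency, prefetch_exclusive):
    """Cycles needed to perform `num_writes` buffered write misses in program order.

    Hypothetical model: each write needs ownership of its line (taking
    `miss_latency` cycles) and then one cycle to update the cache.
    """
    time = 0
    for i in range(num_writes):
        if prefetch_exclusive:
            # The ownership request for write i was issued at cycle i,
            # while earlier writes were still waiting in the buffer.
            ready = i + miss_latency
        else:
            # The request is issued only when the write reaches the head
            # of the buffer, i.e. after every earlier write has completed.
            ready = time + miss_latency
        # Writes still complete strictly in program order.
        time = max(time, ready) + 1
    return time


if __name__ == "__main__":
    for prefetch in (False, True):
        print("prefetch-exclusive:", prefetch, "cycles:", drain_time(4, 100, prefetch))
```

In this model the writes still retire in program order, so the consistency model is respected; only the latency of obtaining ownership is overlapped.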
Methods
Fixed vs. Adaptive Sequential Prefetching
Fixed sequential prefetching prefetches at a constant rate: the number of consecutive blocks fetched ahead stays the same over time. Adaptive sequential prefetching, on the other hand, changes the prefetching rate over time, increasing or decreasing it based on the count of successful prefetches (prefetched blocks that are actually used). The rate therefore depends on the workload and application; a start-up process, for example, will have a high rate of cold misses. While both methods improve efficiency, adaptive sequential prefetching is the more efficient of the two as well as the more costly to implement.
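The difference between the two schemes can be sketched with a small simulation. This is a minimal illustration under assumed parameters, not the mechanism of any specific processor: the class name `SequentialPrefetcher`, the doubling and halving policy, the 0.75 and 0.25 thresholds, and the window of 16 issued prefetches are all hypothetical choices.

```python
from collections import OrderedDict


class SequentialPrefetcher:
    """Small fully associative LRU cache that prefetches the next blocks on a miss."""

    def __init__(self, capacity=64, degree=1, adaptive=False,
                 max_degree=8, window=16):
        self.capacity = capacity
        self.degree = degree          # blocks prefetched per miss
        self.adaptive = adaptive
        self.max_degree = max_degree
        self.window = window          # prefetches issued between adjustments
        self.lines = OrderedDict()    # block -> True while prefetched but unused
        self.misses = 0
        self.issued = 0
        self.useful = 0
        self._win_issued = 0
        self._win_useful = 0

    def _insert(self, block, prefetched):
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[block] = prefetched

    def _adapt(self):
        # Assumed policy: raise the degree when most recent prefetches were
        # used, lower it when most were wasted.
        ratio = self._win_useful / self._win_issued
        if ratio > 0.75:
            self.degree = min(self.max_degree, self.degree * 2)
        elif ratio < 0.25:
            self.degree = max(1, self.degree // 2)
        self._win_issued = self._win_useful = 0

    def access(self, block):
        if block in self.lines:
            if self.lines[block]:            # first use of a prefetched block
                self.lines[block] = False
                self.useful += 1
                self._win_useful += 1
            self.lines.move_to_end(block)
            return
        self.misses += 1
        if self.adaptive and self._win_issued >= self.window:
            self._adapt()
        self._insert(block, False)
        for nxt in range(block + 1, block + 1 + self.degree):
            if nxt not in self.lines:
                self._insert(nxt, True)
                self.issued += 1
                self._win_issued += 1


def run(trace, **kwargs):
    """Replay a list of block numbers and return the resulting cache object."""
    cache = SequentialPrefetcher(**kwargs)
    for block in trace:
        cache.access(block)
    return cache


if __name__ == "__main__":
    sequential = list(range(1024))
    strided = [i * 1000 for i in range(500)]
    for name, trace in (("sequential", sequential), ("strided", strided)):
        fixed = run(trace, degree=4)
        adapt = run(trace, degree=4, adaptive=True)
        print(name, "fixed:", fixed.misses, fixed.issued,
              "adaptive:", adapt.misses, adapt.issued, adapt.degree)
```

On a sequential trace the adaptive version raises its degree and misses less often; on a trace with no spatial locality it lowers the degree and issues fewer wasted prefetches. The extra counters and adjustment logic hint at why the adaptive scheme costs more to build.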
Exclusive-Mode Prefetching
Exclusive-mode prefetching helps to reduce both the miss latencies and the message traffic associated with writes. An exclusive-mode prefetch brings a line into the cache with ownership, so a later write to that line can proceed without a separate ownership request. Unlike read misses, which directly stall the processor for their entire duration, write misses affect performance more indirectly, since writes can be buffered. A processor stalls while waiting for writes to complete in two situations: (i) when executing a write instruction if the write buffer is full, and (ii) during a read miss if previous writes must complete before the read miss can proceed. The impact of the former effect can be reduced through larger write buffers. In summary, exclusive-mode prefetching can provide significant performance benefits in architectures that have not already eliminated write stall times through aggressive implementations of weaker consistency models with lockup-free caches. Even if write stall times cannot be further reduced, exclusive-mode prefetching can improve performance somewhat by reducing the traffic associated with cache coherency.
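The traffic reduction can be seen by counting messages for a read followed by a write to the same line. The sketch below is a simplified assumption, not a full protocol: a single cache line with three states, two messages (a request and a reply) per transaction, and no other sharers to invalidate. The names `CacheLine` and `read_modify_write` are hypothetical.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"   # owned by this cache; writes may proceed locally


class CacheLine:
    """One line in one cache under a simplified invalidation-based protocol."""

    def __init__(self):
        self.state = State.INVALID
        self.messages = 0       # network messages caused by this line
        self.write_misses = 0   # writes that had to wait for ownership

    def _request(self, new_state):
        self.messages += 2      # assumed: one request plus one reply
        self.state = new_state

    def prefetch(self, exclusive):
        if self.state is State.INVALID:
            self._request(State.EXCLUSIVE if exclusive else State.SHARED)

    def read(self):
        if self.state is State.INVALID:
            self._request(State.SHARED)

    def write(self):
        if self.state is not State.EXCLUSIVE:
            # Either a full write miss or an upgrade from the shared state.
            self.write_misses += 1
            self._request(State.EXCLUSIVE)


def read_modify_write(exclusive_prefetch):
    """Prefetch a line, read it, then write it; return (messages, write misses)."""
    line = CacheLine()
    line.prefetch(exclusive=exclusive_prefetch)
    line.read()
    line.write()
    return line.messages, line.write_misses


if __name__ == "__main__":
    print("read-mode prefetch:     ", read_modify_write(False))
    print("exclusive-mode prefetch:", read_modify_write(True))
```

With a read-mode prefetch the later write still needs an upgrade transaction; with an exclusive-mode prefetch the single prefetch transaction covers both the read and the write.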
Where they stand now
Interest in prefetching began to fade because of its many disadvantages and because memory speeds started to catch up with processor transfer rates. One issue with prefetching is the increased complexity and overhead of handling the prefetching algorithms. There is a great risk that this overhead outweighs any benefit if the prefetching algorithm is inaccurate and fetches too early or too late to be effective. Unless prefetching improves performance significantly, the added overhead and complexity are wasted.
Another problem comes with the introduction of multicore architectures. In a single-core architecture, prefetch requests come from only one core. With multiple cores, prefetch requests can originate from any of the cores. This puts additional stress on memory, which must handle not only regular prefetch requests but also prefetches from different sources, and it greatly increases the overhead and complexity of the logic. Coherence algorithms must account not only for sequential consistency issues but also for data changing in one more location: wherever the prefetched data is held. Flushing data becomes significantly more complicated.
If prefetched data is stored in the data cache, cache conflicts can become a significant problem, because both the data currently in use and the predicted data must reside in the cache at the same time. Without prefetching, that space would simply serve as additional cache capacity. One solution is to add extra hardware that acts as a separate prefetch buffer, so that prefetched data does not occupy cache space.
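The cache-conflict concern can be made concrete with a deliberately pessimistic sketch in which every prefetch is inaccurate. The working set fits the cache exactly, and one useless block is prefetched per access, placed either in the cache itself or in a separate buffer. The function names, the LRU policy, and the buffer size are assumptions for illustration only.

```python
from collections import OrderedDict


def _insert(store, key, capacity):
    """Insert `key` into an LRU-ordered store, evicting the oldest entry if full."""
    if key in store:
        store.move_to_end(key)
        return
    if len(store) >= capacity:
        store.popitem(last=False)
    store[key] = True


def count_misses(trace, capacity, use_prefetch_buffer, buffer_size=4):
    """Count demand misses when one mispredicted block is prefetched per access."""
    cache = OrderedDict()    # data cache, least recently used entry first
    buffer = OrderedDict()   # hypothetical separate prefetch buffer
    misses = 0
    for step, block in enumerate(trace):
        if block in cache:
            cache.move_to_end(block)
        elif block in buffer:
            # Found in the prefetch buffer: move it into the cache.
            del buffer[block]
            _insert(cache, block, capacity)
        else:
            misses += 1
            _insert(cache, block, capacity)
        # Worst case: the prefetcher guesses a block that is never used.
        wrong_guess = ("mispredicted", step)
        if use_prefetch_buffer:
            _insert(buffer, wrong_guess, buffer_size)
        else:
            _insert(cache, wrong_guess, capacity)
    return misses


if __name__ == "__main__":
    trace = list(range(8)) * 10   # loop over a working set of 8 blocks
    print("prefetch into cache: ", count_misses(trace, 8, use_prefetch_buffer=False))
    print("prefetch into buffer:", count_misses(trace, 8, use_prefetch_buffer=True))
```

When the useless prefetches share the cache, they keep evicting the working set; with the separate buffer, only the initial cold misses remain. An accurate prefetcher would narrow this gap, so the sketch shows the risk rather than a typical result.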