CSC/ECE 506 Spring 2012/10a vm

From Expertiza_Wiki

Landing page for "Prefetching and consistency models."

Prefetching

Introduction to prefetching

Almost all processors today use prefetching to speed up execution. Prefetching shortens the time a processor spends in a wait state by predicting which cache blocks will be accessed and fetching them before they are requested. An access that would otherwise miss then finds its block already in the cache, which reduces the processor's idle time. Prefetching works best when it follows program order, because the next blocks to be accessed are then easy to predict. Prefetching need not always follow program order, however. A processor doing branch prediction must guess the outcome of a branch before its condition has been computed and fetch the right set of instructions for execution along the predicted path. Things get more complex for graphics processing units (GPUs). There, prefetching can take advantage of spatial coherence, and the prefetched data are not instructions but texture elements (texels) that are mapped onto polygons.<ref>Instruction prefetch wiki article.</ref>
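The basic idea of prediction-driven prefetching can be illustrated with a small cache model. The sketch below is a hypothetical example, not taken from the source: it assumes a fully associative LRU cache (`LRUCache`, `count_misses`, the block size and the capacity are all illustrative choices) and a simple next-block prefetcher that, on every access to block b, also fetches block b+1. On a sequential stream, the prediction is almost always right.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache holding `capacity` blocks, LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> present; order = recency

    def contains(self, block):
        return block in self.blocks

    def insert(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)  # refresh recency on a hit
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = True


def count_misses(addresses, block_size=4, capacity=8, prefetch_next=False):
    """Count demand misses, optionally prefetching block b+1 on every access to b."""
    cache = LRUCache(capacity)
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if not cache.contains(block):
            misses += 1  # the processor would stall here
        cache.insert(block)
        if prefetch_next and not cache.contains(block + 1):
            cache.insert(block + 1)  # predict the next block and fetch it early


    return misses


stream = list(range(256))  # sequential walk over 64 blocks of 4 words
print(count_misses(stream))                      # 64
print(count_misses(stream, prefetch_next=True))  # 1
```

With this model, only the very first access misses once prefetching is enabled. On an irregular access pattern, the same next-block prediction would mostly fetch blocks that are never used, which is why prefetching pays off most when accesses follow a predictable order.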


Types of prefetching

Prefetching can be classified primarily by whether it is binding or non-binding and whether it is hardware or software controlled. With a binding prefetch, the value of a later reference (for example, a register load) is bound at the time the prefetch completes. This imposes restrictions, because the prefetched value may become stale: another processor may write the location, invalidating it, in the interval between the prefetch and the reference. With a non-binding prefetch, the data is only brought into the processor's cache, where it remains subject to the coherence protocol until the processor actually reads or writes it. Unlike a binding prefetch, a non-binding prefetch therefore cannot affect correctness under any consistency model, so it can be issued early and can yield a significant performance improvement.
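The difference between the two kinds of prefetch can be sketched with a toy two-processor system. This is a minimal, hypothetical model (the class `CoherentMemory` and the functions below are illustrative, not from the source). It assumes invalidation-based coherence and a program in which processor P0's load is supposed to observe P1's write to `x`. The binding prefetch fixes the old value in a register, while the non-binding prefetch only fills P0's cache, so coherence still delivers the new value.

```python
class CoherentMemory:
    """Shared memory plus private caches kept coherent by invalidation."""

    def __init__(self, n_procs, initial):
        self.values = dict(initial)
        self.caches = [dict() for _ in range(n_procs)]

    def read(self, p, addr):
        cache = self.caches[p]
        if addr not in cache:
            cache[addr] = self.values[addr]  # miss: fetch the current value
        return cache[addr]

    def write(self, p, addr, value):
        self.values[addr] = value
        for q, cache in enumerate(self.caches):
            if q != p:
                cache.pop(addr, None)  # invalidate other processors' copies
        self.caches[p][addr] = value


def binding_prefetch():
    mem = CoherentMemory(2, {"x": 0})
    register = mem.read(0, "x")  # binding: the value is fixed in a register now
    mem.write(1, "x", 42)        # P1 writes x before P0's load is performed
    return register              # P0's load uses the bound value


def non_binding_prefetch():
    mem = CoherentMemory(2, {"x": 0})
    mem.read(0, "x")             # non-binding: only fills P0's cache
    mem.write(1, "x", 42)        # P1's write invalidates P0's cached copy
    return mem.read(0, "x")      # P0's load misses and sees the new value


print(binding_prefetch())      # 0 (stale)
print(non_binding_prefetch())  # 42
```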

The kind of prefetch of interest here is the hardware-controlled, non-binding prefetch. Prefetching in software requires significant effort from the programmer and may not scale as programs grow longer. Non-binding prefetching, in turn, yields significantly better performance than binding prefetching. The main performance gain from prefetching comes from hiding the memory latency that the ordering constraints of every consistency model would otherwise expose to the processor.
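A back-of-the-envelope timing model shows why hiding latency matters under a strict consistency model. The sketch below is hypothetical: it assumes that without prefetching each of `n` accesses misses and must complete before the next one starts. With non-binding prefetching, it assumes prefetches for all `n` addresses are issued one every `issue_gap` cycles, while the accesses themselves still perform in program order. The parameter values are illustrative, not measurements, and bandwidth limits and contention are ignored.

```python
def serialized_cycles(n, miss_latency):
    """Each access misses and must complete before the next one starts."""
    return n * miss_latency


def prefetched_cycles(n, miss_latency, issue_gap, hit_latency):
    """Non-binding prefetches for all n accesses are issued one every
    issue_gap cycles; the accesses themselves still perform in program order."""
    t = 0
    for i in range(n):
        ready = i * issue_gap + miss_latency  # cycle when prefetch i completes
        t = max(t, ready) + hit_latency       # wait for the data and for access i-1
    return t


# Hypothetical numbers: 4 misses, 100-cycle latency, 1-cycle issue gap and hit.
print(serialized_cycles(4, 100))        # 400
print(prefetched_cycles(4, 100, 1, 1))  # 104
```

Because the prefetches overlap, the misses cost roughly one memory latency in total instead of one latency per access, even though the accesses still perform in order.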

For a read operation, a read prefetch is issued to bring the data into the cache in a read-only shared state. Because the prefetch is non-binding, the read is guaranteed to return the correct value when it is eventually performed, regardless of when the prefetch actually completed. The catch is that the prefetched copy may not stay valid until the read is performed. Suppose another processor writes to the location that was just read-prefetched. What happens next depends on whether the coherence scheme is invalidation-based or update-based. Under an update-based scheme, the writing processor sends a bus update, and the prefetching processor picks it up and updates its copy, so the later read still hits. Under an invalidation-based scheme, the prefetched copy is invalidated, the later read incurs a coherence miss, and the value is read again from memory, giving the illusion that the prefetch never occurred.
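The two outcomes can be reproduced with a small snooping-bus model. This is a hypothetical sketch (the names `SnoopyBus` and `prefetch_then_remote_write` are illustrative). It assumes write-through to memory for simplicity and does not model the shared/exclusive states explicitly. P0 read-prefetches `x`, P1 then writes it, and P0 finally performs its read. In both cases the read returns the new value; only whether it hits differs.

```python
class SnoopyBus:
    """Private caches plus memory; `protocol` is "update" or "invalidate"."""

    def __init__(self, protocol, n_procs, initial):
        self.protocol = protocol
        self.memory = dict(initial)
        self.caches = [dict() for _ in range(n_procs)]

    def read(self, p, addr):
        """Return (value, hit)."""
        cache = self.caches[p]
        if addr in cache:
            return cache[addr], True
        cache[addr] = self.memory[addr]  # miss: fetch from memory
        return cache[addr], False

    def prefetch_read(self, p, addr):
        self.read(p, addr)  # non-binding: just fill the cache, discard the value

    def write(self, p, addr, value):
        self.memory[addr] = value  # write-through, a simplifying assumption
        self.caches[p][addr] = value
        for q, cache in enumerate(self.caches):
            if q != p and addr in cache:
                if self.protocol == "update":
                    cache[addr] = value  # bus update refreshes the copy
                else:
                    del cache[addr]      # invalidation drops the copy


def prefetch_then_remote_write(protocol):
    bus = SnoopyBus(protocol, 2, {"x": 0})
    bus.prefetch_read(0, "x")  # P0 read-prefetches x
    bus.write(1, "x", 7)       # P1 writes x before P0's read
    return bus.read(0, "x")    # P0's actual read


print(prefetch_then_remote_write("update"))      # (7, True)
print(prefetch_then_remote_write("invalidate"))  # (7, False)
```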

For a write operation, a read-exclusive prefetch can be used to acquire exclusive ownership of the line. Because the line is then cached in an exclusive state, the write can proceed without sending an invalidation or update to other caches that held the same block; those copies were already invalidated by the prefetch. The write completes quickly because the block is already cached and there is no coherence miss, which reduces the processor's idle time. Note, however, that a read-exclusive prefetch is only possible when the coherence scheme is invalidation-based, not update-based. The same caveat as in the read case applies: if another processor writes to the same block before the write is performed, exclusive ownership is lost and the prefetched block is invalidated.
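The effect of a read-exclusive prefetch can be sketched by counting bus transactions in an invalidation-based model. This is a hypothetical illustration (`InvalidationBus` and `run` are illustrative names, and data values are not tracked). P0 issues a read-exclusive prefetch and later writes the block. Without interference, the write hits with no further bus traffic. If P1 writes the block in between, P0's copy is invalidated and its write must request ownership again.

```python
class InvalidationBus:
    """Invalidation-based protocol; tracks which caches hold a block exclusively."""

    def __init__(self, n_procs):
        self.states = [dict() for _ in range(n_procs)]  # addr -> "exclusive"
        self.bus_transactions = 0

    def read_exclusive(self, p, addr):
        self.bus_transactions += 1  # one read-exclusive request on the bus
        for q, states in enumerate(self.states):
            if q != p:
                states.pop(addr, None)  # every other copy is invalidated
        self.states[p][addr] = "exclusive"

    def write(self, p, addr):
        """Perform a write; return True if it hit in an exclusively held line."""
        if self.states[p].get(addr) == "exclusive":
            return True  # no coherence action needed
        self.read_exclusive(p, addr)  # write miss: obtain ownership first
        return False


def run(interfering_write):
    bus = InvalidationBus(2)
    bus.read_exclusive(0, "x")  # P0 issues a read-exclusive prefetch
    if interfering_write:
        bus.write(1, "x")       # P1 writes the same block first
    hit = bus.write(0, "x")     # P0's actual write
    return hit, bus.bus_transactions


print(run(False))  # (True, 1)
print(run(True))   # (False, 3)
```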

References

<references />