
Introduction to Update and Adaptive Coherence Protocols on Real Architectures

In parallel computer architectures, cache coherence refers to the consistency of data stored in the individual processors' caches and in the shared memory. The problem is that there are multiple caches on multiple processors: when one processor updates a shared value in its own cache, every other cache holding a copy of that value must be made coherent with the change. This is illustrated below.


Figure 1. Multiple Caches of Shared Resource


According to <ref name="glasco">Glasco, D.B.; Delagi, B.A.; Flynn, M.J., "Update-based cache coherence protocols for scalable shared-memory multiprocessors," Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, vol. 1, pp. 534-545, 4-7 Jan. 1994. doi: 10.1109/HICSS.1994.323135</ref>, there are two ways to maintain cache consistency: invalidation and update. The difference is that invalidation "purges the copies of the line from the other caches which results in a single copy of the line, and updating forwards the write value to the other caches, after which all caches are consistent"<ref name="glasco"></ref>. Since invalidation protocols have already been discussed, we will focus on update protocols here; a newer class of adaptive coherence protocols is also discussed below.


According to the Solihin textbook (page 229), "One of the drawbacks of an invalidate-based protocol is that it incurs high number of coherence misses." That is, when a read is made to a block that has been invalidated, the read misses in the cache, and serving that miss incurs high latency. To avoid these misses, one can use an update coherence protocol or an adaptive coherence protocol.
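
To make this trade-off concrete, below is a minimal sketch (an illustration, not code from the article or the cited paper) that models two caches sharing one value under a producer/consumer pattern. It counts the coherence misses the consumer's reads would take under an invalidate-based protocol versus the bus updates an update-based protocol would send instead; the Cache struct and the counter names are assumptions made only for this example.

<pre>
// Sketch: invalidate vs. update on a producer/consumer sharing pattern.
#include <iostream>

struct Cache { bool valid = true; int value = 0; };   // one cached copy of the shared line

int main() {
    const int iterations = 1000;   // producer writes, consumer reads each time
    int coherence_misses = 0;      // invalidate protocol: consumer must re-fetch
    int bus_updates = 0;           // update protocol: writer pushes the new value

    Cache producer, consumer;

    // Invalidate-based: every producer write invalidates the consumer's copy,
    // so the consumer's next read is a coherence miss that must be served.
    for (int i = 0; i < iterations; ++i) {
        producer.value = i;        // write hit in the producer's cache
        consumer.valid = false;    // invalidation sent on the bus
        if (!consumer.valid) {     // consumer read finds its copy invalid
            ++coherence_misses;
            consumer = producer;   // miss served (cache-to-cache or from memory)
        }
    }

    // Update-based: every producer write forwards the new value, so the
    // consumer's copy stays valid and its reads always hit.
    for (int i = 0; i < iterations; ++i) {
        producer.value = i;
        consumer.value = i;        // word forwarded over the bus
        ++bus_updates;
    }

    std::cout << "invalidate: " << coherence_misses << " coherence misses\n"
              << "update:     " << bus_updates << " bus updates, 0 read misses\n";
}
</pre>

The point of the sketch is only that the invalidation path turns every consumer read into a miss, while the update path trades those misses for bus update traffic.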

Update Coherence Protocol

Introduction

Update-based cache coherence protocols work by directly updating the copies of a block in all caches in the system. This differs from invalidation-based protocols because write propagation is achieved without invalidations and the coherence misses that follow them, saving the misses themselves, the time spent serving them, and the bandwidth they consume. One of the update-based protocols we will discuss is the Dragon Protocol.

Dragon Protocol

The Dragon Protocol is an implementation of an update-based coherence protocol. It further saves bandwidth by updating only the specific word that was written instead of the entire block. The caches use write-allocate and write-update policies. The Dragon Protocol has four states (Modified (M), Exclusive (E), Shared Modified (Sm), Shared Clean (Sc)) and does not include an invalidation state, because any block present in a cache is valid.

  • Modified (M) - cache block is exclusively owned, but may differ from main memory
  • Exclusive (E) - cache block is clean and resides in only one cache
  • Shared Modified (Sm) - cache block resides in multiple caches and is possibly dirty; this cache is responsible for updating main memory
  • Shared Clean (Sc) - cache block resides in multiple caches and is clean

There is no invalidation state, because if a block is cached it is assumed to be valid; however, it can differ from main memory. Below are the finite state machines for the processor-side requests and the bus-side (snooper) requests.
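
As a textual complement to those state diagrams, the sketch below encodes the processor-side transitions over the four Dragon states, assuming the usual read/write events (with miss variants) and a copies_exist flag standing in for the COPIES_EXIST (C) bus signal; the bus-side (snooper) transitions are omitted. The enum values and function name are illustrative assumptions, not actual Dragon hardware logic.

<pre>
// Sketch of Dragon processor-side state transitions (snooper side omitted).
#include <iostream>

enum class State { Modified, Exclusive, SharedModified, SharedClean };
enum class Event { PrRdMiss, PrWrMiss, PrRd, PrWr };

// Returns the next state of the local block; copies_exist is the value of the
// COPIES_EXIST (C) signal sampled when a bus transaction is issued.
State dragon_processor_side(State s, Event e, bool copies_exist) {
    switch (e) {
    case Event::PrRdMiss:   // read miss: BusRd issued
        return copies_exist ? State::SharedClean : State::Exclusive;
    case Event::PrWrMiss:   // write miss: BusRd issued, plus BusUpd if shared
        return copies_exist ? State::SharedModified : State::Modified;
    case Event::PrRd:       // read hit: no state change in any state
        return s;
    case Event::PrWr:       // write hit
        switch (s) {
        case State::Modified:       return State::Modified;   // silent hit
        case State::Exclusive:      return State::Modified;   // silent upgrade, no bus traffic
        case State::SharedClean:                                // BusUpd issued
        case State::SharedModified:                             // BusUpd issued
            return copies_exist ? State::SharedModified : State::Modified;
        }
        return s;
    }
    return s;
}

int main() {
    // Example: a write hit to a Shared Clean block while other copies exist
    // issues a bus update and moves the block to Shared Modified.
    State next = dragon_processor_side(State::SharedClean, Event::PrWr, true);
    std::cout << (next == State::SharedModified ? "Sm" : "other") << "\n";
}
</pre>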

The Dragon Protocol was developed by Xerox and first implemented in the Xerox Dragon multiprocessor workstation; it has also been used in the Cray CS6400.

Firefly Protocol

The Firefly protocol is another example of an update-based coherence protocol. However, unlike the Dragon Protocol, it uses a write-through policy for shared blocks, so a write is propagated to main memory as well as to the other caches. Its three states are:

  • Valid Exclusive (VE) - cache block is exclusively owned and clean
  • Dirty (D) - cache block is exclusively owned and dirty (differs from main memory)
  • Shared (S) - cache block resides in multiple caches and is not modified (write-through keeps memory up to date)

The Firefly Protocol uses a special bus line called the SharedLine to quickly detect whether copies of a block exist in other caches. It plays the same role as the COPIES_EXIST (C) / !COPIES_EXIST signal and is shown that way in the finite state machines below. As with the Dragon protocol, there is no invalidation state because cache blocks are never invalidated.
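
In the same spirit as the Dragon sketch above, the following is an illustrative rendering of the Firefly transitions over its three states, with a shared_line flag standing in for the SharedLine signal; the names and structure are assumptions made for this example, not DEC's implementation.

<pre>
// Sketch of Firefly state transitions using the SharedLine result.
#include <iostream>

enum class State { ValidExclusive, Dirty, Shared };
enum class Event { RdMiss, WrMiss, RdHit, WrHit };

// shared_line is the value of the SharedLine asserted by other caches when the
// transaction appears on the bus.
State firefly_next_state(State s, Event e, bool shared_line) {
    switch (e) {
    case Event::RdMiss:   // other caches (or memory) supply the block
        return shared_line ? State::Shared : State::ValidExclusive;
    case Event::WrMiss:   // shared copies and memory are updated (write-through)
        return shared_line ? State::Shared : State::Dirty;
    case Event::RdHit:    // read hit: no state change
        return s;
    case Event::WrHit:    // write hit
        switch (s) {
        case State::Dirty:          return State::Dirty;          // no bus traffic
        case State::ValidExclusive: return State::Dirty;          // silent upgrade
        case State::Shared:
            // The write is broadcast to the other caches and written through
            // to memory; if no cache still asserts SharedLine, this is now the
            // only copy and it is clean.
            return shared_line ? State::Shared : State::ValidExclusive;
        }
        return s;
    }
    return s;
}

int main() {
    // Example: a write hit to a Shared block while SharedLine is still asserted
    // keeps the block Shared and updates the other copies plus main memory.
    std::cout << (firefly_next_state(State::Shared, Event::WrHit, true) == State::Shared
                      ? "stays Shared\n" : "other\n");
}
</pre>

Note that because a write to a Shared block is written through to memory, a block whose last sharer disappears can return to Valid Exclusive (clean) rather than Dirty.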

The Firefly protocol was used in the DEC Firefly multiprocessor workstation, developed by Digital Equipment Corporation. The system is asymmetric, and the caches are direct-mapped to support multiprocessing. The cache capacity was 16KB with the original MicroVAX 78032 microprocessor and was later increased to 64KB when the system was upgraded to the CVAX 78034 microprocessor.

Adaptive Coherence Protocol

Introduction

Power Considerations

Quiz

References

<references />