CSC/ECE 506 Spring 2010/ch 11 maf

From Expertiza_Wiki
Revision as of 15:10, 25 April 2010 by Mafashin (talk | contribs)

Real Cache Coherence Protocols

DASH Coherence Protocol

The DASH multiprocessor used a two-level coherence protocol: a snoopy bus ensures cache coherence within a cluster, and a directory-based protocol ensures coherence across clusters. Each cluster has a Remote Access Cache (RAC), which essentially consolidates memory blocks from remote clusters into a single cache on the local snoopy bus. When a processor requests a block from a remote cluster that is not in the RAC, the bus request is denied, but the request is also forwarded to the owner. The owner supplies the block to the RAC, so when the requestor eventually retries, the block will be waiting in the RAC.
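The deny-and-retry behavior of the RAC can be illustrated with a small sketch. This is a toy model, not DASH's hardware design: the class name `RemoteAccessCache`, the `fetch_from_owner` callback (standing in for the inter-cluster network), and the `deliver_replies` step are all hypothetical names introduced for illustration.

```python
class RemoteAccessCache:
    """Toy model of one cluster's RAC (all names here are illustrative)."""

    def __init__(self, fetch_from_owner):
        # addr -> data; a value of None means the entry is allocated
        # but the reply from the owner has not arrived yet.
        self.entries = {}
        self.pending = []                          # requests already forwarded
        self.fetch_from_owner = fetch_from_owner   # hypothetical network stand-in

    def bus_read(self, addr):
        """Return ("hit", data) if the block is present, else ("retry", None)."""
        if self.entries.get(addr) is not None:
            return ("hit", self.entries[addr])
        if addr not in self.entries:
            self.entries[addr] = None    # allocate the RAC entry
            self.pending.append(addr)    # forward the request exactly once
        return ("retry", None)           # bus request is denied for now

    def deliver_replies(self):
        """Simulate the owner's replies arriving at the RAC."""
        for addr in self.pending:
            self.entries[addr] = self.fetch_from_owner(addr)
        self.pending.clear()
```

In this sketch the first `bus_read` for a missing block returns a retry and queues one forwarded request; once `deliver_replies` has run, the retried `bus_read` hits in the RAC.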

There are too many cases to express the protocol succinctly as a finite state machine. Instead, the handling of Read and ReadX requests is summarized in the following subsections. For a more detailed treatment, refer to the appendix of Lenoski 1990, which describes the protocol in pseudo-code.

Read

 if (Data held locally in shared state by processor or RAC)
   Other cache(s) supply data for fill;
 
 else if (Data held locally in dirty state by processor or RAC) {
   Dirty cache supplies data for fill and goes to shared state;
   if (Memory Home is Local)
     Writeback Data to main memory;
   else
     RAC takes data in shared-dirty state;
   }
 
 else if (Memory home is Local) {
   if (Directory entry state != Dirty-Remote)
     Memory supplies read data;
   else {
     Allocate RAC entry, mask arbitration and force retry;
     Forward Read Request to Dirty Cluster;
     PCPU on Dirty Cluster issues read request;
     Dirty cache supplies data and goes to shared state;
     DC sends shared data reply to local cluster;
     Local RC gets reply and unmasks processor arbitration;
     Upon local processor read, RC supplies data and the
       RAC entry goes to shared state;
     Directory entry state = Shared-Remote;
     }
   }
 
 else { /* Memory home is Remote */
   Allocate RAC entry, mask arbitration and force retry;
   Local DC sends read request to home cluster;
   if (Directory entry state != Dirty-Remote) {
     Directory entry state = Shared-Remote, update vector;
     Home DC sends reply to local RC;
     Local RC gets reply and unmasks processor arbitration;
     }
   else {
     Home DC forwards Read Request to dirty cluster;
     PCPU on dirty cluster issues read request and DC sends
       reply to local cluster and sharing writeback to home;
     Local RC gets reply and unmasks processor arbitration;
     Home DC gets sharing writeback, writes back dirty data,
       Directory entry state = Shared-Remote, update vector;
     }
   Upon local processor read, RC supplies the data and the
     RAC entry goes to shared state;
   }

When a Read request is issued, it first goes to the snoopy bus of the local cluster. If the block is already in the local RAC or in a local processor's cache, then one of these caches supplies the data. If the block is dirty and the local cluster is the home cluster, the owner also flushes the block to main memory and transitions to the shared state. If the block is dirty and the local cluster is not the home cluster, the owner instead flushes to the local RAC, which becomes the new owner and holds the block in the shared-dirty state. In this last case, the block is shared within the local cluster, but the directory at the home cluster still considers it dirty.
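The locally satisfied Read cases can be sketched as a small decision function. This is only an illustration of the first two branches of the pseudo-code; the function name `local_read_hit`, the string state names, and the shape of the return value are assumptions made for this sketch.

```python
def local_read_hit(holder_state, home_is_local):
    """Resolve a Read that hits in the local cluster (illustrative sketch).

    Returns (supplier_new_state, rac_state, writeback_to_memory), where
    rac_state is None when the RAC's state is not changed by the request.
    """
    if holder_state == "shared":
        # Another local cache supplies the data; nothing else changes.
        return ("shared", None, False)
    if holder_state == "dirty":
        if home_is_local:
            # Home memory is on this cluster, so the dirty data is written back.
            return ("shared", None, True)
        # Home is remote: the RAC keeps ownership in the shared-dirty state.
        return ("shared", "shared-dirty", False)
    raise ValueError("block is not held locally; the directory must be consulted")
```

For example, `local_read_hit("dirty", False)` models a dirty block whose home is remote: the supplying cache drops to shared and the RAC takes the block in the shared-dirty state without a writeback to memory.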

If the data is not already held locally, but the local cluster is the home cluster and the block is not held in the dirty state by a remote cluster, then memory supplies the data. However, if the block is dirty in a remote cluster, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the owning cluster. The owner flushes the block and transitions to the shared state. The directory transitions to the shared state as well.

If the data is not already held locally and the local cluster is not the home cluster, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the home cluster. If the block is not dirty in another cluster, the home cluster sends the data to the requestor's RAC. If the block is dirty in another cluster, the home forwards the request to the owning cluster; the owner transitions to the shared state, sends the data to the requestor's RAC, and sends a sharing writeback to the home cluster so that memory is updated. In both cases the directory transitions to the shared state and the sharing vector is updated.
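The directory side of a Read that misses in the requestor's cluster can be sketched as follows. The sketch follows the pseudo-code listing, in which a dirty owner replies directly to the requestor. The `DirectoryEntry` class, the function `home_handle_read`, the lowercase state names, and the use of integer cluster IDs are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    """Toy directory entry; the directory only tracks remote clusters."""
    state: str = "uncached"                     # "uncached", "shared", or "dirty"
    sharers: set = field(default_factory=set)   # remote clusters in the vector
    owner: Optional[int] = None                 # remote cluster holding it dirty


def home_handle_read(entry, requester, home):
    """Update the directory for a Read and return the cluster supplying data."""
    if entry.state == "dirty":
        supplier = entry.owner
        # The old owner keeps a shared copy and writes the data back to home.
        entry.sharers = {entry.owner}
        entry.owner = None
    else:
        supplier = home                         # home memory has a valid copy
    if requester != home:
        entry.sharers.add(requester)            # update the sharing vector
    entry.state = "shared" if entry.sharers else "uncached"
    return supplier
```

Because the directory in this sketch records only remote sharers, a Read issued by the home cluster itself to an uncached block leaves the entry uncached.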

ReadX

When a ReadX request is issued, it first goes to the snoopy bus of the local cluster. If the block is being held dirty in the local RAC or in a local processor's cache, then the owner supplies the data and transitions to the invalid state.

Otherwise, if the local cluster is the home cluster and the block is in the uncached or shared state in the directory, then memory supplies the data and other local caches transition to the invalid state. If the block is in the shared state in the directory, then the directory must also send out invalidation requests to the remote clusters that are in the sharing vector and transition the block to the uncached state in the directory. However, if the block is in the dirty state in the directory, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the owning cluster. The owner flushes the block and transitions to the invalid state. The directory transitions the block to the uncached state.
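The home-cluster ReadX case can be sketched as a function of the directory state. This is a simplified illustration: the function name `readx_at_home`, the lowercase state names, the `"memory"` marker, and the tuple it returns are assumptions of this sketch rather than part of the DASH specification.

```python
def readx_at_home(state, sharers, owner):
    """Handle a ReadX issued by a processor in the home cluster (sketch).

    Returns (new_directory_state, clusters_to_invalidate, supplier, must_retry).
    """
    if state == "dirty":
        # A remote owner must flush the block first, so the bus request is
        # denied and retried once the data has come back.
        return ("uncached", [], owner, True)
    # Uncached or shared: memory is valid, so it supplies the data at once.
    invalidate = sorted(sharers) if state == "shared" else []
    return ("uncached", invalidate, "memory", False)
```

In every branch the directory ends in the uncached state, since after the ReadX no remote cluster holds a copy of the block.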

Otherwise, if the local cluster is not the home cluster, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the home cluster. If the block is in the uncached or shared state in the directory, the home cluster sends the data to the requestor's RAC and the directory transitions to the dirty state. If the block was in the shared state, the directory must also send invalidation requests to the sharing clusters. However, if the block is in the dirty state in the directory, the request is forwarded to the owning cluster. The owner sends the dirty block to the requestor's RAC, sends an acknowledgement to the home cluster, and transitions to the invalid state.
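The remote ReadX case can be sketched in the same style. The function name `readx_from_remote`, the dictionary keys, and the `"home"` marker are hypothetical. The sketch also makes two assumptions that the text implies but does not state: the requestor is skipped when invalidations are sent, and the owner's acknowledgement lets the home directory record the requestor as the new owner.

```python
def readx_from_remote(state, sharers, owner, requester):
    """Handle a ReadX that arrives at the home from another cluster (sketch)."""
    if state == "dirty":
        # Home forwards the request; the old owner supplies the data,
        # acknowledges home, and invalidates its own copy.
        # Assumption: the acknowledgement updates the recorded owner.
        return {"state": "dirty", "owner": requester,
                "invalidate": [], "supplier": owner}
    if state == "shared":
        # Assumption: the requester itself is not sent an invalidation.
        invalidate = sorted(c for c in sharers if c != requester)
    else:
        invalidate = []
    # Uncached or shared: home memory is valid and supplies the data.
    return {"state": "dirty", "owner": requester,
            "invalidate": invalidate, "supplier": "home"}
```

Whatever the starting state, the directory in this sketch ends in the dirty state with the requestor as the owner; only the data supplier and the set of invalidations differ.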

References