CSC/ECE 506 Spring 2010/ch 11 maf

Revision as of 16:25, 25 April 2010

Real Cache Coherence Protocols

DASH Coherence Protocol

The DASH multiprocessor uses a two-level coherence protocol, relying on a snoopy bus to keep caches coherent within a cluster and a directory-based protocol to keep them coherent across clusters. Each cluster has a Remote Access Cache (RAC), which consolidates memory blocks from remote clusters into a single cache on the local snoopy bus. When a processor requests a remote block that is not in the RAC, its bus request is denied, forcing a retry, but the request is also forwarded (through the home cluster's directory) to the block's owner. The owner supplies the block to the RAC, so when the requestor eventually retries, the block is waiting in the RAC.
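
The deny-and-retry behaviour of the RAC can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not DASH's hardware interface: the class `RemoteAccessCache`, its methods and the `PENDING` marker are hypothetical names, and the interconnect is reduced to an `outbox` list plus an explicit `deliver_reply` call standing in for the owner's reply.

```python
from dataclasses import dataclass, field

PENDING = object()  # hypothetical marker: RAC entry allocated, reply not yet received


@dataclass
class RemoteAccessCache:
    """Hypothetical model of one cluster's RAC (not DASH's actual hardware interface)."""
    entries: dict = field(default_factory=dict)  # block address -> data or PENDING
    outbox: list = field(default_factory=list)   # requests forwarded off-cluster

    def processor_read(self, addr):
        """One bus attempt. Returns the data, or None if the request is denied."""
        entry = self.entries.get(addr, None)
        if entry is None:
            # Miss: allocate an entry, forward the request, deny the bus request.
            self.entries[addr] = PENDING
            self.outbox.append(addr)
            return None
        if entry is PENDING:
            # Reply still outstanding. In DASH the processor's arbitration is
            # masked until the reply arrives; this branch only guards the model.
            return None
        return entry

    def deliver_reply(self, addr, data):
        """The owner's reply fills the entry allocated on the first miss."""
        self.entries[addr] = data


rac = RemoteAccessCache()
first = rac.processor_read(0x40)   # denied, request forwarded
early = rac.processor_read(0x40)   # still denied, no second request forwarded
rac.deliver_reply(0x40, "block data")
late = rac.processor_read(0x40)    # retry finds the block waiting in the RAC
```

The allocated entry is what lets a single forwarded request serve the eventual retry, so only one request leaves the cluster.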

Because the local snoopy bus of the home cluster keeps the caches in that cluster coherent with home memory, the directory does not track the block's status in the home cluster. Instead, it tracks the state of the block only in the remote clusters. This contrasts with the protocols discussed by Solihin 2008, which do not incorporate a local bus and therefore use directories that track the state of the block in every cache.
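
A directory entry that records only remote clusters might look like the following sketch. The three state names come from the pseudocode below; the class names, the method `record_remote_read`, and the use of an integer bit vector for the sharing vector are assumptions made for illustration.

```python
from enum import Enum


class DirState(Enum):
    UNCACHED_REMOTE = "Uncached-Remote"
    SHARED_REMOTE = "Shared-Remote"
    DIRTY_REMOTE = "Dirty-Remote"


class DirectoryEntry:
    """Hypothetical directory entry kept by a block's home cluster.

    Only remote clusters are recorded; copies inside the home cluster are
    kept coherent with home memory by the home cluster's snoopy bus.
    """

    def __init__(self, home: int):
        self.home = home
        self.state = DirState.UNCACHED_REMOTE
        self.vector = 0  # bit i set => remote cluster i may hold the block

    def record_remote_read(self, cluster: int) -> None:
        """Note a completed read (assumes any dirty copy was already written back)."""
        if cluster == self.home:
            return  # home-cluster readers are invisible to the directory
        self.vector |= 1 << cluster
        self.state = DirState.SHARED_REMOTE

    def sharers(self) -> list:
        return [i for i in range(self.vector.bit_length()) if self.vector >> i & 1]


entry = DirectoryEntry(home=0)
entry.record_remote_read(0)  # read from the home cluster: no directory change
entry.record_remote_read(3)  # read from remote cluster 3
```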

There are too many cases to express the protocol succinctly as a finite state machine. Instead, the protocol for servicing Read and ReadX requests is summarized in the following subsections. For a more detailed elaboration, Lenoski et al. 1990 describe the protocol in pseudo-code.

Read

 // "Figure 7: Normal flow of read request bus transaction."
 // From Lenoski et al. 1990 (see References).
 
 if (Data held locally in shared state by processor or RAC)
   Other cache(s) supply data for fill;
 
 else if (Data held locally in dirty state by processor or RAC) {
   Dirty cache supplies data for fill and goes to shared state;
   if (Memory Home is Local)
     Writeback Data to main memory;
   else
     RAC takes data in shared-dirty state;
   }
 
 else if (Memory home is Local) {
   if (Directory entry state != Dirty-Remote)
     Memory supplies read data;
   else {
     Allocate RAC entry, mask arbitration and force retry;
     Forward Read Request to Dirty Cluster;
     PCPU on Dirty Cluster issues read request;
     Dirty cache supplies data and goes to shared state;
     DC sends shared data reply to local cluster;
     Local RC gets reply and unmasks processor arbitration;
     Upon local processor read, RC supplies data and the
       RAC entry goes to shared state;
     Directory entry state = Shared-Remote;
     }
   }
 
 else { /* Memory home is Remote */
   Allocate RAC entry, mask arbitration and force retry;
   Local DC sends read request to home cluster;
   if (Directory entry state != Dirty-Remote) {
     Directory entry state = Shared-Remote, update vector;
     Home DC sends reply to local RC;
     Local RC gets reply and unmasks processor arbitration;
   } else {
     Home DC forwards Read Request to dirty cluster;
     PCPU on dirty cluster issues read request and DC sends
       reply to local cluster and sharing writeback to home;
     Local RC gets reply and unmasks processor arbitration;
     Home DC gets sharing writeback, writes back dirty data,
       Directory entry state = Shared-Remote, update vector;
     }
   Upon local processor read, RC supplies the data and the
     RAC entry goes to shared state;
   }

When a Read request is issued, it first goes to the snoopy bus of the local cluster. If the block is already in the local RAC or in a local processor's cache, one of these caches supplies the data. If the block is dirty and the local cluster is the home cluster, the owner also writes the block back to main memory and transitions to the shared state. If the block is dirty and the local cluster is a remote cluster, the owner flushes the block to the local RAC and transitions to the shared state; the RAC takes the block in the shared-dirty state and becomes the new owner. In this last case, the block is shared within the cluster but is still considered dirty by the rest of the system.
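
The local-bus phase of a Read can be summarized as a small decision function. This sketch is hypothetical: `LocalState`, `service_read_locally` and the action strings are illustrative names, and the model only reports which actions occur rather than moving any data.

```python
from enum import Enum
from typing import List, Optional


class LocalState(Enum):
    SHARED = "shared"
    DIRTY = "dirty"


def service_read_locally(holder: Optional[LocalState], home_is_local: bool) -> Optional[List[str]]:
    """Local-bus phase of a Read (hypothetical helper).

    `holder` is the state of the block in a local processor cache or the RAC,
    or None if no local copy exists. Returns the actions taken, or None if
    the request must go to home memory or the directory.
    """
    if holder is LocalState.SHARED:
        return ["a local cache supplies the data"]
    if holder is LocalState.DIRTY:
        actions = ["dirty cache supplies the data", "dirty cache goes to shared"]
        if home_is_local:
            actions.append("data written back to main memory")
        else:
            # Shared inside the cluster, still dirty as far as the home is concerned.
            actions.append("RAC takes the data in shared-dirty state")
        return actions
    return None  # no local copy: consult home memory or the directory
```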

If the data is not held locally but the local cluster is the home cluster, memory supplies the data unless the directory shows the block dirty in a remote cluster. In that case, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the owning cluster, where the owner flushes the block, transitions to the shared state, and sends the data back to the requesting cluster's RAC, which supplies it when the processor retries. The directory transitions the block to the shared state as well.
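
The home-local case reduces to one test on the directory state. In this hypothetical sketch, `home_local_read` returns the bus outcomes the requesting processor sees (a denied attempt followed by data, or data immediately) together with the new directory state; `fetch_from_owner` is an assumed callback standing in for forwarding the request to the dirty cluster.

```python
from typing import Callable, List, Optional, Tuple

Outcome = Tuple[str, Optional[str]]


def home_local_read(state: str, memory_data: str,
                    fetch_from_owner: Callable[[], str]) -> Tuple[List[Outcome], str]:
    """Read that missed locally and whose home is this cluster (hypothetical).

    `state` is the directory state ("Uncached-Remote", "Shared-Remote" or
    "Dirty-Remote"). Returns (bus outcomes in order, new directory state).
    """
    if state != "Dirty-Remote":
        # No remote cluster holds the block dirty: memory supplies it and the
        # directory, which tracks only remote clusters, is unchanged.
        return [("data", memory_data)], state
    # Dirty in a remote cluster: deny and retry while the request is forwarded.
    reply = fetch_from_owner()  # owner supplies the block and goes to shared
    # On the retry, the RAC supplies the block that the owner sent back.
    return [("retry", None), ("data", reply)], "Shared-Remote"
```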

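If the data is not held locally and the local cluster is not the home cluster, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the home cluster. If no remote cluster holds the block dirty, the home cluster sends the data to the requestor's RAC, and the directory transitions to the shared state and adds the requestor to the sharing vector. If another remote cluster owns the block, the home cluster forwards the request to it; the owner sends the data directly to the requestor's RAC, sends a sharing writeback to the home cluster, and transitions to the shared state. On receiving the sharing writeback, the home cluster writes the dirty data back to memory, and the directory transitions to the shared state and updates the sharing vector.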
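
The message flow for a Read whose home is another cluster can be traced with a short function. The function name, the message strings and the cluster labels `C1`, `C2`, ... are hypothetical; the messages themselves follow the pseudocode above (reply from home, or a forwarded request answered directly by the dirty cluster plus a sharing writeback).

```python
from typing import FrozenSet, List, Optional, Tuple


def remote_home_read(state: str, vector: FrozenSet[int], requestor: int,
                     owner: Optional[int] = None) -> Tuple[List[str], str, FrozenSet[int]]:
    """Messages for a Read whose home is another cluster (hypothetical trace).

    `vector` is the directory's set of remote clusters holding the block;
    `owner` must be given when `state` is "Dirty-Remote".
    Returns (messages, new directory state, new vector).
    """
    msgs = [f"C{requestor} -> home: Read"]
    if state != "Dirty-Remote":
        msgs.append(f"home -> C{requestor}: data reply")
    else:
        if owner is None:
            raise ValueError("owner required for Dirty-Remote")
        msgs += [
            f"home -> C{owner}: forwarded Read",
            f"C{owner} -> C{requestor}: data reply",  # straight to the requestor's RAC
            f"C{owner} -> home: sharing writeback",   # home writes back the dirty data
        ]
    # Either way the block ends up Shared-Remote with the requestor added.
    return msgs, "Shared-Remote", vector | {requestor}
```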

ReadX

 // "Figure 9: Normal flow of read-exclusive request bus transaction."
 // From Lenoski et al. 1990 (see References).
 
 if (Data held locally in dirty state by processor or RAC)
   Dirty cache supplies Read-Exclusive fill data and
     invalidates self;
 
 else if (Memory Home is Local) {
   switch (Directory entry state) {
 
     case Uncached-Remote :
       Memory supplies data, any locally cached copies
         are invalidated;
       break;
 
     case Shared-Remote:
       RC allocates an entry in RAC with DC specified
         invalidate acknowledge count;
       Memory supplies data, any locally cached copies are
         invalidated;
       Local DC sends invalidate request to shared clusters;
       Dir. entry state = Uncached-Remote, update vector;
       Upon receipt of all acknowledges RC deallocates RAC
         entry;
       break;
 
     case Dirty-Remote :
       Allocate RAC entry, mask arbitration and force retry;
       Forward Read-Exclusive Request to dirty cluster;
       PCPU at dirty cluster issues Read-Ex request,
         Dirty cache supplies data and invalidates self;
       DC in dirty cluster sends reply to local RC;
       Local RC gets reply from dirty cluster and unmasks
         processor arbitration;
       Upon local processor re-Read-Ex, RC supplies data,
         RAC entry is deallocated and
         Dir. entry state = Uncached-Remote, update vector;
     }
   }
 else /* Memory Home is Remote */ {
   RC allocates RAC entry, masks arbitration and forces retry;
   Local DC sends Read-Exclusive request to home;
   switch (Directory entry state) {
 
     case Uncached-Remote :
       Home memory supplies data, any locally cached copies
         are invalidated, Home DC sends reply to local RC;
       Directory entry state = Dirty-Remote, update vector;
       Local RC gets Read-Ex reply with zero invalidation
         count and unmasks processor for arbitration;
       Upon local processor re-Read-Ex, RC supplies data and
         RAC entry is deallocated;
       break;
 
     case Shared-Remote :
       Home memory supplies data, any locally cached copies
         are invalidated, Home DC sends reply to local RC;
       Home DC sends invalidation requests to sharing
         clusters;
       Directory entry state = Dirty-Remote, update vector;
       Local RC gets reply with data and invalidate acknowledge
         count and unmasks processor for arbitration;
       Upon local processor re-Read-Ex, RC supplies data;
       Upon receipt of all acknowledges RC deallocates RAC
         entry;
       break;
 
     case Dirty-Remote :
       Home DC forwards Read-Ex request to dirty cluster;
       PCPU at dirty cluster issues Read-Ex request,
         Dirty cache supplies data and invalidates self;
       DC in dirty cluster sends reply to local RC with
         acknowledge count of one and sends Dirty Transfer
         request to home;
       Local RC gets reply and acknowledge count and unmasks
         processor for arbitration;
       Upon local processor re-Read-Ex, RC supplies data;
       Upon receipt of Dirty Transfer request, Home DC
         sends acknowledgement to local RC,
         Home Dir. entry state = Dirty-Remote, update vector;
       Upon receipt of acknowledge RC deallocates RAC entry;
     }
   }

When a ReadX request is issued, it first goes to the snoopy bus of the local cluster. If the block is being held dirty in the local RAC or in a local processor's cache, then the owner supplies the data and transitions to the invalid state.
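
The local-bus phase of a ReadX can be sketched as follows. The dictionary layout (cache name to `{address: state}`), the cache names and the state strings are assumptions made for illustration; the sketch treats the RAC as just another local cache, as the text does.

```python
from typing import Dict, Optional


def readx_local(caches: Dict[str, Dict[int, str]], requestor: str, addr: int) -> Optional[str]:
    """Local-bus phase of a ReadX (hypothetical helper).

    `caches` maps each local cache name (processor caches and "RAC") to its
    {address: state} contents. If another local cache holds the block dirty,
    it supplies the data and invalidates itself; the requestor becomes the
    dirty owner. Returns the supplier's name, or None if the request must
    go to the home cluster.
    """
    for name, lines in caches.items():
        if name != requestor and lines.get(addr) == "dirty":
            del lines[addr]                    # owner invalidates itself
            caches[requestor][addr] = "dirty"  # requestor now owns the block
            return name
    return None  # no dirty local copy
```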

Otherwise, if the local cluster is the home cluster and the block is in the uncached or shared state in the directory, memory supplies the data and other local caches transition to the invalid state. If the block is in the shared state in the directory, the directory must also send invalidation requests to the remote clusters in the sharing vector and transition the block to the uncached state; the requestor's RAC holds an entry until all invalidation acknowledgements have arrived. However, if the block is in the dirty state in the directory, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the owning cluster. The owner supplies the block to the requestor's RAC and transitions to the invalid state, and the directory transitions the block to the uncached state.
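
The three home-local ReadX cases map naturally onto a switch over the directory state. This is a hypothetical sketch: `readx_home_local` and its action strings are illustrative, and it returns the number of invalidation acknowledgements the RAC entry waits for so the Shared-Remote case is visible.

```python
from typing import FrozenSet, List, Tuple


def readx_home_local(state: str, vector: FrozenSet[int]) -> Tuple[List[str], int, str]:
    """ReadX that missed locally and whose home is this cluster (hypothetical).

    `vector` holds the remote clusters recorded by the directory. Returns
    (actions, invalidation acknowledgements the RAC entry waits for,
    new directory state). The directory always ends in Uncached-Remote
    because the only copy is now inside the home cluster.
    """
    if state == "Uncached-Remote":
        return ["memory supplies data", "local copies invalidated"], 0, "Uncached-Remote"
    if state == "Shared-Remote":
        actions = ["memory supplies data", "local copies invalidated"]
        actions += [f"invalidate C{c}" for c in sorted(vector)]
        return actions, len(vector), "Uncached-Remote"
    if state == "Dirty-Remote":
        (owner,) = vector  # exactly one remote owner
        actions = ["request denied, retry",
                   f"forward Read-Ex to C{owner}",
                   f"C{owner} supplies data and invalidates itself",
                   "RAC supplies data on retry"]
        return actions, 0, "Uncached-Remote"
    raise ValueError(f"unknown directory state: {state}")
```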

Otherwise, if the local cluster is not the home cluster, the bus request is denied, prompting a retry. Meanwhile, the request is forwarded to the home cluster. If the block is in the uncached or shared state in the directory, the home cluster sends the data to the requestor's RAC and the directory transitions to the dirty state, recording the requestor as the owner. If the block is in the shared state, the home cluster must also send invalidation requests to the sharing clusters, and the requestor's RAC waits for their acknowledgements. However, if the block is in the dirty state in the directory, the request is forwarded to the owning cluster. The owner sends the dirty block to the requestor's RAC, sends a Dirty Transfer request to the home cluster, and transitions to the invalid state. When the home cluster receives the Dirty Transfer request, it updates the sharing vector to name the requestor as the new owner (the block stays in the dirty state) and sends an acknowledgement to the requestor's RAC.
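
The remote-home ReadX, including the Dirty Transfer handshake, can be traced as below. The function name, message strings and cluster labels are hypothetical, and the sketch assumes the requestor is not already in the sharing vector; acknowledgements sent by invalidated clusters are not traced, only the count the requestor's RAC waits for.

```python
from typing import FrozenSet, List, Optional, Tuple


def readx_remote_home(state: str, vector: FrozenSet[int], requestor: int,
                      owner: Optional[int] = None) -> Tuple[List[str], int, str, FrozenSet[int]]:
    """Messages for a ReadX whose home is another cluster (hypothetical trace).

    Returns (messages, acknowledgements the requestor's RAC waits for before
    deallocating its entry, new directory state, new vector).
    """
    if requestor in vector:
        raise ValueError("sketch assumes the requestor holds no copy")
    msgs = [f"C{requestor} -> home: Read-Ex"]
    if state == "Uncached-Remote":
        msgs.append(f"home -> C{requestor}: data, 0 acks expected")
        acks = 0
    elif state == "Shared-Remote":
        msgs.append(f"home -> C{requestor}: data, {len(vector)} acks expected")
        msgs += [f"home -> C{c}: invalidate" for c in sorted(vector)]
        acks = len(vector)
    elif state == "Dirty-Remote":
        if owner is None:
            raise ValueError("owner required for Dirty-Remote")
        msgs += [f"home -> C{owner}: forwarded Read-Ex",
                 f"C{owner} -> C{requestor}: data, 1 ack expected",
                 f"C{owner} -> home: Dirty Transfer",
                 f"home -> C{requestor}: acknowledgement"]
        acks = 1
    else:
        raise ValueError(f"unknown directory state: {state}")
    # In every case the requestor becomes the sole (dirty) remote owner.
    return msgs, acks, "Dirty-Remote", frozenset({requestor})
```

In the Dirty-Remote case the single expected acknowledgement comes from the home cluster, after it has recorded the ownership change.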

References