CSC/ECE 506 Fall 2007/wiki3 2 tl

Directory Based Cache Coherence

Scalable distributed-memory machines are made up of nodes connected by a network. Each node comprises a processor with its cache, a memory unit, and a communication assist, which acts as the interface between the processor and the network. To obtain cache coherence on a physically distributed system whose interconnect cannot be snooped, we can use a flat directory-based scheme. A flat directory-based scheme can be either memory-based or cache-based and exhibits the following characteristics:

  • In a flat directory scheme, directory information is located in a fixed place, typically at the home node where the memory is located.
  • To locate copies of the data:
    • Memory-based: all the sharing information is held in the directory at the home node.
    • Cache-based: the home node holds a pointer to the first element of a linked list of sharers.
  • To communicate with those copies:
    • Memory-based: uses point-to-point messages, which can be multicast or overlapped.
    • Cache-based: uses point-to-point messages that traverse the linked list to find and communicate with each copy.
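
The difference in how the two schemes locate copies can be sketched in a few lines of Python. This is only an illustrative model, not an implementation of any real protocol: the names `directory`, `head`, and `next_ptr` and the block address `0x40` are assumptions made for the example.

```python
# Hypothetical model contrasting how the two flat schemes locate sharers.

def sharers_memory_based(directory, block):
    """Memory-based: the home directory entry lists every sharer directly."""
    return sorted(directory.get(block, set()))

def sharers_cache_based(head, next_ptr, block):
    """Cache-based: the home holds only the head; follow per-cache pointers."""
    found = []
    node = head.get(block)               # pointer stored at the home node
    while node is not None:
        found.append(node)
        node = next_ptr[(node, block)]   # pointer stored in that node's cache line
    return found

# Block 0x40 is cached by nodes 1, 3 and 5 in both models (made-up data).
directory = {0x40: {1, 3, 5}}
head = {0x40: 5}
next_ptr = {(5, 0x40): 3, (3, 0x40): 1, (1, 0x40): None}

print(sharers_memory_based(directory, 0x40))
print(sharers_cache_based(head, next_ptr, 0x40))
```

In the memory-based case one lookup at the home node yields all sharers, so messages to them can be sent in parallel. In the cache-based case each sharer is discovered only after the previous one has been visited, which serializes the traversal.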

Simple Scalable Coherent Interface (SSCI)

In a cache-based directory scheme, every shareable block in memory is associated with a list of the processors sharing that block. The home node maintains a pointer to the first sharer plus state bits; this is the head pointer for the block. Each node with a cached copy maintains two additional pointers per cache line, to the next and previous sharers; these are the forward and backward pointers.
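
A minimal sketch of this sharing list, assuming that a new sharer is inserted at the head and that a departing sharer unlinks itself by patching its two neighbours. The class `SharingList` and its method names are hypothetical, and the model ignores messages, races, and state bits.

```python
class SharingList:
    """Hypothetical model of one block's sharing list (not the SCI wire protocol)."""

    def __init__(self):
        self.head = None   # head pointer kept at the home node
        self.fwd = {}      # node id -> next sharer (forward pointer)
        self.bwd = {}      # node id -> previous sharer (backward pointer)

    def add_sharer(self, node):
        """Assumed policy: the new sharer becomes the head of the list."""
        old = self.head
        self.fwd[node] = old
        self.bwd[node] = None
        if old is not None:
            self.bwd[old] = node
        self.head = node

    def remove_sharer(self, node):
        """Unlink a node (e.g. on replacement) by patching its neighbours."""
        prev, nxt = self.bwd.pop(node), self.fwd.pop(node)
        if prev is None:
            self.head = nxt        # the node was the head
        else:
            self.fwd[prev] = nxt
        if nxt is not None:
            self.bwd[nxt] = prev

    def sharers(self):
        """Walk the list from the head, following forward pointers."""
        out, node = [], self.head
        while node is not None:
            out.append(node)
            node = self.fwd[node]
        return out


lst = SharingList()
for n in (1, 3, 5):
    lst.add_sharer(n)
lst.remove_sharer(3)
print(lst.sharers())
```

The backward pointer is what lets a node in the middle of the list remove itself without a traversal from the head.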

Flat Cache-based Directory Scheme

The SSCI protocol is an example of a flat cache-based directory scheme. It is a simplified version of the SCI protocol, which we examine briefly below. The directory entries can be distributed along with the memory, and the high-order bits of an address can be used to identify the node that holds the memory and the directory entries for that address. The SSCI scheme uses a full bit-vector approach: it maintains a vector that keeps track of the states in the cache, the states in the memory directory, and a pointer to the processors that share a particular block.
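
As a rough illustration of using high-order address bits to find the home node, the sketch below splits an address into a node number and a local offset. The 32-bit address width, the 4-bit node field, and the per-node dictionary layout are assumptions chosen for the example; the text does not specify them.

```python
# Assumed sizes for illustration only; the text does not specify them.
ADDR_BITS = 32                       # width of a physical address
NODE_BITS = 4                        # high-order bits naming the home node
OFFSET_BITS = ADDR_BITS - NODE_BITS

def home_node(addr):
    """High-order bits select the node holding the memory and directory entry."""
    return addr >> OFFSET_BITS

def local_offset(addr):
    """Remaining low-order bits locate the block within that node's memory."""
    return addr & ((1 << OFFSET_BITS) - 1)

# One directory per node, each indexed by local offset (hypothetical layout).
directories = [dict() for _ in range(1 << NODE_BITS)]

def directory_entry(addr):
    """Find (or create) the directory entry for addr at its home node."""
    entry = directories[home_node(addr)]
    return entry.setdefault(local_offset(addr), {"state": "U"})

addr = 0x30001000
print(home_node(addr), hex(local_offset(addr)), directory_entry(addr))
```

Because the home node is a pure function of the address, any node can route a request to the right directory without a lookup table.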

Cache states in the SSCI can be labeled:

  1. M (Modified)
  2. E (Exclusive)
  3. S (Shared)
  4. I (Invalid)

States in the memory directory may have the value of:

  1. U (Unowned)
  2. S (Shared)
  3. EM (Exclusive/Modified)

In addition, there is a local node, where the request originates; a home node, where the memory and directory live; and a remote node, which has a copy of the block (exclusive or shared).
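
The two sets of states can be written down as enumerations. The transition functions that follow them are an assumption: they show one conventional MESI-style reading of how the directory state might change on a miss, and are not taken from the SSCI description above.

```python
from enum import Enum

class CacheState(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class DirState(Enum):
    U = "Unowned"
    S = "Shared"
    EM = "Exclusive/Modified"

def on_read_miss(dir_state):
    """Assumed behaviour: return (new directory state, state granted locally)."""
    if dir_state is DirState.U:
        # No other copy exists, so the local node can take the block exclusively.
        return DirState.EM, CacheState.E
    # Already shared, or a remote owner is downgraded: everyone ends up sharing.
    return DirState.S, CacheState.S

def on_write_miss(dir_state):
    """Assumed behaviour: other copies are invalidated; the writer holds M."""
    return DirState.EM, CacheState.M

print(on_read_miss(DirState.U))
print(on_read_miss(DirState.EM))
print(on_write_miss(DirState.S))
```

Note that the directory's single EM state covers both the E and M cache states, since the home node cannot tell whether an exclusive holder has silently modified its copy.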

The Scalable Coherent Interface (SCI)

The Scalable Coherent Interface (SCI) is a cache-coherent memory model that can be used in systems of up to 64K nodes. SCI's flexibility stems mainly from its communication protocol: in contrast to many earlier systems, it is not restricted to either a message-based or a shared-memory programming model but combines both. Because SCI also provides a distributed directory-based cache coherence protocol, the computer architect can choose from a broad range of execution models, including efficient message-passing architectures as well as shared-memory models in either their NUMA or CC-NUMA variants.

The core feature of SCI-based networks is the ability to perform remote memory operations through direct hardware support for distributed shared memory (DSM). The figure below gives a general overview of how this capability can be applied. The basis is the SCI physical address space, which allows any physical memory location on any connected node to be addressed through a 64-bit identifier (16 bits to specify the node, 48 bits to specify the physical address). From this global address space, each node can import pieces of remote memory into a designated address window within the PCI address space, using special address translation tables on the SCI adapter cards. After the PCI address space is mapped into the virtual memory of a process, the remote memory can be accessed directly using standard user-level read and write operations. The SCI hardware forwards these operations transparently to the remote node and, in the case of a read operation, returns the result. Because the implementation is purely in hardware and avoids any software overhead, extremely low latencies of about 1.8 µs (one way) can be achieved.
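
The 64-bit identifier described above can be illustrated with a pair of helper functions. This sketch assumes the 16-bit node ID occupies the most significant bits; the function names `sci_pack` and `sci_unpack` are hypothetical and do not belong to any SCI driver API.

```python
NODE_BITS, OFFSET_BITS = 16, 48          # field widths given in the text
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def sci_pack(node, offset):
    """Combine a node ID and a 48-bit physical address into one 64-bit value."""
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node ID must fit in 16 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset must fit in 48 bits")
    return (node << OFFSET_BITS) | offset   # assumed: node in the top 16 bits

def sci_unpack(addr):
    """Split a 64-bit SCI address back into (node ID, physical address)."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK

addr = sci_pack(2, 0x1000)
print(hex(addr), sci_unpack(addr))
```

A 16-bit node field allows 65,536 distinct node IDs, which matches the 64K-node limit mentioned earlier.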

References and Links

  1. James, D.V.; Laundrie, A.T.; Gjessing, S.; Sohi, G.S., "Distributed-directory scheme: scalable coherent interface," Computer, vol.23, no.6, pp.74-77, Jun 1990 URL: http://www.lib.ncsu.edu:2178/iel5/2/2005/00055503.pdf?isnumber=2005&prod=JNL&arnumber=55503&arnumber=55503&arSt=74&ared=77&arAuthor=James%2C+D.V.%3B+Laundrie%2C+A.T.%3B+Gjessing%2C+S.%3B+Sohi%2C+G.S.
  2. Gustavson, D.B., "The Scalable Coherent Interface and Related Standards Projects," IEEE Micro, vol.12, no.1, pp.10-22, Jan 1992 DOI: http://dx.doi.org/10.1109/40.124376
  3. Gustavson, D.B.; Qiang Li, "The Scalable Coherent Interface (SCI)," Communications Magazine, IEEE, vol.34, no.8, pp.52-63, Aug 1996 URL: http://www.lib.ncsu.edu:2178/iel1/35/11187/00533919.pdf?isnumber=11187&prod=STD&arnumber=533919&arnumber=533919&arSt=52&ared=63&arAuthor=Gustavson%2C+D.B.%3B+Qiang+Li
  4. Alnaes, K.; Kristiansen, E.H.; Gustavson, D.B.; James, D.V., "Scalable Coherent Interface," CompEuro '90. Proceedings of the 1990 IEEE International Conference on Computer Systems and Software Engineering, pp.446-453, 8-10 May 1990 URL: http://www.lib.ncsu.edu:2178/iel5/285/3365/00113656.pdf?isnumber=3365&prod=STD&arnumber=113656&arnumber=113656&arSt=446&ared=453&arAuthor=Alnaes%2C+K.%3B+Kristiansen%2C+E.H.%3B+Gustavson%2C+D.B.%3B+James%2C+D.V.