CSC/ECE 506 Fall 2007/wiki3 2 tl: Difference between revisions

From Expertiza_Wiki
Jump to navigation Jump to search
 
(50 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Simple Scalable Coherent Interface (SSCI)==
==Directory Based Cache Coherence==


Scalable distributed memory machines are made up of various nodes connected by a network. Each of these nodes is comprised of a processor with cache, a memory unit and a communication assist unit, which acts as the interface between the processor and the network. To obtain cache coherence on a physically distributed system built from numerous nodes without a interconnect which can be snooped, we can use a flat directory based scheme. In a flat directory scheme, directory information is located in a fixed place, typically at the home node where the memory is located.  
Scalable distributed memory machines are made up of various nodes connected by a network. Each of these nodes is comprised of a processor with cache, a memory unit and a communication assist unit, which acts as the interface between the processor and the network. To obtain cache coherence on a physically distributed system built from numerous nodes without a interconnect which can be snooped, we can use a flat directory based scheme. A flat directory based scheme can be either memory-based or cache-based and exhibit the following characteristics:
*In a flat directory scheme, directory information is located in a fixed place, typically at the home node where the memory is located.
*To locate copies of the data
**Memory-based: all the information is held in the directory at the home node.
**Cache-based: home node has pointer to first element of a linked list
*To communicate with those copies
**Memory-based: uses point-to-point messages which can be multicast or overlapped.
**Cache-based: uses a point-to-point linked list traversal to find communicate.
 
===Simple Scalable Coherent Interface (SSCI)===


In a cache-based directory scheme, every shareable block in memory is associated with a list of processors sharing that block. The home node maintains a pointer to first sharer plus state bits, which is the head pointer for the block. Each node with a cached copy maintains two additional pointers for each cache line to next and previous sharer, which are the forward and backward pointers.
In a cache-based directory scheme, every shareable block in memory is associated with a list of processors sharing that block. The home node maintains a pointer to first sharer plus state bits, which is the head pointer for the block. Each node with a cached copy maintains two additional pointers for each cache line to next and previous sharer, which are the forward and backward pointers.


[[Image:pict.jpg|thumnail|center|300px|Flat Cache-based Directory Scheme]]
[[Image:ccc.jpg|frame|center|300px|Flat Cache-based Directory Scheme [http://peace.snu.ac.kr/courses/parallelprocessing/materials/lecture11.ppt]]]
 
The SSCI protocol is an example of a flat cache-based directory scheme. The SSCI protocol is a simplified version of the SCI protocol which also uses the [http://en.wikipedia.org/wiki/MESI MESI protocol] in the caches. The directory entries can be distributed along with the memory and the high order bits of an address can be used to identify the location of the memory and directory entries for that portion of memory. The SSCI protocol scheme makes use of a full bit-vector approach by maintaining a vector which keeps track of the states in the cache, the states in the memory directory and a pointer to the processors that share a particular block.
 
Cache states in the SSCI can be labeled:
#M (modified)
#E (exclusive)
#S (Shared)
#I (Invalid)
States in the  memory directory may have the value of:
#U (Uncached)
#S (Shared)
#EM (Exclusive Modified)
 
In addition, there is a '''''local node''''' or requestor node, where the request originates, a '''''home node''''' where the memory and directory live and a '''''remote node''''' that has a copy of the block (exclusive or shared). 
 
In handling a read miss, a transaction takes place from the requestor node to the home node. Home either replies with the data or with head of the list of sharer nodes. The head pointer is changed to point to the requestor node and the requestor sends a transaction to first sharer to be inserted into the list. In handling a write miss, the requestor node obtains the head information from the home node and the requestor is inserted as new head into list. If the requestor is on the list it is deleted and reinserted as the head of the list. The list is then traversed and copies are invalidated.
 
===The Scalable Coherent Interface (SCI)===
 
The  [http://en.wikipedia.org/wiki/Scalable_Coherent_Interconnect Scalable Coherent Interface (SCI)] is a cache coherent memory model that can be used in systems up to 64K nodes. SCI's versatility stems from the fact that it combines the best of both message-based and shared-memory programming models. SCI uses point-to-point links so the concurrent data transfers can occur. The above SSCI protocol is a simplified version of the SCI protocol and therefore acts in much the same manner. For each block of memroy, there is a centralized “directory” that maintains the state of the block in the different caches. The directory is co-located with the corresponding memory and requests and replies on the interconnect are no longer seen by everyone but are limited to the specific nodes involved in the request.


The SSCI protocol is an example of a flat cahce-based directory scheme. The SSCI protocol is a simplified version of the SCI protocol which we will examine briefly. The SSCI protocol scheme makes use of a full bit-vector approach by maintaining a vector which keeps track of the staetes in the cache, the states in the memory directory and a pointer to the processors that share a particular block.
Memory states in SCI are defined as:
*''Home'':  no cached copy exists
*''Fresh'':  read-only copies may exist and the copy in the memory is valid.
*''Gone'':  a writable (either exclusive or dirty) copy exists and no valid copy exists on the local node.
The cached states differs significantly from the SSCI protocol in that SCI has 29 stable states and many pending states. The additional states in the SCI protocol exist to specify the location and state of a block of memory within the list of sharers. The stable states have two parts:
*The position in the sharing list
**Head, Tail, Mid, Only
*The actual state
**Dirty: modified or writable
**Clean: writable but it's the same value that's in memory
**Copy: readable
[[Image:ddd.jpg|frame|center|300px|Read miss in SCI Protocol <sup>[5]</sup>]]


==The Scalable Coherent Interface (SCI)==
The current head of a list may be in a ''pending'' state when a request arrives from another node. If this is the case, it is handled with a pending list where requests are linked in list starting at the current head. When the current head finishes, it passes sets the head of the list to the next requester in the pending list.


The Scalable Coherent Interface (SCI) is a cache coherent memory model that can be used in systems up to 64K nodes.
The above figure looks at the instance when a read request in handled. The requesting cache is in a pending state and sends a read transaction to the home node.The home node, if in the HOME state, means that no other cached copy exists so it changes its state to FRESH, updates its head pointer and replies to the requesting node that changes its state to ONLY_FRESH (it's a single node list). However, if the home node is in the FRESH state, then cached copies exist somewhere but the home node value is valid. Therefore, the home node updates its head pointer and sends the data as well as the previous list head. The requester tells the previous head and the previous head changes its state from ONLY_FRESH to TAIL_VALID and updates its backward pointer. The requester then receives an acknowledgment from the previous head, sets its forward pointer to the previous head and changes its state to HEAD_FRESH.
SCI's flexibility stems mainly from its communication protocol: In contrast to many former systems, it is not only restricted to either message-based or shared-memory programming models. Instead, it rather combines both. By also providing a distributed directory-based cache coherence protocol, it is up to the computer architect to choose from a broad range of execution models, including efficient message passing architectures as well as shared-memory models that can feature both of its NUMA or CC-NUMA variants.


The core feature of SCI based networks is the ability to perform remote memory operations through direct hardware distributed shared memory (DSM) support. The figure below gives a general overview of how this capability can be applied. The basis is formed by the SCI physical address space which allows addressing of any physical memory location on any connected node through a 64 bit identifier (16 bit to specify the node, 48 bit to specify the physical address). From this global address space, each node can import pieces of remote memory into a designated address window within the PCI address space using special address translation tables on the SCI adapter cards. After mapping the PCI address space into the virtual memory of a process, the remote memory can be directly accessed using standard user--level read and write operations. The SCI hardware forwards these operations transparently to the remote node and, in case of a read operation, returns the result. Due to the pure hardware implementation avoiding any software overhead, extremely low latencies of about 1.8 us (one way) can be achieved.
There are also the cases when there is a read request when the home node is in the GONE state, handling a write request and handling a write request when the node is the head of the list. For examples of these and a description of what occurs I refer you to [http://peace.snu.ac.kr/courses/parallelprocessing/materials/lecture11.ppt] as well as the links and references listed at the end.


==References and Links==
==References and Links==


#James, D.V.; Laundrie, A.T.; Gjessing, S.; Sohi, G.S., "Distributed-directory scheme: scalable coherent interface," Computer , vol.23, no.6, pp.74-77, Jun 1990 URL: http://www.lib.ncsu.edu:2178/iel5/2/2005/00055503.pdf?isnumber=2005∏=JNL&arnumber=55503&arnumber=55503&arSt=74&ared=77&arAuthor=James%2C+D.V.%3B+Laundrie%2C+A.T.%3B+Gjessing%2C+S.%3B+Sohi%2C+G.S.
#James, D.V.; Laundrie, A.T.; Gjessing, S.; Sohi, G.S., "Distributed-directory scheme: scalable coherent interface," Computer , vol.23, no.6, pp.74-77, Jun 1990 URL: http://www.lib.ncsu.edu:2178/iel5/2/2005/00055503.pdf?isnumber=2005∏=JNL&arnumber=55503&arnumber=55503&arSt=74&ared=77&arAuthor=James%2C+D.V.%3B+Laundrie%2C+A.T.%3B+Gjessing%2C+S.%3B+Sohi%2C+G.S.
#Gustavson, D. B. 1992. The Scalable Coherent Interface and Related Standards Projects. IEEE Micro 12, 1 (Jan. 1992), 10-22. DOI= http://dx.doi.org/10.1109/40.124376
#Gustavson, D. B. 1992. "The Scalable Coherent Interface and Related Standards Projects". IEEE Micro 12, 1 (Jan. 1992), 10-22. DOI= http://dx.doi.org/10.1109
#Gustavson, D.B.; Qiang Li, "The Scalable Coherent Interface (SCI)," Communications Magazine, IEEE , vol.34, no.8, pp.52-63, Aug 1996 URL: http://www.lib.ncsu.edu:2178/iel1/35/11187/00533919.pdf?isnumber=11187∏=STD&arnumber=533919&arnumber=533919&arSt=52&ared=63&arAuthor=Gustavson%2C+D.B.%3B+Qiang+Li/40.124376
#Alnaes, K.; Kristiansen, E.H.; Gustavson, D.B.; James, D.V., "Scalable Coherent Interface," CompEuro '90. Proceedings of the 1990 IEEE International Conference on Computer Systems and Software Engineering , vol., no., pp.446-453, 8-10 May 1990 URL: http://www.lib.ncsu.edu:2178/iel5/285/3365/00113656.pdf?isnumber=3365∏=STD&arnumber=113656&arnumber=113656&arSt=446&ared=453&arAuthor=Alnaes%2C+K.%3B+Kristiansen%2C+E.H.%3B+Gustavson%2C+D.B.%3B+James%2C+D.V.
#David E. Culler, Jaswinder Pal Singh. Parallel Computer Architecture: A Hardware-Software Approach. With Anoop Gupta. Morgan Kaufmann Publishers, 1998.
#http://engr.smu.edu/~rewini/slides/adv-arch-bk/chapter04.pdf
#http://peace.snu.ac.kr/courses/parallelprocessing/materials/lecture11.ppt

Latest revision as of 23:36, 17 October 2007

Directory Based Cache Coherence

Scalable distributed memory machines are made up of various nodes connected by a network. Each of these nodes is comprised of a processor with cache, a memory unit and a communication assist unit, which acts as the interface between the processor and the network. To obtain cache coherence on a physically distributed system built from numerous nodes without a interconnect which can be snooped, we can use a flat directory based scheme. A flat directory based scheme can be either memory-based or cache-based and exhibit the following characteristics:

  • In a flat directory scheme, directory information is located in a fixed place, typically at the home node where the memory is located.
  • To locate copies of the data
    • Memory-based: all the information is held in the directory at the home node.
    • Cache-based: home node has pointer to first element of a linked list
  • To communicate with those copies
    • Memory-based: uses point-to-point messages which can be multicast or overlapped.
    • Cache-based: uses a point-to-point linked list traversal to find communicate.

Simple Scalable Coherent Interface (SSCI)

In a cache-based directory scheme, every shareable block in memory is associated with a list of processors sharing that block. The home node maintains a pointer to first sharer plus state bits, which is the head pointer for the block. Each node with a cached copy maintains two additional pointers for each cache line to next and previous sharer, which are the forward and backward pointers.

Flat Cache-based Directory Scheme [1]

The SSCI protocol is an example of a flat cache-based directory scheme. The SSCI protocol is a simplified version of the SCI protocol which also uses the MESI protocol in the caches. The directory entries can be distributed along with the memory and the high order bits of an address can be used to identify the location of the memory and directory entries for that portion of memory. The SSCI protocol scheme makes use of a full bit-vector approach by maintaining a vector which keeps track of the states in the cache, the states in the memory directory and a pointer to the processors that share a particular block.

Cache states in the SSCI can be labeled:

  1. M (modified)
  2. E (exclusive)
  3. S (Shared)
  4. I (Invalid)

States in the memory directory may have the value of:

  1. U (Uncached)
  2. S (Shared)
  3. EM (Exclusive Modified)

In addition, there is a local node or requestor node, where the request originates, a home node where the memory and directory live and a remote node that has a copy of the block (exclusive or shared).

In handling a read miss, a transaction takes place from the requestor node to the home node. Home either replies with the data or with head of the list of sharer nodes. The head pointer is changed to point to the requestor node and the requestor sends a transaction to first sharer to be inserted into the list. In handling a write miss, the requestor node obtains the head information from the home node and the requestor is inserted as new head into list. If the requestor is on the list it is deleted and reinserted as the head of the list. The list is then traversed and copies are invalidated.

The Scalable Coherent Interface (SCI)

The Scalable Coherent Interface (SCI) is a cache coherent memory model that can be used in systems up to 64K nodes. SCI's versatility stems from the fact that it combines the best of both message-based and shared-memory programming models. SCI uses point-to-point links so the concurrent data transfers can occur. The above SSCI protocol is a simplified version of the SCI protocol and therefore acts in much the same manner. For each block of memroy, there is a centralized “directory” that maintains the state of the block in the different caches. The directory is co-located with the corresponding memory and requests and replies on the interconnect are no longer seen by everyone but are limited to the specific nodes involved in the request.

Memory states in SCI are defined as:

  • Home: no cached copy exists
  • Fresh: read-only copies may exist and the copy in the memory is valid.
  • Gone: a writable (either exclusive or dirty) copy exists and no valid copy exists on the local node.

The cached states differs significantly from the SSCI protocol in that SCI has 29 stable states and many pending states. The additional states in the SCI protocol exist to specify the location and state of a block of memory within the list of sharers. The stable states have two parts:

  • The position in the sharing list
    • Head, Tail, Mid, Only
  • The actual state
    • Dirty: modified or writable
    • Clean: writable but it's the same value that's in memory
    • Copy: readable
Read miss in SCI Protocol [5]

The current head of a list may be in a pending state when a request arrives from another node. If this is the case, it is handled with a pending list where requests are linked in list starting at the current head. When the current head finishes, it passes sets the head of the list to the next requester in the pending list.

The above figure looks at the instance when a read request in handled. The requesting cache is in a pending state and sends a read transaction to the home node.The home node, if in the HOME state, means that no other cached copy exists so it changes its state to FRESH, updates its head pointer and replies to the requesting node that changes its state to ONLY_FRESH (it's a single node list). However, if the home node is in the FRESH state, then cached copies exist somewhere but the home node value is valid. Therefore, the home node updates its head pointer and sends the data as well as the previous list head. The requester tells the previous head and the previous head changes its state from ONLY_FRESH to TAIL_VALID and updates its backward pointer. The requester then receives an acknowledgment from the previous head, sets its forward pointer to the previous head and changes its state to HEAD_FRESH.

There are also the cases when there is a read request when the home node is in the GONE state, handling a write request and handling a write request when the node is the head of the list. For examples of these and a description of what occurs I refer you to [2] as well as the links and references listed at the end.

References and Links

  1. James, D.V.; Laundrie, A.T.; Gjessing, S.; Sohi, G.S., "Distributed-directory scheme: scalable coherent interface," Computer , vol.23, no.6, pp.74-77, Jun 1990 URL: http://www.lib.ncsu.edu:2178/iel5/2/2005/00055503.pdf?isnumber=2005∏=JNL&arnumber=55503&arnumber=55503&arSt=74&ared=77&arAuthor=James%2C+D.V.%3B+Laundrie%2C+A.T.%3B+Gjessing%2C+S.%3B+Sohi%2C+G.S.
  2. Gustavson, D. B. 1992. "The Scalable Coherent Interface and Related Standards Projects". IEEE Micro 12, 1 (Jan. 1992), 10-22. DOI= http://dx.doi.org/10.1109
  3. Gustavson, D.B.; Qiang Li, "The Scalable Coherent Interface (SCI)," Communications Magazine, IEEE , vol.34, no.8, pp.52-63, Aug 1996 URL: http://www.lib.ncsu.edu:2178/iel1/35/11187/00533919.pdf?isnumber=11187∏=STD&arnumber=533919&arnumber=533919&arSt=52&ared=63&arAuthor=Gustavson%2C+D.B.%3B+Qiang+Li/40.124376
  4. Alnaes, K.; Kristiansen, E.H.; Gustavson, D.B.; James, D.V., "Scalable Coherent Interface," CompEuro '90. Proceedings of the 1990 IEEE International Conference on Computer Systems and Software Engineering , vol., no., pp.446-453, 8-10 May 1990 URL: http://www.lib.ncsu.edu:2178/iel5/285/3365/00113656.pdf?isnumber=3365∏=STD&arnumber=113656&arnumber=113656&arSt=446&ared=453&arAuthor=Alnaes%2C+K.%3B+Kristiansen%2C+E.H.%3B+Gustavson%2C+D.B.%3B+James%2C+D.V.
  5. David E. Culler, Jaswinder Pal Singh. Parallel Computer Architecture: A Hardware-Software Approach. With Anoop Gupta. Morgan Kaufmann Publishers, 1998.
  6. http://engr.smu.edu/~rewini/slides/adv-arch-bk/chapter04.pdf
  7. http://peace.snu.ac.kr/courses/parallelprocessing/materials/lecture11.ppt