CSC/ECE 506 Fall 2007/wiki3 8 38: Difference between revisions
No edit summary |
(→SCI) |
||
Line 6: | Line 6: | ||
= SCI = | = SCI = | ||
The scalable coherent interface has now become a chief hardware based approach to the cache coherence problem in shared memory multiprocessors. SCI is a directory based invalidate coherence protocol. The state of a cache block is distributed to the sharers of that block. Limits that are inherent in bus technology are easily avoided by SCI. The SCI protocol is to provide scalability, coherence and an interface. Scalability is to guarantee that the same mechanisms can be used in single processor systems and large highly parallel multiprocessors. Coherence is to guarantee efficient and integral use of cache memories in distributed shared memory. An interface provides a communication architecture that has multiple values to be brought into a single system and provide smooth inter-operation. | |||
In SCI, every interface does not wait for the signal to propagate before it begins to send the next signal. Also, SCI utilizes multiple links so that, concurrently, several transfers can take place. | |||
Usually, a directory entry is in either of the two states : ‘home’ or ‘gone’. If the state is in ‘home’, then memory can immediately satisfy requests to a block as the block has not been cached by any processor. If the state is ‘gone’, then the block has been cached by a processor and might even be modified. Now, the directory contains a pointer to the first processor in the sharing list for this particular block. Hence, on requesting the data, memory returns the pointer to the first processor on the shared list rather than the data itself. The processor asking for the block now forwards its request to the processor on the top of the shared list. Now, the requesting processor adds itself into the shared list as the new head of the list. | |||
In the SCI protocol, any coherent transaction has three phases. | |||
Memory read – When a processor misses in its cache, It asks for the block in the home directory in memory. If the state of the memory is ‘home’, then the main memory replies with the block to the requesting processor. If the state is ‘gone’, then the main memory returns the head of the shared list of processors for that particular block. Then, memory updates its pointer and puts the requesting processor as the new head. | |||
Cache read – When memory returns a pointer toe the requesting processor instead of data, the processor forwards its request for the block to the head of the doubly linked list. When the cache receives, the head of the list returns the data which might have been modified. The head of the list changes its backward pointer to the requesting processor’s node. Now, the requesting processor becomes the head of the list and it has the cache block. | |||
Cleanup – If the cache miss from the requesting processor is a store, the processor has to first invalidate all other cached copies and then only proceed with the store. The new head gives an invalidate request to the next address on the list i.e. to its next processor. This processor invalidates and gives back a pointer to the next processor on its list. The head of the list uses this new pointer and sends it an invalidate request. This goes on until a NULL pointer is returned. This is the cleanup process for invalidation of other cache copies of the block. | |||
= SSCI = | = SSCI = |
Revision as of 18:21, 17 October 2007
Wiki: SCI. The IEEE Scalable Coherent Interface is a superset of the SSCI protocol we have been considering in class. A lot has been written about it, but it is still difficult to comprehend. Using SSCI as a starting point, explain why additional states are necessary, and give (or cite) examples that demonstrate how they work. Ideally, this would still be an overview of the working of the protocol, referencing more detailed documentation on the Web.
Introduction
SCI
The scalable coherent interface has now become a chief hardware based approach to the cache coherence problem in shared memory multiprocessors. SCI is a directory based invalidate coherence protocol. The state of a cache block is distributed to the sharers of that block. Limits that are inherent in bus technology are easily avoided by SCI. The SCI protocol is to provide scalability, coherence and an interface. Scalability is to guarantee that the same mechanisms can be used in single processor systems and large highly parallel multiprocessors. Coherence is to guarantee efficient and integral use of cache memories in distributed shared memory. An interface provides a communication architecture that has multiple values to be brought into a single system and provide smooth inter-operation.
In SCI, every interface does not wait for the signal to propagate before it begins to send the next signal. Also, SCI utilizes multiple links so that, concurrently, several transfers can take place.
Usually, a directory entry is in either of the two states : ‘home’ or ‘gone’. If the state is in ‘home’, then memory can immediately satisfy requests to a block as the block has not been cached by any processor. If the state is ‘gone’, then the block has been cached by a processor and might even be modified. Now, the directory contains a pointer to the first processor in the sharing list for this particular block. Hence, on requesting the data, memory returns the pointer to the first processor on the shared list rather than the data itself. The processor asking for the block now forwards its request to the processor on the top of the shared list. Now, the requesting processor adds itself into the shared list as the new head of the list.
In the SCI protocol, any coherent transaction has three phases.
Memory read – When a processor misses in its cache, It asks for the block in the home directory in memory. If the state of the memory is ‘home’, then the main memory replies with the block to the requesting processor. If the state is ‘gone’, then the main memory returns the head of the shared list of processors for that particular block. Then, memory updates its pointer and puts the requesting processor as the new head.
Cache read – When memory returns a pointer toe the requesting processor instead of data, the processor forwards its request for the block to the head of the doubly linked list. When the cache receives, the head of the list returns the data which might have been modified. The head of the list changes its backward pointer to the requesting processor’s node. Now, the requesting processor becomes the head of the list and it has the cache block.
Cleanup – If the cache miss from the requesting processor is a store, the processor has to first invalidate all other cached copies and then only proceed with the store. The new head gives an invalidate request to the next address on the list i.e. to its next processor. This processor invalidates and gives back a pointer to the next processor on its list. The head of the list uses this new pointer and sends it an invalidate request. This goes on until a NULL pointer is returned. This is the cleanup process for invalidation of other cache copies of the block.