CSC/ECE 506 Spring 2012/ch10 sj
Revision as of 02:02, 4 April 2012
Limitations of cache-coherent directory-based systems
While cache-coherent directory-based systems offer a more scalable alternative to both snoopy bus protocols and snoopy point-to-point protocols, they are not without their limitations. These limitations can be overcome to some extent, but doing so results in trade-offs at both the hardware and the software level.
High waiting time at memory operations
Sequential consistency affects performance in scalable design
Limited capacity for replication
Communicated data is automatically replicated only in the processor cache, not in local memory. This can lead to capacity misses and artifactual communication when working sets are large and include nonlocal data or when conflict misses are numerous.
High design and implementation cost
Protocols are complex, and getting them right in hardware takes substantial design time. A simple directory-based protocol can be developed under two simplifying assumptions: that the directory state and sharing vector always reflect the current cache state, and that requests never overlap. These assumptions, however, do not generally hold in real systems, and various means have been devised to handle the cases where they fail. The first case can be handled by splitting up one of the states based on which node is believed to be the owner at the time. While the second case can be handled by serializing requests, this tends to be very expensive from a performance standpoint. Additional complexity must therefore be built into the protocol design to allow for non-atomic request processing.<ref>Yan Solihin. Fundamentals of Parallel Computer Architecture. 2008-2009.</ref>
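To make the two simplifying assumptions concrete, here is a minimal sketch of a directory entry for a single memory block. It is illustrative only: the three state names ("U", "S", "EM"), the message names, and the `DirectoryEntry` class are assumptions made for this example, not a protocol taken from the text. The sketch relies on exactly the assumptions described above: each request is processed atomically, and the directory's view of the caches is always accurate.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Directory state for one memory block (hypothetical three-state scheme)."""
    state: str = "U"  # "U" uncached, "S" shared, "EM" exclusive/modified
    sharers: set = field(default_factory=set)  # sharing vector as a set of node ids

    def read(self, node):
        """Handle a read miss from `node`; return the messages the home node sends."""
        if self.state == "EM":
            # Assumes the directory is accurate: exactly one owner exists,
            # and it is asked to downgrade and supply the data.
            (owner,) = self.sharers
            msgs = [("intervention", owner)]
        else:
            msgs = [("reply_data", node)]
        self.sharers.add(node)
        self.state = "S"
        return msgs

    def write(self, node):
        """Handle a write miss from `node`; invalidate every other cached copy."""
        msgs = [("invalidate", s) for s in sorted(self.sharers) if s != node]
        msgs.append(("reply_data", node))
        self.sharers = {node}
        self.state = "EM"
        return msgs


entry = DirectoryEntry()
entry.read(1)            # block becomes shared by node 1
entry.read(2)            # nodes 1 and 2 share the block
print(entry.write(3))    # node 3 gains exclusive ownership
print(entry.state, entry.sharers)
```

Each method here runs to completion before the next request is considered. A real protocol cannot rely on that: a second request may arrive while invalidation acknowledgments are still in flight, which is where the additional states and complexity come from.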
Intuition behind relaxed memory-consistency models
The System Specification
What program orders among memory operations are guaranteed to be preserved? If program order is not guaranteed to be preserved by default, what mechanisms does the system provide for a programmer to enforce order explicitly when desired?
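A small enumeration can illustrate why it matters which program orders are preserved. The sketch below is a toy model, not a description of any real machine: the operation encoding, the helper names, and the choice to model a relaxed write-to-read order by letting a thread's load execute before its own earlier store are all assumptions made for illustration. Two threads each store to one variable and then load the other.

```python
# Each thread is a list of operations in program order.
# ("st", var, value) stores a value; ("ld", var, reg) loads into a register.
P0 = [("st", "x", 1), ("ld", "y", "r0")]
P1 = [("st", "y", 1), ("ld", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return (regs["r0"], regs["r1"])


def outcomes(p0_orders, p1_orders):
    """Collect every (r0, r1) result over the allowed per-thread orders."""
    results = set()
    for a in p0_orders:
        for b in p1_orders:
            for schedule in interleavings(a, b):
                results.add(run(schedule))
    return results


# Program order fully preserved (sequential consistency in this toy model).
sc = outcomes([P0], [P1])
# Write-to-read order relaxed: the load may also execute before the earlier store.
relaxed = outcomes([P0, P0[::-1]], [P1, P1[::-1]])

print(sorted(sc))
print(sorted(relaxed - sc))
```

In this model the result in which both loads return 0 appears only when the store-to-load order is relaxed. An explicit ordering mechanism such as a fence between the store and the load would correspond to dropping the reversed orders again, which restores the first outcome set.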
The Programmer's Interface
The programmer's interface should provide a set of rules that, when followed, free the programmer from having to reason about order-preserving mechanisms directly.
The Translation Mechanism
There needs to be a translation of the programmers' annotations to the interface exported by the system specification, so that the system may do its job.