CSC/ECE 506 Fall 2007/wiki3 1 ncdt
In a multiprocessor using a snoopy coherence protocol, the overall performance is a combination of the behavior of uniprocessor cache miss traffic and the traffic caused by the communication, which results in invalidations and subsequent cache misses. The uniprocessor miss rate is classified into the three C’s (capacity, compulsory, and conflict [1]). Similarly, the misses that arise from interprocessor communication are called coherence misses and can be divided into two different sources:
1. True sharing misses: Theses misses are caused by the communication of data through the cache coherence mechanism. For example, in an invalidation-based protocol, the first write by a processor to a shared cache block causes an invalidation to establish ownership of that block. Subsequently, when another processor attempts to read a modified word in that cache block, a miss occurs and the resultant block is transferred.
2. False sharing misses: These misses arise from the use of an invalidation-based coherence protocol with a single valid bit per cache block. False sharing occurs when a block is invalidated and a subsequent reference causes a miss because some word in the block, other than one being read, is written into. It is evident if the cache block size is increased, false sharing misses would also increase.
If the word (in a block) modified is actually used by the processor that received the invalidation, then the reference was a true sharing reference and would have caused a miss independent of the block size. If, however, the word being written and the word read are different and the invalidation does not cause a new value to be communicated, but only causes an extra cache miss, then it is a false sharing miss. The effect of coherence misses become significant for tightly coupled applications that share significant amount of data for example commercial workloads.