Strategies to combat “False Sharing”

From Expertiza_Wiki

Proper Block Sizing

An algorithm can be designed to select block sizes that minimize bus traffic by using variable-size (but static) blocks: the block size varies across the memory space of the program, but any given word is assigned to one fixed block size for the entire program execution. Starting with each used word in the memory space as its own block, neighboring blocks are combined if the combined block produces less bus traffic than the separate blocks do. When neighboring words have similar access patterns and it is useful to prefetch one while demand-fetching the other, grouping the words (or blocks) into a single unit reduces traffic because fewer addresses are transmitted over the bus. When false sharing generates excessive traffic, the problem blocks are isolated by not combining them into larger units.
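The merging procedure described above could be sketched in C roughly as follows. This is only a toy model, not the algorithm from any particular paper: the `Access` trace format, the `ADDR_COST` and `WORD_COST` constants, the infinite write-invalidate caches, and the names `traffic` and `choose_blocks` are all assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NWORDS    4   /* size of the toy memory space, in words (assumed) */
#define ADDR_COST 1   /* assumed bus cost of sending one address */
#define WORD_COST 1   /* assumed bus cost of transferring one data word */

static_assert(NWORDS > 0, "the memory space must contain at least one word");

/* One memory reference in a hypothetical trace. */
typedef struct { int proc; int word; int is_write; } Access;

/* Simulated bus traffic of a trace for a given word-to-block mapping.
   Assumes infinite caches and a simple write-invalidate protocol.
   A block is identified by the index of its first word. */
static long traffic(const Access *trace, size_t n, const int block_of[NWORDS])
{
    unsigned holders[NWORDS] = {0};  /* per block: bitmask of processors caching it */
    int size[NWORDS] = {0};          /* per block: number of words in it */
    long cost = 0;

    for (int w = 0; w < NWORDS; w++)
        size[block_of[w]]++;

    for (size_t i = 0; i < n; i++) {
        int b = block_of[trace[i].word];
        unsigned me = 1u << trace[i].proc;

        if (!(holders[b] & me)) {    /* miss: one address plus the whole block */
            cost += ADDR_COST + (long)size[b] * WORD_COST;
            holders[b] |= me;
        }
        if (trace[i].is_write && (holders[b] & ~me)) {
            cost += ADDR_COST;       /* invalidate the other cached copies */
            holders[b] = me;
        }
    }
    return cost;
}

/* Start with one word per block, then repeatedly merge a block with its
   left neighbor whenever the merge lowers the simulated traffic. */
static void choose_blocks(const Access *trace, size_t n, int block_of[NWORDS])
{
    for (int w = 0; w < NWORDS; w++)
        block_of[w] = w;

    int improved = 1;
    while (improved) {
        improved = 0;
        for (int w = 1; w < NWORDS; w++) {
            int left = block_of[w - 1], right = block_of[w];
            if (left == right)
                continue;            /* not a block boundary */

            int trial[NWORDS];
            for (int k = 0; k < NWORDS; k++)
                trial[k] = (block_of[k] == right) ? left : block_of[k];

            if (traffic(trace, n, trial) < traffic(trace, n, block_of)) {
                for (int k = 0; k < NWORDS; k++)
                    block_of[k] = trial[k];
                improved = 1;
            }
        }
    }
}
```

Under this model, a trace in which processor 0 reads words 0 and 1 while processors 1 and 2 alternately write words 2 and 3 merges words 0 and 1 into one block and leaves words 2 and 3 as separate single-word blocks, since combining them would add invalidation traffic.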


Data placement optimizations

(a) SplitScalar: Place scalar variables that cause false sharing in different blocks.

(b) Heap Allocate: Allocate shared space from different heap regions according to which processor requests the space. It is common for a slave process to access the shared space that it requests itself. If no action is taken, the space allocated by different processes may share the same cache block and lead to false sharing.

(c) Expand Record: Expand records in an array (padding with dummy words) to reduce the sharing of a cache block by different records. While successful prefetching may occur within a record or across records, false sharing usually occurs across records, when more than one of them share the same cache block.

(d) Align Record: Choose a layout for arrays of records that minimizes the number of blocks the average record spans. This optimization maximizes prefetching of the rest of the record when one word of a record is accessed, and may also reduce false sharing.

(e) LockScalar: Place active scalars that are protected by a lock in the same block as the lock variable. As a result, the scalar is prefetched when the lock is accessed.
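Several of the layout transformations above, namely (a), (c), (d) and (e), can be illustrated with C11 alignment specifiers. The following is a minimal sketch under stated assumptions: a 64-byte cache block (`BLOCK`), which is hardware-dependent, and hypothetical type and field names chosen only for this example. The `int lock` field stands in for a real lock type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define BLOCK 64  /* assumed cache block size in bytes; check the target hardware */

/* (a) SplitScalar: each write-shared scalar starts its own block, so
   updates by different processors never touch the same block. */
struct split_scalars {
    alignas(BLOCK) long hits_cpu0;
    alignas(BLOCK) long hits_cpu1;
};

/* (c) Expand Record and (d) Align Record: aligning the record to a block
   boundary makes the compiler pad sizeof(struct record) up to a multiple
   of BLOCK (the same effect as adding dummy words by hand), so elements
   of an array of records neither share nor straddle a block. */
struct record {
    alignas(BLOCK) int id;
    double value;
};

/* (e) LockScalar: the protected scalar sits in the same block as its
   lock, so fetching the lock also brings in the scalar. */
struct locked_counter {
    alignas(BLOCK) int lock;  /* placeholder for a real lock word */
    long count;               /* accessed only while the lock is held */
};

/* Compile-time checks of the layout properties claimed above. */
static_assert(offsetof(struct split_scalars, hits_cpu1) -
              offsetof(struct split_scalars, hits_cpu0) >= BLOCK,
              "scalars must be at least one block apart");
static_assert(sizeof(struct record) % BLOCK == 0,
              "each record must occupy whole blocks");
static_assert(offsetof(struct locked_counter, count) + sizeof(long) <= BLOCK,
              "counter must share the lock's block");
```

The padding trades memory for reduced coherence traffic, so it is usually worth applying only to data that is actually write-shared.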

False sharing is caused by a mismatch between the memory layout of write-shared data and the cross-processor memory reference pattern to it. Manually changing the placement of this data to better conform to the memory reference pattern can reduce false sharing by up to 75%.
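As one illustration of manual placement, the Heap Allocate idea from item (b) could be sketched as a set of per-processor arenas. This is a simplified sketch, not a production allocator: `BLOCK`, `NPROC`, `ARENA_BYTES`, and the functions `arenas_init`, `proc_alloc`, and `block_index` are hypothetical names, there is no way to free memory, and each arena is assumed to be used only by its own processor. It relies on the C11 `aligned_alloc` function, which some platforms do not provide.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK       64            /* assumed cache block size in bytes */
#define NPROC       4             /* assumed number of processors */
#define ARENA_BYTES (16 * BLOCK)  /* per-processor region size (assumed) */

static_assert(ARENA_BYTES % BLOCK == 0, "arenas must cover whole blocks");

/* Arena bookkeeping is itself block-aligned, so the 'used' counters of
   different processors do not falsely share a block either. */
typedef struct {
    alignas(BLOCK) unsigned char *base;
    size_t used;
} Arena;

static Arena arenas[NPROC];

/* Give every processor its own block-aligned heap region.
   Returns 0 on success, -1 if an allocation fails. */
static int arenas_init(void)
{
    for (int p = 0; p < NPROC; p++) {
        arenas[p].base = aligned_alloc(BLOCK, ARENA_BYTES);
        if (arenas[p].base == NULL)
            return -1;
        arenas[p].used = 0;
    }
    return 0;
}

/* Bump-allocate from the arena of the requesting processor. Requests by
   different processors can never land in the same cache block because
   every arena starts on a block boundary and spans whole blocks. */
static void *proc_alloc(int proc, size_t bytes)
{
    Arena *a = &arenas[proc];
    size_t align = alignof(max_align_t);
    size_t need = (bytes + align - 1) / align * align;

    if (need > ARENA_BYTES - a->used)
        return NULL;              /* arena exhausted */
    void *p = a->base + a->used;
    a->used += need;
    return p;
}

/* Helper for inspection: index of the cache block containing p. */
static uintptr_t block_index(const void *p)
{
    return (uintptr_t)p / BLOCK;
}
```

Two small objects requested by the same processor may still share a block, which is harmless and can even help prefetching. Objects requested by different processors always fall in different blocks.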