CSC/ECE 506 Spring 2010/10 DJ: Difference between revisions

From Expertiza_Wiki
Revision as of 02:41, 12 April 2010

MEMORY CONSISTENCY MODELS

The interface[1] for memory in a shared memory multiprocessor is called a memory consistency model. The memory consistency model of a shared-memory multiprocessor provides a formal specification[3] of how the memory system will appear to the programmer, eliminating the gap between the behavior expected by the programmer and the actual behavior supported by a system.

SEQUENTIAL CONSISTENCY

Lamport [1] defined a multiprocessor to be sequentially consistent (SC) if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
The main advantage of selecting SC as the interface to shared-memory hardware is that it matches the programmer’s intuition. SC permits coherent caching, prefetching and multithreading, yet keeps the software simple. In particular, SC lets middleware authors target the same model as a multiprogrammed uniprocessor. Thus, SC should be a preferable choice for hardware architects.


Limitations of SC:

[Figure: example program for two processors, with statements 1a, 1b and 2a, 2b (image not available)]


Under SC, 1a -> 1b and 2a -> 2b [2]. So, in the above example, the outcome A = 0 and B = 0 is never possible under SC.
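
Because the figure for this example is not reproduced here, the following Python sketch assumes a hypothetical two-processor program that is consistent with the statement above: one processor runs 1a: A = 1 and then 1b: read B, the other runs 2a: B = 1 and then 2b: read A, with A and B initially 0. The sketch enumerates every interleaving that respects 1a -> 1b and 2a -> 2b and collects the pairs of values the two reads can return. The names `interleavings`, `run` and `sc_outcomes` are illustrative only.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each list's own order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


# Assumed program (the original figure is missing):
#   P1: 1a: A = 1    1b: r1 = B
#   P2: 2a: B = 1    2b: r2 = A
P1 = [("write", "A", 1), ("read", "B", "r1")]
P2 = [("write", "B", 1), ("read", "A", "r2")]


def run(order):
    """Execute one interleaving against a single shared memory."""
    memory = {"A": 0, "B": 0}
    regs = {}
    for kind, loc, arg in order:
        if kind == "write":
            memory[loc] = arg        # arg is the value to store
        else:
            regs[arg] = memory[loc]  # arg is the destination register
    return regs["r1"], regs["r2"]


def sc_outcomes():
    """Return every (r1, r2) pair reachable under sequential consistency."""
    return {run(order) for order in interleavings(P1, P2)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```

Under these assumptions the pair (0, 0) never appears, because in any single sequential order at least one of the writes precedes the read issued by the other processor.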

1. A processor [3] must ensure that its previous memory operation is complete before proceeding with its next memory operation in program order. This requirement is called the program order requirement.

2. Determining [3] the completion of a write requires an explicit acknowledgement message from memory. Additionally, in a cache-based system, a write must generate invalidate or update messages for all cached copies.

3. SC makes it hard to use write buffers [1], because write buffers cause operations to be presented to the cache coherence protocol out of program order.

4. Some processors are precluded from overlapping multiple reads and writes in the memory system. This restriction is crippling in systems without caches.

5. Out-of-order execution [4] and instruction-level parallelism cannot be exploited under SC.

6. Bypassing [4] a store's value to a younger load is not allowed under SC since it violates atomicity.

7. Also, non-blocking caches are not allowed.

8. For compilers [3], an analog of the program order requirement applies to straightforward implementations.

INTUITION BEHIND RELAXED MEMORY CONSISTENCY MODELS

Given the limitations above, performance under SC suffers greatly. The key idea is to make memory accesses execute faster and to allow them to overlap.

One possible solution is prefetching [4]. When the older (previous) load/store completes, a load that has already issued a prefetch can access the cache and complete sooner because of a cache hit. Also, when a store’s address is generated, a prefetch-exclusive can be issued even though there are older pending loads/stores. When these older loads/stores complete, the store can access the cache and complete without going to the bus.

However, after a block has been prefetched into the cache, it can be invalidated; or, after a block has been prefetched in the exclusive state, another processor may read it, downgrading its state to shared. In such cases the prefetch is useless: the load will suffer a cache miss and the store will still need to go to the bus. Such prefetches also incur unnecessary traffic. Hence, prefetching is not a perfect solution.

The next possible solution is speculation [4]. With speculation, a younger (later) load is allowed to access the cache, but is marked as speculative. If, by the time the first load completes, the block read by the second load has not been invalidated or naturally evicted, then the value obtained by the second load is the same as if it had waited for the first load to execute atomically, so the speculation is successful. If the speculation fails, the younger load is cancelled and re-executed. A younger load can also be speculative with respect to an older pending store.

However, applying speculation to a store is harder, since a store cannot be cancelled easily.

Both these techniques have been used in the MIPS R10000 and Intel Pentium architectures.

Still, the compiler cannot reorder memory accesses when compiling the program. This motivates relaxed memory consistency models.

RELAXED CONSISTENCY MODELS

Relaxed memory consistency models [3] can be categorized based on three key characteristics: (1) how they relax the program order requirement, i.e., whether they relax the order from a write to a following read, between two writes, or from a read to a following read or write; (2) how they relax the write atomicity requirement, i.e., whether they allow a read to return the value of another processor’s write before the write is made visible to all other processors; and (3) whether they relax both program order and write atomicity by allowing a processor to read the value of its own previous write before the write is made visible to other processors. In a cache-based system, this last relaxation allows the read to return the value of the write before the write is serialized with respect to other writes to the same location and before the invalidations/updates of the write reach any other processor.


Relaxation | W -> R Order | W -> W Order | R -> RW Order | Read Others’ Write Early | Read Own Write Early | Safety net
SC | - | - | - | - | Y | -
IBM 370 | Y | - | - | - | - | serialization instructions
Total Store Ordering | Y | - | - | - | Y | RMW
PC | Y | - | - | Y | Y | RMW
PSO | Y | Y | - | - | Y | RMW, STBAR
WO | Y | Y | Y | - | Y | synchronization
RCsc | Y | Y | Y | - | Y | release, acquire, nsync, RMW
RCpc | Y | Y | Y | Y | Y | release, acquire, nsync, RMW
Alpha | Y | Y | Y | - | Y | MB, WMB
RMO | Y | Y | Y | - | Y | MEMBARs
PowerPC | Y | Y | Y | Y | Y | Sync
Simple categorization of relaxed models [3]


Y: the corresponding relaxation is allowed by straightforward implementations of the corresponding model and can be detected by the programmer. A dash (-) marks a relaxation that is not listed for the model.


Relaxation | Commercial Systems Providing the Relaxation
W -> R Order | AlphaServer 8200/8400, Cray T3D, Sequent Balance, SparcCenter1000/2000
W -> W Order | AlphaServer 8200/8400, Cray T3D
R -> RW Order | AlphaServer 8200/8400, Cray T3D
Read Others’ Write Early | Cray T3D
Read Own Write Early | AlphaServer 8200/8400, Cray T3D, SparcCenter1000/2000
Some commercial systems that relax sequential consistency [3]

Relaxing the WRITE to READ Program Order

Here, the program order is relaxed for a write followed by a read to a different location. These models [3] include the IBM 370 model, the SPARC V8 TSO model and the processor consistency (PC) model. They allow a read to be reordered with respect to previous writes from the same processor.


IBM 370
It is the strictest of the three because it prohibits a read from returning the value of a write before the write is made visible to all processors. Therefore, even if a processor issues a read to the same address as a previous pending write from itself, the read must be delayed until the write is made visible to all processors.
To enforce the program order constraint, it provides special serialization instructions that may be placed between the two operations, e.g., compare&swap.

IBM 370 model [6]
Here, Rs and Ws denote the read and write generated by serialization instructions. Dashed lines indicate program order.

Placing a serialization instruction after the write on each processor provides sequentially consistent results.
It does not need a safety net to ensure atomicity since it does not relax atomicity.
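
The write-to-read relaxation and the effect of a serialization instruction can be sketched with a small Python model. This is a simplified machine assumed for illustration, not one taken from the IBM 370 specification: each processor has a FIFO write buffer, a read stalls while its own buffer holds a write to the same address, and a hypothetical `fence` operation stands in for a serialization instruction by waiting until the buffer is empty. The two-processor program (write one flag, then read the other) and all function names are likewise assumptions.

```python
def replace(tup, i, value):
    """Return a copy of tuple `tup` with position i set to `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def successors(state, programs):
    """Yield every state reachable from `state` in one step."""
    pcs, bufs, mem, regs = state
    for i, prog in enumerate(programs):
        if bufs[i]:
            # The oldest buffered write becomes visible to all processors.
            loc, val = bufs[i][0]
            new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
            yield pcs, replace(bufs, i, bufs[i][1:]), new_mem, regs
        if pcs[i] == len(prog):
            continue
        op = prog[pcs[i]]
        next_pcs = replace(pcs, i, pcs[i] + 1)
        if op[0] == "write":
            # A write first enters the local buffer only.
            yield next_pcs, replace(bufs, i, bufs[i] + ((op[1], op[2]),)), mem, regs
        elif op[0] == "read":
            # No forwarding: stall while a local write to this address is pending.
            if all(loc != op[1] for loc, _ in bufs[i]):
                value = dict(mem)[op[1]]
                new_regs = tuple(sorted({**dict(regs), op[2]: value}.items()))
                yield next_pcs, bufs, mem, new_regs
        elif op[0] == "fence":
            # Stand-in for a serialization instruction: wait for an empty buffer.
            if not bufs[i]:
                yield next_pcs, bufs, mem, regs


def outcomes(programs):
    """Return every final (r1, r2) pair over all possible executions."""
    start = ((0,) * len(programs), ((),) * len(programs), (("A", 0), ("B", 0)), ())
    seen, stack, finals = {start}, [start], set()
    while stack:
        state = stack.pop()
        nxt = list(successors(state, programs))
        if not nxt:  # all programs finished and all buffers drained
            regs = dict(state[3])
            finals.add((regs["r1"], regs["r2"]))
        for s in nxt:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals


RELAXED = [[("write", "A", 1), ("read", "B", "r1")],
           [("write", "B", 1), ("read", "A", "r2")]]
FENCED = [[("write", "A", 1), ("fence",), ("read", "B", "r1")],
          [("write", "B", 1), ("fence",), ("read", "A", "r2")]]

if __name__ == "__main__":
    print(sorted(outcomes(RELAXED)))  # includes (0, 0)
    print(sorted(outcomes(FENCED)))   # (0, 0) is gone
```

In this model the unfenced program can end with both reads returning 0, because each read may pass the processor's own buffered write. With the `fence` after each write, only the sequentially consistent results remain.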

TSO model
The total store ordering (TSO) model is one of the models proposed for the SPARC V8 architecture [SFC91, SUN91]. It allows a read to return the value of its own processor’s write even before the write is serialized with respect to other writes to the same location.
It requires that no other writes to any location appear to occur between the read and the write of a read-modify-write.
It does not provide explicit safety nets.
Atomicity can be achieved by ensuring program order from the write to the read using read-modify-writes.
Relying on a read-modify-write as a safety net has disadvantages:
A system may not implement a general read-modify-write that can appropriately replace any read or write, and replacing a read by a read-modify-write requires invalidating other copies of the line.
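
The difference between TSO's read-own-write-early behaviour and the stricter IBM 370 rule can be sketched in Python. This is an operational approximation under stated assumptions, not the SPARC V8 definition: each processor has a FIFO write buffer, and the assumed `forward` flag selects whether a read may take its value from the processor's own buffer (TSO-like) or must stall until the pending write to that address is visible (IBM 370-like). The four-instruction litmus program, the flag names F1 and F2, and the function names are hypothetical.

```python
def put(tup, i, value):
    """Return a copy of tuple `tup` with position i set to `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def steps(state, programs, forward):
    """Yield every state reachable from `state` in one step."""
    pcs, bufs, mem, regs = state
    for i, prog in enumerate(programs):
        if bufs[i]:
            # Retire the oldest buffered write to shared memory.
            loc, val = bufs[i][0]
            new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
            yield pcs, put(bufs, i, bufs[i][1:]), new_mem, regs
        if pcs[i] == len(prog):
            continue
        kind, loc, arg = prog[pcs[i]]
        advanced = put(pcs, i, pcs[i] + 1)
        if kind == "write":
            yield advanced, put(bufs, i, bufs[i] + ((loc, arg),)), mem, regs
        else:
            pending = [v for l, v in bufs[i] if l == loc]
            if pending and not forward:
                continue  # IBM 370-like: wait until the write is visible
            # TSO-like: the newest own buffered write wins, else read memory.
            value = pending[-1] if pending else dict(mem)[loc]
            new_regs = tuple(sorted({**dict(regs), arg: value}.items()))
            yield advanced, bufs, mem, new_regs


def outcomes(programs, forward):
    """Return every final register assignment over all executions."""
    locations = sorted({loc for prog in programs for _, loc, _ in prog})
    start = ((0,) * len(programs), ((),) * len(programs),
             tuple((loc, 0) for loc in locations), ())
    seen, stack, finals = {start}, [start], set()
    while stack:
        state = stack.pop()
        nxt = list(steps(state, programs, forward))
        if not nxt:  # programs finished, buffers empty
            finals.add(state[3])
        for s in nxt:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals


# Hypothetical litmus test: each processor sets its flag, writes A,
# reads A back, then reads the other processor's flag.
PROGRAMS = [
    [("write", "F1", 1), ("write", "A", 1), ("read", "A", "r1"), ("read", "F2", "r2")],
    [("write", "F2", 1), ("write", "A", 2), ("read", "A", "r3"), ("read", "F1", "r4")],
]
TARGET = (("r1", 1), ("r2", 0), ("r3", 2), ("r4", 0))

if __name__ == "__main__":
    print(TARGET in outcomes(PROGRAMS, forward=True))
    print(TARGET in outcomes(PROGRAMS, forward=False))
```

With forwarding, each processor can read its own buffered value of A and still see the other flag as 0. Without forwarding, the read of A waits for the earlier writes to drain, so at least one processor observes the other's flag as set.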