CSC/ECE 506 Spring 2012/11a az
Revision as of 16:26, 15 April 2012
11a. Performance of DSM systems. Distributed shared memory (DSM) systems combine the programming model of shared memory systems with the scalability of distributed systems. However, because DSM systems need extra coordination between the software layer and the underlying hardware, achieving good performance can be a significant challenge. Factors that can harm performance include the overhead of maintaining cache coherence and memory consistency, and the latency of the interconnect. Please further explore the factors that can affect the performance of DSM systems, and the improvements that have been made to existing systems.
Introduction
Cache coherence
DSM systems must maintain cache coherence just as bus-based multiprocessor systems must. Cache coherence problems arise when it is undefined how a change to a value in one processor's cache is propagated to the other caches [1, p. 183]. If multiple processors access and modify a shared location in memory and produce outputs based on that shared variable, they can calculate incorrect values if cache coherence is not maintained.
Ensuring that a value changed in one cache is sent to the other caches is called write propagation [1, p. 183]. Write propagation is one of the requirements that must be met to provide cache coherence. Without write propagation, one processor could modify a cached value without notifying the other processors that have the same value cached. The other caches would still believe they hold the latest data, so on subsequent reads they would supply the stale value to their respective processors, leading to incoherent results.
FIXME: EXAMPLE?
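As one possible illustration of write propagation, the following Python sketch models two processors with private caches over a shared memory. The Cache class, its write-through behavior, and the invalidate-on-write policy are simplifying assumptions made for this example; they do not describe any particular DSM protocol.

```python
class Cache:
    """Toy private cache in front of a shared memory (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory  # shared backing store: address -> value
        self.lines = {}       # locally cached copies: address -> value

    def read(self, addr):
        # On a miss, fetch the value from memory; on a hit, use the cached copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, peers=()):
        # Write-through for simplicity, then invalidate the copy in each peer.
        self.lines[addr] = value
        self.memory[addr] = value
        for peer in peers:
            peer.lines.pop(addr, None)


def run(propagate):
    """Return what P1 reads after P0 writes x = 1."""
    memory = {"x": 0}
    p0, p1 = Cache(memory), Cache(memory)
    p1.read("x")  # P1 caches x = 0
    p0.write("x", 1, peers=[p1] if propagate else [])
    return p1.read("x")


stale = run(propagate=False)  # P1 still hits its old cached copy
fresh = run(propagate=True)   # P1's copy was invalidated, so it refetches
```

Without propagation P1 keeps reading its stale copy of x; with propagation its copy is invalidated and the next read observes the new value.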
Another requirement for cache coherence is write serialization, which Solihin [1, p. 183] defines as a requirement that "multiple changes to a single memory location are seen in the same order by all processors". If two processors write to a single memory location in a certain order, then all other processors in the system should see the writes (by reading that memory location and subsequently caching the values) in the order in which they were written. If other processors observe the writes in different orders, multiple caches can end up holding different copies of the same variable while each cache believes it has the latest copy.
FIXME: EXAMPLE?
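A minimal sketch of write serialization, assuming two writers (P0 and P1) and two observers (P2 and P3) whose names and arrival orders are invented for illustration. Each observer simply applies the writes to one location in the order it sees them; the "serialized" case assumes some single ordering point exists that fixes one order for everyone.

```python
def final_value(writes_seen):
    """Apply writes to one location in the order seen; return the last value."""
    value = None
    for _writer, new_value in writes_seen:
        value = new_value
    return value


def observed(views):
    """Map each observer to the value it ends up caching."""
    return {proc: final_value(writes) for proc, writes in views.items()}


w0 = ("P0", 1)  # P0 writes x = 1
w1 = ("P1", 2)  # P1 writes x = 2

# Without serialization, observers may see the two writes in different orders.
unserialized = {"P2": [w0, w1], "P3": [w1, w0]}

# With serialization, one agreed order is delivered to every observer.
agreed_order = [w0, w1]
serialized = {"P2": agreed_order, "P3": agreed_order}

diverged = observed(unserialized)  # P2 and P3 disagree on the final x
agreed = observed(serialized)      # P2 and P3 hold the same final x
```

In the unserialized case P2 ends with x = 2 while P3 ends with x = 1, so the two caches disagree even though both writes were propagated.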
Thus, for correctness, the cache coherence implementation must provide both write propagation and write serialization.
In order to maintain cache coherence, a cache coherence protocol is implemented in hardware (or, in specific cases, in software). In DSM systems, the cache coherence controller in a node interfaces with the processor and its cache, and also has a communication link to the other nodes through an interconnect network via a communication assist. It receives and acts upon requests from the local processor, and it sends, receives, and acts on requests to and from other nodes as network transactions through the communication assist.
Unlike in bus-based multiprocessor systems, the coherence controllers are not connected by a shared medium that serializes communication, nor by bus signal lines such as the SHARED line (which is asserted in a bus-based system when another processor has a copy of the cache block being addressed). In bus-based systems, the bus is also the medium over which invalidations or updates are sent to other coherence controllers, depending on the coherence protocol. Further, bus-based systems allow coherence controllers to snoop requests from other controllers, such as reads, read-exclusives, and flushes. In a DSM system no bus exists, yet invalidations or updates still have to reach other coherence controllers, so they are sent as network transactions. Additionally, without a bus there is no guarantee that a request is seen by other processors once it is sent, so acknowledgement messages are also sent as network transactions in response to requests.
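The request/acknowledgement exchange described above can be sketched as a small message-passing simulation. The message names INV and ACK, the FIFO queue standing in for the network, and the invalidate_with_acks function are all assumptions made for this illustration rather than part of any specific protocol.

```python
from collections import deque


def invalidate_with_acks(writer, recipients):
    """Send an invalidation to each recipient and collect acknowledgements.

    Returns the list of network transactions in delivery order and a flag
    saying whether the writer received an ACK from every recipient.
    """
    network = deque(("INV", writer, node) for node in recipients)
    pending = set(recipients)  # nodes the writer is still waiting on
    log = []

    while network:
        kind, src, dst = network.popleft()
        log.append((kind, src, dst))
        if kind == "INV":
            # The recipient drops its copy and replies to the writer.
            network.append(("ACK", dst, writer))
        elif kind == "ACK":
            pending.discard(src)

    return log, not pending


log, complete = invalidate_with_acks(writer=0, recipients=[1, 2])
```

Each invalidation costs two network transactions in this model, one for the request and one for the acknowledgement, which is part of why limiting the number of recipients matters.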
DSM-based systems do not broadcast messages to all other coherence controllers as bus-based systems do, because the bandwidth requirements would be prohibitively large. Instead, many DSM systems use a construct called a directory, which records which cache blocks are cached by which nodes and in what state. This avoids having to broadcast the invalidations, updates, upgrades, interventions, flushes, and other messages that coherence controllers send over a bus. The directory enables a node to intelligently select only a subset of nodes as message recipients, thereby reducing network traffic.
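To make the traffic reduction concrete, here is a minimal sketch of a directory that tracks only the set of sharers per block. The Directory class, its method names, the 16-node system size, and the block address 0x40 are hypothetical; a real directory also tracks coherence state and handles many more message types.

```python
class Directory:
    """Toy directory: block address -> set of node ids caching that block."""

    def __init__(self):
        self.sharers = {}

    def read(self, node, block):
        # Record that this node now holds a copy of the block.
        self.sharers.setdefault(block, set()).add(node)

    def write(self, node, block):
        # Only the current sharers (other than the writer) need invalidating.
        targets = self.sharers.get(block, set()) - {node}
        self.sharers[block] = {node}  # the writer becomes the sole holder
        return sorted(targets)


def broadcast_targets(num_nodes, writer):
    """Recipients if every other node had to be notified, as on a bus."""
    return [n for n in range(num_nodes) if n != writer]


directory = Directory()
directory.read(node=3, block=0x40)
directory.read(node=7, block=0x40)

directed = directory.write(node=3, block=0x40)        # only node 7
broadcast = broadcast_targets(num_nodes=16, writer=3)  # all 15 other nodes
```

In this example the directory sends one invalidation where a broadcast would send fifteen.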
Memory consistency
Interconnections
Performance Concerns
Maintaining cache coherence
Maintaining memory consistency
Latency of interconnections
Performance Improvements
Maintaining cache coherence
Maintaining memory consistency
Relaxed memory models with fine granularity coherence
Latency of interconnections
Definitions
- DSM
- Distributed shared memory, a parallel computer architecture which consists of a set of nodes that maintain their own local memory, but all nodes are connected together, making their memories one shared addressable space.
- granularity
- FIXME
- node
- A compute unit that makes up one component of a DSM system. A node consists of one or more sets of processors, cache, and memory. A node is connected to the larger DSM system through an interconnect.
- write propagation
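- Ensuring that a value changed in one cache is sent to the other caches that hold a copy of it [1, p. 183].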
- write serialization
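- The requirement that "multiple changes to a single memory location are seen in the same order by all processors" [1, p. 183].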
References
<references/>