11a. Performance of DSM systems. Distributed shared memory (DSM) systems combine the programming model of shared memory systems with the scalability of distributed systems. However, because DSM systems need extra coordination between the software layer and the underlying hardware, achieving good performance can be a significant challenge. Factors that can harm performance include the overhead of maintaining cache coherence and memory consistency, and the latency of the interconnect. Please further explore the factors that can affect the performance of DSM systems, and the improvements that have been made to existing systems.

Introduction

Cache coherence

DSM systems must maintain cache coherence, just as bus-based multiprocessor systems must. Cache coherence problems arise when it is undefined how a change to a value in one processor's cache is propagated to the other caches [1, p. 183]. If multiple processors access and modify a shared memory location and produce outputs based on that shared variable, they may compute incorrect values if cache coherence is not maintained.

Ensuring that a value changed in one cache is sent to the other caches is called write propagation [1, p. 183]. Write propagation is one of the requirements that must be met to provide cache coherence. Without write propagation, other caches may believe they hold the latest data and supply it to their respective processors, possibly leading to incoherent results.
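The effect of missing write propagation can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the ToyCache class, the run helper, the write-through policy, and the invalidate-on-write behaviour are all hypothetical choices made for illustration, not a description of any particular DSM implementation.

```python
class ToyCache:
    """Hypothetical private cache: maps address -> value (no real hardware modeled)."""

    def __init__(self, memory):
        self.memory = memory   # shared backing memory (a plain dict)
        self.lines = {}        # locally cached copies
        self.peers = []        # other caches that may hold copies

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, return the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, propagate):
        self.lines[addr] = value
        self.memory[addr] = value          # write-through, assumed for simplicity
        if propagate:
            for peer in self.peers:
                peer.lines.pop(addr, None)  # invalidate any stale copy


def run(propagate):
    memory = {0x10: 1}
    a, b = ToyCache(memory), ToyCache(memory)
    a.peers, b.peers = [b], [a]
    b.read(0x10)                    # B caches the old value
    a.write(0x10, 2, propagate)     # A updates the shared location
    return b.read(0x10)             # what does B's processor see now?


stale = run(propagate=False)   # B keeps serving its old copy
fresh = run(propagate=True)    # B's copy was invalidated, so it refetches
```

Without propagation, cache B continues to hit on its old copy even though memory has been updated; with propagation, the invalidation forces B to miss and fetch the new value.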

Another requirement for cache coherence is write serialization, which Solihin [1, p. 183] defines as a requirement that "multiple changes to a single memory location are seen in the same order by all processors".
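As a rough illustration of why write serialization matters, the sketch below has two writers update the same location while two observers receive the updates in different orders. The function names, the processor labels, and the seq table standing in for a serialization point are assumptions introduced here for illustration only.

```python
def final_value(arrival_order):
    """Observer applies writes in whatever order they arrive (no serialization)."""
    value = None
    for _writer, new_value in arrival_order:
        value = new_value
    return value


def final_value_serialized(arrival_order, seq):
    """Observer applies writes in the single global order given by seq."""
    ordered = sorted(arrival_order, key=lambda write: seq[write[0]])
    return final_value(ordered)


# Two writes to the same memory location by different processors.
w1 = ("P1", 5)
w2 = ("P2", 9)

# Global order chosen by a hypothetical serialization point: P1 first, then P2.
seq = {"P1": 0, "P2": 1}

# The interconnect delivers the writes to P3 and P4 in different orders.
p3_arrivals = [w1, w2]
p4_arrivals = [w2, w1]

unserialized = (final_value(p3_arrivals), final_value(p4_arrivals))
serialized = (
    final_value_serialized(p3_arrivals, seq),
    final_value_serialized(p4_arrivals, seq),
)
```

Here unserialized evaluates to (9, 5), so the two observers disagree about the final value, while serialized evaluates to (9, 9) because both observers apply the writes in the same order.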

To maintain cache coherence, a cache coherence protocol is implemented in hardware (or, in specific cases, in software). In DSM systems, the cache coherence controller interfaces with the processor and its cache, and also has a communication link to the other nodes through an interconnection network. It receives and acts on requests from the local processor, as well as requests or messages sent across the interconnect from other nodes.
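The two roles of the controller described above (serving the local processor and reacting to messages from other nodes) can be sketched roughly as follows. Everything here is a simplifying assumption made for illustration: the Interconnect and CoherenceController classes, the INVALIDATE message, and the use of broadcast are hypothetical and far simpler than a real DSM protocol, which would also track line state and would usually avoid broadcasting to every node.

```python
from collections import deque


class Interconnect:
    """Hypothetical interconnect: a queue of (destination, message) pairs."""

    def __init__(self):
        self.controllers = {}
        self.queue = deque()

    def attach(self, controller):
        self.controllers[controller.node_id] = controller

    def broadcast(self, src, msg):
        # Queue the message for every node except the sender.
        for node_id in self.controllers:
            if node_id != src:
                self.queue.append((node_id, msg))

    def deliver_all(self):
        # Deliver queued messages in FIFO order.
        while self.queue:
            dst, msg = self.queue.popleft()
            self.controllers[dst].handle_remote(msg)


class CoherenceController:
    """Toy controller sitting between a node's cache and the interconnect."""

    def __init__(self, node_id, net):
        self.node_id = node_id
        self.net = net
        self.cache = {}   # address -> value held by this node
        net.attach(self)

    def handle_local_write(self, addr, value):
        # Request from the local processor: update the local copy and
        # tell the other nodes their copies are no longer valid.
        self.cache[addr] = value
        self.net.broadcast(self.node_id, ("INVALIDATE", addr))

    def handle_remote(self, msg):
        # Message arriving over the interconnect from another node.
        kind, addr = msg
        if kind == "INVALIDATE":
            self.cache.pop(addr, None)


net = Interconnect()
node0 = CoherenceController(0, net)
node1 = CoherenceController(1, net)

node1.cache[0x40] = 7             # node 1 holds an old copy
node0.handle_local_write(0x40, 8)  # node 0's processor writes the location
net.deliver_all()                  # invalidation travels across the interconnect
```

After deliver_all runs, node 0 holds the new value and node 1 no longer holds a copy, so its next access would have to fetch the current data.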

Memory consistency

Interconnections

Performance Concerns

Maintaining cache coherence

Maintaining memory consistency

Latency of interconnections

Performance Improvements

Maintaining cache coherence

Maintaining memory consistency

Relaxed memory models with fine granularity coherence

Latency of interconnections

Definitions

DSM: Distributed shared memory, a parallel computer architecture consisting of a set of nodes that each maintain their own local memory; the nodes are connected together so that their memories form one shared address space.

granularity: FIXME

node: A compute unit that makes up one component of a DSM system. A node consists of one or more sets of processors, cache, and memory. A node is connected to the larger DSM system through an interconnect.

References

ECE506 CSC/ECE 506 Spring 2012/11a az
