CSC/ECE 506 Spring 2011/ch10 MC


Supplement to Chapter 10: Memory Consistency Models

Introduction

Memory consistency deals with the visibility and ordering of memory accesses to shared data across processors (or cores) in a multiprocessor environment. Memory consistency models define frameworks that outline what programmers can expect when writing multi-threaded programs using shared memory. We briefly discuss the guarantees provided by the hardware memory model and then look at the motivation for defining a memory model at the programming-language level. The Java language was designed from the ground up with a memory model in mind; this has resulted in language features that allow the Java Memory Model (JMM) to be enforced in a manner that is transparent to the Java programmer. We look at some of the building blocks in the Java programming language that enforce the JMM, and follow that up with a detailed discussion of the JMM itself. We then look at safe ways to elide unnecessary synchronization using the guarantees that the JMM provides, and at programming idioms for safe initialization of objects that rely on the JMM's invariants. We discuss the Double-Checked Locking idiom, an example of incorrect elision of the synchronized construct, and explain why it is broken. We round off with useful rules of thumb for writing safe Java code.

The Java Memory Model

Motivation for the Java Memory Model

Chapter 10 of Solihin (2009) looks at what is specified by the memory model at the hardware level and why it is required. At the processor level, a memory model defines necessary and sufficient conditions for knowing that writes to memory by other processors are visible to the current processor, and writes by the current processor are visible to other processors. Some processors exhibit a strong memory model, where all processors see the same sequence of changes to a memory location. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache in order to see writes made by other processors or make writes by this processor visible to others. These memory barriers are usually performed when lock and unlock actions are taken. Even on some of the strongest memory models, memory barriers are often necessary; quite frequently their placement is counterintuitive. Recent trends in processor design have encouraged weaker memory models, because the relaxations they make for cache consistency allow for greater scalability across multiple processors and larger amounts of memory.

The issue of when a write becomes visible to another thread is compounded by the compiler's reordering of code. For example, the compiler might decide that it is more efficient to move a write operation later in the program; as long as this code motion does not change the program's semantics, it is free to do so. If a compiler defers an operation, another thread will not see it until it is performed; this mirrors the effect of caching. Most languages provide keywords such as "volatile" to get around compiler reordering of memory accesses.

All of this flexibility is by design: by giving the compiler, runtime, or hardware the freedom to execute operations in the optimal order, within the bounds of the memory model, we can achieve higher performance.

The Java programming language is fundamentally different from older languages such as C/C++ in a few ways. Java is platform independent and strongly follows the philosophy of "write once, run anywhere". It achieves this by abstracting away the specifics of the underlying platform from the Java programmer. This abstraction is provided by the Java Virtual Machine (JVM), the layer between the bare metal and a Java program. Unlike C/C++ programs, Java programs do not run directly on bare metal; instead they run atop a virtual machine. The JVM is cognizant of the underlying platform, i.e., the instruction-set architecture, the memory model, the operating system, etc. Java programs are first compiled into a universal binary format referred to as bytecode; the bytecode is then translated by the JVM into instructions specific to the underlying architecture. The JVM has interpreters and compilers built into it for this purpose.

Providing platform independence is one of the key goals of the Java programming language; thus multi-threaded Java programs are expected to run safely on platforms with different memory models. The JMM shields the Java developer from the differences between memory models across architectures, and the JVM deals with the differences between the JMM and the underlying platform's memory model by inserting memory barriers at the appropriate places.

The Java Memory Model describes what behaviors are legal in multithreaded code, and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system. It does this in a way that can be implemented correctly using a wide variety of hardware and a wide variety of compiler optimizations.

The JMM specifies the minimal guarantees the JVM must make about when writes to variables become visible to other threads. It was designed to balance the need for predictability and ease of program development with the realities of implementing high-performance JVMs on a wide range of popular processor architectures. It preserves existing safety guarantees, such as type safety. It also defines the semantics of incompletely or incorrectly synchronized programs so that potential security hazards are minimized.

Concurrency-related building blocks in Java

Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. Each programming language defines the behavior of volatile variables differently; in general, however, the keyword prevents the compiler from applying certain optimizations to accesses to these variables, e.g., such variables are generally not cached in registers and are read from memory each time. The final keyword is used in several different contexts to define an entity that cannot later be changed: a final class cannot be extended, a final method cannot be overridden by subclasses, and a final variable can be assigned only once. The Java Memory Model defines the behavior of volatile and synchronized, and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.
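
As a small illustration (the class and its members are invented for this sketch), the class below uses all three constructs:

   public class Counter {
       private final int step;         // final: assigned exactly once, in the constructor
       private volatile boolean done;  // volatile: writes are immediately visible to all threads
       private int count;              // plain field: guarded by the intrinsic lock below

       public Counter(int step) {
           this.step = step;           // final field frozen when the constructor finishes
       }

       public synchronized void increment() {  // synchronized: mutual exclusion plus visibility
           count += step;
       }

       public synchronized int get() {
           return count;
       }

       public void finish()    { done = true; }
       public boolean isDone() { return done; }
   }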

The JMM Specifications

The Java memory model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering called happens-before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens-before relationship between A and B. In the absence of a happens-before ordering between two operations, the JVM is free to reorder them as it pleases.

A data race occurs when a variable is read by more than one thread and written by at least one thread, but the reads and writes are not ordered by happens-before. A correctly synchronized program is one with no data races; correctly synchronized programs exhibit sequential consistency.

The rules for happens-before are:

  • Program-order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
  • Monitor-lock rule: An unlock on a monitor lock happens-before every subsequent lock on that same monitor lock.
  • Volatile-variable rule: A write to a volatile field happens-before every subsequent read of that same field.
  • Thread-start rule: A call to Thread.start on a thread happens-before every action in the started thread.
  • Thread-termination rule: Any action in a thread happens-before any other thread detects that that thread has terminated, either by returning successfully from Thread.join or by Thread.isAlive returning false (the sketch after this list illustrates these two rules).
  • Interruption rule: A thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted).
  • Finalizer rule: The end of a constructor for an object happens-before the start of the finalizer for that object.
  • Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
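
A minimal sketch of the thread-start and thread-termination rules (the class and variable names are invented for illustration). The write to data before start() is guaranteed to be visible in the child thread, and the child's write to result is guaranteed to be visible to the parent once join() returns:

   public class HandoffExample {
       static int data;    // plain (non-volatile) fields: visibility comes
       static int result;  // entirely from the start/join happens-before edges

       public static void main(String[] args) throws InterruptedException {
           data = 42;                       // happens-before t.start() (program order)
           Thread t = new Thread(() -> {
               int d = data;                // guaranteed to see 42 (thread-start rule)
               result = d * 2;              // happens-before join() returning
           });
           t.start();
           t.join();                        // thread-termination rule
           System.out.println(result);      // guaranteed to print 84
       }
   }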

Locks and unlocks on explicit Lock objects have the same memory semantics as intrinsic locks, and reads and writes of atomic variables have the same memory semantics as volatile variables. For example, if two threads synchronize on the same lock, then all memory accesses performed by the thread releasing the lock are visible to the thread that next acquires it. The JMM allows accesses outside a lock region to be moved into the locked region; this is analogous to the release consistency model covered in Section 10.3.4 of Solihin (2009). Re-entrant acquisitions of a lock the thread already holds, and locks that are provably thread-local, can generally be treated as no-ops.
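
A minimal sketch of the first equivalence, with invented names: state written while holding a ReentrantLock is visible to the next thread that acquires the same lock, just as with a synchronized block:

   import java.util.concurrent.locks.ReentrantLock;

   public class LockVisibility {
       private final ReentrantLock lock = new ReentrantLock();
       private int sharedState;          // guarded by lock

       public void writer(int v) {
           lock.lock();
           try {
               sharedState = v;          // write inside the critical section
           } finally {
               lock.unlock();            // release: publishes sharedState
           }
       }

       public int reader() {
           lock.lock();                  // acquire: sees the prior release
           try {
               return sharedState;
           } finally {
               lock.unlock();
           }
       }
   }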

It is important to note that if two threads synchronize on different locks, then we can't say anything about the ordering of actions between them, since there is no happens-before relation between the actions in the two threads.

Volatile

volatile variables are a useful approach to writing non-blocking algorithms, especially when programmers want to avoid using synchronization, i.e., the synchronized block. Volatile access guarantees visibility and correct ordering. Consider the case where thread 1 writes to a non-volatile variable x and then sets a volatile boolean flag ready, while thread 2 spin-waits on ready before it reads x. Since ready is volatile, there is a happens-before edge from the write of ready to any read that sees that write. If thread 2's read of ready sees the write, then by transitivity (program order in thread 1, the volatile rule across threads, and program order in thread 2), thread 1's write of x happens-before thread 2's read of x, and the read must return the new value of x. If ready were not volatile, one could imagine a compiler transformation that reordered the two writes in thread 1, resulting in a read of true for ready but a read of 0 for x.
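
The scenario above can be written out as a minimal sketch (class, field, and method names are invented for illustration):

   public class ReadyFlag {
       static int x;                       // plain (non-volatile) field
       static volatile boolean ready;      // volatile flag

       // Run in thread 1
       static void writer() {
           x = 42;                         // happens-before the write of ready
           ready = true;                   //   (program order)
       }

       // Run in thread 2
       static void reader() {
           while (!ready) { }              // spin until the volatile read sees true
           assert x == 42;                 // guaranteed by transitivity
       }
   }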

volatile reads and writes go directly to memory, i.e., no cached values are used. Also, accesses to volatile longs and doubles are atomic, while accesses to non-volatile longs and doubles may be split into two 32-bit accesses. This can result in subtle concurrency bugs, although on most current JVMs running on modern 64-bit architectures even non-volatile accesses are atomic in practice. The JMM ensures that volatile reads and writes cannot be reordered past one another; essentially, reads and writes of volatile variables become acquire/release pairs. A volatile write is similar to an unlock, a volatile read is similar to a lock, and a volatile write happens-before all subsequent volatile reads of the same variable.
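
A minimal sketch (field and method names invented): declaring the field volatile guarantees that reads and writes of the 64-bit value are atomic even on 32-bit JVMs, so a reader can never observe a "torn" value assembled from halves of two different writes:

   private volatile long lastTimestamp;   // volatile: 64-bit reads/writes are atomic

   public void record(long now) { lastTimestamp = now; }
   public long last()           { return lastTimestamp; }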

If there are two accesses to a memory location, at least one of them is a write, and the location is not volatile, then there must be a happens-before relationship between the two accesses to avoid a data race.

There is a general impression among programmers that there is a performance penalty for using volatile variables. This concern is often misplaced: for example, on the x86 platform there is no penalty for a read of a volatile variable, i.e., no extra barriers are needed, because the coherence/consistency model of x86 already provides this guarantee. On the other hand, not using volatile variables where they are needed can result in subtle, very difficult-to-fix concurrency bugs.

Piggybacking on synchronization

Since the JMM guarantees happens-before ordering in the cases listed above, it is possible to piggyback on the visibility properties of an existing synchronization. This entails combining the program order rule for happens-before with one of the other ordering rules (usually the monitor lock or volatile variable rule) to order accesses to a variable not otherwise guarded by a lock. This technique is called piggybacking because it uses an existing happens-before ordering that was created for some other reason to ensure the visibility of an object, rather than creating a happens-before ordering specifically for publishing the object. This is commonly done in class libraries. A Java developer, however, should be careful in using this technique as it assumes a good understanding of the JMM and can be fragile otherwise. The happens-before orderings guaranteed by the class library include:

  • Placing an item in a thread-safe collection happens-before another thread retrieves that item from the collection.
  • Releasing a permit to a Semaphore happens-before acquiring a permit from that same Semaphore.
  • Actions taken by the task represented by a Future happen-before another thread successfully returns from Future.get.

Other such guarantees are listed by Goetz.
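
As an illustration of the first guarantee, a producer can hand a mutable object to a consumer through a thread-safe collection and piggyback on the collection's ordering for visibility. The sketch below, with invented names, uses a LinkedBlockingQueue:

   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   public class QueueHandoff {
       static class Task {
           int payload;                    // deliberately not volatile
       }

       static BlockingQueue<Task> queue = new LinkedBlockingQueue<>();

       // Producer thread
       static void produce() throws InterruptedException {
           Task t = new Task();
           t.payload = 99;                 // write before the hand-off
           queue.put(t);                   // put happens-before the take below
       }

       // Consumer thread
       static void consume() throws InterruptedException {
           Task t = queue.take();          // piggybacks on the queue's ordering
           assert t.payload == 99;         // visible without extra synchronization
       }
   }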

Safe Initialization

It is important to initialize objects completely and to ensure that the object's state is entirely visible before publishing a reference to the newly constructed object to other threads. Java does not permit constructors themselves to be declared synchronized, but the same effect can be achieved by publishing the reference only from a synchronized method or block. The treatment of static fields with initializers (i.e., fields whose values are initialized in a static initialization block) offers additional thread-safety guarantees. Static initializers are run by the JVM at class-initialization time, after class loading but before the class is used by any thread. Because the JVM acquires a lock during initialization, and this lock is acquired by each thread at least once to ensure that the class has been loaded, memory writes made during static initialization are automatically visible to all threads. Thus statically initialized objects require no explicit synchronization either during construction or when being referenced. This applies only to the as-constructed state: if the object is mutable, synchronization is still required by both readers and writers to make subsequent modifications visible and to avoid data corruption.
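
A minimal sketch of this guarantee, assuming a Resource class like the one in the double-checked locking example below:

   public class EagerResourceHolder {
       // Runs during class initialization, under the JVM's class-init lock,
       // so the fully constructed Resource is visible to every thread.
       private static final Resource resource = new Resource();

       public static Resource getResource() {
           return resource;
       }
   }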

Initialization safety guarantees that for properly constructed objects, all threads will see the correct values of final fields that were set by the constructor, regardless of how the object is published. Publishing an object means making it available to code outside of its current scope, such as by storing a reference to it where other code can find it, returning it from a non-private method, or passing it to a method in another class. Further, any variables that can be reached through a final field of a properly constructed object (such as the elements of a final array or the contents of a HashMap referenced by a final field) are also guaranteed to be visible to other threads (Goetz).
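
A minimal sketch of initialization safety with final fields (the class name is invented): because both fields are final, any thread that obtains a reference to the object sees the values set in the constructor, even if the object is published through a data race:

   public class SafeImmutablePoint {
       private final int x;
       private final int y;

       public SafeImmutablePoint(int x, int y) {
           this.x = x;                  // final fields frozen at end of constructor
           this.y = y;
       }

       public int getX() { return x; } // any thread sees the constructed values,
       public int getY() { return y; } // no matter how the object was published
   }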

Double-Checked Locking

Lazy initialization has often been used to reduce startup time for an application or to avoid expensive operations that may turn out to be unnecessary. The goal of lazy initialization is to defer initializing an object until it is actually needed, while at the same time ensuring that it is initialized only once. A properly written lazy-initialization method requires synchronization; however, synchronization was perceived as expensive and something to be avoided, and the visibility aspects of synchronization were not completely understood. This led to the invention of the double-checked locking (DCL) idiom, which tries to offer lazy initialization without paying the synchronization penalty on the common code path. Double-checked locking first checks, without synchronizing, whether initialization is needed; if the resource reference is not null, it is used immediately. Otherwise, the thread synchronizes and checks again whether the resource is initialized, ensuring that only one thread actually initializes the shared resource. The common code path is the one that fetches a reference to an already constructed resource, and the idiom elides synchronization on this path. This, however, is broken, because there is no happens-before edge between the unsynchronized read of the reference and the write that publishes the reference to the resource. This means the reader can see a partially constructed resource.

The code snippet below shows an example of the DCL idiom.

   private Resource resource = null;
   public Resource getResource() {
       if (resource == null) {              // first check (no locking)
           synchronized (this) {
               if (resource == null)        // second check (with locking)
                   resource = new Resource();
           }
       }
       return resource;                     // unsynchronized read on the common path
   }

The current JMM specification enables DCL to work if the resource reference is declared volatile. However, the utility of the idiom has largely passed: the motivating factors, such as slow uncontended synchronization and slow JVM startup, are no longer serious concerns. The lazy-initialization holder class idiom offers the same benefits and is easier to understand.
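
A sketch of the lazy-initialization holder class idiom, again assuming a Resource class as in the examples above:

   public class ResourceFactory {
       // The holder class is not initialized until getResource() first
       // references it; the JVM's class initialization then provides the
       // same safe-publication guarantee as the static case above.
       private static class ResourceHolder {
           static final Resource resource = new Resource();
       }

       public static Resource getResource() {
           return ResourceHolder.resource;
       }
   }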

Writing safe Java code

The Java memory model, along with the JDK, makes it easier for Java programmers to write safe concurrent programs. Java programmers should use the concurrency abstractions provided by the JDK libraries, i.e., java.util.concurrent, java.util.concurrent.locks, and synchronized blocks. Programmers can also use low-level primitives such as volatile variables and the atomic classes in the JDK if they need to implement safe non-blocking synchronization. The java.util.concurrent.atomic package provides a toolkit of classes that support lock-free, thread-safe programming on single variables. Java programmers are strongly encouraged to use lock-based thread-safety constructs such as synchronized blocks; modern JVMs implement optimizations such as thin locks and biased locking that reduce the overhead of locks significantly.
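
As a small illustration of the atomic classes (the class and method names are invented), a lock-free counter built on AtomicLong:

   import java.util.concurrent.atomic.AtomicLong;

   public class HitCounter {
       private final AtomicLong hits = new AtomicLong();

       public void record()  { hits.incrementAndGet(); }  // lock-free atomic update
       public long current() { return hits.get(); }       // volatile-read semantics
   }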

References

  1. Yan Solihin, Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems, Solihin Books, August 2009.
  2. David E. Culler, Jaswinder Pal Singh, and Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Gulf Professional Publishing, August 1998.
  3. Jeremy Manson, William Pugh, and Sarita Adve, The Java Memory Model, http://rsim.cs.illinois.edu/Pubs/popl05.pdf
  4. Bill Pugh, Java Memory Model page, http://www.cs.umd.edu/~pugh/java/memoryModel/
  5. Brian Goetz with Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, and Doug Lea, Java Concurrency in Practice, Addison-Wesley, 2006.
  6. Reentrant mutex, http://en.wikipedia.org/wiki/Reentrant_mutex
  7. Jeremy Manson, William Pugh, and Sarita Adve, The Java Memory Model, http://cseweb.ucsd.edu/classes/fa05/cse231/Fish.pdf
  8. Jeremy Manson, The Java Memory Model, Programming Languages Talk Series, http://video.google.com/videoplay?docid=8394326369005388010#
  9. Jeremy Manson, William Pugh, and Sarita Adve, The Java Memory Model, http://www.inf.ethz.ch/personal/daniekro/classes/se-sem/ss2005/slides/sgier.pdf
  10. Final fields in Java, http://www.cs.umd.edu/~pugh/java/memoryModel/may-12.pdf
  11. Doug Lea, Java Memory Model, http://gee.cs.oswego.edu/dl/cpj/jmm.html
  12. The JSR-133 Cookbook for Compiler Writers, http://gee.cs.oswego.edu/dl/jmm/cookbook.html
  13. Java Development Kit, http://en.wikipedia.org/wiki/Java_Development_Kit