Synchronization Optimizations
Efficient mutual exclusion
synchronization is one of the fundamental requirements of effective
parallel computing. The tasks in fine-grain parallel computations, for example,
need fast synchronization for efficient control of their frequent interactions.
Efficient synchronization also promotes the development of reliable parallel
software because it allows programmers to structure programs as a set of
synchronized operations on fine-grain objects.
This methodology helps programmers overcome the challenging
problems (nondeterministic behavior, deadlock, etc.) that
complicate the construction of parallel software.
Efficient mutual exclusion synchronization is also important for
modern multithreaded languages such as Java. In this context, a
primary issue is the need to make class libraries thread-safe.
The standard mechanism is to augment each object with a mutual
exclusion lock. Each operation on the object acquires the lock,
accesses the object, then releases the lock. This mechanism
ensures that the operation executes atomically even in the
face of concurrent accesses by multiple parallel threads.
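In Java, this mechanism surfaces directly as synchronized methods. The sketch below is a minimal illustration (the SharedCounter class is hypothetical, not from any particular library): each method acquires the receiver's lock on entry, accesses the object, and releases the lock on exit.

```java
class SharedCounter {
    private int value = 0;

    // synchronized acquires this object's lock, performs the update,
    // then releases the lock, so concurrent increments never race.
    public synchronized void increment() {
        value++;
    }

    // Reads synchronize as well, so they observe a consistent value.
    public synchronized int get() {
        return value;
    }
}
```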
We have developed several optimizations for
parallel and multithreaded computations that contain
mutual exclusion synchronizations. These optimizations
eliminate mutual exclusion synchronization, reduce its
frequency, replace it with more efficient optimistic
synchronization primitives, and replicate objects to
eliminate the need to synchronize.
- Synchronization Elimination:
If an object is never accessed by more than one thread,
there is no need to synchronize its operations. We
have developed a
combined pointer and escape analysis
that is designed, in part, to find objects that do
not escape from their allocating thread. The compiler can then
generate customized, synchronization-free versions of
the operations that execute on such objects.
Our OOPSLA '99 paper on this
topic presents the algorithm and provides experimental results
from an implementation in a dynamic compiler for Java.
For our benchmark programs, the analysis eliminates between
24% and 67% of the synchronization operations. A sketch of a
non-escaping object appears after this list.
- Lock Coarsening:
The goal of lock coarsening is to reduce the frequency
with which the computation acquires and releases locks.
We have developed two approaches: data lock coarsening,
which associates one lock with multiple objects that tend
to be accessed together, and computation lock coarsening, which
coalesces a sequence of critical sections that repeatedly
acquire and release the same lock
into a single critical section that acquires and releases the lock only once.
Our experimental results show that lock coarsening can significantly
reduce the synchronization overhead in programs that use mutual
exclusion synchronization; a before/after sketch of the
transformation appears after this list.
Our JPDC
paper presents an algorithm that performs both data and
computation lock coarsening for object-based programs.
Our POPL '97 paper presents
a set of general transformations that move lock constructs
both within and between procedures to coarsen the lock
granularity.
- Dynamic Feedback:
There is a trade-off associated with lock coarsening:
increasing the granularity reduces the synchronization overhead,
but also increases the size of the critical regions, which may
reduce the amount of available concurrency. We have developed
an algorithm that generates several different versions of the
program; the versions differ in how aggressively they apply
the lock coarsening transformation. When the program runs, the
generated code samples the performance of each version, then
uses the version with the best performance. The code periodically
resamples to adapt to changes in the best version.
Our TOCS paper presents the
algorithm and experimental results. We also published a
PLDI '97 paper on this topic. A sketch of the sampling
strategy appears after this list.
- Optimistic Synchronization:
Another way to reduce the synchronization overhead is to
replace mutual exclusion locks with
efficient optimistic synchronization primitives
such as load linked/store conditional. Our
PPOPP '97 paper on
this topic presents our experience using optimistic
synchronization in the context of a parallelizing compiler for
object-based programs. Our results show that
using optimistic synchronization can improve performance
and significantly reduce the amount of memory
required to execute the computation; a compare-and-swap sketch
of the retry idiom appears after this list.
- Adaptive Replication:
In many programs it is possible to eliminate synchronization
bottlenecks by replicating frequently accessed objects that cause
these bottlenecks. Each thread that updates such an
object creates its own local replica and performs all updates locally
on that replica with no synchronization and no interaction with other threads.
When the computation needs to access the final value of the object, it combines
the values stored in the replicas to generate the final value.
A key question is determining which objects to replicate:
replicating all objects can cause significant and unnecessary memory
and computation overheads. We have developed a technique that
dynamically measures the contention at each object and replicates
only those objects that would otherwise cause synchronization
bottlenecks (see the accumulator sketch after this list).
Our ICS '99 paper
presents experimental results from an implementation of this technique.
The results show that,
for our set of benchmark programs, adaptive replication
can improve the overall performance by up to a factor of
three over versions that use no replication, and can
reduce the memory consumption by up to a factor of four over
versions that fully replicate updated objects. Furthermore,
adaptive replication never significantly
harms the performance and never increases the memory usage
without a corresponding increase in the performance.
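To make the synchronization elimination item concrete, here is a hypothetical method (not taken from our benchmarks). StringBuffer's operations are synchronized, but the buffer never escapes the allocating thread, so a compiler equipped with the escape analysis can substitute synchronization-free versions of those operations:

```java
class EscapeExample {
    // sb is allocated here, used only within this method, and never
    // stored where another thread could reach it, so it does not
    // escape its allocating thread. Its synchronized operations can
    // therefore run without acquiring the lock.
    static String format(int x, int y) {
        StringBuffer sb = new StringBuffer();  // never escapes
        sb.append('(').append(x).append(", ").append(y).append(')');
        return sb.toString();  // only the immutable String escapes
    }
}
```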
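For the lock coarsening item, here is a hand-written before/after sketch of the computation lock coarsening transformation (the Account class is hypothetical; the papers apply the transformation automatically):

```java
class Account {
    private long balance;
    void deposit(long amount) { balance += amount; }

    // Before coarsening: the lock is acquired and released on every
    // iteration of the loop.
    static void depositAll(Account acct, long[] amounts) {
        for (long a : amounts) {
            synchronized (acct) {
                acct.deposit(a);
            }
        }
    }

    // After coarsening: one critical section spans the loop, so the
    // lock is acquired and released only once. The critical region is
    // larger, which is the concurrency trade-off that dynamic
    // feedback addresses.
    static void depositAllCoarsened(Account acct, long[] amounts) {
        synchronized (acct) {
            for (long a : amounts) {
                acct.deposit(a);
            }
        }
    }
}
```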
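For the dynamic feedback item, here is a minimal sketch of the sampling strategy under stated assumptions: versions holds differently coarsened but semantically interchangeable versions of the computation, each run() call performs one unit of work, and the interval lengths are illustrative rather than the settings from the papers:

```java
class DynamicFeedback {
    static final long SAMPLE_MS = 10;       // illustrative interval
    static final long PRODUCTION_MS = 1000; // illustrative interval

    // Sampling phase: run each version for a fixed interval and count
    // how many units of work it completes.
    static int sampleBestVersion(Runnable[] versions) {
        int best = 0;
        long bestCount = -1;
        for (int v = 0; v < versions.length; v++) {
            long deadline = System.currentTimeMillis() + SAMPLE_MS;
            long count = 0;
            while (System.currentTimeMillis() < deadline) {
                versions[v].run();  // one unit of work under version v
                count++;
            }
            if (count > bestCount) { bestCount = count; best = v; }
        }
        return best;
    }

    // Production phase: run the winner for a longer interval, then
    // resample so the choice adapts if the best version changes.
    static void execute(Runnable[] versions) {
        while (true) {  // runs for the lifetime of the computation
            int best = sampleBestVersion(versions);
            long deadline = System.currentTimeMillis() + PRODUCTION_MS;
            while (System.currentTimeMillis() < deadline) {
                versions[best].run();
            }
        }
    }
}
```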
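For the optimistic synchronization item: the papers use the load linked/store conditional instructions directly, which Java does not expose; the closest standard analogue is compare-and-swap, so this sketch shows the same read-compute-commit-retry idiom with AtomicInteger.compareAndSet:

```java
import java.util.concurrent.atomic.AtomicInteger;

class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Read the current value, compute the update, and commit it only
    // if no other thread changed the value in the meantime. No lock
    // is ever held, so there is nothing to block on.
    public void add(int delta) {
        while (true) {
            int old = value.get();  // read (analogous to load linked)
            if (value.compareAndSet(old, old + delta)) {  // conditional commit
                return;
            }
            // another thread intervened between the read and the
            // commit; retry with the fresh value
        }
    }
}
```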
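Finally, for the adaptive replication item, here is a minimal accumulator sketch under the assumption that the updates are commutative and associative, so per-thread partial results can be combined at the end. Note that the technique described above replicates adaptively, based on measured contention; this sketch replicates unconditionally:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ReplicatedSum {
    // All replicas, registered as threads create them.
    private final Queue<long[]> replicas = new ConcurrentLinkedQueue<>();

    // Each thread lazily creates its own private replica.
    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> {
        long[] replica = new long[1];
        replicas.add(replica);  // register for the final combine
        return replica;
    });

    // Updates touch only the caller's replica: no locks, no
    // interaction with other threads.
    public void add(long delta) {
        local.get()[0] += delta;
    }

    // Combine the replicas to produce the final value; assumed to run
    // after all updating threads have finished.
    public long total() {
        long sum = 0;
        for (long[] replica : replicas) sum += replica[0];
        return sum;
    }
}
```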
Here are several papers that present research related to
synchronization optimizations:
- Compositional Pointer and Escape Analysis for Java Programs.
  John Whaley and Martin Rinard.
  Proceedings of the 14th Annual ACM SIGPLAN Conference on Object-Oriented
  Programming Systems, Languages, and Applications (OOPSLA '99),
  Denver, CO, November 1999.
- Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized
  Object-Based Programs.
  Pedro C. Diniz and Martin C. Rinard.
  Journal of Parallel and Distributed Computing, Volume 49, Number 2,
  March 1998, pp. 218-244.
- Synchronization Transformations for Parallel Computing.
  Pedro C. Diniz and Martin C. Rinard.
  Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of
  Programming Languages (POPL '97), Paris, France, January 1997.
- Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized
  Object-Based Programs.
  Pedro C. Diniz and Martin C. Rinard.
  Languages and Compilers for Parallel Computing, Ninth International
  Workshop, San Jose, California, August 1996.
- Eliminating Synchronization Overhead in Automatically Parallelized
  Programs Using Dynamic Feedback.
  Pedro C. Diniz and Martin C. Rinard.
  ACM Transactions on Computer Systems, to appear.
- Dynamic Feedback: An Effective Technique for Adaptive Computing.
  Pedro C. Diniz and Martin C. Rinard.
  Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language
  Design and Implementation (PLDI '97), Las Vegas, Nevada, June 1997.
- Effective Fine-Grain Synchronization for Automatically Parallelized
  Programs Using Optimistic Synchronization Primitives.
  Martin C. Rinard.
  Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and
  Practice of Parallel Programming (PPoPP '97), Las Vegas, Nevada, June 1997.
- Eliminating Synchronization Bottlenecks in Object-Based Programs Using
  Adaptive Replication.
  Martin C. Rinard and Pedro C. Diniz.
  Proceedings of the 1999 ACM International Conference on Supercomputing
  (ICS '99), Rhodes, Greece, June 1999.