Synchronization Optimizations
Efficient mutual exclusion
synchronization is one of the fundamental requirements of effective
parallel computing. The tasks in fine-grain parallel computations, for example,
need fast synchronization for efficient control of their frequent interactions.
Efficient synchronization also promotes the development of reliable parallel
software because it allows programmers to structure programs as a set of
synchronized operations on fine-grain objects.
This methodology helps programmers overcome the challenging
problems (nondeterministic behavior, deadlock, etc.) that
complicate the construction of parallel software.
Efficient mutual exclusion synchronization is also important for
modern multithreaded languages such as Java. In this context, a
primary issue is the need to make class libraries thread-safe.
The standard mechanism is to augment each object with a mutual
exclusion lock. Each operation on the object acquires the lock,
accesses the object, then releases the lock. This mechanism
ensures that the operation executes atomically even in the
face of concurrent accesses by multiple parallel threads.
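In Java, this mechanism surfaces directly as synchronized methods. The sketch below is a minimal illustration (the SharedCounter class is hypothetical, not from any particular library): each method acquires the receiver's lock on entry, accesses the object, and releases the lock on exit.

```java
class SharedCounter {
    private int value = 0;

    // synchronized acquires this object's lock, performs the update,
    // then releases the lock, so concurrent increments never race.
    public synchronized void increment() {
        value++;
    }

    // Reads synchronize as well, so they observe a consistent value.
    public synchronized int get() {
        return value;
    }
}
```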
We have developed several optimizations for
parallel and multithreaded computations that contain
mutual exclusion synchronizations. These optimizations
eliminate mutual exclusion synchronization, reduce its
frequency, replace it with more efficient optimistic
synchronization primitives, and replicate objects to
eliminate the need to synchronize.
- Synchronization Elimination:
If an object is never accessed by more than one thread,
there is no need to synchronize its operations. We
have developed a
combined pointer and escape analysis
that is designed, in part, to find objects that do
not escape from their allocating thread. The compiler can then
generate customized, synchronization-free versions of
the operations that execute on such objects.
Our OOPSLA '99 paper on this
topic presents the algorithm and provides experimental results
from an implementation in a dynamic compiler for Java.
For our benchmark programs, the analysis eliminates between
24% and 67% of the synchronization operations. A sketch of a
non-escaping object appears after this list.
- Lock Coarsening:
The goal of lock coarsening is to reduce the frequency
with which the computation acquires and releases locks.
We have developed two approaches: data lock coarsening,
which associates one lock with multiple objects that tend
to be accessed together, and computation lock coarsening, which
coalesces a sequence of critical sections that repeatedly
acquire and release the same lock
into a single critical section that acquires and releases the lock only once.
Our experimental results show that lock coarsening can significantly
reduce the synchronization overhead in programs that use mutual
exclusion synchronization; a before/after sketch of the
transformation appears after this list.
Our JPDC
paper presents an algorithm that performs both data and
computation lock coarsening for object-based programs.
Our POPL '97 paper presents
a set of general transformations that move lock constructs
both within and between procedures to coarsen the lock
granularity.
- Dynamic Feedback:
There is a trade-off associated with lock coarsening:
increasing the granularity reduces the synchronization overhead,
but also increases the size of the critical regions, which may
reduce the amount of available concurrency. We have developed
an algorithm that generates several different versions of the
program; the versions differ in how aggressively they apply
the lock coarsening transformation. When the program runs, the
generated code samples the performance of each version, then
uses the version with the best performance. The code periodically
resamples to adapt to changes in the best version.
Our TOCS paper presents the
algorithm and experimental results. We also published a
PLDI '97 paper on this topic. A sketch of the sampling
strategy appears after this list.
- Optimistic Synchronization:
Another way to reduce the synchronization overhead is to
replace mutual exclusion locks with
efficient optimistic synchronization primitives
such as load linked/store conditional. Our
PPOPP '97 paper on
this topic presents our experience using optimistic
synchronization in the context of a parallelizing compiler for
object-based programs. Our results show that
using optimistic synchronization can improve performance
and significantly reduce the amount of memory
required to execute the computation; a compare-and-swap sketch
of the retry idiom appears after this list.
- Adaptive Replication:
In many programs it is possible to eliminate synchronization
bottlenecks by replicating frequently accessed objects that cause
these bottlenecks. Each thread that updates such an
object creates its own local replica and performs all updates locally
on that replica with no synchronization and no interaction with other threads.
When the computation needs to access the final value of the object, it combines
the values stored in the replicas to generate the final value.
A key question is determining which objects to replicate:
replicating all objects can cause significant and unnecessary memory
and computation overheads. We have developed a technique that
dynamically measures the contention at each object and replicates
only those objects that would otherwise cause synchronization
bottlenecks (see the accumulator sketch after this list).
Our ICS '99 paper
presents experimental results from an implementation of this technique.
The results show that,
for our set of benchmark programs, adaptive replication
can improve the overall performance by up to a factor of
three over versions that use no replication, and can
reduce the memory consumption by up to a factor of four over
versions that fully replicate updated objects. Furthermore,
adaptive replication never significantly
harms the performance and never increases the memory usage
without a corresponding increase in the performance.
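To make the synchronization elimination item concrete, here is a hypothetical method (not taken from our benchmarks). StringBuffer's operations are synchronized, but the buffer never escapes the allocating thread, so a compiler equipped with the escape analysis can substitute synchronization-free versions of those operations:

```java
class EscapeExample {
    // sb is allocated here, used only within this method, and never
    // stored where another thread could reach it, so it does not
    // escape its allocating thread. Its synchronized operations can
    // therefore run without acquiring the lock.
    static String format(int x, int y) {
        StringBuffer sb = new StringBuffer();  // never escapes
        sb.append('(').append(x).append(", ").append(y).append(')');
        return sb.toString();  // only the immutable String escapes
    }
}
```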
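For the lock coarsening item, here is a hand-written before/after sketch of the computation lock coarsening transformation (the Account class is hypothetical; the papers apply the transformation automatically):

```java
class Account {
    private long balance;
    void deposit(long amount) { balance += amount; }

    // Before coarsening: the lock is acquired and released on every
    // iteration of the loop.
    static void depositAll(Account acct, long[] amounts) {
        for (long a : amounts) {
            synchronized (acct) {
                acct.deposit(a);
            }
        }
    }

    // After coarsening: one critical section spans the loop, so the
    // lock is acquired and released only once. The critical region is
    // larger, which is the concurrency trade-off that dynamic
    // feedback addresses.
    static void depositAllCoarsened(Account acct, long[] amounts) {
        synchronized (acct) {
            for (long a : amounts) {
                acct.deposit(a);
            }
        }
    }
}
```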
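For the dynamic feedback item, here is a minimal sketch of the sampling strategy under stated assumptions: versions holds differently coarsened but semantically interchangeable versions of the computation, each run() call performs one unit of work, and the interval lengths are illustrative rather than the settings from the papers:

```java
class DynamicFeedback {
    static final long SAMPLE_MS = 10;       // illustrative interval
    static final long PRODUCTION_MS = 1000; // illustrative interval

    // Sampling phase: run each version for a fixed interval and count
    // how many units of work it completes.
    static int sampleBestVersion(Runnable[] versions) {
        int best = 0;
        long bestCount = -1;
        for (int v = 0; v < versions.length; v++) {
            long deadline = System.currentTimeMillis() + SAMPLE_MS;
            long count = 0;
            while (System.currentTimeMillis() < deadline) {
                versions[v].run();  // one unit of work under version v
                count++;
            }
            if (count > bestCount) { bestCount = count; best = v; }
        }
        return best;
    }

    // Production phase: run the winner for a longer interval, then
    // resample so the choice adapts if the best version changes.
    static void execute(Runnable[] versions) {
        while (true) {  // runs for the lifetime of the computation
            int best = sampleBestVersion(versions);
            long deadline = System.currentTimeMillis() + PRODUCTION_MS;
            while (System.currentTimeMillis() < deadline) {
                versions[best].run();
            }
        }
    }
}
```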
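For the optimistic synchronization item: the papers use the load linked/store conditional instructions directly, which Java does not expose; the closest standard analogue is compare-and-swap, so this sketch shows the same read-compute-commit-retry idiom with AtomicInteger.compareAndSet:

```java
import java.util.concurrent.atomic.AtomicInteger;

class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Read the current value, compute the update, and commit it only
    // if no other thread changed the value in the meantime. No lock
    // is ever held, so there is nothing to block on.
    public void add(int delta) {
        while (true) {
            int old = value.get();  // read (analogous to load linked)
            if (value.compareAndSet(old, old + delta)) {  // conditional commit
                return;
            }
            // another thread intervened between the read and the
            // commit; retry with the fresh value
        }
    }
}
```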
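Finally, for the adaptive replication item, here is a minimal accumulator sketch under the assumption that the updates are commutative and associative, so per-thread partial results can be combined at the end. Note that the technique described above replicates adaptively, based on measured contention; this sketch replicates unconditionally:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ReplicatedSum {
    // All replicas, registered as threads create them.
    private final Queue<long[]> replicas = new ConcurrentLinkedQueue<>();

    // Each thread lazily creates its own private replica.
    private final ThreadLocal<long[]> local = ThreadLocal.withInitial(() -> {
        long[] replica = new long[1];
        replicas.add(replica);  // register for the final combine
        return replica;
    });

    // Updates touch only the caller's replica: no locks, no
    // interaction with other threads.
    public void add(long delta) {
        local.get()[0] += delta;
    }

    // Combine the replicas to produce the final value; assumed to run
    // after all updating threads have finished.
    public long total() {
        long sum = 0;
        for (long[] replica : replicas) sum += replica[0];
        return sum;
    }
}
```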
Here are several papers that present research related to
synchronization optimizations:
- Compositional Pointer and Escape Analysis for Java Programs.
  John Whaley and Martin Rinard.
  Proceedings of the 14th Annual ACM SIGPLAN Conference on Object-Oriented
  Programming Systems, Languages, and Applications (OOPSLA '99),
  Denver, CO, November 1999.
- Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized
  Object-Based Programs.
  Pedro C. Diniz and Martin C. Rinard.
  Journal of Parallel and Distributed Computing, Volume 49, Number 2,
  March 1998, pp. 218-244.
- Synchronization Transformations for Parallel Computing.
  Pedro C. Diniz and Martin C. Rinard.
  Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of
  Programming Languages (POPL '97), Paris, France, January 1997.
- Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized
  Object-Based Programs.
  Pedro C. Diniz and Martin C. Rinard.
  Languages and Compilers for Parallel Computing, Ninth International
  Workshop, San Jose, California, August 1996.
- Eliminating Synchronization Overhead in Automatically Parallelized
  Programs Using Dynamic Feedback.
  Pedro C. Diniz and Martin C. Rinard.
  ACM Transactions on Computer Systems, to appear.
- Dynamic Feedback: An Effective Technique for Adaptive Computing.
  Pedro C. Diniz and Martin C. Rinard.
  Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language
  Design and Implementation (PLDI '97), Las Vegas, Nevada, June 1997.
- Effective Fine-Grain Synchronization for Automatically Parallelized
  Programs Using Optimistic Synchronization Primitives.
  Martin C. Rinard.
  Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and
  Practice of Parallel Programming (PPoPP '97), Las Vegas, Nevada, June 1997.
- Eliminating Synchronization Bottlenecks in Object-Based Programs Using
  Adaptive Replication.
  Martin C. Rinard and Pedro C. Diniz.
  Proceedings of the 1999 ACM International Conference on Supercomputing
  (ICS '99), Rhodes, Greece, June 1999.