Presents a combined pointer and escape analysis for Java programs with unstructured multithreading. The algorithm analyzes interactions between threads to obtain precise points-to and escape information for objects accessible to multiple threads. The results are used to eliminate synchronization and to eliminate the dynamic checks associated with region-based memory allocation. Unlike previous synchronization elimination algorithms, this algorithm can eliminate synchronizations on the same object from different threads, provided that task creation actions temporally separate the synchronizations.
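As a hedged illustration of the pattern this analysis exploits (the class and method names below are hypothetical, not from the paper): the parent thread's critical section completes before it starts the child thread, so the two synchronizations on the same object can never overlap and both locks can be removed.

    // The parent's synchronized append() finishes before the child
    // thread exists; the child's append() runs only after start().
    // The two critical sections on 'log' are temporally separated
    // by the task creation action, so both locks are eliminable.
    class Log {
        private final StringBuilder buffer = new StringBuilder();
        synchronized void append(String s) { buffer.append(s); }
    }

    class Parent {
        public static void main(String[] args) {
            Log log = new Log();
            log.append("parent: setup");   // runs before the child is created
            Thread child = new Thread(() -> log.append("child: work"));
            child.start();                 // creation separates the two appends
        }
    }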
Effective Fine-Grain Synchronization for Automatically Parallelized
Programs Using Optimistic Synchronization Primitives
Martin C. Rinard
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, pp. 112-123
Las Vegas, Nevada, June 1997
Presents an algorithm that analyzes irregular, object-based computations to replace heavyweight lock synchronization with lightweight optimistic synchronization primitives such as load-linked/store-conditional. The results show that the use of optimistic synchronization in this context can significantly reduce memory consumption and improve overall performance.
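Java does not expose load-linked/store-conditional directly; a compare-and-swap retry loop from java.util.concurrent.atomic is the closest portable analogue. A minimal sketch of replacing a per-bin lock with an optimistic update (the Histogram class is illustrative, not the paper's benchmark):

    import java.util.concurrent.atomic.AtomicInteger;

    class Histogram {
        private final AtomicInteger[] bins;

        Histogram(int n) {
            bins = new AtomicInteger[n];
            for (int i = 0; i < n; i++) bins[i] = new AtomicInteger();
        }

        // Optimistic update: read the current value, compute the new
        // value, and commit only if no other thread intervened. On
        // most hardware this compiles down to LL/SC or CAS rather
        // than a lock acquire/release pair.
        void increment(int bin) {
            AtomicInteger b = bins[bin];
            int old;
            do {
                old = b.get();
            } while (!b.compareAndSet(old, old + 1));
        }
    }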
Dynamic Feedback: An Effective Technique for Adaptive Computing
Pedro C. Diniz and Martin C. Rinard
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pp. 71-84
Las Vegas, Nevada, June 1997
The compiler produces several different versions of the same source code; each version is compiled with a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.
We apply this approach to optimize the lock coarsening granularity. Lock coarsening algorithms merge critical sections to reduce the frequency and overhead of synchronization. The tradeoff is that lock coarsening may reduce the amount of available concurrency. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that it is difficult to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy.
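A minimal sketch of the sampling/production alternation, assuming two or more precompiled versions and wall-clock time as the overhead measure; all names are illustrative rather than the paper's implementation:

    // Alternates short sampling phases, which time each compiled
    // version, with long production phases that run the winner.
    interface Version { void runChunk(); }

    class DynamicFeedback {
        private final Version[] versions;  // e.g. different lock coarsening policies
        private final long sampleChunks = 10, productionChunks = 1000;

        DynamicFeedback(Version... versions) { this.versions = versions; }

        void run(long totalChunks) {
            long done = 0;
            while (done < totalChunks) {
                Version best = versions[0];
                long bestTime = Long.MAX_VALUE;
                for (Version v : versions) {          // sampling phase
                    long start = System.nanoTime();
                    for (int i = 0; i < sampleChunks && done < totalChunks; i++, done++)
                        v.runChunk();
                    long elapsed = System.nanoTime() - start;
                    if (elapsed < bestTime) { bestTime = elapsed; best = v; }
                }
                for (int i = 0; i < productionChunks && done < totalChunks; i++, done++)
                    best.runChunk();                  // production phase
            }                                         // loop back: periodic resampling
        }
    }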
Presents a combined pointer and escape analysis for Java programs. The algorithm is designed to analyze arbitrary regions of complete or incomplete programs, obtaining complete information for objects that do not escape the analyzed regions. The information is used to eliminate synchronization for objects that do not escape their allocating thread and to allocate objects on the stack instead of in the heap.
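A standard illustration of the opportunity (not code from the paper): a synchronized StringBuffer that never escapes its allocating method can have its locks elided and its storage placed on the stack.

    // The StringBuffer allocated here is captured: it never escapes
    // format(), so no other thread can observe it. The analysis can
    // therefore remove the synchronization inside each append() and
    // stack-allocate the buffer instead of heap-allocating it.
    class Formatter {
        static String format(String name, int value) {
            StringBuffer sb = new StringBuffer();  // does not escape
            sb.append(name);                       // lock elidable
            sb.append('=');
            sb.append(value);
            return sb.toString();                  // only the String escapes
        }
    }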
Synchronization Transformations for Parallel Computing
Pedro C. Diniz and Martin C. Rinard
Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of
Programming Languages, pp. 187-200
Paris, France, January 1997
Presents an algorithm that moves lock acquire and release operations through multithreaded programs. When a release is immediately followed by an acquire of the same lock, the adjacent pair cancels, reducing the synchronization frequency and overhead.
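A before/after sketch of the cancellation at the Java source level (the paper operates on compiler-manipulated parallel code; this example is only illustrative):

    class Account {
        private int balance;
        private final Object lock = new Object();

        // Before: two adjacent critical sections on the same lock.
        void depositTwiceBefore(int a, int b) {
            synchronized (lock) { balance += a; }  // release...
            synchronized (lock) { balance += b; }  // ...then immediate reacquire
        }

        // After: the interior release/acquire pair cancels, leaving
        // one acquire and one release for the whole computation.
        void depositTwiceAfter(int a, int b) {
            synchronized (lock) {
                balance += a;
                balance += b;
            }
        }
    }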
Replicates objects in multithreaded programs to eliminate synchronization bottlenecks that occur when multiple threads attempt to concurrently update the same object. Avoids unnecessary replication by dynamically measuring the contention at each object and replicating only those objects that would otherwise cause synchronization bottlenecks.
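The later java.util.concurrent.atomic.LongAdder class embodies the same replication idea in library form, not the paper's mechanism: under contention it splits a counter into per-thread cells so concurrent updates touch different memory, and sum() combines the replicas on read.

    import java.util.concurrent.atomic.LongAdder;

    class ContentionDemo {
        static final LongAdder hits = new LongAdder();

        public static void main(String[] args) throws InterruptedException {
            Thread[] ts = new Thread[4];
            for (int i = 0; i < ts.length; i++) {
                ts[i] = new Thread(() -> {
                    for (int j = 0; j < 1_000_000; j++) hits.increment();
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
            System.out.println(hits.sum());  // combines the replicas: 4000000
        }
    }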
Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs
Pedro C. Diniz and Martin C. Rinard
Languages and Compilers for Parallel Computing, Ninth International Workshop, pp. 284-299
San Jose, California, August 1996
Analyzes the program to apply two kinds of lock coarsening. Computation lock coarsening finds computations that repeatedly acquire and release the same lock, then transforms them to acquire and release the lock once. Data lock coarsening transforms the computation to use one lock for multiple objects instead of a separate lock for each object. Our experimental results show that together, these transformations can significantly reduce the synchronization overhead in our benchmark applications.
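Illustrative before/after sketches of both transformations; the Particle and Coarsening classes are hypothetical, not the paper's benchmarks:

    class Particle {
        double x;
        synchronized void push(double d) { x += d; }
    }

    class Coarsening {
        // Before: each push() acquires and releases the particle's lock.
        static void sweepBefore(Particle p, double[] deltas) {
            for (double d : deltas) p.push(d);
        }

        // After computation lock coarsening: acquire the lock once
        // around the whole loop and update the field directly.
        static void sweepAfter(Particle p, double[] deltas) {
            synchronized (p) {
                for (double d : deltas) p.x += d;
            }
        }

        // Data lock coarsening: one shared lock protects a whole
        // group of particles instead of a lock per particle.
        static void sweepAll(Object groupLock, Particle[] ps, double d) {
            synchronized (groupLock) {
                for (Particle p : ps) p.x += d;
            }
        }
    }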