Presents a combined pointer and escape analysis for Java programs with unstructured multithreading. The algorithm analyzes interactions between threads to obtain precise points-to and escape information for objects accessible to multiple threads. The results are used to eliminate synchronization and to eliminate the dynamic checks associated with region-based memory allocation. Unlike previous synchronization elimination algorithms, this algorithm can eliminate synchronizations on the same object from different threads, provided that task creation actions temporally separate the synchronizations.
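As a hedged illustration of the pattern this analysis exploits (the class and method names below are hypothetical, not from the paper): the parent thread's critical section completes before it starts the child thread, so the two synchronizations on the same object can never overlap and both locks can be removed.

    // The parent's synchronized append() finishes before the child
    // thread exists; the child's append() runs only after start().
    // The two critical sections on 'log' are temporally separated
    // by the task creation action, so both locks are eliminable.
    class Log {
        private final StringBuilder buffer = new StringBuilder();
        synchronized void append(String s) { buffer.append(s); }
    }

    class Parent {
        public static void main(String[] args) {
            Log log = new Log();
            log.append("parent: setup");   // runs before the child is created
            Thread child = new Thread(() -> log.append("child: work"));
            child.start();                 // creation separates the two appends
        }
    }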
Effective Fine-Grain Synchronization for Automatically Parallelized
Programs Using Optimistic Synchronization Primitives
Martin C. Rinard
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming, pp. 112-123
Las Vegas, Nevada, June 1997
Presents an algorithm that analyzes irregular, object-based computations to replace heavyweight lock synchronization with lightweight optimistic synchronization primitives such as load-linked/store-conditional. The results show that the use of optimistic synchronization in this context can significantly reduce memory consumption and improve overall performance.
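Java does not expose load-linked/store-conditional directly; a compare-and-swap retry loop from java.util.concurrent.atomic is the closest portable analogue. A minimal sketch of replacing a per-bin lock with an optimistic update (the Histogram class is illustrative, not the paper's benchmark):

    import java.util.concurrent.atomic.AtomicInteger;

    class Histogram {
        private final AtomicInteger[] bins;

        Histogram(int n) {
            bins = new AtomicInteger[n];
            for (int i = 0; i < n; i++) bins[i] = new AtomicInteger();
        }

        // Optimistic update: read the current value, compute the new
        // value, and commit only if no other thread intervened. On
        // most hardware this compiles down to LL/SC or CAS rather
        // than a lock acquire/release pair.
        void increment(int bin) {
            AtomicInteger b = bins[bin];
            int old;
            do {
                old = b.get();
            } while (!b.compareAndSet(old, old + 1));
        }
    }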
Dynamic Feedback: An Effective Technique for Adaptive Computing
Pedro C. Diniz and Martin C. Rinard
Proceedings of the ACM SIGPLAN '97 Conference on Programming Language Design and Implementation, pp. 71-84
Las Vegas, Nevada, June 1997
The compiler produces several different versions of the same source code; each version is compiled with a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.
We apply this approach to optimize the lock coarsening granularity. Lock coarsening algorithms merge critical sections to reduce the frequency and overhead of synchronization. The tradeoff is that lock coarsening may reduce the amount of available concurrency. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that it is difficult to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy.
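A minimal sketch of the sampling/production alternation, assuming two or more precompiled versions and wall-clock time as the overhead measure; all names are illustrative rather than the paper's implementation:

    // Alternates short sampling phases, which time each compiled
    // version, with long production phases that run the winner.
    interface Version { void runChunk(); }

    class DynamicFeedback {
        private final Version[] versions;  // e.g. different lock coarsening policies
        private final long sampleChunks = 10, productionChunks = 1000;

        DynamicFeedback(Version... versions) { this.versions = versions; }

        void run(long totalChunks) {
            long done = 0;
            while (done < totalChunks) {
                Version best = versions[0];
                long bestTime = Long.MAX_VALUE;
                for (Version v : versions) {          // sampling phase
                    long start = System.nanoTime();
                    for (int i = 0; i < sampleChunks && done < totalChunks; i++, done++)
                        v.runChunk();
                    long elapsed = System.nanoTime() - start;
                    if (elapsed < bestTime) { bestTime = elapsed; best = v; }
                }
                for (int i = 0; i < productionChunks && done < totalChunks; i++, done++)
                    best.runChunk();                  // production phase
            }                                         // loop back: periodic resampling
        }
    }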
Presents a combined pointer and escape analysis for Java programs. The algorithm is designed to analyze arbitrary regions of complete or incomplete programs, obtaining complete information for objects that do not escape the analyzed regions. The information is used to eliminate synchronization for objects that do not escape their allocating thread and to allocate objects on the stack instead of in the heap.
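A standard illustration of the opportunity (not code from the paper): a synchronized StringBuffer that never escapes its allocating method can have its locks elided and its storage placed on the stack.

    // The StringBuffer allocated here is captured: it never escapes
    // format(), so no other thread can observe it. The analysis can
    // therefore remove the synchronization inside each append() and
    // stack-allocate the buffer instead of heap-allocating it.
    class Formatter {
        static String format(String name, int value) {
            StringBuffer sb = new StringBuffer();  // does not escape
            sb.append(name);                       // lock elidable
            sb.append('=');
            sb.append(value);
            return sb.toString();                  // only the String escapes
        }
    }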
Synchronization Transformations for Parallel Computing
Pedro C. Diniz and Martin C. Rinard
Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of
Programming Languages, pp. 187-200
Paris, France, January 1997
Presents an algorithm that moves lock acquire and release operations through multithreaded programs. When a release is immediately followed by an acquire of the same lock, the adjacent pair cancels, reducing the synchronization frequency and overhead.
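A before/after sketch of the cancellation at the Java source level (the paper operates on compiler-manipulated parallel code; this example is only illustrative):

    class Account {
        private int balance;
        private final Object lock = new Object();

        // Before: two adjacent critical sections on the same lock.
        void depositTwiceBefore(int a, int b) {
            synchronized (lock) { balance += a; }  // release...
            synchronized (lock) { balance += b; }  // ...then immediate reacquire
        }

        // After: the interior release/acquire pair cancels, leaving
        // one acquire and one release for the whole computation.
        void depositTwiceAfter(int a, int b) {
            synchronized (lock) {
                balance += a;
                balance += b;
            }
        }
    }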
Replicates objects in multithreaded programs to eliminate synchronization bottlenecks that occur when multiple threads attempt to concurrently update the same object. Avoids unnecessary replication by dynamically measuring the contention at each object and replicating only those objects that would otherwise cause synchronization bottlenecks.
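The later java.util.concurrent.atomic.LongAdder class embodies the same replication idea in library form, not the paper's mechanism: under contention it splits a counter into per-thread cells so concurrent updates touch different memory, and sum() combines the replicas on read.

    import java.util.concurrent.atomic.LongAdder;

    class ContentionDemo {
        static final LongAdder hits = new LongAdder();

        public static void main(String[] args) throws InterruptedException {
            Thread[] ts = new Thread[4];
            for (int i = 0; i < ts.length; i++) {
                ts[i] = new Thread(() -> {
                    for (int j = 0; j < 1_000_000; j++) hits.increment();
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
            System.out.println(hits.sum());  // combines the replicas: 4000000
        }
    }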
Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs
Pedro C. Diniz and Martin C. Rinard
Languages and Compilers for Parallel Computing, Ninth International Workshop, pp. 284-299
San Jose, California, August 1996
Analyzes the program to apply two kinds of lock coarsening. Computation lock coarsening finds computations that repeatedly acquire and release the same lock, then transforms them to acquire and release the lock once. Data lock coarsening transforms the computation to use one lock for multiple objects instead of a separate lock for each object. Our experimental results show that together, these transformations can significantly reduce the synchronization overhead in our benchmark applications.
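Illustrative before/after sketches of both transformations; the Particle and Coarsening classes are hypothetical, not the paper's benchmarks:

    class Particle {
        double x;
        synchronized void push(double d) { x += d; }
    }

    class Coarsening {
        // Before: each push() acquires and releases the particle's lock.
        static void sweepBefore(Particle p, double[] deltas) {
            for (double d : deltas) p.push(d);
        }

        // After computation lock coarsening: acquire the lock once
        // around the whole loop and update the field directly.
        static void sweepAfter(Particle p, double[] deltas) {
            synchronized (p) {
                for (double d : deltas) p.x += d;
            }
        }

        // Data lock coarsening: one shared lock protects a whole
        // group of particles instead of a lock per particle.
        static void sweepAll(Object groupLock, Particle[] ps, double d) {
            synchronized (groupLock) {
                for (Particle p : ps) p.x += d;
            }
        }
    }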