Martin Rinard

Parallel Computing

Commutativity Analysis

A key problem that complicates the automatic parallelization of irregular, object-oriented computations written in languages such as Java and C++ is the pervasive use of irregular linked data structures such as lists, trees and graphs. Our research attacks this problem using a new analysis technique called commutativity analysis.

Commutativity analysis exploits the structure of the object-oriented paradigm to view the computation as consisting of operations on objects. It then analyzes the computation at this granularity to determine if operations commute (i.e. generate the same result regardless of the order in which they execute). If all of the operations in a given computation commute, the compiler can automatically generate parallel code. The generated code uses mutual exclusion locks to make operations in parallel phases execute atomically.

Our experimental results demonstrate the potential of this approach. We have built a prototype compiler that uses commutativity analysis as its primary analysis technique. This compiler accepts an unannotated program written in a subset of C++ and automatically generates a parallel C++ program that performs the same computation. In addition to the parallelization transformation, we also implemented a variety of synchronization optimizations. We used the compiler to automatically parallelize several complete computations.

The following papers present research related to commutativity analysis:

The following papers present research on related optimization techniques:

For more information, see my list of papers.

Synchronization Optimizations

Efficient mutual exclusion synchronization is one of the fundamental requirements of effective parallel computing. The tasks in fine-grain parallel computations, for example, need fast synchronization for efficient control of their frequent interactions. Efficient synchronization also promotes the development of reliable parallel software because it allows programmers to structure programs as a set of synchronized operations on fine-grain objects. This development methodology helps programmers overcome the challenging problems (nondeterministic behavior, deadlock, etc.) that complicate the development of parallel software.

Efficient mutual exclusion synchronization is also important for modern multithreaded languages such as Java. In this context, a primary issue is the need to make class libraries thread-safe. The standard mechanism is to augment each object with a mutual exclusion lock. Each operation on the object acquires the lock, accesses the object, then releases the lock. This mechanism ensures that the operation executes atomically even in the face of concurrent accesses by multiple parallel threads.

We have developed several optimizations for parallel and multithreaded computations that contain mutual exclusion synchronizations. These optimizations eliminate mutual exclusion synchronization, reduce its frequency, replace it with more efficient optimistic synchronization primitives, and replicate objects to eliminate the need to synchronize.

Here are several papers that present research related to synchronization optimizations:

For more information, see my list of papers.

Symbolic Analysis of Divide and Conquer Programs

Divide and conquer algorithms solve problems by breaking them up into smaller subproblems, recursively solving the subproblems, then combining the results to generate a solution to the original problem. A simple algorithm that works well for small problem sizes terminates the recursion. Good divide and conquer algorithms exist for a large variety of problems, including sorting, matrix manipulation, and many dynamic programming problems. Divide and conquer algorithms have several appealing properties that make them a good match for modern parallel machines.

In spite of these advantages, divide and conquer algorithms present an imposing set of analysis challenges. The natural formulation of these algorithms is recursive. For efficiency reasons, programs often use dynamic memory allocation to match the sizes of the data structures to the problem size, and often use pointers into arrays and pointer arithmetic to identify subproblems. Moreover, traditional analyses for parallelizing compilers are of little or no help for this class of programs: they are designed to analyze loop nests that access arrays using affine array index expressions, not recursive procedures that use pointers and pointer arithmetic.

The key analysis problem for divide and conquer programs is characterizing how procedures access subregions of allocated memory blocks. This analysis must be symbolic, extracting bounds for the accessed regions in terms of the parameters of the procedure. So, for example, the analysis must extract facts of the form "the procedure f(double *p, int n) accesses the region of memory from p to p+n-1". In general, procedures may use pointers and pointer arithmetic to access multiple potentially overlapping regions, significantly complicating the analysis.

We have developed the following analyses and optimizations:

The following papers present research related to this topic:

For more information, see my list of papers.


Jade

Jade is an implicitly parallel programming language for irregular, coarse-grain parallelism.