### Symbolic Analysis of Divide and Conquer Algorithms

Divide and conquer algorithms solve problems by breaking them up into smaller subproblems, recursively solving the subproblems, then combining the results to generate a solution to the original problem. A simple algorithm that works well for small problem sizes terminates the recursion. Good divide and conquer algorithms exist for a large variety of problems, including sorting, matrix manipulation, and many dynamic programming problems. Divide and conquer algorithms have several appealing properties that make them a good match for modern parallel machines.

• Parallelism: Divide and conquer algorithms tend to have a lot of inherent parallelism. Once the division phase is complete, the subproblems are usually independent and can therefore be solved in parallel. Moreover, the recursive structure of the algorithm naturally leads to recursively generated concurrency: the divide and combine phases of one subproblem can execute in parallel with the divide and combine phases of other subproblems. This approach typically generates more than enough concurrency to keep the machine busy.
• Cache Performance: Divide and conquer algorithms also tend to have good cache performance. Once a subproblem fits in the cache, the standard recursive solution reuses the cached data until the subproblem has been completely solved. Because most of the work takes place deep in the recursive call tree, the algorithm usually spends most of its execution time running out of the cache. Furthermore, divide and conquer algorithms naturally work well with a range of cache sizes and at all levels of the memory hierarchy. As soon as a subproblem fits into one level of the memory hierarchy, the algorithm runs out of that level (or below) until the subproblem has been solved. Divide and conquer programs therefore automatically adapt to different cache hierarchies, and tend to run well without modification on whatever machine is available.

In spite of these advantages, divide and conquer algorithms present an imposing set of analysis challenges. The natural formulation of these algorithms is recursive. For efficiency reasons, programs often use dynamic memory allocation to match the sizes of the data structures to the problem size, and often use pointers into arrays and pointer arithmetic to identify subproblems. Moreover, traditional analyses for parallelizing compilers are of little or no help for this class of programs: they are designed to analyze loop nests that access arrays using affine array index expressions, not recursive procedures that use pointers and pointer arithmetic.

The key analysis problem for divide and conquer programs is characterizing how procedures access subregions of allocated memory blocks. This analysis must be symbolic, extracting bounds for the accessed regions in terms of the parameters of the procedure. So, for example, the analysis must extract facts of the form "the procedure f(double *p, int n) accesses the region of memory from p to p+n-1". In general, procedures may use pointers and pointer arithmetic to access multiple potentially overlapping regions, significantly complicating the analysis.

We have developed an analysis that formulates the problem as a system of symbolic constraints between multivariate polynomials in the program variables, then reduces the constraint system to a linear program. The solution to the linear program can be used to solve several important analysis problems, including static race detection for parallel divide and conquer programs, automatic parallelization of sequential divide and conquer programs, static detection of array bounds violations, and elimination of array bounds checks. In short, the analysis can verify that the program is completely free of a wide range of memory access anomalies that can cause very nasty programming errors. See our PLDI paper for more information on this technique.
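To give a flavor of the constraint formulation (schematically only; see the PLDI paper for the actual development), consider a procedure `void f(double *p, int n)` that writes `*p` in its base case and recursively calls `f(p, n/2)` and `f(p + n/2, n - n/2)`. The analysis introduces symbolic bounds `L(p, n)` and `U(p, n)` for the region `f` writes, each a polynomial in the parameters, and each statement contributes constraints:

```
write *p               :  L(p,n) <= p                 U(p,n) >= p
call f(p, n/2)         :  L(p,n) <= L(p, n/2)         U(p,n) >= U(p, n/2)
call f(p+n/2, n-n/2)   :  L(p,n) <= L(p+n/2, n-n/2)   U(p,n) >= U(p+n/2, n-n/2)
```

Choosing `L(p, n) = p` and `U(p, n) = p + n - 1` satisfies every constraint, and the reduction to a linear program solves for the polynomial coefficients that make the derived region as tight as possible.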

Our earlier PPoPP paper on this subject presents an approach based on a set of dataflow analysis algorithms; that paper focuses on automatic parallelization.