Source code for Nabbit, task graph library. Feel free to send all
questions, comments, or possibly feature requests to me via
In order to use Nabbit, you must have Cilk++ installed. You can get
Cilk++ via Intel's software development website. Some useful links:
NOTE: (11/25/10) Within the last year, Intel has changed
Cilk++ into Intel Cilk Plus. I have not had a chance to do testing of
Nabbit on the latest version of Intel Cilk Plus, although it is
certainly on my TODO list. In principle, there should be little to
nothing that needs to be changed, but you never know for sure until
Aside from comments in the code, the main documentation for Nabbit is
the following paper at IPDPS.
Executing Task Graphs Using Work-Stealing
by Kunal Agrawal, Charles E. Leiserson, and Jim Sukha
Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Atlanta, GA, USA
April 19--23, 2010
To download the paper:
Slides from IPDPS are also posted.
Dynamic Programming Example
One of the benchmarks I implemented for the paper was a dynamic
program modeling the Smith-Waterman dynamic program with a generic
penalty gap. The equations for this dynamic program take the form:
Note that this dynamic program is not uniform, i.e., the computation
by each task graph node is not constant, but increases as we get
closer to the lower left corner (higher i and j).
At some point, I would be interested in implementing a real
Smith-Waterman dynamic program; however I think often, people use the
O(n^2) version of the dynamic program instead of the
O(n^3) version which I have described here, so there may be
Using Nabbit, we can execute this task graph in parallel, by creating a
task graph whose dependency structure is an N by N grid.
For efficiency, instead of having each task node be the computation of
a single value of M(i, j), we use a base case
where each task node computes a B by B block of values.
The following video captures a sample parallel execution of this
dynamic program using Nabbit to other fork-join executions:
with P=4, N = 750 and B = 64
This video compares a parallel execution using Nabbit to other
fork-join versions of the computation written in Cilk++, without using
Comparison with Fork-Join computation with P=16, N =
3000 and B = 16
Until I have time to set up a more permanent (and dynamic) solution, I
am posting updates to the code and doucmentation here.
- 11/25/10: Added some video animations of execution using
Nabbit. Unfortunately, I haven't done a particularly good job
figuring out how to embed videos correctly and scaling them to the
right size, but hopefully I will figure out a better way to do this
- 5/28/10: Posted version 1 of Nabbit.
- 5/28/10: Also added separate code for indexing of 2d
arrays, arraged in Morton order, and arranged in a two-level blocked
format. This code is identical to the arrays directory in the
full Nabbit library.
Back to main page.