Conventional parallel computer architectures do not provide support
for non-uniformly distributed objects. In this thesis, I introduce
sparsely faceted arrays (SFAs), a new low-level mechanism for naming
regions of memory, or facets, on different processors in a
distributed, shared-memory parallel processing system. Sparsely
faceted arrays address the disconnect between the global distributed
arrays provided by conventional architectures (e.g., the Cray T3
series), and the requirements of high-level parallel programming
methods that need objects distributed over only a
subset of processing elements. A sparsely faceted array names a
virtual globally-distributed array, but actual facets are lazily
allocated. By providing simple semantics and making economical use of
memory, SFAs enable efficient implementation of a variety of
non-uniformly distributed data structures and related algorithms. I
present example applications which use SFAs, and describe and evaluate
simple hardware mechanisms for implementing SFAs.
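To make the mechanism concrete, the following C sketch shows one way a
node-local runtime might realize lazy facet allocation.  The names
(\texttt{sfa\_declare}, \texttt{sfa\_facet}), the flat per-node table,
and the use of software allocation are illustrative assumptions only,
not the hardware mechanism developed in this thesis.
\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>

#define MAX_SFAS 1024

/* Per-node facet table: maps an SFA's global name (a small integer id
 * here) to this node's local facet storage.  NULL means this node has
 * never touched that SFA, so no facet exists and no memory is spent. */
typedef struct {
    void  *facets[MAX_SFAS];
    size_t facet_bytes[MAX_SFAS];
} sfa_node_state;

static sfa_node_state local;   /* one instance per processing node */

/* Record the per-node facet size for a new SFA; allocate nothing yet. */
void sfa_declare(int sfa_id, size_t facet_bytes)
{
    local.facet_bytes[sfa_id] = facet_bytes;
}

/* Return this node's facet, allocating it lazily on first access.
 * Nodes that never access the SFA never allocate a facet for it. */
void *sfa_facet(int sfa_id)
{
    if (local.facets[sfa_id] == NULL)
        local.facets[sfa_id] = calloc(1, local.facet_bytes[sfa_id]);
    return local.facets[sfa_id];
}
\end{verbatim}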
Keeping track of which nodes have allocated facets for a particular
SFA is an important task that suggests the need for automatic memory
management, including garbage collection. To address this need, I
first argue that conventional tracing techniques such as mark/sweep
and copying GC are inherently unscalable in parallel systems. I then
present a parallel memory-management strategy, based on
reference counting, that is capable of garbage-collecting sparsely
faceted arrays. I also discuss opportunities for hardware support of
this garbage collection strategy.
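The following C sketch, with illustrative names and a stub in place of
real inter-node messaging, suggests how a reference count on an SFA's
global name could drive reclamation of exactly those facets that were
lazily allocated; the strategy developed in the body of this thesis,
and its hardware support, differ in detail.
\begin{verbatim}
#include <stdio.h>

#define MAX_NODES 64   /* illustrative machine size */

/* Stand-in for the inter-node message asking a remote node to free its
 * facet; a real system would send a network message instead. */
static void send_free_request(int node_id)
{
    printf("free facet on node %d\n", node_id);
}

/* Bookkeeping for one SFA: a count of live references to its global
 * name, plus the nodes known to have lazily allocated a facet. */
typedef struct {
    int refcount;
    int facet_holders[MAX_NODES];
    int num_holders;
} sfa_gc_record;

void sfa_ref_added(sfa_gc_record *r)
{
    r->refcount++;
}

/* When the last reference disappears, reclaim exactly the facets that
 * were actually allocated; the holder list built up during lazy
 * allocation is what makes this targeted reclamation possible. */
void sfa_ref_dropped(sfa_gc_record *r)
{
    if (--r->refcount == 0) {
        for (int i = 0; i < r->num_holders; i++)
            send_free_request(r->facet_holders[i]);
        r->num_holders = 0;
    }
}
\end{verbatim}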
I have implemented a high-level hardware/OS simulator featuring
hardware support for sparsely faceted arrays and automatic garbage
collection. I describe the simulator and outline several of the
practical details associated with a ``real'' implementation of SFAs and
SFA-aware garbage collection. Simulation results are used throughout
this thesis in the evaluation of hardware support mechanisms.