Traditional memory fences are program-counter (PC) based. That is, a
memory fence enforces a serialization point in the program instruction
stream --- it ensures that all memory references before the fence in the
program order have taken effect before the execution continues onto
instructions after the fence. Such PC-based memory fences always cause the
processor to stall, even when the synchronization is unnecessary during a
particular execution. We propose the concept of location-based
memory fences, which aim to reduce the cost of synchronization due to the
latency of memory fence execution in parallel algorithms.
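
As a point of reference, the conventional PC-based fence described above can be sketched in C++ with a Dekker-style handshake. This is a minimal illustration and is not taken from the work itself: the names `flag1`, `flag2`, `announce_and_check`, and `count_violations` are hypothetical, and `std::atomic_thread_fence` stands in for whatever fence instruction the target processor provides. Each thread pays for the fence on every call, even in executions where the other thread never inspects its flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flags for a Dekker-style handshake between two threads.
std::atomic<int> flag1{0};
std::atomic<int> flag2{0};

// Announce intent by setting our flag, then check the other thread's flag.
// Returns true if the other thread's flag was still unset.
bool announce_and_check(std::atomic<int>& mine, std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    // PC-based fence: executed on every call, whether or not the other
    // thread is reading our flag during this particular execution.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return theirs.load(std::memory_order_relaxed) == 0;
}

// Run the handshake `rounds` times and count the rounds in which both
// threads saw the other's flag unset (the outcome the fences rule out).
int count_violations(int rounds) {
    int violations = 0;
    for (int i = 0; i < rounds; ++i) {
        flag1.store(0);
        flag2.store(0);
        bool a = false;
        bool b = false;
        std::thread t1([&] { a = announce_and_check(flag1, flag2); });
        std::thread t2([&] { b = announce_and_check(flag2, flag1); });
        t1.join();
        t2.join();
        if (a && b) {
            ++violations;
        }
    }
    return violations;
}
```

With both fences in place, at most one thread per round can observe the other's flag as unset. A location-based fence would aim to keep that guarantee while letting a thread skip the stall whenever no other thread reads its flag.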

Unlike a PC-based memory fence, a location-based memory fence serializes
the instruction stream of the executing thread T1 only when a different
thread T2 attempts to read the memory location guarded by the
location-based memory fence. In this work, we describe a hardware
mechanism for location-based memory fences, prove its correctness, and
evaluate its potential performance benefit. Our experimental results are
based on a software simulation of the proposed location-based memory fence,
which is expected to incur higher overhead than the proposed hardware
mechanism would. Nevertheless, our software experiments show that
applications can benefit from using location-based memory fences, although
in some cases the applications do not scale as well because of the software
overhead. These results suggest that hardware support for location-based
memory fences is worth considering.
