
4 Future Work / Ideas

Although we found software cache coherence to help for several applications, it may not be useful in all cases. If the application is regular enough, the programmer may be able to handle the caching of remote values more efficiently. It would be ideal if caching could be provided as an optional feature. With our implementation, it would be simple to allow the user to choose whether or not to use caching on a per-variable basis. Global variables using automatic caching would invoke our SWCC-Split-C library calls, while others would just use the standard Split-C library calls.
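The per-variable choice described above can be pictured with a minimal C sketch. Everything in it is an assumption made for illustration: `gvar_t`, `gvar_read`, `plain_read`, and `cached_read` are hypothetical names, not part of the Split-C or SWCC-Split-C libraries, remote memory is simulated with a local array, and no coherence actions are modeled. It shows only the dispatch on a per-variable flag and the resulting difference in remote accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REMOTE_WORDS 16

/* Hypothetical stand-in for remote memory, plus a count of network accesses. */
static int remote_mem[REMOTE_WORDS];
static int remote_accesses = 0;

/* Hypothetical descriptor for a global variable with a per-variable policy. */
typedef struct {
    size_t index;      /* location in the simulated remote memory */
    bool   use_cache;  /* true: route reads through the caching layer */
    bool   cached;     /* true: local_copy currently holds a fetched value */
    int    local_copy; /* locally cached value */
} gvar_t;

/* Stand-in for a standard (uncached) remote read: always uses the network. */
static int plain_read(const gvar_t *v) {
    remote_accesses++;
    return remote_mem[v->index];
}

/* Stand-in for a cached read: fetch on the first access, then hit locally.
 * Invalidation and write handling are deliberately left out of this sketch. */
static int cached_read(gvar_t *v) {
    if (!v->cached) {
        v->local_copy = plain_read(v);
        v->cached = true;
    }
    return v->local_copy;
}

/* Dispatch on the policy chosen for this particular variable. */
int gvar_read(gvar_t *v) {
    assert(v != NULL && v->index < REMOTE_WORDS);
    return v->use_cache ? cached_read(v) : plain_read(v);
}
```

With this model, five reads of a variable marked `use_cache` cost one remote access, while five reads of an unmarked variable cost five.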
An even better possibility may be to use our SWCC as a tool for an optimizing compiler. Optimizations have been created for the Split-C compiler that analyze a program and automatically "cache" global variables in local memory when the compiler detects repeated accesses [6]. However, the ability to evaluate the program at compile time is limited, and our software cache coherence library could be used as a fall-back when the compiler isn't able to manage the caching statically. Ideally, the compiler would even determine which variables are likely to benefit from software caching.

The semantics of the Split-C language provide the programmer with precise knowledge of where variables live, and which ones are local or remote. This environment limits the usefulness of software cache coherence, since the programmer can easily control the data layout and access patterns. However, in many parallel programming languages, the user doesn't have such detailed knowledge, and software cache coherence becomes more important. An example is the Titanium programming language [12]. Titanium doesn't provide the programmer with the notion of local and remote as in Split-C. Additionally, there is a back-end for the Titanium compiler which targets the Split-C library, so it could be an ideal platform to further test our SWCC optimizations.

Several aspects of our MSI coherence protocol incur unnecessary network transactions and overhead. For example, if a block is being used in a producer-consumer relationship, the block will be in the shared state with one or more readers (consumers). Then the producer will obtain the block in the modified state by invalidating the consumers' copies. After the producer writes the new data, the consumers again get the block in the shared state, and the cycle repeats. A better scenario would be for the producer to send out updates to the block, rather than invalidating the consumers' copies.
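The producer-consumer cycle can be made concrete with a small message-counting model in C. The cost model is an assumption chosen for illustration, not a measurement of any implementation: the producer is taken to be the home node of the block, every invalidation or update is acknowledged, every consumer re-reads the block after each write, and each re-fetch costs one request and one reply. The function names are hypothetical.

```c
#include <assert.h>

#define MAX_CONSUMERS 64

typedef enum { COPY_INVALID, COPY_SHARED } copy_state_t;

/* Messages for `rounds` write-then-read rounds under write-invalidate.
 * Assumes all consumers start with a valid shared copy. */
long invalidate_messages(int consumers, int rounds) {
    assert(consumers >= 0 && consumers <= MAX_CONSUMERS && rounds >= 0);
    copy_state_t state[MAX_CONSUMERS];
    long msgs = 0;
    for (int c = 0; c < consumers; c++) state[c] = COPY_SHARED;
    for (int r = 0; r < rounds; r++) {
        /* Producer write: invalidate every shared copy (invalidate + ack). */
        for (int c = 0; c < consumers; c++) {
            if (state[c] == COPY_SHARED) { msgs += 2; state[c] = COPY_INVALID; }
        }
        /* Consumer reads: each miss costs a request and a data reply. */
        for (int c = 0; c < consumers; c++) {
            if (state[c] == COPY_INVALID) { msgs += 2; state[c] = COPY_SHARED; }
        }
    }
    return msgs;
}

/* Messages for the same workload under a write-update policy. */
long update_messages(int consumers, int rounds) {
    assert(consumers >= 0 && consumers <= MAX_CONSUMERS && rounds >= 0);
    long msgs = 0;
    for (int r = 0; r < rounds; r++) {
        /* Producer write: push the new data to each consumer (update + ack).
         * Copies stay valid, so the consumers' reads hit locally. */
        for (int c = 0; c < consumers; c++) msgs += 2;
    }
    return msgs;
}
```

Under these assumptions, invalidation costs four messages per consumer per round and update costs two. A different acknowledgment scheme or home-node placement would change the constants, but not the fact that the update policy removes the re-fetch.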
Another example of unnecessary network transactions occurs when a processor reads a variable and then writes it soon after. In this scenario, the processor first obtains the block in the shared state, and then must send an additional request to upgrade it to the modified state. A better sequence would be for the processor to do a read exclusive, obtaining the block in the modified state with the first access.
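The read-then-write case can be sketched as a per-block state machine in C that counts the requests a processor sends. The sketch is illustrative only: `block_t` and the three access functions are hypothetical names, only the requesting processor's view of one block is modeled, and replies, invalidations of other sharers, and directory state are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MSI_INVALID, MSI_SHARED, MSI_MODIFIED } msi_state_t;

/* One processor's view of a single block. */
typedef struct {
    msi_state_t state;
    int requests;  /* coherence requests sent on behalf of this block */
} block_t;

/* Ordinary read: a miss fetches the block in the shared state. */
void block_read(block_t *b) {
    assert(b != NULL);
    if (b->state == MSI_INVALID) {
        b->requests++;
        b->state = MSI_SHARED;
    }
}

/* Write: needs the modified state, so a shared or invalid block
 * costs one request (an upgrade or a write miss). */
void block_write(block_t *b) {
    assert(b != NULL);
    if (b->state != MSI_MODIFIED) {
        b->requests++;
        b->state = MSI_MODIFIED;
    }
}

/* Read exclusive: obtains the block in the modified state on the
 * first access, so a later write needs no further request. */
void block_read_exclusive(block_t *b) {
    assert(b != NULL);
    if (b->state != MSI_MODIFIED) {
        b->requests++;
        b->state = MSI_MODIFIED;
    }
}
```

Starting from the invalid state, `block_read` followed by `block_write` sends two requests, while `block_read_exclusive` followed by `block_write` sends one.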
It would be easy to add update and read exclusive functionality to our SWCC-Split-C library. We could either allow the user to specify which kind of transaction to perform, or let an optimizing pass of the compiler choose the best option. The Munin system [1,2] takes a slightly different approach, and allows the programmer to indicate the type of access pattern expected for each variable. This is another option to explore, or we could even attempt to determine the access pattern dynamically.

Because private variables can be mixed with shared variables in the address space provided by the Split-C compiler, we were forced to make copies of shared blocks in the directory entries. Ideally, this could be avoided if we had a segmented address space, such as in Shasta [10,11]. Additionally, with more control over the memory layout we could better align data to benefit from the block size of our cache coherence system and to avoid false sharing. Another aspect of Shasta that a segmented address space could enable is variable coherence granularity. Shasta has the ability to choose a different block size for each page of shared data. With the ability to control memory layout, it should be possible to have variable coherence granularity with our SWCC implementation; we could further segment the address space into regions with different block sizes. This could be a great benefit in choosing the best granularity for each data structure and in eliminating false sharing.
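One way to picture variable coherence granularity over a segmented address space is a region table that maps an address to a coherence block. The C sketch below is a hypothetical illustration, not a description of Shasta or of our library: `region_t` and `lookup_block` are invented names, and the regions are assumed to be non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared region with its own coherence block size. */
typedef struct {
    uintptr_t base;        /* first address of the region */
    size_t    length;      /* region size in bytes */
    size_t    block_size;  /* coherence granularity in bytes */
} region_t;

/* Map an address to (region index, block index within that region).
 * Returns 0 on success, or -1 if the address lies in no shared region
 * (for example, private data). */
int lookup_block(const region_t *regions, size_t n, uintptr_t addr,
                 size_t *region_out, size_t *block_out) {
    assert(regions != NULL && region_out != NULL && block_out != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(regions[i].block_size > 0);
        if (addr >= regions[i].base &&
            addr - regions[i].base < regions[i].length) {
            *region_out = i;
            *block_out = (addr - regions[i].base) / regions[i].block_size;
            return 0;
        }
    }
    return -1;
}
```

In this model, two addresses 64 bytes apart fall in different blocks of a region with 64-byte blocks, but share a block in a region with 1024-byte blocks, so the choice of region determines whether writes to them can cause false sharing.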

