
Item 18: Amortize the cost of expected computations.

In Item 17, I extolled the virtues of laziness, of putting things off as long as possible, and I explained how laziness can improve the efficiency of your programs. In this item, I adopt a different stance. Here, laziness has no place. I now encourage you to improve the performance of your software by having it do more than it's asked to do. The philosophy of this item might be called over-eager evaluation: doing things before you're asked to do them.

Consider, for example, a template for classes representing large collections of numeric data:
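(A sketch of what such a template's interface might look like; the name DataCollection is illustrative, and only the members this Item discusses are shown:)

    template<class NumericalType>
    class DataCollection {
    public:
      NumericalType min() const;
      NumericalType max() const;
      NumericalType avg() const;
      // ...
    };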

Assuming the min, max, and avg functions return the current minimum, maximum, and average values of the collection, there are three ways in which these functions can be implemented. Using eager evaluation, we'd examine all the data in the collection when min, max, or avg was called, and we'd return the appropriate value. Using lazy evaluation, we'd have the functions return data structures that could be used to determine the appropriate value whenever the functions' return values were actually used. Using over-eager evaluation, we'd keep track of the running minimum, maximum, and average values of the collection, so when min, max, or avg was called, we'd be able to return the correct value immediately — no computation would be required. If min, max, and avg were called frequently, we'd be able to amortize the cost of keeping track of the collection's minimum, maximum, and average values over all the calls to those functions, and the amortized cost per call would be lower than with eager or lazy evaluation.

The idea behind over-eager evaluation is that if you expect a computation to be requested frequently, you can lower the average cost per request by designing your data structures to handle the requests especially efficiently.

One of the simplest ways to do this is by caching values that have already been computed and are likely to be needed again. For example, suppose you're writing a program to provide information about employees, and one of the pieces of information you expect to be requested frequently is an employee's cubicle number. Further suppose that employee information is stored in a database, but, for most applications, an employee's cubicle number is irrelevant, so the database is not optimized to find it. To avoid having your specialized application unduly stress the database with repeated lookups of employee cubicle numbers, you could write a findCubicleNumber function that caches the cubicle numbers it looks up. Subsequent requests for cubicle numbers that have already been retrieved can then be satisfied by consulting the cache instead of querying the database.

Here's one way to implement findCubicleNumber; it uses a map object from the Standard Template Library (the "STL" — see Item 35) as a local cache:
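(In the sketch below, a hypothetical helper, lookUpInDatabase, stands in for the real database query:)

    #include <map>
    #include <string>
    using namespace std;

    int lookUpInDatabase(const string& employeeName);   // hypothetical helper:
                                                        // queries the database

    int findCubicleNumber(const string& employeeName)
    {
      // define a static map to hold (employee name, cubicle number)
      // pairs; this map is the local cache
      typedef map<string, int> CubicleMap;
      static CubicleMap cubes;

      // try to find an entry for employeeName in the cache; the
      // iterator "it" will point to the found entry, if there is one
      CubicleMap::iterator it = cubes.find(employeeName);

      if (it == cubes.end()) {                          // no cache entry, so
        int cubicle = lookUpInDatabase(employeeName);   // consult the database,
        cubes[employeeName] = cubicle;                  // then add the result
        return cubicle;                                 // to the cache
      }
      else {
        // "it" points to the cached (employee name, cubicle number)
        // pair; we want only the second component of that pair
        return (*it).second;
      }
    }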

Try not to get bogged down in the details of the STL code (which will be clearer after you've read Item 35). Instead, focus on the general strategy embodied by this function. That strategy is to use a local cache to replace comparatively expensive database queries with comparatively inexpensive lookups in an in-memory data structure. Provided we're correct in assuming that cubicle numbers will frequently be requested more than once, the use of a cache in findCubicleNumber should reduce the average cost of returning an employee's cubicle number.

One detail of the code requires explanation. The final statement returns (*it).second instead of the more conventional it->second. Why? The answer has to do with the conventions followed by the STL. In brief, the iterator it is an object, not a pointer, so there is no guarantee that "->" can be applied to it.6 The STL does require that "." and "*" be valid for iterators, however, so (*it).second, though syntactically clumsy, is guaranteed to work.

Caching is one way to amortize the cost of anticipated computations. Prefetching is another. You can think of prefetching as the computational equivalent of a discount for buying in bulk. Disk controllers, for example, read entire blocks or sectors of data when they read from disk, even if a program asks for only a small amount of data. That's because it's faster to read a big chunk once than to read two or three small chunks at different times. Furthermore, experience has shown that if data in one place is requested, it's quite common to want nearby data, too. This is the infamous locality of reference phenomenon, and systems designers rely on it to justify disk caches, memory caches for both instructions and data, and instruction prefetches.

Excuse me? You say you don't worry about such low-level things as disk controllers or CPU caches? No problem. Prefetching can yield dividends for even one as high-level as you. Imagine, for example, you'd like to implement a template for dynamic arrays, i.e., arrays that start with a size of one and automatically extend themselves so that all nonnegative indices are valid:
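(A sketch of the interface and usage such a template might support; memory management is pared to essentials, and copying is ignored:)

    template<class T>
    class DynArray {
    public:
      DynArray(): data(new T[1]), maxIndex(0) {}   // initially, only
      ~DynArray() { delete [] data; }              // index 0 is valid
      T& operator[](int index);                    // declared here, defined
                                                   // in the versions below
    private:
      T *data;        // the array's storage
      int maxIndex;   // the current maximum valid index
    };

    DynArray<double> a;   // at this point, only a[0]
                          // is a legitimate array element

    a[22] = 3.5;          // a is automatically extended:
                          // indices 0-22 are now valid

    a[32] = 0;            // a extends itself again;
                          // a[0]-a[32] are now valid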

How does a DynArray object go about extending itself when it needs to? A straightforward strategy would be to allocate only as much additional memory as needed, something like this:
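(A sketch building on the DynArray declared above; std::out_of_range stands in for whatever exception a real implementation would throw:)

    #include <algorithm>
    #include <stdexcept>

    template<class T>
    T& DynArray<T>::operator[](int index)
    {
      if (index < 0) {
        throw std::out_of_range("DynArray: negative index");   // negative
      }                                                        // indices are
                                                               // still invalid
      if (index > maxIndex) {
        T *newData = new T[index + 1];          // allocate only as much
        std::copy(data, data + maxIndex + 1,    // additional memory as
                  newData);                     // needed, copy the old
        delete [] data;                         // elements over, and
        data = newData;                         // release the old storage
        maxIndex = index;
      }
      return data[index];
    }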

This approach simply calls new each time it needs to increase the size of the array, but calls to new invoke operator new (see Item 8), and calls to operator new (and operator delete) are usually expensive. That's because they typically result in calls to the underlying operating system, and system calls are generally slower than are in-process function calls. As a result, we'd like to make as few system calls as possible.

An over-eager evaluation strategy employs this reasoning: if we have to increase the size of the array now to accommodate index i, the locality of reference principle suggests we'll probably have to increase it in the future to accommodate some other index a bit larger than i. To avoid the cost of the memory allocation for the second (anticipated) expansion, we'll increase the size of the DynArray now by more than is required to make i valid, and we'll hope that future expansions occur within the range we have thereby provided for. For example, we could write DynArray::operator[] like this:
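(A sketch of that strategy; for simplicity it conflates the array's logical size with its allocated capacity, which a real implementation would track separately:)

    template<class T>
    T& DynArray<T>::operator[](int index)
    {
      if (index < 0) {
        throw std::out_of_range("DynArray: negative index");
      }
      if (index > maxIndex) {
        int diff = index - maxIndex;       // anticipate future expansions:
        int newMaxIndex = index + diff;    // extend twice as far past the
                                           // old maximum as required
        T *newData = new T[newMaxIndex + 1];
        std::copy(data, data + maxIndex + 1, newData);
        delete [] data;
        data = newData;
        maxIndex = newMaxIndex;
      }
      return data[index];
    }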

This function allocates twice as much memory as needed each time the array must be extended. If we look again at the usage scenario we saw earlier, we note that the DynArray must allocate additional memory only once, even though its logical size is extended twice:
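(Annotations assume the doubling sketch above:)

    DynArray<double> a;   // only a[0] is valid

    a[22] = 3.5;          // new is called: a's storage is
                          // expanded through index 44

    a[32] = 0;            // a[32] falls within the storage
                          // already allocated, so new is
                          // not called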

If a needs to be extended again, that extension, too, will be inexpensive, provided the new maximum index is no greater than 44.

There is a common theme running through this Item, and that's that greater speed can often be purchased at a cost of increased memory usage. Keeping track of running minima, maxima, and averages requires extra space, but it saves time. Caching results necessitates greater memory usage but reduces the time needed to regenerate the results once they've been cached. Prefetching demands a place to put the things that are prefetched, but it reduces the time needed to access those things. The story is as old as Computer Science: you can often trade space for time. (Not always, however. Using larger objects means fewer fit on a virtual memory or cache page. In rare cases, making objects bigger reduces the performance of your software, because your paging activity increases, your cache hit rate decreases, or both. How do you find out if you're suffering from such problems? You profile, profile, profile (see Item 16).)

The advice I proffer in this Item — that you amortize the cost of anticipated computations through over-eager strategies like caching and prefetching — is not contradictory to the advice on lazy evaluation I put forth in Item 17. Lazy evaluation is a technique for improving the efficiency of programs when you must support operations whose results are not always needed. Over-eager evaluation is a technique for improving the efficiency of programs when you must support operations whose results are almost always needed or whose results are often needed more than once. Both are more difficult to implement than run-of-the-mill eager evaluation, but both can yield significant performance improvements in programs whose behavioral characteristics justify the extra programming effort.


6 In July 1995, the ISO/ANSI committee standardizing C++ added a requirement that STL iterators support the "->" operator, so it->second should now work. Some STL implementations fail to satisfy this requirement, however, so (*it).second is still the more portable construct.