Effective C++, 2E | Item 34: Minimize compilation dependencies between files


Item 34: Minimize compilation dependencies between files.

So you go into your C++ program and you make a minor change to the implementation of a class. Not the class interface, mind you, just the implementation; only the private stuff. Then you get set to rebuild the program, figuring that the compilation and linking should take only a few seconds. After all, only one class has been modified. You click on Rebuild or type make (or its moral equivalent), and you are astonished, then mortified, as you realize that the whole world is being recompiled and relinked!

Don't you just hate it when that happens?

The problem is that C++ doesn't do a very good job of separating interfaces from implementations. In particular, class definitions include not only the interface specification, but also a fair number of implementation details. For example:

class Person {
public:
  Person(const string& name, const Date& birthday,
         const Address& addr, const Country& country);
  virtual ~Person();

  ...                      // copy constructor and assignment
                           // operator omitted for simplicity
  string name() const;
  string birthDate() const;
  string address() const;
  string nationality() const;

private:
  string name_;            // implementation detail
  Date birthDate_;         // implementation detail
  Address address_;        // implementation detail
  Country citizenship_;    // implementation detail
};

This is hardly a Nobel Prize-winning class design, although it does illustrate an interesting naming convention for distinguishing private data from public functions when the same name makes sense for both: the former are tagged with a trailing underbar. The important thing to observe is that class Person can't be compiled unless the compiler also has access to definitions for the classes in terms of which Person is implemented, namely, string, Date, Address, and Country. Such definitions are typically provided through #include directives, so at the top of the file defining the Person class, you are likely to find something like this:

#include <string>           // for type string (see Item 49)
#include "date.h"
#include "address.h"
#include "country.h"

Unfortunately, this sets up a compilation dependency between the file defining Person and these include files. As a result, if any of these auxiliary classes changes its implementation, or if any of the classes on which it depends changes its implementation, the file containing the Person class must be recompiled, as must any files that use the Person class. For clients of Person, this can be more than annoying. It can be downright incapacitating.

You might wonder why C++ insists on putting the implementation details of a class in the class definition. For example, why can't you define Person this way,

class string;         // "conceptual" forward declaration for the
                      // string type. See Item 49 for details.

class Date;           // forward declaration
class Address;        // forward declaration
class Country;        // forward declaration

class Person {
public:
  Person(const string& name, const Date& birthday,
         const Address& addr, const Country& country);
  virtual ~Person();

  ...                      // copy ctor, operator=

  string name() const;
  string birthDate() const;
  string address() const;
  string nationality() const;
};

specifying the implementation details of the class separately? If that were possible, clients of Person would have to recompile only if the interface to the class changed. Because interfaces tend to stabilize before implementations do, such a separation of interface from implementation could save untold hours of recompilation and linking over the course of a large software effort.

Alas, the real world intrudes on this idyllic scenario, as you will appreciate when you consider something like this:

int main()
{
  int x;                      // define an int

  Person p(...);              // define a Person
                              // (arguments omitted for
  ...                         // simplicity)
}

When compilers see the definition for x, they know they must allocate enough space to hold an int. No problem. Each compiler knows how big an int is. When compilers see the definition for p, however, they know they have to allocate enough space for a Person, but how are they supposed to know how big a Person object is? The only way they can get that information is to consult the class definition, but if it were legal for a class definition to omit the implementation details, how would compilers know how much space to allocate?

In principle, this is no insuperable problem. Languages such as Smalltalk, Eiffel, and Java get around it all the time. The way they do it is by allocating only enough space for a pointer to an object when an object is defined. That is, they handle the code above as if it had been written like this:

int main()
{
  int x;                     // define an int

  Person *p;                 // define a pointer
                             // to a Person
  ...
}

It may have occurred to you that this is in fact legal C++, and it turns out that you can play the "hide the object implementation behind a pointer" game yourself.
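A minimal sketch of why the pointer version compiles while the object version does not: the size of a pointer does not depend on the class it points to, so a forward declaration is enough. The names Widget, sizeOfWidgetPointer, and useWidgetObject are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>

class Widget;                       // forward declaration only: Widget is
                                    // an incomplete type at this point

// Legal: a pointer's size is known without Widget's definition.
std::size_t sizeOfWidgetPointer()
{
  Widget *pw = 0;                   // defining a pointer needs only the declaration
  return sizeof(pw);
}

// Not legal here (each line would fail to compile if uncommented),
// because compilers cannot know how much space a Widget needs:
//   Widget w;
//   std::size_t n = sizeof(Widget);

// Once the definition is visible, objects can be defined.
class Widget {
public:
  int value() const { return 42; }
};

int useWidgetObject()
{
  Widget w;                         // fine now: Widget is a complete type
  return w.value();
}
```

The commented-out lines are the interesting part: moving them above the class definition is what compilers reject, which is exactly the problem the pointer sidesteps.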

Here's how you employ the technique to decouple Person's interface from its implementation. First, you put only the following in the header file declaring the Person class:

// compilers still need to know about these type
// names for the Person constructor
class string;      // again, see Item 49 for information
                   // on why this isn't correct for string
class Date;
class Address;
class Country;

// class PersonImpl will contain the implementation
// details of a Person object; this is just a
// forward declaration of the class name
class PersonImpl;

class Person {
public:
  Person(const string& name, const Date& birthday,
         const Address& addr, const Country& country);
  virtual ~Person();

  ...                               // copy ctor, operator=

  string name() const;
  string birthDate() const;
  string address() const;
  string nationality() const;

private:
  PersonImpl *impl;                 // pointer to implementation
};

Now clients of Person are completely divorced from the details of strings, dates, addresses, countries, and persons. Those classes can be modified at will, but Person clients may remain blissfully unaware. More to the point, they may remain blissfully un-recompiled. In addition, because they're unable to see the details of Person's implementation, clients are unlikely to write code that somehow depends on those details. This is a true separation of interface and implementation.

The key to this separation is replacement of dependencies on class definitions with dependencies on class declarations. That's all you need to know about minimizing compilation dependencies: make your header files self-sufficient whenever it's practical, and when it's not practical, be dependent on class declarations, not class definitions. Everything else flows from this simple design strategy.

There are three immediate implications:

Avoid using objects when object references and pointers will do. You may define references and pointers to a type with only a declaration for the type. Defining objects of a type necessitates the presence of the type's definition.
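Use class declarations instead of class definitions whenever you can. Note that you never need a class definition to declare a function using that class, not even if the function passes or returns the class type by value. Given only the declaration class Date;, for example, the function declarations Date returnADate(); and void takeADate(Date d); both compile.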
Of course, pass-by-value is generally a bad idea (see Item 22), but if you find yourself forced to use it for some reason, there's still no justification for introducing unnecessary compilation dependencies.

If you're surprised that the declarations for returnADate and takeADate compile without a definition for Date, join the club; so was I. It's not as curious as it seems, however, because if anybody calls those functions, Date's definition must be visible. Oh, I know what you're thinking: why bother to declare functions that nobody calls? Simple. It's not that nobody calls them, it's that not everybody calls them. For example, if you have a library containing hundreds of function declarations (possibly spread over several namespaces — see Item 28), it's unlikely that every client calls every function. By moving the onus of providing class definitions (via #include directives) from your header file of function declarations to clients' files containing function calls, you eliminate artificial client dependencies on type definitions they don't really need.
Don't #include header files in your header files unless your headers won't compile without them. Instead, manually declare the classes you need, and let clients of your header files #include the additional headers necessary to make their code compile. A few clients may grumble that this is inconvenient, but rest assured that you are saving them much more pain than you're inflicting. In fact, this technique is so well-regarded, it's enshrined in the standard C++ library (see Item 49); the header <iosfwd> contains declarations (and only declarations) for the types in the iostream library.
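The three implications can be sketched in a single file that stands in for a library header and a client. This is only an illustration under stated assumptions: the layout of Date, the helper printDate, the variable lastYearTaken, and the function demoDates are all hypothetical, and the two "files" are collapsed into one so the sketch compiles on its own.

```cpp
// ---- stand-in for a library header: declarations only ----
#include <iosfwd>                   // declares std::ostream without defining it

class Date;                         // class declaration, no definition

Date returnADate();                 // fine: no definition of Date is
void takeADate(Date d);             // needed to declare these functions
std::ostream& printDate(std::ostream& os, const Date& d);

// ---- stand-in for a client that really calls the functions ----
// Only here must the full definitions be visible.
#include <ostream>
#include <sstream>
#include <string>

class Date {                        // hypothetical definition of Date
public:
  Date(int y, int m, int d): year(y), month(m), day(d) {}
  int year, month, day;
};

int lastYearTaken = 0;              // records what takeADate received

Date returnADate() { return Date(1998, 1, 1); }
void takeADate(Date d) { lastYearTaken = d.year; }

std::ostream& printDate(std::ostream& os, const Date& d)
{
  return os << d.year << '-' << d.month << '-' << d.day;
}

std::string demoDates()
{
  takeADate(returnADate());         // calls require Date's definition
  std::ostringstream out;
  printDate(out, returnADate());
  return out.str();
}
```

A client that never calls returnADate or takeADate would need nothing below the first section, and so would never be forced to recompile when Date's definition changes.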

Classes like Person that contain only a pointer to an unspecified implementation are often called Handle classes or Envelope classes. (In the former case, the classes they point to are called Body classes; in the latter case, the pointed-to classes are known as Letter classes.) Occasionally, you may hear people refer to such classes as Cheshire Cat classes, an allusion to the cat in Alice in Wonderland that could, when it chose, leave behind only its smile after the rest of it had vanished.

Lest you wonder how Handle classes actually do anything, the answer is simple: they forward all their function calls to the corresponding Body classes, and those classes do the real work. For example, here's how two of Person's member functions would be implemented:

#include "Person.h"          // because we're implementing
                             // the Person class, we must
                             // #include its class definition

#include "PersonImpl.h"      // we must also #include
                             // PersonImpl's class definition,
                             // otherwise we couldn't call
                             // its member functions. Note
                             // that PersonImpl has exactly
                             // the same member functions as
                             // Person — their interfaces
                             // are identical

Person::Person(const string& name, const Date& birthday,
               const Address& addr, const Country& country)
{
  impl = new PersonImpl(name, birthday, addr, country);
}

string Person::name() const
{
  return impl->name();
}

Note how the Person constructor calls the PersonImpl constructor (implicitly, by using new — see Items 5 and M8) and how Person::name calls PersonImpl::name. This is important. Making Person a Handle class doesn't change what Person does, it just changes where it does it.
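For readers who want to compile the Handle/Body arrangement, here is a reduced sketch that folds the three files into one. It rests on several assumptions that are not part of the text above: it uses std::string, keeps only the name member, gives Person a non-virtual destructor that deletes the body, and disables copying by declaring the copy operations private without defining them.

```cpp
#include <string>

// ---- Person.h (sketch): the only part clients need to see ----
class PersonImpl;                   // forward declaration; definition hidden

class Person {
public:
  explicit Person(const std::string& name);
  ~Person();                        // assumed: releases the body object

  std::string name() const;

private:
  Person(const Person&);            // copying disabled in this sketch:
  Person& operator=(const Person&); // declared but never defined

  PersonImpl *impl;                 // pointer to implementation
};

// ---- PersonImpl.h (sketch): seen only by Person's implementation ----
class PersonImpl {
public:
  explicit PersonImpl(const std::string& name): name_(name) {}
  std::string name() const { return name_; }

private:
  std::string name_;                // the real data lives in the body
};

// ---- Person.cpp (sketch): every call is forwarded to the body ----
Person::Person(const std::string& name): impl(new PersonImpl(name)) {}

Person::~Person() { delete impl; }

std::string Person::name() const { return impl->name(); }
```

Adding a data member to PersonImpl changes only the middle and bottom sections. The first section, which is all that clients include, stays byte-for-byte the same.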

An alternative to the Handle class approach is to make Person a special kind of abstract base class called a Protocol class. By definition, a Protocol class has no implementation; its raison d'être is to specify an interface for derived classes (see Item 36). As a result, it typically has no data members, no constructors, a virtual destructor (see Item 14), and a set of pure virtual functions that specify the interface. A Protocol class for Person might look like this:

class Person {
public:
  virtual ~Person();

  virtual string name() const = 0;
  virtual string birthDate() const = 0;
  virtual string address() const = 0;
  virtual string nationality() const = 0;
};

Clients of this Person class must program in terms of Person pointers and references, because it's not possible to instantiate classes containing pure virtual functions. (It is, however, possible to instantiate classes derived from Person — see below.) Like clients of Handle classes, clients of Protocol classes need not recompile unless the Protocol class's interface is modified.

Of course, clients of a Protocol class must have some way of creating new objects. They typically do it by calling a function that plays the role of the constructor for the hidden (derived) classes that are actually instantiated. Such functions go by several names (among them factory functions and virtual constructors), but they all behave the same way: they return pointers to dynamically allocated objects that support the Protocol class's interface (see also Item M25). Such a function might be declared like this,

// makePerson is a "virtual constructor" (aka, a "factory
// function") for objects supporting the Person interface
Person*
  makePerson(const string& name,         // return a ptr to
             const Date& birthday,       // a new Person
             const Address& addr,        // initialized with
             const Country& country);    // the given params

and used by clients like this:

string name;
Date dateOfBirth;
Address address;
Country nation;

...

// create an object supporting the Person interface
Person *pp = makePerson(name, dateOfBirth, address, nation);

...

cout  << pp->name()              // use the object via the
      << " was born on "         // Person interface
      << pp->birthDate()
      << " and now lives at "
      << pp->address();

...

delete pp;                       // delete the object when
                                 // it's no longer needed

Because functions like makePerson are closely associated with the Protocol class whose interface is supported by the objects they create, it's good style to declare them static inside the Protocol class:

class Person {
public:
  ...                               // as above

  // makePerson is now a member of the class
  static Person * makePerson(const string& name,
                             const Date& birthday,
                             const Address& addr,
                             const Country& country);
};

This avoids cluttering the global namespace (or any other namespace) with lots of functions of this nature (see also Item 28).

At some point, of course, concrete classes supporting the Protocol class's interface must be defined and real constructors must be called. That all happens behind the scenes inside the implementation files for the virtual constructors. For example, the Protocol class Person might have a concrete derived class RealPerson that provides implementations for the virtual functions it inherits:

class RealPerson: public Person {
public:
  RealPerson(const string& name, const Date& birthday,
             const Address& addr, const Country& country)
  :  name_(name), birthday_(birthday),
     address_(addr), country_(country)
  {}

  virtual ~RealPerson() {}

  string name() const;          // implementations of
  string birthDate() const;     // these functions are not
  string address() const;       // shown, but they are
  string nationality() const;   // easy to imagine

private:
  string name_;
  Date birthday_;
  Address address_;
  Country country_;
};

Given RealPerson, it is truly trivial to write Person::makePerson:

Person * Person::makePerson(const string& name,
                            const Date& birthday,
                            const Address& addr,
                            const Country& country)
{
  return new RealPerson(name, birthday, addr, country);
}

RealPerson demonstrates one of the two most common mechanisms for implementing a Protocol class: it inherits its interface specification from the Protocol class (Person), then it implements the functions in the interface. A second way to implement a Protocol class involves multiple inheritance, a topic explored in Item 43.
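The pieces above can be assembled into one compilable sketch. It is reduced to a single name member and uses std::string. The inline virtual destructor, the one-argument makePerson, and the function nameViaProtocol are assumptions made for brevity, not part of the design described in the text.

```cpp
#include <string>

// ---- Person.h (sketch): the Protocol class clients compile against ----
class Person {
public:
  virtual ~Person() {}

  virtual std::string name() const = 0;

  // factory function ("virtual constructor") for hidden derived classes
  static Person* makePerson(const std::string& name);
};

// ---- Person.cpp (sketch): the concrete class is invisible to clients ----
class RealPerson: public Person {
public:
  explicit RealPerson(const std::string& name): name_(name) {}

  virtual std::string name() const { return name_; }

private:
  std::string name_;
};

Person* Person::makePerson(const std::string& name)
{
  return new RealPerson(name);
}

// ---- client code (sketch): works only through the Person interface ----
std::string nameViaProtocol()
{
  Person *pp = Person::makePerson("Grace");
  std::string result = pp->name();  // virtual call reaches RealPerson::name
  delete pp;                        // safe because ~Person is virtual
  return result;
}
```

The client section never mentions RealPerson, so changes to that class require recompiling only the file that defines makePerson.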

Okay, so Handle classes and Protocol classes decouple interfaces from implementations, thereby reducing compilation dependencies between files. Cynic that you are, I know you're waiting for the fine print. "What does all this hocus-pocus cost me?" you mutter. The answer is the usual one in Computer Science: it costs you some speed at runtime, plus some additional memory per object.

In the case of Handle classes, member functions have to go through the implementation pointer to get to the object's data. That adds one level of indirection per access. And you must add the size of this implementation pointer to the amount of memory required to store each object. Finally, the implementation pointer has to be initialized (in the Handle class's constructors) to point to a dynamically allocated implementation object, so you incur the overhead inherent in dynamic memory allocation (and subsequent deallocation) — see Item 10.
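These costs can be made visible with a small sketch. BodySketch, HandleSketch, and allocationCount are hypothetical names, and the class-specific operator new exists only to count allocations.

```cpp
#include <cstddef>
#include <new>

static int allocations = 0;         // how many body objects were allocated

struct BodySketch {
  int data;

  // class-specific allocation functions, used here only for counting
  static void* operator new(std::size_t n)
  {
    ++allocations;
    return ::operator new(n);
  }
  static void operator delete(void *p) { ::operator delete(p); }
};

class HandleSketch {
public:
  HandleSketch(): impl(new BodySketch()) { impl->data = 7; }  // one allocation
  ~HandleSketch() { delete impl; }                            // one deallocation

  int data() const { return impl->data; }  // one extra indirection per access

private:
  HandleSketch(const HandleSketch&);             // copying disabled
  HandleSketch& operator=(const HandleSketch&);  // in this sketch

  BodySketch *impl;                 // the only data member: one pointer
};

int allocationCount() { return allocations; }
```

Each HandleSketch object occupies just the space of its pointer, but constructing one always triggers a separate dynamic allocation for the body.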

For Protocol classes, every function call is virtual, so you pay the cost of an indirect jump each time you make a function call (see Items 14 and M24). Also, objects derived from the Protocol class must contain a virtual table pointer (again, see Items 14 and M24). This pointer may increase the amount of memory needed to store an object, depending on whether the Protocol class is the exclusive source of virtual functions for the object.
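The memory cost is easy to observe on mainstream compilers, although the exact figures are implementation-defined. In this sketch, PlainPoint, VirtualPoint, and virtualOverheadBytes are hypothetical names.

```cpp
#include <cstddef>

struct PlainPoint {                 // no virtual functions
  int x;
};

struct VirtualPoint {               // same data, plus one virtual function
  virtual ~VirtualPoint() {}
  int x;
};

// Extra bytes per object attributable to the virtual machinery
// (typically the hidden pointer plus any alignment padding).
std::size_t virtualOverheadBytes()
{
  return sizeof(VirtualPoint) - sizeof(PlainPoint);
}
```

If a class already has virtual functions from another source, deriving it from a Protocol class usually adds nothing further, which is the case the paragraph above alludes to.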

Finally, neither Handle classes nor Protocol classes can get much use out of inline functions. All practical uses of inlines require access to implementation details, and that's the very thing that Handle classes and Protocol classes are designed to avoid in the first place.

It would be a serious mistake, however, to dismiss Handle classes and Protocol classes simply because they have a cost associated with them. So do virtual functions, and you wouldn't want to forgo those, would you? (If so, you're reading the wrong book.) Instead, consider using these techniques in an evolutionary manner. Use Handle classes and Protocol classes during development to minimize the impact on clients when implementations change. Replace Handle classes and Protocol classes with concrete classes for production use when it can be shown that the difference in speed and/or size is significant enough to justify the increased coupling between classes. Someday, we may hope, tools will be available to perform this kind of transformation automatically.

A skillful blending of Handle classes, Protocol classes, and concrete classes will allow you to develop software systems that execute efficiently and are easy to evolve, but there is a serious disadvantage: you may have to cut down on the long breaks you've been taking while your programs recompile.
