Item 34: Minimize compilation dependencies between files.

So you go into your C++ program and you make a minor change to the implementation of a class. Not the class interface, mind you, just the implementation; only the private stuff. Then you get set to rebuild the program, figuring that the compilation and linking should take only a few seconds. After all, only one class has been modified. You click on Rebuild or type make (or its moral equivalent), and you are astonished, then mortified, as you realize that the whole world is being recompiled and relinked!

Don't you just hate it when that happens?

The problem is that C++ doesn't do a very good job of separating interfaces from implementations. In particular, class definitions include not only the interface specification, but also a fair number of implementation details. For example:
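A minimal sketch of such a class. The stand-in definitions of Date, Address, and Country are assumptions, there only so the listing compiles on its own; what matters is the shape of Person.

```cpp
#include <string>

// Hypothetical stand-ins; in a real program each would come from its own header.
struct Date    { std::string text; };
struct Address { std::string text; };
struct Country { std::string text; };

class Person {
public:
  Person(const std::string& name, const Date& birthday,
         const Address& addr, const Country& country)
  : name_(name), birthDate_(birthday), address_(addr), citizenship_(country) {}

  std::string name() const        { return name_; }
  std::string birthDate() const   { return birthDate_.text; }
  std::string address() const     { return address_.text; }
  std::string nationality() const { return citizenship_.text; }

private:
  std::string name_;         // implementation detail
  Date        birthDate_;    // implementation detail
  Address     address_;      // implementation detail
  Country     citizenship_;  // implementation detail
};
```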

This is hardly a Nobel Prize-winning class design, although it does illustrate an interesting naming convention for distinguishing private data from public functions when the same name makes sense for both: the former are tagged with a trailing underbar. The important thing to observe is that class Person can't be compiled unless the compiler also has access to definitions for the classes in terms of which Person is implemented, namely, string, Date, Address, and Country. Such definitions are typically provided through #include directives, so at the top of the file defining the Person class, you are likely to find something like this:
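A sketch of those directives. The three quoted file names are assumed for illustration; your headers will be called whatever your project calls them.

```cpp
#include <string>      // for the standard string type
#include "date.h"      // hypothetical header defining Date
#include "address.h"   // hypothetical header defining Address
#include "country.h"   // hypothetical header defining Country
```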

Unfortunately, this sets up a compilation dependency between the file defining Person and these include files. As a result, if any of these auxiliary classes changes its implementation, or if any of the classes on which it depends changes its implementation, the file containing the Person class must be recompiled, as must any files that use the Person class. For clients of Person, this can be more than annoying. It can be downright incapacitating.

You might wonder why C++ insists on putting the implementation details of a class in the class definition. For example, why can't you define Person this way,

specifying the implementation details of the class separately? If that were possible, clients of Person would have to recompile only if the interface to the class changed. Because interfaces tend to stabilize before implementations do, such a separation of interface from implementation could save untold hours of recompilation and linking over the course of a large software effort.
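The definition being wished for might look like the following sketch: the public interface and nothing else. The declarations of Date, Address, and Country are included only so the fragment compiles by itself.

```cpp
#include <string>

class Date;      // declarations only; no definitions are visible here
class Address;
class Country;

class Person {
public:
  Person(const std::string& name, const Date& birthday,
         const Address& addr, const Country& country);
  ~Person();

  std::string name() const;
  std::string birthDate() const;
  std::string address() const;
  std::string nationality() const;

  // No private section: the wish is that the data members
  // could be specified somewhere else.
};
```

Compilers accept this, but they take it at its word: as far as they can tell, a Person holds no data at all.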

Alas, the real world intrudes on this idyllic scenario, as you will appreciate when you consider something like this:
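A sketch of the kind of code in question. The one-member Person here is a simplified stand-in, and the function name example is likewise made up for the illustration.

```cpp
#include <string>

// Simplified stand-in for Person so the sketch compiles on its own.
class Person {
public:
  explicit Person(const std::string& name) : name_(name) {}
  std::string name() const { return name_; }
private:
  std::string name_;
};

void example()
{
  int x = 0;           // define an int
  Person p("Ada");     // define a Person
  (void)x;             // silence unused-variable warnings
  (void)p;
}
```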

When compilers see the definition for x, they know they must allocate enough space to hold an int. No problem. Each compiler knows how big an int is. When compilers see the definition for p, however, they know they have to allocate enough space for a Person, but how are they supposed to know how big a Person object is? The only way they can get that information is to consult the class definition, but if it were legal for a class definition to omit the implementation details, how would compilers know how much space to allocate?

In principle, this is no insuperable problem. Languages such as Smalltalk, Eiffel, and Java get around it all the time. The way they do it is by allocating only enough space for a pointer to an object when an object is defined. That is, they handle the code above as if it had been written like this:
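A sketch of that pointer-based form. The function name spaceForP is hypothetical; note that Person is only declared, never defined.

```cpp
#include <cstddef>

class Person;               // declaration only; the definition is not needed

std::size_t spaceForP()
{
  int x = 0;                // define an int
  Person* p = nullptr;      // define a pointer to a Person
  (void)x;                  // silence unused-variable warnings
  return sizeof p;          // known without ever seeing Person's definition
}
```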

It may have occurred to you that this is in fact legal C++, and it turns out that you can play the "hide the object implementation behind a pointer" game yourself.

Here's how you employ the technique to decouple Person's interface from its implementation. First, you put only the following in the header file declaring the Person class:
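A sketch of such a header. The name PersonImpl for the hidden class matches the text; the member name impl and the decision to disable copying are assumptions made to keep the sketch short.

```cpp
#include <string>   // the standard string type cannot be portably
                    // forward-declared, so its header is included

class Date;         // the constructor's signature needs these names,
class Address;      // but only as declarations
class Country;

class PersonImpl;   // will hold the implementation details of a Person

class Person {
public:
  Person(const std::string& name, const Date& birthday,
         const Address& addr, const Country& country);
  ~Person();

  Person(const Person&) = delete;             // copying omitted for simplicity
  Person& operator=(const Person&) = delete;

  std::string name() const;
  std::string birthDate() const;
  std::string address() const;
  std::string nationality() const;

private:
  PersonImpl* impl;   // pointer to implementation
};
```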

Now clients of Person are completely divorced from the details of strings, dates, addresses, countries, and persons. Those classes can be modified at will, but Person clients may remain blissfully unaware. More to the point, they may remain blissfully un-recompiled. In addition, because they're unable to see the details of Person's implementation, clients are unlikely to write code that somehow depends on those details. This is a true separation of interface and implementation.

The key to this separation is replacement of dependencies on class definitions with dependencies on class declarations. That's all you need to know about minimizing compilation dependencies: make your header files self-sufficient whenever it's practical, and when it's not practical, be dependent on class declarations, not class definitions. Everything else flows from this simple design strategy.
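One illustration of depending on declarations rather than definitions: a function can be declared in terms of a class that has itself only been declared, even when the function passes or returns that class by value. The names firstDayOf and yearOf below are hypothetical, as is the tiny Date class.

```cpp
// ----- all a header needs -----
class Date;                    // class declaration, not definition

Date firstDayOf(int year);     // fine: return by value, no definition required
int  yearOf(Date d);           // fine: pass by value, no definition required

// ----- only the implementation file and actual callers need this -----
class Date {
public:
  explicit Date(int year) : year_(year) {}
  int year() const { return year_; }
private:
  int year_;
};

Date firstDayOf(int year) { return Date(year); }
int  yearOf(Date d)       { return d.year(); }
```

Clients that merely pass such declarations along never pay for Date's definition; only the ones that call the functions do.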

There are three immediate implications:

Classes like Person that contain only a pointer to an unspecified implementation are often called Handle classes or Envelope classes. (In the former case, the classes they point to are called Body classes; in the latter case, the pointed-to classes are known as Letter classes.) Occasionally, you may hear people refer to such classes as Cheshire Cat classes, an allusion to the cat in Alice in Wonderland that could, when it chose, leave behind only its smile after the rest of it had vanished.

Lest you wonder how Handle classes actually do anything, the answer is simple: they forward all their function calls to the corresponding Body classes, and those classes do the real work. For example, here's how two of Person's member functions would be implemented:
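A sketch of the forwarding, under one simplifying assumption: the constructor takes only a name, so the Date, Address, and Country parameters are left out. The comments mark which parts would sit in the header and which in implementation files.

```cpp
#include <string>

// ----- what would live in the Person header -----
class PersonImpl;                       // declaration only

class Person {
public:
  explicit Person(const std::string& name);
  ~Person();
  Person(const Person&) = delete;             // copying omitted for simplicity
  Person& operator=(const Person&) = delete;

  std::string name() const;

private:
  PersonImpl* impl;                     // pointer to implementation
};

// ----- what would live in implementation files -----
class PersonImpl {
public:
  explicit PersonImpl(const std::string& name) : name_(name) {}
  std::string name() const { return name_; }
private:
  std::string name_;
};

Person::Person(const std::string& name)
{
  impl = new PersonImpl(name);          // calls the PersonImpl constructor
}

Person::~Person()
{
  delete impl;                          // release the implementation object
}

std::string Person::name() const
{
  return impl->name();                  // forwards to PersonImpl::name
}
```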

Note how the Person constructor calls the PersonImpl constructor (implicitly, by using new; see Items 5 and M8) and how Person::name calls PersonImpl::name. This is important. Making Person a Handle class doesn't change what Person does; it just changes where it does it.

An alternative to the Handle class approach is to make Person a special kind of abstract base class called a Protocol class. By definition, a Protocol class has no implementation; its raison d'être is to specify an interface for derived classes (see Item 36). As a result, it typically has no data members, no constructors, a virtual destructor (see Item 14), and a set of pure virtual functions that specify the interface. A Protocol class for Person might look like this:
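A sketch of such a class, using the four member functions named earlier in this Item. The destructor is defined inline here only so the sketch stands alone.

```cpp
#include <string>

class Person {
public:
  virtual ~Person() {}                            // virtual destructor

  virtual std::string name() const = 0;           // pure virtual functions
  virtual std::string birthDate() const = 0;      // specify the interface
  virtual std::string address() const = 0;
  virtual std::string nationality() const = 0;
};
```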

Clients of this Person class must program in terms of Person pointers and references, because it's not possible to instantiate classes containing pure virtual functions. (It is, however, possible to instantiate classes derived from Person — see below.) Like clients of Handle classes, clients of Protocol classes need not recompile unless the Protocol class's interface is modified.

Of course, clients of a Protocol class must have some way of creating new objects. They typically do it by calling a function that plays the role of the constructor for the hidden (derived) classes that are actually instantiated. Such functions go by several names (among them factory functions and virtual constructors), but they all behave the same way: they return pointers to dynamically allocated objects that support the Protocol class's interface (see also Item M25). Such a function might be declared like this,

and used by clients like this:
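A sketch of both the declaration and a client's use of it. To keep it short, makePerson takes only a name; the client function greet and the hidden class HiddenPerson are hypothetical, and the latter is present only so the example links.

```cpp
#include <string>

// Protocol class: interface only.
class Person {
public:
  virtual ~Person() {}
  virtual std::string name() const = 0;
};

// Factory function ("virtual constructor") for objects that
// support the Person interface.
Person* makePerson(const std::string& name);

// Client code: it sees Person and makePerson, nothing more.
std::string greet(const std::string& who)
{
  Person* pp = makePerson(who);               // create an object via the factory
  std::string text = "Hello, " + pp->name();  // use it through the interface
  delete pp;                                  // the client must delete it
  return text;
}

// Placeholder definition so the sketch links. In practice this would
// sit in an implementation file that clients never see.
namespace {
  class HiddenPerson : public Person {
  public:
    explicit HiddenPerson(const std::string& name) : name_(name) {}
    std::string name() const override { return name_; }
  private:
    std::string name_;
  };
}

Person* makePerson(const std::string& name)
{
  return new HiddenPerson(name);
}
```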

Because functions like makePerson are closely associated with the Protocol class whose interface is supported by the objects they create, it's good style to declare them static inside the Protocol class:
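A sketch of that arrangement, again with a name-only parameter list as a simplifying assumption:

```cpp
#include <string>

class Person {
public:
  virtual ~Person() {}
  virtual std::string name() const = 0;

  // The factory function, now scoped to the class it serves.
  static Person* makePerson(const std::string& name);
};
```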

This avoids cluttering the global namespace (or any other namespace) with lots of functions of this nature (see also Item 28).

At some point, of course, concrete classes supporting the Protocol class's interface must be defined and real constructors must be called. That all happens behind the scenes inside the implementation files for the virtual constructors. For example, the Protocol class Person might have a concrete derived class RealPerson that provides implementations for the virtual functions it inherits:

Given RealPerson, it is truly trivial to write Person::makePerson:
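A sketch of both pieces, RealPerson and Person::makePerson, assuming a Person interface trimmed to a single name function so the listing stays short:

```cpp
#include <string>

// Protocol class, as clients see it.
class Person {
public:
  virtual ~Person() {}
  virtual std::string name() const = 0;
  static Person* makePerson(const std::string& name);
};

// Concrete class, visible only inside the implementation file.
class RealPerson : public Person {
public:
  explicit RealPerson(const std::string& name) : name_(name) {}
  std::string name() const override { return name_; }
private:
  std::string name_;
};

// The "virtual constructor": the one place a real constructor is called.
Person* Person::makePerson(const std::string& name)
{
  return new RealPerson(name);
}
```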

RealPerson demonstrates one of the two most common mechanisms for implementing a Protocol class: it inherits its interface specification from the Protocol class (Person), then it implements the functions in the interface. A second way to implement a Protocol class involves multiple inheritance, a topic explored in Item 43.

Okay, so Handle classes and Protocol classes decouple interfaces from implementations, thereby reducing compilation dependencies between files. Cynic that you are, I know you're waiting for the fine print. "What does all this hocus-pocus cost me?" you mutter. The answer is the usual one in Computer Science: it costs you some speed at runtime, plus some additional memory per object.

In the case of Handle classes, member functions have to go through the implementation pointer to get to the object's data. That adds one level of indirection per access. And you must add the size of this implementation pointer to the amount of memory required to store each object. Finally, the implementation pointer has to be initialized (in the Handle class's constructors) to point to a dynamically allocated implementation object, so you incur the overhead inherent in dynamic memory allocation (and subsequent deallocation) — see Item 10.

For Protocol classes, every function call is virtual, so you pay the cost of an indirect jump each time you make a function call (see Items 14 and M24). Also, objects derived from the Protocol class must contain a virtual table pointer (again, see Items 14 and M24). This pointer may increase the amount of memory needed to store an object, depending on whether the Protocol class is the exclusive source of virtual functions for the object.
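If you want to see the per-object memory cost on your own compiler, a rough sketch like the following will do. PlainName, VirtualName, and extraBytesPerObject are hypothetical names, and the figure reported is implementation-dependent, so no particular value is promised.

```cpp
#include <cstddef>
#include <string>

struct PlainName {                  // concrete class, no virtual functions
  std::string name_;
};

struct VirtualName {                // virtual functions, so each object
  virtual ~VirtualName() {}         // typically carries a hidden pointer
  virtual std::string name() const { return name_; }
  std::string name_;
};

// Extra bytes per object attributable to the virtual functions,
// as laid out by the compiler at hand.
std::size_t extraBytesPerObject()
{
  return sizeof(VirtualName) - sizeof(PlainName);
}
```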

Finally, neither Handle classes nor Protocol classes can get much use out of inline functions. All practical uses of inlines require access to implementation details, and that's the very thing that Handle classes and Protocol classes are designed to avoid in the first place.

It would be a serious mistake, however, to dismiss Handle classes and Protocol classes simply because they have a cost associated with them. So do virtual functions, and you wouldn't want to forgo those, would you? (If so, you're reading the wrong book.) Instead, consider using these techniques in an evolutionary manner. Use Handle classes and Protocol classes during development to minimize the impact on clients when implementations change. Replace Handle classes and Protocol classes with concrete classes for production use when it can be shown that the difference in speed and/or size is significant enough to justify the increased coupling between classes. Someday, we may hope, tools will be available to perform this kind of transformation automatically.

A skillful blending of Handle classes, Protocol classes, and concrete classes will allow you to develop software systems that execute efficiently and are easy to evolve, but there is a serious disadvantage: you may have to cut down on the long breaks you've been taking while your programs recompile.

