Item 27: Requiring or prohibiting heap-based objects.
Sometimes you want to arrange things so that objects of a particular type can commit suicide, i.e., can "delete this." Such an arrangement clearly requires that objects of that type be allocated on the heap. Other times you'll want to bask in the certainty that there can be no memory leaks for a particular class, because none of the objects could have been allocated on the heap. This might be the case if you are working on an embedded system, where memory leaks are especially troublesome and heap space is at a premium. Is it possible to produce code that requires or prohibits heap-based objects? Often it is, but it also turns out that the notion of being "on the heap" is more nebulous than you might think.
Let us begin with the prospect of limiting object creation to the heap. To enforce such a restriction, you've got to find a way to prevent clients from creating objects other than by calling new. This is easy to do. Non-heap objects are automatically constructed at their point of definition and automatically destructed at the end of their lifetime, so it suffices to simply make these implicit constructions and destructions illegal.
The straightforward way to make these calls illegal is to declare the constructors and the destructor private. This is overkill. There's no reason why they both need to be private. Better to make the destructor private and the constructors public. Then, in a process that should be familiar from Item 26, you can introduce a privileged pseudo-destructor function that has access to the real destructor. Clients then call the pseudo-destructor to destroy the objects they've created.
If, for example, we want to ensure that objects representing unlimited precision numbers are created only on the heap, we can do it like this:
    class UPNumber {
    public:
      UPNumber();
      UPNumber(int initValue);
      UPNumber(double initValue);
      UPNumber(const UPNumber& rhs);
      // pseudo-destructor (a const member function, because
      // even const objects may be destroyed)
      void destroy() const { delete this; }
      ...
    private:
      ~UPNumber();
    };
Clients would then program like this:
    UPNumber n;                  // error! (legal here, but
                                 // illegal when n's dtor is
                                 // later implicitly invoked)
    UPNumber *p = new UPNumber;  // fine
    ...
    delete p;                    // error! attempt to call
                                 // private destructor
    p->destroy();                // fine
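The effect of the private destructor can also be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; HeapOnlyNumber and roundTrip are hypothetical names used only for illustration, not part of the class above.

```cpp
#include <type_traits>

// Hypothetical stand-in for the heap-only class in the text.
class HeapOnlyNumber {
public:
    explicit HeapOnlyNumber(int initValue = 0) : value_(initValue) {}
    int value() const { return value_; }
    // Pseudo-destructor: the only public way to end the object's life.
    void destroy() const { delete this; }
private:
    ~HeapOnlyNumber() {}
    int value_;
};

// A private destructor makes the type non-destructible from outside the
// class, which is what rules out locals, statics, and a plain `delete p`.
static_assert(!std::is_destructible<HeapOnlyNumber>::value,
              "clients cannot destroy HeapOnlyNumber directly");

// Creates an object on the heap, reads it, and destroys it via destroy().
int roundTrip(int v) {
    HeapOnlyNumber* p = new HeapOnlyNumber(v);
    int result = p->value();
    p->destroy();
    return result;
}
```

The static_assert documents the restriction without relying on a line that fails to compile.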
An alternative is to declare all the constructors private. The drawback to that idea is that a class often has many constructors, and the class's author must remember to declare each of them private. This includes the copy constructor, and it may include a default constructor, too, if these functions would otherwise be generated by compilers; compiler-generated functions are always public (see Item E45). As a result, it's easier to declare only the destructor private, because a class can have only one of those.
Restricting access to a class's destructor or its constructors prevents the creation of non-heap objects, but, in a story that is told in Item 26, it also prevents both inheritance and containment:
    class UPNumber { ... };          // declares dtor or ctors
                                     // private
    class NonNegativeUPNumber:
      public UPNumber { ... };       // error! dtor or ctors
                                     // won't compile
    class Asset {
    private:
      UPNumber value;
      ...                            // error! dtor or ctors
                                     // won't compile
    };
Neither of these difficulties is insurmountable. The inheritance problem can be solved by making UPNumber's destructor protected (while keeping its constructors public), and classes that need to contain objects of type UPNumber can be modified to contain pointers to UPNumber objects:
    class UPNumber { ... };          // declares dtor protected
    class NonNegativeUPNumber:
      public UPNumber { ... };       // now okay; derived
                                     // classes have access to
                                     // protected members
    class Asset {
    public:
      Asset(int initValue);
      ~Asset();
      ...
    private:
      UPNumber *value;
    };
    Asset::Asset(int initValue)
    : value(new UPNumber(initValue)) // fine
    { ... }
    Asset::~Asset()
    { value->destroy(); }            // also fine
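In current C++ the containing class does not have to write its own destructor. What follows is a sketch under the assumption that std::unique_ptr (C++11) is available; HeapOnlyNumber and DestroyCaller are hypothetical names, and the custom deleter is a substitute for the hand-written Asset destructor above, not the technique the text describes.

```cpp
#include <memory>

// Hypothetical heap-only class with a protected destructor, as in the text.
class HeapOnlyNumber {
public:
    explicit HeapOnlyNumber(int initValue = 0) : value_(initValue) {}
    int value() const { return value_; }
    void destroy() const { delete this; }
protected:
    ~HeapOnlyNumber() {}
private:
    int value_;
};

// Deleter that routes destruction through the pseudo-destructor.
struct DestroyCaller {
    void operator()(const HeapOnlyNumber* p) const { p->destroy(); }
};

class Asset {
public:
    explicit Asset(int initValue) : value_(new HeapOnlyNumber(initValue)) {}
    int worth() const { return value_->value(); }
    // No user-written destructor: the unique_ptr member calls destroy().
private:
    std::unique_ptr<HeapOnlyNumber, DestroyCaller> value_;
};
```

Because the smart pointer owns the number, Asset also becomes non-copyable by default, which avoids two Assets calling destroy() on the same object.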
Determining Whether an Object is On The Heap
If we adopt this strategy, we must reexamine what it means to be "on the heap." Given the class definition sketched above, it's legal to define a non-heap NonNegativeUPNumber object:
    NonNegativeUPNumber n;                // fine
Now, the UPNumber part of the NonNegativeUPNumber object n is not on the heap. Is that okay? The answer depends on the details of the class's design and implementation, but let us suppose it is not okay, that all UPNumber objects, even base class parts of more derived objects, must be on the heap. How can we enforce this restriction?
There is no easy way. It is not possible for a UPNumber constructor to determine whether it's being invoked as the base class part of a heap-based object. That is, there is no way for the UPNumber constructor to detect that the following contexts are different:
    NonNegativeUPNumber *n1 =
      new NonNegativeUPNumber;            // on heap
    NonNegativeUPNumber n2;               // not on heap
But perhaps you don't believe me. Perhaps you think you can play games with the interaction among the new operator, operator new, and the constructor that the new operator calls (see Item 8). Perhaps you think you can outsmart them all by modifying UPNumber as follows:
    class UPNumber {
    public:
      // exception to throw if a non-heap object is created
      class HeapConstraintViolation {};
      static void * operator new(size_t size);
      UPNumber();
      ...
    private:
      static bool onTheHeap;        // inside ctors, whether
                                    // the object being
      ...                           // constructed is on heap
    };
    // obligatory definition of class static
    bool UPNumber::onTheHeap = false;
    void *UPNumber::operator new(size_t size)
    {
      onTheHeap = true;
      return ::operator new(size);
    }
    UPNumber::UPNumber()
    {
      if (!onTheHeap) {
        throw HeapConstraintViolation();
      }
      // proceed with normal construction here
      onTheHeap = false;            // clear flag for next obj.
    }
There's nothing deep going on here. The idea is to take advantage of the fact that when an object is allocated on the heap, operator new is called to allocate the raw memory, then a constructor is called to initialize an object in that memory. In particular, operator new sets onTheHeap to true, and each constructor checks onTheHeap to see if the raw memory of the object being constructed was allocated by operator new. If not, an exception of type HeapConstraintViolation is thrown. Otherwise, construction proceeds as usual, and when construction is finished, onTheHeap is set to false, thus resetting the default value for the next object to be constructed.
This is a nice enough idea, but it won't work. Consider this potential client code:
    UPNumber *numberArray = new UPNumber[100];
The first problem is that the memory for the array is allocated by operator new[], not operator new, but (provided your compilers support it) you can write the former function as easily as the latter. What is more troublesome is the fact that numberArray has 100 elements, so there will be 100 constructor calls. But there is only one call to allocate memory, so onTheHeap will be set to true for only the first of those 100 constructors. When the second constructor is called, an exception is thrown, and woe is you.
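The one-allocation, many-constructions mismatch is easy to observe. This sketch assumes nothing beyond standard C++; Counted, Tally, and allocateArray are hypothetical names introduced for the demonstration.

```cpp
#include <cstddef>
#include <new>

// Hypothetical instrumented class: counts array allocations and constructions.
struct Counted {
    static int allocations;
    static int constructions;
    static void* operator new[](std::size_t size) {
        ++allocations;
        return ::operator new[](size);
    }
    static void operator delete[](void* p) { ::operator delete[](p); }
    Counted() { ++constructions; }
};
int Counted::allocations = 0;
int Counted::constructions = 0;

// Snapshot of the two counters after one array new-expression.
struct Tally { int allocations; int constructions; };

// Allocates an array of n elements and reports how the counters moved.
Tally allocateArray(std::size_t n) {
    Counted::allocations = 0;
    Counted::constructions = 0;
    Counted* arr = new Counted[n];
    Tally t = { Counted::allocations, Counted::constructions };
    delete[] arr;
    return t;
}
```

For an array of 100 elements the tally shows one call to operator new[] and 100 constructor calls, so a flag set once per allocation cannot cover every element.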
Even without arrays, this bit-setting business may fail. Consider this statement:
    UPNumber *pn = new UPNumber(*new UPNumber);
Here we create two UPNumbers on the heap and make pn point to one of them; it's initialized with the value of the second one. This code has a resource leak, but let us ignore that in favor of an examination of what happens during execution of this expression:
    new UPNumber(*new UPNumber)
This contains two calls to the new operator, hence two calls to operator new and two calls to UPNumber constructors (see Item 8). Programmers typically expect these function calls to be executed in this order:
1. Call operator new for first object
2. Call constructor for first object
3. Call operator new for second object
4. Call constructor for second object
but the language makes no guarantee that this is how it will be done. Some compilers generate the function calls in this order instead:
1. Call operator new for first object
2. Call operator new for second object
3. Call constructor for first object
4. Call constructor for second object
There is nothing wrong with compilers that generate this kind of code, but the set-a-bit-in-operator-new trick fails with such compilers. That's because the bit set in steps 1 and 2 is cleared in step 3, thus making the object constructed in step 4 think it's not on the heap, even though it is.
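The order a particular compiler chooses can be recorded rather than guessed. The sketch below uses a hypothetical Traced class and traceNestedNew function. From C++17 on, the rules for new-expressions sequence the allocation call before the evaluation of the constructor arguments, which would make the second ordering the required one; earlier standards leave the order unspecified.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Hypothetical instrumented class: logs each allocation and construction.
struct Traced {
    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }
    static void* operator new(std::size_t size) {
        log().push_back("operator new");
        return ::operator new(size);
    }
    static void operator delete(void* p) { ::operator delete(p); }
    Traced() { log().push_back("default ctor"); }
    Traced(const Traced&) { log().push_back("copy ctor"); }
};

// Evaluates the nested new-expression from the text and returns the
// recorded sequence of calls. Unlike the original, it keeps a pointer
// to the inner object so that nothing leaks.
std::vector<std::string> traceNestedNew() {
    Traced::log().clear();
    Traced* inner = nullptr;
    Traced* outer = new Traced(*(inner = new Traced));
    delete outer;
    delete inner;
    return Traced::log();
}
```

Whatever the order, the trace holds two allocations and two constructions, and the copy construction of the outer object always comes last because it needs the inner object as its argument.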
These difficulties don't invalidate the basic idea of having each constructor check to see if *this is on the heap. Rather, they indicate that checking a bit set inside operator new (or operator new[]) is not a reliable way to determine this information. What we need is a better way to figure it out.
If you're desperate enough, you might be tempted to descend into the realm of the unportable. For example, you might decide to take advantage of the fact that on many systems, a program's address space is organized as a linear sequence of addresses, with the program's stack growing down from the top of the address space and the heap rising up from the bottom.
On systems that organize a program's memory in this way (many do, but many do not), you might think you could use the following function to determine whether a particular address is on the heap:
    // incorrect attempt to determine whether an address
    // is on the heap
    bool onHeap(const void *address)
    {
      char onTheStack;               // local stack variable
      return address < &onTheStack;
    }
The thinking behind this function is interesting. Inside onHeap, onTheStack is a local variable. As such, it is, well, it's on the stack. When onHeap is called, its stack frame (i.e., its activation record) will be placed at the top of the program's stack, and because the stack grows down (toward lower addresses) in this architecture, the address of onTheStack must be less than the address of any other stack-based variable or object. If the parameter address is less than the location of onTheStack, it can't be on the stack, so it must be on the heap.
Such logic is fine, as far as it goes, but it doesn't go far enough. The fundamental problem is that there are three places where objects may be allocated, not two. Yes, the stack and the heap hold objects, but let us not forget about static objects. Static objects are those that are initialized only once during a program run. Static objects comprise not only those objects explicitly declared static, but also objects at global and namespace scope (see Item E47). Such objects have to go somewhere, and that somewhere is neither the stack nor the heap.
Where they go is system-dependent, but on many of the systems that have the stack and heap grow toward one another, they go below the heap. The earlier picture of memory organization, while telling the truth and nothing but the truth for many systems, failed to tell the whole truth for those systems. With static objects added to the picture, the layout from high addresses to low is the stack (growing down), then the heap (growing up), then the static objects at the bottom.
Suddenly it becomes clear why onHeap won't work, not even on systems where it's purported to: it fails to distinguish between heap objects and static objects:
    void allocateSomeObjects()
    {
      char *pc = new char;     // heap object: onHeap(pc)
                               // will return true
      char c;                  // stack object: onHeap(&c)
                               // will return false
      static char sc;          // static object: onHeap(&sc)
                               // will return true
      ...
    }
Now, you may be desperate for a way to tell heap objects from stack objects, and in your desperation you may be willing to strike a deal with the portability Devil, but are you so desperate that you'll strike a deal that fails to guarantee you the right answers? Surely not, so I know you'll reject this seductive but unreliable compare-the-addresses approach.
The sad fact is there's not only no portable way to determine whether an object is on the heap, there isn't even a semi-portable way that works most of the time. If you absolutely, positively have to tell whether an address is on the heap, you're going to have to turn to unportable, implementation-dependent system calls, and that's that. As such, you're better off trying to redesign your software so you don't need to determine whether an object is on the heap in the first place.
If you find yourself obsessing over whether an object is on the heap, the likely cause is that you want to know if it's safe to invoke delete on it. Often such deletion will take the form of the infamous "delete this." Knowing whether it's safe to delete a pointer, however, is not the same as simply knowing whether that pointer points to something on the heap, because not all pointers to things on the heap can be safely deleted. Consider again an Asset object that contains a UPNumber object:
    class Asset {
    private:
      UPNumber value;
      ...
    };
    Asset *pa = new Asset;
Clearly *pa (including its member value) is on the heap. Equally clearly, it's not safe to invoke delete on a pointer to pa->value, because no such pointer was ever returned from new.
As luck would have it, it's easier to determine whether it's safe to delete a pointer than to determine whether a pointer points to something on the heap, because all we need to answer the former question is a collection of addresses that have been returned by operator new. Since we can write operator new ourselves (see Items E8-E10), it's easy to construct such a collection. Here's how we might approach the problem (in pseudocode):
    void *operator new(size_t size)
    {
      void *p = getMemory(size);     // call some function to
                                     // allocate memory and
                                     // handle out-of-memory
                                     // conditions
      add p to the collection of allocated addresses;
      return p;
    }
    void operator delete(void *ptr)
    {
      releaseMemory(ptr);            // return memory to
                                     // free store
      remove ptr from the collection of allocated addresses;
    }
    bool isSafeToDelete(const void *address)
    {
      return whether address is in collection of allocated addresses;
    }
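One way the pseudocode might be made concrete is sketched here. It rests on several assumptions that are not in the text: std::malloc and std::free stand in for getMemory and releaseMemory, the collection is a fixed-size table (so that the bookkeeping itself never calls operator new), the capacity kMaxTracked is arbitrary, and the code is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace {
// Fixed-size table so that bookkeeping never allocates and therefore never
// re-enters operator new. Hypothetical capacity; a null entry is a free slot.
const std::size_t kMaxTracked = 4096;
const void* trackedAddresses[kMaxTracked];   // zero-initialized at startup

// Returns the slot holding `address`, or kMaxTracked if there is none.
std::size_t findSlot(const void* address) {
    for (std::size_t i = 0; i < kMaxTracked; ++i) {
        if (trackedAddresses[i] == address) return i;
    }
    return kMaxTracked;
}
}  // namespace

void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    std::size_t slot = findSlot(nullptr);     // first free slot, if any
    if (slot != kMaxTracked) trackedAddresses[slot] = p;
    return p;                                 // untracked if the table is full
}

void operator delete(void* ptr) noexcept {
    if (!ptr) return;
    std::size_t slot = findSlot(ptr);
    if (slot != kMaxTracked) trackedAddresses[slot] = nullptr;
    std::free(ptr);
}

bool isSafeToDelete(const void* address) {
    return address != nullptr && findSlot(address) != kMaxTracked;
}
```

A pointer to a member of a heap object is reported as unsafe to delete, which matches the pa->value case discussed earlier. The linear search makes every allocation slower, a cost that the efficiency concern raised in the text anticipates.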
This is about as simple as it gets. operator new adds entries to a collection of allocated addresses, operator delete removes entries, and isSafeToDelete does a lookup in the collection to see if a particular address is there. If the operator new and operator delete functions are at global scope, this should work for all types, even the built-ins.
In practice, three things are likely to dampen our enthusiasm for this design. The first is our extreme reluctance to define anything at global scope, especially functions with predefined meanings like operator new and operator delete. Knowing as we do that there is but one global scope and but a single version of operator new and operator delete with the "normal" signatures (i.e., sets of parameter types) within that scope (see Item E9), the last thing we want to do is seize those function signatures for ourselves. Doing so would render our software incompatible with any other software that also implements global versions of operator new and operator delete (such as many object-oriented database systems).
Our second consideration is one of efficiency: why burden all heap allocations with the bookkeeping overhead necessary to keep track of returned addresses if we don't need to?
Our final concern is pedestrian, but important. It turns out to be essentially impossible to implement isSafeToDelete so that it always works. The difficulty has to do with the fact that objects with multiple or virtual base classes have multiple addresses, so there's no guarantee that the address passed to isSafeToDelete is the same as the one returned from operator new, even if the object in question was allocated on the heap. For details, see Items 24 and 31.
What we'd like is the functionality provided by these functions without the concomitant pollution of the global namespace, the mandatory overhead, and the correctness problems. Fortunately, C++ gives us exactly what we need in the form of an abstract mixin base class.
An abstract base class is a base class that can't be instantiated, i.e., one with at least one pure virtual function. A mixin ("mix in") class is one that provides a single well-defined capability and is designed to be compatible with any other capabilities an inheriting class might provide (see Item E7). Such classes are nearly always abstract. We can therefore come up with an abstract mixin base class that offers derived classes the ability to determine whether a pointer was allocated from operator new. Here's such a class:
    #include <cstddef>                   // std::size_t
    #include <list>                      // std::list
    class HeapTracked {                  // mixin class; keeps track of
    public:                              // ptrs returned from op. new
      class MissingAddress{};            // exception class; see below
      virtual ~HeapTracked() = 0;
      static void *operator new(std::size_t size);
      static void operator delete(void *ptr);
      bool isOnHeap() const;
    private:
      typedef const void* RawAddress;
      static std::list<RawAddress> addresses;
    };
This class uses the list data structure that's part of the standard C++ library (see Item E49 and Item 35) to keep track of all pointers returned from operator new. That function allocates memory and adds entries to the list; operator delete deallocates memory and removes entries from the list; and isOnHeap returns whether an object's address is in the list.
Implementation of the HeapTracked class is simple, because the global operator new and operator delete functions are called to perform the real memory allocation and deallocation, and the list class has functions to make insertion, removal, and lookup single-statement operations. Here's the full implementation of HeapTracked:
    #include <algorithm>                 // std::find
    #include <cstddef>                   // std::size_t
    #include <list>                      // std::list
    // mandatory definition of static class member; RawAddress must be
    // qualified here because it is named outside the class's scope
    std::list<HeapTracked::RawAddress> HeapTracked::addresses;
    // HeapTracked's destructor is pure virtual to make the
    // class abstract (see Item E14). The destructor must still
    // be defined, however, so we provide this empty definition.
    HeapTracked::~HeapTracked() {}
    void * HeapTracked::operator new(std::size_t size)
    {
      void *memPtr = ::operator new(size);  // get the memory
      addresses.push_front(memPtr);         // put its address at
                                            // the front of the list
      return memPtr;
    }
    void HeapTracked::operator delete(void *ptr)
    {
      // get an "iterator" that identifies the list
      // entry containing ptr; see Item 35 for details
      std::list<RawAddress>::iterator it =
        std::find(addresses.begin(), addresses.end(), ptr);
      if (it != addresses.end()) {       // if an entry was found
        addresses.erase(it);             // remove the entry
        ::operator delete(ptr);          // deallocate the memory
      } else {                           // otherwise
        throw MissingAddress();          // ptr wasn't allocated by
      }                                  // op. new, so throw an
    }                                    // exception (since C++11 a
                                         // deallocation function is
                                         // implicitly noexcept, so this
                                         // throw ends in std::terminate)
    bool HeapTracked::isOnHeap() const
    {
      // get a pointer to the beginning of the memory
      // occupied by *this; see below for details
      const void *rawAddress = dynamic_cast<const void*>(this);
      // look up the pointer in the list of addresses
      // returned by operator new
      std::list<RawAddress>::iterator it =
        std::find(addresses.begin(), addresses.end(), rawAddress);
      return it != addresses.end();      // return whether it was
    }                                    // found
This code is straightforward, though it may not look that way if you are unfamiliar with the list class and the other components of the Standard Template Library. Item 35 explains everything, but the comments in the code above should be sufficient to explain what's happening in this example.
The only other thing that may confound you is this statement (in isOnHeap):
    const void *rawAddress = dynamic_cast<const void*>(this);
I mentioned earlier that writing the global function isSafeToDelete is complicated by the fact that objects with multiple or virtual base classes have several addresses. That problem plagues us in isOnHeap, too, but because isOnHeap applies only to HeapTracked objects, we can exploit a special feature of the dynamic_cast operator (see Item 2) to eliminate the problem. Simply put, dynamic_casting a pointer to void* (or const void* or volatile void* or, for those who can't get enough modifiers in their usual diet, const volatile void*) yields a pointer to the beginning of the memory for the object pointed to by the pointer. But dynamic_cast is applicable only to pointers to objects that have at least one virtual function. Our ill-fated isSafeToDelete function had to work with any type of pointer, so dynamic_cast wouldn't help it. isOnHeap is more selective (it tests only pointers to HeapTracked objects), so dynamic_casting this to const void* gives us a pointer to the beginning of the memory for the current object. That's the pointer that HeapTracked::operator new must have returned if the memory for the current object was allocated by HeapTracked::operator new in the first place. Provided your compilers support the dynamic_cast operator, this technique is completely portable.
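The "several addresses" problem and the dynamic_cast remedy can both be seen in a few lines. This is an illustrative sketch with a hypothetical hierarchy (Left, Right, Both); default member initializers make it C++11.

```cpp
// Hypothetical hierarchy: Both has two polymorphic, non-empty base classes.
struct Left  { virtual ~Left() {}  int l = 0; };
struct Right { virtual ~Right() {} int r = 0; };
struct Both : Left, Right { int b = 0; };

// Raw address stored in a base-class pointer, with no adjustment.
const void* rawAddressOf(const Left* p)  { return static_cast<const void*>(p); }
const void* rawAddressOf(const Right* p) { return static_cast<const void*>(p); }

// Address of the start of the complete (most derived) object.
const void* objectStart(const Left* p)  { return dynamic_cast<const void*>(p); }
const void* objectStart(const Right* p) { return dynamic_cast<const void*>(p); }
```

Given one Both object, a Left* and a Right* to it hold different raw addresses because the two base subobjects cannot overlap, yet objectStart returns the same address for both. That common address is the one an allocation function would have handed out for the object.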
Given this class, even BASIC programmers could add to a class the ability to track pointers to heap allocations. All they'd need to do is have the class inherit from HeapTracked. If, for example, we want to be able to determine whether a pointer to an Asset object points to a heap-based object, we'd modify Asset's class definition to specify HeapTracked as a base class:
    class Asset: public HeapTracked {
    private:
      UPNumber value;
      ...
    };
We could then query Asset* pointers as follows:
    void inventoryAsset(const Asset *ap)
    {
      if (ap->isOnHeap()) {
        // ap is a heap-based asset; inventory it as such
      }
      else {
        // ap is a non-heap-based asset; record it that way
      }
    }
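For readers who want something they can compile in one piece, here is a compact variant of the mixin. It is a sketch, not the implementation above: it swaps the list for std::unordered_set (an assumption made for constant-time lookup, requiring C++11), keeps the set in a function-local static, silently ignores unknown addresses in operator delete instead of throwing, and uses a simplified hypothetical Asset with an int member.

```cpp
#include <cstddef>
#include <new>
#include <unordered_set>

class HeapTracked {
public:
    virtual ~HeapTracked() = 0;
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        try {
            addresses().insert(p);
        } catch (...) {
            ::operator delete(p);   // do not leak if bookkeeping fails
            throw;
        }
        return p;
    }
    static void operator delete(void* ptr) {
        if (!ptr) return;
        addresses().erase(ptr);     // unknown addresses are ignored here
        ::operator delete(ptr);
    }
    bool isOnHeap() const {
        // dynamic_cast to void* yields the start of the complete object
        return addresses().count(dynamic_cast<const void*>(this)) != 0;
    }
private:
    // Function-local static avoids initialization-order problems.
    static std::unordered_set<const void*>& addresses() {
        static std::unordered_set<const void*> set;
        return set;
    }
};
HeapTracked::~HeapTracked() {}

// Hypothetical client class.
class Asset : public HeapTracked {
public:
    explicit Asset(int v = 0) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};
```

Note that objects created with new Asset[n] go through operator new[], which this sketch does not intercept, so array elements would report that they are not on the heap.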
A disadvantage of a mixin class like HeapTracked is that it can't be used with the built-in types, because types like int and char can't inherit from anything. Still, the most common reason for wanting to use a class like HeapTracked is to determine whether it's okay to "delete this," and you'll never want to do that with a built-in type because such types have no this pointer.
Prohibiting Heap-Based Objects
Thus ends our examination of determining whether an object is on the heap. At the opposite end of the spectrum is preventing objects from being allocated on the heap. Here the outlook is a bit brighter. There are, as usual, three cases: objects that are directly instantiated, objects instantiated as base class parts of derived class objects, and objects embedded inside other objects. We'll consider each in turn.
Preventing clients from directly instantiating objects on the heap is easy, because such objects are always created by calls to new and you can make it impossible for clients to call new. Now, you can't affect the availability of the new operator (that's built into the language), but you can take advantage of the fact that the new operator always calls operator new (see Item 8), and that function is one you can declare yourself. In particular, it is one you can declare private. If, for example, you want to keep clients from creating UPNumber objects on the heap, you could do it this way:
    class UPNumber {
    private:
      static void *operator new(size_t size);
      static void operator delete(void *ptr);
      ...
    };
Clients can now do only what they're supposed to be able to do:
    UPNumber n1;                    // okay
    static UPNumber n2;             // also okay
    UPNumber *p = new UPNumber;     // error! attempt to call
                                    // private operator new
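With a C++11 compiler the same restriction can be spelled with deleted functions, which tends to produce a clearer diagnostic than an access error. This is an alternative to the private declarations above, offered as a sketch; StackOnlyNumber and sumOfLocals are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical class that rejects heap allocation through deleted functions;
// any `new StackOnlyNumber` or `new StackOnlyNumber[n]` fails to compile.
class StackOnlyNumber {
public:
    explicit StackOnlyNumber(int initValue = 0) : value_(initValue) {}
    int value() const { return value_; }

    static void* operator new(std::size_t) = delete;
    static void* operator new[](std::size_t) = delete;
    static void operator delete(void*) = delete;
    static void operator delete[](void*) = delete;
private:
    int value_;
};

// Objects with automatic or static storage duration are unaffected.
int sumOfLocals(int a, int b) {
    StackOnlyNumber x(a);
    static StackOnlyNumber zero;     // static object: also fine
    StackOnlyNumber y(b);
    return x.value() + y.value() + zero.value();
    // StackOnlyNumber* p = new StackOnlyNumber;  // would not compile
}
```

As with the private version, this only stops direct use of new on the class itself.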
It suffices to declare operator new private, but it looks strange to have operator new be private and operator delete be public, so unless there's a compelling reason to split up the pair, it's best to declare them in the same part of a class. If you'd like to prohibit heap-based arrays of UPNumber objects, too, you could declare operator new[] and operator delete[] (see Item 8) private as well. (The bond between operator new and operator delete is stronger than many people think. For information on a rarely-understood aspect of their relationship, turn to the sidebar in my article on counting objects.)
Interestingly, declaring operator new private often also prevents UPNumber objects from being instantiated as base class parts of heap-based derived class objects. That's because operator new and operator delete are inherited, so if these functions aren't declared public in a derived class, that class inherits the private versions declared in its base:
    class UPNumber { ... };             // as above
    class NonNegativeUPNumber:          // assume this class
      public UPNumber {                 // declares no operator new
      ...
    };
    NonNegativeUPNumber n1;             // okay
    static NonNegativeUPNumber n2;      // also okay
    NonNegativeUPNumber *p =            // error! attempt to call
      new NonNegativeUPNumber;          // private operator new
If the derived class declares an operator new of its own, that function will be called when allocating derived class objects on the heap, and a different way will have to be found to prevent UPNumber base class parts from winding up there. Similarly, the fact that UPNumber's operator new is private has no effect on attempts to allocate objects containing UPNumber objects as members:
    class Asset {
    public:
      Asset(int initValue);
      ...
    private:
      UPNumber value;
    };
    Asset *pa = new Asset(100);         // fine, calls
                                        // Asset::operator new or
                                        // ::operator new, not
                                        // UPNumber::operator new
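Both loopholes compile without complaint, as this sketch shows. The names NoHeapNumber, Derived, and heapAllocatedWorth are hypothetical, and Asset is a simplified version of the one above.

```cpp
#include <cstddef>
#include <new>

// Hypothetical class whose operator new is private, as in the text.
class NoHeapNumber {
public:
    explicit NoHeapNumber(int v = 0) : value_(v) {}
    int value() const { return value_; }
private:
    static void* operator new(std::size_t);   // declared, never defined
    static void operator delete(void*);
    int value_;
};

// Loophole 1: a containing class is allocated with ::operator new,
// so the embedded NoHeapNumber ends up on the heap anyway.
class Asset {
public:
    explicit Asset(int v) : value_(v) {}
    int worth() const { return value_.value(); }
private:
    NoHeapNumber value_;
};

// Loophole 2: a derived class that declares its own public operator new
// hides the private one inherited from the base.
class Derived : public NoHeapNumber {
public:
    explicit Derived(int v) : NoHeapNumber(v) {}
    static void* operator new(std::size_t size) { return ::operator new(size); }
    static void operator delete(void* p) { ::operator delete(p); }
};

int heapAllocatedWorth(int v) {
    Asset* pa = new Asset(v);      // compiles: NoHeapNumber::operator new unused
    Derived* pd = new Derived(v);  // compiles: Derived::operator new is used
    int total = pa->worth() + pd->value();
    delete pa;
    delete pd;
    return total;
}
```

In both cases a NoHeapNumber lives on the heap, either as a member or as a base class part, and nothing in NoHeapNumber can detect it.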
For all practical purposes, this brings us back to where we were when we wanted to throw an exception in the UPNumber constructors if a UPNumber object was being constructed in memory that wasn't on the heap. This time, of course, we want to throw an exception if the object in question is on the heap. Just as there is no portable way to determine if an address is on the heap, however, there is no portable way to determine that it is not on the heap, so we're out of luck. This should be no surprise. After all, if we could tell when an address is on the heap, we could surely tell when an address is not on the heap. But we can't, so we can't. Oh well.