More Effective C++ | Item 34: Understand how to combine C++ and C in the same program

Back to Item 33: Make non-leaf classes abstract
Continue to Item 35: Familiarize yourself with the language standard

Item 34: Understand how to combine C++ and C in the same program.

In many ways, the things you have to worry about when making a program out of some components in C++ and some in C are the same as those you have to worry about when cobbling together a C program out of object files produced by more than one C compiler. There is no way to combine such files unless the different compilers agree on implementation-dependent features like the size of ints and doubles, the mechanism by which parameters are passed from caller to callee, and whether the caller or the callee orchestrates the passing. These pragmatic aspects of mixed-compiler software development are quite properly ignored by °language standardization efforts, so the only reliable way to know that object files from compiler A and compiler B can be safely combined in a program is to obtain assurances from the vendors of A and B that their products produce compatible output. This is as true for programs made up of C++ and C as it is for all-C++ or all-C programs, so before you try to mix C++ and C in the same program, make sure your C++ and C compilers generate compatible object files.

Having done that, there are four other things you need to consider: name mangling, initialization of statics, dynamic memory allocation, and data structure compatibility.

Name Mangling

Name mangling, as you may know, is the process through which your C++ compilers give each function in your program a unique name. In C, this process is unnecessary, because you can't overload function names, but nearly all C++ programs have at least a few functions with the same name. (Consider, for example, the iostream library, which declares several versions of operator<< and operator>>.) Overloading is incompatible with most linkers, because linkers generally take a dim view of multiple functions with the same name. Name mangling is a concession to the realities of linkers; in particular, to the fact that linkers usually insist on all function names being unique.

As long as you stay within the confines of C++, name mangling is not likely to concern you. If you have a function name drawLine that a compiler mangles into xyzzy, you'll always use the name drawLine, and you'll have little reason to care that the underlying object files happen to refer to xyzzy.

It's a different story if drawLine is in a C library. In that case, your C++ source file probably includes a header file that contains a declaration like this,

void drawLine(int x1, int y1, int x2, int y2);

and your code contains calls to drawLine in the usual fashion. Each such call is translated by your compilers into a call to the mangled name of that function, so when you write this,

drawLine(a, b, c, d);          // call to unmangled function name

your object files contain a function call that corresponds to this:

xyzzy(a, b, c, d);             // call to mangled function mame

But if drawLine is a C function, the object file (or archive or dynamically linked library, etc.) that contains the compiled version of drawLine contains a function called drawLine; no name mangling has taken place. When you try to link the object files comprising your program together, you'll get an error, because the linker is looking for a function called xyzzy, and there is no such function.

To solve this problem, you need a way to tell your C++ compilers not to mangle certain function names. You never want to mangle the names of functions written in other languages, whether they be in C, assembler, FORTRAN, Lisp, Forth, or what-have-you. (Yes, what-have-you would include COBOL, but then what would you have?) After all, if you call a C function named drawLine, it's really called drawLine, and your object code should contain a reference to that name, not to some mangled version of that name.

To suppress name mangling, use C++'s extern "C" directive:

// declare a function called drawLine; don't mangle
// its name
extern "C"
void drawLine(int x1, int y1, int x2, int y2);

Don't be drawn into the trap of assuming that where there's an extern "C", there must be an extern "Pascal" and an extern "FORTRAN" as well. There's not, at least not in °the standard. The best way to view extern "C" is not as an assertion that the associated function is written in C, but as a statement that the function should be called as if it were written in C. (Technically, extern "C" means the function has C linkage, but what that means is far from clear. One thing it always means, however, is that name mangling is suppressed.)

For example, if you were so unfortunate as to have to write a function in assembler, you could declare it extern "C", too:

// this function is in assembler — don't mangle its name
extern "C" void twiddleBits(unsigned char bits);

You can even declare C++ functions extern "C". This can be useful if you're writing a library in C++ that you'd like to provide to clients using other programming languages. By suppressing the name mangling of your C++ function names, your clients can use the natural and intuitive names you choose instead of the mangled names your compilers would otherwise generate:

// the following C++ function is designed for use outside
// C++ and should not have its name mangled
extern "C" void simulate(int iterations);

Often you'll have a slew of functions whose names you don't want mangled, and it would be a pain to precede each with extern "C". Fortunately, you don't have to. extern "C" can also be made to apply to a whole set of functions. Just enclose them all in curly braces:

extern "C" {                           // disable name mangling for
                                       // all the following functions

void drawLine(int x1, int y1, int x2, int y2);
  void twiddleBits(unsigned char bits);
  void simulate(int iterations);
  ...
}

This use of extern "C" simplifies the maintenance of header files that must be used with both C++ and C. When compiling for C++, you'll want to include extern "C", but when compiling for C, you won't. By taking advantage of the fact that the preprocessor symbol __cplusplus is defined only for C++ compilations, you can structure your polyglot header files as follows:

#ifdef __cplusplus

extern "C" {

#endif

void drawLine(int x1, int y1, int x2, int y2);
  void twiddleBits(unsigned char bits);
  void simulate(int iterations);
  ...

#ifdef __cplusplus

}

#endif

There is, by the way, no such thing as a "standard" name mangling algorithm. Different compilers are free to mangle names in different ways, and different compilers do. This is a good thing. If all compilers mangled names the same way, you might be lulled into thinking they all generated compatible code. The way things are now, if you try to mix object code from incompatible C++ compilers, there's a good chance you'll get an error during linking, because the mangled names won't match up. This implies you'll probably have other compatibility problems, too, and it's better to find out about such incompatibilities sooner than later,

Initialization of Statics

Once you've mastered name mangling, you need to deal with the fact that in C++, lots of code can get executed before and after main. In particular, the constructors of static class objects and objects at global, namespace, and file scope are usually called before the body of main is executed. This process is known as static initialization (see Item E47). This is in direct opposition to the way we normally think about C++ and C programs, in which we view main as the entry point to execution of the program. Similarly, objects that are created through static initialization must have their destructors called during static destruction; that process typically takes place after main has finished executing.

To resolve the dilemma that main is supposed to be invoked first, yet objects need to be constructed before main is executed, many compilers insert a call to a special compiler-written function at the beginning of main, and it is this special function that takes care of static initialization. Similarly, compilers often insert a call to another special function at the end of main to take care of the destruction of static objects. Code generated for main often looks as if main had been written like this:

int main(int argc, char *argv[])
{
  performStaticInitialization();         // generated by the
                                         // implementation

  the statements you put in main go here;

performStaticDestruction();            // generated by the
                                         // implementation
}

Now don't take this too literally. The functions performStaticInitialization and performStaticDestruction usually have much more cryptic names, and they may even be generated inline, in which case you won't see any functions for them in your object files. The important point is this: if a C++ compiler adopts this approach to the initialization and destruction of static objects, such objects will be neither initialized nor destroyed unless main is written in C++. Because this approach to static initialization and destruction is common, you should try to write main in C++ if you write any part of a software system in C++.

Sometimes it would seem to make more sense to write main in C — say if most of a program is in C and C++ is just a support library. Nevertheless, there's a good chance the C++ library contains static objects (if it doesn't now, it probably will in the future — see Item 32), so it's still a good idea to write main in C++ if you possibly can. That doesn't mean you need to rewrite your C code, however. Just rename the main you wrote in C to be realMain, then have the C++ version of main call realMain:

extern "C"                                // implement this
int realMain(int argc, char *argv[]);     // function in C

int main(int argc, char *argv[])          // write this in C++
{
  return realMain(argc, argv);
}

If you do this, it's a good idea to put a comment above main explaining what is going on.

If you cannot write main in C++, you've got a problem, because there is no other portable way to ensure that constructors and destructors for static objects are called. This doesn't mean all is lost, it just means you'll have to work a little harder. Compiler vendors are well acquainted with this problem, so almost all provide some extralinguistic mechanism for initiating the process of static initialization and static destruction. For information on how this works with your compilers, dig into your compilers' documentation or contact their vendors.

Dynamic Memory Allocation

That brings us to dynamic memory allocation. The general rule is simple: the C++ parts of a program use new and delete (see Item 8), and the C parts of a program use malloc (and its variants) and free. As long as memory that came from new is deallocated via delete and memory that came from malloc is deallocated via free, all is well. Calling free on a newed pointer yields undefined behavior, however, as does deleteing a malloced pointer. The only thing to remember, then, is to segregate rigorously your news and deletes from your mallocs and frees.

Sometimes this is easier said than done. Consider the humble (but handy) strdup function, which, though standard in neither C nor C++, is nevertheless widely available:

char * strdup(const char *ps);         // return a copy of the
                                       // string pointed to by ps

If a memory leak is to be avoided, the memory allocated inside strdup must be deallocated by strdup's caller. But how is the memory to be deallocated? By using delete? By calling free? If the strdup you're calling is from a C library, it's the latter. If it was written for a C++ library, it's probably the former. What you need to do after calling strdup, then, varies not only from system to system, but also from compiler to compiler. To reduce such portability headaches, try to avoid calling functions that are neither in the standard library (see Item E49 and Item 35) nor available in a stable form on most computing platforms.

Data Structure Compatibility

Which brings us at long last to passing data between C++ and C programs. There's no hope of making C functions understand C++ features, so the level of discourse between the two languages must be limited to those concepts that C can express. Thus, it should be clear there's no portable way to pass objects or to pass pointers to member functions to routines written in C. C does understand normal pointers, however, so, provided your C++ and C compilers produce compatible output, functions in the two languages can safely exchange pointers to objects and pointers to non-member or static functions. Naturally, structs and variables of built-in types (e.g., ints, chars, etc.) can also freely cross the C++/C border.

Because the rules governing the layout of a struct in C++ are consistent with those of C, it is safe to assume that a structure definition that compiles in both languages is laid out the same way by both compilers. Such structs can be safely passed back and forth between C++ and C. If you add nonvirtual functions to the C++ version of the struct, its memory layout should not change, so objects of a struct (or class) containing only non-virtual functions should be compatible with their C brethren whose structure definition lacks only the member function declarations. Adding virtual functions ends the game, because the addition of virtual functions to a class causes objects of that type to use a different memory layout (see Item 24). Having a struct inherit from another struct (or class) usually changes its layout, too, so structs with base structs (or classes) are also poor candidates for exchange with C functions.

From a data structure perspective, it boils down to this: it is safe to pass data structures from C++ to C and from C to C++ provided the definition of those structures compiles in both C++ and C. Adding nonvirtual member functions to the C++ version of a struct that's otherwise compatible with C will probably not affect its compatibility, but almost any other change to the struct will.

Summary

If you want to mix C++ and C in the same program, remember the following simple guidelines:

Make sure the C++ and C compilers produce compatible object files.
Declare functions to be used by both languages extern "C".
If at all possible, write main in C++.
Always use delete with memory from new; always use free with memory from malloc.
Limit what you pass between the two languages to data structures that compile under C; the C++ version of structs may contain non-virtual member functions.

Back to Item 33: Make non-leaf classes abstract
Continue to Item 35: Familiarize yourself with the language standard