Item 34: Understand how to combine C++ and C in the same program.
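In many ways, the things you have to worry about when making a program out of some components in C++ and some in C are the same as those you have to worry about when cobbling together a C program out of object files produced by more than one C compiler. There is no way to combine such files unless the different compilers agree on implementation-dependent features like the size of ints and doubles, the mechanism by which parameters are passed from caller to callee, and whether the caller or the callee orchestrates the passing. These pragmatic aspects of mixed-compiler software development are quite properly ignored by language standards. The only reliable way to know that object files from two compilers can be combined in one program is therefore to obtain assurances from the compilers' vendors that their products produce compatible output. This is as true for programs made up of C++ and C as it is for all-C++ or all-C programs. Before you try to mix C++ and C in the same program, make sure your C++ and C compilers generate compatible object files.
Having done that, there are four other things you need to consider: name mangling, initialization of statics, dynamic memory allocation, and data structure compatibility.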
Name mangling, as you may know, is the process through which your C++ compilers give each function in your program a unique name. In C, this process is unnecessary, because you can't overload function names, but nearly all C++ programs have at least a few functions with the same name. (Consider, for example, the iostream library, which declares several versions of operator<< and operator>>.) Overloading is incompatible with most linkers, because linkers generally take a dim view of multiple functions with the same name. Name mangling is a concession to the realities of linkers; in particular, to the fact that linkers usually insist on all function names being unique.
As long as you stay within the confines of C++, name mangling is not likely to concern you. If you have a function named drawLine that a compiler mangles into xyzzy, you'll always use the name drawLine, and you'll have little reason to care that the underlying object files happen to refer to xyzzy.
It's a different story if drawLine is in a C library. In that case, your C++ source file probably includes a header file that contains a declaration like
void drawLine(int x1, int y1, int x2, int y2);
and your code contains calls to drawLine in the usual fashion. Each such call is translated by your compilers into a call to the mangled name of that function, so when you write
drawLine(a, b, c, d); // call to unmangled function name
your object files contain a function call that corresponds to
xyzzy(a, b, c, d);      // call to mangled function name
But if drawLine is a C function, the object file (or archive or dynamically linked library, etc.) that contains the compiled version of drawLine contains a function called drawLine; no name mangling has taken place. When you try to link the object files comprising your program together, you'll get an error, because the linker is looking for a function called xyzzy, and there is no such function.
To solve this problem, you need a way to tell your C++ compilers not to mangle certain function names. You never want to mangle the names of functions written in other languages, whether they be in C, assembler, FORTRAN, Lisp, Forth, or what-have-you. (Yes, what-have-you would include COBOL, but then what would you have?) After all, if you call a C function named drawLine, it's really called drawLine, and your object code should contain a reference to that name, not to some mangled version of that name.
To suppress name mangling, use C++'s extern "C" directive:
// declare a function called drawLine; don't mangle
// its name
extern "C"
void drawLine(int x1, int y1, int x2, int y2);
Don't be drawn into the trap of assuming that where there's an extern "C", there must be an extern "Pascal" and an extern "FORTRAN" as well. There's not, at least not in the standard. The best way to view extern "C" is not as an assertion that the associated function is written in C, but as a statement that the function should be called as if it were written in C. (Technically, extern "C" means the function has C linkage, but what that means is far from clear. One thing it always means, however, is that name mangling is suppressed.)
For example, if you were so unfortunate as to have to write a function in assembler, you could declare it extern "C", too:
// this function is in assembler; don't mangle its name
extern "C" void twiddleBits(unsigned char bits);
You can even declare C++ functions extern "C". This can be useful if you're writing a library in C++ that you'd like to provide to clients using other programming languages. If you suppress the name mangling of your C++ function names, your clients can use the natural and intuitive names you choose instead of the mangled names your compilers would otherwise generate:
// the following C++ function is designed for use outside
// C++ and should not have its name mangled
extern "C" void simulate(int iterations);
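Below is a minimal sketch of what the C++ side of such a library might look like. It assumes a hypothetical entry point, averageOf, which is not part of the original example. The body is free to use C++ facilities such as std::accumulate. Only the interface, built from a pointer and an int, has to be something C can express.

```cpp
#include <numeric>   // std::accumulate

// Hypothetical library function implemented in C++ but given C linkage,
// so its name in the object file is not mangled.
extern "C" double averageOf(const double *values, int count)
{
  if (values == nullptr || count <= 0) {
    return 0.0;                                   // nothing to average
  }
  // C++ is fine inside the body; only the signature must be C-friendly.
  double total = std::accumulate(values, values + count, 0.0);
  return total / count;
}
```

A C client would declare the function as double averageOf(const double *values, int count); and call it like any other C function.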
Often you'll have a slew of functions whose names you don't want mangled, and it would be a pain to precede each with extern "C". Fortunately, you don't have to. extern "C" can also be made to apply to a whole set of functions. Just enclose them all in curly braces:
extern "C" {                                    // disable name mangling for
                                                // all the following functions
  void drawLine(int x1, int y1, int x2, int y2);
  void twiddleBits(unsigned char bits);
  void simulate(int iterations);
  ...
}
This use of extern "C" simplifies the maintenance of header files that must be used with both C++ and C. When compiling for C++, you'll want to include extern "C", but when compiling for C, you won't. You can take advantage of the fact that the preprocessor symbol __cplusplus is defined only for C++ compilations and structure your polyglot header files as follows:
#ifdef __cplusplus
extern "C" {
#endif
  void drawLine(int x1, int y1, int x2, int y2);
  void twiddleBits(unsigned char bits);
  void simulate(int iterations);
  ...
#ifdef __cplusplus
}
#endif
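A complete polyglot header usually also has an include guard. The sketch below shows one, using a hypothetical function rectangleArea and hypothetical file names shapes.h and shapes.cpp. It also relies on a rule of the language: a function declared without a linkage specification after an extern "C" declaration keeps C linkage. For that reason, the C++ definition needs no extern "C" of its own once the header has been seen.

```cpp
// ---- shapes.h (hypothetical header shared by C and C++ code) ----
#ifndef SHAPES_H
#define SHAPES_H

#ifdef __cplusplus
extern "C" {              // seen only by C++ compilers
#endif

int rectangleArea(int width, int height);   // hypothetical function

#ifdef __cplusplus
}                         // closes the extern "C" block
#endif

#endif /* SHAPES_H */

// ---- shapes.cpp (hypothetical C++ implementation file) ----
// The earlier extern "C" declaration already gave this function
// C linkage, so the definition's name is not mangled.
int rectangleArea(int width, int height)
{
  return width * height;
}
```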
There is, by the way, no such thing as a "standard" name mangling algorithm. Different compilers are free to mangle names in different ways, and different compilers do. This is a good thing. If all compilers mangled names the same way, you might be lulled into thinking they all generated compatible code. As things stand, if you try to mix object code from incompatible C++ compilers, there's a good chance you'll get an error during linking, because the mangled names won't match up. This implies you'll probably have other compatibility problems, too, and it's better to find out about such incompatibilities sooner rather than later.
Once you've mastered name mangling, you need to deal with the fact that in C++, lots of code can get executed before and after main. In particular, the constructors of static class objects and objects at global, namespace, and file scope are usually called before the body of main is executed. This process is known as static initialization (see Item E47). It is in direct opposition to the way we normally think about C++ and C programs, in which we view main as the entry point to execution of the program. Similarly, objects that are created through static initialization must have their destructors called during static destruction, and that process typically takes place after main has finished executing.
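To see static initialization at work, consider the following sketch. All of its names are hypothetical. By the time the statements in main execute, both global Tracker objects already exist, even though main never constructs them. Within a single translation unit, such objects are initialized in the order they are defined. Across translation units, the order is unspecified.

```cpp
// Counter with a constant initializer: it is zero before any
// constructor runs, so the Tracker constructors can safely use it.
int constructionCount = 0;

// Hypothetical type whose constructor records construction order.
struct Tracker {
  int order;                        // 1 for the first Tracker built, 2 for the next, ...
  Tracker() : order(++constructionCount) {}
};

// Objects at global scope: their constructors run during static
// initialization, typically before the body of main starts, and their
// destructors run during static destruction after main finishes.
Tracker firstGlobal;
Tracker secondGlobal;
```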
Here is the dilemma: main is supposed to be invoked first, yet objects need to be constructed before main is executed. To resolve it, many compilers insert a call to a special compiler-written function at the beginning of main, and it is this special function that takes care of static initialization. Similarly, compilers often insert a call to another special function at the end of main to take care of the destruction of static objects. Code generated for main often looks as if main had been written like this:
int main(int argc, char *argv[])
{
  performStaticInitialization();         // generated by the
                                         // implementation

  // the statements you put in main go here

  performStaticDestruction();            // generated by the
                                         // implementation
}
Now don't take this too literally. The functions performStaticInitialization and performStaticDestruction usually have much more cryptic names, and they may even be generated inline, in which case you won't see any functions for them in your object files. The important point is this: if a C++ compiler adopts this approach to the initialization and destruction of static objects, such objects will be neither initialized nor destroyed unless main is written in C++. This approach to static initialization and destruction is common, so you should try to write main in C++ if you write any part of a software system in C++.
Sometimes it would seem to make more sense to write main in C, say if most of a program is in C and C++ is just a support library. Nevertheless, there's a good chance the C++ library contains static objects (if it doesn't now, it probably will in the future; see Item 32), so it's still a good idea to write main in C++ if you possibly can. That doesn't mean you need to rewrite your C code, however. Just rename the main you wrote in C to be realMain, then have the C++ version of main call realMain:
extern "C"                                // implement this
int realMain(int argc, char *argv[]);     // function in C

int main(int argc, char *argv[])          // write this in C++
{
  return realMain(argc, argv);
}
If you do this, it's a good idea to put a comment above main explaining what is going on.
If you cannot write main in C++, you've got a problem, because there is no other portable way to ensure that constructors and destructors for static objects are called. This doesn't mean all is lost. It just means you'll have to work a little harder. Compiler vendors are well acquainted with this problem, so almost all provide some extralinguistic mechanism for initiating the process of static initialization and static destruction. For information on how this works with your compilers, dig into your compilers' documentation or contact their vendors.
That brings us to dynamic memory allocation. The general rule is simple: the C++ parts of a program use new and delete (see Item 8), and the C parts of a program use malloc (and its variants) and free. As long as memory that came from new is deallocated via delete and memory that came from malloc is deallocated via free, all is well. However, calling free on a newed pointer yields undefined behavior, and so does deleting a malloced pointer. The only thing to remember, then, is to rigorously segregate your news and deletes from your mallocs and frees.
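Here is the pairing rule in miniature. This is a sketch with two hypothetical helper functions, and the arithmetic exists only to give each block of memory something to do. Memory from the array form new [] goes back through delete [], and memory from malloc goes back through free. Neither function ever crosses the two families.

```cpp
#include <cstdlib>   // std::malloc, std::free, std::size_t

// C++ allocation, C++ deallocation.
int sumWithNew(int n)
{
  int *values = new int[n];              // allocated with new [] ...
  int total = 0;
  for (int i = 0; i < n; ++i) {
    values[i] = i;
    total += values[i];
  }
  delete [] values;                      // ... so released with delete []
  return total;
}

// C allocation, C deallocation.
int sumWithMalloc(int n)
{
  int *values = static_cast<int *>(
      std::malloc(static_cast<std::size_t>(n) * sizeof(int)));  // allocated with malloc ...
  if (values == nullptr) return -1;      // malloc signals failure with a null pointer
  int total = 0;
  for (int i = 0; i < n; ++i) {
    values[i] = i;
    total += values[i];
  }
  std::free(values);                     // ... so released with free
  return total;
}
```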
Sometimes this is easier said than done. Consider the humble (but handy) strdup function. It was standard in neither C nor C++ until C23 added it to C, yet it is nevertheless widely available:
char * strdup(const char *ps);     // return a copy of the
                                   // string pointed to by ps
If a memory leak is to be avoided, the memory allocated inside strdup must be deallocated by strdup's caller. But how is the memory to be deallocated? By using delete? By calling free? If the strdup you're calling is from a C library, it's the latter. If it was written for a C++ library, it's probably the former. What you need to do after calling strdup, then, varies not only from system to system, but also from compiler to compiler. To reduce such portability headaches, try to avoid calling functions that are neither in the standard library (see Item E49 and Item 35) nor available in a stable form on most computing platforms.
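When you can't avoid such a function, one option is to record the deallocation rule in the type that holds the result. The sketch below assumes a hypothetical C-library function, c_dupString, that is documented as allocating with malloc. It stands in for strdup and is named differently so it won't collide with a strdup your platform may already declare. A std::unique_ptr whose deleter calls free hands the copy back to free automatically, never to delete.

```cpp
#include <cstdlib>   // std::malloc, std::free, std::size_t
#include <cstring>   // std::strlen, std::memcpy, std::strcmp
#include <memory>    // std::unique_ptr

// Hypothetical stand-in for a C library's strdup: it allocates with
// malloc, so the caller must release the copy with free.
extern "C" char *c_dupString(const char *ps)
{
  std::size_t size = std::strlen(ps) + 1;          // include the terminating null
  char *copy = static_cast<char *>(std::malloc(size));
  if (copy != nullptr) std::memcpy(copy, ps, size);
  return copy;
}

// Deleter that routes the memory back to free, matching the malloc above.
struct FreeDeleter {
  void operator()(char *p) const noexcept { std::free(p); }
};

using MallocString = std::unique_ptr<char, FreeDeleter>;

// C++-side wrapper: the returned object frees the copy when it goes away.
MallocString duplicate(const char *ps)
{
  return MallocString(c_dupString(ps));
}
```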
Which brings us at long last to passing data between C++ and C programs. There's no hope of making C functions understand C++ features, so the level of discourse between the two languages must be limited to those concepts that C can express. Thus, it should be clear there's no portable way to pass objects or to pass pointers to member functions to routines written in C. C does understand normal pointers, however. Provided your C++ and C compilers produce compatible output, functions in the two languages can safely exchange pointers to objects and pointers to non-member or static functions. Naturally, structs and variables of built-in types (e.g., ints, chars, etc.) can also freely cross the C++/C border.
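Function pointers are a common case. The standard C library function qsort takes a pointer to a comparison function, and C++ code can pass it a pointer to an ordinary non-member function. In this sketch, the comparison function compareInts and the wrapper sortInts are hypothetical names. The comparison function is declared extern "C" so that its linkage matches what a C routine expects.

```cpp
#include <cstdlib>   // std::qsort, std::size_t

// Non-member function with C linkage: safe to hand to a C routine.
extern "C" int compareInts(const void *lhs, const void *rhs)
{
  int a = *static_cast<const int *>(lhs);
  int b = *static_cast<const int *>(rhs);
  return (a > b) - (a < b);   // negative, zero, or positive, with no risk of overflow
}

// C++ code passing a plain array and a function pointer to a C library routine.
void sortInts(int *values, std::size_t count)
{
  std::qsort(values, count, sizeof(int), compareInts);
}
```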
The rules governing the layout of a struct in C++ are consistent with those of C. It is therefore safe to assume that a structure definition that compiles in both languages is laid out the same way by both compilers, and such structs can be safely passed back and forth between C++ and C. If you add nonvirtual functions to the C++ version of the struct, its memory layout should not change. Objects of a struct (or class) containing only nonvirtual functions should therefore be compatible with their C brethren whose structure definition lacks only the member function declarations. Adding virtual functions ends the game, because the addition of virtual functions to a class causes objects of that type to use a different memory layout (see Item 24). Having a struct inherit from another struct (or class) usually changes its layout, too, so structs with base structs (or classes) are also poor candidates for exchange with C functions.
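Compilers that support C++11 or later can check part of this for you. std::is_standard_layout, from <type_traits>, reports whether a class is standard-layout. The C++ standard describes that property as useful for communicating with code written in other languages. A nonvirtual member function leaves the property intact, while a virtual function removes it. The struct names in this sketch are hypothetical. The check is a sanity test within one compiler, not a guarantee that two different compilers agree.

```cpp
#include <type_traits>   // std::is_standard_layout

// Hypothetical struct whose definition compiles in both C and C++.
struct Point {
  int x;
  int y;
};

// C++-only variant: the same data members plus a nonvirtual member function.
struct PointWithHelper {
  int x;
  int y;
  int sum() const { return x + y; }   // nonvirtual: adds no per-object data
};

// Variant with a virtual function: objects now carry hidden data
// (typically a pointer to a virtual table), so they are not C-compatible.
struct PointWithVirtual {
  int x;
  int y;
  virtual int sum() const { return x + y; }
};

static_assert(std::is_standard_layout<PointWithHelper>::value,
              "nonvirtual member functions keep the layout C-compatible");
static_assert(!std::is_standard_layout<PointWithVirtual>::value,
              "virtual functions make the layout unsuitable for C");
```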
From a data structure perspective, it boils down to this: it is safe to pass data structures from C++ to C and from C to C++ provided the definition of those structures compiles in both C++ and C. Adding nonvirtual member functions to the C++ version of a struct that's otherwise compatible with C will probably not affect its compatibility, but almost any other change to the struct will.
If you want to mix C++ and C in the same program, remember the following simple guidelines:
- Declare functions to be used by both languages extern "C".
- If at all possible, write main in C++.
- Always use delete with memory from new; always use free with memory from malloc.