
Re: Python's GC approach



jeremy@alum.mit.edu (Jeremy Hylton) writes:

> >>>>> "MM" == Morgan McGuire <morgan3d@yahoo.com> writes:
> 
>   >> I'm amused because I chose the Boehm gc in REBOL 1.0 because I
>   >> thought it *was* rather portable and reliable.
> 
>   MM> I think that it is more portable than many other solutions but
>   MM> makes a bad first impression because it happens to have well
>   MM> documented the few platform-dependent issues.
> 
> The bad first impression had as much to do with its
> platform-dependence as with the various limitations it places on the
> C code you write and how it interacts with libraries.  If I
> misunderstand the warnings or if you can comment on your experience,
> I'd appreciate it.

We ran the Boehm GC on Windows (98 & NT), Unix (Linux, FreeBSD,
Solaris), Macintosh, and even an Amiga.  The collector has several
options and `modes of operation'.  Some of these (e.g. generational
scavenging) require more assistance from the OS and compiler than
others.  The basic non-generational, single-threaded mode of operation
seems to work on a large variety of platforms.

As far as dealing with the compiler goes, we compiled under MSVC 5 for
Windows, GCC for the various unices, whateveritwas (Code Warrior? or
is that just the editor?) on the Mac, and Sabre C on the Amiga.  We
*did* turn off certain optimizations that had the possibility of
confusing the GC (e.g. strength reduction), but the basic
optimizations (inlining, peephole) were left on.

> The limitations that worried me most were about:
> 
>     - only working with a limited set of threads packages on various
>       platforms,
>
>     - requiring that code that uses threads be modified to include the
>       gc header file because it replaces some of the pthreads calls
>       with macros that help it track things,

Threading is a hairy issue.  The fact of the matter is that there are
*still* commercial unix systems that have well-known race conditions
in their process code, let alone in the thread libraries.  One of our
design goals was that the code run *exactly* the same on *all*
platforms, so we could only take advantage of threads if we could
emulate them on every platform.  There were some ideas kicked around,
but the original release of REBOL didn't support threads.
(Architectural decisions in later releases make it highly unlikely
that threads will *ever* be supported in REBOL.)

>     - concern about whether (given the above constraints) it is
>       feasible to extend or embed an interpreter that uses the Boehm
>       GC with a large, multi-threaded C app that isn't similarly
>       modified and recompiled.

We used few external libraries, though (how many libraries run on
Solaris, Windows, Mac, *and* Amiga?); we never encountered a case
where we had to modify the library header files.  In *theory*, you
ought to be able to simply  #define malloc GC_malloc  and shadow the
entry point to the `real' malloc at link time.
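
A rough sketch of the macro half of that idea, in case it helps.  This
does not use the Boehm GC at all: managed_malloc and managed_allocs
are made-up stand-ins for the collector's allocator, and make_counter
is a made-up piece of client code.  Redefining a standard library name
with a macro is formally outside what the C standard promises, even
though common compilers accept it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a collector's allocator (the role GC_malloc
   plays in the text).  It only counts requests so that the redirection
   is observable. */
static size_t managed_allocs = 0;

static void *managed_malloc(size_t nbytes)
{
    managed_allocs++;
    return calloc(1, nbytes);   /* zero-filled storage */
}

/* From here on, every textual call to malloc in this translation unit
   is rewritten by the preprocessor to call managed_malloc instead. */
#define malloc(n) managed_malloc(n)

/* Ordinary-looking client code: it never names managed_malloc. */
static int *make_counter(void)
{
    int *p = malloc(sizeof *p);
    assert(p != NULL);
    return p;
}
```

The macro only reaches code compiled after it.  A library that was
already compiled still calls the real malloc, which is where the
link-time shadowing comes in.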

> The limited experience of Python developers using the Boehm collector
> was frustrating, because it failed when we integrated Python with
> Tcl/TK.
> 
> On the other hand, Python pays a real price for not using the Boehm
> GC.  We have to maintain our own mark-and-sweep collector for catching
> cycles.  We have to manually track reference counts in all our C
> code.  But Python runs on platforms that the Boehm GC hasn't been
> ported to and plays nicely with C libraries.  Worse is better :-).

You don't *have* to use the Boehm GC in conservative mode.  Since you
are maintaining refcounts and marks (and since you haven't modified
the library headers) you aren't managing the storage allocated by
library code.  So don't tell the GC about it.  There are
customizations galore in the Boehm GC that allow you to give it extra
information about what you are doing, so if you are in the position of
having complete knowledge about memory usage you ought to be able to
avoid many of the problems you noted above.  At the worst, you'd end
up writing your own GC.  (The hairiest issue with the Boehm GC is
doing the conservative scan of the stack.  This is why all the thread
hackery exists.  If you don't need to do a scan of the stack, you can
basically ignore the thread stuff.)
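
To make the "no stack scan" point concrete, here is a toy precise
mark-and-sweep in plain C.  It is not the Boehm GC and uses none of
its interfaces; Obj, obj_new, add_root, and collect are all invented
for the illustration.  The assumption is the one stated above: the
program knows every root and the layout of every object it allocates,
so nothing has to be guessed from the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object layout: each object records its own outgoing
   pointers, so the collector never guesses what is a pointer. */
typedef struct Obj {
    struct Obj *next_alloc;   /* list of every object we allocated */
    struct Obj *ref[2];       /* outgoing references, NULL if unused */
    int marked;
} Obj;

static Obj *all_objs = NULL;  /* only storage we allocated ourselves */
static Obj *roots[16];        /* explicit roots replace the stack scan */
static size_t nroots = 0;

static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->next_alloc = all_objs;
    all_objs = o;
    return o;
}

static void add_root(Obj *o)
{
    assert(nroots < sizeof roots / sizeof roots[0]);
    roots[nroots++] = o;
}

static void mark(Obj *o)
{
    if (o == NULL || o->marked) return;
    o->marked = 1;
    mark(o->ref[0]);
    mark(o->ref[1]);
}

/* Mark from the explicit roots, then free whatever stayed unmarked.
   Returns the number of objects reclaimed. */
static size_t collect(void)
{
    size_t freed = 0;
    for (size_t i = 0; i < nroots; i++) mark(roots[i]);

    Obj **link = &all_objs;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;            /* reset for the next cycle */
            link = &o->next_alloc;
        } else {
            *link = o->next_alloc;    /* unlink and release */
            free(o);
            freed++;
        }
    }
    return freed;
}
```

Storage handed out by library code never enters all_objs, so the
collector simply never sees it.  An unreachable cycle (two objects
pointing only at each other) is reclaimed by collect() just like any
other garbage.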


Another deciding factor was familiarity.  I've written and maintained
several GCs, and I felt that debugging, porting, and extending the
Boehm GC would be no harder than debugging, porting, and extending a
hand-rolled one.  Easier, perhaps.