[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: following up on speed



> On a sort-of related note, I've wondered for a long time why garbage
> collection isn't provided as an operating system service instead of
> having to be re-implemented for every single language that needs it
> (which is rapidly becoming every single language except C/C++).
> Given that writing a good GC requires knowledge of the hardware (and
> usually some assembly coding) it seems like almost a perfect example
> of something that should be done by the OS.  In addition, I suspect
> that an OS-level GC would be much more efficient, because it would
> be working on a much larger heap, with more opportunities for compaction
> etc. (I don't really know much about GC... can you tell?).  I'd
> appreciate it if any GC experts could tell me why this isn't being done.

Without making any claims to being a GC expert, I think the main issue here
is simply that OS-level GC would require objects across languages and
applications to have a well-defined format.   GC relies on being able to
traverse an application's object graph, and you can't easily do that if e.g.
a Python object doesn't have the same structure as a Lisp object - how does
the OS tell which data fields are pointers, as opposed to ones which are
simply scalar values?

Actually, that last question is partially answered by conservative
collectors like Boehm-Demers-Weiser: "Any bit patterns that represent
addresses inside heap objects managed by the collector are viewed as
pointers" (from http://www.hpl.hp.com/personal/Hans_Boehm/gc/gcdescr.html).
But there are various drawbacks to this, which are summarized at
http://www.hpl.hp.com/personal/Hans_Boehm/gc/issues.html.

To improve on conservative collection, the OS would need to know a bit more
about the object formats used by its clients.  It would certainly be
possible for an OS to provide an API that supported this, but I suspect the
full implications of this go quite far beyond the level of service provided
by traditional OSes like Unix and Windows.

One alternative is used by Microsoft's COM, as well as some CORBA
implementations - reference counting via a standardized API (other CORBA
solutions vary, or at least used to).  I imagine this isn't a good enough
solution for all objects allocated by an OS, though, for the usual reasons -
high overhead, the need to explicitly manage, problems dealing with circular
references, etc.

In fact, the Windows NT/2000/XP kernel has an "Object Manager" subsystem
which uses reference-counted objects internally (unrelated to COM), used to
represent OS-level entities like resource handles.  This feature was not
exported via the Win32 API (afaik; if it is, it's not used except in device
drivers).  I've always assumed that the reason this feature wasn't made more
generally available is because it wasn't considered sufficiently scalable.

But one answer to the question of why OS level GC isn't being done is that
it *is* in fact being done, by "operating systems" like the Java VM and the
.NET CLR (runtime).  These provide a semi-language-neutral virtual machine
with a single object representation format, which all languages and
applications on the VM share.  These environments can be quite
self-contained and can run without a complete traditional OS underneath
them, so they can be considered high-level OSes in their own right.  Within
these constrained virtual machines, it's much easier to provide services to
languages and applications that, to shamelessly quote myself, "go quite far
beyond the level of service provided by traditional OSes like Unix and
Windows".

As I said, I'm not an expert, so any corrections of misconceptions are
welcome.

Anton