Want illusion of infinite memory
-- most programs need only a bounded amount of memory
-- but create an unbounded number of cells over time
-- most memory usage is just temporary
Knowing what's no longer needed can be tricky
-- can't usually be determined at compile time
How systems with manual management (eg, C) work
-- language provides:
an operation for creating a cell (in C, it's called malloc)
an operation for disposing of a cell (in C, it's called free)
-- programmer puts explicit calls to dispose in the code
-- runtime system decides where to allocate new cell
-- very hard to do exactly the right number of disposes!
-- too few: program runs out of memory (called a memory leak)
-- too many: program crashes because needed data is trashed
-- that's why programs like Purify are popular: detect leaks at runtime
-- garbage collector can produce unpredicable slowdowns
-- can be a problem for real-time systems
-- no commercial language had GC until Java
-- LISP has had automatic storage management since the 60's
-- many other languages with GC in between (CLU, ML)
-- can't tell what's reachable because program can arbitrarily access any part of memory
-- that's one reason that pointer arithmetic is a bad idea!
In type-safe languages
-- only way to reach a cell is through an existing reference
-- so a cell is dead if it can't be reached
((x (list 1 2 3))
(y (cdr x)))
(set-cdr! (cdr y) y)
Draw a BPD that results from this. Which cells are dead?
Pair P (ie pointer to a pair)
Empty list E
Word of memory may be 4 bytes = 32 bits
-- each item takes a word
-- a pair is stored as two consecutive words
This is just another scheme
-- compare to the one you saw in class
-- for GC, this one's slightly easier to explain
-- but no fundamental difference
-- here's how the memory might look after executing the code above
-- assume z is in a register (ie, ignore the representation of the environment)
Exercise for the class: fill this in
Where does the marking start?
-- at roots: ie, registers
-- handling stack is a little trickier
This is not pay/view!
-- cost of sweep is proportional to memory available, not memory used
-- S&C better in this crucial respect
For a picture of this, see my old slides.
GC keeps two pointers
-- free: shows where next cell should be allocated
-- scan: points to object just relocated
-- (there's also the root, which can be thought of as a third pointer)
-- when we move a cell, what about references to it that haven't yet been copied?
-- put in a forwarding pointer using tag F (a "broken heart")
Algorithm: first step
-- set FREE and SCAN to start of free memory
-- copy pair from root to new location & update ROOT pointer accordingly
-- increment FREE
-- put a broken heart in old location
-- relocate car and cdr of pair just moved (and pointed to by SCAN)
-- increment SCAN by one
-- stop when no more objects to move
What does "relocate" mean exactly?
-- if primitive value
-- if already moved the cell pointed to
replace pointer in pair being scanned by forwarding address
copy cell at point marked by FREE
setup broken heart in old location pointing to new location
update pointer in pair being scanned to new location
-- ... at the end, always increment FREE
-- no need for a stack or queue! just scan through relocated objects in free space
-- which uses space more efficiently?
-- which can be made incremental?
-- which compacts the memory?
-- which needs memory to execute?
-- how does cost depend on amount of memory used?