Want illusion of infinite memory
-- most programs need only a bounded amount of memory
-- but create an unbounded number of cells over time
-- most memory usage is just temporary
Knowing what's no longer needed can be tricky
-- can't usually be determined at compile time
How systems with manual management (eg, C) work
-- language provides:
an operation for creating
a cell (in C, it's called malloc)
an operation for disposing
of a cell (in C, it's called free)
-- programmer puts explicit calls to dispose in the code
-- runtime system decides where to allocate new cell
Problems
-- very hard to do exactly the right number of disposes!
-- too few: program runs out of memory (called a memory leak)
-- too many: program crashes because needed data is trashed
-- that's why programs like Purify are popular: detect leaks at runtime
Problems
-- garbage collector can produce unpredicable slowdowns
-- can be a problem for real-time systems
History
-- no commercial language had GC until Java
-- LISP has had automatic storage management since the 60's
-- many other languages with GC in between (CLU, ML)
In C
-- can't tell what's reachable because program can arbitrarily access
any part of memory
-- that's one reason that pointer arithmetic is a bad idea!
In type-safe languages
-- only way to reach a cell is through an existing reference
-- so a cell is dead if it can't be reached
Example
(define z
(let*
((x (list 1 2 3))
(y (cdr x)))
(begin
(set-cdr! (cdr y) y)
y)))
Draw a BPD that results from this. Which cells are dead?
Pair P (ie pointer to a pair)
Number N
Empty list E
Word of memory may be 4 bytes = 32 bits
-- each item takes a word
-- a pair is stored as two consecutive words
This is just another scheme
-- compare to the one you saw in class
-- for GC, this one's slightly easier to explain
-- but no fundamental difference
Example
-- here's how the memory might look after executing the code above
-- assume z is in a register (ie, ignore the representation of the
environment)
Exercise for the class: fill this in
100 N001
104 ??
108
112
116
120
Where does the marking start?
-- at roots: ie, registers
-- handling stack is a little trickier
This is not pay/view!
-- cost of sweep is proportional to memory available, not memory used
-- S&C better in this crucial respect
For a picture of this, see my old slides.
GC keeps two pointers
-- free: shows where next cell should be allocated
-- scan: points to object just relocated
-- (there's also the root, which can be thought of as a third pointer)
Tricky bit
-- when we move a cell, what about references to it that haven't yet
been copied?
-- put in a forwarding pointer using tag F (a "broken heart")
Algorithm: first step
-- set FREE and SCAN to start of free memory
-- copy pair from root to new location & update ROOT pointer accordingly
-- increment FREE
-- put a broken heart in old location
Now repeat:
-- relocate car and cdr of pair just moved (and pointed to by SCAN)
-- increment SCAN by one
-- stop when no more objects to move
What does "relocate" mean exactly?
-- if primitive value
-- if already moved the cell pointed to
replace pointer in pair being scanned by forwarding address
-- otherwise
copy cell at point marked by FREE
setup broken heart in old location pointing to new location
update pointer in pair being scanned to new location
-- ... at the end, always increment FREE
Note
-- no need for a stack or queue! just scan through relocated objects
in free space
Comparison
-- which uses space more efficiently?
-- which can be made incremental?
-- which compacts the memory?
-- which needs memory to execute?
-- how does cost depend on amount of memory used?