Recitation 25

Today: Memory Management

Motivation
Manual scheme
Automatic schemes: GC
-- Mark & Sweep
-- Stop & Copy
Check your understanding ...

Motivation

Want illusion of infinite memory
-- most programs need only a bounded amount of memory
-- but create an unbounded number of cells over time
-- most memory usage is just temporary

Knowing what's no longer needed can be tricky
-- can't usually be determined at compile time

Manual storage management

How systems with manual management (eg, C) work
-- language provides:
an operation for creating a cell (in C, it's called malloc)
an operation for disposing of a cell (in C, it's called free)
-- programmer puts explicit calls to dispose in the code
-- runtime system decides where to allocate new cell

Problems
-- very hard to do exactly the right number of disposes!
-- too few: program runs out of memory (called a memory leak)
-- too many: program crashes because needed data is trashed
-- that's why programs like Purify are popular: detect leaks at runtime

Automatic storage management

How systems with automatic storage management (eg, LISP, Java) work
-- language provides:
an operation for creating a cell (in Java, it's called new)
NO operation for disposing of a cell
-- runtime system decides when to deallocate cells: garbage collector (GC)

Problems
-- garbage collector can produce unpredicable slowdowns
-- can be a problem for real-time systems

History
-- no commercial language had GC until Java
-- LISP has had automatic storage management since the 60's
-- many other languages with GC in between (CLU, ML)

Identifying Dead Cells

Basic idea of all storage management
-- reclaim dead cells
-- a cell is dead if it's unreachable

In C
-- can't tell what's reachable because program can arbitrarily access any part of memory
-- that's one reason that pointer arithmetic is a bad idea!

In type-safe languages
-- only way to reach a cell is through an existing reference
-- so a cell is dead if it can't be reached

Example

(define z
(let*
        ((x (list 1 2 3))
         (y (cdr x)))
    (begin
        (set-cdr! (cdr y) y)
        y)))

Draw a BPD that results from this. Which cells are dead?

Tagging

To do GC
-- must be able to distinguish pointers from primitive data
-- so tag each data item:

Pair P (ie pointer to a pair)
Number N
Empty list E

Word of memory may be 4 bytes = 32 bits
-- each item takes a word
-- a pair is stored as two consecutive words

This is just another scheme
-- compare to the one you saw in class
-- for GC, this one's slightly easier to explain
-- but no fundamental difference

Example
-- here's how the memory might look after executing the code above
-- assume z is in a register (ie, ignore the representation of the environment)

Exercise for the class: fill this in

100 N001
104 ??
108
112
116
120

Mark & Sweep

Basic idea
-- keep vector of mark bits
-- do recursive marking: need stack as deep as deepest structure
-- sweep through all of memory, and put unmarked in freelist

Where does the marking start?
-- at roots: ie, registers
-- handling stack is a little trickier

This is not pay/view!
-- cost of sweep is proportional to memory available, not memory used
-- S&C better in this crucial respect

Stop & Copy

Basic idea
-- split the memory in half
working half: cons cells allocated here
free half: unused
-- when working half fills up, move live cells over
-- now switch roles of two halves
-- (working and free are sometimes called FROM and TO spaces instead)

For a picture of this, see my old slides.

GC keeps two pointers
-- free: shows where next cell should be allocated
-- scan: points to object just relocated
-- (there's also the root, which can be thought of as a third pointer)

Tricky bit
-- when we move a cell, what about references to it that haven't yet been copied?
-- put in a forwarding pointer using tag F (a "broken heart")

Algorithm: first step
-- set FREE and SCAN to start of free memory
-- copy pair from root to new location & update ROOT pointer accordingly
-- increment FREE
-- put a broken heart in old location

Now repeat:
-- relocate car and cdr of pair just moved (and pointed to by SCAN)
-- increment SCAN by one
-- stop when no more objects to move

What does "relocate" mean exactly?
-- if primitive value
-- if already moved the cell pointed to
            replace pointer in pair being scanned by forwarding address
-- otherwise
            copy cell at point marked by FREE
            setup broken heart in old location pointing to new location
            update pointer in pair being scanned to new location
-- ... at the end, always increment FREE

Note
-- no need for a stack or queue! just scan through relocated objects in free space

Questions to check your understanding

Questions
-- why is tagging essential in both schemes?
-- why can't you GC program in C?

Comparison
-- which uses space more efficiently?
-- which can be made incremental?
-- which compacts the memory?
-- which needs memory to execute?
-- how does cost depend on amount of memory used?

Daniel Jackson
December 3, 1999