Weight-Balanced Trees (slib)

6.3 Weight-Balanced Trees

(require 'wt-tree)

Balanced binary trees are a useful data structure for maintaining large sets of ordered objects or sets of associations whose keys are ordered. MIT Scheme has an comprehensive implementation of weight-balanced binary trees which has several advantages over the other data structures for large aggregates:

In addition to the usual element-level operations like insertion, deletion and lookup, there is a full complement of collection-level operations, like set intersection, set union and subset test, all of which are implemented with good orders of growth in time and space. This makes weight balanced trees ideal for rapid prototyping of functionally derived specifications.
An element in a tree may be indexed by its position under the ordering of the keys, and the ordinal position of an element may be determined, both with reasonable efficiency.
Operations to find and remove minimum element make weight balanced trees simple to use for priority queues.
The implementation is functional rather than imperative. This means that operations like ‘inserting’ an association in a tree do not destroy the old tree, in much the same way that (+ 1 x) modifies neither the constant 1 nor the value bound to x. The trees are referentially transparent thus the programmer need not worry about copying the trees. Referential transparency allows space efficiency to be achieved by sharing subtrees.

These features make weight-balanced trees suitable for a wide range of applications, especially those that require large numbers of sets or discrete maps. Applications that have a few global databases and/or concentrate on element-level operations like insertion and lookup are probably better off using hash-tables or red-black trees.

The size of a tree is the number of associations that it contains. Weight balanced binary trees are balanced to keep the sizes of the subtrees of each node within a constant factor of each other. This ensures logarithmic times for single-path operations (like lookup and insertion). A weight balanced tree takes space that is proportional to the number of associations in the tree. For the current implementation, the constant of proportionality is six words per association.

Weight balanced trees can be used as an implementation for either discrete sets or discrete maps (associations). Sets are implemented by ignoring the datum that is associated with the key. Under this scheme if an associations exists in the tree this indicates that the key of the association is a member of the set. Typically a value such as (), #t or #f is associated with the key.

Many operations can be viewed as computing a result that, depending on whether the tree arguments are thought of as sets or maps, is known by two different names. An example is wt-tree/member?, which, when regarding the tree argument as a set, computes the set membership operation, but, when regarding the tree as a discrete map, wt-tree/member? is the predicate testing if the map is defined at an element in its domain. Most names in this package have been chosen based on interpreting the trees as sets, hence the name wt-tree/member? rather than wt-tree/defined-at?.

The weight balanced tree implementation is a run-time-loadable option. To use weight balanced trees, execute

(load-option 'wt-tree)

once before calling any of the procedures defined here.

Construction of Weight-Balanced Trees
Basic Operations on Weight-Balanced Trees
Advanced Operations on Weight-Balanced Trees
Indexing Operations on Weight-Balanced Trees

6.3.1 Construction of Weight-Balanced Trees

Binary trees require there to be a total order on the keys used to arrange the elements in the tree. Weight balanced trees are organized by types, where the type is an object encapsulating the ordering relation. Creating a tree is a two-stage process. First a tree type must be created from the predicate which gives the ordering. The tree type is then used for making trees, either empty or singleton trees or trees from other aggregate structures like association lists. Once created, a tree ‘knows’ its type and the type is used to test compatibility between trees in operations taking two trees. Usually a small number of tree types are created at the beginning of a program and used many times throughout the program’s execution.

procedure+: make-wt-tree-type key<? ¶

This procedure creates and returns a new tree type based on the ordering predicate key<?. Key<? must be a total ordering, having the property that for all key values a, b and c:

(key<? a a)                         ⇒ #f
(and (key<? a b) (key<? b a))       ⇒ #f
(if (and (key<? a b) (key<? b c))
    (key<? a c)
    #t)                             ⇒ #t

Two key values are assumed to be equal if neither is less than the other by key<?.

Each call to make-wt-tree-type returns a distinct value, and trees are only compatible if their tree types are eq?. A consequence is that trees that are intended to be used in binary tree operations must all be created with a tree type originating from the same call to make-wt-tree-type.

variable+: number-wt-type ¶

A standard tree type for trees with numeric keys. Number-wt-type could have been defined by

(define number-wt-type (make-wt-tree-type  <))

variable+: string-wt-type ¶

A standard tree type for trees with string keys. String-wt-type could have been defined by

(define string-wt-type (make-wt-tree-type  string<?))

procedure+: make-wt-tree wt-tree-type ¶: This procedure creates and returns a newly allocated weight balanced tree. The tree is empty, i.e. it contains no associations. Wt-tree-type is a weight balanced tree type obtained by calling make-wt-tree-type; the returned tree has this type.

procedure+: singleton-wt-tree wt-tree-type key datum ¶: This procedure creates and returns a newly allocated weight balanced tree. The tree contains a single association, that of datum with key. Wt-tree-type is a weight balanced tree type obtained by calling make-wt-tree-type; the returned tree has this type.

procedure+: alist->wt-tree tree-type alist ¶

Returns a newly allocated weight-balanced tree that contains the same associations as alist. This procedure is equivalent to:

(lambda (type alist)
  (let ((tree (make-wt-tree type)))
    (for-each (lambda (association)
                (wt-tree/add! tree
                              (car association)
                              (cdr association)))
              alist)
    tree))

6.3.2 Basic Operations on Weight-Balanced Trees

This section describes the basic tree operations on weight balanced trees. These operations are the usual tree operations for insertion, deletion and lookup, some predicates and a procedure for determining the number of associations in a tree.

procedure+: wt-tree/empty? wt-tree ¶: Returns #t if wt-tree contains no associations, otherwise returns #f.

procedure+: wt-tree/size wt-tree ¶: Returns the number of associations in wt-tree, an exact non-negative integer. This operation takes constant time.

procedure+: wt-tree/add wt-tree key datum ¶: Returns a new tree containing all the associations in wt-tree and the association of datum with key. If wt-tree already had an association for key, the new association overrides the old. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/add! wt-tree key datum ¶: Associates datum with key in wt-tree and returns an unspecified value. If wt-tree already has an association for key, that association is replaced. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/member? key wt-tree ¶: Returns #t if wt-tree contains an association for key, otherwise returns #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/lookup wt-tree key default ¶: Returns the datum associated with key in wt-tree. If wt-tree doesn’t contain an association for key, default is returned. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/delete wt-tree key ¶: Returns a new tree containing all the associations in wt-tree, except that if wt-tree contains an association for key, it is removed from the result. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

procedure+: wt-tree/delete! wt-tree key ¶: If wt-tree contains an association for key the association is removed. Returns an unspecified value. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in wt-tree.

6.3.3 Advanced Operations on Weight-Balanced Trees

In the following the size of a tree is the number of associations that the tree contains, and a smaller tree contains fewer associations.

procedure+: wt-tree/split< wt-tree bound ¶: Returns a new tree containing all and only the associations in wt-tree which have a key that is less than bound in the ordering relation of the tree type of wt-tree. The average and worst-case times required by this operation are proportional to the logarithm of the size of wt-tree.

procedure+: wt-tree/split> wt-tree bound ¶: Returns a new tree containing all and only the associations in wt-tree which have a key that is greater than bound in the ordering relation of the tree type of wt-tree. The average and worst-case times required by this operation are proportional to the logarithm of size of wt-tree.

procedure+: wt-tree/union wt-tree-1 wt-tree-2 ¶: Returns a new tree containing all the associations from both trees. This operation is asymmetric: when both trees have an association for the same key, the returned tree associates the datum from wt-tree-2 with the key. Thus if the trees are viewed as discrete maps then wt-tree/union computes the map override of wt-tree-1 by wt-tree-2. If the trees are viewed as sets the result is the set union of the arguments. The worst-case time required by this operation is proportional to the sum of the sizes of both trees. If the minimum key of one tree is greater than the maximum key of the other tree then the time required is at worst proportional to the logarithm of the size of the larger tree.

procedure+: wt-tree/intersection wt-tree-1 wt-tree-2 ¶: Returns a new tree containing all and only those associations from wt-tree-1 which have keys appearing as the key of an association in wt-tree-2. Thus the associated data in the result are those from wt-tree-1. If the trees are being used as sets the result is the set intersection of the arguments. As a discrete map operation, wt-tree/intersection computes the domain restriction of wt-tree-1 to (the domain of) wt-tree-2. The time required by this operation is never worse that proportional to the sum of the sizes of the trees.

procedure+: wt-tree/difference wt-tree-1 wt-tree-2 ¶: Returns a new tree containing all and only those associations from wt-tree-1 which have keys that do not appear as the key of an association in wt-tree-2. If the trees are viewed as sets the result is the asymmetric set difference of the arguments. As a discrete map operation, it computes the domain restriction of wt-tree-1 to the complement of (the domain of) wt-tree-2. The time required by this operation is never worse that proportional to the sum of the sizes of the trees.

procedure+: wt-tree/subset? wt-tree-1 wt-tree-2 ¶

Returns #t iff the key of each association in wt-tree-1 is the key of some association in wt-tree-2, otherwise returns #f. Viewed as a set operation, wt-tree/subset? is the improper subset predicate. A proper subset predicate can be constructed:

(define (proper-subset? s1 s2)
  (and (wt-tree/subset? s1 s2)
       (< (wt-tree/size s1) (wt-tree/size s2))))

As a discrete map operation, wt-tree/subset? is the subset test on the domain(s) of the map(s). In the worst-case the time required by this operation is proportional to the size of wt-tree-1.

procedure+: wt-tree/set-equal? wt-tree-1 wt-tree-2 ¶

Returns #t iff for every association in wt-tree-1 there is an association in wt-tree-2 that has the same key, and vice versa.

Viewing the arguments as sets wt-tree/set-equal? is the set equality predicate. As a map operation it determines if two maps are defined on the same domain.

This procedure is equivalent to

(lambda (wt-tree-1 wt-tree-2)
  (and (wt-tree/subset? wt-tree-1 wt-tree-2
       (wt-tree/subset? wt-tree-2 wt-tree-1)))

In the worst-case the time required by this operation is proportional to the size of the smaller tree.

procedure+: wt-tree/fold combiner initial wt-tree ¶

This procedure reduces wt-tree by combining all the associations, using an reverse in-order traversal, so the associations are visited in reverse order. Combiner is a procedure of three arguments: a key, a datum and the accumulated result so far. Provided combiner takes time bounded by a constant, wt-tree/fold takes time proportional to the size of wt-tree.

A sorted association list can be derived simply:

(wt-tree/fold  (lambda (key datum list)
                 (cons (cons key datum) list))
               '()
               wt-tree))

The data in the associations can be summed like this:

(wt-tree/fold  (lambda (key datum sum) (+ sum datum))
               0
               wt-tree)

procedure+: wt-tree/for-each action wt-tree ¶

This procedure traverses the tree in-order, applying action to each association. The associations are processed in increasing order of their keys. Action is a procedure of two arguments which take the key and datum respectively of the association. Provided action takes time bounded by a constant, wt-tree/for-each takes time proportional to in the size of wt-tree. The example prints the tree:

(wt-tree/for-each (lambda (key value)
                    (display (list key value)))
                  wt-tree))

procedure+: wt-tree/union-merge wt-tree-1 wt-tree-2 merge ¶

Returns a new tree containing all the associations from both trees. If both trees have an association for the same key, the datum associated with that key in the result tree is computed by applying the procedure merge to the key, the value from wt-tree-1 and the value from wt-tree-2. Merge is of the form

(lambda (key datum-1 datum-2) ...)

If some key occurs only in one tree, that association will appear in the result tree without being processed by merge, so for this operation to make sense, either merge must have both a right and left identity that correspond to the association being absent in one of the trees, or some guarantee must be made, for example, all the keys in one tree are known to occur in the other.

These are all reasonable procedures for merge

(lambda (key val1 val2) (+ val1 val2))
(lambda (key val1 val2) (append val1 val2))
(lambda (key val1 val2) (wt-tree/union val1 val2))

However, a procedure like

(lambda (key val1 val2) (- val1 val2))

would result in a subtraction of the data for all associations with keys occuring in both trees but associations with keys occuring in only the second tree would be copied, not negated, as is presumably be intent. The programmer might ensure that this never happens.

This procedure has the same time behavior as wt-tree/union but with a slightly worse constant factor. Indeed, wt-tree/union might have been defined like this:

(define (wt-tree/union tree1 tree2)
  (wt-tree/union-merge tree1 tree2
                       (lambda (key val1 val2) val2)))

The merge procedure takes the key as a parameter in case the data are not independent of the key.

6.3.4 Indexing Operations on Weight-Balanced Trees

Weight balanced trees support operations that view the tree as sorted sequence of associations. Elements of the sequence can be accessed by position, and the position of an element in the sequence can be determined, both in logarthmic time.

procedure+: wt-tree/index wt-tree index ¶

procedure+: wt-tree/index-datum wt-tree index ¶

procedure+: wt-tree/index-pair wt-tree index ¶

Returns the 0-based indexth association of wt-tree in the sorted sequence under the tree’s ordering relation on the keys. wt-tree/index returns the indexth key, wt-tree/index-datum returns the datum associated with the indexth key and wt-tree/index-pair returns a new pair (key . datum) which is the cons of the indexth key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

These operations signal an error if the tree is empty, if index<0, or if index is greater than or equal to the number of associations in the tree.

Indexing can be used to find the median and maximum keys in the tree as follows:

median:  (wt-tree/index wt-tree (quotient (wt-tree/size wt-tree) 2))

maximum: (wt-tree/index wt-tree (-1+ (wt-tree/size wt-tree)))

procedure+: wt-tree/rank wt-tree key ¶: Determines the 0-based position of key in the sorted sequence of the keys under the tree’s ordering relation, or #f if the tree has no association with for key. This procedure returns either an exact non-negative integer or #f. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

procedure+: wt-tree/min wt-tree ¶

procedure+: wt-tree/min-datum wt-tree ¶

procedure+: wt-tree/min-pair wt-tree ¶

Returns the association of wt-tree that has the least key under the tree’s ordering relation. wt-tree/min returns the least key, wt-tree/min-datum returns the datum associated with the least key and wt-tree/min-pair returns a new pair (key . datum) which is the cons of the minimum key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

These operations signal an error if the tree is empty. They could be written

(define (wt-tree/min tree)        (wt-tree/index tree 0))
(define (wt-tree/min-datum tree)  (wt-tree/index-datum tree 0))
(define (wt-tree/min-pair tree)   (wt-tree/index-pair tree 0))

procedure+: wt-tree/delete-min wt-tree ¶

Returns a new tree containing all of the associations in wt-tree except the association with the least key under the wt-tree’s ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to

(wt-tree/delete wt-tree (wt-tree/min wt-tree))

procedure+: wt-tree/delete-min! wt-tree ¶

Removes the association with the least key under the wt-tree’s ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to

(wt-tree/delete! wt-tree (wt-tree/min wt-tree))