Previous: Relational Infrastructure, Up: Database Packages [Contents][Index]

`(require 'wt-tree)`

Balanced binary trees are a useful data structure for maintaining large sets of ordered objects or sets of associations whose keys are ordered. MIT Scheme has an comprehensive implementation of weight-balanced binary trees which has several advantages over the other data structures for large aggregates:

- In addition to the usual element-level operations like insertion, deletion and lookup, there is a full complement of collection-level operations, like set intersection, set union and subset test, all of which are implemented with good orders of growth in time and space. This makes weight balanced trees ideal for rapid prototyping of functionally derived specifications.
- An element in a tree may be indexed by its position under the ordering of the keys, and the ordinal position of an element may be determined, both with reasonable efficiency.
- Operations to find and remove minimum element make weight balanced trees simple to use for priority queues.
- The implementation is
*functional*rather than*imperative*. This means that operations like ‘inserting’ an association in a tree do not destroy the old tree, in much the same way that`(+ 1 x)`

modifies neither the constant 1 nor the value bound to`x`

. The trees are referentially transparent thus the programmer need not worry about copying the trees. Referential transparency allows space efficiency to be achieved by sharing subtrees.

These features make weight-balanced trees suitable for a wide range of applications, especially those that require large numbers of sets or discrete maps. Applications that have a few global databases and/or concentrate on element-level operations like insertion and lookup are probably better off using hash-tables or red-black trees.

The *size* of a tree is the number of associations that it
contains. Weight balanced binary trees are balanced to keep the sizes
of the subtrees of each node within a constant factor of each other.
This ensures logarithmic times for single-path operations (like lookup
and insertion). A weight balanced tree takes space that is proportional
to the number of associations in the tree. For the current
implementation, the constant of proportionality is six words per
association.

Weight balanced trees can be used as an implementation for either
discrete sets or discrete maps (associations). Sets are implemented by
ignoring the datum that is associated with the key. Under this scheme
if an associations exists in the tree this indicates that the key of the
association is a member of the set. Typically a value such as
`()`

, `#t`

or `#f`

is associated with the key.

Many operations can be viewed as computing a result that, depending on
whether the tree arguments are thought of as sets or maps, is known by
two different names. An example is `wt-tree/member?`

, which, when
regarding the tree argument as a set, computes the set membership
operation, but, when regarding the tree as a discrete map,
`wt-tree/member?`

is the predicate testing if the map is defined at
an element in its domain. Most names in this package have been chosen
based on interpreting the trees as sets, hence the name
`wt-tree/member?`

rather than `wt-tree/defined-at?`

.

The weight balanced tree implementation is a run-time-loadable option. To use weight balanced trees, execute

(load-option 'wt-tree)

once before calling any of the procedures defined here.

- Construction of Weight-Balanced Trees
- Basic Operations on Weight-Balanced Trees
- Advanced Operations on Weight-Balanced Trees
- Indexing Operations on Weight-Balanced Trees

Next: Basic Operations on Weight-Balanced Trees, Previous: Weight-Balanced Trees, Up: Weight-Balanced Trees [Contents][Index]

Binary trees require there to be a total order on the keys used to
arrange the elements in the tree. Weight balanced trees are organized
by *types*, where the type is an object encapsulating the ordering
relation. Creating a tree is a two-stage process. First a tree type
must be created from the predicate which gives the ordering. The tree
type is then used for making trees, either empty or singleton trees or
trees from other aggregate structures like association lists. Once
created, a tree ‘knows’ its type and the type is used to test
compatibility between trees in operations taking two trees. Usually a
small number of tree types are created at the beginning of a program and
used many times throughout the program’s execution.

- procedure+:
**make-wt-tree-type***key<?*¶ This procedure creates and returns a new tree type based on the ordering predicate

`key<?`.`Key<?`must be a total ordering, having the property that for all key values`a`

,`b`

and`c`

:(key<? a a) ⇒ #f (and (key<? a b) (key<? b a)) ⇒ #f (if (and (key<? a b) (key<? b c)) (key<? a c) #t) ⇒ #t

Two key values are assumed to be equal if neither is less than the other by

`key<?`.Each call to

`make-wt-tree-type`

returns a distinct value, and trees are only compatible if their tree types are`eq?`

. A consequence is that trees that are intended to be used in binary tree operations must all be created with a tree type originating from the same call to`make-wt-tree-type`

.

- variable+:
**number-wt-type**¶ A standard tree type for trees with numeric keys.

`Number-wt-type`

could have been defined by(define number-wt-type (make-wt-tree-type <))

- variable+:
**string-wt-type**¶ A standard tree type for trees with string keys.

`String-wt-type`

could have been defined by(define string-wt-type (make-wt-tree-type string<?))

- procedure+:
**make-wt-tree***wt-tree-type*¶ This procedure creates and returns a newly allocated weight balanced tree. The tree is empty, i.e. it contains no associations.

`Wt-tree-type`is a weight balanced tree type obtained by calling`make-wt-tree-type`

; the returned tree has this type.

- procedure+:
**singleton-wt-tree***wt-tree-type key datum*¶ This procedure creates and returns a newly allocated weight balanced tree. The tree contains a single association, that of

`datum`with`key`.`Wt-tree-type`is a weight balanced tree type obtained by calling`make-wt-tree-type`

; the returned tree has this type.

- procedure+:
**alist->wt-tree***tree-type alist*¶ Returns a newly allocated weight-balanced tree that contains the same associations as

`alist`. This procedure is equivalent to:(lambda (type alist) (let ((tree (make-wt-tree type))) (for-each (lambda (association) (wt-tree/add! tree (car association) (cdr association))) alist) tree))

Next: Advanced Operations on Weight-Balanced Trees, Previous: Construction of Weight-Balanced Trees, Up: Weight-Balanced Trees [Contents][Index]

This section describes the basic tree operations on weight balanced trees. These operations are the usual tree operations for insertion, deletion and lookup, some predicates and a procedure for determining the number of associations in a tree.

- procedure+:
**wt-tree/empty?***wt-tree*¶ Returns

`#t`

if`wt-tree`contains no associations, otherwise returns`#f`

.

- procedure+:
**wt-tree/size***wt-tree*¶ Returns the number of associations in

`wt-tree`, an exact non-negative integer. This operation takes constant time.

- procedure+:
**wt-tree/add***wt-tree key datum*¶ Returns a new tree containing all the associations in

`wt-tree`and the association of`datum`with`key`. If`wt-tree`already had an association for`key`, the new association overrides the old. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

- procedure+:
**wt-tree/add!***wt-tree key datum*¶ Associates

`datum`with`key`in`wt-tree`and returns an unspecified value. If`wt-tree`already has an association for`key`, that association is replaced. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

- procedure+:
**wt-tree/member?***key wt-tree*¶ Returns

`#t`

if`wt-tree`contains an association for`key`, otherwise returns`#f`

. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

- procedure+:
**wt-tree/lookup***wt-tree key default*¶ Returns the datum associated with

`key`in`wt-tree`. If`wt-tree`doesn’t contain an association for`key`,`default`is returned. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

- procedure+:
**wt-tree/delete***wt-tree key*¶ Returns a new tree containing all the associations in

`wt-tree`, except that if`wt-tree`contains an association for`key`, it is removed from the result. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

- procedure+:
**wt-tree/delete!***wt-tree key*¶ If

`wt-tree`contains an association for`key`the association is removed. Returns an unspecified value. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in`wt-tree`.

Next: Indexing Operations on Weight-Balanced Trees, Previous: Basic Operations on Weight-Balanced Trees, Up: Weight-Balanced Trees [Contents][Index]

In the following the *size* of a tree is the number of associations
that the tree contains, and a *smaller* tree contains fewer
associations.

- procedure+:
**wt-tree/split<***wt-tree bound*¶ Returns a new tree containing all and only the associations in

`wt-tree`which have a key that is less than`bound`in the ordering relation of the tree type of`wt-tree`. The average and worst-case times required by this operation are proportional to the logarithm of the size of`wt-tree`.

- procedure+:
**wt-tree/split>***wt-tree bound*¶ Returns a new tree containing all and only the associations in

`wt-tree`which have a key that is greater than`bound`in the ordering relation of the tree type of`wt-tree`. The average and worst-case times required by this operation are proportional to the logarithm of size of`wt-tree`.

- procedure+:
**wt-tree/union***wt-tree-1 wt-tree-2*¶ Returns a new tree containing all the associations from both trees. This operation is asymmetric: when both trees have an association for the same key, the returned tree associates the datum from

`wt-tree-2`with the key. Thus if the trees are viewed as discrete maps then`wt-tree/union`

computes the map override of`wt-tree-1`by`wt-tree-2`. If the trees are viewed as sets the result is the set union of the arguments. The worst-case time required by this operation is proportional to the sum of the sizes of both trees. If the minimum key of one tree is greater than the maximum key of the other tree then the time required is at worst proportional to the logarithm of the size of the larger tree.

- procedure+:
**wt-tree/intersection***wt-tree-1 wt-tree-2*¶ Returns a new tree containing all and only those associations from

`wt-tree-1`which have keys appearing as the key of an association in`wt-tree-2`. Thus the associated data in the result are those from`wt-tree-1`. If the trees are being used as sets the result is the set intersection of the arguments. As a discrete map operation,`wt-tree/intersection`

computes the domain restriction of`wt-tree-1`to (the domain of)`wt-tree-2`. The time required by this operation is never worse that proportional to the sum of the sizes of the trees.

- procedure+:
**wt-tree/difference***wt-tree-1 wt-tree-2*¶ Returns a new tree containing all and only those associations from

`wt-tree-1`which have keys that*do not*appear as the key of an association in`wt-tree-2`. If the trees are viewed as sets the result is the asymmetric set difference of the arguments. As a discrete map operation, it computes the domain restriction of`wt-tree-1`to the complement of (the domain of)`wt-tree-2`. The time required by this operation is never worse that proportional to the sum of the sizes of the trees.

- procedure+:
**wt-tree/subset?***wt-tree-1 wt-tree-2*¶ Returns

`#t`

iff the key of each association in`wt-tree-1`is the key of some association in`wt-tree-2`, otherwise returns`#f`

. Viewed as a set operation,`wt-tree/subset?`

is the improper subset predicate. A proper subset predicate can be constructed:(define (proper-subset? s1 s2) (and (wt-tree/subset? s1 s2) (< (wt-tree/size s1) (wt-tree/size s2))))

As a discrete map operation,

`wt-tree/subset?`

is the subset test on the domain(s) of the map(s). In the worst-case the time required by this operation is proportional to the size of`wt-tree-1`.

- procedure+:
**wt-tree/set-equal?***wt-tree-1 wt-tree-2*¶ Returns

`#t`

iff for every association in`wt-tree-1`there is an association in`wt-tree-2`that has the same key, and*vice versa*.Viewing the arguments as sets

`wt-tree/set-equal?`

is the set equality predicate. As a map operation it determines if two maps are defined on the same domain.This procedure is equivalent to

(lambda (wt-tree-1 wt-tree-2) (and (wt-tree/subset? wt-tree-1 wt-tree-2 (wt-tree/subset? wt-tree-2 wt-tree-1)))

In the worst-case the time required by this operation is proportional to the size of the smaller tree.

- procedure+:
**wt-tree/fold***combiner initial wt-tree*¶ This procedure reduces

`wt-tree`by combining all the associations, using an reverse in-order traversal, so the associations are visited in reverse order.`Combiner`is a procedure of three arguments: a key, a datum and the accumulated result so far. Provided`combiner`takes time bounded by a constant,`wt-tree/fold`

takes time proportional to the size of`wt-tree`.A sorted association list can be derived simply:

(wt-tree/fold (lambda (key datum list) (cons (cons key datum) list)) '()

`wt-tree`))The data in the associations can be summed like this:

(wt-tree/fold (lambda (key datum sum) (+ sum datum)) 0

`wt-tree`)

- procedure+:
**wt-tree/for-each***action wt-tree*¶ This procedure traverses the tree in-order, applying

`action`to each association. The associations are processed in increasing order of their keys.`Action`is a procedure of two arguments which take the key and datum respectively of the association. Provided`action`takes time bounded by a constant,`wt-tree/for-each`

takes time proportional to in the size of`wt-tree`. The example prints the tree:(wt-tree/for-each (lambda (key value) (display (list key value)))

`wt-tree`))

- procedure+:
**wt-tree/union-merge***wt-tree-1 wt-tree-2 merge*¶ Returns a new tree containing all the associations from both trees. If both trees have an association for the same key, the datum associated with that key in the result tree is computed by applying the procedure

`merge`to the key, the value from`wt-tree-1`and the value from`wt-tree-2`.`Merge`is of the form(lambda (

`key``datum-1``datum-2`) ...)If some key occurs only in one tree, that association will appear in the result tree without being processed by

`merge`, so for this operation to make sense, either`merge`must have both a right and left identity that correspond to the association being absent in one of the trees, or some guarantee must be made, for example, all the keys in one tree are known to occur in the other.These are all reasonable procedures for

`merge`(lambda (key val1 val2) (+ val1 val2)) (lambda (key val1 val2) (append val1 val2)) (lambda (key val1 val2) (wt-tree/union val1 val2))

However, a procedure like

(lambda (key val1 val2) (- val1 val2))

would result in a subtraction of the data for all associations with keys occuring in both trees but associations with keys occuring in only the second tree would be copied, not negated, as is presumably be intent. The programmer might ensure that this never happens.

This procedure has the same time behavior as

`wt-tree/union`

but with a slightly worse constant factor. Indeed,`wt-tree/union`

might have been defined like this:(define (wt-tree/union tree1 tree2) (wt-tree/union-merge tree1 tree2 (lambda (key val1 val2) val2)))

The `merge` procedure takes the `key` as a parameter in case the
data are not independent of the key.

Previous: Advanced Operations on Weight-Balanced Trees, Up: Weight-Balanced Trees [Contents][Index]

Weight balanced trees support operations that view the tree as sorted sequence of associations. Elements of the sequence can be accessed by position, and the position of an element in the sequence can be determined, both in logarthmic time.

- procedure+:
**wt-tree/index***wt-tree index*¶ - procedure+:
**wt-tree/index-datum***wt-tree index*¶ - procedure+:
**wt-tree/index-pair***wt-tree index*¶ Returns the 0-based

`index`th association of`wt-tree`in the sorted sequence under the tree’s ordering relation on the keys.`wt-tree/index`

returns the`index`th key,`wt-tree/index-datum`

returns the datum associated with the`index`th key and`wt-tree/index-pair`

returns a new pair`(`

which is the`key`.`datum`)`cons`

of the`index`th key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.These operations signal an error if the tree is empty, if

`index``<0`

, or if`index`is greater than or equal to the number of associations in the tree.Indexing can be used to find the median and maximum keys in the tree as follows:

median: (wt-tree/indexwt-tree(quotient (wt-tree/sizewt-tree) 2)) maximum: (wt-tree/indexwt-tree(-1+ (wt-tree/sizewt-tree)))

- procedure+:
**wt-tree/rank***wt-tree key*¶ Determines the 0-based position of

`key`in the sorted sequence of the keys under the tree’s ordering relation, or`#f`

if the tree has no association with for`key`. This procedure returns either an exact non-negative integer or`#f`

. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.

- procedure+:
**wt-tree/min***wt-tree*¶ - procedure+:
**wt-tree/min-datum***wt-tree*¶ - procedure+:
**wt-tree/min-pair***wt-tree*¶ Returns the association of

`wt-tree`that has the least key under the tree’s ordering relation.`wt-tree/min`

returns the least key,`wt-tree/min-datum`

returns the datum associated with the least key and`wt-tree/min-pair`

returns a new pair`(key . datum)`

which is the`cons`

of the minimum key and its datum. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree.These operations signal an error if the tree is empty. They could be written

(define (wt-tree/min tree) (wt-tree/index tree 0)) (define (wt-tree/min-datum tree) (wt-tree/index-datum tree 0)) (define (wt-tree/min-pair tree) (wt-tree/index-pair tree 0))

- procedure+:
**wt-tree/delete-min***wt-tree*¶ Returns a new tree containing all of the associations in

`wt-tree`except the association with the least key under the`wt-tree`’s ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to(wt-tree/delete

`wt-tree`(wt-tree/min`wt-tree`))

- procedure+:
**wt-tree/delete-min!***wt-tree*¶ Removes the association with the least key under the

`wt-tree`’s ordering relation. An error is signalled if the tree is empty. The average and worst-case times required by this operation are proportional to the logarithm of the number of associations in the tree. This operation is equivalent to(wt-tree/delete!

`wt-tree`(wt-tree/min`wt-tree`))

Previous: Relational Infrastructure, Up: Database Packages [Contents][Index]