Split keys (wb)

2.1.4 Split keys

Split keys seem to be an accepted method for B-tree organization. In a Blink-tree, where one might potentially have to chain forward to find a desired key, the split key allows one to determine when the search can be terminated. Having the split key at the end of the block saves one block access per search. (If the split key were at the front of the block, one would have to access the next block in the chain to determine that a search could terminate.)

Last-key values

It is useful to define a largest key to be the split key of the last block at any given level; the code is made more complex if the last key is some special value (eg. the null string) that doesn’t follow lexicographic order, not to mention the problems that arise with inserting into empty (last) blocks. We’ve reserved the set of strings starting with ‘0xff’ as split keys only; they cannot be used as real key values. Last-key values are of the form xFF<level>. This makes the last key in level N+1 strictly larger than any in level N, so the end-of-chain key for level N need not be treated specially when it is inserted as a key at level N+1.

Fixed split keys/Independence of levels

We make the split keys at index levels act like those at the leaf level, that is, the split key in a block NEVER CHANGES. The advantage is that inserts and deletes cannot cause split-key changes, avoiding the need for code to propagate these changes, which are a pain to test. It also makes the levels genuinely independent, a useful conceptual simplification.

The disadvantage is that this introduces a dead zone between the last key-value pair of an index block and its split key. This complicates the code slightly, and makes the average search path slightly longer than log(N) (by a constant factor of 1+(1/BF), where BF is the branching factor of the tree). More annoyingly, it interacts with insertions for block splits (see INSERTION).