Next: , Previous: , Up: B-tree Structure and Access   [Contents][Index]

2.1.6 Deletion

Deleting can be divided into two parts: the block to be deleted must be unlinked from both parent and predecessor, after which the block can be reclaimed.

Unlinking the block

At the moment, we’ve decided to simplify the delete operation by requiring that both the parent and the previous block be available for modification, else the deletion fails (is deferred). This makes block deletion an atomic change, which avoids several problems introduced by permitting partial deletions (see CONCURRENCY).

Alternatives: [WANG] gives a clever way to do deletion w/o PREV, by swapping blocks. This seemed too hard to make bulletproof for concurrency so we stuck with the PREV method, which after all only does extra block accesses 1/BF of the time.

Crashes during DELETE

What about the order for writing the parent and predecessor blocks in a delete? We elect to write out the PARENT block first, so that if the system should crash the chain will still be intact. The block that we were trying to delete will still be in the chain, but it cannot be used, it is "dead." Deferred Index Updates and concurrency discusses how to deal with "dead" blocks.

Reclaiming the block

One major change from [SAGIV] is that SAGIV assumed multiple copies of a block could exist, which makes reclaiming deleted blocks complex (he suggests a timeout-based protocol). We opted instead to track how many pointers to each block are extant by adding a lock for that purpose, the NAME lock (essentially, a DELETE lock). A routine must get NAME lock on a pointer BEFORE releasing READ/WRITE on the block from which the pointer is obtained. (SAGIV’s method is almost certainly more efficient in the sense that our method incurs overhead on every operation as opposed to just on the problem case of a READ request during a WRITE; several empirical studies of such tradeoffs support this conclusion.) On the other hand, NAME lock is useful for other things, such as insuring that the block you are PREVing from can’t be deleted while you’re looking for its parent or predecessor …

Non-symmetry of INSERT and DELETE

It is worth remembering that INSERT and DELETE are not symmetric, in the sense that a postponed insertion is NOT equivalent to deleting a (KEY,PTR) pair. The latter operation leaves a block whose pointer is missing unaccessible via the index, while the former leaves the block accessible through the NEXT pointer chain.

This asymmetry has been the death of more proposals for fixing various problems involving concurrency than I care to recall!


  1. Start with blk p with split key k at level n;
    • the index (level n+1) contains: …k,p, …
  2. split p at k’, adding block p’;
    • the index should now contain …k’,p,k,p’, …
  3. Now neither deleting p nor p’ yields the original index contents:
    • If we delete p, the result is …k,p’, …
    • If we delete p’, the result is …k’,p, …

Therefore, it is not possible for a subsequent delete to "cancel" an insert; the deletes must either wait for the relevant inserts to complete or else do special work to maintain correctness.

Next: Non-delete of last block in the chain, Previous: Insertion method, Up: B-tree Structure and Access   [Contents][Index]