6.824 2015 Lecture 15: Optimism, Causality, Vector Timestamps

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Consistency so far:

Sequential and release consistency are slow:

Paxos:

Optimistic Concurrency Control

A simple example -- optimistic peer-to-peer chat

Diagram:

m0              m1              m2 
\             /\              /\
 \------------/               /
  \                          /
   \------------------------/

Do we care about message ordering for chat?

What went wrong in this example?

Suppose this is an auction chat program:

Joe         Fred        Alice

$10 -->
            20
          <-- -->  

                 <-- winner is $20

If there were a 4th person, Sam:

Joe         Fred        Alice               Sam

$10 -->                                   sees $10
            20  
          <-- -->                         does not see $20 

                 <-- winner is $20 -->    sees winner is $20

So to Sam this might not make sense. His problem is that Sam didn't know what Alice knew when she sent her message.

Definition: x causally precedes y

Definition: causal consistency

Pros, cons:

Slow implementation of causal consistency

History sets will grow huge -- can we abbreviate?

Vector Timestamp

VT comparisons

Many systems use VT variants, but for somewhat different purposes

CBCAST -- "causal broadcast" protocol

[diagram: node, msg buf, VC, chat app]

    APP         ^
         |      |
    -----|------|-----------
        \ /     |  CBCAST
         .   
    ---------      vector
    | m3    |      clock
    ---------      VT 
    | wait  |
    ---------
    | m1    |

Code:

on receive(message m, node i, timestamp v):
    release when:
        this node's vector clock VT >= v EXCEPT FOR v[i] = VT[i] + 1

Example:

    All VCs start <0,0,0>
    M0 sends msg1 w/ <1,0,0>
    M1 receives msg1 w/ <1,0,0>
    M1 sends msg2 w/ <1,1,0>
    M2 receives msg2 w/ <1,1,0> -- must delay because don't have msg1
    M2 receives msg1 w/ <1,0,0> -- can process, unblocks other msg

Why fast?

Causal consistency still allows more surprises than sequential

TreadMarks uses VTs to order writes to same variable by different machines:

  M0: a1 x=1 r1    a2 y=9 r2
  M1:              a1 x=2 r1
  M2:                           a1 a2 z=x+y r2 r1

  Could M2 hear x=2 from M1, then x=1 from M0?
  How does M2 know what to do?

VTs are often used for optimistic updating of replicated data

File synchronization -- e.g. Ficus

Scenario:

Constraint: No Lost Updates

Example 1:

  Focus on a single file

  H1: f=1 \----------\
  H2:      \->  f=2   \               /--> ???
  H3:                  \-> tell H2 --/

  What is the right thing to do?
  Is it enough to simply take file with latest modification time?
  Yes in this case, as long as you carry them along correctly.
    I.e. H3 remembers mtime assigned by H1, not mtime of sync.

Example 2:

   mtime = 10 | mtime = 20 | mtime = 25

  H1: f=1 --\       f=2              /-->
  H2:        \-->             f=0 --/
  H3: 

  H2's mtime will be bigger.

  Should the file synchronizer use "0" and discard "2"?
    No! They were conflicting changes. We need to detect this case.
    Modification times are not enough by themselves

What if there were concurrent updates?

How to decide if one version contains all of another's updates?

We can use VTs to compress these histories!

VTs for Example 1:

VTs for Example 2:

What if there are conflicting updates?

How to think about VTs for file synchronization?

What about file deletion?

How to delete the VTs of deleted files?

Is it enough to wait until all hosts have seen the delete msg?

"Wait until everyone has seen delete" doesn't work:

Diagram:

                 | Phase 1              | Phase 2               | Phase 3 (forget f's VT)
H1: del f  \     | seen f  -\->         | done f  -\->          |
H2:         \--> | seen f  -/-> (bcast) | done f  -/-> (bcast)  |
H3:         |--> | seen f  -\->         | done f  -\->          |

Working VT GC scheme from Ficus replicated file system

A classic problem with VTs:

Many file synchronizers don't use VTs -- e.g. Unison, rsync

Summary