6.824 2015 Lecture 5: Paxos

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Intro

Starting a new group of lectures on stronger fault tolerance

Paxos

Links:

Recall: RSM

Lab 2 critique

We would like a general-purpose ordering scheme with:

Paxos will be a key building block for this.

What does Paxos provide? How does it work?

How to build a system using Paxos?

  1. Primary/Backup like Lab 2, but use Paxos to replicate the ViewServer
  2. Lab 3: no ViewServer, all replicas use Paxos instead of primary/backup

Replicating either the ViewServer or K/V server with Paxos is similar.

Will look at a sketch of how to do a Paxos-based K/V server.

The basic idea:

Example:

Example:

Q: Why a log?

Q: What about agreement -- we need all replicas to have same op in each log slot

Agreement is hard (1):

Agreement is hard (2):

Two main ideas in Paxos:

  1. Many rounds may be required but they will converge on one value
  2. A majority is required for agreement -- prevent "split brain"

Lab 3B K/V server creates a separate Paxos instance for each client Put, Get (much harder)

Paxos sketch

Broken strawman: can we do Paxos in a single round?

Basic Paxos exchange

     proposer          acceptors

           prepare(n) ->
        <- prepare_ok(n, n_a, v_a)

           accept(n, v') ->
        <- accept_ok(n)

           decided(v') ->

Why n?

Definition: server S accepts n/v

Definition: n/v is chosen - a majority of servers accepted n/v

The crucial property:

So:

Now the protocol -- see the handout

    proposer(v):
      choose n, unique and higher than any n seen so far
      send prepare(n) to all servers including self
      if prepare_ok(n, n_a, v_a) from majority:
        if any v_a returned: v' = v_a with highest n_a; else v' = own v
        send accept(n, v') to all
        if accept_ok(n) from majority:
          send decided(n, v') to all

    acceptor state:
      must persist across reboots
      n_p (highest prepare seen)
      n_a, v_a (highest accept seen)

    acceptor's prepare(n) handler:
      if n > n_p
        n_p = n
        reply prepare_ok(n, n_a, v_a)
      else
        reply prepare_reject

    acceptor's accept(n, v) handler:
      if n >= n_p
        n_p = n
        n_a = n
        v_a = v
        reply accept_ok(n)
      else
        reply accept_reject

Example 1 (normal operation):

    `S1`, `S2`, `S3` 
    [ but `S3` is dead or slow ]

    `S1`: -> starts proposal w/ n=1 v=A
    `S1`: <- p1   <- a1vA    <- dA
    `S2`: <- p1   <- a1vA    <- dA
    `S3`: dead...

    "p1" means Sx receives prepare(n=1)
    "a1vA" means Sx receives accept(n=1, v=A)
    "dA" means Sx receives decided(v=A)

These diagrams are not specific about who the proposer is

Note Paxos only requires a majority of the servers

What would happen if the network partitioned?

The homework question

How does Paxos ensure that the following sequence of events can't happen? What actually happens, and which value is ultimately chosen?

  proposer 1 crashes after sending two accept() requests
  proposer 2 has a different value in mind

  A: p1 a1foo
  B: p1       p2 a2bar
  C: p1 a1foo p2 a2bar

  C's prepare_ok to B really included "foo"
    thus a2foo, and so no problem

The point:

Example 2 (concurrent proposers):

    A1 starts proposing n=10 by sending prepare(n=10) 
    A1 sends out just one accept v=10
    [ => A1 must have received prepare_ok from majority ]
    A3 starts proposing n=11
      but A1 does not receive its proposal
      [ => A1 never replies to A3 with prepare_ok(n=11, n_a=10, v=10) because it never got the prepare ]
      A3 only has to wait for prepare_ok replies from a majority

    A1: p10 a10v10 
    A2: p10        p11
    A3: p10        p11  a11v11

    A1 and A3 have accepted different values!

What will happen?

What if A3 (2nd proposer) were to crash at this point (and not restart)?

How about this:

    A1: p10  a10v10               p12
    A2: p10          p11  a11v11
    A3: p10          p11          p12   a12v10

Has the system agreed to a value at this point?

What's the commit point?

Why does the proposer need to pick v_a with highest n_a?

    A1: p10  a10vA               p12
    A2: p10          p11  a11vB  
    A3: p10          p11  a11vB  p12   a12v??

    n=11 already agreed on vB
    n=12 sees both vA and vB, but must choose vB

Two cases:

  1. There was a majority before n=11
  2. There was not a majority before n=11

Why does prepare handler check that n > n_p?

Why does accept handler check n >= n_p?

Scenario:

    A1: p1 p2 a1vA
    A2: p1 p2 a1vA a2vB
    A3: p1 p2      a2vB

Why does accept handler update n_p = n?

Scenario:

    A1: p1    a2vB a1vA p3 a3vA
    A2: p1 p2           p3 a3vA
    A3:    p2 a2vB

What if new proposer chooses n < old proposer?

What if an acceptor crashes after receiving accept?

    A1: p1  a1v1
    A2: p1  a1v1 reboot  p2  a2v?
    A3: p1               p2  a2v?

A2 must remember v_a/n_a across the reboot (on disk!)
  A2 might be the only intersection with the new proposer's majority
  and thus the only evidence that v1 was already agreed on

What if an acceptor reboots after sending prepare_ok?

Example:

    `S1`: p10            a10v10
    `S2`: p10 p11 reboot a10v10 a11v11
    `S3`:     p11               a11v11

Can Paxos get stuck?