6.824 2015 Lecture 3: Primary/Backup Replication

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Today

Fault tolerance

Need a failure model: what will we try to cope with?

Core idea: replication

Example: fault-tolerant MapReduce master

Big questions

Two main approaches:

  1. State transfer
  2. Replicated state machine

State transfer is simpler

Replicated state machine can be more efficient

Remus: High Availability via Asynchronous Virtual Machine Replication, NSDI 2008

Very ambitious system

Plan 1 (slow, broken):

Q: Is Plan 1 correct (as described above)?

Q: What will outside world see if primary fails and replica takes over?

Q: How to decide if primary has failed?

Q: How will clients know to talk to backup rather than primary?

Q: What if site-wide power failure?

Q: What if primary fails while sending state to backup?

Q: What if primary gets request, sends checkpoint to backup, and just before replying primary fails?

Q: Is Plan 1 efficient?

Remus epochs, checkpoints

  1. Primary runs for a while in Epoch 1 (E1), holding E1's output
  2. Primary pauses
  3. Primary copies RAM+disk changes from E1 to local buffer
  4. Primary resumes execution in E2, holding E2's output
  5. Primary sends checkpoint of RAM+disk to backup
  6. Backup copies all to separate RAM, then applies, then ACKs
  7. Primary releases E1's output
  8. Backup applies E1's changes to RAM and disk

If primary fails, backup finishes applying last epoch's disk+RAM, then starts executing

Q: Any externally visible anomalies?

Q: What if primary receives + executes a request, crashes before checkpoint? backup won't have seen request!

Q: If primary sends a packet, then crashes, is backup guaranteed to have state changes implied by that packet?

Q: What if primary crashes partway through release of output? must backup re-send? How does it know what to re-send?

Q: How does Remus decide it should switch to backup?

Q: Are there situations in which Remus will incorrectly activate the backup? i.e. primary is actually alive

Q: When primary recovers, how does Remus restore replication? Needed, since eventually active ex-backup will itself fail

Q: What if both fail, e.g. site-wide power failure?

Q: In what situations will Remus likely have good performance?

Q: In what situations will Remus likely have low performance?

Q: Should epochs be short or long?

Remus evaluation

Why so slow?

How could one get better performance for replication?

Primary-backup replication in Lab 2

Outline

"View server" decides who primary p and backup b are

Repair

Key points:

  1. Only one primary at a time!
  2. The primary must have the latest state!

We will work out some rules to ensure these

View server

Example:

    view #, primary, backup
    0:      --       --
    1:      S1       --
    2:      S1       S2
    4:      S2       --
    3:      S2       S3

How to ensure new primary has up-to-date replica of state?

Q: Can more than one server think it is primary?

    1: S1, S2
       net broken, so viewserver thinks S1 dead but it's alive
    2: S2, --
    now S1 alive and not aware of view #2, so S1 still thinks it is primary
    AND S2 alive and thinks it is primary
    => split brain, no good

How to ensure only one server acts as primary?

...even though more than one may think it is primary.

"Acts as" == executes and responds to client requests

The basic idea:

    1: S1 S2
    2: S2 --
    S1 still thinks it is primary
    S1 must forward ops to S2
    S2 thinks S2 is primary
    so S2 must reject S1's forwarded ops

The rules:

  1. Primary in view i must have been primary or backup in view i-1
  2. Primary must wait for backup to accept each request
  3. Non-backup must reject forwarded requests
  4. Non-primary must reject direct client requests
  5. Every operation must be before or after state transfer

Example:

    1: S1, S2
       viewserver stops hearing Pings from S1
    2: S2, --
       it may be a while before S2 hears about view #2

    before S2 hears about view #2
      S1 can process ops from clients, S2 will accept forwarded requests
      S2 will reject ops from clients who have heard about view #2
    after S2 hears about view #2
      if S1 receives client request, it will forward, S2 will reject
        so S1 can no longer act as primary
      S1 will send error to client, client will ask viewserver for new view,
         client will re-send to S2
    the true moment of switch-over occurs when S2 hears about view #2

How can new backup get state?

Rule for state transfer:

Q: Does primary need to forward Get()'s to backup?

Q: How could we make primary-only Get()'s work?

Q: Are there cases when the Lab 2 protocol cannot make forward progress?