6.824 2015 Lecture 8: Harp

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Paper: Replication in the Harp File System

Why are we reading this paper?

How could a 1991 paper still be worth reading?

The paper is a mix of fundamentals and incidentals

I'm going to focus on parts of Harp that aren't already present in Raft.

What does Harp paper explain that Raft paper does not?

Harp authors had not implemented recovery yet

Basic setup is familiar

Why are 2b+1 servers necessary to tolerate b failures?
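The standard argument: operations proceed once a majority (b+1 of 2b+1) has them, so after b failures a majority survives, and any two majorities intersect, ruling out split brain. A small sketch of that counting argument (my own illustration, not from the paper):

```python
from itertools import combinations

def quorums(n):
    """All subsets of n servers of majority size (n // 2 + 1)."""
    majority = n // 2 + 1
    return [set(q) for q in combinations(range(n), majority)]

b = 2                  # failures to tolerate
n = 2 * b + 1          # 5 servers

qs = quorums(n)

# Any two majorities share at least one server, so two independent
# "views" can never both make progress without a common witness.
assert all(q1 & q2 for q1 in qs for q2 in qs)

# After any b servers fail, at least one full majority survives.
for failed in combinations(range(n), b):
    alive = set(range(n)) - set(failed)
    assert any(q <= alive for q in qs)
```

With only 2b servers, two disjoint groups of size b are possible, and neither the intersection nor the availability property holds.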

What are Harp's witnesses?

Does primary need to send operations to witnesses?

Note: somewhat different from Raft

What's the story about the UPS?

What does the UPS buy for Harp?

Larger point, faced by every fault-tolerant system

Let's talk about Harp's log management and operation execution

What is in a typical log record?

Log record stores:

Why does Harp have so many log pointers?

Why the FP-CP gap?

Why the AP-LB gap?

What is the LB?

Why the LB-GLB gap?
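A toy model of the five pointers and the invariant FP >= CP >= AP >= LB >= GLB (my own encoding of the paper's definitions, not Harp's code): FP is the most recently received record, CP the commit point, AP the latest record applied to the in-memory file system, LB the latest record whose effects have reached the local disk, and GLB the latest record on disk at all replicas, below which the log can be truncated.

```python
class HarpLog:
    """Toy sketch of Harp's log pointers; assumes FP>=CP>=AP>=LB>=GLB."""

    def __init__(self):
        self.fp = self.cp = self.ap = self.lb = self.glb = 0
        self.records = {}                 # index -> operation

    def receive(self, op):
        self.fp += 1
        self.records[self.fp] = op

    def commit(self, index):              # record reached a majority
        assert index <= self.fp
        self.cp = max(self.cp, index)

    def apply_committed(self):            # apply in log order
        while self.ap < self.cp:
            self.ap += 1
            # ... apply self.records[self.ap] to the file system ...

    def flushed_locally(self, index):     # effects reached local disk
        assert index <= self.ap
        self.lb = max(self.lb, index)

    def flushed_everywhere(self, index):  # on disk at all replicas
        assert index <= self.lb
        self.glb = max(self.glb, index)
        for i in [k for k in self.records if k <= self.glb]:
            del self.records[i]           # truncate the log

log = HarpLog()
for op in ["create", "append", "append"]:
    log.receive(op)
log.commit(2)
log.apply_committed()
log.flushed_locally(1)
log.flushed_everywhere(1)
assert (log.fp, log.cp, log.ap, log.lb, log.glb) == (3, 2, 2, 1, 1)
```

Each gap between adjacent pointers is work in flight at a different stage: replication, application, and the lazy disk writes.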

When does Harp execute a client operation?

There are two answers!

  1. When operation arrives, primary figures out exactly what should happen.
  2. After the operation commits, primary and backup can apply it to their file systems.

Why does Harp split execution in this way?
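The two answers above can be sketched as two functions (hypothetical names, not Harp's real API): phase 1 runs when the operation arrives at the primary and fixes the outcome without changing anything; phase 2 applies that pre-decided outcome after commit, on primary and backup alike.

```python
def plan_append(fs, path, data):
    """Phase 1: resolve names and fix the outcome; mutate nothing."""
    inode = fs["lookup"][path]
    offset = fs["size"][inode]          # decide exactly where data lands
    return {"op": "append", "inode": inode, "offset": offset, "data": data}

def apply_append(fs, record):
    """Phase 2: apply the pre-decided outcome; runs on any replica."""
    i, off, data = record["inode"], record["offset"], record["data"]
    fs["content"][i] = fs["content"][i][:off] + data
    fs["size"][i] = off + len(data)

fs = {"lookup": {"/f": 7}, "size": {7: 0}, "content": {7: b""}}
rec = plan_append(fs, "/f", b"hello")   # logged and replicated first...
apply_append(fs, rec)                   # ...then applied after commit
assert fs["content"][7] == b"hello"
```

Because the record carries a fully determined outcome, every replica that applies it gets the identical result, regardless of its own current state.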

Append example:

                         /---> picks up the modified inode
                        /
 --------------------------------
      |   Append1   |  Append2  | 
 ---- * -------------------------
     / \        \
      |          \-> new inode stored here
     CP

If the backup crashes after it writes A1 to disk but before replying to the primary, then when it reboots there is no way to tell whether it already applied A1, so it must re-execute it. These log records therefore have to be "repeatable."

Many replication systems have to cope with this problem, and repeatable records are one way to deal with it. It also illustrates that replication is less straightforward than it first appears.

The point: possible re-execution means replication isn't transparent to the service.
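A minimal illustration of why repeatability matters (my own example): a record that carries the decided absolute offset can be replayed any number of times, whereas a naive "append at current end" duplicates data on replay.

```python
def apply_at_offset(buf, record):
    """Repeatable: same result no matter how often it is replayed."""
    off, data = record
    return buf[:off] + data

def append_at_end(buf, data):
    """Not repeatable: each replay grows the file again."""
    return buf + data

rec = (0, b"A1")                       # offset fixed when the op arrived
buf = apply_at_offset(b"", rec)
buf = apply_at_offset(buf, rec)        # crash-and-replay: harmless
assert buf == b"A1"

naive = append_at_end(b"", b"A1")
naive = append_at_end(naive, b"A1")    # replay corrupts the file
assert naive == b"A1A1"
```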

Can Harp primary execute read-only operations w/o replicating to backups?

What state should primary use when executing a read-only operation?

How does failure recovery work?

Setup for the following scenarios:

5 servers: S1 is usually primary, S2+S3 are backups, S4+S5 witnesses

Scenario:

  S1+S2+S3; then S1 crashes
  S2 is primary in new view (and S4 is promoted)

What if S1 crashed just before replying to a client?

After S1 recovers, with intact disk, but lost memory.

New scenario: S2 and S3 are partitioned (but still alive)



Everybody (S1-5) suffers a power failure.

When can Harp form a new view?

  1. No other view possible.
  2. Know view # of most recent view.
  3. Know all ops from most recent view.
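The three conditions above can be encoded as a check over the candidate members (my own hedged sketch, not Harp's actual code): a majority rules out a competing view, the majority overlaps every prior view so the highest view number among members is the most recent one, and some member of that view must carry its complete log.

```python
def can_form_view(members, n_servers, view_numbers, have_full_log):
    """members: candidate set; view_numbers/have_full_log: per-server."""
    majority = n_servers // 2 + 1
    # 1. No other view possible: we hold a majority, so no disjoint
    #    majority can form a competing view.
    if len(members) < majority:
        return False
    # 2. Know the most recent view number: a majority intersects every
    #    previous view, so the max over members finds it.
    latest = max(view_numbers[m] for m in members)
    # 3. Know all ops from that view: some member must have its full log.
    return any(view_numbers[m] == latest and have_full_log[m]
               for m in members)

views = {"S1": 3, "S2": 3, "S4": 2}
logs  = {"S1": True, "S2": True, "S4": False}
assert can_form_view({"S1", "S2", "S4"}, 5, views, logs)
assert not can_form_view({"S4"}, 5, views, logs)   # no majority
```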

Details:

Does Harp have performance benefits?

Why graph x=load y=response-time?

Why does response time go up with load?
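The usual queueing-theory answer (a standard model, not specific to Harp): requests wait behind each other, and in an M/M/1 model with service rate mu and arrival rate lam, mean response time is 1 / (mu - lam), which stays near the bare service time at light load and blows up as load approaches capacity.

```python
def mm1_response_time(lam, mu):
    """Mean response time of an M/M/1 queue (lam < mu)."""
    assert lam < mu, "unstable at or beyond capacity"
    return 1.0 / (mu - lam)

mu = 100.0                              # hypothetical capacity, ops/sec
for lam in [10, 50, 90, 99]:
    print(f"load {lam:>3} ops/s -> {1000 * mm1_response_time(lam, mu):6.1f} ms")

# Light load: roughly one service time. Near capacity: queueing dominates.
assert mm1_response_time(10, mu) < mm1_response_time(99, mu)
```

This is why benchmark graphs plot x=load, y=response-time: the knee of the curve shows how close to capacity the system can run before latency degrades.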