6.824 2015 Lecture 12: Eventual Consistency

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Exam

Bayou: Eventual consistency

Conflicts

Meeting room scheduler

Traditional approach (central server):

    PDA
|-----------------
|9am    824 staff     |----------------|
|--                   |  Server        |
|10am        -------------> | DB   |   |
|--                         | 9am  |   |
|11am                       | 10am |   |
|--                                    |
|12pm                                  |
|--

Not a good approach because it requires everyone to have connectivity to the server.

Would be nice if you have PDA send appointment to laptop, which can then send it to the server.

    PDA
|-----------------
|9am    824 staff     |----------------|
|--                   |  Server        |
|10am                       | DB   |   |
|--                         | 9am  |   | <-----\
|11am                       | 10am |   |        \
|--                                    |         |
|12pm                                  |      laptop
|--      \                                      /
          \----------------------------------->/

Update functions

Main idea: Update functions. Instead of the application saying "write this DB record", the application hands a function that behaves differently based on what's in the DB.

Example:

Bayou takes this function from the PDA and gives it to the laptop.

Suppose A and B want the same times:

If you simply apply these functions to node A's and B's databases, that's not enough:

=> have to execute A's and B's update functions in the same order

Numbering updates

Next idea: number update functions, so that you can view them as being a log

If we take the previous example:

<T=10, nodeId=A>, A wants: either staff meeting at 10  or 11
<T=20, nodeId=B>, B wants: hiring meeting at 10 or 11

We need to be able to roll back and re-execute the log.

Are the updates consistent with causality?

If some 3rd node sees these updates, it would be necessary to have the meeting creation timestamp be smaller than the deletion timestamp.

Lamport logical clock

Each node maintains T_max, the highest timestamp this node has ever seen from itself or from another node.

When a node creates an event and adds it to the log, it picks timestamp T = max (T_max + 1, wall clock time)

Tentative entries, commit scheme

It's annoying that entries in the calendar are always displayed as tentative because another (earlier) update could come in and replace it.

We're looking for a way to all agree that anything above a certain point in the log will never change (it's frozen, no one can modify stuff there)

Bad idea: One possibility is to have all the replicas exchange summary w/ each other about what they've seen:

Commit scheme for Bayou

They have one magic node, a primary. Every update that passes through the primary, the primary stamps it with a commit sequence number (CSN), the actual ordering number becomes: <csn, T, node ID>

If you don't have a CSN: <-, T, nodeID>, all commited operations are considered to occur before uncommitted ones.

TODO: not clear what this example was supposed to show

Vector timestamps

Synchronization

A new node joins

Now some VTs will have an entry for some new node Z. For instance, in the previous example

We also need a way to remove nodes.

But B won't know if Z is newly added or newly deleted?

Forgetting nodes:

Now B needs to figure out from A's updates if Z was added or removed

Case 1: If B's VT entry for X is less than the timestamp in Z's ID, then that means that B hasn't even seen the creation for Z, let alone any updates from Z => B should create the entry for Z because Z is new to B

Case 2: If B's VT entry for X is higher than the timestamp in Z's ID, (ie. B has seen updates from X after it created Z), then B must've seen Z's creation => B must have seen a deletion notice

Q: If Z's entry is missing from B then Z (probably?) says <-, T, Z> bye, T > Tz


6.824 notes

Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System Terry, Theimer, Petersen, Demers, Spreitzer, Hauser, SOSP 95

Some material from Flexible Update Propagation for Weakly Consistent Replication, SOSP 97

Why this paper?

Paper context:

Let's build a conference room scheduler

Traditional approach: one server

Why aren't we satisfied with central server? - I want to use scheduler on disconnected iPhone &c + So need DB replica in each node. + Modify on any node, as well as read. - Periodic connectivity to net. - Periodic direct contact with other calendar users (e.g. bluetooth).

Straw man 1: merge DBs

Idea for conflicts: update functions

Problem: can't just apply update functions to DB replica

Goal: eventual consistency

Idea: ordered update log

How can nodes agree on update order?

Example:

 <10,A>: staff meeting at 10:00 or 11:00
 <20,B>: hiring meeting at 10:00 or 11:00

 what's the correct eventual outcome?
   the result of executing update functions in timestamp order
   staff at 10:00, hiring at 11:00

What DB content before sync?

Now A and B sync with each other

Roll back and replay

Displayed meeting room calendar entries are "tentative"

Will update order be consistent with wall-clock time?

Will update order be consistent with causality?

Lamport logical clocks for causal consistency

Logical clock solves add/delete causality example - When B sees <10,A>, + B will set its Tmax to 10, so + B will generate <11,B> for its delete

Irritating that there could always be a long-delayed update with lower TS

Bad idea: a fully decentralized "commit" scheme

How does Bayou commit updates, so that they are stable?

Will commit order match tentative order?

Will commit order always match tentative order?

Committing allows app to tell users which calendar entries are stable.

Nodes can discard committed updates.

How do I sync if I've discarded part of my log?

How to sync?

How could we cope with a new server Z joining the system?

What happens when Z retires (leaves the system)?

Bayou's retirement plan

How does ID=<Tz,X> scheme help disambiguate new vs forgotten?

Let's step back.

Is eventual consistency a useful idea?

Are update conflicts a real problem?

Is Bayou's complexity warranted?

But there's are good ideas for us to learn from Bayou