6.824 2015 Lecture 1: Introduction

Note: These lecture notes were slightly modified from the ones posted on the 6.824 course website from Spring 2015.

Distributed systems

What is a distributed system?

Why distribute?

...but:

Why take this course?

Course structure

See the course website.

Course components

Main topics

Example:

Architecture

Implementation

Performance

Fault tolerance

Consistency

Labs

Focus: fault tolerance and consistency -- central to distributed systems.

What you'll learn from the labs:

Test cases simulate failure scenarios:

We've tried to ensure that the hard problems have to do with distributed systems:

Lab 1: MapReduce

Computational model

Example: wc
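
A minimal sketch of word count in the MapReduce style, to make the computational model concrete. It is not the lab's code: the names KeyValue, wcMap, wcReduce, and runSequential are assumptions made up for this illustration, and everything runs in one process with no files or workers.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a hypothetical intermediate pair type for this sketch.
type KeyValue struct {
	Key   string
	Value string
}

// wcMap emits (word, "1") for every word in one input split.
func wcMap(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	out := make([]KeyValue, 0, len(words))
	for _, w := range words {
		out = append(out, KeyValue{Key: w, Value: "1"})
	}
	return out
}

// wcReduce sums the counts collected for a single word.
func wcReduce(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // skip malformed counts in this sketch
		}
		total += n
	}
	return strconv.Itoa(total)
}

// runSequential applies mapF to each split, groups the intermediate
// values by key, and calls reduceF once per distinct key.
func runSequential(splits []string,
	mapF func(string) []KeyValue,
	reduceF func(string, []string) string) map[string]string {
	groups := make(map[string][]string)
	for _, s := range splits {
		for _, kv := range mapF(s) {
			groups[kv.Key] = append(groups[kv.Key], kv.Value)
		}
	}
	result := make(map[string]string, len(groups))
	for k, vs := range groups {
		result[k] = reduceF(k, vs)
	}
	return result
}

func main() {
	counts := runSequential([]string{"a b a", "b c"}, wcMap, wcReduce)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Printf("%s %s\n", k, counts[k])
	}
}
```

In a distributed run, the grouping step inside runSequential is what becomes the shuffle between map workers and reduce workers; the map and reduce functions themselves would stay the same.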

Example: grep
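
A minimal sketch of grep in the same style, assuming a plain substring match rather than a regular expression. The names grepMap and grepReduce are hypothetical. Map does all the filtering; reduce has nothing to combine and simply passes the matching lines through.

```go
package main

import (
	"fmt"
	"strings"
)

// grepMap returns the lines of one input split that contain pattern.
func grepMap(pattern, contents string) []string {
	var matches []string
	for _, line := range strings.Split(contents, "\n") {
		if strings.Contains(line, pattern) {
			matches = append(matches, line)
		}
	}
	return matches
}

// grepReduce is effectively the identity: it concatenates the
// per-split results without changing them.
func grepReduce(perSplit [][]string) []string {
	var out []string
	for _, m := range perSplit {
		out = append(out, m...)
	}
	return out
}

func main() {
	splits := []string{"foo\nbar", "baz foo\nqux"}
	var perSplit [][]string
	for _, s := range splits {
		// Each call is independent, so in a real system the calls
		// could run on different workers in parallel.
		perSplit = append(perSplit, grepMap("foo", s))
	}
	for _, line := range grepReduce(perSplit) {
		fmt.Println(line)
	}
}
```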

Performance

Fault tolerance model

What kinds of faults might we want to tolerate?

Tools for dealing with faults?

Retry jobs

Lab 1 code

The lab 1 app (see main/wc.go):

The lab 1 sequential implementation (see mapreduce/mapreduce.go):

The lab 1 worker (see mapreduce/worker.go):

The lab 1 master (see mapreduce/master.go):