Questions for 6.S897 lecture 9 (Parallel Processing 1, 10/08). Email your answers to 6.s897staff@gmail.com.

1) Why do you think MapReduce writes map task outputs to files instead of pushing them directly to reduce tasks?

2) Apart from the ability to persist intermediate results, are there other differences between the Spark programming model and MapReduce that affect performance?

3) MapReduce, Spark, and Spark Streaming all handle recovery from faults or stragglers by re-executing parts of the computation. Are there applications where this approach would not work, or would not be the most efficient way to recover from a failure?
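As a refresher before answering, the map/shuffle/reduce structure the questions refer to can be sketched in plain Python. This is only a single-process simulation, not the framework itself: the `map_phase`, `shuffle`, and `reduce_phase` helpers are hypothetical stand-ins for the map tasks, the shuffle (where real MapReduce materializes map output to local files), and the reduce tasks.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between
    # the map and reduce phases (in real MapReduce this reads the map
    # outputs that were written to files).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the grouped counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

Here `counts` maps each word to its total across both documents; a Spark version of the same job would chain `flatMap` and `reduceByKey` on an RDD, which is a useful starting point for question 2.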