Workload Shift

For the fifth simulation experiment, we wanted to see how the algorithms responded to workload shifts. We ran each algorithm using trace-driven churn and capacity for ten hours. Halfway through each run, we changed the query destinations from one moderately skewed set to another (both with $\alpha=1.2$). We recorded statistics throughout the trace. As noted, each algorithm activates after 400 seconds. Each node initiated 10 queries per second on average.

We monitored success rates and VS activity. VS activity captures the amount of state transfer that occurs due to natural and artificial churn. When a node enters or leaves the system, the number of VS actions equals the number of VSs in use. Creating or destroying a VS is also a VS action. Each k-Choices and Threshold relocate is two VS actions; each transfer is one. Conservatively, we did not include Threshold's neighbor-adjustments or Transfer's splits as VS actions.

The results are plotted in Figure 15. We show the success rate on the left y-axis and VS activity on the right y-axis. The results show that active k-Choices sustains $>75\%$ success rates, recovering immediately after the workload shift. Passive k-Choices (not shown) gradually plateaus at about $40\%$. We found that in systems with higher rates of churn, passive reached equilibrium more quickly. As soon as active k-Choices is activated, the success rates dramatically increase. With current tuning, however, active produces an order-of-magnitude more VS activity than passive. After the shift, k-Choices active settles to a slightly lower success rate because queries heading to the new highest ranked spot take slightly more hops per average query: a change from $6.8$ to $7.3$.

Proportion, Transfer, and Threshold all portray greater variance in success rates than k-Choices. Proportion exhibits the greatest average VS activity and has the largest average hop count at $10.5$ hops per successful query. The performance of Threshold steadily declines as its gaps become tightly clustered. That Transfer stabilizes at different levels had two causes. First, a burst of births soon after the shift caused more accurate fingers than average and a burst of deaths at $6$ hours caused the decline because many fingers became invalid. Second, after the shift, the average path to highest ranked destination was fewer hops than before. Although to a lesser extent than Threshold, Transfer's hop count steadily rises as nodes move to arbitrarily compressed locations.

Figure 15: Plot of success rate and VS activity during a workload shift.
\includegraphics{graphs/shift-kchoices-active} \includegraphics{graphs/shift-transfer} \includegraphics{graphs/shift-proportional} \includegraphics{graphs/shift-threshold}

Jonathan Ledlie 2006-01-06