Scheduling with Implicit Information in Distributed Systems (1998, Berkeley)
Jonathan Ledlie
CS 736
April 19, 2000

One central difficulty with clusters is that the processes of a job which has been spread across a cluster are often not scheduled to run at the same time. Thus, when one process sends a message to another process which is part of the same job, the receiver won't get the message and act on it until the next time it gets the processor. Once it responds, assuming it can do so within its one quantum, the originating process will probably have been swapped out. Two context switches will have occurred where none would have been necessary if the job had been concurrently scheduled.

The obvious solution to this problem is for the schedulers to communicate about when they are planning on running a job and explicitly coordinate their schedules. The difficulty with this is that it becomes exponentially more complicated and uses exponentially more messages as the number of nodes in the cluster grows.

The Implicit Scheduling paper argues that we can approach the concurrent scheduling achieved with explicit messages, and still scale, by having the message layer watch each job's activity and convey to the scheduler which jobs need to be coscheduled. The message layer tells the scheduler how long to wait on a job (how long its quantum should be) based on the round trip times of messages and the context switch times on the remote machine(s). If the process does not receive a message in this time (while it is spinning), it calls a "barrier" and blocks -- ending before its quantum expires and increasing its priority.

My points are:
P1: Even though these priorities may help the scheduler, they are not required: "implicit coscheduling does not need priorities to achieve coordination as originally proposed in the previous simulations."
P2: It does scale much better than explicit scheduling, even when running several competing jobs (section 5.0).
P3: Implicit scheduling still allows local policies to dominate, resulting in good response for interactive jobs.
N: I'm not sure what would happen in a highly heterogeneous environment as far as tracking round trips and context switches. This paper used 16 SPARCs, where this information would be the same on every node, but how would you know how long to spin if the nodes were different?
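
The spin-then-block decision summarized above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the function names are hypothetical, the spin limit is assumed to be the simple sum of the round trip time and the remote context switch time (the summary only says the limit is "based on" these two values), and time is modeled as plain integers in microseconds rather than measured on a real machine.

```python
from typing import Tuple


def spin_limit_us(round_trip_us: int, remote_context_switch_us: int) -> int:
    """Return how long a process should spin while waiting for a reply.

    Assumption: the limit is the sum of the two costs. The summary above
    only states that the limit is based on these two quantities.
    """
    return round_trip_us + remote_context_switch_us


def spin_then_block(reply_arrival_us: int, limit_us: int) -> Tuple[str, int]:
    """Model one wait for a reply.

    Returns the outcome and the time spent spinning:
    - ("spun", t) if the reply arrived within the spin limit, so the
      process kept the processor and handled the reply at time t;
    - ("blocked", limit) if the limit expired first, so the process gave
      up the processor after spinning for the whole limit.
    """
    if reply_arrival_us <= limit_us:
        return ("spun", reply_arrival_us)
    return ("blocked", limit_us)


# Hypothetical numbers, chosen only for illustration.
limit = spin_limit_us(round_trip_us=100, remote_context_switch_us=200)
print(spin_then_block(120, limit))  # reply arrives while spinning
print(spin_then_block(500, limit))  # limit expires, process blocks
```

With these assumed numbers the limit is 300 microseconds: a reply arriving after 120 microseconds is caught while spinning, which suggests the remote process is currently scheduled, while one arriving after 500 microseconds causes the process to block and give up the processor.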