9.5. 9.5 On-line scheduling

The area of scheduling theory has a huge literature. The first result in on-line scheduling belongs to Graham, who analysed the List scheduling algorithm in 1966. We can say that despite of the fact that Graham did not use the terminology which was developed in the area of the on-line algorithms, and he did not consider the algorithm as an on-line algorithm, he analysed it as an approximation algorithm.

From the area of scheduling we only recall the definitions which are used in this chapter.

In a scheduling problem we have to find an optimal schedule of jobs. We consider the parallel machines case, where machines are given, and we can use them to schedule the jobs. In the most fundamental model each job has a known processing time and to schedule the job we have to assign it to a machine, and we have to give its starting time and a completion time, where the difference between the completion time and the starting time is the processing time. No machine may simultaneously run two jobs.

Concerning the machine environment three different models are considered. If the processing time of a job is the same for each machine, then we call the machines identical machines. If each machine has a speed , the jobs have a processing weight and the processing time of job on the -th machine is , then we call the machines related machines. If the processing time of job is given by an arbitrary positive vector , where the processing time of the job on the -th machine is , then we call the machines unrelated machines.

Many objective functions are considered for scheduling problems, but here we consider only such models where the goal is the minimisation of the makespan (the maximal completion time).

In the next subsection we define the two most fundamental on-line scheduling models, and in the following two subsections we consider these models in details.

9.5.1. 9.5.1 On-line scheduling models

Probably the following models are the most fundamental examples of on-line machine scheduling problems.

LIST model

In this model we have a fixed number of machines denoted by , and the jobs arrive from a list. This means that the jobs and their processing times are revealed to the on-line algorithm one by one. When a job is revealed, the on-line algorithm has to assign the job to a machine with a starting time and a completion time irrevocably.

By the load of a machine, we mean the sum of the processing times of all jobs assigned to the machine. Since the objective function is to minimise the maximal completion time, it is enough to consider the schedules where the jobs are scheduled on the machines without idle time. For these schedules the maximal completion time is the load for each machine. Therefore this scheduling problem is reduced to a load balancing problem, i.e. the algorithm has to assign the jobs to the machines minimising the maximum load, which is the makespan in this case.

Example 9.8 Consider the LIST model and two identical machines. Consider the following sequence of jobs where the jobs are given by their processing time: . The on-line algorithm first receives job from the list, and the algorithm has to assign this job to one of the machines. Suppose that the job is assigned to machine . After that the on-line algorithm receives job from the list, and the algorithm has to assign this job to one of the machines. Suppose that the job is assigned to machine . After that the on-line algorithm receives job from the list, and the algorithm has to assign this job to one of the machines. Suppose that the job is assigned to machine . Finally, the on-line algorithm receives job from the list, and the algorithm has to assign this job to one of the machines. Suppose that the job is assigned to machine . Then the loads on the machines are and , and we can give a schedule where the maximal completion times on the machines are the loads: we can schedule the jobs on the first machine in time intervals and , and we can schedule the jobs on the second machine in time intervals and .

TIME model

In this model there are a fixed number of machines again. Each job has a processing time and a release time. A job is revealed to the on-line algorithm at its release time. For each job the on-line algorithm has to choose which machine it will run on and assign a start time. No machine may simultaneously run two jobs. Note that the algorithm is not required to assign a job immediately at its release time. However, if the on-line algorithm assigns a job at time then it cannot use information about jobs released after time and it cannot start the job before time . Our aim is to minimise the makespan.

Example 9.9 Consider the TIME model with two related machines. Let be the first machine with speed , and be the second machine with speed . Consider the following input , where the jobs are given by the (processing time, release time) pairs. Thus a job arrives at time with processing time , and the algorithm can either start to process it on one of the machines or wait for jobs with larger processing time. Suppose that the algorithm waits till time and then it starts to process the job on machine . The completion time of the job is . At time three further jobs arrive, and at that time only can be used. Suppose that the algorithm starts to process job on this machine. At time both jobs are completed. Suppose that the remaining jobs are started on machines and , and the completion times are and , thus the makespan achieved by the algorithm is . Observe that an algorithm which starts the first job immediately at time can make a better schedule with makespan . But it is important to note that in some cases it can be useful to wait for larger jobs before starting a job.

9.5.2. 9.5.2 LIST model

The first algorithm in this model has been developed by Graham. Algorithm LIST assigns each job to the machine where the actual load is minimal. If there are more machines with this property, it uses the machine with the smallest index. This means that the algorithm tries to balance the loads on the machines. The competitive ratio of this algorithm is determined by the following theorem.

Theorem 9.18 The competitive ratio of algorithm LIST is in the case of identical machines.

Proof. First we prove that the algorithm is -competitive. Consider an arbitrary input sequence denoted by , and denote the processing times by . Consider the schedule produced by LIST . Let be a job with maximal completion time. Investigate the starting time of this job. Since LIST chooses the machine with minimal load, thus the load was at least on each of the machines when was scheduled. Therefore we obtain that

This yields that

On the other hand, OPT also processes all of the jobs, thus we obtain that . Furthermore, is scheduled on one of the machines of OPT , thus . Due to these bounds we obtain that

which inequality proves that LIST is -competitive.

Now we prove that the bound is tight. Consider the following input. It contains jobs with processing time and one job with processing time . LIST assigns small jobs to each machine and the last large job is assigned to . Therefore its makespan is . On the other hand, the optimal algorithm schedules the large job on and small jobs on the other machines, and its makespan is . Thus the ratio of the makespans is which shows that the competitive ratio of LIST is at least .

Although it is hard to imagine any other algorithm for the on-line case, many other algorithms have been developed. The competitive ratios of the better algorithms tend to smaller numbers than as the number of machines tends to . Most of these algorithms are based on the following idea. The jobs are scheduled keeping the load uniformly on most of the machines, but in contrast to LIST , the loads are kept low on some of the machines, keeping the possibility of using these machines for scheduling large jobs which may arrive later.

Below we consider the more general cases where the machines are not identical. LIST may perform very badly, and the processing time of a job can be very large on the machine where the actual load is minimal. However, we can easily change the greedy idea of LIST as follows. The extended algorithm is called Greedy and it assigns the job to the machine where the load with the processing time of the job is minimal. If there are several machines which have minimal value, then the algorithm chooses the machine where the processing time of the job is minimal from them, if there are several machines with this property, the algorithm chooses the one with the smallest index from them.

Example 9.10 Consider the case of related machines where there are machines and the speeds are , . Suppose that the input is , where the jobs are defined by their processing weight. The load after the first job is on machine and on the other machines, thus is assigned to . The load after job is on all of the machines, and its processing time is minimal on machine , thus Greedy assigns it to . The load after job is on and , and on , thus the job is assigned to . The load after job is on , on , and on , thus the job is assigned to . Finally, the load after job is on , on and , and on , thus the job is assigned to .

Example 9.11 Consider the case of unrelated machines with two machines and the following input: , where the jobs are defined by the vectors of processing times. The load after job is on and on , thus the job is assigned to . The load after job is on and also on , thus the job is assigned to , because it has smaller processing time. The load after job is on and , thus the job is assigned to because it has smaller processing time. Finally, the load after job is on and on , thus the job is assigned to .

The competitive ratio of the algorithm is determined by the following theorems.

Theorem 9.19 The competitive ratio of algorithm Greedy is in the case of unrelated machines.

Proof. First we prove that the competitive ratio of the algorithm is at least . Consider the following input sequence. Let be an arbitrarily small number. The sequence contains jobs. The processing time of job is on machine , on machine , and on the other machines, (). For job , the processing time is on machine , on machine and on the other machines (, , , if and ).

In this case job is scheduled on by Greedy and the makespan is . On the other hand, the optimal off-line algorithm schedules on and is scheduled on for the other jobs, thus the optimal makespan is . The ratio of the makespans is . This ratio tends to , as tends to , and this proves that the competitive ratio of the algorithm is at least .

Now we prove that the algorithm is -competitive. Consider an arbitrary input sequence, denote the makespan in the optimal off-line schedule by and let denote the maximal load in the schedule produced by Greedy after scheduling the first jobs. Since the processing time of the -th job is at least , and the load is at most on the machines in the off-line optimal schedule, we obtain that .

We prove by induction that the inequality is valid. Since the first job is assigned to the machine where its processing time is minimal, the statement is obviously true for . Let be an arbitrary number and suppose that the statement is true for . Consider the -th job. Let be the machine where the processing time of this job is minimal. If we assign the job to , then we obtain that the load on this machines is at most from the induction hypothesis.

On the other hand, the maximal load in the schedule produced by Greedy can not be more than the maximal load in the case when the job is assigned to , thus , which means that we proved the inequality for .

Therefore we obtained that , which yields that the algorithm is -competitive.

To investigate the case of the related machines consider an arbitrary input. Let and denote the makespans achieved by GREEDY and OPT respectively. The analysis of the algorithm is based on the following lemmas which give bounds on the loads of the machines.

Lemma 9.20 The load on the fastest machine is at least .

Proof. Consider the schedule produced by GREEDY . Consider a job which causes the makespan (its completion time is maximal). If this job is scheduled on the fastest machine, then the lemma follows immediately, i.e. the load on the fastest machine is . Suppose that is not scheduled on the fastest machine. The optimal maximal load is , thus the processing time of on the fastest machine is at most . On the other hand, the completion time of is , thus at the time when the job was scheduled the load was at least on the fastest machine, otherwise Greedy would assign to the fastest machine.

Lemma 9.21 If the loads are at least on all machines having a speed of at least then the loads are at least on all machines having a speed of at least .

Proof. If , then the statement is obviously valid. Suppose that . Consider the jobs which are scheduled by Greedy on the machines having a speed of at least in the time interval . The total processing weight of these jobs is at least times the total speed of the machines having a speed of at least . This yields that there exists a job among them which is assigned by OPT to a machine having a speed of smaller than (otherwise the optimal off-line makespan would be larger than ). Let be such a job.

Since OPT schedules on a machine having a speed of smaller than , thus the processing weight of is at most . This yields that the processing time of is at most on the machines having a speed of at least . On the other hand, Greedy produces a schedule where the completion time of is at least , thus at the time when the job was scheduled the loads were at least on the machines having a speed of at most (otherwise Greedy would assign to one of these machines).

Now we can prove the following statement.

Theorem 9.22 The competitive ratio of algorithm Greedy is in the case of the related machines.

Proof. First we prove that Greedy is -competitive. Consider an arbitrary input. Let and denote the makespans achieved by Greedy and OPT , respectively.

Let be the speed of the fastest machine. Then by Lemma 9.20 the load on this machine is at least . Then using Lemma 9.21 we obtain that the loads are at least on the machines having a speed of at least . Therefore the loads are at least on the machines having a speed of at least . Denote the set of the machines having a speed of at most by .

Denote the sum of the processing weights of the jobs by . OPT can find a schedule of the jobs which has maximal load , and there are at most machines having smaller speed than , thus

On the other hand, Greedy schedules the same jobs, thus the load on some machine not included in is smaller than in the schedule produced by Greedy (otherwise we would obtain that the sum of the processing weights is greater than ).

Therefore we obtain that

which yields that , which proves that Greedy is -competitive.

Now we prove that the competitive ratio of the algorithm is at least . Consider the following set of machines: contains one machine with speed and contains machines with speed . For each , contains machines with speed , and contains machines. Observe that the number of jobs of processing weight which can be scheduled during time unit is the same on the machines of and on the machines of . It is easy to calculate that , if , thus the number of machines is .

Consider the following input sequence. In the first phase jobs arrive having processing weight , in the second phase jobs arrive having processing weight , in the -th phase jobs arrive with processing weight , and the sequence ends with the -th phase, which contains one job with processing weight . An off-line algorithm can schedule the jobs of the -th phase on the machines of set achieving maximal load , thus the optimal off-line cost is at most .

Investigate the behaviour of algorithm Greedy on this input. The jobs of the first phase can be scheduled on the machines of during time unit, and it takes also time unit to process these jobs on the machines of . Thus Greedy schedules these jobs on the machines of , and each load is on these machines after the first phase. Then the jobs of the second phase are scheduled on the machines of , the jobs of the third phase are scheduled on the machines of and so on. Finally, the jobs of the -th and -th phase are scheduled on the machine of set . Thus the cost of Greedy is , (this is the load on the machine of set ). Since , we proved the required statement.

9.5.3. 9.5.3 TIME model

In this model we investigate only one algorithm. The basic idea is to divide the jobs into groups by the release time and to use an optimal off-line algorithm to schedule the jobs of the groups. This algorithm is called interval scheduling algorithm and we denote it by INTV . Let be the release time of the first job, and . The algorithm is defined by the following pseudocode:

INTV( )

  1  
                        WHILE
                      
                     
                        NOT
                      end of sequence   2    let  be the set of the unscheduled jobs released till    3    let  be an optimal off-line schedule of the jobs of    4    schedule the jobs as it is determined by  starting the schedule at    5    let  be the maximal completion time   6    
                        IF
                      a new job is released in time interval  or the sequence is ended   7       
                        THEN
                      
                        7       
                        ELSE
                      let  be the release time of the next job   8     
                  

Example 9.12 Consider two identical machines. Suppose that the sequence of jobs is , where the jobs are defined by the (processing time, release time) pairs. In the first iteration are scheduled: an optimal off-line algorithm schedules on machine and on machine , so the jobs are completed at time . Since no new job have been released in the time interval , the algorithm waits for a new job until time . Then the second iteration starts: and are scheduled on and respectively in the time interval . During this time interval has been released thus at time the next iteration starts and INTV schedules on in the time interval .

The following statement holds for the competitive ratio of algorithm INTV .

Theorem 9.23 In the TIME model algorithm INTV is 2-competitive.

Proof. Consider an arbitrary input and the schedule produced by INTV . Denote the number of iterations by . Let , , and let denote the optimal off-line cost. In this case . This inequality is obvious if . If , then the inequality holds, because also the optimal off-line algorithm has to schedule the jobs which are scheduled in the -th iteration by INTV , and INTV uses an optimal off-line schedule in each iteration. On the other hand, . To prove this inequality first observe that the release time is at least for the jobs scheduled in the -th iteration (otherwise the algorithm would schedule them in the -th iteration).

Therefore also the optimal algorithm has to schedule these jobs after time . On the the other hand, it takes at least time units to process these jobs, because INTV uses optimal off-line algorithm in the iterations. The makespan of the schedule produced by INTV is , and we have shown that , thus we proved that the algorithm is -competitive.

Some other algorithms have also been developed in the TIME model. Vestjens proved that the on-line LPT algorithm is -competitive. This algorithm schedules the longest unscheduled, released job at each time when some machine is available. The following lower bound for the possible competitive ratios of the on-line algorithms is also given by Vestjens.

Theorem 9.24 The competitive ratio of any on-line algorithm is at least in the TIME model for minimising the makespan.

Proof. Let be the solution of the equation which belongs to the interval . We prove that no on-line algorithm can have smaller competitive ratio than . Consider an arbitrary on-line algorithm, denote it by ALG . Investigate the following input sequence.

At time one job arrives with processing time . Let be the time when the algorithm starts to process the job on one of the machines. If , then the sequence is finished and , which proves the statement. So we can suppose that .

The release time of the next job is and its processing time is . Denote its starting time by . If , then we end the sequence with jobs having release time , and processing time . In this case an optimal off-line algorithm schedules the first two jobs on the same machine and the last jobs on the other machines starting them at time , thus its cost is . On the other hand, the on-line algorithm must schedule one of the last jobs after the completion of the first or the second job, thus in this case, which yields that the competitive ratio of the algorithm is at least . Therefore we can suppose that .

At time further jobs arrive with processing times and one job with processing time . The optimal off-line algorithm schedules the second and the last jobs on the same machine, and the other jobs are scheduled one by one on the other machines and the makespan of the schedule is . Since before time none of the last jobs is started by ALG , after this time ALG must schedule at least two jobs on one of the machines and the maximal completion time is at least . Since , the ratio is minimal if , and in this case the ratio is , which proves the theorem.

Exercises

9.5-1 Prove that the competitive ratio is at least for any on-line algorithm in the case of two identical machines.

9.5-2 Prove that LIST is not constant competitive in the case of unrelated machines.

9.5-3 Prove that the modification of INTV which uses a -approximation schedule (a schedule with at most times more cost than the optimal cost) instead of the optimal off-line schedule in each step is -competitive.

  PROBLEMS  

9-1 Paging problem

Consider the special case of the -server problem, where the distance between each pair of points is . (This problem is equivalent with the on-line paging problem.) Analyse the algorithm which serves the requests not having server on their place by the server which was used least recently. (This algorithm is equivalent with the LRU paging algorithm.) Prove that the algorithm is -competitive.

9-2 ALARM2 algorithm

Consider the following alarming algorithm for the data acknowledgement problem. ALARM2 is obtained from the general definition with the values . Prove that the algorithm is not constant-competitive.

9-3 Bin packing lower bound

Prove, that no on-line algorithm can have smaller competitive ratio than using a sequence which contains items of size , , , where is a small positive number.

9-4 Strip packing with modifiable rectangles

Consider the following version of the strip packing problem. In the new model the algorithms are allowed to lengthen the rectangles keeping the area fixed. Develop a -competitive algorithm for the solution of the problem.

9-5 On-line LPT algorithm

Consider the algorithm in the TIME model which starts the longest released job to schedule at each time when a machine is available. This algorithm is called on-line LPT . Prove that the algorithm is -competitive.

  CHAPTER NOTES  

More details about the results on on-line algorithms can be found in the books [ 31 ], [ 73 ].

The first results about the -server problem (Theorems 9.1 and 9.2) are published by Manasse, McGeoch and Sleator in [ 172 ]. The presented algorithm for the line (Theorem 9.3) was developed by Chrobak, Karloff, Payne and Viswanathan (see [ 45 ]). Later Chrobak and Larmore extended the algorithm for trees in [ 46 ]. The first constant-competitive algorithm for the general problem was developed by Fiat, Rabani and Ravid (see [ 72 ]). The best known algorithm is based on the work function technique. The first work function algorithm for the problem was developed by Chrobak and Larmore in [ 47 ]. Koutsoupias and Papadimitriou have proven that the work function algorithm is -competitive in [ 147 ].

The first mathematical model for the data acknowledgement problem and the first results (Theorems 9.5 and 9.6) are presented by Dooly, Goldman, and Scott in [ 64 ]. Online algorithms with lookahead property are presented in [ 124 ]. Albers and Bals considered a different objective function in [ 11 ]. Karlin Kenyon and Randall investigated randomised algorithms for the data acknowledgement problem in [ 139 ]. The Landlord algorithm was developed by Young in [ 280 ]. The detailed description of the results in the area of on-line routing can be found in the survey [ 162 ] written by Leonardi. The exponential algorithm for the load balancing model is investigated by Aspnes, Azar, Fiat, Plotkin and Waarts in [ 13 ]. The exponential algorithm for the throughput objective function is applied by Awerbuch, Azar and Plotkin in [ 16 ].

A detailed survey about the theory of on-line bin packing is written by Csirik and Woeginger (see [ 56 ]). The algorithms NF and FF are analysed with competitive analysis by Johnson, Demers, Ullman, Garey and Graham in [ 133 ], [ 134 ], further results can be found in the PhD thesis of Johnson ([ 132 ]). Our Theorem 9.12 is a special case of Theorem 1 in [ 126 ] and Theorem 9.13 is a special case of Theorems 5.8 and 5.9 of the book [ 48 ] and Corollary 20.13 in the twentieth chapter of this book [ 22 ]. Van Vliet applied the packing patterns to prove lower bounds for the possible competitive ratios in [ 264 ], [ 265 ]. For the on-line strip packing problem algorithm was developed and analysed by Baker and Schwarz in [ 20 ]. Later further shelf packing algorithms were developed, the best shelf packing algorithm for the strip packing problem was developed by Csirik and Woeginger in [ 57 ].

A detailed survey about the results in the area of on-line scheduling was written by Sgall ([ 239 ]). The first on-line result is the analysis of algorithm LIST , it was published by Graham in [ 97 ]. Many further algorithms were developed and analysed for the case of identical machines, the algorithm with smallest competitive ratio (tends to 1.9201 as the number of machines tends to ) was developed by Fleischer and Wahl in [ 76 ]. The lower bound for the competitive ratio of Greedy in the related machines model was proved by Cho and Sahni in [ 43 ]. The upper bound, the related machines case and a more sophisticated exponential function based algorithm were presented by Aspnes, Azar, Fiat, Plotkin and Waarts in [ 13 ]. A summary of further results about the applications of the exponential function technique in the area of on-line scheduling can be found in the paper of Azar ([ 17 ]). The interval algorithm presented in the TIME model and Theorem 9.23 are based on the results of Shmoys, Wein and Williamson (see [ 236 ]). A detailed description of further results (on-line LPT , lower bounds) in the area TIME model can be found in the PhD thesis of Vestjens [ 262 ]. We presented only the most fundamental on-line scheduling models in the chapter, although an interesting model has been developed recently, where the number of the machines is not fixed, and the algorithm is allowed to purchase machines. The model is investigated in papers [ 65 ], [ 66 ], [ 123 ], [ 125 ].

Problem 9-1 is based on [ 244 ], Problem 9-2 is based on [ 64 ], Problem 9-3 is based on [ 278 ], Problem 9-4 is based on [ 122 ] and Problem 9-5 is based on [ 262 ].