

International Series in Operations Research & Management Science

Xiaoqiang Q. Cai · Xianyi Wu · Xian Zhou

Optimal Stochastic Scheduling


International Series in Operations Research & Management Science

Volume 207

Series Editor: Frederick S. Hillier
Stanford University, CA, USA

For further volumes: http://www.springer.com/series/6161


Xiaoqiang Q. Cai • Xianyi Wu • Xian Zhou

Optimal Stochastic Scheduling



Xiaoqiang Q. Cai
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong
Shatin, N.T., Hong Kong SAR

Xian Zhou
Department of Applied Finance and Actuarial Studies
Macquarie University
North Ryde, NSW, Australia

Xianyi Wu
Department of Statistics and Actuarial Science
East China Normal University
Shanghai, People's Republic of China

ISSN 0884-8289    ISSN 2214-7934 (electronic)
ISBN 978-1-4899-7404-4    ISBN 978-1-4899-7405-1 (eBook)
DOI 10.1007/978-1-4899-7405-1
Springer New York Heidelberg Dordrecht London

Library of Congress Control Number: 2014930759

© Springer Science+Business Media New York 2014
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Preface

Machine scheduling concerns how to optimally allocate limited resources (the machines available) to process jobs over time. It is a decision-making process that plays a crucial role in many environments, including manufacturing, logistics, healthcare, communications, and computing systems. In some industries, such as transportation, scheduling is the mission-critical decision that directly determines the effectiveness and even the survival of a business. There have been examples indicating how a good scheduling solution can enable an organization to significantly enhance its efficiency, or a system (e.g., an airline) to quickly recover from a major disruption.

Scheduling is a discipline that has been extensively studied for several decades, with various models established and results derived. However, while there is a large literature on scheduling problems, the majority of the research has been devoted to deterministic scheduling, in which all attributes of the problem, such as the amount of time required to process a job and the deadline to complete it, are assumed to be exactly known in advance without any uncertainty. Clearly such an assumption is hardly justifiable in practical situations, where such parameters are more often not known in advance and can only be estimated with a varying level of uncertainty. In addition, the majority of scheduling problems studied in the literature assume that the machine used to process the jobs is continuously available until all jobs are completed. In reality, however, it is common for a machine to break down randomly from time to time.

As Albert Einstein said: “As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality.” Research interest has increasingly turned to stochastic scheduling in recent years, which incorporates the approaches of probability and stochastic processes into scheduling problems to account for uncertainties from different sources. Many interesting and important results on stochastic scheduling problems have been developed with the aid of probability theory. The main purpose of this book is to provide a comprehensive and unified coverage of studies in this area. Our objective is two-fold: (i) to summarize the elementary models and results in stochastic scheduling, so as to offer an entry-level reading material for students to learn and understand the fundamentals of this area; and (ii) to cover in detail the latest developments and research topics in stochastic scheduling, so as to provide a useful reference for researchers and practitioners who are performing research and development work in this area.

Accordingly, the materials of this book are organized into two clusters: Chaps. 1–4 cover the more fundamental models and results, whereas Chaps. 5–10 elaborate on more advanced topics. Specifically, in Chap. 1, we first provide the relevant basic theory of probability, and then introduce the basic concepts and notation of stochastic scheduling. In Chaps. 2 and 3, we review the well-established models and scheduling policies, under regular and irregular performance measures, respectively. Chapter 4 describes models with stochastic machine breakdowns. Chapters 5 and 6 introduce, respectively, optimal stopping problems and multi-armed bandit processes, which are necessary for the study of more advanced subjects. Chapter 7 focuses on dynamic policies. Chapter 8 describes stochastic scheduling with incomplete information, where the probability distributions of random variables also contain unknown parameters, which can, however, be estimated progressively from updated information. Chapter 9 is devoted to the situation where the processing time of a job depends on the time when it is started. Lastly, in Chap. 10 we describe several recent models beyond those in Chaps. 1–9.

This book is intended for researchers, practitioners, graduate students, and senior-year undergraduates as a unified reference and textbook on optimal stochastic scheduling. While the various topics are presented within the general framework of stochastic scheduling, we try to make each chapter relatively self-contained so that they can be read separately. Also, for each model presented, apart from the formulation of the model and the descriptions of the relevant properties and scheduling policies, we try to provide as much discussion as possible on open questions and likely directions for further research.

The publication of this book would not have been possible without the help and generous support of many people and organizations. First, we would like to express our sincere gratitude to Prof. Fred Hillier, the Editor of Springer's book series in Operations Research and Management Science, for his encouragement and in particular his patience in waiting for the completion of our manuscript. We are indebted to the publishers and staff members of Springer for their support in completing this book project. Many of our colleagues and students have kindly provided us with invaluable comments and suggestions on various occasions such as seminars and conferences. Parts of our research that comprise several chapters in this book have been financially supported by the Research Grants Council of Hong Kong under General Research Fund Nos. 410509 and 410211, Natural Science Foundation of China (NSFC) Grant Nos. 71071056 and 71371074, and Australian Research Council Discovery Project Grant No. DP1094153.

Last but not least, we must express our most sincere gratitude to our families, for their continued and selfless support over the many days and nights during our writing of this book.

Hong Kong    Xiaoqiang Q. Cai
Shanghai     Xianyi Wu
NSW          Xian Zhou


Contents

1 Basic Concepts
   1.1 Fundamentals of Probability
      1.1.1 Probability Space
      1.1.2 Random Variables
      1.1.3 Family of Distributions
   1.2 Stochastic Orders
      1.2.1 Definitions of Stochastic Orders
      1.2.2 Relations Between Stochastic Orders
      1.2.3 Existence of Stochastic Orders
   1.3 Model Description
      1.3.1 Job Characteristics
      1.3.2 Machine Environments
      1.3.3 Scheduling Policies
      1.3.4 Performance Measures
   1.4 Notation

2 Regular Performance Measures
   2.1 Total Completion Time Cost
      2.1.1 Single Machine
      2.1.2 Parallel Machines
   2.2 Makespan
   2.3 Regular Costs with Due Dates
      2.3.1 Weighted Number of Tardy Jobs
      2.3.2 Total Weighted Tardiness
   2.4 General Regular Costs
      2.4.1 Total Expected Cost
      2.4.2 Maximum Expected Cost
   2.5 Exponential Processing Times
      2.5.1 Optimal Sequence for General Costs
      2.5.2 Optimal Sequences with Due Dates
      2.5.3 Examples of Applications
   2.6 Compound-Type Distributions
      2.6.1 Classes of Compound-Type Distributions
      2.6.2 Optimal Sequences for Total Expected Costs
      2.6.3 Optimal Sequences with Due Dates

3 Irregular Performance Measures
   3.1 Earliness/Tardiness Penalties
      3.1.1 Normal Processing Times
      3.1.2 Exponential Processing Times
   3.2 Expected Cost of Earliness and Tardy Jobs
      3.2.1 Single Machine Scheduling
      3.2.2 Parallel Machine Scheduling
   3.3 Completion Time Variance
      3.3.1 The Weighted Variance Problem
      3.3.2 Structural Property of Optimal Sequence
      3.3.3 Algorithm
   Appendix

4 Stochastic Machine Breakdowns
   4.1 Formulation of Breakdown Processes
      4.1.1 Machine Breakdown Processes
      4.1.2 Processing Time and Achievement
   4.2 No-Loss (Preemptive-Resume) Model
      4.2.1 Completion Time
      4.2.2 Minimizing Regular Cost Functions
      4.2.3 Minimizing Irregular Costs
   4.3 Total-Loss (Preemptive-Repeat) Model
      4.3.1 Expected Occupying Time
      4.3.2 Minimizing the Expected Weighted Flowtime
      4.3.3 Maximizing the Expected Discounted Reward
   4.4 Partial-Loss Breakdown Models

5 Optimal Stopping Problems
   5.1 Preliminaries
      5.1.1 σ-Algebras and Monotone Class Theorems
      5.1.2 σ-Algebras vs Linear Spaces of Measurable Functions
      5.1.3 Probability Spaces
      5.1.4 Conditional Expectations
      5.1.5 Uniform Integrability
      5.1.6 Essential Supremum
   5.2 Stochastic Processes
      5.2.1 Information Filtrations
      5.2.2 Stochastic Processes as Stochastic Functions of Time
   5.3 Stopping Times
   5.4 Martingales
      5.4.1 Definitions
      5.4.2 Doob's Stopping Theorem
      5.4.3 Upcrossings
      5.4.4 Maxima Inequalities
      5.4.5 Martingale Convergence Theorems
      5.4.6 Regularity of Paths
   5.5 Optimal Stopping Problems

6 Multi-Armed Bandit Processes
   6.1 Closed Multi-Armed Bandit Processes in Discrete Time
      6.1.1 Model and Solution
      6.1.2 Single-Armed Process
      6.1.3 Proof of Theorem 6.1
   6.2 Open Bandit Processes
      6.2.1 Formulation and Solution
      6.2.2 Proof of Theorem 6.2
   6.3 Generalized Open Bandit Problems
      6.3.1 Nash's Generalized Bandit Problem
      6.3.2 Extension of Nash's Model
   6.4 Closed Multi-Armed Bandit Processes in Continuous Time
      6.4.1 Problem Formulation and Its Solution
      6.4.2 An Account for Deteriorating Bandits

7 Dynamic Policies
   7.1 Dynamic Policies and Information
   7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models
      7.2.1 Total-Loss Breakdown Model
      7.2.2 Optimal Policies with Independent Processing Times
      7.2.3 Optimal Policies with Identical Processing Times
   7.3 Restricted Dynamic Policies for No-Loss Breakdown Models
   7.4 Partial-Loss Breakdown Models
      7.4.1 The Semi-Markov Model for Job Processing
      7.4.2 Integral Equations for Gittins Indices
      7.4.3 Optimal Policies via Gittins Indices
      7.4.4 Specific Partial-Loss Breakdown Models
   7.5 Unrestricted Policies for a Parallel Machine Model
      7.5.1 Optimality Equation
      7.5.2 SEPT Policies
      7.5.3 LEPT Policies
   Appendix
   7.6 Bibliographical Comments

8 Stochastic Scheduling with Incomplete Information
   8.1 Modelling and Probabilistic Characteristics
      8.1.1 Formulation and Assumptions
      8.1.2 Repetition Frequency and Occupying Time
      8.1.3 Impact of Incomplete Information on Static Policies
   8.2 Optimal Restricted Dynamic Policies
   8.3 Posterior Gittins Indices with One-Step Reward Rates
      8.3.1 Posterior Gittins Indices by One-Step Reward Rates
      8.3.2 Incomplete Information for Processing Times

9 Optimal Policies in Time-Varying Scheduling
   9.1 Stochastic Scheduling with Deteriorating Processing Times
      9.1.1 Model Formulation
      9.1.2 Processibility
      9.1.3 The Characteristics of Occupying Time
      9.1.4 Optimal Policies
   9.2 Stochastic Model with Learning Effects
      9.2.1 Optimal Policies with Learning Effects
      9.2.2 Consideration of Unreliable Machines

10 More Stochastic Scheduling Models
   10.1 Optimization Under Stochastic Order
      10.1.1 Basic Problem
      10.1.2 Stochastic Minimization of Maximum Lateness
      10.1.3 Optimal Solutions with Exponential Processing Times and Due Dates
   10.2 Team-Work Task Scheduling
      10.2.1 Team-Work Tasks
      10.2.2 The Deterministic Model
      10.2.3 The Stochastic Model
   10.3 Scheduling of Perishable Products
      10.3.1 Perishable Products
      10.3.2 The Base Model
      10.3.3 Waiting Decision on a Finished Product
      10.3.4 Decisions on Unfinished Products
      10.3.5 Accounting for Random Market Demand

References

Index


Chapter 1
Basic Concepts

This chapter introduces and summarizes basic concepts and terminology in probability theory and stochastic scheduling, which build the foundation to develop optimal policies for a wide range of scheduling problems in subsequent chapters. In Sect. 1.1, we summarize the fundamental theory of probability in a compact and concise way. Section 1.2 discusses several versions of stochastic orders for comparing random variables, which are essential for the optimality criteria in scheduling problems. Section 1.3 describes basic concepts of stochastic scheduling models, including job characteristics, machine environments, scheduling policies, performance measures and optimality criteria. The notation used throughout the book is summarized in Sect. 1.4.

1.1 Fundamentals of Probability

To deal with extensive uncertainty in stochastic scheduling, we apply the theory and methods of probability to quantify the level of uncertainty. This section introduces the basic concepts and fundamental theory of probability.

1.1.1 Probability Space

Sample Space

A set Ω that includes all elements of interest is referred to as a space. If Ω consists of all possible outcomes of an experiment with uncertain result, it is referred to as a sample space. Typical examples of sample spaces include Ω = {1, 2, 3, 4, 5, 6} for tossing a die, and Ω = [0, ∞) for recording a random time. An element in Ω is denoted by ω, which represents a particular outcome if Ω is a sample space.

A set Ω is said to be countable if its elements can be listed as a sequence:

Ω = {ω_1, ω_2, . . .}

Otherwise Ω is uncountable. In particular, any finite set Ω is countable. Any interval [a, b] with a < b is uncountable.

A sample space Ω is said to be discrete if it is countable; or continuous if it is an interval or a product of intervals, such as:

• Ω = R = (−∞, ∞), Ω = [0, ∞), Ω = [0, 1],

• Ω = R^n = {(x_1, . . . , x_n) : x_1, . . . , x_n ∈ R}, Ω = [0, ∞)² = {(x, y) : x, y ≥ 0}, or

• Ω = [0, ∞) × [0, 1] = {(x, y) : x ≥ 0, 0 ≤ y ≤ 1}, etc.

σ-Algebra

Given a space Ω, a collection F of subsets of Ω is said to be a σ-algebra or σ-field on Ω if it satisfies the following three axioms:

(i) The empty set ∅ ∈ F;

(ii) If a subset E ∈ F, then its complement E^c ∈ F;

(iii) If subsets E_i ∈ F, i = 1, 2, . . ., then ⋃_{i=1}^∞ E_i ∈ F.

These axioms imply:

(iv) The sample space Ω ∈ F;

(v) If E ∈ F and F ∈ F, then E ∪ F ∈ F, E ∩ F ∈ F and E − F = E ∩ F^c ∈ F;

(vi) If subsets E_i ∈ F, i = 1, 2, . . ., then ⋂_{i=1}^∞ E_i ∈ F.

In summary, a σ-algebra is a “self-contained” collection of subsets of Ω under countable set operations of unions, intersections and complements.

Given a collection G of subsets E ⊂ Ω, the smallest σ-algebra F such that G ⊂ F is called the σ-algebra generated by G and denoted by F = σ(G).

For example, the smallest σ-algebra on Ω is σ(∅) = {∅, Ω}, and the σ-algebra generated by a single-event collection {E} with E ≠ ∅ is σ({E}) = {∅, E, E^c, Ω}. The largest σ-algebra on Ω is the collection of all subsets of Ω, denoted by 2^Ω.
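The closure required by axioms (i)–(iii) can be seen concretely on a small finite space. The following is a minimal sketch (not from the book; the function name and the example collection are ours) that closes a collection of subsets of a finite Ω under complements and unions, reproducing σ({E}) = {∅, E, E^c, Ω}.

```python
def generated_sigma_algebra(omega, collection):
    """Close a collection of subsets of a finite sample space `omega`
    under complements and unions (finite unions suffice here)."""
    sets = {frozenset(), frozenset(omega)} | {frozenset(s) for s in collection}
    changed = True
    while changed:
        changed = False
        for s in list(sets):
            c = frozenset(omega) - s            # complement, axiom (ii)
            if c not in sets:
                sets.add(c); changed = True
        for a in list(sets):
            for b in list(sets):
                u = a | b                       # union, axiom (iii)
                if u not in sets:
                    sets.add(u); changed = True
    return sets

# Omega for a die roll, G = {{1, 2}}: the result is {∅, {1,2}, {3,4,5,6}, Ω},
# matching σ({E}) = {∅, E, E^c, Ω}.
print(sorted(map(sorted, generated_sigma_algebra({1, 2, 3, 4, 5, 6}, [{1, 2}]))))
```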


If Ω is a discrete space, then every subset E of Ω can be expressed as a countable union of single-outcome sets:

E = ⋃_{ω∈E} {ω}

As a result, if a σ-algebra F on Ω includes all single-point sets {ω} for ω ∈ Ω, then F = 2^Ω.

Borel Field and Sets

If Ω is an interval, then the σ-algebra B generated by all subintervals of Ω is called the Borel algebra or Borel field on Ω. The Borel field can be generated by

B = σ({(a, b] : a, b ∈ Ω}),

where (a, b] can be equivalently replaced by [a, b], (a, b) or [a, b), provided a, b ∈ Ω. The Borel field is among the most useful σ-algebras.

Clearly, the Borel field B on an interval Ω includes all subintervals of Ω as well as their countable unions, intersections and complements, but it does not include all subsets of Ω; in other words, B ≠ 2^Ω.

The Borel field can also be defined on a multidimensional space such as

Ω = R^n = {(x_1, . . . , x_n) : x_1, . . . , x_n ∈ R}

The Borel field on R^n is defined by

B = σ({∏_{i=1}^n (a_i, b_i] : a_i, b_i ∈ R})

Again, the intervals (a_i, b_i] can be replaced by [a_i, b_i], (a_i, b_i) or [a_i, b_i). Similarly, we can define the Borel field on [0, ∞)^n, [0, 1]^n, and so on.

Any B ∈ B on R^n is called a Borel set, or said to be Borel measurable, on R^n.

Events

Given a sample space Ω and a σ-algebra F on Ω, every E ∈ F is called an event and said to be F-measurable. Following the axioms of σ-algebra, every countable union, intersection or complement of events is also an event.

For practical purposes, a meaningful σ-algebra F of events should include all single-outcome sets {ω}, ω ∈ Ω. Therefore, for a discrete sample space Ω, F is taken to be the collection of all subsets of Ω, i.e., F = 2^Ω.

If Ω is a continuous sample space, then the σ-algebra generated by single-point sets is too small, whereas 2^Ω is too big, to be useful for practical purposes. The Borel field B is usually sufficient and commonly adopted to define events on a continuous sample space.

An event E occurs if the outcome ω of the experiment belongs to E. The indicator I_E of an event E is defined by I_E = 1 if E occurs; and I_E = 0 if not.

Lebesgue Measure

Given a space Ω and a σ-algebra F on Ω, a set function m(·) defined on F is said to be a measure if it satisfies the following two axioms:

(i) m(E) ≥ 0 for any E ∈ F;

(ii) If E_1, E_2, · · · ∈ F are disjoint or mutually exclusive in the sense that E_i ∩ E_j = ∅ for i ≠ j, where ∅ represents the empty set, then

m(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ m(E_i)

A measure defined on the Borel field is called a Borel measure. Among the most useful Borel measures is the well-known Lebesgue measure, which assigns measure m([a, b]) = b − a to an interval [a, b] in R and

m(∏_{i=1}^n [a_i, b_i]) = ∏_{i=1}^n m([a_i, b_i]) = ∏_{i=1}^n (b_i − a_i)

to a product of intervals on R^n, where [a, b] can be replaced by (a, b], [a, b) or (a, b) without affecting its measure.

A Borel set B on R^n is said to have zero measure if it has Lebesgue measure m(B) = 0. In particular, for any a = (a_1, . . . , a_n) ∈ R^n,

m({a}) = m(∏_{i=1}^n [a_i, a_i]) = ∏_{i=1}^n (a_i − a_i) = 0

Thus every single point set {a} ⊂ R^n has zero measure, and consequently, every countable subset of R^n has zero measure.

If a property holds on a Borel set B ⊂ R^n and m(B^c) = 0, we say that the property holds almost everywhere on R^n.


Measurable Functions

A real-valued function g(x) defined on R^n is said to be measurable if

g^{−1}(A) = {x : g(x) ∈ A}

is a Borel set on R^n for every Borel set A on R.

A real-valued function g(x) = g(x_1, . . . , x_n) defined on R^n is said to be Riemann integrable if the integral

∫_{R^n} g(x_1, . . . , x_n) dx_1 · · · dx_n

is well-defined in the usual sense of calculus, and this integral is referred to as the Riemann integral. It is well known in measure theory that a function g(x) on R^n is Riemann integrable if and only if it is continuous almost everywhere on R^n.

An almost everywhere continuous function g(x) is measurable. Consequently, all analytic functions, continuous functions, piecewise continuous functions, as well as Riemann integrable functions are measurable. In fact, all functions of practical interest are measurable.

Define and denote the indicator function of a set A ⊂ R^n by

I_A = I_A(x) = I{x ∈ A} = 1 if x ∈ A; 0 if x ∉ A.

Then I_A(x) is a measurable function for every Borel set A.

Lebesgue Integral

The Lebesgue integral is defined for measurable functions based on the Lebesgue measure m(·). For an indicator function I_A(x) of a Borel set A on R^n, its Lebesgue integral is denoted and defined by

∫ I_A(x) dm(x) = ∫_{x∈R^n} I_A(x) dm(x) = ∫_{x∈A} dm(x) = m(A)

For any nonnegative measurable function g(x), there exist functions of the form

g_n(x) = ∑_{i=1}^{k(n)} α_i(n) I_{A_i(n)}(x), n = 1, 2, . . .

such that 0 ≤ g_1(x) ≤ g_2(x) ≤ · · · ≤ g(x) and lim_{n→∞} g_n(x) = g(x), where A_i(n) are Borel sets for all i and n and α_i(n) are real numbers. The Lebesgue integral of g(x) is then defined by

∫ g(x) dm(x) = lim_{n→∞} ∫ g_n(x) dm(x) = lim_{n→∞} ∑_{i=1}^{k(n)} α_i(n) m(A_i(n))

The existence of the limit is guaranteed by the Lebesgue monotone convergence theorem.

A measurable function g(x) is said to be Lebesgue integrable if

∫ |g(x)| dm(x) = ∫ g^+(x) dm(x) + ∫ g^−(x) dm(x) < ∞

where g^+(x) = max{g(x), 0} and g^−(x) = max{−g(x), 0}, both are nonnegative and measurable.

The Lebesgue integral of a Lebesgue integrable function g(x) is defined by

∫ g(x) dm(x) = ∫ g^+(x) dm(x) − ∫ g^−(x) dm(x)

Furthermore, the Lebesgue integral of a Lebesgue integrable function g(x) over a Borel set A is denoted and defined by

∫_A g(x) dm(x) = ∫ I_A(x) g(x) dm(x)

The Lebesgue integral is defined for a much wider range of functions than the Riemann integral. If g(x) is Riemann integrable, then the Lebesgue integral of g(x) coincides with its Riemann integral. From now on we will denote the Lebesgue integral of g(x) over a Borel set A ⊂ R^n by the traditional integral notation for the Riemann integral in calculus:

∫_A g(x) dm(x) = ∫_{x∈A} g(x) dx = ∫_A g(x_1, . . . , x_n) dx_1 · · · dx_n

Throughout the rest of the book, the integral in the above notation will be in the sense of the Lebesgue integral.
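As a numerical illustration of the simple-function construction above, the sketch below (our own, not from the book) approximates the Lebesgue integral of g(x) = x² on [0, 1]: g is rounded down to multiples of 2^−n, and each level α_i is weighted by an estimate of the Lebesgue measure of its level set A_i(n).

```python
import numpy as np

def lebesgue_integral_simple(g, grid, n):
    """Approximate the Lebesgue integral of g over [0, 1] with the simple
    function g_n obtained by rounding g down to multiples of 2**-n, i.e.
    the sum over levels alpha_i of alpha_i * m(A_i(n))."""
    dx = grid[1] - grid[0]
    levels = np.floor(g(grid) * 2**n) / 2**n          # g_n(x) <= g(x)
    return sum(a * np.sum(levels == a) * dx for a in np.unique(levels))

grid = np.linspace(0.0, 1.0, 100_001)
for n in (2, 4, 8):
    print(n, lebesgue_integral_simple(lambda x: x**2, grid, n))
# The values increase toward 1/3, the common Riemann/Lebesgue value,
# consistent with the monotone convergence used in the definition.
```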

Probability Space

Given a sample space Ω and a σ-algebra F of events on Ω, a set function Pr(·) on F is said to be a probability measure, or simply probability, if it satisfies the following three axioms:

(i) Pr(Ω) = 1;

(ii) Pr(E) ≥ 0 for any E ∈ F;

(iii) If E_1, E_2, · · · ∈ F are disjoint or mutually exclusive in the sense that E_i ∩ E_j = ∅ for i ≠ j, then

Pr(⋃_{i=1}^∞ E_i) = ∑_{i=1}^∞ Pr(E_i)

In other words, Pr(·) is a measure on F with Pr(Ω) = 1.

From Axioms (i)–(iii) we can further derive:

(iv) Pr(∅) = 0;

(v) Pr(E^c) = 1 − Pr(E) for any E ∈ F;

(vi) Pr(E) ≤ 1 for any E ∈ F;

(vii) If E ∈ F and F ∈ F, then

Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F)

(viii) If E ∈ F, F ∈ F and E ⊂ F, then Pr(E) ≤ Pr(F) and

Pr(F − E) = Pr(F ∩ E^c) = Pr(F) − Pr(E)

Furthermore, let {E_n} be a sequence of events. We write

• E_n ↓ E if E_1 ⊃ E_2 ⊃ · · · and ⋂_{n=1}^∞ E_n = E;

• E_n ↑ E if E_1 ⊂ E_2 ⊂ · · · and ⋃_{n=1}^∞ E_n = E.

Then Axioms (i)–(iii) also imply

(ix) lim_{n→∞} Pr(E_n) = Pr(E) if either E_n ↓ E or E_n ↑ E.

The triplet (Ω, F, Pr) is referred to as a probability space. The probability theory ensures the existence of a probability space to meet all our needs, and we can assume to work on a common probability space throughout the book.
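For a quick sanity check of the derived properties, one can work on a small finite probability space. The sketch below (illustrative only) takes Ω = {1, . . . , 6} with the uniform probability of a fair die and verifies property (vii).

```python
from fractions import Fraction

# A finite probability space for one fair die roll, used to check property
# (vii): Pr(E ∪ F) = Pr(E) + Pr(F) − Pr(E ∩ F). Purely illustrative.
omega = {1, 2, 3, 4, 5, 6}
pr = lambda event: Fraction(len(event), len(omega))   # uniform measure

E = {2, 4, 6}          # "even"
F = {4, 5, 6}          # "at least 4"
print(pr(E | F), pr(E) + pr(F) - pr(E & F))           # both equal 2/3
```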

Conditional Probability

Given a probability space (Ω, F, Pr), let E ∈ F and F ∈ F. If Pr(F) > 0, the conditional probability of E given F is denoted and defined by

Pr(E|F) = Pr(E ∩ F) / Pr(F)    (1.1)

In particular, if E ⊂ F, then

Pr(E|F) = Pr(E) / Pr(F)    (1.2)

If Pr(F) = 0, Pr(E|F) can be obtained by

Pr(E|F) = lim_{n→∞} Pr(E|F_n) = lim_{n→∞} Pr(E ∩ F_n) / Pr(F_n)    (1.3)

where {F_n} is a sequence of events such that F_n ↓ F and Pr(F_n) > 0 for n = 1, 2, . . ., provided the limit exists uniquely.

By the definition of Pr(E|F) in (1.1), we have

Pr(E ∩ F) = Pr(E|F) Pr(F)    (1.4)

and

Pr(E) = Pr(E|F) Pr(F) if E ⊂ F    (1.5)

regardless of whether Pr(F) > 0 or Pr(F) = 0.

More generally, if F_1, F_2, . . . , F_n are disjoint events such that ⋃_{i=1}^n F_i = Ω, then

Pr(E) = ∑_{i=1}^n Pr(E|F_i) Pr(F_i)    (1.6)

This formula shows how Pr(E) can be calculated by conditioning on F_1, F_2, . . . , F_n, and is useful in situations where Pr(E|F_i) and Pr(F_i) are known.
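A small numerical illustration of (1.6), with made-up numbers in the spirit of scheduling: a job is assigned to one of two machines (a partition F_1, F_2 of Ω), and the chance of the event E that it finishes late depends on the machine.

```python
# Law of total probability (1.6), with illustrative numbers:
# a job is processed on machine 1 or 2 (chosen with prob. 0.6 / 0.4),
# and the chance it finishes late differs by machine.
p_machine = {1: 0.6, 2: 0.4}          # Pr(F_i), a partition of Omega
p_late_given = {1: 0.10, 2: 0.25}     # Pr(E | F_i)

p_late = sum(p_late_given[i] * p_machine[i] for i in p_machine)
print(p_late)   # 0.10*0.6 + 0.25*0.4 = 0.16
```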

Independent Events

Two events E and F on a probability space (Ω, F, Pr) are said to be independent if

Pr(E ∩ F) = Pr(E) Pr(F)    (1.7)

which implies Pr(E|F) = Pr(E) and Pr(F|E) = Pr(F).

More generally, n events E_1, . . . , E_n on (Ω, F, Pr) are said to be independent or mutually independent if

Pr(⋂_{i∈J} E_i) = ∏_{i∈J} Pr(E_i)    (1.8)

for any non-empty J ⊂ {1, 2, . . . , n}. For example, E_1, E_2, E_3 are independent if

Pr(E_1 ∩ E_2) = Pr(E_1) Pr(E_2), Pr(E_1 ∩ E_3) = Pr(E_1) Pr(E_3),
Pr(E_2 ∩ E_3) = Pr(E_2) Pr(E_3) and Pr(E_1 ∩ E_2 ∩ E_3) = Pr(E_1) Pr(E_2) Pr(E_3)

1.1.2 Random Variables

Given a probability space (Ω, F, Pr), a real-valued function X = X(ω) defined on the sample space Ω is said to be a random variable if

{X ∈ A} = {ω : X(ω) ∈ A} ∈ F for any A ∈ B    (1.9)

In other words, {X ∈ A} is an event for any Borel set A ⊂ R = (−∞, ∞). The set of all possible values of a random variable X is called the state space of X.

Throughout the rest of the book, we assume that the underlying probability space (Ω, F, Pr) is sufficiently large that any real-valued variable X of interest is a random variable. This can be achieved by generating F from {{X ∈ A} : A ∈ B} for all X of interest.

If X is a random variable and g(x) is a measurable function on R, then g^{−1}(A) is a Borel set for any Borel set A, so that

{g(X) ∈ A} = {X ∈ g^{−1}(A)} ∈ F

This shows that g(X) is a random variable. Therefore, a measurable function of a random variable is also a random variable.

Distribution Functions

Given a random variable X, the function

F(x) = Pr(X ≤ x) = Pr(X ∈ (−∞, x]), x ∈ R,

is called the cumulative distribution function (cdf) of X.

Since (−∞, x] is a Borel set for any x ∈ R, the cdf F(x) is well defined for any random variable X. Every cdf F(x) is a nondecreasing and right-continuous function with left limit

F(a−) = lim_{x↑a} F(x) at every point a ∈ R

If F(a−) < F(a), then a is called a mass point of X or F(x), and F(a) − F(a−) is the mass at a. A cdf F(x) has at most countably many mass points. In other words, a random variable X can have at most countably many points a such that

Pr(X = a) = F(a) − F(a−) > 0

Let M = {a : Pr(X = a) > 0} denote the set of all mass points of X.

A random variable X is said to be discrete if

∑_{a∈M} Pr(X = a) = 1

The state space S of a discrete random variable X is identical to its set of mass points. The function

f(x) = Pr(X = x) = F(x) − F(x−)

is called the probability mass function (pmf) of a discrete X. Since {x} = [x, x] is a Borel set for any x ∈ R, the pmf f(x) is well-defined with f(x) > 0 for x ∈ S = M and f(x) = 0 for x ∉ S.

A cdf F(x) is said to be absolutely continuous if there is a measurable function f(x) ≥ 0 defined on R such that

F(x) = ∫_{−∞}^x f(y) dy = ∫_{y≤x} f(y) dy, x ∈ R    (1.10)

Since F(x) ≤ 1, the function f(x) that satisfies (1.10) must be Lebesgue integrable. The representation of F(x) in (1.10) implies that F(x) has derivative F′(x) = f(x) almost everywhere.

A random variable X is said to be continuous if it has an absolutely continuous cdf F(x). The function f(x) that satisfies (1.10) is called the probability density function (pdf), or simply the density, of X.

A random variable X with cdf F(x) is said to be mixed if there exists a measurable function f(x) ≥ 0 on R and a countable set M ⊂ R such that

F(x) = ∫_{−∞}^x f(y) dy + ∑_{a∈M, a≤x} f(a)    (1.11)

and f(x) = F(x) − F(x−) > 0 for x ∈ M. The function f(x) in (1.11) is called the probability function (pf) of X. Under (1.11), F(x) also has derivative F′(x) = f(x) almost everywhere.

Clearly, a mixed random variable X reduces to a discrete random variable if its pf f(x) = 0 almost everywhere, or a continuous random variable if M is empty. The pf of X coincides with the pmf if X is discrete, or the pdf if X is continuous.

Remark 1.1. The pf defined in (1.11) is more general than the pmf and pdf, and more flexible for modelling a random variable and determining a cdf. Not every cdf, however, can be expressed by (1.11). Nevertheless, for practical purposes it is generally sufficient to consider random variables and cdfs with a pf that satisfies (1.11).
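To make the mixed case concrete, the sketch below (our own, with illustrative parameters) evaluates a cdf of the form (1.11): an exponential density carrying weight 0.7 plus a single mass of size 0.3 at the point 2, so F jumps by 0.3 there.

```python
import numpy as np

# A mixed cdf per (1.11), with illustrative numbers: with probability 0.3 the
# value is exactly 2.0 (a mass point), otherwise it is exponential with rate 1.
def F(x, p=0.3, a=2.0, rate=1.0):
    cont = (1 - p) * (1 - np.exp(-rate * np.maximum(x, 0.0)))  # integral of the density part
    mass = p * (x >= a)                                        # jump of size p at the mass point
    return cont + mass

print(F(1.99), F(2.0))    # the cdf jumps by 0.3 at x = 2
```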

Stieltjes Integral

Given a cdf F(x) with pf f(x) and set M of mass points, the Stieltjes integral of a Lebesgue integrable function g(x) with respect to F(x) on R is denoted and can be calculated by

∫_{x∈R} g(x) dF(x) = ∫_{x∈R} g(x) f(x) dx + ∑_{x∈M} g(x) f(x)    (1.12)

It is further referred to as the Lebesgue-Stieltjes integral. When g(x) is Riemann integrable, the integral in (1.12) is also called the Riemann-Stieltjes integral.

We will write dF(x) = f(x) for convenience if (1.12) holds for all Lebesgue integrable functions g(x).

Probability Distribution

Given a random variable X, the probability distribution of X is a rule that determines the probability Pr(X ∈ A) for all A ∈ B.

For a discrete random variable X with countable state space S, since any A ⊂ S is a Borel set, the probability distribution of X can be given by its pmf f(x) via

Pr(X ∈ A) = ∑_{x∈A} Pr(X = x) = ∑_{x∈A} f(x)    (1.13)

The probability distribution of a continuous random variable X can be given by its pdf f(x) via

Pr(X ∈ A) = ∫_{x∈A} f(x) dx    (1.14)

for any Borel set A ⊂ S.

The probability distribution of a mixed random variable X with pf f(x) and set M of mass points is given by

Pr(X ∈ A) = ∫_{x∈A} f(x) dx + ∑_{x∈A∩M} f(x)    (1.15)

Clearly, (1.13) and (1.14) are special cases of (1.15).

Take g(x) = I{x∈A} = I_A(x) in (1.12). Then (1.15) can be expressed by

Pr(X ∈ A) = ∫_{x∈R} I_A(x) f(x) dx + ∑_{x∈M} I_A(x) f(x) = ∫_{x∈R} I_A(x) dF(x) = ∫_{x∈A} dF(x)

Thus formulae (1.13)–(1.15) can be unified by the Stieltjes integral as

Pr(X ∈ A) = ∫_{x∈A} dF(x)    (1.16)

Hazard Rate

Let X be a continuous random variable with cdf F(x) and density f(x). Define the right extreme of F(x) by τ_F = sup{x : F(x) < 1}, which may be finite or infinite.

The hazard rate (function) λ(x) of F(x) is defined by

λ(x) = f(x) / (1 − F(x)) = f(x) / F̄(x) = −(d/dx) log F̄(x) for x < τ_F    (1.17)

where F̄(x) = 1 − F(x). In case τ_F < ∞, we define λ(x) = ∞ for x ≥ τ_F.

The hazard rate is an important function in survival analysis, where X represents the lifetime and the hazard rate measures the risk of death. In mortality studies, the hazard rate is called the force of mortality and defined as the “rate of death for a life at risk” by

λ(x) = lim_{δ↓0} (1/δ) Pr(x < X < x + δ | X > x)
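As a quick check of (1.17), the hazard rate of an exponential distribution is constant. The following minimal sketch (with an illustrative rate of 0.5) evaluates f(x)/(1 − F(x)) at a few points.

```python
import numpy as np

# Hazard rate per (1.17) for an exponential distribution: f/(1-F) should be
# constant and equal to the rate. A minimal numeric check with rate = 0.5.
rate = 0.5
x = np.linspace(0.0, 10.0, 6)
f = rate * np.exp(-rate * x)          # density
F = 1.0 - np.exp(-rate * x)           # cdf
print(f / (1.0 - F))                  # all entries equal 0.5
```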

Joint Probability Distributions

Given two random variables X and Y, the joint distribution of (X, Y) is a rule to determine the joint probability Pr((X, Y) ∈ A) for all Borel sets A on R².

The joint cdf of two random variables (X, Y) is defined by

F(x, y) = Pr(X ≤ x, Y ≤ y), x, y ∈ R

Furthermore, (X, Y) are said to be jointly continuous if their joint cdf F(x, y) can be expressed as an integral of the form:

F(x, y) = ∫_{u≤x, v≤y} f(u, v) du dv

with a nonnegative measurable function f(x, y) on R². The function f(x, y) in the above integral is called the joint pdf (density) of (X, Y) and satisfies

f(x, y) = ∂²F(x, y)/(∂x∂y) almost everywhere on R²

On the other hand, (X, Y) are jointly discrete if

F(x, y) = ∑_{u≤x, v≤y} f(u, v) = ∑_{u≤x, v≤y} Pr(X = u, Y = v)

for (u, v) in a countable subset of R². The function

f(x, y) = Pr(X = x, Y = y)

is called the joint pmf of discrete (X, Y).

More generally, (X, Y) are jointly mixed if there exists a measurable function f(x, y) ≥ 0 on R² such that

F(x, y) = ∫_{u≤x, v≤y} f(u, v) du dv + ∑_{u≤x, u∈M} ∫_{v≤y} f(u, v) dv
        + ∑_{v≤y, v∈J} ∫_{u≤x} f(u, v) du + ∑_{u≤x, v≤y; u∈M, v∈J} f(u, v)    (1.18)

where

M = {x : Pr(X = x) > 0} and J = {y : Pr(Y = y) > 0}

are countable sets of mass points of X and Y respectively. The function f(x, y) in (1.18) is called the joint pf of (X, Y).

The joint cdf F(x, y) of mixed (X, Y) has partial derivatives with respect to x and y almost everywhere on R².

For mixed (X, Y) with joint cdf F(x, y), the pf f(x, y) can be determined by

f(x, y) =
  Pr(X = x, Y = y)              if x ∈ M, y ∈ J;
  (∂/∂x) Pr(X ≤ x, Y = y)       if x ∉ M, y ∈ J;
  (∂/∂y) Pr(X = x, Y ≤ y)       if x ∈ M, y ∉ J;
  ∂²F(x, y)/(∂x∂y)              if x ∉ M, y ∉ J.    (1.19)

For a joint cdf F(x, y) of the form (1.18), the Stieltjes integral of a measurable function g(x, y) on R² with respect to F(x, y) can be calculated by

∫ g(x, y) dF(x, y) = ∫ g(x, y) f(x, y) dx dy + ∑_{y∈J} ∫_{x∈R} g(x, y) f(x, y) dx
        + ∑_{x∈M} ∫_{y∈R} g(x, y) f(x, y) dy + ∑_{x∈M, y∈J} g(x, y) f(x, y)    (1.20)

Similar to (1.12), we will write dF(x, y) = f(x, y) if (1.20) holds for all Lebesgue integrable functions g(x, y) on R².

For any Borel set A on R², taking g(x, y) = I{(x,y)∈A} in (1.20), we can determine the joint distribution of mixed (X, Y) by

Pr((X, Y) ∈ A) = ∫_{(x,y)∈A} f(x, y) dx dy + ∑_{x∈M} ∫_{y:(x,y)∈A} f(x, y) dy
        + ∑_{y∈J} ∫_{x:(x,y)∈A} f(x, y) dx + ∑_{(x,y)∈A, x∈M, y∈J} f(x, y)

In particular,

Pr((X, Y) ∈ A) = ∑_{(x,y)∈A} f(x, y) for discrete (X, Y)

Pr((X, Y) ∈ A) = ∫_{(x,y)∈A} f(x, y) dx dy for continuous (X, Y)

and

Pr((X, Y) ∈ A) = ∑_{x∈M} ∫_{y:(x,y)∈A} f(x, y) dy if X is discrete and Y is continuous

Similarly, we can define the joint distribution of any number of random variables X_1, . . . , X_n to determine their joint probability

Pr((X_1, . . . , X_n) ∈ A) for all Borel sets A on R^n

via joint pdf, pmf or pf.

Marginal Distributions

When considering multiple random variables, the probability distribution of a single random variable is referred to as the marginal distribution.

Given the joint cdf F(x, y) and pf f(x, y) of two random variables (X, Y), the marginal cdfs of X and Y are given respectively by

F_X(x) = F(x, ∞) and F_Y(y) = F(∞, y)

The marginal pfs of X and Y are given by

f_X(x) = ∫_{y∈R} dF(x, y) = ∫_{y∈R} f(x, y) dy + ∑_{y∈J} f(x, y)

and

f_Y(y) = ∫_{x∈R} dF(x, y) = ∫_{x∈R} f(x, y) dx + ∑_{x∈M} f(x, y)

respectively, which can also be written as

dF_X(x) = ∫_{y∈R} dF(x, y) and dF_Y(y) = ∫_{x∈R} dF(x, y)    (1.21)

In particular, for continuous (X, Y), the marginal densities are

f_X(x) = ∫_{y∈R} f(x, y) dy, f_Y(y) = ∫_{x∈R} f(x, y) dx

and for discrete (X, Y), the marginal pmfs are

f_X(x) = ∑_{y∈J} f(x, y), f_Y(y) = ∑_{x∈M} f(x, y)

If X is discrete and Y is continuous, then the marginal pmf of X and the marginal density of Y are given respectively by

f_X(x) = ∫_{y∈R} f(x, y) dy and f_Y(y) = ∑_{x∈M} f(x, y)

Conditional Distribution

Given two random variables X and Y on a probability space (Ω, F, Pr), let f(x, y) denote the joint pf of (X, Y), and f_X(x) and f_Y(y) the marginal pfs of X and Y respectively.

The conditional pf of X given Y = y is denoted and defined by

f_{X|Y}(x|y) = f(x, y) / f_Y(y)    (1.22)

If X and Y are discrete or continuous, then (1.22) also defines the conditional pmf or pdf of X given Y = y.

The conditional cdf of X given Y = y is defined by

F_{X|Y}(x|y) = Pr(X ≤ x | Y = y) = Pr(X ≤ x, Y = y) / Pr(Y = y) if Pr(Y = y) > 0

or, in the case of Pr(Y = y) = 0, by

F_{X|Y}(x|y) = lim_{δ↓0} Pr(X ≤ x | y ≤ Y ≤ y + δ) = lim_{δ↓0} Pr(X ≤ x, y ≤ Y ≤ y + δ) / Pr(y ≤ Y ≤ y + δ)

Equivalently, F_{X|Y}(x|y) is given by

dF_{X|Y}(x|y) = d Pr(X ≤ x | Y = y) = f_{X|Y}(x|y)    (1.23)

in the sense that

F_{X|Y}(x|y) = ∫_{−∞}^x f_{X|Y}(z|y) dz + ∑_{z≤x, z∈M(y)} f_{X|Y}(z|y)

where M(y) = {x : Pr(X = x | Y = y) > 0}.

In particular, if X and Y are discrete with conditional pmf f_{X|Y}(x|y), then

F_{X|Y}(x|y) = ∑_{z≤x} f_{X|Y}(z|y)    (1.24)

and if X and Y are continuous with conditional pdf f_{X|Y}(x|y), then

F_{X|Y}(x|y) = ∫_{−∞}^x f_{X|Y}(z|y) dz    (1.25)

The conditional probability distribution of X given Y = y can be determined by

Pr(X ∈ A | Y = y) = ∫_{x∈A} dF_{X|Y}(x|y) = ∫_{x∈A} f_{X|Y}(x|y) dx + ∑_{x∈A∩M(y)} f_{X|Y}(x|y)    (1.26)

Independent Random Variables

A set of random variables X_1, . . . , X_n are said to be (mutually) independent if

Pr(X_1 ∈ A_1, . . . , X_n ∈ A_n) = Pr(X_1 ∈ A_1) · · · Pr(X_n ∈ A_n)

for all Borel sets A_1, . . . , A_n on R.

An equivalent definition for X_1, . . . , X_n to be independent is given by

F(x_1, . . . , x_n) = F_1(x_1) · · · F_n(x_n) for all x_1, . . . , x_n ∈ R

where F(x_1, . . . , x_n) is the joint cdf of X_1, . . . , X_n and F_i(x) denotes the marginal cdf of X_i, i = 1, . . . , n.

Similarly, X_1, . . . , X_n are independent if and only if

f(x_1, . . . , x_n) = f_1(x_1) · · · f_n(x_n) for all x_1, . . . , x_n ∈ R

where f(x_1, . . . , x_n) is the joint pf of X_1, . . . , X_n and f_i(x) is the marginal pf of X_i, i = 1, . . . , n.

Furthermore, if X_1, . . . , X_n are independent random variables, then for any real-valued measurable functions g_1(x), . . . , g_n(x) defined on R, g_1(X_1), . . . , g_n(X_n) are independent as well.

If X is independent of Y, the conditional distribution of X given Y = y reduces to the unconditional (marginal) distribution of X, independent of y:

f_{X|Y}(x|y) = f_X(x) and F_{X|Y}(x|y) = F_X(x)

Expectation

For a random variable X with cdf F(x) and pf f(x), the expectation of X is denoted and defined by

E[X] = ∫_{x∈R} x dF(x) = ∫_{x∈R} x f(x) dx + ∑_{x∈M} x f(x)    (1.27)

where M = {x : Pr(X = x) > 0} is the set of mass points of F(x). In particular,

E[X] = ∑_{x∈S} x f(x)    (1.28)

if X is discrete with state space S and pmf f(x), and

E[X] = ∫_{x∈R} x f(x) dx    (1.29)

if X is continuous with pdf f(x). Formula (1.29) remains valid if X has only one mass at 0.

The expectation E[X] is also referred to as the expected value or mean of X.

If X = I_E is an indicator of an event E, then X has two masses at 1 and 0 with

Pr(X = 1) = Pr(E) and Pr(X = 0) = Pr(E^c) = 1 − Pr(E)

Hence the expectation of an indicator I_E is given by

E[I_E] = 1 · Pr(E) + 0 · Pr(E^c) = Pr(E)    (1.30)

For any measurable function g(x) on R, the expectation of g(X) is calculated by

E[g(X)] = ∫_{x∈R} g(x) dF(x) = ∫_{x∈R} g(x) f(x) dx + ∑_{x∈M} g(x) f(x)    (1.31)

where M is the set of mass points of F(x). In particular, the k-th moment of a random variable X is defined by

E[X^k] = ∫_{x∈R} x^k dF(x), k = 1, 2, . . .    (1.32)

and the variance of X is given by

Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²    (1.33)
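A short numerical illustration of (1.31) for a mixed random variable (an illustrative example of our own: density 0.7·e^{−x} on [0, ∞) plus a mass of 0.3 at 2), computing E[g(X)] with g(x) = x² by adding the density part and the mass part.

```python
import numpy as np

# E[g(X)] via (1.31) for a mixed distribution with density 0.7 * e^{-x}
# on [0, inf) plus a mass of 0.3 at x = 2 (illustrative numbers only).
g = lambda x: x**2
dx = 0.001
x = np.arange(0.0, 50.0, dx)                   # truncate the tail for numerics
density_part = np.sum(g(x) * 0.7 * np.exp(-x)) * dx
mass_part = g(2.0) * 0.3
print(density_part + mass_part)                # about 0.7*2 + 0.3*4 = 2.6
```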

Conditional Expectation

Given two random variables X and Y with joint cdf F(x, y) and joint pf f(x, y), the conditional expectation of X given Y = y is defined by

E[X | Y = y] = ∫_{x∈R} x dF_{X|Y}(x|y) = ∫_{x∈R} x f_{X|Y}(x|y) dx + ∑_{x∈M_X} x f_{X|Y}(x|y)    (1.34)

where F_{X|Y}(x|y) and f_{X|Y}(x|y) are the conditional cdf and pf of X given Y = y, respectively, and M_X = {x : Pr(X = x) > 0} is the set of mass points of X.

Recall that dF(x) = f(x) and dF(x, y) = f(x, y) in the sense that (1.12) and (1.20) hold for all Lebesgue integrable functions g(x) and g(x, y). Then by (1.22) and (1.23),

dF_{X|Y}(x|y) = f_{X|Y}(x|y) = f(x, y) / f_Y(y) = dF(x, y) / dF_Y(y)

It follows that

dF_{X|Y}(x|y) dF_Y(y) = dF(x, y)    (1.35)

Let g(y) = E[X | Y = y]. Then the conditional expectation of X given Y is a random variable defined by

E[X|Y] = g(Y)

By (1.34) and (1.35) together with (1.21), the expectation of E[X|Y] is

E[E[X|Y]] = E[g(Y)] = ∫_{y∈R} g(y) dF_Y(y) = ∫_{y∈R} E[X | Y = y] dF_Y(y)
          = ∫_{y∈R} ∫_{x∈R} x dF_{X|Y}(x|y) dF_Y(y) = ∫_{y∈R} ∫_{x∈R} x dF(x, y)
          = ∫_{x∈R} x ∫_{y∈R} dF(x, y) = ∫_{x∈R} x dF_X(x) = E[X]

This gives the law of iterated expectations:

E[E[X|Y]] = E[X]    (1.36)

Thus the expectation E[X] can be calculated by conditioning on Y as follows:

E[X] = E[E[X|Y]] = ∫_{y∈R} E[X | Y = y] dF_Y(y)    (1.37)

By (1.30) and (1.37), the probability Pr(E) of an event E can also be calculated by conditioning on Y:

Pr(E) = E[I_E] = ∫_{y∈R} E[I_E | Y = y] dF_Y(y) = ∫_{y∈R} Pr(E | Y = y) dF_Y(y)    (1.38)

This is an extension of the formula in (1.6).
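The law of iterated expectations (1.36) can be checked by simulation. In the sketch below (an illustrative pair of our choosing), Y ∼ exp(1) and, given Y = y, X ∼ N(y, 1), so E[X|Y] = Y and both printed values should be close to 1.

```python
import numpy as np

# Monte Carlo check of the law of iterated expectations (1.36), with an
# illustrative pair: Y ~ exp(1) and, given Y = y, X ~ N(y, 1).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=1_000_000)
x = rng.normal(loc=y, scale=1.0)

print(x.mean())                  # E[X] is approximately 1.0
print(y.mean())                  # E[E[X|Y]] = E[Y] is approximately 1.0, matching (1.36)
```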

1.1.3 Family of Distributions

We will often consider certain families of distributions for random variables in the problems of stochastic scheduling, such as processing times and due dates. Some of the typical families of distributions that are of practical interest and commonly assumed in the literature are listed below for ease of reference.

Exponential Distribution

A random variable X is said to have an exponential distribution or be exponentially distributed, or simply exponential, if the density of X has the form:

f(x) = λ e^{−λx} I{x ≥ 0}

where λ > 0 is a parameter called the rate. An exponential random variable X with rate λ is denoted by X ∼ exp(λ). It has a constant hazard rate λ(x) = λ for all x, cdf F(x) = 1 − e^{−λx}, F̄(x) = e^{−λx}, E[X] = 1/λ and Var(X) = 1/λ².

If X_1, · · · , X_r are stochastically independent, exponentially distributed random variables with rates λ_1, · · · , λ_r, then min{X_1, · · · , X_r} is exponentially distributed with rate λ_1 + · · · + λ_r and

Pr(min{X_1, · · · , X_r} = X_i) = λ_i / (λ_1 + · · · + λ_r), i = 1, · · · , r.
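These two properties of exponential variables are easy to verify by simulation. The sketch below uses illustrative rates (2, 3, 5): the mean of the minimum should be close to 1/10, and the probabilities that each X_i attains the minimum should be close to 0.2, 0.3 and 0.5.

```python
import numpy as np

# Simulation check of the minimum-of-exponentials property, with
# illustrative rates (2, 3, 5): the minimum should be exp(10), and
# Pr(min = X_i) should be rate_i / 10.
rng = np.random.default_rng(1)
rates = np.array([2.0, 3.0, 5.0])
x = rng.exponential(scale=1.0 / rates, size=(1_000_000, 3))

print(x.min(axis=1).mean())                        # approximately 1/10
print(np.bincount(x.argmin(axis=1)) / len(x))      # approximately [0.2, 0.3, 0.5]
```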

Weibull Distribution

A random variable X has a Weibull distribution, denoted by X ∼ Weibull(α, β), if its density is

f(x) = α β^α x^{α−1} exp{−(βx)^α} I{x ≥ 0}

where α > 0 is called the shape parameter and β > 0 is the scale parameter. A Weibull random variable X has

cdf F(x) = 1 − exp{−(βx)^α}, hazard rate λ(x) = α β^α x^{α−1}

and kth moments

E[X^k] = (1/β^k) Γ(k/α + 1), k = 1, 2, . . .

where Γ(·) is the gamma function defined by

Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt, α > 0.

It has the recursive formula Γ(α + 1) = αΓ(α), and when α = n is a positive integer, Γ(α) = Γ(n) = (n − 1)!.
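The moment formula can be checked by simulation using inverse-cdf sampling, X = (−log U)^{1/α}/β for U ∼ U[0, 1]. The sketch below (illustrative α = 2, β = 0.5) compares the simulated mean with Γ(1/α + 1)/β.

```python
import math
import numpy as np

# Simulation check of the Weibull mean E[X] = Γ(1/α + 1)/β for the
# parameterization F(x) = 1 − exp{−(βx)^α}, with illustrative α = 2, β = 0.5.
alpha, beta = 2.0, 0.5
rng = np.random.default_rng(2)
u = rng.uniform(size=1_000_000)
x = (-np.log(u)) ** (1.0 / alpha) / beta        # inverse-cdf sampling

print(x.mean())                                 # simulated mean
print(math.gamma(1.0 / alpha + 1.0) / beta)     # about 1.7725 for these values
```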

Gamma Distribution

A gamma distribution for a random variable X, denoted by X ∼ Gamma(α,β), has a density

f(x) = (β^α/Γ(α)) x^{α−1} e^{−βx} I{x≥0}


where α > 0 is the shape parameter and β > 0 is the scale parameter. Its cdf does not have a closed form other than an integral, but its mean and variance are given by

E[X] = α/β  and  Var(X) = α/β^2

Normal Distribution

A random variable X has a normal distribution, written X ∼ N(µ,σ^2), if its density is of the form:

f(x) = (1/(√(2π) σ)) exp{−(x − µ)^2/(2σ^2)},  x ∈ R

where µ ∈ R is the location parameter and σ > 0 is the scale parameter. It has E[X] = µ and Var(X) = σ^2.

Log-Normal Distribution

A random variable X has a log-normal distribution, written X ∼ LN(µ,σ^2), if log X ∼ N(µ,σ^2). Its kth moment is given by

E[X^k] = exp{kµ + (1/2)k^2σ^2},  k = 1, 2, ...

Uniform Distribution

A random variable X has a uniform distribution over the interval [a,b], written X ∼ U[a,b], if its density is of the form:

f(x) = (1/(b − a)) I{a≤x≤b}

Its cdf, mean and variance are

F(x) = ((x − a)/(b − a)) I{a≤x≤b} + I{x>b},  E[X] = (a + b)/2  and  Var(X) = (b − a)^2/12

The interval [a,b] can be replaced by (a,b), (a,b] or [a,b) without affecting the properties of the distribution.


Pareto Distribution

A random variable X has a Pareto distribution, written X ∼ Pareto(α,θ), if its density is of the form:

f(x) = αθ^α/(x + θ)^{α+1} I{x≥0}

Its cdf, hazard rate, mean and variance are given respectively by

F(x) = 1 − (θ/(x + θ))^α,  λ(x) = α/(x + θ)  (x ≥ 0),

E[X] = θ/(α − 1)  (α > 1)  and  Var(X) = αθ^2/((α − 2)(α − 1)^2)  (α > 2)

Poisson Distribution

A discrete random variable X has a Poisson distribution with mean λ, written X ∼ Poisson(λ), if its probability mass function (pmf) is of the form:

f(x) = e^{−λ} λ^x/x!,  x = 0, 1, 2, ...  (λ > 0)

Its mean and variance are equal: E[X] = λ = Var(X).

Binomial Distribution

A binomial distribution, written X ∼ Bin(n,p), is given by the pmf:

f(x) = C(n,x) p^x (1 − p)^{n−x},  x = 0, 1, ..., n  (0 < p < 1)

where C(n,x) = n!/(x!(n − x)!) is the binomial coefficient. Its mean and variance are given by

E[X] = np  and  Var(X) = np(1 − p)

If n = 1, Bin(1,p) is called the Bernoulli distribution and written X ∼ Ber(p).

Negative Binomial Distribution

A negative binomial distribution, written X ∼ NB(r,p), is given by the pmf:

f(x) = (Γ(r + x)/(Γ(r) x!)) p^r (1 − p)^x,  x = 0, 1, 2, ...  (0 < p < 1, r > 0)

Its mean and variance are given by

E[X] = r(1 − p)/p  and  Var(X) = r(1 − p)/p^2

When r = 1, NB(1,p) is called the geometric distribution and written X ∼ Geo(p).

1.2 Stochastic Orders

To determine the optimal scheduling strategy in a stochastic environment, we need to compare random variables so as to set the criteria of optimality. In this section, we introduce ways to order random variables, referred to as stochastic orders.

1.2.1 Definitions of Stochastic Orders

Standard Stochastic Order

Let X and Y be two random variables on a common probability space (Ω, F, Pr) with cdfs F_X(x) and F_Y(y) respectively.

Definition 1.1. X is said to be “less than or equal to Y stochastically”, and written X ≤st Y or Y ≥st X, if

Pr(X > t) ≤ Pr(Y > t)  for all t ∈ R    (1.39)

or equivalently,

Pr(X < t) ≥ Pr(Y < t)  for all t ∈ R    (1.40)

Furthermore, if the strict inequality in (1.39) or (1.40) holds for some t ∈ R, then X is said to be “less than Y stochastically”, and written X <st Y or Y >st X.

Write F̄(x) = 1 − F(x) for any cdf F(x). Then (1.39) and (1.40) can be expressed respectively as

F̄_X(t) ≤ F̄_Y(t)  or  F_X(t) ≥ F_Y(t)  for all t ∈ R    (1.41)

and

F_X(t−) ≥ F_Y(t−)  or  F̄_X(t−) ≤ F̄_Y(t−)  for all t ∈ R    (1.42)

F̄(x) is called the decumulative distribution function or survival function of X.


Under Definition 1.1, X ≤st Y means that X has a smaller chance of taking large values than Y. We will refer to this version of stochastic order as the standard stochastic order.
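
As a simple numerical illustration of Definition 1.1 (added here, not part of the original text), the function below checks (1.39) on a grid of t values for two candidate survival functions; with X ~ exp(2) and Y ~ exp(1) it confirms X ≤st Y while the reverse order fails.

import math

def stochastically_leq(surv_x, surv_y, grid):
    """Check Pr(X > t) <= Pr(Y > t), i.e. (1.39), on a finite grid of t values."""
    return all(surv_x(t) <= surv_y(t) + 1e-12 for t in grid)

# Survival functions of X ~ exp(2) and Y ~ exp(1).
surv_x = lambda t: math.exp(-2.0 * t) if t >= 0 else 1.0
surv_y = lambda t: math.exp(-1.0 * t) if t >= 0 else 1.0

grid = [i / 100.0 for i in range(-100, 1001)]
print(stochastically_leq(surv_x, surv_y, grid))   # True: X <=st Y
print(stochastically_leq(surv_y, surv_x, grid))   # False: Y is not <=st X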

Hazard-Rate Order

Recall the hazard rate λ(x) of F(x) defined in (1.17):

λ(x) = f(x)/(1 − F(x)) = f(x)/F̄(x) = −(d/dx) log F̄(x)  for x < τ_F    (1.43)

and λ(x) = ∞ for x ≥ τ_F in case τ_F < ∞, where τ_F is the right extreme of F(x).

We further define the cumulative hazard rate (function) Λ(x) of F(x) by

Λ(x) = ∫_{−∞}^x λ(t) dt = − log F̄(x) if x < τ_F,  and  Λ(x) = ∞ if x ≥ τ_F    (1.44)

It follows that

F̄(x) = e^{−Λ(x)}  for all x ∈ R    (1.45)

Let X and Y be two continuous random variables with respective cdfs F_X(x) and F_Y(y) and hazard rates λ_X(x) and λ_Y(y).

Definition 1.2. X is said to be “less than or equal to Y in hazard-rate order”, and written X ≤hr Y or Y ≥hr X, if

λ_X(t) ≥ λ_Y(t)  for all t ∈ R    (1.46)

Likelihood-Ratio Order

Let X and Y be two continuous random variables with densities f_X(x) and f_Y(y) respectively.

Definition 1.3. X is said to be “less than or equal to Y in likelihood-ratio order”, and written X ≤lr Y or Y ≥lr X, if

f_X(t) f_Y(s) ≥ f_X(s) f_Y(t)  for all t ≤ s    (1.47)

Since (1.47) implies

f_X(t)/f_Y(t) ≥ f_X(s)/f_Y(s)  for t ≤ s


provided f_Y(t) > 0 and f_Y(s) > 0, the order X ≤lr Y can be interpreted as having a decreasing likelihood ratio f_X(t)/f_Y(t).

Remark 1.2. The hazard-rate and likelihood-ratio orders as defined above apply to continuous random variables. We can also define such orders for discrete random variables. The hazard rate of a discrete random variable X with state space S and pmf f(x) can be defined by

λ(x) = Pr(X = x)/Pr(X ≥ x) = f(x)/F̄(x−)  for x ∈ S    (1.48)

Then Definition 1.2 remains valid for the hazard-rate order.

For the likelihood-ratio order, we just need to replace the pdfs f_X(x) and f_Y(y) in (1.47) with pmfs. Then Definition 1.3 remains valid as well.

Almost-Sure and Mean Orders

The three types of orders introduced in Definitions 1.1–1.3 are determined by the probability distributions of the random variables involved. In contrast, the following two types of orders compare the values of the random variables or their numerical characteristics under the usual order of real numbers.

Let X = X(ω) and Y = Y(ω) be two random variables on a common probability space (Ω, F, Pr).

Definition 1.4. X is said to be “less than or equal to Y almost surely”, and written X ≤ Y a.s. or Y ≥ X a.s., if

Pr(X ≤ Y) = Pr({ω : X(ω) ≤ Y(ω)}) = 1    (1.49)

Definition 1.5. X is said to be “less than or equal to Y in mean order” if

E[X] ≤ E[Y]    (1.50)

Remark 1.3. The almost-sure order uses the usual order of numbers, but applied pointwise to the values of the random variables involved, while the mean order is based on deterministic expected values. These two orders are straightforward extensions of the usual order of deterministic numbers, and hence are of a deterministic nature. We still include them in stochastic orders as they involve probability distributions and are defined for random variables.


1.2.2 Relations Between Stochastic Orders

General Random Variables

Consider the relations between the stochastic orders introduced in Sect. 1.2.1. First, if X ≤hr Y so that λ_X(t) ≥ λ_Y(t) for all t ∈ R, then

Λ_X(t) = ∫_0^t λ_X(s) ds ≥ ∫_0^t λ_Y(s) ds = Λ_Y(t)  for all t ∈ R

Hence by (1.45),

F̄_X(t) = e^{−Λ_X(t)} ≤ e^{−Λ_Y(t)} = F̄_Y(t)  for all t ∈ R    (1.51)

This together with (1.41) shows that the hazard-rate order X ≤hr Y implies the standard stochastic order X ≤st Y, which has an intuitive explanation that a smaller hazard rate for death leads to a longer lifetime.

Next, (1.47) implies that

f_X(t) F̄_Y(t) = f_X(t) ∫_t^∞ f_Y(s) ds = ∫_t^∞ f_X(t) f_Y(s) ds ≥ ∫_t^∞ f_X(s) f_Y(t) ds
             = f_Y(t) ∫_t^∞ f_X(s) ds = f_Y(t) F̄_X(t)

This together with (1.43) shows that

λ_X(t) = f_X(t)/F̄_X(t) ≥ f_Y(t)/F̄_Y(t) = λ_Y(t)  for all t ∈ R

Hence (1.47) implies (1.46); that is, the likelihood-ratio order X ≤lr Y implies the hazard-rate order X ≤hr Y.

Furthermore, if X ≤ Y a.s., then X ≤ y a.s. conditional on Y = y. Hence by (1.38),

F̄_X(x) = Pr(X > x) = ∫_{y∈R} Pr(x < X ≤ y | Y = y) dF_Y(y)
        = ∫_{y>x} Pr(x < X ≤ y | Y = y) dF_Y(y) ≤ ∫_{y>x} dF_Y(y) = F̄_Y(x)  for all x ∈ R

Thus the almost-sure order X ≤ Y a.s. implies the standard stochastic order X ≤st Y.

Moreover, it is easy to derive the following formula for E[X]:

E[X] = ∫_{x∈R} x dF(x) = ∫_{x≥0} ∫_0^x dy dF(x) − ∫_{x<0} ∫_x^0 dy dF(x)
     = ∫_{y≥0} ∫_{x>y} dF(x) dy − ∫_{y<0} ∫_{x≤y} dF(x) dy
     = ∫_{y≥0} F̄(y) dy − ∫_{y<0} F(y) dy = ∫_{x≥0} F̄(x) dx − ∫_{x<0} F(x) dx    (1.52)
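
Formula (1.52) expresses E[X] through the distribution and survival functions alone. The short numerical check below (illustrative only, using a shifted exponential X = Z − 1 with Z ~ exp(1), so that both the positive and negative parts appear and E[X] = 0) approximates the two integrals by Riemann sums.

import math

# X = Z - 1 with Z ~ exp(1), so that E[X] = 0 and X takes both signs.
def surv(x):                       # survival function F̄(x) = Pr(X > x)
    return math.exp(-(x + 1.0)) if x >= -1.0 else 1.0

def cdf(x):                        # F(x) = Pr(X <= x)
    return 1.0 - surv(x)

dx = 1e-4
pos = sum(surv(k * dx) * dx for k in range(0, 200_000))    # ~ integral of F̄ over x >= 0
neg = sum(cdf(-k * dx) * dx for k in range(1, 10_001))     # ~ integral of F over x < 0
print(f"(1.52) gives E[X] ~ {pos - neg:.4f}  (exact value 0)")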


If X ≤st Y, then by (1.41), F̄_X(x) ≤ F̄_Y(x) and F_X(x) ≥ F_Y(x) for all x ∈ R. It then follows from (1.52) that

E[X] = ∫_{x≥0} F̄_X(x) dx − ∫_{x<0} F_X(x) dx ≤ ∫_{x≥0} F̄_Y(x) dx − ∫_{x<0} F_Y(x) dx = E[Y]

Thus the standard stochastic order X ≤st Y implies the mean order E[X] ≤ E[Y].

In summary, we have the following chains of implication relations:

• Likelihood-ratio order X ≤lr Y =⇒ hazard-rate order X ≤hr Y =⇒ standard stochastic order X ≤st Y =⇒ mean order E[X] ≤ E[Y]

• Almost-sure order X ≤ Y a.s. =⇒ standard stochastic order X ≤st Y =⇒ mean order E[X] ≤ E[Y].

Such implication relations are generally not available between the almost-sure order and the likelihood-ratio or hazard-rate order, as the following two examples show.

Example 1.1. Let Y and Z be two independent random variables with densities

f_Y(y) = 0.8 I{0≤y<1} + 0.2 I{1≤y≤2}  and  f_Z(z) = I{0≤z≤1}

respectively. Define X = Y + Z. Then X ≥ Y a.s. since Pr(Z ≥ 0) = 1. It is not difficult to obtain the density of X by convolution as

f_X(x) = 0.8x if 0 ≤ x < 1;  1.4 − 0.6x if 1 ≤ x < 2;  0.6 − 0.2x if 2 ≤ x ≤ 3;  0 otherwise.

It follows that

λ_X(0.5) = f_X(0.5)/F̄_X(0.5) = 0.4/(1 − 0.1) = 4/9  and  λ_X(1) = f_X(1)/F̄_X(1) = 0.8/(1 − 0.4) = 4/3

On the other hand,

λ_Y(0.5) = f_Y(0.5)/F̄_Y(0.5) = 0.8/(1 − 0.4) = 4/3  and  λ_Y(1) = f_Y(1)/F̄_Y(1) = 0.2/(1 − 0.8) = 1

Therefore, λ_X(0.5) < λ_Y(0.5) and λ_X(1) > λ_Y(1), which imply λ_X(t) < λ_Y(t) in a neighborhood of 0.5 and λ_X(t) > λ_Y(t) for 1 ≤ t < 1 + δ with some δ > 0, due to the continuity or right-continuity of f_X(t) and f_Y(t) at t = 0.5 and t = 1. This shows that an almost-sure order X ≥ Y a.s. does not imply a hazard-rate order, hence neither a likelihood-ratio order, between X and Y.
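
The hazard-rate values in Example 1.1 can be reproduced numerically; the sketch below (added for illustration) evaluates f_X by the convolution integral and the cdfs by Riemann sums, then compares the hazard rates of X and Y at t = 0.5 and t = 1.

def f_Y(y):
    if 0.0 <= y < 1.0:
        return 0.8
    if 1.0 <= y <= 2.0:
        return 0.2
    return 0.0

def f_Z(z):
    return 1.0 if 0.0 <= z <= 1.0 else 0.0

def f_X(x, dz=1e-3):
    # Convolution f_X(x) = integral of f_Y(x - z) f_Z(z) dz, via a Riemann sum.
    return sum(f_Y(x - k * dz) * f_Z(k * dz) * dz for k in range(int(1.0 / dz) + 1))

def cdf(f, x, dt=1e-3):
    return sum(f(k * dt) * dt for k in range(int(x / dt)))

for t in (0.5, 1.0):
    lam_X = f_X(t) / (1.0 - cdf(f_X, t))
    lam_Y = f_Y(t) / (1.0 - cdf(f_Y, t))
    print(f"t = {t}: hazard of X ~ {lam_X:.3f}, hazard of Y ~ {lam_Y:.3f}")
# Expected: λ_X(0.5) ≈ 4/9 < λ_Y(0.5) ≈ 4/3, while λ_X(1) ≈ 4/3 > λ_Y(1) = 1.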


Example 1.2. Let X and Y be two independent exponential random variables with densities f_X(x) = 2e^{−2x} I{x≥0} and f_Y(y) = e^{−y} I{y≥0} respectively. Then for t ≤ s,

f_X(t) f_Y(s) = 2e^{−2t−s} I{t≥0, s≥0} = 2e^{−t} e^{−(t+s)} I{t≥0, s≥0}
             ≥ 2e^{−s} e^{−(t+s)} I{t≥0, s≥0} = 2e^{−2s−t} I{t≥0, s≥0} = f_X(s) f_Y(t)

This shows X ≤lr Y. On the other hand, by (1.38),

Pr(X ≤ Y) = ∫_0^∞ Pr(X ≤ Y | X = x) f_X(x) dx = ∫_0^∞ Pr(Y ≥ x) f_X(x) dx
          = ∫_0^∞ (∫_x^∞ e^{−y} dy) 2e^{−2x} dx = ∫_0^∞ e^{−x} · 2e^{−2x} dx = ∫_0^∞ 2e^{−3x} dx = 2/3

Since Pr(X ≤ Y) = 2/3 < 1, there is no almost-sure order between X and Y. As a result, a likelihood-ratio order X ≤lr Y does not imply an almost-sure order between X and Y, and so neither does a hazard-rate order.
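
A direct simulation (added here as an illustration) confirms the value Pr(X ≤ Y) = 2/3 computed in Example 1.2, and hence the absence of an almost-sure order:

import random

random.seed(3)
N = 200_000
count = sum(random.expovariate(2.0) <= random.expovariate(1.0) for _ in range(N))
print(f"Pr(X <= Y) ~ {count / N:.4f}   (exact value 2/3 ~ 0.6667)")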

Independent Random Variables

Suppose that X and Y are independent random variables with respective cdfs F_X(x) and F_Y(y) and densities f_X(x) and f_Y(y).

By (1.38), X ≤ Y a.s. if and only if

0 = Pr(X > Y) = ∫_{y∈R} Pr(X > Y | Y = y) f_Y(y) dy
             = ∫_{y∈R} Pr(X > y) f_Y(y) dy = ∫_{y∈R} F̄_X(y) f_Y(y) dy    (1.53)

Since a Borel set of zero measure can be ignored when computing probabilities, (1.53) is equivalent to

F̄_X(y) f_Y(y) = 0  for all y ∈ R

As F̄_X(x) is a nonincreasing function on R, F̄_X(y) f_Y(y) = 0 holds for all y ∈ R if and only if there exists a point a ∈ [−∞,∞] such that

F̄_X(y) = 0 for all y ≥ a  and  f_Y(y) = 0 for all y < a

or equivalently,

f_X(x) = 0 for all x ≥ a  and  f_Y(y) = 0 for all y < a    (1.54)

It is easy to see that (1.54) implies f_X(t) f_Y(s) ≥ 0 = f_X(s) f_Y(t) for all t ≤ s, so that X ≤lr Y. Thus for independent random variables X and Y,


Almost-sure order X ≤ Y a.s. =⇒ likelihood-ratio order X ≤lr Y =⇒ hazard-rate order X ≤hr Y =⇒ standard stochastic order X ≤st Y =⇒ mean order E[X] ≤ E[Y]

Remark 1.4. The above implication relations between stochastic orders have been derived for continuous random variables. For a discrete random variable X with state space S and hazard rate λ(k) defined by (1.48), it can be shown that

F̄(x) = ∏_{k∈S: k≤x} [1 − λ(k)],  x ∈ S    (1.55)

Replacing (1.45) with (1.55), we can show that these implication relations are also valid for discrete random variables by arguments similar to those for continuous random variables.

1.2.3 Existence of Stochastic Orders

Among the five stochastic orders introduced in Sect. 1.2.1, only the mean order necessarily exists between two random variables X and Y, as long as E[X] and E[Y] exist. It is easy to construct an example without standard stochastic order, which implies no hazard-rate, likelihood-ratio or almost-sure order either. A simple example is given below.

Example 1.3. Consider two random variables X and Y with respective cdfs

F_X(x) = (x/3) I{0≤x≤3} + I{x>3}  and  F_Y(y) = (y − 1) I{1≤y≤2} + I{y>2}

Clearly,

F_X(1) = 1/3 > 0 = F_Y(1)  and  F_X(2) = 2/3 < 1 = F_Y(2)

Hence the standard stochastic order does not exist between X and Y.

One situation of practical interest for the existence of stochastic orders is when the random variables X and Y have distributions in the same family. In such a case, there often exist likelihood-ratio, hazard-rate and/or standard stochastic orders between X and Y, determined by the parameters of the distributions. Examples include:

• Exponential distributions: Let f_X(x) = λe^{−λx}, f_Y(y) = µe^{−µy}, and “⇐⇒” stand for “if and only if”. Then for 0 ≤ x ≤ y,

f_X(x) f_Y(y) ≥ f_X(y) f_Y(x) ⇐⇒ f_X(x)/f_X(y) ≥ f_Y(x)/f_Y(y) ⇐⇒ λe^{−λx}/(λe^{−λy}) ≥ µe^{−µx}/(µe^{−µy}) ⇐⇒ e^{λ(y−x)} ≥ e^{µ(y−x)} ⇐⇒ λ ≥ µ

Therefore,

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ λ ≥ µ

• Weibull distributions with common shape parameter: If F̄_X(x) = exp{−(βx)^α} and F̄_Y(y) = exp{−(γy)^α}, then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ β ≥ γ

• Gamma distributions with common shape parameter: If

f_X(x) = (β^α/Γ(α)) x^{α−1} e^{−βx}  and  f_Y(y) = (γ^α/Γ(α)) y^{α−1} e^{−γy}

then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ β ≥ γ

• Normal distributions with common variance: If X ∼ N(µ_X, σ^2) and Y ∼ N(µ_Y, σ^2), then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ µ_X ≤ µ_Y

• Uniform distributions with lower bound zero: If X ∼ U(0, θ_X) and Y ∼ U(0, θ_Y), then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ θ_X ≤ θ_Y

• Pareto distributions with common shape parameter: If X ∼ Pareto(α, θ_X) and Y ∼ Pareto(α, θ_Y), then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ θ_X ≤ θ_Y

• Poisson distributions: If X ∼ Poisson(λ_X) and Y ∼ Poisson(λ_Y), then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ λ_X ≤ λ_Y

• Geometric distributions: If f_X(n) = p(1 − p)^n and f_Y(n) = q(1 − q)^n, n = 0, 1, ..., then

X ≤lr Y ⇐⇒ X ≤hr Y ⇐⇒ X ≤st Y ⇐⇒ E[X] ≤ E[Y] ⇐⇒ p ≥ q


This situation, however, does not apply to the almost-sure order, which depends not only on the distributions of X and Y, but also on the interrelationship between X and Y themselves. In particular, if X and Y are independent, then an almost-sure order is only available when (1.54) is satisfied. Therefore, independent random variables with distributions in the same families listed above cannot be almost-surely ordered.

1.3 Model Description

The fundamental issue of a scheduling problem is to determine an optimal strategy (policy) to complete a set of jobs by one or more machines. Therefore a model for such a problem can be described in four aspects:

1. Jobs: A job in a scheduling problem is to be understood in a wide sense. It may be a simple task in manufacturing; a computing program; a reliability test; a journey from one place to another; a crop to be harvested; a customer to be serviced; or a set of tasks in a complex project such as the design of an aircraft.

2. Machine: A machine represents a facility to process the jobs, and hence is also referred to as a processor. It can be a machine in the usual sense, such as a computer or harvester; but it can also be understood in a wider sense, such as a test site, a transporter, an airport terminal, a service desk, a team of designers, or the entire organization.

3. Policy: A policy is a strategy that determines how jobs are to be processed, such as the order in which to process the jobs, or which job to process at a given time point. It takes into account all factors that influence the outcome of job processing, including job characteristics, machine environments, and performance measures.

4. Performance measure: Each scheduling problem has a performance measure as a basis to compare the outcomes of different scheduling strategies and determine the optimal solution to the problem.

In this section, we introduce some basic concepts in each aspect of a stochastic scheduling model. More details will be provided in subsequent chapters.

1.3.1 Job Characteristics

Consider a set of n jobs, labeled i = 1, ..., n, in a scheduling problem. Each job is associated with the following characteristics:


Processing Times

If a job is processed without interruption to completion, then the time required to complete it is referred to as the processing time of the job. In a stochastic environment, the processing time of job i is generally a random variable, denoted by P_i. When P_i degenerates to a deterministic value, we sometimes denote it by p_i.

If the processing of job i is interrupted, however, the total actual time spent on processing job i could be longer than P_i. This is particularly the case if the work done on the job is lost due to the interruption, so that it has to be processed from the start again. This issue is related to the machine environment.

While deterministic processing times have been commonly assumed and studied in the scheduling literature, they are only approximately valid in certain situations and unrealistic in others, especially when the concept of a job is extended to a wide range of applications.

For example, if a job is to test the reliability of a product, the processing time may represent the time until the product fails, which is highly uncertain in most practical scenarios. Other examples may include customer service, account audit, house renovation, product design, and so on. The times required to complete such jobs cannot be predetermined with accuracy due to variations and uncertainties in customer demand, account complexity, house condition, weather, outcome of a new design, etc. In these applications, it is unrealistic to assume deterministic processing times that are supposed to be known exactly in advance.

In stochastic scheduling problems, the processing times P_1, ..., P_n are usually assumed to be mutually independent random variables, and their probability distributions are generally allowed to be arbitrary.

In some problems, we will consider P_i that follow certain specific distributions, usually in the same family, such as the normal or exponential distributions. The normal distribution is justified if a job consists of many small and independent parts, while the exponential distribution can describe a high level of uncertainty.

Due Dates/Deadlines

If a job has a target time for completion, we refer to this target as the due date. The due date of job i is denoted by D_i, i = 1, ..., n. Similar to processing times, the due dates D_1, ..., D_n are generally assumed to be independent random variables with arbitrary distributions, and when D_i degenerates to a deterministic value, we denote it by d_i on some occasions.

If job i misses its due date D_i, it will incur a cost or penalty. Costs associated with missing a due date in scheduling problems are generally of the following types:


• Tardiness cost: Let C_i denote the completion time of job i with due date D_i. Job i is said to be tardy if C_i > D_i (i.e., it is completed after its due date). We denote and define the tardiness of job i by

T_i = max{C_i − D_i, 0} = (C_i − D_i) I{C_i > D_i},  which equals C_i − D_i if C_i > D_i and 0 otherwise.

A tardiness cost is incurred on a tardy job. This type of cost is common in scheduling problems.

A tardiness cost may represent a penalty stipulated in a contract for missing the due date, lower profit due to delayed sales, extra interest cost of borrowing, loss of market share, loss of opportunity, and so on.

The tardiness cost generally has the form g(T_i) = g(C_i − D_i), where g(·) is a nonnegative and nondecreasing function defined on [0,∞), referred to as the cost function. Typical examples of g(·) include:

– g(x) = c I{x>0} (fixed cost), where c is a positive constant;

– g(x) = cx (linear cost);

– g(x) = cx^2 (quadratic cost); and

– g(x) = 1 − e^{−cx} (exponential decay).

• Earliness cost: Job i is said to be early if C_i < D_i (it is completed before its due date). The earliness of job i is denoted and defined by

E_i = max{D_i − C_i, 0} = (D_i − C_i) I{C_i < D_i},  which equals D_i − C_i if C_i < D_i and 0 otherwise.

An earliness cost is incurred on an early job. It may represent, for example, the inventory cost if the due date D_i is the pick-up time of a completed product by the customer and a product completed before D_i has to be stored in a warehouse until time D_i. The earliness cost generally has the form g(E_i) = g(D_i − C_i) for some cost function g(·) defined on [0,∞).

• Lateness cost: The lateness of job i is denoted and defined by

L_i = C_i − D_i,  regardless of whether C_i < D_i or C_i > D_i.

A lateness cost has the form g(L_i) = g(C_i − D_i), which is similar to the tardiness cost, except that the cost function g(·) is defined on R = (−∞,∞) and allowed to take negative values.

One example is g(x) = x for x ∈ R. The lateness cost in this example is given by g(L_i) = C_i − D_i, which represents a penalty (positive cost) if the job is completed later than the due date, or a reward (negative cost) if earlier.


Deterministic due dates are common in practice, since they can be negotiated and determined in advance in many applications. There are, however, many situations in which the due dates are naturally stochastic. For example, if job processing is subject to uncertain factors, such as a construction job whose completion may be hampered by bad weather, it is justifiable and a common practice to negotiate a random due date that depends on the outcome of the uncertain factors.

As another example, consider gathering ripe crops on a block of land as a job. Suppose that a storm is on the way and the crops will suffer heavy losses if they are not gathered before the storm strikes. Then the arrival time of the storm can be treated as a due date and a tardy job will incur a heavy cost. Such a due date is clearly of a stochastic nature.

A due date is sometimes also called a deadline, usually in applications where missing it has more serious consequences, such as a total loss of value for a job that misses the deadline. In some scheduling problems, a job may be subject to two due dates, such as one for the pick-up time and another for a disaster. In such a case we will call one (the pick-up time) the due date and the other (the disaster time) the deadline.

Arrival (Available) Times

The time at which a job becomes available to be processed is referred to as the arrival time. The arrival time of job i is denoted by A_i, which is generally a random variable, and sometimes by a_i if it is deterministic.

If all jobs are available at the start time, then the arrival time is zero for every job. This is the case in most scheduling problems and is realistic in many applications. There are, however, practical situations in which jobs arrive at different time points. One example is to process insurance claims that are received randomly at different times. The arrival process is often modeled by a Poisson process.

Weights

Each job i can be assigned a weight w_i. The weight may represent the level of importance or the value of the job. A weight is usually a deterministic value, but can be a random variable if it is subject to uncertainty.

A weight can also represent the cost or reward associated with a job. For example, w_i may be the fixed cost for missing the due date of job i, or the reward to be received at the time of completing job i.


1.3.2 Machine Environments

Single Machine

Most scheduling problems consider the environment in which a single machine is available to process a set of jobs. With a single machine, a common assumption is that one and only one job can be processed at a time by the machine. A key issue in a single-machine scheduling problem is the order in which the jobs are processed by the machine.

Parallel Machines

If there are m machines available to process the jobs (m > 1), and each machine can process one and only one job at a time, independently of the others, then they are referred to as parallel machines. A scheduling problem with parallel machines involves first selecting m jobs to be processed by the m machines in parallel, and then deciding which of the remaining jobs to process whenever one of the m machines becomes available.

Flowshop/Jobshop

If each job needs to be processed by m machines in sequence, and all jobs have to go through the m machines in the same routing, the model is referred to as a flowshop. One example is food processing, where each job (food product) is processed sequentially in steps of washing, preparing, cooking, freezing, packaging, etc., and each step is carried out by a particular processor. If each job has its own routing to visit the m machines, then the model is referred to as a jobshop.

Team-Work Machines

In this case, there are also m machines available to process the jobs, but they do not process jobs independently in parallel, nor necessarily in sequence. Instead, each machine has a special role to process a particular part of the job. In other words, the m machines work as a “team” to complete a job together, and each plays a unique role. This situation is actually dictated by the nature of the job, and we refer to a job that requires a team of machines to complete as a team-work job. Examples of team-work jobs include the assembly of large and complex products such as automobiles or aircraft, and the design of large projects such as a power plant.


Machine Breakdowns

In deterministic scheduling problems, machines are assumed to be continuously available at all times. In reality, however, all machines can break down randomly from time to time. The problem of machine breakdowns is in fact a key motivation that prompts extensive research interest and effort in stochastic scheduling.

A breakdown may be caused by an actual fault of the machine, which results in disruption of the job being processed. A breakdown may also be caused by a processing discipline that assigns higher priority to certain jobs: when a job of higher priority arrives, the machine has to process it immediately, causing a disruption to the normal job being processed. For normal jobs, the random arrivals of jobs with higher priority are equivalent, in effect, to random breakdowns of the machine. No matter which type of breakdown is involved, it is imperative to take into account the impact of machine breakdowns on job processing and the information on the breakdown process in the determination of appropriate scheduling policies.

The impact of machine breakdowns on job processing varies in different cases. In the literature on machine breakdown models, a breakdown is referred to as preemptive-resume if it does not result in any loss of the work already done on the job being processed, or preemptive-repeat if it results in a total loss of the work done on the job being processed. In other words:

• If a preemptive-resume breakdown occurs while a job is being processed, there is no loss of the work done on the disrupted job prior to the breakdown and the processing of the job can be resumed from where it was interrupted when the machine is fixed;

• If a preemptive-repeat breakdown occurs before a job is completed, the work done on this job is totally lost and its processing will have to restart all over again after the machine resumes its operation.

Furthermore, two different scenarios may occur for the random processing time of the disrupted job after a preemptive-repeat breakdown:

• The processing time is re-sampled independently after each breakdown, which is reasonable if the uncertainty of breakdowns comes from external sources. We will refer to this scenario as independent processing times and denote it by the Rs (Re-sampling) model;

• The processing time remains the same (but unknown) amount (random variable) as that before the breakdown, which models breakdown uncertainty from internal sources. This scenario is referred to as identical processing times, and denoted by the NRs (No Re-sampling) model.

The process of breakdowns is modeled by a sequence of pairs of nonnegative random variables {(Y_ik, Z_ik)}_{k=1}^∞, where Y_ik and Z_ik represent the durations of the k-th uptime and downtime, respectively, for processing job i. The sequence {(Y_ik, Z_ik)}_{k=1}^∞ is usually assumed to be independent and identically distributed (i.i.d.) with a typical representative (Y_i, Z_i).

Let P_ik denote the remaining processing time required to complete job i without further interruption after the k-th breakdown during its processing, and suppose that job i is completed after experiencing K breakdowns. Then

• P_i1 + P_i2 + ... + P_iK = P_i in the case of preemptive-resume breakdowns;

• P_i1, P_i2, ..., P_iK are i.i.d. as P_i with Rs preemptive-repeat breakdowns; and

• P_i1 = P_i2 = ... = P_iK = P_i in the NRs model.

The total time that job i occupies the machine, including both the uptimes and the downtimes of the machine while job i is being processed, is denoted by O_i and referred to as the occupying time.
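
The following sketch (an illustrative simulation under assumed exponential uptimes, downtimes and processing times, not a model prescribed by the text) contrasts the occupying time O_i of a single job under the three breakdown regimes described above: preemptive-resume, preemptive-repeat with re-sampling (Rs), and preemptive-repeat without re-sampling (NRs).

import random

random.seed(4)

def occupying_time(mode, rate_p=1.0, rate_up=0.8, rate_down=2.0):
    """Simulate the occupying time of one job under a given breakdown model."""
    remaining = random.expovariate(rate_p)      # processing requirement P_i
    total = 0.0
    while True:
        up = random.expovariate(rate_up)        # uptime Y_ik
        if up >= remaining:                     # job finishes during this uptime
            return total + remaining
        total += up + random.expovariate(rate_down)   # uptime + downtime Z_ik
        if mode == "resume":
            remaining -= up                     # work done is preserved
        elif mode == "repeat-Rs":
            remaining = random.expovariate(rate_p)     # re-sampled processing time
        elif mode == "repeat-NRs":
            pass                                # same (still unknown) amount remains
        else:
            raise ValueError(mode)

N = 50_000
for mode in ("resume", "repeat-Rs", "repeat-NRs"):
    est = sum(occupying_time(mode) for _ in range(N)) / N
    print(f"E[O_i] under {mode:11s}: {est:.3f}")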

In order to reflect more closely the impact of a breakdown on the job being processed, we will refer to a preemptive-resume breakdown as no loss of work, and to a preemptive-repeat breakdown as total loss of work. It is easy to see that these two types of breakdowns do not cover all possibilities of machine breakdowns, as the work done on a disrupted job may be neither fully preserved nor totally lost after a breakdown.

We will further consider breakdown models with partial loss of work, which creates a unified framework for machine breakdowns, with preemptive-resume and preemptive-repeat breakdowns as special cases at the two extreme ends.

One example of partial loss of work is the loss of setups in manufacturing. It is common in manufacturing systems that the total time required to complete a job consists of two parts: the setup time to get the machine ready to process the job, and the subsequent operating time to complete the job. If a job is disrupted by a machine breakdown, the work that has been done on the job remains intact but the setup is lost. Consider the total time to complete the job as its processing time. Then a breakdown results in a partial loss of work.

More details on machine breakdown models will be provided in Chap. 4.

1.3.3 Scheduling Policies

A decision or strategy that fully specifies how a set of jobs is to be processed is called a policy. We denote a policy by ζ.


Deterministic Policies

In a deterministic environment where n jobs are to be processed nonpreemptively on a single machine, a policy consists of the order and timing to process the jobs. The order to process the n jobs can be specified by a permutation π = (i_1, ..., i_n) of (1, ..., n), with i_k = j if and only if job j is the kth to be processed. For instance, π = (2, 5, 3, 1, 4) for n = 5 specifies that job 2 is the first to be processed, job 5 is the second, and so on. There are in total n! permutations to order the job processing.

The timing to process each job depends on whether the job can be preempted, in the sense that a job being processed can be pulled off the machine before it is completed. A job is said to be preemptive if it can be preempted, or nonpreemptive if its processing, once started, must continue until the job is completed. If a job is nonpreemptive, then only the start time to process it needs to be determined.

Let s_i denote the idle time of job i, in the sense that the machine is kept idle for s_i units of time before starting to process job i. Then for nonpreemptive jobs, a policy ζ is specified by a permutation π together with a set S = {s_1, ..., s_n} of idle times. Such a policy is completely determined before it is implemented, and is referred to as a static policy.

One example of a static policy is the well-known shortest processing time (SPT) policy for a single-machine problem, which consists of π = (i_1, ..., i_n) and S = {0, ..., 0}, with i_k = j if and only if the processing time p_j of job j is the kth smallest among p_1, ..., p_n. In other words, the SPT policy processes jobs in nondecreasing order of the processing times, and imposes zero idle times (i.e., each job is processed immediately after the last job in the queue is completed).
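
A minimal sketch of the SPT policy for the deterministic single-machine case (hypothetical data, added for illustration): jobs are sequenced in nondecreasing order of processing time with zero idle times, and the total flowtime is computed from the resulting completion times.

def spt_schedule(processing_times):
    """Return the SPT job sequence (0-based indices) and the total flowtime."""
    order = sorted(range(len(processing_times)), key=lambda i: processing_times[i])
    t, total_flowtime = 0.0, 0.0
    for i in order:
        t += processing_times[i]      # completion time C_i with zero idle time
        total_flowtime += t
    return order, total_flowtime

p = [4.0, 1.0, 3.0, 2.0]              # hypothetical processing times
print(spt_schedule(p))                # ([1, 3, 2, 0], 1 + 3 + 6 + 10 = 20.0)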

A policy becomes more complicated if preemptive jobs and multiple machines are involved, as it also includes the timing to preempt a job and the selection of which machine to process it on. These issues will be dealt with in more specific cases.

Stochastic Policies

In a stochastic environment, the attributes of a scheduling problem, including job processing times, due dates, and machine up/downtimes, may be random variables. Hence a policy can no longer use these attributes before they are realized (observed). There are two types of approaches to determine a policy in a stochastic environment:

(I) Probability distribution: While the values of the random variables for the problem attributes are unknown before they are observed, their probability distributions are assumed to be known. Hence a policy can be established based on these probability distributions. For example, the SPT policy can be modified to process jobs in nondecreasing order of the expected processing times E[P_i], i = 1, ..., n, with zero idle times. This policy is called the shortest expected processing time (SEPT) policy.


(II) Policy update: In this approach, a policy is allowed to be updated according to the realizations of the random variables involved over the time when jobs are being processed. A typical example is to select a job to be processed, at any time when a machine is available, based on all information observed up to that point of time. This approach can also handle preemptive jobs based on their processing history. It may result in pulling a job off the machine before it is completed, if the updated information indicates that it is more beneficial to process another job at that time.

A policy from the Type I approach is generally determined in advance and does not vary during the processing of the jobs. In such a case, the policy remains static and is of a deterministic nature in that sense. It is, however, still in the domain of stochastic scheduling since it is based on the probability distributions of random variables.

A policy from the Type II approach, on the other hand, is truly of a stochastic nature in the sense that it evolves according to the dynamics of the system and the changing information realized from random variables over time. Such a policy is not fixed in advance and cannot be specified by a permutation and/or a set of numbers. Instead, it is a rule that dictates how jobs are to be processed at every time point according to the up-to-date information at that time point. This is referred to as a dynamic policy.

Policy Classification

We now define the classes of policies for unambiguous reference in the rest of the book. Generally, a policy is said to be dynamic if it can be revised from time to time during the process. More specific definitions are given as follows:

• A static policy specifies completely the allocation of jobs to each machine and the order in which to process the jobs on the machine. A static policy is determined at time zero and will not change thereafter.

• A static list policy specifies a priority list (order) of jobs to process. In the case of multiple machines, the job at the top of the list will be processed every time a machine is freed. The priority list is determined at time zero and will not change thereafter.

• A nonpreemptive dynamic policy can determine which job to process next at time zero or at any time when a job is completed. No job can be preempted under such a policy.

• A restricted dynamic policy can determine which job to process next at time zero, at any time when a job is completed, or at any time when the machine resumes its operation after a breakdown. Under such a policy, a job can be preempted only at machine breakdown times.


• An unrestricted dynamic policy can determine which job to process at any time as long as the machine is working. Under such a policy, a job can be preempted at any time.

In a single-machine problem, a static list policy is the same as a static policy. They are, nevertheless, different if there is more than one machine. When there are multiple machines, a static policy specifies, a priori, the machine allocation and the sequence in which to process the jobs on each machine, whereas a static list policy does not specify the machine allocation. Under a static list policy, a job on the priority list will be allocated to the machine that becomes available. Due to the randomness involved in a stochastic scheduling problem, the information on when and which machine will become available to process the next job is unknown at time zero when a static list policy is determined. Thus, in a sense, this is a kind of semi-dynamic policy.

In a deterministic environment, it is possible to determine job preemption under a static policy since all information is available at time zero, including the result of preempting a job. In a stochastic environment, however, job preemption is neither sensible nor practical under a static policy, since it is impossible to determine future preemption at time zero based on information not yet available. Therefore we do not consider static policies that allow job preemption.

1.3.4 Performance Measures

Objective Function

For a scheduling problem, we define an objective function to measure the performance of each policy. An objective function may represent the cost of, or the loss/profit from, completing the jobs.

Consider a set of n jobs to be processed. Let C_i = C_i(ζ) denote the completion time of job i, i = 1, ..., n, under policy ζ. The flowtime of a job is the amount of time that the job stays in the system; that is, the difference between its completion time and arrival time. When all jobs arrive at time zero, the flowtime of any job is equal to its completion time.

Some typical objective functions considered in the scheduling literature are listed below:

• Total flowtime:

FT(ζ) = ∑_{i=1}^n C_i = ∑_{i=1}^n C_i(ζ)


• Makespan:

MS(ζ) = max_{1≤i≤n} C_i = max_{1≤i≤n} C_i(ζ)

• Completion time variance:

CTV(ζ) = (1/n) ∑_{i=1}^n (C_i − C̄)^2,  where C̄ = (1/n) ∑_{i=1}^n C_i

• Total weighted flowtime:

WFT(ζ) = ∑_{i=1}^n w_i C_i = ∑_{i=1}^n w_i C_i(ζ)

where w_i is the weight assigned to job i, with w_1 + ... + w_n = 1.

• Maximum lateness:

ML(ζ) = max_{1≤i≤n} (C_i − d_i)

where d_i is the due date of job i.

• Weighted number of tardy jobs:

WNT(ζ) = ∑_{i=1}^n w_i I{C_i > d_i} = ∑_{i: C_i > d_i} w_i

• Total weighted tardiness:

WT(ζ) = ∑_{i=1}^n w_i max{C_i − d_i, 0} = ∑_{i: C_i > d_i} w_i (C_i − d_i)

• Total earliness and tardiness:

ET(ζ) = ∑_{i=1}^n |C_i − d_i|

• Total discounted reward:

DR(ζ) = ∑_{i=1}^n w_i exp{−δ C_i}

where w_i is the reward received at the time of completing job i and δ is the discount rate.
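
For concreteness, the following snippet (hypothetical numbers, added as an illustration) evaluates several of the above objective functions for one fixed set of completion times, due dates and weights.

import math

C = [2.0, 5.0, 9.0, 14.0]          # completion times under some policy ζ
d = [3.0, 4.0, 10.0, 12.0]         # due dates
w = [0.1, 0.4, 0.2, 0.3]           # weights (summing to 1)
delta = 0.05                        # discount rate

n = len(C)
mean_C = sum(C) / n
print("total flowtime     :", sum(C))
print("makespan           :", max(C))
print("completion time var:", sum((c - mean_C) ** 2 for c in C) / n)
print("weighted flowtime  :", sum(wi * ci for wi, ci in zip(w, C)))
print("maximum lateness   :", max(c - di for c, di in zip(C, d)))
print("weighted # tardy   :", sum(wi for wi, c, di in zip(w, C, d) if c > di))
print("weighted tardiness :", sum(wi * max(c - di, 0.0) for wi, c, di in zip(w, C, d)))
print("earliness+tardiness:", sum(abs(c - di) for c, di in zip(C, d)))
print("discounted reward  :", sum(wi * math.exp(-delta * c) for wi, c in zip(w, C)))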


In a stochastic environment, the objective functions shown above are random variables. A common approach in stochastic scheduling is to take the expectation of a random objective function as a performance measure, which itself is considered an objective function. For example, we can define the following objective functions by taking expectations in a stochastic environment:

• Expected total flowtime:

EFT(ζ) = E[FT(ζ)] = E[∑_{i=1}^n C_i] = ∑_{i=1}^n E[C_i]

• Expected makespan:

EMS(ζ) = E[MS(ζ)] = E[max_{1≤i≤n} C_i]

• Maximum expected completion time:

MECT(ζ) = max_{1≤i≤n} E[C_i]

• Expected completion time variance:

ECTV(ζ) = E[(1/n) ∑_{i=1}^n (C_i − C̄)^2] = (1/n) ∑_{i=1}^n E[(C_i − C̄)^2]

• Expected total weighted flowtime:

EWFT(ζ) = E[∑_{i=1}^n w_i C_i] = ∑_{i=1}^n w_i E[C_i(ζ)]

• Expected maximum lateness:

EML(ζ) = E[ML(ζ)] = E[max_{1≤i≤n} (C_i − D_i)]

where D_i is the stochastic due date of job i.

• Maximum expected lateness:

MEL(ζ) = max_{1≤i≤n} E[C_i − D_i]

• Expected weighted number of tardy jobs:

EWNT(ζ) = E[WNT(ζ)] = E[∑_{i=1}^n w_i I{C_i > D_i}] = ∑_{i=1}^n w_i Pr(C_i > D_i)


• Expected total weighted tardiness:

EWT(ζ) = ∑_{i=1}^n w_i E[T_i] = ∑_{i=1}^n w_i E[max{C_i − D_i, 0}]

• Expected total earliness and tardiness:

EET(ζ) = E[∑_{i=1}^n |C_i − D_i|] = ∑_{i=1}^n E[|C_i − D_i|]

• Expected total discounted reward:

EDR(ζ) = E[∑_{i=1}^n w_i exp{−δ C_i}] = ∑_{i=1}^n w_i E[exp{−δ C_i}]

A performance measure or objective function is said to be regular if it is nondecreasing in the completion times C_i, i = 1, ..., n; otherwise it is irregular. In the above lists of objective functions, the completion time variance and the total earliness and tardiness, as well as their expectations, are irregular; all others are regular.

Further details of objective functions will be discussed in subsequent chapters.

Optimality Criteria

Let Obj(ζ) denote a generic objective function. In a deterministic environment, the target of a scheduling problem is to find an optimal policy ζ* that minimizes an objective function of cost/penalty/loss, or maximizes an objective function of profit/reward.

Since a maximization problem can always be converted to one of minimization, without loss of generality we can formulate a scheduling problem as minimizing an objective function. Therefore, an optimal policy ζ* is a solution that satisfies

Obj(ζ*) = min_ζ Obj(ζ)    (1.56)

In other words, Obj(ζ*) ≤ Obj(ζ) for any policy ζ. For example, the SPT policy is optimal if the total flowtime is the objective function.

In a stochastic environment, the objective function Obj(ζ) involves random variables. Hence the optimality criterion in (1.56) needs to be revised with a stochastic order. The most common approach is to define the optimality criterion in mean order:

Page 55: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

44 1 Basic Concepts

Definition 1.6. A policy ζ* is optimal in mean order if and only if

E[Obj(ζ*)] = min_ζ E[Obj(ζ)]    (1.57)

That is, an optimal policy ζ* minimizes the objective function in an average sense. This approach has been used overwhelmingly in the previous literature, and will be so used in this book as well.

Note that if an objective function is defined as the expected value of a random objective function, then criterion (1.57) is identical to (1.56).

Moreover, we will also consider two other stochastic orders for the optimality criteria in this book:

Definition 1.7. A policy ζ* is optimal in stochastic order if and only if

Obj(ζ*) =st min_ζ Obj(ζ)    (1.58)

in the sense that Obj(ζ*) ≤st Obj(ζ) for any policy ζ.

Definition 1.8. A policy ζ* is optimal almost surely if and only if

Obj(ζ*) = min_ζ Obj(ζ) a.s.    (1.59)

in the sense that Pr(Obj(ζ*) ≤ Obj(ζ)) = 1 for any policy ζ.

The other two stochastic orders introduced in Sect. 1.2.1, namely the hazard-rate and likelihood-ratio orders, will not be used to define optimality criteria as they are less directly interpretable on the random variables themselves. They will, however, be needed in the conditions for optimal policies under other orders.

Recall that a mean order always exists between any random variables, but that is not the case for the other stochastic orders. If a standard stochastic order does not exist between the objective functions, it is only possible to find an optimal policy in mean order.

On the other hand, if an optimal policy in stochastic order is available, it is more desirable than one in mean order, because the standard stochastic order compares random variables over their whole ranges, whereas the mean order only compares their average values. An optimal policy in almost-sure order only exists in rather restricted circumstances under strong assumptions.

In this book, we will consider optimal policies in mean order for every problem, those in stochastic order in a variety of problems, and those in almost-sure order only occasionally.


1.4 Notation

We now summarize the notation we have introduced, which will be used throughout the book, but not always exclusively.

Notation on Probability Space

• Ω : Sample space

• F : Collection of events

• 2Ω : Collection of all subsets of Ω

• ∅: Empty set

• E, F, etc.: Events

• E^c: Complement of event E

• I_E: Indicator of event E

• Pr(E): Probability of event E

• E − F = E ∩ F^c

• Pr(E|F): Conditional probability of E given F

• σ(G): σ-algebra generated by a collection G of subsets of a space

• B: Borel field

• R: Real line (−∞,∞)

Notation on Random Variables

• X , Y , etc.: Random variables

• F(x): Cumulative distribution function (cdf)

• f(x): Probability mass function (pmf), probability density function (pdf), or probability function (pf)

• ∫ g(x) dF(x): Stieltjes integral of function g(x) with respect to cdf F(x)

• F̄(x) = 1 − F(x): Survival (or decumulative distribution) function

• F(x−): Left limit of F(x) at point x


• τ_F: Right extreme of F(x)

• λ(x): Hazard rate

• Λ(x): Cumulative hazard rate

• F_X(x): cdf of random variable X

• f_X(x): pf of random variable X

• F_{X|Y}(x|y): Conditional cdf of X given Y = y

• f_{X|Y}(x|y): Conditional pf of X given Y = y

• F(x,y): Joint cdf of (X,Y)

• f(x,y): Joint pf of (X,Y)

• E[X]: Expectation of random variable X

• Var(X): Variance of random variable X

• E[X | Y = y]: Conditional expectation of X given Y = y

• E[X | Y]: Conditional expectation of X given Y

• Φ(z): cdf of the standard normal distribution N(0,1)

• φ(z): Density of the standard normal distribution N(0,1)

Notation on Stochastic Orders

• X ≤st Y: X is less than or equal to Y in standard stochastic order

• X ≤hr Y: X is less than or equal to Y in hazard-rate order

• X ≤lr Y: X is less than or equal to Y in likelihood-ratio order

• X ≤ Y a.s.: X is less than or equal to Y in almost-sure order

• ⇐⇒: if and only if

• =⇒: implies

Notation on Scheduling Problems

• pi: Deterministic processing time of job i

• Pi: Random processing time of job i

• Oi: Occupying time of job i


• di: Deterministic due date of job i

• Di: Random due date of job i

• µi: Mean (expected) processing time of job i

• σi: Standard deviation of the processing time of job i

• wi: Weight assigned to job i

• Ai: Random arrival time of job i

• ai: Deterministic arrival time of job i

• si: Idle time before processing job i

• δ : Discount rate

• X : Arrival time of transporter

• π = (i1, . . . , in): Permutation of (1, . . . ,n)

• ζ : Policy to determine how jobs are processed

• C_i = C_i(ζ): Completion time of job i under policy ζ

• B_i(ζ): Set of jobs that are completed before job i is processed under policy ζ

• T_i = max{C_i − D_i, 0}: Tardiness of job i

• E_i = max{D_i − C_i, 0}: Earliness of job i

• L_i = C_i − D_i: Lateness of job i

• Yik: kth uptime of the machine when processing job i

• Zik: kth downtime of the machine when processing job i

• P_ik: Processing time required to complete job i without further interruption after the k-th breakdown during its processing


Chapter 2
Regular Performance Measures

The scheduling field has undergone significant development since the 1950s. While there has been a large literature on scheduling problems, the majority, however, is devoted to models characterized by the so-called regular performance measures, which are monotone functions of the completion times of the jobs. This is natural, because many problems in real-world applications involve the objective of completing all jobs as early as possible, which results in the requirement of minimizing regular cost functions. The total flowtime, the makespan, and the total tardiness cost of missing the due dates are typical examples of regular performance measures.

This chapter covers stochastic scheduling problems with regular performance measures. Section 2.1 is focused on models of minimizing the sum of expected completion time costs. In Sect. 2.2, we consider the problem of minimizing the expected makespan (the maximum completion time). Some basic models with due-date related objective functions are addressed in Sect. 2.3. More general cost functions are considered in Sect. 2.4. Optimal scheduling policies when processing times follow certain classes of distributions are described in Sects. 2.5 and 2.6, respectively. The objective functions considered in Sects. 2.1–2.3 are in fact special cases of those studied in Sects. 2.4–2.6. However, the discussions in the first three sections illustrate the basic techniques commonly employed in the field of stochastic scheduling, including the approach of adjacent job interchange, the argument of induction, and the formulation of stochastic dynamic programming.

2.1 Total Completion Time Cost

2.1.1 Single Machine

Suppose that n jobs, all available at time zero, are to be processed by a single machine, with (random) processing time P_i for job i, i = 1, ..., n.


If the cost to complete job i at time t is w_i t, where w_i is a constant cost rate for job i, then the expected total cost to complete all jobs is:

EWFT(ζ) = E[∑_{i=1}^n w_i C_i(ζ)] = ∑_{i=1}^n w_i E[C_i(ζ)],    (2.1)

where C_i = C_i(ζ) is the completion time of job i, i = 1, ..., n, under policy ζ. Since all jobs are available at time zero, the flowtime of a job is equal to its completion time, and so the measure above is also referred to as the expected total weighted flowtime. Minimization of the total weighted flowtime is a basic model in scheduling (cf. Smith, 1956; Rothkopf, 1966a).

Let us first consider the case where no job preemption is allowed. Then the completion time of job i can be expressed as

C_i(ζ) = ∑_{k∈B_i(ζ)} P_k,    (2.2)

where B_i(ζ) denotes the set of jobs scheduled no later than job i under ζ. Consequently, E[C_i(ζ)] = ∑_{k∈B_i(ζ)} E[P_k]. It is therefore clear that, if we regard E[P_i] as the processing time of job i, i = 1, ..., n, the problem of minimizing EWFT(ζ) reduces to one of minimizing the weighted flowtime under the deterministic processing times E[P_i]. It is well known that the optimal policy for this problem is to sequence the jobs in non-increasing order of the ratios w_i/E[P_i], that is, according to the so-called weighted shortest expected processing time (WSEPT) rule. We can show, by a standard induction argument, that this rule is in fact also optimal in the class of nonpreemptive dynamic policies.

Theorem 2.1. When a set of jobs with random processing times is to be processed by a single machine with no preemption allowed, the WSEPT rule minimizes the EWFT in the class of static policies as well as in the class of nonpreemptive dynamic policies.

Proof. That WSEPT minimizes the EWFT in the class of static policies can be shown by an adjacent job interchange argument, a technique commonly adopted in the scheduling field. Denote p_i = E[P_i], i = 1, ..., n, which are now regarded as the (deterministic) processing times of the jobs. When there is only a single machine and no preemption is allowed, any static policy reduces to a sequence in which to process the n jobs. Suppose a sequence ζ_0 is optimal, which is however not WSEPT. Then in this sequence there must exist a pair of adjacent jobs {j,k}, with job k following job j, such that w_j/p_j < w_k/p_k. Denote by s the starting time to process job j. Now create a new sequence ζ′ by interchanging the positions of j and k in the sequence ζ_0. Clearly, the completion times of all jobs before and after the pair {j,k} are not affected by the interchange operation. The weighted completion time of {j,k} is


Q0 = wj(s + pj) + wk(s + pj + pk) under ζ0, and Q′ = wk(s + pk) + wj(s + pk + pj) under ζ′. It is easy to see that

EWFT(ζ0) − EWFT(ζ′) = Q0 − Q′ = wk pj − wj pk = pj pk (wk/pk − wj/pj) > 0,

which contradicts the optimality of ζ0. Thus an optimal sequence must be WSEPT.

Observe that WSEPT remains an optimal static policy for any subset of jobs that start their processing at any time s ≥ 0. The claim that WSEPT minimizes EWFT in the class of non-preemptive dynamic policies can be established by an induction argument on k jobs, starting at time s. The claim is trivially true for k = 1. Suppose that it is true for k − 1 jobs with any starting time s′. For k jobs, any non-preemptive dynamic policy must first process a job i, and then process the remaining k − 1 jobs non-preemptively. Denote by πi a WSEPT policy for the k − 1 jobs excluding job i. Then, by the inductive hypothesis, the optimal non-preemptive dynamic policy to process the k jobs must be chosen among the k static policies {i, πi}, i = 1,2, . . . ,k. As we have shown that WSEPT is optimal among all static policies for k jobs with any starting time s ≥ 0, the claim is true for k jobs.

Suppose that the hazard rate of the processing time Pi is λi(x) (see the definition in (1.17)). A special but important case is when λi(x) is a nondecreasing function. In this case, conditional on job i having been processed for t units of time, the remaining time to complete it is stochastically no greater than the original processing time Pi, and is in fact stochastically nonincreasing in t, as should be the case in most practical situations. It is easy to see that, if the hazard rate λi(x) is a nondecreasing function, then job i will never be preempted by any other job once it has been selected for processing. This enables us to extend the result of Theorem 2.1 to the problem with preemption allowed.

Corollary 2.1. If the hazard rate λi(x) of any job i, i = 1,2, . . . ,n, is a nondecreasing function, then WSEPT is optimal in the class of preemptive dynamic policies.

The WSEPT rule, however, cannot be easily extended to the multi-machine case with general processing times, even if the weights are identical, i.e., wi ≡ w. The following is an example from Pinedo and Weiss (1987).

Example 2.1. Suppose that all weights wi ≡ 1, and the distributions of the processing times Pj belong to one of the following classes:

• Class I: Fj(x) = Pr(Pj ≤ x) = 1 − (1 − 2pj)e^{−x} − pj e^{−x/2};

• Class II: Fj(x) = Pr(Pj ≤ x) = 1 − (1 − pj)e^{−x} − pj x e^{−x};

• Class III: Pr(Pj = 0) = pj, Pr(Pj = 1) = 1 − 2pj, Pr(Pj = 2) = pj.

Then it is easy to verify that E[Pj] = 1 for all jobs, and the variances of the processing times are 1 + 4pj, 1 + 2pj and 2pj in Classes I, II and III, respectively. If there is only one machine to process all jobs, then WSEPT is optimal (in fact, any sequence is WSEPT because all jobs have E[Pj] = 1). However, if there is more than one machine, then by the result of Pinedo and Weiss, the optimal policy is to process the jobs in nonincreasing order of variances (largest variance first). Thus in this case, the WSEPT rule fails to deliver the optimal policy.
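To see this phenomenon numerically, the sketch below (our own illustration, not part of the original example) simulates Class III jobs with different values of pj on two parallel machines and compares the total flowtime under two static list orders. List scheduling is used: the next job in the list starts on whichever machine becomes free first. In line with the result cited above, the largest-variance-first order should yield the smaller estimate.

```python
import heapq
import random

def sample_class3(p):
    """Draw a Class III processing time: 0 w.p. p, 1 w.p. 1-2p, 2 w.p. p."""
    u = random.random()
    if u < p:
        return 0.0
    if u < 1.0 - p:
        return 1.0
    return 2.0

def total_flowtime(order, times, m=2):
    """Sum of completion times of a static list policy on m identical machines."""
    free = [0.0] * m
    heapq.heapify(free)
    total = 0.0
    for j in order:
        start = heapq.heappop(free)
        finish = start + times[j]
        total += finish
        heapq.heappush(free, finish)
    return total

def simulate(order, ps, reps=100_000):
    s = 0.0
    for _ in range(reps):
        times = [sample_class3(p) for p in ps]
        s += total_flowtime(order, times)
    return s / reps

random.seed(0)
ps = [0.05, 0.2, 0.35, 0.5]                 # variances 2p: 0.1, 0.4, 0.7, 1.0
small_var_first = sorted(range(len(ps)), key=lambda j: ps[j])
large_var_first = list(reversed(small_var_first))
print("nondecreasing variance:", simulate(small_var_first, ps))
print("nonincreasing variance:", simulate(large_var_first, ps))
```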

Some conditions on the processing time distributions are needed in the multi-machine case. This is to be studied in the next subsection, under a more general perspective.

2.1.2 Parallel Machines

Suppose that n jobs are to be processed non-preemptively by m identical machines which operate in parallel. The processing times are random variables that can be stochastically ordered as P1 ≤st · · · ≤st Pn. Let τk ≥ 0 denote the time at which machine k becomes available, k = 1, . . . ,m, and τ = (τ1,τ2, . . . ,τm). The objective is to find the scheduling policy in the class of non-preemptive policies ζ that maximizes the expected total reward of completing all jobs,

R(τ,ζ) = E[ ∑_{i=1}^{n} r(Ci) ],    (2.3)

where r(t) is a convex and decreasing function of t (0 ≤ t < ∞), and Ci = Ci(ζ) is the completion time of job i under policy ζ. Note that for r(t) = −t the problem is one of minimizing the expected flowtime as we have discussed in the previous subsection. The reward function r(t) being considered here is identical for all jobs. The case of job-dependent reward functions will be introduced later. The exposition below is based on Weber et al. (1986).

We first consider the optimal static list policy. As defined in Chap. 1, a static list policy specifies a priority list of jobs to process, and the job at the top of the list will be processed every time a machine is freed. Note that a list policy does not pre-specify the allocation of jobs to the machines. This is different from the completely static policy that specifies, a priori, both the machine allocation and the processing order on each machine.

Now let L = (k1, . . . ,kn) denote a static list policy, which processes the n jobs in the order k1, . . . ,kn. Let R(τ;L) denote the expected reward obtained when the jobs are processed according to L. Without loss of generality let τ1 ≤ τ2 ≤ · · · ≤ τm. For convenience we suppose that r(t) is twice differentiable and that the processing times are continuous random variables with density functions.


A static list policy according to the shortest expected processing time (SEPT) rule is to process the jobs in non-decreasing order of the expected processing times E[Pi]. We will show, in this subsection, that the SEPT rule is optimal to maximize the total reward R(τ,L). This result will be established through a few lemmas. The first lemma states that, for a list policy L, the rate of change of the expected reward with respect to the starting time of any machine is just the expected reward obtained on that machine when the reward function is altered to the derivative r′(t) of r(t) with respect to t. Let R′i(τ;L) denote the expected reward obtained on machine i when the list policy L is applied and the reward function is r′(t). Let dR(τ;L)/dτi denote the right-derivative of R(τ;L) with respect to τi.

Lemma 2.1. For any static list policy L, dR(τ;L)/dτi exists and

dR(τ;L)/dτi = R′i(τ;L),   i = 1, . . . ,m.    (2.4)

Proof. The proof is by induction on the number of jobs n. It is clearly true for n = 0. Suppose it is true for fewer than n jobs. Let L = (i1, i2, . . . , in), so that job i1 is processed first, on machine 1 (due to the assumption that τ1 ≤ · · · ≤ τm). Denote L1 = (i2, i3, . . . , in), and let f(t) be the density function of Pi1. Then

R(τ;L) = ∫_0^∞ f(t){r(τ1 + t) + R(τ1 + t, τ2, . . . ,τm; L1)}dt.

Differentiating and using the inductive hypothesis,

dR(τ;L)/dτ1 = ∫_0^∞ f(t){r′(τ1 + t) + R′1(τ1 + t, τ2, . . . ,τm; L1)}dt = R′1(τ;L).

Similarly,

dR(τ;L)/dτi = ∫_0^∞ f(t) R′i(τ1 + t, τ2, . . . ,τm; L1)dt = R′i(τ;L),   for i ≠ 1.

This completes the inductive step and so the proof of the lemma.

The next lemma states that, when the reward function is r′(t) and the scheduling policy is SEPT, the expected reward obtained on a given machine is not reduced if that machine is made to start later, and does not increase if any other machine is made to start later. Moreover, if jobs 1 and k are interchanged on machines 1 and 2, the expected reward (with r′(t) as the reward function) increases.

Lemma 2.2.

(a) Suppose that L is the SEPT list (1,2, . . . ,n). Then for j ≠ i and n ≥ 1, R′i(τ;L) is non-decreasing in τi and non-increasing in τj.


(b) Suppose that Lk is the list (2,3, . . . ,n), omitting some k ≥ 2. Then for n ≥ 2,

E[R′1(τ1 + P1, τ2 + Pk, . . . ,τm; Lk) − R′1(τ1 + Pk, τ2 + P1, . . . ,τm; Lk)] ≤ 0.    (2.5)

Proof. The proof is again by induction on the number of jobs n. Part (a) is trivial for n = 1, and part (b) is trivial for n = 2. Suppose that the lemma is true when there are fewer than n jobs to process. We show that it is true for n jobs to process. The inductive step for (b) follows from that for (a) when there are n − 2 jobs to process, together with the fact that if a function h(x1,xk) = R′1(τ1 + x1, τ2 + xk, . . . ,τm; Lk) is non-decreasing in x1 and non-increasing in xk, then E[h(P1,Pk) − h(Pk,P1)] ≤ 0 for P1 ≤st Pk.

To establish the inductive step for part (a) we begin by showing that R′i(τ;L) is non-decreasing in τi. Without loss of generality, consider i = 1 and suppose that τ2 ≤ · · · ≤ τm. Let L1 = (2,3, . . . ,n). Then for τ1 < τ2,

R′1(τ;L) = E[r′(τ1 + P1) + R′1(τ1 + P1, τ2, . . . ,τm; L1)].

Because r′(t) is non-decreasing and by the inductive hypothesis, the expression under the expectation is non-decreasing in τ1. Thus R′1(τ;L) is non-decreasing in τ1 within the region τ1 < τ2. If τ1 > τ2, then

R′1(τ;L) = E[R′1(τ1, τ2 + P1, . . . ,τm; L1)],

which is non-decreasing in τ1 due to the inductive hypothesis.

It remains to consider the change in R′1(τ;L) as τ1 passes through the value τ2. Suppose that τ2 = · · · = τk < τk+1 ≤ · · · ≤ τm and let Lk be the list (2,3, . . . ,n) after omitting job k. Then the change in R′1(τ;L) as τ1 passes through the value τ2 may be written as

R′1(τ2+, τ2, . . . ,τm; L) − R′1(τ2−, τ2, . . . ,τm; L).

This change equals

E[r′(τ2 + Pk) + R′1(τ2 + Pk, τ2 + P1, τ3, . . . ,τm; Lk) − r′(τ2 + P1) − R′1(τ2 + P1, τ2 + Pk, τ3, . . . ,τm; Lk)],

for k ≤ n; and E[−r′(τ2 + P1)] for k > n (in this case, there are at least n identical machines available at τ2 to process the n jobs).

In both cases, a negative and non-decreasing r′(t) together with the inductive hypothesis for part (b) implies that the expression under the expectation is non-negative. This completes the inductive step showing that R′i(τ;L) is non-decreasing in τi. By similar arguments we can show that R′i(τ;L) is non-increasing in τj, j ≠ i.

The next lemma states that, when the reward function is r′(t), the SEPT list does not produce a greater expected reward on machine 1 (the machine that starts first) than a policy that schedules the shortest job first on machine 2 (the machine that starts second) and the remaining jobs according to SEPT.

Lemma 2.3. Suppose that L = (1,2, . . . ,n) is the SEPT list. Let L1 = (2,3, . . . ,n). Then for n ≥ 2,

R′1(τ;L) ≤ E[R′1(τ1, τ2 + P1, . . . ,τm; L1)].    (2.6)

Proof. The proof is again by induction on n. When n = 2 we have

R′1(τ;L) = ∫_0^∞ f(t){r′(τ1 + t) + R′1(τ1 + t, τ2, . . . ,τm; L1)}dt
≤ E[r′(τ1 + P1)] ≤ E[r′(τ1 + P2)] = E[R′1(τ1, τ2 + P1, . . . ,τm; L1)].

Thus (2.6) holds for n = 2. Suppose that the lemma is true when there are fewer than n jobs to process. Let L2 = (3,4, . . . ,n). If τ1 = τ2, then the lemma is true with equality. If τ1 < τ2, then

R′1(τ;L) = E[r′(τ1 + P1) + R′1(τ1 + P1, τ2, . . . ,τm; L1)]
≤ E[r′(τ1 + P1) + R′1(τ1 + P1, τ2 + P2, . . . ,τm; L2)]
≤ E[r′(τ1 + P2) + R′1(τ1 + P2, τ2 + P1, . . . ,τm; L2)]
= E[R′1(τ1, τ2 + P1, . . . ,τm; L1)],

where the first inequality follows from the inductive hypothesis, and the second inequality follows from r′(t) being non-decreasing and part (b) of Lemma 2.2. This completes the proof of the lemma.

The theorem below is the main result, which shows that SEPT is the optimal static list policy and the optimal non-preemptive dynamic policy.

Theorem 2.2. Suppose that n jobs have processing times which can be stochastically ordered and job preemption is not allowed. Then SEPT maximizes the expected reward R(τ,ζ) in the class of static list policies and in the class of non-preemptive dynamic policies.

Proof. We first establish the optimality of SEPT in the class of static list policies. The proof is by induction on n. The result is trivial for n = 1. Suppose that the result is true when there are fewer than n jobs to process. Consider a static list policy which begins by processing job k (k > 1) on machine 1 (the first machine to become available). By the inductive hypothesis it must be optimal to start job 1 next and then start the remaining jobs according to the SEPT list policy Lk, where Lk is (2,3, . . . ,n), omitting job k. Thus amongst those policies that start processing job k first, the best one is the list policy (k,1,Lk), denoted by Lk,1. Interchanging jobs k and 1 generates the list policy L1,k = (1,k,Lk). We shall show that L1,k is better than Lk,1 in the sense that ∆ = R(τ;L1,k) − R(τ;Lk,1) ≥ 0. Assuming this, by the inductive hypothesis and continuing the interchange argument, we can show that L = (1,2, . . . ,n) is optimal.

Let R(τ;S;c) be the expected reward when the policy S is applied, conditional on Pk = c. We shall shortly show that ∆(c) = R(τ;L1,k;c) − R(τ;Lk,1;c) is non-decreasing in c. If so, we have ∆(Pk) ≥st ∆(X1) for any random variable X1 independent of P1, . . . ,Pn and identically distributed as P1. By taking the expectation,

∆ = E[∆(Pk)] ≥ E[∆(X1)] = 0,

where E[∆(X1)] = 0 because X1 and P1 are identically distributed. Therefore, the optimality of SEPT will follow once we show that ∆(c) is non-decreasing in c. It is easy to see that

R(τ;L1,k;c) = E[ I{P1<τ2−τ1}{r(τ1 + P1) + r(τ1 + P1 + c) + R(τ1 + P1 + c, τ2, . . . ,τm; Lk)}
+ I{P1≥τ2−τ1}{r(τ1 + P1) + r(τ2 + c) + R(τ1 + P1, τ2 + c, . . . ,τm; Lk)} ]

and

R(τ;Lk,1;c) = r(τ1 + c) + R(τ1 + c, τ2, . . . ,τm; (1)+Lk),

where (1)+Lk denotes the list policy with job 1 processed first, followed by Lk. Differentiation of the above gives

dR(τ;L1,k;c)/dc = E[ I{P1<τ2−τ1}{r′(τ1 + P1 + c) + R′1(τ1 + P1 + c, τ2, . . . ,τm; Lk)}
+ I{P1≥τ2−τ1}{r′(τ2 + c) + R′2(τ1 + P1, τ2 + c, . . . ,τm; Lk)} ]
= E[ I{P1<τ2−τ1}{r′(τ1 + P1 + c) + R′1(τ1 + P1 + c, τ2, . . . ,τm; Lk)}
+ I{P1≥τ2−τ1}{r′(τ2 + c) + R′1(τ2 + c, τ1 + P1, . . . ,τm; Lk)} ],

where the second equality follows by the fact that the machines are identical, and

dR(τ;Lk,1;c)/dc = r′(τ1 + c) + R′1(τ1 + c, τ2, . . . ,τm; (1)+Lk)
≤ E[r′(τ1 + c) + R′1(τ1 + c, τ2 + P1, . . . ,τm; Lk)],

where the inequality follows from Lemma 2.3. Thus

d∆(c)/dc ≥ E[ I{P1<τ2−τ1}{r′(τ1 + P1 + c) + R′1(τ1 + P1 + c, τ2, . . . ,τm; Lk)}
+ I{P1≥τ2−τ1}{r′(τ2 + c) + R′1(τ2 + c, τ1 + P1, . . . ,τm; Lk)}
− {r′(τ1 + c) + R′1(τ1 + c, τ2 + P1, . . . ,τm; Lk)} ].


Using part (a) of Lemma 2.2, τ1 ≤ τ2 and the non-decreasing r′(t), we can see that the expression under the expectation above is non-negative. This shows that ∆(c) is non-decreasing in c, and completes the inductive step. Consequently, the optimality of SEPT in the class of static list policies is proven.

Observe that SEPT is an optimal static list policy for any subset of k jobs and starting times (τ1,τ2, . . . ,τm) on the m machines. Thus, that SEPT is optimal in the class of non-preemptive dynamic policies can be proven by a similar induction argument as in the proof of Theorem 2.1.

Based on the observation that a job will never be preempted once it is started if its processing time has a non-decreasing hazard rate, we have

Corollary 2.2. If all jobs have non-decreasing hazard rate functions, then SEPT is optimal to maximize R(τ,ζ) in the class of preemptive dynamic policies.

By examining the proof of Theorem 2.2, one can see that the result is still true for some models in which the reward obtained on completing each job differs from job to job, under a compatibility condition as specified in the theorem below.

Theorem 2.3. If ri(t) are job-dependent, the results of Theorem 2.2 and Corollary 2.2 remain valid under the following conditions: the processing times of the jobs can be stochastically ordered and Pi ≤st Pj ⇒ ri(t) ≤ rj(t) for all t and i, j.

Consider ri(t) = −wi t. Then the problem of maximizing R(τ,ζ) reduces to one of minimizing the expected weighted flowtime. The compatibility condition of Theorem 2.3 reduces to Pi ≤st Pj ⇒ wi ≥ wj for any i, j.

2.2 Makespan

In this section we focus on the problem to minimize the expected makespan (the maximum completion time amongst all jobs). We assume that the processing time for any job does not depend on the scheduling policy.1 The makespan measure is thus trivial when there is only one machine (because the expected makespan is equal to the total expected processing time). We will consider the problem with multiple parallel machines in this section, and show that the optimal policy should process the jobs in the non-increasing order of the expected processing times E[Pi], that is, according to the longest expected processing time (LEPT) rule. This scheduling policy is in sharp contrast to the SEPT rule, and is reasonable for the makespan measure because it proposes, in a way, to deal with longer jobs first, so that shorter jobs can be used for allocation to different machines to reduce their idle times. The proof of the optimality of LEPT is limited to the problem where the processing times are all exponentially distributed.

1 An alternative assumption is that the setup time, if any, before processing any job does not depend on the processing sequence and so has been included in the processing time. This is an assumption we implicitly make throughout the whole book.

More specifically, suppose that n jobs are to be processed on m identical machines operating in parallel. All jobs are available for processing at time zero. The processing times Pi, i = 1,2, . . . ,n, are exponentially distributed with rates λ1,λ2, . . . ,λn. Preemption is allowed, but will not be needed under the LEPT rule. This is due to the memoryless property of the exponential processing times, with which a job remains the highest-priority job once it is selected. Without loss of generality, suppose that the jobs have been numbered so that λ1 ≤ · · · ≤ λn.
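As a quick numerical illustration of this setting (our own sketch, not from the text; function names and rates are hypothetical), the following code estimates the expected makespan on two machines under the LEPT and SEPT list orders for exponential processing times. By Theorem 2.4 below, the LEPT order should produce the smaller estimate.

```python
import heapq
import random

def makespan(order, rates, m=2):
    """Makespan of one realisation of a non-preemptive list policy with exponential times."""
    free = [0.0] * m
    heapq.heapify(free)
    finish = 0.0
    for j in order:
        start = heapq.heappop(free)
        end = start + random.expovariate(rates[j])
        finish = max(finish, end)
        heapq.heappush(free, end)
    return finish

def estimate(order, rates, reps=100_000):
    return sum(makespan(order, rates) for _ in range(reps)) / reps

random.seed(1)
rates = [0.5, 1.0, 2.0, 4.0]                                   # E[P_i] = 1 / rate
lept = sorted(range(len(rates)), key=lambda j: rates[j])       # longest expected time first
sept = list(reversed(lept))                                    # shortest expected time first
print("LEPT:", estimate(lept, rates), "SEPT:", estimate(sept, rates))
```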

The approach of dynamic programming will be used to establish the optimality of LEPT. The proof below is based on Weber (1982a).

Theorem 2.4. The LEPT rule minimizes the expected makespan in the class of dynamic policies.

Proof. To simplify the notation, the proof is conducted for the case of two machines. A similar approach can be used for the general case with any number of machines. The proof is by induction on n. Suppose that the claim is true for fewer than n jobs. We will show that it is true when there are n jobs to process. It is clear that the optimal policy is non-preemptive due to the memoryless property of the exponential processing times, and that if processing jobs i and j is optimal at time t, then this will remain to be the case until one of them is completed.

Let U^J denote the expected value of the remaining time needed to complete all n jobs under an optimal policy, given that the jobs in the subset J = {i1, . . . , iℓ} have already been completed. If J = ∅, then U^J is denoted by U. Let V^J denote the same quantity under the LEPT policy, and V = V^∅.

Conditioning on the first job completion and by the inductive hypothesis that LEPT is optimal when there are fewer than n jobs, we obtain

U = min_{i,j} { 1/(λi + λj) + [λi/(λi + λj)] V^i + [λj/(λi + λj)] V^j },    (2.7)

where the first term on the RHS is the expected time until the first job is completed, and the second (third) term is equal to the probability that job i (j) is the first job to be completed, multiplied by the expected remaining time needed to complete the n − 1 remaining jobs under LEPT. It is easy to see that (2.7) is equivalent to

0 = min_{i,j} { 1 + λi(V^i − V) + λj(V^j − V) + (λi + λj)(V − U) }.    (2.8)


Since λ1 and λ2 are the two smallest values of λk and V ≥ U, the fourth term on the RHS of (2.8) is minimized by {i, j} = {1,2}. Hence to show that LEPT is optimal it suffices to show that (i, j) = (1,2) also minimizes the sum of the second and third terms. Define

Vi = λi(V^i − V)  and  Dij = Vi − Vj.

The result of Lemma 2.4 below shows that if λi < λj, then Dij ≤ 0. Hence the sum of the second and third terms on the RHS of (2.8), Vi + Vj, is minimized by (i, j) = (1,2) and the induction is complete.

In what follows we shall consider V, Vi and Dij as functions of λ1, . . . ,λn. Define V^J_i and D^J_{ij} similarly to Vi and Dij, except that i and j are excluded from J. For example, V^J_i = λi(V^{Ji} − V^J), where Ji denotes the list J with job i appended.

Lemma 2.4. Suppose λi < λj. Then

Dij ≤ 0  and  dD12/dλ1 ≥ 0.    (2.9)

Proof. The proof is by induction on n. When n = 2, Dij = (λi − λj)/(λi + λj) and the lemma is true. If i and j are the two smallest indices not in the subset J, then jobs i and j will be processed first. Conditioning on the first job completion, we have

V^J = 1/(λi + λj) + [λi/(λi + λj)] V^{Ji} + [λj/(λi + λj)] V^{Jj},

or (λi + λj)V^J = 1 + λi V^{Ji} + λj V^{Jj}. This together with the definition of V^J_i allows us to derive the following identities.

(λ1 + λ2 + λ3)V1 = λ1(λ1 + λ2 + λ3)V^1 − λ1(λ1 + λ2 + λ3)V
= λ1(1 + λ1V^1 + λ2V^{12} + λ3V^{13}) − λ1(1 + λ1V^1 + λ2V^2 + λ3V)
= λ1(λ3V^{13} − λ3V^1) + λ2(λ1V^{12} − λ1V^2) + λ3V1,

or (λ1 + λ2)V1 = λ1V^1_3 + λ2V^2_1. We can establish the following similarly.

(λ1 + λ2)V2 = λ1V^1_2 + λ2V^2_3,   (λ1 + λ2)Vi = λ1V^1_i + λ2V^2_i,   i = 3, . . . ,n.

Combining these we have

D12 = [λ1/(λ1 + λ2)] D^1_{32} + [λ2/(λ1 + λ2)] D^2_{13}    (2.10)

and

D2i = [λ1/(λ1 + λ2)] D^1_{2i} + [λ2/(λ1 + λ2)] D^2_{3i},   i = 3, . . . ,n.    (2.11)


The inductive hypothesis states that (2.9) is true when there are fewer than n jobs to process, and this hypothesis for the first inequality in (2.9) implies that both D^2_{13} ≤ 0 and D^1_{23} ≤ 0 are true when there are n jobs to process. The hypothesis for the second inequality in (2.9) similarly implies that dD^2_{13}/dλ1 ≥ 0. By integrating this with respect to λ1 we have D^2_{13} ≤ D^1_{23} = −D^1_{32} ≤ 0. Since λ1 ≤ λ2, we can see that D12 ≤ 0. The inductive hypothesis also implies that D^1_{2i} and D^2_{3i} are nonpositive, and thus D2i ≤ 0. Combining these inequalities establishes the inductive step for the first inequality in (2.9). The inductive step for the second inequality in (2.9) is established by differentiating the RHS of (2.10) with respect to λ1 and then using the inductive hypothesis to show that every term is nonnegative.

Theorem 2.4 shows that LEPT is optimal in the class of preemptive dynamic policies. The LEPT policy, however, requires no job preemption, and is thus also optimal in the class of static list policies and in the class of non-preemptive dynamic policies.

2.3 Regular Costs with Due Dates

We consider a few basic due-date related models in this section. More results on stochastic scheduling problems involving due dates can be found in Sects. 2.4–2.6, as special cases of general cost functions. The analyses in this section are based on Pinedo (1983) and Emmons and Pinedo (1990).

2.3.1 Weighted Number of Tardy Jobs

Suppose that n jobs are to be processed on a single machine. The processing time of job i is a random variable Pi, exponentially distributed with rate λi. Job i has a due date Di, which is a random variable with distribution Fi. If the job is completed later than the due date, then there is a constant tardy cost wi, which is also called the weight of job i. The objective is to determine a scheduling policy ζ to process the n jobs so as to minimize the objective function:

EWNT(ζ) = E[ ∑_{i=1}^{n} wi I{Ci≥Di} ],    (2.12)

where Ci is the completion time of job i under the scheduling policy ζ and Di is the due date of job i. The objective function EWNT(ζ) represents the expected total tardy penalty of missing due dates: if job i misses its due date Di, it incurs a fixed penalty wi. The model is also commonly referred to as one of minimizing the expected weighted number of tardy jobs. If wi = 1 for all jobs i, it reduces to one of minimizing the expected total number of tardy jobs.

Since the processing times are exponentially distributed with means 1/λj, the WSEPT rule processes the jobs in the non-increasing order of λjwj. The following theorem shows the optimality of WSEPT in the class of non-preemptive static list policies.

Theorem 2.5. If all due dates have the same cumulative distribution function (cdf) F, then processing the jobs in non-increasing order of λjwj is optimal to minimize EWNT(ζ) in the class of non-preemptive static list policies.

Proof. Consider first the case with two jobs only, where there are only two possible job sequences (1,2) and (2,1). Then

EWNT(1,2) = w1 Pr(X1 > D1) + w2 Pr(X1 + X2 > D2)
= w1 ∫_0^∞ e^{−λ1x} f(x)dx + [w2/(λ1 − λ2)] ∫_0^∞ (λ1e^{−λ2x} − λ2e^{−λ1x}) f(x)dx.

Similarly, we can get the expression for EWNT(2,1), and show that

EWNT(1,2) − EWNT(2,1) = (λ2w2 − λ1w1) ∫_0^∞ K(x) f(x)dx,    (2.13)

where K(x) = (e^{−λ1x} − e^{−λ2x})/(λ2 − λ1) > 0 for x > 0. Thus

EWNT (1,2)≤ EWNT (2,1) when λ1w1 ≥ λ2w2.

The argument can be extended to the case with n jobs. Compare the sequence 1, . . . , i−1, i+1, i, i+2, . . . ,n with the sequence 1, . . . , i−1, i, i+1, i+2, . . . ,n. It is clear that the expected tardy penalties of jobs 1, . . . , i−1 and i+2, . . . ,n are the same in the two sequences. Therefore, we need only compare the sum of the expected tardy penalties of jobs i and i+1 in the two sequences. Conditional on the time that job i−1 is finished, the problem of comparing the sum of the expected tardy penalties of jobs i and i+1 in the two sequences reduces to the case of two jobs as described above. We can thus show that the total expected tardy penalty can be reduced if the jobs are not processed in the non-increasing order of λjwj.
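A quick Monte Carlo check of Theorem 2.5 can be coded in a few lines. The sketch below is our own illustration (hypothetical rates, weights and due-date distribution; a fresh due date is drawn independently for each job from the common cdf, which is an assumption of the sketch rather than of the theorem).

```python
import random

def ewnt(order, rates, weights, sample_due, reps=100_000):
    """Monte Carlo estimate of E[sum_i w_i 1{C_i > D_i}] on a single machine."""
    total = 0.0
    for _ in range(reps):
        t, cost = 0.0, 0.0
        for j in order:
            t += random.expovariate(rates[j])    # exponential processing time of job j
            if t > sample_due():                 # due date drawn from the common cdf F
                cost += weights[j]
        total += cost
    return total / reps

random.seed(2)
rates, weights = [2.0, 1.0, 0.5], [1.0, 3.0, 2.0]            # lambda_j and w_j
wsept = sorted(range(3), key=lambda j: rates[j] * weights[j], reverse=True)
sample_due = lambda: random.uniform(0.0, 4.0)                # common due-date distribution
print("WSEPT   :", ewnt(wsept, rates, weights, sample_due))
print("reversed:", ewnt(list(reversed(wsept)), rates, weights, sample_due))
```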

We now consider the optimal policy in the class of dynamic policies. We limit our consideration here to the case with a common, deterministic due date d for all jobs. Generally speaking, the argument to show that a static list policy is optimal also in the class of dynamic policies is mainly based on the observation that it is not necessary to alter the processing priority of a job once a processing policy has been applied (consequently, the optimal static sequence is also an optimal dynamic policy). If the job due dates are random and distinct, this argument may not be valid. This can be seen from the following example.


Example 2.2. Consider a case with random due dates and optimal static sequence (1,2, . . . ,n). Suppose that when the machine is processing job 1, the due date of job 2 is realized. As a result, when job 1 is finished, job 2 (which would follow job 1 under the static policy) should be preempted by job 3, because job 2 is already tardy and it will incur a fixed tardy penalty w2 no matter when it is processed. As a result, job 2 should be re-sequenced to the end of the job sequence.

This example shows that the optimal static sequence may no longer be optimal even in the class of non-preemptive dynamic policies, if the due dates are random and distinct. However, if all jobs have the same deterministic due date, then the claim that an optimal static policy WSEPT is also an optimal dynamic policy can be established.

Theorem 2.6. Suppose that all jobs have the common fixed due date d. Then, processing the jobs in the non-increasing order of λjwj is optimal to minimize EWNT(ζ) in the class of non-preemptive dynamic policies and in the class of preemptive dynamic policies.

Proof. We first show that the static policy WSEPT is optimal in the class of non-preemptive dynamic policies. The proof is by induction on n. It follows from (2.13) that the claim is true for n = 2. Suppose that the claim holds for k − 1 jobs, which start at time t. We consider the case with k jobs, to start at time t′ < t. According to the inductive hypothesis, a non-preemptive dynamic policy must first process one job at time t′, and after finishing this job, process the remaining k − 1 jobs in the non-increasing order of λjwj. The optimal policy must select the job with the highest λjwj to be processed first. Otherwise, this job must be the second to be processed due to the WSEPT order for the k − 1 jobs. Then, following an analysis similar to that in the proof of Theorem 2.5, we can show that interchanging the positions of the first two jobs will reduce the expected value of the objective function. This completes the inductive step.

The proof that the static policy WSEPT is optimal also in the class of preemptive dynamic policies can be established using the memoryless property of the processing times, which ensures that a job will never be preempted once it is selected for processing, because under the WSEPT rule, λjwj and the due date d do not depend on time t. Thus preemption is not needed even though it is allowed, and consequently, the optimal non-preemptive dynamic policy is optimal in the class of preemptive dynamic policies.

The WSEPT rule remains optimal for the problem where the jobs have a common due date D, which is a random variable with an arbitrary distribution F. The proof can be found in Derman et al. (1978).

We now consider the problem with m identical machines operating in parallel. Again, the objective is to minimize EWNT(ζ); that is, to minimize the expected sum of tardy costs, where the tardy cost for a job i is a fixed penalty wi when it misses its due date. We can show that, under some quite restrictive conditions, the optimal non-preemptive static list policy can be determined by solving a deterministic assignment problem. The idea is to optimally assign the n jobs to the n positions of a list policy, under certain assignment costs. The assignment costs can be pre-calculated if the processing times and the due dates satisfy some conditions as shown below.

First, consider the case where all processing times are deterministic and equal (thus, without loss of generality, they can be assumed to equal 1). The weights wi are job dependent, and the due dates Di of the jobs are random variables, following arbitrary distributions Fi(x), i = 1,2, . . . ,n. Job preemption is not allowed. Then, under a static list policy, the first batch of m jobs start on the m machines at time zero, and complete their processing at time 1. Thus, the probability for a job j in this batch to be overdue is Fj(1), and so the expected cost is wjFj(1). Similarly, the second batch of m jobs complete their processing at time 2, and the corresponding expected cost is wjFj(2), etc. To summarize, we have the following theorem.

Theorem 2.7. Suppose that m parallel identical machines are to process n jobs with processing times being deterministic and equal to 1. Then, the optimal non-preemptive static list policy to minimize EWNT(ζ) can be obtained by solving a deterministic assignment problem with the following cost matrix: If job j is assigned to position i in the static list, where km + 1 ≤ i ≤ (k + 1)m, then the cost is wjFj(k + 1), k = 0,1,2, . . . . The optimal assignment solution that minimizes the total assignment cost specifies the optimal non-preemptive static list policy.
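In computational terms, Theorem 2.7 reduces the scheduling problem to a standard linear assignment problem, so an off-the-shelf solver can be used. The sketch below is our own illustration (the due-date cdfs and data are hypothetical); it builds the position-dependent cost matrix wjFj(k+1) and solves it with scipy.optimize.linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import expon

# Hypothetical data: 5 jobs, 2 machines, unit processing times.
m = 2
w = np.array([4.0, 1.0, 2.5, 3.0, 1.5])
due_cdfs = [expon(scale=s).cdf for s in (1.0, 2.0, 3.0, 1.5, 2.5)]   # F_j
n = len(w)

# cost[j, i]: expected tardy cost of job j in (0-based) list position i.
# Position i belongs to batch k = i // m and completes at time k + 1.
cost = np.array([[w[j] * due_cdfs[j]((i // m) + 1) for i in range(n)]
                 for j in range(n)])

rows, cols = linear_sum_assignment(cost)                 # optimal job -> position
order = [int(j) for _, j in sorted(zip(cols, rows))]     # list order by position
print("optimal static list:", order, "expected cost:", cost[rows, cols].sum())
```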

We now consider random processing times that are i.i.d. exponential with mean 1. The due date of job i is also exponential, with rate µi, i = 1,2, . . . ,n. The due dates do not have to be independent. Again, we can show that the optimal non-preemptive static list policy can be obtained by solving a deterministic assignment problem. Clearly, the first batch of m jobs in a static list policy start their processing on the m machines at time 0. The probability for a job j amongst this batch to miss its due date is µj/(1 + µj), and so the expected cost is wjµj/(1 + µj). Job j in position i of the list policy, i = m+1, . . . ,n, has to wait for i − m job completions before its processing starts. Given that all machines are busy, the time between successive completions is exponentially distributed with rate m. Thus, the probability that a job starts before its due date is (m/(m + µj))^{i−m}, and so the probability that it completes before its due date is (m/(m + µj))^{i−m}/(1 + µj). Consequently, the probability for the job to miss its due date is 1 − (m/(m + µj))^{i−m}/(1 + µj), and the expected cost is wj(1 − (m/(m + µj))^{i−m}/(1 + µj)). To summarize, we have the following theorem.

Theorem 2.8. Suppose that m parallel identical machines are to process n jobs, where the processing times are i.i.d. exponential with mean 1, and the due dates of the jobs are exponential random variables with rates µi, i = 1,2, . . . ,n. Then, the optimal non-preemptive static list policy to minimize EWNT(ζ) can be obtained by solving a deterministic assignment problem with the following cost matrix: If job j is assigned to position i ∈ {1,2, . . . ,m} in the static list, then the expected cost is wjµj/(1 + µj); if job j is assigned to position i ∈ {m+1, . . . ,n}, then the expected cost is

wj ( 1 − (m/(m + µj))^{i−m} · 1/(1 + µj) ).

The optimal assignment solution that minimizes the total assignment cost specifies the optimal non-preemptive static list policy.
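The same assignment machinery applies here; only the cost matrix changes. A short sketch of the cost construction under Theorem 2.8 (again our own illustration with hypothetical data) follows.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

m = 2
w  = np.array([4.0, 1.0, 2.5, 3.0, 1.5])    # weights w_j (hypothetical)
mu = np.array([0.5, 1.0, 0.2, 2.0, 0.8])    # due-date rates mu_j (hypothetical)
n = len(w)

def cost(j, i):
    """Expected tardy cost of job j in 0-based list position i (position i+1 in the text)."""
    pos = i + 1
    if pos <= m:
        return w[j] * mu[j] / (1.0 + mu[j])
    starts_in_time = (m / (m + mu[j])) ** (pos - m)     # waits for pos - m completions
    return w[j] * (1.0 - starts_in_time / (1.0 + mu[j]))

C = np.array([[cost(j, i) for i in range(n)] for j in range(n)])
rows, cols = linear_sum_assignment(C)
print("minimum total expected cost:", C[rows, cols].sum())
```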

2.3.2 Total Weighted Tardiness

Suppose that n jobs are to be processed on a single machine. The processing time of job i is a random variable Pi, exponentially distributed with rate λi. Job i has a due date Di, which is a random variable with cdf Fi. If job i is completed at time Ci > Di (missing its due date), then it incurs a tardiness cost wiTi, where wi is the unit tardiness cost (also called the weight of job i) and Ti = max{Ci − Di, 0} is the tardiness. The objective is to determine a scheduling policy ζ to process the n jobs so as to minimize the expected total tardiness cost:

EWT(ζ) = E[ ∑_{i=1}^{n} wi Ti ].    (2.14)

This objective function is also referred to as the expected total weighted tardiness.

Since the processing times are exponentially distributed with means 1/λj, the WSEPT rule processes the jobs in the non-increasing order of λjwj. We will show that WSEPT is optimal in the class of static list policies, under a compatibility condition that requires λkwk ≥ λlwl ⇒ Dk ≤st Dl, i.e., the due date of job k is stochastically smaller than the due date of job l for every pair of jobs k, l such that λkwk ≥ λlwl. Note that if the jobs have a common due date distribution, the compatibility condition is satisfied automatically.

Theorem 2.9. When λkwk ≥ λlwl ⇒ Dk ≤st Dl, sequencing the jobs in the non-increasing order of λjwj is optimal to minimize EWT(ζ) in the class of non-preemptive static list policies.

Proof. Consider first the case with two jobs only, so there are only two possible job sequences (1,2) and (2,1). Then


EWT(1,2) = (w1/λ1) Pr(X1 > D1) + (w2/λ1) Pr(X1 > D2) + (w2/λ2) Pr(X1 + X2 > D2)
= (w1/λ1) ∫_0^∞ e^{−λ1x} f1(x)dx + (w2/λ1) ∫_0^∞ e^{−λ1x} f2(x)dx
+ [w2/(λ2(λ1 − λ2))] ∫_0^∞ (λ1e^{−λ2x} − λ2e^{−λ1x}) f2(x)dx.

Similarly, we can derive the expression for EWT (2,1). These lead to

EWT(1,2) − EWT(2,1) = −w1λ1 ∫_0^∞ H(x) f1(x)dx + w2λ2 ∫_0^∞ H(x) f2(x)dx,

where

H(x) = (λ2e^{−λ1x} − λ1e^{−λ2x}) / (λ1λ2(λ2 − λ1))

decreases monotonically from 1/(λ1λ2) to 0 on [0,∞). Hence, by the property of stochastic ordering, if D1 ≤st D2, then

∫_0^∞ H(x) f1(x)dx ≥ ∫_0^∞ H(x) f2(x)dx.

So EWT(1,2) ≤ EWT(2,1) when λ1w1 ≥ λ2w2 and D1 ≤st D2.

The argument can be extended to the case with n jobs, similar to the last part of the proof of Theorem 2.5.

We now examine the optimal policy in the class of dynamic policies. Again, here we limit our analysis to the case with a common, deterministic due date d for all jobs. With a fixed due date d, we can employ an idea of Pinedo (2002) to convert the weighted tardiness wjTj to a sum of weighted numbers of tardy jobs. Specifically, the tardiness Tj can be approximated by an infinite series of tardy indicators:

Tj ≈ ε ∑_{k=0}^{∞} I{Cj ≥ d + kε}.

It follows from Theorem 2.6 that the same WSEPT rule minimizes the tardy penalty ∑ wj I{Cj ≥ d + kε} for each k. Consequently, it also minimizes their sum over k. This together with a continuity argument gives rise to the following theorem.

Theorem 2.10. When all jobs have a common deterministic due date d, processing the jobs in the non-increasing order of λjwj is optimal to minimize EWT(ζ) in the class of non-preemptive dynamic policies and in the class of preemptive policies.

Pinedo (1983) shows that the optimality of WSEPT extends to the case of random due dates, under a compatibility condition that requires the distributions of the due dates to be nonoverlapping and compatible with the order of WSEPT, in the sense that λkwk ≥ λlwl implies Pr(Dk ≤ Dl) = 1.


2.4 General Regular Costs

We now consider more general regular cost functions. We limit our consideration to non-preemptive static policies and problems with a single machine. The cost functions under consideration in this section are, nevertheless, stochastic processes, which are to be elaborated below. We will see that, with a unified treatment of such general cost functions, numerous results established in the literature, including some of those presented in the previous sections, are covered as special cases. The exposition in this section is mainly based on Zhou and Cai (1997).

Again, note that the completion time of job i under a sequence (static policy) π can be expressed as

Ci = Ci(π) = ∑_{k∈Bi(π)} Pk,    (2.15)

where Bi(π) denotes the set of jobs scheduled no later than job i under sequence π. Let fi(Ci) denote the cost of processing job i, where fi(·) is a general regular (stochastic) cost function under the following assumptions:

(i) {fi(t), t ≥ 0} is a stochastic process independent of the processing times Pi;

(ii) {fi(t), t ≥ 0} is nondecreasing in t ≥ 0 almost surely; and

(iii) mi(t) = E[ fi(t)] exists and is finite for every t ≥ 0.

This cost function fi(·) is general enough to cover most regular costs, deterministic or stochastic, that have been studied in the literature. Examples include the following (a small computational sketch follows the list):

• Weighted flowtime: fi(t) = wi t ⇒ fi(Ci) = wiCi;

• Tardiness: fi(t) = max{0, t − Di} ⇒ fi(Ci) = max{0, Ci − Di}, where Di is the due date of job i;

• Weighted number of tardy jobs: fi(t) = wi I{t>Di} ⇒ fi(Ci) = wi I{Ci>Di}.
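As a sketch of how such cost functions can be evaluated in practice (our own illustration; the particular weights, due dates and processing-time distributions are hypothetical), the code below represents each fi as a function of the completion time and estimates the total expected cost of a given sequence by simulation.

```python
import random

def tec(order, sample_time, costs, reps=50_000):
    """Monte Carlo estimate of TEC(pi) = sum_i E[f_i(C_i)] on a single machine."""
    total = 0.0
    for _ in range(reps):
        t = 0.0
        for j in order:
            t += sample_time(j)          # a realisation of P_j
            total += costs[j](t)         # f_j evaluated at the completion time C_j
    return total / reps

random.seed(3)
w, d = [2.0, 1.0, 3.0], [1.0, 2.0, 3.0]                   # weights and fixed due dates
costs = [
    lambda t: w[0] * t,                                   # weighted flowtime
    lambda t: max(0.0, t - d[1]),                         # tardiness
    lambda t: w[2] * (1.0 if t > d[2] else 0.0),          # weighted tardy indicator
]
sample_time = lambda j: random.expovariate([1.0, 0.8, 0.5][j])
print(tec([0, 1, 2], sample_time, costs))
```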

We will address the following two types of performance measures with general regular costs:

(i) Total Expected Cost:

TEC(π) = ∑_{i=1}^{n} E[fi(Ci)];    (2.16)

(ii) Maximum Expected Cost:

MEC(π) = max_{1≤i≤n} E[fi(Ci)].    (2.17)


2.4.1 Total Expected Cost

We first state some properties regarding stochastic order in a lemma, whose proof can be found in, e.g., Zhou and Cai (1997).

Lemma 2.5.

(i) If X ≤st Y and U is independent of (X ,Y ), then X +U ≤st Y +U.

(ii) If X ≤st Y and f (t) is nondecreasing in t ≥ 0 a.s., then f (X)≤st f (Y ).

(iii) If X ≤ Y a.s. and E[ f (t)] is nondecreasing in t ≥ 0, then E[ f (X)]≤ E[ f (Y )].

The main result for the TEC problem is as follows.

Theorem 2.11.

(i) Let π = (. . . , i, j, . . . ) and π′ = (. . . , j, i, . . . ) be two job sequences with identical order except that two consecutive jobs i and j are interchanged. If Pi ≤st Pj and (mi − mj)(t) = mi(t) − mj(t) is nondecreasing in t ≥ 0, where mi(t) = E[fi(t)], then TEC(π) ≤ TEC(π′).

(ii) If the jobs can be arranged such that P1 ≤st P2 ≤st · · · ≤st Pn, and (mi − mj)(t) is nondecreasing in t for any i < j, then the sequence π∗ = (1,2, . . . ,n) minimizes TEC(π). In other words, a sequence in nondecreasing stochastic order of the processing times is optimal.

Proof. Clearly, Part (ii) of the theorem follows immediately from Part (i). Hence it suffices to prove Part (i) only.

By (2.15), it is easy to see that Bk(π) = Bk(π′) for k ≠ i, j, hence

Ck(π) = Ck(π′)  if k ≠ i, j.    (2.18)

Moreover, let C denote the completion time of the job sequenced just before job i under π (which is the same job sequenced just before job j under π′). Then

Ci(π′) = C + Pj + Pi = C + Pi + Pj = Cj(π).    (2.19)

It follows that

TEC(π) − TEC(π′) = E[fi(Ci(π))] + E[fj(Cj(π))] − E[fj(Cj(π′))] − E[fi(Ci(π′))]
= E[fi(Ci(π)) − fi(Cj(π′))] + E[fi(Cj(π′)) − fj(Cj(π′))] − E[fi(Ci(π′)) − fj(Ci(π′))]
= E[fi(Ci(π))] − E[fi(Cj(π′))] + E[(fi − fj)(Cj(π′))] − E[(fi − fj)(Ci(π′))].    (2.20)


By the independence between jobs, C is independent of Pi and Pj. Thus by Part (i) of Lemma 2.5,

Pi ≤st Pj ⟹ Ci(π) = C + Pi ≤st C + Pj = Cj(π′).

It then follows from Part (ii) of Lemma 2.5 that fi(Ci(π)) ≤st fi(Cj(π′)), which implies

E[ fi(Ci(π))]≤ E[ fi(Cj(π ′))]. (2.21)

Furthermore, since

Cj(π ′) =C+Pj ≤C+Pj +Pi =Ci(π ′) a.s.

and E[(fi − fj)(t)] = (mi − mj)(t) is a nondecreasing function of t by the assumption of the theorem, Part (iii) of Lemma 2.5 implies

E[(fi − fj)(Cj(π′))] ≤ E[(fi − fj)(Ci(π′))].    (2.22)

Combining (2.21) and (2.22), we get TEC(π) − TEC(π′) ≤ 0 from (2.20), which proves Part (i) of the theorem. Part (ii) then follows.

Remark 2.1. A key assumption in Theorem 2.11 is that the processing times Pi have a stochastic order. Such an order often exists and reduces to the order of the means when the Pi follow a certain family of distributions. Examples include:

1. Exponential distributions: If Pi are exponentially distributed, then Pi ≤st Pj if and only if E[Pi] ≤ E[Pj].

2. Normal distributions: If Pi are normally distributed with a common variance, then Pi ≤st Pj if and only if E[Pi] ≤ E[Pj].

3. Uniform distributions: If Pi are uniformly distributed over intervals [0,bi], then Pi ≤st Pj ⟺ bi ≤ bj ⟺ E[Pi] ≤ E[Pj].

4. Gamma distributions: If Pi are gamma distributed with a common shape parameter, then Pi ≤st Pj if and only if E[Pi] ≤ E[Pj].

5. Poisson distributions: If Pi are Poisson distributed, then Pi ≤st Pj if and only if E[Pi] ≤ E[Pj].

The next theorem addresses a class of problems involving ‘due dates’.

Theorem 2.12. Let fi(t) = wi g(t − Di) I{t>Di}, i = 1, . . . ,n, where D1,D2, . . . ,Dn are nonnegative random variables (due dates) following arbitrary distributions, wi is a deterministic weight associated with job i, and g(·) is a strictly increasing, convex and absolutely continuous function defined on [0,∞) with g(0) = 0. Then,


(i) If Di ≤st D j and wi ≥ wj, then (mi −m j)(t) is nondecreasing in t ≥ 0.

(ii) If P1 ≤st · · · ≤st Pn, D1 ≤st · · · ≤st Dn and w1 ≥ · · · ≥ wn, then π∗ = (1,2, . . . ,n) minimizes TEC(π).

Proof. By the assumptions, g(x) ≥ 0 on [0,∞), g^{−1}(x) exists on [0, g(∞)), and the derivative g′(x) of g(x) exists almost everywhere in any closed subinterval of [0,∞) and is nondecreasing on its domain. Hence

mi(t) = E[fi(t)] = ∫_0^∞ Pr(fi(t) ≥ x)dx
= ∫_0^{wi g(t)} Pr(wi g(t − Di) ≥ x, t > Di)dx
= ∫_0^{wi g(t)} Pr(Di ≤ t − g^{−1}(x/wi))dx.    (2.23)

Let y = t − g^{−1}(x/wi), so that x = wi g(t − y), dx = −wi g′(t − y)dy, x = 0 ⇒ y = t and x = wi g(t) ⇒ y = t − t = 0. Then by (2.23),

mi(t) = ∫_0^t Pr(Di ≤ y) wi g′(t − y)dy.

It follows that

mi(t) − mj(t) = ∫_0^t wi g′(t − y) Pr(Di ≤ y)dy − ∫_0^t wj g′(t − y) Pr(Dj ≤ y)dy
= ∫_0^t wi g′(t − y)[Pr(Di ≤ y) − Pr(Dj ≤ y)]dy + (wi − wj) ∫_0^t g′(t − y) Pr(Dj ≤ y)dy.    (2.24)

For any i < j, the assumptions of the theorem imply wi ≥ wj and Di ≤st Dj, so that Pr(Di ≤ y) ≥ Pr(Dj ≤ y) for all y ≥ 0. Note also that g′(t − y) is nondecreasing in t. Thus (2.24) shows that mi(t) − mj(t) is nondecreasing in t. This proves Part (i) of the theorem, and Part (ii) follows from Part (i) together with Theorem 2.11.

The next theorem gives the optimal solutions when the jobs have a common cost function, with or without job-dependent weights.

Theorem 2.13. Let {f(t), t ≥ 0} be a stochastic process which is nondecreasing in t almost surely, and suppose that a stochastic order exists between the processing times P1, . . . ,Pn.

(i) A sequence in nondecreasing stochastic order of Pi minimizes

TEC(π) = ∑_{i=1}^{n} E[f(Ci)].    (2.25)


(ii) Let wi denote the weight associated with job i, i = 1, . . . ,n. If the wi are ‘agreeable’ with the processing times in the sense that the jobs can be arranged such that P1 ≤st · · · ≤st Pn and w1 ≥ · · · ≥ wn, then a sequence in nondecreasing stochastic order of Pi minimizes

TEC(π) = ∑_{i=1}^{n} wi E[f(Ci)].    (2.26)

Proof. Part (i) is obviously a special case of Part (ii). To prove (ii), it suffices, according to Theorem 2.11, to verify that mi(t) − mj(t) is nondecreasing in t if Pi <st Pj. Since f(t) is nondecreasing a.s., E[f(t)] is nondecreasing in t. Moreover, under the agreeability assumption, Pi <st Pj implies wi ≥ wj, which in turn implies that

mi(t)−m j(t) = E[wi f (t)]−E[wj f (t)] = (wi −wj)E[ f (t)]

is nondecreasing in t.

Special Cases of Total Expected Cost

Because of the generality of the probabilistic distributions of the processing times and the cost functions, Theorems 2.11–2.13 can cover many commonly studied stochastic problems with specific performance measures (usually involving linear or squared cost functions). We list a number of special cases below, which extend the previous results on these cases to more general situations.

Case 1. Expected Total Weighted Tardiness:

EWT(π) = E[ ∑_{i:Ci>Di} wi(Ci − Di) ] = ∑_{i=1}^{n} wi E[max{0, Ci − Di}].    (2.27)

When all variables are deterministic, this is a unary NP-hard problem and thus an analytical optimum is unlikely to be obtainable (cf. Lawler et al. 1982). When the Pi are exponentially distributed with rates τi, Pinedo (1983) shows that a sequence in nonincreasing order of τiwi minimizes EWT(π) under an agreeable condition that τiwi ≥ τjwj ⇒ Di ≤st Dj. We now generalize this result to the general case with random processing times and due dates following arbitrary distributions.

Take fi(t) = wi(t − Di)I{t>Di}, i = 1, . . . ,n. Then our TEC(π) equals EWT(π). Thus under the stochastic agreeable condition: P1 ≤st · · · ≤st Pn, D1 ≤st · · · ≤st Dn and w1 ≥ · · · ≥ wn, Part (ii) of Theorem 2.12 with g(x) = x shows that a sequence in nondecreasing stochastic order of Pi minimizes EWT(π).


Case 2. Expected Weighted Number of Tardy Jobs:

EWNT(π) = E[ ∑_{i:Ci>Di} wi ] = ∑_{i=1}^{n} wi Pr(Ci > Di).    (2.28)

The deterministic version of this problem is NP-hard even with equal due dates Di (Karp 1972). When the Pi are exponentially distributed with rates τi and the Di are identically distributed, Pinedo (1983) shows that a sequence in nonincreasing order of τiwi minimizes EWNT(π). Boxma and Forst (1986) give several results on the optimal sequences for cases under conditions such as i.i.d. or constant due dates, i.i.d. or exponential processing times, etc. These studies reveal that the EWNT problem is difficult and certain conditions are always needed to obtain an analytic solution. We now provide a result in the general case with random processing times and due dates.

Assume that the Di follow a common distribution D. Take fi(t) = wi I{t>Di}. Then TEC(π) = EWNT(π). By Part (ii) of Theorem 2.13, a sequence in nondecreasing stochastic order of Pi minimizes EWNT(π) under the agreeable condition as specified in Theorem 2.13.

Case 3. Weighted Lateness Probability:

WLP(π) = ∑_{i=1}^{n} wi Pr(Li > 0),    (2.29)

where Li = Ci − Di is the lateness of job i. Sarin et al. (1991) and Erel and Sarin (1989) investigated the problem with normally distributed Pi and a common deterministic due date D. Note that Pr(Li > 0) = Pr(Ci > Di). Hence the same result in Case 2 above is valid as well for WLP(π) with general distributions of Pi.

Case 4. Expected Total Weighted Squared Flowtime: Note that all cases discussed above involve linear cost functions. Theorems 2.11–2.13 can also be applied to problems with nonlinear cost functions. As an example, we consider the problem to minimize the Expected Total Weighted Squared Flowtime (EWSFT):

EWSFT(π) = E[ ∑_{i=1}^{n} wi Ci² ] = ∑_{i=1}^{n} wi E[Ci²].    (2.30)

This problem is much more difficult than the EWFT problem considered earlier. Townsend (1978) and Bagga and Kalra (1981) proposed branch-and-bound methods to solve the problem when all parameters are deterministic. Furthermore, Bagga and Kalra (1981) show that in a deterministic environment, a sequence in nondecreasing order of Pi minimizes EWSFT under an agreeable condition that Pi < Pj implies wi ≥ wj. We now generalize the result to the stochastic version with general random processing times. Take f(t) = t². Then

EWSFT(π) = ∑_{i=1}^{n} wi E[Ci²] = ∑_{i=1}^{n} wi E[f(Ci)].

Since f(t) is an increasing deterministic function, it clearly satisfies the condition of Theorem 2.13. Hence under the agreeable condition as specified in Theorem 2.13, a sequence in nondecreasing stochastic order of Pi minimizes EWSFT(π).
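For independent processing times, E[Ci²] can be computed directly from the means and variances of the jobs sequenced up to i, since E[Ci²] = Var(Ci) + (E[Ci])² and both moments accumulate along the sequence. A minimal sketch of evaluating (2.30) this way (our own illustration with hypothetical data) follows.

```python
def ewsft(order, means, variances, weights):
    """sum_i w_i E[C_i^2] for independent processing times under a fixed sequence."""
    mean_c, var_c, total = 0.0, 0.0, 0.0
    for j in order:
        mean_c += means[j]            # E[C_j] accumulates the means
        var_c += variances[j]         # Var(C_j) accumulates the variances
        total += weights[j] * (var_c + mean_c ** 2)
    return total

print(ewsft([0, 1, 2], [1.0, 2.0, 0.5], [1.0, 4.0, 0.25], [3.0, 1.0, 2.0]))
```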

2.4.2 Maximum Expected Cost

We first define an inequality relation between two functions f(x) and g(x) on [0,∞) in the usual sense: f ≤ g if and only if f(x) ≤ g(x) for all x ≥ 0. Then we have the following result for MEC(π) defined in (2.17) under general cost functions:

Theorem 2.14. If the jobs can be arranged such that m1 ≥ m2 ≥ · · · ≥ mn, then the sequence π∗ = (1,2, . . . ,n) minimizes MEC(π). In other words, if an inequality relation exists between the mean cost functions, then a sequence in nonincreasing order of mi is optimal.

Proof. Let π = (. . . , i, j, . . . ) and π′ = (. . . , j, i, . . . ). It suffices to show that mi ≥ mj implies MEC(π) ≤ MEC(π′). By (2.18),

E[fk(Ck(π))] = E[fk(Ck(π′))] ≤ max_{1≤i≤n} E[fi(Ci(π′))] = MEC(π′),   k ≠ i, j.    (2.31)

Moreover, as Ci(π) ≤ Cj(π) a.s. and mi(t) = E[fi(t)] is nondecreasing in t, Part (ii) of Lemma 2.5 together with (2.19) give

E[ fi(Ci(π))]≤ E[ fi(Cj(π))] = E[ fi(Ci(π ′))]≤ MEC(π ′). (2.32)

If mi ≥ m j, then

E[ f j(Cj(π))|Cj(π) = x] = E[ f j(x)] = m j(x)≤ mi(x) = E[ fi(Cj(π))|Cj(π) = x],

for any x ≥ 0, which implies that

E[ f j(Cj(π))]≤ E[ fi(Cj(π))] = E[ fi(Ci(π ′))]≤ MEC(π ′) (2.33)

Combining (2.31) through (2.33) we get

MEC(π) = max_{1≤i≤n} E[fi(Ci(π))] ≤ MEC(π′).

This completes the proof.


Consider the cost functions of the form

fi(t) = wi g(t − Di) I{t>Di},   i = 1, . . . ,n,

with a nondecreasing function g(·) on [0,∞). Then Theorem 2.14 can be applied to show the following result:

Theorem 2.15. If the jobs can be arranged such that D1 ≤st D2 ≤st · · · ≤st Dn and w1 ≥ w2 ≥ · · · ≥ wn, then a sequence in nondecreasing order of Di is optimal in minimizing

MEC(π) = max_{1≤i≤n} wi E[g(Ci − Di) I{Ci>Di}].    (2.34)

Proof. Let Di ≤st D j and wi ≥ wj . Similar to (2.24) we can show that

mi(t) − mj(t) = ∫_0^t wi [Pr(Di ≤ y) − Pr(Dj ≤ y)] dgt(y) + (wi − wj) ∫_0^t Pr(Dj ≤ y) dgt(y),    (2.35)

where gt(y) = −g(t − y). As g is a nondecreasing function, gt(y) is nondecreasing in y. This together with the facts that Pr(Di ≤ y) ≥ Pr(Dj ≤ y) for all y ≥ 0 (as Di ≤st Dj) and wi ≥ wj shows that the right-hand side of (2.35) is nonnegative. Hence mi(t) ≥ mj(t). Theorem 2.14 then applies to complete the proof.

Special Cases of Maximum Expected Cost

We now show the applications of the general results on MEC obtained above to some special cases, which extend the previously known results on MEC to more general situations.

Case 5. Maximum Expected Lateness:

MEL(π) = max_{1≤i≤n} E[Ci − Di].

In a deterministic environment, Jackson (1955) provided an elegant result that MEL(π) is minimized by a sequence in nondecreasing order of Di, which is referred to as the Earliest Due Date (EDD) rule. We now extend this result to the stochastic situation with random processing times and due dates.

Take fi(t) = t − Di for i = 1, . . . ,n. Then mi(t) = t − E[Di], so that mi(t) ≥ mj(t) if and only if E[Di] ≤ E[Dj]. It then follows from Theorem 2.14 that a sequence in nondecreasing order of E[Di] minimizes MEL(π). In other words, the optimal sequence is the Earliest Expected Due Date (EEDD) rule.
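The EEDD order itself is a one-line computation. The sketch below (our own illustration with hypothetical data) also reports the resulting value max_i E[Ci − Di] on a single machine, which depends only on the means for independent jobs.

```python
def eedd_order(mean_due):
    """Earliest Expected Due Date: sort jobs by nondecreasing E[D_i]."""
    return sorted(range(len(mean_due)), key=lambda i: mean_due[i])

def max_expected_lateness(order, mean_proc, mean_due):
    t, worst = 0.0, float("-inf")
    for i in order:
        t += mean_proc[i]                 # E[C_i] on a single machine
        worst = max(worst, t - mean_due[i])
    return worst

order = eedd_order([3.0, 1.0, 2.5])
print(order, max_expected_lateness(order, [1.0, 0.5, 2.0], [3.0, 1.0, 2.5]))
```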


A more general result concerns the problem where each job i is assigned a weight wi. In this case, the objective is to minimize the maximum expected weighted lateness. According to Theorem 2.14, it is not hard to show that the EEDD rule is optimal under the agreeable condition that E[Di] ≤ E[Dj] if and only if wi ≥ wj.

It is interesting to note that Theorem 4 of Crabill and Maxwell (1969) is similar to the above result.

Case 6. Maximum Expected Weighted Tardiness:

MEWT(π) = max_{1≤i≤n} wi E[(Ci − Di) I{Ci>Di}].

It is also a well-known result that an optimal sequence for the deterministic version of this problem, when all weights are equal, should schedule the jobs in EDD order (cf. Jackson, 1955). We now generalize this result to the stochastic case.

Take fi(t) = wi(t − Di)I{t>Di} for i = 1, . . . ,n. Then MEWT(π) = MEC(π) in (2.34) with g(t) = t. Thus by Theorem 2.15, if Di ≤st Dj ⟺ wi ≥ wj, then the EEDD rule is optimal in minimizing MEWT(π).

Case 7. Maximum Weighted Probability of Lateness:

MWPL(π) = max_{1≤i≤n} wi Pr(Ci ≥ Di).

When the Pi are random variables, the Di are deterministic, and the weights are equal, Banerjee (1965) shows that the EDD rule minimizes MWPL. Crabill and Maxwell (1969) extend this result to random Di. We now consider the case with random Pi and Di as a special case of MEC.

Since Pr(Ci ≥ Di) = E[I{Ci≥Di}], MWPL(π) = max_{1≤i≤n} wi E[I{Ci≥Di}] is a special case of (2.34) with g(t) ≡ 1. Thus by Theorem 2.15, if Di ≤st Dj ⟺ wi ≥ wj, then the EEDD rule minimizes MWPL(π).

2.5 Exponential Processing Times

When the processing times are exponentially distributed, we can obtain more results on optimal sequences to minimize the total expected cost TEC(π) in (2.16).

In this section, we assume that the processing times P1, . . . ,Pn follow exponential distributions with rates λ1, . . . ,λn respectively. The density and cumulative distribution functions of Pi are given by λi e^{−λi x} and Pr(Pi ≤ x) = 1 − e^{−λi x}, respectively, for i = 1, . . . ,n.

The cost to complete job i is fi(Ci) with the cost function fi(·) as described in Sect. 2.4. That is, fi(t) is a nondecreasing random function of t a.s., independent of the processing times Pi, with mean function E[fi(t)] = mi(t). In particular, we consider the case fi(t) = wi g(t − Di) I{t>Di}, where wi is the weight of job i, g(·) is a nonnegative and nondecreasing function on [0,∞), and Di is the due date of job i.

Stochastic scheduling problems with exponentially distributed processing timeshave been studied by many authors, which have produced some elegant results.Derman et al. (1978) considered the problem of minimizing the weighted numberof tardy jobs on a single machine. They showed that the weighted shortest expectedprocessing time (WSEPT) sequence is optimal when all jobs have a common ran-dom due date following an arbitrary distribution. Glazebrook (1979) examined aparallel-machine problem. He showed that the shortest expected processing time(SEPT) sequence minimizes the expected mean flowtime.

Weiss and Pinedo (1980) investigated multiple non-identical machine problemsunder a performance measure that covers the expected sum of weighted completiontimes, expected makespan, and expected lifetime of a series system, and showedthat a SEPT or LEPT sequence minimizes this performance measure.

Pinedo (1983) examined the minimizations of the expected weighted sum ofcompletion times with random arrival times, the expected weighted sum of tardi-nesses, and the expected weighted number of tardy jobs. He showed that the WSEPTsequences are optimal under certain (compatability) conditions.

Boxma and Forst (1986) investigated the minimization of the expected weightednumber of tardy jobs and derived optimal sequences for various processing timeand due date distributions, including exponential and independently and identicallydistributed (i.i.d.) processing times and/or due dates.

Kampke (1989) generalized the work of Weiss and Pinedo (1980) and derivedsufficient conditions for optimal priority policies beyond SEPT and LEPT.

In Pinedo (2002), the WSEPT sequence was shown to minimize the performancemeasure E

[∑wih(Ci)

], where h(·) is a general function. Moreover, the performance

measure E[

∑wihi(Ci)]

was also studied with a job-dependent cost function hi(·).Pinedo defined an order h j ≥s hk (termed as h j is steeper than hk) between the costfunctions by dh j(t) ≥ dhk(t) for all t ≥ 0 if the differentials exist; or h j(t + δ )−h j(t) ≥ hk(t + δ )− hk(t) for all t ≥ 0 and δ > 0 otherwise. It was shown that theWSEPT sequence minimizes E

[∑wihi(Ci)

]under the agreeability condition λ jw j ≥

λiwi ⇐⇒ h j ≥s hk.

In this section, we present three more general results:

1. A sequence in the order based on the increments of λ jE[ f j(t)] is optimal to min-imize E

[∑ fi(Ci)

].

2. When the due dates Di have a common distribution, the WSEPT sequence isoptimal to minimize E

[∑wig(Ci −Di)ICi>Di

]without requiring any additional

conditions.

Page 86: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

76 2 Regular Performance Measures

3. When Di have different distributions, if g(·) is convex on [0,∞) with g(0) = 0,then a sequence in the nonincreasing order of λiwi Pr(Pi ≤ x) is optimal tominimize E

[∑wig(Ci−Di)ICi>Di

]. In particular, if λiwi ≥ λ jw j ⇒Di ≤rmst D j,

then the WSEPT sequence is optimal to minimize E[

∑wig(Ci −Di)ICi>Di].

This section is mainly based on Cai and Zhou (2005).

2.5.1 Optimal Sequence for General Costs

The optimal sequence to minimize E[

∑ fi(Ci)]

is stated in the following theorem.

Theorem 2.16. If i > j implies that λimi(t) has increments no more than those ofλ jm j(t) at any t in the sense that

λi[mi(t)−mi(s)]≤ λ j[m j(t)−m j(s)] ∀t > s, (2.36)

or equivalently,∫ ∞

0φ(s)λidmi(s)≤

∫ ∞

0φ(s)λ jdm j(s) (2.37)

for any nonnegative measurable function φ(s) on [0,∞), where the integrals arein Lebesgue-Stieltjes sense, then the sequence (1,2, . . . ,n) is optimal to minimizeE[

∑ fi(Ci)]. In other words, a sequence in nonincreasing order of the increments of

λimi(t) is optimal to minimize E[

∑ fi(Ci)].

Proof. First, by taking φ(s) = I[s,t] in (2.37) we can see that (2.37) implies (2.36).Conversely, for any nonnegative measurable function φ(s), we can construct func-tions φ1(s)≤ φ2(s)≤ · · · , with each φk(s) a linear combination of functions of formI[s,t], such that φk(s)→ φ(s) as k → ∞. Hence an application of the monotone con-vergence theorem shows that (2.36) implies (2.37). This establishes the equivalencebetween (2.36) and (2.37).

Next, since fi(t) are independent of Pi,

E[ fi(t +Pj)] = EE[ fi(t +Pj)|Pj]=∫ ∞

0E[ fi(t + x)|Pj = x]λ je−λ jxdx

=∫ ∞

0E[ fi(t + x)]λ je−λ jxdx =

∫ ∞

0mi(t + x)λ je−λ jxdx (2.38)

for i, j = 1,2, . . . ,n and t ≥ 0. Furthermore, by convolution it can be shown that

the density of Pi +Pj =

⎧⎪⎨

⎪⎩

λiλ j

λ j −λi

(e−λix − e−λ jx

)if λi = λ j

λ 2i xe−λix if λi = λ j.

(2.39)

Page 87: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.5 Exponential Processing Times 77

(Note that the second part of (2.39) is equal to the limit of the first part as λ j con-verges to λi.) Thus, when λi = λ j, by (2.39) together with an argument similar to(2.38) we obtain

E[ fi(t +Pi+Pj)] =λiλ j

λ j −λi

∫ ∞

0mi(t + x)

(e−λix − e−λ jx

)dx. (2.40)

Let π = . . . , i, j, . . . be an arbitrary job sequence, π ′ = . . . , j, i, . . . be the se-quence by interchanging two consecutive jobs i, j in π , and C denote the completiontime of the job prior to job i under π . Then, for T EC(π) = E

[∑ fi(Ci(π))

],

T EC(π)−TEC(π ′) = E[ fi(C+Pi)]+E[ f j(C+Pi +Pj)]

−E[ f j(C+Pj)]−E[ fi(C+Pi +Pj)]. (2.41)

Since P1, . . . ,Pn are mutually independent, conditional on C = t we have

E[ fi(C+Pi)|C = t] = E[ fi(t +Pi)|C = t] = E[ fi(t +Pi)]

and similarly, E[ fi(C+Pi +Pj)|C = t] = E[ fi(t +Pi +Pj)]. Hence a combination of(2.41) with (2.38) and (2.40) yields that, conditional on C = t,

T EC(π)−TEC(π ′)

= E[ fi(t +Pi)]+E[ f j(t +Pi+Pj)]−E[ f j(t +Pj)]−E[ fi(t +Pi+Pj)]

=∫ ∞

0mi(t + x)

λie−λix −

λiλ j

λ j −λi

(e−λix − e−λ jx

)dx

−∫ ∞

0m j(t + x)

λ je−λ jx − λiλ j

λ j −λi

(e−λix − e−λ jx

)dx

=∫ ∞

0[λimi(t + x)−λ jm j(t + x)]

λ je−λ jx −λie−λix

λ j −λidx

= ai j(t), say. (2.42)

Extend the domain of each mi(t) to (−∞,∞) by defining mi(t) = 0 for t < 0. Thenmi(·) is a nondecreasing function on (−∞,∞). Hence we can write

mi(t + x) =∫ t+x

−∞dmi(s), i = 1, . . . ,n.

An application of Fubini’s Theorem then gives

ai j(t) =∫ ∞

0

∫ t+x

−∞[λ jdm j(s)−λidmi(s)]

λie−λix −λ je−λ jx

λ j −λidx

Page 88: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

78 2 Regular Performance Measures

=∫ t

−∞

∫ ∞

0

λie−λix −λ je−λ jx

λ j −λidx [λ jdm j(s)−λidmi(s)]

+∫ ∞

t

∫ ∞

s−t

λie−λix −λ je−λ jx

λ j −λidx [λ jdm j(s)−λidmi(s)]

=∫ ∞

t

e−λi(s−t)− e−λ j(s−t)

λ j −λi[λ jdm j(s)−λidmi(s)] . (2.43)

It is easy to see that

e−λi(s−t)− e−λ j(s−t)

λ j −λi≥ 0 for all s ≥ t.

Hence by (2.42) and (2.43) together with condition (2.37), conditional on C = t,

i > j =⇒ T EC(π)−TEC(π ′) = ai j(t)≥ 0 ∀t ≥ 0,

which in turn implies, unconditionally, T EC(π)−TEC(π ′)≥ 0.

Thus we have shown that TEC(π)≥ TEC(π ′) for i > j when λi = λ j. The sameholds when λi = λ j as well, which can be similarly proven using the second part of(2.39), or considering the limit as λ j converges to λi. It follows that the sequence π ′

is better than π if i > j. Consequently the sequence (1,2, . . . ,n) is optimal.

Note that condition (2.37) is what we need to prove Theorem 2.16, while condi-tion (2.36) is usually easier to check in specific cases. Also, (2.36) does not requiremi(t) to be differentiable at all t ≥ 0. If mi(t) may be discontinuous at some points,then (2.36) assumes that

λi[mi(t+)−mi(t−)]≤ λ j[m j(t+)−m j(t−)] for i > j

at any discontinuity t (which can also be written as λidmi(t) ≤ λ jdm j(t) in thatsense). If mi(t) have different left and right derivatives at some t, then (2.36) requiresλidmi(t+)≤ λ jdm j(t+) and λidmi(t−)≤ λ jdm j(t−) for i > j.

Remark 2.2. Theorem 2.16 extends the results of Pinedo (2002). Condition (2.36)or (2.37) is in fact equivalent to ‘λ jm j is steeper than λimi’ in Pinedo’s termi-nology. Hence Theorem 2.16 says that the sequence in a reverse steepness orderof λimi(t), i = 1, . . . ,n is optimal. Note that in Pinedo (2002), which considersdeterministic cost functions fi only, an agreeable condition is needed between thesteepness of fi(t)/wi and the order of λiwi, i.e., λiwi ≥ λ jw j implies that fi(t)/wiis steeper than f j(t)/wj. In Theorem 2.16, such an agreeable condition can be re-placed by a weaker condition (2.36). In addition, Theorem 2.16 is more generalthan the results of Pinedo (2002) in that it allows stochastic cost functions, so thatthe parameters such as due dates, weights, etc., can be random variables.

The following example shows an application of Theorem 2.16.

Page 89: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.5 Exponential Processing Times 79

Example 2.3. Let fi(t) = wih(t), where wi is a deterministic weight and h(t) is anondecreasing stochastic process. Then mi(t) = E[ fi(t)] = wiE[h(t)] is nondecreas-ing in t. Furthermore, if λiwi > λ jw j , then

λi[mi(t)−mi(s)] = λiwiE[h(t)]−E[h(s)]≥ λ jw jE[h(t)]−E[h(s)]= λ j[m j(t)−m j(s)] ∀t > s.

Hence by Theorem 2.16, a sequence in nonincreasing order of λiwi minimizesE[∑wih(Ci)]. As E[Pi] = 1/λi, this sequence is the WSEPT and so the result gener-alizes that of Pinedo (2002) to an arbitrary stochastic cost function h, which allows,for example, a random common due date with an arbitrary distribution.

There are, of course, also examples where the condition of Theorem 2.16 doesnot hold. A simple one is given below.

Example 2.4. Let f1(t) = 2t and f2(t) = t2, which are deterministic cost functions.Then m1(t) = 2t and m2(t) = t2. Hence dm1(t) = 2dt and dm2(t) = 2tdt. Itfollows that λ1dm1(t) ≤ λ2dm2(t) when t ≥ λ1/λ2, and λ1dm1(t) > λ2dm2(t) fort < λ1/λ2. Thus (2.36) cannot hold for jobs 1 and 2. Furthermore, suppose w1 = w2.Then it is not difficult to show that T EC(1,2)< T EC(2,1) if and only if λ1 > λ 2

2 .Hence the WSEPT rule is not optimal even if the jobs have a common weight.

2.5.2 Optimal Sequences with Due Dates

The applications of Theorem 2.16 lead to the next two theorems for the case withfi(t) = wig(t −Di)It>Di. The first one is for identically distributed due dates.

Theorem 2.17. If Di have a common distribution, then a sequence in nonincreas-ing order of λiwi, or equivalently, in nondecreasing order of E[Pi]/wi, mini-mizes E

[∑wig(Ci −Di)ICi>Di

].

Proof. Let fi(t) = wig(t −Di)It>Di, i = 1,2, . . . ,n, and F(x) = Pr(Di ≤ x) be thecommon distribution function of Di. Then

mi(t) = E[ fi(t)] = wiE[g(t −Di)It>Di] = wi

0≤x<tg(t − x)dF(x). (2.44)

Let

g(t) =∫

0≤x<tg(t − x) dF(x)

Page 90: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

80 2 Regular Performance Measures

so that λimi(t) = λiwig(t). Since g(t) is nonnegative and nondecreasing, so is g(t).It follows that λiwi ≥ λ jw j implies

λi[mi(b)−mi(a)] = λiwi[g(b)− g(a)]≥ λ jw j[g(b)− g(a)] = λ j[m j(b)−m j(a)]

for all a< b. Thus if λ1w1 ≥ · · ·≥ λnwn, then 1, . . . ,n is optimal by Theorem 2.16.In other words, a sequence in nonincreasing order of λiwi is optimal to minimizeE[

∑wig(Ci −Di)ICi>Di].

Example 2.5.

(i) Let g(x) ≡ 1. Then Theorem 2.17 says that a sequence in nonincreasing orderof λiwi minimizes the expected weighted number of tardy jobs, or equiva-lently, the weighted lateness probability, when the due dates have a commondistribution (not necessarily a common due date).

(ii) Let g(x) = x. Then by Theorem 2.17, a sequence in nonincreasing order ofλiwi minimizes the expected weighted sum of job tardinesses.

Note that the above results do not require any compatibility conditions betweenthe weights and processing times.

The next theorem allows the due dates to have different distributions.

Theorem 2.18. Let Fi(x) = Pr(Di ≤ x). If g(·) is convex with g(0) = 0 and

λ1w1F1(x)≥ λ2w2F2(x)≥ · · ·≥ λnwnFn(x) for x ≥ 0, (2.45)

then the sequence (1,2, . . . ,n) minimizes E[

∑wig(Ci−Di)ICi>Di]. In other words,

a sequence in the nonincreasing order of λ jw jFj(x) is optimal.

Proof. Since g(x) is nondecreasing with g(0) = 0, we have g(t − x) =∫ t−x

0 dg(y).By Fubini’s Theorem,

mi(t) = wiE[g(t −Di)It>Di] = wi

0≤x<tg(t − x)dFi(x)

= wi

0≤x<t

∫ t−x

0dg(y)dFi(x) = wi

0≤y<t

0≤x≤t−ydFi(x)dg(y)

= wi

0≤y<tFi(t − y) dg(y) = wi

∫ t

0Fi(x)dgt(x),

where gt(x) = −g(t − x), which is a nondecreasing function on [0, t] for any t ≥ 0.Hence

λi[mi(b)−mi(a)] = λiwi

∫ b

0Fi(x)dgb(x)−

∫ a

0Fi(x)dga(x)

=∫ b

aλiwiFi(x)dgb(x)dx+

∫ a

0λiwiFi(x)[dgb(x)− dga(x)]. (2.46)

Page 91: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.5 Exponential Processing Times 81

Because g(t) is convex, its increment g(t+∆)−g(t) is nondecreasing in t for ∆ > 0,which implies

gb(y)− gb(x) = g(b− x)− g(b− y)= g(b− y+∆)− g(b− y)

≥ g(a− y+∆)− g(a− y)= g(a− x)− g(a− y)= ga(y)− ga(x)

for 0 ≤ x < y ≤ a < b, where ∆ = y− x. Thus, for a < b, gb has increments greaterthan or equal to those of ga. As a result,

∫ a

0φ(x)dga(x)≤

∫ a

0φ(x)dgb(x), or

∫ a

0φ(x)[dgb(x)− dga(x)]≥ 0,

for any nonnegative measurable function φ(x) on [0,a]. If λiwiFi(x)≤ λ jw jFj(x) forx ≥ 0, then

∫ a

0λiwiFi(x) [dgb(x)− dga(x)]≤

∫ a

0λ jw jFj(x)[dgb(x)− dga(x)]. (2.47)

Moreover, as gb(x) is nondecreasing on [0, t],

∫ b

aλiwiFi(x) dgb(x)≤

∫ b

aλ jw jFj(x) dgb(x). (2.48)

Now, if i > j, then λiwiFi(x)≤ λ jw jFj(x) for x ≥ 0 by the condition of the theorem.It then follows from (2.46) to (2.48) that

λi[mi(b)−mi(a)]≤ λ j[m j(b)−m j(a)] ∀a < b, i > j.

Thus by Theorem 2.16, the sequence (1,2, . . . ,n) minimizes E[

∑wig(Ci −Di)ICi>

Di].

Finally, if λiwiFi(x) ≥ λ jw jFj(x) for x ≥ 0, then λiwi ≥ λ jw j as x → ∞. Con-sequently, given the existence of an order between λ jw jFj(x), a sequence in thenonincreasing order of λ jw j, i.e., the WSEPT, is optimal.

Corollary 2.3. If g(·) satisfies the conditions in Theorem 2.18 and λiwi ≥ λ jw jimplies Di ≤st D j, then a sequence in nonincreasing order of λ jw j minimizesE[

∑wig(Ci −Di)ICi>Di].

Proof. By the condition of the corollary and the definition for the stochastic order,λiwi ≥ λ jw j implies Fi(x) ≥ Fj(x) for all x ≥ 0. As a result, a nonincreasing orderexists between λ jw jFj(x) and is equivalent to the nonincreasing order of λ jw j,so the corollary follows immediately from Theorem 2.18.

Example 2.6. Both g(x) = x and g(x) = x2 satisfies the conditions of Theorem 2.18.Hence if the compatibility condition in the corollary holds, then a sequence in nonin-creasing order of λiwi, or equivalently, in nondecreasing stochastic order of Di,

Page 92: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

82 2 Regular Performance Measures

minimizes both the expected weighted sum of tardinesses E[∑i:Ci>Diwi(Ci −Di)]

and the expected weighted sum of squared tardinesses E[∑i:Ci>Diwi(Ci−Di)2]. (This

is not true for the expected weighted number of tardy jobs. Note that g(x) = 1 doesnot satisfy the conditions of Theorem 2.18 because g(0) = 0.)

Condition (2.45) is weaker than the agreeable condition between λ jw j andDi. If for some i = j, λiwi > λ jw j but Di ≤st D j fails, a sequence in nonincreasingorder of λ jw j could still be optimal. We illustrate this in the next example.

Example 2.7. Suppose Di ∼ exp(δi) so that Fi(x) = 1− e−δix. We show below thatan order exists between λ jw jFj(x) if and only if λ jw j have the same orderas λ jw jδ j. To see this, let λiwi ≥ λ jw j and λiwiδi ≥ λ jw jδ j. We show thatλiwiFi(x)≥ λ jw jFj(x) for x > 0 below. Consider the following two cases:

Case 1: δi < δ j. It is easy to see that (1− e−x)/x is a decreasing function of x on(0,∞). Hence δi < δ j and λiwiδi ≥ λ jw jδ j imply, for x > 0,

Fi(x)Fj(x)

=1− e−δix

1− e−δ jx>

δixδ jx

=δi

δ j≥ λ jw j

λiwi,

or equivalently, λiwiFi(x)> λ jw jFj(x).

Case 2: δi ≥ δ j. Then Fi(x) ≥ Fj(x) for x ≥ 0, which together with λiwi ≥ λ jw jleads immediately to λiwiFi(x) ≥ λ jw jFj(x). Conversely, if λiwiFi(x) ≥ λ jw jFj(x)for x ≥ 0, then λiwi ≥ λ jw j as x → ∞. Furthermore,

1 ≤ λiwiFi(x)λ jw jFj(x)

=λiwi(1− e−δix)

λ jw j(1− e−δ jx)−→ λiwiδi

λ jw jδ jas x ↓ 0.

Hence λiwiδi ≥ λ jw jδ j. Thus we have shown that λiwiFi(x)≥ λ jw jFj(x) for x ≥ 0if and only if λiwi ≥ λ jw j and λiwiδi ≥ λ jw jδ j. As a result, even if λiwi > λ jw j butδi < δ j (so that Di ≤st D j fails), a sequence in nonincreasing order of λ jw j wouldstill be optimal if we have λiwiδi ≥ λ jw jδ j for such i and j.

2.5.3 Examples of Applications

Two examples of applications are provided below. The first example takes into ac-count random price variations and interest accrual of capitals, while the second oneallows a deadline in addition to the due dates.

Example 2.8. A company produces a variety of goods for sale. While the currentprice of a product is known, the future price is uncertain and expected to decline

Page 93: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.5 Exponential Processing Times 83

over time due to fading popularity and advancement of technology. This appliesparticularly to fashion products (e.g., toys, clothes), entertainment products (e.g.,music, video), and technology products (e.g., computers, softwares). To allow ran-dom variations in the future price, we model the price of job i at time t by aihi(t),where ai is a constant representing the current price and hi(t) is a stochastic processwith hi(0) = 1. Assume that E[hi(t)] = u(t) is a nonincreasing function of t, reflect-ing a downward trend of price over time.

At the start of production, an amount of capital is invested to produce job i, whichis proportional to the current price, namely β ai, where 0 < β < 1. Let α denote theinterest rate, which is a random variable following an arbitrary distribution. Thenthe value of the investment for job i at time t is given by β ai(1+α)t . Hence if job iis sold at time t, then its net profit is aihi(t)−β ai(1+α)t . Suppose that each job issold to a retailer upon its completion, then the total net profit from a set of n jobs is

n

∑i=1

[aihi(Ci)−β ai(1+α)Ci

](2.49)

where Ci is the completion time of job i.

If the company produces the goods in sequel, then the problem faced by themanagement is how to schedule the production optimally so as to maximize theexpected total net profit. Define stochastic processes

fi(t) = β ai(1+α)t − aihi(t), i = 1, . . . ,n. (2.50)

Then the problem of maximizing the total net profit given by (2.49) is equivalent tominimizing E[∑ fi(Ci)]. From (2.50) we can see that the mean function of fi(t) is

mi(t) = E[ fi(t)] = β aiE[(1+α)t ]− aiE[hi(t)]+ ai(1−β )= aiβ E[(1+α)t]− u(t), (2.51)

As E[hi(t)] = u(t) is a nonincreasing function of t, by (2.51) mi(t) is nondecreasingin t. Write G(t) = β E[(1+α)t ]−u(t) for brevity, which is nondecreasing in t. Then,assuming that the processing times are exponentially distributed with parametersλ1, . . . ,λn, it follows from (2.51) that λiai ≥ λ ja j implies

λi[mi(t)−mi(s)] = λiai[G(t)−G(s)]≥ λ ja j[G(t)−G(s)] = λ j[m j(t)−m j(s)]

for all t > s. Thus by Theorem 2.1, a sequence in nonincreasing order of λ ja jminimizes E[∑ fi(Ci)], and so is optimal to maximize the expected total net profit.

It is interesting to note in this example that the optimal sequence can be con-structed based on the current available price and the rates of the processing times,regardless of future price fluctuations and the cost of interest on the capital.

Page 94: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

84 2 Regular Performance Measures

Example 2.9. A laboratory is contracted to perform reliability tests on n items. Thetest is to be performed sequentially on a particular facility, with each item testedimmediately after the failure of the last item. The failure times of the items are sup-posed to be independently and exponentially distributed with failure rates λ1, . . . ,λnrespectively. If the test result for item i is reported on or before a due date Di, thelaboratory will receive a payment valued vi for the test. If it is later than Di by time t,then the payment will be reduced proportionally to vih(t), where h(t) is a stochasticprocess taking values in [0,1] and is decreasing in t almost surely. The due datesare assumed to be random variables with a common distribution. In addition, if thefacility to perform the tests breaks down, then the tests will not be able to continueand so no payment will be made for items not yet tested by the breakdown time.The breakdown time B is assumed to be exponentially distributed with a rate δ .

The laboratory wishes to schedule the tests optimally so as to maximize the ex-pected total payment it can receive. This is equivalent to minimizing the followingobjective function (representing the expected total loss):

ETL(π) = E

[n

∑i=1

vih(Ci −Di)IDi<Ci≤B+ viICi>B

]

(2.52)

where h(t) = 1− h(t) and Ci is the completion time of testing item i. Let

fi(t) = vih(t −Di)IDi<t≤B+ viIt>B.

Then the objective function in (2.52) is equal to ETL(π) = E [∑ fi(Ci)] . As h(t) isdecreasing in t almost surely and 0 ≤ h(t) ≤ 1, fi(t), t ≥ 0 is a nondecreasingstochastic process for each i. Let D denote a random variable with the same distri-bution as Di. Then the mean function of fi(t) is

mi(t) = E[ fi(t)] = viE[h(t −D)Id<t≤B]+ viP(t > B)

= viE[h(t −D)e−δ t It>D+ vi(1− e−δ t)

= vi

e−δ t (E[h(t −D)It>D]− 1

)+ 1= viG(t)

where G(t) = 1 − e−δ t(1−E[h(t −D)It>D]

). Since 0 ≤ E[h(t − D)It>D] ≤ 1

and by the assumptions of the problem E[h(t −D)It>D] is nondecreasing in t,e−δ t

(1−E[h(t −D)It>D]

)is nonincreasing in t and so G(t) is nondecreasing in t.

Hence, similar to the arguments in Example 2.8, it follows from Theorem 2.16 thata sequence in nonincreasing order of λ jv j is optimal. That is, items with higherratios of value over mean testing time should be tested earlier.

Page 95: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.6 Compound-Type Distributions 85

2.6 Compound-Type Distributions

In this section, we consider a more general class of compound-type distributionsthan the exponential distributions for the processing times, and derive the optimalsequence to minimize the total expected cost in (2.16) and (2.17), which generalizethe results presented in Sect. 2.5.

This class of distributions are characterized by a common form of their character-istic functions. More specifically, we consider a class of distributions parameterizedby γ , with characteristic functions of the form

φ(t) = E[eitX ] =1

1+G(t)/γ (2.53)

for random variable X , where i denotes the imaginary unit and G(t) is a complex-valued function of real variable t such that φ(t) is a characteristic function of aprobability distribution. We refer to this class of distributions as compound-typedistributions. It is easy to see that the exponential distribution is a special case of(2.53) with G(t) =−it.

Similar to the exponential distributions, the effects of a job interchange procedureunder compound-type distributions can be computed such that the conditions for asequence to minimize the T EC(π) in (2.16) or (2.17) can be readily verified.

Generally speaking, the processing times on jobs, Pi,1≤ i≤ n, are nonnegativevariables. But in some circumstances, it is convenient to approximate the distribu-tion of a processing time with one that can take negative values, such as a normaldistribution. See, for example, Boys et al. (1997) and Jang and Klein (2002) andthe references therein. Thus we allow the processing times to be real-valued randomvariables with positive means.

We will show that the nonincreasing order of the increments of mi(t) = γ jE[ fi(t)]is optimal to minimize E[∑ fi(Ci)]. Furthermore, if the due dates Di have a com-mon distribution, then the optimal policy to minimize E[∑wig(Ci −Di)ICi>Di] isto schedule the jobs according to non-increasing order of wjγ j. On the other hand,if the due dates D j have different distributions, then the optimal policy dependson these distributions and relies on the convexity or concavity of g(·). Specifically,when jobs can be ordered by wjγ jPr(D j ≤ x) for all x, such an ordering is optimalwhen g(·) is convex or concave. The exposition below is based on Cai et al. (2007a).

2.6.1 Classes of Compound-Type Distributions

We first give a lemma on the compound-type distributions defined by (2.53), whichwill play a key role in the interchange procedures to find the optimal sequences tominimize T EC(π).

Page 96: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

86 2 Regular Performance Measures

Suppose that the processing times P1, . . . ,Pn, are independent random variablesfollowing the compound-type distributions with cumulative distribution function(cdf) Fi(x) for Pi, i = 1, . . . ,n. Denote by Fi j(x) the cdf of Pi + Pj and φi(t) thecharacteristic function of Pi.

Lemma 2.6. There exists a function G(t) and a series of numbers γ1, . . . ,γn suchthat φi(t) = (1+G(t)/γi)−1 if and only if

Fi(x)−Fi j(x)γi

=Fj(x)−Fi j(x)

γ j.

In particular, γi can take the value 1/E[Pi].

Proof. Since the Fourier transformation is linear and the characteristic function ofFi j(x) is the product of the characteristic functions of Fi(x) and Fj(x),

Fi(x)−Fi j(x)γi

=Fj(x)−Fi j(x)

γ jfor 1 ≤ i, j ≤ n

⇐⇒ φi(t)−φi(t)φ j(t)γi

=φ j(t)−φi(t)φ j(t)

γ jfor 1 ≤ i, j ≤ n

⇐⇒ φi(t)γi(1−φi(t))

=φ j(t)

γ j(1−φ j(t))=

1G(t)

(say) for 1 ≤ i, j ≤ n

⇐⇒ φi(t) =1

1+G(t)/γifor 1 ≤ i ≤ n. (2.54)

Moreover, it is clear that the equivalence in (2.54) still holds if G(t) and γi arereplaced with aG(t) and aγi, respectively, for any fixed complex value a. As a result,we can take G′(0) = −i without loss of generality. Consequently, since φi(0) = 1implies G(0) = 0,

iE[Pi] = φ ′i (0) =− G′(0)/γi

(1+G(0)/γi)2 =

iγi

=⇒ γi =1

E[Pi].

The proof is thus complete.

In the case of non-negative Pi and Pj, since

Fi(x)−Fi j(x)Fi(x)−Fi j(x)

=Pr(Pi ≤ x < Pi +Pj)

Pr(Pj ≤ x < Pi +Pj)=

Pr(Pi ≤ x|Pi +Pj > x)Pr(Pj ≤ x|Pi +Pj > x)

,

Lemma 2.6 states that this ratio is constant for all x. For real-valued Pi and Pj,however, this implication is not generally true as we no longer necessarily have theequality Fi(x)−Fi j(x) = Pr(Pi ≤ x < Pi +Pj).

Some of the most commonly used distributions belong to this class of distribu-tions, as shown in the following examples.

Page 97: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.6 Compound-Type Distributions 87

Example 2.10 (Exponential). If P follows the exponential distribution with rate γ ,then the characteristic function of P is φ(t) = (1− it/γ)−1 (i.e., G(t) =−it).

Example 2.11 (Laplace). If P has a Laplace distribution with scale parameter α withdensity (2α)−1e−|x|/α for x ≥ 0, then φ(t) = (1+α2t2)−1 (γ = 1/α2 and G(t) = t2).

Example 2.12 (Polya-type). Let φ(t) = (1+C|t|α)−1 for −∞ < t < ∞, with parame-ters 0<α ≤ 2 and C ≥ 0, which has the form in (2.53) with γ = 1/C and G(t) = |t|α .Then φ(t) is the characteristic function of Polya-type; see for example, Bisgaard andZoltan (2000).

Example 2.13 (Geometric). If Pr(P = k) = (1−α)αk (0 < α < 1), k = 0,1,2, . . . ,then φ(t) = (1+α(1− eit)/(1−α))−1, which is also of the type in (2.53), withγ = (1−α)/α and G(t) = 1− eit .

Example 2.14 (Compound geometric). Let Xn∞n=1 be i.i.d. with common distribu-

tion function F , and N a random variable independent of Xn. Then X = ∑Nn=1 Xn

is said to have a compound distribution, whose name is determined by the distribu-tion of N. If N is geometrically distributed with Pr(N = n) = (1− θ )θ n−1 (where0 < θ < 1), n = 1,2, . . . , then X is said to be compound geometric.

The compound geometric distributions arise in some practical situations ofscheduling. For example, consider the situation where the processing of a job con-sists of many subtasks whose processing times are independently and identicallydistributed. The total processing time of the job is then compound geometricallydistributed if the number of subtasks has a geometric distribution. As another ex-ample, a compound geometric distribution can arise when a task may not be donecorrectly, so it must be repeated until it is done correctly, where θ is the probabilityit is done incorrectly, and all repetitions are i.i.d.

The following proposition characterizes the compound geometric distribution byits characteristic functions. Its proof is straightforward and thus omitted.

Proposition 2.1. If X is compound geometric, then its characteristic function hasthe form in (2.53).

Example 2.15 (Levy Process with Exponentially Distributed Horizon). Suppose thatX(t) : t ∈ I is a stochastic process with independent increments, where I is theset of its (time) horizon. Let T be a random variable taking values in I, indepen-dent of the process X(t) : t ∈ I. The random variable X(T ) is termed as a processwith random horizon, and its distribution is called a generalized compound dis-tribution. For simplicity, we will only discuss the Levy process here, which is aprocess X(t), t ≥ 0 with increment X(t + s)−X(t) independent of the processX(v),0 ≤ v ≤ t and has the same distribution law as X(s), for every s, t ≥ 0.

Page 98: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

88 2 Regular Performance Measures

The generalized compound distribution may arise from practical situations aswell. Consider a manufacturing practice in which processing time is a Levy Processwith Exponentially Distributed Horizon. Suppose that the processing of a job is torepair the flaw of the job (or product). The flaw is an irregular area with length Tand irregular width, which gives rise to a random processing time on repairing anyfixed length of flaw area. The processing time to repair a flaw of l units of length isa random variable, related only to the length but not the position of the flaw. Thenthe processing time on a length of flaw is a Levy process with the length as its timeparameter. So if the length of flaw is distributed exponentially, then the processingtime is a Levy process with exponentially distributed horizon.

We now calculate the characteristic function of X(T ), which again has the form in(2.53). Denote the characteristic exponent of X(1) by Ψ (t), i.e., E[eisX(1)] = e−Ψ (s).Then we have the following result.

Proposition 2.2. If X(t), t ≥ 0 is a Levy process and T is exponentially distributedwith rate γ , independent of X(t), t ≥ 0, then the characteristic function of X(T ) isφX(T )(s) = (1+Ψ(s)/γ)−1.

Proof. First note that a Levy process X(t) is infinitely divisible. Then the character-istic function of X(t) is given by φX(t)(s) = e−tΨ (s) (see, for example, Bertoin 1996).Hence if T is exponentially distributed with rate γ , independent of the Levy processX(t), t ≥ 0, then the characteristic function of X(T ) is

φX(T )(s) = E[eisX(T )] = E

E[eisX(T )|T ]= E[e−TΨ (s)]

= γ∫ ∞

0e−tΨ (s)−γt dt = (1+Ψ(s)/γ)−1.

This completes the proof.

Remark 2.3. One question of interest is whether the distributions in the class givenby (2.53) can be likelihood-ratio ordered. While some of the distributions in ourexamples above can indeed be likelihood-ratio ordered, such as the exponentialand geometric distributions, this is not the case in general. For example, considerthe Laplace distributions in Example 2.11 above. Let f1(x) = (2α1)−1e−|x|/α1 andf2(x) = (2α2)−1e−|x|/α2 be two Laplace densities. Then the likelihood-ratio

f1(x)f2(x)

=(2α1)−1e−|x|/α1

(2α2)−1e−|x|/α2=

α2

α1e(1/α2−1/α1)|x|

is not monotone in x. Therefore f1(x) and f2(x) cannot be likelihood-ratio ordered.

Page 99: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.6 Compound-Type Distributions 89

2.6.2 Optimal Sequences for Total Expected Costs

Some orders between two nondecreasing functions are defined below to shorten thenotation. Suppose that H1(x) and H2(x) are two nondecreasing functions.

• H1(x) is said to be prior to H2(x) in increment order, denoted as H1 ≺inc H2,if H2(x)−H1(x) is nondecreasing in x, or equivalently, H2(x) has greater incre-ments than H1(x) in the sense that

H1(t)−H1(s)≤ H2(t)−H2(s) for t > s. (2.55)

In this case we also say that H2 is steeper than H1.

• H1 is said to be prior to H2 in convexity order, written H1 ≺cv H2 (or H2 ≻cv H1),if H2(x) has more convexity than H1(x) in the sense that

H1(αs+(1−α)t)−αH1(s)− (1−α)H1(t)

≥ H2(αs+(1−α)t)−αH2(s)− (1−α)H2(t) (2.56)

for all α ∈ (0,1), which is equivalent to the convexity of H2(x)−H1(x).

Let Hi j(t) = γimi(t)− γ jm j(t). It follows from (2.55) and (2.56) that

γimi ≺inc γ jm j ⇐⇒ Hi j(t) is nonincreasing; (2.57)

γimi ≺cv γ jm j ⇐⇒ Hi j(t) is concave. (2.58)

The following theorem presents the optimal sequence to minimize E[

∑ f j(Cj)].

Theorem 2.19. Let Pi ∼ φi(t) = (1+G(t)/γi)−1, i = 1, . . . ,n.

(a) For nonnegative P1, . . . ,Pn, the sequence 1,2, . . . ,n minimizes E[

∑ f j(Cj)]

if

i > j =⇒ γimi ≺inc γ jm j. (2.59)

In other words, the non-increasing order of γimi(t) in the ≺inc sense is opti-mal if such an order exists.

(b) When P1, . . . ,Pn are real-valued random variables with nonnegative means, thesequence 1,2, . . . ,n minimizes E

[∑ f j(Cj)

]if

i > j =⇒ γimi ≺inc γ jm j and γimi ≺cv γ jm j. (2.60)

In other words, if γimi(t) have the same order in the ≺inc and ≺cv sense, thenthe nonincreasing order in either sense is optimal.

Page 100: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

90 2 Regular Performance Measures

Proof. Let π = . . . , i, j, . . . be a job sequence with i > j, π ′ = . . . , j, i, . . . bethe sequence by interchanging two consecutive jobs i, j in π , and C denote thecompletion time of the job prior to job i under π . Then for the objective functionT EC(π) = E

[∑ f j(Cj)

], since fi(t) are independent of Pi,

TEC(π)−TEC(π ′)

= E[ fi(C+Pi)]+E[ f j(C+Pi+Pj)]−E[ f j(C+Pj)]−E[ fi(C+Pi+Pj)]

= E[mi(C+Pi)]+E[m j(C+Pi+Pj)]−E[m j(C+Pj)]−E[mi(C+Pi+Pj)].

Denote the cdf’s of Pi, Pj and Pi +Pj by Fi(x), Fj(x) and Fi j(x) respectively as inLemma 2.6. Since Pi,Pj and C are independent, conditional on C,

T EC(π)−TEC(π ′)

= E[mi(C+Pi)]−E[mi(C+Pi+Pj)]−E[m j(C+Pj)]+E[m j(C+Pi +Pj)]

= E[∫ ∞

−∞mi(C+ x)d[Fi(x)−Fi j(x)]

]−E[∫ ∞

−∞m j(C+ x)d[Fj(x)−Fi j(x)]

].

By Lemma 2.6,

T EC(π)−TEC(π ′) =1γi

E[∫ ∞

−∞(γimi(C+ x)− γ jm j(C+ x))d[Fi(x)−Fi j(x)]

]

=1γi

E[Hi j(C+Pi)]−E[Hi j(C+Pi +Pj)]

. (2.61)

Consider two cases corresponding to parts (a) and (b) of the theorem.

Case 1. P1, . . . ,Pn are nonnegative variables. Under the condition that i > j impliesγimi ≺inc γ jm j, Hi j(t) is nonincreasing by (2.57). Hence by (2.61) it is clear that

TEC(π)−TEC(π ′) =1γi

E[Hi j(C+Pi)−Hi j(C+Pi+Pj)]≥ 0. (2.62)

Case 2. P1, . . . ,Pn are real-valued with E[Pi]≥ 0, i = 1, . . . ,n. Then under the condi-tion that Hi j(x) are non-increasing and concave functions,

TEC(π)−TEC(π ′) =1γiE[Hi j(C+Pi)−Hi j(C+Pi+Pj)]

≥ 1γiE[Hi j(C+Pi)−Hi j(C+Pi+E[Pj])]≥ 0, (2.63)

where the first inequality follows from applying Jensen’s inequality to the concaveHi j(x) conditional on C+Pi, and the second inequality holds because Hi j(x) is non-increasing.

Remark 2.4. Pinedo and Wei (1986) obtained the optimal schedule to minimize thetotal expected waiting cost ∑E[g(Ci)] with a general but deterministic waiting cost

Page 101: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.6 Compound-Type Distributions 91

function g. This is a special case of T EC(π) = E[

∑ f j(Cj)]

with all f j equal to acommon deterministic function. On the other hand, the results of Pinedo and Wei(1986) allow more general distributions of the processing times and multiple ma-chines in a flowshop setting.

2.6.3 Optimal Sequences with Due Dates

We now discuss the cost function E[

∑wjg(Cj − D j)ICj>Dj]

with random duedates Di. An application of Theorem 2.19 yields the next theorem.

Theorem 2.20. Suppose that Di have a common distribution. Then a sequence innonincreasing order of γiwi, or equivalently, in nondecreasing order of E[Pi]/wi,minimizes the TEC(π) = E

[∑wjg(Cj −D j)ICj>Dj

]if either

(1) g(t) is an increasing function and Pi are nonnegative, or

(2) g(t) is a convex and non-decreasing function and Pi are real-valued withnonnegative means.

Proof. We first note by Lemma 2.6 that nonincreasing order of γiwi is equivalentto nondecreasing order of E[Pi]/wi since γ j may take 1/E[Pj]. Let

fi(t) = wig(t −Di), i = 1,2, . . . ,n,

and D be a representative of Di. Then mi(t) = E[ fi(t)] = wiE[g(t −D)], whichgives Hi j = (wiγi −wjγ j)E[g(t −D)]. Since g(t) is nondecreasing,

γimi ≺inc γ jm j ⇐⇒ Hi j(t) is nonincreasing ⇐⇒ wiγi ≤ wjγ j. (2.64)

Thus if γ1w1 ≥ · · · ≥ γnwn, then 1, . . . ,n is optimal by Theorem 2.19 and so asequence in non-increasing order of γiwi minimizes T EC(π). This proves theoptimality result under condition (1).

Furthermore, when condition (2) holds, note that a convex g implies that

γimi ≺cv γ jm j ⇐⇒ Hi j(x) is concave ⇐⇒ wiγi ≤ wjγ j. (2.65)

Combining (2.65) with (2.64), the result under condition (2) follows.

Remark 2.5. Theorem 2.20 reveals an interesting fact regarding the problem ofminimizing the expected discounted cost function E

[∑wj(1− e−rCj )

]. This prob-

lem is a special case of the model in Theorem 2.20 by setting Di = 0 and tak-ing g(t) = 1− e−rt . It is well-known that the sequence with weighted discountedshortest expected processing time first rule (WDSEPT) is optimal for this problem.

Page 102: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

92 2 Regular Performance Measures

Theorem 2.20 says that when the distributions of processing times are of compound-type, the WDSEPT rule reduces to the WSEPT (weighted shortest expected process-ing time first) rule.

We next consider due dates with different distributions. Let Qi(x) = Pr(Di ≤ x)denote the cdf of due date Di, i = 1, . . . ,n.

Theorem 2.21. Let P1, . . . ,Pn be non-negative random variables and g(·) non-decreasing.

(a) The sequence 1,2, . . . ,n minimizes the T EC(π) in (2.16) if

γ1w1Q1 ≻inc γ2w2Q2 ≻inc · · ·≻inc γnwnQn. (2.66)

(b) If in addition, g(·) is also a convex (concave) function, then the sequence1,2, . . . ,n (n,n− 1, . . .,2,1) is optimal if

γ1w1Q1(x)≥ γ2w2Q2(x)≥ · · ·≥ γnwnQn(x). (2.67)

Proof. Let fi(t) = wig(t −Di) then

mi(t) = wiE[g(t −Di)] = wi

∫ ∞

0g(t − x)dQi(x),

which yields

Hi j(t) = γimi(t)− γ jm j(t) =∫ ∞

0g(t − x)d[γiwiQi(x)− γ jw jQ j(x)]. (2.68)

For i > j, since γiwiQi ≺inc γ jw jQ j, γiwiQi(x)− γ jw jQ j(x) is nonincreasing inx. Hence Hi j(t) is nonincreasing by (2.68) for g(t − x) is nondecreasing in t,which is equivalent to γimi ≺inc γ jm j. Part (a) of the theorem then follows fromTheorem 2.19.

We now turn to part (b). When g(x) is convex, it has finite right derivative,denoted as g′+(x), at every point x. Moreover, the convexity of g implies

g(t − x)− g(s− x)≥ (t − s)g′+(s− x)≥ 0,

which in turn implies that g(t − x)−g(s− x) is bounded from below with respect tox for arbitrary t > s. Hence (2.68) can be rewritten as

Hi j(t)−Hi j(s) =−∫ ∞

0[γiwiQi(x)− γ jw jQ j(x)]d[g(s− x)− g(t− x)].

As g is convex, g(t − x)− g(s − x) is non-increasing in x. If γiwiQi ≤ γ jw jQ j,then Hi j(t)−Hi j(s) ≤ 0 for t > s, i.e., γimi ≺inc γ jm j. Hence by Theorem 2.19,1,2, . . . ,n is optimal when g is convex. Similarly, if g is concave, then g(t − x)

Page 103: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

2.6 Compound-Type Distributions 93

−g(s − x) is non-decreasing in x, so that Hi j(t)− Hi j(s) ≥ 0 for t > s. Thus byTheorem 2.19 again, n, . . . ,2,1 is optimal when g is concave.

Remark 2.6. Part (a) of Theorem 2.21 shows that the order ≻inc between γiwiQi(x)leads to an optimal policy. If the due dates are identically distributed, then the opti-mality condition reduces to γ1w1 ≥ γ2w2 ≥ · · · ≥ γnwn. To show an example of thedistributions of due dates such that Theorem 2.21 applies, let the due dates Di beexponentially distributed with rate λi, i = 1, . . . ,n. Suppose that λ1 ≤ λ2 ≤ · · · ≤ λnand γ1w1λ1 ≥ γ2w2λ2 ≥ · · ·≥ γnwiλn. Then for any i < j and t > s > 0, there existsξ ∈ (s, t) such that

γiwi[Qi(t)−Qi(s)]γ jw j [Q j(t)−Q j(s)]

=γiwiQ′

i(ξ )γ jw jQ′

j(ξ )=

γiwiλi

γ jw jλ je(λ j−λi)ξ ≥ 1.

Therefore i < j =⇒ γiwiQi ≻inc γ jw jQ j and so part (a) of Theorem 2.21 applies.

Remark 2.7. Part (b) of Theorem 2.21 indicates that when the cost function g isconvex (concave) and nondecreasing, the requirement for the order ≻inc reduces tothe point-wise order of the functions γiwiQi(x), which will further reduce to thestochastic order of the due dates Di when γiwi equal a common value for all jobs. Ifg is merely non-decreasing but not convex (concave), the pointwise order in (2.67)does not ensure the optimality of the sequence 1,2, . . . ,n (n, . . . ,2,1); see thefollowing example.

Example 2.16. Consider n= 2 and a discounted cost function g(x)= (1−e−x)Ix≥0.Let Pr(P1 > x) = Pr(P2 > x) = e−x, x ≥ 0. Then γ1 = γ2 = 1, and the density func-tion for P1 +P2 is given by xe−x, x > 0. The due dates D1 and D2 are deterministicd1 and d2 respectively. Let π1 = 1,2 and π2 = 2,1, and assume w1 = w2 = 1.Then for T EC(π) = ∑wiE[g(Pi − di)IPi>di],

T EC(π1) = E[g(P1 − d1)]+E[g(P1+P2 − d2)]

=∫ ∞

d1

(1− e−(x−d1))e−xdx+∫ ∞

d2

(1− e−(x−d1))xe−xdx

=12

e−d1 +14(3+ 2d2)e−d2

and similarly,

T EC(π2) =12

e−d2 +14(3+ 2d1)e−d1 .

Therefore,

TEC(π1)−TEC(π2) =14(1+ 2d2)e−d2 − 1

4(1+ 2d1)e−d1 . (2.69)

Page 104: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

94 2 Regular Performance Measures

It is easy to check that (1+ 2d)e−d is increasing in d < 0.5 and decreasing in d >0.5. Hence if d1 < d2 < 0.5, then (1+ 2d1)e−d1 < (1+ 2d2)e−d2 and so by (2.69),T EC(π1)> T EC(π2). On the other hand,

d1 < d2 =⇒ γ1w1Q1(x) = Ix≥d1 ≥ Ix≥d2 = γ2w2Q2(x).

Therefore, when d1 < d2 < 0.5, condition (2.67) is satisfied but π1 = 1,2 is notoptimal. Similarly when 0.5 < d1 < d2, (2.67) holds but π2 = 2,1 is not optimal.

We present the next theorem without proof, which is similar to that ofTheorem 2.21.

Theorem 2.22. Let P1, . . . ,Pn be real-valued random variables with nonnegativemeans and g non-decreasing. Then the sequence 1,2, . . . ,n minimizes the T EC(π)if either

(1) g is convex and γ1w1Q1 ≻inc γ2w2Q2 ≻inc · · ·≻inc γnwnQn, or

(2) g is differentiable with convex derivative g′(x) and γ1w1Q1(x)≥ · · ·≥γnwnQn(x).

Remark 2.8. The previous results can be easily extended to the case with prece-dence constraints in the form of nonpreemptive chains. Suppose that jobs 1, . . . ,nare divided into m nonpreemptive chains ui = i1, . . . , iki, i = 1, . . . ,m. Each chainis subject to precedence constraints in the sense that jobs within a chain must beprocessed according to a specified order. The chains are nonpreemptive in the sensethat once the machine starts to process a job in a chain, it cannot process any jobin a different chain until all jobs in the current chain are finished. The schedulingproblem then becomes one of ordering the m chains u1, . . . ,um. We can extend theresults in Theorems 2.19–2.22 straightforwardly to the scheduling problems undersuch precedence constraints as follows.

For two jobs i and j, we define i ≺ j in accordance with each situation inTheorems 2.19–2.22 that leads to an optimal sequence. For example, in the caseof Theorem 2.19, we define i ≺ j if γimi ≻inc γ jm j in part (a); or γimi ≻inc γ jm j andγimi ≻cv γ jm j in part (b), and so on. If the chains u1, . . . ,um can be ordered suchthat u1 ≺ u2 ≺ · · · ≺ um, where ui ≺ u j ⇐⇒ k ≺ l for all k ∈ ui and l ∈ u j, then theoptimal sequence of the chains is in the order of (u1,u2 . . . ,um). For example, whenthe processing times are nonnegative, if i < j =⇒ γkmk ≻inc γlml for all k ∈ ui andl ∈ u j, then the sequence (u1,u2 . . . ,um) is optimal. This extends the result in part (a)of Theorem 2.19. Similarly we can extend any other result in Theorems 2.19–2.22.

Page 105: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 3Irregular Performance Measures

In recent years, a main thrust of research in the scheduling field is to consider theso-called irregular performance measures, which involve minimization of both ear-liness and tardiness (E/T) costs of completing the jobs. Tardiness costs are common,which arise when a job misses its due date, whereas earliness costs represent, forexample, inventory and other administrative costs if a job is finished too early andhas to be stored before being delivered to customers. E/T scheduling problems arein fact largely motivated by the just-in-time concept in the manufacturing indus-try, which aims to finish jobs exactly at their due dates, neither earlier nor later. Inaddition to the manufacturing industry, E/T scheduling problems have applicationsin many other domains. An example is the scheduling of a set of astronomical ex-periments about an external event like the passing of a comet, where it is hopedthat the experiments will be carried out as close to the external event as possible.Another example is the scheduling of harvesting tasks under both the mature datesof the crops and the likely arrival of a disastrous event such as a severe typhoon.It is imperative to schedule the harvesting operations appropriately in such a sit-uation, in order to minimize the possible losses. Comprehensive reviews on E/Tscheduling can be found in Baker and Scudder (1990), Lauff and Werner (2004),and Hoogeveen (2005). For more recent work, see, e.g., Benmansour et al. (2012),Hino et al. (2005), Ronconi and Powell (2010), Wan and Yen (2009), Wu (2010),and the references therein.

This chapter covers stochastic scheduling problems with irregular performancemeasures. Section 3.1 is focused on models where both the earliness and tardinesscosts are functions of the completion time deviations from the due date. In Sect. 3.2,we consider the problem where the tardiness cost is a fixed charge once a job islate, whereas the earliness cost depends on the amount of completion time deviationfrom the due date. Section 3.3 addresses the completion time variance problem,a model that has been studied in the scheduling field for decades. We will showthat, a common structure of the optimal schedule for an E/T problem is a V-shapearound a due date. We will derive such properties for each model, characterize theanalytic optimal solutions when possible, and develop solution algorithms based onthe optimality properties. We will show that dynamic programming algorithms can

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 3,© Springer Science+Business Media New York 2014

95

Page 106: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

96 3 Irregular Performance Measures

usually be established based on V-shape properties. The expositions in this chapterare mainly based on Cai and Tu (1996) and Cai and Zhou (1997a, b, 1999, 2000)

3.1 Earliness/Tardiness Penalties

In this section, we consider the problem to minimize earliness/tardiness (E/T) penal-ties with random processing times and due dates. One of our main results is that anoptimal sequence must be V-shaped with respect to certain index set. This propertysubstantially reduces the number of sequences that are possibly optimal and servesas the basis to develop efficient algorithms.

Two types of distributions are considered for the processing times:

(i) Normal distributions; and

(ii) Exponential distributions.

3.1.1 Normal Processing Times

Assume that the processing times Pi follow normal distributions with means µiand variances σ2

i for job i, i = 1, . . . ,n. The variances are assumed proportional tothe means: σ2

i = aµi, i = 1, . . . ,n. This relationship holds when each job consistsof a large number of independent elementary tasks. A situation of this nature is thegroup technology environment where similar tasks are grouped together. Neverthe-less, even if the tasks follow different distributions, the proportional relationship stillapproximately hold under some mild conditions, see the Appendix of this chapterfor a justification.

As usual, we assume that the means µi are positive integers. Since the pro-cessing times are nonnegative in practice, a should be restricted such that the proba-bility of a processing time being negative is negligible. Technically, we require thata ≤ µi/4 for all i, which implies Pr(Pi < 0) = Φ(−µi/σi) = Φ(−

√µi/a) ≤ 0.05,

where Φ(·) denotes the standard normal cdf. Note that a similar restriction is im-posed on a in Sarin et al. (1991).

The objective function is formulated as the expectation of a weighted combina-tion of three types of penalties, namely, earliness, tardiness, and flowtime. Earlinessand tardiness are penalized with different weights, to reflect the reality that the na-ture of costs incurred by earliness and tardiness is inherently different in practice.The flowtime is included as an additional criterion to model the productivity of thesystem. This is a traditional measure, which provides an incentive to turn around or-ders rapidly if the machine concerned represents, say, a manufacturing company. Ina practical situation, it is natural that meeting the due dates of the external customersand increasing the efficiency of the internal system are both desirable. Specifically,the objective function is given by

Page 107: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 97

EET (π ,r) = E

[

∑Ci≤Di

α|Di −Ci|+ ∑Ci>Di

β |Ci − di|+n

∑i=1

γCi

], (3.1)

where

• π is the sequence to determine the order of processing the jobs;

• r is the time when the machine starts to process its first job, which is assumed tobe a nonnegative integer;

• Ci is the completion time of job i under (π ,r);

• Di is the due date of job i;

• α ≥ 0 is the unit earliness penalty;

• β ≥ 0 is the unit tardiness penalty; and

• γ ≥ 0 is the unit flow time penalty.

The problem is to find (π∗,r∗) that minimize EET (π ,r) with respect to (π ,r).

We assume that the due dates Di are random variables independent of the pro-cessing times Pi and follow a common distribution as that of a representativerandom variable D. Let FD(t) denote the cdf of D. We will restrict the value of r tobe selected from a certain interval [0, r], where r is an upper bound on r such that

Pr(r− x ≤ D < r)≤ Pr(r ≤ D ≤ r+ x) ∀x ≥ 0 and r ∈ [0, r], (3.2)

or equivalently,

Pr(D < r ≤ D+ x)≤ Pr(D− x ≤ r ≤ D) ∀x ≥ 0 and r ∈ [0, r]. (3.3)

Such a restriction is to avoid the situation that the machine is kept idle until a pointr such that the chance for the random due date to occur before r is greater than thatafter r, which is clearly unreasonable.

Notice that in most commonly used distributions, the value for r can be deter-mined analytically. For example, among various distributions for FD(t), the normaland uniform distributions may be considered as among the most common cases. Theuniform distribution arises naturally when there is little information about the duedate other than its range, while the normal distribution is suitable in the situationwhere there is an indication that the due date is around a known average value withcertain standard error. It is easy to see that for either normal or uniform distribution,r should be set to be the mean of D, namely, r = E[D], which will guarantee thatany r ∈ [0, r] satisfies (3.2). More analytical results on r for a wide range of due datedistributions can be found in Appendix. In general, one may always determine bynumerical computation the value for r that satisfies (3.2).

Without loss of generality, we assume that the jobs have been labeled so thatµ1 ≤ µ2 ≤ · · ·≤ µn.

Page 108: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

98 3 Irregular Performance Measures

Objective Function Development

We now present some alternative forms of the objective function, which are equiv-alent to (3.1) and will serve as the basis for our analysis and solution algorithms tobe developed.

Denote x+ = max(x,0). Then (3.1) can be written as

EET (π ,r) =n

∑i=1

E[α(Di −Ci)+ +β (Ci −Di)

+ + γCi]. (3.4)

For notational convenience we define following functions on (0,∞):

F(x) = (α +β )φ(x)+ x[(α +β )Φ(x)−α], (3.5)

f (x) = x∫ ∞

0F(

xa− t − r

x

)dFD(t)+ γ

(r+

x2

a

)(3.6)

and g(x) = f (√

ax), where

φ(x) = 1√2π

e−12 x2

and Φ(x) =1√2π

∫ x

−∞e−

12 y2

dy

are the density and cdf of the standard normal distribution, respectively.

Theorem 3.1. Let Bi(π) denote the set of jobs to be processed no later than job i inthe sequence π , and θi = ∑ j∈Bi(π) µ j. Then the objective function EET (π ,r) givenin (3.1) is equivalent to

EET (π ,r) =n

∑i=1

√aθi

∫ ∞

0F(

θi + r− t√aθi

)dFD(t)+ γ(r+θi) (3.7)

or

EET (π ,r) =n

∑i=1

f (√

aθi) =n

∑i=1

g(θi). (3.8)

Proof. Let X ∼ N(µ ,σ2). It is easy to calculate that for any real value c,

E[(X − c)+] = σφ(

µ − cσ

)+(µ − c)Φ

(µ − c

σ

)(3.9)

and

E[(c−X)+] = σφ(

µ − cσ

)− (µ − c)

[1−Φ

(µ − c

σ

)]. (3.10)

Page 109: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 99

By the properties of the normal distribution, Ci = r+∑ j∈Bi(π) p j ∼ N(r + θi,aθi).It follows from (3.9) to (3.10) that for each i, the i-th summand in (3.4) conditionalon Di = t is given by

E[α(Di −Ci)+ +β (Ci −Di)

++ γCi | Di = t]

= α√

aθiφ(

θi + r− t√aθi

)− (θi + r− t)

[1−Φ

(θi + r− t√

aθi

)]

+β√

aθiφ(

θi + r− t√aθi

)+(θi + r− t)Φ

(θi + r− t√

aθi

)+ γ(θi + r)

=√

aθiF(

θi + r− t√aθi

)+ γ(θi + r).

Thus by (3.4),

EET (π ,r) =n

∑i=1

E[E[α(Di −Ci)+ +β (Ci −Di)

+ + γCi | Di]]

=n

∑i=1

√aθi E

[F(

θi + r−Di√aθi

)]+ γ(θi + r).

This proves (3.7), while (3.8) follows immediately from (3.7) and the definitions off (x) and g(x).

Optimality Properties

We now establish some important properties of the optimal solutions to the problemformulated above, including a V-shape property. The concept of V-shaped sequenceswas firstly introduced by Eilon and Chowdhury (1977) in the context of the com-pletion time variance problem. A sequence is said to be V-shaped with respect toprocessing times if in the sequence the jobs before (after) the job with the shortestprocessing time are arranged in non-increasing (non-decreasing) order of processingtimes.

First, we present a lemma that is essential to derive the V-shaped structure ofoptimal sequences. The proof of this lemma needs the following definition.

Definition 3.1. A function is said to be V-shaped on an interval (a,b) if there existsa δ ∈ [a,b] such that the function is decreasing on (a,δ ) and increasing on (δ ,b).

Note that the above definition includes monotone functions as a special case,which occurs when the δ coincides with one of the end points of the interval.

Lemma 3.1. The function f (x) defined in (3.6) is V-shaped on the interval (√

aµ1,∞),or equivalently, g(x) = f (

√ax) is V-shaped on the interval (µ1,∞).

Page 110: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

100 3 Irregular Performance Measures

Proof. It is easy to calculate

F ′(t) = (α +β )φ(t)(−x)+ [(α +β )Φ(t)−α]+ x(α +β )φ(t) = (α +β )Φ(t)−α.

Let y = y(x, t) = x/a− (t − r)/x. Then by (3.6),

f (t) =∫ ∞

0xF(y(x, t))dFD(t)+ γ

(r+

x2

a

). (3.11)

It is easy to calculate

∂∂x

xF(y(x, t)) = F(y)+ xF ′(y)∂y∂x

= (α +β )φ(y)+ y[(α +β )Φ(y)−α]+ x[(α +β )Φ(y)−α]

(1a+

t − rx2

)

= (α +β )φ(y)+ 2xa[(α +β )Φ(y)−α] (3.12)

and similarly,

∂ 2

∂x2 xF(y(x, t)) = (α +β )φ(y)(

xa+

t − rx

)2 1x+

2a[(α +β )Φ(y)−α]. (3.13)

From (3.11) to (3.13) we obtain

f ′(t) =∫ ∞

0

(α +β )φ(y)+ 2x

a[(α +β )Φ(y)−α + γ]

dFD(t) (3.14)

and

f ′′(t) =∫ ∞

0

(α +β )φ(y)

(xa+

t − rx

)2 1x+

2a[(α +β )Φ(y)−α + γ]

dFD(t).

(3.15)

If x0 ∈ [√

aµ1,∞) satisfies f ′(x0) = 0, then by (3.14),

∫ ∞

0

2a[(α +β )Φ(y0)−α + γ]dFD(t) =−

∫ ∞

0

1x0(α +β )φ(y0)dFD(t)

where y0 = y(x0, t). Substituting this into (3.15) we get

f ′′(x0) =∫ ∞

0(α +β )φ(y0)

[(x0

a+

t − rx0

)2

− 1

]1x0

dFD(t). (3.16)

Page 111: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 101

Recall the assumption a ≤ µi/4 for each i. Hence x0 ≥ √aµ1 = 2a

√µ1/4a ≥ 2a

and(

x0

a+

t − rx0

)2

− 1 > 1 for t ≥ r.

It follows from (3.16) that

x0

α +β f ′′(x0) =∫ ∞

−∞φ(y0)

[(x0

a+

t − rx0

)2

− 1

]dFD(t)

>−∫

(−∞,r)φ(y0)dFD(t)+

[r,∞)φ(y0)dFD(t). (3.17)

Because φ(t) is symmetric about 0 and increasing on (−∞,0],

φ(y0) = φ(

x0

a− t − r

x0

)= φ

(t − rx0

− x0

a

)

is symmetric in t about s0 = r + x20/a > r and increasing on (−∞,s0]. Therefore,

φ(y0) as a function of t satisfies all the conditions on ϕ(·) in Lemma C of Cai andZhou (1997a). Furthermore, the assumption in (3.2) is equivalent to

[r−x,r)dFD(t)≤

[r,r+x]dFD(t) ∀x ≥ 0.

Thus an application of the integral inequality in Appendix shows that∫

(−∞,r)φ(y0)dFD(t)≤

[r,∞)φ(y0)dFD(t).

It then follows from (3.17) that f ′′(x0)> 0. In conclusion, for any x0 ≥√

aµ1 suchthat f ′(x0) = 0, we must have f ′′(x0) > 0. This means that f ′(t) has at most onezero point on [

√aµ1,∞), at which it is strictly increasing. As a result, f (t) has at

most one local minimum and no local maximum on [√

aµ1,∞), which implies thatf (t) is V-shaped on (

√aµ1,∞).

Now we can prove the V-shape of an optimal sequence.

Theorem 3.2. Given any starting time r, an optimal sequence which minimizesEET (π ,r) must be V-shaped with respect to µi.

Proof. By Lemma 3.1, g(x) is a strictly ‘quasiconvex’ function on [µ1,∞) in thesense similar to that of Federgruen and Mosheiov (1997). That is, g(x) satisfies thecondition g(y) < maxg(x),g(z) if µ1 ≤ x < y < z. Note that as θi ≥ µ1 for ev-ery i, we can restrict the domain of g(x) on [µ1,∞) without affecting the objectivefunction. The theorem then follows immediately from (3.8) and a standard argu-ment of switching neighboring jobs, which is similar to the proof of Theorem 1 of

Page 112: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

102 3 Irregular Performance Measures

Federgruen and Mosheiov (1997), with the only difference being that the relevantinequalities now hold strictly.

Additional optimality properties are given below:

Property 1. If ∫ ∞

0Φ(

µ1 + r− t√

aµ1

)dFD(t)≥

α − γα +β , (3.18)

then the SEPT (Shortest Expected Processing Time first) sequence is optimal.

Proof. If (3.19) holds, from (3.14) we see that

f ′(√

aµ1)>2a√

aµ1

[(α +β )

∫ ∞

0Φ(

µ1 + r− t√

aµ1

)dFD(t)− (α − γ)

]≥ 0.

This together with the fact that f ′(x) must have a positive slope at any of its zeropoint on [

√aµ1,∞) show that f ′(x) ≥ 0 on [

√aµ1,∞). Thus f (x) is an increas-

ing function on [√

aµ1,∞) and so g(x) is an increasing function on [µ1,∞). SinceEET (π ,r) = ∑i g(θi) and θi ≥ µi ≥ µ1 for each i, the SEPT sequence minimizesthe objective function.

Property 2. If α ≤ γ , then the optimal sequence must be in SEPT order and theoptimal starting time must be equal to zero.

Proof. Clearly (3.18) always holds when α ≤ γ . Hence by Property 1 the optimalsequence must be in SEPT order. Furthermore, for any given sequence π , by (3.7)we obtain

ddr

EET (π ,r) =n

∑i=1

√aθi

∫ ∞

0F ′(

θi + r− t√aθi

)1√aθi

dFD(t)+ nγ

=n

∑i=1

∫ ∞

0

[(α +β )Φ

(θi + r− t√

aθi

)−α

]dFD(t)+ nγ

=n

∑i=1

[(α +β )

∫ ∞

0Φ(

θi + r− t√aθi

)dFD(t)+ (γ −α)

], (3.19)

which is positive when α ≤ γ . Thus EET (π ,r) is increasing in r and so r = 0achieves the minimum of EET (π ,r).

Because of Property 2, we need only deal with the case of α > γ in the followingtwo properties.

Page 113: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 103

Property 3. If the due dates equal a deterministic common value d, then the SEPTsequence is optimal for

d ≤ µ1 +√

aµ1Φ−1(

β + γα +β

). (3.20)

Proof. When (3.19) holds,

µ1 + r− d√

aµ1≥ µ1 − d

√aµ1

≥−Φ−1(

β + γα +β

)= Φ−1

(α − γα +β

),

hence

Φ(

µ1 + r− d√

aµ1

)≥ α − γ

α +β .

This shows that (3.18) holds and so the SEPT sequence is optimal by Property 1.

Note that if α ≤ β +2γ , then Φ−1((β + γ)/(α +β ))≥ 0 and so (3.20) holds ford ≤ µ1. Thus, as a special case for Property 3, when α ≤ β +2γ , the SEPT sequenceis optimal if the due dates equal a deterministic common value d ≤ µ1.

Property 4. Define

H(π ,r) =n

∑i=1

∫ ∞

0Φ(

θi + r− t√aθi

)dFD(t). (3.21)

Let πS and πL denote the SEPT and LEPT sequences respectively, and rS, rL satisfy

H(πS,rS) =n(α − γ)

α +β , H(πL,rL) =n(α − γ)

α +β , (3.22)

(which exist because H(π ,r) increases from 0 to n as r moves from −∞ to +∞).If (π∗,r∗) is the optimal solution, then

maxrL,0≤ r∗ ≤ minrS,d. (3.23)

In particular, if rS ≤ 0, or equivalently if

H(πS,0)≥n(α − γ)

α +β , (3.24)

then r∗ = 0.

Proof. It is clear from (3.21) that H(π ,r) is strictly increasing in r. Moreover, as(θi + r− d)/

√aθi is increasing in θi and Φ is an increasing function, H(π ,r) has

the form of ∑i h(θi) with h(·) being an increasing function. Hence

H(πS,r)≤ H(π ,r)≤ H(πL,r) for all π and r.

Page 114: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

104 3 Irregular Performance Measures

It follows that for any r > rS,

H(π∗,r)≥ H(πS,r)> H(πS,rS) =n(α − γ)

α +β .

This together with (3.19) and (3.21) yields

ddr

EET (π∗,r) = (α +β )[

H(π∗,r)− n(α − γ)α +β

]> 0 for r > rS.

Consequently EET (π∗,r) is strictly increasing in r > rS, and so r∗ > rS would leadto EET (π∗,r∗)> EET (π∗,rS), which contradicts the assumption that (π∗,r∗) min-imizes the EET . Thus r∗ ≤ rS. Similarly we can show that r∗ ≥ rL. As r∗ ∈ [0,d],(3.23) is thus proved. Furthermore, since H(π∗,r) is strictly increasing in r, (3.24)holds if and only if rS ≤ 0, and if this happens, by (3.23) r∗ must equal 0.

Remark 3.1. Note that rS and rL are computable using (3.22) since the sequencesSEPT and LEPT are known sequences. This means that, given any problem instance,we can always compute a range for the optimal r∗ from (3.23).

Algorithms

Based on the V-shape of the optimal sequence, we provide two dynamic program-ming algorithms to compute the solution. The first algorithm can find an exactoptimal solution through enumerating r and using a dynamic programming pro-cedure to determine the optimal sequence under each r. The second algorithm isan approximate approach that employes a Fibonacci method to search for r, whichis faster than the first algorithm, although it has no theoretical guarantee to find anexact optimum. Since the optimal solution has been known to be π∗ = πSEPT andr∗ = 0 when any of the conditions of Properties 1–3 is satisfied, in the sequel weonly consider the case where none of these conditions is met.

An exact algorithm: Let Si = 1,2, . . . , i. Then, according to Theorem 3.2, jobi should be sequenced either the first or the last among all jobs in Si. Let hi(θ ,r)be the contribution of the jobs in Si towards the overall objective function (3.7),given that θ is the sum of mean processing times of the jobs sequenced before thejobs in Si. Then, we have the following dynamic programming algorithm, whereΘi = ∑ j∈Si

µ j.

Algorithm 3.1

• Evaluate rL and rS according to (3.21)–(3.23). Take rmin =max0,rL and rmax =mind,rS, and let ℜ be the set of integers contained in [rmin,rmax].

Page 115: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 105

• For each r ∈ ℜ and i = 1,2, . . . ,n, compute:

hi(θ ,r) = ming(θ + µi)+ hi−1(θ + µi,r); g(θ +Θi)+ hi−1(θ ,r) . (3.25)

for θ = 0,1, . . . ,Θn −Θi, subject to the boundary condition:

h0(θ ,r) = 0, ∀θ . (3.26)

• Optimal r∗ is the one that satisfies hn(0,r∗)≤ hn(0,r), ∀r ∈ ℜ, and optimal π∗ isthe sequence constructed by a backtracking procedure that achieves hn(0,r∗).

It is easy to see that the time requirement to compute hi(θ ,r) for all r, i and θ isbounded above by O(nΘn|ℜ|), where |ℜ| is the cardinality of the set ℜ. The timecomplexity of the algorithm reduces to O(nΘn) when (3.24) is satisfied. In general,it is bounded above by O(ndΘn) according to (3.23).

An approximate algorithm: Given any r, we can find an optimal sequence π thatminimizes EET (π ,r). Let G(r) = minπ EET (π ,r). The problem now becomes tofinding an optimal r∗ to minimize G(r). We can search for this point by the Fibon-naci method (cf. Luenberger (1984)). We set, in the Fibonnaci search, the desiredwidth of the final interval of uncertainty to be 1. This is sufficient since the finalsolution r f required by the problem should be an integer.

Algorithm 3.2

• Calculate the total number N of trial solutions as the smallest integer such thatFN > |ℜ|, where FN is a Fibonacci number calculated from Fi = Fi−1+Fi−2 withF0 = F1 = 1.

• Apply the Fibonnaci method (see Appendix for details) to search r, in whichG(rk) = hn(0,rk) with hn(0,rk) computed according to (3.25) and (3.26).

• Final solution: r f = ⌈rb⌉ and π f = the sequence that minimizes EET (π ,r f ),where rb is the end point of the final interval of uncertainty.

Clearly the algorithm above would be an optimal one if G(r) is a unimodal func-tion of r. Unfortunately this is not true, since G(r) may have multiple local minima.Although Algorithm 3.2 does not have a theoretical guarantee to find an optimalsolution, it needs less computing time as compared with Algorithm 3.1. It is easyto show that the time requirement for the Fibonnaci method is O(log |ℜ|). Thus thetime complexity of Algorithm 3.2 is O(nΘn log |ℜ|). Note that |ℜ| ≤ d. Hence anupper bound on the time complexity of Algorithm 3.2 is O(nΘn logd).

Page 116: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

106 3 Irregular Performance Measures

Computational Results

Computational experiments are carried out to evaluate the performance of the alg-orithms proposed above, which are designed as follows. The processing times arenormally distributed with mean µi and variance aµi for job i. The means µi are inte-gers randomly and independently drawn from the discrete uniform distribution over[10,100]. The parameter a is then randomly generated from U [0,µ1/4] (recall thata ≤ µi/4 for all i under our assumptions). The parameters α , β and γ are generatedrandomly from U [0,1]. Because, according to Property 2, the optimal solution hasbeen known for any problem with α ≤ γ , we only consider problems with α ≤ γ .The parameters α , β and γ are normalized to α +β + γ = 1.

Two cases for the due dates di are considered: deterministic due dates di = d0and random di ∼ U [d0,d0 + 10]. In both cases, d0 = bΘn, where b is a parameterto control the tightness of the due dates (the bigger the value for b, the looser thedue dates), randomly generated from a uniform U [0,2]. The parameters are taken asα = 0.3, β = 0.5, γ = 0.2, and a = 0.2.

We solve problems with n ranging from 10 to 100. For each problem, the ratio

Z =EET 2 −EET1

EET 1

is used to measure the quality of the solution obtained by Algorithm 3.2, whereEET 1 and EET 2 are the objective values found by Algorithms 3.1 and 3.2,respectively.

For each n = 10,20, . . . ,100, five problem instances are generated and solved.In Tables 3.1 and 3.2 we report the maximum Z among the five instances solved[denoted as max(Z)], the average Z [avg(Z)], and the average CPU times in seconds[avg(CPU1) and avg(CPU2)] required by Algorithm 3.1 and 3.2, respectively, insolving the five problem instances.

We can see from Tables 3.1 and 3.2 that Algorithm 3.2 is able to find, quickly,solutions that are very close to the true optima. As expected, in the experiments weobserved that in the cases where the optimal starting times r∗ are much greater thanzero, Algorithm 3.2 performed particularly well – in some cases it was much fasterthan Algorithm 3.1, while the sacrifice of solution quality was only marginal.

3.1.2 Exponential Processing Times

When the processing times are exponentially distributed, some elegant analyticalresults can be derived. Assume in this subsection that the processing times P1, . . . ,Pnare exponentially distributed with means µ1, . . . ,µn, respectively. We consider three

Page 117: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 107

Table 3.1 Comparison of Algorithms 3.1 and 3.2 (determin-istic due dates)

n Max(Z) (%) Avg(Z) (%) Avg(CPU1) Avg(CPU2)10 0.00113 0.00023 3.67 0.3520 0.00068 0.00023 34.79 1.8330 0.00018 0.00004 94.10 3.9740 0.00000 0.00000 160.44 5.2850 0.00004 0.00001 410.69 8.4860 0.00000 0.00000 97.14 4.9470 0.00003 0.00001 1100.27 19.5780 0.00005 0.00002 2612.86 28.1690 0.00006 0.00001 2328.15 34.69

100 0.00004 0.00001 4275.19 64.45

Table 3.2 Comparison of Algorithms 3.1 and 3.2 (stochasticdue dates)

n Max(Z) (%) Avg(Z) (%) Avg(CPU1) Avg(CPU2)10 0.02736 0.00547 19.73 1.9320 0.00006 0.00001 82.19 7.8530 0.00020 0.00006 482.08 23.6340 0.00003 0.00001 804.33 30.5850 0.00007 0.00002 1710.13 49.9060 0.00006 0.00002 2142.90 53.4270 0.00002 0.00000 95.24 18.3680 0.00002 0.00000 3869.26 65.6990 0.00002 0.00000 4863.40 122.58

100 0.00005 0.00002 4956.24 72.12

types of E/T costs: symmetric quadratic, asymmetric quadratic, and asymmetriclinear cost functions.

Symmetric Quadratic Cost Function

The total expected cost with a symmetric quadratic cost function is given by

T EC(π) = E

[n

∑i=1

wi(Ci −Di)2

]=

n

∑i=1

wiE[(Ci −Di)2] (3.27)

with w1 + · · ·+wn = 1.

The problem to minimize the total expected cost in (3.27) is NP-complete evenif all jobs are subject to a common due date D (cf. Cai et al. 2000). The followingtheorem establishes a V-shape for the optimal sequence.

Page 118: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

108 3 Irregular Performance Measures

Theorem 3.3. If the due dates D1, . . . ,Dn have a common distribution, independentof Pi, then an optimal sequence to minimize TEC(π) in (3.27) is V-shaped withrespect to µi/wi.

Proof. By the exponential distribution of Pj we have E[Pj] = µ j and E[P2j ] = 2µ2

j .Hence

E[Ci] = ∑j∈Bi

E[Pj] = ∑j∈Bi

µ j, (3.28)

where Bi = Bi(π) is the set of jobs scheduled no later than job i under sequence π ,and

E[C2i ] = E

[

∑j∈Bi

P2j + ∑

j,k∈Bi, j =kPjPk

]= ∑

j∈Bi

E[P2j ]+ ∑

j,k∈Bi, j =kE[Pj]E[Pk]

= ∑j∈Bi

E[P2j ]+

(

∑j∈Bi

E[Pj]

)2

− ∑j∈Bi

(E[Pj])2

= ∑j∈Bi

2µ2j +

(

∑j∈Bi

µ j

)2

− ∑j∈Bi

µ2j = ∑

j∈Bi

µ2j +

(

∑j∈Bi

µ j

)2

. (3.29)

Let D denote a representative of the due dates Di. Then by (3.28)–(3.29) andthe independence between Di and Pi, the objective function in (3.27) can beexpressed by

TEC(π) =n

∑i=1

wi(E[C2

i ]− 2E[D]E[Ci]+E[D2])

=n

∑i=1

wi

⎧⎨

⎩∑j∈Bi

µ2j +

(

∑j∈Bi

µ j

)2⎫⎬

⎭− 2E[D]n

∑i=1

wi ∑j∈Bi

µ j +E[D2]

=V1 +V2 − 2E[D]V3+E[D2], (3.30)

where Vl =Vl(π), l = 1,2,3, are defined by

V1 =n

∑i=1

wi ∑j∈Bi

µ2j , V2 =

n

∑i=1

wi

(

∑j∈Bi

µ j

)2

, V3 =n

∑i=1

wi ∑j∈Bi

µ j. (3.31)

Given two jobs i and j, let π = . . . , i, j, . . . and π ′ = . . . , j, i, . . . denote twosequences which are identical except that the order of jobs i and j is switched.Define B∗ = Bi(π)−i= Bi(π ′)− j to be the set of jobs sequenced before job iunder π (or before job j under π ′), and write V ′

l =Vl(π ′), l = 1,2,3. Then by (3.31),

Page 119: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 109

V1 −V ′1 = wi

(

∑k∈B∗

µ2k + µ2

i

)+wj

(

∑k∈B∗

µ2k + µ2

i + µ2j

)

−wj

(

∑k∈B∗

µ2k + µ2

j

)−wi

(

∑k∈B∗

µ2k + µ2

j + µ2i

)

=−wiµ2j +wjµ2

i ,

V2 −V ′2 = wi

(

∑k∈B∗

µk + µi

)2

+wj

(

∑k∈B∗

µk + µi + µ j

)2

−wj

(

∑k∈B∗

µk + µ j

)2

−wi

(

∑k∈B∗

µk + µ j + µi

)2

=−wiµ j

(2 ∑

k∈B∗µk + 2µi+ µ j

)+wjµi

(2 ∑

k∈B∗µk + 2µ j + µi

),

and V3 −V ′3 =−wiµ j +wjµi. Substituting these into (3.30), we obtain

T EC(π)−TEC(π ′) = 2(wjµi −wiµ j)

(∑

k∈B∗µk + µi + µ j

)−E[D]

(3.32)

Let π be a sequence that is not V-shaped with respect to µi/wi. Without loss ofgenerality we can assume that π = 1,2, . . . ,n. Then there are three consecutivejobs i, i+ 1 and i+ 2 under π such that µi/wi < µi+1/wi+1 > µi+2/wi+2. Thus

wi+1µi −wiµi+1 < 0 and wi+2µi+1 −wi+1µi+2 > 0 (3.33)

Let π ′ denote the sequence which switches jobs i and i+1 in π , and π ′′ the sequencewhich switches jobs i+ 1 and i + 2 in π . Therefore, π = . . . , i, i + 1, i + 2, . . .,π ′ = . . . , i+ 1, i, i+ 2, . . . and π ′′ = . . . , i, i+ 2, i+ 1, . . .. By (3.33),

T EC(π)−TEC(π ′) = 2(wi+1µi −wiµi+1)

i+1

∑k=1

µk −E[D]

= 2(wi+1µi −wiµi+1)(Ai+1 +B) (3.34)

where for m = 1,2, . . . ,n,

Am =m

∑k=1

µk and B =−E[D]. (3.35)

Similarly,

T EC(π)−TEC(π ′′) = 2(wi+2µi+1 −wi+1µi+2)(Ai+2 +B). (3.36)

Page 120: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

110 3 Irregular Performance Measures

If Ai+1 +B < 0, then by (3.33) and (3.34),

TEC(π)−TEC(π ′) = 2(wi+1µi −wiµi+1)(Ai+1 +B)> 0.

If Ai+1 +B ≥ 0, then by (3.35),

Ai+2 +B = Ai+1 + µi+2 +B > Ai+1 +B ≥ 0

and so by (3.33) and (3.36),

TEC(π)−TEC(π ′′) = 2(wi+2µi+1 −wi+1µi+2)(Ai+2 +B)> 0.

In either case, π cannot be an optimal sequence. Thus an optimal sequence must beV-shaped with respect to µi/wi.

Asymmetric Quadratic Cost Function

The total expected cost with asymmetric quadratic earliness and tardiness penaltiesis given by

TEC(π) = E

[α ∑

Ci<Di

wi(Di −Ci)2 +β ∑

Ci>Di

wi(Ci −Di)2

]

=n

∑i=1

wi

αE[(Di −Ci)

2ICi<Di]+β E

[(Ci −Di)

2ICi>Di]

, (3.37)

where α and β represent the unit cost of earliness and tardiness, respectively. Whenα = β , (3.37) reduces to the symmetric case in (3.27). In the asymmetric case, wefurther assume that the due dates D1, . . . ,Dn are exponentially distributed with acommon mean 1/δ , hence E[D] = 1/δ and E[D2] = 2/δ 2. Since Di is independentof Pi, the exponential distributions of Di and Pi yield

Pr(Di > Pi) = E[Pr(Di > Pi|Pi)] = E[e−δPi ] =1

1+ µiδ(3.38)

and so

Pr(Di >Ci) = ∏k∈Bi

Pr(Di > Pi) = ∏k∈Bi

E[e−δPk ] = ∏k∈Bi

fk, (3.39)

where

fk = E[e−δPk ] =1

1+ µkδ and 1− fi =µiδ

1+ µiδ= δ µi fi. (3.40)

Page 121: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 111

Consequently,

E[(Di −Ci)2ICi<Di] = E[E[(Di −Ci)

2ICi<Di|Ci] = E[∫ ∞

Ci

(x−Ci)2δe−δxdx

]

= E[∫ ∞

0y2δe−δ (y+Ci)dy

]= E

[e−δCi

2δ 2

]

=2

δ 2 Pr(Di >Ci) =2

δ 2 ∏k∈Bi

fk. (3.41)

Let

V4 =V4(π) =n

∑i=1

wi ∏k∈Bi

fk. (3.42)

Then by (3.41) together with the result for the symmetric case in (3.30), we canrewrite (3.37) as

T EC(π) = (α −β )n

∑i=1

wiE[(Di −Ci)

2ICi<Di]+β

n

∑i=1

wiE[(Ci −Di)

2]

= β(

V1 +V2−2δ V3 +

2δ 2

)+

2δ 2 (α −β )V4 (3.43)

with V1, V2, V3 defined in (3.31) and V4 in (3.42).

Based on (3.43) we can prove the next theorem for the asymmetric case.

Theorem 3.4. If the due dates D1, . . . ,Dn are exponentially distributed with a com-mon mean 1/δ , then an optimal sequence π∗ that minimizes TEC(π) in (3.43) isV-shaped with respect to µi/wi.

Proof. Let π = . . . , i, j, . . ., π ′ = . . . , j, i, . . ., B∗ = Bi(π)− i= Bi(π ′)− j,and V ′

4 =V4(π ′). Then by (3.40),

V4 −V ′4 = wi fi ∏

k∈B∗fk +wj fi f j ∏

k∈B∗fk −wj f j ∏

k∈B∗fk −wi fi f j ∏

k∈B∗fk

= [wi fi(1− f j)−wj f j(1− fi)] ∏k∈B∗

fk

= [wi fiδ µ j f j −wj f jδ µi fi] ∏k∈B∗

fk = (wiµ j −wjµi)δ fi f j ∏k∈B∗

fk. (3.44)

Combining (3.43) and (3.44) with the result for the symmetric case in (3.32), similarto (3.34) and (3.36) with π = . . . , i, i+1, i+2, . . ., π ′ = . . . , i+1, i, i+2, . . . andπ ′′ = . . . , i, i+ 2, i+ 1, . . ., we obtain

T EC(π)−TEC(π ′) = 2(wi+1µi −wiµi+1)(Ai+1 +B) (3.45)

Page 122: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

112 3 Irregular Performance Measures

and

T EC(π)−TEC(π ′′) = 2(wi+2µi+1 −wi+1µi+2)(Ai+2 +B), (3.46)

where

Am = βm

∑k=1

µk −1

δ 2 (α −β )δm

∏k=1

fk, m ∈ 1, . . . ,n (3.47)

and B = 1/δ . It follows from (3.40) and (3.47) that

Am+1 −Am = β µm+1 −α −β

δ 2 δ ( fm+1 − 1)m

∏k=1

fk = β µm+1 −β −α

δ 2 δ 2µm+1

m+1

∏k=1

fk

≥ β µm+1 −β µm+1

m+1

∏k=1

fk ≥ 0,

where one of the inequalities must be strict unless both α and β are equal to zero,which of course is excluded. As a result, Am is strictly increasing in m.

Suppose that (3.33) holds. If Ai+1 + B < 0, then TEC(π)− T EC(π ′) > 0 by(3.45). If Ai+1 +B ≥ 0, then, since Am is increasing in m, Ai+2 +B > Ai+1 +B ≥ 0and so by (3.46), T EC(π)−TEC(π ′′)> 0. In either case, π cannot minimize T EC.Thus an optimal sequence must be V-shaped with respect to µi/wi.

Asymmetric Linear Cost Function

The total expected cost with asymmetric linear earliness-tardiness penalties is

T EC(π) = E[

∑Ci<Di

αi(Di −Ci)+ ∑Ci(π)>Di

βi(Ci(π)−Di)

]

=n

∑i=1

αiE[(D−Ci)ICi<D]+βiE[(Ci −D)ICi>D]

, (3.48)

where αi and βi are the unit earliness and tardiness costs, respectively, for job i.

Note that the setting of unit costs αi and βi is more general than that for theasymmetric quadratic costs in (3.37), where the weighted unit costs αwi and β wihave a constant ratio β/α , whereas βi/αi may vary with i.

We assume the same situations of Pi and Di as for the asymmetric quadraticcosts. Then by similar arguments leading to (3.43), the total expected cost in (3.48)can be expressed as

T EC(π) =n

∑i=1

βi ∑k∈Bi

µk +n

∑i=1

(αi +βi)1δ ∏

k∈Bi

fk −1δ

n

∑i=1

βi. (3.49)

Page 123: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 113

The V-shape of the optimal sequence to minimize TEC(π) in (3.48) or (3.49) isstated in the next theorem.

Theorem 3.5. Define

γi j =

(α j

µ j− αi

µi

)(β j

µ j− βi

µi

)−1

if β jµi = βiµ j (3.50)

(γi j need not be defined if β jµi = βiµ j). If γi j satisfy

1+ γ jk < (1+ δ µk)(1+ γi j) (3.51)

for all distinct i, j,k ∈ 1, . . . ,n such that γ jk and γi j are defined, then an optimalsequence π∗ that minimizes T EC(π) is V-shaped with respect to µi/βi.

Proof. For π = (. . . , i, j, . . . ) and π ′ = (. . . , j, i, . . . ), (3.49), (3.50) and (3.40) give

T EC(π)−TEC(π ′)

= (β jµi −βiµ j)

1− δ fi f j

δ ∏k∈B∗

fk

− (α jµi −αiµ j)

δ fi f j

δ ∏k∈B∗

fk

= µiµ j

(β j

µ j− βi

µi

)1− fi f j ∏

k∈B∗fk

− µiµ j

(α j

µ j− αi

µi

)fi f j ∏

k∈B∗fk

= µiµ j

(β j

µ j− βi

µi

)1− (1+ γi j) fi f j ∏

k∈B∗fk

. (3.52)

Consequently, given π = . . . , i, i + 1, i + 2, . . ., π ′ = . . . , i + 1, i, i + 2, . . . andπ ′′ = . . . , i, i+ 2, i+ 1, . . ., we have

T EC(π)−TEC(π ′) = µiµi+1

(βi+1

µi+1− βi

µi

)Ai (3.53)

and

TEC(π)−TEC(π ′′) = µi+1µi+2

(βi+2

µi+2− βi+1

µi+1

)Ai+1, (3.54)

where

Ai = 1− (1+ γi,i+1)i+1

∏k=1

fk. (3.55)

Suppose that

µi

βi<

µi+1

βi+1>

µi+2

βi+2or

βi

µi>

βi+1

µi+1<

βi+2

µi+2. (3.56)

Page 124: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

114 3 Irregular Performance Measures

If Ai < 0, then (3.53) and (3.56) imply T EC(π)−TEC(π ′)> 0. If Ai ≥ 0, then by(3.51) and (3.55),

1+ γi,i+1 >1+ γi+1,i+2

1+ δ µi+2= (1+ γi+1,i+2) fi+2 =⇒

Ai+1 = 1− (1+ γi+1,i+2)i+2

∏k=1

fk = 1− (1+ γi+1,i+2) fi+2

i+1

∏k=1

fk

> 1− (1+ γi,i+1)i+1

∏k=1

fk = Ai ≥ 0,

which implies T EC(π)−TEC(π ′′) > 0 by (3.54) and (3.56). Thus π cannot min-imize T EC if (3.56) holds. Consequently, an optimal sequence must be V-shapedwith respect to µi/βi.

Theorem 3.5 covers a quite wide range cases for αi and βi. In particular, if wetake αi = αwi and βi = β wi, then by (3.50),

γi j =

(αwj

µ j− αwi

µi

)(β wj

µ j− β wi

µi

)−1

=αβ

is a constant, hence (3.51) holds trivially. In this case, the optimal sequence π∗ inTheorem 3.5 is identical to that in Theorem 3.4 for asymmetric quadratic costs.More cases covered by Theorem 3.5 are discussed below.

Case 1. α j/µ j −αi/µi and β j/µ j −βi/µi are proportional, that is,(

α j

µ j− αi

µi

)= K

(β j

µ j− βi

µi

)∀i, j = 1, . . . ,n (3.57)

for some constant K. Then γi j ≡ K. When K > 0, condition (3.51) holds obviously,and αi/µi have the same order as βi/µi. Hence by Theorem 3.5, the optimalsequence is V-shaped with respect to both αi/µi and βi/µi. A special case of(3.57) is when αi = αwi and βi = β wi as mentioned above.

If K ≤ 0, then αi/µi and βi/µi are in opposite orders. In such a case, ananalytical optimal sequence exists, which will be presented in Theorem 3.6 below.

Case 2. α j/µ j −αi/µi and β j/µ j −βi/µi are close to each other in the sensethat (

α j

µ j− αi

µi

)= (1+ εi j)

(β j

µ j− βi

µi

), i, j = 1, . . . ,n, (3.58)

where |εi j |≤ ε < 2 for all i, j. Then γi j = 1+ εi j and so

1+ γ jk

1+ γi j=

2+ ε jk

2+ εi j≤ 2+ ε

2− ε = 1+2ε

2− ε < 1+ δ µk holds if ε <2δ µk

2+ δ µk.

Page 125: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.1 Earliness/Tardiness Penalties 115

Thus if (3.58) holds with |εi j|≤ ε < 2δ µmin/(2+δ µmin), where µmin =min1≤i≤n µi,then condition (3.51) is satisfied and so an optimal sequence to minimize T EC(π)is V-shaped with respect to µi/βi.

The next two theorems identify the situations in which an analytical optimalsequence exists.

Theorem 3.6. If µi/βi and µi/αi have opposite orders, then a sequence in non-decreasing order of µi/βi, or in nonincreasing order of µi/αi, is optimal tominimize T EC(π).

Proof. Let β j/µ j ≥ βi/µi and α j/µ j ≤ αi/µi. Then γi j ≤ 0. It follows from (3.52)that T EC(π)−T EC(π ′) ≥ 0. This shows that T EC(π) ≥ T EC(π ′) if and only ifβ j/µ j ≥ βi/µi, or µ j/β j ≤ µi/βi. The theorem then follows.

Theorem 3.7. Assume µ1 ≤ µ2 ≤ · · ·≤ µn.

(i) Let b = δ (1+ δ µ1)(1+ δ µ2)/δ − 1. If∣∣∣∣α j

µ j− αi

µi

∣∣∣∣≤ b

∣∣∣∣β j

µ j− βi

µi

∣∣∣∣ ∀i, j = 1, . . . ,n,

then a sequence in nondecreasing order of µi/βi is optimal.

(ii) Let b = ∏nk=1(1+ δ µk)− 1. If

∣∣∣∣α j

µ j− αi

µi

∣∣∣∣≥ b

∣∣∣∣β j

µ j− βi

µi

∣∣∣∣ , ∀i, j = 1, . . . ,n,

then a sequence in nonincreasing order of µi/αi is optimal.

Proof. Under the conditions of Part (i),

(1+ γi j) fi f j ∏k∈B∗

fk ≤ (1+ b) fi f j =(1+ δ µ1)(1+ δ µ2)

(1+ δ µi)(1+ δ µ j)≤ 1.

Hence by (3.52),

TEC(π)≥ T EC(π ′) ⇐⇒ β j

µ j≥ βi

µi⇐⇒ µ j

β j≤ µi

βi.

So an optimal sequence should schedule job j ahead of job i if µ j/β j ≤ µi/βi. Thisproves Part (i).

Next, by the definition of b and (3.40),

(1+ b) fi f j ∏k∈B∗

fk ≥ (1+ b)n

∏k=1

fk =n

∏k=1

(1+ δ µk) fk = 1. (3.59)

Page 126: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

116 3 Irregular Performance Measures

Let µ j/α j ≤ µi/αi or α j/µ j ≥ αi/µi. Then by (3.52) and (3.59) together with theconditions of the theorem,

TEC(π)−TEC(π ′)≤µiµ j

b

(α j

µ j− αi

µi

)1− (1+ b) fi f j ∏

k∈B∗fk

≤ 0.

Thus an optimal sequence should place job i ahead of job j if µ j/α j ≤ µi/αi. Thisproves Part (ii).

Note that if αi are proportional to µi, the condition in Part (i) of Theorem3.7 is trivially satisfied, so that an optimal sequence is in nonincreasing order ofµi/βi. Similarly, if βi are proportional to µi, then an optimal sequence is innonincreasing order of µi/αi

Algorithm to Compute Optimal V-Shaped Sequence

To demonstrate how an algorithm can be designed based on the V-shape of theoptimal sequence, we provide an algorithm based on Theorem 3.5. Without loss ofgenerality, we assume that the jobs have been numbered such that

µ1/β1 ≤ µ2/β2 ≤ · · ·≤ µn/βn.

Consider a set of jobs Si = 1,2, . . . , i. In a V-shaped sequence, job i will besequenced either the first or the last among all jobs in Si. Assume that π∗ is the bestV-shaped sequence and Si is the set of jobs sequenced before all jobs in Si under π∗,and let

Θi = ∑j∈Si

µ j and Ψi = ∏j∈Si

f j. (3.60)

Define hi(Θi,Ψi) to be the contribution of all jobs in Si to the cost function (3.49),given Θi and Ψi. Then, it is easy to see that the costs arising from sequencing job ias the first and the last job among all jobs in the set Si will be, respectively,

hai (Θi,Ψi) = hi−1(Θi + µi,Ψi fi)+βi(Θi + µi)+ (αi +βi)

1δ Ψi fi, (3.61)

and

hbi (Θi,Ψi) = hi−1(Θi,Ψi)+βi

(Θi + ∑

j∈Si

µ j

)+(αi +βi)

1δ Ψi ∏

j∈Si

f j . (3.62)

Page 127: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.2 Expected Cost of Earliness and Tardy Jobs 117

It follows from the principle of optimality of dynamic programming that, withha

i (Θi,Ψi) and hbi (Θi,Ψi) defined in (3.61) and (3.62) respectively, the best V-shaped

sequence π∗ must sequence job i such that

hi(Θi,Ψi) = minhai (Θi,Ψi),hb

i (Θi,Ψi)−1δ ∑

j∈Si

β j. (3.63)

We therefore have the following algorithm.

Algorithm 3.3

1. For i = 1,2, . . . ,n, compute hi(Θi,Ψi) according to (3.63), for all possible valuesin the feasible sets of Θi and Ψi.

2. Let H∗n = minΘn,Ψn

hn(Θn,Ψn)

.

3. Construct, by a backward tracking process, the sequence π∗ that achieves H∗n .

We have omitted the details of the backward tracking process to find π∗. We havealso omitted the definitions of the feasible sets for Θi and Ψi defined in (3.60).With certain assumptions (e.g., all µi are integers), one can identify finite feasiblesets for Θi and Ψi, and show that the time complexity of Algorithm 3.3 is pseudo-polynomial.

3.2 Expected Cost of Earliness and Tardy Jobs

An important class of scheduling problems is to minimize expected cost of earlinessand tardy jobs. In this class of problems, the tardiness cost for each job i is a fixedcharge wi, which may depend on the value of the job and is incurred if the jobmisses a due date or deadline D, and the earliness cost for job i is a general functiongi(D−Ci) of earliness D−Ci, representing such costs as inventory or maintenance.The due date is highly uncertain and thus modeled by an exponentially distributedrandom variable. One example of applications is in agriculture. In areas where longperiods of drought occur from time to time, planning properly based on rain fore-cast is important. The timing of the next rainfall is quite uncertain and can only beestimated based on weather forecast information. Planting too early before it rainscould lower the crop yields and even jeopardize its growth, whereas planting afterthe rain could lead to big losses as it is likely that no more rain will come before theend of the current planting season.

Page 128: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

118 3 Irregular Performance Measures

The problem is to minimize the total expected cost for earliness and tardy jobs:

T EC(ζ ) = E

[

∑i:Ci(ζ )≤D

gi(D−Ci(ζ ))+ ∑i:Ci(ζ )>D

wi

]

= E

[n

∑i=1

gi(D−Ci)ICi≤D+n

∑i=1

wiICi>D

](3.64)

with respect to policy ζ , where D is the due date, gi(·) and wi are, respectively, theearliness function and the fixed tardiness penalty of job i. A static policy ζ = (π ,S)consists of a sequence π to determine the order of processing n jobs, and a set ofidle times S = (s1, . . . ,sn) with si inserted before processing job i. For this problem,the processing times Pi may following arbitrary probability distributions. The duedate D is exponentially distributed with mean 1/δ .

3.2.1 Single Machine Scheduling

For a single machine, by the exponential distribution of D and the independencebetween P1, . . . ,Pn,D, it is easy to calculate

E[gi(D−Ci)ICi≤D] = E[E[gi(D−Ci)ICi≤D|Ci]] = E[∫ ∞

Ci

gi(x−Ci)δe−δxdx]

= E[∫ ∞

0gi(y)δe−δ (Ci+y)dy

]= E

[e−δCi

∫ ∞

0gi(y)δe−δydy

]

= αiE[e−δCi ] = αiPr(Ci < D),

where

αi =∫ ∞

0gi(y)δe−δydy, i = 1, . . . ,n.

Thus by (3.64),

T EC(ζ ) =n

∑i=1

E[gi(D−Ci)ICi≤D]+wiPr(Ci > D)

=n

∑i=1

αiPr(Ci < D)+wi(1−Pr(Ci < D))

=n

∑i=1

(αi −wi)Pr(Ci < D)+n

∑i=1

wi. (3.65)

Page 129: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.2 Expected Cost of Earliness and Tardy Jobs 119

With idle times si, the completion time of job i is given by

Ci =Ci(ζ ) = ∑k∈Bi(π)

(sk +Pk).

It then follows from the properties of the exponential distribution and (3.38) that

Pr(Ci < D) = ∏k∈Bi

Pr(D > sk)Pr(D > Pk) = ∏k∈Bi

e−δ sk fk,

where fk = E[e−δPk ] is defined by (3.40). Thus by (3.65),

T EC(ζ ) =n

∑i=1

(αi −wi) ∏k∈Bi

e−δ sk fk +n

∑i=1

wi. (3.66)

The following theorem provides the optimal static policy to minimize the totalexpected cost of earliness and tardy jobs.

Theorem 3.8. An optimal static policy to minimize T EC(ζ ) in (3.64) will processthe jobs according to the following rules:

(a) Jobs with αi < wi are sequenced in nonincreasing order of (wi −αi) fi/(1− fi)and processed from time zero with no idle time inserted between any two con-secutive jobs;

(b) All jobs with αi ≥wi are processed in an arbitrary order, starting as soon as thedue date D is missed or the last job with αi <wi has been completed, whicheveris the later.

Proof. According to (3.66), it can be seen that ETC(ζ ) is minimized when si is zeroif wi > αi and infinity if wi ≤ αi. This implies that, under an optimal policy, jobswith αi < wi should be processed from time zero with no inserted idle time betweenany two consecutive jobs, whereas jobs with αi ≥ wi should be started as late aspossible, which should not be earlier than the completions of the jobs with αi < wi.

We now show that Rule (a) is optimal. Assume, without loss of generality, thatαi < wi for i ∈ 1,2, . . . ,nI and consider two sequences πI = (1,2, . . . ,nI) andπ ′

I = (1, . . . , i− 1, i+ 1, i, i+ 2, . . . ,nI), where π ′I is resulted from interchanging the

i-th and (i+ 1)-th jobs in πI .

Let ζI and ζ ′I be two policies which differ in the sequences πI and π ′

I , and writefi = f1 f2 · · · fi−1. Then by (3.66) and noting that si = 0 when αi < wi, we have

Page 130: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

120 3 Irregular Performance Measures

TEC(ζ )−TEC(ζ ′) = (αi −wi) fi fi +(αi+1 −wi+1) fi fi fi+1

− (αi+1 −wi+1) fi fi+1 − (αi −wi) fi fi+1 fi

= (αi −wi) fi fi(1− fi+1)− (αi+1 −wi+1) fi fi+1(1− fi)

= fi(1− fi)(1− fi+1)

(αi −wi) fi

1− fi− (αi+1 −wi+1) fi+1

1− fi+1

.

Thus T EC(ζ )≤ T EC(ζ ′) if and only if

(αi −wi) fi

1− fi≤ (αi+1 −wi+1) fi+1

1− fi+1,

which together with si = 0 for αi < wi proves Rule (a).

Now consider Rule (b). From (3.66) we can see that the contribution to the ob-jective function from job j is given by

T ECj = (α j −wj) ∏k∈B j

e−δ sk fk +wj.

If α j ≥ wj, then T ECj ≥ wj under any rule. This means that the best policy is tohave TECj = wj, which can be achieved with si = +∞ or by starting job j after Dhas occurred because this will obviously lead to Pr(Ci > D) = 1 and so T ECj = wj .This proves Rule (b).

3.2.2 Parallel Machine Scheduling

Now consider the problem of processing n jobs on m parallel identical machines.We first investigate the multi-machine problem with wi ≡ w and gi(·) ≡ g(·) forall i. The results obtained will then be extended to more general cases with certaincompatibility conditions.

In the single-machine case, a static policy consists of a sequence π and a setS = sin

i=1 of idle times. It is clear that, when there are m parallel machines, a staticpolicy ζ should consist of a sequence of pairs, namely, ζ = (π j,S j)m

j=1, where π jdenotes the sequence of the jobs to be processed on machine j and S j denotes theset of idle times inserted before these jobs. More explicitly, π j = (π j(1), . . . ,π j(n j))is an ordered n j-tuple where n j denotes the number of jobs assigned to machine jwhile π j(k) denotes that job π j(k) is the k-th to be processed on machine j. Accord-ingly, S j =

s j

i : i = π j(1), . . . ,π j(n j)

with s ji being the idle time inserted immedi-

ately before job i on machine j. It is easy to see that a policy which requires s ji to

approaching infinity is equivalent, in terms of optimality, to a policy that processesjob i on machine j after the occurrence of the random deadline D.

Page 131: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.2 Expected Cost of Earliness and Tardy Jobs 121

Similar to (3.66), we can show that

TEC(ζ ) =m

∑j=1

n j

∑i=1

(αi −wi) ∏k∈Bi(π j)

e−δ s jk fk +

n

∑i=1

wi, (3.67)

where αi =∫ +∞

0 gi(y)δe−δydy. In the following we consider the model with wi ≡ wand gi(·)≡ g(·). Then, (3.67) reduces to

TEC(ζ ) = (α −w)m

∑j=1

n j

∑i=1

∏k∈Bi(π j)

e−δ s jk fk + nw (3.68)

with α =∫+∞

0 g(y)δe−δydy. Thus, if α ≥ w, then T EC(ζ ) ≥ nw for any policy ζand the equality holds when s j

k = +∞ for all k and j. In this case, it is clear thatany policy is optimal if it starts all jobs after the occurrence of D and process thejobs by any machines and in any order. The more interesting situation is α < w,which means that the cost of missing the deadline is greater than the earliness. Thisis usually the case in practical situations. We now focus on this case.

When α < w, an optimal policy must have s jk = 0 for all k and j in order to

minimize T EC(ζ ). Hence the problem becomes to determining an optimal policyto maximize:

T EC′(ζ ) =m

∑i=1

n j

∑i=1

∏k∈Bi(π j)

fk. (3.69)

Let us now establish the following lemma, which is essential in order to getthe optimal policy for maximizing T EC′(ζ ). The lemma establishes how one mayassign and sequence a set of 2u elements into two ordered u-tuples (a1,a2, . . . ,au)and (b1,b2, . . . ,bu) so that Su = ∑u

i=1 a1 · · ·ai +∑ui=1 b1 · · ·bi is maximized.

Lemma 3.2. Let a1,a2, . . . ,au and b1,b2, . . . ,bu be two sets of numbers in [0,1).Define Ai = a1a2 · · ·ai, Bi = b1b2 · · ·bi for i = 1, . . . ,u, and

Su = A1 + · · ·+Au +B1 + · · ·+Bu.

If one of the following three conditions holds:

(i) There exists k ∈ 1,2, . . . ,u− 1 such that ak+1 > bk,

(ii) There exists k ∈ 1,2, . . . ,u− 1 such that bk+1 > ak, or

(iii) There exist k, l ∈ 1,2, . . . ,u such that ak < bk and al > bl,

then we can rearrange a1, . . . ,au,b1, . . . ,bu to obtain two new sets a′1, . . . ,a′u and

b′1, . . . ,b′u such that

S′u = A′1 + · · ·+A′

u +B′1 + · · ·+B′

u > Su, (3.70)

where A′i = a′1 · · ·a′i and B′

i = b′1 · · ·b′i.

Page 132: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

122 3 Irregular Performance Measures

Proof. Let T = Tu be the subset of 1,2, . . . ,u− 1 such that ak+1 > bk for k ∈T and ak+1 ≤ bk for k ∈ Tc = 1,2, . . . ,u− 1−T. If condition (i) holds, then Tis nonempty. We now regroup the numbers a1, . . . ,au,b1, . . . ,bu into two new setsa′1, . . . ,a

′u and b′1, . . . ,b

′u by defining

a′1 = a1, a′k+1 =

bk if k ∈ Tak+1 if k ∈ Tc

and

b′k =

ak+1 if k ∈ Tbk if k ∈ Tc , b′u = bu

(i.e., interchange ak+1 with bk for every k ∈ T). Then

A′i = a′1 · · ·a′i = a1 · · ·ai ∏

k∈T,k<i

bk

ak+1, B′

i = b′1 · · ·b′i = b1 · · ·bi ∏k∈T,k≤i

ak+1

bk, (3.71)

where a product over an empty index set is defined as equal to 1, and if bk = 0 forsome k ∈ T, we define bk/bk = 1.

As ak+1 > bk for all k ∈ T, (3.70) will be proved by the following claim: With a′kand b′k as defined above,

S′u − Su >

(

∏k∈T

ak+1 −∏k∈T

bk

)b1 · · ·bu ∏

k∈T

1bk

for all u ≥ 2. (3.72)

We prove (3.72) by induction. For u = 2, we must have T = 1 and a2 > b1. Hence

S′u − Su = A′1 +A′

2 +B′1 +B′

2 −A1 −A2 −B1 −B2

= a1 + a1b1 + a2 + a2b2 − a1 − a1a2 − b1 − b1b2

= a1(b1 − a2)+ a2 − b1 + b2(a2 − b1)

= (a2 − b1)(1− a1+ b2)> (a2 − b1)b2.

Thus (3.72) holds for u = 2.

Next, suppose that (3.72) holds for a u≥ 2, and consider u+1 in place of u. In thefollowing arguments, T will denote Tu+1 ⊂ 1, . . . ,u and Tc = 1, . . . ,u−T. Thereare two cases:

Case 1. au+1 ≤ bu (i.e., u ∈ Tc). In this case, we have

Su+1 = Su +Au+1 +Bu+1 and S′u+1 = S′u +A′u+1 +B′

u+1.

Page 133: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.2 Expected Cost of Earliness and Tardy Jobs 123

Hence by the induction assumption and (3.71),

S′u+1 − Su+1 = S′u − Su +A′u+1 +B′

u+1 −Au+1−Bu+1

>

(

∏k∈T

ak+1 −∏k∈T

bk

)b1 · · ·bu ∏

k∈T

1bk

+ a1 · · ·au+1 ∏k∈T

bk

ak+1

+ b1 · · ·bu+1 ∏k∈T

ak+1

bk− a1 · · ·au+1 − b1 · · ·bu+1

=

(

∏k∈T

ak+1 −∏k∈T

bk

)b1 · · ·bu ∏

k∈T

1bk

+ a1 · · ·au+1 ∏k∈T

1ak+1

(

∏k∈T

bk −∏k∈T

ak+1

)

+ b1 · · ·bu+1 ∏k∈T

1bk

(

∏k∈T

ak+1 −∏k∈T

bk

)

=

(

∏k∈T

ak+1 −∏k∈T

bk

)(b1 · · ·bu+1 ∏

k∈T

1bk

+ b1 · · ·bu ∏k∈T

1bk

− a1 · · ·au+1 ∏k∈T

1ak+1

)

≥(

∏k∈T

ak+1 −∏k∈T

bk

)b1 · · ·bu+1 ∏

k∈T

1bk

≥ 0,

where the last two inequalities hold because by the definition of T, ak+1 > bk fork ∈ T and bk ≥ ak+1 for k ∈ Tc = 1, . . . ,u−T, so that

∏k∈T

ak+1 > ∏k∈T

bk

and

b1 · · ·bu ∏k∈T

1bk

= ∏k∈Tc

bk ≥ ∏k∈Tc

ak+1 = a2 · · ·au+1 ∏k∈T

1ak+1

.

Thus (3.72) holds for u+ 1 in Case 1.

Case 2. au+1 > bu (u ∈ T). In this case, the interchange between au+1 and bu affectsS′u+1, but not S′u. Hence by (3.71),

S′u+1 = S′u − b1 · · ·bu ∏k∈T−u

ak+1

bk+ b1 · · ·bu ∏

k∈T

ak+1

bk+A′

u+1+B′u+1. (3.73)

The induction assumption now implies

S′u − Su >

(

∏k∈T−u

ak+1 − ∏k∈T−u

bk

)b1 · · ·bu ∏

k∈T−u

1bk

.

Page 134: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

124 3 Irregular Performance Measures

Thus (3.73) leads to

S′u+1 − Su+1 = S′u − Su +A′u+1+B′

u+1 −Au+1 −Bu+1

− b1 · · ·bu ∏k∈T−u

ak+1

bk+ b1 · · ·bu ∏

k∈T

ak+1

bk

>

(

∏k∈T−u

ak+1 − ∏k∈T−u

bk

)b1 · · ·bu ∏

k∈T−u

1bk

+

(

∏k∈T

ak+1 −∏k∈T

bk

)(b1 · · ·bu+1 ∏

k∈T

1bk

− a1 · · ·au+1 ∏k∈T

1ak+1

)

− b1 · · ·bu ∏k∈T−u

ak+1

bk+ b1 · · ·bu ∏

k∈T

ak+1

bk

=−b1 · · ·bu + b1 · · ·bu ∏k∈T

ak+1

bk

+

(

∏k∈T

ak+1 −∏k∈T

bk

)(b1 · · ·bu+1 ∏

k∈T

1bk

− a1 · · ·au+1 ∏k∈T

1ak+1

)

= b1 · · ·bu ∏k∈T

1bk

(

∏k∈T

ak+1 −∏k∈T

bk

)

+

(

∏k∈T

ak+1 −∏k∈T

bk

)(b1 · · ·bu+1 ∏

k∈T

1bk

− a1 · · ·au+1 ∏k∈T

1ak+1

)

≥(

∏k∈T

ak+1 −∏k∈T

bk

)b1 · · ·bu+1 ∏

k∈T

1bk

,

where the last inequality holds because bk ≥ ak+1 for k ∈ Tc so that

b1 · · ·bu+1 ∏k∈T

1bk

= ∏k∈Tc

bk ≥ ∏k∈Tc

ak+1 = a2 · · ·au+1 ∏k∈T

1ak+1

.

This shows that (3.72) holds for u+ 1 in Case 2 as well and completes the proofunder condition (i). The proof under condition (ii) is similar as the roles of ai andbi are interchangeable.

It remains to prove the conclusion of the lemma under condition (iii). Under thatcondition, there exists a nonempty subset U of 1, . . . ,u such that ak < bk for k ∈ Uand al > bl for at least one l ∈ Uc = 1, . . . ,u−U. Define

(a′k,b′k) =

(bk,ak) if k ∈ U(ak,bk) if k ∈ Uc

Page 135: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.2 Expected Cost of Earliness and Tardy Jobs 125

(i.e., interchange ak with bk for each k ∈ U). Then

A′i +B′

i −Ai−Bi = a1 · · ·ai ∏k∈U,k≤i

bk

ak+ b1 · · ·bi ∏

k∈U,k≤i

ak

bk− a1 · · ·ai − b1 · · ·bi

= a1 · · ·ai ∏k∈U,k≤i

1ak

(

∏k∈U,k≤i

bk − ∏k∈U,k≤i

ak

)

+ b1 · · ·bi ∏k∈U,k≤i

1bk

(

∏k∈U,k≤i

ak − ∏k∈U,k≤i

bk

)

=

(

∏k∈U,k≤i

bk − ∏k∈U,k≤i

ak

)(

∏l∈Uc,l≤i

al − ∏l∈Uc,l≤i

bl

)> 0,

where the last inequality holds because ak < bk for k ∈ U, al ≥ bl for l ∈ Uc, andal > bl for at least one l ∈ Uc. It follows that S′u > Su.

We are now ready to derive an optimal static policy for the parallel-machineproblem, which is given in the theorem below. The policy actually says that,when the jobs have been labeled such that fi ≥ f j if i < j, then the first batchof jobs 1, . . . ,m are assigned to machines 1, . . . ,m, respectively, each of whichbecomes the first to be processed on the corresponding machine; the second batchof jobs m+ 1, . . . ,m+ 2m are assigned to machines 1, . . . ,m, respectively, and eachbecomes the second to be processed; this procedure continues until all jobs havebeen assigned. Note that jobs 1,m+ 1,2m+ 1, . . . , should be assigned on the samemachine, jobs 2, m+2,2m+2, . . . should be assigned on the same machine; and soon. Note also that the time requirement to construct the policy is O(n) after the jobshave been labeled as required, while to label the n jobs needs O(n logn) time. Thus,the time complexity to obtain the policy is O(n logn).

Theorem 3.9. Suppose that the jobs are labeled such that f1 ≥ f2 ≥ · · · ≥ fn andα <w. Let q be the integer part of n/m. Then ζ ∗ = (π∗

j ,S∗j)m

i=1 is an optimal staticpolicy to minimize T EC(ζ ) in (3.64) on m parallel identical machines, where S∗j isa set of zero elements and

π∗j =

( j, j+m, j+ 2m, . . . , j+ qm), if j+ qm ≤ n,( j, j+m, j+ 2m, . . . , j+(q− 1)m), otherwise,

(3.74)

for j = 1, . . . ,m.

Proof. As α < w, it is easy to see that all elements of S∗j for j = 1, . . . ,m should bezero. Moreover, by adding dummy jobs with fk = 0 after the existing jobs we can,without loss of generality, assume that under ζ ∗ all n j are equal, say, to a commonnumber u. Thus the remaining question is to show that the sequences (π∗

j )mi=1 given

by (3.74) maximize (3.69).

Page 136: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

126 3 Irregular Performance Measures

Let π j = (π j(1), . . . ,π j(u)). Given any two machines j and j′ with correspondingπ∗

j and π∗i′ , let ai = fπ∗

j (i)and bi = fπ∗

i′ (i)for i = 1, . . . ,u. If ak+1 > bk or bk+1 > ak

for some k ∈ 1, . . . ,u− 1, then by Lemma 3.2 we can increase the value of

u

∑i=1

(a1 · · ·ai + b1 · · ·bi) (3.75)

by interchanging jobs between machines j and j′. This contradicts the assumptionthat (π∗

j ) maximize (3.69) as (3.75) is exactly the contributions to (3.69) made bythe two machines. Thus we must have

ak+1 ≤ bk and bk+1 ≤ ak for all k ∈ 1, . . . ,u− 1. (3.76)

Lemma 3.2 also tells us that (3.75) can be increased if there exist k, l ∈ 1, . . . ,usuch that ak < bk and al > bl , hence (π∗

j ) must satisfy

either ak ≤ bk for all k, or ak ≥ bk for all k. (3.77)

Equations (3.76) and (3.77) imply that one of the following two must hold:

either a1 ≥ b1 ≥ a2 ≥ b2 ≥ a3 ≥ b3 ≥ · · ·≥ au ≥ bu

or b1 ≥ a1 ≥ b2 ≥ a2 ≥ b3 ≥ a3 ≥ · · ·≥ bu ≥ au,

which are equivalent to

either fπ∗j (1)

≥ fπ∗i′ (1)

≥ fπ∗j (2)

≥ fπ∗i′ (2)

≥ · · ·≥ fπ∗j (u)

≥ fπ∗i′ (u)

,

or fπ∗i′ (1)

≥ fπ∗j (1)

≥ fπ∗i′ (2)

≥ fπ∗j (2)

≥ · · ·≥ fπ∗i′ (u)

≥ fπ∗j (u)

. (3.78)

As (3.78) holds for any pair of machines and f1 ≥ · · · ≥ fn, it is not difficult to seethat under (π∗

j ), jobs 1, . . . ,m must be assigned one each to the m machines andbe the first to be processed; jobs m+1, . . . ,m+2m are also assigned one each to them machines and the second to be processed; and so on. Moreover, job m+1 shouldbe with job 1 on the same machine, job m+2 with job 2, and so on. In other words,(π∗

j ) are given by (3.74).

Generalization

Theorem 3.9 can be generalized to job-dependent costs with the following compat-ibility condition:

fi > f j ⇒ (wi −αi) fi ≥ (wj −α j) f j , ∀i, j with αi < wi, α j < wj. (3.79)

The results are presented below as a summary. The proof is similar to the argumentsfor Theorem 3.9.

Page 137: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.3 Completion Time Variance 127

Theorem 3.10. Under the compatibility condition (3.79), an optimal static policy tominimize T EC(ζ ) in (3.67) will process the jobs according to the following rules:

(a) Jobs with αi < wi are sequenced according to (3.74) and processed without anyinserted idle times.

(b) All jobs with αi ≥ wi may be processed on any machine in any order, startingas soon as D has occurred or the last job with αi < wi has been completed,whichever is the later.

Remark 3.2. It can be verified that the compatibility condition (3.79) is satisfied inthe following situations:

(a) wi ≡ w and gi(·)≡ g(·) for all i with αi < wi;

(b) All processing times Pi are i.i.d. for all i with αi < wi;

(c) fi > f j ⇒ wi ≥ wj when αi ≡ α , for all i with α < wi;

(d) Pi can be stochastically ordered and Pi <st Pj =⇒ wi −αi ≥ wj −α j, for all i, jwith αi < wi, α j < wj;

(e) Pi can be stochastically ordered and Pi <st Pj =⇒ wi ≥ wj when αi ≡ α , for alli, j with α < wi,wj .

Case (a) is discussed in Theorem 3.9. Problems with i.i.d. processing times arefrequently addressed in the literature, see, for example, Boxma and Forst (1986),Buzacott and Shanthikumar (1993), Coffman et al. (1993), Emmons and Pinedo(1990), and Xu et al. (1992), etc. Emmons and Pinedo have indicated that for par-allel machine problems to minimize the expected weighted number of tardy jobs,the assumption of i.i.d. processing times is generally necessary to obtain analyticalpolicies. Case (c) deals with the situation where the earliness cost is common for alljobs. Case (d) above is intuitively understandable, since if the processing times ofa job i is ‘smaller’ than that of another job j and the difference of job i between itstardiness cost and its earliness cost is bigger than that of job j, then naturally job ishould be processed before job j as this will reduce the risk of incurring more cost.Case (e) is a special case of case (d).

3.3 Completion Time Variance

The problem of minimizing the completion time variance (CTV) was initially for-mulated by Merten and Muller (1972), motivated by the file organization problemin computing systems, where it is desirable to provide uniform response to users’requests for retrieving data files. It can be shown that the CTV model may alsohave applications in other scheduling problems where a uniform treatment of jobs

Page 138: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

128 3 Irregular Performance Measures

is desirable. Examples include the earliness/tardiness minimization problems withdue-date assignment in manufacturing systems, where an optimal job sequence andan optimal due-date should be determined simultaneously, and customer serviceproblems in commercial environments, where customers should be served in such away that they spend approximately the same time to wait for the service.

In this section, we consider the weighted CTV problem with random processingtimes. The problem is to minimize the expected variance of job completion times.This problem is NP-complete. We prove a W-shape for the optimal sequence andthen develop an algorithm with pseudo-polynomial time complexity to compute thesolution based on this W-shape.

3.3.1 The Weighted Variance Problem

Consider a set of n jobs to be processed on a single machine. The processing timesP1, . . . ,Pn are independent random variables with means E[Pi] = µi and variancesVar(Pi) = σ2

i , i = 1, . . . ,n. The problem is to determine an optimal sequence π tominimize the expected variance of job completion times:

ECTV (π) = E

[n

∑i=1

wi(Ci − C

)2

], (3.80)

where wi is the weight of job i (w1 + · · ·+wn = 1), Ci = Ci(π) is the completiontime of job i under sequence π , and C = ∑n

i=1 wiCi is the mean completion time.

Remark 3.3. The problem formulated above is a generalization of the determin-istic model of the weighted variance minimization problem (Merten and Muller,1972) to random processing times. It is also a stochastic generalization of the ear-liness/tardiness minimization problem with due date assignment (Panwalker et al.,1982), which is to find a sequence π and a common due date d ∈ [0,∞) to minimizethe expected cost function:

f (π ,d) = E

[n

∑i=1

wi(Ci − d

)2

],

where wi represents the unit earliness/tardiness cost of job i.

We first present an equivalent expression for the expected completion time vari-ance ECTV (π) in (3.80).

Theorem 3.11. Let Bi(π) denote the set of jobs sequenced no later than job i undersequence π . Then ECTV (π) can be expressed by

Page 139: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.3 Completion Time Variance 129

ECTV (π) =n

∑i=1

wi[Mi(π)− M(π)]2 +n

∑i=1

W ai (π)W b

i (π)σ2i , (3.81)

where

Mi(π) = ∑j∈Bi(π)

µ j, M(π) =n

∑i=1

Mi(π),

W bi (π) = ∑

j∈Bi(π)−iwj and W a

i (π) = 1−Wbi (π).

Proof. Let π = (k1, . . . ,kn). Then

C =n

∑i=1

wiCi =n

∑i=1

wi

i

∑j=1

Pkj =n

∑j=1

Pj

n

∑i= j

wki =n

∑j=1

PjW aj (π) =

n

∑i=1

W ai (π)Pi. (3.82)

Similarly,

n

∑i=1

wiC2i =

n

∑i=1

wi

(

∑j∈Bi(π)

Pj

)2

=n

∑i=1

W ai (π)P2

i +n

∑i=1

wi ∑j∈Bi(π), j =l

PjPl (3.83)

and

C2 =

(n

∑i=1

wiCi

)2

=n

∑i=1

[W ai (π)]2P2

i +n

∑i= j

W ai (π)W a

j (π)PiPj. (3.84)

Note that W ai (π)− [Wa

i (π)]2 =W ai (π)[1−Wa

i (π)] =W ai (π)W b

i (π). Thus it followsfrom (3.83) and (3.84) that

n

∑i=1

wi(Ci − C

)2=

n

∑i=1

wiC2i − 2C

n

∑i=1

wiCi + C2 =n

∑i=1

wiC2i − C2

=n

∑i=1

W ai (π)W b

i (π)P2j +

n

∑i=1

wi ∑j∈Bi(π), j =l

PjPl

−n

∑i= j

W ai (π)W a

j (π)PiPj. (3.85)

Since Mi(π) = ∑ j∈Bi(π) µ j, the same arguments for (3.85), but with µ j in place ofPj, also lead to

n

∑i=1

wi[Mi(π)− M(π)]2 =n

∑i=1

W ai (π)W b

i (π)µ2j +

n

∑i=1

wi ∑j∈Bi(π), j =l

µiµ j

−n

∑i= j

W ai (π)W a

j (π)µiµ j. (3.86)

Page 140: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

130 3 Irregular Performance Measures

Thus by comparing the expectation of (3.85) with (3.86), and noting that E[Pj] = µ jand E[P2

j ] = µ2j +σ2

j , we obtain (3.81).

3.3.2 Structural Property of Optimal Sequence

The problem to minimize the ECTV (π) in (3.80) or (3.81) is shown to be NP-hardby Cai and Zhou (1997). We now look for some structural property of the opti-mal sequence that will enable us to develop an efficient algorithm to compute thesolution. It turns out that, under certain compatible conditions, the optimal sequencepossesses a structure similar to a V-shape, except for the job at the tip of the V-shape,which we referred to as a W-shape. The definition of a W-shaped sequence is statedbelow.

Definition 3.2. A sequence π is said to be W-shaped with respect to an index setai if there exists a d such that for any three consecutive jobs i, j,k, Mj(π) < dimplies a j ≤ ai and Mi(π)≤ d implies a j ≤ ak.

Note that in a W-shaped sequence as defined above, it is possible to have one job msequenced between jobs i and k with Mm(π)− µm < d ≤ Mm(π), such that am > aiand am > ak. We call this job m the straddling job.

The next theorem establishes the W-shape of the optimal sequence to minimizethe ECTV (π) with respect to the weighted means µi/wi under certain agreeableconditions.

Theorem 3.12. Assume that the following agreeable conditions are satisfied:

(a) µi > µ j =⇒ σ2i ≥ σ2

j and wi ≤ wj,

(b) σ2i > σ2

j =⇒ wi ≤ wj, and

(c)µi

wi≥ µ j

w j=⇒ µ2

i

w2i−

µ2j

w2j≥ σ2

i

w2i−

σ2j

w2j.

Then an optimal sequence π∗ that minimizes ECTV (π) must be W-shaped withrespect to the weighted means µi/wi with d = M(π∗).

Proof. Suppose that π∗ is not W-shaped with respect to µi/wi and d = M(π∗).Then there must exist three consecutive jobs i, j,k under π∗ such that one of thefollowing two cases occurs:

(i) Either Mj(π∗)< d = M(π∗) and µ j/wj > µi/wi;

Page 141: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.3 Completion Time Variance 131

(ii) Or Mj(π∗)≥ d + µ j and µ j/wj > µk/wk

Let π∗ = (. . . , i, j,k, . . . ). In case (i), take π1 = (. . . , j, i,k, . . . ). From (3.81), we canwrite

ECTV (π1)−ECTV (π∗) = A1 +B1, (3.87)

where

A1 =n

∑l=1

wl [Ml(π1)− M(π1)]2 −

n

∑l=1

wl [Ml(π∗)− M(π∗)]2

and

B1 =n

∑l=1

W al (π1)W b

l (π1)σ2l −

n

∑l=1

W al (π∗)W b

l (π∗)σ2l .

Note that Mi(π1) = Mi(π∗)+ µ j, Mj(π1) = Mj(π∗)− µi and Ml(π1) = Ml(π∗) forl = i, j, hence

M(π1)− M(π∗) = µ jwi − µiw j.

It follows that

A1 = [M(π1)− M(π∗)]2 + µ2j wi + µ2

i w j + 2[M(π1)− M(π∗)](µ jwi − µiw j)

+ 2[Mi(π∗)µ jwi −Mj(π∗)µiw j ]− 2M(π∗)(µ jwi − µiw j)

This together with Mj(π∗) = Mi(π∗)+ µ j gives

A1 =−(µ jwi − µiw j)2 + 2[Mi(π∗)− M(π∗)](µ jwi − µiw j)

+ µ2j wi + µ2

i w j − 2µ jµiw j (3.88)

Since Mj(π∗) = Mi(π∗)+ µ j and

µ2j wi + µ2

i w j − 2µ jµiw j = 2µ j(µ jwi − µiw j)− (µ2j wi − µ2

i w j),

it follows from (3.88) together with the agreeable conditions that in case (i),

A1 =−(µ jwi − µiw j)2 + 2[Mi(π∗)− M(π∗)](µ jwi − µiw j)− (µ2

j wi − µ2i w j)

<−(µ2j wi − µ2

i w j). (3.89)

Furthermore,

B1 =W aj (π1)W b

j (π1)σ2j +W a

i (π1)W bi (π1)σ2

i

−Waj (π∗)W b

j (π∗)σ2j −Wa

i (π∗)W bi (π∗)σ2

i

=W ai (π∗)W b

i (π∗)σ2j +[Wa

i (π∗)−wj][W bi (π∗)+wj]σ2

i

−Wai (π∗)W b

i (π∗)σ2i − [Wa

i (π∗)−wi][W bi (π∗)+wi]σ2

j

= [W bi (π∗)−Wa

i (π∗)](σ2j wi −σ2

i w j)+σ2j w2

i −σ2i w2

j . (3.90)

Page 142: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

132 3 Irregular Performance Measures

As

W bi (π∗)−Wa

i (π∗) = [W bj (π∗)−wi]− [Wa

j (π∗)+wi] =W bj (π∗)−Wa

j (π∗)− 2wi,

(3.90) and the agreeable conditions show that

B1 = [W bj (π∗)−Wa

j (π∗)− 2wi+ 1](σ2j wi −σ2

i w j)

= [W bj (π∗)−Wa

j (π∗)− (wi −wj)](σ2j wi −σ2

i w j)−wiwj(σ2j −σ2

i )

≤ [W bj (π∗)−Wa

j (π∗)](σ2j wi −σ2

i w j)≤ σ2j wi −σ2

i w j (3.91)

Combining (3.89)–(3.91) with the agreeable conditions leads to

ECTV (π1)−ECTV (π∗) = A1 +B1 <−(µ2j wi − µ2

i w j)+σ2j wi −σ2

i w j ≤ 0.

This shows that π∗ cannot be an optimal sequence.

Now turn to case (ii). Let π1 = (. . . , i,k, j, . . . ). Similarly to (3.87)–(3.90), we canshow that

ECTV (π2)−ECTV (π∗) = A2 +B2,

where

A2 =−(µ jwk − µkwj)2 + 2[Mj(π∗)− M(π∗)](µ jwk − µkwj)

+ µ2j wk + µ2

k wj − 2µ jµkwk (3.92)

and

B2 = [W bj (π∗)−Wa

j (π∗)](σ2j wk −σ2

k wj)− (σ2j w2

k −σ2k w2

j). (3.93)

Thus in case (ii), (3.92)–(3.93) and the agreeable conditions imply

A2 <−2µ j(µ jwk − µkwj)+ µ2j wk + µ2

k wj − 2µ jµkwk

=−(µ2j wk − µ2

k wj)− 2µ jµk(wk −wj)≤−(µ2j wk − µ2

k wj)

and B2 ≤ σ2j w2

k −σ2k w2

j . Consequently,

ECTV (π2)−ECTV (π∗) = A2 +B2 <−(µ2j wk − µ2

k wj)+σ2j wk −σ2

k wj ≤ 0.

Thus again, π∗ cannot be an optimal sequence. As a result, an optimal sequencemust be W-shaped with respect to µi/wi.

Remark 3.4. If the distributions of the processing times belong to the same family,then for most well-known families for nonnegative random variables it is easy toverify that the agreeable conditions in Theorem 3.12 reduce to a single weightcondition: µ j > µi ⇒ wj ≤ wi. These families include:

• Exponential: σ2i = µ2

i ;

Page 143: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

3.3 Completion Time Variance 133

• Uniform over interval [0,bi]: σ2i = µ2

i /3;

• Erlang: σ2i = µ2

i /K; (where K is a positive integer);

• Chi-square: σ2i = 2µ2

i (for µ≥2);

• Poisson: σ2i = µi; and

• Geometric: σ2i = µi(µ1 − 1).

3.3.3 Algorithm

We now develop an algorithm based on the W-shape property in Theorem 3.12. Thesolution found by the algorithm will be optimal under the agreeable conditions. Forgeneral problems that do not satisfy the agreeable conditions, the algorithm is stillapplicable, although the solution obtained is not guaranteed to be optimal. Let π∗ bean optimal sequence that minimizes ECTV (π) under the agreeable conditions andM(π∗) the mean completion time under π∗. For the time being, let us assume thatM(π∗) is given. We will later see that the resultant algorithm does not depend onthis assumption.

Let job m ∈ 1, . . . ,n be the straddling job satisfying

Mm(π∗)− µm < M(π∗)≤ Mm(π∗)

and Jmi = 1,2, . . . , i∪m. We now examine how π∗ should sequence the jobs in

Jmi under the agreeable conditions. First, define

Ψi(π∗) = maxj∈Jm

i

Mj(π∗),

αi =Ψi(π∗)− M(π∗),

βi = the sum of weights of the jobs sequenced before the jobs in Jmi ,

Θi = ∑j∈Jm

i

µ j,

Wi = ∑j∈Jm

i

w j.

Let W be a large integer such that each Wwi takes integral values and assumeall other parameters to take integral values. Then it can be shown that αi must becontained in the following set (note that there should be no idle time between anypair of consecutive jobs and that the jobs in Jm

i must straddle M(π∗)):

Ci =

0,

1W

,2

W, . . . ,Θi −

1W

,Θi

Page 144: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

134 3 Irregular Performance Measures

and βi must be contained in the following set:

Di = 0,1,2, . . . ,W (1−Wi) .

Now consider job i = m. Without loss of generality assume that the jobs havebeen numbered so that µ1/w1 ≤ µ2/w2 ≤ · · · ≤ µn/wn. It follows from Theorem3.12 that, in an optimal sequence π∗, job i should be sequenced either immediatelybefore or immediately after the jobs in Jm

i−1. Define f mi (αi,βi) to be the contribution

of the jobs in Jmi towards the overall objective function, given αi ∈ Ci and βi ∈ Di.

It can be seen that, if job i is sequenced before the jobs in Jmi−1, then

f mi (αi,βi) = f m

i,b(αi,βi)

= wi(Θi −αi − µi)2 +

βi

W

(1− βi

W

)σ2

i + f mi−1(αi,βi +Wwi). (3.94)

On the other hand, if job i is sequenced after the jobs in Jmi−1, then

f mi (αi,βi) = f m

i,a(αi,βi)

= wiα2i +

(βi

W+Wi −wi

)(1− βi

W−Wi+wi

)σ2

i + f mi−1(αi − µi,βi). (3.95)

According to the principle of optimality of dynamic programming, we obtain thefollowing recursive relation:

f mi (αi,βi) = min f m

i,b(αi,βi), f mi,a(αi,βi), i = m, αi ∈ Ci, βi ∈ Di (3.96)

with boundary conditions:

f m0 (α,β ) =

⎧⎨

⎩wmα2 +

βW

(1− β

W

)σ2

m, if 0 ≤ α ≤ µm and

0 ≤ β ≤W (1−wm);+∞, otherwise.

(3.97)

Note that we should skip the calculation of the recursive relation (3.96) when i = m.Thus, in order that the calculation for i = m+ 1 can be continued, we simply let

f mm (α,β ) = f m

m−1(α,β ) ∀α,β . (3.98)

Now one can see that the calculations in (3.93)–(3.98) do not depend on any priorknowledge on M(π∗). Considering that it is possible to have m = 1,2, . . . ,n, we canenumerate all possible m and retain the best solution found during the enumerationas the optimal solution. In summary, we propose the following algorithm:

Algorithm 3.4

1. Set m = 1.

Page 145: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Appendix 135

2. For i = 1,2, . . . ,n, compute f mi (αi,βi) for all αi ∈ Ci and βi ∈ Di according to

(3.93)–(3.98).

3. Let F(m) = minαn∈Cn,βn∈Dn

f mi (αn,βn).

4. If m < n, let m = m+ 1 and return to step 2; otherwise go to the next step.

5. Let ECTV (π∗) = minF(1),F(2), . . . ,F(n).

6. Construct the sequence π∗ corresponding to ECTV (π∗) by a backward trackingprocess.

For any given m, step 2 of the algorithm needs at most O(W 2Θ) steps to enumerateall the combinations of αi ∈ Ci and βi ∈ Di for each i as |Ci|≤WΘi and |Di|≤W .Thus the total time requirement of step 2 is bounded above by O(nW 2Θ). Thisdominates the time requirements of steps 3, 5 and 6. Since m = 1,2, . . . ,n, the totaltime requirement is bounded above by O(n2W 2Θ). In particular, when wi = 1/n forall i = 1, . . . ,n (hence W = n), if

µi > µ j ⇒ σ2i ≥ σ2

j and µi ≥ µ j ⇒ µ2i − µ2

j ≥ σ2i −σ2

j ,

then Algorithm 3.3 can find an optimal sequence π∗ in time O(n4Θ).

Appendix

Justification of the Normal Distributions

The following proposition shows that, when each job consists of a number ofelementary tasks with random processing times, the job processing times will app-roximately follow normal distributions with variances proportional to means undercertain mild conditions.

Proposition 3.1. Let t1, t2, . . . be a sequence of independent and bounded ran-dom variables following arbitrary distributions and let ν j = E(t j) and τ2

j = Var(t j).If ν j and τ2

j are independently drawn from nonnegative integers according tosome distributions with finite means and variances ν and τ2 > 0 respectively, thenfor any subsequence j1, j2, . . . of 1,2, . . .,

(i) As K → ∞,

1K(ν j1 + · · ·+ν jK )→ ν and

1K(τ2

j1 + · · ·+ τ2jK )→ τ2 > 0 (3.99)

with probability 1; and

Page 146: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

136 3 Irregular Performance Measures

(ii) When (3.99) holds,

SK =K

∑k=1

t jk ∼ N(Kν,aKν) (3.100)

approximately for large K in the sense that

SK −KνaKν → N(0,1) in distribution as K → ∞, where a = τ2/ν . (3.101)

Proof. Part (i) follows immediately from the strong law of large numbers. To prove(ii), note that as t j are bounded, their moments of all orders are bounded. Hence theLiapunov Condition for the Central Limit Theorem with respect to SK is satisfied if(3.99) holds. Consequently,

SK − (µ j1 + · · ·+ µ jK )

(τ2j1+ · · ·+ τ2

jK)

=SK −E[SK]

Var(SK)→ N(0,1) (3.102)

in distribution as K → ∞.(3.101) then follows from (3.102) together with (3.99).

Remark 3.5. When each job consists of a number of elementary tasks with randomprocessing times represented by t j1 , t j2 , . . . , t jK, the job processing times are of theform SK (with different subsequences for different jobs). Thus (3.100) tells us thatthe job processing times approximately follow normal distributions with variancesproportional to means.

Ranges of r for Condition (3.2)

The ranges of r that satisfy condition (3.2) in various distributions of due datescommonly seen in the literature are listed below.

1. Degenerate distribution: Pr(d = d0) = 1. This is the special case in which dreduces to a constant d0, i.e., a common deterministic due date. In such a case(3.2) holds for r ≤ d0. Thus r = d0 and r ≤ r means that the starting time shouldbe no later than the due date.

2. Normal distribution: D ∼ N(µD ,σ2D). In this case r ≤ µD implies (3.3) so that

r = µD = E(d). Note that µD is also the median of D(t), i.e., Pr(d ≤ µD) = 0.5.Thus r ≤ r is equivalent to Pr(r ≥ d) ≤ Pr(r ≤ d), that is, the starting time ismore likely to be before the due date than after.

3. Uniform distribution: D ∼U([A,B]). Then r = (A+B)/2. Again r ≤ r is equiv-alent to Pr(r ≥ d)≤ Pr(r ≤ d).

Page 147: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Appendix 137

Note that in either normal or uniform distribution, the density fD(t) of the duedate D is symmetric about its median (µD or (A+B)/2) and (3.2) is equivalent toPr(r ≥ d)≤ Pr(r ≤ d). This can be generalized to any D with a symmetric densityfunction about some point m, say (which is necessarily the median and the mean).For such a D we must have r = m and (3.2) reduces to Pr(r ≥ d) ≤ Pr(r ≤ d).In addition to normal and uniform distributions, the following three symmetric den-sity functions are commonly seen in the literature:

4. Cauchy distribution:

fD(t) =1π

BB2 +(x−A)2 , −∞ < x < ∞, with m = A.

5. Student-t distribution with k degrees of freedom:

fD(t) =C(

1+(x−A)2

kB2

)−(k+1)/2

, −∞ < x < ∞, with m = A,

where C is a normalizing constant.

6. Laplace distribution:

fD(t) =1

2Be−|x−A|/B, −∞ < x < ∞, with m = A,

Among non-symmetric distributions, we consider the following:

7. Exponential distribution:

fD(t) =1µ e−x/µ , x ≥ 0.

In this case (3.2) is equivalent to r = 0, so r = 0. It is not unreasonable to con-sider r = 0 only for the exponential distribution as its density reaches its maxi-mum at 0, meaning that 0 is the ‘most likely point’ of the due date (rememberthat (3.2) requires r ≤ d in some sense).

8. Delayed exponential distribution:

fD(t) =1µ e−(x−A)/µ , x ≥ A.

Then r = A will ensure that (3.2) holds.

9. Gamma distribution:

fD(t) =Cxω−1e−x/µ , x > 0,

Page 148: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

138 3 Irregular Performance Measures

where ω > 0 and C is a normalizing constant. In this case (3.2) holds forr = max0,µ(ω − 1). Note that max0,µ(ω − 1) is the unique mode of thedensity, i.e., the point at which the density attains its maximum value. In otherwords, max0,µ(ω − 1) is the ‘most likely point’ of the due date.

10. Erlang distribution:

fD(t) =1

µkk!xk−1e−x/µ , x ≥ 0,

where k ∈ 1,2, . . .. This is a special case of the gamma distribution and sor = µ(k− 1).

11. Log-normal distribution:

fD(t) =1

xσ√

2πe−(logx−µ)/σ 2

, x > 0.

This equivalent to that ed ∼ N(µ ,σ2) and so r = eµ .

12. Poisson distribution with mean µ : In this case it can be shown that (3.2) holdsfor integer r ≤ µ . Since r is assumed to be integer-valued, we may take r = [µ ](the integral part of µ).

An Integral Inequality

Let ϕ(t) be a nonnegative function defined on (−∞,∞) with ϕ(−∞) = 0 and G(t)be any distribution function. If

(i) ϕ(t) is symmetric about s0 ∈ (−∞,∞) and nondecreasing on (−∞,s0]; and

(ii) r ≤ s0 satisfies∫

[r−x,r)dG(t)≤

[r,r+x]dG(t) ∀x ≥ 0,

then ∫

(−∞,r)ϕ(t)dG(t)≤

[r,∞)ϕ(t)dG(t). (3.103)

Proof. We first show that∫

(−∞,r)ϕ(t)dG(t)≤

[r,∞)ϕ(2r− t)dG(t). (3.104)

Page 149: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Appendix 139

Since ϕ(t) and G(t) are nondecreasing and bounded on (−∞,r], we can applyFubini’s theorem to obtain

(−∞,r)ϕ(t)dG(t) =

−∞<t<r

−∞<s≤tdϕ(s)dG(t)

=∫

−∞<s<r

s≤t<rdG(t)dϕ(s)

and similarly,∫

[r,∞)ϕ(2r− t)dG(t) =

r≤t<∞

−∞<s≤2r−tdϕ(s)dG(t)

=∫

−∞<s≤r

r≤t≤2r−sdG(t)dϕ(s).

By Condition (ii) we have∫

s≤t<rdG(t) =

[s,r)dG(t) =

[r−(r−s),r)dG(t)≤

[r+(r−s),r)dG(t)

=∫

[2r−s,r)dG(t) =

r≤t≤2r−sdG(t) ∀s ≤ r.

Thus (3.104) follows. Next, let t ≥ r. If t ≤ s0, then 2r − t ≤ r ≤ t ≤ s0 and soϕ(2r − t) ≤ ϕ(t) since ϕ is nondecreasing on (−∞,s0] by Condition (i); if t ≥ s0,then ϕ(2r− t) ≤ ϕ(2s0 − t) = ϕ(t) again by Condition (i). Thus ϕ(2r− t) ≤ ϕ(t)for all t ≥ r, and so (3.103) follows from (3.104).

Fibonacci Method

For completeness, here we outline the Fibonnaci method we adopt in Algorithm 3.2.Fibonnaci method is a procedure which gradually reduces the interval of uncertainty(the interval that contains the optimal solution) by evaluating and comparing twointerior points that are symmetrically placed inside the interval. Since we want tofind an integer r f as our solution, the width of the final interval of uncertainty can besufficiently set to be 1. The initial interval of uncertainty is clearly [rmin,rmax] (seeStep 1 of Algorithm 3.1). The main steps of the Fibonnaci method are below:

0. Let ra = rmin and rb = rmax. Set i = 1.

1. Compute r′ = rb − FN−iFN

(rb − ra) and r′′ = ra +FN−iFN

(rb − ra).

2. Evaluate G(r′) and G(r′′).

3. If G(r′′)> G(r′), let rb = r′′; else let ra = r′.

Page 150: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

140 3 Irregular Performance Measures

4. Stop if rb − ra ≤ wf , where wf is a pre-specified number (in our algorithm we setwf = 1); else let i = i+ 1 and continue.

5. If r′ is the interior point that still remains inside the interval of uncertainty, thenlet r′′ = r′ and r′ = rb − FN−i

FN(rb − ra); else let r′ = r′′ and r′′ = ra +

FN−iFN

(rb − ra).

6. Evaluate G(r′) and G(r′′) and then return to Step 3.

Page 151: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 4Stochastic Machine Breakdowns

The majority of machine scheduling problems studied in the literature assume thatthe machine to be used to process the jobs (tasks) is continuously available until alljobs are completed. In reality, however, it is a common phenomenon that a machine(or facility) may break down randomly from time to time. A machine breakdownmay be caused by an actual fault of the machine, or by a processing discipline thatassigns higher priority to certain jobs (cf. Heathcote 1961; Takine and Sengupta1997; Vieira et al. 2003). When a job of higher priority arrives, the machine hasto process it immediately, causing a disruption to the ordinary job being processed.For an ordinary job, an interruption by higher-priority jobs is equivalent, in effect,to a breakdown of the machine. Whichever type of breakdowns is involved, it isimperative to determine the optimal scheduling policies by taking into account theavailable information on the machine breakdown processes and the job processingrequirements.

Stochastic scheduling involving random machine breakdowns has emerged tobe an active line of research in the scheduling field over the past three decades;See, for example, Nicola et al. (1987), Birge et al. (1990), Groenevelt et al. (1992),Chimento and Trivedi (1993), Zhang and Graves (1997), Duffy (2000), Iannacconeet al. (2002), Jain and Foley (2002), Cai et al. (2003, 2004, 2005, 2009b), Yu and Qi(2004)), Hanemann et al. (2005), Herroelen and Leus (2005), and Ball et al. (2007)the references therein.

Generally, problems that have been addressed in the literature may be categorizedinto two classes of models, in accordance with the impacts of a machine breakdownto the job being processed:

(i) The preemptive-resume model: If a machine breaks down during the processingof a job, there is no loss of the work done on the job prior to the breakdownand the processing of the disrupted job can be resumed from where it was inter-rupted when the machine is fixed.

(ii) The preemptive-repeat model: If the machine breaks down before a job iscompleted, the work done on this job is totally lost and so its processing willhave to restart all over again after the machine resumes its operation.

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 4,© Springer Science+Business Media New York 2014

141

Page 152: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

142 4 Stochastic Machine Breakdowns

Earlier research on stochastic scheduling with machine breakdowns focusedmainly on problems under the preemptive-resume model. The literature includesPinedo and Rammouz (1988), Birge et al. (1990), Glazebrook and Owen (1991), Li,Braun and Zhao (1998), Mittenthal and Raghavachari (1993), Cai and Zhou (1999);Cai et al. (2000), Qi et al. (2000a,b), to name just a few.

Many results have also been reported in more recent studies on optimal schedul-ing policies under the preemptive-repeat model, for example, in Adiri et al. (1989,1991), Birge et al. (1990), Frostig (1991), Mehta and Uzsoy (1998), Lee and Lin(2001), Cai et al. (2003, 2004, 2005, 2009b), Glazebrook (2005), and Lee and Yu(2007), among others.

The preemptive-resume and preemptive-repeat models, however, are only twoextreme ends of the effects of machine breakdowns on job processing: the formerassumes no loss of work at a breakdown, whereas the latter assumes a total lossof work that has been done on the job being processed prior to the breakdown.In practice, neither end is precise to describe the actual impact of breakdowns. Moreoften, the accumulated work (referred to as the processing achievement) of a job isneither fully preserved nor totally lost after a machine breakdown, but “partially”lost, and the level of loss may vary with uncertainty. This reflects more adequatelythe real-world problems, but has not been studied in the previous literature. In orderto reflect the impact of the breakdown on the job being processed more closely,we will sometimes refer to a preemptive-resume breakdown as no loss of work, apreemptive-repeat breakdown as total loss of work, and any other type of breakdownleading to a partial loss as partial loss of work.

This chapter covers on scheduling problems where the machines are subjectto stochastic breakdowns. We first formulate machine breakdown processes inSect. 4.1, then discuss the optimal static policies under the no-loss, total-loss andpartial-loss machine breakdown models in Sects. 4.2–4.4, respectively. The optimaldynamic policies will be covered in Chap. 7.

4.1 Formulation of Breakdown Processes

4.1.1 Machine Breakdown Processes

Let Y1 denote the time that the machine suffers its first breakdown, which is calledthe first uptime of the machine. In other words, the machine works continuouslyfrom time 0 to Y1, and breaks down at time Y1. The first breakdown lasts for time(duration) Z1, which is the first downtime of the machine, until the machine is fixed.The machine resumes its operation at time Y1+Z1, and continues to work for time Y2(the second uptime) until its second breakdown at time Y1 +Z1 +Y2. Continuing inthis way, the breakdown process of the machine can be characterized by a sequenceYk,Zk∞

k=1, where Yk and Zk are the durations of the k-th uptime and downtime,respectively. We often refer to Yk as the k-th uptime and Zk as the k-th downtime.

Page 153: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.1 Formulation of Breakdown Processes 143

We model the breakdown process Yk,Zk∞k=1 by a sequence of independent and

identically distributed (i.i.d.) pairs of nonnegative random variables, to reflect thestochastic nature of machine breakdowns. In addition, the uptimes Yk are assumedto be independent of the downtimes Zk.

For the uptime sequence Yk∞k=1, define a counting process N(t) : t ≥ 0 by

N(t) = supk ≥ 0 : Sk ≤ t, (4.1)

where Sk denotes the total uptime of the machine before its k-th breakdown, i.e.,S0 = 0 and

Sk =k

∑j=1

Yj, k ≥ 1. (4.2)

The process N(t) counts the number of breakdowns in total uptime t, with N(t) = kif and only if Sk ≤ t < Sk+1, or equivalently, N(t) ≥ k if and only if Sk ≤ t.N(t) : t ≥ 0 is often modeled by a Poisson process, which is equivalent to theassumption of exponentially distributed uptimes Yk. More specifically, the fol-lowing two assumptions are equivalent:

(i) N(t) : t ≥ 0 is a Poisson process with rate τ , that is, Pr(N(0) = 0) = 1 and

Pr(N(s+ t)−N(s) = k) = e−τt (τt)k

k!, k = 0,1,2, . . . , s, t ≥ 0. (4.3)

(ii) Y1,Y2, . . . are i.i.d. random variables exponentially distributed with mean 1/τ .

In processing a set of n jobs, if the machine breakdown pattern varies betweendifferent jobs, then the breakdown in processing job i is denoted by Yik,Zik∞

k=1,i = 1, . . . ,n. For each job i, Yik,Zik∞

k=1 is a sequence of i.i.d. random variables,and the uptimes Yik are independent of the downtimes Zik. Let Yi,Zi denotea representative pair with the common distribution of Yik,Zik. The representa-tive pairs Y1,Z1, . . . ,Yn,Zn of the n jobs are assumed independent, but need notbe identically distributed, since the distributions of the uptimes and downtimes maydiffer between the n jobs.

4.1.2 Processing Time and Achievement

Consider a single job to be processed by a machine under the breakdown processYk,Zk∞

k=1. Denote by P1 the initial processing time in the sense that the job willbe completed if it is processed by the machine for time P1 without interruption.The processing time is sometimes also called the processing requirement.

If P1 ≤Y1, then the job is completed at time P1. If P1 >Y1, however, the processingstops at time Y1 while the job remains unfinished. In this case, the time that themachine has spent on processing the job is referred to as the processing achievementof the job. It will be reduced by any loss of work to a new processing achievement

Page 154: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

144 4 Stochastic Machine Breakdowns

v1 at time Y1 +Z1 when the machine resumes its operation to process the job again.In the case of a preemptive-resume (no loss) breakdown, the work done prior to thebreakdown is fully preserved, hence the processing achievement is v1 = Y1 at timeY1 +Z1. If the breakdown is preemptive-repeat (total loss), then the work is totallylost after the breakdown and so v1 = 0 at time Y1 +Z1. In the more general case ofpartial loss, 0 ≤ v1 ≤ Y1, in other words, the breakdown caused a loss of Y1 − v1.

For subsequent breakdowns, suppose that the processing of the job has beeninterrupted by k breakdowns. Let vk denote the processing achievement and Pk+1 theprocessing time (not accounting for vk) of the job when the k-th breakdown ends.Then at time ∑k

j=1(Yj +Zj), the remaining processing time required to complete thejob is Pk+1 − vk. In the no-loss model, it is natural to assume P1 = P2 = · · · . In thetotal-loss case, we will consider two scenarios for the processing times P1,P2, . . . :(i) Identical processing times: P1 = P2 = · · · . This case has also been referred to as

without re-sampling processing times in the literature.(ii) Independent processing times: P1,P2, . . . are i.i.d. random variables. This has

been referred to as re-sampled processing times in the literature.

More explanations and justification of these two scenarios in the total-loss model areprovided in Cai et al. (2003); Cai and Zhou (2004), and will be discussed in Sect. 4.4.In the extension to the partial-loss model, a more general relationship between theprocessing times P1,P2, . . . will be allowed.

When the machine resumes processing the job after the k-th breakdown, if theremaining processing time does not exceed the next uptime, i.e., Pk+1 − vk ≤ Yk+1,then the job is completed at time Pk+1 − vk +∑k

j=1(Yj + Zj). If Pk+1 − vk > Yk+1,then the job remains unfinished at the next breakdown time Yk+1 +∑k

j=1(Yj +Zj).The processing achievement at the breakdown time is vk +Yk+1 < Pk+1, which willtransit to a new achievement vk+1 ∈ [0,vk +Yk+1] with some probability distributionat time ∑k+1

j=1(Yj +Zj) when the machine is up again to process the job.Clearly, we have vk =Y1+ · · ·+Yk (so that vk+1 = vk +Yk+1) in the no-loss model

and vk = 0 in the total-loss model. In the partial-loss model with uncertainty, thetransition of processing achievement from vk +Yk+1 to vk+1 will be governed by aprobability distribution. More details will be discussed in Sect. 4.4.

In processing n jobs, when the k-th breakdown is over, the next processing timefor job i is denoted by Pi,k+1, and the processing achievement at the breakdown timeis vik, i = 1, . . . ,n. For identical processing times, we write Pi = Pi1 = Pi2 = · · · . Forindependent processing times, Pi1,Pi2, . . . is a sequence of i.i.d. random variables,and Pi denotes a representative of Pi1,Pi2, . . . with the common distribution ofthe sequence. In either case, P1,P2, . . . ,Pn are assumed to be independent randomvariables following arbitrary probability distributions, and independent of Yik,Zik.

Remark 4.1. The notation P1 has been used for the first processing time when only asingle job is concerned, or the representative processing time of job 1 when multiplejobs are involved, and similarly for Y1 and Z1. From now on, Pi, Yi and Zi will refer tothe processing time, uptime and down time for job i, respectively, unless otherwisespecified.

Page 155: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 145

4.2 No-Loss (Preemptive-Resume) Model

This section is devoted to optimal static scheduling policies under the no-loss model.Job preemption is not allowed, since a static policy is determined at time zero andwill not change at any time.

4.2.1 Completion Time

First look at a single job. As discussed in Sect. 4.1, under the no-loss model,

• The processing times of the job between breakdowns are identical, denoted by P;• The processing achievement of the job at time ∑k

j=1(Yj +Zj) is vk = ∑kj=1Yj < P

(if the job has not been completed).

If the job is completed after exactly k breakdowns, then its completion time underthe no-loss breakdown model is

C =k

∑j=1

(Yj +Zj)+P− vk = P+k

∑j=1

Zj . (4.4)

By the definition of N(t) and Sk in (4.1) and (4.2), we have N(t) = k if and onlyif Sk ≤ t < Sk+1, and Sk < P ≤ Sk+1. Thus k = N(P−) = limt↑P N(t), and then itfollows from (4.4) that

C = P+N(P−)

∑j=1

Zj. (4.5)

Now consider a set of n jobs. The processing times of job i between consecu-tive breakdowns are identical with Pi = Pi1 = Pi2 = · · · , and the breakdown processof the machine in processing job i is Yik,Zik∞

k=1. Let Ni(t) denote the countingprocess defined in (4.1) for job i. Then the total time that job i occupies the machine(including both the uptimes and the downtimes of the machine during processingjob i), referred to as the occupying time, is given by

Oi = Pi +Ni(Pi−)

∑j=1

Zi j. (4.6)

Let ζ be the scheduling policy with zero idle times, no job preemption, and asequence π = (i1, . . . , in) that determines the processing order of the n jobs, suchthat il = j if and only if job j is the l-th to be processed, l = 1, . . . ,n. Such a policycan be identified with the sequence π , so we write ζ = π .

Define Bi(π) to be the set of jobs scheduled no later than job i under sequence π .Then the completion time of job i under sequence (policy) π is given by

Page 156: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

146 4 Stochastic Machine Breakdowns

Ci =Ci(π) = ∑k∈Bi(π)

Ok = ∑k∈Bi(π)

(Pk +

Nk(Pk−)

∑j=1

Zk j

). (4.7)

Remark 4.2. In allowing the breakdown process to vary between different jobs, wehave assumed implicitly that the breakdown process for each job is not affected bythose for other jobs. For example, the remaining uptime after finishing one job hasno influence on the first uptime to process the next job. This is reasonable if themachine is reset after completing each job to suit the next job.

We next derive optimal static policies for scheduling problems under variousperformance measures.

4.2.2 Minimizing Regular Cost Functions

As discussed in Chap. 2, a regular cost function is nondecreasing in completiontimes Ci. One of the most commonly studied regular cost functions is the weightedflowtime. So we start with the problem of minimizing the expected weighted flow-time.

Expected Weighted Flowtime

The expected weighted flowtime of n jobs under policy ζ is defined as

EWFT (ζ ) = E

[n

∑i=1

wiCi

]=

n

∑i=1

wiE[Ci(ζ )], (4.8)

where wi is the weight assigned to job i with w1 + · · ·+wn = 1. The problem is tominimize EWFT (ζ ) with respect to ζ .

It is clear that any positive idle time will only increase the objective function,hence it suffices to consider policies with zero idle times. Consequently, a policyζ reduces to a sequence π . When the machine is not subject to breakdowns, it hasbeen shown that the optimal sequence to minimize EWFT (π) is in nondecreasingorder of E[Pi]/wi (cf. Rothkopf 1966a).

From (4.7), we can easily see that the problem with processing times Pi andmachine breakdowns is actually equivalent to one with processing times Oi andno machine breakdowns. Therefore, the optimal sequence to minimize EWFT (π)is in nondecreasing order of E[Oi]/wi.

Denote by Fi(x) the common cdf of Yik and F (k)(x) the k-th fold convolutionof any function F(x). Then, by (4.6) and the independence between Yik and Zik,

E[Oi|Pi] = E

[Pi +

Ni(Pi−)

∑j=1

Zi j

]= Pi +E[Zi]E[Ni(Pi−)|Pi], (4.9)

Page 157: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 147

where Zi is a representative of Zik. Since Ni(t)≥ k if and only if Yi1 + · · ·+Yik ≤ t,

E[Ni(Pi−)|Pi] =∞

∑k=1

Pr(Ni(Pi−)≥ k|Pi) =∞

∑k=1

Pr(Yi1 + · · ·+Yik < Pi|Pi)

=∞

∑k=1

E[F (k)

i (Pi−)|Pi

]. (4.10)

It follows from (4.9) and (4.10) that

E[Oi] = E[E[Oi|Pi]] = E[Pi]+E[Zi]∞

∑k=1

E[F(k)

i (Pi−)]. (4.11)

In particular, if Yik are exponentially distributed with mean E[Yi] = 1/τi, then Ni(t)is a Poisson process with rate τi, hence E[Ni(Pi−)|Pi] = τiPi. Then (4.9) yields

E[Oi] = E [Pi +E[Zi]τiPi] = (1+ τiE[Zi])E[Pi] =

(1+

E[Zi]

E[Yi]

)E[Pi]. (4.12)

Thus the optimal static policy to minimize the expected weighted mean flowtime issummarized by the following theorem:

Theorem 4.1. The optimal static policy to minimize EWFT (π) is to process jobsin nondecreasing order of E[Oi]/wi with zero idle times, where E[Oi] is givenby (4.11) for general uptime distributions and by (4.12) for exponential uptimes.

Remark 4.3. Equation (4.12) indicates that E[Oi], the expected time the machine isoccupied by job i, is increasing in E[Pi] or E[Zi], and decreasing in E[Yi]. Theseare consistent with the intuition, since longer processing time or downtime leadsto longer time required to complete the job, while longer uptime means fewerbreakdowns during processing the job, hence shorter occupying time.

General Regular Cost Functions

We now consider more general regular cost functions described in Sect. 2.4, butwithin the no-loss breakdown framework. Let fi(Ci) denote the cost of processingjob i, where fi(·) is a general regular (stochastic) cost function under the followingassumptions:

(i) fi(t), t ≥ 0 is a stochastic process independent of processing times Pi andbreakdown processes Yik,Zik;

(ii) fi(t), t ≥ 0 is nondecreasing in t ≥ 0 almost surely; and(iii) mi(t) = E[ fi(t)] exists and is finite for every t ≥ 0.

As in Sect. 2.4, we address the following two types of performance measureswith general regular costs:

Page 158: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

148 4 Stochastic Machine Breakdowns

(i) Total Expected Cost:

TEC(π) =n

∑i=1

E[ fi(Ci)]; (4.13)

(ii) Maximum Expected Cost:

MEC(π) = max1≤i≤n

E[ fi(Ci)]. (4.14)

The main result for the TEC problem is as follows.

Theorem 4.2. Assume that the breakdown processes Yik,Zik, i = 1, . . . ,n, of the njobs are identically distributed. Then the conclusions of Theorem 2.11 remain truein the no-loss breakdown model. Specifically, the sequence π∗ in nondecreasingstochastic order of the processing times is optimal to minimize T EC(π).

Proof. Let π = (. . . , i, j, . . . ) and π ′ = (. . . , j, i, . . . ) be two sequences with identicalorder except that two consecutive jobs i and j are interchanged. Following the samearguments in the proof of Theorem 2.11, it remains true that

T EC(π)−TEC(π ′) = E[ fi(Ci(π))]−E[ fi(Cj(π ′))]

+E[( fi − f j)(Cj(π ′))]−E[( fi − f j)(Ci(π ′))]. (4.15)

Let C denote the completion time of the job sequenced just before job i under π(which is the same job sequenced just before job j under π ′). Write

Gi(t) =Ni(t−)

∑k=1

Zik and G j(t) =Nj(t−)

∑k=1

Zjk.

Then Ci(π) =C+Gi(Pi) and Cj(π ′) =C+G j(Pj). By the assumption of identicallydistributed Yik,Zik for i = 1, . . . ,n, it is clear that Gi(t) and G j(t) have the samedistribution for any t, and both are nondecreasing in t a.s. since Zjk are nonnegative.Hence by Part (ii) of Lemma 2.5, Pi ≤st Pj implies Gi(Pi)≤st Gi(Pj), which in turnimplies Gi(Pi)≤st G j(Pj) since Gi(Pj) has the same distribution as G j(Pj).

Next, by the independence between jobs, C is independent of Gi(Pi) and G j(Pj).Thus by Part (i) of Lemma 2.5, Pi ≤st Pj =⇒ Ci(π)≤st Cj(π ′). It then follows fromPart (ii) of Lemma 2.5 that fi(Ci(π))≤st fi(Cj(π ′)), which implies

E[ fi(Ci(π))]≤ E[ fi(Cj(π ′))]. (4.16)

Furthermore, since job j is sequenced ahead of job i under π ′, Cj(π ′)≤Ci(π ′) a.s.,hence by Part (iii) of Lemma 2.5 and the assumption of nondecreasing mi −m j,

E[( fi − f j)(Cj(π ′))]≤ E[( fi − f j)(Ci(π ′))]. (4.17)

Combining (4.16) and (4.17), we get T EC(π)−T EC(π ′)≤ 0 from (4.15). Then theconclusions of Theorem 2.11 follow.

Page 159: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 149

From the proof of Theorem 4.2, we can see that

Pi ≤st Pj =⇒ Ci(π)≤st Cj(π ′)

holds in the no-loss breakdown model. As a result, the proofs for Theorems 2.12–2.15 remain valid. Therefore, under the same condition in Theorem 4.2, we canextend all results of Theorems 2.12–2.15 to the no-loss breakdown model.

Remark 4.4. Theorem 4.2 and the same results of Theorems 2.12–2.15 in the no-lossbreakdown model generalize many previous results in the literature and cover quitegeneral grounds for processing times, due dates, machine breakdown processes, andperformance measures. For more general performance measures, however, variousconditions are still required, such as stochastic order, agreeability, and identicallydistributed breakdown processes between jobs. In the case of the expected meanweighted flowtime (EWFT), on the other hand, such conditions are not needed dueto the simple structure of EWFT, as demonstrated in Theorem 4.1.

4.2.3 Minimizing Irregular Costs

We now consider the problem to minimize the total expected costs under irregularperformance measures. The processing times Pi are assumed to be exponentiallydistributed with means µi, the uptimes Yik are exponential with a common meanτ , the downtimes Zik follow a common distribution with mean ν and variance σ2,and the due dates Di are independent of Pi and Yik,Zik.

The scheduling problems for earliness-tardiness penalties are usually NP-hard,even in simple cases without machine breakdowns, hence analytical solutions arenot available. Instead, our target is to identify some structural properties for theoptimal sequence, so as to develop algorithms using techniques such as dynamicprogramming to compute the optimal schedules in pseudo-polynomial time.

Symmetric Quadratic Earliness-Tardiness Costs

As considered in Sect. 3.1.2, the total expected cost with a symmetric quadratic costfunction is given by

T EC(π) = E

[n

∑i=1

wi(Ci −Di)2

]=

n

∑i=1

wiE[(Ci −Di)2] (4.18)

with w1 + · · ·+wn = 1.The next theorem extends Theorem 3.3 to the no-loss breakdown model.

Theorem 4.3. If the due dates D1, . . . ,Dn have a common distribution, then underthe no-loss breakdown model, an optimal sequence to minimize T EC(π) in (4.18)is V-shaped with respect to µi/wi.

Page 160: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

150 4 Stochastic Machine Breakdowns

Proof. Let D denote a representative of the due dates Di. Note that the completiontime Ci = Ci(π) can be expressed by (4.7) with Oi defined in (4.6), and Ni(t) is aPoisson process with rate τi. By (4.12),

E[Oi] = (1+ τE[Zi])E[Pi] = (1+ τν)µi. (4.19)

Hence (4.7) gives

E[Ci] = ∑j∈Bi

E[O j] = (1+ τν) ∑j∈Bi

µ j, (4.20)

where Bi = Bi(π) is the set of jobs scheduled no later than job i under sequence π .From (4.6) we further calculate

E[O2i ] = E

[Pi +

Ni(Pi−)

∑k=1

Zik

]2

= E

[P2

i + 2Oi

Ni(Pi−)

∑k=1

Zik +

(Ni(Pi−)

∑k=1

Zik

)2]

= E[P2i ]+ 2E[Pi(τPi)E[Z]]+E[Ni(Pi−)]Var(Z)+E[N2

i (Pi−)]E2[Z]

= E[P2i ]+ 2ντE[P2

i ]+σ2E[τPi]+ν2E[τPi + τ2P2i ]

= (1+ 2ντ +ν2τ2)E[P2i ]+ τ(ν2 +σ2)E[Pi]

= 2(1+ντ)2µ2i + τ(ν2 +σ2)µi. (4.21)

It follows from (4.7) that

E[C2i ] = E

[

∑j∈Bi

O2j + ∑

j,k∈Bi, j =kO jOk

]= ∑

j∈Bi

E[O2j ]+ ∑

j,k∈Bi, j =kE[O j]E[Ok]

= ∑j∈Bi

E[O2j ]+

(

∑j∈Bi

E[O j]

)2

− ∑j∈Bi

(E[O j])2. (4.22)

Thus by (4.19)–(4.22) and the independence between Di and Pi, after somealgebra, the objective function in (4.18) can be expressed by

TEC(π) =n

∑i=1

wi(E[C2

i ]− 2E[D]E[Ci]+E[D2])

=n

∑i=1

wi(1+ντ)2

⎧⎨

⎩∑j∈Bi

µ2j +

(

∑j∈Bi

µ j

)2⎫⎬

+n

∑i=1

wi

τ(ν2 +σ2)− 2E[D](1+ντ)

∑j∈Bi

µ j +E[D2]

= (1+ντ)2(V1 +V2)+ τ(ν2 +σ2)− 2E[D](1+ντ)V3+E[D2](4.23)

Page 161: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 151

(recall that w1 + · · ·+wn = 1), where Vl =Vl(π), l = 1,2,3, are defined by

V1 =n

∑i=1

wi ∑j∈Bi

µ2j , V2 =

n

∑i=1

wi

(

∑j∈Bi

µ j

)2

, V3 =n

∑i=1

wi ∑j∈Bi

µ j. (4.24)

Given two jobs i and j, let π = . . . , i, j, . . . and π ′ = . . . , j, i, . . . denote twosequences which are identical except that the order of jobs i and j is switched. LetB∗ = Bi(π)− i= Bi(π ′)− j and write V ′

l = Vl(π ′), l = 1,2,3. Then the samearguments as in the proof of Theorem 3.3 lead to V1 −V ′

1 =−wiµ2j +wjµ2

i ,

V2 −V ′2 =−wiµ j

(2 ∑

k∈B∗µk + 2µi+ µ j

)+wjµi

(2 ∑

k∈B∗µk + 2µ j + µi

),

and V3 −V ′3 =−wiµ j +wjµi. Substituting these into (4.23) yields

TEC(π)−TEC(π ′) = (wjµi −wiµ j)

2(1+ντ)2

(∑

k∈B∗µk + µi + µ j

)

+ τ(ν2 +σ2)− 2E[D](1+ντ)

(4.25)

Let π be a sequence that is not V-shaped with respect to µi/wi. Without loss ofgenerality we can assume that π = 1,2, . . . ,n. Then there are three consecutivejobs i, i+ 1 and i+ 2 under π such that µi/wi < µi+1/wi+1 > µi+2/wi+2. Thus

wi+1µi −wiµi+1 < 0 and wi+2µi+1 −wi+1µi+2 > 0 (4.26)

Let π ′ denote the sequence which switches jobs i and i+1 in π , and π ′′ the sequencewhich switches jobs i+ 1 and i + 2 in π . Therefore, π = . . . , i, i + 1, i + 2, . . .,π ′ = . . . , i+ 1, i, i+ 2, . . . and π ′′ = . . . , i, i+ 2, i+ 1, . . .. By (4.26),

TEC(π)−TEC(π ′)

= (wi+1µi −wiµi+1)

2(1+ντ)2

i+1

∑k=1

µk +[τ(ν2 +σ2)− 2E[D](1+ντ)

]

= (wi+1µi −wiµi+1)(Ai+1 +B) (4.27)

where for m = 1,2, . . . ,n,

Am = 2(1+ντ)2m

∑k=1

µk and B = β [τ(ν2 +σ2)− 2E[D](1+ντ)]. (4.28)

Similarly,

T EC(π)−TEC(π ′′) = (wi+2µi+1 −wi+1µi+2)(Ai+2 +B). (4.29)

Page 162: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

152 4 Stochastic Machine Breakdowns

If Ai+1 +B < 0, then by (4.26) and (4.27),

TEC(π)−TEC(π ′) = (wi+1µi −wiµi+1)(2Ai+1 +B)> 0.

If Ai+1 +B ≥ 0, then by (4.28),

Ai+2 +B = Ai+1 + 2(1+ντ)2µi+2 +B > Ai+1 +B ≥ 0

and so by (4.26) and (4.29),

TEC(π)−TEC(π ′′) = (wi+2µi+1 −wi+1µi+2)(2Ai+2 +B)> 0.

In either case, π cannot be an optimal sequence. Thus an optimal sequence must beV-shaped with respect to µi/wi.

Asymmetric Quadratic Earliness-Tardiness Costs

As in Sect. 3.1.2, the total expected cost with asymmetric quadratic earliness andtardiness penalties is

TEC(π) = E

[α ∑

Ci<Di

wi(Di −Ci)2 +β ∑

Ci>Di

wi(Ci −Di)2

]

=n

∑i=1

wi

αE[(Di −Ci)

2ICi<Di]+β E

[(Ci −Di)

2ICi>Di]

. (4.30)

We further assume that the due dates D1, . . . ,Dn are exponentially distributed with acommon mean 1/δ , hence E[D] = 1/δ and E[D2] = 2/δ 2. Since Di is independentof Pi and Yik,Zik, the exponential distribution of Di yields

Pr(Di > Oi|Pi = x) = Pr

(Di > x+

Ni(x−)

∑k=1

Zik

)

= Pr(Di > x)∞

∑m=0

Pr(Di > Zi1 + · · ·+Zim)Pr(Ni(x−) = m)

= e−δx∞

∑m=0

[Pr(D > Z)]m(τx)m

m!e−τx = e−ηx,

where

η = δ − τPr(D > Z)+ τ = δ + τPr(D ≤ Z). (4.31)

It follows that

Pr(Di > Oi) = E[Pr(Di > Oi|Pi)] = E[e−ηPi ] (4.32)

Page 163: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 153

and soPr(Di >Ci) = ∏

k∈Bi

Pr(Di > Oi) = ∏k∈Bi

E[e−ηPk ] = ∏k∈Bi

fk, (4.33)

where

fk = E[e−ηPk ] =1

1+ µkη and 1− fi =µiη

1+ µiη= ηµi fi. (4.34)

Note that (4.33) and (4.34) are the same as (3.39) and (3.40) except with η in placeof δ . Hence the same derivation for (3.41) gives

E[(Di −Ci)2ICi<Di] =

2δ 2 ∏

k∈Bi

fk (4.35)

Then by (4.35) together with the result for the symmetric case in (4.23), we canrewrite (4.30) as

T EC(π) = (α −β )n

∑i=1

wiE[(Di −Ci)

2ICi<Di]+β

n

∑i=1

wiE[(Ci −Di)

2]

= β(1+ντ)2(V1 +V2)+

[τ(ν2 +σ2)− 2

δ (1+ντ)]

V3 +2

δ 2

+2

δ 2 (α −β )V4 (4.36)

with V1, V2, V3 defined in (4.24) and V4 = ∑ni=1 wi ∏k∈Bi

fk.Based on (4.36) and following the same proof as for Theorem 3.4, we obtain the

next theorem for the asymmetric case.

Theorem 4.4. If the due dates D1, . . . ,Dn are exponentially distributed with acommon mean 1/δ , then under the no-loss breakdown model, an optimal sequenceπ∗ that minimizes T EC(π) in (4.36) is V-shaped with respect to µi/wi.

Proof. Given sequences π = . . . , i, i+1, i+2, . . ., π ′ = . . . , i+1, i, i+2, . . . andπ ′′ = . . . , i, i+ 2, i+ 1, . . ., by similar arguments leading to (3.45) and (3.46) wecan derive from (4.36) that

T EC(π)−TEC(π ′) = (wi+1µi −wiµi+1)(2Ai+1 +B) (4.37)

andT EC(π)−TEC(π ′′) = (wi+2µi+1 −wi+1µi+2)(2Ai+2 +B), (4.38)

where

Am = β (1+ντ)2m

∑k=1

µk −1

δ 2 (α −β )ηm

∏k=1

fk, m ∈ 1, . . . ,n. (4.39)

Since

η = δ + τPr(D ≤ Z) = δ + τE[1− e−δZ]≤ δ + τE[δZ] = δ (1+ τν),

Page 164: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

154 4 Stochastic Machine Breakdowns

it follows from (4.34) and (4.39) that

Am+1 −Am = β (1+ντ)2µm+1 −1

δ 2 (α −β )η( fm+1 − 1)m

∏k=1

fk

= β (1+ντ)2µm+1 −1

δ 2 (β −α)η2µm+1 fm+1

m

∏k=1

fk

≥ β µm+1(1+ντ)2 −β µm+1η2

δ 2

m+1

∏k=1

fk

≥ β µm+1

(1+ντ)2 − η2

δ 2

≥ 0

where one of the inequalities must be strict unless both α and β are equal to zero,which of course is excluded. As a result, Am is strictly increasing in m.

Suppose that (4.26) holds. If 2Ai+1 + B < 0, then T EC(π)− T EC(π ′) > 0by (4.37). If 2Ai+1 +B ≥ 0, then, since Am is increasing in m, 2Ai+2 +B > 2Ai+1 +B ≥ 0 and so by (4.38), T EC(π)−TEC(π ′′)> 0. In either case, π cannot minimizeT EC. Thus an optimal sequence must be V-shaped with respect to µi/wi.

Asymmetric Linear Earliness-Tardiness Costs

The total expected cost with asymmetric linear earliness-tardiness penalties is

T EC(π) = E[

∑Ci<Di

αi(Di −Ci)+ ∑Ci(π)>Di

βi(Ci(π)−Di)

]

=n

∑i=1

αiE[(D−Ci)ICi<D]+βiE[(Ci −D)ICi>D]

, (4.40)

where αi and βi are the unit earliness and tardiness costs, respectively, for job i.Note that the setting of unit costs αi and βi is more general than that for the

asymmetric quadratic costs in (4.30), where the weighted unit costs αwi and β wihave a constant ratio β/α , whereas βi/αi may vary with i.

We assume the same situations of Pi, Di and Yik,Zik as for the asymmetricquadratic costs. Then by similar arguments leading to (4.36). the total expected costin (4.40) can be expressed as

T EC(π) =n

∑i=1

βi(1+ντ) ∑k∈Bi

µk +n

∑i=1

(αi +βi)1δ ∏

k∈Bi

fk −1δ

n

∑i=1

βi (4.41)

The V-shape of the optimal sequence to minimize TEC(π) in (4.40) or (4.41) isstated in the next theorem.

Page 165: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.2 No-Loss (Preemptive-Resume) Model 155

Theorem 4.5. Define

γi j =

(α j

µ j− αi

µi

)(β j

µ j− βi

µi

)−1

if β jµi = βiµ j (4.42)

(γi j need not be defined if β jµi = βiµ j). If γi j satisfy

1+ γ jk < max(1+ηµk)(1+ γi j),δ (1+ντ)/η (4.43)

for all distinct i, j,k ∈ 1, . . . ,n such that γ jk and γi j are defined, then an optimalsequence π∗ that minimizes T EC(π) is V-shaped with respect to µi/βi.

Proof. For π = (. . . , i, j, . . . ) and π ′ = (. . . , j, i, . . . ), (4.41) and (4.42) give

T EC(π)−TEC(π ′) = µiµ j

(β j

µ j− βi

µi

)1+ντ − (1+ γi j)

ηδ fi f j ∏

k∈B∗fk

.

(4.44)

Consequently, given π = . . . , i, i + 1, i + 2, . . ., π ′ = . . . , i + 1, i, i + 2, . . . andπ ′′ = . . . , i, i+ 2, i+ 1, . . ., we have

T EC(π)−TEC(π ′) = µiµi+1

(βi+1

µi+1− βi

µi

)Ai (4.45)

and

TEC(π)−TEC(π ′′) = µi+1µi+2

(βi+2

µi+2− βi+1

µi+1

)Ai+1, (4.46)

where

Ai = 1+ντ − (1+ γi,i+1)ηδ

i+1

∏k=1

fk. (4.47)

Suppose thatµi

βi<

µi+1

βi+1>

µi+2

βi+2or

βi

µi>

βi+1

µi+1<

βi+2

µi+2. (4.48)

If Ai < 0, then (4.45) and (4.48) imply T EC(π)− T EC(π ′) > 0. If Ai ≥ 0, thenby (4.43) and (4.47), either

1+ γi,i+1 >1+ γi+1,i+2

1+ηµi+2= (1+ γi+1,i+2) fi+2 =⇒

Ai+1 = 1+ντ − (1+ γi+1,i+2)ηδ

i+2

∏k=1

fk = 1+ντ − (1+ γi+1,i+2) fi+2ηδ

i+1

∏k=1

fk

> 1+ντ − (1+ γi,i+1)ηδ

i+1

∏k=1

fk = Ai ≥ 0,

Page 166: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

156 4 Stochastic Machine Breakdowns

or 1+ γi+1,i+2 < δ (1+ντ)/η =⇒ Ai+1 > 1+ντ − (1+ γi+1,i+2)η/δ > 0. HenceAi+1 > 0 in either case, which implies T EC(π)−TEC(π ′′)> 0 by (4.46) and (4.48).Thus π cannot minimize T EC if (4.48) holds. Consequently, an optimal sequencemust be V-shaped with respect to µi/βi.

Like Theorem 3.5, Theorem 4.5 covers Case 1 in (3.57) and Case 2 in (3.58) withη in place of δ . Furthermore, the next two theorems identify the situations in whichan analytical optimal sequence exists.

Theorem 4.6. If µi/βi and µi/αi have opposite orders, then a sequence in non-decreasing order of µi/βi, or in nonincreasing order of µi/αi, is optimal tominimize T EC(π).

Proof. Let β j/µ j ≥ βi/µi and α j/µ j ≤ αi/µi. Since

ηδ fi f j ∏

k∈B∗fk <

ηδ =

1δ (δ + τE[1− e−δZ])≤ 1

δ (δ + τE[δZ])

=1δ (δ + τδν) = 1+ντ,

it follows from the middle equation in (4.44) that T EC(π)− TEC(π ′) ≥ 0. ThusT EC(π) ≥ T EC(π ′) if and only if β j/µ j ≥ βi/µi, or µ j/β j ≤ µi/βi. The theoremthen follows.

Theorem 4.7.(i) Let b = δ (1+ντ)(1+ηµ1)(1+ηµ2)/η − 1. If

∣∣∣∣α j

µ j− αi

µi

∣∣∣∣≤ b∣∣∣∣β j

µ j− βi

µi

∣∣∣∣ ∀i, j = 1, . . . ,n,

then a sequence in nondecreasing order of µi/βi is optimal.(ii) Let b = δη−1(1+ντ)∏n

k=1(1+ηµk)− 1. If∣∣∣∣α j

µ j− αi

µi

∣∣∣∣≥ b∣∣∣∣β j

µ j− βi

µi

∣∣∣∣ , ∀i, j = 1, . . . ,n,

then a sequence in nonincreasing order of µi/αi is optimal.

Proof. Under the conditions of Part (i),

(1+ γi j)ηδ fi f j ∏

k∈B∗fk ≤ (1+ b)

ηδ fi f j =

(1+ντ)(1+ηµ1)(1+ηµ2)

(1+ηµi)(1+ηµ j)≤ 1+ντ.

Hence by (4.44),

TEC(π)≥ T EC(π ′) ⇐⇒β j

µ j≥ βi

µi⇐⇒

µ j

β j≤ µi

βi.

Page 167: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

4.3 Total-Loss (Preemptive-Repeat) Model 157

So an optimal sequence should schedule job j ahead of job i if µ j/β j ≤ µi/βi. Thisproves Part (i).

Next, by the definition of b and that of fk in (4.34),

(1+ b)ηδ fi f j ∏

k∈B∗fk ≥ (1+ b)

ηδ

n

∏k=1

fk = (1+ντ)n

∏k=1

(1+ηµk) fk = 1+ντ. (4.49)

Let µ j/α j ≤ µi/αi or α j/µ j ≥ αi/µi. Then by (4.44) and (4.49) together with theconditions of the theorem,

TEC(π)−TEC(π ′)≤µiµ j

b

(α j

µ j− αi

µi

)1+ντ − (1+ b)

ηδ fi f j ∏

k∈B∗fk

≤ 0.

Thus an optimal sequence should place job i ahead of job j if µ j/α j ≤ µi/αi. Thisproves Part (ii).

Expected Cost of Earliness and Tardy Jobs

To minimize the total expected cost T EC(ζ ) for earliness and tardy jobs in (3.64),note that by (4.33), the expression for TEC(ζ ) in (3.66) is valid with fk = E[e−ηPk ],where η = δ + τ Pr(Z ≤ D) is defined in (4.31). Consequently, Theorems 3.8–3.10remain true under the no-loss breakdown model with fk = E[e−ηPk ], k = 1, . . . ,n.

4.3 Total-Loss (Preemptive-Repeat) Model

In a total-loss (preemptive-repeat) machine breakdown model, if a breakdownoccurs before a job is completed, the work done on this job is totally lost, so that thejob has to be restarted without any preserved achievement when the machine canprocess it again. One industrial example of the preemptive-repeat model is in metalrefinery where the raw material is to be purified by melting it in very high temper-ature. If a breakdown (such as power outage) occurs before the metal is purified tothe required level, it will quickly cool down and the heating process has to be startedagain after the breakdown is fixed. Other examples include running a program ona computer, downloading a file from the Internet, performing a reliability test on afacility, etc. Generally, if a job must be continuously processed with no interruptionuntil it is totally completed, then the total-loss formulation may be used to modelthe processing pattern of the job in the presence of machine breakdowns.

Unlike the no-loss model, where the processing times are not affected by breakdowns, there are two scenarios for the processing times after a breakdown that causes a total loss of processing achievement:

(a) Identical processing times: the processing time after a breakdown remains the same (but unknown) amount as that before the breakdown with respect to the same job, so that $P_{i1} = P_{i2} = \cdots = P_i$. In a practical sense, this scenario may occur when the uncertainty or randomness of the processing time is internal to the job (such as the quality of the raw material in the example of metal refinery). This randomness is not influenced by the condition of the machine, and so the processing time does not vary between machine breakdowns for the same job.

(b) Independent processing times: the processing time is re-sampled independently after each breakdown, but with the same probability distribution as that before the breakdown. That is, $P_{i1}, P_{i2}, \dots$ is an i.i.d. sequence of random variables with a representative $P_i$ that has the common distribution of the sequence. This scenario arises when the processing time is influenced by random factors external to the job (such as the condition of the machine), hence may vary independently each time the same job is repeated, following a specific probability distribution.

To see the difference between these two scenarios, let us look at an intuitive and simple example. Suppose that the processing time of a job is a random variable taking possible values between 5 and 10 min, say. Assume that a breakdown occurs after the job has been processed continuously for 7 min, but before it is completed. Then in Case (a) (identical processing times), we know that the job will need at least 7 min to complete. Hence, given the information available from previous processing experience, the next processing time will be in the range of 7–10 min. In Case (b) (independent processing times), on the other hand, the information from previous experience is lost and the processing time may still take any value between 5 and 10 min. Therefore, in Case (a), the work done on a job is totally lost when a breakdown occurs, but not the information from previous experience, whereas in Case (b), both the work and the information accumulated in previous experience are lost when the machine breaks down.

In this section, we will address the optimal static policies for two performance measures under the total-loss model: (i) minimizing the expected weighted flowtime; and (ii) maximizing the expected discounted reward. Both measures have been considered extensively in the literature. We first derive formulae for the expected time that the machine is occupied by each job, which plays a key role in finding the optimal static policies for these two performance measures.

4.3.1 Expected Occupying Time

Under the total-loss model, the processing achievement of a job is always zero at any time when the machine resumes operation after a breakdown. Suppose that job $i$ is completed after experiencing $k$ breakdowns during its processing ($k \ge 0$). Then, at time $\sum_{j=1}^{k}(Y_{ij}+Z_{ij})$ (counting from the start of processing job $i$), the remaining processing time required to complete job $i$ is $P_{i,k+1}$, and $P_{i,k+1} \le Y_{i,k+1}$. Job $i$ is then completed time $P_{i,k+1}$ later. Therefore, the total time that job $i$ occupies the machine is given by
\[
O_i = P_{i,k+1} + \sum_{j=1}^{k}(Y_{ij}+Z_{ij}) \tag{4.50}
\]
(where by convention, the sum is zero for $k = 0$).

Next we derive the expected occupying time of each job for the cases of identical and independent processing times separately, and then make some comparisons between the two cases.

Identical Processing Times

In the case of identical processing time $P_i$ for job $i$, job $i$ is completed after $k$ breakdowns during its processing if and only if $Y_{i0} < P_i, Y_{i1} < P_i, \dots, Y_{ik} < P_i$ and $Y_{i,k+1} \ge P_i$, where $Y_{i0} = 0$. Define a counting process $\{N_i(t) : t \ge 0\}$ by
\[
N_i(t) = \sup\{k \ge 0 : Y_{i0} < t, Y_{i1} < t, \dots, Y_{ik} < t\}.
\]
Then job $i$ is completed after experiencing $k$ breakdowns during its processing if and only if $N_i(P_i) = k$. It then follows from (4.50) that
\[
O_i = P_i + \sum_{k=0}^{N_i(P_i)}(Y_{ik}+Z_{ik}), \quad \text{where } Z_{i0} = 0. \tag{4.51}
\]
Let $F_i(t)$ denote the cdf of $Y_{ik}$. Then
\[
\Pr\{N_i(t) = k\} = \Pr\{Y_{i0} < t, Y_{i1} < t, \dots, Y_{ik} < t, Y_{i,k+1} \ge t\}
= \Pr\{Y_{i1} < t\}\cdots\Pr\{Y_{ik} < t\}\Pr\{Y_{i,k+1} \ge t\}
= F_i^k(t-)[1-F_i(t-)], \quad k = 0,1,2,\dots. \tag{4.52}
\]
This shows that $N_i(t)$ follows a geometric distribution with parameter $1-F_i(t-)$. Consequently,
\[
E[N_i(t)] = \frac{F_i(t-)}{1-F_i(t-)}. \tag{4.53}
\]
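The geometric form of (4.52)–(4.53) is easy to check numerically. The following short sketch (not from the book; the exponential uptime distribution and all parameter values are assumptions made for illustration) simulates $N_i(t)$ for a fixed $t$ by drawing i.i.d. uptimes until one reaches $t$, and compares the empirical mean with $F_i(t-)/(1-F_i(t-))$:

```python
import math
import random

def simulate_N(t, sample_uptime, n_runs=100_000):
    """Simulate N_i(t): the number of consecutive uptimes shorter than a
    fixed processing requirement t (identical processing times)."""
    total = 0
    for _ in range(n_runs):
        k = 0
        while sample_uptime() < t:   # count uptimes that fail to reach t
            k += 1
        total += k
    return total / n_runs

# Illustration with exponential uptimes of rate beta (assumed values).
beta, t = 0.5, 1.2
F_t = 1.0 - math.exp(-beta * t)                 # F_i(t-) for a continuous cdf
print("simulated E[N_i(t)]:", simulate_N(t, lambda: random.expovariate(beta)))
print("formula (4.53)     :", F_t / (1.0 - F_t))
```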

The following theorem provides a formula for the expected occupying time $E[O_i]$ of job $i$, which is crucial to finding an optimal static policy, and a necessary and sufficient condition for $E[O_i]$ to be finite.

Theorem 4.8. The expected occupying time of job $i$ with identical processing time $P_i$ is given by
\[
E[O_i] = E\left[\frac{1}{1-F_i(P_i-)}\left(\int_0^{P_i}(1-F_i(y))\,dy + \nu_i F_i(P_i-)\right)\right]. \tag{4.54}
\]
Furthermore, $E[O_i] < \infty$ if and only if
\[
E\left[\frac{1}{1-F_i(P_i-)}\right] < \infty. \tag{4.55}
\]

Proof. Let $\mu_i = E[Y_{i1}]$ and $\nu_i = E[Z_{i1}]$ denote the expected uptime and downtime of the machine, respectively, in processing job $i$. Given $t > 0$, as $Y_{i1}, Y_{i2}, \dots$ are i.i.d.,
\[
E\left[\sum_{k=0}^{N_i(t)} Y_{ik} \,\Big|\, N_i(t) = n\right]
= E\left[\sum_{k=0}^{n} Y_{ik} \,\Big|\, Y_{i0} < t, Y_{i1} < t, \dots, Y_{in} < t, Y_{i,n+1} \ge t\right]
= \sum_{k=0}^{n} E[Y_{ik} \mid Y_{ik} < t] = nE[Y_{i1} \mid Y_{i1} < t]
= \frac{nE[Y_{i1}I_{(Y_{i1}<t)}]}{\Pr(Y_{i1} < t)} = \frac{n}{F_i(t-)}\int_{[0,t)} y\,dF_i(y). \tag{4.56}
\]
Next, noting that the downtimes are independent of the uptimes for every job,
\[
E\left[\sum_{k=0}^{N_i(t)} Z_{ik} \,\Big|\, N_i(t) = n\right] = E\left[\sum_{k=0}^{n} Z_{ik}\right] = nE[Z_{i1}] = n\nu_i. \tag{4.57}
\]
Thus, by the law of iterated expectation, (4.56)–(4.57) and (4.53),
\[
\begin{aligned}
E\left[\sum_{k=0}^{N_i(t)} (Y_{ik}+Z_{ik})\right] &= E\left[E\left[\sum_{k=0}^{N_i(t)} (Y_{ik}+Z_{ik}) \,\Big|\, N_i(t)\right]\right]
= E\left[N_i(t)\left(\frac{1}{F_i(t-)}\int_{[0,t)} y\,dF_i(y) + \nu_i\right)\right]\\
&= \left(\frac{1}{F_i(t-)}\int_{[0,t)} y\,dF_i(y) + \nu_i\right)\frac{F_i(t-)}{1-F_i(t-)}
= \frac{1}{1-F_i(t-)}\left(\int_{[0,t)} y\,dF_i(y) + \nu_i F_i(t-)\right).
\end{aligned}
\]

This together with (4.51) shows that
\[
E[O_i] = E\left[E\left(P_i + \sum_{k=0}^{N_i(P_i)}(Y_{ik}+Z_{ik}) \,\Big|\, P_i\right)\right]
= E\left[P_i + \frac{1}{1-F_i(P_i-)}\left(\int_{[0,P_i)} y\,dF_i(y) + \nu_i F_i(P_i-)\right)\right]. \tag{4.58}
\]
In addition, by the Fubini Theorem,
\[
\int_{[0,t)} y\,dF_i(y) = \int_{[0,t)}\int_{[0,y)} dx\,dF_i(y) = \int_{[0,t)}\int_{(x,t)} dF_i(y)\,dx = \int_{[0,t)}[F_i(t-)-F_i(x)]\,dx
= -t[1-F_i(t-)] + \int_0^t [1-F_i(x)]\,dx. \tag{4.59}
\]


Substituting (4.59) into (4.58) proves (4.54). Furthermore, if (4.55) holds, then by (4.54),
\[
E[O_i] \le (\mu_i+\nu_i)E\left[\frac{1}{1-F_i(P_i-)}\right] < \infty.
\]
Conversely, if $E[O_i] < \infty$, then by (4.54),
\[
E\left[\frac{1}{1-F_i(P_i-)}\right] = E\left[\frac{F_i(P_i-)}{1-F_i(P_i-)} + 1\right] \le \frac{1}{\nu_i}E[O_i] + 1 < \infty,
\]
which completes the proof.

Remark 4.5. Theorem 4.8 provides (4.55) as the necessary and sufficient condition to ensure a finite expectation for the time that job $i$ occupies the machine in the case of identical processing times. If $Y_{ik}$ and $P_i$ are exponentially distributed with means $E[Y_{ik}] = 1/\beta_i$ and $E[P_i] = 1/\eta_i$, then condition (4.55) becomes $E[e^{\beta_i P_i}] < \infty$. This holds if and only if $\beta_i < \eta_i$, or $E[Y_{ik}] > E[P_i]$; i.e., the average length of an uptime for job $i$ must be greater than the average time needed to process that job in order to ensure that the job can be completed within a finite expected time. This is intuitive from a practical point of view when each machine breakdown causes a total loss and the processing times are identical between breakdowns.

Independent Processing Times

In this case, each time a job is repeated, the processing time required is re-sampled independently according to its probability distribution. Thus the processing times $\{P_{ik}\}_{k=1}^{\infty}$ for job $i$ form a sequence of i.i.d. random variables. Define
\[
T_i = \sup\{k \ge 0 : Y_{i1} < P_{i1}, Y_{i2} < P_{i2}, \dots, Y_{ik} < P_{ik}\}, \tag{4.60}
\]
which represents the number of breakdowns during processing job $i$. Hence by (4.50), the occupying time of job $i$ is given by
\[
O_i = P_{i,T_i+1} + \sum_{k=1}^{T_i}(Y_{ik}+Z_{ik}). \tag{4.61}
\]
Let $\mathbf{P}_i = \{P_{ik}\}_{k=1}^{\infty}$, $P_i$ be a representative of $P_{ik}$, and $F_i(y)$ the cdf of the uptimes $Y_{ik}$. Then
\[
\Pr\{T_i = k \mid \mathbf{P}_i\} = \Pr\{Y_{i1} < P_{i1}, \dots, Y_{ik} < P_{ik}, Y_{i,k+1} \ge P_{i,k+1} \mid \mathbf{P}_i\}
= \left[\prod_{j=1}^{k}\Pr\{Y_{ij} < P_{ij} \mid P_{ij}\}\right]\Pr\{Y_{i,k+1} \ge P_{i,k+1} \mid P_{i,k+1}\}
= [1-F_i(P_{i,k+1}-)]\prod_{j=1}^{k}F_i(P_{ij}-), \quad k = 0,1,2,\dots,
\]


(where $\prod_{j=1}^{0} = 1$). This provides the conditional distribution of $T_i$ given $\mathbf{P}_i$. The unconditional distribution of $T_i$ is given by
\[
\Pr\{T_i = k\} = E[\Pr\{T_i = k \mid \mathbf{P}_i\}] = \bigl\{1-E[F_i(P_i-)]\bigr\}E^k[F_i(P_i-)], \tag{4.62}
\]
which is a geometric distribution with parameter $1-E[F_i(P_i-)]$. As a result,
\[
E[T_i] = \frac{E[F_i(P_i-)]}{1-E[F_i(P_i-)]}. \tag{4.63}
\]
Similar to Theorem 4.8, by (4.61), (4.63), and conditioning on $\mathbf{P}_i$ and $T_i$, we can obtain the following formula for the expected occupying time of job $i$:
\[
E[O_i] = \frac{1}{1-E[F_i(P_i-)]}\left(E\left[\int_0^{P_i}(1-F_i(y))\,dy\right] + \nu_i E[F_i(P_i-)]\right). \tag{4.64}
\]
It is easy to see that $E[O_i] < \infty$ if and only if $E[F_i(P_i-)] < 1$, which is weaker than condition (4.55) for identical processing times.
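When no closed form is available, formulas (4.54) and (4.64) can be evaluated by simple Monte Carlo. The sketch below is an illustration only (the function names, the trapezoidal integration of the survival function, and the numeric parameters are our own choices, not part of the text); it estimates $E[O_i]$ under both scenarios for a generic continuous uptime cdf $F$ and a sampler for $P_i$:

```python
import math
import random

def expected_occupying_time(sample_P, F, nu, n=20_000, grid=200):
    """Monte Carlo estimates of E[O_i] under the total-loss model:
    (4.54) for identical and (4.64) for independent processing times.
    F is the (continuous) uptime cdf, nu = E[Z_i1]."""
    def surv_integral(p):
        # \int_0^p (1 - F(y)) dy by the trapezoidal rule
        h = p / grid
        s = 0.5 * ((1 - F(0.0)) + (1 - F(p)))
        s += sum(1 - F(k * h) for k in range(1, grid))
        return s * h

    sum_ident = sum_F = sum_int = 0.0
    for _ in range(n):
        p = sample_P()
        Fp, Ip = F(p), surv_integral(p)
        sum_ident += (Ip + nu * Fp) / (1 - Fp)          # integrand of (4.54)
        sum_F += Fp
        sum_int += Ip
    e_identical = sum_ident / n
    e_independent = (sum_int / n + nu * sum_F / n) / (1 - sum_F / n)   # (4.64)
    return e_identical, e_independent

# Hypothetical data: exponential uptimes of rate 0.4, P_i uniform on (1, 3), E[Z_i1] = 0.5.
F = lambda y: 1 - math.exp(-0.4 * y)
print(expected_occupying_time(lambda: random.uniform(1, 3), F, 0.5))
```

Running it with a non-degenerate $P_i$ also illustrates Proposition 4.1 below: the identical-processing-time estimate exceeds the independent-processing-time one.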

Comparisons Between Identical and Independent Processing Times

We have discussed the difference between the cases of (i) identical and (ii) independent processing times briefly in an intuitive sense. Now we make some further comparisons in terms of the occupying time. We first show, in the following proposition, that the overall occupying time of a job in Case (i) tends to be longer than that in Case (ii).

Proposition 4.1. Denote the expected occupying time $E[O_i]$ of job $i$ by $E_1[O_i]$ in Case (i) and by $E_2[O_i]$ in Case (ii). If $P_i$ does not degenerate in the support of $F_i(t)$, then $E_1[O_i] > E_2[O_i]$.

Proof. Let
\[
H_i(t) = \int_0^t [1-F_i(y)]\,dy + \nu_i F_i(t-). \tag{4.65}
\]
Then, by Theorem 4.8 and (4.64),
\[
E_1[O_i] = E\left[\frac{H_i(P_i)}{1-F_i(P_i-)}\right] \quad\text{and}\quad E_2[O_i] = \frac{E[H_i(P_i)]}{E[1-F_i(P_i-)]}. \tag{4.66}
\]
Let $f(x) = 1-F_i(x-)$ and $g(x) = H_i(x)/[1-F_i(x-)]$. It is easy to see that $f(x)$ is strictly decreasing and $g(x)$ is strictly increasing in the support of $F_i$. Let $\mu = E[f(P_i)]$ and $a = \sup\{x : f(x) > \mu\}$. Then $f(x) > \mu$ for $x < a$ and $f(x) \le \mu$ for $x > a$. Hence $[f(x)-\mu]g(x) \le [f(x)-\mu]g(a)$ for all $x \ge 0$, and the inequality holds strictly in the support of $F_i$. It follows that, provided $P_i$ does not degenerate in the support of $F_i$,
\[
E[f(P_i)g(P_i)] - E[f(P_i)]E[g(P_i)] = E[(f(P_i)-\mu)g(P_i)] < E[(f(P_i)-\mu)g(a)] = E[f(P_i)-\mu]\,g(a) = (\mu-\mu)g(a) = 0.
\]
This shows
\[
E[H_i(P_i)] = E\left[(1-F_i(P_i-))\frac{H_i(P_i)}{1-F_i(P_i-)}\right] < E[1-F_i(P_i-)]\,E\left[\frac{H_i(P_i)}{1-F_i(P_i-)}\right],
\]
which is equivalent to $E_2[O_i] < E_1[O_i]$ by (4.66).

Another interesting difference between the two cases lies in the impact of a breakdown on the remaining occupying time. Let us now compare the expected remaining occupying time of job $i$ given that a breakdown occurs before the job is completed (counted from the time that the machine resumes its operation), that is, $E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}]$, with the unconditional expected occupying time $E[O_i]$. The next proposition shows that $E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}]$ is generally greater than $E[O_i]$ in the case of identical processing times.

Proposition 4.2. In the case of identical processing times, under condition (4.55), $E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}] \ge E[O_i]$, and the strict inequality holds as long as $P_i$ does not degenerate in the support of $F_i(t)$.

Proof. Similar to the proofs of (4.53) and Theorem 4.8, we can show that
\[
E[N_i(t) \mid Y_{i1} < t] = \frac{1}{1-F_i(t-)}, \tag{4.67}
\]
\[
E\left[\sum_{k=2}^{N_i(t)} Y_{ik} \,\Big|\, N_i(t) = n, Y_{i1} < t\right] = \sum_{k=2}^{n} E[Y_{ik} \mid Y_{ik} < t] = (n-1)E[Y_{i2} \mid Y_{i2} < t] = \frac{n-1}{F_i(t-)}\int_{[0,t)} y\,dF_i(y)
\]
and
\[
E\left[\sum_{k=2}^{N_i(t)} Z_{ik} \,\Big|\, N_i(t) = n, Y_{i1} < t\right] = (n-1)E[Z_{i2}] = (n-1)\nu_i.
\]
Thus, by the law of iterated expectation together with (4.67),
\[
\begin{aligned}
E\left[\sum_{k=2}^{N_i(t)} (Y_{ik}+Z_{ik}) \,\Big|\, Y_{i1} < t\right]
&= E\left[\bigl(N_i(t)-1\bigr)\left(\frac{\int_{[0,t)} y\,dF_i(y)}{F_i(t-)}+\nu_i\right) \,\Big|\, Y_{i1} < t\right]\\
&= \left(\frac{1}{F_i(t-)}\int_{[0,t)} y\,dF_i(y)+\nu_i\right)\left(\frac{1}{1-F_i(t-)}-1\right)\\
&= \frac{1}{1-F_i(t-)}\left(\int_{[0,t)} y\,dF_i(y)+\nu_i F_i(t-)\right). \tag{4.68}
\end{aligned}
\]


It follows from (4.68) and (4.59) that
\[
\begin{aligned}
E\bigl[(O_i-Y_{i1}-Z_{i1})I_{(P_i>Y_{i1})} \mid P_i = t\bigr]
&= E\bigl[(O_i-Y_{i1}-Z_{i1})I_{(t>Y_{i1})} \mid P_i = t\bigr]
= E\bigl[O_i-Y_{i1}-Z_{i1} \mid P_i = t > Y_{i1}\bigr]\Pr(Y_{i1} < t)\\
&= E\left[t + \sum_{k=2}^{N_i(t)}(Y_{ik}+Z_{ik}) \,\Big|\, P_i = t > Y_{i1}\right]F_i(t-)\\
&= F_i(t-)\left\{t + \frac{1}{1-F_i(t-)}\left(\int_{[0,t)} y\,dF_i(y)+\nu_i F_i(t-)\right)\right\}\\
&= F_i(t-)\,\frac{1}{1-F_i(t-)}\left(\int_0^t[1-F_i(y)]\,dy + \nu_i F_i(t-)\right)
= F_i(t-)\tilde H_i(t),
\end{aligned}
\]
where $\tilde H_i(t) = H_i(t)/(1-F_i(t-))$, with $H_i(t)$ defined in (4.65). Consequently,
\[
E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}] = \frac{E[(O_i-Y_{i1}-Z_{i1})I_{(P_i>Y_{i1})}]}{\Pr(P_i > Y_{i1})} = \frac{E[F_i(P_i-)\tilde H_i(P_i)]}{E[F_i(P_i-)]}. \tag{4.69}
\]
On the other hand, by (4.54) we have $E[O_i] = E[\tilde H_i(P_i)]$. Comparing this with (4.69), we can see that $E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}] \ge E[O_i]$ if and only if
\[
E[F_i(P_i-)\tilde H_i(P_i)] \ge E[F_i(P_i-)]\,E[\tilde H_i(P_i)]. \tag{4.70}
\]
As $F_i(t-)$ and $\tilde H_i(t)$ are nondecreasing and strictly increasing in the support of $F_i(t)$, by a similar argument as in the proof of Proposition 4.1, the inequality in (4.70) is valid for any nonnegative random variable $P_i$, with the strict inequality holding provided $P_i$ does not degenerate in the support of $F_i(t)$.

For the case of independent processing times, similar arguments as above (by conditioning on $\mathbf{P}_i$ and $T_i$) show that $E[O_i-Y_{i1}-Z_{i1} \mid P_i > Y_{i1}] = E[O_i]$. Therefore, the expected remaining occupying time of a job increases after a breakdown in the case of identical processing times, but remains the same in the case of independent processing times.

4.3.2 Minimizing the Expected Weighted Flowtime

Consider the problem of minimizing the expected weighted flowtime:
\[
EWFT(\pi) = E\left[\sum_{i=1}^n w_i C_i(\pi)\right] = \sum_{i=1}^n w_i E[C_i].
\]


Since the completion time of job $i$ can be expressed by $C_i = C_i(\pi) = \sum_{k\in B_i(\pi)} O_k$, where $B_i(\pi)$ is the set of jobs sequenced no later than job $i$ under $\pi$, Theorem 4.1 remains valid if the $E[O_i]$ are finite and calculated under the total-loss model. Thus we have the following results.

Theorem 4.9. Under the total-loss machine breakdown model:
(i) The expected weighted flowtime $EWFT(\pi)$ is finite if and only if (4.55) is satisfied for all $i$ in the case of identical processing times between breakdowns, or $E[F_i(P_i-)] < 1$ for all $i$ in the case of independent processing times.
(ii) In the case of identical processing times, if (4.55) holds for all $i$, then the optimal static policy to minimize $EWFT(\pi)$ is to process jobs in nondecreasing order of $E[O_i]/w_i$ with zero idle times, where $E[O_i]$ is given by (4.54).
(iii) In the case of independent processing times, if $E[F_i(P_i-)] < 1$ for all $i$, then the optimal static policy to minimize $EWFT(\pi)$ is to process jobs in nondecreasing order of $E[O_i]/w_i$ with zero idle times, where $E[O_i]$ is given by (4.64).

The following are some applications of Theorem 4.9 in the case of identical processing times.

Example 4.1. Exponentially distributed uptimes
An important case for the uptime distribution is the exponential distribution, which is often considered in the literature. In this case, let $1/\beta_i$ denote the mean of $Y_{ik}$, $i = 1,\dots,n$; $k = 1,2,\dots$. Then $1-F_i(t) = e^{-\beta_i t}$, so that
\[
\int_{[0,t)}[1-F_i(y)]\,dy = \int_0^t e^{-\beta_i y}\,dy = \frac{1}{\beta_i}\bigl(1-e^{-\beta_i t}\bigr).
\]
Substituting these into (4.54), we obtain
\[
E[O_i] = E\left[e^{\beta_i P_i}\left(\frac{1-e^{-\beta_i P_i}}{\beta_i}+\nu_i\bigl(1-e^{-\beta_i P_i}\bigr)\right)\right] = \left(\frac{1}{\beta_i}+\nu_i\right)\bigl(E[e^{\beta_i P_i}]-1\bigr). \tag{4.71}
\]
Consequently, when the uptimes $Y_{ik}$ are exponentially distributed with mean $1/\beta_i$, and $E[e^{\beta_i P_i}] < \infty$, $i = 1,2,\dots,n$, the optimal sequence minimizing the expected weighted flowtime is in nondecreasing order of $E[O_i]/w_i$, with $E[O_i]$ given by (4.71).

If $P_i$ is also exponentially distributed with mean $1/\eta_i$ and $\eta_i > \beta_i$, then
\[
E[e^{\beta_i P_i}]-1 = \frac{\eta_i}{\eta_i-\beta_i}-1 = \frac{\beta_i}{\eta_i-\beta_i}.
\]
Therefore, if $Y_{ik}$ and $P_i$ are exponentially distributed with means $1/\beta_i$ and $1/\eta_i$, respectively, and $\eta_i > \beta_i$, $i = 1,2,\dots,n$, then the optimal sequence minimizing the expected weighted flowtime is in nondecreasing order of
\[
\left\{\frac{1+\beta_i\nu_i}{w_i(\eta_i-\beta_i)},\ i = 1,\dots,n\right\}.
\]
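As a small illustration of Theorem 4.9 combined with the exponential–exponential closed form above, the sketch below (the job data and function names are hypothetical, chosen only for illustration) computes the priority index $(1+\beta_i\nu_i)/(w_i(\eta_i-\beta_i))$ and sorts the jobs in nondecreasing order of it:

```python
# Each job: weight w, uptime rate beta = 1/E[Y_ik], processing rate eta = 1/E[P_i],
# expected downtime nu = E[Z_ik].  All values are made up.
jobs = {
    "J1": dict(w=3.0, beta=0.2, eta=0.9, nu=0.4),
    "J2": dict(w=1.5, beta=0.5, eta=1.2, nu=0.3),
    "J3": dict(w=2.0, beta=0.1, eta=0.6, nu=1.0),
}

def priority(j):
    # index (1 + beta*nu) / (w*(eta - beta)); requires eta > beta for finite E[O_i]
    assert j["eta"] > j["beta"], "need E[Y_ik] > E[P_i]"
    return (1 + j["beta"] * j["nu"]) / (j["w"] * (j["eta"] - j["beta"]))

sequence = sorted(jobs, key=lambda name: priority(jobs[name]))
print("optimal order:", sequence)
```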


Example 4.2. Uniform uptimes and processing times
Suppose that the uptimes $Y_{ik}$ and the processing times $P_i$ are uniformly distributed over the intervals $[0,u_i]$ and $[0,p_i]$, respectively, with $0 < p_i < u_i$, $i = 1,\dots,n$. This corresponds to the case where we only know the upper bounds for the uptimes and processing times. In such a case, $F_i(t-) = t/u_i$ for $0 < t < u_i$, hence $0 < p_i < u_i$ implies
\[
E\left[\frac{1}{1-F_i(P_i-)}\right] = E\left[\frac{u_i}{u_i-P_i}\right] = \frac{1}{p_i}\int_0^{p_i}\frac{u_i}{u_i-x}\,dx = \frac{u_i}{p_i}\ln\frac{u_i}{u_i-p_i} < \infty.
\]
The condition $p_i < u_i$, i.e., that the upper bound of the processing time for job $i$ is less than that of the uptime, is necessary and sufficient for the above expectation to be finite (which ensures that the problem is well posed). Assume this basic condition holds. Then it is easy to calculate, by (4.54),

\[
\begin{aligned}
E[O_i] &= E\left[\frac{1}{1-F_i(P_i-)}\int_0^{P_i}(1-F_i(y))\,dy+\frac{\nu_i F_i(P_i-)}{1-F_i(P_i-)}\right]\\
&= E\left[\frac{u_i}{u_i-P_i}\int_0^{P_i}\left(1-\frac{y}{u_i}\right)dy\right]+\nu_i E\left[\frac{F_i(P_i-)}{1-F_i(P_i-)}\right]
= E\left[\frac{u_i}{u_i-P_i}\left(P_i-\frac{P_i^2}{2u_i}\right)\right]+\nu_i E\left[\frac{u_i}{u_i-P_i}-1\right]\\
&= \frac{1}{2p_i}\int_0^{p_i}\left(x-u_i+\frac{u_i^2}{u_i-x}\right)dx+\nu_i\left(\int_0^{p_i}\frac{u_i\,dx}{p_i(u_i-x)}-1\right)\\
&= \frac{p_i}{4}-\frac{u_i}{2}+\frac{u_i^2}{2p_i}\ln\frac{u_i}{u_i-p_i}+\nu_i\left(\frac{u_i}{p_i}\ln\frac{u_i}{u_i-p_i}-1\right).
\end{aligned}
\]
Consequently, the optimal sequence to minimize the EWFT follows the nondecreasing order of $E[O_i]/w_i$, with $E[O_i]$ given above.
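The closed form just derived is straightforward to code. In the sketch below (hypothetical job data; the function name is our own), each job is ranked by $E[O_i]/w_i$ as prescribed by Theorem 4.9(ii):

```python
import math

def expected_O_uniform(u, p, nu):
    """Closed-form E[O_i] of Example 4.2 (identical processing times):
    uptimes ~ U[0, u], processing time ~ U[0, p], expected downtime nu; needs p < u."""
    if not p < u:
        raise ValueError("requires p_i < u_i")
    log_term = math.log(u / (u - p))
    return p / 4 - u / 2 + (u * u / (2 * p)) * log_term + nu * ((u / p) * log_term - 1)

# Hypothetical jobs: (u_i, p_i, nu_i, w_i); order by E[O_i]/w_i.
jobs = {"A": (10.0, 4.0, 1.0, 2.0), "B": (8.0, 6.0, 0.5, 1.0), "C": (12.0, 3.0, 2.0, 3.0)}
order = sorted(jobs, key=lambda k: expected_O_uniform(*jobs[k][:3]) / jobs[k][3])
print("optimal order:", order)
```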

Example 4.3. A problem with periodic inspection
This example represents a problem with regular maintenance checkup and repair, which often occurs in practice, and can be described as follows. After starting to process a job, the machine is checked periodically to monitor its condition. The check determines whether the machine needs to be shut down for repair, but the check itself does not interrupt the processing. If a shutdown is necessary, the job will have to start over again after the machine resumes its operation; otherwise the processing continues without interruption. The probability that a shutdown is necessary, as well as the period between two consecutive checks, are job dependent, due to the different impacts/burdens on the machine created by the job being processed. More specifically, when job $i$ is being processed, the machine undergoes a check every $b_i$ units of time, and there is a probability $\theta_i$ ($0 < \theta_i < 1$) at each check that the machine has to be shut down. Other than these possible shutdowns, the machine works continuously. The problem is to determine the optimal sequence to process the jobs so as to minimize the EWFT.


In this case, a breakdown occurs whenever a check determines to shut down the machine, which is preemptive-repeat, while the repair time represents the downtime. Under the settings described above, the uptime to process job $i$ is a discrete random variable with masses at $mb_i$ and $\Pr(Y_{ik} = mb_i) = \theta_i(1-\theta_i)^{m-1}$, $m = 1,2,\dots$. It follows that $F_i(x) = 0$ for $x < b_i$, and for $mb_i \le x < (m+1)b_i$, $m = 1,2,\dots$,
\[
F_i(x) = \sum_{j=1}^{m}\theta_i(1-\theta_i)^{j-1} = \theta_i\,\frac{1-(1-\theta_i)^m}{1-(1-\theta_i)} = 1-(1-\theta_i)^m. \tag{4.72}
\]
Let $m_i = m_i(x)$ satisfy $m_ib_i < x \le (m_i+1)b_i$. Then by (4.72),
\[
1-F_i(x-) = (1-\theta_i)^{m_i(x)}. \tag{4.73}
\]

Furthermore, given $P_i = x$, let $m = m_i(x)$. Then by (4.72),
\[
\int_0^x[1-F_i(y)]\,dy = \sum_{j=0}^{m-1}(1-\theta_i)^j b_i + (1-\theta_i)^m(x-mb_i) = \frac{b_i}{\theta_i} + (1-\theta_i)^m\left(x-mb_i-\frac{b_i}{\theta_i}\right).
\]

Substituting this and (4.73) into (4.54), we get
\[
\begin{aligned}
E[O_i \mid P_i = x] &= \frac{1}{(1-\theta_i)^m}\left\{\frac{b_i}{\theta_i}+(1-\theta_i)^m\left(x-mb_i-\frac{b_i}{\theta_i}\right)+\nu_i\bigl(1-(1-\theta_i)^m\bigr)\right\}\\
&= \frac{1}{(1-\theta_i)^m}\left(\frac{b_i}{\theta_i}+\nu_i\right)+x-mb_i-\frac{b_i}{\theta_i}-\nu_i\\
&= x-m_i(x)b_i+\frac{b_i+\nu_i\theta_i}{\theta_i}\bigl[(1-\theta_i)^{-m_i(x)}-1\bigr]. \tag{4.74}
\end{aligned}
\]

Now, by (4.73) and Theorem 4.8, $E[O_i] < \infty$ if and only if
\[
E\bigl[(1-\theta_i)^{-m_i(P_i)}\bigr] < \infty, \quad\text{or equivalently,}\quad E\bigl[(1-\theta_i)^{-P_i/b_i}\bigr] < \infty. \tag{4.75}
\]
Assume that condition (4.75) holds for $i = 1,2,\dots,n$. Then by (4.74) and the law of iterated expectation, we get
\[
E[O_i] = E[P_i-m_i(P_i)b_i] + \frac{b_i+\nu_i\theta_i}{\theta_i}E\bigl[(1-\theta_i)^{-m_i(P_i)}-1\bigr]. \tag{4.76}
\]
Therefore, for the maintenance problem as described above, the optimal sequence to minimize the EWFT, by Theorem 4.9, should follow the nondecreasing order of $E[O_i]/w_i$, where $E[O_i]$ is given by (4.76). Moreover, since the distribution of $m_i(P_i)$ is given by
\[
\Pr(m_i(P_i) = m) = \Pr\{mb_i < P_i \le (m+1)b_i\}, \quad m = 0,1,2,\dots,
\]


$E[O_i]$ can also be calculated by
\[
E[O_i] = E[P_i] + \sum_{m=0}^{\infty}\left[\frac{b_i+\nu_i\theta_i}{\theta_i(1-\theta_i)^m}-mb_i\right]\Pr\{mb_i < P_i \le (m+1)b_i\} - \frac{b_i+\nu_i\theta_i}{\theta_i}.
\]

Let us now look at some special cases of Example 4.3.

Case 1. Uniform processing times. Let $P_i$ be uniformly distributed over $(0,Mb_i)$ for some integer $M > 0$. Then
\[
\Pr\{mb_i < P_i \le (m+1)b_i\} = \frac{b_i}{Mb_i} = \frac{1}{M}, \quad m = 0,1,\dots,M-1.
\]
Hence by (4.76) (or equivalently the alternative expression above),
\[
\begin{aligned}
E[O_i] &= \frac{Mb_i}{2}-\sum_{m=1}^{M-1}\frac{mb_i}{M}+\frac{b_i+\nu_i\theta_i}{\theta_i}\left[\frac{1}{M}\sum_{m=0}^{M-1}(1-\theta_i)^{-m}-1\right]\\
&= \frac{Mb_i}{2}-\frac{M-1}{2}b_i+\frac{b_i+\nu_i\theta_i}{\theta_i}\left[\frac{1}{M}\,\frac{(1-\theta_i)^{-M}-1}{(1-\theta_i)^{-1}-1}-1\right]\\
&= \frac{b_i}{2}+\frac{b_i+\nu_i\theta_i}{\theta_i}\left[\frac{1-(1-\theta_i)^M}{M\theta_i(1-\theta_i)^{M-1}}-1\right].
\end{aligned}
\]

Case 2. Small $b_i$. If the check is made frequently so that $b_i$ is relatively small, then $m_i(x)b_i \approx x$. Hence by (4.76), $E[O_i]$ can be approximated by
\[
E[O_i] \approx \frac{b_i+\nu_i\theta_i}{\theta_i}E\bigl[(1-\theta_i)^{-P_i/b_i}-1\bigr]. \tag{4.77}
\]
Case 3. $b_i \to 0$ while $\theta_i/b_i$ remains stable. Note that frequent checks should result in a small chance of shutting down the machine at each check. Let $\theta_i = \beta_ib_i$ and $b_i \to 0$, where $\beta_i$ is a constant. Then by (4.77),
\[
E[O_i] \approx \left(\frac{b_i}{\beta_ib_i}+\nu_i\right)E\bigl[(1-\beta_ib_i)^{-P_i/b_i}-1\bigr] \longrightarrow \left(\frac{1}{\beta_i}+\nu_i\right)E\bigl[e^{\beta_iP_i}-1\bigr],
\]
which is the same as (4.71) with exponential uptimes. Thus exponential uptimes can be regarded as a limiting case of the maintenance problem in Example 4.3.
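For Case 1, the closed-form expression and the series from which it was derived can be cross-checked numerically. The following sketch (all parameter values are made up; the two helper functions are ours) evaluates both and should print the same number twice:

```python
def E_O_case1_closed(b, nu, theta, M):
    """Closed-form E[O_i] for Example 4.3, Case 1 (P_i uniform on (0, M*b_i))."""
    q = 1 - theta
    return b / 2 + (b + nu * theta) / theta * ((1 - q**M) / (M * theta * q**(M - 1)) - 1)

def E_O_case1_series(b, nu, theta, M):
    """The same quantity evaluated term by term, before simplification."""
    q = 1 - theta
    first = M * b / 2 - sum(m * b / M for m in range(1, M))
    second = (b + nu * theta) / theta * (sum(q**(-m) for m in range(M)) / M - 1)
    return first + second

# Hypothetical parameters: check interval b=0.5, repair time nu=0.2,
# shutdown probability theta=0.1 per check, M=8.
print(E_O_case1_closed(0.5, 0.2, 0.1, 8), E_O_case1_series(0.5, 0.2, 0.1, 8))
```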

In the case of independent processing times, specific results for the above three examples can be similarly derived from Theorem 4.9.

In Example 4.1, where $Y_{ik}$ are exponentially distributed with mean $1/\beta_i$, similarly to (4.71) we can see that (4.64) reduces to
\[
E[O_i] = \frac{1}{E[e^{-\beta_iP_i}]}\left(\frac{1}{\beta_i}E[1-e^{-\beta_iP_i}]+\nu_iE[1-e^{-\beta_iP_i}]\right) = \left(\frac{1}{\beta_i}+\nu_i\right)\frac{1-E[e^{-\beta_iP_i}]}{E[e^{-\beta_iP_i}]}.
\]


Thus the optimal sequence minimizing the EWFT is in nondecreasing order of $E[O_i]/w_i$, with $E[O_i]$ given above.

In Example 4.2 with uniformly distributed processing times, it is not difficult to show that
\[
E[O_i] = \frac{2u_i}{2u_i-p_i}\left(\frac{p_i}{2}-\frac{p_i^2}{6u_i}+\frac{\nu_ip_i}{2u_i}\right),
\]
which is finite as long as $p_i < 2u_i$.

In Example 4.3, let $\rho_i = E[(1-\theta_i)^{m_i(P_i)}]$. Then
\[
E[O_i] = \left(\frac{b_i}{\theta_i}+\nu_i\right)\frac{1-\rho_i}{\rho_i}+\frac{1}{\rho_i}E\bigl[(1-\theta_i)^{m_i(P_i)}\bigl(P_i-m_i(P_i)b_i\bigr)\bigr],
\]
which is finite provided $0 < \theta_i < 1$.

Remark 4.6. Previous results on scheduling problems with preemptive-repeat (total-loss) machine breakdowns were largely restricted to the case of exponential uptimes. The results presented in this section, on the other hand, allow a general distribution for the uptimes. This broad coverage allows one to handle a variety of interesting cases, as illustrated by Examples 4.2 and 4.3.

4.3.3 Maximizing the Expected Discounted Reward

The expected discounted reward (EDR) of completing all jobs is defined by
\[
EDR(\pi) = E\left[\sum_{i=1}^n w_ie^{-rC_i(\pi)}\right] = \sum_{i=1}^n w_iE[e^{-rC_i}], \tag{4.78}
\]
where $w_i$ represents the reward received for completing job $i$, and $r > 0$ is the discount rate. The problem is to determine an optimal sequence $\pi^*$ to maximize $EDR(\pi)$ with respect to the sequence $\pi$.

As pointed out by Rothkopf and Smith (1984), there are two basic classes of delay (or holding) costs for the jobs: linear and exponential. The weighted flowtime belongs to the first class, which assumes that there is no change in the value of money (cost) over time. The EDR measure, on the other hand, belongs to the second class, which considers the time value of money. Note that $e^{-rt}$ represents the present value of a unit payment at a future time $t$. Thus, if $w_i$ represents the rate of cost per unit of time for job $i$, and the job is completed at time $C_i$, then the present value of the cost for job $i$ is given by
\[
\int_0^{C_i}w_ie^{-rt}\,dt = \frac{1}{r}w_i\bigl(1-e^{-rC_i}\bigr).
\]
Consequently, maximizing $EDR(\pi)$ is equivalent to minimizing the total expected cost for all jobs. Rothkopf (1966a, b) considered problems with deterministic and random processing times, respectively, under this class of exponential measures (but without machine breakdowns).

It is also interesting to note that
\[
\lim_{r\to0}\frac{1}{r}\left(\sum_{i=1}^n w_i-EDR(\pi)\right) = \lim_{r\to0}\frac{1}{r}E\left[\sum_{i=1}^n w_i\bigl(1-e^{-rC_i}\bigr)\right] = \sum_{i=1}^n w_iE\left[\lim_{r\to0}\frac{1-e^{-rC_i}}{r}\right]
= \sum_{i=1}^n w_iE\left[\lim_{r\to0}\int_0^{C_i}e^{-rt}\,dt\right] = \sum_{i=1}^n w_iE[C_i] = EWFT(\pi).
\]
Thus the problem of minimizing the expected weighted flowtime $EWFT(\pi)$ is equivalent to the limit of the problem of maximizing the expected discounted reward $EDR(\pi)$ as $r\to0$ (no discount).

To find the optimal sequence to maximize $EDR(\pi)$, we first derive formulae for the Laplace transform of the occupying time.

Laplace Transform of the Occupying Time

The Laplace transform of the occupying time $O_i$ of job $i$ is defined by $E[e^{-rO_i}]$ as a function of $r$. The following theorem delivers the formulae for $E[e^{-rO_i}]$.

Theorem 4.10. Let $F_i(x)$ and $G_i(x)$ denote the cdf's of the uptimes $Y_{ik}$ and the downtimes $Z_{ik}$, respectively.
(i) In the case of identical processing times between machine breakdowns,
\[
E[e^{-rO_i}] = E\left[\frac{e^{-rP_i}(1-F_i(P_i-))}{1-\int_{[0,P_i)}e^{-ry}\,dF_i(y)\int_0^{\infty}e^{-rz}\,dG_i(z)}\right]. \tag{4.79}
\]
(ii) In the case of independent processing times between machine breakdowns,
\[
E[e^{-rO_i}] = \frac{E\bigl[e^{-rP_i}(1-F_i(P_i-))\bigr]}{1-E\bigl[e^{-rZ_i}\bigr]E\bigl[\int_{[0,P_i)}e^{-ry}\,dF_i(y)\bigr]}. \tag{4.80}
\]


Proof. By (4.51), (4.52), and the independence between $P_i$, $Y_{ik}$ and $Z_{ik}$,
\[
\begin{aligned}
E[e^{-rO_i} \mid P_i = x] &= E\left[\exp\left\{-r\Bigl(x+\sum_{k=0}^{N_i(x)}(Y_{ik}+Z_{ik})\Bigr)\right\}\right]\\
&= e^{-rx}\sum_{m=0}^{\infty}E\left[\exp\left\{-r\sum_{k=0}^{m}(Y_{ik}+Z_{ik})\right\}\Big|\,N_i(x)=m\right]\Pr(N_i(x)=m)\\
&= e^{-rx}\sum_{m=0}^{\infty}\prod_{k=0}^{m}E\bigl[e^{-r(Y_{ik}+Z_{ik})}\mid Y_{ik}<x\bigr]\Pr(N_i(x)=m)
= e^{-rx}\sum_{m=0}^{\infty}E\bigl[e^{-r(Y_i+Z_i)}\mid Y_i<x\bigr]^m\Pr(N_i(x)=m)\\
&= e^{-rx}\sum_{m=0}^{\infty}\omega_i^m(x)F_i^m(x-)[1-F_i(x-)] = e^{-rx}\,\frac{1-F_i(x-)}{1-\omega_i(x)F_i(x-)}
= \frac{e^{-rx}(1-F_i(x-))}{1-\int_{[0,x)}e^{-ry}\,dF_i(y)\int_0^{\infty}e^{-rz}\,dG_i(z)},
\end{aligned} \tag{4.81}
\]
where
\[
\omega_i(x) = E\bigl[e^{-r(Y_i+Z_i)}\mid Y_i<x\bigr] = E\bigl[e^{-rY_i}\mid Y_i<x\bigr]E\bigl[e^{-rZ_i}\bigr]
= \frac{1}{F_i(x-)}\int_{[0,x)}e^{-ry}\,dF_i(y)\int_0^{\infty}e^{-rz}\,dG_i(z) < \frac{1}{F_i(x-)}.
\]
Consequently, (4.79) follows from (4.81) and $E[e^{-rO_i}] = E[E[e^{-rO_i}\mid P_i]]$ by the law of iterated expectation. This proves Part (i).

For Part (ii), by (4.60)–(4.62) and the assumptions on $P_{ik}$, $Y_{ik}$ and $Z_{ik}$,
\[
\begin{aligned}
E[e^{-rO_i}] &= E\left[\exp\left\{-r\Bigl(P_{i,T_i+1}+\sum_{k=1}^{T_i}(Y_{ik}+Z_{ik})\Bigr)\right\}\right]\\
&= \sum_{m=0}^{\infty}E\left[\exp\left\{-r\Bigl(P_{i,m+1}+\sum_{k=1}^{m}(Y_{ik}+Z_{ik})\Bigr)\right\}\Big|\,T_i=m\right]\Pr(T_i=m)\\
&= \sum_{m=0}^{\infty}E\bigl[e^{-rP_{i,m+1}}\mid Y_{i,m+1}\ge P_{i,m+1}\bigr]\prod_{k=1}^{m}E\bigl[e^{-r(Y_{ik}+Z_{ik})}\mid Y_{ik}<P_{ik}\bigr]\Pr(T_i=m)\\
&= \sum_{m=0}^{\infty}E\bigl[e^{-rP_i}\mid Y_i\ge P_i\bigr]E\bigl[e^{-r(Y_i+Z_i)}\mid Y_i<P_i\bigr]^m\rho_i^m(1-\rho_i)
= \eta_i\sum_{m=0}^{\infty}\omega_i^m\rho_i^m(1-\rho_i) = \eta_i\,\frac{1-\rho_i}{1-\omega_i\rho_i},
\end{aligned} \tag{4.82}
\]
where
\[
\eta_i = E\bigl[e^{-rP_i}\mid Y_i\ge P_i\bigr] = \frac{E\bigl[e^{-rP_i}(1-F_i(P_i-))\bigr]}{\Pr(Y_i\ge P_i)}, \tag{4.83}
\]
$\rho_i = E[F_i(P_i-)] = \Pr(Y_i<P_i)$, and
\[
\omega_i = E\bigl[e^{-r(Y_i+Z_i)}\mid Y_i<P_i\bigr] = E\bigl[e^{-rY_i}\mid Y_i<P_i\bigr]E\bigl[e^{-rZ_i}\bigr]
= \frac{E\bigl[e^{-rZ_i}\bigr]}{\Pr(Y_i<P_i)}E\left[\int_{[0,P_i)}e^{-ry}\,dF_i(y)\right]. \tag{4.84}
\]
Substituting (4.83) and (4.84) into (4.82), we obtain (4.80).

The formulae for the Laplace transform of the occupying time in Theorem 4.10 allow general distributions for the processing times and the breakdown processes. Further specific formulae can be obtained with particular distributions. The following are some examples.

Example 4.4. If $Y_{ik}$ and $Z_{ik}$ are exponentially distributed with means $1/\beta_i$ and $1/\gamma_i$ respectively, then $F_i(t-) = F_i(t) = 1-e^{-\beta_it}$ and $G_i(t) = 1-e^{-\gamma_it}$. Hence
\[
\int_{[0,t)}e^{-ry}\,dF_i(y) = \int_0^t\beta_ie^{-(\beta_i+r)y}\,dy = \frac{\beta_i}{\beta_i+r}\bigl[1-e^{-(r+\beta_i)t}\bigr]
\]
and
\[
\int_0^{\infty}e^{-rz}\,dG_i(z) = \int_0^{\infty}\gamma_ie^{-(\gamma_i+r)z}\,dz = \frac{\gamma_i}{\gamma_i+r}.
\]

Substituting these into (4.79) and (4.80), we get
\[
E[e^{-rO_i}] =
\begin{cases}
\displaystyle E\left[\frac{(r+\beta_i)(r+\gamma_i)}{r(r+\beta_i+\gamma_i)e^{(r+\beta_i)P_i}+\beta_i\gamma_i}\right] & \text{for identical processing times;}\\[2ex]
\displaystyle \frac{(r+\beta_i)(r+\gamma_i)E[e^{-(r+\beta_i)P_i}]}{r(r+\beta_i+\gamma_i)+\beta_i\gamma_iE[e^{-(r+\beta_i)P_i}]} & \text{for independent processing times.}
\end{cases}
\]
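Either branch of the above formula is easy to evaluate numerically. The sketch below is illustrative only (the parameter values, the exponential choice for $P_i$, and the Monte Carlo treatment of the identical case are our own assumptions):

```python
import math
import random

def phi_independent(r, beta, gamma, E_exp):
    """E[e^{-r O_i}] for independent processing times; E_exp = E[exp(-(r+beta) P_i)]."""
    return (r + beta) * (r + gamma) * E_exp / (r * (r + beta + gamma) + beta * gamma * E_exp)

def phi_identical(r, beta, gamma, sample_P, n=50_000):
    """E[e^{-r O_i}] for identical processing times, averaging the ratio over P_i."""
    acc = 0.0
    for _ in range(n):
        p = sample_P()
        acc += (r + beta) * (r + gamma) / (
            r * (r + beta + gamma) * math.exp((r + beta) * p) + beta * gamma)
    return acc / n

# Hypothetical job: exponential P_i with rate eta, uptime rate beta, downtime rate gamma.
r, beta, gamma, eta = 0.05, 0.3, 1.0, 0.8
E_exp = eta / (eta + r + beta)          # E[e^{-(r+beta)P_i}] for exponential P_i
print("independent:", phi_independent(r, beta, gamma, E_exp))
print("identical  :", phi_identical(r, beta, gamma, lambda: random.expovariate(eta)))
```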

Example 4.5. Suppose that $Y_{ik}$ have a Gamma density function
\[
f(y;\alpha_i,\beta_i) = \frac{1}{\Gamma(\alpha_i)}\beta_i^{\alpha_i}y^{\alpha_i-1}e^{-\beta_iy}.
\]
Let $IG_{\alpha}(\cdot)$ denote the incomplete Gamma function defined by
\[
IG_{\alpha}(t) = \frac{1}{\Gamma(\alpha)}\int_0^t x^{\alpha-1}e^{-x}\,dx \quad\text{for } t\ge0\ (\alpha>0).
\]
Then $E[I(P_i\le Y_i)\mid P_i] = \Pr(P_i\le Y_i\mid P_i) = 1-IG_{\alpha_i}(\beta_iP_i)$ and so
\[
E\bigl[e^{-rP_i}I(P_i\le Y_i)\bigr] = E\bigl[e^{-rP_i}E[I(P_i\le Y_i)\mid P_i]\bigr] = E\bigl[e^{-rP_i}\bigl(1-IG_{\alpha_i}(\beta_iP_i)\bigr)\bigr].
\]


On the other hand,
\[
E\bigl[e^{-rY_i}I(P_i>Y_i)\bigr] = E\bigl[E[e^{-rY_i}I(P_i>Y_i)\mid P_i]\bigr] = \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}E\left[\int_0^{P_i}e^{-ry}y^{\alpha_i-1}e^{-\beta_iy}\,dy\right]
= \left(\frac{\beta_i}{r+\beta_i}\right)^{\alpha_i}E\bigl[IG_{\alpha_i}((r+\beta_i)P_i)\bigr].
\]
Thus formula (4.80) becomes
\[
E[e^{-rO_i}] = \frac{E\bigl[e^{-rP_i}\bigl(1-IG_{\alpha_i}(\beta_iP_i)\bigr)\bigr]}{1-\bigl(\beta_i/(r+\beta_i)\bigr)^{\alpha_i}E\bigl[e^{-rZ_i}\bigr]E\bigl[IG_{\alpha_i}((r+\beta_i)P_i)\bigr]}.
\]

Example 4.6. If $\alpha_i$ in Example 4.5 is an integer, then $Y_i$ has an Erlang distribution. In such a case the cumulative distribution function can be expressed in an analytic form by partially integrating the incomplete gamma function to obtain
\[
IG_{\alpha_i}(x) = 1-e^{-x}\sum_{j=0}^{\alpha_i-1}\frac{1}{j!}x^j \quad\text{for } x\ge0.
\]
Substituting this expression into the result of Example 4.5, we get
\[
E[e^{-rO_i}] = \frac{1}{1-B}\sum_{j=0}^{\alpha_i-1}\frac{1}{j!}\beta_i^jE\bigl[P_i^je^{-(r+\beta_i)P_i}\bigr],
\]
where
\[
B = \left(\frac{\beta_i}{r+\beta_i}\right)^{\alpha_i}E\bigl[e^{-rZ_i}\bigr]\left(1-\sum_{j=0}^{\alpha_i-1}\frac{1}{j!}(r+\beta_i)^jE\bigl[P_i^je^{-(r+\beta_i)P_i}\bigr]\right).
\]

Optimal Static Policy

With the results in Theorem 4.10, it is easy to derive the following results for the optimal static policy to maximize the expected discounted reward $EDR(\pi)$.

Theorem 4.11. The optimal sequence $\pi^*$ that maximizes the expected discounted reward $EDR(\pi)$ in (4.78) follows nonincreasing order of $w_if_i/(1-f_i)$, where $f_i = E[e^{-rO_i}]$ is given by (4.79) in the case of identical processing times, or by (4.80) in the case of independent processing times.

Proof. The completion time of job $i$ can be expressed by
\[
C_i = C_i(\pi) = \sum_{k\in B_i(\pi)}O_k. \tag{4.85}
\]
Hence by the independence between jobs,
\[
E[e^{-rC_i}] = E\left[\prod_{k\in B_i}e^{-rO_k}\right] = \prod_{k\in B_i}E[e^{-rO_k}] = \prod_{k\in B_i}f_k.
\]
It follows that
\[
EDR(\pi) = \sum_{i=1}^n w_iE[e^{-rC_i}] = \sum_{i=1}^n w_i\prod_{k\in B_i}f_k. \tag{4.86}
\]
Given any sequence $\pi = \{\dots,i,j,\dots\}$, take $\pi' = \{\dots,j,i,\dots\}$ to be the sequence with the same order as $\pi$ except that the order of jobs $i$ and $j$ is interchanged. Let $B^*(\pi) = B_i(\pi)-\{i\} = B_j(\pi')-\{j\}$ denote the set of jobs sequenced before job $i$ under $\pi$ (or before job $j$ under $\pi'$). Then by (4.86) and an argument similar to the proof of Theorem 3.8, it is easy to show that
\[
EDR(\pi)-EDR(\pi') = (1-f_i)(1-f_j)\left(\frac{w_if_i}{1-f_i}-\frac{w_jf_j}{1-f_j}\right)\prod_{k\in B^*(\pi)}f_k.
\]
Thus to maximize $EDR(\pi)$, the optimal $\pi^*$ should sequence job $i$ ahead of job $j$ if and only if $w_if_i/(1-f_i) \ge w_jf_j/(1-f_j)$; that is, $\pi^*$ is in nonincreasing order of $w_if_i/(1-f_i)$. Theorem 4.11 then follows from Theorem 4.10.
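Once the $f_i = E[e^{-rO_i}]$ have been computed (e.g., via Theorem 4.10 or the sketch after Example 4.4), applying Theorem 4.11 is a one-line sort. A minimal sketch with hypothetical rewards and Laplace-transform values:

```python
def edr_optimal_sequence(jobs):
    """Order jobs in nonincreasing w_i * f_i / (1 - f_i), with f_i = E[e^{-r O_i}] supplied."""
    return sorted(jobs,
                  key=lambda name: jobs[name]["w"] * jobs[name]["f"] / (1 - jobs[name]["f"]),
                  reverse=True)

# Hypothetical values, not derived from any example in the text.
jobs = {"J1": dict(w=5.0, f=0.62), "J2": dict(w=2.0, f=0.80), "J3": dict(w=4.0, f=0.45)}
print(edr_optimal_sequence(jobs))
```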

Equivalent Scheduling Problems

In addition to the interpretation of the discounted value of money, the EDR function is also equivalent to some other important scheduling problems, including the following two cases that involve due dates associated with the jobs. Hence Theorem 4.11 can solve the scheduling problems in such cases as well.

Case 1. Expected weighted number of tardy jobs (EWNT)
Suppose that each job $i$ is subject to a due date $D_i$, $i = 1,\dots,n$, where $D_1,\dots,D_n$ are randomly drawn from an exponential distribution with mean $1/\delta$, independent of $\{P_{ik}\}$ and $\{Y_{ik},Z_{ik}\}$. The weighted number of tardy jobs can be expressed as $\sum_{i=1}^n w_iI\{C_i(\pi)>D_i\}$. It is easy to show that
\[
E\bigl[I\{C_i(\pi)>D_i\}\bigr] = \Pr(C_i(\pi)>D_i) = 1-\Pr(D_i\ge C_i(\pi)) = 1-E\bigl[e^{-\delta C_i(\pi)}\bigr].
\]

Hence by letting $r = \delta$, the problem of maximizing EDR becomes equivalent to minimizing the expected weighted number of tardy jobs
\[
EWNT(\pi) = E\left[\sum_{i=1}^n w_iI\{C_i(\pi)>D_i\}\right] = \sum_{i=1}^n w_iE\bigl[1-e^{-\delta C_i}\bigr]. \tag{4.87}
\]


Note that a due date $D_i$ may represent the timing of some uncertain event that is important to the scheduling decisions. One such example is the service problem with a number of customers. While customer $i$ is waiting for service, he may leave at some time point $D_i$. In such a scenario, the potential profit $w_i$ that may be earned from serving this customer is lost. Another example is the arrival time of a transporter to deliver the finished jobs, which may be uncertain to the manufacturer; see, e.g., the problem in Case 2 below. As is well known, exponential distributions are commonly used to model random events with a high level of uncertainty.

According to Theorem 4.11, the optimal sequence to minimize the expected weighted number of tardy jobs $EWNT(\pi)$ in (4.87) is in nonincreasing order of $w_if_i/(1-f_i)$, where $f_i = E[e^{-\delta O_i}]$ is given by (4.79) or (4.80), with $r = \delta$, in the cases of identical or independent processing times, respectively.

Case 2. Scheduling jobs with delayed delivery
Suppose that there is a common and exponentially distributed due date $D$, which represents the arrival time of a transporter to deliver the completed jobs. In addition, assume that each completed job $i$ has a delaying time $B_i$ before it is ready for delivery, so that job $i$ can only be delivered on or after $C_i+B_i$. This delaying time $B_i$ may represent, e.g., packaging time, cooling time, or other extra handling time. We allow $B_i$ to be either deterministic or stochastic with arbitrary probability distributions (but independent of job processing and machine breakdowns). In such a scenario, job $i$ can be picked up by the transporter if and only if $C_i+B_i\le D$. If the job misses the transporter, it will have to be delivered by an alternative, much more expensive transportation means, at an extra cost $w_i$. Consider the problem of minimizing the expected weighted probability of missing the transporter:
\[
EWP(\pi) = \sum_{i=1}^n w_i\Pr\bigl(C_i(\pi)+B_i>D\bigr). \tag{4.88}
\]
Since
\[
\Pr(C_i+B_i>D) = 1-\Pr(D>C_i+B_i) = 1-E\bigl[e^{-\delta(C_i+B_i)}\bigr] = 1-E\bigl[e^{-\delta B_i}\bigr]E\bigl[e^{-\delta C_i}\bigr] = 1-b_iE\bigl[e^{-\delta C_i}\bigr],
\]
where $b_i = E[e^{-\delta B_i}]$, the problem is equivalent to maximizing $E[\sum_{i=1}^n w_i'e^{-\delta C_i(\pi)}]$ with $w_i' = w_ib_i$. Consequently, the problem of minimizing $EWP(\pi)$ in (4.88) with delayed delivery is also a special case of the problem formulated by (4.78).

According to Theorem 4.11, when jobs are subject to a common exponential due date $D$ with mean $1/\delta$ and delayed delivery times $B_i$, the optimal sequence that minimizes the total weighted probability of missing the transporter, the $EWP(\pi)$ in (4.88), is in nonincreasing order of $w_ib_if_i/(1-f_i)$, where $f_i = E[e^{-\delta O_i}]$ is given by (4.79) or (4.80) in the cases of identical or independent processing times, respectively, with $r = \delta$ and $b_i = E[e^{-\delta B_i}]$.


Remark 4.7. We noted earlier in this subsection that the problem of minimizing the expected weighted flowtime $EWFT(\pi)$ is equivalent to the limit of the problem of maximizing $EDR(\pi)$ as $r\to0$. According to Theorem 4.11, the optimal sequence $\pi^*$ to maximize $EDR(\pi)$ follows nonincreasing order of $w_if_i/(1-f_i)$. It is easy to see that
\[
\lim_{r\to0}\frac{1-f_i}{rf_i} = \lim_{r\to0}\frac{1}{E[e^{-rO_i}]}\lim_{r\to0}\frac{1-E[e^{-rO_i}]}{r} = \lim_{r\to0}E\left[\frac{1-e^{-rO_i}}{r}\right] = E[O_i].
\]
This shows that, as $r\to0$, the nonincreasing order of $w_if_i/(1-f_i)$ is equivalent to the nondecreasing order of $E[O_i]/w_i$. Thus Theorem 4.9 can be regarded as the limiting case of Theorem 4.11 as $r\to0$.

Extensions and Applications

In many scheduling problems, it is required to evaluate $E[f_i(C_i)]$, where $f_i(\cdot)$ is a given cost function. We have treated the cases of $f_i(x) = w_ix$ (linear) and $f_i(x) = w_ie^{-rx}$ (exponential). Other forms of $f_i(\cdot)$ are generally more difficult, and require the probability distribution of the completion time $C_i$. One approach is via the Laplace transform $E[e^{-rO_i}]$ of the occupying time $O_i$. The distribution of $O_i$ may be obtained by inverting its Laplace transform (which generally requires numerical integration). Then the distribution of $C_i$ may be obtained by convolution, as $C_i$ is a sum of $O_k$ over $k\in B_i(\pi)$.

Theorem 4.10 has provided the formula for $E[e^{-rO_i}]$. The moments of $O_i$ can be calculated from $E[e^{-rO_i}]$ by
\[
E[O_i^k] = (-1)^k\frac{d^k}{dr^k}E[e^{-rO_i}]\Big|_{r=0}, \quad k = 1,2,\dots,
\]
provided that the derivatives exist at $r = 0$.

To illustrate applications of this approach, we present two examples for the case of identical processing times: one is to minimize the expected weighted squared flowtime (EWSFT), and the other is a problem with machine maintenance checkup and repair (MCAR).

The EWSFT problem: This problem is to minimize
\[
EWSFT(\pi) = E\left[\sum_{i=1}^n w_iC_i^2\right] = \sum_{i=1}^n w_iE[C_i^2].
\]
For ease of presentation, we first consider the case with deterministic $P_i$ and $Z_{ik}$. Let the uptimes $Y_{ik}$ be exponentially distributed with mean $1/\beta_i$ and the downtimes $Z_{ik} = z_i$ be deterministic values. Then by (4.79), the Laplace transform of $O_i$ is


\[
E[e^{-rO_i}] = \frac{e^{-rP_i}e^{-\beta_iP_i}}{1-e^{-rz_i}\int_{[0,P_i)}e^{-ry}\beta_ie^{-\beta_iy}\,dy} = \frac{(r+\beta_i)e^{-(r+\beta_i)P_i}}{r+\beta_i+\beta_ie^{-rz_i}\bigl(e^{-(r+\beta_i)P_i}-1\bigr)}.
\]
By differentiating $E[e^{-rO_i}]$ with respect to $r$, it is not difficult (though a bit tedious) to calculate
\[
E[O_i] = -\frac{d}{dr}E[e^{-rO_i}]\Big|_{r=0} = \frac{1+\beta_iz_i}{\beta_i}\bigl(e^{\beta_iP_i}-1\bigr), \tag{4.89}
\]
\[
\begin{aligned}
E[O_i^2] &= \frac{d^2}{dr^2}E[e^{-rO_i}]\Big|_{r=0}
= \frac{2(1+\beta_iz_i)^2}{\beta_i^2}\bigl(e^{\beta_iP_i}-1\bigr)^2+z_i^2\bigl(e^{\beta_iP_i}-1\bigr)\\
&\quad+\frac{2(1+\beta_iz_i)(1-\beta_iP_i)}{\beta_i^2}\bigl(e^{\beta_iP_i}-1\bigr)-2P_i\,\frac{1+\beta_iz_i}{\beta_i}. \tag{4.90}
\end{aligned}
\]

Then $E[C_i^2]$ can be calculated by expanding the square of $C_i = \sum_{k\in B_i(\pi)}O_k$ and using the results in (4.89) and (4.90). If $Z_{ik}$ and $P_i$ are random variables, we only need to take expectations with respect to $z_i = Z_{i1}$ and $P_i$ in (4.89) and (4.90).

As long as $E[C_i^2]$, and consequently $E[\sum w_iC_i^2]$, can be computed for any given sequence $\pi$, one can use branch-and-bound methods, or general-purpose approaches that require only information on the objective function values (such as genetic algorithms or simulated annealing), to solve the problem. In particular, the results obtained above can be used to establish conditions under which an analytical solution is available. One example is given in the proposition below.

Proposition 4.3. If the following agreeable conditions are satisfied:
\[
E[O_i]\le E[O_k] \iff E[O_i^2]\le E[O_k^2] \iff w_i\ge w_k, \tag{4.91}
\]
where $E[O_i]$ and $E[O_i^2]$ are given by (4.89) and (4.90), then the sequence in nondecreasing order of $E[O_i]$ is optimal to minimize $EWSFT(\pi)$.

Proof. Let $\pi = \{\cdots,i,k,\cdots\}$, $\pi' = \{\cdots,k,i,\cdots\}$, and $C^* = C_i(\pi)-O_i = C_k(\pi')-O_k$. If the inequalities in (4.91) hold, then
\[
\begin{aligned}
EWSFT(\pi)-EWSFT(\pi') &= w_iE[(C^*+O_i)^2]+w_kE[(C^*+O_i+O_k)^2]-w_kE[(C^*+O_k)^2]-w_iE[(C^*+O_k+O_i)^2]\\
&= w_i\bigl\{E[(C^*+O_i)^2]-E[(C^*+O_k)^2]\bigr\}+(w_i-w_k)\bigl\{E[(C^*+O_k)^2]-E[(C^*+O_i+O_k)^2]\bigr\}\\
&\le w_i\bigl\{2E[C^*]\bigl(E[O_i]-E[O_k]\bigr)+E[O_i^2]-E[O_k^2]\bigr\}\le 0.
\end{aligned}
\]
Thus $E[O_i]\le E[O_k]$ implies $EWSFT(\pi)\le EWSFT(\pi')$.
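To make the EWSFT computation concrete, the following sketch (hypothetical job data; the helper names are ours) evaluates $E[O_i]$ and $E[O_i^2]$ from (4.89)–(4.90) and then $E[\sum_i w_iC_i^2]$ for a given sequence, using the independence of the occupying times so that $E[C_i^2] = \mathrm{Var}(C_i) + (E[C_i])^2$:

```python
import math

def moments_O(beta, z, P):
    """E[O_i] and E[O_i^2] from (4.89)-(4.90): exponential uptimes with rate beta,
    deterministic downtime z and deterministic processing time P."""
    e = math.exp(beta * P) - 1
    m1 = (1 + beta * z) / beta * e
    m2 = (2 * (1 + beta * z) ** 2 / beta ** 2 * e ** 2
          + z ** 2 * e
          + 2 * (1 + beta * z) * (1 - beta * P) / beta ** 2 * e
          - 2 * P * (1 + beta * z) / beta)
    return m1, m2

def ewsft(sequence, jobs):
    """E[sum w_i C_i^2] for a given sequence: the O_k are independent, so
    E[C_i^2] = sum of Var(O_k) over k in B_i(pi) plus (sum of E[O_k])^2."""
    total = mean_C = var_C = 0.0
    for name in sequence:
        m1, m2 = moments_O(*jobs[name]["params"])
        mean_C += m1
        var_C += m2 - m1 ** 2
        total += jobs[name]["w"] * (var_C + mean_C ** 2)
    return total

# Hypothetical jobs: params = (beta_i, z_i, P_i).
jobs = {"A": dict(w=1.0, params=(0.4, 0.3, 1.0)), "B": dict(w=2.0, params=(0.6, 0.2, 0.8))}
print(ewsft(["A", "B"], jobs), ewsft(["B", "A"], jobs))
```

Comparing the objective value over all (or sampled) sequences in this way is exactly the kind of enumeration that branch-and-bound or metaheuristic approaches would organize more efficiently.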

The MCAR problem: This problem is similar, but not identical, to Example 4.3. For each job $i$, a deterministic time of $P_i$ is required to process it. Normally, the machine is scheduled to be shut down for a regular maintenance checkup and repair after completing each job. There is, however, a possibility that an early check/repair may be necessary before the current job is completed, which is indicated by a monitoring system. If that occurs, the job will have to start over again after the check/repair. Both the probability and the timing of the early check/repair are job dependent, due to the different impacts/burdens on the machine created by the jobs being processed. Other than these checks/repairs, the machine works continuously.

In this case, the check/repair is considered as a breakdown, which causes a total loss of work (preemptive-repeat). When processing job $i$, there is a probability $\theta_i$ ($0 < \theta_i < 1$) that the machine will require an early check/repair before the job is completed. Denote by $X_{ik}$ the timing of such an early check/repair, counted from the beginning of the $k$-th uptime while processing job $i$. Then $0 < X_{ik} < P_i$, $k = 1,2,\dots$, which are assumed to be i.i.d. random variables. The check/repair times, on the other hand, are equal to a fixed but job-dependent value $z_i$, which constitutes the downtimes.

Under these settings, conditional on $X_{ik} = x$, the uptime $Y_{ik}$ in processing job $i$ has two masses, at $x$ and $P_i$, with probabilities $\theta_i$ and $1-\theta_i$ respectively. Hence, letting $H_i(x)$ denote the cdf of $X_{ik}$, the (unconditional) cdf of $Y_{ik}$ is given by
\[
F_i(y) = \int_0^{P_i}\Pr(Y_{ik}\le y\mid X_{ik}=x)\,dH_i(x) =
\begin{cases}
0 & \text{if } y<0,\\
\theta_iH_i(y) & \text{if } 0\le y<P_i,\\
1 & \text{if } y\ge P_i.
\end{cases}
\]

The Laplace transform of $O_i$ can be calculated, by (4.79), as
\[
E[e^{-rO_i}] = \frac{e^{-rP_i}(1-\theta_i)}{1-\theta_iE[e^{-rX_{i1}}]e^{-rz_i}} = \sum_{m=0}^{\infty}\theta_i^m(1-\theta_i)e^{-r(P_i+mz_i)}E^m[e^{-rX_{i1}}]. \tag{4.92}
\]

Since $X_{ik}$, $k = 1,2,\dots$, are i.i.d., by the inverse Laplace transform we can obtain the distribution of $O_i$ as follows. If $X_{i1}$ is a continuous random variable with density $h_i(x)$, then $O_i$ has a density
\[
f_{O_i}(x) = \sum_{m=0}^{\infty}\theta_i^m(1-\theta_i)h_i^{*m}(x-P_i-mz_i), \tag{4.93}
\]
where $h_i^{*m}(\cdot)$ is the $m$-fold convolution of $h_i(\cdot)$, i.e., the density of $X_{i1}+\cdots+X_{im}$. Similarly, if $X_{i1}$ is discrete with probability mass function $h_i(x) = \Pr(X_{i1}=x)$, then
\[
\Pr(O_i=x) = \sum_{m=0}^{\infty}\theta_i^m(1-\theta_i)h_i^{*m}(x-P_i-mz_i). \tag{4.94}
\]
The distribution given by (4.94) has masses at the points $P_i+mz_i+x_1+\cdots+x_m$ ($m = 0,1,2,\dots$), where $x_1,\dots,x_m$ are drawn with replacement from the masses of $X_{i1}$ (hence some or all of $x_1,\dots,x_m$ may coincide).

More explicit forms for the distribution of $O_i$ may be available with a specific distribution of $X_{i1}$. For example, if $X_{i1} = x_i$ is a deterministic value, then $E^m[e^{-rX_{i1}}] = e^{-rmx_i}$. Hence by (4.92), $E[e^{-rO_i}] = \sum_{m=0}^{\infty}\theta_i^m(1-\theta_i)e^{-r(P_i+mx_i+mz_i)}$, which shows that $O_i$ has masses at $P_i+mx_i+mz_i$ with probabilities $\theta_i^m(1-\theta_i)$, $m = 0,1,2,\dots$. This result is intuitive, as $P_i+mx_i+mz_i$ is the occupying time of job $i$ given that a total of $m$ breakdowns occurred before the job is completed, while $\theta_i^m(1-\theta_i)$ is the probability that the $(m+1)$-th uptime is the first one without interruption by an early check/repair.
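This point-mass description can be verified directly against (4.92). A small sketch (all parameter values are made up; the function name is ours) compares the closed form with a truncated sum over the masses $P_i+m(x_i+z_i)$; the two printed numbers should agree:

```python
import math

def laplace_O_mcar(r, P, x, z, theta, tol=1e-12):
    """E[e^{-r O_i}] for the MCAR model with deterministic early-check time x:
    closed form from (4.92) versus a direct sum over the masses P + m*(x + z)."""
    closed = math.exp(-r * P) * (1 - theta) / (1 - theta * math.exp(-r * (x + z)))
    total, m, prob = 0.0, 0, 1 - theta
    while prob > tol:
        total += prob * math.exp(-r * (P + m * (x + z)))
        m += 1
        prob = theta**m * (1 - theta)
    return closed, total

# Hypothetical values: P_i = 2.0, early check at x_i = 0.7, repair z_i = 0.3, theta_i = 0.2.
print(laplace_O_mcar(r=0.1, P=2.0, x=0.7, z=0.3, theta=0.2))
```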

As another example, let $X_{i1}$ be normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, where $\mu_i$ and $\sigma_i^2$ are such that $\Pr(0<X_{i1}<P_i) = 1$ can be considered to hold for practical purposes (although theoretically it never holds exactly). Then
\[
E^m[e^{-rX_{i1}}] = \bigl(e^{-r\mu_i+\frac{1}{2}r^2\sigma_i^2}\bigr)^m = e^{-rm\mu_i+\frac{1}{2}r^2m\sigma_i^2},
\]
which is the Laplace transform of a normal distribution with mean $m\mu_i$ and variance $m\sigma_i^2$. Hence its corresponding density is $h^{*m}(x) = \varphi\bigl((x-m\mu_i)/(\sqrt{m}\sigma_i)\bigr)$, where $\varphi(\cdot)$ is the density of the standard normal distribution. It then follows from (4.93) that $O_i$ has a density function
\[
f_{O_i}(x) = \sum_{m=0}^{\infty}\theta_i^m(1-\theta_i)\varphi\left(\frac{x-P_i-m\mu_i-mz_i}{\sqrt{m}\sigma_i}\right).
\]

We observe, however, that the distribution of $O_i$ is quite complicated even in the simplified situations where its explicit expression is available, such as the density $f_{O_i}(x)$ shown above, and hence the convolution required for the distribution of $C_i$ is hardly obtainable in closed form. Nevertheless, the inverse Laplace transform and convolution can be computed by numerical methods for practical purposes, and Theorem 4.10 provides a theoretical basis to enable such numerical methods to be implemented.

4.4 Partial-Loss Breakdown Models

In a partial-loss breakdown model, when the machine breaks down, the processing achievement is partially preserved, and the job will be completed when the remaining processing requirement (equal to the processing time less the preserved achievement) is fulfilled after the machine is fixed.

As introduced in Sect. 4.1, under the partial-loss model, if the processing of job $i$ is interrupted by the $k$-th breakdown at time $t_{ik} = Y_{ik}+\sum_{j=1}^{k-1}(Y_{ij}+Z_{ij})$ (from the start of processing job $i$), then the processing achievement of job $i$ at time $t_{ik}$ is $Y_{ik}+v_{i,k-1} < P_{ik}$. When the $k$-th breakdown ends at time $t_{ik}+Z_{ik} = \sum_{j=1}^{k}(Y_{ij}+Z_{ij})$, the processing achievement of job $i$ makes a transition to a new achievement $v_{ik}$, where $0\le v_{ik}\le Y_{ik}+v_{i,k-1}$. The two extreme cases $v_{ik} = 0$ and $v_{ik} = Y_{ik}+v_{i,k-1}$ correspond to the cases of total-loss and no-loss breakdowns, respectively, whereas $0 < v_{ik} < Y_{ik}+v_{i,k-1}$ means a partial loss of work after a breakdown.


In a practical sense, the transition of the processing achievement from $Y_{ik}+v_{i,k-1}$ to $v_{ik}$ can be made at any time between $t_{ik}$ and $t_{ik}+Z_{ik}$ (the start and end of the $k$-th breakdown), which makes no difference to the effect of the transition. For the sake of convenience, however, we will consider the transition to be made at time $t_{ik}+Z_{ik}$ (when the machine resumes processing job $i$ after the $k$-th breakdown).

To account for the uncertainty in the partial loss of work at a breakdown, we use a probability distribution to model the transition between the values of the processing achievement before and after a breakdown. More specifically, let $U_i$ and $V_i$ denote the random variables representing the processing achievements of job $i$ before and after a breakdown, respectively. Then the transition from $U_i$ to $V_i$ is governed by a conditional cdf $Q_i(\cdot,u)$ such that
\[
Q_i(v,u) = \Pr(V_i\le v\mid U_i=u). \tag{4.95}
\]
In other words, $Q_i(v,u)$ is the conditional cdf of $V_i$ given $U_i = u$. The use of $Q_i(v,u)$ may model any type of partial and uncertain work loss due to machine breakdowns. Special cases include:

1. Uniform loss: $Q_i(v,u) = v/u$ ($0\le v\le u$). This means that the loss is uniformly distributed between total loss ($v = 0$) and no loss ($v = u$).
2. No loss: $Q_i(v,u) = I\{v\ge u\}$. This conditional distribution is degenerate at $u$, with $\Pr(V_i=u\mid U_i=u) = 1$.
3. Total loss: $Q_i(v,u) = I\{v\ge 0\}$, which degenerates at $0$, with $\Pr(V_i=0\mid U_i=u) = 1$.
4. Fixed partial loss: $Q_i(v,u) = I\{v\ge(u-a)^+\}$, where $(u-a)^+ = \max(u-a,0)$ with a fixed value $a > 0$. In other words, a breakdown reduces the achievement $u$ by $a$ (a loss of $a$) if $u\ge a$, or to zero (total loss) if $u < a$.
5. Proportional loss: $Q_i(v,u) = I\{v\ge\alpha u\}$, which means $\Pr(V_i=\alpha u\mid U_i=u) = 1$, where $0 < \alpha < 1$. In other words, a breakdown causes a loss $(1-\alpha)u$ that is proportional to the achievement $u$ before the breakdown.

In the first case above, given the achievement $u$ immediately before a breakdown, it is uncertain what the achievement $v$ will be when the machine works again; it can be any amount between $0$ and $u$. Hence the loss $u-v$ is partial and uncertain (equally likely between $0$ and $u$). This example illustrates the modeling of partial and uncertain losses. Generally, any non-degenerate distribution $Q_i(\cdot,u)$ models such uncertainty, and an appropriate distribution function can be used to formulate the actual problem being considered. The other four, on the other hand, are special cases with a certain loss given $u$: zero in Case 2, $u$ in Case 3, $\min(u,a)$ in Case 4, and $(1-\alpha)u$ in Case 5. Cases 2 and 3 model the traditional preemptive-resume (no-loss) and preemptive-repeat (total-loss) problems, respectively, whereas Cases 4 and 5 are examples of partial loss.

Let $P_i$ denote the amount of time required to complete job $i$ on a reliable machine that never breaks down. Then the initial processing time $P_{i1}$ has the same distribution as that of $P_i$. If the processing of job $i$ is interrupted by the first breakdown, however, then the distribution of the next processing time $P_{i2}$ may depend on the processing achievement $v_{i1}$ and the first uptime $Y_{i1}$. More generally, after the processing of job $i$ is interrupted by the $k$-th breakdown, the distribution of the next processing time $P_{i,k+1}$ may depend on the previous $v_{i1},\dots,v_{ik}$ and $Y_{i1},\dots,Y_{ik}$. We will consider two scenarios for the distribution of $P_{i,k+1}$:

(i) The distribution of $P_{i,k+1}$ depends only on the current achievement $v_{ik}$, with
\[
\Pr(P_{i,k+1}-v>x\mid v_{ik}=v) = \Pr(P_i>x+v\mid P_i>v) = \frac{\Pr(P_i>x+v)}{\Pr(P_i>v)}. \tag{4.96}
\]
(ii) The distribution of $P_{i,k+1}$ depends on the current achievement $v_{ik}$ as well as the maximum achievement $h_{ik} = \max\{Y_{i1},v_{i1}+Y_{i2},\dots,v_{i,k-1}+Y_{ik}\}$ in the processing history. In this case, we assume
\[
\Pr(P_{i,k+1}-v>x\mid v_{ik}=v,h_{ik}=h) = \Pr(P_i>x+v\mid P_i>h) = \frac{\Pr(P_i>\max(x+v,h))}{\Pr(P_i>h)}. \tag{4.97}
\]
Note that (4.97) only specifies the distributional structure for $P_{i,k+1}$, and makes no assumption on the relationship between $P_{i1},P_{i2},\dots$. The assumption of identical processing times implies (4.97) naturally, but it is not necessary for (4.97). Moreover, (4.96) can be considered as a special case of (4.97) with $h_{ik} = v_{ik}$.

The formulae in (4.96) and (4.97) give the distribution of the remaining processing requirement conditional on the current and maximum historic achievements. The no-loss and total-loss models fall into special cases of (4.96) or (4.97) as follows:

1. In the no-loss model, $P_{i1} = P_{i2} = \cdots = P_i$ are identical and $v_{ik} = Y_{i1}+\cdots+Y_{ik}$. Hence $h_{ik} = v_{ik} < P_{ik} = P_i$ and
\[
\Pr(P_{i,k+1}-v>x\mid v_{ik}=v) = \Pr(P_{i,k+1}>x+v\mid P_{ik}>v) = \Pr(P_i>x+v\mid P_i>v),
\]
which coincides with (4.96), or a special case of (4.97) with $h = v$.

2. In the total-loss model with identical processing times, $P_{i1} = P_{i2} = \cdots = P_i$ and $v_{ik} = 0$. Hence $h_{ik} = \max\{Y_{i1},\dots,Y_{ik}\} < P_{ik} = P_i$ and
\[
\Pr(P_{i,k+1}-v>x\mid v_{ik}=v,h_{ik}=h) = \Pr(P_{i,k+1}>x\mid P_{ik}>h) = \Pr(P_i>x\mid P_i>h),
\]
which is a special case of (4.97) with $v = 0$.

3. In the total-loss model with independent processing times, $P_{i1},P_{i2},\dots$ are i.i.d. with the same distribution as $P_i$, $v_{ik} = 0$, and $h_{ik} = \max\{Y_{i1},\dots,Y_{ik}\} < P_{ik}$. Hence
\[
\Pr(P_{i,k+1}-v>x\mid v_{ik}=v,h_{ik}=h) = \Pr(P_{i,k+1}>x\mid P_{ik}>h) = \Pr(P_{i,k+1}>x) = \Pr(P_i>x),
\]
which is a special case of (4.96) with $v = 0$.

Under the above settings for the partial-loss model, the expression for the completion time in (4.85) still holds. Hence the optimal sequences in Theorems 4.9 and 4.11 remain valid. We summarize the results below.


Theorem 4.12. Under the partial-loss machine breakdown model:
(i) The optimal static policy to minimize the expected weighted flowtime $EWFT(\pi) = E[\sum_{i=1}^n w_iC_i(\pi)]$ is to process jobs in nondecreasing order of $E[O_i]/w_i$ with zero idle times.
(ii) The optimal sequence to maximize the expected discounted reward $EDR(\pi)$ in (4.78) follows nonincreasing order of $w_if_i/(1-f_i)$, where $f_i = E[e^{-rO_i}]$.

The calculations of $E[O_i]$ and $E[e^{-rO_i}]$, however, are much more difficult and complex in general under the partial-loss model, and explicit formulae are available only in certain simple cases. As an example, we consider the case of proportional loss as described above, with identical processing times.

In such a case, if $Y_{i1} < P_i$, then the processing of job $i$ is interrupted at time $Y_{i1}$ by the first breakdown, with processing achievement $Y_{i1}$. Then, at time $Y_{i1}+Z_{i1}$ when the machine works again, the achievement is reduced to $v_{i1} = \alpha Y_{i1}$, where $0 < \alpha < 1$, and the maximum historic achievement is $h_{i1} = Y_{i1}$. If $v_{i1}+Y_{i2} < P_i$, then job $i$ is interrupted again by the second breakdown at time $Y_{i1}+Z_{i1}+Y_{i2}$, with processing achievement $v_{i1}+Y_{i2} = \alpha Y_{i1}+Y_{i2}$ and maximum historic achievement $h_{i2} = \max\{Y_{i1},\alpha Y_{i1}+Y_{i2}\}$. Then at time $Y_{i1}+Z_{i1}+Y_{i2}+Z_{i2}$, the achievement is reduced to $v_{i2} = \alpha(\alpha Y_{i1}+Y_{i2}) = \alpha^2Y_{i1}+\alpha Y_{i2}$.

More generally, if job $i$ is unfinished at time $t_{ik} = \sum_{j=1}^{k}(Y_{ij}+Z_{ij})$ (interrupted by $k$ breakdowns), then at time $t_{ik}$ the processing achievement and the maximum historical achievement of job $i$ are, respectively,
\[
v_{ik} = \sum_{j=1}^{k}\alpha^{k+1-j}Y_{ij} \quad\text{and}\quad h_{ik} = \max_{1\le j\le k}(v_{i,j-1}+Y_{ij}) \quad(\text{with } v_{i0} = 0).
\]
Job $i$ will be completed at time $P_i-v_{ik}+t_{ik}$ (i.e., after being interrupted by exactly $k$ breakdowns) if and only if $h_{ik} < P_i$ and $v_{ik}+Y_{i,k+1}\ge P_i$, or equivalently,
\[
v_{i,j-1}+Y_{ij} < P_i,\ j = 1,\dots,k, \quad\text{and}\quad v_{ik}+Y_{i,k+1}\ge P_i. \tag{4.98}
\]

Define a counting process $\{N_i(t) : t\ge0\}$ by
\[
N_i(t) = \sup\{k\ge0 : h_{ik}<t\} = \sup\{k\ge0 : v_{i,j-1}+Y_{ij}<t,\ 1\le j\le k\}. \tag{4.99}
\]
Then (4.98) holds if and only if $N_i(P_i) = k$. Thus the occupying time of job $i$ can be expressed by
\[
\begin{aligned}
O_i &= P_i-v_{i,N_i(P_i)}+t_{i,N_i(P_i)} = P_i-\sum_{j=1}^{N_i(P_i)}\alpha^{N_i(P_i)+1-j}Y_{ij}+\sum_{j=1}^{N_i(P_i)}(Y_{ij}+Z_{ij})\\
&= P_i+\sum_{j=1}^{N_i(P_i)}\Bigl\{\bigl(1-\alpha^{N_i(P_i)+1-j}\bigr)Y_{ij}+Z_{ij}\Bigr\}. \tag{4.100}
\end{aligned}
\]


Let $F_i(t)$ denote the cdf of $Y_{ij}$. Then by (4.99), the joint distribution of $(Y_{i1},\dots,Y_{ik},N_i(t))$ is given by
\[
\begin{aligned}
&\Pr(Y_{i1}\le y_1,\dots,Y_{ik}\le y_k,\,N_i(t)=l)\\
&\quad= \Pr\{Y_{i1}\le y_1,\dots,Y_{ik}\le y_k,\ v_{i,j-1}+Y_{ij}<t,\ j=1,\dots,l;\ v_{il}+Y_{i,l+1}\ge t\}\\
&\quad= \begin{cases}
\displaystyle F_i(y_{l+2})\cdots F_i(y_k)\int_{A_{l+1}\cap B_l(t)}dF_i(t_1)\cdots dF_i(t_{l+1}) & \text{if } l<k-1,\\[1.5ex]
\displaystyle \int_{A_k\cap B_l(t)}dF_i(t_1)\cdots dF_i(t_{l+1}) & \text{if } l\ge k-1,
\end{cases}
\end{aligned} \tag{4.101}
\]
where $A_l = \{(t_1,\dots,t_l) : t_1\le y_1,\dots,t_l\le y_l\}$ and
\[
B_l(t) = \left\{(t_1,\dots,t_{l+1}) :
\begin{array}{l}
t_1<t,\ \alpha t_1+t_2<t,\ \alpha^2t_1+\alpha t_2+t_3<t,\ \dots,\\
\alpha^{l-1}t_1+\alpha^{l-2}t_2+\cdots+t_l<t,\ \alpha^lt_1+\alpha^{l-1}t_2+\cdots+\alpha t_l+t_{l+1}\ge t
\end{array}
\right\}. \tag{4.102}
\]
The marginal distribution of $N_i(t)$ is given by
\[
\Pr(N_i(t)=l) = \int_{B_l(t)}dF_i(t_1)\cdots dF_i(t_{l+1}). \tag{4.103}
\]

It follows from (4.100) that the expected occupying time $E[O_i]$ can be calculated by
\[
\begin{aligned}
E[O_i\mid P_i=t] &= E\left[P_i+\sum_{j=1}^{N_i(P_i)}\Bigl\{\bigl(1-\alpha^{N_i(P_i)+1-j}\bigr)Y_{ij}+Z_{ij}\Bigr\}\,\Big|\,P_i=t\right]
= E\left[t+\sum_{j=1}^{N_i(t)}\bigl(1-\alpha^{N_i(t)+1-j}\bigr)Y_{ij}+\sum_{j=1}^{N_i(t)}Z_{ij}\right]\\
&= t+\sum_{l=1}^{\infty}\sum_{j=1}^{l}\beta_{lj}E[Y_{ij}\mid N_i(t)=l]\Pr(N_i(t)=l)+E[Z_{i1}]E[N_i(t)] \tag{4.104}
\end{aligned}
\]
using the joint distribution of $(Y_{i1},\dots,Y_{ik},N_i(t))$ given by (4.101) and $\Pr(N_i(t)=l)$ in (4.103), where $\beta_{lj} = 1-\alpha^{l+1-j}$, and then
\[
E[O_i] = E[E[O_i\mid P_i]] = \int_0^{\infty}E[O_i\mid P_i=t]\,d\Pr(P_i\le t)
\]

using the distribution Pr(Pi ≤ t) of the processing time Pi.Following a similar procedure, we can calculate the Laplace transform E[e−rOi ]

of the occupying time by

Page 194: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

184 4 Stochastic Machine Breakdowns

E[e−rOi |Pi = t] = E

[exp− r(

t +Ni(t)

∑j=1

(1−αNi(t)+1− j

)Yi j +

Ni(t)

∑j=1

Zi j

)]

= e−rt∞

∑l=1

E

[exp− r

l

∑j=1

(βl jYi j +Zi j)

∣∣∣∣Ni(t) = l

]Pr(Ni(t) = l)

= e−rt∞

∑l=1

E[e−r(βl1Yi1+···+βllYil )|Ni(t) = l

](E[e−rZi1

])lPr(Ni(t) = l)

(4.105)

using (4.101), (4.103), and the distribution of Zi1, and then E[e−rOi ] = E[E[e−rOi |Pi]]using the distribution of Pi.

While the integrals in (4.101), and hence Eqs. (4.104) and (4.105), are still quitecomplicated due to the interrelationship between t1, t2, . . . , tl+1 in the region Bl(t)defined by (4.102), they can be carried out by standard techniques of calculus.

Alternatively, E[O_i] and E[e^{−rO_i}] can be calculated by simulation. For example, we can first generate values of P_i, Y_{ij} and Z_{ij} according to their distributions, and then calculate O_i from these values via (4.100). Repeating this procedure many times under the same condition generates replicates of O_i. The average of these replicates gives an approximate value of E[O_i], which is accurate when the number of replicates is large, by the law of large numbers. Similarly, the average of the replicates of e^{−rO_i} produces a good approximation of E[e^{−rO_i}].
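To make this concrete, here is a minimal Python sketch that simulates O_i directly from the partial-loss dynamics above (equivalently, via (4.100)) and averages the replicates. The exponential distributions for P_i, Y_{ij}, Z_{ij} and the values of α and r are illustrative assumptions only, not prescribed by the model.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def occupying_time(p, alpha, uptime, downtime):
    """Simulate one realization of the occupying time O_i under partial loss.

    p        : realized processing requirement P_i
    alpha    : fraction of achievement retained after a breakdown (0 < alpha < 1)
    uptime   : callable returning one uptime Y_ij
    downtime : callable returning one repair time Z_ij
    """
    t = 0.0   # elapsed (occupying) time so far
    v = 0.0   # current processing achievement v_ik
    while True:
        y = uptime()
        if v + y >= p:            # the job finishes before the next breakdown
            return t + (p - v)
        z = downtime()
        t += y + z                # uptime plus repair time elapse
        v = alpha * (v + y)       # partial loss: achievement shrinks to alpha*(v + Y)

# Illustrative (assumed) distributions: P_i ~ Exp(mean 1), Y_ij ~ Exp(mean 0.5),
# Z_ij ~ Exp(mean 0.2), alpha = 0.7, discount rate r = 0.1.
alpha, r, n_rep = 0.7, 0.1, 100_000
O = np.array([occupying_time(rng.exponential(1.0), alpha,
                             lambda: rng.exponential(0.5),
                             lambda: rng.exponential(0.2))
              for _ in range(n_rep)])
print("E[O_i]         ~", O.mean())               # approximates (4.104) integrated over P_i
print("E[exp(-r O_i)] ~", np.exp(-r * O).mean())  # approximates the Laplace transform (4.105)
```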

In the special case of the no-loss model, α = 1 and the region in (4.102) reduces to B_l(t) = {t_1 + ··· + t_l < t ≤ t_1 + ··· + t_{l+1}}. Moreover, β_{lj} = 1 − α^{l+1−j} = 0 for j = 1, ..., l, so that (4.104) becomes
$$E[O_i\,|\,P_i=t] = t + E[Z_{i1}]\,E[N_i(t)], \qquad (4.106)$$
and by (4.103),
$$\Pr(N_i(t)=l) = \int_{t_1+\cdots+t_l<t\le t_1+\cdots+t_{l+1}} dF_i(t_1)\cdots dF_i(t_{l+1})$$
$$= \Pr(Y_{i1}+\cdots+Y_{il} < t \le Y_{i1}+\cdots+Y_{i,l+1}) = \Pr(S_l < t \le S_{l+1}),$$

where S_l = Y_{i1} + ··· + Y_{il}. Consequently,
$$E[N_i(t)] = \sum_{l=1}^{\infty} l\Pr(N_i(t)=l) = \sum_{l=1}^{\infty} l\Pr(S_l<t\le S_{l+1})$$
$$= \sum_{l=1}^{\infty} l\big[\Pr(S_l<t)-\Pr(S_{l+1}<t)\big] = \sum_{l=1}^{\infty} l\Pr(S_l<t) - \sum_{l=2}^{\infty}(l-1)\Pr(S_l<t)$$
$$= \sum_{l=1}^{\infty}\Pr(S_l<t) = \sum_{l=1}^{\infty} F_i^{(l)}(t-),$$


where F_i^{(l)} is the l-fold convolution of F_i, which is equal to the cdf of S_l. It then follows from (4.106) that
$$E[O_i] = E\big[E[O_i\,|\,P_i]\big] = E\big[P_i + E[Z_{i1}]N_i(P_i)\big] = E[P_i] + E[Z_{i1}]\,E[N_i(P_i)]$$
$$= E[P_i] + E[Z_{i1}]\sum_{l=1}^{\infty} E\big[F_i^{(l)}(P_i-)\big],$$
which is the same formula for E[O_i] as in (4.11) for the no-loss model.

Similarly, in the total-loss model, since α = 0 and the region B_l(t) reduces to {t_1 < t, ..., t_l < t, t_{l+1} ≥ t}, we can obtain from (4.105) the same formulae for E[O_i] as in (4.54) and for E[e^{−rO_i}] as in (4.79).
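As a numerical sanity check of this no-loss reduction (a sketch under assumed distributions, not part of the original derivation): if the uptimes Y_{ij} are exponential with rate λ, then ∑_{l≥1} F_i^{(l)}(t) = λt, so the formula above gives E[O_i] = E[P_i](1 + λ E[Z_{i1}]), which can be compared with a direct simulation of the α = 1 dynamics.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def occupying_time_no_loss(p, lam_up, mean_down):
    """Direct simulation of O_i when alpha = 1 (no work is lost on a breakdown)."""
    t, v = 0.0, 0.0
    while True:
        y = rng.exponential(1 / lam_up)
        if v + y >= p:
            return t + (p - v)
        t += y + rng.exponential(mean_down)
        v += y                       # achievement is fully retained

# Illustrative (assumed) parameters: Y_ij ~ Exp(rate 3), E[Z_i1] = 0.4, P_i ~ Exp(mean 2).
lam_up, mean_down, mean_p, n_rep = 3.0, 0.4, 2.0, 100_000
sim = np.mean([occupying_time_no_loss(rng.exponential(mean_p), lam_up, mean_down)
               for _ in range(n_rep)])
closed_form = mean_p * (1 + lam_up * mean_down)   # E[P_i] + E[Z_i1] * lambda * E[P_i]
print(sim, closed_form)   # the two values should agree up to Monte Carlo error
```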

Remark 4.8. The exposition in this section is mainly based on Cai et al. (2013). The partial-loss model is new, and little has been done on it in the literature. The purpose of this section is to initiate the investigation into this new and important class of machine breakdown models, with problem description, formulation and some new results. Much more research effort is needed to deepen the understanding and to seek optimal solutions for scheduling problems subject to machine breakdowns with partial loss of work.


Chapter 5
Optimal Stopping Problems

This chapter provides the foundations for the general theory of stochastic processes and optimal stopping problems. In Sect. 5.1, we elaborate on the concepts of σ-algebras and information, probability spaces, uniform integrability, conditional expectations, and essential supremum or infimum at an advanced level of probability theory. Stochastic processes and the associated filtrations are then introduced in Sect. 5.2, which intuitively explains the meaning of information flow. Section 5.3 deals with the concept of stopping times, with a focus on the σ-algebras at stopping times. Some straightforward results are also discussed for ease of reference in the subsequent chapters. Section 5.4 provides a concise introduction to the concept and fundamental results of martingales. The emphasis is on Doob's stopping theorem and the convergence theorems of martingales, as well as their applications in studying the path properties of martingales. The materials in this chapter are essential in the context of stochastic controls, especially for the derivation of dynamic policies.

5.1 Preliminaries

This section presents a concise discussion of the classical concept of a measurable space and then equates a σ-algebra to a family of real-valued functions defined on the sample space. This enables us to introduce the advanced concept of conditional expectations by means of geometric intuition, instead of the lengthy treatment found in standard textbooks of probability theory. As a by-product, we explain why and how a σ-algebra can represent information. In addition, we also introduce some other notions as preliminaries.


5.1.1 σ-Algebras and Monotone Class Theorems

Let Ω be the sample space consisting of sample points (outcomes), which stand for the fundamental events in probability theory. First recall the following facts and definitions:

1. A collection F of subsets of Ω is a σ-algebra if it contains the empty set ∅ and is closed under complements and countable unions. The pair (Ω, F) is referred to as a measurable space, and a set A is F-measurable (or simply measurable when F is clearly the σ-algebra under consideration) if A ∈ F.

2. Let C be a subset of 2^Ω. Then the smallest σ-algebra containing C, which is just the intersection of all the σ-algebras containing C, denoted by σ(C), is referred to as the σ-algebra generated by C. When Ω is a topological space equipped with a system of open sets that is closed under finite intersections and arbitrary unions, it is generally associated with the so-called Borel σ-algebra generated by the class of all open sets, written as B(Ω), or simply B in abbreviation. In particular, if Ω = R^n, then B(Ω) can be generated by the class {(−∞, a] : a ∈ R^n}, so that the distribution of a random variable X can be identified by its cumulative distribution function (cdf) F_X(x) = Pr(X ≤ x), x ∈ R. In addition, σ(F_1 ∪ F_2) is usually written as F_1 ∨ F_2 if F_1 and F_2 are both σ-algebras.

3. Let f be a map from Ω into E, where Ω and E are respectively equipped with σ-algebras F and E. The map f is measurable with respect to F and E if f^{-1}(A) = {ω : f(ω) ∈ A} ∈ F for any A ∈ E. Measurable maps from (Ω, F) to (R^n, B(R^n)) are commonly referred to as (n-dimensional) random vectors, or random variables when n = 1.

4. A collection of subsets of Ω is called an algebra if it is closed under all finite set operations. A collection C of subsets of Ω is called a semi-algebra if A, B ∈ C implies A ∩ B ∈ C and A − B is a finite union of subsets in C.

Example 5.1.

(1) The σ-algebra F = {∅, Ω} is called the trivial σ-algebra, and any measurable function from (Ω, F) to (R, B) is a constant map.

(2) F = 2^Ω (the collection of all subsets of Ω) is the largest σ-algebra on Ω. Any map/function is measurable with respect to this F.

(3) Let C ⊂ 2^Ω. Define D_1 = {∅, Ω, A : A ∈ C or A^c ∈ C}. Then
$$S(C) = \Big\{\bigcap_{i=1}^{n} A_i : A_i \in D_1,\ n\ge 1\Big\}$$
is the smallest semi-algebra containing C. Moreover,
$$A(C) = \Big\{\sum_{i\in I} S_i : \{S_i : i\in I\}\ \text{is a disjoint finite class of}\ S(C)\Big\}$$
is the smallest algebra containing C (and also S(C)). Finally, the σ-algebras generated by C, S(C) and A(C), respectively, agree.


A σ-algebra is closed under countable set operations. Hence, in a certain sense, measurability is the feature of mathematics based on countable operations, and the symbol σ in the term σ-algebra means "countable"; in other words, "measurability" means "countability". The following concept is related to an important technique in probability theory.

Definition 5.1. A collection C of subsets of Ω is a p-system (p for product) if A, B ∈ C implies A ∩ B ∈ C. A collection D of subsets of Ω is a d-system (named after Dynkin) if (i) Ω ∈ D, (ii) A, B ∈ D and A ⊂ B imply B − A ∈ D, and (iii) D is a monotone class in the sense that A = ⋃_n A_n ∈ D for any nondecreasing chain of sets {A_n} ⊂ D.

A straightforward result is that

a collection of subsets of Ω is a σ-algebra ⟺ it is both a p-system and a d-system. (5.1)

The following result, known as the Monotone Class Theorem, is of fundamental importance in probability theory.

Theorem 5.1 (Monotone Class Theorem). Let C be a p-system and let d(C) denote the d-system generated by C (the smallest d-system containing C). Then σ(C) = d(C).

Proof. First, σ(C) ⊃ d(C) is obvious, as σ(C) is a d-system. To show the opposite inclusion, by (5.1), it suffices to prove that d(C) is a p-system. For any B ∈ d(C), write

D_B = {A ∈ d(C) : A ∩ B ∈ d(C)} ⊂ d(C). (5.2)

Then D_B is also a d-system, and C ⊂ D_B for B ∈ C. Thus,

d(C) ⊂ D_B for every B ∈ C. (5.3)

The two inclusion relations (5.2) and (5.3) indicate that d(C) = D_B for every B ∈ C. Therefore, for any A ∈ d(C), A ∩ B ∈ d(C) for all B ∈ C. In other words, C ⊂ D_A for all A ∈ d(C), and hence d(C) ⊂ D_A. To sum up, we have C ⊂ d(C) ⊂ D_A ⊂ d(C) for all A ∈ d(C), where the last inclusion is due to (5.2). Consequently, d(C) = D_A for all A ∈ d(C), and thus d(C) is a p-system.

The following theorem is a counterpart of Theorem 5.1 in terms of variables instead of sets.

Theorem 5.2 (Functional Monotone Class Theorem). Suppose that (1) C is a p-system and G = σ(C), (2) Ψ is a vector space of real-valued functions defined on Ω, closed under increasing limits and containing the constant 1, and (3) 1_A ∈ Ψ for all A ∈ C. Then Ψ contains all G-measurable functions.

Proof. We start by showing that 1_A ∈ Ψ for every A ∈ G. Write D = {A ∈ G : 1_A ∈ Ψ}. Then condition (3) simply states that C ⊂ D. Furthermore, D is a d-system, which can be checked as follows.


(i) Ω ∈ D because 1 ∈ Ψ (condition (2)).
(ii) If A, B ∈ D and A ⊂ B, then B − A ∈ D, because 1_A ∈ Ψ, 1_B ∈ Ψ and Ψ is a vector space (condition (2)), so that 1_{B−A} = 1_B − 1_A ∈ Ψ.
(iii) ⋃_n A_n ∈ D for any nondecreasing sequence of sets {A_n} ⊂ D. This follows from 1_{⋃_n A_n} = lim_{n→∞} 1_{A_n} ∈ Ψ, as Ψ is closed under increasing limits.

Consequently, by Theorem 5.1, the assertions C ⊂ D and that D is a d-system imply D ⊃ d(C) = σ(C) = G. This together with the easy fact D ⊂ G yields D = G.

We next show the final assertion. First note that Ψ contains all simple functions ∑ a_i 1_{A_i} with A_i ∈ G for all i, since again Ψ is a vector space. Furthermore, Ψ includes all nonnegative G-measurable functions, because they can be expressed as the limits of nondecreasing sequences of nonnegative simple functions f_n. Finally, Ψ contains all G-measurable functions f because f = f⁺ − f⁻ and, again, Ψ is a vector space.

5.1.2 σ-Algebras vs. Linear Spaces of Measurable Functions

We next establish the correspondence between σ-algebras and families of real-valued random variables. For a class Ψ of real-valued functions defined on Ω, write σ(Ψ) for the σ-algebra generated by the function class Ψ in the sense that

σ(Ψ) = σ({ω : X(ω) ≤ r} : X ∈ Ψ and r ∈ R),

which is the smallest σ-algebra such that all functions in Ψ are measurable. For a σ-algebra G, write

Ψ_G = {f : f is a G-measurable function from (Ω, F) to (R, B)}

for the family of all G-measurable random variables. Then G = σ(Ψ_G), because I_G = {1_A : A ∈ G} ⊂ Ψ_G and hence G = {A : 1_A ∈ I_G} = σ(I_G) ⊂ σ(Ψ_G) ⊂ G. Conversely, however, we only have Ψ_{σ(Ψ)} ⊃ Ψ in general. Therefore, for a family of real-valued functions to match the σ-algebra it generates, we need certain conditions to ensure Ψ = Ψ_{σ(Ψ)}. The following theorem provides the desired conditions.

Theorem 5.3. For a family Ψ of real-valued functions on Ω, Ψ = Ψ_{σ(Ψ)} if and only if Ψ is a linear space containing the constant function 1 and closed under countable minimizations whenever they are well-defined. Particularly in this case, we have G_Ψ = {A : 1_A ∈ Ψ}.

Proof. For the first assertion, we only need to show the sufficiency, since the necessity is easy to check. Write Ψ̄ = {1_A : 1_A ∈ Ψ} ⊂ Ψ. Then

σ(Ψ̄) ⊂ σ(Ψ) = σ({ω : X(ω) > r} : r ∈ R, X ∈ Ψ). (5.4)

Note that every constant r ∈ Ψ, since Ψ is a linear space containing the constant function 1. Because Ψ is closed under countably many minimizations, it is closed under pairwise minimizations, and hence X ∧ (r + 1/n) − X ∧ r ∈ Ψ. Note also that Ψ is closed under countably many maximizations, because X ∨ Y = −((−X) ∧ (−Y)) ∈ Ψ for X, Y ∈ Ψ. Consequently, Ψ is closed under limits, due to liminf X_n = ⋁_{n≥1} ⋀_{k≥n} X_k and limsup X_n = ⋀_{n≥1} ⋁_{k≥n} X_k for any sequence {X_n} ⊂ Ψ. It follows that

1_{X>r} = lim_{n→∞} n[X ∧ (r + 1/n) − X ∧ r] ∈ Ψ,

and hence the indicators of the sets {ω : X(ω) > r}, r ∈ R, X ∈ Ψ, all belong to Ψ̄. Therefore, the last expression in (5.4) satisfies σ({ω : X(ω) > r} : r ∈ R, X ∈ Ψ) ⊂ σ(Ψ̄), which implies σ(Ψ̄) = σ(Ψ).

Next, write A = {A : 1_A ∈ Ψ} = {A : 1_A ∈ Ψ̄}. Then σ(A) = σ(Ψ̄) = σ(Ψ). Because Ψ is closed under countably many minimizations, for any two sets A and B in A the functions 1_A and 1_B are in Ψ, so that 1_{A∩B} = 1_A ∧ 1_B ∈ Ψ. This shows that the class A is a p-system. The Functional Monotone Class Theorem (Theorem 5.2) then says that Ψ contains all σ(A)-measurable functions, i.e., Ψ ⊃ Ψ_{σ(Ψ)}. This together with the obvious fact Ψ_{σ(Ψ)} ⊃ Ψ yields Ψ = Ψ_{σ(Ψ)}.

Finally, the proof of G_Ψ = {A : 1_A ∈ Ψ} is straightforward.

This theorem shows the existence of a one-to-one correspondence between σ-algebras and certain vector spaces of real-valued functions closed under countable minimizations. This is particularly useful in understanding the definition of conditional expectations in the next section. Most importantly, for any sample space Ω, due to this equivalence, working with (Ω, G) is equivalent to working with a pair (Ω, Ψ), where Ψ is a family of real-valued functions defined on Ω satisfying the conditions of this theorem. The effect of imposing a σ-algebra is thus equivalent to specifying a family of real-valued functions.

5.1.3 Probability Spaces

The triplet (Ω, F, Pr) is referred to as a probability space if Pr is a probability measure on F, i.e., a nonnegative function of A ∈ F such that Pr(∅) = 0, Pr(Ω) = 1 and Pr(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ Pr(A_i) whenever A_i ∩ A_j = ∅ for all i ≠ j. An important notion for a probability space is its completion. The completion of a probability space (Ω, F, Pr) is carried out by (i) introducing F̄ = {A ∪ N : A ∈ F}, where N is a Pr-negligible set in the sense that N ⊂ N_0 for some set N_0 ∈ F with Pr(N_0) = 0, and (ii) defining Pr on F̄ by Pr(A ∪ N) = Pr(A). The function family corresponding to F̄ is Ψ_{F̄} = {f : there is a function f′ ∈ Ψ_F such that {f ≠ f′} is Pr-negligible}, where, and henceforth, an event A is said to be Pr-negligible if Pr(A) = 0.

We call f and f′ Pr-equivalent if {f ≠ f′} is Pr-negligible. Therefore, in order to complete a family of information functions, one needs to add all functions that are Pr-equivalent to elements of Ψ. In other words, if Ψ contains all the Pr-equivalences


of its elements, it is complete. This completion procedure is mainly for mathematicalconvenience in exposition and appears irrelevant in real practice. We will work oncomplete probability spaces from now on without loss of generality.

5.1.4 Conditional Expectations

Definition 5.2. Let (Ω, F, Pr) be a probability space, G a sub-σ-algebra of the reference σ-algebra F, and X an F-measurable random variable. Then the conditional expectation of X given G, written E[X|G], or sometimes E_G[X], is a G-measurable random variable such that
$$\int_A E[X|G]\,d\Pr = \int_A X\,d\Pr \quad\text{for all } A \in G. \qquad (5.5)$$

We can explain why conditional expectations can be defined this way as follows. Suppose that X is bounded (otherwise write X = X⁺ − X⁻ and approximate X⁺ and X⁻ by increasing limits of sequences of bounded variables). Note that the expectation E[X] of a random variable X is equal to argmin_{f∈R} E[(X − f)²], where arg stands for "the solution of". In other words, E[X] is the optimal approximation of X by a deterministic quantity under the expected squared loss. Therefore, the conditional expectation of X given G can be naturally regarded as the optimal approximation of X by a G-measurable random variable in Ψ_G under the expected squared error, i.e., E[X|G] = argmin_{f∈Ψ_G} E[(X − f)²]. Because Ψ_G is a vector space, the solution Y is just the projection of X onto the closed vector space Ψ_G, which is characterized by X − Y ⊥ Ψ_G. This, in turn, is equivalent to E[(X − Y)Z] = 0 for all Z ∈ Ψ_G, or E[(X − Y)1_A] = 0 for all A ∈ G as stated in (5.5), owing to the obvious generating family {1_A : A ∈ G} of Ψ_G. In this sense, if G = σ(Ψ) for some family Ψ of random variables, then E[X|G] and E[X|Ψ] represent the same meaning. In particular, when Ψ = {Y_1, ..., Y_k}, the meaning of E[X|Ψ] = E[X|Y_1, ..., Y_k] is clear.

Finally, we note that for any two sub-σ-algebras H ⊂ G, this geometric interpretation makes it obvious that E_G[E_H[X]] = E_H[E_G[X]] = E_H[X], which is generally known as the law of iterated expectations.
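The projection view can be illustrated numerically. In the sketch below (an illustration, not from the text), E[X|Y] is approximated by averaging X over bins of Y — a crude, hypothetical discretization of σ(Y); the bin width is an arbitrary choice. The residual X − E[X|Y] is then approximately orthogonal to functions of Y, and E[E[X|Y]] ≈ E[X], in line with (5.5) and the law of iterated expectations.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# A toy pair: X = Y^2 + noise, so that E[X | Y] = Y^2 exactly.
n = 200_000
Y = rng.normal(size=n)
X = Y**2 + rng.normal(size=n)

# Approximate E[X | Y] by averaging X within bins of Y (projection onto the
# sigma-algebra generated by the binned Y; 80 bins is an illustrative choice).
bins = np.digitize(Y, np.linspace(-4, 4, 81))
cond_mean = np.zeros(n)
for b in np.unique(bins):
    idx = bins == b
    cond_mean[idx] = X[idx].mean()

resid = X - cond_mean
print(np.mean(resid * Y), np.mean(resid * np.sin(Y)))  # ~0: residual is orthogonal to functions of Y
print(cond_mean.mean(), X.mean())                      # tower property: E[E[X|Y]] = E[X]
```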

5.1.5 Uniform Integrability

Uniform integrability is a mathematical notion motivated by the fact that X is integrable if and only if lim_{b→∞} E[|X| 1_{|X|>b}] = 0.

Definition 5.3. A collection K of real-valued random variables is said to be uniformly integrable if k(b) = sup_{X∈K} E[|X| 1_{|X|>b}] → 0 as b → ∞.


Note that if there exists an integrable random variable Y such that |X| ≤ Y for all X ∈ K, then K is uniformly integrable, because {|X| > b} ⊂ {Y > b} implies

k(b) = sup_{X∈K} E[|X| 1_{|X|>b}] ≤ E[Y 1_{Y>b}] → 0 as b → ∞.
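The quantity k(b) in Definition 5.3 is easy to examine for simple families. The sketch below (an illustration with arbitrarily chosen members, not from the text) contrasts the classical non-uniformly-integrable family X_n = n·1_{U<1/n}, with U uniform on (0,1), for which E[X_n 1_{X_n>b}] = 1 whenever n > b, with a family dominated by an integrable variable, for which k(b) collapses to 0.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
U = rng.uniform(size=500_000)
Y = rng.exponential(size=500_000)

def k_of_b(samples_by_n, b):
    """Empirical k(b) = sup_n E[|X_n| 1{|X_n| > b}] over the listed members."""
    return max(np.mean(np.where(x > b, x, 0.0)) for x in samples_by_n)

ns = [2, 10, 100, 1000]
family_not_ui = [n * (U < 1 / n) for n in ns]   # each has E[X_n] = 1, but mass escapes to infinity
family_ui     = [Y / n for n in ns]             # dominated by the integrable Y

for b in [1.0, 5.0, 50.0]:
    print(b, k_of_b(family_not_ui, b), k_of_b(family_ui, b))
# For the first family, members with n > b each contribute roughly 1, so k(b) stays near 1;
# for the dominated family, k(b) tends to 0 as b grows, matching the bound above.
```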

The following theorem presents a few characterizations of uniform integrability, where we assume X ≥ 0, which is equivalent to working with |X|.

Theorem 5.4. The following statements are equivalent:

(1) The collection K is uniformly integrable.
(2) (i) K is L¹-bounded, i.e., sup_{X∈K} E[X] < ∞, and (ii) the integrals are uniformly continuous with respect to the probability measure in the sense that
lim_{δ→0} sup_{H∈H_δ, X∈K} E[X 1_H] = 0, where H_δ = {H ∈ F : Pr(H) ≤ δ}.
(3) sup_{X∈K} E[(X − b)⁺] = sup_{X∈K} ∫_b^∞ S_X(x) dx → 0 as b → ∞, where S_X denotes the survival function of X.
(4) sup_{X∈K} E[f(X)] < ∞ for some nonnegative increasing convex function f on R⁺ such that lim_{x→∞} f(x)/x = +∞.

Proof. (1) ⟹ (2): Note that E[X] = E[X 1_{X≤b}] + E[X 1_{X>b}] ≤ b + E[X 1_{X>b}]. Thus (i) is immediate from Definition 5.3. For assertion (ii), since X 1_H ≤ b 1_H + X 1_{X>b} for every event H and every b in R⁺, it follows that

sup_{X∈K} E[X 1_H] ≤ b Pr(H) + sup_{X∈K} E[X 1_{X>b}]. (5.6)

For any ε > 0, because sup_{X∈K} E[X 1_{X>b}] → 0 as b → ∞, one can fix b such that sup_{X∈K} E[X 1_{X>b}] < ε/2. Then for δ = ε/(2b) > 0 and every H with Pr(H) ≤ δ, we have sup_{X∈K} E[X 1_H] < ε by (5.6).

(2) ⟹ (1): By condition (i), as b → ∞,

δ(b) := sup_{X∈K} Pr{X > b} ≤ (1/b) sup_{X∈K} E[X] → 0.

It follows that sup_{X∈K} E[X 1_{X>b}] ≤ sup_{X∈K, H∈H_{δ(b)}} E[X 1_H] → 0 as b → ∞. Thus

(1) is verified.

(1) ⟹ (3): For any b > 0, it is obvious that

0 ≤ sup_{X∈K} E[(X − b)⁺] = sup_{X∈K} E[(X − b) 1_{X>b}] = sup_{X∈K} (E[X 1_{X>b}] − b Pr(X > b)) ≤ sup_{X∈K} E[X 1_{X>b}].


Hence (3) follows from (1) by letting b → ∞.

(3) ⟹ (4): Under (3), there exist 0 ≤ b_1 ≤ b_2 ≤ ··· ≤ b_n ≤ ··· such that ∑_{n=1}^∞ sup_{X∈K} E[(X − b_n)⁺] < ∞. Define f(x) = ∑_{n=1}^∞ (x − b_n)⁺. Then
$$f(x) = \sum_{n=1}^{\infty}\int_0^x I(y>b_n)\,dy = \int_0^x \sum_{n=1}^{\infty} n\,I_{[b_n,b_{n+1})}(y)\,dy,$$
which is apparently a nonnegative increasing convex function such that
$$\lim_{x\to\infty}\frac{f(x)}{x} = \lim_{x\to\infty}\sum_{n=1}^{\infty} n\,I_{[b_n,b_{n+1})}(x) = \infty.$$
It follows from (3) that
$$\sup_{X\in K} E[f(X)] = \sup_{X\in K}\sum_{n=1}^{\infty} E[(X-b_n)^+] \le \sum_{n=1}^{\infty}\sup_{X\in K} E[(X-b_n)^+] < \infty.$$

Thus (4) holds.

(4) ⟹ (1): This follows immediately from
$$\sup_{X\in K} E[X 1_{X>b}] = \sup_{X\in K} E\Big[f(X)\,\frac{X}{f(X)}\,1_{X>b}\Big] \le \sup_{X\in K} E[f(X)]\cdot\sup_{x>b}\frac{x}{f(x)} \to 0$$
as b → ∞.

An immediate but useful result of assertion (2) is stated below.

Corollary 5.1. If K is uniformly integrable, so are its convex hull and closureunder L1 convergence.

Another result from condition (4) of Theorem 5.4 is as follows.

Corollary 5.2. Let Z be an integrable random variable and F = {F_t : t ∈ I} a family of sub-σ-algebras, where I is an index set. Then K = {X_t = E[Z|F_t] : t ∈ I} is uniformly integrable.

Proof. Pick 0 ≤ b_1 ≤ b_2 ≤ ··· ≤ b_n ≤ ··· such that ∑_{n=1}^∞ E[(Z − b_n)⁺] < ∞. Then f(x) = ∑_{n=1}^∞ (x − b_n)⁺ is nonnegative, increasing, convex, and lim_{x→∞} f(x)/x = +∞. Because E[f(Z)] < ∞, we further have
$$\sup_{X\in K} E[f(X)] = \sup_{t\in I} E[f(X_t)] = \sup_{t\in I} E\big[f\big(E(Z|F_t)\big)\big] \le \sup_{t\in I} E\big\{E[f(Z)|F_t]\big\} = E[f(Z)] < \infty,$$
where the inequality is Jensen's inequality for conditional expectations. The corollary then follows from condition (4) of Theorem 5.4.

A straightforward application of this corollary is that martingales of the form X_t = E[Z|F_t], with Z integrable, are uniformly integrable (see Sect. 5.4 for the definition of martingales in continuous time).


Theorem 5.5. Let {X_n} be a sequence of real-valued random variables. Then it converges in L¹ if and only if it converges in probability and is uniformly integrable.

Proof. "⟹": Assume that {X_n} converges in L¹. Then, for any ε > 0, there is an N such that E|X_n − X_N| < ε for all n ≥ N. Consequently, {X_n} is L¹-bounded, and for any event A and n > N,

E[|X_n| 1_A] ≤ E[|X_n − X_N| 1_A] + E[|X_N| 1_A] ≤ ε + E[|X_N| 1_A].

Therefore,

lim_{Pr(A)→0} sup_{n≥1} E[|X_n| 1_A] ≤ lim_{Pr(A)→0} { max_{1≤n≤N} E[|X_n| 1_A] ∨ (ε + E[|X_N| 1_A]) } = ε.

Since ε is arbitrary, lim_{Pr(A)→0} sup_{n≥1} E[|X_n| 1_A] = 0. Hence {X_n} is uniformly integrable by condition (2) of Theorem 5.4.

"⟸": Assume that {X_n} converges in probability and is uniformly integrable. Let X_∞ denote the limit of {X_n} in probability. Then there is a subsequence {X′_n} of {X_n} that converges to X_∞ almost surely, and so Fatou's lemma yields

E|X_∞| = E[lim_{n→∞} |X′_n|] ≤ liminf_{n→∞} E|X′_n| ≤ sup_n E|X_n| < ∞,

where the last inequality follows from the assumed uniform integrability together with condition (2) of Theorem 5.4. Hence X_∞ is in L¹. To show that X_n → X_∞ in L¹, fix ε > 0 and note that E|X_n − X_∞| ≤ ε + E[|X_n − X_∞| 1_{|X_n−X_∞|>ε}]. Since X_∞ is integrable and {X_n} is uniformly integrable, {X_n − X_∞} is uniformly integrable; choose δ > 0 such that sup_n E[|X_n − X_∞| 1_A] < ε whenever Pr(A) ≤ δ. By the assumed convergence in probability, Pr(|X_n − X_∞| > ε) ≤ δ for sufficiently large n, and hence E[|X_n − X_∞| 1_{|X_n−X_∞|>ε}] < ε. Consequently, E|X_n − X_∞| ≤ 2ε for all sufficiently large n. This completes the proof.

Convergence of X_n to X in L¹ implies lim_{n→∞} |E[X_n] − E[X]| ≤ lim_{n→∞} E|X_n − X| = 0.

Note that if {X_n} is dominated by an integrable random variable Y, then {X_n} is uniformly integrable. Moreover, since almost sure convergence implies convergence in probability, this theorem in fact generalizes the dominated convergence theorem, as the following chain of implications shows:

|X_n| ≤ Y with Y integrable, and X_n → X a.s.
⟹ {|X_n|} is uniformly integrable and X_n → X in probability
⟺ convergence in L¹: lim_{n→∞} E|X_n − X| = 0
⟹ lim_{n→∞} E[X_n] = E[X]. (5.7)

If the X_n are nonnegative (or, equivalently, bounded below), then we have the following corollary, which presents another characterization of uniform integrability.

Corollary 5.3. Let {X_n} be a sequence of nonnegative random variables converging to X. Then lim_{n→∞} E[X_n] = E[X] if and only if {X_n} is uniformly integrable.


Proof. The sufficiency part is immediate from Theorem 5.5. For the necessity, note that lim_{n→∞} E[X_n ∧ X] = E[X] by Lebesgue's dominated convergence theorem, since the X_n ∧ X are dominated by the integrable X. Using |a − b| = a + b − 2(a ∧ b), the uniform integrability follows from

lim_{n→∞} E|X_n − X| = lim_{n→∞} E[X_n + X − 2(X_n ∧ X)] = 0

and the condition lim_{n→∞} E[X_n] = E[X].

5.1.6 Essential Supremum

When uncountably many random variables are involved in a computation, the measurability needs to be carefully treated. The essential supremum is an elegant generalization of the supremum to the case of uncountably many random variables.

Theorem 5.6. Let {X_i, i ∈ I} be a collection of random variables, which may or may not be countable. Then there exists an almost surely unique extended random variable Y (which may take the values ±∞ with positive probability), called the essential supremum of {X_i, i ∈ I} and written Y = ess sup{X_i, i ∈ I}, such that (i) X_i ≤ Y a.s. for every i ∈ I, and (ii) Y ≤ Y′ a.s. whenever X_i ≤ Y′ a.s. for every i ∈ I.

Proof. The key idea of this proof is to show the existence of a countable subset I_0 ⊂ I such that ess sup{X_i, i ∈ I} = sup{X_i, i ∈ I_0}.

Without loss of generality, we can assume {X_i, i ∈ I} to be uniformly bounded, because otherwise one can work with {Z_i = arctan X_i, i ∈ I} instead, due to the fact that ess sup{X_i, i ∈ I} = tan(ess sup{Z_i, i ∈ I}).

Introduce σ = sup_{J⊂I} E[sup_{i∈J} X_i], where the outer supremum is over all countable subsets J of I. There exists a sequence {J_n} of countable subsets of I such that σ = lim_{n→∞} E[sup_{i∈J_n} X_i]. Write J_0 = ⋃_{n=1}^∞ J_n. Then J_0 is countable and E[sup_{i∈J_0} X_i] ≥ E[sup_{i∈J_n} X_i] for every n. Thus σ = E[sup_{i∈J_0} X_i], which follows from

σ = sup_{J⊂I} E[sup_{i∈J} X_i] ≥ E[sup_{i∈J_0} X_i] ≥ E[sup_{i∈J_n} X_i] → σ as n → ∞.

The extended random variable Y = sup_{i∈J_0} X_i is then the required essential supremum, as shown below. First, Y ≥ X_i a.s. for every i ∈ I. If this were not the case, there would exist an i_0 ∈ I such that Pr(Y < X_{i_0}) > 0, and then E[sup_{i∈J_0∪{i_0}} X_i] = E[Y ∨ X_{i_0}] > E[Y] = σ, which contradicts the definition of σ as sup_{J⊂I} E[sup_{i∈J} X_i]. Second, if X_i ≤ Y′ a.s. for every i ∈ I, then Y = sup_{i∈J_0} X_i ≤ Y′ a.s., and hence (ii) follows. The uniqueness of Y is obvious by (ii).

Moreover, the essential infimum is denoted and defined by ess inf{X_i, i ∈ I} = Y if −Y = ess sup{−X_i, i ∈ I}.


5.2 Stochastic Processes

Stochastic processes were developed to model the evolution of random systems over a time horizon. In the contemporary context of probability, a stochastic process is a set of random variables {X_t(ω) : t ∈ T} defined on a probability space (Ω, F, Pr). The random variable X_t(ω) takes values in a measurable space, say (E, E). A fixed sample point ω ∈ Ω corresponds to a sample path X_·(ω) : T → E. In the current scope we are restricted to the case (E, E) = (R, B_R) or a multidimensional Euclidean space (R^n, B_{R^n}) for some integer n > 0. Moreover, T is usually an interval of R (the continuous-time version) or a set of integers (the discrete-time version), indicating the time horizon. This is the situation in which the terminology of time instants, paths and filtrations (see below for the definition) originated, though there are branches of stochastic processes in which T may not be an ordered set (e.g., in the statistics of empirical processes and Bayesian nonparametrics, T itself may be a σ-algebra, which is partially ordered by set inclusion).

A stochastic process in continuous time can be viewed as a random vector of uncountably infinite dimension. Its distribution law is thus characterized by its finite-dimensional distributions (Kolmogorov's consistency theorem, which can be found in any standard textbook covering the general theory of stochastic processes). To be specific, we have the following definition, which also presents some other closely related notions.

Definition 5.4. Let X and Y be two stochastic processes.

(1) X and Y are equivalent (or equal in distribution, equal in law) if (X_{t_1}, X_{t_2}, ..., X_{t_k}) has the same joint distribution as (Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) for any positive integer k and time instants {t_1, t_2, ..., t_k} ⊂ T.
(2) X is a modification of Y if Pr(X_t = Y_t) = 1 for all t ∈ T.
(3) X and Y are indistinguishable if Pr(X_t = Y_t for all t ∈ T) = 1 (or simply written Pr(X = Y) = 1).

A simple relationship is:

X and Y are indistinguishable ⟹ X is a modification of Y ⟹ X and Y are equivalent. (5.8)

If T is a discrete set, or if X and Y are right/left continuous, then

X is a modification of Y ⟺ they are indistinguishable. (5.9)

5.2.1 Information Filtrations

To deal with a stochastic process, we are generally given a family F = {F_t}_{t∈T} of increasing σ-algebras indexed by T, i.e., F_s ⊂ F_t whenever s ≤ t, which is referred to as a filtration. We make the convention that T is right closed in the sense that t* = sup{t : t ∈ T} ∈ T. If this is not the case, we can simply extend the filtration F onto T̄ = T ∪ {t*} by introducing F_{t*} = ⋁_{t∈T} F_t, which apparently does not change the structure, since F_{t*} contains no more information than {F_t, t ∈ T}. In this context, the probability space with the extended filtration as the σ-algebra of events is usually still denoted by (Ω, F, Pr) (unless otherwise specified) and called a filtered probability space. Without loss of generality, we generally take T = R⁺ = [0, ∞] in this book and leave the translation of the results for continuous-time stochastic processes into discrete-time scenarios to the readers; the latter can be specialized from the former by simply defining X_t = X_{[t]} and F_t = F_{[t]}, where [t] is the integer part of t. Therefore, a stochastic process {X_t : t ∈ T} is a collection of random variables defined on the filtered probability space (Ω, F, Pr). The following notions are fundamental in the theory of stochastic processes.

Definition 5.5. A filtration F is said to fulfill the usual condition if it is right continuous in the sense that F_t = ⋂_{s>t} F_s for all t ∈ R⁺, and complete in the sense that F_0, and thus every F_t, t ∈ R⁺, contains all Pr-null events. A stochastic process X = {X_t} is said to be F-adapted, written X ∈ F, if X_t ∈ F_t for every time t ∈ R⁺. In addition, for a stochastic process {X_t : t ∈ R⁺}, the filtration F_t = σ(X_u, 0 ≤ u ≤ t) is commonly referred to as the natural filtration of X, or the filtration generated by X.

Note that while the usual condition is extensively assumed in the literature, its implications need to be carefully examined in practical situations. The adaptedness of a stochastic process X simply states that the information up to any time t contains the contribution from this stochastic process; in other words, once the information F_t is observed, the segment {X_s, 0 ≤ s ≤ t} is deterministic.

The filtration F is generally used to model the information flow, in the sense that F_t is the information observed up to time t. As discussed in Sect. 5.1, any σ-algebra can be equated to a family of random variables. For each t, write Ψ_t for the family of random variables corresponding to F_t. Then {Ψ_t, t ∈ R⁺} forms a nondecreasing chain of families of random variables, and X = {X_t} is F-adapted if and only if X_t ∈ Ψ_t for all t ∈ R⁺ (or equivalently, X_s ∈ Ψ_t for all 0 ≤ s ≤ t ≤ ∞). It can also be understood through

Φ = {Z = (Z_t : t ∈ R⁺) : Z_t ∈ Ψ_t} = {Z : Z ∈ F},

the collection of all F-adapted processes on (Ω, F, Pr), so that Φ is a family of stochastic processes. Then it follows that

F_t = σ(X_s : 0 ≤ s ≤ t, X ∈ Φ) = σ(Ψ_t),

which indicates that each F_t can be thought of as being generated by the family Φ of stochastic processes (i.e., F is the natural filtration of the family Φ), and the information F_t means the observation of the history of this family of stochastic processes up to time t (i.e., the collection Ψ_t of all observed historical random variables, in the context of conditional expectations). In this notation, a process is adapted if and only if it belongs to Φ. This fact clearly interprets the true meaning of the term "information flow": F_t is the information released by a family of processes up to time t. In addition, a filtration F is the natural filtration


of a particular stochastic process Z if and only if all F-adapted processes can be represented as functions of Z. It is also clear that, as a whole space, the family Φ inherits the structural properties of any family of random variables, as the counterpart of a σ-algebra:

1. Φ is a linear space, and
2. Φ is closed under countable minimizations, where the minimization of two stochastic processes is defined pointwise in time: X ∧ Y := {X_t ∧ Y_t : t ∈ R⁺} for any two stochastic processes X = {X_t : t ∈ R⁺} and Y = {Y_t : t ∈ R⁺}.

Another representation of F is presented in the next proposition. Write

Φ^{rc} = {X ∈ F : X has right-continuous paths}

and

Φ^{rcs} = {X ∈ F : X is a right-continuous step function of t}.

For example, if a random variable V ∈ Ψ_{t_0} for a fixed time t_0, then the process X_t = V I_{[t_0,∞)}(t) belongs to Φ^{rcs} ⊂ Φ^{rc}. We can easily check the following proposition.

Proposition 5.1. With the notation introduced above, we have

F_t = σ(X_t : X ∈ Φ) = σ(X_t : X ∈ Φ^{rc}) = σ(X_t : X ∈ Φ^{rcs}). (5.10)

Proof. In fact, it is apparent that

F_t ⊃ σ(X_t : X ∈ Φ) ⊃ σ(X_t : X ∈ Φ^{rc}) ⊃ σ(X_t : X ∈ Φ^{rcs}). (5.11)

On the other hand, for any A ∈ F_t, define a right-continuous stochastic process X_s = I_A I_{[t,∞)}(s), s ∈ R⁺, which is F-adapted and thus X ∈ Φ^{rcs}. It then follows that σ(X_t : X ∈ Φ^{rcs}) ⊃ F_t. This relation together with (5.11) results in (5.10).

By Proposition 5.1, it is clear that the continuity of a filtration F is equivalent to that of the companion chain {Ψ_t} in t. Therefore, the continuity of F depends fundamentally on the global properties of the chain {Ψ_t} rather than on the path properties of the processes generating the filtration (the elements of Φ); indeed, any filtration can be generated by a family of stochastic processes with right-continuous paths. Nevertheless, for a given filtration, it is difficult to pick an informative base of Φ, particularly if the selected stochastic processes are required to have good path properties. As a matter of fact, when F (or {Ψ_t}) is right continuous, it appears to remain open how to characterize the path properties of the stochastic processes that generate a right-continuous filtration.

5.2.2 Stochastic Processes as Stochastic Functions of Time

Another view of a stochastic process is to consider it as a bivariate function defined on R⁺ × Ω and to discuss some restricted classes of stochastic processes with good mathematical properties. A few of them are presented below.


Definition 5.6.
(1) A stochastic process X = {X_t} is said to be measurable if X_t(ω) : R⁺ × Ω → R, considered as a bivariate function of (t, ω) ∈ R⁺ × Ω, is measurable with respect to B_{[0,∞)} × F_∞, and F-progressive (or progressively measurable) if the map X_s(ω) : [0, t] × Ω → R is measurable with respect to B_{[0,t]} × F_t for all t ∈ R⁺, where B_I denotes the Borel σ-algebra on the interval I.
(2) A set A ⊂ R⁺ × Ω is referred to as a random set if A_t = {ω : (t, ω) ∈ A} ∈ F for every t. Furthermore, A is said to be measurable if A ∈ B_{[0,∞)} × F_∞, and progressive (or progressively measurable) if 1_A is a progressive process, which is equivalent to A ∩ ([0, t] × Ω) ∈ B_{[0,t]} × F_t for all t.

Proposition 5.2. A progressive process is adapted.

Proof. By Definition 5.6, the progressive measurability of X states that the mapping X(s, ω) = X_s(ω) : [0, t] × Ω → R is B_{[0,t]} × F_t-measurable. For any fixed t ∈ R⁺, define a mapping g_t(ω) = (t, ω) from Ω to [0, t] × Ω. Then, for any A × B ∈ B_{[0,t]} × F_t,

g_t^{-1}(A × B) = {ω : (t, ω) ∈ A × B} = ∅ ∈ F_t if t ∉ A, and = B ∈ F_t if t ∈ A,

hence g_t is measurable from (Ω, F_t) to ([0, t] × Ω, B_{[0,t]} × F_t). As a result, X_t(ω) = X(t, ω) = X(g_t(ω)) is F_t-measurable.

One of the merits of measurability and progressive measurability is to enable the Fubini theorem to be applied to integrals of the form E[∫_a^b X_t dt]. Without measurability, even ∫_a^b X_t dt cannot be carried out. The following proposition states that adapted stochastic processes with one-sided continuous paths are progressive.

Proposition 5.3. Let X be an F -adapted process with right-continuous (or left-continuous) paths. Then X is progressive.

Proof. Suppose first that X has right-continuous paths. Introduce

X^n_s = ∑_{k=0}^∞ X_{2^{-n}(k+1)∧t} I_{[2^{-n}k∧t, 2^{-n}(k+1)∧t)}(s), s ∈ [0, t], n ≥ 1.

Then X_s = lim_{n→∞} X^n_s for each s ∈ [0, t) due to the right continuity of X. Moreover, the relationship

{(s, ω) : X^n_s > r, s ∈ [0, t)} = ⋃_{k=0}^∞ [2^{-n}k∧t, 2^{-n}(k+1)∧t) × {ω : X_{2^{-n}(k+1)∧t} > r} ∈ B_{[0,t]} × F_t

holds for all r ∈ R. That is, X^n, and hence its limit X, is B_{[0,t]} × F_t-measurable when restricted to [0, t). Together with the adaptedness of X, this implies the progressiveness of X. The left-continuous case is analogous.

The following example shows a stochastic process that is not measurable (and hence not progressive).


Example 5.2. Let {X_t, t ∈ R⁺} be a stochastic process with mutually independent random variables X_t such that E[X_t] = 0 and Var(X_t) = 1 for all t ∈ R⁺. This process is not measurable; see Sect. 19.7 of Stoyanov (1997) for details.

The next example demonstrates how we can construct a progressive process by means of Proposition 5.3.

Example 5.3. Let D be a countable dense subset of R⁺ and X an adapted real-valued process. Then the processes

Y⁺_t = limsup_{s∈D, s>t, s→t} X_s,  Y⁻_t = liminf_{s∈D, s>t, s→t} X_s,
Z⁺_t = limsup_{s∈D, s≥t, s→t} X_s  and  Z⁻_t = liminf_{s∈D, s≥t, s→t} X_s

are all F_{t+}-progressive. This can be shown as follows. For every integer n define a process Z^n by

Z^n_t = ∑_{k=0}^∞ I_{[2^{-n}k, 2^{-n}(k+1))}(t) sup_{s∈D_t} X_s, where D_t = D ∩ (t, 2^{-n}(k+1)).

This process is adapted to the family (F_{t+ε}) for every ε > 2^{-n} and right-continuous, and hence progressive with respect to this family. By Proposition 5.3 and passing to the limit, Y⁺_t = lim_{n→∞} Z^n_t is progressive with respect to F_{t+}. For Z⁺_t, note that Z⁺_t = (Y⁺_t ∨ X_t) I_D(t) + Y⁺_t I_{D^c}(t) is built from F_{t+}-progressive processes (X_t I_D(t) is progressive because D is countable), hence its progressive measurability is obvious. The arguments for the other two processes are similar.

A path of a stochastic process {X_t} is said to be càdlàg (the abbreviation of the French "continue à droite, limites à gauche") if it is right continuous and has left limits at every time point t. A process {X_t} is càdlàg if its paths are all càdlàg. Write

ψ_l = {X : X is pathwise left continuous and F-adapted}

and

ψ_r = {X : X is càdlàg and F-adapted}.

It is clear that F-adapted left-continuous processes are also F₋-adapted, where F₋ = {F_{t−} : t ∈ R⁺} and F_{t−} = ⋁_{s<t} F_s.

Definition 5.7. P = σ(ψ_l) is referred to as the predictable σ-algebra on R⁺ × Ω, and O = σ(ψ_r) is the optional σ-algebra on R⁺ × Ω. The sets in P (O) are called predictable (optional) sets. A stochastic process X is called predictable (optional) with respect to F if it is measurable with respect to P (O).

Following the notation used in Sect. 5.1.1, write Ψ_P and Ψ_O respectively for the families of predictable and optional stochastic processes, so that X is predictable (optional) if and only if X ∈ Ψ_P (Ψ_O). Due to Proposition 5.3, the following result is straightforward.

Proposition 5.4. Optional processes are progressive.


5.3 Stopping Times

Stopping times are mathematically defined as follows.

Definition 5.8. An extended random variable T ∈ [0, +∞] is an F-stopping time if {T ≤ t} ∈ F_t for all t ∈ R⁺ (or equivalently, for all t ∈ [0, ∞]).

Intuitively, an extended random variable T is a stopping time if whether or not the event {T ≤ t} has occurred can be determined once the information up to time t is observed. Especially in decision problems, there is a rule that triggers some action at a random time T. If the information up to time t can tell whether the action should have been triggered by t, then this random time is a stopping time.

Example 5.4. Consider a counting process with a sequence of increasing random times 0 < T_1 < T_2 < ··· such that lim_{n→∞} T_n = ∞, indicating the successive arrivals of customers at a service system. Define N_t = ∑_{n=1}^∞ I_{[0,t]}(T_n) to be the number of customers arriving up to time t. Then N_0 = 0, N_t ≥ 0 is increasing, and lim_{t→∞} N_t = ∞. Let F = {F_t : t ∈ R⁺} be the natural filtration of N = {N_t}. Then for each integer k, the time T_k is a stopping time: for every t ∈ R⁺, {T_k ≤ t} = {N_t ≥ k} ∈ F_t. For a positive time a, however, the random time T = inf{t ∈ R⁺ : N_t = N_a} is not a stopping time.

An immediate result of Definition 5.8 is that if T is a stopping time then

{T < t} = ⋃_{n=1}^∞ {T ≤ t − 1/n} ∈ F_t. (5.12)

We call T a wide-sense stopping time if {T < t} ∈ F_t for any t ∈ R⁺. A stopping time is then a wide-sense stopping time, but the converse is not necessarily true. In fact, T is a wide-sense stopping time if and only if {T ≤ t} ∈ F_{t+} = ⋂_{ε>0} F_{t+ε}. Thus, in the case of a right-continuous F, all wide-sense stopping times are also stopping times. In addition, if F_t is complete and Pr(T = t) = 0 for every t ∈ R⁺, then T is a stopping time if and only if it is a wide-sense stopping time.

For a stopping time T, define a sequence of random variables
$$T_n = \sum_{k=1}^{\infty}\frac{k}{2^n}\,I_{\{2^{-n}(k-1)\le T<2^{-n}k\}} + \infty\cdot I_{\{T=\infty\}} = \sum_{k=1}^{n2^n}\frac{k}{2^n}\,I_{\{2^{-n}(k-1)\le T<2^{-n}k\}} + \infty\cdot I_{\{T\ge n\}}. \qquad (5.13)$$

Then T_n ↓ T as n → ∞, and the T_n are also stopping times.

Write F_T for the σ-algebra representing the information at the stopping time T, and Ψ_T for the family of random variables corresponding to F_T (thanks to Theorem 5.3). Intuitively, a random variable V ∈ Ψ_T if and only if it is measurable with respect to F_∞ and can be observed at any time t ≥ T. This leads to the following definition.


Definition 5.9. Ψ_T = {V ∈ F_∞ : V I_{[T,∞)}(t) is F-adapted}.

Because Ψ_T is a linear space containing the constant function 1 and closed under countable minimizations whenever they are well-defined, by Theorem 5.3 it can be identified with a σ-algebra (namely F_T). Moreover, the notation in the definition applies even if the stopping time T is deterministic. Thus Ψ_T is well defined. Some immediate results on stopping times are stated in the following proposition.

Proposition 5.5.
(1) T ∈ F_T for any stopping time T.
(2) The σ-algebra F_T can be formally expressed as

F_T = {A ∈ F_∞ : A ∩ {T ≤ t} ∈ F_t for all t ∈ R⁺}. (5.14)

(3) If S is a stopping time and T is a random variable such that T ≥ S and T ∈ F_S, then T is also a stopping time (T is said to be foretold by S).
(4) If T ≥ S are two stopping times, then Ψ_S ⊂ Ψ_T and hence F_S ⊂ F_T.

Proof.
(1) For every r ∈ R⁺ and t ∈ R⁺, {T I_{[T,∞)}(t) > r} = {r < T ≤ t} if r < t, and = ∅ otherwise. Hence T I_{[T,∞)}(t) ∈ F_t for all t, which leads to assertion (1).
(2) A ∈ F_T ⟺ I_A ∈ Ψ_T (by Theorem 5.3) ⟺ I_A I_{[T,∞)}(t) is F-adapted ⟺ A ∩ {T ≤ t} ∈ F_t for all t ∈ R⁺.
(3) By (5.14), the condition T ∈ F_S implies {T ≤ t} ∈ F_S for all t ∈ R⁺, which in turn means {T ≤ t} ∩ {S ≤ s} ∈ F_s for all t ∈ R⁺ and s ∈ R⁺. Moreover, due to T ≥ S, {T ≤ t} = {T ≤ t} ∩ {S ≤ t} ∈ F_t for all t ∈ R⁺.
(4) Let V ∈ Ψ_S. If S ≤ T, then V I_{[T,∞)}(t) = V I_{[S,∞)}(t) I_{[T,∞)}(t) ∈ F_t, which implies V ∈ Ψ_T.

The equivalent form (5.14) is extensively used as the definition of F_T. It has proved convenient in showing various properties of the information σ-algebra at time T. Moreover, (5.14) is equivalent to the assertion that A ∈ F_T if and only if A ∈ F_∞ and T_A = T I_A + ∞ I_{A^c} is a stopping time. This can be compared with Proposition 5.6 below.

Some more useful results are listed below. The first states that a sufficient condition for X_T ∈ F_T is that X is progressive; the second extends assertion (2) of Proposition 5.5 by allowing the deterministic time t to be replaced with a stopping time T; and the last gives the intersection of σ-algebras at different stopping times. The proof of the next theorem is less heuristic than those we have presented so far.

Theorem 5.7. Let T be a stopping time.
(1) If X is progressive, then X_T ∈ F_T. Consequently, Ψ_T = {X_T : X ∈ Φ^{rc}}.
(2) For any stopping time S, an event V ∈ F_S (or, equivalently, a random variable V ∈ Ψ_S) if and only if V ∩ {S ≤ T} ∈ F_{T∧S} (respectively, V 1_{S≤T} ∈ Ψ_{T∧S}) for all stopping times T. Consequently, V ∈ F_S implies V ∩ {S < T} ∈ F_T and V ∩ {S = T} ∈ F_T.
(3) F_T ∩ F_S = F_{T∧S}.


Proof.
(1) By the definition of F_T, it suffices to check that {X_T I_{[T,∞)}(t) : t ∈ R⁺} ∈ F. Let f(ω) = (ω, T(ω)) be a map from Ω_t = {ω : T(ω) ≤ t} ∈ F_t to Ω_t × [0, t]. Then f is measurable with respect to Ω_t ∩ F_t = {Ω_t ∩ A : A ∈ F_t} and F_t × B_{[0,t]}, because f^{-1}(H × [0, s]) = H ∩ {T ≤ s} = H ∩ Ω_t ∩ {T ≤ s} ∈ Ω_t ∩ F_t for any H ∈ F_t and s ≤ t. Define g(ω, s) = X_s(ω) I_{(s≤t)}, s ≤ t, which is measurable with respect to F_t × B_{[0,t]} and B_R due to the progressive measurability of X. Thus X_T I_{T≤t} = g ∘ f ∈ F_t. This proves the first assertion in part (1). For the second assertion of part (1), if X ∈ Φ^{rc}, then X is progressive (Proposition 5.3) and thus X_T ∈ Ψ_T. Conversely, let V ∈ Ψ_T. Then X_t = V I_{T≤t} is adapted and right continuous, and V = X_T. Thus V ∈ {X_T : X ∈ Φ^{rc}}.
(2) The "if" part is straightforward by taking T = ∞, so we here examine the "only if" part. Due to the correspondence between F_S and Ψ_S, we only need to check the validity of the random-variable version. For this purpose, note that V ∈ F_S ⟺ X_t := V 1_{S≤t} ∈ F_t for all t ∈ R⁺. Thanks to the right-continuity of X_t, part (1) implies V 1_{S≤T} = V 1_{S≤T∧S} = X_{T∧S} ∈ F_{S∧T}. Moreover, taking V = 1 leads to {S ≤ T} ∈ F_{T∧S}. Interchanging the roles of S and T yields {T ≤ S} ∈ F_{T∧S}, and so the events {S = T} = {S ≤ T} ∩ {T ≤ S} and {S < T} = {S ≤ T} \ {S = T} also belong to F_{T∧S}. It follows that multiplying V by the indicator of {S = T} or {S < T} does not alter the membership in F_{T∧S}.
(3) It suffices to prove F_T ∩ F_S ⊂ F_{T∧S}. To do so, let H ∈ F_T ∩ F_S. Then by part (2), we have both H ∩ {S ≤ T} ∈ F_{T∧S} (since H ∈ F_S) and H ∩ {T ≤ S} ∈ F_{T∧S} (since H ∈ F_T). Hence H = (H ∩ {S ≤ T}) ∪ (H ∩ {T ≤ S}) ∈ F_{T∧S}.

Proposition 5.6. If T and S are stopping times and A ∈ F_{T∧S}, then η = T I_A + S I_{A^c} is also a stopping time.

Proof. This proposition follows from the fact that, for any t ∈ R,

{η ≤ t} = [{T ≤ t} ∩ A] ∪ [{S ≤ t} ∩ A^c]
= ({T ≤ t} ∩ [{T ∧ S ≤ t} ∩ A]) ∪ ({S ≤ t} ∩ [{T ∧ S ≤ t} ∩ A^c]) ∈ F_t,

as {T ≤ t}, {T ∧ S ≤ t} ∩ A, {S ≤ t} and {T ∧ S ≤ t} ∩ A^c are all F_t-measurable.

If A ∈ F_S, we can define a new stopping time S_A = S I_A + ∞ I_{A^c}, which is called the restriction of S to A.

Given any two stopping times S and T, define

[S, T) = {(t, ω) : S(ω) ≤ t < T(ω)} ⊂ R⁺ × Ω,

and similarly for (S, T], (S, T), etc. Let A be a subset of R⁺ × Ω and X a stochastic process. We define the debut time of A as

D_A = inf{t : (t, ω) ∈ A}. (5.15)

The next theorem specifies the relation between stopping times and debut times.


Theorem 5.8. Assume that the filtration fulfills the usual condition. An extended nonnegative random variable T is a stopping time if and only if it is the debut time of a progressive set.

Proof. Note that the indicator of [T, ∞) is a right-continuous adapted process, and hence progressive. Thus T = inf{t : (t, ω) ∈ [T, ∞)} is the debut time of a progressive set. This proves the "only if" part. The "if" part is proved in Theorem IV-50 of Dellacherie and Meyer (1978).

For two random variables X and Y, we say X = Y a.s. on an event A if X I_A = Y I_A a.s. The following result is geometrically obvious (see the intuition for conditional expectations described in Sect. 5.1.4), but needs a somewhat involved proof.

Proposition 5.7. If Z is an integrable random variable, then E[Z|F_T] = E[Z|F_{T∧S}] a.s. on {T ≤ S}, {T < S} and {T = S}. Consequently,

E[E[Z|F_T] | F_S] = E[Z|F_{T∧S}] a.s. (5.16)

Proof. Because A ∩ {T ≤ S} ∈ F_{T∧S} for every A ∈ F_T (see Theorem 5.7), it is easy to check that

∫_A E[Z 1_{T≤S} | F_{T∧S}] dP = ∫ 1_{A∩{T≤S}} E[Z | F_{T∧S}] dP = ∫ E[Z 1_{A∩{T≤S}} | F_{T∧S}] dP
= ∫ Z 1_{A∩{T≤S}} dP = ∫_A Z 1_{T≤S} dP.

Because E[Z 1_{T≤S} | F_{T∧S}] ∈ F_T, this indicates E[Z 1_{T≤S} | F_T] = E[Z 1_{T≤S} | F_{T∧S}]; namely, E[Z | F_T] = E[Z | F_{T∧S}] a.s. on {T ≤ S}. The same equality on {T < S} and {T = S} can be verified similarly.

For equality (5.16), observe that, by the assertion just proved,

1_{T>S} E[E[Z|F_T] | F_S] = 1_{T>S} E[E[Z|F_T] | F_{T∧S}] = 1_{T>S} E[Z|F_{T∧S}] a.s.

Therefore,

E[E[Z|F_T] | F_S] = E[1_{T≤S} E[Z|F_T] | F_S] + 1_{T>S} E[E[Z|F_T] | F_S]
= E[1_{T≤S} E[Z|F_{T∧S}] | F_S] + 1_{T>S} E[Z|F_{T∧S}]
= 1_{T≤S} E[Z|F_{T∧S}] + 1_{T>S} E[Z|F_{T∧S}] = E[Z|F_{T∧S}] a.s.

This completes the proof.

This proposition also leads to the easy result that E[Z|F_T] = E[Z|F_S] a.s. on {T = S}, because both are equal to E[Z|F_{T∧S}] on {T = S}. Thus, for any stopping time T, we have E[Z|F_T] = E[Z|F_t] on the event {T = t}.

The following lemma is useful in the discussion of optimal stopping times in Sect. 5.5.


Lemma 5.1. Let X be a progressive process and T a stopping time. For any sequence of stopping times S_k ≥ T, we can construct a new sequence of stopping times Ŝ_k ≥ T, k ≥ 1, such that E[X_{Ŝ_k}|F_T] is nondecreasing in k and E[X_{Ŝ_k}|F_T] ≥ E[X_{S_k}|F_T].

Proof. For any such sequence of stopping times S_k ≥ T, k ≥ 1, recursively define a new sequence of stopping times Ŝ_k ≥ T by Ŝ_1 = S_1 and

Ŝ_k = Ŝ_{k−1} I_{{E[X_{Ŝ_{k−1}}|F_T] > E[X_{S_k}|F_T]}} + S_k I_{{E[X_{Ŝ_{k−1}}|F_T] ≤ E[X_{S_k}|F_T]}} for k = 2, 3, ....

(These are indeed stopping times by Proposition 5.6, since the defining events belong to F_T ⊂ F_{Ŝ_{k−1}∧S_k}.) In this way, we have

E[X_{Ŝ_k}|F_T] = I_{{E[X_{Ŝ_{k−1}}|F_T] > E[X_{S_k}|F_T]}} E[X_{Ŝ_{k−1}}|F_T] + I_{{E[X_{Ŝ_{k−1}}|F_T] ≤ E[X_{S_k}|F_T]}} E[X_{S_k}|F_T]
= E[X_{Ŝ_{k−1}}|F_T] ∨ E[X_{S_k}|F_T].

This simultaneously proves both the monotonicity of E[X_{Ŝ_k}|F_T] in k and the relationship E[X_{Ŝ_k}|F_T] ≥ E[X_{S_k}|F_T].

5.4 Martingales

Martingales (together with super- and sub-martingales) are a particular type of stochastic process that characterizes the trend of certain objectives given the observed information, in terms of conditional expectations. They play a crucial role in the general theory of stochastic processes.

5.4.1 Definitions

Definition 5.10. Let (Ω, F, Pr) be a filtered probability space. An adapted process X = {X_t : t ∈ R⁺} (or with t ∈ R̄⁺) is a martingale (supermartingale, submartingale) if, for all s ≤ t < ∞ (respectively ≤ ∞), one has E[X_t|F_s] = (≤, ≥) X_s.

Remark 5.1. The following results are immediate.
(1) X is a submartingale if and only if −X is a supermartingale; and X is a martingale if and only if it is both a super- and a sub-martingale.
(2) Functions of martingales: If X is a supermartingale, then f(X) is a supermartingale (submartingale) for any concave increasing (convex decreasing) function f. Likewise, if X is a submartingale, then f(X) is a submartingale (supermartingale) for any convex increasing (concave decreasing) function f. For example, if X is a supermartingale, then X⁻ (the negative part of X) is a submartingale, because f(x) = x⁻ = −(x ∧ 0) is a convex decreasing function.
(3) Transforms of martingales: Let X = {X_n : n ≥ 1} be a discrete-time process and {V_n} a predictable sequence (i.e., V_n ∈ F_{n−1}). Write G_n = ∑_{k=1}^n V_k(X_k − X_{k−1}) (where X_0 = 0), called the transform of X by V. Suppose that G is integrable and V ≥ 0. If X is a martingale (supermartingale), then {G_n} is also a martingale (supermartingale). In particular, for a stopping time T, the sequence V_n = I_{[0,T]}(n) is predictable since {V_n = 1} = {T ≥ n} ∈ F_{n−1}. Hence G_n = X_{T∧n} is the process stopped at T. It is then clear that if X is a supermartingale, so is the stopped process {X_{T∧n}}.
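A quick numerical check of item (3) (an illustration under assumed dynamics, not from the text): for a symmetric random walk, which is a martingale, and the predictable sequence V_n = I_{[0,T]}(n) with T the first hitting time of a level (capped so that it is bounded), the transform G_n = X_{T∧n} keeps the martingale mean, so E[X_{T∧n}] stays at E[X_0] = 0.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

n_paths, n_steps, level = 100_000, 200, 5
steps = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# T = first time the walk reaches +level (capped at n_steps so T is bounded);
# V_n = 1{n <= T} is predictable, and G_n = sum_k V_k (X_k - X_{k-1}) = X_{T ∧ n}.
hit = X >= level
T = np.where(hit.any(axis=1), hit.argmax(axis=1), n_steps)

for n in [10, 50, 200]:
    stopped = X[np.arange(n_paths), np.minimum(T, n)]
    print(n, stopped.mean())   # ≈ 0 = E[X_0]: the stopped martingale keeps its mean
```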

5.4.2 Doob’s Stopping Theorem

In this subsection, we temporarily reactivate the notation of the time set T and suppose that it is right closed, with its right endpoint denoted by ∞ (which may or may not be the real infinity). Doob's stopping theorem is then stated as follows.

Theorem 5.9 (Doob's optional sampling theorem, or optional stopping theorem). Let X = {X_t, t ∈ T} be a right-continuous supermartingale and suppose that S ≤ T are two stopping times. Then E[X_T|F_S] ≤ X_S a.s. Consequently, if X is a martingale, then E[X_T|F_S] = X_S a.s.

Proof. First consider the case of discrete T. Let S ≤ T be two stopping times. Then |X_{S∧n}| ≤ ∑_{i=1}^n |X_i|, and hence X_{S∧n} is integrable. Furthermore, both {X_{S∧n}} and {X_{T∧n}} are (super-)martingales if X is, because

E[X_{S∧(n+1)} − X_{S∧n} | F_n] = I_{(S≥n+1)} E[X_{n+1} − X_n | F_n].

Therefore, for any event A ∈ F_S, it is easy to see that

E[(X_{T∧n} − X_{S∧n}) I_A] = ∑_{j=1}^n E[(X_{T∧n} − X_j) I_{A∩{S=j}}] = ∑_{j=1}^n E[(E[X_{T∧n}|F_j] − X_j) I_{A∩{S=j}}],

which equals 0 if X is a martingale and is ≤ 0 if X is a supermartingale. (5.17)

Second, for the martingale case, let T = n be a deterministic stopping time. Then (5.17) indicates that for bounded stopping times S ≤ n,

X_S = E[X_n|F_S] = E[E[X_∞|F_n] | F_S] = E[X_∞|F_S]. (5.18)

Let H be the collection of all bounded stopping times. Then, by Corollary 5.2, the family G = {X_S : S ∈ H} of random variables is uniformly integrable. Let G_1 = G ∪ {X_∞ I_{S>k} : S ∈ H, k ≥ 1}. Then its closed convex hull G_1^{cch} is also uniformly integrable by Corollary 5.1. Note the almost sure convergence X_S = lim_{m→∞} [X_{S∧m} I_{S≤m} + X_∞ I_{S>m}]


and the fact that ½[X_{S∧m} I_{S≤m} + X_∞ I_{S>m}] ∈ G_1^{cch}. Theorem 5.5 states that X_S is also the L¹ limit of X_{S∧m} I_{S≤m} + X_∞ I_{S>m}, and thus belongs to G_1^{cch} as well. As a result, for any stopping time S and A ∈ F_S, the integrability of X_S leads to

E[X_S I_A] = E[lim_{m→∞} (X_S I_{S≤m} + X_∞ I_{S=∞}) I_A]
= lim_{m→∞} E[X_S I_{A∩{S≤m}}] + E[X_∞ I_{S=∞} I_A].

Replacing S by S∧m in (5.18) gives X_{S∧m} = E[X_∞|F_{S∧m}]. This together with the fact that A ∩ {S ≤ m} ∈ F_{S∧m} yields

E[X_S I_A] = lim_{m→∞} E[X_{S∧m} I_{A∩{S≤m}}] + E[X_∞ I_{S=∞} I_A]
= lim_{m→∞} E[E(X_∞|F_{S∧m}) I_{A∩{S≤m}}] + E[X_∞ I_{S=∞} I_A]
= lim_{m→∞} E[X_∞ I_{A∩{S≤m}}] + E[X_∞ I_{S=∞} I_A]
= E[X_∞ I_A] for all A ∈ F_S.

It follows that

X_S = E[X_∞|F_S] = E[E[X_∞|F_T] | F_S] = E[X_T|F_S]. (5.19)

Third, we proceed to deal with the supermartingale case. Note that by taking S = 0, (5.17) indicates E[X_0] ≥ E[X_{T∧m}] for any stopping time T. Note also that X is bounded below in the sense that X_n ≥ E[X_∞|F_n]. We may assume that X ≥ 0 with X_∞ = 0, because otherwise we can work with X_n − E[X_∞|F_n] instead. Then for any stopping time S, because X_S = lim_{m→∞} X_{S∧m} + X_∞ I_{(S=∞)}, it follows from Fatou's lemma that

E[X_S] ≤ liminf_{m→∞} E[X_{S∧m}] ≤ E[X_0].

Hence X_S is integrable. For any two stopping times S ≤ T, by (5.17) again,

E[X_{T∧n} I_{A∩(T≤n)}] = E[E(X_{T∧n}|F_{S∧n}) I_{A∩(T≤n)}] ≤ E[X_{S∧n} I_{A∩(T≤n)}] ≤ E[X_{S∧n} I_{A∩(S≤n)}]

for all A ∈ F_S, due to A ∩ {T ≤ n} ∈ F_{S∧n}.

Thanks again to the uniform integrability, we can let n → ∞ and thus obtain E[X_T I_{A∩(T<∞)}] ≤ E[X_S I_{A∩(S<∞)}]. Recalling the convention X_∞ = 0, this inequality indicates E[X_T I_A] ≤ E[X_S I_A] for all A ∈ F_S, i.e., E[X_T|F_S] ≤ X_S a.s.

Finally, we prove the theorem for the continuous-time setting. This proof needs the assistance of the convergence theorem below; see assertion (3) of Theorem 5.12. Define D_n = {2^{-n}k : k = 0, 1, ···}, so that D_1 ⊂ D_2 ⊂ ···. For two stopping times S ≤ T, let

T_n = ∑_{k=1}^∞ 2^{-n}k I_{{2^{-n}(k−1)≤T<2^{-n}k}} + ∞·I_{{T=∞}}

2−nkI2−n(k−1)≤T<2−nk+∞IT=∞

Page 218: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.4 Martingales 209

and

Sn =∞

∑k=1

2−nkI2−n(k−1)≤S<2−nk+∞IS=∞.

Then Sn ≤ Tn and both Sn and Tn decreasingly converge respectively to S and T asn → ∞. Thus by the reversed supermartingale convergence theorem, E [XT ]≤ E [XS].This leads to the final result.

Remark 5.2. This theorem does not generally hold if T is not right closed. In such a case, two methods can be applied. One is to extend T to its right closure T̄ = T ∪ {∞} and introduce a new random variable X_∞ such that {X_t, t ∈ T̄} is still a (super-)martingale. The other is to consider only bounded stopping times T. For example, if T ≤ n, we in fact work with the supermartingale {X_t, t ∈ T ∩ [0, n]}, for which the condition of this theorem is satisfied.

5.4.3 Upcrossings

Let x = xn : n ≥ 0 be a sequence of real numbers and write xn = x1,x2, . . . ,xn.Fix a and b in R with a < b. Define a series of times Sk and Tk as follows:

S1 = minn : xn ≤ a, Tk = minn > Sk : xn ≥ b, Sk = minn > Tk−1 : xn ≤ a.

Here we have taken the usual convention min∅ = +∞. Replacing x by a randomsequence X = Xn : n ≥ 0, then (S1,T1,S2,T2, . . .) is an increasing sequence ofstopping times if X is adapted. See the following figure for an illustration (Fig. 5.1).

Xn

S1

b

a

T1 S2n

T2

Fig. 5.1 Upcrossing times of (a,b)

Page 219: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

210 5 Optimal Stopping Problems

The variable Uxn(a,b) (abbreviated as Un(a,b)) = maxk : Tk ≤ n is the numberof upcrossings of (a,b) by x in [0,n]. Similarly we can define

U(a,b) = limn→∞

Un(a,b) = maxk : Tk < n

to indicate the total number of upcrossings of x.The significance of the upcrossings of a process to convergence results is due to

the following easy algebraic criterion.

Lemma 5.2. A sequence xn : n ≥ 0 converges to a limit in R if and only if thenumber of upcrossings U(a,b) of the sequence is finite for all a < b.

In the stochastic circumstance, U(a,b) is an extended random variable. In orderto examine the almost sure convergence of a sequence, a somewhat luxury conditionis E[U(a,b)]< ∞, which, in turn, by the Monotone Convergence Theorem, amountsto checking the condition lim

n→∞E[Un(a,b)]< ∞. The following proposition states the

upper bounds of E[Un(a,b)] for submartingales.

Theorem 5.10 (Doob’s upcrossing inequality). Let X be a (discrete time) sub-martingale with respect to a filtration Fn,n ≥ 0. Then

E[Un(a,b)]≤1

b− aE[(Xn − a)+− (X0 − a)+]. (5.20)

Likewise, for a supermartingale X, we have

E[Un(a,b)]≤1

b− aE[(Xn − a)−]. (5.21)

Proof. Note that Un(a,b) for Xn agrees with Un(0,b−a) for (Xn−a)+. We thuslet Yn = (Xn − a)+ ≥ 0, which is also a supermartingale. Furthermore, we have

(b− a)Un(a,b)≤Un(a,b)

∑k=1

YTk +Yn ≤n

∑m=1

∑k=1

I(Sk,Tk ](m)(Ym −Ym−1),

I(Sk,Tk](m) ∈Fm−1, ∑∞k=1 I(Sk,Tk ](m)≤ 1 and E[Ym|Fm−1]−Ym−1 ≥ 0 due to the sub-

martingale property of Y . It follows that

(b− a)E[Un(a,b)]≤n

∑m=1

E

[∞

∑k=1

I(Sk,Tk ](m)(Ym −Ym−1)

]

=n

∑m=1

E

[∞

∑k=1

I(Sk,Tk ](m)(E[Ym|Fm−1]−Ym−1)

]

≤n

∑m=1

E[(E[Ym|Fm−1]−Ym−1)]

= E[Yn −Y0] = E[(Xn − a)+− (X0 − a)+],

which completes the proof of (5.20).

Page 220: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.4 Martingales 211

For (5.21), define bounded stopping times Sk = Sk∧(n+1) and Tk = Tk ∧(n+1).Note that since Sk ≤ n ∈ FSk , it follows that E

[I(Sk≤n)

(XTk −XSk

)]≤ 0 and

E[I(Tk≤n)

(XTk

−XSk

)]= E

[I(Sk≤n)

(XTk

−XSk

)]−E

[I(Sk≤n,Tk>n)

(XTk

−XSk

)]

≤ E[I(Sk≤n,Tk>n)

(XSk

−XTk

)],

where the inequality follows from Doob’s stopping theorem for bounded stoppingtimes. Therefore,

(b− a)E[Un(a,b)] = (b− a)E

[∞

∑k=1

I(Tk≤n)

]≤

∑k=1

E[I(Tk≤n)

(XTk

−XSk

)]

=∞

∑k=1

E[I(Sk≤n,Tk>n)

(XSk

−XTk

)]≤

∑k=1

E[I(Sk≤n,Tk>n) (a−Xn)

]

≤∞

∑k=1

E[I(Sk≤n,Tk>n) (Xn − a)−

]≤ E

[(Xn − a)−

],

which implies (5.21).

When dealing with continuous time stochastic processes, we need the followingextension. Let f be a mapping of R+ into R. Further let S be any subset of R+

and suppose u = t1, t2, . . . , tn (the elements are increasingly ordered) be any finiteset of S. Note that Uu(a,b) = U f (t1), f (t2),..., f (tn)(a,b). The number of upcrossing[a,b] by f is define by US(a,b)= supfinite u⊂SUu(a,b). The importance of upcrossingnumbers arises from the following intuitive lemma (see, e.g., Theorem IV.22 ofDellacherie and Meyer for a proof).

Lemma 5.3. Let f be a function on R+ with values in R. Then f has left and rightlimits at every t ∈R+ if and only if the crossing number UI(a,b)< ∞ for every pairof rationales a < b and every finite subintervals I of R+.

5.4.4 Maxima Inequalities

The following theorem provides some results on the maxima or minima of sub- orsuper-martingales.

Theorem 5.11. Let X be a discrete time process and b > 0 is a positive number.Write An = maxk≤n Xk ≥ b. Then

bPr(An)≤

E[XnIAn ]≤ E[X+n ] if X is a submartingale,

E[XnIAn ]+E[X0−Xn] = E[X0]−E[XnIAcn ] if X is a supermartingale.

Page 221: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

212 5 Optimal Stopping Problems

Proof. Define T = infn : Xn ≥ b. Then An = T ≤ n and XT ≥ b. Hence,

bPr(An) = bE[I(T≤n)]≤ E[XT I(T≤n)] = E[XnI(T≤n)]+E[XT∧n −Xn]

E[XnI(T≤n)] if X is a submartingale,E[XnI(T≤n)]+E[X0−Xn] = E[X0]−E[XnI(T>n)] if X is a supermartingale.

This proves the theorem.

According to Theorem 5.11, if X is a supermartingale (hence −X is a submartin-gale), then the following inequalities hold:

bPr(maxk≤n

|Xk|≥ b) = bPr(maxk≤n

Xk ≥ b)+ bPr(maxk≤n

(−Xk)≥ b)

≤ E[X0]+E[X−n ]+E[(−Xn)I(maxk≤n Xk<b)]≤ 3max

k≤nE[|Xk|]. (5.22)

Letting n → ∞ leads to

bPr(supk|Xk|≥ b)≤ 3sup

kE[|Xk|]. (5.23)

The proof also indicates that if T is a stopping time bounded by n and X is a super-martingale, then

E[XT ]≤ 3maxk≤n

E[|Xk|]. (5.24)

The following is an extension of this inequality to continuous time martingales.

Proposition 5.8. If X is a supermartingale with almost surely right (or left)-continuous paths, then

bPr(supt|Xt |≥ b)≤ 3sup

tE[|Xt |]. (5.25)

If X is further assumed to be nonnegative, then the right hand side of (5.25) canagain be replaced by E[X0].

Proof. We examine the assertion for right-continuous supermartingales. Let D be adense subset of R+ (e.g., the set of all nonnegative rationales). Then

supt|Xt |= sup

t∈D|Xt |= esssup

finite U⊂Dsupt∈U

|Xt |

is a random variable, according to Theorem 5.6. Moreover, we can select a sequenceof increasing Um ⊂ D,m = 1,2, . . . , such that supt∈Un

|Xt | increasingly converges tosupt |Xt |. As a result,

bPr(

supt|Xt |≥ b

)= b lim

m→∞Pr(

supt∈Um

|Xt |≥ b)≤ 3max

t∈UmE[|Xt |]≤ 3sup

tE[|Xt |].

This completes the proof.

Page 222: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.4 Martingales 213

5.4.5 Martingale Convergence Theorems

This subsection presents some convergence results on (super-) martingales. We tem-porarily allow the time horizon to be Z = · · · ,−2,−1,0,1,2, · · ·. The first resultis the following theorem.

Theorem 5.12. Let X = Xn : −∞ < n < ∞ be a supermartingale with respect to afiltration Fn : −∞ < n < ∞.(1) If supn≥1 E[X−

n ] < ∞, then the sequence Xn converges almost surely to anintegrable random variable X∞ as n → ∞.

(2) There exists an integrable random variable Y such that Xn ≥ E[Y |Fn] for alln ≥ 0 (referred to as “Y closing X on the right”) if and only if X−

n ;n ≥ 0 areuniformly integrable.

(3) Xn converges almost surely to an integrable random variable X−∞ as n →−∞.Moreover, if lim

n→−∞E[Xn]<∞, then the convergence is also valid in L1 and X−∞ ≥

E[Xn|F−∞], where F−∞ =⋂∞

n=−∞ Fn.

Proof.(1) The condition implies that

supn≥1

E|Xn|≤ supn

E[Xn + 2X−n ]≤ 2sup

nE[Xn]

−+E[X0]< ∞. (5.26)

Note that (X −a)− ≤ |X |+ |a|. Hence for any real number a< b, the upcrossinginequality for supermartingale in Theorem 5.10 states that

E[U(a,b)]≤ 1b− a

supn≥1

E[(Xn − a)−]≤1

b− a

[supn≥1

E[|Xn|]+ |a|]< ∞.

Therefore, the convergence is verified by Lemma 5.2. Moreover, Fatou’s lemmaleads to the integrability of X∞:

E[|X∞|]≤ liminfn→∞

E[|Xn|]≤ supn

E[|Xn|]< ∞.

(2) For the “if” part, suppose that X−n ;n ≥ 0 are uniformly integrable. Then

supn≥1 E[X−n ] < ∞. By assertion (1), there exists an integrable X∞ such that

limn→∞

Xn = X∞, so that limn→∞

X−n = X−

∞ . By Theorem 5.5, the uniform integrability

of X−n implies lim

n→∞E|X−

n −X−∞ |= 0. Let A ∈ Fn. Then

E [XnIA]≥ E [Xn+mIA] = E[X+

n+mIA]−E

[X−

n+mIA]

for any m ≥ 0. Letting m → ∞ leads to

E [XnIA]≥ liminfm→∞

E[X+

n+mIA]− liminf

m→∞E[X−

n+mIA]

≥ E[

limm→∞

X+n+mIA

]−E

[lim

m→∞X−

n+mIA

]= E [X∞IA] . (5.27)

It follows that Xn ≥ E[X∞|Fn], and hence it suffices to take any integrableY ≤ X∞.

Page 223: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

214 5 Optimal Stopping Problems

For the “only if” part, if Y closes X on the right, then Xn ≥ E[Y |Fn] ≥−E[Y−|Fn], indicating X−

n ≤ E[Y−|Fn]. By Corollary 5.2, the collectionE[Y−|Fn] : n ≥ 1 is uniformly integrable, so is the sequence X−

n : n ≥ 1.(3) Let Un(a,b) denote the upcrossing of Xk : −n ≤ k ≤ 0. Then Doob’s upcross-

ing inequality (5.21) can be rewritten as

E[Un(a,b)]≤1

b− aE [(X0 − a)−] . (5.28)

Thus, the convergence of this supermartingale as n →−∞ is obvious.To examine the L1 convergence, it suffices to check the uniform integrability ofXn : n ≤ 0. Since X−

n : n ≤ 0 is a submartingale (see (3) of Remark 5.1),

E|Xn|= E[Xn]+ 2E[Xn]− ≤ lim

n→−∞E[Xn]+ 2E[X−

0 ]< ∞.

For any ε > 0, thanks to the condition limn→−∞

E[Xn] < ∞ and the fact that X is a

supermartingale, we can fix an integer K < 0 such that 0 ≤ E[Xn]−E[XK] < εfor all n < K. For any positive constant c, we have the decomposition:

E[|Xn|I|Xn|>c

]= E[Xn]IXn>c−E[Xn]IXn<−c

= E[Xn]−E[Xn]IXn≤c−E[Xn]IXn<−c.

Fix d > 0 such that E[|XK |I|XK |>d]< ε . Then for any c > (d/ε)supE|Xn|, thesupermartingale inequality states that for any n ≤ K,

E[|Xn|I|Xn|>c

]≤ E[XK ]+ ε −E

[E [XK |Fn] IXn≤c

]−E

[E [XK |Fn] IXn<−c

]

= E[XK ]+ ε −E[XKIXn≤c

]−E

[XKIXn<−c

]

= E[|XK |I|Xn|>c

]+ ε

= E[|XK |I|Xn|>c∩|XK |>d

]+E

[|XK |I|Xn|>c∩|XK |≤d

]+ ε

≤ E[|XK |I|XK |>d

]+

dc

E [|Xn|]+ ε < 3ε.

This proves the uniform integrability of Xn : n ≤ 0.Moreover, the inequality X−∞ ≥ E[Xn|F−∞] follows from E[X−∞IA] ≥ E[XnIA]for all A ∈ F−∞, as what we have done in (5.27).

5.4.6 Regularity of Paths

For the following theorems, let D be a countable dense subset of R+ and write

Ω ∗D =

w : lim

r∈D,r↓tXr(ω) ∈ R for all t ≥ 0 and lim

r∈D,r↑tXr(ω) ∈ R for all t > 0

,

where limr∈D,r↓t

Xr(ω) ∈ R indicates that limr∈D,r↓t

Xr(ω) exist and are finite. The next

theorem gives the probability of event Ω ∗D, which can be found in, for example,

Dellacherie and Meyer (1982, Theorem 4 in Chap. VI).

Page 224: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.4 Martingales 215

Proposition 5.9. Let X be an F -submartingale on R+. Then Pr(Ω ∗D) = 1.

Proof. Fix s in D. Let a and b be rational numbers with a < b. Write B = D∩ [0,s].Then s ∈ B and Xr : r ∈ B is a submartingale with respect to Fr : r ∈ B. Byapplying Theorem 5.11 to the submartingale X on B, we obtain

cPr(

maxr∈B

Xr ≥ c)≤ E|Xs|. (5.29)

Next, let UB(a,b) be the number of upcrossings over interval (a,b) by the processXr : r ∈ B. Then by Theorem 5.10,

(b− a)E[UB(a,b)]≤ E[(Xs − a)+]< ∞. (5.30)

Note that the right sides of (5.29) and (5.30) are free of B. Thus, by taking supremumover all finite subsets B of D∩ [0,s] containing s, we see that the same inequalitieshold for Ms = supr∈D∩[0,s] |Xr| and Us(a,b) = supB UB(a,b) respectively. It followsthat Ms < ∞ and Us(a,b)< ∞ almost surely. Let

Ωs =

ω : limr∈D,r↓t

Xr(ω) ∈ R and limr∈D,r↑t

Xr(ω) ∈ R for all t ∈ [0,s).

Observe that Ωs ⊃⋂

a,bMs < ∞,Us(a,b) < ∞, where the intersection is over allpairs (a,b) of rationales with a < b. Thus Ωs contains an almost sure event, and soΩ ∗

D =⋂

s∈D Ωs is an almost sure event.

Write

Ω ∗ =

ω : lims↓t

Xs(ω) ∈R and lims↑t

Xs(ω) ∈ R for all t > 0.

Then Proposition 5.9 simply states that Pr(Ω ∗) = 1, and hence we can denoteXt+(ω) = lims↓t Xs(ω) and Xt−(ω) = lims↑t Xs(ω), which are random variablesprecisely defined on Ω ∗ and arbitrarily elsewhere.

Proposition 5.10. Suppose that F satisfies the usual condition and let X be anF -submartingale.(a) For each t ∈ R+, the random variable Xt+ is integrable and Xt ≤ Xt+

almost surely, and the equality holds almost surely if and only if E[Xs] is right-continuous in s at t (in particular, if X is a martingale).

(b) The process Xt+, t ∈ R+ is cadlag and also an F -submartingale.

Proof. Fix t in R+. Let rn be a sequence in D decreasing strictly to t. Then, Xrnis a reversed time submartingale, and E[Xt ]≤ E[Xrn ] for every n. By the convergencetheorem (Theorem 5.12), the sequence Xrn is uniformly integrable and convergesto Xt+ almost surely and in L1. It follows that Xt+ is integrable and, for every eventH ∈ Ft ,

E[Xt+1H ] = limn→∞

E[Xrn1H ]≥ E[Xt1H ], (5.31)

where the inequality is due to the submartingale inequality for t < rn. Thus

Et [Xt+−Xt ]≥ 0. (5.32)

Page 225: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

216 5 Optimal Stopping Problems

Since Xrn ∈Ft+ε for every ε > 0 and all n large enough, the limit Xt+ is in Ft+, andFt+ =Ft by the assumed right-continuity for F . Thus inequality (5.32) amounts toEt [Xt+−Xt ] = Xt+−Xt ≥ 0, which proves that Xt+ ≥ Xt almost surely. The equalitywould hold almost surely if and only if E[Xt+] = E[Xt ], which is equivalent to theright-continuity of E[Xs] in s at t (in that case E[Xt ] = lim

n→∞E[Xrn ] = E[Xt+]). This

proves part (a).For part (b), the paths t → Xt+(ω) are right-continuous and have left-limits for

ω ∈ Ω ∗ by their definitions. To see that Xt+ is an F -submartingale, take s < t,choose rn⊂ D and qn⊂ D strictly decreasing to t and s, respectively, ensurings < qn < t < rn for every n. Then, for H ∈ Fs, using (5.31) twice, we get

E[Xs+1H ] = limn→∞

E[Xqn1H ]≤ limn→∞

E[Xrn1H ] = E[Xt+1H ],

where the inequality follows from the submartingale property of X . This completesthe proof.

Theorem 5.13. Let X = Xt ,0 ≤ t ≤ ∞ be an F -supermartingale. Then X hasa right-continuous modification, which is a right-continuous process X0 such thatPr(Xt = X0

t ) = 1 for every t ∈ [0,∞], if and only if E[Xt ] is right-continuous in t.Moreover, if this right-continuous modification exists, it can be chosen to be a cadlagand adapted to Ft , hence a supermartingale with respect to Ft .

Proof. This is an immediate result of the previous proposition.

For the next proposition, we say that a stopping time is discrete if it takes at mostcountably many values.

Proposition 5.11. For every stopping time v, if the supermartingale X satisfieslimn→∞

E[IAXvn ] = E[IAXv] for all sequences of discrete stopping times vn ↓ v and events

A ∈ Fv, then there is a cadlag version X0 of X such that X0v = Xv a.s.

Proof. Take A = Ω . If v and vn are deterministic numbers, then E[Xt ] is right-continuous in t under the condition lim

n→∞E[IAXvn ] =E[IAXv], and thus there is a cadlag

version X0 of X such that Pr(Xt = X0t ) = 1 for all t. For any stopping time v, define

vn =∞

∑k=1

2−nkI[2−n(k−1),2−nk)(v)+∞Iv=∞.

Then vn take values ∞ or in the set of dyadic rationales, hence

Pr(X0vn= Xvn) = 1 (5.33)

for all n. Moreover, vn are stopping times and v1 ≥ v2 ≥ · · · ≥ vn → v as n → ∞.Since X0

t is a supermartingale, the optional sampling theorem (Theorem 5.9) impliesE[X0

vn|Fv]≤ E[X0

vn+k|Fv]≤ X0

v a.s. for any integers n and k. Thus for any set A∈Fv,

Page 226: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.4 Martingales 217

the sequence E(IAX0vn)∞

n=1 is nondecreasing in n and bounded above by E[IAX0v ].

Hence the right-continuity of X0 and Fatou’s lemma imply limn→∞

E[IAX0vn] = E[IAX0

v ]

because

limn→∞

E[IAX0

vn

]≤ E

[IAX0

v]= E

[IA lim

n→∞X0

vn

]≤ lim

n→∞E[IAX0

vn

].

Consequently, by (5.33),

E[IAXv] = limn→∞

E[IAXvn ] = limn→∞

E[IAX0vn] = E[IAX0

v ]

hold for all Fv-measurable set A. Thus E[IA(Xv −X0v )] = 0 or E[Xv −X0

v |Fv] = 0.This implies the assertion because Xv −X0

v ∈ Fv.

Theorem 5.14. A right-continuous and integrable process X = Xt ,0 ≤ t < ∞ isa supermartingale if and only if E[XT ] ≤ E[XS] for every pair S ≤ T of boundedstopping times.

Proof. The “only if” part follows simply from Doob’s optional sampling theorem.For the “if” part, given any deterministic times s < t and event A ∈ Fs, define astopping time S = sIA + tIAc Then S ≤ t a.s. and E[XS] = E[XsIA]+E[XtIAc]≥ E[Xt ],which implies E[XsIA]≥ E[XtIA]. It follows that Xs ≥ E[Xt |Fs] and therefore Xt is asupermartingale.

The next result is Theorem 6 in Chap. VI of Dellacherie and Meyer (1982).

Theorem 5.15. Let X be a right continuous supermartingale with supt E|Xt | < ∞.Then lim

t→∞Xt exists a.s.

We here introduce the concept of dominance for stochastic processes. A processX1 dominates another process X2 if and only if

Pr(X1t ≥ X2

t for all t ∈ [0,∞]) = 1.

In the case that X1 and X2 are right-continuous, we readily see that

Pr(X1t ≥ X2

t for all t ∈ [0,∞]) = 1 ⇐⇒ Pr(X1t ≥ X2

t ) = 1 for all t ∈ [0,∞]

⇐⇒ Pr(X1t ≥ X2

t ) = 1 for all t in a countable dense subset of [0,∞].

Moreover, if X1 is a right-continuous supermartingale and X2∞ ≤ limsupt→∞ X2

t ,then the dominance of (X1

t ,0≤ t <∞) over (X2t ,0≤ t <∞) implies that (X1

t ,0≤ t ≤∞) dominates (X2

t ,0 ≤ t ≤ ∞). This can be established by the convergence theoremof supermartingale (see Theorem 5.15).

Proposition 5.12. Let (Xt ,0 ≤ t ≤ ∞) be a supermartingale and Dt ≥ t a series ofstopping times indexed by time t, which is nondecreasing and right-continuous in t.Then Jt = E[XDt |Ft ] is also a supermartingale. Moreover, if X is cadlag, so is J.

Page 227: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

218 5 Optimal Stopping Problems

Proof. The conclusion that Jt = E[XDt |Ft ] is a supermartingale can be verified bythe supermartingale property of X as follows. For any stopping times σ ≥ v,

E[Jσ |Fv] = E[E[XDσ |Fσ ]|Fv] = E[XDσ |Fv] = E[E[XDσ |FDv ]|Fv]

≤ E[XDv |Fv] = Jv.

For the second part, it suffices to check the continuity of E[Jt ] in t, which followsfrom lims↓t E[Js] = lims↓t E[XDs ]≥ E[lims↓t XDs ] = E[XDt ] = E[Jt ].

5.5 Optimal Stopping Problems

We here present some basic exposition of optimal stopping problems that will behelpful in proving the optimality of Gittins index policies. Similar treatment can befound in, for example, Chap. 1 of Peskir and Shiryaev (2006) and Appendix D ofKaratzas and Shreve (1998).

Let X = Xt , t ≥ 0 be a real-valued stochastic process defined on a filtered prob-ability space (Ω ,F ,Pr), where F = Ft , t ≥ 0 is the corresponding filtration thatsatisfies the usual conditions (i.e., Ft is increasing and right-continuous in t, andcontains all Pr-null sets. The time horizon is assumed without loss of generality tobe R+ = [0,∞]. Further suppose that X is right-continuous in the sense Xt is a right-continuous function of t with probability 1 and left-continuous over stopping timesin the sense that τn ↑ τ implies Xτn → Xτ a.s. as n → ∞ for stopping times τn and τ .We will also assume

E[

esssupt≥0

|Xt |]< ∞. (5.34)

For any stopping time v, denote by Mv (Mv+) the class of stopping times τ withτ ≥ v (τ > v on v < ∞ and τ = ∞ on v = ∞). Thus M0 is the collection of allstopping times. The optimal stopping problem seeks to solve

Sv = supτ∈Mv

E[Xτ |Fv]. (5.35)

Remark 5.3.1. If the time horizon is not right closed, there may be no optimal solution for this

problem. As an obvious example, let Xt be a deterministic increasing functionof t. In such a case, if the time horizon is [0,+∞), then no stopping time isoptimal. In many situations, Xt is the cumulated gain over the time period [0, t]and thus to speak of X∞ is natural.

2. Henceforth, without any loss of generality we assume that Xt ≥ 0 for all t. If Xtakes on nonnegative values, we can set H = essinft Xt , which is an integrablerandom variable due to assumption (5.34), and introduce the cadlag version ofthe martingale Mt = E[H|Ft ], t ≥ 0 (clearly Mt ≤ Xt a.s. for all t). Then wecan replace the initial gain process Xt with the adapted right-continuous processXt = Xt −Mt such that

Page 228: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.5 Optimal Stopping Problems 219

supτ∈Mv

E[Xτ |Fv] = supτ∈Mv

E[Xτ +Mτ |Fv] = supτ∈Mv

E[Xτ |Fv]+Mv.

Define the stopping time

τ = infs ≥ 0 : Xs = Ss, (5.36)

which is the time point at which the current gain (if one stops) agrees with theoptimal expected future gain. Intuitively, at such a stopping time τ , it is currentlyoptimal to stop immediately. We will show that it is also globally optimal to stop atsuch a time instance.

By Lemma 5.1, the following proposition is straightforward.

Proposition 5.13. For any stopping time τ , we can find a sequence of stopping timesσk ∈Mτ such that E[Xσk |Fτ ] ↑ Sτ .

By replacing the stopping times v with deterministic time t, the following twopropositions essentially state that the equality in (5.35) indeed defines a stochasticprocess that is a supermartingale. This stochastic process is commonly known as theSnell’s envelope of X .

Proposition 5.14.(1) SvIσ=v = Sσ Iσ=v a.s. for any stopping times v and σ .(2) For τ ∈Mv,

E[Sτ |Fv] = esssupρ∈Mτ

E[Xρ |Fv]≤ Sv a.s. (5.37)

Proof.(1) Note that σ = v∈Fv∩Fσ =Fv∧σ (Proposition 5.7). For any τ ∈Mv, define

τ = τIσ=v+∞Iσ =v which is also a stopping time in Mσ by Proposition 5.6.Using the second part of Proposition 5.7, we get

Iσ=vE[Xτ |Fv] = Iσ=vE[Xτ |Fv) = Iσ=vE[Xτ |Fv∧σ ]

= Iσ=vE[Xτ |Fσ ]≤ Iσ=vSσ .

Taking esssupτ∈Mvon both sides gives SvIσ=v≤ Sσ Iσ=v a.s. Exchanging the

roles of σ and v yields the opposite inequality SvIσ=v ≥ Sσ Iσ=v a.s. ThusSvIσ=v = Sσ Iσ=v a.s.

(2) Use Proposition 5.13 to choose a sequence of stopping times ρn,n ≥ 1⊂Mτsuch that E[Xρn |Fτ ] ↑ Sτ . Thus the equality in (5.37) follows because, by themonotone convergence theorem,

E[Sτ |Fv] = limn→∞

E[E[Xρn |Fτ )|Fv] = limn→∞

E[Xρn |Fv]≤ esssupρ∈Mτ

E(Xρ |Fv]

= esssupρ∈Mτ

E[E[Xρ |Fτ ]|Fv]≤ E[Sτ |Fv].

Finally, (5.37) follows from esssupρ∈Mτ

E[Xρ |Fv]≤ esssupρ∈Mv

E[Xρ |Fv] = Sv.

Page 229: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

220 5 Optimal Stopping Problems

By replacing the stopping times v with a deterministic (stopping) time t, part(1) of Proposition 5.14 essentially states that the equality in (5.35) indeed defines astochastic process and the relation (5.37) states that it is a supermartingale. Thisproperty of St is intuitive because the later the process X starts to play, the lessopportunity presented to the player and thus the less maximum expected gain.

Remark 5.4. Obviously, this proposition holds for discrete time stochastic processestoo. Consequently, Snell’e envelope can be simply defined at each deterministic timepoint and claimed to hold also for stopping times. We sometimes call this procedurerandomization, which is particularly useful in some envelope-related proofs.

Proposition 5.15. For any decreasing sequence of stopping times vn,n ≥ 1⊂Mvsuch that vn ↓ v a.s., we have

limn→∞

E[IASvn ] = E[IASv] for all A ∈ Fv. (5.38)

Proof. For any vn ↓ v and A ∈ Fv, by (5.37), E[IASvn ] = E[IAE[Svn |Fv]] ≤ E[IASv].For the opposite inequality, choose a sequence of stopping times ρk ∈Mv such thatE[Xρk |Fv] ↑ Sv. Then the monotone convergence theorem yields

E[IASv] = E[IA limk→∞

E[Xρk |Fv]] = limk→∞

E[IAXρk ].

For each ρk ∈Mv, define ρkn = ρk ∨ vn ∈Mvn . Then for A ∈ Fv, we have

E[IASvn ]≥ E[IAE[Xρkn |Fvn ]] = E[IAXρkn ].

Because ρkn ↓ ρk ∨ v = ρk as n → ∞, the right continuity of X and the assump-tion (5.34) state that lim

n→∞E[IASvn ]≥ limn→∞ E[IAXρkn ] = E[IAXρk ] for all k, and hence

limn→∞

E[IASvn ]≥ limk→∞

E[IAXρk ] = limk→∞

E[IAE[Xρk |Fv]] = E[IASv].

This completes the proof.

Taking A = Ω , the equality in (5.38) says that E[St ] is right-continuous in t.Therefore, by Theorem 5.13, there exists a cadlag modification S0

t of St . With thisproposition in hand, we can prove the following consequence with a stronger resultthan the modification.

Proposition 5.16. S0v = Sv a.s. for any stopping time v. Moreover, S0 dominates X

and, if S is another cadlag supermartingale dominating X, then S dominates S0 too.

Proof. The first part is a straightforward conclusion of (5.38) and Proposition 5.11.For the “moreover” part, let S be another cadlag supermartingale dominating X .For any t ∈ [0,∞], if τ ∈ Mt , then E[Xτ |Ft ] ≤ E[Sτ |Ft ] ≤ St a.s., where the lastinequality follows from the assumption that S is a supermartingale. Consequently,S0

t = esssupτ≥t E[Xτ |Ft ]≤ St .

With these propositions, we can give the following one indicating the conditionsfor a stopping time to be optimal.

Page 230: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.5 Optimal Stopping Problems 221

Proposition 5.17. A stopping time τ∗ is optimal, i.e., E[Xτ∗ ] = S00 = supρ∈mE[Xρ ],

if and only if (i) S0τ∗ = Xτ∗ a.s. and (ii) the stopped supermartingale given by

S0t∧τ∗F ,0 ≤ t ≤ ∞ is a martingale.

Proof. For the “only if” part, suppose that τ∗ is an optimal stopping time. For anystopping time σ , since τ∗ ∈Mσ∧τ∗ , we have S0

σ∧τ∗ ≥ E[Xτ∗ |Fσ∧τ∗ ] and it followsthat E[S0

σ∧τ∗ ]≥ E[Xτ∗ ] = S00 ≥ E[S0

σ∧τ∗ ], which implies E[S0σ∧τ∗ ] = E[Xτ∗ ] = S0

0. ThusS0

t∧τ∗ is a martingale (by Theorem 5.14). Taking σ = τ∗ yields E[S0τ∗ ] = E[Xτ∗ ] = S0

0.Since S0

τ∗ ≥ Xτ∗ we have S0τ∗ = Xτ∗ .

For the “if” part, it is easy to see that conditions (i) and (ii) of the propositionimply E[Xτ∗ ] = E[S0

τ∗ ] = E[S00] = S0

0 = supρ∈mE[Xρ ].

The next theorem establishes the existence of such an optimal stopping time τ∗.For any time instant t, define

Dλt = infu ≥ t : λ S0

u ≤ Xu for t ∈ [0,∞] and λ ∈ (0,1),

which is a stopping time because S0u and Xu are right-continuous and nondecreasing

respectively in t and λ . Moreover, due to the right-continuity of S0 and X , for anystopping time v,

λ S0Dλ

v≤ XDλ

v. (5.39)

We first prove a proposition.

Proposition 5.18. For any stopping time v,

S0v = E[S0

Dλv|Fv] a.s.

Proof. First note that S0v ≥ E[S0

Dλv|Fv] by the relation in (5.37). For the reversed

inequality, consider the random variables

λ S0τ +(1−λ )E

[S0

Dλτ|Fτ

],

where τ is an arbitrary stopping time. Because IDλτ =τS0

Dλτ= S0

τ IDλτ =τ,

λ S0τ +(1−λ )E

[S0

Dλτ

∣∣∣Fτ]= S0

τIDλτ =τ+

λ S0

τ +(1−λ )E[

S0Dλ

τ

∣∣∣Fτ]

IDλτ >τ

≥ S0τIDλ

τ =τ+λ S0τIDλ

τ >τ.

For λ S0τ > Xτ on event Dλ

τ > τ, we see that λ S0τ +(1−λ )E[S0

Dλτ|Fτ ] dominates

Xτ for all stopping times τ . Therefore, for the stopping time v,

S0v = esssup

τ≥vE[Xτ |Fv]≤ esssup

τ≥vE[λ S0

τ +(1−λ )E[S0Dλ

τ|Fτ ]|Fv]

= esssupτ≥v

E[λ S0τ +(1−λ )S0

Dλτ|Fv].

Page 231: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

222 5 Optimal Stopping Problems

In view of v ≤ τ and so v ≤ Dλv ≤ Dλ

τ , it follows that

S0v ≤ esssup

τ≥vE[λ S0

τ +(1−λ )E[S0Dλ

τ|FDλ

v]|Fv]

≤ esssupτ≥v

λ S0v +(1−λ )E[S0

Dλv|Fv]

= λ S0v +(1−λ )E[S0

Dλv|Fv].

Consequently, S0v ≤ E[S0

Dλv|Fv], and the desired result then follows.

For every stopping time v, define D∗v = limλ↑1 Dλ

v , which is also a stopping time.

Theorem 5.16. Suppose that Xt is right-continuous in t and left-continuous overstopping times with E[sup0≤t≤∞ Xt ]< ∞. Then for any stopping time v,

S0v = E[XD∗

v |Fv] and D∗v = inft ≥ v : S0

t = Xt. (5.40)

Proof. By Proposition 5.18 and (5.39), we see that

S0v = E[S0

Dλv|Fv]≤

1λ E[XDλ

v|Fv] a.s.

Now for all λ ∈ (0,1), XDλv≤ X = sup0≤t≤∞ Xt . Then the left continuity of X over

stopping times and the Dominated Convergence Theorem imply

S0v ≤ lim

λ↑1E[XDλ

v|Fv] = E[XD∗

v |Fv]≤ S0v , (5.41)

where the last inequality follows from the definition of Sv. Thus, the inequalitiesin (5.41) hold with equalities, and this leads to the first equality in (5.40).

In addition, because D∗v ∈ MD∗

v , we also have E[XD∗v ] = E[S0

D∗v] = E[S0

v ], whichimplies XD∗

v = S0D∗

vsince S0 dominates X . It follows that D∗

v ≥ inft > v : Xt = S0t .

The reversed inequality is obvious as Dλv is nondecreasing in λ and D∗

v = limλ↑1 Dλv .

Thus the second equality in (5.40) holds.

To summarize, for the optimal stopping problem (5.35) under assumption (5.34),we have

• The process S0t , t ∈ R+ is the smallest cadlag supermartingale that dominates

Xs,s ≥ t, known as the Snell’s envelope.• The stopping time τ in (5.36) is optimal for (5.35) and τ∗ ≥ τ a.s. for any other

optimal stopping time τ∗ for (5.35).• The stopped process Ss∧τt ,s ≥ t is a cadlag martingale.

Remark 5.5 (Initial augmentation of filtrations). Suppose that we have a σ -algebraG to indicate additional information. This introduces a new filtration G = Gt withGt = Ft ∨G , t ∈ R+, which is called the initial enlargement (or augmentation) of

Page 232: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

5.5 Optimal Stopping Problems 223

F by G . Under the augmented filtration, we have the corresponding dominatingcadlag G -supermartingale S = St , t ∈R+, which evidently satisfies

St ≥ St (5.42)

for all t. Together with the right-continuity of both super-martingales, the optimalstopping times for the problems with filtrations F and G respectively are given by

τ = infs ≥ 0 : Xs = Ss≤ τ = infs ≥ 0 : Xs = Ss. (5.43)

Moreover, because St∧τ is a cadlag G -martingale and τ is also a G -stopping time,we obtain

E[Sτ ] = E[Sτ ] = E[Xτ ]≤ E[Sτ ]≤ E[Sτ ]. (5.44)

The expressions (5.42)–(5.44) give rise to the equality Sτ = Sτ , and hence theoptimal stopping times τ for S and τ for S agree. Therefore, if G is independent ofthe filtration F , then the adaptedness of X with respect to F and the independencebetween F and G indicate that the optimization problems St = supτ≥t E[Xτ |F0] andSt = supτ≥t E[Xτ |Ft ∨G ] have the same solution. Therefore, even if the domainsof stopping times are accordingly enlarged, the optimal stopping problems arebasically remain unchanged if the additionally obtained information is independentof the original information filtrations. This is stronger than the intuition that anyadditional information does not change the nature of an optimal problem if it isindependent of the information filtration.

Page 233: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 6Multi-Armed Bandit Processes

This chapter studies the powerful tool for stochastic scheduling, usingtheoretically elegant multi-armed bandit processes to maximize expected totaldiscounted rewards. This problem can be solved by the reputable theory of Gittinsindices.

Multi-armed bandit models form a particular type of optimal resource allocation(usually working with time assignment), in which a number of machines or proces-sors are to be allocated to serve a set of competing projects (termed as arms). In thetypical framework, the system consists of a single machine and a set of stochasti-cally independent projects, which will contribute random rewards continuously orat certain discrete time points, when they are served. The objective is to maximizethe expected total discounted rewards over all dynamically revisable policies. Afterthe first version of multi-bandit problems was formulated in the area of sequentialdesigns by Robbins (1952), there had not been any essential progress in two decades,until Gittins and his collaborators made celebrated research achievements in Gittins(1979), Gittins and Jones (1974), Gittins and Glazebrook (1977), and Whittle (1980)under the Markov and semi-Markov settings. In this early model, each arm is mod-eled by a Markov or semi-Markov process in which the time points of making statetransitions are decision epochs. The machine can at each epoch pick an arm to servewith a reward represented as a function of the current state of the arm being pro-cessed, and the solution is characterized by allocation indices assigned to each statethat depends only on the states of the arms. These indices are therefore known asGittins indices and the optimal policies are usually called Gittins index policies,due to his reputable contributions. The significance of Gittins’ contribution is thedrastic reduction of dimensions: instead of resolving the optimal problems of theMarkov (or semi-Markov) decision models formed by all arms, one only needs tocompute the index function of the states based merely on the information deliveredin this arm itself. The past four decades have witnessed a crucial and prominentrole played by Bandit processes and Gittins index in stochastic scheduling and otherareas involving allocating limited resources to competitive demands.

Gittins’ seminal proof of the optimality of his index policies, which employed theinterchange argument, has proved highly complicated and is extraordinarily difficult

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 6,© Springer Science+Business Media New York 2014

225

Page 234: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

226 6 Multi-Armed Bandit Processes

to follow. Whittle (1980) provided a mathematically more elegant proof by showingthat those policies solve the optimality equations of the corresponding dynamicprogramming modeling the multi-bandit processes. Another line of proof uses anintuitive deduction from the economical notion, as presented by Weber (1992) andIshikida and Varaiya (1994). EL Karoui and Karatzas (1993) presented a mathe-matically rigorous proof for arbitrary stochastic processes evolving in integer times.Section 6.1 provides a comprehensive treatment for this classical model based onEL Karoui and Karatzas (1993).

Soon after the seminal paper of Gittins, the extension to branching banditproblem to model stochastic arrivals (also known as the open bandit or arm-acquiring bandit problem) was first investigated by Nash (1973) and followed byWhittle (1981). Following the auxiliary retirement argument invented by himselfWhittle (1980, 1981) presented an elegant and interesting proof for the optimalityof Gittins index policies. Other proofs are provided by Varaiya et al. (1985), Weiss(1988) and Tsitsiklis (1994), based on interchange arguments, Weber (1992) andIshikida and Varaiya (1994), using the intuitive deduction from the economicalnotion, and Bertsimas and Nino-Mora (1996) by the notion achievable region infinite state case, which is particularly useful for the algebraic computation of theGittins indices. Under certain stability conditions, Lai and Ying (1988) also showthat Gittins indices for open bandit processes are equivalent to those of traditional(closed) bandit processes if the discount rate approaches 1. Section 6.2, based onWu and Zhou (2013), gives a detailed exposition for branching bandit problems,which even allows for possibly negative durations.

In addition, we also provide a section to deal with the generalized banditproblems, which is another extension of the classical model of multi-armed banditproblems formulated first by Nash (1980) in which the rewards depend on all statesof the arms waiting in the system. Following the results in Sect. 6.2, Sect. 6.3 givesa concise account for generalized branching bandit processes with arbitrarily manystates and possibly negative durations.

Other extensions include the models of restless bandit, formulated by Whittle(1988), in which each arm evolves restlessly according to two different mecha-nisms (idle fashion and busy fashion), and the models with switching costs/delaysby Banks and Sundaram (1994) and Van Oyen et al. (1992), who showed that noindex policy is optimal when switching between arms incurs costs/delays. We willnot further discuss these two types of models in this book.

All the models discussed above are on the basis of discrete time, for which thereward payments and the information update (represented by information filtration)occur only at certain discrete time points. Other modifications of the multi-armedbandit models may allow continuous time, where the filtration update and pay-ments can be accrued continuously. Remarkable contributions to this type of modelsand their solutions have been made by Bank and Kuchler (2007), EL Karoui andKaratzas (1994, 1997), Kaspi and Mandelbaum (1995, 1998), Mandelbaum (1987),etc. It turns out that the continuous time setting dramatically changes the situations:for Gittins index policies (following the instantaneously highest Gittins index) to beapplicable and the optimal solutions to exist, the machine is required to be sharablesimultaneously by all arms. An account for this continuous time model is discussed

Page 235: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.1 Closed Multi-Armed Bandit Processes in Discrete Time 227

in Sect. 6.4, but we will leave out the technical details that are out of the scope ofthis book. It appears, however, that no efforts have been reported for such variationsas branching bandits and restless bandits in continuous time.

This chapter provides a detailed treatment for the theory of multi-armed banditprocesses established by Gittins and others. The classical theory for multi-armedbandit processes is discussed in Sect. 6.1, where the proof is based on the one pro-vided by EL Karoui and Karatzas (1993) for Markovian setting, but with some alter-ations in order to accommodate the semi-Markovian setting. Section 6.2 is devotedto a recent treatment by Wu and Zhou (2013) of open bandit processes in whichinfinitely many arms are allowed. An extension to generalized open bandit pro-cesses, including the generalized bandit processes of Nash (1973), is discussed inSect. 6.3. Finally, a concise account for closed bandit processes in continuous timeis presented in Sect. 6.4.

6.1 Closed Multi-Armed Bandit Processes in Discrete Time

A basic multi-armed bandit process consists of a set of stochastically independentprojects (referred to as arms), each can be characterized by a stochastic process (indiscrete time). The following exposition is based on EL Karoui and Karatzas (1993)with a generalization to allow random durations, compared to a constant duration 1in EL Karoui and Karatzas (1993).

6.1.1 Model and Solution

A multi-armed bandit in discrete time evolves as follows. Let N = 0,1,2, . . ..The primitives are d adapted stochastic sequences (Xk,sk,F k),k = 1,2, . . . ,d, on aprobability space (Ω ,F ,Pr) to represent d arms, meeting the following technicalconditions:

1. Filtrations: F k = F kn ,n ∈ N is by convention an increasing filtration con-

taining the information accumulated during the first n pulls of arm k and, withoutloss of generality, F k

0 = ∅,Ω (if this is not the case, we just consider allthe expectations below conditional on F k

0 ). The family F 1∞,F

2∞, . . . ,F

d∞ of

sub-σ -algebras are assumed to be mutually independent; we will refer to this asindependence between the filtrations F 1,F 2, . . . ,F d.

2. Rewards and durations: (Xk,sk) = (Xkn ,s

kn) : n ∈ N is F k-adapted (i.e.,

(Xkn ,s

kn) is F k

n -measurable), where skn > 0 (n ≥ 1) is the stochastic time duration

that arm k has to undergo after it is operated for the nth time (sk0 = 0 is assumed

for convenience), and Xkn ≥ 0 is the instantaneous reward at the (n+1)th selection

of arm k subject to discount. At the beginning, if arm k is selected, a reward Xk0

is accrued and the arm then undergoes a stochastic duration sk1. At the end of sk

1,if arm k is selected again, then it contributes another reward Xk

1 and undergoes

Page 236: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

228 6 Multi-Armed Bandit Processes

another stochastic duration sk2, and so on. It is evident that if arm k is operated

alone, the (n+ 1)th operation starts at time point Skn with a discounted reward

e−δSknXn and ends at the (n+ 2)th decision epoch, where

Skn+1 = sk

0 + · · ·+ skn+1, n ∈N. (6.1)

Under this condition, the reward Xkn at the (n + 1)th selection of arm k is

F kn -measurable, i.e., Xk

n is realized at the (n+ 1)th selection. This correspondsto the pre-payment setting on the selection in which Xk

n is paid at the beginningof the (n+1)th operation of the arm. Another option is the post-payment settingin which Xk

n is paid at the end of the (n+ 1)th operation of arm k, which ismore frequently employed in the literature on scheduling. Nevertheless, it can bereadily examined that the results on either option imply the other. The followingassumption is made for the rewards and durations.

Assumption 6.1 (Integrability) The following integrability condition holds foreach k = 1,2, . . . ,d:

E

[∞

∑n=0

e−δSknXk

n

]< ∞. (6.2)

3. Policies: Let S =Nd . An allocation policy is characterized by a d-dimensionalinteger-valued stochastic sequence

N = Nn : n ∈N= (N1n ,N

2n , . . . ,N

dn ) : n ∈N

such that every Nkn is an integer to indicate the number of pulls of arm k during the

first n pulls of all arms, fulfilling the following obvious technical requirements:

1. Nn = (N1n ,N

2n , . . . ,N

dn ) is component-wise nondecreasing in n with N0 = 0 (the

d-vector of zeros),2. N1

n +N2n + · · ·+Nd

n = n, and3. Nn+1 ∈ FNn = F 1

N1n∨F 2

N2n∨ · · ·∨F d

Ndn.

Write ek for the d-vector with 1 at its kth entry and 0 elsewhere, k = 1,2, . . . ,d.Condition (2) indicates that Nn+1 −Nn can only be one of ek,k = 1,2, . . . ,d sothat the machine is exclusively allocated to the arm indicated by Nn+1 −Nn andno idle is allowed in effect. Formally, idles of the machine can be allowed byreplacing condition (2) with N1

n +N2n + · · ·+Nd

n ≤ n. But a mathematically easierway is to introduce a dummy arm with constant reward zero at any time andconstant filtration; this will be adopted for more complicated bandit problems.For the closed bandit processes, however, any idle can cause a reduction of thetotal rewards due to the effect of discounting, and thus we prohibit idles at thisstage.Note that condition (3) on policies is also equivalent to Nn+1 − Nn ∈ FNn ,indicating that the decision on the (n + 1)th pull relies only on the informa-tion collected in the previous n pulls. Another equivalent expression of condition(3) is stated in the following lemma, which is more convenient to be extended to

Page 237: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.1 Closed Multi-Armed Bandit Processes in Discrete Time 229

the continuous time setting in Sect. 6.4, see, e.g., Kaspi and Mandelbaum (1998)and Mandelbaum (1986), who provide the main idea of the following proof.

Lemma 6.1. Under the independence assumption between F k,k = 1,2, . . . ,dand the completeness of F k for k = 1,2, . . . ,d, Nn+1 −Nn ∈ FNn for all n ∈N ifand only if

Nn ≤ n ∈ Fn, n ∈N, (6.3)

where n = (n1,n2, · · · ,nd) ∈ S and Fn=F 1n1∨F 2

n2∨ . . .∨F d

nd.

Proof. The “only if” part can be proved by induction on n. The assertion holdsclearly for n= 0. Assume Nn ≤ n∈Fn and define Ak = ω : Nn+1−Nn = ek.Note that the condition Nn+1 −Nn ∈ FNn implies Ak ∈ FNn . Thus

Nn+1 ≤ n =d

∑k=1

Nn ≤ n− ek∩Ak ∈d∨

k=1

Fn−ek ⊂ Fn.

This proves the “only if” part.For the “if” part, note that Nn = n = Nn ≤ n−

⋃m≤n,m=nNn ≤ m. Hence

condition (6.3) implies that for each n ∈ N, Nn = n ∈ Fn for all n, whichfurther gives rise to the equivalence that, for any fixed n, Nn = n ∈ Fn for alln if and only if Nn ≤ n ∈ Fn for all n. Because

Ak ∩Nn + ek = m = Nn = m− ek,Nn+1 = m ∈ Fm,

it is clear that Ak ∈ FNn+ek ,k = 1,2, . . . ,d. It can be easily checked that FNn+ei

and FNn+e j are independent given FNn , hence IAk = 0 or 1 a.s. for all k (due to

∑dk=1 Var(IAk |FNn) = 0). Thus Ak ∈ FNn for all k thanks to the completeness of

the filtration. It follows that Nn+1 −Nn ∈ FNn . This completes the proof.

Under a policy N, the calendar time at the (n+ 1)th pull is

Tn := Tn(N) =d

∑k=1

SkNk

n. (6.4)

Accordingly, the value of the policy N (the expected total discounted rewards underN) can be expressed as

v(N) = E

[∞

∑n=0

e−δTnd

∑k=1

XkNk

n

(Nk

n+1 −Nkn

)]. (6.5)

The objective of the bandit problem is to find an optimal policy N that maximizesv(N) :

v(N) = maxN

v(N). (6.6)

The solution to this problem is the celebrated Gittins index policy, which is definedand deduced below. It is obvious that if Assumption 6.1 is violated for some k,

Page 238: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

230 6 Multi-Armed Bandit Processes

i.e., E[∑∞

n=0 e−δSknXk

n

]= ∞, this bandit problem is trivial because one can obtain

an infinite expected reward by operating arm k all the time. Thus, in the deductionbelow, we implicitly impose Assumption 6.1.

Definition 6.1. For each arm k, define a sequence of arm-specified indices

Gkn = esssup

τ>n

E[

∑τ−1j=n e−δSk

j XkSk

j|F k

n

]

E[∫ Sk

τSk

ne−δudu|F k

n

] a.s., , n ∈N, (6.7)

where esssup indicates the essential supremum as defined in Theorem 5.6 and τ > nis an arbitrary integer-valued F k-stopping time.

Note that Gkn,n ∈N is also a stochastic sequence, adapted to the information

filtration F k , and is generally known as Gittins indices today due to Gittins’ seminalcontributions in Gittins and Jones (1974) and Gittins (1979). The solution of this dis-crete time bandit problem is stated as in the theorem below, whose proof is deferredto Sects. 6.1.2 and 6.1.3 later.

Theorem 6.1. A policy N is optimal if it always pulls the arm with the highestGittins index; in other words, if Nk

n+1 − Nkn = 1 only when Gk

Nkn= max

1≤ j≤dG j

Nkn

for

all n ∈N and k ∈ 1,2, . . . ,d.

Remark 6.1. In the currently discussed model, for all n ∈ N, the (n+ 1)th rewardof an arm is required to be paid at its (n+ 1)th pull, and thus are known at thatmoment. However, Theorem 6.1 also applies if the reward is instead paid at the endof the (n+ 1)th pull (thus is a random variable rather than a realized value at thetime instant when the arm is pulled) if the Gittins index is accordingly adjusted to

Gkn = esssup

τ>n

E[

∑τj=n+1 e−δSk

j XkSk

j

∣∣∣F kn

]

E[∫ Sk

τSk

ne−δudu

∣∣F kn

] a.s., n ∈N. (6.8)

6.1.2 Single-Armed Process

First we investigate a single-armed process and thus for the time being the armidentifier is suppressed for simplicity of notation as long as no confusion arises. Forthis fixed arm (X ,s,F ), denote the lower envelope of Gn by Gn = min0≤l≤n Gn andthe periodically discounted duration by

∆n = E[∫ Sn+1

Sn

e−δ (u−Sn)du∣∣∣∣Fn

], n ∈N,

Page 239: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.1 Closed Multi-Armed Bandit Processes in Discrete Time 231

where Sn is defined in (6.1). Further introduce a companion process (X ,s,F ) byreplacing the rewards Xn with

Xn = ∆nGn. (6.9)

Clearly, X is also F -adapted. For X , the one-step reward rate rn = Xn/∆n = Gn isnonincreasing in n (the so-called deteriorating process). Since X and X are definedwith respect to a common information filtration, common policies can be applied toboth.

Given any fixed γ ∈ R, associate the process (X ,s,F ) with a new stochasticsequence

Yn(γ) =n

∑j=0

e−δS j (Xj − γ∆ j)

and introduce an optimization problem

esssupτ>n

E [Yτ(γ)] . (6.10)

For every fixed n, define a random variable

vn(γ) = esssupτ>n

E

[τ−1

∑m=n

e−δSm (Xm − γ∆m)

∣∣∣∣∣Fn

], (6.11)

where τ > n is again an arbitrary (but integer-valued) F -stopping time. So vn(γ) isfinite, convex and strictly decreasing in γ with vn(−∞) = ∞ and vn(∞) =−∞ thanksto the integrability condition (6.2), which corresponds to the condition in (5.34) foroptimal stopping problems. This simply says that vn(γ) = 0 has a unique solution,denoted by v−1

n (0). The following lemma establishes the connection between Gittinsindex Gn and the function vn(γ).

Lemma 6.2. For each fixed n, the Gittins index can be computed by

Gn = v−1n (0) (6.12)

and the essential supremum in (6.7) is attained by the stopping times

τ(Gn) = minm > n : Gm ≤ Gn or τ(Gn) = minm > n : Gm < Gn. (6.13)

Proof. Suppose that Gn is defined by equalities (6.11) and (6.12). Applying Snell(1952)’s optimal stopping theory (see also Sect. 5.5 for the version in continuoustime) to the optimization problem (6.10), it follows that, for every n ∈N,

τn(γ) = min

m > n : esssup

τ>mE

[τ−1

∑j=0

e−δS j (Xj − γ∆ j)

∣∣∣∣∣Fm

]≤ Ym−1(γ)

= minm > n : vm(γ)≤ 0 (6.14)

Page 240: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

232 6 Multi-Armed Bandit Processes

is the optimal stopping time that attains vn(γ) in (6.11). Note that, in view of thestrict decrease of v, (6.12) implies that vm(γ) ≤ 0 ⇐⇒ Gm ≤ γ . Therefore, (6.14)can be rewritten as

τn(γ) = minm > n : Gm ≤ γ, n ∈N. (6.15)

Consequently, substituting γ with Gn in (6.11) and using (6.12) lead to

E

[τ−1

∑j=n

e−δS j (Xj −Gn∆ j)

∣∣∣∣∣Fn

]≤ vn(Gn) = 0

for every stopping time τ > n, with equality at τ = τ(Gn) = minm > n : Gm ≤ Gndue to (6.15). That is,

Gn ≥E[

τ−1∑j=n

e−δS j Xj

∣∣∣∣Fn

]

E[

τ−1∑j=n

e−δS j ∆ j

∣∣∣∣Fn

] =

E[

τ−1∑j=n

e−δS j XS j

∣∣∣∣Fn

]

E[∫ Sτ

Sne−δudu

∣∣∣Fn

] ,

with equality at τ = τ(Gn). This completes the proof.

As what we have done in Proposition 5.14, we can discuss vσ (γ) for any stoppingtime σ instead of vn(γ). Then vσ (γ)I(σ=v) = vv(γ)I(σ=v) a.s. for fixed γ . By the right-continuity of vσ (γ), we actually have Pr

(vσ (γ)I(σ=v) = vv(γ)I(σ=v) for all γ

)= 1.

In this way, we can define the Gittins indices at stopping times by Gσ = v−1σ (0),

which have the property Gσ I(σ=v) = GvI(σ=v). Thus the definition (6.7) of Gittinsindices can be extended to allow stopping times as follows. For any integer-valuedF -stopping time σ ,

Gσ = esssupτ>σ

E[

τ−1∑

j=σe−δS j Xj

∣∣∣∣Fσ

]

E[∫ Sτ

Sσe−δudu

∣∣∣Fσ] =

E[

τ∗−1∑

j=σe−δS j Xj

∣∣∣∣Fσ

]

E[∫ Sτ∗

Sσe−δudu

∣∣∣Fσ] ,

where τ∗ =minm>σ : Gm ≤Gσ or τ∗ =minm>σ : Gm <Gσ. This procedurewill be referred to as randomization.

Moreover, we have the following lemma on another expression of the Gittinsindices, which plays a crucial role in the proof of the optimality of Gittins indexpolicies.

Lemma 6.3. Let G be a σ -algebra independent of the filtration F . Then, for everyF ∨G -stopping time σ ,

Gσ = esssupξ

E[

∞∑

j=σe−δ (S j+ξ j)Xj

∣∣∣∣Fσ ∨G

]

E[

∞∑

j=σ

∫ S j+1+ξ jS j+ξ j

e−δudu∣∣∣Fσ ∨G

] =

E[

τ∗−1∑

j=σe−δS j Xj

∣∣∣∣Fσ ∨G

]

E[∫ Sτ∗

Sσe−δudu

∣∣∣Fσ ∨G] ,

(6.16)

Page 241: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.1 Closed Multi-Armed Bandit Processes in Discrete Time 233

where Fσ ∨G is the augmentation of Fσ by G , ξ = ξn,n ∈ N is an arbitraryF ∨G -adapted nondecreasing sequence of extended random variables (ξn may take∞ with positive probability) with Pr(ξ0 = ∞)< 1.

Proof. Using again the randomization and augmentation arguments (see Remarks5.4 and 5.5), it suffices to prove this lemma for deterministic stopping time σ = nunder the reduced filtration F and F -adapted sequence ξ . To further simplify therepresentation, without loss of generality, we prove the assertion for n = 0. Thus,for each fixed sequence ξ , define

τuξ = max j : e−δξ j ≥ u+ 1= min j : e−δξ j < u,

which is obviously an integer-valued F -stopping time. Then

E

[∞

∑j=0

e−δ (S j+ξ j)Xj

]= E

[∞

∑j=0

e−δS j Xj

∫ 1

0I(

u ≤ e−δξ j)

du

]

=∫ 1

0E

⎣τu

ξ−1

∑j=0

e−δS j Xj

⎦du ≤ G0

∫ 1

0E[∫ Sτu

ξ

0e−δ t

]dtdu.

On the other hand, we can obtain in the same way that

E

[∞

∑j=0

∫ S j+1+ξ j

S j+ξ j

e−δudu

]=∫ 1

0E[∫ Sτu

ξ

0e−δ t

]dtdu.

These two expressions indicate

G0 ≥ esssupξ

E[∑∞

j=0 e−δ (S j+ξ j)Xj

]

E[∑∞

j=0∫ S j+1+ξ j

S j+ξ je−δudu

] . (6.17)

Thus the assertion holds because the reverse inequality of (6.17) is apparent.

Next, define a sequence εn : n ∈N of F -stopping times recursively by

ε0 = 0, εn+1 = minm > εn : Gm ≤ Gεn, n ∈N.

Then clearly, εn : n∈N= n∈N : Gn =Gn, Gεn =Gεn=Gl for all l ∈ [εn,εn+1),

and τεn(Gεn) = εn+1. Consequently,

E

[εn+1−1

∑j=εn

e−δ Sj Xj

∣∣∣∣∣Fεn

]= Gεn

E[∫ Sεn+1

Sεn

e−δ udu∣∣∣∣Fεn

]= E

[εn+1−1

∑j=εn

e−δ Sj ∆ jG j

∣∣∣∣∣Fεn

]

= E

[εn+1−1

∑j=εn

e−δ Sj X j

∣∣∣∣∣Fεn

]for all n ∈N, (6.18)

where the second equality is due to the definition of X j in (6.9).

Page 242: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

234 6 Multi-Armed Bandit Processes

6.1.3 Proof of Theorem 6.1

We are now at the position to prove Theorem 6.1 and thus the identifiers of arms haveto be added back. In addition, denote F k

n = F kn∨

j =k F j∞, n ∈N. Fix an arbitrary

policy N and let ζ kn =mint : Nk

t = n be the generalized inverse of Nkn , as a function

of n∈N, to indicate the number of pulls of the whole bandit at the n-th pull of arm k.Define

ξ kn = ∑

l =kSk

Nlζ k

n

, n ∈N,

to be the total time the arms other than k have been operated at the (n+1)-th pull ofthe k-th arm. Then v(N) can be re-arranged as

v(N) = E

[d

∑k=1

∑n=0

e−δ (Skn+ξ k

n )Xkn

]=

d

∑k=1

E

[∞

∑l=0

E

[εl+1−1

∑n=εl

e−δ (Skn+ξ k

n )Xkn

∣∣∣∣∣Fkεl

]]

By Lemma 6.3,

v(N)≤d

∑k=1

E

[∞

∑l=0

Gεl E

[εl+1−1

∑n=εl

∫ Skn+1+ξ k

n

Skn+ξ k

n

e−δudu

∣∣∣∣∣Fkεl

]]

By the definition of X in (6.9) and the equality in (6.18), we further have

v(N)≤d

∑k=1

E

[∞

∑l=0

E

[εl+1−1

∑n=εl

e−δ (Skn+ξ k

n )Xn

∣∣∣∣∣Fkεl

]]= v(N).

The d companion reward sequences Xk,k = 1,2, . . . ,d, are pathwise nonincreasingand thus can be pathwise optimally operated by selecting the arm with currentlylargest Gittins index, i.e. the police N. Finally, since one can readily check thatv(N) = v(N) by the usual interchange argument, the following relationships hold:

v(N) ≤ v(N) ≤ v(N) = v(N).

This ends the proof.

6.2 Open Bandit Processes

This section discusses open bandit processes to model the situation where newprojects will come into the system. A simple example is provided below, in whichthe Gittins index policies with the indices computed as in Sect. 6.1 is not optimalwhen the new projects arrive according to certain mechanisms.

Example 6.1 (A bandit with new arrivals). The decision epochs are t = 0,1,2, . . .and there are three types of arms such that:

Page 243: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.2 Open Bandit Processes 235

• There are totally five states: 0,1,2,3,4, with 0 being an absorbing state.• A bandit of type 1 has three states 0, 1, 2 with corresponding rewards 0, 20, 0

respectively on selection, and deterministic state transition law 1 → 2 → 0.• A bandit of type 2 has two states 0 and 3 with rewards 0 and 5, respectively, and

deterministic state transition law 3 → 0.• A bandit of type 3 has two states 0 and 4 with rewards 0 and 50, respectively, and

deterministic transition law 4 → 0.• The initial states of these three types of bandits are 1,3, and 4 respectively.

At time t = 0, there are only two arms in the system with one from type 1 and theother type 2. New bandit of type 3 arrives according to a geometrically distributedinterarrival Pr(U = i) = (1− p)pi−1, i = 1,2, . . . . A reward at time t is discountedby (4/5)t . The Gittins indices computed based on the closed bandits setting are

State 0 1 2 3 4Gittins index 0 80/9 20 5 50

Consider the following two policies:

• G-policy operates the projects according to the highest Gittins index rule and• O-policy operates first the arm of type 2 for 1 unit of time and then goes according

to the Gittins index rule.

For both policies, the total discounted rewards, denoted by respectively WG(U)and WO(U), depend on the arrival time U of the first arm of type 3. We now comparethe performance measures E[WG(U)] and E[WO(U)]. For U = 1 or 2,

WG(1) = 0+45

501− 4/5

= 200, WG(2) = 0+45× 20+

(45

)2 501− 4/5

= 176,

WO(1) = 5+45

501− 4/5

= 205, and WO(2) = 5+ 0+(

45

)2 501− 4/5

= 165.

For U ≥ 3,

WG(U) = 0+45× 20+

(45

)2

× 5+(

45

)U 501− 4/5

= 1915+ 250

(45

)U

and

WO(U) = 5+(

45

)2

× 20+(

45

)U 501− 4/5

= 1745+ 250

(45

)U

.

ThusE [WO(U)−WG(U)]

1− p= 5− 11p− 7

5p2

1− p> 0

when p < 0.4166. Thus the G-policy is not always optimal.

In this section, we focus on developing optimal policies for open bandit problems.It turns out that Gittins index rules can still produce optimal policies, but some

Page 244: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

236 6 Multi-Armed Bandit Processes

modifications are required to take into account the information from the arrivingprocesses of new projects. Such models are usually referred to as arm-acquiringbandits, branching bandits or simply open bandits.

6.2.1 Formulation and Solution

We here follow the notation of Whittle (1981) to consider an arm with a differentstate as a different arm or a different arm type. This enables each arm to take exactlyone state without loss of generality.

The system is modeled as the following Markov decision setting:

• States: There are many types of arms (possibly uncountable) labeled by theelements u of an arbitrary abstract space, e.g., R+. The state of the processat any nonnegative integer time t is indicated by nt = (nt(u) : u ∈ R+) fort ∈N= 0,1,2, . . ., where nt(u) is a nonnegative integer indicating the numberof arms of type u at time t. While each arm has one type, different arms mayshare a same type with the same probabilistic features. On the other hand, ageneric state n = (n(u) : u ∈ R+) can also be considered a set of n(u) arms oftype u for all u ∈ R+ = [0,∞).For any fixed v∈R+, define e(v) to be the particular value of n = (n(u) : u∈R+)with n(v) = 1 and n(u) = 0 for u = v,u ∈ R+.At time zero, the initial state n0 is known with n0(u) = 0 for all but finitely manyu ∈ R+, indicating a finite number of arms available at the starting point.

• Actions: At any time with the process in state n, if an arm of type u from theaction space

A(n) = x : n(x)≥ 1 (6.19)

is operated, then the server can collect an immediate reward R(u) and the opera-tion gives rise to

– A random variable V (u), referred to as duration, which may take negativevalues with a positive probability and affects the value of the discountedreward, and

– A new set of arms replacing the arm operated, referred to as the descendantsof the replaced arm. The numbers of descendant arms at each type are repre-sented by a random map w(u) = (w(u,x) : x ∈ R+), where w(u,x) indicatesthe number of the descendants of type x, which is also subject to the condi-tion that w(u,x) ≥ 1 for at least one but finitely many x ∈ R+. Generally, forfixed u, w(u) is actually a stochastic process with “time parameter” x, whosedistribution can be routinely identified by its finite dimensional distributions.

– On selection of an arm of type u, the joint distribution of V (u) and w(u) isassumed to be independent of the history of all operations and the correspond-ing realization of this decision process up to the current time t. Moreover, itis implicitly assumed to be independent of the time t so that we essentiallyobtain a time-homogeneous feature of (V (u),w(u)).

Page 245: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.2 Open Bandit Processes 237

• Idle (un-operated) arms are unaffected.• Policies and resulting processes: From time zero, at any integer time t, based on

the available information, the server selects an available arm of type ut to operateand then obtains an instant reward. Write Rt = R(ut), and Vt = V (ut). Under aspecified policy π , at any time t,

– The state is written as nπt = (nπ

t (x) : x∈R+), the assumptions described aboveon the descendants ensure that nπ

t must satisfy nπt (x) < ∞ for all x ∈ R+ and

nπt (x)> 0 for only finitely many x ∈ R+, so that ∑x∈A(nπ

t )nπ

t (x)< ∞;– The type of the selected arm is denoted by uπ

t ;– The reward for selecting the arm to operate is Rπ

t = R(uπt );

– The duration processes are denoted by V πt ;

– The arm-acquiring process is denoted by wπi (u); and

– The cumulative duration is defined by

Dπ0 = 0, Dπ

t =t

∑j=1

V πj , t = 1,2, . . . (6.20)

Then (nπt ,R

πt ,D

πt ) form a triplet stochastic process in discrete time t = 0,1, . . . .

• Filtration: The natural filtration generated by the process (nπt ,D

πt ) : t = 0,1, . . .

under policy π is denoted by F π(n) = F πt (n) : t = 0,1,2, . . ., or simply

F π = F πt : t = 0,1, . . . if no confusion arises. Clearly Rπ

t = R(uπt ) : t ≥ 0 is

F π -adapted. Conditional on the information at time t (i.e., the σ -algebra F πt ),

the pairs (V (u),w(u)) are independent between the arms presented in nπt .

• Final objectives: Over the infinite time horizon, the server can finally obtaina total discounted reward ∑∞

t=0 β Dπt Rπ

t , where β ∈ (0,1) is the discount factor.Denote the expectation under policy π by Eπ , i.e., the expected total reward isexpressed by

E

[∞

∑t=0

β Dπt Rπ

t

]= Eπ

[∞

∑t=0

β Dt Rt

]. (6.21)

The objective is to find a policy π∗ to maximize the expected total reward:

maxπ

[∞

∑t=0

β Dt Rt

]= Eπ∗

[∞

∑t=0

β Dt Rt

]. (6.22)

Remark 6.2. If V (u)≥ 0 with probability 1 for all u, then this model reduces to thecase discussed by Weiss (1988). In particular, when the total number of arm types isfinite and V (u) = 1 is independent of u, this model is further reduced to the modelby Whittle (1981). The case V (u) < 0 corresponds to certain generalized banditproblems proposed by Nash (1980) and will be further discussed in the subsequentSect. 6.3.

Remark 6.3. A positive V = V (u) can be interpreted as an ordinary “duration” forcalculating the discounted value in the sense that βV is the present value of 1

Page 246: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

238 6 Multi-Armed Bandit Processes

received V units of time later. When V < 0, βV represents the present value of1 received −V units of time ago. It is in that sense a negative V is referred toas a “negative duration” or “reversed time” for the purpose of discounting. Thismay arise in the following scenario. Suppose that an operation is completed at agiven time t. If the operation meets certain criteria, then a 20 % bonus reward willbe paid to all future operations, which has an effect of a multiplier 1.2β for allfuture rewards. If we write 1.2β = βV and consider the situation with β > 0.9, thenV = (log1.2/ logβ )+ 1 ≤ (log1.2/ log0.9)+ 1 = −0.73 < 0. This is effectively adiscounting factor with reversed time V < 0.

For the bandit with initial state e(u), due to the presence of descendants at theoperation of this arm, from time 1 onwards, there may be more than one arm avail-able to select and thus certain policy π is needed to govern the selection amongarms. The Gittins index of an arm at type u is defined by

M(u) = esssupπ ,τ>0

Eπ [∑τ−1t=0 β Dt Rt |u]

1−Eπ [β Dτ |u] , (6.23)

where

• The u-conditioning means that the total discounted reward is collected from thesystem starting with a single arm u,

• π is any policy governing the selections among the descendants of arm u fromtime 1 onwards, and

• τ is a stopping time with respect to the filtration F π = F πt (e(u)) : t = 0,1, . . ..

In the presence of negative V (u) for arm type u, it is possible to have Eπ [β Dτ |u]≥ 1,so that the denominator in formula (6.23) takes zero or negative values. This will beprevented under certain condition presented in the following proposition.

Proposition 6.1. If for some α ∈ (0,1), E[βV (v)|v] ≤ α for any arm type v, thenDπ

∞ =+∞ a.s. and Eπ [β Dτ |u]< 1 for any policy π and F π -stopping time τ .

Proof. Under policy π , the stochastic process β Dπt : t = 1,2, . . . is a super-

martingale because Eπ[β Dt+1 |F π

t]= β Dπ

t E[βV]≤ αβ Dπ

t under the propositionassumption. Hence the martingale convergence theorem tells that β D∞ = lim

t→∞β Dt

almost surely. An application of Fatou’s lemma shows that

0 ≤ Eπ [β D∞ |u] = Eπ[

limt→∞

β Dt |u]≤ liminf

t→∞Eπ [β Dt |u]≤ lim

t→∞αt = 0.

This implies β D∞ = 0 almost surely, and hence D∞ =+∞ almost surely. Moreover,for any F π -stopping time τ , Doob’s optional stopping time theorem states that

Eπ [β Dτ∧t |u]≤ Eπ [β D1 |u] = Eπ [βV (u)|u]< 1.

Letting t → ∞ yields Eπ [β Dτ Iτ<∞|u] ≤ Eπ [βV (u)|u] < 1. Therefore, for any F π -stopping time τ ,

Page 247: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.2 Open Bandit Processes 239

Eπ [β Dτ |u] = Eπ [β Dτ Iτ<∞|u]+Eπ [β D∞ Iτ=∞|u]≤ Eπ [βV (u)|u]< 1.

This completes the proof.

Remark 6.4. The condition in Proposition 6.1 holds obviously in the followingcases:

(i) The number of arm types is finite and E[βV (u)|u]< 1 for all type u. This in par-ticular covers the traditional case of a finite state space with positive durations.

(ii) The number of arm types is infinite but there exists an ε > 0 such that V (u)≥ εfor all type u. This in particular covers the Markov bandit processes in whichV (u)≡ 1.

Having computed the Gittins indices M(u) in (6.23), we define, for a genericstate n of a bandit process,

M(n) = maxM(u) : u ∈ A(n) (6.24)

for the maximum Gittins index of the currently available arms.For a bandit with initial state e(u), let us operate the arms according to the Gittins

index rule. Define τ(u) to be the first time to clear out all arms in the descendants ofan arm of type u with their Gittins indices no less than M(u) (so that all remainingones have Gittins indices below M(u)), in other words,

τ(u) = mint ∈N : M(nt)< M(u),

with τ(u) = ∞ if M(nt ) ≥ M(u) for all t ∈N. Also write Wτ(u) for the discountedrewards collected in the interval [0,τ(u)). Then τ(u) is a stopping time and W (u) ∈FG

τ(u)(u). It is not difficult to see that

M(u) =E[Wτ(u)]

1−E[β Dτ(u) ]. (6.25)

Expression (6.25) indicates that the Gittins indices are achievable in the sense thatwe have a policy (the Gittins index rule policy) and a stopping time τ(u) such thatM(u) is achieved at their combination. The optimality of the Gittins index policy isstated here.

Theorem 6.2. For the problem formulated in Sect. 6.2.1, a policy π is optimal ifit operates the arms according to the Gittins index rule, provided the condition inProposition 6.1 is satisfied.

6.2.2 Proof of Theorem 6.2

Because the proof mainly involves the Gittins index rule, the symbol to indicatethe policies as in the previous section is dropped to simplify the notation. Morespecifically, we use (nt ,Rt ,Dt) : t = 0,1, . . . for the stochastic process generated

Page 248: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

240 6 Multi-Armed Bandit Processes

under the Gittins index policy, where any tie (a type with more than one arm) canbe broken arbitrarily. Denote further by F (n) the natural filtration generated by theprocess (nt ,Rt ,Dt) : t = 0,1, . . .. The initial state is denoted by n0.

Define T (x,n) to be the smallest time (or the first time) needed to clear out allarms (including originals and descendants) with Gittins indices above x from aninitial state n0 = n = (n(u),u ∈ R+) following the Gittins index policy. It can beexpressed by

T (x,n) = mint ≥ 0 : M(nt)≤ x;n0 = n.

We can also define

T (x−,n) = mint ≥ 0 : M(nt )< x;n0 = n (6.26)

to be the time at which all the arms with Gittins indices at or above x have beenoperated. For T (x,n) and T (x−,n), we have the following facts:

1. T (x,n) is right-continuous, T (x−,n) is left continuous, and T (x,n)≤ T (x−,n).2. Both T (x,n) and T (x−,n) are stopping times with respect to the filtration F (n),

and thus may allow positive probabilities to take the value +∞.3. It is apparent that

T (x,n) = ∑s∈A(n)

n(s)

∑l=1

Tl(s,x,n) = ∑s∈A(n):M(s)≥M(u)

n(s)

∑l=1

Tl(s,x,n), (6.27)

where Tl(s,x,n) indicates the smallest time needed for the lth type s arm to cleanup all the arms with their Gittins indices above x. For fixed s and u, Tl(s,x,n) areindependent and identically distributed over l = 1,2, . . . ,n(s) as a representativeT (x,e(s)), provided the bandit begins with the initial state e(s). Here we take theconvention that T (x,e(s)) = 0 if M(s)≤ x. Clearly, for T (x−,n), the relationshipdisplayed in (6.27) also holds but with u− in place of u.

Suppose we are now at the time instant T (x,n) and write W (x,n) for the to-tal discounted reward during the interval working period to clean up all arms withGittins index x and their descendents with Gittins indices no less than x, valued attime T (x,n). Then the total discounted reward valued at time zero is β D(x,n)W (u,n),where D(x,n) = DT (x,n) as defined in Eq. (6.20).

We now examine the whole process of the bandit under the Gittins index policy.Starting with the initial state n, the server operates one arm at each operation andthus at most countably many types of arms can be operated over the whole timehorizon.

1. First let x1 = M(n) and so T (x1,n) = 0. The arms with Gittins indices x1 andtheir descendants with indices no less than x1 will be operated from time zero toT (x1−,n);

2. Next take x2 = M(nT (x1−,n)). Then T (x2,n) = T (x1−,n). At T (x2,n), one selectsan arm of type u2 such that M(u2) = x2 and then operates up to time T (x2−,n).

Page 249: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.2 Open Bandit Processes 241

Continue this way to obtain x3,x4, . . . . Then we generate a strictly decreasing(random) sequence of indices xi : i ≥ 1 such that, for i = 2,3, . . .,

0 = T (x1,n)< T (x1−,n) = T (x2,n)< · · ·< T (xi−1−,n) = T (xi,n)< · · · .

Moreover, T (x,n) = T (xi,n) for any x ∈ [xi,xi−1) and T (x−,n) = T (xi−1,n) forany x ∈ (xi,xi−1], that is, T (x,n) and T (x−,n) are both step-down functions in xand T (x−,n) is indeed the left limit of T (x,n) in the usual sense that T (x−,n) =limx′↑x T (x′,n).

Given a state n, T (x,n) itself can be considered as a stochastic process with “timeparameter” x. Consequently, by (6.20),

D(x,n) := DT (x,n) =T (x,n)

∑j=0

Vj

is also a stochastic process. Therefore, for given n, any path of IT(x,n)<∞β D(x,n) isa step function in x.

Under the notation just described, the total discounted reward is given by

R(n) =∞

∑i=1

β D(xi,n)W (xi,n)IT (xi,n)<∞. (6.28)

Furthermore, similar to Eq. (6.25), we have

IT(xi,n)<∞E[W (xi,n)|FT (xi ,n)

]= xiIT(xi,n)<∞E

[1−β D(xi−,n)−D(xi,n)|FT (xi,n)

].

Let Va(n) = E[R(n)] denote the expected total discounted reward, also referredto as the value function.

Because T (xi−,n) = ∞ implies β D(xi−,n) = β ∞ = 0 by Proposition 2.1, thestep function IT (x,n)<∞β D(x,n) has jumps IT (xi,n)<∞(β D(xi−,n) − β D(xi,n)) (maybe positive or negative) at xi, i = 1,2, . . . . Note also that T (x,n) = 0 for x ≥ x1(recall that T (x1,n) = 0), hence D(x,n) = 0 and so IT(x,n)<∞β D(x,n) = 1 for x ≥ x1.Combined with the fact that xi is FT (xi,n)-measurable, it can be readily verified that

Va(n) = E[R(n)]

= E

∑i=1

xiIT(xi,n)<∞E[β D(xi,n)E

[1−β D(xi−,n)−D(xi,n)|FT (xi,n)

]]

= E

∑i=1

xiIT(xi,n)<∞

(β D(xi,n)−β D(xi−,n)

)

= E[∫ ∞

0xd(

IT(x,n)<∞β D(x,n))]

= E[∫ ∞

0

(1− IT(x,n)<∞β D(x,n)

)dx].

Page 250: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

242 6 Multi-Armed Bandit Processes

Define a function, referred to as the discounting function of the bandit process, by

g(x;n) =E[

IT(x,n)<∞β D(x,n)∣∣∣n0 = n

], x ∈ [0,∞). (6.29)

Then we haveg(x;n1 +n2) =g(x;n1)g(x;n2) (6.30)

due to the facts that: for i = 2,3, . . ..

(i) T (x,n1 +n2) = T (x,n1)+T (x,n2);(ii) IT(x,n1+n2)<∞ = IT(x,n1)<∞IT(x,n2)<∞; and

(iii) The arms presented in n1 +n2 are independent.

If n(u)≥ 1, then Va(n) can be expressed as

Va(n) =∫ ∞

0(1− g(x;n))dx =

∫ ∞

0xdg(x;n)

=∫ ∞

0xg(x;e(u))dg(x;n− e(u))+

∫ ∞

0xg(x;n− e(u))dg(x;e(u)).

For any reward function ψ(n), define

Luψ(n) = R(u)+E[βV (u)ψ(n− e(u)+w(u))|u

], (6.31)

where the expectation E [·|u] is with respects to random variables V (u) and w(u).As mentioned before, the model setting ensures that A(nt) is a finite set. By the

theory of dynamic programming, the Gittins index policy is optimal if its valuefunction Va(n) satisfies optimality equation Va(n) = max

u∈A(n)LuVa(n). Define

∆u(n) = Va(n)−LuVa(n).

Then the optimality of Gittins index is equivalent to the following statement:

∆u(n) = 0 ⇐⇒ M(u) = M(n), (6.32)

where “⇐⇒” represents “if and only if”. To prove (6.32), we modify the bandit pro-cess by introducing an auxiliary arm of a specific type (say ∞) that, once operated,gives an instant reward (1−β )m, constant duration V (∞) = 1 and descendants ofthe same type ∞. We use a superscript m to indicate relevant items in this modifiedbandit. Then,

(i) This arm of type ∞ and all its descendants have Gittins index m, and(ii) Under the Gittins index rule, once an arm of type ∞ is operated, one inevitably

keeps operating arms of type ∞ forever, with the effect of finishing with a finalreward m, (i.e. T m(x,n) = T (x,n) for x ≥ m and T m(x,n) = ∞ for x < m).

For this new bandit, the corresponding discounting function (cf. (6.29)) is

gm(x;n) = E[IT(x,n)m<∞β D(x,n)

]= g(x;n)Ix≥m. (6.33)

Page 251: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.2 Open Bandit Processes 243

Hence it is easy to verify that the value function for this new bandit under the Gittinsindex rule is

Va(m;n) =∫ ∞

0(1− gm(x;n))dx = m+

∫ ∞

m(1− g(x;n))dx

=∫ ∞

0(1− g(x;n))dx+

∫ m

0g(x;n)dx = Va(n)+

∫ m

0g(x;n)dx. (6.34)

It follows immediately that

Va(0;n) = Va(n) and∂Va(m;n)

∂m= g(m;n). (6.35)

Moreover, let ∆u(m;n) = Va(m;n)−LuVa(m;n). Then by (6.31),

∆u(m;n) = Va(m;n)−R(u)−E[βV (u)Va(m;n− e(u)+w(u))|u

](6.36)

and consequently,

∆u(m;e(u)) = Va(m;e(u))−R(u)−E[βV (u)Va(m;w(u))|u

]. (6.37)

Applying (6.34), it follows that

∆u(m;n)−∆u(m;e(u))

= Va(m;n)−Va(m;e(u))+E[βV (u) [Va(m;w(u))−Va(m;n− e(u)+w(u))] |u

]

= Va(m;n)−Va(m;e(u))−E[

βV (u)∫ ∞

mg(x;w(u)) [1− g(x;n− e(u))]dx|u

].

This gives

∆u(m;n)−∆u(m;e(u)) = 0 for m ≥ M(n), (6.38)

because Va(x;n) = Va(x;e(u)) = x and g(x;n− e(u)) = 1 for x ≥ M(n).Differentiate (7.64) with respect to m and combine it with (6.30), (6.35) and (6.37)

to obtain

∂∆u(m;n)∂m

= g(m;n)−E[βV (u)g(m;n− e(u)+w(u))|u

]

= g(m;n− e(u))

g(m;e(u))−E[βV (u)g(m;w(u))|u

]

= g(m;n− e(u))∂∆u(m;e(u))

∂m.

Integrating this equation over the interval (0,M(n)) gives the expression

∆u(M(n);n)−∆u(0;n) =∆u(M(n);e(u))g(M(n);n− e(u))−∆u(0;e(u))g(0;n− e(u))

−∫ M(n)

0∆u(x;e(u))dg(x;n− e(u)). (6.39)

Page 252: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

244 6 Multi-Armed Bandit Processes

By (6.38) together with g(M(n);n−e(u))= 1 and ∆u(0;e(u))= 0, we see that (6.39)implies

∆u(n) = ∆u(0;n) =∫ M(n)

0∆u(x;e(u))dg(x;n− e(u))

=∫ M(n−e(u))

0∆u(x;e(u))dg(x;n− e(u)). (6.40)

Since ∆u(x;e(u)) = 0 for x ∈ [0,M(u)] and ∆u(x;e(u)) > 0 for x > M(u), (6.40)shows that

∆u(n) = 0 ⇐⇒ ∆u(x;e(u)) = 0 for all x ∈ [0,M(n− e(u))]⇐⇒ M(n− e(u))≤ M(u) ⇐⇒ M(u) = M(n).

This proves (6.32) and thus the theorem.

6.3 Generalized Open Bandit Problems

The generalized bandit problem was first discussed by Nash (1980), under a discretetime setting with closed bandit processes (fixed number of arms, ∑x∈A(nt ) nt(x) = dfor some d) in which the state of every arm evolves according to a Markov fashionand the immediate reward from an arm being operated is not only a function ofits state but also influenced by the states of the other frozen arms. This problemunder the branching bandit setting appeared to be first investigated by Crosbieand Glazebrook (2000) by means of the popular framework of achievable regionmethods. In their work, however, only a finite types of arms can be treated, as in allpapers with achievable region methods. In this section, we apply the general theoryfor the branching bandits with possibly negative durations to deduce the correspond-ing results of generalized branching bandit problems with arbitrary arm types. Thededuction is based on the equivalence between the generalized bandits and the ban-dits with durations (which are semi-Markov bandit problems in the case of positivedurations). For easy reference, we first recall the results by Nash (1980) and thenapply Theorem 6.2 to the generalized branching bandits.

6.3.1 Nash’s Generalized Bandit Problem

We here follow Nash (1980) to present the model on a discrete time processsetting and thus the term “states” in this subsection correspond to arm types inthe previous sections. Specifically, we have fixed d arms that are modeled byd stochastic processes Xt(i) : t ≥ 0 in discrete time on the filtered probabilityspaces (Ω ,F (i),Pr(·)), where F (i) = Ft(i) : t ≥ 0 is a filtration such thatXt(i) : t ≥ 0 is F (i)-adapt, i = 1,2, . . . ,d. The ith reward process is given byR(Xt(i)).

Page 253: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.3 Generalized Open Bandit Problems 245

At any calendar time t, under a policy π , suppose that arm i has been operatedfor T (t, i) times (∑d

i=1 T (t, i) = t, T (t, i)− T (t − 1, i) = 0 if arm i is idle andT (t, i)− T (t − 1, i) = 1 if it is operated at time t − 1), and is thus at state XT (t,i)(i), i = 1,2, . . . ,d. Instead of R(XT (t,i)(i)) alone, the rewards in this generalizedbandit problem are R(XT (t,i)(i)) multiplied by factors Q(XT (t, j)( j)) for j = i, whereQ is a nonnegative function of states. Hence the real reward is determined bythe states of all arms, rather than just the current activated arm i. Consequently,the value of a policy π is computed by

v(π) = E

[∞

∑t=0

β td

∑i=1

∏j =i

Q(XT (t, j)( j))

R(XT(t,i)(i))T (t + 1, i)−T(t, i)

].

(6.41)

For fixed arm i, let T be the set of all positive F (i)-stopping times and define

T ′ = τ ∈ T : E[Q(X0(i))−β τQ(Xτ(i))]< 0.

Then define the index at the current time (taken as time 0 without loss of generality)

G0(i) = esssupτ∈T

E[

∑τ−1t=0 β tR(Xt(i))

∣∣F0(i)]

E[Q(X0(i))−β τQ(Xτ(i))|F0(i)], (6.42)

where T = T if T ′ = ∅ or T = T ′ otherwise (the indices in any time t canbe similarly defined). Nash (1980) claimed that the optimal policy is to play arm iother than arm j if either sgn(G0(i))< sgn(G0( j)) or sgn(G0(i)) = sgn(G0( j)) andG0(i)> G0( j), where in the case of G0(i) = 0, sgn(G0(i)) is defined as 1 if T ′ =∅or −1 otherwise. This result is proved by Nash using an interchange argument.

We can reformulate Nash’s model as one similar to the typical closed banditproblem as explained below. Define a new reward function R(x) = R(x)/Q(x) forx ∈ R+. Let

V0(i) = 0 and Vt(i) = 1+[logQ(Xt(i))− logQ(Xt−1(i))]/ logβ for t > 0,

and introduce

Dt(i) =t

∑l=0

Vl(i) and Dt =d

∑i=1

DT (t,i)(i).

Then (6.41) can be rewritten as

v(π) =d

∏j=1

Q(X0( j))E

[∞

∑t=0

β Dtd

∑i=1

R(XT (t,i)(i))T (t + 1, i)−T(t, i)]. (6.43)

Since ∏dj=1 Q(X0( j)) is independent of the policy π , the performance measure

v(π) is the same as that of a Markov bandit problem with reward R and durationsVt(i), i = 1,2, . . . ,d, t = 0,1, . . .. For an arm of type u, denote by Ωu the support

Page 254: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

246 6 Multi-Armed Bandit Processes

of its states in the sense that with probability 1, Ωu is the smallest set such thatPr(⋂∞

t=1(Xt ∈ Ωu)) = 1. If maxx,y∈Ωi Q(x)/Q(y)≤ 1/β , or equivalently,

minQ(x) : x ∈ ΩimaxQ(x) : x ∈ Ωi

≥ β , (6.44)

then Vt(i) > 0 for all i and t, and hence the generalized bandit model can be re-duced to the one with Markov structure and positive durations discussed in Sect. 6.2(which in fact agrees with the semi-Markov bandits) and the Gittins indices for thismodel thus coincides with (and can be deduced by) that in (6.42). This correspon-dence has been pointed out by Gittins (1989) and Glazebrook and Owen (1991).If (6.44) fails, however, some Vt(i) may take strictly negative values with a positiveprobability so that in the induced model, a pull of arm i at time t has the effect ofreversing time as explained before. This is the most interesting part of the Nash’smodel. In such a case, the classical result in closed bandit problems cannot producea solution for Nash’s problem, and the solution claimed by Nash (1980) appearsinvalid. We provide a counterexample below, in which (6.44) is not satisfied andthe Gittins index policy claimed by Nash (1980) fails to optimally solve the banditproblem.

Example 6.2. Consider a closed bandit problem with two deterministic arms. Arm 1always has a fixed state, say 0, and thus a fixed reward R(0)=m with Q(0) = 1. Arm2 is initiated at state 1 and subject to deterministic state transition 1 → 2 → · · · withQ(u) = β−2(u−1) and the reward sequence R(u) = α2(u−1), u = 1,2, . . . , for somefixed α ∈ (0,

√β ). Clearly, the Gittins index for arm 1 is positive and the Gittins

index for all states of arm 2 are negative because for any stopping time τ > 0,

Q(u)−β τQ(u+ τ) = β−2(u−1)−β τβ−2(u+τ−1) = β−2(u−1)(1−β−τ)< 0.

According to Nash (1980), the “optimum” is the Gittins index rule that operates arm2 at all times 0,1,2, . . . , which gives the total discounted reward as ∑∞

t=0 β−tα2t =β/(β −α2). On the other hand, a policy that operates arm 1 all the time wouldhave a total reward m/(1−β ). Therefore, if m/(1−β )> β/(β −α2), which canbe easily achieved by taking a sufficiently large m, then the Gittins index rule is notoptimal.

This example can also be converted to the time-revisable model with a constantduration V = −1 (cf. (6.43)), which violates the condition of Proposition 6.1 sinceE[βV ] = β−1 > 1.

To avoid the situation in Example 6.2, Nash’s theorem should be modified tothe following narrower version. Let F (i) = Ft(i) : t = 0,1, . . . denote the naturalfiltration.

Theorem 6.3. If E[Q(Xi0(u))− β τ Q(Xi

τ(u))|u] > 0 for any state u, F (i)-stoppingtime τ and i = 1,2, . . . ,d, then the generalized bandit can be optimally operatedunder the Gittins index rule with the indices defined by

Mi(u) = supτ>0

E[

∑τ−1t=0 β tR(Xi

t )∣∣Xi

0 = u]

E[Q(Xi0(u))−β τQ(Xi

τ(u))|Xi0 = u]

. (6.45)

Page 255: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.3 Generalized Open Bandit Problems 247

6.3.2 Extension of Nash’s Model

Now we return the meaning of “state” back to what we used in Sects. 6.2.1 and 6.2.2.Here we discuss a straightforward extension of the model in Nash (1980) to thegeneralized branching bandit problems by applying Theorem 6.2, which covers themodel analysis by Crosbie and Glazebrook (2000) for finite arm types. Comparedwith the branching bandits discussed in Sects. 6.2.1 and 6.2.2, without loss ofgenerality, we can take V (u)= 1 and treat the Markov model without extra durations.Any branching with positive durations can be easily translated to the one we are todiscuss here. In this model, at any time t with state nt governed by a policy π , if theserver operates an arm u = uπ

t ∈ A(nπt ), a discounted reward β t∏v∈A(nt )[Q(v)]n(v)

R(uπt )/Q(uπ

t ) is collected and the state evolves to nt − e(u)+w(u). Note thatR(uπ

t )/Q(uπt ) takes the place of instant reward with β t∏v∈A(nt)[Q(v)]n(v) as the dis-

counting factor. Define

Q(n) = ∏v∈A(n)

[Q(v)]n(v).

In particular, Q(e(u)) = Q(u). Since at the next time point t +1, the discount factoris changed to β t+1Q(nt −e(u)+w(u)) = β t+1∏v∈A(nt−e(u)+w(u))[Q(v)]n(v), we have

β t+1Q(nt − e(u)+w(u))β t Q(nt − e(u)+w(u))

=β t+1∏v∈A(nt−e(u)+w(u))[Q(v)]n(v)

β t∏v∈A(nt )[Q(v)]n(v)

Q(u) ∏v∈A(w(u))

[Q(v)]n(v).

Therefore, this generalized branching bandit problem corresponds to a semi-Markovbranching bandit model with durations

V (u) = 1+1

logβ

[

∑v∈A(w(u))

n(v) [logQ(v)− logQ(u)]

].

To define the Gittins indices, consider the branching bandit process initiated by asingle arm of type u. It is easy to see that the Gittins index for this arm can bedefined by

M(u)g = supπ ,τ>0

Eπ[

∑τ−1t=0 β tRt

∣∣u]

Eπ [Q(u)−β τQ(uτ)|u],

where the superscript “g” stands for generalized bandit branching bandit and thesupremum is taken over all policies π and all positive F π -stopping times τ . Thuswe have the following theorem.

Theorem 6.4. Suppose that Eπ [Q(u)− β τ Q(uτ)|u] > 0 for any branching banditwith initial state e(u) under any policy π and F π(e(u))-stopping times τ . Thenthe Gittins index rule is optimal for the generalized branching bandit problem justdescribed.

Page 256: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

248 6 Multi-Armed Bandit Processes

6.4 Closed Multi-Armed Bandit Processes in Continuous Time

6.4.1 Problem Formulation and Its Solution

A d-armed bandit in continuous time setting is constructed as follows: There are dadopted processes (Xk,F k),1 ≤ k ≤ d, on a common probability space (Ω ,F ,Pr)with Xk = Xk

t , t ∈ R+, which fulfill the following technical conditions:

1. The accumulated information F k = F kt ;t ∈ R+,1 ≤ k ≤ d, are mutually ind-

ependent filtration (i.e., F k∞,k = 1, . . . ,d are mutually independent σ -algebras),

satisfy the usual condition of right-continuity and completeness.2. For each arm k, the reward rate process Xk

t is F k-progressive, indicating thereward rate at time t if it is played alone, such that the accumulated discountedreward for continuous playing arm k in time [0, t] is

∫ t0 e−δuXudu, satisfying the

integrability condition

E[∫ ∞

0e−δuXudu

]< ∞. (6.46)

A policy T = Tt , t ∈ R+ = (T 1t ,T

2t , . . . ,T

dt ), , t ∈ R+ is a d-dimensional

stochastic process with T kt modeling the total amount of time that T spends on arm

k during the first t units of time. Formally, we have the following definition.

Definition 6.2. A d-dimensional function T of t is a policy for the bandit problemin continuous time if

(1) T0 = 0 and Tt is componentwise nondecreasing in t,(2) T 1

t +T 2t + · · ·+T d

t = t; and(3) Tt ≤ s ∈ Fs = F 1

s1 ∨ F 2s2 ∨ · · · ∨ F d

sd for all t ≥ 0 and s ∈ R+d , the d-dimensional orthant.

Here condition (3) captures the non-anticipative nature of the policy (Mandel-baum 1987; Kaspi and Mandelbaum 1998): It does not depend on information be-yond F k

sk ,1 ≤ k ≤ d, that more than sk units have been allocated to arm k.Under a policy T , the successive operations of all d arms yield a total expected

reward

v(T ) = E

[d

∑k=1

∫ ∞

0e−δ tXk

T kt

dT kt

], (6.47)

where the integration is computed over time horizon t. If we denote the right-continuous inverse of T k

t by

ξ ku = supt : T k

t = u (or left continuous inverse inft : T kt = u) (6.48)

to represent the calendar time when arm k has been operated for u units of time, theobjective function can be rewritten as

v(T ) = E

[d

∑k=1

∫ T k∞

0e−δξ k

u Xku du

]. (6.49)

Page 257: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.4 Closed Multi-Armed Bandit Processes in Continuous Time 249

Apparently ξ ku and ξ k

u − u are both nondecreasing (the latter actually indicates thetime the machine has been allocated to other arms than arm k). The problem is nowagain to find an optimal policy T such that

v(T ) = esssupT

v(T ). (6.50)

Note that condition (2) of Definition 6.2 implies that each component of Tt isLipschitz and thus absolutely continuous and there exists a d-dimensional functionon the nonnegative real line R+, say f = ( f 1, f 2, . . . , f d), such that

Tt =∫ t

0f (s)ds =

(∫ t

0f 1(s)ds,

∫ t

0f 2(s)ds, . . . ,

∫ t

0f d(s)ds

). (6.51)

This function f is hereafter referred to as processing rate and may be between 0 and1. In order to fix a policy T , it then suffices to identify the processing rate f suchthat f 1(t)+ f 2(t)+ · · ·+ f d(t) = 1.

The objective of maximizing the value V (T ) in (6.47) turns out to be solvedby the Gittins index rule as described below. The Gittins indices of an arm can bedefine by

Gku = esssup

τ>u

E[∫ τ

u e−δ tXkt dt∣∣F k

u]

E[∫ τ

u e−δ tdt∣∣F k

u] , (6.52)

where the esssup is taken over all F k-stopping times τ > u. Clearly, Gku is

F k-adapted. The Gittins index Gku can also be equivalently defined as is an F k-

adapted process in u such that

Gku = essinf

Y ∈ F k

u : esssupτ>u

E[∫ τ

ue−δ t

(Xk

t −Y)

dt∣∣∣∣F

ku

]= 0, (6.53)

and

1δ Gk

u = essinf

Y ∈ F ku :

1δ Y = esssup

τ>uE[∫ τ

ue−δ (t−u)Xk

t dt +1δ Ye−δ (τ−u)

∣∣∣∣Fku

].

(6.54)

These two definitions connect the Gittins indices with the problems of optimalstopping times discussed in Sect. 5.5.

From the definition, one might think that a Gittins index process could be right-continuous. But the following simple example shows that it is not generally the case.

Example 6.3.

(1) Consider Ω = ω1,ω2 with probability Pr(ω1) = p = 1−Pr(ω2), and a stoch-astic reward rate process Xt with Xt(ω1) = 0 and Xt(ω2) = 5. The natural filtra-tion Ft = σ(Xu;u ∈ [0, t]) is equal to ∅,Ω at t = 0 and 2Ω if t > 0, hence Fis not right-continuous. The Gittins index process is given by

G0 = 5 and Gt(ω) =

0 if ω = ω15 if ω = ω2

,

which is discontinuous at time 0 with a positive probability.

Page 258: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

250 6 Multi-Armed Bandit Processes

(2) Take the reward rate to be a deterministic function of the time passed, that is,Xt = ∑∞

n=0 I[2−2n,2−2n+1)(t). Then the Gittins index process is

Gt =

⎧⎨

⎩1 t ∈ [2−2n,2−2n+1)1

2− 22ntt ∈ [2−2n−1,2−2n)

, n = 1,2, . . . .

Apparently, G0 = liminft→0

Gt = 2/3 and limsupt→0

Gt = 1. Hence Gt does not have a

right-limit at t = 0. This example also shows that Gittins indices are not gener-ally attainable.

As in the discrete time case, we introduce the lower envelope of Gkt as

Gkt = inf

u∈[0,t]Gk

u,

which is F k-adapted, but not generally right-continuous. Because we below onlyinvolve the Lebesgue integral of Gk

t , we can use its right-continuous modification,which is also F k-adapted because of the right-continuity of F k.

Definition 6.3. If At is a continuous nondecreasing function of t, we say that t is anincreasing point of At and write dAt > 0 if At < Au for all u > t. We further writedAt < dt when t −At is increasing at t.

Theorem 6.5. An allocation policy T =(T 1t , T

2t , . . . , T

dt ) is optimal to problem (6.50)

if and only if almost surely,

(i) the policy follows the leading Gittins index, i.e.,

dT kt > 0 =⇒ Gk

Tkt= max

1≤q≤dGq

T qt,

and(ii) whenever arm k is not engaged full-time, its Gittins index is at an all-time low:

dT kt < dt =⇒ Gk

T kt= inf

u∈[0,Tkt ]

Gku.

At this point, we can give some explanations for why we need the machine tobe engaged simultaneously by multiple arms. First, if we prohibit the simultaneousallocation of the machine resource to multiple arms, then an index policy may beinapplicable. For example, consider the situation where two identical arms are to beprocessed and the reward rate is strictly decreasing in time t the arm has been oper-ated. Clearly, the Gittins index is just the reward rate so that it is impossible to oper-ate the two arms following the highest Gittins index without allowing simultaneousallocation of the machine. Second, there may exist no optimum in the class of ex-clusive policies, i.e., operating only one arm at a time. For example, in the situationjust discussed, it follows from Sect. 6.4.2 that the optimizer simultaneously operatesthe two arms with the same processing rate 1/2. Because this optimal policy can be

Page 259: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

6.4 Closed Multi-Armed Bandit Processes in Continuous Time 251

approximated by a sequence of exclusive policies which alternatively process thetwo arms in sufficiently small time intervals, it is clear that no optimizer exists inthe class of exclusive policies. These two points make the multi-armed bandit pro-cesses in continuous time fundamentally different from those in discrete time, inwhich the optimal policy processes arms one by one. Under certain circumstances,it might not be practically feasible to simultaneously allocate a common resource todifferent arms, and hence in such cases, one can only seek approximately optimalpolicies in practice.

6.4.2 An Account for Deteriorating Bandits

The various proofs of optimality of Gittins index policies are quite difficult to follow.We here only take the deteriorating bandits as an example to show the optimality ofGittins index policies. This in fact also provides a fundamental step towards the finalsolution for general bandit problems.

An arm is said to be deteriorating if its reward rate paths are nonincreasing intime and a bandit is deteriorating if all its arms are. In this case, the optimal policyis myopic in the sense that it plays the arms with the highest immediate reward rate.

Let Xkt be deterministic nonincreasing and right-continuous functions of t and T

a policy. Under T , the total discounted reward is

v(T ) =d

∑k=1

∫ ∞

0e−δ tXk

Tkt

dT kt =

∫ ∞

0e−δ t

d

∑k=1

XkT k

tf kt dt, (6.55)

where the reward rate Xkt is supposed to be right-continuous and f k(t) ≥ 0 are the

almost everywhere derivatives of T kt ,k = 1,2, . . .d, such that ∑d

k=1 f k(t) = 1. Thenv(T )≤ v(T ), where T is arbitrary and T a policy following the leading reward rate.This is proved as follows.

Write gku = supt : Xk

t > u = inft : Xkt ≤ u for the right-continuous inverse

of Xkt , which models the time needed to operate the arm such that its reward rate

falls down to a level no more than u. Thus, in order that all the arms can fall downto level u in their reward rates, one needs to spend a total time gu = ∑d

k=1 gku on the

d arms. We first examine the equality

d

∑k=1

T kt ∧gk

u = t ∧ gu (6.56)

over the set of t at which all XkTk

t, k = 1,2, . . . ,d, are continuous. Since ∑d

k=1 T kt = t

and gu = ∑dk=1 gk

u, it suffices to show that there exists no pair (k, p) of identifierssuch that

T pt > gp

u and T kt < gk

u (6.57)

Page 260: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

252 6 Multi-Armed Bandit Processes

at every time instant t at which both XkTk

tand X p

T pt

are continuous. We prove it by

contradiction. If (6.56) holds, then X pT p

t≤ u < Xk

Tkt

. Define

τ = sups : s < t,X pT p

s≥ Xk

T ks= infs : s ≤ t,X p

T ps< Xk

Tks

with the convention sup∅= 0. Because X pT p

s< Xk

Tks

for all s ∈ (τ, t], the feature of Tfollowing the leader indicates that

T pt = T p

τ . (6.58)

If τ = 0 then T pt = 0 ≤ gp

u , contradicting (6.57). On the other hand, if τ > 0, finda sequence of nonnegative sequence αn → 0 such that X p

T pτ−αn

≥ XkTk

τ−αn≥ Xk

Tkt> u;

if X pT p

τ≥ Xk

T kτ

, then αn all take value 0. Therefore, T pτ−αn

< gpu . Setting n → ∞ and

using (6.58) lead to T pt = T p

τ ≤ gpu , contradicting (6.57) again. Thus (6.56) is proved.

We now turn to checking the optimality of the policy T . For any policy T ,

d

∑k=1

(T k

t ∧gku

)≤

d

∑k=1

T kt ∧

d

∑k=1

gku = t ∧ g =

d

∑k=1

(T k

t ∧gku

). (6.59)

Thus, by (6.55),

v(T ) =d

∑k=1

∫ ∞

0e−δ tXk

T kt

dT kt =

d

∑k=1

∫ ∞

0e−δ t

∫ ∞

0I(0<u<Xk

Tkt)dudT k

t .

By Fubini’s theorem,

v(T ) =d

∑k=1

∫ ∞

0

∫ ∞

0e−δ t I(T k

t <gku)

dT kt du =

d

∑k=1

∫ ∞

0

∫ ∞

0e−δ td

(T k

t ∧gku

)du.

Further using the partial integration and the inequality in (6.59) yields

v(T) = δ∫ ∞

0

∫ ∞

0e−δ t

d

∑k=1

(T k

t ∧gku

)dtdu ≤ δ

∫ ∞

0

∫ ∞

0e−δ t

d

∑k=1

(T k

t ∧gku

)dtdu = v(T ).

This proves the desired result.

Page 261: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 7Dynamic Policies

This chapter is devoted to optimal dynamic policies. Section 7.1 discusses differ-ences between optimal static and dynamic policies with emphasis on the impactsof different levels of information utilization in the stochastic scheduling framework.Section 7.2 treats optimal policies in the class of restricted dynamic policies forstochastic scheduling problems subject to random machine breakdowns under thetotal-loss model. Section 7.3 discusses the optimal restricted dynamic policies forno-loss breakdown models. Section 7.4 deals with partial-loss breakdown models.Its focus is on restricted dynamic policies, but optimal static and nonpreemptive dy-namic policies are also presented as by-products. The restricted dynamic policies inSects. 7.2–7.4 show the applications of Gittins index theory to stochastic schedul-ing. Section 7.5, on the other hand, is dedicated to unrestricted dynamic policies forparallel machine scheduling with exponentially distributed processing times, withoptimal polices to be obtained by means of general Markovian decision processes.

7.1 Dynamic Policies and Information

Stochastic scheduling is a decision-making process that, by making as exhaustive aspossible utilization of information up to date, allocates the machine resource to servejob processing, subject to certain technical constraints, so as to optimize certain per-formance measures. Here information includes prior knowledge on the features ofthe system such as machines and jobs in terms of their probability distributions andthe technical conditions may be, for example, the interruptability of job processing,precedence of job processing or some other practical constraints on the policies thatcan be recruited. The time points the scheduler can make decisions are referred toas decision epochs.

One of the most important differences between deterministic and stochasticscheduling exists in the information release process. In deterministic situations,complete information is available at the beginning and no new information willbe delivered by job processing so that the decision maker can arrange every thing

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 7,© Springer Science+Business Media New York 2014

253

Page 262: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

254 7 Dynamic Policies

a priori. For stochastic scheduling problems, on the other hand, the information isgenerally released progressively in the processes of operations and thus adjustmentsof the processing arrangement are necessary at decision epochs when the newlyreleased information indicates that the currently undergoing process will no longerbe optimal. For example, consider a job with processing time P whose cdf isF(x) = Pr(P ≤ x). At any time instant, if the job has not yet been finished after tunits of processing time, then the information released is “X > t”, and the processingtime of the job now follows a new conditional distribution

Pr(P ≤ x|P > t) =Pr(t < P ≤ x∨ t)

Pr(P > t)=

F(x∨ t)−F(t)1−F(t)

,

which differs from F(x) except for exponential and geometrical distributions withthe memoryless property. If at some time point t, the conditional distribution showsa feature of longer remaining processing time than the other jobs waiting forprocessing, then it will not be optimal to continue the current one. This showsthat, with the evolution of the processing, one gets more and more knowledge onunrealized random variables such as the processing times, the lifetimes of ma-chines, and so on. The knowledges is generally expressed in terms of conditionaldistributions, and must be taken into account in adjusting the job processing so asto achieve optimal results. Because the information is progressively collected, a de-cision maker needs to revise his decision from time to time based on up to dateinformation. This shows the dynamic feature of the decision-making process.

In the context of scheduling, the constraints may include the interruptablity ofjob processing, maintenance of machines at breakdowns, set up of machines on jobswitching, etc. If there is no constraint, then every time point is a decision epoch.The presence of constraints limits the decision epochs. For example, the decisionmaker needs to decide if he should switch the machine to another job at every timeif the job is interruptable, but he only needs to make decision on the next job at thecompletion time of the current job if it is not interruptable. Generally, a job beingprocessed is said to be “preemptive” if it can be pulled off the machine before it iscompleted.

Recall the classification of scheduling policies introduced in Sect. 1.3.3. A non-preemptive dynamic policy determines which job to process at the time the machinesare setup or when a job is completed. No job can be preempted under such a policy.It is a common observation (see Pinedo 2002) that in many circumstances, espe-cially when the job processing times are independent, the optimal nonpreemptivedynamic policy is a static sequencing policy. This however does not hold generally.The following example shows that, while a nonpreemptive dynamic policy coin-cides with a static policy in some special cases, it behaves better than static policiesin other cases.

Example 7.1 (Static list vs nonpreemptive dynamic policies). Consider a single-machine stochastic scheduling problem of n jobs to maximize the expected totalweighted discounted rewards E

[∑n

j=1 wje−rCj

]with mutually independent stochastic

processing times P1,P2, . . . ,Pn, where r > 0 is the discount rate, Cj and wj are

Page 263: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.1 Dynamic Policies and Information 255

respectively the completion time and weight of job j. It is commonly known (see,e.g., Pinedo 2002) that the nonincreasing order of wjE[e−rP j]/(1−E[e−rP j]) isoptimal in not only the class of static list policies but also the larger class of nonpre-emptive dynamic policies.

Next consider another problem in which the machine needs a setup time S withdistribution Pr(S = 2) = Pr(S = 4) = 0.5. There are two jobs with deterministicprocessing times p1 = 1 and p2 = 2, and the objective is to minimize the expectedcost Π = E [ f1(C1)+ f2(C2)], where f1 and f2 are two deterministic nondecreasingfunctions. The optimal static policy is a fixed sequence of the two jobs that can bedetermined a priori according to the objective function and the distribution of S. Forthe unrestricted dynamic policies, however, it is apparent that after the realizationof the setup time, the optimal sequence of the two jobs will generally depend on therealized setup time if f1 = f2.

The next two examples show the distinctions between restricted and unrestrictedpolices.

Example 7.2 (Periodically examined (restricted) dynamic policies). Consider againthe problem of maximizing the expected total weighted discounted rewards inExample 7.1. Now the processing times Pi are independent random variables takinginteger values with probability pi j = Pr(Pi = j), j ∈N. The processing is period-ically examined so that the scheduler can make his decision on every integer timepoints n = 0,1,2, . . . . This setting can be put in the framework of bandit process indiscrete time. To be specific, the processing of every job i can be associated with aMarkov process with state space S =N∪∗, where ∗ indicates the particular stateof completion, and transition law

qik,k+1 =

∑∞s=k+2 pis

∑∞s=k+1 pis

, qik∗ =

pi,k+1

∑∞s=k+1 pis

and qik j = 0, j ∈N− k+ 1.

Then for every job i, if it has been processed for t units of time, one can associate itwith Gittins index Gi

t and the optimal policy selects a job with the highest index toprocess one unit of time. By formula (6.8), the Gittins index Gi

t can be computed by

Git = wi esssup

τ

E[

∑τs=t+1 e−rPi IPi=s

∣∣Pi > t]

E[∫ τ

t e−rudu∣∣Pi > t

] , (7.1)

where τ is a stopping time, which can be limited in the class of stopping timesρn = minPi,n : n > t. Therefore,

Git = wi esssup

n

∑ns=t+1 e−rs pis∫ n

t ∑∞s=[u] pise−rudu

, (7.2)

where [u] denotes the floor (integer part) of u. In particular, if Pi is geometricallydistributed with pin = Pr(Pi = n) = (1− e−λi)e−(n−1)λi ,λi > 0,n = 1,2, . . ., then

Git = rwi

e−(r+λi)

1− e−r = Gi (7.3)

is independent of the state t. Thus it is optimal to schedule the jobs nonpreemptivelyaccording to Gi.

Page 264: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

256 7 Dynamic Policies

Example 7.3 (Unrestricted dynamic policies). Consider the same problem as in thepreceding example but here the processing times Pi are arbitrarily distributed andthe scheduler can adjust his processing decision at any time point. By the theory ofbandit process in continuous time (Sect. 6.4), for a policy to be optimal, the machineneeds to be sharable simultaneously by all the jobs. We here only demonstrate thecomputation of the Gittins index and thus the job identifier is suppressed. Supposethat a job with processing time P has been processed for t units of time and remainsunfinished. Further suppose that a reward w will be collected on the completion ofthe job. In this simple case, Ft is the σ -algebra generated by the event P > t, i.e.,if P > t, then conditioning on Ft is the same as conditioning on P > t. Thus theGittins index is

Gt = wesssupτ>t

E[e−rPI(P=τ)|P > t

]

E[∫ τ

t e−rudu|P > t] , (7.4)

where τ is an Ft -stopping time. Due to Theorem 5.8, it is not a difficult exercise toshow that any τ can be expressed as τ = P∧ x for some x > t. Thus

Gt = wsupx>t

E[e−rPI(P≤x)|P > t

]

E[∫ P∧x

t e−rudu|P > t] = wsup

x>t

∫ xt e−rudF(u)∫ x

t [1−F(u)]e−rudu, (7.5)

where F is the cumulative distribution function of processing time P. Similar tothe preceding example, if the processing times are exponentially distributed, wecan have a nonpreemptive policy that is optimal in the class of unrestricted dynamicpolicies and does not require to share the machine among jobs. When F is absolutelycontinuous with density function f (x), with h(x) being the hazard function of P :h(x) = f (x)/[1−F(x)], then the Gittins index can be rewritten as

Gt = wsupx>t

∫ xt e−(ru+

∫ ut h(s)ds)h(u)du

∫ xt e−(ru+

∫ ut h(s)ds)du

. (7.6)

It can be easily checked by differentiation in x that the ratio in (7.6) is increasing(decreasing) in x if and only if

∫ x

te−(ru+

∫ ut h(s)ds)h(u)du ≤ (≥) h(x)

∫ x

te−(ru+

∫ ut h(s)ds)du. (7.7)

This can be used to compute the Gittins index Gt . Particularly, if condition (7.7)holds with ≤ (P has an increasing hazard function), then

Gt = w∫ ∞

t e−rudF(u)∫ ∞t [1−F(u)]e−rudu

, (7.8)

and if (7.7) holds with ≥ (P has a decreasing hazard function), then

Gt = wf (t)

1−F(t)= wh(t). (7.9)

Page 265: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 257

Otherwise, the maximum in (7.6) can be attained at some point x0. In this case, (7.7)holds with equality and thus Gt = wh(x0).

Furthermore, the second derivative indicates that h′(x0)≤ 0. To summarize, if wedenote h(∞) =

∫ ∞t e−rudF(u)/

∫ ∞t (1−F(u))e−rudu and

A = t,∞⋃

x ≥ t : h′(x)≤ 0 and∫ x

te−(ru+

∫ ut h(s)ds)(h(x)− h(u))du = 0

,

thenGt = wmax

x∈Ah(x). (7.10)

This formula states that the Gittins index is essentially the hazard of the processingtimes at some later time point, at which the hazard rate is decreasing.

For the class of restricted dynamic policies, the stochastic dynamic program-ming in discrete and continuous times provides fundamental tools to find optimalsolutions. Unfortunately, however, these dynamic programming approaches do notgenerally allow analytical solutions, and this makes the dynamic scheduling a muchmore difficult challenge. A celebrated exception is the well-known bandit processproblem which allows analytical solutions. As we have seen, even for this excep-tion, a practically executable policy may not exist; see the remark for continuoustime bandit process problems (cf. Sect. 6.4).

7.2 Restricted Dynamic Policies for Total-LossBreakdown Models

In this section, we focus on finding optimal restricted dynamic policies for single-machine scheduling subject to total-loss machine breakdowns (see Chap. 4 for therelevant definitions). Under such policies, the decision maker can switch betweenthe jobs only when a job is completed or a machine breakdown occurs. For thetotal-loss model with independent processing times, we investigate the optimal poli-cies under general cost functions. For the total-loss model with identical processingtimes, we only deal with the problem of maximizing the expected total discountedrewards.

7.2.1 Total-Loss Breakdown Model

We first discuss the problem of scheduling a set of n jobs subject to total-lossmachine breakdowns so as to minimize the expected general discounted cost (GDC):

Φ(π) = E[∫ ∞

0φ(Sπ (t))e−rtdt

], (7.11)

Page 266: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

258 7 Dynamic Policies

where• π is a restricted policy,• Sπ(t) (or S(t) in short below) indicates the set of jobs that have not been com-

pleted at time t (clearly, S(0) = 1,2, . . . ,n),• φ(·) is a nonnegative set function of subsets of N= 1, . . . ,n, which represents

the instantaneous cost rate of holding a set of jobs and has the obvious propertiesφ(∅) = 0 and φ(A)≤ φ(B) if A ⊂ B, and

• r is the discount rate.

Let Ci denote the completion time of job i. Then i ∈ S(t) if and only if Ci > t.Further denote the cardinality of S by m = m(S).

The objective function (7.11) covers many extensively studied special cases.Some examples are listed below:

1. If φ(S) = ∑i∈S wi, where wi represents the unit cost (or weight) of holding jobi, then the cost function in (7.11) becomes the expected weighted discounted cost(WDC)

Φ = E

[n

∑i=1

wi

∫ Ci

0e−rtdt

]=

r−1 ∑n

i=1 wiE[1− e−rCi] if r > 0E [∑n

i=1 wiCi] if r = 0.(7.12)

The case r = 0 corresponds to the expected weighted flowtime (WFT) and thecase r > 0 corresponds to the weighted discounted reward (WDR).

2. Consider the situation of operating a set of n testing jobs (cf. Trivedi, 2001), inwhich the process is planned to end once any k of the n jobs are completed. LetC(i) denote the completion time of the i-th completed job. Then the expected costtruncated at the k-th completed job (TKJ) is

Φ = E[∫ C(k)

0e−rtdt

]. (7.13)

This amounts to the case φ(S) = 1 when m(S)> n−k and φ(S) = 0 otherwise.3. As an example of min-max criteria, suppose that wi is the unit time holding

cost of job i and the objective is to minimize the maximum discounted holdingcost (MDC) among the incomplete jobs at any time, so that φ(S) = maxi∈S wi.Let w1 ≥ w2 ≥ · · · ≥ wn and C(i) be the completion time of the i-th completedjob and Cmax = C(n) be the total time the jobs occupy the machines. Then theobjective function becomes

Φ = E[∫ ∞

0maxi∈S

wie−rtdt]= E

[∫ ∞

0maxwi : Ci > te−rtdt

]. (7.14)

4. Suppose there is a constant holding cost as long as there remains any unfinishedjob (e.g., overhead cost such as rental of space and equipment). This correspondsto φ(A) = 1 for A =∅ and φ(A) = 0 for A =∅, so that

Page 267: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 259

Φ = E[∫ ∞

0I(Cmax > x)e−rxdx

]= E

[∫ Cmax

0e−rxdx

], r ≥ 0, (7.15)

which is the expected discounted makespan, another type of criterion extensivelydiscussed in the literature. In addition to minimizing makespan in most situations,an interesting scenario can be found in Weiss (1984), in which the scheduling isto maximize the makespan.

For stochastic scheduling model with total-loss machine breakdowns, recall thatfor every job i, i = 1, . . . ,n, the random variable

Ti = mink ≥ 1 : Yik ≥ Pik (7.16)

indicates the frequency that the machine repeats the processing of job i before it iscompleted, and the occupying time Oi is the total amount of time that job i occupiesthe machine:

Oi = Pi,Ti +Ti−1

∑k=0

(Yik +Zik), (7.17)

where Yi0 ≡ 0 and Zi0 ≡ 0. To ensure the problem properly defined, we assume

Pr(Yi < Pi) = 1, i = 1, . . . ,n, (7.18)

(otherwise the job will never be completed). In addition, Pr(Yi = 0) = 1 is assumedfor each i to avoid trivial cases.

To derive the optimal restricted dynamic policy, it is crucial to compute E[e−rOi ](the Laplace transform of Oi) and E

[∫ Oi0 e−rtdt

](the integral transform of Oi). The

latter reduces to the expectation E[Oi] when r = 0. The following two lemmas corre-spond to the results on these expectations under independent processing time modeland identical processing time model respectively. Note that though E[e−rOi ] has beenobtained in Theorem 4.10, the following results also include E

[∫ Oi0 e−rtdt

]with a

more concise proof.

Lemma 7.1. Under the total-loss model with independent processing times,

E[e−rOi ] =E[e−rPiI(Pi≤Yi)

]

1−E[e−r(Yi+Zi)I(Pi>Yi)

] (7.19)

and

E[∫ Oi

0e−rtdt

]=

E[∫ Pi

0 e−rtdtI(Pi≤Yi) +∫ Yi+Zi

0 e−rtdtI(Pi>Yi)

]

1−E[e−r(Yi+Zi)I(Pi>Yi)

] . (7.20)

As a result,

E[∫ Oi

0 e−rtdt]

wiE[e−rOi ]=

E[∫ Pi

0 e−rtdtI(Pi≤Yi) +∫ Yi+Zi

0 e−rtdtI(Pi>Yi)

]

wiE[e−rPi I(Pi≤Yi)

] . (7.21)

Page 268: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

260 7 Dynamic Policies

Proof. First we can express Oi = Pi1I(Pi1 ≤Yi1)+ I(Pi1 >Yi1)(Yi1+Zi1+O′i), where

O′i

d= Oi conditional on Pi1 > Yi1, and “ d

=” means “identically distributed”. Thus

E[∫ Oi

0e−rtdt

]=E[∫ Pi1

0e−rtdtI(Pi1 ≤ Yi1)

]+E

[∫ Yi1+Zi1

0e−rtdtI(Pi>Yi)

]

+E[∫ Oi

0e−rtdt

]E[e−r(Yi1+Zi1)I(Pi>Yi)

].

Solving E[∫ Oi

0 e−rtdt]

from the above equation yields (7.20) under condition (7.18).Equation (7.19) then follows from (7.20) since E[e−rOi ] = 1− rE[

∫Oi0 e−rtdt].

Parallel to Lemma 7.1, we have the following lemma on E[e−rOi ] and E[∫ Oi

0 e−rtdt]

with identical processing times, whose proof is similar to that of Lemma 7.1.

Lemma 7.2. Under the total-loss model with identical processing times,

E[e−rOi

]= E

[E[e−rPi I(Yi≥Pi)|Pi

]

1−E[e−rYi+Zi I(Yi<Pi)|Pi

]]

and

E[∫ Oi

0e−rtdt

]= E[

E[∫ Pi

0 e−rtdtI(Pi≤Yi) +∫ Yi+Zi

0 e−rtdtI(Pi>Yi)

∣∣Pi]

1−E[e−r(Yi+Zi)I(Pi>Yi)|Pi

] ].

The next lemma will be useful in the calculation of Gittins index in Sect. 7.2.3 below.

Lemma 7.3. Under the identical processing time model,

E[e−rOi IOi≤∑m

j=0 τi j

]= E

[1−Em

[e−rτi I(Yi<Pi)

∣∣Pi]

1−E[

e−rτi I(Yi<Pi)

∣∣Pi] SYi(Pi−)e−rPi

],

where SYi(x) = 1−FYi(x), FYi(x) is the cdf of Yi, and τi j = Yi j +Zi j.

Proof. We first note that the event Oi ≤ ∑mj=0 τi j is equivalent to Ti ≤ m, which

results in

E[e−rOi IOi≤∑m

j=0 τi j

]=

m

∑j=1

E[e−rOi ITi= j]. (7.22)

Conditional on Pi,

E[e−rOi ITi= j|Pi] = E

[e−r(

∑ j−1l=0 τi j+Pi

) j−1

∏j=0

IYil<PiIYik≥Pi

].

Page 269: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 261

Writing A = [1−FYi(Pi−)]e−rPi , we get

E[e−rOi ITi= j] = E

[AE

[k−1

∏j=0

e−rτi j IYi j<Pi

∣∣∣∣∣Pi

]]= E

[AEk−1[e−rτi IYi<Pi

∣∣Pi]].

(7.23)Substituting (7.23) into (7.22) yields

E[e−rOi IOi≤∑m

j=0 τi j

]=

m

∑j=1

E[AEk−1 [e−rτi IYi<Pi

∣∣Pi]]

= E

[1−Em[e−rτi IYi<Pi

∣∣Pi]

1−E[e−rτiIYi<Pi∣∣Pi]

SYi(Pi−)e−rPi

]

The lemma is therefore proved.

7.2.2 Optimal Policies with Independent Processing Times

This subsection is dedicated to the total-loss model with independent processingtimes. We first establish a general optimality equation under the cost measure ofgeneralized discounted cost (GDC) by converting the dynamic optimal problem forthis model to a semi-Markov decision process and then deduce from it the optimaldynamic policies under a number of specific criteria. Let N = 1, . . . ,n be the setof the n jobs under consideration. At any decision epoch, the subset S= i1, . . . , imof uncompleted jobs can define the state of the system. Take 2N, the collection ofall subsets of N, as the state space. Then at any decision epoch with state S =i1, . . . , im ∈ 2N, the restricted dynamic policy is to select a job, say ik, from S tobe processed next. At a later point of time when the job is completed or the machinebreaks down, the decision maker will be at another decision epoch and the state willmove to S− ik if the job is completed, or remain at S otherwise. Consequently,finding an optimal dynamic policy is equivalent to solving the following equation:

Ψ(S) = minj∈S

φ(S)E

[∫ O j

0e−rtdt

]+E[e−rOj ]Ψ (S− j)

, (7.24)

and the index j∗ attaining the minimum in (7.24) corresponds to the job to beprocessed next under the optimal dynamic policy.

In general, the optimality equation (7.24) cannot be analytically solved and mayneed numerical techniques. We analyze the following four interesting cases forwhich we can obtain a closed form of the optimal policies.

Case 1: Expected weighted discounted costThe following theorem provides the optimal policy to minimize the expectedweighted discounted cost (EWDC):

Page 270: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

262 7 Dynamic Policies

EWDC = E

[n

∑i=1

wi

∫ Ci

0e−rtdt

]. (7.25)

Theorem 7.1. The EWDC defined in (7.25) is minimized dynamically by the policythat processes at each decision epoch the job with the highest value of

wiE[e−rOi ]

E[∫ Oi

0 e−rtdt], (7.26)

which can be calculated by (7.21)

Proof. Let S= 1,2, . . . ,m be an arbitrary state, sorted in nondecreasing order of(7.26). Define

Ψ(S) =m

∑i=1

wiE[∫ O1+···+Oi

0e−rtdt

]. (7.27)

It suffices to show that the above Ψ (S) satisfies (7.24) with the minimum attainedat j = 1. By (7.27), for j ∈S,

Ψ(S− j) =j−1

∑i=1

wiE[∫ O1+···+Oi

0e−rtdt

]+

m

∑i= j+1

wiE[∫ O1+···+Oi−Oj

0e−rtdt

]

(7.28)

(where ∑ba = 0 if b < a). Let A( j) be the term in (7.24) to be minimized with respect

to j and φ(S) = ∑mi=1 wi. Then by (7.24) and (7.28),

A( j) =j−1

∑i=1

wiE[∫ O1+···+Oi+Oj

0e−rtdt

]+wjE

[∫ Oj

0e−rtdt

]

+m

∑i= j+1

wiE[∫ O1+···+Oi

0e−rtdt

]. (7.29)

Taking the difference between (7.27) and (7.29), we get

A( j)−Ψ(S) =j−1

∑i=1

wiE[∫ O1+···+Oi+Oj

O1+···+Oi

e−rtdt]+wjE

[∫ Oj

0e−rtdt −

∫ O1+···+Oj

0e−rtdt

]

=j−1

∑i=1

(wiE[e−rOi

]E[∫ Oj

0e−rtdt

]−wjE

[e−rOj

]E[∫ Oi

0e−rtdt

])

·E[e−r(O1+···+Oi−1)

]. (7.30)

Since

E[∫ Oj

0 e−rtdt]

wjE[e−rOj

] ≥E[∫ Oi

0 e−rtdt]

wiE[e−rOi

]

Page 271: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 263

for j ≥ i, (7.30) shows that A( j) ≥ Ψ(S) for all j ∈ S and the equality in (7.30)holds when j = 1. Therefore min j∈S A( j) = A(1) =Ψ(S).

This theorem can also be deduced from the Gittins index theory of bandit pro-cesses based on Cai et al. (2005). Note that

E[∫ Oi

0 e−rtdt]wiE[e−rOi ]

=

⎧⎨

E[Oi]/wi r = 01−E[e−rOi]rwiE[e−rOi ]

r = 0.

Hence Theorem 7.1 leads immediately to the following corollary.

Corollary 7.1.

(i) The optimal policy that minimizes the expected weighted flowtime E [∑ni=1 wiCi]

is to process at each decision epoch the job with the lowest value of E[Oi]/wi.(ii) The optimal policy that maximizes the expected total weighted discounted

reward E[∑n

i=1 wie−rCi]

is to process at each decision epoch the job with thelowest value of (1−E[e−rOi ])/wiE[e−rOi ].

Case 2: The expected discounted cost before the first k jobsWe next consider the expected discounted cost truncated at the k-th completed job,as defined in (7.13). The result is stated in the following theorem.

Theorem 7.2. To minimize the expected discounted cost E[∫C(k)

0 e−rtdt]

truncatedat the k-th completed job, the optimal dynamic policy schedules at each decisionepoch any of the first k− n+m jobs in S, where m = m(S) is the size of S.

Proof. For simplicity, we assume that the jobs are nondecreasingly ordered by thevalues of E

[∫ Oj0 e−rtdt

], which can be calculated by (7.20). It suffices again to show

the existence of Ψ (S) such that any of the first k − n+m jobs in S reaches theminimum in the following equation:

Ψ (S) = minj∈S

φ(S)E

[∫ Oj

0e−rtdt

]+E[e−rOj ]Ψ(S− j)

, (7.31)

where φ(S) = 1 if m(S)> n− k and φ(S) = 0 otherwise. We now prove this with

Ψ(S) =

E[∫ O1+···+Ok−n+m

0 e−rtdt]

if m ≥ n− k0 if m < n− k

, (7.32)

where jobs in S are sorted in nondecreasing order of E[∫ Oj

0 e−rtdt].

First, for any state S with m(S) ≤ n− k, (7.31) is trivial with both sides equalto 0. If m = m(S) = n− k + u with 1 ≤ u ≤ k, then φ(S) = 1. Without loss ofgenerality, let S= 1, . . . ,m. Let A( j) be the term in (7.31) to be minimized withrespect to j. If 1 ≤ j ≤ k− n+m= u, then by (7.32),

Page 272: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

264 7 Dynamic Policies

A( j) = E[∫ Oj

0e−rtdt

]+E[e−rOj ]E

[∫ O1+···+Ou−Oj

0e−rtdt

]

= E[∫ O1+···+Ou

0e−rtdt

]=Ψ(S).

If u < j ≤ m, then

A( j) = E[∫ Oj

0e−rtdt

]+E[e−rOj ]E

[∫ O1+···+Ou−1

0e−rtdt

]

= E[∫ O1+···+Ou−1+Oj

0e−rtdt

].

It follows that

A( j)−Ψ(S) = E[e−r(O1+···+Ou−1)

]E[∫ Oj

0e−rtdt

]−E

[∫ Ou

0e−rtdt

]≥ 0.

Thus min j∈S A( j) =Ψ (S) = A( j∗) for 1 ≤ j∗ ≤ u.

Theorem 7.2 gives the optimal dynamic policy for any individual k ∈ 1, . . . ,n.As an immediate consequence of Theorem 7.2, the following corollary provides anoptimal policy for all k.

Corollary 7.2. The policy that selects at each decision epoch the job with thesmallest value of E

[∫ Oj0 e−rtdt

]among the unfinished jobs to process is dynamically

optimal to minimize E[∫C(k)

0 e−rtdt]

for all k ∈ 1, . . . ,n.

Case 3: Minimizing the number of tardy jobs under stochastic orderSuppose that all jobs have a common due date D that is exponentially distributedwith rate r > 0. Let C(k)(π) denote the completion time of the k-th completed jobunder policy π , and N(π) = ∑n

i=1 I(Ci(π)≤ D) be the number of jobs completed onor before D. Then

Pr(N ≥ k) = Pr(C(k) ≤ D) = E[e−rC(k) ] = 1− rE[∫ C(k)

0e−rtdt

].

So maximizing Pr(N ≥ k) is equivalent to minimizing the expected cost definedin (7.13).

Let π∗ be the policy that processes at each decision epoch the job with the lowestvalue of E

[∫ Oi0 e−rtdt

]. According to Corollary 7.2, π∗ is dynamically optimal under

performance measure (7.13) for all k, 1 ≤ k ≤ n. In other words, for any policy π ,

Pr(N(π∗)≥ k)≥ Pr(N(π)≥ k) for all k = 1, . . . ,n.

This shows that N(π∗) is greater than or equal to N(π) under stochastic order, whichleads to the next theorem.

Page 273: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 265

Theorem 7.3. Under stochastic order, the optimal dynamic policy to minimize thenumber of tardy jobs with a common exponential due date is to process at eachdecision epoch the job with the smallest value of E

[∫ Oi0 e−rtdt

], or equivalently, the

largest value of E[e−rOi

], which can be calculated by (7.19) or (7.20).

An earlier version of this problem was discussed by Frostig (1991) under expo-nentially distributed uptimes. Theorem 7.3 is based on Cai et al. (2005) and extendsFrostig’s results to general uptime distributions.

Case 4: The maximum discounted holding costNow we turn to the problem of minimizing the maximum discounted holding costin (7.14). Its optimal solution is given in the next theorem. It is interesting to notethat the optimal solution does not depend on the processing times or the occupyingtimes at all.

Theorem 7.4. The optimal dynamic policy to minimize E[∫ ∞

0 maxi∈S wie−rtdt]

isto schedule at each decision epoch the job j in S with the maximum weight amongunfinished jobs, i.e., w j = maxi∈S wi.

Proof. Let S be any state. For convenience, and without loss of generality, assumethat S= 1,2, . . . ,m and w1 ≥ w2 ≥ · · ·≥ wm. Then φ(S) = w1. Define

Ψ(S) =m

∑i=1

wiE[e−r(O1+···+Oi−1)

]E[∫ Oi

0e−rtdt

]. (7.33)

We next check that it satisfies the optimality equation (7.24) with the minimumattained at j = 1. To do so, let A( j) be the term in (7.24) to be minimized withrespect to j. Then by (7.33), for j ∈S,

A( j) = w1E[∫ Oj

0e−rtdt

]+

j−1

∑i=1

wiE[e−r(O1+···+Oi−1)

]E[e−rOj

]E[∫ Oi

0e−rtdt

]

+m

∑i= j+1

wiE[e−r(O1+···+Oi−1)

]E[∫ Oi

0e−rtdt

]. (7.34)

As rE[∫ Oi

0 e−rtdt]= 1−E

[e−rOi

],

rj−1

∑i=1

wiE[e−r(O1+···+Oi−1)

]E[∫ Oi

0e−rtdt

]

≤j−1

∑i=1

w1(E[e−r(O1+···+Oi−1)

]−E[e−r(O1+···+Oi)

])

= w1(1−E

[e−r(O1+···+Oj−1)

]).

Page 274: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

266 7 Dynamic Policies

Hence by (7.33) and (7.34),

A( j)−Ψ(S)≥ E[∫ Oj

0e−rtdt

](w1 −wj)E

[e−r(O1+···+Oj−1)

]≥ 0,

and it is easy to see A(1)−Ψ(S) = 0. Thus min j∈SA( j) = A(1) =Ψ(S).

7.2.3 Optimal Policies with Identical Processing Times

This model is more difficult than the independent processing times. After deducingthe optimality equation, we will focus on the case with φ(S) = ∑i∈S wi, whichis equivalent to maximizing the expected weighted discounted reward (see (7.12)and corresponding argument). This will be extended to the problem with expectedweighted flowtime, the corresponding to the case r = 0. We will proceed with thefollowing steps: First, convert the process of the operation on each job to a banditprocess, which is essentially a special case of the semi-Markov decision process.Next, show that the optimal policies can be obtained by employing Gittins index. Atlast, exhibit the calculation of the corresponding Gittins index.

We first define the state of the system to convert the problem into a Markovdecision process. Due to the independence between jobs and the breakdown pro-cesses, the states of the system can be characterized in terms of each job’s state asfollows.

1. The states. Suppose that job i has been processed k times and remains unfinishedat a decision epoch t. By that time the k repeated processing times on job i arethe same as the k uptimes Yi1, . . . ,Yik, and the decision time is t = ∑k

j=1 τi j ,where τi j = Yi j + Zi j. Then the state of job i at time t is defined as xi(t) =maxYi1, . . . ,Yik. Otherwise, if job i has been finished by time t, its state iswritten as ∗. It is then clear that the future distribution of the system dependsonly on the state (xi, i ∈ S) and therefore with no loss of generality we drop thetime parameter t and always suppose we are at the time t = 0 below.

2. The behavior of the job being processed. When an unfinished job i is selected tobe processed at state xi, it will be processed for min(Yi1,Pi) units of time. Thetime it will occupy the machine before the next decision epoch and the state itenters are

(τ,y) = (τ (xi) ,y) =(τi1,max(x,Yi1) if Yi1 < Pi(Pi,∗) if Yi1 ≥ Pi

,

where τi1 =Yi1 +Zi1.3. The joint distribution of (τ (xi) ,y) depends only on the state xi.

Let Ψ(xi:i∈S) be the minimum expected discounted cost. It satisfies the obviousoptimality equation

Page 275: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.2 Restricted Dynamic Policies for Total-Loss Breakdown Models 267

Ψ(xi:i∈S) = minj∈S

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

φ(S)E[∫ τ(x j)

0 e−rtdt∣∣Pj > x j

]

+E[e−rPj I[Pj ,∞)(Yj)Ψ(xi:i∈S− j)

∣∣Pj > x j]

+E[e−rτ j I(x j ,Pj)(Yj)Ψ(xi:i∈S− j;Yj)

∣∣Pj > x j]

+E[e−rτ j1I(0,x j)(Yj)

∣∣Pj > x j]Ψ(xi:i∈S)

⎫⎪⎪⎪⎬

⎪⎪⎪⎭.

Simple algebraic computation shows that this can be equivalently rewritten as

Ψ(xi:i∈S) = minj∈S

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

E[∫ τ(x j)

0 e−rtdt∣∣Pj > x j

]

1−E[e−rτ j1 I(0,x j)(Yj)

∣∣Pj > x j]φ(S)

+E[e−rPj I[Pj ,∞)(Yj)Ψ(xi:i∈S− j)

∣∣Pj > x j]

1−E[e−rτ j1 I(0,x j)(Yj)

∣∣Pj > x j]

+E[e−rτ j I(x j ,Pj)(Yj)Ψ(xi:i∈S− j;Yj)

∣∣Pj > x j]

1−E[e−rτ j1 I(0,x j)(Yj)

∣∣Pj > x j]

⎫⎪⎪⎪⎪⎪⎪⎪⎪⎪⎬

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎭

. (7.35)

If we are concerned with maximizing the objective function E[∑n

i=1 wie−rCi]

asin (7.12), the optimal policy can be designed by means of Gittins index. As notedearlier, the maximization of the expected weighted discounted reward is equivalentto minimizing the expected discounted cost given by (7.12) for any r > 0. As r → 0,(7.12) converges to the expected weighted flowtime E [∑n

i=1 wiCi].The results are presented in the following theorem. Here we need the condition

that Pi,Yi,Zi are mutually independent.

Theorem 7.5. Under the identical processing time model, at state xi, the Gittinsindex of job i is

Gi(xi) = wi

∫ ∞xi

e−rxSYi(x−)dFi(x)∫ ∞

xiE[Ix>Yi

∫ τi0 e−rtdt + Ix≤Yi

∫ x0 e−rtdt

]dFi(x)

. (7.36)

As a result, the Gittins index of job i for minimizing E [∑ni=1 wiCi] is

Gi(xi) = wi

∫ ∞xi

SYi(x−)dFi(x)∫ ∞

xiE[Ix>Yiτi + xIx≤Yi

]dFi(x)

. (7.37)

Proof. It suffices to show that the Gittins index can be expressed as in (7.36). As weonly consider a single job, the job identifier i is suppressed for the time being. Thuswe use P (with distribution function F) to stand for the processing time of the joband Yi and Zi respectively the ith machine up- and down-time associated with thisjob. At any decision epoch, say the beginning of the (k+ 1)th round of processing,the filtration is Fk = σ(Y1, . . . ,Yk;Z1, . . . ,Zk;P > maxY1, . . . ,Yk). Suppose thatwe stand at the time instant when the unfinished job is starting its (k+ 1)th roundof processing. Define ∆ j = τ j IYj<P + PIYj≥P. The Gittins index for the job iscomputed by

Gk = maxσ>0

wiE[e−rOk Iσ=Ok|Fk

]

E[∫ σ

0 e−rtdt|Fk] , (7.38)

Page 276: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

268 7 Dynamic Policies

where Ok = O − ∑kj=1 τ j is the remaining occupying time of the job and the

maximization is taken over all random variables

σ =k+ρ

∑j=k+1

[τ jIYj<P+PIYj≥P

]

with ρ being positive random variables such that k + ρ ≤ Ti and k + ρ is anF -stopping time. Below, we will loosely refer to σ as a stopping time. Write Fkfor the conditional distribution of P given Fk. Due to the independence betweenY,Z and P, the conditional distribution of P is given by

Fk(x) = Pr(P ≤ x|Fk) =F(x)−F (maxY1, . . . ,Yk)

1−F (maxY1, . . . ,Yk)Ix≥maxY1,...,Yk. (7.39)

Let ρ be any positive random number such that ρ ≤ l∈Fk+l , i.e., ρ is a stoppingtime with respect to the filtration Fk+l; l ≥ 0. Then the Gittins index at time∑k

j=1 τ j is

Gk = wi maxρ>0

E[e−r(

O−∑kj=1 τ j

)

IT=ρ+k|Fk]

E[g(P,ρ)|Fk]

= wi maxρ>0

∫ ∞maxYi1,...,Yik f (x,ρ)dF(x)∫ ∞

maxYi1,...,Yik g(x,ρ)dF(x), (7.40)

where

f (x,ρ) = E[e−rOIρ=T|P = x

]and g(x,ρ) = E

[∫ ∑ρj=1 ∆ j

0e−rtdt

∣∣∣Pi = x].

Furthermore, the one-step discounted reward rate is calculated as

vk = wi

∫ ∞maxYi1,...,Yik f (x,1)dF(x)∫ ∞

maxYi1,...,Yik g(x,1)dF(x).

Write SYi(x−) = Pr(Yi ≥ x) and FYi(x−) = Pr(Yi < x). Note that

fi(x,1) = E[e−rPIP≤Y|P = x

]= e−rxSY (x−),

and

g(x,1) = E[∫ ∆1

0e−rtdt

∣∣∣P = x]= E

[Ix>Yi

∫ τi

0e−rtdt + Ix≤Yi

∫ x

0e−rtdt

],

where ∆1 = PIY1≥P+ τIY1<P. Since ∆1 = min(P,Y1)+Z1IY1<P is increasing inP, so is g(x,1) in x. Considering the decrease of f (x,1) in x it follows that

Page 277: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.3 Restricted Dynamic Policies for No-Loss Breakdown Models 269∫ ∞

u f (x,1)dF(x)∫ ∞

u g(x,1)dF(x)≤ f (u,1)

g(u,1)for u > 0.

By taking the derivative of∫ ∞

u f (x,1)dF(x)/∫ ∞

u g(x,1)dF(x), it is not difficult to seethat

∫ ∞u f (x,1)dF(x)/

∫ ∞u g(x,1)dF(x) is decreasing in u > 0. Therefore,

vk = wi

∫ ∞maxY1,...,Yk e−rxSY (x−)dF(x)

∫ ∞maxY1,...,Yk E

[Ix>Y

∫ τ0 e−rtdt + Ix≤Y

∫ x0 e−rtdt

]dF(x)

,

is decreasing in k since maxY1, . . . ,Yk is increasing in k. This means that theone-step reward rate is deteriorating and thus the Gittins index is the same as itsone-step reward rate. The theorem is thus proved.

Remark 7.1. When processing times are independent samples after each interruptionby a breakdown, our results in Sect. 7.2.2, as well as the popular results in theliterature (e.g., Glazebrook and Owen, 1991), have shown that there often existsa static (non-preemptive) policy that is also optimal in the class of dynamic policies.On the other hand, if we only consider the static policies, then the optimal policyunder the model of identical processing times is to process the jobs in nonincreas-ing order of the indices Gi(0) = wiE[e−rOi ]/E[

∫ Oi0 e−rtdt] (either in the weighted

discounted rewards case or in the weighted flowtime case; see Chap. 4). Thus thistheorem shows that under the identical processing times model, the optimal staticpolicy is not necessarily optimal in the class of dynamic policies. This raises a sharpdistinction between the models with independent and identical processing times.The proof also shows that the Gittins index for a job is pathwise decreasing. Thusfor a job i, if after k repetitions of processing, its Gittins index falls below anotherjob’s Gittins index, then preemption is necessary to ensure the optimality.

7.3 Restricted Dynamic Policies for No-Loss Breakdown Models

This section is devoted to the problem of optimal restricted dynamic policies forsingle-machine scheduling with no-loss machine breakdowns. For the notation andother details of this model, see Sect. 4.2. Although there have been numerous re-search efforts on this model, they are largely restricted to static or unrestricted dy-namic policies. In the limited work on restricted dynamic policies, the uptimes haveusually been confined to exponential or geometric distributions for ease of mathe-matical expositions based on their memoryless property, and certain particular con-ditions are needed to ensure nonpreemptive optimal policies (see e.g., Glazebrookand Owen, 1991). This section provides an exposition of the no-loss breakdownmodel under generally distributed uptimes. We will show that, without the memory-less property of the uptimes, the optimal dynamic policy may need job preemptionat a decision epoch, hence a nonpreemptive policy may no longer be optimal.

Under the no-loss model, as stated in Sect. 4.2, a breakdown has no impact on thework done previously and thus the job will be resumed from where it was interrupted

Page 278: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

270 7 Dynamic Policies

when it is processed by the machine again. In this section, the objective is to find anoptimal policy to maximize the expected total discounted rewards E

[∑n

i=1 wie−rCi]

in the class of restricted dynamic policies. To be specific, as previously formulated,every job i ∈ 1,2, . . . ,n is associated with a processing time (also referred to asprocessing requirement) Pi and a breakdown process (Yi,Zi) = (Yik,Zik),k ≥ 1 ofi.i.d. pairs of uptimes and downtimes, such that Pi;Yi,Zi are mutually independentover i = 1,2, . . . ,n. By the theory of bandit processes in Sect. 6.1, what we need isto work out the Gittins indices for every job at every machine breakdown, thus weonly need to work with a generic single job again.

For this fixed job, associated with a weight w, a processing time P and abreakdown process (Y,Z) = (Yk,Zk),k ≥ 1, its processing can be modeled as asemi-Markov process: At any time instant before the job is completed, the statex is the processing achievement and at the completion the state is denoted by thesymbol ∗. In this semi-Markov setting, for the computation of Gittins indices, onlythe following type of stopping times are considered. Denote by Xn the achievementof the processing at the nth breakdown, i.e. Xn = ∑n

k=1 Yk if the job has not beenfinished and Xn = ∗ otherwise. If the state is x, for every Borel set A ⊂ (x,∞), define

τA = mink > n : Xk ∈ A∪∗.

Then the Gittins index at state x can be achieved by a stopping time of such type, asstated in the following theorem.

Theorem 7.6. In the no-loss model, at the nth breakdown, the Gittins index can becomputed as

Gx = w supA⊂(x,∞)

E[IXτA=∗e−r∆τA |P > x]

E[∫ ∆τA

0 e−rtdt∣∣∣P > x

] ,

where

∆τA =

∑τA

k=n+1(Yk +Zk) if XτA = ∗P− x+∑τA−1

k=n+1 Zk if XτA = ∗ .

If we take A = (x,∞), then τA = n+ 1 and

∆τA =

(Yn+1 +Zn+1) if XτA = ∗P− x+∑τA−1

k=n+1 Zk if XτA = ∗ .

E[IXτA=∗e−r∆τA |P > x]/E[∫ ∆τA

0 e−rtdt|P > x] is the one-step reward rate at state x.Now we write P(x) for the remaining processing time of P : P(x) = P− x, of whichthe distribution is computed given P > x. Note that the one-step reward rate at statex can be computed up to a constant r as

Page 279: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.3 Restricted Dynamic Policies for No-Loss Breakdown Models 271

r(x) =E[Is(∆ )=∗e−r∆ ]

E[∫ ∆

0 e−rtdt] =

E[e−rP(x)IP(x)<Y+ e−rτIP(x)=Y

]

1−E[e−rP(x)IP(x)<Y+ e−rτIP(x)≥Y

]

=E[e−rP(x)

(IP(x)<Y+ d(Y)IP(x)=Y

)]

1−E[e−rP(x)IP(x)<Y+ e−rY d(Y )IP(x)≥Y

] , (7.41)

where d(Y ) = E[e−rZ|Y ]. It can be readily checked that r(x) increases (decreases) inx if P(x) decreases (increases) in x under stochastic order. This in turn correspondsto the condition that the distribution of P is of increasing (decreasing) failure rate,written as IFR (DFR). Therefore, we have the following proposition.

Proposition 7.1. For the no-loss model,

(1) If the distribution of P is of IFR, the Gittins index at state x is

G(x) =E[e−rO(x)]

E[∫ O(x)

0 e−rtdt] ,

where O(x) is the occupying time of remaining processing requirement P(x),given P > x;

(2) If the distribution of P is of DFR, then the Gittins index at state x is r(x), whichcan be computed by (7.41).

Part (1) is similar to Theorem 3 of Glazebrook and Owen (1991) where he dis-cussed the discrete time setting in which the information update and the decisionepochs occur at integer times.

To get a more general result, write h(x) = 1/rΩ (x)+ 1, and further assume thatF(x) has density function f (x) (the similar result holds for discrete processing timestaking integer values in R+). Then an easy computation gives

h(x) =S(x)−

∫∞0 S(x+ t)d(t)e−rtdt

S(x)−∫∞

0 S(x+ t)d(1− SY(t)e−rt).

Therefore, rΩ (x) increases (or decreases) in x if and only if

f (x)−∫ ∞

0 f (x+ t)d(t)e−rtdtS(x)−

∫ ∞0 S(x+ t)d(t)e−rtdt

≥ (≤)f (x)−

∫ ∞0 f (x+ t)d(1− SY(t)e−rt)

S(x)−∫ ∞

0 S(x+ t)d(1− SY(t)e−rt)(7.42)

for all x ≥ 0. The next proposition follows.

Proposition 7.2. For the no-loss model,

(1) If (7.42) holds with “≥”, the Gittins index can be expressed as

G(x) =E[e−rO(x)]

E[∫ O(x)

0 e−rtdt] ; (7.43)

Page 280: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

272 7 Dynamic Policies

(2) If condition (7.42) holds with “≤”, the Gittins index is the one-step reward rate:

G(x) =E

e−rP(x)IP(x)<Y

1−E[e−rP(x)IP(x)<Y+ e−rY d(Y )IP(x)≥Y

] . (7.44)

Note that in this proposition, the optimal policy is in fact nonpreemptive in case(1). In case (2), the maximum reward rate (the Gittins index) can be reached bythe myopic rule, and the Gittins index goes down after each round of processing.Hence, when the Gittins index of a job being processed falls below the Gittins indexof another job at a decision epoch, a preemption of the job is required by the optimalpolicy. A special case is that the processing time is exponentially distributed. Then(7.42) holds with “=”, and hence the Gittins indices (7.43) and (7.44) agree.

Remark 7.2. Under the no-loss model, while a general form of Gittins indices ispresented in Theorem 7.6 and a few specialized cases are discussed in Proposi-tions 7.1 and 7.2, we would like to point out that it is still not clear how to effectivelycompute Gittins indices for general processing times and breakdown processes.

7.4 Partial-Loss Breakdown Models

The basic formulation of partial-loss breakdown models can be referred back toSect. 4.4. We here establish the theoretical framework for optimal policies in theclass of restricted dynamic policies in two steps: First, the problem of processinga single job is converted to a semi-Markov process (and thus the scheduling of njobs turns to the optimal operation of a standard multi-armed bandit process), whichwill help the development of the Gittins indices later. Then in the second step, weelaborate the general approach to derive the corresponding Gittins indices. The op-timal static and nonpreemptive dynamic policies will be addressed as side-productsas well.

7.4.1 The Semi-Markov Model for Job Processing

In this subsection we will work with a single job to convert its processing into asemi-Markov process, hence the job indicator i is suppressed, as in the previous twosections.

Since the processing history of a job affects the distribution of its remain-ing processing time through X(t) and h(t), we denote by P(x,y) the remainingprocessing time of the job at time t with X(t) = x and h(t) = y, for which theconditional distribution was formulated in (4.96) and (4.97). Next, let τk = Yk +Zkrepresent the k-th period of the processing process, k = 1,2, . . . , which form a se-quence of i.i.d. random variables, with a typical representative τ = Y +Z. Define

Page 281: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.4 Partial-Loss Breakdown Models 273

the processing state of a job at time t as

s(t) =

(X(t),h(t)), if the job has not been completed at time t,∗ if the job has been completed at time t,

(7.45)

and the state space as Ω = (x,y) : 0 ≤ x ≤ y < ∞. Then the processing states of ajob form a semi-Markov process as elucidated below.

• The initial (processing) state is (0,0).• At a state (x,y) while the job is being processed, the next transition occurs when

the job is completed or the machine is fixed after a breakdown, whichever isearlier; the corresponding time span is referred to as transition time and denotedby ∆(x,y) (or simply ∆ ). More specifically, the state of the job will evolveaccording to the occurrence of event A1, A2 or A3 at the next transition as follows:

– A1 = P(x,y)<Y: The state evolves to ∗, ∆ =P(x,y), and one unit of rewardis collected at time point P(x,y) (assuming a reward w = 1 without loss ofgenerality).

– A2 = P(x,y) = Y: The state evolves to ∗, ∆ = τ in distribution and oneunit of reward is collected at time point τ . If either Y or P is continuous, thenPr(A2) = 0.

– A3 = P(x,y) > Y: The new state is (v,y ∨ (x +Y ) ∨ v), where v is theaccumulated work after a breakdown, and ∆ = τ in distribution, but no rewardis collected.

Let “ d=” denote equality in distribution. Then uniformly,

∆(x,y) d= minP(x,y),Y+ IY≤P(x,y)Z. (7.46)

• The decision epochs are time zero and the endpoints of transition times.

In terms of this semi-Markov process, the future depends only on the currentstate. Hence there is no loss of generality to suppose that state (x,y) is at time zero.

Moreover, we define

O(x,y) d= inft : s(0) = (x,y) and s(t) = ∗, (7.47)

which is called the remaining occupying time of the job, where s(t) is the state ofthe job; see (7.45). Note that O(x,y) may take value ∞ with a positive probability.It is straightforward to prove the following sufficient condition for O(x,y)< ∞:

Proposition 7.3. If for some δ < 1, independent of x and y, Pr(P(x,y)> Y )≤ δ forall state (x,y), then Pr(O(x,y)< ∞) = 1 for all (x,y).

Page 282: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

274 7 Dynamic Policies

7.4.2 Integral Equations for Gittins Indices

For a job being processed, a stopping rule determines when the processing will stop,based on the historical information of the processing. In this subsection we addressthe expected rewards and discounted stopping times under any Markovian determin-istic and homogeneous (MDH) stopping rule in order to develop our optimal policies(to be given in terms of Gittins indices in the next subsection). Recall that a stoppingrule is MDH if at each state one can decide whether to stop or continue the process-ing irrespective of the time instant t. An MDH stopping rule thus corresponds to ameasurable subset A of the state space: at any decision epoch, the processing stopsunder this stopping rule if the job’s state enters A; or continues otherwise. Sincestopping at state ∗ is mandatory, we always exclude ∗ from A. Denote by Ac thecomplement of A and by ρA the stopping time corresponding to subset A:

ρA = inft : s(0) = (x,y),s(t) ∈ A∪∗, (7.48)

where, as in Eq. (7.47), s(t) denotes the state of the job. By the definitions of ∆(x,y)and O(x,y) in (7.46)–(7.47), we have

ρΩ (x,y) = ∆(x,y) and ρ∅(x,y) = O(x,y). (7.49)

At any state (x,y), with respect to stopping set A, the expected present value of oneunit of reward at job completion is

RA(x,y) = E[Is(ρA)=∗ exp−rρA], (7.50)

and the expected discounted stopping time is WA(x,y) = E[e−rρA ]. Since s(ρ∅) = ∗,an apparent result is

R∅(x,y) =W∅(x,y) = E[exp−rO(x,y)]. (7.51)

For any function N(x,y) defined on the state space, introduce an integral transform

ΓA N(x,y) = E[e−r∆ N(s(∆))IAc (s(∆))

]= E

[e−rτ N(s(τ))IAc (s(τ))

], (7.52)

where IA(x) is the indicator of set A. This means the expected discounted rewardthat earns N(s(∆)) at time ∆ if s(∆) ∈ Ac and zero otherwise. Clearly,

ΓΩ N(x,y) = 0 and Γ∅ N(x,y) = E[e−r∆ N(s(∆))Is(∆ ) =∗

].

A simple computation gives the equations satisfied by RA(x,y) and WA(x,y), in termsof the functional transform ΓA, in the following theorem.

Theorem 7.7. RA(x,y) is the solution to the integral equation

RA(x,y) = R0(x,y)+ΓA RA(x,y), (7.53)

Page 283: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.4 Partial-Loss Breakdown Models 275

where R0(x,y) = E[e−r∆ Is(∆ )=∗]; and WA(x,y) is the solution to the integralequation

WA(x,y) =W 0A (x,y)+ΓA WA(x,y), (7.54)

where

W 0A (x,y) = E[IA∪∗(s(∆))e−r∆ ]. (7.55)

Proof. By the decompositions based on the realization of the process during the firstround of job processing, we get

RA(x,y) = E[Is(∆ )=∗ exp(−rρA)]

= E[e−r∆ Is(∆ )=∗]+E[e−r∆ RA(s(∆))IAc(s(∆))

]

= E[e−r∆ Is(∆ )=∗]+ΓA RA(x,y),

and

WA(x,y) = E[e−r∆ Is(∆ )=∗]+E[IA(s(∆))e−r∆ ]E[e−r∆WA(s(∆))IAc(s(∆))

]

= E[e−r∆ Is(∆ )=∗]+E[IA(s(∆))e−r∆ ]+ΓA WA(x,y).

The theorem follows.

The next theorem expresses RA(x,y) and WA(x,y) as infinite series in terms ofΓ k

A R0(x,y) and Γ kA W0

A (x,y) respectively.

Theorem 7.8. If r > 0, the solutions to Eqs. (7.53) and (7.54) are, respectively,

RA(x,y) =∞

∑k=0

Γ kA R0(x,y) and WA(x,y) =

∑k=0

Γ kA W0

A (x,y). (7.56)

Proof. For the space L of functions N(x,y) with |N(x,y)|< 1, we associate it withthe L∞ norm ∥ ·∥ defined by ∥N(x,y)∥= supx,y |N(x,y)|. Then L together with ∥ ·∥form an L∞-space. By the definition of ΓA in (7.52), we see that

∥ΓA N(x,y)∥=supx,y

|ΓA N(x,y)| ≤ supx,y

|N(x,y)|E[e−r∆ IAc(s(∆ ))

]< ∥N(x,y)∥E[e−r∆ ].

Therefore,

∥ΓA∥= supN∈L

∥ΓA N(x,y)∥∥N(x,y)∥ ≤ E[e−r∆ ]< 1. (7.57)

Thus 1 − ΓA is invertible, so that Eqs. (7.53) and (7.54) have unique solutions.Note that |R0(x,y)| < 1 and |W 0

A (x,y)| < 1, hence the infinite series in (7.56) areconvergent and solve Eqs. (7.53) and (7.54).

Remark 7.3. By Theorem 7.7, we can extract RA(x,y) and WA(x,y) by solvingintegral equations; and Theorem 7.8 shows that the solutions can further be expressedas infinite series of the corresponding operators. The analytic solution is difficult towork out in general. When the state space is finite, however, Eqs. (7.53) and (7.54)

Page 284: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

276 7 Dynamic Policies

are actually linear equations with finite variables and can thus be solved simply bystandard matrix methods. For an infinite state space, we may approximate the dis-tributions by those with a finite state space and thus obtain approximate solution bysolving linear equations (note that in most of scheduling studies, processing timesare assumed to take integer values, which implies finite state space). Another wayis to use the first m terms in (7.56), W (m)

A (x,y) = ∑mk=0 Γ k

A W 0A (x,y), to approximate

WA(x,y), and choose m to control the error via its upper bound (see (7.57)):

supx,y

∣∣∣WA(x,y)−W (m)A (x,y)

∣∣∣= supx,y

∑k=m+1

Γ kA W 0

A (x,y)≤∞

∑k=m+1

∥ΓA∥k

=∥ΓA∥m+1

1−∥ΓA∥≤[E[e−r∆ ]

]m+1

1−E[e−r∆ ].

7.4.3 Optimal Policies via Gittins Indices

As a side-product, we first give the following theorem on the optimum in the classof nonpreemptive dynamic policies.

Theorem 7.9. In the class of nonpreemptive dynamic policies, the optimal policy isto schedule the jobs in nonincreasing order of

wiR∅,i(0,0)1−R∅,i(0,0)

, i = 1, . . . ,n. (7.58)

where R∅(x,y) is given by (7.51) and can be computed by Theorems 7.7 and 7.8.

Proof. In the class of nonpreemptive policies, the problem is reduced to the classicalversion on a reliable machine with R∅,i(0,0) in place of the processing times Pi.Hence by the standard scheduling theory under the expected weighted anddiscounted reward criteria, the optimal nonpreemptive dynamic policy is to schedulejobs in nonincreasing order of (7.58) (cf. Pinedo, 2002).

Remark 7.4. Theorem 7.9 gives the optimal nonpreemptive dynamic policy. Further-more, the nonincreasing order of (7.58) can be determined at time zero and main-tained with no change after it is implemented to process the jobs. This will lead tothe optimal static policy, hence Theorem 7.9 gives the optimal static policy as well.

We next address the optimum in the class of the restricted dynamic policies.Under the model formulated above, the operation processes of the n jobs aremutually independent. Hence, by the theory of multi-armed bandit processes (seee.g. Gittins 1989; Ishikida and Varaiya 1994), at every decision epoch, each unfin-ished job can be attached an index (known as Gittins index) that is a function ofthe processing history of the job, and the optimal policy is to select the job with thebiggest index to process at each decision epoch. More specifically, for a job at state(x,y), we can define its Gittins index as

Page 285: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.4 Partial-Loss Breakdown Models 277

G(x,y) = wsupA

RA(x,y)∫ ρA0 e−rtdt

= rwsupA

RA(x,y)1−WA(x,y)

, (7.59)

where RA(x,y) and WA(x,y) are given by Theorems 7.7 or 7.8. Note that r is aconstant which is job-independent and thus does not affect the optimal solution.Thus, based on Gittins (1989), we can establish the following theorem on theoptimal policy.

Theorem 7.10. Denote by B(t) the set of all unfinished jobs at any decision epoch t.Any job i in B(t) at state (xi,yi) can be assigned an index Gi(xi,yi) defined by

Gi(xi,yi) = wi supA

RA(xi,yi)

1−WA(xi,yi), (7.60)

where RA(xi,yi) and WA(xi,yi) are given by Theorems 7.7 or 7.8. Then the optimaldynamic policy to maximize the expected discounted reward is to select the job withthe highest Gittins index G∗ = maxGi(xi,yi) : i ∈ B(t) to be processed at eachdecision epoch.

Note that the calculation of the Gittin index for any job requires only informationon the particular job itself. Consequently, the coupling among different jobs in thecombinatorial scheduling problem has been decoupled and therefore the problem offinding optimal scheduling policies, which appears quite complicated, has escapedfrom the class of NP-complete problems (see, e.g., Garey and Johnson 1979). This isa significant progress in tackling the problem. Nevertheless, practical computationof the Gittins index (7.59) is still a challenging task that requires specific and delicatetechniques in different situations. The following are two special cases in which theGittins indices can be practically computed.

1. Finite states: If the processing times have finite supports, which is actually acommon assumption in most scheduling studies, then the state space is finite.Hence as pointed out in Remark 7.3, RA(x,y) and WA(x,y) can be obtained bysolving systems of finite linear equations. The supremum in (7.59) is then readilycomputable since the class of all stopping sets is now finite.

2. Monotone one-step reward rates: When the state space is infinite, there are twoextreme cases to be noted: The supremum in (7.59) can be reached by A = ∅ orA = Ω (the state space). To clearly describe these two cases, we first define theone-step reward rate (OSRR) as

rΩ (x,y) =E[e−r∆ Is(∆ )=∗

]

E[∫ ∆

0 e−rtdt] =

E[e−rP(x,y)IP(x,y)<Y+ e−rτIW−x≥P(x,y)≥Y

]

E[∫ ∆

0 e−rtdt]

(7.61)

(see also Cai et al. 2009b). Let L(x,y) = (x′,y′) : rΩ (x′,y′) ≤ rΩ (x,y) andU(x,y) = (x′,y′) : rΩ (x′,y′)≥ rΩ (x,y).

Page 286: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

278 7 Dynamic Policies

• If after each round of processing, the state moves with certainty from (x,y) to(x′,y′) ∈ L(x,y), then the Gittins index (7.60) can be reached by A = Ω , sothat G(x,y) = rΩ (x,y). In such a case, the optimal restricted dynamic policyis to pick the job with the maximum one-step reward rate to process at eachdecision epoch.

• If after each round of processing, the state moves with certainty from (x,y)to (x′,y′) ∈ U(x,y), then the Gittins index (7.60) can be reached by A = ∅,so that G(x,y) = r∅(x,y). In this case, the optimum in the class of restricteddynamic policies is actually nonpreemptive.

The next subsection presents two specific patterns of uncertain loss of work, toillustrate the applications of the theoretical framework developed in this subsection.

7.4.4 Specific Partial-Loss Breakdown Models

Local Preemptive-Repeat Model

Consider a local preemptive-repeat model in which each job consists of m tandemparts to be processed in sequel. A job is completed if and only if all its parts arecompleted. Each part requires a deterministic processing time pi, i = 1,2, . . . ,m.Write x j = ∑ j

k=1 pk. When the machine breaks down, only the work on the lo-cal part is lost (i.e., there is a preemptive-repeat effect on the local part only).When a machine breakdown occurs, if parts 1, . . . , j− 1 have been finished but partj is being processed, then the work on part j is lost, but the work achieved on parts1, . . . , j − 1 is preserved. If the processing achievement at the breakdown is x, say,where x j−1 ≤ x < x j, then it will be reduced to x j−1 after the breakdown, result-ing in a loss x− x j−1 that is uncertain depending on the timing of the breakdown.This kind of local preemptive-repeat patterns may arise naturally in many other realsituations.

More specifically, the model is described as follows.

• The up/downtimes (Yk,Zk) are independent with arbitrary distributions.• The state of the job reduces to the processing achievement that takes values in

x1,x2, . . . ,xm at each time point when the machine is fixed after a breakdown,and is in the interval [0,xm] at the time point when a breakdown occurs. Thus weneed to compute the Gittin indices at states 0 = x0,x1,x2, . . . ,xm.

• State transition during each downtime is x → max

x j : x j ≤ x. That is, for each

given x, Qi(v,x) is a probability degenerate at v = maxxk : xk ≤ x,1 ≤ k ≤ m.• Then we have the following facts:

1. As P is deterministic, the remaining processing time at state x j is xm − x j:

S(x j + z,y)S(x j,y)

= I[0,xm−x j)(z) j = 0,1, . . . ,m− 1.

Page 287: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.4 Partial-Loss Breakdown Models 279

2. d(Y ) = d is independent of Yi, where

d(Y ) = E[e−rZ|Y ] and d = E[d(Y )] = E[e−rZ].

• At state x j, the transition period is ∆ = (xm − x j)Ixm−x j≤Y + τIxm−x j>Y andthe state at the end of transition period ∆ is

s(∆) =

∗ if Y ≥ xm − x j

xk if xk − x j ≤ Y < xk+1 − x j.

We now elaborate how to calculate RA(x j) and WA(x j) for any state x j and thesubset of the state space excluding the special state ∗. Note that, from the facts statedabove and Eqs. (7.52)–(7.55) in Theorem 7.7, for any function N(x j) defined on setx0,x1, . . . ,xm, we have

ΓA N(x j) = dm−1

∑k= j

IAc(xk)bkN(xk), where bk =∫

[xk−x j ,xk+1−x j)e−rydFY (y).

Moreover, R0(x j) = e−r(xm−x j)SY ((xm − x j)−) and

W 0A (x j) = R0(x j)+W0

A (x j) with W 0A (x j) = d

m−1

∑k= j

IA(xk)bk. (7.62)

Thus Eqs. (7.53) and (7.54) for y j = RA(x j) and z j =WA(x j) can be written respec-tively as, for j = 1, . . . ,m− 1,

y j = R0(x j)+ dm−1

∑k= j

IAc(xk)bkyk and z j =W 0A (x j)+ d

m−1

∑k= j

IAc(xk)bkzk,

which are easy to solve. Following the calculations of RA(x) and WA(x), the Gittinsindices assigning the priorities to job at state x can be calculated by

G(x) = supA

RA(x)1−RA(x)−WA(x)

.

Capped-Loss Model

As another example, consider the situation where the processing times are deter-ministic, and at each breakdown with achieved work x, the loss of work is max(x,c)(i.e., capped by c); assuming c = 1 without loss of generality. Specifically,

• The uptimeY and downtime Z are independent of each other and follow geometricdistributions with Pr(Y = k)= (1−a)ak and Pr(Z = k)= (1−b)bk, k= 0,1,2, . . . ,where a and b are both in (0,1).

Page 288: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

280 7 Dynamic Policies

• The processing time P = m is deterministic.• The loss is max(x,1), so that the transition of the first component x of the state

pair (x,y) is always x→ (x−1)+ =max(x−1,0). Given x, Q(v,x) is a probabilitydegenerate at v = (x− 1)+.

Then we have the following observations:

1. As P = m is deterministic, the remaining processing time at state x is m − x(x = 0,1, . . . ,m− 1). Thus S(x+ z,y)/S(x,y) = I[0,m−x)(z) and the state (x,y) re-duces to a single-component state x with state space 0,1, . . . ,m− 1∪∗.

2. Since Y is independent of Z, d(Y ) = E[e−rZ|Y ] = E[e−rZ] = d can be computed as

d(Y ) = d =∞

∑k=0

e−rk(1− b)bk =1− b

1− be−r = µ , say.

We now elaborate on how to calculate RA(x) and WA(x) for any state x and subsetof the state space excluding the special state ∗. Note that, from the observationsstated above together with Eqs. (7.52)–(7.55) in Theorem 7.7, we have

ΓA N(x) = E[e−rτ N(s(τ))IAc(s(τ))

]

= (1− a)µm−x−1

∑k=0

ake−rkN((x+ k− 1)+)IAc((x+ k− 1)+),

R0(x) = e−r(3−x)SY ((m− x)−) =(ae−r)m−x

, x = 0,1, . . . ,m− 1,

and

W 0A (x) = R0(x)+W0

A (x) =(ae−r)(3−x)

+ (1− a)µ2−x

∑k=0

IA((x+ k− 1)+)e−rkak.

Thus Eq. (7.53) for RA(x) can be rewritten as

RA(x) =(ae−r)m−x

+(1− a)µm−x−1

∑k=0

ake−rkRA ((x+ k− 1)+) IAc ((x+ k− 1)+) .

(7.63)

Take, for example, A = 1,2. Putting x = 0,1, . . . ,m−1 into Eq. (7.63) and writinguk for RA(k) yield a collection of linear equations as follows, which can be solvedeasily:

Page 289: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 281

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

u0 = (ae−r)m +(1− a)µ(1+ ae−r)u0 + µm−2

∑k=3

(ae−r)k+1uk

u1 = (ae−r)m−1 +(1− a)µu0+ µm−2

∑k=3

(ae−r)kuk

u2 = (ae−r)m−2 +(1− a)µm−2

∑k=3

(ae−r)k−1uk

u3 = (ae−r)m−3 +(1− a)µm−2

∑k=3

(ae−r)k−2uk

ux = (ae−r)m−x +(1− a)µm−2

∑k=x−1

(ae−r)k−x+1uk for x = 4, . . . ,m− 1,

7.5 Unrestricted Policies for a Parallel Machine Model

Whereas the optimization under restricted policies for stochastic scheduling involvesmainly the theory of stochastic processes in discrete time, the topic of unrestrictedpolicies needs successful application of the general theory of stochastic processesincluding martingales and stopping times in continuous time. Except for limitedefforts on some special cases, there has been little work reported in this area. Thetheory of resource allocation for bandit processes in continuous time is one ofthose limited successful efforts. Particularly, this theory shows a dramatic differencefrom the optimal allocation policies in the discrete time setting. In the continu-ous time setting, the existence of an optimal policy requires the machine to pos-sess the capability of being simultaneously allocated to more than one project;see the account made in Sect. 6.4. While single-machine scheduling with a reli-able machine can maximize the expected total discounted rewards via the Gittinsindex rule as shown in Example 7.3, it appears that unrestricted policies subject tomachine breakdowns have not been tackled in the literature.

In this section, we consider a special model of parallel machines with unrestricteddynamic policies as follows. A system consists of m parallel identical machines toprocess n jobs that are all available at time zero, where each machine can processany job, but only one at a time, and each job can be processed by one machine ata time. The machines are assumed to have no breakdowns. The processing times Piof i = 1, . . . ,n, are independent and exponentially distributed with rates λ1, . . . ,λnrespectively, independent of the way that the jobs are scheduled. As in the previoussection, we consider the objective function Φ defined in (7.11). Note that randomprocessing times with exponential distributions have been widely addressed in thestochastic scheduling field, including Weiss and Pinedo (1980), Ross (1983), andCai et al. (2000); Cai and Zhou (2005), among others. The exponential distributionis often used to model uncertain times, and is justified in the case with a high levelof uncertainty (cf. Cai et al., 2000).

The main results in this section are based on Cai et al. (2009a).

Page 290: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

282 7 Dynamic Policies

7.5.1 Optimality Equation

First introduce the following notation for policy classes in the parallel machine cir-cumstance. Denoted by Lur the class of all unrestriced dynamic policies, Lnd theclass of nonpreemptive dynamic policies and Lns the class of nonpreemptive staticlist policies. An obvious relationship is Lur ⊃ Lnd ⊃ Lns. Nevertheless, for ourproblem, by showing that the optimal values with respect to both classes of policiessatisfy the same optimality equation, the following theorem implies that an optimalpolicy in Lnd will be optimal in Lur too. Moreover, it is obvious that an optimalpolicy in Lnd will also be optimal in Lns if it belongs to Lns.

Let SEPT and LEPT be the abbreviations of “shortest expected processing time”and “longest expected processing time”, respectively. A policy π in Lnd is said to bea SEPT (LEPT) rule if it sequences the jobs in nondecreasing (nonincreasing) orderof expected processing times E[Pi], i = 1, . . . ,n, or equivalently the nonincreasing(nondecreasing) order of the rates λi, and selects the first job on this list to processwhenever a machine becomes available (idle). Apparently, a SEPT/LEPT rule isalso a policy in Lns.

Denote by Wpd(J) and Wnd(J) the minimum expected total cost of job set J,defined by (7.11), under respectively the policy classes Lur and Lnd . Then we havethe following theorem.

Theorem 7.11. Both Wpd(J) and Wnd(J) satisfy the following optimality equation

W (J) = min j1... jm∈J

1

∑mk=1 λ jk + r

[φ(J)+

m

∑i=1

λ jiW (J − ji)]

. (7.64)

Proof. Introduce an auxiliary class Lpσ of policies to approximate Lur. Under apolicy in Lpσ , a job can be preempted at the integral multiples of σ : σ ,2σ , . . . . Itis easy to see that Lur is the limit of Lpσ as σ ↓ 0. Denote by Wpσ (J) the minimumexpected total cost defined by (7.11) with respect to Lpσ for a given job set J.

We first show that Wpσ (J) satisfies (7.64). Suppose that m typical jobs, say,j1, . . . , jm, are selected for processing on the m machines. Then the time span beforethe next decision epoch is Pmin = min

Pji ,1 ≤ i ≤ m,σ

, in which the holding cost

incurred is g(J)∫ Pmin

0 e−δ t dt. At the next decision epoch, the decision maker needsto minimize Wpσ (J − ji) if Pji is less than Pjk for all k = i and σ , or Wpσ (J) ifσ < Pjk for all 1 ≤ k ≤ m. Since Wpσ (J − ji) and Wpσ (J) are discounted to thecurrent time, by the standard principle of stochastic programming, the optimalityequation met by Wpσ (J) is

W (J) = min j1... jm∈J

∆1 +

m

∑i=1

∆2i +∆3

, (7.65)

where ∆1 = g(J)E[∫ Pmin

0 e−δ tdt] is the expected holding cost in the next time spanup to the next decision epoch Pmin, ∆2i = E[IPji<Pjk ,k =i,Pji<σ |e−δPji ]Wpσ (J − ji)

Page 291: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 283

is the current expected discounted holding cost for job set J − ji accounting forthe event Pji < Pjk ,k = i,Pji < σ, and ∆3 = E[e−δσ Iσ<min1≤i≤mPji] Wpσ (J).

First note that E[∫ Pmin

0 e−δxdx] =∫ ∞

0∫ t

0 e−δxdxdFmin(t), where Fmin(t) is the cdfof Pmin. Exchanging the orders of the integrals leads to

∆1 = g(J)∫ ∞

0Smin(x)e−δxdx =

1− exp−(∑m

i=1 λ ji + δ )σ

∑mi=1 λ ji + δ g(J),

where Smin(t) = 1−Fmin(t). Next, for each i = 1, . . . ,m,

∆2i = E[e−δPji IPji<Pjk

,k =i,Pji<σ

]=

λ ji

1− exp[−(∑m

i=1 λ ji + δ )σ ]

∑mk=1 λ jk + δ .

Finally,

∆3 = E[e−δσ Iσ<min1≤i≤m Pji

]= exp

−(

m

∑i=1

λ ji + δ)

σ.

Substituting ∆1,∆2i and ∆3 into the optimality equation (7.65), we see that Wpσsatisfies (7.64).

Moreover, we can show in a similar way that Wnd meets (7.64). By letting σ → 0,Wpd satisfies (7.64). The theorem is thus proved.

The next two subsections examine the optimality of the SEPT and LEPT rulesrespectively in the class Lur of unrestricted policies. It turns out that the SEPTrule is optimal to minimize the expected discounted holding cost on fairly generalground, including the expected flowtime and the expected weighted flowtime asspecial cases. The LEPT rule, on the other hand, can only minimize the discountedmakespan for sufficiently small discount factor r, see Theorem 7.14 below.

7.5.2 SEPT Policies

We first consider the SEPT policy in the class Lnd . For the optimality of the SEPTpolicy, we define the following two regularity conditions.

Definition 7.1 (Supermodularity). A set function φ(·) defined on a collection Ωof sets is said to be supermodular (see e.g. Denneberg, 1994) if

φ(A)+φ(B)≤ φ(A∪B)+φ(A∩B) for all A,B ∈ Ω . (7.66)

The supermodularity indicates that the holding cost accelerates in increasing numberof jobs waiting for processing. This is reasonable when the holding cost takes intoaccount such factors as the constraints of existing capacity, lower credit rating withhigher debt level, and loss of opportunity to accommodate new customers.

Page 292: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

284 7 Dynamic Policies

Definition 7.2 (Agreeability). A set function φ(·) of the subsets of 1, . . . ,n is saidto be agreeable with the rates λ1, . . . ,λn of the processing times if

λi ≥ λ j ⇐⇒ φ(i)≥ φ( j) and φ(J − i)≤ φ(J − j). (7.67)

The theorem below is one of our main results, which shows that the SEPT rule,in the class Lnd , minimizes the expected total discounted holding cost when the costrate φ is supermodular and agreeable as just defined. We will establish the theoremby proving that SEPT satisfies the optimality equation (7.64). This will be conductedby an induction procedure with respect to the number of the jobs to be sched-uled, which requires certain recursive representations and properties of the perfor-mance measure under the SEPT rule. We establish those by a series of lemmas:Lemma 7.5 builds the recursive formula for the performance measures of any setof jobs, Lemma 7.6 gives recursively the increments of the performance measures,weighted by the rates of the jobs subtracted from the job sets, Lemma 7.7 obtainsboundaries for the increments, and Lemma 7.8 shows a type of monotonicity of theincrements. These lemmas and their proofs are provided in the Appendix to thissection.

Theorem 7.12. The SEPT rule minimizes the total expected discounted holdingcosts in the class of unrestricted dynamic policies for any monotone φ satisfying(7.66) and (7.67).

Proof. Let the job set be ordered by the decreasing order of λ j as 1,2, . . . ,n. Theproof builds on the dynamic programming by the method of induction with respectto the number of the jobs awaiting processing. Let J be the job set to be scheduledon the machines. Denote by W (J) the minimized expected total cost defined by(7.11) regarding the job set J and W ∗(J) the expected total cost under the SEPTrule. As described above, the proof follows from a series of lemmas provided inAppendix of this Chapter.

First we rewrite (7.64) as

min j1... jm∈J

φ(J)+

m

∑k=1

λ jk [W (J − jk)−W∗(J)]

+

( m

∑k=1

λ jk + r)[W ∗(J)−W(J)]− rW∗(J)

= 0. (7.68)

We consider only the case |J| > m since the theorem is trivial for |J| ≤ m. Thetheorem holds trivially for n = 1. Suppose that the theorem holds for |J| ≤ n− 1,and we proceed to prove it for |J|= n. Note that under the induction hypothesis,

W (J − jk) =W ∗(J− jk), k = 1,2, . . . ,m.

Page 293: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 285

Thus the optimality equation (7.68) can be rewritten as

min j1... jm∈J

φ(J)+

m

∑k=1

λ jk [W∗(J− jk)−W∗(J)]

+

( m

∑k=1

λ jk + r)[W ∗(J)−W(J)]− rW∗(J)

= 0. (7.69)

We now show that the minimum in (7.69) is achieved at 1,2, . . . ,m. If this is notthe case, then the solution is some j1, . . . , jm = 1,2, . . . ,m. Let

W j1... jm(J) = φ(J)+m

∑k=1

λ jk [W∗(J − jk)−W∗(J)]

+

( m

∑k=1

λ jk + r)[W ∗(J)−W(J)]− rW∗(J). (7.70)

By Lemma 7.8 in Appendix we have

W j1... jm(J)≥ φ(J)+m

∑k=1

λk[W ∗(J − k)−W∗(J)]

+

( m

∑k=1

λ jk + r)[W ∗(J)−W(J)]− rW∗(J). (7.71)

Further using Lemma 7.5 in Appendix together with (7.69), we get

W j1,..., jm(J)≥( m

∑k=1

λ jk + r)[W ∗(J)−W(J)] . (7.72)

Therefore, since W ∗(J)≥W (J), (7.72) and (7.69) imply

0 =W j1,..., jm(J)≥( m

∑k=1

λ jk + r)[W ∗(J)−W(J)]≥ 0. (7.73)

This shows that W ∗(J) =W (J), and hence selecting 1,2, . . . ,m to process on them machines is optimal for |J| = n. The theorem then follows from the principle ofmathematical induction.

It follows from Theorem 7.12 that the SEPT rule is optimal in the class Lnsbecause of Lnd ⊃ Lns. On the other hand, we can see that it is also optimal in theclass Lur, due to Theorem 7.11. These are summarized in the following theorem.

Theorem 7.13. The SEPT rule is optimal in the class of unrestricted dynamicpolicies and the class of nonpreemptive static list policies to minimize the totalexpected discounted holding costs for any monotone φ that satisfies (7.66) and(7.67).

Page 294: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

286 7 Dynamic Policies

We now present some examples to exhibit the applications of Theorems 7.12and 7.13. Without ambiguity, the optimality in the following example all refers tothe classes Lur, Lnd , and Lns.

Example 7.4.

1. Expected discounted cost: Let φ(A) = |A|, which satisfies Assumptions 7.1and 7.2 trivially. Hence by Theorems 7.12 and 7.13, the SEPT minimizes theexpected discounted cost in (7.11).

2. Expected weighted discounted reward: Suppose that on the completion of job i,a reward wi is collected, which is subject to the discounting at a rate r > 0. Thenthe total expected weighted discounted reward is

EWDR(π) = E

[n

∑i=1

wie−rCi

]. (7.74)

Take φ(A) = ∑i∈A wi. Then the expected total discounted holding cost defined asin (7.11) becomes

EDHC(π) =E

[n

∑i=1

wi

∫ Ci

0e−rxdx

]= E

[n

∑i=1

wi1− e−rCi

r

]

=1r

n

∑i=1

wi −1r

EWDR(π). (7.75)

Thus maximizing EWDR(π) is equivalent to minimizing EDHC(π).Now, as φ(i) = wi, the agreeability condition in (7.67) reduces to

wi ≥ wj ⇐⇒ λi ≥ λ j, (7.76)

and φ(A) =∑i∈A wi is modular (the equality in (7.66) holds). It then follows fromTheorems 7.12 and 7.13, that under agreeability condition (7.76), the SEPT ruleminimizes the expected discounted holding cost in (7.75), and so it maximizesthe expected total discounted rewards (7.74).

3. Expected weighted flowtime: Take r = 0 in (7.75). Then EDHC(π) reduces to theexpected weighted flowtime E[∑n

i=1 wiCi]. Hence under agreeability condition(7.76), the SEPT rule minimizes the expected weighted flowtime.

In the above example, the cost rate φ(·) is actually “modular” in the sense thatthe equality in (7.66) holds. We will further show some examples with “truly”supermodular cost rate. But first we need the following lemma on the supermod-ularity of set functions.

Lemma 7.4. If h(·) is a convex function and T (·) is a modular set function, thenφ(·) = h(T (·)) is supermodular.

Proof. The convexity of h(·) implies h(x+ r)−h(x)≤ h(y+ r)−h(y) for x ≤ y andr ≥ 0. Furthermore, the modularity of T means T (A∪B)−T (A) = T (B)−T (A∩B).

Page 295: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 287

Let x = T (A∩B), y = T (A), and r = T (A∪B)−T (A) = T (B)−T (A∩B). Then wehave x ≤ y and r ≥ 0. Hence

φ(B)−φ(A∩B) = h(T (B))− h(T(A∩B)) = h(x+ r)− h(x)≤ h(y+ r)− h(y)

= h(T (A∪B))− h(T(A)) = φ(A∪B)−φ(A),

which shows the supermodularity of φ(·).

Example 7.5. Let φ(A) = |A|2. Then φ(·) is supermodular by Lemma 7.4. It alsosatisfies agreeability condition (7.67) trivially. Hence by Theorems 7.12 and 7.13,the SEPT rule minimizes the following expected discounted holding cost in the classof unrestricted dynamic policies:

EDHC(π) = E[∫ ∞

0|S(x)|2e−rxdx

]= E

⎣∫ ∞

0

(n

∑i=1

I(Ci > x)

)2

e−rxdx

= E

[∫ ∞

0

(n

∑i=1

I(Ci > x)+∑i= j

I(Ci > x)I(Cj > x)

)e−rxdx

]

= E

[n

∑i=1

∫ Ci

0e−rxdx+ 2 ∑

i< j

∫ min(Ci,Cj)

0e−rxdx

]. (7.77)

When r = 0, (7.77) reduces to

EDHC(π) = E

[n

∑i=1

Ci + 2 ∑i< j

min(Ci,Cj)

].

Example 7.6. Let φ(A) = (∑i∈A wi)2. Then, as ∑i∈A wi is a modular set function ofA, φ(·) is supermodular by Lemma 7.4. The agreeability condition in (7.67) for φ(·)is equivalent to (7.76). Hence by Theorems 7.12 and 7.13, if (7.76) is satisfied, thenthe SEPT rule minimizes the following expected discounted holding cost:

EDHC(π) = E[∫ ∞

0φ(S(x))2e−rxdx

]= E

⎣∫ ∞

0

(n

∑i=1

wiI(Ci > x)

)2

e−rxdx

= E

[∫ ∞

0

(n

∑i=1

w2i I(Ci > x)+∑

i= jwiwjI(Ci > x)I(Cj > x)

)e−rxdx

]

= E

[n

∑i=1

w2i

∫ Ci

0e−rxdx+ 2wiwj ∑

i< j

∫ min(Ci,Cj)

0wiwje−rxdx

],

which reduces, when r = 0, to

EDHC(π) = E

[n

∑i=1

w2i Ci + 2 ∑

i< jwiwj min(Ci,Cj)

].

Page 296: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

288 7 Dynamic Policies

The next example shows a simple but realistic situation in which the cost ratesatisfies (7.66) and (7.67) and is not modular.

Example 7.7. Suppose that the decision maker has a storehouse with a capacity ofholding k unfinished jobs at a fixed maintenance cost. If the number of unfinishedjobs exceeds k, each job in excess of the capacity requires an additional cost of 1per unit time, for example, from an outside storage service provider. Consider suchadditional costs as the holding costs for unfinished jobs. Then the instantaneous costrate for holding a set A of jobs is

φ(A) = max(0, |A|− k) = h(T (A)),

where T (A) = |A| is a modular set function and h(x) = max(0,x− k) is a convexfunction. It is easy to see that φ(·) satisfies the supermodularity condition (7.66) byLemma 7.4 and the agreeability condition (7.67) by a straightforward computation.Moreover, φ(·) is clearly non-modular as the strict inequality

φ(A)+φ(B)< φ(A∪B)+φ(A∩B)

holds when |A|≤ k ≤ |B|< |A∪B|, |A|≤ |B|≤ k < |A∪B|, |A∩B|< k ≤ |A|≤ |B|,and so on.

7.5.3 LEPT Policies

Discounted makespan is a measure that takes into account the time-variant value ofmoney. One practical application is the situation where the machines are a set ofresources rented, with the cost of the rental depending upon the length of using themachines and the interest rate. In such a case, it is the discounted makespan thatshould be minimized (note that the cost of the rental increases nonlinearly due tothe effect of the interest rate).

As another example, if the completion time of the n jobs determines the finishedtime of a product (or a project), and the time-to-market affects the value of theproduct due to the discounted rate, then the discounted makespan is a measure thatshould be minimized.

While the LEPT rule has been widely known in the literature as an optimalsolution to minimize the makespan on multiple identical machines, we will show,in this subsection, that such a result will hold only under some limited conditions.More specifically, the following theorem states that when φ(A) = 1 for all nonemptyA and the gaps between the expected processing times are not too large, the LEPTrule minimizes the expected discounted makespan. We will also show that this isnot the case if the gaps are sufficiently large when the discount rate r > 0, whichhighlights an important distinction between the discounted and time-invariant costs.

Now suppose that the jobs have been ordered in nondecreasing order of λi sothat λ1 ≤ λ2 ≤ · · ·≤ λn.

Page 297: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 289

Theorem 7.14. The LEPT rule minimizes, in the class of unrestricted dynamicpolicies, the class of nonpreemptive dynamic policies, and the class of nonpreemp-tive static list policies, the expected total discounted holding cost when φ(A) = 1for all A =∅ and λ1, . . . ,λn satisfy the following inequalities

1r≥ λi+1 − 2λi

2λ 2i

, i = 1, . . . ,n− 1. (7.78)

Proof. We focus on the class of nonpreemptive dynamic policies. Using the samearguments as in the previous subsection, LEPT is optimal in the other two classesof policies if it is optimal in the class of nonpreemptive dynamic policies.

The proof follows the similar procedure in the proof of Theorem 7.12, but withsimpler notation and calculations. We now need to verify that

D12(J)≤ 0 and D2m(J)≤ 0 for m = 3, . . . ,n. (7.79)

With definition (7.94) in Appendix, we have

D12(J) =1

λ1 +λ2 + r[λ1D32(J − 1)+λ2D13(J− 2)] (7.80)

and

D2m(J) =1

λ1 +λ2 + r[λ1D2m(J − 1)+λ2D3m(J− 2)] . (7.81)

We verify (7.79) by induction again.First, when |J|= 2, in the current case we have

D12(J) =− (λ2 −λ1)

λ1 +λ2 + r+

λ2

λ2 + r− λ1

λ1 + r=− (λ2 −λ1)λ1λ2

(λ1 + r)(λ2 + r)(λ1 +λ2 + r)< 0

anddD12(J)

dλ1=

λ2[2λ1(λ1 + r)− rλ2]

(λ1 + r)2(λ1 +λ2 + r)2 ≥ 0.

Suppose that the following inequalities hold for |J|≤ n− 1:

D12(J)≤ 0, D2m(J)≤ 0 anddD12(J)

dλ1≥ 0 for 2 < m ≤ |J|. (7.82)

Then consider |J| = n. The induction hypothesis implies that D13(J − 2) isincreasing in λ1. Let λ1 move increasingly to λ2. Then job 1 changes to job 2 andD13(J − 2) increases to D23(J− 1), so that

D23(J− 1)≥ D13(J − 2). (7.83)

Page 298: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

290 7 Dynamic Policies

Hence by (7.80),

D12(J)≤(λ2 −λ1)

λ1 +λ2 + rD23(J − 1)≤ 0

and

dD12(J)dλ1

=1

(λ1 +λ2 + r)2 [−D23(J − 1)(λ2+ r)−λ2D13(J− 2)]

+1

(λ1 +λ2 + r)2dD13(J− 2)

dλ1(λ1 +λ2 + r)≥ 0,

where the last inequality holds by the induction hypothesis. Finally, D2m(J) ≤ 0is trivial by the induction hypothesis together with the recursive formula in (7.81).Therefore, (7.82) holds for all |J|= n ≥ 2, so does (7.79).

Remark 7.5. Equation (7.78) is automatically satisfied when λi+1 ≤ 2λi. Thus,(7.78) is equivalent to an upper bound of r. In other words, Theorem 7.14 canbe restated as: The LEPT rule minimizes the expected discounted makespan forr satisfying

0 ≤ r ≤ min1≤i≤n−1

2λ 2

i

max(λi+1 − 2λi,0)

.

When r = 0, it has been shown that the LEPT rule minimizes the makespan and somemore general cost rate, see Weiss and Pinedo (1980) and Kampke (1987a,b, 1989).In the discounted case, however, this conclusion no longer holds. We will show thatthe LEPT rule does not minimize the expected discounted makespan without thecondition of Theorem 7.14 when r > 0, in contrast to the case of no discounting(r = 0). An illustrative counterexample with |J|= 3 is provided below.

Example 7.8. In order for the LEPT rule to minimize (in the three classes of policies)the objective function in Theorem 7.14, we must have D23 ≤ 0. This is justified asfollows. With the notation used in Theorem 7.14 for J = 1,2,3, we have

W ∗(J) =1

λ1 +λ2 + r1+λ1W ∗(2,3)+λ2W ∗(1,3)

and

W13(J) =1

λ1 +λ3 + r1+λ1W ∗(2,3)+λ3W ∗(1,2) .

Thus

W13(J)−W∗(J) =(λ2 −λ3) [1+λ1W ∗(2,3)](λ1 +λ2 + r)(λ1 +λ3 + r)

+λ3W ∗(1,2)

λ1 +λ3 + r− λ2W ∗(1,3)

λ1 +λ2 + r

≤ (λ2 −λ3) [1+λ1W ∗(2,3)](λ1 +λ2 + r)(λ1 +λ3 + r)

− D23

λ1 +λ2 + r≤− D23

λ1 +λ2 + r,

Page 299: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 291

where the first inequality is due to λ2 < λ3 and D23 = λ2W ∗(1,3)− λ3W ∗

(1,2). Therefore, when D23 > 0, W13(J) < W ∗(J), so that the LEPT rule is nolonger optimal. However, D23 = D23(J) can be calculated by

D23 = λ1D23(2,3) = λ1(λ3 −λ2)

λ2 +λ3 + r(1− 2)+

r(λ3 −λ2)

(λ2 + r)(λ3 + r)

= (λ3 −λ2)

[− λ1

λ2 +λ3 + r+

r(λ2 + r)(λ3 + r)

].

It follows that

limλ3→∞

D23 =r

(λ2 + r)and lim

λ1→0D23 =

r(λ3 −λ2)

(λ2 + r)(λ3 + r).

As a result, D23 > 0 for sufficiently large λ3 or sufficiently small λ1, providedr > 0. Therefore, in the discounted case with r > 0, if the conditions on λi inTheorem 7.14 fail to hold, then the LEPT rule is no longer optimal to minimize theexpected discounted makespan.

It may be interesting to note that while the general form of the objectivefunction in (7.11) is considered in both Sects. 7.5.2 and 7.5.3, the optimal policiesin Theorems 7.12–7.14, namely the SEPT and LEPT, are in the opposite order. Thisis due to the nature of the cost rate φ(·). Note that φ(A) ≡ 1 is in fact modularand the opposite inequality in (7.66) holds for this φ . When there is no discount-ing, previous studies showed that the SEPT is optimal for certain supermodular φ(·)(such as φ(A) = |A|), and the LEPT is optimal for φ(A) ≡ 1. When the cost rate isdiscounted, we have shown that the SEPT remains optimal for supermodular φ(·),whereas the optimality of the LEPT for φ(A) ≡ 1 is limited to certain restrictivecircumstances only.

Appendix

Let

ΛJ =|J|

∑i=1

λi + r, Λ j(J) =|J|

∑i=1,i= j

λi + r and Λm =m

∑i=1

λi + r.

Then, for the SEPT rule, applying a procedure similar to that used in the proof ofTheorem 7.11 yields the following lemma.

Lemma 7.5. W ∗(J) satisfies the equations

ΛmW ∗(J) = φ(J)+m

∑k=1

λkW ∗(J − k) if |J|> m (7.84)

Page 300: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

292 7 Dynamic Policies

and

ΛJW ∗(J) = φ(J)+|J|

∑k=1

λkW ∗(J− k) if |J|≤ m. (7.85)

The next lemma is also for the SEPT rule. Define

A j(J) = λ j[W ∗(J − j)−W∗(J)]. (7.86)

Lemma 7.6. For 1 < |J|≤ m,

ΛJA j(J) =|J|

∑i=1,i= j

λiA j(J − i)−λ j[φ(J)−φ(J − j)]. (7.87)

Further, when |J|> m, if j > m, then

ΛmA j(J) =m

∑i=1

λiA j(J − i)−λ j[φ(J)−φ(J− i)], (7.88)

and if j ≤ m, then

ΛmA j(J) =m

∑i=1,i= j

λiA j(J − i)+λ jAm+1(J − j)−λ j[φ(J)−φ(J− j)].

(7.89)

Proof. If 1 < |J| ≤ m, then there are no jobs left waiting for processing. Hence by(7.85),

ΛJA j(J) = ΛJλ j[W ∗(J − j)−W∗(J)]

=λ j

[ΛJW ∗(J − j)−φ(J)−

|J|

∑i=1

λiW ∗(J − i)]

=λ j

[Λ j(J)W ∗(J − j)−φ(J)−

|J|

∑i=1,i= j

λiW ∗(J− i)]

=λ j

[φ(J − j)+

|J|

∑i=1,i= j

λiW ∗(J− j, i)−φ(J)−|J|

∑i=1,i= j

λiW ∗(J− i)]

=|J|

∑i=1,i= j

λiA j(J − i)−λ j[φ(J)−φ(J− j)].

Thus (7.87) is proved.Next consider the case of |J|> m. For j > m, since 1,2, . . . ,m ⊂ J− j⊂ J.

It follows from (7.84) that

Page 301: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 293

ΛmA j(J) = λ jΛm[W ∗(J − j)−W∗(J)]

= λ j

[φ(J − j)+

m

∑i=1

λiW ∗(J− j, i)]−λ j

[φ(J)+

m

∑i=1

λiW ∗(J − i)]

=m

∑i=1

λiA j(J − i)−λ j[φ(J)−φ(J − i)],

which proves (7.88). Furthermore, for j ∈ 1, . . . ,m, by (7.84) again,

ΛmA j(J) = λ jΛm[W ∗(J− j)−W∗(J)]

= λ j

[ΛmW ∗(J− j)−φ(J)−

m

∑i=1

λiW ∗(J − i)]

= λ j

[( m+1

∑i=1,i= j

λi + r)

W ∗(J− j)−φ(J)−m

∑i=1,i= j

λiW ∗(J − i)

−λm+1W ∗(J− j)]

= λ j

[φ(J − j)+

m+1

∑i=1,i= j

λiW ∗(J− j, i)−φ(J)−m

∑i=1,i= j

λiW ∗(J− i)

−λm+1W ∗(J− j)]

=m

∑i=1,i= j

λiA j(J− i)+λ j[φ(J − j)+λm+1W ∗(J− j,m+ 1)

−φ(J)−λm+1W ∗(J − j)]

=m

∑i=1,i= j

λiA j(J− i)+λ jAm+1(J − j)−λ j[φ(J)−φ(J− j)].

Thus (7.89) holds as well.

Lemma 7.7. For A j(J) defined in (7.86), we have

A j(J)≤ 0 (7.90)

andA j(J)+φ(J)−φ(J− j)≥ 0. (7.91)

Proof. If |J|= 2, it follows from (7.85) that

(λ1 +λ2 + r)W ∗(1,2) = φ(1,2)+λ1W ∗(2)+λ2W ∗(1)

= φ(1,2)+ λ1

λ2 + rφ(2)+ λ2

λ1 + rφ(1).

Using (7.87) for W ∗(1,2) (since m ≥ |J|= 2),

Page 302: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

294 7 Dynamic Policies

λ2(λ1 +λ2 + r) [W ∗(1)−W∗(1,2)]= λ2 [(λ1 +λ2 + r)W ∗(1)−φ(1,2)−λ1W ∗(2)−λ2W ∗(1)]=−λ2φ(1,2)−λ1λ2W ∗(2)+λ2(λ1 + r)W ∗(1)

=−λ2 [φ(1,2)−φ(1)]− λ1λ2

(λ2 + r)φ(2). (7.92)

Therefore, (7.92) says that A2(1,2)≤ 0. Since this assertion is independent of theorder of λ1 and λ2, we have A1(1,2)≤ 0 as well.

From the recursive formula (7.87) we see that A j(J)≤ 0 for |J|≤m. Furthermore,in case |J|>m, recursive formulae (7.88) and (7.89) ensure the validity of A j(J)≤ 0for the subcases j > m and j ≤ m respectively. Equation (7.90) is thus proved.

We now examine inequality (7.91). First notice that

(λ1 +λ2 + r) [A2(1,2)+φ(1,2)−φ(1)]

=−λ2 [φ(1,2)−φ(1)]− λ1λ2

(λ2 + r)φ(2)+ (λ1+λ2 + r) [φ(1,2)−φ(1)]

= (λ1 + r) [φ(1,2)−φ(1)]− λ1λ2

(λ2 + r)φ(2)

> λ1 [φ(1,2)−φ(1)−φ(2)]≥ 0.

By (7.87),

ΛJ [A j(J)+φ(J)−φ(J− j)]

=|J|

∑i=1,i= j

λiA j(J−i)−λ j [φ(J)−φ(J− j)]+ΛJ [φ(J)−φ(J− j)]

=|J|

∑i=1,i= j

λi [A j(J −i)+φ(J −i)−φ(J −i, j)]

+|J|

∑i=1,i= j

λi[φ(J)−φ(J− j)−φ(J −i)+φ(J −i, j)]+ r[φ(J)−φ(J− j)].

By the supermodularity and the monotonicity, we have

ΛJ [A j(J)+φ(J)−φ(J− j)]≥|J|

∑i=1,i= j

λi [A j(J − i)+φ(J− i)−φ(J− i, j)] .

(7.93)

Therefore by recursive arguments associated with Eq. (7.93) we prove (7.91) forcase |J|≤ m. The assertion for case |J|> m can be proved similarly.

Lemma 7.8. For |J|≤ m, Al(J)≥ A j(J) if l > j.

Page 303: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.5 Unrestricted Policies for a Parallel Machine Model 295

Proof. DefineD jk(J) = A j(J)−Ak(J). (7.94)

Then we need to show D jl(J)≤ 0 for l > j.Let j < l and first consider the case |J|≤ m. By (7.87),

ΛJD jl(J) = ΛJ [A j(J)−Al(J)]

=|J|

∑i=1,i= j

λiA j(J− i)−λ j[φ(J)−φ(J− j)]

−|J|

∑i=1,i=l

λiAl(J− i)+λl[φ(J)−φ(J− l)].

By further calculations we get

ΛJD jl(J) =|J|

∑i=1,i= j,l

λi [A j(J − i)−Al(J− i)]+λlA j(J− l)−λ jAl(J − j)

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J − j)]

=|J|

∑i=1,i= j,l

λiD jl(J − i)+λlλ j(W ∗(J− j)−W∗(J− l))

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J − j)]

=|J|

∑i=1,i= j,l

λiD jl(J − i)+ (λlA j(J)−λ jAl(J))

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J − j)].

Therefore,

( |J|

∑i=1,i =l

λi + r)

D jl =|J|

∑i=1,i = j,l

λiD jl(J−i)+(λl −λ j)Al(J)

+λl [φ(J)−φ(J−l)]−λ j[φ(J)−φ(J− j)]

≤|J|

∑i=1,i = j,l

λiD jl(J−i)+(λl −λ j) [Al(J)+φ(J)−φ(J −l)] .

By Lemma 7.7 and λl −λ j ≤ 0, we further obtain

ΛJD jl(J)≤|J|

∑i=1,i= j,l

λiD jl(J− i). (7.95)

We next prove D jl(J)≤ 0 by induction in the following steps.

Page 304: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

296 7 Dynamic Policies

Step 1. Let J = 1,2. Then we can write

D12(J) = A1(J)−A2(J)

= λ1(W ∗(2)−W∗(1,2))−λ2(W ∗(1)−W∗(1,2))

=(λ2−λ1)

λ1+λ2+r(φ(J)−φ(1)−φ(2))+ λ2

λ2 + rφ(2)− λ1

λ1 + rφ(1).

Recall the supermodularity of φ leads to φ(J)−φ(1)−φ(2)≥ 0 and the agree-ability between φ and λi gives φ(1)≥ φ(2). Hence it is readily to check thatfor J = 1,2,

D12(J)≤ 0. (7.96)

Step 2. By the recursive formula (7.95) and (7.96), we see that D jl(J) ≤ 0 for|J|≤ m if j < l.

Step 3. We now turn to the case |J| > m. With definition (7.94) and Eqs. (7.88)and (7.89), we further divide the proof into the following three cases.

Case 1. 0 < j < l ≤ m:

(m+1

∑i=1

λi + r)

D jl(J) =(m+1

∑i=1

λi + r)

A j(J)−(m+1

∑i=1

λi + r)

Al(J)

=m+1

∑i=1,i= j,l

λi[A j(J − i)−Al(J− i)]+λlA j(J− l)+λ jAm+1(J)

−λ jAl(J − j)−λlAm+1(J)

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J− j)]

=m+1

∑i=1,i= j,l

λiD jl(J− i)+λlA j(J)−λ jAl(J)+ (λ j −λl)Am+1(J)

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J− j)],

or equivalently,

( m+1

∑i=1,i=l

λi+r)

D jl(J)=m+1

∑i=1,i= j,l

λiD jl(J− i)−(λ j −λl)Al(J)+ (λ j −λl)Am+1(J)

+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J − j)]

≤m+1

∑i=1,i= j,l

λiD jl(J− i)+ (λ j−λl)Am+1(J)

− (λ j −λl)[Al(J)+φ(J)−φ(J− l)].

It then follows from Lemma 7.7 that

( m+1

∑i=1,i=l

λi + r)

D jl(J)≤m+1

∑i=1,i= j,l

λiD jl(J− i). (7.97)

Page 305: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

7.6 Bibliographical Comments 297

Case 2. j ≤ m < l:

ΛmD jl(J) =m

∑i=1,i= j

λiA j(J− i)+λ jAm+1(J − j)−λ j[φ(J)−φ(J− j)]

−m

∑i=1,i= j

λiAl(J − i)−λ jAl(J − j)+λl[φ(J)−φ(J − l)]

=m

∑i=1,i= j

λiD jl(J − i)+λ jDm+1,l(J − j)

+λl[φ(J)−φ(J− l)]−λ j[φ(J)−φ(J− j)].

Due to the agreeability between φ and λi, i ∈ J, we have

ΛmD jl(J)≤m

∑i=1,i= j

λiD jl(J− i)+λ jDm+1,l(J− j). (7.98)

Case 3. m < j < l:

ΛmD jl(J) =m

∑i=1

λiA j(J− i)+λ j[φ(J)−φ(J− j)]

−m

∑i=1

λiAl(J − i)+λl[φ(J)−φ(J − l)]

=m

∑i=1

λiD jl(J− i)+λl[φ(J)−φ(J − l)]−λ j[φ(J)−φ(J − j)].

Due to again the agreeability between φ and λi, i ∈ J, we have

ΛmD jl(J)≤m

∑i=1

λiD jl(J − i). (7.99)

Therefore, it follows from the recursive formulae (7.97)–(7.99) that D jl(J) ≤ 0 forthe case |J| ≤ m with j < l. Thus we have shown that D jl(J) ≤ 0 whenever j < l,which completes the proof.

7.6 Bibliographical Comments

Significant results on the no-loss breakdown model have been reported in Birgeet al. (1990), Cai and Zhou (1999); Cai et al. (2000), Glazebrook (1984, 1987), Li,Braun and Zhao (1998), Mittenthal and Raghavachari (1993), Pinedo and Rammouz

Page 306: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

298 7 Dynamic Policies

(1988), and Qi et al. (2000a,b), to name just a few. Literature on the total-loss modelinclude Adiri et al. (1989, 1991), Birge et al. (1990), Frostig (1991), Mehta and Uz-soy (1998), and Cai et al. (2003, 2004, 2005, 2009b), etc. Most of the studies con-cerning breakdown-repeat breakdowns have considered static policies only. The ex-ceptions include Frostig (1991), Glazebrook (1984); Glazebrook and Owen (1991),and Cai et al. (2005, 2009b). See Chap. 4 for details.

The SEPT and LEPT policies of scheduling jobs on parallel machines havebeen studied for decades in the literature, in which Glazebrook (1979) and We-ber (1982a, b) showed that on identical parallel machine settings, the SEPT policyminimizes the expected flowtime, while the LEPT policy minimizes the expectedmakespan. More general results can be found in Weiss and Pinedo (1980), whichconsidered these problems with non-identical machines that are characterized bytheir speeds and general cost functions which cover flowtime and makespan as spe-cial cases. They showed that the SEPT or LEPT policy minimizes the expected costfunctions when the cost rates meet some regularity conditions. Kampke (1987a,b,1989) followed this direction to examine the conditions under which the LEPT andSEPT rules are optimal with general cost functions. Chang et al. (1992) went fur-ther to allow the machines to be subject to breakdowns and repairs. Alternative toextending the results from flowtime and makespan to general cost functions, anotherdirection is to consider general rewards as the objective functions of completing thejobs. For example Weber et al. (1986) considered the parallel machine schedulingproblem with a type of general rewards, represented as the sum of a general functionof each completion instant. Further results are available in Weber (1988). More stud-ies can be found in, for example, Bruno (1985), Hordijk and Koole (1993), Weiss(1990), Righter (1988, 1991), Weiss (1990), Cai and Zhou (1999), and Righter andXu (1991), among others. In the references mentioned above, apart from Cai andZhou (1999), the cost functions are time invariant. While in the single machinecase, scheduling with discounted cost/rewards has been extensively studied and alarge number of results have been reported, there has been little work reported, ondiscounted cost/rewards so far for parallel machine scheduling.

Page 307: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 8Stochastic Scheduling with IncompleteInformation

The majority of studies on stochastic scheduling models have largely been es-tablished based on the assumption of complete information, in the sense that theprobability distributions of the random variables involved, such as the processingtimes and the machine up/downtimes, are completely specified a priori. In real-ity, however, there are many circumstances where the information is only partiallyavailable, which makes it impossible to completely specify the distributions of therelevant random variables. Examples of scheduling with incomplete informationcan be found in environmental clean-up (Lee and Kitanidis, 1991), parallel compu-tation (Bast, 1998), project management (Gardoni et al., 2007), petroleum explo-ration (Glazebrook and Boys, 1995), sensor scheduling in mobile robots (Gage andMurphy, 2004), and cycle time modelling (Chen et al., 2001), among many others.As a result of incomplete information, there may be multiple competing distribu-tions to model the random variables of interest. A common and effective approachto tackle this problem is the well-known Bayesian methodology, which identifieseach competing distribution by a realization of a random variable, say Θ . Initially,Θ has a prior distribution based on historical information or assumption (which maybe non-informative if no historical information is available). Information on Θ maybe updated after realizations of the random variables are observed. A key concernin decision making is how to utilize the updated information to refine and enhancethe decisions.

The main purpose of this chapter is to treat a class of scheduling models subject tomachine breakdowns with incomplete information. Under this class of models, therepeated processing times between breakdowns are dependent via a latent randomvariable. This leads to partially available information on the processing times duringthe process, and the information is gradually accumulated from previous processingexperience and adaptively incorporated into the decision making for processingremaining jobs. Section 8.1 formulates the model and discusses the probabilisticcharacteristics of the repetition frequency and occupying times, and the impact ofincomplete information. The optimal restricted dynamic policies for this model are

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 8,© Springer Science+Business Media New York 2014

299

Page 308: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

300 8 Stochastic Scheduling with Incomplete Information

derived in Sect. 8.2 based on posterior Gittins indices. Finally, Sect. 8.3 discussesan interesting case in which the posterior Gittins indices can be represented by theone-step rewards rates of the jobs. The main results of this chapter are mainly basedon Cai et al. (2009b).

8.1 Modelling and Probabilistic Characteristics

8.1.1 Formulation and Assumptions

We consider the problem of scheduling a set of n jobs, all available at time 0on a single machine as described in Chap. 4, but with incomplete information.Specifically, suppose that for each job i, the distributions of (Pik,Yik,Zik), k =1,2, . . . , are only partially known and depend on an unknown parameter Θi. Toaccount for the partial knowledge on Θi (hence the distributions of Pik,Yik,Zik), itis modelled as a random variable with a prior distribution πi(θ ). We further assumethat• Conditional on Θi, (Pik,Yik,Zik), k = 1,2, . . . , are i.i.d. following arbitrary distri-

butions as (Pi,Yi,Zi),• (Θi;Pik,Yik,Zik,k = 1,2, . . .) are mutually independent over i = 1,2, . . . ,n.

Remark 8.1. Note that (Pik,Yik,Zik), k = 1,2, . . . , are only assumed to be condition-ally independent given Θi. Unconditionally, however, they are dependent via Θi. Forexample, it is easy to see that the covariance between Pi j and Pik is given by

Cov(Pi j,Pik) = E [Cov(Pi j,Pik|Θi)]+Cov(E [Pi j|Θi] ,E [Pik|Θi])

=

E [Var(Pi|Θi)]+Var(E[Pi|Θi]) if j = k,Var(E [Pi|Θi]) if j = k.

Thus the correlation coefficient between Pi j and Pik for j = k is

Corr(Pi j,Pik) =Cov(Pi j,Pik)√

Var(Pi j)Var(Pik)=

Var(E[Pi|Θi])

E [Var(Pi|Θi)]+Var(E[Pi|Θi]), (8.1)

which is positive, hence Pi j and Pik are dependent (unless E[Pi|Θi] is constant).Equation (8.1) also shows that the repeated processing times Pi1,Pi2, . . . are equallycorrelated.

Under this model setting, the uncertainty in (Pik,Yik,Zik) consists of two parts.One is the variation in Θi that reflects the differences between competing distri-butions for (Pik,Yik,Zik). The other is the variation of (Pik,Yik,Zik) given Θi, whichreflects the level of tightness for the distributions of (Pik,Yik,Zik) to depend on thehistorical knowledge. For the processing times Pik, one extreme situation is thatthe history supplies no useful information for future decisions in the sense thatPik are independent of Θi. This corresponds to independent processing times

Page 309: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.1 Modelling and Probabilistic Characteristics 301

between breakdowns. The other extreme is when the repeated processing timesPi1,Pi2, . . . are all equal to a value completely determined by Θi, which is the case ofidentical processing times. Intermediate situations represent the usual reality thatthe history is helpful to some extent, but is not sufficient to completely specifythe processing times and leaves them with some stochastic features. Consequently,this incomplete information model includes those previously investigated indepen-dent/identical processing time models as two extreme special cases. When job i isbeing processed, the information on the unknown parameter Θi is gradually accu-mulated from the realizations of the processing times and up/downtimes. Based onsuch accumulated information, the decision maker can modify the policy to achievebetter results.

8.1.2 Repetition Frequency and Occupying Time

The number of repetitions Ti and the occupying time Oi for each job i are definedin Sect. 4.3. We now study the probabilistic characteristics of the two variables.In addition to a better understanding of the model, the results we present here areessential for the development of optimal policies, in both the static and dynamicclasses.

First note that, conditional on Θi, the repetition number Ti follows a geometricdistribution with success probability pi(Θi) = Pr(Yi < Pi |Θi), i.e.,

Pr(Ti = k|Θi) = [1− pi(Θi)] pk−1(Θi), k = 1,2, . . . . (8.2)

As a result, the unconditional (marginal) distribution of Ti is mixed geometric with

Pr(Ti = k) = E[[1− pi(Θi)] pk−1

i (Θi)], k = 1,2, . . . , (8.3)

and

E[Ti] = E[

1pi(Θi)

], k = 1,2, . . . . (8.4)

This leads to an immediate result below:

Proposition 8.1. Job i is processible, i.e., Oi < ∞ a.s. if and only if pi(Θi)< 1 a.s.

The probabilistic characteristics of the occupying time Oi can be obtained byits Laplace transform E[e−tOi ]. In addition, the moments of Oi are also useful indeveloping solutions that require the evaluation of an objective function E[ fi(Ci(λ ))]when fi(·) is a polynomial.

Proposition 8.2. The Laplace transform of Oi is given by

E[e−rOi |Θi] = E

[E[e−rPi IPi≤Yi|Θi

]

1−E[e−r(Yi+Zi)IPi>Yi|Θi

]], r ≥ 0. (8.5)

Page 310: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

302 8 Stochastic Scheduling with Incomplete Information

Proof. This is an immediate result of Theorem 4.10.

Although one can calculate the moments of an occupying time from its Laplacetransform by Proposition 8.2, a simpler method is available by a recursive proceduregiven in the next proposition.

Proposition 8.3. The conditional moments of Oi given Θi can be calculated recur-sively by

E[Ok

i |Θi]=

E[Pk

i IPi≤Yi|Θi]

E[IPi≤Yi|Θi

] +k−1

∑t=0

Ctk

E[(Yi +Zi)k−t IPi>Yi|Θi]E[Oti |Θi]

E[IPi≤Yi|Θi

] , (8.6)

where Ctk = k!/[t!(k− t)!]. As a result, the first two moments of Oi conditional on

Θi are

E [Oi|Θi] =E[PiIPi≤Yi+(Yi +Zi) IPi>Yi|Θi

]

E[IPi≤Yi|Θi

] (8.7)

and

E[O2

i |Θi]=

E[P2i IPi≤Yi|Θi]

E[IPi≤Yi|Θi]+

2E[(Yi +Zi)2IPi>Yi)|Θi]

E[I(Pi≤Yi)|Θi]

+E[(Yi +Zi)I(Pi>Yi)|Θi]

E[IPi≤Yi|Θi]E[Oi|Θi]. (8.8)

Proof. Let =st denote the equality of distributions. First note the useful renewaltechnique (see, e.g., Ross 1996):

[Oi|Θi] =st [Pi1IPi1≤Yi1|Θi]+ [IPi1>Yi1(Yi1 +Zi1 +O′i)|Θi], (8.9)

where O′i =st Oi, independent of (Pi1,Yi1,Zi1) given Θi. Decompose

[Ok

i |Θi]

as

[Ok

i |Θi]=st[Pk

i IPi≤Yi∣∣Θi]+[(Yi +Zi +O′

i)kIPi1>Yi1

∣∣Θi].

Taking expectation, we have

E[Ok

i |Θi]= E

[Pk

i IPi≤Yi∣∣Θi]+E[(Yi +Zi +O′

i)kIPi1>Yi1

∣∣Θi]

= E[Pk

i IPi≤Yi∣∣Θi]+

k

∑t=0

CtkE[(Yi +Zi)

k−t IPi1>Yi1∣∣Θi]E[Ok

i |Θi]k, (8.10)

which gives (8.6). Then (8.7) and (8.8) follow directly.

The following corollary provides a formula that unifies (8.5) and (8.7).

Corollary 8.1.

E[∫ Oi

0e−rtdt

∣∣Θi

]=

E[∫ Pi

0 e−rxdxIPi≤Yi+∫ Yi+Zi

0 e−rxdxIPi>Yi|Θi]

1−E[e−r(Yi+Zi)IPi>Yi|Θi]. (8.11)

Page 311: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.1 Modelling and Probabilistic Characteristics 303

Proof. For r > 0,∫ Oi

0 e−rtdt = (1−e−rOi)/r, hence (8.11) follows from (8.5). Whenr = 0,

∫ Oi0 e−rtdt = Oi and (8.11) reduces to (8.7).

8.1.3 Impact of Incomplete Information on Static Policies

As discussed in Chap. 4, the scheduling problems with stochastic breakdowns understatic policies are in fact equivalent to the ones without breakdowns (with theoccupying times taking the role of the processing times). As a result, based onthe Laplace transform of Oi in Proposition 8.2 or the moments of Oi in Proposi-tion 8.3, one can derive, analytically or computationally, the optimal static policiesunder any objective functions. We here provide a simple example to show that, evenwithin the class of static policies, incomplete information may have a great impacton the optimal policy. Consider the problem of maximizing the expected weighteddiscounted rewards:

EWDR(π) = E

[n

∑i=1

wie−rCi

]. (8.12)

When wi = 1 for all i, we know that the optimal sequence is in nonincreasing order of f j/(1− f j), or equivalently, in nonincreasing order of f j, where f j are givenin (8.5). Suppose that the processing times Pj are exponentially distributed withmean 1/Θ j, the uptimes Yj are exponentially distributed with mean 1, the downtimesZj are identically distributed with E[e−Zj ] = 1/2, Yj and Zj are independent of eachother and of (Pj,Θ j), and the discount rate is r = 1. Then

E[e−rPj I(Pj≤Yj)

∣∣Θ j]= E

[∫ Yj

0e−xΘ je−Θ jxdx

]=

Θ j

1+Θ j

1−E

[e−(1+Θ j)Yj

]

=Θ j

1+Θ j

1−

∫ ∞

0e−(1+Θ j)ye−ydy

=

Θ j

2+Θ j(8.13)

and similarly,

E[e−r(Yj+Zj)I(Pj>Yj)

∣∣Θ j]= E

[e−Zj

]E[e−(1+Θ j)Yj

]=

12

(1

2+Θ j

)=

14+ 2Θ j

.

(8.14)

If the information on the distributions of Pj is complete, i.e., Θ1, . . . ,Θn are known(deterministic), as θ1, . . . ,θn (say), respectively, then by (8.5) and (8.13)–(8.14),

f j =θ j/(2+θ j)

1− 1/(4+ 2θ j)=

2θ j

3+ 2θ j, (8.15)

which is increasing in θ j. Thus the optimal sequence is in nonincreasing order ofθi, or equivalently, in nondecreasing order of 1/θi = E[Pj]. In other words,the SEPT rule is optimal with complete information in this example.

Page 312: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

304 8 Stochastic Scheduling with Incomplete Information

Now let us examine the case with incomplete information. Consider the simplecase of each Θ j taking on two possible values a j and b j with

β j = Pr(Θ j = a j) = 1−Pr(Θ j = b j), 0 < β j < 1, j = 1, . . . ,n.

Then by (8.5) and (8.15),

f j = E[

2Θ j

3+ 2Θ j

]=

2a j

3+ 2a jβ j +

2b j

3+ 2b j(1−β j),

while the expected processing time is

E[Pj] = E[E[Pj|Θ j]] = E[

1Θ j

]=

1a j

β j +1b j(1−β j).

It is easy to see that the nonincreasing order of fi is no longer equivalent to thenondecreasing order of E[Pj]. For example, take (a1,b1)= (1,8), (a2,b2)= (3,6),β1 = 0.2 and β2 = 0.7. Then

f1 =2(1)

3+ 2(1)(0.2)+

2(8)3+ 2(8)

(0.8) = 0.7537,

f2 =2(3)

3+ 2(3)(0.7)+

2(6)3+ 2(6)

(0.3) = 0.7067.

Hence f1 > f2 and so the optimal sequence should process job 1 before job 2. Onthe other hand,

E[P1] = 0.2+18(0.8) = 0.3 and E[P2] =

13(0.7)+

16(0.3) = 0.2833,

so that E[P1]> E[P2]. Therefore the SEPT rule is no longer optimal with incompleteinformation.

The optimal sequence in the order 1,2 with E[P1] > E[P2] is counter-intuitiveand differs completely from the well-known SEPT rule with complete information.This highlights the impact of incomplete information on the optimal decisions.

8.2 Optimal Restricted Dynamic Policies

Under a dynamic policy, the decision maker has the option to revise his policy ateach decision epoch. As we will see, in the case where the distributions involvedcontain unknown parameters, the historical information can be used to infer theparameters via their posterior distributions, which in turn influences the optimaldynamic policy. We here focus on the problem of maximizing the expected weighteddiscounted rewards (EWDR) defined by (8.12). We will establish the optimal poli-cies via the celebrated dynamic allocation index theory of Gittins (see Chap. 6) formulti-armed bandit processes.

Page 313: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.2 Optimal Restricted Dynamic Policies 305

We first develop the formulae of Gittins indices. Suppose that job i has beenprocessed k ≥ 0 times and remains unfinished at a decision epoch t. Then, if k > 0,it has received processing times equal to the uptimes Yi1, . . . ,Yik of the machine,and occupied the machine for time tik = ∑k

j=1 τi j , where τi j = Yi j +Zi j is the timeoccupied by job i in the j-th round of breakdown for its processing. The processinghistory of job i at that decision epoch is Hik = σ (Yi j,Zi j;Yi j < Pi j, j = 1, . . . ,k),the σ -algebra generated by the random variables (Yi j,Zi j; j = 1, . . . ,k) together withthe events (Yi j < Pi j; j = 1, . . . ,k) for k > 0. If k = 0, we define Hi0 = ( /0,Ω) to bethe trivial σ -algebra. Hi = (Hik,k = 0,1,2, . . .) represents the filtration generatedby the processing history of job i. The remaining occupying time is denoted byOik = Oi(Hik) = Oi −∑k

j=1 τi j (obviously, Oi0 = Oi). Moreover, given the historyHik , we denote the Bayes posterior distribution of Θi as the conditional distributionof Θi given Hik:

πik(θi) = πi(θi|Hik). (8.16)

The Gittins index for job i that has been processed k times but not yet completedis computed as

Gki := Gi(Hik) = max

σ>0

wiE[e−rOik I(σ=Oik)|Hik

]

E[∫ σ

0 e−rtdt∣∣Hik

] , (8.17)

where the maximization is taken over all random variables

σ =k+ρ

∑j=k+1

[τi j I(Yi j<Pi j) +Pi jI(Yi j≥Pi j)

]

with ρ being positive random variables such that k+ρ ≤ Ti and k+ρ is Hi-stoppingtime. We will loosely refer to such a σ as a stopping time below.

Given Θi, the distribution of the future events of job processing is independentof the history due to the conditional independence of (Yi j,Zi j,Pi j) over j, and Oik isidentically distributed as Oi due to the geometric distribution of Ti by (8.2). Thus bythe iterated expectation we have

E[e−rOik I(σ=Oik)|Hik] = E[E[e−rOi I(σ=Oi)|Θi]|Hik] = Eπik [E[e−rOi I(σ=Oi)|Θi]],

where Eπik denotes the expectation with respect to Θi under the posterior distributionπik. Similarly,

E[∫ σ

0e−rtdt

∣∣∣Hik

]= Eπik

[E[∫ σ

0e−rtdt

∣∣∣Θi

]].

Therefore, (8.17) can be rewritten as

Gi(Hik) = maxσ>0

wiEπik

[E[e−rOi I(σ=Oi)|Θi

]]

Eπik

[E[∫ σ

0 e−rtdt|Θi]] . (8.18)

Page 314: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

306 8 Stochastic Scheduling with Incomplete Information

This formula implies that Gittins index depends on the historical information viathe posterior distributions πik at the time when the index is calculated, hence we canrewrite Gi(Hik) as Gi(πik). As a result, (8.18) has the following implications:1. The Gittins index at any decision epoch before the completion of job i can be

calculated as if the job is at its beginning of processing, except that the priordistribution πi = πi0 is replaced by the current posterior distribution πik.

2. The processing of job i can be defined as a bandit process with:

(a) The state as the current posterior distribution πik,(b) The instantaneous reward for selecting job i defined by

Ri(πik) = Eπik

[E[e−rPi IPi≤Yi|Θi

]], (8.19)

and(c) The next state transition time interval

∆i,k+1 = τi,k+1I(Yi,k+1 < Pi,k+1)+Pi,k+1I(Yi,k+1 ≥ Pi,k+1). (8.20)

3. According to Lemma 6.2, the maximum in (8.18) is attained at the positiverandom variable σ = ∑k+ρ

m=k+1 ∆im, where ρ = infl : G(πi,k+l) ≤ G(πik) orρ = infl : G(πi,k+l)< G(πik). Hence when calculating the index, the stoppingtime σ can be selected based on the posterior distribution as follows. Define thespace of all posterior distributions for job i by

Πi = µ(θ ) :There exists a k and a processing history Hik such that µ = πik.

Then the stopping times σ in (8.18) can be limited to the form of σ = ∑ρm=1 ∆im,

where ρ is restricted to

H = ρ : There exists a set A ⊂ Πi such that ρ = infk : πik ∈ A∧Ti. (8.21)

The stopping numbers in the above H are said to be homogenous (or Markovian)with respect to the posterior distributions. Thus the Gittins index for job i at thebeginning of the (k+ 1)th round is simplified to

Gi(πik) = wi maxρ∈H

Eπik [ fi(Θi,ρ)]Eπik [gi (Θi,ρ)]

, (8.22)

where for any stopping number ρ ,

fi(θ ,ρ) = E[e−rOi I(ρ = Ti)|Θi = θ

](8.23)

and

gi (θ ,ρ) = E[∫ ∆i1+···+∆iρ

0e−rtdt

∣∣∣Θi = θ]. (8.24)

To sum up, the above arguments lead to the following theorem.

Page 315: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.2 Optimal Restricted Dynamic Policies 307

Theorem 8.1. Any unfinished job i, associated with a current posterior distributionπiki of Θi, can be assigned an index defined by (8.22)–(8.24). A policy that selectsthe job with the highest index G=maxi

Gi(πiki)

is optimal in the class of dynamic

policies.

Since Gi(πik) in (8.22) depends on the posterior distributions πiki , we also callit posterior Gittins index to distinguish it from that for the complete informationmodel. Note that fi(θ ,ρ) and gi(θ ,ρ) can be prepared in advance as they areindependent of the historical information. Moreover, the posterior distributions πikof Θi can be calculated by standard probabilistic methods once the k realizations of(IPi>Yi,Yi,Zi) are observed. In particular, if the prior πi0 of Θi is a conjugate for thejoint conditional distribution of

(IPi>Yi,Yi,Zi

)given Θi, the posterior πik will have

the same mathematical form as πi0.Once fi(θ ,ρ) and gi(θ ,ρ) have been prepared, the process of calculating the

posterior Gittins indices G(πik) in (8.22) becomes a sequence of Bayes informationupdates. If at the beginning of the kth round of processing, the posterior distributionof Θi is πi,k−1 and the corresponding Gittins index is G(πi,k−1), then at the begin-ning of the (k+ 1)th round of processing, the posterior Gittins index G(πik) can becomputed by (8.22) with πi,k−1 replaced by πik. This is due to the following updatingmechanism (a direct consequence of (8.16)):

Prik(Θi > θ ) = Pri,k−1(Θi > θ |Yik,Zik,Pik > Yik), (8.25)

where Prik denotes the probability evaluated under the posterior distribution πik.To conclude this section, we discuss an important distinction between the cases

of complete and incomplete information in Remark 8.2 below to highlight the sig-nificance of the incomplete information model in situations with dynamic policies,and a case where we can allow new arrivals of jobs in Remark 8.3.

Remark 8.2. In the case with complete information, the value of Θi is known, say,Θi = θi. Hence the Gittins index of job i is given by

Gi = wiE[e−rPi I(Yi≥Pi)|Θi = θi

]

E[∫ Pi

0 e−rxdxI(Pi≤Yi) +∫ Yi+Zi

0 e−rxdxIPi>Yi∣∣Θi = θi

] (8.26)

(see Corollary 7.1), which only involves the distribution of (Pi,Yi,Zi), but not theirrealizations, and so will not need any revision. In contrast, the posterior Gittins indexgiven by (8.22) for the incomplete information model needs to be revised adap-tively by a maximum over all stopping times, which depend on the realizations of(Pi j,Yi j,Zi j) from the previous processing history. This shows that the involvementof incomplete information leads to a much more complex situation. For a specificexample on how the posterior Gittins index should be updated according to therealization of the history, see Example 8.1 and the subsequent remark.

Remark 8.3. If r = 0, the posterior Gittins index Gi is still defined by (8.22), butfi(θ ,ρ) and gi (θ ,ρ) reduce to

fi(θ ,ρ) = E[I(ρ = Ti)|Θi = θ

]and gi (θ ,ρ) = E

[∆i1 + · · ·+∆iρ

∣∣Θi = θ].

(8.27)

Page 316: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

308 8 Stochastic Scheduling with Incomplete Information

In this case, the optimal dynamic policy with the posterior Gittins indices aboveminimizes the total expected weighted flowtime (EWF). In fact, we can establish afurther result for this case, which allows arrivals of new jobs according to a Pois-son process. More specifically, suppose that the machine is to process n types ofjobs, where n can be finite or infinite. Type-i jobs arrive at the system accordingto a Poisson stream with rate ηi, i = 1,2, . . . ,n. These Poisson streams are mutuallyindependent and also independent of the decision process. For each job of type i,there is a holding cost at a rate of wi before its completion. Denote by h(t) the totalholding rate at time t incurred by all unfinished jobs in the system. The objective isto find a dynamic policy to minimize the expected average cost for holding the jobsover infinite time horizon:

limsupT→∞

1T

E[∫ T

0h(t)dt

]. (8.28)

This problem falls into the framework of Klimov’s problem; see, e.g., Lai and Ying(1988) and Varaiya et al. (1985). We can show that the dynamic policy with theposterior Gittins indices defined by (8.22) and (8.27), regardless of the Poissonrates ηi, is optimal to minimize (8.28) provided the following stability conditionis satisfied:

n

∑i=1

ηiE[Oi]< 1.

8.3 Posterior Gittins Indices with One-Step Reward Rates

It is commonly recognized that even though Gittins indices can be defined for aproblem, how to calculate them to derive the corresponding optimal policy remainsa great challenge and numerical methods are necessary in general; see, for example,Sect. 6.4 of Gittins (1989). We in this section identify certain realistic conditionsunder which the posterior Gittins index (8.22) can be calculated analytically by aone-step reward rate; this is known as the deteriorating case.

8.3.1 Posterior Gittins Indices by One-Step Reward Rates

Note that Eπik [ fi(Θi,1)] = Ri(πik) as defined in (8.19). When the stopping numberis deterministic with ρ = 1, we define the one-step discounted reward rate at stateπ (a distribution) by

v1i (π) :=

Ri(π)Eπ[∫ ∆i

0 e−rtdt] =

Eπ [ fi(Θi,1)]Eπ[gi(Θi,1)

] ,

Page 317: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 309

where ∆i = τiIYi<Pi+PiIYi≥Pi, fi(θ ,1) = E[e−rPi IYi≥Pi|Θi = θ

]and

gi(θ ,1) =E[∫ ∆i

0e−rtdt

∣∣∣∣Θi = θ]

=E[

IYi<Pi

∫ Yi+Zi

0e−rtdt + IYi≥Pi

∫ Pi

0e−rtdt

∣∣∣∣Θi = θ]. (8.29)

Then the following theorem is obvious.

Theorem 8.2. If v1i (πik) is nonincreasing in k, then

Gi(πik) = v1i (πik) =

Eπik [ fi(Θi,1)]Eπik [gi(Θi,1)]

. (8.30)

Note that the condition for (8.30) to be valid, namely a nonincreasing one-stepreward rate in k, can be interpreted as a longer processing times in distribution(hence lower reward rate due to heavier discounting) for a job repeated more timesdue to breakdowns, which is intuitively reasonable in practice.

The following is a simple example to illustrate how the current posterior distri-butions and the corresponding Gittins indices can be calculated.

Example 8.1. Suppose that Yi, Zi and Pi are exponentially distributed with rates αi,βi and Θi respectively, where αi > 0 and βi > 0 are known parameters, whereas Θiis unknown. Further assume that Θi is a random variable whose prior distributionπi(θ ) is exponential with a known rate ηi > 0.

Given that job i has not been completed by the kth round processing, we haveobserved Hik = Yi j,Zi j;Pi j > Yi j, j = 1, . . . ,k. Since Pr(Pi j > Yi j|Yi j) = e−ΘiYi j ,the likelihood of Hik conditional on Θi has the form

f (Hik|Θi) =k

∏j=1

[αie−αiYi j βie−βiZi j e−ΘiYi j

].

Hence, at the end of the kth machine breakdown for processing job i, the posteriordensity of Θi is

π(θ |Hik) ∝ f (Hik|Θi = θ )πi(θ ) ∝ ηie−ηiθk

∏j=1

e−θYi j ∝ e−θ(

ηi+∑kj=1 Yi j

)

.

This shows that the posterior distribution of Θi given the available information Hikis also exponential, but with a new rate ηi +Yi1 + · · ·+Yik. That is,

πik(θ ) = π(θ |Hik) = (ηi +Yi1 + · · ·+Yik)e−θ(

ηi+∑kj=1 Yi j

)

.

Therefore, the posterior distribution at every breakdown is adaptively updated byadding the last observed uptime to its exponential rate, provided that the job has notbeen completed.

Page 318: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

310 8 Stochastic Scheduling with Incomplete Information

Furthermore, by some calculus computations, we have

fi(θ ,1) = E[e−rPiIPi≤Yi|Θi = θ

]= E

[e−(r+αi)Pi |Θi = θ

]=

θi

r+αi +θi

and

gi(θ ,1) = E[

IYi<Pi

∫ Yi+Zi

0e−rtdt + IYi≥Pi

∫ Pi

0e−rtdt

∣∣∣∣Θi = θ]

=1r

E[1− IYi<Pie−r(Yi+Zi)− IYi≥Pie−rPi

∣∣Θi = θ]

=1r

E[

1− βi

r+βiIYi<Pie−rYi − IYi≥Pie−rPi

∣∣∣Θi = θ]

=r+αi +βi

(r+αi)(r+βi)E[(

1− e−(r+αi)Pi)∣∣Θi = θ

]

=r+αi +βi

(r+βi)(r+αi +θi).

Consequently, a variable transform ξ = θ(

ηi +∑kj=1 Yi j

)yields

Eπik [ fi(Θi,1)] =

(ηi +

k

∑j=1

Yi j

)∫ ∞

0

θe−θ(

ηi+∑kj=1 Yi j

)

(r+αi)+θ dθ

=∫ ∞

0

ξ e−ξ(

ηi +∑kj=1 Yi j

)(r+αi)+ ξ

and

Eπik [gi(Θi,1)] =

(ηi +

k

∑j=1

Yi j

)∫ ∞

0

r+αi +βi

(r+βi)(r+αi +θ )e−(

ηi+∑kj=1 Yi j

)θ dθ

=r+αi +βi

(r+βi)

∫ ∞

0

e−ξ

r+αi + ξ(

ηi +∑kj=1Yi j

)−1 dξ .

These show that Eπik [ fi(Θi,1)] is decreasing in k, whereas Eπik [gi(Θi,1)] is increas-ing in k. As a result, the one-step discounted reward rate

v1i (πik) = Eπik [ fi(Θi,1)]/Eπik [gi(Θi,1)]

is decreasing in k. It follows that, by Theorem 8.2, the Gittins index at πik can beexplicitly written as

G(πik) = v1i (πik) =

r+βi

r+αi +βi

∫ ∞

0

ξ e−ξ dξ(r+αi) (ηi +Yi1 + · · ·+Yik)+ ξ

∫ ∞

0

(ηi +Yi1 + · · ·+Yik)e−ξ dξ(r+αi)(ηi +Yi1 + · · ·+Yik)+ ξ

. (8.31)

Page 319: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 311

Remark 8.4. The above example also demonstrates the difference in Gittins indicesbetween complete and incomplete information models. In the complete informationmodel (Θi = θi is known), by (8.26), the Gittins index for job i is

Gi = G(θi) =θi

r+αi +θi

(r+βi)(r+αi +θi)

r+αi +βi=

(r+βi)θi

r+αi +βi, (8.32)

which does not involve any realizations of random variables, hence will not need anyadjustment during the process. In the incomplete information model, on the otherhand, the Gittins indices must be updated progressively by Eq. (8.31) according tothe realizations of Yi j in the previous rounds.

8.3.2 Incompletion Information for Processing Times

This subsection treats a special case where only Pik depend on Θi, while Yik and Zikare independent of Θi, so as to identify certain conditions under which the poste-rior Gittins indices can be computed by the one-step reward rates. As we are nowonly concerned with the Gittins index, the job identifier i is dropped to simplify thenotation. Thus Pk represents the processing time required to complete the job in thekth round, and (Yk,Zk) are the kth up/downtimes for the job, k = 1,2, . . . ,∞. Thesequence of triplets (Pk,Yk,Zk) are conditionally i.i.d. as (P,Y,Z) given Θ . The cdfof P given Θ is denoted by F(x|θ ) = Pr(P ≤ x|Θ = θ ) and S(x|θ ) = 1−F(x|θ ) isthe decumulative distribution. Further write

f (θ ) = f (θ ,1) = E[e−rPI(P ≤ Y )|Θ = θ

]and

g(θ ) =g(θ ,1) = E[∫ ∆1

0e−rtdt|Θ = θ

]. (8.33)

To calculate the posterior Gittins index, we proceed by the following steps.

1. One-Step Posterior Distributions

Given that the machine has processed the job for time Y and Y < P (having a break-down with the job unfinished), the updated (posterior) decumulative distributionfunction of Θ , denoted by Sπ(θ ) = Sπ(θ |Y ), is given by

Sπ(θ ) = Pr(Θ > θ |Y,Y < P) =Pr(Θ > θ ,Y < P|Y )

Pr(Y < P|Y )

=E [Pr(Θ > θ ,Y < P|Y,Θ)]

E [Pr(Y < P|Y,Θ)]=

E [I(Θ > θ )S(Y |Θ)|Y ]E [S(Y |Θ)|Y ]

=

∫ ∞θ S(Y |ξ )dFπ(ξ )∫ ∞−∞ S(Y |ξ )dFπ(ξ )

. (8.34)

Page 320: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

312 8 Stochastic Scheduling with Incomplete Information

2. Computing f (θ ) and g(θ )

Define

G(x) = 1− e−rxSY (x) and H(x) = 1−E[e−rτ ]−∫

(x,∞)(e−rx −D(y)e−ry)dFY (y),

where τ = Y + Z and D(Y ) = E[e−rZ|Y

]. Then we have the following lemma,

providing formulae for f (θ ) and g(θ ) defined in (8.33), from which we can workout a condition for the posterior Gittins index to be expressed as a one-step rewardrate (see Theorem 8.3 below).

Lemma 8.1.

f (θ ) =∫

(0,∞)F(x|θ )dG(x) = 1−

(0,∞)S(x|θ )dG(x) (8.35)

and

g(θ ) = 1r

(0,∞)S(x|θ )dH(t). (8.36)

Proof. By the definition of f (θ ),

f (θ ) = E[e−rPIP≤Y|Θ = θ ] = E[e−rPSY (P−)|Θ = θ ]

=∫ ∞

0e−rxSY (x−)dF(x|θ ). (8.37)

Since e−rxSY (x) is deceasing, right-continuous in x, and converges to zero as x → ∞,we have e−rxSY (x−) =

∫[x,∞) dG(t). Substitute this into (8.37) and interchange the

orders of integrations with respect to t and x, we get (8.35).Next, recall ∆ = PIP≤Y+(Y +Z)IP>Y. Similar to f (θ ) we can calculate

g(θ ) = E[∫ ∆

0e−rtdt

∣∣∣Θ = θ]

= E[

IP≤Y

∫ P

0e−rtdt + IP>Y

∫ Y+Z

0e−rtdt

∣∣∣Θ = θ]

= E[∫ τ

0e−rtdt

]−E

[I(Y ≥ P)

∫ Y+Z

Pe−rtdt

∣∣∣Θ = θ]

= E[∫ τ

0e−rtdt

]−E[A|Θ = θ ], (8.38)

where

A = IY≥P

∫ Y+Z

Pe−rtdt =

r−1IY≥P(e−rP − e−r(Y+Z)), r > 0,IY≥P(Y +Z−P), r = 0.

Page 321: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 313

Consider the case r > 0 (the arguments for r = 0 are similar). Then 0 <D(Y )< 1since D(Y ) = E[e−rZ|Y ]. Hence

E[A|Θ ] =1r

E[IY≥P(e

−rP − e−r(Y+Z))|Θ]=

1r

E[IY≥P

(e−rP −D(Y )e−rY )|Θ

].

By iterated expectation,

E[A|Θ ] =1r

E[E[IY≥P(e

−rP −D(Y )e−rY )|P,Θ]]

=1r

E[∫

[P,∞)(e−rP −D(y)e−ry)dFY (y)

∣∣∣Θ]

Write H(x) =∫[x,∞)(e

−rx −D(y)e−ry)dFY (y), which is decreasing in x (because theintegrand is decreasing in y and positive for y > x), left continuous, and with

H(0) =∫

[0,∞)(1−D(y)e−ry)dFY (y) = 1−

[0,∞)D(y)e−rydFY (y) = 1−E[e−rτ ].

Since H(x) = H(0)− H(x+) is right-continuous,

E[A|Θ = θ ] = 1r

∫ ∞

0H(x)dF(x|θ ) = 1

r

∫ ∞

0

[x,∞)dH(t)dF(x|θ ).

Interchanging the integrations leads to

E[A|Θ = θ ] = 1r

∫ ∞

0

[0,t])dF(x|θ )dH(t) =

1r

∫ ∞

0F(t|θ )dH(t)

=1r

[1−E[e−rτ ]−

∫ ∞

0S(x|θ )dH(t)

]. (8.39)

Inserting (8.39) into (8.38) yields (8.36). Thus the lemma is proved.

3. Weak Order of a Processing Time

Let S denote the support of Θ . Naturally, the class F(y|θ ),θ ∈ S of distribu-tions is stochastically ordered if F(y|θ ) are ordered in θ in the usual stochasticorder. Since Θ is now modelled as a random variable with distribution π , we defineanother order in F(y|θ ),θ ∈ S, which is weaker than the usual stochastic orderand comprises a key condition for the Gittins index to be computed with one-stepreward rate, see Theorem 8.3 later.

Definition 8.1. Let F(y|θ ),θ ∈ S be a class of distributions of the processingtime P identified by θ , and Θ be a random variable following a distribution π withsupport S. Then F(y|θ ),θ ∈S is said to be “weakly nondecreasing with respectto π”, if for all θ ∈S,

Page 322: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

314 8 Stochastic Scheduling with Incomplete Information

Pr(Pk+1 > y|Θ > θ ;Pj > Yj,Yj, j = 1, . . . ,k)

> Pr(Pk+1 > y|Θ = θ ;Pj > Yj,Yj, j = 1, . . . ,k), (8.40)

In this case we also say that P is weakly nondecreasing (with respect to π).

The left-hand side of (8.40) represents the conditional probability of Pk+1 > ygiven the values of Yj, j = 1, . . . ,k and the event (Θ > θ ;Pj > Yj, j = 1, . . . ,k), andthe right-hand side can be interpreted similarly.

This definition states that whatever are the realizations of (Yj,Pj), j = 1, . . . ,k,we have

Pr(Pk+1 > y|Θ > θ )> Pr(Pk+1 > y|Θ = θ ) (8.41)

under the current posterior distribution given Yj and Pj >Yj, j = 1, . . . ,k. In lightof the conditional independence of (Pj,Yj) over j (given Θ ), it is not difficult to seethat, given Y1, . . . ,Yk and Yj < Pj, j = 1, . . . ,k, (8.40) is equivalent to

∫ ∞θ S(y|ξ )∏k

i=1 S(Yj|ξ )dFπ(ξ )∫ ∞θ ∏k

i=1 S(Yj|ξ )dFπ(ξ )> S(y|θ ) (8.42)

a.s. for all values of θ and Yj, j = 1, . . . ,k. An obvious consequence is as follows.

Proposition 8.4. (8.42) holds if S(y|θ ) is nondecreasing in θ .

Note that ordering S(y|θ ),θ ∈ S in θ is equivalent to the stochastic order inθ , hence the order defined by (8.40) is weaker than the stochastic order between thecompetitive distributions. To establish the main theorem, we need two more lemmas.

Lemma 8.2. The function

Φ(θ ) =∫ ∞

θ S(y|ξ )∏kj=1 S(Yj|ξ )dFπ(ξ )

∫ ∞θ ∏k

j=1 S(Yj|ξ )dFπ(ξ )(8.43)

is nondecreasing in θ if P is weakly nondecreasing and S(y|θ ) is left-continuousin θ .

Proof. It suffices to prove the lemma for k = 0, since for k ≥ 1 we can associatethe product ∏k

i=1 S(Yi|ξ ) with dFπ(ξ ) to generate a new measure πk for Θ such thatdFπk(ξ ) = ∏k

i=1 S(Yi|ξ )dFπ(ξ ).We prove the lemma for k = 0 by contradiction. If the lemma is false, then there

exist two numbers θ1 < θ2 such that Φ(θ1) > Φ(θ2). For nonnegative numbersa,b,c,d,

ab<

a+ cb+ d

⇐⇒ a+ cb+ d

<cd⇐⇒ a

b<

cd. (8.44)

Note the fact that Φ(θ1)>Φ(θ2) implies∫ θ2

θ1dFπ(ξ )> 0. There exists a u∈ (θ1,θ2]

such that S(y|u)∫ θ2

θ1dFπ(ξ )≥

∫ θ2θ1

S(y|ξ )dFπ(ξ ). Let

Page 323: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 315

ξ1 = sup

u ∈ (θ1,θ2] : S(y|u)≥

∫ θ2θ1

S(y|ξ )dFπ(ξ )∫ θ2

θ1dFπ(ξ )

.

Then ξ1 > θ1 and since S(y|θ ) is left-continuous in θ ,

S(y|ξ1)≥∫ θ2

θ1S(y|ξ )dFπ(ξ )∫ θ2

θ1dFπ(ξ )

> Φ(θ1)> Φ(θ2).

Moreover, the condition S(y|θ2)≤Φ(θ2) (see (8.42)) implies ξ1 < θ2. Hence by thecondition of the lemma, Φ(ξ1) ≥ S(y|ξ1) > Φ(θ1) > Φ(θ2). It then follows from(8.44) that

∫ ∞ξ1

S(y|ξ )dFπ(ξ )∫ ∞

ξ1dFπ(ξ )

>

∫ ∞θ1

S(y|ξ )dFπ(ξ )∫ ∞θ1

dFπ(ξ )>

∫ ξ1θ1

S(y|ξ )dFπ(ξ )∫ ξ1

θ1dFπ(ξ )

(8.45)

and∫ θ2

ξ1S(y|ξ )dFπ(ξ )∫ θ2

ξ1dFπ(ξ )

>

∫ ∞ξ1

S(y|ξ )dFπ(ξ )∫ ∞

ξ1dFπ(ξ )

>

∫ ∞θ2

S(y|ξ )dFπ(ξ )∫ ∞θ2

dFπ(ξ ). (8.46)

Merging (8.45) and (8.46) by linking the mutual term∫ ∞

ξ1S(y|ξ )dFπ(ξ )/

∫ ∞ξ1

dFπ(ξ ),we get

∫ θ2ξ1

S(y|ξ )dFπ(ξ )∫ θ2

ξ1dFπ(ξ )

>

∫ ∞ξ1

S(y|ξ )dFπ(ξ )∫ ∞

ξ1dFπ(ξ )

>

∫ ∞θ1

S(y|ξ )dFπ(ξ )∫ ∞θ1

dFπ(ξ )>

∫ ξ1θ1

S(y|ξ )dFπ(ξ )∫ ξ1

θ1dFπ(ξ )

.

A similar arguments yields

∫ θ2ξ1

S(y|ξ )dFπ(ξ )∫ θ2

ξ1dFπ(ξ )

>

∫ θ2θ1

S(y|ξ )dFπ(ξ )∫ θ2

θ1dFπ(ξ )

. (8.47)

On the other hand, for u ∈ (ξ1,θ2), S(y|u)∫ θ2

θ1dFπ(ξ ) <

∫ θ2θ1

S(y|ξ )dFπ(ξ ).Therefore,

∫ θ2ξ1

S(y|ξ )dFπ(ξ )∫ θ2

ξ1dFπ(ξ )

<

∫ θ2θ1

S(y|ξ )dFπ(ξ )∫ θ2

θ1dFπ(ξ )

, (8.48)

which contradicts (8.47).

The next lemma states that the one-step posterior distribution is stochasticallydominant over the prior distribution π with probability 1, if P is weakly nonde-creasing with respect to π .

Page 324: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

316 8 Stochastic Scheduling with Incomplete Information

Lemma 8.3. If S(x|θ ) is left-continuous in θ on S and P is weakly nondecreasingwith respect to π , then

1. Sπ(θ ) = Sπ(θ |Y )≥ Sπ(θ ), and2. P is weakly nondecreasing with respect to π.

Proof. Since Φ(θ ) is nondecreasing in θ , it follows that

Φ(θ ) ≥ Φ(−∞) =

∫ ∞−∞ S(y|ξ )dFπ(ξ )∫ ∞

−∞ dFπ(ξ )=∫ ∞

−∞S(y|ξ )dFπ(ξ ).

Hence by (8.34),

Sπ(θ ) =∫ ∞

θdFπ(ξ )≤

∫ ∞θ S(y|ξ )dFπ(ξ )∫ ∞−∞ S(y|ξ )dFπ(ξ )

≤ Sπ(θ ).

This proves part (1) of the lemma. Part (2) is a straightforward consequence of thedefinition of weak order.

4. Gittins Index via One-Step Reward Rate

Based on the above lemmas, we can prove the main theorem below on the conditionsfor the posterior Gittins index to be calculated by the one-step reward rate.

Theorem 8.3. Suppose that

• f (θ ) is nonincreasing in θ and g(θ ) is nondecreasing in θ ;• S(x|θ ) is left-continuous in θ on S; and• P is weakly nondecreasing.

Then G(π) = Eπ [ f (θ )]/Eπ [g(θ )].

Proof. With the initial state π0 = π , denote the states entering the system con-secutively by π1,π2, . . . , where πk satisfies Sπk+1(θ ) = Sπk(θ |Yk+1), k ≥ 0, if themachine has broken down k+ 1 times with uptimes Y1,Y2, . . . ,Yk+1. It follows fromLemma 8.3 that Sπk+1(θ ) ≥ Sπk(θ ) for all θ , k, and yi, i = 1,2, . . . ,k. Thus themonotonicity of f (θ ) and g(θ ) implies the one-step reward rates to decrease ink. The theorem then follows from Theorem 8.2.

When the cdf F(x|θ ) is monotone in θ , an immediate consequence is the follow-ing corollary.

Corollary 8.2. If the conditional decumulative distribution S(x|θ ) of the processingtime is left-continuous and nondecreasing in θ , then G(π) = Eπ [ f (θ )]/Eπ [g(θ )].

Page 325: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 317

Proof. By (8.35) and (8.36), it is clear that f (θ ) is nonincreasing in θ and g(θ )is nondecreasing. Moreover,

∫ ∞θ S(y|ξ )dFπ(ξ ) ≥ S(y|θ )

∫ ∞θ dFπ(ξ ) since S(y|θ ) is

nondecreasing in θ . Namely, P is weakly nondecreasing. Thus the conditions inTheorem 8.3 are satisfied.

Remark 8.5. If θ represents the expected value of the distribution, then S(x|θ ) iscontinuous and nondecreasing in θ in many distribution families considered in theliterature for nonnegative random variables. Examples include

• Exponential distribution;• Gamma distribution with a common shape parameter;• Weibull distribution with a common shape parameter;• Pareto distribution with a common shape parameter;• Lognormal distribution with a common variance parameter.

Hence Corollary 8.2 applies to these distributions. The conditions required byTheorem 8.3 are weaker. It would be interesting to see a nontrivial example in whichTheorem 8.3 applies, but not Corollary 8.2, which is given below.

Example 8.2. Let Y = y1 with probability 1 and Θ takes on three possible valuesθ1 < θ2 < θ3 with Pr(Θ = θi) = pi, i = 1,2,3. The probabilities p1, p2, p3 satisfyp1 + p2 + p3 = 1 and

p3

p2>

1ey2(1/θ1−1/θ3)− 1

. (8.49)

Further define

S(x|θi) = e−x/θi , i = 1,3, and S(x|θ2) = Ix<y2e−x/θ2 , (8.50)

where y2 > y1. Then we can show that Theorem 8.3 applies but not Corollary 8.2 asfollows.Corollary 8.2 does not apply: Although S(x|θ1) ≤ S(x|θ3) and S(x|θ2) ≤ S(x|θ3),S(x|θ1) and S(x|θ2) cannot be ordered by θ1 and θ2, hence S(x|θ ) cannot be orderedby the value of θ .

Theorem 8.3 applies: This is done in the following two steps.1. f (θ ) is nonincreasing in θ and g(θ ) nondecreasing in θ . By Lemma 8.1,

G(x) = 1− e−rxI(0,y1)(x) and H(x) = 1−E[e−rτ]− I(0,y1)(x)(e

−rx −De−ry1),

where D = E[e−rZ]. It follows that

f (θ ) = 1−∫ ∞

0S(x|θ )dG(x) = 1− r

∫ y1

0S(x|θ )e−rxdx− e−ry1S(y1|θ )

and

g(θ ) = 1r

∫ ∞

0S(x|θ )dH(t) =

∫ y1

0S(x|θ )e−rxdx+

1r(1−D)e−ry1S(y1|θ ).

Page 326: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

318 8 Stochastic Scheduling with Incomplete Information

Since y2 > y1, S(y1|θi) = e−y1/θi for i = 1,2,3, we further obtain

f (θi) = 1−[

r∫ y1

0e−x/θi−rxdx+ e−y1/θi−ry1

]

and

g(θi) =∫ y1

0e−x/θi−rxdx+

1−Dr

e−y1/θi−ry1 .

Thus f (θ1)> f (θ2)> f (θ3) and g(θ1)< g(θ2)< g(θ3).

2. P is weakly nondecreasing with respect to π : By (8.42), it suffices to show that∫ ∞

θiS(y|ξ )∏n

i=1 S(Yi|ξ )dFπ(ξ )∫ ∞θi

∏ni=1 S(Yi|ξ )dFπ(ξ )

> S(y|θi), i = 1,2. (8.51)

Since Y degenerates at y1 and y1 < y2, by the definitions of S(y|θi) (see (8.50))and the prior distribution π , (8.51) reduces to

∫ ∞θi

S(y|ξ )e−ny1/ξ dFπ(ξ )∫ ∞

θie−ny1/ξ dFπ(ξ )

> S(y|θi), i = 1,2. (8.52)

For i = 1, since S(y|θi)≥ S(y|θ1) (see (8.50)), if y < y2, then

∫ ∞θ1

S(y|ξ )e−ny1/ξ dFπ(ξ )∫ ∞

θ1e−ny1/ξ dFπ(ξ )

>

∫ ∞θ1

S(y|θ1)e−ny1/ξ dFπ(ξ )∫ ∞

θ1e−ny1/ξ dFπ(ξ )

= S(y|θ1).

If y ≥ y2,

∫ ∞θ1

S(y|ξ )e−ny1/ξ dFπ(ξ )∫ ∞

θ1e−ny1/ξ dFπ(ξ )

=p2S(y|θ2)e−ny1/θ2 + p3S(y|θ3)e−ny1/θ3

p2e−ny1/θ2 + p3e−ny1/θ3

=p3e−y/θ3e−ny1/θ3

p2e−ny1/θ2 + p3e−ny1/θ3,

which is a decreasing function of p2. Hence under condition (8.49),

p3e−y/θ3e−ny1/θ3

p2e−ny1/θ2 + p3e−ny1/θ3>

p3e−y/θ3e−ny1/θ3

p3[ey2(1/θ1−1/θ3)− 1

]e−ny1/θ2 + p3e−ny1/θ3

=e−y/θ3

[ey2(1/θ1−1/θ3)− 1

]e−ny1(1/θ2−1/θ3) + 1

>e−y/θ3

ey2(1/θ1−1/θ3)− 1+ 1= e−y/θ1e(y−y2)(1/θ1−1/θ3)

> e−y/θ1 = S(y|θ1).

Page 327: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

8.3 Posterior Gittins Indices with One-Step Reward Rates 319

This shows that (8.52) holds for i = 1. For i = 2, it is obvious that∫ ∞

θ2S(y|ξ )e−ny1/ξ dFπ(ξ )∫ ∞

θ2e−ny1/ξ dFπ(ξ )

= S(y|θ3)> S(y|θ2).

Thus (8.52) holds for i = 2 as well.

Page 328: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 9Optimal Policies in Time-Varying Scheduling

The mainstream of the scheduling theory broadly adopts the assumption that theprocessing time of a job is invariant in the sense that it is independent of thestart time or the processing sequence. In practice, however, there are many situa-tions where processing time of a job may be a function of time when it starts tobe served or the position it is served. Scheduling in such situations is generallyreferred to as time-dependent scheduling or time-varying scheduling. While the as-sumption of invariant processing times may reflect (or approximate) the real lifein certain situations, it is hardly justifiable on a more general ground, and is of-ten an over-simplified picture of the reality so as to take the advantage of com-putational convenience. Many practical instances have been reported that severelyviolate the time-invariant assumption, in which any delay of processing will havesignificant impact (increase or decrease) on the overall efforts (time, cost, etc.) toaccomplish the task. Significant progress regarding the time-varying scheduling hasbeen made in the past decades.

There are typically two types of scheduling models to deal with time-varying sit-uations: one is referred to as deteriorating processing times and the other as learningeffects. Deterioration models treat the scenarios in which job processing times arenondecreasing in their start times (to model the situation that waiting would in-crease job processing time). They have found applications in fire fighting, financialmanagement, food processing, maintenance, resource allocation, military objectivesearching, national defense, and computer science, see the two survey papers byCheng et al. (2004) and Alidaee and Womer (1999) and the references therein. Thestudy on learning effects dates back to Biskup (1999), who considered the schedul-ing problems with position-dependent processing times Pir = raPi, i = 1,2, . . . ,n,where a ≤ 0 denotes learning effect. Since then, scheduling problems with vari-ous types of learning effects have attracted growing interests, which are generallyrepresented by time/position dependent processing times; See, for example, Kuoand Yang (2006), Koulamas and Kyparisis (2007), Yin et al. (2009), and Wu et al.(2011). More details on recent research can be found in Mosheiov (2001), Wangand Cheng (2007), Biskup (2008), Wu and Lee (2009), Wang et al. (2010), Yinet al. (2011), Lee (2011), Anzanello and Fogliatto (2011), and so on.

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 9,© Springer Science+Business Media New York 2014

321

Page 329: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

322 9 Optimal Policies in Time-Varying Scheduling

A significant feature of machine breakdown is its companion repairing time (i.e.downtime), which has a crucial impact on the processing of a job as well as theinformation accumulation process. It has been shown that this impact usually mag-nifies the difficulty of the scheduling problem significantly. It will be much moreserious under deteriorating job processing, where any delay of processing due tobreakdowns could further increase the processing time/cost. This “double” impactmakes the job scheduling much more difficult to tackle when the processing timesare time-varying; the worst case is an unprocessible job in the sense that its com-pletion time may be infinite with a positive probability. Under such circumstances,the properties of job processing times and the operation system may change signifi-cantly, and the complexity of seeking optimal policies may increase dramatically.

This chapter presents an exposition for the two types of time-varying schedulingproblems as briefed above. Section 9.1 deals with deteriorating processing times forscheduling a set of n jobs on a single machine subject to stochastic breakdowns.We focus on linear deterioration and no-loss (preemptive-resume) machine break-downs. Specifically, we formulate the mechanism of linear deterioration to allowmachine breakdowns in Sect. 9.1.1, discuss the conditions for a job to be proces-sible under deterioration and machine breakdowns in Sect. 9.1.2, derive the proba-bilistic features of the model with exponentially distributed uptimes and downtimesvia Laplace transforms and differential equations in Sect. 9.1.3, and find optimalpolicies for minimizing the expected makespan in Sect. 9.1.4. In addition, analyti-cal expression of the variance of makespan and its solutions and complexity are alsotreated in Sect. 9.1.4. Learning effect models are discussed in Sect. 9.2. In Sect. 9.2.1we consider optimal scheduling with learning effects but no machine breakdowns.The results are then extended to models with machine breakdowns in Sect. 9.2.2.The main results of this chapter are mainly based on Cai et al. (2011) and Zhanget al. (2013).

9.1 Stochastic Scheduling with Deteriorating Processing Times

9.1.1 Model Formulation

Suppose that a set of n jobs are to be processed on a single machine, which are allavailable at time zero. The machine can process at most one job at a time. We areconcerned only with the static policies λ = i1, i2, . . . , in to decide the order of thejobs to be processed.

First we look at the case without job deterioration and machine breakdowns.In such a standard case, each job i is associated with an initial processing time Xi,i = 1,2, . . . ,n, which are assumed to be independent of one another.

In the following two subsections, job index i is suppressed for ease of notationsince we only work with the features of an individual job. When the problem issubject to job deterioration and machine breakdowns, the ‘true’ processing timemay differ from X . The formulation of deterioration and breakdowns are elaboratedbelow.

Page 330: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 323

• Machine breakdowns: We consider a no-loss model, see Chap. 4 for details. Inaddition, the representative uptime Y and downtime Z are independent of eachother with cdfs G(x) and H(x) respectively. A particular case is that Y and Zfollow exponential distributions with rates µY and µZ respectively.

• Job deterioration: We consider only the linear deterioration model as follows.

– We begin with the notion of remainder processing requirement at time t. Let Xrepresent the processing time required to complete a job under the “standard”conditions (no deterioration and no machine breakdowns), referred to as theinitial requirement. For a deteriorative job, the remainder processing require-ment at time t, denoted by X(t), is a stochastic process with X(0) = X .

– Under the linear deterioration assumption, X(t +∆ t) = X(t)+α∆ t if the jobis idle (not receiving any processing efforts) during the time interval (t, t+∆ t],where 0 ≤ α < 1; and X(t +∆ t) = X(t)− (1−α)∆ t if it is processed duringthe time interval (t, t +∆ t]. Here α∆ t indicates the increment of processingrequirement due to the deterioration and ∆ t stands for the reduction of theprocessing requirement thanks to the job processing. The processing on thejob at any time is a fight against the deterioration with the effect of reducingthe processing requirement at rate (1 −α). This model is referred to as arestless deterioration.

– If the machine breaks down at time s with remainder processing requirementX(s) and then experiences a downtime Z, the job will be reprocessed again attime s+Z with the new processing requirement X(s)+αZ. Figure 9.1 belowshows a typical sample path of the requirement process X(s) for a job fromtime zero to its completion. The sample path is continuous and piecewiselinear.

– Define N(t) to be the frequency of breakdowns by time t, i.e.,

N(t) = max

m :

m

∑k=0

τk ≤ t

,

where τk = Yk +Zk is the k-th processing duration of the job with τ0 = 0 forconvenience. Then the processing requirement X(t) can be expressed as

X(t) = X +αt −N(t)

∑k=0

Yk − (t −TN(t))∧YN(t)+1, (9.1)

where Tm = ∑mk=0 τk, X is the initial processing requirement of the job, and

∧ is the minimum operator defined by a∧ b = min(a,b). X(t) appears as anintractable stochastic process since N(t) is not independent of Yk.

– Denote by O(x) the occupying time of the job with initial requirement x,if it starts being processed at time zero. O(x) is the smallest solution ofthe equation X(O(x)) = 0. Moreover, if the job starts at some time s > 0,the occupying time, denoted by O(x,s) at this point, can be calculated byreplacing x with x+αs as O(x,s) = O(x+αs), which implies that to get the

Page 331: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

324 9 Optimal Policies in Time-Varying Scheduling

14

12

10

8

6

4

2

00 10 20 30 40 50 60 70O (x )

Fig. 9.1 A typical realization of processing requirement

occupying time for a different start time, we only need to calculate O(x) forall x > 0.

• Objective function: Let Ci(π) denote the completion time of job i and C( j)(π)the completion time of the j-th processed job under any policy π . The objective isto find optimal policies that minimize the following expectation (EM) or variance(VM) of makespan :

EM(π) = E [Cmax] , VM(π) = Var [Cmax] , (9.2)

where Cmax =C(n) is the makespan of the n jobs, i.e., the completion time of thejob last processed.

Remark 9.1. We here illustrate the connection between this restless deteriorationand the traditional deterioration. If the job starts being processed at time s and thenis processed continuously till its completion (i.e., no machine breakdowns), then therequirement process of the job is

X(t) =

X +αt t ≤ sX + s− (1−α)t t > s

. (9.3)

Clearly, the processing on a job is terminated at the first time when its processingrequirement reaches zero. The instant T (s) at the completion of the job is given by

T (s) = inft : X(t) = 0.

Thus X(T (s)) = 0 and so T (s) = (X + s)/(1−α) by (9.3). Consequently, at times, when the job is selected to be processed, the real processing time (withoutpreemption) is

Page 332: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 325

T (s)− s =X + s1−α − s =

X1−α +

α1−α s = T (0)+

α1−α s. (9.4)

Here and throughout this chapter, we assume 0 < α < 1. Equation (9.4) shows thatour restless deterioration model coincides with the traditional assumptions, withT (0) and α/(1−α) in place of the initial processing time and the deterioration ratein the traditional linear deterioration model.

9.1.2 Processibility

Due to the joint impact of deterioration and breakdowns, a job may be unproces-sible in the sense that its processing requirement is strictly positive at every timeinstant, so that the job will never be completed. Such a phenomenon is possible inthe situations where, for example, the uptimes are too short and/or the downtimes aretoo long, so that the deterioration outpaces the accumulation of processing achieve-ment. As a result, unlike in classical models, the occupying time O(x,s) =O(x+αs)may be an extended random variable that is infinite with a positive probability. Weformally define the processibility as follows.

Definition 9.1. A job is said to be “processible” at time s for an initial processingrequirement x if Pr(O(x,s) < ∞) = 1, and the processing is said to be “regular” ifPr(O(x)< ∞) = 1 for all initial values x.

It is clear that for the job to be processible, certain conditions should be satisfiedby the initial processing requirement x, the deterioration rate and the breakdownprocess. We first deal with the simplest case where the job is processed at time zerowith initial requirement x, and then extend the results to allow the job to be selectedto process at an arbitrary time point. Let ψ(x) = Pr(O(x) < ∞) for a deterministicinitial processing requirement x. For a random initial requirement X , it is evidentthat the probability ψX = Pr(O(X)< ∞) can be computed by ψX = E[ψ(X)]. There-fore, we now deal with ψ(x). For the time being we rearrange the up/down timesas (Z1,Y1) ,(Z2,Y2) , . . . ,(Zk,Yk) , . . .. That is, the job experiences a downtime be-fore the first uptime. This process will be referred to as a downtime-first processand the original one an uptime-first process. The downtime-first process is madefrom the uptime-first process by taking away the first uptime of the machine, forwhich the processing requirement process is shown in Fig. 9.2 below.

Again let τk = Zk +Yk. In the downtime-first case, the requirement process is

X(t) = x+αt −N(t)

∑k=0

Yk −max(t −TN(t)−ZN(t)+1,0), (9.5)

provided that the initial requirement is still x, where Tm = ∑mk=0 τk. In particular,

X(Tm) = x+αTm −m

∑k=0

Yk = x+m

∑k=0

[αZk − (1−α)Yk] . (9.6)

Page 333: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

326 9 Optimal Policies in Time-Varying Scheduling

14

12

10

8

6

4

2

00 10 20 30 40 50 60O (x )

Fig. 9.2 A typical realization of a downtime-first processing requirement

The occupying time for a downtime-first process, denoted by O1(x), is also definedas such that X(O1(x)) = 0. Then O(x) can be rewritten in terms of O1(x) as

O(x) =x

1−α ∧Y1 + I(

Y1 <x

1−α

)O1(x− (1−α)Y1).

An immediate result is

O(x) = ∞ if and only if Y1 <x

1−α and O1(x− (1−α)Y1) = ∞,

which results in the relation

Pr(O(x) = ∞) = E[

I(

Y1 <x

1−α

)Pr(O1(x− (1−α)Y1) = ∞|Y1)

]. (9.7)

So we see that the conclusion on the processibility regarding O(x) can be implied bythe corresponding conclusion regarding O1(x), and the latter is more tractable math-ematically. We now turn to the calculation of the probability Pr(O1(x)< ∞) underthe downtime-first process. Write φ(x) = Pr(O1(x) = ∞). The following lemma de-scribes the renewal equation satisfied by φ(x).

Lemma 9.1. φ(x) satisfies the renewal equation

φ(x) =∫ ∞

x

∫ s

0φ(s− t)dG(t)dH(s), (9.8)

where G(t) = G(t/(1−α)) and H(s) = H((s− x)/α).

Page 334: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 327

Proof. First noticing that O1(x) = ∞ if and only if X(Tn)> 0 for all n = 1,2, . . . , wesee that

φ(x) = Pr(X(Tm)> 0,m = 1,2, . . .) = Pr(X(T1)> 0,X(Tm)> 0, m = 2,3, . . .).

Using the law of iterated expectations, it follows that

φ(x) = E[E[I(X(T1)> 0,X(Tm)> 0, m = 2,3, . . .)|X(T1)]]

= E[I(X(T1)> 0)E[I(X(Tm)> 0, m = 2,3, . . .)|X(T1)]]

= E[I(X(T1)> 0)φ(X(T1))],

where the last equality follows from the relation

E[I(X(Tm)> 0, m ≥ 2)|X(T1)] = Pr(I(X(Tm)> 0, m ≥ 2)|X(T1)) = φ(X(T1)).(9.9)

Using (9.6), we further get

φ(x) = E[I(x+αZ1 − (1−α)Y1 > 0)φ(x+αZ1 − (1−α)Y1)]

=∫ ∞

0

∫ (u+αz)/(1−α)

0φ(x+αz− (1−α)y)dG(y)dH(z). (9.10)

Then (9.8) follows by transforming variables from (y,z) to (s, t) by s = x+αz andt = (1−α)y in the integral in (9.10). The proof is thus complete.

From now on, we suppose that Z and Y follow exponential distributions. The nexttheorem gives the representation of φ(x).

Theorem 9.1. If Z and Y follow exponential distributions with rates µZ and µY ,we have

φ(x) =

⎧⎨

⎩1− (1−α)µZ

αµYexp−(

µY

1−α − µZ

α

)x

ifµY

1−α >µZ

α ,

0 otherwise.(9.11)

Before proving Theorem 9.1, we need a lemma that gives a boundary of φ(x) inthe case µY/(1−α)> µZ/α.

Lemma 9.2. If Z and Y are exponentially distributed with rates µZ and µY respec-tively, i.e., G(u) = 1−e−µY u and H(u) = 1−e−µZu,u ≥ 0, and µY/(1−α)> µZ/α ,then

φ(x) ≥ 1− exp−(

µY

1−α − µZ

α

)x.

Proof. We define φm(x) = Pr(X(Tk)> 0, k = 1,2, . . . ,m). Then it is clear that

φ(x) = limm→∞

φm(x).

Page 335: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

328 9 Optimal Policies in Time-Varying Scheduling

So for the lemma, it suffices to show

φm(x)> 1− exp−(

µY

1−α − µZ

α

)x

for all m ≥ 1. (9.12)

We proceed with induction arguments. For m = 1,

φ1(x) = Pr(X(T1)> 0) = Pr(x+αZ1 − (1−α)Y1 > 0).

A straightforward computation shows that

φ1(x) = 1− (1−α)µZ

(1−α)µZ +αµYexp− µY

1−α x. (9.13)

Under the condition µY/(1−α)> µZ/α , (9.12) follows from (9.13) for m = 1.Suppose that (9.12) holds for m. We consider the case m+ 1. Since

φm+1(x) = Pr(X(Tk)> 0,k = 1,2, . . . ,m+ 1).

= E[IX(Tk)>0Pr(X(Tk)> 0, k = 2, . . . ,m+ 1|X(T1))

]

= E[IX(T1)>0φm(X(T1))

],

the induction hypothesis implies

φm+1(x)> E[IX(T1)>0

(1− e−RX(T1)

)]= φ1(x)−E

[IX(T1)>0e−RX(T1)

], (9.14)

where R = µY/(1−α)− µZ/α > 0. Furthermore,

E[IX(T1)>0e−RX(T1)

]= E

[I(x+αZ1 > (1−α)Y1)e−R(x+αZ1−(1−α)Y1)

]

= µY E[

e−R(x+αZ1)∫ (x+αZ1)/(1−α)

0eR(1−α)y−µY ydy

].

Using R = µY/(1−α)− µZ/α yields

E[IX(T1)>0e−RX(T1)

]= µY E

[e−R(x+αZ1)

∫ (x+αZ1)/(1−α)

0e−

(1−α)µZα ydy

].

Consequently,

E[IX(T1)>0e−RX(T1)

]= e−Rx − αµY

αµY +(1−α)µZe−µY x/(1−α).

Substituting this and (9.13) into (9.14), we obtain

φm+1(x)> 1− e−Rx+αµY − (1−α)µZ

αµY +(1−α)µZe−πY x/(1−α) > 1− e−Rx.

So the lemma is proved by the induction principle.

Page 336: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 329

Proof (of Theorem 9.1). Since Z and Y follow exponential distributions with ratesµZ and µY , respectively, the renewal equation (9.8) becomes

φ(x) = µY µZ

∫ ∞

0

∫ (x+αz)/(1−α)

0φ(x+αz− (1−α)y)e−(µY y+πZz)dydz. (9.15)

Let s = x+αz and t = x+αz− (1−α)y. Then the renewal equation (9.15) for φ(x)can be rewritten as

φ(x) = µY µZ

α(1−α)eµZ x/α

∫ ∞

xexp−(

µY

1−α +µZ

α

)s∫ s

0φ(t)eπY t/(1−α)dtds,

so that

φ(x)e−µZx/α =µY µZ

α(1−α)

∫ ∞

xexp−(

µY

1−α +µZ

α

)s∫ s

0φ(t)eµY t/(1−α)dtds.

Differentiating both sides of this equation with respect to x and then multiplyingthem by exp[µY/(1−α)+ µZ/α]x, we get

[φ ′(x)− µZ

α φ(x)]

eµY x/(1−α) =− µY µZ

α(1−α)

∫ x

0φ(t)eµY t/(1−α)dtds. (9.16)

By taking the second derivative on both sides of (9.16) with respect to x and multi-plying them again by e−µY x/(1−α), it follows that

φ ′′(x)+[

µY

1−α − µZ

α

]φ ′(x) = 0. (9.17)

This is a second-order differential equation. Its general solutions can be expressedas follows depending on the value of R = µY/(1−α)− µZ/α .

If R = 0, the general solution to (9.17) is, for x > 0,

φ(x) =C1 +C2e−Rx. (9.18)

As a result,

φ(0+) =C1 +C2 and φ ′(0+) =C1 −RC2. (9.19)

On the other hand, by (9.16),

φ ′(0+) =µZ

α φ(0+). (9.20)

Substituting (9.19) into (9.20), we get C2 =−(1−α)µZC1/αµY . Hence

φ(x) =C1

(1− (1−α)µZ

αµYe−Rx

). (9.21)

Page 337: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

330 9 Optimal Policies in Time-Varying Scheduling

Consider the following cases:

1.µY

1−α >µZ

α (R > 0). As φ(∞) = 1 due to Lemma 9.2, we have C1 = 1. Thus

φ(x) = 1− (1−α)µZ

αµYexp−(

µY

1−α − µZ

α

)x, x > 0.

2.µY

1−α =µZ

α (R = 0). It is straightforward to see that φ(x) = 0.

3.µY

1−α <µZ

α (R < 0). In this case since exp−Rx trends to infinity as x grows,

we must have C1 = 0 in order to ensure φ(x) ≥ 0, so that φ(x) = 0 again.

This completes the proof.

Consequently, the probability Pr(O(x) = ∞) can be computed by Theorem 9.1,and the result is presented in the next theorem.

Theorem 9.2. If Z and Y follow exponential distributions with rates µZ and µY ,respectively, then

Pr(O(x) = ∞) =

⎧⎪⎨

⎪⎩

1− exp−(

µY

1−α − µZ

α

)x

i fµY

1−α >µZ

α0 i f

µY

1−α ≤ µZ

α

.

Proof. By (9.7) and (9.11), if µY/(1−α)> µZ/α , then

Pr(O(x) = ∞)

= E[

I(

Y1 <x

1−α

)[1− (1−α)µZ

αµYexp−(

µY

1−α − µZ

α

)[x− (1−α)Y1]

]]

= µY

∫ x/(1−α)

0

[1− (1−α)µZ

αµYexp−(

µY

1−α − µZ

α

)[x− (1−α)y]

]e−µY ydy

= 1− e−µY x/(1−α)− (1−α)µZ

α exp−(

µY

1−α − µZ

α

)x∫ x/(1−α)

0e−(1−α)µZy/α dy.

Simple computation gives

Pr(O(x) = ∞) = 1− exp−(

µY

1−α − µZ

α

)x.

If µY/(1−α)≤ µZ/α , then Theorem 9.1 and (9.7) imply Pr(O(x) = ∞) = 0.

This theorem indicates that the processibility is equivalent to the inequality

µY

1−α ≤ µZ

α ,

Page 338: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 331

which is independent of the initial requirement x. Therefore, for the job processingto be regular, it suffices to have a deterministic initial processing requirement xsuch that the job is processible. Alternatively, the condition is µY/(1−α)≤ µZ/α ,or equivalently, (1−α)/µY ≥ α/µZ . That is, the capability of processing shouldbe larger than the capability of deterioration regardless of the initial processingrequirement.

9.1.3 The Characteristics of Occupying Time

In this subsection we calculate some numerical characteristics of the occupyingtime, including its expectation and variance, via Laplace transform. We return tothe uptime-first process and consider the problem without the constraints of expo-nentially distributed uptimes and downtimes. Let ϕ(x,r) = E[e−rO(x)] denote theLaplace transform of the occupying time O(x). We first present a lemma on therenewal equation for ϕ(x,r).

Lemma 9.3. ϕ(x,r) satisfies the following renewal equation:

ϕ(x,r) = e−rx/(1−α)SY1

(x

1−α −)

+E[e−rτ1ϕ(x+αZ1 − (1−α)Y1,r)IY1<x/(1−α)

], (9.22)

where SY1 is the tail probability of Y1, so that SY1(x/(1−α)−)= Pr(Y1 ≥ x/(1−α)).

Proof. If the processing is finished by the first breakdown, i.e., Y1 ≥ x/(1−α), it isclear that O(x) = x/(1−α). Otherwise, at the time point τ1 =Y1 +Z1 when the firstbreakdown finished and the job is to be processed again on the machine, the newprocessing requirement becomes O(τ1) = x+ατ1 −Y1 = x+αZ1 − (1−α)Y1, andthe new occupying time of the job is then O(X(τ1)) = O(x+αZ1 − (1−α)Y1) fromthis point onwards. Combining these two cases, we can express the total occupyingtime O(x) as

O(x) =x

1−α IY1≥x/(1−α)+[τ1 +T (X(τ1))] IY1<x/(1−α).

Therefore,

e−rO(x) = e−rx/(1−α)IY1≥x/(1−α)+ e−r(τ1+T(x+αZ1−(1−α)Y1))IY1<x/(1−α), (9.23)

and ϕ(x,r) can be represented as

ϕ(x,r) = e−rx/(1−α)SY1

(x

1−α−)+E

[e−r(τ1+T (x+αZ1−(1−α)Y1))IY1<x/(1−α)

].

Then (9.22) follows from the law of iterated expectation and computing the condi-tional expectation in the second term given Y1.

Page 339: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

332 9 Optimal Policies in Time-Varying Scheduling

This lemma gives the renewal equation for ϕ(x,r) under general conditions,which is difficult to solve in general. When Y1 and Z1 are exponentially distributed,however, we can get an analytic form for ϕ(x,r) from the renewal equation in (9.22),which is shown in the next theorem.

Theorem 9.3. If Y1 and Z1 are independent and exponentially distributed with ratesµY and µZ respectively, then ϕ(x,r) = eR2(r)x, where R2(r) is the non-positive rootof the quadratic equation

R2 +

(µY + r1−α − µZ + r

α

)R−

(µZ + r

αµY + r1−α − µY µZ

(1−α)α

)= 0. (9.24)

Proof. Under the exponential distributions of Y and Z, by representing the expecta-tion as an integral, the renewal equation (9.22) can be rewritten as

ϕ(x,r) = e−(µY+r)x/(1−α)

+ µY µZ

∫ ∞

0

∫ ∞

0ϕ(x+αz− (1−α)y,r)Iy<x/(1−α)e

−(r+πY )y−(r+µZ)zdydz.

For this equation, by letting v = x+αz and u = x+αz− (1−α)y, and noting thaty < x/(1−α) is equivalent to u > v− x, we obtain

ϕ(x,r)− e−(µY+r)x/(1−α)

=µY µZ

(1−α)α

∫ ∞

x

∫ v

−∞ϕ(u,r)Iu>v−xe−(r+µY )(v−u)/(1−α)e−(r+µZ)(v−x)/α dudv.

Multiplying this equation by e−(r+µZ)x/α leads to

ϕ(x,r)e−(r+µZ)x/α − exp−(

µY + r1−α +

µZ + rα

)x

=µY µZ

(1−α)α

∫ ∞

xexp−(

µY + r1−α +

µZ + rα

)v∫ v

v−xϕ(u,r)e(πY+r)u/(1−α)dudv.

Differentiating the equation with respect to x yields[

ϕ ′x(x,r)−

µZ + rα ϕ(x,r)

]e−(µZ+r)x/α

+

(µY + r1−α +

µZ + rα

)exp−(

µY + r1−α +

µZ + rα

)x

=µY µZ

(1−α)α

[−exp

−(

µY+r1−α +

µZ+rα

)x∫ x

0ϕ(u,r)e(µY+r)u/(1−α)du

+∫ ∞

xexp−(

µY+r1−α +

µZ+rα

)v

ϕ(v−x,r)e(r+µY )(v−x)/(1−α)dv]

=µY µZ

(1−α)α

[− exp

−(

µY + r1−α +

µZ + rα

)x∫ x

0ϕ(u,r)e(r+µY )u/(1−α)du

+ exp−(

µY + r1−α +

µZ + rα

)x∫ ∞

0ϕ(w,r)e−(r+µZ )w/α dw

],

Page 340: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 333

where the last equality follows by taking w = v− x. Multiplying both sides of theabove again by exp[(µY + r)/(1−α)+ (r+ µZ)/α]x gives

[ϕ ′

x(x,r)−r+ µZ

α ϕ(x,r)]

e(r+µY )x/(1−α) +

(µY + r1−α +

r+ µZ

α

)

=− µY µZ

(1−α)α

[∫ x

0ϕ(u,r)e(r+µY )u/(1−α)du−

∫ ∞

0e−(r+µZ)v/α ϕ(v,r)dv

].

Differentiate with respect to x again and multiply by e(r+µY )x/(1−α) to get

ϕ ′′xx(x,r)−

r+µZ

α ϕ ′x(x,r)+

r+µY

1−α ϕ ′(x,r)− r+µY

1−αr+µZ

α ϕ(x,r) =− µY µZ

(1−α)α ϕ(x,r).

Rearranging the terms leads to the following equation:

ϕ ′′xx(x,r)−

(r+ µZ

α − r+ µY

1−α

)ϕ ′

x(x,r)+(

µY µZ

(1−α)α − r+ µY

1−αr+ µZ

α

)ϕ(x,r)= 0.

(9.25)

This is a second-order differential equation with constant coefficients and its generalsolution is

ϕ(x,r) =C1(r)eR1(r)x +C2(r)eR2(r)x,

where C1(r) and C2(r) are two numbers depending on r to be determined andR1(r) ≥ 0 and R2(r) < 0 are the distinct roots of the corresponding characteristicequation given by (9.24).

Since O(x) ≥ x/(1−α), it is clear that O(x) tends to infinity with probability 1as x → ∞. So by the dominated convergence theorem, for r > 0,

limx→∞

ϕ(x,r) = limx→∞

E[e−rO(x)

]= E

[limx→∞

e−rO(x)]= 0.

Therefore, C1(r) ≡ 0 and ϕ(x,r) = C2(r)eR2(r)x. Moreover, as O(0) = 0 impliesϕ(0,r) = 1, we see that C2(r)≡ 1 for all r and hence the theorem follows.

This theorem reveals an intuitive but important fact that breakdowns essentiallychange the linear structure of the deterioration. Even though the processing require-ment increases linearly in the time passed, the real occupying time of the job doesnot linearly depend on the start time of the job processing. Let O(x,s) = O(x+αs)denote the occupying time of the job when it starts at time s. Then Theorem 9.3implies

E[e−rO(x,s)] = E[e−rO(x+αs)] = eR2(r)(x+αs) = eαR2(r)sE[e−rO(x)]. (9.26)

However, if O(x,s) were linearly dependent on time s such that O(x,s) = O(x)+δ sfor some δ , as in the traditional model without breakdowns, we would have

E[e−rO(x,s)] = E[e−rO(x+αs)] = eR2(r)(x+αs) = e−αsrE[e−rO(x)]. (9.27)

Page 341: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

334 9 Optimal Policies in Time-Varying Scheduling

But since R2(r) = −r, (9.27) contradicts (9.26), and hence the occupying time isnot a linear function of the start time s. This clarifies why the complexity of thescheduling problem increases dramatically when the machine is subject to break-downs.

We next compute some features of O(x) in the case µZ/α > µY/(1−α) (whichensures the regularity of the processing, see Theorem 9.2). It is presented in thefollowing theorem. Though the occupying time does not linearly depend on thestart time as remarked above, its expectation and variance do if the initial processingrequirement is a known constant.

Theorem 9.4. Assume Y and Z are independent exponential random variables withrates µY and µZ, respectively, such that µZ/α > µY/(1−α). Then E[O(x)] = Axand Var(O(x)) = Bx, where

A =µY + µZ

(1−α)µZ −αµYand B =

2µY µZ

((1−α)µZ −αµY )3 .

Proof. Differentiating (9.24) with respect to r, we get

2R2R′2 +

(1

1−α − 1α

)R2 +

(µY + r1−α − µZ + r

α

)R′

2 −2r+ µY + µZ

(1−α)α = 0. (9.28)

Replacing r with 0,(

2R2(0)+µY

1−α − µZ

α

)R′

2(0)+(

11−α − 1

α

)R2(0)−

µY + µZ

(1−α)α = 0. (9.29)

It is easy to check, by (9.24), that R2(0) = 0 when πZ/α ≥ µY/(1−α). Hence (9.29)gives R′

2(0) =−A. It follows that

E[O(x)] =− ∂ϕ(x,r)∂ r

∣∣∣∣r=0

=−R′2(0)x = Ax.

Differentiating (9.28) with respect to r once again,

2(R′

2)2

+ 2R2R′′2 + 2

(1

1−α − 1α

)R′

2 +

(µY + r1−α − µZ + r

α

)R′′

2 −2

α(1−α)= 0.

(9.30)Replacing r with 0,

2(

µY + µZ

αµY − (1−α)µZ

)2

+2

α(1−α)

απZ − (1−α)µY

αµY − (1−α)µZ+

αµY − (1−α)µZ

α (1−α)R′′

2 = 0,

where the second term is the sum of the second and forth terms in (9.30) with rbeing replaced by zero. Solving for R′′

2 in the above equation gives

R′′2(0) =

2µY µZ

((1−α)µZ −αµY )3 = B. (9.31)

Page 342: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 335

Observe that

E[O2(x)] =∂ 2ϕ(x,r)

∂ r2

∣∣∣∣r=0

=∂ 2

∂ r2

[eR2(r)x

]∣∣∣∣r=0

=[eR2(r)x

[R′

2(r)x]2+ eR2(r)xR′′

2(r)x]

r=0

=[R′

2(0)x]2+R′′

2(0)x = E2[O(x)]+R′′2(0)x.

Therefore, Var(O(x)) = R′′2(0)x = Bx. The proof is thus complete.

As a result of Theorem 9.4, it is easy to check the following corollary for stochas-tic initial processing requirement X .

Corollary 9.1. Under the conditions of Theorem 9.4, if the initial processing req-uirement is a random variable X, then

E[O(X)] = AE[X ] and Var(O(X)) = BE[X ]+A2Var(X),

where A and B are as defined in Theorem 9.4.

Proof. The first equality is straightforward and the second can be checked by theformula Var(O(X)) = E[Var(O(X)|X)]+Var(E[O(X)|X ]) = E[BX ]+Var(AX).

Apparently, limµZ→∞

A = 1/(1−α) and limµZ→∞

B = 0. Hence,

limµZ→∞

E[O(X)] =E[X ]

1−α and limµZ→∞

Var(O(X)) =Var(X)

(1−α)2 .

Both equations still hold if replacing µZ → ∞ with µY → 0. Since µZ → ∞ meansthat the downtimes trend to 0 and µY → 0 corresponds to infinite uptimes, in eithercase the model reduces to the situation without breakdowns.

Remark 9.2. When µZ/α = µY/(1−α), (9.29) leads to R′2(0) = ∞. Hence, for any

initial processing requirement x > 0, we have E[O(x)] = −R′2(0)x = ∞. That is,

although the job is processible in this case, its expected processing time will beinfinite, which is to be avoided in practice as well.

9.1.4 Optimal Policies

In this subsection, we address the optimal policy (sequence) that minimizes theexpected makespan of the scheduling problem. We here associate an index i to theparameters such as α,µ , A, B (see Theorem 9.3), and so on, to indicate the jobsfor which the parameters are referred to. For simplicity, we denote a policy by π =1,2, . . . ,n and by Oi(π), i = 1,2, . . . ,n the occupying time of job i (in fact, the ithprocessed job) under the policy π . Further write the completion time of the ith job

Page 343: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

336 9 Optimal Policies in Time-Varying Scheduling

as Ci(π), or simplified to Ci, i = 1,2, . . . ,n. Then it is clear that Cmax =Cn(π). Thefollowing formula for the expected makespan under an arbitrary sequence plays animportant role in deriving the optimal policy.

Theorem 9.5. If Zi and Yi follow exponential distributions with rates µZi and µYi

respectively, and µYi/(1−αi)< µZi/αi for all i = 1,2, . . . ,n, then

E[Cmax] =n

∑k=1

n

∏j=k+1

(α jA j + 1)AkE[Xk]. (9.32)

where ∏nj=n+1(α jA j + 1) is set to 1 by convention.

Proof. We conduct the proof by induction argument on n.First, if n = 1, then Cmax = C1 = O1(X). Thus E[Cmax] = E[O1(X)] = A1E[X1]

and (9.32) holds.Next, assume the induction hypothesis that (9.32) holds for n = m. Then we con-

sider n=m+1. In this case, Cmax = Om+1(Xm+1+αm+1Cm)+Cm. By Theorem 9.4,E[Om+1(Xm+1 +αm+1Cm)] = Am+1E[Xm+1 +αm+1Cm]. It follows that

E[Cmax] = Am+1E[Xm+1 +αm+1Cm]+E[Cm]

= Am+1E[Xm+1]+ (αm+1Am+1 + 1)E[Cm].

By the induction hypothesis, we further have

E[Cmax] = Am+1E[Xm+1]+ (αm+1Am+1 + 1)m

∑k=1

m

∏j=k+1

(α jA j + 1)AkE[Xk]

=m+1

∑k=1

m+1

∏j=k

(α jA j + 1)AkE[Xk].

Therefore the theorem is proved by the induction principle.

As a result, a standard interchange argument gives the following optimal policy.

Theorem 9.6. For minimizing the expected makespan, the optimal policy orders thejobs according to nondecreasing values of E[Xk]/αk, k = 1,2, . . . ,n.

Proof. We define a scheduling problem with deteriorations in the traditional senseas follows. There are n jobs which are all available at time zero and subject to deteri-orations. The processing time of job k if starting at time t is AkXk +αkAkt. Then theexpected makespan for this problem is the same as in (9.32). According to the tradi-tional results, see for example Browne and Yechiali (1990) or Alidaee and Womer(1999), the optimal policy is to sequence the jobs according to the nondecreasingorder of AkE[Xk]/αkAk = E[Xk]/αk, k = 1,2, . . . ,n.

Remark 9.3. Under the model formulation, as mentioned before, which includes thetraditional assumption regarding linear deterioration, the optimal policy orders the

Page 344: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.1 Stochastic Scheduling with Deteriorating Processing Times 337

jobs as if the machine has no breakdowns. This appears to be a surprising discovery,as it indicates that the breakdowns, even in a job-dependent setting, do not impacton the optimal policy to minimize expected makespan at all.

Theorem 9.7. Under the same conditions as in Theorem 9.5,

Var(Cmax) =n

∑k=1

n

∏j=k+1

(α jA j + 1)2 (A2kVar(Xk)+BkE[Xk]

)

+n−1

∑k=1

n

∑l=k+1

n

∏j=l+1

(α jA j + 1)2αlBl

l−1

∏j=k+1

(α jA j + 1)AkE[Xk]. (9.33)

Proof. We prove the theorem by induction argument again.For n = 1, it is clear that

Var(Cmax) = Var(O1(X1)) = A21Var(X1)+B1E[X1].

which coincides with (9.33). Suppose now that (9.33) holds for n = m. Then con-sider n = m+ 1. Note that Cmax = Om+1(Xm+1 +αm+1Cm)+Cm. Hence

Var(Cmax) = Var(E [Om+1(Xm+1 +αm+1Cm)+Cm|Cm,Xm+1])

+E [Var(Om+1(Xm+1 +αm+1Cm)+Cm|Cm,Xm+1)] .

Since, given Cm and Xm+1,

E [Om+1(Xm+1 +αm+1Cm)+Cm|Cm,Xm+1] = Am+1Xm+1 +(αm+1Am+1 + 1)Cm

and

Var(Om+1(Xm+1 +αm+1Cm)+Cm|Cm,Xm+1) = Bm+1(Xm+1 +αm+1Cm),

we further have

Var(Cmax) = Var(Am+1Xm+1 +(αm+1Am+1 + 1)Cm)+E [Bm+1(Xm+1 +αm+1Cm)] .

By the independence between Xm+1 and Cm, Var(Cmax) can be rewritten as

Var(Cmax) = A2m+1Var[Xm+1]+Bm+1E[Xm+1]+ (αm+1Am+1 + 1)2 Var(Cm)

+αm+1Bm+1E[Cm].

Page 345: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

338 9 Optimal Policies in Time-Varying Scheduling

Substituting the induction hypothesis for n = m and the formula of E[Cm] into theabove equality, we obtain

Var(Cmax) = A2m+1Var[Xm+1]+Bm+1E[Xm+1]

+ (αm+1Am+1 + 1)2m

∑k=1

m

∏j=k+1

(α jA j + 1)2 (A2kVarXk +BkE[Xk]

)

+(αm+1Am+1 + 1)2m−1

∑k=1

m

∑l=k+1

m

∏j=l+1

(α jA j + 1)2αlBl

l−1

∏j=k+1

(α jA j + 1)AkE[Xk]

+αm+1Bm+1

m

∑k=1

m

∏j=k+1

(α jA j + 1)AkE[Xk].

Combining the first three and the last two terms respectively, we get

Var(Cmax) =m+1

∑k=1

m+1

∏j=k+1

(α jA j + 1)2 (A2kVar(Xk)+BkE[Xk]

)

+m−1

∑k=1

m

∑l=k+1

m+1

∏j=l+1

(α jA j + 1)2αlBl

l−1

∏j=k+1

(α jA j + 1)AkE[Xk]

+αm+1Bm+1

m

∑k=1

m

∏j=k+1

(α jA j + 1)AkE[Xk]

=m+1

∑k=1

m+1

∏j=k+1

(α jA j + 1)2 (A2kVar(Xk)+BkE[Xk]

)

+m

∑k=1

m+1

∑l=k+1

m+1

∏j=l+1

(α jA j + 1)2αlBl

l−1

∏j=k+1

(α jA j + 1)AkE[Xk].

Thus (9.33) holds for n = m+ 1.

Note that µZi = ∞ (or µYi = 0) indicates no breakdowns, the following corollaryis clear from the relation

limµZi→∞

Bi = 0 and limµZi→∞

Ai =1

1−αi,

which coincide with the classical results on the variance of the makespan withoutbreakdowns.

Corollary 9.2. If no breakdowns occur, then

Var(Cmax) =n

∑k=1

n

∏j=k+1

1(1−α j)2

Var(Xk)

(1−αk)2 .

For the minimization of Var(Cmax) with machine breakdowns, however, it is diffi-cult to construct an optimal sequence of jobs, even when all processing requirementsare deterministic quantities, x1,x2, . . . ,xn, say. In particular, we consider the simplest

Page 346: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.2 Stochastic Model with Learning Effects 339

case with n = 2. Then the only sequences are π1 = 1,2 and π2 = 2,1.The variances of the makespan under π1 and π2, denoted by V1 and V2 respectively,are given by (9.33) as V1 = B1(α2A2 + 1)2x1 +B2x2 and V2 = B2(α1A1 + 2)2x2 +B1x1. Clearly,

V1 ≤V2 ⇐⇒ x1/α1

B2(α1A21 + 2A1)+B1

≤ x2/α2

B1(α2A22 + 2A2)+B2

.

In other words, even in the case n = 2, the optimal policy is not generally given byan index policy. This differs dramatically from the case of minimizing the expectedmakespan.

To understand the computational complexity more precisely, we rewrite (9.33) as

Var(Cmax) =n

∑k=1

n

∏j=k+1

(α jA j + 1)2Bkxk

+n−1

∑k=1

n

∑l=k+1

n

∏j=l+1

(α jA j + 1)2αlBl

l−1

∏j=k+1

(α jA j + 1)Akxk. (9.34)

Consider the classical scheduling problem with job deterioration defined in the proofof the optimal policy for expected makespan in Theorem 9.6. Assume that associ-ated with each job i there is a weight wi. Alternative to the expected makespan, weconsider the total weighted expected completion time ∑i wiCi. Browne and Yechiali(1990) showed that

n

∑l=1

wlCl =n

∑l=1

wl

l

∑k=1

l

∏j=k+1

(α j + 1)xk =n

∑k=1

n

∑l=k

wl

l

∏j=k+1

(α j + 1)xk,

which appears simpler than Var(Cmax) (see (9.34)). It has been shown by Bachmanet al. (2002) that minimizing ∑i wiCi is an NP-hard problem. Thus we conjecturethat the problem of minimizing Var(Cmax) may be NP-hard, too.

9.2 Stochastic Model with Learning Effects

Consider the situation where n jobs, all available at time zero, are to be processedon a single machine with learning ability, the machine can process at most onejob at a time, and preemption of the jobs is not allowed. Write (Pi,Di) for thenominal processing times and due dates for job i, i = 1,2, . . . ,n, and suppose thatP1,P2, . . . ,Pn,D1,D2, . . . ,Dn are mutually independent. Each job i has a positivedeterministic weight wi, i = 1,2, . . . ,n. Due to the learning effects, a job processedlater needs shorter processing time in the way that the true processing time of job i,if scheduled at the r-th position to process, is given by

Pir = grPi, r, i = 1,2, . . . ,n, (9.35)

Page 347: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

340 9 Optimal Policies in Time-Varying Scheduling

where

1 = g1 ≥ g2 ≥ · · ·≥ gn > 0 (9.36)

indicate the learning effects. For convenience, we denote the learning effect model(9.35) by ℓe. A policy π = (π1,π2, . . . ,πn) is a sequence permuting the integers1,2, . . . ,n to indicate the order to process the jobs, such that πk = i if job i is thek-th to be processed. The problem is to find an optimal sequence π∗ that minimizescertain performance measure over all sequences π .

9.2.1 Optimal Policies with Learning Effects

We first introduce necessary notation. Given any policy π = (π1,π2, . . . ,πn), wedenote by Ci = Ci(π) the completion time of job i, Li = Li(π) = Ci − Di thelateness, and Ti = Ti(π) = maxLi(π),0 the tardiness of job i. Further writeCmax = max1≤i≤nCi(π), Lmax = max1≤i≤nLi(π) and Tmax = max1≤i≤nTi(π)for the makespan, maximum lateness and maximum tardiness, respectively. Fora policy π = (π1, . . . ,πr,πr+1, . . . ,πn) = (s, i, j,s′), where πr = i, πr+1 = j, ands = (π1, . . . ,πr−1) and s′ = (πr+2, . . . ,πn) are partial sequences of the first r − 1and the last n− r − 1 jobs, respectively. We use π ′ = (π1, . . . ,πr+1,πr, . . . ,πn) =(s, j, i,s′) to indicate the sequence obtained by interchanging the two adjacent jobsi and j in π . Write C =C(π) for the completion time of the job just before job i insequence π , so that

Ci(π) =C+ grPi, Cj(π) =C+ grPi + gr+1Pj,

Cj(π ′) =C+ grPj, Ci(π ′) =C+ grPj + gr+1Pi.

For any two random variables X and Y , with cdfs FX(x) and FY , densities fX andfY if X and Y are continuous (or probabilities for discrete case), and hazard ratefunctions hX and hY if X and Y are positive and continuous, respectively, recall thatX is said to be less than or equal to Y :

(i) In the usual stochastic order (X ≤st Y ) if FX(x)≥ FX(x) for all x ∈ (−∞,+∞);(ii) In likelihood-ratio order (X ≤lr Y ) if f (t)/g(t) decreases in t over the union

of the supports of X and Y (here b/0 is taken to be ∞ whenever b ≥ 0), orequivalently, f (u)g(v)≥ f (v)g(u) for all u ≤ v;

(iii) In hazard-rate order (X ≤hr Y ) if hY (t)≤ hX(t) for all t ≥ 0; and(iv) In increasing convex order (X ≤icx Y ) if E[φ(X)] ≤ E[φ(Y )] for all increasing

and convex function φ .

Note the well-known implications

X ≤lr Y =⇒ X ≤hr Y =⇒ X ≤st Y =⇒ E[X ]≤ E[Y ],

see, e.g., Shaked and Shanthikumar (2007).

Page 348: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.2 Stochastic Model with Learning Effects 341

In the following text, when we say processing times are ordered by order “a”,e.g., X ≤a Y (it can be replaced by st, lr, hr and icx), we implicitly suppose that Xand Y can be ordered by ≤a.

We begin with some easy results. Write Cπi = ∑ij=1 Pπ j for the nominal com-

pletion time of the ith processed job. According to Chang and Yao (1993), forany function f , f (Cπ1 ,Cπ2 , . . . ,Cπn) can be minimized in the usual stochastic order(increasing convex order, the expectation order) by a sequence in nondecreasinglikelihood-ratio order (hazard-rate order, the usual stochastic order) if f is increas-ing (increasing and supermodular, separable and increasing), where a function f issupermodular if f (x∨ y)+ f (x∧ y) ≥ f (x)+ f (y) for any x,y ∈ Rn, and f is sepa-rable and increasing if f (x1,x2, . . . ,xn) = ∑n

i=1 hi(xi) for some increasing functionshi, i = 1,2, . . . ,n. For the learning effect model defined by (9.35) and (9.36), notethe decreasing order of gi, i = 1,2, . . . ,n and the relation

Cπi =i

∑j=1

g jPπ j =i

∑j=1

g j(Cπ j − Cπ j−1) =i−1

∑j=1

(g j − g j+1)Cπ j + giCπi.

We thus have the following theorem, which is similar to and can be deduced straight-forwardly from Theorem 4.1 in Chang and Yao (1993).

Theorem 9.8. Under the learning effects model defined by (9.35) and (9.36), for anyincreasing function f , f (Cπ1 ,Cπ2 , . . . ,Cπn) is minimized in the usual stochastic orderby any sequence in nondecreasing likelihood-ratio order of the nominal processingtimes.

Remark 9.4. By this theorem, if the nominal processing times can be ordered bylikelihood ratio orders, then the completion times (Cπ1 ,Cπ2 , . . . ,Cπn) can be jointlyminimized in the usual stochastic order by an SEPT sequence. Also, this givesstochastic ordering for makespan Cπn , weighted flowtime and weighted discountedflowtime when the weights are agreeably ordered.

The following theorem presents a result on maximum lateness scheduling, whichis an immediate result of Chang and Yao (1993, Theorem 4.4). For the relevantresults on maximum lateness without learning effects, see Cai et al. (2007b), Wuand Zhou (2008) and the references therein.

Theorem 9.9. Under the learning effects model (9.35) and (9.36), when process-ing times can be ordered in the likelihood ratio sense, Pi ≤lr Pi+1, i = 1, . . . ,n− 1,and due dates can be agreeably ordered in the hazard rate sense, Di ≤hr Di+1,then the maximum lateness Lmax is minimized in the usual stochastic order by thesequence 1,2, . . . ,n in nondecreasing likelihood-ratio order of the nominal pro-cessing times.

Remark 9.5. Note that the maximum tardiness is given by Tmax = max(0,Lmax),hence Pr(Tmax ≤ t) = Pr(Lmax ≤ t)It≥0. It follows that under the assumptions ofTheorem 9.9, Tmax is also stochastically minimized by the sequence in nondecreas-ing likelihood-ratio order of the processing times Pk,1 ≤ k ≤ n.

Page 349: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

342 9 Optimal Policies in Time-Varying Scheduling

In the next theorem, we investigate optimal policies for the objectives E[Cmax]and Var(Cmax), where we do not require likelihood ratio order. The easy proof usingan interchange argument is omitted.

Recall the optimality of SEPT for minimizing the makespan Cmax in the usualstochastic order when nominal processing times can be sorted according to thelikelihood-ratio orders. The following theorem states that to minimize E[Cmax], oneneeds only sort the jobs according to their expected nominal processing times. Thiscan be compared with the classical situations of stochastic scheduling without learn-ing effect, in which it is commonly known that E[Cmax] are policy-free constants.

Theorem 9.10. Under the learning effects model (9.35) and (9.36), E[Cmax] can beminimized by the SEPT rule and Var(Cmax) can be minimized by the sequence in thenon-decreasing order of the variances of the nominal processing times.

Remark 9.6. If 1 = g1 ≤ g2 ≤ · · · ≤ gn, one has a deterioration version. Then itcan easily be shown that the optimal policy should order the jobs according tononincreasing values of E[Pk], k = 1,2, . . . ,n, i.e., according to the LEPT (longestexpected processing times first) rule.

The SEPT rule also minimizes the expected total completion time.

Theorem 9.11. E[∑nk=1 Ck] can be minimized by the SEPT rule under the learning

effects model (9.35) and (9.36).

Proof. Assume E[Pi]≥ E[Pj]. Then we have

n

∑l=1

Cπl (π) =n

∑l=1

l

∑k=1

gkPπk =n

∑k=1

(n− k+ 1)gkPπk .

Thus

E

[n

∑l=1

Cπl (π′)

]−E

[n

∑l=1

Cπl (π)]=[(n−r+1)gr−(n−r)gr+1

](E[Pj]−E[Pi]

)≤ 0.

Therefore, repeating this interchange argument for all jobs not sequenced accordingto the SEPT rule will prove the theorem.

Remark 9.7. Clearly, when 1 = g1 ≤ g2 ≤ · · ·≤ gn (i.e., position deterioration), theSEPT rule is again optimal if (n−r+1)gr−(n−r)gr+1 ≥ 0 for all r ∈ 1,2, . . . ,n.On the other hand, the LEPT rule is optimal if (n− r+ 1)gr − (n− r)gr+1 ≤ 0 forall r ∈ 1,2, . . . ,n.

It is well-known that the weighted shortest expected processing time (WSEPT)rule yields the optimal schedule for the classical expected total weighted comple-tion time problem. That is, sequencing the jobs in nondecreasing order of E[Pi]/wi isoptimal, where wi is the weight for jobs i, i = 1,2, . . . ,n. This is however not gener-ally the case in the presence of learning effects. For example, consider the instance

Page 350: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.2 Stochastic Model with Learning Effects 343

with n = 2, P1 ∼ U(0,2), P2 ∼ exp(0.5), g2 = 0.6, w1 = 2 and w2 = 6. Then theexpected total weighted completion time of the sequence (2,1), under the WSEPTrule, yields a performance measure of 17.2, whereas the alternative sequence (1,2)generates a performance measure of 15.2. The following theorem shows that if theSEPT sequence is the same as the WSEPT sequence, then that sequence is optimalfor the weighted flowtime.

Theorem 9.12. If E[Pi] < E[Pj] implies E[Pi]/wi ≤ E[Pj]/wj for all the jobs i andj, then E

[∑n

k=1 wkCk]

under the learning effects model can be minimized by theWSEPT rule.

Proof. Assume the sequence π = (π1,π2, . . . ,πn) = (s, i, j,s′) with jobs i and j inpositions r and r + 1 respectively, and π ′ = (s, j, i,s′). Then an easy computationgives

n

∑k=1

wπkCπk (π) =n

∑k=1

wπk

k

∑l=1

glPπl =n

∑l=1

glPπl

n

∑k=l

wπk .

Write temporarily δ =n∑

k=r+2wπk . Then E[Pi]≥ E[Pj] and E[Pi]/wi ≥ E[Pj]/wj imply

E

[n

∑l=1

wlCl(π)]−E

[n

∑l=1

wlCl(π ′)

]

= grE[Pi](wi +wj + δ )+ gr+1E[Pj](wj + δ )− grE[Pj](wj +wi + δ )− gr+1E[Pi](wi + δ )

= (gr + gr+1)(wi +wj + δ )(E[Pi]−E[Pj])+ gr+1wiwj

(E[Pi]

wi−

E[Pj]

wj

)≥ 0.

This completes the proof.

The following theorem reveals the efficiency of the WSEPT rule with respect toan optimal policy in terms of the worst-case bound as shown below.

Theorem 9.13. Let π∗ be an optimal sequence to minimize E[∑wiCi] under thelearning effects model and πw the W SEPT sequence. Then

E[∑wiCi(πw)]

E[∑wiCi(π∗)]≤ g−1

n .

Proof. Without loss of generality, assume E[P1]/w1 ≤ E[P2]/w2 ≤ · · · ≤ E[Pn]/wn.Let π = (π1,π2, . . . ,πn) be an arbitrary sequence. Then Cπk(π) = ∑k

l=1 glPπl andhence

n

∑k=1

wπkCπk(π) =n

∑k=1

wπk

k

∑l=1

glPπl =n

∑l=1

n

∑k=l

wπk glPπl .

In particular, Ck(πw) = ∑kl=1 glPl under the WSEPT rule πw = (1,2, . . . ,n), and

hence

Page 351: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

344 9 Optimal Policies in Time-Varying Scheduling

E

[n

∑k=1

wkCk(πw)

]= E

[n

∑l=1

n

∑k=l

wkglPl

]≤ E

[n

∑l=1

n

∑k=l

wkPl

]. (9.37)

Moreover, because

E

[n

∑k=1

wπkCπk(π)]≥ gnE

[n

∑l=1

n

∑k=l

wπk Pπl

]≥ gnE

[n

∑l=1

n

∑k=l

wkPl

],

minimizing over π gives

E[∑wiCi(π∗)

]≥ gnE

[n

∑l=1

n

∑k=l

wkPl

]. (9.38)

The theorem then follows from (9.37) and (9.38).

Consider now the problem of minimizing the expected weighted discountedflowtime E

[∑wi(1− e−δCi)

]. It is well known that the WDSEPT (weighted dis-

counted SEPT) rule, which sequences the jobs in decreasing order of the ratiowiE[e−δPi ]/(1− E[e−δPi ]), gives an optimal solution to the classic stochastic ver-sion of the problem. However, this rule does not yield an optimal sequence underthe learning effects model in general. For example, consider the instance with n= 2,P1 ∼ exp(0.25), P2 ∼ exp(0.5), w1 = 4, w2 = 1, g2 = 0.5, and δ = 0.5. It is not diffi-cult to calculate that the expected weighted sum of the discounted completion timesof the sequence (1,2) from the WDSEPT rule generates a performance measure of40/9, whereas the sequence (2,1) gives the optimal value of 17/4. The followingtheorem provides a ratio bound for the performance measure of WDSEPT rule tothe optimal policy, and its proof is similar to that of Theorem 9.13.

Theorem 9.14. Let π∗ be an optimal sequence to minimize E[

∑wi(1− e−δCi)]

andπwdsept the WDSEPT sequence. Then

E[

∑wi(1− e−δCi(π∗))]

E[

∑wi(1− e−δCi(πwdsept))] ≤ g−1

n .

9.2.2 Consideration of Unreliable Machines

We now turn to stochastic scheduling problems on a single unreliable machine sub-ject to no-loss (preemptive-resume) breakdowns.

The stochastic process of breakdowns is characterized by a sequence of indepen-dent and identically distributed nonnegative random pairs Yk,Zk∞

k=1, with Yk,Zkrepresenting the kth durations of the uptime and downtime of the machine, respec-tively, k ≥ 1. Furthermore, we assume that the uptimes Yk are independent of thedowntimes Zk with µ = E[Zk] < +∞ and σ2 = E[Z2

k ] < +∞. Define a counting

Page 352: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

9.2 Stochastic Model with Learning Effects 345

processing N(t) : t ≥ 0 associated with the random sequence Yk+∞k=1 such that

N(t) = supk ≥ 0 : Sk ≤ t, where S0 = 0 and Sk =Y1 + · · ·+Yk, k ≥ 1, representingthe total uptime prior to the kth breakdown. Then, under a policy π , the completiontime of job i can be expressed as

Ci(π) = Ri(π)+N(Ri(π))

∑k=0

Zk (9.39)

where Ri(π) = ∑k∈Bi(π) gπk Pk is the total processing time of the jobs sequencedbefore and including job i under policy π . Compared with the models in the previoussection, the second item ∑N(Ri(π))

k=0 Zk in Eq. (9.39) stands for the impact of machinebreakdowns.

Firstly, we find the optimal policies for the maximum lateness, the makespan, themaximum tardiness and the total completion time under fairly general conditionson the associated renewal process N(t) : t ≥ 0. All optimal policies stochasticallyminimize these objective functions. It can be easily verified, by similar arguments,that Theorem 9.8 and Remark 9.4 remain valid under these general conditions,because the breakdown process is assumed to be independent of the jobs being pro-cessed (see also the arguments on machine breakdowns below Theorem 13.C.3 ofRighter (1994)).

If the machine breakdown process N(t) : t ≥ 0 is a Poisson process with ratea, then E[N(t)] = at and Var(N(t)) = at. In this special case, we can obtain theresults parallel to those stated in Theorems 9.10–9.12 by the similar arguments. Forinstance,

E[Cmax(π)

]= (1+ µa)E[Rn] = (1+ µa)E

[n

∑k=1

gπkPk

].

Consequently, the problem of minimizing the expected makespan E[Cmax] under thecurrent environment can be solved by the SEPT rule from Theorem 9.10.

Finally, we give an optimal policy about the problem of minimizing the varianceof the makespan, Var(Cmax), under certain compatible conditions.

Theorem 9.15. If the means and variances of the jobs satisfy the agreeability con-dition that E[Pi]≤ E[Pj]⇐⇒ Var(Pi)≤ Var(Pj) for all jobs i and j, then the optimalpolicy sequence to minimize Var(Cmax) is to schedule jobs according to the SEPTrule, or equivalently, in nondecreasing order of the variances Var(Pk) of the pro-cessing times.

Proof. Let α = (1+ µa)2 and β = aσ . By the assumptions on Pk,Yk,Zk and thePoisson distribution of N(t), it is easy to derive

Var(Cmax) = α(

n

∑k=1

g2πk

Var(Pk)

)+β

(n

∑k=1

gπk E[Pk]

). (9.40)

Page 353: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

346 9 Optimal Policies in Time-Varying Scheduling

A standard interchange argument for all jobs that are not sequenced according to theSEPT rule will prove the theorem.

Remark 9.8. The agreeability condition required in the above theorem may be satis-fied by a host of well-known distribution families. For instance, the Poisson family,exponential family, uniform family over interval [0,ck] and so on.

Page 354: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Chapter 10More Stochastic Scheduling Models

This chapter discusses some other scheduling problems and models that do not fallinto the categories presented in Chaps. 2–9. Section 10.1 considers the problem tominimize a random variable of performance measure under stochastic order, whichproduces stronger results than the common approach of minimizing the expectedvalue of the measure. Section 10.2 introduces the concept of “team-work tasks”, inwhich each job is processed by a team of different processors working on designatedcomponents of the job, and derives a number of corresponding optimal schedulingpolicies. Section 10.3 is devoted to investigate the scheduling problem involving theproduction and delivery of perishable products such as vegetables and sea foods, anddevelop the optimal waiting and sequencing decisions under appropriate models.

10.1 Optimization Under Stochastic Order

In this section, we address a single-machine stochastic scheduling problem tostochastically minimize the objective function of maximum lateness or maximumweighted lateness. The performance measure based on maximum lateness has beenconsidered by a number of authors in the literature; see for example, Jackson (1955),Sarin et al. (1991), Zhou and Cai (1997), and Cai and Zhou (2005), among others.The focus of this section is, however, on the problem to optimize the performancemeasure under stochastic order defined in Sect. 1.2. Under such a stochastic order,a random variable X is considered as (stochastically) smaller than another randomvariable Y if X is always more likely than Y to take on smaller values. Therefore, theoptimal schedule under stochastic order is always more likely to produce a smallervalue of the objective function than any other schedule. Such a desired property isnot generally available for the optimal schedule that minimizes the mean value ofthe objective function.

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1 10,© Springer Science+Business Media New York 2014

347

Page 355: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

348 10 More Stochastic Scheduling Models

Scheduling problems involving stochastic order have been studied by someresearchers. Brown and Solomon (1973) considered the problem of optimal issuingpolicies under stochastic order, which is equivalent to a scheduling problem.Shanthikumar and Yao (1991) proposed a bivariate characterization which isextremely useful for interchange arguments in scheduling. They considered theproblem of minimizing the total flowtime and obtained optimization results withlikelihood-ratio ordered processing times. Chang and Yao (1993) further demon-strated this theory, and applied some stochastic rearrangement inequalities to obtainsolutions to the stochastic counterpart of many classical deterministic schedulingproblems. Boxma and Forst (1986) showed that when processing times are stochas-tically ordered and due dates are independent and identically distributed (i.i.d.), theSEPT (shortest expected processing time) rule minimizes the expected number oftardy jobs. Chang and Yao (1993) permitted the rearrangement of weights and pro-cessing times separately. In the case of agreeable due dates, so that the SEPT rule isidentical to the SEDD (shortest expected due date) rule – also known as the EEDD(earliest expected due date) rule, SEPT minimizes certain classes of functions oflateness or tardiness.

The main results presented in this section are the optimal sequences for theproblem of minimizing the maximum lateness (or weighted maximum lateness) ina number of situations, including: (A) The likelihood ratios of the processing timesand the hazard rates of the due dates meet an agreeability condition; (B) The duedates are exponentially distributed with rates agreeable with the likelihood ratiosof the processing times and the weights; and (C) The processing times and the duedates are exponentially distributed.

In Sect. 10.1.1 next, we specify the basic problems and assumptions. ThenSect. 10.1.2 presents the main results on stochastic minimization of maximumlateness and maximum weighted lateness. More delicate results with exponentiallydistributed processing times and due dates are derived in Sect. 10.1.3. The exposi-tion of this section is mainly based on Cai et al. (2007b).

10.1.1 Basic Problem

We study the following problem: A set of n jobs are to be processed on a singlemachine, which are all available at time zero. The processing times Pi of jobs i,i = 1,2, . . . ,n, are independent random variables. Each job i has a due date Di. Thedue dates D1, . . . ,Dn are independent random variables and independent of Pi.The machine can process at most one job at a time. We consider the maximumlateness:

ML(π) = max1≤i≤n

(Ci(π)−Di) .

Page 356: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 349

and the maximum weighted lateness:

MWL(π) = max1≤i≤n

Wi (Ci(π)−Di) .

The problem is to find an optimal sequence π∗ such that

ML(π∗)≤st ML(π) for all π , (10.1)

or

MW L(π∗)≤st MWL(π) for all π , (10.2)

where “≤st” represents the stochastic order defined in Sect. 1.2. Such a π∗ is said tostochastically minimize ML(π) (or MW L(π)).

By the properties of stochastic order, the solution to (10.1) or (10.2) also mini-mizes E[ML(π)] or E[MWL(π)] (cf. Sect. 1.2.2).

10.1.2 Stochastic Minimization of Maximum Lateness

It has been known that in a deterministic environment, the maximum latenessML(π) is minimized by the EDD (Earliest Due Date) rule. That is, the optimalsequence π∗ is in nonincreasing order of the deterministic due dates Di, regardlessof the processing times. As a result, if Pi are random variables but Di remaindeterministic, then the EDD rule minimizes ML(π) almost surely (with probabil-ity 1) for any π , which implies (10.1) with π∗ = EEDD. Thus when the due datesare deterministic, the EDD rule remains optimal under stochastic order. But it wasunclear what would happen if the due dates are also stochastic. In the following wewill show that when the due dates Di are random variables, the optimal schedulewill depend on the processing times, and give the optimal solution under certainconditions.

First we state a lemma regarding a characterization of the likelihood-ratio order,which will play a crucial rule in the proof of our first theorem. It is a result fromTheorem 1.C.14 of Shaked and Shanthikumar (1994).

Lemma 10.1. Let X and Y be two independent random variables and φ1(u,v),φ2(u,v) be two bivariate real-valued functions. If X ≤lr Y and u ≤ v impliesφ1(u,v)≤ φ2(u,v) and φ1(u,v)+φ1(v,u)≤ φ2(u,v)+φ2(v,u), then

E[φ1(X ,Y )]≤ E[φ2(X ,Y )].

Remark 10.1. Brown and Solomon (1973) provided a lemma with regard to apairwise interchange of likelihood-ratio ordered distributions, which is anothercharacterization for the likelihood-ratio order and is similar to Theorem 1.C.13 of

Page 357: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

350 10 More Stochastic Scheduling Models

Shaked and Shanthikumar (1994). It is however insufficient to prove our results, asour objective functions involve random elements other than the processing times,such as the due dates and weights.

Theorem 10.1. Let P1, . . . ,Pn be independent random processing times and D1, . . . ,Dn be independent random due dates, independent of Pi. If the Pi can belikelihood-ratio ordered, the Di can be hazard-rate ordered, and the orders satisfythe following agreeability condition:

Pi ≤lr Pj ⇐⇒ Di ≤hr D j for all i, j ∈ 1, . . . ,n,

then the maximum lateness ML(π) is stochastically minimized by the sequence innondecreasing likelihood-ratio order of the processing times Pi, or equivalently,in nondecreasing hazard-rate order of the processing times Di (i.e., SEPT orEEDD).

Proof. By the independence of the processing times P1, . . . ,Pn and the due datesD1, . . . ,Dn, we have

Pr(ML(π)< x) = E[

Pr(ML(π)< x|P1, . . . ,Pn)]

= E[

Pr(

max1≤i≤n

(Ci(π)−Di)< x∣∣∣P1, . . . ,Pn

)]

= E[Pr(Ci(π)−Di < x, i = 1, . . . ,n

∣∣P1, . . . ,Pn)]

= E

[n

∏i=1

Pr(

Di >Ci(π)− x∣∣∣P1, . . . ,Pn

)]

= E

[n

∏i=1

Fi (Ci(π)− x)

]= E

[n

∏i=1

Fi (Ci(π)− x)

], (10.3)

where the expectation is with respect to Pi,

Fi(x) = Pr(Di > x) = exp−∫ x∨0

0λi(t)dt

, i = 1, . . . ,n, (10.4)

and λi(t) is the hazard rate function of Di.

For an arbitrary job sequence π = (. . . ,r,s, . . .), let π ′ = (. . . ,s,r, . . .) denote thesequence after interchanging two neighboring jobs r and s in π . Let C be the com-pletion time of the job sequenced just before r under π . Then we have

Cr(π) =C+Pr, Cs(π) =C+Pr +Ps,

Cs(π ′) =C+Ps, Cr(π ′) =C+Pr +Ps, and

Ci(π ′) =Ci(π) for i = r,s.

Page 358: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 351

Hence by (10.3),

Pr(ML(π) < x) = E [Fr (C+Pr − x) Fs (C+Pr +Ps− x)H(x)] (10.5)

and

Pr(ML(π ′)< x) = E [Fs (C+Ps− x) Fr (C+Ps+Pr − x)H(x)] , (10.6)

whereH(x) = H(x;Pr,Ps,Pi, i = r,s) = ∏

i=r,sFi(Ci(π)− x). (10.7)

Suppose Pr ≤lr Ps and Dr ≤hr Ds. Then λr(t) ≥ λs(t) for all t, which togetherwith (10.4) lead to

Fr(C+ u− x)Fr(C+ u+ v− x)

= exp−∫ (C+u−x)∨0

0λr(t)dt +

∫ (C+u+v−x)∨0

0λr(t)dt

= exp∫ (C+u+v−x)∨0

(C+u−x)∨0λr(t)dt

≥ exp

∫ (C+u+v−x)∨0

(C+u−x)∨0λs(t)dt

=Fs(C+ u− x)

Fs(C+ u+ v− x).

Hence

Fr(C+ u− x)Fs(C+ u+ v− x)≥ Fs(C+ u− x)Fr(C+ u+ v− x) (10.8)

and by the same arguments,

Fr(C+ v− x)Fs(C+ u+ v− x)≥ Fs(C+ v− x)Fr(C+ u+ v− x). (10.9)

As Fr(x) is a nonincreasing function, (10.9) also implies

Fr(C+ u− x)Fs(C+ u+ v− x)≥ Fs(C+ v− x)Fr(C+ u+ v− x) if u ≤ v. (10.10)

Given Pi = pi for i = r,s, and for fixed x, define

φ1(u,v) = Fs(C+ v− x)Fr(C+ u+ v− x)H(x;u,v,pi, i = r,s)

andφ2(u,v) = Fr(C+ u− x)Fs(C+ u+ v− x)H(x;u,v,pi, i = r,s).

Note that H(x;u,v,pi, i = r,s) = H(x;v,u,pi, i = r,s) since Pr and Ps are inter-changeable in (10.7). Hence (10.8) implies φ2(u,v) ≥ φ1(v,u) and (10.9) impliesφ2(v,u) ≥ φ1(u,v), consequently, φ2(u,v)+ φ2(v,u) ≥ φ1(u,v) + φ1(v,u). Further-more, (10.10) shows that φ2(u,v)≥ φ1(u,v) for u ≤ v. It then follows from Lemma3.1 that, conditional on Pi, i = r,s, Pr ≤lr Ps =⇒ E[φ1(Pr,Ps)] ≤ E[φ2(Pr,Ps)]. Thusby (10.5) and (10.6), Pr ≤lr Ps and Dr ≤hr Ds imply

Page 359: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

352 10 More Stochastic Scheduling Models

Pr(ML(π) < x) = E [E [Fr(C+Pr − x)Fs(C+Pr +Ps− x)H(x)|Pi, i = r,s]]= E

[E[φ2(Pr,Ps)|Pi, i = r,s]

]≥ E

[E[φ1(Pr,Ps)|Pi, i = r,s]

]

= E [E [Fs(C+Ps− x)Fr(C+Pr +Ps− x)H(x)|Pi, i = r,s]]= Pr(ML(π ′)< x) for all x ∈ (−∞,∞)

=⇒ ML(π)≤st ML(π ′).

This shows that an optimal solution to minimize ML(π) stochastically is given bySEPT (EEDD).

Remark 10.2. When Pi = pi are deterministic and Di follow a common distribu-tion, Eqs. (10.5) and (10.6) reduce to

Pr(ML(π) < x) = F(C+ pr − x)F(C+ pr + ps − x)H(x)

andPr(ML(π ′)< x) = F(C+ ps − x)Fr(C+ pr + ps− x)H(x).

Hence it is easy to see that ML(π) ≤st ML(π ′) ⇐⇒ pr ≤ ps. Thus unlike in thecase of deterministic due dates, the optimal solution does depend on the processingtimes when Di’s are random, even in this very special case. Therefore, the optimalsolution can no longer be given by any rule independent of Pi, such as the EEDDrule, without an agreeability condition.

When the due dates Di are exponentially distributed, we have the followingresult on the maximum weighted lateness.

Theorem 10.2. Suppose that the due dates D1, . . . ,Dn are exponentially distributedwith rates ν1, . . . ,νn respectively. If Pi and Wi can be likelihood-ratio orderedand and satisfy the following agreeability condition with Di:

Pi ≤lr Pj ⇐⇒ νi ≥ ν j ⇐⇒Wi ≥lr Wj for all i, j ∈ 1, . . . ,n,

then the maximum weighted lateness MW L(π) is stochastically minimized by EEDD,or equivalently SEPT, or the largest expected weight first rule.

Proof. Similar to (10.3) and (10.5)–(10.7), we get

Pr(MW L(π)< x) = E

[n

∏i=1

Fi (Ci(π)−Xi)

], (10.11)

where Xi = x/Wi and the expectation is with respect to Pi and Wi,

Pr(MW L(π)< x) = E [Fr (C+Pr −Xr) Fs (C+Pr +Ps−Xs)H(x)] (10.12)

Page 360: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 353

and

Pr(MW L(π ′)< x) = E [Fs (C+Ps−Xs) Fr (C+Ps+Pr −Xr)H(x)] , (10.13)

whereH(x) = H(x;Pr,Ps,Pi,Wi, i = r,s) = ∏

i=r,sFi(Ci(π)−Xi).

Suppose that Pr ≤lr Ps, νr ≥ νs and Wr ≥lr Ws. Let Pi = pi, i = r,s and Wi =wi, i = 1, . . . ,n be given, and for now suppose wr ≥ ws. Then Xi = xi = x/wi,i = 1, . . . ,n. Define

φ1(u,v) = Fs(C+ v− xs)Fr(C+ u+ v− xr)H(x;u,v,pi,wi, i = r,s)

and

φ2(u,v) = Fr(C+ u− xr)Fs(C+ u+ v− xs)H(x;u,v,pi,wi, i = r,s).

Let u≤ v. Since Di is exponential with rate νi, Fi(x) = exp−νi(x∨0), i = 1, . . . ,n.Thus φ1(u,v)≤ φ2(u,v) if

νr[(C+u+v−xr)∨0−(C+u−xr)∨0]≥ νs[(C+u+v−xs)∨0−(C+v−xs)∨0].(10.14)

First consider the case x ≥ 0. Then 0 ≤ xr = x/wr ≤ x/ws ≤ xs. If C + u ≥ xr andC+ v ≥ xs, then (10.13) becomes νrv ≥ νsu, which holds since u ≤ v and νr ≥ νs.If C + u ≥ xr and C + v < xs, then (10.13) becomes νrv ≥ νs(C + u+ v− xs)∨ 0,which also holds because C+ u+ v− xs < u in this case. If C+ u < xr and C+ v ≥xs, then (10.13) becomes νr(C + u+ v− xr)∨ 0 ≥ νsu, which again holds becauseC+u+v−xr ≥C+u+v−xs > u. Finally, as xr ≤ xs and νr ≥ νs imply, νr(C+u+v−xr)∨0≥ νs(C+u+v−xs)∨0, (10.13) still holds when C+u< xr and C+v< xs.In summary, (10.13) holds for x ≥ 0. When x < 0, (10.13) reduces to νrv ≥ νsu andhence holds as well. Therefore, we have shown that u ≤ v =⇒ φ1(u,v) ≤ φ2(u,v).So given Pi = pi, i = r,s and Wi = wi with wr ≥ ws, it follows from Lemma 3.1that E[φ1(Pr,Ps)]≤ E[φ2(Pr,Ps)] when Pr ≤lr Ps.

By similar (and in fact simpler) arguments, we can show that φ1(u,v)≤ φ2(v,u)and φ1(v,u)≤ φ2(u,v), so that φ1(u,v)+φ1(v,u)≤ φ2(u,v)+φ2(v,u). Thus accord-ing to Lemma 3.1, conditional on Pi = pi, i = r,s, E[φ1(Pr,Ps)] ≤ E[φ2(Pr,Ps)]when Pr ≤lr Ps. Consequently, by (10.11) and (10.12), conditional on Wr = wr ≥ws =Ws,

Pr(MW L(π)< x) = E[E[Fr(C+Pr − xr)Fs(C+Pr +Ps− xs)H(x)|Pi,Wii=r,s]

]

= E[E[φ2(Pr,Ps)|Pi,Wii=r,s]

]≥ E

[E[φ1(Pr,Ps)|Pi,Wii=r,s]

]

= E[E[Fs(C+Ps− xs)Fr(C+Pr +Ps− xr)H(x)|Pi,Wii=r,s]

]

= Pr(ML(π ′)< x) for all x ∈ (−∞,∞). (10.15)

Page 361: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

354 10 More Stochastic Scheduling Models

Next, define

ψ1(u,v) = Pr(MW L(π ′)< x|Ws = u,Wr = v)

and

ψ2(u,v) = Pr(MW L(π)< x|Ws = u,Wr = v).

Then (10.14) shows that ψ1(u,v)≤ ψ2(u,v) whenever u ≤ v. It can also be checkedthat ψ1(u,v)+ψ1(v,u) ≤ ψ2(u,v)+ψ2(v,u) for u ≤ v. Thus applying Lemma 3.1again gives

Ws ≤lr Wr =⇒ E[ψ1(Ws,Wr)]≤ E[ψ2(Ws,Wr)].

To summarize, we have shown that Pr ≤lr Ps, νr ≥ νs and Wr ≥lr Ws imply

Pr(MW L(π)< x) = E[Pr(MW L(π)< x|Ws,Wr)] = E[ψ2(Ws,Wr)]

≥ E[ψ1(Ws,Wr)] = E[Pr(MW L(π ′)< x|Ws,Wr)]

= Pr(MW L(π ′)< x) for all x ∈ (−∞,∞).

It follows that an optimal solution to minimize MW L(π) stochastically is given bythe sequence in nondecreasing likelihood-ratio order of Pi, or equivalently, bySEPT, EEDD, or the largest mean weight first rule.

10.1.3 Optimal Solutions with Exponential Processing Timesand Due Dates

In this subsection, we show that when both Pi and Di are exponentially dis-tributed, the agreeability condition in Theorem 10.1 can be substantially relaxed inorder to minimize the maximum lateness ML(π) stochastically.

To begin, by (10.5) and (10.6), we can write

Pr(ML(π)< x) = E [Pr(ML(π)< x|P1, . . . ,Pn)]

= E[Fr(C+Pr − x)Fs(C+Pr +Ps− x)H(x)

]

= E[E[Fr(Pr − a)Fs(Pr +Ps− a)H(x)

∣∣Pi, i = r,s]], (10.16)

where H(x) is given by (10.7), which depends on P1, . . . ,Pn. Similarly,

Pr(ML(π ′)< x) = E[E[Fs(C+Pr − x)Fr(C+Pr +Ps− x)H(x)

∣∣Pi, i = r,s]].

(10.17)

Let B = B(π) and A = A(π) denote the sets of jobs scheduled respectively beforeand after jobs r,s under π . Write H(x) = λ1(x)λ2(x), where

Page 362: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 355

λ1(x) = ∏j∈B

Fj(Cj(π)− x) and λ2(x) = ∏j∈A

Fj(Cj(π)− x).

Note that λ1(x) is independent of Pr and Ps, as Cj(π) only involves those jobs se-quenced before jobs r and s. But λ2(x) still depends on Pr and Ps. Thus by (10.15)and (10.16), Pr(ML(π)< x)≥ Pr(ML(π ′)< x) holds if

E[Fr(C+Pr − x)Fs(C+Pr +Ps− x)λ2(x)

∣∣Pi, i = r,s]

≥ E[Fs(C+Pr − x)Fr(C+Pr +Ps− x)λ2(x)

∣∣Pi, i = r,s]

(10.18)

for every instance of Pi, i = r,s.

Define A j = x−Cj(π)+Pr +Ps for j ∈ A. Then, as Cj(π)−Pr −Ps representsthe sum of the processing times over jobs up to job j under π , excluding r and s,A j is independent of Pr, Ps and A j < x−C. Given Pi = pi, i = r,s, A j = a j anda = x−C are fixed, with a j < a. We can now write

λ2(x) = ∏j∈A

Fj(Cj − x) = ∏j∈A

Fj(Pr +Ps− a j) (a j < a = x−C). (10.19)

Thus (10.17) holds if

E[

Fr(Pr − a)Fs(Pr +Ps− a)∏j∈A

Fj(Pr +Ps− a j)

]

≥ E[

Fs(Pr − a)Fr(Pr +Ps− a)∏j∈A

Fj(Pr +Ps− a j)

]when a j < a. (10.20)

When Pi and Di are exponentially distributed, we have the following result.

Theorem 10.3. Suppose that P1, . . . ,Pn are independent and exponentially distributedwith rates µ1, . . . ,µn respectively, D1, . . . ,Dn are independent and exponentiallydistributed with rates ν1, . . . ,νn respectively, and Pi are independent of Di.Let ν(1) ≤ · · ·≤ ν(n) denote the ordered values of ν1, . . . ,νn. If νi and µi satisfythe following condition:

the sequence νi(νi + µi), i = 1, . . . ,n has the same order as

the sequence νi(νi + µi +A0), i = 1, . . . ,n for some A0 ≥n

∑i=3

ν(i), (10.21)

then ML(π) is stochastically minimized by the sequence in the nonincreasing orderof νi (νi + µi).

Proof. As Di is exponential with rate νi, we can write Fi(x) = Pr(Di > x) as

Fi(x) = 1x<0+ e−νix1x≥0, i = 1, . . . ,n, (10.22)

Page 363: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

356 10 More Stochastic Scheduling Models

where 1E is the indicator of an event E which takes value 1 if E occurs and 0otherwise.

Let π = (. . . ,r,s, . . .) and π ′ = (. . . ,s,r, . . .). By (10.20) we have

Fr(Pr − a)Fs (Pr +Ps− a)

=[1Pr<a+ e−νr(Pr−a)1Pr≥a

][1Pr+Ps<a+ e−νs(Pr+Ps−a)1Pr+Ps≥a

]

= 1Pr+Ps<a+ e−νs(Pr+Ps−a)1Pr<a≤Pr+Ps+ e−(νr+νs)(Pr−a)−νsPs 1Pr≥a

and Fj(Pr +Ps−a j) = 1Pr+Ps<a j+e−ν j(Pr+Ps−a j)1Pr+Ps≥a j, j ∈ A. It follows that

Fr(Pr − a)Fs(Pr +Ps− a)∏j∈A

Fj(Pr +Ps− a j)

= ∏j∈A

Fj(Pr +Ps− a j)1Pr+Ps<a+ e−(νs+νA)(Pr+Ps)+νsa+(νa)A1Pr<a≤Pr+Ps

+ e−(νr+νs+νA)Pr−(νs+νA)Pse(νr+νs)a+(νa)A1Pr≥a, (10.23)

where νA = ∑ j∈A ν j and (νa)A = ∑ j∈A ν ja j. Similarly,

Fs(Pr − a)Fr(Pr +Ps− a)∏j∈A

Fj(Pr +Ps− a j)

= ∏j∈A

Fj(Pr +Ps− a j)1Pr+Ps<a+ e−(νr+νA)(Pr+Ps)+νra+(νa)A1Ps<a≤Pr+Ps

+ e−(νr+νs+νA)Ps−(νr+νA)Pr e(νr+νs)a+(νa)A1Ps≥a. (10.24)

Let

E1 = E[e−(νs+νA)(Pr+Ps)+νsa1Pr<a≤Pr+Ps

],

E2 = E[e−(νr+νs+νA)Pr−(νs+νA)Ps+(νr+νs)a1Pr≥a

],

E ′1 = E

[e−(νr+νA)(Pr+Ps)+νra1Ps<a≤Pr+Ps

],

E ′2 = E

[e−(νr+νs+νA)Ps−(νr+νA)Pr+(νr+νs)a1Ps≥a

],

where the expectations are with respect to Pr and Ps only, and a is a fixed real value.Then by (10.21) and (10.22), we can see that (10.18) holds if

E1 +E2 −E ′1 −E ′

2 ≥ 0 for all a ∈ (−∞,∞). (10.25)

First consider the case of a ≥ 0. Define

I (a; µr,µs) =

⎧⎨

e−µsa − e−µra

µr − µsif µr = µs,

ae−µsa if µr = µs,

Page 364: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 357

(note that I(a; µr,µs) = I(a; µs,µr)≥ 0 for any a ≥ 0). Then

E1 = E[e−(νs+νA)(Pr+Ps)+νsa1Pr<a≤Pr+Ps

]

=∫ ∫

u<a≤u+ve−(νs+νA)(u+v)+νsaµre−µruµse−µsvdudv

= µrµseνsa∫ a

0

(∫ ∞

a−ue−(νs+µs+νA)vdv

)e−(νs+µr+νA)udu

=µrµseνsa

νs + µs +νA

∫ a

0e−(νs+µs+νA)(a−u)e−(νs+µr+νA)udu

=µrµse−(µs+νA)a

νs + µs +νA

∫ a

0e−(µr−µs)udu =

µrµse−νAa

νs + µs +νAI(a; µr,µs). (10.26)

By the same argument we obtain

E ′1 =

µrµse−νAa

νs + µs +νAI(a; µr,µs). (10.27)

Furthermore,

E2 = µrµse(νr+νs)a∫ ∞

ae−(νr+νs+µr+νA)udu

∫ ∞

0e−(νs+µs+νA)vdu

=µrµse−(µr+νA)a

(νr +νs + µr +νA) (νs + µs+νA)

and similarly,

E ′2 =

µrµse−(µs+νA)a

(νr +νs + µs+νA) (νr + µr +νA).

Therefore, write e−(µr+νA)a = e−νAa(e−µra − e−µsa)+ e−(µs+νA)a, we get

E2 −E ′2 =

µrµse−νAa(e−µra − e−µsa)

(νr +νs + µr +νA)(νs + µs +νA)+

µrµse−(µs+νA)a

(νr +νs + µr +νA)(νs + µs+νA)

− µrµse−(µs+νA)a

(νr +νs+ µs +νA)(νr + µr +νA)

=µrµs(µs − µr)e−νAa

(νr +νs + µr +νA)(νs + µs +νA)I(a; µr,µs)

+µrµs [νr(νr + µr +νA)−νs(νs + µs+νA)]e−(µs+νA)a

(νr +νs+ µr +νA)(νs + µs +νA)(νr +νs + µs +νA)(νr + µr +νA).

Page 365: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

358 10 More Stochastic Scheduling Models

Note that

1(νs + µs+νA)

− 1(νr + µr +νA)

+1

(νr +νs + µr +νA)(νs + µs +νA)

=νr(νr + µr +νA)−νs(νs + µs+νA)

(νr + µr +νA) (νs + µs+νA) (νr +νs + µr +νA). (10.28)

Combining (10.24)–(10.26) with (10.28) we get

E1 +E2 −E ′1 −E ′

2

=[νr(νr + µr +νA)−νs(νs + µs +νA)]e−νAa

(νr + µr +νA)(νs + µs+νA)(νr +νs + µr +νA)I(a; µr,µs)

+µrµs [νr(νr + µr +νA)−νs(νs + µs +νA)]e−(µs+νA)a

(νr +νs + µr +νA)(νs + µs+νA)(νr +νs + µs+νA)(νr + µr +νA).

Thus (10.23) holds for a ≥ 0, and so does (10.18), if

νr(νr + µr +A)≥ νs(νs + µs+A) for all 0 ≤ A ≤ ∑i=r,s

νi. (10.29)

If a < 0, then

Fr(Pr − a)Fs(Pr +Ps− a)λ2(x) = E−(νr+νs+νA)Pr−(νs+νA)Ps e(νr+νs)a+(νa)A ,

Fs(Ps − a)Fr(Pr +Ps− a)λ2(x) = E−(νr+νs+νA)Ps−(νr+νA)Pr e(νr+νs)a+(νa)A .

Hence (10.18) holds if

0 ≤ E[e−(νr+νs+νA)Pr−(νs+νA)Ps

]−E

[e−(νr+νs+νA)Ps−(νr+νA)Pr

]

=µrµs

(νr +νs + µr +νA)(νs + µs+νA)− µrµs

(νr +νs + µs +νA)(νr + µr +νA)

=µrµs [νr(νr + µr +νA)−νs(νs + µs+νA)]

(νr +νs + µr +νA)(νs + µs+νA)(νr +νs + µs+νA)(νr + µr +νA),

which is implied by (10.27).

Now, under condition (10.19), if νr(νr + µr)≥ νs(νs + µs), then

νr(νr + µr +A0)≥ νs(νs + µs+A0) for some A0 ≥ ∑i≥3

ν(i).

If νr ≥ νs, then

νr(νr+µr+A) = νr(νr+µr)+νrA≥ νs(νs+µs)+νsA= νs(νs+µs+A) ∀A≥ 0.

If νr < νs, then for any 0 ≤ A ≤ ∑i=r,s νi ≤ ∑i≥3 ν(i) ≤ A0,

Page 366: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.1 Optimization Under Stochastic Order 359

νr(νr + µr +A) = νr(νr + µr +A0)−νr(A0 −A)

≥ νs(νs + µs+A0)−νs(A0 −A) = νs(νs + µs +A).

In either case we see that (10.27) holds when νr(νr + µr) ≥ νs(νs + µs). Thusνr(νr +µr)≥ νs(νs +µs) implies Pr(ML(π)< x)≥ Pr(ML(π ′)< x) for all x undercondition (10.19). It follows that ML(π) is stochastically minimized by the sequencein nonincreasing order of νi(νi + µi).

The following two corollaries are straightforward consequences of Theorem 10.3.

Corollary 10.1. If νi and µi satisfy the condition:

νi(νi + µi)≥ ν j(ν j + µ j)⇐⇒ νi ≥ ν j for all i, j ∈ 1, . . . ,n,

then ML(π) is stochastically minimized by the sequence in nonincreasing order ofνi. In other words, the EEDD rule is optimal.

Corollary 10.2. If νi and µi satisfy the condition:

νi ≥ ν j ⇐⇒ νi + µi ≥ ν j + µ j for all i, j ∈ 1, . . . ,n,

then ML(π) is stochastically minimized by the sequence in nonincreasing orderof νi.

The next corollary shows that the EEDD rule is optimal under a different typeof sufficient conditions: the variations among the rates of processing times aredominated by those among the due dates.

Corollary 10.3. If νi and µi satisfy following condition:∣∣µi − µ j

∣∣≤∣∣νi −ν j

∣∣ for all i, j ∈ 1, . . . ,n, (10.30)

then ML(π) is stochastically minimized by the sequence in nonincreasing orderof νi.

Proof. Let νi ≥ ν j. Then by condition (10.30), for any A ≥ 0,

ν j(µ j − µi)≤ ν j|µi − µ j|≤ (νi +ν j +A)(νi −ν j)+ µi(νi −ν j)

=⇒ (νi +ν j +A)(νi −ν j)+ µi(νi −ν j)−ν j(µ j − µi)≥ 0

=⇒ (νi +ν j +A)(νi −ν j)+νiµi −ν jµ j ≥ 0

=⇒ ν2i +νiν j +νiA−νiν j −ν2

j −ν jA+νiµi −ν jµ j ≥ 0

=⇒ νi(νi + µi +A)−ν j(ν j + µ j +A)≥ 0.

Thus the sequence νi(νi + µi +A) has the same order as νi for any A ≥ 0. Theconclusion of Corollary 10.3 then follows immediately from Theorem 10.3.

Page 367: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

360 10 More Stochastic Scheduling Models

Remark 10.3. If µi and νi have the same order, then the result of Theorem 10.3becomes a special case of Theorem 10.1 or 10.2. The condition of Theorem 10.3,however, is substantially weaker than the same order between µi and νi. As asimple example, let µ1 = 1,µ2 = 2, . . . ,µn = n and ν1 = n,ν2 = n− 1, . . . ,νn = 1,which are in totally opposite order. Then νi(νi + µi +A) = νi(n+A) has thesame order as νi for any A ≥ 0. Hence according to Theorem 10.3, the optimalsequence is in nonincreasing order of νi, despite the opposite orders between µiand νi.

Moreover, in condition (10.19), the rates νi of the due dates tend to have muchgreater influence in the determination of optimal sequence than the rates µi ofthe processing times, as demonstrated by the above example and Corollary 10.3.Intuitively, this may be viewed as close to the case of deterministic due dates, wherethe due dates determine the optimal sequence, while the processing times play norole.

10.2 Team-Work Task Scheduling

10.2.1 Team-Work Tasks

The majority of the scheduling literature assumes that each task (job) can onlybe processed by one processor at a time. With the rapid development of moderntechnologies, this assumption is no longer valid in many realistic industry settingsin which a task may require the simultaneous processing of a group of processors.An example is parallel computing systems, where a task may be executed by anumber of processors simultaneously. As a result, multi-processor task schedul-ing (MPTS) has emerged as an active line of research in recent years. See, e.g.,Krawczyk and Kubale (1985), Blazewicz et al. (1993, 1994), Drozdowski (1996),Amoura et al. (2002), Bianco et al. (1995), and Cai et al.(1998, 2000). Generally,two groups of models have been considered in the MPTS studies. The first groupof models address the situations where a number of processors are to be assignedto process a task, assuming that each task can be processed on any set of proces-sors with the same number of processors and the processing time of the task isdetermined by the number of processors only. The second group of models considerthe situations where a set of processors are to be assigned to each task, where theprocessing speed for a task depends not only on how many processors, but also onwhich processors processing the task.

There is, however, a common assumption in the MPTS literature, that all proces-sors start and finish at the same time when they process a same task. In other words,if a processor has completed its work on a component of a task, it has to remain idleto wait until all other components of the same task are completed before it can workon the next task. In the real world, however, this restriction is not always necessary.

Page 368: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.2 Team-Work Task Scheduling 361

In many problems, one can find that a task requires, in fact, a team of processorswhich work as follows: Each processor in the team is needed to process a specificcomponent of the task, but each processor can join or leave the team as and whenneeded. In other words, the processors assigned to process a task neither have tostart, nor have to finish, at the same time with respect to a same task. For ease ofreference, we refer to such scheduling problems as team-work task scheduling. Thissection studies this class of scheduling problems.

Team-work task scheduling can be seen in situations such as parallel processingand computing, production, and service industries. Consider a problem in reliabilitytest. To estimate the mean time to failure (MTTF) of a certain type of product,usually a number of the same product are tested on a set of machines in parallel.The settings of the testing machines may be different, in order to test the reliabilityunder different conditions and environments. A product will remain on a machineuntil it experiences a failure. Once the failure times of the product on all machinesare recorded, the MTTF of the product can be estimated, using statistical theory(Kuo et al. 1998). Therefore, the completion time of the testing for a product equalsthe latest failure time of the product on all testing machines. This is a problem thatrequires a task (product) to be processed by a set of processors (machines), wherethe starting and completing times of the testing operation on any processor do notdepend on other processors. Besides, a processor can proceed to another testing taskas soon as it finishes the current one. This is a team-task scheduling problem, if anumber of products are to be tested and the objective is to optimize the use of thetesting facilities, through minimization, e.g., the sum of the completion times ofthe testing tasks. Another problem, which relates to reliability but has a differentconcern, is to achieve a high level of reliability in computer control systems, byexecuting redundant copies of a program on different processors and voting on thefinal control decision (Gehringer et al. 1987; Hopkins et al. 1978). If a numberof such programs are to be run on a set of parallel processors, then the problembecomes a team-task scheduling problem too.

This section reports some studies on team-work task scheduling problems. Bothdeterministic and stochastic models are addressed. Deterministic team-task schedul-ing models are interesting for their own sake, which we introduce in Sect. 10.2.2;Besides, some results on the deterministic models are valuable and useful for thestudies of stochastic models, which we present in Sect. 10.2.3. The expositions ofthis section are mainly based on Cai and Zhou (2004).

10.2.2 The Deterministic Model

Suppose that there are a set of tasks: T = T1,T2, . . . ,Tn, which are to be processedby a set of processors: P = P1,P2, . . . ,Pm. Each task Ti consists of m componentsti( j), j = 1, . . . ,m, and the component ti( j) of Ti is to be processed by processor

Page 369: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

362 10 More Stochastic Scheduling Models

Pj, which requires a processing time pi( j) ≥ 0. Note that by allowing pi( j) = 0 forsome j, a task may actually consist of less than m components, and thus requiresonly a subset of P to process. Let Si = j : pi( j) > 0. Then S j is the actual set ofprocessors required to process task Ti.

Furthermore, we make the following basic assumptions:

• Job preemption is not allowed; that is, as soon as a component is started on aprocessor, it has to be processed until completion without being interrupted byany other task.

• The component ti( j) can only be processed by processor Pj, with a positiveprocessing time pi( j), for j ∈ Si. In other words, the processors P1,P2, . . . ,Pmcannot substitute each other.

• For any task Ti, its components ti( j), j ∈ Si, can be processed either simultane-ously, or at different times.

• The processing of each component of a task is independent of other componentsof the task. Hence each component can be processed at any time without havingto wait for the completion of any other component(s).

• A task Ti is completed if and only if all components of Ti have been completed.In other words, if component ti( j) is completed at time ci( j), j ∈ Si, then thecompletion time of Ti is

Ci = maxci( j) : j ∈ Si.

Let π j = (i1( j), . . . , in( j)) be a permutation of positive integers 1,2, . . . ,n,which determines the order of processing the components t1( j), t2( j), . . . , tn( j)on processor Pj, with ik( j) = l if and only if tl( j) is the k-th to be processed by Pj,j = 1, . . . ,m. Then the vector of permutations π = (π1,π2, . . . ,πm) determines theorder of all components of T1, . . . ,Tn to be processed by P1, . . . ,Pm. As definedin Sect. 1.3.3, π specifies a static policy.

Each task Ti is assigned a function cost fi(Ci), depending on its completion timeCi, where fi(t) is a general nondecreasing function defined on [0,∞).

We consider the problem to minimize the general cost of all tasks of the followingform:

GC(π) = F( f1(C1(π), f2(C2(π), . . . , fn(Cn(π))) (10.31)

with respect to policy π , where F(x1,x2, . . . ,xn) is a real-valued function definedon ℜn and is nondecreasing in each coordinate xi, i = 1, . . . ,n. Typical cases ofthe function F commonly seen in the literature include F(x1, . . . ,xn) = max1≤i≤n xi,which leads to the maximum cost:

MC(π) = max1≤i≤n

fi(Ci(π)); (10.32)

Page 370: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.2 Team-Work Task Scheduling 363

and F(x1, . . . ,xn) = ∑ni=1 xi, resulting in the total cost:

TC(π) =n

∑i=1

fi(Ci(π)). (10.33)

Note that by the structure of the general cost GC(π) in (10.31), its minimiza-tion does not require inserting any idle time on a processor before processing anycomponent. Hence it suffices to minimize GC(π) with respect to policy π withoutidle times. Furthermore, each π j can be an arbitrary permutation of 1, . . . ,n andso π1, . . . ,πm need not be the same, meaning that we allow different orders to pro-cess the components on different processors. But the following theorem shows thatwe only need consider those policies with π1 = π2 = · · · = πm. As a result, we cantake a policy π as a single permutation. This reduces the number of policies to beconsidered from (n!)m to n!.

Theorem 10.4. There exists an optimal policy π = (π1,π2, . . . ,πm) to minimizeGC(π) such that π1 = π2 = · · ·= πm.

Proof. Let π = (π1, . . . ,πm) be an arbitrary policy. We will show that if underπ , component tr(1) is processed before ts(1) on processor P1, whereas tr(2) isprocessed after ts(2) on P2 (need not be consecutive in either case), then we canfind another policy π ′ = (π ′

1, . . . ,π ′m) such that Ci(π ′) ≤ Ci(π) for all i = 1, . . . ,n,

and tr(1), ts(1) are processed on P1 in the same order as tr(2), ts(2) on P2. Thetheorem then follows immediately.

Let π1 = (i1(1), . . . , in(1)) and π2 =(i1(2), . . . , in(2)). Then there are integers k, kand l, l such that 1 ≤ k < k ≤ n, 1 ≤ l < l ≤ n. Denote ik(1) = r, ik(1) = s, il(2) = sand il(2) = r, so that we have

π1 = (. . . , r, . . . , s, . . . ),k k

π2 = (. . . , s, . . . , r, . . . )l l

(10.34)

If pr(1) = 0, then any movement of tr(1) on P1 has no effects on the completiontime of any component. In particular, we can move tr(1) to any position after ts(1)without changing the completion times at all. Let π ′ denote the policy after such achange from π . Then Ci(π ′) =Ci(π) for all i, and under π ′

1, the order of tr(1), ts(1)on P1 is the same as that of tr(2), ts(2) on P2 (both with ts before tr). Similarly, wecan handle the case with some of pr(2), ps(1) and ps(2) being zero.

Now turn to the situation of positive pr(1), pr(2), ps(1) and ps(2). We treatthe following two cases separately, where ci( j,π) denotes the completion time ofcomponent ti( j) under π .

Case 1. cs(1,π) ≤ cr(2,π). In this case, on P1 we move tr(1) to the kth positionand each component after tr(1) up to ts(1) one step forward. Then the π1 in (10.34)changes to

Page 371: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

364 10 More Stochastic Scheduling Models

π ′1 = (. . . , s, r, . . . )

k− 1 k

More precisely, we define π ′1 = (i′1(1), . . . , i

′n(1)) with

i′α(1) =

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

iα(1), α = 1, . . . ,k− 1, k+ 1, . . . ,n;iα+1(1), α = k, . . . , k− 2;s, α = k− 1;r, α = k

Also define π ′2 = π2, . . . ,π ′

m = πm. Then we have cr(1,π ′) = cs(1,π) and cr(2,π ′) =cr(2,π). But as cs(1,π)≤ cr(2,π), we get

maxcr(1,π ′),cr(2,π ′) ≤ cr(2,π)≤ maxcr(1,π),cr(2,π).

Similarly, since cs(1,π ′)< cr(1,π ′) = cs(1,π) and cs(2,π ′) = cs(2,π), we have

maxcs(1,π ′),cs(2,π ′)≤ maxcs(1,π),cs(2,π).

Furthermore, since the components tα(1) between tr(1) and ts(1) under π are movedforward under π ′, we have ci(1,π ′) < ci(1,π) for these components. All other ti( j)are in the same positions under π ′ as they are under π , thus ci( j,π ′) = ci( j,π) forthese components. Putting these together we get

Ci(π ′) = maxci( j,π ′) : j ∈ Si≤ maxci( j,π) : j ∈ Si=Ci(π) (10.35)

for all i = 1, . . . ,n, and the order of tr(1), ts(1) on P1 is the same as that of tr(2),ts(2) on P2 (both with ts before tr).

Case 2. cs(1,π)> cr(2,π). Then we change π2 to π ′2 in the same way as we did in

changing π1 to π ′1 in Case 1. That is, we define π ′

2 = (i′1(2), . . . , i′n(2)) with

i′α(2) =

⎧⎪⎪⎪⎨

⎪⎪⎪⎩

iα(2), α = 1, . . . , l − 1, l+ 1, . . . ,n;iα+1(2), α = l, . . . , l − 2;r, α = l − 1;s, α = l,

so thatπ ′

2 = (. . . , r, s, . . . )l− 1 l

and π ′1 = π1,π ′

3 = π3, . . . ,π ′m = πm. Then similar arguments as in Case 1 lead to

cs(2,π ′) = cr(2,π)< cs(1,π), cs(1,π ′) = cs(1,π),

Page 372: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.2 Team-Work Task Scheduling 365

hence maxcs(1,π ′),cs(2,π ′) ≤ cs(1,π) ≤ maxcs(1,π),cs(2,π). Furthermore,cr(2,π ′) < cs(2,π ′) = cr(2,π), cr(1,π ′) = cr(1,π), and ci( j,π ′) ≤ ci( j,π) for allother ti( j). Thus (10.35) holds again. Also, on both P1 and P2, tr is processed be-fore ts. This completes the proof.

Due to Theorem 10.4, one will only need to consider policy π with π1 = · · ·= πmfor the deterministic team-work task scheduling problem. In the remainder of thissubsection, we will concentrate on two kinds of objectives: The minimization ofmaximum cost MC(π) in (10.32), and the minimization of the total cost TC(π)in (10.33). Both measures are covered by the general function GC(π), as we haveindicated earlier.

We first consider the maximum cost MC(π). In the case where the cost functionscan be ordered, a simple optimal sequence to minimize MC(π) exists and is givenin the following theorem.

Theorem 10.5. If the cost functions fi(t) can be ordered such that f1(t)≥ f2(t)≥· · ·≥ fn(t) for all t ≥ 0, then π∗ = (1,2, . . . ,n) is an optimal policy. In other words,a policy to sequence the tasks in the nonincreasing order of fi(t) : i = 1, . . . ,n oneach and every processor is optimal to minimize the MC(π).

Proof. Given any policy π = (. . . ,r,s, . . . ) with task Tr immediately preceding Ts,consider the policy after interchanging Tr and Ts: π ′ = (. . . ,s,r, . . . ). Suppose thatfr(t)≤ fs(t) for all t ≥ 0. We will show that MC(π ′)≤ MC(π).

Let C( j) denote the completion time of the component sequenced just beforetr( j) under π (or equivalently, just before ts( j) under π ′). Then we have

Cr(π) = maxC( j)+ pr( j) : j ∈ Sr, (10.36)

Cs(π) = maxC( j)+ pr( j)+ ps( j) : j ∈ Ss, (10.37)

Cs(π ′) = maxC( j)+ ps( j) : j ∈ Ss, (10.38)

Cr(π ′) = maxC( j)+ ps( j)+ pr( j) : j ∈ Sr, (10.39)

andCi(π ′) =Ci(π) for i = r,s. (10.40)

By (10.39) there exists a j∗ ∈ Sr such that Cr(π ′) = C( j∗) + ps( j∗) + pr( j∗). Ifj∗ ∈ Ss, then

Cr(π ′) =C( j∗)+ ps( j∗)+ pr( j∗)≤ maxC( j)+ pr( j)+ ps( j) : j ∈ Ss=Cs(π)

and so fr(t)≤ fs(t) (t ≥ 0) implies fr(Cr(π ′))≤ fs(Cr(π ′))≤ fs(Cs(π))≤ MC(π).If j∗ /∈ Ss, then ps( j∗) = 0 and, as j∗ ∈ Sr, by (10.36) we get

Cr(π ′) =C( j∗)+ pr( j∗)≤ maxC( j)+ pr( j) : j ∈ Sr=Cr(π).

Page 373: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

366 10 More Stochastic Scheduling Models

It follows that fr(Cr(π ′)) ≤ fr(Cr(π)) ≤ MC(π) in any case. Furthermore, it isclear from (10.37) to (10.38) that Cs(π ′) ≤Cs(π) so that fs(Cs(π ′)) ≤ fs(Cs(π)) ≤MC(π). These together with (10.40) show that fi(Ci(π ′))≤ MC(π) for i = 1, . . . ,n,hence MC(π ′) ≤ MC(π). Thus we have shown that MC(π ′) ≤ MC(π) if fr(t) ≤fs(t) for all t ≥ 0. The theorem then follows immediately.

Theorem 10.5 can be easily applied to a number of problems, such as minimiza-tion of the maximum weighted completion time, the maximum lateness to a due date,and the maximum weighted tardiness, etc. Take the maximum lateness for example.

Maximum Lateness (ML):

ML(π) = max1≤i≤n

(Ci(π)− di),

where di is the due date of task Ti. Take fi(t) = t − di. Then MC(π) is equal toML(π), and f1(t)≥ f2(t)≥ · · ·≥ fn(t) for all t ≥ 0 if and only if d1 ≤ d2 ≤ · · ·≤ dn.Hence an application of Theorem 10.5 shows that an optimal policy to minimize MLis given by the sequence in the nondecreasing order of the due dates di,1 ≤ i ≤ n.This result is the Earliest Due Date (EDD) rule.

Remark 10.4.

(i) In the case of the maximum cost, since a nondecreasing fi(t) implies

fi(Ci) = fi

(max

1≤ j≤mci( j)

)= max

1≤ j≤mfi(ci( j)),

we have MC(π) = max1≤ j≤m

max1≤i≤n

fi(ci( j)). Therefore, in order to minimize MC(π),

one can minimize max1≤i≤n

fi(ci( j)) separately for each processor j and then obtain

an overall optimal policy π⋄. This approach, however, is less efficient than thatbased on Theorem 10.4 (see (ii) below).

(ii) Note that the optimal policy π⋄ of (i) above may consist of different tasksequences on different processors, whereas Theorem 10.4 ensures that all pro-cessors have a same optimal sequence. When the conditions of Theorem 10.5are met, the optimal policy π⋄ obtained by individual processor minimizationcoincides with the one given by Theorem 10.5.

We next consider the total cost TC(π) in (10.33), with focus on the totalcompletion time TCT (π) = ∑Ci(π). In the single-processor case, it has been well-known that TCT (π) = ∑n

i=1 Ci(π) is minimized by the shortest processing time(SPT) rule. This rule, however, can only be extended to team-work case under somespecial circumstances, such as same ordering of processing times on all processors,or the existence of a dominant component, as stated in the next theorem.

Page 374: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

10.2 Team-Work Task Scheduling 367

Theorem 10.6.

(a) If p1( j)≤ p2( j)≤ · · ·≤ pn( j), for all j = 1, . . . ,m, then π∗ = (1,2, . . . ,n) is anoptimal policy to minimize TCT (π) = ∑n

i=1 Ci(π).

(b) If there exists a dominant component j such that pi( j) ≥ pi(k), for all i =1, . . . ,n and k = j, then an optimal policy to minimize TCT (π) = ∑n

i=1 Ci(π)is given by processing tasks in the nondecreasing order of pi( j), i = 1, . . . ,n(the processing times of the dominant component).

Proof. Part (a) follows from (10.36) to (10.40) together with a straightforwardargument of pairwise task interchange, similar to that for the single-processor case.For Part (b), note that Ci = ci( j), i = 1, . . . ,n, hence the problem is equivalent to thesingle-processor problem with processing times pi( j), i = 1, . . . ,n, and the resultof Part (b) then follows immediately.

10.2.3 The Stochastic Model

Now we turn to stochastic versions of the problem. We consider the situations where either the processing times {pi(j) : i = 1,...,n; j = 1,...,m} are random variables, or the cost functions of the tasks Ti, {fi(t), t ≥ 0}, i = 1,2,...,n, are stochastic processes, or both. Each fi(t) is nondecreasing in t ≥ 0 almost surely (with probability 1). Note that the cost function may become a stochastic process if it involves a random parameter. Examples include, among others, the case with fi(t) = t − di, where di is a random due date for task i, and the case with fi(t) = wi t, where wi is the unit capital holding cost of task i at time t, which fluctuates randomly over time due to changes in the interest rate.

All other settings regarding the problem are the same as those in Sect. 10.2.2. However, as GC(π) is now a random variable, its minimization requires a stochastic interpretation. We will consider the following three types of minimization criteria.

Stochastic Criterion A: Almost sure minimization. Under this criterion we aim to find an optimal policy π∗ such that GC(π∗) = minπ GC(π) almost surely (or with probability 1), in the sense that Pr{GC(π∗) ≤ GC(π)} = 1 for any policy π.

Stochastic Criterion B: Stochastic order minimization. A policy π∗ is optimal under Criterion B if and only if GC(π∗) ≤st GC(π) for any policy π.


Stochastic Criterion C: Expected cost minimization. A policy π∗ is optimal under Criterion C if and only if E[GC(π∗)] ≤ E[GC(π)] for any policy π, where E[X] represents the expectation (or mean) of a random variable X.

Remark 10.5.

(i) It is easy to see that Criterion A is stronger than Criterion B, which is in turn stronger than Criterion C. Under Criterion A or B, however, an optimal solution may not exist (cf. Example 10.1 below), whereas it always exists under Criterion C.

(ii) It remains true that no idle times are necessary in the stochastic environment. But Theorem 10.4 is no longer valid when the processing times are random, because in such a situation it is uncertain in the proof of Theorem 10.4 whether Case 1 (cs(1,π) ≤ cr(2,π)) or Case 2 (cs(1,π) > cr(2,π)) would occur: either can occur with a positive probability. Thus no single policy π′ can be found such that Ci(π′) ≤ Ci(π) almost surely for all tasks. As a result, Theorem 10.4 cannot be extended in general to the stochastic environment without additional assumptions. This highlights the distinction between deterministic and stochastic environments, and the complexity of the stochastic version of the problem (see Example 10.2 below for further illustration).

We assume that the stochastic scheduling problem we consider here allows only static policies. As in the deterministic model, we also assume that preemption is not allowed; that is, once the processing of any component of a task is started on a processor, it must continue without interruption until the task is completed. We will study the optimal policies of the stochastic problem under the three criteria, respectively. As in the deterministic case, we will mainly focus on the problems with either the maximum cost or the total completion time objectives. We first consider the problem with the maximum cost F(x1,...,xn) = maxi xi, and then the problem with the total completion time TCT = ∑i Ci.

Minimization of the Maximum Cost

With the maximum cost, the following simple result under Criterion A follows immediately from Theorem 10.5. If the cost functions fi(t) are stochastic processes satisfying

Pr( f1(t) ≥ f2(t) ≥ ··· ≥ fn(t) ) = 1 for all t ≥ 0,   (10.41)

then π∗ = (1,2,...,n) is an optimal policy under Criterion A. In other words, a policy in the nonincreasing order of {fi(t) : i = 1,...,n} is optimal to minimize MC(π) almost surely.

If (10.41) does not hold, however, then minimization under Criterion A is largely untenable and either Criterion B or Criterion C will have to be considered. But


minimization under these two criteria is a much more difficult issue in general. In the following we will provide some results which require further conditions. The next theorem provides an optimal solution under Criterion B.

Theorem 10.7. Suppose that the cost functions fi(t) are independent stochastic processes and, for each fixed t, f1(t),..., fn(t) have a common distribution. If there exists a dominant component, j∗ = 1 say, such that

pi(1) = max{pi(j), 1 ≤ j ≤ m},  i = 1,...,n,   (10.42)

and p1(1) ≤ ··· ≤ pn(1) almost surely, then π∗ = (1,2,...,n) is an optimal policy under Criterion B. In other words, a policy in the nondecreasing order of the processing times of the dominant component minimizes MC(π) stochastically.

Proof. First we fix the values of the pi(j) such that p1(1) ≤ ··· ≤ pn(1). Then by Theorem 10.4 there exists an optimal policy with π1 = ··· = πm (which depends on the values of the pi(j)). Retain all the notation in the proof of Theorem 10.5. Then, by the independence between the cost functions,

Pr{MC(π) < x} = Pr{ max_{1≤i≤n} fi(Ci(π)) < x } = Pr{ fi(Ci(π)) < x, i = 1,...,n }
= Pr{ f1(C1(π)) < x } ··· Pr{ fn(Cn(π)) < x }
= Pr{ fr(Cr(π)) < x } Pr{ fs(Cs(π)) < x } H(x),   (10.43)

where H(x) = ∏_{i≠r,s} Pr{ fi(Ci(π)) < x }. Similarly,

Pr{MC(π′) < x} = Pr{ fs(Cs(π′)) < x } Pr{ fr(Cr(π′)) < x } H(x),   (10.44)

where H(x) is the same as that in (10.43) because of (10.40).

Let pr(1) ≤ ps(1). Then from (10.36) to (10.39) together with (10.42), we get

Cr(π) = C(1) + pr(1) ≤ C(1) + ps(1) = Cs(π′),
Cs(π) = C(1) + pr(1) + ps(1) = Cr(π′).

Therefore, by the assumption on the fi(t),

Pr{ fs(Cs(π′)) < x } = Pr{ fr(Cs(π′)) < x } ≤ Pr{ fr(Cr(π)) < x },
Pr{ fr(Cr(π′)) < x } = Pr{ fr(Cs(π)) < x } = Pr{ fs(Cs(π)) < x }.

Substituting these into (10.43) and (10.44), we get Pr{MC(π) < x} ≥ Pr{MC(π′) < x} for all x ∈ (−∞,∞). Hence pr(1) ≤ ps(1) implies MC(π) ≤st MC(π′). Then by interchanging consecutive pairs of tasks under any policy until reaching π∗, we conclude that π∗ is optimal as long as p1(1) ≤ ··· ≤ pn(1). But as π∗ is independent of the pi(j) and p1(1) ≤ ··· ≤ pn(1) holds almost surely, we have shown that


Pr( MC(π∗) < x | {pi(j)} ) ≥ Pr( MC(π) < x | {pi(j)} )

almost surely for any policy π and x ∈ (−∞,∞). Hence by the law of iterated expectation,

Pr( MC(π∗) < x ) = E[ Pr( MC(π∗) < x | {pi(j)} ) ] ≥ E[ Pr( MC(π) < x | {pi(j)} ) ] = Pr( MC(π) < x ),

which proves the theorem.

Remark 10.6. Theorem 10.7 requires a dominant component condition (10.42), which is rather restrictive in general. Nevertheless, there are practical situations where such a condition could be met. For example, in the reliability testing problem described in Sect. 10.2.1, it is possible that the testing environment of one machine is so set that the failure time of each task on it is the longest, as compared with other machines.

Without the condition of a dominant component, the optimal solution to minimize MC(π) under Criterion B may not exist, as shown by the following example.

Example 10.1. Let n = 3, m = 2, (p1(1), p1(2)) = (2,6), (p2(1), p2(2)) = (5,1), (p3(1), p3(2)) = (3,3). Furthermore, let fi(t) = t − di, where di, i = 1,2,3, are independent random due dates uniformly distributed over (0,20). Then

Pr( fi(Ci) < x ) = Pr( di > Ci − x ) =
  1                    if Ci − x ≤ 0,
  1 − (Ci − x)/20      if 0 < Ci − x < 20,
  0                    if Ci − x ≥ 20.        (10.45)

It is easy to see that under policy (1,2,3), C1 = 6, C2 = 7 and C3 = 10; while under (2,1,3), C2 = 5, C1 = 7 and C3 = 10. Hence by (10.43) and (10.45), and noting that d1 and d2 are identically distributed,

Pr(MC(1,2,3) < x) = Pr(d1 > 6 − x)Pr(d2 > 7 − x)Pr(d3 > 10 − x)
≤ Pr(d1 > 5 − x)Pr(d2 > 7 − x)Pr(d3 > 10 − x)
= Pr(MC(2,1,3) < x)

for all real x, with strict inequality for −10 < x < 6. Hence MC(2,1,3) <st MC(1,2,3), so that (1,2,3) cannot be optimal under Criterion B. Similarly we can eliminate (1,3,2), (2,3,1) and (3,1,2). That leaves only π1 = (3,2,1) and π2 = (2,1,3). Now, by (10.45),

Pr(MC(π1) < 0) = Pr(d3 > 3)Pr(d2 > 8)Pr(d1 > 10) = (1 − 3/20)(1 − 8/20)(1 − 10/20) = 204/800,

Pr(MC(π2) < 0) = (1 − 5/20)(1 − 7/20)(1 − 10/20) = 195/800.


Hence Pr(MC(π1) < 0) > Pr(MC(π2) < 0). On the other hand,

Pr(MC(π1) < 5) = Pr(d3 > 3 − 5)Pr(d2 > 8 − 5)Pr(d1 > 10 − 5) = (1 − 3/20)(1 − 5/20) = 255/400
< 270/400 = (1 − 2/20)(1 − 5/20) = Pr(MC(π2) < 5).

Therefore, neither MC(π1) ≤st MC(π2) nor MC(π2) ≤st MC(π1). As a result, there is no optimal solution for MC(π) under Criterion B. This may be explained intuitively as follows.

The consecutive completion times of the three jobs are (3,8,10) under π1 and (5,7,10) under π2. Clearly the former is better (a shorter completion time) for the first job in the sequence, while the latter is better for the second, so neither π1 nor π2 is better than the other at all times. As a result, there is no optimal solution under Criterion B, since the stochastic order requires the same inequality to hold over all time.
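The probabilities in Example 10.1 can be reproduced exactly with a few lines of Python; the sketch below simply evaluates Pr{MC(π) < x} = ∏_i Pr{di > Ci − x} for the two remaining sequences at x = 0 and x = 5 (the data are from the example; the script itself is ours):

from fractions import Fraction

# Data of Example 10.1: (p_i(1), p_i(2)); due dates d_i ~ Uniform(0, 20), independent.
p = {1: (2, 6), 2: (5, 1), 3: (3, 3)}

def completion_times(seq):
    """C_i = max_j c_i(j) when both processors process the tasks in the order seq."""
    c1 = c2 = 0
    C = {}
    for i in seq:
        c1, c2 = c1 + p[i][0], c2 + p[i][1]
        C[i] = max(c1, c2)
    return C

def prob_mc_below(seq, x):
    """Pr{MC(seq) < x} = prod_i Pr{d_i > C_i - x} for d_i ~ Uniform(0, 20)."""
    prob = Fraction(1)
    for Ci in completion_times(seq).values():
        prob *= min(max(Fraction(20 - (Ci - x), 20), 0), 1)
    return prob

pi1, pi2 = (3, 2, 1), (2, 1, 3)
print(prob_mc_below(pi1, 0), prob_mc_below(pi2, 0))   # 204/800 = 51/200 > 195/800 = 39/160
print(prob_mc_below(pi1, 5), prob_mc_below(pi2, 5))   # 255/400 = 51/80 < 270/400 = 27/40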

Now we turn to Criterion C. Since X ≤st Y implies E[X] ≤ E[Y], we immediately have the following result.

Corollary 10.4. Under the conditions of Theorem 10.7, E[MC(π)] is minimized by the SPT rule with respect to the dominant component.

Remark 10.7. Without the conditions of Theorem 10.7, the problem of minimizing E[MC(π)] remains unsolved. Although the optimal solution must exist, the question is whether and how it can be found under weaker conditions than those of Theorem 10.7. It is an interesting problem for future study.

If we consider an alternative criterion, minimization of the maximum expected cost

MEC(π) = max_{1≤i≤n} E[ fi(Ci(π)) ],

then the following results can be derived.

Theorem 10.8. When the processing times pi(j) are deterministic, assume that the mean cost functions mi(t) = E[fi(t)] exist for t ≥ 0 and i = 1,...,n. If the mi(t) can be ordered such that m1(t) ≥ m2(t) ≥ ··· ≥ mn(t) for all t ≥ 0, then π∗ = (1,2,...,n), i.e., a policy in the nonincreasing order of {mi(t) : i = 1,...,n}, is optimal to minimize MEC(π).

Proof. Since the pi(j) are deterministic, we have

MEC(π) = max_{1≤i≤n} E[ fi(Ci(π)) ] = max_{1≤i≤n} mi(Ci(π)).


As fi(t) is nondecreasing in t almost surely, each mi(t) is a deterministic nondecreasing function. Hence the mi(t) satisfy the conditions of Theorem 10.5 on the fi(t). As a result, Theorem 10.8 follows from Theorem 10.5.

For stochastic processing times, additional conditions are required and the arguments are more complicated, as shown in the next theorem.

Theorem 10.9. When the processing times pi(j) are stochastic, assume that the {pi(j)} are independent of the {fi(t)} and there is a dominant component, j∗ = 1 say, such that

pi(1) ≥ pi(j) almost surely for all j ≠ 1 and i = 1,...,n.   (10.46)

Then, under the conditions of Theorem 10.8 on the mi(t), a policy in the nonincreasing order of {mi(t) : i = 1,...,n} is optimal to minimize MEC(π).

Proof. We first show the existence of an optimal policy π = (π1,...,πm) with π1 = ··· = πm that minimizes MEC(π). Retain the notation in the proof of Theorem 10.4. Without loss of generality, let π1 = (i1(1),...,in(1)) = (1,2,...,n) and πj = (i1(j),...,in(j)), j = 2,...,m. If in(2) < n, then ir(2) = n for some r < n. Define a policy π′ = (π′1,...,π′m), where π′j = (i′1(j),...,i′n(j)), j = 1,...,m, such that i′n(2) = n and i′α(2) = iα+1(2) for α = r,...,n−1. All other elements of π′ are the same as the corresponding elements of π (i.e., we move ir(2) = n to the end and ir+1(2),...,in(2) one step ahead). By (10.46) we must have

cn(1,π) = ∑_{i=1}^n pi(1) ≥ ∑_{i=1}^n pi(2) = cn(2,π′) almost surely.

Moreover, by the definition of π′ it is clear that ci(j,π′) ≤ ci(j,π) for (i,j) ≠ (n,2). It follows that Ci(π′) = max{ci(j,π′) : j ∈ Si} ≤ max{ci(j,π) : j ∈ Si} = Ci(π) almost surely for i = 1,...,n, which implies E[fi(Ci(π′))] ≤ E[fi(Ci(π))], i = 1,...,n, since fi(t) is nondecreasing in t almost surely. Thus MEC(π′) ≤ MEC(π), and so there exists an optimal policy with in(1) = in(2) (as i′n(2) = n = in(1) = i′n(1) in the above arguments). Repeating the same arguments for the remaining tasks 1,...,n−1, we then get the existence of an optimal policy with in−1(1) = in−1(2), and so on, until reaching π1 = π2. The same arguments also apply to π3,...,πm, leading to π1 = ··· = πm.

We now need only consider policies with π1 = ··· = πm. Retaining all the notation in the proof of Theorem 10.5, we can see that (10.36)–(10.40) still hold for random processing times. As a result, Ci(π′) = Ci(π) for i ≠ r,s and so

E[fi(Ci(π′))] = E[fi(Ci(π))] for i ≠ r,s.   (10.47)


Next, from (10.37) to (10.38) we can see that Cs(π′) ≤ Cs(π) with probability 1. Hence by part (ii) of Lemma 1 in Zhou and Cai (1997), and the assumption that each mi(t) = E[fi(t)] is nondecreasing in t ≥ 0, we obtain

E[fs(Cs(π′))] ≤ E[fs(Cs(π))].   (10.48)

Furthermore, by (10.46) we can see that Cr(π′) = Cs(π) almost surely and so

E[fr(Cr(π′))] = E[fr(Cs(π))].   (10.49)

If mr(t) ≤ ms(t) for all t ≥ 0, then by the independence between the {fi(t)} and {pi(j)},

E[fr(Cs(π)) | Cs(π) = t] = E[fr(t) | Cs(π) = t] = E[fr(t)] = mr(t) ≤ ms(t) = E[fs(t)] = E[fs(Cs(π)) | Cs(π) = t] for all t ≥ 0,

which implies

E[fr(Cs(π))] = E[ E[fr(Cs(π)) | Cs(π)] ] ≤ E[ E[fs(Cs(π)) | Cs(π)] ] = E[fs(Cs(π))].   (10.50)

It follows from (10.49) and (10.50) that E[fr(Cr(π′))] ≤ E[fs(Cs(π))]. Combining this with (10.47) and (10.48), we see that mr(t) ≤ ms(t) implies

MEC(π′) = max_{1≤i≤n} E[fi(Ci(π′))] ≤ max_{1≤i≤n} E[fi(Ci(π))] = MEC(π)

and so the theorem follows.

An application of Theorems 10.8 and 10.9 is shown below.

Maximum Expected Lateness (MEL):

MEL(π) = max_{1≤i≤n} E[Ci(π) − di],

where di is a random variable representing the due date of task Ti.

Take fi(t) = t − di, which is a stochastic process as di is a random variable. Then our MEC(π) coincides with MEL(π), and mi(t) = E[fi(t)] = t − E[di]. Since m1(t) ≥ m2(t) ≥ ··· ≥ mn(t) for all t ≥ 0 if and only if E[d1] ≤ E[d2] ≤ ··· ≤ E[dn], the next corollary follows from Theorems 10.8 and 10.9.

Corollary 10.5. An optimal policy to minimize MEL is given by the sequence in the nondecreasing order of the expected due dates E[di], 1 ≤ i ≤ n, either when the processing times are deterministic, or when they are stochastic and satisfy (10.46).
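A quick way to see Corollary 10.5 at work in the deterministic-processing-time case is the brute-force Python sketch below. The data are hypothetical, and only common-sequence policies are enumerated (which suffices when the processing times are deterministic):

from itertools import permutations

# Hypothetical deterministic team-work data on m = 2 processors, with expected
# due dates Ed[i]; MEL(pi) = max_i (C_i(pi) - E[d_i]).
p  = [[4, 2], [6, 5], [3, 1]]
Ed = [5.0, 9.0, 4.0]

def mel(order):
    clock = [0, 0]
    worst = float("-inf")
    for i in order:
        clock[0] += p[i][0]
        clock[1] += p[i][1]
        worst = max(worst, max(clock) - Ed[i])   # expected lateness of task i
    return worst

by_mean_due_date = sorted(range(len(p)), key=lambda i: Ed[i])   # nondecreasing E[d_i]
best = min(permutations(range(len(p))), key=mel)                # common sequences only
print(mel(by_mean_due_date), mel(best))                         # both equal the minimum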


Remark 10.9. Theorem 10.8 is valid for deterministic processing times pi(j), while Theorem 10.9 requires the additional condition (10.46) for stochastic pi(j). Neither theorem implies the other. When the pi(j) are positive random variables, but without (10.46), some minor modifications to the proof of Theorem 10.9 can show that π∗ is optimal under Criterion C among all policies with π1 = ··· = πm. But there is no guarantee that π∗ is optimal under Criterion C among all policies (including those with unequal πj's), as demonstrated in the following example.

Example 10.2. Let n = m = 2 and fi(t) = t. Then the π∗ = (π∗1, π∗2) as in Theorem 10.7 is given by either π∗1 = π∗2 = (1,2) or π∗1 = π∗2 = (2,1). For convenience we write p1 = p1(1), p2 = p2(1), q1 = p1(2) and q2 = p2(2). Suppose that p1, p2, q1, q2 are independent random variables, and each of them takes a value of either 1 or 2, each with probability 0.5. Then it is easy to calculate, if π∗1 = π∗2 = (1,2), that

MEC(π∗) = max{ E[C1(π∗)], E[C2(π∗)] } = max{ E[max{p1, q1}], E[max{p1 + p2, q1 + q2}] }
= E[max{p1 + p2, q1 + q2}] = (2)(1/16) + (3)(1/2) + (4)(7/16) = 27/8.

Furthermore, let π = (π1,π2) with π1 = (1,2) and π2 = (2,1). Then

MEC(π) = max{ E[C1(π)], E[C2(π)] } = max{ E[max{p1, q2 + q1}], E[max{p1 + p2, q2}] }
= E[max{p1, q1 + q2}] = (2)(1/4) + (3)(1/2) + (4)(1/4) = 3.

Therefore MEC(π) < MEC(π∗), so π∗ is not optimal under the MEC criterion.

The above remark and example again show the distinction between the deterministic and stochastic environments. They also highlight the distinction between single-processor tasks and team-work tasks, as well as the highly complicated nature of team-work task scheduling in a stochastic environment.
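The expectations in Example 10.2 can be verified by enumerating the 16 equally likely realizations of (p1, p2, q1, q2), as in the following Python sketch (the data come from the example; the enumeration itself is ours):

from fractions import Fraction
from itertools import product

# Example 10.2: p1, p2, q1, q2 are independent, each equal to 1 or 2 with
# probability 1/2, so the 16 outcomes are equally likely.
def mec(order1, order2):
    """MEC = max_i E[C_i] when processor 1 follows order1 and processor 2 follows order2."""
    EC = {1: Fraction(0), 2: Fraction(0)}
    for p1, p2, q1, q2 in product((1, 2), repeat=4):
        times = {0: {1: p1, 2: p2}, 1: {1: q1, 2: q2}}
        comp = {}
        for j, order in enumerate((order1, order2)):
            t = 0
            for i in order:
                t += times[j][i]
                comp[(j, i)] = t          # completion of task i's component on processor j
        for i in (1, 2):
            EC[i] += Fraction(max(comp[(0, i)], comp[(1, i)]), 16)
    return max(EC.values())

print(mec((1, 2), (1, 2)))   # 27/8 for the common-sequence policy pi*
print(mec((1, 2), (2, 1)))   # 3, which is strictly smaller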

Minimization of the Total Completion Time

We now turn to the problem with the total completion time TCT = ∑i Ci in the stochastic environment. The following theorem is a straightforward generalization of Theorem 10.6.


Theorem 10.10.

(a) If p1(j) ≤ p2(j) ≤ ··· ≤ pn(j) almost surely for all j = 1,...,m, then π∗ = (1,2,...,n) is an optimal policy to minimize the TCT under all three criteria A, B and C.

(b) Under condition (10.46), a sequence in the nondecreasing order of the expected processing times E[pi(1)], i = 1,...,n, of the dominant component minimizes the expected total completion time E[TCT(π)] = ∑_{i=1}^n E[Ci(π)].

Proof. Under the conditions of Part (a), Theorem 10.6(a) shows that π∗ = (1,2,...,n) is optimal under Criterion A, which in turn implies optimality under Criteria B and C. This proves (a).

For (b), by the argument in the proof of Theorem 10.9 we only need to consider policies with π1 = ··· = πm. Under such a policy, condition (10.46) also implies Ci = ci(1), i = 1,...,n, hence the problem is equivalent to the single-processor problem with processing times pi(1), i = 1,...,n. Consequently, (b) follows from the SEPT (shortest expected processing time) rule for the single-processor problem.

We can relax the ordering among the processing times to stochastic order and obtain the optimal solution under Criterion C. But this needs more delicate arguments to prove, as neither Theorem 10.4 nor Theorem 10.9 can be applied to ensure a common order on all processors; see the next theorem.

Theorem 10.11. Suppose that the processing times pi(j), i = 1,...,n; j = 1,...,m, are independent random variables. If p1(j) ≤st p2(j) ≤st ··· ≤st pn(j) for j = 1,...,m, then π∗ = (1,2,...,n) is an optimal policy to minimize the expected total completion time E[TCT(π)]. In other words, a sequence in nondecreasing stochastic order of the processing times is optimal.

Proof. Given a policy π = (π1,...,πm), consider any two tasks r and s with r < s. Define L = { j : ts(j) precedes tr(j) under π, 1 ≤ j ≤ m }. If L is not empty, then we interchange the positions of components tr(j) and ts(j) under π for j ∈ L, and denote the resultant policy by π′. The theorem will follow if we can show that E[TCT(π′)] ≤ E[TCT(π)]. To do so, let c(j) be the completion time of the component sequenced just before tr(j) and ts(j) under πj, and d(j) the sum of the processing times of the components between tr(j) and ts(j) under πj. Then

Cr(π) = max{ c(j) + ps(j) + d(j) + pr(j), c(k) + pr(k) : j ∈ L, k ∈ Lc },
Cs(π) = max{ c(j) + ps(j), c(k) + d(k) + pr(k) + ps(k) : j ∈ L, k ∈ Lc },
Cr(π′) = max{ c(j) + pr(j) : 1 ≤ j ≤ m },
Cs(π′) = max{ c(j) + pr(j) + d(j) + ps(j) : 1 ≤ j ≤ m }.


Thus by the independence between the pi(j), i = 1,...,n; j = 1,...,m, we obtain

E[Cr(π)] = ∫_0^∞ Pr(Cr(π) > x) dx = ∫_0^∞ [1 − Pr(Cr(π) ≤ x)] dx
= ∫_0^∞ [ 1 − ∏_{j∈L} Pr{c(j) + ps(j) + d(j) + pr(j) ≤ x} ∏_{k∈Lc} Pr{c(k) + pr(k) ≤ x} ] dx.

Similarly,

E[Cs(π)] = ∫_0^∞ [ 1 − ∏_{j∈L} Pr{c(j) + ps(j) ≤ x} ∏_{k∈Lc} Pr{c(k) + d(k) + pr(k) + ps(k) ≤ x} ] dx,

E[Cr(π′)] = ∫_0^∞ [ 1 − ∏_{j=1}^m Pr{c(j) + pr(j) ≤ x} ] dx,

E[Cs(π′)] = ∫_0^∞ [ 1 − ∏_{j=1}^m Pr{c(j) + d(j) + pr(j) + ps(j) ≤ x} ] dx.

It follows that

E[Cr(π′)] + E[Cs(π′)] ≤ E[Cr(π)] + E[Cs(π)]   (10.51)

holds if, for all x ≥ 0,

∏_{j∈L} Pr{c(j) + ps(j) + d(j) + pr(j) ≤ x} ∏_{k∈Lc} Pr{c(k) + pr(k) ≤ x}
+ ∏_{j∈L} Pr{c(j) + ps(j) ≤ x} ∏_{k∈Lc} Pr{c(k) + d(k) + pr(k) + ps(k) ≤ x}
≤ ∏_{j=1}^m Pr{c(j) + pr(j) ≤ x} + ∏_{j=1}^m Pr{c(j) + d(j) + pr(j) + ps(j) ≤ x},

or equivalently,

[ ∏_{j∈L} Pr{c(j) + ps(j) ≤ x} − ∏_{j∈L} Pr{c(j) + d(j) + pr(j) + ps(j) ≤ x} ] ∏_{k∈Lc} Pr{c(k) + d(k) + pr(k) + ps(k) ≤ x}
≤ [ ∏_{j∈L} Pr{c(j) + pr(j) ≤ x} − ∏_{j∈L} Pr{c(j) + d(j) + pr(j) + ps(j) ≤ x} ] ∏_{k∈Lc} Pr{c(k) + pr(k) ≤ x}.   (10.52)

Since r < s, we have pr(j) ≤st ps(j) for all j, and so c(j) + pr(j) ≤st c(j) + ps(j) due to the independence between pr(j), ps(j) and c(j). Consequently,


Pr{c(j) + ps(j) ≤ x} ≤ Pr{c(j) + pr(j) ≤ x}

for x ≥ 0. It is also obvious that

Pr{c(j) + d(j) + pr(j) + ps(j) ≤ x} ≤ Pr{c(j) + ps(j) ≤ x}

for all j and x ≥ 0. These together show that (10.52) holds, and so does (10.51).

Furthermore, for each j ∈ L and each ti(j) sequenced between tr(j) and ts(j) under π, let ai(j) denote the time between the completion times of ts(j) and ti(j). Then by the definition of π′, ci(j,π′) = c(j) + pr(j) + ai(j) ≤st c(j) + ps(j) + ai(j) = ci(j,π). Thus if task i has a component ti(j) sequenced between tr(j) and ts(j) under π for some j ∈ L, then by the independence between the processors, we get

Ci(π′) = max_{1≤j≤m} ci(j,π′) ≤st max_{1≤j≤m} ci(j,π) = Ci(π),

which implies E[Ci(π′)] ≤ E[Ci(π)]. For any other task i (i ≠ r,s), Ci(π) = Ci(π′) as the position of the task is not affected by the change from π to π′. These together with (10.51) prove E[TCT(π′)] ≤ E[TCT(π)], and so the theorem.
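Theorem 10.11 can also be checked numerically. The Python sketch below (with hypothetical exponential processing times whose means are chosen so that the stochastic ordering holds) estimates E[TCT] for every pair of per-processor sequences with n = 3 and m = 2, using common random numbers:

import random
from itertools import permutations

random.seed(0)

# Hypothetical data for Theorem 10.11: p_i(j) ~ Exponential with mean mu[i][j];
# the means increase in i on each processor, so p_1(j) <=_st p_2(j) <=_st p_3(j).
mu = [[1.0, 1.5], [2.0, 2.5], [4.0, 3.0]]
samples = [[[random.expovariate(1.0 / mu[i][j]) for j in range(2)] for i in range(3)]
           for _ in range(20_000)]

def expected_tct(order1, order2):
    """Estimate E[TCT] when processor 1 follows order1 and processor 2 follows order2."""
    total = 0.0
    for p in samples:
        comp = [{}, {}]
        for j, order in enumerate((order1, order2)):
            t = 0.0
            for i in order:
                t += p[i][j]
                comp[j][i] = t
        total += sum(max(comp[0][i], comp[1][i]) for i in range(3))
    return total / len(samples)

values = {(o1, o2): expected_tct(o1, o2)
          for o1 in permutations(range(3)) for o2 in permutations(range(3))}
best = min(values, key=values.get)
# Up to Monte Carlo error, the common sequence (0, 1, 2) on both processors is minimal.
print(values[(0, 1, 2), (0, 1, 2)], values[best], best)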

Remark 10.10. Note that the optimal policy π∗ in Theorem 10.11 may not be optimal for maxi E[Ci], as shown by Example 10.2, even if the conditions of Theorem 10.11 are satisfied. This is an interesting question that deserves further investigation.

10.3 Scheduling of Perishable Products

10.3.1 Perishable Products

Perishable products arise in many industries, including agriculture, dairy, food processing, biochemical, electronic, information, toy, fashion, and so on. A common feature of such products is that their value diminishes over time if they are not delivered or sold promptly. A key concern in these industries is how to properly handle the perishability of their products so as to preserve their value against decay, which is crucial to maintaining or enhancing profitability and competitiveness in the marketplace. Because the decaying process of perishable products directly correlates with the element "time", scheduling of perishable products needs to coordinate with the timing decisions on their production and delivery, differing from traditional scheduling problems that do not take such timing decisions into consideration.

It is very common that a producer utilizes a publicly available transportation service to deliver his products to a destination for further distribution. Such a transportation service may include cargo flights, cargo vessels, or trains, which is


normally operated on a regular and known schedule; cf. Chopra and Meindl (2001), and Fawcett et al. (1992). The cost of using such a service is usually low, and therefore it may result in substantial savings for the producer. It requires, however, the producer to properly align his production plan with the transportation service available, which is not under his control. In certain situations, this requires the producer to plan carefully, by taking into account all the information available, in order to reduce the possible loss that may occur as a result of an unexpected disruption to the transportation schedule.

The problem we consider here concerns a firm that produces a variety of fresh products to supply two potential markets, an export market and a home (local) market. The export market is much more profitable, but involves a high risk of decay in the delivery process. The home market, on the other hand, can be reached promptly but is less profitable. Delivery to the export market is carried out by a publicly available transportation service, whereas distribution to the home market is carried out by the producer's own delivery fleet, or by direct pickup by the local customers. The business practice of supplying products to two markets, with deliveries to the two markets carried out as above, is common in the food production industry, and it normally operates smoothly. What we consider here is the situation when the transportation service to the export market is severely disrupted, due to unexpected causes such as bad weather, natural or human-induced disasters, industrial actions, machinery failures, etc. Each time this happens, the producer faces a difficult problem of how to appropriately handle his products, including those that have been finished and are waiting for delivery, and those that have been ordered but not yet processed. This is a problem concerning management under random disruption. It is more critical for perishable products as they require careful timing decisions in order to minimize the likely loss. Chang, Ericson, and Pearce (2003) note that "Air transportation is important for the export of high-end fisheries products, many of which are shipped fresh or alive... In the disruptions of 9/11, one exporter of fresh seafood lost millions of dollars in product that had to be thrown out due to spoilage." They also indicate that "any major disruption (of the airport) during the summer season would be a disaster for exporters of cherries and other fresh produce, because it would be expensive and time-consuming for them to establish alternative transportation linkages."

Research on perishable products in the operations management literature has focused on inventory management. A comprehensive survey of the earlier literature was provided by Nahmias (1982), while more recent works were reviewed in Raafat (1991), Nandakumar and Morton (1993), Hsu (2000), Cooper (2001), Goyal and Giri (2001), Ferguson and Koenigsberg (2007), and Blackburn and Scudder (2009). Generally, the key objective of perishable inventory studies is to determine the replenishment policies for inventory. This differs in essence from what we study here, since our main concern is to find the best timing to produce and dispose of fresh products in two markets, when the transportation for the export market is severely disrupted. As far as the modelling of perishability is concerned, the random fresh-time we consider here is in line with the concept of random lifetime of


Nahmias (1982), where two types of perishable time, fixed lifetime and random lifetime, were classified. Since our model allows the random fresh-time of each product to follow a general probability distribution, we actually cover fixed fresh-time as a special case.

A group of scheduling problems consider the so-called deteriorating jobs, which deteriorate while awaiting processing (cf. Sect. 9.1). Consequently, the longer a job waits before it is processed, the longer the processing time it will require. The work of Browne and Yechiali (1989, 1990) was pioneering in bringing in this line of research. They motivated their model by applications in queueing and communication systems, and examined a stochastic problem with random processing times. More results were reported in Alidaee and Womer (1999), Bachman et al. (2002), Mosheiov (1991), Zhao and Tang (2010), etc. See also Sect. 9.1. There is a basic difference between the model of Browne and Yechiali and the model we deal with here, since the latter addresses the perishability of the products after they are completed, while Browne and Yechiali's model addresses the deterioration of the jobs before they are processed.

Other relevant research includes the following. Starbird (1988) analyzed a sequencing problem occurring in an apple packing plant in the state of New York. Tadei et al. (1995) investigated a production scheduling problem encountered by a factory in Lisbon, Portugal, which produced perishable goods for the food market. Arbib et al. (1999) considered a scheduling problem with perishable jobs, where perishability occurred in both the initial and the final stages of the production process. Cai et al. (2008) studied a problem of scheduling the production of perishable products in order to maximize the utilization of raw materials. Cai et al. (2010) considered a supply chain management problem involving long-distance transportation of fresh product where the level of the freshness-keeping effort is a decision variable.

Another stream of related studies concerns supply chain management under transportation disruptions. Several of them highlight the great impact of such disruptions, and analyze the strategies used by industry to address them. Wilson (2007) examined the effect of a transportation disruption on supply chain performance. Liskyla-Peuralahti et al. (2011) analyzed the impacts of a port closure due to a strike in Finland in 2010. They found that exports of meat, meat products and cheeses were the first to suffer, and that expensive production adjustments would be necessary due to interrupted export streams. Vakharia and Yenipazarli (2009) indicated that a transportation disruption would require more of a tactical response in terms of a revision of the product allocation decisions within the supply chain distribution network. For a comprehensive review, see Snyder et al. (2010). Rescheduling under disruptions has also emerged as an active line of research in recent years. See, e.g., Hall and Potts (2004), Herrmann (2006), and Hoogeveen et al. (2012). These studies usually concern how the jobs should be rescheduled when a disruption arises, so as to minimize costs associated with the deviation between the original and new schedules.

The exposition of this section is based on Cai and Zhou (2013).


10.3.2 The Base Model

A firm produces a variety of fresh products to supply two markets, an export market and a local market. The export market is more profitable, and thus the firm always attempts to have its products sold there if possible. The supply to the export market follows the orders placed by customers in advance (the model with random market demands will be studied later). A product can always be sold at the local market as long as the firm wants to do so, although at a considerably lower price (which we assume is fixed, independent of the quantity). The firm relies on the transportation service of an outside carrier for delivery to the export market. Normally, profits will be achieved if the products are processed and delivered to the customers according to their order requirements.

The problem arises when the transportation service is disrupted severely (e.g., the airport is closed due to an unexpected event, or the rail system is experiencing a breakdown due to a flood), and so the departure time X of the next transporter for the designated export market becomes very uncertain. Suppose that the present time is t = 0. The next transporter may become available any time from now, which depends on when the event is resolved and how the transportation system is recovered. The firm has a set of n products to be completed and delivered according to the customer orders. Because of the perishable nature of the products and the very uncertain schedule of the transportation service, if a product misses the transporter, the firm may have to cancel the order with the customer and put the product into the local market to avoid the risk of severe decay while waiting for another transporter to become available. For orders that cannot be canceled or that incur high contract cancelation penalties, the firm may have to seek an alternative but probably much more expensive means (e.g., switching from rail to air) to perform the delivery.

We assume that a "product" in our basic model corresponds to a "customer order", with specific requirements on the product type, quality, and quantity. Thus, we use the two terms interchangeably if and when this causes no ambiguity. Moreover, we assume that there is a critical manufacturing resource that constrains the production of the products, and so the orders must be scheduled carefully to utilize this resource. This critical resource is denoted as the "machine" we are concerned about in our model. To illustrate, consider the process of manufacturing fresh fillet products (cf. www.rumijapan.co.jp/en/factory/), which consists of the steps: Storage (to store the live fish in a tank), Ikijime (to paralyze and bleed the fish), Nerve removal, Head removal, Gutting, Filleting, Grading, Pillow-vacuum packing, Cooling, and Shipping preparation. The amount of time required to process an order at each step is usually proportional to the quantity of the order. Thus, such a system can be modeled, at least approximately, as a proportionate flow shop (Ow 1985; Pinedo 2002). As is well known, in such a system there is a bottleneck machine (almost always the one requiring the longest processing time; see Ow 1985), which is the critical manufacturing resource. In general, the "machine" in our model can be such a bottleneck machine in the production system, or a type of


production capacity (e.g., a team of workers for gathering the crops from the field), or even the entire production system.

For each product i, there is a processing time Pi, which represents the amount of time the machine needs to process the product. We assume the Pi to be random variables, each following a general probability distribution, independent of one another. We assume that deterioration during the manufacturing process can be properly controlled in the factory environment. Let Ci denote the completion time of product i, i = 1,...,n. A finished product remains fresh for a period of time after it is completed, which we call its fresh-time. The fresh-time of product i, denoted by Di, is also a random variable following an arbitrary distribution, independent of Pi. During the interval [Ci, Ci + Di], the finished product retains its best value. After that, however, the product starts to deteriorate at a significant rate. The total cost due to deterioration during the interval (Ci + Di, t] is given by a function gi(τ), where τ = t − Ci − Di. It is assumed that gi(·) is a general nonnegative and nondecreasing function. This cost function may represent the drop in the value of the finished product due to its deterioration, or the additional treatment cost to keep the product fresh after its fresh-time expires.

The departure time X of the transporter is modeled by an exponentially distributed random variable with rate δ > 0 (and mean E[X] = 1/δ), independent of the {Pi} and {Di}. Note that an exponential X models situations with a high level of uncertainty; see, e.g., Cai et al. (2000), Feller (1966, Chap. 1), and Parzen (1992, Sect. 6.4). For any finished product i, it is desirable to be delivered by the transporter within its fresh-time. Nevertheless, due to the uncertainties involved, the ideal situation of collecting a finished product within its fresh-time may not be possible for all products. Consequently, for a product i, one of the following scenarios may occur:

(i) The product is completed before the transporter becomes available and is then delivered by the transporter at departure time X, where X > Ci. In this scenario, a deteriorating cost gi(X − Ci − Di) is incurred at time X if X > Ci + Di.

(ii) The product is completed after the departure of the transporter, and therefore incurs a cost βi (which may be the loss due to the price difference between the export and local markets, as in the seafood processing example in Sect. 10.3.1; or the extra cost to deliver the product to the original destination, as in the agricultural example), where βi ≥ 0 is a known constant.

(iii) The product is completed before the transporter becomes available and initially waits for the transporter. The decision to wait, however, is withdrawn at a later time before it is picked up. In other words, after waiting for a period of time, the deterioration of the product makes it better off being disposed of at the local market than kept waiting and decaying. If product i is disposed of at time Ci + Di + τ, then it incurs a cost ui(τ) = βi + (1 − ri)gi(τ), where 0 ≤ ri ≤ 1 is a known constant. If gi(τ) represents the reduction in the value of the product due to decay, then ri is the discount rate on the local market (i.e., 1 unit worth for export reduces to 1 − ri units on the local market). Alternatively, if gi(τ) is


the extra cost to maintain the value of the product, then the disposal cost at time Ci + Di + τ is given by βi + gi(τ), which is a special case of the ui(τ) defined above with ri = 0.

Decisions that have to be made include: (a) for each finished product that is waiting for the transporter at time t, the decision on whether it should continue to wait or be put in the local market; (b) for each unfinished product to be processed, the amount of time to postpone its processing; and (c) the sequence to process the remaining unfinished products. Note that the model here involves decaying costs, and thus postponing the processing of a product may be sensible, since this may delay the completion time of the product and thus reduce its decaying cost. This is why decision (b) is considered here.

We will consider both the classes of static and dynamic policies in this section. When static policies are considered, we assume that preemption is not allowed; i.e., once the processing of a product starts, it should continue without interruption until it is completed. When dynamic policies are considered, however, we allow for preemption; i.e., a product may be preempted by another one, if this is found to be technically feasible and beneficial based on the information available at the time of making the decision. In a dynamic policy, consideration of preemptions is natural as adjustments to the policy can be made dynamically. For ease of presentation, in the rest of this section we will limit the discussion to the class of static policies. Considerations on dynamic policies will be discussed later.

Now suppose that a static policy is considered. Then we can write more specifically the following components in a policy:

(a) The timing to dispose of each finished product in the local market, denoted by a set of deterministic values T = (τ1,τ2,...,τn) in [0,∞], where τi represents the length of time that product i is allowed to wait after its fresh-time is over. The waiting of product i will be terminated at time Ci + Di + τi, if the transporter has not become available by then. In particular, for a product i completed before the transporter becomes available, τi = 0 represents disposing of it once its fresh-time expires, while τi = ∞ means that it will definitely wait for the transporter.

(b) The postponement for each unfinished product, denoted by a set of nonnegative deterministic values S = (s1,s2,...,sn), where si is the amount of time that the processing facility is kept idle immediately before product i is started.

(c) The sequence to process the n products, denoted by π = (i1,i2,...,in), which is a permutation of (1,2,...,n), with i = ik if product i is the kth to be started and processed.

We denote an overall policy by ζ = (T,S,π). Then the total expected cost (TEC) of the products under policy ζ can be expressed by

TEC(ζ) = E[ ∑_{Ci+Di<X<Ci+Di+τi} gi(X − Ci − Di) + ∑_{X<Ci} βi + ∑_{X>Ci+Di+τi} ui(τi) ].   (10.53)


The first sum in (10.53) represents the total decaying cost incurred by those products delivered by the transporter, the second sum represents the total cost for those products missing the transporter, and the last sum represents the total cost of those products i which are disposed of at the local market after waiting τi units of time for the transporter from the end of their fresh-times. The problem is to determine an optimal policy to minimize the total expected cost as expressed above.
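As a numerical companion to (10.53), the following Python sketch estimates TEC(ζ) for one static policy by Monte Carlo simulation. All distributions and cost parameters are hypothetical and not from the text: exponential X, Pi and Di, a linear decay cost gi(t) = αi t, and ui(t) = βi + (1 − ri)gi(t).

import random
random.seed(1)

# Hypothetical model parameters for a three-product instance.
delta   = 0.2                      # rate of the departure time X
mean_P  = [1.0, 2.0, 1.5]          # mean processing times
mean_D  = [3.0, 1.0, 2.0]          # mean fresh-times
alpha   = [0.5, 1.0, 0.8]          # linear decay rates, g_i(t) = alpha[i]*t
beta    = [4.0, 6.0, 5.0]          # losses for missing the transporter
r       = [0.3, 0.3, 0.3]          # local-market discount rates

def tec(pi, s, tau, runs=100_000):
    """Estimate TEC(zeta) in (10.53) for the static policy zeta = (tau, s, pi)."""
    total = 0.0
    for _ in range(runs):
        X = random.expovariate(delta)
        t, cost = 0.0, 0.0
        for k, i in enumerate(pi):
            t += s[k] + random.expovariate(1.0 / mean_P[i])   # completion time C_i
            D = random.expovariate(1.0 / mean_D[i])
            if X < t:                                         # scenario (ii): missed
                cost += beta[i]
            elif X < t + D + tau[i]:                          # scenario (i): delivered
                cost += alpha[i] * max(0.0, X - t - D)
            else:                                             # scenario (iii): disposed
                cost += beta[i] + (1 - r[i]) * alpha[i] * tau[i]
        total += cost
    return total / runs

print(tec(pi=(0, 2, 1), s=(0.0, 0.0, 0.0), tau=(2.0, 0.5, 1.0)))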

10.3.3 Waiting Decision on a Finished Product

In this subsection we consider the finished products. The primary concern about a finished product is when the waiting for delivery should be terminated.

We assume that each gi(·) is a continuous function with gi(0) = 0. This assumption is mainly for ease of presentation, and can be dropped if necessary (see Remark 10.11 below). As defined in the last subsection, τi represents the length of time that product i is allowed to wait from the end of its fresh-time. In other words, the waiting of product i will be terminated at Ci + Di + τi. For brevity, we call τi the terminating time for product i.

Since X, Ci and Di are independent random variables, given Ci = c and Di = d,

E[ gi(X − Ci − Di) I{Ci+Di < X < Ci+Di+τi} | Ci = c, Di = d ]
= E[ gi(X − c − d) I{c+d < X < c+d+τi} ] = ∫_{c+d}^{c+d+τi} gi(z − c − d) δ e^{−δz} dz
= ∫_0^{τi} gi(y) δ e^{−δ(y+c+d)} dy = ai(τi) e^{−δc} e^{−δd},

where ai(τ) = ∫_0^τ gi(y) δ e^{−δy} dy. We can thus obtain the unconditional expected cost:

E[ gi(X − Ci − Di) I{Ci+Di < X < Ci+Di+τi} ] = ai(τi) E[e^{−δCi}] E[e^{−δDi}] = ai(τi) bi E[e^{−δCi}],

where bi = E[e^{−δDi}]. Similarly we can derive Pr(X > Ci + Di + τi) = bi e^{−δτi} E[e^{−δCi}] and Pr(X < Ci) = 1 − E[e^{−δCi}]. Thus it follows from (10.53) that

TEC(ζ) = ∑_{i=1}^n { bi hi(τi) − βi } E[e^{−δCi}] + ∑_{i=1}^n βi,   (10.54)

where hi(τ) = ai(τ) + ui(τ)e^{−δτ}. This means that minimizing TEC(ζ) with respect to τi is equivalent to minimizing hi(τi) for each i. This leads to the following theorem.

Theorem 10.12. The optimal terminating time τ∗i of the finished product i is determined by

τ∗i :  hi(τ∗i) = min{ hi(τ) : τ ≥ 0 },   (10.55)


where

hi(τ) = ai(τ) + ui(τ)e^{−δτ}  with  ai(τ) = ∫_0^τ gi(y) δ e^{−δy} dy,  i = 1,...,n.   (10.56)

In other words, a finished product should wait for delivery to the export market for a maximum amount of time τ∗i after its fresh-time, by which time the waiting should be terminated if the transporter is still not available.

The function hi(τ) in (10.56) plays the key role in determining the optimal terminating time τ∗i. It represents the cumulative decaying cost ai(τ) if product i is delivered before τ, plus the expected cost ui(τ)e^{−δτ} if it is disposed of at the local market at τ. Further note that, because ai(τ) and ui(τ) are continuous functions of τ, hi(τ) must have a minimum point on [0,∞].
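Theorem 10.12 reduces the waiting decision to a one-dimensional minimization of hi(τ). The Python sketch below carries this out numerically for an assumed linear decay cost g(t) = αt and hypothetical parameters, and compares the grid minimizer with the stationarity condition of Proposition 10.2 below; none of the numbers come from the text.

import math

# Hypothetical parameters: g(t) = alpha*t, u(t) = beta + (1 - r)*g(t), rate delta.
delta, alpha, beta, r = 0.2, 0.5, 4.0, 0.3

def g(t):
    return alpha * t

def a(tau, steps=400):
    """a(tau) = integral_0^tau g(y)*delta*exp(-delta*y) dy, by the trapezoidal rule."""
    if tau == 0:
        return 0.0
    ys = [tau * k / steps for k in range(steps + 1)]
    vals = [g(y) * delta * math.exp(-delta * y) for y in ys]
    return (tau / steps) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def h(tau):
    return a(tau) + (beta + (1 - r) * g(tau)) * math.exp(-delta * tau)

grid = [0.1 * k for k in range(0, 1001)]          # tau in [0, 100]
tau_star = min(grid, key=h)                        # grid version of (10.55)
print(tau_star, h(tau_star), h(0.0))
# For comparison, the stationarity condition (1-r)*alpha + delta*r*alpha*tau = delta*beta
# gives tau* = (delta*beta - (1-r)*alpha) / (delta*r*alpha); the two values should agree
# up to the grid and quadrature resolution.
print((delta * beta - (1 - r) * alpha) / (delta * r * alpha))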

The following proposition shows when the optimal decision will be either “no-wait at all” or “definitely wait until the transporter picks it up”.

Proposition 10.1. Let g(x) be a differentiable function on [0,∞) and u(x) = β + g(x) (i.e., the case r = 0).

(a) If g′(x) is a nonincreasing function, then τ∗ is either 0 or infinity.

(b) If g′(x) ≥ δβ for all x ≥ 0, then τ∗ = 0.

(c) If g′(x) ≤ δβ for all x ≥ 0, then τ∗ = ∞.

Proof. From the definitions of h(·) and a(·) in (10.56), we obtain

h′(x) = (1 − r)g′(x)e^{−δx} − {β + (1 − r)g(x)}δe^{−δx} + g(x)δe^{−δx}
= {(1 − r)g′(x) − δβ + δ r g(x)}e^{−δx}.

Thus, when r = 0, h′(x) = (g′(x) − δβ)e^{−δx}. If g′(x) is a nonincreasing function, then g′(x) − δβ changes sign at most once, from positive to negative, so h(x) first increases and then decreases; hence its minimum is attained at either 0 or infinity. This proves Part (a). The proofs for (b) and (c) are similar.

The next proposition describes when the optimal waiting time is a positive finite amount.

Proposition 10.2. If g(x) is continuously differentiable on [0,∞), (1 − r)g′(0) < δβ, and (1 − r)g′(x) + δ r g(x) > δβ for sufficiently large x, then 0 < τ∗ < ∞ and it solves the equation (1 − r)g′(τ∗) + δ r g(τ∗) = δβ. If, in addition, g′(x) is strictly increasing in x, then τ∗ is the unique positive solution to (1 − r)g′(x) + δ r g(x) = δβ.

Proof. It follows from the formula above for h′(x) that h(x) is decreasing if (1 − r)g′(x) + δ r g(x) < δβ and increasing if (1 − r)g′(x) + δ r g(x) > δβ. Therefore, if (1 − r)g′(0) < δβ and (1 − r)g′(x) + δ r g(x) > δβ for large x, then there exists a


point 0 < τ < ∞ such that h(0) > h(τ) and h(∞) > h(τ), which means that neither 0 nor ∞ is a minimum point of h(τ), hence 0 < τ∗ < ∞. The rest of the proposition follows from standard results of calculus.

In practice, it is possible that the decaying cost function g(x) exhibits jumps at some moments of time. For example, there may be a "best-before" date for a product, after which the product loses its value substantially. In such a case, the expiry date of the fresh-time represents the "best-before" date of the product. Because the value of the product suffers a substantial drop on the "best-before" date, we then have g(0) > 0. The following remark applies in these situations.

Remark 10.11. If we drop the assumption that g(x) is continuous with g(0) = 0, the results of this section will remain intact if we make the following minor adjustments. First we extend g(x) to (−∞,∞) by defining g(x) = 0 for x < 0. Then g(x) is a nonnegative and nondecreasing function on (−∞,∞), hence it has a left limit at every point, and so does h(x). In particular, h(0−) = u(0−) = β + (1 − r)g(0−) = β. As a result, there exists a point τ∗ ∈ [0,∞] (or [−Ci − Di, ∞] if Ci + Di < 0) such that h(τ∗−) = infx h(x). This τ∗ remains the optimal terminating time for the corresponding product, provided that the product is disposed of just before τ∗ instead of exactly at τ∗ in case g(τ∗−) < g(τ∗). For instance, if τ∗ = 0 is on 30 April and represents the "best-before" date, then the optimal time to dispose of the product is on 29 April.

A producer often faces the question whether to sell a fresh product at a much lower price just before its "best-before" date, or convert it into a refrigerated product to extend its "good-until" period, at some extra cost. Remark 10.11 suggests that if τ∗ > 0, then the product should be refrigerated (so that its "good-until" period is extended). The sale of the product at the local market should be conducted near the extended "good-until" date τ∗, rather than the "best-before" date, and τ∗ can be computed by (10.55) and (10.56).

10.3.4 Decisions on Unfinished Products

We now consider how to deal with the unfinished products. This concerns how the products are scheduled to utilize the critical manufacturing capacity (the machine as we define it). The completion time of a product should be the finishing time C′i on the critical machine, plus the total time ρi required by the subsequent steps, if any (which is a constant independent of the processing sequence on the critical machine). Note that ρi can be absorbed into the fresh-time Di (that is, Di can be defined as the actual fresh-duration of the product after completion, plus ρi). We assume that Di has been properly defined as such. Consequently, for brevity of notation, we continue to use Ci to represent the finishing time on the machine.


The main question about an unfinished product is when to process it, so that the expected total cost is minimized. In the case of static policies, this question relates to two decisions: the sequence π = (i1,i2,...,in) to process the n products, and the postponement S = (s1,s2,...,sn) before processing each product. We can see that the timing to process each product is determined as long as both π and S are specified. In the case of dynamic policies, the question is answered if we can determine, at any decision epoch, which product is to be processed.

Define

bi = E[e^{−δDi}],  fi = E[e^{−δPi}],  ωi = βi − bi hi(τ∗i) ≥ 0,  i = 1,...,n.   (10.57)

These parameters contain the information on the fresh-times, the processing times, the delivery time, and the waiting cost. They play key roles in the optimal policies, as we will see in the following subsections.

We first study S, then investigate the optimal static and dynamic sequences.

Optimal Decisions on Postponement

As we have indicated in Sect. 10.3.2, from an intuitive point of view, deliberately postponing the processing of the next product may be beneficial at times, since this can delay the completion of the product and thus reduce its decaying cost. The following theorem, however, shows that such delaying is unnecessary in an optimal policy.

Theorem 10.13. There exists an optimal policy ζ∗ with s∗i = 0, i = 1,...,n, in both the classes of static and dynamic policies. In other words, there should not be any deliberate postponement to delay the processing of any product.

Proof. First consider the static policy. When τi = τ∗i for all i, from (10.54) we have

TEC(ζ) = ∑_{i=1}^n [bi hi(τ∗i) − βi] ∏_{k∈Bi(π)} e^{−δ sk} fk + ∑_{i=1}^n βi,   (10.58)

where Bi(π) is the set of products sequenced no later than product i under π. Since 0 < bi ≤ 1 for all i, it follows from (10.55) and (10.56) that bi hi(τ∗i) ≤ bi hi(0) = bi βi ≤ βi for all i = 1,...,n. Hence (10.58) shows that with T = T∗, TEC(ζ) is nondecreasing in each si. Consequently, TEC(ζ) is minimized when s∗i = 0, i = 1,...,n.

Next we consider the dynamic policy. If the transporter has departed, any postponement becomes meaningless, and so for any unfinished product the decision at the present time t should be s∗i(t) = 0, no matter how long it has been postponed before. Suppose the transporter has not become available by time t. Then, due to the


memoryless property of the exponential X, we have (10.58) for all unfinished products at time t. Consequently, the optimal s∗i(t) should also be 0 if product i is not finished.

The optimality of this "zero postponement" policy is a direct consequence of the exponential departure time X. Because of the memoryless property, postponing a product does not increase the probability of its being delivered. Moreover, the zero postponement policy also relies on the optimal terminating times τ∗i determined by Theorem 10.12, which ensures that bi hi(τ∗i) ≤ hi(0) = βi for all products. Note that βi represents the relative loss if the product misses the transporter, and hi(τi) represents the expected cost if it waits for an amount of time τi after the expiry of its fresh-time (see the explanation below Theorem 10.12 on the function hi(τi)). Because of the fresh-time Di, the expected cost after its completion is discounted by a factor bi. Therefore, if the completion of the product is postponed so that it misses the transporter, the cost is βi; if it is completed earlier so that it has to wait, the expected cost is bi hi(τi). Apparently, if βi is greater than bi hi(τi), then the postponement si should be 0. On the other hand, if bi hi(τi) > βi, then by (10.58), si = ∞ would be the best for product i. The cost in this case becomes βi, and this is equivalent to selling the product in the local market as soon as it is finished.

The above observations lead to the following remark.

Remark 10.12. Facing the very uncertain delivery schedule, it is more sensible to start the product with zero postponement, as long as its waiting for the transporter can be terminated at the optimal time τ∗i as specified in Theorem 10.12, or at any time τi such that bi hi(τi) ≤ βi. However, if for any reason or due to any practical restriction such a terminating time is not achievable, then the processing of this product should be sufficiently delayed (until other products have been completed) and the product sold in the local market as soon as it is finished.

The following remark is also valid.

Remark 10.13. The conclusions of Theorem 10.13 hold when the departure time X follows a decreasing failure rate distribution.

Optimal Static Sequence

It follows from Theorem 10.13 together with (10.58) that

TEC(ζ∗) = −∑_{i=1}^n ωi ∏_{k∈Bi(π)} fk + ∑_{i=1}^n βi = −E[ ∑_{i=1}^n ωi e^{−δCi} ] + Ω,   (10.59)

where Ω = ∑i βi is a fixed constant independent of any sequence π. Therefore, to find an optimal sequence to minimize (10.59) is equivalent to finding an optimal


sequence to maximize E[∑i ωi e^{−δCi}]. This is a known problem in scheduling, although in the scheduling literature δ is treated as the discount factor in a discounted reward measure, while here it is the rate of the delivery time X. The following result can be established (cf. the proof of Theorem 3.8).

Theorem 10.14. The optimal static sequence π∗ should sequence the products in N in the nonincreasing order of {ωi fi/(1 − fi)}, where fi and ωi are defined in (10.57).

In order to catch the delivery to the more profitable market, common practices in the industry include: (i) finishing as many products as possible, or (ii) finishing the most valuable products first. Strategy (i) in fact implies a nondecreasing order of the expected processing times E[Pi], while strategy (ii) implies a nonincreasing order of the product values βi. Theorem 10.14 suggests that neither of these common practices is optimal; instead, the optimal strategy should sequence the products by the new index ωi fi/(1 − fi). Since ωi = βi − bi hi(τ∗i) is βi minus the expected cost of decay, and fi = E[e^{−δPi}] is decreasing in Pi, the index ωi fi/(1 − fi) may be interpreted as a combined effect of the processing time and product value after accounting for the decaying cost.
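A minimal Python sketch of the sequencing rule in Theorem 10.14 is given below. The parameters are hypothetical; the ωi are assumed to have been computed already via Theorem 10.12, and the processing times are taken exponential so that fi = E[e^{−δPi}] has a closed form.

# Hypothetical parameters for the index rule of Theorem 10.14: if P_i is exponential
# with mean mean_P[i], then f_i = E[exp(-delta*P_i)] = 1 / (1 + delta*mean_P[i]).
delta  = 0.2
mean_P = [1.0, 2.5, 0.8, 1.6]
omega  = [3.2, 5.0, 1.1, 4.4]     # omega_i = beta_i - b_i*h_i(tau*_i), assumed given

f = [1.0 / (1.0 + delta * m) for m in mean_P]
index = [omega[i] * f[i] / (1.0 - f[i]) for i in range(len(omega))]

# Process the products in nonincreasing order of omega_i * f_i / (1 - f_i).
static_sequence = sorted(range(len(omega)), key=lambda i: -index[i])
print([round(v, 2) for v in index], static_sequence)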

Optimal Dynamic Sequence

We now study the optimal sequence in the class of dynamic policies. Note that the expression of the expected total cost in (10.53) remains valid for dynamic policies. However, other expressions derived under the assumption of static policies may no longer hold under dynamic policies.

We first consider the case where preemptions are not allowed; namely, the processing of any product should continue without interruption until it is completed. This means that the decision epochs in this case are time 0 as well as the completion times of the products. At any decision epoch t, the states of the system consist of: (a) the set of the remaining unfinished products, and (b) the event that the transporter has departed or not. Let N(t) denote the set of unfinished products at t. The problem here is to determine, at each decision epoch t, the unfinished product in the set N(t) to be processed next, according to the information available about the states of the system, so as to minimize the total expected cost. Let H(i) = ωi fi/(1 − fi), i ∈ N(t), where fi and ωi are defined in (10.57).

The optimal dynamic policy in this case is given in the following theorem (which can be proven by mathematical induction; see, e.g., Pinedo 1983, pp. 565–566, and Cai and Zhou 1999).


Theorem 10.16. If preemption is not allowed, then at any decision epoch t, the optimal decision is to select the product i∗ with H(i∗) = maxi∈N(t) H(i) as the next one to process, no matter whether the transporter has departed or not.

We now turn to the case where preemptions are allowed. In this case, the processing of a product may be interrupted, if necessary, before its completion so that another product can be started. Therefore, at any decision epoch t, the states of the system consist of: (a) the set of the remaining unfinished products, (b) the amounts of time that have been spent to process the respective products up to the present time t (that is, the realizations of the random processing times Pi up to t), and (c) the event that the transporter has departed or not. Again, let N(t) denote the set of unfinished products at t. Further, let Ti(t) be the realized processing time on product i up to the present moment t. The decision epochs in this case can be any moments of time. Without loss of generality, we assume that time takes discrete values; that is, t = 0, 1, 2, . . ., until all products are completed. The problem here is to determine, at each decision epoch t, the product to be processed in [t, t + 1], based on the known information about the states of the system, and in anticipation of the possible states that may occur in the future.

As Theorem 10.14 is not affected by preemptions, from (10.59) we can see that the problem is to choose a product to process so as to maximize

Jt(ζ) = ∑_{i∈N(t)} ωi E[e^{−δCi(ζ)}]

at any decision epoch t in the class of dynamic policies, where the completion times Ci contain no idle time. This problem can be optimally solved by using the Gittins index; cf. Eq. (7.5). Specifically, for each product i, the following Gittins index can be defined:

Gi(Ti(t)) = sup_{θ>Ti(t)} [ ωi ∫_{Ti(t)}^{θ} e^{−δs} dQi(s) ] / [ ∫_{Ti(t)}^{θ} (1 − Qi(s)) e^{−δs} ds ],   i = 1, . . . , n,    (10.60)

where Qi(s) is the cumulative distribution function of the processing time Pi. In the case where the processing times Pi take integer values, the Gittins index takes the form:

Gi(Ti(t)) = max_{θ>Ti(t)} [ ωi ∑_{s=Ti(t)+1}^{θ} e^{−δs} Pr(Pi = s) ] / [ ∑_{s=Ti(t)+1}^{θ} e^{−δs} Pr(Pi ≥ s) ],   i = 1, . . . , n.    (10.61)

To maximize Jt(ζ), a dynamic policy ζ∗ should, at any decision epoch t, choose the product i∗ which has the maximum Gittins index to process. This leads to the next theorem.

Theorem 10.17. When preemption is allowed, an optimal dynamic policy is given as follows:


(a) At any time t, if the transporter has not departed, then choose the product i∗ such that Gi∗(Ti∗(t)) = maxi∈N(t) Gi(Ti(t)) as the one for processing during [t, t + 1].

(b) At any time t, if the transporter has departed, then all remaining unfinished products can be processed in any order and preemptions are not necessary.
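For integer-valued processing times, the index in (10.61) can be computed by a direct search over the stopping level θ. The sketch below (the distributions, elapsed times and weights ωi are hypothetical) evaluates the index of each unfinished product at its elapsed processing time Ti(t) and applies rule (a):

```python
import numpy as np

def gittins_index(pmf, elapsed, delta):
    """Index (10.61) for an integer-valued processing time with pmf[s] = Pr(P = s),
    given `elapsed` completed units of processing (the weight omega is applied by the caller)."""
    best, num, den = 0.0, 0.0, 0.0
    tail = float(np.sum(pmf[elapsed + 1:]))        # Pr(P >= elapsed + 1)
    for s in range(elapsed + 1, len(pmf)):         # search over the stopping level theta
        disc = np.exp(-delta * s)
        num += disc * pmf[s]                       # running sum of e^{-delta s} Pr(P = s)
        den += disc * tail                         # running sum of e^{-delta s} Pr(P >= s)
        tail -= pmf[s]
        if den > 0.0:
            best = max(best, num / den)
    return best

delta = 0.1
# Hypothetical unfinished products: pmf of P_i on {0, 1, ...}, elapsed time T_i(t), omega_i
jobs = {
    1: dict(pmf=np.array([0.0, 0.2, 0.5, 0.3]), elapsed=0, omega=7.0),
    2: dict(pmf=np.array([0.0, 0.6, 0.2, 0.2]), elapsed=1, omega=5.0),
}
scores = {i: j["omega"] * gittins_index(j["pmf"], j["elapsed"], delta) for i, j in jobs.items()}
print(scores, "-> process product", max(scores, key=scores.get))   # rule (a)
```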

Denote the hazard rate of the processing time Pi by ξi(x). It is easy to see that, when ξi(x) is a nondecreasing function, conditional on product i having been processed for t units of time, the remaining time to complete the product is stochastically no greater than the original processing time Pi. The following corollary comes from Rule (a) of Theorem 10.17.

Corollary 10.6. If the hazard rate ξi(x) of product i is a nondecreasing function, then under the rule of Theorem 10.17, product i will not be preempted by any other product once it has been selected for processing.

From Corollary 10.6 and Theorem 10.16, it is easy to conclude the next corollary.

Corollary 10.7. When all the processing times have nondecreasing hazard rates, the decision epochs reduce to: 0 and the completion times of the products. At any decision epoch, the product selected to be processed is the one that has the highest index H(i∗) as in Theorem 10.16.

If the hazard rates of the processing times are not nondecreasing, preemptions may be needed. Consider the case when ξi(x) is nonincreasing in x. Then by (10.60),

Gi(Ti(t)) = lim_{θ→Ti(t)+} [ ωi ∫_{Ti(t)}^{θ} ξi(x)(1 − Qi(x)) e^{−δx} dx ] / [ ∫_{Ti(t)}^{θ} (1 − Qi(x)) e^{−δx} dx ] = ωi ξi(Ti(t))

(note that the ratio in the above equation is nonincreasing in θ when ξi(x) is nonincreasing).

Thus Gi(Ti) = ωi ξi(Ti) is nonincreasing in Ti = Ti(t). As a result, although a product has the maximum Gittins index when it is selected for processing, its Gittins index may drop below the maximum index at a later time before its completion. In that case, the product should be preempted at such a time, according to Theorem 10.17.
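As a small numerical illustration of this nonincreasing case, the snippet below uses a hypothetical hyperexponential processing time (a mixture of two exponentials, whose failure rate is decreasing) and evaluates Gi(Ti) = ωi ξi(Ti) at a few elapsed times; once the index of another product exceeds this falling value, the product currently in process would be preempted:

```python
import numpy as np

def hazard(x, p=0.5, lam1=2.0, lam2=0.2):
    """Failure rate of a mixture of two exponentials (hyperexponential); it decreases in x."""
    dens = p * lam1 * np.exp(-lam1 * x) + (1 - p) * lam2 * np.exp(-lam2 * x)
    surv = p * np.exp(-lam1 * x) + (1 - p) * np.exp(-lam2 * x)
    return dens / surv

omega = 6.0                       # hypothetical omega_i = beta_i - b_i * h_i(tau_i*)
for T in [0.0, 1.0, 3.0, 10.0]:   # elapsed processing time T_i(t)
    print(f"T = {T:5.1f}   G_i(T) = {omega * hazard(T):.3f}")   # nonincreasing in T
```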

10.3.5 Accounting for Random Market Demand

The model considered so far is assumed to follow the “produce-to-order” pattern, in the sense that the supply to the export market is to meet the orders that have been placed by the customers. We now consider the other version of the problem: there is an uncertain demand in the export market for each product, and the producer has to decide on his supply to the export market based on information on the random market demands. In such a situation, it is very important for the producer to determine the right quantity to be delivered to the export market, in order to minimize the likely loss caused by over-supply or under-supply. We address this problem in this section.

Without loss of generality, assume that the total quantity of product i is 1. Let ρi be the quantity to be exported, where 0 ≤ ρi ≤ 1. Note that if the total quantity of product i is not 1, then ρi represents the proportion of the product to be exported. This is a (deterministic) decision variable that the producer has to determine according to the market demand for the product. Let Mi be the demand for product i at the export market, which is a random variable with an arbitrary probability distribution, independent of X, Pi and Di. For product i, the quantity to be sold at the local market is 1 − ρi. We also assume that the local market can absorb any such quantity if the producer wants it to, although the local price is considerably lower.

Let vi denote the total value of product i at the export market, and li denote the loss as the difference between vi and the realized value of product i. We write gi = gi(X − Ci − Di) for convenience. The value of li is determined as follows:

(i) If X < Ci, then product i has to be sold at the local market, so that li = βi as before.

(ii) If Ci < X < Ci + Di, then product i can be exported fresh. If ρi ≤ Mi, then the exported quantity is fully sold, giving li = (1 − ρi)βi; if ρi > Mi, then ρi − Mi is wasted, leading to an extra loss of (ρi − Mi)vi. Together we have li = (1 − ρi)βi + (ρi − Mi ∧ ρi)vi, where a ∧ b = min(a, b).

(iii) If Ci + Di < X < Ci + Di + τi, then a loss ρi gi is incurred due to decay while waiting for the transporter, ρi is then exported at a reduced value of vi − gi, and 1 − ρi is sold locally while fresh. Thus li = ρi gi + (1 − ρi)βi + (ρi − Mi ∧ ρi)(vi − gi).

(iv) If X > Ci + Di + τi, then li = ui(τi) as before.

Therefore the expected loss for product i is a sum of four components:

• E1 = E[li I{X<Ci}] = βi Pr(X < Ci) = βi E[1 − e^{−δCi}] = βi(1 − E[e^{−δCi}]).

• E2 = E[li I{Ci<X<Ci+Di}] = [(1 − ρi)βi + (ρi − E[Mi ∧ ρi])vi] E[e^{−δCi} − e^{−δ(Ci+Di)}]
  = [(1 − ρi)βi + (ρi − mi(ρi))vi](1 − bi) E[e^{−δCi}],
  where mi(ρ) = E[Mi ∧ ρ] and bi = E[e^{−δDi}].

• E3 = E[li I{Ci+Di<X<Ci+Di+τi}]. Note that E[gi I{Ci+Di<X<Ci+Di+τi}] = ai(τi) bi E[e^{−δCi}] and Pr(Ci + Di < X < Ci + Di + τi) = (1 − e^{−δτi}) bi E[e^{−δCi}]. Thus

  E3 = E[{ρi gi + (1 − ρi)βi + (ρi − Mi ∧ ρi)(vi − gi)} I{Ci+Di<X<Ci+Di+τi}]
     = E[{(Mi ∧ ρi) gi + (1 − ρi)βi + (ρi − Mi ∧ ρi)vi} I{Ci+Di<X<Ci+Di+τi}]
     = mi(ρi) ai(τi) bi E[e^{−δCi}] + [(1 − ρi)βi + (ρi − mi(ρi))vi](1 − e^{−δτi}) bi E[e^{−δCi}].

• E4 = ui(τi) Pr(X > Ci + Di + τi) = ui(τi) E[e^{−δ(Ci+Di+τi)}] = ui(τi) bi e^{−δτi} E[e^{−δCi}].

Adding E1 to E4 together, the expected loss for product i is given by

ELi = βi(1 − E[e^{−δCi}]) + [(1 − ρi)βi + (ρi − mi(ρi))vi](1 − bi) E[e^{−δCi}]
      + { mi(ρi) ai(τi) bi + [(1 − ρi)βi + (ρi − mi(ρi))vi](1 − e^{−δτi}) bi } E[e^{−δCi}]
      + ui(τi) bi e^{−δτi} E[e^{−δCi}]
    = βi − [βi − hi(ρi, τi)] E[e^{−δCi}],    (10.62)

where

hi(ρ, τ) = [(1 − ρ)βi + (ρ − mi(ρ))vi](1 − bi e^{−δτ}) + mi(ρ) ai(τ) bi + ui(τ) bi e^{−δτ}.    (10.63)

Based on (10.62) and (10.63), similar arguments as in Sects. 10.3.3 and 10.3.4 lead to Theorem 10.18 below.

Theorem 10.18. With the uncertain market demand as described in this section, the optimal policy is given by the following rules:

• (ρ∗i, τ∗i) is the minimizer of hi(ρ, τ) with respect to (ρ, τ);

• s∗i = 0;

• The optimal static sequence π∗ is in nonincreasing order of ωi fi/(1 − fi), where fi = E[e^{−δPi}] and ωi = βi − hi(ρ∗i, τ∗i) ≥ 0, since mi(0) = 0, ai(0) = 0 and ui(0) = βi imply hi(ρ∗i, τ∗i) ≤ hi(0, 0) = βi(1 − bi) + βi bi = βi;

• The optimal dynamic sequence given by Theorems 10.16 and 10.17 remains valid with ωi defined in (10.57) replaced by ωi = βi − hi(ρ∗i, τ∗i).

We may further reduce the bivariate minimization of hi(ρ, τ) to a univariate minimization problem under certain regularity conditions. For example, if the cdf Fi(m) of Mi is a continuous function and strictly increasing in its support, then

mi(ρ) = ∫_{0}^{ρ} [1 − Fi(m)] dm   =⇒   (d/dρ) mi(ρ) = 1 − Fi(ρ)   =⇒   (d/dρ)(ρ − mi(ρ)) = Fi(ρ).

Partially differentiating hi(ρ, τ) with respect to ρ, we get


∂hi(ρ, τ)/∂ρ = −βi(1 − bi e^{−δτ}) + Fi(ρ) vi(1 − bi e^{−δτ}) + [1 − Fi(ρ)] ai(τ) bi = 0

=⇒   ρ = Fi^{−1}( [βi(1 − bi e^{−δτ}) − ai(τ) bi] / [vi(1 − bi e^{−δτ}) − ai(τ) bi] ) = ρi(τ), say.    (10.64)

Thus τ∗i is the minimizer of the univariate function hi(ρi(τ), τ) and ρ∗i = ρi(τ∗i).
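A minimal numerical sketch of this univariate reduction is given below; the parameter values and the demand distribution are hypothetical, and the decay and loss functions follow the exponential-decay form used in Example 10.3 below:

```python
import numpy as np

# Hypothetical primitives for one product (the subscript i is dropped)
delta, beta, v, b = 0.1, 6.0, 10.0, 0.8   # rate of X, local-sale loss, export value, b = E[e^{-delta D}]
alpha, mu, theta = 4.0, 0.5, 0.6          # decay g(x) = alpha(1 - e^{-mu x}); theta = mean of demand M

g = lambda x: alpha * (1.0 - np.exp(-mu * x))
u = lambda t: beta + g(t)                                      # loss once the fresh period has expired
m = lambda r: theta * (1.0 - np.exp(-r / theta))               # m(r) = E[M ^ r] for exponential demand
F_inv = lambda q: -theta * np.log(1.0 - min(q, 1.0 - 1e-12))   # quantile function of the demand M

def a(t, n=2000):                          # a(t) = int_0^t g(x) delta e^{-delta x} dx (numerically)
    if t <= 0.0:
        return 0.0
    x = np.linspace(0.0, t, n)
    return float(np.sum(g(x) * delta * np.exp(-delta * x)) * t / n)

def h(r, t):                               # expected-loss kernel (10.63)
    disc = b * np.exp(-delta * t)
    return ((1 - r) * beta + (r - m(r)) * v) * (1 - disc) + m(r) * a(t) * b + u(t) * disc

def rho(t):                                # fractile formula (10.64), capped at the total quantity 1
    c = 1.0 - b * np.exp(-delta * t)
    q = (beta * c - a(t) * b) / (v * c - a(t) * b)
    return min(F_inv(max(q, 0.0)), 1.0)

taus = np.linspace(0.0, 40.0, 401)         # one-dimensional grid search over tau
vals = [h(rho(t), t) for t in taus]
k = int(np.argmin(vals))
print(f"tau* ~ {taus[k]:.1f}, rho* ~ {rho(taus[k]):.3f}, minimal h ~ {vals[k]:.3f}")
```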

A simple example to derive (ρ∗i, τ∗i) is illustrated below, which also shows a parallel to the well-known newsvendor problem for optimal strategy with uncertain market demand.

Example 10.3. For convenience we drop the subscript i in (10.63). Then we look at minimizing

h(ρ, τ) = [(1 − ρ)β + (ρ − m(ρ))v](1 − b e^{−δτ}) + m(ρ) a(τ) b + u(τ) b e^{−δτ}    (10.65)

with respect to (ρ, τ).

Consider an exponential decay function g(x) = α(1 − exp(−µx)), x ≥ 0, where α > 0 represents the maximum possible loss due to decay and µ > 0 is the rate of decay. Note that exponential functions have been commonly used to model the decay process of perishable products; see Raafat (1991), and Blackburn and Scudder (2009).

Let u(x) = β + g(x). Then u′(x) = g′(x) = αµ e^{−µx} and a′(x) = g(x) δ e^{−δx}. Partially differentiate h(ρ, τ) in (10.65) with respect to τ:

h′τ(ρ, τ) = ∂h(ρ, τ)/∂τ
          = [(1 − ρ)β + (ρ − m(ρ))v] bδ e^{−δτ} + m(ρ) a′(τ) b + u′(τ) b e^{−δτ} − u(τ) bδ e^{−δτ}
          = b e^{−δτ} { [(1 − ρ)β + (ρ − m(ρ))v] δ + m(ρ) g(τ) δ + αµ e^{−µτ} − [β + g(τ)] δ }
          = b e^{−δτ} { [(1 − ρ)β + (ρ − m(ρ))v] δ + θ(ρ, τ) }, say,    (10.66)

where θ(ρ, τ) = αµ e^{−µτ} − βδ − m̄(ρ) g(τ) δ with m̄(ρ) = 1 − m(ρ). Further, since

∂θ(ρ, τ)/∂τ = −αµ² e^{−µτ} − m̄(ρ) αµ e^{−µτ} δ = −αµ e^{−µτ}(µ + δ m̄(ρ)) ≤ 0,

θ(ρ, τ) is nonincreasing in τ for every ρ, with θ(ρ, ∞) = −βδ − m̄(ρ)αδ < 0. As a result, by (10.66), there exists τ0 ≥ 0 such that h′τ(ρ, τ) ≥ 0 for τ < τ0 and h′τ(ρ, τ) ≤ 0 for τ > τ0. It follows that for each ρ, h(ρ, τ) is nondecreasing in τ < τ0 and nonincreasing in τ > τ0. This implies that the minimizer (ρ∗, τ∗) of h(ρ, τ) must be either

(i) τ∗ = 0, ρ∗ = ρ(0) = F^{−1}(β/v) (see (10.64)) if h(ρ(0), 0) ≤ h(ρ(∞), ∞); or

(ii) τ∗ = ∞, ρ∗ = ρ(∞) = F^{−1}( (β − a(∞)b) / (v − a(∞)b) ) if h(ρ(0), 0) > h(ρ(∞), ∞), where

a(∞) = ∫_{0}^{∞} g(x) δ e^{−δx} dx = ∫_{0}^{∞} α(1 − e^{−µx}) δ e^{−δx} dx = α( 1 − δ/(µ + δ) ) = αµ/(µ + δ).

Remark 10.14. There is an interesting observation about the results of cases (i) and (ii) above. In case (i), we can regard v as the retail price of the product in the export market, and β as the relative profit between selling it in the export market and in the local market. Then β/v is exactly the critical fractile in the newsvendor problem (see, e.g., Stevenson 2009), and ρ∗ = F^{−1}(β/v) is the optimal order quantity, with F^{−1} being the inverse cumulative distribution function of the demand. A similar explanation also applies to case (ii), in which we need to deduct from both v and β the expected decaying cost a(∞)b of waiting for the transporter after the fresh period expires.
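Continuing Example 10.3 with the same hypothetical numbers as in the earlier sketch, the two candidate solutions can be compared directly; β/v and (β − a(∞)b)/(v − a(∞)b) play the role of the newsvendor critical fractile:

```python
import numpy as np

# Hypothetical parameters (subscript i dropped), as in the sketch above
delta, beta, v, b = 0.1, 6.0, 10.0, 0.8
alpha, mu, theta = 4.0, 0.5, 0.6                       # theta = mean of the exponential export demand M

F_inv = lambda q: -theta * np.log(1.0 - q)             # quantile function of M
m = lambda r: theta * (1.0 - np.exp(-r / theta))       # m(r) = E[M ^ r]
a_inf = alpha * mu / (mu + delta)                      # a(inf) = alpha*mu/(mu + delta)

rho0 = F_inv(beta / v)                                 # case (i): newsvendor fractile beta/v
rho_inf = F_inv((beta - a_inf * b) / (v - a_inf * b))  # case (ii) fractile

h0 = ((1 - rho0) * beta + (rho0 - m(rho0)) * v) * (1 - b) + beta * b                    # h(rho(0), 0)
h_inf = ((1 - rho_inf) * beta + (rho_inf - m(rho_inf)) * v) + m(rho_inf) * a_inf * b    # h(rho(inf), inf)

if h0 <= h_inf:
    print(f"case (i):  tau* = 0,   rho* = {rho0:.3f},  h = {h0:.3f}")
else:
    print(f"case (ii): tau* = inf, rho* = {rho_inf:.3f}, h = {h_inf:.3f}")
```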


References

Adiri, I., Bruno, J., Frostig, E., & Rinnooy Kan, A. H. G. (1989). Single machine flowtime scheduling with a single breakdown. Acta Informatica, 26, 679–696.
Adiri, I., Frostig, E., & Rinnooy Kan, A. H. G. (1991). Scheduling on a single machine with a single breakdown to minimize stochastically the number of tardy jobs. Naval Research Logistics, 38, 261–271.
Akkerman, R., Farahani, P., & Grunow, M. (2010). Quality, safety and sustainability in food distribution: A review of quantitative operations management approaches and challenges. OR Spectrum, 32, 863–904.
Alidaee, B., & Womer, N. K. (1999). Scheduling with time dependent processing times: Review and extensions. Journal of the Operational Research Society, 50, 711–720.
Amoura, A. K., Bampis, E., Kenyon, C., & Manoussakis, Y. (2002). Scheduling independent multiprocessor tasks. Algorithmica, 32, 247–261.
Anzanello, M. J., & Fogliatto, F. S. (2011). Learning curve models and applications: Literature review and research directions. International Journal of Industrial Ergonomics, 41, 573–583.
Arbib, C., Pacciarelli, D., & Smriglio, S. (1999). A three-dimensional matching model for perishable production scheduling. Discrete Applied Mathematics, 92, 1–15.
Bachman, A., Janiak, A., & Kovalyov, M. Y. (2002). Minimizing the total weighted completion time of deteriorating jobs. Information Processing Letters, 81, 81–84.
Bagga, P. C., & Kalra, K. R. (1981). Single machine scheduling problem with quadratic functions of completion times – A modified approach. Journal of Information & Optimization Sciences, 2, 103–108.
Baker, K. R., & Scudder, G. D. (1990). Sequencing with earliness and tardiness penalties: A review. Operations Research, 38, 22–36.
Ball, M., Barnhart, C., Nemhauser, G., & Odoni, A. (2007). Air transportation: Irregular operations and control. In C. Barnhart & G. Laporte (Eds.), Handbooks in operations research and management science Vol. 14 – Transportation (pp. 1–68). Amsterdam: Elsevier.
Bank, P., & Kuchler, C. (2007). On Gittins’ index theorem in continuous time. Stochastic Processes and Their Applications, 117, 1357–1371.
Banerjee, B. P. (1965). Single facility sequencing with random execution times. Operations Research, 13, 358–364.
Bast, H. (1998). Dynamic scheduling with incomplete information. Proceedings of the tenth annual ACM symposium on parallel algorithms and architectures, Puerto Vallarta (pp. 182–191).


Banks, J. S., & Sundaram, R. K. (1994). Switching costs and the Gittins index. Econometrica: Journal of the Econometric Society, 62(3), 687–694.
Benmansour, R., Allaoui, H., & Artiba, A. (2012). Stochastic single machine scheduling with random common due date. International Journal of Production Research, 50, 3560–3571.
Bertsimas, D., & Nino-Mora, J. (1996). Conservation laws, extended polymatroid and multi-armed bandit problems: A unified approach to indexable systems. Mathematics of Operations Research, 21, 257–306.
Bertoin, J. (1996). Levy processes. New York: Cambridge University Press.
Bianco, L., Blazewicz, J., Dell’Olmo, P., & Drozdowski, M. (1995). Scheduling multiprocessor tasks on a dynamic configuration of dedicated processors. Annals of Operations Research, 58, 493–517.
Birge, J., Frenk, J. B. G., Mittenthal, J., & Rinnooy Kan, A. H. G. (1990). Single-machine scheduling subject to stochastic breakdown. Naval Research Logistics, 37, 661–677.
Biskup, D. (1999). Single-machine scheduling with learning considerations. European Journal of Operational Research, 115, 173–178.
Biskup, D. (2008). A state-of-the-art review on scheduling with learning effects. European Journal of Operational Research, 188, 315–329.
Bisgaard, T. M., & Zoltan, S. (2000). Characteristic functions and moments sequences: Positive definiteness in probability. Huntington: Nova Science.
Blackburn, J., & Scudder, G. (2009). Supply chain strategies for perishable products: The case of fresh produce. Production and Operations Management, 18, 129–137.
Blazewicz, J., Ecker, K., Schmidt, G., & Weglarz, J. (1993). Scheduling in computer and manufacturing systems. Berlin: Springer.
Blazewicz, J., Drozdowski, M., & Weglarz, J. (1994). Scheduling multiprocessor tasks – A survey. Microcomputer Applications, 13, 89–97.
Boxma, O. J., & Forst, F. G. (1986). Minimizing the expected weighted number of tardy jobs in stochastic flow shops. Operations Research Letters, 5, 119–126.
Boys, R. J., Glazebrook, K. D., & McCrone, C. M. (1997). Single machine scheduling when processing times are correlated normal random variables. European Journal of Operational Research, 102, 111–123.
Brown, M., & Solomon, H. (1973). Optimal issuing policies under stochastic field lives. Journal of Applied Probability, 10, 761–768.
Browne, S., & Yechiali, U. (1989). Dynamic priority rules for cyclic type queues. Advances in Applied Probability, 10, 432–450.
Browne, S., & Yechiali, U. (1990). Scheduling deteriorating jobs on a single processor. Operations Research, 38, 495–498.
Bruno, J. (1985). On scheduling tasks with exponential service times and in-tree precedence constraints. Acta Informatica, 22, 139–148.
Buzacott, J. A., & Shanthikumar, J. G. (1993). Stochastic models of manufacturing systems. Englewood Cliffs: Prentice Hall.
Cai, X. Q., & Zhou, X. (1997b). Scheduling jobs with random processing times to minimize weighted completion time variance. Annals of Operations Research, 70, 241–260.
Cai, X. Q., & Zhou, X. (1999). Stochastic scheduling on parallel machines subject to random breakdowns to minimize expected costs for earliness and tardy jobs. Operations Research, 47, 422–437.


Cai, X. Q., & Zhou, X. (2000). Asymmetric earliness-tardiness scheduling with exponential processing times on an unreliable machine. Annals of Operations Research, 98, 313–331.
Cai, X. Q., & Zhou, X. (2005). Single-machine scheduling with exponential processing times and general stochastic cost functions. Journal of Global Optimization, 31, 317–332.
Cai, X. Q., & Zhou, X. (2013, to appear). Optimal policies for perishable products when transportation to export market is disrupted. Production and Operations Management. doi: 10.1111/poms.12080.
Cai, X. Q., Sun, X., & Zhou, X. (2003). Stochastic scheduling with preemptive-repeat machine breakdowns to minimize the expected weighted flowtime. Probability in the Engineering and Informational Sciences, 17, 467–485.
Cai, X. Q., Sun, X., & Zhou, X. (2004). Stochastic scheduling subject to machine breakdowns: The preemptive-repeat model with discounted reward and other criteria. Naval Research Logistics, 51, 800–817.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2005). Dynamically optimal policies for stochastic scheduling subject to preemptive-repeat breakdowns. IEEE Transactions on Automation Science and Engineering, 2, 158–172.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2007a). Single-machine scheduling with general costs under compound-type distributions. Journal of Scheduling, 10, 77–84.
Cai, X. Q., Wang, L., & Zhou, X. (2007b). Single-machine scheduling to stochastically minimize maximum lateness. Journal of Scheduling, 10(4), 293–301.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2009a). Stochastic scheduling on parallel machines to minimize discounted holding costs. Journal of Scheduling, 12(4), 375–388.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2009b). Stochastic scheduling subject to preemptive-repeat breakdowns with incomplete information. Operations Research, 57(5), 1236–1249. With an electronic companion.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2011). Scheduling deteriorating jobs on a single machine subject to breakdowns. Journal of Scheduling, 14(2), 173–186.
Cai, X. Q., & Tu, F. S. (1996). Scheduling jobs with random processing times on a single machine subject to stochastic breakdowns to minimize early-tardy penalties. Naval Research Logistics, 43, 1127–1146.
Cai, X. Q., & Zhou, X. (1997a). Scheduling stochastic jobs with asymmetric earliness and tardiness penalties. Naval Research Logistics, 44, 531–557.
Cai, X. Q., Wu, X. Y., & Zhou, X. (2013). Optimal dynamic stochastic scheduling with partial losses of work. Working Paper, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong.
Cai, X. Q., & Zhou, X. (2004). Deterministic and stochastic scheduling with team-work tasks. Naval Research Logistics, 51, 818–840.
Cai, X. Q., Lee, C.-Y., & Li, C. L. (1998). Minimizing total flow time in two-processor task systems with prespecified processor allocations. Naval Research Logistics, 45, 231–242.
Cai, X. Q., Lee, C.-Y., & Wong, T. L. (2000). Multi-processor task scheduling to minimize the maximum tardiness and the total completion time. IEEE Transactions on Robotics and Automation, 16, 824–830.
Cai, X. Q., Chen, J., Xiao, Y. B., & Xu, X. L. (2008). Product selection, machine time allocation, and scheduling decisions for manufacturing perishable products subject to a deadline. Computers and Operations Research, 35(5), 1671–1683.


Cai, X. Q., Chen, J., Xiao, Y. B., & Xu, X. L. (2010). Optimization and coordination of fresh product supply chains with freshness keeping effort. Production and Operations Management, 19(3), 261–278.
Chang, S. E., Ericson, D., & Pearce, L. (2003). Airport closures in natural and human-induced disasters: Business vulnerability and planning. Report for Office of Critical Infrastructure Protection and Emergency Preparedness (Canada), Catalogue No.: PS4-8/2004E-PDF; ISBN: 0-662-37716-8.
Chang, C., & Yao, D. (1993). Rearrangement, majorization and stochastic scheduling. Mathematics of Operations Research, 18(3), 658–684.
Chang, C. S., Chao, X., Pinedo, M., & Weber, R. (1992). On the optimality of LEPT and cµ rules for machines in parallel. Journal of Applied Probability, 29, 667–681.
Chen, C. Y. I., George, E. I., & Tardif, V. (2001). A Bayesian model of cycle time prediction. IIE Transactions, 33(10), 921–930.
Cheng, T. C. E., Ding, Q., & Lin, B. M. T. (2004). A concise survey of scheduling with time-dependent processing times. European Journal of Operational Research, 152, 1–13.
Chimento, P. F., & Trivedi, K. S. (1993). The completion time of programs on processors subject to failure and repair. IEEE Transactions on Computers, 42, 1184–1194.
Chopra, S., & Meindl, P. (2001). Supply chain management: Strategy, planning, and operation. Upper Saddle River: Prentice Hall.
Coffman, E. G., Jr., Flatto, L., & Wright, P. E. (1993). Optimal stochastic allocation of machines under waiting-time constraints. SIAM Journal on Computing, 22, 332–348.
Cooper, W. L. (2001). Pathwise properties and performance bounds for a perishable inventory system. Operations Research, 49, 455–466.
Crosbie, J. H., & Glazebrook, K. D. (2000). Index policies and a novel performance space structure for a class of generalised branching bandit problems. Mathematics of Operations Research, 25, 281–297.
Crabill, T. B., & Maxwell, W. L. (1969). Single machine sequencing with random processing times and random due dates. Naval Research Logistics Quarterly, 19, 549–554.
Dellacherie, C., & Meyer, P. (1978). Probabilities and potential A. Amsterdam/New York: North-Holland.
Dellacherie, C., & Meyer, P. (1982). Probabilities and potential B: Theory of martingales. Amsterdam/Oxford: North-Holland.
Denneberg, D. (1994). Non-additive measure and integral. Dordrecht/Boston: Kluwer.
Derman, C., Lieberman, G., & Ross, S. (1978). A renewal decision problem. Management Science, 24, 554–561.
Drozdowski, M. (1996). Scheduling multiprocessor tasks – An overview. European Journal of Operational Research, 94, 215–230.
Duffy, J. A. (2000). Service recovery. In J. A. Fitzsimmons (Ed.), New service development: Creating memorable experiences (pp. 277–290). Thousand Oaks: SAGE.
Eilon, S., & Chowdhury, I. G. (1977). Minimizing waiting time variance in the single machine problem. Management Science, 23, 567–575.
EL Karoui, N., & Karatzas, I. (1993). General Gittins index processes in discrete time. Proceedings of the National Academy of Sciences of the United States of America, 90, 1232–1236.
EL Karoui, N., & Karatzas, I. (1994). Dynamic allocation problems in continuous time. The Annals of Applied Probability, 4(2), 255–286.


EL Karoui, N., & Karatzas, I. (1997). Synchronization and optimality for multi-armed bandit problems in continuous time. Computational and Applied Mathematics, 16(2), 117–151.
Emmons, H., & Pinedo, M. (1990). Scheduling stochastic jobs with due dates on parallel machines. European Journal of Operational Research, 47, 49–55.
Erel, E., & Sarin, S. C. (1989). Scheduling independent jobs with stochastic processing times and a common due date on parallel and identical machines. Annals of Operations Research, 17, 181–198.
Evans, S. R., & Norback, J. P. (1985). The impact of a decision-support system for vehicle routing in a foodservice supply situation. Journal of the Operational Research Society, 52, 467–472.
Fawcett, P., Mcleish, R., & Ogden, I. (1992). Logistics management. Harlow: Prentice Hall.
Federgruen, A., & Mosheiov, G. (1997). Single machine scheduling problems with general breakdowns, earliness and tardiness costs. Operations Research, 45, 66–71.
Ferguson, M., & Koenigsberg, O. (2007). How should a firm manage deteriorating inventory? Production and Operations Management, 16, 306–321.
Feller, W. (1966). An introduction to probability theory and its applications (Vol. II). New York: Wiley.
Frostig, E. (1991). A note on stochastic scheduling on a single machine subject to breakdown – The preemptive repeat model. Probability in the Engineering and Informational Sciences, 5, 349–354.
Gage, A., & Murphy, R. R. (2004). Sensor scheduling in mobile robots using incomplete information via Min-Conflict with Happiness. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34, 454–467.
Gardoni, P., Reinschmidt, K. F., & Kumar, R. (2007). A probabilistic framework for Bayesian adaptive forecasting of project progress. Computer-Aided Civil and Infrastructure Engineering, 22, 182–196.
Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. New York: W. H. Freeman.
Gehringer, E. F., Siewiorek, D. P., & Segall, Z. (1987). Parallel processing: The Cm∗ experience. Bedford: Digital Press.
Gel, E. S., Hopp, W. J., & Van Oyen, M. P. (2002). Factors affecting opportunity of worksharing as a dynamic line balancing mechanism. IIE Transactions, 34, 847–863.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). Journal of the Royal Statistical Society B, 41, 148–164.
Gittins, J. C. (1989). Multi-armed bandit allocation indices (Wiley-Interscience series in systems and optimization). Chichester: Wiley. ISBN: 0-471-92059-2.
Gittins, J. C., & Jones, D. (1974). A dynamic allocation index for the sequential allocation of experiments. In J. Gani, et al. (Eds.), Progress in statistics. Amsterdam: North Holland.
Gittins, J. C., & Glazebrook, K. D. (1977). On Bayesian models in stochastic scheduling. Journal of Applied Probability, 14, 556–565.
Glazebrook, K. D. (1979). Scheduling tasks with exponential service times on parallel processors. Journal of Applied Probability, 16, 685–689.
Glazebrook, K. D. (1984). Scheduling stochastic jobs on a single machine subject to breakdowns. Naval Research Logistics Quarterly, 31, 251–264.
Glazebrook, K. D. (1987). Evaluating the effects of machine breakdowns in stochastic scheduling problems. Naval Research Logistics, 34, 319–335.


Glazebrook, K. D. (1991). On nonpreemptive policies for stochastic single-machine scheduling with breakdowns. Probability in the Engineering and Informational Sciences, 5, 77–87.
Glazebrook, K. D., & Boys, R. J. (1995). A class of Bayesian models for optimal exploration. Journal of the Royal Statistical Society, Series B (Methodological), 57, 705–720.
Glazebrook, K. D., & Owen, R. W. (1991). New results for generalised bandit processes. International Journal of Systems Science, 22, 479–494.
Glazebrook, K. D. (2005). Optimal scheduling of tasks when service is subject to disruption: The preempt-repeat case. Mathematical Methods of Operations Research, 61, 147–169.
Goyal, S. K., & Giri, B. C. (2001). Recent trends in modeling of deteriorating inventory. European Journal of Operational Research, 134, 1–16.
Groenevelt, H., Pintelon, L., & Seidmann, A. (1992). Production batching with machine breakdowns and safety stocks. Operations Research, 40, 959–971.
Hall, N. G., & Potts, C. N. (2004). Rescheduling for new orders. Operations Research, 52, 440–453.
Hanemann, A., Sailer, M., & Schmitz, D. (2005). Towards a framework for IT service fault management. In Proceedings of the European University information systems conference (EUNIS 2005), Manchester.
Heathcote, C. R. (1961). Preemptive priority queueing. Biometrika, 48, 57–63.
Herrmann, J. W. (2006). Rescheduling strategies, policies, and methods. In J. W. Herrmann (Ed.), Handbook of production scheduling (pp. 135–148). New York: Springer.
Herroelen, W., & Leus, R. (2005). Project scheduling under uncertainty: Survey and research potentials. European Journal of Operational Research, 165, 289–306.
Hino, C. M., Ronconi, D. P., & Mendes, A. B. (2005). Minimizing earliness and tardiness penalties in a single-machine problem with a common due date. European Journal of Operational Research, 160, 190–201.
Hoogeveen, J. A. (2005). Multicriteria scheduling. European Journal of Operational Research, 167, 592–623.
Hoogeveen, H., Lente, C., & T’kindt, V. (2012). Rescheduling for new orders on a single machine with setup times. European Journal of Operational Research, 223, 40–46.
Hopkins, A. L., Smith, T. B., III, & Lala, J. H. (1978). FTMP – A highly reliable fault-tolerant multiprocessor for aircraft. Proceedings of the IEEE, 66(10), 1221–1239.
Hordijk, A., & Koole, G. (1993). On the optimality of LEPT and µc rules for parallel processors and dependent arrival processes. Advances in Applied Probability, 25, 979–996.
Hsu, V. N. (2000). Dynamic economic lot size model with perishable inventory. Management Science, 46, 1159–1169.
Huang, C. C., & Weiss, G. (1992). Scheduling jobs with stochastic processing times and due dates to minimize total tardiness. Communications in Statistics. Stochastic Models, 8, 529–541.
Iannaccone, G., Chuah, C., Mortier, R., Bhattacharyya, S., & Diot, C. (2002). Analysis of link failures in an IP backbone. In Proceedings of ACM SIGCOMM internet measurement workshop, Marseille.
Iijima, M., Komatsu, S., & Katoh, S. (1996). Hybrid just-in-time logistics systems and information networks for effective management in perishable food industries. International Journal of Production Economics, 44, 97–103.
Ishikida, T., & Varaiya, P. (1994). Multi-armed bandit problem revisited. Journal of Optimization Theory and Applications, 83(1), 113–154.


Jackson, J. R. (1955). Scheduling a production line to minimize maximum tardiness (Research Report 43). Management Science Research Project, UCLA.
Jain, S., & Foley, W. J. (2002). Impact of interruptions on schedule execution in flexible manufacturing systems. International Journal of Flexible Manufacturing Systems, 14, 319–344.
Jang, W., & Klein, C. M. (2002). Minimizing the expected number of tardy jobs when processing times are normally distributed. Operations Research Letters, 30, 100–106.
Kampke, T. (1987a). On the optimality of static priority policies in stochastic scheduling on parallel machines. Journal of Applied Probability, 24, 430–448.
Kampke, T. (1987b). Necessary optimality conditions for priority policies in stochastic weighted flowtime scheduling problems. Advances in Applied Probability, 19(3), 749–750.
Kampke, T. (1989). Optimal scheduling of jobs with exponential service times on identical parallel processors. Operations Research, 37, 126–133.
Karatzas, I., & Shreve, S. E. (1998). Methods of mathematical finance. New York: Springer.
Kaspi, H., & Mandelbaum, A. (1995). Levy bandits: Multi-armed bandits driven by Levy processes. Annals of Applied Probability, 5(2), 541–565.
Kaspi, H., & Mandelbaum, A. (1998). Multi-armed bandits in discrete and continuous time. Annals of Applied Probability, 8(4), 1270–1290.
Karp, R. M. (1972). Reducibility among combinatorial problems. In R. E. Miller & J. W. Thatcher (Eds.), Complexity of computer computations (pp. 85–103). New York: Plenum Press.
Koulamas, C., & Kyparisis, G. J. (2007). Single-machine and two-machine flowshop scheduling with general learning functions. European Journal of Operational Research, 178, 402–407.
Krawczyk, H., & Kubale, M. (1985). An approximation algorithm for diagnostic test scheduling in multi-computer systems. IEEE Transactions on Computers, 34, 869–872.
Kuo, W. H., & Yang, D. L. (2006). Minimizing the total completion time in a single machine scheduling problem with a time-dependent learning effect. European Journal of Operational Research, 174, 1184–1190.
Kuo, W., Chien, W. T. K., & Kim, T. (1998). Reliability, yield, and stress burn-in. Boston: Kluwer.
Lai, T. L., & Ying, Z. (1988). Open bandit processes and optimal scheduling of queueing networks. Advances in Applied Probability, 20, 447–472.
Lauff, V., & Werner, F. (2004). Scheduling with common due date, earliness and tardiness penalties for multimachine problems: A survey. Mathematical and Computer Modelling, 40, 637–655.
Lawler, E. L., Lenstra, J. K., & Rinnooy Kan, A. H. G. (1982). Recent developments in deterministic sequencing and scheduling: A survey. In M. A. H. Dempster, J. K. Lenstra, & A. H. G. Rinnooy Kan (Eds.), Deterministic and stochastic scheduling. Dordrecht: D. Reidel.


Lee, W. C. (2011). Scheduling with general position-based learning curves. Information Sciences, 181, 5515–5522.
Lee, S. I., & Kitanidis, P. K. (1991). Optimal estimation and scheduling in aquifer remediation with incomplete information. Water Resources Research, 27, 2203–2217.
Lee, C. Y., & Lin, C. S. (2001). Single-machine scheduling with maintenance and repair rate-modifying activities. European Journal of Operational Research, 135, 493–513.
Lee, C. Y., & Yu, G. (2007). Single machine scheduling under potential disruption. Operations Research Letters, 35, 541–548.


Liskyla-Peuralahti, J., Spies, M., & Tapaninen, U. (2011). Transport vulnerabilities and critical industries: Experiences from a Finnish stevedore strike. International Journal of Risk Assessment and Management, 15, 222–240.
Li, W., Braun, W. J., & Zhao, Y. Q. (1998). Stochastic scheduling on a repairable machine with Erlang uptime distribution. Advances in Applied Probability, 30(4), 1073–1088.
Luenberger, D. G. (1984). Linear and nonlinear programming. Reading: Addison-Wesley.
Mandelbaum, A. (1986). Discrete multiarmed bandits and multiparameter processes. Probability Theory and Related Fields, 71, 129–147.
Mandelbaum, A. (1987). Continuous multi-armed bandits and multiparameter processes. Annals of Probability, 15(4), 1527–1556.
Mehta, S. V., & Uzsoy, R. M. (1998). Predictable scheduling of a job shop subject to breakdowns. IEEE Transactions on Robotics and Automation, 14, 365–378.
Merten, A. G., & Muller, M. E. (1972). Variance minimization in single machine sequencing problems. Management Science, 18, 518–528.
Mittenthal, J., & Raghavachari, M. (1993). Stochastic single machine scheduling with quadratic early-tardy penalties. Operations Research, 41, 786–796.
Mosheiov, G. (1991). V-shaped policies for scheduling deteriorating jobs. Operations Research, 39, 979–991.
Mosheiov, G. (2001). Scheduling problems with a learning effect. European Journal of Operational Research, 52, 687–693.
Nahmias, S. (1982). Perishable inventory theory: A review. Operations Research, 30, 680–708.
Nandakumar, P., & Morton, T. E. (1993). Near myopic heuristics for the fixed-life perishability problem. Management Science, 39, 1490–1498.
Nash, P. (1973). Optimal allocation of resources between research projects. Ph.D. Thesis, Cambridge University.
Nash, P. (1980). A generalized bandit problem. Journal of the Royal Statistical Society, Series B, 42(2), 165–169.
Nicola, V. F., Kulkarni, V. G., & Trivedi, K. S. (1987). A queueing analysis of fault-tolerant computer systems. IEEE Transactions on Software Engineering, 13, 363–375.
Ow, P. S. (1985). Focused scheduling in proportionate flowshops. Management Science, 31(7), 852–869.
Panwalker, S. S., Smith, M. L., & Seidmann, A. (1982). Common due date assignment to minimize total penalty for the one machine scheduling problem. Operations Research, 30, 391–399.
Parzen, E. (1992). Modern probability theory and its applications. New York: Wiley.
Peskir, G., & Shiryaev, A. N. (2006). Optimal stopping and free-boundary problems (Lectures in mathematics). Basel: ETH Zurich/Birkhauser.
Pinedo, M. (1983). Stochastic scheduling with release dates and due dates. Operations Research, 31, 559–572.
Pinedo, M. (2002). Scheduling: Theory, algorithms, and systems (2nd ed.). Englewood Cliffs: Prentice Hall.
Pinedo, M., & Rammouz, E. (1988). A note on stochastic scheduling on a single machine subject to breakdown and repair. Probability in the Engineering and Informational Sciences, 2, 41–49.
Pinedo, M., & Wei, S. H. (1986). Inequalities for stochastic flowshops and job shops. Applied Stochastic Models and Data Analysis, 2, 61–69.


Pinedo, M., & Weiss, G. (1987). The largest variance first policy in some stochastic scheduling problems. Operations Research, 35, 884–891.
Qi, X. D., Yin, G., & Birge, J. R. (2000a). Scheduling problems with random processing times under expected earliness/tardiness costs. Stochastic Analysis and Applications, 18, 453–473.
Qi, X. D., Yin, G., & Birge, J. R. (2000b). Single machine scheduling with random machine breakdowns and randomly compressible processing times. Stochastic Analysis and Applications, 18, 635–653.
Raafat, F. (1991). Survey of literature on continuously deteriorating inventory models. Journal of the Operational Research Society, 42, 27–37.
Righter, R. (1988). Job scheduling to minimize expected weighted flowtime on uniform processors. Systems Control Letters, 10, 211–216.
Righter, R. (1994). Scheduling. In M. Shaked & J. G. Shanthikumar (Eds.), Stochastic orders and their applications. Boston: Academic.
Righter, R., & Xu, S. H. (1991). Scheduling jobs on nonidentical IFR processors to minimize general cost functions. Advances in Applied Probability, 23, 909–924.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
Ronconi, D. P., & Powell, W. B. (2010). Minimizing total tardiness in a stochastic single machine scheduling problem using approximate dynamic programming. Journal of Scheduling, 13, 597–607.
Ross, S. M. (1983). Introduction to stochastic dynamic programming. New York: Academic.
Ross, S. M. (1996). Stochastic processes (2nd ed.). New York: Wiley.
Rothkopf, M. H. (1966a). Scheduling with random service times. Management Science, 12, 707–713.
Rothkopf, M. H., & Smith, S. A. (1984). There are no undiscovered priority index sequencing rules for minimizing total delay costs. Operations Research, 32, 451–456.
Rothkopf, M. H. (1966b). Scheduling independent tasks on parallel processors. Management Science, 12, 437–447.
Rowe, J., Jewers, K., Sivayogan, J., Codd, A., & Alcock, A. (1996). Intelligent retail logistics scheduling. AI Magazine, 17, 31–40.
Sarin, S. C., Erel, E., & Steiner, G. (1991). Sequencing jobs on a single machine with a common due date and stochastic processing times. European Journal of Operational Research, 51, 287–302.
Shaked, M., & Shanthikumar, J. G. (1994). Stochastic orders and their applications. Boston: Academic.
Shaked, M., & Shanthikumar, J. G. (2007). Stochastic orders. New York: Springer.
Shanthikumar, J. G., & Yao, D. D. (1991). Bivariate characterization of some stochastic order relations. Advances in Applied Probability, 23, 642–659.
Smith, W. E. (1956). Various optimizers for single-stage production. Naval Research Logistics Quarterly, 3, 59–66.
Snell, L. (1952). Applications of martingale systems theorems. Transactions of the American Mathematical Society, 73, 293–312.
Snyder, L. V., Atan, Z., Peng, P., Rong, Y., Schmitt, A. J., & Sinsoysal, B. (2010, submitted). OR/MS models for supply chain disruptions: A review. Social Science Research Network. http://dx.doi.org/10.2139/ssrn.1689882.


Starbird, S. A. (1988). Optimal loading sequences for fresh-apple storage facilities. Operations Research, 39, 911–917.
Stevenson, W. J. (2009). Operations management (10th ed.). New York: McGraw-Hill.
Stoyanov, J. M. (1997). Counterexamples in probability (2nd ed.). Chichester/New York: Wiley.
Tadei, R., Trubian, M., Avendano, J. L., DellaCroce, F., & Menga, G. (1995). Aggregate planning and scheduling in the food industry: A case study. European Journal of Operational Research, 87, 564–573.
Takine, T., & Sengupta, B. (1997). A single server queue with service interruptions. Journal of Queueing Systems, 26, 285–300.
Townsend, W. (1978). The single-machine problem with quadratic penalty function of completion times: A branch-and-bound solution. Management Science, 24, 530–534.
Trivedi, K. S. (2001). Probability and statistics with reliability, queuing, and computer science applications. New York: Wiley.
Tsitsiklis, J. N. (1994). A short proof of the Gittins index theorem. The Annals of Applied Probability, 4(1), 194–199.
Vakharia, A. J., & Yenipazarli, A. (2009). Managing supply chain disruptions. Foundations and Trends in Technology, Information and Operations Management, 2, 243–325.
Van Oyen, M. P., Pandelis, D. G., & Teneketzis, D. (1992). Optimality of index policies for stochastic scheduling with switching penalties. Journal of Applied Probability, 29(4), 957–966.
Varaiya, P., Walrand, J., & Buyukkoc, C. (1985). Extensions of the multiarmed bandit problem: The discounted case. IEEE Transactions on Automatic Control, 30, 426–439.
Vieira, G. E., Herrmann, J. W., & Lin, E. (2003). Rescheduling manufacturing systems: A framework of strategies, policies, and methods. Journal of Scheduling, 6, 39–62.
Wan, G., & Yen, B. P.-C. (2009). Single machine scheduling to minimize total weighted earliness subject to minimal number of tardy jobs. European Journal of Operational Research, 195, 89–97.
Wang, X., & Cheng, T. C. E. (2007). Single-machine scheduling with deteriorating jobs and learning effects to minimize the makespan. European Journal of Operational Research, 178, 57–70.
Wang, J. B., Wang, D., & Zhang, G. D. (2010). Single-machine scheduling with learning functions. Applied Mathematics and Computation, 216, 1280–1286.
Weber, R. R. (1982a). Scheduling stochastic jobs on parallel machines to minimize makespan or flowtime. In R. Disney & T. Ott (Eds.), Applied probability – Computer science: The interface (pp. 327–337). Boston: Birkhauser.
Weber, R. R. (1982b). Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. Journal of Applied Probability, 19, 167–182.
Weber, R. R. (1988). Stochastic scheduling on parallel processors and minimization of concave functions of completion times. In W. H. Fleming & P. L. Lions (Eds.), Stochastic differential systems, stochastic control theory and applications, Minneapolis, 1986 (IMA volumes in mathematics and its applications, Vol. 10, pp. 601–609). New York: Springer.
Weber, R. R. (1992). On the Gittins index for multiarmed bandits. Annals of Applied Probability, 2(4), 1024–1033.
Weber, R. R., Varaiya, P., & Walrand, J. (1986). Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime. Journal of Applied Probability, 23, 841–847.
Weiss, G. (1984). Scheduling spares with exponential lifetimes in a two component parallel system. Naval Research Logistics Quarterly, 31, 431–446.


Weiss, G. (1988). Branching bandit processes. Probability in the Engineering and Informational Sciences, 2, 269–278.
Weiss, G. (1990). Approximation results in parallel machines stochastic scheduling. Annals of Operations Research, 26, 195–242.
Weiss, G., & Pinedo, M. (1980). Scheduling tasks with exponential service times on nonidentical processors to minimize various cost functions. Journal of Applied Probability, 17, 187–202.
Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society, Series B, 42(2), 143–149.
Whittle, P. (1981). Arm-acquiring bandits. The Annals of Probability, 9(2), 284–292.
Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. Journal of Applied Probability, 25, 287–298 (A Celebration of Applied Probability).
Wilson, M. C. (2007). The impact of transportation disruptions on supply chain performance. Transportation Research Part E: Logistics and Transportation Review, 43, 295–320.
Wu, H.-C. (2010). Solving the fuzzy earliness and tardiness in scheduling problems by using genetic algorithms. Expert Systems with Applications, 37, 4860–4866.
Wu, X., & Zhou, X. (2013). Open bandit processes with uncountable states and time-backward effects. Journal of Applied Probability, 50(2), 388–402.
Wu, C. C., & Lee, W. C. (2009). Single-machine and flowshop scheduling with a general learning effect model. Computers and Industrial Engineering, 56, 1553–1558.
Wu, X. Y., & Zhou, X. (2008). Stochastic scheduling to minimize expected maximum lateness. European Journal of Operational Research, 190, 103–115.
Wu, C. C., Yin, Y. Q., & Cheng, S. R. (2011). Some single-machine scheduling problems with a truncation learning effect. Computers and Industrial Engineering, 60, 790–795.
Xu, H. S., Kumar, S. P. R., & Mirchandani, P. B. (1992). Scheduling stochastic jobs with increasing hazard rate on identical parallel machines. Computers and Operations Research, 19, 535–543.
Yin, Y. Q., Xu, D. H., Sun, K. B., & Li, H. G. (2009). Some scheduling problems with general position-dependent and time-dependent learning effects. Information Sciences, 179, 2416–2425.
Yin, Y. Q., Xu, D. H., & Huang, X. K. (2011). Notes on “Some single-machine scheduling problems with general position-dependent and time-dependent learning effects”. Information Sciences, 181, 2209–2217.
Yu, G., & Qi, X. (2004). Disruption management: Framework, models and applications. Singapore/River Edge: World Scientific.
Zhang, H., & Graves, S. C. (1997). Cyclic scheduling in a stochastic environment. Operations Research, 45, 894–903.
Zhao, C., & Tang, H. (2010). Rescheduling problems with deteriorating jobs under disruptions. Applied Mathematical Modelling, 34, 238–243.
Zhang, Y. B., Wu, X. Y., & Zhou, X. (2013, to appear). Stochastic scheduling problems with general position-based learning effects and stochastic breakdowns. Journal of Scheduling.
Zhou, X., & Cai, X. Q. (1997). General stochastic single-machine scheduling with regular cost functions. Mathematical and Computer Modelling, 26, 95–108.


Index

AAgreeability, 284, 297σ−Algebras, optimal stopping

countability, 189vs. linear spaces, measurable functions,

190–191measurable space, 188Monotone Class Theorem, 189–190p and d-system, 189probability theory, 188random variable, 188semi-algebra, 188

Asymmetric linear cost function, 112–116Asymmetric linear earliness-tardiness costs,

154–157Asymmetric quadratic cost function, 110–112Asymmetric quadratic earliness-tardiness

costs, 152–154

BBayesian methodology, 299Borel field, 3Borel measurement, 4

CCapped-loss model, 279–281Cauchy distribution, 137cdf. See Cumulative distribution function (cdf)Completion time, no-loss breakdown model,

145–146Completion time variance (CTV)

algorithm, 133–135description, 127–128structural property of optimal sequence,

130–133weighted variance problem, 128–130

Compound-type distributionscharacteristic functions, 85classes

with cdf, 86compound geometric, 87exponential, 87Fourier transformation, 86geometric, 87Laplace, 87Levy process, 87–88likelihood-ratio ordered, 88Polya-type, 87

due dates, 85exponential distribution, 85optimal sequences

with due dates, 91–94for total expected costs, 89–91

Conditional probability, 8Continuous time, closed bandit processes

deteriorating banditsGittins index policies, 249, 251optimal policy, 252reward rates, 251total discounted reward, 251

deterministic function, 250dimensional stochastic process, 248integrability condition, 248optimal policy, 249–251probability space, 248total expected reward, 248

Convergence theoremsDoob’s upcrossing inequality, 214Fatou’s lemma, 213supermartingale, 214time horizon, 213uniform integrability, 213

CTV. See Completion time variance (CTV)

X.Q. Cai et al., Optimal Stochastic Scheduling, International Series in OperationsResearch & Management Science 207, DOI 10.1007/978-1-4899-7405-1,© Springer Science+Business Media New York 2014

407

Page 414: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

408 Index

Cumulative distribution function (cdf)compound-type distributions, 85and EWNT, 61

DDecreasing failure rate (DFR), 271Degenerate distribution, 136Delayed exponential distribution, 137Deteriorating processing times, stochastic

schedulingmodel formulation

initial processing time, 322job deterioration, 323–324machine breakdowns, 323objective function, 324–325processing requirement, realization,

323–324static policies, 322

occupying time characteristicsindependent exponential random variables,

334–335renewal equation, 331–332second-order differential equation, 333

optimal policiesclassical scheduling problem, 339computational complexity, 339exponential distributions, 336induction hypothesis, 336jobs, 336machine breakdowns, 338

processibilitydowntime-first process, 325–326exponential distributions, 327–331occupying time, 326renewal equation, 326–327

DFR. See Decreasing failure rate (DFR)Discounted stopping times, 274Discrete time, closed bandit processes

allocation policy, 228arm k, 227–228, 230filtrations, 227Gittins index policy, 229, 230integrability, 228machine idles, 228optimal policy, 229pre and post payment setting, 228random durations, 227seminal contributions, 230single arm process, 230–234stochastic process, 227total discounted rewards, 229

Doob’s stopping theoremconvergence theorem, 208, 209deterministic stopping time, 207

right-continuous, 207uniform integrability, 207, 208

Dynamic policiesinformation utilization

decision-making process, 254deterministic and stochastic scheduling,

253–254job processing, 254nonpreemptive dynamic policy, 39,

254–255probability distributions, 253static list policy, 39, 254–255unrestricted dynamic policy, 40, 256–257

partial-loss breakdown modelscapped-loss model, 279–281integral equations, Gittins indices,

274–276local preemptive-repeat model, 278–279optimal policies via Gittins indices,

276–278semi-Markov model, job processing,

272–273restricted dynamic policies

no-loss breakdown model (see No-lossbreakdown model)

total-loss breakdown model (see Total-lossbreakdown model)

unrestricted policies, parallel machinemodel

LEPT policies, 288–291optimality equation, 282–283SEPT policies, 283–288

EEarliest due date (EDD), 349Earliest expected due date (EEDD) rule, 348Earliness/tardiness (E/T) penalties

distribution types, 96exponential processing times, 106–117normal processing times, 96–106with random processing times and due dates,

96Erlang distribution, 138EWFT, 49–50Expected costs of earliness and tardy jobs

class of scheduling problems, 117parallel machine scheduling, 120–127single machine scheduling, 118–120TEC, 118

Expected discounted reward (EDR)definition, 169delay costs, 169EWNTJ, 174EWSFT, 176–177

Page 415: Xiaoqiang Q. Cai Xianyi Wu Xian Zhou Optimal Stochastic … · 2016-11-22 · tion of stochastic scheduling. In Chaps. 2 and 3,wereviewthosewell-established models and scheduling

Index 409

    Laplace transform, occupying time
        cumulative distribution function, 173
        Gamma density function, 172–173
        identical and independent processing times, 170–172
    MCAR, 177–179
    optimal static policy, 173–174
    random processing times, 170
    scheduling jobs, delayed delivery, 175
Expected mean weighted flowtime (EMWFT)
    no-loss (preemptive-resume) model
        definition, 146
        optimal static policy, 147
        processing times, 146
    total-loss machine breakdown model
        exponentially distributed uptimes, 165
        periodical inspection, 166–169
        uniform uptimes and processing times, 166
Expected mean weighted squared flowtime (EMWSF), 71
Expected mean weighted tardiness (EMWT), 70
Expected occupying time, total-loss model
    identical processing times
        counting process, 159
        description, 157–158
        Fubini Theorem, 160–161
    independent processing times
        description, 158
        vs. identical processing times, 162–164
        probability distribution, 161–162
    processing achievement, 158
Expected weighted discounted cost (EWDC), 261–263
Expected weighted discounted rewards (EWDR), 304
Expected weighted flowtime (EWF), 308
Expected weighted number of tardy jobs (EWNTJ)
    cost matrix, 63–64
    due dates and optimal static sequence, 62
    dynamic policies, 61
    non-preemptive dynamic policies, 62–63
    non-preemptive static list policies, 61
    NP-hard even with equal due dates, 71
    random processing times and due dates, 71
    total tardy penalty of missing due dates, 60–61
Expected weighted squared flowtime (EWSFT), 176–177
Exponential distribution, 137
Exponential processing times
    algorithm to optimal V-shaped sequence, 116–117
    applications
        deadline in due dates, 84
        price variations and interest accrual of capitals, 82–83
    asymmetric linear cost function, 112–116
    asymmetric quadratic cost function, 110–112
    density and cumulative distribution functions, 74–75
    description, 74–75
    LEPT, 75
    optimal sequence
        with due dates, 79–82
        general costs, 76–79
    SEPT, 75
    symmetric quadratic cost function, 107–110
    WSEPT, 75

F
Family of distribution
    binomial, 22
    exponential, 19–20
    gamma, 20–21
    log-normal, 21
    negative binomial, 22–23
    normal, 21
    Pareto, 22
    Poisson, 22
    uniform, 21
    Weibull, 20
Fibonacci method, 104, 139–140
Fubini Theorem, 160–161

G
Gamma distribution, 137–138
GDC. See General discounted cost (GDC)
General discounted cost (GDC), 257–258, 261
Generalized bandit problems
    arbitrary arm types, 244
    deterministic arms, 246
    discounting factor, 247
    discrete time, 244
    Gittins index, 246, 247
    Markov structure, 246
    optimal policy, 245
    reward process, 244
    stopping times, 245, 247
    time-revisable model, 246

General regular costs
    description, 66
    MEC, 72–74
    TEC, 67–72


Gittins index policies
    optimality, 226, 232, 239, 242, 251
    right continuous, 248
    seminal contributions, 230
    stochastic scheduling, 225
    stopping times, 232
Gittins indices
    no-loss breakdown model, 270–272
    partial-loss breakdown models, 272–281
    total-loss breakdown model, 267–269
GLFS. See Greatest loss first serve (GLFS)

I
Increasing failure rate (IFR), 271
Information filtrations, stochastic processes
    σ-algebras, 197–199
    continuity, filtration, 199
    countable minimizations, 199
    filtered probability space, 198
    natural filtration, 198
    right continuous, 198, 199
Irregular performance measures
    Cauchy distribution, 137
    CTV, 127–135
    degenerate distribution, 136
    delayed exponential distribution, 137
    description, 95
    Erlang distribution, 138
    E/T penalties (see Earliness/tardiness (E/T) penalties)
    expected costs of earliness and tardy jobs, 117–127
    exponential distribution, 137
    Fibonacci method, 139–140
    gamma distribution, 137–138
    integral inequality, 138–139
    Laplace distribution, 137
    log-normal distribution, 138
    normal distribution, 136
    Poisson distribution, 138
    student-t distribution, 137
    uniform distribution, 136–137

J
Job characteristics, scheduling problem
    arrival (available) times, 34
    description, 31–32
    due date/deadlines
        earliness cost, 33
        lateness cost, 33–34
        tardiness cost, 33
        target time, completion, 32
    processing times, 32
    weights, 34

L
Laplace distribution, 137
Laplace transform, 301–302
Learning effects, stochastic model
    jobs, processing times and due dates, 339
    optimal policies
        continuous and hazard rate function, 340
        lateness scheduling, 341
        LEPT rule, 342
        processing times, likelihood-ratio order, 341–342
        SEPT rule, 342
        WDSEPT rule, 344
        WSEPT rule, 342–343
    unreliable machines
        job, completion time, 345
        means and variances, jobs, 345–346
        nonnegative random pairs, 344–345
        Poisson process, 345
Lebesgue integral, 5–6
Lebesgue measurement, 4
Lebesgue–Stieltjes integral, 11
LEPT. See Longest expected processing time (LEPT)
LEPT policies
    classes, 290–291
    nonpreemptive dynamic policies, 289–291
Levy process, 88
Linear spaces, measurable functions
    conditional expectations, 191
    countable minimizations, 190
    functional Monotone Class Theorem, 191
    p-system, 191
    random variables, 190
    real-valued functions, 190, 191
Local preemptive-repeat model, 278–279
Log-normal distribution, 138
Longest expected processing time (LEPT)
    dynamic programming, 58
    exponential distribution, 58, 74
    makespan in dynamic policies, 58–59
    preemptive dynamic policies, 60
    rule, 342

M
Maintenance checkup and repair (MCAR), 177–179
Makespan scheduling
    LEPT, 58–60
    SEPT, 57–58
Markovian deterministic and homogeneous (MDH) stopping rule, 274
Martingales, optimal stopping
    algebraic criterion, 210
    conditional expectations, 206
    continuous time stochastic processes, 211
    convergence theorems, 213–214
    definitions, 206–207
    Doob’s stopping theorem, 207–209
    maxima inequalities, 211–212
    Monotone Convergence Theorem, 210
    path regularity
        convergence theorem, 217
        dyadic rationals, 216
        integrable process, 217
        probability, event, 214
        random variables, 215
        reversed time submartingale, 215
        right-continuity, 216
        submartingale property, 216–218
        supremum, 215
    random variable, 209
    stochastic processes, 206
    supermartingale, 210
    upcrossing times, 209
Maximum discounted holding cost (MDC), 258
Maximum expected cost (MEC)
    definition, 72
    mean cost functions, 72–73
    and MEL, 73
    and MEWT, 74
    and MWPL, 74
Maximum expected lateness (MEL), 73, 373
Maximum expected weighted tardiness (MEWT)
    and EDD order, 74
    and EEDD, 73–74
Maximum lateness, stochastic minimization
    EDD, 349
    EEDD and SEPT, 352–353
    largest mean weight first rule, 354
    processing times, 350–352
Maximum weighted probability of lateness (MWPL), 74
Mean time to failure (MTTF), 361
Mean weighted lateness probability (MWLP), 71
Monotone class theorems
    definition, 189
    measurable functions, 190
    p and d-system, 189
    real-valued functions, 189
MPTS. See Multi-processor task scheduling (MPTS)
MTTF. See Mean time to failure (MTTF)
Multi-armed bandit processes
    arbitrary stochastic processes, 226
    auxiliary retirement argument, 226
    classical model, 226
    continuous time (see Continuous time, closed bandit processes)
    costs/delays, 226
    discrete time (see Discrete time, closed bandit processes)
    economical notion, 226
    generalized bandit problems, 244–247
    Gittins index policies, 225, 226
    information filtration, 226
    Markovian setting, 227
    open bandit processes (see Open bandit processes)
    optimal resource allocation, 225
    total discounted rewards, 225
Multi-processor task scheduling (MPTS), 360
MWLP. See Mean weighted lateness probability (MWLP)
MWPL. See Maximum weighted probability of lateness (MWPL)

N
No-loss breakdown model
    DFR, 271
    expected total discounted rewards, 270
    Gittins index, 270–272
    IFR, 271
    parallel machine scheduling, 120
    semi-Markov process, 270
No-loss (preemptive-resume) model
    completion time, 145–146
    description, 141
    irregular costs
        asymmetric linear earliness-tardiness costs, 154–157
        asymmetric quadratic earliness-tardiness costs, 152–154
        symmetric quadratic earliness-tardiness costs, 149–152
        TEC and tardy jobs, 157
    regular cost functions
        EMWFT, 146–147
        processing job, 147
        TEC and MEC, 148–149

Normal distribution, 136
Normal processing times
    algorithms, 104–105
    computational experiments, 106
    due dates, 97
    justification, 96, 135–136
    means and variances, 96
    normal and uniform distributions, 97
    objective function, 96–99
    optimality properties
        integral inequality application, 101
        quasiconvex function, 101
        SEPT, 102–104
        V-shaped structure of optimal sequences, 99–101

O
Open bandit processes
    convergence theorem, 238
    cumulative duration, 237
    deterministic state transition law, 235
    discounting factor, 237, 238, 242
    finite dimensional distributions, 236
    Gittins index policies (see Gittins index policies)
    Markov decision setting, 236
    natural filtration, 240
    optimal policies, 235
    reversed time, 238
    reward function y(n), 242
    stochastic process, 236, 239–241
    stopping time theorem, 238
    time parameter, 241
    total discounted rewards, 235, 240, 241
    total reward expectation, 237
    value function, 241, 243
Optimal stopping problems
    σ-algebras and monotone class theorems (see σ-Algebras, optimal stopping)
    augmentation, filtrations, 222, 223
    conditional expectations, 192
    convergence theorem, 222
    deterministic time, 219
    essential supremum, 196
    Gittins index policies, 218
    information flow, 187
    martingales (see Martingales, optimal stopping)
    probability spaces, 191–192
    randomization, 220
    real-valued functions, 187
    right-continuous process, 218
    stochastic processes, 197–201
    stopping times, 202–206
    time horizon, 218
    uniform integrability, 192–196

P
Parallel machine scheduling, 120–127
Partial-loss breakdown models
    calculation of E[O_i] and E[e^{-rO_i}], 182–185
    capped-loss model, 279–281
    integral equations, Gittins indices, 274–276
    local preemptive-repeat model, 278–279
    non-degenerate distribution, 180
    optimal policies via Gittins indices, 276–278
    probability distribution, 180
    processing requirement condition, 181
    remaining occupying time, 273
    semi-Markov model, job processing, 272–273
    transition, processing achievement, 180
Performance measurement
    objective function
        completion time variance, 41
        definition, 40
        expected completion time variance, 42
        expected makespan, 42
        expected maximum lateness, 42
        expected total discounted reward, 43
        expected total earliness and tardiness, 43
        expected total flowtime, 42
        expected total weighted flowtime, 42
        expected total weighted tardiness, 43
        expected weighted number of tardy jobs, 42
        makespan, 41
        maximum expected completion time, 42
        maximum expected lateness, 42
        maximum lateness, 41
        total discounted reward, 41
        total earliness and tardiness, 41
        total flowtime, 40
        total weighted flowtime, 41
        total weighted tardiness, 41
        weighted number of tardy jobs, 41
    optimality criteria, 43–44
Perishable products, scheduling
    air transportation, 378
    base model
        decisions, 381–382
        departure time, 381
        export and local market, 380
        high contract cancelation penalties, 380
        processing time, 381
        static policy components, 382
        total expected cost (TEC), 382
    decision, unfinished product
        finishing time, 392
        optimal dynamic sequence, 395–397
        optimal static sequence, 394–395
        postponement, 393–394
    deteriorating jobs, 379
    export market, 378
    fixed lifetime and random lifetime, 378–379
    home market, 378
    operations management literature, 378
    planning, 378
    profitability and competitiveness, 377
    random disruption, 378
    random market demand, accounting, 390–393
    raw materials, utilization, 379
    supply chain performance, 379
    transportation service, 377–378
    waiting decision on finished product
        “best-before” date, 391–392
        optimal terminating time, 390–391
        optimal waiting time, 391
Poisson distribution, 138
Policies, scheduling problem
    classification, 39–40
    determinism, 38
    stochastic, 38–39
Probability
    family of distributions, 19–23
    random variables (see Random variables)
    space (see Space, probability)

Q
Quasiconvex function, 101

R
Random variables
    conditional distribution, 15–16
    conditional expectation, 18–19
    definition, 9
    distribution functions
        cumulative distribution function (cdf), 9–10
        mass point, 10
        probability function (pf), 10–11
        probability mass function (pmf), 10
    expectation, 17–18
    hazard rate, 12
    independent random variables, 16–17
    joint probability distributions, 12–14
    marginal distribution, 14–15
    notation, 45–46
    probability distribution, 11–12
    state space, 9
    Stieltjes integral, 11
Regular costs with due dates
    total weighted tardiness, 64–65
    weighted number of tardy jobs
        EWNT(ζ), 61–64
        processing time, 64

Regular performance measurement
    compound-type distributions (see Compound-type distributions)
    definition, 49
    exponential processing times (see Exponential processing times)
    general regular costs, 66–74
    makespan, 57–60
    regular costs with due dates, 60–65
    total completion time cost (see Total completion time cost)
Restless deterioration, 323
Riemann integrable, 5
Riemann–Stieltjes integral, 11

S
Scheduling problems
    job characteristics (see Job characteristics, scheduling problem)
    machine environments
        flowshop/jobshop, 35
        machine breakdowns, 36–37
        parallel machines, 35
        single machine, 35
        team-work machines, 35
    notation, 46–47
    performance measurement (see Performance measurement)
    policies, 37–40
Semi-Markov model, job processing, 272–273
SEPT. See Shortest expected processing time (SEPT)
SEPT policies
    agreeability, 284
    nonpreemptive static list policies, 285–288
    supermodularity, 283
    total expected discounted holding cost, 284–285
Shortest expected processing time (SEPT)
    description, 53
    and LEPT, 75
    non-decreasing and non-increasing, 53–54
    optimality properties, 102–104
    preemptive dynamic policies, 57
    rule, 348, 349, 375
    static list policy, 55–57
Single arm process, discrete time
    arbitrary policy, 234
    arm k, 234
    augmentation and randomization, 232, 233
    deteriorating process, 231
    fixed sequence, 233
    Gittins index, 231, 232
    integer-valued, stopping time, 233
    optimization, 231
    periodical discounted duration, 230–231
    random variable, 231
    stopping theory, 231–232
Single machine scheduling
    exponential distribution, 118
    optimal static policy to minimize TEC(z), 119–120
Space, probability
    σ-algebra, 2
    Borel field and sets, 3
    conditional probability, 8
    description, 6–7
    events, 3–4
    independent events, 8–9
    Lebesgue
        integral, 5–6
        measurement, 4
    measurable functions, 5
    notation, 45
    sample space, 1–2
Stochastic machine breakdowns
    categorization, 141–142
    description, 141
    formulation
        machine breakdown processes, 142–143
        processing time and achievement, 143–144
    no-loss (preemptive-resume) model, 145–157
    partial-loss breakdown models, 179–185
    total-loss (preemptive-repeat) model, 157–179
Stochastic orders
    almost-sure and mean orders, 25
    exponential distributions, 29–30
    gamma distributions, 30
    general random variables, 26–28
    geometric distributions, 30
    hazard-rate order, 24
    independent random variables, 28–29
    likelihood-ratio order, 24–25
    normal distributions, 30
    notation, 46
    Pareto distributions, 30
    Poisson distributions, 30
    standard stochastic order, 23–24
    uniform distributions, 30
    Weibull distributions, 30
Stochastic policies, 38–39
Stochastic processes
    continuous time, 197
    information filtrations, 197–199
    probability space, 197
    random variables, 197
    time functions, 199–201
Stochastic scheduling, incomplete information
    Bayesian methodology, 299
    model formulation and assumptions, 300–301
    models and probabilistic characteristics, 300–304
    optimal restricted dynamic policies
        EWDR, 304
        EWF, 308
        Gittins indices, 305–308
        Poisson stream, 308
        unfinished job, 307
    posterior Gittins indices
        one-step discounted reward rate, 308–311
        one step posterior distribution, 311
        via one step reward rate, 316–319
        weak order, processing time, 313–316
    repetition frequency and occupying time, 301–303
    static policies, incomplete information, 303–304
Stochastic scheduling models
    optimization under stochastic order
        bivariate characterization, 348
        EEDD rule, 348
        exponential processing times and due dates, 354–360
        independent and identically distributed (i.i.d), 348
        maximum lateness, stochastic minimization, 349–354
        problems, 348–349
        SEPT rule, 348
    perishable products (see Perishable products, scheduling)
    team-work task scheduling (see Team-work task scheduling)
Stopping rule, 274
Stopping times
    σ-algebra, 202
    conditional expectations, 205
    debut time, 204
    definition, 202
    deterministic time, 203
    linear space, 203
    monotonicity, 206
    natural filtration, 202
    progressive measurability, 204
    random variable T, 202
    right continuous, 205
Student-t distribution, 137
Supermodularity, 283, 294–295
Symmetric quadratic cost function, 107–110
Symmetric quadratic earliness-tardiness costs, 149–152

T
Team-work task scheduling
    definition, 360
    deterministic model
        cost functions, 365–366
        dominant component, 367
        job preemption, 362
        maximum cost, 362–363
        maximum lateness (ML), 366
        optimal policy, 363–365
        processor, 361–362
    MPTS, 360
    MTTF, 361
    parallel computing systems, 360
    reliability test, 361
    stochastic model
        almost sure minimization, 368
        expected cost minimization, 368
        maximum cost minimization, 368–374
        stochastic order minimization, 367
        TCT minimization, 374–377

TEC. See Total expected cost (TEC)
Time functions, stochastic processes
    bivariate function, 199
    mapping, 200
    optional sets, 201
    progressive process, 200
    random set, 200
    real valued process, 201
Time-varying scheduling, policies
    deteriorating processing times
        model formulation, 322–325
        occupying time characteristics, 331–335
        optimal policies, 335–339
        processibility, 325–331
    deterioration models, 321
    job scheduling, 322
    learning effects
        optimal policies, 340–344
        unreliable machines, 344–346
    model types, 321
TKJ. See Truncated at k-th completed job (TKJ)
Total completion time cost
    parallel machines
        non-preemptive policies, 52–53
        reward function, 52
        SEPT, 53–57
        static list policy, 53
    single machine
        EWFT, 49–50
        hazard rate, 51
        WSEPT, 50–51
Total completion time (TCT), minimization, 374–377
Total expected cost (TEC)
    with asymmetric linear E/T penalties, 112
    with asymmetric quadratic E/T penalties, 110
    description, 66
    due dates, 68–69
    and EMWSF, 71
    and EMWT, 70
    and EWNTJ, 71
    family of distributions, 68
    and MWLP, 71
    stochastic order, nondecreasing function, 67–68
    with symmetric quadratic cost function, 107
    with/without job-dependent weights, 69–70
Total-loss breakdown model
    GDC, 257–258
    holding job, unit cost, 258
    identical processing times, optimal policies
        expected weighted discounted reward, 266
        Gittins index, 267–269
        jobs and breakdown processes, independence, 266–267
        semi-Markov decision process, 266
    independent processing times, optimal policies
        EWDC, 261–263
        expected discounted cost, 263–264
        GDC, 261
        maximum discounted holding cost, 265–266
        tardy jobs, stochastic order, 264
    MDC, 258
    occupying time, 259
    restricted dynamic policies, 39, 255
    TKJ, 258
    WDR, 258
    WFT, 258

Total-loss (preemptive-repeat) model
    description, 141
    downloading, Internet, 157
    EDR, 169–179
    EMWFT, 164–169
    expected occupying time, 158–164

Truncated at k-th completed job (TKJ), 258

U
Uniform distribution, 136
Uniform integrability, optimal stopping
    convergence theorem, 195, 196
    convex function, 193, 194
    definition, 192–193
    L1 convergence, 194
    martingales, continuous time, 194
    probability converges, 195
    real-valued random variables, 192, 195
    sub-σ-algebras, 194

V
V-shaped sequences
    algorithm to optimal, 116–117
    dynamic programming algorithms, 104–105
    minimizes EET, 101–102
    monotone functions, 99

W
Weighted discounted reward (WDR), 258
Weighted discounted shortest expected processing time first rule (WDSEPT), 91–92
Weighted flowtime (WFT), 258
Weighted shortest expected processing time (WSEPT)
    arbitrary distribution, 75
    and EWFT, 50–52, 74, 75
    hazard rate, 51
    optimal static policy, 62–64
    processing times, 64
    rule, 342–343
WFT. See Weighted flowtime (WFT)
WSEPT. See Weighted shortest expected processing time (WSEPT)
W-shaped sequence, 130