
A technique for project management in case of uncertain activity durations

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Industrial Engineering and Operations Research

Yulia Popova (student number 01609101)
Supervisors: Prof. dr. ir. Stijn De Vuyst, Prof. dr. Dieter Fiems
Academic year 2018-2019





A technique for project management in case of uncertain activity durations

Yulia Popova
Supervisor(s): Prof. dr. ir. Stijn De Vuyst

Abstract - In this study the evaluation of the project makespan is considered from the perspective of queueing theory and the tail asymptotics of the makespan distribution. The research is restricted to probability generating functions having a singularity as a simple pole. The analytic approximations are compared with brute-force Monte Carlo, and the possibility of obtaining more accurate estimates of extreme probabilities by importance sampling is investigated.

Key words - Project scheduling, makespan, complex asymptotics, simulation

I. INTRODUCTION

Project scheduling is an essential and complex part of the project management process, and it has a strong effect on overall project performance. A deadline is baked into the success criteria of virtually every modern project, and therefore the ability to estimate the duration of each activity, as well as to foresee the project makespan at any stage of the project, is becoming more crucial. While various strands of research approach the estimation of a project being overdue, the core techniques of most software are based on the classical techniques of PERT and CPM. This research considers the scheduling network diagram as a queue of a specific structure, assuming that the network consists of basic components: a stretch (a node having one predecessor and one successor), a fork (a node having one predecessor and multiple successors) and a join (a node having multiple predecessors and one successor), shown in Fig. 1. Although these structures compose the real network, for simplification of the analysis they are considered separately, as if each represents a network on its own, and are therefore called "an initial stretch", "an initial join" and "an initial fork" respectively.

Fig. 1: Basic structures of a network: (a) an initial fork, (b) an initial join, (c) an initial stretch

The network has the 'activity on node' representation and contains K activities. Let the scheduled starting time of activity k be denoted τ_k for k = 1, 2, ..., K, so that activity k starts at τ_k, immediately after all of its immediate predecessors finish. Introducing a dummy node K + 1 at the end of the network, the scheduled makespan of the project can be expressed as the scheduled starting time of the dummy activity, τ_{K+1}. During the project run, if any of the immediate predecessors is delayed, activity k has to wait some W_k time slots, and consequently the dummy node has a waiting time of W_{K+1} time slots. Let X be the makespan of the project, defined as

X = τ_{K+1} + W_{K+1}   (1)

Each activity k = 1, 2, ..., K has a planned duration a_k, being the number of time slots between two consecutive τ_k, and a service time S_k, which is a random variable. We denote by s_k(n) = Prob[S_k = n] for n ≥ 0, μ_k = E[S_k] and σ_k² = Var[S_k] the probability distribution function of S_k, its mean and its variance respectively, which by assumption are known in exact form.

Given this problem setting, the goal is to evaluate the makespan distribution of the project, X, which converts into evaluating its overtime and thus the waiting time of the dummy activity, W_{K+1}.

II. ANALYTIC APPROACH

The idea is to study the tail behaviour of the probability distribution of the overtime W_{K+1} and to find an analytic approximation of the coefficients of the probability generating function of the overtime by means of complex asymptotics. The restriction to probability generating functions that have a simple pole is made. From the theory of complex asymptotics [2], the asymptotic form of the coefficients of rational and meromorphic functions whose only dominant singularity is a simple pole is:

w(n) = Prob[W = n] ≈ −θ ζ^{−(n+1)}   (2)

The notation is as follows: θ = Res_ζ W(z) is the residue and ζ is the singularity, here a simple pole. Such coefficient behaviour near the singularity is known as a geometric tail, due to the geometric decay with an exponential growth factor of 1/ζ. If such probabilities are plotted on a logarithmic scale as a function of n, one gets a straight line:

log Prob[W = n] = log(−θ ζ^{−(n+1)}) = log(−θ/ζ) − n log ζ   (3)
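As a small numeric sanity check of Eqs. (2)-(3), the line is fully determined by the pair (θ, ζ): the slope is −log ζ and the intercept is log(−θ/ζ). The sketch below uses placeholder values for θ and ζ, not values from the text:

```python
import math

# Placeholder residue and dominant pole (hypothetical, not from the text)
theta, zeta = -0.05, 18 / 17

def w_approx(n: int) -> float:
    """Geometric-tail approximation of Eq. (2): w(n) ~ -theta * zeta**(-(n+1))."""
    return -theta * zeta ** -(n + 1)

# Eq. (3): log w(n) = log(-theta/zeta) - n*log(zeta), a straight line in n
intercept = math.log(-theta / zeta)
slope = -math.log(zeta)
print(slope, intercept)
```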

Thus, the line is determined once the values θ_{W_{K+1}} and ζ_{W_{K+1}} of the overtime distribution are known. To obtain the coefficients w_{K+1}(n) of the overtime distribution, a recursive relation over all precedent nodes is exploited. Such a relation was elaborated in [1], and a so-called Lindley equation applies:

W_{k+1} = max(0, S_k + W_k − a_k)   (4)

A. Stretch

To simplify Eq. 4, an auxiliary variable B is introduced so that B_k = W_k + S_k. The generating functions


S_k(z) and W_k(z) have geometric tails, so that s_k(n) = −θ_{S_k} ζ_{S_k}^{−(n+1)} and w_k(n) = −θ_{W_k} ζ_{W_k}^{−(n+1)}. Consequently, b_k(n) = −θ_{B_k} ζ_{B_k}^{−(n+1)}, and it can be shown that

ζ_{B_k} = min(ζ_{S_k}, ζ_{W_k})   (5)

θ_{B_k} = S_k(ζ_{W_k}) θ_{W_k}, if ζ_{W_k} < ζ_{S_k}
θ_{B_k} = θ_{S_k} W_k(ζ_{S_k}), if ζ_{W_k} > ζ_{S_k}   (6)
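Eqs. (5)-(6) translate directly into a small helper. The sketch below is illustrative (the function names and toy numbers are ours, not the author's) and assumes the two poles are distinct, as the analysis requires:

```python
# Tail parameters of B_k = W_k + S_k per Eqs. (5)-(6); assumes zeta_W != zeta_S
def combine_stretch(zeta_W, theta_W, zeta_S, theta_S, S_pgf, W_pgf):
    if zeta_W < zeta_S:
        return zeta_W, S_pgf(zeta_W) * theta_W   # the pole of W dominates
    return zeta_S, W_pgf(zeta_S) * theta_S       # the pole of S dominates

# Toy geometric service time: S(z) = p*z / (1 - (1-p)*z), pole at 1/(1-p)
p = 0.2
S_pgf = lambda z: p * z / (1 - (1 - p) * z)
zeta_S, theta_S = 1 / (1 - p), -p / (1 - p) ** 2

# dummy waiting-time pgf; only needed in the branch that is not taken here
zeta_B, theta_B = combine_stretch(1.1, -0.3, zeta_S, theta_S, S_pgf, lambda z: 1.0)
print(zeta_B, theta_B)
```

Here the waiting-time pole 1.1 is smaller than ζ_S = 1.25, so B inherits it.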

Going back to the main variable of interest, W_{K+1}, which has the same singularity as B_{K+1}, so that ζ_{W_{K+1}} = ζ_{B_{K+1}}, the coefficient θ_{W_{K+1}} is expressed as

θ_{W_{K+1}} = Res_{ζ_{B_K}} W_{K+1}(z) = θ_{B_K} / ζ_{B_K}^{a_K}   (7)

Thus, plugging the obtained ζ_{W_{K+1}} and θ_{W_{K+1}} into Eq. 3, one obtains the analytic form for the probabilities of overtime.

B. Join

The same Lindley equation (4) applies to the analysis of the join structure, and the waiting time of a join node satisfies:

W_1 = max(0, B_1 − a_1),  W_2 = max(0, B_2 − a_2)   (8)
W_3 = max(W_1, W_2)

ζ_{B_1} = min(ζ_{W_1}, ζ_{S_1}),  ζ_{B_2} = min(ζ_{W_2}, ζ_{S_2})   (9)
ζ_{W_3} = min(ζ_{B_1}, ζ_{B_2})

It can be shown that the following relations hold:

ζ_{W_3} = ζ_{B_j},  θ_{W_3} = θ_{B_j},  with j = argmin_{i∈{1,2}} ζ_{B_i}   (10)

Thus the join activity inherits the singularity and the coefficient θ_B from one of its predecessors.
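In code, the inheritance rule (10) is just an argmin over the predecessors' poles (a sketch; names and numbers are ours):

```python
# Eq. (10): a join inherits (zeta, theta) from the predecessor whose
# B-variable has the smallest, i.e. dominant, pole
def combine_join(branches):
    """branches: iterable of (zeta_B, theta_B) pairs, one per predecessor."""
    return min(branches, key=lambda zt: zt[0])

zeta_W3, theta_W3 = combine_join([(1.08, -0.4), (1.25, -0.1)])
print(zeta_W3, theta_W3)   # the branch with pole 1.08 dominates
```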

C. Computation of analytic results

Two basic structures, a join and a stretch, can be combined into the network in two different ways. With the first approach, the required quantities are computed for the nodes in stages: starting from the dummy start activity k_0, one node k from the set of successors Succ(k_0) is taken, and if all the predecessors Pred(k) of that node have already been computed, the required quantities can be computed for node k too. This approach does not require splitting the network into joins and stretches.

Alternatively, the precedent nodes of each join are split into independent stretches. The singularity ζ_{W_J}^{m} for the m-th stretch of join node J is calculated, and the stretch resulting in the minimum singularity is picked to calculate the coefficient θ_{W_J}, using only that stretch as a predecessor.

In order to obtain the coefficients ζ_{W_{k+1}}, θ_{W_{k+1}} by (7), the recursive relation is maintained over all precedent nodes (in general it holds for calculating the coefficients of any node). Thus the recursive computation of Alg. 1, stated here for geometrically distributed activity durations, is to be implemented.

III. SIMULATION APPROACH

The benchmark for the analytic result is a brute-force Monte Carlo simulation of the probability w_{K+1}(n) on a logarithmic scale. A first-order polynomial is fitted to the scattered samples, and its slope and intercept are compared with the analytic slope and intercept given by (3). For simulation purposes the geometric distribution is taken. For the initial structures, a stretch consisting of a sequence of 2 nodes and 1 dummy activity, and a join with 2

Algorithm 1 Analytic derivation of a stretch for geometrically distributed service times

1:  for k = 0, ..., K+1 do                   ▷ for each activity k
2:      for n = 0, ..., τ_{K+1} do           ▷ τ_{K+1} is the planned project duration
3:          s_k(n) ~ geom(p_k, n)
4:      θ_{S_k} = −p_k/(p_k − 1)²
5:      ζ_{S_k} = 1/(1 − p_k)
6:      ζ_{W_k} = min(ζ_{W_{k−1}}, ζ_{S_{k−1}}) if k > 0 else ∞
7:  for k = 0, ..., K+1 do
8:      w_k(0) = Σ_{j=0}^{a_{k−1}} b_{k−1}(j) if k > 0 else 1
9:      for n = 1, ..., τ_{K+1} − τ_k do
10:         w_k(n) = b_{k−1}(n + a_{k−1}) if k > 0 else 0
11:         b_k(n) = Σ_{j=0}^{n} w_k(j) · s_k(n − j) if k > 0 else s_0(n)
12:     S_k(z) = p_k · z / (1 − (1 − p_k) · z)
13:     W_k(z) = S_{k−1}(z) · W_{k−1}(z)/z^{a_{k−1}} + Σ_{n=0}^{a_{k−1}} b_{k−1}(n) · (1 − z^{n−a_{k−1}}) if k > 0 else 1
14:     θ_{W_k} = θ_{B_{k−1}}/ζ_{W_k}^{a_{k−1}} if k > 0 else ∞
15:     if ζ_{W_k} < ζ_{S_k} then
16:         θ_{B_k} = S_k(ζ_{W_k}) · θ_{W_k}
17:     else
18:         θ_{B_k} = W_k(ζ_{S_k}) · θ_{S_k}
19: return ζ_{W_{K+1}}, θ_{W_{K+1}}          ▷ ζ and θ of the waiting time of the dummy end node

predecessors only, the Monte Carlo results match the analyticresults numerically and visually.
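The pmf recursion at the heart of Alg. 1 (lines 8-11) can also be run purely numerically. The sketch below is our illustration, not the thesis code; the parameters are hypothetical and the horizon truncates the pmfs:

```python
import numpy as np

def geometric_pmf(p, horizon):
    """pmf of a geometric service time on {1, 2, ...}: s(n) = p*(1-p)**(n-1)."""
    n = np.arange(horizon)
    return np.where(n >= 1, p * (1 - p) ** np.clip(n - 1, 0, None), 0.0)

def lindley_step(w, s, a):
    """Given pmfs of W_k and S_k, return the pmf of W_{k+1} = max(0, W_k + S_k - a_k)."""
    b = np.convolve(w, s)[:len(w)]       # pmf of B_k = W_k + S_k (truncated)
    w_next = np.zeros_like(w)
    w_next[0] = b[:a + 1].sum()          # P[B_k <= a_k], cf. Alg. 1 line 8
    tail = b[a + 1:]
    w_next[1:1 + len(tail)] = tail       # w_{k+1}(n) = b_k(n + a_k), line 10
    return w_next

horizon = 400
p = [1 / 14, 1 / 18, 1 / 11, 1 / 8]      # hypothetical success probabilities
a = [14, 18, 11, 8]                      # planned durations equal to E[S_k]

w = np.zeros(horizon)
w[0] = 1.0                               # the dummy start activity never waits
for pk, ak in zip(p, a):
    w = lindley_step(w, geometric_pmf(pk, horizon), ak)

# deep in the tail the local slope approaches -log(zeta), zeta = min_k 1/(1 - p_k)
print(w[0], np.log(w[300] / w[299]))
```

The printed local slope should be close to −log(18/17) ≈ −0.0572, the dominant pole among the four geometric service times.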

Fig. 2: Stretch: 6 nodes

However, extending the stretch to, for example, a sequence of 6 nodes shows that the intercepts differ: the simulation plot is curved and the analytic fit acts as a tangent to that curve. Similar behaviour is observed for a join structure with multiple precedent nodes.
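A brute-force Monte Carlo benchmark of this kind can be sketched as follows (our illustration with hypothetical parameters, not the thesis code): simulate the Lindley recursion (4), estimate the pmf of the final waiting time, and fit a first-order polynomial to the log-probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
p = [1 / 14, 1 / 18, 1 / 11]     # hypothetical per-activity parameters
a = [14, 18, 11]                 # planned durations
runs = 200_000

W = np.zeros(runs)
for pk, ak in zip(p, a):
    S = rng.geometric(pk, size=runs)     # geometric service times on {1, 2, ...}
    W = np.maximum(0.0, W + S - ak)      # Lindley recursion, Eq. (4)

values, counts = np.unique(W, return_counts=True)
probs = counts / runs
mask = (values > 0) & (probs > 5 / runs)         # drop the atom at 0 and noisy bins
slope, intercept = np.polyfit(values[mask], np.log(probs[mask]), 1)
print(slope, intercept)    # slope should sit near -log(zeta) of the dominant pole
```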

A. A sample network

We would like to consider a more complex structure. As an example, the sample network shown in Fig. 3, with the



structure and parameter settings given in Table I (chosen randomly), is taken.

k   p_k    a         k    p_k    a
1   1/14   14        9    1/4    4
2   1/18   18        10   1/16   16
3   1/11   11        11   1/17   17
4   1/8    8         12   1/15   15
5   1/7    7         13   1/16   16
6   1/6    6         14   1/13   13
7   1/3    3         15   1/10   10
8   1/5    5         16   1/2    2

TABLE I: Parameter settings

Fig. 3: A sample project schedule network

The analytic fit no longer acts as a tangent, and the two lines, computed analytically and obtained via simulation, visually seem to diverge. However, one should note that the numerical results in Tab. II are very close to each other.

Fig. 4: Result for sample network

The intuitive explanation is that the x-axis values become larger, as the overtime of more complex structures can be higher; therefore a small discrepancy in the parameters of the line, especially in the slope, results in a visually noticeable gap, whereas, for example, for the initial stretch of two nodes the two lines would appear to coincide.

            Monte Carlo (1M)   Analytic
Slope       -0.050135          -0.057158
Intercept   -2.703773          -2.706577

TABLE II: Sample network: simulation results

Besides the generally faster decay of the analytic approximation, the decay also depends on the planned duration a_k of the activities: if it is equal to or higher than the expected value of the geometric distribution of the service time, E[S_k], then the decay is faster, so that the probability of overtime is lower.

Additionally, the analytic fit depends on the location of the longest activity in the network, i.e. whether it is closer to the start of the project or to the end of it. It is as if the model assumes that the delay of an activity at the beginning of the project can be compensated within the planned makespan during the project.

IV. CONCLUSION

The current research has approached the evaluation of project overtime from the tail asymptotics perspective. Several strong limitations were put on the problem setting: only generating functions with a singularity in the form of a dominant pole of order 1, a simple pole, were considered, and activities with equal durations are not allowed in the same network. The results obtained from the analytic approximation do not generally coincide with the simulation results. The line obtained via analytic computation behaves as a tangent to the simulation results. The slopes of the analytic computation and the Monte Carlo simulation are the same, but the intercepts are different; thus the performance measure of requiring both similar slopes and similar intercepts should be reconsidered. For longer stretches, and for more complex joins and networks, the distribution of overtime is no longer geometric-like and shows some curvature in the simulation plot on a logarithmic scale. This curve cannot be captured by a line, which is why the discrepancy between the analytic and simulation results occurs in that region. However, convergence of the analytic line and the simulation line in the tails is expected as n → ∞. The analytic line results in a more conservative estimate of project overtime; thus, a risk analysis of using this estimate as an estimate of overtime for real projects is a logical continuation of this study.

REFERENCES

[1] S. De Vuyst, H. Bruneel, and D. Fiems. Computationally efficient evaluation of appointment schedules in health care. European Journal of Operational Research, 237(3):1142-1154, 2014.

[2] P. Flajolet and R. Sedgewick. Analytic Combinatorics.Cambridge University Press, 2008.



Page 9: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Contents

1 Introduction
  1.1 Project scheduling nowadays
  1.2 Problem statement
2 Analytic approach
  2.1 Theory of complex analysis
    2.1.1 Generating functions
    2.1.2 Complex numbers and complex analysis
    2.1.3 Analyticity and a singularity
    2.1.4 Meromorphic functions
    2.1.5 Connecting theory to the solution
  2.2 Analysis of a stretch
  2.3 Analysis of a join
  2.4 Introduction to a fork
3 Computation of analytic results
  3.1 Network structure
  3.2 Recursive computation of a stretch
4 Simulation approach
  4.1 Naive Monte Carlo estimation
  4.2 Importance Sampling
5 Practical example
6 Conclusion
  6.1 Practical interpretation
  6.2 Further research
Bibliography
Appendices
  A Derivation of θ_B (2.16)
  B Derivation of W(z) and θ_W in (2.17) and (2.18)
  C Geometric Distribution


Chapter 1

Introduction

1.1 Project scheduling nowadays

Project scheduling is an essential and complex part of the project management process, and it has a strong effect on overall project performance. A project schedule aims to incorporate into a single model the project activities, deliverables and milestones together with their intended durations, initiation and completion dates. The schedule is constructed based on the task descriptions, well-defined dependencies between them, availability of resources, and other conditions and constraints such as cost, quality and time requirements. Therefore, project scheduling is a challenging, many-sided problem, which is being tackled from the perspectives of different disciplines.

Although a project schedule needs to keep track of financial, quality and management factors, an interruption in any of them will affect the duration of the entire project and the total project cost. Different scheduling objectives are often crucial from a practical point of view; however, the minimization of the project lead time is often the most important objective of scheduling [8]. A deadline is baked into the success criteria of virtually every modern project, and the ability to close the project by the intended completion date defines success or failure. Moreover, variation in project duration has an immediate effect on the total project cost, which can come from direct penalty costs, commonly reflected in contracts with customers today, as well as from indirect ones coming from the risk of losing customer relationships, the company's image, and liabilities. Therefore, the focus of project schedulers is shifting to more accurate estimation of the project lead time, reflection of variability in the schedule to bring it closer to reality, and evaluation of the sensitivity of the lead time to variability at different project stages.

There are two scheduling techniques which can be called traditional: the Program Evaluation and Review Technique (PERT) and the Critical Path Method (CPM). Both techniques assist in evaluating the project lead time based on precedence constraints and the critical path. The techniques have undergone diverse criticism, and numerous research efforts have focused on extensions of these problem types from "day one" of their introduction. The history of the techniques and the evolution of the related research is well summarized in [6].




PERT

Calculation of a critical path is the basis of the PERT technique. The critical path determines the earliest possible completion time of the project, and it is the longest path in the activity precedence network.

PERT extends the deterministic definition of activity durations by introducing uncertainty through three estimates: optimistic, realistic and pessimistic. Firstly, PERT assumes that each activity duration is a random variable between the optimistic and pessimistic extreme values, following a Beta distribution. The Beta distribution has positive skewness, implying that the chance of getting higher duration values is lower; it is also a truncated function, meaning that the probability of getting values higher than the pessimistic value is zero. The PERT methodology provides formulae to calculate the parameters of the Beta distribution based on the three-point estimate of the activity. Secondly, the assumption implies that the entire project duration is also a random variable, whose expected value can be derived using the central limit theorem. As that theorem implies that the project completion time is normally distributed, the calculation of quantiles, and therefore of probabilities, for example the probability of overrunning the project by a certain time period, also becomes possible. The PERT technique is widely criticized; the subjects of criticism are neatly summarized into categories in [6]:

• Critiques of the three-point estimation

• Critiques of the proposed activity distribution

• Critiques of the optimistic result of the PERT calculation

• Critiques about omitting activity calendars

The research has focused around these points in order to make the technique more flexibleand accurate as well.
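For concreteness, the classical three-point formulas that PERT builds on can be sketched as follows (these are the standard textbook expressions; the text above does not state them explicitly):

```python
def pert_estimate(optimistic: float, realistic: float, pessimistic: float):
    """Classical PERT moment approximations from a three-point estimate."""
    mean = (optimistic + 4 * realistic + pessimistic) / 6
    stdev = (pessimistic - optimistic) / 6
    return mean, stdev

mean, stdev = pert_estimate(4, 7, 16)
print(mean, stdev)   # -> 8.0 2.0
```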

CPM

Critical path calculation is the basis for CPM too, which is also explicit in the name of the technique. That makes it similar to PERT, and the two are often referred to as PERT/CPM in the literature.

CPM differs from PERT in its perception of activity duration. CPM assumes that an activity has a normal and a crash duration. Time/cost trade-offs in project scheduling find their roots in the CPM model: it is assumed that every activity has a normal duration, based on execution using normal technology, normal working weeks and working days, and an average resource load. The so-called normal cost is associated directly with the normal duration. On the other hand, some activities can be accelerated by using longer shifts, faster technologies and the application of more workforce and machines. The fastest activity duration is called the crash duration, and the associated direct cost is called the crash cost. It is also assumed that the crash cost is greater than the normal cost, and that the curve between these costs is linear. Thus, CPM is primarily a cost optimization technique. It does not take the variability of activities into account and operates with a




deterministic critical path in planning, which almost guarantees the project will be late, as any parallel path can become critical.

The techniques and their derivatives are incorporated in various forms into project scheduling software. Additionally, the application of Monte Carlo simulation offers a solution for all the criticized issues mentioned before. It can handle multiple competing paths and any kind of activity duration distribution. It is fast enough to run a vast number of instances within a reasonable amount of time and can be used on a network containing complicated precedence relationships. Monte Carlo allows one to obtain statistics of the distribution of the project completion time regardless of its analytic shape. However, the accuracy and variability of the estimates, as well as the selection of the right distribution to represent activity durations, are still real challenges on the way to bringing this method into mass use.
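The point about parallel paths can be illustrated with a minimal Monte Carlo sketch (a toy example with made-up triangular durations, not from the text): the completion time is the maximum over the paths, so either path may turn out critical in a given run.

```python
import random

random.seed(1)

def makespan():
    # two parallel paths; durations drawn from triangular(low, high, mode)
    path_a = random.triangular(2, 9, 4) + random.triangular(1, 6, 3)
    path_b = random.triangular(3, 8, 5)
    return max(path_a, path_b)

samples = sorted(makespan() for _ in range(50_000))
q95 = samples[int(0.95 * len(samples))]
print(round(q95, 2))   # empirical 0.95 quantile of the completion time
```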

1.2 Problem statement

The goal of the current dissertation is to evaluate the makespan of a project from a queueing theory point of view, considering the scheduling network diagram as a queue of a specific structure. Conventionally, a queue is perceived as a line; in other words, projecting it back to the scheduling diagram, a queue is a sequence of activities realized one after another. However, in a diagram the activities can happen in parallel and have various dependencies, and therefore the conventional queueing model needs to be extended. One will see later that analytic evaluation of such a structure becomes complicated even for a small project and requires some inventive approximations, which will be shown as the main approach to the analytic solution.

Assume time is slotted, so that the system is discretised, and consider a network diagram representing some project schedule, taking as an example the network in Figure 1.1.

Figure 1.1: Example of a schedule network. Architecture is taken from [8].

The network has an 'activity on node' representation [8] and contains K activities. Let the scheduled starting time of activity k be denoted τ_k for k = 1, 2, ..., K, so that activity k starts at τ_k, immediately after all of its immediate predecessors finish. Introducing a dummy node K + 1 at the end of the network, the scheduled makespan of the project can be expressed as the scheduled starting time of the dummy activity, τ_{K+1}, as it coincides with the scheduled finishing time of the last activity in the network and, therefore, of the complete project. However, during the project run, in case




any of the immediate predecessors is delayed, activity k has to wait some W_k time slots, and consequently the dummy node has a waiting time of W_{K+1} time slots. Let X be the makespan of the project, which can now be given in terms of the waiting time as

X = τ_{K+1} + W_{K+1}   (1.1)

Each activity k = 1, 2, ..., K has a planned duration a_k, defined as the number of time slots between two consecutive τ_k, and a service time S_k, which is a random variable. We denote by s_k(n) = Prob[S_k = n] for n ≥ 0, μ_k = E[S_k] and σ_k² = Var[S_k] the probability distribution function of S_k, its mean and its variance respectively, which by assumption are known in exact form.

(a) An initial fork (b) An initial join (c) An initial stretch

Figure 1.2: Basic structures of a network

The scheduling diagram has various dependencies, as mentioned before. The diagram can be decomposed into simple structural units: 'a stretch', 'a join' and 'a fork'. A sequence of activities, where each activity has one predecessor and one successor, is named 'a stretch' throughout the paper. A structure where two or more activities are the immediate predecessors of a common successor is named 'a join', and a structure where at least two activities are the successors of a common predecessor is named 'a fork'. Although these same structures compose the real network, for simplification of the analysis they are considered separately, as if they existed on their own. Therefore, the simplified basic structures will be referred to correspondingly as 'an initial stretch', 'an initial join' and 'an initial fork'. Such initial structures are depicted in Figure 1.2. The 'initial stretch' is assumed to have 2 consecutive real activities and 2 dummy activities representing the start and the end of the project. The 'initial join' is assumed to have 3 real activities, 1 join activity with 2 immediate predecessors, and 2 dummy activities representing the start and the end of the project. The 'initial fork' is assumed to have 3 real activities, 1 fork activity with 2 immediate successors, and 2 dummy activities representing the start and the end of the project.

In order to analyse the makespan, the accumulation of waiting time has to be analysed in each structural unit separately, taking into consideration the unique dependencies of that structure. To illustrate, the propagation of waiting time in a stretch-like fragment of the network is similar to propagation in a queue, whereas a fork structure introduces an additional dependence between the parallel successors.

The problem of evaluating the makespan is converted into the problem of evaluating the project's overtime or lateness. Thus, it converts to evaluating the waiting time W_{K+1}, and specifically its extreme realizations.



Chapter 2

Analytic approach

The evaluation of project overtime is addressed from a different side: instead of obtaining the main statistics of the overtime distribution, one focuses on obtaining the probabilities of the project being extremely late. Knowing the behaviour in those regions, extrapolation to the regions of interest in order to extract the probability of a specific delay is possible. There is no particular point where the project becomes extremely delayed, i.e. where the main body of the distribution stops and the tail starts, but the tail refers to the part of the distribution that is far away from the mean, often referring to high quantiles such as 0.90, 0.95 or even much higher. The distribution of overtime is the output of a complex network, and it is affected by the realisation of each node in the network, so that its form (a mass function) is not known explicitly. The statistics of that distribution can be obtained via simulation; however, this becomes computationally expensive when the focus falls on obtaining the probabilities of rare events. Therefore, one turns to an approximation of the tail probabilities via complex asymptotics, which is a subject of analytic combinatorics.

Analytic combinatorics studies combinatorial enumeration, where the 'analytic' in the name refers to the methods from mathematical analysis, in particular complex and asymptotic analysis, used to predict precisely the properties of large structured combinatorial configurations. Generating functions are the central objects of study of the theory. Complex analysis, or complex asymptotics, which is the tool of interest for the problem at hand, is a unified analytic theory dedicated to the process of extracting asymptotic information from counting generating functions. Asymptotic means approaching a value or curve arbitrarily closely, for instance when some limit is taken. A collection of general theorems provides a systematic translation mechanism between a generating function and its coefficients in asymptotic form. To sum up with a quote from [5]: 'Given basic rules for assembling simple components, what are the properties of the resulting objects?'

Projecting back to the problem at hand, a probability distribution function is a generating function: a sequence of numbers representing the probabilities as the coefficients of a power series. Therefore, the procedure to analyse the properties of a large combinatorial structure stated by analytic combinatorics can be applied directly, skipping the first part of the procedure, where the derivation of the symbolic formulation and generating function of a combinatorial structure is required. Knowing the generating function, one can proceed with the transformation and apply tools of complex asymptotics. The framework is summarized below and depicted in Figure 2.1.




Figure 2.1: Procedure to derive asymptotic estimates of the desired properties

Analytic combinatorics starts from an exact enumerative description of combinatorial structures by means of generating functions: these make their first appearance as purely formal algebraic objects. Next, generating functions are interpreted as analytic objects, that is, as mappings of the complex plane into itself. Singularities determine a function's coefficients in asymptotic form and lead to precise estimates for counting sequences.

In the classical approach, given the generating function, explicit expressions for the coefficients are derived and then approximated. However, the explicit forms can be "unfriendly" or even unavailable. With the approach of analytic combinatorics, the coefficients are approximated directly without being expressed explicitly.

To sum up, analytic combinatorics concerns the enumeration of combinatorial structures using tools from complex analysis. In contrast with classical combinatorics, which uses explicit combinatorial formulae and generating functions to describe the results, analytic combinatorics aims at obtaining asymptotic formulae.

2.1 Theory of complex analysis

In this section, the basic definitions and theorems are given in order to refer to them when building a solution later. At the current stage of the research, only generating functions having singularities in the form of a dominant pole of order 1, also called a simple pole, are considered. Therefore, the theory discussed below is limited to the degree sufficient to work out the solution with this limitation as an assumption. The possible extensions of the analysis are discussed in Section 6.2.

The current Section 2.1 is based mainly on the book "Analytic Combinatorics" by Philippe Flajolet and Robert Sedgewick [5], to which we refer for explicit proofs. Additional sources will be given alongside a mention in the text.

2.1.1 Generating functions

Combinatorics deals with discrete objects, and the major question is how to enumerate them. A combinatorial class is a set of such objects with an associated size function. The solution to enumeration is provided by a generating function, which is the central object of combinatorial analysis. It can be shown that a generating function is a reduced form of a combinatorial class, where the internal structure is no longer preserved and the elements contributing to the size are replaced by a variable z.

Definition 2.1 The ordinary generating function of a sequence (A_n) is the formal power series

A(z) = Σ_{n=0}^{∞} A_n z^n.  (2.1)


The coefficient A_n quantifies how many objects of size n the generating function accounts for. Thus, A_n is a mass function a(n) if a probability generating function of the discrete objects is considered.

A_n = [z^n] A(z)  (2.2)

The above definition of a coefficient comes from the general operation of coefficient extraction. The operation of extracting the coefficient of z^n in the power series f(z) = Σ_{n≥0} f_n z^n is denoted [z^n] f(z), so that

[z^n] (Σ_{n≥0} f_n z^n) = f_n  (2.3)

To illustrate the algebraic operations on formal generating functions and their coefficients, it can be shown that common operations such as disjoint union and Cartesian product hold, so that for two different generating functions A(z) and B(z):

A(z) + B(z) → A_n + B_n,  A(z) × B(z) → Σ_{k=0}^{n} A_k B_{n−k}

The generating functions are the central objects of the theory.
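As a small illustration (not part of the original text), these two rules can be checked numerically on truncated coefficient sequences: the sum of generating functions adds coefficients termwise, while the product corresponds to the convolution of the coefficient arrays.

```python
import numpy as np

# Coefficients A_n and B_n of two (truncated) generating functions
A = np.array([1.0, 2.0, 1.0])   # A(z) = 1 + 2z + z^2
B = np.array([1.0, 1.0])        # B(z) = 1 + z

# Disjoint union: A(z) + B(z) has coefficients A_n + B_n
union = np.zeros(3)
union[:3] += A
union[:2] += B

# Cartesian product: A(z) * B(z) has coefficients sum_k A_k B_{n-k}
product = np.convolve(A, B)

print(union)     # termwise sums
print(product)   # convolution of the two sequences
```

Here `np.convolve` computes exactly the Cauchy product Σ_k A_k B_{n−k} of the two coefficient sequences.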

2.1.2 Complex numbers and complex analysis

Having the definition of the generating function, one can proceed to its analysis, meaning the assignment of values to the variables that appear in the generating function. Examining the generating function in the light of analysis means treating it as an analytic tool which yields the solution, or, to be specific, the numeric values of the coefficients.

If real values are assigned to the variables in the generating function, the result is of comparatively little benefit. In contrast, when assigning complex values, a generating function becomes a geometric transformation of the complex plane. This transformation provides much information regarding the function's coefficients. One can visualise the transformation by watching [1].

Complex numbers are an essential example of the power of abstraction. A complex number i is defined to be the square root of −1, so that i² = −1.

A complex number z is then defined as z = x + iy, where the operations in Table 2.1 apply. Its geometric interpretation as a point in the complex plane is shown in Fig. 2.2.

real part: ℜz = x
imaginary part: ℑz = y
absolute value: |z| = √(x² + y²)
conjugate: z̄ = x − iy

Table 2.1: Complex number components

Figure 2.2: Complex number geometry


Complex numbers obey the basic algebraic operations, and the natural approach is to use ordinary algebra, converting i² to −1 whenever it occurs.

The complex transformation is very regular near the origin; one says that it is analytic, or in other words the function is smooth near the origin. Farther away from the origin, cracks start appearing; this disappearance of smoothness marks the singularities. The behaviour of the function near its singularities provides a wealth of information regarding the function's coefficients, and especially their asymptotic rate of growth. Thus, having grasped generating functions and complex analysis, the theory proceeds to coefficient asymptotics.

2.1.3 Analyticity and a singularity

A complex function regular in some open region is said to be analytic in that region, and it is said to cease to be analytic at a point called a singularity. Analyticity is a major property of a function in complex analysis.

Two parallel notions defining what an analytic function is are introduced: analyticity, defined by power series expansions, and holomorphy, defined as complex differentiability; the two turn out to be equivalent. In order to understand the definitions and their effect, let us first provide some assisting definitions and notions.

Let us introduce coefficient asymptotics. The coefficients of a generating function shown in (2.3) belong to a general asymptotic type for coefficients of a function f, and can be represented in the asymptotic form

[z^n] f(z) = A^n Θ(n)  (2.4)

where on the right-hand side A^n is called the exponential growth factor and Θ(n) the subexponential factor, also frequently named a tame factor. To quickly draw the parallel with the evaluation of the project overtime distribution,

a(n) = A_n = [z^n] f(z) = A^n Θ(n)

where a(n) is a mass function, which can for example be the desired mass function of the overtime w_{K+1}(n), and thus can be approximated through coefficient asymptotics as w_{K+1}(n) = W_n = [z^n] W_{K+1}(z).

First Principle of Coefficient Asymptotics. The location of a function's singularities dictates the exponential growth A^n of its coefficients.

Second Principle of Coefficient Asymptotics. The nature of a function's singularities determines the associated subexponential factor Θ(n).

The analyticity of a function is tightly related to its disc of convergence, which is conveyed in Definition 2.3 by a convergent power series.

Definition 2.2 (The disc of convergence of a power series) Let f(z) = Σ_{n≥0} f_n z^n be a power series. Define R as the supremum of all values of x ≥ 0 such that f_n x^n is bounded. Then, for |z| < R, the sequence f_n z^n tends geometrically to 0; hence f(z) is convergent. For |z| > R, the sequence f_n z^n is unbounded; hence f(z) is divergent. In short: a power series converges in the interior of a disc; it diverges in its exterior.


If a function f is analytic at a point z_0, there exists a disc (of possibly infinite radius) with the property that the series representing f(z) is convergent for z inside the disc and divergent for z outside the disc. The radius of this disc is called the radius of convergence of f(z) at z = z_0. The radius of convergence of a power series conveys basic information regarding the rate at which its coefficients grow.

The definition of an analytic function, or analyticity, is given next.

Definition 2.3 (Analytic function) A function f(z) defined over a region Ω is analytic at a point z_0 ∈ Ω if, for z in some open disc centred at z_0 and contained in Ω, it is representable by a convergent power series expansion

f(z) = Σ_{n≥0} c_n (z − z_0)^n.  (2.5)

A function is analytic in a region Ω iff it is analytic at every point of Ω.

Secondly, the definition of holomorphic (differentiable) functions is given.

Definition 2.4 (Holomorphic function) A function f(z) defined in a region Ω is holomorphic or complex-differentiable at a point z_0 in Ω iff the limit

f′(z_0) = lim_{δ→0} (f(z_0 + δ) − f(z_0)) / δ

exists, for complex δ. A function is complex-differentiable in Ω iff it is complex-differentiable at every z_0 ∈ Ω.

Notationally, the definition is the same as for the real domain, but now δ is complex and can approach zero in various ways. The definition is much stronger, as the value must be independent of the way that δ approaches 0.

As mentioned in the beginning, the two notions are equivalent and imply one another, which is reflected in the theorem below.

Theorem 2.1 (Basic Equivalence Theorem) A function is analytic in a region Ω iff it is complex-differentiable in Ω.

A singularity in the form of a pole can be informally defined as a disappearance of smoothness, a crack in the analytic region, or a point where the function ceases to be analytic; it is a point where a given mathematical object is not defined. A function can have a single singularity or multiple singularities of various natures. A pole is the simplest type of singularity and is the one considered within the current research.

Definition 2.5 (Singularity) Given a function f defined in the region interior to the simple closed curve γ, a point z_0 on the boundary γ of the region is a singular point or a singularity if f is not analytically continuable at z_0.

The function is then said to be no longer analytically continuable. Analytic continuation provides a way of extending the domain over which a complex function is defined. An analytic function is rigidly determined in any wider region as soon as it is specified in any tiny region. In contrast with real analysis, where a smooth function admits uncountably many extensions, analytic continuation is essentially unique.


Figure 2.3: Analytic continuation

Let f_1 and f_2 be analytic functions on open domains Ω_1 and Ω_2, respectively, and suppose that the intersection Ω_1 ∩ Ω_2 is not empty and that f_1 = f_2 on Ω_1 ∩ Ω_2. Then f_2 is an analytic continuation of f_1 to Ω_2, and vice versa. Moreover, f_2 is the unique continuation around a point z_0 at the boundary of Ω_1. Analytic continuation is depicted in Figure 2.3.

The location of singularities is refined by the following theorem: knowing that there are no singularities within the disc of convergence, the theorem shows that there must be one on the boundary of that disc.

Theorem 2.2 (Pringsheim's Theorem) If f(z) is representable at the origin by a series expansion that has nonnegative real coefficients and radius of convergence R, then the point z = R is a singularity of f(z).

Singularities of a function analytic at 0 which lie on the boundary of the disc of convergence are called dominant singularities. Pringsheim's theorem appreciably simplifies the search for dominant singularities of combinatorial generating functions: since these have non-negative coefficients, it is sufficient to investigate analyticity along the positive real line and detect the first place at which it ceases to hold. Only the smallest positive real root matters if no others have the same magnitude. For instance, the quotient of two analytic functions f(z)/g(z) ceases to be analytic at a point a where g(a) = 0.
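For a rational generating function, this search can be sketched numerically. The helper below is a hypothetical utility (not from the text): it finds the smallest positive real root of the denominator and checks that no other root shares its modulus.

```python
import numpy as np

def dominant_pole(denominator_coeffs):
    """Smallest positive real root of g(z), given the coefficients of g
    in increasing powers of z, e.g. [1, -q] for g(z) = 1 - q z."""
    roots = np.roots(denominator_coeffs[::-1])   # np.roots expects decreasing powers
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-12 and r.real > 0]
    candidate = min(real_pos)
    # Pringsheim: for non-negative coefficients the dominant singularity is on
    # the positive real axis; verify it is the unique root of that modulus.
    same_modulus = [r for r in roots if abs(abs(r) - candidate) < 1e-9]
    assert len(same_modulus) == 1, "other singularities share the modulus"
    return candidate

# g(z) = (1 - 0.6 z)(1 - 0.3 z) = 1 - 0.9 z + 0.18 z^2; dominant pole at 1/0.6
print(dominant_pole([1.0, -0.9, 0.18]))
```

The printed value is the pole 1/0.6 ≈ 1.6667, the smaller of the two real roots 1/0.6 and 1/0.3.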

2.1.4 Meromorphic functions

Complex integration is another important notion for coefficient asymptotic analysis. The information about the coefficients can be deduced from the values of the function itself, using adequately chosen contours of integration. Thus, one estimates the coefficients [z^n] f(z) in the expansion of f(z) near 0 by using information on f(z) away from 0. Below are the properties of complex integration:

• The integral of an analytic function around a loop is 0 if the function has no singularities inside the loop.

• The coefficients of an analytic function can be extracted via complex integration.

Thus a third notion is introduced, alongside the known analyticity and complex-differentiability. These three notions imply each other.


Theorem 2.3 (Null integral property) If f(z) is analytic in Ω, then ∫_λ f(z) dz = 0 for any closed loop λ in Ω.

Figure 2.4: Theorems of complex analysis

The Cauchy Coefficient Formula allows one to extract the coefficients using an integral representation.

Theorem 2.4 (Cauchy's Coefficient Formula) Let f(z) be analytic in a region Ω containing 0 and let λ be a simple loop around 0 in Ω that is positively oriented. Then the coefficient [z^n] f(z) admits the integral representation

f_n = [z^n] f(z) = (1 / 2iπ) ∫_λ f(z) / z^{n+1} dz  (2.6)

Approach for coefficient extraction:

• Use contour integration to expand into terms for which coefficient extraction is easy.

• Focus on the largest term to approximate.
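The coefficient formula can also be checked numerically (an illustration added here, not part of the text): discretizing the contour integral on a circle |z| = r with the trapezoidal rule recovers the coefficients. A minimal sketch, using f(z) = 1/(1 − z), whose coefficients are all 1:

```python
import numpy as np

def cauchy_coefficient(f, n, radius=0.5, num_points=4096):
    # Discretize (1/2iπ) ∮ f(z) z^{-(n+1)} dz on the circle |z| = radius.
    # With z = r e^{iθ} and dz = iz dθ, the integral is the mean of f(z)/z^n.
    k = np.arange(num_points)
    z = radius * np.exp(2j * np.pi * k / num_points)
    return float(np.mean(f(z) / z**n).real)

# f(z) = 1/(1 - z) = 1 + z + z^2 + ...; every coefficient equals 1
print(cauchy_coefficient(lambda z: 1 / (1 - z), 5))
```

The trapezoidal rule on a periodic integrand converges geometrically fast, so the result agrees with the exact coefficient 1 to machine precision.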

Properties of analytic functions then make the analysis depend only on local properties of the generating function at a few points, its dominant singularities. We particularly consider meromorphic functions, whose singularities are poles.

Definition 2.6 (Meromorphic function) A function h(z) is meromorphic at z_0 iff, for z in a neighbourhood of z_0 with z ≠ z_0, it can be represented as f(z)/g(z), with f(z) and g(z) analytic at z_0. In that case, it admits near z_0 an expansion of the form

h(z) = Σ_{n≥−M} h_n (z − z_0)^n.

If h_{−M} ≠ 0 and M ≥ 1, then h(z) is said to have a pole of order M at z = z_0. The coefficient h_{−1} is called the residue of h(z) at z = z_0 and is written Res_{z_0} h(z). A function is meromorphic in a region iff it is meromorphic at every point of the region.


• A function h(z) that is meromorphic at z_0 admits an expansion of the form (also known as a Laurent series)

h(z) = h_{−M}/(z − z_0)^M + … + h_{−2}/(z − z_0)² + h_{−1}/(z − z_0) + h_0 + h_1(z − z_0) + h_2(z − z_0)² + …

and is said to have a pole of order M at z_0.

• The coefficient h_{−1} is called the residue of h(z) at z_0, written Res_{z=z_0} h(z).

• If h(z) has a pole of order M at z_0, the function (z − z_0)^M h(z) is analytic at z_0. Consequently, if M is finite, the residue can be computed from the Laurent series as

Res_{z_0} h(z) = (1/(M − 1)!) lim_{z→z_0} d^{M−1}/dz^{M−1} [(z − z_0)^M h(z)].

A function is meromorphic in Ω iff it is analytic in Ω except for a set of isolated singularities, its poles.

The important Cauchy Residue Theorem relates global properties of a meromorphic function (its integral along closed curves) to purely local characteristics at designated points (the residues at poles).

Theorem 2.5 (Cauchy's Residue Theorem) Let h(z) be meromorphic in the region Ω and let λ be a simple loop in Ω along which the function is analytic. Then

(1 / 2iπ) ∫_λ h(z) dz = Σ_s Res_s h(z),  (2.7)

where the sum is extended to all poles s of h(z) enclosed by λ.

Theorem 2.6 Suppose that h(z) is meromorphic in the closed disc |z| ≤ R, analytic at z = 0 and at all points |z| = R, and that α_1, …, α_m are the poles of h(z) in |z| ≤ R. Then

h_n = [z^n] h(z) = p_1(n)/α_1^n + p_2(n)/α_2^n + … + p_m(n)/α_m^n + O(1/R^n)  (2.8)

where p_1, …, p_m are polynomials with degree equal to the multiplicity of the respective pole minus one.

If α_i is a pole of order 1:

h(z) ≈ c / (z − α_i) as z → α_i,

Res_{z=α_i} [h(z) / z^{n+1}] = Res_{z=α_i} [c / (z^{n+1} (z − α_i))] = c / α_i^{n+1}

For combinatorial generating functions, the singularities closest to the origin contribute to deriving asymptotic estimates of coefficients and are therefore called dominant. According to Pringsheim's Theorem, a dominant singularity lies on the positive real line, at the radius of convergence, for a generating function with non-negative coefficients. The implication is that only the smallest positive real root matters if no others have the same magnitude.


If some of the roots do have the same magnitude, then complicated periodicities can be present. One willing to study that behaviour more deeply can start by referring to the Daffodil Lemma.

The following theorem finalizes the explicit formulation of the coefficients and summarizes the procedure of coefficient extraction and calculation.

Theorem 2.7 Suppose that h(z) = f(z)/g(z) is meromorphic in |z| ≤ R and analytic both at z = 0 and at all points |z| = R. If α is the unique closest pole to the origin of h(z) in |z| ≤ R, then α is real and

[z^n] f(z)/g(z) ≈ c β^n n^{M−1}

where M is the order of α, c = (−1)^M M f(α) / (α^M g^{(M)}(α)) and β = 1/α.

If α is a pole of order 1, then h_n = [z^n] h(z) ≈ −h_{−1}/α^{n+1}, where h_{−1} = lim_{z→α} (z − α) h(z). To calculate h_{−1}, apply l'Hôpital's rule:

h_{−1} = lim_{z→α} (z − α) f(z)/g(z) = lim_{z→α} [(z − α) f′(z) + f(z)] / g′(z) = f(α)/g′(α)

The bottom line of the theory supplied above is the procedure of analytic transfer for meromorphic generating functions with a dominant pole of order 1, which approximates the coefficients as [z^n] f(z)/g(z) ≈ c β^n.

• Compute the dominant pole α (the smallest real root of g(z) = 0); check that no other roots have the same magnitude.

• Compute the residue h_{−1} = f(α)/g′(α).

• The constant c is −h_{−1}/α.

• The exponential growth factor β is 1/α.
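The four steps can be traced on a concrete probability generating function. The sketch below is an illustrative choice (not from the text): it uses the geometric pgf h(z) = p/(1 − (1 − p)z), whose coefficients are h_n = p(1 − p)^n, so the approximation is in fact exact.

```python
p = 0.3                                # illustrative geometric parameter
q = 1 - p

# h(z) = f(z)/g(z) with f(z) = p and g(z) = 1 - q z  (geometric pgf)
f = lambda z: p
g_prime = lambda z: -q

alpha = 1 / q                          # step 1: dominant pole, root of g(z) = 0
h_minus1 = f(alpha) / g_prime(alpha)   # step 2: residue f(alpha)/g'(alpha)
c = -h_minus1 / alpha                  # step 3: constant of the approximation
beta = 1 / alpha                       # step 4: exponential growth factor

n = 25
print(c * beta**n, p * q**n)           # approximation vs exact coefficient h_n
```

Here c works out to p and β to q, so c β^n reproduces p q^n term by term; for pgfs with several poles the approximation is only asymptotic.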

Figure 2.5 shows the exponential growth factor in action.

Figure 2.5: Decay of the geometric tail

The current theory can be applied to the family of distributions having a geometric tail. This category of distributions is wide, including the probability generating functions in the


form of rational or meromorphic functions. Therefore, the applicability of the results is also wide, but somewhat limited in capturing the real-life behaviour of the project tasks.

Various types of distribution families have different types of singularities, not restricted to a pole, which is in fact the simplest possible singularity type. For example, the convergence region of "light-tailed" distributions satisfies R > 1, whereas "heavy-tailed" distributions have a probability generating function that is not analytic at z = 1. Therefore, in order to model the durations of the project tasks with any of these distributions, a separate targeted analysis has to be done.

2.1.5 Connecting theory to the solution

The theory of complex asymptotics states that assigning complex values to generating functions brings the analysis into a new light. The generating function becomes a geometric transformation of the complex plane. That transformation is analytic (Definition 2.3) in some region, more specifically inside the disc of convergence. At some point in the complex plane, cracks start appearing; this is where the analyticity ceases and the singularities occur. The function's singularities provide a wealth of information regarding the function's coefficients, and especially their asymptotic rate of growth. The objective of the analysis is to translate the approximation of the function near the singularity into the asymptotic approximation of its coefficients.

Let us project the fragments of the theory onto the subject of the current dissertation, a project schedule network. Suppose a discrete non-negative random variable A has a probability generating function A(z) with non-negative coefficients and a mass function a(n) = Prob[A = n], n ≥ 0. In particular, by A(z) the unique analytic continuation of the power series Σ_{n=0}^{+∞} a(n) z^n is meant, from its convergence region |z| < R, R ≥ 1, to the rest of the complex domain. Any probability generating function satisfies A(1) = 1; it is analytic in |z| < 1 and can only have singularities outside or on the border of the unit circle. The asymptotic behaviour of the probabilities a(n) (i.e. the tail distribution of the random variable A) can be deduced from the shape of the function A(z) in the neighbourhood of its singularities. According to Pringsheim's Theorem 2.2, all generating functions that are analytic at the origin and have non-negative coefficients have a singularity at the point z = R, where R is the radius of convergence. A probability generating function satisfies both conditions and therefore has a singularity at z = R. Singularities of a function analytic at 0 that lie on the boundary of the disc of convergence are called dominant singularities. Despite the various types of singularities, this research considers poles only. Thus, all poles closest to the origin, the dominant poles, contribute to the leading coefficient term. If there is only one dominant pole and it is a pole of order 1, then it is called a simple pole.

Consider a probability generating function which has a real singularity ζ > 1 that is a simple pole. The pole can be circled around while staying in the analytic region, according to the Cauchy Residue Theorem 2.5. A mass function is the set of coefficients of that probability generating function, and the Cauchy Coefficient Formula 2.4 is invoked to define the coefficients in asymptotic form as well as to define θ:

a(n) = Prob[A = n] ≈ −θ / ζ^{n+1},  θ = Res_ζ A(z) = lim_{z→ζ} (z − ζ) A(z)

The contribution of the non-dominant singularities is not considered and therefore, for finite n, the above expression is an approximation of the coefficients. It is asymptotically


correct to use the equality sign between the coefficients a(n) and their asymptotic form, because the approximation gets better as n → ∞. The above expressions are also explicitly summarized in Theorem 2.7 and in the procedure described after the theorem. However, here slightly new notation is introduced: θ is the residue and the singularity is denoted ζ. The value 1/ζ is the exponential growth factor.

The probability distributions which have coefficients of this form are said to have a geometric tail. If such probabilities are plotted on a logarithmic scale as a function of n, one gets a straight line,

log Prob[A = n] = log(−θ ζ^{−n−1}) = log(−θ/ζ) − n log ζ  (2.9)

One can see that the parameters of the line, a slope and an intercept, are now expressed in terms of θ and ζ. Thus, knowing the values of θ and ζ, one has the definition of the line, which is defined for each n. It is possible to obtain the coefficient value at each n of interest, meaning that the probability of that n is available. This is a useful interpretation, which enriches the chosen approach with practical insight and applicability.
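As an illustrative check (with parameter values chosen here, not taken from the text), consider the sum of two independent geometric variables: its pgf is rational with two simple poles, and the dominant one reproduces the tail.

```python
# A = A1 + A2, sum of independent geometrics, A(z) = p1 p2 / ((1 - q1 z)(1 - q2 z))
p1, q1 = 0.4, 0.6
p2, q2 = 0.7, 0.3

# Exact mass function by convolution of the two geometric pmfs
def a(n):
    return p1 * p2 * (q1**(n + 1) - q2**(n + 1)) / (q1 - q2)

# Dominant simple pole (q1 > q2, so 1/q1 is closest to the origin) and residue
zeta = 1 / q1
theta = p1 * p2 / (-q1 * (1 - q2 * zeta))   # lim_{z->zeta} (z - zeta) A(z)

# Geometric-tail approximation a(n) ~ -theta / zeta^{n+1}; on a log scale this
# is a straight line with slope -log(zeta) and intercept log(-theta / zeta)
for n in (5, 20, 50):
    print(n, a(n), -theta / zeta**(n + 1))
```

The relative error decays like (q2/q1)^n, so already at moderate n the straight-line tail is indistinguishable from the exact probabilities.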

The following assumptions are made:

1. Probability generating functions that have only a simple pole as dominant singularity are considered.

2. Each probability generating function considered in the schedule diagram has to have a singularity that is different from that of the others, in order to preserve the desired behaviour. If two activities have poles of the same magnitude, then the distribution of overtime can have a dominant pole of multiplicity higher than one. Thus, every activity in the project has to have a unique singularity for its duration distribution, so that no two probability generating functions have coinciding poles.

Thus, with the assumptions in hand, all the random variables which occur in the analysis of the network have geometric tails and can be approximated as in (2.9).

2.2 Analysis of a stretch

Let us briefly repeat the problem statement described in Section 1.2. A stretch is a part of the network represented as a sequence of nodes, each having exactly one predecessor and one successor. Each node has a scheduled service time a_k and an actual service time S_k, which is a random variable. If the preceding node is delayed, the following node has to wait. The waiting time propagates through the entire chain, may accumulate in the final node and result in project overtime. The goal is to evaluate the probability distribution of project overtime through finding the recursive relation between it and all the preceding nodes. Such a relation was elaborated in [4]. The so-called Lindley equation applies:

W_{k+1} = max(0, S_k + W_k − a_k)  (2.10)

That recursive relation translates to a recursive relation of the first moments, and the expected makespan is

E[X] = τ_{K+1} + E[W_{K+1}]  (2.11)


This way of computing the makespan is efficient according to [4]; it has a few advantages over other methods:

• There are no infinite sums to be approximated by truncation; therefore it is exact.

• Computing the sums is fast, as the only demanding computation is to evaluate the waiting probabilities w_k(n), but in every iteration fewer of them are required, for n = 0, …, τ_{K+1} − τ_{k−1}.
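The Lindley recursion can also be simulated directly by brute-force Monte Carlo, which is the baseline the analytic approximation is compared against. A minimal sketch, assuming (purely for illustration, not from the text) geometric service times and a hypothetical schedule:

```python
import random

def simulate_overtime(scheduled, p=0.4, runs=10_000, seed=1):
    """Monte Carlo of the Lindley recursion W_{k+1} = max(0, S_k + W_k - a_k)
    for one stretch. Service times S_k are taken geometric on {0, 1, ...}
    with success probability p (an illustrative choice)."""
    rng = random.Random(seed)
    overtimes = []
    for _ in range(runs):
        w = 0
        for a_k in scheduled:
            s_k = 0
            while rng.random() >= p:   # geometric number of failures
                s_k += 1
            w = max(0, s_k + w - a_k)  # Lindley recursion (2.10)
        overtimes.append(w)
    return overtimes

samples = simulate_overtime(scheduled=[2, 2, 3, 1])
print(sum(samples) / len(samples))     # estimate of E[W_{K+1}]
```

Estimating tail probabilities such as Prob[W_{K+1} = n] for large n this way requires prohibitively many runs, which is exactly what motivates the asymptotic approach (and, later, importance sampling).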

Let us take that relation and introduce an auxiliary variable B, for compactness of derivation only, so that

W_{k+1} = max(0, B_k − a_k),  B_k = S_k + W_k  (2.12)

The activity duration S_k is a random variable with generating function S_k(z), whose coefficients can be approximated by complex asymptotics as

s_k(n) = −θ_{S_k} ζ_{S_k}^{−n−1}  (2.13)

Similarly, the waiting time W_k of activity k is a random variable with generating function W_k(z) and coefficients

w_k(n) = −θ_{W_k} ζ_{W_k}^{−n−1}  (2.14)

The variable B_k has generating function B_k(z) = W_k(z) S_k(z) and its coefficients are

b_k(n) = −θ_{B_k} ζ_{B_k}^{−n−1},  with ζ_{B_k} = min(ζ_{S_k}, ζ_{W_k})  (2.15)

The residue θ_{B_k} is then derived for two different cases, depending on whether ζ_{S_k} or ζ_{W_k} is the minimum, as shown in Figure 2.6, where R_W and R_S stand for the radii of the discs of convergence of W_k(z) and S_k(z) respectively.

θ_{B_k} = S_k(ζ_{W_k}) θ_{W_k},  if ζ_{W_k} < ζ_{S_k},
θ_{B_k} = θ_{S_k} W_k(ζ_{S_k}),  if ζ_{W_k} > ζ_{S_k}  (2.16)

Figure 2.6: Two cases of minimum singularity to define θBk


By definition, W_{k+1} is expressed in terms of B_k in (2.12), and the generating function W_{k+1}(z) is derived in terms of B_k(z) as well.

W_{k+1}(z) = [B_k(z) + Σ_{n=0}^{a_k} b_k(n) (z^{a_k} − z^n)] / z^{a_k},  (2.17)

therefore W_{k+1}(z) has the same singularity as B_k(z), so that ζ_{W_{k+1}} = ζ_{B_k}, and θ_{W_{k+1}} is expressed as

θ_{W_{k+1}} = Res_{ζ_{B_k}} W_{k+1}(z) = θ_{B_k} / ζ_{B_k}^{a_k} = θ_{B_k} / ζ_{W_{k+1}}^{a_k}  (2.18)

The derivation of the above equations is shown in Appendix B. The above can be generalized and applied recursively to obtain the ζ_{W_{K+1}} and θ_{W_{K+1}} of the makespan. It can be noticed that the singularity ζ_{W_{K+1}} will be the minimum of the singularities of the activity durations in the network.
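The recursion through (2.15), (2.16) and (2.18) can be sketched in code. To keep the sketch short it assumes, for illustration only, that the first activity has the strictly smallest singularity, so only the first case of (2.16) ever applies; the interface (lists of schedules, singularities, residues and pgf evaluators) is hypothetical.

```python
def stretch_tail(a, zeta_S, theta_S, S_pgf):
    """Tail parameters (zeta, theta) of the final waiting time of a stretch.

    a[k]       : scheduled duration of activity k
    zeta_S[k]  : singularity of S_k(z)
    theta_S[k] : residue of S_k(z) at zeta_S[k]
    S_pgf[k]   : callable evaluating S_k(z)
    """
    # Activity 0 waits for nothing: B_0 = S_0, then apply (2.18)
    zeta_W = zeta_S[0]
    theta_W = theta_S[0] / zeta_W**a[0]
    for k in range(1, len(a)):
        assert zeta_W < zeta_S[k], "illustrative assumption: activity 0 dominates"
        theta_B = S_pgf[k](zeta_W) * theta_W   # (2.16), case zeta_W < zeta_S
        theta_W = theta_B / zeta_W**a[k]       # (2.18); zeta_W stays the minimum
    return zeta_W, theta_W

# Geometric service times: S_k(z) = p_k / (1 - q_k z), pole 1/q_k, residue -p_k/q_k
qs = [0.7, 0.5, 0.4]
ps = [1 - q for q in qs]
pgfs = [lambda z, p=p, q=q: p / (1 - q * z) for p, q in zip(ps, qs)]
zeta, theta = stretch_tail([2, 1, 2],
                           [1 / q for q in qs],
                           [-p / q for p, q in zip(ps, qs)],
                           pgfs)
print(zeta, theta)   # tail of the overtime: Prob[W = n] ≈ -theta / zeta^{n+1}
```

Handling the second case of (2.16) would additionally require evaluating W_k(ζ_{S_k}), i.e. carrying the full generating function (2.17) along the recursion.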

2.3 Analysis of a join

The initial join structure shown in Figure 1.2(b) is a node that has more than one predecessor. The join structure does not carry any dependencies between predecessors, so the waiting times of the predecessors, namely W_1 and W_2, are independent. By analogy with the relations derived in the analysis of the stretch in Section 2.2, and now denoting the times that activity 3 has to wait for activities 1 and 2 as W_1 and W_2 respectively,

W_1 = max(0, B_1 − a_1),  W_2 = max(0, B_2 − a_2),  W_3 = max(W_1, W_2)

ζ_{B_1} = min(ζ_{W_1}, ζ_{S_1}),  ζ_{B_2} = min(ζ_{W_2}, ζ_{S_2}),  ζ_{W_3} = min(ζ_{B_1}, ζ_{B_2})

It can be shown that the following relations hold

ζ_{W_3} = ζ_{B_j},  θ_{W_3} = θ_{B_j},  with j = arg min_{i∈{1,2}} ζ_{B_i},  (2.19)

where index i enumerates the elements in the set of the immediate predecessors of the join. Thus the join activity will inherit the singularity ζ and the coefficient θ from one of its immediate predecessors.
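Relation (2.19) amounts to selecting the predecessor with the smallest singularity; a one-line sketch (the list-of-pairs interface is hypothetical):

```python
def join_tail(predecessors):
    """Tail parameters of a join's waiting time, per (2.19): the join inherits
    (zeta, theta) from the predecessor with the smallest singularity.
    predecessors: list of (zeta_B, theta_B) pairs."""
    return min(predecessors, key=lambda pair: pair[0])

print(join_tail([(1.25, -0.4), (1.6, -0.2)]))   # the pair with zeta = 1.25 wins
```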

2.4 Introduction to a fork

This section briefly explains the complicating component of a fork structure and the reasoning for leaving this basic structure out of the research at this stage. Additionally, it aims to provide a starting point for future research.

In the network of Figure 1.1, activity 1 is a fork activity, as it has three successors: 2, 3 and 5. Suppose that 1 is extremely late for some reason; then 2, 3 and 5 are all likely to be late as well. If 1 finishes early, then 2, 3 and 5 are likely to start at their scheduled times


τ_2, τ_3 and τ_5. Therefore, one can see that the parallel sequences 2-4-7, 5-8-10 and 3-6-9 are not independent of each other. There is a dependency between the after-the-fork activities, and the same happens at the final join as well. In fact, the waiting times of the activities 2-5-3 (after the fork) and 7-10-9 (before the join) are dependent. Due to that dependency, their joint distribution is required to compute the makespan. For the sample network, the analysis is 3-dimensional, which can still be solved analytically. However, the dimension grows with the number of parallel task sequences and, seeking flexibility, the higher dimensions need to be handled effectively. That applies to every fork in the network; for example, activity 4, being a fork, brings an additional dependency, besides those already mentioned, between 7 and 8. Thus, the general relation schema in the network becomes (F: fork, S: stretch, J: join):

W_1 −F→ (W_2, W_5, W_3) −F→ (W_7, W_8) −S→ (W_7, W_{10}, W_9) −J→ W_{11}

Theoretically, there are a few ways this problem can be tackled analytically (e.g. considering copulas); however, the efficient way of using recursive relations to evaluate the makespan might be lost. Therefore, as the approach to an analytic solution for the fork differs from the one suitable for the stretch and the join, it was separated into an individual project. Alternatively to evaluating the model with joint distributions analytically, keeping in mind the complexity due to the dependency, one can make the strong assumption of independence, which hopefully results in less demanding calculations, and then compare the two models and their performance. There are statistical tools that operate surprisingly well with that strong assumption, for example a Naive Bayes estimator.


Chapter 3

Computation of analytic results

3.1 Network structure

The computation algorithm has to iterate over all the activities in the network, starting with the start activity and progressing towards the end. The required quantities, for example the singularity and the residue, can be computed for an activity only if these quantities have already been computed for all of its predecessors. That is, for activity k, the singularity and the residue of its waiting time Wk can be computed only if this has been done for all predecessors of k. The computation of the entire network can be organised into a number of stages. Let A be a set of activities; then Succ(A) is the set of all activities which are successors of activities in A, and Pred(A) contains all predecessors of A. Then the activities Ci to be computed in successive stages i = 0, 1, 2, ... can be determined as follows.

Algorithm 1 Network stages
1: i ← 0
2: F ← ∅
3: repeat
4:     if i = 0 then
5:         Ci ← {start}
6:     else
7:         Ci ← {k ∈ Succ(Ci−1) : Pred(k) ⊂ F}
8:     end if
9:     F ← F ∪ Ci
10:    i ← i + 1
11: until F contains all activities
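As an illustration, the staging procedure above can be sketched in a few lines (the dictionary-based network encoding and the function name are assumptions of this sketch, not the thesis' implementation):

```python
def network_stages(pred):
    """Split activities into stages C_0, C_1, ... (sketch of Algorithm 1).

    pred maps each activity to the set of its immediate predecessors;
    here the unique activity with no predecessors is the start node.
    """
    # build successor sets from the predecessor sets
    succ = {k: set() for k in pred}
    for k, ps in pred.items():
        for q in ps:
            succ[q].add(k)

    start = next(k for k, ps in pred.items() if not ps)
    stages = [{start}]        # C_0 = {start}
    finished = {start}        # F, the set of already computed activities
    while len(finished) < len(pred):
        # C_i: successors of C_{i-1} whose predecessors are all in F
        ci = {k for q in stages[-1] for k in succ[q] if pred[k] <= finished}
        stages.append(ci)
        finished |= ci
    return stages
```

For a small diamond-shaped network, the sketch reproduces the stages C0 = {start}, ..., CI = {end} described above.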

Let I be the last stage obtained this way; then the first stage will always be C0 = {start} and the last will be CI = {end}. For any stage, the particular order in which the activities in Ci are computed is irrelevant. Now, with the assumption that there are no fork activities in the network besides the forking start activity, computation can advance as follows.


Algorithm 2 Computation in stages
1: For the start activity: service time and waiting time are set to 0
2: for i = 1, . . . , I do
3:     for k ∈ Ci do
4:         for k′ ∈ Pred(k) do
5:             Compute pmf, pgf, singularity and residue of Bk′ = Wk′ + Sk′
6:             Compute pmf, pgf, singularity and residue of W′k = max(0, Bk′ − ak′)
7:         end for
8:         Compute pmf, pgf, singularity and residue of Wk = max_{k′∈Pred(k)}(Wk′)
9:     end for
10: end for

This way, there is no need to identify stretches and joins in the network beforehand.

Alternatively, stretches and joins can be identified in the network beforehand. This approach is based on the fact that the join node inherits the singularity of the waiting time ζWk from one of its predecessors, so that ζWk = min_{k′∈Pred(k)}(ζBk′). Once the predecessor holding the minimum singularity is identified, the join 'chooses' the stretch that predecessor belongs to and computes the residue for that stretch. In that way, less of the expensive recursive computation of the residue is required.

Note that if Pred(k) contains only one activity k′, then k is part of a stretch. Otherwise, if Pred(k) contains multiple activities, it is part of a join. Thus, the join nodes can be identified by having multiple activities as immediate predecessors. For each join, all its predecessors (not only the immediate ones) are split into independent stretches. The start of a stretch is a root activity, which has no predecessors. Note, however, that if some join is not the first join in the network and has another join among its predecessors, the start of the stretch is the precedent join instead and the end is the current join. This is an important note for a number of reasons.

Firstly, let us recall again that the join node inherits the singularity of the waiting time ζWk from one of its predecessors, so it can be said that the join node selects which of the stretches it will assign itself to. Let us illustrate the possible scenario where the minimum singularity happens to lie on one of the join nodes. A simple schedule diagram for illustration purposes is depicted in Figure 3.1.

Figure 3.1: Sample network to illustrate the location of the minimum singularity in thenetwork on the join node

If, for the join node 9, all preceding nodes are split into independent stretches, this results in 3 stretches: 1-2-5-9, 3-4-5-9 and 6-7-8-9. Node 9 inherits the minimum singularity


as well as the residue from one of its immediate predecessors. In other words, node 9 is assigned to the stretch which holds the minimum singularity in the network. If the minimum singularity happens to lie on the join node 5, then there are two stretches which contain the minimum singularity and end at node 9, so node 9 has two options of stretches to 'choose' to be assigned to. However, node 5 has already 'chosen' the one of the stretches 1-2 or 3-4 which contains the minimum singularity, and therefore node 9 can take over the 'choice' of node 5 and compute only the missing part starting from node 5. Thus, instead of computing 1-2-5-9 and 3-4-5-9, node 9 reads what node 5 has chosen, for example the stretch 3-4-5, and computes the stretch 5-9 only. Hence, the two reasons for considering the fragment of the network between two consecutive joins when splitting it into stretches are as follows. Firstly, this accounts for the situation where the minimum singularity is located at the precedent join node and makes the computation feasible. Secondly, it reduces the amount of computation required, by computing the singularities first and then computing the residue only for the stretch containing the minimum singularity.

The approach described above can be summarized as follows. The two basic structures are combined analytically into the network by splitting each of the join blocks into independent stretches and calculating the singularity ζ^m_Wj for the mth stretch of the jth join. Then the stretch resulting in the minimum singularity is picked and the residue θWj is calculated for that stretch only. Thus, the algorithm is:

Algorithm 3 Splitting the network into joins and stretches
for j in J do                          ▷ J is the set of nodes having >1 immediate predecessor
    M ← stretches ending at join j     ▷ M is the set of stretches
    for m in M do
        compute ζ^m_Wj
    end for
    idx ← argmin_m(ζ^m_Wj)
    ζWj ← min(ζW_idx, ζS_idx)
    θWj ← θ_idx
end for
return ζWJ, θWJ                        ▷ return the coefficients of the join closest to the end

Asking the algorithm to return the singularity and residue of the waiting time distribution of the last node in the network triggers the recursive computation. The two approaches for combining the two basic structures, the join and the fork, into the network demonstrate that the network can be considered as a combination of stretches only. Thus, the recursive computation steps are elaborated below for the stretch structure.
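The identification of joins and stretches described above can be sketched as follows (the dictionary encoding of the network and the function name are illustrative assumptions; each stretch is traced backwards from a join until a root or the precedent join is reached):

```python
def joins_and_stretches(pred):
    """Identify the join nodes and split each join's history into stretches.

    A join is a node with more than one immediate predecessor.  Each stretch
    feeding a join is traced backwards through single-predecessor nodes until
    a root (no predecessors) or the precedent join is reached.
    """
    joins = {k for k, ps in pred.items() if len(ps) > 1}
    stretches = {}
    for j in joins:
        paths = []
        for p in pred[j]:
            path = [j, p]
            # joins (>1 pred) and roots (0 preds) both stop the backward walk
            while len(pred[path[-1]]) == 1:
                path.append(next(iter(pred[path[-1]])))
            paths.append(list(reversed(path)))
        stretches[j] = paths
    return joins, stretches
```

On a network shaped like the one in Figure 3.1 (join 5 fed by 1-2 and 3-4, join 9 fed by 5 and 6-7-8), the sketch returns the stretches 1-2-5, 3-4-5 for node 5 and the stretches 5-9, 6-7-8-9 for node 9, matching the observation that node 9 only needs to compute the stretch 5-9.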

3.2 Recursive computation of a stretch

For simulation of the analytic results for the stretch, additional practical derivations are required. From the analysis of the stretch, one recalls that in order to compute the residue, for example θBk, both Wk(z) and θWk need to be known for every node. Let us expand the first few iterations of the computation of the singularity and the residue of the overtime distribution for the case of the stretch, and generalize the computational steps later.


Let us suppose that there are K + 1 activities in the stretch, the planned duration of the project is τK+1 = a1 + a2 + . . . + aK, and the (K + 1)th activity is a dummy activity with duration 0. Then,

For activity 1:

W1 = 0, so that there is no tail of the distribution and therefore ζW1 and θW1 are not defined, but are set to ∞ for practical purposes, so that they can be included in the recursive computation.

We have,

W1(z) = 1,    w1(n) = { 1 if n = 0;  0 if n = 1, . . . , τK+1 }

The first activity ends after B1 = W1 + S1 = S1 slots, so

ζB1 = min(ζW1, ζS1) = ζS1,    θB1 = θS1 W1(ζS1) = θS1

And, the coefficients of B1(z),

b1(n) = s1(n), n = 0, . . . , τK+1

For activity 2:

The waiting time W2 = max(0, B1 − a1) with

W2(z) = S1(z)W1(z)/z^a1 + Σ_{n=0}^{a1} b1(n)(1 − z^{n−a1})

w2(n) = { Σ_{j=0}^{a1} b1(j) if n = 0;  b1(n + a1) if n = 1, . . . , τK+1 − a1 }

and

ζW2 = ζB1,    θW2 = θB1 / ζB1^{a1}

The second activity ends after B2 = W2 + S2 slots, so

ζB2 = min(ζW2, ζS2),    θB2 = { S2(ζW2) θW2 if ζW2 < ζS2;  W2(ζS2) θS2 if ζS2 < ζW2 }

And, the coefficients of B2(z),

b2(n) = Σ_{j=0}^{n} w2(j) s2(n − j),    n = 0, . . . , τK+1 − a1

Thus, the general procedure for calculating ζWK+1 and θWK+1 is given in Algorithm 4, assuming activity k has a geometrically distributed duration with parameter pk, distinct from that of the other activities. Recall also that the dummy nodes 0 and K + 1 have both service times and planned durations equal to 0.


Algorithm 4 Analytic derivation of a stretch for geometrically distributed service times
1: for k = 0, . . . , K + 1 do                  ▷ for each activity k
2:     for n = 0, . . . , τK+1 do               ▷ τK+1 is the planned project duration
3:         sk(n) ∼ geom(pk, n)
4:     end for
5:     θSk = −pk/(pk − 1)²
6:     ζSk = 1/(1 − pk)
7:     ζWk = min(ζWk−1, ζSk−1) if k > 0 else ∞
8: end for
9: for k = 0, . . . , K + 1 do
10:    wk(0) = Σ_{j=0}^{ak−1} bk−1(j) if k > 0 else 1
11:    for n = 1, . . . , τK+1 − τk do
12:        wk(n) = bk−1(n + ak−1) if k > 0 else 0
13:        bk(n) = Σ_{j=0}^{n} wk(j) · sk(n − j) if k > 0 else s0(n)
14:    end for
15:    Sk(z) = pk · z/(1 − (1 − pk) · z)
16:    Wk(z) = Sk−1(z) · Wk−1(z)/z^{ak−1} + Σ_{n=0}^{ak−1} bk−1(n) · (1 − z^{n−ak−1}) if k > 0 else 1
17:    θWk = θBk−1 / ζWk^{ak−1} if k > 0 else ∞
18:    if ζWk < ζSk then
19:        θBk = Sk(ζWk) · θWk
20:    else
21:        θBk = Wk(ζSk) · θSk
22:    end if
23: end for
24: return ζWK+1, θWK+1                         ▷ ζ and θ of the waiting time of the dummy end node

Thus, by implementing one of the network structure approaches together with the algorithm for the recursive computation of the singularity and residue of each node, the parameters of interest, the singularity and residue of the project overtime distribution, can be obtained.
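A minimal sketch of this recursion for geometric service times might look as follows (the function name and the pure-Python pmf bookkeeping are assumptions of this sketch; the formulas follow Algorithm 4):

```python
def stretch_zeta_theta(p, a):
    """Singularity ζ and residue θ of the end-node waiting time of a stretch.

    p : geometric parameters p_1..p_K, so S_k(z) = p_k z / (1 - (1 - p_k) z)
    a : planned durations a_1..a_K
    """
    INF = float('inf')

    def S_pgf(k, z):                       # pgf of activity k's duration
        return p[k] * z / (1.0 - (1.0 - p[k]) * z)

    def s_pmf(k, n):                       # s_k(n) = p_k (1 - p_k)^(n-1), n >= 1
        return p[k] * (1.0 - p[k]) ** (n - 1) if n >= 1 else 0.0

    horizon = sum(a)                       # τ_{K+1}, enough pmf support
    w = [1.0] + [0.0] * horizon            # pmf of W_1 = 0
    W = lambda z: 1.0                      # W_1(z) = 1
    zeta_W, theta_W = INF, INF             # undefined for W_1, set to ∞

    for k in range(len(p)):
        # B_k = W_k + S_k: convolve the pmfs, take the smaller singularity
        b = [sum(w[j] * s_pmf(k, n - j) for j in range(n + 1))
             for n in range(horizon + 1)]
        zeta_S = 1.0 / (1.0 - p[k])
        theta_S = -p[k] / (1.0 - p[k]) ** 2
        if zeta_W < zeta_S:
            zeta_B, theta_B = zeta_W, S_pgf(k, zeta_W) * theta_W
        else:
            zeta_B, theta_B = zeta_S, W(zeta_S) * theta_S
        # W_{k+1} = max(0, B_k - a_k): shift the pmf, divide the residue by ζ^a
        ak = a[k]
        w = [sum(b[:ak + 1])] + [b[n + ak] if n + ak <= horizon else 0.0
                                 for n in range(1, horizon + 1)]
        W = (lambda Wk, bk, kk, aa: lambda z:
             S_pgf(kk, z) * Wk(z) / z ** aa
             + sum(bk[n] * (1.0 - z ** (n - aa)) for n in range(aa + 1)))(W, b, k, ak)
        zeta_W, theta_W = zeta_B, theta_B / zeta_B ** ak
    return zeta_W, theta_W
```

For a single activity with p1 = 1/2 and a1 = 2, the waiting time W2 = max(0, S1 − 2) has the exact pmf 0.25 · 0.5^n for n ≥ 1, and the recursion returns ζ = 2 and θ = −1/2, consistent with that tail.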


Chapter 4

Simulation approach

In this chapter, the comparison of analytic and simulation results is made. For the simulation approach, brute-force Monte Carlo simulation is used. Additionally, the possibility of using importance sampling for more efficient sampling, with more draws from the tails, is investigated. The comparison is made in terms of the slope and the intercept obtained via Monte Carlo simulation and via analytic computation, the latter estimated from the singularity ζ and the residue θ as in (2.9).

4.1 Naive Monte Carlo estimation

For simulation purposes, the durations of the K + 1 activities are geometrically distributed with the following probability mass and generating functions:

sk(n) = pk (1 − pk)^{n−1} for k ∈ {1, 2, . . . , K + 1}, n ∈ {1, 2, 3, . . . , τK+1}

Sk(z) = pk z / (1 − (1 − pk) z)

Monte Carlo is a well-known and widely used technique; the algorithm for applying it to the particular case of estimating the distribution of the project overtime is shown below:

Algorithm 5 Monte Carlo simulation
for k = 0, . . . , K + 1 do                ▷ for each activity k
    for r = 1, . . . , R do                ▷ draw R samples from geometric with parameter pk
        S^r_k ∼ geom(pk)
    end for
end for
for k = 0, . . . , K + 1 do
    for r = 1, . . . , R do
        W^r_k = max(0, S^r_{k−1} + W^r_{k−1} − ak−1)
    end for
end for
return samples W^r_{K+1}                   ▷ waiting-time samples of the last node

The result of the Monte Carlo simulation, plotted on a logarithmic scale, is a set of samples scattered along some line, which is approximated by fitting a polynomial of order 1 using the


least-squares method. The slope and the intercept of that line are used for comparison with the analytic results.
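The sampling and the order-1 log-linear fit can be sketched as follows (the inversion sampler, function names and defaults are assumptions of this sketch):

```python
import math
import random
from collections import Counter

def geom_sample(rng, p):
    """Geometric variate with P(n) = p (1-p)^(n-1), n >= 1, by inversion."""
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1

def mc_overtime_samples(p, a, R=100_000, seed=42):
    """R Monte Carlo samples of the end-node waiting time of a stretch
    (sketch of Algorithm 5)."""
    rng = random.Random(seed)
    out = []
    for _ in range(R):
        W = 0
        for pk, ak in zip(p, a):
            W = max(0, W + geom_sample(rng, pk) - ak)   # Lindley-type recursion
        out.append(W)
    return out

def loglinear_fit(samples, min_count=10):
    """Least-squares slope/intercept of log P[W = n] versus n (order-1 fit)."""
    counts, R = Counter(samples), len(samples)
    pts = [(n, math.log(c / R)) for n, c in counts.items()
           if n > 0 and c >= min_count]
    xs, ys = zip(*pts)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx
```

For a single activity with p1 = 1/2 and a1 = 2, the exact tail decays like (1/2)^n, so the fitted slope should come out near ln(1/2) ≈ −0.69.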

Having the computation procedure for both the analytic and the Monte Carlo estimation of the slope and the intercept, the comparison of the results for the initial stretch and the initial join can be done. Below, the setting is given for two simple experiments with the initial stretch and the initial join structures. The parameter pk in Table 4.1 is the parameter of the geometric distribution according to which the service time Sk of activity k is distributed.

Figure 4.1: Initial stretch
Figure 4.2: Initial join

      Stretch   Join
PMF   ∼ geom    ∼ geom
p1    1/10      1/10
a1    10        5
p2    1/5       1/7
a2    10        7
p3    -         1/5
a3    -         5

Table 4.1: Simulation settings

Stretch      Monte Carlo   Analytic
Slope        -0.103245     -0.105361
Intercept    -3.029799     -3.084485
Join
Slope        -0.101417     -0.105361
Intercept    -2.988427     -3.084485

Table 4.2: Simulation results

Figure 4.3: Simulation of initial stretch
Figure 4.4: Simulation of initial join

For the simple structures, the results look promising: the deviation between the simulation and analytic results is negligible, and even visually the lines appear to coincide. Thus, the same result can be achieved with significantly less computation, and with increased accuracy for some higher values of the overtime. However, after extending the stretch beyond 5 activities, which in practice is a very small number of tasks for a project, the two estimates of the overtime probability distribution no longer result in a close enough approximation to


be considered equivalent. For some network structures the numerical values of the slope and intercept obtained with the analytic and simulation approaches are still close enough; however, the general trend observed is that the analytic line acts as a tangent to the scattered samples obtained via simulation. The same behaviour is observed for the join structures. Below, the experiment is performed for the extended stretch containing 6 nodes in a sequence; the parameter setting and the results are summarized in Table 4.3 and Table 4.4 respectively. From Figure 4.5 one can see that the samples obtained via simulation and marked as blue dots show some curvature in the region of low values on the x-axis, and the analytic line appears to be a tangent to it. For the more complex networks, the average estimate of 10 Monte Carlo simulations is taken, as little fluctuations in slope and intercept make a huge difference for the relevance of the observed results.

k   pk     ak
1   1/10   10
2   1/5    5
3   1/8    8
4   1/3    3
5   1/7    7
6   1/6    6

Table 4.3: Stretch: simulation settings

Stretch      Monte Carlo   Analytic
Slope        -0.094665     -0.105361
Intercept    -2.504362     -1.526761

Table 4.4: Stretch: simulation results

Figure 4.5: Stretch: 6 nodes

The analytic result happens to decay faster in general and to be less conservative for the higher realizations of the overtime. Moreover, one can find trends in the behaviour of the analytic results which depend on the location of the activity with minimum singularity in the network, i.e. whether it is located further from or closer to the end of the project. It is also sensitive to changes in the value of the planned duration ak of activity k, i.e. whether it exceeds the expected value of the service time or not. In Figure 4.6 and Figure 4.7 the plots are shown for the simulation with activity duration ak 1 unit lower and 1


unit higher than the expected value of the service time E[Sk]. For example, for activity 1 with pk = 1/10 and thus E[Sk] = 10, ak was set to 9 in the first experiment and to 11 in the second experiment. In Figure 4.6, showing the results for ak < E[Sk], the simulation fit has a more pronounced curve and the analytic fit results in higher probabilities of overtime. That is also intuitive: as the activities in reality last longer than their planned durations, the chance of overrunning the project increases. The opposite situation is shown in Figure 4.7, where ak > E[Sk]. The simulation fit does not have a distinct curve, as in the max function max(0, Sk + Wk − ak), 0 happens to be chosen more frequently. The analytic fit results in lower probabilities of overtime, which is also intuitive: as the activities in reality take less time than was reserved for them, the project is less likely to be delayed.

Figure 4.6: Planned activity duration ak is 1 unit lower than E[Sk]
Figure 4.7: Planned activity duration ak is 1 unit higher than E[Sk]

The analytic fit acts as a tangent for the join structure too. In Figure 4.8 the result of the experiment for the join node, which has 5 preceding nodes, is shown. The setting and the numerical results are shown in Table 4.5 and Table 4.6 respectively.

k           pk     ak
1           1/10   10
2           1/5    5
3           1/8    8
4           1/3    3
5           1/7    7
Join node   1/6    6

Table 4.5: Join: simulation settings

Join         Monte Carlo   Analytic
Slope        -0.108312     -0.105361
Intercept    -2.344140     -2.149301

Table 4.6: Join: simulation results


Figure 4.8: Join: 5 predecessors

The current measure of the result is the similarity of the slope and intercept, implying that one seeks the similarity of the dispersion of the distributions obtained via simulation and via the analytic approach. However, in the results observed in this section, the slopes usually coincide while the intercepts differ, as the analytic fit happens to act as a tangent. The simulation fit has a curve, and the fitted first-order polynomial, which provides the slope and intercept for comparison, is a rough fit unable to reflect that behaviour. Therefore, although the results do not match the initial expectation (matching slopes and intercepts), this does not imply that the results are wrong, but makes one reconsider the current measure of the results.

4.2 Importance Sampling

If we are interested in the tail of the overtime distribution, we need to estimate very small probabilities, which takes Monte Carlo a long time. Moreover, as was mentioned, the average estimate of a few Monte Carlo simulations should be used due to the variance, and alternatives to reduce that variance should be considered. Importance sampling (IS) is a well-known variance reduction technique which can draw more samples from the region of interest. IS works by 'tilting' the input distributions, in the current case the distributions of the activity durations, towards higher values. In the simulation, longer-than-usual activities are then more likely to appear, so that the probability of WK+1 > q increases and can be estimated more efficiently. Suitably correcting the estimated probability to account for the tilting finally results in a good, unbiased estimate. The challenge of applying importance sampling to the estimation of the project overtime is that the estimated distribution is a multivariate joint distribution and each of its components should be tilted. Alternatively, IS can be applied point-wise by tilting one or several activities only, raising the question which activities have to be selected for tilting.

The main drawbacks of IS, which should be handled with care during the implementation,


are the choice of the proposal distribution and, as a consequence, possibly poor precision, whereby a few bad samples with large weights can drastically throw off the estimator. Importance sampling is simulation-consistent for most purposes, but, in general, if the importance ratios are unbounded, which happens if there are parts of the target distribution with longer tails than the proposal distribution, then for any finite number of simulation draws, importance sampling gives a distribution in between the proposal and the target distributions. That is demonstrated and illustrated well in [3], and various adaptive or multiple IS schemes have been proposed to mitigate the effect of 'odd' weights. In the case of estimating the overtime, getting some high values from the proposal can result in zero probability under the true distribution and can blow up the weights. Therefore, we need to make a careful choice for each pair of true-proposal distributions of each activity. Additionally, too extreme values of the overtime have no practical meaning; for example, there is no value in knowing the probability of a project overtime of 1 year for a project with a duration of a month or even a week. Therefore, to prevent the blow-up of the weights, one can filter the values drawn from the proposal distribution and ensure that all the drawn values have a large enough probability to contribute a significant weight. That approach is put together below.

Consider an initial stretch with only two activities scheduled a1 slots apart. The activities have durations S1 and S2 with mass functions s1(i) and s2(j) respectively, so we know the makespan X is

X = a1 + W2 + S2 = a1 + max(0, S1 − a1) + S2 = max(a1, S1) + S2

Suppose the probability to be estimated is

x(n) = Prob[X = n] = E[fn(S1, S2)],    fn(i, j) = { 1 if max(a1, i) + j = n;  0 otherwise }

where the expectation is over the joint distribution of S1 and S2. The activity durations are assumed to be independent; therefore, the joint distribution has mass function s1(i)s2(j), i, j ≥ 0. The samples (S′1,r, S′2,r), r = 1, . . . , R, can be drawn from a joint proposal distribution of the activity durations, which is different from their actual joint distribution. Maintaining independence of the activity durations in the proposal too, let the proposal joint distribution be s′1(i)s′2(j), i, j ≥ 0. Then,

x(n) = Prob[X = n] = E[fn(S1, S2)] = Σ_i Σ_j fn(i, j) s1(i) s2(j)
     = Σ_i Σ_j fn(i, j) [s1(i)s2(j) / (s′1(i)s′2(j))] s′1(i)s′2(j)
     = E[ fn(S′1, S′2) s1(S′1) s2(S′2) / (s′1(S′1) s′2(S′2)) ]

where now the expectation is over the joint distribution of S′1 and S′2. Applying the Monte Carlo method to this expectation, we have the following IS estimator:

xIS(n) = (1/R) Σ_{r=1}^{R} fn(S′1,r, S′2,r) · s1(S′1,r) s2(S′2,r) / (s′1(S′1,r) s′2(S′2,r))        (4.1)


Note that E[xIS(n)] = E[xMC(n)] = x(n), but hopefully, if we choose a good proposal distribution, Var[xIS(n)] ≤ Var[xMC(n)].

As was already mentioned above, if we want to estimate x(n), n = 0, . . . , m, for some maximal makespan value m, it would be wasteful to generate R proposal samples of the durations for each n, because for most of them the corresponding term in the estimator (4.1) will be 0. It is more economical to generate R proposal samples first and then, for each of them, see which of the m + 1 estimators xIS(n), n = 0, . . . , m, it can be used for. Practically, the algorithm works as follows, where LR stands for Likelihood Ratio, also known as the weight:

Algorithm 6 IS estimation of x(n)
for n = 0, . . . , m do
    x(n) ← 0
end for
for r = 1, . . . , R do                    ▷ R is the number of samples
    S′1,r ∼ geom(p′1)
    S′2,r ∼ geom(p′2)
    LRr = s1(S′1,r) s2(S′2,r) / (s′1(S′1,r) s′2(S′2,r))
    n = max(a1, S′1,r) + S′2,r
    if n ≤ m then
        x(n) ← x(n) + LRr/R
    end if
end for

One expects importance sampling to result in the same estimate as the Monte Carlo estimate, with the same or smaller variance, at much lower computational expense. As can be seen from the variance expression below, the variance of the IS estimator scales as 1/R with R samples.

Var[xIS(n)] = (1/R) E[( fn(S′1, S′2) s1(S′1) s2(S′2) / (s′1(S′1) s′2(S′2)) − x(n) )²]

Regardless of whether it is a univariate or a multivariate distribution that is to be estimated, the choice of the proposal is a challenging task for which there is no strict guideline, only a few rules of thumb. Looking at the formula of the variance, one can represent the expected value in the generic form E[(f(n)p(n)/q(n) − x(n))²], where f(n) is the 'cost' function and p(n), q(n) are the true and proposal distributions respectively. Note that this variance is 0 if we could choose q(n) = f(n)p(n)/x(n). This is a legitimate probability distribution if f(n) ≥ 0. Of course one cannot really do this, since it would require knowing x(n), but it suggests a strategy: the variance is minimized by taking q(n) proportional to |f(n)|p(n). Hence the rules of thumb stating that the proposal should preferably have a similar shape to the true distribution (in case one has some picture of it) and that it should lie above the true distribution in the major part of the support. The intuitive approach is to select a proposal that optimizes some criterion, for example the minimization of the variance of the estimate as shown above; however, it can also aim at minimizing the distance between the two distributions to ensure suitably proportional weights.
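The zero-variance argument can be checked numerically on a toy problem (the truncated geometric target and all names here are assumptions of this sketch, not the thesis' experiment):

```python
def is_variance(f, p, q, support):
    """Variance of a single-draw IS estimate of E_p[f(N)] under proposal q.
    A toy numeric check of the q ∝ |f|·p rule; all pmfs are given on `support`."""
    x = sum(f(n) * p(n) for n in support)                     # true value
    second_moment = sum((f(n) * p(n) / q(n)) ** 2 * q(n)
                        for n in support if q(n) > 0)
    return second_moment - x ** 2

# toy target: estimate x = P[N >= 15] for N ~ geom(0.3), truncated at 200
support = range(1, 201)
p = lambda n: 0.3 * 0.7 ** (n - 1)
f = lambda n: 1.0 if n >= 15 else 0.0
x = sum(f(n) * p(n) for n in support)

naive = is_variance(f, p, p, support)        # proposal = target distribution
q_opt = lambda n: f(n) * p(n) / x            # q ∝ |f|·p (needs x, benchmark only)
optimal = is_variance(f, p, q_opt, support)  # essentially zero variance
```

Here `naive` equals x(1 − x), while `optimal` is zero up to floating-point error, confirming that the optimal proposal concentrates all its mass where f(n)p(n) is non-zero.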


The selection of a geometric distribution as a proposal for the estimation of the projectovertime distribution is well-reasoned as we know the activity durations are geometricallydistributed and we expect the joint distribution to be geometric-like.

The simulation of the initial stretch is implemented with geometric proposal distributions for S′1 and S′2 with parameters p′1 = 1/20 and p′2 = 1/10 respectively, with the parameters p1 = 1/10 and p2 = 1/5 staying the same as in the earlier Monte Carlo simulation of the stretch. Overtime values above 200 were dropped in order to ensure there are no zero weights. With 100K samples, the probabilities of higher values of the project overtime are obtained (an overtime of 200 timeslots vs. 120), and with significantly fewer replications (100K samples vs. 1M) it led to the same result. 10 subsequent simulations were run, and the average slope and intercept values are given below.

             IS          Analytic
Slope        -0.105585   -0.105360
Intercept    -3.029742   -3.078545

Table 4.7: IS: simulation results
Figure 4.9: IS: an initial stretch

Starting from the simple case of the 2-node stretch, let us investigate how to choose the optimal proposal distribution for each of the two activity duration distributions, and how the variance of the estimate changes as the proposal distribution changes. We run the following experiment. The true parameter values stay the same as in the previous simulations: p1 = 1/10 and p2 = 1/5. Then, for each node, the set of parameters for the proposal distribution is considered as shown in Table 4.8. These parameters are chosen based on the rules of thumb: choosing proposals that lie above the true distribution, and choosing them so that for all realizations of the proposal the true probability exists and is greater than zero. The goal of the experiment is to find the pair of parameters which results in the minimum variance of the estimate. The measured variance is the variance of the estimate that the project overtime WK+1 is greater than some quantile q, as the extreme probabilities are the subject of interest. The experiment is run for the different quantile values shown in Table 4.8 as well. 10 simulations of 10K samples each are run for each pair of parameters and each quantile value, and the variance is measured. For the higher quantiles, lower mean values of the estimates are expected, and the magnitude of the variance will also be low in comparison to the variance magnitude at the lower quantiles. Therefore, for comparison purposes, to see which pair of parameters results in higher variation at each particular quantile, the measure of relative standard deviation is more representative. The relative standard deviation (RSD) is defined as Var[x(n)]^{1/2}/x̄(n), where x̄(n) is the sample mean.


p′1        1/11, 1/12, 1/13, 1/14, 1/15, 1/16, 1/17, 1/18, 1/19, 1/20, 1/21, 1/25, 1/30, 1/35, 1/40, 1/50
p′2        1/6, 1/7, 1/8, 1/9, 1/10, 1/11, 1/12, 1/13, 1/14, 1/15, 1/16, 1/20, 1/25, 1/30
quantile   50, 60, 70, 80, 100, 120, 140, 160

Table 4.8: Optimal proposal: parameter settings

For the 2-node stretch, due to the small number of nodes, a grid search over the set of parameters p′1 and p′2 can be done in order to identify the pair of parameters resulting in the lowest variance and, hopefully, the same stable estimate with good weights. The grid search requires 201.6 M samples. Even using the simplest search method, we would like at least to utilise the existing technology in order to perform the experiment within a reasonable timeframe. The simulations were run in Google Colab, which provides GPU acceleration and allows the simulation to be completed in a shorter time.
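A grid search of this kind can be sketched as follows (with a far smaller grid and sample budget than the experiment above; all names and default values are illustrative assumptions):

```python
import math
import random
import statistics

def rsd_of_is_tail_estimate(p1, p2, a1, pp1, pp2, q, runs=10, R=10_000, seed=0):
    """RSD over repeated IS estimates of P[X > q], with X = max(a1, S1) + S2."""
    rng = random.Random(seed)
    geom = lambda p: int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1
    pmf = lambda p, n: p * (1.0 - p) ** (n - 1)
    estimates = []
    for _ in range(runs):
        acc = 0.0
        for _ in range(R):
            s1, s2 = geom(pp1), geom(pp2)
            if max(a1, s1) + s2 > q:       # weighted indicator of the tail event
                acc += pmf(p1, s1) * pmf(p2, s2) / (pmf(pp1, s1) * pmf(pp2, s2))
        estimates.append(acc / R)
    mean = statistics.mean(estimates)
    return statistics.stdev(estimates) / mean if mean > 0 else float('inf')

def grid_search_proposal(p1, p2, a1, q, grid1, grid2):
    """Pick the proposal pair (p1', p2') minimising the RSD over a small grid."""
    return min(((g1, g2) for g1 in grid1 for g2 in grid2),
               key=lambda g: rsd_of_is_tail_estimate(p1, p2, a1, g[0], g[1], q))
```

For the full parameter grid of Table 4.8, one would loop over all pairs and all quantiles; the sketch keeps the grid tiny so it runs in seconds rather than requiring GPU acceleration.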

Figure 4.10: 2 node stretch: RSD vs. quantile

Once the simulation results are available, the optimal parameters resulting in the minimum RSD are found for every quantile. In Figure 4.10, the optimal pair of parameters for each quantile is taken and its performance for the other quantiles is shown in logarithmic scale. For example, the red line corresponds to the pair of parameters p′1 and p′2 which is optimal for the quantile of 80. From the graph, one can see that indeed for the quantile of 80, which is shown on the x-axis, the red dot is the lowest dot, corresponding to the lowest value along the y-axis, which is the RSD. Similar observations can be made for the optimal pairs of the other quantiles. The lines for quantiles 50, 60 and 70 coincide, as the same pair of parameters happens to be optimal for all of these quantile values, and this can also be verified from Figure 4.11. For these particular parameter values, the RSD is not defined for quantiles higher than the one the pair is optimal for. One can see that the same red line does not have values for quantiles above 80, where the red line is the lowest. It should be noted that all of the lines are decreasing. However, it was expected that for the optimal


pairs of the low quantiles, the variance would be higher for the same pair of parameters but at the higher quantiles. That was expected because fewer samples are drawn from the higher regions, where even a little deviation in these samples from simulation to simulation would result in a 'jumping' estimate and therefore higher variance. It is not observed for the pairs of parameters shown in Figure 4.10, but it does occur and can be observed in the contour plot for quantile 160 in Figure 4.18. The red dot on the contour plot represents the optimal pair of parameters. On the contour plots for the higher quantiles, the green dots show the pairs of parameters giving an undefined RSD or, alternatively, a variance of zero. Having a variance of exactly zero is infeasible; what happens there is that, for each of the 10 simulation runs, estimates of different magnitudes are summed and their average value results in zero due to the specifics of how the programming language handles such a computation. It can be noticed from Figure 4.18 that the green dots are on the other side of the high-variance region, and when the contour plot is drawn on a slightly different scale in order to include the zero values, the distortion becomes visible, as depicted in Figure 4.19. The crack occurs in the region where the undefined-RSD pairs (green dots) are plotted in Figure 4.18, and it supports the hypothesis that indeed for low values of the parameters the variance for the higher quantiles is high, but due to the specifics of the computation it is zero or undefined. This is an illustration of the 'odd' weights that can blow up the estimate.

Figure 4.11: 2 node stretch: Parameter value vs. quantile

The plot in Figure 4.11 shows the optimal parameter values p′∗1 and p′∗2 for different quantile values. It is remarkable that only one parameter tends to increase while the other deviates only slightly. Both parameters have optimal values lower than expected. One possible explanation is that for higher parameter values, for example p1 = 50, higher values of overtime are obtained, while the true distribution has zero probability for those values and gives them zero weight. Alternatively, if the upper bound is set as in Algorithm 6, then many values are filtered out, leaving far fewer samples in the quantile region and leading to a higher variance.

Below, the contour plots are shown for different quantile values. The contour plots help to


find the border between the parameter values giving a lower RSD and the ones giving a relatively higher RSD. For example, for the higher quantiles of 120, 140 and 160, there is a region marked with green dots where the RSD is undefined. One can see on those contour plots that the optimal pairs, marked as red dots, are quite close to those regions. This means they might result in a low RSD by chance in one simulation run and in an undefined value in the next. Figure 4.19 shows the contour plot of variance versus parameter values. In that plot the border is quite distinct, which makes it possible to filter the parameter values first and then find the minimum among the remaining subset of parameters. For example, the minimum-RSD parameters should only be chosen after looking at the variance contour plot, because simply searching for the minimum value in the dataset could return parameters that happen to have a low variance by accident while still being located on the left side of the high-variance border in Figure 4.19.
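The two-stage selection described above (filter by the variance border first, then minimise the RSD) can be sketched as follows. This is a hypothetical helper, not the exact procedure used in the experiments: the cutoff below is a crude stand-in for reading the border off the variance contour plot.

```python
import statistics

def select_parameters(grid):
    """Two-stage selection sketch: 'grid' maps a parameter pair to a
    (rsd, variance) tuple, where rsd may be None when it is undefined.
    First drop pairs with an undefined RSD or a variance above a cutoff
    separating the high-variance region, then minimise RSD on the rest."""
    defined = {p: (r, v) for p, (r, v) in grid.items() if r is not None}
    if not defined:
        return None
    # heuristic cutoff standing in for the border visible in the variance plot
    cutoff = statistics.median(v for _, v in defined.values())
    safe = {p: r for p, (r, v) in defined.items() if v <= cutoff}
    return min(safe, key=safe.get) if safe else None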

Figure 4.12: Quantile 50: Prob[(WK+1 > 50)] Figure 4.13: Quantile 60

Figure 4.14: Quantile 70 Figure 4.15: Quantile 80

One can notice that the region with higher variance is larger for the higher quantiles andcovers the area of the low values of parameter 1/p′1.

34

Page 44: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

4.2. Importance Sampling Chapter 4. Simulation approach

Figure 4.16: Quantile 120 Figure 4.17: Quantile 140

Figure 4.18: Quantile 160 Figure 4.19: Quantile 160, relative variance

Thus deciding on the proposal parameters in IS simulation is not a trivial task. Theexample of IS given in the beginning of the section in Figure 4.9 uses the upper bound onovertime of 200 and the chosen proposal parameters result in a good estimate matchingwith classical Monte Carlo. However, the lower values of the proposal parameters result inin-between estimate (lower or higher than Monte Carlo). One can use the variance/RSDanalysis discussed in this section to explain that poor performance with different param-eter pairs.

Transforming straightly the same approach to the problem with more variables is not pos-sible. In [7], the application of IS to different multivariate models is considered, however,the models are restricted to the sums of random variables and mainly linear relations ofthe variables. In the current case the relation is expressed as a max function, and thejoint distribution happens to be the max of zero and multiple max functions one insidethe other. Through experimentation with the stretches with more nodes, the IS happensto perform poorly in the higher-dimensional space and it is also discussed explicitly in [2].

The importance sampling is a tool, which was used in order to achieve the more ac-curate estimates of the tail and therefore, to improve the fitted polynomial and reduce

35

Page 45: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

4.2. Importance Sampling Chapter 4. Simulation approach

the discrepancy between the analytic and simulated results. The implementation of theimportance sampling is challenging even in 2 dimensions, the choice of the proposal distri-butions and the maximum overtime value have to be taken care of and the final choice ispossible due to the Monte Carlo benchmark. As the dimensions gets higher and multipledistributions need to be tilted the more sophisticated samplers need to be considered.

36

Page 46: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Chapter 5

Practical example

In this chapter, two basic structures, a stretch and a join, are to be combined within a sam-ple project scheduling network. In line with the previous chapter, the same performancebenchmark, the results of Monte Carlo simulation, is used.

The sample project schedule dependency graph to be simulated is depicted in Figure 5.1.It is visually split into stretches as well as into blocks, each of them representing a joinstructure. The target is to estimate the distribution of waiting time of a dummy activity’16’ shown as a dashed circle.

Figure 5.1: A sample project schedule network

Below are the parameter setting for the experiment, which were chosen randomly withthe condition that each activity has a unique parameter pk and the difference betweenparameters is not too large, thus it was bounded to be drawn from the interval [1/20, 1/2].The planned duration of activities corresponds to the expected value of the service timeso that ak =E[Sk].

37

Page 47: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Chapter 5. Practical example

k pk a k pk a1 1/14 14 9 1/4 42 1/18 18 10 1/16 163 1/11 11 11 1/17 174 1/8 8 12 1/15 155 1/7 7 13 1/16 166 1/6 6 14 1/13 137 1/3 3 15 1/10 108 1/5 5 16 1/2 2

Table 5.1: Parameter settings

1M Monte Carlo AnalyticSlope -0.050135 -0.057158

Intercept -2.703773 -2.706577

Table 5.2: Simulation results

In Figure 5.2 and Figure 5.3 the results for the ’block 1’, containing a join node 9, and forthe ’block 2’, containing a join node 15, are shown. Both plots are in line with what wasobserved in Chapter 4 and the analytic fit performs as a tangent to simulation results.

Figure 5.2: Result for block 1 in Fig. 5.1 Figure 5.3: Result for block 1 in Fig. 5.1

The result for the complete sample network is shown in Figure 5.4 and the numericalresults of the experiment are summarized in Table 5.1. The analytic fit does no longer actas a tangent and visually two lines, computed analytically and obtained via simulation,seem to diverge. However, one should pay attention to the numerical values, which happento be very close to each other this time. The visual difference between the lines is due tothe little difference between two slope values and two intercept values, which scales up asn→∞. One should notice that indeed the x-axis, overtime n, reaches larger values thanin case of the short stretch or the simpler network structure.

38

Page 48: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Chapter 5. Practical example

Figure 5.4: Result for complete sample network

Visually it can be read that up to approximately 50 units of overtime, the deviation isvery small and increases afterwards. The results obtained from simulation of the sampleschedule network are yet open to interpretation as the expectation was to get the analyticapproximation, which acts as a tangent to simulation results as it does in case of ’block1’ and ’block 2’, the components that compose the network. Although the numericalvalues are very close and it has been initially set as a measure of the performance for theexperiment, the results are subject to further investigation and possible re-considerationof the analytic results.

39

Page 49: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Chapter 6

Conclusion

The current research has approached the prediction of project overtime from the complexasymptotics perspective. Several strong limitations were put on the problem settingsuch as only the generating functions with singularity in a form of a dominant pole oforder 1, or a simple pole, were considered and no coinciding activity durations wereallowed. Thus, the activities duration are assumed to have a distribution from a familyexpressed as rational or meromorphic function and, which has a geometric tail, plottingwhich in a logarithmic scale is a line. A stretch and a join, the dependencies structuresin the network, were studied and combined into the sample schedule consisting of thatspecific task dependencies only. For the simulation purposes the geometric distributionwas selected for each activity duration and the derived analytically results were comparedagainst the brute-force Monte Carlo simulation. To achieve the analytic approximation ofthe project overtime, the recursive computation is required: evaluation of the singularityand the residue at every node of the network has to be done. However, the efficientcomputation is possible due to the fact that the recursive values for the current nodehave to be evaluated only for the current and future possible project duration outcomesmeaning that for the every next node less evaluations of the functions is needed.

The results obtained from analytic approximation do not generally coincide with thesimulation results. The line obtained via analytic computation behaves as a tangentto the simulation results. The slopes values are the same for the results from analyticcomputation and Monte Carlo simulation, but the intercepts are different. Thus theperformance measure of having both similar slopes and intercepts should be reconsidered.For longer stretches, more complex joins and networks, the distribution of overtime is nolonger geometric-like and has some curvature observed from simulation plot in logarithmicscale. This curve cannot be captured by the line and that is why the discrepancy thatregion between analytic and simulation results occurs. However, the convergence of theanalytic line and the simulation line in the tails is expected as n→∞.

Several trends are observed. Firstly, the analytic fit generally decays faster and fixingthe parameter values, the variation in the planned activity duration value influences thedecay. Having reserved more time for some activity than it needs on average, resultsin lower probabilities of overtime on the whole support, implying also the faster decay.Secondly, the analytic fit depends on the location of the most lengthy activity in thenetwork. Moving it towards the end results in the higher probabilities of overtime andslower decay, moving it closer to the start of the project results is lower probabilities as ifthe model assumes the project will have time/resource to mitigate the delays happenedin the beginning. That observations are in line with the formulae developed in this

40

Page 50: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

6.1. Practical interpretation Chapter 6. Conclusion

dessertation. The difference between simulation and analytic is less significant for thesmall values of overtime, the exact margin can be determined, and it gets higher towardsthe extreme values, where the little discrepancy in the line parameters scales up. Theapproximation is fast to compute comparing to the sampling methods.

6.1 Practical interpretationApproximating the coefficients of the function around singularity, one gets the necessaryparameters to draw a line, which is defined for every point of the project overtime. Thus,the complex asymptotics property saying, that the behaviour around the point laying faraway from the point of interest can explain the behaviour around the point of interest,comes into play. Targeting to approximate the probability of extreme values of overtime,the probability of any overtime is available to be used for any project planning purposes.

However, the main aspect of attention is whether the estimate is extra conservative orwhether it tends to underestimate and incur risks. The aspect of risk can be a subjectof a separate study, however, the experimentation with various schedule networks showthat the analytic approximation gives higher, or equal to Monte Carlo, probabilities forthe first quarter of overtime values and underestimates them further (decays faster). Thefirst quarter of overtime values can be considered as practically useful for project overtimeestimation values as the project manager is likely to be interested to know the delay ofthe project by certain percent of the planned makespan. For that values of overtime, theanalytic result is conservative and will give higher, too cautious probability values.

Additionally, the analytic approximation allows to re-evalute the makespan overtime atany stage of the project. Imagine a situation, when a project has been launched andfew activities have been already completed so that their real-life duration are in hand.Therefore, they can be plugged as a new parameter of the service time distribution ofthat activity and a new overtime probabilities can be received.

6.2 Further researchThe choice of probability distribution, which reflects the nature of activity, its complexityand tendency to be delayed, stays to be an issue and a subject of various studies. Thecurrent research can be extended to have the activity duration expressed with distributionsof different families, for example, of light-tailed and heavy-tailed distributions. Thatdistributions have other types of singularities, which have to be considered in light of thesame research goal of getting the estimate of project overtime.

The research can be extended to the cases of at least 2 different activities having the poleof the same magnitude. The behaviour is no longer linear and the fluctuations start tooccur. This effect can be studied for more than 2 identical poles in the network. Theresearch can be extended to study the network structure of the schedule network calleda fork, the challenges of which have been introduced in the report. The future researchcan also extend the simulation and interpretation of the current results and identify if theanalytic results can be applied to real projects, whether the current results incur risk forthe project by being extra conservative and whether the better benchmark than a fittedpolynomial can be considered.

41

Page 51: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Bibliography

[1] 3Blue1Brown. Visualizing the riemann hypothesis and analytic continuation.

[2] Li Bo, T. Bengtsson, and T. Bickel. Curse-of-dimensionality revisited: Collapse ofimportance sampling in very large scale systems. Semantic Scholar, 2005.

[3] M.F. Bugallo, V Elvira, and L. Martino. Adaptive importance sampling: the past,the present, and the future. IEEE Signal Processing Magazine, 2017.

[4] S. De Vuyst, H. Bruneel, and D. Fiems. Computationally efficient evaluation ofappointment schedules in health care. European Journal of Operational Research,237(3):1142–1154, 2014.

[5] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press,2008.

[6] M. Haidu and S. Isaac. Sixty years of project planning: history and future. DeGruyter, 2016.

[7] R. Srinivasan. Importance sampling – Applications in communications and detection.Springer, 2002.

[8] M. Vanhoucke. Project Management with Dynamic Scheduling. Springer - VerlagBerlin Heidelberg, 2012.

42

Page 52: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

Appendices

A Derivation of θB (2.16)The auxiliary variable B = S + W can be written in terms of probability generatingfunctions as B(z) = S(z)W (z). S and W are independent. The generating function B(z):

B(z) = E[zB] =n=∞∑

n=0

b(n)zn = E[zS+W ] = E[zS]E[zW ],

S(z) = E[zS] and W (z) = E[zW ] and B(z) = S(z)W (z)

θB is defined for two different cases as shown below, noting that:

Reslimz→ζWW (z) = limz→ζW

(z − ζW )W (z) = θW

Reslimz→ζSS(z) = limz→ζS

(z − ζS)S(z) = θS

Case I: ζW < ζSθB = limz→ζW (z − ζW )B(z)θB = limz→ζW (z − ζW )W (z)S(z)θB = S(ζW ) limz→ζW (z − ζW )W (z)θB = S(ζW ) limz→ζW (z − ζW )θWθB = S(ζW )θW

Case II: ζS < ζWθB = limz→ζW (z − ζS)B(z)θB = limz→ζW (z − ζS)W (z)S(z)θB = W (ζS) limz→ζS(z − ζS)S(z)θB = W (ζS) limz→ζS(z − ζS)θSθB = W (ζS)θS

43

Page 53: A technique for project management in case of …...for m thstretch, J join node is calculated and the stretch resulting in minimum singularity is picked to calculate the coefcient

B. Derivation W (z) and θW in (2.17) and (2.18) Bibliography

B Derivation W (z) and θW in (2.17) and (2.18)

W = max(0, B − a)

W + a = max(a,B)

zW+a = zmax(a,B)

E[zW+a] = E[zmax(a,B)]

za · E[zW ] = E[zmax(a,B)]

za · E[zW ] =+∞∑

n=a

zn · Prob[max(a,B) = n]

za ·W (z) = za · Prob[max(a,B) = a] ++∞∑

n=a+1

zn · Prob[max(a,B) = n]

= zaa∑

n=0

b(n) ++∞∑

n=a+1

zn · b(n) +a∑

n=0

zn · b(n)−a∑

n=0

zn · b(n)

= B(z) +a∑

n=0

b(n) · (za − zn)

In the above the following is noted:

za · Prob[max(a,B) = a] is defined for B ∈ [0, a] and∑+∞

n=a+1 zn · Prob[max(a,B) = n],

is defined for B ∈ [a + 1,∞], where Prob[max(a,B) = n] = b(n). Recalling that B(z) =∑+∞n=0 z

n · b(n), it is defined as the sum of the two above components.

From the above, W has the same singularity as B. The residue θW is derived below:

θW = ResζWW (z) = limz→ζB

(z − ζB)W (z)

= limz→ζB

(z − ζB)B(z)

za+ lim

z→ζB(z − ζB)

∑an=0 b(n) · (za − zn)

za

=θBζaB

C Geometric DistributionLet Prob[A = n] = p(1− p)n, n ≥ 0 then E[A] = 1−p

pand A(z) = p

1−(1−p)z , which has onepole ζA = 1

1−p . If A > 0, then E[A] = 1pand A(z) = pz

1−(1−p)z , so ζA = 11−p and

θA = limz→ζA

(z − ζA)A(z) = limz→ 1

1−p

(z − 1

1− p)pz

1− (1− p)z =−p

(1− p)2

44