distribution of the optimal value of a stochastic mixed
TRANSCRIPT
Distribution of the Optimal Value of a Stochastic Mixed
Zero-One Linear Optimization Problem under Objective
Uncertainty
Karthik Natarajan∗ Chung-Piaw Teo† Zhichao Zheng‡
June 20, 2011
Abstract
This paper is motivated by the question to approximate the distribution of the completion
time of a project network with random activity durations. In general, we consider the
mixed zero-one linear optimization problem under objective uncertainty, and develop an
approach to approximate the distribution of its optimal value when the random objective
coefficients follow a multivariate normal distribution. Linking our model to the classical
Stein’s Identity, we show that the best normal approximation of the random optimal value,
under the L2-norm, can be computed by solving the persistency problem, first introduced
by Bertsimas et al. (2006). We further extend our method to the minimum quadratic
regret problem, and show that for any general mixed zero-one linear optimization problem,
the minimum quadratic regret solution can be computed by solving a related persistency
problem. Extensive computational studies on are presented to demonstrate the advantages
of the new method.
Key words: stochastic mixed zero-one linear optimization; persistency; distribution approx-
imation; regret; completely positive programming; project management; portfolio selection
1 Introduction
One of the fundamental problems in project management is to identify the project completion
time when the activity durations are random. It is well known that any project can be rep-
resented as a directed acyclic graph. In this paper, we adopt the conventional activity-on-arc
representation of the project network, where activities are represented by arcs and nodes rep-
resent the milestones that indicate the starting or ending of the activities. The length of an
∗Department of Management Sciences, City University of Hong Kong, Hong Kong. Email: [email protected]
†Department of Decision Sciences, NUS Business School, National University of Singapore. Email: [email protected]
‡Department of Decision Sciences, NUS Business School, National University of Singapore. Email:[email protected]
1
arc is the duration of the activity represented by that arc. Hence, if all the activities have
deterministic durations, finding the project completion time is as easy as finding the longest
path in a corresponding directed acyclic graph, which can be solved as a linear programming
(LP) problem. However, when the activity durations are stochastic, the analysis of the random
project completion time becomes nontrivial.
It has long been the interest of both researchers and practitioners to estimate the distribution
of the project completion time. Over the past few decades, various ways have been invented
to approximate this distribution (cf. Dodin (1985), Cox (1995), etc.). Unfortunately, to the
best of our knowledge, none of these offer a rigorous approach to guarantee the accuracy of
their approximations. In this paper, we partially address this issue under the assumption that
the activity durations follow a multivariate normal distribution, and we construct a normal
distribution that is optimal under L2-norm to approximate the random project completion
time. In fact, our method applies to any general random mixed 0-1 LP problem under objective
uncertainty:
Z (c) := maxx∈P
n∑j=1
cjxj (1)
where c = (c1, . . . , cn)T is the random cost coefficient vector with mean vector µ and covariance
matrix Σ, and P is the domain of the feasible solutions (assumed to be bounded) defined by
P := {x ∈ Rn : aTi x = bi, ∀i = 1, . . . ,m,
xj ∈ {0, 1} , ∀j ∈ B ⊆ {1, . . . , n} ,x ≥ 0 }
In the project management problem, P characterizes the set of paths in the project network,
and cj is the random duration of activity j. Note that there is by now a huge literature on
finding the distribution of Z (c) for various classes of combinatorial optimization problems,
including minimum assignment, spanning tree, and traveling salesman problem (cf. Aldous &
Steele (2003)). These problems are notoriously hard, and often only partial results (asymptotic,
or when c is uniformly distributed) are known. Finding the exact distribution for general mixed
0-1 LP problem appears to be almost impossible.
Under the Critical Path Method (CPM), which is often used in the project management
community, the random project completion time is often estimated by replacing cj with its
expected value µj , i.e., Z (µ) is used to approximate the project completion time. In the classical
Program Evaluation and Review Technique (PERT), this is taken one step further where the
distribution of the project completion time is approximated by∑n
j=1 γj cj =∑n
j=1 γj(cj −µj)+
Z(µ), with
γj =
{1, if arc j is active in the longest path when solving Z(µ),0, otherwise.
Due to the simplicity of the approach, PERT has gained a lot of popularity, and the random
project networks are sometimes also called PERT networks. This has led to a natural estimation
2
problem:
(P) minf,γ
E
Z(c)− f −∑j
γj(cj − µj)
2which is the central question addressed by this paper. Note that we are solving for the best
normal approximation (in L2-norm) to the random project duration, as an affine function of
the individual task activity duration. We explicitly obtain the (approximate) solution to this
optimization problem.
Besides the project management problem, Problem (P) is also related to another class of
interesting problems, which is the minimum quadratic regret solution to mixed 0-1 LP problem.
In the simplest case, we consider a class of portfolio selection problem, where cj represents
the random return of a financial asset j and P characterizes the set of all possible investment
decisions. To differentiate with Problem (P), we change the notation from cj to rj in this case.
In particular, we are interested to solve
min∑nj=1 xj=1,x≥0
E
Z (r)−n∑
j=1
rjxj
2where Z (c) represents the best possible investment return, i.e.,
Z (r) := max∑nj=1 xj=1,x≥0
n∑j=1
rjxj , (2)
We show that finding the optimal portfolio strategy to minimize the quadratic regret also reduces
to solving a related persistency problem.
Bertsimas et al. (2006) introduced the notion of the persistency of a binary decision variable
in Problem (1) as the probability that the variable is active (i.e., takes value of 1) in an optimal
solution to Problem (1). We generalize this concept to include continuous variables as follows:
Definition 1 The persistency of the decision variable xj in Problem (1) is defined as E[xj (c)],
where xj (c) denotes an optimal value of xj as a function of the random vector c. When xj is
a binary variable, E[xj (c)] = P(xj (c) = 1).
Regarding the issue of multiple optima, when c is continuous, the support of c over which
Problem (1) has multiple optimal solutions has measure zero and x (c) is unique almost surely1.
For discretely distributed c with possibly multiple optimal solutions over a support of strictly
positive measure, x (c) is defined to be an optimal solution randomly selected from the set of
optimal solutions at c.
The notion of persistency generalizes “criticality index” in project networks and “choice
probability” in discrete choice models (c.f. Bertsimas et al. (2006), Natarajan et al. (2009),
1In case c is normally distributed, x (c) would be unique almost surely.
3
Mishra et al. (2011)). By persistency problem, we mean the problem of estimating the persis-
tency values.
Outline of this paper: In the next section, we review the related literature. In Sec-
tion 3, we introduce our distribution approximation model, followed by an extension in regret
minimization problem using portfolio management as an example in Section 4. Results from
extensive computational studies are presented and discussed in Section 5. Finally, we provide
some concluding remarks in Section 6.
2 Literature Review
Our problem has a long history, and is related to the classical “distribution problem of stochastic
linear programming” literature (cf. Ewbank et al. (1974), Prekopa (1966) and the references
therein). The distribution of the optimum is often approximated by some cumbersome numerical
methods such as the Cartesian integration method (c.f. Bereanu (1963)). These methods have
been studied under the general framework when the uncertain parameters may appear in the
objective, constraint matrix, or the right hand side of the LP problems. However, the total
number of random variables are very limited due to the numerical methods employed. In the
case of project management problem, finding the distribution of completion time in a PERT
network is still an active area of research with a rich literature (cf. Yao & Chu (2007) and the
references therein). Most of the work in this area has been focused on using some graphical
approaches to reduce the size of the graph and to reduce the complexity of estimating the
distribution of the project completion time (e.g., Dodin (1985)). A substream of the research
tries to find a good normal approximation to the project completion time distribution, but could
only rely their methods on the Central Limit Theorem and some moment estimation methods
(e.g., Cox (1995)). We solve this open problem and shows that the best normal approximation
to the completion time distribution, under L2-norm, can be obtained by solving the related
persistency problem introduced by Bertsimas et al. (2006), and further studied in Natarajan et
al. (2009).
Brown et al. (1997) brought up the issue of persistence and persistent modeling in optimiza-
tion through a series of cases studies. Although the concept of persistence discussed in that
paper is very broad and different from the persistency defined above, they are closely related
through the issue of data uncertainty in optimization, which is also related to robust optimiza-
tion. The authors pointed out that from the perspective of persistence, robust optimization
seeks a baseline solution that will persist as best possible with a number of alternate forecast
revisions. On the other hand, persistency describes the degree of persistence of each individual
decision variable in an optimization problem with data uncertainty. Indeed, we can further
generalize Definition 1 to the persistency of a feasible solution, i.e., the probability that this
feasible solution is optimal. However, it is beyond the scope of this paper and we would like to
leave it for future research.
4
Over the past few years, a substream of research in the field of persistency estimation
yielded in a series of semidefinite programming (SDP) models based on the connection between
the moment cone and the semidefinite cone. A common feature of these models is that they
only assume the knowledge of moment information of the uncertainty rather than the exact
forms of the distribution. Hence, they are also referred as distributionally robust stochastic
programming (DRSP) models.
Bertsimas et al. (2006) introduced arguably the first computational approach to approximate
the persistency by solving a class of SDPs called Marginal Moment Model (MMM) under the
assumption that the random vector c is described only through the marginal moments of each
cj and all the decision variables in Problem (1) is binary. Natarajan et al. (2009) extended
MMM to general mixed-integer LP problems, but their model formulation is based on the
characterization of the convex hull of the binary reformulation which is typically difficult to
derive. Mishra et al. (2011) presented a SDP model named Cross Moment Model (CMM) for c
described by both the marginal and cross moments. The formulation of CMM is based on the
extreme point enumeration of Problem (1). Hence, the size of CMM can easily go exponential
for general LP problems. Inspired by a recent application of conic optimization on mixed 0-1 LP
problems due to Burer (2009), Natarajan et al. (2011) developed a parsimonious convex conic
optimization model without binary variables to estimate the persistency of a general mixed 0-1
LP problem when c is described by both the marginal and cross moments. They referred their
model as Completely Positive Cross Moment Model (CPCMM). In this paper, we mainly exploit
this model to estimate the persistency values. Detailed review of this model will be provided in
Section 5.1.
In a related vein, Vandenberghe et al. (2007) used SDP to obtain a generalized Chebyshev
bound on the probability that a random vector lies within a set defined by several strict quadratic
inequalities. Unfortunately, their bound is not suitable for general persistency estimation (cf.
Natarajan et al. (2011)).
A recent paper by Agrawal et al. (2011) investigated the loss incurred by ignoring correla-
tions in a DRSP model and proposed a new concept called price of correlations (POC). They
showed that POC is bounded from above for a certain class of cost functions, suggesting that
the intuitive approach of assuming independent distributions may actually work well for these
problems. However, independence conditions can be extremely difficult to capture as well. One
of the most famous examples is given by Hagstrom (1988), who showed that computing the
expected value of the longest path in a directed acyclic graph is #P-complete when the arc
lengths are restricted to taking two possible values and independent of each other. Perhaps a
DRSP model with correlation conditions is more tractable. Agrawal et al. (2011) also admitted
that for some cost functions, POC can be particularly large, indicating the need of DRSP mod-
els to capture correlations. Fortunately, CPCMM partially fills this gap, which in turns further
strengthens our approximation method.
5
3 Optimal Normal Approximation to the Distribution
As discussed in the introduction, our main idea is to approximate the distribution of Z (c) by
a normal distribution, W (c) with the following form:
W (c) = f +
n∑j=1
γj (cj − µj) , (3)
where f and γj ’s are adjustable parameters. Note that Equation (3) is sufficient to represent
any normal distribution. The objective is to choose f and γj ’s such that the expected squared
deviation between W (c) and Z (c) is minimized. In particular, we aim to solve:
(P) minf,γ
E
Z (c)− f −n∑
j=1
γj (cj − µj)
2It turns out that the solution to Problem (P) under the normality assumption of c is related to
the concept of persistency in a straightforward manner as shown in the following theorem.
Theorem 1 When c follows a multivariate normal distribution, a solution to Problem (P) is
f∗ = E [Z(c)] , γ∗l = E [xl(c)] , l = 1, . . . , n.
This solution is unique when the covariance matrix Σ is positive definite.
The proof of Theorem 1 utilizes the following surprising covariance identity due to Stein,
and its proof is enclosed in Appendix A for completeness.
Lemma 1 [Stein’s Identity] Let the random vector X = (X1, . . . , Xn)T be multivariate normal-
ly distributed with mean vector µ and covariance matrix Σ. For any function h(x1, . . . , xn) :
Rn → R such that ∂h(x1, . . . , xn)/∂xj exists almost everywhere and E[|∂h(X)/∂xj |] < ∞,
∀j = 1, . . . , n. Denote ∇h(X) = (∂h(X)/∂x1, . . . , ∂h(X)/∂xn)T . Then
Cov(X, h(X)) = ΣE[∇h(X)],
Specifically,
Cov (Xl, h(X1, . . . , Xn)) =n∑
j=1
Cov (Xl, Xj)E
[∂
∂xjh(X1, . . . , Xn)
], ∀l = 1, . . . , n.
Proof of Theorem 1. The necessary and sufficient optimality conditions of Problem (P) are
E
Z (c)− f −n∑
j=1
γj (cj − µj)
= 0, and
6
E
Z (c)− f −n∑
j=1
γj (cj − µj)
(cl − µl)
= 0, ∀l = 1, . . . , n.
Hence, the optimal solution to (P), (f∗,γ∗) should satisfy
f∗ = E [Z (c)] , and
E
[(Z (c)−E [Z (c)]−
n∑l=1
γ∗j (cj − µj)
)(cl − µl)
]= 0, ∀l = 1, . . . , n.
Rearranging the second set of conditions, we get
Cov (cl, Z (c)) =
n∑j=1
γ∗jCov (cl, cj) , ∀l = 1, . . . , n. (4)
Observe that
E
[∂Z(c)
∂cj
]= E
∂
∂cl
n∑j=1
cjxj(c)
= E [xl(c)] , ∀l = 1, . . . , n,
since xj(c)/∂cl exists almost everywhere and equals to zero whenever it exists for all j, l =
1, . . . , n. Thus, we get γ∗l = E [xl(c)] , l = 1, . . . , n as one solution to Equation (4), which is also
unique if the covariance matrix Σ is positive definite. Therefore, the proof is completed.
With Theorem 1, the problem of finding the best normal approximation to the distribution
of Z(c) is transformed into computing the persistency in Problem (1) as well as E[Z(c)]. In the
next section, we present another application of our method in the quadratic regret minimization
problem using the portfolio management problem as an example.
4 Regret in Portfolio Management
Portfolio management problem deals with allocating the budget in various financial assets with
the aim to earn a good return while keeping the risk low. The pioneering work by Markowitz
(1952) proposed an optimization framework to balance the expected return and the associated
risk characterized by variance as follows:
(M) min xTΣxs.t. µTx ≥ τ
Ax = bx ≥ 0
where µ and Σ are the mean and covariance matrix of the random return vector r, respective-
ly, and τ is the target expected return. µ and Σ are assumed to be known. The constraints
Ax = b and x ≥ 0 describe the restrictions on portfolio selection. Note that the presence of
the constraints x ≥ 0 does not rule out short selling. Short sales of an asset can be modeled
7
by introducing short and long positions in each asset as separate variables and adjusting the
corresponding returns. However, we assume that the short selling is limited through the con-
straints Ax = b and x ≥ 0. The simplest example is∑n
j=1 xj = 1 and x ≥ 0. Next, Markowitz
introduced the notion of efficient portfolio to denote the portfolio with minimal variance for a
given level of target expected return, and continuum of such portfolios forms a mean/variance
efficient frontier.
Based on the foundation paved by Markowitz, the modern portfolio theory flourished. A
substream of the research has gained a lot of attention due to the fact that investors tend to
evaluate their choices against some target return distributions, i.e., benchmark returns. Such
models are generally referred as tracking models (cf. Dembo & King (1992)). The investor
incurs a cost when the portfolio return falls below the benchmark. This cost is also called
“regret” if the benchmark is the investment return under the assumption that the investor has
the ability to predict future asset returns exactly. Precisely, regret of an investment strategy x
is defined as
Regret(x, r) := Z (r)−n∑
j=1
rjxj .
where Z (r) represents the best possible investment return, i.e.,
Z (r) := maxAx=b,x≥0
n∑j=1
rjxj , (5)
However, if the portfolio is selected by simply minimizing the expected regret, the result is to
risk all the budget on the asset with the highest expected return (and short on the lowest one
as much as possible). A common approach is to minimize the quadratic regret. In other words,
we try to find a portfolio with a return distribution that best approximates the ideal return
distribution in the sense of expected squared deviation, i.e.,
(R) minAx=b,x≥0
E
Z (r)−n∑
j=1
rjxj
2Unlike Problem (P), we lose one degree of freedom in this distribution approximation problem,
i.e., in Problem (R), there is no equivalence of f as in Problem (P).
Interestingly, the regret minimization approach is related to another stream of online port-
folio selection research based on aggregate return maximization and the concept of universal
portfolio introduced by Cover (1991) (cf. Helmbold et al. (1998)). These portfolio models
usually apply the logarithm transform on the aggregate return and then use the Taylor series
expansion to approximate the objective function, e.g., E[log(1 + rTx)] u E[rTx − (rTx)2/2],
provided the return is sufficiently small. Then maximizing the approximated aggregate return
is the same as minimizing E[(1− rTx)2], i.e., using 1 as the benchmark when the return is suffi-
ciently small. This works well except when the return is high, so we replace the benchmark from
1 to the theoretical optimal return. Another advantage of using the best return distribution as
8
the benchmark is the ease of defining the cost function. Because regret is always nonnegative,
we do not need an asymmetric cost function to penalize the downside risk only, which is more
difficult to optimize (cf. King & Jensen (1992)).
Theorem 2 Under the assumption that r follows a multivariate normal distribution, Problem
(R) can be solved as a convex quadratic programming problem as follows:
(R′) minAx=b,x≥0
xT(Σ+ µµT
)x− 2 (ΣE[x(r)] +E[Z(r)]µ)T x
where E[x (r)] is the persistency value in Problem (5). Furthermore, when Σ+µµT is positive
definite, this problem can be solved as a second-order cone programming (SCOP) problem.
Proof. This theorem can be proved by expanding the objective function in Problem (R) and
then applying Stein’s Identity. Since it is a straightforward algebraic manipulation, we omit the
details.
Therefore, the problem of finding the minimum quadratic regret portfolio can also be solved
by computing the persistency values in Problem (5) as well as E[Z(r)]. Note that the regret
minimization model (i.e., Problem (R)) presented in this section is applicable to any general
problem of finding minimum quadratic regret solution to a (discrete) optimization problem, e.g.,
least squared regret shortest path (cf. Aissi et al. (2009)). Then our result says such problem
can be reduced to solving a class of (mixed integer) convex quadratic optimization problem.
It would be interesting to compare the formulation of Problem (R) with the Markowitz mod-
el. Rather than (M), it has been shown by King & Jensen (1992) that the efficient mean/variance
frontier can be drawn by solving the following special case of a tracking problem:
(M′) minAx=b,x≥0
E
n∑j=1
rjxj − t
2as the parameter t varies between certain bounds depending on the value of the Lagrange
multipliers for the target expected return constraints at the extremes of the frontier. They also
showed that this problem is equivalent to the following problem:
(M′′) minAx=b,x≥0
xTΣx− 2λµTx
as the parameter λ varies from 0 to +∞. Interestingly, despite the parameter adjusted according
to the target expected return, the objective coefficients in (M′′), Σ and µ are replaced by
E[rrT
]and E
[Z (r) rT
]in (R), respectively. Hopefully, the increased difficulty of (R) due
to the requirement of estimating E [Z (r)] and E[Z (r) rT
]may pay out as a better portfolio
selection strategy. To demonstrate of the difference in the portfolio construction, we present
the analytical solutions to Problem (R) and (M) when there are only two assets available to
construct the portfolio in the following example.
9
Example 1 Suppose that there are two financial assets with random return r1 and r2. More-
over, E [rj ] = µj, V ar (rj) = σ2j , j = 1, 2, and Cov(r1, r2) = σ12 are known. Assume that the
only constraint encoded by Ax = b is x1 + x2 = 1. Then the KKT conditions of Problem (R)
are
2E[r21]x1 + 2E [r1r2]x2 + α− 2E [Z (r) r1]− β1 = 0
2E [r2r1]x1 + 2E[r22]x2 + α− 2E [Z (r) r2]− β2 = 0
x1 + x2 = 1x1β1 = 0x2β2 = 0x1, x2, β1, β2 ≥ 0
Note that in this case Z (r) = max {r1, r2} = r2+(r1 − r2)+, where (y)+ := max {y, 0}. Hence,
with a little algebra, the solution to the above system of equations can be expressed asx1 =
E[(r1−r2)2I{r1≥r2}]
E[(r1−r2)2]
=E[(r1−r2)
2|r1≥r2]E[(r1−r2)
2]P (r1 ≥ r2)
x2 =E[(r1−r2)
2I{r1<r2}]E[(r1−r2)
2]=
E[(r1−r2)2|r1<r2]
E[(r1−r2)2]
P (r1 < r2)
where I denotes the indicator function. As discussed before, the solution in this case is not
simply the persistency values, i.e., P (r1 ≥ r2) and P (r1 < r2), but rather the persistency tilted
by some factors related to the squared deviation of the two random returns. Please note that we
do not involve Stein’s Identity in this example due to the simplicity of the problem.
On the other hand, in the Markowitz model (M), let the expected return of the minimum
variance portfolio computed from (M) without the expected return constraint be τ∗. It can be
verified that as long as the target return τ ≥ τ∗ and the covariance matrix Σ is positive definite,
the expected return constraint would be tight. Although one can justify this claim through the
convex analysis, we offer an alternative argument based on the KKT conditions of Problem (M):
2σ21x1 + 2σ12x2 − µ1θ + α− β1 = 0
2σ22x2 + 2σ12x1 − µ2θ + α− β2 = 0
x1 + x2 = 1θ (τ − µ1x1 − µ2x2) = 0τ − µ1x1 − µ2x2 ≤ 0x1β1 = 0x2β2 = 0x1, x2, β1, β2 ≥ 0
Assume that the constraint µ1x1 +µ2x2 ≥ τ is not tight, i.e. τ −µ1x1 −µ2x2 < 0. Then θ = 0,
and we can obtain x1 =σ22−σ12
σ21−2σ12+σ2
2=
σ22−σ12
V ar(r1−r2)
x2 =σ21−σ12
σ21−2σ12+σ2
2=
σ21−σ12
V ar(r1−r2)
if σ21 ≥ σ12 and σ2
2 ≥ σ12{x1 = 0x2 = 1
if σ22 < σ12{
x1 = 1x2 = 0
if σ21 < σ12
which exactly resembles the minimum variance portfolio, i.e., the solution to the following prob-
10
lem:min σ2
1x21 + 2σ12x1x2 + σ2
2x22
s.t. x1 + x2 = 1x1, x2 ≥ 0
Then the condition τ − µ1x1 − µ2x2 < 0 is just τ < τ∗. Note that this solution makes sense
only if
V ar (r1 − r2) = 0. (6)
In another situation when the constraint µ1x1 + µ2x2 ≥ τ is tight, together with the budget
condition x1 + x2 = 1, we can easily find the optimal portfolio:{x1 =
τ−µ2
µ1−µ2
x2 =µ2−τµ1−µ2
which is valid under the assumption that
τ ∈ [µ1, µ2] or [µ2, µ1] , and µ1 = µ2. (7)
Those situations when some of the conditions in (6) and (7) do not hold correspond to the
extreme cases that are trivial to analyze. For example, V ar (r1 − r2) = 0 implies the two
assets are perfectly correlated and their variances are the same, i.e., σ12 = σ1σ2 and σ1 = σ2.
Consequently, the covariance matrix is not positive definite. In this case, the variance of the
portfolio would be constant and equal to σ21. Then the only concern is to meet the target return
requirement.
One salient feature of the Markowitz model is the separation on the treatment of mean and
variance of the portfolios. Especially in this example with only two assets, the target expected
return constraint and the budget constraint are enough to determine the optimal portfolio under
certain situations as discussed above. Whereas, the quadratic regret approach is able to capture
more distributional information of the random returns through an integrated framework. The
detailed performance of these two models will be analyzed later in Section 5.3.
5 Computational Study
In this section, we first review some approaches for persistency estimation and then demonstrate
the performance of our approximation model through two classes of problems, i.e., project and
portfolio management problems. Simulation is conducted and serves as a benchmark in all the
experiments, which is possible because the deterministic versions of Problem (1) considered in
this section are all polynomial time solvable. If the deterministic problem is NP-hard, imple-
menting a simulation method would require the solution to a number of NP-hard problems,
which can be very resource consuming.
11
5.1 Persistency Models
As analyzed in the previous section, the success of our approximation method hinges on the
accuracy of the estimation on the expected objective value of the random optimization problem
as well as the persistency values.
In literature, the problem of estimating the expected objective value of random optimization
problems has been studied for a long time. In case of the project management problem, the
search for the expected project completion time started half century ago (cf. Fulkerson (1962))
and is still an active research topic (cf. Yao & Chu (2007)). On the other hand, the persistency
problem has only been brought into the optimization area since Bertsimas et al. (2006). As
reviewed before, there have been several models developed for the purpose of estimating the per-
sistency values (cf. Natarajan et al. (2009), Mishra et al. (2011), Natarajan et al. (2011), Kong
et al. (2011) etc.). In this section, we will review the most recent progress on the persistency
estimation mainly contributed by Natarajan et al. (2011).
Natarajan et al. (2011) consider the following stochastic optimization problem:
ZP := supc∼(µ,Σ)+
E [Z(c)] ,
where c ∼ (µ,Σ)+ means that the set of distributions of the random coefficient vector c (as-
sumed to be nonempty) is defined by the nonnegative support Rn+, finite mean vector µ and
finite second moment matrix Σ, i.e., c ∈ {X : E[X] = µ,E[XXT ] = Σ,P(X ≥ 0) = 1}. Theyproved that ZP can be solved as the following convex conic optimization problem:
ZC = max∑n
j=1 Yj,js.t. aT
i Xai − 2biaTi x+ b2i = 0, ∀i = 1, . . . ,m
Xj,j = xj , ∀j ∈ B 1 µT xT
µ Σ Y T
x Y X
≽cp 0
(i.e., ZP = ZC) where the decision variables are x ∈ Rn, X ∈ Rn×n, and Y ∈ Rn×n, and for
a matrix A ∈ Rn×n, A ≽cp 0 means that A lies in the cone of completely positive matrices of
dimension n defined as
CPn :={A ∈ Rn×n | ∃V ∈ Rn×k
+ , such that A = V V T}.
The linear program over the convex cone of the completely positive matrices is called a com-
pletely positive program (CPP), and ZC is a typical CPP. The authors named this model as
CPCMM, which stands for Completely Positive Cross Moment Model. In a subsequent paper,
Kong et al. (2011) re-developed CPCMM from a different perspective based on the general conic
optimization theory, which significantly generalizes CPCMM such that more support constraints
can be incorporated, e.g., some ellipsoid bounding constraints on the random vector.
In the formulation of ZC , the variables x, Y and X attempt to encode the information
xj = E [xj(c)], Yi,j = E [cjxi(c)] and Xi,j = E [xi(c)xj(c)]. Thus, through solving ZC , the
12
optimal objective value gives the value of E [Z(c)], and the optimal value of x is simply the
persistency value. However, CPCMM ignores the distributional information. Hence, when
c is normally distributed, CPCMM only gives an upper bound to E [Z(c)] and estimates on
the persistency values. Another issue is that although the completely positive cone is closed,
convex and pointed, it is widely believed that CPPs are NP-hard to solve. Fortunately, there
are various hierarchies of tractable approximations for the completely positive cone (cf. Bomze
et al. (2000), Parrilo (2000) and Klerk et al. (2002) etc.). In the computational study, when
we need to solve CPCMM, we will exploit the simple SDP approximation of the completely
positive constraint, i.e., A ≽cp 0 is relaxed to A ≽ 0 and A ≥ 0, where A ≽ 0 means that A is
positive semidefinite.
Despite all these numerical inaccuracies, we show in the next section that our approximation
method is still practically attractive due to the use of persistency in the approximation and the
ability to capture correlations through CPCMM.
5.2 Project Management Problem
In this section, we present the results for approximating the distribution of the project com-
pletion time when the activity durations are stochastic. We try to compare our approach with
as many existing approximation methods as possible, except some approaches that require the
activity duration distribution to be either discrete or at least represented in some discrete form,
e.g., bounding distribution method by Kleindorfer (1971), and the finite-state discrete approx-
imation with cascading collapse method by Ord (1991)2, etc.
Another critical issue is that almost all the approximated distributions derived using previous
methods do not reside in the same probability space as Z(c), which makes the computation of the
squared deviation impossible. This problem arises since the traditional approaches solely focus
on the distribution (like tail probabilities, etc.) but overlook the approximation error between
the approximated completion time and the true completion time under a specific realization of
the random activity durations. For example, Cox (1995) assumed the project completion time
to be normally distributed at first, and then tried to estimate the moments of the completion
time. Thus the final approximated distribution does not admit a calculation of the squared
deviation from the true project completion time distribution. Hence, we have to resort to other
measures to compare the performance of different approximation methods, e.g., some descriptive
statistics, like mean and standard deviation. In addition, we also deploy the following measure
to quantify the distance between two distributions:
Square Norm Distance (F,G) :=
∫ 1
0
[F−1 (y)−G−1 (y)
]2dy
where F and G are the cumulative distribution functions of two distributions.
2Although the approach developed in Ord (1991) is based on the finite-state discrete characterization of allthe activity durations as well as the completion time of each milestone, it could still be applied to the casewhen the activity durations are continuously distributed. The only concern is to find an appropriate discreteapproximation of each activity duration distribution.
13
On the other hand, PERT approach simply considers the expected duration of each activity
when choosing the critical path (i.e., the path with longest expected completion time), and
then use the mean and variance of this critical path to approximate the mean and variance
of the project completion time, respectively. Finally, resorting to the Central Limit Theorem,
PERT assumes the project completion time is normally distributed (c.f. MacCrimmon & Ryavec
(1964)). Therefore, when the project activities follow the multivariate normal distribution, the
PERT estimation admits the computation of squared deviation and thus can be compared to
our approach in a greater detail.
The most important advantage of our approximation approach using CPCMM is its ability
to capture the correlations among the random variables. To the best of our knowledge, none of
the previous studies address the issues of correlated activities for the project management prob-
lem. However, potential positive correlations among the activity duration may exist in many
project networks. For example, if some troubles are encountered in the delivery of concrete
for a task in a construction project, this problem is likely to influence the duration of numer-
ous activities involving concrete pours in this project. With the following example network,
besides comparing the performance of different approximation methods, we also illustrate the
importance of considering the correlations among activities.
Example 2 The network consists of four nodes and five arcs as shown in Figure 1. All the
activities are independent and normally distributed with mean and variance both equal to one.
Figure 1: Network for Example 2
The network in Example 2 is the “Wheatstone bridge” network from Lindsey (1972) and
later regarded as the “forbidden graph” by Dodin (1985) since it is the basic evidence of graph
irreducibility. This network has been studied in almost every piece of the research work in this
field. Ord (1991) summarized the results for this graph with normally distributed activity du-
rations, and also provided the results from his discrete approximation method with a parameter
k indicating the number of discrete points used to approximate the normal distribution3. All
these results are presented in Table 1, where T denotes the project completion time and σ(.)
3Note that the approximated distributions obtained by Ord (1991) should be a discrete distribution as well.However, we modified his theory a bit in computing the square norm distance by assuming the final approximateddistribution follows a normal distribution with the moments derived from his original procedure.
14
denotes its standard deviation. The new result from our method is also presented in Table 1 un-
der “CPCMM”, since we use CPCMM to find the estimates on the expected project completion
time and persistency of the arcs.
Expected Square NormE[T ] σ(T ) Error on σ(T ) Square Deviation Distance
106 simulation rounds 3.516 1.39 - - -Numerical integration 3.483 1.47 5.76% - 0.017Ord (1991) k = 2 3.261 0.70 49.64% - 0.543
k = 3 3.485 1.04 25.18% - 0.128k = 4 3.525 1.08 22.32% - 0.101k = 5 3.582 1.15 17.27% - 0.068k = 6 3.594 1.15 17.27% - 0.069
Cox (1995) 3.639 1.69 21.58% - 0.116PERT 3.000 1.73 24.46% 0.973 0.395
CPCMM 3.918 1.30 6.47% 0.473 0.176Combo 3.525 1.30 6.47% 0.311 0.015
Table 1: Results for Example
From Table 1, we can see that as predicted, CPCMM gives only an upper bound on the
expected completion time, but it produces the best estimate for the standard deviation except
the numerical integration approach. Regardless of the high accuracy, the integration approach
would be too tedious to be applicable for even medium-size networks. This suggests that using
persistency could be a promising way to estimate the variability in the project completion time.
Recall in our approximation model, the variance is solely determined by the persistency values
(i.e., γj ’s in Equation (3)).
Now we propose a better way to utilize our approximation model. Credited to the flexibility
of the model, we can use E[T ] estimated from the other approach together with the persistency
estimated from CPCMM to form a new approximation using Equation (3). Since there are
much more well established results on estimating the expected project completion time, they
provide us a rich pool of resources to construct better approximations. The last row of Table
1 showcases such an example, named “Combo” distribution, which is a constructed using E[T ]
estimated from “Ord (1991) k = 4” and the persistency estimated from CPCMM. The new
distribution’s square norm distance from the optimal distribution drops even lower than numer-
ical integration approach! Figure 2 plots the density and cumulative distribution functions of
different approximations together with the simulation results.
To demonstrate the importance of having correlated activities, let us consider a simple
variation of Example 2.
Example 3 Consider the same network as Example 2 with the same mean and variance for
each activity, but the activities are correlated with each other. For analysis purpose, consider two
types of randomly generated correlation: one is the general correlation without any restriction,
and the other is the sparse and positive correlation, which is more natural in a project network.
Note that all the previous methods except PERT give the same approximations since they
ignore correlations among activities. The approximated distribution from PERT is simply the
15
−2 0 2 4 6 8 100
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
Project Completion Time
Density of the Project Completion Time and Its Approximations
SimulationPERTCPCMMCombo
−2 0 2 4 6 8 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Project Completion Time
Pro
babi
lity
Distribution of the Project Completion Time and Its Approximations
SimulationPERTCPCMMCombo
Figure 2: Distributions for Example 2
distribution of the critical path based on the expected activity completion time, i.e., Path
1 → 2 → 3 → 4.
Table 2 summarizes the results for ten instances of Example 3 with the first five instances
corresponding to the networks with general correlated activities and the last five instances corre-
sponding to the networks with sparsely and positively correlated activities. All the correlation
matrices are randomly generated for analysis. We have conducted this experiment for more
than hundreds of random correlation matrices, and the findings are the same as those discussed
here. Thus, we only report ten instances in this paper for succinctness. The sample size for all
the simulations is 106.
ExpectedApproximation Squared
# Method E[T ] σ(T ) Deviation1 Simulation 3.489 1.454 -
PERT 3.000 1.802 0.882CPCMM 3.889 1.374 0.452
2 Simulation 3.776 1.646 -PERT 3.000 2.239 1.902
CPCMM 4.288 1.502 0.7973 Simulation 3.584 1.533 -
PERT 3.000 1.940 1.162CPCMM 4.006 1.432 0.553
4 Simulation 3.474 1.473 -PERT 3.000 1.819 0.829
CPCMM 3.867 1.397 0.4425 Simulation 3.572 1.556 -
PERT 3.000 1.928 1.130CPCMM 3.994 1.458 0.537
ExpectedApproximation Squared
# Method E[T ] σ(T ) Deviation6 Simulation 3.331 1.813 -
PERT 3 2.154 0.486CPCMM 3.663 1.786 0.288
7 Simulation 3.449 1.774 -PERT 3 2.099 0.782
CPCMM 3.829 1.719 0.4118 Simulation 3.491 1.905 -
PERT 3 2.369 1.029CPCMM 3.844 1.854 0.437
9 Simulation 3.316 1.959 -PERT 3 2.270 0.575
CPCMM 3.530 1.937 0.27210 Simulation 3.630 2.177 0
PERT 3 2.836 1.530CPCMM 4.040 2.117 0.589
(Simulation Results for Uncorrelated Case: E[T ] = 3.516, σ(T ) = 1.390)
Table 2: Results for Example 3
It is observed that both mean and variance of the project completion time can vary a lot from
the independent case, especially variance. The variance estimates from CPCMM again cope
nicely with the true variances. For the sparse and positive correlation case, CPCMM works
even better with an average relative error less than 3%. When the activities are positively
correlated, PERT tends to substantially overestimate the variance of the project completion
16
time, and the expected squared deviation of our method is significantly less than PERT with
a reduction around 50%. Figure 3 showcases the approximated distribution for one instance
randomly picked with the following correlation matrix,1 0.148 0.738 0 0.805
0.148 1 0 0 0.0290.738 0 1 0 0.9800 0 0 1 0
0.805 0 0.980 0 1
,
which is instance #10 in Table 2.
−2 0 2 4 6 8 100
0.05
0.1
0.15
0.2
0.25
0.3
Project Completion Time
Density of the Project Completion Time and Its Approximations
SimulationPERTCPCMM
−2 0 2 4 6 8 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Project Completion Time
Pro
babi
lity
Distribution of the Project Completion Time and Its Approximations
SimulationPERTCPCMM
Figure 3: Distributions for Example 3 Instance #10
To further justify the robustness of our model, we also test our approach on a series of
random project networks of larger sizes, and compare the results with PERT.
Example 4 Consider the random project network generated by the following algorithm.
Algorithm of Random Project Network Generation:
1. Randomly set the number of nodes (n′) in the project network.
2. Construct a zero adjacency matrix. Go through every matrix entry in the upper triangle
(above the diagonal), and replace 0 by 1 if an independent realization of a uniform random
variable U(0, 1) is greater than s, where s ∈ [0, 1]. s can be used to control the density of
the graph. For example, when s = 0, we will get a complete graph. More precisely, after
this step, the random network will have an expected number of arcs E[m′] = s·n′(n′−1)/2,
and each node will have s(n′ − 1) expected number of neighbours. We set s = 0.2 in our
experiments.
3. Remove all the isolated nodes in the network.
4. Create an initial node s. For each node i, add an arc s → i, if node i has no incoming
arcs.
17
5. Create a terminal node t. For each node i, add an arc i → t, if node i has no outgoing
arcs. After this step, the structure of the network is fixed. Denote the number of nodes as
n and the number of arcs as m.
6. For arc i, generate the random arc length with mean µi uniformly drawn between 1 and
10, and standard deviations uniformly drawn from 0 to 0.7µi.
7. Randomly generate a general or sparse and positive correlation matrix for the activities.
The results for twenty random networks are presented in Table 3. The first ten instances are
generated with general random correlation matrices, and the next ten instances are generated
with sparse and positive random correlation matrices. For this example, we also constructed
and analyzed more than one hundred random networks, and all the findings are the same as
those discussed later. Thus, for succinctness we only report twenty examples in this paper. The
sample size for all the simulations is 2× 104.
Expectedn Approximation Squared
# m Method E[T ] σ(T ) Deviation
1 16 Simulation 33.237 4.745 -32 PERT 29.886 5.923 33.410
CPCMM 37.480 3.576 25.879Combo 29.886 3.576 19.137
2 14 Simulation 44.010 5.403 -30 PERT 41.105 6.870 28.445
CPCMM 47.977 4.389 22.977Combo 41.105 4.389 15.157
3 12 Simulation 39.979 6.127 -21 PERT 39.687 6.412 1.854
CPCMM 43.064 5.602 10.987Combo 38.687 5.602 1.748
4 13 Simulation 43.792 7.016 -25 PERT 42.404 8.584 10.534
CPCMM 46.475 6.289 27.129Combo 42.404 6.289 6.827
5 9 Simulation 26.616 3.854 -13 PERT 25.326 4.582 5.881
CPCMM 28.522 3.498 5.797Combo 25.326 3.498 3.733
6 15 Simulation 29.976 4.355 -27 PERT 29.420 5.048 3.481
CPCMM 34.000 3.544 18.276Combo 29.420 3.544 2.755
7 23 Simulation 48.164 6.437 -54 PERT 46.247 7.691 14.772
CPCMM 54.627 5.526 47.342Combo 46.247 5.526 9.028
8 20 Simulation 42.818 6.225 -39 PERT 40.832 7.244 18.317
CPCMM 49.696 4.820 54.846Combo 40.832 4.820 11.809
9 8 Simulation 37.501 6.069 -15 PERT 35.921 6.480 10.961
CPCMM 39.418 5.732 7.697Combo 35.921 5.732 6.750
10 13 Simulation 40.850 6.409 024 PERT 39.188 7.704 16.221
CPCMM 45.180 5.514 25.265Combo 39.188 5.514 9.681
Expectedn Approximation Squared
# m Method E[T ] σ(T ) Deviation
11 16 Simulation 37.146 8.503 -30 PERT 33.551 10.426 32.549
CPCMM 41.982 7.669 31.818
12 13 Simulation 35.275 6.102 -21 PERT 31.748 5.521 33.864
CPCMM 38.866 5.390 19.872
13 17 Simulation 33.663 5.603 -33 PERT 28.109 9.710 78.637
CPCMM 39.405 3.938 46.236
14 9 Simulation 33.163 5.041 -20 PERT 29.595 6.307 30.245
CPCMM 36.247 4.407 14.260
15 18 Simulation 33.553 3.923 -36 PERT 28.131 6.836 60.655
CPCMM 37.833 1.706 28.119
16 14 Simulation 32.591 3.996 -26 PERT 30.214 5.929 19.506
CPCMM 36.153 3.167 17.278
17 16 Simulation 37.777 5.489 -30 PERT 33.551 7.083 46.085
CPCMM 42.861 4.184 36.043
18 11 Simulation 24.558 3.898 -16 PERT 21.409 6.655 27.180
CPCMM 26.754 3.209 9.480
19 13 Simulation 27.946 3.574 -22 PERT 26.1780 4.716 13.833
CPCMM 29.771 2.873 7.205
20 14 Simulation 43.964 5.691 -25 PERT 40.991 7.245 27.092
CPCMM 47.977 4.629 24.708
Table 3: Results of random project networks for Example 4
18
For general random correlation, as the size of the network increases, the mixture of positive
and negative correlations among activities tend to cancel out and PERT produces competitively
good estimate on the expected project completion time for most instances. Meanwhile, for some
graphs with dominant critical paths, PERT gives quite accurate estimates on both mean and
variance of the project completion time. On the other hand, since the approximation error of
CPP gets larger as the size of the problem increases, the upper bounds on the expected project
completion time from CPCMM gets worse. Thus, we observe that sometimes PERT gets a
lower expected squared deviation than CPCMM. Nevertheless, the persistency estimates from
CPCMM still perform very well, so the approximated distribution designed using the mean
from PERT and the persistency from CPCMM (shown as “Combo” in Table 3) tends to give
much lower expected squared deviation. For projects with sparsely and positively correlated
activities, PERT substantially underestimates the mean and overestimates the variance, and
the performance of CPCMM dominates PERT.
5.3 Portfolio Management Problems
In this section, we test the performance of the portfolio selected based on the minimum quadratic
regret criterion using our method developed in Section 4 with the constraint Ax = b specified
by∑n
j=1 xj = 1.
Data
We collect the daily return data of n industry portfolios (n = 10 and 174) over the past twenty
years between 1991 and 2010 from Fama/French Data Library5. The portfolios are constructed
by the following manner:
Each NYSE, AMEX, and NASDAQ stock is assigned to an industry portfolio at the
end of June of year t based on its four-digit SIC code at that time. (The Compustat
SIC codes are used for the fiscal year ending in calendar year t − 1. Whenever
Compustat SIC codes are not available, the CRSP SIC codes are used for June of
year t.) Then the returns are computed from July of t to June of t + 1. All the
stocks are equally weighted in the portfolio.
The mean (AVG), standard deviation (STD) and coefficient of variation (CV) of the daily
returns for each industry portfolio are summarized in Table 4. We also examine the frequency
plots of the daily returns for each industry portfolio and confirm the applicability of the normal
distribution assumption. See Figure 4 for two example frequency plots (on the top row) of
“Food” and “Util” industry portfolio from 17 industry portfolios data6. Although the time
4Similar experiments are conducted for n = 30 and the findings are the same as what will be presented later.We omit the detailed reports for n = 30 case due to the space constraint.
5http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data library.html6The reason for demonstrating these two industry portfolios is that they represent two classes of portfolios in
our data set. The food industry is relatively more stable, while the “Util” is more volatile, which is particularlyreflected on their return fluctuations during the last financial crisis period from 2007 to 2010.
19
Industry NoDur Durbl Manuf Enrgy HiTec Telcm Shops Hlth Utils OtherAVG 0.0870 0.1095 0.0893 0.0845 0.0882 0.0851 0.1015 0.0846 0.0845 0.0724STD 0.8416 1.1158 1.0357 1.4580 1.2462 1.3093 0.9779 1.0892 0.7948 0.8365CV 9.6736 10.1900 11.5980 17.2544 14.1293 15.3854 9.6345 12.8747 9.4059 11.5539
(a) 10 industry portfolios
Industry Food Mines Oil Clths Durbl Chems Cnsum Cnstr Steel FabPrAVG 0.0870 0.1095 0.0893 0.0845 0.0882 0.0851 0.1015 0.0846 0.0714 0.1211STD 0.7621 1.5778 1.4367 1.0422 1.0003 1.1348 1.1088 1.1227 1.4089 2.5584CV 8.7598 14.4091 16.0885 12.3337 11.3413 13.3349 10.9241 13.2707 19.7325 21.1263
Industry Machn Cars Trans Utils Rtail Finan OtherAVG 0.1002 0.0786 0.0809 0.0608 0.0901 0.0898 0.0947STD 1.1929 1.2875 1.0655 0.7948 1.0445 0.8012 1.0298CV 11.9052 16.3804 13.1706 13.0724 11.5927 8.9220 10.8743
(a) 17 industry portfolios
Table 4: Characterisics of the return data
series plots of the daily return data (middle row of Figure 4) suggest that the underlying return
distribution may not be stationary over time, we would like to argue that as long as the observed
history and the investment horizon is long enough, the assumption of unique multivariate normal
return distribution for the portfolios may still hold in our experiment. However, the trouble is
whether the historical information is sufficient or not to predict the true distribution. Thus,
we plot the cumulative distribution functions of the returns for time period from 1991 to 2010
as well as the time period from 1991 to 2001, respectively, as shown in the last row of Figure
4. Despite the exclusion of the financial crisis period (2007 to 2010), the distribution observed
in 1991 to 2001 is quite close to the one with ten more years of observation. Nevertheless, the
analysis above is merely a detailed description of the data, the focus of this experiment is to
access the quality of the investment decisions from different portfolio selection strategies.
Methodology
When the daily return data of n industry portfolios are considered, the investment decision is
to divide the total budget (normalized to 1) into these n portfolios. One practical example of
such problem could be the investment in various funds.
Rolling Horizon Framework:
Assume the past k-year data (k = 5 or 10) immediately before time period t (exclusive of
t) are used to support the portfolio selection at period t and the performance of the investment
decision at period t is recorded (i.e., out-of-sample performance). This framework is rolling
forward from period to period, i.e., the budget allocations are potentially different from period to
period. We assume that there are no transaction costs for changing the portfolio structure. For
each portfolio selection strategy, which will be explained later, the performance is continuously
tracked for 10 years, i.e., from 2001 to 2010.
For example, if we choose to utilize the past 10 year information to support the current
investment decision, then the data from Jan 2nd, 1991 (the first trading day in 1991) to Dec
29th, 2000 (the last trading day in 2000) is the initial history. The investment decision developed
20
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011−15
−10
−5
0
5
10
15
Time
Dai
ly R
etur
n
Daily Return of "Food" Industry Portfolio from 1991 to 2010
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011−15
−10
−5
0
5
10
15
Time
Dai
ly R
etur
n
Daily Return of "Util" Industry Portfolio from 1991 to 2010
−5 −4 −3 −2 −1 0 1 2 3 4 50
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Daily Return
Pro
babi
lity
Cumulative Distribution Function of the Daily Return of "Food" Industry Portfolio
1991 − 20101991 − 2001
−5 −4 −3 −2 −1 0 1 2 3 4 50
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Daily Return
Pro
babi
lity
Cumulative Distribution Function of the Daily Return of "Util" Industry Portfolio
1991 − 20101991 − 2001
(a) Food (b) Other
Figure 4: Distribution analysis of the daily returns of two portfolios picked from 17 industryportfolios
from this information set is then tested on Jan 2nd, 2001, i.e., the return of the investment based
on the actual portfolio returns on Jan 2nd, 2001 is recorded. Such experimental framework will
shift forward day by day with a fixed length of history in terms of the number of days. If we
choose to utilize only the past 5-year information to support the current investment decision,
then the data from Jan 2nd, 1996 to Dec 29th, 2000 would be the initial history.
Portfolio Selection Strategies:
The abundance of the historical return data forms a natural sample of the stochastic port-
folio returns. Hence, Besides of relying on the persistency models, we can also estimate the
persistency from the historical data. The strategies based on both approaches are considered.
The following six different portfolio selection strategies are investigated including our minimum
quadratic regret approach:
1. Expected Return
All the budget is devoted in the portfolio with the highest expected return in history.
21
2. Uniform
At the beginning of every period, the budget is reallocated and divided evenly among the
n portfolios. Since the portfolios are always rebalanced to a fixed proportion, this is also
called a constant-rebalanced portfolio strategy.
3. Markowitz
The Markowitz mean/variance model is used to compute the optimal portfolio, and the
target return is the average of the overall historical returns, i.e., the average of the average
historical returns of each industry portfolio. Effectively, the constraint requires the ex-
pected return to exceed the expected return of the “Uniform” strategy when the historical
returns are used to predict the future returns.
4. Persistency
(a) Empirical: For each portfolio, the percentage of all the trading days that it outper-
forms the rest portfolios is calculated from the historical data, and then the budget is
allocated according to these percentages, i.e., the empirical persistency.
(b) CPCMM: The budget is allocated according to the persistency estimated from CPCM-
M.
5. Conditional Value-at-Risk (CVaR)
CVaR of a random variable X (representing a loss) at a given probability level β ∈ (0, 1)
is defined as the conditional expectation of X exceeding a certain threshold, called Value-
at-Risk(VaR), which is the percentile of X at β, i.e.,
V aRβ
(X)= inf
{x ∈ R : P
(X ≤ x
)≥ β
}, and
CV aRβ
(X)= E
[X∣∣∣X > V aRβ
(X)]
.
As a coherent risk measure, CVaR has gained a lot interest over the past ten years.
Rockafellar & Uryasev (2000) showed that CVaR can be determined by minimizing a
more tractable auxiliary function without predetermining VaR first as follows:
CV aRβ
(X)= min
α∈R
{α+
1
1− βE
[(X − α
)+]}Then the budget allocation can be chosen to minimize CVaR of loss at a given tolerance
level β, i.e.,
(C) minα∈R,
∑nj=1 γj=1,γ≥0
{α+
1
1− βE[(−rTγ − α
)+]}This is a risk averse approach since it aims to minimize the conditional expectation of loss
exceeding VaR and meanwhile with probability 1− β, the loss is below α. Please refer to
Rockafellar & Uryasev (2000) and Rockafellar & Uryasev (2004) for more details on the
development of CVaR, and Zhu & Fukushima (2009) for the most recent review in this
area of research. We tested β = 0.8, 0.9 and 0.95 in the experiments, and since the results
22
are almost the same, only the results corresponds to β = 0.9 are reported.
(a) Empirical: Solve Problem (C) as a stochastic programming problem with the historical
returns as the sample.
(b) CPCMM: Solve Problem (C) using the extensions of CPCMM developed in Natarajan
et al. (2011).
6. Quadratic Regret
(a) Empirical: The investment decision is obtained by solving Problem (R) with parame-
ters empirically determined from the history returns as “Persistency (a)”, which include
the first two moments of the returns, E [x(c)] as well as E [Z (c)].
(b) CPCMM: The investment decision is obtained by solving Problem (R) with E [x(c)]
and E [Z (c)] estimated from solving CPCMM.
Performance Measures:
We impose the following measures to gouge the performance of different portfolio selection
strategies.
1. Regret
Two types of regret are considered, i.e., absolute regret (or difference regret) and quadratic
regret. For each type of regret, the total regret over the ten years investment horizon
(from 2001 to 2010) is recorded, which also reflects the average level of regret the investor
may face everyday. Note that although our investment decision is developed based on
minimizing the quadratic regret, the actual performance on the quadratic regret measure
from our strategy may not be the best, because the test we conducted is out-of-sample
and the existence of the numerical errors either from the return moment estimation or
solving the tractable approximation of CPCMM is unavoidable.
2. Annual Return
The average (AVG), standard deviation (STD) and signal-to-noise ratios (SNR = AVG/STD)
of the annual returns, as well as the volatility-adjusted annual return (VR = AVG−0.5×STD2) are computed. These measures help capture the risk and return tradeoff for dif-
ferent strategies in a short term.
3. Aggregate Return
We also keep track of the aggregate return for different strategies over the ten years
investment horizon compounded daily. This measure reflects the return from different
strategies in the long run.
Results and Discussion
Table 5 to Table 8 summarize the performance of different portfolio selection strategies on differ-
ent industry portfolios. The top and second best performance for each measure are underscored
23
Regret Annual Return AggregateQuadratic Absolute AVG STD SNR VR Return Inc(%)
Uniform 0.4048 25.6094 0.1913 0.2089 0.9158 0.1695 5.4964 -Expected Return 0.6450 25.1376 0.2382 0.3326 0.7162 0.1829 6.2697 14.07Markowitz 0.4902 25.8735 0.1650 0.1683 0.9809 0.1508 4.5595 -17.05Persistency (CPCMM) 0.3886 25.6728 0.1850 0.2306 0.8024 0.1584 4.9162 -10.56Persistency (Empirical) 0.3890 25.6181 0.1904 0.2362 0.8062 0.1625 5.1235 -6.79CVaR (CPCMM) 0.5061 26.0535 0.1471 0.1686 0.8727 0.1329 3.8063 -30.75CVaR (Empirical) 0.5163 26.1357 0.1390 0.1682 0.8265 0.1249 3.5089 -36.16Quadratic Regret (CPCMM) 0.4003 25.4729 0.2049 0.2350 0.8717 0.1773 5.9391 8.05Quadratic Regret (Empirical) 0.3971 25.4932 0.2028 0.2380 0.8524 0.1745 5.7789 5.14
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 20110
1
2
3
4
5
6
7
8
9
Time
Wea
lth
Wealth Growth from 2001 to 2010
S & P 500 IndexUniformExpected ReturnMarkowitzPersistencyCVaRQuadratic Regret
Table 5: Performance of different portfolio selection strategies on 10 industry portfolios using 5years as history
in the tables. Each figure below the table depicts the growth of one’s fortune (assume to be
1 initially) over the ten year investment horizon under different portfolio selection strategies.
Since the performance of empirical and CPCMM estimation does not vary too much nor cross
over with the performance from different strategies, we choose to plot the empirical results to
keep the figure readable. We also plot the growth of S&P 500 Index as a simple reference point.
In terms of accessing the performance, we use the “Uniform” strategy as a bench mark for
comparison. The percentage increases (or decreases) in the aggregate return for other strategies
against the “Uniform” strategy are reported in the tables.
It can be observed that our “Quadratic Regret” strategy usually generates the highest
aggregate return except in one case when it is outperformed by the “Expected Return” strategy.
Fortunately, performance of the “Quadratic Regret” strategy is much more stable than the
“Expected Return” strategy. As mentioned before, the “Quadratic Regret” strategy doest
not guarantee a minimal quadratic regret in the out-of-sample performance test. However,
the performance is still satisfiable, as the regret is very close to the best performance among
24
Regret Annual Return AggregateQuadratic Absolute AVG STD SNR VR Return Inc(%)
Uniform 0.4048 25.6094 0.1913 0.2089 0.9158 0.1695 5.4964 -Expected Return 0.6369 25.5171 0.2005 0.3462 0.5791 0.1406 4.0984 -25.43Markowitz 0.4709 25.7712 0.1752 0.1717 1.0204 0.1605 5.0201 -8.67Persistency (CPCMM) 0.3875 25.6270 0.1895 0.2241 0.8458 0.1644 5.2236 -4.96Persistency (Empirical) 0.3861 25.5861 0.1936 0.2285 0.8474 0.1675 5.3874 -1.98CVaR (CPCMM) 0.4958 25.9990 0.1526 0.1732 0.8811 0.1376 3.9883 -27.44CVaR (Empirical) 0.4982 26.0630 0.1462 0.1739 0.8407 0.1311 3.7362 -32.02Quadratic Regret (CPCMM) 0.3926 25.3044 0.2216 0.2235 0.9914 0.1966 7.2190 31.34Quadratic Regret (Empirical) 0.3897 25.3628 0.2158 0.2265 0.9528 0.1901 6.7643 23.07
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 20110
1
2
3
4
5
6
7
Time
Wea
lth
Wealth Growth from 2001 to 2010
S & P 500 IndexUniformExpected ReturnMarkowitzPersistencyCVaRQuadratic Regret
Table 6: Performance of different portfolio selection strategies on 10 industry portfolios using10 years as history
the strategies tested. Interestingly, the “Quadratic Regret” strategy usually gives the lowest
absolute regret in almost all cases.
The extreme strategy “Expected Return” gives the most unstable performance as indicated
by the smallest signal-to-risk ratio in all scenarios as well as the most fluctuated aggregate return.
Furthermore, as suggested by the common sense that “high expectation = big disappointment”,
the investor taking such extremal strategy usually suffers from the largest regret.
For the “Markowitz” strategy, although we explicitly require the expected return to exceed
that from the “Uniform” strategy, when tested out-of-sample, it performs significantly inferior
to the “Uniform” strategy. This might be due to the way that the Markowitz mean/variance
model penalizes the risk through variance (two-sided) and achieves the tradeoff between return
and risk through a separated treatment on mean and variance. In this experiment, this is
justified by its lowest standard deviation and largest signal-to-risk ratio of the annual return.
Another interesting finding is that among all the strategies we have tested, “CVaR” is the
most conservative one. Although it achieves the minimal variation in return, the opportunity
25
Regret Annual Return AggregateQuadratic Absolute AVG STD SNR VR Return Inc(%)
Uniform 0.7113 35.5749 0.2145 0.2174 0.9865 0.1909 6.8134 -Expected Return 0.9547 35.4593 0.2260 0.3313 0.6821 0.1711 5.5682 -17.55Markowitz 0.8680 35.7727 0.1948 0.1512 1.2888 0.1834 6.3251 -0.07Persistency (CPCMM) 0.6906 35.4390 0.2280 0.2364 0.9644 0.2001 7.4705 9.64Persistency (Empirical) 0.6924 35.3802 0.2339 0.2450 0.9543 0.2039 7.7575 13.86CVaR (CPCMM) 0.8827 36.0414 0.1681 0.1507 1.1156 0.1567 4.8390 -28.98CVaR (Empirical) 0.8799 36.0720 0.1651 0.1521 1.0853 0.1535 4.6832 -31.26Quadratic Regret (CPCMM) 0.7134 35.4136 0.2305 0.2435 0.9467 0.2009 7.5307 10.53Quadratic Regret (Empirical) 0.7082 35.3473 0.2371 0.2475 0.9580 0.2065 7.9669 16.93
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 20110
1
2
3
4
5
6
7
8
Time
Wea
lth
Wealth Growth from 2001 to 2010
S & P 500 IndexUniformExpected ReturnMarkowitzPersistencyCVaRQuadratic Regret
Table 7: Performance of different portfolio selection strategies on 17 industry portfolios using 5years as history
cost appears to be too high as indicated by the lowest average annual return (consequently the
lowest aggregated return), as well as the large regret.
The “Persistency” strategy performs quite moderate and stable. Interestingly, it usually
gives the smallest regret, especially in terms of the quadratic regret.
Besides the number of portfolios, the length of history, k also has some impact on the
performance of different strategies. In general, increasing k improves the performance of the
investment strategies except the “Uniform” strategy, which does not rely on any historical in-
formation. Intuitively, more data available means higher accuracy on the estimation of the
underlying return distribution, so as long as the assumption on the uniqueness of such distri-
bution holds as justified before, strategies based on more information would perform better.
To summarize, the “Quadratic Regret” strategy could consistently achieve a higher return,
and it would be suitable for the investment problem with more choices on the portfolios and
more historical information.
26
Regret Annual Return AggregateQuadratic Absolute AVG STD SNR Return Inc(%)
Uniform 0.7113 35.5749 0.2145 0.2174 0.9865 0.1909 6.8134 -Expected Return 0.9964 35.0902 0.2627 0.3456 0.7602 0.2030 7.6648 12.50Markowitz 0.8602 35.9246 0.1801 0.1524 1.1822 0.1685 5.4451 -20.08Persistency (CPCMM) 0.6910 35.4051 0.2314 0.2331 0.9928 0.2042 7.7903 14.34Persistency (Empirical) 0.6904 35.3328 0.2386 0.2395 0.9960 0.2099 8.2448 21.01CVaR (CPCMM) 0.8720 36.0677 0.1655 0.1537 1.0767 0.1537 4.6913 -31.15CVaR (Empirical) 0.8708 36.0689 0.1654 0.1550 1.0674 0.1534 4.6770 -31.36Quadratic Regret (CPCMM) 0.7045 35.3120 0.2406 0.2327 1.0341 0.2135 8.5563 25.58Quadratic Regret (Empirical) 0.7000 35.2626 0.2455 0.2377 1.0330 0.2172 8.8827 30.37
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 20110
1
2
3
4
5
6
7
8
9
Time
Wea
lth
Wealth Growth from 2001 to 2010
S & P 500 IndexUniformExpected ReturnMarkowitzPersistencyCVaRQuadratic Regret
Table 8: Performance of different portfolio selection strategies on 17 industry portfolios using10 years as history
6 Discussion and Conclusion
In this paper, we show that several classes of stochastic optimization problems can be trans-
formed into the related persistency problems, like project completion time distribution ap-
proximation problem and quadratic regret minimization problem. Extensive computational
experiments were presented to demonstrate the advantages of our distribution approximation
method, especially the benefits of introducing persistency into the distribution approximation
problem.
The results in this paper can be developed further in several ways. The computational
experiments on portfolio management problem presented in this paper can be more compre-
hensive. We can further compare the investment strategy with other online portfolio selection
strategies using geometric weights updates etc., often used in the machine learning community
(cf. Helmbold et al. (1998)). With the knowledge on the distribution of the optimal value,
we can now conduct more in-depth risk analysis or parameter calibration for the underlying
stochastic mixed zero-one linear optimization problem. We leave these and other related issues
27
for future research.
Appendix A. Proof of Lemma 1
Proof. The proof is consolidated from Stein (1972), Stein (1981) and Liu (1994).
We begin by showing the univariate version of Stein’ts Identity (cf. Stein (1972) and Stein
(1981)).
Let Y follow a standard normal distribution, N (0, 1), and ϕ (y) denote the standard normal
density with the derivative ϕ′ (y) = −yϕ (y). For any function g : R → R such that g′ exists
almost everywhere and E[|g′(Y )|] < ∞,
E [g′ (Y ))] =∫∞−∞ g′(y)ϕ (y) dy
=∫∞0 g′(y)
[−∫∞y −zϕ (z) dz
]dy +
∫ 0−∞ g′(y)
[∫ y−∞−zϕ (z) dz
]dy
=∫∞0 zϕ (z)
[∫ z0 g′(y)dy
]dz −
∫ 0−∞ zϕ (z)
[∫ 0z g′(y)dy
]dz
=(∫∞
0 +∫ 0−∞
){zϕ (z) [g(z)− g(0)]} dz
=∫∞−∞ zϕ (z) g(z)dz
= E [Y g (Y )]
where the third equality is justified by Fubini’s Theorem. Note that E[Y ] = 0 and V ar(Y ) = 1,
the equality proved above is essentially
Cov (Y, g (Y )) = V ar(Y )E[g′ (Y ))
]. (8)
Next, we generalize the above result into the multivariate case (cf. Stein (1981) and Liu
(1994)).
Let Z = (Z1, . . . , Zn)T , where Zj ’s are independent and identically distributed standard
normal random variables. It is straightforward to show by Equation (8) that for any function
h : Rn → R satisfying the same conditions as h,
E[Zj h (Z) | (Z2, . . . , Zn)
]= E
[∂h (Z)
∂zj| (Z2, . . . , Zn)
], ∀j = 1, . . . , n.
Taking the expectation of both sides, we find that
E[Zj h (Z)
]= E
[∂h (Z)
∂zj
], ∀j = 1, . . . , n,
i.e.,
Cov(Z, h (Z)
)= E
[∇h (Z)
].
Note that the random vector X can be written as X = Σ1/2Z + µ. Consider h (Z) =
h(Σ1/2Z + µ
), then ∇h (Z) = Σ1/2∇h (X). Hence,
Cov (X, h (X)) = Cov(Σ1/2Z, h (Z)
)= Σ1/2E
[∇h (Z)
]= ΣE [∇h (X)] .
28
References
Agrawal, S., Y. Ding, A. Saberi, Y. Ye (2011) Price of correlations in stochastic optimization, Manuscript.
Aissi, H., C. Bazgan, D. Vanderpooten (2009) Min-max and min-max regret versions of combinatorialoptimization problems: A survey , European Journal of Operational Research, 197, pp. 427–438.
Aldous, D., M. Steele (2003) The objective method: Probabilistic combinatorial optimization and localweak convergence, in Probability on Discrete Structures, H. Kesten (ed), Springer, Berlin, 110, pp.1–72.
Bereanu, B. (1963) On stochastic linear programming. I: Distribution problems: A single random variable,Romanian Journal of Pure and Applied Mathematics, 8, pp. 683–697.
Berman, A., N. Shaked-Monderer (2003) Completely Positive Matrices, World Scientific, Singapore.
Bertsimas, D., X. V. Doan, K. Natarajan, C. P. Teo (2010) Models for minimax stochastic linearoptimization problems with risk aversion, Mathematics of Operations Research, 35, pp. 580–602.
Bertsimas, D., K. Natarajan, C. P. Teo (2004) Probabilistic combinatorial optimization: moments,semidefinite programming and asymptotic bounds, SIAM Journal of Optimization, 15, pp. 185–209.
Bertsimas, D., K. Natarajan, C. P. Teo (2006) Persistence in discrete optimization under data uncer-tainty , Mathematical Programming, 108, pp. 251–274.
Bomze, I. M., M. Dur, E. D. Klerk, C. Roos, A. J. Quist, T. Terlaky, (2000) On copositive programmingand standard quadratic optimization problems, Journal of Global Optimization, 18, pp. 301–320.
Boyd, S., L. Vandenberghe (2004) Convex Optimizatioin, Cambridge University Press.
Borkar, V. (1995) Probability Theory: An Advanced Course, S. Axler, F. W. Gehring, P. R. Halmos(eds), Springer, New York..
Brown, G. G., R. F. Dell, R. K. Wood (1997) Optimization and persistence, Interfaces, 27, pp. 15–37.
Burer, S. (2009) On the copositive representation of binary and continuous nonconvex quadratic programs,Mathematical Programming, 120, pp. 479–495.
Cover, T. M. (1991) Universal portfolios, Mathematical Finance, 1, pp. 1–29.
Cox, M. A. (1995) Simple normal approximation to the completion time distribution for a PERT network ,International Journal of Project Management, 13, pp. 265–270.
Dembo, R. S., A. J. King (1992) Tracking models and the optimal regret distribution in asset allocation,Applied Stochastic Models and Data Analysis, 8, pp. 195–207.
Dodin, B. (1985) Bounding the project completion time distribution in PERT networks, OperationsResearch, 33, pp. 862–881.
Dur, M. (2009) Copositive programming: A survey , In: M. Diehl, F. Glineur, E. Jarlebring, W. Michiels(Eds.), Recent Advances in Optimization and its Applications in Engineering, Springer, pp. 3–20.
Ewbank, J. B., B. L. Foote, H. J. Kumin (1974), A method for the solution of the distribution problemof stochastic linear programming , SIAM Journal on Applied Mathematics, 26, pp. 225–238.
Fulkerson, D. R. (1962) Expected critical path lengths in PERT networks, Operations Research, 10, pp.808–817.
Hagstrom, J. N. (1988) Computational complexity of PERT problems, Networks, 18, pp. 139–147.
Helmbold, D. P., R. E. Schapire, Y. Singer, M. K. Warmuth (1997) A comparison of new and oldalgorithms for a mixture estimation problem, Machine Learning, 22, pp. 97–119.
Helmbold, D. P., R. E. Schapire, Y. Singer, M. K. Warmuth (1998) On-line portfolio selection usingmultiplicative updates, Mathematical Finance, 8, pp. 325–347.
Kamburowski, J. (1985) A Note on the Stochastic Shortest Route Problem, Operations Research, 33,pp. 696–698.
29
King, A. J., D. L. Jensen (1992) Linear-quadratic efficient frontiers for portfolio optimization, AppliedStochastic Models and Data Analysis, 8, pp. 195–207.
Klerk, E. de, D. V. Pasechnik (2002) Approximation of the stability number of a graph via copositiveprogramming , SIAM Journal on Optimization, 12, pp. 875–892.
Kleindorfer, G. B. (1971) Bounding distributions for a stochastic acyclic network , Operations Research,19, pp. 1586–1601.
Kong, Q., C. Y. Lee, C. P. Teo, Z. Zheng (2011) Scheduling arrivals to a stochastic service deliverysystem using copositive cones, Manuscript.
Liu, J. S. (1994) Siegel’s formula via Stein’s identities, Statistics & Probability Letters, 21, pp. 247–251.
Lindsey, J. H. (1972) An estimate of expected critical-path length in PERT networks, OperationsResearch, 20, pp. 800–812.
MacCrimmon, K. R., C. A. Ryavec (1964) An analytical study of the PERT assumptions, OperationsResearch, 12, pp. 16–37.
Markowitz, H. M. (1952) Portfolio selection, Journal of Finance, 7, pp. 77–91.
Markowitz, H. M. (1959) Portfolio Selection: Efficient Diversification of Investments, Wiley, New York.
Mishra, V. K., K. Natarajan, H. Tao, C. P. Teo (2011) Choice prediction with semidefinite optimizationwhen utilities are correlated , Manuscript.
Natarajan, K., M. Song, C. P. Teo (2009) Persistency model and its applications in choice modeling ,Management Science, 55, pp. 453–469.
Natarajan, K., C. P. Teo, Z. Zheng (2011a) Mixed zero-one linear programs under objective uncertainty:a completely positive representation, Forthcoming in Operations Resaerch.
Ord, J. K. (1991) A simple approximation to the completion time distribution for a PERT network , TheJournal of the Operational Research Society, 42, pp. 1011–1017.
Papadatos, N., V. Papathanasiou (2003) Multivariate covariance identities with an application to orderstatistics, Sankhya: The Indian Journal of Statistics, 65, pp. 307–316.
Parrilo, P. A. (2000) Structured Semidefinite Programs and Semi-algebraic Geometry Methods in Ro-bustness and Optimization, Ph.D. Dissertation, California Institute of Technology.
Prekopa, A. (1966) On the probability distribution of the optimum of a random linear program, SIAMJournal on Control and Optimization, 4, pp. 211–222.
Rockafellar, R. T., S. Uryasev (2000) Optimization of conditional value-at-risk , Journal of Risk, 2, pp.21–42.
Rockafellar, R. T., S. Uryasev (2004) Conditional value-at-risk for general loss distributions, Journal ofBanking and Finance, 26, pp. 1443–1471.
Sculli, D. (1983) The completion time of PERT networks, The Journal of the Operational ResearchSociety, 34, pp. 155–158.
Siegel, A. F. (1993) A surprising covariance involving the minimum of multivariate normal variables,Journal of the American Statistical Association, 88, pp. 77–80.
Stein, C. M. (1972) A bound for the error in the normal approximation to the distribution of a sum ofdependent random variables, Proceedings of the Berkeley Symposium on Mathematical Statisticsand Probability, 2, pp. 583–602.
Stein, C. M. (1981) Estimation of the mean of a multivariate normal distribution, The Annals ofStatistics, 9, pp. 1135–1151.
Vandenberghe, L., S. Boyd, K. Comanor (1972) Generalized Chebyshev bounds via semidefinite program-ming , SIAM Review, 49, pp. 52–64.
Yao, M. J., W. M. Chu (2007) A new approximation algorithm for obtaining the probability distributionfunction for project completion time, Computers & Mathematics with Applications, 54, pp. 282–295.
Zhu, S., M. Fukushima (2009) Worst-case Conditional Value-at-Risk with application to robust portfoliomanagement , Operations Research, 57, pp. 1155–1168.
30