Lecture 5 - Pennsylvania State University · Uday V. Shanbhag

Lecture 5

Two-stage stochastic convex programs

February 4, 2015

Uday V. Shanbhag Lecture 5

Convex two-stage problems

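• Consider a convex two-stage stochastic program of the form

$$\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to } x \in X,$$

where $f(x,\omega)$ is the optimal value of the second-stage problem

$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to } g_i(y,\omega) + \chi_i \le 0, \; i = 1,\dots,m,$$

and $\chi_i = t_i(x,\omega)$.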

• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions∗ for a.e. $\omega$, and both $X$ and $Y$ are convex sets.

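• The second-stage constraints can be absorbed into the objective function by defining a suitable indicator function. The resulting second-stage problem is given by the following:

$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in \mathbb{R}^m} \; \bar q(y,\chi,\omega),$$

where $\bar q(y,\chi,\omega) = q(y,\chi,\omega) + \mathbb{1}_Y(y)$ and $\mathbb{1}_Y$ is the indicator function of $Y$ (zero on $Y$ and $+\infty$ elsewhere).

• We refer to the optimal value of the second-stage problem by $\theta(\chi,\omega)$.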

∗Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).



Conjugate duality: An introduction

• We now provide a brief aside regarding conjugate duality (Fenchel, 1951) based on Veinott (1989).

• Consider the primal problem that requires choosing an $x \in \mathbb{R}^n$ that minimizes $c(x,p)$, where $c$ is an extended real-valued convex function on $\mathbb{R}^{n+m}$ and $p \in \mathbb{R}^m$. Let $C(q)$ be defined as follows for $q \in \mathbb{R}^m$:

$$C(q) \triangleq \inf_{x} c(x,q). \qquad \text{(Primal)}$$

It is well known that $C$ is convex.

• The dual program requires choosing a $(\pi,\mu) \in \mathbb{R}^{m+1}$ such that the linear function $\pi^T p - \mu$ is maximized subject to the inequalities

$$\pi^T q - \mu \le C(q), \quad \forall q \in \mathbb{R}^m.$$



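• More specifically, for a given $p$, we have the following problem:

$$\max_{\pi,\mu} \; \pi^T p - \mu \quad \text{subject to } \pi^T q - \mu \le C(q), \; \forall q \in \mathbb{R}^m. \qquad (1)$$

• Geometric insight: among all affine functions $q \mapsto \pi^T q - \mu$ that minorize $C$, we choose one whose value at $p$, namely $\pi^T p - \mu$, is maximal.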

• In fact, C is closed at p if and only if the primal infimum is equal to the

dual supremum (the main duality relationship).

• Now consider Rockafellar's conjugate duality relationship:

• For each fixed $\pi$, one can maximize $\pi^T p - \mu$ subject to $\pi^T q - \mu \le C(q)$ for all $q$ by putting $\mu = C^*(\pi)$, where

$$C^*(\pi) = \sup_{q} \left[ q^T \pi - C(q) \right].$$



• The dual program then reduces to the following:

$$C^{**}(p) = \sup_{\pi \in \mathbb{R}^m} \left[ p^T \pi - C^*(\pi) \right].$$

In effect, the supremum is given by the conjugate of $C^*$. Furthermore, $C(p) = C^{**}(p)$ if and only if $C$ is closed at $p$.
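The conjugate and biconjugate above can be checked numerically on a grid. The following is a minimal sketch, assuming the hypothetical choice $C(q) = q^2$ (closed and convex, so we expect $C^{**} = C$) and a uniform grid on $[-10, 10]$; the helper name `conjugate` is an illustration only and does not come from the lecture.

```python
def conjugate(f, grid):
    """Return the grid approximation of f*: f*(s) = max over q in grid of (s*q - f(q))."""
    def f_star(s):
        return max(s * q - f(q) for q in grid)
    return f_star

# Hypothetical example: C(q) = q^2, whose exact conjugate is C*(pi) = pi^2 / 4.
def C(q):
    return q * q

# Both q and pi range over the same uniform grid on [-10, 10] (an assumption).
grid = [i / 20.0 for i in range(-200, 201)]

C_star = conjugate(C, grid)            # approximates C*
C_star_star = conjugate(C_star, grid)  # approximates C**

p = 1.5
print("C(p)   =", C(p))
print("C**(p) =", C_star_star(p))
```

For a function that is not closed at $p$, the two printed values would differ, which is the duality gap described above.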

• Let us now return to our original problem given by (Primal).

• Following Rockafellar (1968, 1970), with $c^*$ denoting the conjugate of $c$, we have that $C^*(\pi) = c^*(0,\pi)$, based on noting that

$$C^*(\pi) = \sup_{q \in \mathbb{R}^m} \left[ \pi^T q - C(q) \right] = \sup_{x \in \mathbb{R}^n,\, q \in \mathbb{R}^m} \left[ 0^T x + \pi^T q - c(x,q) \right] = c^*(0,\pi).$$



• It follows from $C(p) = C^{**}(p)$ that, by putting $p = 0$, we have

$$\inf_{x} c(x,0) = C(0) = C^{**}(0) = \sup_{\pi} \left[ 0^T \pi - C^*(\pi) \right] = \sup_{\pi} \, -c^*(0,\pi).$$



Figure 1: A schematic (scanned excerpt from Veinott, p. 664) of the geometric interpretation of the dual program: from among all affine functions $\langle \pi, \cdot \rangle - \mu$ minorizing $C(\cdot)$, choose one whose value at $p$ is maximum.



Application of conjugate duality to linear programming

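Consider the standard form LP:

$$\min_{x} \; c^T x \quad \text{subject to } Ax = b, \; x \ge 0. \qquad \text{(LP)}$$

Consider the parametrized form of this LP:

$$\min_{x} \; c^T x \quad \text{subject to } Ax = b + q, \; x \ge 0. \qquad \text{(LP}(q)\text{)}$$

Note that when $q \equiv 0$, we recover the original form.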



We define $c(x,q)$ of our primal problem as follows:

$$c(x,q) \triangleq \begin{cases} c^T x, & \text{if } Ax = b + q, \; x \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}$$

Then the original LP corresponds to the primal problem:

$$\inf_{x} c(x,0).$$



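By the conjugate duality framework, the dual problem is obtained from the conjugate

$$c^*(0,\pi) = \sup_{x,q} \left[ \pi^T q - c(x,q) \right] = \sup_{x \ge 0,\; Ax = b + q} \left[ \pi^T q - c^T x \right] = \sup_{x \ge 0} \left[ \pi^T (Ax - b) - c^T x \right] = \sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b.$$

Next we note that

$$\sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b = \begin{cases} -\pi^T b, & A^T \pi \le c, \\ +\infty, & \text{otherwise.} \end{cases}$$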



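Consequently, maximizing $-c^*(0,\pi)$ over $\pi$ amounts to solving

$$\min_{\pi} \; -b^T \pi \quad \text{subject to } A^T \pi \le c. \qquad (2)$$

Up to the sign of the optimal value, this problem is equivalent to the following dual problem:

$$\max_{\pi} \; b^T \pi \quad \text{subject to } A^T \pi \le c. \qquad \text{(Dual)}$$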
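The derivation above can be illustrated on a tiny LP. The sketch below assumes hypothetical data with a single equality constraint $x_1 + x_2 + x_3 = b$, so the primal optimum is available in closed form; the function name `conj_at_zero` and the grid of candidate multipliers are illustrative choices and not part of the lecture.

```python
import math

# Hypothetical data: minimize c^T x subject to x1 + x2 + x3 = b, x >= 0.
c = [3.0, 1.0, 2.0]
A = [[1.0, 1.0, 1.0]]   # one equality constraint
b = [4.0]

def conj_at_zero(pi):
    """c*(0, pi): equals -pi^T b if A^T pi <= c, and +infinity otherwise."""
    for j in range(len(c)):
        if sum(A[i][j] * pi[i] for i in range(len(b))) > c[j] + 1e-12:
            return math.inf
    return -sum(pi[i] * b[i] for i in range(len(b)))

# Primal optimum: with one constraint and b >= 0, all mass goes on the cheapest column.
primal = b[0] * min(c)

# Dual optimum: maximize -c*(0, pi) over a grid of candidate multipliers.
candidates = [[k / 100.0] for k in range(-500, 501)]
dual = max(-conj_at_zero(pi) for pi in candidates)

print("primal optimal value:", primal)
print("dual optimal value:  ", dual)
```

Infeasible multipliers contribute $-\infty$ to the maximization, which is how the constraint $A^T\pi \le c$ emerges from the conjugate.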



Return to convex two-stage problems

• Consider a convex two-stage stochastic program of the form

$$\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to } x \in X,$$

where $f(x,\omega)$ is the optimal value of the second-stage problem

$$\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to } g_i(y,\omega) + \chi_i(\omega) \le 0, \; i = 1,\dots,m,$$

where $\chi_i(\omega) = t_i(x,\omega)$.

• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions† for a.e. $\omega$, and that $X$ and $Y$ are convex sets.

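• Suppose $\psi(y,\chi,\omega)$ is defined as

$$\psi(y,\chi,\omega) \triangleq \bar q(y,\omega) + \mathbb{1}_{\mathbb{R}^n_-}\big(G(y,\omega) + \chi(x,\omega)\big),$$

where

$$\bar q(y,\omega) \triangleq q(y,\omega) + \mathbb{1}_Y(y),$$

$G(y,\omega) \triangleq (g_1(y,\omega); \dots; g_m(y,\omega))$, and $\mathbb{1}_{\mathbb{R}^n_-}(\bullet)$ is the indicator function for the negative orthant, or

$$\mathbb{1}_{\mathbb{R}^n_-}(z) = \begin{cases} 0, & z \le 0, \\ +\infty, & \text{otherwise.} \end{cases}$$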

†Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).



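• In the remainder of this discussion, we suppress $\omega$ and refer to $\psi(y,\chi,\omega)$ by $\psi(y,\chi)$.

• We may now compute the conjugate function of $\psi(y,\chi)$:

$$\begin{aligned}
\psi^*(y^*,\chi^*) &= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T \chi + (y^*)^T y - \bar q(y,\omega) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \Big\} \\
&= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T (G(y) + \chi) - (\chi^*)^T G(y) + (y^*)^T y - \bar q(y,\omega) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \Big\} \\
&= \sup_{y \in \mathbb{R}^m} \Big\{ (y^*)^T y - \bar q(y,\omega) - (\chi^*)^T G(y) + \sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \big] \Big\}.
\end{aligned}$$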



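• Suppose $z = G(y) + \chi$. Then we have that

$$\sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y,\omega) + \chi) \big] = \sup_{z \in \mathbb{R}^n} \big[ (\chi^*)^T z - \mathbb{1}_{\mathbb{R}^n_-}(z) \big] = \sup_{z \in \mathbb{R}^n_-} \big[ (\chi^*)^T z \big] = \begin{cases} 0, & \chi^* \ge 0, \\ +\infty, & \text{otherwise} \end{cases} \; = \mathbb{1}_{\mathbb{R}^n_+}(\chi^*),$$

where

$$\mathbb{1}_{\mathbb{R}^n_+}(u) = \begin{cases} 0, & u \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}$$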



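• Consequently, we obtain

$$\psi^*(y^*,\chi^*,\omega) = \sup_{y \in \mathbb{R}^m} \big\{ (y^*)^T y - L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*),$$

where $L(y,\chi^*) \triangleq \bar q(y) + \sum_{i=1}^m \chi_i^* g_i(y,\omega)$.

• Recall that

$$\theta^*(\chi^*) = \psi^*(0,\chi^*) = \sup_{y \in \mathbb{R}^m} \big\{ -L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*) = -\inf_{y \in \mathbb{R}^m} L(y,\chi^*) + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*).$$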



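As a result, the dual of the second-stage problem is given by

$$\begin{aligned}
\theta^{**}(\chi) &= \max_{\lambda \in \mathbb{R}^m} \big\{ \lambda^T \chi - \theta^*(\lambda) \big\} \\
&= \max_{\lambda \in \mathbb{R}^m} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) - \mathbb{1}_{\mathbb{R}^m_+}(\lambda) \Big\} \\
&= \max_{\lambda \in \mathbb{R}^m_+} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) \Big\}.
\end{aligned}$$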
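To see the Lagrangian dual of the second stage at work, consider a hypothetical scalar instance (not from the lecture): $q(y) = y^2$, $g(y) = 1 - y$, $Y = \mathbb{R}$, so that the constraint $g(y) + \chi \le 0$ reads $y \ge 1 + \chi$. Here $\inf_y L(y,\lambda)$ has the closed form $\lambda - \lambda^2/4$, and the sketch below compares $\theta(\chi)$ with a grid approximation of $\theta^{**}(\chi)$. The grid bounds are assumptions chosen to contain the maximizer.

```python
def theta(chi):
    """Primal value: min y^2 subject to (1 - y) + chi <= 0, i.e. y >= 1 + chi."""
    y = max(0.0, 1.0 + chi)
    return y * y

def dual_function(lam, chi):
    """lam*chi + inf_y L(y, lam), with L(y, lam) = y^2 + lam*(1 - y).

    The inner infimum is attained at y = lam / 2 and equals lam - lam^2 / 4.
    """
    return lam * chi + lam - lam * lam / 4.0

def theta_bidual(chi, lam_max=20.0, steps=20000):
    """Approximate the max over lam >= 0 of the dual function on a uniform grid."""
    return max(dual_function(lam_max * k / steps, chi) for k in range(steps + 1))

for chi in (-2.0, 0.0, 0.5, 1.0):
    print(chi, theta(chi), theta_bidual(chi))
```

Since this $\theta$ is convex and continuous, the two columns agree, in line with the absence of a duality gap discussed next.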

• By the Fenchel-Moreau theorem, we have that either $\theta^{**}(\bullet) \equiv -\infty$ or

$$\theta^{**}(y) = \operatorname{lsc}(\operatorname{conv} \theta)(y), \quad \forall y \in \mathbb{R}^m.$$

• Recall that a function $f$ is lower semicontinuous (lsc) at $x_0$ if

$$\liminf_{x \to x_0} f(x) \ge f(x_0).$$

A function $f$ is said to be lsc if it is lsc at every $x_0 \in \mathbb{R}^n$, and $\operatorname{lsc} f$ is the largest lower semicontinuous function that is less than or equal to $f$.

• Consequently, $\theta^{**}(y) \le \theta(y)$ for any $y \in \mathbb{R}^m$, and there is said to be no duality gap if $\theta^{**}(y) = \theta(y)$.

• Consider a setting where ψ(x, y) is convex over (x, y) ∈ Rn × Rn.

It is relatively straightforward to ascertain that θ(y) is convex and

conv θ(•) = θ(•).

• Furthermore, it is said that (**) is subconsistent if for a given value

of y, lsc θ(y) < +∞.

• Note that if (**) is feasible, that is, dom ψ(•, y) ≠ ∅, then θ(y) < +∞ and (**) is subconsistent.

Proposition 1 Suppose that ψ(•, •) is convex. Then the following

holds:



1. The optimal value function θ(•) is convex;

2. If (**) is subconsistent, then θ∗∗(y) = θ(y) if and only if θ(•) is

lsc at y;

3. If θ∗∗(y) is finite, then the set of optimal solutions of the dual

problem (***) coincides with ∂θ∗∗(y);

4. The set of optimal solutions of (***) is nonempty and bounded if

and only if θ(y) is finite and θ(•) is continuous at y.

Remark: Some quick observations:

(2.) follows from the Fenchel-Moreau theorem;

(3.) follows from

$$\partial f^{**}(x) = \arg\max_{z \in \mathbb{R}^n} \big\{ z^T x - f^*(z) \big\}.$$

If θ(•) is continuous at y then it is lsc at y and θ∗∗(y) = θ(y). Moreover, it follows that ∂θ∗∗(y) = ∂θ(y), which is nonempty and bounded if θ(y) is finite. But by (3.), whose hypothesis (θ∗∗(y) is finite) then holds, we have that the set of optimal solutions of (***) is bounded and nonempty.

Proposition 2 Let χ and ω ∈ Ω be specified. Suppose that the second-stage problem is convex. Then the following holds:

• The functions θ(•, ω) and f(•, ω) are convex.

• There is no duality gap between the primal and the dual problems and the dual problem has a nonempty set of optimal solutions if and only if the optimal value function θ(•, ω) is subdifferentiable at χ.

• Suppose that the optimal value of (SLP) is finite. Then there is no duality gap between the primal and dual problems and the dual problem has a nonempty and bounded solution set if and only if χ(•, ω) ∈ int (dom θ(•, ω)).



Nonanticipativity

Consider a two-stage stochastic program in which $\Omega = \{\omega_1,\dots,\omega_K\}$ with probabilities $p_1,\dots,p_K$:

$$\min_{x} \; \sum_{k=1}^K p_k F(x,\omega_k) \quad \text{subject to } x \in X.$$

Consider a relaxation of the first-stage problem in which $x$ is replaced by $K$ vectors $x_1,\dots,x_K$, one for each scenario.

The resulting problem is given by

$$\min_{x_1,\dots,x_K} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_k \in X, \; k = 1,\dots,K.$$



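This is a set of $K$ separable problems, with the $k$th problem given by

$$\min_{x_k} \; F(x_k,\omega_k) \quad \text{subject to } x_k \in X.$$

More specifically, in the context of two-stage linear programming, this problem is given by

$$\begin{aligned}
\min_{x_k \ge 0,\; y_k \ge 0} \quad & c^T x_k + q_k^T y_k \\
\text{subject to} \quad & A x_k = b, \\
& T_k x_k + W_k y_k = h_k.
\end{aligned} \qquad (3)$$

However, this formulation is not suitable for modeling a two-stage process because the first-stage variable $x_k$ is now allowed to depend on the realization $\omega_k$, which is only observed at the second stage.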



This can be resolved by adding an additional constraint:

$$(x_1,\dots,x_K) \in \mathfrak{L},$$

where

$$\mathfrak{L} \triangleq \{ \mathbf{x} = (x_1,\dots,x_K) : x_1 = x_2 = \dots = x_K \}.$$

This is a linear subspace of the $nK$-dimensional space $\mathcal{X} = \mathbb{R}^n \times \dots \times \mathbb{R}^n$. Decisions lying in this set are NOT dependent on the realization of random data.

This constraint is referred to as the nonanticipativity constraint and, together with this constraint, the two-stage problem can be posed as follows:

$$\min_{x_1,\dots,x_K} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_1 = \dots = x_K, \; x_k \in X, \; k = 1,\dots,K.$$
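The effect of the nonanticipativity constraint can be seen on a small hypothetical instance (an assumption for illustration, not from the lecture): three scenarios, $F(x,\omega) = (x - \omega)^2$, and $X = \mathbb{R}$. In the relaxation each scenario picks $x_k = \omega_k$, while the nonanticipative problem must commit to a single $x$.

```python
# Hypothetical data: three scenarios with probabilities p and outcomes omega.
p = [0.2, 0.5, 0.3]
omega = [1.0, 2.0, 4.0]

def F(x, w):
    """Illustrative scenario cost F(x, omega) = (x - omega)^2 with X = R."""
    return (x - w) ** 2

# Relaxation: each scenario chooses its own x_k, so x_k = omega_k is optimal.
x_relaxed = list(omega)
relaxed_value = sum(pk * F(xk, wk) for pk, xk, wk in zip(p, x_relaxed, omega))

# Nonanticipative problem: a single x for all scenarios.
# For this quadratic F the minimizer is the probability-weighted mean of omega.
x_bar = sum(pk * wk for pk, wk in zip(p, omega))
nonanticipative_value = sum(pk * F(x_bar, wk) for pk, wk in zip(p, omega))

print("relaxed value:         ", relaxed_value)
print("nonanticipative x:     ", x_bar)
print("nonanticipative value: ", nonanticipative_value)
```

The relaxed value is a lower bound that uses information unavailable at the first stage; the gap between the two values is what the multipliers introduced later have to account for.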



Another approach for specifying the nonanticipativity constraints is as follows:

$$x_1 = x_2, \quad x_2 = x_3, \quad \dots, \quad x_{K-1} = x_K.$$

Alternatively, suppose the nonanticipativity constraints are represented as

$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K.$$

Such a representation has particular relevance when contending with general, rather than finite, distributions.



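Consider the space $\mathcal{X}$ equipped with the scalar product

$$\langle \mathbf{x}, \mathbf{y} \rangle := \sum_{i=1}^K p_i x_i^T y_i.$$

Furthermore, suppose the linear operator $\mathbf{P}$ is defined as

$$\mathbf{P}\mathbf{x} := \begin{pmatrix} \sum_{i=1}^K p_i x_i \\ \vdots \\ \sum_{i=1}^K p_i x_i \end{pmatrix}.$$

Consequently, we have that

$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K,$$

can be compactly represented as $\mathbf{x} = \mathbf{P}\mathbf{x}$. In fact, $\mathbf{P}$ is an orthogonal projection operator on $\mathcal{X}$ in that

$$\mathbf{P}(\mathbf{P}\mathbf{x}) = \mathbf{P}\mathbf{x}.$$

Furthermore, we have that

$$\langle \mathbf{P}\mathbf{x}, \mathbf{y} \rangle = \Big( \sum_{i=1}^K p_i x_i \Big)^T \Big( \sum_{i=1}^K p_i y_i \Big) = \langle \mathbf{x}, \mathbf{P}\mathbf{y} \rangle.$$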
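The two properties of the averaging operator can be verified numerically. The sketch below assumes hypothetical probabilities and scenario vectors; `inner` and `project` are illustrative names for the weighted scalar product and the operator defined above.

```python
p = [0.2, 0.5, 0.3]    # hypothetical scenario probabilities

def inner(x, y):
    """Weighted scalar product <x, y> = sum_i p_i x_i^T y_i."""
    return sum(pi * sum(a * b for a, b in zip(xi, yi)) for pi, xi, yi in zip(p, x, y))

def project(x):
    """P x: every block is replaced by the probability-weighted average of the blocks."""
    n = len(x[0])
    mean = [sum(pi * xi[j] for pi, xi in zip(p, x)) for j in range(n)]
    return [list(mean) for _ in x]

# Hypothetical stacked vectors with K = 3 blocks in R^2.
x = [[1.0, 0.0], [2.0, -1.0], [4.0, 3.0]]
y = [[0.5, 1.0], [-2.0, 0.0], [1.0, 1.0]]

Px, Py = project(x), project(y)

# Idempotence: P(Px) = Px.
idempotent = all(abs(a - b) < 1e-12
                 for u, v in zip(project(Px), Px) for a, b in zip(u, v))
# Self-adjointness with respect to the weighted product: <Px, y> = <x, Py>.
self_adjoint = abs(inner(Px, y) - inner(x, Py)) < 1e-12

print("idempotent:", idempotent, " self-adjoint:", self_adjoint)
```

Note that self-adjointness holds for the probability-weighted product, not for the plain Euclidean one unless the probabilities are equal.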



Dualization of nonanticipativity constraints

Suppose we assign Lagrange multipliers $\lambda_1,\dots,\lambda_K$ to the nonanticipativity constraints:

$$x_k = \sum_{i=1}^K p_i x_i, \quad k = 1,\dots,K.$$

We may then define the Lagrangian function $L(\mathbf{x},\lambda)$ as follows:

$$L(\mathbf{x},\lambda) := \sum_{k=1}^K p_k F(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T \Big( x_k - \sum_{i=1}^K p_i x_i \Big).$$

Since $\mathbf{P}$ is an orthogonal projection, it follows that $\mathbf{I} - \mathbf{P}$ is also an orthogonal projection.



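This is a consequence of noting that

$$\begin{aligned}
(\mathbf{I}-\mathbf{P})(\mathbf{I}-\mathbf{P})\mathbf{x} &= (\mathbf{I}-\mathbf{P})\mathbf{x} - \mathbf{P}\mathbf{x} + \mathbf{P}(\mathbf{P}\mathbf{x}) \\
&= (\mathbf{I}-\mathbf{P})\mathbf{x} - \mathbf{P}\mathbf{x} + \mathbf{P}\mathbf{x} \\
&= (\mathbf{I}-\mathbf{P})\mathbf{x}.
\end{aligned}$$

It follows that

$$\sum_{k=1}^K p_k \lambda_k^T \Big( x_k - \sum_{i=1}^K p_i x_i \Big) = \langle \lambda, (\mathbf{I}-\mathbf{P})\mathbf{x} \rangle = \langle (\mathbf{I}-\mathbf{P})\lambda, \mathbf{x} \rangle.$$

As a consequence, the Lagrangian function can be rewritten as follows:

$$L(\mathbf{x},\lambda) := \sum_{k=1}^K p_k F(x_k,\omega_k) + \sum_{k=1}^K p_k \Big( \lambda_k - \sum_{i=1}^K p_i \lambda_i \Big)^T x_k.$$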
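The identity used to move the projection from the decisions to the multipliers is easy to confirm numerically. The following minimal sketch uses hypothetical scalar data ($n = 1$) and also shows that adding a constant to every multiplier leaves the penalty term unchanged.

```python
# Hypothetical scalar data (n = 1) for K = 3 scenarios.
p = [0.2, 0.5, 0.3]
x = [1.0, 2.0, 4.0]
lam = [0.7, -1.2, 0.4]

x_mean = sum(pk * xk for pk, xk in zip(p, x))
lam_mean = sum(pk * lk for pk, lk in zip(p, lam))

# Left-hand side: sum_k p_k * lam_k * (x_k - sum_i p_i x_i).
lhs = sum(pk * lk * (xk - x_mean) for pk, lk, xk in zip(p, lam, x))

# Right-hand side: sum_k p_k * (lam_k - sum_i p_i lam_i) * x_k.
rhs = sum(pk * (lk - lam_mean) * xk for pk, lk, xk in zip(p, lam, x))

# Shifting every multiplier by the same constant does not change the term.
shift = 5.0
lhs_shifted = sum(pk * (lk + shift) * (xk - x_mean) for pk, lk, xk in zip(p, lam, x))

print(lhs, rhs, lhs_shifted)
```

This invariance is what later allows the multipliers to be normalized so that their weighted average is zero.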



Duality

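Consider the optimization problem

$$\min_{x_1,\dots,x_K,z} \; \sum_{k=1}^K p_k F(x_k,\omega_k) \quad \text{subject to } x_k = z, \; x_k \in X, \; k = 1,\dots,K.$$

By using the indicator function of $X$, that is, $\bar F(x,\omega) \triangleq F(x,\omega) + \mathbb{1}_X(x)$, we can write this problem as follows:

$$\min_{x_1,\dots,x_K,z} \; \sum_{k=1}^K p_k \bar F(x_k,\omega_k) \quad \text{subject to } x_k = z, \; k = 1,\dots,K.$$

Then the Lagrangian function is given by

$$L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) := \sum_{k=1}^K p_k \bar F(x_k,\omega_k) + \sum_{k=1}^K p_k \lambda_k^T (x_k - z).$$

The resulting min-max problem is given by

$$\min_{x_1,\dots,x_K,z} \; \sup_{\lambda_1,\dots,\lambda_K} L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) = \min_{x_1,\dots,x_K,z} \; \sup_{\lambda_1,\dots,\lambda_K} \Big\{ \sum_{k=1}^K p_k \big( \bar F(x_k,\omega_k) + \lambda_k^T x_k \big) - \Big( \sum_{k=1}^K p_k \lambda_k \Big)^T z \Big\},$$

and its dual is obtained by interchanging the minimization and the supremum:

$$\sup_{\lambda_1,\dots,\lambda_K} \; \min_{x_1,\dots,x_K,z} \Big\{ \sum_{k=1}^K p_k \big( \bar F(x_k,\omega_k) + \lambda_k^T x_k \big) - \Big( \sum_{k=1}^K p_k \lambda_k \Big)^T z \Big\}.$$

However, the infimum of the Lagrangian over $z$ is $-\infty$ unless $\sum_{k=1}^K p_k \lambda_k = 0$. It follows that the dual problem can then be stated as follows:

$$\max_{\lambda_1,\dots,\lambda_K} \; D(\lambda) \quad \text{subject to } \sum_{k=1}^K p_k \lambda_k = 0, \qquad (4)$$

where

$$D(\lambda) \triangleq \min_{x_1,\dots,x_K} \; \sum_{k=1}^K p_k \big( \bar F(x_k,\omega_k) + \lambda_k^T x_k \big).$$

From the separable structure of the problem, for such $\lambda$ we have that

$$L(\mathbf{x},\lambda) = \sum_{k=1}^K p_k L_k(x_k,\lambda_k), \quad \text{where } L_k(x_k,\lambda_k) = \bar F(x_k,\omega_k) + \lambda_k^T x_k.$$

Furthermore, we have that $D(\lambda) = \sum_{k=1}^K p_k D_k(\lambda_k)$, where $D_k(\lambda_k) = \inf_{x_k} L_k(x_k,\lambda_k)$.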

• Suppose the problem is linear and the primal and dual problems are

feasible. Then there is no duality gap.
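The scenario decomposition of the dual function can be sketched on a hypothetical quadratic instance (an assumption for illustration, not a linear problem from the lecture): $F(x,\omega) = (x - \omega)^2$ with $X = \mathbb{R}$, for which each $D_k$ has a closed form. The multipliers are chosen from the stationarity condition $2(\bar x - \omega_k) + \lambda_k = 0$.

```python
# Hypothetical data: three scenarios, F(x, omega) = (x - omega)^2, X = R.
p = [0.2, 0.5, 0.3]
omega = [1.0, 2.0, 4.0]

def D_k(lam, w):
    """Scenario dual function: inf over x of (x - w)^2 + lam * x, attained at x = w - lam/2."""
    x = w - lam / 2.0
    return (x - w) ** 2 + lam * x

def D(lams):
    """Dual function D(lambda) = sum_k p_k D_k(lambda_k)."""
    return sum(pk * D_k(lk, wk) for pk, lk, wk in zip(p, lams, omega))

# Primal: the nonanticipative minimizer of sum_k p_k (x - omega_k)^2 is the mean.
x_bar = sum(pk * wk for pk, wk in zip(p, omega))
primal_value = sum(pk * (x_bar - wk) ** 2 for pk, wk in zip(p, omega))

# Candidate multipliers from stationarity; they satisfy sum_k p_k lam_k = 0.
lam_opt = [2.0 * (wk - x_bar) for wk in omega]
dual_value = D(lam_opt)

# Any other feasible multiplier (here lambda = 0) gives a lower bound.
lower_bound = D([0.0, 0.0, 0.0])

print("primal:", primal_value, " dual:", dual_value, " bound at 0:", lower_bound)
```

With these multipliers every scenario subproblem returns the same minimizer $x_k = \bar x$, so the decomposed solutions are automatically nonanticipative.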



Duality for general distributions

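• Consider the optimization problem given by the following:

$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)],$$

where

$$\bar F(x,\omega) = F(x,\omega) + \mathbb{1}_X(x).$$

• Let $\mathcal{X}$ be a linear space of measurable mappings from $\Omega$ to $\mathbb{R}^n$, defined as $\mathcal{X} := L_p(\Omega,\mathcal{F},P;\mathbb{R}^n)$ for $p \in [1,+\infty]$. Consequently, for every $\mathbf{x} \in \mathcal{X}$, the expectation $\mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)]$ is well-defined.

• We may then articulate the expected value problem as follows:

$$\min_{\mathbf{x} \in \mathfrak{L}_x} \; \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)],$$

where

$$\mathfrak{L}_x \triangleq \{ \mathbf{x} \in \mathcal{X} : \mathbf{x}(\omega) \equiv x \text{ for some } x \in \mathbb{R}^n \}$$

and $\mathbf{x}(\omega) \equiv x$ means that $\mathbf{x}(\omega) = x$ for a.e. $\omega \in \Omega$.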

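• Consider the dual space of $\mathcal{X}$, denoted by $\mathcal{X}^*$, and defined as $\mathcal{X}^* := L_q(\Omega,\mathcal{F},P;\mathbb{R}^n)$, where $1/p + 1/q = 1$. Note that by convention, if $p = \infty$ then $q = 1$, and if $q = \infty$ then $p = 1$.

• We may now define the scalar or bilinear product given by

$$\langle \lambda, \mathbf{x} \rangle = \mathbb{E}[\lambda^T \mathbf{x}] = \int_{\Omega} \lambda(\omega)^T \mathbf{x}(\omega) \, dP(\omega), \quad \lambda \in \mathcal{X}^*, \; \mathbf{x} \in \mathcal{X}.$$

• Further, consider the projection operator $\mathbf{P} : \mathcal{X} \to \mathfrak{L}_x$ defined as

$$[\mathbf{P}\mathbf{x}](\omega) \equiv \mathbb{E}[\mathbf{x}].$$

By the definition of $\mathfrak{L}_x$, we have $\mathfrak{L}_x = \{ \mathbf{x} \in \mathcal{X} : \mathbf{P}\mathbf{x} = \mathbf{x} \}$.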



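• Recall that the inner product is defined as

$$\langle \lambda, \mathbf{P}\mathbf{x} \rangle = \mathbb{E}[\lambda^T \mathbf{P}\mathbf{x}].$$

But $\mathbf{P}\mathbf{x} = \mathbb{E}[\mathbf{x}]$. Consequently, we have that

$$\mathbb{E}[\lambda^T \mathbf{P}\mathbf{x}] = \mathbb{E}[\lambda]^T \mathbb{E}[\mathbf{x}] = \langle \lambda, \mathbf{P}\mathbf{x} \rangle = \langle \mathbf{P}^*\lambda, \mathbf{x} \rangle,$$

where $\mathbf{P}^*$ is a projection operator from $\mathcal{X}^*$ to a subspace formed by constant a.e. maps.‡

• It follows that

$$L(\mathbf{x},\lambda) := \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)] + \mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])].$$

‡Note that if $p = 2$, then $\mathcal{X}^* = \mathcal{X}$ and $\mathbf{P}^* = \mathbf{P}$.

• It can be observed that the second term can be rewritten as follows:

$$\mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])] = \langle \lambda, \mathbf{x} - \mathbf{P}\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle \lambda, \mathbf{P}\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle \mathbf{P}^*\lambda, \mathbf{x} \rangle = \langle \lambda - \mathbf{P}^*\lambda, \mathbf{x} \rangle.$$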

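• Important observation:

$$\lambda + u - \mathbf{P}^*(\lambda + u) = \lambda + u - (\mathbf{P}^*\lambda + u) = \lambda - \mathbf{P}^*\lambda,$$

where $u$ is a constant map. Note that $\mathbf{P}^* u = u$ since $\mathbf{P}^*$ is an operator that projects onto the space of constant maps (a.e.).

• Consequently, $\lambda - \mathbf{P}^*\lambda$ does not change when a constant is added to $\lambda(\cdot)$. It follows that we can subtract the constant $\mathbf{P}^*\lambda$ from $\lambda$.

• It follows that we can set $\mathbf{P}^*\lambda = 0$, or $\mathbb{E}[\lambda] = 0$. As a result, the Lagrangian function is defined as follows:

$$L(\mathbf{x},\lambda) := \mathbb{E}\big[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)\big], \quad \text{for } \mathbb{E}[\lambda] = 0.$$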

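• We may now articulate the dual problem:

$$\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to } \mathbb{E}[\lambda] = 0.$$

• By the interchangeability principle§, we have that the following holds:

$$\inf_{\mathbf{x} \in \mathcal{X}} \mathbb{E}\big[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)\big] = \mathbb{E}\Big[ \inf_{x \in \mathbb{R}^n} \big( \bar F(x,\omega) + \lambda(\omega)^T x \big) \Big].$$

§See Theorem 7.80: basically this provides conditions under which

$$\mathbb{E}\Big[ \inf_{x} f(x,\omega) \Big] = \inf_{\chi \in M} \mathbb{E}[F_\chi], \quad \text{where } F_\chi(\omega) = f(\chi(\omega),\omega).$$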



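• Consequently, $D(\lambda)$ can be expressed as

$$D(\lambda) = \mathbb{E}[D_\omega(\lambda(\omega))],$$

where $D_\omega : \mathbb{R}^n \to \mathbb{R}$ is defined as

$$D_\omega(\lambda) := \inf_{x \in \mathbb{R}^n} \big( \lambda^T x + \bar F_\omega(x) \big) = -\sup_{x \in \mathbb{R}^n} \big( -\lambda^T x - \bar F_\omega(x) \big) = -\bar F_\omega^*(-\lambda),$$

with $\bar F_\omega(\cdot) = \bar F(\cdot,\omega)$.

• As a result, the dual function can be computed by solving $D_\omega$ for every $\omega$ and taking the expectation over the optimal values.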

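• From general theory, we have that the dual optimal value is less than or equal to that of the primal problem. Furthermore, there is no duality gap between these problems and both the primal and dual have optimal solutions $\bar{\mathbf{x}}$ and $\bar\lambda$ if and only if $(\bar{\mathbf{x}},\bar\lambda)$ is a saddle-point of the Lagrangian function, or

$$\bar{\mathbf{x}} \in \arg\min_{\mathbf{x} \in \mathfrak{L}_x} L(\mathbf{x},\bar\lambda) \quad \text{and} \quad \bar\lambda \in \arg\max_{\lambda : \mathbb{E}[\lambda] = 0} L(\bar{\mathbf{x}},\lambda).$$

• By the interchangeability principle, we have that

$$\bar{\mathbf{x}}(\omega) \equiv \bar x \quad \text{and} \quad \bar x \in \arg\min_{x \in \mathbb{R}^n} \big\{ \bar F(x,\omega) + \bar\lambda(\omega)^T x \big\}, \quad \text{a.e. } \omega \in \Omega.$$

Since $\bar{\mathbf{x}}(\omega) = \bar x$ a.e., the second condition reduces to $\mathbb{E}[\bar\lambda] = 0$, a consequence of the earlier result.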

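• Suppose we now impose a convexity requirement (and closedness requirement) on $X$ as well as a convexity assumption on $F_\omega(\cdot)$ for a.e. $\omega \in \Omega$. Consequently, $\bar F_\omega$ is also convex for a.e. $\omega \in \Omega$. Then the minimization condition above holds if and only if

$$\bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x) \quad \text{for a.e. } \omega \in \Omega.$$

To ensure feasibility with respect to $\mathbb{E}[\bar\lambda] = 0$, taking expectations on both sides we have the following:

$$0 \in \mathbb{E}[-\partial \bar F_\omega(\bar x)].$$

Since $0 \in K$ if and only if $0 \in -K$, this is equivalent to $0 \in \mathbb{E}[\partial \bar F_\omega(\bar x)]$. Furthermore, under suitable regularity conditions, we can interchange $\mathbb{E}[\cdot]$ and $\partial[\cdot]$. It follows that

$$0 \in \partial \mathbb{E}[\bar F_\omega(\bar x)].$$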

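Theorem 3 Suppose that the function $F(x,\omega)$ is random lower semicontinuous, the set $X$ is convex and closed, and for a.e. $\omega \in \Omega$, the function $F(\cdot,\omega)$ is convex. Suppose (P) and (D) are given by the following:

$$\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)] \qquad \text{(P)}$$

$$\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to } \mathbb{E}[\lambda] = 0. \qquad \text{(D)}$$

Then there is no duality gap between (P) and (D) and both problems have an optimal solution if and only if there exists an $\bar x \in \mathbb{R}^n$ satisfying

$$0 \in \mathbb{E}[\partial \bar F_\omega(\bar x)], \quad \text{where } \bar F_\omega(\cdot) = \bar F(\cdot,\omega).$$

In such a case, $\bar x$ is a solution of (P), and a measurable selection

$$\bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x)$$

such that $\mathbb{E}[\bar\lambda] = 0$ is an optimal solution of (D).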
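The optimality condition can be made concrete with a hypothetical finite-scenario example (an assumption for illustration): $F(x,\omega) = |x - \omega|$ with $X = \mathbb{R}$, whose minimizer is a median of $\omega$. Each subdifferential is an interval, so for finitely many scenarios the expected subdifferential is the interval obtained by averaging the endpoints. The sketch below checks whether zero lies in that interval and exhibits a mean-zero selection.

```python
# Hypothetical data: three equally likely scenarios, F(x, omega) = |x - omega|, X = R.
p = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
omega = [0.0, 1.0, 3.0]

def subdiff(x, w):
    """Subdifferential of |x - w| with respect to x, returned as an interval (lo, hi)."""
    if x > w:
        return (1.0, 1.0)
    if x < w:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def expected_subdiff(x):
    """Interval E[subdiff(x, omega)] for finitely many scenarios."""
    lo = sum(pk * subdiff(x, wk)[0] for pk, wk in zip(p, omega))
    hi = sum(pk * subdiff(x, wk)[1] for pk, wk in zip(p, omega))
    return lo, hi

def satisfies_condition(x):
    """Check the condition 0 in E[subdiff(x, omega)]."""
    lo, hi = expected_subdiff(x)
    return lo <= 0.0 <= hi

# A multiplier selection at x = 1 with lam_k in -subdiff(1, omega_k) and E[lam] = 0.
lam = [-1.0, 0.0, 1.0]
selection_ok = all(subdiff(1.0, wk)[0] <= -lk <= subdiff(1.0, wk)[1]
                   for lk, wk in zip(lam, omega))
mean_lam = sum(pk * lk for pk, lk in zip(p, lam))

for x in (0.5, 1.0, 2.0):
    print(x, expected_subdiff(x), satisfies_condition(x))
```

Only the median $x = 1$ passes the test, and the multipliers `lam` give the corresponding dual solution with zero mean.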

