Lecture 5 - Pennsylvania State University · Uday V. Shanbhag
Lecture 5
Two-stage stochastic convex programs
February 4, 2015
Uday V. Shanbhag Lecture 5
Convex two-stage problems
• Consider a convex two-stage stochastic program of the form
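$$
\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to} \quad x \in X,
$$
where $f(x,\omega)$ is the optimal value of the second-stage problem
$$
\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to} \quad g_i(y,\omega) + \chi_i \le 0, \quad i = 1,\dots,m,
$$
and $\chi_i = t_i(x,\omega)$.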
• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions∗ for a.e. $\omega$, and that both $X$ and $Y$ are convex sets.
• The second-stage constraints can be absorbed into the objective function by defining a suitable indicator function. The resulting second-stage problem is given by the following:
$$
\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in \mathbb{R}^m} \; \bar q(y,\chi,\omega),
$$
where $\bar q(y,\chi,\omega) = q(y,\chi,\omega) + \mathbb{1}_Y(y)$.
• We refer to the optimal value of the second-stage problem by $\theta(\chi,\omega)$.
∗Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).
Conjugate duality: An introduction
• We now provide a brief aside regarding conjugate duality (Fenchel, 1951), based on Veinott (1989).
• Consider the primal problem that requires choosing an $x \in \mathbb{R}^n$ that minimizes $c(x,p)$, where $c$ is an extended real-valued convex function on $\mathbb{R}^{m+n}$ and $p \in \mathbb{R}^m$. For $q \in \mathbb{R}^m$, let $C(q)$ be defined as follows:
$$
C(q) \triangleq \inf_{x} c(x,q). \tag{Primal}
$$
It is well known that $C$ is convex.
• The dual program requires choosing a $(\pi,\mu) \in \mathbb{R}^{m+1}$ such that the linear function $\pi^T p - \mu$ is maximized subject to the inequalities
$$
\pi^T q - \mu \le C(q), \quad \forall q \in \mathbb{R}^m.
$$
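• More specifically, for a given $p$, we have the following problem:
$$
\max_{\pi,\mu} \; \pi^T p - \mu \quad \text{subject to} \quad \pi^T q - \mu \le C(q), \quad \forall q \in \mathbb{R}^m. \tag{1}
$$
• Geometric insight: among all affine functions $q \mapsto \pi^T q - \mu$ that minorize $C(\cdot)$, choose one whose value at $p$, namely $\pi^T p - \mu$, is maximal.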
• In fact, C is closed at p if and only if the primal infimum is equal to the
dual supremum (the main duality relationship).
• Now consider Rockafellar’s conjugate duality relationship:
• For each fixed $\pi$, one can maximize $\pi^T p - \mu$ subject to $\pi^T q - \mu \le C(q)$ for all $q$ by putting $\mu = C^*(\pi)$, where
$$
C^*(\pi) = \sup_{q} \left[ q^T \pi - C(q) \right].
$$
• The dual program then reduces to the following:
$$
C^{**}(p) = \sup_{\pi \in \mathbb{R}^m} \left[ p^T \pi - C^*(\pi) \right].
$$
In effect, the supremum is given by the conjugate of $C^*$. Furthermore, $C(p) = C^{**}(p)$ if and only if $C$ is closed at $p$.
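As a rough numerical illustration of the relation $C = C^{**}$ for a closed convex function, the following sketch evaluates the conjugate and the biconjugate on a finite grid. The function $C(q) = q^2/2$, the grid, and the helper names `conjugate` and `biconjugate` are assumptions made for this example only. On a finite grid the suprema are in general only approximations, although for this particular choice they are attained at grid points.

```python
from fractions import Fraction

# Hypothetical example function (not from the lecture): C(q) = q^2 / 2, closed and convex.
def C(q):
    return q * q / 2

# Finite grid standing in for R; exact rational arithmetic avoids round-off.
GRID = [Fraction(k, 10) for k in range(-50, 51)]  # -5, -4.9, ..., 5

def conjugate(f, pi, grid=GRID):
    """Grid approximation of f*(pi) = sup_q [q * pi - f(q)]."""
    return max(q * pi - f(q) for q in grid)

def biconjugate(f, p, grid=GRID):
    """Grid approximation of f**(p) = sup_pi [p * pi - f*(pi)]."""
    return max(p * pi - conjugate(f, pi, grid) for pi in grid)

if __name__ == "__main__":
    for p in (Fraction(-2), Fraction(0), Fraction(3, 2)):
        # For this closed convex C the two printed values agree at grid points.
        print(p, C(p), biconjugate(C, p))
```

The nested maximization is quadratic in the grid size, so this is only meant for small illustrative grids.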
• Let us now return to our original problem given by (Primal).
• Let $c^*$ denote the conjugate of $c$ (Rockafellar, 1968, 1970). Then $C^*(\pi) = c^*(0,\pi)$, based on noting that
$$
C^*(\pi) = \sup_{q \in \mathbb{R}^m} \left[ \pi^T q - C(q) \right] = \sup_{x \in \mathbb{R}^n,\, q \in \mathbb{R}^m} \left[ 0^T x + \pi^T q - c(x,q) \right] = c^*(0,\pi).
$$
• From $C(p) = C^{**}(p)$ with $p = 0$, we have that
$$
\inf_{x} c(x,0) = C(0) = C^{**}(0) = \sup_{\pi} \left[ 0^T \pi - C^*(\pi) \right] = \sup_{\pi} \, -c^*(0,\pi).
$$
[Figure 1: A schematic. Scanned excerpt from Veinott (p. 664) stating the primal program, the dual program, and the geometric interpretation: from among all affine functions $\langle \pi, \cdot \rangle - \mu$ minorizing $C(\cdot)$, choose one whose value at $p$ is maximum.]
Application of conjugate duality to linear programming
Consider the standard form LP:
$$
\min_{x} \; c^T x \quad \text{subject to} \quad Ax = b, \; x \ge 0. \tag{LP}
$$
Consider the parametrized form of this LP:
$$
\min_{x} \; c^T x \quad \text{subject to} \quad Ax = b + q, \; x \ge 0. \tag{LP}
$$
Note that when q ≡ 0, we recover the original form.
We define $c(x,q)$ of our primal problem as follows:
$$
c(x,q) \triangleq \begin{cases} c^T x, & \text{if } Ax = b + q, \; x \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}
$$
Then the original LP corresponds to the primal problem
$$
\inf_{x} c(x,0).
$$
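By the conjugate duality framework, the dual problem is expressed through the conjugate $c^*(0,\pi)$:
$$
\begin{aligned}
c^*(0,\pi) &= \sup_{x,\,q} \left[ \pi^T q - c(x,q) \right] \\
&= \sup_{x \ge 0,\; q \,:\, Ax = b + q} \left[ \pi^T q - c^T x \right] \\
&= \sup_{x \ge 0} \left[ \pi^T (Ax - b) - c^T x \right] \\
&= \sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b.
\end{aligned}
$$
Next we note that
$$
\sup_{x \ge 0} \left[ (A^T \pi - c)^T x \right] - \pi^T b = \begin{cases} -\pi^T b, & A^T \pi \le c, \\ +\infty, & \text{otherwise.} \end{cases}
$$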
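Minimizing this expression over $\pi$ gives the dual problem
$$
\min_{\pi} \; -b^T \pi \quad \text{subject to} \quad A^T \pi \le c. \tag{2}
$$
Up to the sign of the optimal value, this problem is equivalent to the following:
$$
\max_{\pi} \; b^T \pi \quad \text{subject to} \quad A^T \pi \le c. \tag{Dual}
$$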
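The LP primal-dual pair obtained above can be checked on a tiny instance. The sketch below is only an illustration under stated assumptions: the data `A`, `b`, `c` are hypothetical, and both problems are solved by scanning a finite grid rather than by an LP solver. This is adequate here only because the optimal vertex and the optimal multiplier happen to lie on the grids.

```python
from fractions import Fraction

# Hypothetical data (not from the lecture): one equality constraint, two variables.
A = [Fraction(1), Fraction(1)]   # the single row of A
b = Fraction(2)
c = [Fraction(3), Fraction(5)]

def primal_value(q=Fraction(0), steps=40):
    """inf_x c(x, q): min c^T x s.t. x1 + x2 = b + q, x >= 0, scanning x1 on a grid."""
    rhs = b + q
    if rhs < 0:
        return None  # infeasible, i.e. c(x, q) = +infinity for every x
    values = []
    for k in range(steps + 1):
        x1 = rhs * Fraction(k, steps)
        x2 = rhs - x1
        values.append(c[0] * x1 + c[1] * x2)
    return min(values)

def neg_conjugate(pi):
    """-c*(0, pi): equals b * pi if A^T pi <= c, and -infinity (None here) otherwise."""
    if all(a_j * pi <= c_j for a_j, c_j in zip(A, c)):
        return b * pi
    return None

def dual_value(pi_grid):
    """sup_pi -c*(0, pi), restricted to a finite grid of candidate multipliers."""
    finite = [v for v in map(neg_conjugate, pi_grid) if v is not None]
    return max(finite)

PI_GRID = [Fraction(k, 4) for k in range(-40, 41)]  # -10, -9.75, ..., 10

if __name__ == "__main__":
    print(primal_value(), dual_value(PI_GRID))
```

For this instance the primal optimum is attained at the vertex $x = (2, 0)$ and the dual optimum at $\pi = 3$, so the two values coincide.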
Return to convex two-stage problems
• Consider a convex two-stage stochastic program of the form
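$$
\text{(SP)} \qquad \min_{x} \; f(x) \triangleq \mathbb{E}[f(x,\omega)] \quad \text{subject to} \quad x \in X,
$$
where $f(x,\omega)$ is the optimal value of the second-stage problem
$$
\text{(SecStage}(\omega)\text{)} \qquad \min_{y \in Y} \; q(y,\omega) \quad \text{subject to} \quad g_i(y,\omega) + \chi_i(\omega) \le 0, \quad i = 1,\dots,m,
$$
where $\chi_i(\omega) = t_i(x,\omega)$.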
• We assume throughout this section that $q(y,\omega)$, $g_i(y,\omega)$, and $t_i(x,\omega)$ are real-valued convex functions† for a.e. $\omega$, and that $X$ and $Y$ are convex sets.
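• Suppose $\psi(y,\chi,\omega)$ is defined as
$$
\psi(y,\chi,\omega) \triangleq \bar q(y,\omega) + \mathbb{1}_{\mathbb{R}^n_-}\big( G(y,\omega) + \chi(x,\omega) \big),
$$
where
$$
\bar q(y,\omega) \triangleq q(y,\omega) + \mathbb{1}_Y(y), \qquad G(y,\omega) \triangleq \big( g_1(y,\omega); \dots; g_m(y,\omega) \big),
$$
and $\mathbb{1}_{\mathbb{R}^n_-}(\cdot)$ is the indicator function of the nonpositive orthant:
$$
\mathbb{1}_{\mathbb{R}^n_-}(z) = \begin{cases} 0, & z \le 0, \\ +\infty, & \text{otherwise.} \end{cases}
$$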
†Recall that real-valued convex functions are continuous (in fact they are locally Lipschitz continuous).
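• In the remainder of this discussion, we suppress $\omega$ and refer to $\psi(y,\chi,\omega)$ by $\psi(y,\chi)$.
• We may now compute the conjugate function of $\psi(y,\chi)$:
$$
\begin{aligned}
\psi^*(y^*,\chi^*) &= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T \chi + (y^*)^T y - \bar q(y) - \mathbb{1}_{\mathbb{R}^n_-}(G(y) + \chi) \Big\} \\
&= \sup_{(y,\chi) \in \mathbb{R}^m \times \mathbb{R}^n} \Big\{ (\chi^*)^T (G(y) + \chi) - (\chi^*)^T G(y) + (y^*)^T y - \bar q(y) - \mathbb{1}_{\mathbb{R}^n_-}(G(y) + \chi) \Big\} \\
&= \sup_{y \in \mathbb{R}^m} \Big\{ (y^*)^T y - \bar q(y) - (\chi^*)^T G(y) + \sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y) + \chi) \big] \Big\}.
\end{aligned}
$$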
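• Setting $z = G(y) + \chi$, we have that
$$
\begin{aligned}
\sup_{\chi \in \mathbb{R}^n} \big[ (\chi^*)^T (G(y) + \chi) - \mathbb{1}_{\mathbb{R}^n_-}(G(y) + \chi) \big]
&= \sup_{z \in \mathbb{R}^n} \big[ (\chi^*)^T z - \mathbb{1}_{\mathbb{R}^n_-}(z) \big] = \sup_{z \in \mathbb{R}^n_-} (\chi^*)^T z \\
&= \begin{cases} 0, & \chi^* \ge 0, \\ +\infty, & \text{otherwise} \end{cases} \\
&= \mathbb{1}_{\mathbb{R}^n_+}(\chi^*),
\end{aligned}
$$
where
$$
\mathbb{1}_{\mathbb{R}^n_+}(u) = \begin{cases} 0, & u \ge 0, \\ +\infty, & \text{otherwise.} \end{cases}
$$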
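• Consequently, we obtain
$$
\psi^*(y^*,\chi^*) = \sup_{y \in \mathbb{R}^m} \big\{ (y^*)^T y - L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*),
$$
where $L(y,\chi^*) \triangleq \bar q(y) + \sum_{i=1}^{m} \chi^*_i \, g_i(y)$.
• Recall that
$$
\theta^*(\chi^*) = \psi^*(0,\chi^*) = \sup_{y \in \mathbb{R}^m} \big\{ -L(y,\chi^*) \big\} + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*) = -\inf_{y \in \mathbb{R}^m} L(y,\chi^*) + \mathbb{1}_{\mathbb{R}^m_+}(\chi^*).
$$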
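As a result, the dual of the second-stage problem is given by
$$
\begin{aligned}
\theta^{**}(\chi) &= \max_{\lambda \in \mathbb{R}^m} \big\{ \lambda^T \chi - \theta^*(\lambda) \big\} \\
&= \max_{\lambda \in \mathbb{R}^m} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) - \mathbb{1}_{\mathbb{R}^m_+}(\lambda) \Big\} \\
&= \max_{\lambda \in \mathbb{R}^m_+} \Big\{ \lambda^T \chi + \inf_{y \in \mathbb{R}^m} L(y,\lambda) \Big\}.
\end{aligned}
$$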
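To make the dual of the second-stage problem concrete, here is a minimal sketch on a one-dimensional instance. The objective $q(y) = y^2$, the constraint function $g(y) = 1 - y$, the choice $Y = \mathbb{R}$, and the grids are all assumptions chosen for illustration. Both the primal value $\theta(\chi)$ and the dual value $\max_{\lambda \ge 0} \{ \lambda \chi + \inf_y L(y,\lambda) \}$ are computed by brute force over grids that happen to contain the exact optimizers.

```python
from fractions import Fraction

# Hypothetical one-dimensional second stage (not from the lecture):
#   minimize q(y) = y^2  subject to  g(y) + chi <= 0,  with g(y) = 1 - y and Y = R.
def q(y):
    return y * y

def g(y):
    return 1 - y

Y_GRID = [Fraction(k, 4) for k in range(-40, 41)]   # y in [-10, 10]
LAM_GRID = [Fraction(k, 2) for k in range(0, 21)]   # lambda in [0, 10], so lambda >= 0

def theta(chi):
    """Primal optimal value: min q(y) over grid points with g(y) + chi <= 0."""
    feasible = [q(y) for y in Y_GRID if g(y) + chi <= 0]
    return min(feasible) if feasible else None  # None stands for +infinity

def lagrangian(y, lam):
    """L(y, lambda) = q(y) + lambda * g(y)."""
    return q(y) + lam * g(y)

def theta_bidual(chi):
    """Dual value: max over lambda >= 0 of lambda * chi + inf_y L(y, lambda)."""
    return max(lam * chi + min(lagrangian(y, lam) for y in Y_GRID)
               for lam in LAM_GRID)

if __name__ == "__main__":
    for chi in (Fraction(-2), Fraction(1)):
        print(chi, theta(chi), theta_bidual(chi))
```

For $\chi = 1$ the constraint reads $y \ge 2$, so $\theta(1) = 4$, and the dual maximum is attained at $\lambda = 4$ with the same value. For $\chi = -2$ the constraint is inactive and both values are $0$.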
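• By the Fenchel-Moreau theorem, we have that either $\theta^{**}(\cdot) \equiv -\infty$ or
$$
\theta^{**}(y) = \operatorname{lsc}(\operatorname{conv} \theta)(y), \quad \forall y \in \mathbb{R}^m.
$$
• Recall that a function $f$ is lower semicontinuous (lsc) at $x_0$ if
$$
\liminf_{x \to x_0} f(x) \ge f(x_0).
$$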
A function $f$ is said to be lsc if it is lsc at every $x_0 \in \mathbb{R}^n$, and $\operatorname{lsc} f$ denotes the largest lower semicontinuous function that is less than or equal to $f$.
• Consequently, $\theta^{**}(y) \le \theta(y)$ for any $y \in \mathbb{R}^m$, and there is said to be no duality gap if $\theta^{**}(y) = \theta(y)$.
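The inequality $\theta^{**} \le \theta$, and the fact that it can be strict when $\theta$ is not convex, can be seen on a small example. The sketch below is an illustration under assumptions: the function `theta` is a hypothetical nonconvex function with two minima, and conjugates are approximated by maxima over a finite grid.

```python
from fractions import Fraction

# Hypothetical nonconvex function (not from the lecture), with minima at y = -1 and y = 1.
def theta(y):
    return min((y - 1) ** 2, (y + 1) ** 2)

GRID = [Fraction(k, 4) for k in range(-12, 13)]  # -3, -2.75, ..., 3

def conjugate(f, lam):
    """Grid approximation of f*(lam) = sup_y [lam * y - f(y)]."""
    return max(lam * y - f(y) for y in GRID)

def biconjugate(f, y):
    """Grid approximation of f**(y) = sup_lam [lam * y - f*(lam)]."""
    return max(lam * y - conjugate(f, lam) for lam in GRID)

if __name__ == "__main__":
    # At y = 0 the biconjugate lies strictly below theta (a gap);
    # at the minimizer y = 1 the two values agree.
    print(theta(Fraction(0)), biconjugate(theta, Fraction(0)))
    print(theta(Fraction(1)), biconjugate(theta, Fraction(1)))
```

Here the biconjugate at $y = 0$ is $0$ while $\theta(0) = 1$, which is the gap that convexity of $\theta$ rules out.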
• Consider a setting where ψ(x, y) is convex over (x, y) ∈ Rn × Rn.
It is relatively straightforward to ascertain that θ(y) is convex and
conv θ(•) = θ(•).
• Furthermore, it is said that (**) is subconsistent if, for a given value of $y$, $\operatorname{lsc} \theta(y) < +\infty$.
• Note that if (**) is feasible, i.e., $\operatorname{dom} \psi(\cdot, y) \ne \emptyset$, then $\theta(y) < +\infty$ and (**) is subconsistent.
Proposition 1 Suppose that ψ(•, •) is convex. Then the following
holds:
1. The optimal value function θ(•) is convex;
2. If (**) is subconsistent, then θ∗∗(y) = θ(y) if and only if θ(•) is
lsc at y;
3. If θ∗∗(y) is finite, then the set of optimal solutions of the dual
problem (***) coincides with ∂θ∗∗(y);
4. The set of optimal solutions of (***) is nonempty and bounded if
and only if θ(y) is finite and θ(•) is continuous at y.
Remark: Some quick observations:
(2.) follows from the Fenchel-Moreau theorem;
(3.) follows from
$$
\partial f^{**}(x) = \arg\max_{z \in \mathbb{R}^n} \big\{ z^T x - f^*(z) \big\}.
$$
If $\theta(\cdot)$ is continuous at $y$, then it is lsc at $y$ and $\theta^{**}(y) = \theta(y)$. Moreover, it follows that $\partial \theta^{**}(y) = \partial \theta(y)$, which is nonempty and bounded if $\theta(y)$ is finite. Then, by (3.), whose hypothesis that $\theta^{**}(y)$ is finite holds, the set of optimal solutions of (***) is bounded and nonempty.
Proposition 2 Let $\chi$ and $\omega \in \Omega$ be specified. Suppose that the second-stage problem is convex. Then the following holds:
• The functions $\theta(\cdot,\omega)$ and $f(\cdot,\omega)$ are convex.
• There is no duality gap between the primal and the dual problems and the dual problem has a nonempty set of optimal solutions if and only if the optimal value function $\theta(\cdot,\omega)$ is subdifferentiable at $\chi$.
• Suppose that the optimal value of (SLP) is finite. Then there is no duality gap between the primal and dual problems and the dual problem has a nonempty and bounded solution set if and only if
$$
\chi(\cdot,\omega) \in \operatorname{int}\big( \operatorname{dom} \theta(\cdot,\omega) \big).
$$
Nonanticipativity
Consider a two-stage stochastic program in which $\Omega = \{\omega_1,\dots,\omega_K\}$ with probabilities $p_1,\dots,p_K$:
$$
\min_{x} \; \sum_{k=1}^{K} p_k F(x,\omega_k) \quad \text{subject to} \quad x \in X.
$$
Consider a relaxation of the first-stage problem in which $x$ is replaced by $K$ vectors $x_1,\dots,x_K$, one for each scenario. The resulting problem is given by
$$
\min_{x_1,\dots,x_K} \; \sum_{k=1}^{K} p_k F(x_k,\omega_k) \quad \text{subject to} \quad x_k \in X, \quad k = 1,\dots,K.
$$
This is a set of $K$ separable problems, with the $k$th problem given by
$$
\min_{x_k} \; F(x_k,\omega_k) \quad \text{subject to} \quad x_k \in X.
$$
More specifically, in the context of two-stage linear programming, this problem is given by
$$
\begin{aligned}
\min_{x_k \ge 0,\; y_k \ge 0} \quad & c^T x_k + q_k^T y_k \\
\text{subject to} \quad & A x_k = b, \\
& T_k x_k + W_k y_k = h_k.
\end{aligned} \tag{3}
$$
However, this formulation is not suitable for modeling a two-stage process, because the first-stage variable $x_k$ is allowed to depend on the realization $\omega_k$.
This can be resolved by adding an additional constraint:
$$
(x_1,\dots,x_K) \in \mathcal{L},
$$
where
$$
\mathcal{L} \triangleq \big\{ \mathbf{x} = (x_1,\dots,x_K) : x_1 = x_2 = \dots = x_K \big\}.
$$
This is a linear subspace of the $nK$-dimensional space $\mathcal{X} = \mathbb{R}^n \times \dots \times \mathbb{R}^n$. Decisions lying in this set do not depend on the realization of the random data.
This constraint is referred to as the nonanticipativity constraint. Together with this constraint, the two-stage problem can be posed as follows:
$$
\min_{x_1,\dots,x_K} \; \sum_{k=1}^{K} p_k F(x_k,\omega_k) \quad \text{subject to} \quad x_1 = \dots = x_K, \quad x_k \in X, \quad k = 1,\dots,K.
$$
Another approach for specifying the nonanticipativity constraints is as follows:
$$
x_1 = x_2, \quad x_2 = x_3, \quad \dots, \quad x_{K-1} = x_K.
$$
Alternatively, suppose the nonanticipativity constraints are represented as
$$
x_k = \sum_{i=1}^{K} p_i x_i, \quad k = 1,\dots,K.
$$
Such a representation has particular relevance when contending with general, rather than finite, distributions.
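Consider the space $\mathcal{X} = \mathbb{R}^n \times \dots \times \mathbb{R}^n$ equipped with the scalar product
$$
\langle \mathbf{x}, \mathbf{y} \rangle := \sum_{i=1}^{K} p_i \, x_i^T y_i.
$$
Furthermore, suppose the linear operator $P$ is defined as
$$
P\mathbf{x} := \begin{pmatrix} \sum_{i=1}^{K} p_i x_i \\ \vdots \\ \sum_{i=1}^{K} p_i x_i \end{pmatrix}.
$$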
Consequently, the constraints
$$
x_k = \sum_{i=1}^{K} p_i x_i, \quad k = 1,\dots,K,
$$
can be compactly represented as $\mathbf{x} = P\mathbf{x}$. In fact, $P$ is an orthogonal projection operator on $\mathcal{X}$, in that
$$
P(P\mathbf{x}) = P\mathbf{x}.
$$
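Furthermore, we have that
$$
\langle P\mathbf{x}, \mathbf{y} \rangle = \Big( \sum_{i=1}^{K} p_i x_i \Big)^T \Big( \sum_{i=1}^{K} p_i y_i \Big) = \langle \mathbf{x}, P\mathbf{y} \rangle.
$$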
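The two properties of $P$ used here, idempotence and self-adjointness with respect to the probability-weighted scalar product, can be checked numerically. The following is a small sketch in which the probabilities and the vectors `x`, `y` are made-up data for $K = 3$ scenarios and $n = 2$; the helper names `inner` and `project` are likewise assumptions of this example.

```python
from fractions import Fraction

# Hypothetical data (not from the lecture): K = 3 scenarios, n = 2.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def inner(x, y):
    """Weighted scalar product <x, y> = sum_i p_i * x_i^T y_i."""
    return sum(p_i * sum(a * b for a, b in zip(x_i, y_i))
               for p_i, x_i, y_i in zip(p, x, y))

def project(x):
    """P x: every block is replaced by the weighted average sum_i p_i x_i."""
    n = len(x[0])
    mean = [sum(p_i * x_i[j] for p_i, x_i in zip(p, x)) for j in range(n)]
    return [list(mean) for _ in x]

x = [[Fraction(1), Fraction(0)], [Fraction(3), Fraction(2)], [Fraction(-1), Fraction(4)]]
y = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(5)], [Fraction(4), Fraction(-3)]]

if __name__ == "__main__":
    print(project(project(x)) == project(x))                # idempotent: P(Px) = Px
    print(inner(project(x), y) == inner(x, project(y)))     # self-adjoint: <Px, y> = <x, Py>
```

Exact rational arithmetic is used so that the equalities can be tested without a tolerance.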
Dualization of nonanticipativity constraints
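Suppose we assign Lagrange multipliers $\lambda_1,\dots,\lambda_K$ to the nonanticipativity constraints:
$$
x_k = \sum_{i=1}^{K} p_i x_i, \quad k = 1,\dots,K.
$$
We may then define the Lagrangian function $L(\mathbf{x},\lambda)$ as follows:
$$
L(\mathbf{x},\lambda) := \sum_{k=1}^{K} p_k F(x_k,\omega_k) + \sum_{k=1}^{K} p_k \lambda_k^T \Big( x_k - \sum_{i=1}^{K} p_i x_i \Big).
$$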
Since $P$ is an orthogonal projection, it follows that $I - P$ is also an orthogonal projection. This is a consequence of noting that
$$
\begin{aligned}
(I - P)(I - P)\mathbf{x} &= (I - P)\mathbf{x} - P\mathbf{x} + P(P\mathbf{x}) \\
&= (I - P)\mathbf{x} - P\mathbf{x} + P\mathbf{x} \\
&= (I - P)\mathbf{x}.
\end{aligned}
$$
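It follows that
$$
\sum_{k=1}^{K} p_k \lambda_k^T \Big( x_k - \sum_{i=1}^{K} p_i x_i \Big) = \langle \lambda, (I - P)\mathbf{x} \rangle = \langle (I - P)\lambda, \mathbf{x} \rangle.
$$
As a consequence, the Lagrangian function can be rewritten as follows:
$$
L(\mathbf{x},\lambda) := \sum_{k=1}^{K} p_k F(x_k,\omega_k) + \sum_{k=1}^{K} p_k \Big( \lambda_k - \sum_{i=1}^{K} p_i \lambda_i \Big)^T x_k.
$$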
Duality
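Consider the optimization problem
$$
\min_{x_1,\dots,x_K,\,z} \; \sum_{k=1}^{K} p_k F(x_k,\omega_k) \quad \text{subject to} \quad x_k = z, \quad x_k \in X, \quad k = 1,\dots,K.
$$
By using an indicator function to absorb the constraints $x_k \in X$, we can write this problem as follows:
$$
\min_{x_1,\dots,x_K,\,z} \; \sum_{k=1}^{K} p_k \bar F(x_k,\omega_k) \quad \text{subject to} \quad x_k = z, \quad k = 1,\dots,K,
$$
where $\bar F(x,\omega) = F(x,\omega) + \mathbb{1}_X(x)$.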
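Then the Lagrangian function is given by
$$
L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) := \sum_{k=1}^{K} p_k F_k(x_k,\omega_k) + \sum_{k=1}^{K} p_k \lambda_k^T (x_k - z).
$$
The resulting min-max problem is given by
$$
\begin{aligned}
& \min_{x_1,\dots,x_K,\,z} \; \sup_{\lambda_1,\dots,\lambda_K} L(x_1,\dots,x_K,z,\lambda_1,\dots,\lambda_K) \\
&\quad = \min_{x_1,\dots,x_K,\,z} \; \sup_{\lambda_1,\dots,\lambda_K} \Big\{ \sum_{k=1}^{K} p_k F_k(x_k,\omega_k) + \sum_{k=1}^{K} p_k \lambda_k^T x_k - \Big( \sum_{k=1}^{K} p_k \lambda_k \Big)^T z \Big\} \\
&\quad = \sup_{\lambda_1,\dots,\lambda_K} \; \min_{x_1,\dots,x_K,\,z} \Big\{ \sum_{k=1}^{K} p_k F_k(x_k,\omega_k) + \sum_{k=1}^{K} p_k \lambda_k^T x_k - \Big( \sum_{k=1}^{K} p_k \lambda_k \Big)^T z \Big\},
\end{aligned}
$$
where the interchange of min and sup in the last step holds when there is no duality gap (in general, only "$\ge$" holds).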
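However, the infimum of the Lagrangian is $-\infty$ unless $\sum_{k=1}^{K} p_k \lambda_k = 0$. It follows that this problem can then be stated as follows:
$$
\max_{\lambda_1,\dots,\lambda_K} \; D(\lambda) \quad \text{subject to} \quad \sum_{k=1}^{K} p_k \lambda_k = 0, \tag{4}
$$
where
$$
D(\lambda) \triangleq \min_{x_1,\dots,x_K,\,z} \Big\{ \sum_{k=1}^{K} p_k F_k(x_k,\omega_k) + \sum_{k=1}^{K} p_k \lambda_k^T x_k \Big\}.
$$
From the separable structure of the problem, we have that
$$
L(\mathbf{x},\lambda) = \sum_{k=1}^{K} p_k L_k(x_k,\lambda_k), \quad \text{where} \quad L_k(x_k,\lambda_k) = F(x_k,\omega_k) + \lambda_k^T x_k.
$$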
Furthermore, we have that
$$
D(\lambda) = \sum_{k=1}^{K} p_k D_k(\lambda_k), \quad \text{where} \quad D_k(\lambda_k) = \inf_{x_k} L_k(x_k,\lambda_k).
$$
• Suppose the problem is linear and the primal and dual problems are feasible. Then there is no duality gap.
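The scenario-wise decomposition $D(\lambda) = \sum_k p_k D_k(\lambda_k)$ can be illustrated on a toy problem. The sketch below is not the linear case discussed above: it assumes hypothetical quadratic scenario costs $F(x,\omega_k) = (x - a_k)^2/2$ with $X = \mathbb{R}$, chosen because each $D_k$ then has a closed form. The names `D_k`, `dual_value`, `x_star`, and `lam_star` are introduced only for this example.

```python
from fractions import Fraction

# Hypothetical scenario data (not from the lecture): F(x, omega_k) = (x - a_k)^2 / 2, X = R.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
a = [Fraction(0), Fraction(4), Fraction(8)]

def F(x, k):
    return (x - a[k]) ** 2 / 2

def primal_value(x):
    """Objective of the original problem at a single (nonanticipative) decision x."""
    return sum(p_k * F(x, k) for k, p_k in enumerate(p))

def D_k(lam_k, k):
    """Scenario dual function inf_x [F(x, omega_k) + lam_k * x].

    For this quadratic the infimum is attained at x = a_k - lam_k (closed form).
    """
    x = a[k] - lam_k
    return F(x, k) + lam_k * x

def dual_value(lam):
    """D(lambda) = sum_k p_k D_k(lambda_k), for multipliers with sum_k p_k lambda_k = 0."""
    if sum(p_k * l_k for p_k, l_k in zip(p, lam)) != 0:
        raise ValueError("multipliers must satisfy sum_k p_k * lambda_k = 0")
    return sum(p_k * D_k(l_k, k) for k, (p_k, l_k) in enumerate(zip(p, lam)))

x_star = sum(p_k * a_k for p_k, a_k in zip(p, a))   # primal minimizer: the mean of a
lam_star = [a_k - x_star for a_k in a]              # candidate multipliers with mean zero

if __name__ == "__main__":
    print(primal_value(x_star), dual_value(lam_star))
```

With these multipliers every scenario subproblem is minimized at the same point $x^\star$, so nonanticipativity is recovered and the primal and dual values coincide. Any other feasible multiplier, such as $\lambda = 0$, gives a lower bound.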
Duality for general distributions
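• Consider the optimization problem given by the following:
$$
\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)],
$$
where
$$
\bar F(x,\omega) = F(x,\omega) + \mathbb{1}_X(x).
$$
• Let $\mathcal{X}$ be a linear space of measurable mappings from $\Omega$ to $\mathbb{R}^n$, defined as $\mathcal{X} := L_p(\Omega,\mathcal{F},\mathbb{P};\mathbb{R}^n)$ for $p \in [1,+\infty]$. Consequently, for every $\mathbf{x} \in \mathcal{X}$, the expectation $\mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)]$ is well-defined.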
• Consequently, we may articulate the expected value problem as follows:
$$
\min_{\mathbf{x} \in \mathcal{L}_x} \; \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)],
$$
where
$$
\mathcal{L}_x \triangleq \big\{ \mathbf{x} \in \mathcal{X} : \mathbf{x}(\omega) \equiv x \text{ for some } x \in \mathbb{R}^n \big\}
$$
and $\mathbf{x}(\omega) \equiv x$ means that $\mathbf{x}(\omega) = x$ for a.e. $\omega \in \Omega$.
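• Consider the dual space of $\mathcal{X}$, denoted by $\mathcal{X}^*$ and defined as $\mathcal{X}^* := L_q(\Omega,\mathcal{F},\mathbb{P};\mathbb{R}^n)$, where $1/p + 1/q = 1$. Note that by convention, if $p = \infty$ then $q = 1$, and if $q = \infty$ then $p = 1$.
• We may now define the scalar or bilinear product given by
$$
\langle \lambda, \mathbf{x} \rangle = \mathbb{E}[\lambda^T \mathbf{x}] = \int_{\Omega} \lambda(\omega)^T \mathbf{x}(\omega) \, d\mathbb{P}(\omega), \quad \lambda \in \mathcal{X}^*, \; \mathbf{x} \in \mathcal{X}.
$$
• Further, consider the projection operator $P : \mathcal{X} \to \mathcal{L}_x$ defined as
$$
[P\mathbf{x}](\omega) \equiv \mathbb{E}[\mathbf{x}].
$$
By the definition of $\mathcal{L}_x$, we have $\mathcal{L}_x = \{ \mathbf{x} : \mathbf{x} \in \mathcal{X}, \; P\mathbf{x} = \mathbf{x} \}$.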
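• Recall that the inner product is defined as
$$
\langle \lambda, P\mathbf{x} \rangle = \mathbb{E}[\lambda^T P\mathbf{x}].
$$
But $P\mathbf{x} = \mathbb{E}[\mathbf{x}]$. Consequently, we have that
$$
\langle \lambda, P\mathbf{x} \rangle = \mathbb{E}[\lambda^T P\mathbf{x}] = \mathbb{E}[\lambda]^T \mathbb{E}[\mathbf{x}] = \langle P^*\lambda, \mathbf{x} \rangle,
$$
where $P^*$ is a projection operator from $\mathcal{X}^*$ to the subspace formed by a.e. constant maps.‡
• It follows that
$$
L(\mathbf{x},\lambda) := \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega)] + \mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])].
$$
‡Note that if $p = 2$, then $\mathcal{X}^* = \mathcal{X}$ and $P^* = P$.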
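• It can be observed that the second term can be rewritten as follows:
$$
\mathbb{E}[\lambda^T (\mathbf{x} - \mathbb{E}[\mathbf{x}])] = \langle \lambda, \mathbf{x} - P\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle \lambda, P\mathbf{x} \rangle = \langle \lambda, \mathbf{x} \rangle - \langle P^*\lambda, \mathbf{x} \rangle = \langle \lambda - P^*\lambda, \mathbf{x} \rangle.
$$
• Important observation:
$$
\lambda + u - P^*(\lambda + u) = \lambda + u - (P^*\lambda + u) = \lambda - P^*\lambda,
$$
where $u$ is a constant map. Note that $P^*u = u$ since $P^*$ is an operator that projects onto the space of constant maps (a.e.).
• Consequently, $\lambda - P^*\lambda$ does not change when a constant is added to $\lambda(\cdot)$. It follows that we can subtract the constant $P^*\lambda$ from $\lambda$.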
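• It follows that we can set $P^*\lambda = 0$, or equivalently $\mathbb{E}[\lambda] = 0$. As a result, the Lagrangian function is defined as follows:
$$
L(\mathbf{x},\lambda) := \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)], \quad \text{for } \mathbb{E}[\lambda] = 0.
$$
• We may now articulate the dual problem:
$$
\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to} \quad \mathbb{E}[\lambda] = 0.
$$
• By the interchangeability principle§, we have that the following holds:
$$
\inf_{\mathbf{x} \in \mathcal{X}} \mathbb{E}[\bar F(\mathbf{x}(\omega),\omega) + \lambda(\omega)^T \mathbf{x}(\omega)] = \mathbb{E}\Big[ \inf_{x \in \mathbb{R}^n} \big( \bar F(x,\omega) + \lambda(\omega)^T x \big) \Big].
$$
§See Theorem 7.80: basically, this provides conditions under which
$$
\mathbb{E}\Big[ \inf_{x} f(x,\omega) \Big] = \inf_{\chi \in \mathfrak{M}} \mathbb{E}[F_\chi], \quad \text{where} \quad F_\chi(\omega) = f(\chi(\omega),\omega).
$$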
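• Consequently, $D(\lambda)$ can be expressed as
$$
D(\lambda) = \mathbb{E}[D_\omega(\lambda(\omega))],
$$
where $D_\omega : \mathbb{R}^n \to \mathbb{R}$ is defined as
$$
D_\omega(\lambda) := \inf_{x \in \mathbb{R}^n} \big( \lambda^T x + \bar F_\omega(x) \big) = -\sup_{x \in \mathbb{R}^n} \big( -\lambda^T x - \bar F_\omega(x) \big) \triangleq -\bar F^*_\omega(-\lambda).
$$
• As a result, the dual function can be computed by solving $D_\omega$ for every $\omega$ and taking the expectation over the optimal values.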
• From general theory, we have that the dual optimal value is less than or equal to that of the primal problem. Furthermore, there is no duality gap between these problems and both the primal and dual have optimal solutions $\bar{\mathbf{x}}$ and $\bar\lambda$ if and only if $(\bar{\mathbf{x}}, \bar\lambda)$ is a saddle point of the Lagrangian function, i.e.,
$$
\bar{\mathbf{x}} \in \arg\min_{\mathbf{x} \in \mathcal{L}_x} L(\mathbf{x}, \bar\lambda) \quad \text{and} \quad \bar\lambda \in \arg\max_{\lambda \,:\, \mathbb{E}[\lambda] = 0} L(\bar{\mathbf{x}}, \lambda).
$$
• By the interchangeability principle, we have that
$$
\bar{\mathbf{x}}(\omega) \equiv \bar x \quad \text{and} \quad \bar x \in \arg\min_{x \in \mathbb{R}^n} \big\{ \bar F(x,\omega) + \bar\lambda(\omega)^T x \big\}, \quad \text{a.e. } \omega \in \Omega.
$$
Since $\bar{\mathbf{x}}(\omega) = \bar x$ a.e., it follows that $\mathbb{E}[\bar\lambda] = 0$, a consequence of the earlier result.
• Suppose we now impose a convexity (and closedness) requirement on $X$ as well as a convexity assumption on $F_\omega(\cdot)$ for a.e. $\omega \in \Omega$. Consequently, $\bar F_\omega$ is also convex for a.e. $\omega \in \Omega$. Then, by the interchangeability principle, we have that
$$
\bar\lambda \in \arg\max L(\bar{\mathbf{x}}, \lambda) \quad \text{if and only if} \quad \bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x).
$$
To ensure feasibility with respect to $\mathbb{E}[\lambda] = 0$, taking expectations on both sides we have the following:
$$
0 \in \mathbb{E}[-\partial \bar F_\omega(\bar x)].
$$
However, under suitable regularity conditions, we can interchange $\mathbb{E}[\cdot]$ and $\partial[\cdot]$. Furthermore, $0 \in K$ if and only if $0 \in -K$. It follows that
$$
0 \in \partial \mathbb{E}[\bar F_\omega(\bar x)].
$$
Theorem 3 Suppose that the function $F(x,\omega)$ is random lower semicontinuous, the set $X$ is convex and closed, and for a.e. $\omega \in \Omega$, the function $F(\cdot,\omega)$ is convex. Suppose (P) and (D) are given by the following:
$$
\min_{x \in \mathbb{R}^n} \; \mathbb{E}[\bar F(x,\omega)] \tag{P}
$$
$$
\max_{\lambda \in \mathcal{X}^*} \; D(\lambda) := \inf_{\mathbf{x} \in \mathcal{X}} L(\mathbf{x},\lambda) \quad \text{subject to} \quad \mathbb{E}[\lambda] = 0. \tag{D}
$$
Then there is no duality gap between (P) and (D) and both problems have an optimal solution if and only if there exists an $\bar x \in \mathbb{R}^n$ satisfying
$$
0 \in \partial \mathbb{E}[\bar F_\omega(\bar x)].
$$
In such a case, $\bar x$ is a solution of (P), and any measurable selection
$$
\bar\lambda(\omega) \in -\partial \bar F_\omega(\bar x)
$$
such that $\mathbb{E}[\bar\lambda] = 0$ is an optimal solution of (D).
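The optimality condition and the multiplier selection in the theorem can be traced on a small nonsmooth example with finitely many scenarios. The sketch below assumes hypothetical scenario costs $F(x,\omega_k) = |x - a_k|$ with $X = \mathbb{R}$. For finitely many finite-valued convex scenarios the subdifferential of the expectation is the probability-weighted sum of the scenario subdifferentials, which is what `expected_subdiff` computes as an interval. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical scenarios (not from the lecture): F(x, omega_k) = |x - a_k|, X = R.
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
a = [Fraction(0), Fraction(4), Fraction(8)]

def subdiff_abs(x, a_k):
    """Subdifferential of x -> |x - a_k| at x, returned as an interval (lo, hi)."""
    if x > a_k:
        return (Fraction(1), Fraction(1))
    if x < a_k:
        return (Fraction(-1), Fraction(-1))
    return (Fraction(-1), Fraction(1))

def expected_subdiff(x):
    """Interval sum_k p_k * dF_k(x), the subdifferential of E[F(x, omega)] in this finite case."""
    lo = sum(p_k * subdiff_abs(x, a_k)[0] for p_k, a_k in zip(p, a))
    hi = sum(p_k * subdiff_abs(x, a_k)[1] for p_k, a_k in zip(p, a))
    return (lo, hi)

def is_optimal(x):
    """Check the condition 0 in dE[F(x, omega)]."""
    lo, hi = expected_subdiff(x)
    return lo <= 0 <= hi

def is_valid_multiplier(x, lam):
    """Check lam_k in -dF_k(x) for every scenario and E[lam] = 0."""
    in_sets = all(subdiff_abs(x, a_k)[0] <= -l_k <= subdiff_abs(x, a_k)[1]
                  for l_k, a_k in zip(lam, a))
    mean_zero = sum(p_k * l_k for p_k, l_k in zip(p, lam)) == 0
    return in_sets and mean_zero

x_bar = Fraction(4)                                   # a weighted median of the a_k
lam_bar = [Fraction(-1), Fraction(0), Fraction(1)]    # one selection from -dF_k(x_bar)

if __name__ == "__main__":
    print(is_optimal(x_bar), is_valid_multiplier(x_bar, lam_bar))
```

At $\bar x = 4$ the weighted subdifferential is the interval $[-1/2, 1/2]$, which contains $0$, while at $x = 3$ it reduces to the single point $-1/2$ and the condition fails.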