L. Vandenberghe ECE236C (Spring 2019)
3. Subgradient method
• subgradient method
• convergence analysis
• optimal step size when f⋆ is known
• alternating projections
• optimality
3.1
Subgradient method
to minimize a nondifferentiable convex function f, choose x_0 and repeat

x_{k+1} = x_k − t_k g_k,   k = 0, 1, . . .

where g_k is any subgradient of f at x_k

Step size rules
• fixed step: t_k constant
• fixed length: t_k ‖g_k‖_2 = ‖x_{k+1} − x_k‖_2 is constant
• diminishing: t_k → 0 and ∑_{k=0}^∞ t_k = ∞
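The update above can be sketched in a few lines. The following is a minimal illustration, not from the slides: the function name `subgrad_method` and the ℓ1 example are our choices.

```python
import numpy as np

def subgrad_method(f, subgrad, x0, step, iters=100):
    """Run x_{k+1} = x_k - t_k g_k and track the best value seen.

    step(k, g) returns the step size t_k; the method is not a descent
    method, so the best iterate is kept explicitly.
    """
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(iters):
        g = subgrad(x)
        x = x - step(k, g) * g
        if f(x) < f_best:
            f_best, x_best = f(x), x.copy()
    return x_best, f_best

# example: f(x) = ||x||_1 (minimum value 0), subgradient sign(x),
# diminishing step t_k = 0.5 / sqrt(k+1)
f = lambda x: np.abs(x).sum()
subgrad = lambda x: np.sign(x)
x_best, f_best = subgrad_method(f, subgrad, [1.0, -2.0],
                                lambda k, g: 0.5 / np.sqrt(k + 1), iters=500)
```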
Subgradient method 3.2
Assumptions
• f has finite optimal value f⋆ and minimizer x⋆
• f is convex with dom f = R^n
• f is Lipschitz continuous with constant G > 0:

|f(x) − f(y)| ≤ G ‖x − y‖_2   for all x, y

this is equivalent to ‖g‖_2 ≤ G for all x and g ∈ ∂f(x) (see next page)
Subgradient method 3.3
Proof.
• assume ‖g‖_2 ≤ G for all subgradients; choose g_y ∈ ∂f(y), g_x ∈ ∂f(x):

g_x^T (x − y) ≥ f(x) − f(y) ≥ g_y^T (x − y)

by the Cauchy–Schwarz inequality,

G ‖x − y‖_2 ≥ f(x) − f(y) ≥ −G ‖x − y‖_2

• assume ‖g‖_2 > G for some g ∈ ∂f(x); take y = x + g/‖g‖_2:

f(y) ≥ f(x) + g^T (y − x) = f(x) + ‖g‖_2 > f(x) + G

since ‖y − x‖_2 = 1, this contradicts Lipschitz continuity
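As a quick numeric illustration of the equivalence (not part of the proof): for f(x) = ‖x‖_1 every subgradient sign(x) satisfies ‖g‖_2 ≤ √n, and the corresponding Lipschitz bound with G = √n holds for random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
G = np.sqrt(n)          # Lipschitz constant of ||.||_1 w.r.t. the 2-norm

f = lambda x: np.abs(x).sum()
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    g = np.sign(x)      # a subgradient of f at x
    assert np.linalg.norm(g) <= G + 1e-12
    assert abs(f(x) - f(y)) <= G * np.linalg.norm(x - y) + 1e-9
ok = True
```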
Subgradient method 3.4
Analysis
• the subgradient method is not a descent method
• therefore we track f_best,k = min_{i=0,...,k} f(x_i), which can be less than f(x_k)
• the key quantity in the analysis is the distance to the optimal set
Progress in one iteration
• distance to x⋆:

‖x_{i+1} − x⋆‖_2^2 = ‖x_i − t_i g_i − x⋆‖_2^2
  = ‖x_i − x⋆‖_2^2 − 2 t_i g_i^T (x_i − x⋆) + t_i^2 ‖g_i‖_2^2
  ≤ ‖x_i − x⋆‖_2^2 − 2 t_i (f(x_i) − f⋆) + t_i^2 ‖g_i‖_2^2

(the last step uses the subgradient inequality g_i^T (x_i − x⋆) ≥ f(x_i) − f⋆)
• best function value: combine the inequalities for i = 0, . . . , k:

2 (∑_{i=0}^k t_i)(f_best,k − f⋆) ≤ ‖x_0 − x⋆‖_2^2 − ‖x_{k+1} − x⋆‖_2^2 + ∑_{i=0}^k t_i^2 ‖g_i‖_2^2
  ≤ ‖x_0 − x⋆‖_2^2 + ∑_{i=0}^k t_i^2 ‖g_i‖_2^2
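The combined inequality can be checked numerically. A small sketch (our example: f(x) = ‖x‖_1, so f⋆ = 0 and x⋆ = 0, with a fixed step):

```python
import numpy as np

# check 2(sum t_i)(f_best - f*) <= ||x0 - x*||^2 + sum t_i^2 ||g_i||^2
# on f(x) = ||x||_1, fixed step t_i = 0.01
f = lambda x: np.abs(x).sum()
x = np.array([1.5, -0.7, 0.3])
x0 = x.copy()
t, iters = 0.01, 200
f_best, num, den = f(x), 0.0, 0.0
for i in range(iters):
    g = np.sign(x)              # subgradient at the current iterate
    num += t**2 * (g @ g)       # accumulate sum t_i^2 ||g_i||^2
    den += 2 * t                # accumulate 2 sum t_i
    x = x - t * g
    f_best = min(f_best, f(x))
bound = (x0 @ x0 + num) / den   # right-hand side divided by 2 sum t_i
```

After the loop, `f_best - 0.0` is below `bound`, as the analysis guarantees.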
Subgradient method 3.5
Fixed step size and fixed step length
Fixed step size: t_i = t with t constant

f_best,k − f⋆ ≤ ‖x_0 − x⋆‖_2^2 / (2(k+1)t) + G^2 t / 2

• does not guarantee convergence of f_best,k
• for large k, f_best,k is approximately G^2 t / 2-suboptimal

Fixed step length: t_i = s/‖g_i‖_2 with s constant

f_best,k − f⋆ ≤ G ‖x_0 − x⋆‖_2^2 / (2(k+1)s) + G s / 2

• does not guarantee convergence of f_best,k
• for large k, f_best,k is approximately G s / 2-suboptimal
Subgradient method 3.6
Diminishing step size
t_i → 0,   ∑_{i=0}^∞ t_i = ∞

• bound on function value:

f_best,k − f⋆ ≤ ‖x_0 − x⋆‖_2^2 / (2 ∑_{i=0}^k t_i) + G^2 (∑_{i=0}^k t_i^2) / (2 ∑_{i=0}^k t_i)

• one can show that (∑_{i=0}^k t_i^2) / (∑_{i=0}^k t_i) → 0; hence f_best,k converges to f⋆
• examples: t_i = τ/(i+1) or t_i = τ/√(i+1)
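The contrast with the previous page can be seen on a toy problem. A sketch (our example: f(x) = ‖x‖_1 in R^2, starting point chosen so the iterates do not land exactly on the minimizer):

```python
import numpy as np

def run(step, iters=3000, x0=(2.03, -1.007)):
    """Subgradient method on f(x) = ||x||_1 (f* = 0); returns f_best."""
    x = np.array(x0, dtype=float)
    f_best = np.abs(x).sum()
    for k in range(iters):
        x = x - step(k) * np.sign(x)
        f_best = min(f_best, np.abs(x).sum())
    return f_best

fixed = run(lambda k: 0.1)                   # stalls at a nonzero level (<= G^2 t / 2)
dimin = run(lambda k: 0.1 / np.sqrt(k + 1))  # keeps improving as k grows
```

With the fixed step the iterates oscillate around the minimizer at a distance proportional to t; with the diminishing rule the oscillation shrinks and f_best,k → f⋆.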
Subgradient method 3.7
Example: 1-norm minimization
minimize ‖Ax − b‖_1

• a subgradient is given by A^T sign(Ax − b)
• example with A ∈ R^{500×100}, b ∈ R^{500}

Fixed step length t_k = s/‖g_k‖_2 for s = 0.1, 0.01, 0.001
[Figure: (f(x_k) − f⋆)/f⋆ versus k (left, 500 iterations) and (f_best,k − f⋆)/f⋆ versus k (right, 3000 iterations), on log scale from 10^0 down to 10^−4, for s = 0.1, 0.01, 0.001]
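A scaled-down version of this experiment fits in a few lines. The sketch below uses A ∈ R^{50×10} and one fixed step length s = 0.01 (our choices; the random data differ from the slide's, so the exact curve does too):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
f = lambda x: np.abs(A @ x - b).sum()

x = np.zeros(10)
s = 0.01                            # fixed step length: t_k = s / ||g_k||_2
f_best = f(x)
for k in range(3000):
    g = A.T @ np.sign(A @ x - b)    # subgradient of ||Ax - b||_1
    x = x - (s / np.linalg.norm(g)) * g
    f_best = min(f_best, f(x))
```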
Subgradient method 3.8
Diminishing step size: t_k = 0.01/√(k+1) and t_k = 0.01/(k+1)

[Figure: (f_best,k − f⋆)/f⋆ versus k over 5000 iterations, on log scale from 10^0 down to 10^−5, for the two step size rules; t_k = 0.01/√(k+1) reaches lower values than t_k = 0.01/(k+1)]
Subgradient method 3.9
Optimal step size for fixed number of iterations
from page 3.5: if s_i = t_i ‖g_i‖_2 and ‖x_0 − x⋆‖_2 ≤ R, then

f_best,k − f⋆ ≤ G (R^2 + ∑_{i=0}^k s_i^2) / (2 ∑_{i=0}^k s_i)

• for given k, the right-hand side is minimized by the fixed step length

s_i = s = R/√(k+1)

• the resulting bound after k steps is

f_best,k − f⋆ ≤ GR/√(k+1)

• this guarantees an accuracy f_best,k − f⋆ ≤ ε in k = O(1/ε^2) iterations
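The minimizing step length follows from a one-line calculation: writing the bound as a function of s,

```latex
\min_{s>0}\;\frac{G\bigl(R^2+(k+1)s^2\bigr)}{2(k+1)s}
  \;=\;\min_{s>0}\;\frac{G}{2}\Bigl(\frac{R^2}{(k+1)s}+s\Bigr),
```

the two terms balance (derivative zero) at s^2 = R^2/(k+1), i.e. s = R/√(k+1), and substituting back gives the value GR/√(k+1).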
Subgradient method 3.10
Optimal step size when f⋆ is known

• the right-hand side in the first inequality of page 3.5 is minimized by

t_i = (f(x_i) − f⋆) / ‖g_i‖_2^2

• the optimized bound is

(f(x_i) − f⋆)^2 / ‖g_i‖_2^2 ≤ ‖x_i − x⋆‖_2^2 − ‖x_{i+1} − x⋆‖_2^2

• applying this recursively from i = 0 to i = k (and using ‖g_i‖_2 ≤ G) gives

f_best,k − f⋆ ≤ G ‖x_0 − x⋆‖_2 / √(k+1)
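This step size (often called the Polyak step) is easy to try when f⋆ is known. A sketch on f(x) = ‖x‖_1, where f⋆ = 0 (our example):

```python
import numpy as np

# Polyak step t_i = (f(x_i) - f*) / ||g_i||^2 on f(x) = ||x||_1, f* = 0
f = lambda x: np.abs(x).sum()
x = np.array([3.0, -1.0, 2.5])
f_star = 0.0
f_best = f(x)
for i in range(200):
    g = np.sign(x)
    if not g.any():          # zero subgradient: we are at the minimizer
        break
    x = x - ((f(x) - f_star) / (g @ g)) * g
    f_best = min(f_best, f(x))
```

On this instance the step adapts automatically: no tuning parameter is needed, and f_best converges to f⋆.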
Subgradient method 3.11
Exercise: find point in intersection of convex sets
find a point in the intersection of m closed convex sets C_1, . . . , C_m:

minimize f(x) = max{f_1(x), . . . , f_m(x)}

where f_j(x) = inf_{y∈C_j} ‖x − y‖_2 is the Euclidean distance of x to C_j

• f⋆ = 0 if the intersection is nonempty
• (from page 2.14) g ∈ ∂f(x̂) if g ∈ ∂f_j(x̂) and C_j is the farthest set from x̂
• (from page 2.20) a subgradient g ∈ ∂f_j(x̂) follows from the projection P_j(x̂) on C_j:

g = 0 if x̂ ∈ C_j,   g = (x̂ − P_j(x̂)) / ‖x̂ − P_j(x̂)‖_2 if x̂ ∉ C_j

note that ‖g‖_2 = 1 if x̂ ∉ C_j
Subgradient method 3.12
Subgradient method
• the optimal step size (page 3.11) for f⋆ = 0 and ‖g_i‖_2 = 1 is t_i = f(x_i)
• at iteration k, find the farthest set C_j (with f(x_k) = f_j(x_k)), and take

x_{k+1} = x_k − (f(x_k)/f_j(x_k)) (x_k − P_j(x_k)) = P_j(x_k)

at each step, we project the current point onto the farthest set
• a version of the alternating projections algorithm
• for m = 2, the projections alternate onto one set, then the other
• later, we will see faster versions of this method that are almost as simple
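For m = 2 this reduces to alternating between the two projections. A toy instance (the two sets, a unit ball and a hyperplane, are our illustrative choices):

```python
import numpy as np

# alternating projections onto C1 = unit ball, C2 = hyperplane {x : a.x = b}
a = np.array([1.0, 0.0]); b = 0.5        # ||a||_2 = 1, so C1 ∩ C2 is nonempty
proj_ball  = lambda x: x / max(1.0, np.linalg.norm(x))
proj_plane = lambda x: x - (a @ x - b) * a

x = np.array([3.0, 4.0])
for _ in range(50):
    x = proj_plane(proj_ball(x))         # project onto each set in turn

dist_ball  = max(0.0, np.linalg.norm(x) - 1.0)   # distance to C1
dist_plane = abs(a @ x - b)                      # distance to C2
```

The final point lies (numerically) in both sets, i.e. both distances are zero.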
Subgradient method 3.13
Optimality of the subgradient method
can the bound f_best,k − f⋆ ≤ GR/√(k+1) on page 3.10 be improved?

Problem class
• f is convex, with a minimizer x⋆
• we know a starting point x_0 with ‖x_0 − x⋆‖_2 ≤ R
• we know the Lipschitz constant G of f on {x | ‖x − x⋆‖_2 ≤ R}
• f is defined by an oracle: given x, the oracle returns f(x) and a g ∈ ∂f(x)

Algorithm class
• the algorithm can choose any x_{i+1} from the set x_0 + span{g_0, g_1, . . . , g_i}
• we stop after a fixed number k of iterations
Subgradient method 3.14
Test problem and oracle
f(x) = max_{i=1,...,k+1} x_i + (1/2)‖x‖_2^2   (with k < n),   x_0 = 0

• subdifferential: ∂f(x) = {x} + conv{e_j | 1 ≤ j ≤ k+1, x_j = max_{i=1,...,k+1} x_i}
• solution and optimal value:

x⋆ = −(1/(k+1), . . . , 1/(k+1), 0, . . . , 0)   (first k+1 entries equal −1/(k+1)),   f⋆ = −1/(2(k+1))

• distance of starting point to solution: R = ‖x_0 − x⋆‖_2 = 1/√(k+1)
• Lipschitz constant on {x | ‖x − x⋆‖_2 ≤ R}:

G = sup_{g∈∂f(x), ‖x−x⋆‖_2≤R} ‖g‖_2 ≤ 2/√(k+1) + 1

• the oracle returns the subgradient e_ĵ + x, where ĵ = min{j | x_j = max_{i=1,...,k+1} x_i}
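A quick numeric sanity check of the stated solution (the instance sizes k = 4, n = 10 are our illustrative choices):

```python
import numpy as np

# the test problem f(x) = max_{i<=k+1} x_i + (1/2)||x||_2^2 for k = 4, n = 10
k, n = 4, 10
f = lambda x: np.max(x[:k + 1]) + 0.5 * (x @ x)

x_star = np.zeros(n)
x_star[:k + 1] = -1.0 / (k + 1)      # first k+1 entries equal -1/(k+1)
f_star = -1.0 / (2 * (k + 1))        # claimed optimal value

# f is convex with minimizer x_star, so random perturbations cannot go lower
rng = np.random.default_rng(2)
mins = min(f(x_star + 1e-3 * rng.standard_normal(n)) for _ in range(1000))
```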
Subgradient method 3.15
Iteration
• after i ≤ k iterations of any algorithm in the algorithm class, only the first i entries of x_i can be nonzero; since entry i+1 ≤ k+1 of x_i is zero, the max term is nonnegative and

f(x_i) ≥ (1/2)‖x_i‖_2^2 ≥ 0

• suboptimality after k iterations (x_0 = 0, so f_best,k = 0):

f_best,k − f⋆ = −f⋆ = 1/(2(k+1)) = GR / (2(2 + √(k+1)))

Conclusion
• the example shows that the O(GR/√k) bound cannot be improved
• the subgradient method is “optimal” (for this problem and algorithm class)
Subgradient method 3.16
Summary: subgradient method
• handles general nondifferentiable convex problems
• often leads to very simple algorithms
• convergence can be very slow
• no good stopping criterion
• theoretical complexity: O(1/ε^2) iterations to find an ε-suboptimal point
• an “optimal” first-order method: O(1/ε2) bound cannot be improved
Subgradient method 3.17
References
• S. Boyd, Lecture slides and notes for EE364b, Convex Optimization II.
• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.2.3. The example on page 3.15 is in §3.2.1.
• B. T. Polyak, Introduction to Optimization (1987), section 5.3.
Subgradient method 3.18