FOCS13 Workshop
Presentation Transcript
-
ITERATIVE METHODS AND REGULARIZATION
IN THE DESIGN OF FAST ALGORITHMS
Lorenzo Orecchia, MIT Math
A unified framework for optimization and online learning
beyond Multiplicative Weight Updates
-
Talk Outline: A Tale of Two Halves
PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING
- Online Linear Optimization
- Online Linear Optimization over the Simplex and Multiplicative Weight Updates (MWUs)
- A Regularization Framework that generalizes MWUs: Follow the Regularized Leader
MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
(Optimization: Regularized Updates. Online Learning: Multiplicative Weight Updates.)
PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW
- Non-smooth vs. Smooth Convex Optimization
- Non-smooth Convex Optimization reduces to Online Linear Optimization
- Application: Understanding Undirected Maxflow algorithms based on MWUs
MESSAGE: THE FASTEST ALGORITHMS REQUIRE A PRIMAL-DUAL APPROACH
-
Applications of MWUs
Fast algorithms for solving specific LPs and SDPs:
- Maximum flow problems [PST], [GK], [F], [CKMST]
- Covering-packing problems [PST]
- Oblivious routing [R], [M]
Fast approximation algorithms based on LP and SDP relaxations:
- Maxcut [AK]
- Graph partitioning problems [AK], [S], [OSV]
Proof technique:
- Hardcore Lemma [BHK]
- QIP = PSPACE [W]
- Derandomization [Y]
... and more
-
Machine Learning meets Optimization meets TCS
These techniques have been rediscovered multiple times in different fields:
Machine Learning, Convex Optimization, TCS.
Three surveys emphasizing the different viewpoints and literatures:
1) ML: "Prediction, Learning, and Games" by Cesa-Bianchi and Lugosi
2) Optimization: "Lectures on Modern Convex Optimization" by Ben-Tal and Nemirovski
3) TCS: "The Multiplicative Weights Update Method: a Meta-Algorithm and Applications" by Arora, Hazan, and Kale
-
REGULARIZATION 101
-
What is Regularization?
Regularization is a fundamental technique in optimization: it turns an
OPTIMIZATION PROBLEM into a WELL-BEHAVED OPTIMIZATION PROBLEM, with:
- a stable optimum
- a unique optimal solution
- smoothness conditions
-
What is Regularization?
Regularization is a fundamental technique in optimization: it turns an
OPTIMIZATION PROBLEM into a WELL-BEHAVED OPTIMIZATION PROBLEM
(add a regularizer F, with a parameter η > 0).
Benefits of regularization in learning and statistics:
- prevents overfitting
- increases stability
- decreases sensitivity to random noise
-
Example: Regularization Helps Stability
Consider a convex set S ⊆ Rⁿ and a linear optimization problem:
  f(c) = argmin_{x ∈ S} cᵀx
The optimal solution f(c) may be very unstable under perturbation of c:
even when ‖c′ − c‖ is small, ‖f(c′) − f(c)‖ can be very large.
-
Example: Regularization Helps Stability
Consider a convex set S ⊆ Rⁿ and a regularized linear optimization problem:
  f(c) = argmin_{x ∈ S} cᵀx + F(x)
where F is σ-strongly convex. Then:
  ‖c′ − c‖ ≤ δ implies ‖f(c′) − f(c)‖ ≤ δ/σ
The quadratic lower bound given by strong convexity controls the slope of the
objective near the optimum, so the minimizer moves only proportionally to the
perturbation.
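This stability claim can be checked numerically. Below is a minimal sketch (not from the talk), taking S = [−1,1]² and the 1-strongly-convex quadratic regularizer F(x) = ½‖x‖², for which both minimizers have closed forms:

```python
import numpy as np

def argmin_linear(c):
    # argmin over the box [-1,1]^n of c.x : pick -sign(c_i) per coordinate
    return -np.sign(c)

def argmin_regularized(c):
    # argmin over the box of c.x + 0.5*||x||^2 : clip(-c_i) per coordinate
    return np.clip(-c, -1.0, 1.0)

c  = np.array([ 1e-6, 2.0])
c2 = np.array([-1e-6, 2.0])   # tiny perturbation of c

jump_plain = np.linalg.norm(argmin_linear(c2) - argmin_linear(c))
jump_reg   = np.linalg.norm(argmin_regularized(c2) - argmin_regularized(c))
print(jump_plain, jump_reg)   # unregularized minimizer jumps by 2, regularized by ~2e-6
```

The unregularized argmin flips a whole coordinate when c crosses zero, while the regularized argmin is 1-Lipschitz in c, as the slide's bound predicts with σ = 1.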
-
ONLINE LINEAR OPTIMIZATION
AND
MULTIPLICATIVE WEIGHT UPDATES
-
Online Linear Minimization
SETUP: convex set X ⊆ Rⁿ, a generic norm ‖·‖, repeated game over T rounds. At round t:
- ALGORITHM plays the current solution x(t) ∈ X
- ADVERSARY reveals the current linear objective, a loss vector ℓ(t) ∈ Rⁿ with ‖ℓ(t)‖ ≤ ρ
- the algorithm suffers loss ℓ(t)ᵀx(t)
- ALGORITHM plays the updated solution x(t+1) ∈ X
- ADVERSARY reveals a new loss vector ℓ(t+1) ∈ Rⁿ with ‖ℓ(t+1)‖ ≤ ρ, and so on.
GOAL: update x(t) to minimize regret, the average algorithm's loss minus the a posteriori optimum:
  (1/T) Σ_{t=1}^T ℓ(t)ᵀx(t) − min_{x ∈ X} (1/T) Σ_{t=1}^T ℓ(t)ᵀx
-
Simplex Case: Learning with Experts
SETUP: simplex X = Δₙ ⊆ Rⁿ under the ℓ₁ norm. At round t:
- ALGORITHM plays p(t), a distribution over the n dimensions, i.e. the experts
- ADVERSARY reveals the experts' losses ℓ(t), with ‖ℓ(t)‖∞ ≤ 1
- the algorithm's loss is E_{i∼p(t)}[ℓ(t)_i] = p(t)ᵀℓ(t)
- ALGORITHM updates the distribution to p(t+1)
-
Simplex Case: Multiplicative Weight Updates
Weights:      w(t+1)_i = (1 − ε)^{ℓ(t)_i} · w(t)_i,   w(1) = (1, ..., 1)
Distribution: p(t+1)_i = w(t+1)_i / Σ_{j=1}^n w(t+1)_j
This is the MULTIPLICATIVE WEIGHT UPDATE, with parameter ε ∈ (0, 1):
ε near 0 is CONSERVATIVE, ε near 1 is AGGRESSIVE.
-
MWUs: Unraveling the Update
Update: p(t+1)_i ∝ w(t+1)_i = (1 − ε)^{ℓ(t)_i} · w(t)_i
Unraveling over time, each WEIGHT encodes a CUMULATIVE LOSS:
  w(t+1)_i = (1 − ε)^{Σ_{s ≤ t} ℓ(s)_i}
-
MWUs: Regret Bound
Update: p(t+1)_i ∝ w(t+1)_i = (1 − ε)^{ℓ(t)_i} · w(t)_i
For ‖ℓ(t)‖∞ ≤ 1 and ε < 1/2:
  L̂ ≤ L* + log n / (εT) + ε
where L̂ is the algorithm's average loss and L* the a posteriori best expert's.
The algorithm's regret splits into a start-up penalty, log n / (εT), and a
penalty for being greedy, ε.
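The bound is easy to verify empirically. A minimal sketch of the update rule above (the losses are hypothetical random values in [0,1]); the final comparison checks the averaged bound ε + log n/(εT):

```python
import numpy as np

def mwu(losses, eps):
    """Multiplicative Weight Updates: w_i <- w_i * (1-eps)^{loss_i}."""
    T, n = losses.shape
    w = np.ones(n)                       # w(1) = all-ones
    total = 0.0
    for t in range(T):
        p = w / w.sum()                  # current distribution over experts
        total += p @ losses[t]           # algorithm's expected loss this round
        w *= (1.0 - eps) ** losses[t]    # multiplicative update
    return total

rng = np.random.default_rng(0)
T, n, eps = 2000, 10, 0.1
losses = rng.random((T, n))              # adversary's losses in [0, 1]
alg = mwu(losses, eps)
best = losses.sum(axis=0).min()          # a posteriori best expert
regret = (alg - best) / T
bound = eps + np.log(n) / (eps * T)      # slide's bound: eps + log n / (eps T)
print(regret, bound)
```

In runs like this the realized regret is typically well below the worst-case bound, which is tight only against adaptive adversaries.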
-
ONLINE LINEAR OPTIMIZATION BEYOND MWUs
A REGULARIZATION FRAMEWORK
-
MWUs: Proof Sketch of Regret Bound
Update: p(t+1)_i ∝ w(t+1)_i = (1 − ε)^{Σ_{s=1}^t ℓ(s)_i}
The proof is a potential function argument, with potential
  Φ(t+1) = − log_{1/(1−ε)} Σ_{i=1}^n w(t+1)_i
1) The potential function bounds the loss of the best expert:
  Φ(t+1) ≤ − log_{1/(1−ε)} max_{i=1}^n w(t+1)_i = min_{i=1}^n Σ_{s=1}^t ℓ(s)_i
2) The potential function is related to the algorithm's performance:
  Φ(t+1) − Φ(t) ≥ ℓ(t)ᵀp(t)   (up to O(ε) terms)
DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?
-
MWUs AND APPLICATIONS
Designing a Regularized Update
GOAL: design an update and its potential function analysis.
QUESTION: choice of potential function?
DESIDERATA: 1) lower-bounds the best expert's loss
            2) tracks the algorithm's performance
-
Designing a Regularized Update
QUESTION: choice of potential function?
DESIDERATA: 1) lower-bounds the best expert's loss
            2) tracks the algorithm's performance
Attempt 1, FOLLOW THE LEADER, using the cumulative loss L(t) = Σ_{s=1}^t ℓ(s):
  x(t+1) = argmin_{x ∈ X} xᵀL(t)      (pick the best current solution)
  Φ(t+1) = min_{x ∈ X} xᵀL(t)         (potential is the current best loss)
This fails if the leader moves drastically between rounds.
How can we make the update more stable?
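The failure mode can be seen on a classic two-expert example (an illustration, not from the slides): the adversary alternates the unit loss, so Follow the Leader always chases the currently best expert and gets hit every round, suffering linear regret:

```python
import numpy as np

def ftl(losses):
    """Follow the Leader: play the argmin of the cumulative loss so far."""
    T, n = losses.shape
    cum = np.zeros(n)
    total = 0.0
    for t in range(T):
        leader = int(np.argmin(cum))   # best expert so far (ties -> lowest index)
        total += losses[t, leader]
        cum += losses[t]
    return total

T = 1000
losses = np.zeros((T, 2))
losses[0] = [0.5, 0.0]                 # adversary breaks the initial tie
for t in range(1, T):                  # then alternates the unit loss
    losses[t] = [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0]

alg = ftl(losses)
best = losses.sum(axis=0).min()
print(alg, best)   # FTL suffers ~T while the best expert suffers only ~T/2
```

After the first round, FTL deterministically picks exactly the expert the adversary is about to penalize, so its average regret stays near 1/2 no matter how large T is.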
-
Regularized Update: Definition and Analysis
Attempt 2, FOLLOW THE REGULARIZED LEADER:
  x(t+1) = argmin_{x ∈ X} xᵀL(t) + (1/η)F(x)
  Φ(t+1) = min_{x ∈ X} xᵀL(t) + (1/η)F(x)
Properties of the regularizer F(x):
1. convex, differentiable
2. σ-strongly convex w.r.t. the norm ‖·‖
with a parameter η ≥ 0, to be determined.
These properties are actually sufficient to get a regret bound.
The potential still lower-bounds the best loss, up to a regularization error:
  Φ(t+1) ≤ min_{x ∈ X} L(t)ᵀx + (1/η) max_{x ∈ X} F(x)
-
Tracking the Algorithm: Proof by Picture
Define f(t+1)(x) = L(t)ᵀx + (1/η)F(x), so that
  Φ(t) = f(t)(x(t))  and  Φ(t+1) = f(t+1)(x(t+1))
Notice: f(t+1)(x) − f(t)(x) = ℓ(t)ᵀx, the latest loss vector.
Comparing the potentials at the consecutive minimizers x(t) and x(t+1):
  Φ(t+1) − Φ(t) = f(t+1)(x(t+1)) − f(t+1)(x(t)) + ℓ(t)ᵀx(t)
-
Regularization in Action
REGULARIZATION: f(t+1)(x) = L(t)ᵀx + (1/η)F(x) is (σ/η)-strongly convex,
giving a quadratic lower bound to f(t+1) around its minimizer.
STABILITY: ‖∇f(t+1) − ∇f(t)‖ = ‖ℓ(t)‖, so
  ‖x(t+1) − x(t)‖ ≤ (η/σ) ‖ℓ(t)‖
-
Analysis: Progress in One Iteration
f(t+1) is (σ/η)-strongly convex, and ∇f(t+1)(x(t)) = ∇f(t)(x(t)) + ℓ(t), where
∇f(t)(x(t))ᵀ(x(t+1) − x(t)) ≥ 0 by first-order optimality of x(t). Hence:
  f(t+1)(x(t+1)) − f(t+1)(x(t)) ≥ ℓ(t)ᵀ(x(t+1) − x(t)) + (σ/2η)‖x(t+1) − x(t)‖²
    ≥ −‖ℓ(t)‖ ‖x(t+1) − x(t)‖ + (σ/2η)‖x(t+1) − x(t)‖²  ≥  −(η/2σ)‖ℓ(t)‖²
Recall: Φ(t+1) − Φ(t) = f(t+1)(x(t+1)) − f(t+1)(x(t)) + ℓ(t)ᵀx(t)
-
Completing the Analysis
Progress in one iteration:
  Φ(t+1) − Φ(t) ≥ ℓ(t)ᵀx(t) − (η/2σ)‖ℓ(t)‖²      (regret at iteration t)
Telescoping sum:
  Φ(T+1) − Φ(1) ≥ Σ_{t=1}^T ℓ(t)ᵀx(t) − (η/2σ) Σ_{t=1}^T ‖ℓ(t)‖²
Final regret bound:
  (1/T) [ Σ_{t=1}^T ℓ(t)ᵀx(t) − min_{x ∈ X} Σ_{t=1}^T ℓ(t)ᵀx ]
    ≤ (max_{x∈X} F(x) − min_{x∈X} F(x)) / (ηT) + ηρ² / (2σ)
-
Completing the Analysis
Regret bound with regularizer F and ‖ℓ(t)‖ ≤ ρ:
  (1/T) [ Σ_{t=1}^T ℓ(t)ᵀx(t) − min_{x ∈ X} Σ_{t=1}^T ℓ(t)ᵀx ]
    ≤ (max_{x∈X} F(x) − min_{x∈X} F(x)) / (ηT) + ηρ² / (2σ)
Again a start-up penalty plus a penalty for being greedy:
SAME TYPE OF BOUND AS FOR MWUs
-
Reinterpreting MWUs
Potential function:
  Φ(t+1) = min_{p ≥ 0, Σᵢpᵢ = 1} pᵀL(t) + (1/η) Σ_{i=1}^n p_i log p_i
The regularizer F(p) = Σ_{i=1}^n p_i log p_i is negative entropy, which is
1-strongly convex w.r.t. ‖·‖₁.
Update:
  p(t+1) = argmin_{p ≥ 0, Σᵢpᵢ = 1} pᵀL(t) + (1/η) Σ_{i=1}^n p_i log p_i
with the closed-form SOFT-MAX solution (taking 1 − ε = e^{−η}):
  p(t+1)_i = e^{−ηL(t)_i} / Σ_{j=1}^n e^{−ηL(t)_j} = (1 − ε)^{L(t)_i} / Σ_{j=1}^n (1 − ε)^{L(t)_j}
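The soft-max closed form can be sanity-checked numerically. A small sketch (the losses and η are arbitrary illustrative values) comparing the formula against a brute-force grid minimization of the entropy-regularized objective on the 2-simplex:

```python
import numpy as np

def ftrl_entropy(L, eta):
    """Closed-form FTRL iterate with the negative-entropy regularizer:
    argmin_{p in simplex} p.L + (1/eta) * sum_i p_i log p_i  =  softmax(-eta*L)."""
    w = np.exp(-eta * (L - L.min()))   # shift losses for numerical stability
    return w / w.sum()

# Brute-force check on the 2-simplex: parametrize p = (q, 1-q) on a fine grid.
L, eta = np.array([0.7, 0.2]), 2.0
grid = np.linspace(1e-6, 1 - 1e-6, 100001)
obj = (grid * L[0] + (1 - grid) * L[1]
       + (grid * np.log(grid) + (1 - grid) * np.log(1 - grid)) / eta)
p_grid = grid[np.argmin(obj)]          # grid minimizer of the regularized objective
p_soft = ftrl_entropy(L, eta)[0]       # soft-max formula
print(p_grid, p_soft)                  # the two should agree to grid precision
```
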
-
Beyond MWUs: Which Regularizer?
Regret bound, optimizing over η:
  (1/T) [ Σ_{t=1}^T ℓ(t)ᵀx(t) − min_{x∈X} Σ_{t=1}^T ℓ(t)ᵀx ]
    ≤ √( 2 · max_t ‖ℓ(t)‖² · (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ ) / √T
The best choice of regularizer and norm minimizes
  max_t ‖ℓ(t)‖² · (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ
Negative entropy with the ℓ₁-norm is approximately optimal for the simplex.
QUESTION: are other regularizers ever useful?
-
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful?
YES! Applications: Graph Partitioning and Random Walks.
Spectral algorithms for balanced separator running in time Õ(m).
Uses a random-walk framework and SDP MWUs; different walks correspond to
different regularizers for the eigenvector problem:
- regularizers: F(X) = Tr(X log X) (SDP MWU); F(X) = Tr(Xᵖ), the p-norm with
  1 ≤ p ≤ ∞ (NEW REGULARIZER); F(X) = Tr(X^{1/2})
- walks: Heat Kernel Random Walk, Lazy Random Walk, Personalized PageRank
[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]
-
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful?
YES! Applications: Sparsification.
- (1 + ε)-spectral sparsifiers with O(n log n / ε²) edges [Spielman, Srivastava
  2008]: uses a matrix concentration bound equivalent to SDP MWUs.
- (1 + ε)-spectral sparsifiers with O(n / ε²) edges [Batson, Spielman,
  Srivastava 2009]: can be interpreted as a different regularizer,
  F(X) = Tr(X^{1/2}).
-
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful?
YES! Applications: Graph Partitioning and Random Walks, Sparsification,
and many more in Online Learning, e.g. Bandit Online Learning [AHR].
-
NON-SMOOTH CONVEX OPTIMIZATION
REDUCES TO
ONLINE LINEAR OPTIMIZATION
-
Convex Optimization Setup
  min_{x ∈ X} f(x)
f convex and differentiable, X ⊆ Rⁿ a closed, convex set.
NON-SMOOTH: f is ρ-Lipschitz continuous:
  ∀x ∈ X, ‖∇f(x)‖ ≤ ρ
SMOOTH: f has an L-Lipschitz continuous gradient:
  ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖ ≤ L‖y − x‖
In the smooth case, a gradient step is guaranteed to decrease the function value:
  f(x(t+1)) − f(x(t)) ≤ − ‖∇f(x(t))‖² / 2L
In the non-smooth case there is NO GRADIENT STEP GUARANTEE, ONLY A DUAL GUARANTEE.
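The smooth-case guarantee can be checked directly. A small sketch (not from the talk) on a random positive-definite quadratic, where the smoothness constant L is the top eigenvalue:

```python
import numpy as np

# Smooth case sanity check: for L-smooth f, a gradient step of size 1/L
# decreases the value by at least ||grad f(x)||^2 / (2L)  (descent lemma).
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)                # positive-definite quadratic form
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
L = np.linalg.eigvalsh(Q).max()        # smoothness constant = largest eigenvalue

x = rng.standard_normal(5)
x_next = x - grad(x) / L               # one gradient step
decrease = f(x) - f(x_next)
print(decrease, np.linalg.norm(grad(x))**2 / (2 * L))
```

For a non-smooth function such as f(x) = ‖x‖₁, no analogous per-step decrease holds: a subgradient step can even increase the value, which is why the analysis must go through the dual bounds on the next slide.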
-
Non-Smooth Setup: Dual Approach
APPROACH: each iterate x(t) provides an upper bound and a lower bound on the
optimum f(x*):
  UPPER: f(x(t)) ≥ f(x*)
  LOWER: f(x*) ≥ f(x(t)) + ∇f(x(t))ᵀ(x* − x(t))      (convexity)
WE CAN WEAKEN THE DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE.
Take convex combinations of the upper bounds and of the lower bounds with
weights λ_t ≥ 0:
  UPPER: (1 / Σ_t λ_t) Σ_{t=1}^T λ_t f(x(t)) ≥ f(x*)
  LOWER: f(x*) ≥ (1 / Σ_t λ_t) [ Σ_{t=1}^T λ_t ( f(x(t)) + ∇f(x(t))ᵀ(x* − x(t)) ) ]
HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?
-
Reduction to Online Linear Minimization
Fix the weights λ_t to be uniform for simplicity. Subtracting the lower bound
from the upper bound:
  DUALITY GAP:
  [ (1/T) Σ_{t=1}^T f(x(t)) ] − f(x*) ≤ − (1/T) Σ_{t=1}^T ∇f(x(t))ᵀ(x* − x(t))
The right-hand side is a sum of LINEAR FUNCTIONS of the iterates, so we are in
the ONLINE SETUP: the ALGORITHM plays x(t) ∈ X and the ADVERSARY answers with
the loss vector ℓ(t) = ∇f(x(t)).
Recall that by assumption the loss vector is a gradient: ‖ℓ(t)‖ = ‖∇f(x(t))‖ ≤ ρ.
  (1/T) Σ_{t=1}^T ∇f(x(t))ᵀ(x(t) − x*) = REGRET
-
Final Bound
RESULTING ALGORITHM: MIRROR DESCENT
Error bound with a σ-strongly-convex regularizer F and ‖∇f(x(t))‖ ≤ ρ:
  ε_MD ≤ ρ √( 2 (max_{x∈X} F(x) − min_{x∈X} F(x)) / σ ) / √T
ASYMPTOTICALLY OPTIMAL BY AN INFORMATION-COMPLEXITY LOWER BOUND
-
Non-Smooth Optimization over Simplex
The regularizer F is negative entropy, with ‖∇f(x(t))‖∞ ≤ ρ:
  ε_MD ≤ ρ √(2 log n) / √T
RESULTING ALGORITHM: MIRROR DESCENT OVER THE SIMPLEX = MWU
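As a concrete illustration (not from the talk), here is entropic mirror descent on a toy non-smooth objective over the simplex, f(p) = ‖p − q‖₁ with a hypothetical target q; the update is exactly an MWU step with the subgradient playing the role of the loss vector:

```python
import numpy as np

def mirror_descent_simplex(subgrad, n, T, eta):
    """Entropic mirror descent over the simplex: the multiplicative update
    p_i <- p_i * exp(-eta * g_i), renormalized -- exactly an MWU step."""
    p = np.full(n, 1.0 / n)
    avg = np.zeros(n)
    for _ in range(T):
        g = subgrad(p)                 # subgradient, ||g||_inf <= 1 here
        p = p * np.exp(-eta * g)
        p /= p.sum()
        avg += p
    return avg / T                     # averaged iterate, as in the reduction

# Toy non-smooth objective over the simplex, minimized at p = q.
q = np.array([0.4, 0.3, 0.2, 0.1])
f = lambda p: np.abs(p - q).sum()
subgrad = lambda p: np.sign(p - q)

T = 5000
eta = np.sqrt(2 * np.log(len(q)) / T)  # step size suggested by the regret bound
p_avg = mirror_descent_simplex(subgrad, len(q), T, eta)
print(f(p_avg))                        # close to the optimum value f(q) = 0
```
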
-
APPLICATIONS IN ALGORITHM DESIGN
-
Warm-up Example: Linear Programming
LP feasibility problem: given A ∈ R^{m×n}, decide
  ∃? x ∈ X : Ax − b ≥ 0
Easy constraints (x ∈ X): maintain feasibility at every step.
Hard constraints (Ax ≥ b): require fixing.
-
Warm-up Example: Linear Programming
LP feasibility problem: A ∈ R^{m×n}, ∃? x ∈ X : Ax − b ≥ 0.
Convert into a non-smooth optimization problem over the simplex:
  min_{p ∈ Δ_m} max_{x ∈ X} pᵀ(b − Ax)
with the non-differentiable objective
  f(p) = max_{x ∈ X} pᵀ(b − Ax)
whose inner maximizer is the best response to the dual solution p.
-
Warm-up Example: Linear Programming
LP feasibility problem: A ∈ R^{m×n}, ∃? x ∈ X : b − Ax ≤ 0.
Non-smooth objective over the simplex: f(p) = max_{x ∈ X} pᵀ(b − Ax).
f admits subgradients: for any p, a best response x_p gives
  (b − Ax_p) ∈ ∂f(p)
i.e. the subgradient is the slack in the constraints.
If we can always pick x_p such that pᵀ(b − Ax_p) ≤ 0, with width
‖b − Ax_p‖∞ ≤ ρ, then mirror descent gives
  ε_MD ≤ ρ √(2 log m) / √T,   so   T ≥ 2ρ² log m / ε²
iterations suffice for an ε-feasible solution.
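A minimal sketch of this reduction on a hypothetical one-variable LP (X = [0,1] with constraints 0.3 ≤ x ≤ 0.7): MWU maintains the dual distribution p over the two constraints, a trivial oracle supplies the best response, and the running average of the responses becomes ε-feasible:

```python
import numpy as np

# Toy LP feasibility: exists x in X = [0,1] with Ax >= b (here 0.3 <= x <= 0.7).
A = np.array([[1.0], [-1.0]])
b = np.array([0.3, -0.7])

def best_response(p):
    # Oracle: x_p maximizing p.(b - Ax) over X = [0,1]; linear, so an endpoint.
    slope = -(p @ A)[0]                # derivative in x of p.(b - Ax)
    return 1.0 if slope > 0 else 0.0

T, eta = 2000, 0.05
p = np.array([0.5, 0.5])
xs = []
for _ in range(T):
    x = best_response(p)
    xs.append(x)
    g = b - A @ np.array([x])          # subgradient: slack in the constraints
    p = p * np.exp(-eta * g)           # MWU step on the dual distribution
    p /= p.sum()

x_avg = np.mean(xs)                    # average primal best response
print(x_avg)                           # an approximately feasible point
```

The dual weights concentrate on whichever constraint is currently most violated, which forces the oracle's responses to average out to a point satisfying Ax ≥ b up to an additive ε.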
-
MWU and s-t Maxflow
Maximum-flow feasibility for value F over an undirected graph G with incidence
matrix B:
  ∀e ∈ E: F |f_e| / c_e − 1 ≤ 0          (capacity constraints)
  Bᵀf = e_s − e_t                         (conservation: will enforce this)
Turn into a non-smooth minimization problem over the simplex:
  f(p) = min_{Bᵀf = e_s − e_t} Σ_{e∈E} p_e ( F |f_e| / c_e − 1 )
The best response f_p is the shortest s-t path with lengths p_e / c_e.
For any p, if f_p has length > 1, then f(p) > 0, i.e. the problem is infeasible.
Otherwise, the following is a subgradient:
  ∂f(p)_e = F |(f_p)_e| / c_e − 1
Unfortunately, the width can be large:
  ‖∂f(p)‖∞ ≤ F / c_min
[PST 91]: T = O( F log n / (ε² c_min) )
-
Width Reduction: Make the Primal Nicer
PROBLEM: the width bound ‖∂f(p)‖∞ ≤ F / c_min is optimal for this specific
formulation.
SOLUTION: regularize the primal (this requires a primal argument):
  f(p) = min_{Bᵀf = e_s − e_t} Σ_{e∈E} (F f_e / c_e)(p_e + ε/m) − 1
REGULARIZATION ERROR: O(ε)
NEW WIDTH: ‖∂f(p)‖∞ ≤ m / ε
ITERATION BOUND [GK 98]: T = O( m log n / ε² )
-
Electrical Flow Approach [CKMST]
A different formulation yields the basis for the CKMST algorithm:
  ∀e ∈ E: F f_e² / c_e² − 1 ≤ 0          (capacity constraints)
  Bᵀf = e_s − e_t                         (conservation: will enforce this)
Non-smooth optimization problem:
  f(p) = min_{Bᵀf = e_s − e_t} Σ_{e∈E} p_e ( F f_e² / c_e² − 1 )
The best response is an electrical flow f_p. Original width: ‖∂f(p)‖∞ ≤ m.
Regularize the primal:
  f(p) = min_{Bᵀf = e_s − e_t} Σ_{e∈E} (F f_e² / c_e²)(p_e + ε/m) − 1
NEW WIDTH: ‖∂f(p)‖∞ ≤ √(m / ε)
-
Conclusion: Take-away Messages
- Regularization is a powerful tool for the design of fast algorithms.
- Most iterative algorithms can be understood as regularized updates: MWUs,
  width reduction, interior point methods, gradient descent, ...
- They perform well in practice, and regularization also helps eliminate noise.
ULTIMATE GOAL: the development of a library of iterative methods for fast graph
algorithms. Regularization plays a fundamental role in this effort.
-
THE END THANK YOU