focs13 workshop

ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN OF FAST ALGORITHMS
Lorenzo Orecchia, MIT Math
A unified framework for optimization and online learning beyond Multiplicative Weight Updates

Upload: lorenzo-orecchia

Post on 08-Nov-2015


TRANSCRIPT

  • ITERATIVE METHODS AND REGULARIZATION

    IN THE DESIGN OF FAST ALGORITHMS

    Lorenzo Orecchia, MIT Math

    A unified framework for optimization and online learning

    beyond Multiplicative Weight Updates

  • Talk Outline: A Tale of Two Halves

    PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING

    Online Linear Optimization; Online Linear Optimization over the Simplex and Multiplicative Weight Updates (MWUs); a regularization framework that generalizes MWUs: Follow the Regularized Leader.

    MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
    (Optimization: Regularized Updates — Online Learning: Multiplicative Weight Updates)

    PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW

    Non-smooth vs. smooth Convex Optimization; Non-smooth Convex Optimization reduces to Online Linear Optimization; application: understanding undirected Maxflow algorithms based on MWUs.

    MESSAGE: THE FASTEST ALGORITHMS REQUIRE A PRIMAL-DUAL APPROACH

  • TOC Applications of MWUs

    Fast algorithms for solving specific LPs and SDPs: Maximum Flow problems [PST], [GK], [F], [CKMST]; covering-packing problems [PST]; oblivious routing [R], [M]

    Fast approximation algorithms based on LP and SDP relaxations: Maxcut [AK]; graph partitioning problems [AK], [S], [OSV]

    Proof technique: Hardcore Lemma [BHK]; QIP = PSPACE [W]; derandomization [Y]

    and more

  • Machine Learning meets Optimization meets TCS

    These techniques have been rediscovered multiple times in different fields:

    Machine Learning, Convex Optimization, TCS

    Three surveys emphasizing the different viewpoints and literatures:

    1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi

    2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski

    3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale

  • REGULARIZATION 101

  • What is Regularization?

    Regularization is a fundamental technique in optimization: adding a regularizer F with parameter η > 0 turns an OPTIMIZATION PROBLEM into a WELL-BEHAVED OPTIMIZATION PROBLEM.

    Properties of the well-behaved problem: stable optimum; unique optimal solution; smoothness conditions.

    Benefits of regularization in learning and statistics: prevents overfitting; increases stability; decreases sensitivity to random noise.

  • Example: Regularization Helps Stability

    Consider a convex set S ⊆ R^n and a linear optimization problem:

    f(c) = argmin_{x ∈ S} c^T x

    The optimal solution f(c) may be very unstable under perturbation of c: we can have ‖c′ − c‖ ≤ ε and yet ‖f(c′) − f(c)‖ ≫ ε.

  • Example: Regularization Helps Stability

    Now consider the same convex set S ⊆ R^n and a regularized linear optimization problem

    f(c) = argmin_{x ∈ S} c^T x + η F(x)

    where F is σ-strongly convex. Then:

    ‖c′ − c‖ ≤ ε implies ‖f(c′) − f(c)‖ ≤ ε / (ησ).
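The stability phenomenon above can be checked numerically on the simplex, where the entropy-regularized minimizer has a closed form (a softmax). This is an illustrative sketch of my own, not from the slides; the near-tie cost vectors are made up for the demonstration.

```python
import math

def best_vertex(c):
    # Unregularized: argmin over the simplex of c^T x is a vertex (one-hot),
    # so a tiny perturbation of c can move the solution by 2 in l1 distance.
    i = min(range(len(c)), key=lambda j: c[j])
    return [1.0 if j == i else 0.0 for j in range(len(c))]

def regularized_argmin(c, eta):
    # With the negative-entropy regularizer eta * sum_i x_i log x_i, the
    # minimizer over the simplex is the softmax of -c/eta, which varies
    # smoothly with c.
    w = [math.exp(-ci / eta) for ci in c]
    z = sum(w)
    return [wi / z for wi in w]

c, c_pert = [1.0, 1.001], [1.001, 1.0]   # a near-tie and its tiny perturbation
```

Here `best_vertex` flips completely under the perturbation, while `regularized_argmin` moves only by roughly ‖c′ − c‖ / η.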

  • ONLINE LINEAR OPTIMIZATION

    AND

    MULTIPLICATIVE WEIGHT UPDATES

  • Online Linear Minimization

    SETUP: convex set X ⊆ R^n, generic norm ‖·‖, repeated game over T rounds. At round t:

    ALGORITHM: plays its current solution x^(t) ∈ X.
    ADVERSARY: reveals the current linear objective, a loss vector ℓ^(t) ∈ R^n with ‖ℓ^(t)‖ bounded.

    The algorithm suffers loss ℓ^(t)T x^(t), then plays an updated solution x^(t+1) ∈ X; the adversary responds with a new loss vector ℓ^(t+1).

    GOAL: update x^(t) to minimize regret, the gap between the average algorithm loss and the a-posteriori optimum:

    Regret = (1/T) Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x ∈ X} (1/T) Σ_{t=1}^T ℓ^(t)T x

  • Simplex Case: Learning with Experts

    SETUP: the simplex X = Δ_n ⊆ R^n under the ℓ1 norm. At round t:

    ALGORITHM: plays p^(t), a distribution over the n dimensions, i.e. over experts.
    ADVERSARY: reveals the experts' losses ℓ^(t), with ‖ℓ^(t)‖_∞ ≤ 1.

    The algorithm's loss is its expected loss E_{i∼p^(t)}[ℓ_i^(t)] = p^(t)T ℓ^(t); it then updates its distribution to p^(t+1).

  • Simplex Case: Multiplicative Weight Updates

    MULTIPLICATIVE WEIGHT UPDATE:

    Weights:       w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t),   with w^(1) = (1, …, 1)
    Distribution:  p_i^(t+1) = w_i^(t+1) / Σ_{j=1}^n w_j^(t+1)

    The parameter ε ∈ (0, 1) trades off CONSERVATIVE updates (ε → 0) against AGGRESSIVE updates (ε → 1).
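As a sketch (my own minimal implementation, assuming losses in [0, 1]), the update above is only a few lines of Python:

```python
def mwu_play(losses, eps):
    """Run Multiplicative Weight Updates on a sequence of loss vectors
    (losses[t][i] in [0, 1]) and return the algorithm's total expected loss."""
    n = len(losses[0])
    w = [1.0] * n                                   # w^(1) = (1, ..., 1)
    total = 0.0
    for l in losses:
        s = sum(w)
        p = [wi / s for wi in w]                    # p^(t) proportional to w^(t)
        total += sum(pi * li for pi, li in zip(p, l))
        w = [(1 - eps) ** li * wi for li, wi in zip(l, w)]   # multiplicative update
    return total
```

With one perfect expert (loss 0 every round) and one bad expert (loss 1 every round), the algorithm's total loss stays bounded by a constant depending on ε, independent of T.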

  • MWUs: Unraveling the Update

    Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t)

    Unrolling over the rounds, each WEIGHT is exponential in the expert's CUMULATIVE LOSS:

    w_i^(t+1) = (1 − ε)^{Σ_{s=1}^t ℓ_i^(s)}

  • MWUs: Regret Bound

    For ‖ℓ^(t)‖_∞ ≤ 1 and ε < 1/2, with the update p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{ℓ_i^(t)} · w_i^(t):

    L̂ ≤ L* + (log n)/(εT) + ε

    where L̂ is the algorithm's average loss and L* is the best expert's average loss. The term (log n)/(εT) is a start-up penalty; the term ε is the penalty for being greedy.

  • ONLINE LINEAR OPTIMIZATION BEYOND MWUs

    A REGULARIZATION FRAMEWORK

  • MWUs: Proof Sketch of Regret Bound

    Update: p_i^(t+1) ∝ w_i^(t+1) = (1 − ε)^{Σ_{s=1}^t ℓ_i^(s)}

    The proof is a potential function argument, with potential

    Φ^(t+1) = log_{1−ε} Σ_{i=1}^n w_i^(t+1)

    1) The potential function is a lower bound on the best expert's loss:

    Φ^(t+1) ≤ log_{1−ε} max_i w_i^(t+1) = min_i Σ_{s=1}^t ℓ_i^(s)

    2) The potential function tracks the algorithm's performance: up to a (1 + ε) factor,

    Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T p^(t)

    DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?
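The two potential-function facts can be checked numerically. This is a sketch with made-up loss vectors; the (1 + ε) slack in the second fact is made explicit in the assertions.

```python
import math

def potential(w, eps):
    # Phi = log_{1-eps}(sum_i w_i) = ln(sum_i w_i) / ln(1 - eps)
    return math.log(sum(w)) / math.log(1 - eps)

def mwu_trace(losses, eps):
    """Run MWU and record, per round, the potential increase and the
    algorithm's expected loss; also return the final potential."""
    n = len(losses[0])
    w = [1.0] * n
    gains, exp_losses = [], []
    for l in losses:
        s = sum(w)
        p = [wi / s for wi in w]
        w_new = [(1 - eps) ** li * wi for li, wi in zip(l, w)]
        gains.append(potential(w_new, eps) - potential(w, eps))
        exp_losses.append(sum(pi * li for pi, li in zip(p, l)))
        w = w_new
    return gains, exp_losses, potential(w, eps)
```

On any loss sequence in [0, 1], each potential increase covers the round's expected loss up to a (1 + ε) factor, and the final potential sits below the best expert's cumulative loss.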

  • MWUs AND APPLICATIONS

  • Designing a Regularized Update

    GOAL: design an update and its potential function analysis.

    QUESTION: choice of potential function?

    DESIDERATA: 1) lower-bounds the best expert's loss; 2) tracks the algorithm's performance.

    Attempt 1 — FOLLOW THE LEADER. Let L^(t) = Σ_{s=1}^t ℓ^(s) be the cumulative loss.

    Pick the best current solution:      x^(t+1) = argmin_{x ∈ X} x^T L^(t)
    Potential is the current best loss:  Φ^(t+1) = min_{x ∈ X} x^T L^(t)

    This fails if the best expert changes drastically between rounds. How can we make the update more stable?
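The instability of Follow the Leader is easy to exhibit: on the classic alternating-loss sequence below (my own illustrative instance, not from the talk), the leader flips every round, so FTL pays nearly the maximum possible loss while the best fixed expert pays about half of it.

```python
def ftl(losses):
    """Follow the Leader over the simplex: play the vertex (expert) with the
    smallest cumulative loss so far (ties broken toward the lowest index)."""
    n = len(losses[0])
    L = [0.0] * n                       # cumulative losses
    total = 0.0
    for l in losses:
        leader = min(range(n), key=lambda j: L[j])
        total += l[leader]
        L = [Lj + lj for Lj, lj in zip(L, l)]
    return total

# Alternating losses: after the first round, the leader is always wrong.
T = 100
losses = [[0.5, 0.0]] + [
    [0.0, 1.0] if t % 2 == 0 else [1.0, 0.0] for t in range(2, T + 1)
]
```

Here `ftl(losses)` comes out to 99.5 while the best fixed expert's total loss is 49.5: linear regret.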

  • Regularized Update: Definition

    Attempt 2 — FOLLOW THE REGULARIZED LEADER:

    x^(t+1) = argmin_{x ∈ X} x^T L^(t) + η F(x)
    Φ^(t+1) = min_{x ∈ X} x^T L^(t) + η F(x)

    Properties of the regularizer F(x):

    1. Convex, differentiable
    2. σ-strongly convex w.r.t. the norm ‖·‖

    The parameter η ≥ 0 is to be determined. These properties are actually sufficient to get a regret bound.

  • Regularized Update: Analysis

    The potential still approximately lower-bounds the best cumulative loss:

    Φ^(t+1) ≤ min_{x ∈ X} L^(t)T x + η max_{x ∈ X} F(x)

    where η max_{x ∈ X} F(x) is the regularization error.

  • Tracking the Algorithm: Proof by Picture

    Define f^(t+1)(x) = x^T L^(t) + η F(x), so that x^(t+1) is its minimizer and Φ^(t+1) = f^(t+1)(x^(t+1)).

    Notice: f^(t+1)(x) − f^(t)(x) = ℓ^(t)T x, the latest loss vector.

    We want to compare Φ^(t+1) − Φ^(t) with the algorithm's loss ℓ^(t)T x^(t). Indeed:

    Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t)

  • Regularization in Action

    With f^(t+1)(x) = x^T L^(t) + η F(x):

    REGULARIZATION: f^(t) is (ησ)-strongly convex, which gives a quadratic lower bound on f^(t+1) around its minimizer.

    STABILITY: since ‖∇f^(t+1) − ∇f^(t)‖ = ‖ℓ^(t)‖, the consecutive minimizers cannot move much:

    ‖x^(t+1) − x^(t)‖ ≤ ‖ℓ^(t)‖ / (ησ)
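The stability bound can be checked concretely for the entropy regularizer on the simplex, where σ = 1 w.r.t. the ℓ1 norm and the minimizers have a closed form. The specific numbers below are my own toy choices.

```python
import math

def ftrl_minimizer(L, eta):
    # Closed-form FTRL minimizer over the simplex with the entropy regularizer:
    # argmin_p p^T L + eta * sum_i p_i log p_i  is the softmax of -L/eta.
    w = [math.exp(-Li / eta) for Li in L]
    z = sum(w)
    return [wi / z for wi in w]

eta = 1.0                       # entropy is 1-strongly convex w.r.t. l1
L_old = [0.0, 0.0]
loss = [1.0, 0.0]               # new loss vector, ||loss||_inf = 1
L_new = [a + b for a, b in zip(L_old, loss)]

x_old = ftrl_minimizer(L_old, eta)
x_new = ftrl_minimizer(L_new, eta)
move = sum(abs(a - b) for a, b in zip(x_new, x_old))   # l1 movement
```

The movement of the minimizer stays within the promised radius ‖ℓ‖ / (ησ) = 1/η.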

  • Analysis: Progress in One Iteration

    Since x^(t) minimizes f^(t), we have ∇f^(t+1)(x^(t)) = ℓ^(t) (taking x^(t) in the interior of X). By (ησ)-strong convexity of f^(t+1):

    f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) ≥ ℓ^(t)T (x^(t+1) − x^(t)) + (ησ/2) ‖x^(t+1) − x^(t)‖²
                                      ≥ −‖ℓ^(t)‖ ‖x^(t+1) − x^(t)‖ + (ησ/2) ‖x^(t+1) − x^(t)‖²
                                      ≥ −‖ℓ^(t)‖² / (2ησ)

    Combined with the identity Φ^(t+1) − Φ^(t) = f^(t+1)(x^(t+1)) − f^(t+1)(x^(t)) + ℓ^(t)T x^(t), this yields:

    Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖² / (2ησ)

  • Completing the Analysis

    Progress in one iteration (the regret at iteration t):

    Φ^(t+1) − Φ^(t) ≥ ℓ^(t)T x^(t) − ‖ℓ^(t)‖² / (2ησ)

    Telescoping over t = 1, …, T:

    Φ^(T+1) − Φ^(1) ≥ Σ_{t=1}^T ℓ^(t)T x^(t) − T · max_t ‖ℓ^(t)‖² / (2ησ)

    Final regret bound, with regularizer F and ‖ℓ^(t)‖ ≤ ρ:

    (1/T) ( Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x ∈ X} Σ_{t=1}^T ℓ^(t)T x )
        ≤ (η/T) (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) + ρ² / (2ησ)

    The first term is a start-up penalty, the second a penalty for being greedy — THE SAME TYPE OF BOUND AS FOR MWUs.

  • Reinterpreting MWUs

    Regularizer: F(p) = Σ_{i=1}^n p_i log p_i is negative entropy; it is 1-strongly convex w.r.t. ‖·‖₁.

    Potential function: Φ^(t+1) = min_{p ≥ 0, Σ p_i = 1} p^T L^(t) + η Σ_{i=1}^n p_i log p_i

    Update: p^(t+1) = argmin_{p ≥ 0, Σ p_i = 1} p^T L^(t) + η Σ_{i=1}^n p_i log p_i, which has the closed form

    p_i^(t+1) = e^{−(1/η) L_i^(t)} / Σ_{j=1}^n e^{−(1/η) L_j^(t)} = (1 − ε)^{L_i^(t)} / Σ_{j=1}^n (1 − ε)^{L_j^(t)}

    for (1 − ε) = e^{−1/η}: the SOFT-MAX of the cumulative losses — exactly the MWU update.
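The algebraic identity between the entropy-FTRL softmax and the normalized MWU weights can be sanity-checked directly. The value of ε and the cumulative losses below are arbitrary choices of mine, with (1 − ε) = e^{−1/η}.

```python
import math

eps = 0.2
eta = -1.0 / math.log(1 - eps)        # chosen so that e^{-1/eta} = 1 - eps
L = [3.0, 1.0, 0.0]                   # arbitrary cumulative losses

# Entropy-FTRL closed form: softmax of -L/eta.
soft = [math.exp(-Li / eta) for Li in L]
soft = [s / sum(soft) for s in soft]

# MWU weights: (1 - eps)^{L_i}, normalized.
mwu = [(1 - eps) ** Li for Li in L]
mwu = [m / sum(mwu) for m in mwu]
```

Both normalizations produce the same distribution, coordinate by coordinate.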

  • Beyond MWUs: which regularizer?

    Regret bound, optimizing over η:

    (1/T) ( Σ_{t=1}^T ℓ^(t)T x^(t) − min_{x ∈ X} Σ_{t=1}^T ℓ^(t)T x )
        ≤ ρ √( 2 (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) / σ ) / √T

    The best choice of regularizer and norm minimizes max_t ‖ℓ^(t)‖² (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) / σ.

    Negative entropy with the ℓ₁-norm is approximately optimal for the simplex.

    QUESTION: are other regularizers ever useful?

  • Different Regularizers in Algorithm Design

    QUESTION 1: Are other regularizers, besides entropy, ever useful?

    YES! Application: Graph Partitioning and Random Walks.

    Spectral algorithms for balanced separator running in time Õ(m). These use a random-walk framework and SDP MWUs; different walks (Heat Kernel Random Walk, Lazy Random Walk, Personalized PageRank) correspond to different regularizers for the eigenvector problem:

    F(X) = Tr(X log X)   (SDP MWU)
    F(X) = Tr(X^p)       (p-norm, 1 ≤ p ≤ ∞ — NEW REGULARIZER)
    F(X) = Tr(X^{1/2})

    [Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]

  • Different Regularizers in Algorithm Design

    Application: Sparsification.

    (1 + ε)-spectral-sparsifiers with O(n log n / ε²) edges use a matrix concentration bound equivalent to SDP MWUs [Spielman, Srivastava 2008].

    (1 + ε)-spectral-sparsifiers with O(n / ε²) edges can be interpreted as a different regularizer, F(X) = Tr(X^{1/2}) [Batson, Spielman, Srivastava 2009].

  • Different Regularizers in Algorithm Design

    Many more applications in online learning, e.g. bandit online learning [AHR].

  • NON-SMOOTH CONVEX OPTIMIZATION

    REDUCES TO

    ONLINE LINEAR OPTIMIZATION

  • Convex Optimization Setup

    min_{x ∈ X} f(x),   with f convex and differentiable, and X ⊆ R^n a closed, convex set.

    NON-SMOOTH (ρ-Lipschitz continuous):       ∀x ∈ X, ‖∇f(x)‖ ≤ ρ
    SMOOTH (L-Lipschitz continuous gradient):  ∀x, y ∈ X, ‖∇f(y) − ∇f(x)‖ ≤ L ‖y − x‖

    In the smooth case, a gradient step is guaranteed to decrease the function value:

    f(x^(t+1)) ≤ f(x^(t)) − ‖∇f(x^(t))‖² / (2L)

    In the non-smooth case there is NO GRADIENT STEP GUARANTEE — ONLY A DUAL GUARANTEE.
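The smooth-case guarantee can be seen exactly on a one-dimensional quadratic (my own toy example, where the inequality happens to be tight):

```python
def gd_step(fprime, x, L):
    # One gradient step with the standard step size 1/L for an L-smooth function.
    return x - fprime(x) / L

f = lambda x: x * x           # L-smooth with L = 2
fprime = lambda x: 2 * x

x = 3.0
x_new = gd_step(fprime, x, 2.0)
guaranteed = fprime(x) ** 2 / (2 * 2.0)   # promised decrease ||grad f||^2 / (2L)
```

For this quadratic, the step lands exactly at the minimizer and the achieved decrease f(x) − f(x_new) matches the guarantee.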

  • Non-Smooth Setup: Dual Approach

    f convex, differentiable and ρ-Lipschitz (∀x ∈ X, ‖∇f(x)‖ ≤ ρ); X ⊆ R^n closed, convex; min_{x ∈ X} f(x).

    APPROACH: each iterate provides an upper bound and a lower bound on the optimum value f(x*):

    UPPER: f(x^(t)) ≥ f(x*)
    LOWER: f(x*) ≥ f(x^(t)) + ∇f(x^(t))^T (x* − x^(t))

    (The differentiability assumption can be weakened: subgradients suffice.)

    Take convex combinations of the upper bounds and of the lower bounds with weights λ_t:

    UPPER: (1 / Σ_t λ_t) Σ_{t=1}^T λ_t f(x^(t)) ≥ f(x*)
    LOWER: f(x*) ≥ (1 / Σ_t λ_t) Σ_{t=1}^T λ_t ( f(x^(t)) + ∇f(x^(t))^T (x* − x^(t)) )

    HOW TO UPDATE THE ITERATES? HOW TO CHOOSE THE WEIGHTS?

  • Reduction to Online Linear Minimization

    Fix the weights λ_t to be uniform for simplicity. Subtracting the lower bound from the upper bound:

    DUALITY GAP: (1/T) Σ_{t=1}^T f(x^(t)) − f(x*) ≤ (1/T) Σ_{t=1}^T ∇f(x^(t))^T (x^(t) − x*)

    The right-hand side is a sum of LINEAR functions of the iterates — exactly the online setup:

    ALGORITHM: plays x^(t) ∈ X
    ADVERSARY: reveals the loss vector ℓ^(t) = ∇f(x^(t))

    Recall that by assumption the loss vector is a gradient: ‖ℓ^(t)‖ = ‖∇f(x^(t))‖ ≤ ρ. And

    (1/T) Σ_{t=1}^T ∇f(x^(t))^T (x^(t) − x*) = REGRET

  • Final Bound

    Running the online algorithm on the losses ℓ^(t) = ∇f(x^(t)) and averaging the iterates, the duality gap is bounded by the regret. The error bound with a σ-strongly-convex regularizer F is:

    ε_MD ≤ ρ √( 2 (max_{x ∈ X} F(x) − min_{x ∈ X} F(x)) / σ ) / √T

    RESULTING ALGORITHM: MIRROR DESCENT — ASYMPTOTICALLY OPTIMAL BY AN INFORMATION COMPLEXITY LOWER BOUND.

  • Non-Smooth Optimization over Simplex

    With the negative-entropy regularizer F and ‖∇f(x^(t))‖_∞ ≤ 1:

    ε_MD ≤ √(2 log n) / √T

    RESULTING ALGORITHM: MIRROR DESCENT OVER THE SIMPLEX = MWU
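A minimal sketch of entropy mirror descent over the simplex. The instance is my own toy choice, not from the talk: minimizing the non-smooth function f(p) = max_i p_i, whose optimum is the uniform distribution with value 1/n; the step size and iteration count are illustrative.

```python
import math

def mirror_descent_simplex(subgrad, n, T, step):
    """Entropy mirror descent over the simplex: multiplicative updates on the
    subgradients. Returns the averaged iterate, which carries the guarantee."""
    p = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(T):
        g = subgrad(p)
        w = [pi * math.exp(-step * gi) for pi, gi in zip(p, g)]
        z = sum(w)
        p = [wi / z for wi in w]
        avg = [ai + pi / T for ai, pi in zip(avg, p)]
    return avg

def subgrad_max(p):
    # A subgradient of f(p) = max_i p_i is the indicator of a maximal coordinate.
    i = max(range(len(p)), key=lambda j: p[j])
    return [1.0 if j == i else 0.0 for j in range(len(p))]
```

Starting from the uniform distribution, the iterates oscillate in a small band around the optimum, and the averaged iterate stays close to uniform.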

  • APPLICATIONS IN ALGORITHM DESIGN

  • Warm-up Example: Linear Programming

    LP feasibility problem: given A ∈ R^{m×n}, does there exist x ∈ X with b − Ax ≤ 0?

    Easy constraints (x ∈ X): maintain feasible. Hard constraints (b − Ax ≤ 0): require fixing.

    Convert into a non-smooth optimization problem over the simplex:

    min_{p ∈ Δ_m} max_{x ∈ X} p^T (b − Ax)

    The non-differentiable objective f(p) = max_{x ∈ X} p^T (b − Ax) is the best response to the dual solution p. It admits subgradients: for all p, a best response x_p with p^T (b − Ax_p) ≤ 0 gives (b − Ax_p) ∈ ∂f(p) — the subgradient is the slack in the constraints.

    If we can pick x_p such that ‖b − Ax_p‖_∞ ≤ ρ, then

    ε_MD ≤ ρ √(2 log m) / √T,   so   T = O(ρ² log m / ε²).
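A toy instantiation of this dual scheme. This is entirely my own illustrative construction: the domain X = [0, 1]^n, the threshold best-response oracle, and the instance are not from the talk.

```python
def lp_feasibility_mwu(A, b, T, eps):
    """MWU-based sketch for LP feasibility over the box X = [0,1]^n:
    is there x in X with Ax >= b?  The dual player runs MWU over the m
    constraints; the primal oracle best-responds; the averaged iterate is
    returned as a candidate solution."""
    m, n = len(A), len(A[0])
    w = [1.0] * m
    x_avg = [0.0] * n
    for _ in range(T):
        s = sum(w)
        p = [wi / s for wi in w]
        # Oracle: maximize p^T (Ax - b) over the box, i.e. set x_j = 1 iff the
        # p-weighted column sum is positive.
        col = [sum(p[i] * A[i][j] for i in range(m)) for j in range(n)]
        x = [1.0 if cj > 0 else 0.0 for cj in col]
        x_avg = [aj + xj / T for aj, xj in zip(x_avg, x)]
        # The subgradient is the constraint slack: well-satisfied constraints
        # lose weight, violated ones gain relative weight.
        slack = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        w = [wi * (1 - eps) ** si for wi, si in zip(w, slack)]
    return x_avg
```

On a feasible instance, the averaged iterate approximately satisfies all the hard constraints.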

  • MWU and s-t Maxflow

    Maximum flow feasibility for value F over an undirected graph G with incidence matrix B:

    ∀e ∈ E, F |f_e| / c_e ≤ 1,   B^T f = e_s − e_t   (we will enforce the latter)

    Turn into a non-smooth minimization problem over the simplex:

    f(p) = min_{B^T f = e_s − e_t} Σ_{e ∈ E} p_e ( F |f_e| / c_e − 1 )

    The best response f_p is a shortest s-t path with lengths p_e / c_e. For any p, if f_p has length > 1, the problem is infeasible. Otherwise, the following is a subgradient:

    ∂f(p)_e = F |(f_p)_e| / c_e − 1

    Unfortunately, the width can be large: ‖∂f(p)‖_∞ ≤ F / c_min, giving [PST 91] T = O(F log n / (ε² c_min)).

  • Width Reduction: Make the Primal Nicer

    PROBLEM: the width ‖∂f(p)‖_∞ ≤ F / c_min is optimal for this specific formulation.

    SOLUTION: regularize the primal — this NEEDS A PRIMAL ARGUMENT:

    f(p) = min_{B^T f = e_s − e_t} F Σ_{e ∈ E} (f_e / c_e) ( p_e + ε/m ) − 1

    REGULARIZATION ERROR: ε F
    NEW WIDTH: ‖∂f(p)‖_∞ ≤ m / ε
    ITERATION BOUND: [GK 98] T = O(m log n / ε²)

  • Electrical Flow Approach [CKMST]

    A different formulation yields the basis for the CKMST algorithm:

    ∀e ∈ E, F² f_e² / c_e² ≤ 1,   B^T f = e_s − e_t   (we will enforce the latter)

    Non-smooth optimization problem:

    f(p) = min_{B^T f = e_s − e_t} Σ_{e ∈ E} p_e ( F² f_e² / c_e² − 1 )

    The best response is an electrical flow f_p. Original width: ‖∂f(p)‖_∞ ≤ m.

    Regularize the primal:

    f(p) = min_{B^T f = e_s − e_t} F² Σ_{e ∈ E} (f_e² / c_e²) ( p_e + ε/m ) − 1,   giving   ‖∂f(p)‖_∞ ≤ √(m / ε).

  • Conclusion: Take-away Messages

    Regularization is a powerful tool for the design of fast algorithms.

    Most iterative algorithms can be understood as regularized updates: MWUs, width reduction, interior point, gradient descent, …

    These methods perform well in practice; regularization also helps eliminate noise.

    ULTIMATE GOAL: the development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.

  • THE END — THANK YOU