Newton’s method
10-725 Optimization, Fall 2012
Geoff Gordon
Ryan Tibshirani
Review
• Volume rule
• Infomax ICA
‣ matrix natural gradient
Review: Newton
• For solving nonlinear equations:
‣ approx by linear ones, solve, update approx
‣ d = –J(x)⁻¹ f(x)
• For finding minima/maxima/saddles:
‣ just use Newton on gradient g(x) = f’(x) = 0
‣ d = –H(x)⁻¹ g(x)
• Line search: Newton is a descent method
• (Often) quadratic convergence
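For concreteness, a minimal sketch of the damped Newton update described above, in Python/NumPy. The test function, tolerances, and backtracking constants are illustrative choices, not taken from the slides.

```python
import numpy as np

def newton_minimize(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Damped Newton: solve H(x) d = -g(x), then backtrack on f (Armijo)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)   # Newton direction d = -H(x)^{-1} g(x)
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * g @ d:
            t *= 0.5                       # shrink until sufficient decrease
        x = x + t * d
    return x

# illustrative smooth, strictly convex test function
f    = lambda x: x[0] ** 4 + x[1] ** 2 + np.exp(x[0] - x[1])
grad = lambda x: np.array([4 * x[0] ** 3 + np.exp(x[0] - x[1]),
                           2 * x[1] - np.exp(x[0] - x[1])])
hess = lambda x: np.array([[12 * x[0] ** 2 + np.exp(x[0] - x[1]), -np.exp(x[0] - x[1])],
                           [-np.exp(x[0] - x[1]), 2 + np.exp(x[0] - x[1])]])
print(newton_minimize(f, grad, hess, np.array([1.0, 1.0])))
```

With the backtracking line search the Newton direction is always a descent direction here, since the Hessian of this test function is positive definite everywhere.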
Equality constraints
• min f(x) s.t. h(x) = 0
[Figure: illustration of min f(x) s.t. h(x) = 0]
Optimality w/ equality
• min f(x) s.t. h(x) = 0
‣ f: R^d → R, h: R^d → R^k (k ≤ d)
‣ g: R^d → R^d (gradient of f)
• Useful special case: min f(x) s.t. Ax = 0
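For reference, the first-order (Lagrange) optimality conditions for this problem, stated here as a standard fact rather than transcribed from the slide: a feasible x* is a candidate optimum when the gradient of f is a combination of the constraint gradients,

\[
g(x^\star) + J_h(x^\star)^\top \nu = 0, \qquad h(x^\star) = 0 ,
\]

and in the special case h(x) = Ax this becomes g(x*) + Aᵀν = 0 with Ax* = 0, i.e. the gradient must be orthogonal to the null space of A.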
Picture
\[
\max_{x,y,z} \; c^\top \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\quad \text{s.t.} \quad x^2 + y^2 + z^2 = 1, \qquad a^\top x = b
\]
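One way to unpack this picture with Lagrange multipliers (added working; it assumes the linear constraint applies to the whole vector v = (x, y, z)): at an optimum the objective's gradient c must be a combination of the constraint gradients 2v and a,

\[
c = 2\lambda\, v^\star + \mu\, a, \qquad \|v^\star\|^2 = 1, \qquad a^\top v^\star = b,
\]

so the solution sits where a level plane of cᵀv just touches the circle cut out of the sphere by the plane.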
Optimality w/ equality
• min f(x) s.t. h(x) = 0
‣ f: R^d → R, h: R^d → R^k (k ≤ d)
‣ g: R^d → R^d (gradient of f)
• Now suppose:
‣ dg =
‣ dh =
• Optimality:
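A sketch of how these blanks are commonly filled in (the standard derivation, not copied from the slide): to first order, dg = H(x) dx and dh = J_h(x) dx. Asking for a step after which the Lagrangian is stationary and the linearized constraint holds gives the KKT system below; for h(x) = Ax − b it is exact, while for nonlinear h the top-left block would in general be the Hessian of the Lagrangian rather than of f alone.

\[
\begin{bmatrix} H(x) & J_h(x)^\top \\ J_h(x) & 0 \end{bmatrix}
\begin{bmatrix} dx \\ \nu \end{bmatrix}
= - \begin{bmatrix} g(x) \\ h(x) \end{bmatrix}
\]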
Example: bundle adjustment
• Latent:
‣ Robot positions x_t, θ_t
‣ Landmark positions y_k
• Observed: odometry, landmark vectors
‣ v_t = R_{θ_t} [x_{t+1} – x_t] + noise
‣ w_t = θ_{t+1} – θ_t + noise
‣ d_{k,t} = R_{θ_t} [y_k – x_t] + noise
O = {observed (t, k) pairs}
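Here R_θ is presumably the usual planar rotation (an assumption about the notation; it is at least consistent with u_t = [cos θ_t; sin θ_t] on the next slide):

\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

(or its transpose, depending on whether quantities are expressed in the world frame or the robot frame).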
Bundle adjustment
\[
\min_{x_t,\, u_t,\, y_k} \;\;
\sum_t \big\| v_t - R(u_t)\,[x_{t+1} - x_t] \big\|^2
\;+\; \sum_t \big\| R_{w_t}\, u_t - u_{t+1} \big\|^2
\;+\; \sum_{(t,k) \in O} \big\| d_{k,t} - R(u_t)\,[y_k - x_t] \big\|^2
\qquad \text{s.t. } u_t^\top u_t = 1
\]
‣ latent: Robot positions x_t, θ_t
‣ u_t = [cos θ_t; sin θ_t]
‣ latent: Landmark positions y_k
‣ obs: v_t = R_{θ_t} [x_{t+1} – x_t] + noise
‣ obs: w_t = θ_{t+1} – θ_t + noise
‣ obs: d_{k,t} = R_{θ_t} [y_k – x_t] + noise
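A minimal NumPy sketch of evaluating this objective. The 2-D setup, array layouts, and rotation sign conventions below are assumptions made for illustration; the slide does not pin them down.

```python
import numpy as np

def rot_from_u(u):
    # rotation built from the heading vector u = [cos(theta), sin(theta)];
    # world-to-robot sign convention assumed here
    c, s = u
    return np.array([[c, s], [-s, c]])

def rot_from_angle(w):
    # planar rotation by angle w, used to advance the heading u_t
    c, s = np.cos(w), np.sin(w)
    return np.array([[c, -s], [s, c]])

def ba_objective(x, u, y, v, w, d, obs):
    """Bundle-adjustment cost from the slide (up to sign conventions).
    x: (T,2) robot positions, u: (T,2) unit headings, y: (K,2) landmarks,
    v: (T-1,2) odometry translations, w: (T-1,) odometry angle increments,
    d: dict {(t,k): observed 2-vector}, obs: iterable of (t,k) pairs."""
    T = x.shape[0]
    cost = 0.0
    for t in range(T - 1):
        cost += np.sum((v[t] - rot_from_u(u[t]) @ (x[t + 1] - x[t])) ** 2)
        cost += np.sum((rot_from_angle(w[t]) @ u[t] - u[t + 1]) ** 2)
    for (t, k) in obs:
        cost += np.sum((d[(t, k)] - rot_from_u(u[t]) @ (y[k] - x[t])) ** 2)
    return cost
```

The constraint u_tᵀ u_t = 1 is enforced implicitly when u_t is parametrized as [cos θ_t; sin θ_t].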
Ex: MLE in exponential family
\[
L = -\ln \prod_k P(x_k \mid \theta)
\]
‣ P(x_k | θ) =
‣ g(θ) =
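A sketch of how these blanks are usually filled in for an exponential family (standard facts, not transcribed from the slide): with natural parameter θ and sufficient statistic T(x),

\[
P(x \mid \theta) = h(x)\, \exp\!\big( \theta^\top T(x) - A(\theta) \big),
\qquad
L(\theta) = \sum_{k=1}^{n} \big( A(\theta) - \theta^\top T(x_k) - \ln h(x_k) \big),
\]

so the gradient and Hessian are

\[
g(\theta) = n\,\nabla A(\theta) - \sum_k T(x_k) = n \big( \mathbb{E}_\theta[T(x)] - \bar T \big),
\qquad
H(\theta) = n\,\nabla^2 A(\theta) = n\,\mathrm{Cov}_\theta[T(x)],
\]

where \bar T is the empirical mean of the sufficient statistics.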
MLE Newton interpretation
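A guess at the interpretation developed on this slide, offered with that caveat: since the Hessian of the negative log-likelihood equals n times the covariance of the sufficient statistic, i.e. the Fisher information, the Newton step is Fisher scoring, a natural-gradient step, which connects back to the matrix natural gradient from the Infomax ICA review:

\[
d = -H(\theta)^{-1} g(\theta)
  = \mathrm{Cov}_\theta[T(x)]^{-1} \big( \bar T - \mathbb{E}_\theta[T(x)] \big).
\]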
Convergence behavior
• min_x f(x) s.t. Ax = b
‣ strictly convex f(x), twice differentiable
‣ some kind of bound on 3rd derivative
• Two phases
‣ damped Newton (most of time here)
‣ step size < 1
‣ quadratic convergence (a few final iterations to get accuracy very high)
‣ step size = 1
Convergence behavior
• Damped Newton
‣ f(x_{t+1}) ≤ f(x_t) – Δ, some fixed Δ > 0
‣ limit:
• Quadratic convergence
‣ enter when error_t ≤ δ ≤ 0.5
‣ error_{t+1} ≤ (error_t)²
‣ limit:
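The limits these bounds lead to, as in the classical analysis (e.g. Boyd & Vandenberghe; stated here as a standard fact, not read off the slide): the damped phase can last at most (f(x_0) − f*)/Δ iterations, and once the quadratic phase begins the error squares at every step, so error_{t+m} ≤ (1/2)^{2^m} and roughly log₂ log₂(1/ε) further iterations suffice for accuracy ε:

\[
\#\text{iterations} \;\le\; \frac{f(x_0) - f^\star}{\Delta} \;+\; \log_2 \log_2 (1/\epsilon).
\]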
Comparison of methods for minimizing a convex function
Newton FISTA (sub)grad stoch. (sub)grad.
convergence
cost/iter
smoothness
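One standard way the blank cells get filled in (textbook rates stated for reference, not transcribed from the lecture): Newton converges quadratically near the solution but pays roughly a d×d linear solve per iteration and needs second derivatives; FISTA gets f(x_k) − f* = O(1/k²) with one gradient (plus prox) per iteration and needs a Lipschitz gradient; the subgradient method gets O(1/√k) with one cheap subgradient per iteration and no smoothness; stochastic subgradient gets O(1/√k) in expectation at the lowest per-iteration cost.

\[
\text{Newton: } \|x_{k+1} - x^\star\| \le C \|x_k - x^\star\|^2 \text{ (locally)};
\quad
\text{FISTA: } f(x_k) - f^\star = O(1/k^2);
\quad
\text{(sub)grad: } O(1/\sqrt{k});
\quad
\text{stoch.: } O(1/\sqrt{k}) \text{ in expectation.}
\]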
Variations
• Trust region
‣ [H(x) + tI] dx = –g(x)
‣ [H(x) + tD] dx = –g(x)
• Quasi-Newton
‣ use only gradients, but build estimate of Hessian
‣ in R^d, d gradient estimates at “nearby” points determine approx. Hessian (think finite differences)
‣ can often get “good enough” estimate w/ fewer; can even forget old info to save memory/time (L-BFGS; see the sketch below)
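For context, a minimal example of running a quasi-Newton method in practice through SciPy's L-BFGS implementation; the Rosenbrock test function and starting point are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock test function and its gradient (illustrative choice)
def f(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

# L-BFGS keeps only a few recent gradient pairs to approximate the Hessian
res = minimize(f, x0=np.array([-1.0, 2.0]), jac=grad, method="L-BFGS-B")
print(res.x)  # should be close to [1, 1]
```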
Variations: Gauss-Newton
\[
L = \min_\theta \sum_k \tfrac{1}{2} \big\| y_k - f(x_k, \theta) \big\|^2
\]
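To complete the slide, here is the standard Gauss-Newton sketch (not transcribed from the lecture): with residuals r_k(θ) = y_k − f(x_k, θ) and Jacobians J_k = ∂f(x_k, θ)/∂θ,

\[
\nabla_\theta L = -\sum_k J_k^\top r_k,
\qquad
\nabla^2_\theta L = \sum_k J_k^\top J_k \;-\; \sum_k \sum_i r_{k,i}\, \nabla^2_\theta f_i(x_k, \theta),
\]

and Gauss-Newton drops the residual-weighted curvature term, so the step solves

\[
\Big( \sum_k J_k^\top J_k \Big)\, d = \sum_k J_k^\top r_k ,
\]

which needs only first derivatives of f and coincides with the exact Newton step when the residuals vanish or f is linear in θ.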