Chapter 4: Stochastic Approximation for Root Finding in Nonlinear Models



Page 1: Chapter 4: Stochastic Approximation for Root Finding in Nonlinear Models

• Organization of chapter in ISSO
  – Introduction and potpourri of examples
    • Sample mean
    • Quantile and CEP
    • Production function (contrast with maximum likelihood)
  – Convergence of the SA algorithm
  – Asymptotic normality of SA and choice of gain sequence
  – Extensions to standard root-finding SA
    • Joint parameter and state estimation
    • Higher-order methods for algorithm acceleration
    • Iterate averaging
    • Time-varying functions

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

Page 2: Stochastic Root-Finding Problem

• Focus is on finding θ (i.e., θ*) such that g(θ*) = 0
  – g(θ) is typically a nonlinear function of θ (contrast with Chapter 3 in ISSO)
• Assume only noisy measurements of g(θ) are available:
    Y_k(θ) = g(θ) + e_k(θ), k = 0, 1, 2, …
• Above problem arises frequently in practice
  – Optimization with noisy measurements (g(θ) represents gradient of loss function) (see Chapter 5 of ISSO)
  – Quantile-type problems
  – Equation solving in physics-based models
  – Machine learning (see Chapter 11 of ISSO)

Page 3: Core Algorithm for Stochastic Root-Finding

• Basic algorithm published in Robbins and Monro (1951)
• Algorithm is a stochastic analogue to steepest descent when used for optimization
  – Noisy measurement Y_k(θ) replaces exact gradient g(θ)
• Generally wasteful to average measurements at a given value of θ
  – Average across iterations (changing θ)
• Core Robbins-Monro algorithm for unconstrained root-finding is (see the sketch after this slide)
    θ̂_{k+1} = θ̂_k − a_k Y_k(θ̂_k), where a_k > 0
• Constrained version of algorithm also exists
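To make the recursion concrete, here is a minimal Python sketch of the unconstrained Robbins-Monro iteration. The toy function g(θ) = θ − 2, the noise level, and the gain constants are illustrative assumptions, not values from ISSO.

```python
import numpy as np

def robbins_monro(noisy_g, theta0, num_iter=5000, a=1.0, A=10.0, alpha=1.0):
    """Unconstrained Robbins-Monro: theta_{k+1} = theta_k - a_k * Y_k(theta_k)."""
    theta = float(theta0)
    for k in range(num_iter):
        a_k = a / (k + 1 + A) ** alpha   # decaying gain (see the Gain Selection slide)
        theta -= a_k * noisy_g(theta)    # noisy measurement replaces exact g
    return theta

# Toy problem (assumed for illustration): g(theta) = theta - 2, so theta* = 2,
# observed through additive Gaussian noise e_k.
rng = np.random.default_rng(0)
noisy_g = lambda th: (th - 2.0) + rng.normal(scale=0.5)
print(robbins_monro(noisy_g, theta0=0.0))   # prints a value near 2.0
```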

Page 4: Circular Error Probable (CEP): Example of Root-Finding (Example 4.3 in ISSO)

• Interested in estimating radius of circle about target such that half of impacts lie within circle (θ is scalar radius)
• Define success variable
    s_k(θ̂_k) = 1 if impact point X_k lies within circle of radius θ̂_k (success); 0 otherwise (nonsuccess)
• Root-finding algorithm becomes (a simulation sketch follows this slide)
    θ̂_{k+1} = θ̂_k − a_k Y_k(θ̂_k), where Y_k(θ̂_k) = s_k(θ̂_k) − 0.5
• Figure on next slide illustrates results for one study
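Below is a small simulation sketch of the CEP recursion above. The bivariate Gaussian impact model, its mean offset from the target, and the gain settings are assumptions chosen for illustration; the study in ISSO's Example 4.3 has its own settings.

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.zeros(2)
impact_mean = np.array([0.5, -0.3])   # assumed offset of impact mean from target

theta = 1.0                           # initial guess for the CEP radius
a, A, alpha = 2.0, 50.0, 1.0
for k in range(10_000):
    x_k = rng.normal(loc=impact_mean, scale=1.0, size=2)          # one impact point
    s_k = 1.0 if np.linalg.norm(x_k - target) <= theta else 0.0   # success variable
    a_k = a / (k + 1 + A) ** alpha
    theta -= a_k * (s_k - 0.5)        # drive the hit probability toward 0.5
print(f"estimated CEP radius: {theta:.3f}")
```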

Page 5: True and estimated CEP: 1000 impact points with impact mean differing from target point (Example 4.3 in ISSO)

[Figure omitted from transcript]

Page 6: Convergence Conditions

• A central aspect of root-finding SA is the set of conditions for formal convergence of the iterate to a root θ*
  – Provides rigorous basis for many popular algorithms (LMS, backpropagation, simulated annealing, etc.)
• Section 4.3 of ISSO contains two sets of conditions:
  – "Statistics" conditions based on classical assumptions about g(θ), noise, and gains a_k
  – "Engineering" conditions based on connection to deterministic ordinary differential equation (ODE)
• Convergence and stability of ODE dZ(τ)/dτ = −g(Z(τ)) closely related to convergence of SA algorithm (Z(τ) represents p-dimensional time-varying function and τ denotes time); a numerical illustration follows this slide
• Neither the statistics conditions nor the engineering conditions are a special case of the other
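As a rough numerical illustration of the ODE connection (with an assumed toy g, not the function from Example 4.6): an Euler integration of dZ(τ)/dτ = −g(Z(τ)) and the noisy SA recursion with decaying gains end up near the same root.

```python
import numpy as np

g = lambda z: np.array([z[0] ** 3 + z[1], z[1] - z[0]])  # assumed toy g, root at origin

# Euler integration of the limiting ODE dZ/dtau = -g(Z)
z = np.array([1.0, -1.0])
for _ in range(2000):
    z = z - 0.01 * g(z)

# Noisy SA iterates with decaying gain should shadow the ODE trajectory
rng = np.random.default_rng(2)
theta = np.array([1.0, -1.0])
for k in range(20_000):
    y_k = g(theta) + rng.normal(scale=0.1, size=2)  # noisy measurement of g
    theta = theta - (0.5 / (k + 1)) * y_k
print("ODE endpoint:", z, "  SA endpoint:", theta)  # both close to the root [0, 0]
```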

Page 7: ODE Convergence Paths for Nonlinear Problem in Example 4.6 in ISSO: Satisfies ODE Conditions Due to Asymptotic Stability and Global Domain of Attraction

[Figure omitted from transcript: convergence paths in the (Z1, Z2) plane]

Page 8: Gain Selection

• Choice of the gain sequence a_k is critical to the performance of SA
• Famous conditions for convergence are
    Σ_{k=0}^∞ a_k = ∞  and  Σ_{k=0}^∞ a_k² < ∞
• A common practical choice of gain sequence is (see the sketch after this slide)
    a_k = a/(k + 1 + A)^α, where 1/2 < α ≤ 1, a > 0, and A ≥ 0
• Strictly positive A ("stability constant") allows for larger a (possibly faster convergence) without risking unstable behavior in early iterations
• α and A can usually be pre-specified; critical coefficient a usually chosen by "trial-and-error"
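A short sketch of the practical gain sequence above, with a numerical check of the two convergence conditions: the partial sums of a_k keep growing with the horizon while the partial sums of a_k² level off. The constants a, A, and α here are illustrative assumptions.

```python
import numpy as np

def gain(k, a=0.16, A=100.0, alpha=0.602):
    """Practical SA gain a_k = a / (k + 1 + A)**alpha, with 1/2 < alpha <= 1."""
    return a / (k + 1 + A) ** alpha

for horizon in (10**4, 10**5, 10**6):
    k = np.arange(horizon)
    a_k = gain(k)
    # sum a_k grows without bound as the horizon increases; sum a_k**2 stays bounded
    print(f"n={horizon:>7}: sum a_k = {a_k.sum():8.2f}, sum a_k^2 = {(a_k**2).sum():.4f}")
```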

Page 9: Extensions to Basic Root-Finding SA (Section 4.5 of ISSO)

• Joint Parameter and State Evolution
  – There exists state vector x_k related to system being optimized
  – E.g., state-space model governing evolution of x_k, where model depends on values of θ
• Adaptive Estimation and Higher-Order Algorithms
  – Adaptively estimating gain a_k
  – SA analogues of fast Newton-Raphson search
• Iterate Averaging
  – See slides to follow
• Time-Varying Functions
  – See slides to follow

Page 10: Iterate Averaging

• Iterate averaging is an important and relatively recent development in SA
• Provides means for achieving optimal asymptotic performance without using optimal gains a_k
• Basic iterate average uses following sample mean as final estimate (a sketch follows this slide):
    θ̄_k = (1/(k + 1)) Σ_{j=0}^{k} θ̂_j
• Results in finite-sample practice are mixed
• Success relies on large proportion of individual iterates hovering in some balanced way around θ*
  – Many practical problems have iterate approaching θ* in roughly monotonic manner
  – Monotonicity not consistent with good performance of iterate averaging; see plot on following slide
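A minimal sketch of iterate averaging on an assumed scalar problem: run the basic recursion with a slowly decaying gain (α < 1) and report the sample mean of the iterates as the final estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
g = lambda th: th - 2.0                    # assumed toy g with root theta* = 2

theta, iterates = 0.0, []
for k in range(20_000):
    y_k = g(theta) + rng.normal(scale=1.0)
    theta -= y_k / (k + 1) ** 0.7          # gain decaying slower than 1/k
    iterates.append(theta)

theta_bar = np.mean(iterates)              # iterate average as the final estimate
print(f"last iterate: {iterates[-1]:.3f}, averaged estimate: {theta_bar:.3f}")
```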

Page 11: Contrasting Search Paths for Typical p = 2 Problem: Ineffective and Effective Uses of Iterate Averaging

[Figure omitted from transcript]

Page 12: Time-Varying Functions

• In some problems, the root-finding function varies with iteration: g_k(θ) (rather than g(θ))
  – Adaptive control with time-varying target vector
  – Experimental design with user-specified input values
  – Signal processing based on Markov models (Subsection 4.5.1 of ISSO)
• Let θ_k* denote the root to g_k(θ) = 0
• Suppose that θ_k* → θ* for some fixed value θ* (equivalent to the fixed θ* in conventional root-finding)
• In such cases, much standard theory continues to apply (a sketch follows this slide)
• Plot on following slide shows case when g_k(θ) represents a gradient function with scalar θ
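An assumed toy illustration of the time-varying setting: the functions g_k(θ) = θ − θ_k* have roots θ_k* = 2 + 1/(k + 1) converging to θ* = 2, and the standard recursion still converges to θ*.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 0.0
for k in range(20_000):
    root_k = 2.0 + 1.0 / (k + 1)                      # theta_k* -> theta* = 2
    y_k = (theta - root_k) + rng.normal(scale=0.5)    # noisy measurement of g_k(theta)
    theta -= y_k / (k + 1)                            # standard decaying gain a_k = 1/(k+1)
print(f"final estimate: {theta:.3f} (limiting root theta* = 2)")
```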

Page 13: Time-Varying g_k(θ) = ∂L_k(θ)/∂θ for Loss Functions with Limiting Minimum

[Figure omitted from transcript]