
Page 1: A Bayesian Approach to Empirical Local Linearization for Robotics

A Bayesian Approach to Empirical Local Linearization for Robotics

Jo-Anne Ting¹, Aaron D’Souza², Sethu Vijayakumar³, Stefan Schaal¹

¹University of Southern California, ²Google, Inc., ³University of Edinburgh

ICRA 2008, May 23, 2008

Page 2: A Bayesian Approach to Empirical Local Linearization for Robotics

Outline

• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions

Page 3: A Bayesian Approach to Empirical Local Linearization for Robotics

Motivation

• Locally linear methods have been shown to be useful for robot control (e.g., learning internal models of high-dimensional systems for feedforward control, or local linearizations for optimal control & reinforcement learning).

• A key problem is to find the “right” size of the local region for a linearization, as in locally weighted regression.

• Existing methods* use either cross-validation techniques or complex statistical hypothesis testing, or require significant manual parameter tuning for good & stable performance.

*e.g., supersmoothing (Friedman, ’84), LWPR (Vijayakumar et al., ’05), (Fan & Gijbels, ’92 & ’95)


Page 4: A Bayesian Approach to Empirical Local Linearization for Robotics

Outline

• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions

Page 5: A Bayesian Approach to Empirical Local Linearization for Robotics

Quick Review of Locally Weighted Regression

• Given a nonlinear regression problem, y = f(x) + ε, our goal is to approximate a locally linear model at each query point x_q in order to make the prediction: y_q = bᵀ x_q

• We compute the measure of locality for each data sample with a spatial weighting kernel K, e.g., w_i = K(x_i, x_q, h).

• If we can find the “right” local regime for each x_q, nonlinear function approximation may be solved accurately and efficiently.

• Previous methods may:
  i) Be sensitive to initial values
  ii) Require tuning/setting of open parameters
  iii) Be computationally involved

[Figure: weighting kernel over data in X-Y space]
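For orientation, here is a minimal sketch of a standard LWR prediction at a single query point with a Gaussian weighting kernel whose bandwidth h is picked by hand; this is exactly the open parameter the rest of the talk is about learning. The function name lwr_predict and the toy sine data are illustrative assumptions, and a bias column is added so the local model has an offset (the slide’s y_q = bᵀ x_q assumes suitably centered data).

```python
import numpy as np

def lwr_predict(X, y, x_q, h):
    """Locally weighted linear regression prediction at a query point x_q.

    X : (N, d) training inputs, y : (N,) training targets,
    x_q : (d,) query point, h : scalar bandwidth of the Gaussian kernel.
    """
    # Weight each training sample by its distance to the query point.
    diff = X - x_q                                           # (N, d)
    w = np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h ** 2)    # w_i = K(x_i, x_q, h)

    # Augment inputs with a bias term so the local model has an offset.
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.diag(w)

    # Weighted least squares: b = (Xa^T W Xa)^{-1} Xa^T W y
    b = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)

    # Prediction y_q = b^T [x_q, 1]
    return b @ np.append(x_q, 1.0)

# Example: noisy sine data, prediction at x_q = 0.5 with a hand-picked bandwidth.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwr_predict(X, y, np.array([0.5]), h=0.3))
```

The quality of the prediction hinges entirely on the hand-picked h, which is the motivation for learning it from data.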

Page 6: A Bayesian Approach to Empirical Local Linearization for Robotics

Outline

• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions

Page 7: A Bayesian Approach to Empirical Local Linearization for Robotics

Bayesian Locally Weighted Regression

• Our variational Bayesian algorithm:
  i. Learns both b and the optimal h
  ii. Handles high-dimensional data
  iii. Associates a scalar indicator weight w_i with each data sample

• We assume the following prior distributions:

[Graphical model: plates over data samples i = 1,..,N and input dimensions m = 1,..,d, with nodes y_i, x_im, w_im, h_m, b_m, σ² and hyperparameters a_hm, b_hm, n, σ_N²]

  p(y_i | x_i) ~ Normal(bᵀ x_i, σ²)
  p(b | σ²) ~ Normal(0, σ² Σ_b,0)
  p(σ²) ~ Scaled-Inv-χ²(n, σ_N²)

  where each data sample has a weight w_i:

  w_i = ∏_{m=1}^{d} w_im, where p(w_im) ~ Bernoulli([1 + (x_im − x_qm)^r h_m]⁻¹)
  h_m ~ Gamma(a_hm, b_hm)
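To make the weighting concrete, here is a small sketch of how the multiplicative Bernoulli kernel above turns per-dimension distances from the query point into a scalar weight per sample, using the Bernoulli means q_im = [1 + (x_im − x_qm)^r h_m]⁻¹. The function name expected_weights is illustrative, the exponent is assumed even (r = 2 here), and the bandwidths h_m are taken as already known; the variational updates that actually learn h_m are in the paper and are not reproduced here.

```python
import numpy as np

def expected_weights(X, x_q, h, r=2):
    """Expected sample weights under the multiplicative Bernoulli kernel.

    Each dimension contributes q_im = 1 / (1 + (x_im - x_qm)^r * h_m), and the
    sample weight is the product over dimensions, w_i = prod_m q_im.
    X : (N, d) inputs, x_q : (d,) query point, h : (d,) per-dimension bandwidths.
    """
    diff = X - x_q                            # (N, d) distances to the query point
    q = 1.0 / (1.0 + (diff ** r) * h)         # Bernoulli means q_im
    return np.prod(q, axis=1)                 # w_i

# Example: weights fall off with distance from the query point in every dimension.
X = np.array([[0.0, 0.0], [0.5, 0.1], [2.0, 1.0]])
print(expected_weights(X, x_q=np.array([0.0, 0.0]), h=np.array([5.0, 5.0])))
```

Larger h_m shrinks the kernel along dimension m; h_m near zero makes that dimension effectively irrelevant to the weighting.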

Page 8: A Bayesian Approach to Empirical Local Linearization for Robotics

Inference Procedure

• We can treat this as an EM learning problem (Dempster & Laird, ’77):

  Maximize L, where L = Σ_{i=1}^{N} log p(y_i, w_i, b, z, σ², h | x_i)

  where
  L = Σ_{i=1}^{N} w_i log p(y_i | x_i, b, σ²) + Σ_{i=1}^{N} Σ_{m=1}^{d} log p(w_im) + log p(b | σ²) + log p(σ²) + log p(h)

• We use a variational factorial approximation of the true joint posterior distribution* (e.g., Ghahramani & Beal, ’00) and a variational approximation on concave/convex functions, as suggested by (Jaakkola & Jordan, ’00), to get analytically tractable inference.

  *Q(b, σ², z, h) = Q(b, σ², z) Q(h)

Page 9: A Bayesian Approach to Empirical Local Linearization for Robotics

Important Things to Note

• For each local model, our algorithm:

i. Learns the optimal bandwidth value, h (i.e., the “appropriate” local regime)

ii. Is linear in the number of input dimensions per EM iteration (for an extended model with intermediate hidden variables, z, introduced for fast computation)

iii. Provides a natural framework to incorporate prior knowledge of the strong (or weak) presence of noise

Page 10: A Bayesian Approach to Empirical Local Linearization for Robotics

Outline

• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions

Page 11: A Bayesian Approach to Empirical Local Linearization for Robotics

Experimental Results: Synthetic data

Function with discontinuity + N(0, 0.3025) output noise

Function with increasing curvature + N(0, 0.01) output noise

[Figures: Y vs. X plots for each function]

Page 12: A Bayesian Approach to Empirical Local Linearization for Robotics

Experimental Results: Synthetic data

Function with peak + N(0, 0.09) output noise

Straight line (notice “flat” kernels are learnt)

Page 13: A Bayesian Approach to Empirical Local Linearization for Robotics

Experimental Results: Synthetic data

[Figure panels: Target function, Kernel Shaping, Gaussian Process regression]

2D “cross” function* + N(0, 0.01)

[Figure: learnt kernels from Kernel Shaping]

*Training data has 500 samples and mean-zero noise with variance of 0.01 added to outputs.
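The slide does not write out the “cross” function; a version commonly used in the LWPR literature is y = max{exp(-10 x1²), exp(-50 x2²), 1.25 exp(-5(x1² + x2²))}. Assuming that version and inputs drawn from [-1, 1]², a data set matching the footnote (500 samples, mean-zero noise with variance 0.01 on the outputs) could be generated as in this sketch.

```python
import numpy as np

def cross_2d(x1, x2):
    """2-D 'cross' benchmark as it usually appears in the LWPR literature
    (assumed here; the slide does not spell the function out)."""
    return np.max(np.stack([np.exp(-10 * x1 ** 2),
                            np.exp(-50 * x2 ** 2),
                            1.25 * np.exp(-5 * (x1 ** 2 + x2 ** 2))]), axis=0)

# 500 training samples with N(0, 0.01) output noise, as stated in the footnote.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = cross_2d(X[:, 0], X[:, 1]) + np.sqrt(0.01) * rng.standard_normal(500)
```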

Page 14: A Bayesian Approach to Empirical Local Linearization for Robotics

Experimental Results: Robot arm data

• Given a kinematics problem for a 7-DOF robot arm:

  p = f(θ)

  where the input data consists of the 7 arm joint angles θ, and p = [x y z]ᵀ is the resulting position of the arm’s end-effector in Cartesian space,

  we want to estimate the Jacobian, J, to establish that the algorithm does the right thing for each local regression problem:

  dp/dt = (df(θ)/dθ) (dθ/dt), where J = df(θ)/dθ is the quantity to be estimated.

• For a particular local linearization problem, we compare the estimated Jacobian using BLWR, J_BLWR, to the:
  • Analytically computed Jacobian, J_A
  • Estimated Jacobian using locally weighted regression, J_LWR (where the optimal distance metric is found with cross-validation).
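As a rough sketch of the regression problem behind each local linearization (not the paper’s exact setup): near a query posture, ṗ ≈ J θ̇, so each Cartesian output dimension can be regressed on the joint velocities, with samples weighted by how close their joint angles are to the query posture (e.g., with a kernel like the one sketched earlier). The function name estimate_jacobian is an illustrative assumption.

```python
import numpy as np

def estimate_jacobian(theta_dot, p_dot, weights):
    """Local linear estimate of the Jacobian from sampled arm data.

    Near a query posture, p_dot ≈ J theta_dot, so every Cartesian output is
    regressed on the joint velocities in one weighted least-squares solve.
    theta_dot : (N, 7) joint velocities, p_dot : (N, 3) end-effector velocities,
    weights : (N,) locality weights for the query posture.  Returns J : (3, 7).
    """
    W = np.diag(weights)
    # Weighted least squares solved for all three outputs at once:
    # J^T = (theta_dot^T W theta_dot)^{-1} theta_dot^T W p_dot
    JT = np.linalg.solve(theta_dot.T @ W @ theta_dot, theta_dot.T @ W @ p_dot)
    return JT.T
```

With weights produced by a fixed-bandwidth kernel this is plain LWR; the point of BLWR is that the bandwidths defining those weights are learned rather than tuned.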

Page 15: A Bayesian Approach to Empirical Local Linearization for Robotics

Angular & Magnitude Differences of Jacobians

• We compare each of the estimated Jacobian matrices, J_LWR & J_BLWR, with the analytically computed Jacobian, J_A.

• Specifically, we calculate the angular & magnitude differences between the row vectors of the Jacobian matrices:

• Observations:
  • BLWR & LWR (with an optimally tuned distance metric) perform similarly
  • The problem is ill-conditioned and not as easy to solve as it may appear.
  • Angular differences for J2 are large, but magnitudes of vectors are small.

[Figure: e.g., comparing the 1st row vector of J_A, J_A,1, with the 1st row vector of J_BLWR, J_BLWR,1]
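A small sketch of the comparison metrics used here and in the tables at the end: the angle between corresponding row vectors and the absolute difference of their norms (row_comparison is an illustrative name).

```python
import numpy as np

def row_comparison(J_a, J_est):
    """Angular (degrees) and magnitude differences between corresponding rows
    of the analytical Jacobian J_a and an estimated Jacobian J_est."""
    angles, mag_diffs = [], []
    for a, b in zip(J_a, J_est):
        # Angle between the two row vectors, clipped for numerical safety.
        cos = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos)))
        # Absolute difference of the row-vector magnitudes.
        mag_diffs.append(abs(np.linalg.norm(a) - np.linalg.norm(b)))
    return angles, mag_diffs
```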

Page 16: A Bayesian Approach to Empirical Local Linearization for Robotics

Outline

• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions

Page 17: A Bayesian Approach to Empirical Local Linearization for Robotics

Conclusions

• We have a Bayesian formulation of spatially locally adaptive kernels that:

i. Learns the optimal bandwidth value, h (i.e., the “appropriate” local regime)
ii. Is computationally efficient
iii. Provides a natural framework to incorporate prior knowledge of the noise level

• Extensions to high-dimensional data with redundant & irrelevant input dimensions, an incremental version, embedding in other nonlinear methods, etc. are ongoing.

Page 18: A Bayesian Approach to Empirical Local Linearization for Robotics

Angular & Magnitude Differences of Jacobians

Between analytical Jacobian J_A & inferred Jacobian J_BLWR:

Ji | ∠J_A,i − ∠J_BLWR,i | abs(|J_A,i| − |J_BLWR,i|) | |J_A,i| | |J_BLWR,i|
J1 | 19 degrees | 0.1129 | 0.5280 | 0.6464
J2 | 79 degrees | 0.2353 | 0.2780 | 0.0427
J3 | 25 degrees | 0.1071 | 0.4687 | 0.5758

Between analytical Jacobian J_A & inferred Jacobian of LWR (with D = 0.1) J_LWR:

Ji | ∠J_A,i − ∠J_LWR,i | abs(|J_A,i| − |J_LWR,i|) | |J_A,i| | |J_LWR,i|
J1 | 16 degrees | 0.1182 | 0.5280 | 0.6411
J2 | 85 degrees | 0.2047 | 0.2780 | 0.0734
J3 | 27 degrees | 0.1216 | 0.4687 | 0.5903

Observations:
i) BLWR & LWR (with an optimally tuned D) perform similarly
ii) Problem is ill-conditioned (condition number is very high, ~1e5).
iii) Angular differences for J2 are large, but magnitudes of vectors are small.