TRANSCRIPT
A Bayesian Approach to Empirical Local Linearization for Robotics
Jo-Anne Ting (1), Aaron D'Souza (2), Sethu Vijayakumar (3), Stefan Schaal (1)
(1) University of Southern California, (2) Google, Inc., (3) University of Edinburgh
ICRA 2008, May 23, 2008
Outline
• Motivation
• Past & related work
• Bayesian locally weighted regression
• Experimental results
• Conclusions
Motivation
• Locally linear methods have been shown to be useful for robot control (e.g., learning internal models of high-dimensional systems for feedforward control, or local linearizations for optimal control & reinforcement learning).
• A key problem is to find the "right" size of the local region for a linearization, as in locally weighted regression.
• Existing methods* either use cross-validation techniques or complex statistical hypothesis testing, or require significant manual parameter tuning for good & stable performance.
*e.g., supersmoothing (Friedman, '84), LWPR (Vijayakumar et al., '05), (Fan & Gijbels, '92 & '95)
Quick Review of Locally Weighted Regression
• Given a nonlinear regression problem, $y = f(\mathbf{x}) + \epsilon$, our goal is to approximate a locally linear model at each query point $\mathbf{x}_q$ in order to make the prediction $y_q = \mathbf{b}^T \mathbf{x}_q$.
• We compute the measure of locality for each data sample with a spatial weighting kernel K, e.g., $w_i = K(\mathbf{x}_i, \mathbf{x}_q, h)$.
• If we can find the "right" local regime for each $\mathbf{x}_q$, nonlinear function approximation may be solved accurately and efficiently (see the sketch below).
Previous methods may:
i) be sensitive to initial values
ii) require tuning/setting of open parameters
iii) be computationally involved
[Figure: weighting kernel over the input space, axes X and Y]
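For concreteness, here is a minimal numpy sketch of the generic locally weighted regression prediction described above. The Gaussian kernel, the appended bias term, and the small ridge term are illustrative assumptions, not choices prescribed by the slides, which only specify the generic form $w_i = K(\mathbf{x}_i, \mathbf{x}_q, h)$ and $y_q = \mathbf{b}^T \mathbf{x}_q$.

```python
import numpy as np

def lwr_predict(X, y, x_q, h, ridge=1e-8):
    """Locally weighted linear regression prediction at a query point x_q.

    X: (N, d) inputs, y: (N,) targets, x_q: (d,) query, h: bandwidth.
    Gaussian kernel, bias term, and ridge term are illustrative choices.
    """
    # Spatial weights w_i = K(x_i, x_q, h): here a Gaussian kernel.
    d2 = np.sum((X - x_q) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / h ** 2)

    # Weighted least squares for the local linear model y ~ b^T x (+ bias).
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])       # append bias column
    W = np.diag(w)
    A = Xa.T @ W @ Xa + ridge * np.eye(Xa.shape[1])
    b = np.linalg.solve(A, Xa.T @ W @ y)

    # Prediction y_q = b^T x_q (plus the bias).
    return b[:-1] @ x_q + b[-1]

# Example: noisy 1-D sine, query at x = 0.3 (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
print(lwr_predict(X, y, np.array([0.3]), h=0.2))
```

Everything hinges on the bandwidth h: too small and the fit is noisy, too large and the linear model averages over the nonlinearity, which is exactly the open-parameter problem the slides address.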
Bayesian Locally Weighted Regression
• Our variational Bayesian algorithm:
i. Learns both b and the optimal h
ii. Handles high-dimensional data
iii. Associates a scalar indicator weight w_i with each data sample
• We assume the following prior distributions:
[Graphical model: for each data sample i = 1,..,N and each input dimension m = 1,..,d, the observed input x_im and output y_i, the latent indicator weight w_im, the regression coefficients b_m, the noise variance σ² with hyperparameters n and σ_N², and the bandwidths h_m with hyperparameters a_hm and b_hm]
$$p(y_i \mid \mathbf{x}_i) \sim \text{Normal}(\mathbf{b}^T \mathbf{x}_i, \sigma^2)$$
$$p(\mathbf{b} \mid \sigma^2) \sim \text{Normal}(\mathbf{0}, \sigma^2 \boldsymbol{\Sigma}_{b0})$$
$$p(\sigma^2) \sim \text{Scaled-Inv-}\chi^2(n, \sigma_N^2)$$

where each data sample has a weight $w_i$:

$$w_i = \prod_{m=1}^{d} w_{im}, \quad p(w_{im}) \sim \text{Bernoulli}\!\left(\left[1 + (x_{im} - x_{qm})^{r} h_m\right]^{-1}\right), \quad h_m \sim \text{Gamma}(a_{h_m}, b_{h_m})$$
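Read as a kernel, the Bernoulli mean above shrinks toward zero as x_im moves away from the query x_qm, at a rate controlled by the learnt bandwidth h_m. Below is a minimal numpy sketch that evaluates these per-dimension means and their product weight w_i; treating the Bernoulli means as soft weights, fixing the exponent r = 2, and using the absolute distance are illustrative assumptions, not statements from the slides.

```python
import numpy as np

def kernel_weights(X, x_q, h, r=2):
    """Soft weights implied by the Bernoulli prior on w_im.

    q_im = 1 / (1 + |x_im - x_qm|^r * h_m) follows the slide's formula
    (absolute distance and r = 2 are assumptions to keep it positive).
    Larger h_m shrinks the local regime in dimension m.
    """
    q = 1.0 / (1.0 + np.abs(X - x_q) ** r * h)   # (N, d) per-dimension weights
    return q.prod(axis=1)                        # w_i = prod_m w_im

# Example: two input dimensions with very different learnt bandwidths.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(5, 2))
x_q = np.zeros(2)
h = np.array([100.0, 0.1])   # dim 0: narrow local regime, dim 1: nearly global
print(kernel_weights(X, x_q, h))
```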
Inference Procedure
• We can treat this as an EM learning problem (Dempster & Laird, '77): maximize $L$, where

$$L = \sum_{i=1}^{N} \log p(y_i, w_i, \mathbf{b}, \mathbf{z}, \sigma^2, h \mid \mathbf{x}_i)$$

which is written out as

$$L = \sum_{i=1}^{N} w_i \log p(y_i \mid \mathbf{x}_i, \mathbf{b}, \sigma^2) + \sum_{i=1}^{N}\sum_{m=1}^{d} \log p(w_{im}) + \log p(\mathbf{b} \mid \sigma^2) + \log p(\sigma^2) + \log p(h)$$

• We use a variational factorial approximation of the true joint posterior distribution* (e.g., Ghahramani & Beal, '00) and a variational approximation on concave/convex functions, as suggested by Jaakkola & Jordan ('00), to get analytically tractable inference.

*$Q(\mathbf{b}, \mathbf{z}, \sigma^2, h) = Q(\mathbf{b}, \mathbf{z}, \sigma^2)\, Q(h)$
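As a rough illustration of the structure of L, the sketch below evaluates its two data-dependent sums for given current estimates of b, σ², and h, reusing the soft-weight interpretation from the previous sketch. The prior terms log p(b | σ²), log p(σ²), and log p(h) are omitted, the binary indicators are replaced by their Bernoulli means, and the actual variational E/M update equations of the paper are not reproduced here; all of these are simplifying assumptions for illustration only.

```python
import numpy as np

def data_terms_of_L(X, y, x_q, b, sigma2, h, r=2):
    """Data-dependent terms of L for current estimates of b, sigma^2, h.

    The binary w_im are replaced by their Bernoulli means q_im (a soft,
    EM-style relaxation used only for illustration); prior terms over
    b, sigma^2, and h are left out. r = 2 is an assumed exponent.
    """
    q = 1.0 / (1.0 + np.abs(X - x_q) ** r * h)       # Bernoulli means q_im
    w = q.prod(axis=1)                               # w_i = prod_m w_im

    # sum_i w_i log p(y_i | x_i, b, sigma^2), with p = Normal(b^T x_i, sigma^2)
    resid = y - X @ b
    data_term = np.sum(w * (-0.5 * np.log(2 * np.pi * sigma2)
                            - resid ** 2 / (2 * sigma2)))

    # sum_i sum_m log p(w_im), evaluated at the soft indicators q_im
    weight_term = np.sum(q * np.log(q) + (1 - q) * np.log1p(-q))

    return data_term + weight_term

# Example call with arbitrary current estimates (illustrative values only).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))
y = X @ np.array([1.0, -0.5]) + 0.1 * rng.standard_normal(50)
print(data_terms_of_L(X, y, np.zeros(2), np.array([1.0, -0.5]), 0.01,
                      np.array([5.0, 5.0])))
```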
Important Things to Note
• For each local model, our algorithm:
i. Learns the optimal bandwidth value, h (i.e., the "appropriate" local regime)
ii. Is linear in the number of input dimensions per EM iteration (for an extended model with intermediate hidden variables, z, introduced for fast computation)
iii. Provides a natural framework to incorporate prior knowledge of the strong (or weak) presence of noise
Experimental Results: Synthetic data
[Figure: function with discontinuity + N(0, 0.3025) output noise; function with increasing curvature + N(0, 0.01) output noise; axes X vs. Y]
Experimental Results: Synthetic data
[Figure: function with peak + N(0, 0.09) output noise; straight line (notice that "flat" kernels are learnt)]
Experimental Results: Synthetic data
[Figure: 2D "cross" function* + N(0, 0.01) output noise; panels show the target function, the fit from kernel shaping, Gaussian process regression, and the kernels learnt by kernel shaping]
*Training data has 500 samples; mean-zero noise with variance 0.01 is added to the outputs.
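For readers who want to reproduce a comparable setup, the sketch below generates 500 noisy training samples from one commonly used form of the 2D "cross" function in the locally weighted learning literature. Whether the slides use this exact functional form and the [-1, 1]² input range is an assumption; only the sample count and noise variance come from the footnote above.

```python
import numpy as np

def cross_function(x1, x2):
    """A common form of the 2D "cross" benchmark function; the exact form
    used in the slides is an assumption."""
    return np.maximum.reduce([np.exp(-10 * x1 ** 2),
                              np.exp(-50 * x2 ** 2),
                              1.25 * np.exp(-5 * (x1 ** 2 + x2 ** 2))])

# 500 training samples with mean-zero output noise of variance 0.01,
# matching the setup described in the footnote; input range is assumed.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = cross_function(X[:, 0], X[:, 1]) + rng.normal(0.0, np.sqrt(0.01), size=500)
```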
Experimental Results: Robot arm data
• Given a kinematics problem for a 7-DOF robot arm, $\mathbf{p} = f(\boldsymbol{\theta})$, where the input data consists of the 7 arm joint angles $\boldsymbol{\theta}$ and $\mathbf{p} = [x\ y\ z]^T$ is the resulting position of the arm's end effector in Cartesian space, we want to estimate the Jacobian, J, for the purpose of establishing that the algorithm does the right thing for each local regression problem:

$$\frac{d\mathbf{p}}{dt} = \underbrace{\frac{df(\boldsymbol{\theta})}{d\boldsymbol{\theta}}}_{\mathbf{J}\,=\,?}\,\frac{d\boldsymbol{\theta}}{dt}$$

(a local linearization of $f$ at a query posture yields an estimate of J; see the sketch below)
• For a particular local linearization problem, we compare the estimated Jacobian using BLWR, $\mathbf{J}_{BLWR}$, to the:
• analytically computed Jacobian, $\mathbf{J}_A$
• estimated Jacobian using locally weighted regression, $\mathbf{J}_{LWR}$ (where the optimal distance metric is found with cross-validation)
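To make the connection between local linearization and Jacobian estimation concrete, here is a minimal numpy sketch that fits a weighted linear model p ≈ Jθ + const around a query posture and reads the slope off as the Jacobian estimate. It is a generic LWR stand-in with a Gaussian kernel and a toy forward map, not the Bayesian algorithm or the real arm kinematics from the slides; the bandwidth, sample sizes, and toy map are illustrative assumptions.

```python
import numpy as np

def estimate_jacobian_lwr(thetas, ps, theta_q, h, ridge=1e-8):
    """Estimate J = df/dtheta at theta_q from samples (theta_i, p_i) by
    locally weighted linear regression (generic Gaussian-kernel LWR,
    not the slides' Bayesian algorithm).

    thetas: (N, 7) joint angles, ps: (N, 3) end-effector positions.
    Returns a (3, 7) Jacobian estimate.
    """
    w = np.exp(-0.5 * np.sum((thetas - theta_q) ** 2, axis=1) / h ** 2)
    Ta = np.hstack([thetas, np.ones((thetas.shape[0], 1))])  # add constant term
    W = np.diag(w)
    A = Ta.T @ W @ Ta + ridge * np.eye(Ta.shape[1])
    B = np.linalg.solve(A, Ta.T @ W @ ps)        # (8, 3): stacked [J^T; offset]
    return B[:-1].T                              # local slope = Jacobian estimate

# Tiny check with a toy "kinematics" map (not a real arm model).
rng = np.random.default_rng(0)
A_true = rng.standard_normal((3, 7))
f = lambda th: np.sin(th @ A_true.T)             # toy forward map; true J at 0 is A_true
theta_q = np.zeros(7)
thetas = theta_q + 0.05 * rng.standard_normal((500, 7))
ps = np.array([f(t) for t in thetas])
J_hat = estimate_jacobian_lwr(thetas, ps, theta_q, h=0.05)
print(np.abs(J_hat - A_true).max())              # should be small
```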
Angular & Magnitude Differences of Jacobians
• We compare each of the estimated Jacobian matrices, $\mathbf{J}_{LWR}$ & $\mathbf{J}_{BLWR}$, with the analytically computed Jacobian, $\mathbf{J}_A$.
• Specifically, we calculate the angular & magnitude differences between the row vectors of the Jacobian matrices (see the sketch below).
• Observations:
• BLWR & LWR (with an optimally tuned distance metric) perform similarly.
• The problem is ill-conditioned and not as easy to solve as it may appear.
• Angular differences for J2 are large, but the magnitudes of the vectors are small.
[Figure: e.g., the angle between the 1st row vector of J_A and the 1st row vector of J_BLWR]
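The quantities tabulated in the appendix slide can be computed with a few lines of numpy; the sketch below returns, for each row i, the angle between J_A,i and the corresponding estimated row, and the absolute difference of their norms. The example matrices at the end are arbitrary illustrative values, not the arm data.

```python
import numpy as np

def row_differences(J_a, J_est):
    """Angular (degrees) and magnitude differences between corresponding
    row vectors of an analytical Jacobian J_a and an estimated one J_est."""
    angles, mag_diffs = [], []
    for a, e in zip(J_a, J_est):
        cos = a @ e / (np.linalg.norm(a) * np.linalg.norm(e))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
        mag_diffs.append(abs(np.linalg.norm(a) - np.linalg.norm(e)))
    return np.array(angles), np.array(mag_diffs)

# Example with arbitrary 3x7 matrices (illustrative values only).
rng = np.random.default_rng(0)
J_a = rng.standard_normal((3, 7))
J_est = J_a + 0.1 * rng.standard_normal((3, 7))
print(row_differences(J_a, J_est))
```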
Conclusions
• We have a Bayesian formulation of spatially locally adaptive kernels that:
i. Learns the optimal bandwidth value, h (i.e., the "appropriate" local regime)
ii. Is computationally efficient
iii. Provides a natural framework to incorporate prior knowledge of the noise level
• Extensions to high-dimensional data with redundant & irrelevant input dimensions, an incremental version, embedding in other nonlinear methods, etc. are ongoing.
Angular & Magnitude Differences of Jacobians
Between analytical Jacobian J_A & inferred Jacobian J_BLWR:

Ji | ∠J_A,i − ∠J_BLWR,i | abs(|J_A,i| − |J_BLWR,i|) | |J_A,i| | |J_BLWR,i|
J1 | 19 degrees | 0.1129 | 0.5280 | 0.6464
J2 | 79 degrees | 0.2353 | 0.2780 | 0.0427
J3 | 25 degrees | 0.1071 | 0.4687 | 0.5758

Between analytical Jacobian J_A & inferred Jacobian of LWR (with D = 0.1), J_LWR:

Ji | ∠J_A,i − ∠J_LWR,i | abs(|J_A,i| − |J_LWR,i|) | |J_A,i| | |J_LWR,i|
J1 | 16 degrees | 0.1182 | 0.5280 | 0.6411
J2 | 85 degrees | 0.2047 | 0.2780 | 0.0734
J3 | 27 degrees | 0.1216 | 0.4687 | 0.5903
Observations:
i) BLWR & LWR (with an optimally tuned D) perform similarly.
ii) The problem is ill-conditioned (the condition number is very high, ~1e5).
iii) Angular differences for J2 are large, but the magnitudes of the vectors are small.