Learning from Demonstrations Jur van den Berg

Post on 22-Feb-2016

Page 1: Learning from Demonstrations

Learning from Demonstrations

Jur van den Berg

Page 2: Learning from Demonstrations

Kalman Filtering and Smoothing

• Dynamics and observation model:

$x_{t+1} = A x_t + w_t, \quad w_t \sim N(0, Q)$

$y_t = C x_t + v_t, \quad v_t \sim N(0, R)$

• Kalman Filter:
– Compute $X_t \mid Y_0 = y_0, \ldots, Y_t = y_t$
– Real-time, given data so far

• Kalman Smoother:
– Compute $X_t \mid Y_0 = y_0, \ldots, Y_T = y_T$, $0 \le t \le T$
– Post-processing, given all data
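The difference between filtering and smoothing can be sketched for a scalar system. This is a minimal illustration, not the lecture's implementation: it assumes one-dimensional x and y, a Gaussian prior on x_0 (`mean0`, `var0`), and the Rauch-Tung-Striebel form of the smoother, which the slides do not name. The function and argument names are hypothetical.

```python
def kalman_filter(ys, a, c, q, r, mean0, var0):
    """Scalar Kalman filter: estimate x_t from y_0..y_t (data so far)."""
    filt_means, filt_vars = [], []
    pred_means, pred_vars = [], []
    mean, var = mean0, var0  # prior on x_0 (an assumption of this sketch)
    for y in ys:
        pred_means.append(mean)
        pred_vars.append(var)
        # Measurement update for y_t = c * x_t + v_t, v_t ~ N(0, r).
        gain = var * c / (c * var * c + r)
        mean = mean + gain * (y - c * mean)
        var = (1.0 - gain * c) * var
        filt_means.append(mean)
        filt_vars.append(var)
        # Time update for x_{t+1} = a * x_t + w_t, w_t ~ N(0, q).
        mean = a * mean
        var = a * var * a + q
    return filt_means, filt_vars, pred_means, pred_vars


def kalman_smoother(ys, a, c, q, r, mean0, var0):
    """Backward pass: estimate x_t from all data y_0..y_T."""
    fm, fv, pm, pv = kalman_filter(ys, a, c, q, r, mean0, var0)
    sm, sv = fm[:], fv[:]  # at t = T the smoothed and filtered estimates coincide
    for t in range(len(ys) - 2, -1, -1):
        g = fv[t] * a / pv[t + 1]  # smoother gain
        sm[t] = fm[t] + g * (sm[t + 1] - pm[t + 1])
        sv[t] = fv[t] + g * (sv[t + 1] - pv[t + 1]) * g
    return sm, sv
```

Because the smoother also uses later measurements, its variance at each time step is never larger than the filter's.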

Page 3: Learning from Demonstrations

EM Algorithm

• Kalman smoother:
– Compute distributions X0, …, Xt given parameters A, C, Q, R, and data y0, …, yt.

• EM Algorithm:
– Simultaneously optimize X0, …, Xt and A, C, Q, R given data y0, …, yt.

$x_{t+1} = A x_t + w_t, \quad w_t \sim N(0, Q)$

$y_t = C x_t + v_t, \quad v_t \sim N(0, R)$
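In its generic textbook form, EM for this model alternates the two steps below. The iteration index $(k)$ and the shorthand $\theta$ for the parameters are notation introduced for this sketch, not taken from the slides.

```latex
\begin{aligned}
\text{E-step:}\quad & q(x_{0:t}) = p\big(x_{0:t} \mid y_{0:t};\ \theta^{(k)}\big),
  \qquad \theta = (A, C, Q, R) \\
\text{M-step:}\quad & \theta^{(k+1)} = \arg\max_{\theta}\
  \mathbb{E}_{q}\big[\log p(x_{0:t},\, y_{0:t};\ \theta)\big]
\end{aligned}
```

The E-step is what the Kalman smoother computes, and the M-step re-estimates the parameters from the smoothed distributions.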

Page 4: Learning from Demonstrations

Learning from Demonstrations

• Application of EM-algorithm
• Example:
– Autonomous helicopter aerobatics
– Autonomous surgical tasks (knot-tying)

Page 5: Learning from Demonstrations

Motivation

• Learning an ideal “trajectory” of system
• Human provides demonstrations of ideal trajectory
• Human demonstrations imperfect
• Multiple demonstrations implicitly encode ideal trajectory
• Task: infer ideal trajectory from demonstrations

Page 6: Learning from Demonstrations

Acquiring Demonstrations

• Known system dynamics (A, B, Q)
• Observations with known sensors (C, R)
– Inertial measurement unit
– GPS
– Cameras

• Use Kalman smoother to optimally estimate states x along demonstration trajectory

$x_{t+1} = A x_t + B u_t + w_t, \quad w_t \sim N(0, Q)$

$y_t = C x_t + v_t, \quad v_t \sim N(0, R)$

Page 7: Learning from Demonstrations

Multiple Demonstrations

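• D demonstration trajectories of duration $T_j$

$d_t^j = \begin{bmatrix} x_t^j \\ u_t^j \end{bmatrix}, \quad j = 1, \ldots, D, \quad t = 1, \ldots, T_j$

• Hidden ideal trajectory z of duration $T^*$

$z_t = \begin{bmatrix} x_t \\ u_t \end{bmatrix}, \quad t = 1, \ldots, T^*$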

Page 8: Learning from Demonstrations

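Model of Ideal Trajectory

• Main idea: use demonstrations as noisy observations of hidden ideal trajectory

• Dynamics of hidden trajectory

$z_{t+1} = \begin{bmatrix} A & B \\ 0 & I \end{bmatrix} z_t + w_t, \quad w_t \sim N\left(0, \begin{bmatrix} Q & 0 \\ 0 & N \end{bmatrix}\right)$

• Observation of hidden trajectory

$\begin{bmatrix} d_t^1 \\ \vdots \\ d_t^D \end{bmatrix} = \begin{bmatrix} I \\ \vdots \\ I \end{bmatrix} z_t + s_t, \quad s_t \sim N\left(0, \begin{bmatrix} S_1 & & 0 \\ & \ddots & \\ 0 & & S_D \end{bmatrix}\right)$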
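Assembling the block matrices of this model is mechanical, and a small sketch may make the structure concrete. It uses plain nested lists so that it stays self-contained; the helper names (`zeros`, `identity`, `block`, `ideal_trajectory_model`) and the output names F, W, H, S are hypothetical and not part of the slides.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def block(grid):
    """Assemble one matrix from a 2-D grid of blocks (nested lists)."""
    out = []
    for block_row in grid:
        for i in range(len(block_row[0])):
            row = []
            for b in block_row:
                row.extend(b[i])
            out.append(row)
    return out


def ideal_trajectory_model(A, B, Q, N, S_list):
    """Build the dynamics and observation matrices for z = [x; u]."""
    n, m = len(A), len(B[0])  # state and control dimensions
    D = len(S_list)           # number of demonstrations
    k = n + m
    F = block([[A, B], [zeros(m, n), identity(m)]])   # dynamics of z
    W = block([[Q, zeros(n, m)], [zeros(m, n), N]])   # process noise covariance
    H = block([[identity(k)] for _ in range(D)])      # D stacked identities
    S = block([[S_list[j] if i == j else zeros(k, k)  # block-diagonal S_1..S_D
                for j in range(D)] for i in range(D)])
    return F, W, H, S
```

For example, with scalar state and control, `ideal_trajectory_model([[1.0]], [[0.5]], [[0.1]], [[0.2]], [identity(2), identity(2)])` returns a 2×2 F and W, a 4×2 H, and a 4×4 S.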

Page 9: Learning from Demonstrations

Inferring Ideal Trajectory

• Dynamics model: Parameter N controls smoothness; A, B, Q known

• Observation model: Parameters S encode relative quality of demonstrations

• Use EM-algorithm with Kalman smoother to simultaneously optimize z and S (and N).

• Initialize S with identity matrices
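One way the update of a single demonstration's covariance could look is sketched below. It assumes the standard EM M-step for the noise covariance of a linear-Gaussian observation model whose observation matrix is the identity; the slides only state that S is optimized, so the exact update used in the lecture is not confirmed here. The function name and its arguments (`demo`, `smoothed_means`, `smoothed_covs`) are hypothetical; the latter two stand for the Kalman smoother's mean and covariance of each z_t.

```python
def update_demo_covariance(demo, smoothed_means, smoothed_covs):
    """Re-estimate S_j as the average of residual outer products
    plus the smoother's remaining uncertainty about z_t."""
    T = len(demo)
    k = len(demo[0])
    S = [[0.0] * k for _ in range(k)]
    for d, mu, P in zip(demo, smoothed_means, smoothed_covs):
        resid = [d[i] - mu[i] for i in range(k)]  # d_t^j minus smoothed z_t
        for i in range(k):
            for j in range(k):
                S[i][j] += (resid[i] * resid[j] + P[i][j]) / T
    return S
```

A demonstration that stays close to the smoothed trajectory ends up with a small S_j and therefore carries more weight in the next smoother pass, which is how S encodes relative quality.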

Page 10: Learning from Demonstrations

Time Warping

• But this assumes demonstrations are of equal length and uniformly paced

• Include Dynamic Time Warping in the EM-algorithm

• Such that demonstrations map temporally onto the ideal trajectory

Page 11: Learning from Demonstrations

Time Warping

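• For each demonstration j, we have function $t_j(t)$

• Maps time t along z to time $t_j(t)$ along $d^j$

• Adapted observation model:

$\begin{bmatrix} d^1_{t_1(t)} \\ \vdots \\ d^D_{t_D(t)} \end{bmatrix} = \begin{bmatrix} I \\ \vdots \\ I \end{bmatrix} z_t + s_t, \quad s_t \sim N\left(0, \begin{bmatrix} S_1 & & 0 \\ & \ddots & \\ 0 & & S_D \end{bmatrix}\right)$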

Page 12: Learning from Demonstrations

Learning Time Warping

• $t_j(t)$ is (initially) unknown
• Assume (initially):
– $T^* = (T_1 + \cdots + T_D) / D$
– $t_j(t) = (T_j / T^*)\, t$

• Adapted EM-algorithm:
– Run Kalman smoother with current S and t
– Optimize S by maximizing likelihood
– Optimize t by maximizing likelihood (Dynamic Time Warping)
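The initial guess for the time warping is a uniform rescaling, which can be written directly. In this sketch the function name `initial_time_warping` is hypothetical, and durations are taken to be numbers of time steps.

```python
def initial_time_warping(durations):
    """Return T* (mean duration) and one linear warp per demonstration."""
    T_star = sum(durations) / len(durations)

    def make_warp(T_j):
        # Maps time t along z to time (T_j / T*) * t along demonstration j.
        return lambda t: (T_j / T_star) * t

    return T_star, [make_warp(T_j) for T_j in durations]
```

With durations 80, 100 and 120, T* is 100 and time 50 on the ideal trajectory maps to roughly 40, 50 and 60 in the three demonstrations.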

Page 13: Learning from Demonstrations

Dynamic Time Warping

• Match demonstration j with z
• Assume that demonstration moves locally
– twice as slow as z
– same pace as z
– twice as fast as z

• Dynamic Programming to find optimal “path”

• Cost function: likelihood of

$d^j_{t_j(t)} = z_t + s_t, \quad s_t \sim N(0, S_j)$
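The dynamic program can be sketched for scalar trajectories as follows. Several choices here are assumptions made for illustration and are not specified in the slides: both endpoints are pinned, a demonstration that is twice as fast is modeled by holding one of its samples for two steps of z, and the cost is the Gaussian negative log-likelihood up to a constant. The name `dtw_align` and the list `tau` (the discrete warping for one demonstration) are hypothetical.

```python
import math


def dtw_align(z, d, s):
    """Return tau with tau[t] = index of the sample of d matched to z[t],
    minimizing the summed negative log-likelihood of d[tau[t]] - z[t] ~ N(0, s)."""
    T, Tj = len(z), len(d)

    def cost(t, k):
        # Negative Gaussian log-likelihood, dropping the additive constant.
        return (d[k] - z[t]) ** 2 / (2.0 * s)

    best = [[math.inf] * Tj for _ in range(T)]
    back = [[None] * Tj for _ in range(T)]
    best[0][0] = cost(0, 0)
    for t in range(1, T):
        for k in range(1, Tj):
            # Each move: (previous t, previous k, cost added by the move).
            moves = [(t - 1, k - 1, cost(t, k))]          # same pace as z
            if k >= 2:
                # Demonstration twice as slow: it needs 2 samples per step of z.
                moves.append((t - 1, k - 2, cost(t, k)))
            if t >= 2:
                # Demonstration twice as fast: 1 sample covers 2 steps of z.
                moves.append((t - 2, k - 1, cost(t - 1, k) + cost(t, k)))
            for pt, pk, c in moves:
                if best[pt][pk] + c < best[t][k]:
                    best[t][k] = best[pt][pk] + c
                    back[t][k] = (pt, pk)
    if best[T - 1][Tj - 1] == math.inf:
        raise ValueError("no admissible warping between these lengths")
    # Walk back along the optimal path to read off the warping.
    tau = [0] * T
    t, k = T - 1, Tj - 1
    while back[t][k] is not None:
        pt, pk = back[t][k]
        for u in range(pt + 1, t + 1):
            tau[u] = k
        t, k = pt, pk
    return tau
```

Every admissible path assigns exactly one demonstration sample to each step of z, so all paths sum the same number of likelihood terms and can be compared fairly. In the adapted EM-algorithm, z would be the smoothed ideal trajectory from the current iteration and s the current estimate of that demonstration's noise.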

Page 14: Learning from Demonstrations

Example: Helicopter Airshow

• Thesis work of Pieter Abbeel
• Unaligned demonstrations:
– Movie

• Time-aligned demonstrations:
– Movie

• Execution of learnt trajectory
– Movie

Page 15: Learning from Demonstrations

Example: Surgical Knot-tie

• ICRA 2010 Best Medical Robotics Paper Award
• Video of knot-tie

Page 16: Learning from Demonstrations

Conclusion

• Learning from demonstrations
• Incorporates Dynamic Time Warping into the EM-algorithm