learning from demonstrations

Learning from Demonstrations Jur van den Berg

Upload: fabian

Post on 22-Feb-2016

60 views

Category:

Documents

0 download

Report

Download

Tags:

Embed Size (px):

DESCRIPTION

Learning from Demonstrations. Jur van den Berg. Kalman Filtering and Smoothing. Dynamics and Observation model Kalman Filter: Compute Real-time, given data so far Kalman Smoother: Compute Post-processing, given all data. EM Algorithm. Kalman smoother: - PowerPoint PPT Presentation

TRANSCRIPT

Learning from Demonstrations

Jur van den Berg

Kalman Filtering and Smoothing

• Dynamics and Observation model

• Kalman Filter:– Compute– Real-time, given data so far

• Kalman Smoother:– Compute – Post-processing, given all data

ttt YYX yy ,,| 00

TtYYX TTt 0,,,| 00 yy

),(~),(~

,,1

RNQN

0v0w

vxwx

EM Algorithm

• Kalman smoother: – Compute distributions X0, …, Xt

given parameters A, C, Q, R, and data y0, …, yt.

• EM Algorithm:– Simultaneously optimize X0, …, Xt and A, C, Q, R

given data y0, …, yt.

),(~),(~

,,1

RNQN

0v0w

vxwx

Learning from Demonstrations

• Application of EM-algorithm• Example: – Autonomous helicopter aerobatics– Autonomous surgical tasks (knot-tying)

Motivation

• Learning an ideal “trajectory” of system• Human provides demonstrations of ideal

trajectory• Human demonstrations imperfect• Multiple demonstrations implicitly encode

ideal trajectory• Task: infer ideal trajectory from

demonstrations

Acquiring Demonstrations

• Known system dynamics (A, B, Q)• Observations with known sensors (C, R)– Inertial measurement unit– GPS– Cameras

• Use Kalman smoother to optimally estimate states x along demonstration trajectory

),(~),(~

,,1

RNQN

CBA

ttt

0v0w

vxwux

Multiple Demonstrations

• D demonstration trajectories of duration Tj

• Hidden ideal trajectory z of duration T*

jjt

jtj

t TtDj ,,1,,1

tt ,,1

Model of Ideal Trajectory• Main idea: use demonstrations as noisy

observations of hidden ideal trajectory• Dynamics of hidden trajectory

• Observation of hidden trajectory

0,(~,

NIBA

tttt 0wwzz

)00

0000

,(~,

Dttt

SNss

I 0z

Inferring Ideal Trajectory

• Dynamics model: Parameter N controls smoothness; A, B, Q known

• Observation model: Parameters S encode relative quality of demonstrations

• Use EM-algorithm with Kalman smoother to simultaneously optimize z and S (and N).• Initialize S with identity matrices

Time Warping

• But, this assumes demonstrations are of equal length and uniformly paced

• Include Dynamic Time Warping into EM-algorithm

• Such that demonstrations map temporally

Time Warping

• For each demonstration j, we have function tj(t)

• Maps time t along z to time tj(t) along dj

• Adapted observation model:

)00

0000

,(~,

)(

1)(1

Dttt

0sszd

Learning Time Warping

• tj(t) is (initially) unknown• Assume (initially):– T* = (T1 + … + TD) / D– tj(t) = (Tj / T*) t

• Adapted EM-algorithm:– Run Kalman smoother with current S and t– Optimize S by maximizing likelihood– Optimize t by maximizing likelihood

(Dynamic Time Warping)

Dynamic Time Warping

• Match demonstration j with z• Assume that demonstration moves locally– twice as slow as z– same pace as z– twice as fast as z

• Dynamic Programmingto find optimal “path”

• Cost function: likelihood of),(~,

)(j

tttjt

SNj 0sszd t

Example: Helicopter Airshow• Thesis work of Pieter Abbeel• Unaligned demonstrations:– Movie

• Time-aligned demonstrations:– Movie

• Execution of learnt trajectory– Movie

Example Surgical Knot-tie

• ICRA 2010 Best Medical Robotics Paper Award• Video of knot-tie

Conclusion

• Learning from demonstrations• Includes Dynamic Time Warping into

EM-algorithm

Reinforcement Learning of Motor Skills using Policy Search and …€¦ · Reinforcement Learning, Policy Search, Learning from Demonstrations, Interactive Machine Learning, Movement

Learning How-To Knowledge from the Webyukez/talks/learning_how_to... · 2020. 4. 25. · Humans learn efficiently from video demonstrations. Meltzoff & Moore 1977; Meltzoff & Moore

Predictive Learning from Demonstrationpeople.cs.umu.se/thomash/reports/Predictive learning from demonst… · Predictive Learning from Demonstration 187 the demonstrations are seen

Learning Stylistic Dynamic Movement Primitives …genesis.naist.jp/~takam-m/Matsubara_IROS2010.pdfLearning Stylistic Dynamic Movement Primitives from Multiple Demonstrations Takamitsu

Learning and Memory Strategy Demonstrations for the Psychology

Watch, Try, Learn: Meta-Learning from Demonstrations and ... · Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards Allan Zhou 1, Eric Jang, Daniel Kappler2, Alex Herzog

IEEE TRANSACTIONS ON ROBOTICS 1 Learning Physical ... · IEEE TRANSACTIONS ON ROBOTICS 1 Learning Physical Collaborative Robot Behaviors from Human Demonstrations Leonel Rozo 1, Member,

Learning cloth manipulation with demonstrations - IRI · IRI-TR-19-01 Learning cloth manipulation with demonstrations Rishabh Jangir Carme Torras Guillem Aleny`a. Abstract Recent

Multi-Modal Imitation Learning from Unstructured ... · Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets Karol Hausman y, Yevgen Chebotar

A Syntactic Approach to Robot Learning of Human Tasks from ... · A Syntactic Approach to Robot Learning of Human Tasks from Demonstrations KyuhwaLee Submitted in part fulﬁlment

Devesh K. Jha Imitation of Demonstrations Using … · Imitation of Demonstrations ... model for correction. 1.1 Prior Work. Learning from demonstration (LFD) ... using the chain

Learning from Limited Demonstrations

Learning a Behavioral Repertoire from Demonstrations

Learning Potential Functions from Human Demonstrations ...khatib.stanford.edu/publications/pdfs/Khansari_2015_AR.pdf · The potential energy Φ is learned from demonstrations (shown

Learning Grounded Finite-State Representations from ...gdk/pubs/fsm_unstructured.pdf · Learning Grounded Finite-State Representations from Unstructured Demonstrations Scott Niekum1,2,

Independent Generative Adversarial Self-Imitation Learning ... · Imitation learning is also known as learning from demonstrations or apprenticeship learning, whose goal is to learn

Text Generation by Learning from Demonstrations

Accessible E-Learning Demonstrations Using IMS Accessibility Specifications

Deep Q-Learning from Demonstrations (DQfD)florian/courses/imitation_learning/lectures/... · Deep Q-Learning from Demonstrations (DQfD) Bryan Chan & Chandripal Budnarain. Markov Decision

Learning Manipulation Actions from a Few …ipvs.informatik.uni-stuttgart.de/mlr/13-SeminarRobotics/paper/... · Learning Manipulation Actions from a Few Demonstrations Nichola Abdo,

Work-based learning and skills demonstrations during COVID-19

Relational Reinforcement Learning with Guided Demonstrations · Keywords: active learning, learning guidance, planning excuse, reinforcement learning, robot learning, teacher demonstration,

Learning to summarize from human feedbackfeedback and supervised learning at 6.7B. are incentivized to place probability mass on all human demonstrations, including those that are

Learning Sensor Feedback Models from Demonstrations via … · Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks Giovanni Sutanto 1 ;2, Zhe Su

Overcoming Exploration in Reinforcement Learning with Demonstrations · 2018-02-27 · Overcoming Exploration in Reinforcement Learning with Demonstrations Ashvin Nair12, Bob McGrew

Away from Demonstrations. South Africans Poor People's

Reinforcement Learning from Imperfect Demonstrations under

Learning Motion Skills From Expert Demonstrations … · Learning Motion Skills from Expert Demonstrations and Own Experience using Gaussian Process Regression Kathrin Gräve, Jörg

Goal-conditioned Imitation Learningpapers.nips.cc/paper/9667-goal-conditioned-imitation-learning.pdf · goal-reaching policies. Learning from Demonstrations, or Imitation Learning

Learning and Memory Strategy Demonstrations for the ... · collection of in-class learning and memory strategy demonstrations. The demonstrations illustrate strategies that are empirically

Encoding Demonstrations and Learning New …sesmaeil/pdf/Encoding Demonstrations and...Encoding Demonstrations and Learning New Trajectories using Canal Surfaces S. Reza Ahmadzadeh

Towards Learning Task Intentions from Human Demonstrations€¦ · Towards Learning Task Intentions from Human Demonstrations Tim Welschehold Nichola Abdo Christian Dornhege Wolfram

Demonstrations and Ideas from the Exploratorium AAPT ...sciencegeekgirl.com/conferences/AAPT08_Chasteen_handout.pdf · Demonstrations and Ideas from the Exploratorium AAPT Denver,

Learning Latent Actions without Human Demonstrations

C-LEARN: Learning Geometric Constraints from ...people.csail.mit.edu/cdarpino/CLEARN/ICRA17_DArpino_CLEARN.pdfC-LEARN: Learning Geometric Constraints from Demonstrations for Multi-Step

learning from demonstrations

Documents

time tjt

likelihood dynamic time

ideal trajectorydynamics

demonstrations map

learning time warpingtjt

ideal trajectorytask

function tjtmaps time

tadapted emalgorithm