improving imitation learning with reinforcement learning · references references...

33
MIN Faculty Department of Informatics Improving Imitation Learning with Reinforcement Learning Niklas Fiedler University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Department of Informatics Technical Aspects of Multimodal Systems November 26, 2019 N. Fiedler Improving Imitation Learning with Reinforcement Learning 1 / 23

Upload: others

Post on 28-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

MIN FacultyDepartment of Informatics

Improving Imitation Learningwith Reinforcement Learning

Niklas Fiedler

University of HamburgFaculty of Mathematics, Informatics and Natural SciencesDepartment of InformaticsTechnical Aspects of Multimodal Systems

November 26, 2019

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 1 / 23

Page 2: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

OutlineIntroduction Imitation Learning Combining RL and IL Conclusion

1. IntroductionMotivation

2. Imitation LearningDemonstration MethodsBehavioral CloningInverse Reinforcement Learning

3. Combining Reinforcement Learning and Imitation LearningBC ApplicationIRL Application

4. Conclusion

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 2 / 23

Page 3: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

GoalIntroduction Imitation Learning Combining RL and IL Conclusion

I Imitate expert behaviorI Improve learning by including knowledge given by

demonstrationI Learn expert policies

→ Make use of expert demonstrations

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 3 / 23

Page 4: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

MotivationHumans are AwesomeIntroduction Imitation Learning Combining RL and IL Conclusion

https://rejectedprincesses.tumblr.com/post/150495232038/

chynara-madinkulova-long-hair-and-aida-akmatovaN. Fiedler – Improving Imitation Learning with Reinforcement Learning 4 / 23

Page 5: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

MotivationLearning from DemonstrationIntroduction Imitation Learning Combining RL and IL Conclusion

Learning from experts is natural behavior

[Haw50], https://www.wakecounseling.com/therapy-blog/play-therapy

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 5 / 23

Page 6: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion

Method to learn a behavior based on a demonstration

Various forms of demonstration.

Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23

Page 7: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion

Method to learn a behavior based on a demonstration

Various forms of demonstration.

Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23

Page 8: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion

Method to learn a behavior based on a demonstration

Various forms of demonstration.

Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23

Page 9: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion

Virtual/Augumented Reality Tracking of Human Motions

Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg

siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/

https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23

Page 10: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion

Virtual/Augumented Reality Tracking of Human Motions

Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg

siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/

https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23

Page 11: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion

Virtual/Augumented Reality Tracking of Human Motions

Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg

siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/

https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23

Page 12: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion

Virtual/Augumented Reality Tracking of Human Motions

Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg

siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/

https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23

Page 13: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Behavioral CloningIntroduction Imitation Learning Combining RL and IL Conclusion

I Training a direct link between demonstrated input and outputI Large amounts of training data necessaryI Poor generalization

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 8 / 23

Page 14: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Behavioral CloningVideoIntroduction Imitation Learning Combining RL and IL Conclusion

https://www.youtube.com/watch?v=5BTIE_fhReo

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 9 / 23

Page 15: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Inverse Reinforcement LearningReinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 10 / 23

Page 16: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Inverse Reinforcement LearningReinforcement Learning vs. Inversed Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion

RL IRL

given (partially observed)reward function R

policy π or historysampled from that policy

searching optimal policy πfor given reward

reward function Rfor which given behavioris optimal

https://thinkingwires.com/posts/2018-02-13-irl-tutorial-1.html

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 11 / 23

Page 17: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Inverse Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion

https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 12 / 23

Page 18: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Imitation LearningBehavioral Cloning vs. Inversed Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion

Behavioral CloningI Weak generalizationI Relatively low

computational effort

Inversed ReinforcementLearningI Strong generalizationI Large computational effortI Complex structure

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 13 / 23

Page 19: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Combining Reinforcement Learningand Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion

I Reducing the impact of shortcomings of both methodsI Applications should outperform demonstrators after RL

applicationsI Accelerated training processI Extending the capabilities learned with imitation learning

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 14 / 23

Page 20: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

BC ApplicationIntroduction Imitation Learning Combining RL and IL Conclusion

Overcoming Exploration in Reinforcement Learningwith Demonstrations

Ashvin Nair12, Bob McGrew1, Marcin Andrychowicz1, WojciechZaremba1 and Pieter Abbeel12

2018 IEEE International Conference on Robotics and Automation(ICRA)

1OpenAI2University of California, Berkeley

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 15 / 23

Page 21: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

BC ApplicationGoalIntroduction Imitation Learning Combining RL and IL Conclusion

Pushing Sliding

Pick and Place[NMA+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 16 / 23

Page 22: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

BC ApplicationResultsIntroduction Imitation Learning Combining RL and IL Conclusion

[NMA+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 17 / 23

Page 23: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ApplicationIntroduction Imitation Learning Combining RL and IL Conclusion

Reinforcement and Imitation Learning for DiverseVisuomotor Skills

Yuke Zhu1, Ziyu Wang2, Josh Merel2, Andrei Rusu2, Tom Erez2,Serkan Cabi2, Saran Tunyasuvunakool2, Janos Kramar2, RaiaHadsell2, Nando de Freitas2 and Nicolas Heess2

1Computer Science Department, Stanford University2OpenAI

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 18 / 23

Page 24: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ApplicationGoalIntroduction Imitation Learning Combining RL and IL Conclusion

[ZWM+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 19 / 23

Page 25: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ApplicationMethodIntroduction Imitation Learning Combining RL and IL Conclusion

[ZWM+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 20 / 23

Page 26: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ExampleResults - Block stackingIntroduction Imitation Learning Combining RL and IL Conclusion

[ZWM+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 21 / 23

Page 27: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Combining Reinforcement Learning and ImitationLearningComparisonIntroduction Imitation Learning Combining RL and IL Conclusion

BC ApproachI Behavioral CloningI Simulation onlyI Goal: improve training

performance and taskcomplexity

IRL ApproachI Inversed Reinforcement

LearningI Policies transferred to real

robotI Goal: improve result

performance and taskcomplexity

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 22 / 23

Page 28: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

ConclusionIntroduction Imitation Learning Combining RL and IL Conclusion

I Behavioral cloning is a convenient option to directly mimicexperts behavior

I Inverse reinforcement learning is able to learn expert policiesI More complex reinforcement learning tasks can be realizedI When combined with reinforcement learning, demonstrators

can be outperformed

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 23 / 23

Page 29: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Useful LinksReferences

GAIL explained in a blog post:https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60

Behavioral Cloning explained in a blog post:https://medium.com/@ksakmann/behavioral-cloning-make-a-car-drive-like-yourself-dc6021152713

Source code and model of behavioral cloning based self-drivingcar:https://github.com/ksakmann/CarND-BehavioralCloning

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 1 / 5

Page 30: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

ReferencesReferences

[Haw50] TH Hawkins, Opening of milk bottles by birds, Nature165 (1950), no. 4194, 435–436.

[NMA+18] Ashvin Nair, Bob McGrew, Marcin Andrychowicz,Wojciech Zaremba, and Pieter Abbeel, Overcomingexploration in reinforcement learning withdemonstrations, 2018 IEEE International Conferenceon Robotics and Automation (ICRA), IEEE, 2018,pp. 6292–6299.

[ZWM+18] Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, TomErez, Serkan Cabi, Saran Tunyasuvunakool, JánosKramár, Raia Hadsell, Nando de Freitas, et al.,Reinforcement and imitation learning for diversevisuomotor skills, arXiv preprint arXiv:1802.09564(2018).

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 2 / 5

Page 31: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

Prediction in CollaborationReferences

https://www.kuka.com/-/media/kuka-corporate/images/industries/case-studies/schwingenmontage/

flexfellow_mrk_header.jpg

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 3 / 5

Page 32: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ApplicationNetwork StructureReferences

[ZWM+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 4 / 5

Page 33: Improving Imitation Learning with Reinforcement Learning · References References [Haw50]THHawkins,Opening of milk bottles by birds,Nature 165 (1950),no.4194,435–436. [NMA+18]AshvinNair,BobMcGrew,MarcinAndrychowicz,

IRL ApplicationFull ResultsReferences

[ZWM+18]

N. Fiedler – Improving Imitation Learning with Reinforcement Learning 5 / 5