improving imitation learning with reinforcement learning · references references...
TRANSCRIPT
MIN FacultyDepartment of Informatics
Improving Imitation Learningwith Reinforcement Learning
Niklas Fiedler
University of HamburgFaculty of Mathematics, Informatics and Natural SciencesDepartment of InformaticsTechnical Aspects of Multimodal Systems
November 26, 2019
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 1 / 23
OutlineIntroduction Imitation Learning Combining RL and IL Conclusion
1. IntroductionMotivation
2. Imitation LearningDemonstration MethodsBehavioral CloningInverse Reinforcement Learning
3. Combining Reinforcement Learning and Imitation LearningBC ApplicationIRL Application
4. Conclusion
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 2 / 23
GoalIntroduction Imitation Learning Combining RL and IL Conclusion
I Imitate expert behaviorI Improve learning by including knowledge given by
demonstrationI Learn expert policies
→ Make use of expert demonstrations
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 3 / 23
MotivationHumans are AwesomeIntroduction Imitation Learning Combining RL and IL Conclusion
https://rejectedprincesses.tumblr.com/post/150495232038/
chynara-madinkulova-long-hair-and-aida-akmatovaN. Fiedler – Improving Imitation Learning with Reinforcement Learning 4 / 23
MotivationLearning from DemonstrationIntroduction Imitation Learning Combining RL and IL Conclusion
Learning from experts is natural behavior
[Haw50], https://www.wakecounseling.com/therapy-blog/play-therapy
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 5 / 23
Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion
Method to learn a behavior based on a demonstration
Various forms of demonstration.
Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23
Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion
Method to learn a behavior based on a demonstration
Various forms of demonstration.
Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23
Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion
Method to learn a behavior based on a demonstration
Various forms of demonstration.
Two prominent methods of implementation:1. Behavioral Cloning2. Inverse Reinforcement Learning
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 6 / 23
Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion
Virtual/Augumented Reality Tracking of Human Motions
Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg
siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/
https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23
Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion
Virtual/Augumented Reality Tracking of Human Motions
Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg
siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/
https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23
Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion
Virtual/Augumented Reality Tracking of Human Motions
Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg
siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/
https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23
Demonstration MethodsIntroduction Imitation Learning Combining RL and IL Conclusion
Virtual/Augumented Reality Tracking of Human Motions
Teleoperation Video Streams3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg
siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/
https://ar-tracking.com/applications/motion-capture/ https://www.youtube.com/watch?v=5BTIE_fhReo
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 7 / 23
Behavioral CloningIntroduction Imitation Learning Combining RL and IL Conclusion
I Training a direct link between demonstrated input and outputI Large amounts of training data necessaryI Poor generalization
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 8 / 23
Behavioral CloningVideoIntroduction Imitation Learning Combining RL and IL Conclusion
https://www.youtube.com/watch?v=5BTIE_fhReo
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 9 / 23
Inverse Reinforcement LearningReinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 10 / 23
Inverse Reinforcement LearningReinforcement Learning vs. Inversed Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion
RL IRL
given (partially observed)reward function R
policy π or historysampled from that policy
searching optimal policy πfor given reward
reward function Rfor which given behavioris optimal
https://thinkingwires.com/posts/2018-02-13-irl-tutorial-1.html
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 11 / 23
Inverse Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion
https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 12 / 23
Imitation LearningBehavioral Cloning vs. Inversed Reinforcement LearningIntroduction Imitation Learning Combining RL and IL Conclusion
Behavioral CloningI Weak generalizationI Relatively low
computational effort
Inversed ReinforcementLearningI Strong generalizationI Large computational effortI Complex structure
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 13 / 23
Combining Reinforcement Learningand Imitation LearningIntroduction Imitation Learning Combining RL and IL Conclusion
I Reducing the impact of shortcomings of both methodsI Applications should outperform demonstrators after RL
applicationsI Accelerated training processI Extending the capabilities learned with imitation learning
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 14 / 23
BC ApplicationIntroduction Imitation Learning Combining RL and IL Conclusion
Overcoming Exploration in Reinforcement Learningwith Demonstrations
Ashvin Nair12, Bob McGrew1, Marcin Andrychowicz1, WojciechZaremba1 and Pieter Abbeel12
2018 IEEE International Conference on Robotics and Automation(ICRA)
1OpenAI2University of California, Berkeley
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 15 / 23
BC ApplicationGoalIntroduction Imitation Learning Combining RL and IL Conclusion
Pushing Sliding
Pick and Place[NMA+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 16 / 23
BC ApplicationResultsIntroduction Imitation Learning Combining RL and IL Conclusion
[NMA+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 17 / 23
IRL ApplicationIntroduction Imitation Learning Combining RL and IL Conclusion
Reinforcement and Imitation Learning for DiverseVisuomotor Skills
Yuke Zhu1, Ziyu Wang2, Josh Merel2, Andrei Rusu2, Tom Erez2,Serkan Cabi2, Saran Tunyasuvunakool2, Janos Kramar2, RaiaHadsell2, Nando de Freitas2 and Nicolas Heess2
1Computer Science Department, Stanford University2OpenAI
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 18 / 23
IRL ApplicationGoalIntroduction Imitation Learning Combining RL and IL Conclusion
[ZWM+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 19 / 23
IRL ApplicationMethodIntroduction Imitation Learning Combining RL and IL Conclusion
[ZWM+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 20 / 23
IRL ExampleResults - Block stackingIntroduction Imitation Learning Combining RL and IL Conclusion
[ZWM+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 21 / 23
Combining Reinforcement Learning and ImitationLearningComparisonIntroduction Imitation Learning Combining RL and IL Conclusion
BC ApproachI Behavioral CloningI Simulation onlyI Goal: improve training
performance and taskcomplexity
IRL ApproachI Inversed Reinforcement
LearningI Policies transferred to real
robotI Goal: improve result
performance and taskcomplexity
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 22 / 23
ConclusionIntroduction Imitation Learning Combining RL and IL Conclusion
I Behavioral cloning is a convenient option to directly mimicexperts behavior
I Inverse reinforcement learning is able to learn expert policiesI More complex reinforcement learning tasks can be realizedI When combined with reinforcement learning, demonstrators
can be outperformed
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 23 / 23
Useful LinksReferences
GAIL explained in a blog post:https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60
Behavioral Cloning explained in a blog post:https://medium.com/@ksakmann/behavioral-cloning-make-a-car-drive-like-yourself-dc6021152713
Source code and model of behavioral cloning based self-drivingcar:https://github.com/ksakmann/CarND-BehavioralCloning
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 1 / 5
ReferencesReferences
[Haw50] TH Hawkins, Opening of milk bottles by birds, Nature165 (1950), no. 4194, 435–436.
[NMA+18] Ashvin Nair, Bob McGrew, Marcin Andrychowicz,Wojciech Zaremba, and Pieter Abbeel, Overcomingexploration in reinforcement learning withdemonstrations, 2018 IEEE International Conferenceon Robotics and Automation (ICRA), IEEE, 2018,pp. 6292–6299.
[ZWM+18] Yuke Zhu, Ziyu Wang, Josh Merel, Andrei Rusu, TomErez, Serkan Cabi, Saran Tunyasuvunakool, JánosKramár, Raia Hadsell, Nando de Freitas, et al.,Reinforcement and imitation learning for diversevisuomotor skills, arXiv preprint arXiv:1802.09564(2018).
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 2 / 5
Prediction in CollaborationReferences
https://www.kuka.com/-/media/kuka-corporate/images/industries/case-studies/schwingenmontage/
flexfellow_mrk_header.jpg
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 3 / 5
IRL ApplicationNetwork StructureReferences
[ZWM+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 4 / 5
IRL ApplicationFull ResultsReferences
[ZWM+18]
N. Fiedler – Improving Imitation Learning with Reinforcement Learning 5 / 5