

Curriculum-guided Hindsight Experience Replay
Meng Fang¹ Tianyi Zhou² Yali Du³ Lei Han¹ Zhengyou Zhang¹

¹Tencent AI Lab / Robotics X   ²University of Washington   ³University College London

Code (GitHub): mengf1/CHER

Paper (NeurIPS 2019): CHER

Overview

This work addresses the sparse-reward challenge in reinforcement learning (RL) and assumes that not all failed experiences are equally useful at different learning stages.

We adopt a human-like learning strategy that emphasizes curiosity in earlier stages and shifts toward goal proximity later:
1) adaptively select the failed experiences for replay according to their proximity to the true goals and the curiosity of exploring diverse pseudo goals;
2) gradually change the proportion of goal proximity and diversity-based curiosity in the selection criterion.

Our “Goal-and-Curiosity-driven Curriculum Learning” leads to “Curriculum-guided HER (CHER)”, which adaptively and dynamically controls the exploration-exploitation trade-off during learning via hindsight experience selection.
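The curriculum is realized by a single trade-off weight λ (see Methodology) that starts small, so diversity-driven curiosity dominates early, and grows during training, so goal proximity dominates later. A minimal sketch of one plausible schedule is shown below; the geometric-growth form and its constants are illustrative assumptions, not the exact schedule used in the paper.

```python
def curriculum_lambda(epoch: int, lam0: float = 0.1,
                      growth: float = 1.05, lam_max: float = 10.0) -> float:
    """Illustrative curriculum weight for F(A) = lam * F_prox(A) + F_div(A).

    Starts small so diversity (curiosity) dominates early exploration, then
    grows geometrically so proximity to the desired goal dominates later.
    The schedule form and constants are assumptions, not the paper's values.
    """
    return min(lam_max, lam0 * growth ** epoch)

# Example: the weight every 10 epochs during early training.
print([round(curriculum_lambda(e), 3) for e in range(0, 50, 10)])
```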

Robotics Environments

Environment panels: FetchReach (toy example), HandReach, HandManipulate Block, HandManipulate Egg, HandManipulate Pen.

There are the FetchReach environment and four Shadow Dexterous Hand environments: HandReach, Block manipulation, Egg manipulation, and Pen manipulation.
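These tasks correspond to the goal-based robotics environments in OpenAI Gym (which require MuJoCo). A minimal loading sketch is shown below; the environment IDs and version suffixes depend on the installed Gym release and should be treated as assumptions.

```python
import gym  # requires Gym's robotics environments and a working MuJoCo install

# Version suffixes (-v0/-v1) vary across Gym releases; adjust as needed.
ENV_IDS = [
    "FetchReach-v1",
    "HandReach-v0",
    "HandManipulateBlock-v0",
    "HandManipulateEgg-v0",
    "HandManipulatePen-v0",
]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()  # older Gym API: reset() returns the observation dict directly
    # Goal-based envs expose observation, achieved_goal, and desired_goal.
    print(env_id, obs["achieved_goal"].shape, obs["desired_goal"].shape)
    env.close()
```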

Methodology

In contrast to uniform sampling, we propose to select a subset of achieved goals $A \subseteq B$ according to

$$\max_{A \subseteq B,\ |A| \le k} F(A) \triangleq \lambda F_{\mathrm{prox}}(A) + F_{\mathrm{div}}(A).$$

• Goal proximity: their proximity to the desired goals, $F_{\mathrm{prox}}(A) \triangleq \sum_{i \in A} \mathrm{sim}(g_i, g)$.

• Diversity-based curiosity: their diversity, which reflects the curiosity of the agent exploring different achieved goals in the environment, $F_{\mathrm{div}}(A) \triangleq \sum_{j \in B} \max_{i \in A} \mathrm{sim}(g_i, g_j)$.

• Utility score:

$$F(i \mid A) = \lambda\, \mathrm{sim}(g_i, g) + \sum_{j \in B} \max\Bigl(0,\ \mathrm{sim}(g_i, g_j) - \max_{l \in A} \mathrm{sim}(g_l, g_j)\Bigr).$$
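The utility score $F(i \mid A)$ is the marginal gain of adding an achieved goal $i$ to the currently selected set $A$, so the subset can be grown greedily, one goal at a time. Below is a minimal NumPy sketch of that greedy loop over a precomputed dense similarity matrix; the function names and the Gaussian-kernel similarity in the toy usage are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def greedy_select(sim_to_desired, sim_between, k, lam):
    """Greedily pick k achieved goals maximizing lam * F_prox(A) + F_div(A).

    sim_to_desired: (n,) array of sim(g_i, g) for each achieved goal i in B.
    sim_between:    (n, n) array of sim(g_i, g_j) for all pairs in B.
    Returns the indices of the selected subset A.
    """
    n = len(sim_to_desired)
    selected = []
    best_cover = np.zeros(n)  # best_cover[j] = max_{l in A} sim(g_l, g_j)
    for _ in range(k):
        # Marginal gain F(i|A) for every candidate i.
        gains = lam * sim_to_desired + np.maximum(0.0, sim_between - best_cover).sum(axis=1)
        gains[selected] = -np.inf  # never pick the same goal twice
        i = int(np.argmax(gains))
        selected.append(i)
        best_cover = np.maximum(best_cover, sim_between[i])
    return selected

# Toy usage with random goals and a Gaussian-style similarity kernel (an assumption).
rng = np.random.default_rng(0)
goals = rng.normal(size=(50, 3))    # achieved goals in B
desired = rng.normal(size=3)        # desired goal g
sim_between = np.exp(-np.linalg.norm(goals[:, None] - goals[None, :], axis=-1))
sim_to_desired = np.exp(-np.linalg.norm(goals - desired, axis=-1))
print(greedy_select(sim_to_desired, sim_between, k=5, lam=1.0))
```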

In practice: a kd-tree is used to build a sparse K-nearest-neighbor graph over the pseudo goals, and the subset is selected with a “lazier than lazy greedy” algorithm.
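A rough sketch of those two speedups is given below: scipy's cKDTree builds the sparse K-nearest-neighbor similarity graph, and a stochastic ("lazier than lazy") greedy step evaluates the marginal gain only on a random subsample of candidates per iteration. The similarity kernel, subsample size, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_similarity_graph(goals, K=10, bandwidth=1.0):
    """Sparse similarity graph: sim(g_i, g_j) kept only for the K nearest neighbors."""
    tree = cKDTree(goals)
    dists, nbrs = tree.query(goals, k=K + 1)  # the first neighbor is the point itself
    sims = np.exp(-dists[:, 1:] / bandwidth)  # assumed exponential kernel on distance
    return nbrs[:, 1:], sims                  # (n, K) neighbor indices and similarities

def lazier_greedy_select(gain_fn, n, k, sample_size=32, rng=None):
    """Stochastic ("lazier than lazy") greedy: each step scores only a random subsample.

    gain_fn(i, selected) should return the marginal gain F(i | A), e.g. computed
    from the sparse graph returned by knn_similarity_graph above.
    """
    rng = rng or np.random.default_rng()
    selected, remaining = [], np.arange(n)
    for _ in range(k):
        cand = rng.choice(remaining, size=min(sample_size, len(remaining)), replace=False)
        best = int(max(cand, key=lambda i: gain_fn(i, selected)))
        selected.append(best)
        remaining = remaining[remaining != best]
    return selected
```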

Experiments

Baselines: DDPG, DDPG+HER (uniform sampling), DDPG+HEREBP (energy-based prioritization).

Toy example – FetchReach

- In earlier episodes, the red points (selected achieved goals) compose a diverse and representative subset of the gray points (all achieved goals), but some are not close to any green point (desired goals), since CHER prefers diversity over proximity.
- In later episodes, most red points are close to some green points because proximity carries a large weight in the selection criterion, but some regions densely covered by gray points contain no red point, since CHER now prefers proximity over diversity.

Benchmark results – Hand environments

CHER learns much faster than the other RL methods.

Conclusion

• CHER is the first work that adaptively selects failed experiences for replay according to their compatibility with and usefulness to different learning stages of deep RL.
• Large diversity is beneficial for earlier exploration, while large proximity to the desired goals is essential for effective exploitation in later stages.
• The sample efficiency and learning speed of off-policy RL algorithms can be significantly improved by CHER.
• CHER performs better than other HER-based approaches.
• CHER makes no assumptions about tasks and environments, and can potentially be generalized to other, more complicated tasks, environments, and settings.

References

[Lillicrap et al., 2015] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[Andrychowicz et al., 2017] Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. In Advances in Neural Information Processing Systems.
[Zhao and Tresp, 2018] Rui Zhao and Volker Tresp. Energy-based hindsight experience prioritization. In Conference on Robot Learning.
[Zhou and Bilmes, 2018] Tianyi Zhou and Jeff Bilmes. Minimax curriculum learning: Machine teaching with desirable difficulties and scheduled diversity. In International Conference on Learning Representations.