Relational Transfer in Reinforcement Learning
TRANSCRIPT
Lisa Torrey
University of Wisconsin – Madison
CS 540
Transfer Learning
Transfer Learning in Humans
- Education: a hierarchical curriculum; learning tasks share common stimulus-response elements
- Abstract problem-solving: learning tasks share general underlying principles
- Multilingualism: knowing one language affects learning in another
- Transfer can be both positive and negative
Transfer Learning in AI
Given: source task S
Learn: target task T
Goals of Transfer Learning
[Figure: performance vs. training curves. Compared to learning without transfer, transfer can give a higher start, a higher slope, and a higher asymptote.]
Inductive Learning
[Figure: search takes place within the space of allowed hypotheses, a subset of all hypotheses.]
Transfer in Inductive Learning
[Figure: transfer narrows the space of allowed hypotheses before the search begins.]
Thrun and Mitchell 1995: Transfer slopes for gradient descent
Transfer in Inductive Learning
Bayesian methods
[Figure: Bayesian learning combines a prior distribution with data to produce a posterior distribution; in Bayesian transfer, the prior comes from the source task.]
Raina et al. 2006: Transfer a Gaussian prior
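As an illustration of the Bayesian-transfer idea (a source task supplies the prior, target data updates it), here is a minimal sketch with entirely made-up numbers, using the standard conjugate update for a Gaussian mean with known noise variance. It is not the method of Raina et al., only the general mechanism.

```python
# Sketch of Bayesian transfer: a Gaussian prior (assumed to come from a
# source task) is combined with target-task data to give a posterior.
# Conjugate update for a Gaussian mean with known noise variance.

def gaussian_posterior(prior_mean, prior_var, data, data_var):
    """Posterior mean/variance for a Gaussian mean under Gaussian noise."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / data_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / data_var)
    return post_mean, post_var

# Hypothetical prior transferred from the source task:
prior_mean, prior_var = 2.0, 1.0
# A few hypothetical target-task observations:
data = [3.0, 3.2, 2.8]
mean, var = gaussian_posterior(prior_mean, prior_var, data, data_var=1.0)
print(mean, var)  # posterior is pulled from the prior toward the data
```

With more target data, the posterior relies less on the transferred prior, which is why a bad source prior causes only a slow start rather than permanent damage.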
Transfer in Inductive Learning
Hierarchical methods
[Figure: lower-level concepts (line, curve) serve as components of higher-level concepts (surface, circle), which combine into composite concepts (pipe).]
Stracuzzi 2006: Learn Boolean concepts that can depend on each other
Transfer in Inductive Learning
Dealing with missing data or labels
[Figure: a source task S helps supply data or labels for the target task T.]
Shi et al. 2008: Transfer via active learning
Reinforcement Learning
[Figure: the agent-environment loop. In state s1, the agent with Q(s1, a) = 0 chooses π(s1) = a1; the environment returns δ(s1, a1) = s2 and reward r(s1, a1) = r2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ and chooses π(s2) = a2; the environment returns δ(s2, a2) = s3 and r(s2, a2) = r3.]
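The loop above can be written as a minimal tabular Q-learning sketch. The toy chain (s1 → s2 → s3), the rewards, and the learning parameters are all illustrative stand-ins, using the slide's δ/r notation.

```python
# Minimal tabular Q-learning on a toy deterministic chain s1 -> s2 -> s3.
# delta(s, a) and reward(s, a) mirror the slide's notation; the numeric
# rewards and parameters are made up for illustration.

GAMMA, ALPHA = 0.9, 0.5

delta = {("s1", "a1"): "s2", ("s2", "a2"): "s3"}   # transition function
reward = {("s1", "a1"): 1.0, ("s2", "a2"): 2.0}    # reward function

Q = {k: 0.0 for k in delta}                        # Q(s, a) = 0 initially

def update(s, a):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha * TD-error."""
    s_next = delta[(s, a)]
    best_next = max((Q[k] for k in Q if k[0] == s_next), default=0.0)
    td_error = reward[(s, a)] + GAMMA * best_next - Q[(s, a)]
    Q[(s, a)] += ALPHA * td_error

for _ in range(100):                               # iterate to convergence
    update("s1", "a1")
    update("s2", "a2")

print(Q)  # Q(s2,a2) -> 2.0; Q(s1,a1) -> 1.0 + 0.9 * 2.0 = 2.8
```

The converged values satisfy the Bellman equation for this chain, which is the fixed point the incremental Δ updates in the slide are driving toward.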
Transfer in Reinforcement Learning
- Starting-point methods
- Hierarchical methods
- Alteration methods
- Imitation methods
- New RL algorithms
Transfer in Reinforcement Learning
Starting-point methods
[Figure: with no transfer, target-task training begins from a Q-table of zeros; with transfer, it begins from an initial Q-table filled with values learned in the source task.]
Taylor et al. 2005: Value-function transfer
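A starting-point method of this kind can be sketched in a few lines: instead of a zero-initialized Q-table, the target task starts from mapped source values. The states, actions, and inter-task mapping below are hypothetical placeholders, not the mapping used by Taylor et al.

```python
# Sketch of a starting-point method: initialize the target Q-table from
# a source Q-table via a (hypothetical) inter-task mapping; unmapped
# pairs fall back to 0.0, i.e. the no-transfer starting point.

source_Q = {("s1", "kick"): 2.0, ("s2", "kick"): 5.0}

# Hypothetical mapping from target (state, action) pairs to source pairs:
mapping = {("t1", "shoot"): ("s1", "kick"), ("t2", "shoot"): ("s2", "kick")}

def init_target_Q(target_pairs, source_Q, mapping, default=0.0):
    """Copy the mapped source value for each target pair, else default."""
    return {pair: source_Q.get(mapping.get(pair), default)
            for pair in target_pairs}

target_Q = init_target_Q([("t1", "shoot"), ("t2", "shoot"), ("t3", "shoot")],
                         source_Q, mapping)
print(target_Q)  # t3 has no mapped source pair, so it starts at 0.0
```

Normal Q-learning then proceeds unchanged; only the starting point differs, which is what makes these methods easy to combine with any value-based learner.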
Transfer in Reinforcement Learning
Hierarchical methods
[Figure: low-level skills (Run, Kick) compose into higher-level skills (Pass, Shoot), which compose into the full Soccer task.]
Mehta et al. 2008: Transfer a learned hierarchy
Transfer in Reinforcement Learning
Alteration methods
Walsh et al. 2006: Transfer aggregate states
[Figure: knowledge from Task S alters the target task's original states, original actions, and original rewards into new states, new actions, and new rewards.]
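One concrete way to alter the state space is aggregation: a grouping function learned in the source task maps many original states to one abstract state. The feature and buckets below are illustrative, not the aggregation of Walsh et al.

```python
# Sketch of an alteration method via state aggregation: a (hypothetical)
# transferred grouping function maps original states to coarse abstract
# states, shrinking the space the target learner must explore.

def aggregate(state):
    """Map an original state to a coarse bucket by one feature."""
    return "near" if state["distance"] <= 5 else "far"

original_states = [{"distance": 2}, {"distance": 4}, {"distance": 9}]
new_states = [aggregate(s) for s in original_states]
print(new_states)  # ['near', 'near', 'far']
```

Learning then happens over the handful of abstract states rather than the original space, trading some precision for much faster training.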
Transfer in Reinforcement Learning
New RL algorithms
Torrey et al. 2006: Transfer advice about skills
[Figure: the same agent-environment Q-learning loop shown earlier, with transferred advice influencing the agent's learning.]
Transfer in Reinforcement Learning
Imitation methods
[Figure: during target-task training, the policy used alternates between the source policy and the target policy.]
Torrey et al. 2007: Demonstrate a strategy
My Research
- Starting-point methods
- Imitation methods
- Hierarchical methods
- New RL algorithms
Skill Transfer
Macro Transfer
RoboCup Domain
- 3-on-2 BreakAway
- 3-on-2 KeepAway
- 3-on-2 MoveDownfield
- 2-on-1 BreakAway
Inductive Logic Programming
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
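The rule search above can be sketched as generate-and-test: propose candidate clauses and score each by how many positive examples it covers minus how many negatives. The states, thresholds, and scoring function below are illustrative stand-ins, not the actual ILP system.

```python
# Sketch of ILP-style rule search for pass(Teammate): candidate rules
# over (hypothetical) distance/angle features are scored by coverage of
# positive minus negative examples.

def rule(max_dist, min_angle=None):
    """Build a rule: distance <= max_dist [AND angle >= min_angle]."""
    def covers(state):
        ok = state["distance"] <= max_dist
        if min_angle is not None:
            ok = ok and state["angle"] >= min_angle
        return ok
    return covers

positives = [{"distance": 4, "angle": 40}, {"distance": 5, "angle": 35}]
negatives = [{"distance": 4, "angle": 10}, {"distance": 12, "angle": 50}]

def score(covers):
    """Covered positives minus covered negatives."""
    return (sum(covers(s) for s in positives)
            - sum(covers(s) for s in negatives))

candidates = {"dist<=5": rule(5),
              "dist<=10": rule(10),
              "dist<=5,angle>=30": rule(5, 30)}
best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # the more specific rule excludes a negative, so it wins
```

Adding the angle condition is exactly the general-to-specific refinement shown on the slide: the specialized clause keeps both positives while ruling out a covered negative.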
Advice Taking
Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
Find Q-functions that minimize: ModelSize + C × DataMisfit
[Figure: the agent interacts with the environment in batches; after each batch, the Q-functions are recomputed.]
Advice Taking
Batch Reinforcement Learning with Advice (KBKR)
Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
[Figure: the same batch loop, with advice incorporated when computing the Q-functions.]
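The three-term objective can be made concrete for a linear Q-model Q(x) = w·x + b. The features, targets, advice threshold, and penalty values below are entirely illustrative; only the shape of the objective (size + C·data misfit + µ·advice misfit, with advice penalized hinge-style) follows the slide.

```python
# Sketch of the KBKR-style objective for a linear Q-model Q(x) = w.x + b.
# All numbers are made up; the structure matches
# ModelSize + C * DataMisfit + mu * AdviceMisfit.

def objective(w, b, data, advice, C=1.0, mu=0.5):
    model_size = sum(wi * wi for wi in w)          # ||w||^2
    # Squared error on (features, target Q-value) pairs:
    data_misfit = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b - q) ** 2
        for x, q in data)
    # Advice "in these states, Q should be >= threshold": penalize only
    # violations, hinge-style, so satisfied advice costs nothing.
    advice_misfit = sum(
        max(0.0, thr - (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, thr in advice)
    return model_size + C * data_misfit + mu * advice_misfit

data = [((1.0, 0.0), 0.8), ((0.0, 1.0), 0.2)]
advice = [((1.0, 0.0), 0.5)]    # Q should be at least 0.5 in this state
print(objective((0.7, 0.1), 0.05, data, advice))
```

Because advice enters as a soft penalty weighted by µ, data can override bad advice, which is the key safeguard against negative transfer in this framework.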
Skill Transfer Algorithm
[Figure: ILP learns skill rules from the source task, e.g. IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate); a mapping translates them for the target task, where they are combined with human advice and given to the advice-taking learner.]
Selected Results
Skill transfer to 3-on-2 BreakAway from several tasks
Macro-Operators
[Figure: a macro is a finite-state machine over the actions pass(Teammate), move(Direction), shoot(goalRight), and shoot(goalLeft); each node holds rules of the form IF [ ... ] THEN action, e.g. IF [ ... ] THEN pass(Teammate), IF [ ... ] THEN move(ahead), IF [ ... ] THEN shoot(goalRight), IF [ ... ] THEN shoot(goalLeft).]
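Executing such a macro amounts to walking the finite-state machine: each node proposes an action, and a condition decides when to take it and advance. The conditions and state features below are hypothetical stand-ins for the learned arc rules.

```python
# Sketch of executing a macro-operator as a finite-state machine: each
# node pairs an action with a (hypothetical) condition that plays the
# role of the learned IF [...] THEN rules.

macro = [("pass", lambda s: s["teammate_open"]),
         ("move", lambda s: True),
         ("shoot", lambda s: s["goal_dist"] < 10)]

def run_macro(macro, states):
    """Walk the macro, emitting each node's action when its rule fires."""
    actions, i = [], 0
    for s in states:
        if i < len(macro) and macro[i][1](s):
            actions.append(macro[i][0])
            i += 1
    return actions

states = [{"teammate_open": True, "goal_dist": 20},
          {"teammate_open": False, "goal_dist": 15},
          {"teammate_open": False, "goal_dist": 8}]
print(run_macro(macro, states))  # ['pass', 'move', 'shoot']
```

The macro thus encodes a whole strategy (pass, advance, shoot) rather than a single skill, which is what distinguishes macro transfer from skill transfer.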
Demonstration
An imitation method
[Figure: during target-task training, the agent first uses the source policy as a demonstration, then switches to the target policy.]
Macro Transfer Algorithm
[Figure: ILP learns a macro from the source task; the macro is used as a demonstration in the target task.]
Macro Transfer Algorithm: Learning structures
Positive: BreakAway games that score
Negative: BreakAway games that didn't score
ILP
IF actionTaken(Game, StateA, pass(Teammate), StateB)
   AND actionTaken(Game, StateB, move(Direction), StateC)
   AND actionTaken(Game, StateC, shoot(goalRight), StateD)
   AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
Macro Transfer Algorithm: Learning rules for arcs
Positive: states in good games that took the arc
Negative: states in good games that could have taken the arc but didn't
ILP
[Figure: an arc from pass(Teammate) to shoot(goalRight); rules of the form IF [ … ] THEN enter(State) and IF [ … ] THEN loop(State, Teammate) decide when to enter a node or remain in it.]
Selected Results
Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway
Summary
- Machine learning is often designed for standalone tasks.
- Transfer is a natural learning ability that we would like to incorporate into machine learners.
- There are some successes, but challenges remain, such as avoiding negative transfer and automating mapping.