Post on 20-Jan-2016
Machine Learning via
Advice Taking
Jude Shavlik
Thanks To ...
Rich Maclin, Lisa Torrey, Trevor Walker
Prof. Olvi Mangasarian, Glenn Fung, Ted Wild
DARPA
Quote (2002) from DARPA
Sometimes an assistant will merely watch you and draw conclusions.
Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.'
It's a combination of learning by example and by being guided.
Widening the “Communication Pipeline” between Humans and Machine Learners
Teacher
Pupil
Machine Learner
Our Approach to Building Better Machine Learners
• Human partner expresses advice “naturally” and w/o knowledge of ML agent’s internals
• Agent incorporates advice directly into the function it is learning
• Additional feedback (rewards, I/O pairs, inferred labels, more advice) used to refine learner continually
“Standard” Machine Learning vs. Theory Refinement
• Positive Examples (“should see doctor”)
  temp = 102.1, age = 21, sex = F, …
  temp = 101.7, age = 37, sex = M, …
• Negative Examples (“take two aspirins”)
  temp = 99.1, age = 43, sex = M, …
  temp = 99.6, age = 24, sex = F, …
• Approximate Domain Knowledge
  if temp = high and age = young … then negative example
Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.
Rich Maclin’s PhD (1995)
IF   a Bee is (Near and West) &
     an Ice is (Near and North)
THEN BEGIN
     Move East
     Move North
END
Sample Results
[Figure: reinforcement on the testset (y-axis, −1.0 to 2.5) vs. number of training episodes (x-axis, 0 to 4000), comparing learning with advice to learning without advice.]
Our Motto
Give advice
rather than commands
to your computer
Outline
• Prior Knowledge and Support Vector Machines
  – Intro to SVMs
  – Linear Separation
  – Non-Linear Separation
  – Function Fitting (“Regression”)
• Advice-Taking Reinforcement Learning
• Transfer Learning via Advice Taking
Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: the classes A+ and A− are bounded by the planes x′w = γ + 1 and x′w = γ − 1; the margin between the planes is 2/‖w‖₂, and the support vectors are the points lying on them.]
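Reconstructing the slide’s plane equations, the margin follows directly from the distance formula between two parallel hyperplanes:

```latex
\[
x'w = \gamma + 1 \;\;\text{(plane bounding } A^+\text{)}, \qquad
x'w = \gamma - 1 \;\;\text{(plane bounding } A^-\text{)}
\]
\[
\text{margin} \;=\; \frac{\bigl|(\gamma+1)-(\gamma-1)\bigr|}{\|w\|_2} \;=\; \frac{2}{\|w\|_2}
\]
```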
Linear Algebra for SVMs
• Given p points in n-dimensional space
• Represent them by the p-by-n real matrix A
• Each point Aᵢ is in class +1 or −1 (recorded as Dᵢᵢ = ±1 in a diagonal matrix D)
• Separate by two bounding planes:
  Aᵢw ≥ γ + 1, for Dᵢᵢ = +1
  Aᵢw ≤ γ − 1, for Dᵢᵢ = −1
• More succinctly:
  D(Aw − eγ) ≥ e, where e is a vector of ones
“Slack” Variables: Dealing with Data that is not Linearly Separable
[Figure: classes A+ and A− that cannot be fully separated; a slack variable y measures how far a point lies on the wrong side of its bounding plane. Support vectors are marked.]
Support Vector Machines: Quadratic Programming Formulation
• Solve this quadratic program:
  min over w, γ, y of  ν e′y + ½‖w‖₂²
  s.t. D(Aw − eγ) + y ≥ e,  y ≥ 0
• Maximize the margin by minimizing ½‖w‖₂²
• Minimize the sum of the slack variables y, weighted by ν
Support Vector Machines: Linear Programming Formulation
Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS ’03):
  min over w, γ, y of  ν e′y + ‖w‖₁
  s.t. D(Aw − eγ) + y ≥ e,  y ≥ 0
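As a toy illustration, the 1-norm LP above can be solved directly with an off-the-shelf LP solver. The data, the slack weight ν, and the auxiliary variables a ≥ |w| used to linearize the 1-norm are illustrative choices, not from the talk:

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-D data: rows of A are points, D holds the +1/-1 labels.
A = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
D = np.array([1.0, 1.0, -1.0, -1.0])
p, n = A.shape
nu = 1.0  # weight on the slack variables (illustrative)

# Variable vector z = [w (n), a (n), gamma (1), y (p)], with a >= |w|.
c = np.concatenate([np.zeros(n), np.ones(n), [0.0], nu * np.ones(p)])

rows, rhs = [], []
# Margin constraints: D_ii (A_i w - gamma) + y_i >= 1
for i in range(p):
    rows.append(np.concatenate([-D[i] * A[i], np.zeros(n), [D[i]], -np.eye(p)[i]]))
    rhs.append(-1.0)
# 1-norm linearization: w - a <= 0 and -w - a <= 0
for j in range(n):
    e = np.eye(n)[j]
    rows.append(np.concatenate([e, -e, [0.0], np.zeros(p)])); rhs.append(0.0)
    rows.append(np.concatenate([-e, -e, [0.0], np.zeros(p)])); rhs.append(0.0)

bounds = [(None, None)] * n + [(0, None)] * n + [(None, None)] + [(0, None)] * p
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
w, gamma = res.x[:n], res.x[2 * n]
print(np.sign(A @ w - gamma))  # classification of the training points
```

Since this toy data is separable, the optimal slacks are zero and the signs match the labels in D.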
Knowledge-Based SVMs: Generalizing “Example” from POINT to REGION
[Figure: the A+ and A− classes, with whole regions, rather than individual points, labeled as belonging to each class.]
Incorporating “Knowledge Sets” into the SVM Linear Program
• Suppose that the knowledge set {x | Bx ≤ b} belongs to class A+
• Hence it must lie in the half-space {x | x′w ≥ γ + 1}
• We therefore have the implication
  Bx ≤ b ⇒ x′w ≥ γ + 1
• This implication is equivalent to a set of constraints (proof in the NIPS ’02 paper)
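The constraint set the slide alludes to can be written out. Assuming the knowledge set is nonempty, LP duality applied to min{x′w : Bx ≤ b} gives the following equivalence (this matches the form of the NIPS ’02 result):

```latex
\[
Bx \le b \;\Rightarrow\; x'w \ge \gamma + 1
\quad\Longleftrightarrow\quad
\exists\, u \ge 0:\;\; B'u + w = 0,\;\; b'u + \gamma + 1 \le 0
\]
```

The right-hand side is linear in (w, γ, u), so it drops straight into the SVM linear program as extra constraints.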
Resulting LP for KBSVMs
We get this linear program (LP), in which the advice constraints range over the number of knowledge regions.
KBSVM with Slack Variables
The advice constraints, whose right-hand side was 0 in the previous LP, are now relaxed with slack variables, so the advice need only be satisfied approximately.
SVMs and Non-Linear Separating Surfaces
[Figure: points in the original feature space (f₁, f₂) are mapped to a new space (g(f₁, f₂), h(f₁, f₂)) where they become linearly separable.]
• Non-linearly map to a new space
• Linearly separate in the new space (using kernels)
• The result is a non-linear separator in the original space
Fung et al. (2003) present knowledge-based non-linear SVMs.
Support Vector Regression (aka Kernel Regression)
Linearly approximate a function, given an array A of inputs and a vector y of (numeric) outputs:
  f(x) ≈ x′w + b
Find weights such that
  Aw + be ≈ y
In the dual space, w = A′α, so we get
  (AA′)α + be ≈ y
“Kernel-izing” (to get a non-linear approximation):
  K(A, A′)α + be ≈ y
What to Optimize?
Linear program to optimize:
• The 1st term (‖w‖₁) is a “regularizer” that minimizes model complexity
• The 2nd term is the approximation error, weighted by the parameter C
• This reduces to the classical “least squares” fit if the quadratic version is used and the first term is ignored
Predicting Y for New X
  y = K(x′, A′)α + b
• Use the kernel to compute a “distance” to each training point (i.e., each row of A)
• Weight by αᵢ (hopefully many αᵢ are zero) and sum
• Add b (a scalar)
Knowledge-Based SVR (Mangasarian, Shavlik, & Wild, JMLR ’04)
Add soft constraints to the linear program (so the learner need only follow advice approximately):
  minimize  ‖w‖₁ + C‖s‖₁ + a penalty for violating the advice
  such that y − s ≤ Aw + be ≤ y + s   (a “slacked” match)
Advice: in this region, y should exceed 4.
Testbeds: Subtasks of RoboCup
• Mobile KeepAway [Stone & Sutton, ICML 2001]: keep the ball from opponents
• BreakAway [Maclin et al., AAAI 2005]: score a goal
Reinforcement Learning Overview
Receive a state (described by a set of features) → take an action → receive a reward.
Use the rewards to estimate the Q-values of actions in states.
Policy: choose the action with the highest Q-value in the current state.
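The loop on this slide can be sketched as tabular Q-learning; the 1-D gridworld, reward, and hyperparameters below are hypothetical stand-ins for the RoboCup tasks:

```python
import random

# Tiny hypothetical environment: states 0..4, goal at state 4, reward +1 there.
N, GOAL = 5, 4
ACTIONS = [-1, +1]                     # move left / move right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2      # learning rate, discount, exploration

random.seed(0)
for episode in range(300):
    s = 0
    while s != GOAL:
        if random.random() < eps:      # occasionally explore
            a = random.choice(ACTIONS)
        else:                          # exploit: highest Q-value in this state
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N - 1)  # receive the next state
        r = 1.0 if s2 == GOAL else 0.0  # receive a reward
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Policy: the action with the highest Q-value in each non-terminal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
print(policy)
```

After training, the greedy policy moves right (toward the goal) in every state.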
Incorporating Advice in KBKR
Advice format:  Bx ≤ d ⇒ f(x) ≥ hx + β
Example:
  If   distanceToGoal ≤ 10 and
       shotAngle ≥ 30
  Then Q(shoot) ≥ 0.9
(The If-conditions become rows of B and entries of d over the state features x, e.g. distanceToTeammate, distanceToGoal, and shotAngle; here the Then-part sets h = 0 and β = 0.9.)
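To make the Bx ≤ d ⇒ f(x) ≥ hx + β format concrete, the slide’s example advice can be encoded as small arrays. The feature ordering and this encoding are an assumed illustration, not the talk’s exact matrices:

```python
import numpy as np

# Assume the state features are x = (distanceToGoal, shotAngle).
B = np.array([[1.0,  0.0],    # distanceToGoal <= 10
              [0.0, -1.0]])   # -shotAngle <= -30, i.e. shotAngle >= 30
d = np.array([10.0, -30.0])
h = np.zeros(2)               # constant lower bound on Q(shoot) ...
beta = 0.9                    # ... namely Q(shoot) >= 0.9

def advice_applies(x):
    # The precondition Bx <= d holds componentwise.
    return bool(np.all(B @ x <= d))

def advised_lower_bound(x):
    # The consequent: f(x) >= h x + beta.
    return h @ x + beta

x = np.array([8.0, 45.0])     # close to the goal, wide shot angle
print(advice_applies(x), advised_lower_bound(x))
```

In KBKR these arrays enter the regression LP as soft constraints, as on the Knowledge-Based SVR slide.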
Giving Advice About Relative Values of Multiple Functions
Maclin et al., AAAI ’05
  When the input satisfies preconditions(input)
  Then f₁(input) > f₂(input)
Sample Advice-Taking Results
Advice:
  if distanceToGoal ≤ 10 and shotAngle ≥ 30
  then prefer shoot over all other actions
  (i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move))
[Figure: Prob(Score Goal) (y-axis, 0.0 to 1.0) vs. games played (x-axis, 0 to 25,000) on 2-vs-1 BreakAway with rewards +1/−1, comparing advice-taking to standard RL.]
Transfer Learning
• The agent learns Task A (the source task)
• The agent encounters a related Task B (the target task)
• The agent uses knowledge from Task A to learn Task B faster
How does the agent discover how the tasks are related? We use a user-provided mapping to tell the agent this.
Transfer Learning: The Goal for the Target Task
[Figure: performance vs. training, with and without transfer. Transfer should give a better start, a faster rise, and a better asymptote.]
Our Transfer Algorithm
1. Observe source-task games to learn skills
2. Translate the learned skills into transfer advice (we use ILP to create the advice for the target task)
3. If there is user advice, add it in
4. Learn the target task with KBKR
Learning Skills By Observation
• Source-task games are sequences of (state, action) pairs
• Learning skills is like learning to classify states by their correct actions
• ILP = Inductive Logic Programming
Example state:
  distBetween(me, teammate2) = 15
  distBetween(me, teammate1) = 10
  distBetween(me, opponent1) = 5
  ...
  action = pass(teammate2)
  outcome = caught(teammate2)
ILP: Searching for First-Order Rules
  P :- true
  P :- Q      P :- R      P :- S
  P :- R, Q   P :- R, S
  P :- R, S, V, W, X
We also use a random-sampling approach.
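A propositional caricature of this top-down search, sketched as greedy hill-climbing over candidate literals. The examples, literals, and scoring below are invented; real ILP systems search first-order clauses with coverage-based scoring:

```python
# Labeled states: did the demonstrated action match the skill we are learning?
# (Feature names are hypothetical stand-ins for the RoboCup features.)
examples = [
    ({"distToTeammate": 15, "passAngle": 40}, True),
    ({"distToTeammate": 20, "passAngle": 60}, True),
    ({"distToTeammate": 5,  "passAngle": 80}, False),
    ({"distToTeammate": 18, "passAngle": 10}, False),
]

# Candidate body literals for the clause, as (name, test) pairs.
literals = [
    ("distToTeammate > 14", lambda s: s["distToTeammate"] > 14),
    ("passAngle > 30",      lambda s: s["passAngle"] > 30),
    ("passAngle < 150",     lambda s: s["passAngle"] < 150),
]

def accuracy(clause):
    # A state is predicted positive iff it satisfies every literal in the body.
    return sum(all(t(s) for _, t in clause) == label
               for s, label in examples) / len(examples)

clause = []  # start from the most general clause, "P :- true"
while True:
    best = max(literals, key=lambda lit: accuracy(clause + [lit]))
    if accuracy(clause + [best]) <= accuracy(clause):
        break                      # no literal strictly improves: stop
    clause.append(best)

print([name for name, _ in clause])
```

On this toy data the search adds two literals and reaches perfect accuracy, mirroring the lattice descent pictured on the slide.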
Advantages of ILP
• Can produce first-order rules for skills, e.g. pass(Teammate) vs. pass(teammate1), …, pass(teammateN)
• Such rules capture only the essential aspects of the skill, and we expect these aspects to transfer better
• Can incorporate background knowledge
Example of a Skill Learned by ILP from KeepAway
pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
We also gave “human” advice about shooting, since shooting is a new skill in BreakAway.
TL Level 7: KA to BA Raw Curves
TL Level 7: KA to BA Averaged Curves
TL Level 7: Statistics (TL Metrics, Average Reward)

| Type | Metric | KA to BA Score | P Value | MD to BA Score | P Value |
|------|--------|------|------|------|------|
| I | Jump start | 0.05 | 0.0312 | 0.08 | 0.0086 |
| I | Jump start, smoothed | 0.08 | 0.0002 | 0.06 | 0.0014 |
| II | Transfer ratio | 1.82 | 0.0034 | 1.86 | 0.0004 |
| II | Transfer ratio (truncated) | 1.82 | 0.0032 | 1.86 | 0.0004 |
| II | Average relative reduction (narrow) | 0.58 | 0.0042 | 0.54 | 0.0004 |
| II | Average relative reduction (wide) | 0.70 | 0.0018 | 0.71 | 0.0008 |
| II | Ratio (of area under the curves) | 1.37 | 0.0056 | 1.41 | 0.0012 |
| II | Transfer difference | 503.57 | 0.0046 | 561.27 | 0.0008 |
| II | Transfer difference (scaled) | 1017.00 | 0.0040 | 1091.2 | 0.0016 |
| III | Asymptotic advantage | 0.09 | 0.0086 | 0.11 | 0.0040 |
| III | Asymptotic advantage, smoothed | 0.08 | 0.0116 | 0.10 | 0.0030 |

Boldface indicates a significant difference was found.
Conclusion
• Machine learning can use much more than I/O pairs
• Give advice to computers; they automatically refine it based on feedback from the user or the environment
• Advice is an appealing mechanism for transferring learned knowledge computer-to-computer
Some Papers (on-line, use Google :-)
• Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning, 1996
• Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002
• Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003
• Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004
• Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005
• Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006
Backups
Breakdown of Results
[Figure: probability of goal (y-axis, 0 to 0.6) vs. games played (x-axis, 0 to 5000), comparing four conditions: all advice, transfer advice only, user advice only, and no advice.]
What if User Advice is Bad?
[Figure: probability of goal (y-axis, 0 to 0.6) vs. games played (x-axis, 0 to 5000), comparing transfer with good advice, transfer with bad advice, bad advice only, and no advice.]
Related Work on Transfer
• Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
• Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004); Torrey et al. (ECML 2005)
• Transfer via relational RL: Driessens et al. (ICML workshop 2006)