Learning to Learn By Exploiting Prior Knowledge
Tatiana Tommasi
Idiap Research InstituteÉcole Polytechnique Fédérale de Lausanne
Switzerland
Oxford, October 22, 2012
Example - Learning
Task
Training Experience
A performance measure
“I want to learn Italian”
“Bionji”… “Buonyo”… “Buongiorno”
An agent learns if its performance at a task improves with experience (Mitchell, 1996)
Example – Learning to Learn
An agent learns to learn if its performance at each task improves with experience and with the number of tasks (Thrun, 1996)
Tasks
Training Experience
Performance measures
“I want to learn Italian and French”
Fr: “Bonjour” It: “Buongiorno”
What is this?
A fruit
Does it look like some other fruit?
Does it look similar to something else?
Analogical reasoning: if we already know the appearance of some objects we can use it as reference information when learning something new.
Knowledge Transfer
Storing knowledge gained while solving one problem and applying it to a different but related problem.
Source(s) → Target: Guava
Learning to learn: some transfer must occur between multiple tasks, with a positive impact on performance.
Domain Adaptation
Source(s) → Target
Domain adaptation is needed when the data distribution of the test domain is different from that of the training domain.
Multi-Task Learning
Task 1 Task 2 Task 3
Learning over multiple tasks at the same time by exploiting a symmetric sharing of information.
Learning to Learn
Sharing Information
• Knowledge Transfer
• Domain Adaptation
• Multi-Task Learning
Dynamic Process
• Online Learning: continuous update of the current knowledge.
• Active Learning: interactively query an oracle to obtain the desired outputs at new data points.
Exploit Prior Knowledge
Knowledge Transfer: Advantages
Particularly useful when few target training samples are available: it boosts the learning process.
Knowledge Transfer: Challenges
What to Transfer? Specify the form of the knowledge to transfer: instances, features, models.
How to Transfer? Define a learning algorithm able to exploit prior knowledge.
When to Transfer? Evaluate the task relatedness; keep useful knowledge and reject bad information (avoid negative transfer).
My choices
What to Transfer? Learning models.
How to Transfer? Discriminative learning approach.
When to Transfer? Automatic evaluation.
Intuition
“I want to learn …”
Target Problem
• Given a set of data {(x_i, y_i)}_{i=1}^{N}
• Find a function f(x)
Minimize the structural risk:
  min_w (1/2) ||w||² + C Σ_i loss(y_i, f(x_i))
• Linear models: f(x) = ⟨w, φ(x)⟩ + b
• Feature mapping φ with kernel K(x, x′) = ⟨φ(x), φ(x′)⟩
“I already know …”
Source Problem
• A source set of data {(x′_i, y′_i)}_{i=1}^{N′}
• Pre-learned model on the source: ŵ′, the solution of the learning problem on the source
What to Transfer
• Consider J source models ŵ_j, j = 1, …, J
• f̂_j(x) = Σ_i α_i^j K(x_i^j, x): the solution of the learning problem on the j-th source, expressed as a weighted sum of kernel functions
• Use the ŵ_j as reference knowledge when learning the target model
What to transfer? Discriminative models.
How and When to Transfer
How: adaptive regularization. When, and how much: reweighted source knowledge.
• Evaluate the relevance β_j of each source.
• Solve the target learning problem.
We call the resulting approach KT (Knowledge Transfer).
[T. Tommasi and B. Caputo, BMVC 2009][T. Tommasi et al., CVPR 2010]
Adaptive Least-Squares Support Vector Machine
Solve the target learning problem using the square loss:
  min_{w,b} (1/2) ||w − Σ_j β_j ŵ_j||² + (C/2) Σ_i (y_i − ⟨w, φ(x_i)⟩ − b)²
LS-SVM (Suykens et al., 2002)
• square loss: aims to predict each sample correctly;
• not sparse: all the training samples are considered;
• solution: a set of linear equations.
Solving Procedure
In matrix form:
  [ 0   1ᵀ      ] [ b ]   [ 0 ]
  [ 1   K + I/C ] [ α ] = [ y ]
where K_ij = K(x_i, x_j), 1 is the all-ones vector, and α collects the sample coefficients.
The model parameters can be calculated by matrix inversion.
Solution: (b, α)
Classifier: f(x) = Σ_i α_i K(x_i, x) + b
(In the adaptive case, y_i is replaced by y_i minus the β-weighted source predictions at x_i.)
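The linear system above can be sketched numerically. This is a minimal sketch assuming an RBF kernel; the helper names `lssvm_train` and `lssvm_predict` are illustrative, not from the talk.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Gaussian RBF kernel matrix between two sample sets
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_train(X, y, C=10.0, gamma=1.0):
    """Solve the LS-SVM system [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]
    by matrix inversion; note every sample receives an alpha (no sparsity)."""
    N = len(y)
    M = np.zeros((N + 1, N + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, gamma) + np.eye(N) / C
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, X_test, gamma=1.0):
    # classifier: f(x) = sum_i alpha_i K(x_i, x) + b
    return rbf_kernel(X_test, X_train, gamma) @ alpha + b
```

On a small separable toy set the training samples are classified with the correct sign, illustrating that a single linear solve replaces iterative optimization.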
Leave-One-Out Prediction
We can train the learning method on N samples and obtain as a byproduct the prediction for each training sample as if it had been left out of the training set.
The Leave-One-Out error is an almost unbiased estimator of the generalization error (Luntz and Brailovsky, 1969).
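For LS-SVM this byproduct has a known PRESS-style closed form: with system matrix M as above, the LOO residual of sample i is y_i − f_loo(x_i) = α_i / (M⁻¹)_ii, with the diagonal taken over the α block. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def lssvm_loo(X, y, C=5.0, gamma=0.5):
    """LOO predictions from a single LS-SVM training run, using the
    closed form y_i - f_loo(x_i) = alpha_i / (M^{-1})_{ii}; no explicit
    retraining over the N leave-one-out folds is needed."""
    N = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    M = np.zeros((N + 1, N + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = np.exp(-gamma * d2) + np.eye(N) / C
    Minv = np.linalg.inv(M)
    alpha = (Minv @ np.concatenate(([0.0], y)))[1:]
    return y - alpha / np.diag(Minv)[1:]
```

The values agree (to numerical precision) with explicitly retraining N times on N−1 samples, which is what makes the source-relevance evaluation on the next slide cheap.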
Evaluate the relevance of each source
The best values for β are those producing a positive product y_i ỹ_i(β) for each i, where ỹ_i(β) is the leave-one-out prediction for sample i. To obtain a convex formulation we consider a convex loss ζ of this quantity and solve
  min_β Σ_i ζ( y_i ỹ_i(β) ),  with β constrained to a bounded set.
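A heavily simplified sketch of this step, under stated assumptions: a single source, a hinge loss standing in for ζ, and a grid search replacing the convex solver. It uses the fact that the adaptive objective reduces to a plain LS-SVM on the shifted labels y − β·f_src, so the LOO margins are cheap to recompute for each candidate β (all function names are illustrative).

```python
import numpy as np

def loo_margins(X, y, f_src, beta, C=5.0, gamma=0.5):
    """LOO margins y_i * f_loo(x_i) of the adaptive LS-SVM: train on the
    shifted labels y - beta * f_src, then apply the closed-form LOO
    shortcut alpha_i / (M^{-1})_{ii}."""
    N = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    M = np.zeros((N + 1, N + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = np.exp(-gamma * d2) + np.eye(N) / C
    Minv = np.linalg.inv(M)
    alpha = (Minv @ np.concatenate(([0.0], y - beta * f_src)))[1:]
    return y * (y - alpha / np.diag(Minv)[1:])

def best_beta(X, y, f_src, grid=np.linspace(0.0, 1.0, 21)):
    # pick the beta whose LOO margins incur the smallest hinge-style loss
    losses = [np.maximum(0.0, 1.0 - loo_margins(X, y, f_src, b)).sum() for b in grid]
    return grid[int(np.argmin(losses))]
```

A source whose predictions agree with the target labels receives a large weight, while a misleading source is rejected with β = 0, which is exactly the "avoid negative transfer" behaviour.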
Experiments – Mixed Classes
• Visual Object Classification
• Caltech-256
• Binary problems: object vs non-object
• Features: PHOG, SIFT, Region Covariance, LBP
10 mixed classes, one target and nine sources.
Results – Mixed Classes
Experiments – 6 Unrelated Classes
• Visual Object Classification
• Caltech-256
• Binary problems: object vs non-object
• Features: PHOG, SIFT, Region Covariance, LBP
6 unrelated classes, one target and five sources.
Results – 6 Unrelated Classes
Experiments – 2 Unrelated Classes
• Visual Object Classification
• Caltech-256
• Binary problems: object vs non-object
• Features: SIFT
2 unrelated classes, one target and one source.
Results – 2 Unrelated Classes
Transfer Weights and Semantic Similarity
• Use the β vectors to define a matrix of class dissimilarities.
• Apply multidimensional scaling (two dimensions).
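The two steps above can be sketched with classical MDS. The β values and the way they are turned into dissimilarities are made-up assumptions here (the talk's exact construction is not recoverable from the transcript); the point is only the pipeline: transfer weights → dissimilarity matrix → 2-D embedding.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed a symmetric dissimilarity matrix D in `dim` dimensions via
    classical multidimensional scaling (double centering + eigenpairs)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J           # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]  # keep the top eigenpairs
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# hypothetical transfer weights beta[t, s]: how much class t borrows from class s
beta = np.array([[0.0, 0.9, 0.1],
                 [0.8, 0.0, 0.2],
                 [0.1, 0.2, 0.0]])
D = 1.0 - 0.5 * (beta + beta.T)           # one simple symmetrized dissimilarity
np.fill_diagonal(D, 0.0)
coords = classical_mds(D, dim=2)
```

Classes that lend each other large transfer weights (0 and 1 above) end up close together in the embedding, which is what connects the learned β to semantic similarity.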
Extension: Multiclass Domain Adaptation
• g = 1, …, G classes, fixed for both source and target;
• each binary model discriminates class g as positive from all the others, considered as negative;
• class prediction: the class whose model gives the maximum confidence (argmax over g).
Leave-One-Out predictions
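The one-vs-all prediction rule is a single argmax over the G confidences; the scores below are made-up numbers, not results from the talk.

```python
import numpy as np

# scores[i, g] holds the confidence f_g(x_i) of the g-th one-vs-all model
scores = np.array([[ 1.2, -0.3,  0.1],
                   [-0.5,  0.8, -0.9],
                   [-1.0, -0.2,  0.6]])
predicted_class = scores.argmax(axis=1)   # one label g in {0, ..., G-1} per sample
```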
When and How Much to Transfer
We suffer a loss linearly proportional to the difference between the confidence in the correct label and the maximum confidence among the other labels.
Final objective function
Three Possible Schemes 1.
Three Possible Schemes 2.
Three Possible Schemes 3.
Application
[T. Tommasi et al., IEEE Transactions on Robotics 2012]
Personalization of a pre-existent model.
• Task: hand posture classification.
• Electrodes applied on the forearm collect sEMG signals.
Goals:
• reduce the training time of a mechanical hand prosthesis through adaptive learning over several known subjects;
• augment the control abilities over hand prostheses.
Experimental setup
• 10 healthy subjects
• 7 sEMG electrodes
• 3 grasping actions plus rest
Experimental results
More Subjects and Postures
• 20 healthy subjects
• 10 sEMG electrodes
• 6 actions plus rest
Leveraging over Source Models: Limits
• Restriction to binary problems (transfer learning) or multiclass with the same set of classes in the source and in the target (domain adaptation).
• The source and the target models should live in the same space: same features and learning parameters.
• Batch method: the relevance of each source knowledge must be re-evaluated every time a new training sample becomes available.
Feature Transfer
[L. Jie*, T. Tommasi*, B. Caputo, ICCV 2011]
• Use the source models as experts that predict on the target samples.
• Use the output of the prediction as additional feature elements.
• Cast the problem in the multiple kernel learning framework (Multiple Kernel Transfer Learning, MKTL).
• Principled multiclass formulation.
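The first two bullets amount to score concatenation. A minimal sketch, where `source_models` and the two toy linear scorers are hypothetical stand-ins for pre-trained source experts:

```python
import numpy as np

def augment_with_source_scores(X, source_models):
    """Append each source expert's prediction on the target samples as
    extra feature elements; `source_models` are callables mapping an
    (N, d) array of samples to an (N,) vector of scores."""
    scores = np.column_stack([m(X) for m in source_models])
    return np.hstack([X, scores])

# two hypothetical linear source experts
src1 = lambda X: X @ np.array([1.0, -1.0])
src2 = lambda X: X @ np.array([0.5, 0.5])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
X_aug = augment_with_source_scores(X, [src1, src2])   # shape (2, 4)
```

In MKTL the original features and the appended expert scores would then feed separate kernels, letting the multiple kernel learner weigh each source.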
Online Learning
[T. Tommasi et al, BMVC 2012]
Combine Online Learning and Knowledge Transfer so that each benefits the other.
• Avoid re-evaluating the relevance of the source knowledge at each step.
• Obtain an online learning approach with robust generalization capacity.
Transfer Initialized Online Learning (TROL)
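A heavily simplified sketch of the transfer-initialization idea, with a plain perceptron standing in for the talk's online learner: `w_prior` would come from the β-weighted source models, and the learner starts from it instead of from zero.

```python
import numpy as np

def transfer_initialized_perceptron(X, y, w_prior, lr=1.0):
    """Online perceptron warm-started from transferred knowledge: begin
    from w_prior rather than from the zero vector, then make a
    mistake-driven update on each misclassified sample of the stream."""
    w = w_prior.copy()
    mistakes = 0
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:      # mistake: update toward the sample
            w += lr * y_i * x_i
            mistakes += 1
    return w, mistakes
```

With a relevant prior the learner makes fewer online mistakes than when starting from scratch, which is the "robust generalization from the first samples" behaviour the slide describes.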
Exploit Existing Visual Resources (partially overlapping label sets)
[T. Tommasi et al, ACCV 2012]
• General case of many visual datasets (tasks) with some common classes; no explicit class alignment.
• No model already learned, only samples available, possibly represented with different feature descriptors.
• Define a representation which decomposes into two orthogonal parts: one shared and one private for each task.
• Use the generic knowledge coded in the shared part when learning on a new target problem.
Multi-Task Unaligned Shared Knowledge Transfer (MUST)
Cross-Database Generalization
Take Home Message
• It is possible to define learning algorithms that automatically evaluate the relevance of prior knowledge when addressing a new target problem with few training examples.
• The described approaches consistently outperform learning from scratch in both transfer learning and domain adaptation problems.
• It is possible to artificially reproduce different aspects of the “human analogical reasoning process”.