TRANSCRIPT
Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao† Wei Fan‡ Jing Jiang† Jiawei Han†
†University of Illinois at Urbana-Champaign ‡IBM T. J. Watson Research Center
2/17
Standard Supervised Learning
New York Times
training (labeled)
test (unlabeled)
Classifier 85.5%
New York Times
3/17
In Reality……
New York Times
training (labeled)
test (unlabeled)
Classifier 64.1%
New York Times
Labeled data not available!
Reuters
4/17
Domain Difference → Performance Drop
train → test
ideal setting: New York Times → New York Times, Classifier 85.5%
realistic setting: Reuters → New York Times, Classifier 64.1%
5/17
Other Examples
• Spam filtering
– Public email collection → personal inboxes
• Intrusion detection
– Existing types of intrusions → unknown types of intrusions
• Sentiment analysis
– Expert review articles → blog review articles
• The aim
– To design learning methods that are aware of the difference between the training and test domains
• Transfer learning
– Adapt the classifiers learned from the source domain to the new domain
6/17
All Sources of Labeled Information
training (labeled)
test (completely unlabeled)
Classifier
New York Times
Reuters
Newsgroup
…… ?
7/17
A Synthetic Example
Training (have conflicting concepts)
Test
Partially overlapping
8/17
Goal
Source Domain, Source Domain, Source Domain → Target Domain
• To unify the knowledge from multiple source domains that is consistent with the test domain
9/17
Summary of Contributions
• Transfer from multiple source domains
– Target domain has no labeled examples
• Do not need to re-train
– Rely on base models trained from each domain
– The base models are not necessarily developed for transfer learning applications
10/17
Locally Weighted Ensemble
Models C1, C2, …, Ck are trained on Training set 1, Training set 2, …, Training set k; each is applied to a test example x (x: feature vector, y: class label).

f^i(x, y) = P(Y = y | x, C_i)
f^E(x, y) = Σ_{i=1}^{k} w_i(x) f^i(x, y), with Σ_{i=1}^{k} w_i(x) = 1
y | x = argmax_y f^E(x, y)
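The prediction rule of the locally weighted ensemble can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the `predict_proba` method and the `weights` function are assumed interfaces, standing in for the base models C_i and the local weights w_i(x).

```python
import numpy as np

def lwe_predict(models, weights, x):
    """Locally weighted ensemble prediction for one test example x.

    models  : list of k base classifiers; model i's predict_proba(x)
              returns f^i(x, y) = P(Y = y | x, C_i) over all classes
    weights : function mapping x to k local weights w_i(x) summing to 1
    """
    w = np.asarray(weights(x), dtype=float)                  # w_i(x)
    probs = np.array([m.predict_proba(x) for m in models])   # shape (k, n_classes)
    fE = (w[:, None] * probs).sum(axis=0)                    # f^E(x, y) = sum_i w_i(x) f^i(x, y)
    return int(fE.argmax())                                  # argmax_y f^E(x, y)
```

With per-model class probabilities (0.9, 0.1), (0.4, 0.6), (0.8, 0.2) and local weights (0.5, 0.3, 0.2), the combined distribution is (0.73, 0.27), so class 0 is predicted.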
11/17
Optimal Local Weights
[Illustration: for test example x, C1 predicts (0.9, 0.1) and C2 predicts (0.4, 0.6) over the two classes; the true local distribution is (0.8, 0.2), so C1 should receive the higher weight.]
• Optimal weights
– Solution to a regression problem
– Impossible to get since f is unknown!
12/17
Graph-based Heuristics
• Graph-based weights approximation
– Map the structures of a model onto the structures of the test domain
– Weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x
[Illustration: the model whose neighborhood graph better matches the test clustering around x receives the higher weight.]
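One way to read the heuristic: around a test point x, a model agrees with the local structure if the neighbors it places in x's class are the same neighbors that x's cluster contains. The sketch below is a simplified, hypothetical variant of that idea; the neighbor sets, cluster labels, and Jaccard-overlap similarity are my assumptions, not the paper's exact formulation.

```python
def graph_weights(x_idx, model_labels, cluster_labels, neighbors):
    """Hypothetical sketch of graph-based local weighting.

    x_idx          : index of the test point
    model_labels   : list of k label arrays, one per model, over all points
    cluster_labels : cluster assignment of each point in the test domain
    neighbors      : dict mapping a point index to its neighbor indices
    """
    nbrs = neighbors[x_idx]
    k = len(model_labels)
    # neighbors of x that fall into x's cluster (test-domain structure)
    same_cluster = {j for j in nbrs if cluster_labels[j] == cluster_labels[x_idx]}
    sims = []
    for labels in model_labels:
        # neighbors of x that the model predicts into x's class (model structure)
        same_class = {j for j in nbrs if labels[j] == labels[x_idx]}
        union = same_class | same_cluster
        sims.append(len(same_class & same_cluster) / len(union) if union else 0.0)
    s = sum(sims)
    # fall back to uniform weights when no model matches the local structure
    return [si / s for si in sims] if s > 0 else [1.0 / k] * k
```

A model whose class boundary cuts through x's cluster gets a small overlap and hence a small weight; a model whose predictions respect the cluster gets a weight near the maximum.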
13/17
Experiments Setup
• Data Sets
– Synthetic data sets
– Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
– Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
– Intrusion detection data: different types of intrusions in training and test sets
• Baseline Methods
– One source domain: single models (WNN, LR, SVM)
– Multiple source domains: SVM on each of the domains
– Merge all source domains into one: ALL
– Simple averaging ensemble: SMA
– Locally weighted ensemble: LWE
14/17
Experiments on Synthetic Data
15/17
Experiments on Real Data
[Bar charts, accuracy from 0.5 to 1.0: on Spam, Newsgroup, and Reuters, LWE is compared with WNN, LR, SVM, and SMA; on intrusion detection (DOS, Probing, R2L), LWE is compared with Set 1, Set 2, ALL, and SMA.]
16/17
Conclusions
• Locally weighted ensemble framework
– Transfers useful knowledge from multiple source domains
• Graph-based heuristics to compute weights
– Make the framework practical and effective