Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao†  Wei Fan‡  Jing Jiang†  Jiawei Han†
†University of Illinois at Urbana-Champaign  ‡IBM T. J. Watson Research Center
2/17
Standard Supervised Learning
• training (labeled): New York Times
• test (unlabeled): New York Times
• Classifier accuracy: 85.5%
3/17
In Reality…
• training (labeled): Reuters — labeled New York Times data not available!
• test (unlabeled): New York Times
• Classifier accuracy: 64.1%
4/17
Domain Difference → Performance Drop

train            test              accuracy
New York Times → New York Times    85.5%    (ideal setting)
Reuters        → New York Times    64.1%    (realistic setting)
5/17
Other Examples
• Spam filtering
  – public email collection → personal inboxes
• Intrusion detection
  – existing types of intrusions → unknown types of intrusions
• Sentiment analysis
  – expert review articles → blog review articles
• The aim
  – to design learning methods that are aware of the difference between the training and test domains
• Transfer learning
  – adapt the classifiers learnt from the source domain to the new domain
6/17
All Sources of Labeled Information
• training (labeled): New York Times, Reuters, Newsgroup, ……
• test (completely unlabeled)
• Classifier → ?
7/17
A Synthetic Example
• Training sets (contain conflicting concepts)
• Test set
• The training and test distributions are partially overlapping
8/17
Goal
• Multiple source domains → one target domain
• To unify the knowledge from multiple source domains that is consistent with the target domain
9/17
Summary of Contributions
• Transfer from multiple source domains
  – Target domain has no labeled examples
• Do not need to re-train
  – Rely on base models trained from each domain
  – The base models are not necessarily developed for transfer learning applications
10/17
Locally Weighted Ensemble
• k base models C1, C2, …, Ck, each trained on its own training set (training set 1, …, training set k)
• Each model outputs a class-probability estimate for a test example x:
    f_i(x, y) = P(y = Y | x, C_i)
• The ensemble combines them with per-example local weights w_1(x), …, w_k(x):
    f_E(x, y) = Σ_{i=1}^{k} w_i(x) f_i(x, y),   with   Σ_{i=1}^{k} w_i(x) = 1
• Prediction: y | x = argmax_y f_E(x, y)
• x: feature value;  y: class label
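The combination rule on this slide can be sketched in a few lines of Python (a minimal illustration, not the authors' code; `lwe_predict` and the `Dummy` models are made up for the example):

```python
import numpy as np

def lwe_predict(models, weights, x):
    """Locally weighted ensemble: combine per-model class-probability
    estimates f_i(x, y) = P(y | x, C_i) with per-example weights w_i(x),
    then predict the class maximizing the weighted sum f_E(x, y)."""
    # probs has shape (k, n_classes); row i is model C_i's estimate at x
    probs = np.array([m.predict_proba(x)[0] for m in models])
    fE = weights @ probs          # f_E(x, y) = sum_i w_i(x) * f_i(x, y)
    return int(np.argmax(fE)), fE # argmax_y f_E(x, y)

# Toy stand-ins for trained base models (each just returns fixed probabilities)
class Dummy:
    def __init__(self, p): self.p = np.array([p])
    def predict_proba(self, x): return self.p

m1, m2 = Dummy([0.9, 0.1]), Dummy([0.4, 0.6])
label, fE = lwe_predict([m1, m2], np.array([0.7, 0.3]), None)
# fE = [0.7*0.9 + 0.3*0.4, 0.7*0.1 + 0.3*0.6] = [0.75, 0.25], so label = 0
```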
11/17
Optimal Local Weights
• For a test example x, suppose the models output:
    C1:   P(y=0 | x) = 0.9,  P(y=1 | x) = 0.1
    C2:   P(y=0 | x) = 0.4,  P(y=1 | x) = 0.6
  while the true conditional is
    true: P(y=0 | x) = 0.8,  P(y=1 | x) = 0.2
  → C1 should receive the higher weight
• Optimal weights
  – Solution to a regression problem
  – Impossible to obtain, since the true f is unknown!
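For two models, the regression problem has a closed form, which reproduces the numbers on this slide (a hedged sketch under the unrealistic assumption that the true conditional is known; `optimal_weight_pair` is an illustrative helper, not from the paper):

```python
import numpy as np

def optimal_weight_pair(f1, f2, f_true):
    """Minimize ||w*f1 + (1-w)*f2 - f_true||^2 over w, clipped to [0, 1].
    f1, f2 are the two models' probability vectors at x; f_true is the
    (in practice unknown) true conditional distribution at x."""
    d = f1 - f2
    w = float(d @ (f_true - f2)) / float(d @ d)
    return min(max(w, 0.0), 1.0)

# Numbers from the slide: C1 -> [0.9, 0.1], C2 -> [0.4, 0.6], true -> [0.8, 0.2]
w1 = optimal_weight_pair(np.array([0.9, 0.1]), np.array([0.4, 0.6]),
                         np.array([0.8, 0.2]))
# w1 = 0.8: the model closer to the truth at x gets the higher weight
```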
12/17
Graph-based Heuristics
• Graph-based weight approximation
  – Map the structures of a model onto the structures of the test domain
  – The weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x
  – A model whose local structure matches the test-domain clustering receives a higher weight
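One simple way to realize this heuristic can be sketched as follows (a rough illustration of the idea, not the paper's exact algorithm; the function name and toy labels are made up): score each model by how often, among x's nearest test-set neighbors, its predicted-label graph agrees with the clustering graph, then normalize the scores into weights.

```python
import numpy as np

def local_graph_weights(model_labels, cluster_labels, neighbors, idx):
    """For each model, count neighbors u of x where 'same predicted label'
    agrees with 'same cluster', and normalize the scores into weights.

    model_labels   - list of k arrays: each model's predictions on the test set
    cluster_labels - array: cluster assignment of each test point
    neighbors      - indices of x's nearest neighbors in the test set
    idx            - index of the test example x
    """
    scores = []
    for labels in model_labels:
        agree = sum(
            (labels[idx] == labels[u]) == (cluster_labels[idx] == cluster_labels[u])
            for u in neighbors
        )
        scores.append(agree / len(neighbors))
    scores = np.array(scores, dtype=float)
    total = scores.sum()
    # fall back to uniform weights when no model matches the local structure
    return scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))

# Toy illustration (made-up labels): model 1's neighborhood graph matches the
# clustering around x (index 0); model 2's does not.
w = local_graph_weights(
    [np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])],  # per-model predictions
    np.array([0, 0, 0, 1]),                            # cluster assignments
    neighbors=[1, 2], idx=0)
# w gives all the weight to model 1 at this x
```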
13/17
Experiments Setup
• Data Sets
  – Synthetic data sets
  – Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
  – Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
  – Intrusion detection data: different types of intrusions in the training and test sets
• Baseline Methods
  – One source domain: single models (WNN, LR, SVM)
  – Multiple source domains: SVM on each of the domains
  – Merge all source domains into one: ALL
  – Simple averaging ensemble: SMA
  – Locally weighted ensemble: LWE
14/17
Experiments on Synthetic Data
15/17
Experiments on Real Data
[Bar charts, accuracy on a 0.5–1.0 scale: WNN, LR, SVM, SMA, and LWE compared on the Spam, Newsgroup, and Reuters data sets; Set 1, Set 2, ALL, SMA, and LWE compared on the intrusion detection categories DOS, Probing, and R2L]
16/17
Conclusions
• Locally weighted ensemble framework
  – transfers useful knowledge from multiple source domains
• Graph-based heuristics to compute weights
  – make the framework practical and effective