TRANSCRIPT
Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao† Wei Fan‡ Jing Jiang† Jiawei Han†
†University of Illinois at Urbana-Champaign ‡IBM T. J. Watson Research Center
2/17
Standard Supervised Learning
New York Times
training (labeled)
test (unlabeled)
Classifier 85.5%
New York Times
3/17
In Reality……
New York Times
training (labeled)
test (unlabeled)
Classifier 64.1%
New York Times
Labeled data not available!
Reuters
4/17
Domain Difference → Performance Drop
train → test
ideal setting: New York Times → New York Times, Classifier 85.5%
realistic setting: Reuters → New York Times, Classifier 64.1%
5/17
Other Examples
• Spam filtering
– Public email collection → personal inboxes
• Intrusion detection
– Existing types of intrusions → unknown types of intrusions
• Sentiment analysis
– Expert review articles → blog review articles
• The aim
– To design learning methods that are aware of the difference between the training and test domains
• Transfer learning
– Adapt the classifiers learned from the source domain to the new domain
6/17
All Sources of Labeled Information
training (labeled)
test (completely unlabeled)
Classifier
New York Times
Reuters
Newsgroup
…… ?
7/17
A Synthetic Example
Training (have conflicting concepts)
Test
Partially overlapping
8/17
Goal
Source Domain, Source Domain, Source Domain → Target Domain
• To unify the knowledge from multiple source domains that is consistent with the test domain
9/17
Summary of Contributions
• Transfer from multiple source domains
– Target domain has no labeled examples
• Do not need to re-train
– Rely on base models trained from each domain
– The base models are not necessarily developed for transfer learning applications
10/17
Locally Weighted Ensemble
Models C1, C2, …, Ck are trained on Training set 1, Training set 2, …, Training set k; each is applied to a test example x (x: feature vector, y: class label).

f^i(x, y) = P(Y = y | x, C_i)
f^E(x, y) = Σ_{i=1}^{k} w_i(x) f^i(x, y), with Σ_{i=1}^{k} w_i(x) = 1
y | x = argmax_y f^E(x, y)
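The prediction rule of the locally weighted ensemble can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the `predict_proba` method and the `weights` function are assumed interfaces, standing in for the base models C_i and the local weights w_i(x).

```python
import numpy as np

def lwe_predict(models, weights, x):
    """Locally weighted ensemble prediction for one test example x.

    models  : list of k base classifiers; model i's predict_proba(x)
              returns f^i(x, y) = P(Y = y | x, C_i) over all classes
    weights : function mapping x to k local weights w_i(x) summing to 1
    """
    w = np.asarray(weights(x), dtype=float)                  # w_i(x)
    probs = np.array([m.predict_proba(x) for m in models])   # shape (k, n_classes)
    fE = (w[:, None] * probs).sum(axis=0)                    # f^E(x, y) = sum_i w_i(x) f^i(x, y)
    return int(fE.argmax())                                  # argmax_y f^E(x, y)
```

With per-model class probabilities (0.9, 0.1), (0.4, 0.6), (0.8, 0.2) and local weights (0.5, 0.3, 0.2), the combined distribution is (0.73, 0.27), so class 0 is predicted.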
11/17
Optimal Local Weights
[Illustration: for test example x, C1 predicts (0.9, 0.1) and C2 predicts (0.4, 0.6) over the two classes; the true local distribution is (0.8, 0.2), so C1 should receive the higher weight.]
• Optimal weights
– Solution to a regression problem
– Impossible to get since f is unknown!
12/17
Graph-based Heuristics
• Graph-based weights approximation
– Map the structures of a model onto the structures of the test domain
– Weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x
[Illustration: the model whose neighborhood graph better matches the test clustering around x receives the higher weight.]
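One way to read the heuristic: around a test point x, a model agrees with the local structure if the neighbors it places in x's class are the same neighbors that x's cluster contains. The sketch below is a simplified, hypothetical variant of that idea; the neighbor sets, cluster labels, and Jaccard-overlap similarity are my assumptions, not the paper's exact formulation.

```python
def graph_weights(x_idx, model_labels, cluster_labels, neighbors):
    """Hypothetical sketch of graph-based local weighting.

    x_idx          : index of the test point
    model_labels   : list of k label arrays, one per model, over all points
    cluster_labels : cluster assignment of each point in the test domain
    neighbors      : dict mapping a point index to its neighbor indices
    """
    nbrs = neighbors[x_idx]
    k = len(model_labels)
    # neighbors of x that fall into x's cluster (test-domain structure)
    same_cluster = {j for j in nbrs if cluster_labels[j] == cluster_labels[x_idx]}
    sims = []
    for labels in model_labels:
        # neighbors of x that the model predicts into x's class (model structure)
        same_class = {j for j in nbrs if labels[j] == labels[x_idx]}
        union = same_class | same_cluster
        sims.append(len(same_class & same_cluster) / len(union) if union else 0.0)
    s = sum(sims)
    # fall back to uniform weights when no model matches the local structure
    return [si / s for si in sims] if s > 0 else [1.0 / k] * k
```

A model whose class boundary cuts through x's cluster gets a small overlap and hence a small weight; a model whose predictions respect the cluster gets a weight near the maximum.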
13/17
Experiments Setup
• Data Sets
– Synthetic data sets
– Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
– Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
– Intrusion detection data: different types of intrusions in training and test sets
• Baseline Methods
– One source domain: single models (WNN, LR, SVM)
– Multiple source domains: SVM on each of the domains
– Merge all source domains into one: ALL
– Simple averaging ensemble: SMA
– Locally weighted ensemble: LWE
14/17
Experiments on Synthetic Data
15/17
Experiments on Real Data
[Bar charts, accuracy from 0.5 to 1.0: on Spam, Newsgroup, and Reuters, LWE is compared with WNN, LR, SVM, and SMA; on intrusion detection (DOS, Probing, R2L), LWE is compared with Set 1, Set 2, ALL, and SMA.]
16/17
Conclusions
• Locally weighted ensemble framework
– Transfers useful knowledge from multiple source domains
• Graph-based heuristics to compute weights
– Make the framework practical and effective