Transfer Learning for Link Prediction
1
Transfer Learning for Link Prediction
Qiang Yang, 楊強,香港科大 Hong Kong University of Science and Technology
Hong Kong, China
http://www.cse.ust.hk/~qyang
We often find it easier… (slides 2-5, images only)
Transfer Learning? 迁移学习…
People often transfer knowledge to novel situations:
Chess to Checkers
C++ to Java
Physics to Computer Science
6
Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains)
7
But direct application will not work.
Machine Learning assumption: training and future (test) data follow the same distribution and lie in the same feature space.
When distributions are different:
Classification accuracy (+, -): training 90%, testing 60%
8
When features and distributions are different:
Classification accuracy (+, -): training 90%, testing 10%
9
When Features are different
Heterogeneous: different feature spaces
12
The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ...
Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ...
Training: Text Future: Images
Apples
Bananas
Transfer Learning: Source Domains
(figure: source domains feed into learning, from input to output)
13
Source Domain (源數據, source data) vs. Target Domain (目標數據, target data):
Training data: labeled/unlabeled (source), labeled/unlabeled (target)
Test data: unlabeled (target)
An overview of various settings of transfer learning:
Inductive Transfer Learning: labeled data are available in the target domain.
Case 1: labeled data are also available in the source domain; source and target tasks are learnt simultaneously (Multi-task Learning).
Case 2: no labeled data in the source domain (Self-taught Learning).
Transductive Transfer Learning: labeled data are available only in the source domain.
Domain Adaptation (assumption: different domains but a single task)
Sample Selection Bias / Covariate Shift (assumption: single domain and single task)
Unsupervised Transfer Learning: no labeled data in either the source or the target domain.
14
15
Feature-based Transfer Learning (Dai, Yang et al., ACM KDD 2007)
CoCC algorithm (co-clustering based), with a bridge between domains
Source: many labeled instances. Target: all unlabeled instances.
Distributions and feature spaces can be different, but must overlap; the classes are the same, but P(X, Y) is different!
16
Document-word co-occurrence
(figure: knowledge transfer between the source and target document-word co-occurrence matrices)
17
Co-Clustering based Classification (on 20 News Group dataset)
Using Transfer Learning
Talk Outline
Transfer Learning: a quick introduction
Link prediction and collaborative filtering problems
Transfer Learning for the sparsity problem: Codebook method, CST method, Collective Link Prediction method
Conclusions and future work
18
Network Data
Social Networks: Facebook, Twitter, Live Messenger, etc.
Information Networks: the Web (1.0 with hyperlinks, 2.0 with tags), citation networks, Wikipedia, etc.
Biological Networks: gene networks, etc.
Other Networks
19
A Real-World Study [Leskovec & Horvitz, WWW '08]
Who talks to whom on MSN Messenger. Network: 240M nodes, 1.3 billion edges.
Conclusions: the average path length is 6.6; 90% of nodes are reachable in fewer than 8 steps.
20
Global Network Structures
Small world: six degrees of separation [Milgram, 1960s]
Clustering and communities
Evolutionary patterns
Long-tail distributions
21
Local Network Structures
Link prediction: a form of Statistical Relational Learning (Taskar and Getoor)
Object classification: predict the category of an object based on its attributes and links (Is this person a student?)
Link classification: predict the type of a link (Are they co-authors?)
Link existence: predict whether a link exists or not
22
(credit: Jure Leskovec, ICML ‘09)
Link Prediction
Task: predict missing links in a network.
Main challenge: network data sparsity.
23
Long Tail in the Era of Recommendation
Help users discover novel/rare items: the long-tail recommendation systems
24
Product Recommendation (Amazon.com)
25
Collaborative Filtering (CF)
Wisdom of the crowd, word of mouth, …
Your taste in music/products/movies/… is similar to that of your friends (Maltz & Ehrlich, 1995).
Key issue: who are your friends?
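The "friends" idea can be made concrete with a minimal memory-based CF sketch. The data, user names, and function names below are invented for illustration and are not the method evaluated in this talk: neighbors are users with similar rating vectors, and a missing rating is predicted as a similarity-weighted average of their ratings.

```python
import math

# toy rating data; users and items are invented for illustration
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m4": 2},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

def cosine(u, v):
    # cosine similarity over the items both users have rated
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    du = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    dv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return num / (du * dv)

def predict(user, item):
    # similarity-weighted average over "friends" who rated the item
    nbrs = [(cosine(user, v), v) for v in ratings if v != user and item in ratings[v]]
    den = sum(s for s, _ in nbrs)
    return sum(s * ratings[v][item] for s, v in nbrs) / den if den else None

print(predict("bob", "m3"))
```

With sparse data, few neighbors share rated items, which is exactly the sparsity problem the rest of the talk addresses.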
26
27
Taobao.com
28
Matrix Factorization model for Link Prediction
30
? 1 ? 1 ? ?
1 ? ? 1 ? ?
? 1 1 ? ? 1
? 1 ? ? 1 ?
1 ? 1 1 1 ?
? ? 1 ? ? ?
Low-rank approximation: X (m x n) ≈ U (m x k) x V (k x n)
We seek a low-rank approximation of the target matrix such that an unknown value can be predicted by the product of the corresponding factor row and column.
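The low-rank idea can be sketched in a few lines of generic matrix factorization. The factor names U and V, the rank k, and the SGD hyperparameters below are my own illustrative choices, not from the talk:

```python
import random

# Low-rank link prediction sketch: observed entries of the m x n link matrix
# are approximated as X[i][j] ~= dot(U[i], V[j]) with U (m x k) and V (n x k).
random.seed(0)
m, n, k = 4, 5, 2
# observed links as (row, col, value); missing "?" entries are simply absent
observed = [(0, 1, 1.0), (0, 3, 1.0), (1, 0, 1.0),
            (2, 2, 1.0), (3, 4, 1.0), (3, 0, 1.0)]

U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

lr, reg = 0.05, 0.01
for _ in range(500):  # stochastic gradient descent over observed entries only
    for i, j, x in observed:
        e = x - dot(U[i], V[j])
        for f in range(k):
            u, v = U[i][f], V[j][f]
            U[i][f] += lr * (e * v - reg * u)
            V[j][f] += lr * (e * u - reg * v)

# an unobserved entry is then predicted from the learned factors
print(dot(U[1], V[1]))
```

Only observed entries enter the loss, which is what lets the model fill in the unknown "?" cells afterwards.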
Data Sparsity in Collaborative Filtering
Training data: sparse rating matrix (density ≤ 0.6%)
CF model
Test data: rating matrix
RMSE: 0.9748
Realistic setting: (figure: a rating matrix with a few observed entries such as 1, 5, 3, 4, 4 and mostly unknown "?"; only about 10% observed)
32
Transfer Learning for Collaborative Filtering?
36
IMDB Database
Amazon.com
Codebook Transfer
Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain
Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint
Conference on Artificial Intelligence (IJCAI '09),
Pasadena, CA, USA, July 11-17, 2009.
37
Codebook Construction
Definition 2.1 (Codebook). A k × l matrix which compresses the cluster-level rating patterns of k user clusters and l item clusters.
Codebook: user prototypes' ratings on item prototypes.
Encoding: find prototypes for users and items and get their indices.
Decoding: recover the rating matrix based on the codebook and indices.
(figure: a user-item rating matrix with missing "?" entries is permuted into user clusters I-III and item clusters A-C, yielding a 3 x 3 codebook of cluster-level ratings)
40
Knowledge Sharing via Cluster-Level Rating Matrix
Source (dense): encode cluster-level rating patterns.
Target (sparse): map users/items to the encoded prototypes.
(figure: the MOVIES rating matrix (auxiliary, dense) and the BOOKS rating matrix (target, sparse) are each permuted by rows & columns and reduced to groups, then matched to the same cluster-level rating pattern)
41
Step 1: Codebook Construction
Co-cluster rows (users) and columns (items) in Xaux
Get user/item cluster indicators Uaux ∈ {0, 1}^(n×k), Vaux ∈ {0, 1}^(m×l)
(figure: co-clustering the auxiliary matrix and averaging each block gives the codebook B, e.g. (3+3+3)/3 = 3, (1+1+1+1)/4 = 1, (2+2+2)/3 = 2, …)
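The block-averaging step can be sketched as follows, assuming the user/item cluster assignments have already been found by co-clustering; the assignments and ratings below are toy values:

```python
# Codebook construction sketch: B[p][q] is the average auxiliary rating of
# user cluster p on item cluster q (toy data; cluster assignments assumed given).
X_aux = [
    [3, 3, 1, 1],
    [3, 3, 1, 1],
    [2, 2, 3, 3],
]
user_cluster = [0, 0, 1]     # illustrative: 2 user clusters
item_cluster = [0, 0, 1, 1]  # illustrative: 2 item clusters
k, l = 2, 2

sums = [[0.0] * l for _ in range(k)]
counts = [[0] * l for _ in range(k)]
for i, row in enumerate(X_aux):
    for j, x in enumerate(row):
        sums[user_cluster[i]][item_cluster[j]] += x
        counts[user_cluster[i]][item_cluster[j]] += 1

# codebook B: the average rating of each (user-cluster, item-cluster) block
B = [[sums[p][q] / counts[p][q] for q in range(l)] for p in range(k)]
print(B)
```

Each codebook cell compresses a whole block of the auxiliary matrix into one cluster-level rating, exactly as in the averaging example above.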
42
Step 2: Codebook Transfer Objective
Expand the target matrix while minimizing the difference between Xtgt and the reconstructed one
User/item cluster indicators Utgt and Vtgt for Xtgt
Binary weighting matrix W for observed ratings in Xtgt
Alternate greedy searches for Utgt and Vtgt to a local minimum
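The alternating greedy search can be sketched as below, assuming the codebook B is given. The matrices and the plain squared-error matching are illustrative toy choices, not the paper's exact formulation:

```python
# Codebook transfer sketch: each target user/item is greedily matched to a
# codebook prototype so the squared error on observed ratings shrinks.
B = [[3.0, 1.0], [2.0, 3.0]]                 # codebook from the auxiliary domain
X = [[3, None, 1], [2, 3, None], [3, 1, 1]]  # sparse target ratings; None = unobserved
m, n = len(X), len(X[0])
k, l = len(B), len(B[0])
u = [0] * m  # user -> user-prototype index (the role of U_tgt)
v = [0] * n  # item -> item-prototype index (the role of V_tgt)

def err():
    # squared error on observed entries only (the role of the weight matrix W)
    return sum((X[i][j] - B[u[i]][v[j]]) ** 2
               for i in range(m) for j in range(n) if X[i][j] is not None)

for _ in range(5):  # alternating greedy search to a local minimum
    for i in range(m):
        u[i] = min(range(k), key=lambda p: sum(
            (X[i][j] - B[p][v[j]]) ** 2 for j in range(n) if X[i][j] is not None))
    for j in range(n):
        v[j] = min(range(l), key=lambda q: sum(
            (X[i][j] - B[u[i]][q]) ** 2 for i in range(m) if X[i][j] is not None))

# reconstruction: unknown entries are filled in from the matched prototypes
X_hat = [[B[u[i]][v[j]] for j in range(n)] for i in range(m)]
```

Each sweep can only lower the observed-entry error, so the search settles into a local minimum, as the slide states.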
43
Codebook Transfer
Each user/item in Xtgt is matched to a prototype in B
Duplicate certain rows & columns in B to reconstruct Xtgt
Codebook is indeed a two-sided data representation
(figure: the reconstructed Xtgt is obtained by multiplying the binary user-cluster indicators, the codebook B, and the binary item-cluster indicators, which duplicates certain rows and columns of B)
44
Experimental Setup
Data sets:
EachMovie (auxiliary): 500 users x 500 movies
MovieLens (target): 500 users x 1000 movies
Book-Crossing (target): 500 users x 1000 books
Compared methods: Pearson Correlation Coefficients (PCC), Scalable Cluster-based Smoothing (CBS), Weighted Low-rank Approximation (WLR), Codebook Transfer (CBT)
Evaluation protocol: first 100/200/300 users for training, last 200 users for testing; 5/10/15 observable ratings given for each test user
45
Experimental Results (1): Books to Movies
MAE comparison on MovieLens, averaged over 10 sampled test sets (lower is better)
46
Limitations of Codebook Transfer
Same rating range: source and target data must have the same range of ratings, e.g. [1, 5].
Homogeneous dimensions: user and item dimensions must be similar.
In reality, the range of ratings can be 0/1 or [1, 5], and the user and item dimensions may be very different.
48
Coordinate System Transfer
Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for
Sparsity Reduction. In Proceedings of the 24th AAAI Conference on
Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.
49
Our Solution: Coordinate System Transfer
Step 1: Coordinate System Construction ( )
Step 2: Coordinate System Adaptation
53
Step 2: Coordinate System Adaptation
Adapt the discovered coordinate systems from the auxiliary domain to the target domain.
The effect from the auxiliary domain:
Initialization: take the auxiliary coordinate systems as the seed model in the target domain.
Regularization: penalize target factors that drift away from the auxiliary coordinate systems.
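A minimal sketch of the initialization-plus-regularization idea, using rank-1 factors and toy values; the variable names, learning rate, and weight lam are my own choices and this is not the paper's exact algorithm:

```python
# Adaptation sketch: start the target factors from the auxiliary coordinate
# systems (initialization) and pull them back toward those systems while
# fitting observed target ratings (regularization). All values are toy data.
U0 = [[1.0], [0.5]]             # auxiliary user coordinate system (illustrative)
V0 = [[1.0], [2.0]]             # auxiliary item coordinate system (illustrative)
X = [[1.0, None], [None, 1.2]]  # sparse target ratings; None = unobserved

U = [row[:] for row in U0]      # initialization: seed with the auxiliary system
V = [row[:] for row in V0]
lr, lam = 0.05, 0.1             # lam weighs the pull-back regularizer

for _ in range(200):
    for i in range(len(U)):
        for j in range(len(V)):
            if X[i][j] is None:
                continue
            e = X[i][j] - U[i][0] * V[j][0]
            # gradient step on squared error plus lam * distance-to-auxiliary
            U[i][0] += lr * (e * V[j][0] - lam * (U[i][0] - U0[i][0]))
            V[j][0] += lr * (e * U[i][0] - lam * (V[j][0] - V0[j][0]))
```

With lam > 0 the fitted value for the second rating lands between the auxiliary prediction (1.0) and the observed target rating (1.2): the auxiliary system anchors the solution instead of being overwritten by the sparse target data.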
Coordinate System Transfer
55
Algorithm: Coordinate System Transfer
56
Data Sets and Evaluation Metrics
Data sets (extracted from MovieLens and Netflix)
Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):
MAE = Σ |r − r̂| / T,  RMSE = √(Σ (r − r̂)² / T),
where r and r̂ are the true and predicted ratings, respectively, and T is the number of test ratings.
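These two metrics can be computed directly; the ratings below are toy values for illustration:

```python
import math

# MAE and RMSE over T test ratings, with true ratings r and predictions r_hat
def mae_rmse(true, pred):
    T = len(true)
    mae = sum(abs(r - p) for r, p in zip(true, pred)) / T
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(true, pred)) / T)
    return mae, rmse

true = [4, 3, 5, 2]          # toy true ratings
pred = [3.5, 3.0, 4.0, 2.5]  # toy predicted ratings
mae, rmse = mae_rmse(true, pred)
print(round(mae, 3), round(rmse, 3))  # prints: 0.5 0.612
```

RMSE squares the residuals first, so it punishes large errors more heavily than MAE; both are reported in the experiments.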
Experimental Results
57
Baselines and Parameter Settings
Baselines:
Average Filling (AF)
Latent Matrix Factorization (LFM) (Bell and Koren, ICDM '07)
Collective Matrix Factorization (CMF) (Singh and Gordon, KDD '08)
OptSpace (Keshavan, Montanari and Oh, NIPS '10)
Average results over 10 random trials are reported.
Experimental Results
58
Results (1/2)
Observations:
CST performs significantly better (t-test) than all baselines at all sparsity levels.
Transfer learning methods (CST, CMF) beat the two non-transfer-learning methods (AF, LFM).
59
Limitations of CST and CBT
Different source domains are related to the target domain differently (e.g., Books to Movies vs. Food to Movies).
Rating bias: users tend to rate items that they like, so there are more ratings of 5 than of 2.
62
Our Solution: Collective Link Prediction (CLP)
Jointly learn multiple domains together.
Learn the similarity of different domains: consistency between domains indicates similarity.
Introduce a link function to correct the bias.
63
Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link
Prediction in Multiple Heterogeneous Domains.
In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.
64
Example: Collective Link Prediction
Predicting
(figure: a sparse training movie rating matrix is aided by a dense auxiliary book rating matrix and a dense auxiliary movie tagging matrix via transfer learning, across different rating domains and different user behaviors)
Test data: rating matrix
RMSE: 0.8855
66
Inter-task Similarity
Based on Gaussian process models.
The key part is the kernel, which models user relations as well as task relations.
67
Making Prediction
Similar to a memory-based approach:
Similarity between tasks
Similarity between items
Mean of prediction
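One way to read the slide: a prediction is the mean plus observations weighted by the product of task similarity and item similarity. The sketch below uses made-up similarity values and is only a paraphrase of the memory-based form, not the paper's exact predictive equation:

```python
# Memory-based style prediction across tasks: deviations from the mean are
# weighted by task_sim * item_sim (all similarities here are toy constants).
task_sim = {"movies": 1.0, "books": 0.6}             # similarity to the target task
item_sim = {"i1": 0.9, "i2": 0.4}                    # similarity to the target item
obs = [("movies", "i1", 4.0), ("books", "i2", 3.0)]  # (task, item, rating)
mean = 3.5                                           # mean of prediction

num = sum(task_sim[t] * item_sim[i] * (r - mean) for t, i, r in obs)
den = sum(task_sim[t] * item_sim[i] for t, i, r in obs)
pred = mean + num / den
print(pred)
```

Observations from a closely related task and a similar item dominate the prediction, while weakly related evidence is down-weighted rather than discarded.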
68
Experimental Results
71
Conclusions and Future Work
Transfer Learning (舉一反三: infer the whole from a single instance)
Link prediction is an important task in graph/network data mining.
Key challenge: sparsity.
Transfer learning from other domains helps improve performance.
72
Future Work
Scenario 1: our objective is to use the dense source network to help reduce the sparsity of the entire target network.
73
(figure: a dense source information network helping a sparse target information network)
Future Work
Scenario 2: use a dense source network as a bridge to help connect parts of the target network.
74
(figure: a dense source information network bridging the sparse parts of the target information network)
75
Future: Negative Transfer
Helpful: positive transfer
Harmful: negative transfer
Neutral: zero transfer
Future: Negative Transfer: "To Transfer or Not to Transfer"
Rosenstein, Marx, Kaelbling and Dietterich, Inductive Transfer Workshop, NIPS 2005. (Task: meeting invitation and acceptance)
76
Acknowledgement
HKUST: Sinno J. Pan, Huayan Wang, Bin Cao, Evan Wei Xiang, Derek Hao Hu, Nathan Nan Liu, Vincent Wenchen Zheng
Shanghai Jiaotong University: Wenyuan Dai, Guirong Xue, Yuqiang Chen, Prof. Yong Yu, Xiao Ling, Ou Jin
Visiting students: Bin Li (Fudan U.), Xiaoxiao Shi (Zhong Shan U.)
77