Transfer Learning for Link Prediction
1
Transfer Learning for Link Prediction
Qiang Yang, 楊強,香港科大 Hong Kong University of Science and Technology
Hong Kong, China
http://www.cse.ust.hk/~qyang
We often find it easier… (slides 2-5, images only)
Transfer Learning? 迁移学习…
People often transfer knowledge to novel situations:
Chess to Checkers
C++ to Java
Physics to Computer Science
6
Transfer Learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks (or new domains)
7
But direct application will not work.
Machine Learning assumption: training and future (test) data follow the same distribution and lie in the same feature space.
When distributions are different:
Classification accuracy (+, -): training 90%, testing 60%
8
When features and distributions are different:
Classification accuracy (+, -): training 90%, testing 10%
9
When Features are different
Heterogeneous: different feature spaces
12
The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae ...
Banana is the common name for a type of fruit and also the herbaceous plants of the genus Musa which produce this commonly eaten fruit ...
Training: Text Future: Images
Apples
Bananas
Transfer Learning: Source Domains
(figure: source domains feed into learning, from input to output)
13
Source Domain (源數據, source data) vs. Target Domain (目標數據, target data):
Training data: labeled/unlabeled (source), labeled/unlabeled (target)
Test data: unlabeled (target)
An overview of various settings of transfer learning:
Inductive Transfer Learning: labeled data are available in the target domain.
Case 1: labeled data are also available in the source domain; source and target tasks are learnt simultaneously (Multi-task Learning).
Case 2: no labeled data in the source domain (Self-taught Learning).
Transductive Transfer Learning: labeled data are available only in the source domain.
Domain Adaptation (assumption: different domains but a single task)
Sample Selection Bias / Covariate Shift (assumption: single domain and single task)
Unsupervised Transfer Learning: no labeled data in either the source or the target domain.
14
15
Feature-based Transfer Learning (Dai, Yang et al., ACM KDD 2007)
CoCC algorithm (co-clustering based), with a bridge between domains
Source: many labeled instances. Target: all unlabeled instances.
Distributions and feature spaces can be different, but must overlap; the classes are the same, but P(X, Y) is different!
16
Document-word co-occurrence
(figure: knowledge transfer between the source and target document-word co-occurrence matrices)
17
Co-Clustering based Classification (on 20 News Group dataset)
Using Transfer Learning
Talk Outline
Transfer Learning: a quick introduction
Link prediction and collaborative filtering problems
Transfer Learning for the sparsity problem: Codebook method, CST method, Collective Link Prediction method
Conclusions and future work
18
Network Data
Social Networks: Facebook, Twitter, Live Messenger, etc.
Information Networks: the Web (1.0 with hyperlinks, 2.0 with tags), citation networks, Wikipedia, etc.
Biological Networks: gene networks, etc.
Other Networks
19
A Real-World Study [Leskovec & Horvitz, WWW '08]
Who talks to whom on MSN Messenger. Network: 240M nodes, 1.3 billion edges.
Conclusions: the average path length is 6.6; 90% of nodes are reachable in fewer than 8 steps.
20
Global Network Structures
Small world: six degrees of separation [Milgram, 1960s]
Clustering and communities
Evolutionary patterns
Long-tail distributions
21
Local Network Structures
Link prediction: a form of Statistical Relational Learning (Taskar and Getoor)
Object classification: predict the category of an object based on its attributes and links (Is this person a student?)
Link classification: predict the type of a link (Are they co-authors?)
Link existence: predict whether a link exists or not
22
(credit: Jure Leskovec, ICML ‘09)
Link Prediction
Task: predict missing links in a network.
Main challenge: network data sparsity.
23
Long Tail in the Era of Recommendation
Help users discover novel/rare items: the long-tail recommendation systems
24
Product Recommendation (Amazon.com)
25
Collaborative Filtering (CF)
Wisdom of the crowd, word of mouth, …
Your taste in music/products/movies/… is similar to that of your friends (Maltz & Ehrlich, 1995).
Key issue: who are your friends?
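The "friends" idea can be made concrete with a minimal memory-based CF sketch. The data, user names, and function names below are invented for illustration and are not the method evaluated in this talk: neighbors are users with similar rating vectors, and a missing rating is predicted as a similarity-weighted average of their ratings.

```python
import math

# toy rating data; users and items are invented for illustration
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m4": 2},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

def cosine(u, v):
    # cosine similarity over the items both users have rated
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    du = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    dv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return num / (du * dv)

def predict(user, item):
    # similarity-weighted average over "friends" who rated the item
    nbrs = [(cosine(user, v), v) for v in ratings if v != user and item in ratings[v]]
    den = sum(s for s, _ in nbrs)
    return sum(s * ratings[v][item] for s, v in nbrs) / den if den else None

print(predict("bob", "m3"))
```

With sparse data, few neighbors share rated items, which is exactly the sparsity problem the rest of the talk addresses.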
26
27
Taobao.com
28
Matrix Factorization model for Link Prediction
30
? 1 ? 1 ? ?
1 ? ? 1 ? ?
? 1 1 ? ? 1
? 1 ? ? 1 ?
1 ? 1 1 1 ?
? ? 1 ? ? ?
Low-rank approximation: X (m x n) ≈ U (m x k) x V (k x n)
We seek a low-rank approximation of the target matrix such that an unknown value can be predicted by the product of the corresponding factor row and column.
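The low-rank idea can be sketched in a few lines of generic matrix factorization. The factor names U and V, the rank k, and the SGD hyperparameters below are my own illustrative choices, not from the talk:

```python
import random

# Low-rank link prediction sketch: observed entries of the m x n link matrix
# are approximated as X[i][j] ~= dot(U[i], V[j]) with U (m x k) and V (n x k).
random.seed(0)
m, n, k = 4, 5, 2
# observed links as (row, col, value); missing "?" entries are simply absent
observed = [(0, 1, 1.0), (0, 3, 1.0), (1, 0, 1.0),
            (2, 2, 1.0), (3, 4, 1.0), (3, 0, 1.0)]

U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

lr, reg = 0.05, 0.01
for _ in range(500):  # stochastic gradient descent over observed entries only
    for i, j, x in observed:
        e = x - dot(U[i], V[j])
        for f in range(k):
            u, v = U[i][f], V[j][f]
            U[i][f] += lr * (e * v - reg * u)
            V[j][f] += lr * (e * u - reg * v)

# an unobserved entry is then predicted from the learned factors
print(dot(U[1], V[1]))
```

Only observed entries enter the loss, which is what lets the model fill in the unknown "?" cells afterwards.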
Data Sparsity in Collaborative Filtering
Training data: sparse rating matrix (density ≤ 0.6%)
CF model
Test data: rating matrix
RMSE: 0.9748
Realistic setting: (figure: a rating matrix with a few observed entries such as 1, 5, 3, 4, 4 and mostly unknown "?"; only about 10% observed)
32
Transfer Learning for Collaborative Filtering?
36
IMDB Database
Amazon.com
Codebook Transfer
Bin Li, Qiang Yang, Xiangyang Xue. Can Movies and Books Collaborate? Cross-Domain
Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint
Conference on Artificial Intelligence (IJCAI '09),
Pasadena, CA, USA, July 11-17, 2009.
37
Codebook Construction
Definition 2.1 (Codebook). A k × l matrix which compresses the cluster-level rating patterns of k user clusters and l item clusters.
Codebook: user prototypes' ratings on item prototypes.
Encoding: find prototypes for users and items and get their indices.
Decoding: recover the rating matrix based on the codebook and indices.
(figure: a user-item rating matrix with missing "?" entries is permuted into user clusters I-III and item clusters A-C, yielding a 3 x 3 codebook of cluster-level ratings)
40
Knowledge Sharing via Cluster-Level Rating Matrix
Source (dense): encode cluster-level rating patterns.
Target (sparse): map users/items to the encoded prototypes.
(figure: the MOVIES rating matrix (auxiliary, dense) and the BOOKS rating matrix (target, sparse) are each permuted by rows & columns and reduced to groups, then matched to the same cluster-level rating pattern)
41
Step 1: Codebook Construction
Co-cluster rows (users) and columns (items) in Xaux
Get user/item cluster indicators Uaux ∈ {0, 1}^(n×k), Vaux ∈ {0, 1}^(m×l)
(figure: co-clustering the auxiliary matrix and averaging each block gives the codebook B, e.g. (3+3+3)/3 = 3, (1+1+1+1)/4 = 1, (2+2+2)/3 = 2, …)
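The block-averaging step can be sketched as follows, assuming the user/item cluster assignments have already been found by co-clustering; the assignments and ratings below are toy values:

```python
# Codebook construction sketch: B[p][q] is the average auxiliary rating of
# user cluster p on item cluster q (toy data; cluster assignments assumed given).
X_aux = [
    [3, 3, 1, 1],
    [3, 3, 1, 1],
    [2, 2, 3, 3],
]
user_cluster = [0, 0, 1]     # illustrative: 2 user clusters
item_cluster = [0, 0, 1, 1]  # illustrative: 2 item clusters
k, l = 2, 2

sums = [[0.0] * l for _ in range(k)]
counts = [[0] * l for _ in range(k)]
for i, row in enumerate(X_aux):
    for j, x in enumerate(row):
        sums[user_cluster[i]][item_cluster[j]] += x
        counts[user_cluster[i]][item_cluster[j]] += 1

# codebook B: the average rating of each (user-cluster, item-cluster) block
B = [[sums[p][q] / counts[p][q] for q in range(l)] for p in range(k)]
print(B)
```

Each codebook cell compresses a whole block of the auxiliary matrix into one cluster-level rating, exactly as in the averaging example above.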
42
Step 2: Codebook Transfer Objective
Expand the target matrix while minimizing the difference between Xtgt and the reconstructed one
User/item cluster indicators Utgt and Vtgt for Xtgt
Binary weighting matrix W for observed ratings in Xtgt
Alternate greedy searches for Utgt and Vtgt to a local minimum
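The alternating greedy search can be sketched as below, assuming the codebook B is given. The matrices and the plain squared-error matching are illustrative toy choices, not the paper's exact formulation:

```python
# Codebook transfer sketch: each target user/item is greedily matched to a
# codebook prototype so the squared error on observed ratings shrinks.
B = [[3.0, 1.0], [2.0, 3.0]]                 # codebook from the auxiliary domain
X = [[3, None, 1], [2, 3, None], [3, 1, 1]]  # sparse target ratings; None = unobserved
m, n = len(X), len(X[0])
k, l = len(B), len(B[0])
u = [0] * m  # user -> user-prototype index (the role of U_tgt)
v = [0] * n  # item -> item-prototype index (the role of V_tgt)

def err():
    # squared error on observed entries only (the role of the weight matrix W)
    return sum((X[i][j] - B[u[i]][v[j]]) ** 2
               for i in range(m) for j in range(n) if X[i][j] is not None)

for _ in range(5):  # alternating greedy search to a local minimum
    for i in range(m):
        u[i] = min(range(k), key=lambda p: sum(
            (X[i][j] - B[p][v[j]]) ** 2 for j in range(n) if X[i][j] is not None))
    for j in range(n):
        v[j] = min(range(l), key=lambda q: sum(
            (X[i][j] - B[u[i]][q]) ** 2 for i in range(m) if X[i][j] is not None))

# reconstruction: unknown entries are filled in from the matched prototypes
X_hat = [[B[u[i]][v[j]] for j in range(n)] for i in range(m)]
```

Each sweep can only lower the observed-entry error, so the search settles into a local minimum, as the slide states.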
43
Codebook Transfer
Each user/item in Xtgt is matched to a prototype in B
Duplicate certain rows & columns in B to reconstruct Xtgt
Codebook is indeed a two-sided data representation
(figure: the reconstructed Xtgt is obtained by multiplying the binary user-cluster indicators, the codebook B, and the binary item-cluster indicators, which duplicates certain rows and columns of B)
44
Experimental Setup
Data sets:
EachMovie (auxiliary): 500 users x 500 movies
MovieLens (target): 500 users x 1000 movies
Book-Crossing (target): 500 users x 1000 books
Compared methods: Pearson Correlation Coefficients (PCC), Scalable Cluster-based Smoothing (CBS), Weighted Low-rank Approximation (WLR), Codebook Transfer (CBT)
Evaluation protocol: first 100/200/300 users for training, last 200 users for testing; 5/10/15 observable ratings given for each test user
45
Experimental Results (1): Books to Movies
MAE comparison on MovieLens, averaged over 10 sampled test sets (lower is better)
46
Limitations of Codebook Transfer
Same rating range: source and target data must have the same range of ratings, e.g. [1, 5].
Homogeneous dimensions: user and item dimensions must be similar.
In reality, the range of ratings can be 0/1 or [1, 5], and the user and item dimensions may be very different.
48
Coordinate System Transfer
Weike Pan, Evan Xiang, Nathan Liu and Qiang Yang. Transfer Learning in Collaborative Filtering for
Sparsity Reduction. In Proceedings of the 24th AAAI Conference on
Artificial Intelligence (AAAI-10). Atlanta, Georgia, USA. July 11-15, 2010.
49
Our Solution: Coordinate System Transfer
Step 1: Coordinate System Construction ( )
Step 2: Coordinate System Adaptation
53
Step 2: Coordinate System Adaptation
Adapt the discovered coordinate systems from the auxiliary domain to the target domain.
The effect from the auxiliary domain:
Initialization: take the auxiliary coordinate systems as the seed model in the target domain.
Regularization: penalize target factors that drift away from the auxiliary coordinate systems.
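A minimal sketch of the initialization-plus-regularization idea, using rank-1 factors and toy values; the variable names, learning rate, and weight lam are my own choices and this is not the paper's exact algorithm:

```python
# Adaptation sketch: start the target factors from the auxiliary coordinate
# systems (initialization) and pull them back toward those systems while
# fitting observed target ratings (regularization). All values are toy data.
U0 = [[1.0], [0.5]]             # auxiliary user coordinate system (illustrative)
V0 = [[1.0], [2.0]]             # auxiliary item coordinate system (illustrative)
X = [[1.0, None], [None, 1.2]]  # sparse target ratings; None = unobserved

U = [row[:] for row in U0]      # initialization: seed with the auxiliary system
V = [row[:] for row in V0]
lr, lam = 0.05, 0.1             # lam weighs the pull-back regularizer

for _ in range(200):
    for i in range(len(U)):
        for j in range(len(V)):
            if X[i][j] is None:
                continue
            e = X[i][j] - U[i][0] * V[j][0]
            # gradient step on squared error plus lam * distance-to-auxiliary
            U[i][0] += lr * (e * V[j][0] - lam * (U[i][0] - U0[i][0]))
            V[j][0] += lr * (e * U[i][0] - lam * (V[j][0] - V0[j][0]))
```

With lam > 0 the fitted value for the second rating lands between the auxiliary prediction (1.0) and the observed target rating (1.2): the auxiliary system anchors the solution instead of being overwritten by the sparse target data.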
Coordinate System Transfer
55
Algorithm: Coordinate System Transfer
56
Data Sets and Evaluation Metrics
Data sets (extracted from MovieLens and Netflix)
Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):
MAE = Σ |r − r̂| / T,  RMSE = √(Σ (r − r̂)² / T),
where r and r̂ are the true and predicted ratings, respectively, and T is the number of test ratings.
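These two metrics can be computed directly; the ratings below are toy values for illustration:

```python
import math

# MAE and RMSE over T test ratings, with true ratings r and predictions r_hat
def mae_rmse(true, pred):
    T = len(true)
    mae = sum(abs(r - p) for r, p in zip(true, pred)) / T
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(true, pred)) / T)
    return mae, rmse

true = [4, 3, 5, 2]          # toy true ratings
pred = [3.5, 3.0, 4.0, 2.5]  # toy predicted ratings
mae, rmse = mae_rmse(true, pred)
print(round(mae, 3), round(rmse, 3))  # prints: 0.5 0.612
```

RMSE squares the residuals first, so it punishes large errors more heavily than MAE; both are reported in the experiments.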
Experimental Results
57
Baselines and Parameter Settings
Baselines:
Average Filling (AF)
Latent Matrix Factorization (LFM) (Bell and Koren, ICDM '07)
Collective Matrix Factorization (CMF) (Singh and Gordon, KDD '08)
OptSpace (Keshavan, Montanari and Oh, NIPS '10)
Average results over 10 random trials are reported.
Experimental Results
58
Results (1/2)
Observations:
CST performs significantly better (t-test) than all baselines at all sparsity levels.
Transfer learning methods (CST, CMF) beat the two non-transfer-learning methods (AF, LFM).
59
Limitations of CST and CBT
Different source domains are related to the target domain differently (e.g., Books to Movies vs. Food to Movies).
Rating bias: users tend to rate items that they like, so there are more ratings of 5 than of 2.
62
Our Solution: Collective Link Prediction (CLP)
Jointly learn multiple domains together.
Learn the similarity of different domains: consistency between domains indicates similarity.
Introduce a link function to correct the bias.
63
Bin Cao, Nathan Liu and Qiang Yang. Transfer Learning for Collective Link
Prediction in Multiple Heterogeneous Domains.
In Proceedings of 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel. June 2010.
64
Example: Collective Link Prediction
Predicting
(figure: a sparse training movie rating matrix is aided by a dense auxiliary book rating matrix and a dense auxiliary movie tagging matrix via transfer learning, across different rating domains and different user behaviors)
Test data: rating matrix
RMSE: 0.8855
66
Inter-task Similarity
Based on Gaussian process models.
The key part is the kernel, which models user relations as well as task relations.
67
Making Prediction
Similar to a memory-based approach:
Similarity between tasks
Similarity between items
Mean of prediction
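One way to read the slide: a prediction is the mean plus observations weighted by the product of task similarity and item similarity. The sketch below uses made-up similarity values and is only a paraphrase of the memory-based form, not the paper's exact predictive equation:

```python
# Memory-based style prediction across tasks: deviations from the mean are
# weighted by task_sim * item_sim (all similarities here are toy constants).
task_sim = {"movies": 1.0, "books": 0.6}             # similarity to the target task
item_sim = {"i1": 0.9, "i2": 0.4}                    # similarity to the target item
obs = [("movies", "i1", 4.0), ("books", "i2", 3.0)]  # (task, item, rating)
mean = 3.5                                           # mean of prediction

num = sum(task_sim[t] * item_sim[i] * (r - mean) for t, i, r in obs)
den = sum(task_sim[t] * item_sim[i] for t, i, r in obs)
pred = mean + num / den
print(pred)
```

Observations from a closely related task and a similar item dominate the prediction, while weakly related evidence is down-weighted rather than discarded.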
68
Experimental Results
71
Conclusions and Future Work
Transfer Learning (舉一反三: infer the whole from a single instance)
Link prediction is an important task in graph/network data mining.
Key challenge: sparsity.
Transfer learning from other domains helps improve performance.
72
Future Work
Scenario 1: our objective is to use the dense source network to help reduce the sparsity of the entire target network.
73
(figure: a dense source information network helping a sparse target information network)
Future Work
Scenario 2: use a dense source network as a bridge to help connect parts of the target network.
74
(figure: a dense source information network bridging the sparse parts of the target information network)
75
Future: Negative Transfer
Helpful: positive transfer
Harmful: negative transfer
Neutral: zero transfer
Future: Negative Transfer: "To Transfer or Not to Transfer"
Rosenstein, Marx, Kaelbling and Dietterich, Inductive Transfer Workshop, NIPS 2005. (Task: meeting invitation and acceptance)
76
Acknowledgement
HKUST: Sinno J. Pan, Huayan Wang, Bin Cao, Evan Wei Xiang, Derek Hao Hu, Nathan Nan Liu, Vincent Wenchen Zheng
Shanghai Jiaotong University: Wenyuan Dai, Guirong Xue, Yuqiang Chen, Prof. Yong Yu, Xiao Ling, Ou Jin
Visiting students: Bin Li (Fudan U.), Xiaoxiao Shi (Zhong Shan U.)
77