![Page 1: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/1.jpg)
Cross-domain Link Prediction and Recommendation
Jie Tang
Department of Computer Science and Technology, Tsinghua University
Networked World
• 1.26 billion users
• 700 billion minutes/month
• 280 million users
• 80% of users are 80-90's
• 560 million users
• influencing our daily life
• 800 million users
• ~50% revenue from network life
• 555 million users
• 0.5 billion tweets/day
• 79 million users per month
• 9.65 billion items/year
• 500 million users
• 35 billion on 11/11
Challenge: Big Social Data
• We generate 2.5×10^18 bytes of data per day.
• Big social data:
– 90% of the data was generated in the past 2 years
– Mining in a single data center → mining deep knowledge from multiple data sources
Opinion Mining
Innovation diffusion
Business intelligence
Info. Space
Social Space
Interaction
Understanding the mechanisms of interaction dynamics
Info. Space vs. Social Space
Social Networks
Core Research in Social Network
[Figure: research stack for social networks. Layers: BIG Social Data; Social Theories and Algorithmic Foundations; Social Network Analysis; Theory; Application. Topics shown: BA model, ER model, social influence, action, community, group behavior, structural hole, social tie, information diffusion. Applications: Search, Prediction, Advertise. Scales: Macro, Meso, Micro.]
(KDD 2010, PKDD 2011 Best Runnerup)
Part A: Let us start with a simple case
“inferring social ties in single network”
Real social networks are complex...
• Nobody exists merely in one social network.
– Public network vs. private network
– Business network vs. family network
• However, existing networks (e.g., Facebook and Twitter) are trying to lump everyone into one big network
– FB/QQ tries to solve this problem via lists/groups
– however…
• Google circles
Even more complex than we imagined!
• Only 16% of mobile phone users in Europe have created custom contact groups
– users do not take the time to create them
– users do not know how to circle their friends
• The problem is that online social networks are black and white…
Example 1. From BW to Color (KDD’10)
Example 2. From BW to Color (PKDD’11, Best Paper Runnerup)
CEO
Employee
How to infer
Manager
Enterprise email network
User interactions may form implicit groups
What is behind?
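[Figure: three example networks (publication network, mobile communication network, Twitter's following network). The communication example is annotated with "From Home 08:40", "From Office 11:35", "Both in office 08:00 – 18:00", "From Office 15:20", "From Outside 21:30", and "From Office 17:55".]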
Questions:
- What are the fundamental forces behind?
- A generalized framework for inferring social ties?
- How to connect the different networks?
Learning Framework
inferring social ties in single network
Problem Analysis
Input: temporal collaboration network (a dynamic collaborative network among the authors Ada, Bob, Jerry, Ying, and Smith, with edges labeled by year, 1999 to 2004).
Output: relationship analysis (a labeled network) giving the potential types of relationships and their probabilities: (type, prob, [s_time, e_time]).
[Figure: example output labels on the edges: (0.8, [1999, 2000]), (0.7, [2000, 2001]), (0.65, [2002, 2004]), (0.2, [2001, 2003]), (0.5, [/, 2000]), (0.9, [/, 1998]), (0.4, [/, 1998]), (0.49, [/, 1999]).]
[1] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, and J. Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD'10, pages 203-212.
Overall Framework
• $a_i$: author i
• $p_j$: paper j
• $py$: paper year
• $pn$: paper#
• $st_{i,y_i}$: starting time
• $ed_{i,y_i}$: ending time
• $r_{i,y_i}$: probability
The problem is cast as, for each node, identifying which neighbor has the highest probability of being his/her advisor, i.e., $P(y_i = j \mid x_i, x_{\sim i}, y)$, where $x_j$ and $x_i$ are neighbors.
Time-constrained Probabilistic Factor Graph (TPFG)
• Hidden variable $y_x$: $a_x$'s advisor
• $st_{x,y_x}$: starting time; $ed_{x,y_x}$: ending time
• $g(y_x, st_x, ed_x)$ is the pairwise local feature
• $f_x(y_x, Z_x) = \max g(y_x, st_x, ed_x)$ under the time constraint
• $Y_x$: set of potential advisors of $a_x$
[Figure: factor graph over the hidden variables $y_0, \dots, y_5$ with factors $f_0(y_0, y_1, y_2, y_3, y_4, y_5)$, $f_1(y_1, y_2, y_3)$, $f_3(y_3, y_4, y_5)$, $g_2(y_2)$, $g_4(y_4)$, and $g_5(y_5)$. Each variable carries a small table listing its candidate advisors with the corresponding $g$, $st$, and $ed$ values.]
Maximum likelihood estimation
• A general likelihood objective function can be defined as
$$P(y_1, \dots, y_N) = \frac{1}{Z} \prod_{i=1}^{N} f_i\left(y_i \mid \{y_x \mid x \in Y_i^{-1}\}\right)$$
with
$$f_i\left(y_i \mid \{y_x \mid x \in Y_i^{-1}\}\right) = g(y_i, st_{ij}, ed_{ij}) \prod_{x \in Y_i^{-1}} \Phi(y_x, ed_{ij}, st_{xi})$$
where Φ(.) can be instantiated in different ways, e.g.,
$$\Phi(y_x, ed_{ij}, st_{xi}) = \begin{cases} 1, & y_x \neq i \ \lor\ ed_{ij} < st_{xi} \\ 0, & y_x = i \ \land\ ed_{ij} \geq st_{xi} \end{cases}$$
Inference algorithm of TPFG
• $r_{ij} = \max P(y_1, \dots, y_{n_a} \mid y_i = j) = \exp(sent_{ij} + recv_{ij})$
[Figure: two-phase message passing over the authors $a_0, \dots, a_5$. In Phase 1 each node fills in its sent messages $u_{i,j}$ while the received messages are still unknown ("?"); in Phase 2 the received messages $v_{i,j}$ are filled in.]
Results of Model 1
• DBLP data: 654,628 authors, 1,076,946 publications, years provided.
• Ground truth: MathGenealogy Project; AI Genealogy Project; Faculty Homepage

| Datasets | RULE | SVM | IndMAX (empirical parameter) | IndMAX (optimized parameter) | Model 1 (empirical parameter) | Model 1 (optimized parameter) |
|---|---|---|---|---|---|---|
| TEST1 | 69.9% | 73.4% | 75.2% | 78.9% | 80.2% | 84.4% |
| TEST2 | 69.8% | 74.6% | 74.6% | 79.0% | 81.5% | 84.3% |
| TEST3 | 80.6% | 86.7% | 83.1% | 90.9% | 88.8% | 91.3% |

RULE: heuristics. SVM: supervised learning.
Results
[1] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08, pages 990-998.
(KDD 2012, WSDM
Part B: Extend the problem to cross-domain
“cross-domain collaboration recommendation”
Cross-domain Collaboration
• Interdisciplinary collaborations have generated huge impact, for example,
– 51 (>1/3) of the KDD 2012 papers are the result of cross-domain collaborations between graph theory, visualization, economics, medical inf., DB, NLP, IR
– Research field evolution (e.g., Biology and Computer Science giving rise to bioinformatics)
[1] J. Tang, S. Wu, J. Sun, and H. Su. Cross-domain Collaboration Recommendation. KDD’12, pages 1285-1293. (Full Presentation & Best Poster Award)
Cross-domain Collaboration (cont.)
• Increasing trend of cross-domain collaborations
Data Mining(DM), Medical Informatics(MI), Theory(TH), Visualization(VIS)
Challenges
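[Figure: whom should an author in Data Mining collaborate with in Theory? Example topics in Data Mining: large graph, heterogeneous network, social network. Example topics in Theory: graph theory, complexity theory, automata theory.]
1. Sparse connection: <1%
2. Complementary expertise
3. Topic skewness: 9%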
Related Work-Collaboration recommendation
• Collaborative topic modeling for recommending papers
– C. Wang and D.M. Blei. [2011]
• On social networks and collaborative recommendation
– I. Konstas, V. Stathopoulos, and J. M. Jose. [2009]
• CollabSeer: a search engine for collaboration discovery
– H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. [2007]
• Referral web: Combining social networks and collaborative filtering
– H. Kautz, B. Selman, and M. Shah. [1997]
• Fab: content-based, collaborative recommendation
– M. Balabanović and Y. Shoham. [1997]
Related Work-Expert finding and matching
• Topic level expertise search over heterogeneous networks
– J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su. [2011]
• Formal models for expert finding in enterprise corpora
– K. Balog, L. Azzopardi, and M. de Rijke. [2006]
• Expertise modeling for matching papers with reviewers
– D. Mimno and A. McCallum. [2007]
• On optimization of expertise matching with various constraints
– W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li. [2012]
Approach Framework—Cross-domain Topic Learning
cross-domain collaboration recommendation
Author Matching
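[Figure: source-domain graph $G^S$ (Data Mining) with authors $v_1, v_2, \dots, v_N$ and the query user $v_q$, and target-domain graph $G^T$ (Medical Informatics) with authors $v'_1, v'_2, \dots, v'_{N'}$. Edges within a domain are coauthorships; edges between the domains are cross-domain coauthorships.]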
Recall Random Walk
• Let us begin with PageRank[1]
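[Figure: a five-node graph in which every node starts with score 0.2. After one update, the score of the highlighted node is (0.2 + 0.2*0.5 + 0.2*1/3 + 0.2)*0.85 + 0.15*0.2; the scores of the other nodes are shown as "?".]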
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford University, 1999.
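The update shown on this slide can be sketched in a few lines of Python. The edge list below is hypothetical (the slide's actual edges are not recoverable from the text); it is chosen only so that node 1 is pointed to by nodes with out-degree 1, 2, 3, and 1, which reproduces the expression on the slide. The damping factor 0.85 is read off that expression.

```python
def pagerank_step(rank, out_links, damping=0.85):
    """One synchronous PageRank update over all nodes."""
    n = len(rank)
    # Every node first receives the teleport share (1 - damping) / n.
    new_rank = {node: (1.0 - damping) / n for node in rank}
    for node, targets in out_links.items():
        # A node splits its damped score evenly among its out-links.
        share = damping * rank[node] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank


# Hypothetical edges: node 1 is pointed to by nodes with out-degree 1, 2, 3, 1.
out_links = {1: [3], 2: [1], 3: [1, 4], 4: [1, 2, 5], 5: [1]}
rank = {node: 0.2 for node in out_links}

# First update of node 1, compared with the expression on the slide.
first = pagerank_step(rank, out_links)
slide_value = (0.2 + 0.2 * 0.5 + 0.2 * 1 / 3 + 0.2) * 0.85 + 0.15 * 0.2

# Repeating the update lets the scores settle.
for _ in range(100):
    rank = pagerank_step(rank, out_links)
```

Because every node in this toy graph has at least one out-link, the scores keep summing to 1 after each update; graphs with dangling nodes would need extra handling that is not shown here.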
Random Walk with Restart[1]
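[Figure: random walk with restart from the query node q on a five-node graph (q, 1, 2, 3, 4). The walk starts at q ($U_q = 1$) and moves to each of q's three neighbors with probability 1/3. The node scores shown are 0.4, 0.15, 0.1, 0.1, and 0.25.]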
[1] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM’05, pages 418–425, 2005.
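A minimal sketch of random walk with restart, assuming an undirected toy graph and a restart probability of 0.15 (both are illustrative choices, not values taken from the slide). At each step the walker either jumps back to the query node or moves to a uniformly chosen neighbor.

```python
def random_walk_with_restart(neighbors, query, restart=0.15, iters=200):
    """Iterate score = (1 - restart) * walk(score) + restart * indicator(query)."""
    nodes = list(neighbors)
    score = {v: 1.0 if v == query else 0.0 for v in nodes}
    for _ in range(iters):
        # The restart mass always goes back to the query node.
        nxt = {v: restart if v == query else 0.0 for v in nodes}
        for v in nodes:
            # The remaining mass is split evenly among the neighbors of v.
            share = (1.0 - restart) * score[v] / len(neighbors[v])
            for u in neighbors[v]:
                nxt[u] += share
        score = nxt
    return score


# Hypothetical graph: the query node "q" has three neighbors, as in the figure.
graph = {
    "q": ["v1", "v2", "v3"],
    "v1": ["q", "v2"],
    "v2": ["q", "v1", "v4"],
    "v3": ["q", "v4"],
    "v4": ["v2", "v3"],
}
scores = random_walk_with_restart(graph, "q")
ranking = sorted(graph, key=scores.get, reverse=True)
```

The resulting scores measure proximity to the query node, so sorting the other nodes by score gives a ranked candidate list.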
Author Matching
[Figure: the author-matching graphs again ($G^S$ for Data Mining with query user $v_q$, $G^T$ for Medical Informatics, coauthorships, and cross-domain coauthorships), annotated with challenge 1: Sparse connection!]
Topic Matching
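[Figure: the graphs $G^S$ (Data Mining) and $G^T$ (Medical Informatics) extended with topic layers. Topics $z_1, z_2, z_3, \dots, z_T$ are extracted from the source domain and $z'_1, z'_2, z'_3, \dots, z'_T$ from the target domain (topics extraction), and the two topic layers are linked by topic correlations. Annotated with challenges 2 and 3: Complementary expertise! Topic skewness!]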
Recall Topic Model
• Usage of a theme:
– Summarize topics/subtopics
– Navigate documents
– Retrieve documents
– Segment documents
– All other tasks involving unigram language models
Topic Model
• A generative model for the co-occurrence of documents $d \in D = \{d_1, \dots, d_D\}$ and terms $w \in W = \{w_1, \dots, w_W\}$, which associates each co-occurrence with a latent variable $z \in Z = \{z_1, \dots, z_Z\}$.
• The generative process is:
– select a document d with probability P(d)
– pick a latent topic z with probability P(z|d)
– generate a word w with probability P(w|z)
[1] T. Hofmann. Probabilistic latent semantic indexing. SIGIR’99, pages 50–57, 1999.
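The chain P(d), P(z|d), P(w|z) on this slide can be read as a sampling procedure. The sketch below assumes toy probability tables (all document, topic, and word names and their values are made up for illustration) and draws (d, w) pairs the way the model describes.

```python
import random


def draw(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate_pair(p_d, p_z_given_d, p_w_given_z, rng):
    d = draw(p_d, rng)              # select a document with P(d)
    z = draw(p_z_given_d[d], rng)   # pick a latent topic with P(z|d)
    w = draw(p_w_given_z[z], rng)   # generate a word with P(w|z)
    return d, z, w


# Toy parameters (illustrative values only).
p_d = {"d1": 0.5, "d2": 0.5}
p_z_given_d = {"d1": {"z1": 0.9, "z2": 0.1}, "d2": {"z1": 0.2, "z2": 0.8}}
p_w_given_z = {"z1": {"graph": 0.7, "gene": 0.3}, "z2": {"graph": 0.1, "gene": 0.9}}

rng = random.Random(0)
pairs = [generate_pair(p_d, p_z_given_d, p_w_given_z, rng) for _ in range(1000)]
# In real data only (d, w) is observed; the topic z stays hidden.
observed = [(d, w) for d, _, w in pairs]
```

Counting how often each (d, w) pair occurs in `observed` gives the co-occurrence counts that the model is fitted to.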
Maximum-likelihood
• Definition
– We have a density function P(x|Θ) that is governed by the set of parameters Θ, e.g., P might be a set of Gaussians and Θ could be the means and covariances
– We also have a data set X={x1,…,xN}, supposedly drawn from this distribution P, and assume these data vectors are i.i.d. with P.
– Then the log-likelihood function is:
$$L(\Theta \mid X) = \log p(X \mid \Theta) = \log \prod_i p(x_i \mid \Theta) = \sum_i \log p(x_i \mid \Theta)$$
– The log-likelihood is thought of as a function of the parameters Θ where the data X is fixed. Our goal is to find the Θ that maximizes L. That is
$$\Theta^* = \arg\max_{\Theta} L(\Theta \mid X)$$
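As a small worked instance of the definition, the sketch below assumes a single one-dimensional Gaussian, for which the maximizing parameters have a closed form (the sample mean and the biased sample variance). The data values are made up for illustration.

```python
import math


def gaussian_log_likelihood(data, mean, var):
    """Sum of log N(x | mean, var) over i.i.d. data points."""
    n = len(data)
    squared_error = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)


def gaussian_mle(data):
    """Closed-form maximizer of the Gaussian log-likelihood."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # note: divides by n, not n - 1
    return mean, var


data = [2.1, 1.9, 2.4, 2.0, 1.6]  # toy sample
mean_hat, var_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mean_hat, var_hat)
```

Evaluating `gaussian_log_likelihood` at any other mean or variance gives a value no larger than `best`, which is what "the Θ that maximizes L" means in this simple case.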
Topic Model
• Following the likelihood principle, we determine P(d), P(z|d), and P(w|z) by maximization of the log-likelihood function
$$L(\theta \mid d, w, z) = \log \prod_{d \in D} \prod_{w \in W} P(d, w)^{n(d, w)} = \sum_{d \in D} \sum_{w \in W} n(d, w) \log P(d, w) = \sum_{d \in D} \sum_{w \in W} n(d, w) \log \sum_{z \in Z} P(w \mid z) P(d \mid z) P(z)$$
– Observed data: $n(d, w)$, the number of co-occurrences of d and w, which is obtained according to the multinomial distribution
– Unobserved data: z
– Parameters: P(d), P(z|d), and P(w|z)
Jensen’s Inequality
• Recall that f is a convex function if f''(x) ≥ 0, and f is a strictly convex function if f''(x) > 0
• Let f be a convex function, and let X be a random variable, then:
$$E[f(X)] \geq f(E[X])$$
• Moreover, if f is strictly convex, then E[f(X)] = f(E[X]) holds true if and only if X = E[X] with probability 1 (i.e., if X is a constant)
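A quick numeric check of Jensen's inequality, assuming a made-up three-point distribution. For the convex function x² the expectation of f is at least f of the expectation; for the concave logarithm the inequality flips, which is the direction used when lower-bounding a log-likelihood.

```python
import math


def expectation(values, probs, f=lambda x: x):
    """E[f(X)] for a discrete random variable X."""
    return sum(p * f(v) for v, p in zip(values, probs))


# Toy distribution (illustrative values only).
values = [1.0, 2.0, 4.0]
probs = [0.5, 0.25, 0.25]

mean = expectation(values, probs)

# Convex case: E[X^2] >= (E[X])^2.
e_square = expectation(values, probs, lambda x: x * x)
square_of_mean = mean ** 2

# Concave case: E[log X] <= log E[X].
e_log = expectation(values, probs, math.log)
log_of_mean = math.log(mean)
```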
Basic EM Algorithm
• However, optimizing the likelihood function directly is often analytically intractable, but the likelihood function can be simplified by assuming the existence of, and values for, additional but missing (or hidden) parameters:
$$L(\Theta \mid X) = \sum_i \log p(x_i \mid \Theta) = \sum_i \log \sum_z p(x_i, z \mid \Theta)$$
• Maximizing L(Θ) explicitly might be difficult, and the strategy is to instead repeatedly construct a lower bound on L (E-step), and then optimize that lower bound (M-step).
– For each i, let $Q_i$ be some distribution over z ($\sum_z Q_i(z) = 1$, $Q_i(z) \geq 0$), then
$$\sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \Theta) = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \Theta)}{Q_i(z^{(i)})} \geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \Theta)}{Q_i(z^{(i)})}$$
– The above derivation used Jensen’s inequality. Specifically, f(x) = log x is a concave function, since $f''(x) = -1/x^2 < 0$
Parameter Estimation Using EM
• According to basic EM:
$$Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \Theta)$$
• Then we define
$$Q_i(z^{(i)}) = p(z \mid d, w)$$
• Thus according to Jensen’s inequality
$$L(\Theta) = \sum_{d \in D} \sum_{w \in W} n(d, w) \log \sum_{z \in Z} p(z \mid d, w) \frac{p(w \mid z)\, p(d \mid z)\, p(z)}{p(z \mid d, w)} \geq \sum_{d \in D} \sum_{w \in W} n(d, w) \sum_{z \in Z} p(z \mid d, w) \log \frac{p(w \mid z)\, p(d \mid z)\, p(z)}{p(z \mid d, w)}$$
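(1) Solve P(w|z)
• We introduce a Lagrange multiplier $\lambda_z$ with the constraint that $\sum_w P(w \mid z) = 1$, and solve the following equation:
$$\frac{\partial}{\partial P(w \mid z)} \left[ \sum_{d \in D} \sum_{w \in W} n(d, w) \sum_{z \in Z} p(z \mid d, w) \log \frac{p(w \mid z)\, p(d \mid z)\, p(z)}{p(z \mid d, w)} + \sum_{z} \lambda_z \left( \sum_{w} P(w \mid z) - 1 \right) \right] = 0$$
$$\Rightarrow \frac{\sum_{d \in D} n(d, w) P(z \mid d, w)}{P(w \mid z)} + \lambda_z = 0,$$
$$\Rightarrow P(w \mid z) = -\frac{\sum_{d \in D} n(d, w) P(z \mid d, w)}{\lambda_z},$$
$$\sum_{w} P(w \mid z) = 1 \ \Rightarrow\ \lambda_z = -\sum_{w \in W} \sum_{d \in D} n(d, w) P(z \mid d, w),$$
$$\Rightarrow P(w \mid z) = \frac{\sum_{d \in D} n(d, w) P(z \mid d, w)}{\sum_{w \in W} \sum_{d \in D} n(d, w) P(z \mid d, w)}$$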
The final update Equations
• E-step:
$$P(z \mid d, w) = \frac{P(w \mid z) P(d \mid z) P(z)}{\sum_{z' \in Z} P(w \mid z') P(d \mid z') P(z')}$$
• M-step:
$$P(w \mid z) = \frac{\sum_{d \in D} n(d, w) P(z \mid d, w)}{\sum_{w \in W} \sum_{d \in D} n(d, w) P(z \mid d, w)}$$
$$P(d \mid z) = \frac{\sum_{w \in W} n(d, w) P(z \mid d, w)}{\sum_{d \in D} \sum_{w \in W} n(d, w) P(z \mid d, w)}$$
$$P(z) = \frac{\sum_{d \in D} \sum_{w \in W} n(d, w) P(z \mid d, w)}{\sum_{w \in W} \sum_{d \in D} n(d, w)}$$
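The E-step and M-step updates can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the count matrix, the random initialization, and the function names `plsa_em` and `log_likelihood` are all made up for the example.

```python
import math
import random


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def plsa_em(counts, num_topics, iters=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to a count matrix n(d, w) with EM."""
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])
    topics = range(num_topics)
    # Random positive initialization of all three distributions.
    p_z = normalize([rng.random() + 0.1 for _ in topics])
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(num_docs)]) for _ in topics]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(num_words)]) for _ in topics]
    for _ in range(iters):
        acc_z = [0.0] * num_topics
        acc_d = [[0.0] * num_docs for _ in topics]
        acc_w = [[0.0] * num_words for _ in topics]
        for d in range(num_docs):
            for w in range(num_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) P(d|z) P(z).
                joint = [p_w_z[z][w] * p_d_z[z][d] * p_z[z] for z in topics]
                total = sum(joint)
                for z in topics:
                    weight = counts[d][w] * joint[z] / total  # n(d,w) P(z|d,w)
                    acc_z[z] += weight
                    acc_d[z][d] += weight
                    acc_w[z][w] += weight
        # M-step: renormalize the expected counts.
        p_w_z = [[acc_w[z][w] / acc_z[z] for w in range(num_words)] for z in topics]
        p_d_z = [[acc_d[z][d] / acc_z[z] for d in range(num_docs)] for z in topics]
        p_z = normalize(acc_z)
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Sum over (d, w) of n(d,w) * log sum_z P(w|z) P(d|z) P(z)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw:
                p_dw = sum(p_w_z[z][w] * p_d_z[z][d] * p_z[z] for z in range(len(p_z)))
                total += n_dw * math.log(p_dw)
    return total


# Toy counts: documents 0-1 mostly use words 0-1, documents 2-3 words 2-3.
counts = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 4, 2], [0, 1, 3, 4]]
p_z, p_d_z, p_w_z = plsa_em(counts, num_topics=2)
```

Each EM iteration cannot decrease the log-likelihood, so comparing `log_likelihood` after 1 and after 50 iterations is a simple sanity check on the updates.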
PLSI(SIGIR’99)
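[Figure: plate notation for PLSI with nodes d (document), z (topic), and w (word); the inner plate repeats over the $N_d$ words of a document and the outer plate over the D documents.]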
[1] T. Hofmann. Probabilistic latent semantic indexing. SIGIR’99, pages 50–57, 1999.
LDA (JMLR’03)
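[Figure: plate notation for LDA with nodes α, θ (document-specific distribution over topics), z (topic), w (word), β, and Φ (topic distribution over words); plates repeat over the $N_d$ words of a document, the D documents, and the T topics.]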
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
Cross-domain Topic Learning
[Figure: the graphs $G^S$ (Data Mining, with query user $v_q$) and $G^T$ (Medical Informatics) connected through a shared layer of topics $z_1, z_2, z_3, \dots, z_K$. Goal: identify “cross-domain” topics.]
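Collaboration Topics Extraction
[Figure: plate diagram of the generative model for a collaborated document d written by an author pair (v, v') drawn from $A_d$. Variables shown: γ, $γ_t$, λ, s, x, z, v, v', with hyperparameters α and β, the topic distributions θ (source domain) and θ' (target domain), and Φ. A switch s takes the values s=0 and s=1. The generation is annotated as Step 1 and Step 2.]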
Intuitive explanation of Step 2 in CTL
Collaboration topics
Experiments
cross-domain collaboration recommendation
Data Set and Baselines
• Arnetminer (available at http://arnetminer.org/collaboration)

| Domain | Authors | Relationships | Source |
|---|---|---|---|
| Data Mining | 6,282 | 22,862 | KDD, SDM, ICDM, WSDM, PKDD |
| Medical Informatics | 9,150 | 31,851 | JAMIA, JBI, AIM, TMI, TITB |
| Theory | 5,449 | 27,712 | STOC, FOCS, SODA |
| Visualization | 5,268 | 19,261 | CVPR, ICCV, VAST, TVCG, IV |
| Database | 7,590 | 37,592 | SIGMOD, VLDB, ICDE |

• Baselines
– Content Similarity (Content)
– Collaborative Filtering (CF)
– Hybrid
– Katz
– Author Matching (Author), Topic Matching (Topic)
Performance Analysis
Cross domain: Data Mining (S) to Theory (T)

| ALG | P@10 | P@20 | MAP | R@100 | ARHR-10 | ARHR-20 |
|---|---|---|---|---|---|---|
| Content | 10.3 | 10.2 | 10.9 | 31.4 | 4.9 | 2.1 |
| CF | 15.6 | 13.3 | 23.1 | 26.2 | 4.9 | 2.8 |
| Hybrid | 17.4 | 19.1 | 20.0 | 29.5 | 5.0 | 2.4 |
| Author | 27.2 | 22.3 | 25.7 | 32.4 | 10.1 | 6.4 |
| Topic | 28.0 | 26.0 | 32.4 | 33.5 | 13.4 | 7.1 |
| Katz | 30.4 | 29.8 | 21.6 | 27.4 | 11.2 | 5.9 |
| CTL | 37.7 | 36.4 | 40.6 | 35.6 | 14.3 | 7.5 |

• Content Similarity (Content): based on similarity between authors’ publications
• Collaborative Filtering (CF): based on existing collaborations
• Hybrid: a linear combination of the scores obtained by the Content and the CF methods
• Katz: the best link predictor in the link-prediction problem for social networks
• Author Matching (Author): based on the random walk with restart on the collaboration graph
• Topic Matching (Topic): combining the extracted topics into the random walk algorithm
Training: collaborations before 2001. Validation: 2001-2005.
Performance on New Collaboration Prediction
CTL can still maintain about 0.3 in terms of MAP which is significantly higher than baselines.
Parameter Analysis
(a) varying the number of topics T
(b) varying the α parameter
(c) varying the restart parameter τ in the random walk
(d) convergence analysis
Prototype System http://arnetminer.org/collaborator
Treemap: representing subtopic in the target domain
Recommend Collaborators & Their relevant publications
(ACM TKDD, TIST, WSDM 2013-14)
Part C: Further incorporate user feedback
“interactive collaboration recommendation”
Example
Finding co-inventors in IBM (>300,000 employees)
• Query: “Find me a partner to collaborate on Healthcare…”
• Recommended candidates (existing co-inventors and recommendations): Luo Gang, Philip S. Yu, Kun-Lung Wu, Jimeng Sun, Ching-Yung Lin, Milind R Naphade
• Interactive feedback: “Philip is not a healthcare person”, “Kun-Lung Wu is a match for me”
• Refined recommendations (recommended collaborators by interactive learning): Luo Gang, Philip S. Yu, Kun-Lung Wu, Jimeng Sun, Ching-Yung Lin, Milind R Naphade
[1] S. Wu, J. Sun, and J. Tang. Patent Partner Recommendation in Enterprise Social Networks. WSDM’13, pages 43-52.
Challenges
• What are the fundamental factors that influence the formation of co-invention relationships?
• How to design an interactive mechanism so that the user can provide feedback to the system to refine the recommendations?
• How to learn the interactive recommendation framework in an online mode?
Learning framework
interactive collaboration recommendation
RankFG Model
Map each pair to a node in the graphical model
Random variable
Pairwise factor function
Social correlation factor function
Recommended collaborator
The problem is cast as, for each relationship, identifying which type has the highest probability.
constraint
Modeling with exponential family
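[Figure: partially labeled factor graph model. Candidate pairs $v_1, v_2, v_4, v_5, \dots$ are mapped to variables $y_1 = ?$, $y_2 = 2$, $y_4 = 2$, $y_5 = ?$ with attribute factors $f(v_1, y_1)$, $f(v_2, y_2)$, $f(v_4, y_4)$, $f(v_5, y_5)$ and correlation factors $h(y_1, y_2)$, $g(y_{12}, y_{34})$, $g(y_{45}, y_{34})$, $g(y_{12}, y_{45})$ over relationships.]
$$P(x_i \mid y_i) \propto \exp\Big\{ \sum_{j=1}^{d} \alpha_j\, g_j(x_{ij}, y_i) \Big\}$$
$$P(y_i \mid Y) \propto \exp\Big\{ \sum_{c_i} \sum_{k} \mu_k\, h_k(Y_{c_i}) \Big\}$$
Likelihood objective function
$$P(Y \mid X, G) = \frac{P(X, G \mid Y)\, P(Y)}{P(X, G)} \propto P(X \mid Y) \cdot P(Y \mid G) = P(Y \mid G) \prod_i P(x_i \mid y_i)$$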
Ranking Factor Graphs
• Pairwise factor function:
• Correlation factor function:
• Log-likelihood objective function:
• Model learning
![Page 61: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/61.jpg)
61
Learning Algorithm
Expectation computing: Loopy Belief Propagation
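As a rough illustration of how loopy belief propagation approximates the marginals needed for expectation computing, the sketch below runs synchronous sum-product updates on a small pairwise model with binary variables. It is a generic textbook-style version under simplifying assumptions (a single symmetric pairwise potential shared by all edges), not the exact procedure used for RankFG; `loopy_bp`, `unary`, and `pairwise` are hypothetical names.

```python
def loopy_bp(unary, edges, pairwise, iters=50):
    """Sum-product loopy BP on a pairwise model with binary variables.

    unary[i][s]    : potential of node i taking state s
    edges          : list of undirected edges (i, j)
    pairwise[s][t] : symmetric edge potential shared by all edges (assumption)
    Returns a list of approximate marginals, one [p0, p1] pair per node.
    """
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)][t] is the message from node i to node j about state t of j.
    msgs = {(i, j): [1.0, 1.0] for i in neighbors for j in neighbors[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for t in (0, 1):
                total = 0.0
                for s in (0, 1):
                    prod = unary[i][s] * pairwise[s][t]
                    # Multiply in messages from all neighbors of i except j.
                    for k in neighbors[i]:
                        if k != j:
                            prod *= msgs[(k, i)][s]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(i, j)] = [out[0] / z, out[1] / z]  # normalize for stability
        msgs = new
    marginals = []
    for i in range(n):
        belief = [unary[i][0], unary[i][1]]
        for k in neighbors[i]:
            belief = [belief[s] * msgs[(k, i)][s] for s in (0, 1)]
        z = belief[0] + belief[1]
        marginals.append([belief[0] / z, belief[1] / z])
    return marginals

# On a chain (a tree), the result matches the exact marginals.
chain = loopy_bp(unary=[[1.0, 2.0], [1.0, 1.0], [1.0, 1.0]],
                 edges=[(0, 1), (1, 2)],
                 pairwise=[[2.0, 1.0], [1.0, 2.0]])
```

On graphs with cycles the same updates are only approximate and are not guaranteed to converge, so the iteration count acts as a practical cutoff.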
![Page 62: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/62.jpg)
62
Remaining Challenge
How to incrementally incorporate users’ feedback?
![Page 63: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/63.jpg)
63
Learning Algorithm
Incremental estimation
![Page 64: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/64.jpg)
64
Interactive Learning
1) Add new factor nodes to the factor graph built in the model learning process.
2) Bounded-step message passing: start from the new variable node (root node) and send messages to all of its neighboring factors. Propagate the messages outward for a limited number of steps, then perform a backward message pass.
3) Calculate approximate marginal probabilities for the newly added factors.
New variable
New factor node
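The local update described on this slide can be sketched as a message schedule: find the nodes within a fixed number of hops of the new variable node, pass messages outward, then pass them back toward the root. The code below only builds that schedule; it does not compute the messages themselves. The adjacency-dictionary representation and the names `nodes_within_steps`, `message_schedule`, and `max_steps` are assumptions made for illustration.

```python
from collections import deque

def nodes_within_steps(adjacency, root, max_steps):
    """Breadth-first search: map each node within max_steps hops of root to its distance."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the step limit
        for nb in adjacency.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def message_schedule(adjacency, root, max_steps):
    """Outward pass from the root, followed by the backward pass in reverse order."""
    dist = nodes_within_steps(adjacency, root, max_steps)
    outward = [(u, v) for u in dist for v in adjacency.get(u, [])
               if v in dist and dist[v] == dist[u] + 1]
    outward.sort(key=lambda edge: dist[edge[0]])  # nearer senders first
    backward = [(v, u) for (u, v) in reversed(outward)]
    return outward + backward

# Hypothetical graph: a new variable attached to factor f1, which touches a and b.
graph = {
    "new": ["f1"],
    "f1": ["new", "a", "b"],
    "a": ["f1", "f2"],
    "b": ["f1"],
    "f2": ["a", "c"],
    "c": ["f2"],
}
schedule = message_schedule(graph, "new", max_steps=2)
```

Because only nodes near the new variable are touched, the cost of incorporating one piece of feedback depends on the local neighborhood size rather than on the whole graph.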
![Page 65: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/65.jpg)
65
From passive interactive learning to active learning
• Influence model
• Entropy
• Threshold
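One common way to use entropy in active learning is uncertainty sampling: query the instances whose predicted label distribution has the highest entropy. The sketch below shows that generic idea only; whether the cited methods use entropy in exactly this way is an assumption, and `entropy` and `most_uncertain` are hypothetical helper names.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0)

def most_uncertain(marginals, k=1):
    """Return the k node ids whose marginal distributions have the highest entropy."""
    ranked = sorted(marginals, key=lambda node: entropy(marginals[node]), reverse=True)
    return ranked[:k]

# Hypothetical marginals for three unlabeled candidate pairs.
candidates = {"a": [0.5, 0.5], "b": [0.9, 0.1], "c": [0.6, 0.4]}
to_query = most_uncertain(candidates, k=2)
```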
[1] Z. Yang, J. Tang, and B. Xu. Active Learning for Networked Data Based on Non-progressive Diffusion Model. WSDM’14.
[2] L. Shi, Y. Zhao, and J. Tang. Batch Mode Active Learning for Networked Data. ACM Transactions on Intelligent Systems and Technology (TIST), Volume 3, Issue 2 (2012), Pages 33:1--33:25.
![Page 66: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/66.jpg)
66
Active learning via Non-progressive diffusion model
• Maximizing the diffusion
NP-hard!
![Page 67: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/67.jpg)
67
MinSS
• Greedily expand Vp
![Page 68: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/68.jpg)
68
MinSS (cont.)
![Page 69: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/69.jpg)
69
Lower Bound and Upper Bound
![Page 70: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/70.jpg)
70
Approximation Ratio
![Page 71: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/71.jpg)
71
Experiments
interactive collaboration recommendation
![Page 72: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/72.jpg)
72
• PatentMiner (pminer.org)
• Baselines:
– Content Similarity (Content)
– Collaborative Filtering (CF)
– Hybrid
– SVM-Rank
Data Set
| Data set | Inventors | Patents | Average increase in #patents | Average increase in #co-inventions |
|---|---|---|---|---|
| IBM | 55,967 | 46,782 | 8.26% | 11.9% |
| Intel | 18,264 | 54,095 | 18.8% | 35.5% |
| Sony | 8,505 | 31,569 | 11.7% | 13.0% |
| Exxon | 19,174 | 53,671 | 10.6% | 14.7% |
[1] J. Tang, B. Wang, Y. Yang, P. Hu, Y. Zhao, X. Yan, B. Gao, M. Huang, P. Xu, W. Li, and A. K. Usadi. PatentMiner: Topic-driven Patent Analysis and Mining. KDD’12, pages 1366-1374.
![Page 73: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/73.jpg)
73
Performance Analysis-IBM
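| Data | ALG | P@5 | P@10 | P@15 | P@20 | MAP | R@100 |
|---|---|---|---|---|---|---|---|
| IBM | Content | 23.0 | 23.3 | 18.8 | 15.6 | 24.0 | 33.7 |
| IBM | CF | 13.8 | 12.8 | 11.3 | 11.5 | 21.7 | 36.4 |
| IBM | Hybrid | 13.9 | 12.8 | 11.5 | 11.5 | 21.8 | 36.7 |
| IBM | SVMRank | 13.3 | 11.9 | 9.6 | 9.8 | 22.2 | 43.5 |
| IBM | RankFG | 31.1 | 27.5 | 25.6 | 22.4 | 40.5 | 46.8 |
| IBM | RankFG+ | 31.2 | 27.5 | 26.6 | 22.9 | 42.1 | 51.0 |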
RankFG+: the proposed RankFG model with 1% interactive feedback.
Training: collaborations before 2000
Validation: 2001-2010
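For readers unfamiliar with the ranking metrics in the table, here is a minimal sketch of precision at k (P@k), recall at k (R@k), and average precision for a single query; MAP is the mean of average precision over all queries. These are the standard definitions, with average precision normalized by the number of relevant items, which is one common convention and an assumption about how the reported numbers were computed. The function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Hypothetical ranking of candidate collaborators for one inventor.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p5 = precision_at_k(ranked, relevant, 5)
r5 = recall_at_k(ranked, relevant, 5)
ap = average_precision(ranked, relevant)
```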
![Page 74: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/74.jpg)
74
Interactive Learning Analysis
Interactive learning performs close to complete learning while using only 1/100 of the running time required for complete training.
![Page 75: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/75.jpg)
75
Parameter Analysis
[Figures: factor contribution analysis; convergence analysis]
RankFG-C: RankFG without the referral chaining factor functions.
RankFG-CH: RankFG without both referral chaining and homophily.
RankFG-CHR: RankFG without referral chaining, homophily, and recency.
![Page 76: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/76.jpg)
76
Results of Active Learning
![Page 77: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/77.jpg)
77
Summary
• Inferring social ties in a single network
– Time-dependent factor graph model
• Cross-domain collaboration recommendation
– Cross-domain topic learning
• Interactive collaboration recommendation
– Ranking factor graph model
– Active learning via non-progressive diffusion
![Page 78: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/78.jpg)
78
Future Work
• Inferring social ties
• Reciprocity
• Triadic Closure
[Figure: example ties among "You", "Lady Gaga", and "Shiteng", with unknown ("?") family or friend relationships]
![Page 79: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/79.jpg)
79
References
• Tiancheng Lou, Jie Tang, John Hopcroft, Zhanpeng Fang, Xiaowen Ding. Learning to Predict Reciprocity and Triadic Closure in Social Networks. In TKDD, 2013.
• Yi Cai, Ho-fung Leung, Qing Li, Hao Han, Jie Tang, Juanzi Li. Typicality-based Collaborative Filtering Recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE).
• Honglei Zhuang, Jie Tang, Wenbin Tang, Tiancheng Lou, Alvin Chin, and Xia Wang. Actively Learning to Infer Social Ties. DMKD, Vol. 25, Issue 2 (2012), pages 270-297.
• Lixin Shi, Yuhang Zhao, and Jie Tang. Batch Mode Active Learning for Networked Data. ACM Transactions on Intelligent Systems and Technology (TIST), Volume 3, Issue 2 (2012), Pages 33:1--33:25.
• Jie Tang, Jing Zhang, Ruoming Jin, Zi Yang, Keke Cai, Li Zhang, and Zhong Su. Topic Level Expertise Search over Heterogeneous Networks. Machine Learning Journal, Vol. 82, Issue 2 (2011), pages 211-237.
• Zhilin Yang, Jie Tang, and Bin Xu. Active Learning for Networked Data Based on Non-progressive Diffusion Model. WSDM’14.
• Sen Wu, Jimeng Sun, and Jie Tang. Patent Partner Recommendation in Enterprise Social Networks. WSDM’13, pages 43-52.
• Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain Collaboration Recommendation. KDD’12, pages 1285-1293. (Full Presentation & Best Poster Award)
• Jie Tang, Bo Wang, Yang Yang, Po Hu, Yanting Zhao, Xinyu Yan, Bo Gao, Minlie Huang, Peng Xu, Weichang Li, and Adam K. Usadi. PatentMiner: Topic-driven Patent Analysis and Mining. KDD’12, pages 1366-1374.
• Jie Tang, Tiancheng Lou, and Jon Kleinberg. Inferring Social Ties across Heterogeneous Networks. WSDM’12, pages 743-752.
• Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD’10, pages 203-212.
• Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08, pages 990-998.
![Page 80: 1 Cross-domain Link Prediction and Recommendation Jie Tang Department of Computer Science and Technology Tsinghua University](https://reader035.vdocuments.mx/reader035/viewer/2022070408/56649e4f5503460f94b460ef/html5/thumbnails/80.jpg)
80
Thank you!
Collaborators:
John Hopcroft, Jon Kleinberg (Cornell)
Jiawei Han and Chi Wang (UIUC)
Tiancheng Lou (Google)
Jimeng Sun (IBM)
Jing Zhang, Zhanpeng Fang, Zi Yang, Sen Wu (THU)
Jie Tang, KEG, Tsinghua U, http://keg.cs.tsinghua.edu.cn/jietang
Download all data & codes, http://arnetminer.org/download