ProfileRank: Finding Relevant Content and Influential Users based on Information Diffusion
@SNAKDD’13, Chicago, IL
Arlei Silva (1), Sara Guimaraes (2), Wagner Meira Jr. (2), Mohammed Zaki (3)
(1) Computer Science Department, University of California, Santa Barbara, CA
(2) Computer Science Department, Universidade Federal de Minas Gerais, Brazil
(3) Computer Science Department, Rensselaer Polytechnic Institute, NY
Social Media in Numbers
Twitter: 500M users, 340M tweets/day
Tumblr: 100M users, 75M posts/day
Facebook: 1.15B users, 1B pieces of content shared/day
Instagram: 30M users, 5M photos shared/day
Influence and Relevance in Social Media: Questions
Who are the influentials?
  - influence: ability to popularize information
  - personalized influence
What is relevant?
  - relevance: capacity to satisfy a user’s information needs
  - personalized relevance
Why are these questions important?
  - Information diffusion mechanisms
  - Recommender systems
  - Viral marketing
Information Diffusion Data
Content creation/propagation represented as tuples:
  - <user, content, time>
Figure: (a) Twitter: @user_0 posts A and B; @user_1 posts "RT@user_0 A" and C; @user_2 posts "RT@user_0 B"; @user_3 posts "RT@user_1 C". (b) Diffusion data:
  user 0, A, t0
  user 0, B, t1
  user 1, A, t2
  user 1, C, t3
  user 2, B, t4
  user 3, C, t5
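As a rough illustration of the tuple representation, the sketch below stores the six example tuples and groups them into per-content propagation sequences. The function name `cascades`, the integer timestamps standing in for t0..t5, and the rule that the earliest poster is the creator are assumptions made for this example, not something the slides state.

```python
from collections import defaultdict

# The six <user, content, time> tuples from the example; integers stand in for t0..t5.
diffusion = [
    ("user_0", "A", 0), ("user_0", "B", 1), ("user_1", "A", 2),
    ("user_1", "C", 3), ("user_2", "B", 4), ("user_3", "C", 5),
]

def cascades(tuples):
    """Group tuples by content, listing users in time order (earliest first)."""
    by_content = defaultdict(list)
    for user, content, _ in sorted(tuples, key=lambda t: t[2]):
        by_content[content].append(user)
    return dict(by_content)

casc = cascades(diffusion)
# Assumption: the first user to post a piece of content is treated as its creator.
creators = {content: users[0] for content, users in casc.items()}
```

Under that assumption, user_0 is the creator of A and B, and user_1 is the creator of C.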
How can we measure influence and relevance?
ProfileRank
Random walks over a content-user graph
Relevant content is created and propagated by influential users, and influential users create relevant content
Relies on content propagation instead of a social network
  - In some scenarios, there is no social network available
  - # of followers ≠ capacity to propagate content [Cha et al.’10]
Figure: (a) Diffusion data (the six <user, content, time> tuples listed above), (b) Diffusion model
ProfileRank: Formulation
Information diffusion data → information diffusion graph
  - G(U, C, F, E)
G can be represented as two matrices:
1. M: User-content matrix
2. L: Content-user matrix
Relevance r and influence i computed as:
$r = iM, \qquad i = rL$
$r^{(k)} = r^{(k-1)} LM, \qquad i^{(k)} = i^{(k-1)} ML$
$r = (1-d)\,u\,(I - dLM)^{-1}, \qquad i = (1-d)\,u\,(I - dML)^{-1}$
These equations always have a unique solution
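The closed-form expressions can be tried on the six-tuple toy example with NumPy. This is only a sketch under stated assumptions: the slides do not say how M and L are weighted, so here each row of M spreads a user's weight uniformly over the content that user created or propagated, L links each piece of content to its earliest poster, d is read as a damping factor (0.85 is an arbitrary choice) and u as a uniform row vector. The helper names `build_matrices` and `stationary` are hypothetical.

```python
import numpy as np

def build_matrices(tuples):
    """Build M (user x content) and L (content x user) from (user, content, time) tuples."""
    users = sorted({u for u, _, _ in tuples})
    contents = sorted({c for _, c, _ in tuples})
    u_idx = {u: k for k, u in enumerate(users)}
    c_idx = {c: k for k, c in enumerate(contents)}
    M = np.zeros((len(users), len(contents)))
    L = np.zeros((len(contents), len(users)))
    for u, c, _ in sorted(tuples, key=lambda t: t[2]):
        M[u_idx[u], c_idx[c]] = 1.0
        if L[c_idx[c]].sum() == 0:  # assumption: earliest tuple marks the creator
            L[c_idx[c], u_idx[u]] = 1.0
    M /= M.sum(axis=1, keepdims=True)  # assumption: uniform weights per user
    return M, L

def stationary(P, d=0.85):
    """Evaluate x = (1 - d) u (I - d P)^{-1} for a uniform row vector u."""
    n = P.shape[0]
    u = np.full(n, 1.0 / n)
    # x (I - dP) = (1 - d) u  is the same as  (I - dP)^T x^T = (1 - d) u^T
    return np.linalg.solve((np.eye(n) - d * P).T, (1 - d) * u)

tuples = [(0, "A", 0), (0, "B", 1), (1, "A", 2), (1, "C", 3), (2, "B", 4), (3, "C", 5)]
M, L = build_matrices(tuples)
r = stationary(L @ M)  # relevance over contents A, B, C
i = stationary(M @ L)  # influence over users 0..3
```

Solving the linear system avoids forming the matrix inverse explicitly; iterating the damped update until it stops changing should give the same vectors. With these assumptions, user 0 and content A come out on top in the toy data.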
Related Work
Social influence and information diffusion [Gruhl et al.’04, Leskovec et al.’07, Tang et al.’09, Cha et al.’09, Cha et al.’10, Weng et al.’10, Goyal et al.’10, Romero et al.’11]
Content search and recommendation [Baluja et al.’08, Chen et al.’10, De Choudhury et al.’11, Kim and Shim’11]
Link prediction in social networks [Liben-Nowell and Kleinberg’03, Hannon et al.’10, Leroy et al.’10, Gomez Rodriguez et al.’10]
Relevance in hyperlinked environments [Kleinberg’98, Page et al.’99]
Evaluation
Problem: Absence of ground truth information
  - Influential users
  - Relevant content
Solution: Considering personalized assessments
  - A user is influential to another user
  - A content is relevant to a given user
ProfileRank can be personalized to provide recommendations
Assumption: Recommendation accuracy → model quality
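The slides do not say how the personalization is done. One plausible reading, borrowed from personalized PageRank, is to replace the uniform vector u with one concentrated on the target user's own content. The sketch below does that on the toy example; the choice of u, the hard-coded toy matrices, d = 0.85, and the function names `personalized_relevance` and `recommend` are all assumptions for illustration.

```python
import numpy as np

# Toy matrices for the six-tuple example (users 0..3, contents A, B, C).
# Assumption: M rows are uniform over each user's content; L maps content to its creator.
M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
L = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def personalized_relevance(user, d=0.85):
    """Relevance scores with the vector u focused on one user's own content."""
    u = M[user]  # hypothetical choice of u; rows of M already sum to 1
    n = M.shape[1]
    # r (I - d LM) = (1 - d) u, solved as a transposed linear system
    return np.linalg.solve((np.eye(n) - d * (L @ M)).T, (1 - d) * u)

def recommend(user, n=1):
    """Rank content the user has not yet created or propagated, best first."""
    scores = personalized_relevance(user)
    unseen = [c for c in np.argsort(-scores) if M[user, c] == 0]
    return unseen[:n]
```

In this toy case `recommend(2)` ranks content A (index 0) first: user 2 propagated B from user 0, who also created A.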
Evaluation: Datasets
Dataset        content   #users        #pieces of content   #propagations   source
TW-CARS        tweet     529,630       369,287              1,368,080       Twitter
TW-SOCCER      tweet     837,559       3,485,313            958,144         Twitter
TW-ELECTIONS   tweet     3,860,251     4,067,221            15,844,788      Twitter
TW-LARGE       tweet     17,069,982    476,553,560          71,835,017      Twitter
MEME           meme      96,608,034    210,999,824          126,905,936     MemeTracker
Table: Information diffusion datasets.
Dataset        edge                #edges          source
TW-SOCCER      follower-followee   269,217,548     Twitter
TW-LARGE       follower-followee   1,470,000,000   Twitter
Table: Network datasets.
Evaluation: Content Recommendation
Task: Predicting content users will propagate
  - 50/50% split training/test
Content: tweets and memes
ProfileRank (global and personalized):
  - Recommendations based on relevance scores
Baselines: collaborative filtering
  - MyMediaLite library
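The results for this task are reported as Precision@n and Recall@n. A minimal sketch of the standard definitions is shown here, assuming `ranked` is a recommendation list for one user and `relevant` is the set of content that user propagated in the test half; the function names are illustrative and are not taken from MyMediaLite.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are in the relevant set."""
    if n <= 0:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear among the top n recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)
```

For instance, with `ranked = ["a", "b", "c", "d"]` and `relevant = {"a", "c", "x"}`, the top two recommendations contain one hit, so Precision@2 is 1/2 and Recall@2 is 1/3.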
Evaluation: Content Recommendation
Figure: (a) ROC (true positive rate vs. false positive rate), (b) Precision-recall, (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods compared: PPR, WRMF, POPULAR, PR.
Evaluation: User Recommendation
Task: Predicting influence links
  - Cold-start
Follower relationships on Twitter data
ProfileRank (global and personalized):
  - Recommendations based on influence scores
Baselines: cold-start link prediction [Leroy et al.’10]
  - # content shared
  - Adamic-Adar score
  - # content shared + common neighbors
  - Adamic-Adar score + common neighbors
Evaluation: User Recommendation
Figure: (a) ROC (true positive rate vs. false positive rate), (b) Precision-recall, (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods compared: PPR, AA, CC, AA+CN, CC+CN, PR.
Evaluation: Top Influentials - US Elections
user              description
BarackObama       US President and Democratic candidate
Obama2012         Obama’s campaign
UberFacts         Comedy facts
BorowitzReport    Comedy news
StephenAtHome     Comedian
truthteam2012     Obama’s campaign
Real Liam Payne   Pop singer
MittRomney        Republican candidate
thinkprogress     Political blog
realDonaldTrump   Businessman
Evaluation: Top Relevant Tweets - US Elections
content: @BarackObama hi mr Obama have you got up all night yet?
description: Message from Liam Payne to Barack Obama
content: That was one of the strangest days ever will smith taylor swift justin bieber michelle obama wow what it going on with my life!!
description: Liam Payne about the 2012 Kids’ Choice Awards
content: Obama, congratulations on being the first sitting President to support marriage equality. Feels like the future, and not the past. #NoFear
description: Lady Gaga about Obama’s support for gay marriage
content: ”Same-sex couples should be able to get married.” –President Obama
description: Obama about his support for gay marriage
content: Summertime with @NiallOfficial and @BarackObama! http://t.co/KNnWnfz7
description: Josh Devine about a picture including a statue of Obama
Concluding Remarks
We proposed a simple model that accurately measures user influence and content relevance in information diffusion data.
Model based on random walks over a content-user graph
Extensive evaluation:
  - Quantitative: User and content recommendation
  - Qualitative: Intuitive results in real data
Future work:
  - ProfileRank+filtering for search on Twitter
  - Incorporating temporal dynamics for updated assessments
  - Incorporating textual and network information
ProfileRank: Finding Relevant Content and Influential Users based on Information Diffusion
More information:
http://www.cs.ucsb.edu/~arlei
http://code.google.com/p/profilerank/
This student received a travel award. Thanks!
Evaluation: Content Recommendation
Figure: (a) ROC (true positive rate vs. false positive rate), (b) Precision-recall, (c) Precision@n and (d) Recall@n for n = 5 to 20. Methods compared: PPR, WIKNN, POPULAR, PR.
Evaluation: Content Recommendation
(a) TW-CARS
Method    AUC    BEP    P@5    P@20   R@5    R@20
PPR       0.81   0.28   0.12   0.08   0.12   0.22
PR        0.64   0.01   0.01   0.01   0.01   0.02
WRMF      0.61   0.11   0.04   0.03   0.05   0.08
WBPRMF    0.58   0.08   0.02   0.01   0.03   0.04
WIKNN     0.57   0.13   0.05   0.03   0.05   0.09
WUKNN     0.57   0.13   0.05   0.03   0.05   0.09
POPULAR   0.55   0.01   0.01   0.01   0.01   0.03
(b) MEME
Method    AUC    BEP    P@5    P@20   R@5    R@20
PPR       0.89   0.46   0.27   0.11   0.46   0.58
WIKNN     0.75   0.43   0.22   0.09   0.35   0.44
WUKNN     0.75   0.38   0.21   0.09   0.35   0.44
WBPRMF    0.71   0.09   0.04   0.02   0.07   0.13
WRMF      0.71   0.05   0.01   0.01   0.01   0.01
POPULAR   0.65   0.01   0.01   0.01   0.01   0.02
PR        0.62   0.01   0.01   0.01   0.01   0.02
Evaluation: User Recommendation
Method   AUC    BEP    P@5    P@20   R@5    R@20
PPR      0.88   0.25   0.42   0.25   0.18   0.30
PR       0.84   0.07   0.06   0.05   0.02   0.06
AA+CN    0.78   0.06   0.18   0.10   0.07   0.12
CC+CN    0.70   0.02   0.13   0.06   0.05   0.08
AA       0.62   0.17   0.41   0.20   0.16   0.24
CC       0.61   0.10   0.28   0.13   0.12   0.17
Table: TW-SOCCER
Evaluation: Pairwise Ranking Correlations
(a) User metrics
              ProfileRank   PageRank   #propag.   #followers
ProfileRank   -             n/a        0.89       n/a
PageRank      0.28          -          n/a        n/a
#propag.      0.81          0.30       -          n/a
#followers    0.29          0.81       0.32       -
(b) Content metrics
                   ProfileRank   #content propag.   PageRank   #user propag.   #followers
ProfileRank        -             0.36               n/a        0.42            n/a
#content propag.   0.22          -                  n/a        0.44            n/a
PageRank           0.26          -0.02              -          n/a             n/a
#user propag.      0.27          0.11               0.42       -               n/a
#followers         0.25          -0.01              0.83       0.45            -
Evaluation: Execution Time
Dataset        ProfileRank   PageRank
TW-CARS        3.85          10.04
TW-SOCCER      39.32         133.55
TW-ELECTIONS   5.28          9.20
TW-LARGE       17.74         59.33
MEME           1.23          3.86
Table: Running time (in seconds).