Discovering Overlapping Groups in Social Media
Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu
Arizona State University
Social Media
• Facebook
  – 500 million active users
  – 50% of users log on to Facebook every day
• Twitter
  – 100 million users
  – 300,000 new users every day
  – 55 million tweets every day
• Flickr
  – 12 million members
  – 5 billion photos
Activities in Social Media
• Connect with others to form “friends”
• Interact with others (commenting, discussion, messaging)
• Bookmark websites/URLs (StumbleUpon, Delicious)
• Join groups where they explicitly exist (Flickr, YouTube)
• Write blogs (WordPress, Myspace)
• Update status (Twitter, Facebook)
• Share content (Flickr, YouTube, Delicious)
Community Structure
• Studying behavior: at which level?
  – Individual level? Too many users
  – Site level? Loses too much detail
  – Community level? Yes: provides information at varying granularity
Overlapping Communities
[Figure: one person belonging to several overlapping groups: colleagues, family, and neighbors]
Related Work
• Disjoint community detection
  – Modularity maximization
  – Based on link structure only, so the resulting groups are hard to interpret
• Overlapping community detection
  – Soft clustering (membership assignments are dense)
  – CFinder (limited efficiency and scalability)
• Co-clustering
  – Finds disjoint groups
  – Groups can be understood through their words (tags)
Problem Statement
• Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities, each consisting of both users and tags.
[Figure: example user-tag graph with users u1 to u5 and tags t1 to t4]
Our Contributions
• Extracting overlapping communities that better reflect reality
• Clustering on a user-tag graph, since tags are informative in identifying user interests
• Understanding groups by looking at the tags within each group
Edge-centric View
• Cluster edges instead of nodes into disjoint groups
  – One node can belong to multiple groups
  – One edge belongs to exactly one group
[Figure: the example user-tag graph (users u1 to u5, tags t1 to t4) with its edges clustered into groups]
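A minimal Python sketch of the edge-centric idea, assuming each edge already carries exactly one cluster label: a node's communities are the union of the labels on its incident edges. The edges are the ten edges of the example graph (the same ones the edge-centric table lists); the helper `node_memberships` and the two-group labeling are hypothetical, chosen only to show how overlap arises.

```python
from collections import defaultdict


def node_memberships(edges, edge_labels):
    """Map every user and tag to the set of groups of its incident edges.

    Each edge has exactly one label, but a node collects the labels of all
    of its edges, so groups may overlap on nodes.
    """
    groups = defaultdict(set)
    for (user, tag), label in zip(edges, edge_labels):
        groups[user].add(label)
        groups[tag].add(label)
    return dict(groups)


# Edges e1..e10 of the example user-tag graph.
edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
         ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4")]
# Hypothetical edge partition into two groups (illustration only).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

memberships = node_memberships(edges, labels)
print(memberships["u3"])  # {0, 1}: u3 sits in both groups
print(memberships["t3"])  # {1}
```

Because the partition is over edges, a bridging user such as u3 ends up in two communities without any soft-membership weights.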
Edge-centric View
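• In an edge-centric view, each edge becomes a row with a 1 in the column of its user and a 1 in the column of its tag:

| edge | u1 | u2 | u3 | u4 | u5 | t1 | t2 | t3 | t4 |
|------|----|----|----|----|----|----|----|----|----|
| e1   | 1  | 0  | 0  | 0  | 0  | 1  | 0  | 0  | 0  |
| e2   | 1  | 0  | 0  | 0  | 0  | 0  | 1  | 0  | 0  |
| e3   | 0  | 1  | 0  | 0  | 0  | 1  | 0  | 0  | 0  |
| e4   | 0  | 1  | 0  | 0  | 0  | 0  | 1  | 0  | 0  |
| e5   | 0  | 0  | 1  | 0  | 0  | 0  | 1  | 0  | 0  |
| e6   | 0  | 0  | 1  | 0  | 0  | 0  | 0  | 1  | 0  |
| e7   | 0  | 0  | 0  | 1  | 0  | 0  | 0  | 1  | 0  |
| e8   | 0  | 0  | 0  | 1  | 0  | 0  | 0  | 0  | 1  |
| e9   | 0  | 0  | 0  | 0  | 1  | 0  | 0  | 1  | 0  |
| e10  | 0  | 0  | 0  | 0  | 1  | 0  | 0  | 0  | 1  |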
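The edge table can be produced mechanically from an edge list. A small NumPy sketch, with users ordered before tags as in the table; the function name `edge_matrix` is hypothetical.

```python
import numpy as np


def edge_matrix(edges, users, tags):
    """Return the |E| x (|users| + |tags|) 0/1 edge-centric matrix.

    Row r has a 1 in the column of edge r's user and in the column of its tag.
    """
    column = {name: j for j, name in enumerate(users + tags)}
    X = np.zeros((len(edges), len(column)), dtype=int)
    for r, (user, tag) in enumerate(edges):
        X[r, column[user]] = 1
        X[r, column[tag]] = 1
    return X


users = ["u1", "u2", "u3", "u4", "u5"]
tags = ["t1", "t2", "t3", "t4"]
edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
         ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4")]

X = edge_matrix(edges, users, tags)
print(X[0])  # [1 0 0 0 0 1 0 0 0]  (row e1)
```

Every row sums to 2, and the column sums recover node degrees, which the normalized similarity later relies on.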
Clustering Edges
• We can use any clustering algorithm (e.g., k-means) to group similar edges together
• Different edge-similarity schemes can be plugged in
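$$\arg\max_{C_1, \ldots, C_k} \sum_{i=1}^{k} \sum_{x_j \in C_i} S(x_j, c_i)$$
where $x_j$ is an edge, $C_i$ is the $i$-th cluster, and $c_i$ is its centroid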
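One way to realize this objective is spherical k-means, sketched below as an assumption: the slides name k-means but not a specific variant. Edge vectors are length-normalized so that the dot product with a unit-length centroid is the cosine similarity S(x_j, c_i), each edge goes to its arg max centroid, and centroids are re-estimated as normalized cluster means. The helper `edge_kmeans` is hypothetical; X is the 0/1 edge matrix of the example graph.

```python
import numpy as np


def edge_kmeans(X, k, n_iter=100, seed=0):
    """Spherical k-means on edge vectors (one assumed k-means variant).

    Assigns each edge to the centroid with the highest cosine similarity and
    re-estimates each centroid as the normalized mean of its edges.
    """
    rng = np.random.default_rng(seed)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)           # unit-length edges
    centroids = Xn[rng.choice(len(Xn), size=k, replace=False)]  # k distinct edges as seeds
    labels = np.full(len(Xn), -1)
    for _ in range(n_iter):
        sims = Xn @ centroids.T              # S(x_j, c_i) as cosine similarity
        new_labels = sims.argmax(axis=1)     # arg max over the k clusters
        if np.array_equal(new_labels, labels):
            break                            # assignments stable: converged
        labels = new_labels
        for i in range(k):
            members = Xn[labels == i]
            if len(members):                 # keep the old centroid if a cluster empties
                c = members.sum(axis=0)
                centroids[i] = c / np.linalg.norm(c)
    return labels


# 0/1 edge matrix of the example graph (columns u1..u5, then t1..t4).
nodes = ["u1", "u2", "u3", "u4", "u5", "t1", "t2", "t3", "t4"]
edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
         ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4")]
X = np.zeros((len(edges), len(nodes)))
for r, (u, t) in enumerate(edges):
    X[r, nodes.index(u)] = X[r, nodes.index(t)] = 1.0

labels = edge_kmeans(X, k=2)
print(labels)  # one cluster id per edge, e1..e10
```

Any other edge similarity can be swapped in by replacing the `Xn @ centroids.T` line, which is the point of the edge-centric formulation.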
Defining Edge Similarity
• The similarity between two edges e = (u_i, t_p) and e' = (u_j, t_q) can be defined as, though it is not limited to,
$$S_e(e, e') = \alpha\, S_u(u_i, u_j) + (1 - \alpha)\, S_t(t_p, t_q)$$
• α is set to 0.5, giving users and tags equal importance
• User-user and tag-tag similarities then need to be defined
Independent Learning
• Assume users are independent of each other, and likewise tags:
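$$S_e(e, e') = \frac{1}{2}\left(\delta(u_i, u_j) + \delta(t_p, t_q)\right), \qquad \delta(m, n) = \begin{cases} 1, & m = n \\ 0, & m \neq n \end{cases}$$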
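In code, independent learning reduces to an identity test on the endpoints. A sketch with the weight α exposed (α = 0.5 recovers the formula on this slide); `delta` and `sim_independent` are illustrative names, not from the slides.

```python
def delta(m, n):
    """1 if the two nodes are the same node, else 0."""
    return 1.0 if m == n else 0.0


def sim_independent(e, e_prime, alpha=0.5):
    """alpha * delta(u_i, u_j) + (1 - alpha) * delta(t_p, t_q).

    With the default alpha = 0.5 this is the independent-learning similarity.
    """
    (u_i, t_p), (u_j, t_q) = e, e_prime
    return alpha * delta(u_i, u_j) + (1 - alpha) * delta(t_p, t_q)


print(sim_independent(("u1", "t1"), ("u1", "t2")))  # 0.5: same user, different tag
print(sim_independent(("u1", "t1"), ("u2", "t2")))  # 0.0: no shared node
```

The second call shows the limitation that motivates the other schemes: edges with no shared node always score 0, however related their users or tags are.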
Normalized Learning
• Differentiate nodes of varying degree by normalizing each node by its degree
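$$e(u_i, t_p) = \left(0, \ldots, 0, \frac{1}{d_{u_i}}, 0, \ldots, 0, \frac{1}{d_{t_p}}, 0, \ldots, 0\right)$$
Comparing edges by the cosine similarity of these vectors gives
$$S_e(e, e') = \frac{d_{t_p} d_{t_q}\, \delta(u_i, u_j) + d_{u_i} d_{u_j}\, \delta(t_p, t_q)}{\sqrt{d_{u_i}^2 + d_{t_p}^2}\, \sqrt{d_{u_j}^2 + d_{t_q}^2}}$$
where $d$ denotes nodal degree and $\delta(m, n)$ is 1 if $m = n$ and 0 otherwise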
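A sketch of the closed form, with degrees counted from the example graph's edge list (function names are hypothetical). It shows the intended effect of normalization: two edges sharing the degree-2 tag t1 score 0.5, while two edges sharing the degree-3 tag t2 score only 4/13 ≈ 0.31.

```python
import math
from collections import Counter


def degrees(edges):
    """Degree of every user and tag in the bipartite user-tag graph."""
    deg = Counter()
    for user, tag in edges:
        deg[user] += 1
        deg[tag] += 1
    return deg


def sim_normalized(e, e_prime, deg):
    """Closed-form cosine similarity of edge vectors that hold 1/d_u at the
    user position and 1/d_t at the tag position (zeros elsewhere)."""
    (u_i, t_p), (u_j, t_q) = e, e_prime
    same_user = 1 if u_i == u_j else 0
    same_tag = 1 if t_p == t_q else 0
    num = deg[t_p] * deg[t_q] * same_user + deg[u_i] * deg[u_j] * same_tag
    den = math.hypot(deg[u_i], deg[t_p]) * math.hypot(deg[u_j], deg[t_q])
    return num / den


edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
         ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4")]
deg = degrees(edges)
print(sim_normalized(("u1", "t1"), ("u2", "t1"), deg))  # about 0.5: shared tag t1 has degree 2
print(sim_normalized(("u1", "t2"), ("u3", "t2"), deg))  # about 0.31: shared tag t2 has degree 3
```

`math.hypot(a, b)` computes the square-root terms of the denominator directly.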
Correlational Learning
• Tags can be semantically close
  – e.g., the tags cars, automobile, autos, and car reviews are all used to describe a blog written by sid0722 on BlogCatalog
• Reduce the u × t user-tag matrix to a u × k latent representation
• Compute user-user and tag-tag cosine similarity in the latent space
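$$S_e(e, e') = \frac{1}{2}\left(\frac{\tilde{u}_i \cdot \tilde{u}_j}{\|\tilde{u}_i\|\,\|\tilde{u}_j\|} + \frac{\tilde{t}_p \cdot \tilde{t}_q}{\|\tilde{t}_p\|\,\|\tilde{t}_q\|}\right)$$
where $\tilde{u}$ and $\tilde{t}$ denote latent user and tag vectors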
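The slides do not say how the latent space is obtained; the sketch below assumes a rank-k truncated SVD of M (users as rows of U_k Σ_k, tags as rows of V_k Σ_k), which is one common choice. Function names are hypothetical, and edges are given as (user index, tag index) pairs over the example graph. Independent learning scores e1 = (u1, t1) and e4 = (u2, t2) as 0 because they share no node; here the pair scores high because u1 and u2 use the same tags and t1 and t2 co-occur.

```python
import numpy as np


def latent_factors(M, k):
    """Assumed latent space: rank-k truncated SVD of the user-tag matrix M.

    Returns user vectors (rows of U_k * s_k) and tag vectors (rows of V_k * s_k).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sim_correlational(e, e_prime, user_vecs, tag_vecs):
    """1/2 * (cos(u_i, u_j) + cos(t_p, t_q)) computed in the latent space."""
    (i, p), (j, q) = e, e_prime
    return 0.5 * (cosine(user_vecs[i], user_vecs[j]) + cosine(tag_vecs[p], tag_vecs[q]))


# Example user-tag matrix: rows u1..u5, columns t1..t4.
M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
user_vecs, tag_vecs = latent_factors(M, k=2)

# e1 = (u1, t1) and e4 = (u2, t2), given as (user index, tag index).
print(sim_correlational((0, 0), (1, 1), user_vecs, tag_vecs))  # close to 1
```

SVD sign flips do not affect the result, since a flip changes the same coordinate in every vector and cosine is invariant to that.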
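Spectral Clustering Perspective
• Graph partitioning can be relaxed to a generalized eigenvalue problem
$$Z = \begin{bmatrix} U \\ V \end{bmatrix}, \qquad W = \begin{bmatrix} 0 & M \\ M^T & 0 \end{bmatrix}, \qquad L = \begin{bmatrix} D_1 & -M \\ -M^T & D_2 \end{bmatrix}$$
$$\min_{z \neq 0} \frac{z^T L z}{z^T D z}, \qquad D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$$
where $D_1$ and $D_2$ are the diagonal degree matrices of users and tags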
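Spectral Clustering Perspective
• Plugging in L, W, and Z, we obtain
$$\begin{bmatrix} D_1 & -M \\ -M^T & D_2 \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix} = \lambda \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix} \;\Longrightarrow\; \begin{cases} M V = (1 - \lambda)\, D_1 U \\ M^T U = (1 - \lambda)\, D_2 V \end{cases}$$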
• Hence $D_1^{1/2} U$ and $D_2^{1/2} V$ are the left and right singular vectors corresponding to the k largest singular values of the normalized user-tag matrix $D_1^{-1/2} M D_2^{-1/2}$
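The derivation can be checked numerically. The sketch below (helper name hypothetical) takes the SVD of D_1^{-1/2} M D_2^{-1/2} for the example matrix, maps the singular vectors back with U = D_1^{-1/2} Ũ and V = D_2^{-1/2} Ṽ, and confirms both block equations with 1 - λ equal to each singular value. The leading pair is the trivial solution (singular value 1, constant U column), which is typically discarded before clustering the rows of Z.

```python
import numpy as np


def spectral_coembedding(M, k):
    """SVD of D1^{-1/2} M D2^{-1/2}, mapped back to U = D1^{-1/2} U~ and V = D2^{-1/2} V~."""
    d1 = M.sum(axis=1)  # user degrees, diagonal of D1
    d2 = M.sum(axis=0)  # tag degrees, diagonal of D2
    Mn = M / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    Ut, s, Vt = np.linalg.svd(Mn, full_matrices=False)
    U = Ut[:, :k] / np.sqrt(d1)[:, None]
    V = Vt[:k].T / np.sqrt(d2)[:, None]
    return U, V, s[:k]


# Example user-tag matrix: rows u1..u5, columns t1..t4.
M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, V, s = spectral_coembedding(M, k=2)
D1, D2 = np.diag(M.sum(axis=1)), np.diag(M.sum(axis=0))

# Both block equations hold with (1 - lambda) equal to each singular value.
print(np.allclose(M @ V, D1 @ U * s), np.allclose(M.T @ U, D2 @ V * s))  # True True
print(s[0])  # leading singular value, approximately 1 (the trivial solution)
```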
Synthetic Data Sets
• Synthetic data sets
  – Controlled by the number of clusters, users, and tags
  – Inner-cluster and inter-cluster density (1% of total user-tag links)
  – Evaluated with normalized mutual information (NMI), which lies between 0 and 1; the higher, the better
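For reference, a plain-Python NMI for two flat labelings of the same items (for example, ground-truth versus predicted edge clusters). The slides do not state which NMI normalization they use; the square-root normalization I(A;B)/√(H(A)H(B)) below is an assumption.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """NMI(A, B) = I(A; B) / sqrt(H(A) * H(B)) for two flat labelings of the same items."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:
        # Convention: two single-cluster labelings agree perfectly.
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / math.sqrt(h_a * h_b)


truth = [0, 0, 1, 1]
print(nmi(truth, [1, 1, 0, 0]))  # approximately 1.0: same partition, renamed clusters
print(nmi(truth, [0, 1, 0, 1]))  # 0.0: predicted clusters carry no information
```

NMI ignores cluster names, which matters here because a clustering algorithm's cluster ids are arbitrary.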
Synthetic Performance
• We fix the number of users, tags, and the density, but vary the number of clusters
Synthetic Performance
• We fix the number of users, tags, and clusters, but vary the inner-cluster density
Social Media Data Sets
• BlogCatalog
  – Tags describing each blog
  – Category predefined by BlogCatalog for each blog
• Delicious
  – Tags describing each bookmark
  – Select the top 10 most frequently used tags for each person
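The Delicious preprocessing step is a per-user frequency count. A sketch assuming the raw data is a sequence of (user, tag) uses; the function name and the log below are invented for illustration.

```python
from collections import Counter, defaultdict


def top_tags_per_user(tag_uses, n=10):
    """tag_uses: iterable of (user, tag) pairs, one per use of a tag.

    Returns each user's n most frequently used tags, most frequent first.
    """
    counts = defaultdict(Counter)
    for user, tag in tag_uses:
        counts[user][tag] += 1
    return {user: [tag for tag, _ in c.most_common(n)] for user, c in counts.items()}


# Hypothetical tag-use log (illustration only).
log = [("alice", "python"), ("alice", "python"), ("alice", "ml"),
       ("alice", "recipes"), ("alice", "ml"), ("alice", "python"), ("bob", "travel")]
print(top_tags_per_user(log, n=2))  # {'alice': ['python', 'ml'], 'bob': ['travel']}
```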
Inferring Personal Interests
• Category information reveals personal interests; we treat group affiliations as features and infer personal interests, evaluated via cross-validation
Connectivity Study
• The correlation between the number of affiliations two users co-occur in and whether they are connected in the real network
• The more often two users co-occur, the more likely they are to be connected
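A sketch of the co-occurrence count, assuming each user's affiliations are given as a set of group ids (as an edge-centric clustering would produce); the helper name and affiliations below are hypothetical. To test the trend, these counts would be compared with whether each pair is linked in the real network.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(affiliations):
    """affiliations: dict mapping user -> set of group ids (possibly overlapping).

    Returns a Counter mapping each unordered user pair (sorted tuple) to the
    number of groups the two users share.
    """
    members = {}
    for user, groups in affiliations.items():
        for g in groups:
            members.setdefault(g, []).append(user)
    counts = Counter()
    for users in members.values():
        for a, b in combinations(sorted(users), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical overlapping affiliations (illustration only).
aff = {"u1": {0, 1}, "u2": {0, 1, 2}, "u3": {2}, "u4": {1}}
counts = co_occurrence(aff)
print(counts[("u1", "u2")])  # 2: u1 and u2 share groups 0 and 1
```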
Understanding Groups via Tag Cloud
• Tag cloud for Category Health
Understanding Groups via Tag Cloud
• Tag cloud for Cluster Health
Understanding Groups via Tag Cloud
• Tag cloud for Cluster Nutrition
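One simple way to produce a tag cloud for a cluster, assumed here rather than taken from the slides, is to weight each tag by its share of the cluster's edges and scale font size by that weight. The edges are the example graph's; the helper name and cluster labels are hypothetical.

```python
from collections import Counter


def cluster_tag_weights(edges, edge_labels, cluster):
    """Share of one cluster's user-tag edges that use each tag, most frequent first."""
    counts = Counter(tag for (user, tag), label in zip(edges, edge_labels)
                     if label == cluster)
    total = sum(counts.values())
    return {tag: c / total for tag, c in counts.most_common()}


edges = [("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
         ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4")]
# Hypothetical edge partition into two groups (illustration only).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cluster_tag_weights(edges, labels, cluster=1))  # {'t3': 0.6, 't4': 0.4}
```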
Conclusions and Future Work
• Overlapping communities on a user-tag graph
• Propose an edge-centric view and define edge similarity
  – Independent learning
  – Normalized learning
  – Correlational learning
• Evaluate results on synthetic and real data sets
• Future work: applications such as link prediction, and scalability
References
• I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.
• L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.
• L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Morgan & Claypool Publishers, Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.
• G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, p. 814, 2005.
• K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS, 2005.
• U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
• M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, Feb. 2004.
• S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
Contact the Authors
• Xufei Wang
  – Arizona State University
• Lei Tang
  – Yahoo! Labs