[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...
Mining User’s Real Social Circle in Microblog
Hailong Qin
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology, Harbin, China
E-mail:[email protected]
Ting Liu
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology, Harbin, China
Email: [email protected]
Yanjun Ma
Baidu, Beijing, China
Email: [email protected]
Abstract—As a media and communication platform, microblog is increasingly popular around the world. Users can follow anyone, from well-known individuals to real friends, and read their tweets without their permission. Most users follow a large number of celebrities and public media in microblog; however, these celebrities do not necessarily follow all their fans. Such one-way relationships abound in the user network and are displayed in the form of users' followees and followers, which makes it difficult to identify users' real friends, who are contained in the merged list of followees and followers. The aim of this paper is to propose a general algorithm for mining users' real friends in social media and dividing them into different social circles automatically according to the closeness of their relationships. To verify the effectiveness of the proposed algorithm, we build a microblog application which presents the social circles identified by the algorithm to users and enables them to modify the proposed results according to their real social circles. We demonstrate that our algorithm is superior to a traditional clustering method in terms of F measure and Mean Average Precision.
Keywords-social circle; community detection; social network analysis;
I. INTRODUCTION
Social media has become one of the most popular online
communication tools. With the popularity of microblog sites
on the internet, such as Twitter and Sina Weibo, thousands of
millions of people all over the world have become
users of microblog services.
Microblog is a network service that functions as both a
social network and a news medium. Users of microblog can
follow others or be followed. When a user follows another
user, a one-way relation is built without any confirmation.
If two users follow each other, they are bilateral
followers. The followees and followers of a user
are displayed as two separate lists. Users of microblog
can follow many celebrities whom they adore and some
public media such as newspapers and websites. As a result,
well-known individuals or organizations normally have a
greater number of followers in microblog, from several
thousand to millions, or even more.
In the relation networks of well-known people, there
exists an abundance of one-way relations simply because
their followers often outnumber their followees[1]. Most
bilateral relations are built among acquaintances. The social
aspect of microblog is mainly reflected in the interaction
between users and their real friends such as classmates,
neighbors and members of the same organizations. However,
all followees and followers are mixed in the same lists, whether
they are friends, family members or superstars the user adores,
making the lists difficult to organize as they grow over time[2].
In fact, only a small part of a user's relationships in
microblog are with real friends[2]. Real friends tend to be more
interactive, and frequent communication between users
may imply that they have more in common in terms of user
characteristics. In addition, useful information can be
transmitted between real friends: as they are acquainted
offline, they share more common topics, such as
their job, school, hobbies and hometown.
Furthermore, a given user has many kinds of real friends in
the microblog network, such as classmates and colleagues.
The relationship lists of users in microblog cannot show this
information either. As messages between acquaintances
have relatively higher credibility and are diffused more
rapidly than those between other users, identifying these
potential real relationships will promote research and commercial
work on social networks. Much credible information and user
behavior originates from these relationships. Based on
the features of social circles, which consist of close friends,
several areas can be researched more effectively, such
as collaborative filtering, advertisement recommendation, user
behavior and information credibility in microblog.
This paper focuses on a method of mining the real friends
and the social circles of a given user in a microblog website.
Our method mines users' real social circles by analyzing the
structure of their relationship network. Experimental results
show that we can find several social circles of a user, each of
which reveals a specific kind of social relationship of the user.
In general, this is a community detection problem, on which
there is already much previous work. However, most of that work
detects communities within the overall network without
a centric user. Such work can reveal only part of the
social characteristics, and it is difficult to reveal individual
behaviors within social networks. Our work can identify a
specific user's social circles by finding several maximum
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.64
complete graphs that include this user in the relationship
network. However, identifying this kind of graph is an
NP-complete problem. Moreover, a social circle is not
always a maximum complete graph in microblog, since not
all users in the circle follow each other. Our method therefore
uses an approximate maximum complete graph instead.
The rest of this paper is organized as follows: in the
next section, we describe related work. Then we introduce
our methodology of mining social circles. To evaluate the
performance of our method, we develop an application for
finding users' social circles in Sina Weibo, as presented
in Section IV. We also compare our method with K-means
clustering in Section V. Finally, we conclude our work and
point out avenues for future research.
II. RELATED WORK
In 1969, the social circle was defined as a critical
dimension of social structure. Behavior in a social circle
has five factors: information interaction, safety, privacy,
neighboring preferences and localism[3][4]. The social circle
is closely related to the term community, which has two
interpretations: one is the geographical notion of community
and the other is relational. The second is concerned with
quality of character of human relationship, without reference
to location[5]. The online social circle carries the second
meaning and does not concern physical locations.
Huberman et al. argued that it is necessary to mine users' real
friends[2]. In their paper, if one user mentions (@) another user
at least twice, the two are more likely to be friends. But this
definition analyzes only the content of posts. The @ sign is rather
rare in posts, and most occurrences appear in reposts rather
than in communication with friends. The structure
of the relationship network is therefore more suitable for
analyzing users' social circles. Girvan and Newman detected
communities by removing edges with maximum betweenness[6], and
Newman detected clusters by computing the modularity
of communities[7]. These community detection methods are
based on the whole network and are not suitable
for community detection around a centric node. Wu and
Huberman solved community detection based on the notion
of voltage drops across a network[8]. Their method can identify
the community of a given node, but it can mine only a single
community of that node. Zhang and Dantu mined
personal communities via call detail records[9]. They find a
community around a given user, and the distance between
the user and each of the user's friends is reflected in the
network structure. Although these two methods address
personal community detection, they cannot
mine more than one community for a given node.
A given user will have several social circles in microblog,
and some friends may belong to more than one of the user's social
circles. In general, a subgroup of a social network has some
key players[10]. In our work, the given user may not be a
core player in a social circle, but this user is the connection
between several social circles. The algorithm in [11] computes
overlapping k-cliques in a network. For different k, the nodes
of the complete sub-graph in a social circle are different. Qu
and Liu[12] proposed a semi-supervised method to detect
social circles in Twitter. The method uses information from both
the network and the text. They crawl users' manually
constructed groups using the Twitter API as the seeds of their
mining method. However, this method can detect only one
group at a time.
III. METHODOLOGY
In this section, we introduce our method to identify a
given user's social circles in microblog by mining overlaps
of bilateral followings among this user's bilateral followers.
This is a process of mining approximate complete clusters.
Figure 1. A user's four social circles. One has four members and three have five.
As our microblog platform is Sina Weibo, we first collect
users' bilateral followings through the Sina API. We regard
those who follow each other as potential real friends. In the
field of social science, the more common friends two people
have, the closer the relation between them. In a
directed network, a bilateral relation means that two vertexes
each have an edge to the other. Assuming that everyone within a
social circle is acquainted with all the others, finding a user's
real social circles in the microblog is equivalent to mining
several distinct maximum complete clusters of this user-centered
network. In general, a user has many social circles and all circles
include this user. A friend in one social circle of
a user may also belong to other social circles of this user
(as shown in Fig. 1). However, the microblog is a virtual
environment. It does not guarantee that all these people are
bilateral followings even though they are really close friends in
a social circle. If we mined strictly complete clusters, some real
members of social circles would be excluded from their circles.
We therefore mine approximate complete clusters to avoid omitting
real friends from the user's social circles. Regarding the
social network as a directed graph, we mine several
approximate complete clusters around a centric vertex.
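The notion of a bilateral relation can be illustrated with a minimal Python sketch. It assumes that the follow relation is available as (follower, followee) pairs; the function name `bilateral_neighbors` and the sample user names are hypothetical and are not part of the paper or of the Sina API.

```python
from collections import defaultdict


def bilateral_neighbors(follows):
    """Map each user who follows someone to the users who follow them back."""
    followees = defaultdict(set)
    for follower, followee in follows:
        followees[follower].add(followee)
    # Keep only the followees that also follow the user (a two-way edge).
    return {
        user: {v for v in out if user in followees.get(v, set())}
        for user, out in followees.items()
    }


# Hypothetical follow edges: alice and bob follow each other,
# alice follows a celebrity one-way, carol follows alice one-way.
follows = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "star"),
    ("carol", "alice"),
]
neighbors = bilateral_neighbors(follows)
```

In this sketch only the alice/bob pair survives as a bilateral relation; the one-way edges to the celebrity and from carol are dropped, which matches the filtering step described above.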
Definition I: Every user in the graph is a vertex. The given
vertex is the center, and represents the candidate user whose
social circles will be mined.
Definition II: A neighbor of a vertex represents a bilateral
friend of this vertex.
Definition III: For every vertex n, the number and the
name list of its neighbors are denoted
N_NEIGHBOR(n) and V_NEIGHBOR(n) respectively, representing the
bilateral followings of n in microblog.
Definition IV: $S_{a,b}$ is the set of common neighbors of
vertex a and vertex b, as defined in (1). The similarity between
a and b is $|S_{a,b}|$, the number of vertexes in $S_{a,b}$. This
value is the weight between vertexes a and b.
$$S_{a,b} = \mathrm{V\_NEIGHBOR}(a) \cap \mathrm{V\_NEIGHBOR}(b) \tag{1}$$
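Equation (1) is a plain set intersection, which the following Python sketch illustrates. The dictionary `neighbors` stands in for V_NEIGHBOR and holds made-up data; the helper name `common_neighbors` is an assumption for illustration only.

```python
def common_neighbors(neighbors, a, b):
    """Return S_{a,b}: the bilateral neighbors shared by a and b."""
    return neighbors.get(a, set()) & neighbors.get(b, set())


# Hypothetical V_NEIGHBOR lists for a tiny user-centered network.
neighbors = {
    "center": {"u1", "u2", "u3"},
    "u1": {"center", "u2"},
    "u2": {"center", "u1", "u3"},
    "u3": {"center", "u2"},
}

shared = common_neighbors(neighbors, "center", "u2")
weight = len(shared)  # |S_{center,u2}|, the weight between center and u2
```

Here `shared` contains u1 and u3, so the weight between the center and u2 is 2.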
Definition V: $D_{clu,node}$ is the normalized distance between
a vertex node and a cluster clu, as defined in (2).
For every vertex i in a cluster clu of the centric node c, the
algorithm considers the fraction of the members of
$S_{node,c}$ (the common neighbors of node and c) that are also
members of $S_{node,i} \cap S_{node,c}$ (the common neighbors of
node, i and c). $D_{clu,node}$ is the average of this fraction
over all vertexes i in clu. When judging whether a friend node
belongs to a certain social circle clu, this value measures the
familiarity between node and the existing members of clu.
In (2), num is the number of existing members in clu.
$$D_{clu,node} = \frac{\sum_{i \in clu} |S_{node,i} \cap S_{node,c}|}{num \times |S_{node,c}|} \tag{2}$$
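A small Python sketch of equation (2) may help to make the averaging explicit. It rests on two assumptions that the paper does not state: the cluster is passed as the list of its members without the center, and the value is taken to be 0 when $S_{node,c}$ is empty. The function names and the sample graph are hypothetical.

```python
def common_neighbors(neighbors, a, b):
    """Return S_{a,b}: the bilateral neighbors shared by a and b."""
    return neighbors.get(a, set()) & neighbors.get(b, set())


def cluster_closeness(neighbors, center, cluster, node):
    """Equation (2): average share of S_{node,center} shared with each member."""
    s_node_center = common_neighbors(neighbors, node, center)
    if not cluster or not s_node_center:
        return 0.0  # assumption: an undefined ratio is treated as 0
    overlap = sum(
        len(common_neighbors(neighbors, node, member) & s_node_center)
        for member in cluster
    )
    return overlap / (len(cluster) * len(s_node_center))


# Hypothetical network: a1, a2, a3 all know each other; b1 only knows c.
neighbors = {
    "c": {"a1", "a2", "a3", "b1"},
    "a1": {"c", "a2", "a3"},
    "a2": {"c", "a1", "a3"},
    "a3": {"c", "a1", "a2"},
    "b1": {"c"},
}

d_a3 = cluster_closeness(neighbors, "c", ["a1", "a2"], "a3")
d_b1 = cluster_closeness(neighbors, "c", ["a1", "a2"], "b1")
```

For a3, each existing member shares one of the two vertexes in $S_{a3,c}$, so the value is 2 / (2 × 2) = 0.5; b1 shares no neighbor with the center and gets 0.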
First, closer vertexes are more likely to form
clusters. Using the Sina Weibo API, we can get
N_NEIGHBOR(center) and V_NEIGHBOR(center), that is,
the number and list of neighbors around the given vertex.
After obtaining these vertexes, we compute
$S_{center,v}$ for every neighbor v of center. In this process,
the number of API calls is equal to the number
of friends. We record every $S_{center,v}$ and $|S_{center,v}|$,
and assign $|S_{center,v}|$ as the weight between center and
v. According to these weights, we rank the given user's friends
in descending order. Regarding every user as a vertex in the
graph, the vertexes that share more neighbors with the
given vertex are ranked higher. This means that
if two users have more common friends, they are more
likely to belong to the same social circle. Compared with
other members of the same cluster, some friends are ranked
higher since they have more connections with others
in the social circle; they form the strong ties within the
cluster. However, friends in adjacent positions
are not always in the same group, since a user may have
more than one social circle of similar size. Members
of the same social circle may therefore be discontinuous in the rank.
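The ranking step can be sketched in a few lines of Python. The data are made up, the function name `rank_neighbors` is hypothetical, and breaking ties by user name is an assumption added only to make the order deterministic; the paper does not say how ties are resolved.

```python
def rank_neighbors(neighbors, center):
    """Rank the center's bilateral friends by |S_{center,v}|, descending."""
    weights = {
        v: len(neighbors[center] & neighbors.get(v, set()))
        for v in neighbors[center]
    }
    # Assumption: ties are broken alphabetically by user name.
    ranked = sorted(weights, key=lambda v: (-weights[v], v))
    return ranked, weights


# Hypothetical V_NEIGHBOR lists.
neighbors = {
    "center": {"u1", "u2", "u3"},
    "u1": {"center", "u2"},
    "u2": {"center", "u1", "u3"},
    "u3": {"center", "u2"},
}

ranked, weights = rank_neighbors(neighbors, "center")
```

In this example u2 shares two neighbors with the center and is ranked first, while u1 and u3 share one each.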
Since all neighbors are ranked by weight, we take
the nodes at the top of the rank first to build the social circles.
They share more common neighbors, that is, common friends,
with the centric node. At the beginning of the mining
process, the vertex at the top of the rank builds the first cluster.
For each of the remaining nodes, we compute $D_{clu,node}$ for all
existing clusters. When $D_{clu,node}$ is above a threshold θ,
the vertex is added to the cluster. We tested the algorithm's
performance with θ ranging from 0.2 to 0.8 on a small-scale
data set, which showed that the optimum threshold is 0.4. We
therefore set θ to 0.4 in our application, as described in the next
section. The experiment at the end of that section confirms that
this is the best parameter.
One node may be a member of one cluster or of many
clusters. If it does not belong to any cluster, this vertex
is used to build a new cluster, of which it is the first
member. After all neighbors of the centric
node have been processed, several clusters have been built and
every vertex has been assigned to one or more clusters.
Algorithm: MINING CLUSTERS OF A VERTEX
Procedure FindCompleteClusters(center)
  center: the given vertex.
  V_NEIGHBOR(center): the neighbor list of center.
  N_NEIGHBOR(center): the weights between the neighbors and center.
  G(center) = (V, E): V are the vertexes in V_NEIGHBOR(center); E are the weights between center and V_NEIGHBOR(center).
  R(center): the set of neighbors of center ranked by descending weight.
  CLU(center): the set of approximate maximum clusters including center.

  for all r_i ∈ R(center) do
    if i = 1 then
      Add BuildCluster(r_i) to CLU(center)
    else
      for all clu_j ∈ CLU(center) do
        if D_{clu_j, r_i} > θ then
          Add r_i to clu_j
        end if
      end for
    end if
    if r_i is not in any cluster of CLU(center) then
      Add BuildCluster(r_i) to CLU(center)
    end if
  end for
  Output: CLU(center)
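The procedure above can be sketched in Python as follows. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' implementation: the bilateral-neighbor lists are given as a dictionary, the center is left implicit in every cluster, ties in the ranking are broken by name, and $D_{clu,node}$ is taken as 0 when $S_{node,c}$ is empty. All function names and the sample network are hypothetical.

```python
def common_neighbors(neighbors, a, b):
    """Return S_{a,b}: the bilateral neighbors shared by a and b."""
    return neighbors.get(a, set()) & neighbors.get(b, set())


def closeness(neighbors, center, cluster, node):
    """Equation (2) for a non-empty cluster of friends (center excluded)."""
    s_nc = common_neighbors(neighbors, node, center)
    if not s_nc:
        return 0.0  # assumption: undefined ratio treated as 0
    overlap = sum(
        len(common_neighbors(neighbors, node, m) & s_nc) for m in cluster
    )
    return overlap / (len(cluster) * len(s_nc))


def find_clusters(neighbors, center, theta=0.4):
    """Greedy clustering of the center's friends, in ranked order."""
    friends = neighbors[center]
    weight = {v: len(common_neighbors(neighbors, center, v)) for v in friends}
    ranked = sorted(friends, key=lambda v: (-weight[v], v))
    clusters = []
    for node in ranked:
        joined = False
        for cluster in clusters:
            if closeness(neighbors, center, cluster, node) > theta:
                cluster.append(node)  # a node may join several clusters
                joined = True
        if not joined:
            clusters.append([node])  # the node founds a new cluster
    return clusters


# Hypothetical network: two groups of three mutual friends around "c".
neighbors = {
    "c": {"a1", "a2", "a3", "b1", "b2", "b3"},
    "a1": {"c", "a2", "a3"},
    "a2": {"c", "a1", "a3"},
    "a3": {"c", "a1", "a2"},
    "b1": {"c", "b2", "b3"},
    "b2": {"c", "b1", "b3"},
    "b3": {"c", "b1", "b2"},
}

circles = find_clusters(neighbors, "c")
```

On this toy network the sketch separates the two groups of friends into two circles. The first ranked node always starts a cluster because the cluster list is still empty, which corresponds to the i = 1 branch of the pseudocode.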
IV. EXPERIMENT: A WEIBO APPLICATION
To validate our method, we develop an application called
Weibo Group Picture¹ in Sina Weibo. It is an implementation
of our social circle mining algorithm, which can identify
the potential social circles and real friends of a given user in
microblog.
In this application, we analyze a user's relationship network
and mine the user's social circles. Users can adjust
the social circles to match their real social relationships. Once
users finish adjusting, we obtain the modified data. We can then
compare the real result with the social circles suggested by
our algorithm.
After a user logs in to the application, all of the user's social
circles are displayed in the interface (as shown
in Fig. 2). The information on the centric user is displayed
in the top left corner (①). Each bunch of balloons
represents a social circle, whose members
have closer relationships with each other (②). The user can
name the social circle in the text area below the bunch, for
example classmates or friends in hometown (③).
¹ Weibo Group Picture: http://jitizhao.sinaapp.com/
Figure 2. Display of social circles.
A social circle may contain inappropriate members, as
somebody may belong to another social circle or to no social
circle. To let users adjust the resulting social circles, we
provide a modification interface (as shown in Fig. 3).
The user can switch between social circles by clicking
the circles at the bottom (④); the last symbol contains
the friends who should not belong to any group (⑤). Users
can also adjust (add or delete) members and drag their
avatars (⑥) to the correct groups (④). The system collects the
final results after users publish their friend circles as posts.
Figure 3. The interface of members modification.
Up to March 26th, 2012, we collected 595 social circles,
of which 113 were adjusted. A user may
adjust one or more social circles. The rest of the social circles
were not modified, so their nominal accuracy is 100%. There may
be some exactly correct circles among them. However, it is
not clear whether these circles are correct or users simply
did not want to modify them. Hence they are excluded from
the test set. Additionally, the accuracy of 16 social circles is
0.0%, which means that users removed all members of the social
circle and built a new one. Our system cannot predict users'
behavior; they may have adjusted social circles just for fun. We
exclude these circles from the test set as well, since a variety
of reasons can lead to such results and the algorithm's
performance can hardly be estimated from them. This leaves 97
effectively adjusted social circles (Table I).
Table I
DATA SET
Posting Social Circles | Adjusting Social Circles | Effective Adjusting Social Circles
595 | 113 | 97
We use two measurements to evaluate the performance
of the result: F value and MAP (Mean Average Precision).
Let SCori represent the members of the original social
circle and SCadj the members of the adjusted social
circle. The precision and the recall are computed by (3)
and (4) respectively. N is the number of correct friends
in SCori, and POSi is the position of the i-th correct friend.
AP (Average Precision) is given by (5), and the MAP
value is computed by (6).
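$$\mathrm{Precision} = \frac{|SC_{ori} \cap SC_{adj}|}{|SC_{ori}|} \tag{3}$$
$$\mathrm{Recall} = \frac{|SC_{ori} \cap SC_{adj}|}{|SC_{adj}|} \tag{4}$$
$$AP = \frac{\sum_{i=1}^{N} \frac{i}{POS_i}}{|SC_{ori}|} \tag{5}$$
$$MAP = \frac{\sum AP}{\text{Total Number of Social Circles}} \tag{6}$$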
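As a worked illustration of (3) to (6), the following Python sketch computes the measures for one hypothetical circle. The paper does not spell out the F value, so the usual harmonic mean of precision and recall is assumed here; the function names and the sample members are made up.

```python
def precision_recall_f(original, adjusted):
    """Equations (3) and (4), plus an assumed F = 2PR / (P + R)."""
    correct = len(set(original) & set(adjusted))
    precision = correct / len(original) if original else 0.0
    recall = correct / len(adjusted) if adjusted else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


def average_precision(original, adjusted):
    """Equation (5): sum of i / POS_i over correct friends, over |SC_ori|."""
    adjusted = set(adjusted)
    if not original:
        return 0.0
    score, hits = 0.0, 0
    for position, member in enumerate(original, start=1):
        if member in adjusted:
            hits += 1  # this is the hits-th correct friend, at POS = position
            score += hits / position
    return score / len(original)


def mean_average_precision(circle_pairs):
    """Equation (6): mean of AP over all (original, adjusted) circle pairs."""
    aps = [average_precision(o, a) for o, a in circle_pairs]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical circle: the algorithm proposed four members in this order,
# and the user replaced "x" with "a4".
original = ["a1", "a2", "x", "a3"]
adjusted = ["a1", "a2", "a3", "a4"]

p, r, f = precision_recall_f(original, adjusted)
ap = average_precision(original, adjusted)
```

For this example three of the four proposed members are correct, so precision and recall are both 0.75, and AP is (1/1 + 2/2 + 3/4) / 4 = 0.6875.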
After obtaining the correct results of users' social circles, we
compute the F value and MAP value of our method, which
stand at 74.75% and 66.68% respectively. Using the
correct results as reference, we vary the parameter θ from 0.2 to
0.8 and run the algorithm again. The experiment shows that
our algorithm reaches its best performance when θ is set to
0.4 (as shown in Fig. 4).
Figure 4. Algorithm performance with different parameter values.
V. BASELINE METHOD
Traditionally, clustering is a common method of com-
munity detection. As a baseline method, we use K-means
clustering to detect social circles in microblog.
A vector represents a user and consists of a set of
key/value pairs (7). ID is the set of the user's bilateral
friends. Given a centric user, we first build a vector for
each of the user's bilateral friends. K-means clustering can then
be used to group these vectors into several clusters. If the
vectors of two users are V1 and V2, the distance between
them can be measured using the cosine similarity of V1 and V2
(8).
$$Vector_{user} = \{id_1 = 1, id_2 = 2, \ldots, id_n = 1\} \quad (id_x \in ID) \tag{7}$$
$$Sim(V_1, V_2) = \frac{V_1 \cdot V_2}{|V_1||V_2|} \tag{8}$$
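The cosine similarity in (8) can be computed directly on sparse key/value vectors, as in the Python sketch below. The vectors are hypothetical and use a value of 1 for every bilateral friend, which is an assumption; the helper name `cosine_similarity` is also made up for illustration.

```python
import math


def cosine_similarity(v1, v2):
    """Equation (8) for sparse vectors stored as {friend_id: value} dicts."""
    dot = sum(value * v2.get(key, 0.0) for key, value in v1.items())
    norm1 = math.sqrt(sum(value * value for value in v1.values()))
    norm2 = math.sqrt(sum(value * value for value in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: similarity with an empty vector is 0
    return dot / (norm1 * norm2)


# Hypothetical vectors: the two users share one of their two bilateral friends.
v1 = {"id_b": 1, "id_c": 1}
v2 = {"id_c": 1, "id_d": 1}

sim = cosine_similarity(v1, v2)
```

The two vectors overlap in one of two entries each, so the similarity is 1 / (√2 × √2), that is, about 0.5.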
In order to select a reasonable value of k, our method
detects the centric user's social circles first. Then k is set to
the number of circles detected by our method. When mining
a centric user's social circles, we select k of the user's friends
at random and run K-means until the algorithm converges.
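A compact Python sketch of such a baseline is given below. It is one possible reading, not the authors' code: it assigns each vector to the centroid with the highest cosine similarity, recomputes centroids as plain means, and stops when assignments no longer change. The `initial` argument (the k friends used as starting centroids, which should contain k names) is a hypothetical addition that makes the example reproducible; when it is omitted, k friends are drawn at random as described above.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two sparse {key: value} vectors."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for key, w in vec.items():
            total[key] = total.get(key, 0.0) + w
    return {key: w / len(vectors) for key, w in total.items()}


def kmeans_cosine(vectors, k, initial=None, seed=0, max_iter=100):
    """Return {name: cluster index} using cosine similarity to centroids."""
    names = sorted(vectors)
    if initial is None:
        initial = random.Random(seed).sample(names, k)
    centroids = [dict(vectors[name]) for name in initial]
    assignment = {}
    for _ in range(max_iter):
        updated = {
            name: max(range(k), key=lambda j: cosine(vectors[name], centroids[j]))
            for name in names
        }
        if updated == assignment:
            break  # converged: no vector changed cluster
        assignment = updated
        for j in range(k):
            members = [vectors[n] for n in names if assignment[n] == j]
            if members:
                centroids[j] = mean_vector(members)
    return assignment


# Hypothetical friends of a center "c", each described by its bilateral friends.
vectors = {
    "a1": {"c": 1, "a2": 1, "a3": 1},
    "a2": {"c": 1, "a1": 1, "a3": 1},
    "a3": {"c": 1, "a1": 1, "a2": 1},
    "b1": {"c": 1, "b2": 1, "b3": 1},
    "b2": {"c": 1, "b1": 1, "b3": 1},
    "b3": {"c": 1, "b1": 1, "b2": 1},
}

labels = kmeans_cosine(vectors, 2, initial=["a1", "b1"])
```

With one starting centroid in each group, the sketch recovers the two groups. With random starting points K-means can converge to a poorer split, which is one reason the outcome depends on the initial choice.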
For the 97 adjusted social circles described in the previous
section, we detect social circles by K-means clustering.
The result shows that our method is superior to K-means (as
shown in Fig. 5). Specifically, our method outperforms K-means
by an absolute 14% in terms of both F value and MAP.
Figure 5. Comparison with K-means.
VI. CONCLUSION
In this paper, we propose an algorithm for mining social
circles in microblog. Unlike previous studies, the method
solves the problem of mining several social circles for
a specific user. A person has many kinds of relationship
networks in real life. These distinct subgroups have
diverse meanings and characteristics. Distinguishing these
social circles will provide valuable information for both
business and research.
Within a social circle, the members have closer relationships
and are acquainted with each other. Followees and
followers represent user relationships in microblog, and both
are one-way relationships. Based on the nature of
social circles, we mine the subgroups using bilateral followers.
Given a centric user, the algorithm first finds the user's
bilateral friends. According to the overlap of bilateral
followers among these friends, we find several approximate
maximum clusters around the centric user. These are
the user's social circles in microblog.
To verify the accuracy of the mining results, we develop a
Sina Weibo application that mines users' social circles and
displays the results to them. Users can adjust the results by
correcting the wrongly assigned members. We demonstrate that our
method achieves an absolute 14% improvement in F measure
and MAP over K-means clustering.
In the future, we will use more textual information in our
method to further improve the performance of social circle
mining. Moreover, as the social network is a reflection of real
society, we will study the phenomena of social relationships
using online social circles.
ACKNOWLEDGMENT
This work is supported by the Natural Science Foundation
of China (61073126, 61133012).
REFERENCES
[1] Kwak, H., Lee, C., Park, H., and Moon, S., What is Twitter, a Social Network or a News Media? In Proc. ACM Int. Conf. on WWW (WWW'10), pp. 591-600, 2010.
[2] Huberman, B. A., Romero, D. M., and Wu, F., Social networks that matter: Twitter under the microscope, First Monday, vol. 14, no. 1, 2009.
[3] Tropman, J. E., Critical dimensions of community structure: A reexamination of the Hadden-Borgatta findings, Urban Affairs Quarterly, vol. 5, pp. 215-232, 1969.
[4] Doolittle, R. J., and MacDonald, D. K. Elissa, Communication and a sense of community in a metropolitan neighborhood: A factor analytic examination, Communication Quarterly, vol. 26, pp. 2-7, 1978.
[5] Gusfield, J. R., The community: A critical response, HarperCollins, 1978.
[6] Girvan, M., and Newman, M., Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821-7826, 2002.
[7] Newman, M., Fast algorithm for detecting community structure in networks, Physical Review E, vol. 69, 2004.
[8] Wu, F., and Huberman, B. A., Finding communities in linear time: a physics approach, The European Physical Journal B, vol. 38, no. 2, pp. 331-338, 2004.
[9] Zhang, H., and Dantu, R., Discovery of Social Groups Using Call Detail Records, Lecture Notes in Computer Science, vol. 5333/2008, pp. 489-498, 2008.
[10] Conghuan. Y, Dense Subgroup Identifying in Social Network, 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM'11), pp. 555-556, 2011.
[11] Palla, G., Derényi, I., Farkas, I., and Vicsek, T., Uncovering the overlapping community structure of complex networks in nature and society, Nature, vol. 435, pp. 814-818, 2005.
[12] Qu, Z., and Liu, Y., Interactive Group Suggesting for Twitter, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL'11), pp. 519-523, 2011.