[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...


Page 1: [IEEE 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012) - Istanbul (2012.08.26-2012.08.29)] 2012 IEEE/ACM International Conference on Advances

Mining User’s Real Social Circle in Microblog

Hailong Qin
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology
Harbin, China
E-mail: [email protected]

Ting Liu
Research Center for Social Computing and Information Retrieval
Harbin Institute of Technology
Harbin, China
Email: [email protected]

Yanjun Ma
Baidu
Beijing, China
Email: [email protected]

Abstract: As a media and communication platform, microblog is increasingly popular around the world. Users can follow anyone, from well-known individuals to real friends, and read their tweets without their permission. Most users follow a large number of celebrities and public media accounts in microblog; however, these celebrities do not necessarily follow all their fans. Such one-way relationships abound in the user network and are displayed as lists of followees and followers, which makes it difficult to identify a user’s real friends within the merged list of followees and followers. The aim of this paper is to propose a general algorithm for mining users’ real friends in social media and automatically dividing them into different social circles according to the closeness of their relationships. To verify the effectiveness of the proposed algorithm, we build a microblog application that presents the identified social circles to users and enables them to modify the proposed results according to their real social circles. We demonstrate that our algorithm is superior to a traditional clustering method in terms of F-measure and Mean Average Precision.

Keywords: social circle; community detection; social network analysis

I. INTRODUCTION

Social media has become one of the most popular online communication tools. With the popularity of microblog sites such as Twitter and Sina Weibo, hundreds of millions of people all over the world have become users of microblog services.

A microblog is a network service that functions as both a social network and a news medium. Users can follow others or be followed. When a user follows another user, a one-way relation is built without any confirmation. If two users follow each other, they are bilateral followers. A user’s followees and followers are displayed as two separate lists. Users can follow celebrities they admire as well as public media such as newspapers and websites. As a result, well-known individuals and organizations normally have a large number of followers, from several thousand to millions or even more.

In the relation networks of well-known people, there is an abundance of one-way relations simply because their followers often outnumber their followees[1]. Most bilateral relations are built among acquaintances. The social aspect of microblog is mainly reflected in the interaction between users and their real friends, such as classmates, neighbors and members of the same organization. However, all followees and followers are mixed together in these lists, whether they are friends, family members or admired celebrities, which makes the lists difficult to organize as they grow over time[2].

In fact, real friends make up only a small part of a user’s relationships in microblog[2]. Real friends tend to interact more, and frequent communication between users may imply that they have more in common in terms of user characteristics. In addition, useful information can be transmitted between real friends because they are acquainted offline and therefore share more topics, such as job, school, hobbies and hometown. Furthermore, a given user has many kinds of real friends in the microblog network, such as classmates and colleagues. The relationship lists in microblog cannot show this information either. Because messages between acquaintances have relatively higher credibility and diffuse more rapidly than messages between other users, identifying these potential real relationships will benefit both research and commercial work on social networks. Much credible information and user behavior originates from these relationships. Based on social circles that consist of close friends, several areas can be studied more effectively, such as collaborative filtering, advertisement recommendation, user behavior and information credibility in microblog. This paper focuses on a method for mining the real friends and the social circles of a given user on a microblog website. Our method mines users’ real social circles by analyzing the structure of their relationship network. Experimental results show that we can find several social circles for a user, each of which reveals a specific kind of social relationship.

In general, this is a community detection problem, on which there is much previous work. However, most of that work detects communities within the overall network, without a centric user. Such work reveals only part of the social characteristics and makes it difficult to reveal individual behavior within social networks. Our work identifies a specific user’s social circles by finding several maximum complete graphs that include this user in the relationship network. However, identifying this kind of graph is an NP-complete problem. Moreover, a social circle is not always a maximum complete graph in microblog, since not all users in the circle follow each other. Our method therefore uses an approximate maximum complete graph instead.

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4799-2/12 $26.00 © 2012 IEEE

DOI 10.1109/ASONAM.2012.64

The rest of this paper is organized as follows: in the

next section, we describe related work. Then we introduce

our methodology of mining social circles. To evaluate the

performance of our method, we develop an application for

finding out users’ social circles in Sina Weibo as presented

in section IV. We also compare our method with K-means

clustering in section V. Finally, we conclude our work and

point out avenues for future research.

II. RELATED WORK

In 1969, the social circle was defined as a critical dimension of social structure. Behavior in a social circle has five factors: information interaction, safety, privacy, neighboring preferences and localism[3][4]. The social circle is closely related to the term community, which has two interpretations: one is the geographical notion of community and the other is relational. The latter is concerned with the quality of character of human relationships, without reference to location[5]. The online social circle carries the second meaning and is not concerned with physical location.

Bernardo et al. argued that it is necessary to mine users’ real friends[2]. In that work, if one user mentions (@) another user at least twice, the two are considered likely to be friends. However, this definition only analyzes the content of posts. The @ sign is rather rare in posts, and most occurrences are in reposts rather than in communication with friends. The structure of the relationship network is therefore more suitable for analyzing users’ social circles. Girvan and Newman detected communities by removing edges with maximum betweenness[6], and Newman detected clusters by computing the modularity of communities[7]. These community detection methods operate on the whole network and are not suitable for community detection around a centric node. Fang and Huberman approached community detection using the notion of voltage drops across the network[8]. Their method can identify the community of a given node, but only a single community. Huiqi and Ram mined personal communities from call detail records[9]. They find a community around a given user, and the distance between the user and each of the user’s friends is reflected in the network structure. Although these two methods address personal community detection, they cannot mine more than one community for a given node. A given user has several social circles in microblog, and some friends may belong to more than one of them. In general, a subgroup of a social network has some key players[10]. In our work, the given user may not be a core player in a social circle, but this user is the connection between several social circles. The algorithm in [11] computes overlapping k-cliques in a network; for different k, the nodes of the complete sub-graph in a social circle are different. Qu and Liu[12] proposed a semi-supervised method to detect social circles in Twitter, which uses both network and text information. They crawl users’ manually constructed groups using the Twitter API as the seeds of their mining method. However, this method can only detect one group at a time.

III. METHODOLOGY

In this section, we introduce our method for identifying a given user’s social circles in microblog by mining the overlaps among the bilateral followings of this user’s bilateral followers. This is a process of mining approximate complete clusters.

Figure 1. A user’s four social circles. One has four members and three have five.

Because our microblog platform is Sina Weibo, we first collect users’ bilateral followings through the Sina API. We regard users who follow each other as potential real friends. In social science, the more common friends two people have, the closer their relation tends to be. In a directed network, a bilateral relation means that two vertices each have an edge to the other. Assuming that everyone within a social circle is acquainted with the others, finding a user’s real social circles in microblog is equivalent to mining several distinct maximum complete clusters of this user-centered network. In general, a user has many social circles, and every circle includes this user. A friend in one of a user’s social circles may also belong to the user’s other social circles (as shown in Fig. 1). However, microblog is a virtual environment. It does not guarantee that all these people are bilateral followings of one another, even if they are close friends in a real social circle. If we mined strictly complete clusters, some real members would be excluded from their circles. We therefore mine approximate complete clusters to avoid omitting real friends from the user’s social circles. Regarding the social network as a directed graph, we mine several approximate complete clusters around a centric vertex.
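As a minimal sketch of this first step, the bilateral followings of a user can be viewed as the intersection of the user's followee and follower sets. The function name and the toy account names below are hypothetical, and the sketch assumes both lists have already been fetched; it does not call the Sina API.

```python
def bilateral_followings(followees, followers):
    """Return the accounts that the user follows and that follow the user back."""
    return set(followees) & set(followers)


# Hypothetical toy data: "star" is followed by the user but does not follow back,
# and "carol" follows the user without being followed.
followees = {"alice", "bob", "star"}
followers = {"alice", "bob", "carol"}

mutual = bilateral_followings(followees, followers)
print(sorted(mutual))  # ['alice', 'bob']
```

Only these mutual accounts are treated as candidate real friends in the rest of the method.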


Definition I: Every user in the graph is a vertex. The given vertex is the center and represents the candidate user whose social circles will be mined.

Definition II: A neighbor of a vertex represents a bilateral friend of this vertex.

Definition III: For every vertex n, the number and the name list of its neighbors are denoted N_NEIGHBOR(n) and V_NEIGHBOR(n) respectively, representing the bilateral followings of n in microblog.

Definition IV: $S_{a,b}$ is the set of common neighbors of vertex a and vertex b, as defined in (1). The similarity between a and b is $|S_{a,b}|$, the number of vertices in $S_{a,b}$. This value is the weight between a and b.

\[ S_{a,b} = \mathrm{V\_NEIGHBOR}(a) \cap \mathrm{V\_NEIGHBOR}(b) \tag{1} \]

Definition V: $D_{clu,node}$ is the normalized distance between a vertex node and a cluster clu, as defined in (2). For every vertex i in a cluster clu of the centric node c, the algorithm considers the fraction of the members of $S_{node,c}$ (the common neighbors of node and c) that are also in $S_{node,i}$, that is, the common neighbors of node, i and c. $D_{clu,node}$ is the average of this fraction over all vertices i in clu. When judging whether a friend node belongs to a certain social circle clu, this value measures the familiarity between node and the existing members of clu. In (2), num is the number of existing members of clu.

\[ D_{clu,node} = \frac{\sum_{i \in clu} |S_{node,i} \cap S_{node,c}|}{num \times |S_{node,c}|} \tag{2} \]
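The two quantities defined in (1) and (2) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the graph is assumed to be a dictionary mapping each user to the set of that user's bilateral followings, the function names are hypothetical, and returning 0.0 when $S_{node,c}$ is empty is an assumption, since the paper does not define that case.

```python
def common_neighbors(graph, a, b):
    """S_{a,b} from Eq. (1): the common bilateral neighbors of a and b."""
    return graph[a] & graph[b]


def cluster_distance(graph, center, cluster, node):
    """D_{clu,node} from Eq. (2) for a candidate node and an existing cluster."""
    s_node_c = common_neighbors(graph, node, center)
    if not cluster or not s_node_c:
        return 0.0  # assumption: the source does not define this edge case
    overlap = sum(len(common_neighbors(graph, node, i) & s_node_c) for i in cluster)
    return overlap / (len(cluster) * len(s_node_c))


# Hypothetical toy graph: c is the center; a, b and d all know each other; e knows only c.
graph = {
    "c": {"a", "b", "d", "e"},
    "a": {"c", "b", "d"},
    "b": {"c", "a", "d"},
    "d": {"c", "a", "b"},
    "e": {"c"},
}

print(sorted(common_neighbors(graph, "a", "c")))  # ['b', 'd']
print(cluster_distance(graph, "c", ["a"], "b"))   # 0.5
print(cluster_distance(graph, "c", ["a"], "e"))   # 0.0
```

In the toy graph, b shares two neighbors with the center (a and d), and one of them (d) is also a neighbor of the cluster member a, which gives 1 / (1 × 2) = 0.5.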

First, vertices that are closer to each other are more likely to form clusters. Using the Sina Weibo API, we obtain N_NEIGHBOR(center) and V_NEIGHBOR(center), that is, the number and the list of neighbors of the given vertex.

After obtaining these vertices, we compute $S_{center,v}$ for every neighbor v of center. In this process, the number of API calls is equal to the number of friends. We record every $S_{center,v}$ and $|S_{center,v}|$, and assign $|S_{center,v}|$ as the weight between center and v. We then rank the given user’s friends by weight in descending order. Treating every user as a vertex in the graph, the vertices that share more neighbors with the given vertex are placed nearer the front of the ranking. If two users have more common friends, they are more likely to belong to the same social circle. Some friends rank ahead of other members of the same cluster because they have more connections with others in the social circle; they form the strong ties within the cluster. However, friends in adjacent positions are not always in the same group, since a user may have more than one social circle and the circles may be of similar size. Members of the same social circle may therefore be non-contiguous in the ranking.
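The ranking step can be sketched as follows, again assuming a dictionary that maps each user to a set of bilateral followings. The function name is hypothetical, and breaking ties alphabetically is an assumption made here only to keep the order deterministic; the paper does not say how ties are ordered.

```python
def rank_neighbors(graph, center):
    """Rank the center's neighbors by |S_{center,v}|, largest weight first."""
    weights = {v: len(graph[center] & graph[v]) for v in graph[center]}
    # Assumption: ties are broken by name so that the ranking is reproducible.
    ranking = sorted(weights, key=lambda v: (-weights[v], v))
    return ranking, weights


# Hypothetical toy graph: a and b share friends with the center, e does not.
graph = {
    "c": {"a", "b", "d", "e"},
    "a": {"c", "b", "d"},
    "b": {"c", "a"},
    "d": {"c", "a"},
    "e": {"c"},
}

ranking, weights = rank_neighbors(graph, "c")
print(ranking)       # ['a', 'b', 'd', 'e']
print(weights["a"])  # 2
```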

Since all neighbors are ranked by weight, we start from the nodes at the top of the ranking to build the social circles. These nodes share the most common neighbors, that is, common friends, with the centric node. At the beginning of the mining process, the vertex at the top of the ranking forms the first cluster. For each of the remaining nodes, we compute $D_{clu,node}$ for all existing clusters. When $D_{clu,node}$ is above a threshold θ, the vertex is added to the cluster. We tested the algorithm with θ ranging from 0.2 to 0.8 on a small-scale data set, and the best threshold was 0.4. We therefore set θ to 0.4 in the application described in the next section. The experiment at the end of that section confirms that this is the best setting among the values tested.

A node may be a member of one cluster or of several clusters. If it does not belong to any existing cluster, it is used to build a new cluster as its first member. After all neighbors of the centric node have been processed, several clusters have been built and every vertex is assigned to one or more clusters.

Algorithm: MINING CLUSTERS OF A VERTEX
Procedure FindCompleteClusters(center)
  center: the given vertex.
  V_NEIGHBOR(center): the neighbor list of center.
  N_NEIGHBOR(center): the weights between neighbors and center.
  G(center) = (V, E): V are the vertices in V_NEIGHBOR(center), E are the weights between center and V_NEIGHBOR(center).
  R(center): the set of neighbors of center ranked by descending weight.
  CLU(center): the set of approximate maximum clusters including center.

  for all r_i ∈ R(center) do
    if i = 1 then
      Add BuildCluster(r_i) to CLU(center)
    else
      for all clu_j ∈ CLU(center) do
        if D_{clu_j, r_i} > θ then
          Add r_i to clu_j
        end if
      end for
    end if
    if r_i is not in any cluster of CLU(center) then
      Add BuildCluster(r_i) to CLU(center)
    end if
  end for
  CLU(center) is the output set of clusters.
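The procedure above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' code: the graph is assumed to be a dictionary of bilateral-following sets, the function names are hypothetical, ties in the ranking are broken alphabetically, a node with no common neighbors with the center is given distance 0.0, and the center itself is left implicit rather than stored in each circle.

```python
def distance(graph, center, cluster, node):
    """D_{clu,node}: average share of node's common friends with the center
    that are also common friends with each existing cluster member."""
    s_node_c = graph[node] & graph[center]
    if not s_node_c:
        return 0.0  # assumption: undefined in the source
    overlap = sum(len(graph[node] & graph[i] & s_node_c) for i in cluster)
    return overlap / (len(cluster) * len(s_node_c))


def find_circles(graph, center, theta=0.4):
    """Greedy construction of approximate complete clusters around center."""
    neighbors = graph[center]
    weight = {v: len(neighbors & graph[v]) for v in neighbors}
    ranked = sorted(neighbors, key=lambda v: (-weight[v], v))
    circles = []
    for node in ranked:
        joined = False
        for circle in circles:
            # A node may join several circles, so every circle is checked.
            if distance(graph, center, circle, node) > theta:
                circle.append(node)
                joined = True
        if not joined:
            circles.append([node])  # the node starts a new circle
    return circles


# Hypothetical toy network: two groups of three mutual friends and one loner.
graph = {
    "c": {"a", "b", "d", "x", "y", "z", "e"},
    "a": {"c", "b", "d"}, "b": {"c", "a", "d"}, "d": {"c", "a", "b"},
    "x": {"c", "y", "z"}, "y": {"c", "x", "z"}, "z": {"c", "x", "y"},
    "e": {"c"},
}

print(find_circles(graph, "c"))
# [['a', 'b', 'd'], ['x', 'y', 'z'], ['e']]
```

With θ = 0.4, each member of a fully connected group of three reaches a distance of 0.5 to its own group and 0 to the other, so the two groups are separated, and the friend with no shared neighbors ends up in a singleton circle, as the pseudocode prescribes.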

IV. EXPERIMENT:A WEIBO APPLICATION

To validate our method, we developed an application called Weibo Group Picture¹ on Sina Weibo. It is an implementation of our social circle mining algorithm that identifies the potential social circles and real friends of a given user in microblog.

In this application, we analyze a user’s relationship network and mine the user’s social circles. Users can adjust the suggested circles to match their real social relationships. Once a user finishes adjusting, we obtain the modified data and can compare the real result with the social circles suggested by our algorithm.

After a user logs in to the application, all of the user’s social circles are displayed in the interface (as shown in Fig. 2). Information about the centric user is displayed in the top left corner (①). Each bunch of balloons represents a social circle, whose members have closer relationships with each other (②). The user can name a social circle, for example classmates or hometown friends, in the text area below the bunch (③).

¹ Weibo Group Picture: http://jitizhao.sinaapp.com/

Figure 2. Display of social circles.

A social circle may contain inappropriate members, since somebody may belong to another social circle or to no social circle at all. To let users adjust the resulting social circles, we provide a modification interface (as shown in Fig. 3). The user can switch between social circles by clicking the circles at the bottom (④); the last symbol contains the friends who do not belong to any group (⑤). Users can also add or delete members and drag their avatars (⑥) to the correct groups (④). The system collects the final results after users publish their friend circles as posts.

Figure 3. The interface of members modification.

Up to March 26th, 2012, we collected 595 social circles, of which 113 were adjusted by users. A user may adjust one or more social circles. The remaining social circles were not modified, so their nominal accuracy is 100%. Some of them may be exactly correct. However, it is not clear whether these circles are correct or whether users simply did not want to modify them, so they are excluded from the test set. In addition, the accuracy of 16 social circles is 0.0%, which means that users removed all members of the circle and built a new one. Our system cannot predict such user behavior; users may have adjusted social circles just for fun. Because a variety of reasons can lead to these results and the algorithm’s performance can hardly be estimated from them, we also exclude them from the test set. This leaves 113 − 16 = 97 effectively adjusted social circles (Table I).

Table I. DATA SET

Posting Social Circles | Adjusting Social Circles | Effective Adjusting Social Circles
595                    | 113                      | 97

We use two measures to evaluate the results: F value and MAP (Mean Average Precision). Let $SC_{ori}$ denote the members of the original social circle suggested by the algorithm, and $SC_{adj}$ the members of the adjusted social circle. Precision and recall are computed by (3) and (4) respectively. N is the number of correct friends in $SC_{ori}$, and $POS_i$ is the position of the i-th correct friend. AP (Average Precision) is given by (5), and MAP is computed by (6).

\[ Precision = \frac{|SC_{ori} \cap SC_{adj}|}{|SC_{ori}|} \tag{3} \]

\[ Recall = \frac{|SC_{ori} \cap SC_{adj}|}{|SC_{adj}|} \tag{4} \]

\[ AP = \frac{\sum_{i=1}^{N} \frac{i}{POS_i}}{|SC_{ori}|} \tag{5} \]

\[ MAP = \frac{\sum AP}{\text{Total Number of Social Circles}} \tag{6} \]
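A small sketch of these measures follows. The function names and the toy circle are hypothetical. The paper does not spell out the F value, so the sketch assumes the balanced F1 score, the harmonic mean of precision and recall; AP follows (5) as written, dividing by $|SC_{ori}|$.

```python
def precision_recall_f(sc_ori, sc_adj):
    """Precision (3), recall (4) and an assumed balanced F value."""
    ori, adj = set(sc_ori), set(sc_adj)
    hits = len(ori & adj)
    precision = hits / len(ori) if ori else 0.0
    recall = hits / len(adj) if adj else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0  # assumption: F1
    return precision, recall, f_value


def average_precision(sc_ori_ranked, sc_adj):
    """AP (5): sum of i / POS_i over correct friends, divided by |SC_ori|."""
    adj = set(sc_adj)
    correct, total = 0, 0.0
    for position, friend in enumerate(sc_ori_ranked, start=1):
        if friend in adj:
            correct += 1
            total += correct / position
    return total / len(sc_ori_ranked) if sc_ori_ranked else 0.0


def mean_average_precision(circle_pairs):
    """MAP (6): mean of AP over (suggested, adjusted) circle pairs."""
    aps = [average_precision(ori, adj) for ori, adj in circle_pairs]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical circle: the algorithm suggested a, b, x, d; the user kept a, b, d and added e.
suggested = ["a", "b", "x", "d"]
adjusted = {"a", "b", "d", "e"}

print(precision_recall_f(suggested, adjusted))  # (0.75, 0.75, 0.75)
print(average_precision(suggested, adjusted))   # 0.6875
```

In this example the correct friends sit at positions 1, 2 and 4, so AP = (1/1 + 2/2 + 3/4) / 4 = 0.6875.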

After obtaining the correct results for users’ social circles, we compute the F value and MAP value of our method, which are 74.75% and 66.68% respectively. Using the correct results as the reference, we then vary the parameter θ from 0.2 to 0.8 and run the algorithm again. The experiment shows that our algorithm reaches its best performance when θ is set to 0.4 (as shown in Fig. 4).

Figure 4. Algorithm performance at different parameter values.

V. BASELINE METHOD

Traditionally, clustering is a common method of community detection. As a baseline method, we use K-means clustering to detect social circles in microblog.


A user is represented by a vector that consists of a set of key/value pairs (7). ID is the set of the user’s bilateral friends. Given a centric user, we first build a vector for each of the user’s bilateral friends. K-means clustering is then used to group these vectors into several clusters. If the vectors of two users are $V_1$ and $V_2$, the distance between them is measured using the cosine similarity of $V_1$ and $V_2$ (8).

\[ Vector_{user} = \{ id_1 = 1,\ id_2 = 2,\ \ldots,\ id_n = 1 \} \quad (id_x \in ID) \tag{7} \]

\[ Sim(V_1, V_2) = \frac{V_1 \cdot V_2}{|V_1|\,|V_2|} \tag{8} \]
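The cosine similarity in (8) can be sketched for sparse key/value vectors as below. Representing each user as a dictionary from friend id to a numeric value is an assumption about the data layout, the function name and ids are hypothetical, and returning 0.0 for an empty vector is a convention chosen here, not something the paper specifies.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine similarity of two sparse vectors stored as {friend_id: value}."""
    dot = sum(value * v2.get(key, 0) for key, value in v1.items())
    norm1 = math.sqrt(sum(value * value for value in v1.values()))
    norm2 = math.sqrt(sum(value * value for value in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: similarity with an empty vector is 0
    return dot / (norm1 * norm2)


# Hypothetical vectors: the two users share one bilateral friend, id2.
user_1 = {"id1": 1, "id2": 1}
user_2 = {"id2": 1, "id3": 1}

print(round(cosine_similarity(user_1, user_2), 3))  # 0.5
```

Only keys present in both dictionaries contribute to the dot product, so two users with no shared bilateral friends have similarity 0.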

To select a reasonable value of k, our method first detects the centric user’s social circles, and k is set to the number of circles it detects. When mining a centric user’s social circles, we select k of the user’s friends at random as initial centers and run K-means until it converges. For the 97 adjusted social circles described in the previous section, we detect the social circles by K-means clustering. The results show that our method is superior to K-means (as shown in Fig. 5). Specifically, our method outperforms K-means by an absolute 14% in terms of F value and MAP.

Figure 5. Comparison with K-means.

VI. CONCLUSION

In this paper, we propose an algorithm for mining social circles in microblog. Unlike previous studies, the method solves the problem of mining several social circles for a specific user. A person has many kinds of relationship networks in real life. These distinct subgroups have diverse meanings and characteristics. Distinguishing these social circles will provide valuable information for both business and research.

Within a social circle, the members have closer relationships and are acquainted with each other. Followees and followers represent user relationships in microblog, and both are one-way relationships. Based on the nature of social circles, we mine the subgroups using bilateral followers. Given a centric user, the algorithm first finds the user’s bilateral friends. According to the overlap of bilateral followers between these friends, we find several approximate maximum clusters around the central user. These are the user’s social circles in microblog.

To verify the accuracy of the mining results, we developed a Sina Weibo application that finds users’ social circles and displays the results to them. Users can adjust the results by correcting wrongly assigned members. We demonstrate that our method achieves an absolute 14% improvement in F-measure and MAP over K-means clustering.

In the future, we will use more textual information in our method to further improve the performance of social circle mining. Moreover, as the social network is a reflection of real society, we will study social relationship phenomena using online social circles.

ACKNOWLEDGMENT

This work is supported by the Natural Science Foundation

of China (61073126, 61133012).

REFERENCES

[1] Haewoon. K, Changhyun. L, Hosung. P, and Sue. M, What is Twitter, a Social Network or a News Media? In Proc. ACM Int. Conf. on WWW (WWW’10), pp. 591-600, 2010.

[2] Bernardo. A, Daniel. M, and Fang. W, Social networks that matter: Twitter under the microscope, First Monday, vol. 14, no. 1, 2009.

[3] Tropman. J. E, Critical dimensions of community structure: A reexamination of the Hadden-Borgotta findings, Urban Affairs Quarterly, vol. 5, pp. 215-232, 1963.

[4] Doolittle. R. J, and MacDonald. D. K. Elissa, Communication and a sense of community in a metropolitan neighborhood: A factor analytic examination, Communication Quarterly, vol. 26, pp. 2-7, 1987.

[5] Gusfield. J. R, The community: A critical response, Harpercollins, 1978.

[6] Girvan. M, Newman. M, Community structure in social and biological network, Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821-7826, 2002.

[7] Newman. M, Fast algorithm for detecting community structure in networks, Physical Review E, vol. 69, 2004.

[8] Fang. W, Bernardo. A, Finding communities in linear time: a physics approach, The European Physical Journal B, vol. 38, no. 2, pp. 331-338, 2004.

[9] Huiqi. Z, Ram. D, Discovery of Social Groups Using Call Detail Records, Lecture Notes in Computer Science, vol. 5333/2008, pp. 489-498, 2008.

[10] Conghuan. Y, Dense Subgroup Identifying in Social Network, 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM’11), pp. 555-556, 2011.

[11] Palla. G, Derényi. I, Farkas. I, and Vicsek. T, Uncovering the overlapping community structure of complex networks in nature and society, Nature, vol. 435, pp. 814-818, 2005.

[12] Zhonghua. Qu, Yang. Liu, Interactive Group Suggesting for Twitter, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL’11), pp. 519-523, 2011.
