Mining User's Real Social Circle in Microblog. 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), Istanbul, 26-29 August 2012


<ul><li><p>Mining User's Real Social Circle in Microblog</p><p>Hailong Qin, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China. E-mail: hlqin@ir.hit.edu.cn</p><p>Ting Liu, Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China. E-mail: liuting@ir.hit.edu.cn</p><p>Yanjun Ma, Baidu, Beijing, China. E-mail: yma@baidu.com</p><p>Abstract: As a media and communication platform, microblogging is increasingly popular around the world. Users can follow anyone, from well-known individuals to real friends, and read their tweets without their permission. Most users follow a large number of celebrities and public media accounts; however, these celebrities do not necessarily follow their fans back. Such one-way relationships abound in the user network and appear in the form of a user's followees and followers, which makes it difficult to identify the user's real friends within the merged list of followees and followers. The aim of this paper is to propose a general algorithm for mining users' real friends in social media and automatically dividing them into different social circles according to the closeness of their relationships. To verify the effectiveness of the proposed algorithm, we build a microblog application that presents the social circles identified by the algorithm and enables users to modify the proposed results according to their real social circles. We demonstrate that our algorithm is superior to a traditional clustering method in terms of F-measure and Mean Average Precision.</p><p>Keywords: social circle; community detection; social network analysis</p><p>I. INTRODUCTION</p><p>Social media has become one of the most popular online communication tools. 
With the popularity of microblog sites on the Internet, such as Twitter and Sina Weibo, millions of people around the world have become users of microblog services.</p><p>A microblog is a network service that functions both as a social network and as a news medium. Microblog users can follow others or be followed. When a user follows another user, a one-way relation is built without any confirmation. If two users follow each other, they are bilateral followers. A user's followees and followers are displayed as two separate lists. Users can follow many celebrities whom they admire, as well as public media such as newspapers and websites. As a result, well-known individuals and organizations normally have a large number of followers, from several thousand to millions or even more.</p><p>In the relation networks of well-known people, one-way relations abound simply because their followers often outnumber their followees[1]. Most bilateral relations are built among acquaintances. The social aspect of microblogging is mainly reflected in the interaction between users and their real friends, such as classmates, neighbors, and members of the same organizations. However, all followees and followers are mixed together in the lists, whether they are friends, family members, or superstars the user admires, which makes the lists difficult to organize as they grow over time[2].</p><p>In fact, real friends make up only a small part of a user's relationships in microblog[2]. Real friends tend to be more interactive, and frequent communication between users may imply that they have more in common in terms of user characteristics. 
In addition, useful information can be transmitted between real friends: because they are acquainted offline, they share more common topics, such as their jobs, schools, hobbies, and hometowns. Furthermore, a given user has many kinds of real friends in the microblog network, such as classmates and colleagues, and the relationship lists in microblog do not show this information either. Messages between acquaintances have relatively higher credibility and diffuse more rapidly than messages between other users. Identifying these latent real relationships will therefore benefit both research and commercial work on social networks. Much credible information and user behavior originates from these relationships. Social circles consist of close friends, and building on them we can study several areas more effectively, such as collaborative filtering, advertisement recommendation, user behavior, and information credibility in microblogs.</p><p>This paper focuses on a method for mining the real friends and the social circles of a given user on a microblog website. Our method mines users' real social circles by analyzing the structure of their relationship network. Experimental results show that we can find several social circles of a user, each of which reveals a specific kind of social relationship of that user.</p><p>In general, this is a community detection problem, and there is already much previous work on it. However, most work detects communities within the overall network without a centric user. This kind of work can reveal only partial social characteristics, and it has difficulty revealing individual behavior within social networks. 
Our work identifies a specific user's social circles by finding several maximum complete graphs that include this user in the relationship network. However, identifying such graphs is an NP-complete problem. Moreover, in microblog a social circle is not always a maximum complete graph, since not every pair of users in the circle follows each other. Our method therefore uses an approximate maximum complete graph instead.</p><p>2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</p><p>978-0-7695-4799-2/12 $26.00 © 2012 IEEE DOI 10.1109/ASONAM.2012.64</p><p>348</p></li><li><p>The rest of this paper is organized as follows. In the next section, we describe related work. Then we introduce our methodology for mining social circles. To evaluate the performance of our method, we develop an application for finding users' social circles in Sina Weibo, presented in Section IV. We also compare our method with K-means clustering in Section V. Finally, we conclude our work and point out avenues for future research.</p><p>II. RELATED WORK</p><p>In 1969, the social circle was defined as a critical dimension of social structure. Behavior in a social circle has five factors: information interaction, safety, privacy, neighboring preferences, and localism[3][4]. The social circle is closely related to the term community, which has two interpretations: one is the geographical notion of community and the other is relational. The second interpretation is concerned with the quality or character of human relationships, without reference to location[5]. The online social circle carries the second meaning and does not concern physical location.</p><p>Bernardo proposed that it is necessary to mine users' real friends[2]. 
In that paper, if one user mentions (@) another user at least twice, the two are considered likely to be friends. However, this definition analyzes only post content. The @ sign is rather rare in posts, and most occurrences appear in reposts rather than in communication with friends. The structure of the relationship network is clearly more suitable for analyzing users' social circles. Girvan and Newman detected communities by removing the edges with maximum betweenness[6], and Newman detected clusters by computing the modularity of communities[7]. These community detection methods operate on the whole network and are not suitable for community detection around a centric node. Fang and Huberman addressed community detection using the notion of voltage drops across the network[8]. Their method can identify the community of a given node, but it can mine only a single community for that node. Huiqi and Ram mined personal communities from call detail records[9]. They find a community around a given user, and the distance between the user and each of the user's friends can be displayed in the network structure. Although these two methods address personal community detection, neither can mine more than one community around a given node.</p><p>A given user has several social circles in microblog, and some friends may belong to more than one of them. In general, a subgroup of a social network has some key players[10]. In our work, the given user may not be a core player in a social circle, but this user is the connection between several social circles. In [11], the algorithm computes overlapping k-cliques in a network; for different values of k, the nodes of the complete subgraph in a social circle differ. Qu and Liu[12] proposed a semi-supervised method to detect social circles in Twitter that uses network and text information respectively. They crawl users' manually constructed groups through the Twitter API as seeds for their mining method. However, this method can detect only one group at a time.</p><p>III. 
METHODOLOGY</p><p>In this section, we introduce our method for identifying a given user's social circles in microblog by mining the overlaps of bilateral followings among this user's bilateral followers. This is a process of mining approximate complete clusters.</p><p>Figure 1. A user's four social circles. One has four members and three have five.</p><p>Since our microblog platform is Sina Weibo, we first collect users' bilateral followings through the Sina API. We regard those who follow each other as potential real friends. In social science, two people who share more common friends tend to have a closer relationship. In a directed network, a pair of vertices is bilateral if each has an edge to the other. Assuming that everyone within a social circle knows every other member, finding a user's real social circles in microblog is equivalent to mining several distinct maximum complete clusters in this user-centered network. In general, a user has many social circles, and all of them include this user. A friend in one of the user's social circles may also belong to the user's other social circles (as shown in Fig. 1). However, microblog is a virtual environment: it does not guarantee that all members of a social circle are bilateral followings, even if they are close friends in reality. If we mined strictly complete clusters, some real members would be excluded from their circles. We therefore mine approximate complete clusters to avoid omitting real friends from the user's social circles. Regarding the social network as a directed graph, we mine several approximate complete clusters around a centric vertex.</p></li><li><p>Definition I: Every user in the graph is a vertex. 
The given vertex is the center and represents the candidate user whose social circles are to be mined.</p><p>Definition II: A neighbor of a vertex represents a bilateral friend of this vertex.</p><p>Definition III: For every vertex n, the number and the name list of its neighbors are denoted N_NEIGHBOR(n) and V_NEIGHBOR(n), respectively, representing the bilateral followings of n in microblog.</p><p>Definition IV: The similarity between vertex a and vertex b is \(|S_{a,b}|\), where \(S_{a,b}\) is the set of common neighbors of a and b, as defined in (1). The number of vertices in \(S_{a,b}\), \(|S_{a,b}|\), is the weight between vertices a and b.</p><p>$$S_{a,b} = \mathrm{V\_NEIGHBOR}(a) \cap \mathrm{V\_NEIGHBOR}(b) \tag{1}$$</p><p>Definition V: \(D_{clu,node}\) is the normalized distance between a vertex node and a cluster clu, as defined in (2); despite its name, larger values indicate a closer relationship. For every vertex i in a cluster clu of the centric node c, the algorithm considers the fraction of \(S_{node,c}\) (the common neighbors of node and c) that also belongs to \(S_{node,i}\). The intersection \(S_{node,i} \cap S_{node,c}\) is the set of common neighbors of node, i, and c. \(D_{clu,node}\) is the average of this fraction over all members i of clu. When judging whether a friend node belongs to a given social circle clu, this value measures the familiarity between node and the existing members of clu. In (2), num is the number of existing members in clu.</p><p>$$D_{clu,node} = \frac{1}{num} \sum_{i \in clu} \frac{|S_{node,i} \cap S_{node,c}|}{|S_{node,c}|} \tag{2}$$</p><p>First, closer vertices are more likely to form clusters. Using the Sina Weibo API, we obtain N_NEIGHBOR(center) and V_NEIGHBOR(center), that is, the number and the list of neighbors of the given vertex. After obtaining these vertices, we compute \(S_{center,v}\) for every neighbor v of center. In this process, the number of API calls equals the number of friends. We record every \(S_{center,v}\) and \(|S_{center,v}|\), and assign \(|S_{center,v}|\) as the weight between center and v. According to these weights, we rank the given user's friends in descending order. 
When every user is regarded as a vertex in the graph, vertices that share more neighbors with the given vertex rank higher. In other words, two users who have more common friends are more likely to belong to the same social circle. Some friends rank higher than other members of their cluster because they have more connections with others in the social circle; these friends form the strong ties within a cluster. However, friends in adjacent positions are not always in the same group. A user may have more than one social circle, and these circles may be of similar size, so members of the same social circle may be scattered across the ranking.</p><p>Since all neighbors are ranked by weight, we take the nodes at the top of the ranking to build the social circles. These nodes share more common neighbors with the centric node, that is, more common friends. At the beginning of the mining process, the top-ranked vertex builds the first cluster. For each remaining node, we compute \(D_{clu,node}\) for all existing clusters. When \(D_{clu,node}\) is above a threshold, the vertex is added to that cluster. We tested the algorithm's performance with the threshold ranging from 0.2 to 0.8 on a small-scale data set, and the optimum threshold was 0.4. We therefore set the threshold to 0.4 in our application, described in the next section. The experiments at the end of this paper show that 0.4 is the best setting.</p><p>A node may be a member of one cluster or of several clusters. If it does not belong to any cluster, the vertex is used to build a new cluster and becomes its first member. 
After all neighbors of the centric node have been processed, several clusters have been built and every vertex has been assigned to one or more clusters.</p><pre>
Algorithm: MINING CLUSTERS OF A VERTEX
Procedure FindCompleteClusters(center)
  center:             the given vertex
  V_NEIGHBOR(center): the list of neighbors of center
  N_NEIGHBOR(center): the weights between the neighbors and center
  G(center) = (V, E): V are the vertices in V_NEIGHBOR(center); E are the
                      weights between center and V_NEIGHBOR(center)
  R(center):          the neighbors of center ranked by descending weight
  CLU(center):        the set of approximate maximum clusters including center

  for all r_i in R(center) do
      if i = 1 then
          add BuildCluster(r_i) to CLU(center)
      else
          for all clu_j in CLU(center) do
              if D_{clu_j, r_i} &gt; threshold then
                  add r_i to clu_j
              end if
          end for
      end if
      if r_i is not in any cluster of CLU(center) then
          add BuildCluster(r_i) to CLU(center)
      end if
  end for
  return CLU(center)
</pre><p>IV. EXPERIMENT: A WEIBO APPLICATION</p><p>To validate our method, we develop an application called Weibo Group Picture...</p></li></ul>
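Section III starts by keeping only bilateral followings: two users are potential real friends only if each follows the other. The Python sketch below shows that step under stated assumptions. It is not the authors' code. The input is assumed to be a list of hypothetical `(follower, followee)` pairs, and the helper name `bilateral_neighbors` is illustrative. The Sina Weibo API calls that the paper uses to collect the data are not modeled.

```python
from collections import defaultdict


def bilateral_neighbors(follow_edges):
    """Build V_NEIGHBOR: map each user to the set of users it follows mutually.

    follow_edges is an iterable of (follower, followee) pairs, i.e. the
    directed "follows" relation of the microblog network (assumed format).
    """
    edges = set(follow_edges)
    neighbors = defaultdict(set)
    for a, b in edges:
        # Keep the pair only if the reverse edge exists (a bilateral relation).
        if a != b and (b, a) in edges:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


# Hypothetical toy data standing in for what the paper fetches via the Sina API.
follows = [
    ("alice", "bob"), ("bob", "alice"),      # mutual: potential real friends
    ("alice", "carol"), ("carol", "alice"),
    ("bob", "carol"), ("carol", "bob"),
    ("alice", "celebrity"),                  # one-way: a fan following a star
]

v_neighbor = bilateral_neighbors(follows)
n_neighbor = {user: len(friends) for user, friends in v_neighbor.items()}  # N_NEIGHBOR

print(sorted(v_neighbor["alice"]))  # ['bob', 'carol']
print(n_neighbor["alice"])          # 2
print("celebrity" in v_neighbor)    # False
```

The one-way edge to `celebrity` is dropped, which matches the paper's point that celebrity followings are not real-friend candidates.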

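Eq. (1) and the weight-based ranking of a user's friends can be sketched as follows. The sketch assumes V_NEIGHBOR is already available as an in-memory dict that maps each user to the set of that user's bilateral friends. The function names `common_neighbors` and `rank_friends` are hypothetical. The alphabetical tie-break is also an assumption, because the paper does not say how friends with equal weight are ordered.

```python
def common_neighbors(v_neighbor, a, b):
    """S_{a,b} = V_NEIGHBOR(a) ∩ V_NEIGHBOR(b), as in Eq. (1)."""
    return v_neighbor.get(a, set()) & v_neighbor.get(b, set())


def rank_friends(v_neighbor, center):
    """Rank center's neighbors v by the weight |S_{center,v}|, descending."""
    weights = {
        v: len(common_neighbors(v_neighbor, center, v))
        for v in v_neighbor.get(center, set())
    }
    # Assumed tie-break: equal weights are ordered by user id for determinism.
    ranked = sorted(weights, key=lambda v: (-weights[v], v))
    return ranked, weights


# Hypothetical user-centered network: a, b, c know each other; d knows only "me".
v_neighbor = {
    "me": {"a", "b", "c", "d"},
    "a": {"me", "b", "c"},
    "b": {"me", "a", "c"},
    "c": {"me", "a", "b"},
    "d": {"me"},
}

ranked, weights = rank_friends(v_neighbor, "me")
print(ranked)                       # ['a', 'b', 'c', 'd']
print(dict(sorted(weights.items())))  # {'a': 2, 'b': 2, 'c': 2, 'd': 0}
```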

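Eq. (2) can be transcribed directly under one reading of Definition V. In this reading, a cluster holds only friends of the centric node c, and c itself is implicit. D is taken to be 0 when node shares no friends with c, a case the paper leaves undefined. Exact fractions keep the comparison with the 0.4 threshold free of rounding noise. All function names are hypothetical.

```python
from fractions import Fraction


def common_neighbors(v_neighbor, a, b):
    """S_{a,b} from Eq. (1)."""
    return v_neighbor.get(a, set()) & v_neighbor.get(b, set())


def d_score(v_neighbor, clu, node, center):
    """D_{clu,node} from Eq. (2); clu lists friends of center (center excluded)."""
    s_node_c = common_neighbors(v_neighbor, node, center)
    if not clu or not s_node_c:
        # Assumption: the paper does not define D when |S_{node,c}| = 0; use 0.
        return Fraction(0)
    total = sum(
        Fraction(len(common_neighbors(v_neighbor, node, i) & s_node_c), len(s_node_c))
        for i in clu
    )
    return total / len(clu)  # average over the num members of clu


# Hypothetical data: a, b, c form one circle around "me"; x, y form another.
v_neighbor = {
    "me": {"a", "b", "c", "x", "y"},
    "a": {"me", "b", "c"},
    "b": {"me", "a", "c"},
    "c": {"me", "a", "b"},
    "x": {"me", "y"},
    "y": {"me", "x"},
}

print(d_score(v_neighbor, ["a"], "b", "me"))                     # 1/2
print(d_score(v_neighbor, ["a"], "b", "me") > Fraction(2, 5))    # True
print(d_score(v_neighbor, ["a", "b", "c"], "x", "me"))           # 0
```

Under this reading, consider a node in a fully connected circle of k friends around c. Each term of the average equals (k-2)/(k-1). A three-member circle therefore scores 0.5 and clears the 0.4 threshold, and larger circles approach 1.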
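The FindCompleteClusters procedure can be sketched as a single greedy pass over the ranked friends. This Python sketch is an illustration, not the authors' implementation, and it rests on several assumptions:

- V_NEIGHBOR data is held in memory.
- BuildCluster(r) starts a new cluster whose first member is r.
- The comparison is the strict test D > 0.4 from the pseudocode.
- Ties in weight are broken alphabetically.

All function names are hypothetical.

```python
from fractions import Fraction


def common_neighbors(v_neighbor, a, b):
    # Eq. (1): S_{a,b} = V_NEIGHBOR(a) ∩ V_NEIGHBOR(b)
    return v_neighbor.get(a, set()) & v_neighbor.get(b, set())


def d_score(v_neighbor, clu, node, center):
    # Eq. (2); clusters hold only friends of center (center itself is implicit).
    s_node_c = common_neighbors(v_neighbor, node, center)
    if not clu or not s_node_c:
        return Fraction(0)  # assumption: no shared friends means no affinity
    total = sum(
        Fraction(len(common_neighbors(v_neighbor, node, i) & s_node_c), len(s_node_c))
        for i in clu
    )
    return total / len(clu)


def find_complete_clusters(v_neighbor, center, threshold=Fraction(2, 5)):
    """Greedy approximate-clique clustering around center (FindCompleteClusters).

    threshold defaults to 0.4, the value the paper reports as optimal.
    """
    friends = v_neighbor.get(center, set())
    weight = {v: len(common_neighbors(v_neighbor, center, v)) for v in friends}
    ranked = sorted(friends, key=lambda v: (-weight[v], v))  # R(center)
    clusters = []  # CLU(center): each cluster is a list of members
    for node in ranked:
        joined = False
        for clu in clusters:
            # A node may join several clusters, so circles can overlap.
            if d_score(v_neighbor, clu, node, center) > threshold:
                clu.append(node)
                joined = True
        if not joined:
            clusters.append([node])  # BuildCluster(node)
    return clusters


# Hypothetical data: two separate circles around "me", plus a loosely tied friend d.
v_neighbor = {
    "me": {"a", "b", "c", "d", "x", "y", "z"},
    "a": {"me", "b", "c"}, "b": {"me", "a", "c"}, "c": {"me", "a", "b"},
    "x": {"me", "y", "z"}, "y": {"me", "x", "z"}, "z": {"me", "x", "y"},
    "d": {"me"},
}

print(find_complete_clusters(v_neighbor, "me"))
# [['a', 'b', 'c'], ['x', 'y', 'z'], ['d']]
```

The first ranked friend always seeds a cluster, because the cluster list is empty at that point, and this covers the pseudocode's `i = 1` branch. A friend with no common neighbors, such as `d` here, ends up in a singleton cluster.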