Understanding User Behavior in Online Social Networks: A Survey



IEEE Communications Magazine • September 2013 • p. 144 • 0163-6804/13/$25.00 © 2013 IEEE

1 http://finance.yahoo.com/news/number-active-users-facebook-over-years-214600186—finance.html

ACCEPTED FROM OPEN CALL

Long Jin, University of California, San Diego
Yang Chen, Duke University
Tianyi Wang, Tsinghua University
Pan Hui, Hong Kong University of Science and Technology/Telekom Innovation Laboratories
Athanasios V. Vasilakos, Kuwait University

ABSTRACT

Currently, online social networks (OSNs) such as Facebook, Twitter, Google+, LinkedIn, and Foursquare have become extremely popular all over the world and play a significant role in people’s daily lives. People access OSNs using both traditional desktop PCs and newly emerging mobile devices. With more than one billion users worldwide, OSNs are a new venue of innovation with many challenging research problems. In this survey, we aim to give a comprehensive review of state-of-the-art research related to user behavior in OSNs from several perspectives. First, we discuss social connectivity and interaction among users. Second, we investigate traffic activity from a network perspective. Third, as mobile devices become a commodity, we pay attention to the characteristics of social behaviors in mobile environments. Finally, we review malicious behaviors of OSN users, and discuss several solutions to detect misbehaving users. Our survey serves the dual roles of providing a systematic exploration of existing research highlights and triggering further potentially significant research on these topics.

INTRODUCTION

In recent years, online social networks (OSNs) have dramatically expanded in popularity around the world. As of October 2012, Facebook had 1.01 billion people using the site each month (see footnote 1). The numbers of users of five popular OSNs are listed in Table 1. The rapid growth of OSNs has attracted a large number of researchers to explore and study this popular, ubiquitous, and large-scale service. In this article, we focus on understanding user behavior in OSNs.

OSN user behavior covers the various social activities that users can perform online, such as friendship creation, content publishing, profile browsing, messaging, and commenting. Notably, these activities can be legitimate or malicious. Understanding OSN user behavior is important to different Internet entities in several respects:
• For Internet service providers (ISPs), OSN traffic is growing quickly and becoming significant, so they want to learn how the traffic patterns of OSNs evolve. This can guide infrastructural actions (e.g., adding traffic optimization in network middle-boxes).
• For OSN service providers, it helps them understand their customers’ attitudes toward different functions, especially experimental ones. Moreover, understanding users’ geographic distribution and traffic activity is vital for infrastructure investment decisions, such as which locations are most cost-effective for building data centers or which content delivery network (CDN) cluster could be leveraged to deliver frequently accessed data.
• For OSN users, behavior study is important to enhance user experience. For example, there are numerous malicious accounts in OSNs, and these accounts generate unwanted messages for legitimate users. Therefore, identifying and blocking malicious users is very important to ensure a good user experience.

Our survey covers four aspects of understanding user behavior in OSNs. First, a social graph is a classic and effective mathematical model for representing the relationships between users in OSNs, and has been widely used in OSN research. Based on four different types of social graphs, we discuss connectivity and interaction. Second, network monitoring records the detailed traffic activity of OSNs and provides a method to understand their network usage. Network-based measurement results can also reveal more user activities than the social graph alone. Therefore, we focus on the perspective of traffic activity. Third, the rapid development of mobile platforms and applications plays an important role in OSN-related applications. Mobile devices not only provide a venue for users to access OSNs everywhere, but also enable mobile-centric functions such as location-based service (LBS) in OSNs. Understanding the behavior of mobile users can be leveraged to enhance the performance of mobile social applications and systems. Thus, we review studies of mobile social behavior. Finally, OSNs introduce new challenges related to security and privacy. Malicious behaviors, such as spam and Sybil attacks, take place in OSNs and bring severe security threats. We therefore review studies of malicious behavior.

CONNECTIVITY AND INTERACTION

MOTIVATION AND CHALLENGES

The social graph is an effective and widely used mathematical tool for representing the relationships among users in OSNs, which benefits the analysis of social interactions and the characterization of user behavior. Depending on the properties of the OSN, social networks can be modeled as undirected graphs (e.g., friendship graph, interaction graph) or directed graphs (e.g., latent graph, following graph). Table 2 lists four different types of social graphs. Based on these graph types, we discuss the connectivity and interaction among OSN users. Moreover, the huge size of social graphs makes analysis difficult, so graph sampling and crawling techniques have been proposed to deal with this problem. In this section, we review several measurement, analysis, and modeling works related to the social graph.

EXISTING SOLUTIONS AND DISCUSSION

Undirected Graph Model

In a friendship graph, every user is denoted as a node, and the friendship between any pair of users is represented by an edge. Wilson et al. [1] investigate whether social links are valid indicators of user interactions. They define wall posts and photo comments as interactions. Based on data crawled from Facebook, they find that users tend to interact mostly with only a small subset of their friends, while often having no interaction with up to half of their friends. Therefore, friendship in OSNs can hardly be viewed as the same as friendship in the real world. Correspondingly, a new interaction graph is proposed to reflect the real user interactions in social networks: an edge exists between two users only if there is visible interaction between them, rather than whenever they are friends. Using two representative applications, spam protection and Sybil protection, they demonstrate that using an interaction graph performs better than using a friendship graph.
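The difference between a friendship graph and an interaction graph can be illustrated with a minimal Python sketch. It is not the procedure used in [1]: the function names, the `min_events` threshold, and the toy data are assumptions made here for illustration only.

```python
def build_interaction_graph(friend_edges, interactions, min_events=1):
    """Keep only friendship edges that carry at least `min_events` visible interactions.

    `min_events` is an assumed knob for this sketch, not a parameter taken from [1].
    """
    friends = {frozenset(edge) for edge in friend_edges}
    counts = {}
    for src, dst in interactions:
        pair = frozenset((src, dst))
        if pair in friends:  # ignore interactions between non-friends
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, n in counts.items() if n >= min_events}


def interacted_fraction(user, friend_edges, interaction_edges):
    """Fraction of a user's friends with whom the user has visibly interacted."""
    friends = {frozenset(edge) for edge in friend_edges if user in edge}
    if not friends:
        return 0.0
    return len(friends & interaction_edges) / len(friends)


# Toy data (hypothetical): "a" has three friends but interacts with only two.
friend_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
interactions = [("a", "b"), ("b", "a"), ("c", "a")]  # e.g., wall posts

interaction_edges = build_interaction_graph(friend_edges, interactions)
fraction_for_a = interacted_fraction("a", friend_edges, interaction_edges)
```

In this toy example the interaction graph keeps two of the four friendship edges, which mirrors the observation that many friendship links carry no visible interaction.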

Directed Graph Model

Latent interactions are passive actions of OSN users (e.g., profile browsing) that cannot be observed by traditional measurement techniques. Jiang et al. [2] study latent interactions based on data crawled from Renren, the largest OSN provider in China. Renren tracks the most recent nine visitors to every user’s profile, making the measurement of latent interactions possible. In a directed latent graph, a directed edge from A to B indicates that A has visited B’s profile. Therefore, the in-degree of a node shows the number of visitors to that user’s profile, while the out-degree reveals the number of profiles that user has visited. A comparison between latent interactions and visible interactions is conducted based on the crawled Renren data, which contains 42 million users and 1.66 billion social links. There are three major findings. First, latent interactions are significantly more prevalent and frequent than visible interactions. Second, latent interactions are non-reciprocal in nature. Third, for very popular users, profile popularity is uncorrelated with the frequency of content updates or the number of friends. The characteristics of latent graphs are shown to fall between those of visible interaction graphs and classical friendship graphs.

Kwak et al. [3] perform extensive measurement on Twitter, the world’s largest microblogging service, and reveal its power in spreading information at the level of news media. In Twitter’s following graph, a directed edge from A to B indicates that A has subscribed to receive B’s latest messages. The data was crawled over 24 days and contains 41.7 million user profiles, 1.47 billion relations, 4262 trending topics, and 106 million tweets. The article introduces a directed graph model to give a basic informative overview of Twitter, studies the distribution of followers and followees, and analyzes how the number of followers or followees affects the number of tweets. Additionally, in order to show how Twitter acts as a social medium and how top users influence other users, the article ranks users by number of followers, PageRank, and retweets. The rankings by number of followers and by PageRank are almost the same, and the top users in the rankings are either celebrities or news media accounts. The article also analyzes the trending topics in Twitter and compares them with other media. It is found that the majority (over 85 percent) of trending topics in Twitter are headline or persistent news in nature, which reveals Twitter’s live broadcasting nature and confirms Twitter’s role as a news medium.

Table 1. Information about five popular OSNs.

OSN site      No. of users
Facebook      1.01 billion (Oct. 2012)
Twitter       500 million (Apr. 2012)
Google+       400 million (Sep. 2012)
LinkedIn      175 million (Jun. 2012)
Foursquare    25 million (Sep. 2012)

Table 2. Four different types of social graph.

Type                 Edge
Friendship graph     Friendship between users
Interaction graph    Visible interaction, such as posting on a wall
Latent graph         Latent interaction, such as browsing a profile
Following graph      Subscription to receive all messages

Graph Sampling

The rapid increase in the number of users makes social graphs ever larger, which presents researchers with a big challenge when performing analysis with limited computation and storage capability. Graph sampling techniques are used to obtain a smaller but representative snapshot of a social graph that preserves properties such as the degree distribution. As shown in [4], the samples produced by Breadth-First Sampling (BFS) and Random Walk (RW) are biased toward high-degree vertices, although both have been widely used in social graph analysis. The Metropolis-Hastings RW (MHRW) and the Re-Weighted RW (RWRW) are proposed and shown to yield approximately uniform samples of Facebook users. The article also introduces online convergence diagnostics to assess sample quality during the sampling process. Frontier Sampling (FS) [5], which leverages multidimensional RW, is proposed to achieve lower estimation errors than RW, especially on disconnected or loosely connected graphs. Ribeiro et al. [5] show that FS is more suitable for estimating the tail of the degree distribution than random vertex sampling. Moreover, FS can be made fully distributed without any coordination costs.
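To make the degree bias concrete, the following Python sketch implements a textbook Metropolis-Hastings random walk on an undirected graph. It is an illustration under stated assumptions rather than the crawler of [4]: the adjacency-dictionary format, the function name, and the toy star graph are all hypothetical. A plain random walk on this star graph would spend about half of its steps on the hub, whereas the MHRW acceptance rule targets a uniform distribution over nodes.

```python
import random
from collections import Counter


def mhrw_sample(adj, start, num_steps, rng=None):
    """Metropolis-Hastings random walk targeting a uniform distribution over nodes.

    adj: dict mapping each node to a list of its neighbors (undirected graph).
    """
    rng = rng or random.Random()
    current = start
    samples = []
    for _ in range(num_steps):
        candidate = rng.choice(adj[current])
        # Moves toward higher-degree nodes are accepted less often,
        # which cancels the degree bias of a plain random walk.
        accept_prob = min(1.0, len(adj[current]) / len(adj[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        samples.append(current)  # a rejected move re-samples the current node
    return samples


# Toy star graph (hypothetical): hub 0 connected to leaves 1..9.
star = {0: list(range(1, 10))}
for leaf in range(1, 10):
    star[leaf] = [0]

samples = mhrw_sample(star, start=1, num_steps=20000, rng=random.Random(0))
hub_fraction = Counter(samples)[0] / len(samples)
```

With ten nodes, a uniform sampler should visit the hub in roughly one tenth of the steps, so `hub_fraction` is a quick sanity check on the walk.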

FUTURE WORK

Dynamics are an important aspect of understanding OSN user behavior in depth. Much of the existing work investigates an OSN in a relatively static way, by collecting or studying a static snapshot dataset. However, OSNs grow extremely rapidly. Every day new users join OSNs, while existing users make new friends or end social connections, join or leave groups, and so on. Taking these dynamics into account can extract more inherent information than studying static data, not only revealing the situation at a certain time but also predicting some future activities. Also, studying different time intervals and time granularities could lead to more interesting findings. There are several challenges in performing dynamic analysis. One is fast data collection and timely processing, where an unbiased and efficient graph sampling algorithm can play an important role. Another is storage: collecting dynamic data increases the volume of information to store, so the temporal and spatial dependence between data items can be exploited for better compression.

TRAFFIC ACTIVITY

MOTIVATION AND CHALLENGES

Different kinds of social graphs can reveal how users connect and interact with each other. However, because a graph can represent only limited information, various types of user activities cannot be characterized (e.g., the time spent browsing a profile). Network operators can monitor such information easily, and can therefore better interpret how users use OSNs. Furthermore, ISPs have a strong incentive to better understand how the traffic pattern between end users and OSN sites will evolve, and to take optimization actions according to the distribution and activities of OSN users. In this section, we review OSN user behavior studies from the perspective of network traffic analysis.

EXISTING SOLUTIONS AND DISCUSSION

Traffic Monitoring

Besides crawling, researchers can also study OSNs by monitoring the corresponding network traffic. Benevenuto et al. [6] analyze the user behavior of OSNs based on detailed clickstream data obtained from a social network aggregator, as illustrated in Fig. 1.

Figure 1. Data collection through a social network aggregator. (Figure labels: 1. User login; 2. Authentication to all sites; 3. OSN activity; Social network aggregator; Data collection.)

In [6], the clickstream data was collected over 12 days and covers the HTTP sessions of 37,024 users who accessed popular social networks. The article defines and analyzes the following OSN session characteristics:
• The frequency of accessing OSNs
• Total time spent on OSNs
• Session duration of OSNs

User activities are also identified from the clickstream data. Forty-one types of user activities are classified into nine groups, and the popularity of different activities and their traffic bytes are analyzed. Interestingly, it is found that silent or latent interactions such as browsing account for more than 90 percent of user activities. The authors also show how user activities differ across OSNs, and characterize how users transition from one activity to another using a first-order Markov chain.
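A first-order Markov chain of this kind can be estimated by counting consecutive activity pairs within each session. The Python sketch below shows the idea on toy data; the activity labels and session lists are hypothetical and are not taken from the dataset in [6].

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities from activity sequences.

    sessions: iterable of lists, each an ordered sequence of activity labels.
    Returns a dict: current activity -> {next activity: probability}.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Each consecutive pair (current, next) is one observed transition.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[current] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


# Toy sessions with hypothetical activity labels.
sessions = [
    ["login", "browse_profile", "browse_photos", "browse_profile", "message"],
    ["login", "browse_profile", "browse_profile"],
]
probs = transition_probabilities(sessions)
```

Each row of `probs` sums to one, and activities that only ever end a session (here `message`) have no outgoing row.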

Schneider et al. [7] also study clickstream data, but they focus on feature popularity, session characteristics, and the dynamics within OSN sessions. The distribution of HTTP request-response pairs reveals the popularity of different features. Feature popularity can differ among users from different areas and across OSNs, and it can also vary with the time users spend on the OSN. In addition, the distribution of bytes transmitted per OSN session is given, which helps ISPs learn the traffic patterns of different OSNs. Photo features account for most of the traffic bytes of OSNs. The article also reports the duration of sessions and the number of subsessions within a session. Moreover, it reveals the dynamics within OSN sessions: during OSN sessions, most users access web sites other than OSNs for more than 1 min. That is, users can be inactive on the OSN while an OSN session is open.

Clickstream data contributes a lot to the study of OSN user behavior. However, it can be incomplete, which restricts its usefulness. First, clickstream data is limited by the collection period, and the behavior of users who are inactive during that period is not monitored. Moreover, the data is restricted by the monitoring locations; that is, only the behavior of users of the monitored ISPs is captured.

Locality of Interest

Facebook is heavily dependent on centralized U.S. data centers to provide consistent service to users all over the world. Therefore, users outside the United States experience slow response times. Also, a lot of unnecessary traffic is generated on the Internet backbone. Wittie et al. [8] investigate the detailed causes of these two problems and identify mitigation opportunities. It is found that OSN state is amenable to partitioning, and that its fine-grained distribution and processing can significantly improve performance without loss in service consistency. Based on simulations of reconstructed Facebook traffic over measured Internet paths, it is shown that user requests can be processed 79 percent faster and use 91 percent less bandwidth. Therefore, the partitioning of OSN state is an attractive scaling strategy for OSN service providers.

Navigation Characteristics

Nowadays, OSNs represent a significant portion of web traffic, comparable with search engines. Dunn et al. [9] try to understand the similarities and differences in the web sites users visit through OSNs vs. through search engines. Using web traffic logs from 17,000 digital subscriber line (DSL) subscribers of a Tier 1 ISP in the United States, it is found that OSN visitors are less likely to navigate to external web sites. But when they do visit external web sites, OSN users spend more time at those web sites than search engine users. Also, OSNs direct visitors to a narrower subset of the web than search engines. While web sites related to games and video are more commonly visited from OSNs, shopping and reference sites are common for search engines. Finally, OSNs send users to less popular domains more often than search engines. These findings can be useful to ISPs in network provisioning and traffic engineering.

FUTURE WORK

Most existing measurement and analysis projects are led by either academic groups or ISPs, without the active involvement of OSN service providers. Such a situation limits the insight of the studies. On one hand, academic researchers typically rely on extensive crawling to obtain data, which encounters many restrictions from the OSN providers, such as rate limits (how many messages per IP address and/or per account can be fetched in one hour). Also, some users may use privacy options to make their data unavailable. In addition, the huge number of users makes it almost impossible to get a timely snapshot, so data consistency cannot be guaranteed. On the other hand, although an ISP is able to capture and analyze all its traffic to and from an OSN site through traffic monitoring, it can only get a partial view of the whole site; that is, only users who access OSNs through that specific ISP’s infrastructure can be observed. As we have discussed, user behavior study can be beneficial for OSN providers themselves. We envision that OSN providers can collaborate with academic and industrial researchers in order to understand user behavior in an insightful way. This can enhance the user experience interactively and quickly. Also, this will save operational costs for OSN providers.

MOBILE SOCIAL BEHAVIOR

MOTIVATION AND CHALLENGES

Nowadays, due to the wide use of mobile devices, more and more web applications have been extended to mobile platforms, and OSN services are no exception. We believe that it is the right time to highlight the importance of mobile social networks (MSNs). In MSNs, mobile users can publish and share information based on the social connections among them. On one hand, most major OSN platforms such as Facebook, Twitter, and LinkedIn have released mobile applications that allow users to access their services through mobile devices. On the other hand, more mobile-centric functions have been integrated into OSNs, such as location-based services and mobile communication. Understanding user behavior in MSNs is very helpful for the design and implementation of MSN systems, whether for improving system efficiency in mobile environments or for supporting better mobile-centric functions. In this section, we focus on studies of user behavior in MSNs.

EXISTING SOLUTIONS AND DISCUSSION

Mobile Social Application

A large number of interesting and useful mobile social applications have been proposed. Social Serendipity [10] is a mobile-phone-based system that combines widely used mobile phones with the functionality of online introduction systems to cue informal face-to-face interactions between nearby users who do not know each other but probably should. Serendipity uses Bluetooth to sense nearby people and utilizes a centralized server to decide whether two users should be introduced to each other. The system calculates a similarity score by extracting the commonalities between two proximate users’ profiles and behavioral data, and summing them according to user-defined weights. If the score is higher than the threshold set by both users, the system informs them that someone nearby might be interested in them. For instance, Serendipity can facilitate internal collaboration in large companies by introducing people who are working on similar projects. It is emphasized that privacy issues are important and fundamental in Serendipity, and that privacy-protecting tools should be designed carefully.
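The matching rule described above can be sketched in a few lines of Python. This is only one reading of the description, not the implementation in [10]: the data layout, the choice to score the shared items once per user with that user’s own weights, and all example values are assumptions.

```python
def similarity(profile_a, profile_b, weights):
    """Weighted sum over the items two profiles have in common."""
    shared = set(profile_a) & set(profile_b)
    return sum(weights.get(item, 0.0) for item in shared)


def should_introduce(user_a, user_b):
    """Introduce two nearby users only if the score clears both users' thresholds.

    Assumption: each user scores the shared items with their own weights.
    """
    score_a = similarity(user_a["profile"], user_b["profile"], user_a["weights"])
    score_b = similarity(user_a["profile"], user_b["profile"], user_b["weights"])
    return score_a > user_a["threshold"] and score_b > user_b["threshold"]


# Hypothetical users detected near each other.
alice = {
    "profile": {"python", "graph theory", "hiking"},
    "weights": {"python": 2.0, "graph theory": 3.0, "hiking": 1.0},
    "threshold": 4.0,
}
bob = {
    "profile": {"python", "graph theory", "chess"},
    "weights": {"python": 1.0, "graph theory": 1.0},
    "threshold": 1.5,
}
carol = dict(bob, threshold=10.0)  # same interests, much stricter threshold

introduce_alice_bob = should_introduce(alice, bob)
introduce_alice_carol = should_introduce(alice, carol)
```

Letting each user set both the weights and the threshold keeps the decision under user control, which fits the emphasis on privacy in the description above.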

Geographical Prediction in OSN

Geography and social relationships are inextricably intertwined. As people spend more time online, data regarding these two dimensions are becoming increasingly precise, making it possible to build reliable models that describe their interaction. In [11], a study of user-contributed address and association data from Facebook shows that adding social information improves the accuracy of predicting physical location. First, friendship is analyzed as a function of distance and rank. It is found that at medium to long-range distances, the probability of friendship is roughly proportional to the inverse of distance. At shorter ranges, however, distance has little influence. Then a maximum likelihood approach is presented to predict the physical location of a user, given the known locations of her friends. This method predicts the physical location of 69.1 percent of the users with 16 or more located friends to within 25 mi, compared to only 57.2 percent using IP-based methods.
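The maximum likelihood idea can be sketched as follows, as a simplified illustration rather than the estimator of [11]. The assumptions are flagged in the code: the friendship probability is modeled as a / (b + d), which decays roughly as 1/d at long range and flattens at short range; the constants a and b are made up; locations are planar coordinates in miles; candidates are restricted to the friends’ own locations; and only friends (not non-friends) contribute to the likelihood.

```python
import math


def friend_prob(distance_miles, a=0.2, b=1.0):
    """Assumed friendship probability at a given distance.

    The form a / (b + d) and the constants are illustrative assumptions only.
    """
    return a / (b + distance_miles)


def predict_location(friend_locations):
    """Return the candidate location that maximizes the log-likelihood of the friendships.

    Candidates are limited to the friends' own locations (a simplification).
    """
    best, best_ll = None, -math.inf
    for candidate in friend_locations:
        ll = sum(
            math.log(friend_prob(math.dist(candidate, loc)))
            for loc in friend_locations
        )
        if ll > best_ll:
            best, best_ll = candidate, ll
    return best


# Hypothetical friends: three clustered together, one far away.
friends = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (500.0, 500.0)]
predicted = predict_location(friends)
```

In this toy example the distant friend barely affects the result, so the prediction falls inside the nearby cluster.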

Friendship and Mobility in LBSN

Although human movement and mobility patterns have a high degree of freedom and variation, they also exhibit structural patterns due to geographic and social constraints. Using cell phone location data, as well as data from two online location-based social networks, Cho et al. [12] aim to understand the basic laws that govern human motion and dynamics. It is found that humans exhibit strong short-range, spatially and temporally periodic movement that is not impacted by the social network structure, while long-distance travel is more influenced by social network ties. Furthermore, it is shown that social relationships can explain about 10 to 30 percent of all human movement, while periodic behavior explains 50 to 70 percent. Based on these findings, a model of human mobility is proposed that combines periodic short-range movements with travel due to the social network structure, and gives an order of magnitude better performance than previous models.

Social-Based Routing in PSNs

Widely used smart devices with networking capability form novel networks, such as pocket switched networks (PSNs). Due to the mobility of devices, PSNs are intermittently connected, and effective routing protocols are essential in such networks. Previous methods relied on building and updating routing tables to deal with dynamic conditions. However, the social structure and the interactions of the users of smart devices have a great influence on the performance of routing protocols. BUBBLE Rap [13] is a social-based forwarding method for PSNs. Two social and structural metrics, centrality and community, are used to effectively enhance delivery performance. As shown in Fig. 2, BUBBLE Rap first uses a centrality metric to spread out the messages (i.e., sending messages to more popular nodes), and then uses a community metric to identify the destination community and focus the messages toward the destination. The evaluation shows that BUBBLE Rap has a similar delivery ratio to, but much lower resource utilization than, flooding, controlled flooding, and other social-based forwarding schemes.
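The two-phase forwarding decision can be sketched in Python as follows. This is a simplified illustration, not the full protocol of [13]: it assumes each node belongs to exactly one community and that global and local centrality ranks are already known, and the `Node` type and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    community: str      # assumed: one community label per node
    global_rank: float  # centrality over the whole network (assumed precomputed)
    local_rank: float   # centrality within the node's own community


def should_forward(carrier, encountered, dest_community):
    """Decide whether the carrier hands a copy of the message to an encountered node."""
    if carrier.community == dest_community:
        # Inside the destination community: climb the local ranking only.
        return (encountered.community == dest_community
                and encountered.local_rank > carrier.local_rank)
    # Outside it: bubble up the global ranking, or jump into the destination community.
    return (encountered.community == dest_community
            or encountered.global_rank > carrier.global_rank)


# Hypothetical nodes; the message is addressed to someone in community "B".
source = Node("s", community="A", global_rank=0.1, local_rank=0.2)
popular = Node("p", community="A", global_rank=0.9, local_rank=0.8)
member = Node("m", community="B", global_rank=0.2, local_rank=0.3)
hub_b = Node("h", community="B", global_rank=0.3, local_rank=0.9)
```

For example, `should_forward(source, popular, "B")` is true because the encountered node is more central globally, while `should_forward(member, popular, "B")` is false because the message never leaves the destination community once it has arrived there.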

Content Distribution in MSN

Ioannidis et al. [14] study the dissemination of dynamic content, such as news and traffic information, over an MSN. In this application, mobile users subscribe to a dynamic-content distribution service offered by their service provider. To improve coverage and increase capacity, it is assumed that users share any content updates they receive with other users they meet. The authors determine how the service provider can allocate its bandwidth optimally to keep the content at users as “fresh” as possible. Moreover, they specify a condition under which the system is highly scalable: even if the total bandwidth dedicated by the service provider remains fixed, the expected content age at each user grows slowly (as log(n)) with the number of users n.

Figure 2. Illustration of the BUBBLE Rap algorithm. (Figure labels: Source; Destination; Global community; Sub community; Ranking.)

FUTURE WORK

There are several fundamental issues that require continuous exploration in the research related to user behavior in MSNs, including incentive mechanisms, identity management, trust, reputation and privacy, energy efficiency, methods for social network metrics estimation and community detection, content distribution and sharing protocols, and precise localization techniques for geographic and semantic spaces. A comprehensive summary related to applications, architectures, and protocol design issues for MSNs can be found in [15].

Furthermore, we believe social data delivery and social applications in mobile environments raise challenges in several layers of the Internet protocol stack. We list three examples here. First, we need a better transport layer protocol to handle packet loss caused by the wireless environment and host mobility. Second, to efficiently deliver popular content desired by multiple MSN users, we need to deploy social-aware proxies in the network infrastructure to eliminate duplicate transmissions. The deployment of those proxies needs to carefully consider social connections, users’ geolocations, and the topology of the underlying wired/wireless Internet. Third, context-aware services will become very useful in MSNs. Such services will let users express their demand for social activities in cyberspace in a human-readable fashion, thus making social interaction among mobile users easier. All three of these examples require substantial work in data analysis, modeling, and prototyping.

MALICIOUS BEHAVIOR

MOTIVATION AND CHALLENGES

The usage of OSNs introduces numerous security and privacy threats. For instance, as a user needs to interact with other users through an OSN service provider, the user’s activities and uploaded data can be tracked and stored by the OSN service provider. These data (photos, articles, public posts, private messages, etc.) may be leaked to a third party without the user’s explicit authorization, even when the user regards some of them as confidential. Moreover, Sybil attacks are very common in OSNs, as a user can maliciously register multiple fake accounts. These fake accounts can perform various malicious activities including spamming, obtaining private contact lists, misleading crowd-sourcing results, and so on. Besides those, Gao et al. [16] list several other attacks such as re-identification and de-anonymization of anonymized OSN data, fetching personal data through untrusted third-party applications, cross-site profile cloning, social spamming, and phishing. Due to space limitations, this survey focuses mainly on two kinds of malicious behavior in OSNs: spam and Sybil attacks.

EXISTING SOLUTIONS AND DISCUSSION

Social Spam — OSNs are popular collaboration tools for millions of users and their friends. Unfortunately, they have also become effective tools for executing spam campaigns and spreading malware. Intuitively, a user is more likely to respond to a message from a friend than from a stranger; thus, social spamming is a more effective distribution mechanism than traditional email. Gao et al. [17] study a large dataset composed of over 187 million wall messages among 3.5 million Facebook users. Their detection system identified 200,000 malicious wall posts with embedded URLs originating from more than 57,000 accounts. More than 70 percent of all malicious wall posts advertise phishing sites. Moreover, more than 97 percent of these accounts are compromised accounts rather than “fake” accounts created solely for the purpose of spamming. Finally, spamming dominates actual wall post activity in the early morning hours, when normal users are asleep.
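The time-of-day observation can be made concrete with a small sketch. It is not the analysis pipeline of [17]; it only illustrates how one might check, on already labeled posts, in which hours malicious posts outnumber legitimate ones. The function name `spam_dominated_hours` and the `(hour_of_day, is_malicious)` input format are hypothetical.

```python
from collections import Counter


def spam_dominated_hours(posts):
    """Return the hours of the day in which malicious posts are the majority.

    `posts` is an iterable of (hour_of_day, is_malicious) pairs; this input
    format is an assumption made for illustration.
    """
    total = Counter()
    malicious = Counter()
    for hour, is_malicious in posts:
        total[hour] += 1
        if is_malicious:
            malicious[hour] += 1
    # An hour is "spam dominated" when more than half of its posts are malicious.
    return sorted(h for h in total if malicious[h] / total[h] > 0.5)
```

Given labeled posts, a call such as `spam_dominated_hours(posts)` returns the list of hours that would then be compared against the period when normal users are asleep.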

Lumezanu et al. [18] perform a joint analysis of spam in email and social networks. Spam data from Yahoo’s web-based email service and Twitter are used to characterize the publishing behavior and effectiveness of spam advertised across both platforms. It is shown that email spammers that also advertise on Twitter tend to send more email spam than those advertising exclusively through email. Furthermore, sending spam on both email and Twitter achieves better exposure than spamming exclusively with email: spam domains appearing on both platforms are looked up by an order of magnitude more networks than domains using just one platform.

Social-Graph-Based Sybil Defense — Sybil attacks are a fundamental problem in peer-to-peer and other distributed systems. In a Sybil attack, a malicious attacker creates multiple fake identities to influence the working of systems that depend on open membership, such as recommendation and delivery systems. Recently, a number of social-network-based schemes, such as SybilGuard, SybilLimit, SybilInfer, and SumUp, have been proposed to mitigate Sybil attacks. Viswanath et al. [19] develop a deep understanding of these approaches. They show that existing Sybil defense schemes, which can be viewed as graph partitioning algorithms, work by identifying local communities (i.e., clusters of nodes more tightly knit than the rest of the graph) around a trusted node. Therefore, the substantial amount of prior research on general community detection algorithms can be used to design effective and novel Sybil defense schemes.
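The idea of finding a tightly knit local community around a trusted node can be sketched as follows. This is not one of the schemes analyzed in [19]; it is a generic greedy expansion that uses conductance as the community quality measure, which is our choice for illustration. The names `conductance` and `local_community`, the edge-list input, and the requirement that node labels be sortable are all assumptions.

```python
from collections import defaultdict


def conductance(adj, community):
    """Edges leaving `community` divided by the smaller side's total degree."""
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 1.0


def local_community(edges, trusted):
    """Greedily grow a set around `trusted`; keep the lowest-conductance set seen."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    community = {trusted}
    best, best_phi = set(community), conductance(adj, community)
    # Stop before the set covers the whole graph, where the measure is meaningless.
    while len(community) < len(adj) - 1:
        frontier = {v for u in community for v in adj[u]} - community
        if not frontier:
            break
        # Add the neighbor that yields the tightest set (ties broken by label).
        nxt = min(sorted(frontier),
                  key=lambda v: conductance(adj, community | {v}))
        community.add(nxt)
        phi = conductance(adj, community)
        if phi < best_phi:
            best, best_phi = set(community), phi
    return best
```

Nodes outside the returned set are the ones such a scheme would treat as potential Sybils. The result depends on the attacker having few edges into the honest region, which is the usual assumption behind social-graph-based defenses.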

Usually, binary Sybil/non-Sybil classifiers have high false positive rates; thus, manual inspection needs to be involved in the decision process for suspending an account. SybilRank [20] aims to efficiently derive a Sybil-likelihood ranking, so that only the most suspicious accounts need to be inspected manually. It is based on efficiently computable early-terminated random walks (RWs) and is suitable for parallel implementation on a framework such as MapReduce, uncovering Sybils in OSNs with millions of accounts. SybilRank is deployed and tested in the operation center of Tuenti, which is the largest OSN in Spain with 11 million users. Almost 100 percent of the 50K accounts and 90 percent of the 200K accounts that SybilRank regards as the most suspicious are indeed fake. In contrast, the hit rate of the current user-report-based approach is only 5 percent. Thus, SybilRank represents a significant step toward practical Sybil defense.
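The ranking idea can be illustrated with a short sketch. It is a minimal illustration rather than the implementation in [20]: it assumes, based on our reading of the published description, that trust starts at a few known-good seed accounts, spreads along social links for roughly log2(n) power-iteration steps (the early termination), and is then normalized by node degree. The function name `sybil_rank`, the edge-list input, and the seed list are hypothetical.

```python
import math
from collections import defaultdict


def sybil_rank(edges, trusted_seeds, total_trust=1.0):
    """Return nodes ordered from most to least suspicious (illustrative only)."""
    # Build an undirected adjacency list from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)

    # All trust starts at the seeds, split evenly among them.
    trust = {n: 0.0 for n in nodes}
    for seed in trusted_seeds:
        trust[seed] = total_trust / len(trusted_seeds)

    # Early termination: stop after about log2(n) steps, before the walk mixes
    # into regions that are connected to the seeds through only a few edges.
    steps = max(1, math.ceil(math.log2(len(nodes))))
    for _ in range(steps):
        spread = {n: 0.0 for n in nodes}
        for u in nodes:
            share = trust[u] / len(adj[u])
            for v in adj[u]:
                spread[v] += share
        trust = spread

    # Degree-normalize so that well-connected nodes are not favored,
    # then list the least trusted nodes first.
    score = {n: trust[n] / len(adj[n]) for n in nodes}
    return sorted(nodes, key=lambda n: score[n])
```

Only the head of the returned list would be passed on for manual inspection, which matches the motivation given above for ranking accounts instead of classifying them.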

FUTURE WORK

Because of the abundance of available personal information, OSNs suffer from the vital problem of privacy breaches, and such attacks may be caused by three primary parties in the OSN: service providers, malicious users, and third-party applications. A decentralized OSN is a potential architecture to protect sensitive information from leaking out to service providers and third-party applications. However, how to provide incentives to encourage users to switch to a decentralized OSN is challenging, especially for users who do not care much about their privacy and security. Furthermore, the recipients of shared information should be controlled by the users themselves. Instead of sharing information based only on the virtual links in OSNs, real-life relationships between users should also be taken into account. Finally, Sybil defense is still a hot topic, and more solid work is expected in this area. We foresee that semantic information extracted from user profiles and social behavior can be used for Sybil detection, and should be utilized collaboratively with existing schemes.

CONCLUSION

In this survey, we study user behavior in OSNs from four different perspectives: connection and interaction, traffic activity, mobile social behavior, and malicious behavior. We review the existing representative schemes and also provide potential future directions. We envision that this research line will enhance the user experience in various respects, as well as satisfy different players, including infrastructure providers, service providers, and end users. We believe that further research on user behavior in OSNs will generate more interesting research problems and exciting solutions in this area.

REFERENCES

[1] C. Wilson et al., “User Interactions in Social Networks and Their Implications,” Proc. EuroSys, 2009.
[2] J. Jiang et al., “Understanding Latent Interactions in Online Social Networks,” Proc. IMC, 2010.
[3] H. Kwak et al., “What Is Twitter, a Social Network or a News Media?,” Proc. WWW, 2010.
[4] M. Gjoka et al., “Practical Recommendations on Crawling Online Social Networks,” IEEE Trans. Commun. Special Issue on Measurement of Internet Topologies, vol. 29, no. 9, Oct. 2011.
[5] B. Ribeiro and D. Towsley, “Estimating and Sampling Graphs with Multidimensional Random Walks,” Proc. IMC, 2010.
[6] F. Benevenuto et al., “Characterizing User Behavior in Online Social Networks,” Proc. IMC, 2009.
[7] F. Schneider et al., “Understanding Online Social Network Usage from a Network Perspective,” Proc. IMC, 2009.
[8] M. Wittie et al., “Exploiting Locality of Interest in Online Social Networks,” Proc. CoNext, 2010.
[9] C. W. Dunn et al., “Navigation Characteristics of Online Social Networks and Search Engines Users,” Proc. WOSN, 2012.
[10] N. Eagle and A. Pentland, “Social Serendipity: Mobilizing Social Software,” IEEE Pervasive Computing, vol. 4, no. 2, 2005, pp. 28–34.
[11] L. Backstrom, E. Sun, and C. Marlow, “Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity,” Proc. WWW, 2010.
[12] E. Cho, S. Myers, and J. Leskovec, “Friendship and Mobility: User Movement in Location-Based Social Networks,” Proc. KDD, 2011.
[13] P. Hui, J. Crowcroft, and E. Yoneki, “BUBBLE Rap: Social-Based Forwarding in Delay Tolerant Networks,” IEEE Trans. Mobile Computing, vol. 10, Nov. 2011, pp. 1576–89.
[14] S. Ioannidis, A. Chaintreau, and L. Massoulie, “Optimal and Scalable Distribution of Content Updates over a Mobile Social Network,” Proc. INFOCOM, 2009.
[15] N. Kayastha et al., “Applications, Architectures, and Protocol Design Issues for Mobile Social Networks: A Survey,” Proc. IEEE, vol. 99, no. 12, 2011, pp. 2130–58.
[16] H. Gao et al., “Security Issues in Online Social Networks,” IEEE Internet Computing, vol. 15, no. 4, 2011.
[17] H. Gao et al., “Detecting and Characterizing Social Spam Campaigns,” Proc. IMC, 2010.
[18] C. Lumezanu and N. Lumezanu, “Observing Common Spam in Tweets and Email,” Proc. IMC, 2012.
[19] B. Viswanath et al., “An Analysis of Social Network-Based Sybil Defenses,” Proc. SIGCOMM, 2010.
[20] Q. Cao et al., “Aiding the Detection of Fake Accounts in Large-Scale Online Social Services,” Proc. NSDI, 2012.

BIOGRAPHIES

LONG JIN ([email protected]) is currently a Ph.D. student in the Department of Computer Science and Engineering, University of California, San Diego. He received his B.S. and M.S. degrees in Electronic Engineering from Tsinghua University in 2010 and 2013, respectively. He visited Carnegie Mellon University and Microsoft Research Asia in 2012. His research interests include social networks, mobile computing, and wireless networks.

YANG CHEN ([email protected]) is a postdoctoral associate in the Department of Computer Science, Duke University. From September 2009 to March 2011, he was a research associate at the University of Goettingen, Germany. He received his B.S. and Ph.D. degrees from the Department of Electronic Engineering, Tsinghua University, in 2004 and 2009, respectively. He visited Stanford University (in 2007) and Microsoft Research Asia (2006-2008) as a visiting student. His research interests include Internet architecture and protocols, cloud computing, and online/mobile social networks.

TIANYI WANG ([email protected]) is now pursuing his Ph.D. degree in the Department of Electronic Engineering, Tsinghua University, and his advisor is Prof. Xing Li. He received his B.S. degree in electronic engineering from Tsinghua University in 2011. His research interests include analysis of online social networks and data mining.

PAN HUI ([email protected]) received his Ph.D. degree from the Computer Laboratory, University of Cambridge, and earned his M.Phil. and B.Eng. from the Department of Electrical and Electronic Engineering, University of Hong Kong. He is currently a faculty member of the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where he directs the System and Media Lab. He also serves as a Distinguished Scientist of Telekom Innovation Laboratories (T-labs) Germany and an adjunct professor of social computing and networking at Aalto University, Finland. Before returning to Hong Kong, he spent several years at T-labs and Intel Research Cambridge. He has published more than 100 research papers, and has several granted and pending European patents. He has founded and chaired several IEEE/ACM conferences/workshops, and served on the technical program committees of numerous international conferences and workshops including IEEE INFOCOM, SECON, MASS, GLOBECOM, WCNC, and ITC.

ATHANASIOS V. VASILAKOS ([email protected], [email protected]) is currently a professor at Kuwait University. He has served or is serving as an Editor for many technical journals, such as IEEE TNSM, IEEE TC, IEEE TSMC-PART B, IEEE TITB, ACM TAAS, and IEEE JSAC Special Issues in May 2009, and January and March 2011. He is Chairman of the Council of Computing of the European Alliances for Innovation.
