Analysis and mining of online social networks: emerging trends and challenges


Post on 11-Mar-2017







<ul><li><p>Advanced Review</p><p>Analysis and mining of online social networks: emerging trends and challenges</p><p>Sajid Yousuf Bhat and Muhammad Abulaish</p><p>Social network analysis (SNA) is a multidisciplinary field dedicated to the analysis and modeling of relations and diffusion processes among various objects in nature and society, and other information/knowledge processing entities, with the aim of understanding how the behavior of individuals and their interactions translates into large-scale social phenomena. Because of the exploding popularity of online social networks and the availability of a huge amount of user-generated content, there is a great opportunity to analyze social networks and their dynamics at resolutions and levels not seen before. This has resulted in a significant increase in research literature at the intersection of the computing and social sciences, leading to several techniques for social network modeling and analysis in the area of machine learning and data mining. Some of the current challenges in the analysis of large-scale social network data include social network modeling and representation, link mining, sentiment analysis, semantic SNA, information diffusion, viral marketing, and influential node mining. © 2013 John Wiley &amp; Sons, Ltd.</p><p>How to cite this article: WIREs Data Mining Knowl Discov 2013, 3:408–444. doi: 10.1002/widm.1105</p><p>INTRODUCTION</p><p>The social world is a network of interactions and relationships that facilitates the flow and exchange of information and resources like norms, values, and ideas among individuals.1 Such a view of the social world can be treated as a social network, and it can be defined as a social structure represented by a set of nodes and their interrelationships, generally called ties. 
A node in a social network is usually called a social actor, and may represent a person, group, document, organization, or nation. A relation between a pair of nodes represents their ties, reflecting friendship, kinship, dislike, common interest, acquaintance, financial exchange, physical connection, hyperlink, or colocation. Social network analysis (SNA) is one of the important techniques used in the field of sociology and also finds significant application in anthropology, biology, communication,</p><p>Correspondence to: Department of Computer Science, Jamia Millia Islamia (A Central University), New Delhi, India. Conflict of interest: The authors have declared no conflicts of interest for this article.</p><p>economics, geography, and social computing.2,3 The increasing popularity of social networks is largely due to their relevance to various processes taking place in society, such as the spread of cultural fads or diseases, the formation of groups and communities, and recommendations. SNA and modeling help explain how the behavior of individuals and their interactions translates into large-scale social systems. Recently, the application of SNA and social network concepts to a wide range of research interests has gained huge popularity. For example, an application of SNA to transport planning is demonstrated in Ref 4. Hulst5 highlighted the importance and applications of SNA for dealing with organized crime in adversary networks. With the variety of data such networks provide, the applicable SNA tasks include, but are not limited to, community (gang) identification, sentiment analysis and opinion mining, node influence analysis, and link prediction. The work of Lewis et al.6 showed that cluster or community analysis in protein interaction networks using SNA techniques can highlight functionally coherent groups of proteins and predict the level of functional homogeneity within</p><p>408 © 2013 John Wiley &amp; Sons, Ltd. 
Volume 3, November/December 2013</p></li><li><p>WIREs Data Mining and Knowledge Discovery Analysis and mining of online social networks</p><p>these protein groups or communities. Similarly, Swan et al.7 shed light on the importance of community-wise management of knowledge and on how managed intracommunity and intercommunity (across structural holes) interactions lead to significant innovations. Using SNA for perceiving, controlling, and distributing strategic and domain knowledge in organizations8 and collaborative distance learning9 is also promising, considering the improvement of throughput for such systems.</p><p>Bonchi et al.10 presented a state-of-the-art survey on the business applications of SNA in an online social network (OSN) environment. They highlighted the exploitation of various SNA concepts like social contagion, influence, communities, and ranking for addressing many business challenges such as marketing, customer service, and managing resources (financial, human, and knowledge). In contrast, this article aims to present some important challenges of SNA in a generic domain, and it highlights how OSNs facilitate the analysis and understanding of such challenges by providing a spectrum of opportunities and data that are highly related to the real world. A huge amount of literature on SNA exists. Most of it concentrates on some specific aspect of social networks (e.g., community detection or information diffusion) in isolation, or is oriented mostly toward sociological aspects, leaving aside the computer science perspective. However, the present multidimensional OSNs provide a means of studying various data mining tasks related to SNA in a unified framework where each of them can benefit from the others. 
In this regard, we present a survey of the current state of the art and challenges related to SNA in the light of OSNs and also present the design of a possible unified framework for some major tasks related to SNA.</p><p>This article introduces some of the recent data mining tasks targeting social networks, taking into consideration data about the structure of social networks. Different data mining tasks and techniques require that the social networks be represented and modeled according to the needs of the analysis being performed in the task. As a result, various alternatives for social network modeling exist and can be categorized into different groups depending upon the techniques used and the level of analysis supported. Some of the popular social network modeling techniques are reviewed in the Social Network Representations section. For a detailed description of the graph-theoretic properties and social network metrics that form a basis for almost all kinds of analyses of social networks, readers should refer to Refs 2 and 11–13.</p><p>SOCIAL NETWORK PROPERTIES AND DATA</p><p>Network systems have traditionally been considered to be random structures, and although links were considered to occur at random between nodes, most nodes were expected to have almost the same degree. However, significant contributions made by researchers14,15 revealed that the vertex connectivity of large-scale real-world networks actually follows a scale-free power-law distribution. That is, for large values of k, the fraction P(k) of nodes having k connections to other nodes in the network follows the relation shown in (1), where c is a normalization constant and γ is a parameter usually ranging between 2 and 3.</p><p>$P(k) \sim c\,k^{-\gamma}$. 
(1)</p><p>The growth of such networks involves a rich-get-richer scheme [preferential attachment (PA)] wherein new nodes have a higher probability of linking to nodes of higher degree; that is, the likelihood of a node acquiring a new link is proportional to the node's degree.16</p><p>Some special properties that differentiate social networks from other networks, as highlighted by Newman and Park,17 are as follows. First, social networks show positive correlations between the degrees of adjacent vertices (assortativity). More specifically, vertices of similar degree tend to be connected more with each other than with others. Second, social networks have nontrivial clustering or network transitivity. This property causes the networks to exhibit community structures, i.e., clusters of vertices or nodes that are more similar or connected within the group than to the rest of the network. Communities in social networks often map to important functional or interest groups of the underlying nodes, and designing methods and techniques to identify them is a challenging task.</p><p>Milgram18 showed that in a well-defined population the average path length between two individuals was six hops, demonstrating that social networks can be classified as small-world, which led to the famous phrase "six degrees of separation." Another famous concept of sociology is the weak link hypothesis by Granovetter,19 according to which the degree of overlap between the friend neighborhoods of two individuals is observed to increase as a function of the strength of the tie connecting these two individuals. This implies that strong ties are tightly clustered, whereas weak ties represent longer distance relationships, thus playing an important role for the flow</p></li><li><p>of information and innovation. This phenomenon is known as the strength of weak ties. 
On the basis of this concept and the existence of groups in social networks, Burt20 defined structural holes as the topological scarcity or weakness of links between the groups in a social network. From the productivity perspective of organizational control, structural holes appear to provide an opportunity for brokerage of information flow between different working groups. This in turn provides an opportunity to control the projects on which various groups across a structural hole work. Within groups, opinion and behavior tend to be more homogeneous than otherwise. Thus, individuals who connect groups across structural holes often tend to have alternative ways of thinking and approaching a problem, as they are possibly exposed to multiple activities. Brokerage between groups across structural holes highlights alternative options that otherwise remain unexplored, and on the basis of this property, brokerage across a structural hole becomes social capital.21 Burt22 indicates that an increase in the number of individuals performing the same task decreases the value of social capital, and thus peers tend to lose the value of social capital to individuals (often managers) who have very few peers.</p><p>The theory proposed by Granovetter19 related to the strength of weak ties highlights an important property of contagion, i.e., the diffusion of diseases and innovations (belief, ideology, norm, technology, organizational form, fad, or fashion) in social and information networks through physical and/or virtual contacts. That is, the reach of a diffusion process (the social distance covered) is significantly higher if it passes through weak ties rather than strong ones. 
Moreover, small-world networks composed of a few long intercommunity ties between tightly clustered communities facilitate rapid diffusion of information and disease.</p><p>Social Network Data Sources</p><p>For analyzing social networks, the primary requirement is access to social network data, which can be viewed as a social relational system characterized by a set of actors and their social ties.13 Additional information in the form of actor attributes or multiple relations can be a part of the social relational system. Traditional sources of social network data included questionnaires, interviews, and observations. However, acquiring data through these sources is laborious and costly, and restricts the analysis to a smaller number of individuals, resulting in possibly significant individual biases. With the availability of large electronic datasets like e-mail networks and telephone call graphs, and of efficient computational resources, in the late 1990s physicists entered the field of social networks (and complex networks in general) with the aim of analyzing topological properties of networks and developing new concepts, algorithms, and models. The advantage of such electronic datasets is that they are large, relatively easy to process, and accurate in the sense that subjective biases are absent.23</p><p>Wu et al.24 argued that besides proxy datasets like e-mail networks, face-to-face (F2F) interactions also remain a powerful conduit for information exchange, especially for complex or tacit information. They used wearable sociometric badges that can collect and analyze behavioral data from individuals over time by detecting people in close proximity, capturing F2F interaction time, and recording tonal variation and prosody using a microphone. Alternatively, Eagle et al.25 showed that data gathered from the usage of mobile phones can be used to produce significant insight into the relational dynamics of user behavior. 
Besides the communication information provided by communication networks like e-mail networks and telephone call graphs, mobile phone data also covers the behavior, location, and proximity of mobile phone users through GPS, Bluetooth, cell tower IDs, and application usage. With the aim of comparing the user behavior represented by data collected from mobile phones with user-reported data collected from direct user surveys, the analysis of Eagle et al.25 highlighted that the former, as a complement to the latter, provides significant insights into purely cognitive constructs, such as friendship and individual satisfaction, besides observable behavior. Such data can easily be mapped to the actual F2F interactions and relations between individuals.</p><p>The Online Social Network Boom</p><p>The growth of the World Wide Web (WWW) has led to the evolution of different types of information sharing systems, which include OSNs like Facebook, MySpace, Flickr, Digg, Bebo, Orkut, hi5, LinkedIn, LiveJournal, and Twitter. In recent years, OSNs like Facebook have achieved significant popularity and now rank among the most popular websites. OSNs provide individuals a means of joining a network (becoming users), providing information that could define them or their preferences (a profile), and publishing any content that they would like to share with other users. One of the most important features provided by these OSNs is the ability for users to create links to other users with whom they associate. These user-centric features of OSNs enable a user to define and maintain social relations,</p></li><li><p>find and link to other users with similar interests and preferences, and share, find, and endorse content and knowledge contributed by the user or by other users.26 Considering the extreme popularity, huge membership, and enormous amounts of social network data generated by these OSNs, there exists a unique opportunity to study, understand, and leverage their properties. An in-depth analysis of OSN structure and growth can not only aid in designing and evaluating current systems but can also lead to better design of future OSN-based systems and to a deeper understanding of the impact of OSNs on society.</p><p>Th...</p></li></ul>
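<p>The rich-get-richer (preferential attachment) growth described in the Social Network Properties and Data section, together with the degree distribution P(k) of relation (1), can be illustrated with a minimal sketch. It is not the method of any cited reference: the function names <code>preferential_attachment</code> and <code>degree_distribution</code>, the seed clique, and the parameters <code>n_nodes</code> and <code>m_links</code> are assumptions made purely for illustration.</p>

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m_links, seed=0):
    """Grow an undirected graph in which each new node links to m_links
    distinct existing nodes, chosen with probability roughly proportional
    to their current degree (illustrative sketch, hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: start from a small clique of m_links + 1 nodes.
    edges = [(i, j) for i in range(m_links + 1) for j in range(i)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a node in proportion to its degree.
    stubs = [v for edge in edges for v in edge]
    for new in range(m_links + 1, n_nodes):
        targets = set()
        while len(targets) < m_links:  # redraw until targets are distinct
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    degree = Counter(v for edge in edges for v in edge)
    counts = Counter(degree.values())
    return {k: c / len(degree) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = degree_distribution(preferential_attachment(2000, 2))
    for k, p in list(dist.items())[:8]:
        print(k, round(p, 4))
```

<p>Plotting the resulting P(k) against k on log-log axes is one way to check informally whether the tail is roughly linear, as a power law would suggest. Rejecting repeated targets makes the selection only approximately degree-proportional, which is acceptable for a sketch.</p>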