Analysis of Fusing Online and Co-presence Social Networks
Post on 22-Feb-2016
Juan (Susan) Pan, Daniel Boston, and Cristian Borcea
Department of Computer Science, New Jersey Institute of Technology
Hello everybody. My name is Susan Pan, from the New Jersey Institute of Technology, and today I will present our work on the analysis of fusing online and co-presence social networks.
What we try to do in this paper is to see whether the online and co-presence networks are similar or different, and whether it makes sense to fuse them.

Pervasive social applications
- Traditional social apps
- Location-aware social apps
- Socially-aware apps
  - BUBBLE Rap: uses social knowledge to improve packet forwarding in delay-tolerant networks
  - Tribler: uses social knowledge to reduce peer-to-peer communication overhead
There are three types of social applications. Traditional social applications simply let users declare friendships online. Location-aware applications incorporate GPS location information to notify users when friends are close enough. Socially-aware applications use social knowledge to improve the performance of other systems, such as packet forwarding and P2P communication.

Social information collection
- Declared by users
  - Implicitly, through online social networks
  - Explicitly, through surveys
- Extracted from user online interactions
- Extracted from user mobility traces
  - Location traces
  - Co-presence traces (e.g., using Bluetooth)

So, what social information is collected or used by these applications?
Social relationships can be between pairs of individuals or between individuals and groups.
There are three ways to collect it: declared by users, extracted from users' online interactions, and extracted from user mobility traces. Declared information can come from online friendship declarations or from surveys. In this study, we harvest Facebook social declarations, with the users' permission. Mobility traces can be location traces, such as GPS, or co-presence traces, such as Bluetooth. Because of its lower power consumption and better privacy, we use Bluetooth co-presence traces.

Social information representation
- Multiple social graphs (e.g., Facebook and co-presence)
  - Vertices -> users
  - Edges -> social ties
- Online social networks (OSN) provide a relatively stable social graph
  - Many connections are weak
  - Example: actors have millions of friends
  - Not all social contacts use OSN apps
- Co-presence social network (CSN) identifies social ties grounded in real-world interactions
  - Hard to differentiate social connections from passers-by

Pairwise information can typically be represented as social graphs. The vertices are users, and the edges are social ties. We can assign weights to social ties based on the density of interaction.
Online and co-presence social networks provide different social perspectives, each with strengths and drawbacks. An OSN provides a relatively stable social graph. However, many connections are weak, OSN apps cannot accurately represent the whole picture of a user's social contacts, and users rarely delete relationships from their profiles.
Co-presence identifies social ties grounded in real-world interactions. However, it is hard to differentiate social connections from passers-by. Some meetings are just chance encounters without social significance, for instance two students sitting at nearby tables in the cafeteria.
OSN:
- Users slowly add new relationships after the initial bootstrap
- Users rarely delete relationships from their profiles
Research questions
- Do OSN and CSN just reinforce each other, or do they capture different types of social ties?
- Can a fused network take advantage of the strengths of both?
- How can we quantify the benefits of this fusion?
- Can we measure the contribution of each source network to the fused network?

The goal of our research is to investigate the potential of fusing OSN and CSN.
Most systems maintain the two social graphs separately. For instance, Prometheus, described in a paper published by our lab in 2010, manages different social graphs as a multigraph with labels on the edges, and applications can query social information from peers. Another paper developed an application that maintains the two networks separately but uses both to help balance youngsters' social connections. Thus, we ask the following research questions. Do OSN and CSN just reinforce each other, or do they capture different types of social ties and complement each other? Does it make sense to fuse them? If so, how can we quantify the benefits of this fusion? Can we measure the contribution of each source network to the fused network?
CB: Need to say that other systems (Prometheus, etc.) keep them separate.
CB: We want to investigate the potential benefits of fusion.

Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
- Analysis of local network parameters
- Conclusions

Study participants
- One month of CSN data and Facebook data for the same set of 104 students
- Volunteers
- Received compensation
- Belong to various departments at NJIT
Based on those questions, we conducted an experiment. We recruited 104 student volunteers at our university. The slide shows the statistics and demographics of our subjects. We believe our subjects are representative of a medium-sized campus in an urban area.
Don't say one month of Facebook data.

Bluetooth-based co-presence data
[Figure: Bluetooth scans among devices A, B, and C, with timestamps 1:05 and 1:07]
- A performs a scan and creates record (A,B)
- B performs a scan and creates records (B,A), (B,C)
- Then A performs a scan again and creates records (A,B), (A,C)
- Records are uploaded at the end of the day
- The average scan period is 2 minutes 40 seconds, with a standard deviation of 1 minute

This slide illustrates how we generate our Bluetooth co-presence records. We gave each volunteer student a mobile phone, on which a program quietly scans for nearby devices and sends the results back to the central server.
HTC Windows Mobile 8595 and 8525 phones, which come preloaded with Windows Mobile 5.

Co-presence statistics
                    Max             Mean          Standard Dev.
Meeting Duration    220 hrs 2 min   1 hr 16 min   7 hrs 34 min
Meeting Frequency   51              2.2           3.79
The first graph illustrates the contribution of volunteers by hours. Given that our sample size (104 volunteers) was small compared to the university population (9000 students) and that many students are commuters, our trace data is relatively sparse. 50% of the students contributed less than 32 hours in total during the month. We made no effort to select friends.
The typical user provided a few hours of data per day, especially on weekdays. 80% of the people provided less than 2 hours per day. The meeting duration is the sum of all meeting times during the month. The meeting frequency is the total number of meetings during the month.
We use an elimination algorithm to merge meeting records separated by very short intervals (the granularity is 5 minutes).
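The notes do not spell out the merging step, so the following is only a minimal sketch of one plausible reading: sightings of the same pair that are separated by at most 5 minutes are treated as one meeting. The function names and the (start, end) representation in seconds are assumptions made for illustration, not the study's actual code.

```python
from typing import List, Tuple

GAP_SECONDS = 5 * 60  # 5-minute granularity, as stated in the notes


def merge_meetings(intervals: List[Tuple[int, int]],
                   max_gap: int = GAP_SECONDS) -> List[Tuple[int, int]]:
    """Merge (start, end) sightings of one user pair whose gaps are <= max_gap."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous sighting: extend that meeting.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def duration_and_frequency(intervals: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Total meeting time (seconds) and number of meetings after merging."""
    meetings = merge_meetings(intervals)
    total = sum(end - start for start, end in meetings)
    return total, len(meetings)
```

With this reading, the monthly meeting duration and meeting frequency of a pair both come out of the same merged list, which keeps the two statistics consistent with each other.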
CB: Use the color version of the graph (if you don't have it, ask Daniel for the source).
Facebook data
- Subjects gave us permission to collect data
  - Friends, wall writings, comments, photo tags
- An online interaction is a wall writing, comment, or photo tag
- Count the number of interactions between user pairs

                      Max   Mean   Standard Dev.
Online Interactions   40    2      4

Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
- Analysis of local network parameters
- Conclusions

What is the difference between global and local?
Global refers to the structure of the network: the parameters are averaged across the entire network. Local refers to parameters calculated from individual properties of users, edges, and communities.

Weighted social graphs are more accurate
- OSN: Weight_online = number of interactions
- CSN: Weight_co-presence = 0.5 x Weight_duration + 0.5 x Weight_frequency
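As a rough illustration of the two weight definitions on the slide, here is a small sketch. The slide does not say how the duration and frequency components are scaled before they are averaged, so `copresence_weight` simply takes them as given. The function names and the input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def online_weights(interactions: Iterable[Tuple[str, str]]) -> Counter:
    """Count interactions (wall writings, comments, photo tags) per unordered user pair."""
    counts: Counter = Counter()
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            counts[frozenset((a, b))] += 1
    return counts


def copresence_weight(w_duration: float, w_frequency: float) -> float:
    """Equal-weight combination from the slide; component scaling is left to the caller."""
    return 0.5 * w_duration + 0.5 * w_frequency
```

Using an unordered pair as the key means that a comment from u1 to u2 and a photo tag from u2 to u1 both add to the same edge.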
Remember, it's the 2nd max.

CSN noise reduction
- How to remove edges due to passers-by in CSN?
  - Very short and infrequent co-presence does not indicate the presence of a social tie
- Find duration and frequency thresholds for adding a CSN edge
  - Increase the thresholds until the edit distance between CSN and OSN stabilizes
  - Edit distance: number of edge additions/deletions needed to transform one graph into the other
- Keep OSN unchanged because Facebook friendship confirmations validate social ties

To eliminate the noise introduced by familiar strangers, we introduce two parameters: one for the total meeting duration and one for the total number of meetings. We consider that two people have a social tie only when both their total meeting duration and their total number of meetings reach the thresholds. In a sense, their social tie is then strong enough.

Threshold selection
- Total meeting duration threshold = 160 minutes per month
- Total meeting frequency threshold = 3 times per month

Draw two arrows and text boxes to show it's the threshold. Say "per month".
T_ij is the total time users i and j spent together; F_ij is the total number of meetings in the encounter history. The thresholds for meeting duration and meeting frequency are varied during the analysis (duration within [30 min, 1800 min] and frequency within [1, 10]). We pick 160 minutes and 3 meetings as thresholds based on the results in Figures 2 and 3, which show the edit distance remaining stable beyond these values.

Resulting social graphs
[Figure: Co-presence Social Network, Online Social Network, Fused Network (51 shared edges)]

Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
  - Degree, connectivity, centrality, cohesiveness
- Analysis of local network parameters
- Conclusions

What is the difference between global and local?
- Global refers to the structure of the network
- Local refers to individual properties of users, edges, and communities
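The noise-reduction rule, the edit distance, and the fusion step from the preceding slides can be sketched on plain edge sets. This is a sketch under stated assumptions: both graphs share the same vertex set, the fused network is taken to be the union of the two edge sets, and the function names and the `pair_stats` layout are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = FrozenSet[str]


def csn_edges(pair_stats: Dict[Tuple[str, str], Tuple[float, int]],
              min_minutes: float, min_meetings: int) -> Set[Edge]:
    """Keep a pair only if BOTH total duration and meeting count reach the thresholds.

    pair_stats maps (u, v) -> (total_minutes, meeting_count).
    """
    return {frozenset(pair)
            for pair, (minutes, meetings) in pair_stats.items()
            if minutes >= min_minutes and meetings >= min_meetings}


def edit_distance(edges_a: Set[Edge], edges_b: Set[Edge]) -> int:
    """Edge additions plus deletions needed to turn one graph into the other."""
    return len(edges_a ^ edges_b)


def fuse(edges_a: Set[Edge], edges_b: Set[Edge]) -> Set[Edge]:
    """Fused network as the union of the two edge sets (assumption)."""
    return edges_a | edges_b


# Hypothetical data: only a-b passes the 160-minute / 3-meeting thresholds.
stats = {("a", "b"): (200, 4), ("a", "c"): (20, 1), ("b", "c"): (500, 2)}
csn = csn_edges(stats, min_minutes=160, min_meetings=3)
osn = {frozenset(("a", "b")), frozenset(("c", "d"))}
fused = fuse(osn, csn)
```

The union reading agrees with the edge counts reported in the talk: 165 OSN edges plus 196 CSN edges minus 51 shared edges gives 310 fused edges. To look for the stabilization point, one would recompute `edit_distance(osn, csn_edges(...))` while raising the thresholds.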
Degree distribution
                 OSN    CSN    Fused
Average degree   3.17   3.77   5.96
Correlation (online, co-presence) = 0.202
- OSN degree approximately follows a power-law distribution
- CSN degree does not follow a power-law distribution as closely as OSN
  - Due to meetings with familiar strangers
  - Consequently, a similar result is observed for the fused network
- 3 nodes are social butterflies
  - Most nodes have high degree in either CSN or OSN, but not both
  - 3 nodes have high degree in both CSN and OSN
- Increased average degree means people meet different sets of contacts in the two source networks

Increased average degree means people meet different sets of contacts.
OSN degree closely follows a power-law distribution, in which a few nodes have many connections and many others have few. Due to mobility, in the real-life CSN people meet many familiar strangers, whom they may know by face but with whom they have not declared friendship. In OSN, however, people make friends through friends. CSN degree does not follow a power-law distribution as closely as OSN, because people's mobility results in meetings with familiar strangers. Most nodes have more contacts in either CSN or OSN, which decreases the maximum count on the y axis.
A familiar stranger is an individual who is recognized from regular activities but with whom one does not interact. Familiar strangers are people whom one does meet frequently or for long periods.
The validation of power-law claims remains a very active field of research in many areas of modern science. We remove only the pairs who meet rarely and for short times.
Only 3 nodes have a high number of contacts in both OSN and CSN (social butterflies). Most nodes have more contacts in either CSN or OSN, which decreases the maximum count on the y axis (degree correlation = 0.202).
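To make the degree comparison concrete, here is a small sketch of the quantities involved. The talk does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the helper names are hypothetical.

```python
from math import sqrt
from typing import Iterable, List, Sequence, Tuple


def degrees(nodes: Sequence[str], edges: Iterable[Tuple[str, str]]) -> List[int]:
    """Degree of every node, in the order given by `nodes` (isolated nodes get 0)."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[n] for n in nodes]


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Each undirected edge contributes to two node degrees."""
    return 2 * num_edges / num_nodes


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long vectors (assumed choice of coefficient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

The average degrees on the slide are consistent with 2E/N over all 104 participants: 2 x 165 / 104 is about 3.17, 2 x 196 / 104 is about 3.77, and 2 x 310 / 104 is about 5.96. The degree correlation would then be `pearson(degrees(nodes, osn_edges), degrees(nodes, csn_edges))` for the same node ordering.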
Connectivity
                                            OSN    CSN     Fused   Weighted
Number of edges                             165    196     310     N
Size of LCC (largest connected component)   63     84      98      N
Diameter of LCC                             7      8       7       N
Average length of shortest path             12.3   21.98   8.77    Y
- CSN contributes 27% more edges than OSN
- Compared to OSN, CSN has 55% more connected people
- Almost all people are connected in the fused network
- Average weighted shortest path is reduced in the fused network
- Stronger social connectivity: a reason to leverage it in social apps

The co-presence network contributes 27% more edges. Compared to OSN, 55% more people are connected in the LCC. The diameter and the average weighted shortest path are reduced. People become closer and more involved in each other's lives if the fused network is leveraged in social applications. Only 51 edges are shared, but the average degree increased. This indicates that people interact with different people online and in real life. Among all the people one interacts with in real life, only 26% are Facebook friends.
The diameter measures the longest shortest path in the connected component. The weighted shortest path is the path with the greatest capacity for carrying information.
There are 51 shared edges between the online and co-presence networks, which is less than a third of the number of edges in each of the two networks.
Connected components (sizes)
- Facebook: 63, 15, 2, 4, 2, 2, 2
- Co-presence: 84, 4
- Fused: 98
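The component sizes listed above can be obtained with a plain breadth-first search. The following is a minimal sketch on an edge list, with hypothetical names; it is not the tooling used in the study.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Sequence, Tuple


def component_sizes(nodes: Sequence[str],
                    edges: Iterable[Tuple[str, str]]) -> List[int]:
    """Sizes of all connected components, largest first (isolated nodes count as size 1)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

The first element of the result is the size of the LCC. The sizes listed in the notes appear to leave out isolated participants, which corresponds to filtering out components of size 1.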
Betweenness centrality and clustering coefficient
                                          OSN     CSN     Fused   Weighted
Average weighted betweenness              49.1    90.13   94.83   Y
Average length of shortest path           12.3    21.98   8.77    Y
Average edge weight                       3.02    3.64    1.95    Y
Average weighted clustering coefficient   0.156   0.122   0.157   Y
- CSN has a much longer average shortest path than OSN
  - Hence, average betweenness is high
- In the fused network, the average shortest path is low, but betweenness is highest
  - Social centrality is improved
- Average edge weight shows that people interact more in real life than online
  - A person who is highly socially active online is not necessarily highly socially active in real life
  - Thus, smaller values in the fused network
- OSN has higher cohesiveness
  - People become friends when sharing common friends
  - OSN contributes more to the fused network
[Figure panels: OSN, CSN]

A corresponding random graph (same number of vertices and edges) has a clustering coefficient of only 0.029. I may need to calculate the number of the structure. Centrality determines the relative importance of a vertex within the graph.
Co-presence has a much longer average shortest path, hence its average betweenness is high. In the fused network, the average shortest path is short; however, betweenness is highest, which shows that social centrality is improved. In OSN, people have a higher tendency to make transitive friends, meaning that they become friends because of a common friend (cohesiveness). Co-presence does not contribute to cohesiveness.
The betweenness centrality counts the number of times a node occurs on the shortest paths between other pairs of nodes. The local clustering coefficient (also known as transitivity) measures the extent to which nodes in a graph cluster together. It is the fraction of ties present among a node's neighbors over the total number of possible ties between them. In the weighted version, the contribution of each tri-set (visualized as a triangle) of nodes is weighted by the ratio of the average weight of the two adjacent edges of the triangle to the average weight of the node.
A. Barrat, M. Barthelemy, R. Pastor-Satorras, and A. Vespignani, "The architecture of complex weighted networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 11, pp. 3747-3752, 2004.
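The weighted clustering coefficient described above can be sketched directly from its verbal definition, following the formulation in Barrat et al. (2004): each triangle at node i contributes the average weight of its two edges at i, and the sum is divided by s_i (k_i - 1), where s_i is the total edge weight at i and k_i its degree. The edge-dictionary layout and the function name are assumptions for illustration.

```python
from typing import Dict, FrozenSet

Weights = Dict[FrozenSet[str], float]


def barrat_clustering(weights: Weights, node: str) -> float:
    """Weighted local clustering coefficient of `node` (Barrat et al., 2004)."""
    # Weight of the edge from `node` to each of its neighbors.
    neighbor_weight = {}
    for edge, w in weights.items():
        if node in edge and len(edge) == 2:
            (other,) = edge - {node}
            neighbor_weight[other] = w

    k = len(neighbor_weight)            # non-weighted degree
    if k < 2:
        return 0.0                      # no triangle is possible
    s = sum(neighbor_weight.values())   # node strength (sum of edge weights)

    total = 0.0
    for j in neighbor_weight:
        for h in neighbor_weight:
            # Ordered pairs of distinct neighbors that are themselves connected.
            if j != h and frozenset((j, h)) in weights:
                total += (neighbor_weight[j] + neighbor_weight[h]) / 2
    return total / (s * (k - 1))


# Hypothetical graph: triangle a-b-c with unit weights plus a heavy edge a-d.
example = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 1.0,
    frozenset(("b", "c")): 1.0,
    frozenset(("a", "d")): 2.0,
}
```

In the example, node a has an unweighted clustering coefficient of 1/3, but its weighted coefficient is 0.25 because the heaviest edge (a-d) is not part of any triangle. With all weights equal, the weighted and unweighted values coincide.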
Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
- Analysis of local network parameters
  - Node, edge, community
- Conclusions

Similarity of node degree and edge weight
- Calculate the Euclidean distance between the degree vectors (104 nodes) and between the shared-edge weight vectors (51 edges)
- Similarity is the inverse of distance

                       Distance(OSN, CSN)   Distance(OSN, fused)   Distance(CSN, fused)
Weighted node degree   0.558                0.306                  0.256
Node degree            0.399                0.305                  0.225
Edge weight            0.560                0.324                  0.295

- CSN is more similar to the fused network
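As a small illustration of the distance used on this slide, here is a sketch of the Euclidean distance between two per-node (or per-edge) vectors and the corresponding similarity. The slide does not say how the vectors were scaled before comparison, so no normalization is applied here; the function names are hypothetical.

```python
from math import inf, sqrt
from typing import Sequence


def euclidean(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Euclidean distance between two equally long vectors."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def similarity(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Similarity as the inverse of distance; identical vectors are infinitely similar."""
    distance = euclidean(xs, ys)
    return inf if distance == 0 else 1 / distance
```

For the degree comparison, each vector would hold one entry per participant in a fixed order; for the edge-weight comparison, one entry per shared edge.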
Don't say contribution.
Say similarity, then contributions, and mention that these are the local parameters.

Computation of community similarity
- How to quantify community similarity across networks?
  - Few communities are the same
  - Better to quantify community overlapping
- Compute k-clique overlapping clusters on the three networks separately
- Use the community overlapping matrix to compute the distance between networks (inverse of similarity)

Social networks tend to have overlapping communities by nature. A k-clique overlapping community is the union of all k-cliques (complete subgraphs of size k) that can be reached from each other through a series of adjacent k-cliques (where adjacency means sharing k-1 nodes).
CB: Redo the distance (replace OSN and CSN with NW1 and NW2). Is it any i, j or all i, j? Make sure it's not blurry when you copy it again.

Community similarity
                   k=3    k=4   k=5
Dist(OSN, fused)   2561   142   26.5
Dist(CSN, fused)   2289   135   32.0
- Fused network has a larger average community size than OSN and CSN (fused = 6.1, CSN = 4.9, OSN = 5.2)
- CSN is closer to the fused network for weaker communities (k = 3, 4)
- OSN is closer to the fused network for stronger communities (k = 5)
- OSN contributes stronger social communities than CSN
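The k-clique community definition can be sketched in a few lines for small graphs. This is a brute-force illustration that enumerates every k-subset of nodes, so it is only practical for small k and small networks; it is not the tool used in the study, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Set, Tuple


def k_clique_communities(nodes: Sequence[str],
                         edges: Iterable[Tuple[str, str]],
                         k: int) -> List[Set[str]]:
    """Union of k-cliques that are reachable through cliques sharing k-1 nodes."""
    edge_set = {frozenset(e) for e in edges}
    # Brute force: a k-subset is a clique if every pair inside it is an edge.
    cliques = [frozenset(c) for c in combinations(sorted(nodes), k)
               if all(frozenset(p) in edge_set for p in combinations(c, 2))]

    # Union-find over cliques: adjacent cliques end up in the same group.
    parent = list(range(len(cliques)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Hypothetical graph: triangles a-b-c and b-c-d share an edge; e-f-g is separate.
demo_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"),
              ("d", "e"), ("e", "f"), ("e", "g"), ("f", "g")]
demo = k_clique_communities("abcdefg", demo_edges, k=3)
```

In the example, node d sits in the first community only: the edge d-e is not part of any triangle, so it does not link the two communities. Because a node can belong to several cliques that end up in different groups, communities found this way may overlap.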
Average size of communities for k=4: Fused = 6.1, Co-presence = 4.9, Online = 5.2
For k=3:
- Facebook: 452 pairs of nodes share communities; the average number of communities shared is 1.0287610619469028
- Co-presence: 727 pairs of nodes share communities; the average number of communities shared is 1.015130674002751
- Fused: 3019 pairs of nodes share communities; the average number of communities shared is 1.0026498840675723

Conclusions
- CSN and OSN represent two different classes of social engagement
- Applications may benefit from a fused network that merges CSN and OSN
  - CSN increases the fused network's connectivity and communication strength
  - OSN strengthens the community structure and lowers the average path length of the fused network
  - A typical example is friend-of-friend apps

Discuss the applications: friend recommendation, event recommendation.
The fusion represents a more accurate understanding of social relationships.
Mobius project
- Decentralized two-tier infrastructure for mobile social computing
- P2P tier
  - Collects online social information
  - Manages social state
  - Runs user-deployed services to support mobile apps
  - Dynamically adapts to geo-social context
  - Energy efficiency, scalability, reliability
- Mobile tier
  - Runs mobile applications
  - Collects geo-social information from phones
- Application scenario: community multimedia sharing system

Before I finish, I want to mention that this research is part of our Mobius project. The goal of the project is to explore the benefits of embedding social knowledge in the network protocols and services that support mobile social computing applications. The infrastructure has two tiers. The P2P tier is a group of users' PCs, where we can run services and collect online social information. The mobile tier is a group of mobile phones, which run mobile apps and collect geo-social information.
The P2P tier manages geo-social information and runs services to support mobile apps. The mobile tier runs lightweight client apps and collects geo-social information using mobile phones (social sensors). For example, in the community multimedia sharing system, Alice uploads pictures to P2P peer Bob. Bob stores the pictures on the PC of Alice's friend Jane and also notifies all of Alice's friends, Jane and Mike. Mike is interested in seeing the pictures, so he downloads them from Jane's PC.
CB: Use the application scenario just to show the two tiers. No need to explain it during the talk (but you should be prepared to explain it during the questions at the end if someone asks about it).
JP: Is Bob Alice's friend? If not, can Bob query who Alice's friends are? Is that secure?

Thank you!
Acknowledgment: NSF Grant CNS-0831753
Related work
- Kostakos
  - The networks are very sparse
  - Co-presence social ties are based on only one meeting
  - Does not consider user interaction (edge weight)
  - There is no proper noise reduction
- Eagle, Cranshaw
  - Focused on using co-presence data to predict friendship
- Mtibaa
  - Concludes that the two graphs are similar
  - Conference over a single day
  - These results cannot be generalized
Appendix
We use the generalized local clustering coefficient proposed by Barrat et al. (2004). In this weighted version, the contribution of each triangle is weighted by the ratio of the average weight of the two adjacent edges of the triangle to the average weight of node i. The formula is
$$c_i^w = \frac{1}{s_i (k_i - 1)} \sum_{j,h} \frac{w_{ij} + w_{ih}}{2} \, a_{ij} a_{ih} a_{jh}$$
where $w_{ij}$ is the weight of the edge between i and j, $a_{ij}$ is 1 if that edge exists and 0 otherwise, $s_i$ is the sum of the weights of the edges of node i, and $k_i$ is the non-weighted degree of node i.
- CB: I think you should remove it (you can keep it for yourself, but not in this presentation).

Power law distribution
- Node degrees in real-world large-scale social networks often follow a power-law distribution
  - A few nodes have many connections and many others have few