Analysis of Fusing Online and Co-presence Social Networks

Juan (Susan) Pan, Daniel Boston, and Cristian Borcea

Department of Computer Science, New Jersey Institute of Technology

Pervasive social applications
Traditional social apps
Location-aware social apps
Socially-aware apps

BUBBLE Rap: Use social knowledge to improve packet forwarding in delay-tolerant networks
Tribler: Use social knowledge to reduce peer-to-peer communication overhead

Social information collection
Declared by users
- Implicitly, through online social networks
- Explicitly, through surveys
Extracted from user online interactions
Extracted from user mobility traces
- Location traces
- Co-presence traces (e.g., using Bluetooth)

Social information representation
Multiple social graphs (e.g., Facebook and co-presence)
- Vertices -> users
- Edges -> social ties

Online social networks (OSN) provide a relatively stable social graph
- Many connections are weak
- Example: actors have millions of friends
- Not all social contacts use OSN apps

Co-presence social network (CSN) identifies social ties grounded in real-world interactions
- Hard to differentiate social connections from passers-by

OSN users:
- Slowly add new relationships after the initial bootstrap
- Rarely delete relationships from their profiles

Research questions
Do OSN and CSN just reinforce each other or capture different types of social ties?
Can a fused network take advantage of the strengths of both?
How can we quantify the benefits of this fusion?
Can we measure the contribution of each source network to the fused network?

Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
- Analysis of local network parameters
- Conclusions

Study participants
One month of CSN data and Facebook data for the same set of 104 students
- Volunteers
- Received compensation
- Belong to various departments at NJIT

We recruited 104 student volunteers at our university. The slide shows the statistics and demographics of our subjects.

Co-presence statistics
                    Max             Mean          Standard Dev.
Meeting Duration    220 hrs 2 min   1 hr 16 min   7 hrs 34 min
Meeting Frequency   51              2.23          3.79
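The statistics in this table can be reproduced in outline from raw co-presence records. The Python sketch below is an illustration under stated assumptions, not the study's pipeline: records use a hypothetical `(user_a, user_b, start_minute, end_minute)` format, meetings of the same pair separated by at most `max_gap` minutes are merged (a simple gap-based merge standing in for the elimination algorithm mentioned in the speaker notes, whose details are not given; the 5-minute default is invented), and the population standard deviation is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def merge_short_gaps(records, max_gap=5):
    """Merge meetings of the same user pair that are <= max_gap minutes apart.

    records: iterable of (user_a, user_b, start_minute, end_minute), a
    hypothetical format. max_gap is an assumed value, not one from the study.
    Returns {frozenset({user_a, user_b}): [(start, end), ...]}.
    """
    by_pair = defaultdict(list)
    for a, b, start, end in records:
        by_pair[frozenset((a, b))].append((start, end))
    merged = {}
    for pair, intervals in by_pair.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start - out[-1][1] <= max_gap:
                # Short gap: treat both records as one continuous meeting.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[pair] = [tuple(iv) for iv in out]
    return merged


def pair_statistics(merged):
    """(max, mean, population std) of total minutes and meeting count per pair."""
    durations = [sum(end - start for start, end in ivs) for ivs in merged.values()]
    frequencies = [len(ivs) for ivs in merged.values()]

    def summarize(xs):
        return max(xs), mean(xs), pstdev(xs)

    return summarize(durations), summarize(frequencies)


records = [("u1", "u2", 0, 30), ("u1", "u2", 33, 60),
           ("u1", "u2", 200, 220), ("u1", "u3", 10, 20)]
duration_stats, frequency_stats = pair_statistics(merge_short_gaps(records))
print(duration_stats, frequency_stats)  # (80, 45, 35.0) (2, 1.5, 0.5)
```

Aggregating per pair matches the note that meeting frequency is the total number of meetings during one month; whether the study's duration figures are per pair or per meeting is not stated, so that choice is also an assumption.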

Given that our sample size (104 volunteers) was small compared to the university population (9,000 students) and that many students are commuters, our trace data is relatively sparse. Half of the students contributed less than 32 hours of data in total during the month.

The typical user provided a few hours of data per day, mostly on weekdays. 80% of the participants provided less than 2 hours per day. The meeting frequency is the total number of meetings during one month.

We use an elimination algorithm to merge meeting records separated by very short intervals.

Facebook data
Subjects gave us permission to collect data
- Friends, wall posts, comments, photo tags
- An online interaction is a wall post, comment, or photo tag
- Count the number of interactions between user pairs

                      Max  Mean  Standard Dev.
Online Interactions   402  4     10
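Counting online interactions per user pair, which gives Weight_online for an OSN edge, can be sketched in Python as follows. The `(actor, target, kind)` event format and the kind labels in the hypothetical `ONLINE_KINDS` set are assumptions; pairs are unordered, so a wall post from a to b and a comment from b to a count toward the same edge.

```python
from collections import Counter
from statistics import mean, pstdev

# Interaction kinds treated as online interactions on the slide; the string
# labels themselves are hypothetical.
ONLINE_KINDS = {"wall_post", "comment", "photo_tag"}


def count_interactions(events):
    """Count online interactions per unordered user pair.

    events: iterable of (actor, target, kind) tuples (a hypothetical format).
    Self-interactions and kinds outside ONLINE_KINDS are ignored.
    """
    counts = Counter()
    for actor, target, kind in events:
        if kind in ONLINE_KINDS and actor != target:
            counts[frozenset((actor, target))] += 1
    return counts


events = [("a", "b", "wall_post"), ("b", "a", "comment"),
          ("a", "c", "photo_tag"), ("a", "a", "wall_post"), ("b", "c", "like")]
weights = count_interactions(events)
values = list(weights.values())
print(max(values), mean(values), pstdev(values))  # 2 1.5 0.5
```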

Global parameters describe the structure of the network as a whole and are averaged across the entire network.
Local parameters are calculated from the individual properties of users, edges, and communities.

Weighted social graphs are more accurate
OSN: Weight_online = number of interactions
CSN: Weight_co-presence = 0.5 * Weight_duration + 0.5 * Weight_frequency

How to remove edges due to passers-by in CSN?
Very short and infrequent co-presence does not indicate the presence of a social tie
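A minimal sketch of the CSN weight formula above. The slide does not say how Weight_duration and Weight_frequency are scaled, so this Python version assumes each is divided by its maximum over all pairs, putting both terms in [0, 1] before the 0.5/0.5 average; the function name `csn_weights` is hypothetical.

```python
def csn_weights(duration, frequency):
    """Weight_co-presence = 0.5 * Weight_duration + 0.5 * Weight_frequency.

    duration, frequency: dicts mapping a user pair to total meeting minutes
    and number of meetings. Normalizing each component by its maximum over
    all pairs is an assumption; the slide does not say how they are scaled.
    """
    max_d = max(duration.values())
    max_f = max(frequency.values())
    return {
        pair: 0.5 * duration[pair] / max_d + 0.5 * frequency[pair] / max_f
        for pair in duration
    }


duration = {("a", "b"): 200, ("a", "c"): 100}   # minutes per month
frequency = {("a", "b"): 2, ("a", "c"): 1}      # meetings per month
print(csn_weights(duration, frequency))  # {('a', 'b'): 1.0, ('a', 'c'): 0.5}
```

Without some normalization, the duration term (measured in minutes) would dominate the frequency term (a small count), and the equal 0.5 coefficients would lose their meaning.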
CSN noise reduction

Find duration & frequency thresholds for adding a CSN edge
Increase thresholds until Edit distance between CSN and OSN stabilizes
Edit distance: number of edge additions/deletions to transform one graph into the other
Keep OSN unchanged because Facebook friendship confirmations validate social ties
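When both graphs share the same vertex set (the 104 participants), only edge additions and deletions are needed, so the edit distance is the size of the symmetric difference of the two edge sets. The Python sketch below computes it and then keeps the OSN fixed while raising a CSN threshold until the distance stops changing; the hypothetical `build_csn` callback and the tolerance `tol` are assumptions, because the slides do not state the exact stabilization rule.

```python
def edit_distance(edges_a, edges_b):
    """Edge edit distance between two undirected graphs on the same users:
    the number of edges present in exactly one graph (each costs one
    addition or deletion)."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return len(a ^ b)


def first_stable_threshold(thresholds, build_csn, osn_edges, tol=0):
    """Raise the threshold step by step and return (threshold, distance) at the
    first step where the edit distance to the unchanged OSN moves by <= tol.

    build_csn(threshold) -> CSN edge list, and tol, are hypothetical; the
    slides do not define the rule precisely. Returns None if never stable.
    """
    previous = None
    for t in thresholds:
        d = edit_distance(build_csn(t), osn_edges)
        if previous is not None and abs(d - previous) <= tol:
            return t, d
        previous = d
    return None


pair_minutes = {("a", "b"): 500, ("c", "d"): 40, ("a", "c"): 120}
osn = [("b", "a")]


def csn_at(threshold):
    # Hypothetical CSN builder: keep pairs with at least `threshold` minutes.
    return [pair for pair, minutes in pair_minutes.items() if minutes >= threshold]


print(first_stable_threshold([30, 100, 200, 300], csn_at, osn))  # (300, 0)
```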

Threshold selection

Total meeting duration threshold = 160 minutes per month
Total meeting frequency threshold = 3 times per month
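Applying the two thresholds to a pair's total meeting time T_ij (minutes per month) and meeting count F_ij can be sketched as below. Requiring both conditions and treating the thresholds as inclusive are assumptions; `csn_edges` is a hypothetical helper.

```python
def csn_edges(T, F, min_minutes=160, min_meetings=3):
    """Keep pair (i, j) as a CSN edge if T_ij >= min_minutes and F_ij >= min_meetings.

    T, F: dicts mapping a user pair to total minutes together per month and
    number of meetings per month. Requiring both conditions, and using >=,
    are assumptions about how the thresholds are applied.
    """
    return {pair for pair, minutes in T.items()
            if minutes >= min_minutes and F.get(pair, 0) >= min_meetings}


T = {("a", "b"): 200, ("a", "c"): 500, ("b", "c"): 100}
F = {("a", "b"): 5, ("a", "c"): 2, ("b", "c"): 9}
print(csn_edges(T, F))  # {('a', 'b')}
```

In the example, pair (a, c) is dropped for meeting only twice and pair (b, c) for spending only 100 minutes together.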

T_ij is the total time users i and j spent together; F_ij is the total number of meetings in their encounter history. The duration and frequency thresholds are varied during the analysis within [30 min, 1800 min] and [1, 10], respectively.

Resulting social graphs

Co-presence Social Network
Online Social Network
Fused Network (51 shared edges)
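One reading of the fused network that is consistent with the connectivity table is the union of the OSN and CSN edge sets: 165 + 196 - 51 = 310 edges. Under that reading, CSN adds 196 - 51 = 145 edges that OSN lacks, versus 165 - 51 = 114 OSN-only edges, roughly 27% more. The Python sketch below assumes union-based fusion and ignores edge weights, since the slides do not say how weights are combined on shared edges; the edge lists are synthetic and only reproduce those counts.

```python
def fuse(osn_edges, csn_edges):
    """Union-based fusion of two undirected edge lists (weights ignored)."""
    osn = {frozenset(e) for e in osn_edges}
    csn = {frozenset(e) for e in csn_edges}
    return {
        "fused": osn | csn,
        "shared": osn & csn,
        "osn_only": osn - csn,
        "csn_only": csn - osn,
    }


# Synthetic edge lists that only reproduce the study's edge counts, not its data.
shared = [(f"u{k}", f"v{k}") for k in range(51)]
osn = shared + [(f"o{k}", f"p{k}") for k in range(114)]
csn = shared + [(f"c{k}", f"d{k}") for k in range(145)]
parts = fuse(osn, csn)
print({name: len(edges) for name, edges in parts.items()})
# {'fused': 310, 'shared': 51, 'osn_only': 114, 'csn_only': 145}
```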

Outline
- Motivation
- Data collection
- Social graph representation
- Analysis of global network parameters
  - Degree, connectivity, centrality, cohesiveness
- Analysis of local network parameters
- Conclusions

Degree distribution

                OSN    CSN    Fused   Correlation (online, co-presence)
Average degree  3.17   3.77   5.96    0.202
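The average degree of an undirected graph is 2|E| / |V|. Taking |V| = 104 (all participants, including isolated ones) and the edge counts from the connectivity table gives 2 * 165 / 104 = 3.17, 2 * 196 / 104 = 3.77, and 2 * 310 / 104 = 5.96 (rounded), matching the row above. The sketch below reproduces that arithmetic and, assuming the correlation column is the Pearson correlation between each participant's OSN degree and CSN degree (the slide does not say), shows how such a value could be computed on a toy graph.

```python
from math import sqrt


def degrees(nodes, edges):
    """Degree of every participant, including isolated ones (degree 0)."""
    deg = dict.fromkeys(nodes, 0)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def average_degree(num_nodes, num_edges):
    """Each undirected edge adds 1 to the degree of both endpoints."""
    return 2 * num_edges / num_nodes


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


for name, m in [("OSN", 165), ("CSN", 196), ("Fused", 310)]:
    print(name, round(average_degree(104, m), 2))
# prints OSN 3.17, CSN 3.77, Fused 5.96 on separate lines

# Toy degree correlation between two hypothetical source networks.
nodes = ["a", "b", "c", "d"]
osn_deg = degrees(nodes, [("a", "b"), ("a", "c")])
csn_deg = degrees(nodes, [("a", "d"), ("b", "d"), ("c", "d")])
r = pearson([osn_deg[v] for v in nodes], [csn_deg[v] for v in nodes])
print(round(r, 3))  # -0.816
```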

OSN degree approximately follows a power-law distribution
CSN degree resembles a power-law distribution less closely than OSN degree
- Due to meetings with familiar strangers
Consequently, a similar result is observed for the fused network

3 nodes are social butterflies
Most nodes have high degree in either CSN or OSN, but not both
3 nodes have high degree in both CSN and OSN
The increased average degree of the fused network means people interact with different sets of contacts in the two source networks
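The slides do not define "high degree", so the Python sketch below uses a hypothetical top-k cutoff: a node counts as a social butterfly if it ranks among the k highest-degree nodes in both OSN and CSN. Both `k` and the tie-breaking rule are assumptions.

```python
def top_k_by_degree(deg, k):
    """The k highest-degree nodes; ties broken by node name for determinism."""
    return set(sorted(deg, key=lambda v: (-deg[v], v))[:k])


def social_butterflies(osn_deg, csn_deg, k):
    """Nodes ranked in the top k by degree in BOTH source networks."""
    return top_k_by_degree(osn_deg, k) & top_k_by_degree(csn_deg, k)


osn_deg = {"a": 9, "b": 8, "c": 1, "d": 0}
csn_deg = {"a": 7, "b": 0, "c": 6, "d": 1}
print(social_butterflies(osn_deg, csn_deg, k=2))  # {'a'}
```

Here b is prominent only online and c only in co-presence, which is the "high degree in either CSN or OSN, but not both" pattern the slide describes.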

A familiar stranger is an individual whom one recognizes from regular activities but with whom one does not interact.

Connectivity

                                              OSN    CSN    Fused   Weighted?
Number of edges                               165    196    310     N
Size of LCC (largest connected component)     63     84     98      N
Diameter of LCC                               7      8      7       N
Average length of shortest path               12.32  1.98   8.77    Y

CSN contributes 27% more edges to the fused network than OSN
Compared to OSN alone, the LCC of the fused network contains 55% more people (98 vs. 63)
Almost all people (98 of 104) are connected in the fused network
Average weighted shortest path is shorter in the fused network than in OSN
Stronger social connectivity: reason to leverage it in social apps
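The connectivity measures in the table can be sketched in plain Python: the LCC via breadth-first search, the diameter as the largest hop distance between LCC members, and the average weighted shortest path via Dijkstra's algorithm. The speaker notes describe the weighted shortest path as the path with the greatest capacity for carrying information; this sketch assumes that is implemented by using 1 / weight as the edge length, so strongly weighted ties are short. That choice, the helper names, and the toy graph are assumptions, and the sketch is not claimed to reproduce the table's values.

```python
import heapq
from collections import deque


def build_adjacency(weighted_edges):
    """dict node -> dict neighbor -> weight for an undirected graph."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def hop_distances(adj, source):
    """Unweighted BFS distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def largest_component(adj):
    """Node set of the largest connected component (LCC)."""
    best, seen = set(), set()
    for node in adj:
        if node not in seen:
            component = set(hop_distances(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def diameter(adj, nodes):
    """Longest shortest path (in hops) among the given connected nodes."""
    return max(max(hop_distances(adj, s).values()) for s in nodes)


def weighted_distances(adj, source):
    """Dijkstra with edge length 1 / weight (assumed), so strong ties are short."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_weighted_path(adj, nodes):
    """Mean weighted shortest-path length over ordered pairs of distinct nodes."""
    lengths = [d for s in nodes
               for t, d in weighted_distances(adj, s).items() if t != s]
    return sum(lengths) / len(lengths)


adj = build_adjacency([("a", "b", 1.0), ("b", "c", 2.0),
                       ("c", "d", 1.0), ("e", "f", 1.0)])
lcc = largest_component(adj)
print(len(lcc), diameter(adj, lcc), round(average_weighted_path(adj, lcc), 3))
# 4 3 1.333
```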

The co-presence network contributes 27% more edges to the fused network than OSN. Compared to OSN, the fused network has 55% more people connected in its LCC. The diameter (relative to CSN) and the average weighted shortest path (relative to OSN) are reduced. People become closer and more involved in each other's lives if the fused network is leveraged in social applications.

There are 51 shared edges between the online and co-presence networks, which is less than a third of the number of edges in each of the two networks.

Connected component sizes (number of people in each component):
Facebook (OSN): 63, 15, 2, 4, 2, 2, 2
Co-presence (CSN): 84, 4
Fused: 98

Among all the people one interacts with in real life, only 26% are Facebook friends.

The diameter is the longest shortest path in a connected component. The weighted shortest path is the path with the greatest capacity for carrying information.

The betweenness centrality of a node counts the number of times it occurs on the shortest paths between other pairs of nodes.
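Betweenness centrality as defined above can be computed with Brandes' algorithm. This Python sketch handles unweighted, undirected graphs and returns unnormalized scores; when several shortest paths connect a pair, each intermediate node receives the corresponding fraction. Whether the study used the weighted or unweighted variant is not stated, so the unweighted choice is an assumption.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of an unweighted, undirected graph (Brandes).

    adj: dict node -> iterable of neighbors.
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s that counts shortest paths (sigma) and records predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair {s, t} was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```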

The local clustering coefficient (also known as transitivity) measures the extent to which nodes in a graph cluster together. It is the number of ties present among a node's neighbors divided by the total number of possible ties between them.
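Both the local clustering coefficient and the weighted version of Barrat et al. [18] can be sketched in a few lines of Python. For a node i with degree k_i and strength s_i (the sum of its edge weights), the weighted coefficient sums (w_ij + w_ih) / 2 over ordered pairs of neighbors j, h that are themselves linked and divides by s_i (k_i - 1); with all weights equal to 1 it reduces to the unweighted coefficient. Treating nodes with fewer than two neighbors as 0.0, the helper names, and the toy graph are assumptions.

```python
from itertools import combinations, permutations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected.

    adj: dict node -> dict neighbor -> weight (weights ignored here).
    Nodes with fewer than two neighbors get 0.0 by convention (an assumption).
    """
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def barrat_clustering(adj, i):
    """Weighted local clustering following Barrat et al. [18]:
    sum of (w_ij + w_ih) / 2 over ordered linked neighbor pairs (j, h),
    divided by s_i * (k_i - 1), where s_i is i's total edge weight."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    strength = sum(nbrs.values())
    total = sum((nbrs[j] + nbrs[h]) / 2
                for j, h in permutations(nbrs, 2) if h in adj[j])
    return total / (strength * (k - 1))


def undirected(weighted_edges):
    """dict node -> dict neighbor -> weight for an undirected graph."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


# Triangle a-b-c plus a heavy edge a-d that closes no triangle.
g = undirected([("a", "b", 2), ("a", "c", 2), ("b", "c", 1), ("a", "d", 4)])
print(round(local_clustering(g, "a"), 3), barrat_clustering(g, "a"))  # 0.333 0.25
```

In the example, node a's heaviest edge (to d) closes no triangle, so its weighted coefficient (0.25) falls below the unweighted 1/3.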

In OSN, people have a higher tendency to make transitive friends. Co-presence does not contribute to cohesiveness; however, betweenness is highest, thus showing that social centrality is improved.

In the weighted version [18], the contribution of each triplet of nodes (visualized as a triangle) is weighted by the ratio of the average weight of the triangle's two edges adjacent to the node to the node's average edge weight.

[18] A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani, "The architecture of complex weighted networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 11, pp. 3747-3752, 2004.