Measurement and Analysis of Online Social Networks
A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee
Presentation by Shahan Khatchadourian

Posted on 27-Mar-2015

TRANSCRIPT

Slide 1
Measurement and Analysis of Online Social Networks
A. Mislove, M. Marcon, K. Gummadi, P. Druschel, B. Bhattacharjee
Presentation by Shahan Khatchadourian
Supervisor: Prof. Mariano P. Consens

Slide 2: Focus
• Graphs of online social networks:
  - how they were obtained
  - how they were verified
  - how the measurement and analysis were performed
  - properties of the obtained graphs
  - why these properties are relevant

Slide 3: Why study the graphs?
• Important for improving existing systems and developing new applications:
  - information search
  - trusted users
• What is the structure of online social networks?
• What are different ways to examine a social network when complete data is not available?
• How do the networks compare with each other and with the Web?

Slide 4: Which graphs?
• Flickr, YouTube, LiveJournal, and Orkut
• All are directed except Orkut
• Weakly Connected Component (WCC): a maximal set of nodes that are connected when link direction is ignored
• Strongly Connected Component (SCC): a maximal set of nodes in which every node can reach every other node by following links in their direction

Slide 5: How are the graphs obtained?
• API:
  - users
  - groups
  - forward/backward links
• HTML screen scraping

Slide 6: Summary of graph properties
• Small-world
• Scale-free
• Correlation between indegree and outdegree
• A large, strongly connected core of high-degree nodes, surrounded by small clusters of low-degree nodes

Slide 7: Crawling Concerns - Algorithms
• BFS and DFS
• Snowball method: underestimates the number of low-degree nodes
• In social networks, snowball samples underestimate the power-law coefficient but closely match other metrics, such as the overall clustering coefficient
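The snowball bias on slide 7 can be illustrated with a minimal Python sketch. It assumes the graph fits in memory as a dict mapping each node to the nodes it links to (a real crawler would fetch links through a site's API or by screen scraping), and the function names and the toy graph are hypothetical, not the authors' crawler. The crawl is a breadth-first search cut off after a fixed node budget; on a toy network with a densely connected core and degree-1 fringe nodes, the truncated crawl contains a much smaller share of low-degree nodes than the full graph.

```python
from collections import deque


def snowball_crawl(adj, seed, max_nodes):
    """Breadth-first (snowball) crawl from `seed`, stopping once `max_nodes`
    nodes have been visited. Returns nodes in the order they were crawled."""
    visited = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)
        for nbr in adj.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order


def toy_graph():
    """Hypothetical toy network: a 4-node clique (the "core") where each
    core node also has 3 degree-1 leaves (the "fringe"). Links are stored
    in both directions, i.e. the graph is undirected."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    core = [0, 1, 2, 3]
    for i in core:
        for j in core:
            if i < j:
                link(i, j)
    leaf = 4
    for c in core:
        for _ in range(3):
            link(c, leaf)
            leaf += 1
    return adj


def low_degree_share(adj, nodes):
    """Fraction of `nodes` whose degree in the full graph is 1."""
    nodes = list(nodes)
    return sum(1 for n in nodes if len(adj[n]) == 1) / len(nodes)


adj = toy_graph()
sample = snowball_crawl(adj, seed=0, max_nodes=7)
full = low_degree_share(adj, adj)
sampled = low_degree_share(adj, sample)
print(f"full graph: {full:.2f}, crawl sample: {sampled:.2f}")
# prints "full graph: 0.75, crawl sample: 0.43"
```

With a budget of 7, the crawl reaches the seed, the other three core nodes, and three of the seed's leaves, so degree-1 nodes make up 3/7 of the sample versus 12/16 of the full graph. Because `adj` stores every link in both directions, this toy crawl follows links both ways; passing only out-links would model a forward-only (FW) crawl, which, as slide 8 notes, may not reach the entire WCC.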
Slide 8: Crawling Concerns
• Forward (FW) links alone cannot reach the entire WCC

Slide 9: How to Verify Samples
1. Obtain a random sample of users:
   - LiveJournal (LJ): a feature that returns 5,000 random users
   - Flickr: randomly generated 8-digit user IDs
2. Conduct a crawl using these random users as seeds.
3. Check whether these random nodes connect to the original WCC.
4. Compare the structure of the newly crawled graph with the original.

Slide 10: Crawling Concerns
• FW links: no effect on the largest WCC

Slide 11: Crawling Concerns
• FW links: increasing the size of the WCC by starting from a different seed

Slide 12
Site         Users (millions)  Links (millions)  Link symmetry  Access
YouTube      1.1               4.9               79.1%          API (users only), FW; SS for group info
Flickr       1.8               22                62.0%          API (users + groups), FW
LiveJournal  5.2               72                73.5%          API (users + groups), FW + BW
Orkut        3                 223               100.0%         SS for users + groups
FW: forward links only; BW: backward links; SS: HTML screen scraping

Slide 13: Link Symmetry
• Even with directed links, there is a high level of symmetry
• This is possibly encouraged by the sites informing users of new incoming links
• Symmetry makes reputable sources harder to identify, because reciprocal links dilute the signal carried by incoming links
• Possible solution: record who initiated each link

Slide 14: Power-law node degrees
• Orkut deviates:
  - only 11.3% of the network was reached (effect of the partial BFS crawl, i.e. the snowball method)
  - an artificial cap on each user's number of outgoing links distorts the distribution at high degrees
• Differs from the Web

Slide 15: Power-law node degrees

Slide 16: Power-law node degrees
• e.g., analysis of top keywords

Slide 17: Spread of Information

Slide 18: Factors affecting the power law
• Services, accessibility, and features
• Mobile users

Slide 19: Correlation of indegree and outdegree
• Over 50% of nodes have an indegree within 20% of their outdegree

Slide 20: Path lengths and diameter
• All four networks have short path lengths
• Broder et al. noted that if the Web were treated as an undirected graph, its path length would drop from 16 to 7. So what?

Slide 21: Link degree correlations
• JDD (joint degree distribution): maps each outdegree to the average indegree of all nodes connected to nodes of that outdegree
• YouTube differs, because extremely popular users are linked to by many unpopular users
• Orkut shows a bump due to undersampling

Slide 22: Joint degree distribution and scale-free behaviour
• Undersampling of low-degree nodes
• Celebrity-driven nature
• Cap on links

Slide 23: Densely connected core
• Removing 10% of the core nodes breaks the graph up into millions of very small SCCs
• Why an SCC? Directed links matter for actual communication
• The graphs show the results as nodes are removed, starting with the highest-degree nodes (left), and the path length as the graph is constructed, beginning with the highest-degree nodes (right)
• Sub-logarithmic growth

Slide 24: Tightly clustered fringe
• Based on the clustering coefficient, social network graphs show stronger clustering, most likely due to mutual friends
• Possibly because personal content is not shared

Slide 25: Groups
• Group sizes follow a power-law distribution
• Groups represent tightly clustered communities

Slide 26: Groups
• Orkut is a special case, maybe because of the partial crawl

Slide 27: Node Value Determination
1. Directed graph (current model)
   • Nodes with many incoming links (hubs) have value due to their connection to many users
   • This makes it easy to spread important information to the other nodes, e.g. DNS
   • Unhealthy in the case of spam or viruses: to send spam, a user first amasses friends to become a more important node

Slide 28: Node Value Determination
2. Link initiator (requires temporal information)
   • If user A requests a link with user B, does that mean that user B is more important?
   • Even though the graphs have a high level of link symmetry, this additional information can offset the symmetry
   • Unfortunately, the examined graphs do not have temporal information

Slide 29: Trust
• lendingclub.com, a Facebook application
• People are more willing to lend money to friends who are linked to them through a short path
• People are more willing to pay back those who are linked to them through a short path
• No indication of whether this actually works
• Does trust increase as degree increases?
• What credit rating and JDD does a person need to get a good interest rate?
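The trust argument on slide 29, like the path-length results on slide 20, hinges on how many hops separate two users. Below is a minimal Python sketch of that measurement, assuming an undirected friendship graph held in memory; the user names and the helpers `build_undirected` and `hop_distance` are hypothetical and not part of the study or of any site's API. It runs a breadth-first search and returns the hop count, or None if the two users are not connected.

```python
from collections import deque


def build_undirected(edges):
    """Adjacency sets for an undirected (fully symmetric) friendship graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distance(adj, src, dst):
    """Number of links on a shortest path from src to dst, found by BFS;
    returns None if dst cannot be reached from src."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in adj.get(node, ()):
            if nbr == dst:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None


# Hypothetical users: alice - bob - carol - dave, plus alice - erin.
friends = build_undirected([("alice", "bob"), ("bob", "carol"),
                            ("carol", "dave"), ("alice", "erin")])
print(hop_distance(friends, "alice", "dave"))   # 3
print(hop_distance(friends, "erin", "dave"))    # 4
print(hop_distance(friends, "alice", "frank"))  # None (not in the graph)
```

BFS fits here because friendship links are unweighted: nodes are dequeued in order of increasing distance, so the first time the target appears as a neighbour, the path found is a shortest one.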
Slide 30: Thank you
shahan@cs
