measurement and analysis of online social networks


1

MEASUREMENT AND ANALYSIS OF ONLINE SOCIAL NETWORKS

Professor: Dr. Sheykh Esmaili

Presenters: Pourya Aliabadi

Boshra Ardallani

Paria Rakhshani

2

INTRODUCTION

The Internet has spawned different types of information sharing systems, including the Web.

MySpace (over 190 million users)

Orkut (over 62 million)

LinkedIn (over 11 million)

LiveJournal (over 5.5 million)

Unlike the Web, which is largely organized around content, online social networks are organized around users.

3

INTRODUCTION (CONT.)

Users join a network, publish their profile and create links to any other users with whom they associate.

The resulting social network provides a basis for maintaining social relationships, for finding users with similar interests, and for locating content.

An understanding of the graph structure of online social networks is necessary to evaluate current systems and to design future online social network based systems.

4

INTRODUCTION (CONT.)

Recent work has proposed the use of social networks to mitigate email spam, to improve Internet search, and to defend against Sybil attacks.

We obtained our data by crawling publicly accessible information on these sites.

5

INTRODUCTION (CONT.)

In online social networks, the indegree of a user node tends to match its outdegree. This differs from content graphs like the graph formed by Web hyperlinks, where the popular pages (authorities) and the pages with many references (hubs) are distinct.

We find that online social networks contain a large, strongly connected core of high-degree nodes, surrounded by many small clusters of low-degree nodes.

The high-degree core is critical for the flow of information in these networks.

6

ONLINE SOCIAL NETWORKING SITES

Online social networking sites are usually run by individual corporations.

Users. Users must register with a site, possibly under a pseudonym. Some sites allow browsing of public data without explicit sign-on.

Links. The social network is composed of user accounts and links between users. Some sites (e.g. Flickr, LiveJournal) allow users to link to any other user, without consent from the link target.

7

ONLINE SOCIAL NETWORKING SITES (CONT.)

Groups. Most sites enable users to create and join special interest groups.

Users can post messages to groups and upload shared content to the group. Certain groups are moderated: admission to such a group and postings to it are controlled by a user designated as the group's moderator.

Other groups are unrestricted, allowing any member to join and post messages or content.

8

IS THE SOCIAL NETWORK USED IN LOCATING CONTENT?

Only Orkut is a “pure” social networking site, in the sense that the primary purpose of the site is finding and connecting to new users.

Flickr, YouTube, and LiveJournal are used for sharing photographs, videos, and blogs, respectively.

9

WHY STUDY SOCIAL NETWORKS?

They are already at the heart of some very popular Web sites.

They will play an important role in future personal and commercial online interaction.

Studying them helps us understand the impact of online social networks on the future Internet.

We speculate how our data might be of interest to researchers in other disciplines.

10

SHARED INTEREST AND TRUST

Adjacent users in a social network tend to trust each other.

A number of research systems have been proposed to exploit this trust.

Adjacent users in a social network also tend to have common interests.

Users browse neighboring regions of their social network because they are likely to find content that is of interest to them.

11

IMPACT ON FUTURE INTERNET

Impact on future Internet

Impact on other disciplines

Sociologists can examine our data to test existing theories.

Studying the structure of online social networks may help improve the understanding of online campaigning and viral marketing.

Political campaigns have realized the importance of blogs in elections.

12

HOW TO GET DATASETS?

Sites are reluctant to give out data

Cannot enumerate the user list

Performed crawls of the user graph

Crawled using a cluster of 58 machines

Used APIs where available

Otherwise, used HTML screen scraping
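
A crawl of the user graph can be sketched as a breadth-first traversal that follows links outward from a seed user. This is a minimal illustration under stated assumptions, not the crawler used in the study: `fetch_friends` is a hypothetical callback standing in for a site API call or an HTML scraper, and `max_users` is an assumed safety limit.

```python
from collections import deque


def crawl(seed, fetch_friends, max_users=None):
    """Breadth-first crawl of a user graph starting from one seed user.

    fetch_friends(user) is a hypothetical callback returning the users that
    `user` links to (for example via a site API or a scraped profile page).
    Returns a dict mapping each fetched user to the list of its outgoing links.
    """
    graph = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        if max_users is not None and len(graph) >= max_users:
            break  # assumed safety limit so a crawl cannot run forever
        user = queue.popleft()
        friends = list(fetch_friends(user))
        graph[user] = friends
        for friend in friends:
            if friend not in seen:  # enqueue each user at most once
                seen.add(friend)
                queue.append(friend)
    return graph


# Toy stand-in for a site: user "e" links to "a", but nobody links to "e".
toy_site = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": [], "e": ["a"]}
crawled = crawl("a", lambda user: toy_site.get(user, []))
```

In the toy data, user "e" is never reached because the crawl only follows outgoing links, which illustrates why a crawl may miss part of the population.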

13

CHALLENGES IN CRAWLING LARGE GRAPHS

Need to crawl quickly

Underlying social networks are changing rapidly

Need to crawl completely

Social networks aren't necessarily connected: some users have no links, or belong to small clusters

Need to estimate the crawl coverage

14

HOW TO VERIFY SAMPLES

Obtain a random user sample

Conduct a crawl using these random users as seeds

See if these random nodes connect to the original WCC (weakly connected component)
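
The verification steps above can be sketched as follows. This is a simplified illustration, assuming the crawled links and a list of randomly chosen users are already in memory; the function names and toy data are hypothetical. Links are treated as undirected when building the WCC, and the coverage estimate is the fraction of random users that fall inside it.

```python
from collections import defaultdict, deque


def weakly_connected_component(links, start):
    """Return the set of users reachable from `start` when link direction is ignored."""
    undirected = defaultdict(set)
    for src, dst in links:
        undirected[src].add(dst)
        undirected[dst].add(src)
    component = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in undirected[node]:
            if neighbor not in component:
                component.add(neighbor)
                queue.append(neighbor)
    return component


def coverage_estimate(random_users, component):
    """Fraction of the random user sample that lies inside the crawled WCC."""
    if not random_users:
        raise ValueError("need at least one random user")
    hits = sum(1 for user in random_users if user in component)
    return hits / len(random_users)


# Toy data: "d" and "e" form a separate cluster, "f" has no links at all.
links = [("a", "b"), ("c", "b"), ("d", "e")]
wcc = weakly_connected_component(links, "a")
estimate = coverage_estimate(["a", "c", "d", "f"], wcc)
```

Two of the four sampled users belong to the component, so the estimated coverage for this toy graph is one half.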

15

DATASET FROM FLICKR

Used API to conduct the crawl

Obtained random users by guessing usernames (########@N00) to evaluate coverage

Covered 27% of user population, but remaining users have very few links

16

DATASET FROM LIVEJOURNAL

Used API to conduct the crawl

Obtained random users using special URL http://www.livejournal.com/random.bml

Crawl covered 95% of user population

17

DATASET FROM ORKUT

Used HTML screen-scraping to conduct the crawl

18

DATASET FROM YOUTUBE

Used API to conduct the crawl

Could not obtain random users: usernames are user-specified strings

Unable to estimate fraction of users covered

19

HIGH-LEVEL DATA CHARACTERISTICS

Metrics vary by orders of magnitude

However, the networks share many key properties

                          Flickr   LiveJournal   Orkut   YouTube
Number of Users           1.8M     5.3M          3.1M    1.2M
Avg. Friends per User     12.2     17.0          106     4.3
Number of User Groups     0.1M     7.5M          8.7M    30K
Avg. Groups per User      4.6      21.2          106     0.25

20

ANALYSIS OF NETWORK STRUCTURE

Characterize the structural properties of the four networks and compare them:

Link symmetry

Power-law node degrees

Correlation of indegree and outdegree

Path lengths and diameter

Link degree correlations

Densely connected core

Tightly clustered fringe

21

HOW ARE THE LINKS DISTRIBUTED?

The distributions of indegree and outdegree are similar

The underlying cause is link symmetry

22

LINK SYMMETRY

            YouTube   Flickr   LiveJournal   Orkut
Symmetry    79.1%     62.0%    73.5%         100.0%

High symmetry is possibly encouraged by sites informing users of new incoming links

Unlike other complex networks, such as the Web

Sites like cnn.com receive many more links than they give

High symmetry makes it harder to identify reputable sources
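
Link symmetry can be computed as the fraction of directed links whose reverse link also exists. The sketch below assumes links are given as (source, target) pairs; the function name and toy data are illustrative only.

```python
def link_symmetry(links):
    """Fraction of directed links (u, v) for which the reverse link (v, u) also exists."""
    link_set = set(links)  # ignore duplicate links
    if not link_set:
        raise ValueError("need at least one link")
    symmetric = sum(1 for src, dst in link_set if (dst, src) in link_set)
    return symmetric / len(link_set)


# a<->b is reciprocated; a->c and c->d are not.
links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
symmetry = link_symmetry(links)
```

Two of the four links are reciprocated, so the symmetry of this toy graph is 50%.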

23

POWER-LAW NODE DEGREES

All social networks show properties consistent with power-law networks.

The majority of the nodes have small degree, and a few nodes have significantly higher degree
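
One common way to quantify a power-law degree distribution is a maximum-likelihood estimate of its exponent. The sketch below uses the standard continuous estimator, alpha = 1 + n / sum(ln(x_i / x_min)), which is only an approximation for integer degrees. The cutoff `x_min` and the sample degrees are assumptions for illustration, and this is not necessarily the fitting procedure used in the study.

```python
import math


def powerlaw_exponent(degrees, x_min=1):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only degrees >= x_min are used; x_min is an assumed cutoff that a real
    analysis would have to choose carefully.
    """
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above x_min")
    return 1 + len(tail) / log_sum


# Made-up degree sequence: mostly small degrees, one much larger one.
degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
alpha = powerlaw_exponent(degrees)
```

A heavier tail (more very high-degree nodes) increases the log sum and therefore lowers the estimated exponent.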

CORRELATION OF INDEGREE AND OUTDEGREE

[Figure panels: outdegree vs. indegree in the Web; outdegree vs. indegree in social networks]

24

[Figure; panel labels: PW, CNN, OSN]

25

PATH LENGTHS AND DIAMETER

All four networks have short average path lengths, from 4.25 to 5.88

Network       Avg. path length   Diameter
Web           16.12              905
Flickr        5.67               27
LiveJournal   5.88               20
Orkut         4.25               9
YouTube       5.10               21
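
Average path length and diameter can be computed with breadth-first search from a set of source nodes. The sketch below is an illustration with a hypothetical adjacency dict. When `sources` is only a sample of the nodes, the average is an estimate and the longest distance found is a lower bound on the true diameter.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop counts from `source` to every node reachable from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def path_stats(adjacency, sources):
    """Average shortest-path length and longest shortest path seen from `sources`."""
    total, pairs, longest = 0, 0, 0
    for source in sources:
        for target, dist in bfs_distances(adjacency, source).items():
            if target != source:
                total += dist
                pairs += 1
                longest = max(longest, dist)
    if pairs == 0:
        raise ValueError("no reachable node pairs")
    return total / pairs, longest


# Toy graph: a chain a - b - c - d with links in both directions.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_len, diameter = path_stats(chain, list(chain))
```

Running a full search from every node is expensive on graphs with millions of users, which is why sampling the sources is a natural design choice.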

LINK DEGREE CORRELATIONS

Examine which users tend to connect to each other

Focus on:

Joint degree distribution: how often nodes of different degrees connect to each other

Scale-free behavior: a value calculated directly from the joint degree distribution of the graph

Assortativity: a measure of the likelihood for nodes to connect to other nodes with similar degrees
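
Assortativity is commonly expressed as the Pearson correlation between the degrees found at the two ends of each link. The sketch below is a simplified illustration for an undirected graph given as a list of edges; the function name and toy graphs are assumptions, and the study may have used a different formulation.

```python
from collections import Counter


def assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    if not edges:
        raise ValueError("need at least one edge")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Count every edge in both directions so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all link endpoints have the same degree")
    return cov / var


# A star is maximally disassortative: the hub only connects to degree-1 nodes.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
r_star = assortativity(star)
```

Values near +1 mean that nodes tend to link to nodes of similar degree, and values near -1 mean that high-degree nodes tend to link to low-degree nodes.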

26

27

JOINT DEGREE DISTRIBUTION AND SCALE-FREE BEHAVIOUR

28

DENSELY CONNECTED CORE

The core comprises between 1% and 10% of the highest-degree nodes

Removing 10% of core nodes results in breaking up the graph into millions of very small SCCs

Graphs show the results as nodes are removed, starting with the highest-degree nodes (left), and the path length as the graph is constructed, beginning with the highest-degree nodes (right)

Sub-logarithmic growth
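
The node-removal experiment can be sketched as: rank nodes by degree, delete the top fraction, and measure the largest component that remains. The sketch below is a simplification under stated assumptions: it treats links as undirected and measures connected components rather than SCCs, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict, deque


def largest_component_after_removal(edges, fraction):
    """Size of the largest connected component after removing the given
    fraction of nodes, highest degree first (links treated as undirected)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Rank by descending degree; break ties by name for a deterministic order.
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), str(n)))
    removed = set(ranked[:int(len(ranked) * fraction)])
    seen, largest = set(), 0
    for start in ranked:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in neighbors[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest


# Toy graph: one hub connected to nine low-degree users (10 nodes in total).
edges = [("hub", f"u{i}") for i in range(9)]
before = largest_component_after_removal(edges, 0.0)
after = largest_component_after_removal(edges, 0.1)
```

In this toy graph, removing the single highest-degree node leaves every remaining user isolated, which mirrors the fragmentation described above on a very small scale.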

29

CLUSTERING COEFFICIENT

Clustering coefficient C is a metric of cliquishness

Online social networks are tightly clustered

10,000 times more clustered than random graphs

5-50 times more clustered than random power-law graphs

$$C = \frac{\text{number of links between friends}}{\text{number of links that could exist}}$$
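
The clustering coefficient of a single user can be computed directly from this ratio. The sketch below is an undirected simplification: with k friends, the number of links that could exist among them is k(k-1)/2. The function name and toy graph are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficient(edges, node):
    """Links that exist among a node's friends, divided by links that could exist."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-links
            neighbors[u].add(v)
            neighbors[v].add(u)
    friends = neighbors[node]
    k = len(friends)
    if k < 2:
        return 0.0  # fewer than two friends: no pair of friends could be linked
    links_between = sum(1 for a, b in combinations(friends, 2) if b in neighbors[a])
    return links_between / (k * (k - 1) / 2)


# "a" has friends b, c, d; only b and c are linked to each other.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
c_a = clustering_coefficient(edges, "a")
c_b = clustering_coefficient(edges, "b")
```

User "a" has three possible links among its friends and one actual link, giving C = 1/3, while both of "b"'s friends are linked to each other, giving C = 1.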

30

TIGHTLY CLUSTERED FRINGE

Low-degree users show a high degree of clustering

Social network graphs show stronger clustering

31

GROUPS

Network       Groups   Usage   Avg. Size   Avg. C
Flickr        103K     21%     82          0.47
LiveJournal   7M       61%     15          0.81
Orkut         9M       13%     37          0.52
YouTube       30K      8%      10          0.34

32

WHAT DOES THE STRUCTURE LOOK LIKE

The networks contain a densely connected core of high-degree nodes.

This core links small groups of strongly clustered, low-degree nodes at the fringes of the network.

[Figure: octopus]

33

CONCLUSIONS

Structure of OSNs is significantly different from the Web

Higher degree of link symmetry in OSNs

Much higher levels of local clustering in OSNs

Privacy controls make graph crawling very difficult

Pure social networks differ from content sharing networks

34

Thanks