Measurement and Analysis of Cyberlocker Services (WWW 2011)

Measurement and Analysis of Cyberlocker Services Aniket Mahanti University of Calgary Canada



DESCRIPTION

Cyberlocker Services (CLS) such as RapidShare and Megaupload have recently become popular. The decline of Peer-to-Peer (P2P) file sharing has prompted various services including CLS to replace it. We propose a comprehensive multi-level characterization of the CLS ecosystem. We answer three research questions: (a) what is a suitable measurement infrastructure for gathering CLS workloads; (b) what are the characteristics of the CLS ecosystem; and (c) what are the implications of CLS on Web 2.0 (and the Internet). To the best of our knowledge, this work is the first to characterize the CLS ecosystem. The work will highlight the content, usage, performance, infrastructure, quality of service, and evolution characteristics of CLS.

TRANSCRIPT

Page 1: Measurement and Analysis of Cyberlocker Services (WWW 2011)

Measurement and Analysis of Cyberlocker Services

Aniket Mahanti

University of Calgary Canada

Page 2

Introduction

• User-generated content has transformed how people share and disseminate information.
• Web has witnessed the emergence of Cyberlocker services (CLS) lately.
• Popular CLS include Rapidshare and Megaupload.
• CLS offer several advantages over P2P file sharing and new-age content sharing systems:
– Convenience, higher file availability, improved privacy, diverse content, and economic incentives.

Page 3

Motivation

[Figure: Unique visitors (million), Apr-08 to Apr-10, for Rapidshare, Megaupload, and Mininova]

[Figure: CLS traffic (% of Internet traffic volume) as reported by Ipoque, 07 (Germany); Ipoque, 07 (Mid East); Sandvine, 09 (Global); Ipoque, 09 (Global); Maier et al., IMC 09 (European ISP); Antoniades et al., IMC 09 (Greek Univ.)]

• Labovitz et al. [SIGCOMM 10]: P2P traffic (-71%) ↓
• Labovitz et al. [SIGCOMM 10]: CLS traffic ↑

Page 4

Research Questions

1. What is a suitable measurement infrastructure for gathering CLS workloads?
– Measurement framework for data collection from multiple viewpoints.
2. What are the characteristics of the CLS ecosystem?
– Comprehensive multi-layered characterization of the CLS ecosystem.
3. What are the implications of CLS on Web 2.0 (and the Internet)?
– Evolution of CLS, caching, copyright issues.

Page 5

Cyberlocker Architecture


Page 6

CLS Service Structure

Feature Premium Free

Content

Upload file size limit Yes (*Megaupload) Yes

No Max. of downloads (*Rapidshare)

Upload any content Yes Yes

No File expiry

QoS No Wait time (*Mediafire)

Parallel downloads Yes

Performance Download rate priority Highest Low

Parallel connections Yes (*Mediafire)

*Exception

Page 7

Methodology


Page 8

Datasets

Data | Description | Salient Features
Local | HTTP transaction logs (Jan-Dec/09) | Over 500 GB of compressed data; over 5 billion HTTP transactions
Local | Connection summaries (Jan-Dec/09) | Over 1 TB of compressed logs; over 3 billion Web flows
Global | CLS crawl (Mar-Jul/10) | Over 1 million unique files (1% of all indexed files)
Global | Web analytics (Apr/08 – Jun/10) | Over 2 years of data from 2 million users (1% of US Internet population)
Supplementary | File status requests | Status requests for every file in the traces
Supplementary | Geolocation databases | All CLS address prefixes geolocated
Supplementary | Query IP registries | ISP and organization of CLS

Page 9

Data Analysis (HTTP Transactions)

• Develop signatures to distinguish free and premium users by leveraging clickstreams.

• Example shows how wait time is calculated.
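The slide's own example did not survive in this transcript. As a rough illustration only, a wait time could be derived from a clickstream along the following lines, assuming each HTTP transaction carries a client identifier, a timestamp, and a URL. The record layout, the `/files/` and `/dl/` URL patterns, the `cls.example` host, and the function name are all hypothetical and are not the signatures used in this work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    client: str       # anonymized client identifier (assumed field)
    timestamp: float  # seconds since the start of the trace (assumed field)
    url: str          # requested URL


def wait_time(clickstream: List[Transaction]) -> Optional[float]:
    """Seconds between the landing-page request and the download request.

    Assumes hypothetical URL patterns: '/files/' marks the landing page
    and '/dl/' marks the request that starts the actual file transfer.
    """
    landing = None
    for t in sorted(clickstream, key=lambda t: t.timestamp):
        if "/files/" in t.url and landing is None:
            landing = t.timestamp  # first visit to the file's landing page
        elif "/dl/" in t.url and landing is not None:
            return t.timestamp - landing  # download started: report the gap
    return None  # no complete landing-to-download sequence observed


# Made-up clickstream for one client, for illustration only.
stream = [
    Transaction("c1", 10.0, "http://cls.example/files/123/a.rar"),
    Transaction("c1", 11.5, "http://cls.example/img/logo.png"),
    Transaction("c1", 55.0, "http://cls.example/dl/123/a.rar"),
]
print(wait_time(stream))  # 45.0
```

Under these assumptions, a client whose wait time is close to zero would be a candidate premium user, while a long wait would suggest a free user. The actual signatures are not given in the slides.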

Page 10

Content Dissemination Case Study

• CLS are an easy medium for users to disseminate content quickly.
• There are many content replicas on CLS when compared to BitTorrent.

[Figure (a): Cumulative % of postings vs. time since broadcast (min, log scale), for CLS and BitTorrent]

[Figure (b): Cumulative % of files vs. postings per episode, for BitTorrent and CLS]

Page 11

Content Properties

[Figure (a): Cumulative % of content vs. content size (MB, log scale), for Hotfile, Mediafire, Megaupload, and Rapidshare]

[Figure (b): Cumulative % of files vs. file age (days), for Hotfile, Mediafire, Megaupload, and Rapidshare]

• CLS are (generally) being used to host very large content.
• Active files are being hosted for a long period of time.

Page 12

Performance and QoS [Rapidshare]

[Figure (a): Cumulative % of file downloads vs. download rate (KB/sec, log scale), for Premium and Free users]

[Figure (b): Wait time (sec) vs. file size (MB)]

• Premium downloads are an order of magnitude faster than free ones.
• Wait times increase linearly with file size.
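One way to check a linear relationship such as the one between wait time and file size is an ordinary least-squares fit. The sketch below is illustrative only: the `fit_line` helper is a hypothetical name, and the (file size, wait time) pairs are invented for the example, not read from the figure or taken from the traces.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented sample points (file size in MB, wait time in seconds).
sizes = [0.0, 50.0, 100.0, 150.0, 200.0]
waits = [50.0, 87.5, 125.0, 162.5, 200.0]

slope, intercept = fit_line(sizes, waits)
print(f"wait ~= {slope:.2f} * size + {intercept:.1f}")  # wait ~= 0.75 * size + 50.0
```

With real measurements, the residuals around the fitted line would indicate how well the linear description holds.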

Page 13

Summary

• Proposed a comprehensive multi-level characterization of the CLS ecosystem.

• Devised a measurement framework to collect datasets from multiple vantage points.

• CLS is one of the fastest growing Web services, with the potential to replace P2P as the dominant content sharing technology.

• This research will highlight the content, usage, performance, infrastructure, and quality of service characteristics of CLS.
