Measurement and Analysis of Cyberlocker Services (WWW 2011)
DESCRIPTION
Cyberlocker Services (CLS) such as RapidShare and Megaupload have recently become popular. The decline of Peer-to-Peer (P2P) file sharing has prompted various services, including CLS, to replace it. We propose a comprehensive multi-level characterization of the CLS ecosystem. We answer three research questions: (a) what is a suitable measurement infrastructure for gathering CLS workloads; (b) what are the characteristics of the CLS ecosystem; and (c) what are the implications of CLS on Web 2.0 (and the Internet). To the best of our knowledge, this work is the first to characterize the CLS ecosystem. The work will highlight the content, usage, performance, infrastructure, quality of service, and evolution characteristics of CLS.
TRANSCRIPT
Measurement and Analysis of Cyberlocker Services
Aniket Mahanti
University of Calgary, Canada
Introduction
• User-generated content has transformed how people share and disseminate information.
• The Web has recently witnessed the emergence of Cyberlocker services (CLS).
• Popular CLS include Rapidshare and Megaupload.
• CLS offer several advantages over P2P file sharing and new-age content sharing systems:
– Convenience, higher file availability, improved privacy, diverse content, and economic incentives.
Motivation
[Figure: Unique visitors (million), Apr-08 to Apr-10, for Rapidshare, Megaupload, and Mininova]
[Figure: CLS traffic (% of Internet traffic volume) as reported by Ipoque, 07 (Germany); Ipoque, 07 (Mid East); Sandvine, 09 (Global); Ipoque, 09 (Global); Maier et al., IMC 09 (European ISP); Antoniades et al., IMC 09 (Greek Univ.)]
• Labovitz et al. [SIGCOMM 10]: P2P traffic ↓ (-71%)
• Labovitz et al. [SIGCOMM 10]: CLS traffic ↑
Research Questions
1. What is a suitable measurement infrastructure for gathering CLS workloads?
– Measurement framework for data collection from multiple viewpoints.
2. What are the characteristics of the CLS ecosystem?
– Comprehensive multi-layered characterization of the CLS ecosystem.
3. What are the implications of CLS on Web 2.0 (and the Internet)?
– Evolution of CLS, caching, copyright issues.
Cyberlocker Architecture
CLS Service Structure
Feature Premium Free
Content
Upload file size limit Yes (*Megaupload) Yes
No Max. of downloads (*Rapidshare)
Upload any content Yes Yes
No File expiry
QoS No Wait time (*Mediafire)
Parallel downloads Yes
Performance Download rate priority Highest Low
Parallel connections Yes (*Mediafire)
*Exception
Methodology
Datasets (Data: Description: Salient Features)
• Local
– HTTP transaction logs (Jan-Dec/09): Over 500 GB of compressed data; over 5 billion HTTP transactions
– Connection summaries (Jan-Dec/09): Over 1 TB of compressed logs; over 3 billion Web flows
• Global
– CLS crawl (Mar-Jul/10): Over 1 million unique files (1% of all indexed files)
– Web analytics (Apr/08 – Jun/10): Over 2 years of data from 2 million users (1% of US Internet population)
• Supplementary
– File status requests: Status requests for every file in the traces
– Geolocation databases: All CLS address prefixes geolocated
– Query IP registries: ISP and organization of CLS
Data Analysis (HTTP Transactions)
• Develop signatures to distinguish free and premium users by leveraging clickstreams.
• Example shows how wait time is calculated.
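The slide's worked example did not survive in the transcript, so the following is only a minimal sketch of one way a wait time could be derived from a clickstream. It assumes that wait time is the gap between a client's request for a file's landing page and that client's subsequent download request. The `Transaction` record, its `kind` labels, and the `wait_times` function are hypothetical names introduced for illustration and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """One simplified HTTP transaction (hypothetical record layout)."""
    client: str       # anonymized client identifier
    timestamp: float  # seconds since the start of the trace
    kind: str         # "landing" (file page request) or "download" (file transfer request)


def wait_times(transactions):
    """Return (client, wait_seconds) pairs in order of download time.

    Assumption: the wait time is the interval between a client's most
    recent landing-page request and its next download request.
    """
    pending = {}  # client -> timestamp of the latest unmatched landing request
    waits = []
    for t in sorted(transactions, key=lambda t: t.timestamp):
        if t.kind == "landing":
            pending[t.client] = t.timestamp
        elif t.kind == "download" and t.client in pending:
            waits.append((t.client, t.timestamp - pending.pop(t.client)))
    return waits
```

A download with no preceding landing request is skipped in this sketch, since no wait can be measured for it.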
Content Dissemination Case Study
• CLS are an easy medium for users to disseminate content quickly.
• Content has many replicas on CLS compared to BitTorrent.
[Figure (a): Cumulative % of postings vs. time since broadcast (min, log scale from 10^0 to 10^5), for CLS and BitTorrent]
[Figure (b): Cumulative % of files vs. postings per episode (0 to 30), for BitTorrent and CLS]
Content Properties
[Figure (a): Cumulative % of content vs. content size (MB, log scale from 10^0 to 10^5), for Hotfile, Mediafire, Megaupload, and Rapidshare]
[Figure (b): Cumulative % of files vs. file age (days, 0 to 1600), for Hotfile, Mediafire, Megaupload, and Rapidshare]
• CLS are (generally) being used to host very large content.
• Active files are being hosted for long periods of time.
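The cumulative percentages plotted for content size and file age are empirical distributions. As a rough sketch of how one point on such a curve could be computed, the hypothetical helper `cumulative_percent` below returns the share of observations at or below a threshold. The function name and any sizes passed to it are illustrative assumptions, not values from the datasets.

```python
from bisect import bisect_right


def cumulative_percent(values, threshold):
    """Percentage of observations less than or equal to `threshold`.

    `values` could be content sizes in MB or file ages in days;
    this helper is illustrative and not part of the original study.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # bisect_right counts how many sorted observations are <= threshold.
    return 100.0 * bisect_right(ordered, threshold) / len(ordered)
```

Evaluating the helper over a range of thresholds yields the points of a cumulative curve.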
Performance and QoS [Rapidshare]
[Figure (a): Cumulative % of file downloads vs. download rate (KB/sec, log scale from 10^0 to 10^4), for Premium and Free users]
[Figure (b): Wait time (sec, 50 to 200) vs. file size (MB, 0 to 200)]
• Premium downloads are an order of magnitude faster than free ones.
• Wait times increase linearly with file size.
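A linear relationship of this kind could be checked with an ordinary least-squares line fit of wait time against file size. The sketch below is one possible way to do that; the function name `fit_line` is a hypothetical choice, and no measured values from the talk are reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Here xs would be file sizes (MB) and ys the observed wait times (sec).
    Returns (slope, intercept).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The slope estimates the additional wait in seconds per MB, and the intercept estimates the baseline wait for very small files.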
Summary
• Proposed a comprehensive multi-level characterization of the CLS ecosystem.
• Devised a measurement framework to collect datasets from multiple vantage points.
• CLS is one of the fastest growing Web services, with the potential to replace P2P as the dominant content sharing technology.
• This research will highlight the content, usage, performance, infrastructure, and quality of service characteristics of CLS.