A Comparative Analysis of Web and P2P Traffic (WWW 2008)


  • 8/6/2019 A Comparative Analysis of Web and P2P Traffic (WWW 2008)

    1/23

    A Comparative Analysis of Web and P2P Traffic

    Naimul Basher, Aniket Mahanti, Anirban Mahanti, Carey Williamson, and Martin Arlitt

    WWW 2008, Beijing


    INTRODUCTION

    In the past, a significant proportion of Internet traffic came from Web applications using HTTP.

    Web traffic is distinguished by small flows, short-lived connections, asymmetric flow volumes, and well-defined port usage.

    The advent of Peer-to-Peer (P2P) file-sharing applications has triggered a paradigm shift in Internet data exchange.

    P2P usage has grown steadily since its inception, and recent empirical studies report that Web and P2P dominate today's Internet traffic.



    WEB AND P2P CHARACTERIZATION

    We use recent packet traces collected at a large university (30,000 students and employees) to extensively characterize and compare traffic generated by Web and P2P applications.

    We primarily focus on characterizing the behavior of these applications at the flow level and the host level.

    Our work develops flow-level distributional models that may be used to refine Internet traffic models for use in network simulations and emulation experiments.

    We also analyze and compare two P2P applications, BitTorrent and Gnutella.



    PREVIEW OF RESULTS

    Characteristic | Web | P2P
    Flow size | Introduces many mice but few elephant flows. | Introduces many mice and elephant flows.
    Flow IAT | Typically short IAT. | Typically long IAT.
    Flow duration | Typically short-lived. | Typically long-lived.
    Flow concurrency | Most hosts maintain more than one concurrent flow. | Many hosts maintain only one flow at a time.
    Transfer volume | Large transfers are dominated by downstream traffic. | Large transfers happen in either upstream or downstream direction.
    Geography | Most external hosts are located in the same geographic region. | External peers are globally distributed.



    TRACE COLLECTION METHODOLOGY

    Full packet traces were collected using lindump from the 100 Mbps full-duplex commercial Internet connection of the University of Calgary.

    Since P2P applications frequently use random port numbers, we used payload signatures to identify applications.

    We used Bro, a network intrusion detection system, to perform payload signature matching and map network flows to traffic types.

    Due to storage limitations, we used non-contiguous 1-hour traces collected each morning and evening on Thursday through Sunday between April 6 and April 30, 2006.
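To illustrate why payload signatures work where port numbers do not, here is a minimal Python sketch of prefix-based classification. It is not the Bro rule set used in the study: the `SIGNATURES` table and the `classify_flow` helper are hypothetical names, and the prefixes are only the well-known opening bytes of each protocol (the BitTorrent handshake string, the Gnutella connect banner, and common HTTP methods).

```python
# Illustrative payload prefixes only; a production rule set would be far larger.
SIGNATURES = {
    "BitTorrent": (b"\x13BitTorrent protocol",),
    "Gnutella": (b"GNUTELLA CONNECT/", b"GNUTELLA/"),
    "Web": (b"GET ", b"POST ", b"HEAD ", b"HTTP/1."),
}


def classify_flow(first_payload: bytes) -> str:
    """Map the first payload bytes of a TCP flow to a traffic type."""
    for traffic_type, prefixes in SIGNATURES.items():
        # bytes.startswith accepts a tuple and matches any of its prefixes.
        if first_payload.startswith(prefixes):
            return traffic_type
    return "Unknown"


print(classify_flow(b"GNUTELLA CONNECT/0.6\r\n"))
```

A sketch this small misclassifies some traffic. For example, Gnutella file downloads that are issued as HTTP-style GET requests would be labeled Web here, which is one reason real signature sets inspect more than the first few bytes.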



    TRACE SUMMARY

    TCP Trace Statistics | Count
    Number of Flows | 23 million
    Number of Packets | 945 million
    Data Volume | 585 GB


    Internet Applications | Flows | Bytes
    Web | 40% | 35%
    P2P | 3% | 33%

    P2P Applications | Flows | Bytes
    Gnutella | 21% | 78%
    BitTorrent | 61% | 17%


    CHARACTERIZATION METRICS

    Flow-level characterization metrics

    Flow size: the total bytes transferred during a connection. We label flows as mice if they transfer < 10 KB and elephants if they transfer > 5 MB.

    Flow duration: the time between the start and the end of a TCP flow.

    Flow inter-arrival time (IAT): the time between two consecutive flow arrivals.
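The flow-level metrics above can be sketched in a few lines of Python. The `Flow` record and the helper names are hypothetical, and the byte thresholds assume decimal units (1 KB = 1,000 bytes), which the slides do not specify.

```python
from dataclasses import dataclass

MICE_MAX_BYTES = 10 * 1000             # "< 10 KB", decimal units assumed
ELEPHANT_MIN_BYTES = 5 * 1000 * 1000   # "> 5 MB", decimal units assumed


@dataclass
class Flow:
    start: float  # flow start time in seconds
    end: float    # flow end time in seconds
    size: int     # total bytes transferred


def label(flow: Flow) -> str:
    """Label a flow as a mouse, an elephant, or neither."""
    if flow.size < MICE_MAX_BYTES:
        return "mouse"
    if flow.size > ELEPHANT_MIN_BYTES:
        return "elephant"
    return "other"


def duration(flow: Flow) -> float:
    """Time between the start and the end of the flow."""
    return flow.end - flow.start


def inter_arrival_times(flows: list[Flow]) -> list[float]:
    """Gaps between consecutive flow arrivals, ordered by start time."""
    starts = sorted(f.start for f in flows)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]
```

Sorting by start time inside `inter_arrival_times` means the input does not need to be in arrival order.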

    Host-level characterization metrics

    Flow concurrency: the maximum number of TCP flows a single host uses concurrently to transfer content.

    Transfer volume: the total bytes transferred to (downstream) and from (upstream) a host.

    Geographic distribution: the distribution of the shortest distance between hosts and our campus along the surface of the Earth.
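Two of the host-level metrics lend themselves to a short sketch: flow concurrency as a sweep over flow start and end events, and the shortest surface distance as a great-circle (haversine) calculation. The function names are hypothetical, and the haversine formula with a mean Earth radius of 6371 km is an assumption; the slides do not say how the distance was computed.

```python
import math


def max_concurrency(intervals: list[tuple[float, float]]) -> int:
    """Maximum number of simultaneously open flows, given (start, end) pairs."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuple ordering places (t, -1) before (t, 1), so a flow that ends at
    # time t is closed before a flow that starts at t is counted.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius_km: float = 6371.0) -> float:
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

Applying `max_concurrency` to the flows of one host gives that host's concurrency value; applying `great_circle_km` to each external host and the campus location gives the samples for the geographic distribution.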



    WEB AND P2P FLOW SIZES


    P2P applications generate more small flows and more very large flows than Web applications.

    There are three sources of small flows in P2P: extensive signaling, aborted transfers, and connection attempts to non-responsive peers.

    We also find a few very large P2P flows that are much larger than the occasional large Web transfers.

    P2P model: Hybrid Pareto and Weibull

    Web model: Hybrid Pareto and Weibull
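For readers who want to use a hybrid model in a simulation, the following Python sketch shows one common way to draw flow sizes from a two-part distribution: a Weibull body below a cutoff and a Pareto tail above it. This is an assumption about what "hybrid" means here. The slides name the distribution families but give no parameters or split point, so `body_weight`, `cutoff`, and the shape and scale values are all hypothetical.

```python
import random


def sample_hybrid_flow_size(rng: random.Random, body_weight: float,
                            weibull_scale: float, weibull_shape: float,
                            cutoff: float, pareto_shape: float) -> float:
    """Draw one flow size from a Weibull body with a Pareto tail."""
    if rng.random() < body_weight:
        # Body: Weibull draws, rejected until one falls at or below the cutoff.
        while True:
            size = rng.weibullvariate(weibull_scale, weibull_shape)
            if size <= cutoff:
                return size
    # Tail: paretovariate returns values >= 1, so the tail starts at the cutoff.
    return cutoff * rng.paretovariate(pareto_shape)


rng = random.Random(1)
# Hypothetical parameters, chosen only to make the example run.
sizes = [sample_hybrid_flow_size(rng, 0.9, 5000.0, 0.7, 100000.0, 1.2)
         for _ in range(2000)]
print(max(sizes) > 100000.0)
```

The rejection loop is simple but slow if the cutoff sits deep in the lower part of the Weibull body; an inverse-CDF draw on the truncated distribution would avoid that.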


    GNUTELLA/BITTORRENT FLOW SIZES

    Gnutella and BitTorrent generate a similar percentage of small flows, mostly control data exchanged between peers.

    Gnutella appears to generate more large flows than BitTorrent.

    BitTorrent uses file segmentation to split an object into multiple equal-sized pieces and downloads them using parallel flows. Gnutella typically downloads the entire object from a single peer.

    BitTorrent model: Hybrid Lognormal and Pareto

    Gnutella model: Hybrid Lognormal and Pareto


    MICE AND ELEPHANT PHENOMENON

    Applications | Mice Flows | Mice Bytes | Elephant Flows | Elephant Bytes
    Web | 76% | 9% | 0.04% | 15%
    P2P | 93% | 0.5% | 1% | 93%
    Gnutella | 83% | 0.1% | 3% | 93%
    BitTorrent | 95% | 2% | 0.1% | 95%


    Web mice flows account for a higher proportion of the total Web bytes than P2P mice flows do of the total P2P bytes.

    P2P elephant flows are significantly larger than Web elephant flows.

    BitTorrent mice flows are, on average, larger than Gnutella mice flows because of BitTorrent's intense signaling activity.

    BitTorrent elephant flows are, on average, larger than Gnutella elephant flows. Gnutella users share mostly audio files, while BitTorrent users share more video files.
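The four percentage columns in the mice and elephant table can be computed from a list of flow sizes, as in the Python sketch below. The function name `mice_elephant_shares` is hypothetical, and the thresholds again assume decimal units for the 10 KB and 5 MB limits.

```python
def mice_elephant_shares(flow_sizes: list[int],
                         mice_max: int = 10_000,
                         elephant_min: int = 5_000_000) -> dict[str, float]:
    """Fraction of flows and of bytes contributed by mice and by elephants."""
    total_bytes = sum(flow_sizes)
    if not flow_sizes or total_bytes == 0:
        raise ValueError("need at least one flow with a non-zero size")
    mice = [s for s in flow_sizes if s < mice_max]
    elephants = [s for s in flow_sizes if s > elephant_min]
    return {
        "mice_flows": len(mice) / len(flow_sizes),
        "mice_bytes": sum(mice) / total_bytes,
        "elephant_flows": len(elephants) / len(flow_sizes),
        "elephant_bytes": sum(elephants) / total_bytes,
    }


# Nine small flows and one very large flow: few elephants, most of the bytes.
shares = mice_elephant_shares([1_000] * 9 + [10_000_000])
print(shares["mice_flows"], shares["elephant_bytes"])
```

The toy input reproduces the pattern in the P2P rows: mice dominate the flow count while a small number of elephants carry nearly all the bytes.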


    WEB AND P2P INTER-ARRIVAL TIMES


    Web flow IATs are much shorter than those of P2P flows.

    Web traffic has a higher arrival rate (80 flows/sec) than P2P traffic (6 flows/sec).

    Another factor contributing to the lower arrival rate and the longer IAT values for P2P flows is the persistent nature of their TCP connections.

    P2P model: Hybrid Weibull and Pareto

    Web model: Two-mode Weibull


    WEB AND P2P FLOW DURATIONS


    Approx. 70% of Web flow durations are < 1 sec, indicating low response times for Web requests because of the good Internet connectivity on our campus.

    Approx. 30% of P2P flows are shorter than 30 sec. These are failed, aborted, or signaling flows.

    There are a few long-duration P2P mice flows due to repeated unsuccessful connection attempts.

    Approx. 40% of P2P flow durations are between 20 and 200 sec. These reflect bandwidth-limited connections.

    P2P model: Hybrid Weibull and Pareto

    Web model: Two-mode Pareto


    GNUTELLA/BITTORRENT FLOW DURATIONS


    BitTorrent flows, on average, last longer than Gnutella flows.

    BitTorrent's longer flows result from its protocol architecture: rarest-first piece selection, a fixed number of permitted uploads/downloads, and persistent connections.

    Gnutella can use a single flow to download an object and does not need to share bandwidth.

    BitTorrent model: Hybrid Lognormal and Pareto

    Gnutella model: Hybrid Lognormal and Pareto


    WEB AND P2P FLOW CONCURRENCY


    Surprisingly, many P2P hosts in our network maintain only a single TCP connection.

    A significant proportion of internal Web hosts maintain more than one concurrent TCP connection. Web browsers often initiate multiple concurrent connections to transfer content in parallel.

    A high degree of Web flow concurrency (> 30) is due to Web proxies and content distribution nodes.


    GNUTELLA/BT FLOW CONCURRENCY


    Most Gnutella hosts connect with only one host at a time.

    We observed a few Gnutella hosts with > 10 concurrent TCP connections. These hosts acted as super-peers in Gnutella's peer hierarchy.

    Most BitTorrent hosts exhibit a high degree of flow concurrency, which is a natural occurrence in BitTorrent.


    WEB AND P2P TRANSFER VOLUME

    Approx. 50% of Web and P2P hosts transfer small amounts of data (< 1 MB) and are typically active for < 100 sec.

        These are P2P hosts that repeatedly yet unsuccessfully attempt to connect to peers.

        These are Web hosts that browse the Web, run widgets that periodically retrieve information from the Web, or download small files.

    Approx. 35% of Web and 15% of P2P hosts transfer < 10 MB of data and are active for < 1000 sec.

        These are P2P hosts that share small objects.

        These are Web hosts that browse the Web for prolonged periods, download software or multimedia, or use HTTP-based streaming.


    P2P TRANSFER SYMMETRY

    System | Freeloader | Fair-share | Benefactor
    Gnutella | 57% | 10% | 33%
    BitTorrent | 10% | 40% | 50%


    Transfer symmetry is a major concern for P2P system developers, who want to encourage fair sharing among participating peers.

    We observe more fairness in BitTorrent and more freeloading in Gnutella.

    BitTorrent's tit-for-tat mechanism encourages uploading for the opportunity to download.

    Gnutella host behavior appears to be dominated by extreme upstream and downstream transfers.
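One way to assign hosts to the three categories in the table is to compare each host's upstream and downstream byte counts, as in this Python sketch. The slides do not state how the categories were defined, so the `fair_band` factor of 2 and the `classify_peer` helper are hypothetical choices for illustration only.

```python
def classify_peer(upstream_bytes: int, downstream_bytes: int,
                  fair_band: float = 2.0) -> str:
    """Classify a peer by the balance of its upload and download volume.

    fair_band is a hypothetical tolerance: a peer counts as fair-share when
    neither direction exceeds the other by more than this factor.
    """
    if upstream_bytes == 0 and downstream_bytes == 0:
        return "inactive"
    if downstream_bytes > fair_band * upstream_bytes:
        return "freeloader"   # downloads far more than it uploads
    if upstream_bytes > fair_band * downstream_bytes:
        return "benefactor"   # uploads far more than it downloads
    return "fair-share"


print(classify_peer(upstream_bytes=500, downstream_bytes=600))
```

Comparing products instead of dividing avoids a division by zero for hosts that transfer in only one direction.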


    WEB AND P2P HEAVY HITTERS


    Heavy hitters are the few hosts that account for much of the traffic volume transferred.

    Heavy hitters are present in both Web and P2P.

    Most P2P heavy hitters are either freeloaders or benefactors.

    The total amount of data transferred by the top 10% of Web and P2P hosts follows a power-law distribution.

    Top-ranked P2P hosts transfer an order of magnitude more data than top-ranked Web hosts.
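A simple way to quantify heavy hitters is to rank hosts by transfer volume and measure the share of bytes carried by the top fraction. The Python sketch below does this; the `top_share` name and the host identifiers are hypothetical, and the 10% default mirrors the "top 10%" mentioned above.

```python
import math


def top_share(host_bytes: dict[str, int], fraction: float = 0.10) -> float:
    """Share of total bytes transferred by the top `fraction` of hosts."""
    volumes = sorted(host_bytes.values(), reverse=True)
    total = sum(volumes)
    if total == 0:
        raise ValueError("need at least one host with non-zero volume")
    # Always include at least one host, even for very small populations.
    top_count = max(1, math.ceil(fraction * len(volumes)))
    return sum(volumes[:top_count]) / total


# Ten hypothetical hosts: nine light users and one heavy hitter.
hosts = {f"host{i}": 1 for i in range(9)}
hosts["heavy"] = 91
print(top_share(hosts))
```

Plotting the ranked volumes on log-log axes is the usual next step for checking whether the top hosts follow a power law.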


    WEB AND P2P GEOGRAPHIC DISTRIBUTION

    Approx. 75% of external Web hosts are in North America; Europe and Asia account for 10% each.

    A majority of our campus Web users are English-speaking, and thus are likely to visit Web sites located in predominantly English-speaking countries.

    Approx. 60% of P2P hosts are located outside North America.

    This indicates that connectivity between P2P hosts does not strongly rely on host locality; rather, it depends on resource availability during the connection establishment phase.


    GNUTELLA/BT GEOGRAPHIC DISTRIBUTION


    Approx. 70% of Gnutella hosts are located in North America.

    This suggests either that Gnutella peers prefer to connect with hosts in close proximity or that Gnutella clients are widely used in North America for file sharing.

    Approx. 30% of BitTorrent hosts are located in North America and approx. 40% are located in Europe.

    We believe that the list of trackers is created based on host bandwidth availability in a swarm, and we see a bias towards regions with high broadband penetration.


    NETWORK TRAFFIC MANAGEMENT


    At the University of Calgary, traffic is managed using a commercial packet shaping device.

    At the time of capture, the network policy was to group together all identified P2P flows and collectively limit their bandwidth to 56 Kbps.

    We do not observe a strong positive correlation between flow size and duration.

    Some P2P flows are indeed identified and limited by the traffic shaper; however, we see many other P2P flows that escaped detection by the traffic shaper.

    Our results provide a snapshot of Web and P2P characteristics from a large edge network, and should be representative of other edge networks with similar user populations and network management policies.
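The size and duration check mentioned above can be reproduced with a Pearson correlation coefficient. The Python sketch below is one way to compute it; the slides do not say which correlation measure was used, so Pearson is an assumption. The example data models the idealized case in which every flow is held to exactly 56 Kbps, a simplification of the collective limit described above.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Idealized shaping: duration grows linearly with size at 56,000 bits/sec.
sizes_bytes = [1e5, 1e6, 5e6]
durations_sec = [size * 8 / 56_000 for size in sizes_bytes]
print(pearson(sizes_bytes, durations_sec))
```

If all P2P flows were rate-limited in this way, the coefficient would be close to 1. A weak correlation in the measured data is therefore consistent with many flows bypassing the shaper.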


    RESULT HIGHLIGHTS

    Characteristic | Web | P2P
    Flow size | Introduces many mice but few elephant flows. | Introduces many mice and elephant flows.
    Flow IAT | Typically short IAT. | Typically long IAT.
    Flow duration | Typically short-lived. | Typically long-lived.
    Flow concurrency | Most hosts maintain more than one concurrent flow. | Many hosts maintain only one flow at a time.
    Transfer volume | Large transfers are dominated by downstream traffic. | Large transfers happen in either upstream or downstream direction.
    Geography | Most external hosts are located in the same geographic region. | External peers are globally distributed.



    SUMMARY AND FUTURE WORK

    Our work presented an extensive characterization of Web and P2P traffic using full packet traces collected at a large edge network.

    We observed a number of contrasting features between Web and P2P traffic using flow-level and host-level metrics.

    Flow-level distributional models were developed for Web and P2P traffic, which can be used in network simulation and emulation experiments.

    Traffic from other networks should be studied to facilitate the development of general models for Web and P2P traffic.

    The impact of other non-Web applications, such as P2P VoIP and P2P IPTV, on Web-based applications can be studied as well.
