Transport Layer Identification of P2P Traffic

Uploaded by mukesh-katole on 06-Apr-2018

  • 8/2/2019 Transport Layer Identification of P2P Traffic

    1/30

    Transport Layer Identification of P2P Traffic

    T. Karagiannis, A. Broido, M. Faloutsos, K. Claffy

    2/30

    Outline

    Introduction

    Related work

    Payload analysis & limitations

    Non-payload identification

    Experiments & Evaluation

    P2P traffic trends

    Conclusions

    3/30

    Characteristics of P2P Traffic

    Traffic volume grows rapidly

    Frequent upgrades & emergence of new protocols

    Disguise traffic to circumvent firewalls & legal issues

    Non-standard, proprietary protocols (poorly documented)

    Operate on arbitrary port numbers

    Support payload encryption

    4/30

    Identification Methodology

    Examining packet payload

    Signature-based methodology

    Limitations

    Identification at the transport layer

    Based on flow patterns & P2P behaviors

    Advantages

    5/30

    Contributions

    Develop a methodology for P2P traffic profiling by identifying flow patterns and behavioral characteristics

    Evaluate its effectiveness by comparing it with payload analysis

    Show that P2P traffic continues to grow by analyzing backbone traces

    6/30

    Previous Work

    Detailed characterization of a small subset of P2P protocols & networks

    Properties of topology, bandwidth, caching & availability, etc.

    Signature-based traffic identification

    Traffic estimation of P2P applications with fixed ports

    7/30

    Payload Analysis

    8/30

    Payload Analysis

    M1: Flag a flow as P2P if its src or dst port number matches one of the well-known P2P port numbers.

    M2: Flag a flow as P2P if the 16-byte payload of any packet matches one of the signatures; otherwise flag it as non-P2P. This gives a loose lower bound on P2P volume.

    M3: Hash the {src, dst} IP pair of each flow flagged as P2P into a table. Flag flows containing an IP address in the table as possible P2P even if no payload matches.
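A minimal Python sketch of M1-M3, for illustration only. The port list and byte signatures below are assumptions (commonly cited defaults for FastTrack, eDonkey, Gnutella and BitTorrent), not the paper's tables, and the `Flow` record and function names are hypothetical. A Python set plays the role of the hash table in M3.

```python
from dataclasses import dataclass

# Illustrative values only (assumption): the paper's own port list and
# 16-byte signature table are not reproduced in these slides.
WELL_KNOWN_P2P_PORTS = {1214, 4662, 6346, 6347, *range(6881, 6890)}
PAYLOAD_SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent", b"\xe3")


@dataclass(frozen=True)
class Flow:  # hypothetical flow record
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payloads: tuple  # captured payload bytes of each packet in the flow


def m1_port_match(flow: Flow) -> bool:
    """M1: P2P if either port is a well-known P2P port."""
    return flow.src_port in WELL_KNOWN_P2P_PORTS or flow.dst_port in WELL_KNOWN_P2P_PORTS


def m2_signature_match(flow: Flow) -> bool:
    """M2: P2P if the first 16 captured bytes of any packet begin with a signature."""
    return any(p[:16].startswith(PAYLOAD_SIGNATURES) for p in flow.payloads)


def m3_label(flows: list) -> list:
    """M3: remember both IPs of every M2 match, then label other flows that
    involve one of those IPs as possible P2P."""
    p2p_ips = set()  # the "hash table" of IPs seen in signature-matched flows
    for f in flows:
        if m2_signature_match(f):
            p2p_ips.update((f.src_ip, f.dst_ip))
    labels = []
    for f in flows:
        if m2_signature_match(f):
            labels.append("p2p")
        elif f.src_ip in p2p_ips or f.dst_ip in p2p_ips:
            labels.append("possible-p2p")
        else:
            labels.append("non-p2p")
    return labels
```

With these assumed signatures, a flow whose captured payload starts with the BitTorrent handshake is labeled `p2p`, and an HTTP flow from the same host is then labeled `possible-p2p` under M3.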

    9/30

    Limitations

    Captured payload size

    Only first 16 bytes of payload

    Only 4 bytes in older traces

    HTTP requests

    Encryption

    Other P2P protocols

    Unidirectional traces

    10/30

    Non-payload Identification

    Two main heuristics:

    {src, dst} IP pairs that use both TCP and UDP to transfer data

    The behavior of peers, studied through the connection characteristics of {IP, port} pairs

    11/30

    High-level description

    Data processing

    Build the flow table

    Collect information on various characteristics

    Identification of potential P2P pairs

    Based on the two P2P heuristics

    Eliminate false positives

    Using additional heuristics that characterize non-P2P traffic
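As a rough sketch of the first step, building the flow table, the snippet below aggregates packet records into per-flow counters. It assumes a hypothetical packet tuple layout, keys flows on the directional 5-tuple, and applies no flow timeout; each of these is a simplification chosen for illustration.

```python
def build_flow_table(packets):
    """Aggregate packets into a flow table keyed by the 5-tuple.

    packets: hypothetical (timestamp, src_ip, dst_ip, src_port, dst_port,
    proto, length) tuples. No flow timeout is applied (assumption), so all
    packets sharing a 5-tuple in the trace form one flow.
    """
    table = {}
    for ts, src_ip, dst_ip, src_port, dst_port, proto, length in packets:
        key = (src_ip, dst_ip, src_port, dst_port, proto)
        entry = table.setdefault(
            key, {"packets": 0, "bytes": 0, "first": ts, "last": ts}
        )
        entry["packets"] += 1
        entry["bytes"] += length
        # Track the flow's time span even if packets arrive out of order.
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
    return table
```

Per-flow packet and byte counts are enough to derive later quantities such as average packet size (bytes divided by packets) or to drop one-packet flows.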

    12/30

    TCP/UDP Heuristic

    Concurrent use of both TCP & UDP is typical of many P2P protocols

    Look for {src, dst} IP pairs that use both TCP & UDP to identify P2P hosts

    Other protocols also use TCP & UDP concurrently: DNS, NETBIOS, IRC, gaming, streaming

    They use fixed, well-known ports

    13/30

    TCP/UDP Heuristic

    If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P as long as neither the src nor the dst port is in the set listed in Table 3
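The TCP/UDP rule can be sketched in Python as follows. `EXCLUDED_PORTS` is a hypothetical stand-in for Table 3, and "concurrently" is approximated by assuming all input flows come from one short time window; both are simplifications, not the paper's exact procedure.

```python
from collections import defaultdict

# Hypothetical stand-in for Table 3 (not reproduced here): ports of
# non-P2P services that also use TCP and UDP, e.g. DNS, NetBIOS, IRC.
EXCLUDED_PORTS = {53, 137, 138, 139, 6667}


def tcp_udp_p2p_flows(flows):
    """Return the flows the TCP/UDP heuristic would label P2P.

    flows: (src_ip, dst_ip, src_port, dst_port, proto) tuples, all assumed
    to fall in one short time window, so "both protocols seen" approximates
    "used concurrently".
    """
    flows = list(flows)
    protos_by_pair = defaultdict(set)
    for src_ip, dst_ip, _sp, _dp, proto in flows:
        # Key on the unordered IP pair so both directions count together.
        protos_by_pair[frozenset((src_ip, dst_ip))].add(proto)
    return [
        (src_ip, dst_ip, sp, dp, proto)
        for src_ip, dst_ip, sp, dp, proto in flows
        if {"tcp", "udp"} <= protos_by_pair[frozenset((src_ip, dst_ip))]
        and sp not in EXCLUDED_PORTS
        and dp not in EXCLUDED_PORTS
    ]
```

The port check is applied per flow, matching the slide's wording: a pair using both protocols only contributes P2P flows whose ports avoid the excluded set.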

    14/30

    Connection Pattern Heuristic

    P2P: for a {IP, port} pair,

    N(distinct connected ports) = N(distinct connected IPs)

    Web: for {w, 80} pair,

    N(distinct connected ports) ≫ N(distinct connected IPs)

    because a host may initiate more than one concurrent connection for parallel downloading
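A minimal sketch of the connection pattern heuristic: for each {IP, port} pair it counts distinct connected IPs and distinct connected ports and compares them. The `min_peers` threshold and the three labels are hypothetical choices for illustration; the slides only state the equality for P2P and the larger port count for Web.

```python
from collections import defaultdict


def connection_pattern(flows, min_peers=3):
    """Compare distinct connected IPs and ports for each {IP, port} pair.

    flows: (src_ip, src_port, dst_ip, dst_port) tuples.
    min_peers: hypothetical minimum number of distinct peer IPs before a
    pair is judged at all.
    """
    peer_ips = defaultdict(set)
    peer_ports = defaultdict(set)
    for s_ip, s_port, d_ip, d_port in flows:
        # Record the flow from the point of view of both endpoints.
        for me, peer in (((s_ip, s_port), (d_ip, d_port)),
                         ((d_ip, d_port), (s_ip, s_port))):
            peer_ips[me].add(peer[0])
            peer_ports[me].add(peer[1])
    labels = {}
    for pair, ips in peer_ips.items():
        if len(ips) < min_peers:
            continue
        n_ports = len(peer_ports[pair])
        if n_ports == len(ips):
            labels[pair] = "p2p-like"  # as many ports as peer IPs
        elif n_ports > len(ips):
            labels[pair] = "web-like"  # parallel connections per client
        else:
            labels[pair] = "other"
    return labels
```

For example, a host on port 6881 talking to three peers, each from a single port, is labeled `p2p-like`, while a server on port 80 receiving two parallel connections from each of three clients is labeled `web-like`.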

    15/30

    False positives

    Heuristics for reducing false positives

    Mail server

    DNS

    Gaming

    Malware

    Others

    16/30

    Mail Server

    Mail server behavior resembles the pattern targeted by the {IP, port} heuristic

    Examine flows with port numbers 25, 110, 113
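One way to sketch the mail-server filter, assuming (this is not spelled out on the slide) that an IP counts as a mail server when it owns port 25, 110 or 113 in some flow, and that all of its {IP, port} pairs are then rejected. Function names and the flow tuple layout are hypothetical.

```python
MAIL_PORTS = {25, 110, 113}  # SMTP, POP3, ident (ports listed on the slide)


def mail_servers(flows):
    """Return IPs that own a mail-related port in any flow (assumed rule).

    flows: (src_ip, src_port, dst_ip, dst_port) tuples.
    """
    servers = set()
    for s_ip, s_port, d_ip, d_port in flows:
        # Only the endpoint that uses the mail port is marked, so clients
        # connecting to port 25 from ephemeral ports are not.
        if s_port in MAIL_PORTS:
            servers.add(s_ip)
        if d_port in MAIL_PORTS:
            servers.add(d_ip)
    return servers


def drop_mail_server_pairs(pairs, servers):
    """Remove candidate {IP, port} pairs whose IP is a mail server."""
    return {(ip, port) for ip, port in pairs if ip not in servers}
```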

    17/30

    DNS

    DNS concurrently uses TCP & UDP on port 53

    For flows where src port = dst port < 501, both the src and dst {IP, port} pairs are considered non-P2P
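The DNS rule translates almost directly into code. This sketch assumes flows are given as hypothetical `(src_ip, src_port, dst_ip, dst_port)` tuples and returns the {IP, port} pairs to mark as non-P2P.

```python
def non_p2p_same_port_pairs(flows, limit=501):
    """Return {IP, port} pairs marked non-P2P because src port == dst port < limit.

    flows: (src_ip, src_port, dst_ip, dst_port) tuples.
    """
    rejected = set()
    for s_ip, s_port, d_ip, d_port in flows:
        if s_port == d_port and s_port < limit:
            # Both endpoints of the flow are considered non-P2P.
            rejected.add((s_ip, s_port))
            rejected.add((d_ip, d_port))
    return rejected
```

Server-to-server DNS (53 to 53) and NTP (123 to 123) traffic both match, while a flow between two hosts on the same high port, such as 6346, does not.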

    18/30

    Gaming & Malware

    Many flows to different IPs/ports, carrying packets of the same size
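A hedged sketch of the same-packet-size check: it flags {IP, port} pairs that reach many distinct IPs while their flows show only one or two distinct average packet sizes. Both thresholds are hypothetical, and rounding average sizes to whole bytes is an assumption.

```python
from collections import defaultdict


def same_size_suspects(flows, min_ips=5, max_sizes=2):
    """Flag {IP, port} pairs whose many flows carry nearly identical packet sizes.

    flows: (ip, port, peer_ip, n_packets, n_bytes) records, already oriented
    so that (ip, port) is the pair under examination; n_packets >= 1.
    min_ips, max_sizes: hypothetical thresholds, not taken from the slides.
    """
    peers = defaultdict(set)
    avg_sizes = defaultdict(set)
    for ip, port, peer_ip, n_packets, n_bytes in flows:
        peers[(ip, port)].add(peer_ip)
        avg_sizes[(ip, port)].add(round(n_bytes / n_packets))
    return {
        pair for pair in peers
        if len(peers[pair]) >= min_ips and len(avg_sizes[pair]) <= max_sizes
    }
```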

    19/30

    Gaming & Malware

    20/30

    Other Heuristics

    Scans

    Count the number of {IP, port} pairs associated with a specific IP to eliminate port scans

    One-packet pairs

    Remove one-packet flows
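Both filters can be sketched briefly in Python. The scan test counts distinct destination {IP, port} pairs per source IP; `max_targets` is a hypothetical threshold, and the flow tuple layout is assumed.

```python
from collections import defaultdict


def drop_one_packet_flows(flows):
    """Keep only flows with more than one packet.

    flows: (src_ip, src_port, dst_ip, dst_port, n_packets) tuples.
    """
    return [f for f in flows if f[4] > 1]


def likely_scanners(flows, max_targets=100):
    """Return source IPs that touch more than max_targets distinct
    destination {IP, port} pairs (hypothetical threshold)."""
    targets = defaultdict(set)
    for src_ip, _src_port, dst_ip, dst_port, _n_packets in flows:
        targets[src_ip].add((dst_ip, dst_port))
    return {ip for ip, t in targets.items() if len(t) > max_targets}
```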

    21/30

    Other Heuristics

    MSN Messenger server

    Port 1863

    3 distinct dst IPs within the same prefix

    Port history

    Examine the set of ports connected to an {IP, port} pair

    Reject the pair if all of these ports belong to well-known services
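A minimal sketch of the port-history check. `WELL_KNOWN_SERVICE_PORTS` is an illustrative assumption rather than the paper's list, and the input mapping is a hypothetical structure.

```python
# Hypothetical well-known service ports (illustrative, not the paper's list).
WELL_KNOWN_SERVICE_PORTS = {21, 22, 25, 53, 80, 110, 143, 443}


def reject_by_port_history(port_history):
    """Return the {IP, port} pairs to reject.

    port_history: maps an {IP, port} pair to the set of ports it connected to.
    A pair is rejected when every connected port is a well-known service port;
    pairs with no recorded ports are left alone.
    """
    return {
        pair for pair, ports in port_history.items()
        if ports and ports <= WELL_KNOWN_SERVICE_PORTS
    }
```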

    22/30

    Final Algorithm

    P2PIP: IPs classified as P2P by the TCP/UDP heuristic

    P2PPairs: {IP, port} pairs classified as P2P by the {IP, port} pair heuristic

    Rejected: rejected pairs

    MailServers: rejected IPs

    IPPort: {IP, port} pairs not in MailServers or Rejected

    IPSet: distinct IPs for a specific pair

    PortSet: distinct ports for a specific pair

    Avg_pktssizesSet: distinct average packet sizes

    Transfer_sizesSet: distinct transferred flow sizes
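The per-pair sets defined above can be held in a small record. This sketch, with hypothetical class and field names mapped in comments to the slide's names, builds them only for IPPort pairs, that is, pairs not in Rejected whose IP is not in MailServers. The flow tuple layout is assumed.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PairStats:
    """Per-{IP, port} sets, named after the slide's definitions."""
    ip_set: set = field(default_factory=set)              # IPSet
    port_set: set = field(default_factory=set)            # PortSet
    avg_pktsizes_set: set = field(default_factory=set)    # Avg_pktssizesSet
    transfer_sizes_set: set = field(default_factory=set)  # Transfer_sizesSet


def build_pair_stats(flows, rejected=frozenset(), mail_servers=frozenset()):
    """Aggregate flows into PairStats for IPPort pairs only.

    flows: (src_ip, src_port, dst_ip, dst_port, n_packets, n_bytes) tuples,
    with n_packets >= 1.
    """
    stats = defaultdict(PairStats)
    for s_ip, s_port, d_ip, d_port, n_pkts, n_bytes in flows:
        # Each flow updates the record of both of its endpoints.
        for me, peer in (((s_ip, s_port), (d_ip, d_port)),
                         ((d_ip, d_port), (s_ip, s_port))):
            if me in rejected or me[0] in mail_servers:
                continue
            st = stats[me]
            st.ip_set.add(peer[0])
            st.port_set.add(peer[1])
            st.avg_pktsizes_set.add(n_bytes // n_pkts)
            st.transfer_sizes_set.add(n_bytes)
    return dict(stats)
```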

    23/30

    Final Algorithm

    24/30

    Fraction of Identified P2P Traffic

    25/30

    False Positives

    26/30

    Robustness

    27/30

    Pros & Cons

    Pros

    Avoids the privacy issues of payload inspection

    Compatible with anonymization of IP addresses

    Lower storage overhead

    Lower processing overhead

    Ability to detect unknown protocols

    Overcomes encryption

    Cons

    Inability to analyze specific protocols

    28/30

    P2P Traffic Trends

    29/30

    Conclusions

    Non-payload identification methodology

    Ability to identify unknown protocols

    Misses 5% of flows compared with payload analysis

    8%-12% false positives

    Challenges claims that P2P traffic is declining

    30/30

    Thanks!