8/2/2019 Transport Layer Identification of P2P Traffic
1/30
Transport Layer Identification of P2P Traffic
T. Karagiannis, A. Broido, M. Faloutsos, K. Claffy
-
Outline
Introduction
Related work
Payload analysis & Limitations
Non-payload identification
Experiments & Evaluation
P2P traffic trends
Conclusions
-
Characteristics of P2P Traffic
Traffic volume grows rapidly
Frequent upgrades & emergence of new protocols
Traffic is disguised to circumvent firewalls & legal issues
Non-standard, proprietary protocols (poorly documented)
Operate on arbitrary port numbers
Support payload encryption
-
Identification Methodology
Examining packet payload
Signature-based methodology
Limitations
Identifying at transport layer
Based on flow patterns & P2P behaviors
Advantages
-
Contributions
Develop a methodology for P2P traffic profiling by identifying flow patterns and behavior characteristics
Evaluate its effectiveness by comparing with payload analysis
Demonstrate the growth of P2P traffic by analyzing backbone traces
-
Previous Work
Detailed characterization of a small subset of P2P protocols & networks
Properties of topology, bandwidth, caching & availability, etc.
Signature-based traffic identification
Traffic estimation of P2P applications with fixed ports
-
Payload Analysis
-
Payload Analysis
M1: Flag a flow as P2P if its src or dst port number matches one of the well-known port numbers.
M2: Flag a flow as P2P if the 16-byte payload of any packet matches the signatures, else flag it as non-P2P. A loose lower bound on P2P volume.
M3: Hash the {src, dst} IP pair of a flow flagged as P2P into a table. Flag flows containing an IP address in the table as possible P2P even if no payload matches.
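The three payload methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Flow` record, the port set, and the two signatures are assumptions chosen for the example (the slides do not list them), and each flow carries a single payload prefix instead of one per packet.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes  # captured payload prefix (simplification: one per flow)

# Illustrative values only; the slides do not give the actual lists.
KNOWN_P2P_PORTS = {1214, 4662, 6346, 6881}
SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent")

def m1(flow):
    """M1: src or dst port matches a well-known port."""
    return flow.src_port in KNOWN_P2P_PORTS or flow.dst_port in KNOWN_P2P_PORTS

def m2(flow):
    """M2: the first 16 payload bytes start with a known signature."""
    head = flow.payload[:16]
    return any(head.startswith(sig) for sig in SIGNATURES)

def m3(flows):
    """M3: also flag flows that involve an IP seen in any M2-flagged flow."""
    p2p_ips = set()
    for f in flows:
        if m2(f):
            p2p_ips.update((f.src_ip, f.dst_ip))
    return [m2(f) or f.src_ip in p2p_ips or f.dst_ip in p2p_ips for f in flows]
```

M3 flags strictly more flows than M2, which is why M2 serves only as a loose lower bound.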
-
Limitations
Captured payload size
Only first 16 bytes of payload
Only 4 bytes in older traces
HTTP requests
Encryption
Other P2P protocols
Unidirectional traces
-
Non-payload Identification
Two main heuristics:
{src, dst} IP pairs that use both TCP and UDP to transfer data
The behavior of peers, studied through the connection characteristics of {IP, port} pairs
-
High-level description
Data processing
Build the flow table
Collect information on various characteristics
Identification of potential P2P pairs
Based on the two P2P heuristics
Eliminate false positives
By other heuristics of non-P2P traffic
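The data-processing step could look something like the sketch below. It assumes packets arrive as plain tuples and keys the flow table by the 5-tuple; the record layout and the function names `build_flow_table` and `avg_packet_size` are hypothetical, and flow timeouts are ignored.

```python
from collections import defaultdict

def build_flow_table(packets):
    """Aggregate packets into flows keyed by the 5-tuple.

    packets: iterable of (src_ip, dst_ip, src_port, dst_port, proto, size).
    Hypothetical record layout; timeouts are not modelled.
    """
    table = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src_ip, dst_ip, src_port, dst_port, proto, size in packets:
        entry = table[(src_ip, dst_ip, src_port, dst_port, proto)]
        entry["packets"] += 1
        entry["bytes"] += size
    return dict(table)

def avg_packet_size(entry):
    """Average packet size of one flow-table entry."""
    return entry["bytes"] / entry["packets"]
```

Per-flow packet counts and sizes collected here are the raw material for the later heuristics.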
-
TCP/UDP Heuristic
Concurrent usage of both TCP & UDP is typical for many P2P protocols
Look for {src, dst} IP pairs that use both TCP & UDP protocols to identify P2P hosts
Other protocols also use TCP & UDP concurrently
DNS, NETBIOS, IRC, gaming, streaming
Fixed well-known ports
-
TCP/UDP Heuristic
If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P so long as the src or dst ports are not in the set in Table 3
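A rough sketch of this rule follows. Table 3 is not reproduced in this transcript, so `EXCLUDED_PORTS` is a placeholder and not the actual table. The sketch also ignores timing (it does not check that TCP and UDP use overlaps in time) and drops a pair as soon as any of its flows touches an excluded port, which is stricter than a per-flow check.

```python
from collections import defaultdict

# Placeholder for the excluded-port set of Table 3 (not the real contents).
EXCLUDED_PORTS = {53, 137, 139}

def tcp_udp_p2p_pairs(flows):
    """Return {src, dst} IP pairs that use both TCP and UDP.

    flows: iterable of (src_ip, dst_ip, src_port, dst_port, proto),
    with proto given as "TCP" or "UDP" (assumed record layout).
    """
    protos = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, dst_ip, src_port, dst_port, proto in flows:
        pair = (src_ip, dst_ip)
        protos[pair].add(proto)
        ports[pair].update((src_port, dst_port))
    return {
        pair
        for pair, seen in protos.items()
        if {"TCP", "UDP"} <= seen and not (ports[pair] & EXCLUDED_PORTS)
    }
```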
-
Connection Pattern Heuristic
P2P: for an {IP, port} pair,
N(distinct connected ports) = N(distinct connected IPs)
Web: for a {w, 80} pair,
N(distinct connected ports) ≫ N(distinct connected IPs),
since a host initiates more than one concurrent connection for parallel downloading
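The comparison can be illustrated with a short sketch. The flow layout and the `min_peers` threshold are assumptions for the example; the slides state only the equality test itself.

```python
from collections import defaultdict

def connection_profile(flows):
    """Collect, per destination {IP, port} pair, the peer IPs and ports seen.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) (assumed layout).
    """
    profile = defaultdict(lambda: (set(), set()))
    for src_ip, src_port, dst_ip, dst_port in flows:
        ips, ports = profile[(dst_ip, dst_port)]
        ips.add(src_ip)
        ports.add(src_port)
    return profile

def looks_p2p(ips, ports, min_peers=3):
    """P2P-like if distinct peer IPs equal distinct peer ports.

    min_peers is a hypothetical guard against tiny samples.
    """
    return len(ips) >= min_peers and len(ips) == len(ports)
```

A web server contacted by clients that open several parallel connections accumulates more distinct ports than distinct IPs, so it fails the equality test.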
-
False positives
Some heuristics for decreasing false positives
Mail server
DNS
Gaming
Malware
Others
-
Mail Server
Behavior resembles {IP, port} heuristic
Examine the flows with port numbers 25, 110, 113
-
DNS
Concurrently uses TCP & UDP at port 53
For flows where src port = dst port and the port number is < 501,
both src & dst {IP, port} pairs are considered non-P2P
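The DNS rule is simple enough to state directly in code. In this sketch the flow layout and the function name `dns_like_pairs` are assumptions; the limit of 501 comes from the slide.

```python
def dns_like_pairs(flows, port_limit=501):
    """Return {IP, port} pairs to treat as non-P2P under the DNS rule.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) (assumed layout).
    A flow qualifies when src port equals dst port and is below port_limit.
    """
    non_p2p = set()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if src_port == dst_port and src_port < port_limit:
            non_p2p.add((src_ip, src_port))
            non_p2p.add((dst_ip, dst_port))
    return non_p2p
```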
-
Gaming & Malware
Many flows to different IPs/ports, carrying same-sized packets
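One way to test for this pattern is sketched below. Both thresholds (`min_peers`, `max_distinct_sizes`) are hypothetical, since the slides give no values, and the input layout is assumed.

```python
def same_size_fanout(flow_sizes, min_peers=5, max_distinct_sizes=2):
    """Detect many peers combined with very few distinct packet sizes.

    flow_sizes: list of (peer_ip, peer_port, avg_packet_size) for one
    {IP, port} pair. Both thresholds are illustrative assumptions.
    """
    peers = {(ip, port) for ip, port, _ in flow_sizes}
    sizes = {size for _, _, size in flow_sizes}
    return len(peers) >= min_peers and len(sizes) <= max_distinct_sizes
```

A pair for which this returns true would be removed from the P2P candidates as likely gaming or malware traffic.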
-
Gaming & Malware
-
Other Heuristics
Scans
Count the number of {IP, port} pairs with a specific IP to eliminate port scans
One-packet pairs
Remove one-packet flows
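Both filters can be sketched in a few lines. This is one possible reading of the slide: a destination IP that appears with an unusually large number of distinct ports is treated as a scan target. The dictionary keys and the `max_pairs` threshold are assumptions.

```python
from collections import defaultdict

def drop_one_packet_flows(flows):
    """Keep only flows with more than one packet.

    flows: list of dicts with a 'packets' key (assumed layout).
    """
    return [f for f in flows if f["packets"] > 1]

def scan_targets(flows, max_pairs=50):
    """Return IPs that appear in more than max_pairs distinct {IP, port} pairs.

    flows: list of dicts with 'dst_ip' and 'dst_port' keys (assumed layout).
    max_pairs is a hypothetical threshold.
    """
    ports_per_ip = defaultdict(set)
    for f in flows:
        ports_per_ip[f["dst_ip"]].add(f["dst_port"])
    return {ip for ip, ports in ports_per_ip.items() if len(ports) > max_pairs}
```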
-
Other Heuristics
MSN Messenger server
Port 1863
3 distinct dst IPs within the same prefix
Port history
Examine the set of ports connected to an {IP, port} pair
Reject if all ports reflect well-known services
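The port-history check above amounts to a subset test. In the sketch below, `WELL_KNOWN_SERVICE_PORTS` is an illustrative assumption, since the slides do not say which services are included.

```python
# Illustrative set only; the slides do not enumerate the services.
WELL_KNOWN_SERVICE_PORTS = {25, 53, 80, 110, 443}

def reject_by_port_history(connected_ports):
    """Reject a pair if every port it connected to is a well-known service.

    connected_ports: set of ports seen for one {IP, port} pair.
    An empty history is not rejected.
    """
    return bool(connected_ports) and connected_ports <= WELL_KNOWN_SERVICE_PORTS
```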
-
Final Algorithm
P2PIP: IPs classified as P2P by TCP/UDP heuristic
P2PPairs: {IP, port} pairs classified as P2P by {IP, port} pair heuristic
Rejected: rejected pairs
MailServers: rejected IPs
IPPort: {IP, port} pairs not in MailServers or Rejected
IPSet: distinct IPs with specific pair
PortSet: distinct ports for specific pair
Avg_pktssizesSet: distinct average packet sizes
Transfer_sizesSet: distinct transferred flow sizes
-
Final Algorithm
-
Fraction of Identified P2P Traffic
-
False Positives
-
Robustness
-
Pros & Cons
Pros
Avoids privacy issues
Works with anonymized IP addresses
Lower storage overhead
Lower processing overhead
Ability to detect unknown protocols
Overcomes encryption
Cons
Inability to analyze specific protocols
-
P2P Traffic Trends
-
Conclusions
Non-payload identification methodology
Ability to identify unknown protocols
Misses 5% of flows compared with payload analysis
8%~12% false positives
Challenges the claims of a decline in P2P traffic
-
Thanks !