A Simulation Study of P2P File Pollution Prevention Mechanisms
Chia-Li Huang, Polly Huang
Network & Systems Laboratory
Department of Electrical Engineering
National Taiwan University
Outline
• Background
• Problem
• Methodology
• Simulation Environment & Results
• Conclusion
Overview of P2P file sharing system
• P2P file sharing system with search capability
• Issue a query with keywords to search for a file
• A file in the system (example: songA)
  – Meta-data (for keyword matching): song title, length, encoding scheme of songA
  – Content: passed through a hash function to produce the hash value
• Different versions of songA: mp3, wma, …
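The file model above can be sketched in a few lines of Python. This is only an illustration: the `SharedFile` class, the `matches` helper, and the choice of SHA-1 are assumptions (the slides only say "hash function"), and the byte strings are made up.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SharedFile:
    metadata: Dict[str, str]  # used for keyword matching (title, encoding, ...)
    content: bytes            # the actual file bytes

    @property
    def hash_value(self) -> str:
        # SHA-1 is an assumption; any content hash would play the same role.
        return hashlib.sha1(self.content).hexdigest()


def matches(file: SharedFile, keywords: List[str]) -> bool:
    """Return True if every keyword appears in some meta-data value."""
    text = " ".join(value.lower() for value in file.metadata.values())
    return all(keyword.lower() in text for keyword in keywords)


# Two versions of songA: same title, different content, so different hashes.
mp3 = SharedFile({"title": "songA", "encoding": "mp3"}, b"mp3 bytes of songA")
wma = SharedFile({"title": "songA", "encoding": "wma"}, b"wma bytes of songA")
```

Both versions answer a keyword query for songA, but each one carries its own hash value, which is what lets a peer tell versions apart.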
How a user searches for a file
• Peer1 sends a query for songA to the P2P network
• The P2P network returns responses for songA
• Peer1 randomly chooses a source for download
Pollution in file sharing system
• Definition of a polluted file
  – Meta-data description doesn’t match its content (e.g. Meta-Data A attached to Content B)
• Current P2P networks are full of polluted files [1]
  – Unintentional
  – Intentional
[1] J. Liang, R. Kumar, Y. Xi, and K. Ross, “Pollution in P2P file sharing systems,” in Proceedings of IEEE Infocom, 2005.
Problem
• Pollution in a P2P system results in the following problems
  – Reduced content availability
  – Increased redundant traffic
• Different anti-pollution mechanisms exist
  – Which one is better?
Methodology
• Simulation study on anti-pollution mechanisms
  – Extending a P2P simulator [2]
  – Existing anti-pollution mechanisms
    • Peer reputation system
      – Choose a reputable peer to download a file from
      – EigenTrust [3]
    • Object reputation system
      – Choose a reputable version of a file to download
      – Credence [4]
  – Different pollution attacks
  – User behavior
[2] M. Schlosser and S. Kamvar, “Simulating a file-sharing P2P network”, in Proc. of SemPGRID 2003.
[3] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The EigenTrust algorithm for reputation management in P2P networks”, in Proceedings of the Twelfth International World Wide Web Conference.
[4] K. Walsh and E. G. Sirer, “Experience with an object reputation system for peer-to-peer filesharing”, in Proceedings of Networked System Design and Implementation (NSDI), May 2006.
Peer Reputation System: EigenTrust
• Rate a peer by its uploading history across the whole system
• Choose a reputable peer to download from
• Local reputation (Cij): Peer i's rating of Peer j, updated after each good or bad file received from Peer j
• Global reputation (Ti): the reputation of Peer i as seen by the whole system
• A peer stores a list of local reputations (e.g. Peer 1 holds C12 for Peer 2 and C14 for Peer 4, and 0 for the other peers)
• Example with three peers: T1 = C21 * T2 + C31 * T3
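The relation T1 = C21 * T2 + C31 * T3 generalizes to Ti = sum over k of Cki * Tk, which can be solved by repeated substitution. The sketch below is a simplified illustration, not the full EigenTrust algorithm: it assumes each peer's local reputations are already normalized to sum to 1, it leaves out pre-trusted peers, and the function name and matrix values are made up.

```python
def global_trust(C, iterations=200):
    """Iterate T_i = sum_k C[k][i] * T_k until it settles.

    C[k][i] is peer k's normalized local reputation of peer i;
    every row of C is assumed to sum to 1.
    """
    n = len(C)
    T = [1.0 / n] * n  # start from uniform trust
    for _ in range(iterations):
        T = [sum(C[k][i] * T[k] for k in range(n)) for i in range(n)]
    return T


# Indices 0..2 stand for Peer1..Peer3 (illustrative values only).
C = [
    [0.0, 0.5, 0.5],  # Peer1 rates Peer2 and Peer3 equally
    [0.9, 0.0, 0.1],  # Peer2 mostly trusts Peer1
    [0.8, 0.2, 0.0],  # Peer3 mostly trusts Peer1
]
T = global_trust(C)
```

With these values Peer1 ends up with the highest global reputation, because both other peers give it most of their local trust.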
Object Reputation System: Credence
• Calculate an object (file) reputation by weighted votes
  – After download, vote the file as clean or polluted
• Choose a reputable version for download
  – Peer1 issues a query for song A and a vote-gather query for song A
  – Peer1 receives responses (sources) and vote-responses (votes) for song A
  – Each received vote is weighted by the correlation between Peer1 and the voter
  – After a version is chosen, randomly choose a source of that version

Received responses of P1:
Version no.   Sources   Received votes    Version reputation
Version 1     P2, P3    VoteP2, VoteP3    CorrP1,P2 * VoteP2 + CorrP1,P3 * VoteP3
Version 2     P4        VoteP4, VoteP5    CorrP1,P4 * VoteP4 + CorrP1,P5 * VoteP5

Vote databases:
Peer1   Obj3 Good, Obj4 Good, Obj5 Bad, Obj6 Bad
Peer2   Obj1 Bad, Obj2 Bad, Obj3 Good, Obj4 Good (agrees with Peer1 on Obj3 and Obj4: positive correlation)
Peer3   Obj5 Good, Obj6 Good, Obj7 Bad, Obj8 Bad (disagrees with Peer1 on Obj5 and Obj6: negative correlation)
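A small sketch of the weighted-vote idea, using the three vote databases from the slides. The `correlation` function here is a simplified agreement score chosen for illustration, an assumption rather than the coefficient Credence actually uses, and the two votes on the version are hypothetical.

```python
GOOD, BAD = 1, -1


def correlation(my_votes, other_votes):
    """Agreement-based stand-in for a vote correlation in [-1, 1]."""
    shared = set(my_votes) & set(other_votes)
    if not shared:
        return 0.0  # no overlapping objects, no evidence either way
    agree = sum(1 for obj in shared if my_votes[obj] == other_votes[obj])
    return (2 * agree - len(shared)) / len(shared)


def version_reputation(my_votes, voters):
    """Sum of Corr(me, voter) * vote over (vote_database, vote) pairs."""
    return sum(correlation(my_votes, db) * vote for db, vote in voters)


peer1 = {"Obj3": GOOD, "Obj4": GOOD, "Obj5": BAD, "Obj6": BAD}
peer2 = {"Obj1": BAD, "Obj2": BAD, "Obj3": GOOD, "Obj4": GOOD}
peer3 = {"Obj5": GOOD, "Obj6": GOOD, "Obj7": BAD, "Obj8": BAD}

# Hypothetical votes on one version: Peer2 says clean, Peer3 says polluted.
rep = version_reputation(peer1, [(peer2, GOOD), (peer3, BAD)])
```

Because Peer3 is negatively correlated with Peer1, its "polluted" vote counts in favor of the version from Peer1's point of view.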
Pollution Attacks
• Prevalent pollution attacks [5]
  – Decoy Insertion
  – Hash Corruption

File                    Meta-data   Hash   Content
A clean file of songA   MA          H1     Clean
Hash Corruption         MA          H1     Corrupted
Decoy Insertion         MA          H2     Corrupted

[5] F. Benevenuto, C. Costa, M. Vasconcelos, V. Almeida, J. Almeida, and M. Mowbray, “Impact of peer incentives on the dissemination of polluted content”, in SAC ’06.
User Behavior
• Slackness [6]
  – A period of time between download completion and quality check
  – Bimodal distribution
• Awareness [6]
  – The probability that a user can correctly recognize a file as polluted
  – No clear characteristic is observed
    • high-awareness prob. = 0.8
    • low-awareness prob. = 0.2
[6] U. Lee, M. Choi, J. Cho, M. Y. Sanadidi, and M. Gerla, “Understanding pollution dynamics in P2P file sharing”, in Proceedings of the 5th International Workshop on Peer-to-Peer Systems (IPTPS’06), 2006.
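One way the two user-behavior parameters could be modeled is sketched below. The awareness probabilities 0.8 and 0.2 come from the slides; everything else is an assumption: the function names are hypothetical, and the two delay ranges and the mixing weight of the bimodal slackness are placeholders, since the slides give no numbers for them.

```python
import random


def detects_pollution(awareness, rng):
    """A user recognizes a polluted file with probability `awareness`."""
    return rng.random() < awareness


def slackness_delay(rng, quick=(0, 1), slow=(20, 40), p_quick=0.5):
    """Cycles between download completion and quality check.

    Drawn from a two-mode mixture; the ranges and the mixing weight
    are placeholders, not values from the slides.
    """
    low, high = quick if rng.random() < p_quick else slow
    return rng.randint(low, high)


rng = random.Random(0)  # fixed seed so runs are repeatable
HIGH_AWARENESS, LOW_AWARENESS = 0.8, 0.2
high_hits = sum(detects_pollution(HIGH_AWARENESS, rng) for _ in range(10_000))
low_hits = sum(detects_pollution(LOW_AWARENESS, rng) for _ in range(10_000))
delays = [slackness_delay(rng) for _ in range(1_000)]
```

A high-aware user flags roughly four out of five polluted files, a low-aware user roughly one out of five, and each checked file waits either a short or a long time before the check.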
Outline
• Background
• Problem
• Methodology
• Simulation Environment & Results
• Conclusion
Simulator Description
• P2P Query Cycle based simulator
  – In a cycle, each peer issues one query and repeats downloading until satisfied
• Extension
  – Types of attacks
    • Decoy Insertion, Hash Corruption
  – Anti-pollution mechanisms
    • EigenTrust, Credence
  – User behavior
    • Slackness, awareness
Simulation Scenario
Type of peer:
Malicious   Always share polluted files based on the different attacks
Normal      Share what they’ve downloaded
Simulation Setup

Peers [9]
  # of normal peers: 100
  # of malicious peers: 100
  # of neighbors: 6
Content distribution [8] [9]
  # of categories in the system: 20
  # of categories of each peer: at least 4
  Files in a category and versions of each file: Zipf distribution with α = 1
  File size distribution: Table 1
Simulation
  # of cycles: 300
  # of experiments: 10

Table 1. File size distribution of P2P traffic [10]
Size         1KB    10KB    100KB    1MB      10MB     100MB    1GB
Percentage   1.5%   1.83%   26.67%   10.00%   35.00%   15.00%   10.00%

[8] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The EigenTrust algorithm for reputation management in P2P networks”, in Proceedings of the Twelfth International World Wide Web Conference.
[9] K. Walsh and E. G. Sirer, “Experience with an object reputation system for peer-to-peer filesharing”, in Proceedings of Networked System Design and Implementation (NSDI), May 2006.
[10] N. Leibowitz, M. Ripeanu, and A. Wierzbicki, “Deconstructing the Kazaa network”, in Proceedings of the Third IEEE Workshop on Internet Applications (WIAPP 2003).
Critical Evaluation Parameters
Evaluate different anti-pollution mechanisms under the following scenarios:

Attack                              Fraction of high-aware peers in the network   Slackness
Decoy-Insertion                     80%, 50%, 20%                                 Yes or No
Hash-Corruption                     80%, 50%, 20%                                 Yes or No
Decoy-Insertion & Hash-Corruption   80%, 50%, 20%                                 Yes or No
Evaluation metrics
• Successful Downloading Rate (per cycle): total successful downloads divided by total trials of downloads
  $SDR_{M_j} = \frac{\sum_{i=1}^{n} 1}{\sum_{i=1}^{n} t_i}$
• Redundant Traffic (per cycle)
  $RT_{M_j} = PT_{M_j} + CT_{M_j}$
• Reduced-Traffic Ratio (compared to random selection): redundant traffic reduced by using Mj, divided by the redundant traffic generated by random selection
  $RTR_{M_j} = \frac{RT_R - RT_{M_j}}{RT_R}$

Symbol descriptions:
Mj   Mechanism: Credence or EigenTrust
n    # of high-aware peers
ti   Trials of downloads for a peer i to get a clean file in a cycle
PT   Polluted traffic
CT   Control traffic
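The three metrics can be worked through on a toy cycle. The sketch assumes the definitions given on this slide: the successful downloading rate is total successful downloads over total download trials, redundant traffic is polluted traffic plus control traffic, and the reduced-traffic ratio compares a mechanism against random selection. The function names and all numbers are made up for illustration.

```python
def successful_downloading_rate(trials):
    """Total successful downloads / total download trials.

    trials[i] is t_i, the number of attempts peer i needed to obtain
    one clean file in the cycle, so each peer contributes one success.
    """
    return len(trials) / sum(trials)


def redundant_traffic(polluted, control):
    """RT = PT + CT."""
    return polluted + control


def reduced_traffic_ratio(rt_random, rt_mechanism):
    """(RT_R - RT_Mj) / RT_R, relative to random selection."""
    return (rt_random - rt_mechanism) / rt_random


# Illustrative numbers for one cycle.
sdr = successful_downloading_rate([1, 2, 1, 4])  # 4 successes in 8 trials
rt_random = redundant_traffic(polluted=800.0, control=0.0)
rt_mech = redundant_traffic(polluted=300.0, control=100.0)
ratio = reduced_traffic_ratio(rt_random, rt_mech)
```

Here the mechanism pays 100 units of control traffic but removes 500 units of polluted traffic, so it halves the redundant traffic compared with random selection.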
Simulation Result
• Compare the performance of different anti-pollution mechanisms under different scenarios
  – EigenTrust
  – Credence
  – Random
Successful Downloading Rate
[Figure panels: Under Decoy-Insertion attack, Under Hash-Corruption attack]
• Credence is more sensitive to the type of attack
• Under Decoy-Insertion attack: Credence > EigenTrust
  – Credence identifies a clean version before download
• Under Hash-Corruption attack: EigenTrust > Credence
  – EigenTrust rates peers, not the hash value
• Converge after 100 cycles
Observation 1: User awareness
[Figure panels: EigenTrust, Credence]
• User awareness is critical to anti-pollution mechanisms
• Reasons, when fewer peers are high-aware:
  1. Fewer peers share clean files
  2. Fewer peers correctly operate the reputation system
Observation 2 : User slackness
• User slackness has a negative effect on anti-pollution mechanisms
• A polluted file held longer by a user has more chances to be downloaded
Discussion
• User behavior has a significant effect on anti-pollution mechanisms
• Credence performs better under Decoy Insertion, while EigenTrust performs better under Hash Corruption
  – The type of attack can’t be predicted
  – Suggest a hybrid anti-pollution mechanism
Hybrid Anti-pollution Mechanism
• A peer sends a query for songA to the P2P network and receives a response list
• Step 1: Select a reputable version by the object reputation mechanism
• Step 2: Select a reputable peer by the peer reputation mechanism

Response list:
Versions   Sources
Version1   P1, P5, P7, . . . P124
Version2   P14, P21, P35
:          :
VersionN   P4, P2
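The two-step selection of the hybrid mechanism can be sketched as follows. This assumes the version reputations (object reputation, as in Credence) and peer reputations (as in EigenTrust) have already been computed; the function name and all scores are hypothetical.

```python
def hybrid_select(responses, version_reputation, peer_reputation):
    """Pick a version by object reputation, then a source by peer reputation.

    responses maps a version id to the list of peers offering it.
    Both reputation arguments are dicts of precomputed scores;
    anything missing from them is treated as score 0.
    """
    # Step 1: most reputable version.
    version = max(responses, key=lambda v: version_reputation.get(v, 0.0))
    # Step 2: most reputable peer among that version's sources.
    source = max(responses[version], key=lambda p: peer_reputation.get(p, 0.0))
    return version, source


responses = {"Version1": ["P1", "P5", "P7"], "Version2": ["P14", "P21", "P35"]}
version_rep = {"Version1": 1.4, "Version2": -0.6}
peer_rep = {"P1": 0.05, "P5": 0.30, "P7": 0.10, "P14": 0.40}
choice = hybrid_select(responses, version_rep, peer_rep)
```

Note that P14 has the highest peer reputation overall but is never considered, because step 1 has already ruled out the version it offers.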
Successful Downloading Rate
[Figure panels: Decoy Insertion, Hash Corruption]
• Selecting both a reputable version and a reputable source copes with different types of attacks
• Hybrid mechanism performs the best under both attacks
Reduced-Traffic Ratio
[Figure panels: Decoy Insertion, Hash Corruption]
• Hybrid mechanism generates more control traffic
  – Trade-off between pollution traffic & control traffic
• The trade-off is worthwhile
Conclusion
• Both peer reputation and object reputation systems are necessary
• User behavior has significant influence on anti-pollution mechanisms
Thank you!