transitivity of trust

Transitivity of Trust Team Founta Antigoni-Maria, UID: 647 Kouslis Ilias, UID: 650 Moutidis Iraklis, UID: 636 Spathis Dimitris, UID: 640


Post on 13-Apr-2017


Page 1: Transitivity of Trust

Transitivity of Trust

Team
● Founta Antigoni-Maria, UID: 647
● Kouslis Ilias, UID: 650
● Moutidis Iraklis, UID: 636
● Spathis Dimitris, UID: 640

Page 2: Transitivity of Trust

OVERVIEW

● Introduction
● Task Definition
● Schema

● Combating web spam with TrustRank
● Propagation of Trust and Distrust

● The EigenTrust algorithm for reputation management in p2p networks
● Attack-Resistant Trust Metrics for Public Key Certification

● Dataset Suggestions
● Conclusion

Page 3: Transitivity of Trust

Introduction

● Given the open nature of social networks and their current level of popularity, users are increasingly concerned about privacy and security;

● We need to trust the entities that belong to our social network;

● To achieve that, a “Web of Trust” should be introduced;

● In order to balance the open nature of social networks and safeguard the privacy concerns of users, it is important to build “Trust Communities”.

Page 4: Transitivity of Trust

Task Definition [2]

Challenges

Users sometimes adopt many personas and express a large number of biased opinions.

Difficulty in defining trust.

Importance

In e-commerce, a trust model can increase the value of a product.

Trusted users will have greater influence and perks, which can have a positive effect on user behaviour.

Better recommendations.

Applications

Internet Networks:

● Social networks
● P2P networks
● Certificate networks
● Mail networks

Page 5: Transitivity of Trust

Schema

Web of Trust

Methodologies
● TrustRank
● Trust & Distrust

Case Studies
● EigenTrust on P2P
● Digital Certificates

Page 7: Transitivity of Trust

Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August).

Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.

Page 8: Transitivity of Trust

TrustRank overview

Gyöngyi et al. propose techniques to semi-automatically separate reputable web pages from spam. They first select a small set of seed pages to be evaluated by a human. Once the reputable seed pages have been identified manually, they exploit the link structure of the web to discover other pages that are likely to be good as well. The benchmark dataset is AltaVista’s web index as of 2003.

Page 9: Transitivity of Trust

Contribution

1. Formalization of the web spam problem and of detection algorithms.
2. Metrics defined for assessing the efficacy of detection algorithms.
3. Schemes for selecting seed sets of pages to be manually evaluated.
4. Introduction of the TrustRank algorithm for determining the likelihood that pages are reputable.
5. An extensive evaluation, based on 31 million sites crawled by the AltaVista search engine, and a manual examination of over 2,000 sites.

Page 10: Transitivity of Trust

Assessing trust

The creators of good pages can sometimes be “tricked,” so we do find some good-to-bad links on the web.

Page 11: Transitivity of Trust

Assessing trust

Oracle function
O(p) = 0 if p is bad, 1 if p is good.

Trust function
T(p) = Pr[O(p) = 1]

Ordered Trust Property
T(p) < T(q) ⇔ Pr[O(p) = 1] < Pr[O(q) = 1]
T(p) = T(q) ⇔ Pr[O(p) = 1] = Pr[O(q) = 1]

Threshold value δ
T(p) > δ ⇔ O(p) = 1

Page 12: Transitivity of Trust

Evaluation

Metrics:
● Pairwise Orderedness
● Precision
● Recall

Page 13: Transitivity of Trust

Computing trust

Ignorant trust function
T(p) = O(p) if p ∈ S, 1/2 otherwise

A randomly selected seed set
S = {1, 3, 6}

Oracle vector
o = [1, 1, 1, 1, 0, 0, 0]

Trust vector
t = [1, 1/2, 1, 1/2, 1/2, 0, 1/2]

7 · 6 = 42 ordered pairs

Page 14: Transitivity of Trust

Computing trust

Pairwise orderedness of T: 17/21

Threshold: δ = 1/2

Precision: 1

Recall: 1/2
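The numbers on the two "Computing trust" slides can be checked with a short script. This is a minimal sketch, not the authors' code: the function names are made up for illustration, and the violation rule for pairwise orderedness (a pair counts against T when T ranks p at least as high as q although the oracle ranks p lower, or the mirror case) is an assumption that happens to reproduce the 17/21 shown on the slide.

```python
from fractions import Fraction

def ignorant_trust(oracle, seeds):
    """T(p) = O(p) for seed pages, 1/2 otherwise (pages are numbered from 1)."""
    return [Fraction(oracle[p - 1]) if p in seeds else Fraction(1, 2)
            for p in range(1, len(oracle) + 1)]

def pairwise_orderedness(trust, oracle):
    """Fraction of ordered pairs (p, q), p != q, that T does not mis-order."""
    n = len(trust)
    pairs = violations = 0
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            pairs += 1
            # Assumed violation rule: T ranks p at least as high as q although
            # the oracle ranks p lower, or the mirror-image case.
            if (trust[p] >= trust[q] and oracle[p] < oracle[q]) or \
               (trust[p] <= trust[q] and oracle[p] > oracle[q]):
                violations += 1
    return Fraction(pairs - violations, pairs)

def precision_recall(trust, oracle, delta):
    """Precision and recall of the rule 'page is good if T(p) > delta'."""
    flagged = [p for p in range(len(trust)) if trust[p] > delta]
    correct = [p for p in flagged if oracle[p] == 1]
    precision = Fraction(len(correct), len(flagged))  # assumes some page is flagged
    recall = Fraction(len(correct), sum(oracle))
    return precision, recall

o = [1, 1, 1, 1, 0, 0, 0]
t = ignorant_trust(o, seeds={1, 3, 6})
print(pairwise_orderedness(t, o))               # 17/21
print(precision_recall(t, o, Fraction(1, 2)))   # (Fraction(1, 1), Fraction(1, 2))
```

Exact fractions are used so that ties between scores of 1/2 are compared without floating-point noise.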

Page 15: Transitivity of Trust

Trust Attenuation

The further away we are from the good seed pages, the less certain we are that a page is good. For instance, in Figure 2 of the paper there are 2 pages (namely, pages 2 and 4) that are at most 2 links away from the good seed pages. As both of them are good, the probability that we reach a good page in at most 2 steps is 1. Similarly, the number of pages reachable from the good seed in at most 3 steps is 3, but only two of them (pages 2 and 4) are good, so the probability drops to 2/3. These observations suggest that we reduce trust as we move further and further away from the good seed pages.

Page 16: Transitivity of Trust

Trust Attenuation

Trust dampening. Since page 2 is one link away from the good seed page 1, we assign it a dampened trust score of β, where β < 1. Since page 3 is reachable in one step from page 2 with score β, it gets a dampened score of β · β.

Trust splitting. If a good page has only a handful of outlinks, then it is likely that the pointed pages are also good. However, if a good page has hundreds of outlinks, it is more probable that some of them will point to bad pages.
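The two attenuation ideas can be illustrated on toy graphs. The sketch below is only an illustration under stated assumptions: the graphs, the function names, and the choice β = 0.85 are hypothetical, and when a page is reachable over several paths this version simply keeps the score of the shortest one.

```python
from collections import deque

def dampened_trust(out_links, seed, beta=0.85):
    """Trust dampening: a page d links away from the seed gets beta ** d."""
    scores = {seed: 1.0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in out_links.get(page, []):
            if target not in scores:          # first visit = shortest distance
                scores[target] = beta * scores[page]
                queue.append(target)
    return scores

def split_trust(out_links, scores):
    """Trust splitting: each page divides its score equally among its outlinks."""
    received = {}
    for page, score in scores.items():
        targets = out_links.get(page, [])
        for target in targets:
            received[target] = received.get(target, 0.0) + score / len(targets)
    return received

# Hypothetical chain 1 -> 2 -> 3 with page 1 as the good seed:
# page 2 receives beta, page 3 receives beta * beta.
chain = {1: [2], 2: [3]}
print(dampened_trust(chain, seed=1, beta=0.85))

# Hypothetical fan-out: a good page with four outlinks passes a quarter
# of its trust to each target.
fan = {1: [2, 3, 4, 5]}
print(split_trust(fan, {1: 1.0}))
```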

Page 17: Transitivity of Trust

TrustRank

SelectSeed: uses inverse PageRank to choose the best seeds

Page 18: Transitivity of Trust

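Seed scores from inverse PageRank:
s = [0.08, 0.13, 0.08, 0.10, 0.09, 0.06, 0.02]

Pages ordered by decreasing score:
σ = [2, 4, 5, 1, 3, 6, 7]

(L = 3, seed set is {2, 4, 5})

Static score distribution over the good seeds (the oracle marks seed page 5 as bad):
d = [0, 1/2, 0, 1/2, 0, 0, 0]

(αB = 0.85 and MB = 20)

TrustRank scores:
t* = [0, 0.18, 0.12, 0.15, 0.13, 0.05, 0.05]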
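The steps behind these vectors can be sketched as: score pages with inverse PageRank, ask the oracle about the top L pages, spread the static score d over the good seeds, then run a biased PageRank for MB iterations with decay αB. The code below is a sketch under assumptions rather than the authors' implementation: the function names are invented, inverse PageRank is taken to be PageRank on the reversed graph, and the seven-page graph is hypothetical (it is not claimed to be the paper's Figure 2, so it will not necessarily reproduce the numbers above).

```python
def biased_pagerank(out_links, d, alpha, iterations):
    """Iterate t = alpha * (trust split over outlinks) + (1 - alpha) * d."""
    t = dict(d)
    for _ in range(iterations):
        nxt = {p: (1 - alpha) * d[p] for p in d}
        for p, targets in out_links.items():
            for q in targets:
                nxt[q] += alpha * t[p] / len(targets)
        t = nxt
    return t

def inverse_pagerank(out_links, n, alpha=0.85, iterations=20):
    """PageRank on the reversed graph, used here to rank seed candidates."""
    reversed_links = {p: [] for p in range(1, n + 1)}
    for p, targets in out_links.items():
        for q in targets:
            reversed_links[q].append(p)   # edge p -> q becomes q -> p
    uniform = {p: 1.0 / n for p in range(1, n + 1)}
    return biased_pagerank(reversed_links, uniform, alpha, iterations)

def trustrank(out_links, n, oracle, limit, alpha=0.85, iterations=20):
    s = inverse_pagerank(out_links, n, alpha, iterations)
    ranked = sorted(s, key=s.get, reverse=True)           # ordering sigma
    good_seeds = [p for p in ranked[:limit] if oracle(p) == 1]
    d = {p: 0.0 for p in range(1, n + 1)}
    for p in good_seeds:                                   # static score distribution
        d[p] = 1.0 / len(good_seeds)
    return biased_pagerank(out_links, d, alpha, iterations)

# Hypothetical 7-page graph; pages 1-4 are good, pages 5-7 are bad.
links = {1: [2], 2: [3, 4], 3: [2], 4: [5], 5: [6, 7], 6: [5], 7: []}
good = {1, 2, 3, 4}
scores = trustrank(links, 7, oracle=lambda p: 1 if p in good else 0, limit=3)
for page in sorted(scores):
    print(page, round(scores[page], 3))
```

Because every page only passes on a fraction αB of what it holds, a page with no inlinks that is not a seed keeps a score of 0, which is the behaviour the deck describes for page 1.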

Page 19: Transitivity of Trust

TrustRank

TrustRank usually gives good pages a higher score. In particular, three of the four good pages (namely, pages 2, 3, and 4) got high scores and two of the three bad pages (pages 6 and 7) got low scores. However, the algorithm failed to assign pages 1 and 5 adequate scores. Page 1 was not among the seeds, and it did not have any inlinks through which to accumulate score, so its score remained at 0. All good unreferenced web pages receive a similar treatment, unless they are selected as seeds. Bad page 5 received a high score because it is the direct target of one of the rare good-to-bad links.

Page 20: Transitivity of Trust

Experiments

To evaluate the algorithms, the authors performed experiments using the complete set of pages crawled and indexed by the AltaVista search engine as of August 2003. In order to reduce computational demands, they worked with web sites instead of individual pages. They grouped the several billion pages into 31,003,946 sites, using a proprietary algorithm that is part of the AltaVista engine. More than one third of the sites (13,197,046) were unreferenced. The first author of the paper played the role of the oracle, examining pages of various sites, determining whether they were spam, and performing additional classification. The manual evaluations took weeks.

Page 21: Transitivity of Trust

Evaluation

Sample: 1000 sites, not chosen at random.

With a random sample, a great number of the sites would be very small (with few pages) and/or have very low PageRank. It is more important to correctly detect spam in high PageRank sites, since they will more often appear high in query result sets.

Page 22: Transitivity of Trust

Evaluation

Virtually no spam in the top 5 TrustRank buckets, while there is a marked increase in spam concentration in the lower buckets.

At the same time, it is surprising that almost 20% of the second PageRank bucket is bad.

Page 23: Transitivity of Trust

Precision & Recall

TrustRank assigned the highest scores to good sites, and the proportion of bad sites increases gradually as we move to lower scores. Hence, precision and recall manifest an almost linear decrease and increase, respectively.

Page 24: Transitivity of Trust

Conclusion

Experimental results show that we can effectively identify a significant number of strongly reputable (non-spam) pages. In a search engine, TrustRank can be used either separately to filter the index, or in combination with PageRank and other metrics to rank search results.

Page 25: Transitivity of Trust

Guha, R., Kumar, R., Raghavan, P. & Tomkins, A (2004, May).

Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).

Page 26: Transitivity of Trust

Propagation of Trust and Distrust

Guha et al. develop a formal framework of propagation schemes that use both trust and distrust to measure the “belief” one user holds about any other user.

● Why Distrust?
Distrust is at least as important as trust in shaping one user's opinion of another. Their results show that using distrust in retail / recommendation networks improves the accuracy of the predictions.

Page 27: Transitivity of Trust

Distrust Challenges

Challenge 1

How to model

“Does a trust score of 0 translate to distrust or to ‘no opinion’?”[2]

Challenge 2

Chain Distrust

How can one apply distrust on a user chain? What if there is a chain of distrust?

Challenge 3

Algorithmic Challenges

Once distrust is included, the trust matrix can no longer be interpreted as a Markov chain, and its principal eigenvector need not be real, which raises algorithmic issues.

Page 28: Transitivity of Trust

Fundamentals

Atomic Propagation:
● Direct Propagation
● Co-citation
● Transpose trust
● Trust coupling

Page 29: Transitivity of Trust

Methodology

Pipeline: T & D → B matrix → propagation process (CB,α applied k times) → P^(k) → F matrix → rounding

Choices at each stage:

● Distrust model:
> Trust only
> One-Step Distrust
> Propagated Distrust

● Atomic propagation weights α:
> Direct-only: α = e1
> Co-citation: α = e2
> Combined (all 4): α = (0.4, 0.4, 0.1, 0.1)

● F matrix:
> EIG: F = P^(k)
> WLC: add constant γ (γ = 0.5 / γ = 0.9)

● Rounding:
> Global
> Local
> Majority
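The propagation stage of this pipeline can be sketched in a few lines. What follows is a rough sketch under assumptions, not the authors' code: the four atomic propagations are taken to be the operators B (direct), BᵀB (co-citation), Bᵀ (transpose trust) and BBᵀ (trust coupling); the one-step distrust model is taken to be Cᵏ·(T − D) with B = T; and the propagated model uses B = T − D. The helper names and the 3-user matrices are hypothetical, and the F matrix (WLC) and rounding stages are left out.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def combine(weights, matrices):
    """Weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def propagate(T, D, alpha, k, distrust="one-step"):
    """Return P^(k) for 'trust-only', 'one-step' or 'propagated' distrust."""
    n = len(T)
    diff = [[T[i][j] - D[i][j] for j in range(n)] for i in range(n)]
    B = diff if distrust == "propagated" else T
    Bt = transpose(B)
    # Atomic propagations: direct, co-citation, transpose trust, trust coupling.
    C = combine(alpha, [B, matmul(Bt, B), Bt, matmul(B, Bt)])
    P = C
    for _ in range(k - 1):
        P = matmul(P, C)                 # P = C^k
    if distrust == "one-step":
        P = matmul(P, diff)              # apply a single final step of T - D
    return P

# Hypothetical example: user 0 trusts user 1, and user 1 distrusts user 2.
T = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
D = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
P = propagate(T, D, alpha=(1, 0, 0, 0), k=1, distrust="one-step")
print(P[0][2])   # negative: user 0 inherits user 1's distrust of user 2
```

With the EIG option the final belief matrix is simply F = P^(k); a positive entry is read as predicted trust and a negative one as predicted distrust before rounding.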

Page 30: Transitivity of Trust

Data

● Directed graph from Epinions
● 131,829 nodes & 841,372 edges
  Edge labels: Trust or Distrust (85% trust edges)
● Distribution: power law
● Structure: symmetric bow-tie
  SCC of 41,500 nodes; 40,000 into the SCC / 30,000 out of the SCC
● Giant WCC of ~120,000 nodes

Results

81 different schemes

Best combination:
● k = 20
● α = e* (combination)
● Majority rounding
● EIG
● One-step distrust

Error (incorrect predictions): e = 0.064 & e_S = 0.147

Page 32: Transitivity of Trust

Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May).

The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.

Page 33: Transitivity of Trust

EigenTrust Overview

1. Reputation system on a P2P Network

2. Trust level

3. Robust against malicious peers and Freeriders

4. Reward good behavior through several transactions

Page 34: Transitivity of Trust

Trust level

A Peer will Trust:

1. Peers who have provided him authentic files.

2. Their opinions about other files.

3. Known trustworthy Peers.

Page 35: Transitivity of Trust

Estimating Trust

● Terminology

○ Local trust value cij
■ The opinion peer i has of peer j, based on past experience
■ Each time peer i downloads an authentic/inauthentic file from peer j, cij increases or decreases

○ Global trust value ti
■ The trust that the entire system places in peer i

Page 36: Transitivity of Trust

Estimating Trust Level

● Normalization of cij: otherwise, malicious peers can assign arbitrarily high local trust values to other malicious peers

● Local Trust Vector: ci contains all local trust values cij that peer i has of other peers j

● Iterative friend-of-friend reference:
○ Ask your friends: t = C^T · ci
○ Ask their friends: t = (C^T)^2 · ci
○ Keep asking until all nodes are covered: t = (C^T)^n · ci
○ For large n, t converges to the same vector for every peer i
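The normalization and the repeated "ask your friends" step can be sketched as follows. This is an illustrative sketch only: the normalization rule cij = max(sij, 0) / Σj max(sij, 0) is the one I recall from the EigenTrust paper, the function names and the 3-peer score matrix are hypothetical, and the code assumes every peer has at least one positive local score.

```python
def normalize(local_scores):
    """c_ij = max(s_ij, 0) / sum_j max(s_ij, 0), so every row sums to 1."""
    C = []
    for row in local_scores:
        clipped = [max(s, 0) for s in row]
        total = sum(clipped)             # assumed to be > 0 in this sketch
        C.append([c / total for c in clipped])
    return C

def ask_friends(C, c_i, rounds):
    """Compute t = (C^T)^rounds * c_i by repeated multiplication."""
    n = len(C)
    t = list(c_i)
    for _ in range(rounds):
        # (C^T t)_k = sum_j c_jk * t_j: weigh each peer's opinion by its trust.
        t = [sum(C[j][k] * t[j] for j in range(n)) for k in range(n)]
    return t

# Hypothetical raw scores s_ij for three peers.
s = [[0, 4, 1],
     [3, 0, 1],
     [2, 2, 0]]
C = normalize(s)
from_peer_0 = ask_friends(C, C[0], rounds=100)
from_peer_1 = ask_friends(C, C[1], rounds=100)
print(from_peer_0)
print(from_peer_1)   # after enough rounds both peers arrive at the same vector
```

Starting from different local trust vectors and ending at the same global vector is what lets the system speak of a single global trust value per peer.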

Page 37: Transitivity of Trust

Practical issues and solutions

● A priori notions of trust○ Define some distribution p over pre-trusted peers

● Inactive Peers
○ If a peer i does not download from anybody else, or if it assigns a zero score to all other peers, its local trust values are redefined so that it trusts the pre-trusted peers

● Malicious Collectives
○ This is addressed by having each peer place at least some trust in the peers that are not part of a collective
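These three fixes can be folded into the iteration. The sketch below assumes the update t ← (1 − a)·Cᵀt + a·p with t initialised to p, and assumes that a peer with no positive local scores falls back to cij = pj; the mixing weight a = 0.1, the function names, and the 4-peer example are hypothetical choices for illustration.

```python
def normalize_with_pretrust(local_scores, p):
    """Rows with no positive score fall back to the pre-trusted distribution p."""
    C = []
    for row in local_scores:
        clipped = [max(s, 0) for s in row]
        total = sum(clipped)
        C.append([c / total for c in clipped] if total > 0 else list(p))
    return C

def eigentrust(C, p, a=0.1, tol=1e-9, max_rounds=1000):
    """Iterate t <- (1 - a) * C^T t + a * p, starting from t = p."""
    n = len(C)
    t = list(p)
    for _ in range(max_rounds):
        nxt = [(1 - a) * sum(C[j][k] * t[j] for j in range(n)) + a * p[k]
               for k in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, t)) < tol:
            return nxt
        t = nxt
    return t

# Hypothetical network: peer 0 is pre-trusted, peer 1 is inactive,
# peers 2 and 3 form a collective that only rates each other.
s = [[0, 5, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 9],
     [0, 0, 9, 0]]
p = [1.0, 0.0, 0.0, 0.0]
C = normalize_with_pretrust(s, p)
print(eigentrust(C, p))
```

In this toy network no trust ever flows into the collective, so the high scores its members give each other do not raise their global trust values.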

Page 38: Transitivity of Trust

Distributed Eigentrust

● Each peer stores his local trust vector ci

● Each peer stores and computes his own global trust value ti

● With the addition of p distribution

Page 39: Transitivity of Trust

Secure Eigentrust

● A peer should not hold his own t
○ Problem: a malicious peer can report a false value
○ Solution: a different peer computes t for this peer

● t should not be computed by only one peer
○ Problem: a malicious peer can report a false value for another peer
○ Solution: multiple score managers

Page 40: Transitivity of Trust

Experiments

The performance of this scheme is assessed through simulations of a P2P network. The network usually has 100 peers, connected according to a power-law model. Different threat models are executed on this network.

Page 41: Transitivity of Trust

Malicious Peers

Page 42: Transitivity of Trust

Malicious Collectives

Page 43: Transitivity of Trust

Levien, R., & Aiken, A. (1998, January).

Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.

Page 44: Transitivity of Trust

Trust Metrics on Network Certificates

Certificate Applications:

● Authentication

● Data Integrity

● Encryption

Page 45: Transitivity of Trust

Trust Metrics on Network Certificates

The digitally signed certificates form a directed graph, which serves as the model for deploying and testing a number of trust metrics that measure the attack resistance of a given certificate network.

Two Types of certificates:

● Binding Certificates, “I believe that subject key k is the key belonging to name n”

● Delegation certificates “I trust certificates signed by subject key k”

Page 46: Transitivity of Trust

Trust Metrics on Network Certificates

A good trust metric ensures that there are really multiple independent sources of certification, and rejects assertions with insufficient certification.

No trust metric can protect against attacks on d keys or more, where d is the minimum number of certifiers on any widely accepted key.

Page 48: Transitivity of Trust

Trust Metrics on Network Certificates

Attack Types:

● Node attack: the attacker is able to generate any certificate from the attacked key. (stolen password)

● Edge attack: the attacker is only able to generate a delegation certificate from the attacked key. (convince key owner)

Page 49: Transitivity of Trust

Trust Metrics on Network Certificates

Maximum Network Flow Metric

Each node n in the graph is assigned a capacity
C(s,t)(n) = max(fs(dist(s, n)), gt(dist(n, t)))
where s = source, t = target, dist(n, t) = shortest path, d = degree
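A flow with capacities on nodes rather than edges can be computed by splitting every node into an "in" half and an "out" half joined by an edge that carries the node's capacity. The sketch below shows only that mechanism, using breadth-first augmenting paths (Edmonds-Karp). It is not the authors' implementation: the capacities are passed in directly instead of being derived from fs and gt, the function name and example graphs are hypothetical, and the rule that turns a flow value into an accept or reject decision is left out.

```python
from collections import deque

def max_flow_node_capacities(edges, capacity, source, target):
    """Max flow with capacities on nodes: node n becomes (n,'in') -> (n,'out')."""
    INF = float("inf")
    residual = {}

    def add_edge(u, v, cap):
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)      # reverse edge for undoing flow

    for n, cap in capacity.items():
        add_edge((n, "in"), (n, "out"), cap)
    for u, v in edges:                    # certificate edges are unconstrained
        add_edge((u, "out"), (v, "in"), INF)

    s, t = (source, "in"), (target, "out")
    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck on the path, then push that much flow along it.
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical certificate graph: two independent certifiers between s and t.
edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
capacity = {"s": 2, "a": 1, "b": 1, "t": 2}
print(max_flow_node_capacities(edges, capacity, "s", "t"))
```

With unit capacities on the intermediate keys, the flow value counts how many independent certification paths reach the target, so compromising a single intermediate key can remove at most one unit of flow.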

Page 50: Transitivity of Trust

Trust Metrics on Network Certificates

Results

Maximum Network Flow Metric is as effective as previously suggested approaches for node attacks but is far more resistant to edge attacks.

Page 52: Transitivity of Trust

Dataset Table*

Paper | Existing Dataset | Suggested Dataset | Reason
----- | ---------------- | ----------------- | ------
TrustRank | AltaVista | Google | Better representation of the web by Google, as it is used by more users.
Trust & Distrust | Epinions | Amazon reviews | Evaluation on a large network; a low number of votes and people can be counted as distrust.
EigenTrust in P2P | Simulation | Gnutella Peer to Peer Network | Evaluate the consistency of the system on a large network.
Digital Certificates | PGP key database (certificate graph) | Ego-Facebook / email-EuAll / email-Enron | Evaluate a community's resistance to the circulation of malicious information and to infiltration.

* All suggested datasets can be found in SNAP [5]

Page 53: Transitivity of Trust

Conclusion

● Trust is an important aspect that should not be missing from the social web;
● We can successfully separate reputable pages from spam in a search engine using TrustRank;
● Distrust is a significant value that should not be ignored, as it can promote the importance of trust and improve the performance of an approach;
● Malicious peers can be identified and isolated using the uploads of a user with the EigenTrust algorithm;
● We can evaluate the attack resistance of a network using the Maximum Network Flow metric.

Page 54: Transitivity of Trust

References

1. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.

2. Guha, R., Kumar, R., Raghavan, P. & Tomkins, A (2004, May). Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).

3. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.

4. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.

5. Stanford Network Analysis Project: http://snap.stanford.edu/

Page 55: Transitivity of Trust

Any questions? Thank you!