Transitivity of Trust

Team
● Founta Antigoni-Maria, UID: 647
● Kouslis Ilias, UID: 650
● Moutidis Iraklis, UID: 636
● Spathis Dimitris, UID: 640

OVERVIEW

● Introduction
● Task Definition
● Schema
● Combating web spam with TrustRank
● Propagation of Trust and Distrust
● The EigenTrust algorithm for reputation management in p2p networks
● Attack-Resistant Trust Metrics for Public Key Certification
● Dataset Suggestions
● Conclusion

Introduction

● Given the open nature of social networks and their current level of popularity, users are increasingly concerned about privacy and security;

● We need to trust the entities that belong to our social network;

● To achieve that, a “Web of Trust” should be introduced;

● To balance the open nature of social networks against users' privacy concerns, it is important to build “Trust Communities”.

Task Definition [2]

Challenges

Users sometimes adopt many personas and express a large number of biased opinions.

Difficulty in defining trust.

Importance

In e-commerce, a trust model can increase the value of a product.

Trusted users gain greater influence and perks, which can have a positive effect on user behaviour.

Better recommendations.

Applications

Internet Networks:

● Social networks
● P2P networks
● Certificate networks
● Mail networks

Schema

Web of Trust

● Methodologies: TrustRank, Trust & Distrust
● Case Studies: EigenTrust on P2P, Digital Certificates


Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August).

Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.

TrustRank overview

Gyöngyi et al. proposed techniques to semi-automatically separate reputable web pages from spam. They first select a small set of seed pages to be evaluated by a human. Once the reputable seed pages have been identified manually, they exploit the link structure of the web to discover other pages that are likely to be good as well. The benchmark dataset is AltaVista’s web index as of 2003.

Contribution

1. Formalization of the web spam problem and of detection algorithms.
2. Metrics defined for assessing the efficacy of detection algorithms.
3. Schemes for selecting seed sets of pages to be manually evaluated.
4. Introduction of the TrustRank algorithm for determining the likelihood that pages are reputable.
5. An extensive evaluation, based on 31 million sites crawled by the AltaVista search engine, and a manual examination of over 2,000 sites.

Assessing trust

The creators of good pages can sometimes be “tricked,” so we do find some good-to-bad links on the web.

● Oracle function: O(p) = 0 if p is bad, 1 if p is good
● Trust function: T(p) = Pr[O(p) = 1]
● Ordered trust property:
  T(p) < T(q) ⇔ Pr[O(p) = 1] < Pr[O(q) = 1]
  T(p) = T(q) ⇔ Pr[O(p) = 1] = Pr[O(q) = 1]
● Threshold trust property, with threshold value δ: T(p) > δ ⇔ O(p) = 1

Evaluation metrics

● Pairwise orderedness
● Precision
● Recall

Computing trust

● Ignorant trust function: T(p) = O(p) if p ∈ S, 1/2 otherwise
● A randomly selected seed set: S = {1, 3, 6}
● Oracle vector: o = [1, 1, 1, 1, 0, 0, 0]
● Trust vector: t = [1, 1/2, 1, 1/2, 1/2, 0, 1/2]
● Number of ordered pairs: 7 · 6 = 42
● Pairwise orderedness: 17/21
● With threshold δ = 1/2: precision = 1, recall = 1/2
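The three metrics can be checked by hand on this 7-page example. The sketch below is one way to do it in Python. It assumes that a pair counts as misordered whenever the oracle separates two pages but the trust score does not rank the good page strictly higher (so ties count as violations), and that precision and recall are computed over the pages whose score exceeds δ. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Values copied from the example: pages 1..7 map to indices 0..6.
half = Fraction(1, 2)
oracle = [1, 1, 1, 1, 0, 0, 0]
trust = [Fraction(1), half, Fraction(1), half, half, Fraction(0), half]


def pairwise_orderedness(t, o):
    """Fraction of ordered pairs (p, q), p != q, that t does not misorder."""
    pairs = list(permutations(range(len(t)), 2))
    violations = 0
    for p, q in pairs:
        # Assumed rule: the good page must get a strictly higher score.
        if o[p] < o[q] and t[p] >= t[q]:
            violations += 1
        elif o[p] > o[q] and t[p] <= t[q]:
            violations += 1
    return Fraction(len(pairs) - violations, len(pairs))


def precision_recall(t, o, delta):
    """Precision and recall of the rule 'page is good if T(p) > delta'."""
    predicted_good = [i for i, score in enumerate(t) if score > delta]
    true_good = sum(o[i] for i in predicted_good)
    precision = Fraction(true_good, len(predicted_good))
    recall = Fraction(true_good, sum(o))
    return precision, recall


pairord = pairwise_orderedness(trust, oracle)
precision, recall = precision_recall(trust, oracle, half)
print(pairord, precision, recall)  # 17/21 1 1/2
```

Of the 42 ordered pairs, 8 are misordered (pages 2 and 4 tie with bad pages 5 and 7, counted in both directions), which gives 34/42 = 17/21.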

Trust Attenuation

The further away we are from good seed pages, the less certain we are that a page is good. For instance, in Figure 2 there are 2 pages (namely, pages 2 and 4) that are at most 2 links away from the good seed pages. As both of them are good, the probability that we reach a good page in at most 2 steps is 1. Similarly, the number of pages reachable from the good seed in at most 3 steps is 3. Only two of these (pages 2 and 4) are good, while page 5 is bad, so the probability of finding a good page drops to 2/3. These observations suggest that we reduce trust as we move further and further away from the good seed pages.

Trust dampening. Since page 2 is one link away from the good seed page 1, we assign it a dampened trust score of β, where β < 1. Since page 3 is reachable in one step from page 2 with score β, it gets a dampened score of β · β.

Trust splitting. If a good page has only a handful of outlinks, then it is likely that the pointed pages are also good. However, if a good page has hundreds of outlinks, it is more probable that some of them will point to bad pages.
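Trust dampening can be sketched as a breadth-first walk from the good seeds, multiplying the score by β at every link. The snippet below is only an illustration. The three-page chain mirrors the page 1 → page 2 → page 3 example, the value β = 0.5 is chosen purely for round numbers, and the function name is hypothetical. Trust splitting is not modelled here.

```python
from collections import deque


def dampened_trust(outlinks, good_seeds, beta):
    """Assign beta**d to each page, d = link distance from the nearest good seed."""
    scores = {seed: 1.0 for seed in good_seeds}
    queue = deque(good_seeds)
    while queue:
        page = queue.popleft()
        for target in outlinks.get(page, []):
            if target not in scores:  # first visit is the shortest distance
                scores[target] = scores[page] * beta
                queue.append(target)
    return scores


# Hypothetical chain: seed page 1 links to page 2, which links to page 3.
chain = {1: [2], 2: [3], 3: []}
scores = dampened_trust(chain, good_seeds=[1], beta=0.5)
print(scores)  # {1: 1.0, 2: 0.5, 3: 0.25}
```

Pages that cannot be reached from any good seed do not appear in the result, which corresponds to a trust score of 0.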

TrustRank

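● selectSeed: inverse PageRank scores are used to choose the best seeds
  s = [0.08, 0.13, 0.08, 0.10, 0.09, 0.06, 0.02]
● Pages ordered by decreasing score:
  σ = [2, 4, 5, 1, 3, 6, 7]
● With L = 3, the seed set is {2, 4, 5}. The oracle marks pages 2 and 4 as good and page 5 as bad, so the normalized static score distribution is
  d = [0, 1/2, 0, 1/2, 0, 0, 0]
● With decay factor α_B = 0.85 and M_B = 20 iterations, the TrustRank scores are
  t∗ = [0, 0.18, 0.12, 0.15, 0.13, 0.05, 0.05]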
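The steps above (seed selection by inverse PageRank, oracle calls on the top L pages, then a biased PageRank started from the good seeds) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The 7-page graph below is hypothetical (the slide's figure is not reproduced here), so the numbers it prints are not expected to match s, σ or t∗ above. The update rule assumed is t ← α_B · T · t + (1 − α_B) · d, where each page splits its score evenly among its outlinks.

```python
def inverse_pagerank(outlinks, alpha=0.85, iterations=20):
    """PageRank on the reversed graph: high scores reach many pages quickly."""
    pages = sorted(outlinks)
    n = len(pages)
    inlinks = {p: [] for p in pages}
    for p, targets in outlinks.items():
        for q in targets:
            inlinks[q].append(p)
    s = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        nxt = {p: (1 - alpha) / n for p in pages}
        for q in pages:
            for p in inlinks[q]:
                # q hands its score back to the pages that link to it
                nxt[p] += alpha * s[q] / len(inlinks[q])
        s = nxt
    return s


def trustrank(outlinks, oracle, seed_count, alpha=0.85, iterations=20):
    """Return the chosen seeds and the TrustRank score of every page."""
    pages = sorted(outlinks)
    s = inverse_pagerank(outlinks, alpha, iterations)
    seeds = sorted(pages, key=lambda p: -s[p])[:seed_count]
    good = [p for p in seeds if oracle(p)]  # oracle is invoked on seeds only
    d = {p: (1.0 / len(good) if p in good else 0.0) for p in pages}
    t = dict(d)
    for _ in range(iterations):
        nxt = {p: (1 - alpha) * d[p] for p in pages}
        for p in pages:
            for q in outlinks[p]:
                # trust splitting (divide by outdegree) and dampening (alpha)
                nxt[q] += alpha * t[p] / len(outlinks[p])
        t = nxt
    return seeds, t


# Hypothetical graph; pages 1-4 are good and 5-7 bad, as in the oracle vector.
graph = {1: [2], 2: [3, 4], 3: [2], 4: [5], 5: [6, 7], 6: [], 7: []}
good_pages = {1, 2, 3, 4}
seeds, scores = trustrank(graph, oracle=lambda p: p in good_pages, seed_count=3)
print(seeds)
print({p: round(v, 3) for p, v in scores.items()})
```

Because trust only flows along links, a page that is neither a seed nor linked from a trusted page keeps a score of 0, which is the behaviour described for unreferenced pages.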

TrustRank usually gives good pages a higher score. In particular, three of the four good pages (namely, pages 2, 3, and 4) got high scores and two of the three bad pages (pages 6 and 7) got low scores. However, the algorithm failed to assign pages 1 and 5 adequate scores. Page 1 was not among the seeds, and it did not have any inlinks through which to accumulate score, so its score remained at 0. All good unreferenced web pages receive a similar treatment, unless they are selected as seeds. Bad page 5 received a high score because it is the direct target of one of the rare good-to-bad links.

Experiments

To evaluate the algorithms, the authors performed experiments using the complete set of pages crawled and indexed by the AltaVista search engine as of August 2003. To reduce computational demands, they worked with web sites instead of individual pages. They grouped the several billion pages into 31,003,946 sites, using a proprietary algorithm that is part of the AltaVista engine. More than one third of the sites (13,197,046) were unreferenced. The first author of the paper played the role of the oracle, examining pages of various sites, determining whether they were spam, and performing additional classification. The manual evaluations took weeks.

Evaluation

The evaluation sample consisted of 1,000 sites, not chosen uniformly at random. With a random sample, a great number of the sites would be very small (with few pages) and/or have very low PageRank. It is more important to correctly detect spam in high PageRank sites, since they will more often appear high in query result sets.

There is virtually no spam in the top 5 TrustRank buckets, while there is a marked increase in spam concentration in the lower buckets.

At the same time, it is surprising that almost 20% of the second PageRank bucket is bad.

Precision & Recall

TrustRank assigned the highest scores to good sites, and the proportion of bad sites increases gradually as we move to lower scores. Hence, precision and recall manifest an almost linear decrease and increase, respectively.

Conclusion

Experimental results show that we can effectively identify a significant number of strongly reputable (non-spam) pages. In a search engine, TrustRank can be used either separately to filter the index, or in combination with PageRank and other metrics to rank search results.

Guha, R., Kumar, R., Raghavan, P. & Tomkins, A (2004, May).

Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).

Propagation of Trust and Distrust

Guha et al. propose a formal framework of propagation schemes that use both trust and distrust to predict the “belief” one user holds about any other user.

● Why Distrust?
Distrust is at least as important as trust in capturing one user's opinion of another. Their results show that using distrust in retailing and recommendation networks is of significant use and improves the accuracy of the predictions.

Distrust Challenges

Challenge 1

How to model

“Does a trust score of 0 translate to distrust or to ‘no opinion’?”[2]

Challenge 2

Chain Distrust

How can one apply distrust on a user chain? What if there is a chain of distrust?

Challenge 3

Algorithmic Challenges

Once distrust is included, the trust matrix can contain negative entries, so its principal eigenvector need no longer be real and the matrix can no longer be read as a Markov chain. This raises algorithmic issues.

Fundamentals

Atomic Propagation:

● Direct propagation
● Co-citation
● Transpose trust
● Trust coupling

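Methodology

● Input: trust matrix T and distrust matrix D, combined into the belief matrix B
● Propagation process: the combined atomic propagation matrix C_{B,α} is applied k times, giving P<k>
● Output: final matrix F, followed by rounding

Parameters

● Rounding: Global, Local, Majority
● Distrust model: Trust only, One-step distrust, Propagated distrust
● Iteration:
  ○ EIG: F = P<k>
  ○ WLC: weighted combination using a constant γ (γ = 0.5 or γ = 0.9)
● Atomic propagation weights α:
  ○ Direct only: α = e1
  ○ Co-citation: α = e2
  ○ Combined (all 4): α = (0.4, 0.4, 0.1, 0.1)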
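The propagation process can be illustrated with plain matrix arithmetic. The sketch below is a hedged reading of the framework rather than the authors' code. It assumes the four atomic propagations correspond to B, BᵀB, Bᵀ and BBᵀ (direct, co-citation, transpose, coupling), that the trust-only model uses B = T, that one-step distrust uses B = T and multiplies the propagated matrix once by (T − D), and that propagated distrust uses B = T − D. Rounding is left out, and all function names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def combined_matrix(b, alpha):
    """C = a1*B + a2*(B^T B) + a3*B^T + a4*(B B^T)."""
    bt = transpose(b)
    terms = [b, matmul(bt, b), bt, matmul(b, bt)]
    n = len(b)
    return [[sum(w * term[i][j] for w, term in zip(alpha, terms))
             for j in range(n)] for i in range(n)]


def propagate(trust, distrust, alpha, k, mode="trust_only"):
    """Return the propagated belief matrix after k steps (k >= 1)."""
    n = len(trust)
    diff = [[trust[i][j] - distrust[i][j] for j in range(n)] for i in range(n)]
    b = diff if mode == "propagated" else trust
    c = combined_matrix(b, alpha)
    p = c
    for _ in range(k - 1):
        p = matmul(p, c)  # C applied k times in total
    if mode == "one_step":
        p = matmul(p, diff)  # distrust is applied once, after trust propagation
    return p


zero = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
direct_only = (1, 0, 0, 0)  # alpha = e1

# User 0 trusts user 1, user 1 trusts user 2.
chain_trust = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
inferred = propagate(chain_trust, zero, direct_only, k=2)
print(inferred[0][2])  # 1

# User 0 trusts user 1, user 1 distrusts user 2.
t = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
d = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
one_step = propagate(t, d, direct_only, k=1, mode="one_step")
print(one_step[0][2])  # -1
```

In the second example user 0 ends up with a negative belief about user 2: the distrust held by a trusted user is taken over, but distrust itself is not propagated any further.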

Data

● Directed graph from Epinions
● 131,829 nodes and 841,372 edges
● Edge labels: trust or distrust (85% trust edges)
● Distribution: power law
● Structure: symmetric bow-tie
  ○ Strongly connected component (SCC) of about 41,500 nodes
  ○ About 40,000 nodes linking into the SCC and about 30,000 linked from it
● Giant weakly connected component (WCC) of about 120,000 nodes

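Results

81 different schemes were evaluated (3 rounding methods × 3 distrust models × 3 iteration methods × 3 weight vectors).

Best combination:

● k = 20
● α = e* (combined)
● Majority rounding
● EIG
● One-step distrust

Error (fraction of incorrect predictions): e = 0.064 and e_s = 0.147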


Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May).

The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.

EigenTrust Overview

1. Reputation system on a P2P Network

2. Trust level

3. Robust against malicious peers and Freeriders

4. Reward good behavior through several transactions

Trust level

A Peer will Trust:

1. Peers who have provided him authentic files.

2. Their opinions about other files.

3. Known trustworthy Peers.

Estimating Trust

● Terminology
  ○ Local trust value c_ij
    ■ The opinion peer i has of peer j, based on past experience
    ■ Each time peer i downloads an authentic or inauthentic file from peer j, c_ij increases or decreases
  ○ Global trust value t_i
    ■ The trust that the entire system places in peer i

Estimating Trust Level

● Normalization of c_ij: otherwise, malicious peers can assign arbitrarily high local trust values to other malicious peers

● Local trust vector: c_i contains all local trust values c_ij that peer i has of other peers j

● Iterative friend-of-friend reference:
  ○ Ask your friends: t = Cᵀ c_i
  ○ Ask their friends: t = (Cᵀ)² c_i
  ○ Keep asking until all nodes are covered: t = (Cᵀ)ⁿ c_i
  ○ For large n, t converges to the same vector for every peer i
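The normalization and the repeated "ask your friends" step can be tried on a toy network. The sketch below assumes the usual normalization c_ij = max(s_ij, 0) / Σ_j max(s_ij, 0), where s_ij is the raw local score, and assumes every peer has at least one positive score. The 3-peer score matrix is made up for illustration.

```python
def normalize(local_scores):
    """Turn raw local scores s_ij into rows c_i that each sum to 1."""
    c = []
    for row in local_scores:
        clipped = [max(value, 0.0) for value in row]
        total = sum(clipped)  # assumed > 0 for every peer in this sketch
        c.append([value / total for value in clipped])
    return c


def global_trust(c, start, iterations=100):
    """Repeatedly compute t <- C^T t, starting from one peer's local vector."""
    n = len(c)
    t = list(start)
    for _ in range(iterations):
        t = [sum(c[i][j] * t[i] for i in range(n)) for j in range(n)]
    return t


# Hypothetical raw scores: row i holds peer i's experience with each peer j.
raw = [
    [0, 4, 1],
    [2, 0, 2],
    [3, 1, 0],
]
C = normalize(raw)
from_peer_0 = global_trust(C, C[0])
from_peer_2 = global_trust(C, C[2])
print([round(x, 4) for x in from_peer_0])
print([round(x, 4) for x in from_peer_2])
```

Both runs print the same vector, which illustrates why the result can be treated as a global trust value: it no longer depends on which peer started asking.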

Practical issues and solutions

● A priori notions of trust
  ○ Define some distribution p over pre-trusted peers

● Inactive peers
  ○ If peer i has not downloaded from anybody, or assigns a zero score to all other peers, its local trust vector is redefined as p, so that it trusts the pre-trusted peers

● Malicious collectives
  ○ This is addressed by having each peer place at least some trust in peers that are not part of a collective (the pre-trusted peers)
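These three fixes fit into one update rule. The sketch below assumes the form t ← (1 − a) · Cᵀ t + a · p, starting from t = p, with a hypothetical weight a = 0.2. The 4-peer network is invented: peer 0 is pre-trusted, peer 1 is inactive, and peers 2 and 3 form a collective that only rates each other.

```python
def local_trust_matrix(raw, p):
    """Normalize raw scores; peers with no positive score fall back to p."""
    c = []
    for row in raw:
        clipped = [max(value, 0.0) for value in row]
        total = sum(clipped)
        c.append([value / total for value in clipped] if total > 0 else list(p))
    return c


def eigentrust(raw, p, a=0.2, iterations=50):
    """Iterate t <- (1 - a) * C^T t + a * p, starting from t = p."""
    c = local_trust_matrix(raw, p)
    n = len(raw)
    t = list(p)
    for _ in range(iterations):
        t = [(1 - a) * sum(c[i][j] * t[i] for i in range(n)) + a * p[j]
             for j in range(n)]
    return t


# Hypothetical scores: rows are raters, columns are rated peers.
raw = [
    [0, 5, 0, 0],  # pre-trusted peer 0 had good downloads from peer 1
    [0, 0, 0, 0],  # peer 1 is inactive
    [0, 0, 0, 9],  # peers 2 and 3 boost each other
    [0, 0, 9, 0],
]
p = [1.0, 0.0, 0.0, 0.0]  # all a priori trust on peer 0
t = eigentrust(raw, p)
print([round(x, 3) for x in t])
```

In this toy setup the collective ends with a global trust of 0: no peer outside it vouches for it, and the a · p term keeps pulling trust back to the pre-trusted peer.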

Distributed Eigentrust

● Each peer stores its local trust vector c_i

● Each peer stores and computes its own global trust value t_i

● The pre-trusted distribution p is included in the update

Secure Eigentrust

● A peer should not hold its own t
  ○ Problem: a malicious peer can report a false value
  ○ Solution: a different peer computes t for this peer

● t should not be computed by only one peer
  ○ Problem: a malicious peer can report a false value for another peer
  ○ Solution: multiple score managers

Experiments

The performance of this scheme is assessed through simulations of a P2P network. The number of peers is usually 100, and they are connected according to a power-law model. Several threat models are run on this network, including:

● Malicious peers
● Malicious collectives

Levien, R., & Aiken, A. (1998, January).

Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.

Trust Metrics on Network Certificates

Certificate Applications:

● Authentication

● Data Integrity

● Encryption

The digitally signed certificates form a directed graph. This graph is the model on which a number of trust metrics are deployed and tested, in order to measure the attack resistance of a given certificate network.

Two Types of certificates:

● Binding Certificates, “I believe that subject key k is the key belonging to name n”

● Delegation certificates “I trust certificates signed by subject key k”

A good trust metric ensures that there are really multiple independent sources of certification, and rejects assertions with insufficient certification.

No trust metric can protect against attacks on d keys or more, where d is the minimum number of certifiers on any widely accepted key.

Attack Types:

● Node attack: the attacker is able to generate any certificate from the attacked key (e.g. a stolen password).

● Edge attack: the attacker is only able to generate a delegation certificate from the attacked key (e.g. by convincing the key owner).

Maximum Network Flow Metric

Each node n in the graph is assigned a capacity

C_(s,t)(n) = max(f_s(dist(s, n)), g_t(dist(n, t)))

where s = source, t = target, dist(a, b) = length of the shortest path from a to b, d = degree.
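A maximum-flow metric with node capacities can be sketched by splitting every key into an "in" and an "out" half joined by an edge that carries the node's capacity, and then running a standard augmenting-path algorithm (Edmonds-Karp here). This is an illustrative sketch, not the authors' implementation. The capacity functions f_s and g_t below are hypothetical placeholders that give capacity 2 at distance 0 and 1 elsewhere, and the tiny certificate graph is invented.

```python
from collections import defaultdict, deque


def bfs_distances(adj, start):
    """Shortest-path lengths (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def max_flow_metric(edges, source, target, f_s, g_t):
    """Max flow from source to target where nodes, not edges, are limited."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    from_s = bfs_distances(adj, source)
    to_t = bfs_distances(radj, target)

    # Node capacity C(n) = max(f_s(dist(s, n)), g_t(dist(n, t))).
    node_cap = {n: max(f_s(from_s[n]), g_t(to_t[n]))
                for n in from_s if n in to_t}
    big = sum(node_cap.values()) + 1  # edges themselves never bind

    cap = defaultdict(lambda: defaultdict(int))
    for n, c in node_cap.items():
        cap[(n, "in")][(n, "out")] = c
    for u, v in edges:
        cap[(u, "out")][(v, "in")] = big

    s_node, t_node = (source, "in"), (target, "out")
    flow = 0
    while True:
        parent = {s_node: None}
        queue = deque([s_node])
        while queue and t_node not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t_node not in parent:
            return flow  # no augmenting path is left
        path, v = [], t_node
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck  # residual edge
        flow += bottleneck


def near(distance):
    """Hypothetical capacity function, used for both f_s and g_t."""
    return 2 if distance == 0 else 1


two_certifiers = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
one_certifier = [("s", "a"), ("a", "t")]
flow_two = max_flow_metric(two_certifiers, "s", "t", near, near)
flow_one = max_flow_metric(one_certifier, "s", "t", near, near)
print(flow_two, flow_one)  # 2 1
```

With two independent certifiers the flow is 2, and with a single one it is 1. Because each intermediate key has capacity 1 in this setup, extra certificates issued from one compromised key cannot raise the flow, which is the intuition behind requiring multiple independent sources of certification.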

Results

The Maximum Network Flow Metric is as effective as previously suggested approaches against node attacks, but is far more resistant to edge attacks.


Dataset Table*

Paper | Existing Dataset | Suggested Dataset | Reason
TrustRank | AltaVista | Google | Better representation of the web by Google, as it is used by more users.
Trust & Distrust | Epinions | Amazon reviews | Evaluation on a large network; a low number of votes and people can be counted as distrust.
EigenTrust in P2P | Simulation | Gnutella peer-to-peer network | Evaluate the consistency of the system on a large network.
Digital Certificates | PGP key database (certificate graph) | ego-Facebook / email-EuAll / email-Enron | Evaluate a community's resistance to the circulation of malicious information and to infiltration.

* All suggested datasets can be found in SNAP [5]

Conclusion

● Trust is an important aspect that should not be missing from the social web;
● We can successfully separate reputable pages from spam in a search engine using TrustRank;
● Distrust is a significant value that should not be ignored, as it can promote the importance of trust and improve the performance of an approach;
● Malicious peers can be identified and isolated using the uploads of a user with the EigenTrust algorithm;
● We can evaluate the attack resistance of a network using the Maximum Network Flow metric.

References

1. Gyöngyi, Z., Garcia-Molina, H., & Pedersen, J. (2004, August). Combating web spam with TrustRank. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30 (pp. 576-587). VLDB Endowment.

2. Guha, R., Kumar, R., Raghavan, P., & Tomkins, A. (2004, May). Propagation of Trust and Distrust. In Proceedings of the 13th international conference on World Wide Web (pp. 403-412).

3. Kamvar, S. D., Schlosser, M. T., & Garcia-Molina, H. (2003, May). The EigenTrust algorithm for reputation management in p2p networks. In Proceedings of the 12th international conference on World Wide Web (pp. 640-651). ACM.

4. Levien, R., & Aiken, A. (1998, January). Attack-Resistant Trust Metrics for Public Key Certification. In Usenix Security.

5. Stanford Network Analysis Project: http://snap.stanford.edu/

Any questions? Thank you!
