The Efficacy of Collusions in Web Ranking and the Countermeasurements
Hui Zhang
University of Southern California
04/22/23 USC CS599 2
Outline
• Problem Statement.
• PageRank algorithm: a brief introduction.
• Study of PageRank's robustness to collusion.
• Adaptive-resetting: make PageRank robust to collusion.
• Conclusions.
Search Engine Optimization (SEO)
Web spam [Gyöngyi et al. 2004]
• Web spamming refers to actions intended to mislead search engines and give some pages a higher ranking than they deserve.
• A spammer manipulates the two factors that decide the rank score of a page for a query:
  Relevance – textual similarity between the query and a page.
  Importance – the global popularity of a page, which is query-independent.
Collusion in Web ranking
• A manipulation of the hyperlink structure by a group of users with the intention of improving the rating of one or more users in the group.
PageRank [Brin1998]
• An eigenvector-based rating scheme to rank hypertext documents on the WWW.
• An iterative algorithm to calculate the importance of a web page based on the importance of its parent pages.
• Can be applied to systems other than the WWW.
PageRank: random walk model
[Figure: a walker on a graph of nodes X, Y and Z joined by referential links, with link labels 1/2 and 1/3.]
• With prob. (1-ε), I will continue the walk to a random successor node.
• With prob. ε, I will restart the walk at a random node.
ε: resetting probability
• As time goes on, the expected percentage of steps the walker is at each node v converges to the PageRank weight PR(v).
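The random walk described on this slide can be simulated directly. The following is a minimal sketch, not the presenter's code: the three-node graph, the name `simulate_walk`, the step count and the seed are assumptions made for illustration, and a walker on a node without successors is assumed to restart. `eps` stands for the resetting probability.

```python
import random


def simulate_walk(out_links, eps=0.15, steps=200_000, seed=0):
    """Return the fraction of steps the walker spends at each node."""
    rng = random.Random(seed)
    nodes = list(out_links)
    visits = dict.fromkeys(nodes, 0)
    node = rng.choice(nodes)
    for _ in range(steps):
        visits[node] += 1
        if rng.random() < eps or not out_links[node]:
            node = rng.choice(nodes)            # restart at a random node
        else:
            node = rng.choice(out_links[node])  # follow a random out-link
    return {v: count / steps for v, count in visits.items()}


# Hypothetical toy graph: X links to Y and Z, Y links to Z, Z links to X.
graph = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
freq = simulate_walk(graph)
for name in graph:
    print(name, round(freq[name], 3))
```

Solving the stationary equations of this toy graph by hand gives weights of roughly 0.39 for X, 0.21 for Y and 0.40 for Z, and the visit fractions should settle close to those values as the number of steps grows.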
PageRank: is it collusion-proof?
• Can a node easily boost its rank by manipulating its outgoing links together with others'?
[Figure: four nodes, each claiming "I'm not colluding!"]
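Amp(G): a metric on group collusion
• In the system of node group G, for a subgroup G' ⊆ G, the amplification factor is
  $\mathrm{Amp}(G') = \frac{W_G(G')}{W_{in}(G')}$
• "Actual" group weight:
  $W_G(G') = \sum_{i \in G'} PR(i)$
• Real group weight:
  $W_{in}(G') = (1-\varepsilon) \sum_{(i,j):\, i \notin G',\, j \in G'} \frac{PR(i)}{out(i)} + \varepsilon \frac{|G'|}{|G|} \left(1 - W_G(G')\right)$
• Example (figure): G' = {i, j} in a system of N nodes; outside node x sends PR(x)/3 over one link into G', and outside node y sends PR(y)/4 over each of two links into G':
  $W_G(G') = PR(i) + PR(j)$
  $W_{in}(G') = \frac{PR(x)}{3}(1-\varepsilon) + \frac{PR(y)}{2}(1-\varepsilon) + \varepsilon \frac{2}{N}\left(1 - W_G(G')\right)$
ε: resetting probability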
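A small numerical sketch of the amplification factor follows. It reads the slide's definition as Amp(G') = W_G(G') / W_in(G'), with W_in(G') = (1-ε) · Σ PR(i)/out(i) over links entering G' from outside, plus ε · |G'|/|G| · (1 - W_G(G')). That reading, the function names and the 50-node toy graph are assumptions, and every node is assumed to have at least one out-link.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Power iteration; eps is the resetting probability."""
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [eps / n] * n
        for u, succs in enumerate(out_links):
            for v in succs:
                nxt[v] += (1.0 - eps) * pr[u] / len(succs)
        pr = nxt
    return pr


def amplification(out_links, group, eps=0.15):
    """Amp(G') = W_G(G') / W_in(G') under the reading stated above."""
    n = len(out_links)
    pr = pagerank(out_links, eps)
    members = set(group)
    w_group = sum(pr[i] for i in members)
    # Link weight entering the group from non-members.
    link_in = sum(
        pr[i] / len(out_links[i])
        for i in range(n) if i not in members
        for j in out_links[i] if j in members
    )
    w_in = (1.0 - eps) * link_in + eps * len(members) / n * (1.0 - w_group)
    return w_group / w_in


# Hypothetical toy graph: node i links to i+1 and i+5 (mod 50).
N = 50
honest = [[(i + 1) % N, (i + 5) % N] for i in range(N)]
colluding = [list(s) for s in honest]
colluding[10], colluding[11] = [11], [10]   # nodes 10 and 11 form a 2-node ring

amp_before = amplification(honest, [10, 11])
amp_after = amplification(colluding, [10, 11])
print(round(amp_before, 2), round(amp_after, 2))
```

In this toy graph the pair starts at about 1.28 and reaches 1 / (ε · (1 - 2/N)) ≈ 6.94 once it closes itself off, which is the value a closed group attains under this reading of W_in.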
Answer for (1+1 = ?) in PageRank
• In the original PageRank system, $\forall G' \subset G$: $\mathrm{Amp}(G') \le 2/\varepsilon$, where ε is the resetting probability.
• Specifically, when $|G'|/|G| \ll 1$: $\mathrm{Amp}(G') \lesssim 1/\varepsilon$.
Two experimental topologies
• W, a Web link topology
  Contains the link structure of upwards of 80 million URLs.
  Source: the Stanford WebBase.
• B, a weblog blogrolling topology
  Contains the blogrolling structure of upwards of 72,000 blogs.
  Source: www.blogstreet.com, the XML-RPC weblog service.
Experiment 1: Collusion200
• Model a small number of web pages simultaneously colluding.
• Methodology:
  • 100 colluding groups, 200 nodes in total;
  • Each colluding group has the circle topology consisting of two nodes with adjacent ranks;
  • Arbitrarily chose node pairs originally ranked around 1000th, 2000th, …, 100000th;
  • ε = 0.15.
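The methodology can be mimicked at toy scale. This sketch is an illustration under assumptions, not the experiment itself: the 60-node graph (in which every page starts with the same weight, so any pair counts as "adjacent"), the three chosen pairs and the helper names `collude` and `ranking` are all hypothetical. `eps` is the resetting probability, set to 0.15 as on the slide.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Power iteration; assumes every node has at least one out-link."""
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [eps / n] * n
        for u, succs in enumerate(out_links):
            for v in succs:
                nxt[v] += (1.0 - eps) * pr[u] / len(succs)
        pr = nxt
    return pr


def collude(out_links, pairs):
    """Each pair drops its out-links and forms a 2-node circle."""
    rewired = [list(s) for s in out_links]
    for a, b in pairs:
        rewired[a], rewired[b] = [b], [a]
    return rewired


def ranking(pr):
    """Node ids sorted from highest to lowest PR weight."""
    return sorted(range(len(pr)), key=lambda v: -pr[v])


# Hypothetical toy topology: node i links to i+1 and i+5 (mod 60).
N = 60
honest = [[(i + 1) % N, (i + 5) % N] for i in range(N)]
pairs = [(10, 11), (30, 31), (50, 51)]

before = pagerank(honest)
after = pagerank(collude(honest, pairs))
top = ranking(after)[:6]
print(sorted(top))
```

Because the colluding pairs only hand weight back and forth, the six colluders end up occupying the top six positions of the new ranking in this toy setting.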
Experiment result of Collusion200 (I)
Figure 1: W - Amplification factors of the 100 colluding groups in Collusion200.
Experiment result of Collusion200 (III)
Figure 2: W – new PR rank after Collusion200.
Old rank: 1005th → new rank: 67th
Old rank: 10001st → new rank: 450th
Old rank: 100009th → new rank: 5038th
There is a long flat portion…
Figure 3: The PR weight distribution of 4 topologies.
Next step: how to detect collusions?
• Identifying colluding groups is unlikely to be computationally tractable.
  • The densest k-subgraph problem [Feige et al. 1997].
  • The classical CLIQUE problem.
  • The problem of finding large hidden cliques in random graphs [Juels 1998].
Hardness on Amp
• Theorem on Hardness.
  $\max_{G' \subseteq G} \mathrm{Amp}(G')$ is an NP-hard problem.
How about using finer statistics of the random walk?
• The revisit intervals of the random walk on a colluding node will likely have a large variance compared to their expectation.
• A counterexample: a star + dangling circle topology.
Figure E: a star + dangling circle topology (node labels 0, 1, 2, N-2, N-1, N, N+1).
An observation on collusion behaviors
• To increase their PR weight, i.e., the stationary weight in the random walk, the colluding nodes will stall the random walk.
• When the resetting probability ε increases, the colluding nodes must suffer a significant drop in PR weight.
• Therefore, we expect the PR weight of colluding nodes to be highly correlated with 1/ε (the average walk length), while that of non-colluding nodes is relatively insensitive to the change in ε.
[Figure: a colluding subgroup G' inside the node group G.]
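One way to see where the dependence on 1/ε comes from is a short balance calculation, with ε the resetting probability. The sketch assumes that every node has at least one out-link and that the colluding group G' is closed, i.e. all of its out-links stay inside G'. The symbols W and L_in, and N for the total number of nodes, are introduced here for illustration.

```latex
% Stationary PageRank balance for a node v:
PR(v) = \frac{\varepsilon}{N} + (1-\varepsilon) \sum_{u \to v} \frac{PR(u)}{out(u)}

% Sum over v \in G', writing W = \sum_{v \in G'} PR(v) and
% L_{in} = \sum_{u \notin G',\, v \in G',\, u \to v} \frac{PR(u)}{out(u)}.
% Because G' is closed, its internal links carry exactly W:
W = \varepsilon \frac{|G'|}{N} + (1-\varepsilon) L_{in} + (1-\varepsilon) W

% Solving for W:
W = \frac{|G'|}{N} + \frac{1-\varepsilon}{\varepsilon} L_{in}
```

If L_in changes little with ε, the group's weight is close to linear in 1/ε, and a larger ε shrinks the second term quickly.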
An intuitive example
[Figure: nodes joined by referential links, with a colluding group marked.]
• A colluding node x: PR(x) = , and co-co(PR(x), 1/ε) ≈ 1. (co-co: correlation coefficient)
• A non-colluding node y: PR(y) = , and co-co(PR(y), 1/ε) ≈ 0.
NKNK1
)(1
NKNK1
)(
x
y
N: the system size; K: the colluding group size; K << N.
Adaptive-resetting scheme
• Part I – collusion detection:
  Given the topology, calculate the PR vector under different ε values.
  {ε} = {0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6}, ε_default = 0.15.
  Calculate the correlation coefficient between the curve of each node x's PR weight and the curve of 1/ε. Label it as co-co(x).
• Part II – personalization:
  Calculate each node x's out-link personalized-ε = F(ε_default, co-co(x)).
  Exponential function: $F_{Exp} = \varepsilon_{default}^{\,(1.0 - \text{co-co}(x))}$.
  Linear function: $F_{Linear} = \varepsilon_{default} + (0.5 - \varepsilon_{default}) \cdot \text{co-co}(x)$.
  The final PR weight vector is calculated with these personalized resetting values.
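The two parts of the scheme can be sketched end to end on a toy graph. This is one possible reading, not the authors' implementation. It assumes that co-co is the Pearson coefficient, that negative co-co values are clamped to 0, that the exponential function is ε_default raised to (1.0 - co-co(x)), and that "personalized" means node x restarts with its own probability whenever the walker is at x. The graph and all function names are hypothetical.

```python
import math

EPS_VALUES = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]  # from the slide
DEFAULT_EPS = 0.15


def pagerank(out_links, eps_of, tol=1e-12, max_iter=10000):
    """PageRank where node u resets with probability eps_of[u].

    Assumes every node has at least one out-link.
    """
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(max_iter):
        reset_mass = sum(eps_of[u] * pr[u] for u in range(n))
        nxt = [reset_mass / n] * n
        for u, succs in enumerate(out_links):
            share = (1.0 - eps_of[u]) * pr[u] / len(succs)
            for v in succs:
                nxt[v] += share
        delta = sum(abs(a - b) for a, b in zip(nxt, pr))
        pr = nxt
        if delta < tol:
            break
    return pr


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant curve carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def coco(out_links):
    """Part I: correlate each node's PR curve with the curve of 1/eps."""
    n = len(out_links)
    curves = [pagerank(out_links, [e] * n) for e in EPS_VALUES]
    inv_eps = [1.0 / e for e in EPS_VALUES]
    return [pearson([c[x] for c in curves], inv_eps) for x in range(n)]


def f_linear(default, c):
    return default + (0.5 - default) * c


def f_exp(default, c):
    return default ** (1.0 - c)


def adaptive_pagerank(out_links, f=f_linear):
    """Part II: recompute PR with per-node resetting probabilities."""
    clamped = [min(1.0, max(0.0, c)) for c in coco(out_links)]  # assumption
    return pagerank(out_links, [f(DEFAULT_EPS, c) for c in clamped])


# Hypothetical toy graph: i links to i+1 and i+5 (mod 50); 10 and 11 collude.
N = 50
graph = [[(i + 1) % N, (i + 5) % N] for i in range(N)]
graph[10], graph[11] = [11], [10]

cc = coco(graph)
plain = pagerank(graph, [DEFAULT_EPS] * N)
adaptive = adaptive_pagerank(graph)
print(round(cc[10], 3), round(cc[30], 3))
print(plain[10] + plain[11], adaptive[10] + adaptive[11])
```

With `f_exp`, a node whose co-co reaches 1 receives a resetting probability of 1, so the walker always restarts there. `f_linear` caps the personalized value at 0.5.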
Experiment result of Collusion200 (IV)
Figure 5: W - Amplification factors of the 100 colluding groups in Collusion200.
Experiment result of Collusion200 (VI)
Figure 6: W – new PR rank after Collusion200.
Experiment 2: Collusion22
• Model various colluding subgraphs.
• Methodology: 3 colluding groups:
  G1: 10-node ring
  G2: 10-node star topology
  G3: 2-node ring
[Figure: the three groups drawn as nodes joined by referential links.]
Experiment result of Collusion22 (I)
Figure 7: Amplification factors of the 3 colluding groups in Collusion22.
Experiment result of Collusion22 (II)
Figure 8: W – new PR weight after Collusion22.
New top-25 URL list in W
[Table legend: Dropped out, Dropping, New]
Conclusions
• Simple collusions lead to effective Web ranking improvement.
• A simple scheme based on the PageRank algorithm effectively counteracts Web ranking collusions.
Backup slides
Reputation systems [Okita2003]
• A means of describing social trust networks.
• The basic concept is a democratic meritocracy.
• A rating system is used to evaluate individual members, and those results are then collated to produce a consensus about the merit of any given member.
• Examples: Livejournal, Friendster, eBay, Advogato
PageRank algorithm [Brin1998]
• Assume N pages.
• Assign all pages the initial value 1/N.
• Let $N_u$ be the out-degree of page u, Rank(v) the importance of page v, and $B_v$ the set of pages pointing to v.
• Basic algorithm:
  $\forall v:\ Rank(v) = \sum_{u \in B_v} Rank(u) / N_u$
• Enhanced algorithm against rank sinks:
  $\forall v:\ Rank(v) = (1-\varepsilon) \sum_{u \in B_v} Rank(u) / N_u + \varepsilon / N$
ε: damping factor
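The difference between the basic and the enhanced update can be seen on a tiny graph with a rank sink. This is a sketch under assumptions: the four-page graph and the function names are hypothetical, the enhanced update is read as (1-ε) times the basic sum plus ε/N, and every page is assumed to have at least one out-link.

```python
def basic_step(rank, out_links):
    """Rank(v) = sum over pages u pointing to v of Rank(u) / N_u."""
    nxt = [0.0] * len(rank)
    for u, succs in enumerate(out_links):
        for v in succs:
            nxt[v] += rank[u] / len(succs)
    return nxt


def enhanced_step(rank, out_links, eps=0.15):
    """Rank(v) = (1 - eps) * basic sum + eps / N."""
    n = len(rank)
    return [(1.0 - eps) * r + eps / n for r in basic_step(rank, out_links)]


def iterate(step, out_links, iters=100):
    n = len(out_links)
    rank = [1.0 / n] * n          # every page starts at 1/N
    for _ in range(iters):
        rank = step(rank, out_links)
    return rank


# Hypothetical graph: 0 -> 1, 1 -> {0, 2}, and pages 2 and 3 link only to
# each other, so {2, 3} is a rank sink.
pages = [[1], [0, 2], [3], [2]]
basic = iterate(basic_step, pages)
enhanced = iterate(enhanced_step, pages)
print(basic[0] + basic[1], enhanced[0] + enhanced[1])
```

Under the basic update the sink drains pages 0 and 1 to essentially zero. The enhanced update keeps every page at or above ε/N.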
Co-co distribution in real-world graphs
Figure 4: the co-co PDF distribution in W and B; the [0, 0.1] range actually corresponds to the [-1, 0.1] range.
Experiment result of Collusion200 (II)
Figure A: W – new PR weight after Collusion200.
Experiment result of Collusion200 (VII)
Figure B: B – new PR rank after Collusion200.
Experiment result of Collusion200 (X)
Figure C: B – new PR weight after Collusion200.
Experiment result of Collusion200 (V)
Figure 6: W – new PR weight after Collusion200.
Correlation coefficient
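A sketch of the definition, assuming co-co is the standard Pearson correlation coefficient between a node's PR weights and 1/ε, taken over the sampled resetting probabilities ε_1, …, ε_m. The index k and the bar notation for sample means are introduced here for illustration.

```latex
\text{co-co}(x) =
  \frac{\sum_{k=1}^{m} \left(PR_{\varepsilon_k}(x) - \overline{PR(x)}\right)
        \left(\frac{1}{\varepsilon_k} - \overline{1/\varepsilon}\right)}
       {\sqrt{\sum_{k=1}^{m} \left(PR_{\varepsilon_k}(x) - \overline{PR(x)}\right)^2}\;
        \sqrt{\sum_{k=1}^{m} \left(\frac{1}{\varepsilon_k} - \overline{1/\varepsilon}\right)^2}}
```

The value lies in [-1, 1]: it is close to 1 when PR(x) grows almost linearly with 1/ε, and at or below 0 when it does not grow with 1/ε.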
Experiment result of Collusion22 (III)
Figure D: W – new PR rank after Collusion22.