Two unrelated talks

Marco Bressan, January 30, 2012


Page 1: Two Unrelated Talks

1/43

Local computation of PageRank: the ranking side

Introduction

Motivations

Local ranking in theory

Local ranking in practice

Conclusions

psort, yet another fast stable external sorting software

Introduction

Making sorting a complicated task

Inside psort

Conclusions

Conclusions

Two unrelated talks

MARCO BRESSAN

January 30, 2012

Page 2: Two Unrelated Talks

Outline

1. Local computation of PageRank: the ranking side
   • Introduction
   • Motivations
   • Local ranking in theory
   • Local ranking in practice
   • Conclusions

2. psort, yet another fast stable external sorting software
   • Introduction
   • Making sorting a complicated task
   • Inside psort
   • Conclusions

3. Conclusions

Page 3: Two Unrelated Talks

Local computation of PageRank: the ranking side

Page 4: Two Unrelated Talks

Ranking robustly

Rank a graph’s nodes

1. the graph
2. external factors
   • (varying) parameters
   • graph availability
   • . . .

Is ranking robust?

How is ranking influenced by external factors?

Page 6: Two Unrelated Talks

PageRank

PageRank of node v:

$$P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$$

where n = |G| and α is the damping factor.

Applications: web search, web crawling, web spam detection, personalized web search, social network mining, ranking in databases, structural re-ranking, opinion mining, word sense disambiguation, credit and reputation systems, bibliometrics, gene ranking, . . .

Among the top data mining algorithms: Wu et al. Top 10 algorithms in data mining. Knowl. and Inform. Systems, 2007.
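The PageRank recurrence on this slide can be evaluated by repeated substitution (power iteration). The sketch below is only an illustration, not the speaker's code: it reads o(u) as the out-degree of u, assumes every node has at least one out-link, and the function name `pagerank`, the `out_links` dictionary and the three-node example graph are all hypothetical.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Iterate P(v) = alpha * sum_{u->v} P(u)/o(u) + (1 - alpha)/n.

    out_links maps each node to the list of nodes it links to.
    Assumption: every node has at least one out-link.
    """
    nodes = list(out_links)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}      # uniform starting point
    for _ in range(iterations):
        # every node first receives the (1 - alpha)/n term
        new_scores = {v: (1.0 - alpha) / n for v in nodes}
        for u, targets in out_links.items():
            share = alpha * scores[u] / len(targets)   # alpha * P(u) / o(u)
            for v in targets:
                new_scores[v] += share
        scores = new_scores
    return scores


# Hypothetical example graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph, alpha=0.85, iterations=200)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the nodes by score, as in the last line, gives the ranking that the rest of the talk is concerned with.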

Page 9: Two Unrelated Talks

Choose the damping, choose the ranking?

$$P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$$

Is PageRank's ranking robust to small variations in α?

Results

1. not robust in theory (permutation theorem, reversal theorem)
2. novel tools for checking robustness (lineage analysis)
3. somewhat robust in real-world graphs (experiments)

Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? J. Discrete Algorithms 8(2): 199-213 (2010)

Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? Proc. of WAW 2009: 76-89

Page 13: Two Unrelated Talks

Is it possible to compute the rank locally?

[Figure: panel "Local computation" with nodes u and v; panel "Ranking" with node scores 0.15, 0.2, 0.1, 0.3, 0.25 and rank labels 1st to 5th.]

In many applications only the rank matters!

Is it possible to compute the rank locally?

• stated by Chen et al. (CIKM 2004)
• restated by Bar-Yossef and Mashiach (CIKM 2008)

Page 17: Two Unrelated Talks

Motivating examples (I): crawling

The visited graph expands starting from seed nodes.

Which red nodes should be visited now? And in what order?

Order the nodes with PageRank!

Cho et al. Efficient crawling through URL ordering. Computer Networks, 1998.

Is it possible to rank the red frontier at a low cost, without visiting the whole crawled graph?

Page 18: Two Unrelated Talks

Motivating examples (II): ranking with competitors

Retrieve graph structure using e.g. Google's link: operator

Bar-Yossef and Mashiach. Local approximation of PageRank and reverse PageRank. Proc. ACM CIKM, 2008.

Is it possible to compute this rank efficiently, using few queries?

Page 23: Two Unrelated Talks

Motivating examples (III): social network mining

Rank key users in social networks

Heidemann et al. Identifying key users in online social networks: A PageRank based approach. Proc. ICIS, 2010.

Full graph not available (privacy settings).

Is it still possible to guarantee the correctness of the output ranking?

Page 28: Two Unrelated Talks

Formal definition of the problem

Input

• graph G of size n
• target nodes v1, . . . , vk
• score separation ε > 0

Output

• ranking of v1, v2, . . . , vk

If $(1-\varepsilon) < \frac{P(v_i)}{P(v_j)} < (1+\varepsilon)$, any ranking of $v_i, v_j$ is valid.

Cost model

• computation is free
• but visiting G costs (query to link server)

cost of ranking = |queries| = |nodes visited|
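The output condition above can be turned into a small checker. This is a sketch under stated assumptions rather than anything taken from the paper: the name `is_valid_ranking` is hypothetical, rankings are lists with the highest-scoring node first, and all scores are assumed strictly positive.

```python
from itertools import combinations


def is_valid_ranking(ranking, scores, eps):
    """Return True if `ranking` (best node first) is acceptable for `scores`.

    A pair (vi, vj) may appear in either order when the ratio
    P(vi)/P(vj) lies strictly between 1 - eps and 1 + eps.
    """
    position = {v: i for i, v in enumerate(ranking)}
    for vi, vj in combinations(ranking, 2):
        ratio = scores[vi] / scores[vj]
        if 1 - eps < ratio < 1 + eps:
            continue  # scores too close: any order of vi, vj is valid
        # otherwise the node with the larger score must come first
        higher, lower = (vi, vj) if ratio > 1 else (vj, vi)
        if position[higher] > position[lower]:
            return False
    return True


# Hypothetical scores: a and b are within the separation, c is far below
example_scores = {"a": 0.30, "b": 0.29, "c": 0.10}
ok_1 = is_valid_ranking(["a", "b", "c"], example_scores, eps=0.1)
ok_2 = is_valid_ranking(["b", "a", "c"], example_scores, eps=0.1)
bad = is_valid_ranking(["c", "a", "b"], example_scores, eps=0.1)
```

Here both orders of a and b are accepted because their score ratio falls inside (0.9, 1.1), while placing c first is rejected.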

Page 30: Two Unrelated Talks

Is it possible to compute the rank locally?

Our contribution: NO!

NO in theory: lower bounds

1. Every deterministic local ranking algorithm has an adversarial graph forcing Ω(n) queries (and can be tightened)
2. Every randomized local ranking algorithm has an adversarial graph forcing Ω(n) queries

even to rank the top k nodes, even if their scores are highly separated!

⇒ a general low-cost local ranking algorithm does not exist

NO in practice: experimental results

1. real web/social graphs behave like worst-case input instances for local ranking
2. approximating is not trivial: state-of-the-art local score approximation algorithms do not turn into low-cost local rank approximation algorithms

Page 32: Two Unrelated Talks

Lower bounds (I): deterministic algorithms

Every deterministic algorithm has an adversarial graph forcing cost Ω(n), tightened to n(1 − O(εk)).

Theorem 1 (paper Thm. 4)

Choose integers $k > 1$ and $n_0 \ge k^2$, a damping factor $\alpha \in (0, 1)$, and $\varepsilon \le \frac{\alpha^2}{20k}$. For any deterministic local algorithm A there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \ldots, v_{k-1}$ are ε-separated and, to compute their relative ranking according to $P_\alpha(\cdot)$, algorithm A performs Ω(n) queries; more precisely, $n(1 - O(\varepsilon k))$ queries.

Page 38: Two Unrelated Talks

Lower bounds (II): randomized algorithms

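Every randomized (Las Vegas or Monte Carlo) algorithm has an adversarial graph forcing cost $\Omega\left(\alpha\sqrt{n/\varepsilon}\right)$ (slide annotation: Ω(n)).

[Figure: algorithm A, with a random source, queries a link server for a graph G of 10^9 nodes (v1, v2, . . . , v20) and outputs a ranking such as [v3 v10 ... v7]; annotated costs: ~10^4.5 queries and 10^8 queries.]

Theorem 2 (paper Thm. 3)

Choose $k > 1$, $n_0 \ge 6k^3$, a damping factor $\alpha \in (0, 1)$, and $\varepsilon \in \left[\frac{\alpha^2 k^2}{4 n_0}, \frac{\alpha^2}{24k}\right]$. Then

1. for any Las Vegas local algorithm A
2. for any Monte Carlo local algorithm A with constant confidence

there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \ldots, v_{k-1}$ are ε-separated and, to compute their relative ranking, A performs in expectation $\Omega\left(\alpha\sqrt{n/\varepsilon}\right)$ queries.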

Page 39: Two Unrelated Talks

What happens in practice?

Two experiments

1. Hardness of real-world graphs

Compute the minimal number of nodes that an algorithm must visit to always guarantee a correct ranking.

2. Performance of approximation algorithms

Evaluate cost and accuracy of local ranking algorithms derived from state-of-the-art local score approximation algorithms.

Datasets

graph        nodes  arcs   crawled
.it          40M    1150M  2004
LiveJournal  5M     79M    2008

publicly available from LAW - Univ. Milan, http://law.dsi.unimi.it

Page 40: Two Unrelated Talks

Exp. 1: hardness of real-world graphs (1/2)

Breakdown of a local ranking algorithm

1. Visit ancestors

Thm.: must visit at least |minset(G, u, v)| ancestors

2. Compute ranking

Thm.: must agree with natural PageRank score approximation

|minset(G, u, v)| ≤ cost of ranking u, v in graph G

Page 42: Two Unrelated Talks

Exp. 1: hardness of real-world graphs (2/2)

[Plot: average number of visited nodes (log scale, 10^3 to 10^7) versus ε (0.01 to 2.56), for the .it web graph and the LiveJournal graph.]

Page 45: Two Unrelated Talks

Exp. 2: performance of approximation algorithms

Improved variant of the pruned brute-force algorithm: limit PageRank computation to ancestors giving a high contribution.

[Figure: target node v with ancestors labeled by contribution (35%, 24%, 17%, 10%, and several <10%); pruning threshold = 10%.]
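The idea of restricting the computation to high-contribution ancestors can be sketched as a backward, layer-by-layer exploration from the target node. This is not the improved variant evaluated in the talk: the function name `pruned_score`, its parameters, and the pruning rule (an absolute per-layer contribution threshold, where the target itself has contribution 1) are assumptions made for illustration, and every node is assumed to have at least one out-link.

```python
def pruned_score(in_links, out_degree, target, n,
                 alpha=0.85, threshold=0.1, max_depth=50):
    """Estimate P(target) from its ancestors, pruning small contributions.

    in_links maps a node to the nodes linking to it; out_degree maps a
    node to its number of out-links. Returns (estimate, visited nodes).
    """
    layer = {target: 1.0}      # contribution of each node at the current depth
    total = 0.0
    visited = set()
    for _ in range(max_depth + 1):
        if not layer:
            break
        visited.update(layer)
        total += sum(layer.values())
        next_layer = {}
        for w, contribution in layer.items():
            for u in in_links.get(w, ()):
                # u passes a fraction alpha / o(u) of w's contribution backwards
                next_layer[u] = (next_layer.get(u, 0.0)
                                 + alpha * contribution / out_degree[u])
        # keep only ancestors whose contribution reaches the threshold
        layer = {u: c for u, c in next_layer.items() if c >= threshold}
    return (1.0 - alpha) / n * total, visited


# Hypothetical 3-node cycle a -> b -> c -> a, where every score is 1/3
in_links = {"a": ["c"], "b": ["a"], "c": ["b"]}
out_degree = {"a": 1, "b": 1, "c": 1}
exact, seen_all = pruned_score(in_links, out_degree, "a", n=3,
                               threshold=0.0, max_depth=200)
rough, seen_few = pruned_score(in_links, out_degree, "a", n=3,
                               threshold=0.8, max_depth=200)
```

With no pruning the estimate approaches the true score while visiting every ancestor; a higher threshold visits fewer nodes (lower cost in the query model) but underestimates the score, which is the cost/accuracy trade-off the experiment measures.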

Page 46: Two Unrelated Talks

Exp. 2: performance of approximation algorithms

[Plot, .it web graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-7 to 10^-1), one curve per interval: (0.01,0.02), (0.02,0.04), (0.04,0.08), (0.08,0.16), (0.16,0.32), (0.32,0.64), (0.64,1.28), (1.28,2.56), (2.56,5.12).]

Page 47: Two Unrelated Talks

Exp. 2: performance of approximation algorithms

[Plot, LiveJournal graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-7 to 10^-1), one curve per interval: (0.01,0.02), (0.02,0.04), (0.04,0.08), (0.08,0.16), (0.16,0.32), (0.32,0.64), (0.64,1.28), (1.28,2.56), (2.56,5.12).]

Page 48: Two Unrelated Talks

Exp. 2: performance of approximation algorithms

[Plot, .it web graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-7 to 10^-1), one curve per interval: (0.01,0.02), (0.02,0.04), (0.04,0.08), (0.08,0.16), (0.16,0.32), (0.32,0.64), (0.64,1.28), (1.28,2.56), (2.56,5.12).]

Page 49: Two Unrelated Talks

Exp. 2: performance of approximation algorithms

[Plot, LiveJournal graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-7 to 10^-1), one curve per interval: (0.01,0.02), (0.02,0.04), (0.04,0.08), (0.08,0.16), (0.16,0.32), (0.32,0.64), (0.64,1.28), (1.28,2.56), (2.56,5.12).]

Page 50: Two Unrelated Talks


Conclusions

1. Local computation of PageRank ranking is infeasible

2. Cost of exact local ranking algorithms bounded by minsets

3. Tested real web/social graphs are near worst-case

4. And approximation is not trivial

Marco Bressan, Luca Pretto. Local computation of PageRank: the ranking side. Proc. of CIKM 2011: 631-640

Page 51: Two Unrelated Talks


psort, yet another fast stable external sorting software

Page 52: Two Unrelated Talks


In a nutshell

the psort sorting library

• written in C++
• handles large datasets (> TB)
• stable sorting
• fast
• designed for PC-class machines

ideal applications of psort

• sorting large databases
• sorting large log files
• sorting on commodity machines
• . . .


Page 54: Two Unrelated Talks


psort and the Sort Benchmark (1/2)

The PennySort Benchmark

Sort as much as you can with $0.01 of computing time.

[Figure: amount sorted (0 GB to 400 GB) per year, for 1998, 1999, 2000, 2002, 2003, 2007, 2008, 2009 and 2011. Legend: yearly record (Sort Benchmark), psort.]

Source: http://sortbenchmark.org

Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16 (2011)

Page 55: Two Unrelated Talks


psort and the Sort Benchmark (2/2)

The Datamation Benchmark

Sort 100MB disk-to-disk as fast as you can.

440 ms: NOW-sort (2001)

980 s: thunder (1987)

psort (2011)

Paolo Bertasi, Michele Bonazza, Marco Bressan, Enoch Peserico. Datamation: A Quarter of a Century and Four Orders of Magnitude Later. CLUSTER 2011: 605-609

Page 56: Two Unrelated Talks


psort and the STXXL library

[Figure: sort speed (in MB/s, 0 to 200) vs. sort size (in MB, log scale, 10^1 to 10^4). Series: stxxl on disks (8,8), (8,32), (8,128); stxxl on RAID (8,8), (8,32), (8,128); psort on RAID (8,8), (8,32), (8,128).]

Page 57: Two Unrelated Talks


Machine budget for Sort Benchmark 2011

Power Supply Unit: 15 EUR

Case: 22 EUR

CPU: 38 EUR

RAM: 47 EUR

Motherboard: 60 EUR

Hard Disks: 215 EUR

Assembly fee: 35 EUR

Page 58: Two Unrelated Talks


The big picture

psort execution diagram

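[Diagram: memory levels (vertical axis) against time (horizontal axis).]

Memory levels, with size and bandwidth:

• CPU/cache: 1MB, 10GB/s
• main memory: 1GB, 3GB/s
• external memory: 1TB, 0.7GB/s

Phases over time: mergesort, heap merge, heap merge, spread over the 1st disk pass and the 2nd disk pass.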
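The "heap merge" phase in the diagram refers to merging several sorted runs with a heap. The following is a minimal in-memory sketch of that general technique, not psort's implementation: the names `heap_merge`, `Entry` and `Later` are hypothetical, keys are plain `int`, and the I/O buffering a real external merge needs is omitted.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// One heap entry: the current head of a run.
struct Entry {
    int key;
    std::size_t run;  // which run the key came from
    std::size_t pos;  // position of the key inside that run
};

// Orders the heap so that the smallest key is on top; equal keys are taken
// from the lower-numbered run first, which keeps the merge stable.
struct Later {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.key != b.key ? a.key > b.key : a.run > b.run;
    }
};

// Merges k sorted runs into one sorted sequence.
std::vector<int> heap_merge(const std::vector<std::vector<int>>& runs) {
    std::priority_queue<Entry, std::vector<Entry>, Later> heap;
    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        total += runs[r].size();
        if (!runs[r].empty()) heap.push(Entry{runs[r][0], r, 0});
    }
    std::vector<int> out;
    out.reserve(total);
    while (!heap.empty()) {
        const Entry e = heap.top();
        heap.pop();
        out.push_back(e.key);
        if (e.pos + 1 < runs[e.run].size()) {
            heap.push(Entry{runs[e.run][e.pos + 1], e.run, e.pos + 1});
        }
    }
    return out;
}
```

Each output element costs one pop and at most one push, so merging n elements from k runs takes O(n log k) comparisons.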

Page 59: Two Unrelated Talks


The big picture - now complicated

Hardware/software details you must deal with:

I/O

• hdd quality
• file system
• scheduling
• buffer size
• direct transfer
• data placement

memory

• size
• bandwidth
• latency
• page size
• access pattern
• conflicts

cache

• size
• speed
• line size
• associativity

Page 60: Two Unrelated Talks


Hard disks

The speed curve of 13 “identical” WD1600JS disks

[Figure: bandwidth (MB/s, 0 to 150) vs. distance from the outer rim (in GB, 0 to 150), one curve per disk.]

Page 61: Two Unrelated Talks


Memory

Why main memory is not really a RAM

[Figure: bandwidth (GB/s, 0.5 to 4.5) vs. struct size (bytes, 2^0 to 2^18). Series: sequential read, random read, sequential write, random write. The L2 cache line size is marked on the plot.]
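The gap between sequential and random access can be probed with a small microbenchmark. This is a reduced sketch under stated assumptions, not the benchmark behind the plot: it only reads, uses 4-byte elements instead of varying the struct size, and all names (`timed_sum`, `sequential_order`, `random_order`) are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Timed {
    std::uint64_t sum;  // checksum, so the loop cannot be optimized away
    double seconds;     // wall-clock time of the traversal
};

// Sums `data` in the order given by `order` and reports the elapsed time.
Timed timed_sum(const std::vector<std::uint32_t>& data,
                const std::vector<std::size_t>& order) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::size_t idx : order) sum += data[idx];
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return Timed{sum, elapsed.count()};
}

// Index order 0, 1, 2, ..., n - 1.
std::vector<std::size_t> sequential_order(std::size_t n) {
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    return order;
}

// A pseudo-random permutation of 0 .. n - 1.
std::vector<std::size_t> random_order(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order = sequential_order(n);
    std::mt19937 gen(seed);
    std::shuffle(order.begin(), order.end(), gen);
    return order;
}
```

Both traversals touch the same elements and return the same checksum; only the visiting order differs. For arrays much larger than the cache, the random order is expected to be noticeably slower, though the exact ratio depends on the machine.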

Page 62: Two Unrelated Talks


CPU

Is a dual-core always worth its price?

[Figure: bandwidth (MB/s, axis from 0 to 3e+10) vs. log2( bytes visited ) (16 to 30). Series: Intel dual core read, Intel dual core write, AMD single core read, AMD single core write.]

Page 63: Two Unrelated Talks


A list of psort’s tricks

general

• fast polling
• payload detachment
• key pre/postprocessing
• . . .

disk access

• O_DIRECT
• independent disks
• uniform fetching
• . . .

mergesort

• smart merging
• quasi-in-place
• special base case
• . . .

heapsort

• key caching
• key offsetting
• payload interleaving
• . . .
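The slides do not spell out what "payload detachment" means inside psort. One common reading is to sort small (key, index) handles instead of whole records, and to move each payload only once at the end. The sketch below illustrates that reading; the `Record` layout, the `Handle` type and the function name are assumptions for illustration, not psort's data structures.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Record {
    std::uint64_t key;
    std::string payload;  // stands in for a large record body
};

// Small handle that travels through the sort in place of the full record.
struct Handle {
    std::uint64_t key;
    std::size_t index;  // position of the record in the input
};

// Stable sort by key that never moves payloads during the sort itself:
// only handles are sorted, then the records are gathered once.
std::vector<Record> sort_detached(const std::vector<Record>& records) {
    std::vector<Handle> handles;
    handles.reserve(records.size());
    for (std::size_t i = 0; i < records.size(); ++i) {
        handles.push_back(Handle{records[i].key, i});
    }
    std::stable_sort(handles.begin(), handles.end(),
                     [](const Handle& a, const Handle& b) { return a.key < b.key; });

    std::vector<Record> sorted;
    sorted.reserve(records.size());
    for (const Handle& h : handles) sorted.push_back(records[h.index]);
    return sorted;
}
```

The sort then works on 16-byte handles whatever the payload size, so more of the working set fits in cache; the price is one scattered pass over the payloads at the end.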


Page 69: Two Unrelated Talks


Smart merging (1/3)

Naive merging

void merge(T *s1, T *s2, T *out, int size) {
    int i = 0, j = 0, k = 0;
    bool bit;
    while ((i < size) & (j < size)) {
        if (s1[i] > s2[j]) {    // READ + READ
            out[k] = s2[j];     // READ
            j++;
        } else {
            out[k] = s1[i];     // (READ)
            i++;
        }
        k++;
    }
    ...
}

total mem READs per iteration: 3


Page 71: Two Unrelated Talks


Smart merging (2/3)

Smart merging

void merge(T* s1, T* s2, T* out, int size) {
    int i = 0, j = 0, k = 0;
    bool bit;
    T cache[2];
    cache[0] = s1[0];
    cache[1] = s2[0];
    while ((i < size) & (j < size)) {
        if (cache[0] > cache[1]) {
            out[k] = cache[1];
            j++;                // advance first, so the next unread element is loaded
            cache[1] = s2[j];   // READ
        } else {
            out[k] = cache[0];
            i++;
            cache[0] = s1[i];   // (READ)
        }
        k++;
    }
    ...
}

total mem READs per iteration: 1


Page 73: Two Unrelated Talks


Smart merging (3/3)

Time required to merge two sequences

[Figure: time in microseconds (0 to 800000) vs. log2( merge size ) (10 to 24). Series: smart merge, naive merge.]

Page 74: Two Unrelated Talks


Quasi-in-place mergesort (1/3)

traditional mergesort

void mergesort(T* input, T* output, int size) {    // size is a power of two
    for (int subsize = 1; subsize < size; subsize *= 2) {
        for (int j = 0; j < size / (2 * subsize); j++) {
            merge(&input[2 * j * subsize],
                  &input[(2 * j + 1) * subsize],
                  &output[2 * j * subsize],
                  subsize);
        }
        T* tmp = input;  // swap input and output
        input = output;
        output = tmp;
    }
}

extra space = N


Page 76: Two Unrelated Talks


Quasi-in-place mergesort (2/3)

“quasi-in-place” mergesort

void mergesort(T* input, T* output, int size) {    // size is a power of two
    for (int subsize = 1; subsize < size / 2; subsize *= 2) {
        for (int j = 0; j < size / (2 * subsize); j++) {
            /* merge, overwriting the input vector */
            merge(&input[2 * j * subsize],
                  &input[(2 * j + 1) * subsize],
                  &input[(2 * j - 1) * subsize],
                  subsize);
        }
        input = &input[-subsize];  // shift input left
    }
    // finally merge into the output vector
    merge(input, &input[size / 2], output, size / 2);
}

extra space = N/2
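The shifting scheme can be checked with a self-contained sketch. It assumes a power-of-two `size` and a working buffer whose first `size / 2` slots are free, with the data stored right after them; the names `merge_runs` and `quasi_in_place_mergesort` and this buffer layout are assumptions of the sketch, not the psort API. Each pass writes its merged runs `subsize` slots to the left of where it reads, so a write never lands on an element that has not been read yet.

```cpp
#include <cstddef>
#include <vector>

// Merges two sorted runs of `size` elements each. It is safe to call with
// s2 == s1 + size and out == s1 - size: the write position always stays
// behind both read positions.
template <typename T>
void merge_runs(const T* s1, const T* s2, T* out, std::size_t size) {
    std::size_t i = 0, j = 0, k = 0;
    while (i < size && j < size) {
        if (s2[j] < s1[i]) out[k++] = s2[j++];
        else out[k++] = s1[i++];
    }
    while (i < size) out[k++] = s1[i++];
    while (j < size) out[k++] = s2[j++];
}

// Sorts `size` elements (a power of two, at least 2). Expects the data at
// work[size/2 .. size/2 + size) with work[0 .. size/2) free, and returns the
// sorted sequence in a separate vector.
template <typename T>
std::vector<T> quasi_in_place_mergesort(std::vector<T>& work, std::size_t size) {
    T* input = work.data() + size / 2;
    for (std::size_t subsize = 1; subsize < size / 2; subsize *= 2) {
        for (std::size_t j = 0; j < size / (2 * subsize); ++j) {
            T* left = input + 2 * j * subsize;
            merge_runs(left, left + subsize, left - subsize, subsize);
        }
        input -= subsize;  // the whole array moved `subsize` slots to the left
    }
    std::vector<T> output(size);
    merge_runs(input, input + size / 2, output.data(), size / 2);
    return output;
}
```

The passes shift the data by 1 + 2 + ... + size/4 = size/2 - 1 slots in total, which is why a free region of about N/2 elements in front of the input is enough.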


Page 78: Two Unrelated Talks


Quasi-in-place mergesort (3/3)

Average time required to compare two keys

[Figure: average key comparison time (relative units, 0 to 4) vs. log2( input size in bytes ) (10 to 24). Series: quasi-in-place.]

Page 79: Two Unrelated Talks


Conclusions

1. Solving old problems really fast is still tricky

2. To do it, you must match today’s hardware

3. Solution: software engineering and tuning

Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16 (2011)

Page 80: Two Unrelated Talks


Conclusions

Ranking

1. Local computation of PageRank ranking infeasible in theory

2. On tested web/social graphs, infeasible also in practice

3. Rank analysis requires novel tools!

Sorting

1. Solving old problems really fast is still tricky

2. To do it, you must match today’s hardware

3. Software engineering and tuning are the way to get there

And of course now you should pay me twice! :-)
