Placing Skips Optimally in Expectation

Flavio Chierichetti, Silvio Lattanzi, Federico Mari, Alessandro Panconesi

Supported by

Problem Statement

Answering conjunctive queries

query: latte macchiato

Latte: 3 7 9 14 19 23 41 47
Macchiato: 2 4 7 11 19 41 57 62

Compute the intersection of 2 sorted lists
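A minimal sketch of the intersection step, assuming posting lists are plain sorted Python lists of document ids. The function name `intersect_sorted` is illustrative and does not come from the slides.

```python
def intersect_sorted(a, b):
    """Return the doc ids present in both sorted posting lists."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same doc id in both lists: it answers the conjunctive query.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too small to match b[j] or anything after it
        else:
            j += 1  # b[j] is too small to match a[i] or anything after it
    return result


latte = [3, 7, 9, 14, 19, 23, 41, 47]
macchiato = [2, 4, 7, 11, 19, 41, 57, 62]
print(intersect_sorted(latte, macchiato))  # [7, 19, 41]
```

Each step advances at least one pointer, so the plain merge reads at most len(a) + len(b) entries.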

Merging

Latte: 3 7 9 14 19 23 41 47
Macchiato: 2 4 7 9 19 41 57 62

Skips

Latte: 1 2 3 4 5 6 7 8
Macchiato: 7 8 17 18 19 41 57 62

Conventional WSDM

Skips are placed every N many positions
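A rough sketch of fixed-interval skips and of a merge that follows them, under stated assumptions: skips live on one list only, they are stored as a dict from a start index to a landing index, and the spacing is a free parameter called `step` here. None of these names or representation choices come from the slides.

```python
def place_fixed_skips(n, step):
    """Put a skip at every `step`-th index, landing `step` entries ahead."""
    return {i: i + step for i in range(0, n - step, step)}


def intersect_with_skips(a, b, skips):
    """Intersect sorted lists a and b, following skips on a when it is safe.

    Returns (matches, indices of a that were visited).
    """
    i = j = 0
    visited = [0] if a else []
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            nxt = i + 1
            j += 1
        elif a[i] < b[j]:
            target = skips.get(i)
            # The skip is safe only if it does not jump past b[j].
            if target is not None and a[target] <= b[j]:
                nxt = target
            else:
                nxt = i + 1
        else:
            j += 1
            continue
        i = nxt
        if i < len(a):
            visited.append(i)
    return result, visited


latte = [1, 2, 3, 4, 5, 6, 7, 8]
macchiato = [7, 8, 17, 18, 19, 41, 57, 62]
skips = place_fixed_skips(len(latte), 3)  # {0: 3, 3: 6}
print(intersect_with_skips(latte, macchiato, skips))  # ([7, 8], [0, 3, 6, 7])
```

On these two lists the merge visits 4 entries of the first list instead of all 8.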

Question

If we know the query distribution, can we place skips better?

Problem statement

If we know the query distribution, can we place skips in order to minimize the expected time of a merge?

Is the assumption realistic?

The Power of the Law

The query distribution contains a lot of information. Can we provably take advantage of it?

The algorithms that follow work with any query distribution whatsoever

...and can be extended to deal with soft conjunctions

Outline

• Skip placement policies
• A matter of definitions
• Algorithms
• Experiments

Skip Placement Policies

Spaghetti Skips

Simple Skips

This is the most interesting case

A Matter of Definitions

Useful Documents: 1

q: world cup

world: 3 7 9 14 19 23 41 47
cup: 2 4 7 9 19 41 57 62

Relevant docs are “useful”

But usefulness does not coincide with relevance

Useful Documents: 2

q: world cup

world: 14 15 ? 18 47 ? ? ?
cup: 13 19 41 43 62 ? ? ?

Is the skip useful for q?

The skip is useful

Useful Documents: 2

q: world cup

world: 14 15 ? ? 47 ? ? ?
cup: 13 19 41 43 62 ? ? ?

Useful Documents: 2

q: world cup

world: 14 15 16 18 47 ? ? ?
cup: 13 19 41 43 62 ? ? ?

The skip is useless

18 cannot be skipped

Useful Documents: 2

Useful documents are those that cannot be avoided during a merge

Induced Distributions

The query distribution induces another distribution on the postings

platypus: 1 2 3 ... i ... j ... k ... n

p1 p2 ... pi ... pn

pi = Pr(i is useful for q | platypus ∈ q)

We will assume this distribution to be known

In practice these probabilities can be approximated using a small sample of the query universe
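A hedged sketch of the sampling idea: count how often each posting turns out to be useful over a sample of queries. The callback `useful_docs` is a hypothetical stand-in for whatever procedure decides which postings cannot be avoided during the merge for a given query; the slides do not spell that procedure out, and the names below are illustrative.

```python
from collections import Counter


def estimate_usefulness(postings, sampled_queries, useful_docs):
    """Estimate, for each posting, the fraction of sampled queries it is useful for.

    useful_docs(query) is assumed to return the doc ids that are useful for
    that query (hypothetical callback, supplied by the caller).
    """
    in_list = set(postings)
    counts = Counter()
    for q in sampled_queries:
        # Count each doc at most once per query, and only docs in this list.
        counts.update(set(useful_docs(q)) & in_list)
    total = len(sampled_queries)
    return [counts[d] / total if total else 0.0 for d in postings]


# Toy usage with a hand-written oracle.
oracle = {"platypus egg": [1, 2], "platypus bill": [2]}
print(estimate_usefulness([1, 2, 3], list(oracle), oracle.get))  # [0.5, 1.0, 0.0]
```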

Making Life Simple

Events like “a is useful” and “b is useful” are not independent

...but from now on we will assume that they are

This simplifying assumption will be vindicated by our experiments

Algorithms

Input: a list with, for each doc, the probability that it is useful

Output: skip placement that minimizes the expected time to merge

cost of a merge = #elements read in posting list

Algorithms

• O(nt) algorithm for spaghetti skips, where t is the average length of a skip

• O(n log n) for simple skips

The O(n log n) algorithm for simple skips is by far the most interesting

Simple Skips

t: d1 d2 .. di .. dn & p1 p2 .. pn

• Build the solution from left to right
• M(i) is the best placement for the prefix d1..di

Computing M(i)

In computing M(i) we have two choices. We either place a skip landing at position i or we do not

If we place no skip to i then M(i) = M(i-1)

[Figure: the prefix up to j, labeled M(j), followed by a skip from j to i, labeled G(j,i)]

M(i) = max { M(i-1), max_j [ M(j) + G(j,i) ] }

Want max_j [ M(j) + G(j,i) ] in O(log n)
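The recurrence can be sketched directly as a quadratic-time table fill. This is the naive evaluation, not the O(n log n) method of the talk. Assumptions made here: M(0) = 0, j ranges over 0..i-1, and G is passed in as a caller-supplied function `gain(j, i)`, since its exact form is not given in the surviving slide text. The toy gain table in the usage lines is invented purely for illustration.

```python
def best_placement(n, gain):
    """Fill M(i) = max{ M(i-1), max_j M(j) + gain(j, i) } and recover the skips.

    gain(j, i) is a hypothetical stand-in for G(j, i).
    Returns (M(n), list of (start, landing) skips).
    """
    M = [0.0] * (n + 1)
    choice = [None] * (n + 1)  # start of the skip landing at i, if one is used
    for i in range(1, n + 1):
        M[i] = M[i - 1]  # option 1: no skip lands at i
        for j in range(i):  # option 2: a skip from j lands at i
            cand = M[j] + gain(j, i)
            if cand > M[i]:
                M[i] = cand
                choice[i] = j
    # Walk back through the recorded choices to list the chosen skips.
    skips = []
    i = n
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            skips.append((choice[i], i))
            i = choice[i]
    skips.reverse()
    return M[n], skips


# Toy gain table (made up): every other skip has a negative gain.
table = {(0, 2): 3.0, (2, 5): 4.0, (0, 5): 6.0}
print(best_placement(5, lambda j, i: table.get((j, i), -1.0)))
# (7.0, [(0, 2), (2, 5)])
```

The inner loop over j is the part that the slide wants to replace with an O(log n) step.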

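[Figure: the prefix up to T(i), labeled M(T(i)), followed by a skip from T(i) to i, labeled G(T(i),i)]

T(i) denotes the start j that attains the maximum:

max_j [ M(j) + G(j,i) ] = M(T(i)) + G(T(i),i)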

Monotonicity of T(i)

[Figure: the skip landing at i starts at T(i); the skip landing at i+1 starts at T(i+1)]

T(i) ≤ T(i+1)

Monotonicity of T(i,k)

T(i,k) is the best jump to i under the additional constraint that it must start no later than k

Key lemma: T(i,k) ≤ T(i+1,k)

Updating T(i,k)

T(i,k-1): 1 1 1 1 1 i1 i1 i1 i1 i2 i2 i2 i2 i3 i3 i3 i4 i4 i4 i4

1 < i1 < i2 < i3 < i4 < k-1

The best skip to reach j starts at position i1

T(i,n) gives the optimal placement

T(•,1) T(•,2) … T(•,k) … T(•,n)

T(i,k): 1 1 1 1 1 i1 i1 i1 i1 i2 i2 k k k k k k k k k

The run of k entries begins at min {i: T(i,k)=k}
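A sketch of one update step, from T(·,k-1) to T(·,k), under assumptions read off the picture rather than stated in the slides: the array is stored as a list of constant runs `(first_index, start)`, and the positions where k becomes the best start form a suffix, so the first such position can be found by binary search. The predicate `k_is_better(i)` is hypothetical; in a full implementation it would compare M(k) + G(k,i) against the current best for i.

```python
def update_runs(runs, n, k, k_is_better):
    """Turn the runs of T(., k-1) into the runs of T(., k), in place.

    runs: list of (first_index, start) pairs over positions 1..n.
    k_is_better(i): assumed False on a prefix of positions, True on the rest.
    """
    lo, hi = 1, n + 1  # search for the smallest i with k_is_better(i)
    while lo < hi:
        mid = (lo + hi) // 2
        if k_is_better(mid):
            hi = mid
        else:
            lo = mid + 1
    if lo > n:
        return runs  # k is never the best start; nothing changes
    # Drop the runs that are completely overwritten, then add the run of k.
    while runs and runs[-1][0] >= lo:
        runs.pop()
    runs.append((lo, k))
    return runs


def expand(runs, n):
    """Write the runs out as a plain array over positions 1..n."""
    out = []
    for idx, (first, start) in enumerate(runs):
        last = runs[idx + 1][0] - 1 if idx + 1 < len(runs) else n
        out.extend([start] * (last - first + 1))
    return out


# Numeric stand-ins for the slide: i1=2, i2=4, i3=6, i4=8, k=10.
runs = [(1, 1), (6, 2), (10, 4), (14, 6), (17, 8)]
update_runs(runs, 20, 10, lambda i: i >= 12)
print(expand(runs, 20))
```

Each step costs O(log n) predicate calls for the search, and every run is appended once and popped at most once, which is in line with the O(N log N) total quoted in the talk.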

The resulting algorithm takes O(N log N) time, where N is the length of the list

Experiments

Space

Time to merge

Build up time

Size of query sample for simple skips

1/256

The Bottomline

Simple skips are the solution of choice (for power law distributions):

• They merge as fast as spaghetti skips (the general case)
• They occupy less space
• Build time is much shorter
• They need a smaller sample to collect statistics on document usefulness

Summing up

• First attempt to exploit in a rigorous way knowledge of the distribution

• Much work remains to be done but results are encouraging

Extensions

• Taking the cache into account
• Taking dependencies into account
• Compare against skip lists and other data structures

Thanks for your attention