
Element Distinctness, Frequency Moments, and Sliding Windows

Raphael Clifford

University of Bristol, UK

arXiv:1309.3690

FOCS 2013

Joint work with Paul Beame and Widad Machmouchi


Time-space tradeoffs

[Image credit: Wikipedia, "Beer"]

1. Frequency moments. E.g. how many different beer cans?

2. Element distinctness (ED). Have I had the same can twice?

Particularly simple to solve if presorted.
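To make the "presorted" remark concrete, here is a minimal sketch in Python. The function names and the sample list are invented for illustration. It only shows that one pass over neighbouring elements suffices once the input is sorted; it says nothing about the cost of sorting in small space.

```python
def is_distinct_sorted(xs):
    """ED on a presorted list: a duplicate must sit next to its twin."""
    return all(xs[i] != xs[i + 1] for i in range(len(xs) - 1))


def f0_sorted(xs):
    """F0 (number of distinct values) on a presorted list."""
    if not xs:
        return 0
    # Count the positions where the value changes, plus one for the first run.
    return 1 + sum(1 for i in range(len(xs) - 1) if xs[i] != xs[i + 1])


# Hypothetical example data.
cans = sorted(["lager", "stout", "ale", "lager", "porter"])
print(f0_sorted(cans))           # 4
print(is_distinct_sorted(cans))  # False
```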


Time-space tradeoffs

What is the complexity of these problems using small space?

- Any solution using sorting requires $T \in \Omega(n^2/S)$ [Borodin-Cook 82, Beame 91].
- What is the true complexity using small space?
- Are both problems really as hard as sorting? (No.)
- How about multi-output or sliding window versions?



Sliding window ED and frequency moments

Input: A B C B A B C (window length 3)

- Number of distinct elements ($F_0$) in each window = 3, 2, 3, 2, 3.
- Sliding window ED gives 1, 0, 1, 0, 1.
- We show sliding window ED is easier than sorting, but sliding window $F_0 \bmod 2$ is as hard as sorting.
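The two output sequences on this slide can be reproduced with a short sketch. The function names are invented here, and the code keeps a full table of counts per window, so it illustrates the definitions only and is not a small-space algorithm.

```python
from collections import Counter


def sliding_f0(xs, w):
    """Number of distinct elements in each length-w window of xs (assumes w <= len(xs))."""
    counts = Counter(xs[:w])
    out = [len(counts)]
    for i in range(w, len(xs)):
        old = xs[i - w]            # element leaving the window
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        counts[xs[i]] += 1         # element entering the window
        out.append(len(counts))
    return out


def sliding_ed(xs, w):
    """1 if the window has no repeated element, else 0."""
    return [1 if f0 == w else 0 for f0 in sliding_f0(xs, w)]


xs = list("ABCBABC")
print(sliding_f0(xs, 3))  # [3, 2, 3, 2, 3]
print(sliding_ed(xs, 3))  # [1, 0, 1, 0, 1]
```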


Our new results

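Our new upper and lower bounds:

| Problem | Single window | Sliding window |
|---|---|---|
| Frequency moments | $T \in \Omega\big(n\sqrt{\log(n/S)/\log\log(n/S)}\big)$ [BSSV 03] | $T \in \Omega(n^2/S)$ (New); $T \in O(n^2/S)$ (New) |
| Element distinctness | $T \in \Omega\big(n\sqrt{\log(n/S)/\log\log(n/S)}\big)$ [BSSV 03]; $T \in O(n\sqrt{n/S})$ (New) | $T \in O(n\sqrt{n/S})$ (New) |
| $F_0 \bmod 2$ | $T \in O(n^2/S)$ [PR 98] | $T \in O(n^2/S)$ (New); $T \in \Omega(n^2/S)$ (New) |

Previous element distinctness lower bounds:

| Reference | Comparison model | Multi-way branching |
|---|---|---|
| Borodin et al. 1987 | $T \in \Omega(n^{3/2}\sqrt{\log n}/S)$ | - |
| Yao 1988 | $T \in \Omega(n^{2-\varepsilon(n)}/S)$ | - |
| Ajtai 1999 | - | $S \in o(n) \Rightarrow T \in \omega(n)$ |
| Beame et al. 2003 | - | $T \in \Omega\big(n\sqrt{\log(n/S)/\log\log(n/S)}\big)$ |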



The overall method for our upper bounds

- Using Floyd’s (Pollard’s rho) cycle finding algorithm, we construct a $T \in O(n\sqrt{n/S})$ randomised branching program algorithm for single window ED.
  - 1-sided error, inversely polynomial in $n$.
- Reduction from sliding-window ED to single window ED.
  - $T \in O(n\sqrt{n/S})$ for a single window gives $T \in O(n\sqrt{n/S})$ for sliding windows.
- Sliding window frequency moments: $T \in O(n^2/S)$ in the comparison model.


Cycle finding in graphs

Floyd’s “tortoise and hare” algorithm.

[Figure: a graph traversed by two pointers labelled Tortoise and Hare.]
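For readers who have not seen it, the textbook form of Floyd's algorithm can be sketched as follows. This is the standard constant-space version for a sequence x0, f(x0), f(f(x0)), ... and not the branching-program variant used in the talk; the successor table in the example is made up.

```python
def floyd(f, x0):
    """Return (mu, lam): tail length and cycle length of x0, f(x0), f(f(x0)), ..."""
    # Phase 1: the hare moves two steps per tortoise step until they meet on the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # Phase 2: restart the tortoise; moving both one step at a time,
    # they meet at the first vertex of the cycle after mu steps.
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1

    # Phase 3: walk once around the cycle to measure its length.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam


# Hypothetical graph: 0 -> 1 -> 2 -> 3 -> 4 -> 2 (tail of length 2, cycle of length 3).
succ = {0: 1, 1: 2, 2: 3, 3: 4, 4: 2}
print(floyd(succ.__getitem__, 0))  # (2, 3)
```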



Randomised cycle finding

[Figure, built up over four slides: the ED input is a row of nine values at indices 1 to 9, drawn from the distinct values 37, 98, 134, 157, 396, 452, 812 and 921, with 921 appearing twice. A random hash function maps each distinct value to an index. Following index → value → hashed index gives the induced graph on the indices 1 to 9.]
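The construction in the figure can be sketched in a few lines. The values are those visible on the slide, but their order and the hash table below are assumptions made up for this example, since the transcript does not preserve them. The point is that a collision in the induced graph is either a real duplicate in the input or merely a collision of the hash function.

```python
# Assumed order of the nine input values (indices 1..9); 921 appears twice.
x = {1: 157, 2: 134, 3: 921, 4: 37, 5: 812, 6: 396, 7: 452, 8: 921, 9: 98}

# Hypothetical hash values standing in for a random hash function into the indices.
h = {37: 2, 98: 6, 134: 3, 157: 5, 396: 7, 452: 3, 812: 1, 921: 4}


def f(i):
    """Edge of the induced graph: index -> value -> hashed index."""
    return h[x[i]]


def collisions():
    """All pairs i < j with f(i) == f(j), labelled as real duplicate or hash collision."""
    pairs = []
    for i in x:
        for j in x:
            if i < j and f(i) == f(j):
                kind = "duplicate" if x[i] == x[j] else "hash collision"
                pairs.append((i, j, kind))
    return pairs


print({i: f(i) for i in x})
print(collisions())  # [(2, 7, 'hash collision'), (3, 8, 'duplicate')]
```

Only the pair (3, 8) certifies that the input is not distinct, which is why each collision found has to be checked against the input.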


A new small-space upper bound for ED

Sampling uniformly from $[n]$:

- Expect to find a repeated value after $\Theta(\sqrt{n})$ samples.
- Prob. of any fixed pair of numbers appearing before a repeated value approaches $2/n$.

There is a reasonable chance of finding a real input duplicate in one cycle. Using constant space we can repeat $\Theta(n)$ times.

But to run faster using more space, we can’t just run $S$ instances in parallel.
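The constant-space idea above can be sketched as follows. This is an illustration under stated assumptions, not the branching program from the talk: the salted `hashlib.blake2b` call stands in for a random hash function, the run count `20 * n` is an arbitrary choice, and the function names are invented. The error is 1-sided: a distinct input is never reported as containing a duplicate.

```python
import hashlib
import random


def find_collision(f, start):
    """Floyd on i -> f(i): return (u, v) with u != v and f(u) == f(v), or None."""
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    tortoise = start
    if tortoise == hare:
        return None  # start lies on the cycle, so this walk has no tail
    # Advance both until their successors agree: that is the tail/cycle junction.
    while f(tortoise) != f(hare):
        tortoise, hare = f(tortoise), f(hare)
    return tortoise, hare


def element_distinct(xs, seed=0):
    """True if no duplicate was found (may err); False only on a verified duplicate."""
    rng = random.Random(seed)
    n = len(xs)
    for _ in range(20 * n):  # assumed number of repetitions
        salt = rng.getrandbits(64)

        def f(i):
            data = f"{salt}:{xs[i]}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            return int.from_bytes(digest, "big") % n

        found = find_collision(f, rng.randrange(n))
        if found and xs[found[0]] == xs[found[1]]:
            return False
    return True


print(element_distinct(list(range(50))))  # True
ys = list(range(50))
ys[37] = ys[5]
print(element_distinct(ys))  # expected False; a miss is possible but very unlikely
```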



A new small-space upper bound for ED

- Maintain a redirection list to split cycles. Update the list whenever a new collision is found.
- Cycles roughly halve in length each time they are visited.

We find all collisions reachable from any $S$ distinct starting points using $O(S)$ items of space and time roughly proportional to the size of the subgraph explored.


A new small-space upper bound for ED

Theorem

There is a randomised branching program algorithm computing ED with 1-sided error that uses space $S$ and time $T \in O(n\sqrt{n/S})$.

1. Run roughly $n/S$ independent runs of collision-finding with independent random choices of hash functions and independent choices of roughly $S$ starting indices.
2. Use a run-time cut-off bounding the number of explored vertices at $2\sqrt{Sn}$.
3. On each run, check if any of the collisions found is a duplicate in $x$, in which case output $ED(x) = 0$ and halt.
4. If none is found in any round then output $ED(x) = 1$.
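As a quick sanity check on the stated bound, assume each run costs time roughly proportional to its cut-off on explored vertices (log factors and the cost of checking collisions are ignored here). Multiplying the number of runs by the per-run cut-off gives:

```latex
\[
T \;\approx\; \underbrace{\frac{n}{S}}_{\text{runs}} \cdot
              \underbrace{2\sqrt{Sn}}_{\text{vertices per run}}
  \;=\; \frac{2\,n^{3/2}}{\sqrt{S}}
  \;=\; 2n\sqrt{\frac{n}{S}}
  \;\in\; O\!\left(n\sqrt{n/S}\right).
\]
```

For constant $S$ this is about $n\sqrt{n}$, matching $\Theta(n)$ repetitions of a walk of length $\Theta(\sqrt{n})$.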



Sliding window ED

Theorem

Sliding window ED can be solved in time $T \in O(n\sqrt{n/S})$ with 1-sided error probability $o(1/n)$.

Idea:

- Reduce to single window ED.
- A duplicate in one window determines a large number of outputs.
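The second point can be made concrete with a small sketch. If positions i < j hold the same value, every window containing both must output 0. The helper below, whose name and 0-indexed convention are choices made for this example, lists the start positions of those windows. It illustrates the remark only and does not reproduce the reduction from the paper.

```python
def windows_settled_by(i, j, w, n):
    """Start positions s of length-w windows (in an input of length n) containing both i and j.

    Assumes 0-indexed positions with i < j; the range is empty if no window fits both.
    """
    lo = max(0, j - w + 1)   # the window must reach as far right as j
    hi = min(i, n - w)       # the window must start at or before i and fit in the input
    return range(lo, hi + 1)


# In A B C B A B C with w = 3, the two B's at positions 1 and 3 settle one window.
print(list(windows_settled_by(1, 3, 3, 7)))    # [1]
# Adjacent duplicates in a long window settle many outputs at once.
print(len(windows_settled_by(8, 9, 10, 20)))   # 9
```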


A general sequential lower bound for sliding window F0

Framework of [Borodin-Cook 82, Abrahamson 91]

$T \in \Omega(n^2/S)$ follows if, for some random input distribution, no matter how $cn$ input values are fixed, any fixed set of $\Theta(S)$ output values occurs with prob. at most $2^{-S}$.

- Informally, for the first $S$ outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.
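A rough sketch of how a condition of this kind is usually turned into a tradeoff, in the style of Borodin-Cook, is given below. Constants and error probabilities are suppressed, and the symbols $m \in \Theta(n)$ (number of outputs considered) and $k \in \Theta(S)$ are notation introduced here rather than taken from the slides.

```latex
\begin{align*}
&\text{Cut the length-}T\text{ program into } T/(cn) \text{ stages of } cn \text{ queries each.}\\
&\text{A program using space } S \text{ has at most } 2^{S} \text{ nodes, hence at most } 2^{S}
  \text{ distinct stage sub-programs, so}\\
&\qquad \Pr[\text{some stage correctly produces } k \text{ outputs}]
   \;\le\; 2^{S}\cdot 2^{-\Theta(k)} \;<\; 1
   \quad\text{for a suitable } k \in \Theta(S).\\
&\text{So on some input every stage produces fewer than } k \text{ outputs, and}\\
&\qquad m \;\le\; \frac{T}{cn}\cdot k
   \quad\Longrightarrow\quad
   T \;\ge\; \frac{c\,n\,m}{k} \;\in\; \Omega\!\left(\frac{n^{2}}{S}\right).
\end{align*}
```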



The sliding window F0 lower bound

Need to show: for the first $S$ outputs, it is hard to predict the next output value even when a constant fraction of the input vector is known.

- Input uniform over $[n]^{2n-1}$. But some outputs are easy to predict.

Example windows shown in successive builds of this slide: a ? ? ? ? a, then a a a a a ?, then a b c d e ?

- Uniform distribution gives whp $\Omega(n)$ positions where:
  - $x_i$ is unique in the input.
  - $F_0(i, i+n-1)$ of the window is in the range $[0.5n, 0.85n]$.
- Only consider outputs for these positions and show they are hard to predict.
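The claim about the uniform distribution can be checked empirically with a small simulation. The function name, the seed and the choice $n = 2000$ are arbitrary. As a back-of-the-envelope expectation, a position is unique in the input with probability about $e^{-2} \approx 0.135$, and a window of $n$ uniform values has about $(1 - 1/e)\,n \approx 0.63n$ distinct values, which sits inside the range on the slide.

```python
import random
from collections import Counter


def good_positions(n, seed=0):
    """Window starts i (0-indexed) where x_i is unique in the whole input and
    F0 of the window x_i .. x_{i+n-1} lies in [0.5n, 0.85n]."""
    rng = random.Random(seed)
    xs = [rng.randrange(n) for _ in range(2 * n - 1)]  # uniform over [n]^(2n-1)
    total = Counter(xs)          # multiplicity of each value in the whole input
    window = Counter(xs[:n])     # counts inside the current window
    good = []
    for i in range(n):
        f0 = len(window)
        if total[xs[i]] == 1 and 0.5 * n <= f0 <= 0.85 * n:
            good.append(i)
        if i + n < len(xs):      # slide the window one step to the right
            window[xs[i]] -= 1
            if window[xs[i]] == 0:
                del window[xs[i]]
            window[xs[i + n]] += 1
    return good


n = 2000
good = good_positions(n)
print(len(good), "good positions out of", n, "windows")
```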


Summing up

- Finding two identical items is easier than sorting the input.
- Sliding window element distinctness is still easier than sorting.
- $F_0 \bmod 2$ may be better than ED as an example of a hard decision problem to study.
- Sliding window $F_0 \bmod 2$ has the same complexity as sorting (ignoring log factors).
- Is our new complexity for element distinctness in fact tight?


Thank you

www.background-free.com