Lower Bounds for Read/Write Streams

Paul Beame. Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc). University of Washington.


Post on 09-Jan-2016

Page 1: Lower Bounds for   Read / Write Streams

Lower Bounds for Read/Write Streams

Paul Beame

Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc)

University of Washington

Page 2: Lower Bounds for   Read / Write Streams

Data stream Algorithms

• Many huge successes
  – No need to remind people at this workshop!

• Some problems provably hard
  – E.g. frequency moments F_k, k > 2, require space Ω(n^{1-2/k}) [Bar-Yossef-Jayram-Kumar-Sivakumar 02], [Chakrabarti-Khot-Sun 03]

Page 3: Lower Bounds for   Read / Write Streams

Beyond Data Streams

• Disk storage can be huge
  – Can stream data to/from disks in real time

• Sequential access hides latency
  – Motivates multipass streams

• Multipass streams are analyzed by methods similar to those for a single pass

• Why stop at a single copy?
  – Working with more than one copy at once may make computations easier

• Why stream the data onto disks exactly as read?
  – Can make modifications to data while writing

Page 4: Lower Bounds for   Read / Write Streams

Read/write streams model

[Figure: several streams of bits, each readable and writable, attached to a small memory]

• Disks as read/write streams
  – Key parameters: space, #passes = #reversals
  – Assume #streams is constant

• Introduced by [Grohe-Schweikardt 05]

Page 5: Lower Bounds for   Read / Write Streams

Read/write streams model

• Much more powerful than data-stream model
  – Sort with O(log n) passes, O(log n) space, 3 streams
    • MergeSort
  – Exactly compute any frequency moment
    • Data-stream requires passes × space = Ω(n)
  – Θ(log n) passes, O(1) space gives all of LOGSPACE [Hernich-Schweikardt 08]

What can be computed in o(log n) passes + small space?
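The MergeSort bullet can be made concrete with a toy simulation. This is a minimal sketch, not the construction from the cited papers: Python lists stand in for the three streams, the function name `stream_merge_sort` is hypothetical, and the pass count is only an illustration of the O(log n) behaviour. Every step is a sequential scan, and the only state kept between elements is a few counters of O(log n) bits.

```python
def stream_merge_sort(data):
    """Bottom-up merge sort in which every step is a sequential scan.

    Three lists stand in for three read/write streams: `src` plus two
    auxiliary streams `a` and `b`. Returns (sorted list, number of passes).
    """
    src = list(data)
    run, passes = 1, 0
    while run < len(src):
        # Pass 1: scan src once, sending runs alternately to streams a and b.
        a, b = [], []
        for start in range(0, len(src), run):
            target = a if (start // run) % 2 == 0 else b
            target.extend(src[start:start + run])
        # Pass 2: scan a and b in parallel, merging run by run back onto src.
        src = []
        i = j = 0
        while i < len(a) or j < len(b):
            end_i = min(i + run, len(a))
            end_j = min(j + run, len(b))
            while i < end_i and j < end_j:
                if a[i] <= b[j]:
                    src.append(a[i])
                    i += 1
                else:
                    src.append(b[j])
                    j += 1
            src.extend(a[i:end_i])
            src.extend(b[j:end_j])
            i, j = end_i, end_j
        run *= 2
        passes += 2
    return src, passes


print(stream_merge_sort([5, 2, 9, 1, 7, 3, 8, 6]))
# -> ([1, 2, 3, 5, 6, 7, 8, 9], 6)
```

The run length doubles each round, so n items need about 2·⌈log2 n⌉ passes in this accounting.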

Page 6: Lower Bounds for   Read / Write Streams

Previous lower bounds for R/W streams

• In o(log n) passes need Ω(n^{1-ε}) space to
  – Sort n numbers [Grohe-Schweikardt 05]
  – Test set-equality A=B, multiset equality, XQuery, XPath [Grohe-Hernich-Schweikardt 06]

• Same lower bounds apply for randomized algorithms with one-sided error [Grohe-Hernich-Schweikardt 06]

Page 7: Lower Bounds for   Read / Write Streams

Previous lower bounds for R/W streams

• Lower bounds for general randomness and two-sided error:
  – In o(log n / log log n) passes, need Ω(n^{1-ε}) space to:
    • Approximate F_∞^* within factor 2
    • Find Empty-Join, XQuery/XPath-Filtering etc.
    [B-Jayram-Rudra 07]

What about approximating frequency moments F_k for k > 2?

Page 8: Lower Bounds for   Read / Write Streams

Our Main Result

Theorem: Any randomized R/W-stream algorithm using o(log n) passes needs Ω(n^{1-4/k-ε}) space to 2-approximate F_k

• Implies polynomial space for k > 4

• Compare with: Θ(n^{1-2/k}) on data streams

R/W streams with o(log n) passes don't help much for approximating frequency moments.

Page 9: Lower Bounds for   Read / Write Streams

Methods

Page 10: Lower Bounds for   Read / Write Streams

[Alon-Matias-Szegedy 96] approach to lower bounding F_k in data streams

1. Reduce testing t-party set-disjointness to F_k
   Easy!

2. Simulate any data-stream algorithm by a multi-party number-in-hand communication game
   Trivial!

3. Apply Ω(n/t) communication lower bound on t-party set-disjointness
   [AMS 96, Saks-Sun 02, Bar-Yossef-Jayram-Kumar-Sivakumar 02, Chakrabarti-Khot-Sun 03, Grönemeier 09] (tight!)

Fails for R/W streams!

Solved easily by R/W streams!

Cannot be applied to R/W streams!

Page 11: Lower Bounds for   Read / Write Streams

Promise Set-Disjointness (DISJ)

DISJ_{n,t}(x_1,…,x_t) =
    0          if x_1,…,x_t are pairwise disjoint
    1          if ∃ a s.t. a ∈ x_i for every i
    undefined  otherwise

x_1:  0 1 0 1 0 0 1 0 1 0 0 0 1 0 0
x_2:  1 0 0 0 1 0 0 0 1 0 0 0 0 0 1
x_3:  0 0 1 0 0 0 0 1 1 0 1 0 0 0 0
x_4:  0 0 0 0 0 1 0 0 1 0 0 1 0 0 0
x_5:  0 0 0 0 0 0 0 0 1 0 0 0 0 1 0

• t-party NIH communication: Ω(n/t)
• Approximating F_k ⇒ testing DISJ_{n,t} for t ≈ n^{1/k}
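The link between DISJ and F_k can be sketched in a few lines. This is a sketch of the standard [AMS96]-style gap argument rather than code from the talk, and the names `disj`, `frequency_moment` and `disj_via_fk` are hypothetical. Under the promise, a disjoint instance gives F_k ≤ n (every element appears at most once), while an intersecting instance gives F_k ≥ t^k, so the two cases separate once t^k exceeds n. For simplicity the sketch thresholds the exact F_k; an approximation algorithm needs t^k larger than n by the approximation factor.

```python
from collections import Counter


def disj(sets):
    """Promise t-party set disjointness: 0, 1, or None when the promise fails."""
    if set.intersection(*sets):
        return 1
    pairwise_disjoint = sum(len(s) for s in sets) == len(set.union(*sets))
    return 0 if pairwise_disjoint else None


def frequency_moment(stream, k):
    """F_k = sum over distinct items of (number of occurrences)^k."""
    return sum(count ** k for count in Counter(stream).values())


def disj_via_fk(sets, k):
    """Decide a promise instance by thresholding F_k of the concatenated sets.

    Assumes t^k is larger than the universe size n, where t = len(sets).
    """
    stream = [a for s in sets for a in s]
    t = len(sets)
    return 1 if frequency_moment(stream, k) >= t ** k else 0


disjoint = [{0, 3}, {1, 4}, {2, 5}]
intersecting = [{0, 3}, {1, 3}, {2, 3}]
print(disj_via_fk(disjoint, 3), disj_via_fk(intersecting, 3))
```

In the example n = 6, t = 3 and k = 3, so the disjoint instance has F_3 = 6 and the intersecting one has F_3 = 30 ≥ 27.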

Page 12: Lower Bounds for   Read / Write Streams

R/W streams easily solve DISJ_{n,t}

• Input: x_1, x_2, …, x_t ∈ {0,1}^n

• Testing DISJ_{n,t} with 2 streams, 3 passes, O(log n) space

[Figure: one stream holding x_1 x_2 … x_{t-1} x_t and a second stream holding x_t x_{t-1} … x_2 x_1]

Page 13: Lower Bounds for   Read / Write Streams

How to prove lower bounds in R/W streams?

• Lower bounds [GS05], [GHS06], [BJR07] for R/W streams don't use the [AMS96] outline
  – Introduce permuted 2-party versions of problems
  – Employ ad hoc combinatorial arguments

We take a more general approach related to [AMS96], directly using NIH communication complexity

Page 14: Lower Bounds for   Read / Write Streams

Our approach to lower bound F_k

R/W streams algorithm for t-party permuted-DISJ on input size n

⇒ Number-in-hand communication protocol for t-party DISJ on input size n/t^2

Page 15: Lower Bounds for   Read / Write Streams

[Alon-Matias-Szegedy 96]'s approach to lower bound F_k in data streams

1. Reduce testing t-party set-disjointness to F_k
   Easy!

2. Simulate data-stream algorithms by multi-party number-in-hand communication game

3. Apply communication lower bound on t-party set-disjointness
   [AMS96, SS02, B-YJKS02, CKS03, G09] (tight!)

Our approach to lower bound F_k in R/W streams

1. Reduce testing permuted t-party DISJ to F_k

2. Simulate R/W streams for permuted DISJ by NIH comm. for DISJ on slightly smaller input size
   Apply our simulation

3. (unchanged) Apply communication lower bound on t-party set-disjointness

Page 16: Lower Bounds for   Read / Write Streams

Ideas from the proof

Page 17: Lower Bounds for   Read / Write Streams

Segmenting DISJ_{n,t}

Input: x_1, x_2, …, x_t ∈ {0,1}^n

• View DISJ_{n,t} as an OR of m subproblems DISJ_{n/m,t}

[Figure: each of x_1, x_2, …, x_{t-1}, x_t is cut into m segments numbered 1, 2, …, m, each of length n/m]
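The segmentation can be written out directly. This is a minimal sketch assuming each x_i is a 0/1 list whose length n is divisible by m and that the input satisfies the DISJ promise; the helper names `intersects`, `segments` and `disj_as_or` are hypothetical.

```python
def intersects(vectors):
    """1 if some coordinate is 1 in every vector, else 0 (promise inputs assumed)."""
    return int(any(all(bits) for bits in zip(*vectors)))


def segments(x, m):
    """Cut x into m consecutive segments of length n/m (assumes m divides n)."""
    size = len(x) // m
    return [x[j * size:(j + 1) * size] for j in range(m)]


def disj_as_or(vectors, m):
    """DISJ_{n,t} computed as the OR of the m subproblems DISJ_{n/m,t}."""
    per_player = [segments(x, m) for x in vectors]
    return int(any(intersects([segs[j] for segs in per_player])
                   for j in range(m)))


xs = [[1, 0, 0, 1, 0, 0],
      [0, 1, 0, 1, 0, 0],
      [0, 0, 1, 1, 0, 0]]
print(intersects(xs), disj_as_or(xs, 3))
```

Both calls agree because a common element lies in exactly one segment index, which is the subproblem that evaluates to 1.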

Page 18: Lower Bounds for   Read / Write Streams

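Permuted DISJ

Fix π_1, π_2, …, π_t permutations on [m]

Permuted-DISJ_{n,m,t}: the input is π_1(x_1), π_2(x_2), …, π_t(x_t), where stream i lists the m segments of x_i in the order π_i(1), π_i(2), …, π_i(m)

• View Permuted-DISJ_{n,m,t} as an OR of m subproblems DISJ_{n/m,t}

[Figure: each stream π_i(x_i) holds m segments of length n/m, in the order π_i(1), π_i(2), …, π_i(m); the segments with the same original index, one per stream, form one DISJ_{n/m,t} subproblem]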
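A reference implementation of the permuted problem helps fix the indexing. This is a sketch under assumed conventions, not a streaming algorithm: permutations are 0-based Python lists, position p of the shuffled stream of player i holds segment pi[p], and `permuted_disj` uses random access to undo the shuffle. The names `permute_segments` and `permuted_disj` are hypothetical.

```python
def permute_segments(x, pi, m):
    """Write out the m segments of x in the order pi[0], pi[1], ..., pi[m-1]."""
    size = len(x) // m  # assumes m divides len(x)
    return [bit for j in pi for bit in x[j * size:(j + 1) * size]]


def permuted_disj(shuffled, perms, m):
    """OR over the m subproblems, locating segment j inside each shuffled stream."""
    size = len(shuffled[0]) // m
    for j in range(m):
        segs = []
        for y, pi in zip(shuffled, perms):
            pos = pi.index(j)  # where segment j ended up in this stream
            segs.append(y[pos * size:(pos + 1) * size])
        if any(all(bits) for bits in zip(*segs)):
            return 1
    return 0


xs = [[1, 0, 0, 1, 0, 0],
      [0, 1, 0, 1, 0, 0],
      [0, 0, 1, 1, 0, 0]]
perms = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
shuffled = [permute_segments(x, pi, 3) for x, pi in zip(xs, perms)]
print(shuffled[1], permuted_disj(shuffled, perms, 3))
```

The difficulty for a stream algorithm is that the segments of one subproblem sit at unrelated positions in different streams, which this random-access version sidesteps.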

Page 19: Lower Bounds for   Read / Write Streams

Why is permuted-DISJ hard?

• Intuitively, to solve a subproblem (e.g. blue), we need to compare at least two blue segments

• Need to compare at least two segments of every color

• If segments are shuffled, many passes are needed

[Figure: streams π_i(x_i), π_j(x_j), π_l(x_l) with color-coded segments; segments of the same color form one DISJ_{n/m,t} subproblem]

Page 20: Lower Bounds for   Read / Write Streams

Permuted DISJ

• Good subproblem: computation always depends only on at most one of its t segments (and the memory/state)

• If segments are randomly shuffled: with o(log m) passes and t = o(m^{1/2}) parties, 99% of the m subproblems are good

• Reduction idea: try to embed an ordinary DISJ_{n/m,t} in one of the good subproblems

Catch: which subproblems are good depends on the input

Page 21: Lower Bounds for   Read / Write Streams

Simulation

s-space R/W streams algo A for permuted-DISJ_{n,m,t} ⇒ NIH comm. protocol for DISJ_{n/m,t}

t players on input y_1, y_2, …, y_t:

1. Generate m-1 DISJ_{n/m,t}'s that look like* y_1, y_2, …, y_t

2. Shuffle with π_1, π_2, …, π_t
   • (y_1, y_2, …, y_t) is good w.h.p.

3. Run A on π_1(x_1), …, π_t(x_t)

*same sizes but don't intersect

[Figure: y_1, y_2 embedded into x_1, x_2, which are shuffled into π_1(x_1), π_2(x_2)]

Page 22: Lower Bounds for   Read / Write Streams

Generating the extended input

Given y_1, y_2, …, y_t, players
  – Exchange the sizes of each of the sets
    • O(t log n) bits
  – Choose random consistent reordering of the indices of each y_1, y_2, …, y_t
  – Generate m-1 random inputs to DISJ_{n/m,t} with same set sizes as y_1, y_2, …, y_t but that are disjoint
  – Place y_1, y_2, …, y_t in random position and then shuffle

Key observation: If y_1, y_2, …, y_t are disjoint then this resolves the catch
  – After shuffling, all the subproblems look the same, so the probability that the subproblem where y_1, y_2, …, y_t lands is good does not depend on the input
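The generation steps can be sketched as a centralized procedure. In the actual protocol each player only knows its own y_i plus shared randomness, so this is only an illustration of the data being built. Several things here are assumptions: the function names are hypothetical, the set sizes are assumed to sum to at most n/m (true when the y_i are disjoint), and the final shuffle by the permutations is left out.

```python
import random


def random_disjoint_instance(sizes, n_sub, rng):
    """t pairwise-disjoint 0/1 vectors of length n_sub with the given set sizes."""
    coords = rng.sample(range(n_sub), sum(sizes))  # assumes sum(sizes) <= n_sub
    vectors, start = [], 0
    for s in sizes:
        v = [0] * n_sub
        for c in coords[start:start + s]:
            v[c] = 1
        vectors.append(v)
        start += s
    return vectors


def extended_input(ys, m, rng):
    """Embed ys as one of m look-alike subproblems; returns (x_1..x_t, slot)."""
    n_sub = len(ys[0])
    sizes = [sum(y) for y in ys]  # the O(t log n) bits the players exchange
    relabel = list(range(n_sub))
    rng.shuffle(relabel)  # one reordering of indices shared by all players
    ys = [[y[relabel[c]] for c in range(n_sub)] for y in ys]
    blocks = [random_disjoint_instance(sizes, n_sub, rng) for _ in range(m - 1)]
    slot = rng.randrange(m)  # random position for the real instance
    blocks.insert(slot, ys)
    xs = [[bit for block in blocks for bit in block[i]] for i in range(len(ys))]
    return xs, slot


rng = random.Random(0)
ys = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
xs, slot = extended_input(ys, 5, rng)
print(len(xs), len(xs[0]), slot)
```

Every block has the same set sizes, which is what makes the real instance indistinguishable from the dummies when the y_i are disjoint.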

Page 23: Lower Bounds for   Read / Write Streams

Simulating R/W stream algorithm A using NIH communication

• As A executes on input v = π_1(x_1), …, π_t(x_t), players know all inputs except y_1, …, y_t
  – each player builds up copy of a dependency graph σ(v) for the elements of each stream so far

• Using σ(v), at each step all players either
  – know the next move, or
  – know which one player knows next block of moves
    • that player communicates
  – know that need two players' info: simulation "fails"

• If subproblem y_1, …, y_t is good for v then simulation does not fail

• If players detect failure they output "not disjoint"
  – If input was disjoint then only 1% chance of this

Page 24: Lower Bounds for   Read / Write Streams

Dependency Graph

Vertices: elements of each stream in each pass

Edges: from an element to the elements in the previous pass that contained heads at the same time it did

[Figure: layers for pass 0, pass 1, …, pass j-1, pass j, pass j+1; in each pass a stream is scanned either left-to-right or right-to-left]

Page 25: Lower Bounds for   Read / Write Streams

Why most subproblems are good

• Simple case: algorithm just makes copies of the input stream and compares them
  – # of subproblems with > 1 segment read at same time on a single pass through the streams (L-to-R or R-to-L on each stream)
    • ≤ # segments appearing in the same (or reversed) order
  – Almost surely, for random permutations π_1, π_2, …, π_t, no pair has a common subsequence or inverted subsequence longer than 2e·m^{1/2}
  – When t is o(m^{1/2}) the total is o(m).
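The 2e·m^{1/2} bound can be checked empirically. This sketch relies on the standard fact that the longest common subsequence of two permutations equals the longest increasing subsequence of one permutation's positions read in the other's order. The function names are hypothetical, and a seeded experiment illustrates the bound without proving it.

```python
import math
import random
from bisect import bisect_left


def lcs_of_permutations(p, q):
    """LCS length of two permutations of the same set, via LIS in O(m log m)."""
    pos = {v: i for i, v in enumerate(q)}
    tails = []  # tails[k] = smallest last position of an increasing run of length k+1
    for v in p:
        k = bisect_left(tails, pos[v])
        if k == len(tails):
            tails.append(pos[v])
        else:
            tails[k] = pos[v]
    return len(tails)


def longest_common_or_inverted(p, q):
    """Longest subsequence of p that appears in q in the same or reversed order."""
    return max(lcs_of_permutations(p, q), lcs_of_permutations(p, q[::-1]))


rng = random.Random(1)
m, t = 400, 6
perms = [rng.sample(range(m), m) for _ in range(t)]
worst = max(longest_common_or_inverted(perms[i], perms[j])
            for i in range(t) for j in range(i + 1, t))
print(worst, 2 * math.e * math.sqrt(m))
```

For random permutations the typical value is close to 2·m^{1/2}, comfortably below the 2e·m^{1/2} threshold quoted on the slide.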

Page 26: Lower Bounds for   Read / Write Streams

Why most subproblems are good

• General case: may combine information about all streams onto a single stream in a single pass
  – What is combined may depend on the input values
  – Each element depends on the segments that it can reach in the input stream via the dependency graph

Page 27: Lower Bounds for   Read / Write Streams

Why most subproblems are good

• For each fixed v, after p = o(log m) passes:
  – Each element can depend on only 2^{O(p)} different input segments
  – For any one stream, the sequence of its elements' dependencies on input segments is the interleaving of 2^{O(p)} monotone subsequences from π_1, π_2, …, π_t

Only 2^{O(p)} · t · m^{1/2} = m · o(1) bad subproblems on input v

Page 28: Lower Bounds for   Read / Write Streams

Communication Cost of Simulation

• For each fixed v, after p = o(log m) passes:
  – Only 2^{O(p)} · t elements depend on a segment and have a neighbor that does not depend on it

• Players only need to communicate when segment dependencies change
  – only happens 2^{O(p)} · t times, at a cost of O(ps) bits per time
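A back-of-the-envelope derivation shows how this cost could combine with the Ω(n/t) communication bound to give the space bound of the main theorem. It is a sketch under assumptions not stated on the slides: m = t^{2+δ} (so that t = o(m^{1/2})), t ≈ n^{1/k}, s ≥ log n, and 2^{O(p)} p = n^{o(1)} for p = o(log n). Constants and the exact bookkeeping for ε are omitted.

```latex
\begin{align*}
\text{cost of the simulation}
  &\le 2^{O(p)}\, t \cdot O(ps) + O(t \log n) \;=\; n^{o(1)}\, t\, s, \\
\text{communication needed for } \mathrm{DISJ}_{n/m,t}
  &\ge \Omega\!\left(\frac{n/m}{t}\right), \\
\Longrightarrow\quad
s &\ge \frac{n^{1-o(1)}}{m\, t^{2}}
   \;=\; \frac{n^{1-o(1)}}{t^{4+\delta}}
   \;=\; n^{\,1 - \frac{4+\delta}{k} - o(1)} .
\end{align*}
```

With δ small this has the same shape as the Ω(n^{1-4/k-ε}) bound, and the t^4 in the denominator is where the gap from the data-stream exponent 1-2/k comes from.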

Page 29: Lower Bounds for   Read / Write Streams

Limitations and Future Work

Page 30: Lower Bounds for   Read / Write Streams

Limitation of using permuted-DISJ

R/W streams algo for permuted-DISJ_{n,m,t} ⇒ NIH CC protocol for DISJ_{n/m,t}

• Gap from data stream due to loss in input size

• Most of this loss is necessary
  – Need n/m = Ω(t^2) to use Ω(n/t) CC lower bound for DISJ_{n/m,t}
  – Efficient R/W algo for permuted-DISJ_{n,m,t} unless m ≥ t^{3/2}
  – Implies that n is Ω(m t^2), which is Ω(t^{3.5})

Since we need t ≈ n^{1/k}, the lower bound Ω(n/t) is trivial for k ≤ 3.5

Page 31: Lower Bounds for   Read / Write Streams

A longest-common-subsequence problem on permutations

• Algorithm for permuted-DISJ_{n,m,t} follows from the following theorem:

Theorem: In any 3 permutations on [m] there is a pair with longest common subsequence length ≥ m^{1/3}.

Proof: For each i ∈ [m] define a triple t_i of integers: for each of the 3 pairs of permutations, put the length of the longest common subsequence for that pair that ends with value i. Can show that all m triples are different. So some triple must contain a coordinate ≥ m^{1/3}.

• Tight even for 4 permutations
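The triple argument in the proof can be checked mechanically. This is a sketch with hypothetical function names and a simple O(m^2) dynamic program per pair. The triples are distinct because, for any two values i and j, two of the three permutations agree on their relative order, and for that pair the entry of the later value is strictly larger.

```python
import random
from itertools import combinations


def lcs_ending_at(p, q):
    """For each value v: length of the longest common subsequence of p and q ending with v."""
    pos_q = {v: i for i, v in enumerate(q)}
    best = {}
    for idx, v in enumerate(p):
        # Extend the best common subsequence ending at an earlier u that also precedes v in q.
        best[v] = 1 + max((best[u] for u in p[:idx] if pos_q[u] < pos_q[v]),
                          default=0)
    return best


def triples(p1, p2, p3):
    """Map each value i to its triple t_i, one coordinate per pair of permutations."""
    tables = [lcs_ending_at(a, b) for a, b in combinations((p1, p2, p3), 2)]
    return {v: tuple(table[v] for table in tables) for v in p1}


def max_pairwise_lcs(p1, p2, p3):
    """Largest coordinate of any triple = longest LCS over the three pairs."""
    return max(max(t) for t in triples(p1, p2, p3).values())


rng = random.Random(7)
m = 30
p1, p2, p3 = (rng.sample(range(m), m) for _ in range(3))
ts = triples(p1, p2, p3)
print(len(set(ts.values())) == m, max_pairwise_lcs(p1, p2, p3) ** 3 >= m)
```

Since the m triples are distinct and every coordinate is at most the maximum value c, there can be at most c^3 of them, which gives c ≥ m^{1/3}.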

Page 32: Lower Bounds for   Read / Write Streams

R/W stream algorithm for permuted-DISJ_{n,m,t} for large t

For t ≥ m^{2/3} and any π_1, …, π_t: testing permuted-DISJ_{n,m,t} with 2 streams, 3 passes, O(log nmt) space

In any three permutations on [m] there is a pair with longest common subsequence length ≥ m^{1/3}.

[Figure: streams π_1(x_1), π_2(x_2), π_3(x_3), π_4(x_4), π_5(x_5), π_6(x_6)]

• Compare m^{1/3} blocks each time

Page 33: Lower Bounds for   Read / Write Streams

Open problems

• Is the Ω(n^{1-4/k-ε}) lower bound for R/W streams tight?
  – Gap from O(n^{1-2/k}) upper bound in data stream
    • Can't use permuted-DISJ_{n,m,t} to close it
  – Polynomial space to compute F_k for 2 < k ≤ 4?

• Other problems on R/W streams?

• L(m,k): maximum LCS length that can be guaranteed between some pair in any set of k permutations on [m].
  – We show L(m,3) = L(m,4) = m^{1/3}
  – What is L(m,k) for other values of k?

  – [B-Blais-Huynh 08] L(m,k) = m^{1/3+o(1)} for k = m^{O(1)}