1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan Joint with David Woodruff, IBM Almaden


Tight Bounds for Distributed Functional Monitoring

Jan. 2012

Qin Zhang

MADALGO, Aarhus University

NII Shonan meeting, Japan

Joint with

David Woodruff, IBM Almaden


The distributed streaming model

(a.k.a. distributed functional/continuous monitoring)

[Figure: sites S1, S2, S3, ..., Sk receive items over time; each site is connected to the coordinator C.]

A(t): the set of elements received up to time t from all sites. (Assume at most one item arrives at each time unit.)

The coordinator needs to maintain f(A(t)) for all t.

Goal: minimize communication cost
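To make the model concrete, here is a minimal sketch of the most naive protocol, in which every site forwards every item to the coordinator. It is not a protocol from the talk; the class name `Coordinator`, the helper `run_naive`, and the choice of f as "number of distinct items" are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Maintains f(A(t)); here f = number of distinct items (illustrative choice)."""
    seen: set = field(default_factory=set)
    messages: int = 0  # communication cost, counted in messages

    def receive(self, item):
        self.messages += 1
        self.seen.add(item)

    def value(self):
        return len(self.seen)


def run_naive(arrivals, coordinator):
    """arrivals: list of (site_id, item), at most one arrival per time unit.

    Naive baseline: each site forwards each item, so the cost is one
    message per arrival. Returns f(A(t)) after every time step.
    """
    history = []
    for _site, item in arrivals:
        coordinator.receive(item)
        history.append(coordinator.value())
    return history


# Hypothetical input: three sites, five time units.
arrivals = [(0, "a"), (1, "b"), (0, "a"), (2, "c"), (1, "b")]
coord = Coordinator()
history = run_naive(arrivals, coord)
```

The baseline tracks f exactly but pays one message per arriving item; the bounds in this talk concern how much of that cost can be avoided.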


Problems in the distributed streaming model

[Figure: sites S1, S2, S3, ..., Sk over time, connected to the coordinator C.]

Static case (a one-shot computation at the end)

• Top-k

• Heavy-hitter

• . . .

Dynamic case

• Sampling

• Frequency moments (F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• Non-linear functions

• . . .


This talk

What you would like to see:

• Efficient algorithms/protocols

• Practical heuristics

What you (probably) do not want to see:

• “Useless” impossibility results

• Complicated proofs

Unfortunately, in the next 30 minutes ...


The multiparty communication model (a model for lower bounds)

x1 = 010011, x2 = 111011, x3 = 111111, ..., xk = 100011

We want to compute f(x1, x2, . . . , xk)

f can be bit-wise XOR, OR, AND, MAJ . . .

Message passing: If x1 talks to x2, others cannot hear.

Blackboard: One speaks, everyone else hears.

[Figure: sites S1, S2, S3, ..., Sk connected to the coordinator C, marked as equivalent ("=") to the model above.]
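As a small illustration of the functions f mentioned on this slide, the sketch below evaluates bit-wise XOR, OR, AND and MAJ on the four example strings. Treating the slide's xk as a fourth input (k = 4) and defining MAJ as "strictly more than half of the inputs are 1" are assumptions made here; the helper name `bitwise` is hypothetical.

```python
def bitwise(inputs, op):
    """Apply op column by column to equal-length bit strings."""
    k = len(inputs)
    out = []
    for j in range(len(inputs[0])):
        ones = sum(int(x[j]) for x in inputs)
        if op == "XOR":
            bit = ones % 2
        elif op == "OR":
            bit = int(ones > 0)
        elif op == "AND":
            bit = int(ones == k)
        elif op == "MAJ":
            bit = int(2 * ones > k)  # assumption: ties count as 0
        else:
            raise ValueError(f"unknown op: {op}")
        out.append(str(bit))
    return "".join(out)


# The example inputs from the slide, with xk read as a fourth player.
xs = ["010011", "111011", "111111", "100011"]
results = {op: bitwise(xs, op) for op in ("XOR", "OR", "AND", "MAJ")}
# results == {"XOR": "110100", "OR": "111111", "AND": "000011", "MAJ": "110011"}
```

In the communication model each string lives at a different player, so the question is how many bits must be exchanged before someone can output these values.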


Previously

Some work in the blackboard model. Almost nothing in the message-passing model.

This SODA, with Jeff Phillips and Elad Verbin, we proposed a general and elegant technique called “symmetrization”, which works in both variants. In particular, we obtained (in the message-passing model):

1. Ω(nk) for the bitwise-XOR/OR/AND/MAJ.

2. Ω(nk) for connectivity.

Artificial? Well ...

In any case, let’s look at really important problems.


Now, important problems

• Sampling

• Frequency moments (F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• . . .

[Slide annotation: some of these problems are marked “Solved”, the others “Our work”.]


Results

• (Almost) tight bounds for all these questions

• Static lower bounds (almost) match dynamic upper bounds.

(up to polylog factors)

Today


F0 upper bound (Cormode, Muthu and Yi 2008)


The (1 + ε)-approximation F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute $F_0(\bigcup_{i \in [k]} X_i)$ up to a (1 + ε)-approximation.

[Figure: items scattered across the sites. How many distinct items?]

A fundamental problem in data analysis.

Current best UB: $O(k/\varepsilon^2)$ (Cormode, Muthu, Yi 2008). Holds in the dynamic case.


General idea for the one-shot computation

Each site generates a “sketch” via small-space streaming algorithms.

The coordinator combines (via communication) the sketches from the k sites to obtain a global sketch, from which we can extract the answer.


The FM sketch

Take a pairwise independent random hash function $h : \{1, \ldots, n\} \to \{1, \ldots, 2^d\}$, where $2^d > n$.

For each incoming element x, compute h(x), e.g., h(5) = 10101100010000.

Count how many trailing zeros h(x) has.

Remember the max # trailing zeros in any h(x).

Let Y be the max # trailing zeros.

Can show $E[2^Y]$ = # distinct elements.
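The steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the hash family `((a*x + b) mod p) mod 2^d` is a commonly used stand-in that is only approximately pairwise independent, it maps to 0..2^d − 1 rather than 1..2^d, and the names `make_hash`, `trailing_zeros` and `fm_sketch` are hypothetical.

```python
import random


def trailing_zeros(v, width):
    """Number of trailing zero bits of v; the value 0 counts as `width` zeros."""
    if v == 0:
        return width
    return (v & -v).bit_length() - 1  # index of the lowest set bit


def make_hash(d, seed):
    """Assumed hash family: h(x) = ((a*x + b) mod p) mod 2^d, requires d < 61."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a Mersenne prime
    a = rng.randrange(1, p)
    b = rng.randrange(p)
    return lambda x: ((a * x + b) % p) % (1 << d)


def fm_sketch(items, h, d):
    """Return Y, the maximum number of trailing zeros of h(x) over the stream."""
    y = 0
    for x in items:
        y = max(y, trailing_zeros(h(x), d))
    return y


h = make_hash(d=20, seed=42)
y = fm_sketch([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], h, 20)
estimate = 2 ** y  # the slide's estimator for the number of distinct elements
```

Because Y depends only on the set of hash values seen, repeated items never change the sketch, which is what makes it suitable for counting distinct elements.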


One-shot case, the FM sketch (cont.)

So $2^Y$ is an unbiased estimator for # distinct elements.

However, it has a large variance.

Some techniques [Bar-Yossef et al. 2002] can produce a good estimator that is within relative error ε with probability 1 − δ.

Space increased to $O(1/\varepsilon^2)$.

FM sketch has linearity:

Y1 from A, Y2 from B, then $2^{\max\{Y_1, Y_2\}}$ estimates # distinct items in A ∪ B.

Thus, we can use it to design a one-shot algorithm with communication $O(k/\varepsilon^2)$.
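The mergeability property can be checked directly. In the sketch below each site computes t trailing-zero values with t shared hash functions and sends them to the coordinator, which takes a coordinate-wise maximum; the result equals the sketch of the union computed in one place. The hash family, the site contents and all function names are illustrative assumptions, and the step that turns the t values into a (1 + ε)-estimate (the technique of Bar-Yossef et al.) is deliberately not reproduced here.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used by the assumed hash family


def make_hashes(t, d, seed):
    """t shared hash functions h(x) = ((a*x + b) mod P) mod 2^d (assumed family)."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(t)]
    return [lambda x, a=a, b=b: ((a * x + b) % P) % (1 << d) for a, b in params]


def tz(v, d):
    """Trailing zeros of v, with 0 counted as d zeros."""
    return d if v == 0 else (v & -v).bit_length() - 1


def site_sketch(items, hashes, d):
    """One max-trailing-zeros value per hash function."""
    return [max((tz(h(x), d) for x in items), default=0) for h in hashes]


def merge(sketches):
    """Coordinator side: coordinate-wise maximum over the sites' sketches."""
    return [max(column) for column in zip(*sketches)]


# Hypothetical data held by k = 4 sites.
sites = [{1, 5, 9}, {4, 5, 7}, {2, 8, 5}, {7, 6, 10}]
hashes = make_hashes(t=8, d=32, seed=1)

merged = merge([site_sketch(s, hashes, 32) for s in sites])
central = site_sketch(set().union(*sites), hashes, 32)
words_sent = len(sites) * len(hashes)  # k * t small integers in total
```

With t chosen on the order of 1/ε², the total of k · t values sent matches the shape of the O(k/ε²) communication bound stated on the slide.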


F0 lower bound


The F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute $F_0(\bigcup_{i \in [k]} X_i)$ up to a (1 + ε)-approximation.

[Figure: items scattered across the sites. How many distinct items?]

A fundamental problem in data analysis.

Current best UB: $O(k/\varepsilon^2)$ (Cormode, Muthu, Yi, 2008). Holds in the dynamic case.

Previous LB: $\Omega(k)$ (Cormode, Muthu, Yi, 2008); $\Omega(1/\varepsilon^2)$ (reduction from Gap-Hamming).

Our LB: $\Omega(k/\varepsilon^2)$. Holds in the static and message-passing case.

Tight!


The proof framework

Step 1: We first introduce a simpler problem called k-GAP-MAJ.

Step 2: We compose k-GAP-MAJ with the Set Disjointness problem using information cost to prove a lower bound for F0.


k-GAP-MAJ

We have k sites S1, S2, . . . , Sk. Si holds a bit Zi which is 1 w.p. β and 0 w.p. 1 − β, where ω(1/k) ≤ β ≤ 1/2 is a fixed value.

Our goal: compute the following function.

$$
\mathrm{GM}(Z_1, Z_2, \ldots, Z_k) =
\begin{cases}
0, & \text{if } \sum_{i \in [k]} Z_i \le \beta k - \sqrt{\beta k}, \\
1, & \text{if } \sum_{i \in [k]} Z_i \ge \beta k + \sqrt{\beta k}, \\
*, & \text{otherwise,}
\end{cases}
$$

where “∗” means that the answer can be arbitrary.

Lemma 1: If a protocol P computes k-GAP-MAJ correctly w.p. 0.9999, then w.p. Ω(1), the protocol has to learn at least Ω(k) of the Zi, each with Ω(1) bit (that is, $H(Z_i \mid \Pi) \le H_b(0.01\beta)$).

Alternatively: $I(Z_1, Z_2, \ldots, Z_k; \Pi) = \Omega(k)$
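The function GM defined above is easy to state in code. The following is a minimal sketch of the definition only (not of any protocol); the name `gap_maj` and the use of `None` for the "∗" case are choices made here.

```python
import math


def gap_maj(bits, beta):
    """Return 0 or 1 according to k-GAP-MAJ, or None in the '*' (don't care) zone."""
    k = len(bits)
    total = sum(bits)
    mean = beta * k
    gap = math.sqrt(beta * k)
    if total <= mean - gap:
        return 0
    if total >= mean + gap:
        return 1
    return None  # any answer is acceptable here


# Example with k = 100 and beta = 0.25: thresholds are 25 - 5 = 20 and 25 + 5 = 30.
low = gap_maj([1] * 20 + [0] * 80, 0.25)
high = gap_maj([1] * 30 + [0] * 70, 0.25)
middle = gap_maj([1] * 25 + [0] * 75, 0.25)
```

The don't-care zone has width 2√(βk) around the mean βk, so a correct protocol only has to separate sums that deviate by about one standard deviation.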


Set disjointness (2-DISJ)

Alice holds $x \in \{0, 1\}^n$ and Bob holds $y \in \{0, 1\}^n$. Is x ∩ y = ∅?

A classical hard instance:

Distribution µ: X and Y are both random subsets of size ℓ = (n + 1)/4 from [n] such that |X ∩ Y| = 1 w.p. β and |X ∩ Y| = 0 w.p. 1 − β.

Razborov [1990] shows an Ω(n) lower bound for this hard distribution and error β/100.
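One way to sample from a distribution of this shape is sketched below. It assumes that, given the intersection size, X is uniform among the sets of size ℓ with that intersection; the slide does not spell this out, so treat it as one reading of µ. The function name `sample_mu` is hypothetical.

```python
import random


def sample_mu(n, beta, rng):
    """Sample (X, Y): subsets of [n] of size (n+1)/4 with |X ∩ Y| = 1 w.p. beta, else 0."""
    assert (n + 1) % 4 == 0, "n + 1 must be divisible by 4"
    ell = (n + 1) // 4
    universe = list(range(1, n + 1))
    y = set(rng.sample(universe, ell))
    outside = [e for e in universe if e not in y]
    if rng.random() < beta:
        # Intersecting case: exactly one common element.
        common = rng.choice(sorted(y))
        x = {common} | set(rng.sample(outside, ell - 1))
    else:
        # Disjoint case: X avoids Y entirely.
        x = set(rng.sample(outside, ell))
    return x, y


rng = random.Random(0)
samples = [sample_mu(15, 0.3, rng) for _ in range(200)]
intersecting = sum(len(x & y) for x, y in samples)  # how many pairs intersect
```

Each pair intersects in at most one element, so deciding disjointness amounts to finding a single common element among two sets of size about n/4.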


Next step: Compose k-GAP-MAJ with 2-DISJ

Parameters: $n = \Theta(1/\varepsilon^2)$, $\ell = (n + 1)/4$, $\beta = 1/(k\varepsilon^2)$.

Step 1: Pick Y = y ⊂ [n] of size ℓ uniformly at random.

Step 2: Pick X1, . . . , Xk ⊂ [n] independently and randomly from µ | Y = y.

[Figure: Y shown at the coordinator C; X1, X2, X3, ..., Xk at the sites S1, S2, S3, ..., Sk.]

F0(X1, X2, . . . , Xk)?
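A quick check of these parameter choices, written as a worked calculation. It only uses the definition of k-GAP-MAJ and the parameters on this slide, with Z_i = |X_i ∩ Y| equal to 1 w.p. β, and it ignores the constant hidden in Θ.

```latex
\begin{align*}
\mathbb{E}\Big[\sum_{i \in [k]} Z_i\Big] &= \beta k
  = \frac{1}{k\varepsilon^2} \cdot k = \frac{1}{\varepsilon^2}, \\
\sqrt{\beta k} &= \frac{1}{\varepsilon}, \\
\frac{\sqrt{\beta k}}{\beta k} &= \varepsilon .
\end{align*}
```

So the two thresholds βk ± √(βk) of k-GAP-MAJ sit at relative distance ε from the mean, which is the same scale as the error allowed by a (1 + ε)-approximation.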


The proof

F0(X1, X2, . . . , Xk) ⇐⇒ k-GAP-MAJ(Z1, Z2, . . . , Zk), where Zi = |Xi ∩ Y|

⇐⇒ learn Ω(k) Zi’s well (by Lemma 1)

⇐⇒ need $\Omega(k/\varepsilon^2)$ bits (learning each Zi = |Xi ∩ Y| well needs $\Omega(n) = \Omega(1/\varepsilon^2)$ bits, by 2-DISJ)

[Figure: Y shown at the coordinator C; X1, X2, X3, ..., Xk at the sites S1, S2, S3, ..., Sk. Each Zi = |Xi ∩ Y| is 1 w.p. β and 0 w.p. 1 − β.]

Q.E.D.


Proof sketch of Lemma 1

Lemma 1: If a protocol P computes k-GAP-MAJ correctly w.p. 0.9999, then w.p. Ω(1), for Ω(k) Zi’s, we have $H(Z_i \mid \Pi) \le H_b(0.01\beta)$.

Proof:

1. Suppose Π does not satisfy this.

2. Since the Zi are independent given Π, $\sum_{i=1}^{k} Z_i \mid \Pi$ is a sum of independent Bernoulli random variables.

3. Since most H(Zi | Π) are large, by anti-concentration, both of the following events occur with constant probability:

• $\sum_{i=1}^{k} Z_i \mid \Pi > \beta k + \sqrt{\beta k}$,

• $\sum_{i=1}^{k} Z_i \mid \Pi < \beta k - \sqrt{\beta k}$.

4. So P can’t succeed with large probability.
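The anti-concentration step can be illustrated numerically. The simulation below draws sums of k independent Bernoulli(β) bits and measures how often they land above βk + √(βk) or below βk − √(βk). It is only an illustration: it simulates the unconditioned bits, whereas the proof concerns the bits conditioned on the transcript Π, and the parameter values and the name `tail_frequencies` are assumptions made here.

```python
import math
import random


def tail_frequencies(k, beta, trials, seed):
    """Empirical frequency of sum > beta*k + sqrt(beta*k) and of sum < beta*k - sqrt(beta*k)."""
    rng = random.Random(seed)
    hi = beta * k + math.sqrt(beta * k)
    lo = beta * k - math.sqrt(beta * k)
    above = below = 0
    for _ in range(trials):
        total = sum(1 for _ in range(k) if rng.random() < beta)
        if total > hi:
            above += 1
        elif total < lo:
            below += 1
    return above / trials, below / trials


above, below = tail_frequencies(k=400, beta=0.25, trials=2000, seed=0)
```

Both frequencies stay bounded away from zero, so a protocol that knows little about the individual bits cannot tell with probability close to 1 on which side of the gap the sum falls.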


F2 lower bound


The F2 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute $F_2(\bigcup_{i \in [k]} X_i)$ up to a (1 + ε)-approximation.

What’s the size of the self-join?

[Figure: the column of values 2, 7, 9, 2, 4, 2 joined with a copy of itself.]

Another fundamental problem in data analysis.

Previous UB: $O(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ (Cormode, Muthu, Yi 2008)

Our UB: $O(k/\mathrm{poly}(\varepsilon))$, one-way protocol. Holds in the dynamic case.

Previous LB: $\Omega(k)$ (Cormode, Muthu, Yi, 2008)

Our LB: $\Omega(k/\varepsilon^2)$. Holds in the static and blackboard case.

Almost Tight!
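The link between F2 and the self-join size can be checked on the column of values shown in the figure. The sketch below assumes the usual definition of the p-th frequency moment as the sum of the p-th powers of the item frequencies, with the data treated as a multiset; the function names are hypothetical.

```python
from collections import Counter


def frequency_moment(items, p):
    """F_p = sum over distinct items of (frequency ** p); F_0 counts distinct items."""
    counts = Counter(items)
    if p == 0:
        return len(counts)
    return sum(c ** p for c in counts.values())


def self_join_size(items):
    """Number of pairs (a, b) of rows with equal values, computed by brute force."""
    return sum(1 for a in items for b in items if a == b)


column = [2, 7, 9, 2, 4, 2]  # the values shown on the slide
f0 = frequency_moment(column, 0)  # 4 distinct values
f2 = frequency_moment(column, 2)  # 3^2 + 1 + 1 + 1 = 12
join = self_join_size(column)     # also 12
```

The value 2 occurs three times and contributes 9 matching pairs, while 7, 9 and 4 contribute one each, so both computations give 12.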


A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has $X = (X_1, X_2, \ldots, X_{1/\varepsilon^2})$, Bob has $Y = (Y_1, Y_2, \ldots, Y_{1/\varepsilon^2})$. They want to compute:

$$
\mathrm{GHD}(X, Y) =
\begin{cases}
0, & \text{if } \sum_{i \in [1/\varepsilon^2]} X_i \oplus Y_i \le 1/(2\varepsilon^2) - 1/\varepsilon, \\
1, & \text{if } \sum_{i \in [1/\varepsilon^2]} X_i \oplus Y_i \ge 1/(2\varepsilon^2) + 1/\varepsilon, \\
*, & \text{otherwise,}
\end{cases}
$$

where “∗” means that the answer can be arbitrary.

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. We promise that either the Zi (i = 1, . . . , k) are all disjoint, or they intersect on one element and the rest are all disjoint (sunflower). The goal is to find out which is the case.

[Diagram labels: k-DISJ, “2 copies”, k-XOR, “compose via information cost”, k-BTA.]

CC(k-BTA) = $\Omega(k/\varepsilon^2)$

Finally, we reduce F2 to k-BTA.
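The gap-hamming function defined on this slide can be written down directly. The sketch below takes the vector length n as 1/ε², so that 1/ε = √n; the name `ghd` and the use of `None` for the "∗" case are choices made here, and the code states the definition only, not a protocol.

```python
import math


def ghd(x, y):
    """Gap-hamming on equal-length bit vectors; n = len(x) plays the role of 1/eps^2."""
    n = len(x)
    distance = sum(a ^ b for a, b in zip(x, y))  # Hamming distance
    gap = math.sqrt(n)                            # equals 1/eps
    if distance <= n / 2 - gap:
        return 0
    if distance >= n / 2 + gap:
        return 1
    return None  # inside the gap: any answer is acceptable


# Example with n = 16 (eps = 1/4): thresholds are 8 - 4 = 4 and 8 + 4 = 12.
zeros = [0] * 16
near = ghd(zeros, [1] * 4 + [0] * 12)
far = ghd(zeros, [1] * 12 + [0] * 4)
unsure = ghd(zeros, [1] * 8 + [0] * 8)
```

As with k-GAP-MAJ, the promise gap is about one standard deviation of the Hamming distance between two uniformly random vectors.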


The end

THANK YOU

Q and A