Testing Collections of Properties. Reut Levi, Dana Ron, Ronitt Rubinfeld. ICS 2011


Testing Collections of Properties

Reut Levi, Dana Ron, Ronitt Rubinfeld

ICS 2011

Shopping distribution

What properties do your distributions have?

Testing closeness of two distributions:

Transactions in California vs. transactions in New York: trend change?

Testing independence:

Shopping patterns: independent of zip code?

This work: Many distributions

One distribution:

D is an arbitrary black-box distribution over [n] that generates i.i.d. samples.

Sample complexity in terms of n? (can it be sublinear?)

[Diagram: D → samples → Test → Pass/Fail?]

Uniformity: $n^{1/2}$ [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08]

Identity: $n^{1/2}$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01]

Closeness: $n^{2/3}$ [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08]

Independence: $O(n_1^{2/3} n_2^{1/3})$, $\Omega(n_1^{2/3} n_2^{1/3})$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01], this work

Entropy: $n^{1/\beta^2 + o(1)}$ [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08]

Support size: $n/\log n$ [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10]

Monotonicity on total order: $n^{1/2}$ [Batu, Kumar, Rubinfeld 04]

Monotonicity on poset: $n^{1-o(1)}$ [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]

Some answers…
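To make the one-distribution setting concrete, here is a minimal sketch of a collision-based uniformity check, in the spirit of the uniformity results listed above but not the exact tester from any of the cited papers. It relies on the fact that the collision probability $\sum_x D(x)^2$ is at least $1/n$, with equality only for the uniform distribution. The names `collision_rate` and `looks_uniform` and the `slack` threshold are assumptions made for illustration.

```python
import random

def collision_rate(samples):
    """Fraction of unordered sample pairs that hit the same element."""
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    s = len(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (s * (s - 1) // 2)

def looks_uniform(draw, n, num_samples, slack=0.5):
    """Pass if the empirical collision rate is at most (1 + slack) / n.

    draw: black-box sampler returning one element of range(n).
    slack: hypothetical threshold parameter, chosen here only for illustration.
    """
    samples = [draw() for _ in range(num_samples)]
    return collision_rate(samples) <= (1 + slack) / n

# Usage: 2000 samples from a domain of size 10,000 (far fewer than n).
rng = random.Random(0)
n = 10_000
uniform = lambda: rng.randrange(n)
skewed = lambda: rng.randrange(n // 10)  # uniform on a tenth of the domain
print(looks_uniform(uniform, n, 2000), looks_uniform(skewed, n, 2000))
```

The check never estimates the probability of any single element; it only counts collisions, which is why a number of samples well below n can suffice.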

Collection of distributions:

Two models:

Sampling model: get (i, x) for random i, x ∼ D_i

Query model: get (i, x) for query i and x ∼ D_i
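The two access models can be sketched as two methods on one object. This is an illustrative sketch only: the class name `Collection`, the 0-based indexing of distributions and domain elements, and the default of uniform weights on i are assumptions, not part of the talk.

```python
import random

class Collection:
    """A collection of m distributions, each a probability vector over range(n)."""

    def __init__(self, dists, weights=None, seed=None):
        self.dists = dists
        m = len(dists)
        # Distribution over the indices i; uniform unless specified (assumption).
        self.weights = weights if weights is not None else [1.0 / m] * m
        self.rng = random.Random(seed)

    def _draw(self, i):
        # Draw x according to D_i.
        n = len(self.dists[i])
        return self.rng.choices(range(n), weights=self.dists[i])[0]

    def sample(self):
        """Sampling model: i is drawn at random, then x from D_i."""
        i = self.rng.choices(range(len(self.dists)), weights=self.weights)[0]
        return i, self._draw(i)

    def query(self, i):
        """Query model: the tester chooses i, then x is drawn from D_i."""
        return i, self._draw(i)
```

In the sampling model the tester has no control over which distribution it sees, which is where the question of known versus unknown weights on the i's comes from.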

Sample complexity in terms of n,m?

[Diagram: D_1, D_2, …, D_m → samples → Test → Pass/Fail?]

Further refinement: Known or unknown distribution on i’s?

Properties considered:

Equivalence: all distributions are equal

"Clusterability": distributions can be clustered into k clusters such that within a cluster, all distributions are close

Equivalence vs. independence

Process of drawing pairs: draw i ∈ [m], x ∼ D_i, output (i, x)

Easy fact: (i, x) are independent iff the D_i's are equal
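The easy fact can be checked in a few lines. The sketch below assumes that i is drawn with probability $w_i > 0$ for every i (the symbol $w_i$ and the positivity assumption are introduced here for the derivation) and writes $\bar{D}$ for the marginal distribution of x.

```latex
\Pr[(i,x)] = w_i \, D_i(x),
\qquad
\bar{D}(x) = \sum_{j=1}^{m} w_j \, D_j(x).

% Independence means the joint probability factors into the two marginals:
w_i \, D_i(x) = w_i \, \bar{D}(x) \quad \text{for all } i, x
\;\Longleftrightarrow\;
D_i = \bar{D} \quad \text{for all } i .

% If every D_i equals \bar{D}, all D_i are equal. Conversely, if all D_i equal
% some common D, then \bar{D} = \sum_j w_j D = D, so the factorization holds.
```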

Results

Def: (D_1, …, D_m) has the Equivalence property if D_i = D_{i'} for all 1 ≤ i, i' ≤ m.

Case     Lower bound                   Upper bound

n > m    $\Omega(n^{2/3} m^{1/3})$     $\tilde{O}(n^{2/3} m^{1/3})$ (unknown weights)

m > n    $\Omega(n^{1/2} m^{1/2})$     $\tilde{O}(n^{1/2} m^{1/2})$ (known weights)

Also yields “tight” lower bound for independence testing
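To get a feel for the two regimes, the leading terms of the bounds can be evaluated directly. This is only arithmetic on the expressions stated above: the function name `equivalence_leading_term` is hypothetical, and constants, polylogarithmic factors, the dependence on the distance parameter and the known/unknown-weights distinction are all ignored.

```python
def equivalence_leading_term(n, m):
    """Leading term of the stated equivalence-testing bounds.

    n: domain size, m: number of distributions.
    Ignores constants and polylog factors (assumption for illustration).
    """
    if n > m:
        return n ** (2 / 3) * m ** (1 / 3)
    return (n * m) ** 0.5

# Compare against n * m, the total number of probabilities in the collection.
for n, m in [(1_000_000, 100), (100, 1_000_000)]:
    print(n, m, round(equivalence_leading_term(n, m)), n * m)
```

At n = m the two expressions coincide (both equal n), so the two regimes meet continuously.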

Clusterability

Can we cluster distributions s.t. in each cluster, distributions are (very) close?

Sample complexity of test is $O(k n^{2/3})$ for n = domain size, k = number of clusters

No dependence on number of distributions

Closeness requirement is very stringent

Open Questions

• Clusterability in the sampling model, less stringent notion of close

• Other properties of collections?

• E.g., all distributions are shifts of each other?

Thank you
