A Tutorial on Property Testing Dana Ron Tel Aviv University

Post on 22-Dec-2015

A Tutorial on Property Testing

Dana Ron

Tel Aviv University

Property Testing (Informal Definition)

For a fixed property P and any object O, determine whether O has property P, or whether O is far from having property P (i.e., far from every object that has P).

Task should be performed by querying the object (in as few places as possible).

Examples

• The object can be a graph (represented by its adjacency matrix), and the property can be 3-colorability.

• The object can be a string and the property can be membership in a given regular language L.

• The object can be a function and the property can be linearity.

Context

Property testing can be viewed as:

• A relaxation of exactly deciding whether the object has the property.

• A relaxation of learning the object.

In either case, want testing algorithm to be significantly more efficient than decision/learning algorithm.

When can Property Testing be Useful?

• Object is too large to even fully scan, so must make approximate decision.

• Object is not too large, but (1) exact decision is NP-hard (e.g., coloring), or (2) prefer sub-linear approximate algorithm to polynomial exact algorithm.

• Use testing as preliminary step to exact decision or learning. In the first case, can quickly rule out objects far from the property. In the second case, can aid in efficiently selecting a good hypothesis class.

Property Testing - Background

• Initially defined by Rubinfeld and Sudan in the context of Program Testing (of algebraic functions).

• Goldreich, Goldwasser, and Ron initiated the study of testing properties of graphs.

• Growing body of work deals with properties of functions, graphs, strings, sets of points ...

Many algorithms with complexity that is sub-linear in (or even independent of) size of object.

Talk Organization

Will discuss four topics:

• Testing Algebraic Properties of Functions: Linearity Testing [BLR]

• Testing “Basic” (non-algebraic) Properties of Functions: Singletons, Monomials, small DNF [PRS]

• Testing Graph Properties: Testing Bipartiteness [GGR]

• Testing Properties of strings: Testing Membership in Regular Languages [AKNS]

Testing Algebraic Properties of Functions: Linearity Testing [BLR]

Linearity Testing

Def1: Let F be a finite field. A function f : F^m → F is called linear (multi-linear) if there exist constants a_1,…,a_m ∈ F s.t. for every x = x_1,…,x_m ∈ F^m it holds that

f(x) = Σ_{i=1..m} a_i x_i .

Fact: A function f : F^m → F is linear iff for every x,y ∈ F^m it holds that f(x)+f(y) = f(x+y).

Def2: A function f is said to be ε-far from linear if for every linear function g, dist(f,g) > ε, where dist(f,g) = Pr[f(x) ≠ g(x)] (x selected uniformly in F^m).

Linearity Testing Cont’

Linearity Test (Input: F, m, ε)

1) Uniformly and independently select Θ(1/ε) pairs of elements x,y ∈ F^m.

2) For every pair x,y selected, verify that f(x)+f(y) = f(x+y).

3) If for any of the pairs selected linearity is violated (i.e., f(x)+f(y) ≠ f(x+y)), then REJECT, otherwise ACCEPT.

Observe: If f is linear then the test accepts w.p. 1.

Theorem: If f is ε-far from linear then with probability at least 2/3 the test rejects it.
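The three steps of the test can be sketched in a few lines. This is a minimal illustration, not the slides' own code: it assumes F = GF(2), encodes a point of F^m as an m-bit Python integer (so addition is XOR), and picks the constant 2 inside Θ(1/ε) as an assumption. The name `blr_linearity_test` is hypothetical.

```python
import math
import random


def blr_linearity_test(f, m, eps, rng=random):
    """Return True (ACCEPT) iff no sampled pair violates f(x) + f(y) = f(x + y).

    f maps an m-bit integer (a vector in GF(2)^m) to 0 or 1.
    """
    num_pairs = math.ceil(2 / eps)  # assumed constant inside Theta(1/eps)
    for _ in range(num_pairs):
        x = rng.getrandbits(m)
        y = rng.getrandbits(m)
        # Over GF(2), both field addition and vector addition are XOR.
        if f(x) ^ f(y) != f(x ^ y):
            return False  # violating pair found: REJECT
    return True  # no violation seen: ACCEPT
```

A linear function (a parity of a fixed subset of bits) is always accepted, matching the one-sided error stated in the observation above.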

Linearity Testing Cont’

Proof (of special case): Let ε(f) denote distance of f to closest linear function g. Assume 1/2 − ε(f) is constant. Let G = {x : f(x) = g(x)} (so that Pr[x ∉ G] = ε(f) > ε).

Say that x and y are a violating pair if f(x)+f(y) ≠ f(x+y). Observation: for any x, y, if among the 3 elements x, y, x+y we have 2 in G and 1 not in G, then x,y are a violating pair.

Consider one of the 3 (disjoint) events. Can show: Pr[x ∈ G, y ∈ G, (x+y) ∉ G] ≥ ε(f)(1 − 2ε(f)).

Since events are disjoint, prob of violating pair is at least 3ε(f)(1 − 2ε(f)) = 6ε(f)(1/2 − ε(f)) = Ω(ε).

Since test takes Θ(1/ε) pairs x,y, will reject w.h.p.
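One way to fill in the "can show" step and the final rejection bound is sketched below. It relies only on the fact that for independent uniform x and y, the element x+y is uniform and independent of x (and of y). The symbols p, k, c and γ are introduced here for illustration: p is the probability that a random pair is violating, k = c/ε is the number of pairs, and γ = 1/2 − ε(f) is the assumed constant.

```latex
\begin{align*}
\Pr[x \in G,\ y \in G,\ x+y \notin G]
  &\ge \Pr[x+y \notin G] - \Pr[x \notin G,\ x+y \notin G] - \Pr[y \notin G,\ x+y \notin G] \\
  &= \epsilon(f) - 2\,\epsilon(f)^2 \;=\; \epsilon(f)\,\bigl(1 - 2\epsilon(f)\bigr), \\[4pt]
p &\ge 6\,\epsilon(f)\,\gamma \;>\; 6\,\gamma\,\epsilon, \\[4pt]
\Pr[\text{test accepts}] &\le (1-p)^{k} \;\le\; e^{-pk} \;\le\; e^{-6\gamma c}.
\end{align*}
```

Under these assumptions the acceptance probability drops below 1/3 as soon as c ≥ ln 3 / (6γ).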

Linearity Testing Cont’

How do we deal with the general case (where ε(f) is not necessarily bounded away from 1/2)?

In order to prove that if ε(f) > ε then reject w.p. ≥ 2/3, prove contrapositive: if accept w.p. > 1/3 (i.e., small fraction of violating pairs) then f is ε-close to linear. That is, there exists a linear g s.t. dist(f,g) ≤ ε.

Specifically, define g as follows (for F = GF(2)):

g(x) = 1 if Pr_y[f(x+y) − f(y) = 1] ≥ 1/2

g(x) = 0 if Pr_y[f(x+y) − f(y) = 0] > 1/2

Can prove that if fraction of violating pairs (w.r.t. f) is sufficiently small then f is close to g and g is linear.

Note: definition of g allows for Self-Correcting of f (for every x can determine g(x) w.h.p by few queries to f).
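The self-correcting idea in the note can be sketched as a majority vote. This is an illustrative version under stated assumptions: F = GF(2), inputs are m-bit integers, and the number of random shifts (`trials`, default 25) is an arbitrary choice, not a value from the slides. The name `self_correct` is hypothetical.

```python
import random


def self_correct(f, x, m, trials=25, rng=random):
    """Estimate g(x) as the majority value of f(x + y) - f(y) over random y.

    Over GF(2), subtraction and addition are both XOR.
    """
    ones = 0
    for _ in range(trials):
        y = rng.getrandbits(m)
        ones += f(x ^ y) ^ f(y)
    # Ties go to 1, mirroring the ">= 1/2" case in the definition of g.
    return 1 if 2 * ones >= trials else 0
```

If f agrees with a linear function on all but a small fraction of inputs, each vote is wrong only when y or x+y hits a corrupted input, so the majority recovers the linear function's value at x even when f(x) itself is corrupted.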

Testing “Basic” Properties of Functions: Singletons, Monomials, small DNF [PRS]

Testing “Basic” Properties of Functions:

This work considers “The most basic” function classes:

• Singletons: f(x) = x_i

• Monomials: f(x) = x_i ∧ x_j ∧ x_k

• DNF: f(x) = (x_i ∧ x_j) ∨ (x_k ∧ x_m)

Testing “Basic” Properties of Functions Cont’

• Can test whether f is a singleton using O(1/ε) queries.

• Can test whether f is a monomial using O(1/ε) queries.

• Can test whether f is a monotone DNF with at most t terms using Õ(t²/ε) queries.

Common theme: no dependence in query complexity on size of input, n, and polynomial dependence on distance parameter, ε.

Learning Boolean Formulae

Basic observation: (proper) learning implies testing.

Main difference w.r.t testing results: no dependence on n and different algorithmic approach.

• Can learn singletons and monomials under uniform distribution using O(log n/ε) queries [BEHW].

• Can properly learn monotone DNF with t terms and r literals using Õ(r log² n/ε + t(r + 1/ε)) queries [A+BJT].

[Figure: function class F, function f, and hypothesis h.]

Testing (Monotone) Singletons

Singletons satisfy:

(1) Pr[f(x) = 1] = 1/2

(2) f(x ∧ y) = f(x) ∧ f(y) ∀ x,y

Natural test: check, by sampling, that conditions hold (approximately).

Can analyze natural test for case that distance between function and class of singletons is not too big (bounded away from 1/2).

Testing Singletons II - Parity Testing

Observation: Singletons are a special case of parity functions (i.e., functions of the form g(x) = ⊕_{i∈S} x_i).

Claim: Let g(x) = ⊕_{i∈S} x_i. If |S| ≥ 2 then Pr[g(x ∧ y) ≠ g(x) ∧ g(y)] ≥ 1/4.

Modified algorithm:

(1) Test whether f is a parity function (with dist. par. ε) using algorithm of [BLR].

(2) Uniformly select constant number of pairs x,y and check whether any is a violating pair (i.e., f(x ∧ y) ≠ f(x) ∧ f(y)).
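The claim on this slide can be checked exhaustively for small parities. The sketch below assumes, without loss of generality, that S is the full set of s coordinates (coordinates outside S do not affect g), and counts the pairs (x, y) on which g(x ∧ y) differs from g(x) ∧ g(y). The function names are hypothetical.

```python
from fractions import Fraction


def parity(x):
    """Parity (XOR) of all bits of the non-negative integer x."""
    return bin(x).count("1") % 2


def violation_probability(s):
    """Exact Pr over uniform x, y in {0,1}^s that g(x AND y) != g(x) AND g(y),
    where g is the parity of all s bits."""
    total = 1 << s
    bad = sum(
        1
        for x in range(total)
        for y in range(total)
        if parity(x & y) != (parity(x) & parity(y))
    )
    return Fraction(bad, total * total)
```

For s = 1 (a singleton) the probability is 0, and for s = 2 and s = 3 it is 3/8, consistent with the 1/4 lower bound in the claim.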

Testing Singletons III - Self Correcting

Use Self-Corrector of [BLR] to “fix” f into parity function (g), and then test violations on self-corrected version.

This “almost works”: If f is singleton - always accepted. If f is ε-far from parity - rejected w.h.p. But if f is ε-close to parity function g, then cannot simply apply claim to argue that there are many violating pairs w.r.t. f.

If we could only test violations w.r.t. g instead of f ...

Testing Singletons IV - The Algorithm

Final Algorithm for Testing Singletons:

(1) Test whether f is a parity function with dist. par. ε using algorithm of [BLR].

(2) Uniformly select constant number of pairs x,y. Verify that Self-Cor(f,x) ∧ Self-Cor(f,y) = Self-Cor(f, x ∧ y).

(3) Verify that Self-Cor(f, 1^n) = 1, where 1^n is the all-ones input.
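The three stages of the final algorithm can be put together as follows. This is a sketch under stated assumptions, not the [PRS] implementation: inputs are n-bit integers, the parity test uses ⌈2/ε⌉ pairs, and the "constant number of pairs" and the number of self-correction votes (`pairs`, `trials`) are arbitrary illustrative defaults. The name `singleton_tester` is hypothetical.

```python
import math
import random


def singleton_tester(f, n, eps, pairs=40, trials=25, rng=random):
    """Return True (accept) or False (reject) for f: n-bit int -> {0, 1}."""

    def self_cor(x):
        # Majority vote of f(x XOR y) XOR f(y) over random y ([BLR] self-corrector).
        ones = sum(f(x ^ y) ^ f(y) for y in (rng.getrandbits(n) for _ in range(trials)))
        return 1 if 2 * ones >= trials else 0

    # (1) Parity test of [BLR]; the constant 2 in 2/eps is an assumption.
    for _ in range(math.ceil(2 / eps)):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False

    # (2) Look for violating pairs on the self-corrected function.
    for _ in range(pairs):
        x, y = rng.getrandbits(n), rng.getrandbits(n)
        if (self_cor(x) & self_cor(y)) != self_cor(x & y):
            return False

    # (3) The self-corrected function must map the all-ones input to 1.
    return self_cor((1 << n) - 1) == 1
```

Stage (3) is what rules out the all-zero function, which is a parity (of the empty set) and has no violating pairs.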

Testing Monomials and Monotone DNF

Monomial testing algorithm has similar structure to Singleton testing algorithm. (Here too it suffices to find a test for monotone monomials.)

The first stage of linearity testing is replaced by Affinity Testing: if f is a monomial then F1 = {x : f(x) = 1} is an affine subspace. [Fact: H is an affine subspace iff ∀ x,y,z ∈ H, x⊕y⊕z ∈ H.] Affinity test is similar to parity test: select x,y ∈ F1, z ∈ {0,1}^n, verify that f(x⊕y⊕z) = f(x)⊕f(y)⊕f(z).

The second stage is as in singleton test (check for violating pairs). Here affinity adds structure that helps analyze second stage.

Testing monotone DNF: use monomial test as sub-routine (a monotone DNF function is a disjunction of monotone monomials).

Testing Graph Properties [GGR]

Testing Graph Properties

Assume graphs are represented by their adjacency matrix. In this model, testing algorithm can perform queries: “is there an edge between u and v”. Distance between graphs: fraction of entries in adjacency matrix on which they differ. This model is most appropriate for testing dense graphs.

[Figure: adjacency matrix in which the entry for the pair (u,v) is 1.]

Results for Testing Graph Properties

In Adjacency-Matrix model

• Can test: Bipartiteness, k-colorability, ρ-Clique, ρ-Cut and a more general family of partition problems, with sample complexity poly(1/ε) and running time exp(poly(1/ε)), both independent of size of graph [GGR].

• Can test all properties that can be formulated by first order expression about graphs with sample and time complexity independent of graph size (but at “steep” cost as function of 1/ε) [AFKS].

• In directed graphs can test acyclicity with sample and time complexity poly(1/ε) [BR] (special case treated in [EKKRV]).

In Incidence-Lists model

• Connectivity, k-edge-connectivity: complexity poly(1/ε) [GR1]; Bipartiteness: poly(1/ε)·|V|^{1/2} [GR2]; Diameter: poly(1/ε) [PR].

Testing Bipartiteness

Def: Graph G=(V,E) is bipartite iff its vertices can be partitioned into two subsets V1 and V2 s.t. there are no edges between vertices that are both in V1 or both in V2.

Recall that can decide whether graph is bipartite in time O(|V|+|E|) by Breadth First Search (BFS). However, we want very fast approximate decision. Furthermore, can extend algorithm and analysis to testing k-colorability (which is NP-hard for k ≥ 3).

Testing Bipartiteness Cont’

Bipartite Testing Algorithm

• Uniformly and independently select m = Θ(log(1/ε)/ε²) vertices in graph.

• For every pair of vertices selected query whether there is an edge between the two, obtaining induced sub-graph.

• Perform a BFS to determine whether induced subgraph is bipartite. If it is output accept, o.w. output reject.

Query complexity and running time of algorithm: O(log²(1/ε)/ε⁴). Slight variant of alg yields O(log²(1/ε)/ε³), and [AK] have reduced this to O(log²(1/ε)/ε²).

Correctness: If graph is bipartite then clearly always accepted. From this point on assume graph is ε-far from bipartite. Will show that rejected w.p. at least 2/3.
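The sampling algorithm above can be sketched as follows. The graph is accessed only through an edge-query function, matching the adjacency-matrix model. The names `bipartite_tester` and `has_edge` are hypothetical, and the constant 1 inside Θ(log(1/ε)/ε²) is an assumption made for illustration.

```python
import math
import random
from collections import deque


def bipartite_tester(num_vertices, has_edge, eps, rng=random):
    """Accept (True) iff the subgraph induced by a random vertex sample is bipartite.

    has_edge(u, v) answers the query "is there an edge between u and v".
    """
    m = math.ceil(math.log(1 / eps) / eps ** 2)  # assumed constant 1 in Theta(.)
    # Independent uniform draws; repeated vertices are merged.
    sample = list({rng.randrange(num_vertices) for _ in range(m)})

    # Query every pair in the sample to obtain the induced subgraph.
    adj = {v: [] for v in sample}
    for i, u in enumerate(sample):
        for v in sample[i + 1:]:
            if has_edge(u, v):
                adj[u].append(v)
                adj[v].append(u)

    # BFS 2-coloring of each connected component of the induced subgraph.
    color = {}
    for start in sample:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle in the sample: reject
    return True
```

A bipartite graph is always accepted, since every induced subgraph of a bipartite graph is bipartite.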

Analysis of Bipartiteness Testing Alg

Def: Let X be a subset of vertices, and (X1,X2) a partition of X. Say that an edge (u,v) is violating w.r.t. (X1,X2) if either both u,v are in X1 or both are in X2. If there are no violating edges w.r.t. (X1,X2) then say it is a bipartite partition.

View sample as consisting of two parts: U and S. Show that w.h.p., for every partition (U1,U2) of U there is no partition (S1,S2) of S, s.t. (U1∪S1, U2∪S2) is bipartite. In other words, the sub-graph induced by sample U∪S is not bipartite.

[Figure: a partition (X1,X2) with a violating edge (u,v); the sample split into U = (U1,U2) and S.]

Analysis of Bipartiteness Testing Alg Cont’

Def1: A vertex v is influential if it has degree at least (ε/4)|V|.

Def2: A vertex v is covered by subset U if it has a neighbor in U.

Lem: W.h.p. U covers all but at most (ε/4)|V| of the influential vertices.

[Figure: vertex set V and sample U, showing influential, non-influential, and uncovered influential vertices.]

Analysis of Bipartiteness Testing Alg Cont’

Let C be vertices covered by U and let R be remaining vertices.

Observe: Since R contains at most all non-influential vertices, and at most (ε/4)|V| influential ones, total num of edges incident to R is at most (ε/2)|V|².

Recall, graph G is ε-far from bipartite: every partition (V1,V2) of V has > ε|V|² violating edges.

Together, above two imply that every partition of U∪C has > (ε/2)|V|² violating edges.

[Figure: sample U, covered vertices C, and remaining vertices R (non-influential and uncovered influential).]
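The two counts on this slide can be spelled out as below. This is a reconstruction of the arithmetic, assuming each non-influential vertex has degree below (ε/4)|V| and each vertex has degree at most |V|; the second line counts violating edges with no endpoint in R for an arbitrary partition of V.

```latex
\begin{align*}
\#\{\text{edges incident to } R\}
  &\le \underbrace{|V| \cdot \tfrac{\epsilon}{4}|V|}_{\text{non-influential vertices}}
     + \underbrace{\tfrac{\epsilon}{4}|V| \cdot |V|}_{\text{uncovered influential vertices}}
   \;=\; \tfrac{\epsilon}{2}|V|^2, \\[4pt]
\#\{\text{violating edges not incident to } R\}
  &> \epsilon |V|^2 - \tfrac{\epsilon}{2}|V|^2 \;=\; \tfrac{\epsilon}{2}|V|^2 .
\end{align*}
```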

Analysis of Bipartiteness Testing Alg Cont’

Consider fixed partition (U1,U2) of U, and let (C1,C2) be partition of C where neighbors of vertices in U1 are put in C2 and neighbors of vertices in U2 are put in C1.

Since (U1∪C1, U2∪C2) contains > (ε/2)|V|² violating edges, this many pairs of vertices (v,w) in C1 (C2) have violating edge between them.

If get such pair (v,w) in sample S, then for every partition (S1,S2), partition (U1∪S1, U2∪S2) contains some violating edge.

Since many such pairs, the sample S contains such a pair w.h.p. By union bound on number of partitions (U1,U2) (at most 2^|U| = exp(log(1/ε)/ε)), S contains such a pair for every (U1,U2).

[Figure: partition (U1,U2) of U with the induced partition (C1,C2) of C, and a violating pair (v,w) on the same side.]

Testing Other Graph (Partition) Properties

Each property (k-colorability, ρ-Clique, ρ-Cut) has its own “particularities” but in all cases:

• “Natural algorithm” (take small uniform sub-sample and check induced subgraph for property) works.

• Analysis works by breaking sample into two parts: the first part, U, “forces” constraints on possible partitions of all vertices. Second part, S, “tests” whether constraints are satisfied.

More general results of [AFKS] (combination of partition and forbidden subgraph properties (∃∀ properties)) also analyze natural algorithm. Analysis builds on Szemerédi’s regularity lemma.

Testing Properties of Strings: Membership in Regular Languages [AKNS]

Testing Membership in Regular Languages

For fixed regular language L ⊆ {0,1}*, testing algorithm should accept w.h.p. every word w ∈ L, and should reject w.h.p. every word w that differs on more than εn bits (n=|w|) from every w’ ∈ L (|w’|=n). Algorithm can query any bit w_i of w.

Let M=(Q,F,q0,δ) be the (minimum) DFA that accepts L. Let G(M) denote directed graph induced by M (that is, there is a directed edge for every transition).

Def: Let u = w_i…w_j be sub-word of w that starts at position i. Say that u is feasible w.r.t. M starting from i if there exists a state q s.t. q can be reached in G(M) from q0 in exactly i−1 steps, and there is a path of length (n−(|u|+i−1)) in G(M) from q’ = δ(q,u) to an accepting state qf.

q0 --(i−1 steps)--> q --(u)--> q’ --(n−(|u|+i−1) steps)--> qf
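The feasibility definition translates almost literally into code. The sketch below assumes the DFA is given as a dictionary `delta` mapping (state, symbol) to the next state; that encoding and all function and parameter names are illustrative choices, not part of [AKNS]. It computes the states reachable from q0 in exactly i−1 steps, the states that reach an accepting state in exactly n−(|u|+i−1) steps, and checks whether reading u connects the two sets.

```python
def step_forward(states, delta, alphabet):
    """States reachable in exactly one more step from the given set."""
    return {delta[(q, a)] for q in states for a in alphabet}


def step_backward(states, delta, all_states, alphabet):
    """States with some transition leading into the given set."""
    return {q for q in all_states for a in alphabet if delta[(q, a)] in states}


def is_feasible(u, i, n, delta, all_states, alphabet, q0, accepting):
    """u is a sub-word starting at 1-based position i of a word of length n."""
    start = {q0}
    for _ in range(i - 1):
        start = step_forward(start, delta, alphabet)

    good_end = set(accepting)
    for _ in range(n - (len(u) + i - 1)):
        good_end = step_backward(good_end, delta, all_states, alphabet)

    for q in start:
        for symbol in u:
            q = delta[(q, symbol)]  # q' = delta(q, u), one symbol at a time
        if q in good_end:
            return True
    return False
```

For example, with a three-state DFA for "no two consecutive 1s" (state 2 being a dead state), the sub-word "11" is infeasible at every position, while "101" is feasible.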

Testing Regular Language Cont’

Consider special case:

• Unique accepting state qf;

• Q can be partitioned into two parts, C and D:
 - q0, qf ∈ C;
 - subgraph G(C) is strongly connected;
 - no edges from D to C;
 - the GCD of cycle-lengths in G(C) is 1 ⇒ there exists a constant r (= O(|Q|²)) s.t. ∀ q,q’ ∈ C, ∀ m ≥ r, there exists a path of length m from q to q’.

[Figure: the automaton split into parts C and D, with q0, qf, q and q’ in C.]

Testing Regular Language Cont’

The Algorithm (simplified version):

• Uniformly and independently select Θ(r/ε) indices 1 ≤ i ≤ n.

• For each i selected, check that the substring w_i … w_{i+r/ε} is feasible.

• If any substring is infeasible then reject, otherwise accept.

Number of queries: O(r²/ε) = poly(|Q|)/ε and running time poly(|Q|)/ε (can improve to almost linear dependence on 1/ε).

Correctness: If w ∈ L, then always accept. If w is ε-far from L, would like to show that w contains many (short) infeasible substrings (causing rejection w.h.p.).
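The simplified algorithm can be sketched as below. Several choices here are assumptions for illustration only: the DFA is a dictionary `delta` from (state, symbol) to state, the constant r is passed in by the caller rather than derived from the automaton, both the number of sampled indices and the window length are set to ⌈r/ε⌉, and the reachability tables are precomputed from the automaton alone (they cost no queries to w). The function name is hypothetical.

```python
import math
import random


def regular_language_tester(w, delta, states, alphabet, q0, accepting, eps, r, rng=random):
    """Accept (True) iff every sampled window of w is feasible w.r.t. the DFA."""
    n = len(w)
    # fwd[k]: states reachable from q0 in exactly k steps.
    # bwd[k]: states from which an accepting state is reachable in exactly k steps.
    fwd = [{q0}]
    bwd = [set(accepting)]
    for _ in range(n):
        fwd.append({delta[(q, a)] for q in fwd[-1] for a in alphabet})
        bwd.append({q for q in states for a in alphabet if delta[(q, a)] in bwd[-1]})

    window = math.ceil(r / eps)
    for _ in range(math.ceil(r / eps)):
        i = rng.randrange(n)        # 0-based start, i.e. position i+1 in the slides
        u = w[i:i + window + 1]     # the only symbols of w that are queried
        ends = fwd[i]
        for symbol in u:
            ends = {delta[(q, symbol)] for q in ends}
        if not (ends & bwd[n - i - len(u)]):
            return False            # infeasible substring found: reject
    return True
```

A word in L is always accepted, because the accepting run of the DFA on w is itself a witness that every substring of w is feasible.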

Testing Regular Language Cont’

Prove contrapositive statement: If number of (short) infeasible substrings in w is small then w is close to some w* ∈ L.

Proof idea: partition w (except first and last r symbols) into disjoint maximal feasible substrings u_1, … ,u_h: each u_j is feasible, but addition of next symbol w_k makes it infeasible.

By slightly modifying each u_j, can “glue” the modified substrings together into one string w* that “does not leave C”, and reaches qf. If h is small (as assumed), then w* is close to w.

[Figure: parts C and D of the automaton; substring u_j leading from q_j to q_j’, the next symbol w_k, and substring u_{j+1} leading from q_{j+1} to q’_{j+1}.]

Testing Regular Language Cont’

General case works by reducing to special case we discussed. In particular need to decompose G(M) into its strongly connected components, and consider how a word “moves between them”.

This work has been extended by Newman to testing Branching Programs of bounded width, and by Kupferman and XX to testing Tree Automata.

Directions for Further Research

“Biggest” open problem: Can we characterize what properties are efficiently testable? (e.g., find a measure analogous to VC - dimension.)

Find families of properties that are efficiently testable. Some such results exist for testing graph properties (e.g., partition problems), and we have the regular languages result.

Extend scope of property testing.

Testing Properties of Collections of Points: Testing of Clustering

Property Testing - Background

Properties of functions:

• Initially defined by Rubinfeld and Sudan in the context of Program Testing. Tested algebraic properties of functions: low-degree polynomials.

• Other work on testing algebraic properties: [BLR,R,EKKRV...].

• Non-algebraic properties: Monotonicity [GGLRS,DGLRSS,B,FN].

Properties of other objects:

• Main focus: Graph properties: [GGR,GR,AK,AFKS,BR,PR,CS...]

• Growing body of work deals with properties of strings [AKNS,N,PRR], sets of points [PR], geometric objects [CSZ], distributions [BFRW], and more.

All algorithms have complexity that is sub-linear in (or even independent of) size of object.