# distributed (local) monotonicity reconstruction

Post on 24-Feb-2016


DESCRIPTION

Distributed (Local) Monotonicity Reconstruction. Michael Saks, Rutgers University; C. Seshadhri, Princeton University (now IBM Almaden). Overview: introduces a new class of algorithmic problems, Distributed Property Reconstruction (extending the framework of program self-correction, robust property testing, and locally decodable codes), and a solution for the property Monotonicity.

TRANSCRIPT


**Slide 1. Distributed (Local) Monotonicity Reconstruction**
Michael Saks, Rutgers University
C. Seshadhri, Princeton University (now IBM Almaden)

**Slide 2. Overview**
- Introduce a new class of algorithmic problems: Distributed Property Reconstruction (extending the framework of program self-correction, robust property testing, and locally decodable codes).
- A solution for the property Monotonicity.

**Slide 3. Data Sets**
- Data set = function f : Σ → V, where Σ is a finite index set and V a value set.
- In this talk, Σ = [n]^d = {1,…,n}^d and V = nonnegative integers.
- f is a d-dimensional array of nonnegative integers.

**Slide 4. Distance between two data sets**
For f, g with a common domain Σ:
dist(f, g) = fraction of the domain where f(x) ≠ g(x)

**Slide 5. Properties of data sets**
Focus of this talk:
- Monotone: nondecreasing along every line (order preserving).
- When d = 1, monotone = sorted.

**Slide 6. Some algorithmic problems for P**
Given data set f (as input):
- Recognition: does f satisfy P?
- Robust testing: define ε(f) = min{ dist(f, g) : g satisfies P }. For some 0 ≤ ε1 < ε2 < 1, output either "ε(f) > ε1" (f is far from P) or "ε(f) < ε2" (f is close to P). (If ε1 < ε(f) ≤ ε2, then it can decide either.)

**Slide 7. Property Reconstruction**
Setting: given f, we expect f to satisfy P (e.g., we run algorithms on f that assume P), but f may not satisfy P.

Reconstruction problem for P: given data set f, produce a data set g that
- satisfies P, and
- is close to f: dist(f, g) is not much bigger than ε(f).
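For d = 1, the two quantities above can be computed exactly on small arrays. A minimal Python sketch (function names are mine, not from the talk); `eps` uses the standard fact that a cheapest monotone correction of a 1-d array keeps a longest nondecreasing subsequence and overwrites the remaining positions:

```python
from bisect import bisect_right

def dist(f, g):
    """dist(f, g): fraction of the domain where f and g differ (slide 4)."""
    assert len(f) == len(g)
    return sum(a != b for a, b in zip(f, g)) / len(f)

def eps(f):
    """eps(f) = min dist(f, g) over monotone g. For d = 1 this equals
    (n - L) / n, where L is the length of a longest nondecreasing
    subsequence: the kept subsequence can always be extended to a full
    monotone array by overwriting the other positions."""
    tails = []  # patience sorting; bisect_right handles repeated values
    for v in f:
        i = bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return (len(f) - len(tails)) / len(f)
```

On the array of slide 14 (1,…,100, 111,…,120, 101,…,110, 121,…,200) this gives ε(f) = 10/200 = 0.05: either ten-element block can be rewritten.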

**Slide 8. What does it mean to produce g?**
Offline computation:
- Input: function table for f
- Output: function table for g

**Slide 9. Distributed monotonicity reconstruction**
Want an algorithm A that, on input x, computes g(x). A:
- may query f(y) for any y,
- has access to a short random string s,
- and is otherwise deterministic.

**Slide 10. Distributed Property Reconstruction**
Goal: WHP (with probability close to 1, over choices of the random string s):
- g has property P
- dist(g, f) = O(ε(f))
- Each A(x) runs quickly; in particular, it reads f(y) for only a small number of y.

**Slide 11. Distributed Property Reconstruction: precursors**
- Online data reconstruction model (Ailon-Chazelle-Liu-Seshadhri) [ACCL]
- Locally decodable codes and program self-correction (Blum-Luby-Rubinfeld; Rubinfeld-Sudan; etc.)
- Graph coloring (Goldreich-Goldwasser-Ron)
- Monotonicity testing (Dodis-Goldreich-Lehman-Raskhodnikova-Ron-Samorodnitsky; Goldreich-Goldwasser-Lehman-Ron-Samorodnitsky; Fischer; Fischer-Lehman-Newman-Raskhodnikova-Rubinfeld-Samorodnitsky; Ergun-Kannan-Kumar-Rubinfeld-Viswanathan; etc.)
- Tolerant property testing (Parnas-Ron-Rubinfeld)

**Slide 12. Example: Local Decoding of Codes**
- Data set f = boolean string of length n.
- Property = f is a codeword of a given error-correcting code C.
- Reconstruction = decoding to a close codeword.
- Distributed reconstruction = local decoding.

**Slide 13. Key issue: making answers consistent**
- For an error-correcting code, we can assume the input f decodes to a unique g; the set of positions that need to be corrected is determined by f.
- For a general property, many different g (even exponentially many) that are close to f may have the property. We want to ensure that A produces one of them.

**Slide 14. An example**
Monotonicity with input array:
1,…,100, 111,…,120, 101,…,110, 121,…,200
(Two corrections of the same size exist: rewrite the block 111,…,120 or rewrite the block 101,…,110. A consistent distributed algorithm must commit to one of them.)

**Slide 15. Monotonicity Reconstruction: d = 1**
f is a linear array of length n. First attempt at distributed reconstruction:
- A(x) looks at f(x) and f(x−1).
- If f(x) ≥ f(x−1), then g(x) = f(x).
- Otherwise, we have a non-monotonicity: g(x) = max{ f(x), f(x−1) }.
(This g need not be monotone: on the array 3, 2, 1 it outputs 3, 3, 2.)

**Slide 16. Monotonicity Reconstruction: d = 1, second attempt**
Set g(x) = max{ f(1), f(2), …, f(x) }.
g is monotone, but:
- A(x) requires time Ω(x)
- dist(g, f) may be much larger than ε(f)
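The failure of this second attempt is easy to demonstrate in a few lines (a sketch; the example array is mine, not from the talk):

```python
def reconstruct_prefix_max(f):
    """Second attempt (slide 16): g(x) = max(f[0], ..., f[x]). The result
    is always monotone, but computing g(x) reads x+1 entries, and one bad
    entry at the front corrupts every later answer."""
    g, best = [], None
    for v in f:
        best = v if best is None else max(best, v)
        g.append(best)
    return g

f = [100] + list(range(1, 10))   # eps(f) = 1/10: lowering f[0] fixes it
g = reconstruct_prefix_max(f)    # but g = [100] * 10, so dist(f, g) = 9/10
```

So the prefix maximum can turn a single corrupted entry into a linear number of changes, which is exactly the dist(g, f) blow-up the slide warns about.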

**Slide 17. Our results (for general d)**
A distributed monotonicity reconstruction algorithm for general dimension d such that:
- Time to compute g(x) is (log n)^O(d).
- dist(f, g) = C1(d)·ε(f).
- The shared random string s has size (d log n)^O(1).

(Builds on prior results on monotonicity testing and online monotonicity reconstruction.)

**Slide 18. Which array values should be changed?**
A subset S of Σ is f-monotone if f restricted to S is monotone.

For each x in Σ, A(x) must:
- decide whether g(x) = f(x), and
- if not, then determine g(x).

Preserved = { x : g(x) = f(x) }
Corrected = { x : g(x) ≠ f(x) }
In particular, Preserved must be f-monotone.

**Slide 19. Identifying Preserved**
The partition (Preserved, Corrected) must satisfy:
- Preserved is f-monotone.
- |Corrected| / |Σ| = O(ε(f)).

Preliminary algorithmic problem:

**Slide 20. Classification problem**
Classify each y in Σ as Green or Red, such that:
- Green is f-monotone.
- Red has size O(ε(f)·|Σ|).

Need subroutine Classify(y).

**Slide 21. A sufficient condition for f-monotonicity**
A pair (x, y) in Σ × Σ is a violation if x < y and f(x) > f(y).
To guarantee that Green is f-monotone, Red should hit all violations:
- For every violation (x, y), at least one of x, y is Red.

**Slide 22. Classify: 1-dimensional case**
d = 1: Σ = {1,…,n}, f is a linear array.
For x in Σ and a subinterval J of Σ:
violations(x, J) = |{ y in J : (x, y) is a violation }|

**Slide 23. Constructing a large f-monotone set**
The set Bad: x is in Bad if, for some interval J containing x, violations(x, J) ≥ |J|/2.

Lemma.
1) Good = Σ − Bad is f-monotone.
2) |Bad| ≤ 4·ε(f)·|Σ|.
Proof: if (x, y) is a violation, then one of x, y is Bad for the interval [x, y].

So we'd like to take: Green = Good, Red = Bad.

**Slide 25. How do we compute Good?**
To test whether y is in Good: for each interval J containing y, check violations(y, J) < |J|/2.
Difficulties:
- There are Ω(n) intervals J containing y.
- For each J, computing violations(y, J) takes time Ω(|J|).

**Slide 26. Speeding up the computation**
Estimate violations(y, J) by random sampling; sample size polylog(n) is sufficient.
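Before any speedup, the classification of slides 23 and 25 can be written down directly. A brute-force sketch (my code; it counts violating partners of x in either order, which is what the lemma's proof uses):

```python
def violations(f, x, J):
    """violations(x, J) from slide 22: the number of points y in the
    interval J = (lo, hi), inclusive, that form a violation with x.
    Both orders (x, y) and (y, x) are counted."""
    lo, hi = J
    return sum(1 for y in range(lo, hi + 1)
               if (x < y and f[x] > f[y]) or (y < x and f[y] > f[x]))

def classify(f):
    """Brute-force Red/Green classification (slides 23 and 25): x is Red
    (Bad) if some interval J containing x has violations(x, J) >= |J|/2.
    Takes polynomial time; the talk's version instead estimates these
    counts by sampling over a small family of test intervals."""
    n = len(f)
    red = set()
    for x in range(n):
        if any(violations(f, x, (lo, hi)) >= (hi - lo + 1) / 2
               for lo in range(x + 1) for hi in range(x, n)):
            red.add(x)
    return red
```

On f = [1, 5, 2, 3] this marks positions 1 and 2 Red, and f restricted to the remaining Green positions (values 1, 3) is monotone, as the lemma promises.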

violations*(y, J) denotes the estimate.
Compute violations*(y, J) only for a carefully chosen set of test intervals.

**Slide 27. Set of test intervals**
Want a set T of intervals of [n] with:
- Each x is in few (O(log n)) intervals of T.
- If S is any subset of T, then S has a subfamily C such that each x in ∪S belongs to 1 or 2 sets of C.
- For any interval I, there is a J in T containing I of length at most 4|I|.

**Slide 28. The Test Set T**
Assume n = |Σ| = 2^k. There are k layers of intervals; layer j consists of 2^(k−j+1) − 1 intervals of size 2^j.

**Slide 29. View T as a DAG**
- Every point x belongs to O(log n) intervals.
- All intervals in a level have the same size.
- For any subcollection S of T, the subset of S-maximal intervals covers each point of the union exactly once or twice.

[Slides 30–38 survive only as fragments, apparently from figures: for a point x with a sample Sample(x) of points below it, some interval beginning with a point y and containing x can have a high density of Red points, so there is a non-trivial chance that Sample(z) contains no Green points y.]

**Slide 39. Thinning out Green*(x)**
Green*(x) = { y : y in Sample(x), y is Green }
m*(x) = max { y : y in Green*(x) }
Green^(x) = { y : y in Green*(x), y is safe for x }
m^(x) = max { y : y in Green^(x) }

(Hiding: efficient implementation of Green^(x).)

**Slide 40. Redefining Green^(x)**
WHP:
- if x ≤ y, then m^(x) ≤ m^(y)
- |{ x : m^(x) ≠ x }| is O(ε(f)·|Σ|)

**Slide 41. Summary of 1-dimensional case**
- Classify points as Green and Red: few Red points; f restricted to Green is f-monotone.
- For each x, choose Sample(x) of size polylog(n): all points less than x, with density inversely proportional to the distance from x.
- Green^(x) = points of Sample(x) that are Green and safe for x.
- m^(x) is the maximum of Green^(x).
- Output g(x) = f(m^(x)).

**Slide 42. Dimension greater than 1**
For x < y (coordinatewise), want g(x) ≤ g(y).
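The 1-dimensional pipeline summarized above can be sketched end to end if we simplify: below, the sampled, safety-filtered m^(x) is replaced by the exact maximal Green point ≤ x, and the Green set is passed in rather than computed, so this version is not sublinear (my simplification, not the talk's algorithm):

```python
def reconstruct_1d(f, green):
    """Simplified final step of the 1-d algorithm (slides 39-41): output
    g(x) = f(m(x)) with m(x) = the largest Green point <= x. The talk gets
    the same effect with polylog(n) queries via Sample(x) and the 'safe'
    filter; here Green is scanned exactly. Positions before the first
    Green point get 0, the minimal nonnegative value."""
    g, m = [], None
    for x in range(len(f)):
        if x in green:
            m = x                    # m(x) is nondecreasing in x
        g.append(f[m] if m is not None else 0)
    return g

f = [1, 2, 6, 4, 5, 9]
green = {0, 1, 3, 4, 5}   # f restricted to Green (1, 2, 4, 5, 9) is monotone
g = reconstruct_1d(f, green)
```

Since m(x) is nondecreasing and f is monotone on Green, g is monotone; and g agrees with f on every Green point, so the number of changes is at most |Red|.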

**Slide 43. Red/Green Classification**
Extend the Red/Green classification to higher dimensions:
- f restricted to Green is monotone.
- Red is small.
Straightforward (mostly) extension of the 1-dimensional case.

**Slide 44. Given Red/Green classification**
In the one-dimensional case:
- Green^(x) = sampled Green points safe for x
- g(x) = f(max { y : y in Green^(x) })

**Slide 45. The Green points below x**
- The set of Green maxima could be very large.
- Sparse random sampling will only roughly capture the frontier.
- Finding an appropriate definition of unsafe points is much harder than in the one-dimensional case.

**Slide 46. Further work**
- The g produced by our algorithm has dist(g, f) ≤ C(d)·ε(f). Our C(d) is exp(d^2). What should C(d) be? (Guess: C(d) = exp(d).)
- Distributed reconstruction for other interesting properties? (Reconstructing expanders: Kale, Peres, Seshadhri, FOCS '08.)