
Minimax Rates for Homology Inference

Don Sheehy

Joint work with Sivaraman Balakrishnan, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman

Something like a joke.


What is topological inference?



It’s when you infer the topology of a space given only a finite subset.

We add geometric and statistical hypotheses to make the problem well-posed.

Geometric Assumption: The underlying space is a smooth manifold M.

Statistical Assumption: The points are drawn i.i.d. from a distribution derived from M.

Input: n points from a d-manifold M in D dimensions.


Output: The homology of M.



Upper Bound: What is the worst-case complexity?

Lower Bound: What is the worst-case complexity of the best possible algorithm?





More precisely:

Input: n points sampled i.i.d. from a distribution supported on M, with noise.

Output: an estimate of the homology of M.

Complexity: the probability of giving a wrong answer.

The Goal: Matching Bounds (asymptotically)




Minimax risk is the error probability of the best estimator on the hardest examples.


Minimax Risk:

$$R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big)$$

The infimum over $\hat{H}$ picks the best estimator, the supremum over $Q \in \mathcal{Q}$ picks the hardest distribution, $Q^n$ is the product distribution, and $H(M)$ is the true homology.

Sample Complexity:

$$n(\epsilon) = \min\{n : R_n \le \epsilon\}$$
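As a toy illustration of the inf-sup structure, the sketch below assumes a finite set of candidate estimators and a finite set of distributions whose error probabilities are already known. The function name `minimax_risk`, the table `error_table`, and its numbers are made up for illustration; the actual problem ranges over infinite classes.

```python
from typing import Sequence


def minimax_risk(error_table: Sequence[Sequence[float]]) -> float:
    """Return the min over estimators (rows) of the max over distributions (columns)."""
    # sup over Q: the worst-case error probability of each estimator
    worst_case = [max(row) for row in error_table]
    # inf over estimators: the smallest worst-case error
    return min(worst_case)


# Hypothetical error probabilities: one row per estimator, one column per distribution.
error_table = [
    [0.10, 0.40, 0.20],
    [0.25, 0.30, 0.15],
    [0.05, 0.50, 0.05],
]
risk = minimax_risk(error_table)
```

The third estimator is the best on two of the three distributions, yet the second one attains the minimax risk because its worst case is the mildest.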

We assume manifolds without boundary of bounded volume and reach.


Let $\mathcal{M}$ be the set of compact d-dimensional Riemannian manifolds M without boundary such that

1 $M \subset \mathrm{ball}_D(0, 1)$

2 $\mathrm{vol}(M) \le c_d$

3 The reach of M is at least $\tau$.

Let $\mathcal{P}$ be the set of probability distributions supported on some $M \in \mathcal{M}$ with densities bounded from below by a constant a.

We consider 4 different noise models.

Noiseless: $\mathcal{Q} = \mathcal{P}$.

Clutter: $Q = (1 - \pi) U + \pi P$ with $P \in \mathcal{P}$, where U is uniform on ball(0, 1).

Tubular: Let $Q_{M,\sigma}$ be uniform on $M_\sigma$, the $\sigma$-tube around M. $\mathcal{Q} = \{Q_{M,\sigma} : M \in \mathcal{M}\}$.

Additive: $\mathcal{Q} = \{P \star \Phi : P \in \mathcal{P}\}$, where either $\Phi$ is Gaussian with $\sigma \ll \tau$, or $\Phi$ has Fourier transform bounded away from 0 and $\tau$ is fixed.
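To make the clutter model concrete, here is a minimal sketch that samples from a mixture of a uniform distribution on the unit disk and a distribution on a manifold. Everything specific is an assumption for illustration: the manifold is taken to be a circle of radius 0.5 in the plane, P is uniform on it, the mixing weight is called `pi`, and `sample_clutter` is a hypothetical helper name.

```python
import math
import random


def sample_clutter(n, pi, radius=0.5, seed=0):
    """Draw n planar points: from P (on a circle) w.p. pi, else from U (unit disk)."""
    rng = random.Random(seed)
    points, from_p = [], []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        if rng.random() < pi:
            # Signal: a uniformly random point on the circle (the stand-in for M).
            r = radius
            from_p.append(True)
        else:
            # Clutter: uniform on the unit disk; the square root makes the
            # radius uniform with respect to area.
            r = math.sqrt(rng.random())
            from_p.append(False)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points, from_p


points, from_p = sample_clutter(n=1000, pi=0.7)
```

The flags in `from_p` are returned only for inspection; an estimator sees the points alone and has to separate signal from clutter itself.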

Le Cam’s Lemma is a powerful tool for proving minimax lower bounds.


Lemma. Let $\mathcal{Q}$ be a set of distributions. Let $\theta(Q)$ take values in a metric space $(X, \rho)$ for $Q \in \mathcal{Q}$. For any $Q_1, Q_2 \in \mathcal{Q}$,

$$\inf_{\hat{\theta}} \sup_{Q \in \mathcal{Q}} \mathbb{E}_{Q^n}\big[\rho(\hat{\theta}, \theta(Q))\big] \ge \frac{1}{8}\, \rho(\theta(Q_1), \theta(Q_2))\, (1 - \mathrm{TV}(Q_1, Q_2))^{2n}$$

For homology, use the trivial metric.

$$\rho(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$$

With this metric the expected loss is the error probability, so for any $Q_1, Q_2 \in \mathcal{Q}$ whose manifolds have different homology,

$$R_n = \inf_{\hat{H}} \sup_{Q \in \mathcal{Q}} Q^n\big(\hat{H} \neq H(M)\big) \ge \frac{1}{8} (1 - \mathrm{TV}(Q_1, Q_2))^{2n}$$
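A small numerical sketch of the bound in the lemma may help show how quickly it decays with n. The function name `le_cam_lower_bound` and the sample values are illustrative assumptions; the `separation` argument stands for the distance between the two parameter values, which is 1 under the trivial metric when the two homologies differ.

```python
def le_cam_lower_bound(tv: float, n: int, separation: float = 1.0) -> float:
    """Evaluate (1/8) * separation * (1 - tv)^(2n) for a total variation distance tv."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    if n < 0:
        raise ValueError("sample size must be non-negative")
    return separation / 8.0 * (1.0 - tv) ** (2 * n)


# Illustrative values: two distributions at total variation distance 0.01.
bounds = {n: le_cam_lower_bound(0.01, n) for n in (10, 100, 1000)}
```

The closer the two distributions are in total variation, the more samples are needed before the bound becomes small, which is why the construction looks for manifolds that overlap as much as possible.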

The lower bound requires two manifolds that are geometrically close but topologically distinct.


$$B = \mathrm{ball}_d(0, 1 - \tau), \qquad A = B \setminus \mathrm{ball}_d(0, 2\tau)$$

$$M_1 = \partial(B_\tau), \qquad M_2 = \partial(A_\tau)$$

[Figure: $M_1$, $M_2$, and the overlap.]

It suffices to bound the total variation distance.

Total Variation Distance:

$$\mathrm{TV}(Q_1, Q_2) = \sup_A |Q_1(A) - Q_2(A)| \le a \max\{\mathrm{vol}(M_1 \setminus M_2), \mathrm{vol}(M_2 \setminus M_1)\} \le C_d\, a\, \tau^d$$

Minimax Risk:

$$R_n \ge \frac{1}{8} (1 - \mathrm{TV}(Q_1, Q_2))^{2n} \ge \frac{1}{8} (1 - C_d\, a\, \tau^d)^{2n} \ge \frac{1}{8} e^{-2 C_d'\, a\, \tau^d n}$$

The last step uses $1 - x \ge e^{-2x}$ for $0 \le x \le 1/2$, so it holds with $C_d' = 2 C_d$ whenever $C_d\, a\, \tau^d \le 1/2$.

Sampling Rate:

$$n(\epsilon) \gtrsim \left(\frac{1}{\tau}\right)^d \log\frac{1}{\epsilon}$$
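The step from the risk bound to the sampling rate can be spelled out as follows. This is a sketch under the assumption that the risk bound has the form $R_n \ge \frac{1}{8} e^{-c \tau^d n}$, where $c$ is shorthand (introduced here) for the constant in the exponent, depending only on d and a.

```latex
\begin{align*}
R_n \le \epsilon
  &\;\Longrightarrow\; \frac{1}{8}\, e^{-c \tau^d n} \le \epsilon \\
  &\;\Longrightarrow\; e^{-c \tau^d n} \le 8 \epsilon \\
  &\;\Longrightarrow\; n \ge \frac{1}{c\, \tau^d} \log \frac{1}{8 \epsilon}.
\end{align*}
% Hence any n achieving risk at most epsilon satisfies
\[
n(\epsilon) \;\ge\; \frac{1}{c} \left( \frac{1}{\tau} \right)^{d} \log \frac{1}{8 \epsilon},
\]
% which is of order (1/tau)^d log(1/epsilon) for small epsilon.
```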


The upper bound uses a union of balls to estimate the homology of M.


0 Denoise the data.

1 Take a union of balls.

2 Compute the homology of the resulting Čech complex.

To prove: The density is bounded from below near M and from above far from M.
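The simplest piece of this pipeline can be sketched in a few lines: counting the connected components (the rank of H_0) of a union of balls. This is only an illustration under stated assumptions, not the full procedure: it skips denoising, it ignores higher homology (which needs the Čech complex and a linear-algebra or persistent-homology routine), and the name `count_components` and the radius values are hypothetical.

```python
import math


def count_components(points, r):
    """Count connected components of the union of closed balls of radius r."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Two closed balls of radius r intersect exactly when their
            # centers are at distance at most 2r.
            if math.dist(points[i], points[j]) <= 2.0 * r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


# 40 evenly spaced points on a circle of radius 0.5, and two well-separated pairs.
circle = [
    (0.5 * math.cos(2 * math.pi * k / 40), 0.5 * math.sin(2 * math.pi * k / 40))
    for k in range(40)
]
two_clusters = [(0.0, 0.0), (0.01, 0.0), (0.9, 0.0), (0.91, 0.0)]

circle_components = count_components(circle, r=0.05)
cluster_components = count_components(two_clusters, r=0.05)
```

The answer depends on the radius: with balls that are too small, every sample point on the circle becomes its own component, which is one face of the parameter-choice question.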

Many fundamental problems are still open.


1 Is the reach the right parameter?



2 What about manifolds with boundary?




3 Homotopy equivalence?





4 How to choose parameters?






5 Are there efficient algorithms?

Thank you.
