topological data analysis - columbia universityverma/classes/uml/lec/uml_lec11_tda...topological...

119
Topological Data Analysis Geelon So (ags2191) July 19, 2018 1 / 91

Upload: others

Post on 01-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Data Analysis

Geelon So (ags2191)

July 19, 2018

1 / 91

Page 2: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Manifolds

“Manifolds are spaces that locally look like Euclidean space.”

Figure 1: ‘Local’ region U identified with a piece of Euclideanspace Rn. [L2003]

2 / 91

Page 3: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Manifolds

“Manifolds are spaces that locally look like Euclidean space.”

Figure 1: ‘Local’ region U identified with a piece of Euclideanspace Rn. [L2003]

2 / 91

Page 4: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Manifolds

But what does locally mean? Why care about locality?

3 / 91

Page 5: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Metric spaces

Metric spaces have a notion of distance:

d : X ×X → R.

4 / 91

Page 6: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Space

A topological space is one on which similar points behave similarly.

I comes with a notion of similarity (a ‘topology’)

I continuous functions are maps that ‘respect’ similarity

5 / 91

Page 7: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Space

A topological space is one on which similar points behave similarly.

I comes with a notion of similarity (a ‘topology’)

I continuous functions are maps that ‘respect’ similarity

5 / 91

Page 8: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topology

DefinitionA topology T of a space X is a collection of subsets of X (calledopen sets) such that:

(i) ∅, X ∈ T(ii) T is closed under finite intersection

(iii) T is closed under arbitrary union

6 / 91

Page 9: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Learning Setup: Interlude

I X is the space our data comes fromI f a computation/measurement on X

I we think of f as a partial function with some domain A ⊂ XI if x ∈ A, then f(x) returns > in finite timeI otherwise, f(x) does not halt

I f1, f2, . . . a collection of ‘primitive measurements’ on XI these are all the physical measurements we can make on X in

finite time

7 / 91

Page 10: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Learning Setup: Interlude

DefinitionA function f is computable if it is either:

I a primitive measurement

I the conjunction of a finite number of computable functions

I the disjunction of a countable number of computable functions

8 / 91

Page 11: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Invariants

I cardinality

I number of connected components

I compactness

I metrizability

I separation

I homology group

I etc.

9 / 91

Page 12: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Invariants

If we know how points in a space are related, we can saysomething about its shape.

I How are points connected?

10 / 91

Page 13: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

The Classical Example

Figure 2: The donut is topologically equivalent to the coffee mug. [link]

11 / 91

Page 14: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Data Analysis: the hope

Topology studies the ‘shape’ of objects.

I What is the shape of data?

Topological invariants are indifferent to ‘nice deformations’.

I Any robust statistics about our data?

The topology of a space determine which functions are possible.

I Does knowing the topology of data improve learnability?

12 / 91

Page 15: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Data Analysis: the hope

Topology studies the ‘shape’ of objects.

I What is the shape of data?

Topological invariants are indifferent to ‘nice deformations’.

I Any robust statistics about our data?

The topology of a space determine which functions are possible.

I Does knowing the topology of data improve learnability?

12 / 91

Page 16: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Data Analysis: the hope

Topology studies the ‘shape’ of objects.

I What is the shape of data?

Topological invariants are indifferent to ‘nice deformations’.

I Any robust statistics about our data?

The topology of a space determine which functions are possible.

I Does knowing the topology of data improve learnability?

12 / 91

Page 17: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example: space determines the possible functions

Figure 3: If you travel along a straight road from A to B, Icould deduce how fast you were traveling at some pointjust by observing how long it took. (No wormholes). [link]

13 / 91

Page 18: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Intuition into Topology

1. Defining simplicial complexes as model spaces

2. Generating simplicial complexes from data

3. Understanding the shape of data (simplicial homology)

4. Summarizing data (persistence homology)

14 / 91

Page 19: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

How are points connected?

Figure 4: To model how NYC is connected for cars, use a graph. [link]

15 / 91

Page 20: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

1-simplex is an edge

DefinitionA 1-simplex [v0, v1] is a graph G = (S0, S1) where

S0 =v0, v1

S1 =

v0, v1

,

where S0 are the vertices and S1 are the edges.

16 / 91

Page 21: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Model 1-dimensional connection with 1-simplices

A 1-dimensional simplicial complex (a union of 1-simplexes) canrepresent the 1D ‘topology’ for a car.

I i.e. a graph

17 / 91

Page 22: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Model 1-dimensional connection with 1-simplices

A 1-dimensional simplicial complex (a union of 1-simplexes) canrepresent the 1D ‘topology’ for a car.

I i.e. a graph

17 / 91

Page 23: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Model n-dimensional connections with n-simplexes

Figure 5: The 0-, 1-, 2-, and 3-simplex are ‘high-dimensional edges’. [link]

18 / 91

Page 24: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Approximation of Topological Spaces

DefinitionAn n-simplex [v0, . . . , vn] is a hypergraph K = (S0, S1, . . . , Sn)where

S0 =v0, . . . , vn

,

S1 =vi, vj : 0 ≤ i < j ≤ n

...

Sn =v0, . . . , vn

.

An element of Sk is a k-dimensional face of K.

19 / 91

Page 25: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Approximation of Topological Spaces

Figure 6: Approximation of the torus. [link]

20 / 91

Page 26: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Simplicial Complex

DefinitionA simplicial complex K is a set of simplexes such that:1

I every face of a simplex of K is also in K

I the intersection of any two simplexes σ1, σ2 ∈ K is a face ofboth σ1 and σ2

1Definition statement from Wikipedia.21 / 91

Page 27: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Simplicial Complex

Figure 7: Definition of simplicial complex so that this is excluded. [link]

22 / 91

Page 28: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Notation

In the following, we will tend to use:I X is an underlying space where data is generated from

I often a metric space (X, d)

I X = x1, . . . , xn is the point data

23 / 91

Page 29: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Graph from Data

Given data X and some measure of similarity/distance, how togenerate graph?

I k-nearest neighbor

I ε-graph

I etc.

24 / 91

Page 30: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Simplicial Complex from Data: Vietoris-Rips complex

Figure 8: Vietoris-Rips complex for data sampledfrom an annulus. [G2008]

25 / 91

Page 31: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Vietoris-Rips complex

DefinitionLet X be a set of points from a metric space, and ε ≥ 0. TheVietoris-Rips complex Ripsε(X) is the set of simplices[x0, . . . , xk] such that d(xi, xj) ≤ ε for all (i, j).2

I i.e. generate ε-graph from data, then ‘fill in’ any k-clique withthe k-simplex.

2Definition statement from [C2017].26 / 91

Page 32: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Vietoris-Rips complex

DefinitionLet X be a set of points from a metric space, and ε ≥ 0. TheVietoris-Rips complex Ripsε(X) is the set of simplices[x0, . . . , xk] such that d(xi, xj) ≤ ε for all (i, j).2

I i.e. generate ε-graph from data, then ‘fill in’ any k-clique withthe k-simplex.

2Definition statement from [C2017].26 / 91

Page 33: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Simplicial Complex from Data: Cech complex

Figure 9: Generating a Cech complex for point data. [G2008]

27 / 91

Page 34: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Cech complex

DefinitionLet X be as above. The Cech complex Cechε(X) is the set ofsimplices [x0, . . . , xk] such that the k+ 1 closed balls B(xi, ε) havea nonempty intersection.3

I i.e. draw ε-ball around points xi, and k-way intersectionsbecome k-simplexes.

3Definition statement from [C2017].28 / 91

Page 35: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Cech complex

DefinitionLet X be as above. The Cech complex Cechε(X) is the set ofsimplices [x0, . . . , xk] such that the k+ 1 closed balls B(xi, ε) havea nonempty intersection.3

I i.e. draw ε-ball around points xi, and k-way intersectionsbecome k-simplexes.

3Definition statement from [C2017].28 / 91

Page 36: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Cech complex vs. Rips

Figure 10: The Cech complex is not the same as the Vietoris-Ripscomplex. [G2008]

29 / 91

Page 37: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Rips vs. Cech complex

Figure 11: Zoom in to example where Cech (middle) andRips (right) complex differ. [C20097]

30 / 91

Page 38: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Rips vs. Cech complex

Ripsε(X) ⊂ Cechε(X) ⊂ Rips2ε(X).

31 / 91

Page 39: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

X and X

What about the relationship between the complex generated onpoint data X and its underlying space X?

32 / 91

Page 40: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Relationships

DefinitionLet X and Y be topological spaces. They are homeomorphic ifthere exists a continuous map f : X → Y with continuous inverse.

I f is just a ‘renaming’ of points

I so, one can choose to study either X or Y

33 / 91

Page 41: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Relationships

DefinitionLet X and Y be topological spaces. They are homeomorphic ifthere exists a continuous map f : X → Y with continuous inverse.

I f is just a ‘renaming’ of points

I so, one can choose to study either X or Y

33 / 91

Page 42: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Relationships

Definition (Informal)

X and Y are homotopy equivalent if they can be continuouslydeformed into each other.

I weaker than homeomorphism(i.e. X,Y homeomorphic =⇒ homotopy equivalent)

I guarantees certain topological properties (shapes) are shared

34 / 91

Page 43: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Relationships

Definition (Informal)

X and Y are homotopy equivalent if they can be continuouslydeformed into each other.

I weaker than homeomorphism(i.e. X,Y homeomorphic =⇒ homotopy equivalent)

I guarantees certain topological properties (shapes) are shared

34 / 91

Page 44: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Topological Property

DefinitionA space X is contractible if it is homotopy equivalent to a pointwithin X.

Figure 12: Which spaces are contractible? [link]

35 / 91

Page 45: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

What properties are preserved by the complexes?

Under certain conditions, the Cech complex will be homotopyequivalent to the underlying space X where the data came from.

I i.e. we can learn something about the shape of X from X

36 / 91

Page 46: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

What properties are preserved by the complexes?

Under certain conditions, the Cech complex will be homotopyequivalent to the underlying space X where the data came from.

I i.e. we can learn something about the shape of X from X

36 / 91

Page 47: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Good Cover

DefinitionA good cover of X is an open cover such that all finiteintersections of open sets are either empty or contractible.

Figure 13: A bad (left) and good (right) cover. [link]

37 / 91

Page 48: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Nerve Theorem

Theorem (Nerve Theorem)

Let X := x1, . . . , xn ⊂ X. If the open cover

B(xi, ε) : xi ∈ X

is a good cover of X, then Cechε(X) and X are homotopyequivalent.

38 / 91

Page 49: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Nerve Theorem (general)

DefinitionLet U be a collection of open sets in X. Then, the nerve C(U) isthe Cech complex generated from U .

I this contrasts to using the open cover B(xi, ε)

39 / 91

Page 50: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Nerve Theorem (general)

TheoremLet U be a good cover of X. Then, X and the nerve C(U) arehomotopy equivalent.

I this suggests an algorithm to study the topological propertiesof X—generate a good cover of X and compute the nerve

40 / 91

Page 51: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Nerve Theorem (general)

TheoremLet U be a good cover of X. Then, X and the nerve C(U) arehomotopy equivalent.

I this suggests an algorithm to study the topological propertiesof X—generate a good cover of X and compute the nerve

40 / 91

Page 52: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 53: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 54: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 55: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 56: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 57: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm Intuition

Let X be the space we want to understand.

1. Let f : X → Rd summarize the important aspects of X

2. Form open cover V of Rd

3. Induces an open cover f−1(V) on X

4. Ensure f−1(V is a good cover

5. Generate and study the nerve of open cover

41 / 91

Page 58: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Computing the Nerve

Figure 14: The refined pullback cover of the height function andits nerve.4 [C2017]

4Caption statement from [C2017]42 / 91

Page 59: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Pullback Cover

DefinitionLet f : X → Y be continuous. If V is an open cover of Y , then:

f−1(V) := f−1(V ) : V ∈ V

is the pullback cover of X induced by (f,V). The refinedpullback cover is the collection of connected components off−1(V ) for V ∈ V.

43 / 91

Page 60: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm

Figure 15: Example of mapper algorithm on point cloud. [C2017]

44 / 91

Page 61: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Mapper Algorithm

Algorithm 1 Mapper

input X a data set, a distance/similarity measure, f : X → Rd,and a cover U of f(X)

1: For each U ∈ U , decompose f−1(U) into clusters CU,1, . . . , CU,k2: Compute the nerve of CU,i’s.3: return simplicial complex, the nerve

I choice of f , the filter or lens function

I choice of cover U (resolution/gain)

I choice of clustering algorithm

45 / 91

Page 62: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Choice of Filter

The choice of f determines similarity with respect to what:

I PCA coordinates/nonlinear dimensionality reductioncoordinates

I centrality function and eccentricity function:

central(x) =∑y∈X

d(x, y) ecc(x) = maxy∈X

d(x, y)

I density estimates

46 / 91

Page 63: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Choice of Cover

Mapper may be very sensitive to choice of cover. A standardchoice is evenly sized and spaced intervals

I resolution: the size of the intervals

I gain: the percent overlap of the intervals

Figure 16: A cover of R with resolution r and gain 25%. [C2017]

47 / 91

Page 64: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Choice of Clustering

We need to cluster the preimages f−1(U), to generate a goodcover.

I apply any clustering algorithm, or

I build neighborhood graph (k-NN or ε-graph)

48 / 91

Page 65: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Neural Codes

Figure 17: Place fields of for four place cells, recorded while a ratexplored a 2-dimensional square box environment.5 [Cu2017]

5Caption statement from [Cu2017]49 / 91

Page 66: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Neural Codes

Figure 18: Three environments and place fields that cover theunderlying space. [Cu2017]

50 / 91

Page 67: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Neural Codes

Figure 19: Spike trains and neural codes. [Cu2017]

51 / 91

Page 68: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Neural Codes

Figure 20: Codes and nerves of open covers. [Cu2017]

52 / 91

Page 69: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Mapping Disease Space

53 / 91

Page 70: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Mapping Disease Space

Figure 21: The path of an individual through a disease space. The choiceof filter through partially out-of-phase heath statistics can generate aspace with nontrivial fundamental group. [B2016]

54 / 91

Page 71: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Mapping Disease Space

Figure 22: Disease space of malaria-infected mice. [B2016]

55 / 91

Page 72: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

Figure 23: A point cloud sampled from a torus withvarying offset values r1 < r2 < r3. [C2017]

56 / 91

Page 73: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

If K is a compact set, let dK(x) := infy∈K d(x, y). Then, thereconstruction Xr is:

Xr = d−1X ([0, r]).

57 / 91

Page 74: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

Let φ = dX and ψ = dX.

Theorem (Reconstruction Theorem)

Suppose that the α-reach of φ is at least R. If

‖φ− ψ‖∞ < ε(α,R),

then there is some r(ε, α,R) such that ψ−1([0, r]) is homotopyequivalent to X.

I the α-reach is just a regularity condition

I it could be the case that given φ and ψ no reconstruction ispossible

58 / 91

Page 75: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

Let φ = dX and ψ = dX.

Theorem (Reconstruction Theorem)

Suppose that the α-reach of φ is at least R. If

‖φ− ψ‖∞ < ε(α,R),

then there is some r(ε, α,R) such that ψ−1([0, r]) is homotopyequivalent to X.

I the α-reach is just a regularity condition

I it could be the case that given φ and ψ no reconstruction ispossible

58 / 91

Page 76: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

Let φ = dX and ψ = dX.

Theorem (Reconstruction Theorem)

Suppose that the α-reach of φ is at least R. If

‖φ− ψ‖∞ < ε(α,R),

then there is some r(ε, α,R) such that ψ−1([0, r]) is homotopyequivalent to X.

I the α-reach is just a regularity condition

I it could be the case that given φ and ψ no reconstruction ispossible

58 / 91

Page 77: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

The Reconstruction theorem tells us when the Cech complex ishomotopy equivalent to the original space.

I But what can we deduce about X from Cech(X)?

59 / 91

Page 78: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Geometric Reconstruction

The Reconstruction theorem tells us when the Cech complex ishomotopy equivalent to the original space.

I But what can we deduce about X from Cech(X)?

59 / 91

Page 79: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Homology Theory

Homology theory, homotopy theory, and more generally,algebraic topology studies the topological features of a spacealgebraically.

60 / 91

Page 80: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example: Hairy Ball Theorem

Figure 24: No nonzero smooth vector field exists on S2. [link]

61 / 91

Page 81: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example: Brouwer’s Fixed Point Theorem

Theorem (Brouwer)

Let X ⊂ Rn be a convex compact set. If f : X → X iscontinuous, then f has a fixed point.

62 / 91

Page 82: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example: Closed vs. Exact

Figure 25: A vector field V on R2 where ∇× V = 0everywhere except at 0. On R2, ∇× V ≡ 0⇔ V = ∇φ.What about R2 \ 0? [link]

63 / 91

Page 83: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example: Stokes’ theorem

Theorem (Stokes)

Let Ω be an orientable manifold and ω be a differential form overits boundary ∂Ω. Then: ∫

∂Ωω =

∫Ω

dω.

64 / 91

Page 84: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

k-chains

DefinitionLet K be a simplicial complex and k a nonnegative number.Denote by Ck(K) the space of k-chains on K, the set of formallinear combinations of k-simplexes of K.

For example, if σ1, . . . , σp ∈ K are k-simplexes, a k-chain is:

c =

p∑i=1

αiσi,

where the αi’s are scalars.6

6Definition statement from [C2017].65 / 91

Page 85: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

k-chains

DefinitionLet K be a simplicial complex and k a nonnegative number.Denote by Ck(K) the space of k-chains on K, the set of formallinear combinations of k-simplexes of K.

For example, if σ1, . . . , σp ∈ K are k-simplexes, a k-chain is:

c =

p∑i=1

αiσi,

where the αi’s are scalars.6

6Definition statement from [C2017].65 / 91

Page 86: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

k-chains

The intuition is that Ck(K) is the vector space built over all thek-dimensional subcomponents of K.

66 / 91

Page 87: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary operator

∂ : Ck(K)→ Ck−1(K)

I maps a k-dimensional object into its (k − 1)-dimensionalboundary

67 / 91

Page 88: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary operator

∂ : Ck(K)→ Ck−1(K)

I maps a k-dimensional object into its (k − 1)-dimensionalboundary

67 / 91

Page 89: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary operator

DefinitionLet σ = [v0, . . . , vk] be a k-simplex. Then the (k − 1)-chain:

∂(σ) :=

k∑i=1

(−1)i+1[v0, . . . , vi, . . . , vk]

is the boundary of σ. The boundary operator is the linearextension to Ck(K).

68 / 91

Page 90: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary operator

Example

An edge e = [u, v] is a 1-simplex, and its boundary is:

∂([u, v]) = v − u.

69 / 91

Page 91: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary of Boundary

Figure 26: What is the boundary of a sphere? What is thesphere a boundary of? [link]

70 / 91

Page 92: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundary of Boundary

∂k−1 ∂k ≡ 0.

71 / 91

Page 93: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundaries and Cycles

The following are (linear) subspaces of Ck(K):

DefinitionThe image im(∂k+1) is the space of boundaries Bk(K) of K.

DefinitionThe kernel ker(∂k) is the space of cycles Zk(K) of K.

72 / 91

Page 94: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundaries and Cycles

Because the boundary of a boundary is zero,

Bk(K) ⊂ Zk(K).

I the dimension of Zk(K) gives ‘how many ways we can makesubcomplexes of K that can be filled in?’

I the dimension of Bk(K) gives ‘how many of thesesubcomplexes are actually filled in?’

73 / 91

Page 95: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Boundaries and Cycles

Because the boundary of a boundary is zero,

Bk(K) ⊂ Zk(K).

I the dimension of Zk(K) gives ‘how many ways we can makesubcomplexes of K that can be filled in?’

I the dimension of Bk(K) gives ‘how many of thesesubcomplexes are actually filled in?’

73 / 91

Page 96: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Betti Number

Definition (Informal)

The Betti number βk of K is:

βk := dimZk(K)− dimBk(K).

I i.e. how many k-dimensional holes are there?I β0 = number of connected componentsI β1 = number of ‘circular’ holes (punctures)I β2 = number of ‘voids’

74 / 91

Page 97: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Betti Number

Definition (Informal)

The Betti number βk of K is:

βk := dimZk(K)− dimBk(K).

I i.e. how many k-dimensional holes are there?I β0 = number of connected componentsI β1 = number of ‘circular’ holes (punctures)I β2 = number of ‘voids’

74 / 91

Page 98: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example

Figure 27: The circle itself is a cycle; however, because the whole space is1-dimensional, it is not the image of a 2-dimensional subspace. It is not aboundary. So, β1(S1) = 1, as there is a ‘1D hole’ in S1. [link]

75 / 91

Page 99: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Example

Figure 28: The torus has β0 = 1, β1 = 2, β2 = 1.

76 / 91

Page 100: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Homology Group

Definition (Formal)

The kth simplicial homology group of K is the quotient vectorspace

Hk(K) = Zk(K)/Ck(K).

The kth Betti number βk is βk(K) = dimHk(K).

I the homology group can be extended to general topologicalspaces.

77 / 91

Page 101: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Homology Group

Definition (Formal)

The kth simplicial homology group of K is the quotient vectorspace

Hk(K) = Zk(K)/Ck(K).

The kth Betti number βk is βk(K) = dimHk(K).

I the homology group can be extended to general topologicalspaces.

77 / 91

Page 102: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Homotopy Invariance

TheoremIf f : X → Y is a homeomorphism or a homotopy equivalence,then

Hk(X) ∼= Hk(Y )

are isomorphic.

78 / 91

Page 103: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Simplex and Space

Figure 29: A space can be studied by considering the appropriatesimplicial complex. [C2017]

79 / 91

Page 104: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

High-Level View

I the homology group summarizes the shape of an object

I homotopy equivalent spaces have the same homology group(“they have the same shape”)

I given certain conditions, the Cech complex of point data ishomotopy equivalent to underlying space

80 / 91

Page 105: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

High-Level View

I the homology group summarizes the shape of an object

I homotopy equivalent spaces have the same homology group(“they have the same shape”)

I given certain conditions, the Cech complex of point data ishomotopy equivalent to underlying space

80 / 91

Page 106: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

High-Level View

I the homology group summarizes the shape of an object

I homotopy equivalent spaces have the same homology group(“they have the same shape”)

I given certain conditions, the Cech complex of point data ishomotopy equivalent to underlying space

80 / 91

Page 107: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Challenge

We don’t have access to underlying space, so cannot proveregularity conditions about it.

I So, study the Cech complex over a range of parameters.

81 / 91

Page 108: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Challenge

We don’t have access to underlying space, so cannot proveregularity conditions about it.

I So, study the Cech complex over a range of parameters.

81 / 91

Page 109: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Persistence Barcodes

Figure 30: Persistence barcode for a function f : [0, 1]→ R. The diagramon the right plots β0, the number of connected components. Imaginewater filling up the graph on the left. [C2017]

82 / 91

Page 110: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Persistence Diagrams

Figure 31: Persistence barcode, plotting β1. [C2017]

83 / 91

Page 111: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Persistence Barcodes for Data

Figure 32: Persistence barcode, plotting β0 and β1. Thesize of the ball is called the filtration value. [C2017]

84 / 91

Page 112: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

85 / 91

Page 113: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

I data: viral genomes, with genetic distance as metricI compute persistence homology; at a filtration value ε:

I β0 represents the number of strains/subcladesI 1D topology provides information about horizontal evolution

86 / 91

Page 114: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

Figure 33: (A) tree shows vertical evolution (B) DAG with horizontalevolution (C) trees are contractible (D) DAGs are not.7 [C2013]

7Caption statement from [C2013].87 / 91

Page 115: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

Hypothesis: higher dimensional homology groups capture evenmore complex/multiple horizontal genetic exchange.

88 / 91

Page 116: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

Hypothesis: higher dimensional homology groups capture evenmore complex/multiple horizontal genetic exchange.

88 / 91

Page 117: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

Figure 34: Persistence barcode based on genetic distances in three genes:ENV, GAG, POL. Plot (D) is based on their concatenation. [C2013]

89 / 91

Page 118: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

Application to Viral Evolution

Figure 35: Representation of the recombination event with multipleparental strains. The vertices correspond to a HIV-1 subtype. [C2017]

90 / 91

Page 119: Topological Data Analysis - Columbia Universityverma/classes/uml/lec/uml_lec11_tda...Topological Data Analysis: the hope Topology studies the ‘shape’ of objects. I What is the

References

[B2016] Torres, Brenda Y., et al. “Tracking resilience to infections by mapping disease space.” PLoS biology14.4 (2016): e1002436.

[C2009] Carlsson, Gunnar. “Topology and data.” Bulletin of the American Mathematical Society 46.2 (2009):255-308.

[C2013] Chan, Joseph Minhow, Gunnar Carlsson, and Raul Rabadan. “Topology of viral evolution.” Proceedingsof the National Academy of Sciences 110.46 (2013): 18566-18571.

[C2017] Chazal, Frederic, and Bertrand Michel. “An introduction to Topological Data Analysis: fundamental andpractical aspects for data scientists.” arXiv preprint arXiv:1710.04019 (2017).

[Cu2017] Curto, Carina. “What can topology tell us about the neural code?.” Bulletin of the AmericanMathematical Society 54.1 (2017): 63-78.

[G2008] Ghrist, Robert. “Barcodes: the persistent topology of data.” Bulletin of the American MathematicalSociety 45.1 (2008): 61-75.

[L2003] Lee, John M. Introduction to Smooth Manifolds. Springer, New York, NY, 2003. 1-29.

[S2007] De Silva, Vin, and Robert Ghrist. “Coverage in sensor networks via persistent homology.” Algebraic &Geometric Topology 7.1 (2007): 339-358.

91 / 91