diffusion maps - tauamir1/course2012-2013/dm_presentation.pdf · diffusion maps since p is...

Post on 28-Sep-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Diffusion Maps

Aviv Rotbart

Tel Aviv University

December 2012

Aviv Rotbart (TAU) Diffusion Maps December 2012 1 / 40

Data Analysis Scope

Data Analysis

Manifold Learning

Kernel Methods

Diffusion Maps

Aviv Rotbart (TAU) Diffusion Maps December 2012 2 / 40

Data Analysis Scope

Data Analysis

Manifold Learning

Kernel Methods

Diffusion Maps

Aviv Rotbart (TAU) Diffusion Maps December 2012 2 / 40

Outline

1 Data Analysis Background

Aviv Rotbart (TAU) Diffusion Maps December 2012 3 / 40

Manifold learningWhy use manifolds?

availability of data⇓

many observable parameters⇓

high-dimensional recorded data

# parameters = O(10), O(100), . . .︷ ︸︸ ︷···

Data contain dependencies & redundanciesObservable space = non-linear mapping of few underlying factorsUnderlying locally low-dimensional in high-dimensional ambientspace

Aviv Rotbart (TAU) Diffusion Maps December 2012 4 / 40

Manifold learningThe goal

Aviv Rotbart (TAU) Diffusion Maps December 2012 5 / 40

Manifold learningThe goal

Aviv Rotbart (TAU) Diffusion Maps December 2012 5 / 40

Manifold learningThe goal

Aviv Rotbart (TAU) Diffusion Maps December 2012 5 / 40

Manifold learningKernel methods

Aviv Rotbart (TAU) Diffusion Maps December 2012 6 / 40

Manifold learningKernel methods

Aviv Rotbart (TAU) Diffusion Maps December 2012 6 / 40

Manifold learningDiffusion distances

Aviv Rotbart (TAU) Diffusion Maps December 2012 7 / 40

Diffusion maps1

Diffusion process & affinities

Gaussian kernel:k(x , y) , e−

‖x−y‖ε

Degrees: q(x) ,∑ k(x , y)

Transition probabilities:

p(x , y) ,k(x , y)

q(x)

Diffusion affinities:

a(x , y) ,k(x , y)√

q(x)√

q(y)

= q1/2(x)p(x , y)q−1/2(y)

1R.R. Coifman and S. Lafon. “Diffusion Maps”. In: Applied andComputational Harmonic Analysis 21.1 (2006), pp. 5–30.

Aviv Rotbart (TAU) Diffusion Maps December 2012 8 / 40

Diffusion mapsSpectral embedding

1 = λ0 ≥ λ1 ≥ λ2 ≥ · · · ≥ λδ > 0

φ0 φ1 φ2 · · · φδn

Spectrum (eigenvalues) of the diffusion affinity A and its powers

x 7→ Φ(x) , [λ0φ0(x) , λ1φ1(x) , λ2φ2(x) , . . . , λδφδ(x)]T

Aviv Rotbart (TAU) Diffusion Maps December 2012 9 / 40

Diffusion mapsSpectral embedding

1 = λ0 ≥ λ1 ≥ λ2 ≥ · · · ≥ λδ > 0

φ0 φ1 φ2 · · · φδn

Spectrum (eigenvalues) of the diffusion affinity A and its powersx 7→ Φ(x) , [λ0φ0(x) , λ1φ1(x) , λ2φ2(x) , . . . , λδφδ(x)]T

Aviv Rotbart (TAU) Diffusion Maps December 2012 9 / 40

Diffusion mapsSpectral embedding

1 = λ0 ≥ λ1 ≥ λ2 ≥ · · · ≥ λδ > 0

φ0 φ1 φ2 · · · φδn

Spectrum (eigenvalues) of the diffusion affinity A and its powers

x 7→ Φ(x) , [λ0φ0(x) , λ1φ1(x) , λ2φ2(x) , . . . , λδφδ(x)]T

Aviv Rotbart (TAU) Diffusion Maps December 2012 9 / 40

Diffusion mapsEmbedded distances

embedded distance︷ ︸︸ ︷‖Φ(x)− Φ(y)‖ =

diffusion distance︷ ︸︸ ︷‖a(x , ·)− a(y , ·)‖

Aviv Rotbart (TAU) Diffusion Maps December 2012 10 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Gaussian-based diffusion mapsGaussian kernel:

Normalization =⇒ diffusion kernelSpectral analysis =⇒ map from M⊆ Rm to Rδ�m

Aviv Rotbart (TAU) Diffusion Maps December 2012 11 / 40

Numerical rank

λ0 · · · λδ︸ ︷︷ ︸numerical rank

Spectrum (eigenvalues) of the diffusion affinity A and its powers

Aviv Rotbart (TAU) Diffusion Maps December 2012 12 / 40

Numerical rank

λ0 · · · λδ︸ ︷︷ ︸numerical rank

Spectrum (eigenvalues) of the diffusion affinity A and its powers

Aviv Rotbart (TAU) Diffusion Maps December 2012 12 / 40

Numerical rank

λ0 · · · λδ︸ ︷︷ ︸numerical rank

Spectrum (eigenvalues) of the diffusion affinity A and its powers

Aviv Rotbart (TAU) Diffusion Maps December 2012 12 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training PhaseTesting Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training PhaseTesting Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100

∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reduction

Out-of-sample ext.

⊆ R3

∈ R3

Training Phase

Testing Phase

⊆ R100

∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100

∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reduction

Out-of-sample ext.

⊆ R3

∈ R3

Training Phase

Testing Phase

⊆ R100

∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Typical Scenario: Anomaly Detection

100 system parameters

every few minutes

Dim. reductionOut-of-sample ext.

⊆ R3∈ R3

Training Phase

Testing Phase

⊆ R100∈ R100

Parameter SignificanceCPU % High

Process # High... ...

Net I/O LowHD I/O Low

Aviv Rotbart (TAU) Diffusion Maps December 2012 13 / 40

Diffusion Maps

The Training stage is based on diffusion maps.Diffusion maps generate efficient representations of complexgeometric structures in lower dimensional spaces.Use the eigenfunctions of Markov matrices.Revels global geometric information from local structures.Nonlinear reduction of dimensionality.

Aviv Rotbart (TAU) Diffusion Maps December 2012 14 / 40

Diffusion Maps - example ILet’s take a look at this dataset and the two pairs of points on it.

Is it right to measure the distance between the points using euclidiandistance?

Aviv Rotbart (TAU) Diffusion Maps December 2012 15 / 40

Diffusion maps and distance

The distance between two points xi and xj can be defined as theprobability to land in xj after N random steps that start at xi .

(a) Euclidean distance (b) Random walk

This distance measures the connectivity between xi and xj in thedata, while taking into account all possible paths between xi and xj .

Aviv Rotbart (TAU) Diffusion Maps December 2012 16 / 40

Diffusion Maps

Let Γ = {x1, ..., xm} be a set of points in Rn.Construct a non-negative symmetric kernel on the data:

wε(x , y) = e−‖x−y‖2

2ε .

(c) Gaussian (d) Gaussian Kernel

Aviv Rotbart (TAU) Diffusion Maps December 2012 17 / 40

Diffusion MapsA general form of a kernel, with a parameter α that controls thenormalization type, given by

w (α)ε (x , y) =

wε (x , y)

qα (x) qα (y), q (x) =

∑y∈Γ

wε (x , y) (1)

If we set α to 0.5, the normalized kernel will look like:

(e) Gaussian Kernelwε (x , y)

(f) w (0.5)ε (x , y)

Aviv Rotbart (TAU) Diffusion Maps December 2012 18 / 40

Diffusion MapsThe transition matrix is defined by

pαε (x , y) =w (α)ε (x , y)

d (α)ε (y)

(2)

whered (α)ε (y) =

∑x∈Γ

w (α)ε (x , y) . (3)

The transition matrix p0.5ε (x , y) look like this:

Aviv Rotbart (TAU) Diffusion Maps December 2012 19 / 40

Diffusion Maps

The asymptotic behavior, ε −→ 0, generates differentinfinitesimal operator for different values of α:

1 α = 0 is the classical normalized graph Laplacian.2 α = 1 approximates the Laplace-Beltrami operator.3 α = 1

2 approximates the diffusion of the Fokker-Planck equation.

Aviv Rotbart (TAU) Diffusion Maps December 2012 20 / 40

Diffusion Maps

The transition matrix P is conjugate to a symmetric matrix A

a(x , y) =√

d(x)p(x , y)1√

d(y). (4)

A , D 12 PD− 1

2 .The symmetric matrix A has the following spectraldecomposition:

a(x , y) =∑k≥0

λkvk(x)vk(y). (5)

Aviv Rotbart (TAU) Diffusion Maps December 2012 21 / 40

Diffusion Maps

Since P is conjugate to A

φk = D 12 vk , ψk = D− 1

2 vk . (6)

where {φk} and {ψk} are the left and right eigenvectors of P.From the orthonormality of {vi} and Eq. 6 it follows that {φk}and {ψk} are biorthonormal which means 〈φl , ψk〉 = δlk .

This leads to the following eigendecomposition:

pt(x , y) =∑k≥0

λtkψk(x)φk(y). (7)

Because of the fast decay of the spectrum, only a few terms arerequired to achieve sufficient accuracy in the sum.

Aviv Rotbart (TAU) Diffusion Maps December 2012 22 / 40

Diffusion Maps

The family of diffusion maps {Ψt}, is defined by

Ψt(x) = (λt1ψ1(x), λt

2ψ2(x), λt3ψ3(x), · · · ) , (8)

Diffusion maps embeds the dataset into a Euclidean space.

Aviv Rotbart (TAU) Diffusion Maps December 2012 23 / 40

Diffusion Maps

Figure: Top Left: The eigenvectors: ψk(x). Top Right: The eigenvaluesλt

k . Bottom: The transition matrix.

Aviv Rotbart (TAU) Diffusion Maps December 2012 24 / 40

Diffusion Maps

The diffusion distance between two data points xi and xj is theweighted L2 distance

D2t (xi , xj) =

∑xl∈Γ

(pt(xi , xl)− pt(xl , xj))2

φ0(z). (9)

The diffusion distance can be expressed by the eigenvalues andeigenvector in the following way:

D2t (xi , xj) =

∑k≥1

λ2tk (ψk(xi)− ψk(xj))2. (10)

Aviv Rotbart (TAU) Diffusion Maps December 2012 25 / 40

Diffusion Maps

(a) Original dataset (b) Embedded dataset

Aviv Rotbart (TAU) Diffusion Maps December 2012 26 / 40

Diffusion Maps: A simple example

These are a few pictures of a vehicle that belong to a little guy Iknow:

Aviv Rotbart (TAU) Diffusion Maps December 2012 27 / 40

Diffusion Maps: A simple example

What is the main difference between the pictures?Will the Diffusion Maps algorithm find this?

Now apply the Diffusion Maps algorithm to the picture set infollowing way:

Reshaped each picture to be a vector in R600∗800∗3.Built a Gaussian kernel on this high dimensional data.Normalized it, and looked at the valued of the first eigenvector.

Aviv Rotbart (TAU) Diffusion Maps December 2012 28 / 40

The truck

These are the values of the first diffusion maps coordinate.

(c) Before sorting (d) After sorting

Aviv Rotbart (TAU) Diffusion Maps December 2012 29 / 40

The truck

Pictures sorted by the value on the first Diffusion map coordinate:

Aviv Rotbart (TAU) Diffusion Maps December 2012 30 / 40

Toy Exampleunordered

Aviv Rotbart (TAU) Diffusion Maps December 2012 31 / 40

Toy ExampleApplying DM

Aviv Rotbart (TAU) Diffusion Maps December 2012 32 / 40

Toy Exampleordered

Aviv Rotbart (TAU) Diffusion Maps December 2012 33 / 40

Embedded Space

Aviv Rotbart (TAU) Diffusion Maps December 2012 34 / 40

Embedded Space

Aviv Rotbart (TAU) Diffusion Maps December 2012 35 / 40

Embedded Space

Aviv Rotbart (TAU) Diffusion Maps December 2012 36 / 40

Embedded Space

Aviv Rotbart (TAU) Diffusion Maps December 2012 37 / 40

Embedded Space

Aviv Rotbart (TAU) Diffusion Maps December 2012 38 / 40

Acknowledgement

Thanks to Guy Wolf and Neta Rabin for the slides

Aviv Rotbart (TAU) Diffusion Maps December 2012 39 / 40

Questions

Questions?

Aviv Rotbart (TAU) Diffusion Maps December 2012 40 / 40

top related