
Manifold Statistics

Tom Fletcher

School of Computing, Scientific Computing and Imaging Institute

University of Utah

June 12, 2017

Probabilities on Manifolds

Least Squares and Maximum Likelihood

Geometric: Least squares

min_{model} ∑_{i=1}^N d(model, y_i)^2

Probabilistic: Maximum likelihood

max_{model} ∏_{i=1}^N p(y_i; model)

How about this "Gaussian" likelihood?

p(y_i; model) ∝ exp(−τ d(model, y_i)^2)


A Riemannian Normal Distribution

For a simple model with Fréchet mean:

p(y; µ, τ) = (1/C(µ, τ)) exp(−τ d(µ, y)^2)

Notation: y ∼ N_M(µ, τ^{-1})

Problem: the normalizing constant may depend on µ:

ln p(y; µ, τ) = −ln C(µ, τ) − τ d(µ, y)^2

Note: not a problem in R^d because C(µ, τ) ∝ τ^{-d/2}.


Riemannian Homogeneous Spaces

Definition: A Riemannian manifold M is called a Riemannian homogeneous space if its isometry group G acts transitively.

Theorem: If M is a homogeneous space, the normalizing constant for a normal distribution on M does not depend on µ.

Fletcher, IJCV 2013
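To make the theorem concrete, here is a minimal numerical check on the sphere S² (a homogeneous space), assuming NumPy: a Monte Carlo estimate of C(µ, τ) for two different base points µ, which should agree up to sampling error. The function name and setup are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_const_S2(mu, tau, n=200_000):
    """Monte Carlo estimate of C(µ, τ) = ∫_{S²} exp(−τ d(µ, y)²) dA(y)."""
    y = rng.normal(size=(n, 3))
    y /= np.linalg.norm(y, axis=1, keepdims=True)      # uniform samples on S²
    theta = np.arccos(np.clip(y @ mu, -1.0, 1.0))      # geodesic distance d(µ, y)
    return 4 * np.pi * np.exp(-tau * theta**2).mean()  # area × mean integrand

tau = 2.0
print(normal_const_S2(np.array([0.0, 0.0, 1.0]), tau))  # same value (up to
print(normal_const_S2(np.array([1.0, 0.0, 0.0]), tau))  # Monte Carlo error)
```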


Examples of Homogeneous Spaces

- Constant curvature spaces: Euclidean spaces, spheres, hyperbolic spaces
- Lie groups: SO(n) (rotations), SE(n) (rigid transforms), GL(n) (non-singular matrices), Aff(n) (affine transforms), etc.
- Stiefel manifolds: space of orthonormal k-frames in R^n
- Grassmann manifolds: space of k-dimensional subspaces in R^n
- Positive-definite symmetric matrices


Medians on Manifolds

Sensitivity of the Fréchet Mean to Outliers

[Figure: Fréchet mean of data with outliers added; panels for # outliers = 0, 2, 6, 12.]

Geometric Medians

Definition: The geometric median of a set of points x_1, …, x_N ∈ M is a point satisfying

m = arg min_{x∈M} ∑_i d(x, x_i)

Existence & Uniqueness of the GeometricMedian

Theorem: The geometric median exists and is unique if(a) the sectional curvatures of M are nonpositive, or if(b) the sectional curvatures of M are bounded above by∆ > 0 and diam(x1, . . . , xN) < π/(2

√∆).

Fletcher et al. (2008)

Weiszfeld Algorithm in R^n

m_{k+1} = m_k − α G_k,

G_k = (∑_{i∈I_k} (m_k − x_i)/‖x_i − m_k‖) · (∑_{i∈I_k} 1/‖x_i − m_k‖)^{-1},

0 < α ≤ 2,

where I_k indexes the points with x_i ≠ m_k.

Weiszfeld (1937), Ostresh (1978)
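A minimal NumPy sketch of this update (α = 1 recovers the classical Weiszfeld iteration); the helper name, tolerances, and test data are illustrative:

```python
import numpy as np

def weiszfeld(x, alpha=1.0, iters=100, eps=1e-12):
    """Geometric median of points x (shape (N, d)) via Weiszfeld updates."""
    m = x.mean(axis=0)                       # start at the arithmetic mean
    for _ in range(iters):
        diff = x - m
        dist = np.linalg.norm(diff, axis=1)
        keep = dist > eps                    # I_k: skip points equal to m
        w = 1.0 / dist[keep]
        step = (w[:, None] * diff[keep]).sum(0) / w.sum()   # = −G_k
        m = m + alpha * step                 # i.e. m_{k+1} = m_k − α G_k
        if np.linalg.norm(step) < eps:
            break
    return m

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])  # one outlier
print(weiszfeld(pts))        # stays near the three clustered points
print(pts.mean(axis=0))      # the mean is dragged toward the outlier
```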

Weiszfeld Algorithm for a Riemannian Manifold

m_{k+1} = Exp_{m_k}(α v_k),

v_k = (∑_{i∈I_k} Log_{m_k}(x_i)/d(m_k, x_i)) · (∑_{i∈I_k} 1/d(m_k, x_i))^{-1},

0 < α ≤ 2

Fletcher et al. (2008), Afsari (2011), Hartley et al. (2011)
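The same iteration on the sphere S², as a hedged NumPy sketch; sphere_exp and sphere_log are the standard closed-form exponential and log maps on S², written out here for illustration:

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere S²."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-15 else np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    """Log map on S²: tangent vector at p with ‖Log_p(q)‖ = d(p, q)."""
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return np.zeros(3) if nw < 1e-15 else theta * w / nw

def sphere_median(x, alpha=1.0, iters=100, eps=1e-12):
    """Geometric median on S² via the Riemannian Weiszfeld update."""
    m = x[0]
    for _ in range(iters):
        logs = np.array([sphere_log(m, xi) for xi in x])
        dist = np.linalg.norm(logs, axis=1)
        keep = dist > eps                        # I_k: skip points at m
        w = 1.0 / dist[keep]
        v = (w[:, None] * logs[keep]).sum(0) / w.sum()   # v_k
        m = sphere_exp(m, alpha * v)             # m_{k+1} = Exp_{m_k}(α v_k)
        if np.linalg.norm(v) < eps:
            break
    return m

x = np.array([[0.1, 0.0, 1.0], [0.0, 0.1, 1.0], [-0.1, 0.0, 1.0],
              [1.0, 0.0, 0.0]])                  # cluster + one outlier
x /= np.linalg.norm(x, axis=1, keepdims=True)
print(sphere_median(x))                          # stays near the cluster
```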

Geometric Median with Outliers

[Figure: geometric median of the planar data; panels for # outliers = 0, 2, 6, 12.]

Rotation Example: SO(3)

[Figure: rotation data with outliers; mean vs. median estimates for # outliers = 0, 2, 6, 12.]

Manifold Regression

Describing Shape Change

- How does shape change over time?
- Changes due to growth, aging, disease, etc.
- Example: 100 healthy subjects, 20–80 yrs. old
- We need regression of shape!

Regression on Manifolds

Given: manifold data y_i ∈ M and scalar data x_i ∈ R.

Want: a relationship f : R → M, i.e., "how x explains y".

[Figure: data points y_i on M and the fitted curve x ↦ f(x).]


Parametric vs. Nonparametric Regression

[Figure: scatter plot with a fitted line (Linear Regression) and scatter plot with a smooth fitted curve (Kernel Regression).]

Euclidean Case: Multiple Linear Regression

Regression function f : R → R^n:

f(X) = α + Xβ,  α, β ∈ R^n

The regression model becomes:

Y = α + Xβ + ε,  ε ∼ N(0, σ^2)

Least-squares solution:

(α, β) = arg min_{(α,β)} ∑_{i=1}^N ‖y_i − α − x_i β‖^2
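For reference before the manifold case, the Euclidean fit in a short NumPy sketch; the synthetic data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 3
x = rng.uniform(0, 1, N)                        # scalar covariates x_i
alpha, beta = rng.normal(size=n), rng.normal(size=n)
y = alpha + np.outer(x, beta) + 0.1 * rng.normal(size=(N, n))  # Y = α + Xβ + ε

A = np.column_stack([np.ones(N), x])            # design matrix [1, x_i]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares solution
print("alpha_hat:", coef[0])                    # ≈ alpha
print("beta_hat: ", coef[1])                    # ≈ beta
```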


Geodesic Regression Function

- Regression function is a geodesic γ : [0, 1] → M
- Parameterized by:
  - Intercept: initial position γ(0) = p
  - Slope: initial velocity γ′(0) = v
- Given by the exponential map: γ(x) = Exp(p, xv)

[Figure: a tangent vector X ∈ T_pM mapped to Exp_p(X) on M.]

Geodesic Regression

- Generalization of linear regression.
- Find the best-fitting geodesic to the data (x_i, y_i).
- Least-squares problem:

E(p, v) = (1/2) ∑_{i=1}^N d(Exp(p, x_i v), y_i)^2

(p, v) = arg min_{(p,v)∈TM} E(p, v)

Fletcher (2011); Niethammer et al. (2011)

[Figure: the fitted geodesic f(x) = Exp(p, xv) on M, with intercept p and velocity v, shown with the data y_i.]

Derivative of Exp: Jacobi Fields

Jacobi field: J″(x) + R(J(x), f′(x)) f′(x) = 0

Initial conditions: J(0) = u_1, J′(0) = u_2

d Exp(p, v) · (u_1, u_2) = J(1)

[Figure: Jacobi fields J(x) along the geodesic from p with velocity v, illustrating d_p Exp (variation u_1 of the base point) and d_v Exp (variation u_2 of the velocity).]


Gradient Descent for Regression

E(p, v) = (1/2) ∑_{i=1}^N d(Exp(p, x_i v), y_i)^2

∇_p E(p, v) = −∑_{i=1}^N d_p Exp(p, x_i v)^T Log(Exp(p, x_i v), y_i)

∇_v E(p, v) = −∑_{i=1}^N x_i d_v Exp(p, x_i v)^T Log(Exp(p, x_i v), y_i)
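A hedged NumPy sketch of this gradient descent on S². For simplicity it approximates the adjoint differentials d_p Exp^T and d_v Exp^T by parallel transport of the residual Log vector back to T_pM, which is exact only in flat space; the exact gradients would use the Jacobi fields above. Names, step size, and data are illustrative.

```python
import numpy as np

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-15 else np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return np.zeros(3) if nw < 1e-15 else theta * w / nw

def transport(p, q, w):
    """Parallel transport of w ∈ T_qS² to T_pS² along the connecting geodesic."""
    lqp, lpq = sphere_log(q, p), sphere_log(p, q)
    d2 = lqp @ lqp
    return w if d2 < 1e-15 else w - (lqp @ w) / d2 * (lqp + lpq)

def geodesic_regression(x, y, steps=1000, lr=0.1):
    """Geodesic regression on S² by approximate gradient descent."""
    p, v = y[0].copy(), np.zeros(3)
    for _ in range(steps):
        dp, dv = np.zeros(3), np.zeros(3)
        for xi, yi in zip(x, y):
            q = sphere_exp(p, xi * v)
            r = transport(p, q, sphere_log(q, yi))  # residual pulled back to T_pM
            dp += r / len(x)                        # descent direction for p
            dv += xi * r / len(x)                   # descent direction for v
        p_new = sphere_exp(p, lr * dp)
        v = transport(p_new, p, v + lr * dv)        # keep v tangent at the new p
        p = p_new
    return p, v

rng = np.random.default_rng(1)
p0, v0 = np.array([0.0, 0.0, 1.0]), np.array([0.5, 0.0, 0.0])
x = rng.uniform(0, 1, 20)
y = np.array([sphere_exp(p0, xi * v0) for xi in x])
y += 0.02 * rng.normal(size=y.shape)
y /= np.linalg.norm(y, axis=1, keepdims=True)
p_hat, v_hat = geodesic_regression(x, y)
print(p_hat, v_hat)     # should be close to p0 and v0
```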

Experiment: Corpus Callosum

[Figure: corpus callosum outlines represented by corresponding boundary points.]

- The corpus callosum is the main interhemispheric white matter connection
- Known volume decrease with aging
- 32 corpora callosa segmented from OASIS MRI data
- Point correspondences generated using ShapeWorks (www.sci.utah.edu/software/)

Corpus Callosum Data

Age range: 20–90 years

R^2 Statistic

Define the R^2 statistic as the percentage of variance explained:

R^2 = (variance along geodesic)/(total variance of data) = var(x_i) ‖v‖^2 / ∑_i d(ȳ, y_i)^2,

where ȳ is the Fréchet mean:

ȳ = arg min_{y∈M} ∑_i d(y, y_i)^2

Hypothesis Testing of R^2

- A parametric form for the sampling distribution of R^2 is difficult
- Instead, use a nonparametric permutation test
- Null hypothesis: no relationship between X and Y
- Permute the order of the x_i and compute R^2_k for k = 1, …, S
- Count the fraction of the R^2_k that are larger than R^2:

p = |{R^2_k > R^2}| / S
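A minimal NumPy sketch of this permutation test, using ordinary Euclidean regression R^2 as a stand-in for the geodesic R^2 (the manifold version would substitute the geodesic fit and Fréchet variance); data and names are illustrative:

```python
import numpy as np

def r_squared(x, y):
    """R² of a simple least-squares line fit (Euclidean stand-in)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def permutation_test(x, y, S=10_000, seed=0):
    """p-value: fraction of permuted R²_k exceeding the observed R²."""
    rng = np.random.default_rng(seed)
    r2 = r_squared(x, y)
    count = sum(r_squared(rng.permutation(x), y) > r2 for _ in range(S))
    return r2, count / S

rng = np.random.default_rng(1)
x = rng.uniform(20, 90, 32)                      # ages, as in the study
y = 0.01 * x + rng.normal(scale=0.5, size=32)    # weak trend + noise
print(permutation_test(x, y, S=2000))            # (R², p-value)
```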

Hypothesis Testing: Corpus Callosum

- R^2 = 0.12
- The low R^2 indicates that age does not explain a high percentage of the variability seen in corpus callosum shape
- Ran 10,000 permutations, computing R^2_k
- p = 0.009
- The low p-value indicates that the trend in corpus callosum shape due to age is unlikely to be due to random chance


Riemannian Polynomials

(∇_{dγ/dt})^k (dγ/dt)(t) = 0

Initial conditions:

γ(0) ∈ M,

(dγ/dt)(0) ∈ T_{γ(0)}M,

(∇_{dγ/dt})(dγ/dt)(0) ∈ T_{γ(0)}M,

⋮

(∇_{dγ/dt})^{k−1}(dγ/dt)(0) ∈ T_{γ(0)}M.

Hinkle et al. (2012, 2013)
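To make the definition concrete, here is a hedged sketch that integrates a Riemannian quadratic (k = 2) on S² by forward Euler, using the ambient-space identity that the covariant derivative along γ is the tangential projection of the ordinary derivative; the step size and initial conditions are illustrative.

```python
import numpy as np

def riemannian_quadratic(p, v1, v2, T=1.0, n=2000):
    """Integrate (∇_{dγ/dt})² dγ/dt = 0 on S² by forward Euler.

    Ambient identity on S²: for w(t) ∈ T_γS², dw/dt = ∇w − (dγ/dt · w) γ.
    Here dγ/dt = v1, ∇v1 = v2, and ∇v2 = 0.
    """
    dt = T / n
    g, w1, w2 = p.copy(), v1.copy(), v2.copy()
    traj = [g.copy()]
    for _ in range(n):
        g_new = g + dt * w1
        w1_new = w1 + dt * (w2 - (w1 @ w1) * g)
        w2_new = w2 + dt * (-(w1 @ w2) * g)
        g = g_new / np.linalg.norm(g_new)        # re-project to the sphere
        w1 = w1_new - (w1_new @ g) * g           # re-project to T_gS²
        w2 = w2_new - (w2_new @ g) * g
        traj.append(g.copy())
    return np.array(traj)

p = np.array([0.0, 0.0, 1.0])
v1 = np.array([1.0, 0.0, 0.0])                   # initial velocity
v2 = np.array([0.0, 0.5, 0.0])                   # initial "acceleration"
curve = riemannian_quadratic(p, v1, v2)
print(curve[-1])                                 # with v2 = 0 this is Exp(p, v1)
```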

Riemannian Polynomial Regression

Set up the energy with Lagrange multipliers λ_i(t):

E(γ, {v_i}, {λ_i}) = (1/N) ∑_{j=1}^N d(γ(t_j), J_j)^2
  + ∫_0^T ⟨λ_0(t), (dγ/dt)(t) − v_1(t)⟩ dt
  + ∑_{i=1}^{k−1} ∫_0^T ⟨λ_i(t), ∇_{dγ/dt} v_i(t) − v_{i+1}(t)⟩ dt
  + ∫_0^T ⟨λ_k(t), ∇_{dγ/dt} v_k(t)⟩ dt.

Adjoint Equations

∇_{dγ/dt} λ_i(t) = −λ_{i−1}(t),  i = 1, …, k

∇_{dγ/dt} λ_0(t) = −∑_{i=1}^k R(v_i(t), λ_i(t)) v_1(t).

Corpus Callosum Polynomial Regression

Kernel Regression (Nadaraya-Watson)

Define the regression function through weighted averaging:

f̂(t) = ∑_{i=1}^N w_i(t) Y_i

w_i(t) = K_h(t − T_i) / ∑_{j=1}^N K_h(t − T_j)
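A minimal NumPy sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the bandwidth and data are illustrative:

```python
import numpy as np

def nadaraya_watson(t, T, Y, h):
    """Kernel-weighted average f̂(t) = Σ w_i(t) Y_i with Gaussian K_h."""
    K = np.exp(-0.5 * ((t[:, None] - T[None, :]) / h) ** 2)  # (len(t), N)
    w = K / K.sum(axis=1, keepdims=True)                     # weights w_i(t)
    return w @ Y

rng = np.random.default_rng(0)
T = np.sort(rng.uniform(0, 1, 40))               # observation times T_i
Y = np.sin(2 * np.pi * T) + 0.2 * rng.normal(size=40)
t = np.linspace(0, 1, 5)
print(nadaraya_watson(t, T, Y, h=0.1))           # smoothed estimates at t
```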

Example: Gray Matter Volume

[Figure: kernel K_h(t − s) of bandwidth h centered at t, weighting observations at times t_i with the weights w_i(t) above.]

Manifold Kernel Regression

[Figure: Fréchet weighted average m̂_h(t) of points on M.]

Using the Fréchet weighted average:

m̂_h(t) = arg min_{y∈M} ∑_{i=1}^N w_i(t) d(y, Y_i)^2

Davis et al., ICCV 2007
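A hedged sketch of the weighted Fréchet average on S² by fixed-point iteration in the tangent space (sphere_exp/sphere_log as in the earlier Weiszfeld sketch, repeated so the block stands alone); in manifold kernel regression the weights would come from the Nadaraya-Watson formula above:

```python
import numpy as np

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-15 else np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p, q):
    w = q - (p @ q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return np.zeros(3) if nw < 1e-15 else theta * w / nw

def frechet_weighted_mean(Y, w, iters=50, tol=1e-10):
    """Minimize Σ w_i d(y, Y_i)² by iterating y ← Exp_y(Σ w_i Log_y(Y_i))."""
    y = Y[np.argmax(w)]                      # start at the heaviest point
    w = w / w.sum()
    for _ in range(iters):
        step = sum(wi * sphere_log(y, yi) for wi, yi in zip(w, Y))
        y = sphere_exp(y, step)
        if np.linalg.norm(step) < tol:
            break
    return y

Y = np.array([[0.2, 0.0, 0.98], [0.0, 0.2, 0.98], [-0.2, 0.1, 0.97]])
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
print(frechet_weighted_mean(Y, np.array([0.5, 0.3, 0.2])))
```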

Longitudinal Shape Analysis

OASIS data:

11 healthy subjects

12 dementia subjects

3 images over 6 years

Goal: Understand how individuals change over time.

Why Longitudinal?

[Figure sequence: independent variable t vs. dependent variable y, built up over several slides to contrast the cross-sectional trend with individual longitudinal trajectories.]

Hierarchical Geodesic Models

[Figure: group-level geodesic on M with individual trajectories (p_i, u_i) and observations y_ij.]

- Group level: average geodesic trend (α, β)
- Individual level: trajectory for the ith subject (p_i, u_i)

Muralidharan, CVPR 2012; Singh, IPMI 2013

Comparing Geodesics: Sasaki Metrics

What is the distance between two geodesic trends?

Define a distance between their initial conditions:

d_S((p_1, u_1), (p_2, u_2))

[Figure: Sasaki geodesic on the tangent bundle of the sphere.]


Results on Longitudinal Corpus Callosum

[Figure: non-demented and demented longitudinal corpus callosum trends.]

Permutation test:

Variable      T^2     p-value
Intercept α   0.734   0.248
Slope β       0.887   0.027

Dimensionality Reduction: Principal Geodesic Analysis

Principal Geodesic Analysis

[Figure: linear statistics (PCA) vs. curved statistics (PGA).]


PGA of Kidney

[Figure: first three PGA modes of kidney shape variation (Mode 1, Mode 2, Mode 3).]

Probabilistic Principal Geodesic Analysis

[Figure: graphical model with parameters W, µ and latent variables x_i.]

y | x ∼ N_M(Exp(µ, z), τ^{-1})

z = WΛx

- W is an n × k matrix: an orthonormal k-frame in T_µM
- Λ is a k × k diagonal matrix
- x ∼ N(0, I) are latent variables

Zhang & Fletcher, NIPS 2013

Generalization of PPCA (Roweis, 1998; Tipping & Bishop, 1999)
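A hedged sketch of the generative model on S², assuming NumPy; the manifold noise y | x ∼ N_M(·, τ^{-1}) is approximated here by a tangent-space Gaussian pushed through Exp (the exact Riemannian normal would require sampling with the normalizing constant C(µ, τ)), and all parameter values are illustrative:

```python
import numpy as np

def sphere_exp(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-15 else np.cos(nv) * p + np.sin(nv) * v / nv

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])                       # base point µ on S²
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # orthonormal 2-frame in T_µS²
Lam = np.diag([0.6, 0.2])                            # diagonal Λ

def sample_ppga(n, tau=100.0):
    """Draw y_i ≈ Exp(q_i, ε_i) with q_i = Exp(µ, WΛx_i), x_i ~ N(0, I)."""
    ys = []
    for _ in range(n):
        x = rng.normal(size=2)                       # latent variables
        q = sphere_exp(mu, W @ (Lam @ x))            # mean point Exp(µ, z)
        eps = rng.normal(scale=tau ** -0.5, size=3)
        eps -= (eps @ q) * q                         # project noise to T_qS²
        ys.append(sphere_exp(q, eps))                # approximate N_M noise
    return np.array(ys)

print(sample_ppga(3))
```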

PPGA of Corpus Callosum

[Figure: corpus callosum shape variation along Principal Geodesic 1 and Principal Geodesic 2, shown at −3λ_i, −1.5λ_i, 0, +1.5λ_i, +3λ_i for each mode.]
