Manifold Statistics
Tom Fletcher
School of Computing, Scientific Computing and Imaging Institute
University of Utah
June 12, 2017
Probabilities on Manifolds
Least Squares and Maximum Likelihood
Geometric: Least squares
$\min_{\text{model}} \sum_{i=1}^{N} d(\text{model}, y_i)^2$
Probabilistic: Maximum likelihood
$\max_{\text{model}} \prod_{i=1}^{N} p(y_i; \text{model})$
How about this “Gaussian” likelihood?
$p(y_i; \text{model}) \propto \exp\left(-\tau\, d(\text{model}, y_i)^2\right)$
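A minimal numerical check of this correspondence, assuming the Euclidean case M = R with d(a, b) = |a - b| and a fixed τ (the data and the candidate grid are hypothetical): the least-squares objective and the "Gaussian" log-likelihood select the same model.

```python
import numpy as np

def sum_sq_dist(model, y):
    """Least-squares objective: sum_i d(model, y_i)^2 with d(a, b) = |a - b|."""
    return np.sum((y - model) ** 2)

def log_likelihood(model, y, tau=1.0):
    """Log of prod_i exp(-tau * d(model, y_i)^2), dropping the constant normalizer."""
    return -tau * sum_sq_dist(model, y)

# Hypothetical data, for illustration only.
y = np.array([0.9, 1.4, 2.1, 2.6, 3.0])
candidates = np.linspace(0.0, 4.0, 401)

# Evaluate both objectives on the same grid of candidate models.
ls_best = candidates[np.argmin([sum_sq_dist(m, y) for m in candidates])]
ml_best = candidates[np.argmax([log_likelihood(m, y) for m in candidates])]
```

Both searches land on the sample mean here, because the normalizer does not depend on the model in R. On a general manifold that last point is exactly what needs checking.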
A Riemannian Normal Distribution
For a simple model with Fréchet mean:
$p(y; \mu, \tau) = \frac{1}{C(\mu, \tau)} \exp\left(-\tau\, d(\mu, y)^2\right)$
Notation: $y \sim N_M(\mu, \tau^{-1})$
Problem: Normalizing constant may depend on $\mu$:
$\ln p(y; \mu, \tau) = -\ln C(\mu, \tau) - \tau\, d(\mu, y)^2$
Note: not a problem in $\mathbb{R}^d$ because $C(\mu, \tau) \propto \tau^{-d/2}$.
Riemannian Homogeneous Spaces
Definition: A Riemannian manifold M is called a Riemannian homogeneous space if its isometry group G acts transitively.
Theorem: If M is a homogeneous space, the normalizing constant for a normal distribution on M does not depend on µ.
Fletcher, IJCV 2013
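The theorem can be checked numerically on one homogeneous space. The sketch below assumes the unit sphere S^2 with arc-length distance and estimates C(µ, τ) by a midpoint rule on a latitude-longitude grid. The grid sizes and the value of τ are arbitrary illustrative choices.

```python
import numpy as np

def normalizing_constant(mu, tau, n_theta=400, n_phi=800):
    """Midpoint-rule estimate of C(mu, tau) = integral over S^2 of exp(-tau * d(mu, y)^2)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta      # colatitude midpoints
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi      # longitude midpoints
    T, P = np.meshgrid(theta, phi, indexing="ij")
    # Points y on the unit sphere, shape (n_theta, n_phi, 3).
    Y = np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], axis=-1)
    # Geodesic distance on the unit sphere is the angle between mu and y.
    dist = np.arccos(np.clip(Y @ mu, -1.0, 1.0))
    # Area element sin(theta) dtheta dphi.
    area = np.sin(T) * (np.pi / n_theta) * (2.0 * np.pi / n_phi)
    return np.sum(np.exp(-tau * dist ** 2) * area)

tau = 2.0
c_north = normalizing_constant(np.array([0.0, 0.0, 1.0]), tau)
c_other = normalizing_constant(np.array([1.0, 0.0, 0.0]), tau)
```

The two estimates agree up to quadrature error, as expected when rotations carry any µ to any other.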
Examples of Homogeneous Spaces
- Constant curvature spaces: Euclidean spaces, spheres, hyperbolic spaces
- Lie groups: SO(n) (rotations), SE(n) (rigid transforms), GL(n) (non-singular matrices), Aff(n) (affine transforms), etc.
- Stiefel manifolds: space of orthonormal k-frames in $\mathbb{R}^n$
- Grassmann manifolds: space of k-dimensional subspaces in $\mathbb{R}^n$
- Positive-definite symmetric matrices
Medians on Manifolds
Sensitivity of the Fréchet Mean to Outliers
[Figure: data and outliers; Fréchet mean shown for # outliers = 0, 2, 6, 12]
Geometric Medians
Definition: The geometric median of a set of points $x_1, \ldots, x_N \in M$ is a point satisfying
$m = \arg\min_{x \in M} \sum_i d(x, x_i)$
Existence & Uniqueness of the Geometric Median
Theorem: The geometric median exists and is unique if (a) the sectional curvatures of M are nonpositive, or if (b) the sectional curvatures of M are bounded above by $\Delta > 0$ and $\mathrm{diam}(x_1, \ldots, x_N) < \pi/(2\sqrt{\Delta})$.
Fletcher et al. (2008)
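Weiszfeld Algorithm in $\mathbb{R}^n$
$m_{k+1} = m_k - \alpha G_k$,
$G_k = \left(\sum_{i \in I_k} \frac{m_k - x_i}{\|x_i - m_k\|}\right) \cdot \left(\sum_{i \in I_k} \frac{1}{\|x_i - m_k\|}\right)^{-1}$,
$0 < \alpha \le 2$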
Weiszfeld (1937), Ostresh (1978)
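A minimal sketch of this iteration in NumPy. It uses the gradient-direction form of G_k with numerator m_k - x_i, so that α = 1 reduces to the classical Weiszfeld weighted average. Taking I_k to be the indices with x_i ≠ m_k is an assumption, as are the function name, the stopping rule, and the test data.

```python
import numpy as np

def weiszfeld(points, alpha=1.0, max_iter=1000, tol=1e-12, init=None):
    """Geometric median of the rows of `points` via m_{k+1} = m_k - alpha * G_k."""
    x = np.asarray(points, dtype=float)
    m = x.mean(axis=0) if init is None else np.asarray(init, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(x - m, axis=1)
        keep = dist > 1e-15              # I_k: skip points that coincide with m_k
        if not np.any(keep):
            break
        w = 1.0 / dist[keep]
        # G_k = (sum_i (m_k - x_i) / ||x_i - m_k||) * (sum_i 1 / ||x_i - m_k||)^{-1}
        G = (w[:, None] * (m - x[keep])).sum(axis=0) / w.sum()
        m_next = m - alpha * G
        if np.linalg.norm(m_next - m) < tol:
            m = m_next
            break
        m = m_next
    return m

# Hypothetical data: four corners of the unit square plus one outlier.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]], dtype=float)
median = weiszfeld(pts)
mean = pts.mean(axis=0)
```

The single outlier drags the mean to (2.4, 2.4), outside the square, while the median stays inside it on the diagonal.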
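Weiszfeld Algorithm for a Riemannian Manifold
$m_{k+1} = \mathrm{Exp}_{m_k}(\alpha v_k)$,
$v_k = \left(\sum_{i \in I_k} \frac{\mathrm{Log}_{m_k}(x_i)}{d(m_k, x_i)}\right) \cdot \left(\sum_{i \in I_k} \frac{1}{d(m_k, x_i)}\right)^{-1}$,
$0 < \alpha \le 2$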
Fletcher et al. (2008), Afsari (2011), Hartley et al. (2011)
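The same iteration can be sketched on a concrete manifold. The example assumes the unit sphere S^2 embedded in R^3, where Exp and Log have closed forms. The helper names, the choice of I_k as the points different from m_k, and the test data are illustrative assumptions.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p with velocity v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-15:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p toward q with length d(p, q).
    Returns zero when q equals p (and for antipodal q, where Log is not unique)."""
    w = q - np.dot(p, q) * p                    # part of q tangent to the sphere at p
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-15:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * w / norm_w

def sphere_median(points, init, alpha=1.0, max_iter=500, tol=1e-12):
    """Weiszfeld iteration m_{k+1} = Exp_{m_k}(alpha * v_k) on the unit sphere."""
    m = init / np.linalg.norm(init)
    for _ in range(max_iter):
        logs = np.array([sphere_log(m, x) for x in points])
        dist = np.linalg.norm(logs, axis=1)     # d(m_k, x_i) = ||Log_{m_k}(x_i)||
        keep = dist > 1e-15                     # I_k: skip points equal to m_k
        if not np.any(keep):
            break
        w = 1.0 / dist[keep]
        v = (w[:, None] * logs[keep]).sum(axis=0) / w.sum()
        if np.linalg.norm(v) < tol:
            break
        m = sphere_exp(m, alpha * v)
        m = m / np.linalg.norm(m)               # guard against round-off drift
    return m

# Hypothetical data: four points at colatitude 0.3, evenly spaced around the north pole.
angles = np.arange(4) * np.pi / 2
pts = np.stack([np.sin(0.3) * np.cos(angles),
                np.sin(0.3) * np.sin(angles),
                np.full(4, np.cos(0.3))], axis=1)
m = sphere_median(pts, init=pts[0])
```

Starting from one of the data points, the iterates move along a meridian and settle at the north pole, which is the median by symmetry.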
Geometric Median with Outliers
[Figure: geometric median shown for # outliers = 0, 2, 6, 12]
Rotation Example: SO(3)
[Figure: rotation data and outliers; mean and median compared for # outliers = 0, 2, 6, 12]
Manifold Regression
Describing Shape Change
- How does shape change over time?
- Changes due to growth, aging, disease, etc.
- Example: 100 healthy subjects, 20–80 yrs. old
- We need regression of shape!
Regression on Manifolds
Given:
Manifold data: $y_i \in M$
Scalar data: $x_i \in \mathbb{R}$
Want:
Relationship $f : \mathbb{R} \to M$, “how x explains y”
[Figure: data points $y_i$ on M and the curve $f(x)$]
Parametric vs. Nonparametric Regression
[Figure: two scatter plots of y against x. Panels: Linear Regression, Kernel Regression]
Euclidean Case: Multiple Linear Regression
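Regression function $f : \mathbb{R} \to \mathbb{R}^n$
$f(X) = \alpha + X\beta, \quad \alpha, \beta \in \mathbb{R}^n$
Regression model becomes:
$Y = \alpha + X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$
Least-squares solution:
$(\hat\alpha, \hat\beta) = \arg\min_{(\alpha, \beta)} \sum_{i=1}^{N} \|y_i - \alpha - x_i \beta\|^2$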
Geodesic Regression Function
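- Regression function is a geodesic $\gamma : [0, 1] \to M$
- Parameterized by
  - Intercept: initial position $\gamma(0) = p$
  - Slope: initial velocity $\gamma'(0) = v$
- Given by exponential map:
$\gamma(x) = \mathrm{Exp}(p, x v)$
[Figure: tangent space $T_pM$ at p, a tangent vector X, and the point $\mathrm{Exp}_p(X)$ on M]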
Geodesic Regression
- Generalization of linear regression.
- Find best fitting geodesic to the data $(x_i, y_i)$.
- Least-squares problem:
$E(p, v) = \frac{1}{2} \sum_{i=1}^{N} d(\mathrm{Exp}(p, x_i v), y_i)^2$
$(\hat p, \hat v) = \arg\min_{(p, v) \in TM} E(p, v)$
Fletcher (2011); Niethammer et al. (2011)
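To make the energy concrete, the sketch below evaluates E(p, v) on the unit sphere S^2, which is an assumed example manifold. The data are hypothetical and noise-free, generated from a known geodesic, so the true parameters give zero energy and a mis-scaled velocity does not. This only evaluates the objective and does not fit anything.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-15:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def sphere_dist(p, q):
    """Geodesic (arc-length) distance on the unit sphere."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def geodesic_energy(p, v, xs, ys):
    """E(p, v) = 1/2 * sum_i d(Exp(p, x_i v), y_i)^2."""
    return 0.5 * sum(sphere_dist(sphere_exp(p, x * v), y) ** 2
                     for x, y in zip(xs, ys))

# Hypothetical noise-free data generated from a known geodesic.
p_true = np.array([0.0, 0.0, 1.0])
v_true = np.array([0.8, 0.0, 0.0])       # tangent at p_true (orthogonal to it)
xs = np.linspace(0.0, 1.0, 6)
ys = np.array([sphere_exp(p_true, x * v_true) for x in xs])

e_true = geodesic_energy(p_true, v_true, xs, ys)          # data lie on this geodesic
e_wrong = geodesic_energy(p_true, 0.5 * v_true, xs, ys)   # geodesic that moves too slowly
```

With half the true velocity, each prediction lags its data point by 0.4 x_i along the same great circle, which gives E = 0.176.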
Geodesic Regression
[Figure: data $y_i$ on M and the fitted geodesic $f(x) = \mathrm{Exp}(p, x v)$ with base point p and initial velocity v]
Fletcher (2011); Niethammer et al. (2011)
Derivative of Exp: Jacobi Fields
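Jacobi Field: $J''(x) + R(J(x), f'(x))\, f'(x) = 0$
Initial Conditions: $J(0) = u_1$, $J'(0) = u_2$
$d\,\mathrm{Exp}(p, v) \cdot (u_1, u_2) = J(1)$
[Figure: two panels on M. $d_p \mathrm{Exp}$: Jacobi field J(x) from varying the base point p by $u_1$. $d_v \mathrm{Exp}$: Jacobi field J(x) from varying the velocity v by $u_2$]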
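As a sanity check of the $d_v \mathrm{Exp}$ case, assume the unit sphere (constant curvature 1) and a perturbation $u_2$ orthogonal to both p and v. The Jacobi equation then reduces to $J'' + \|v\|^2 J = 0$, so with $J(0) = 0$, $J'(0) = u_2$ one gets $J(1) = (\sin\|v\| / \|v\|)\, u_2$. The sketch compares this closed form with a central finite difference of Exp. The specific vectors and step size are illustrative.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-15:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

p = np.array([0.0, 0.0, 1.0])      # base point
v = np.array([0.9, 0.0, 0.0])      # initial velocity, tangent at p
u2 = np.array([0.0, 1.0, 0.0])     # perturbation of v, orthogonal to both p and v
L = np.linalg.norm(v)

# Central finite-difference estimate of d_v Exp(p, v) applied to u2.
eps = 1e-6
fd = (sphere_exp(p, v + eps * u2) - sphere_exp(p, v - eps * u2)) / (2 * eps)

# Closed-form Jacobi field at x = 1 with J(0) = 0, J'(0) = u2 on the unit sphere.
jacobi = np.sin(L) / L * u2
```

The factor sin(L)/L is smaller than 1: geodesics leaving p in nearby directions spread more slowly on the sphere than in flat space.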
Gradient Descent for Regression
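$E(p, v) = \frac{1}{2} \sum_{i=1}^{N} d(\mathrm{Exp}(p, x_i v), y_i)^2$
$\nabla_p E(p, v) = -\sum_{i=1}^{N} d_p \mathrm{Exp}(p, x_i v)^T\, \mathrm{Log}(\mathrm{Exp}(p, x_i v), y_i)$
$\nabla_v E(p, v) = -\sum_{i=1}^{N} x_i\, d_v \mathrm{Exp}(p, x_i v)^T\, \mathrm{Log}(\mathrm{Exp}(p, x_i v), y_i)$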
Experiment: Corpus Callosum
[Figure: corpus callosum]
- The corpus callosum is the main interhemispheric white matter connection
- Known volume decrease with aging
- 32 corpora callosa segmented from OASIS MRI data
- Point correspondences generated using ShapeWorks www.sci.utah.edu/software/
Corpus Callosum Data
Age range: 20 - 90 years
$R^2$ Statistic
Define $R^2$ statistic as percentage of variance explained:
$R^2 = \frac{\text{variance along geodesic}}{\text{total variance of data}} = \frac{\mathrm{var}(x_i)\, \|v\|^2}{\sum_i d(\bar y, y_i)^2}$,
where $\bar y$ is the Fréchet mean:
$\bar y = \arg\min_{y \in M} \sum_i d(y, y_i)^2$
Hypothesis Testing of $R^2$
- Parametric form for sampling distribution of $R^2$ is difficult
- Instead use a nonparametric permutation test
- Null hypothesis: no relationship between X and Y
- Permute the order of the $x_i$ and compute $R^2_k$ for $k = 1, \ldots, S$
- Count percentage of $R^2_k$ that are larger than $R^2$:
$p = \frac{|\{R^2_k > R^2\}|}{S}$
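The permutation procedure above can be sketched as follows. To keep the example short it uses a Euclidean stand-in for the statistic (squared correlation between scalar x and y) rather than the manifold $R^2$. The function names, the number of permutations, and the data are hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """Euclidean stand-in for R^2: squared correlation between scalar x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

def permutation_p_value(x, y, statistic, n_perm=1000, seed=0):
    """Return the observed statistic and the fraction of permuted statistics exceeding it."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    count = 0
    for _ in range(n_perm):
        x_perm = rng.permutation(x)          # permuting x breaks any x-y relationship
        if statistic(x_perm, y) > observed:
            count += 1
    return observed, count / n_perm

# Hypothetical data with a strong linear trend.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 0.1 * rng.standard_normal(30)
r2, p = permutation_p_value(x, y, r_squared)
```

Passing the statistic as a function means a manifold $R^2$ could be swapped in without changing the permutation loop.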
Hypothesis Testing: Corpus Callosum
- $R^2 = 0.12$
- Low $R^2$ indicates that age does not explain a high percentage of the variability seen in corpus callosum shape
- Ran 10,000 permutations, computing $R^2_k$
- $p = 0.009$
- Low p value indicates that the trend seen in corpus callosum shape due to age is unlikely to be by random chance
Riemannian Polynomials
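$\left(\nabla_{\frac{d}{dt}\gamma}\right)^{k} \frac{d}{dt}\gamma(t) = 0$
Initial conditions:
$\gamma(0) \in M$,
$\frac{d}{dt}\gamma(0) \in T_{\gamma(0)}M$,
$\left(\nabla_{\frac{d}{dt}\gamma}\right) \frac{d}{dt}\gamma(0) \in T_{\gamma(0)}M$,
$\vdots$
$\left(\nabla_{\frac{d}{dt}\gamma}\right)^{k-1} \frac{d}{dt}\gamma(0) \in T_{\gamma(0)}M$.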
Hinkle et al. (2012, 2013)
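Riemannian Polynomial Regression
Setup energy with Lagrange multipliers $\lambda_i(t)$:
$$\begin{aligned}
E(\gamma, \{v_i\}, \{\lambda_i\}) = {} & \frac{1}{N} \sum_{j=1}^{N} d(\gamma(t_j), J_j)^2 \\
& + \int_0^T \left\langle \lambda_0(t), \tfrac{d}{dt}\gamma(t) - v_1(t) \right\rangle dt \\
& + \sum_{i=1}^{k-1} \int_0^T \left\langle \lambda_i(t), \nabla_{\frac{d}{dt}\gamma} v_i(t) - v_{i+1}(t) \right\rangle dt \\
& + \int_0^T \left\langle \lambda_k(t), \nabla_{\frac{d}{dt}\gamma} v_k(t) \right\rangle dt.
\end{aligned}$$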
Adjoint Equations
$\nabla_{\frac{d}{dt}\gamma} \lambda_i(t) = -\lambda_{i-1}(t), \quad i = 1, \ldots, k$
$\nabla_{\frac{d}{dt}\gamma} \lambda_0(t) = -\sum_{i=1}^{k} R(v_i(t), \lambda_i(t))\, v_1(t).$
Corpus Callosum Polynomial Regression
Kernel Regression (Nadaraya-Watson)
Define regression function through weighted averaging:
$f(t) = \sum_{i=1}^{N} w_i(t)\, Y_i$
$w_i(t) = \frac{K_h(t - T_i)}{\sum_{j=1}^{N} K_h(t - T_j)}$
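A minimal sketch of this estimator for scalar responses. The Gaussian kernel is one common choice and is an assumption here, since the slides do not fix $K_h$. The function names and data are illustrative.

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) with bandwidth h (an assumed choice of kernel)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def nw_weights(t, T, h):
    """Weights w_i(t) = K_h(t - T_i) / sum_j K_h(t - T_j)."""
    k = gaussian_kernel(t - T, h)
    return k / k.sum()

def nw_regress(t, T, Y, h):
    """Kernel regression estimate f(t) = sum_i w_i(t) Y_i."""
    return nw_weights(t, T, h) @ Y

# Hypothetical observations (T_i, Y_i).
T = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
w = nw_weights(2.0, T, h=0.5)
f_mid = nw_regress(2.0, T, Y, h=0.5)
```

Because the weights are nonnegative and sum to one, f(t) is a weighted average of the $Y_i$. That is the property the manifold version keeps, with the average replaced by a Fréchet weighted average.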
Example: Gray Matter Volume
[Figure: gray matter volume example; kernel $K_h(t - s)$ of bandwidth h centered at t, data times $t_i$, weights $w_i(t)$, and estimate $f(t)$]
Manifold Kernel Regression
[Figure: data $p_i$ on M and the kernel regression estimate $\hat m_h(t)$]
Using Fréchet weighted average:
$\hat m_h(t) = \arg\min_y \sum_{i=1}^{N} w_i(t)\, d(y, Y_i)^2$
Davis, et al. ICCV 2007
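A sketch of the Fréchet weighted average on an assumed example manifold, the unit sphere S^2. The weighted mean is computed with the fixed-point iteration m ← Exp_m(Σ w_i Log_m(Y_i)), a standard gradient-descent scheme rather than necessarily the one used in the cited work. The Gaussian kernel, function names, and data are illustrative assumptions.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return p if n < 1e-15 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log map on the unit sphere (q is assumed not antipodal to p)."""
    w = q - np.dot(p, q) * p
    n = np.linalg.norm(w)
    if n < 1e-15:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / n

def weighted_frechet_mean(points, weights, max_iter=200, tol=1e-12):
    """Minimize sum_i w_i d(y, Y_i)^2 on the sphere; weights are assumed to sum to 1."""
    m = weights @ points
    m = m / np.linalg.norm(m)                 # start from the projected Euclidean average
    for _ in range(max_iter):
        step = sum(w * sphere_log(m, y) for w, y in zip(weights, points))
        if np.linalg.norm(step) < tol:
            break
        m = sphere_exp(m, step)
    return m

def manifold_kernel_regression(t, T, Y, h):
    """Estimate m_h(t) as the Frechet average of Y_i with kernel weights w_i(t)."""
    k = np.exp(-0.5 * ((t - T) / h) ** 2)     # Gaussian kernel (an assumed choice)
    return weighted_frechet_mean(Y, k / k.sum())

# Hypothetical data: points along the equator, observed at times T_i equal to their angle.
T = np.linspace(0.0, 1.0, 11)
Y = np.stack([np.cos(T), np.sin(T), np.zeros_like(T)], axis=1)
m_mid = manifold_kernel_regression(0.5, T, Y, h=0.1)
m_off = manifold_kernel_regression(0.3, T, Y, h=0.1)
```

For this data the estimate stays on the equator, and at t = 0.5 it sits at angle 0.5 by symmetry.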
Longitudinal Shape Analysis
OASIS data:
11 healthy subjects
12 dementia subjects
3 images over 6 years
Goal: Understand how individuals change over time.
Why Longitudinal?
[Figure: dependent variable y plotted against independent variable t, both axes from 0 to 10]
Hierarchical Geodesic Models
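[Figure: on M, individual trajectories with base point $p_i$, velocity $u_i$, and observations $y_{ij}$]
- Group Level: Average geodesic trend $(\alpha, \beta)$
- Individual Level: Trajectory for ith subject $(p_i, u_i)$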
Muralidharan, CVPR 2012; Singh, IPMI 2013
Comparing Geodesics: Sasaki Metrics
What is the distance between two geodesic trends?
Define distance between initial conditions:
$d_S((p_1, u_1), (p_2, u_2))$
Sasaki geodesic on tangent bundle of the sphere.
Results on Longitudinal Corpus Callosum
[Figure: Non-Demented Trend and Demented Trend]
Permutation Test:
Variable       T^2     p-value
Intercept α    0.734   0.248
Slope β        0.887   0.027
Dimensionality Reduction: Principal Geodesic Analysis
Principal Geodesic Analysis
Linear Statistics (PCA) Curved Statistics (PGA)
PGA of Kidney
Mode 1 Mode 2 Mode 3
Probabilistic Principal Geodesic Analysis
[Figure: illustration with labels w and µ]
Zhang & Fletcher, NIPS 2013
$y \mid x \sim N_M(\mathrm{Exp}(\mu, z), \tau^{-1})$
$z = W \Lambda x$
- W is n × k matrix = orthonormal k-frame in $T_\mu M$
- Λ is a k × k diagonal matrix
- $x \sim N(0, I)$ are latent variables
Generalization of PPCA (Roweis, 1998; Tipping & Bishop, 1999)
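The deterministic part of this generative model can be sketched on an assumed example manifold, the unit sphere S^2 in R^3 (so n = 3, k = 2). The sketch draws latent x, forms z = WΛx, and maps to the center Exp(µ, z). Sampling y from the Riemannian normal around that center is omitted, and all numerical values are hypothetical.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return p if n < 1e-15 else np.cos(n) * p + np.sin(n) * v / n

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])                        # base point on S^2
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])    # orthonormal 2-frame in T_mu S^2
Lam = np.diag([0.5, 0.1])                             # scale of each principal geodesic

x = rng.standard_normal((100, 2))                     # latent variables x ~ N(0, I)
z = x @ (W @ Lam).T                                   # tangent vectors z = W Lam x, shape (100, 3)
centers = np.array([sphere_exp(mu, zi) for zi in z])  # Exp(mu, z), the center of y given x
```

The columns of W span the tangent space at µ, so every z is orthogonal to µ and every center lies on the sphere.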
PPGA of Corpus Callosum
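[Figure: shapes along Principal Geodesic 1 at $-3\lambda_1, -1.5\lambda_1, 0, 1.5\lambda_1, 3\lambda_1$ and along Principal Geodesic 2 at $-3\lambda_2, -1.5\lambda_2, 0, 1.5\lambda_2, 3\lambda_2$]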