Mark Girolami's Read Paper 2010
Post on 05-Dec-2014
Manifold Monte Carlo Methods
Mark Girolami
Department of Statistical Science, University College London
Joint work with Ben Calderhead
Research Section Ordinary Meeting
The Royal Statistical Society, October 13, 2010
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
I Riemann manifold Langevin and Hamiltonian Monte Carlo Methods. Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73, 2, 1 - 37.
I Advanced Monte Carlo methodology founded on geometric principles
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain the estimate
$$\int \phi(\theta)\,p(\theta)\,d\theta = \frac{1}{N}\sum_{n}\phi(\theta_n) + O(N^{-1/2})$$
I Draw θ_n from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability p_p(θ′|θ)
I Accept or reject proposal with probability
$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\,p_p(\theta|\theta')}{p(\theta)\,p_p(\theta'|\theta)}\right]$$
I Efficiency dependent on p_p(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
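The propose/accept construction above can be sketched in a few lines. This is a minimal illustration, not code from the talk: Python with NumPy, the function name `metropolis_hastings`, the standard bivariate Gaussian target, and the random-walk proposal with step 1.0 are all assumptions chosen for the example.

```python
import numpy as np

def metropolis_hastings(log_p, propose, log_q, theta0, n_samples, rng):
    """Generic Metropolis-Hastings sampler (illustrative sketch).

    log_p(theta): log target density, up to an additive constant.
    propose(theta, rng): draws theta' from the proposal p_p(theta'|theta).
    log_q(a, b): log p_p(a|b), the log proposal density of a given b.
    """
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_samples, theta.size))
    n_accept = 0
    for n in range(n_samples):
        prop = propose(theta, rng)
        # log of p(theta') p_p(theta|theta') / (p(theta) p_p(theta'|theta))
        log_ratio = (log_p(prop) + log_q(theta, prop)
                     - log_p(theta) - log_q(prop, theta))
        if np.log(rng.uniform()) < log_ratio:
            theta, n_accept = prop, n_accept + 1
        samples[n] = theta
    return samples, n_accept / n_samples

# Hypothetical target: standard bivariate Gaussian, random-walk proposal.
step = 1.0
log_p = lambda th: -0.5 * th @ th
propose = lambda th, rng: th + step * rng.standard_normal(th.size)
log_q = lambda a, b: -0.5 * np.sum((a - b) ** 2) / step ** 2

rng = np.random.default_rng(0)
samples, acc_rate = metropolis_hastings(log_p, propose, log_q,
                                        np.zeros(2), 20000, rng)
phi_hat = samples.mean(axis=0)  # Monte Carlo estimate of E[theta]
```

For a symmetric proposal such as this one the two `log_q` terms cancel; they are kept so that the same function also works with the asymmetric, gradient-based proposals discussed later.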
Adaptive Proposal Distributions - Exploit Discretised Diffusion
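I For θ ∈ R^D with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
$$d\theta(t) = \frac{1}{2}\nabla_\theta L(\theta(t))\,dt + db(t)$$
I First order Euler-Maruyama discrete integration of diffusion
$$\theta(\tau+\epsilon) = \theta(\tau) + \frac{\epsilon^2}{2}\nabla_\theta L(\theta(\tau)) + \epsilon z(\tau)$$
I Proposal
$$p_p(\theta'|\theta) = \mathcal{N}(\theta' \mid \mu(\theta,\epsilon), \epsilon^2 I) \quad\text{with}\quad \mu(\theta,\epsilon) = \theta + \frac{\epsilon^2}{2}\nabla_\theta L(\theta)$$
I Acceptance probability to correct for bias
$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\,p_p(\theta|\theta')}{p(\theta)\,p_p(\theta'|\theta)}\right]$$
I Isotropic diffusion inefficient, employ pre-conditioning
$$\theta' = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon\sqrt{M}\,z$$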
I How to set M systematically? Tuning in transient & stationary phases
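A minimal sketch of the isotropic Langevin proposal with its accept/reject correction (MALA) follows. It is an illustration under stated assumptions rather than the speaker's implementation: the function name `mala_step`, the Gaussian target with scales 1 and 0.1, and the step size ε = 0.1 are hypothetical choices.

```python
import numpy as np

def mala_step(theta, log_p, grad_log_p, eps, rng):
    """One MALA transition with proposal N(mu(theta, eps), eps^2 I)."""
    def mu(th):
        # Euler-Maruyama drift: theta + (eps^2 / 2) grad L(theta)
        return th + 0.5 * eps ** 2 * grad_log_p(th)

    def log_q(a, b):
        # log N(a | mu(b), eps^2 I); constants cancel in the ratio
        return -0.5 * np.sum((a - mu(b)) ** 2) / eps ** 2

    prop = mu(theta) + eps * rng.standard_normal(theta.size)
    log_ratio = (log_p(prop) + log_q(theta, prop)
                 - log_p(theta) - log_q(prop, theta))
    if np.log(rng.uniform()) < log_ratio:
        return prop, True
    return theta, False

# Hypothetical target: zero-mean Gaussian with very different scales.
scales = np.array([1.0, 0.1])
log_p = lambda th: -0.5 * np.sum((th / scales) ** 2)
grad_log_p = lambda th: -th / scales ** 2

rng = np.random.default_rng(1)
theta = np.zeros(2)
draws, accepted = [], 0
for _ in range(20000):
    theta, ok = mala_step(theta, log_p, grad_log_p, eps=0.1, rng=rng)
    draws.append(theta)
    accepted += ok
draws = np.array(draws)
acc_rate = accepted / len(draws)
```

With a single step size the narrow direction dictates ε, so the wide direction is explored slowly. That is the inefficiency of the isotropic proposal that pre-conditioning with M is meant to address.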
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first order
$$\int p(y;\theta+\delta\theta)\,\log\frac{p(y;\theta+\delta\theta)}{p(y;\theta)}\,dy \approx \delta\theta^T G(\theta)\,\delta\theta$$
where
$$G(\theta) = E_{y|\theta}\left\{\frac{\nabla_\theta p(y;\theta)}{p(y;\theta)}\,\frac{\nabla_\theta p(y;\theta)}{p(y;\theta)}^T\right\}$$
I Fisher Information G(θ) is p.d. metric defining a Riemann manifold
I Non-Euclidean geometry for probabilities - distances, metrics, invariants, curvature, geodesics
I Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritzen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
Geometric Concepts in MCMC
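I Tangent space - local metric defined by
$$\delta\theta^T G(\theta)\,\delta\theta = \sum_{k,l} g_{kl}\,\delta\theta^k\,\delta\theta^l$$
I Christoffel symbols - characterise connection on curved manifold
$$\Gamma^i_{kl} = \frac{1}{2}\sum_m g^{im}\left(\frac{\partial g_{mk}}{\partial\theta^l} + \frac{\partial g_{ml}}{\partial\theta^k} - \frac{\partial g_{kl}}{\partial\theta^m}\right)$$
I Geodesics - shortest path between two points on manifold
$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l}\Gamma^i_{kl}\,\frac{d\theta^k}{dt}\,\frac{d\theta^l}{dt} = 0$$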
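The Christoffel symbols can be checked numerically from any metric function. The sketch below is an illustration, not part of the talk: it uses central finite differences, and the function name `christoffel`, the step `h`, and the example metric diag(σ⁻², 2σ⁻²) in coordinates (µ, σ) are assumptions.

```python
import numpy as np

def christoffel(metric, theta, h=1e-5):
    """Gamma[i, k, l] = 0.5 * sum_m g^{im} (d_l g_{mk} + d_k g_{ml} - d_m g_{kl})."""
    theta = np.asarray(theta, dtype=float)
    D = theta.size
    # dG[m, a, b] = d g_{ab} / d theta^m, by central differences
    dG = np.empty((D, D, D))
    for m in range(D):
        e = np.zeros(D)
        e[m] = h
        dG[m] = (metric(theta + e) - metric(theta - e)) / (2 * h)
    G_inv = np.linalg.inv(metric(theta))
    # bracket[m, k, l] = d_l g_{mk} + d_k g_{ml} - d_m g_{kl}
    bracket = (np.einsum('lmk->mkl', dG)
               + np.einsum('kml->mkl', dG)
               - dG)
    return 0.5 * np.einsum('im,mkl->ikl', G_inv, bracket)

# Hypothetical example metric in coordinates theta = (mu, sigma).
metric = lambda th: np.diag([1.0 / th[1] ** 2, 2.0 / th[1] ** 2])
Gamma = christoffel(metric, [0.0, 2.0])
```

For this metric the non-zero symbols work out by hand to Γ^µ_{µσ} = Γ^µ_{σµ} = −1/σ, Γ^σ_{µµ} = 1/(2σ) and Γ^σ_{σσ} = −1/σ, which the finite-difference result should reproduce at σ = 2.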
Illustration of Geometric Concepts
I Consider Normal density p(x | µ, σ) = N_x(µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e. δθ^T G(θ) δθ, where θ = (µ, σ)^T
I Metric is Fisher Information
$$G(\mu,\sigma) = \begin{bmatrix} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2} \end{bmatrix}$$
I Inner-product σ^{-2}(δµ^2 + 2δσ^2) so densities N(0, 1) & N(1, 1) further apart than the densities N(0, 2) & N(1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
M.C. Escher, Heaven and Hell, 1960
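The comparison of the two pairs of Normal densities can be reproduced directly from the metric. The sketch below is illustrative only: the function names are hypothetical, the second argument of N(·, ·) is read as the standard deviation σ, and the quadratic form is evaluated at the starting point, so it is a local squared length rather than a geodesic distance. A Monte Carlo estimate of G from score outer products is included as a sanity check.

```python
import numpy as np

def fisher_metric(mu, sigma):
    """Fisher information of N(mu, sigma) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma ** 2, 2.0 / sigma ** 2])

def local_sq_length(mu, sigma, d_mu, d_sigma):
    """Quadratic form dtheta^T G(theta) dtheta at theta = (mu, sigma)."""
    d = np.array([d_mu, d_sigma])
    return d @ fisher_metric(mu, sigma) @ d

near = local_sq_length(0.0, 1.0, 1.0, 0.0)  # N(0, 1) -> N(1, 1)
far = local_sq_length(0.0, 2.0, 1.0, 0.0)   # N(0, 2) -> N(1, 2)

def fisher_monte_carlo(mu, sigma, n, rng):
    """Estimate G as the sample mean of score outer products."""
    y = rng.normal(mu, sigma, size=n)
    score = np.stack([(y - mu) / sigma ** 2,
                      ((y - mu) ** 2 - sigma ** 2) / sigma ** 3], axis=1)
    return score.T @ score / n

G_hat = fisher_monte_carlo(0.0, 1.0, 200_000, np.random.default_rng(0))
```

The same unit shift in µ has squared length 1 at σ = 1 but only 0.25 at σ = 2, which is the non-Euclidean behaviour the slide describes.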
Langevin Diffusion on Riemannian manifold
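I Discretised Langevin diffusion on manifold defines proposal mechanism
$$\theta'_d = \theta_d + \frac{\epsilon^2}{2}\left(G^{-1}(\theta)\nabla_\theta L(\theta)\right)_d - \epsilon^2\sum_{i,j}^{D} G(\theta)^{-1}_{ij}\,\Gamma^d_{ij} + \epsilon\left(\sqrt{G^{-1}(\theta)}\,z\right)_d$$
I Manifold with constant curvature then proposal mechanism reduces to
$$\theta' = \theta + \frac{\epsilon^2}{2}G^{-1}(\theta)\nabla_\theta L(\theta) + \epsilon\sqrt{G^{-1}(\theta)}\,z$$
I MALA proposal with preconditioning
$$\theta' = \theta + \frac{\epsilon^2}{2}M\nabla_\theta L(\theta) + \epsilon\sqrt{M}\,z$$
I Proposal and acceptance probability
$$p_p(\theta'|\theta) = \mathcal{N}(\theta' \mid \mu(\theta,\epsilon), \epsilon^2 G^{-1}(\theta))$$
$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\,p_p(\theta|\theta')}{p(\theta)\,p_p(\theta'|\theta)}\right]$$
I Proposal mechanism diffuses approximately along the manifold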
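The simplified (constant-curvature) manifold proposal can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the Gaussian target, and the choice of its precision matrix as the metric G are assumptions. The Christoffel term of the full proposal is deliberately omitted, and the Gaussian proposal density is evaluated with the metric at both the current and the proposed point so that the acceptance ratio remains valid when G depends on θ.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of N(x | mean, cov), including the normalising constant."""
    L = np.linalg.cholesky(cov)
    r = np.linalg.solve(L, x - mean)
    return (-0.5 * r @ r - np.sum(np.log(np.diag(L)))
            - 0.5 * x.size * np.log(2 * np.pi))

def simplified_mmala_step(theta, log_p, grad_log_p, metric, eps, rng):
    """Drift (eps^2/2) G^{-1} grad L, proposal covariance eps^2 G^{-1}."""
    def moments(th):
        G_inv = np.linalg.inv(metric(th))
        return th + 0.5 * eps ** 2 * G_inv @ grad_log_p(th), eps ** 2 * G_inv

    mean_f, cov_f = moments(theta)
    prop = mean_f + np.linalg.cholesky(cov_f) @ rng.standard_normal(theta.size)
    mean_b, cov_b = moments(prop)  # reverse move uses the metric at theta'
    log_ratio = (log_p(prop) + log_gauss(theta, mean_b, cov_b)
                 - log_p(theta) - log_gauss(prop, mean_f, cov_f))
    if np.log(rng.uniform()) < log_ratio:
        return prop, True
    return theta, False

# Hypothetical target: Gaussian with scales (1, 0.1); metric = its precision.
scales = np.array([1.0, 0.1])
log_p = lambda th: -0.5 * np.sum((th / scales) ** 2)
grad_log_p = lambda th: -th / scales ** 2
metric = lambda th: np.diag(1.0 / scales ** 2)

rng = np.random.default_rng(2)
theta = np.zeros(2)
draws, accepted = [], 0
for _ in range(20000):
    theta, ok = simplified_mmala_step(theta, log_p, grad_log_p, metric, 1.0, rng)
    draws.append(theta)
    accepted += ok
draws = np.array(draws)
acc_rate = accepted / len(draws)
```

Because the metric rescales each direction, a single step size ε = 1 suits both the wide and the narrow coordinate in this example.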
Langevin Diffusion on Riemannian manifold
[Figures: two slides, each with two panels; horizontal axis µ, vertical-axis label garbled in extraction (likely σ).]
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l}\Gamma^i_{kl}\,\frac{d\theta^k}{dt}\,\frac{d\theta^l}{dt} = 0$$
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
$$H(\theta,\mathbf{p}) = -L(\theta) + \frac{1}{2}\log\left\{(2\pi)^D|G(\theta)|\right\} + \frac{1}{2}\mathbf{p}^T G(\theta)^{-1}\mathbf{p}$$
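The negative joint log density can be evaluated with a few linear-algebra calls. The sketch below is an illustration under assumptions: the function names `hamiltonian` and `sample_momentum`, the standard Gaussian log-density, and the identity metric in the usage lines are hypothetical.

```python
import numpy as np

def hamiltonian(theta, p, log_p, metric):
    """H = -L(theta) + 0.5 log((2 pi)^D |G(theta)|) + 0.5 p^T G(theta)^{-1} p."""
    G = metric(theta)
    D = theta.size
    _, logdet = np.linalg.slogdet(G)
    potential = -log_p(theta) + 0.5 * (D * np.log(2 * np.pi) + logdet)
    kinetic = 0.5 * p @ np.linalg.solve(G, p)
    return potential + kinetic

def sample_momentum(theta, metric, rng):
    """Draw p ~ N(0, G(theta)) using a Cholesky factor of G."""
    return np.linalg.cholesky(metric(theta)) @ rng.standard_normal(theta.size)

# Hypothetical usage: standard Gaussian log-density, identity metric.
log_p = lambda th: -0.5 * th @ th
metric = lambda th: np.eye(th.size)
theta = np.array([1.0, 0.0])
p = np.array([0.0, 2.0])
H = hamiltonian(theta, p, log_p, metric)
p_draw = sample_momentum(theta, metric, np.random.default_rng(3))
```

Using `slogdet` and `solve` avoids forming the determinant or the inverse of G explicitly, which tends to be more stable when G is poorly conditioned.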
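Riemannian Hamiltonian Monte Carlo
I Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
$$H(\theta,\mathbf{p}) = \underbrace{-L(\theta) + \frac{1}{2}\log\left\{(2\pi)^D|G(\theta)|\right\}}_{\text{Potential Energy}} + \underbrace{\frac{1}{2}\mathbf{p}^T G(\theta)^{-1}\mathbf{p}}_{\text{Kinetic Energy}}$$
I Marginal density follows as required
$$p(\theta) \propto \frac{\exp\{L(\theta)\}}{\sqrt{(2\pi)^D|G(\theta)|}}\int\exp\left\{-\frac{1}{2}\mathbf{p}^T G(\theta)^{-1}\mathbf{p}\right\}d\mathbf{p} = \exp\{L(\theta)\}$$
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ, p)
$$\mathbf{p}_{n+1}\,|\,\theta_n \sim \mathcal{N}(0, G(\theta_n))$$
$$\theta_{n+1}\,|\,\mathbf{p}_{n+1} \sim p(\theta_{n+1}\,|\,\mathbf{p}_{n+1})$$
I Employ Hamiltonian dynamics to propose samples for p(θ_{n+1} | p_{n+1}), Duane et al, 1987; Neal, 2010.
$$\frac{d\theta}{dt} = \frac{\partial}{\partial\mathbf{p}}H(\theta,\mathbf{p}), \qquad \frac{d\mathbf{p}}{dt} = -\frac{\partial}{\partial\theta}H(\theta,\mathbf{p})$$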
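As a worked step, Hamilton's equations can be written out for this particular Hamiltonian. The expansion below is a sketch derived from the definition of H above using the standard identities for the derivative of log |G| and of G⁻¹. It is not copied from the slides, and the component index i is introduced here for illustration.

```latex
\begin{align*}
\frac{d\theta}{dt} &= \frac{\partial H}{\partial \mathbf{p}}
  = G(\theta)^{-1}\mathbf{p}, \\
\frac{dp_i}{dt} &= -\frac{\partial H}{\partial \theta_i}
  = \frac{\partial L(\theta)}{\partial \theta_i}
  - \frac{1}{2}\,\mathrm{tr}\!\left\{ G(\theta)^{-1}
      \frac{\partial G(\theta)}{\partial \theta_i} \right\}
  + \frac{1}{2}\,\mathbf{p}^T G(\theta)^{-1}
      \frac{\partial G(\theta)}{\partial \theta_i}
      G(\theta)^{-1}\mathbf{p}.
\end{align*}
```

The trace term comes from differentiating ½ log |G(θ)| and the final term from differentiating the kinetic energy, so both vanish when the metric does not depend on θ.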
Riemannian Manifold Hamiltonian Monte Carlo
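I Consider the Hamiltonian
$$\tilde{H}(\theta,\mathbf{p}) = \frac{1}{2}\mathbf{p}^T\tilde{G}(\theta)^{-1}\mathbf{p}$$
I Hamiltonians with only a quadratic kinetic energy term exactly describe geodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is
$$H(\theta,\mathbf{p}) = V(\theta) + \frac{1}{2}\mathbf{p}^T G(\theta)^{-1}\mathbf{p}$$
I If we define G̃(θ) = G(θ) × (h − V(θ)), where h is the constant value of H(θ, p)
I Then the Maupertuis principle tells us that the Hamiltonian flows for H(θ, p) and H̃(θ, p) are equivalent along energy level h
I The solution of
$$\frac{d\theta}{dt} = \frac{\partial}{\partial\mathbf{p}}H(\theta,\mathbf{p}), \qquad \frac{d\mathbf{p}}{dt} = -\frac{\partial}{\partial\theta}H(\theta,\mathbf{p})$$
is therefore equivalent to the solution of
$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l}\tilde{\Gamma}^i_{kl}\,\frac{d\theta^k}{dt}\,\frac{d\theta^l}{dt} = 0$$
I RMHMC proposals are along the manifold geodesics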
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
Warped Bivariate Gaussian
I $p(w_1, w_2 \mid y, \sigma_x, \sigma_y) \propto \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_1 + w_2^2, \sigma_y^2)\, \mathcal{N}(w_1, w_2 \mid 0, \sigma_x^2 I)$
[Figure: two panels, HMC and RMHMC, each plotted in the $(w_1, w_2)$ plane.]
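For readers who want to experiment with this target, the following sketch codes the log-posterior, its gradient and a position-dependent metric. The metric is assumed here to be the expected Fisher information of the likelihood plus the prior precision; the slide does not state the metric for this example, so treat that choice, and all function names, as assumptions.

```python
import numpy as np

def log_posterior(w, y, sigma_x, sigma_y):
    """Unnormalised log p(w1, w2 | y) for the warped bivariate Gaussian."""
    resid = y - w[0] - w[1] ** 2
    return -0.5 * np.sum(resid ** 2) / sigma_y ** 2 - 0.5 * (w @ w) / sigma_x ** 2

def grad_log_posterior(w, y, sigma_x, sigma_y):
    """Gradient of log_posterior with respect to (w1, w2)."""
    s = np.sum(y - w[0] - w[1] ** 2) / sigma_y ** 2
    return np.array([s, 2.0 * w[1] * s]) - w / sigma_x ** 2

def metric(w, n_obs, sigma_x, sigma_y):
    """Assumed metric: expected Fisher information plus prior precision."""
    jac = np.array([1.0, 2.0 * w[1]])  # derivative of the mean w1 + w2^2
    return n_obs * np.outer(jac, jac) / sigma_y ** 2 + np.eye(2) / sigma_x ** 2

rng = np.random.default_rng(0)
y = rng.normal(1.0, 0.5, size=20)  # hypothetical data
w = np.array([0.3, -0.8])
print(grad_log_posterior(w, y, 1.0, 0.5))
print(metric(w, y.size, 1.0, 0.5))
```

The metric depends on `w[1]` only, reflecting that the curvature of the warped ridge changes along the second coordinate.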
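Gaussian Mixture Model
I Univariate finite mixture model
$$p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)$$
I FI based metric tensor non-analytic - employ empirical FI
$$G(\theta) = \frac{1}{N} S^T S - \frac{1}{N^2} \bar{s}\bar{s}^T \xrightarrow{N \to \infty} \operatorname{cov}(\nabla_\theta L(\theta)) = I(\theta)$$
$$\frac{\partial G(\theta)}{\partial \theta_d} = \frac{1}{N}\left(\frac{\partial S^T}{\partial \theta_d} S + S^T \frac{\partial S}{\partial \theta_d}\right) - \frac{1}{N^2}\left(\frac{\partial \bar{s}}{\partial \theta_d} \bar{s}^T + \bar{s} \frac{\partial \bar{s}^T}{\partial \theta_d}\right)$$
with score matrix $S$ with elements $S_{i,d} = \frac{\partial \log p(x_i \mid \theta)}{\partial \theta_d}$ and $\bar{s} = \sum_{i=1}^{N} S_{i,\cdot}^T$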
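The two formulas above translate directly into matrix code once the score matrix is available. The sketch below assumes `S` is an N × D array of per-observation scores and `dS` is the N × D array of their derivatives with respect to a single parameter θ_d; both function names are hypothetical.

```python
import numpy as np

def empirical_fisher(S):
    """G = S^T S / N - s_bar s_bar^T / N^2, with s_bar the column sums of S."""
    N = S.shape[0]
    s_bar = S.sum(axis=0)
    return S.T @ S / N - np.outer(s_bar, s_bar) / N ** 2

def empirical_fisher_derivative(S, dS):
    """Derivative of G with respect to one parameter, given dS = dS/dtheta_d."""
    N = S.shape[0]
    s_bar = S.sum(axis=0)
    ds_bar = dS.sum(axis=0)
    first = (dS.T @ S + S.T @ dS) / N
    second = (np.outer(ds_bar, s_bar) + np.outer(s_bar, ds_bar)) / N ** 2
    return first - second

# Hypothetical score matrix for illustration.
rng = np.random.default_rng(0)
S = rng.standard_normal((50, 2))
print(empirical_fisher(S))
```

Written this way, G is the (biased) sample covariance of the rows of `S`, which is a convenient sanity check against `np.cov(S, rowvar=False, bias=True)`.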
Gaussian Mixture Model
I Univariate finite mixture model
$$p(x \mid \mu, \sigma^2) = 0.7 \times \mathcal{N}(x \mid 0, \sigma^2) + 0.3 \times \mathcal{N}(x \mid \mu, \sigma^2)$$
[Plot in the $(\mu, \ln\sigma)$ plane, both axes running from −4 to 4.]
Figure: Arrows correspond to the gradients and ellipses to the inverse metric tensor. Dashed lines are isocontours of the joint log density
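As a concrete instance, the per-observation scores of this two-component mixture can be written in closed form in the coordinates (µ, ln σ) used on the plot axes. The sketch below covers the likelihood only (any prior contribution to the joint log density is omitted), and the function names and data values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def log_likelihood(x, mu, log_sigma):
    sigma = np.exp(log_sigma)
    mix = 0.7 * normal_pdf(x, 0.0, sigma) + 0.3 * normal_pdf(x, mu, sigma)
    return np.sum(np.log(mix))

def score_matrix(x, mu, log_sigma):
    """Rows are observations; columns are d/d(mu) and d/d(ln sigma) of log p(x_i)."""
    sigma = np.exp(log_sigma)
    a = 0.7 * normal_pdf(x, 0.0, sigma)
    b = 0.3 * normal_pdf(x, mu, sigma)
    r = b / (a + b)  # responsibility of the component centred at mu
    d_mu = r * (x - mu) / sigma ** 2
    d_log_sigma = (1.0 - r) * (x / sigma) ** 2 + r * ((x - mu) / sigma) ** 2 - 1.0
    return np.column_stack([d_mu, d_log_sigma])

x = np.array([-1.2, 0.1, 0.4, 2.3, 3.1])  # hypothetical observations
S = score_matrix(x, mu=1.5, log_sigma=0.2)
print(S.sum(axis=0))  # gradient of the log-likelihood
```

The column sums of `S` give the gradient of the log-likelihood, and the rows of `S` are what an empirical Fisher information estimate would be built from.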
Gaussian Mixture Model
[Panels: MALA path, mMALA path and Simplified mMALA path, each plotted in the $(\mu, \ln\sigma)$ plane with labelled contour levels; below each, the corresponding sample autocorrelation function, plotted as sample autocorrelation against lag for lags 0 to 20.]
Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.
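The autocorrelation panels can be reproduced for any stored chain with a short routine. The sketch below uses the usual biased estimator normalised by the lag-0 term and the same lag range as the plots (0 to 20); it is a generic illustration, and the AR(1) series standing in for a sampler trace is hypothetical.

```python
import numpy as np

def sample_autocorrelation(chain, max_lag=20):
    """Sample autocorrelation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([x[: x.size - k] @ x[k:] / denom for k in range(max_lag + 1)])

# Hypothetical stand-in for a stationary chain: an AR(1) series.
rng = np.random.default_rng(0)
chain = np.zeros(5000)
for t in range(1, chain.size):
    chain[t] = 0.8 * chain[t - 1] + rng.standard_normal()

acf = sample_autocorrelation(chain)
print(acf[:5])
```

A slowly decaying autocorrelation means fewer effectively independent samples per iteration, which is what the ESS figures reported later in the talk summarise.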
Gaussian Mixture Model
[Panels: HMC path, RMHMC path and Gibbs path, each plotted in the $(\mu, \ln\sigma)$ plane with labelled contour levels; below each, the corresponding sample autocorrelation function, plotted as sample autocorrelation against lag for lags 0 to 20.]
Figure: Comparison of HMC (left), RMHMC (middle) and Gibbs (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution.
Log-Gaussian Cox Point Process with Latent Field
I The joint density for Poisson counts and latent Gaussian field
$$p(y, x \mid \mu, \sigma, \beta) \propto \prod_{i,j}^{64} \exp\{y_{i,j} x_{i,j} - m \exp(x_{i,j})\}\, \exp\{-(x - \mu\mathbf{1})^T \Sigma_\theta^{-1} (x - \mu\mathbf{1})/2\}$$
I Metric tensors
$$G(\theta)_{i,j} = \frac{1}{2} \operatorname{trace}\left(\Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_i} \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_j}\right), \qquad G(x) = \Lambda + \Sigma_\theta^{-1}$$
where $\Lambda$ is diagonal with elements $m \exp(\mu + (\Sigma_\theta)_{i,i})$
I Latent field metric tensor defining flat manifold is 4096 × 4096, $O(N^3)$, obtained from parameter conditional
I MALA requires transformation of latent field to sample - with separate tuning in transient and stationary phases of Markov chain
I Manifold methods require no pilot tuning or additional transformations
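Both metric tensors above can be evaluated with a few linear-algebra calls once a covariance function is fixed. The slide does not give the form of Σ_θ, so the sketch below assumes an exponential covariance σ² exp(−d/β) on a small grid purely for illustration; the function names, grid size and parameter values are likewise hypothetical.

```python
import numpy as np

def covariance_and_derivatives(coords, sigma, beta):
    """Assumed covariance sigma^2 * exp(-dist / beta) and its derivatives in (sigma, beta)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma ** 2 * np.exp(-dist / beta)
    d_sigma = 2.0 * cov / sigma
    d_beta = cov * dist / beta ** 2
    return cov, [d_sigma, d_beta]

def parameter_metric(cov, cov_derivs):
    """G_ij = 0.5 * trace(Sigma^{-1} dSigma_i Sigma^{-1} dSigma_j)."""
    A = [np.linalg.solve(cov, d) for d in cov_derivs]  # A_i = Sigma^{-1} dSigma_i
    n_par = len(A)
    G = np.empty((n_par, n_par))
    for i in range(n_par):
        for j in range(n_par):
            G[i, j] = 0.5 * np.trace(A[i] @ A[j])
    return G

def latent_metric(cov, m, mu):
    """G(x) = Lambda + Sigma^{-1}, Lambda diagonal with m * exp(mu + Sigma_ii)."""
    lam = m * np.exp(mu + np.diag(cov))
    return np.diag(lam) + np.linalg.inv(cov)

# Hypothetical 3 x 3 grid instead of the 64 x 64 grid in the talk.
grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
cov, derivs = covariance_and_derivatives(grid, sigma=1.3, beta=0.7)
print(parameter_metric(cov, derivs))
print(latent_metric(cov, m=1.0 / 9.0, mu=0.5).shape)
```

Note that `latent_metric` does not depend on the latent field itself, which is consistent with the slide describing the latent-field manifold as flat; the cubic cost comes from inverting or factorising the covariance.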
RMHMC for Log-Gaussian Cox Point Processes
Table: Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of sampling methods

Method              Time (s)   ESS (Min, Med, Max)   s/Min ESS   Rel. Speed
MALA (Transient)    31,577     (3, 8, 50)            10,605      ×1
MALA (Stationary)   31,118     (4, 16, 80)           7836        ×1.35
mMALA               634        (26, 84, 174)         24.1        ×440
RMHMC               2936       (1951, 4545, 5000)    1.5         ×7070
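The relative speed column appears to be the baseline s/Min ESS (transient MALA) divided by each method's s/Min ESS. The short check below recomputes it from the reported values; this reading of the column is an inference from the numbers rather than something stated on the slide.

```python
# Reported seconds per minimum effective sample, copied from the table above.
seconds_per_min_ess = {
    "MALA (Transient)": 10605.0,
    "MALA (Stationary)": 7836.0,
    "mMALA": 24.1,
    "RMHMC": 1.5,
}

baseline = seconds_per_min_ess["MALA (Transient)"]
relative_speed = {
    name: baseline / value for name, value in seconds_per_min_ess.items()
}

for name, speed in relative_speed.items():
    print(f"{name:18s} x{speed:.2f}")
```

Dividing the reported times by the rounded minimum ESS values reproduces the s/Min ESS column only approximately (for example 31,577 / 3 is about 10,526), presumably because the ESS values shown are rounded.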
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
Funding Acknowledgment
I Girolami funded by EPSRC Advanced Research Fellowship and BBSRC project grant
I Calderhead supported by Microsoft Research PhD Scholarship