Download - Mark Girolami's Read Paper 2010
![Page 1: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/1.jpg)
Manifold Monte Carlo Methods
Mark Girolami
Department of Statistical ScienceUniversity College London
Joint work with Ben Calderhead
Research Section Ordinary Meeting
The Royal Statistical SocietyOctober 13, 2010
![Page 2: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/2.jpg)
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
I Riemann manifold Langevin and Hamiltonian Monte Carlo MethodsGirolami, M. & Calderhead, B., J.R.Statist. Soc. B (2011), 73, 2, 1 - 37.
I Advanced Monte Carlo methodology founded on geometric principles
![Page 3: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/3.jpg)
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
I Riemann manifold Langevin and Hamiltonian Monte Carlo MethodsGirolami, M. & Calderhead, B., J.R.Statist. Soc. B (2011), 73, 2, 1 - 37.
I Advanced Monte Carlo methodology founded on geometric principles
![Page 4: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/4.jpg)
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
I Riemann manifold Langevin and Hamiltonian Monte Carlo MethodsGirolami, M. & Calderhead, B., J.R.Statist. Soc. B (2011), 73, 2, 1 - 37.
I Advanced Monte Carlo methodology founded on geometric principles
![Page 5: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/5.jpg)
Riemann manifold Langevin and Hamiltonian Monte Carlo Methods
I Riemann manifold Langevin and Hamiltonian Monte Carlo MethodsGirolami, M. & Calderhead, B., J.R.Statist. Soc. B (2011), 73, 2, 1 - 37.
I Advanced Monte Carlo methodology founded on geometric principles
![Page 6: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/6.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 7: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/7.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 8: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/8.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 9: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/9.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 10: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/10.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 11: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/11.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 12: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/12.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 13: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/13.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 14: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/14.jpg)
Talk Outline
I Motivation to improve MCMC capability for challenging problems
I Stochastic diffusion as adaptive proposal process
I Exploring geometric concepts in MCMC methodology
I Diffusions across Riemann manifold as proposal mechanism
I Deterministic geodesic flows on manifold form basis of MCMC methods
I Illustrative Examples:-
I Warped Bivariate Gaussian
I Gaussian mixture model
I Log-Gaussian Cox process
I Conclusions
![Page 15: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/15.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 16: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/16.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 17: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/17.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 18: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/18.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 19: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/19.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 20: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/20.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 21: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/21.jpg)
Motivation Simulation Based Inference
I Monte Carlo method employs samples from p(θ) to obtain estimateZφ(θ)p(θ)dθ =
1N
Xnφ(θn) +O(N−
12 )
I Draw θn from ergodic Markov process with stationary distribution p(θ)
I Construct process in two parts
I Propose a move θ → θ′ with probability pp(θ′|θ)
I accept or reject proposal with probability
pa(θ′|θ) = min
»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Efficiency dependent on pp(θ′|θ) defining proposal mechanism
I Success of MCMC reliant upon appropriate proposal design
![Page 22: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/22.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 23: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/23.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 24: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/24.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 25: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/25.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 26: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/26.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 27: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/27.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 28: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/28.jpg)
Adaptive Proposal Distributions - Exploit Discretised Diffusion
I For θ ∈ RD with density p(θ), L(θ) ≡ log p(θ), define Langevin diffusion
dθ(t) =12∇θL(θ(t))dt + db(t)
I First order Euler-Maruyama discrete integration of diffusion
θ(τ + ε) = θ(τ) +ε2
2∇θL(θ(τ)) + εz(τ)
I Proposalpp(θ′|θ) = N (θ′|µ(θ, ε), ε2I) with µ(θ, ε) = θ +
ε2
2∇θL(θ)
I Acceptance probability to correct for bias
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Isotropic diffusion inefficient, employ pre-conditioning
θ′ = θ + ε2M∇θL(θ)/2 + ε√
Mz
I How to set M systematically? Tuning in transient & stationary phases
![Page 29: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/29.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 30: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/30.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 31: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/31.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 32: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/32.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifold
I Non-Euclidean geometry for probabilities - distances, metrics, invariants,curvature, geodesics
I Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 33: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/33.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesics
I Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 34: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/34.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 35: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/35.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 36: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/36.jpg)
Geometric Concepts in MCMC
I Rao, 1945; Jeffreys, 1948, to first orderZp(y; θ + δθ) log
p(y; θ + δθ)
p(y; θ)dθ ≈ δθTG(θ)δθ
where
G(θ) = Ey|θ
(∇θp(y; θ)
p(y; θ)
∇θp(y; θ)
p(y; θ)
T)
I Fisher Information G(θ) is p.d. metric defining a Riemann manifoldI Non-Euclidean geometry for probabilities - distances, metrics, invariants,
curvature, geodesicsI Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice,
1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975;Lauritsen, 1989
I Statistical shape analysis Kent et al, 1996; Dryden & Mardia, 1998
I Can geometric structure be employed in MCMC methodology?
![Page 37: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/37.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 38: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/38.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 39: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/39.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 40: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/40.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«
I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 41: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/41.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 42: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/42.jpg)
Geometric Concepts in MCMC
I Tangent space - local metric defined by δθTG(θ)δθ =P
k,l gklδθkδθl
I Christoffel symbols - characterise connection on curved manifold
Γikl =
12
Xm
g im„∂gmk
∂θl +∂gml
∂θk −∂gkl
∂θm
«I Geodesics - shortest path between two points on manifold
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
![Page 43: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/43.jpg)
Illustration of Geometric Concepts
I Consider Normal density p(x |µ, σ) = Nx (µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e.δθTG(θ)δθ, where θ = (µ, σ)T
I Metric is Fisher Information
G(µ, σ) =
»σ−2 0
0 2σ−2
–
I Inner-product σ−2(δµ2 + 2δσ2) so densities N (0, 1) & N (1, 1) furtherapart than the densities N (0, 2) & N (1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
![Page 44: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/44.jpg)
Illustration of Geometric Concepts
I Consider Normal density p(x |µ, σ) = Nx (µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e.δθTG(θ)δθ, where θ = (µ, σ)T
I Metric is Fisher Information
G(µ, σ) =
»σ−2 0
0 2σ−2
–
I Inner-product σ−2(δµ2 + 2δσ2) so densities N (0, 1) & N (1, 1) furtherapart than the densities N (0, 2) & N (1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
![Page 45: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/45.jpg)
Illustration of Geometric Concepts
I Consider Normal density p(x |µ, σ) = Nx (µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e.δθTG(θ)δθ, where θ = (µ, σ)T
I Metric is Fisher Information
G(µ, σ) =
»σ−2 0
0 2σ−2
–
I Inner-product σ−2(δµ2 + 2δσ2) so densities N (0, 1) & N (1, 1) furtherapart than the densities N (0, 2) & N (1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
![Page 46: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/46.jpg)
Illustration of Geometric Concepts
I Consider Normal density p(x |µ, σ) = Nx (µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e.δθTG(θ)δθ, where θ = (µ, σ)T
I Metric is Fisher Information
G(µ, σ) =
»σ−2 0
0 2σ−2
–
I Inner-product σ−2(δµ2 + 2δσ2) so densities N (0, 1) & N (1, 1) furtherapart than the densities N (0, 2) & N (1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
![Page 47: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/47.jpg)
Illustration of Geometric Concepts
I Consider Normal density p(x |µ, σ) = Nx (µ, σ)
I Local inner product on tangent space defined by metric tensor, i.e.δθTG(θ)δθ, where θ = (µ, σ)T
I Metric is Fisher Information
G(µ, σ) =
»σ−2 0
0 2σ−2
–
I Inner-product σ−2(δµ2 + 2δσ2) so densities N (0, 1) & N (1, 1) furtherapart than the densities N (0, 2) & N (1, 2) - distance non-Euclidean
I Metric tensor for univariate Normal defines a Hyperbolic Space
![Page 48: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/48.jpg)
M.C. Escher, Heaven and Hell, 1960
![Page 49: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/49.jpg)
Langevin Diffusion on Riemannian manifold
I Discretised Langevin diffusion on manifold defines proposal mechanism
θ′d = θd +ε2
2
“G−1(θ)∇θL(θ)
”d− ε2
DXi,j
G(θ)−1ij Γd
ij + ε“p
G−1(θ)z”
d
I Manifold with constant curvature then proposal mechanism reduces to
θ′ = θ +ε2
2G−1(θ)∇θL(θ) + ε
pG−1(θ)z
I MALA proposal with preconditioning
θ′ = θ +ε2
2M∇θL(θ) + ε
√Mz
I Proposal and acceptance probability
pp(θ′|θ) = N (θ′|µ(θ, ε), ε2G(θ))
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Proposal mechanism diffuses approximately along the manifold
![Page 50: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/50.jpg)
Langevin Diffusion on Riemannian manifold
I Discretised Langevin diffusion on manifold defines proposal mechanism
θ′d = θd +ε2
2
“G−1(θ)∇θL(θ)
”d− ε2
DXi,j
G(θ)−1ij Γd
ij + ε“p
G−1(θ)z”
d
I Manifold with constant curvature then proposal mechanism reduces to
θ′ = θ +ε2
2G−1(θ)∇θL(θ) + ε
pG−1(θ)z
I MALA proposal with preconditioning
θ′ = θ +ε2
2M∇θL(θ) + ε
√Mz
I Proposal and acceptance probability
pp(θ′|θ) = N (θ′|µ(θ, ε), ε2G(θ))
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Proposal mechanism diffuses approximately along the manifold
![Page 51: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/51.jpg)
Langevin Diffusion on Riemannian manifold
I Discretised Langevin diffusion on manifold defines proposal mechanism
θ′d = θd +ε2
2
“G−1(θ)∇θL(θ)
”d− ε2
DXi,j
G(θ)−1ij Γd
ij + ε“p
G−1(θ)z”
d
I Manifold with constant curvature then proposal mechanism reduces to
θ′ = θ +ε2
2G−1(θ)∇θL(θ) + ε
pG−1(θ)z
I MALA proposal with preconditioning
θ′ = θ +ε2
2M∇θL(θ) + ε
√Mz
I Proposal and acceptance probability
pp(θ′|θ) = N (θ′|µ(θ, ε), ε2G(θ))
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Proposal mechanism diffuses approximately along the manifold
![Page 52: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/52.jpg)
Langevin Diffusion on Riemannian manifold
I Discretised Langevin diffusion on manifold defines proposal mechanism
θ′d = θd +ε2
2
“G−1(θ)∇θL(θ)
”d− ε2
DXi,j
G(θ)−1ij Γd
ij + ε“p
G−1(θ)z”
d
I Manifold with constant curvature then proposal mechanism reduces to
θ′ = θ +ε2
2G−1(θ)∇θL(θ) + ε
pG−1(θ)z
I MALA proposal with preconditioning
θ′ = θ +ε2
2M∇θL(θ) + ε
√Mz
I Proposal and acceptance probability
pp(θ′|θ) = N (θ′|µ(θ, ε), ε2G(θ))
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–
I Proposal mechanism diffuses approximately along the manifold
![Page 53: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/53.jpg)
Langevin Diffusion on Riemannian manifold
I Discretised Langevin diffusion on manifold defines proposal mechanism
θ′d = θd +ε2
2
“G−1(θ)∇θL(θ)
”d− ε2
DXi,j
G(θ)−1ij Γd
ij + ε“p
G−1(θ)z”
d
I Manifold with constant curvature then proposal mechanism reduces to
θ′ = θ +ε2
2G−1(θ)∇θL(θ) + ε
pG−1(θ)z
I MALA proposal with preconditioning
θ′ = θ +ε2
2M∇θL(θ) + ε
√Mz
I Proposal and acceptance probability
pp(θ′|θ) = N (θ′|µ(θ, ε), ε2G(θ))
pa(θ′|θ) = min»1,
p(θ′)pp(θ|θ′)p(θ)pp(θ′|θ)
–I Proposal mechanism diffuses approximately along the manifold
![Page 54: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/54.jpg)
Langevin Diffusion on Riemannian manifold
−5 0 55
10
15
20
25
30
35
40
µ
!
−5 0 55
10
15
20
25
30
35
40
µ
!
![Page 55: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/55.jpg)
Langevin Diffusion on Riemannian manifold
−10 0 10
5
10
15
20
25
µ
!
−10 0 10
5
10
15
20
25
µ
!
![Page 56: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/56.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 57: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/57.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 58: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/58.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 59: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/59.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour
- first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 60: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/60.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 61: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/61.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 62: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/62.jpg)
Geodesic flow as proposal mechanism
I Desirable that proposals follow direct path on manifold - geodesics
I Define geodesic flow on manifold by solving
d2θi
dt2 +Xk,l
Γikl
dθk
dtdθl
dt= 0
I How can this be exploited in the design of a transition operator?
I Need slight detour - first define log-density under model as L(θ)
I Introduce auxiliary variable p ∼ N (0,G(θ))
I Negative joint log density is
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|+ 12
pTG(θ)−1p
![Page 63: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/63.jpg)
Riemannian Hamiltonian Monte CarloI Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|| {z }Potential Energy
+12
pTG(θ)−1p| {z }Kinetic Energy
I Marginal density follows as required
p(θ) ∝ exp {L(θ)}p2πD|G(θ)|
Zexp
−1
2pTG(θ)−1p
ffdp = exp {L(θ)}
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ,p)
pn+1|θn ∼ N (0,G(θn))
θn+1|pn+1 ∼ p(θn+1|pn+1)
I Employ Hamiltonian dynamics to propose samples for p(θn+1|pn+1),Duane et al, 1987; Neal, 2010.
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
![Page 64: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/64.jpg)
Riemannian Hamiltonian Monte CarloI Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|| {z }Potential Energy
+12
pTG(θ)−1p| {z }Kinetic Energy
I Marginal density follows as required
p(θ) ∝ exp {L(θ)}p2πD|G(θ)|
Zexp
−1
2pTG(θ)−1p
ffdp = exp {L(θ)}
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ,p)
pn+1|θn ∼ N (0,G(θn))
θn+1|pn+1 ∼ p(θn+1|pn+1)
I Employ Hamiltonian dynamics to propose samples for p(θn+1|pn+1),Duane et al, 1987; Neal, 2010.
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
![Page 65: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/65.jpg)
Riemannian Hamiltonian Monte CarloI Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|| {z }Potential Energy
+12
pTG(θ)−1p| {z }Kinetic Energy
I Marginal density follows as required
p(θ) ∝ exp {L(θ)}p2πD|G(θ)|
Zexp
−1
2pTG(θ)−1p
ffdp = exp {L(θ)}
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ,p)
pn+1|θn ∼ N (0,G(θn))
θn+1|pn+1 ∼ p(θn+1|pn+1)
I Employ Hamiltonian dynamics to propose samples for p(θn+1|pn+1),Duane et al, 1987; Neal, 2010.
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
![Page 66: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/66.jpg)
Riemannian Hamiltonian Monte CarloI Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|| {z }Potential Energy
+12
pTG(θ)−1p| {z }Kinetic Energy
I Marginal density follows as required
p(θ) ∝ exp {L(θ)}p2πD|G(θ)|
Zexp
−1
2pTG(θ)−1p
ffdp = exp {L(θ)}
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ,p)
pn+1|θn ∼ N (0,G(θn))
θn+1|pn+1 ∼ p(θn+1|pn+1)
I Employ Hamiltonian dynamics to propose samples for p(θn+1|pn+1),Duane et al, 1987; Neal, 2010.
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
![Page 67: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/67.jpg)
Riemannian Hamiltonian Monte CarloI Negative joint log-density ≡ Hamiltonian defined on Riemann manifold
H(θ,p) = −L(θ) +12
log 2πD|G(θ)|| {z }Potential Energy
+12
pTG(θ)−1p| {z }Kinetic Energy
I Marginal density follows as required
p(θ) ∝ exp {L(θ)}p2πD|G(θ)|
Zexp
−1
2pTG(θ)−1p
ffdp = exp {L(θ)}
I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ,p)
pn+1|θn ∼ N (0,G(θn))
θn+1|pn+1 ∼ p(θn+1|pn+1)
I Employ Hamiltonian dynamics to propose samples for p(θn+1|pn+1),Duane et al, 1987; Neal, 2010.
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
![Page 68: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/68.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 69: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/69.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 70: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/70.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 71: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/71.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 72: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/72.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 73: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/73.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 74: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/74.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 75: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/75.jpg)
Riemannian Manifold Hamiltonian Monte Carlo
I Consider the Hamiltonian H̃(θ,p) = 12 pTG̃(θ)−1p
I Hamiltonians with only a quadratic kinetic energy term exactly describegeodesic flow on the coordinate space θ with metric G̃
I However our Hamiltonian is H(θ,p) = V (θ) + 12 pTG(θ)−1p
I If we define G̃(θ) = G(θ)× (h − V (θ)), where h is a constant H(θ,p)
I Then the Maupertuis principle tells us that the Hamiltonian flow forH(θ,p) and H̃(θ,p) are equivalent along energy level h
I The solution of
dθ
dt=
∂
∂pH(θ,p)
dpdt
= − ∂
∂θH(θ,p)
is therefore equivalent to the solution of
d2θi
dt2 +Xk,l
Γ̃ikl
dθk
dtdθl
dt= 0
I RMHMC proposals are along the manifold geodesics
![Page 76: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/76.jpg)
Warped Bivariate GaussianI p(w1,w2|y, σx , σy ) ∝
QNn=1N (yn|w1 + w2
2 , σ2y )N (w1,w2|0, σ2
x I)
-4 -3 -2 -1 0 1 2 3-3
-2
-1
0
1
2
3
-4 -3 -2 -1 0 1 2 3-3
-2
-1
0
1
2
3
w! w!
w" w"
HMC RMHMC
![Page 77: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/77.jpg)
Warped Bivariate GaussianI p(w1,w2|y, σx , σy ) ∝
QNn=1N (yn|w1 + w2
2 , σ2y )N (w1,w2|0, σ2
x I)
-4 -3 -2 -1 0 1 2 3-3
-2
-1
0
1
2
3
-4 -3 -2 -1 0 1 2 3-3
-2
-1
0
1
2
3
w! w!
w" w"
HMC RMHMC
![Page 78: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/78.jpg)
Gaussian Mixture Model
I Univariate finite mixture model
p(xi |θ) =KX
k=1
πkN (xi |µk , σ2k )
I FI based metric tensor non-analytic - employ empirical FI
G(θ) =1N
ST S − 1N2 s̄s̄T −−−−→
N→∞cov (∇θL(θ)) = I(θ)
∂G(θ)
∂θd=
1N
∂ST
∂θdS + ST ∂S
∂θd
!− 1
N2
„∂s̄∂θd
s̄T + s̄∂s̄T
∂θd
«with score matrix S with elements Si,d = ∂ log p(xi |θ)
∂θdand s̄ =
PNi=1 ST
i,·
![Page 79: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/79.jpg)
Gaussian Mixture Model
I Univariate finite mixture model
p(xi |θ) =KX
k=1
πkN (xi |µk , σ2k )
I FI based metric tensor non-analytic - employ empirical FI
G(θ) =1N
ST S − 1N2 s̄s̄T −−−−→
N→∞cov (∇θL(θ)) = I(θ)
∂G(θ)
∂θd=
1N
∂ST
∂θdS + ST ∂S
∂θd
!− 1
N2
„∂s̄∂θd
s̄T + s̄∂s̄T
∂θd
«with score matrix S with elements Si,d = ∂ log p(xi |θ)
∂θdand s̄ =
PNi=1 ST
i,·
![Page 80: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/80.jpg)
Gaussian Mixture Model
I Univariate finite mixture model
p(xi |θ) =KX
k=1
πkN (xi |µk , σ2k )
I FI based metric tensor non-analytic - employ empirical FI
G(θ) =1N
ST S − 1N2 s̄s̄T −−−−→
N→∞cov (∇θL(θ)) = I(θ)
∂G(θ)
∂θd=
1N
∂ST
∂θdS + ST ∂S
∂θd
!− 1
N2
„∂s̄∂θd
s̄T + s̄∂s̄T
∂θd
«with score matrix S with elements Si,d = ∂ log p(xi |θ)
∂θdand s̄ =
PNi=1 ST
i,·
![Page 81: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/81.jpg)
Gaussian Mixture ModelI Univariate finite mixture model
p(x |µ, σ2) = 0.7×N (x |0, σ2) + 0.3×N (x |µ, σ2)
µ
lnσ
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
Figure: Arrows correspond to the gradients and ellipses to the inverse metrictensor. Dashed lines are isocontours of the joint log density
![Page 82: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/82.jpg)
Gaussian Mixture ModelI Univariate finite mixture model
p(x |µ, σ2) = 0.7×N (x |0, σ2) + 0.3×N (x |µ, σ2)
µ
lnσ
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
Figure: Arrows correspond to the gradients and ellipses to the inverse metrictensor. Dashed lines are isocontours of the joint log density
![Page 83: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/83.jpg)
Gaussian Mixture Model
µ
lnσ
MALA path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
µ
lnσ
mMALA path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
µ
lnσ
Simplified mMALA path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
MALA Sample Autocorrelation Function
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
mMALA Sample Autocorrelation Function
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
Simplified mMALA Sample Autocorrelation Function
Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right)convergence paths and autocorrelation plots. Autocorrelation plots are from thestationary chains, i.e. once the chains have converged to the stationary distribution.
![Page 84: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/84.jpg)
Gaussian Mixture Model
µ
lnσ
HMC path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
µ
lnσ
RMHMC path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
µ
lnσ
Gibbs path
−910
−1010
−1110
−1210
−1310
−1410
−1410
−2910
−4 −3 −2 −1 0 1 2 3 4−4
−3
−2
−1
0
1
2
3
4
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
HMC Sample Autocorrelation Function
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
RMHMC Sample Autocorrelation Function
0 2 4 6 8 10 12 14 16 18 20−0.2
0
0.2
0.4
0.6
0.8
Lag
Sam
ple
Auto
corr
ela
tion
Gibbs Sample Autocorrelation Function
Figure: Comparison of HMC (left), RMHMC (middle) and GIBBS (right) convergencepaths and autocorrelation plots. Autocorrelation plots are from the stationary chains,i.e. once the chains have converged to the stationary distribution.
![Page 85: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/85.jpg)
Log-Gaussian Cox Point Process with Latent Field
I The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝Y64
i,jexp{yi,jxi,j−m exp(xi,j )} exp(−(x−µ1)TΣ−1
θ (x−µ1)/2)
I Metric tensors
G(θ)i,j =12
trace„
Σ−1θ
∂Σθ
∂θiΣ−1
θ
∂Σθ
∂θj
«G(x) = Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ+ (Σθ)i,i )
I Latent field metric tensor defining flat manifold is 4096× 4096, O(N3)obtained from parameter conditional
I MALA requires transformation of latent field to sample - with separatetuning in transient and stationary phases of Markov chain
I Manifold methods requires no pilot tuning or additional transformations
![Page 86: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/86.jpg)
Log-Gaussian Cox Point Process with Latent Field
I The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝Y64
i,jexp{yi,jxi,j−m exp(xi,j )} exp(−(x−µ1)TΣ−1
θ (x−µ1)/2)
I Metric tensors
G(θ)i,j =12
trace„
Σ−1θ
∂Σθ
∂θiΣ−1
θ
∂Σθ
∂θj
«G(x) = Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ+ (Σθ)i,i )
I Latent field metric tensor defining flat manifold is 4096× 4096, O(N3)obtained from parameter conditional
I MALA requires transformation of latent field to sample - with separatetuning in transient and stationary phases of Markov chain
I Manifold methods requires no pilot tuning or additional transformations
![Page 87: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/87.jpg)
Log-Gaussian Cox Point Process with Latent Field
I The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝Y64
i,jexp{yi,jxi,j−m exp(xi,j )} exp(−(x−µ1)TΣ−1
θ (x−µ1)/2)
I Metric tensors
G(θ)i,j =12
trace„
Σ−1θ
∂Σθ
∂θiΣ−1
θ
∂Σθ
∂θj
«G(x) = Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ+ (Σθ)i,i )
I Latent field metric tensor defining flat manifold is 4096× 4096, O(N3)obtained from parameter conditional
I MALA requires transformation of latent field to sample - with separatetuning in transient and stationary phases of Markov chain
I Manifold methods requires no pilot tuning or additional transformations
![Page 88: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/88.jpg)
Log-Gaussian Cox Point Process with Latent Field
I The joint density for Poisson counts and latent Gaussian field
p(y, x|µ, σ, β) ∝Y64
i,jexp{yi,jxi,j−m exp(xi,j )} exp(−(x−µ1)TΣ−1
θ (x−µ1)/2)
I Metric tensors
G(θ)i,j =12
trace„
Σ−1θ
∂Σθ
∂θiΣ−1
θ
∂Σθ
∂θj
«G(x) = Λ + Σ−1
θ
where Λ is diagonal with elements m exp(µ+ (Σθ)i,i )
I Latent field metric tensor defining flat manifold is 4096× 4096, O(N3)obtained from parameter conditional
I MALA requires transformation of latent field to sample - with separatetuning in transient and stationary phases of Markov chain
I Manifold methods requires no pilot tuning or additional transformations
![Page 89: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/89.jpg)
RMHMC for Log-Gaussian Cox Point Processes
Table: Sampling the latent variables of a Log-Gaussian Cox Process - Comparison ofsampling methods
Method Time ESS (Min, Med, Max) s/Min ESS Rel. SpeedMALA (Transient) 31,577 (3, 8, 50) 10,605 ×1MALA (Stationary) 31,118 (4, 16, 80) 7836 ×1.35
mMALA 634 (26, 84, 174) 24.1 ×440RMHMC 2936 (1951, 4545, 5000) 1.5 ×7070
![Page 90: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/90.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 91: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/91.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 92: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/92.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 93: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/93.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 94: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/94.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 95: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/95.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 96: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/96.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 97: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/97.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 98: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/98.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 99: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/99.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 100: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/100.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 101: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/101.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 102: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/102.jpg)
Conclusion and Discussion
I Geometry of statistical models harnessed in Monte Carlo methods
I Diffusions that respect structure and curvature of space - Manifold MALA
I Geodesic flows on model manifold - RMHMC - generalisation of HMC
I Assessed on correlated & high-dimensional latent variable models
I Promising capability of methodology
I Ongoing development
I Potential bottleneck at metric tensor and Christoffel symbols
I Theoretical analysis of convergence
I Investigate alternative manifold structures
I Design and effect of metric
I Optimality of Hamiltonian flows as local geodesics
I Alternative transition kernels
I No silver bullet or cure all - new powerful methodology for MC toolkit
![Page 103: Mark Girolami's Read Paper 2010](https://reader034.vdocuments.mx/reader034/viewer/2022042606/5482f1a2b07959290c8b4956/html5/thumbnails/103.jpg)
Funding Acknowledgment
I Girolami funded by EPSRC Advanced Research Fellowship and BBSRCproject grant
I Calderhead supported by Microsoft Research PhD Scholarship