
Bayesian inference for doubly-intractable distributions

Anne-Marie Lyne

anne-marie.lyne@curie.fr

April, 2016


Joint Work


On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods

Anne-Marie Lyne, Mark Girolami, Yves Atchadé, Heiko Strathmann, Daniel Simpson

Statist. Sci. Volume 30, Number 4 (2015)


Talk overview

1 Doubly intractable models

2 Current Bayesian approaches

3 Our approach using Russian roulette sampling

4 Results


Motivation: Doubly-intractable models


Motivation: Exponential Random Graph Models

- Used extensively in the social networks community
- View the observed network as one realisation of a random variable
- The probability of observing a given graph is dependent on certain 'local' graph properties, for example the edge density, the number of triangles or k-stars


Motivation: Modelling social networks

P(Y = y) = Z(\theta)^{-1} \exp\Big( \sum_k \theta_k g_k(y) \Big)

- g(y) is a vector of K graph statistics
- θ is a K-dimensional parameter indicating the 'importance' of each graph statistic
- (Intractable) partition function or normalising term:

Z(\theta) = \sum_{y \in \mathcal{Y}} \exp\Big( \sum_k \theta_k g_k(y) \Big)
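To make the intractability concrete, here is a minimal Python sketch (not from the slides): it evaluates the unnormalised term exp(Σ_k θ_k g_k(y)) for edge and triangle statistics and computes Z(θ) by brute force. The choice of statistics and the value of θ are illustrative assumptions; the sum over all 2^{n(n-1)/2} graphs is exactly what becomes infeasible for realistic networks.

```python
import itertools
import numpy as np

def graph_stats(adj):
    """Sufficient statistics g(y): number of edges and number of triangles."""
    n = adj.shape[0]
    edges = adj[np.triu_indices(n, k=1)].sum()
    triangles = np.trace(adj @ adj @ adj) / 6.0
    return np.array([edges, triangles])

def unnormalised_ergm(adj, theta):
    """f(y; theta) = exp(sum_k theta_k g_k(y)): the tractable part of the model."""
    return np.exp(theta @ graph_stats(adj))

def partition_function(n, theta):
    """Brute-force Z(theta): sum over all 2^(n(n-1)/2) undirected graphs on n nodes.
    Feasible only for tiny n, which is exactly why the model is doubly intractable."""
    pairs = list(itertools.combinations(range(n), 2))
    Z = 0.0
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        adj = np.zeros((n, n))
        for (i, j), b in zip(pairs, bits):
            adj[i, j] = adj[j, i] = b
        Z += unnormalised_ergm(adj, theta)
    return Z

theta = np.array([-0.5, 0.2])         # illustrative weights on (edges, triangles)
print(partition_function(4, theta))   # 4 nodes -> 64 graphs; 16 nodes would need 2^120 terms
```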


Parameter inference for doubly-intractable models

Expectations with respect to the posterior distribution:

E_\pi[\phi(\theta)] = \int_\Theta \phi(\theta)\, \pi(\theta|y)\, d\theta \approx \frac{1}{N} \sum_{k=1}^{N} \phi(\theta_k), \qquad \theta_k \sim \pi(\theta|y)

The simplest function of interest:

E_\pi[\theta] = \int_\Theta \theta\, \pi(\theta|y)\, d\theta \approx \frac{1}{N} \sum_{k=1}^{N} \theta_k, \qquad \theta_k \sim \pi(\theta|y)

But we need to sample from the posterior distribution...


The Metropolis-Hastings algorithm

To draw samples from a distribution π(θ):

Choose an initial θ_0, define a proposal distribution q(θ, ·), set n = 0. Iterate the following for n = 0 ... N_iters:

1. Propose a new parameter value θ' from q(θ_n, ·)
2. Set θ_{n+1} = θ' with probability α(θ_n, θ'), else θ_{n+1} = θ_n, where

\alpha(\theta_n, \theta') = \min\left[1, \frac{\pi(\theta')\, q(\theta', \theta_n)}{\pi(\theta_n)\, q(\theta_n, \theta')}\right]

3. Set n = n + 1
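For concreteness, a minimal random-walk Metropolis-Hastings sampler in Python (an illustration, not code from the talk); the Gaussian proposal and the toy target are assumptions, and only an unnormalised log-density is needed.

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_iters, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings: log_target need only be known up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    samples = np.empty((n_iters, theta.size))
    log_p = log_target(theta)
    for n in range(n_iters):
        proposal = theta + step * rng.standard_normal(theta.size)   # symmetric q, so q-ratio = 1
        log_p_prop = log_target(proposal)
        if np.log(rng.uniform()) < log_p_prop - log_p:               # accept with probability alpha
            theta, log_p = proposal, log_p_prop
        samples[n] = theta
    return samples

# Example: sample from an unnormalised standard normal
draws = metropolis_hastings(lambda t: -0.5 * np.sum(t**2), theta0=0.0, n_iters=5000)
print(draws.mean(), draws.std())
```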


Doubly-intractable distributions

Unfortunately, ERGMs are an example of a 'doubly-intractable' distribution:

\pi(\theta|y) = \frac{p(y|\theta)\,\pi(\theta)}{p(y)} = \frac{f(y;\theta)}{Z(\theta)}\, \pi(\theta) \Big/ p(y)

The partition function or normalising term, Z(θ), is intractable and a function of the parameters:

\alpha(\theta, \theta') = \min\left(1, \frac{q(\theta',\theta)\,\pi(\theta')\, f(y;\theta')\, Z(\theta)}{q(\theta,\theta')\,\pi(\theta)\, f(y;\theta)\, Z(\theta')}\right)

As well as ERGMs, there are many other examples: Ising and Potts models, spatial models, phylogenetic models.


Current Bayesian approaches

Approaches which use some kind of approximation: pseudo-likelihoods.

Exact-approximate MCMC approaches:
- auxiliary variable methods such as the exchange algorithm (requires a perfect sample if implemented correctly)
- pseudo-marginal MCMC (requires an unbiased estimate of the likelihood)

\alpha(\theta_n, \theta') = \min\left[1, \frac{\pi(\theta')\, q(\theta', \theta_n)}{\pi(\theta_n)\, q(\theta_n, \theta')}\right]

Page 18: Bayesian inference for doubly-intractable distributions · Bayesian inference for doubly-intractable distributions Anne-Marie Lyne anne-marie.lyne@curie.fr April, 2016

The Exchange algorithm (Murray et al. 2004 and Møller et al. 2004)

Expand the state space of our target (posterior) distribution to

p(x, \theta, \theta'|y) = \frac{f(y;\theta)}{Z(\theta)}\, \pi(\theta)\, q(\theta, \theta')\, \frac{f(x;\theta')}{Z(\theta')} \Big/ p(y)

- Gibbs sample q(\theta, \theta')\, f(x;\theta')/Z(\theta')
- Propose to swap θ ↔ θ' using Metropolis-Hastings:

\alpha(\theta, \theta') = \frac{f(y;\theta')\, f(x;\theta)\, \pi(\theta')\, q(\theta', \theta)}{f(y;\theta)\, f(x;\theta')\, \pi(\theta)\, q(\theta, \theta')}
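A minimal sketch of one exchange update, assuming the user supplies log_f, log_prior and an exact sampler sample_from_model for the auxiliary draw (all hypothetical names), with a symmetric random-walk proposal. The point of the construction is that the Z(θ) terms cancel in the acceptance ratio.

```python
import numpy as np

def exchange_step(theta, y, log_f, log_prior, sample_from_model, step=0.1, rng=None):
    """One exchange-algorithm update. Assumed user-supplied pieces:
    log_f(data, theta)        -- unnormalised log-likelihood log f(data; theta)
    log_prior(theta)          -- log prior density
    sample_from_model(theta)  -- an *exact* draw x ~ f(.; theta)/Z(theta) (perfect simulation)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    theta_prop = theta + step * rng.standard_normal(theta.size)   # symmetric proposal
    x = sample_from_model(theta_prop)                             # auxiliary data drawn at theta'
    log_alpha = (log_f(y, theta_prop) + log_f(x, theta) + log_prior(theta_prop)
                 - log_f(y, theta) - log_f(x, theta_prop) - log_prior(theta))
    # The intractable Z(theta) and Z(theta') terms have cancelled in log_alpha.
    return theta_prop if np.log(rng.uniform()) < log_alpha else theta
```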


Pseudo-marginal MCMC (Andrieu and Roberts, 2009)

We need an unbiased, positive estimate of the likelihood, p(y|θ, u), such that

\int p(y|\theta, u)\, p_\theta(u)\, du = p(y|\theta)

Define the joint distribution

\pi(\theta, u|y) = \frac{\pi(\theta)\, p(y|\theta, u)\, p_\theta(u)}{p(y)}

This integrates to one and has the posterior as its marginal. We can sample from this distribution!

\alpha(\theta, \theta') = \frac{p(y|\theta', u')\, \pi(\theta')\, p_{\theta'}(u')}{p(y|\theta, u)\, \pi(\theta)\, p_\theta(u)} \times \frac{q(\theta', \theta)\, p_\theta(u)}{q(\theta, \theta')\, p_{\theta'}(u')}
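A minimal pseudo-marginal Metropolis-Hastings sketch, assuming a user-supplied log_like_hat(theta, rng) that returns the log of an unbiased, non-negative likelihood estimate (a hypothetical interface). The key detail is that the estimate attached to the current state is reused, never refreshed, until the next acceptance.

```python
import numpy as np

def pseudo_marginal_mh(log_like_hat, log_prior, theta0, n_iters, step=0.3, rng=None):
    """Pseudo-marginal MH: `log_like_hat(theta, rng)` returns log of an unbiased,
    non-negative likelihood estimate. Recycling the current estimate (rather than
    re-estimating it) is what keeps the chain targeting the exact posterior."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_post_hat = log_like_hat(theta, rng) + log_prior(theta)
    samples = np.empty((n_iters, theta.size))
    for n in range(n_iters):
        prop = theta + step * rng.standard_normal(theta.size)
        log_post_prop = log_like_hat(prop, rng) + log_prior(prop)
        if np.log(rng.uniform()) < log_post_prop - log_post_hat:
            theta, log_post_hat = prop, log_post_prop
        samples[n] = theta
    return samples
```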


What’s the problem?

We need an unbiased estimate of

\pi(\theta|y) = \frac{p(y|\theta)\,\pi(\theta)}{p(y)} = \frac{f(y;\theta)}{Z(\theta)}\, \pi(\theta) \Big/ p(y)

We can unbiasedly estimate Z(θ), but if we take the reciprocal then the estimate is no longer unbiased:

E[\hat{Z}(\theta)] = Z(\theta) \qquad \text{but} \qquad E\left[\frac{1}{\hat{Z}(\theta)}\right] \neq \frac{1}{Z(\theta)}


Our approach

Construct an unbiased estimate of the likelihood, based on a series expansion of the likelihood and stochastic truncation.

Use pseudo-marginal MCMC to sample from the desired posterior distribution.


Proposed methodology

Construct random variables \{V_\theta^j, j \geq 0\} such that the series

\pi(\theta|y, \{V^j\}) = \sum_{j=0}^{\infty} V_\theta^j \quad \text{has} \quad E\big[\pi(\theta|y, \{V^j\})\big] = \pi(\theta|y).

This infinite series then needs to be truncated unbiasedly. This can be achieved via a number of Russian roulette schemes.

Define a random time τ_θ and set u := (\tau_\theta, \{V_\theta^j, 0 \leq j \leq \tau_\theta\}), so that

\pi(\theta, u|y) = \sum_{j=0}^{\tau_\theta} V_\theta^j \quad \text{which satisfies} \quad E\big[\pi(\theta, u|y) \,\big|\, \{V_\theta^j, j \geq 0\}\big] = \sum_{j=0}^{\infty} V_\theta^j


Implementation example

Rewrite the likelihood as an infinite series, inspired by the work of Booth (2007).

Simple manipulation, introducing a reference value Z̃(θ) chosen so that the series converges:

\frac{f(y;\theta)}{Z(\theta)} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \cdot \frac{1}{1 - \left[1 - \frac{Z(\theta)}{\tilde{Z}(\theta)}\right]} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \kappa(\theta)^n

where

\kappa(\theta) = 1 - \frac{Z(\theta)}{\tilde{Z}(\theta)}

The series converges for |κ(θ)| < 1.


Implementation example continued

We can unbiasedly estimate each term in the series using n independent estimates of Z(θ):

\frac{f(y;\theta)}{Z(\theta)} = \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \left[1 - \frac{Z(\theta)}{\tilde{Z}(\theta)}\right]^n \approx \frac{f(y;\theta)}{\tilde{Z}(\theta)} \sum_{n=0}^{\infty} \prod_{i=1}^{n} \left[1 - \frac{\hat{Z}_i(\theta)}{\tilde{Z}(\theta)}\right]

The estimates Ẑ_i(θ) are computed using importance sampling (IS) or sequential Monte Carlo (SMC), for example.

But we can't compute an infinite number of terms...


Unbiased estimates of infinite series

Take an infinite convergent series S = \sum_{k=0}^{\infty} a_k which we would like to estimate unbiasedly.

The simplest scheme: draw an integer k with probability p(K = k), where \sum_{k=0}^{\infty} p(K = k) = 1, then set \hat{S} = a_k / p(K = k).

E[\hat{S}] = \sum_k p(K = k)\, \frac{a_k}{p(K = k)} = S

(This is essentially importance sampling.)

Variance: \sum_{k=0}^{\infty} \frac{a_k^2}{p(K = k)} - S^2
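A short Python sketch of this single-term estimator; the geometric choice of p(K = k) and the toy series are assumptions made for illustration.

```python
import numpy as np

def single_term_estimate(a, q=0.6, rng=None):
    """Unbiased estimate of S = sum_{k>=0} a(k): draw K with p(K=k) = (1-q) q^k
    and return a(K) / p(K=K).  `a` is a function giving the k-th term."""
    rng = np.random.default_rng() if rng is None else rng
    k = rng.geometric(1 - q) - 1              # support {0, 1, 2, ...}
    return a(k) / ((1 - q) * q**k)

# Toy check: S = sum_k 0.5^k = 2
rng = np.random.default_rng(0)
print(np.mean([single_term_estimate(lambda k: 0.5**k, rng=rng) for _ in range(20000)]))
```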


Russian roulette

Alternative: Russian roulette.

Choose a sequence of probabilities {q_n} and draw a sequence of i.i.d. uniform random variables {U_n} ~ U[0,1], n = 1, 2, 3, ...

Define the stopping time \tau = \inf\{k \geq 1 : U_k \geq q_k\}

\hat{S}_\tau = \sum_{j=0}^{\tau-1} \frac{a_j}{\prod_{i=1}^{j} q_i} \quad \text{is an unbiased estimate of } S.

The {q_n} must be chosen to minimise the variance of the estimator.
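Combining the geometric expansion with Russian roulette truncation, a sketch under simplifying assumptions: a constant continuation probability q, a user-supplied unbiased estimator Z_hat of Z(θ) (e.g. from IS or SMC) and a reference value Z_tilde; none of these names come from the slides.

```python
import numpy as np

def roulette_reciprocal_Z(Z_hat, Z_tilde, q=0.9, rng=None):
    """Russian-roulette truncation of the geometric series
    1/Z = (1/Z_tilde) * sum_n prod_{i<=n} (1 - Zhat_i/Z_tilde),
    using a constant continuation probability q.  `Z_hat()` must return a fresh,
    independent, unbiased estimate of Z(theta) on every call (e.g. from IS or SMC)."""
    rng = np.random.default_rng() if rng is None else rng
    total = 1.0            # n = 0 term of the series
    partial_product = 1.0  # running unbiased estimate of kappa(theta)^n
    weight = 1.0           # running product of the roulette probabilities q_i
    while rng.uniform() < q:                         # continue with probability q
        weight *= q
        partial_product *= 1.0 - Z_hat() / Z_tilde   # one fresh Zhat_i per factor
        total += partial_product / weight
    return total / Z_tilde                           # estimate of 1/Z(theta)
```

Multiplying the returned value by f(y;θ) gives the likelihood estimate that is plugged into the pseudo-marginal scheme; note the estimate can be negative, which is where the sign correction described later comes in.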


Debiasing estimator, McLeish (2011), Rhee and Glynn (2012)

We want an unbiased estimate of E[Y], but cannot generate Y. We can generate approximations Y_n such that \lim_{n\to\infty} E[Y_n] = E[Y].

E[Y] = E[Y_0] + \sum_{i=1}^{\infty} E[Y_i - Y_{i-1}]

Define a probability distribution for N, a non-negative, integer-valued random variable; then

Z = Y_0 + \sum_{i=1}^{N} \frac{Y_i - Y_{i-1}}{P(N \geq i)}

is an unbiased estimator of E[Y] and (under suitable conditions on {Y_n} and N) has finite variance.
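A toy Python sketch of this debiasing construction; the geometric distribution for N and the deterministic toy sequence Y_n are assumptions chosen so that the answer (E[Y] = 2) is easy to check.

```python
import numpy as np

def rhee_glynn_estimate(Y, p_geq, N_sampler, rng=None):
    """Single-sample debiasing (McLeish / Rhee-Glynn): Y(n) returns the n-th
    approximation Y_n, p_geq(i) = P(N >= i), and N_sampler(rng) draws N."""
    rng = np.random.default_rng() if rng is None else rng
    N = N_sampler(rng)
    total = Y(0)
    for i in range(1, N + 1):
        total += (Y(i) - Y(i - 1)) / p_geq(i)
    return total

# Toy example: Y_n = partial sums of 0.5^k, so E[Y] = 2, with P(N >= i) = 0.5**i.
rng = np.random.default_rng(1)
Y = lambda n: sum(0.5**k for k in range(n + 1))
est = np.mean([rhee_glynn_estimate(Y, lambda i: 0.5**i, lambda r: r.geometric(0.5) - 1, rng)
               for _ in range(20000)])
print(est)   # close to 2
```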


Negative estimates

Often we cannot guarantee that the overall estimate will be positive.

We can use a trick from the physics literature. Recall that we have an unbiased estimate of the likelihood, p(y|θ, u); writing σ(p) for its sign,

E_\pi[\phi(\theta)] = \int \phi(\theta)\, \pi(\theta|y)\, d\theta = \int\!\!\int \phi(\theta)\, \pi(\theta, u|y)\, d\theta\, du

= \frac{\int\!\!\int \phi(\theta)\, p(y|\theta, u)\, \pi(\theta)\, p_\theta(u)\, d\theta\, du}{\int\!\!\int p(y|\theta, u)\, \pi(\theta)\, p_\theta(u)\, d\theta\, du}

= \frac{\int\!\!\int \phi(\theta)\, \sigma(p)\, |p(y|\theta, u)|\, \pi(\theta)\, p_\theta(u)\, d\theta\, du}{\int\!\!\int \sigma(p)\, |p(y|\theta, u)|\, \pi(\theta)\, p_\theta(u)\, d\theta\, du}


Negative estimates cont.

From the last slide:

E_\pi[\phi(\theta)] = \frac{\int\!\!\int \phi(\theta)\, \sigma(p)\, |p(y|\theta, u)|\, \pi(\theta)\, p_\theta(u)\, d\theta\, du}{\int\!\!\int \sigma(p)\, |p(y|\theta, u)|\, \pi(\theta)\, p_\theta(u)\, d\theta\, du} = \frac{\int\!\!\int \phi(\theta)\, \sigma(p)\, q(\theta, u|y)\, d\theta\, du}{\int\!\!\int \sigma(p)\, q(\theta, u|y)\, d\theta\, du}

We can get a Monte Carlo estimate of φ(θ) with respect to the posterior using samples from the 'absolute' distribution q(θ, u|y):

E_\pi[\phi(\theta)] \approx \frac{\sum_k \phi(\theta_k)\, \sigma(p_k)}{\sum_k \sigma(p_k)}
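In code, the final correction is a one-line weighted average; a sketch (the commented-out driver function is hypothetical):

```python
import numpy as np

def sign_corrected_mean(phi_values, signs):
    """Sign-corrected posterior expectation: phi(theta_k) evaluated at samples from
    the 'absolute' target, weighted by the signs sigma(p_k) of the likelihood estimates."""
    phi_values = np.asarray(phi_values, dtype=float)
    signs = np.asarray(signs, dtype=float)
    return (phi_values * signs).sum() / signs.sum()

# e.g. posterior mean of theta from a pseudo-marginal run that stored (theta_k, sign_k):
# theta_samples, sign_samples = run_signed_pseudo_marginal(...)   # hypothetical driver
# print(sign_corrected_mean(theta_samples, sign_samples))
```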


Summary

So, we can get an unbiased estimate of a doubly-intractable distribution:
- Draw a random integer k ~ p(k)
- Compute the k-th term in the infinite series
- Compute the overall unbiased estimate of the likelihood

Plug the absolute value into a pseudo-marginal MCMC scheme to sample from the posterior (or close to it...). Compute expectations with respect to the posterior using the importance sampling identity.

However, the methodology is computationally costly, as it needs many low-variance estimates of the partition function.

But we can compute the estimates in parallel...


Results: Ising models

We simulated a 40×40 grid of data points from a Gibbs sampler with Jβ = 0.2.
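For reference, a minimal single-site Gibbs sampler for a 2D Ising model at coupling Jβ = 0.2 (a sketch: free boundary conditions, random initialisation and the number of sweeps are assumptions, since the slides do not give the exact simulation settings).

```python
import numpy as np

def ising_gibbs(L=40, J_beta=0.2, sweeps=500, rng=None):
    """Single-site Gibbs sampler for a 2D Ising model on an L x L grid with
    coupling J*beta and free boundaries; returns a +/-1 spin configuration."""
    rng = np.random.default_rng() if rng is None else rng
    spins = rng.choice([-1, 1], size=(L, L))
    for _ in range(sweeps):
        for i in range(L):
            for j in range(L):
                # sum of the (up to 4) neighbouring spins
                s = (spins[i - 1, j] if i > 0 else 0) + (spins[i + 1, j] if i < L - 1 else 0) \
                  + (spins[i, j - 1] if j > 0 else 0) + (spins[i, j + 1] if j < L - 1 else 0)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * J_beta * s))   # P(spin = +1 | neighbours)
                spins[i, j] = 1 if rng.uniform() < p_up else -1
    return spins

y = ising_gibbs()
print(y.shape, y.mean())
```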


- Geometric construction with Russian roulette sampling
- AIS
- Parallel implementation using Matlab


Example: Florentine business network

- ERGM model, 16 nodes
- Graph statistics in the model exponent are the number of edges, number of 2- and 3-stars and number of triangles
- Estimates of the normalising term were computed using SMC
- Series truncation was carried out using Russian roulette


Further work

- Optimise various parts of the methodology, stopping probabilities etc.
- Compare with approximate approaches in terms of variance and computation

Thank you for listening!


References

- MCMC for doubly-intractable distributions. Murray, Ghahramani, MacKay (2004)
- An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Møller, Pettitt, Berthelsen, Reeves (2004)
- Unbiased Estimation with Square Root Convergence for SDE Models. Rhee, Glynn (2012)
- A general method for debiasing a Monte Carlo estimator. McLeish (2011)
- Unbiased Monte Carlo Estimation of the Reciprocal of an Integral. Booth (2007)