

Title

Model-based Geostatistics

Presented by Tonglin Zhang

Department of Statistics, Purdue University

November 18, 2014

Tonglin Zhang, Department of Statistics, Purdue University Model-based Geostatistics


References

Major References

- Diggle, P.J., Tawn, J.A., Moyeed, R.A. (1998). Model-based geostatistics. Applied Statistics, 47, 299-350.

- Diggle, P.J., Ribeiro, P.J. (2007). Model-based Geostatistics. Springer.

Minor References

- Elliott, P., Wakefield, J., Best, N., Briggs, D. (2000). Spatial Epidemiology, Chapters 6 & 7. Oxford University Press.

- Agresti, A. (2002). Categorical Data Analysis. Wiley.


Outline

- Geostatistical Model

- Generalized Linear Mixed-Effect Model

- Moment Formulae

- Examples

- Generalized Linear Prediction

- Estimation


Geostatistical Model

Spatial Gaussian Process

A process $Z(s)$ on $\mathbb{R}^d$ is a Gaussian process if $z = (Z(s_1), \dots, Z(s_n))$ is a multivariate normal random vector for any distinct $s_1, \dots, s_n \in \mathbb{R}^d$. It is often assumed that

- $E[Z(s)] = 0$.

- $\mathrm{Cov}[Z(s), Z(s+h)] = c_{\theta_z}(\|h\|)$, where $c_{\theta_z}$ is a parametric covariance family.

- Then we can write $c_{\theta_z}(\|h\|) = \tau_1^2 \rho_{\theta_z}(\|h\|)$, where $\rho_{\theta_z}$ is a parametric correlation function.
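As a minimal sketch of the covariance structure above, the following builds the covariance matrix of $(Z(s_1), \dots, Z(s_n))$ under an exponential correlation function $\rho(h) = \exp(-h/\phi)$. The slides do not fix a particular correlation family, so this choice, the function names, the sites, and the parameter values are all illustrative assumptions.

```python
import math

def exp_correlation(h, phi):
    """Exponential correlation rho(h) = exp(-h / phi); an assumed family."""
    return math.exp(-h / phi)

def covariance_matrix(sites, tau1_sq, phi):
    """Covariance matrix with entries tau1^2 * rho(||s_i - s_j||)."""
    return [
        [tau1_sq * exp_correlation(math.dist(si, sj), phi) for sj in sites]
        for si in sites
    ]

# Hypothetical sites in R^2 and parameter values.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
C = covariance_matrix(sites, tau1_sq=2.0, phi=1.5)
for row in C:
    print([round(v, 4) for v in row])
```

The diagonal entries equal $\tau_1^2$, and the off-diagonal entries decay with the distance between sites.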


The geostatistical model is often proposed as

\[ Y(s) = x(s)\beta + Z(s) + \epsilon(s), \]

where $x(s)$ is the vector of explanatory variables, $Z(s)$ is a Gaussian random field, and $\epsilon(s)$ is the white noise error term (i.e., $\epsilon(s_1), \dots, \epsilon(s_n)$ are iid $N(0, \tau_2^2)$ for distinct $s_1, \dots, s_n$). Let $\sigma^2 = \tau_1^2 + \tau_2^2$ and

\[ \theta = \left( \frac{\tau_1^2}{\sigma^2}, \theta_z \right). \]

Then, up to the scale factor $\sigma^2$, the covariance between values of $Y(s)$ is completely determined by $\theta$.
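To make the role of $\theta$ explicit, the covariance can be worked out directly from the model. This short derivation assumes that $Z$ and $\epsilon$ are independent, which the slide does not state but which is the usual convention. The shorthand $\nu$ for the first component of $\theta$ is introduced here for illustration only.

```latex
\begin{align*}
\mathrm{Cov}[Y(s), Y(s+h)]
  &= \mathrm{Cov}[Z(s), Z(s+h)] + \mathrm{Cov}[\epsilon(s), \epsilon(s+h)] \\
  &= \tau_1^2 \rho_{\theta_z}(\|h\|) + \tau_2^2 \mathbf{1}\{h = 0\} \\
  &= \sigma^2 \left[ \nu \rho_{\theta_z}(\|h\|) + (1 - \nu) \mathbf{1}\{h = 0\} \right],
  \qquad \nu = \frac{\tau_1^2}{\sigma^2}.
\end{align*}
```

In particular, $V[Y(s)] = \sigma^2$, and the correlation between two distinct locations is $\nu \rho_{\theta_z}(\|h\|)$.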


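Let

\[ y = \begin{pmatrix} Y(s_1) \\ \vdots \\ Y(s_n) \end{pmatrix}, \qquad X = \begin{pmatrix} x(s_1) \\ \vdots \\ x(s_n) \end{pmatrix}. \]

In matrix form,

\[ y = X\beta + z + \epsilon. \]

Then

\[ y \sim N(X\beta, \sigma^2 R), \]

where $R = \mathrm{Cor}(y)$. The kriging prediction of $Y(s_0)$ is

\[ Y^*(s_0) = E(Y(s_0) \mid y) = x(s_0)\beta + c_0' R^{-1} (y - X\beta), \]

where $c_0 = \mathrm{Cor}(y, Y(s_0))$.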
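The kriging formula can be sketched numerically as follows. This is a hedged illustration, not the presenter's implementation: it assumes an exponential correlation $\exp(-h/\phi)$, treats $\beta$, $\nu = \tau_1^2/\sigma^2$, and $\phi$ as known, and uses made-up data. The function name `kriging_predict` and all values are hypothetical.

```python
import numpy as np

def kriging_predict(coords, y, X, beta, s0, x0, nu, phi):
    """Plug-in kriging predictor x0'beta + c0' R^{-1} (y - X beta).

    Assumes (for illustration) an exponential correlation exp(-h / phi)
    and nu = tau1^2 / sigma^2, so off-diagonal correlations are nu * rho(h).
    """
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = nu * np.exp(-dists / phi)
    np.fill_diagonal(R, 1.0)                      # Cor(Y(s), Y(s)) = 1
    c0 = nu * np.exp(-np.linalg.norm(coords - s0, axis=1) / phi)
    return x0 @ beta + c0 @ np.linalg.solve(R, y - X @ beta)

# Hypothetical data: four sites, intercept-only mean.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.2, 0.7, 1.9, 1.4])
X = np.ones((4, 1))
beta = np.array([1.0])
x0 = np.array([1.0])

pred_center = kriging_predict(coords, y, X, beta, np.array([0.5, 0.5]), x0, nu=0.8, phi=1.0)
pred_far = kriging_predict(coords, y, X, beta, np.array([1e3, 1e3]), x0, nu=0.8, phi=1.0)
print(pred_center, pred_far)
```

Far from all observed sites the correlations in $c_0$ vanish, so the prediction falls back to the mean $x(s_0)\beta$. Note that $\sigma^2$ cancels in the predictor, which is why only correlations appear.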


Estimation

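The log-likelihood function is

\[ \ell(\beta, \sigma^2, \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2}\log|\det(R_\theta)| - \frac{1}{2\sigma^2}(y - X\beta)' R_\theta^{-1} (y - X\beta). \]

Given $\theta$, the maximizers are

\[ \hat\beta = \hat\beta_\theta = (X' R_\theta^{-1} X)^{-1} X' R_\theta^{-1} y, \]

\[ \hat\sigma^2 = \hat\sigma^2_\theta = \frac{1}{n}\left[ y' R_\theta^{-1} y - y' R_\theta^{-1} X (X' R_\theta^{-1} X)^{-1} X' R_\theta^{-1} y \right]. \]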


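Then $\theta$ can be estimated by maximizing the profile log-likelihood

\[ \ell_P(\theta) = -\frac{n}{2}\left[1 + \log\left(\frac{2\pi}{n}\right)\right] - \frac{1}{2}\log|\det(R_\theta)| - \frac{n}{2}\log(y' M_\theta y), \]

where

\[ M_\theta = R_\theta^{-1} - R_\theta^{-1} X (X' R_\theta^{-1} X)^{-1} X' R_\theta^{-1}. \]

- Since the maximizer has no closed form, numerical methods are often used.

- The above involves computing with $R_\theta$, which is an $n \times n$ matrix and therefore large if $n$ is large.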
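The profile likelihood steps can be sketched as below. This is an illustration under stated assumptions rather than a reference implementation: the correlation is taken to be exponential, $\theta = (\nu, \phi)$ with $\nu = \tau_1^2/\sigma^2$, the data are made up, and a crude grid search over $\phi$ stands in for a proper numerical optimizer. The quadratic form $y' M_\theta y$ is computed as the generalized least squares residual sum of squares, which is algebraically the same quantity.

```python
import numpy as np

def profile_loglik(phi, nu, coords, y, X):
    """Return (l_P(theta), beta_hat, sigma2_hat) for theta = (nu, phi).

    Assumes an exponential correlation exp(-h / phi); nu = tau1^2 / sigma^2.
    """
    n = len(y)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    R = nu * np.exp(-dists / phi)
    np.fill_diagonal(R, 1.0)
    _, logdet = np.linalg.slogdet(R)
    Rinv_X = np.linalg.solve(R, X)
    Rinv_y = np.linalg.solve(R, y)
    beta_hat = np.linalg.solve(X.T @ Rinv_X, X.T @ Rinv_y)   # GLS estimate
    resid = y - X @ beta_hat
    quad = resid @ np.linalg.solve(R, resid)                  # equals y' M_theta y
    sigma2_hat = quad / n
    lp = -0.5 * n * (1 + np.log(2 * np.pi / n)) - 0.5 * logdet - 0.5 * n * np.log(quad)
    return lp, beta_hat, sigma2_hat

# Hypothetical data: five sites, intercept plus one covariate.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
X = np.column_stack([np.ones(5), coords[:, 0]])

# Crude grid search over phi with nu held fixed; a real analysis would optimize both.
phi_grid = np.linspace(0.2, 3.0, 15)
values = [profile_loglik(phi, 0.7, coords, y, X)[0] for phi in phi_grid]
best_phi = phi_grid[int(np.argmax(values))]
print(best_phi, max(values))
```

Each evaluation solves linear systems in the $n \times n$ matrix $R_\theta$, which is the cost the second bullet refers to.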


Generalized Linear Mixed-Effect Model

Non-Gaussian Data

In applications, the response may be a count, in which case the data are non-Gaussian. Most often $Y(s_i)$ follows either a binomial or a Poisson distribution, and the spatially correlated effect is modeled by a Gaussian random field.


Binomial Data

Assume $Y(s) \sim \mathrm{Bin}(n(s), p(s))$. Then a logistic model-based geostatistical model is

\[ \log\frac{p(s)}{1 - p(s)} = x(s)\beta + Z(s), \]

where $Z(s)$ is a Gaussian random field. It is often assumed that, given $(p(s_1), \dots, p(s_n))$, the components of $y = (Y(s_1), \dots, Y(s_n))$ are independent. This is called the conditional independence assumption.
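Conditional independence is what lets the conditional log-likelihood be written as a sum over sites. A minimal sketch, assuming the logit link above and using made-up counts, linear predictors, and field values (the function names are hypothetical):

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_cond_loglik(y, n_trials, xbeta, z):
    """Sum of log Bin(y_i; n_i, p_i), with logit(p_i) = xbeta_i + z_i.

    The sum over sites is valid only under conditional independence given z.
    """
    total = 0.0
    for yi, ni, eta_fixed, zi in zip(y, n_trials, xbeta, z):
        p = inv_logit(eta_fixed + zi)
        total += (math.log(math.comb(ni, yi))
                  + yi * math.log(p)
                  + (ni - yi) * math.log(1.0 - p))
    return total

# Hypothetical counts at three sites with made-up linear predictors and field values.
loglik = binomial_cond_loglik(y=[1, 3, 0], n_trials=[2, 5, 4],
                              xbeta=[0.0, 0.4, -1.0], z=[0.0, 0.3, -0.2])
print(loglik)
```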


Poisson Data

Assume $Y(s) \sim \mathrm{Poisson}(\lambda(s))$. Then a loglinear model-based geostatistical model is

\[ \log \lambda(s) = x(s)\beta + Z(s). \]

Conditional independence is also often assumed.


Moment Formulae

Moment Formulae

Agresti (2002, pages 563-564) presents useful mean, variance, and covariance formulae. We have

\[ E(Y(s)) = E[E(Y(s) \mid Z(s))], \]

\[ V(Y(s)) = E[V(Y(s) \mid Z(s))] + V[E(Y(s) \mid Z(s))], \]

and

\[ \mathrm{Cov}(Y(s), Y(s')) = E[\mathrm{Cov}(Y(s), Y(s') \mid Z(s), Z(s'))] + \mathrm{Cov}[E(Y(s) \mid Z(s)), E(Y(s') \mid Z(s'))]. \]
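As a worked step, consider the Poisson loglinear model $\log\lambda(s) = x(s)\beta + Z(s)$ under conditional independence. For two distinct sites the conditional covariance is zero, so only the second term of the covariance formula survives:

```latex
\begin{align*}
\mathrm{Cov}(Y(s), Y(s'))
  &= \underbrace{E[\mathrm{Cov}(Y(s), Y(s') \mid Z(s), Z(s'))]}_{=\,0 \text{ for } s \neq s'}
   + \mathrm{Cov}[E(Y(s) \mid Z(s)), E(Y(s') \mid Z(s'))] \\
  &= \mathrm{Cov}[\lambda(s), \lambda(s')]
   = e^{x(s)\beta + x(s')\beta}\, \mathrm{Cov}\!\left(e^{Z(s)}, e^{Z(s')}\right).
\end{align*}
```

The remaining covariance of the exponentiated field follows from the moment generating function of the bivariate normal distribution.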


Specification: Poisson Model

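For the Poisson model, with $\sigma^2 = V(Z(s))$, we have

\[ E(Y(s)) = e^{x(s)\beta + \sigma^2/2}, \]

\[ V(Y(s)) = e^{x(s)\beta + \sigma^2/2} + e^{2x(s)\beta}\left(e^{2\sigma^2} - e^{\sigma^2}\right), \]

and, for $s \neq s'$,

\[ \mathrm{Cov}(Y(s), Y(s')) = e^{x(s)\beta + x(s')\beta}\, \mathrm{Cov}\!\left(e^{Z(s)}, e^{Z(s')}\right) = e^{x(s)\beta + x(s')\beta}\left[ e^{\sigma^2}\left(e^{\sigma^2 \rho_\theta(\|s - s'\|)} - 1\right) \right]. \]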
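The three Poisson moment formulae translate directly into code. The sketch below uses hypothetical function names and parameter values, and takes $\sigma^2$ to be the variance of $Z(s)$. It also illustrates two consequences of the formulae: the marginal variance exceeds the marginal mean (overdispersion), and the covariance is zero when the field is uncorrelated between the two sites.

```python
import math

def poisson_mean(xb, sigma2):
    """E(Y(s)) = exp(x beta + sigma^2 / 2)."""
    return math.exp(xb + sigma2 / 2.0)

def poisson_var(xb, sigma2):
    """V(Y(s)) = mean + exp(2 x beta) * (exp(2 sigma^2) - exp(sigma^2))."""
    return poisson_mean(xb, sigma2) + math.exp(2.0 * xb) * (
        math.exp(2.0 * sigma2) - math.exp(sigma2)
    )

def poisson_cov(xb_s, xb_t, sigma2, rho):
    """Covariance between counts at two distinct sites with Cor(Z(s), Z(s')) = rho."""
    return math.exp(xb_s + xb_t) * math.exp(sigma2) * (math.exp(sigma2 * rho) - 1.0)

xb, sigma2 = 0.5, 0.4   # hypothetical linear predictor and field variance
print(poisson_mean(xb, sigma2), poisson_var(xb, sigma2))
for rho in (0.0, 0.5, 0.9):
    print(rho, poisson_cov(xb, xb, sigma2, rho))
```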


Examples

Tree Locations: Ambrosia Dumosa Plants

- The Ambrosia dumosa data consist of the locations and several important measurements of 4358 Ambrosia dumosa plants in a square area in the Colorado Desert in 1984.

- Other measurements are
  - the height of the plant canopy;
  - the length of the major axis of the plant canopy;
  - the length of the minor axis of the plant canopy;
  - the volume of the plant canopy.


[Figure: Ambrosia dumosa locations in 1984 in the Colorado Desert. Both axes run from 0 to 100.]


Generalized Linear Mixed-Effect Model

We model the locations of the trees by

\[ \log\frac{p(s)}{1 - p(s)} = \alpha + Z(s), \]

where $p(s)$ is the intensity function.


Examples

Disease Mapping: Cancer Clusters

Assume an area is partitioned into $m$ units. For $i = 1, \dots, m$, $y(s_i)$ is the disease count and $\xi_i$ is the at-risk population in unit $i$.


[Figure: Male Colorectal Cancer Rate (per 100,000) in Indiana 2003-2007. Legend classes: 56-91, 95-103, 104-114, 115-129, 130-151, 154-258. Map label: Marion. Scale bar in miles.]


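We may assume

\[ y(s_i) \sim \mathrm{Poisson}(\xi_i \theta_i) \]

with

\[ \log \theta_i = \mu + Z(s_i), \]

so that $\log E[y(s_i) \mid Z(s_i)] = \mu + \log(\xi_i) + Z(s_i)$, with $\log(\xi_i)$ acting as an offset.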
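A small sketch of the disease mapping model, reading it so that the conditional mean count in unit $i$ is $\xi_i \exp(\mu + Z(s_i))$, i.e. the log population enters as an offset. That reading, the function names, and all numbers below are assumptions made for illustration; none of the values come from the Indiana data.

```python
import math

def expected_count(population, mu, z):
    """Conditional mean count: population * exp(mu + z), so log(population) is an offset."""
    return population * math.exp(mu + z)

def poisson_logpmf(y, mean):
    """Log of the Poisson PMF at y for the given mean."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)

# Hypothetical units: at-risk populations, observed counts, and field values.
populations = [12000, 45000, 8000]
counts = [14, 40, 11]
z = [0.10, -0.05, 0.30]
mu = math.log(100 / 100000)   # baseline rate of 100 per 100,000 (made up)

means = [expected_count(p, mu, zi) for p, zi in zip(populations, z)]
loglik = sum(poisson_logpmf(y, m) for y, m in zip(counts, means))
print(means, loglik)
```

Positive values of the field raise a unit's relative risk above the baseline rate, and negative values lower it.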


Extensions in Poisson Case

There are three extensions; the first item below, the stationary model, is the baseline rather than an extension:

- Stationary models: $Z(s)$ is a stationary Gaussian random field.

- CAR (conditional autoregressive) model: the variance of $Z(s_i)$ is inversely proportional to $\xi_i$.

- SAR (spatial autoregressive) model: the distribution of $Z(s)$ depends only on its neighbors.

- Quasi-Poisson model: the expected value of $Y(s)$ depends only on $\beta$, not on $Z(s)$.

Note: CAR and SAR models are not typical geostatistical approaches.


Generalized Linear Prediction

Generalized Linear Prediction

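For any unobserved point $s_0$, the prediction of the response is

\[ \hat y(s_0) = g(x(s_0)\beta + \hat Z(s_0)), \]

where $g$ is the inverse link function and $\hat Z(s_0)$ is the prediction of $Z(s_0)$. To compute $\hat Z(s_0)$, it is recommended to use

\[ \hat Z(s_0) = E[Z(s_0) \mid y]. \]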
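The prediction rule can be sketched as a plug-in computation. Here $g$ is read as the inverse of the link function (exp for the loglinear model, inverse logit for the logistic model), which is an interpretation of the slide rather than something it states. The predicted field value is supplied as a given number, since computing $E[Z(s_0) \mid y]$ itself requires a method such as MCMC. All names and values are hypothetical.

```python
import math

INVERSE_LINKS = {
    "log": math.exp,                                    # Poisson loglinear model
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # binomial logistic model
}

def predict_response(x0, beta, z0_hat, link):
    """Plug-in prediction g(x0'beta + z0_hat), with g the inverse of `link`."""
    eta = sum(xj * bj for xj, bj in zip(x0, beta)) + z0_hat
    return INVERSE_LINKS[link](eta)

# Hypothetical covariates, coefficients, and predicted field value at s0.
x0, beta, z0_hat = [1.0, 0.5], [0.2, 0.6], 0.15
print(predict_response(x0, beta, z0_hat, "log"))
print(predict_response(x0, beta, z0_hat, "logit"))
```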


Estimation

Likelihood Function

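Let $Y_i = Y(s_i)$, $x_i = x(s_i)$, and $Z_i = Z(s_i)$. Write $f_i(Y_i \mid Z_i, \beta, \theta, \sigma^2)$ for the PMF or PDF at location $s_i$. The conditional likelihood function is

\[ L(\beta, \theta, \sigma^2 \mid z) = \prod_{i=1}^{n} f_i(Y_i \mid Z_i, \beta, \theta, \sigma^2). \]

The marginal likelihood function is

\[ L(\beta, \theta, \sigma^2) = \int_{\mathbb{R}^n} L(\beta, \theta, \sigma^2 \mid z)\, g(z \mid \theta, \sigma^2)\, dz, \]

where $g(z \mid \theta, \sigma^2)$ is the density of $z$. This is hard to compute. Therefore, an MCMC (Markov chain Monte Carlo) algorithm is used.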


Specification: Poisson Example

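- The conditional PMF is

\[ f_i(Y_i \mid Z_i, \beta, \theta, \sigma^2) = \frac{1}{Y_i!} e^{Y_i(x_i\beta + Z_i) - e^{x_i\beta + Z_i}}. \]

- The conditional likelihood function is

\[ L(\beta, \theta, \sigma^2 \mid z) = \prod_{i=1}^{n} \frac{1}{Y_i!} e^{Y_i(x_i\beta + Z_i) - e^{x_i\beta + Z_i}}. \]

- The likelihood function is

\[ L(\beta, \theta, \sigma^2) = \int_{\mathbb{R}^n} \left[ \prod_{i=1}^{n} \frac{1}{Y_i!} e^{Y_i(x_i\beta + Z_i) - e^{x_i\beta + Z_i}} \right] \frac{1}{(2\pi)^{n/2} \sigma^n |\det(R_\theta)|^{1/2}} e^{-\frac{1}{2\sigma^2} z' R_\theta^{-1} z}\, dz. \]

This integral has no closed form.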
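For a single site ($n = 1$) the integral is one-dimensional and can be approximated numerically, which gives a feel for what the marginal likelihood looks like. The sketch below uses a plain trapezoid rule over a truncated range; this is an illustrative choice, not a method from the slides, and it does not scale to the $n$-dimensional integral above. The names and parameter values are hypothetical.

```python
import math

def poisson_cond_pmf(y, eta_fixed, z):
    """Conditional PMF (1 / y!) exp{y (x beta + z) - exp(x beta + z)}."""
    eta = eta_fixed + z
    return math.exp(y * eta - math.exp(eta) - math.lgamma(y + 1))

def marginal_pmf(y, eta_fixed, sigma, half_width=8.0, steps=2000):
    """Trapezoid-rule approximation of the n = 1 marginal likelihood,
    integrating the conditional PMF against the N(0, sigma^2) density of z."""
    lo, hi = -half_width * sigma, half_width * sigma
    dz = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        density = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * poisson_cond_pmf(y, eta_fixed, z) * density
    return total * dz

eta_fixed, sigma = 0.0, 0.5   # hypothetical x*beta and standard deviation of Z
probs = [marginal_pmf(y, eta_fixed, sigma) for y in range(120)]
total_prob = sum(probs)                                   # should be close to 1
marginal_mean = sum(y * p for y, p in enumerate(probs))   # compare with exp(sigma^2 / 2)
print(total_prob, marginal_mean)
```

The marginal mean recovered this way can be compared with the closed-form moment $e^{x\beta + \sigma^2/2}$, which is a useful sanity check on the integration.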


Specification: MCMC algorithm

- Generate $\theta$, $\beta$, and $\sigma^2$ from their prior distributions.

- Generate $z$ from $N(0, \sigma^2 R_\theta)$.

- Compute the conditional likelihood function $L(\beta, \theta, \sigma^2 \mid z)$.

- The posterior is proportional to the conditional likelihood function times the priors. Then we can derive the posterior means of $\sigma^2$, $\beta$, and $\theta$, with the parameters weighted by their priors.

- Disadvantage: convergence is slow.
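As one concrete and much simplified illustration of MCMC in this setting, the sketch below runs a random-walk Metropolis sampler for a single latent value $Z$ given one Poisson count, with $x\beta$ and $\sigma$ held fixed. This is not necessarily the scheme intended on the slide, which also draws the parameters. The sampler, its tuning constants, and the data are assumptions chosen to keep the example small.

```python
import math
import random

def log_target(z, y, eta_fixed, sigma):
    """Log of (conditional Poisson likelihood x N(0, sigma^2) density of z), up to a constant."""
    eta = eta_fixed + z
    return y * eta - math.exp(eta) - 0.5 * (z / sigma) ** 2

def metropolis_z(y, eta_fixed, sigma, n_iter=6000, burn_in=1000, step=0.5, seed=0):
    """Random-walk Metropolis draws from p(z | y) with x*beta and sigma held fixed."""
    rng = random.Random(seed)
    z = 0.0
    current = log_target(z, y, eta_fixed, sigma)
    draws, accepted = [], 0
    for it in range(n_iter):
        proposal = z + rng.gauss(0.0, step)
        candidate = log_target(proposal, y, eta_fixed, sigma)
        # Accept with probability min(1, target(proposal) / target(z)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            z, current = proposal, candidate
            accepted += 1
        if it >= burn_in:
            draws.append(z)
    return draws, accepted / n_iter

draws, acc_rate = metropolis_z(y=10, eta_fixed=0.0, sigma=1.0)
z_hat = sum(draws) / len(draws)    # Monte Carlo estimate of E[Z | y]
print(z_hat, acc_rate)
```

The average of the retained draws estimates $E[Z \mid y]$, the quantity used in generalized linear prediction. In the full model the same idea applies to the whole vector $z$ together with $\beta$, $\theta$, and $\sigma^2$, which is where slow convergence becomes a practical concern.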


Summary

Summary

- Model-based geostatistics can be used to analyze non-Gaussian data.

- Estimation of parameters is difficult.

- Generalized linear prediction is used to compute the predicted response.
