bayesian semiparametric intensity estimation for ...meng/papers/biometrics.yueloh.2011.pdf · liang...

Biometrics 000, 000–000 DOI: 000

000 0000

Bayesian Semiparametric Intensity Estimation for Inhomogeneous Spatial

Point Processes

Yu Ryan Yue∗

Baruch College, City University of New York, New York, NY 10010, USA

*email: [email protected]

and

Ji Meng Loh∗

AT&T Labs-Research, Florham Park, NJ 07932, USA

*email: [email protected]

Summary: In this work we propose a fully Bayesian semiparametric method to estimate the

intensity of an inhomogeneous spatial point process. The basic idea is to first convert intensity

estimation into a Poisson regression setting via binning data points on a regular grid, and then

model log intensity semiparametrically using an adaptive version of Gaussian Markov random fields

to smooth the corresponding counts. The inference is carried by an efficient MCMC simulation

algorithm. Compared to existing methods for intensity estimation, e.g., parametric modeling and

kernel smoothing, the proposed estimator not only provides inference regarding the dependence of

the intensity function on possible covariates, but also uses information from the data to adaptively

determine the amount of smoothing at the local level. The effectiveness of using our method is

demonstrated through simulation studies and an application to a rainforest dataset.

Key words: Adaptive spatial smoothing; Gaussian Markov random fields; Gibbs sampling; In-

tensity estimation; Spatial point process.

1

Bayesian Semiparametric Intensity Estimation 1

1. Introduction

Spatial point data arise in a wide variety of applications including ecology, seismology,

astronomy and epidemiology (e.g. Waagepetersen, 2007; Schoenberg, 2003; Stein et al.,

2000; Diggle et al., 2000, respectively). Diggle (2003) and Stoyan et al. (1995) both contain

comprehensive introductions to the analysis of spatial point patterns. While spatial point

analyses range from relatively simple exploratory methods to complex Markov Chain Monte

Carlo (MCMC) algorithms for model fitting (Møller and Waagepetersen, 2003), in any spatial

point analysis, the first main interest is often the intensity function λ(s), s ∈ R2, defined by

λ(s) = lim|ds|→0

(E[N(ds)]

|ds|

), (1)

where ds represents an infinitesimal region around s, |ds| its area and N(ds) the number of

points in ds. The quantity λ(s)|ds| can be thought of as the probability of observing one data

point in the region ds. In the stationary case, λ(s) ≡ λ a constant and, given a spatial point

pattern with n points observed in an observational region D ⊂ R2, can be easily estimated

using λ = n/|D|. In the nonstationary or inhomogeneous case, however, λ(s) is not constant

but a function of s.

Motivated by theoretical considerations of consistency, much of recent work has considered

parametric estimation of the intensity, i.e. modeling the intensity as a function of one or more

measured covariates, for an inhomogeneous spatial point process. Specifically, suppose X(s),

s ∈ D, is a row vector of p spatial covariates. The point data are then modeled as a realization

of a spatial point process with intensity function given by

λ(s;α,β) = exp {α +X(s)β} . (2)

This model can be generalized by replacing the exponential function by another positive

strictly increasing differentiable function. Assuming that the point process is an inhomoge-

neous Poisson with intensity function given by (2), the log-likelihood for a point pattern

2 Biometrics, 000 0000

T = {t1, . . . , tn} is

L(α,β;T ) =1

|D|

n∑i=1

log λ(ti;α,β)− 1

|D|

∫D

λ(s;α,β) ds. (3)

The quantity L(α,β;X) can be easily maximized using regular software for fitting generalized

linear models (Baddeley and Turner, 2000) to obtain estimates. The spatstat R package

(Baddeley and Turner, 2005) contains the function ppm that can be used to fit such model.

Schoenberg (2004) showed that under some mild conditions on the spatial point process,

the estimates can be consistent even if the process is not Poisson. Guan and Loh (2007),

Waagepetersen and Guan (2009) and Guan (2009) used this framework in their work.

Many important theoretical results for intensity function estimation using parametric

models have been derived under some assumptions. One crucial assumption is that the model

is correct, i.e. the correct covariates X(s) are used, and that the choice of the relationship

between the intensity function and the covariates, usually the form shown in (2), is correct.

However, spatial point patterns are often the results of complex interactions between physical,

environmental and ecological processes. For example, a census of tree locations in a tropical

forest includes many generations of trees whose growth and decay are affected by soil, rainfall,

water runoff, etc. At best, this makes the choice of model very challenging. If the relevant

covariates were not measured, an appropriate model may not even be possible.

When there is no covariate involved or it is too difficult to incorporate covariate in-

formation, nonparametric estimation of the intensity function is often used. One way to

estimate λ(s) nonparametrically is with kernel estimation (Diggle, 1985), usually with a single

bandwidth h chosen by eye. Diggle (2003) gave an ad-hoc rule, h = 0.68n−0.2 for the quadratic

Epanecnikov kernel when the observation window is a square. He also provided a more

sophisticated method where h can be obtained by minimizing an estimate M(h) of the mean

squared error as a function of h. Stoyan et al. (1995) suggested a likelihood cross-validation

method for selecting the bandwidth, which is an extension of the method in Silverman


(1986) to the context of spatial point patterns. Since the same amount smoothing is applied

to the whole data area, the ordinary kernel smoother often fails to capture the local spatial

variation of the intensity function. Diggle et al. (2005) therefore used various bandwidths

hi to perform adaptive smoothing with the kernel estimator. Guan (2008) introduced an

interesting version of kernel estimation, with the distance measure in the kernel based on

covariate distance instead of geographic distance. This allowed consistent estimation of the

intensity by pooling information over widely separated points that have similar covariate

values. He addressed bandwidth selection and dimension reduction using a sliced inverse

regression technique. While it is nonparametric and does not involve a specified form of the

relationship between the intensity and the covariate(s), this method does assume that the

correct covariate(s) are used. Furthermore, it is not entirely clear how covariate distances

should be computed if two or more covariates are involved. Guan (2008) used the Euclidean

norm on the standardized covariates. The standardization reduces the issues involved with

using the Euclidean norm. However, with some covariates, e.g. altitude and soil pH, it is

difficult to interpret the meaning of this distance, nor is it clear whether the Euclidean norm

is the correct norm to use.

To combine the strengths of both parametric and nonparametric methods, we propose

modeling the intensity function in a semiparametric way, i.e.,

λ(s;α,β, z) = exp {α +X(s)β + z(s)} , (4)

where z(s) is a stochastic process accounting for spatial correlation among locations that

cannot be explained by the covariates. We convert intensity estimation into a generalized

spatial regression problem via binning the data and assume that the observed number of

points in each grid cell follows a Poisson distribution with intensity of the form given in (4).

A fully Bayesian hierarchical model is developed by taking appropriate prior distributions on

α, β and z(s). In particular, the prior on z(s) is an adaptive version of a Gaussian Markov


random field (GMRF). Such an adaptive GMRF incorporates local spatial structure into

the model while simultaneously accounting for spatial variation and uncertainty at a larger

spatial scale. The inference is carried out by an efficient MCMC simulation algorithm using

the sparse structure of the GMRF. The proposed method avoids the restrictive assumptions

of the specific form of relationship between the intensity function and the set of possible

covariates. Compared to a kernel smoother whose bandwidth(s) is (are) often chosen in an

ad-hoc manner, the method uses information contained in the data to adaptively determine

the amount of smoothing at the local level.

There are some other Bayesian approaches that can be used to estimate intensity of an

inhomogeneous spatial point process. Knorr-Held and Rue (2002) and Rue et al. (2004)

developed a flexible Bayesian model for disease mapping based on MCMC inference. Rue

et al. (2009) introduced a novel Bayesian inference method based on integrated Laplace

approximations (INLA), which provides (much) faster and more reliable inference than using

MCMC. Liang et al. (2009) introduced Bayesian wombling for spatial point processes. All the

methods, however, modeled unknown spatial process with either non-adaptive GMRFs or

conditionally autoregressive (CAR) models (Besag, 1974). Illian and Rue (2010) incorporated

constructed covariates into their model to capture local spatial structure and also used GMRF

to account for spatial variation at a larger scale.

The rest of the paper is organized as follows. In Sections 2 and 3, we introduce our Bayesian

hierarchical modeling method based on an adaptive GMRF prior and MCMC inference.

In Section 4, we compare our method with kernel intensity estimation through simulation

studies. In Section 5, we apply the method to four species of trees in the Barro Colorado

Island tropical forest dataset.


2. Model and priors

2.1 Observation model

Consider a regular grid In with n = n1n2 cells and denote by Cjk the grid cell in the j-th

row and k-th column. Given a spatial point pattern, we bin the observed points into the

cells. Letting yjk be the number of points binned to Cjk, we use the model

yjk|λjk ∼ Poisson (|Cjk|λjk) and log(λjk) = α +X(sjk)β + z(sjk), (5)

for j = 1, . . . , n1 and k = 1, . . . , n2, as a simple approximation to (3). The location point

sjk is the center of Cjk, X(sjk) represents p spatial covariates measured at each sjk, z(sjk)

is the realization of a spatial process at sjk and λjk can be considered the average intensity

for each grid cell. We note that in an actual application, the covariates may not have been

measured at the grid points. In such situations some preprocessing such as interpolation

would be needed.

Binning has been proposed by a number of authors in density estimation and nonparametric

regression. In particular, Brown et al. (2009) considered binning data to convert the density

estimation problem to a nonparametric regression problem in one dimension, using a wavelet

block thresholding estimator (Cai, 1999) as the regression estimator. Kass et al. (2009)

considered, for neuronal spike data, adaptive smoothing by binning the data and using

Bayesian regression splines where the number of knots of the cubic splines is determined

from data.

2.2 Spatially adaptive Gaussian Markov random fields

We model the spatial process z(sjk) in (5) using an adaptive version of the Gaussian Markov

random field (GMRF) introduced by Yue and Speckman (2010). In particular, this GMRF

is derived by approximating a two-dimensional Bayesian thin-plate spline (TPS) estimator

(e.g. Wahba, 1990; Gu, 2002). We briefly describe the adaptive GMRF in this section and


refer the reader to Rue and Held (2005) and Yue and Speckman (2010) respectively for

details on GMRF’s in general and on the movitation behind the TPS GMRF.

The adaptive GMRF of Yue and Speckman (2010) is based on the following spatial

Gaussian random walk model :

(∇2

(1,0) +∇2(0,1)

)zjk ∼ N

(0, (δγjk)

−1) , (6)

where zjk = z(sjk), and ∇2(1,0) and ∇2

(0,1) denote the second-order backward difference opera-

tors in the vertical and horizontal directions respectively, i.e., ∇2(1,0)zjk = zj+1,k−2zjk+zj−1,k

and ∇2(0,1)zjk = zj,k+1 − 2zjk + zj,k−1 for 2 6 j 6 n1 − 1 and 2 6 j 6 n2 − 1 . The positive

parameter δ is a global smoothing parameter accounting for large-scale spatial variation

while the γjk (also positive) are the adaptive smoothing parameters that capture the local

structure of the process z(s). The use of γjk is important for estimating the intensity function

from an inhomogeneous point pattern. Heuristically, we need less smoothing (small γjk) at

dense sub-regions and relatively more smoothing (large γjk) where very few or no points are

observed. Standard smoothing techniques (e.g., kernel smoother with fixed width) involve

a trade-off between increasing detectability for intensity and retaining information on the

spatial extent of the areas. Using adaptive γjk reduces such loss of information. The need of

adaptive smoothing for intensity estimation was also demonstrated in Diggle et al. (2005)

and Illian and Rue (2010). Note that setting γjk ≡ 1 makes (6) a non-adaptive GMRF

on lattice, which yields a Bayesian solution for thin-plate splines (see Rue and Held, 2005,

section 3.4.2).

Yue and Speckman (2010) showed that model (6) together with certain edge correction

terms define an adaptive GMRF with matrix form given by

π(z | γ, δ) ∝ δ(n−1)/2|Aγ|1/2+ exp(− δ

2zTAγz

), (7)

where z = (z11, z21, . . . , zn1,n2)′, γ = (γ21, γ22, . . . , γn1,n2)

′, Aγ = BTdiag(γ)B is an n × n

structure matrix of rank n − 1, B is an (n − 1) × n full rank sparse matrix, and diag(γ)


is a diagonal matrix of γ. The quantity |Aγ|+ is the product of the non-zero eigenvalues

from Aγ. Yue et al. (2008) and Yue and Speckman (2010) proved that |Aγ|+ = |BBT |∏

` γ`

which is easily computed. The prior (7) is an intrinsic GMRF because its density is improper

Gaussian (Aγ is singular) and it has an attractive Markov property makingAγ highly sparse.

See Yue and Speckman (2010) for the detailed specification of Aγ.

Additional priors need to be specified for γjk in (7). Brezger et al. (2007) suggested using iid

gamma priors γjk ∼ Gamma(ν/2, ν/2). The marginal prior distribution of the increment in

(6) turns out to be a heavy-tailed Student t distribution with ν degrees of freedom. The case

ν = 1 (i.e. a Cauchy distribution) is of special interest as a prior for robust nonparametric

regression (Carter and Kohn, 1996).

It is intuitive to expect γjk to be spatially correlated. Accordingly, Yue and Speckman

(2010) and Yue et al. (2009) used a second hierarchy with a first order spatial GMRF model

for log(γjk). They showed that this two-stage GMRF prior was quite flexible for estimating

functions with differing curvature and had successful applications in spatial modeling of

meteorological data and functional magnetic resonance imaging data.

From our investigations, we find that the iid gamma prior approach has a better ability to

capture sharp changes in the underlying spatial process compared to the two-stage GMRF

approach. Such changes often occur in inhomogeneous point patterns. Furthermore, it is

computationally much less demanding. We therefore decided to use iid gamma priors for γjk.

2.3 Other priors

Fully Bayesian inference requires a hyperprior on the precision parameter δ in (6). For

conjugacy, we choose a weakly informative gamma prior, i.e., δ ∼ Gamma(ε, ε), where ε is

some small value, say 0.0001. Other priors, such as the Pareto prior (Yue and Speckman,

2010) and half-Cauchy prior (Gelman, 2006), are also good options. Since the posterior

inference was found robust to the choice of the prior on δ, we use the diffuse gamma prior for


computational simplicity. Since the null space of Aγ is spanned by 1 = (1, . . . , 1)′, we remove

intercept α from model (5) for identifiability. It is equivalent to taking an implicit constant

prior on α. For β, we assume a highly dispersed Gaussian prior π(β) ∼ Np(0, κ−1Ip), where

κ is a small constant, say 0.0001. Such prior specification has been commonly used for the

Bayesian inference for generalized linear mixed models (see, e.g., Fahrmeir and Lang, 2001).

3. Posterior inference via MCMC

Based on the likelihood and prior distributions specified in Section 2, the joint posterior

distribution of our Bayesian hierarchical model is

π(β, z,γ, δ | y) ∝ L(y | β, z) π(z | γ, δ) π(γ) π(δ)π(β). (8)

Although the GMRF prior on z is improper, the posterior distribution in (8) is proper (see

Sun et al., 2001; Fahrmeir and Kneib, 2009). In order to derive an efficient Gibbs sampler,

we let η = Xβ+ z and obtain the joint posterior distribution of (β,η,γ, δ) proportional to

δ(n−1)/2+ε−1 exp(− δ

2(η −Xβ)′Aγ(η −Xβ)− κ

2β′β − εδ

)× exp

(∑j,k

yjkηjk − |Cjk| exp(ηjk)) n−1∏`=1

γν/2−1` exp

(−ν

2γ`

). (9)

We sample β from the full conditionalNp(µ,Σ), where µ = δΣX ′Aγη and Σ = (δX ′AγX+

κIp)−1. The full conditionals of δ and γ` are gamma distributions that can be easily derived.

However, the full conditional of η does not correspond to any regular distribution. We thus

employ the Metropolis-Hastings algorithm to update η as described below.

3.1 Block-move Metropolis-Hastings sampling algorithm

From (9), the full conditional of η = (η11, η21, . . . , ηn1,n2)′ is proportional to

exp(− δ

2(η′Aγη − 2θ′X ′Aγη) +

∑j,k

yjkηjk − |Cjk| exp(ηjk)). (10)

Knorr-Held and Rue (2002) and Rue and Held (2005) suggest updating η as a block using the

Metropolis-Hastings algorithm. The proposal distribution is a GMRF approximation of (10).


One can efficiently evaluate the acceptance rate using the sparse structure of the GMRF.

However, the acceptance rate can be low, especially when η is highly dimensional. Rue

et al. (2004) introduced a more flexible non-Gaussian approximation method for sampling

η, where the accuracy can be tuned by intuitive parameters to nearly any desired precision.

This method is, however, much more computationally intensive.

To improve the acceptance rate, we first partition η into several blocks and then update

each block conditional on the other blocks. More specifically, letting φk = (η1k, . . . , ηn1,k)′

for k = 1, . . . , n2, the conditional prior distribution of φk is then

π(φk|φ−k,γ,θ, δ) ∝ δn1/2|Ak|1/2 exp{− δ

2(φk − µk)TAk(φk − µk)

}, (11)

where φ−k is the vector of ηjk not included in φk, Ak is the submatrix of Aγ corresponding

to φk, and µk is the mean vector depending on φ−k, γ and θ. Due to the sparsity of Ak,

the vector µk can be efficiently computed. See Knorr-Held and Richardson (2003) and Yue

and Speckman (2010) for a detailed specification of the prior distribution in (11).

From (9) and (10), the full conditional of φk is proportional to

π(φk | φ−k,y) ∝ exp{− δ

2(φk − µk)TAk(φk − µk) +

n1∑j=1

yjkηjk − |Cjk| exp(ηjk)}, (12)

We now use a second-order Taylor expansion of exp(ηjk) around φmk , the mode of the full

conditional (12), to construct a suitable approximation of (12),

π(φk | φ−k,y) ∝ exp{− δ

2φTkAkφk + δµTkAkφk +

n1∑j=1

(djηjk −

1

2cjη

2jk

)}∝ exp

{− 1

2φTk (δAk + diag(c))φk + (δAkµk + d)T φk

}, (13)

where cj and dj are easily derived constants. To locate mode φmk , one may use the well-known

(multivariate) Newton-Raphson method. It is straightforward to check that (13) is a GMRF

with distribution

N(

(δAk + diag(c))−1 (δAkµk + d) , (δAk + diag(c))−1),

and will be used as the proposal distribution. Using the Metropolis-Hastings algorithm, for


k = 1, . . . , n2 we draw a candidate φ∗k from π(φk|φ−k,y) and then accept it with probability

α = min

(1,π(φ∗k|φ−k,y) π(φk|φ−k,y)

π(φk|φ−k,y) π(φ∗k|φ−k,y)

). (14)

An important feature of (13) is that the GMRF approximation inherits the Markov prop-

erty of the prior on φk, which makes the MCMC simulation efficient. Furthermore, such an

approximation is fairly accurate and yields acceptance rates above 0.9 for all the examples

that we present in the paper.

4. Simulations and comparisons with kernel methods

We performed two simulated examples to compare the adaptive GMRF method for intensity

estimation with both the fixed and adaptive bandwidth versions of the kernel estimator.

Although we included covariates in our model in Section 2, for simplicity, we do not use any

here. In the first example, the point processes are simulated by assuming the true intensity

to be a (smooth) two-dimensional unimodal function. In the second example, we simulate

realizations of inhomogeneous Poisson processes with log-intensities generated from Gaussian

processes. For fixed bandwidths, we used the cross-validation likelihood method of Stoyan

et al. (1995) and the M(h) minimization method of Diggle (2003) to select bandwidths.

The implementation of the adaptive kernel method follows Diggle et al. (2005), except that

instead of estimating intensities at the observed points, we estimate them at grid locations.

The initial bandwidth h0 was chosen to be the bandwidth used in the fixed bandwidth

kernel methods. We use the integrated squared error (ISE) between the true and estimated

intensities as a performance criterion. Since the results depend on the bandwidth selectors,

we only report the minimum ISE achieved by kernel methods.

4.1 Unimodal intensity function

We first considered a unimodal intensity function on a 10×10 window centered at the origin:

λ(s) = 9 + 27 exp{−2||s||2

}. (15)


We simulated 100 inhomogeneous Poisson spatial point patterns using the rpoispp function

in the spatstat R package. We also used the inhomogeneous Neyman-Scott process (INSP)

model introduced by Waagepetersen (2007), simulating 100 realizations using the intensity

function (15). These are generated by first simulating a stationary Neyman-Scott realization

and then thinning the sample using retension probabilities that depend on λ. We used the

Thomas process, a Neyman-Scott process where the offspring points are distributed about

the parent points according to a bivariate Gaussian density. We chose the constant parent

intensity, the mean number of offspring points per parent and the standard deviation of the

Gaussian density to be 10, 4 and 0.5 respectively. The functions in spatstat we used are

rThomas and rthin. These point patterns have correlations and so are not Poisson processes.

It allows us to examine the performance of the adaptive GMRF method when the underlying

Poisson assumption is not valid. Both the Poisson and INSP point processes have roughly

1000 points in each data set. The intensities are estimated on 100×100 regular grids for the

(non-adaptive) kernel method and on 30×30 grids for adaptive methods.

We evaluated ISEs on three different grid dimensions: 30×30, 50×50 and 100×100. The

results are summarized in Table 1. Compared to the kernel methods, the adaptive GMRF

model works much better for Poisson point patterns. The Poisson assumption shows in the

results for the INSP point patterns, where the ISEs are slightly larger than for the kernel

methods. It is also interesting to see that the adaptive kernel estimator slightly outperforms

the fixed bandwidth kernel for Poisson processes but performs similarly for INSP.

[Table 1 about here.]

4.2 Stationary Gaussian processes

Following Diggle et al. (2005), we also simulated realisations of inhomogeneous Poisson

processes whose intensities are generated from λ(x) = exp {S(x)}, where S(x) is a stationary

Gaussian process with covariance function Cov {S(x), S(x− u)} = σ2 exp(−u/φ). For each


comparison, we simulated 100 samples, each consisting of 1000 points on a 89×89 region.

From each simulated sample we computed the (minimum) integrated squared errors ISEFK ,

ISEAK and ISEAG achieved by the fixed bandwidth kernel, adaptive bandwidth kernel,

and adaptive GMRF estimators respectively. All the ISEs are evaluated on 100×100 regular

grids. To compare the performance of the three estimators it is more informative to look at

the ratios of the ISE’s: RFK/AG = ISEFK/ISEAG and RAK/AG = ISEAK/ISEAG.

Table 2 summarizes the results for each pair of the model parameters (σ2, φ). As we can see,

the adaptive GMRF estimator outperforms both kernel estimators in almost every situation.

Also, the adaptive GMRF model seems to work better for φ = 5, 7 than φ = 3. This is

due to the GMRF prior that we constructed, which assumes second-order smoothness and is

more suitable for modeling Gaussian processes with large φ (i.e. stronger correlation). This

can also be seen from the first simulation example, where the true intensity function is a

smooth unimodal function. Finally, note that the superiority of using adaptive methods is

more pronounced at larger values of σ2, consistent with the fact that larger values of σ2 are

associated with more pronounced spatial heterogenity in the resulting point patterns. This

agrees with the findings in Diggle et al. (2005).

As highlighted by a referee, the performance should not be as good for, say, a Thomas

Neyman-Scott model, since it violates the assumptions of model (5). Briefly, we expect

better performance with weaker spatial correlation (i.e. closer to a Poisson process) and vice

versa. A comprehensive study would require consideration of a wide range of point process

models and model parameters, which is beyond the scope of this paper.

[Table 2 about here.]

4.3 Some remarks

The adaptive GMRF framework allows for reliable and efficient MCMC simulation. The

Markov chains have quick convergence: 15,000 MCMC iterations with a burn-in of 5,000


proved to be sufficient for all the simulated examples. Due to the block-move sampling

method described in Section 3, the MCMC computation is fast considering the large number

of parameters to be estimated in the model. With n = 402, 602, 802 and 1002, our FORTRAN

program took 114, 255, 510 and 837 seconds, respectively, to estimate intensities from a

simulated data set of 10,000 points. The computational times increase roughly at order n.

All the computations were carried out on a Mac laptop with a 2.13 GHz Intel Core 2 Duo

processor.

Since the adaptive GMRF model is based on binning of the data, its performance depends

on the choice of grid size. This is similar to bandwidth choice with kernel methods. For our

simulation studies, we used grids of size 30×30, 40×40 and 50×50. The results in Table 2 are

for the best performing grid size, specifically 30×30 for σ2=1, 40×40 for σ2=3 and 50×50

for σ2=7. Choice of grid size does not appear to be influenced by φ, at least, not for the

values we considered. Note that for the kernel method with fixed bandwidth, we used optimal

bandwidths selected using cross-validation (Stoyan et al., 1995) and the M(h) miminization

method (Diggle, 2003).

We believe the advantage of the adaptive GMRF estimator is the use of data information

to choose the amount of smoothing. In contrast, the kernel smoothing methods depend

on bandwidth selectors that have not been well studied in the context of spatial point

analyses. In our simulations, we found that the two selectors we used, the cross-validation

likelihood method and the M(h) minimization method, often selected significantly different

bandwidths. Stoyan et al. (1995) indicates that the likelihood cross-validation method has

a tendency to return small bandwidths. The quantity M(h) depends on the K function,

a second-order statistic that will have to be estimated in practice. Estimates of the K

function are known to be rather variable. Moreover, the estimation using adaptive kernel


method is sensitive to the initial bandwidth h0. Therefore, the kernel estimators rarely yielded

consistent performance in our simulation studies.

5. Application to rainforest data

We here use the dataset from a census of a tropical forest in a 1000 by 500 meter plot in

Barro Colorado Island in the Panama. See Hubbell and Foster (1983), Condit et al. (1996)

and Condit (1998). The full dataset contains the locations of more than 300 species of trees,

many of these numbering thousands of trees. We consider three particular species of trees:

the Acalypha diversifolia (Acaldi, 528 trees), Capparis frondosa (Cappfr, 3299 trees) and

Lonchocarpus heptaphyllus (Loncla, 836 trees) species. Measurements of the altitude, slope,

various soil minerals, and soil pH are available on a grid of locations. Figure 1 shows plots of

Potassium and Nitrogen levels in the soil as well as altitude and slope over the observation

region.

We obtained intensity estimates for these three species of trees using the parametric

model of Waagepetersen and Guan (2009), the fixed and adaptive kernel methods, and the

semiparametric model based on the adaptive GMRF. Figures 2, 3 and 4 show our results for

the Acaldi, Cappfr and Loncla species respectively. Specifically, these figures contain plots

of the actual data, the parametric, adaptive kernel and semi-parametric adaptive GMRF

estimates (we exclude the plots of the fixed kernel estimates in the interest of conserving

space but they are available upon request). The intensity plots are on the square-root scale

to show the spatial structures more clearly.

Waagepetersen and Guan (2009) found altitude and potassium to be significant for both

Acaldi and Cappfr, and nitrogen and phosphorus for Loncla. However, the parametric in-

tensity estimates tend to contain the features that are visible in the plot of covariate values

but are not apparent from the actual point data. For example, in Figure 2(b) for Acaldi,

the peak near the lower right corner does not correspond to an abundance of trees, while in


Figure 3(a) for Cappfr, the dense curved area from the lower left corner to location (200,

200) has a very low estimated intensity as shown in Figure 3(b). Also notice the similarity

between the Acaldi and Cappfr plots. In Figure 4(b) for Loncla, where the coefficient for

the nitrogen covariate is negative, we find that the pattern appears to roughly follow the

inverse of the nitrogen pattern. See, for example, the two intensity peaks on the left edge

and the low intensity area on the upper right. We also found that the non-adaptive kernel

estimator (not shown) appears to over-smooth dense regions and under-smooth areas with

few data points. Conversely, the adaptive kernel estimator seems to provide local smoothing

for intensity estimation, but at the cost of losing global smoothness, resulting in a very

granular appearance.

Using the semiparametric adaptive GMRF model, the significant covariates at the 5% level

were potassium for Acaldi, slope for Cappfr, and altitude for Loncla. For Acaldi, the estimate

of the regression parameter for potassium was 0.0045 (0.0002, 0.0089) (approximate 95%

Bayesian credible interval in parenthesis). For Cappfr, the estimate of slope was -2.51 (-4.45,

-0.62), and for Loncla the estimate of elevation was 0.094 (0.043, 0.148). Note that the result

on potassium for Acaldi is consistent with Waagepetersen and Guan (2009). We also found

that our Bayesian point estimates are close to Waagepetersen and Guan’s estimates, but with

more conservative 95% intervals. The reason may be that our estimates take into account

the uncertainty from estimating the spatial features that are not due to the covariates.

Compared to the kernel estimates, the proposed method, in our opinion, achieves a much

better balance between smoothing the image and retaining the necessary details on spatial

extents. Due to its data-driven nature, our model is able to apply adaptive smoothing when

the point process has increasingly local variation and provides general smoothing where the

process is more homogeneous. In particular, notice that although the intensity estimates

contain some spatial features of the underlying covariate structure, these features do not


overwhelm the information contained in the data. For example, for the Cappfr species, the

dense curved structure in the lower left portion of the observation region is retained. As

suggested by a referee, we checked the plots of estimated γjk for all species, which show low

estimates for clustered points and high estimates for more homogeneous points. Such plots

indicate that it is necessary to consider the spatially adaptive approach for the rainforest

data.

[Figure 1 about here.]




6. Discussion

In this work, we introduced a data-driven adaptive method to obtain estimates of the

intensity function from a spatial point pattern. This method involves binning the data and

using adaptive Gaussian Markov random fields to model the underlying spatial process.

The amount of smoothing at the local level is controlled by parameters in the prior, and

these parameters are estimated using the data in the neighboring areas. The simulation

study and rainforest data examples show that this method can achieve a nice balance

between smoothing and retension of features of the intensity function that is difficult to

attain with kernel estimation. Our method is versatile and works with or without covariates.

The nonparametric version is useful for exploratory analyses and mapping purposes, or if

information about covariates or form of the model is lacking. This is particularly important

when the process producing the point pattern is complicated and/or not well understood.

If appropriate covariates are available, inference regarding the dependence of the intensity

function on these covariates can be obtained using the semiparametric version of our method.


To implement our method, one must first select an appropriate grid dimension for binning.

Currently, this is a subjective decision to be made by the investigator. We examined the

effects of choice of grid size in our simulation study and provided some initial guidelines. In

particular, in the simulation study with the stationary Gaussian processes (Section 4.2), we

found that performance is better with small cells (larger grid size) when σ is larger, i.e. when

there is more heterogeneity in the point pattern. Thus we recommend a grid size so that each

cell is roughly homogeneous. Also, a finer grid can be used if the data size is larger. We are

planning a more comprehensive study to explore this topic further, especially in the context

of differing correlation strengths. A theoretical study of this and of issues of consistency are

also planned.

Another area of further investigation is the use of these intensity estimates in estimates of

the inhomogeneous K function (Baddeley et al., 2000). The correlation exhibited in point

patterns is often of great interest. However, in the inhomogeneous case this requires good

estimates of the intensity function. We will study how estimates of the inhomogeneous K

function might be improved by the use of our intensity estimates.

Acknowledgements

Yu Yue’s research is supported by PSC-CUNY research award #60147-39 40.

References

Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial pointpatterns. Australian and New Zealand Journal of Statistics 42, 283–322.

Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000). Non- and semi-parametricestimation of interaction in inhomogeneous point patterns. Statistica Neerlandica 54,329–350.

Baddeley, A. J. and Turner, R. (2005). Spatstat: an R package for analyzing spatial pointpatterns. Journal of Statistical Software 12, 1–42.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (withdiscussion). Journal of the Royal Statistical Society, Series B: Methodological 36, 192–236.


Brezger, A., Fahrmeir, L., and Hennerfeind, A. (2007). Adaptive Gaussian Markov randomfields with applications in human brain mapping. Journal of the Royal Statistical Society:Series C (Applied Statistics) 56, 327–345.

Brown, L., Cai, T., Zhang, R., Zhao, L., and Zhou, H. (2009). The root-unroot algorithmfor density estimation as implemented via wavelet block thresholding. to appear inProbability Theory and Related Fields .

Cai, T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequalityapproach. Annals of Statistics 27, 898–924.

Carter, C. K. and Kohn, R. (1996). Markov chain Monte Carlo in conditionally Gaussianstate space models. Biometrika 83, 589–601.

Condit, R. (1998). Tropical forest census plots. Springer-Verlag and R. G. Landes Company,Berlin, Germany and Georgetown, Texas.

Condit, R., Hubbell, S. P., and Foster, R. B. (1996). Changes in tree species abundance ina neotropical forest: impact of climate change. Journal of Tropical Ecology 12, 231–256.

Diggle, P., Rowlingson, B., and Su, T. (2005). Point process methodology for on0line spatio-temporal disease surveillance. Environmetrics 16, 423–434.

Diggle, P. J. (1985). A kernel method for smoothing point process data. Applied Statistics34, 138–147.

Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. Arnold, London, 2ndedition.

Diggle, P. J., Morris, S. E., and Wakefield, J. C. (2000). Point-source modelling using matchedcase-control data. Biostatistics 1, 89–105.

Fahrmeir, L. and Kneib, T. (2009). Propriety of posteriors in structured additive regressionmodels: Theory and empirical evidence. Journal of Statistical Planning and Inference139, 843–859.

Fahrmeir, L. and Lang, S. (2001). Bayesian inference for generalized additive mixed modelsbased on Markov random field priors. Journal of the Royal Statistical Society, Series C:Applied Statistics 50, 201–220.

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models.Bayesian Analysis 1, 515–533.

Gu, C. (2002). Smoothing Spline ANOVA Models. Springer-Verlag Inc, New York.Guan, Y. (2008). On consistent nonparametric intensity estimation for inhomogeneous

spatial point processes. Journal of the American Statistical Association 103, 1238–1247.Guan, Y. (2009). Fast block variance estimation procedure for inhomogeneous spatial point

processes. Biometrika 96, 213–220.Guan, Y. and Loh, J. M. (2007). A thinned block bootstrap procedure for modeling

inhomogeneous spatial point patterns. Journal of the American Statistical Association102, 1377–1386.

Hubbell, S. P. and Foster, R. B. (1983). Diversity of canopy trees in a neotropical forestand implications for conservation. In Sutton, S. L., Whitmore, T. C., and Chadwick,A. C., editors, Tropical Rain Forest: Ecology and Management, pages 25–41. BlackwellScientific.

Illian, J. B. and Rue, H. (2010). A toolbox for fitting complex spatial point process modelsusing integrated Laplace transformation (INLA). Technical Report 6, Department ofMathematical Sciences, Norwegian University of Science and Technology.


Kass, R., Ventura, V., and Cai, C. (2009). Statistical smoothing of neuronal data. Biometrics65, 1243–1253.

Knorr-Held, L. and Richardson, S. (2003). A hierarchical model for space-time surveillancedata on meningococcal disease incidence. Journal of the Royal Statistical Society, SeriesC: Applied Statistics 52, 169–183.

Knorr-Held, L. and Rue, H. (2002). On block updating in Markov random field models fordisease mapping. Scandinavian Journal of Statistics 29, 597–614.

Liang, S., Banerjee, S., and Carlin, B. P. (2009). Bayesian wombling for spatial pointprocesses. Biometrics 65, 1243–1253.

Møller, J. and Waagepetersen, R. P. (2003). Statistical Inference and Simulation for SpatialPoint Processes. Chapman and Hall, New York.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications,volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall,London.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latentGaussian models using integrated nested Laplace approximations (with discussion).Journal of the Royal Statistical Society, Series B 71, 319–392.

Rue, H., Steinsland, I., and Erland, S. (2004). Approximating hidden Gaussain Markovrandom fields. Journal of the Royal Statistical Society, Series B: Statistical Methodology66, 877–892.

Schoenberg, F. P. (2003). Multi-dimensional residual analysis of point process models forearthquake occurrences. Journal of American Statistical Association 98, 789—795.

Schoenberg, F. P. (2004). Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference 128, 79–93.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press,New York.

Stein, M. L., Quashnock, J. M., and Loh, J. M. (2000). Estimating the K function of a pointprocess with an application to cosmology. Annals of Statistics 28, 1503–1532.

Stoyan, D., Kendall, W. S., and Mecke, J. (1995). Stochastic Geometry and Its Applications,2nd edition. John Wiley, New York.

Sun, D., Tsutakawa, R. K., and He, Z. (2001). Propriety of posteriors with improper priorsin hierarchical linear mixed models. Statistica Sinica 11, 77–95.

Waagepetersen, R. (2007). An estimating function approach to inference for inhomogeneousNeyman-Scott processes. Biometrics 63, 252–258.

Waagepetersen, R. and Guan, Y. (2009). Two-step estimation for inhomogeneous spatialpoint processes. Journal of the Royal Statistical Association Series B 71, 685–702.

Wahba, G. (1990). Spline Models for Observational Data. SIAM [Society for Industrial andApplied Mathematics], Philadelphia.

Yue, Y., Loh, J. M., and Lindquist, M. A. (2009). Adaptive spatial smoothing of fMRIimages. Statistics and Its Interface (to appear).

Yue, Y. and Speckman, P. L. (2010). Nonstationary spatial Gaussian Markov random fields.Journal of Computational and Graphical Statistics 19, 96–116.

Yue, Y., Speckman, P. L., and Sun, D. (2008). Fully Bayesian adaptive spline smoothing.Manuscript (submitted), University of Missouri-Columbia, Department of Statistics.


0 200 400 600 800 1000

0100

200

300

400

500

100

150

200

250

300

350

(a) Potassium

0 200 400 600 800 1000

0100

200

300

400

500

-0.05

0.00

0.05

0.10

0.15

0.20

(b) Slope

0 200 400 600 800 1000

0100

200

300

400

500

-10

0

10

20

30

40

(c) Nitrogen

0 200 400 600 800 10000

100

200

300

400

500

125130135140145150155

(d) Altitude

Figure 1. Plots of the measurements of soil Potassium and mineralized Nitrogen (in unitsof mg/kg of soil), as well as the altitude (in meters) and slope in the observation region inBarro Colorado Island.


0 200 400 600 800 1000

0100200300400500

(a) Acaldi

0 200 400 600 800 1000

0100

200

300

400

500

0.025

0.030

0.035

0.040

0.045

(b) Parametric estimator

0 200 400 600 800 1000

0100

200

300

400

500

0.02

0.04

0.06

0.08

0.10

0.12

0.14

(c) Adaptive kernel estimator

0 200 400 600 800 10000

100

200

300

400

500

0.02

0.04

0.06

0.08

0.10

0.12

(d) Adaptive GMRF estimator

Figure 2. Plots of Acalypha diversifolia trees: (a) tree locations; (b) intensity estimatesusing the parametric method; (c) intensity estimates using the adaptive kernel method; (d)intensity estimates using the adaptive GMRF method. (The estimates are on a square-rootscale.)


0 200 400 600 800 1000

0100200300400500

(a) Cappfr

0 200 400 600 800 1000

0100

200

300

400

500

0.06

0.07

0.08

0.09

0.10

0.11


0 200 400 600 800 1000

0100

200

300

400

500

0.05

0.10

0.15

0.20

0.25

0.30


0 200 400 600 800 10000

100

200

300

400

500

0.05

0.10

0.15


Figure 3. Plots of Capparis frondosa trees: (a) tree locations; (b) intensity estimatesusing the parametric method; (c) intensity estimates using the adaptive kernel method; (d)intensity estimates using the adaptive GMRF method. (The estimates are on a square-rootscale.)


0 200 400 600 800 1000

0100200300400500

(a) Loncla

0 200 400 600 800 1000

0100

200

300

400

500

0.025

0.030

0.035

0.040

0.045

0.050

0.055


0 200 400 600 800 1000

0100

200

300

400

500

0.020.040.060.080.100.120.140.16


0 200 400 600 800 10000

100

200

300

400

500

0.02

0.04

0.06

0.08

0.10

0.12


Figure 4. Plots of Lonchocarpus heptaphyllus trees: (a) tree locations; (b) intensityestimates using the parametric method; (c) intensity estimates using the adaptive kernelmethod; (d) intensity estimates using the adaptive GMRF method. (The estimates are on asquare-root scale.)


Table 1The integrated squared errors obtained with different methods of intensity estimation, averaged over 100 simulated

realizations from the inhomogeneous Poisson (POIS) and inhomogeneous Neymann-Scot (INSP) models, withunimodal intensity function in (15).

POIS INSPMethod n = 302 n = 502 n = 1002 n = 302 n = 502 n = 1002

Fixed Kernel 512.71 512.78 512.70 706.64 706.86 706.55Adapt Kernel 502.99 503.25 503.36 705.27 705.53 705.69Adapt GMRF 422.53 437.02 436.30 720.21 726.83 743.59


Table 2Summary results from simulation study to compare performance of the adaptive GMRF, fixed and adaptive

band-width kernel estimators, for different values of the Gaussian process parameters σ2 and φ. The values are theratios of minimum integrated squared errors achieved by the kernel and adaptive GMRF estimators.

σ2 = 1 σ2 = 3 σ2 = 5 σ2 = 7RFK/AG RAK/AG RFK/AG RAK/AG RFK/AG RAK/AG RFK/AG RAK/AG

φ = 3 .859 1.048 1.148 1.016 1.097 1.012 1.188 1.069φ = 5 .949 1.199 1.496 1.659 1.696 1.787 1.493 1.116φ = 7 .851 1.162 1.311 1.605 1.427 1.658 1.164 0.963

bayesian semiparametric intensity estimation for ...meng/papers/biometrics.yueloh.2011.pdf · liang...

Documents