bayesian semiparametric intensity estimation for ...meng/papers/biometrics.yueloh.2011.pdf · liang...
TRANSCRIPT
Biometrics 000, 000–000 DOI: 000
000 0000
Bayesian Semiparametric Intensity Estimation for Inhomogeneous Spatial
Point Processes
Yu Ryan Yue∗
Baruch College, City University of New York, New York, NY 10010, USA
*email: [email protected]
and
Ji Meng Loh∗
AT&T Labs-Research, Florham Park, NJ 07932, USA
*email: [email protected]
Summary: In this work we propose a fully Bayesian semiparametric method to estimate the
intensity of an inhomogeneous spatial point process. The basic idea is to first convert intensity
estimation into a Poisson regression setting via binning data points on a regular grid, and then
model log intensity semiparametrically using an adaptive version of Gaussian Markov random fields
to smooth the corresponding counts. The inference is carried by an efficient MCMC simulation
algorithm. Compared to existing methods for intensity estimation, e.g., parametric modeling and
kernel smoothing, the proposed estimator not only provides inference regarding the dependence of
the intensity function on possible covariates, but also uses information from the data to adaptively
determine the amount of smoothing at the local level. The effectiveness of using our method is
demonstrated through simulation studies and an application to a rainforest dataset.
Key words: Adaptive spatial smoothing; Gaussian Markov random fields; Gibbs sampling; In-
tensity estimation; Spatial point process.
1
Bayesian Semiparametric Intensity Estimation 1
1. Introduction
Spatial point data arise in a wide variety of applications including ecology, seismology,
astronomy and epidemiology (e.g. Waagepetersen, 2007; Schoenberg, 2003; Stein et al.,
2000; Diggle et al., 2000, respectively). Diggle (2003) and Stoyan et al. (1995) both contain
comprehensive introductions to the analysis of spatial point patterns. While spatial point
analyses range from relatively simple exploratory methods to complex Markov Chain Monte
Carlo (MCMC) algorithms for model fitting (Møller and Waagepetersen, 2003), in any spatial
point analysis, the first main interest is often the intensity function λ(s), s ∈ R2, defined by
λ(s) = lim|ds|→0
(E[N(ds)]
|ds|
), (1)
where ds represents an infinitesimal region around s, |ds| its area and N(ds) the number of
points in ds. The quantity λ(s)|ds| can be thought of as the probability of observing one data
point in the region ds. In the stationary case, λ(s) ≡ λ a constant and, given a spatial point
pattern with n points observed in an observational region D ⊂ R2, can be easily estimated
using λ = n/|D|. In the nonstationary or inhomogeneous case, however, λ(s) is not constant
but a function of s.
Motivated by theoretical considerations of consistency, much of recent work has considered
parametric estimation of the intensity, i.e. modeling the intensity as a function of one or more
measured covariates, for an inhomogeneous spatial point process. Specifically, suppose X(s),
s ∈ D, is a row vector of p spatial covariates. The point data are then modeled as a realization
of a spatial point process with intensity function given by
λ(s;α,β) = exp {α +X(s)β} . (2)
This model can be generalized by replacing the exponential function by another positive
strictly increasing differentiable function. Assuming that the point process is an inhomoge-
neous Poisson with intensity function given by (2), the log-likelihood for a point pattern
2 Biometrics, 000 0000
T = {t1, . . . , tn} is
L(α,β;T ) =1
|D|
n∑i=1
log λ(ti;α,β)− 1
|D|
∫D
λ(s;α,β) ds. (3)
The quantity L(α,β;X) can be easily maximized using regular software for fitting generalized
linear models (Baddeley and Turner, 2000) to obtain estimates. The spatstat R package
(Baddeley and Turner, 2005) contains the function ppm that can be used to fit such model.
Schoenberg (2004) showed that under some mild conditions on the spatial point process,
the estimates can be consistent even if the process is not Poisson. Guan and Loh (2007),
Waagepetersen and Guan (2009) and Guan (2009) used this framework in their work.
Many important theoretical results for intensity function estimation using parametric
models have been derived under some assumptions. One crucial assumption is that the model
is correct, i.e. the correct covariates X(s) are used, and that the choice of the relationship
between the intensity function and the covariates, usually the form shown in (2), is correct.
However, spatial point patterns are often the results of complex interactions between physical,
environmental and ecological processes. For example, a census of tree locations in a tropical
forest includes many generations of trees whose growth and decay are affected by soil, rainfall,
water runoff, etc. At best, this makes the choice of model very challenging. If the relevant
covariates were not measured, an appropriate model may not even be possible.
When there is no covariate involved or it is too difficult to incorporate covariate in-
formation, nonparametric estimation of the intensity function is often used. One way to
estimate λ(s) nonparametrically is with kernel estimation (Diggle, 1985), usually with a single
bandwidth h chosen by eye. Diggle (2003) gave an ad-hoc rule, h = 0.68n−0.2 for the quadratic
Epanecnikov kernel when the observation window is a square. He also provided a more
sophisticated method where h can be obtained by minimizing an estimate M(h) of the mean
squared error as a function of h. Stoyan et al. (1995) suggested a likelihood cross-validation
method for selecting the bandwidth, which is an extension of the method in Silverman
Bayesian Semiparametric Intensity Estimation 3
(1986) to the context of spatial point patterns. Since the same amount smoothing is applied
to the whole data area, the ordinary kernel smoother often fails to capture the local spatial
variation of the intensity function. Diggle et al. (2005) therefore used various bandwidths
hi to perform adaptive smoothing with the kernel estimator. Guan (2008) introduced an
interesting version of kernel estimation, with the distance measure in the kernel based on
covariate distance instead of geographic distance. This allowed consistent estimation of the
intensity by pooling information over widely separated points that have similar covariate
values. He addressed bandwidth selection and dimension reduction using a sliced inverse
regression technique. While it is nonparametric and does not involve a specified form of the
relationship between the intensity and the covariate(s), this method does assume that the
correct covariate(s) are used. Furthermore, it is not entirely clear how covariate distances
should be computed if two or more covariates are involved. Guan (2008) used the Euclidean
norm on the standardized covariates. The standardization reduces the issues involved with
using the Euclidean norm. However, with some covariates, e.g. altitude and soil pH, it is
difficult to interpret the meaning of this distance, nor is it clear whether the Euclidean norm
is the correct norm to use.
To combine the strengths of both parametric and nonparametric methods, we propose
modeling the intensity function in a semiparametric way, i.e.,
λ(s;α,β, z) = exp {α +X(s)β + z(s)} , (4)
where z(s) is a stochastic process accounting for spatial correlation among locations that
cannot be explained by the covariates. We convert intensity estimation into a generalized
spatial regression problem via binning the data and assume that the observed number of
points in each grid cell follows a Poisson distribution with intensity of the form given in (4).
A fully Bayesian hierarchical model is developed by taking appropriate prior distributions on
α, β and z(s). In particular, the prior on z(s) is an adaptive version of a Gaussian Markov
4 Biometrics, 000 0000
random field (GMRF). Such an adaptive GMRF incorporates local spatial structure into
the model while simultaneously accounting for spatial variation and uncertainty at a larger
spatial scale. The inference is carried out by an efficient MCMC simulation algorithm using
the sparse structure of the GMRF. The proposed method avoids the restrictive assumptions
of the specific form of relationship between the intensity function and the set of possible
covariates. Compared to a kernel smoother whose bandwidth(s) is (are) often chosen in an
ad-hoc manner, the method uses information contained in the data to adaptively determine
the amount of smoothing at the local level.
There are some other Bayesian approaches that can be used to estimate intensity of an
inhomogeneous spatial point process. Knorr-Held and Rue (2002) and Rue et al. (2004)
developed a flexible Bayesian model for disease mapping based on MCMC inference. Rue
et al. (2009) introduced a novel Bayesian inference method based on integrated Laplace
approximations (INLA), which provides (much) faster and more reliable inference than using
MCMC. Liang et al. (2009) introduced Bayesian wombling for spatial point processes. All the
methods, however, modeled unknown spatial process with either non-adaptive GMRFs or
conditionally autoregressive (CAR) models (Besag, 1974). Illian and Rue (2010) incorporated
constructed covariates into their model to capture local spatial structure and also used GMRF
to account for spatial variation at a larger scale.
The rest of the paper is organized as follows. In Sections 2 and 3, we introduce our Bayesian
hierarchical modeling method based on an adaptive GMRF prior and MCMC inference.
In Section 4, we compare our method with kernel intensity estimation through simulation
studies. In Section 5, we apply the method to four species of trees in the Barro Colorado
Island tropical forest dataset.
Bayesian Semiparametric Intensity Estimation 5
2. Model and priors
2.1 Observation model
Consider a regular grid In with n = n1n2 cells and denote by Cjk the grid cell in the j-th
row and k-th column. Given a spatial point pattern, we bin the observed points into the
cells. Letting yjk be the number of points binned to Cjk, we use the model
yjk|λjk ∼ Poisson (|Cjk|λjk) and log(λjk) = α +X(sjk)β + z(sjk), (5)
for j = 1, . . . , n1 and k = 1, . . . , n2, as a simple approximation to (3). The location point
sjk is the center of Cjk, X(sjk) represents p spatial covariates measured at each sjk, z(sjk)
is the realization of a spatial process at sjk and λjk can be considered the average intensity
for each grid cell. We note that in an actual application, the covariates may not have been
measured at the grid points. In such situations some preprocessing such as interpolation
would be needed.
Binning has been proposed by a number of authors in density estimation and nonparametric
regression. In particular, Brown et al. (2009) considered binning data to convert the density
estimation problem to a nonparametric regression problem in one dimension, using a wavelet
block thresholding estimator (Cai, 1999) as the regression estimator. Kass et al. (2009)
considered, for neuronal spike data, adaptive smoothing by binning the data and using
Bayesian regression splines where the number of knots of the cubic splines is determined
from data.
2.2 Spatially adaptive Gaussian Markov random fields
We model the spatial process z(sjk) in (5) using an adaptive version of the Gaussian Markov
random field (GMRF) introduced by Yue and Speckman (2010). In particular, this GMRF
is derived by approximating a two-dimensional Bayesian thin-plate spline (TPS) estimator
(e.g. Wahba, 1990; Gu, 2002). We briefly describe the adaptive GMRF in this section and
6 Biometrics, 000 0000
refer the reader to Rue and Held (2005) and Yue and Speckman (2010) respectively for
details on GMRF’s in general and on the movitation behind the TPS GMRF.
The adaptive GMRF of Yue and Speckman (2010) is based on the following spatial
Gaussian random walk model :
(∇2
(1,0) +∇2(0,1)
)zjk ∼ N
(0, (δγjk)
−1) , (6)
where zjk = z(sjk), and ∇2(1,0) and ∇2
(0,1) denote the second-order backward difference opera-
tors in the vertical and horizontal directions respectively, i.e., ∇2(1,0)zjk = zj+1,k−2zjk+zj−1,k
and ∇2(0,1)zjk = zj,k+1 − 2zjk + zj,k−1 for 2 6 j 6 n1 − 1 and 2 6 j 6 n2 − 1 . The positive
parameter δ is a global smoothing parameter accounting for large-scale spatial variation
while the γjk (also positive) are the adaptive smoothing parameters that capture the local
structure of the process z(s). The use of γjk is important for estimating the intensity function
from an inhomogeneous point pattern. Heuristically, we need less smoothing (small γjk) at
dense sub-regions and relatively more smoothing (large γjk) where very few or no points are
observed. Standard smoothing techniques (e.g., kernel smoother with fixed width) involve
a trade-off between increasing detectability for intensity and retaining information on the
spatial extent of the areas. Using adaptive γjk reduces such loss of information. The need of
adaptive smoothing for intensity estimation was also demonstrated in Diggle et al. (2005)
and Illian and Rue (2010). Note that setting γjk ≡ 1 makes (6) a non-adaptive GMRF
on lattice, which yields a Bayesian solution for thin-plate splines (see Rue and Held, 2005,
section 3.4.2).
Yue and Speckman (2010) showed that model (6) together with certain edge correction
terms define an adaptive GMRF with matrix form given by
π(z | γ, δ) ∝ δ(n−1)/2|Aγ|1/2+ exp(− δ
2zTAγz
), (7)
where z = (z11, z21, . . . , zn1,n2)′, γ = (γ21, γ22, . . . , γn1,n2)
′, Aγ = BTdiag(γ)B is an n × n
structure matrix of rank n − 1, B is an (n − 1) × n full rank sparse matrix, and diag(γ)
Bayesian Semiparametric Intensity Estimation 7
is a diagonal matrix of γ. The quantity |Aγ|+ is the product of the non-zero eigenvalues
from Aγ. Yue et al. (2008) and Yue and Speckman (2010) proved that |Aγ|+ = |BBT |∏
` γ`
which is easily computed. The prior (7) is an intrinsic GMRF because its density is improper
Gaussian (Aγ is singular) and it has an attractive Markov property makingAγ highly sparse.
See Yue and Speckman (2010) for the detailed specification of Aγ.
Additional priors need to be specified for γjk in (7). Brezger et al. (2007) suggested using iid
gamma priors γjk ∼ Gamma(ν/2, ν/2). The marginal prior distribution of the increment in
(6) turns out to be a heavy-tailed Student t distribution with ν degrees of freedom. The case
ν = 1 (i.e. a Cauchy distribution) is of special interest as a prior for robust nonparametric
regression (Carter and Kohn, 1996).
It is intuitive to expect γjk to be spatially correlated. Accordingly, Yue and Speckman
(2010) and Yue et al. (2009) used a second hierarchy with a first order spatial GMRF model
for log(γjk). They showed that this two-stage GMRF prior was quite flexible for estimating
functions with differing curvature and had successful applications in spatial modeling of
meteorological data and functional magnetic resonance imaging data.
From our investigations, we find that the iid gamma prior approach has a better ability to
capture sharp changes in the underlying spatial process compared to the two-stage GMRF
approach. Such changes often occur in inhomogeneous point patterns. Furthermore, it is
computationally much less demanding. We therefore decided to use iid gamma priors for γjk.
2.3 Other priors
Fully Bayesian inference requires a hyperprior on the precision parameter δ in (6). For
conjugacy, we choose a weakly informative gamma prior, i.e., δ ∼ Gamma(ε, ε), where ε is
some small value, say 0.0001. Other priors, such as the Pareto prior (Yue and Speckman,
2010) and half-Cauchy prior (Gelman, 2006), are also good options. Since the posterior
inference was found robust to the choice of the prior on δ, we use the diffuse gamma prior for
8 Biometrics, 000 0000
computational simplicity. Since the null space of Aγ is spanned by 1 = (1, . . . , 1)′, we remove
intercept α from model (5) for identifiability. It is equivalent to taking an implicit constant
prior on α. For β, we assume a highly dispersed Gaussian prior π(β) ∼ Np(0, κ−1Ip), where
κ is a small constant, say 0.0001. Such prior specification has been commonly used for the
Bayesian inference for generalized linear mixed models (see, e.g., Fahrmeir and Lang, 2001).
3. Posterior inference via MCMC
Based on the likelihood and prior distributions specified in Section 2, the joint posterior
distribution of our Bayesian hierarchical model is
π(β, z,γ, δ | y) ∝ L(y | β, z) π(z | γ, δ) π(γ) π(δ)π(β). (8)
Although the GMRF prior on z is improper, the posterior distribution in (8) is proper (see
Sun et al., 2001; Fahrmeir and Kneib, 2009). In order to derive an efficient Gibbs sampler,
we let η = Xβ+ z and obtain the joint posterior distribution of (β,η,γ, δ) proportional to
δ(n−1)/2+ε−1 exp(− δ
2(η −Xβ)′Aγ(η −Xβ)− κ
2β′β − εδ
)× exp
(∑j,k
yjkηjk − |Cjk| exp(ηjk)) n−1∏`=1
γν/2−1` exp
(−ν
2γ`
). (9)
We sample β from the full conditionalNp(µ,Σ), where µ = δΣX ′Aγη and Σ = (δX ′AγX+
κIp)−1. The full conditionals of δ and γ` are gamma distributions that can be easily derived.
However, the full conditional of η does not correspond to any regular distribution. We thus
employ the Metropolis-Hastings algorithm to update η as described below.
3.1 Block-move Metropolis-Hastings sampling algorithm
From (9), the full conditional of η = (η11, η21, . . . , ηn1,n2)′ is proportional to
exp(− δ
2(η′Aγη − 2θ′X ′Aγη) +
∑j,k
yjkηjk − |Cjk| exp(ηjk)). (10)
Knorr-Held and Rue (2002) and Rue and Held (2005) suggest updating η as a block using the
Metropolis-Hastings algorithm. The proposal distribution is a GMRF approximation of (10).
Bayesian Semiparametric Intensity Estimation 9
One can efficiently evaluate the acceptance rate using the sparse structure of the GMRF.
However, the acceptance rate can be low, especially when η is highly dimensional. Rue
et al. (2004) introduced a more flexible non-Gaussian approximation method for sampling
η, where the accuracy can be tuned by intuitive parameters to nearly any desired precision.
This method is, however, much more computationally intensive.
To improve the acceptance rate, we first partition η into several blocks and then update
each block conditional on the other blocks. More specifically, letting φk = (η1k, . . . , ηn1,k)′
for k = 1, . . . , n2, the conditional prior distribution of φk is then
π(φk|φ−k,γ,θ, δ) ∝ δn1/2|Ak|1/2 exp{− δ
2(φk − µk)TAk(φk − µk)
}, (11)
where φ−k is the vector of ηjk not included in φk, Ak is the submatrix of Aγ corresponding
to φk, and µk is the mean vector depending on φ−k, γ and θ. Due to the sparsity of Ak,
the vector µk can be efficiently computed. See Knorr-Held and Richardson (2003) and Yue
and Speckman (2010) for a detailed specification of the prior distribution in (11).
From (9) and (10), the full conditional of φk is proportional to
π(φk | φ−k,y) ∝ exp{− δ
2(φk − µk)TAk(φk − µk) +
n1∑j=1
yjkηjk − |Cjk| exp(ηjk)}, (12)
We now use a second-order Taylor expansion of exp(ηjk) around φmk , the mode of the full
conditional (12), to construct a suitable approximation of (12),
π(φk | φ−k,y) ∝ exp{− δ
2φTkAkφk + δµTkAkφk +
n1∑j=1
(djηjk −
1
2cjη
2jk
)}∝ exp
{− 1
2φTk (δAk + diag(c))φk + (δAkµk + d)T φk
}, (13)
where cj and dj are easily derived constants. To locate mode φmk , one may use the well-known
(multivariate) Newton-Raphson method. It is straightforward to check that (13) is a GMRF
with distribution
N(
(δAk + diag(c))−1 (δAkµk + d) , (δAk + diag(c))−1),
and will be used as the proposal distribution. Using the Metropolis-Hastings algorithm, for
10 Biometrics, 000 0000
k = 1, . . . , n2 we draw a candidate φ∗k from π(φk|φ−k,y) and then accept it with probability
α = min
(1,π(φ∗k|φ−k,y) π(φk|φ−k,y)
π(φk|φ−k,y) π(φ∗k|φ−k,y)
). (14)
An important feature of (13) is that the GMRF approximation inherits the Markov prop-
erty of the prior on φk, which makes the MCMC simulation efficient. Furthermore, such an
approximation is fairly accurate and yields acceptance rates above 0.9 for all the examples
that we present in the paper.
4. Simulations and comparisons with kernel methods
We performed two simulated examples to compare the adaptive GMRF method for intensity
estimation with both the fixed and adaptive bandwidth versions of the kernel estimator.
Although we included covariates in our model in Section 2, for simplicity, we do not use any
here. In the first example, the point processes are simulated by assuming the true intensity
to be a (smooth) two-dimensional unimodal function. In the second example, we simulate
realizations of inhomogeneous Poisson processes with log-intensities generated from Gaussian
processes. For fixed bandwidths, we used the cross-validation likelihood method of Stoyan
et al. (1995) and the M(h) minimization method of Diggle (2003) to select bandwidths.
The implementation of the adaptive kernel method follows Diggle et al. (2005), except that
instead of estimating intensities at the observed points, we estimate them at grid locations.
The initial bandwidth h0 was chosen to be the bandwidth used in the fixed bandwidth
kernel methods. We use the integrated squared error (ISE) between the true and estimated
intensities as a performance criterion. Since the results depend on the bandwidth selectors,
we only report the minimum ISE achieved by kernel methods.
4.1 Unimodal intensity function
We first considered a unimodal intensity function on a 10×10 window centered at the origin:
λ(s) = 9 + 27 exp{−2||s||2
}. (15)
Bayesian Semiparametric Intensity Estimation 11
We simulated 100 inhomogeneous Poisson spatial point patterns using the rpoispp function
in the spatstat R package. We also used the inhomogeneous Neyman-Scott process (INSP)
model introduced by Waagepetersen (2007), simulating 100 realizations using the intensity
function (15). These are generated by first simulating a stationary Neyman-Scott realization
and then thinning the sample using retension probabilities that depend on λ. We used the
Thomas process, a Neyman-Scott process where the offspring points are distributed about
the parent points according to a bivariate Gaussian density. We chose the constant parent
intensity, the mean number of offspring points per parent and the standard deviation of the
Gaussian density to be 10, 4 and 0.5 respectively. The functions in spatstat we used are
rThomas and rthin. These point patterns have correlations and so are not Poisson processes.
It allows us to examine the performance of the adaptive GMRF method when the underlying
Poisson assumption is not valid. Both the Poisson and INSP point processes have roughly
1000 points in each data set. The intensities are estimated on 100×100 regular grids for the
(non-adaptive) kernel method and on 30×30 grids for adaptive methods.
We evaluated ISEs on three different grid dimensions: 30×30, 50×50 and 100×100. The
results are summarized in Table 1. Compared to the kernel methods, the adaptive GMRF
model works much better for Poisson point patterns. The Poisson assumption shows in the
results for the INSP point patterns, where the ISEs are slightly larger than for the kernel
methods. It is also interesting to see that the adaptive kernel estimator slightly outperforms
the fixed bandwidth kernel for Poisson processes but performs similarly for INSP.
[Table 1 about here.]
4.2 Stationary Gaussian processes
Following Diggle et al. (2005), we also simulated realisations of inhomogeneous Poisson
processes whose intensities are generated from λ(x) = exp {S(x)}, where S(x) is a stationary
Gaussian process with covariance function Cov {S(x), S(x− u)} = σ2 exp(−u/φ). For each
12 Biometrics, 000 0000
comparison, we simulated 100 samples, each consisting of 1000 points on a 89×89 region.
From each simulated sample we computed the (minimum) integrated squared errors ISEFK ,
ISEAK and ISEAG achieved by the fixed bandwidth kernel, adaptive bandwidth kernel,
and adaptive GMRF estimators respectively. All the ISEs are evaluated on 100×100 regular
grids. To compare the performance of the three estimators it is more informative to look at
the ratios of the ISE’s: RFK/AG = ISEFK/ISEAG and RAK/AG = ISEAK/ISEAG.
Table 2 summarizes the results for each pair of the model parameters (σ2, φ). As we can see,
the adaptive GMRF estimator outperforms both kernel estimators in almost every situation.
Also, the adaptive GMRF model seems to work better for φ = 5, 7 than φ = 3. This is
due to the GMRF prior that we constructed, which assumes second-order smoothness and is
more suitable for modeling Gaussian processes with large φ (i.e. stronger correlation). This
can also be seen from the first simulation example, where the true intensity function is a
smooth unimodal function. Finally, note that the superiority of using adaptive methods is
more pronounced at larger values of σ2, consistent with the fact that larger values of σ2 are
associated with more pronounced spatial heterogenity in the resulting point patterns. This
agrees with the findings in Diggle et al. (2005).
As highlighted by a referee, the performance should not be as good for, say, a Thomas
Neyman-Scott model, since it violates the assumptions of model (5). Briefly, we expect
better performance with weaker spatial correlation (i.e. closer to a Poisson process) and vice
versa. A comprehensive study would require consideration of a wide range of point process
models and model parameters, which is beyond the scope of this paper.
[Table 2 about here.]
4.3 Some remarks
The adaptive GMRF framework allows for reliable and efficient MCMC simulation. The
Markov chains have quick convergence: 15,000 MCMC iterations with a burn-in of 5,000
Bayesian Semiparametric Intensity Estimation 13
proved to be sufficient for all the simulated examples. Due to the block-move sampling
method described in Section 3, the MCMC computation is fast considering the large number
of parameters to be estimated in the model. With n = 402, 602, 802 and 1002, our FORTRAN
program took 114, 255, 510 and 837 seconds, respectively, to estimate intensities from a
simulated data set of 10,000 points. The computational times increase roughly at order n.
All the computations were carried out on a Mac laptop with a 2.13 GHz Intel Core 2 Duo
processor.
Since the adaptive GMRF model is based on binning of the data, its performance depends
on the choice of grid size. This is similar to bandwidth choice with kernel methods. For our
simulation studies, we used grids of size 30×30, 40×40 and 50×50. The results in Table 2 are
for the best performing grid size, specifically 30×30 for σ2=1, 40×40 for σ2=3 and 50×50
for σ2=7. Choice of grid size does not appear to be influenced by φ, at least, not for the
values we considered. Note that for the kernel method with fixed bandwidth, we used optimal
bandwidths selected using cross-validation (Stoyan et al., 1995) and the M(h) miminization
method (Diggle, 2003).
We believe the advantage of the adaptive GMRF estimator is the use of data information
to choose the amount of smoothing. In contrast, the kernel smoothing methods depend
on bandwidth selectors that have not been well studied in the context of spatial point
analyses. In our simulations, we found that the two selectors we used, the cross-validation
likelihood method and the M(h) minimization method, often selected significantly different
bandwidths. Stoyan et al. (1995) indicates that the likelihood cross-validation method has
a tendency to return small bandwidths. The quantity M(h) depends on the K function,
a second-order statistic that will have to be estimated in practice. Estimates of the K
function are known to be rather variable. Moreover, the estimation using adaptive kernel
14 Biometrics, 000 0000
method is sensitive to the initial bandwidth h0. Therefore, the kernel estimators rarely yielded
consistent performance in our simulation studies.
5. Application to rainforest data
We here use the dataset from a census of a tropical forest in a 1000 by 500 meter plot in
Barro Colorado Island in the Panama. See Hubbell and Foster (1983), Condit et al. (1996)
and Condit (1998). The full dataset contains the locations of more than 300 species of trees,
many of these numbering thousands of trees. We consider three particular species of trees:
the Acalypha diversifolia (Acaldi, 528 trees), Capparis frondosa (Cappfr, 3299 trees) and
Lonchocarpus heptaphyllus (Loncla, 836 trees) species. Measurements of the altitude, slope,
various soil minerals, and soil pH are available on a grid of locations. Figure 1 shows plots of
Potassium and Nitrogen levels in the soil as well as altitude and slope over the observation
region.
We obtained intensity estimates for these three species of trees using the parametric
model of Waagepetersen and Guan (2009), the fixed and adaptive kernel methods, and the
semiparametric model based on the adaptive GMRF. Figures 2, 3 and 4 show our results for
the Acaldi, Cappfr and Loncla species respectively. Specifically, these figures contain plots
of the actual data, the parametric, adaptive kernel and semi-parametric adaptive GMRF
estimates (we exclude the plots of the fixed kernel estimates in the interest of conserving
space but they are available upon request). The intensity plots are on the square-root scale
to show the spatial structures more clearly.
Waagepetersen and Guan (2009) found altitude and potassium to be significant for both
Acaldi and Cappfr, and nitrogen and phosphorus for Loncla. However, the parametric in-
tensity estimates tend to contain the features that are visible in the plot of covariate values
but are not apparent from the actual point data. For example, in Figure 2(b) for Acaldi,
the peak near the lower right corner does not correspond to an abundance of trees, while in
Bayesian Semiparametric Intensity Estimation 15
Figure 3(a) for Cappfr, the dense curved area from the lower left corner to location (200,
200) has a very low estimated intensity as shown in Figure 3(b). Also notice the similarity
between the Acaldi and Cappfr plots. In Figure 4(b) for Loncla, where the coefficient for
the nitrogen covariate is negative, we find that the pattern appears to roughly follow the
inverse of the nitrogen pattern. See, for example, the two intensity peaks on the left edge
and the low intensity area on the upper right. We also found that the non-adaptive kernel
estimator (not shown) appears to over-smooth dense regions and under-smooth areas with
few data points. Conversely, the adaptive kernel estimator seems to provide local smoothing
for intensity estimation, but at the cost of losing global smoothness, resulting in a very
granular appearance.
Using the semiparametric adaptive GMRF model, the significant covariates at the 5% level
were potassium for Acaldi, slope for Cappfr, and altitude for Loncla. For Acaldi, the estimate
of the regression parameter for potassium was 0.0045 (0.0002, 0.0089) (approximate 95%
Bayesian credible interval in parenthesis). For Cappfr, the estimate of slope was -2.51 (-4.45,
-0.62), and for Loncla the estimate of elevation was 0.094 (0.043, 0.148). Note that the result
on potassium for Acaldi is consistent with Waagepetersen and Guan (2009). We also found
that our Bayesian point estimates are close to Waagepetersen and Guan’s estimates, but with
more conservative 95% intervals. The reason may be that our estimates take into account
the uncertainty from estimating the spatial features that are not due to the covariates.
Compared to the kernel estimates, the proposed method, in our opinion, achieves a much
better balance between smoothing the image and retaining the necessary details on spatial
extents. Due to its data-driven nature, our model is able to apply adaptive smoothing when
the point process has increasingly local variation and provides general smoothing where the
process is more homogeneous. In particular, notice that although the intensity estimates
contain some spatial features of the underlying covariate structure, these features do not
16 Biometrics, 000 0000
overwhelm the information contained in the data. For example, for the Cappfr species, the
dense curved structure in the lower left portion of the observation region is retained. As
suggested by a referee, we checked the plots of estimated γjk for all species, which show low
estimates for clustered points and high estimates for more homogeneous points. Such plots
indicate that it is necessary to consider the spatially adaptive approach for the rainforest
data.
[Figure 1 about here.]
[Figure 2 about here.]
[Figure 3 about here.]
[Figure 4 about here.]
6. Discussion
In this work, we introduced a data-driven adaptive method to obtain estimates of the
intensity function from a spatial point pattern. This method involves binning the data and
using adaptive Gaussian Markov random fields to model the underlying spatial process.
The amount of smoothing at the local level is controlled by parameters in the prior, and
these parameters are estimated using the data in the neighboring areas. The simulation
study and rainforest data examples show that this method can achieve a nice balance
between smoothing and retension of features of the intensity function that is difficult to
attain with kernel estimation. Our method is versatile and works with or without covariates.
The nonparametric version is useful for exploratory analyses and mapping purposes, or if
information about covariates or form of the model is lacking. This is particularly important
when the process producing the point pattern is complicated and/or not well understood.
If appropriate covariates are available, inference regarding the dependence of the intensity
function on these covariates can be obtained using the semiparametric version of our method.
Bayesian Semiparametric Intensity Estimation 17
To implement our method, one must first select an appropriate grid dimension for binning.
Currently, this is a subjective decision to be made by the investigator. We examined the
effects of choice of grid size in our simulation study and provided some initial guidelines. In
particular, in the simulation study with the stationary Gaussian processes (Section 4.2), we
found that performance is better with small cells (larger grid size) when σ is larger, i.e. when
there is more heterogeneity in the point pattern. Thus we recommend a grid size so that each
cell is roughly homogeneous. Also, a finer grid can be used if the data size is larger. We are
planning a more comprehensive study to explore this topic further, especially in the context
of differing correlation strengths. A theoretical study of this and of issues of consistency are
also planned.
Another area of further investigation is the use of these intensity estimates in estimates of
the inhomogeneous K function (Baddeley et al., 2000). The correlation exhibited in point
patterns is often of great interest. However, in the inhomogeneous case this requires good
estimates of the intensity function. We will study how estimates of the inhomogeneous K
function might be improved by the use of our intensity estimates.
Acknowledgements
Yu Yue’s research is supported by PSC-CUNY research award #60147-39 40.
References
Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial pointpatterns. Australian and New Zealand Journal of Statistics 42, 283–322.
Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000). Non- and semi-parametricestimation of interaction in inhomogeneous point patterns. Statistica Neerlandica 54,329–350.
Baddeley, A. J. and Turner, R. (2005). Spatstat: an R package for analyzing spatial pointpatterns. Journal of Statistical Software 12, 1–42.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (withdiscussion). Journal of the Royal Statistical Society, Series B: Methodological 36, 192–236.
18 Biometrics, 000 0000
Brezger, A., Fahrmeir, L., and Hennerfeind, A. (2007). Adaptive Gaussian Markov randomfields with applications in human brain mapping. Journal of the Royal Statistical Society:Series C (Applied Statistics) 56, 327–345.
Brown, L., Cai, T., Zhang, R., Zhao, L., and Zhou, H. (2009). The root-unroot algorithmfor density estimation as implemented via wavelet block thresholding. to appear inProbability Theory and Related Fields .
Cai, T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequalityapproach. Annals of Statistics 27, 898–924.
Carter, C. K. and Kohn, R. (1996). Markov chain Monte Carlo in conditionally Gaussianstate space models. Biometrika 83, 589–601.
Condit, R. (1998). Tropical forest census plots. Springer-Verlag and R. G. Landes Company,Berlin, Germany and Georgetown, Texas.
Condit, R., Hubbell, S. P., and Foster, R. B. (1996). Changes in tree species abundance ina neotropical forest: impact of climate change. Journal of Tropical Ecology 12, 231–256.
Diggle, P., Rowlingson, B., and Su, T. (2005). Point process methodology for on0line spatio-temporal disease surveillance. Environmetrics 16, 423–434.
Diggle, P. J. (1985). A kernel method for smoothing point process data. Applied Statistics34, 138–147.
Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. Arnold, London, 2ndedition.
Diggle, P. J., Morris, S. E., and Wakefield, J. C. (2000). Point-source modelling using matchedcase-control data. Biostatistics 1, 89–105.
Fahrmeir, L. and Kneib, T. (2009). Propriety of posteriors in structured additive regressionmodels: Theory and empirical evidence. Journal of Statistical Planning and Inference139, 843–859.
Fahrmeir, L. and Lang, S. (2001). Bayesian inference for generalized additive mixed modelsbased on Markov random field priors. Journal of the Royal Statistical Society, Series C:Applied Statistics 50, 201–220.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models.Bayesian Analysis 1, 515–533.
Gu, C. (2002). Smoothing Spline ANOVA Models. Springer-Verlag Inc, New York.Guan, Y. (2008). On consistent nonparametric intensity estimation for inhomogeneous
spatial point processes. Journal of the American Statistical Association 103, 1238–1247.Guan, Y. (2009). Fast block variance estimation procedure for inhomogeneous spatial point
processes. Biometrika 96, 213–220.Guan, Y. and Loh, J. M. (2007). A thinned block bootstrap procedure for modeling
inhomogeneous spatial point patterns. Journal of the American Statistical Association102, 1377–1386.
Hubbell, S. P. and Foster, R. B. (1983). Diversity of canopy trees in a neotropical forestand implications for conservation. In Sutton, S. L., Whitmore, T. C., and Chadwick,A. C., editors, Tropical Rain Forest: Ecology and Management, pages 25–41. BlackwellScientific.
Illian, J. B. and Rue, H. (2010). A toolbox for fitting complex spatial point process modelsusing integrated Laplace transformation (INLA). Technical Report 6, Department ofMathematical Sciences, Norwegian University of Science and Technology.
Bayesian Semiparametric Intensity Estimation 19
Kass, R., Ventura, V., and Cai, C. (2009). Statistical smoothing of neuronal data. Biometrics65, 1243–1253.
Knorr-Held, L. and Richardson, S. (2003). A hierarchical model for space-time surveillancedata on meningococcal disease incidence. Journal of the Royal Statistical Society, SeriesC: Applied Statistics 52, 169–183.
Knorr-Held, L. and Rue, H. (2002). On block updating in Markov random field models fordisease mapping. Scandinavian Journal of Statistics 29, 597–614.
Liang, S., Banerjee, S., and Carlin, B. P. (2009). Bayesian wombling for spatial pointprocesses. Biometrics 65, 1243–1253.
Møller, J. and Waagepetersen, R. P. (2003). Statistical Inference and Simulation for SpatialPoint Processes. Chapman and Hall, New York.
Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications,volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall,London.
Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latentGaussian models using integrated nested Laplace approximations (with discussion).Journal of the Royal Statistical Society, Series B 71, 319–392.
Rue, H., Steinsland, I., and Erland, S. (2004). Approximating hidden Gaussain Markovrandom fields. Journal of the Royal Statistical Society, Series B: Statistical Methodology66, 877–892.
Schoenberg, F. P. (2003). Multi-dimensional residual analysis of point process models forearthquake occurrences. Journal of American Statistical Association 98, 789—795.
Schoenberg, F. P. (2004). Consistent parametric estimation of the intensity of a spatial-temporal point process. Journal of Statistical Planning and Inference 128, 79–93.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press,New York.
Stein, M. L., Quashnock, J. M., and Loh, J. M. (2000). Estimating the K function of a pointprocess with an application to cosmology. Annals of Statistics 28, 1503–1532.
Stoyan, D., Kendall, W. S., and Mecke, J. (1995). Stochastic Geometry and Its Applications,2nd edition. John Wiley, New York.
Sun, D., Tsutakawa, R. K., and He, Z. (2001). Propriety of posteriors with improper priorsin hierarchical linear mixed models. Statistica Sinica 11, 77–95.
Waagepetersen, R. (2007). An estimating function approach to inference for inhomogeneousNeyman-Scott processes. Biometrics 63, 252–258.
Waagepetersen, R. and Guan, Y. (2009). Two-step estimation for inhomogeneous spatialpoint processes. Journal of the Royal Statistical Association Series B 71, 685–702.
Wahba, G. (1990). Spline Models for Observational Data. SIAM [Society for Industrial andApplied Mathematics], Philadelphia.
Yue, Y., Loh, J. M., and Lindquist, M. A. (2009). Adaptive spatial smoothing of fMRIimages. Statistics and Its Interface (to appear).
Yue, Y. and Speckman, P. L. (2010). Nonstationary spatial Gaussian Markov random fields.Journal of Computational and Graphical Statistics 19, 96–116.
Yue, Y., Speckman, P. L., and Sun, D. (2008). Fully Bayesian adaptive spline smoothing.Manuscript (submitted), University of Missouri-Columbia, Department of Statistics.
20 Biometrics, 000 0000
0 200 400 600 800 1000
0100
200
300
400
500
100
150
200
250
300
350
(a) Potassium
0 200 400 600 800 1000
0100
200
300
400
500
-0.05
0.00
0.05
0.10
0.15
0.20
(b) Slope
0 200 400 600 800 1000
0100
200
300
400
500
-10
0
10
20
30
40
(c) Nitrogen
0 200 400 600 800 10000
100
200
300
400
500
125130135140145150155
(d) Altitude
Figure 1. Plots of the measurements of soil Potassium and mineralized Nitrogen (in unitsof mg/kg of soil), as well as the altitude (in meters) and slope in the observation region inBarro Colorado Island.
Bayesian Semiparametric Intensity Estimation 21
0 200 400 600 800 1000
0100200300400500
(a) Acaldi
0 200 400 600 800 1000
0100
200
300
400
500
0.025
0.030
0.035
0.040
0.045
(b) Parametric estimator
0 200 400 600 800 1000
0100
200
300
400
500
0.02
0.04
0.06
0.08
0.10
0.12
0.14
(c) Adaptive kernel estimator
0 200 400 600 800 10000
100
200
300
400
500
0.02
0.04
0.06
0.08
0.10
0.12
(d) Adaptive GMRF estimator
Figure 2. Plots of Acalypha diversifolia trees: (a) tree locations; (b) intensity estimatesusing the parametric method; (c) intensity estimates using the adaptive kernel method; (d)intensity estimates using the adaptive GMRF method. (The estimates are on a square-rootscale.)
22 Biometrics, 000 0000
0 200 400 600 800 1000
0100200300400500
(a) Cappfr
0 200 400 600 800 1000
0100
200
300
400
500
0.06
0.07
0.08
0.09
0.10
0.11
(b) Parametric estimator
0 200 400 600 800 1000
0100
200
300
400
500
0.05
0.10
0.15
0.20
0.25
0.30
(c) Adaptive kernel estimator
0 200 400 600 800 10000
100
200
300
400
500
0.05
0.10
0.15
(d) Adaptive GMRF estimator
Figure 3. Plots of Capparis frondosa trees: (a) tree locations; (b) intensity estimatesusing the parametric method; (c) intensity estimates using the adaptive kernel method; (d)intensity estimates using the adaptive GMRF method. (The estimates are on a square-rootscale.)
Bayesian Semiparametric Intensity Estimation 23
0 200 400 600 800 1000
0100200300400500
(a) Loncla
0 200 400 600 800 1000
0100
200
300
400
500
0.025
0.030
0.035
0.040
0.045
0.050
0.055
(b) Parametric estimator
0 200 400 600 800 1000
0100
200
300
400
500
0.020.040.060.080.100.120.140.16
(c) Adaptive kernel estimator
0 200 400 600 800 10000
100
200
300
400
500
0.02
0.04
0.06
0.08
0.10
0.12
(d) Adaptive GMRF estimator
Figure 4. Plots of Lonchocarpus heptaphyllus trees: (a) tree locations; (b) intensityestimates using the parametric method; (c) intensity estimates using the adaptive kernelmethod; (d) intensity estimates using the adaptive GMRF method. (The estimates are on asquare-root scale.)
24 Biometrics, 000 0000
Table 1The integrated squared errors obtained with different methods of intensity estimation, averaged over 100 simulated
realizations from the inhomogeneous Poisson (POIS) and inhomogeneous Neymann-Scot (INSP) models, withunimodal intensity function in (15).
POIS INSPMethod n = 302 n = 502 n = 1002 n = 302 n = 502 n = 1002
Fixed Kernel 512.71 512.78 512.70 706.64 706.86 706.55Adapt Kernel 502.99 503.25 503.36 705.27 705.53 705.69Adapt GMRF 422.53 437.02 436.30 720.21 726.83 743.59
Bayesian Semiparametric Intensity Estimation 25
Table 2Summary results from simulation study to compare performance of the adaptive GMRF, fixed and adaptive
band-width kernel estimators, for different values of the Gaussian process parameters σ2 and φ. The values are theratios of minimum integrated squared errors achieved by the kernel and adaptive GMRF estimators.
σ2 = 1 σ2 = 3 σ2 = 5 σ2 = 7RFK/AG RAK/AG RFK/AG RAK/AG RFK/AG RAK/AG RFK/AG RAK/AG
φ = 3 .859 1.048 1.148 1.016 1.097 1.012 1.188 1.069φ = 5 .949 1.199 1.496 1.659 1.696 1.787 1.493 1.116φ = 7 .851 1.162 1.311 1.605 1.427 1.658 1.164 0.963