time-space kriging to address the spatiotemporal misalignment in the large datasets

11
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/authorsrights

Upload: lumhs

Post on 29-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

This article appeared in a journal published by Elsevier. The attachedcopy is furnished to the author for internal non-commercial researchand education use, including for instruction at the authors institution

and sharing with colleagues.

Other uses, including reproduction and distribution, or selling orlicensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of thearticle (e.g. in Word or Tex form) to their personal website orinstitutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies areencouraged to visit:

http://www.elsevier.com/authorsrights

Author's personal copy

Time-space Kriging to address the spatiotemporal misalignment in thelarge datasets

Dong Liang a, Naresh Kumar b,*aDepartment of Epidemiology, University of Iowa, Iowa City, IA 52242, USAbDepartment of Epidemiology and Public Health, University of Miami, 1425 NW 10 Ave, Suite 308C, Miami, FL 33136, USA

h i g h l i g h t s

< Novel method of interpolation across space and time.< Computationally efficient method to estimate exposure at a given location and time.< Hierarchical time-space Kriging.

a r t i c l e i n f o

Article history:Received 30 July 2012Received in revised form10 February 2013Accepted 19 February 2013

Keywords:Time-space KrigingSpatiotemporal hierarchical modelGaussian Markov Random FieldsNonstationarityBayesian computationFine particulate matter PM2.5

a b s t r a c t

This paper presents a Bayesian hierarchical spatiotemporalmethodof interpolation, termed asMarkov CubeKriging (MCK). The classical Kriging methods become computationally prohibitive, especially for largedatasets due to theO(n3)matrix decomposition.MCKoffers novel and computationally efficient solutions toaddress spatiotemporal misalignment, mismatch in the spatiotemporal scales and missing values acrossspace and time in large spatiotemporal datasets. MCK is flexible in that it allows for non-separablespatiotemporal structure and nonstationary covariance at the hierarchical spatiotemporal scales. Employ-ingMCKwe developed estimates of daily concentration of fine particulates matter�2.5 mm in aerodynamicdiameter (PM2.5) at 2.5 km spatial grid for the Cleveland Metropolitan Statistical Area, 2000 to 2009. Ourvalidation and cross-validation suggest that MCK achieved robust prediction of spatiotemporal randomeffects and underlying hierarchical and nonstationary spatiotemporal structure in air pollution data. MCKhas important implications for environmental epidemiology and environmental sciences for exposurequantification and collocation of data from different sources, available at different spatiotemporal scales.

� 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Blurring disciplinary boundaries and converging time and spacedomains are important developments for interdisciplinary multi-level (or hierarchical) research. While such research can help usunderstand complex patterns of socio-physical phenomena and theprocesses that operate at hierarchical spatiotemporal scales andshape them, it requires data from multiple sources and poses fourmajor challenges. First, the size of these data is large. For example,aerosol optical depth (AOD) data for Cleveland MSA between 2000and 2009 included 2.3 million data points. Second, these data sufferfrom spatiotemporal misalignment because the location and time

stamps of these data do not align well (see Fig. 1a; (Kumar, 2012)).For example, to investigate health effects of air pollution we needair pollution and health data. The location and time stamp of airpollution data (monitored at sparsely distributed EPA sites) doesnot perfectly align (or correspond) with the location and timestamp of (aggregated and disaggregated or individual level) healthdata. Thus, environmental data need to be imputed at the spatio-temporal stamp of health data in order to address themisalignmentproblem. Third, spatiotemporal scales/resolutions of these data aredifferent (Fig. 1b). This means that spatiotemporal scales at whichthese data are aggregated and reported are not the same. Forexample air pollution data are monitored at sparsely distributedsites every day (every hour at some sites), and mortality data areaggregated by county and day. Fourth, these data suffer frommissing values across geographic space and time. For example,there are systematic gaps in air pollution monitoring; likewisethere are systematic gaps in the retrieval of AOD from satellites dueto cloud cover and cloud contamination.

* Corresponding author. Tel.: þ1 305 243 4854.E-mail addresses: [email protected] (D. Liang), [email protected]

(N. Kumar).URL: http://eph.ccs.miami.edu

Contents lists available at SciVerse ScienceDirect

Atmospheric Environment

journal homepage: www.elsevier .com/locate/atmosenv

1352-2310/$ e see front matter � 2013 Elsevier Ltd. All rights reserved.http://dx.doi.org/10.1016/j.atmosenv.2013.02.034

Atmospheric Environment 72 (2013) 60e69

Author's personal copy

The analysis of time-space varying datasets that come fromdifferent sources requires that these data are: a) aligned withrespect to location and time, b) arranged on the same spatiotem-poral scales and c) missing values are filled. For example, we needto estimate exposure using the existing air pollution data at thespatiotemporal scale of mortality data in order to evaluate the as-sociation between air pollution and mortality. Finest spatial reso-lution of mortality data is point location (i.e. street address ofdecedents) and the temporal scale is the date of mortality. Dailyexposure estimates are needed several days prior to the date ofdeath (for time-lagged exposure) at the location of residence (andpotentially at all other locations where decedents have spent sometime) for each case or these data need to be aggregated to coarserspatiotemporal scale. Likewise, the spatiotemporal scales ofdifferent environmental datasets are not the same. Thus, imputingone environmental dataset at the spatiotemporal scales of otherenvironmental dataset is critically important to collocate differentenvironmental datasets. If adequate data points, spread acrossgeographic space and time, are available, different methods ofinterpolation can be employed to impute value at a given locationand time. Among thesemethods, time-space Kriging is an attractiveoption because it minimizes the mean squared prediction errorsamong linear unbiased predictors.

Although time-space Kriging is a relatively newer development,spatial Kriging has been in practice for a while. Given a randomprocess {Z(s):s˛D}; where D is the spatial domain and s is thelocation represented by a pair of coordinates, Kriging relies on theassumption of spatial stationarity (i.e. constant variance withindomain D), which assumes that the process satisfies the conditionE(Z(s))¼ m, for all s˛D and cov(Z(s1),Z(s2))¼ C(s1�s2) for all s1,s2˛D.Kriging requires an understanding of how a quantity varies with

respect to those observed at the neighboring sites (located atdifferent distance intervals) to develop an empirical model to es-timate covariance. Based on the covariance function, Kriging hasbeen used extensively for developing continuous surface (i.e. pre-diction at every location within the study domain) of environ-mental variables, such as distribution of air pollution, minerals, soilcharacteristics and meteorological conditions (Cressie, 1993, 2008).

The extension of spatial Kriging to the spatiotemporal domain isstraightforward. Time-space Kriging requires proper specificationof spatiotemporal covariance functions. According to the classicaltheory, separable and stationary covariance assumptions requirethat the process satisfies the condition E(Z(s,t)) ¼ m, for all s˛D, t˛[1,T] and cov(Z(s1,t1),Z(s2,t2)) ¼ C(h,u) ¼ Cs(h)Ct(u) where h ¼ s1�s2and u ¼ t1�t2 for all s1,s2˛D and t1,t2˛{1,.,T}; Cs and Ct are spatialand temporal only stationary covariance functions. Satisfying theseassumptions can be difficult because the inherent differences inspatial and temporal scales of data are likely to produce nonsta-tionary covariance when time and space domains converge. Todevelop robust time-space Kriging model, we face three importantchallenges, especially for large datasets: a) non-separable covari-ance across time and space, b) nonstationary covariance at multiplespatiotemporal scales, and c) computational issues. Researchershave begun to address some of these challenges.

Time-space Kriging requires the specifications of spatial, tem-poral and non-separable spatiotemporal covariance. Spatial andtemporal covariance can be constructed using spatial and temporaltrends of the data separately. Non-separable spatiotemporalcovariance emerges due to the convergence of spatial and temporaldomains. Researchers suggest the use of product sum model (DeCesare et al., 2001) and integration of spectral densities (Cressieand Huang, 1999) to address non-separable spatiotemporal

Fig. 1. a. An example of Spatial Misalignment in the locations of EPA data monitored at point locations and the point locations of health data, where one might need exposure. b: Anexample of Mismatch in the geographic scales of EPA data (at point locations) and mortality data (aggregated at Census Tract level). Source: Kumar et al., 2011a,b. Data were acquiredfrom the Chicago Department of Public Health.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e69 61

Author's personal copy

covariance. Since the rate of spatiotemporal trend can varyregionally, seasonally, and across local spatiotemporal sub-domains, the convergence of spatiotemporal domains also resultsin nonstationarity covariance at multiple spatiotemporal scales.The first order (at global scale) nonstationarity can be handled byincorporating covariates and/or non-linear spatiotemporal trends(De Iaco et al., 2002; Haas, 1995). The effectiveness of such anapproach largely depends on the robustness of covariates and/orspatiotemporal trends incorporated into the model.

Given the inherent regional and seasonal structures in theenvironmental data, nonstationarity needs to be modeled atmultiple spatiotemporal scales separately. For example, diurnalvariability in air pollution can be different from seasonal vari-ability; likewise, air pollution variation within a neighborhood(within a few miles) can be different from that within and be-tween regions, such as within and between urban and sub-urbanareas (Kumar et al., 2011a). Given this hierarchical spatiotemporalstructure in environmental data, the second-order non-stationarity at multiple spatiotemporal scales is important fortime-space Kriging. Nonstationarity spatiotemporal covariancecan be specified by basis function approximations and Kalmanfiltering (Cressie et al., 2010). Among other alternatives, acomputationally efficient hierarchical model can be used toaddress nonstationarity at multiple spatiotemporal scales(Fuentes et al., 2005; Sahu et al., 2006).

Time-space Kriging also poses computational challenges due tothe big n problem (Banerjee et al., 2004). Researchers address thischallenge through approximating a large process by its realizationson a small set of knots (or subset of data) (Banerjee et al., 2008). Analternative approach involves fitting a Gaussian Markov RandomFields (GMRF) to the original process on a lattice in order to utilizesparse matrix algorithms (Hartman and Hossjer, 2008). Thisapproach is useful for regularly spaced datawhere projection onto alattice can bemade. But it is of little use for irregular spatiotemporaldata (such as deviating path of Terra and Aqua satellite every day,before repeating every 16th day). Cressie and Johannesson presentan efficient solution by approximation of covariance using linearcombinations of small number of low-dimensional deterministic(or basis) functions (2008).

In this paper, we present a flexible and computationally effi-cient Markov spatiotemporal Cube (or voxel) Kriging (MCK) toaddress the above outlined challenges. MCK simultaneously ad-dresses modeling and computation challenges. The flexibility ofthe model lies in the fact that it incorporates varying covariance atmultiple spatiotemporal scales and separable and non-separablespatiotemporal covariance. MCK is a Bayesian hierarchical modelwhere GMRF priors are utilized to achieve the model richness,such as the use of multi-level interactions across space and time.We can also use Bayesian methodology to incorporate covariates,time-space heterogeneity, and physically meaningful prior infor-mation. MCK also utilizes structures within GMRF priors fordeveloping an efficient solution to the big n problem. To demon-strate the application of this method, we utilize satellite based airpollution data (on fine particulate matter�2.5 mm in aerodynamicdiameter (PM2.5)) for Cleveland MSA from 2000 to 2009. Thesedata exhibit strong spatial dependency, spatial nonstationarity,weak temporal autocorrelation and extensive gaps across timeand space. We also created artificial gaps in the dataset for cross-validation. These gaps can be treated as the locations and times forwhich exposure estimates are needed. The remainder of this pa-per is organized into four sections. The first section presents thestatistical formulation of MCK, followed by the second section thatdetails the implementation of MCK. The third section includesresults and the fourth section discusses our results with therelevant literature.

2. Statistical model

Let {Z(s,t)} denote the observed Gaussian process defined overD � {1,2,.,T} where D3R2 denotes the spatial domain and t in-dexes discrete time stamps. The term cube (voxel) in the context ofthis paper refers to as tessellation (or partitioning) of the datapoints into spatiotemporal subsets. We partition D into small areas,termed as regions. Likewise, we partition time T into L subsets,known as temporal intervals. Utilizing the tessellation, wedecompose the spatiotemporal domain hierarchically into sub-spatiotemporal domains, termed as spatiotemporal cubes. Wethen utilize hierarchical GaussianMarkov Random Fields (GMRF) tocapture spatiotemporal interactions at intra- and inter-spatiotem-poral cube level. The highly structured GMRF model also facilitatesBayesian estimation for large dataset.

In practice, the process Z is only observed at nspatiotemporal locations denoted by {(si,ti),i ¼ 1,.,n}. DefineZt ¼ [Z(s1,t),.,Z(snt,t)]T; where nt represents observed data attime t. Define the m spatiotemporal locations of health data. Weterm as the query points where predictions are needed using ntdata points. Our model aims to predict the un-observable atthese m spatiotemporal locations with minimum error.

The first stage of the model takes the form

Zt ¼ XTt bþ vt þ et þ εt

where Xt ¼ [X(s1,t)T,.,X(snt,t)T]T denote the nt � p design matrix forcovariates, The random vector vt and et model between cube andwithin cube variation, and et denote measurement errors that arewhite noises.

Our specification of cube model involves selection of R spatiallocations over D (the study area). Voronoi tessellation can beemployed to partition spatial domain D into non-overlapping re-gions (R). Likewise, temporal domain can be partitioned into non-overlapping L time intervals. The spatiotemporal domain is parti-tioned into R � L cubes. Partitioning spatiotemporal domain into(R � L) cubes is integral component to make MCK computationallyefficient and to capture nonstationarity at local and cube levels.Different approaches are suggested for partitioning spatial domain,which includes: minimizing variance within sub-regions andvariance maximization between sub-regions (Kumar et al., 2011b),overlaying a balanced sample of points (Stevens and Olsen, 2004)and minimizing prediction errors using model based approaches(Diggle and Lophaven, 2006; Zhu and Stein, 2006; Zimmerman,2006). Borrowing the strengths from the literature on spatial par-titioning, we propose overlaying systematic grid onto the studydomain D to partition it into R regions (termed as spatiotemporalcubes), and then regularly space time points on each grid cell. Thesystematic partitioning of geographic space reduces spatial auto-correlation and facilitate parameter estimation (Stevens and Olsen,2004).

Let r(si) denote the region identifier for location si and l(ti) theinterval for time stamp ti. Utilizing the tessellation, we candecompose the process v(si,ti) additively (Schmid and Held, 2004)as

nðsi; tiÞ ¼ fðrðsiÞÞ þ qðlðtiÞÞ þ dðrðsiÞ; lðtiÞÞ

where the term f represents un-observed regional effects,denoting factors affecting observations only over space; the termq models un-observed temporal effects; the term d captures re-sidual stochastic variations that can interact across spatiotem-poral cubes. Details on each of these spatiotemporal and localeffects are provided in Supplementary Online Material (SOM):Appendix e A.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e6962

Author's personal copy

For regional effects f, we adopt a conditional autoregressiveprior (Besag et al., 1991). For the regional temporal trends, we adopta first-order randomwalk GMRF (Clayton,1996). For the interactioneffects d, we construct a GMRF on the spatiotemporal domain byusing the Kronecker product of the structure matrices of f and q

(Clayton, 1996; Rue and Held, 2006).

pðdjsdÞfs

nqnf2

dexp

n� sd

2dT

�Q q5Qf

�do

where sd denotes the unknown scale parameters, nf ¼ R � 1 andnq ¼ L � 1 denote the ranks of structure matrices for regional effect(Qf) and temporal trend (Qq) and5 denotes the Kronecker productoperator.

Local effects et represent variation within a cube. We assumethat within the cube, temporal dynamic are negligible.We partitioneach cube using Voronoi tessellation. We model the within cubevariation using xt, an nt þ mt random vector. We propose intrinsicCAR model on xt, where adjacency is defined through sharing aboundary between the Voronoi polygons. We assume that

et ¼ Ktxt where Kt ¼ �Int0nt�mt

�where Kt maps the data to local effects. In the general case, theabove model is specified across each cube. We allowed the CARscaling parameter to be independent across cubes, enforcing localsmoothing.

We specify a spatiotemporal CAR prior on the local effects due tocomputational and modeling issues of large datasets. First, thisconditional specification allows efficient computation. Instead ofiteratively factoring dense matrices, our implementation involvesdiagonalization of the fixed adjacency matrices only once duringthe setup. The resulting Bayesian computation becomes feasible forlarge datasets. Second, the accuracy of GMRF approximation toGaussian field over regular grid has been validated for stationaryGaussian field (Rue and Tjelmeland, 2002). The correlationapproximated by the nearest neighbor structure is not restricted tostationary or separable family, similar to basis function approxi-mations (Cressie and Johannesson, 2008).

We allow for cube specific scale parameters (srl) for local spatialeffects (xrl) and white noise (lrl). While such richly parameterizedmodel allows for local non-parametric smoothing of the data topredict missing values, the model requires efficient algorithms toderive Bayesian inference. The Bayesian computation details areoutlined in the implementation section. All scale parameters areassigned hyper-priors to complete the Bayesianmodel specification.We adopt semi-conjugate Gamma priors for all scale parameters

sxwGamma�nx2;nx2s2x

�where x stands for q;f; d; srl; lrl

where nx is small so that the hyper prior is uninformative and sx ischosen such that resulting prior with the 95% credible sets arewideenough to cover possible range of the scale parameters. Thesehyper-priors are chosen to be proper, yet uninformative, whichallows data to dominate the inference.

3. Implementation

3.1. Tessellation of spatiotemporal domain

The EPA data are monitored at sparsely distributed seven sitesin the Cleveland MSA. Thus, these data are not sufficient to esti-mate spatial co-variance function. Therefore, we utilize 2.3milliondata points of aerosol optical depth (AOD) retrieved using NASA’s

satellite between 2000 and 2009 and indirect estimates ofairborne particulate �2.5 mm in aerodynamic diameter (PM2.5)using AOD data. AOD was retrieved at 2 km spatial resolutionusing the same algorithm as for the standard 10 km AOD fromNASA (Levy et al., 2007), but with more restrictive criteria (seeKumar et al. (2011a) for details). AOD data represent w10:30 AMfor Terra and w1:30 PM Aqua satellites local time, respectively. Ahybrid approach was employed to develop PM2.5 (Kumar et al.,2011a). Both AOD and PM2.5 datasets were used to implementand validate MCK.

Given the large dataset and nonstationary spatiotemporal co-variation in AOD and PM2.5, partitioning these data into non-overlapping spatiotemporal cubes and identifying intra- andinter-cube interactions becomes important. A balance needs to beachieved between computational efficiency and spatiotemporalscales of decomposition because the finer scale decomposition willresults in a large number of spatiotemporal cubes and will increasethe computation burden. Given the strong spatial autocorrelationin AOD and PM2.5 data, we decompose spatial domain by overlayinga 10 � 10 systematic grid. This overlay ensures a spatially balancedrepresentation of these data. Because of high diurnal variability inthese data, we do not partition these data into sub-temporal do-mains, and utilize the finest available temporal scale of these data,i.e. every day.

3.2. Markov chain Monte Carlo algorithm

We propose a hierarchical model to predict value for the querypoints (i.e. points across space and time where predictions areneeded)

Zmt ¼ Xmtbþ q*t 1nt þ Kf;t

�f* þ d*t

�þ Kx;txt þ εt

where Zmt denotes the underlying true values; the ‘*’ above a givenparameter indicates appropriate sum-to-zero constraint and Xmt ismt � p design matrix. The K matrices are mapping matrices ofappropriate dimension. Specifically, K4,t is a mt � Rt adjacencymatrix where K4;t ½s*it ; rðs*itÞ� ¼ 1 and zero elsewhere. LetNt ¼ PRt

rt ¼1 Nt;rtdenote total local parameters at time t, Kx,t is anmt � Nt adjacency matrix where missing data are associated withthe corresponding underlying truth. The error process has diagonalcovariance matrix W(l). Our prediction rely on the distributionpðZmjZÞ. Let Q denote all random effects and T all variance pa-rameters in the model.

pðZmjZÞ ¼Z Z

pðZmjQ; T;ZÞpðQ; TjZÞdnðQÞdnðTÞ

The above integral can be approximated as a Monte Carlomixture of the form

bpðZmjZÞ ¼ 1G

XGg¼1

p�Zm

���QðgÞ;TðgÞ;Z

where g denotes the iteration number.Ideally, marginalized posterior pðT jZÞ are appropriate after

integrating the random effects (Rue and Held, 2006). This inte-gration requires factoring of a large covariance matrix of largedimension (nþm)� (nþm), where n denotes observed data pointsand m query points (where predictions are needed). In addition tolarge data dimension, such factorization needs to satisfy numerousidentifying constraints, which makes the inverse covariance matrixdense and numerical integration of random effects challenging.Thus, we employ block Gibbs sampler to explore the posterior

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e69 63

Author's personal copy

density. Details of the Gibbs sampler are included in SOM:Appendix e B.

4. Data and results

4.1. Data

For the implementation and validation of MCK we utilize threedifferent datasets: aerosol optical depth (AOD) and PM2.5 data at2 km spatial resolution from December 2007 to January 2009 forCleveland MSA (Fig. 2a), and data simulated using hierarchicalnonstationarity assumption. The methodologies for the extractionof 2 km AOD and prediction of PM2.5 using a hybrid approach isdetailed elsewhere (Kumar et al., 2011a; Levy et al., 2007).Assuming these data represent the best estimates of AOD andPM2.5, a systematic grid, as shown in Fig. 2b, was developed usingthe proposed MCK.

Validation was performed on the 2008 data with one-monthoverlap on both ends. Thus, the final dataset included all datapoints between December 2007 and January 2009. Exploratoryanalysis provided insight into the important characteristics of thesedata. First, therewere systematic spatiotemporal gaps in these data,because AOD retrievals were not possible every day due to cloudcover and contamination. For example, over a span of 14 months,there were no observations for about 41% of the days. Second, therate of spatiotemporal variability in AOD and PM2.5 was not thesame, for example temporal variability in both AOD and PM2.5 wasgreater than the spatial variability. Given the static emission sour-ces and dynamic physical and chemical atmospheric processes, thedifferences in the rate of spatial and temporal variability wereobvious. Moreover, meteorological fields (such as wind velocity,wind direction and relative humidity) can transport aerosols andfine particulates to greater distances within short time span.Therefore, AOD and PM2.5 data are likely to show greater temporalheterogeneity and stronger spatial structure.

4.2. Results

Markov Cube Kriging (MCK) was implemented on AOD andPM2.5 for the year 2008 (n w 269 k and 255 k, respectively). Sinceboth AOD and PM2.5 were highly skewed, both were transformed toa natural log scale to stabilize variance. For cross-validation

purposes, 10% of the data points (15 k and 22 k, respectively wererandomly left out and MCK was employed to predict values at theleft out points. The study area was partitioned into 100 regions byoverlaying a 10 � 10 grid on Cleveland MSA. This resulted in twolevels spatial hierarchy, individual point locations of 2 km AOD andPM2.5 and their grouping within 100 regions. The implementationand validation of MCK excluded hierarchical temporal structurebecause of significant temporal variability withinweeks. Cubes thatshowed a sample standard deviation of 0.1 or less were modeledusing exchangeable measurement errors only. Initial values forregional effects were calculated from the observed regional meansand the local effects were initialized by maximizing the marginal-ized density pðurl jrrl Þ; where rrl denotes the observed datacentered on the regional means.

The hyper-priors for the model were chosen to be proper, yetvague. The ranges of the ln(PM10) and ln(PM2.5) were w8.45 andw4.09 ln(mg m�3) respectively. The scale parameter lrl wereassigned Gamma(1,0.1) prior such that the 95% prior credible set forthe standard deviation parameters were (0.16, 1.98) on the naturallog scale of mg m�3, this interval was wide enough to containcandidate values for the parameters. The local spatial scales lf2

ðrl; lÞwere assigned Gamma(1,1) hyper-prior. The hyper-priors forinter-cube scales parameters were sf w Gamma(0.1,0.1),sq w Gamma(0.1,0.1) and sd w Gamma(0.5,0.2). One MCMC chainwas run for 2100 iterations, discarding the initial 100 as burn-in.The convergence diagnostic was performed using R package coda.

4.3. Cross validation

The robustness of MCK was evaluated using cross validation, inwhich we Krigged the values for 10% data points skipped randomly.The Krigged (or predicted) values were then compared against theobserved values. We ran several Monte Carlo experiments byholding off 10% of the data. We define the cross validation residualRCVjs ¼ Zðsj; tjÞ � EðZðsj; tjÞjZÞ in experiment s. Following Haas(1995), the prediction accuracy was assessed bybiass ¼ 1=m

Pmj¼1 RCVjs=Zs. The average residual over the average

of the actual data when and where prediction is needed is denotedby Zs. We also used the t-test statistic for H0:RCVjs¼ 0 using biass fors ¼ 1,.,S. Both biass and the t-statistic should be small for un-

biased predictors. The accuracy of prediction uncertainty SEðbZjsÞ

Fig. 2. a: Study Area e Cleveland Metropolitan Statistical Area (MSA), PM2.5 and PM10 monitoring stations and 2.5 km spatial grid overlaid on the study area. In the proposed study,daily PM2.5 was predicted for each grid point from 2000 to 2009. b: Location of the 2 km AOD; using an empirical relationship between PM monitored at EPA sites (in (a)) andsatellite based AOD data, the daily PM2.5 was predicted for all locations and days AOD was available between 2000 and 2009 (see Kumar et al., 2011a,b for details).

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e6964

Author's personal copy

can be assessed by ratios ¼ 1=mPm

j¼1 SE2ðbZjsÞ=MSEswhere

MSEs ¼ 1=mPm

j¼1 RCV2js. For unbiased predictors, ratios should be

close to 1. Let eRCVjs ¼ RCVjs=SEðbZjsÞ denote the standardized cross

validation residuals and var(zs) denote the sample variance of eRCVjs

for dataset s. The sample variance var(zs) should also be close to 1 ifthe standard errors are negligibly biased.

We also implemented a naive spatiotemporal predictionmethod to predict values at the query points (i.e. 10% of therandomly skipped data points). This involved averaging allobserved data points within the chosen spatiotemporal lags; weused 0.10� and 10 days to search for the neighboring data pointsaround a given query point. Table 1 shows that MCK generatesrobust predictions as compared to the naïve average method. Theoverall bias is 0.0074 � 0.0019 and �0.012 � 0.0019 for ln(PM2.5)and ln(AOD) respectively. The t test statistics of the null hypothesisthat cross validation residuals equal zero are �4.40 and 2.59,respectively. Thus, MCK prediction is slightly negatively biased. Theestimated standard errors for ln(AOD) are too small, representing82% of the MSE. The negative bias for ln(AOD) indicates mis-specification of hyper-priors for local effects. With sparse data,hyper-priors assert little smoothing. Thus, the standard errors areunder-estimated. Figs. 3 and 4 show the validation of MCK andnaïve averaging method. As expected, MCK outperforms the naïveaveraging method in terms of both bias and MSE.

Our analysis suggests that the RMSE for ln(PM2.5) and ln(AOD)were 0.10 � 0.0044 and 0.30 � 0.02, respectively (Table 1 andFigs. 3e4). Both were significant smaller than those estimated us-ing the naïve averaging method. Figs. 3e4 suggests that MCK per-forms poorly for extremely low values, but for most data pointswithin mean � 2 standard deviation, (SD) predicted values matchnicely with the observed values. About 95% and 93% of the observed(randomly left-out) data points were within 95% intervals of thepredicted values for ln(PM2.5) and ln(AOD) respectively.

4.4. Computational performance

MCK is computationally intensive, for example, a prediction of2000 and 2500 values (of AOD and PM2.5, respectively) using40,000 and 53,000 observed data points took 4.96 and 8.26 CPUhours, respectively. This time included knot selection and simula-tion. The experiment was run on a PCwith 2.00 GHZ CPU and 24 GBRAM. In principle, MCK can handle much larger number of querypoints at negligible additional cost because knot selection andregional parameters are estimated only once.

4.5. Application

To demonstrate the application of MCK we developed a sys-tematic grid of daily PM2.5 for five different days for three differentseasons (Fig. 5). MCK nicely fills the continuous surface. Somecaution must be used for the prediction of extremely high values,

especially when observed data are very sparse across geographicspace. Similar to the prediction surfaces, PM2.5 exposure can becomputed for any day and location. As the cross-validation suggestsmost predictions were within 95% confidence intervals. Therefore,eliminating prediction beyond mean � 2 SD MCK offers a robustsolution for developing time-space resolved estimates of airpollution exposure.

4.6. Simulation

Since the PM2.5 and AOD data we used for implementation ofMCK do not fully follow the MCK assumption and lacked temporalhierarchical structure, a simulation study was performed to eval-uate the robustness of MCK. Simulation was conducted for pre-dicting estimates for regular grid and irregularly spaced points.Nine independent datasets (S ¼ 9) were simulated using thenonstationary process as detailed below. The spatial domain was a50 � 50 regular grid. The temporal domainwas 18 regularly spacedtime points. We divided the spatial domain into 10 � 10 regions,and region was sub-divided overlaying a 5 � 5 grid. Within eachregion, a non-separable product sum variogram model for thespatiotemporal process

gðh;u; parmsÞ ¼ gsðh;nugs;Hs; sillsÞ þ gtðu;nugt ;Ht ; silltÞ� kgsðh;nugs;Hs; sillsÞ � gtðu;nugt ;Ht ; silltÞ

where nug, H and sill denote the nugget, range and sill parameters(parms) for spatial and temporal variograms. We allowed fornonstationarity by varying the spatial and temporal covarianceparameters across regions.

nugswUNIFð0:4;0:6Þ HswUNIFð30;70Þ sillswUNIFð1:5;2:5Þ

nugtwUNIFð1:5;2:5Þ HtwUNIFð1;19Þ silltwUNIFð3:5;4:5Þ

and the interaction parameter k w UNIF(0,[max(nugs þ sills,nugt þ sillt)]�1) so that the resulting variogram isdefined (De Iaco et al., 2001).

Through partitioning the spatial domain, we simulated thespatiotemporal field independent across regions. However, parti-tioning created artificial edge effects for data points at the bound-aries of the regions. Thus, we smoothed the data spatially at eachtime point using a Gaussian kernel with bandwidth of one latticeunit. We also included large-scale spatiotemporal trend with a non-linear spatial trend

Msðr; cÞ ¼ 0:5þ 0:1r þ 0:1cþ 0:5r2 þ 0:7c2 þ 0:5rc

�1

where r and c denote the standardized row and column numbersand seasonal temporal trend

Table 1Mean and 95% margin of errors (based on 9 Monte Carlo samples) of MCK and AVG using observed ln(AOD), predicted ln(PM2.5) and simulated data. The results are basedholding off 10% of the data. The statistics are proportion of average predicted minus data values over the average of the data values (bias); the Mean Square prediction Error(MSE); the t test statistic (t) for hypothesis H0: mb ¼ 0; the sample variance of standardized cross-validation residuals (VarZ); and the ratio of average square standard errorsover MSE (ratio).

ln(AOD) ln(PM2.5) Simulated

MCK AVG MCK AVG MCK AVG

Bias �0.012 � 0.0019 �0.002 � 0.0025 0.0074 � 0.0019 0.0076 � 0.0034 �0.00043 � 0.0048 �0.076 � 0.015MSE 0.30 � 0.02 0.85 � 0.038 0.10 � 0.0044 0.34 � 0.038 0.34 � 0.0039 20 � 0.068t �4.40 �0.53 2.59 1.47 �0.06 �3.21VarZ 1.5 � 0.055 1.00 � 0.018 2.40 � 0.041Ratio 0.82 � 0.023 1.00 � 0.014 0.81 � 0.015

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e69 65

Author's personal copy

MtðtÞ ¼ 4sin�2pt6

�þ 4cos

�2pt6

where t denotes the discrete time point between 1 and 18. As aresult, we simulated S datasets with seasonal and non-linear spatialtrends, as well as nonstationary and non-separable local process.

Fig. 6a show the comparison of observed (from the simulateddataset) and predicted values. There are hardly any differencesbetween observed and predicted values. The overall biaswas �0.0004 � 0.0048 and MSE was 0.034 � 0.0039. The t-teststatistic of the null hypothesis H0:RCVjs ¼ 0 was 0.06. This meansthat given the model assumptions (in the observed dataset), MCKcan provide robust prediction across geographic space and timeand can enable us to develop robust exposure estimates at thespatiotemporal scales of health datasets, and offers an attractivesolutions for collocating environmental datasets available atdifferent spatiotemporal scales.

We also conducted a simulation study at irregularly spaced 2500points within spatiotemporal cubes, rather than on 50 � 50 grid.Our assumption of nonstationarity and non-separable spatiotem-poral covariance function remained the same as in the previous

analysis. The data ranged between �10 and 10. Tenfold cross-validation yielded bias ¼ 0.004, root mean square error ¼ 0.43and correlation 0.99. Like the simulation, MCK simulation atirregularly spaced data points yielded robust estimates (Fig. 6b).

5. Discussion

Time-space Kriging is an important method of prediction thatcan address multiple problems arising from the convergence oftime and space domains e misalignment problem, missing valuesand mismatch in the spatiotemporal resolutions (Kumar, 2012).This paper presents a theoretical framework of a computationallyefficient MCK procedure for predicting a systematic grid of time-space varying quantities. Our analysis suggests that the MCKhandled both nonstationarity and large gaps effectively andefficiently.

MCK is a Bayesian hierarchical approach and addresses both themodeling and computational challenges posed by Kriging for largespatiotemporal dataset. We specify time-space covariance functionconditionally by regionalization and decomposition of the

Fig. 3. Validation of MCK and spatiotemporal averaging methods using observed ln(PM2.5) data between December 2007 and January 2009, Cleveland MSA. 10% of the data pointswere held-off and values were predicted at these points using MCK method and averaging method.

Fig. 4. Validation of MCK and spatiotemporal averaging methods using observed ln(AOD) data between December 2007 and January 2009 for Cleveland MSA. 10% of the data pointswere held-off and values were predicted at these points using MCK method and averaging method.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e6966

Author's personal copy

underlying process into temporal, regional and local effects. Inaddition, the induced marginal covariance is not limited to theseparable or stationary family. Computationally, we utilize initialdecompositions of various GMRF structure matrices and advancesin sampling from high dimensional GMRFs with linear constraints(Rue, 2001). The computation involves matrices of fixed dimensionchosen by themodeler, independent of the number of observations.Therefore, we can analyze large data and offer a worked outexample of overcoming the big n problem.

Despite the above advantages, a number of limitations remain.First, full Bayesian inference by MCK is still computationallyintensive when performing Kriging on large dataset that consists oflarge geographical area span across greater time interval without

losing local spatiotemporal details. This issue can be addressed byapproximation methods such as integrated nested Laplaceapproximation of posterior marginal (Rue et al., 2009). Parallelcomputing is another solution to address the computation prob-lem. Second, the choice of tessellation over time and space is still achallenging issue. We need to determine the time and spatial lo-cations in the dataset that lead to best predictive performance.Some researchers suggest varying locations over time and space,especially for MCMC sampling (Banerjee et al., 2010). Third, thecurrent implementation does not make full use of the temporalhierarchy in the model. However, such hierarchy will be useful forprocess(es) that exhibit both seasonal and local temporal trends. Inthis paper, we demonstrate the application of MCK to predict PM2.5

Fig. 5. Observed PM2.5 data points and predicted PM2.5 surface in Cleveland, MSA 2008.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e69 67

Author's personal copy

and AOD at a given location and time, assuming the health data areat point locations. Nonetheless, MCK application can be extendedfurther to predict estimates for at any geographic units units andtemporal intervals. It can be achieved either by computing an es-timates at the centroid of the areal unit (as demonstrated for the2.5 km grid) or by predicting estimates at many points (simulatedrandomly) within the selected areal unit and time interval andaveraging the predictions at the simulated locations.

Our modeling approach has the potential to improve time-spaceresolved estimates of PM2.5, which is vital for air quality moni-toring. As typical with other environmental processes, air pollutiondata come from multiple sources. The EPA monitoring stationscollect hourly data for longer periods; but the monitoring stationsare sparsely distributed. Therefore, these data from EPA alone arenot sufficient to quantify population exposure to air pollution. Thesatellite based AOD data (corrected for meteorological conditionsand spatiotemporal structure) is increasingly being recognized as asurrogate measure of air quality (Kumar et al., 2007; Kumar et al.,2011a). Although AOD data have comprehensive spatial coverage,these data suffer from cloud contamination and include systematictemporal gaps. The hierarchical framework of MCK offers aconsistent approach to combine data from these two sources (EPAand NASA) to develop time-space resolved estimates of air pollu-tion exposure. Not only will these data be critically important todevelop population exposure at unprecedented spatiotemporalscales (of health dataset), but also these data can be used for airquality surveillance and management.

With the increasing availability of multi-resolution spatiotem-poral data from multiple sources, the use of time-space Kriging islikely to increase in the near future. We live in the era of geospatialinformation revolution in which convergence of spatiotemporaldata and information from multiple sources and multiple sensors(with varying spatiotemporal scales) is inevitable. While thisconvergence is important to develop an understanding of thecomplex socio-physical patterns and process that shape them,increasing data dimension and data size make it difficult to addressthe problem of spatiotemporal misalignment, mismatch in thespatiotemporal resolution/scales and gaps (or missing values)across space and time. As demonstrated in this paper, the compu-tationally efficient MCK method holds the key to address theseproblems. This paper also calls for further research on computa-tionally efficient time-space Kriging models and algorithms that

minimize computation time and capturehierarchical (local, regionaland global) nonstationary spatiotemporal structures in the data.

Acknowledgement

This research was supported by NIH (R21 ES014004-01A2) andEPA (R833865).Wewould like to thank two anonymous referees forproviding us with constructive comments and suggestions thatallowed us to improve the quality of the initial submission.

Appendices A and B. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.atmosenv.2013.02.034.

R Library: MCK library is at http://www.ccs.miami.edu/wnkumar/MCK_0.0-1.zip.

References

Banerjee, S., Carlin, B.P., Gelfand, A.E., 2004. Hierarchical Modeling and Analysis forSpatial Data. Chapman & Hall/CRC, Boca Raton, Florida.

Banerjee, S., Finley, A.O., Waldmann, P., Ericsson, T., 2010. Hierarchical spatial pro-cess models for multiple traits in large genetic trials. Journal of the AmericanStatistical Association 105, 506e521.

Banerjee, S., Gelfand, A.E., Finley, A.O., Sang, H., 2008. Stationary process approxi-mation for the analysis of large spatial datasets. Journal of the Royal StatisticalSociety Series B-Statistical Methodology 70, 825e848.

Besag, J., York, J., Mollie, A., 1991. Bayesian image-restoration, with 2 applications inspatial statistics. Annals of the Institute of Statistical Mathematics 43, 1e20.

Clayton, D.G., 1996. Generalized linear mixed models. In: Gilks, W.R., Richardson, S.,Spiegelhalter, D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman &Hall, London, pp. 275e301.

Cressie, N., 1993. Statistics for Spatial Data. John Wiley & Sons, INC, New York.Cressie, N., Huang, H.C., 1999. Classes of nonseparable, spatio-temporal stationary

covariance functions. Journal of the American Statistical Association 94,1330e1340.

Cressie, N., Johannesson, G., 2008. Fixed rank Kriging for very large spatial data sets.Journal of the Royal Statistical Society Series B-Statistical Methodology 70,209e226.

Cressie, N., Shi, T., Kang, E.L., 2010. Fixed rank filtering for spatio-temporal data.Journal of Computational and Graphical Statistics 19, 724e745.

De Cesare, L., Myers, D.E., Posa, D., 2001. Product-sum covariance for space-timemodeling: an environmental application. Environmetrics 12, 11e23.

De Iaco, S., Myers, D.E., Posa, D., 2001. Space-time analysis using a general product-sum model. Statistics & Probability Letters 52, 21e28.

De Iaco, S., Myers, D.E., Posa, D., 2002. Space-time variograms and a functional formfor total air pollution measurements. Computational Statistics & Data Analysis41, 311e328.

Fig. 6. a: MCK validation using the simulated data at regularly spaced data points, simulated from spatiotemporal process with large spatiotemporal trends and nonstationarycovariance function. Shown is 10% random subset of the 9 datasets. b: MCK validation using simulated data at irregularly spaced data points, simulated from spatiotemporal processwith large spatiotemporal trends and nonstationary covariance function. Shown is 10% random subset of the 9 datasets.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e6968

Author's personal copy

Diggle, P., Lophaven, S., 2006. Bayesian geostatistical design. Scandinavian Journalof Statistics 33, 53e64.

Fuentes, M., Chen, L., Davis, J.M., Lackmann, G.M., 2005. Modeling and predictingcomplex space-time structures and patterns of coastal wind fields. Environ-metrics 16, 449e464.

Haas, T.C., 1995. Local prediction of a spatio-temporal process with an applicationto wet sulfate deposition. Journal of the American Statistical Association 90,1189e1199.

Hartman, L., Hossjer, O., 2008. Fast kriging of large data sets with Gaussian Markovrandom fields. Computational Statistics & Data Analysis 52, 2331e2349.

Kumar, N., 2012. Uncertainty in the relationship between criteria pollutants and lowbirth weight in Chicago. Atmospheric Environment 49, 171e179.

Kumar, N., Chu, A., Foster, A., 2007. An empirical relationship between PM2.5 andaerosol optical depth in Delhi metropolitan. Atmospheric Environment 41,4492e4503.

Kumar, N., Chu, A.D., Foster, A.D., Peters, T., Willis, R., 2011a. Satellite remote sensingfor developing time and space resolved estimates of ambient particulate inCleveland, OH. Aerosol Science and Technology 45, 1090e1108.

Kumar, N., Liang, D., Linderman, M., Chen, J., 2011b. An Optimal Spatial Sampling forDemographic and Health Surveys. SSRN Working Paper e 1808974.

Levy, R.C., Remer, L.A., Mattoo, S., Vermote, E.F., Kaufman, Y.J., 2007. Second-gen-eration operational algorithm: retrieval of aerosol properties over land from

inversion of moderate resolution imaging spectroradiometer spectral reflec-tance. Journal of Geophysical Research-atmospheres 112, D13211.

Rue, H., 2001. Fast sampling of Gaussian Markov random fields. Journal of the RoyalStatistical Society Series B-Statistical Methodology 63, 325e338.

Rue, H., Held, L., 2006. Gaussian Markov Random Fields Theory and Applications.Chapman & Hall/CRC, Boca Raton.

Rue, H., Martino, S., Chopin, N., 2009. Approximate Bayesian inference for latentGaussian models by using integrated nested Laplace approximations. Journal ofthe Royal Statistical Society Series B-Statistical Methodology 71, 319e392.

Rue, H., Tjelmeland, H., 2002. Fitting Gaussian Markov random fields to Gaussianfields. Scandinavian Journal of Statistics 29, 31e49.

Sahu, S.K., Gelfand, A.E., Holland, D.M., 2006. Spatio-temporal modeling of fineparticulate matter. Journal of Agricultural Biological and Environmental Sta-tistics 11, 61e86.

Schmid, V., Held, L., 2004. Bayesian extrapolation of space-time trends in cancerregistry data. Biometrics 60, 1034e1042.

Stevens, D.L., Olsen, A.R., 2004. Spatially balanced sampling of natural resources.Journal of the American Statistical Association 99, 262e278.

Zhu, Z.Y., Stein, M.L., 2006. Spatial sampling design for prediction with estimated pa-rameters. Journal of Agricultural Biological and Environmental Statistics 11, 24e44.

Zimmerman, D.L., 2006. Optimal network design for spatial prediction, covarianceparameter estimation, and empirical prediction. Environmetrics 17, 635e652.

D. Liang, N. Kumar / Atmospheric Environment 72 (2013) 60e69 69