
Functional Convolution Models

Maria Asencio

Department of Statistical Science

Cornell University

Ithaca, NY 14850

Giles Hooker

Department of Biological Statistics and

Computational Biology

Cornell University

Ithaca, NY 14850

H. Oliver Gao

Department of Civil and Environmental Engineering

Cornell University

Ithaca, NY 14850

Acknowledgements

Giles Hooker was partially supported by NSF grants DEB-0813743, CMG-0934735 and DMS-1053252.

Functional Convolution Models

Abstract

This paper considers the application of functional data analysis methods to modeling particulate

matter trajectories from dynamometer experiments. In particular the functional convolution model

is introduced as a restriction of the functional historical linear model that allows for functional data

of different lengths to be used. We present a penalized ordinary least squares estimator for the

model and a novel bootstrap procedure to provide pointwise confidence regions for the estimated

convolution functions. The model is illustrated on the California E55/59 study of diesel truck

emissions.

Keywords: Convolution Model, Functional Data Analysis, Distributed Lag Model, Particulate

Matter, Emissions Modeling

1. INTRODUCTION

This paper proposes the functional convolution model as an approach to modeling the concur-

rent dependence of functional data at different lengths. This work is motivated by the California

E55/59 study (Clark, Gautam, Wayne, W.Lyons, Thompson, and Zielinska, 2007) of vehicle par-

ticulate matter (PM) emissions. In this study, particulate matter responses to driving conditions

in medium and heavy duty trucks are examined via dynamometer experiments. Trucks are placed

on a dynamometer – a series of rollers that allow the truck wheels to turn while keeping the truck

stationary – and are driven through a pre-set series of driving cycles that specify speeds to be main-

tained for given times. These driving cycles are chosen to mimic real-world road conditions, from

highways to suburban traffic. An emissions analyzer is attached to the truck tailpipe and records

second-by-second counts of particulate matter: small solid particles less than 2.5 micrometers in

diameter that have been implicated in serious respiratory illness.

Existing models for particulate matter emission have tended to be based on either (i) a prediction

of average or total PM from average speed, average acceleration and other quantities accumulated

over the driving cycle, (ii) a regression based only on the instantaneous driving conditions or (iii)

complex models that attempt to parameterize all aspects of the production and transportation of


particles in the engine and exhaust system in terms of physical processes (see Ajtay, Weilenmann,

and Soltic, 2005; Capiello, Chabini, Nam, Lue, and Zeid, 2002; Ahn, Rakha, Trai, and Aerde,

2002). Our goal lies between these approaches in providing a more accurate model for PM emissions

without requiring knowledge of kinetic parameters within a dynamical system, while still accounting

for mixing during transportation and the serial dependence of measurements.

The goal of the study is to develop a model to predict instantaneous PM emissions from driving

behavior (i.e. speed and acceleration) that will be applicable across numerous driving cycles. From

Figure 1, it is apparent that all of speed, acceleration and PM can be regarded as following smooth

dynamics and thus can be approached through the machinery of functional data analysis (Ramsay

and Silverman, 2005; Ramsay et al., 2009). However, the direct application of functional data

analysis techniques is hampered by the lack of a standard time interval for the sampling domain –

experimental runs can vary between 370 and 1190 seconds – nor is it appropriate in this context to

register the responses to a common time scale as this will disrupt real-time constants such as the

transport time from the engine to the emissions analyzer. Instead, we assume stationarity in the

dependence of PM on the past history of velocity and acceleration.

Specifically, our model can be represented as a functional response model. We assume a response

yi(t) (in this case PM) with covariates xij(t) (velocity and acceleration) measured on the interval

[0 Ti] for observations i = 1, . . . , n. These are related via the functional convolution model:

yi(t) = β0 + ∑_{j=1}^{p} ∫_0^{αj} βj(u) xij(t − u) du + εi(t),   i = 1, . . . , n.   (1)

Here we take yi(t) to respond to the past αj time units of the xij via a functional linear model (Ramsay

and Silverman, 2005). Note that while the yi(t) and the xij(t) must share the same time domain,

this domain need not be the same across different observations. We have parameterized the model

so that βj(0) represents the instantaneous effect of xij(t) on the response at time t. In the context

of emissions experiments, the convolution in the integral (1) can be thought of as approximating the

mixing of particles from different time points during transit through the exhaust system. This

model is a restriction of the historical functional linear model presented in Malfait and Ramsay

(2003) in which we require the effects β(u) = (β0, β1(u), . . . , βp(u)) to remain constant over time. It

can also be thought of as the functional extension of distributed lag models in time series (Greene,


2008). Similar models have been used to estimate the hemodynamic response in functional MRI

imaging (Genovese, 2000; Zhang et al., 2007; Zhang and Yu, 2008).

In practice, the yi(t) and xij(t) are measured at discrete time points ti = (ti1, . . . , tiNi) providing

data yi = (yi(ti1), . . . , yi(tiNi)), and xij = (xij(ti1), . . . , xij(tiNi)). These measurements need not

be taken at concurrent times and our methods will work when pre-smoothed estimates are obtained

for functional data. However, in our example all observations are taken at regular second-by-second

intervals and we will use this regularity to avoid the need to pre-smooth the observed data.
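On this one-second grid, the model's prediction step reduces to a discrete convolution of each covariate with its coefficient function. The following sketch illustrates the idea (function and variable names are our own, not from the paper; times earlier than the first observation are implicitly treated as zero):

```python
import numpy as np

def predict_pm(x_list, beta_list, beta0):
    """Discrete version of the functional convolution model (1):
    y_hat(t) = beta0 + sum_j sum_{l=0}^{alpha_j} beta_j(l) * x_j(t - l).
    x_list:    list of covariate series (e.g. velocity, acceleration), length N
    beta_list: list of lag-coefficient vectors beta_j evaluated at lags 0..alpha_j
    """
    N = len(x_list[0])
    y_hat = np.full(N, beta0, dtype=float)
    for x, beta in zip(x_list, beta_list):
        # 'full' convolution truncated to N keeps beta[0] as the
        # instantaneous effect of x(t) on y(t)
        y_hat += np.convolve(x, beta)[:N]
    return y_hat
```

Here beta[0] plays the role of βj(0), the instantaneous effect described above.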

As a distinction from distributed lag models, we assume the dependence on past covariates as

given by β(u) to be smooth. We therefore propose a penalized ordinary least squares estimate as

an initial estimate of the parameters β(u) in (1). In addition, the lengths α = (α1, . . . , αp) of the

convolutions and smoothing parameters are selected by leave-one-curve-out cross validation. This

cross validation must be conducted appropriately to balance the differing lengths of observations

per experimental run.

In addition to estimation we develop confidence intervals for model parameters through both

a delta-method and via bootstrapping techniques. Because of the variable length of observations,

both these techniques require the εi(t) to be stationary and generated from an approximately

Gaussian process with an autocovariance structure

cov(εi(s), εi(t)) = R(|t− s|) (2)

that can be estimated directly from the fitted residuals. Given this specification of covariance,

delta-method confidence intervals can be calculated directly. By way of contrast, bootstrapping

methods must be carried out in a way that accounts for the dependence in the εi(t). This is usually

obtained by bootstrapping in blocks (see Lahiri, 2003). In the context of functional data, however,

block bootstrapping destroys the continuity and smoothness of the εi(t) leading to finite-sample

distortion of the estimated variance. Instead, a transformation of block-bootstrapped residuals is

proposed that recovers the structure of (2) while retaining some of the non-parametric flavor of the

block-bootstrap.

In the remainder of the paper Section 2 introduces the estimation of the functional convolution

model, Section 3 develops our estimate for auto-covariance and delta-method confidence intervals


and Section 4 introduces our bootstrapping procedures and Section 5 demonstrates the success of

our method on emissions data.

2. FUNCTIONAL CONVOLUTION MODELS AND ESTIMATION

In this section we specify our estimation of parameters in the model (1). The βj(u) are estimated

via a penalized ordinary least squares (OLS) estimate. Conditions to ensure the identifiability of

OLS estimates in this context are not easy to establish; a discussion of them is given in Appendix A.

However, a smoothing penalty is applied to these data in order to improve the numerical stability

of the least squares estimate and as a regularization device. We then apply leave-one-curve-out

cross validation to estimate the lengths of convolution required along with smoothing parameters.

2.1 Ordinary Least Squares

We propose an ordinary least squares estimate for the model given in (1), setting β0 = ȳ for convenience and minimizing

SSE(β) = ∑_{i=1}^{n} ∫_{α∗}^{Ti} [ yi(t) − ∑_{j=1}^{p} ∫_0^{αj} βj(u) xij(t − u) du ]² dt

for α∗ = max(α1, . . . , αp). Formally, estimates for the βj can be written as the solution of a

variational problem; conditions under which these estimates are uniquely defined are studied in

Appendix A. However, it will be useful to have a continuous representation of the βj(t) which also

admits smoothing; we will thus approximate the βj via the basis expansion:

βj(t) = ∑_{k=1}^{Kj} φjk(t) cjk = φj(t) cj .

In the case of the emissions data we have chosen the φj(t) to be a fourth-order B-spline basis on the interval [0, αj] with knots every second.
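For illustration, such a basis can be evaluated directly via the Cox–de Boor recursion; in practice an FDA library would be used, and the function below is only a self-contained sketch:

```python
import numpy as np

def bspline_basis(t, alpha, order=4):
    """Evaluate a clamped B-spline basis of the given order on [0, alpha]
    with knots every 1 second, at the points t.
    Returns an array of shape (len(t), n_basis)."""
    # full knot sequence with repeated boundary knots
    knots = np.concatenate([np.zeros(order - 1),
                            np.arange(0, alpha + 1, dtype=float),
                            np.full(order - 1, float(alpha))])
    n_basis = len(knots) - order
    t = np.asarray(t, dtype=float)
    # order-1 (piecewise constant) splines
    B = np.zeros((len(t), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # close the right endpoint t == alpha
    B[t == knots[-1], np.searchsorted(knots, knots[-1]) - 1] = 1.0
    # Cox-de Boor recursion up to the requested order
    for k in range(2, order + 1):
        B_new = np.zeros((len(t), len(knots) - k))
        for i in range(len(knots) - k):
            left = ((t - knots[i]) / (knots[i + k - 1] - knots[i])
                    if knots[i + k - 1] > knots[i] else 0.0)
            right = ((knots[i + k] - t) / (knots[i + k] - knots[i + 1])
                     if knots[i + k] > knots[i + 1] else 0.0)
            B_new[:, i] = left * B[:, i] + right * B[:, i + 1]
        B = B_new
    return B[:, :n_basis]
```

With knots every second on [0, α] and order 4 this yields α + 3 basis functions, which form a partition of unity on the interval.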

This representation allows us to introduce a smoothing penalty in order to improve the regularity

of our estimates. This penalty takes the form

PEN(β, λ) = ∑_{j=1}^{p} λj ∫_0^{αj} [Lj βj(u)]² du

where each Lj is a linear differential operator which we take to be Lj βj(u) = d²βj(u)/du².

Our estimates now minimize the classical combined criterion SSE(β) + PEN(β, λ). Specific formulae for the minimizing coefficients cjk can now be calculated directly. Letting

βj(t) = ∑_{k=1}^{Kj} φjk(t) cjk

we can express the solution c∗ = [c1ᵀ · · · cpᵀ]ᵀ as

c∗ = (Z + P)⁻¹ Y

where

Z = [ Z11 · · · Z1p
       ⋮    ⋱    ⋮
      Zp1 · · · Zpp ]

is a block matrix with blocks

Zkl = ∑_{i=1}^{n} ∫_{α∗}^{Ti} [ ∫_0^{αk} xik(t − u) φk(u) du ] [ ∫_0^{αl} xil(t − u) φl(u) du ]ᵀ dt

and

Y = [ ∑_{i=1}^{n} ∫_{α∗}^{Ti} yi(t) [ ∫_0^{α1} xi1(t − u) φ1(u) du ] dt
       ⋮
      ∑_{i=1}^{n} ∫_{α∗}^{Ti} yi(t) [ ∫_0^{αp} xip(t − u) φp(u) du ] dt ]

and the composite penalty matrix is the block-diagonal matrix

P = diag( 0, λ1 R1, . . . , λp Rp )

with penalty matrices for each cj given by the semi-norm

[Rj]kl = ∫_0^{αj} Lj φjk(u) Lj φjl(u) du.

An important note here is that while SSE(β) is given via integrals, as are the formulae above,

in our application all of PM, velocity and acceleration are measured on a second-by-second basis.

We can thus replace the integrals with summations:

SSE(β) = ∑_{i=1}^{n} ∑_{t=α∗}^{Ni} [ yi(t) − ∑_{j=1}^{p} ∑_{l=0}^{αj} βj(l) xij(t − l) ]²

where Ni is the number of measurements for the functional observation (yi(t), xi(t)). The use

of integer-valued observation points in xik(t − l) is justified here by the one-second measurement

interval. The integrals in the formulae above can also be adjusted accordingly, yielding a formulation

in terms of penalized linear regression; these calculations are detailed in Appendix B. This avoids

the need to pre-smooth either the yi(t) or the xij(t), but is only feasible when both are sampled on

the same, regularly-spaced, time points. For more sparsely observed cases, a representation

in terms of an explicit smooth could also be instantiated.
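Under this second-by-second simplification, and representing each βj directly by its lag values with a squared second-difference penalty standing in for ∫(Ljβj)² du, the penalized fit is a ridge-type regression on lagged covariates. A minimal sketch (function names and interface are our own):

```python
import numpy as np

def fit_convolution(y_list, x_lists, alpha, lam):
    """Penalized least squares for the discrete convolution model.
    y_list:  list of response series (one per experimental run)
    x_lists: list of lists; x_lists[i][j] is covariate j for run i
    alpha:   common convolution length (lags 0..alpha)
    lam:     smoothing parameter on squared second differences
    Returns beta0 and the (p, alpha+1) array of lag coefficients."""
    p = len(x_lists[0])
    L = alpha + 1                       # lags 0..alpha per covariate
    rows, targets = [], []
    for y, xs in zip(y_list, x_lists):
        for t in range(alpha, len(y)):
            # design row: [x_1(t), ..., x_1(t-alpha), x_2(t), ...]
            row = np.concatenate([xs[j][t - alpha:t + 1][::-1] for j in range(p)])
            rows.append(row)
            targets.append(y[t])
    Z = np.asarray(rows); Y = np.asarray(targets)
    beta0 = Y.mean()                    # intercept fixed at the mean response
    D = np.diff(np.eye(L), n=2, axis=0)           # second-difference operator
    R = D.T @ D                                   # penalty for one coefficient block
    P = np.kron(np.eye(p), lam * R)               # block-diagonal composite penalty
    beta = np.linalg.solve(Z.T @ Z + P, Z.T @ (Y - beta0))
    return beta0, beta.reshape(p, L)
```

With lam = 0 this reduces to plain OLS on the lagged design, mirroring the un-penalized formulation above.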

2.2 Cross Validation

While parameters (β0, β1(u), . . . , βp(u)) can be estimated via a penalized OLS approach, we

turn to cross validation to specify the convolution lengths α and smoothing parameters λ. In

particular, we employ leave-one-curve-out cross validation as a score for both λ and α:

CV(λ, α) = ∑_{i=1}^{n} ∫_{α∗}^{Ti} [ yi(t) − β^{−i}_{0,λα} − ∑_{j=1}^{p} ∫_0^{αj} β^{−i}_{j,λα}(u) xij(t − u) du ]² dt,

where (β^{−i}_{0,λα}, β^{−i}_{1,λα}, . . . , β^{−i}_{p,λα}) are the parameter estimates obtained at λ and α after removing the

ith observation. This approach has been used for estimating the mean of a collection of functional

random variables (Rice and Silverman, 1991) and for functional response models (Ramsay and

Silverman, 2005). Explicit formulae to calculate CV (λ, α) have been obtained in Golub and van

Loan (1996) and Hoover et al. (1998).

Cross validation over numerous parameters is problematic in requiring a search of a high di-

mensional space; moreover multiple local minima in CV (λ, α) complicate this search (see examples

in Ramsay et al., 2009). We have therefore simplified the problem in two ways:

1. We have set all parameters to be equal: α1 = · · · = αp = α and λ1 = · · · = λp = λ. The former of these

choices can be motivated physically by the assumption that the convolution approximately

represents the mixing of the instantaneous production of PM in the engine and we may

therefore expect that all engine parameters affect PM output over the same time interval.

The latter is commonly employed where more than one functional parameter is estimated;

see examples in Ramsay et al. (2009).


2. A one-step estimate for α and λ is employed, first estimating α at λ = 0 and then choosing

the smoothing parameters:

α = argmin_α CV(0, α)

λ = argmin_λ CV(λ, α).

An iteration of this scheme will lead to a local minimum over the joint space of α and λ.

However, the one-step approximation taken in this order can be justified if the model is

identifiable at λ = 0. In practice, a small value of λ is helpful in stabilizing the numerical

estimates and we begin from there. The estimate of α is then an un-penalized model selection

problem after which λ can be used to regularize the estimators if needed.

Following this one-step procedure, we obtain estimates (β0,αλ, β1,αλ(u), . . . , βp,αλ(u), α, λ). In

providing confidence intervals below, we will keep α and λ fixed.
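The one-step selection scheme can be sketched as follows, assuming user-supplied fitting and prediction routines (`fit` and `predict` are hypothetical interfaces, not from the paper):

```python
import numpy as np

def loco_cv(curves, fit, predict, alphas, lams):
    """Leave-one-curve-out CV for (alpha, lambda), using the one-step
    scheme: pick alpha at lambda = 0, then pick lambda at that alpha.
    fit(train, alpha, lam) returns a fitted model;
    predict(model, curve, alpha) returns (observed, predicted) arrays."""
    def score(alpha, lam):
        sse = 0.0
        for i in range(len(curves)):
            train = curves[:i] + curves[i + 1:]   # drop the ith curve
            model = fit(train, alpha, lam)
            y, y_hat = predict(model, curves[i], alpha)
            sse += np.sum((y - y_hat) ** 2)
        return sse
    best_alpha = min(alphas, key=lambda a: score(a, 0.0))
    best_lam = min(lams, key=lambda l: score(best_alpha, l))
    return best_alpha, best_lam
```

Because whole curves are held out, runs of different lengths contribute to the score in proportion to their own number of predicted time points.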

3. VARIANCE ESTIMATION

As was done for the mean dependence of yi(t) on xi(t), we can account for the unequal lengths

of functional observations through the assumption of stationarity in the deviation of yi(t) from its

expectation. In particular, we assume the non-parametric auto-covariance structure given in (2).

We can estimate R(u) via a method-of-moments estimator. Obtaining residuals

εi(t) = yi(t) − β0,λα − ∑_{j=1}^{p} ∫_0^{αj} βj,λα(u) xij(t − u) du

our estimate is simply

R(u) = { (1/n) ∑_{i=1}^{n} [1/(Ti − u)] ∫_0^{Ti−u} εi(t) εi(t + u) dt,   u < h
       { 0,                                                              u ≥ h

Here we have chosen to threshold R(u) for values u ≥ h for some h < min(T1, . . . , Tn) at a value

chosen manually by examining the empirical autocovariance of the residuals (see Figure 2).
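The thresholded method-of-moments estimator has a direct discrete analogue; a sketch for residual curves of unequal lengths (names hypothetical):

```python
import numpy as np

def autocov(residuals, h):
    """Method-of-moments estimate of R(u) pooled over residual curves
    of unequal lengths, thresholded to zero at lags u >= h.
    residuals: list of 1-D arrays of fitted residuals, one per run."""
    R = np.zeros(h)
    for u in range(h):
        # per-curve average of eps_i(t) * eps_i(t + u), then average over curves
        vals = [np.mean(e[:len(e) - u] * e[u:]) for e in residuals if len(e) > u]
        R[u] = np.mean(vals)
    return R
```

Lags at and beyond h are simply dropped, matching the manual threshold described above.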

The estimation of R(u) now allows for a direct delta-method variance calculation:

Cov(βj(u), βl(v)) = Φj(u) (Z + P)⁻¹ C (Z + P)⁻¹ Φl(v)ᵀ

where

C = [ C11 · · · C1p
       ⋮    ⋱    ⋮
      Cp1 · · · Cpp ]

is a block matrix with blocks

Ckl = ∑_{i=1}^{n} ∫∫ [ ∫_0^{αk} xik(t − u) φk(u) du ] R(|t − s|) [ ∫_0^{αl} xil(s − u) φl(u) du ]ᵀ dt ds

and Φj(u) is a ∑_j Kj vector with the values φjk(u) in the entries corresponding to cj and zeros elsewhere. For regular

second-by-second samples, these calculations can be approximated; detailed formulae have been

given in Appendix B.

Once Cov(βj(u), βl(v)) have been calculated, pointwise confidence bands can be obtained for

βj(u) from

βj(u) ± 2 √Cov(βj(u), βj(u)).

These can be compared to the bootstrap-based confidence bands as shown in Figure 5.
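Given the coefficient covariance, the pointwise bands follow immediately from the basis representation; a small sketch for a single coefficient block (names hypothetical):

```python
import numpy as np

def pointwise_bands(Phi, cov_c, c):
    """Pointwise +/- 2 SE bands for beta_j(u) = Phi(u) @ c_j.
    Phi:   (n_points, K) basis evaluations phi_jk(u)
    cov_c: (K, K) covariance of the coefficient block c_j
    c:     (K,) estimated coefficients"""
    beta = Phi @ c
    # var beta_j(u) = Phi(u) cov_c Phi(u)^T, evaluated row by row
    se = np.sqrt(np.einsum('uk,kl,ul->u', Phi, cov_c, Phi))
    return beta - 2 * se, beta + 2 * se
```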

4. BOOTSTRAP PROCEDURES

The delta-method confidence intervals proposed above rely on assumptions about the stationar-

ity and near-Gaussianity of the residual processes εi(t) in order to be valid. As a means of providing

more robust intervals, we develop a residual bootstrap. In contrast to classical functional response

models, we cannot simply re-sample the estimated residual processes εi(t) due to the different

lengths of observations. Instead, we make use of the assumed stationarity structure in the εi by

performing a block bootstrap.

There is a considerable literature on bootstrap methods for dependent data (see Lahiri, 2003;

Hardle et al., 2003). Here we use a modified non-overlapping block bootstrap. The block bootstrap

re-samples sequences of data while retaining the within-sequence structure; we thus break each εi into segments of length h:

εik(s) = εi(s + (k − 1)h),   0 ≤ s < h,   for k = 1, . . . , ⌈Ti/h⌉.

These are re-sampled with replacement across both i and k, and new residual processes are constructed from the resampled blocks:

ε∗i(t) = εσ(i,k)(t − (k − 1)h)   for (k − 1)h ≤ t < kh,

where σ(i, k) indicates the resampled index assigned to block k of curve i. In a residual block bootstrap, the ε∗i(t) would be added to predicted values to create a new collection of functional responses y∗i(t), i = 1, . . . , n, from which parameters could be re-estimated.

In functional data analysis, however, the ε∗i(t) violate the smoothness assumed for the εi(t) as

illustrated in Figure 4. This is a general problem for the block bootstrap and has been dealt with

in a number of ways; either by trying to match blocks at their ends (Carlstein et al., 1998) or

by down-weighting the block ends (Paparoditis and Politis, 2001). In this paper we propose an

alternative based on transforming the ε∗i (t) to have the same covariance structure as was assumed

for the original process.

Specifically, the residual processes are assumed to have covariance structure (2), while the block-bootstrapped version has the block structure

cov(ε∗i(t), ε∗i(s)) = { R(|t − s|)   if ∃k : (k − 1)h ≤ t < kh and (k − 1)h ≤ s < kh
                      { 0            otherwise

and we seek transformations Ki so that Ki[ε∗i (t)] has covariance (2). This is formally given in terms

of functional operators. However, when working with regular discrete observations, we can readily

instantiate the matrices

[Ci]jk = cov(εi(tij), εi(tik)) = R(|tij − tik|)

and

[C∗i]jk = cov(ε∗i(tij), ε∗i(tik))

then the discrete-sample realization of Ki is given by

Mi = Ci^{1/2} [C∗i]^{−1/2}

where Ci^{1/2} has been calculated from a singular value decomposition. We now use the transformed block-bootstrap residuals ε̃∗i = Mi ε∗i and obtain

y∗i(t) = ŷi(t) + ε̃∗i(t)

and use these to obtain an estimate β∗ as above, keeping the parameters chosen by cross-validation

fixed. This is repeated B times and the collection {β∗b }Bb=1 allows us to calculate biases as well as

variances from the bootstrap sample. Pointwise confidence intervals are then obtained from the

mean and standard deviation of the bootstrap samples and can be compared with the delta-method

intervals above. An estimate of the distortion caused by blocking is given in the lower-left panel of

Figure 4 where we have re-estimated the auto-covariance R(u) from bootstrap samples, resulting

in noticeably lower estimates.

An important detail is the way in which end-blocks of length less than h are treated. In the

bootstrap scheme above, these were given equal sampling weight with all the other blocks and a

sequence ε∗i was created that may be longer than Ti. The structure for Mi was updated to account

for potentially smaller blocks appearing and the residuals were transformed to Miε∗i before being

truncated at the right end. This order of operations was chosen to minimize the effect of the choice

to truncate from left or right ends.
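The overall resampling scheme — pooling non-overlapping blocks, rebuilding curves, truncating at the right end, and applying the covariance-matching transformation Mi — can be sketched as follows, assuming regular sampling and an estimated autocovariance R (all names are our own; eigenvalues are clipped for numerical safety):

```python
import numpy as np

def transformed_block_bootstrap(residuals, R, h, rng):
    """One bootstrap draw of residual curves: non-overlapping blocks of
    length h are resampled across all curves, then each rebuilt curve is
    linearly transformed so its covariance matches R(|t - s|).
    R: estimated autocovariance with R[u] treated as 0 for u >= len(R)."""
    # pool all blocks (end-blocks shorter than h included with equal weight)
    blocks = [e[k:k + h] for e in residuals for k in range(0, len(e), h)]
    def cov_matrix(T):
        lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
        return np.where(lags < len(R), R[np.minimum(lags, len(R) - 1)], 0.0)
    def matrix_sqrt(C, inv=False):
        w, V = np.linalg.eigh(C)
        w = np.clip(w, 1e-10, None)          # guard against non-PSD estimates
        d = w ** (-0.5 if inv else 0.5)
        return (V * d) @ V.T
    out = []
    for e in residuals:
        Ti = len(e)
        parts = []
        while sum(len(p) for p in parts) < Ti:
            parts.append(blocks[rng.integers(len(blocks))])
        estar = np.concatenate(parts)[:Ti]   # truncate at the right end
        # target covariance C_i and block-diagonal bootstrap covariance C*_i
        C = cov_matrix(Ti)
        Cstar = np.zeros((Ti, Ti))
        pos = 0
        for p in parts:
            m = min(len(p), Ti - pos)
            Cstar[pos:pos + m, pos:pos + m] = cov_matrix(m)
            pos += m
            if pos >= Ti:
                break
        M = matrix_sqrt(C) @ matrix_sqrt(Cstar, inv=True)
        out.append(M @ estar)
    return out
```

Adding each transformed curve to the fitted values and re-estimating β, repeated B times, yields the bootstrap distribution described above.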

We note that our transformation of the block bootstrap falls somewhere between a parametric

bootstrap – in which new ε∗i would be directly sampled from a Gaussian process distribution with

variance Ci – and a standard block bootstrap. The former case will result exactly in delta-method

estimates assuming infinite bootstrap samples. We speculate that our transformed block bootstrap

is dependent on the assumed covariance structure (2) but that it will provide robustness to violations

of Gaussian assumptions and faster convergence to true sampling distributions when the covariance

structure is accurate. We also note that we do not necessarily require the size of the blocks to

increase with sample size for these properties to hold. The asymptotic properties of this bootstrap

procedure are the subject of ongoing research.

5. THE E55/59 STUDY

We present a case study of data taken from the E-55/59 program in southern California (Clark

et al., 2007). In this program, chassis dynamometer measurements were gathered with the objective

of developing a better emissions inventory for medium and heavy duty trucks. We specifically focused on


modeling PM, which has been associated with increased risk of respiratory and cardio-vascular

disease in an exposed population (Laden et al., 2000; McCreanor et al., 2007; Schwartz et al.,

2002). Furthermore, a predictive model of PM emissions is an important component of developing

new regulatory frameworks and planning transportation networks. We make a case study of the functional convolution model as applied to emissions from a single truck across a variety of driving cycles.

In this experiment, trucks are placed in the chassis dynamometer and their tailpipe connected

to an emission analyzer which separates and records the emissions. A velocity pattern that char-

acterizes a specific driving behavior is applied to the truck. Measuring emissions in this manner

allows the researcher to apply a velocity pattern multiple times with high precision and in a

closed environment to reduce between-experiment variability. However, the recorded emissions are

not the direct effect of the instantaneous change of velocity since particles can experience some

delay during transportation through the tailpipe to the analyzer. This is because the particles interact

with other factors such as temperature, other emission particles and the air-flow of the equipment.

The purpose of this case study is to demonstrate how the convolution model can provide accurate

predictions while accounting for the smoothness of the observed signals and the delay and mixing

due to particle transport. In addition, we want our model to be cycle-independent: predictive of

PM trajectories given any velocity pattern even when this was not used at the time when the model

parameters were estimated.

Our sample data comes from four different velocity patterns with a weight load of 3100 applied

to a 1992 Ford medium-heavy duty truck. The four different velocity patterns can be related to

general traffic situations that the trucks experience in real-world driving scenarios. For example,

we can see that the truck accelerates and then maintains a speed of around 60 miles per hour in

HHDDT S velocity pattern shown in Figure 1. In contrast, the truck slows down, speeds up or

stops completely in different time intervals and the highest velocity value that it experiences is 30

miles per hour in the MHDTLO velocity pattern. Given these characteristics, the first velocity

pattern described approximates driving behavior on highways and the second in suburban streets.

The trajectories of PM as result of the four velocity patterns are also shown in Figure 1. The

nonlinearity of these curves makes it difficult to easily observe their relation with the velocity

patterns. In some intervals, we can see that the relative fast increase of velocity for a period of

12

time manifests as high points in the PM. However, this behavior not as evident in other intervals

of the curves. Figure 2 shows the cross correlation curves of the four sample driving cycles where

time dependence on the order of 20 seconds is evident.

Cross-validation resulted in values of λ = 0.1 and α = 20. Figure 3 demonstrates the cross-

validation surface where we observe that the choice of α does not depend on λ, providing further

justification for our one-step method.

Figure 4 demonstrates the effect of the transformed bootstrap. The top panel gives a sample

block-bootstrap residual before and after transformation. The bottom two panels plot bootstrap

estimates of the autocovariance of the residual functions before and after transformation along with

the sample autocovariance function where the distortion due to a short block is evident.

Figure 5 provides the autocovariance of residuals and the estimated convolution functions respectively. Here we observe an immediate increase in PM resulting from high velocity, while

acceleration appears to have a negative effect. We speculate that this is due to correlation between

velocity and acceleration across driving cycles. Delta-method and bootstrap confidence intervals

demonstrate high agreement across samples. Figure 6 demonstrates the cross-validated

performance of our estimates. Each panel plots the observed PM trajectory along with the tra-

jectory predicted from the remaining data. Here we observe that the essential features of the PM

trajectory are replicated, demonstrating good generalization ability.

6. CONCLUSIONS

This paper has presented a new model for functional responses when curves are observed over intervals of different lengths. This model makes use of an assumption of stationarity in the observed responses

to provide a predictive relationship between the observed values of the response and historical

values of a covariate. In addition to providing a modeling methodology, we have established a

residual block bootstrap procedure that is applicable for these data and demonstrated that it

reduces the distortion due to discontinuities in residual functions when block bootstrapping is

applied naively. We have applied our model to a study of PM emissions from truck exhaust where

we have demonstrated its successful generalization properties.

A number of further areas remain to be investigated. The numerical biases associated with


the use of a basis expansion can be expected to be removed for finer knot sequences, so long as

in-fill asymptotics can be assumed for the sampling times of the response. Similarly, properties of

block-bootstrap methods have not, to our knowledge, been investigated in the context of functional

data. More interestingly, we have selected the length of convolution α via cross-validation. Since

this represents a model selection problem parameterized by a continuous parameter we expect

significant theoretical development to be required to study the properties of this choice. In the

context of the transport emissions study, the inclusion of multiple trucks will require the extension

of these models to a random-effects framework; some exploratory data analysis suggests there is

also considerable between-truck heteroscedasticity, representing additional modeling challenges.

REFERENCES

Ahn, K., H. Rakha, A. Trai, and M. Aerde (2002). Estimating vehicle fuel consumption and

emissions based on instantaneous speed and acceleration levels. Journal of Transportation Engi-

neering 128, 189–190.

Ajtay, D., M. Weilenmann, and P. Soltic (2005). Towards accurate instantaneous emission models.

Atmospheric Environment 39, 2443–2449.

Borelli, R. L. and C. S. Coleman (2004). Differential Equations: A Modeling Perspective. New

York: John Wiley & Sons.

Capiello, A., I. Chabini, E. Nam, A. Lue, and M. Zeid (2002). A statistical model of vehicle

emissions and fuel consumption. Ford-MIT Alliance.

Carlstein, E., K.-A. Do, P. Hall, T. Hesterberg, and H. R. Kunsch (1998). Matched-block bootstrap

for dependent data. Bernoulli 4, 305–328.

Clark, N. N., M. Gautam, W. S. Wayne, D. W.Lyons, G. Thompson, and B. Zielinska (2007).

Heavy-duty chassis dynamometer testing for emissions inventory, air quality modeling, source

apportionment and air toxics emissions inventory: E55/59 all phases. Technical Report E55/59,

Coordinating Research Council.


Genovese, C. (2000). A Bayesian time-course model for functional magnetic resonance imaging

data. Journal of the American Statistical Association 95, 691–703.

Golub, G. H. and C. F. van Loan (1996). Matrix Computations (3rd ed.). Baltimore: Johns Hopkins

University Press.

Greene, W. H. (2008). Econometric Analysis. New Jersey: Prentice Hall.

Hardle, W., J. Horowitz, and J.-P. Kreiss (2003). Bootstrap methods for time series. International

Statistical Review 71, 435–459.

Hoover, D. R., J. A. Rice, C. O. Wu, and L. P. Yang (1998). Nonparametric smoothing estimates

of time-varying coefficient models with longitudinal data. Biometrika 85, 809–822.

Laden, F., J. Schwartz, D. W. Dockery, and L. M. Neas (2000). Association of fine particulate matter from different sources with daily mortality in six U.S. cities. Environmental Health

Perspectives 108, 941–947.

Lahiri, S. N. (2003). Resampling Methods for Dependent Data. New York: Springer.

Malfait, N. and J. O. Ramsay (2003). The historical functional linear model. Canadian Journal of

Statistics 31, 115–128.

McCreanor, J., P. Cullinam, and M. J. Nieuwenhuijsen (2007). Respiratory effects of exposure to

diesel traffic in persons with asthma. New England Journal of Medicine 357, 2348–2358.

Paparoditis, E. and D. N. Politis (2001). Tapered block bootstrap. Biometrika 88, 1105–1109.

Ramsay, J. O., G. Hooker, and S. Graves (2009). Functional Data Analysis in R and Matlab. New

York: Springer.

Ramsay, J. O. and B. W. Silverman (2005). Functional Data Analysis. New York: Springer.

Rice, J. A. and B. W. Silverman (1991). Estimating the mean and covariance structure non-

parametrically when the data are curves. Journal of the Royal Statistical Society. Series B

(Methodological) 53 (1), 233–243.


Schwartz, J., F. Laden, and A. Zanobetti (2002). The concentration-response relation between PM2.5 and daily deaths. Environmental Health Perspectives 110, 1025–1029.

Zhang, C. and T. Yu (2008). Semiparametric detection of significant activation for brain fMRI.

Annals of Statistics 36, 1693–1725.

Zhang, C., Y. Zhang, and T. Yu (2007). A comparative study of one-level and two-level semipara-

metric estimation of hemodynamic response function for fMRI data. Statistics in Medicine 26,

3845–3861.

A. ON THE IDENTIFIABILITY OF UNPENALISED ORDINARY LEAST SQUARES

ESTIMATES

In general, the design of covariate functions in functional data analysis has received little

attention. Here the most common models fall into one of two categories; either being unidentifiable

in finite samples without penalization (smoothing and functional linear regression fall into this

category) or being always identifiable, as in the concurrent linear models, in which case penalization

is unnecessary apart from being a tool for regularization (see Ramsay and Silverman, 2005). The

functional convolution models studied here present a challenge in that identifiability depends on

the finite-sample design of the covariates. In this appendix, we illustrate the issues that arise.

We will choose to assume our coefficient functions βj(t) lie in the Sobolev space W[0, αj] of functions defined on [0, αj] for which all derivatives that appear in Lj are square integrable. Within

this space, minimizing SSE(β) is equivalent to solving the variational problem

⟨γ, G[β]⟩ = F(γ),   ∀γ = (γ1, . . . , γp) ∈ W[0, α1] ⊗ · · · ⊗ W[0, αp]

where

F(γ) = ∑_{j=1}^{p} ∑_{i=1}^{n} ∫_0^{αj} γj(u) [ ∫_{α∗}^{Ti} xij(t − u) yi(t) dt ] du,

and

G[β]k(v) = ∑_{j=1}^{p} ∑_{i=1}^{n} ∫_0^{αj} [ ∫_{α∗}^{Ti} xik(t − v) xij(t − u) dt ] βj(u) du,

where the inner product is taken as the product L2 inner product on square-integrable functions

⟨γ, β⟩ = ∑_{j=1}^{p} ∫_0^{αj} γj(u) βj(u) du.

The identification of the OLS estimates is now equivalent to the invertibility of G. In particular, β will be uniquely identified if

\langle \gamma, G[\beta] \rangle = 0 \ \forall \gamma \quad \Longrightarrow \quad \beta = 0,

and in particular if

\langle \beta, G[\beta] \rangle = \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} \left[\int_0^{\alpha_j} x_{ij}(t-v)\,\beta_j(v)\,dv\right] \left[\int_0^{\alpha_j} x_{ij}(t-u)\,\beta_j(u)\,du\right] dt > 0 \quad (A.1)

for all j and all non-zero β_j.
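Condition (A.1) can be probed numerically. The sketch below uses a discretization of our own choosing (not from the paper), approximating the quadratic form by Riemann sums for a single periodic covariate x(t) = sin(2πt): a generic β gives a strictly positive form, while β(u) = sin(4πu), which is orthogonal to every shift of x on a unit lag window, yields a vanishing form and is therefore not identified.

```python
import numpy as np

# Riemann-sum sketch of the quadratic form in (A.1) for a single covariate.
# Grid sizes and the covariate x are illustrative choices, not from the paper.
h = 0.005
u = np.arange(0.0, 1.0, h)        # lag grid on [0, alpha] with alpha = 1
t = np.arange(1.0, 10.0, h)       # observation grid, starting after one lag window

def x(s):
    """Periodic covariate design."""
    return np.sin(2 * np.pi * s)

def quad_form(beta_vals):
    """Approximate int [ int x(t-u) beta(u) du ]^2 dt for one record."""
    conv = np.array([np.sum(x(ti - u) * beta_vals) * h for ti in t])
    return float(np.sum(conv ** 2) * h)

qf_generic = quad_form(np.exp(-u))          # generic beta: positive quadratic form
qf_null = quad_form(np.sin(4 * np.pi * u))  # orthogonal to all shifts of x: zero
```

Here `qf_generic` is bounded away from zero while `qf_null` vanishes up to rounding, illustrating how a purely periodic design leaves directions of β unidentified.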

In particular, since β lies in a reproducing kernel Hilbert space, we can let ξ_1, … form a basis for W[0, α_1] ⊗ ⋯ ⊗ W[0, α_p], and (A.1) reduces to the requirement that

\int_0^{\alpha_j} \xi_{jl}(u)\, x_{ij}(t-u)\, du \neq 0 \quad (A.2)

for at least one t and i for every l. This condition is not readily checked, particularly in real-world applications, since it requires checking an infinite collection of inner products. However, it is possible to characterize designs for which the collection

x_{ijt}(u) = x_{ij}(t+u) : [0, \alpha_j] \to \mathbb{R}

spans a finite-dimensional space as t is varied; that is, for which there is a finite collection of functions η_1(u), …, η_K(u) such that

x_{ijt}(u) = \sum_{k=1}^{K} c_k(t)\, \eta_k(u)

for all t. This set of self-similar functions can be expressed as solutions to a linear differential equation. In the lemma below we restrict attention to a single real-valued function for the sake of clarity and set α_j = 1.

Lemma A.1. Let x(t) have continuous first derivatives. Then x satisfies

x(t+u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta_k(u) \quad (A.3)

for all t and some η_k : [0, 1] → \mathbb{R} if and only if

x(t) = \sum_{k=1}^{K} c_k\, t^{m_k} e^{a_k t} \sin(b_k t + d_k) \quad (A.4)

for real-valued constants (a_k, b_k, c_k, d_k) and integers m_k, k = 1, …, K.

Proof. We observe that

x(t+u+dt) = \sum_{k=1}^{K} \zeta_k(t+dt)\, \eta_k(u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta_k(u+dt)

and thus

x'(t+u) = \sum_{k=1}^{K} \zeta_k'(t)\, \eta_k(u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta_k'(u) \quad (A.5)

where x'(·) is the derivative taken with respect to the argument. Defining u_l = (l−1)/K for l = 1, …, K and matrices

X_{lk} = \eta_k(u_l), \quad \tilde{X}_{lk} = \eta_k'(u_l),

we can represent the last equality in (A.5), restricted to u_1, …, u_K, as the differential equation

\frac{d}{dt}\zeta = \left[ X^{-} \tilde{X} \right] \zeta \quad (A.6)

where X^{-} is a generalized inverse. Solutions to (A.6) have the general form (A.4) (e.g., Borelli and Coleman, 2004), and thus x(t) must at least be of this form. It is easy to check that any function of the form (A.4) satisfies (A.3) for some K and (ζ_k, η_k), k = 1, …, K, completing the converse implication.
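The lemma can be illustrated numerically: for x of the form (A.4), the matrix of shifted evaluations x(t_i + u_j) has finite rank (K = 2 for a single damped sinusoid), while a smooth function not of this form does not collapse to a small rank. The grids and test functions below are our own illustrative choices.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 60)    # shift values
u = np.linspace(0.0, 1.0, 40)    # lag window [0, 1]
s = t[:, None] + u[None, :]      # s[i, j] = t_i + u_j

# A single term of the form (A.4) with m = 0: x(t) = exp(a t) sin(b t + d).
a, b, d = 0.1, 2.0, 0.5
M_sin = np.exp(a * s) * np.sin(b * s + d)

# A smooth function NOT of the form (A.4).
M_gen = 1.0 / (1.0 + s ** 2)

rank_sin = np.linalg.matrix_rank(M_sin)  # (A.3) holds with K = 2
rank_gen = np.linalg.matrix_rank(M_gen)  # no small-K representation
```

The rank-2 outcome reflects the angle-addition identity: exp(a(t+u)) sin(b(t+u)+d) separates into exactly two products ζ_k(t) η_k(u).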

It is not difficult to provide example designs that satisfy (A.2). Consider the basis of periodic functions on [0, 1] given by ξ_k(u) = sin(2kπu); then x(t), defined on [0, 10], say, by

x(t) = \sum_k 2^{-k} \xi_k(t)

has non-zero inner product on [0, 1] with ξ_k(u) for each k. We note that the range of x(t) need not be restricted to [0, 1]. However, between the finite-dimensional designs described in Lemma A.1 and identifiable designs, it is possible to find x_i(t) that are orthogonal to an infinite-dimensional subspace. Continuing our example, setting

x(t) = \sum_k 2^{-4k} \xi_{2k}(t)

will yield \langle x, \xi_k \rangle = 0 for all odd k, provided the domain of x is of integer length. Evaluating identifiability based on finite-dimensional approximations, as given for example in Appendix B, is straightforward. However, it seems more challenging to provide a protocol for designing experiments for which this model will be employed.
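Evaluating identifiability for a discretized design amounts to checking that the cross-product matrix of lagged covariates is non-singular. A minimal sketch with hypothetical designs of our own choosing: a noisy covariate record is identifiable at ten lags, while a pure sinusoid (a self-similar design in the sense of Lemma A.1) spans only a two-dimensional lag space and yields a singular cross-product matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
T, L = 500, 10                    # record length and number of lags

def lag_design(x, L):
    """Rows are (x(t-1), ..., x(t-L)) for t = L, ..., len(x)-1."""
    return np.column_stack([x[L - l: len(x) - l] for l in range(1, L + 1)])

x_rich = rng.standard_normal(T)                    # irregular design
x_sine = np.sin(2 * np.pi * np.arange(T) / 25.0)   # self-similar design

def min_eig(x):
    Z = lag_design(x, L)
    return float(np.linalg.eigvalsh(Z.T @ Z).min())

eig_rich = min_eig(x_rich)  # bounded well away from zero: identifiable
eig_sine = min_eig(x_sine)  # numerically zero: rank-deficient design
```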

Beyond identifiability, we have not addressed the question of the asymptotic properties of the design. Under stationarity conditions on ε(t), consistency of the OLS estimates is demonstrable if the left-hand side of (A.2) diverges. Here, however, this divergence can occur through an increased number of samples, or through increasing the domain T_i on which each sample is measured.

B. PENALIZED OLS FORMULAE

In the case that an approximation to the integrals in Z and Y is made based on second-by-second observational records, we can approximate

Z = Z^{*T} Z^{*}, \quad Y = Z^{*T} Y

for

Z^{*} = \begin{bmatrix} Z_1^{*T} & \cdots & Z_n^{*T} \end{bmatrix}^{T}

with

Z_i^{*} = \begin{bmatrix} 1 & \sum_{u=1}^{\alpha_1} x_{i1}(1-u)\phi(u)^T & \cdots & \sum_{u=1}^{\alpha_p} x_{ip}(1-u)\phi(u)^T \\ \vdots & \vdots & & \vdots \\ 1 & \sum_{u=1}^{\alpha_1} x_{i1}(T_i-u)\phi(u)^T & \cdots & \sum_{u=1}^{\alpha_p} x_{ip}(T_i-u)\phi(u)^T \end{bmatrix}

and

Y = \begin{bmatrix} y_1^T & \cdots & y_n^T \end{bmatrix}^T.

This turns the estimate c into a generalized ridge estimate:

c = \left( Z^{*T} Z^{*} + P \right)^{-1} Z^{*T} Y.
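The computation can be sketched directly in code. In this sketch, φ is taken to be the identity basis (one coefficient per one-second lag) and P a second-difference roughness penalty; both are illustrative choices of ours rather than the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, L = 400, 15
beta_true = np.exp(-np.arange(1, L + 1) / 4.0)   # hypothetical convolution function

# Discretized design: row t of Z* holds (x(t-1), ..., x(t-L)).
x = rng.standard_normal(T)
Zstar = np.column_stack([x[L - l: T - l] for l in range(1, L + 1)])
y = Zstar @ beta_true + 0.1 * rng.standard_normal(T - L)

# P = lambda * D2' D2 penalizes squared second differences of the coefficients.
D2 = np.diff(np.eye(L), n=2, axis=0)
P = 0.1 * D2.T @ D2

# Generalized ridge estimate c = (Z*' Z* + P)^{-1} Z*' Y.
c_hat = np.linalg.solve(Zstar.T @ Zstar + P, Zstar.T @ y)
max_err = float(np.max(np.abs(c_hat - beta_true)))
```

With a well-conditioned design and modest noise, the penalized estimate tracks the true convolution function closely; larger penalty weights trade this fidelity for smoothness.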

From this, residuals are readily calculated from

E = Y - Z^{*} c

and E can be broken down according to

E = \begin{bmatrix} \varepsilon_1^T & \cdots & \varepsilon_n^T \end{bmatrix}^T

where ε_i = (ε_{i1}, …, ε_{iT_i}). From here, an autocovariance is calculated explicitly from

R(l) = \sum_{i=1}^{n} \sum_{|j-k|=l} \varepsilon_{ij}\, \varepsilon_{ik},

which produces a covariance for E given by

C_{jk} = \begin{cases} R(|j-k|) & \text{if } E_j \text{ and } E_k \text{ belong to the same observation} \\ 0 & \text{otherwise.} \end{cases}
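The construction can be sketched as follows. Here R(l) is normalized by the number of contributing pairs and truncated at a maximum lag; both are our own choices for the sketch, made so the estimated autocovariance is on the scale of a covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
lengths = [60, 80]                                 # record lengths T_i
resid = [rng.standard_normal(T) for T in lengths]  # stand-in residual series
max_lag = 5

def R(l):
    """Pooled autocovariance at lag l, normalized by the number of pairs."""
    num = sum(float(e[: len(e) - l] @ e[l:]) for e in resid)
    den = sum(len(e) - l for e in resid)
    return num / den

Rarr = np.array([R(l) for l in range(max_lag + 1)])

# C carries R(|j-k|) within an observation's block and zeros across observations.
N = sum(lengths)
C = np.zeros((N, N))
start = 0
for T in lengths:
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])
    block = np.where(lags <= max_lag, Rarr[np.minimum(lags, max_lag)], 0.0)
    C[start:start + T, start:start + T] = block
    start += T
```

The resulting C is symmetric and block diagonal, with each diagonal block a banded Toeplitz matrix built from the pooled autocovariance.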

We can then calculate the usual sandwich estimator

\mathrm{cov}(\beta_k(u), \beta_l(v)) = \Phi_k(u) \left( Z^{*T} Z^{*} + P \right)^{-1} Z^{*T} C\, Z^{*} \left( Z^{*T} Z^{*} + P \right)^{-1} \Phi_l(v)^T.

We note that this discrete setting also accounts for the case that y_{ij} = y_i(t_{ij}) + ε_{ij}, where the ε_{ij} are approximately Gaussian measurement errors.
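Given C, the coefficient-level sandwich matrix is a chain of matrix products around the penalized cross-product inverse; pre- and post-multiplying by basis evaluations Φ_k(u), Φ_l(v) then yields pointwise covariances. As a sanity check of our own: with P = 0 and C = σ²I, the sandwich collapses to the familiar OLS covariance σ²(Z*'Z*)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 6
Zstar = rng.standard_normal((n, p))  # stand-in discretized design matrix Z*

def sandwich(Zstar, C, P):
    """Coefficient covariance (Z*'Z* + P)^{-1} Z*' C Z* (Z*'Z* + P)^{-1}."""
    bread = np.linalg.inv(Zstar.T @ Zstar + P)
    return bread @ Zstar.T @ C @ Zstar @ bread

sigma2 = 0.25
cov_sand = sandwich(Zstar, sigma2 * np.eye(n), np.zeros((p, p)))
cov_ols = sigma2 * np.linalg.inv(Zstar.T @ Zstar)
gap = float(np.max(np.abs(cov_sand - cov_ols)))
```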

[Figure 1 graphic: "Velocity Trajectory" and "Particulate Matter (PM) Trajectory" panels plotted against Time for cycles MHDTHI, TEST_D, HHDDT_S, MHDTLO.]

Figure 1: E55/59 Data: A single truck run on four separate driving cycles (top) and the particulate matter output (bottom). The velocity patterns are chosen to represent a variety of real-world driving conditions.


[Figure 2 graphic: "AUTOCOVARIANCE" panel, autocorrelation against Lag (0–100) for cycles MHDTHI, TEST_D, HHDDT_S, MHDTLO.]

Figure 2: The cross covariance between PM and velocity for each of four driving cycles. A cross-dependence of around 20 seconds is indicated, with reasonably consistent dependence across cycles.


Figure 3: The cross-validation surface over both λ and α. The straight line provides the optimal value of α for each λ.



Figure 4: Bootstrap Discontinuities. The top panel provides an example block-bootstrap residual before transformation (dashed) with the transformed residual (solid). The transformation smooths out evident discontinuities between blocks. Bottom two panels: left, the estimated autocovariance function along with autocovariances estimated from untransformed bootstrap residuals; right, the estimated autocovariance of transformed bootstrapped residuals, which agrees much more closely with the original autocovariance function.



Figure 5: Estimated convolution functions for acceleration (top) and velocity (bottom), indicating an immediate effect of increased velocity along with a delayed damping from acceleration. Dashed lines give delta-method confidence intervals while dotted lines provide intervals resulting from the transformed block bootstrap.


[Figure 6 graphic: four "Predictions and Observations" panels of PM against Time, with PM Corr = 0.625, 0.631, 0.677, 0.656 and prediction SSE = 0.134, 0.065, 2.851, 1.144 across the four cycles.]

Figure 6: Cross-validated performance predicting PM trajectories for the four driving cycles. Solid lines give observed PM, dashed lines predictions. Correlation between the signals is around 0.65.
