Functional Convolution Models
Maria Asencio
Department of Statistical Science
Cornell University
Ithaca, NY 14850
Giles Hooker
Department of Biological Statistics and
Computational Biology
Cornell University
Ithaca, NY 14850
H. Oliver Gao
Department of Civil and Environmental Engineering
Cornell University
Ithaca, NY 14850
Acknowledgements
Giles Hooker was partially supported by NSF grants DEB-0813743, CMG-0934735 and DMS-1053252.
Abstract
This paper considers the application of functional data analysis methods to modeling particulate
matter trajectories from dynamometer experiments. In particular the functional convolution model
is introduced as a restriction of the functional historical linear model that allows for functional data
of different lengths to be used. We present a penalized ordinary least squares estimator for the
model and a novel bootstrap procedure to provide pointwise confidence regions for the estimated
convolution functions. The model is illustrated on the California E55/59 study of diesel truck
emissions.
Keywords: Convolution Model, Functional Data Analysis, Distributed Lag Model, Particulate
Matter, Emissions Modeling
1. INTRODUCTION
This paper proposes the functional convolution model as an approach to modeling the concurrent dependence of functional data of different lengths. This work is motivated by the California E55/59 study (Clark, Gautam, Wayne, Lyons, Thompson, and Zielinska, 2007) of vehicle particulate matter (PM) emissions. In this study, particulate matter responses to driving conditions
in medium and heavy duty trucks are examined via dynamometer experiments. Trucks are placed
on a dynamometer – a series of rollers that allow the truck wheels to turn while keeping the truck
stationary – and are driven through a pre-set series of driving cycles that specify speeds to be main-
tained for given times. These driving cycles are chosen to mimic real-world road conditions, from
highways to suburban traffic. An emissions analyzer is attached to the truck tailpipe and records
second-by-second counts of particulate matter: small solid particles less than 2.5 micrometers in
diameter that have been implicated in serious respiratory illness.
Existing models for particulate matter emission have tended to be based on either (i) a prediction
of average or total PM from average speed, average acceleration and other quantities accumulated
over the driving cycle, (ii) a regression based only on the instantaneous driving conditions or (iii)
complex models that attempt to parameterize all aspects of the production and transportation of
particles in the engine and exhaust system in terms of physical processes (see Ajtay, Weilenmann,
and Soltic, 2005; Capiello, Chabini, Nam, Lue, and Zeid, 2002; Ahn, Rakha, Trai, and Aerde,
2002). Our goal lies between these approaches in providing a more accurate model for PM emissions
without requiring knowledge of kinetic parameters within a dynamical system, while still accounting
for mixing during transportation and the serial dependence of measurements.
The goal of the study is to develop a model to predict instantaneous PM emissions from driving
behavior (i.e. speed and acceleration) that will be applicable across numerous driving cycles. From
Figure 1, it is apparent that all of speed, acceleration and PM can be regarded as following smooth
dynamics and thus can be approached through the machinery of functional data analysis (Ramsay
and Silverman, 2005; Ramsay et al., 2009). However, the direct application of functional data
analysis techniques is hampered by the lack of a standard time interval for the sampling domain: experimental runs can vary between 370 and 1190 seconds. Nor is it appropriate in this context to
register the responses to a common time scale as this will disrupt real-time constants such as the
transport time from the engine to the emissions analyzer. Instead, we assume stationarity in the
dependence of PM on the past history of velocity and acceleration.
Specifically, our model can be represented as a functional response model. We assume a response
yi(t) (in this case PM) with covariates xij(t) (velocity and acceleration) measured on the interval
[0, Ti] for observations i = 1, . . . , n. These are related via the functional convolution model:

$$y_i(t) = \beta_0 + \sum_{j=1}^{p} \int_0^{\alpha_j} \beta_j(u)\, x_{ij}(t-u)\, du + \varepsilon_i(t), \qquad i = 1, \dots, n. \quad (1)$$
Here we take yi(t) to respond to the past αj time units of the xij via a functional linear model (Ramsay
and Silverman, 2005). Note that while the yi(t) and the xij(t) must share the same time domain,
this domain need not be the same across different observations. We have parameterized the model
so that βj(0) represents the instantaneous effect of xij(t) on the response at time t. In the context
of emissions experiments, the convolution in the integral (1) can be thought of as approximating the
mixing of particles from different time points during transit through the exhaust system. This
model is a restriction of the historical functional linear model presented in Malfait and Ramsay
(2003) in which we require the effects β(u) = (β0, β1(u), . . . , βp(u)) to remain constant over time. It
can also be thought of as the functional extension of distributed lag models in time series (Greene,
2008). Similar models have been used to estimate the hemodynamic response in functional MRI (Genovese, 2000; Zhang et al., 2007; Zhang and Yu, 2008).
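On the one-second grid used in the study, model (1) reduces to a discrete convolution of each covariate with its coefficient function. The following sketch shows how predictions could be formed under that discretization; the function name is illustrative, and a single lag length α common to all covariates is assumed for simplicity:

```python
import numpy as np

def predict_fcm(x_list, beta_list, beta0, alpha):
    """Evaluate the discretized functional convolution model (1).

    x_list    : list of p covariate series for one experimental run,
                each a 1-D array sampled second by second.
    beta_list : list of p coefficient arrays of length alpha + 1,
                giving beta_j(0), ..., beta_j(alpha).
    beta0     : scalar intercept.
    alpha     : common convolution length in samples (an assumption
                made here; the model allows one length per covariate).
    Returns predictions for t = alpha, ..., T - 1, the times with a
    full covariate history available.
    """
    T = len(x_list[0])
    yhat = np.full(T - alpha, beta0, dtype=float)
    for x, beta in zip(x_list, beta_list):
        for u, b in enumerate(beta):
            # beta_j(u) weights the covariate value u seconds in the past
            yhat += b * x[alpha - u : T - u]
    return yhat
```

Note that, as in the paper, predictions are only made at times t ≥ α so that the full convolution window lies inside the observed record.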
In practice, the yi(t) and xij(t) are measured at discrete time points ti = (ti1, . . . , tiNi), providing
data yi = (yi(ti1), . . . , yi(tiNi)), and xij = (xij(ti1), . . . , xij(tiNi)). These measurements need not
be taken at concurrent times and our methods will work when pre-smoothed estimates are obtained
for functional data. However, in our example all observations are taken at regular second-by-second
intervals and we will use this regularity to avoid the need to pre-smooth the observed data.
As a distinction from distributed lag models, we assume the dependence on past covariates as
given by β(u) to be smooth. We therefore propose a penalized ordinary least squares estimate as
an initial estimate of the parameters β(u) in (1). In addition, the lengths α = (α1, . . . , αp) of the
convolutions and smoothing parameters are selected by leave-one-curve-out cross validation. This
cross validation must be conducted appropriately to balance the differing lengths of observations
per experimental run.
In addition to estimation we develop confidence intervals for model parameters through both
a delta-method and via bootstrapping techniques. Because of the variable length of observations,
both these techniques require the εi(t) to be stationary and generated from an approximately
Gaussian process with an autocovariance structure
cov(εi(s), εi(t)) = R(|t− s|) (2)
that can be estimated directly from the fitted residuals. Given this specification of covariance,
delta-method confidence intervals can be calculated directly. By way of contrast, bootstrapping
methods must be carried out in a way that accounts for the dependence in the εi(t). This is usually
obtained by bootstrapping in blocks (see Lahiri, 2003). In the context of functional data, however,
block bootstrapping destroys the continuity and smoothness of the εi(t) leading to finite-sample
distortion of the estimated variance. Instead, a transformation of block-bootstrapped residuals is
proposed that recovers the structure of (2) while retaining some of the non-parametric flavor of the
block-bootstrap.
In the remainder of the paper Section 2 introduces the estimation of the functional convolution
model, Section 3 develops our estimate of the auto-covariance and delta-method confidence intervals, Section 4 introduces our bootstrapping procedures, and Section 5 demonstrates the success of our method on emissions data.
2. FUNCTIONAL CONVOLUTION MODELS AND ESTIMATION
In this section we specify our estimation of parameters in the model (1). The βj(u) are estimated
via a penalized ordinary least squares (OLS) estimate. Conditions to ensure the identifiability of
OLS estimates in this context are not easy to establish; a discussion of them is given in Appendix A.
However, a smoothing penalty is applied to these data in order to improve the numerical stability
of the least squares estimate and as a regularization device. We then apply leave-one-curve-out
cross validation to estimate the lengths of convolution required along with smoothing parameters.
2.1 Ordinary Least Squares
We propose an ordinary least squares estimate for the model given in (1), setting β0 = ȳ, the mean of the observed responses, for convenience and minimizing
$$\mathrm{SSE}(\beta) = \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} \left[ y_i(t) - \sum_{j=1}^{p} \int_0^{\alpha_j} \beta_j(u)\, x_{ij}(t-u)\, du \right]^2 dt$$

for α∗ = max(α1, . . . , αp). Formally, estimates for the βj can be written as the solution of the
variational problem; conditions under which these estimates are uniquely defined are studied in
Appendix A. However, it will be useful to have a continuous representation of the βj(t) which also admits smoothing; we will thus approximate the βj via the basis expansion

$$\beta_j(t) = \sum_{k=1}^{K_j} \phi_{jk}(t)\, c_{jk} = \phi_j(t)\, c_j.$$

In the case of the emissions data we have chosen the φj(t) to be a fourth-order B-spline basis on the interval [0, αj] with knots every second.
This representation allows us to introduce a smoothing penalty in order to improve the regularity
of our estimates. This penalty takes the form

$$\mathrm{PEN}(\beta, \lambda) = \sum_{j=1}^{p} \lambda_j \int_0^{\alpha_j} \left[ L_j \beta_j(u) \right]^2 du$$

where each Lj is a linear differential operator, which we take to be Ljβj(u) = d²βj(u)/du².
Our estimates now minimize the combined criterion SSE(β) + PEN(β, λ). Specific formulae for the minimizing coefficients cjk can now be calculated directly. Letting βj(t) = φj(t)cj as above, we can express the solution c∗ = [c1ᵀ · · · cpᵀ]ᵀ as

$$c^* = (Z + P)^{-1} Y$$

where

$$Z = \begin{bmatrix} Z_{11} & \cdots & Z_{1p} \\ \vdots & \ddots & \vdots \\ Z_{p1} & \cdots & Z_{pp} \end{bmatrix}$$

is a block matrix with blocks

$$Z_{kl} = \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} \left[\int_0^{\alpha_k} x_{ik}(t-u)\,\phi_k(u)\,du\right] \left[\int_0^{\alpha_l} x_{il}(t-u)\,\phi_l(u)\,du\right]^{\mathsf T} dt$$

and

$$Y = \begin{bmatrix} \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} y_i(t)\left[\int_0^{\alpha_1} x_{i1}(t-u)\,\phi_1(u)\,du\right]dt \\ \vdots \\ \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} y_i(t)\left[\int_0^{\alpha_p} x_{ip}(t-u)\,\phi_p(u)\,du\right]dt \end{bmatrix}$$

and the composite penalty matrix is

$$P = \operatorname{blockdiag}\left(\lambda_1 R_1, \dots, \lambda_p R_p\right)$$

with penalty matrices for each cj given by the semi-norm

$$\left[R_j\right]_{kl} = \int_0^{\alpha_j} L_j\phi_{jk}(u)\, L_j\phi_{jl}(u)\,du.$$
An important note here is that while SSE(β) is given via integrals, as are the formulae above,
in our application all of PM, velocity and acceleration are measured on a second-by-second basis.
We can thus replace the integrals with summations:
$$\mathrm{SSE}(\beta) = \sum_{i=1}^{n} \sum_{t=\alpha^*}^{N_i} \left[ y_i(t) - \sum_{j=1}^{p} \sum_{l=0}^{\alpha_j} \beta_j(l)\, x_{ij}(t-l) \right]^2$$
where Ni is the number of measurements for the functional observation (yi(t), xi(t)). The use
of integer-valued observation points in xik(t − l) is justified here by the one-second measurement
interval. The integrals in the formulae above can also be adjusted accordingly, yielding a formulation
in terms of penalized linear regression; these calculations are detailed in Appendix B. This avoids
the need to pre-smooth either the yi(t) or the xij(t), but is only feasible when both are sampled on
the same, regularly-spaced, time points. For more sparsely observed cases, a representation in terms of an explicit smooth could also be employed.
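The discrete formulation above amounts to a penalized linear regression on lagged covariate values. The sketch below is a minimal stand-in for the calculations detailed in Appendix B: it assumes a common lag length across covariates, estimates one coefficient per lag rather than using the B-spline expansion, replaces the integrated squared second derivative by a squared second-difference penalty, and fixes the intercept at the mean response. All names are illustrative:

```python
import numpy as np

def fit_fcm_pols(y_list, x_list, alpha, lam):
    """Penalized least squares for the discretized convolution model.

    y_list : list of n response series (1-D arrays on a 1-second grid).
    x_list : list of n lists, each holding the p covariate series for
             the corresponding response.
    alpha  : lag length in samples (assumed common across covariates).
    lam    : smoothing parameter on a squared second-difference penalty,
             a discrete stand-in for the roughness penalty in the paper.
    Returns the stacked coefficients [beta_1(0..alpha), ..., beta_p(0..alpha)].
    """
    p = len(x_list[0])
    K = alpha + 1
    ybar = np.mean(np.concatenate(y_list))   # intercept fixed at the mean response
    rows, targets = [], []
    for y, xs in zip(y_list, x_list):
        for t in range(alpha, len(y)):
            # one regression row: current and lagged covariate values at time t,
            # ordered x(t), x(t-1), ..., x(t-alpha) to match beta_j(0..alpha)
            row = np.concatenate([xs[j][t - alpha:t + 1][::-1] for j in range(p)])
            rows.append(row)
            targets.append(y[t] - ybar)
    X = np.asarray(rows)
    z = np.asarray(targets)
    D = np.diff(np.eye(K), n=2, axis=0)      # second-difference operator
    P = np.kron(np.eye(p), lam * (D.T @ D))  # block-diagonal roughness penalty
    return np.linalg.solve(X.T @ X + P, X.T @ z)
```

With λ near zero this reduces to ordinary least squares on the lag design; larger λ shrinks each β̂j toward a linear function of the lag.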
2.2 Cross Validation
While parameters (β0, β1(u), . . . , βp(u)) can be estimated via a penalized OLS approach, we
turn to cross validation to specify the convolution lengths α and smoothing parameters λ. In
particular, we employ leave-one-curve-out cross validation as a score for both λ and α:
$$CV(\lambda, \alpha) = \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} \left[ y_i(t) - \beta_{0,\lambda\alpha}^{-i} - \sum_{j=1}^{p} \int_0^{\alpha_j} \beta_{j,\lambda\alpha}^{-i}(u)\, x_{ij}(t-u)\, du \right]^2 dt,$$

where $(\beta_{0,\lambda\alpha}^{-i}, \beta_{1,\lambda\alpha}^{-i}, \dots, \beta_{p,\lambda\alpha}^{-i})$ are the parameter estimates obtained at λ and α after removing the
ith observation. This approach has been used for estimating the mean of a collection of functional
random variables (Rice and Silverman, 1991) and for functional response models (Ramsay and
Silverman, 2005). Explicit formulae to calculate CV (λ, α) have been obtained in Golub and van
Loan (1996) and Hoover et al. (1998).
Cross validation over numerous parameters is problematic in requiring a search of a high di-
mensional space; moreover multiple local minima in CV (λ, α) complicate this search (see examples
in Ramsay et al., 2009). We have therefore simplified the problem in two ways:
1. We have set all parameters to be equal: α1 = · · · = αp = α and λ1 = · · · = λp = λ. The former of these
choices can be motivated physically by the assumption that the convolution approximately
represents the mixing of the instantaneous production of PM in the engine and we may
therefore expect that all engine parameters affect PM output over the same time interval.
The latter is commonly employed where more than one functional parameter is estimated;
see examples in Ramsay et al. (2009).
2. A one-step estimate for α and λ is employed, first estimating α at λ = 0 and then choosing
the smoothing parameters:
$$\hat{\alpha} = \operatorname*{argmin}_{\alpha}\, CV(0, \alpha), \qquad \hat{\lambda} = \operatorname*{argmin}_{\lambda}\, CV(\lambda, \hat{\alpha}).$$
An iteration of this scheme will lead to a local minimum over the joint space of α and λ.
However, the one-step approximation taken in this order can be justified if the model is
identifiable at λ = 0. In practice, a small value of λ is helpful in stabilizing the numerical
estimates and we begin from there. The estimate of α is then an un-penalized model selection
problem after which λ can be used to regularize the estimators if needed.
Following this one-step procedure, we obtain estimates (β0,λα, β1,λα(u), . . . , βp,λα(u), α, λ). In providing confidence intervals below, we will keep α and λ fixed.
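The resampling logic of leave-one-curve-out cross validation and the one-step selection of (α, λ) can be sketched as follows. Here `fit` and `predict` are hypothetical stand-ins for the estimation and prediction routines, and the small ridge used in place of λ = 0 reflects the paper's remark that a small λ stabilizes the numerical estimates:

```python
import numpy as np

def loco_cv_score(y_list, x_list, alpha, lam, fit, predict):
    """Leave-one-curve-out cross-validation score CV(lam, alpha)."""
    score = 0.0
    for i in range(len(y_list)):
        # refit with the i-th experimental run held out
        params = fit(y_list[:i] + y_list[i + 1:],
                     x_list[:i] + x_list[i + 1:], alpha, lam)
        yhat = predict(params, x_list[i], alpha)
        # compare only at t >= alpha, where a full history is available
        score += np.sum((y_list[i][alpha:] - yhat) ** 2)
    return score

def one_step_select(y_list, x_list, alphas, lams, fit, predict):
    """One-step selection: choose alpha at lam ~ 0, then lam at that alpha."""
    eps = 1e-8  # small stabilizing ridge standing in for lam = 0
    cv_a = [loco_cv_score(y_list, x_list, a, eps, fit, predict) for a in alphas]
    alpha_hat = alphas[int(np.argmin(cv_a))]
    cv_l = [loco_cv_score(y_list, x_list, alpha_hat, l, fit, predict) for l in lams]
    return alpha_hat, lams[int(np.argmin(cv_l))]
```

Iterating the two argmin steps would converge to a local minimum over the joint (α, λ) space, as noted above.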
3. VARIANCE ESTIMATION
As was done for the mean dependence of yi(t) on xi(t), we can account for the unequal lengths
of functional observations through the assumption of stationarity in the deviation of yi(t) from its
expectation. In particular, we assume the non-parametric auto-covariance structure given in (2).
We can estimate R(u) via a method-of-moments estimator. Obtaining residuals
$$\hat{\varepsilon}_i(t) = y_i(t) - \beta_{0,\lambda\alpha} - \sum_{j=1}^{p} \int_0^{\alpha_j} \beta_{j,\lambda\alpha}(u)\, x_{ij}(t-u)\, du,$$
our estimate is simply
$$\hat{R}(u) = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{i=1}^{n} \dfrac{1}{T_i - u} \int_0^{T_i - u} \hat{\varepsilon}_i(t)\, \hat{\varepsilon}_i(t+u)\, dt & u < h \\[1ex] 0 & u \ge h \end{cases}$$
Here we have chosen to threshold R(u) for values u ≥ h for some h < min(T1, . . . , Tn) at a value
chosen manually by examining the empirical autocovariance of the residuals (see Figure 2).
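On a regular one-second grid, the method-of-moments estimator of R(u) can be sketched as below, with the integral replaced by a sum over the overlap of each residual series with its lag-u shift; function and variable names are illustrative:

```python
import numpy as np

def estimate_autocov(resid_list, h):
    """Method-of-moments estimate of R(u) on a 1-second grid.

    resid_list : fitted residual series of possibly different lengths.
    h          : threshold beyond which R(u) is set to zero.
    Returns R(u) for u = 0, ..., h - 1; R is zero beyond h by construction.
    """
    R = np.zeros(h)
    for u in range(h):
        acc = 0.0
        for e in resid_list:
            Ti = len(e)
            # average of eps(t) * eps(t + u) over the overlap of length Ti - u
            acc += np.dot(e[:Ti - u], e[u:]) / (Ti - u)
        R[u] = acc / len(resid_list)
    return R
```

Averaging within each curve before averaging across curves balances the differing record lengths Ti, mirroring the weighting in the estimator above.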
The estimation of R(u) now allows for a direct delta-method variance calculation:

$$\mathrm{Cov}(\beta_j(u), \beta_l(v)) = \Phi_j(u)\, (Z + P)^{-1} C\, (Z + P)^{-1}\, \Phi_l(v)^{\mathsf T}$$

where

$$C = \begin{bmatrix} C_{11} & \cdots & C_{1p} \\ \vdots & \ddots & \vdots \\ C_{p1} & \cdots & C_{pp} \end{bmatrix}$$

is a block matrix with blocks

$$C_{kl} = \sum_{i=1}^{n} \int\!\!\int \left[\int_0^{\alpha_k} x_{ik}(t-u)\,\phi_k(u)\,du\right] R(|t-s|) \left[\int_0^{\alpha_l} x_{il}(s-u)\,\phi_l(u)\,du\right]^{\mathsf T} dt\, ds,$$

and Φj(u) is a $\sum_j K_j$-vector with the values φjk(u) in the entries corresponding to cj. For regular second-by-second samples, these calculations can be approximated; detailed formulae are given in Appendix B.
Once Cov(βj(u), βl(v)) has been calculated, pointwise confidence bands can be obtained for βj(u) from

$$\beta_j(u) \pm 2\sqrt{\mathrm{Cov}(\beta_j(u), \beta_j(u))}.$$
These can be compared to the bootstrap-based confidence bands as shown in Figure 5.
4. BOOTSTRAP PROCEDURES
The delta-method confidence intervals proposed above rely on assumptions about the stationar-
ity and near-Gaussianity of the residual processes εi(t) in order to be valid. As a means of providing
more robust intervals, we develop a residual bootstrap. In contrast to classical functional response
models, we cannot simply re-sample the estimated residual processes εi(t) due to the different
lengths of observations. Instead, we make use of the assumed stationarity structure in the εi by
performing a block bootstrap.
There is a considerable literature on bootstrap methods for dependent data (see Lahiri, 2003;
Hardle et al., 2003). Here we use a modified non-overlapping block bootstrap. The block bootstrap re-samples sequences of data while retaining the sequence structure; we thus break each εi into segments of length h:

$$\varepsilon_{ik}(s) = \varepsilon_i(s + (k-1)h), \quad 0 \le s < h, \quad k = 1, \dots, \lceil T_i/h \rceil.$$
These are re-sampled with replacement across both i and k and new residual processes are constructed from the resampled curves

$$\varepsilon^*_i(t) = \varepsilon_{\sigma(i,k)}(t - (k-1)h), \qquad (k-1)h \le t < kh,$$

where σ(i, k) indicates the resampled indices of the collection of blocks. In a residual block bootstrap, the ε∗i(t) would be added to predicted values to create a new collection of functional responses y∗i(t), i = 1, . . . , n, from which parameters could be re-estimated.
In functional data analysis, however, the ε∗i (t) violate the smoothness assumed for the εi(t) as
illustrated in Figure 4. This is a general problem for the block bootstrap and has been dealt with
in a number of ways; either by trying to match blocks at their ends (Carlstein et al., 1998) or
by down-weighting the block ends (Paparoditis and Politis, 2001). In this paper we propose an
alternative based on transforming the ε∗i (t) to have the same covariance structure as was assumed
for the original process.
Specifically, the residual processes are assumed to have covariance structure (2), while the
block-bootstrapped version has a block-structure
$$\mathrm{cov}(\varepsilon^*_i(t), \varepsilon^*_i(s)) = \begin{cases} R(|t-s|) & \exists\, k : (k-1)h \le t < kh,\; (k-1)h \le s < kh \\ 0 & \text{otherwise} \end{cases}$$
and we seek transformations Ki so that Ki[ε∗i (t)] has covariance (2). This is formally given in terms
of functional operators. However, when working with regular discrete observations, we can readily
instantiate the matrices
$$[C_i]_{jk} = \mathrm{cov}(\varepsilon_i(t_{ij}), \varepsilon_i(t_{ik})) = R(|t_{ij} - t_{ik}|)$$

and

$$[C^*_i]_{jk} = \mathrm{cov}(\varepsilon^*_i(t_{ij}), \varepsilon^*_i(t_{ik}));$$

then the discrete-sample realization of Ki is given by

$$M_i = C_i^{1/2} \left[ C^*_i \right]^{-1/2}$$

where $C_i^{1/2}$ has been calculated from a singular value decomposition. We now use the transformed block-bootstrap residuals $\tilde{\varepsilon}^*_i = M_i \varepsilon^*_i$ and obtain

$$y^*_i(t) = \hat{y}_i(t) + \tilde{\varepsilon}^*_i(t)$$
and use these to obtain an estimate β∗ as above, keeping the parameters chosen by cross-validation fixed. This is repeated B times and the collection {β∗b}, b = 1, . . . , B, allows us to calculate biases as well as variances from the bootstrap sample. Pointwise confidence intervals are then obtained from the
mean and standard deviation of the bootstrap samples and can be compared with the delta-method
intervals above. An estimate of the distortion caused by blocking is given in the lower-left panel of
Figure 4 where we have re-estimated the auto-covariance R(u) from bootstrap samples, resulting
in noticeably lower estimates.
An important detail is the way in which end-blocks of length less than h are treated. In the
bootstrap scheme above, these were given equal sampling weight with all the other blocks and a
sequence ε∗i was created that may be longer than Ti. The structure for Mi was updated to account
for potentially smaller blocks appearing and the residuals were transformed to Miε∗i before being
truncated at the right end. This order of operations was chosen to minimize the effect of the choice
to truncate from left or right ends.
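The transformed block bootstrap can be sketched as follows for regularly sampled residuals. This is an illustrative implementation under stated simplifications, not the authors' code: it truncates the resampled sequence before applying the transformation (the scheme above transforms first and then truncates at the right end), and it computes the matrix square roots from an eigendecomposition rather than an SVD:

```python
import numpy as np

def _toeplitz_from_R(R, m):
    """m x m covariance with entries R(|i - j|), taken as zero beyond len(R)."""
    d = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    Rpad = np.concatenate([R, np.zeros(max(0, m - len(R)))])
    return Rpad[d]

def _psd_power(A, power, tol=1e-10):
    """A ** power for symmetric PSD A, pseudo-inverse style on tiny eigenvalues."""
    w, V = np.linalg.eigh(A)
    s = np.zeros_like(w)
    pos = w > tol
    s[pos] = w[pos] ** power
    return (V * s) @ V.T

def transformed_block_bootstrap(resid_list, R, h, rng):
    """One transformed block-bootstrap draw of the residual processes.

    resid_list : fitted residual series on a regular grid.
    R          : estimated autocovariance, R[u] ~ cov(eps(t), eps(t+u)).
    h          : block length; end blocks shorter than h are pooled
                 with equal weight, as in the scheme above.
    """
    blocks = [e[k * h:(k + 1) * h]
              for e in resid_list for k in range(int(np.ceil(len(e) / h)))]
    draws = []
    for e in resid_list:
        Ti = len(e)
        picked, total = [], 0
        while total < Ti:
            b = blocks[rng.integers(len(blocks))]
            picked.append(b)
            total += len(b)
        star = np.concatenate(picked)[:Ti]     # truncate at the right end
        # target covariance C_i and realized block-diagonal covariance C*_i
        C = _toeplitz_from_R(R, Ti)
        Cstar = np.zeros((Ti, Ti))
        pos = 0
        for b in picked:
            m = min(len(b), Ti - pos)
            Cstar[pos:pos + m, pos:pos + m] = _toeplitz_from_R(R, m)
            pos += m
            if pos >= Ti:
                break
        M = _psd_power(C, 0.5) @ _psd_power(Cstar, -0.5)
        draws.append(M @ star)
    return draws
```

When R corresponds to white noise the transformation reduces to the identity and the procedure coincides with an ordinary non-overlapping block bootstrap.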
We note that our transformation of the block bootstrap falls somewhere between a parametric
bootstrap – in which new ε∗i would be directly sampled from a Gaussian process distribution with
variance Ci – and a standard block bootstrap. The former case will result exactly in delta-method
estimates assuming infinite bootstrap samples. We speculate that our transformed block bootstrap is dependent on the assumed covariance structure (2) but that it will provide robustness to violations of Gaussian assumptions and faster convergence to true sampling distributions when the covariance
structure is accurate. We also note that we do not necessarily require the size of the blocks to
increase with sample size for these properties to hold. The asymptotic properties of this bootstrap
procedure are the subject of ongoing research.
5. THE E55/59 STUDY
We present a case study of data taken from the E-55/59 program in southern California (Clark
et al., 2007). In this program, chassis dynamometer measurements were gathered with the objective
of getting a better emission inventory of medium and heavy duty trucks. We specifically focused on
modeling PM which has been associated with the increased risk of respiratory and cardio-vascular
disease in an exposed population (Laden et al., 2000; McCreanor et al., 2007; Schwartz et al.,
2002). Furthermore, a predictive model of PM emissions is an important component of developing new regulatory frameworks and planning transportation networks. We make a case study of the functional convolution model as applied to emissions from a single truck across a variety of driving cycles.
In this experiment, trucks are placed in the chassis dynamometer and their tailpipe connected
to an emission analyzer which separates and records the emissions. A velocity pattern that char-
acterizes a specific driving behavior is applied to the truck. Measuring emissions in this manner
allows the researcher to apply a velocity pattern multiple times with high precision and in a
closed environment to reduce between-experiment variability. However, the recorded emissions are
not the direct effect of the instantaneous change of velocity since particles can experience some
delay during transportation through the tailpipe to the analyzer. This is because the particles interact
with other factors such as temperature, other emission particles and the air-flow of the equipment.
The purpose of this case study is to demonstrate how the convolution model can provide accurate
predictions while accounting for the smoothness of the observed signals and the delay and mixing
due to particle transport. In addition, we want our model to be cycle-independent: predictive of
PM trajectories given any velocity pattern, even one that was not used when the model parameters were estimated.
Our sample data comes from four different velocity patterns and a weight load of 3100 applied
to a 1992 Ford medium-heavy duty truck. The four different velocity patterns can be related to
general traffic situations that the trucks experience in real-world driving scenarios. For example,
we can see that the truck accelerates and then maintains a speed of around 60 miles per hour in the HHDDT S velocity pattern shown in Figure 1. In contrast, in the MHDTLO velocity pattern the truck slows down, speeds up or stops completely in different time intervals, and the highest velocity it reaches is 30 miles per hour. Given these characteristics, the first velocity pattern approximates driving behavior on highways and the second on suburban streets.
The trajectories of PM as result of the four velocity patterns are also shown in Figure 1. The
nonlinearity of these curves makes it difficult to easily observe their relation with the velocity
patterns. In some intervals, we can see that a relatively fast increase of velocity over a period of time manifests as high points in the PM. However, this behavior is not as evident in other intervals
of the curves. Figure 2 shows the cross correlation curves of the four sample driving cycles where
time dependence on the order of 20 seconds is evident.
Cross-validation resulted in values of λ = 0.1 and α = 20. Figure 3 demonstrates the cross-
validation surface where we observe that the choice of α does not depend on λ, providing further
justification for our one-step method.
Figure 4 demonstrates the effect of the transformed bootstrap. The top panel gives a sample
block-bootstrap residual before and after transformation. The bottom two panels plot bootstrap
estimates of the autocovariance of the residual functions before and after transformation along with
the sample autocovariance function where the distortion due to a short block is evident.
Figure 5 provides the autocovariance of residuals and the estimated convolution functions, respectively. Here we observe an immediate increase in PM resulting from high velocity, while acceleration appears to have a negative effect. We speculate that this is due to correlation between velocity and acceleration across driving cycles. Delta-method and bootstrap confidence intervals demonstrate high agreement across samples. Figure 6 demonstrates the cross-validated
performance of our estimates. Each panel plots the observed PM trajectory along with the tra-
jectory predicted from the remaining data. Here we observe that the essential features of the PM
trajectory are replicated, demonstrating good generalization ability.
6. CONCLUSIONS
This paper has presented a new model for functional responses when functions are observed over different lengths. This model makes use of an assumption of stationarity in the observed responses
to provide a predictive relationship between the observed values of the response and historical
values of a covariate. In addition to providing a modeling methodology, we have established a
residual block bootstrap procedure that is applicable for these data and demonstrated that it
reduces the distortion due to discontinuities in residual functions when block bootstrapping is
applied naively. We have applied our model to a study of PM emissions from truck exhaust where
we have demonstrated its successful generalization properties.
A number of further areas remain to be investigated. The numerical biases associated with
the use of a basis expansion can be expected to be removed for finer knot sequences, so long as
in-fill asymptotics can be assumed for the sampling times of the response. Similarly, properties of
block-bootstrap methods have not, to our knowledge, been investigated in the context of functional
data. More interestingly, we have selected the length of convolution α via cross-validation. Since
this represents a model selection problem parameterized by a continuous parameter we expect
significant theoretical development to be required to study the properties of this choice. In the
context of the transport emissions study, the inclusion of multiple trucks will require the extension
of these models to a random-effects framework; some exploratory data analysis suggests there is
also considerable between-truck heteroscedasticity, representing additional modeling challenges.
REFERENCES
Ahn, K., H. Rakha, A. Trai, and M. Aerde (2002). Estimating vehicle fuel consumption and emissions based on instantaneous speed and acceleration levels. Journal of Transportation Engineering 128, 189–190.
Ajtay, D., M. Weilenmann, and P. Soltic (2005). Towards accurate instantaneous emission models.
Atmospheric Environment 39, 2443–2449.
Borelli, R. L. and C. S. Coleman (2004). Differential Equations: A Modeling Perspective. New
York: John Wiley & Sons.
Capiello, A., I. Chabini, E. Nam, A. Lue, and M. Zeid (2002). A statistical model of vehicle emissions and fuel consumption. Ford-MIT Alliance.
Carlstein, E., K.-A. Do, P. Hall, T. Hesterberg, and H. R. Kunsch (1998). Matched-block bootstrap
for dependent data. Bernoulli 4, 305–328.
Clark, N. N., M. Gautam, W. S. Wayne, D. W. Lyons, G. Thompson, and B. Zielinska (2007). Heavy-duty chassis dynamometer testing for emissions inventory, air quality modeling, source apportionment and air toxics emissions inventory: E55/59 all phases. Technical Report E55/59, Coordinating Research Council.
Genovese, C. (2000). A Bayesian time-course model for functional magnetic resonance imaging
data. Journal of the American Statistical Association 95, 691–703.
Golub, G. H. and C. F. van Loan (1996). Matrix Computations (3rd ed.). Baltimore: Johns Hopkins
University Press.
Greene, W. H. (2008). Econometric Analysis. New Jersey: Prentice Hall.
Hardle, W., J. Horowitz, and J.-P. Kreiss (2003). Bootstrap methods for time series. International
Statistical Review 71, 435–459.
Hoover, D. R., J. A. Rice, C. O. Wu, and L. P. Yang (1998). Nonparametric smoothing estimates
of time-varying coefficient models with longitudinal data. Biometrika 85, 809–822.
Laden, F., L. M. Neas, D. W. Dockery, and J. Schwartz (2000). Association of fine particulate matter from different sources with daily mortality in six U.S. cities. Environmental Health Perspectives 108, 941–947.
Lahiri, S. N. (2003). Resampling Methods for Dependent Data. New York: Springer.
Malfait, N. and J. O. Ramsay (2003). The historical functional linear model. Canadian Journal of
Statistics 31, 115–128.
McCreanor, J., P. Cullinan, and M. J. Nieuwenhuijsen (2007). Respiratory effects of exposure to diesel traffic in persons with asthma. New England Journal of Medicine 357, 2348–2358.
Paparoditis, E. and D. N. Politis (2001). Tapered block bootstrap. Biometrika 88, 1105–1109.
Ramsay, J. O., G. Hooker, and S. Graves (2009). Functional Data Analysis in R and Matlab. New
York: Springer.
Ramsay, J. O. and B. W. Silverman (2005). Functional Data Analysis. New York: Springer.
Rice, J. A. and B. W. Silverman (1991). Estimating the mean and covariance structure non-
parametrically when the data are curves. Journal of the Royal Statistical Society. Series B
(Methodological) 53 (1), 233–243.
Schwartz, J., F. Laden, and A. Zanobetti (2002). The concentration-response relation between PM2.5 and daily deaths. Environmental Health Perspectives 110, 1025–1029.
Zhang, C. and T. Yu (2008). Semiparametric detection of significant activation for brain fMRI. Annals of Statistics 36, 1693–1725.
Zhang, C., Y. Zhang, and T. Yu (2007). A comparative study of one-level and two-level semiparametric estimation of hemodynamic response function for fMRI data. Statistics in Medicine 26, 3845–3861.
A. ON THE IDENTIFIABILITY OF UNPENALIZED ORDINARY LEAST SQUARES ESTIMATES
In general, the design of covariate functions in functional data analysis has received little attention. Here the most common models fall into one of two categories; either being unidentifiable
in finite samples without penalization (smoothing and functional linear regression fall into this
category) or being always identifiable, as in the concurrent linear models, in which case penalization
is unnecessary apart from being a tool for regularization (see Ramsay and Silverman, 2005). The
functional convolution models studied here present a challenge in that identifiability depends on
the finite-sample design of the covariates. In this appendix, we illustrate the issues that arise.
We will choose to assume our coefficient functions βj(t) lie in the Sobolev space W[0, αj] of functions defined on [0, αj] for which all derivatives that appear in Lj are square integrable. Within
this space, minimizing SSE(β) is equivalent to solving the variational problem

$$\langle \gamma, G[\beta] \rangle = F(\gamma), \qquad \forall\, \gamma = (\gamma_1, \dots, \gamma_p) \in W[0, \alpha_1] \otimes \cdots \otimes W[0, \alpha_p]$$

where

$$F(\gamma) = \sum_{j=1}^{p} \sum_{i=1}^{n} \int_0^{\alpha_j} \gamma_j(u) \int_{\alpha^*}^{T_i} x_{ij}(t-u)\, y_i(t)\, dt\, du,$$

and

$$\big(G[\beta]\big)_j(u) = \sum_{k=1}^{p} \sum_{i=1}^{n} \int_0^{\alpha_k} \left[\int_{\alpha^*}^{T_i} x_{ij}(t-u)\, x_{ik}(t-v)\, dt\right] \beta_k(v)\, dv,$$

where the inner product is taken as the product L2 inner product on square-integrable functions

$$\langle \gamma, \beta \rangle = \sum_{j=1}^{p} \int_0^{\alpha_j} \gamma_j(u)\, \beta_j(u)\, du.$$
The identification of the OLS estimates is now equivalent to the invertibility of G. In particular, β will be uniquely identified if

$$\langle \gamma, G[\beta] \rangle = 0, \;\; \forall \gamma \quad \Rightarrow \quad \beta = 0$$

and in particular if

$$\langle \beta, G[\beta] \rangle = \sum_{i=1}^{n} \int_{\alpha^*}^{T_i} \left[\int_0^{\alpha_j} x_{ij}(t-v)\, \beta_j(v)\, dv\right] \left[\int_0^{\alpha_j} x_{ij}(t-u)\, \beta_j(u)\, du\right] dt > 0 \quad (A.1)$$

for all j and all non-zero βj.
In particular, since β lies in a reproducing kernel Hilbert space, we can let ξ1, . . . form a basis for W[0, α1] ⊗ · · · ⊗ W[0, αp] and (A.1) reduces to the requirement that

$$\int_0^{\alpha_j} \xi_{jl}(u)\, x_{ij}(t-u)\, du \neq 0 \quad (A.2)$$

for at least one t and i, for every l. This condition is not readily checked, particularly in real-world applications, since it requires checking an infinite collection of inner-products. However, it is
possible to characterize designs such that the collection

$$x_{ijt}(u) = x_{ij}(t + u) : [0, \alpha_j] \to \mathbb{R}$$

spans a finite-dimensional space as t is varied; i.e., for which there is a finite collection of functions η1(u), . . . , ηK(u) such that

$$x_{ijt}(u) = \sum_{k=1}^{K} c_k(t)\, \eta_k(u)$$

for all t. This set of self-similar functions can be expressed as solutions to a linear differential equation. In the lemma below we restrict to a single real-valued function for the sake of clarity and set αj = 1.
Lemma A.1. Let x(t) have continuous first derivatives. Then x satisfies

x(t+u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta_k(u) \qquad (A.3)

for all t and some ηk : [0, 1] → ℝ, if and only if

x(t) = \sum_{k=1}^{K} c_k\, t^{m_k} e^{a_k t} \sin(b_k t + d_k) \qquad (A.4)

for real-valued constants (ak, bk, ck, dk) and non-negative integers mk, k = 1, . . . , K.
Proof. We observe that

x(t+u+dt) = \sum_{k=1}^{K} \zeta_k(t+dt)\, \eta_k(u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta_k(u+dt)

and thus

x'(t+u) = \sum_{k=1}^{K} \zeta'_k(t)\, \eta_k(u) = \sum_{k=1}^{K} \zeta_k(t)\, \eta'_k(u) \qquad (A.5)

where x′(·) is the derivative taken with respect to its argument. Defining ul = (l − 1)/K for l = 1, . . . , K and matrices

X_{lk} = \eta_k(u_l), \qquad \dot{X}_{lk} = \eta'_k(u_l),

we can represent the last equality in (A.5), restricted to u1, . . . , uK, as the differential equation

\frac{d}{dt}\zeta = \left[ X^{-}\dot{X} \right] \zeta \qquad (A.6)

where X⁻ is a generalized inverse. Solutions to (A.6) have the general form of (A.4) (e.g., Borelli and Coleman, 2004), and thus x(t) must at least be of this form. It is easy to check that any function of the form (A.4) satisfies (A.3) for some K and (ζk, ηk, k = 1, . . . , K), completing the converse implication.
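Lemma A.1 is straightforward to check numerically. In the sketch below (our own illustration; the constants a = 0.1 and b = 2 are arbitrary choices), x(t) = e^{at} sin(bt) has the form (A.4) with K = 1, and the sine addition formula yields the decomposition (A.3) with two components, so the shifted functions x(t + ·) span a two-dimensional space.

```python
import numpy as np

# Numerical check of Lemma A.1 for x(t) = exp(a t) sin(b t), which has the
# form (A.4) with K = 1. The sine addition formula gives
#   x(t + u) = zeta1(t) eta1(u) + zeta2(t) eta2(u),
# so the shifts x(t + .) span a 2-dimensional space; (A.3) holds with K = 2.
# The constants a, b are arbitrary illustrative choices.

a, b = 0.1, 2.0

def x(s):
    return np.exp(a * s) * np.sin(b * s)

def zeta1(t): return np.exp(a * t) * np.sin(b * t)   # = x(t)
def zeta2(t): return np.exp(a * t) * np.cos(b * t)
def eta1(u):  return np.exp(a * u) * np.cos(b * u)
def eta2(u):  return np.exp(a * u) * np.sin(b * u)

u = np.linspace(0.0, 1.0, 201)
for t in (0.0, 0.7, 3.2):
    assert np.allclose(x(t + u), zeta1(t) * eta1(u) + zeta2(t) * eta2(u))
print("x(t + u) decomposes exactly into K = 2 self-similar components")
```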
It is not difficult to provide example designs that satisfy (A.2). Consider the basis of periodic functions on [0, 1] given by ξk(u) = sin(2kπu); then taking x(t) to be defined on [0, 10], say, with

x(t) = \sum_k 2^{-k}\, \xi_k(t),

x has non-zero inner product on [0, 1] with ξk(u) for each k. We note that the range of x(t) need not be restricted to [0, 1]. However, between the finite-dimensional designs described in Lemma A.1 and identifiable designs, it is possible to find xi(t) that is orthogonal to an infinite-dimensional subspace. Continuing our example, setting

x(t) = \sum_k 2^{-4k}\, \xi_{2k}(t)

will yield \langle x, \xi_k \rangle = 0 for all odd k, provided the domain of x is of integer length. Evaluating identifiability based on finite-dimensional approximations, as given for example in Appendix B, is straightforward. However, it seems more challenging to provide a protocol for designing experiments for which this model will be employed.
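This orthogonality is easy to verify numerically. The sketch below (our own illustration; the grid size and the truncation of the infinite sum are arbitrary choices) approximates the inner products ⟨x, ξk⟩ on [0, 1] for x(t) = Σk 2^{−4k} ξ2k(t), confirming that they vanish for odd k, the directions in which this design is uninformative.

```python
import numpy as np

# Numerical check of the example design. The grid size and the truncation of
# the infinite sum at k = 20 are arbitrary illustrative choices.
# Basis on [0, 1]: xi_k(u) = sin(2*k*pi*u). The design
# x(t) = sum_k 2^{-4k} xi_{2k}(t) uses only even-indexed basis functions.

u = np.linspace(0.0, 1.0, 10001)          # fine grid for quadrature
du = u[1] - u[0]

def xi(k, u):
    return np.sin(2 * k * np.pi * u)

x = sum(2.0 ** (-4 * k) * xi(2 * k, u) for k in range(1, 21))

for k in range(1, 7):
    ip = np.sum(x * xi(k, u)) * du        # approximates <x, xi_k> on [0, 1]
    print(f"<x, xi_{k}> = {ip:.6f}")
# Odd k give (numerically) zero inner products: a coefficient component
# along those basis directions cannot be identified from this design.
```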
Beyond identifiability, we have not addressed the question of the asymptotic properties of the design. Under stationarity conditions on ε(t), consistency of the OLS estimates is demonstrable if the left-hand side of (A.2) diverges. Here, however, this divergence can occur through an increased number of samples, or through increasing the domain Ti on which each sample is measured.
B. PENALIZED OLS FORMULAE
In the case that an approximation to the integrals in \mathbf{Z} and \mathbf{Y} is made based on second-by-second observational records, we can approximate

\mathbf{Z} = Z^{*\top} Z^*, \qquad \mathbf{Y} = Z^{*\top} Y

for

Z^* = \left[ Z_1^{*\top} \; \cdots \; Z_n^{*\top} \right]^\top

with

Z_i^* = \begin{bmatrix} 1 & \sum_{u=1}^{\alpha_1} x_{i1}(1-u)\,\phi(u)^\top & \cdots & \sum_{u=1}^{\alpha_p} x_{ip}(1-u)\,\phi(u)^\top \\ \vdots & & \vdots \\ 1 & \sum_{u=1}^{\alpha_1} x_{i1}(T_i-u)\,\phi(u)^\top & \cdots & \sum_{u=1}^{\alpha_p} x_{ip}(T_i-u)\,\phi(u)^\top \end{bmatrix}

and

Y = \left[ y_1^\top \; \cdots \; y_n^\top \right]^\top.
This turns the estimate c into a generalized ridge estimate:

c = \left( Z^{*\top} Z^* + P \right)^{-1} Z^{*\top} Y.

From this, residuals are readily calculated from

E = Y - Z^* c
and E can be broken down according to

E = \left[ \varepsilon_1^\top \; \cdots \; \varepsilon_n^\top \right]^\top

where εi = (ε_{i1}, . . . , ε_{iT_i}). From here, an autocovariance is calculated explicitly from
R(l) = \sum_{i=1}^{n} \sum_{|j-k|=l} \varepsilon_{ij}\, \varepsilon_{ik}
which produces a covariance for E given by

C_{jk} = \begin{cases} R(|j-k|) & \text{if } E_j \text{ and } E_k \text{ belong to the same observation} \\ 0 & \text{otherwise.} \end{cases}
We can then calculate the usual sandwich estimator

\operatorname{cov}\left( \beta_k(u), \beta_l(v) \right) = \Phi_k(u) \left( Z^{*\top} Z^* + P \right)^{-1} Z^{*\top} C\, Z^* \left( Z^{*\top} Z^* + P \right)^{-1} \Phi_l(v)^\top.
We note that this discrete setting also accounts for the case that yij = yi(tij) + εij where the εij
are approximately Gaussian measurement errors.
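As a concrete illustration of these formulae, the sketch below runs the full pipeline on simulated data: building Z*, computing the generalized ridge estimate, pooling residual autocovariances into the block covariance C, and forming the sandwich estimator. All specifics here (a single covariate, the polynomial basis φ, the penalty P, sample sizes, and noise level) are our own illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# End-to-end sketch of the Appendix B computations on simulated data with a
# single covariate (p = 1). The basis phi, penalty P, sizes, and noise level
# are illustrative choices.
rng = np.random.default_rng(0)
n, T, alpha, K = 4, 200, 20, 5           # records, record length, lag window, basis size

ugrid = np.arange(1, alpha + 1)          # discrete lags u = 1, ..., alpha
phi = np.vander(ugrid / alpha, K, increasing=True)   # (alpha, K) basis evaluations
beta_true = np.exp(-ugrid / 5.0)         # "true" convolution function on the lag grid
X = rng.standard_normal((n, T))          # simulated second-by-second covariate records

# Stack the design matrices Z*_i with rows [1, sum_u x_i(t-u) phi(u)^T]
rows, ys = [], []
for i in range(n):
    for t in range(alpha, T):
        lags = X[i, t - ugrid]           # x_i(t-1), ..., x_i(t-alpha)
        rows.append(np.concatenate(([1.0], lags @ phi)))
        ys.append(lags @ beta_true + 0.01 * rng.standard_normal())
Zstar, Y = np.array(rows), np.array(ys)
Ti = T - alpha                           # usable length of each record

# Generalized ridge estimate c = (Z*'Z* + P)^{-1} Z*'Y
P = 1e-2 * np.eye(K + 1)
P[0, 0] = 0.0                            # leave the intercept unpenalized
A = np.linalg.inv(Zstar.T @ Zstar + P)
c = A @ (Zstar.T @ Y)
beta_hat = phi @ c[1:]                   # estimated convolution function on the lag grid

# Residuals, pooled autocovariance R(l) (normalized here), block covariance C
E = (Y - Zstar @ c).reshape(n, Ti)
R = np.array([np.sum(E[:, : Ti - l] * E[:, l:]) for l in range(alpha)]) / E.size
lag = np.abs(np.subtract.outer(np.arange(Ti), np.arange(Ti)))
block = np.where(lag < alpha, R[np.minimum(lag, alpha - 1)], 0.0)
C = np.kron(np.eye(n), block)            # zero covariance across observations

# Sandwich estimator for cov(c), and pointwise standard errors for beta(u)
cov_c = A @ Zstar.T @ C @ Zstar @ A
var_beta = np.einsum('uk,kl,ul->u', phi, cov_c[1:, 1:], phi)
se_beta = np.sqrt(np.maximum(var_beta, 0.0))   # guard: truncated C may be indefinite

print("max |beta_hat - beta_true| =", float(np.max(np.abs(beta_hat - beta_true))))
```

Pointwise delta-method intervals for β(u) then follow as beta_hat ± 2 · se_beta on the lag grid.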
[Figure 1 image: two panels against Time (0–1200 s). Top: "Velocity Trajectory" (velocity, 0–60); bottom: "Particulate Matter (PM) Trajectory" (PM, −0.01 to 0.03). Legend lists the driving cycles MHDTHI, TEST_D, HHDDT_S, MHDTLO.]
Figure 1: E55/59 Data: A single truck run on four separate driving cycles (top) and the particulate
matter output (bottom). The velocity patterns are chosen to represent a variety of real-world
driving conditions.
[Figure 2 image: "AUTOCOVARIANCE" panel, autocorrelation (−0.2 to 1.0) against Lag (0–100); legend lists the four driving cycles MHDTHI, TEST_D, HHDDT_S, MHDTLO.]
Figure 2: The cross covariance between PM and velocity for each of four driving cycles. A cross-
dependence of around 20 seconds is indicated with reasonably consistent dependence across cycles.
[Figure 3 image: SSE surface (4.0–5.0) over Alpha (10–35) and Lambda (0 to 1e+08).]
Figure 3: The cross-validation surface over both λ and α. The straight line provides the optimal
value of α for each λ.
[Figure 4 image: top panel "Block Bootstrapping Result" (RV against TIME, 0–100; block residuals vs. transformed residuals); bottom two panels "Autocovariance" (covariance against lags 5–20; left: estimated vs. block-residual covariance; right: estimated vs. transformed block-residual covariance).]
Figure 4: Bootstrap Discontinuities. The top panel provides an example block-bootstrap residual before transformation (dashed) with the transformed residual (solid). The transformation smooths out evident discontinuities between blocks. Bottom two panels: left, the estimated auto-covariance function along with autocovariances estimated from untransformed bootstrap residuals; right, the estimated autocovariance of transformed bootstrapped residuals, which agrees much more closely with the original autocovariance function.
[Figure 5 image: two "Coefficients and CI" panels, coefficient against time (5–20); each shows the estimated coefficient with bootstrap and delta-method confidence intervals.]
Figure 5: Estimated convolution functions for acceleration (top) and velocity (bottom) indicating
an immediate effect of increased velocity along with a delayed damping from acceleration. Dashed
lines give delta-method confidence intervals while dotted lines provide intervals resulting from the
transformed block bootstrap.
[Figure 6 image: four "Predictions and Observations" panels of PM values against Time, one per driving cycle, annotated: PM Corr = 0.625, Predictions SSE = 0.134; Corr = 0.631, SSE = 0.065; Corr = 0.677, SSE = 2.851; Corr = 0.656, SSE = 1.144.]
Figure 6: Cross-validated performance predicting PM trajectories for the four driving cycles. Solid
lines give observed PM, dashed predictions. Correlation between the signals is around 0.65.