filling gaps evapotranspiration

8/2/2019 Filling Gaps Evapotranspiration

1/10

Filling gaps in evapotranspiration measurements for water budgetstudies: Evaluation of a Kalman filtering approach

Nasim Alavi a, Jon S. Warland a,*, Aaron A. Berg b

aDepartment of Land Resource Science, University of Guelph, Guelph, Canada N1G2W1bDepartment of Geography, University of Guelph, Guelph, Canada N1G2W1

Received 5 May 2006; accepted 29 September 2006

Abstract

Missing data in long-term eddy covariance measurements of latent heat flux produce errors in the estimation of evapotranspira-

tion and the water budget. Because no standard method of gap filling has been widely accepted, identification of optimal filling

methods for gaps is crucial for determining total evapotranspiration. In this study we evaluate the application of a Kalman filter for

filling gaps in latent heat flux data collected from an agricultural research station. The filtering approach was compared with several

gap-filling methods including mean diurnal variation, multiple regressions, 2-week average PriestleyTaylor coefficient, and

multiple imputation. The results demonstrated that a Kalman filtering approach developed using the relationship between latent heat

flux, available energy, and vapour pressure deficit provides a closer approximation of the original data and introduces smaller errors

than the other methods evaluated. Evaluation of the Kalman filter approach demonstrates the efficiency of this technique in

replacing data in both small and large gaps of up to several days.

# 2006 Elsevier B.V. All rights reserved.

Keywords: Eddy covariance; Missing data; Multiple imputation; Recursive estimation; Data filling

1. Introduction

With the proliferation of the eddy covariance system

(EC) in the 1980s, direct measurement of evapotran-

spiration (ET) has become very common (e.g. Aubinet

et al., 2000; Baldocchi and Meyers, 1998). Advantages of

this technique are high temporal resolution (e.g. half

hourly) and integration of the flux over a relatively large

area (footprint length of 1002000 m). The technique is

limited by missing or rejected measurements due to

system failures (system/sensor breakdown),maintenance

and calibration, and improper weather conditions. Today,

FluxNet, a world wide network equipped with eddy

covariance flux towers, is operating continuously to

collect water vapour and CO2 fluxes from more than 140

sites around the world (Baldocchi et al., 2001). However,

approximately 1750% of the observations in water

vapour fluxes are reported as missing or rejected at

FluxNet sites (Falge et al., 2001a).

ET measurements are used in estimation of the water

budget at daily, monthly and annual time scales.

Consequently, data gaps in observed ET must be

accurately filled. Accurate assessment of the water

budget is required in many studies including con-

taminant leaching, water available for plant growth and

irrigation scheduling. For example, in studies of nitrate

leaching from fertilized agricultural fields to ground

water, estimation of drainage often depends on knowl-

edge of ET for water budget estimation (e.g. McCoy

www.elsevier.com/locate/agrformetAgricultural and Forest Meteorology 141 (2006) 5766

* Corresponding author. Tel.: +1 519 824 4120;

fax: +1 519 824 5730.

E-mail address: [email protected] (J.S. Warland).

0168-1923/$ see front matter # 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.agrformet.2006.09.011
mailto:[email protected]://dx.doi.org/10.1016/j.agrformet.2006.09.011http://dx.doi.org/10.1016/j.agrformet.2006.09.011mailto:[email protected]


2/10

et al., 2006). In many studies, groundwater or soil

moisture contributions are estimated as a residual of the

water budget equation. Inaccurate estimation of ET

perpetuates errors into other aspects of the water budget

and its applications. Thus, accurate gap-filling proce-

dures need to be established to provide complete data

sets for ET with a minimization of errors.Numerous methods have been proposed to fill gaps

in eddy covariance data but no standard method has

been widely accepted. The commonly used methods for

filling gaps in water vapour (or latent heat flux)

measurements include: the mean diurnal variation

method (Falge et al., 2001a,b), regression methods

(Greco and Baldocchi, 1996; Berbigier et al., 2001),

evaluation of a 2-week average PriestleyTaylor

coefficient (Wilson and Baldocchi, 2000; Wilson

et al., 2001), look-up tables (Falge et al., 2001b) and

multiple imputation (Hui et al., 2004).Results from different gap-filling methods vary

widely. Falge et al. (2001b) filled gaps in annual ET

using mean diurnal variation and look-up tables for 28

data sets from 18 FluxNet sites and found that the

differences in annual ET among methods ranged from

48 to +86 mm per year. Although several gap-fillingtechniques have been evaluated in replacing missing

half hour data (Falge et al., 2001a, b), they have not been

evaluated for longer gaps of order of days.

Jarvis et al. (2004) have recently demonstrated the

application of a recursive parameter estimation algo-rithm based on a Kalman filter to fill gaps in CO2 flux

data. An approach similar to this technique may be

suitable for water vapour flux. However, the efficiency

of this technique has not been examined. The Kalman

filter is a data assimilation technique that has been

applied in various studies. Williams et al. (2005) used an

ensemble Kalman filter to evaluate the carbon budget

over a ponderosa pine forest. Other studies have

examined the ability of Kalman filters to retrieve the soil

moisture profile by assimilation of near-surface soil

moisture measurements (e.g. Entekhabi et al., 1994;

Reichle et al., 2002). In a study of the surface energybalance, Boni et al. (2001) applied a Kalman filtering

approach to satellite measurements of surface tempera-

ture to estimate the components of the energy balance.

In this study we compared several gap-filling

techniques including mean diurnal variation, 2-week

average PriestleyTaylor coefficient, regression

method, multiple imputation, and recursive parameter

estimation algorithms based on the Kalman filter for

filling gaps in latent heat flux data. Second, we

examined the application of recursive parameter

estimation algorithms for filling gaps in latent heat

flux data in replacing missing values over short and

large gaps in data. We also evaluated the efficiency of

this technique in improving the annual and monthly

estimation of ET.

2. Materials and methods

2.1. Site description and data collection

The experimental site was located at the University

of Guelphs Elora Research Station (438390, 808250),

25 km northwest of Guelph, Ontario, Canada. Data used

in this study were collected in 2002 over a 18 ha winter

wheat field. Measurement of water vapour flux was

conducted with two eddy covariance systems. Each

system consisted of a Campbell Scientific CSAT3 three-

dimensional sonic anemometer and a LICOR-7500

open path CO2/H2O analyzer. Net radiation (Rn) andsoil heat flux (G) were measured using net radiometers

(CNR1 Kipp and Zonen, Delft, the Netherlands) and

nine soil heat flux plates (five Model HFT-1, Radiation

and Energy Balance Systems Inc., Seattle WA. and four

Model 610, Thornthwaite Associates, Elmer, NJ). Air

temperature (Ta) was estimated from measurements by

the sonic anemometer and corrected for vapour density.

2.2. Preparation of data

2.2.1. Eddy covariance dataQuality control was applied to the EC data to exclude

potentially non-representative flux measurements.

Screening criteria for mean velocityu, friction velocity(u*), and wind direction were applied. During times of

weak turbulence or low wind speeds, often experienced at

night, mixing is weak and the flux is underestimated.

Therefore, we rejected measurements with mean wind

u< 1 m=s andwhen thefrictionvelocityu* < 0.1 m/s. Toavoid tower shadowing (Lee and Hu, 2002), periods

where the angle between the mean horizontal wind and

sonic axis was more than 1508 were rejected.

The two EC systems were mounted on the sametower during the study period, therefore, the data series

of two systems were merged to one time series by

averaging two data series when they both had data or by

using whichever data was available in the event that one

system was not operating.

In 2002 the amount of missing data in latent heat flux

(lE) was 17% (due to maintenance and system failures).

The quality tests discarded 27% of the measured lE

data, leading to total gap percentage of 44% in 2002

with 36% for day and 54% for night observations.

Fifty-five percent of the total gaps were less than 3 h.

N. Alavi et al. / Agricultural and Forest Meteorology 141 (2006) 576658


3/10

There were two long gaps of 7 and 14 days in summer

2002 due to field operations.

2.2.2. Meteorological variables

The climatic variables, including Rn, G, D, Ta, were

missing 4, 5.5, 17 and 17% of observations respectively,

during 2002. Gaps in the Ta, Rn, and vapor pressure (e)time series were filled with the temperature, relative

humidity and net radiation measurements also available

at a nearby weather station (approximately 200 m

away). Missing values in G were replaced by using a 20-

day running linear regression between G and Rn.

2.3. Gap-filling methods

The gap-filling methods examined here include:

mean diurnal variation (MDV), multiple regressions

(Regr.), 2-week average Priestley and Taylor coefficient(PT), multiple imputation (MI), and two recursive

parameter estimation algorithms using a Kalman filter.

In this study we evaluated the recursive parameter

estimation algorithms as new approaches to gap filling

of ET measurements by comparing it with the other

methods. The other filling methods where selected

because they are commonly used and they each have

some advantage. MDV does not require climatic data,

therefore it can be used if meteorological variables are

not available. Regr. and PT methods consider the impact

of climatic variables on ET and capture the response ofET to climatic forcing. MI also preserves the relation-

ships between lE and climatic variables. Furthermore,

this method can impute missing climatic variables and

missing lEat the same time and provides a confidence

level for each imputed value.

2.3.1. Mean diurnal variation

In this method, a missing observation is replaced by

the mean for that time of day based on observations

from the previous and subsequent days (Falge et al.,

2001a,b). Derivation of the mean for the missing data

depends on the length of the time interval of averaging(window size). A window size of 415 days is usually

used as the averaging interval. Small window sizes (less

than 4 days) were not sufficient to determine a mean

from the measurement and not recommended by

previous studies (Falge et al., 2001a). Larger window

sizes were not considered, because nonlinear depen-

dence of lE on environmental variables introduces

errors through averaging. In this study, an averaging

interval of7 days for the missing daytime data day,and 14 days for nighttime data, was found to create

the lowest errors, and was therefore used in this study.

2.3.2. Multiple regression

In this method a multiple regression relationship was

established between latent heat flux and its main

controlling factors including available energy (Rn G)and vapour pressure deficit (D), for some period around

the missing data. The resulting equations were then

used to fill the missing latent heat flux values. In thisstudy, a regression relationship for each missing value

was calculated using the data from the adjacent 10days (where 10 days resulted in the highestdetermination). If the coefficient of determination

(R2) was greater than 0.5, the resulting equations were

used to fill the gap, otherwise the gap remained unfilled.

For long gaps (gaps larger than 5 days), data from a

window of20 days were used.

2.3.3. Two-week average Priestley and Taylor

coefficientMissing measurements of half hourly latent heat flux

were estimated using the product of equilibrium

evaporation for the half hour and the 2-week average

Priestley and Taylor coefficient (Wilson and Baldocchi,

2000; Wilson et al., 2001). Equilibrium evaporation was

obtained using lEeq D=D gRn G, whereD is the slope of saturated vapour pressure with

temperature (K), g the psychrometric constant and l is

the latent heat of vapourization. For each missing value

oflE, the total measured lE and estimated lEeq were

obtained during a 2-week period before and after themissing value of lE and the Priestley and Taylor

coefficient (a lE=lEeq, Priestley and Taylor, 1972)was calculated. At the time of the missing lE

measurement the product of a and lEeq was used to

estimate missing lE following Wilson and Baldocchi

(2000).

2.3.4. Multiple imputation

Multiple imputation (MI) is a statistical technique

for analyzing incomplete data sets (Hui et al., 2004). In

this technique, the missing entries of the incomplete

data sets are imputed m times. The variation among them imputations reflects the uncertainty with which the

missing values can be predicted from the observed ones.

In order to generate imputations for the missing values,

a probability model must be imposed on the complete

data set (observed and missing values). Once the model

is selected, mean vector and covariance matrices of the

incomplete data set can be generated using the

expectation and maximization algorithm (EM). The

EM algorithm is a common technique for fitting models

to incomplete data and estimating mean and covariance

matrices (Rubin, 1987; Schafer and Olsen, 1998). Then

N. Alavi et al. / Agricultural and Forest Meteorology 141 (2006) 5766 59


4/10

a data augmentation algorithm uses the initial values

obtained from the EM algorithm and generates the

imputed values. Further details of this method are given

in Hui et al. (2004) and Schafer (1997).

For this study, we assumed multivariate normal

distribution for observed lEand the climatic variables.

A number of previous studies (e.g. Schafer, 1997) haveshown that MI is robust to violations from the normality

of values used in the analysis if the amount of missing

values is not large. We examined the assumption of

multivariate normal distribution by transforming data

using the BoxCox transformation method provided in

SAS software version 9.1 (SAS Institute Inc., 2004)

following Hui et al. (2004). Comparison of the results of

transformed data with non-transformed data indicated

that the annual estimations changed only slightly (about

0.03%). We also assumed that data were missing at

random (MAR). Under this assumption, the probabilitythat data are missing may depend on observed data but

not on ones that are missing. This assumption allows us

to first estimate the relationships among variables from

the observed data, and then use these relationships to

obtain unbiased predictions of the missing values from

the observed data. This assumption is plausible for the

lE data since most of the missing data were due to

improper wind conditions (e.g. low u*) or malfunction

of the instruments and only weakly related to lE itself

(e.g. when condensation occurs on the gas analyzer). In

our evaluation procedure (Section 2.4) this assumptionis strictly valid because data are removed randomly.

A matrix of observations oflEand available energy

(Rn G), Ta, and D was used to conduct MI analysis.MI analysis was conducted using SAS software version

9.1 (SAS Institute Inc., 2004). The analysis used an EM

algorithm (maximum number of iteration used in EM

was set to 500) for initial value estimations and five

imputations were created using a data augmentation

algorithm, Markov Chain Monte Carlo (MCMC). Rubin

(1987) and Schafer and Olsen (1998) showed that unless

the rate of missing information is very high only a few

imputations (35) are enough to estimate missing dataand there is little advantage to producing and analyzing

more than a few imputed datasets. The average of the

five imputed values was used to fill each missing

observation.

2.3.5. Recursive parameter estimation algorithms

This method is based on the Kalman filter and

smoothing techniques developed by Young and co-

workers (Young, 1999; Young and Pedregal, 1999;

Young et al., 2004). In this method, the system is

described by a proper model with unknown parameters

(observation model). Temporal variations of parameters

in this model are assumed to be described by a

generalized random-walk model (state model). Having

introduced these two models, a forward pass filtering

(Kalman filter) is applied to the data to predict time

variable model parameters. The estimates obtained

from the Kalman filter are updated using a backwardrecursive smoothing algorithm working through the

data in reverse temporal order. Based on the type of

observation model, different methodologies have been

introduced for applying recursive parameter estimation.

In this study dynamic linear regression (DLR) and state

dependent parameters (SDP) were applied to the lE

data using the Captain Toolbox for MATLAB (Young

et al., 2004).

In the DLR algorithm, parameters are considered to

vary with time. In this study, the regression model was

expressed as the relationship between lE, net radiation(Rn), and vapour pressure deficit (D):

lEt atRnt Gt btDt zt (1)

Here a(t) and b(t) are the model parameters and jis

the regression model error series that is zero mean

(white noise). The DLR parameters can be seen as

coefficients in the PenmanMontheith equation:

lED

D gRn G

raCp

raD gD

where g g1 rc=ra, ra is the air density, Cp thespecific heat at constant pressure, ra the aerodynamic

resistance, rc the canopy resistance and g is the psy-

chrometric constant.

The stochastic evolution of each parameter in (1) is

assumed to be described by the following random walk

process (Young, 1999):

at at 1 hat; bt bt 1 hbt

(2)

and ha, hb are zero mean, white noise error series.

In the SDP algorithm, parameters are dependent on

air temperature (Ta) as well as time:

lEt aTa; tRnt Gt bTa; tDt zt

(3)

where

aTa; t aTa; t 1 hat;

bTa; t bTa; t 1 hbt(4)

and ha, hb are zero mean, white noise error series. We

assumed that the evolution of the model parameters is



5/10

nonstationary stochastic as described by a simple ran-

dom walk process. This ensures that the recursive

parameter estimates at any sample time t depends only

on data in the vicinity of this sample. This weighting

effect has a Gaussian shape which applies the maximum

weight at sample time twith declining weight each side.

The bandwidth of the Gaussian window is defined bythe noise variance ratio (NVR) (NVR = s2(z(t))/

s2(h(t)). The required NVRs are optimized via max-

imum likelihood error decomposition introduced by

Young (1999).

When using the SDP method the state dependent

parameters may vary with Ta at a rate commensurate

with the temporal variations in the input series, thus a

simple random walk cannot be assumed to describe the

parametric variation over time. Therefore, before

estimation of parameter variation, the time series of

lE, Rn, G, and D were sorted with respect to theascending order of Ta. With sorting, the rapid natural

variations in the input series are effectively eliminated

from the data and a simple random walk process utilized

to describe their evolution (Jarvis et al., 2004).

2.4. Evaluation of gap-filling methods

We created random and systematic gaps in the data

sets and applied each of the methods to fill these gaps.

The analysis used four bimonthly data sets (February

March, AprilMay, JuneJuly and SeptemberNovem-ber for the 2002 data) with low percentages (2535%)

of missing and rejected data. Daytime and nighttime

data were treated separately. Half hourly data were

deleted randomly until 20% (about 100180 data

points) of the original daytime data was removed from

the bimonthly dataset. Then each of the gap-filling

methods was applied to fill these artificial gaps. This

procedure was repeated for the nighttime data. To assess

the performance of each we examined the potential

error associated with each method. Root mean square

error (RMSE), coefficient of determination (R2),

relative root mean square error (R_RMSE), and relativebias (R_bias) were calculated. The absolute error for

each method was calculated as the measured minus the

computed value for each artificially missed data and

was used to calculate the total RMSE for the each

bimonthly data set. It was divided by the mean oflEfor

the data set to calculate R_RMSE. R2 was computed

between the observed and filled data for each bimonthly

data set and R_bias was obtained by dividing the bias by

the mean oflE for each data set. The entire procedure

was performed three times and the average values

reported here.

3. Results and discussion

3.1. Results of filling artificial gaps in data

Table 1 compares the performance of the gap-filling

methods. For the daytime data, root mean square errors

in each bi-monthly period range from 10.5 to 70.3 W/m2. The DLR technique resulted in the smallest error

(average RMSE = 17.2 W/m2) and largest agreement

coefficient (average R2 = 0.93), while the largest errors

(average RMSE = 54.1 W/m2) were found with the PT

method. The smallest agreement coefficients (average

R2 = 0.5) were obtained with MDV. Unlike the other

methods MDV, does not consider the impact of

meteorological variables on lE so it can be used when

meteorological data are not available.

The root mean square errors for nighttime gaps

ranged from 5.1 to 38.4 W/m2

and the agreementcoefficients were all less than 0.5. Comparing the

different methods, the errors were smallest (average

RMSE = 11.3 W/m2) with DLR whereas PT resulted in

the largest error (33.7 W/m2). RMSEs are larger during

summer and daytime than winter and nighttime, but the

fluxes are also larger. Therefore we considered the

relative RMSE to make the comparison between

different seasons and time. Normalizing the RMSE

with the mean oflEfor each dataset (relative root mean

square error) shows that all gap-filling methods resulted

in smaller relative errors (0.10.5) during summer(JuneJuly) and spring (AprilMay) than fall (October

November) and winter (FebruaryMarch) (0.31.5).

Also better agreement between computed and original

values was obtained during summer and spring daytime

(greater than 70% except for MDV).

R_bias was mostly less than 5% for daytime datasets

except for the PT method. This shows that the error

introduced by the different methods is essentially

random.

The gaps in daytime data during the summer can be

filled with smaller relative errors than nighttime and

wintertime gaps. The large relative errors associated withfilling missing nighttime data is primarily related to the

large number of gaps in nighttime data and unreliable

measurements during nighttime. However, due to the low

nighttimevalue oflE, the error introducedby application

of different filling method may not influence the total

annual or monthly estimation oflE significantly.

For the evaluation process, gaps in the data were

created randomly. This may not be a true representation

of reality because real gaps in data may not be randomly

distributed. Most large gaps in the data were due to

unsuitable weather conditions or field operations. These



6/10


Table 1

Root mean square error (RMSE), agreement coefficient (R2), relative root mean square error (R_RMSE), and relative bias (R_bias) of gap filled lE

(W/m2)

Regr. MDV PT DLR SDP MI

Daytime data

RMSE

AprilMay 2002 30.22 55.08 42.67 19.46 28.83 40.41JuneJuly 2002 33.68 70.26 53.59 22.80 44.67 52.40

OctoberNovember 2002 20.20 29.52 61.15 16.19 22.98 29.14

FebruaryMarch 2002 20.64 27.96 59.15 10.47 21.09 18.13

Average 26.18 45.70 54.14 17.23 29.39 35.02

R2

AprilMay 2002 0.83 0.44 0.79 0.93 0.85 0.72

JuneJuly 2002 0.93 0.67 0.86 0.97 0.87 0.81


FebruaryMarch 2002 0.71 0.46 0.51 0.92 0.68 0.74

Average 0.82 0.54 0.68 0.93 0.79 0.72

R_RMSE

AprilMay 2002 0.29 0.54 0.42 0.19 0.28 0.39

JuneJuly 2002 0.19 0.39 0.29 0.13 0.25 0.29


FebruaryMarch 2002 0.53 0.72 1.53 0.27 0.55 0.50

Average 0.35 0.56 0.86 0.23 0.38 0.43

R_bias

AprilMay 2002 0.02 0.04 0.09 0.02 0.05 0.04JuneJuly 2002 0.03 0.00 0.05 0.00 0.00 0.02OctoberNovember 2002 0.02 0.05 0.54 0.04 0.06 0.02FebruaryMarch 2002 0.04 0.03 0.80 0.01 0.01 0.00Average 0.04 0.02 0.37 0.02 0.03 0.00

Nighttime data

RMSE

AprilMay 2002 13.62 15.83 25.76 11.41 13.25 22.51

JuneJuly 2002 17.88 17.83 38.43 15.61 20.35 29.29


FebruaryMarch 2002 9.34 10.52 32.96 5.15 9.35 12.83

Average 14.22 15.86 33.66 11.32 14.89 20.96

R2

AprilMay 2002 0.23 0.00 0.19 0.41 0.27 0.03

JuneJuly 2002 0.20 0.00 0.02 0.35 0.18 0.00


FebruaryMarch 2002 0.27 0.04 0.02 0.75 0.24 0.02

Average 0.22 0.01 0.07 0.49 0.22 0.03

R_RMSE

AprilMay 2002 0.88 1.03 1.67 0.74 0.86 1.52JuneJuly 2002 1.07 1.06 2.29 0.93 1.21 1.76


FebruaryMarch 2002 0.99 1.11 3.48 0.55 0.99 1.47

Average 1.02 1.14 2.53 0.79 1.06 1.54

R_bias

AprilMay 2002 0.16 0.09 1.28 0.04 0.21 0.10JuneJuly 2002 0.53 0.14 1.84 0.40 0.74 0.53OctoberNovember 2002 0.09 0.05 1.94 0.16 0.36 0.05FebruaryMarch 2002 0.15 0.12 2.69 0.01 0.02 0.05Average 0.03 0.10 1.94 0.13 0.31 0.11


7/10

gaps may introduce biases by their non-random

distribution. Therefore the resultant errors with artificial

gaps may not be equivalent to the errors from

application of filling methods to the real gaps. To

evaluate the application of the gap-filling methods over

longer gaps as would be expected from bad weather

conditions or instrument failure, in the followingsection we assess the top three methods (Regr., DLR

and SDP) for long gaps on order of few hours to days.

Lack of energy balance closure is a common problem

in measuring turbulent fluxes with EC technique. It has

become generally accepted that the turbulent fluxes

(sensible heat and latent heat fluxes) are underestimated

by approximately 1030% relative to estimate of

available energy (Wilson et al., 2002; Aubinet et al.,

2000). Closure has not been considered in this analysis,

however since the gap-filling methods are essentially

interpolative, correcting existing data for closure wouldproduce an equal change in the filled data.

3.2. The Kalman filter approach to filling long gaps

This section examines the efficiency of SDP, DLR

and Regr. methods in replacing data in large gaps by

creating 3, 6, 12, 18 h, and 1, 3, 5 and 7 day gaps in the

dataset. As an example, the results for a 3-day gap in

May 2002 are shown in Fig. 1. For this period the RMSE

for DLR was 8.2 W/m2 and for SDP 13.4 W/m2, while it

was 20.6 W/m2 for Regr. Next, a 1-day moving gap was

created throughout the growing season (MaySeptem-

ber 2002). Fig. 2 shows the resulting RMSE for eachday of growing season 2002. In general, the values of

RMSE are smaller for DLR than Regr. and SDP. The

average daily RMSE was 22.4 W/m2 with DLR,

33.2 W/m2 with SDP, and 30.2 W/m2 with Regr. This

procedure was repeated using moving gap sizes of 3, 6,

12, 18 h and 2, 3, 5, 7 days over MaySeptember 2002.

Fig. 3 shows the average RMSE for different gap sizes.

As the size of gap increases, mean RMSE of DLR, SDP

and Regr. increases. DLR outperformed the other

methods for all gap sizes and has the smallest mean

RMSE while SDP has the greatest mean RMSE for allgap sizes. The difference in mean RMSE between Regr.

and DLR is 6.8 W/m2 for 3 h gaps and decreases to

3.4 W/m2 for a gap size of 7 days.

In the SDP method, the lEtime series are sorted out

of temporal order. Jarvis et al. (2004) claimed that in

this case the systematic gaps in the time series become


Fig. 1. Comparison of gap filled lE (*) by (a) DLR, (b) SDP, and (c) Regr. with measurements () for 1821 May 2002.


8/10

much smaller nonsystematic gaps in the sorted

temperature space and these smaller gaps can be filled

with smaller errors. Therefore, we should expect better

estimation of data in longer gaps using SDP. However,

in this study the SDP approach gave larger errors for all

gap sizes compared to DLR. In the SDP algorithm the

time series of Rn, G and D were also sorted with respect

to the ascending order of Ta. Although there is some

correlation of these variables with air temperature, otherfactors are also significant. The assumption that Ta is the

dominant control may lead to greater errors in SDP

analysis.

Including more variables in the algorithm would

increase the complexity of the model and result in large

errors comparing to simpler DLR algorithm.

Application of the DLR method provides a very good

estimation of missing values in lE since it is able to

reproduce most variation in lE due to different factors

(e.g. net radiation, vapour pressure). It provides smallerRMSEs (3.46.8 W/m2, with the lower values for gap


Fig. 2. RMSE of (a) Regr., (b) DLR, and (c) SDP for 1 day moving gap in growing season 2002.

Fig. 3. Mean RMSE of Regr., SDP, and DLR for filling in 3, 6, 12, 18 h and 1, 2, 3, 5, 7 days moving gaps in growing season 2002.


9/10

sizes of 3 h to 7 days) than Regr. for estimating lE.

Similar to the analysis over random gaps, the DLR

method was the most accurate for filling the larger

artificial gaps.

3.3. Estimation of annual and monthly sum of ET

Annual sums of ET were computed by filling the real

gaps with the Regr. and DLR methods. Regr. method

was applied to data with a window of10. Since somegaps were not filled because of low agreement

coefficients (mostly in winter time) we repeated the

application of Regr. using windows of 20 and 30days to fill data left unfilled by the 10-day window.

The annual estimation of ET was 606 mm by Regr.,

and it was 585 by DLR which differ by 21 mm (3%).

Although the application of Regr. and DLR did not

affect the annual sum very much, estimating ET duringa shorter period of time might be highly affected by the

filling method.

The results for monthly estimation of ET for year

2002 showed that the difference in monthly ET filled by

Regr. and by DLR ranged from 1 to 8 mm (0.215%) for

different months. The absolute difference was small

during winter months (0.21.5 mm) while it was larger

in the growing season (48 mm). Therefore, the

selection of proper gap-filling method is crucial in

estimating monthly ET during growing season.

4. Summary

Several gap-filling methods were applied to eddy

covariance data of latent heat flux from an agricultural

site. The methods were evaluated based on their ability

to fill artificial data gaps. The recursive parameter

estimation approach based on the relation between

latent heat flux and available energy and vapour

pressure deficit resulted in better estimation of the

original data and smaller errors than the MI, PT and

MDV, and Regr. methods.

The recursive parameter estimation approach withthe DLR algorithm produced the smallest errors for both

short and long (hours to days) gaps. The relative errors

were smaller for daytime data during the growing

season, when ET is high, than for nighttime and winter

data. The application of this method can improve the

estimation of total ET over a period of time since it

results in smaller errors than other methods, assuming

no significant bias between missing and present data

(MAR is valid).

While the application of different gap-filling

techniques does not greatly affect influence the total

annual sum, the total ET for shorter periods of time

(month or days) can be highly affected by choice of gap-

filling technique. Total ET during the growing season,

when ET is high, is more sensitive to choice of gap-

filling method and so proper selection of a method is

crucial at this time. Since the evaluation process based

on filling artificial gaps showed that the resultant errorwith DLR was smaller than with other methods,

application of this technique should provide better

estimation of total ET, especially over a short period of

time and during the growing season.

Due to the efficiency of the Kalman filtering approach

in replacing data in both small and large gaps, and

providing the best approximation of the original data, the

application of this technique is recommended as a gap-

filling approach for estimation of total ET, as is further

study of these techniques on the other contexts.

Acknowledgements

This research project was funded by the grants from

the Ontario Ministry of Agriculture and Food and the

Natural Sciences and Engineering Research Council of

Canada. We are very grateful for the thoughtful and

detailed comments of the two anonymous reviewers.

References

Aubinet, M., Grelle, A., Ibrom, A., Rannik, U., Moncrieff, J., Foken,T., Kowalski, A.S., Martin, P.H., Berbigier, P., Bernhofer, Ch.,

Clement, R., Elbers, J., Granier, A., Grunwald, T., Morgenstern,

K., Pilegaard, K., Rebmann, C., Snijders, W., Valentini, R., Vesala,

T., 2000. Estimatesof the annual netcarbon and water exchange of

European forests: the EUROFLUX methodology. Adv. Ecol. Res.

30, 113175.

Baldocchi, D., Meyers, T., 1998. On using eco-physiological, micro-

meteorological and biogeochemical theory to evaluate carbon

dioxide, water vapour and trace gas fluxes over vegetation: a

perspective. Agric. For. Meteorol. 90, 125.

Baldocchi, D., Falge, E., Gu, L., Olson, R., Hollinger, D., Running, S.,

Anthoni, P., Bernhofer, C., Davis, K., Evans, R., Fuentes, J.,

Goldstein, A., Katul, G., Law, B., Lee, X., Malhi, Y., Meyers,

T., Munger, W., Oechel, W., Paw, U.K.T., Pilegaard, K., Schmid,

H.P., Valentini, R., Verma, S., Vesala, T., Wilson, K., Wofsy, S.,

2001. FLUXNET: a new tool to study the temporal and spatial

variability of ecosystem-scale carbon dioxide, water vapour and

energy flux densities. Bull. Am. Meteorol. Soc. 82, 24152434.

Berbigier, P., Bonnefond, J.M., Mellmann, P., 2001. CO2 and water

vapour fluxes for 2 years obove Euroflux forest site. Agric. For.

Meteorol. 108, 183197.

Boni, G.,Entekhabi, D.,Gastelli, F., 2001. Land data assimilationwith

satellite measurements for the estimation of surface energy bal-

ance components and surface control on evaporation. Water

Resour. Res. 37 (6), 17131722.

Entekhabi, D., Nakamura, H., Njoku, E.G., 1994. Solving the inverse

problem for soil moisture and temperature profiles by sequential



10/10

assimilation of multifrequency remotely sensed observations.

IEEE Trans. Geosci. Remote Sensing 32 (2), 438448.

Falge, E., Baldocchi, D.D., Olson, R., Anthoni, P., Aubinet, M.,

Bernhofer, C., Burba, G., Ceulemans, R., Clement, R., Dolman,

H., Granier, A., Gross, P., Grunwald, T., Hollinger, S., Jensen, N.-

O., Katul, G., Keronen, P., Kowalski, A., Lai, C.T., Law, B.E.,

Meyers, T., Moncrieff, J., Moors, E., Munger, J.W., Pilegaard, K.,

Rannik, U., Rebmann, C., Suyker, A.,Tenhunen,J., Tu, K.,Verma,S., Vesala, T., Wilson, K., Wofsy, S., 2001a. Gap filling strategies

for defensible annual sums of net ecosystem exchange. Agric. For.

Meteorol. 107, 4369.

Falge, E., Baldocchi, D.D., Olson, R., Anthoni, P., Aubinet, M.,

Bernhofer, C., Burba, G., Ceulemans, R., Clement, R., Dolman,

H., Granier, A., Gross, P., Grunwald, T., Hollinger, S., Jensen, N.-

O., Katul, G., Keronen, P., Kowalski, A., Lai, C.T., Law, B.E.,

Meyers, T., Moncrieff, J., Moors, E., Munger, J.W., Pilegaard, K.,

Rannik, U., Rebmann, C., Suyker, A.,Tenhunen,J., Tu, K.,Verma,

S., Vesala, T., Wilson, K., Wofsy, S., 2001b. Gap filling strategies

for long term energy flux data sets. Agric. For. Meteorol. 107,

7177.

Greco, S., Baldocchi, D.D., 1996. Seasonal variations of CO2 water

vapour exchange rates over a temperate deciduous forest. Global

Change Biol. 2, 183197.

Hui, D., Wan, S., Su, B., Katul, G., Monson, R., Luo, Y., 2004. Gap-

filling missing data in eddy covariance measurements using multi-

ple imputation (MI) for annual estimations. Agric. For. Meteorol.

121, 93111.

Jarvis, A.J., Stauch, V.J., Schulz, K., Young, P.C., 2004. The seasonal

temperature dependency of photosynthesis and respiration in two

deciduous forests. Global Change Biol. 10, 939950.

Lee, X.H., Hu, X.Z., 2002. Forest-air fluxes of carbon, water and

energy over non-flat terrain. Boundary-Layer Meteorol. 103 (2),

277301.

McCoy, A.J., Parkin, G.W., Wagner-Riddle, C., Warland, J.S., Lauzon,

J., von Bertoldi, P., Fallow, D., Jayasundara, S., 2006. Usingautomated soil water content and temperature measurement sys-

tems to estimate soil water budgets. Can. J. Soil Sci. 86, 4756.

Priestley, C.H.B., Taylor, R.J., 1972. On theassessment of surface heat

flux and evaporation using large-scale parameters. Monthly

Weather Rev. 100, 8192.

Reichle, R., McLaughlin, D.B., Entekhabi, D., 2002. Hydrologic data

assimilation with the ensemble Kalman filter. Monthly Weather

Rev. 130 (1), 103114.

Rubin, D.B., 1987. Multiple Imputation for Nonresponse Surveys.

John Wiley & Sons, New York.Schafer, J.L., Olsen, M.K., 1998. Multiple imputation for multivariate

missing-data problems: a data analysts perspective. Multivariate

Behav. Res. 33, 545571.

Schafer, J.L., 1997. Analysis of Incomplete Multivariate Data. Chap-

man and Hall, London.

Williams, M., Schwarz, P.A., Law, B.E., Irvine, J., Kurpius, M.R.,

2005. An improved analysis of forest carbon dynamics using data

assilmilation. Global Change Biol. 11, 89105.

Wilson, K.B., Baldocchi, D.D., 2000. Seasonal and interannual varia-

bility of energy fluxes over a broadleaved temperate deciduous

forest in North America. Agric. For. Meteorol. 100, 118.

Wilson, K.B., Hanson, P.J., Mulholland, P.J., Baldocchi, D.D., Wulls-

chleger, S.D., 2001. A comparison of methods for determining

forest evapotranspiration and its components: sap-flow, soil water

budget, eddy covariance and catchment water balance. Agric. For.

Meteorol. 106, 153168.

Wilson, K.B., Goldstein, A.H., Falge, E., Aubinet, M., Baldocchi, D.,

Berbigier, P., Bernhofer, Ch., Ceulemans, R., Dolman, H., Field, C.,

Grelle, A., Law, B., Meyers, T., Moncrieff, J., Monson, R., Oechel,

W., Tenhunen, J., Valentini, R., Verma, S., 2002. Energy balance

closure at FLUXNET sites. Agric. For. Meteorol. 113, 223243.

Young, P.C., 1999. Nonstationary time series analysis and forecasting.

Progr. Environ. Sci. 1, 348.

Young, P.C., Pedregal, D.J., 1999. Recursive and en-bloc approaches

to signal extraction. J. Appl. Stat. 26, 103128.

Young, P.C., Taylor, C.J., Tych, W., Pedregal, D.J., McKenna, P.G.,

2004. The Captain Toolbox. Centre for Research on Environmen-tal Systems and Statistics, Lancaster University (http://www.e-

s.lancs.ac.uk/cres/captain).

http://www.es.lancs.ac.uk/cres/captainhttp://www.es.lancs.ac.uk/cres/captainhttp://www.es.lancs.ac.uk/cres/captainhttp://www.es.lancs.ac.uk/cres/captain

filling gaps evapotranspiration

Documents