confidence bands for a regression-based, non-linear decomposition model

Ecological Modelling, 14 (1981) 125-132
Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands

CONFIDENCE BANDS FOR A REGRESSION-BASED, NON-LINEAR DECOMPOSITION MODEL

KENNETH M. PORTIER

Department of Statistics, University of Florida, Gainesville, FL 32611 (U.S.A.)

KATHERINE CARTER EWEL

School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611 (U.S.A.)

(Accepted for publication 12 June 1981)

ABSTRACT

Portier, K.M. and Ewel, K.C., 1981. Confidence bands for a regression-based, non-linear decomposition model. Ecol. Modelling, 14: 125-132.

Parameter estimation can be performed on field or experimental data by using non-linear estimation procedures, and confidence regions for these estimates can be constructed. For model evaluation it is also important to determine confidence bands about the time function. These bands may be derived from the confidence regions for the parameter estimates. The procedure is illustrated for a two-compartment model of cypress leaf litter decomposition.

INTRODUCTION

Uncertainty is common in parameter estimates obtained from field or experimental data. When these estimates are used in a non-linear regression model it is not immediately clear what effect this uncertainty will have on predictions obtained from the fitted curve. Under certain assumptions it is possible to construct an approximate confidence region for the true parameter values. Within this region we may construct confidence bands about the fitted function. These confidence bands can then be used to examine the predictive capability of the function.

To illustrate the construction and interpretation of these confidence bands we will use the two-compartment model of cypress leaf litter decomposition discussed by Dierburg and Ewel (1982).

0304-3800/81/0000-0000/$02.50 © 1981 Elsevier Scientific Publishing Company


DERIVATION OF CONFIDENCE BANDS

Attempts to model field or experimental data usually involve a regression equation of the form

$$X_i^{o} = f_i(U^{o}, \theta) + \varepsilon$$

where $X_i^{o} = (x_{i1}, \ldots, x_{in})$ represents observations on the $i$th dependent model variable, $i = 1, 2, \ldots, L$,

$$U^{o} = \begin{pmatrix} u_{11} & \cdots & u_{1m} \\ \vdots & & \vdots \\ u_{n1} & \cdots & u_{nm} \end{pmatrix}$$

are the associated observations on the independent model variables $U$, $\theta$ is a $(p \times 1)$ vector of model parameters, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$ represents experimental or observational error which is assumed, under suitable transformation of the data, to have a normal distribution with zero mean and unknown variance $\sigma^2$. The function $f_i$ is often non-linear and may not be in a closed mathematical form but simply an algorithm that supplies values of $X_i$ for the vector $U$ and $\theta$. In the general case where there is more than one dependent variable, $f_i$ may be an implicit function of the other $X_j$. The result of this is that $f_i$ is actually more non-linear than simple examination of the equation might imply.

One can usually obtain least-squares estimates of the parameters $\theta$, denoted by $\hat\theta$, either directly or by means of various iterative search procedures. Hartley's modified Gauss-Newton method and Marquardt's algorithm are two of the more commonly used and readily available iterative procedures (Gallant, 1975). In general, these procedures linearize the non-linear model as described below and then iteratively apply linear regression theory to obtain the parameter estimates. To linearize the model we construct what is called the linear pseudo-model of the non-linear function as follows. Let $F(\theta)$ be the $(n \times p)$ matrix of elements

$$\frac{\partial}{\partial \theta_j} f_i(U_k, \theta)$$

where $i$ represents the dependent variable of interest, $j$ is the column index and $k$ the row index of the matrix. The vector

$$U_k = (u_{k1}, \ldots, u_{km})$$

denotes the $k$th observation on the independent variables. Define, with $f_i$ and $F$ evaluated at the current trial value of $\theta$,

$$Z^{o} = X_i^{o} - f_i(U^{o}, \theta) + F(\theta)\,\theta$$

and

$$W = F(\theta)$$

Then a pseudo-linear regression model can be defined as

$$Z^{o} = W\theta + \varepsilon$$

where the matrix $F(\theta)$ is considered as performing the same task as the design matrix in linear regression.
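The linearity of the pseudo-model in $\theta$ can be seen from a first-order Taylor expansion. The following is a sketch of the standard argument rather than a derivation given by the authors; the symbols $\theta^{(0)}$ (current trial value) and $\theta^{(1)}$ (updated value) are notation introduced here for illustration, and higher-order terms are dropped.

```latex
\begin{aligned}
X_i^{o} &= f_i(U^{o}, \theta) + \varepsilon \\
        &\approx f_i(U^{o}, \theta^{(0)}) + F(\theta^{(0)})\,(\theta - \theta^{(0)}) + \varepsilon, \\[4pt]
Z^{o} &\equiv X_i^{o} - f_i(U^{o}, \theta^{(0)}) + F(\theta^{(0)})\,\theta^{(0)}
       \;\approx\; F(\theta^{(0)})\,\theta + \varepsilon, \\[4pt]
\theta^{(1)} &= \left[F(\theta^{(0)})' F(\theta^{(0)})\right]^{-1} F(\theta^{(0)})' Z^{o}.
\end{aligned}
```

Repeating the last step with $\theta^{(1)}$ in place of $\theta^{(0)}$ gives the basic Gauss-Newton iteration; the modified procedures cited above add safeguards on the step length.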

Gallant (1975) has shown that the parameter estimates $\hat\theta$, obtained by iterating on the pseudo-model defined above, are random variables having an asymptotic multivariate normal distribution with mean equal to $\theta$ and variance-covariance matrix

$$V = \sigma^2 [F(\theta)'F(\theta)]^{-1}$$

The least-squares estimates of the error variance, $\sigma^2$, and the variance-covariance matrix $V$ are given by, respectively,

$$s^2 = \frac{1}{n-p} \sum_{k=1}^{n} \left[x_{ik} - f_i(U_k, \hat\theta)\right]^2 \quad \text{and} \quad \hat V = s^2 [F(\hat\theta)'F(\hat\theta)]^{-1}$$

In addition, the sum of squared deviations, $(n-p)s^2$, when divided by $\sigma^2$, is distributed independently of $\hat\theta$ as a $\chi^2$ variable with $(n-p)$ degrees of freedom.

These distributional properties can be used to construct an approximate joint confidence region for $\theta$ as

$$\left\{\theta : \frac{(\theta - \hat\theta)'\,F(\hat\theta)'F(\hat\theta)\,(\theta - \hat\theta)}{s^2} \le \frac{(n-1)p}{n-p}\, F_{1-\alpha}(p, n-p)\right\}$$

where $F_{1-\alpha}(p, n-p)$ is the $100(1-\alpha)$th percentile of an $F$ distribution with $p$ and $n-p$ degrees of freedom.

The confidence region created by this method will be a $p$-dimensional hyperellipsoid. Since the true confidence region for the parameter $\theta$ is not necessarily hyperellipsoidal (Draper and Smith, 1966), the result is an approximate confidence region. The more linear in its parameters the function $f_i$ is, the better this approximation will be. Alternative approaches to the construction of confidence regions are available but require more computing (e.g. Draper and Smith, 1966) and/or may not give as good approximations as the method described here (e.g. Duncan, 1978; Fox et al., 1980).
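The region test can be illustrated for the two-parameter case ($p = 2$) with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the $F$ percentile uses the closed form that holds only when the numerator degrees of freedom equal 2, and the numbers in the usage lines are the estimates reported in the example later in this paper.

```python
import math

def f_quantile_2(q, d2):
    """q-quantile of an F distribution with 2 and d2 degrees of freedom.

    Closed form valid only for numerator df = 2 (an assumption of this sketch):
    P(F <= x) = 1 - (1 + 2x/d2) ** (-d2/2).
    """
    return (d2 / 2.0) * ((1.0 - q) ** (-2.0 / d2) - 1.0)

def critical_value(n, p, alpha):
    """Right-hand side of the region inequality: [(n-1)p/(n-p)] F_{1-alpha}(p, n-p)."""
    if p != 2:
        raise ValueError("this sketch only handles p = 2")
    return (n - 1) * p / (n - p) * f_quantile_2(1.0 - alpha, n - p)

def in_region(theta, theta_hat, cov, n, alpha=0.05):
    """True if theta satisfies the quadratic-form inequality.

    cov is the estimated 2x2 matrix s^2 [F'F]^{-1}, so the quadratic form
    (theta - theta_hat)' F'F (theta - theta_hat) / s^2 equals d' cov^{-1} d.
    """
    d1 = theta[0] - theta_hat[0]
    d2 = theta[1] - theta_hat[1]
    (v11, v12), (_, v22) = cov
    det = v11 * v22 - v12 * v12
    # d' cov^{-1} d written out for a symmetric 2x2 matrix
    q = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return q <= critical_value(n, 2, alpha)

# Values reported in the example section of this paper
theta_hat = (6.59, 0.13)
cov = ((1.0858, -0.0019), (-0.0019, 0.0001))
print(in_region((7.0, 0.12), theta_hat, cov, n=40))   # candidate near the estimate
print(in_region((12.0, 0.13), theta_hat, cov, n=40))  # candidate far from the estimate
```

For more than two parameters the percentile would have to come from statistical tables or a statistics library, and the quadratic form would need a general matrix inverse.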

Once the least-squares estimate, $\hat\theta$, has been obtained, it is possible to construct the fitted function by using this estimate in the equation

$$\hat X_i = f_i(U, \hat\theta)$$

Confidence bands about the expected curve can be obtained by defining two functions, $f_L$ for the lower limit and $f_U$ for the upper limit, such that the probability that the true curve lies between these two functions is equal to a specified value. A $(1-\alpha)100\%$ confidence band may be defined as

$$f_L(U) = \min_{\theta \in CR} f(U, \theta)$$

$$f_U(U) = \max_{\theta \in CR} f(U, \theta)$$


where CR is the approximate confidence region obtained for $\theta$. The problem of constructing confidence bands becomes one of optimizing (maximizing or minimizing) a function for each prespecified value of $U$, subject to the constraint that the parameters lie on the boundary or within a $p$-dimensional region. Finding these bands is not a trivial exercise and may require extensive computation, especially when more than two or three parameters are estimated. As is shown in the next section, it is possible to reduce the region over which the search is performed by examining the nature of the function with respect to the parameters. This results in a significant reduction in the amount of labor needed to compute the confidence bands.

The confidence bands derived from this method may be interpreted as follows. Suppose the experiments were repeated, exactly as described (i.e. same number of replicates observed for the same values of the independent variables), a large number of times. For each experiment, parameters are estimated and a fitted curve determined. Then, of all the 95% confidence bands constructed, 95% will contain the true curve. If only one curve is determined, then the probability that the calculated confidence band will contain the true curve is 0.95.

EXAMPLE: A TWO-COMPARTMENT DECOMPOSITION MODEL

Jenkinson (1977) first proposed a two-compartment model for the decomposition of plant material under field conditions. A similar two-compartment model was independently proposed by Dierburg and Ewel (1982) to represent decomposition losses from cypress litter. In this model the cypress leaf litter is assumed to be composed of two components, $100a\%$ being the rapidly decomposing or labile material and the remaining $100(1-a)\%$ being the slower decomposing or refractory material. The proposed model in reduced equation form is given as

$$X_t = X(t)/X_0 = a e^{-\theta_1 t} + (1-a) e^{-\theta_2 t}$$

where $X_t$ represents the fraction of the initial biomass $X_0$ remaining at time $t$, $X(t)$ being the biomass remaining at that time. To simplify the example it is assumed that the proportion coefficient is known to be $a = 0.183$, as reported by Dierburg and Ewel (1981). Thus there are two unknown parameters $\theta = (\theta_1, \theta_2)$ which need to be estimated from the data.

An experiment to determine decomposition rates of pond cypress (Taxodium distichum var. nutans) needles was conducted in a cypress dome in north-central Florida. Data on the decay of cypress leaf litter were obtained for eight points in time, with five replicates at each time point ($n = 40$, $p = 2$). The one independent variable, time, is measured in years. Observations are given in Table I.

TABLE I

Fraction of initial biomass remaining at eight specified time points (Dierberg and Ewel, 1982)

Fraction of    Fraction of biomass      Fraction of    Fraction of biomass
a year (t)     remaining (x_t)          a year (t)     remaining (x_t)

0.0411         0.936                    0.5644         0.747
               0.958                                   0.761
               0.958                                   0.766
               0.976                                   0.759
               0.960                                   0.723

0.0795         0.940                    0.8164         0.683
               0.813                                   0.720
               0.944                                   0.710
               0.939                                   0.722
               0.930                                   0.713

0.1589         0.866                    1.0649         0.769
               0.878                                   0.783
               0.874                                   0.709
               0.873                                   0.692
               0.884                                   0.694

0.3151         0.827                    1.5616         0.699
               0.781                                   0.613
               0.799                                   0.679
               0.784                                   0.648
               0.795                                   0.731

Errors in the observations were assumed to be multiplicative; hence the model was fitted by means of non-linear least squares using the natural logarithm transformation. The regression equation is

$$x_t^{o} = f(t, \theta) + \varepsilon_t = \ln\left[a e^{-\theta_1 t} + (1-a) e^{-\theta_2 t}\right] + \varepsilon_t$$

where $x_t^{o}$ represents the natural logarithm of the observations of $X_t$, and $\varepsilon_t$ represents observation error, independent and identically distributed as normal with mean zero and unknown variance $\sigma^2$.
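A fit of this log-scale regression can be sketched in Python with a plain Gauss-Newton iteration on the linear pseudo-model. This is a minimal illustration, not the SAS NLIN procedure used by the authors: the function names, the starting values and the simple step-halving safeguard are assumptions made here, and the check at the end uses noise-free synthetic data generated at the eight sampling times rather than the field observations.

```python
import math

A = 0.183  # labile fraction, treated as known (value taken from the text)

def log_model(t, th1, th2, a=A):
    """ln of the fraction remaining at time t for rates th1 (labile), th2 (refractory)."""
    return math.log(a * math.exp(-th1 * t) + (1.0 - a) * math.exp(-th2 * t))

def gradient(t, th1, th2, a=A):
    """Partial derivatives of log_model with respect to th1 and th2 (one row of F)."""
    e1 = a * math.exp(-th1 * t)
    e2 = (1.0 - a) * math.exp(-th2 * t)
    return (-t * e1 / (e1 + e2), -t * e2 / (e1 + e2))

def sse(times, logs, th1, th2):
    """Residual sum of squares on the log scale."""
    return sum((y - log_model(t, th1, th2)) ** 2 for t, y in zip(times, logs))

def gauss_newton(times, logs, start, max_iter=100, tol=1e-10):
    """Gauss-Newton iteration with a step-halving safeguard (assumed here)."""
    th1, th2 = start
    for _ in range(max_iter):
        rows = [gradient(t, th1, th2) for t in times]
        res = [y - log_model(t, th1, th2) for t, y in zip(times, logs)]
        # Normal equations (F'F) d = F'r of the linear pseudo-model, 2x2 case
        a11 = sum(g1 * g1 for g1, _ in rows)
        a12 = sum(g1 * g2 for g1, g2 in rows)
        a22 = sum(g2 * g2 for _, g2 in rows)
        b1 = sum(g1 * r for (g1, _), r in zip(rows, res))
        b2 = sum(g2 * r for (_, g2), r in zip(rows, res))
        det = a11 * a22 - a12 * a12
        d1 = (a22 * b1 - a12 * b2) / det
        d2 = (a11 * b2 - a12 * b1) / det
        # Halve the step until the residual sum of squares does not increase
        current, lam = sse(times, logs, th1, th2), 1.0
        while lam > 1e-8 and sse(times, logs, th1 + lam * d1, th2 + lam * d2) > current:
            lam /= 2.0
        th1, th2 = th1 + lam * d1, th2 + lam * d2
        if abs(lam * d1) + abs(lam * d2) < tol:
            break
    return th1, th2

# Noise-free synthetic check at the eight sampling times of Table I
times = [0.0411, 0.0795, 0.1589, 0.3151, 0.5644, 0.8164, 1.0649, 1.5616]
logs = [log_model(t, 6.59, 0.13) for t in times]
print(gauss_newton(times, logs, start=(5.0, 0.2)))
```

To apply the same function to field data, `times` would hold one entry per observation (each sampling time repeated for its replicates) and `logs` the natural logarithms of the observed fractions.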

DISCUSSION

Least-squares estimates for the two rate parameters, $\theta_1$ and $\theta_2$, were computed using the NLIN procedure in the SAS statistical package (Barr et al., 1979). These estimates are

$$\hat\theta_1 = 6.59, \quad \hat\theta_2 = 0.13$$

with estimated error variance

$$s^2 = 0.0018$$

and approximation of the variance-covariance matrix as

$$s^2 [F(\hat\theta)'F(\hat\theta)]^{-1} = \begin{pmatrix} 1.0858 & -0.0019 \\ -0.0019 & 0.0001 \end{pmatrix}$$

The estimated correlation between $\hat\theta_1$ and $\hat\theta_2$ is given as

$$\hat\rho_{12} = -0.202$$
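The reported error variance can be compared with a direct calculation from Table I. The Python sketch below evaluates the log-scale residuals at the rounded estimates printed above and divides their sum of squares by $n - p$; it is an illustrative check with hypothetical function names, not the authors' computation, and it relies on the table values as printed.

```python
import math

A = 0.183                  # labile fraction assumed known in the text
THETA_HAT = (6.59, 0.13)   # reported estimates, rounded as printed

# Table I: five replicate fractions remaining at each sampling time (years)
TABLE_I = {
    0.0411: [0.936, 0.958, 0.958, 0.976, 0.960],
    0.0795: [0.940, 0.813, 0.944, 0.939, 0.930],
    0.1589: [0.866, 0.878, 0.874, 0.873, 0.884],
    0.3151: [0.827, 0.781, 0.799, 0.784, 0.795],
    0.5644: [0.747, 0.761, 0.766, 0.759, 0.723],
    0.8164: [0.683, 0.720, 0.710, 0.722, 0.713],
    1.0649: [0.769, 0.783, 0.709, 0.692, 0.694],
    1.5616: [0.699, 0.613, 0.679, 0.648, 0.731],
}

def log_model(t, th1, th2, a=A):
    """ln of the modelled fraction remaining at time t."""
    return math.log(a * math.exp(-th1 * t) + (1.0 - a) * math.exp(-th2 * t))

def error_variance(table, theta, p=2):
    """Return (s^2, n): residual sum of squares on the log scale over (n - p)."""
    th1, th2 = theta
    residuals = [math.log(x) - log_model(t, th1, th2)
                 for t, xs in table.items() for x in xs]
    n = len(residuals)
    return sum(r * r for r in residuals) / (n - p), n

s2, n = error_variance(TABLE_I, THETA_HAT)
print(n, round(s2, 4))
```

Because the estimates are rounded to two or three figures, the value obtained this way is only expected to agree with the reported $s^2$ to about the precision printed.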

The approximate 95% confidence region for $\theta_1$ and $\theta_2$ was calculated and is given in Fig. 1. As expected, the region is an ellipse with major axis having negative slope, the result of the negative correlation between $\hat\theta_1$ and $\hat\theta_2$. The coefficient of variation for $\hat\theta_1$ is approximately 16% whereas for $\hat\theta_2$ it is approximately 7%. Thus we find that the rate coefficient for the slowly decaying material is better determined than the other rate coefficient.
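The coefficients of variation and the correlation follow directly from the variance-covariance matrix. The short Python sketch below recomputes them from the rounded matrix as printed; the variable names are illustrative. Since the printed entries are rounded to four decimals, the results need not match the reported figures exactly, and the correlation in particular is sensitive to the rounding of the 0.0001 entry.

```python
import math

theta_hat = (6.59, 0.13)
cov = ((1.0858, -0.0019), (-0.0019, 0.0001))  # rounded matrix as printed

se1 = math.sqrt(cov[0][0])       # standard error of the first rate estimate
se2 = math.sqrt(cov[1][1])       # standard error of the second rate estimate
cv1 = se1 / theta_hat[0]         # coefficient of variation, first rate
cv2 = se2 / theta_hat[1]         # coefficient of variation, second rate
rho = cov[0][1] / (se1 * se2)    # correlation implied by the rounded entries

print(round(100 * cv1, 1), round(100 * cv2, 1), round(rho, 3))
```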

Because the proposed model is a monotonic function of the parameter values in the confidence region, finding the bands for specified time points requires only that the search be performed on those parameter pairs on the boundary of the ellipse. This greatly decreases computation time and simplifies the computer programming. Thus by an examination of the characteristics of the function the authors were able to reduce the effort involved in constructing the confidence bands. It is unlikely that this will be the case for the more complicated models commonly encountered in environmental modeling.
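The boundary search can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' program: the function names are hypothetical, the boundary of the ellipse is discretized into a fixed number of points (so the limits are approximate), the $F$ percentile uses the closed form available for two numerator degrees of freedom, and the inputs are the rounded estimates reported above. The band is computed on the original (fraction remaining) scale, which gives the same limits as exponentiating the log-scale limits because the logarithm is monotone.

```python
import math

A = 0.183
THETA_HAT = (6.59, 0.13)
COV = ((1.0858, -0.0019), (-0.0019, 0.0001))  # s^2 [F'F]^{-1} as printed
N, P = 40, 2

def fraction_remaining(t, th1, th2, a=A):
    """Two-compartment model: fraction of initial biomass left at time t."""
    return a * math.exp(-th1 * t) + (1.0 - a) * math.exp(-th2 * t)

def critical_value(n, p, alpha):
    """[(n-1)p/(n-p)] F_{1-alpha}(p, n-p); closed-form F quantile, valid for p = 2 only."""
    d2 = n - p
    f_quantile = (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)
    return (n - 1) * p / d2 * f_quantile

def ellipse_boundary(theta_hat, cov, c, steps=720):
    """Points theta with (theta - theta_hat)' cov^{-1} (theta - theta_hat) = c."""
    (v11, v12), (_, v22) = cov
    # Cholesky factor of the 2x2 covariance matrix: cov = L L'
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 * l21)
    r = math.sqrt(c)
    points = []
    for k in range(steps):
        phi = 2.0 * math.pi * k / steps
        u1, u2 = r * math.cos(phi), r * math.sin(phi)
        points.append((theta_hat[0] + l11 * u1,
                       theta_hat[1] + l21 * u1 + l22 * u2))
    return points

def confidence_band(t, boundary):
    """Lower and upper band limits at time t from the boundary points."""
    values = [fraction_remaining(t, th1, th2) for th1, th2 in boundary]
    return min(values), max(values)

boundary = ellipse_boundary(THETA_HAT, COV, critical_value(N, P, 0.05))
for t in (0.25, 0.5, 1.0, 1.5, 2.0):
    lower, upper = confidence_band(t, boundary)
    fitted = fraction_remaining(t, *THETA_HAT)
    print(t, round(lower, 3), round(fitted, 3), round(upper, 3))
```

Restricting the search to the boundary is justified here only because the model decreases monotonically in both rate parameters; for a model without that property the interior of the region would also have to be searched.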

Fig. 1. Ninety percent confidence region for two rate parameters in a two-compartment decomposition model. [Figure not reproduced.]

Fig. 2. Mean and range of litter samples undergoing decomposition, and 95% confidence bands around the expected curve. [Figure not reproduced; horizontal axis: time (years); legend: expected curve, confidence band.]

A 95% confidence band for the expected curve was constructed using $f_L(U)$ and $f_U(U)$. These are plotted in Fig. 2 along with the fitted curve and the actual observations. As can be seen from Fig. 2, the confidence bands diverge as time from initial deposition of the leaf litter increases. This implies an increasing uncertainty in the predictions from the fitted curve, i.e. we are less certain that the predicted value of $X$ at a given $t$ is close to the true $X$ value at $t$. This uncertainty does not seem very large for the first year but becomes much larger for predictions in later years.

Some of the observations fall outside the calculated confidence band (Fig. 2). For illustration the authors assumed the mixture parameter, $a$, was known without error. Actually, this parameter was initially unknown and was estimated along with $\theta_1$ and $\theta_2$. It is quite possible that the 95% confidence bands obtained from the simultaneous confidence region of $a$, $\theta_1$ and $\theta_2$ would be wider and would include all data points.

CONCLUSIONS

The construction of confidence bands about fitted non-linear functions involved in an environmental model can yield useful information on the effect of parameter uncertainty on model predictions. In addition, these confidence bands give the researcher an indication of the uncertainty associated with extrapolation outside that region in which the data were collected.

One of the future uses of the litter decomposition model, for example, might be to predict the age distribution of the leaf litter on the forest floor by calculating the undecomposed portion of litter from a number of years. Propagating this error through time in a simulation should result in some measure of the reliability of the results and thus increase the amount of information to be obtained from the available data and the simulation analysis.

The procedures discussed here may be applied to those component functions whose parameters are estimated from experimental or observational data. Techniques need to be developed that will allow confidence information on subcomponent functions to be extended throughout the structure of the model. Eventually some confidence statement on the final system predictions may be obtainable by these procedures. In fact, as pointed out by one of the reviewers, such procedures are already in development.

ACKNOWLEDGMENTS

We wish to thank Ramon C. Littell and P.V. Rao, of the Department of Statistics, University of Florida, for reviewing the original manuscript. This article is University of Florida Agricultural Experiment Station Journal Series No. 2566.

REFERENCES

Barr, A.J., Goodnight, J.H., Sall, J.P., Blair, W.H. and Chilko, D.M., 1979. SAS User's Guide, 1979 Edition. SAS Institute, Raleigh, NC.

Dierburg, F. and Ewel, K.C., 1982. The effects of treated sewage effluent on decomposition and organic matter accumulation in cypress domes. In: K.C. Ewel and H.T. Odum (Editors), Cypress Swamps. Florida University Press (in press).

Draper, N. and Smith, H., 1966. Applied Regression Analysis. John Wiley, New York, NY, pp. 282-284.

Duncan, G.T., 1978. An empirical study of jackknife-constructed confidence regions in non-linear regression. Technometrics, 20: 123-130.

Fox, T., Hinkley, D. and Larntz, K., 1980. Jackknifing in non-linear regression. Technometrics, 22: 29-34.

Gallant, A.R., 1975. Non-linear regression. Am. Stat., 29: 73-81.

Jenkinson, D.S., 1977. Studies on the decomposition of plant material in soil. V. The effects of plant cover and soil type on the loss of carbon from 14C labelled ryegrass decomposing under field conditions. J. Soil Sci., 28: 424-434.
