partially specified ecological models

Ecological Monographs 71(1), 2001, pp. 1-25

PARTIALLY SPECIFIED ECOLOGICAL MODELS

SIMON N. WOOD

Statistical Ecology Group, The Mathematical Institute, University of St Andrews

North Haugh, St Andrews KY16 9SS, U.K. [email protected]

Abstract. Models are useful when they are compared with data. Whether this com-parison should be qualitative or quantitative depends on circumstances, but in many casessome statistical comparison of model and data is useful and enhances objectivity. Unfor-tunately, ecological dynamic models tend to contain assumptions and simplifications whichenhance tractability, promote insight, but spoil model fit and this can cause difficulties whenadopting a statistical approach. Furthermore, the arcane numerical analysis required to fitdynamic models reliably presents an impediment to objective model testing by fitting. Thispaper presents methods for formulating and fitting partially specified models, which aim toachieve a measure of generality by avoiding some of the irrelevant incidental assumptionsthat are inevitable in more traditional approaches. This is done by allowing delay differ-ential equation models, difference equation models and differential equation models to beconstructed with part of their structure represented by unknown functions, while part of thestructure may contain conventional model elements that contain only unknown parameters.An integrated practical methodology for using such models is presented along with severalexamples, which include use of models formulated using delay differential equations, discretedifference equations/matrix models, ordinary differential equations and partial differentialequations. The methods also allow better estimation from ecological data by model fitting,since models can be formulated to include fewer unjustified assumptions than is usually thecase if more traditional models are used, while still including as much structure as the mod-eller believes can be justified by biological knowledge: model structure improves precision,while fewer extraneous assumptions reduce unquantifiable bias.

Key words: population dynamic model fitting; partially specified model; semi-parametricmodel; semi-mechanistic model; multiple smoothing parameter; non-linear spline model; de-lay differential equation; ecological dynamic model fitting.

INTRODUCTION

“The population went up and down, and so didthe model” is not totally unjustified as a carica-ture of the way in which ecological models are oftencompared to data. There are good reasons for this.Most formal theory for statistical data modelling isbased on linear and close to linear models. There isnot an equivalent body of theory for statistical anal-ysis using the kind of non-linear models on whichthe theory of ecological dynamics is based. Further-more, many models contain assumptions that havebeen introduced for reasons of tractability, ratherthan biology. These can be expected to result inmismatches between model and data, even whenthe biology underpinning a model is correct, and

this undermines the utility of formal comparison ofmodels with data. This paper aims to reduce theseproblems for an extensive family of non-stochasticmodels.

Why fit models? The most obvious and practi-cal motivation is to infer something about the sys-tem to which the model is being fitted. For ex-ample, mortality rates are very difficult to measuredirectly in the field, but population densities areeasier: by fitting an appropriate model to the latterit may be possible to infer the former. Falsificationand validation of models is another reason. Formalfitting can show whether a model is really capable ofproducing observed dynamics or not, as well as pin-pointing the features of data that are not explained

2 SIMON N. WOOD

by the theory embodied in a badly fitting model.Of course many models produce such ridiculous dy-namics that they have fulfilled their purpose andcan be discarded long before fitting is necessary, butfor many others fitting is helpful. Some models areproduced for prediction and here it is particularlyimportant to calibrate models by fitting to data. Fi-nally there is the comparison of models (hypothesistesting): one way of distinguishing between com-peting hypotheses about how a system works is toformulate these hypotheses as models and comparetheir ability to fit data from the system.

Although it may be desirable to fit models todata, it also turns out to be difficult. Most dy-namic ecological models are non-linear, so even if asensible measure of fit can be defined, methods thatare guaranteed to find the best fit do not generallyexist. There are therefore a large number of alter-native non-linear optimization methods to choosefrom. Treating these methods as black boxes tendsto give variable results: for some models fittingseems straightforward, while for others the fittingmethod makes very slow or no progress, terminatesfor no apparent reason or ‘converges’ to parametervalues that depend strongly on the initial parametervalues used. Such practical difficulties substantiallyundermine the usefulness of model fitting as a sci-entific tool.

There are at least three reasons for these diffi-culties. Firstly, it is necessary to choose some mea-sure of goodness of fit in order to fit models, andit is very easy to come up with choices that arevery difficult to minimise. Secondly, even for wellbehaved measures of model fit, the numerical anal-ysis involved in solving models and calculating thequantities required by fitting methods must be per-formed carefully: otherwise methods will be slowand convergence unreliable. Thirdly, efficiency andreliability are undermined if methods fail to makegood use of the structure of the model fitting prob-lem. One goal of this paper is to overcome the tech-nical obstacles for one class of models and fittingobjectives.

A further obstacle to useful model fitting is morefundamental. Most models contain elements thatare not derived entirely from mechanistic first prin-ciples. Instead, some parts of the model are phe-nomenological characterisations of a process or rela-tionship. These terms introduce incidental assump-tions into a model that have nothing to do withthe biological mechanisms on which the model isbased. It is usually assumed that these incidentalassumptions will have little effect on the qualita-

tive nature of the model’s dynamics, but this is byno means guaranteed (see Wood and Thomas 1999,for an extreme example). In any case, incidentalassumptions will produce quantitative effects andthese may have implications when it comes to inter-preting lack of model fit. Ideally, lack of fit shouldindicate that something is wrong with the biologicalassumptions of the model, rather than the inciden-tal assumptions, but disentangling these two is verydifficult. The issue is clearest when comparing thefit of two alternative models of a system. In thiscase interest focuses on distinguishing between thealternative mechanisms that the models embody,but model fit is contingent on incidental modellingassumptions as well as assumptions about biologi-cal mechanism. If the incidental assumptions differbetween models it is hard to know whether a differ-ence in fit is attributable to the differing biologicalassumptions made or merely to differences in theincidental modelling assumptions.

Further examples of the difficulty with inciden-tal assumptions arise in pure estimation problems.For example, when estimating mortality rates fromobserved (structured) population time series it isusually the case that the form of the mortality rateterm in the fitted model is not known a priori. Sim-ply assuming a form means that estimates are con-ditional on an untestable assumption: a problemthat compromises estimates and confidence inter-vals (see e.g. Wood and Nisbet 1991).

The problems introduced by model mis-specification are well known in other contexts. Fig-ure 1 shows an example in the context of lin-ear regression: over-specification leads to appar-ently small confidence intervals but substantialbias (which can not be estimated - bias correc-tion procedures assume a correct model and sta-tistical consistency of the estimators of the modelunknowns); too much flexibility in the specifica-tion leads to inflated confidence intervals. Simi-larly, non-parametric statistics is aimed at exactlythe problem of not knowing the correct parametricmodel for the random component of data. In thisstatistical context the benefits of non-parametricmethods are well known and usually accrue de-spite the relative robustness with which the centrallimit theorem endows equivalent parametric meth-ods. When constructing models for the systematiccomponent of data, rather than for the noise compo-nent, there is no equivalent of the central limit the-orem: hence the benefits of avoiding baseless para-metric representations of model components shouldbe even more obvious. This has been recognized

PARTIALLY SPECIFIED MODELS 3

50403020100x

20

10

0

-10

y

c

50403020100x

20

10

0

-10

y

a

50403020100x

20

10

0

-10

y

b

Figure 1: Model mis-specification problems. In all pan-els the dashed curve is the true underlying model (aquadratic). The circles are data generated from the truecurve, by adding normal random deviates: they are thesame in all panels. The solid curves are the 95% confi-dence intervals for the model fits illustrated in the dif-ferent panels. All fits are by least squares. (a) showsan overspecified model (a straight line): the confidenceinterval is misleadingly narrow. (b) shows an optimallyspecified model: the confidence interval has good cover-age, but is relatively narrow. (c) shows an underspecifiedmodel (a quartic): the interval is unnecessarily wide.

in conventional statistical modelling with the ad-vent of non-parametric approaches such as Gener-alized Additive Models, splines and other formalsmoothing methods (e.g. Hastie and Tibshirani1990; Wahba 1990). In these methods models arespecified in terms of linear combinations of fairlygeneral smooth functions of the statistical covari-ates, hence avoiding the problems associated withthe model mis-specification that must follow fromassuming more arbitrary, fully parametric, modelforms.

It seems sensible to take a similar approachwhen trying to reduce the problems of mis-specification in ecological dynamic models. Oneof the most obvious ways in which unwanted as-sumptions may be introduced into a model is viathe specification of the functional forms relatingmodel variables: if a relationship is poorly under-stood then the use of a parameter sparse functionalform to represent it is likely to introduce spuriousassumptions into the model. This can be avoidedby employing rather general function specificationsin such cases, so that problems are not introduced

by the specification itself. In special cases this ap-proach has been applied previously. Wood and Nis-bet (1991) and Wood (1994) used the fitting of flex-ible and very general models of structured popula-tions to extract death rate information from struc-tured population data. Both studies used extensiveMonte Carlo experiments to demonstrate the su-periority of this approach to (published) methodsemploying more tightly specified models based onless biologically defensible assumptions. The keyfeature of the models used was that only quite wellunderstood elements of the biology of the modelledsystems were described in a prescriptive manner,while less well known parts of the biology were rep-resented in a flexible non-parametric way. The hopeis that, by taking some care to avoid producing amodel that implicitly overstates how much is knownabout a system, it should be possible to producea model structure that is capable of being a rea-sonable approximation to the truth. This improvedmodel fidelity ought in turn lead to more reliable es-timates when the model is fit to data. The practicalutility of the approach has been further confirmed

4 SIMON N. WOOD

since (e.g. Ohman and Wood 1996), and Wood(1994) also suggested how these methods could begeneralized. In the context of measles epidemicsEllner et al. (1998) clearly and elegantly demon-strate the additional insight to be gained from whatthey term ‘semi-mechanistic’ models. Similarly,Bjørnstad et al. have use simple population dy-namic models formulated as generalized additivemodels to analyse cod (1999) and Indian meal moth(1998) populations. But while the approach hasbeen tried and proven in special cases, more gen-eral methods facilitating wider use are lacking: adeficit that this paper attempts to address.

To summarise: the ideal solution to the modelmis-specification problem is to write down modelswhich contain only what is actually known aboutthe workings of a biological system. This ideal isimpractical to achieve, but it is possible to movesome way towards it by only specifying model re-lationships in general terms when concrete knowl-edge does not justify imposing extra structure in theform of detailed parametric specification. Such anapproach has produced rich rewards when appliedto more conventional statistical modelling (Wahba,1990; Hastie & Tibshirani, 1990), and has shownconsiderable promise in the context of modellingecological dynamics (Wood and Nisbet, 1991; Wood1994, 1997; Ohman and Wood, 1996; Ellner et al.,1997; Ellner, 1998, Bjørnstad et al. 1998, 1999).What is missing in the ecological context is generalmethodology to allow this approach to be employedin more than a few special cases. The partially spec-ified modelling framework introduced in the nextfew sections of this paper is an attempt to fill thisgap. In order to produce something that works Iwill consider a limited class of models and will onlyconsider some fairly straightforward measures of fit:the latter restriction in particular is pragmatic andis not intended to imply that the measure of fit usedis the only sensible measure.

PARTIALLY SPECIFIED MODELS

The idea is best introduced by example, so con-sider a simple predator prey model. Let P and Ndenote predators and prey respectively and α , β, γ and δ be parameters with obvious interpreta-tions. The dynamics of predators and prey mightbe described by:

dPdt

= αPN − βPdNdt

= γN − δPN

I will refer to this as a fully specified model, since theinteraction between predators and prey is spelled

out quite clearly with some simple parametricmodel terms. Fitting such a model to time series ofpredator and prey abundances would involve find-ing values for α, β, γ, δ and (usually) the initialpopulations that best matched the data accordingto some criterion determined by the modeller.

Most ecologists do not believe that the propor-tional mixing assumption incorporated in the abovemodel is actually correct and conceptually it wouldbe preferable to work with a model that comescloser to embodying what can be stated with confi-dence. For example the partially specified model:

dPdt

= f(P, N)− βPdNdt

= γN − δf(P,N)

retains considerable structure, but involves a muchless specific assumption about the nature of the pre-dation process. In reality a little more structureshould probably be imposed in this case: for exam-ple, f should be smooth (meaning that its first fewderivatives should be nowhere too large) and

δ, β, γ, f > 0;∂f∂N

> 0;∂f∂P

> 0

Fitting this model involves somehow finding the pa-rameter values and function f that result in the bestfit of the model to data. Of course there are manyalternative partially specified models for this situa-tion, some more tightly specified and some less so.

This simple example is intended to introducethe concept of partial model specification: modelsare written with some components represented byunknown functions, rather than specific functionalforms, and qualitative information is introduced asbound constraints on the functions and parametersof the model. The hope is that models specified inthis fairly general way will be capable of closer ap-proximation to reality than more tightly specifiedmodels. The approach gives the modeller increasedcontrol over the assumptions made in modelling byremoving the need for the most common source ofunwanted assumptions: namely the need to writedown something expedient in place of the general‘f(·)’ which may be all that is justified by biologi-cal knowledge.

The rest of this section will attempt to lay outa practical framework for partially specified mod-elling in more detail. There are many ways in whichthe framework could be extended, but I will onlypresent what can currently be achieved technically.


Models

A wide range of population models can be writ-ten as a system of delay differential equations, witha finite number of discontinuities, including all ordi-nary differential equation models and discrete timemodels. In these models the state of a systemat some time t can be encapsulated in the valuesof some state variables at t and at some previoustimes. Most of the time the rates of change ofthese state variables are smooth functions of thestate variables and lagged state variables, but atsome time points the state variables may changediscontinuously. Let ni be the value of the ith statevariable and n be the vector of all state variablesat time t. Similarly let nt−τi be the values of thestate variables at t − τi (τi may change with timeand may be a state variable itself.) Then:

dni

dt= gi (n,nt−τ1 ,nt−τ2 . . . , t)

for all t > 0, t 6= {T1, T2, . . .} (1)

where {T1, T2 . . .} is the set of points at which thestate of the system changes discontinuously (the el-ements of this set may be state variable dependent).I will assume that gi does not actually depend onthe system state prior to t = 0, so that initial states,ni(0), rather than initial histories, are required tointegrate the model: i.e. gi is subject to the restric-tion that its partial differential with respect to anyelement of nt−τi is zero if t < τi. The model maybe supplemented by discontinuities:

ni(T+j ) = di(n(T−j ), j) (2)

where T+j is the instant after Tj and T−j the instant

before. The particular models given above and inthe examples sections provide illustrations from theclass of models.

Clearly most discrete time models and all ordi-nary differential equation models are special casesof this class of models. For example, by settinggi(·) = 0 for all i, we get the general class of modelsthat can be written as systems of difference equa-tions:

ni(Tj+1) = di(n(Tj), Tj)

This class includes matrix models and discrete dif-ference equation models.

Similarly by having no discontinuities and nolags the general model becomes a model written asa system of ordinary differential equations:

dni

dt= gi(n, t).

Therefore everything done in this paper for modelsformulated as delay differential equations will applywithout modification to difference equation modelsand differential equation models, except for somematerial on solving delay equations. In what followsthe reader interested solely in the more restrictedmodel classes can safely ignore the technical detailsassociated only with delays.

Limitation to this class of models is largelypragmatic since fitting general partial differen-tial equation models would usually require a pro-hibitive amount of computing with current tech-nology. However, it is possible to solve many par-tial differential equations by discretisation into aseries of ordinary (or even delay) differential equa-tions (see Al-Rabeh, 1992 and the section Example:marine copepods), so the restriction is not overlyonerous. Stochastic dynamics have been neglectedbecause they introduce enough extra technical dif-ficulty to double the length of this paper.

The class of models chosen covers a high propor-tion of the models actually used in ecology. Modelswritten as systems of discrete equations or systemsof differential equations are widely known and used.Delay differential equations (other than ordinarydifferential equations) are less widely used, but doallow a very wide range of situations to be modelled.In particular, a substantial amount of theory hasbeen produced on how to employ delay differentialequations to produce demographically sound, butcomputationally efficient, representations of (non-stochastic) population dynamics for organisms withmoderately complex life cycles. Accessible intro-ductions to the use of systems of d.d.e.’s to modelage-structured populations and more general phys-iologically structured populations are given in Gur-ney and Nisbet (1998) and Nisbet (1997) and theirapplication is amply illustrated by Gurney et al.(1983, 1986) Gurney and Nisbet (1985) or Nisbetand Gurney (1983), for example. It is worth notingthat delay differential equation modelling is by nomeans retricted to the kind of heroic phenomeno-logical characterisation that typifies some classic ex-amples of their use (e.g. May 1974). For example,by employing mixtures of delay and ordinary differ-ential equations it is straightforward to produce de-mographically rigorous models of populations of or-ganisms in which individual development is hetero-geneous in time and between individuals (see Blytheet al., 1984; Macdonald, 1978, 1989).

The gi’s and di’s in (1) and (2)will usually de-pend on unknown coefficients ci and also some un-known functions fi. Given particular fi’s and ci’s

6 SIMON N. WOOD

the model can be solved to give estimates of thestate variables at any time (or, more generally, es-timates of functions of the states variables). Hence,given observations of some of the state variables (orfunctions of state variables) at some times, it shouldbe possible to find the functions and coefficientsthat cause the model to best fit these data. Supposethat observations of state variables (or transforma-tions of state variables) can be written as a vectory. This vector will generally contain observationsof several state variables at a number of times ar-ranged in some order, the details of which are unim-portant. What matters is that given ci’s and fi’sthe population dynamic model can be solved (usu-ally numerically) to produce a vector of model es-timates µ corresponding to the observations in y.So the model can be thought of as a function(al) Mmapping the unknown functions and coefficients ofthe model to the model predictions of the data.

µ = M(f1, f2, . . . , c1, c2 . . .)

Given a correct model structure and the true valuesof the unknown functions and coefficients, and alsoneglecting all stochasticity except sampling error,the model states that µ = E(y). Even with morerealistic assumptions about stochasticity µ will usu-ally tend towards E(y) as population size increases.

A complete model may contain some additionalelements. The sampling distribution of the ob-served data, y, may be specified and there is otherinformation that might be included. For example,bound constraints may be available on some or allof the model parameters. Similarly, constraints maybe imposed on the unknown functions in a model:the easiest to deal with are constraints that can beexpressed in terms of linear functionals of the un-known functions (a functional is just a function of afunction). For example, it may be important to in-sist that the function be monotonic, convex or pos-itive, or some combination of these (i.e. f is suchthat f ′ ≥ 0 or f ′ ≤ 0 or f ′′ ≤ 0 or f ≥ 0 or somecombination of these). The inclusion of this sort ofqualitative information about model structure is an-other way of restricting the range of dynamics thatthe model can demonstrate, by as much as the mod-eller believes is justified by biological knowledge (forexample: we may not know how mortality rate isrelated to population size, but we surely know thatit is non-negative). A final conceptual element of apartially specified model is the belief that the un-known functions contained in the model should besmooth: how smooth will be discussed later, but itis usually implicit in the construction of the model

at all, that its component functions should not beinfinitely complex.

Fitting

Fitting partially specified models involves theconceptual difficulty that the unknown functions ina model must be represented somehow and that thecomplexity (or flexibility) of those functions mustbe chosen. Allowing too much flexibility can allowover-fitting (and over wide confidence intervals),while too little flexibility in the functions will lead tounder-fitting and the unquantifiable bias that par-tially specified models are intended to reduce. (SeeFigure 1.)

The problem of balancing goodness of fit andsmoothness of the unknown functions can be madequantitative by writing down an objective functionthat contains a term measuring badness of fit anda term measuring wiggliness of the unknown func-tions. An example of such an objective is:

minimisemd∑

i=1

wi(µi − yi)2 +mf∑

i=1

λi

∫

[f ′′i (x)]2 dx

(3)

subject to Lijfi ≥ bij {i = 1 . . .mf , j = 1 . . . k}and Acc ≥ bc (4)

The first term in (3) is model badness of fit mea-sured as a weighted least squares term (md is thenumber of data), in simple cases the weights wi

might all be set to unity, or proportional to the re-ciprocal of the variance of yi; the next term is a sumof ‘wiggliness’ measures for the model’s unknownfunctions (of which there are mf ). Each wigglinessmeasure term in this summation is multiplied by a‘smoothness parameter’ λi. λi controls the weightgiven to the goal of making fi smooth relative to thegoal of fitting the data closely (see Green and Sil-verman, 1994, for an accessible account of the use ofpenalized fitting objectives; Wahba, 1990, for moreadvanced examples or Villalobos & Wahba, 1987,for discussion of constrained penalized problems).Use of a weighted least squares term to measurebadness of fit facilitates smoothing parameter se-lection, but other choices are possible. For examplea (negative log) likelihood term might be used inplace of the least squares term, although for likeli-hoods based on exponential family distributions theresulting objective would still be minimised by it-erative solution of approximating problems of the


same form as (3) (the weights wi would be adjustedat each iteration according to the model mean- vari-ance relationship: again see Green and Silverman,1994, chapter 5). Note that the objective is fairlygeneral since the yi’s can be any transformationsand/or combinations of the raw data so long as itcan be predicted by the model: never the less, insome circumstances, it may be appropriate to usevery different measures of fit tailored to the particu-lar qualitative features of the biological system thatthe model is designed to investigate. Note also thatthe assumption that the fi’s are one dimensionalis made for simplicity of presentation: equivalentmultidimensional wiggliness measures are availableif the fi’s are functions of more than one variable(see, e.g. Green and Silverman, 1994, chapter 7 orWahba 1990).

The constraints (4), are the linear constraintsapplying to the fi’s and ci’s. The Lij ’s are lin-ear functionals, and bij ’s coefficients. For exam-ple, if one wanted to specify that the second con-straint of f1 is that its gradient is to be greaterthan one at the origin, then L12 would be the lin-ear functional that differentiates f1 at the origin andb12 = 1. Similarly Ac and bc are the matrix andvector containing coefficients relating to the linearconstraints on the coefficients. Note that it is alsopossible to impose general linear constraints involv-ing several functions and coefficients (for examplef1(x) + f2(x) < c1). Discussion of constrained op-timization can be found in Gill at al (1981) andappendix B.

Before examining the crucial question of howto choose λ (and thereby the wiggliness of the un-known functions), consider how to represent the un-known functions in practice. The functions can beapproximated using a suitable basis. This meansconstructing each fi from the sum of some sim-ple ‘basis functions’ (γij(x), say) multiplied by un-known parameters (αij , say) :

fi(x) =∑

j

αijγij(x) (5)

(γi1(x) = 1, γi2(x) = x, γi3 = x2 . . . is an exam-ple of a (bad) basis, a better basis is outlined inAppendix A). So the original population model willhave an expression like the right hand side of (5)substituted where ever there is an unknown fi inthe original specification (in practice this can bedone entirely automatically). The basis functionsthemselves have no unknown parameters, so find-ing the best fit fi is reduced to finding the bestfit parameters αij . Notice also that for any ba-

sis function representation like (5) the values andderivatives of the function at any point can be ex-pressed as linear transformations of the parameters(for example, f ′i(x) =

∑

αijγ′ij(x)): this is use-ful for turning inequality constraints on functionsinto general inequality constraints on parameters.The methods described here could be used with avariety of bases. A good choice is to use the ba-sis that arises naturally in linear spline smooth-ing problems, as this yields an easily calculatedform for the penalties,

∫

[f ′′]2, and is known tohave good approximation theoretic properties. De-tails can be found in Wahba (1990), Green & Sil-verman (1994) and Hastie & Tibshirani (1990),and Appendix A. Again, multidimensional func-tions can be used: the basis that arises from ‘thinplate splines’ is an obvious candidate, see Wahba(1990) or Green & Silverman (1994). If the un-known coefficients and the parameters for all theunknown functions are now collected into one vec-tor, pT = [α11, α12, . . . , α21, α22, . . . , c1, c2, c3 . . .],( T denotes transposition) then the fitting problem(3, 4) can be written:

min. q(p) =∑

i

wi(yi−µi(p))2+∑

i

λipT Cip (6)

subject to Acp ≥ b (7)

Ci is a matrix of coefficients that depend on thechoice of basis, but not the parameters; Ac and bare a matrix and vector of coefficients defining thelinear constraints (note that any basis function rep-resentation, which, like (5), is linear in its parame-ters, will allow the fitting problem to be written inthis general form). Given particular values for thesmoothing parameters, λi, this fitting problem isa constrained non-linear optimization problem forwhich solution methods will be given in the meth-ods section.

The second conceptual issue is the choice ofsmoothing parameters. This is the key to usingpartially specified models. Without an objectivemeans for choosing the amount of flexibility to al-low unknown functions within a model one has donenothing but replace arbitrary or ad hoc choices offunctional forms by arbitrary or ad hoc choice ofsmoothing parameters. Fortunately there has beena great deal of statistical work on this kind of modelselection problem and some methods with good the-oretical and practical properties exist. Generalizedcross validation (GCV, Craven & Wahba, 1979) isthe one that will be employed in this paper (seeGreen & Silverman 1994 for a clear introduction).

8 SIMON N. WOOD

Cross validation can be motivated as follows: imag-ine fitting a model to all your data but one, and thenmeasuring the square of the error in your modelprediction of the missing datum; now calculate theaverage of such squared deviations across all datapoints: this gives a cross validation score. This ordi-nary cross validation score measures how bad yourmodel is at predicting missing data: how bad it isat generalising. In the current context the cross val-idation score will depend on the choice of smooth-ing parameter. Low smoothing parameters will leadto complicated (i.e. flexible and/or wiggly) modelsthat fit the noise in the data and consequently pre-dict missing data badly. High smoothing parame-ters lead to simple models that don’t match the datato which they are fitted very well and do no betteron the missing data. Somewhere in the middle willbe better choices.

Cross validation as described has some prob-lems. Most worryingly, it is possible to performsimple transformations on some model fitting prob-lems in such a way that the solution and struc-ture of the fitting problem are unchanged, but thecross validation score ceases to contain any informa-tion about the optimum smoothing parameters (seeWahba 1990; p. 53 section 4.3). Furthermore, thecross validation score can be numerically expensiveexcept in certain special cases that don’t apply inthe current context. Fortunately the problems withordinary cross validation can be fixed with general-ized cross validation. The ordinary cross validationscore can be shown to be equivalent to a weightedsum of squares of the deviations of the fitted modelfrom the data. Generalized cross validation aver-ages the weights in this summation, so that eachdeviation gets the same weight. Writing µi for theestimate of µi obtained by model fitting, the result-ing score is:

V (λ) =∑

wi(yi − µi)2

[∑

(1− ∂µi/∂yi)]2(8)

Another way of viewing this quantity is as the es-timated error variance per error degree of freedom,since

∑

(1− ∂µi/∂yi) is an estimate of the degreesof freedom associated with the error (this can beseen by analogy with general linear regression, ifµ = Xp, then the least squares estimate of µ isµ = X(XT X)−1Xy, ∂µi/∂yi is just element i, i ofA = X(XT X)−1X. So

∑

∂µi/∂pi is the trace of A,which is well known to be the number of identifiableparameters in the original linear model).

The appealing theoretical property of GCV isthat, for linear problems in the large sample limit,

it has been proven to select the smoothing param-eters that minimise E[

∑

(µi − E(yi))2] (again seeWahba 1990, chapter 4). That is, the criterion isattempting to find the model that best matches thetrue underlying values of the observed data. No-tice that this will not be the model that fits thedata most closely, since this would involve fittingthe noise as well as the signal, implying that thesignal would not be optimally fit. Note also thatthis property is contingent on the yi’s being statis-tically independent (i.e. observations on indepen-dent random variables). Extensive numerical ex-perimentation seems to confirm that the theoreti-cal promise of GCV is realised at realistic samplesizes (again see Wahba 1990, Green and Silverman1994 and references therein). The principal diffi-culty with its use is computational: searching forthe optimal λ has the potential to be very costlyif there is more than one smoothing parameter (seeGu and Wahba 1991). Additionally there is no liter-ature on non-linear multiple smoothing parameterestimation. An alternative to GCV for smoothingparameter selection is Akaike’s Information Crite-rion (AIC, Akaike, 1973): Burnham and Ander-son (1998) is a good reference. AIC behaves ina way that is quite similar to GCV and is basedon the same basic principle of trying to pick outthe smoothing parameters that yield a model thatgets as close as possible to some underlying ‘truth’.In the current context AIC is just as difficult asGCV to work with computationally. A philosophi-cally different route would use a hypothesis testingapproach to model selection in order to find thesmoothing parameters: for example, by using like-lihood ratio testing to find the simplest model notfalsifiable by the data at the modellers favourite sig-nificance level: I do not pursue this approach fur-ther.

In summary, partially specified models are eco-logical models constructed so that part of the in-formation that they embody is in the form of un-known smooth functions, unknown coefficients, andconstraints on both of these: the idea is that thisextra flexibility in the structure of the models beingused allows models to be produced that are an hon-est reflection of what is known, or what you wishto assume, about a system, but no more. Fittingsuch models requires that the concept of ‘goodnessof fit’ is extended to include more than simply ‘fitsthe data as closely as possible’: one needs insteadto consider how the model can be made to approxi-mate the underlying ‘true’ model as closely as pos-sible. Table 1 provides a conceptual breakdown of


Step Model Objective

Formulate the model and the fitting objec-tive mathematically

dndt

= β(nt−τ )nt−τ − δn∑

i

(nti − yi)2

+λ∫

[β′′(x)]2dx

Use basis functions, γj(·), to represent β.i.e. β(x) =

∑kj=1 bjγj(x)

↓ ↓

Given the bj ’s, δ, τ and initial conditionsthe model can be solved numerically andthe objective evaluated along with deriva-tives w.r.t. the parameters.

dndt

=∑

bjγj(nt−τ )nt−τ − δn∑

i

(nti − yi)2

+λbT Cb

Now iterate the following 2 steps to convergence:

1. Given an estimate of λ adjust bj ’s, δ and τ to reduce the objective∑

i(nti − yi)2 +λbT Cb.

2. Adjust λ to reduce the GCV score.

Table 1: Example of construction and fitting of a simple partially specified model similar to the adult competitionmodel in the section Blowflies II. It is assumed that an adult population n suffers per capita death rate δ whilenet per capita fecundity is an unknown function of n: β(n). Individuals take time τ to mature. The example doesnot consider inference or constraints.

the construction and fitting process for a very sim-ple example.

A useful discussion of inference from these mod-els benefits from some technical background, so itis postponed until the end of the methods section.

METHODS

This section covers the technical issues involvedin fitting and using partially (and fully) specifiedmodels in practice and presents the arguments forusing the methods employed here rather than al-ternatives (particularly when these are more famil-iar and/or simpler). I have not documented ev-ery detail, but have covered those areas which re-quire a non-standard or novel approach (Numericalmodel solution, Calculating J, Smoothing parameterselection, Model fitting methods), as well as mate-rial required for understanding of the novel materialwhich would otherwise involve a tedious amount ofsupplementary reading (model fitting methods, cal-culating J). The material in this section can be ig-nored if the goal is to fit one particular model to oneset of data and it is therefore acceptable to employa considerable degree of trial and error in the fit-ting process. But more ambitious goals, such as the

comparison of competing models, require methodsthat are efficient (i.e. quick), reliable (meaning thatyou can tell when a best fit has been achieved) andaccurate (that is, with estimation uncertainty dom-inated by the statistical structure of the problemrather than by propagation of numerical errors).

Model fitting

In this section it will be assumed that λ esti-mates are provided and the aim is to find best fitparameters p, given some set of smoothing param-eters. The basic fitting strategy will be as follows:

1. Given some estimate (guess) of model param-eter vector p, numerically solve the modelequations (delay or ordinary differential equa-tions or difference equations), to obtain an es-timate of µ.

2. By repeatedly solving the model with slightchanges in parameters obtain an estimate ofthe matrix J where Jij = ∂µi/∂pj .

3. Use the current µ and J estimates to con-struct (or update) a quadratic model of thefitting objective.

10 SIMON N. WOOD

4. Finding the parameter vector that minimisesthe quadratic model, suggests a direction inwhich to change p in order to minimise thereal fitting objective.

The estimate of p can be improved by iteratingsteps 1. to 4. to convergence, although extra stepsare required to deal with constraints and in somecircumstances λ selection steps will also be includedin each iteration (see later).

Numerical model solution: There are 3 require-ments for the numerical solution of the model:speed, accuracy and stability. Speed and accuracysuggest using an explicit integration scheme withadaptive time-stepping (Press et al. 1992, Haireret al. 1987) Adaptive stepping also ensures thatthe explicit scheme maintains numerical stability,so that the model integration will not ‘explode’ andhalt the fitting process if an unfortunate set of pa-rameters is tried by the optimization algorithm.Sensible use of inequality constraints on parame-ters and functions during model formulation alsohelps to avoid really bad parameter choices duringfitting. (Readers uninterested in d.d.e. models cannow proceed to Calculating J).

Speed and accuracy of an integration scheme fordelay differential equations are surprisingly sensi-tive to some of the details of numerical analysis(Highman 1993b). When numerically solving de-lay differential equations it is necessary to storethe past values of state variables (the ni’s of (1)).Any integration scheme only calculates values andderivatives at discrete times. Never the less, it willusually be the case that the scheme will require es-timates of lagged state variables between the timesthat they were stored. So interpolation is required.

The order of accuracy of the interpolationscheme should be higher than that of the integrator:if it isn’t then one of two things will happen. Forsome non-embedded time-stepping schemes the in-tegrator will be forced to take very small steps, dueto what it perceives as frequent discontinuities in(higher) derivatives of the lagged variables: this isinefficient because it will generally take more stepsthan the integrator of the correct order and eachone of those steps involves more calculation than asingle step of the lower order integrator. For embed-ded time-stepping schemes the continuity assump-tions on which step-length selection is based will beviolated, leading to the situation in which the accu-racy of the numerical solution is no longer relatedto the integration tolerance used (Higham 1993a,b).These factors tend to be ignored in ecological ap-

plications with most workers using integrator/ in-terpolators of inconsistent order: in many cases thisdoes no more than lose a little accuracy (that no-onewill miss) and waste a little computer time (which,you can be sure, the machine would not have putto better use); for the current application it leadsto serious problems when anywhere close to best fitparameters.

In the work reported here I used adaptive time-stepping with an embedded Runge Kutta 2(3)scheme due to Fehlberg (Hairer et al. 1987, p.170)and interpolated lagged state variables using cu-bic Hermite interpolation (Higham 1993a, Paul,1992). The latter necessitates the storage of laggedvariables and their gradients, but, since the gra-dients of state variables must be calculated any-way, this presents no difficulty. Note that an ad-ditional advantage of the use of this consistent in-tegrator/interpolator pair is that the true accuracyof the numerical solution can be estimated by mak-ing use of the ‘tolerance proportionality’ of the es-timated solution: since accuracy is linearly propor-tional to integration tolerance (Higham, 1993b) ac-curacy can be estimated by integration the modelwith 2 different tolerances. This is useful for settingconvergence criteria when model fitting.

Calculating J: For reasons that will hopefullybecome clear, it is necessary to calculate a ‘Jaco-bian’ matrix, J, where

Jij =∂µi

∂pj

J can be approximated by finite differencing using:

Jij ≈µi(p + ∆jij)− µi(p)

∆j(9)

where ij is a vector having zeroes everywhere exceptin entry j where it has a 1, and ∆j is a small num-ber. Through diligent study of the opening chap-ters of any number of numerical analysis textbooksthe reader will be aware that a poor choice of ∆j

can cause problems. Obviously the finite differenceapproximation will be poor if ∆j is too large (trun-cation error). Less obviously, if ∆j is too small thenthe two estimates of µi will be so close that almostall the bits used to store them will be identical,leaving few or no bits storing the difference (can-cellation error). In the current context getting ∆jwrong leads to serious problems, but what counts asright can be sensitive to the local shape of the objec-tive function. Hence the usual folklore for intervalestimation is best ignored in favour of adaptively


adjusting the intervals in order to keep estimates ofthe average truncation error roughly equal to theaverage cancellation error (averages are over all es-timated derivatives). Appropriate error estimationformulae can be found in Gill et al. (1981). Giventhe evaluations required for ∆j control, a more ac-curate finite difference formula can be used at noextra cost:

Jij ≈µi(p + ∆jij)− µi(p−∆jij)

2∆j

There is a second issue relating to the calcula-tion of J that is specific to the fitting of (delay)differential equation models and requires a novelapproach. The model integration scheme will in-troduce some error into the calculation of µ and inthe interests of computational efficiency this errormay be allowed to be large relative to the machineprecision (after all, the data will not be measuredto a great number of significant figures). Unfortu-nately, moderate errors in µ can entail serious errorsin derivative estimates if the errors are independentbetween µi(p) and µi(p ± ∆jij). If step size con-trol is done separately for calculation of µi(p) andµi(p±∆jij) then their error terms can become al-most independent. The solution is to adopt exactlythe same set of time steps for integrating to findµi(p±∆jij) as was used for µi(p): in this circum-stance most of the error in the two approximationswill cancel when they are differenced.

Model fitting methods: Given µ and J, calcu-lated as described in the previous two sections,model fitting is quite straightforward and will bebased on the well tested approach of iteratively ap-proximating the fitting objective by a (multidimen-sional) quadratic:

q(p) ≈ a + hT p +12pT Gp (10)

where a is a constant, h is the vector of deriva-tives of the objective with respect to the parame-ters (hi = ∂q/∂pi), and G is the Hessian of the ob-jective (Gij = ∂2q/∂pi∂pj). Minimising the righthand side of (10) with respect to p suggests an up-dated parameter estimate of G−1h (although it isoften better to use this estimate to define the direc-tion along which to search for a reduction in q(p),rather than using the estimate directly). There areseveral reasons for favouring methods based on aquadratic model of the objective function. Firstly,the least squares part of the objective can be ex-pected to be close to quadratic (for linear models,

least squares objectives are exactly quadratic) andthe penalty terms are exactly quadratic. It wouldbe rather inefficient to make no use of this informa-tion by using gradient descent or function evalua-tion methods, especially given the superior conver-gence rates of quadratic methods (Gill et al. 1981).However it is probably not sensible to attempt toapproximate G directly by finite differencing: suchan approach would be numerically costly and thedifficulty of obtaining accurate estimates of secondderivatives would be even greater than the difficul-ties encountered in estimating J. The substantialextra numerical burden would certainly not yieldbenefits far from the optimum parameters where thequadratic approximation itself is not good. Whenclose to the optimum the approximations that willbe employed here should be at their best and theextra benefit of directly estimating G is likely to beslight. In contrast, the effort of evaluating J canalways be justified by the fact that it guaranteesthat a descent direction can be found (or otherwiseindicates a turning point).

A final important reason for favouring thequadratic model of the objective is the existenceof well known and robust methods for minimising aquadratic model subject to linear constraints of thetype used here. Appendix B outlines the basic prin-ciples underpinning such constrained optimizationwhile Gill et al. (1981) provide much more detailedinformation on the topic.

Given the basic methods for constrained opti-mization outlined in Appendix B, two approachesto the minimisation problem seem to work effec-tively. The first is to use a Quasi-Newton methodthat builds up an approximation to G using theinformation contained in J evaluated at successiveparameter vector estimates during the minimisationprocess. This method is good for badly behavedobjective functions, and in the case in which thebest fit model actually fits very badly. There are anumber of less than optimally stable implementa-tions described in the literature (e.g. Press et al.,1992) and it is important to use as stable an algo-rithm as possible in the current context: a suitablemethod is described in Gill at al. (1981), with thenecessary matrix factorisations given in Gill et al.(1974). Maximum stability of the method is impor-tant here because J is being approximated relativelycrudely and excessive propagation of the resultingerrors should be avoided (alternatively - stable op-timization methods may allow use of less accurateJ estimates with consequent saving of numerical ef-fort).

12 SIMON N. WOOD

An alternative, that can be very fast on suffi-ciently well behaved problems, is to approximatethe non-linear model with an approximating linearmodel. The following steps are iterated:

1. Using the current parameter estimate p(k) solvethe population model to obtain µ(k) and J(k).

2. Form ‘pseudodata’ y(k) = y − µ(k) + J(k)p(k).

3. Minimise:

(y(k) − J(k)p)T W(y(k) − J(k)p) +∑

λipT Cip

Subject to Acp ≥ bc

to find p(k+1) (the problem being solved here is ex-actly quadratic).

(W is a diagonal matrix with Wii = wi). Inprinciple these steps are repeated to convergence,although in practice the algorithm is improved byincluding step length selection so that the routinedoes not always step to the minimum implied by theapproximating model. Converged estimates will bedenoted p, µ, etc. This method is essentially a con-strained version of the Gauss Jordan method (see,for example Press et al. 1992), but the presenta-tion in terms of an approximating linear model fa-cilitates the smoothing parameter selection methodpresented in the next section.

The fitting methods have been presented as ifthe weights wi were fixed in advance and this willoften be the case. However, if some mean- variancerelation is known for the data then the fitting meth-ods presented can be used iteratively as follows.The model is first fitted with uniform (or other spec-ified) weights given to each data point. New weightsare then produced which are inversely proportionalto the variance predicted from the mean variancerelationship given the µi estimates from the bestfit so far. The model is then re-fitted, weightsare estimated again and the process repeated toconvergence of µ(k) (to µ). This approach is ap-propriate for exponential family error distributions(for example Poisson, gamma, binomial) and thealgorithm described is simply the one used for gen-eralised linear model fitting (e.g. McCullagh andNelder, 1989), although, since no simple transfor-mation of the µi’s will generally linearize the mod-els used here, there is no link function in the currentcase.

Finally, note that I have restricted attention tomethods appropriate to the case in which the objec-tive function is fairly smooth, and where minimisa-tion is not made difficult by multiple local minima.

There are circumstances where these assumptionsdo not hold, and other approaches are needed. Ap-pendix C describes an appealingly simple approachfor dealing with difficult objective functions, thatenables the methods described here to be used un-modified: the objective function itself is repeatedlyperturbed by bootstrapping in a way that allowslocal minima to be escaped. For really difficultproblems other approaches such as simulated an-nealing may have to be used, see Brooks and Mor-gan (1994), for example.

Smoothing parameter selection

The first step in smoothing parameter selectionis to replace the GCV score function (8) with anapproximation that is practical to work with. Usingthe linearisation of the fitting problem employed inthe model fitting section:

V (λ) =(y(k) − J(k)p)T W(y(k) − J(k)p)

[tr(I−A)]2(11)

where A is the so called ‘influence matrix’, or ‘hatmatrix’, of the fitting problem: that is the ma-trix that has the partial derivative of the ith ele-ment of J(k)p w.r.t. y(k)

j in its ith row and jth

column. Explicitly A = J(k)Z[ZT (J(k)T WJ(k) +∑

λiCi)Z]−1ZJ(k)T W, where Z (see Appendix B)is any matrix whose columns form the basis of aparameter space within which unlimited movementis possible without violating the model constraintsthat are exactly satisfied at the estimated best fit.Close to the optimum p and λ vectors, (11) shouldclosely approximate (8).

The principal difficulty in using GCV is the factthat evaluation of the denominator of V (λ) is po-tentially very expensive. Forming A explicitly willtake O(m3

d) operations where md is the numberof data, and since A depends on λ, direct searchfor the minimizing λ would take O(m3mf

d ) opera-tions, where mf is the dimension of λ. In the sin-gle smoothing parameter case a number of methodshave been developed that are much more efficientthan direct evaluation of A, but methods for mul-tiple smoothing parameters have proved more diffi-cult. Fortunately, Gu & Wahba (1991) produced away of minimising a GCV score with respect to mul-tiple smoothing parameters for certain spline mod-els, and it is possible to generalise their methodto the much wider class of problems giving rise toscores like the one used here (Wood, 2000). Onceagain a quadratic model of V is used to facilitate


minimisation. The challenging problem is to findan efficient way of evaluating the gradients and sec-ond derivatives of V with respect to the smooth-ing parameters: the painful details are in Wood(2000), but the essence of the method is to finda sequence of orthogonal transformations of A thatenable cheap evaluation of the GCV score with re-spect to one ‘overall’ smoothing parameter, whileallowing A to be decomposed into components thatfacilitate relatively cheap evaluation of the gradi-ents and second derivatives of the GCV score withrespect to the (logs of the) λi’s. (In many cases, thesame basic computational strategy can also be usedto perform smoothing parameter selection by AIC,where difficulty is caused by the need to calculatethe effective number of parameters in the model,which is tr(A).)

Use of (11) to find the smoothing parametersmust proceed iteratively. When used with the it-erative least squares scheme of the model fittingsection, it is most efficient to re-estimate smooth-ing parameters after each step of the iterativescheme. In the case in which (11) is being usedwith the Quasi-Newton method it is better to al-ternate smoothing parameter estimation with fullminimisation by Quasi-Newton, allowing the modelparameters p to converge at each step. This is be-cause the Quasi- Newton method is attempting tobuild up a quadratic model of the objective q(p),so it is best not to change q(p) during the minimi-sation.

Inference

Hypothesis testing (model comparison) and con-fidence interval estimation can be approached intwo ways: by bootstrapping (Efron and Tibshirani1993, Davison and Hinkley 1997) or by using theasymptotic distributions implied by the quadraticfitting objective.

There are several flavours of bootstrap that canbe employed. The simplest is the non-parametricbootstrap. Replicate data sets are produced bysampling with replacement from the collection ofyi’s with each selected yi being accompanied byall the auxiliary information that goes with it: e.g.which stage of the population it refers to, the timeat which it was sampled, and so on. In practice thismeans that each replicate dataset contains someof the original observations more than once, whilesome do not appear at all. The model is refittedto each replicate dataset and in this way an ap-proximate distribution for the p estimates can be

developed. Note that it is not sensible to performcross validation on these replicates: considerationof the ‘leave-one-out’ idea underlying GCV sug-gests that smoothing parameters would be system-atically under-estimated using the bootstrap repli-cate dataset. Hence the original smoothing param-eter estimates must be used for each model fit. Anadvantage of the non-parametric approach is the au-tomatic preservation of correlation structure in theresiduals.

The second bootstrapping possibility is to boot-strap the residuals. In its simplest from this meansthat the residuals from the original model fit arecollected, and treated as observations from a sin-gle distribution. Replicate data sets are producedby sampling with replacement from the residualsand adding the resampled residuals back on to thebest fit model estimates, µi. Again the model isrefitted to each replicate dataset, and after enoughreplicates an approximate distribution for the p es-timates will be produced.

An approach that has advantages for small sam-ple sizes is to bootstrap parametrically. This in-volves specifying a probability model for the residu-als and estimating its parameters from the observedresiduals. Replicate datasets are then produced bysimulating residuals from the parametric residualmodel and adding these to the µi’s.

The second strategy for inference is to useasymptotic results based on the quadratic modelemployed in minimisation. To do this requires es-timation of the covariance matrix for the parame-ter estimators. Assuming that the weights wi havebeen chosen to be inversely proportional to the sam-ple variances it is clear that the covariance matrixfor the data is Vy = W−1σ2. σ2 can be estimatedin the usual way by:

s2 =(y − µ)T W(y − µ)

tr(I−A)

(i.e. the residual sum of squares divided by theresidual degrees of freedom). There are then twopossibilities for an estimate of the covariance matrixfor the parameters Vp. The conventional approachuses the fact that at convergence of the fitting al-gorithm it is possible to write:

p ≈ k + By

where k and B are a vector and matrix whose exactform depends on the fitting algorithm used. Stan-dard results (e.g. Meyer, 1975, p417) then yield thecovariance matrix estimate:

Vp ≈ BBT s2

14 SIMON N. WOOD

An alternative is to use the co-variance matrix de-rived for spline smoothing by Wahba (1983) usinga Bayesian argument. The analogue in the currentcase is simply:

Vp ≈ Z(ZTGZ)−1ZT s2

in terms of the general quadratic model used forminimisation (again the columns of Z form the nullspace of any constraints satisfied exactly by the bestfit model: see Appendix B). Note however that thisapproximation can perform badly if G is estimatedas part of a Quasi-Newton optimization, since cur-vature in some directions in the model parameterspace can be poorly explored. Hence it may be moreuseful to use:

Vp ≈ Z[ZT (JT WJ +∑

λiCi)Z]−1ZT s2

implied by the iterative least squares scheme. The‘Bayesian’ results tend to give higher variance es-timates than the conventional results, by virtue ofattempting to account for mis-specification in thefinal model. However in simulation studies usinglinear spline models the coverage properties of in-terval estimates based on these results have provedvery good (Wahba 1990). At least in the large sam-ple limit one expects p ∼ N(p,Vp), and inferencecan be based on this result.

Identifiability

Optimization methods work best if model pa-rameters are identifiable from the outset. Thus itis best to remove parameter co-linearity from themodel before attempting to fit it. However, thereare automatic methods for dealing with the prob-lem. The most direct method is to attempt toidentify parameter dependence from the JacobianJ. This can be done by linearly regressing eachcolumn of J on all the other columns - a high r2

indicates problems and suggests that the parame-ter relating to that column should be constrainedat a fixed value (which is easily done, as seen in themodel fitting part of the methods section). The dif-ficulty comes in deciding at exactly what r2 valueone should treat a parameter as non-identifiable,since the columns of J are only estimated to rela-tively low precision. It is possible, but non-trivial,to base the criterion on the calculated accuracy ofthe integration scheme and finite difference approx-imation. On the other hand, the very facts that Jis only approximate and co-linearity is hard to de-tect usually mean that the optimization scheme will

not fail because of singularity problems: parameterswill almost always appear ‘nearly’ co-linear, ratherthan exactly so. In this case inequality constraintscan alleviate potential problems. For example, ifa non-identifiable parameter has upper and lowerbounds supplied then it will usually become fixedat one of these bounds quite rapidly, thus reduc-ing potential numerical problems. However, modelswith unidentifiable parameters will present difficul-ties when it comes to the calculation of confidenceintervals for parameters and may also reduce thereliability of smoothing parameter estimates.

Visualisation

Non-linear fitting methods tend to be as goodas their starting values, and in many cases may dis-play multiple local minima (for a good example ofthis see the Nicholson’s blowfly example, below).For this reason it is a bad idea to fit models blind.It is vital to be able to see a fit progressing, inorder to check that starting values are reasonableand that an obviously false local minimum has notbeen found. It is therefore important that fittingshould be done in an environment where it is easyto modify and test starting values for parameters.In the case of unknown functions, this may meanrelatively sophisticated programming to allow theuser to manipulate starting functions, but withinone of the many windowed environments now avail-able, this is not overly difficult to achieve.

EXAMPLES

This section contains examples of the applica-tion of the methods described thus far, chosen toillustrate different aspects of the PSM approach. Istart with a section that uses simulated data to ex-amine the efficacy of the framework when fittingpartially specified models, with multiple unknownfunctions, using a variety of modelling formalisms.A real example is then given, applying the methodto modelling of laboratory Daphnia cultures beforeexamining model comparison by statistical hypoth-esis testing using fully and partially specified mod-els of Nicolson’s blowfly data. The simplest modelexamples are in the blowfly section. Discrete ma-trix models, partial, ordinary and delay differen-tial equation models are covered in the first exam-ple section. The models presented in all sectionsare relatively simple. Obviously this simplicity issomewhat at odds with the stated aim of reducingbiologically spurious assumption in the modelling


process, but the hope is to be able to illustrate theapproach without becoming swamped in a mass ofmodelling detail.

Marine Copepods: a comparative example withsimulated data

Consider the problem of inferring underlyingmortality and birth rates in a structured copepodpopulation. This problem has produced more meth-ods than there are data to fit (Asknes et al., 1997);a fact for which the current author is as responsibleas anyone. Copepods develop through a number ofsequential life history stages that can be identifiedin population samples. If a model can be devel-oped to predict the population dynamics of eachstage from knowledge of the birth rate to the pop-ulation and the death rates afflicting each stage,then in theory it should be possible to infer theserates by fitting the model to data. This problemallows demonstration of the efficacy of the smooth-ing parameter estimation methodology and the useof these methods with models formulated as dif-ference equations, ordinary differential equations,delay differential equations and even partial differ-ential equations. It also permits exploration of thesensitivity of inferred functions to changes in mod-elling formalism.

Consider an 11 stage population. Suppose thatthe population of stage i at time t is ni(t), that di(t)is the per capita death rate in the stage, R1(t) is therecruitment (birth) rate into the first stage and τi isthe mean duration of the ith stage. Also define thesubsidiary variables γi = 1/τi and the survival ratefrom time a to time b: Si(a, b) = exp(−

∫ ba di(x)dx).

There are several competing structured populationmodels that might be used for the stage popula-tions, for example:

1. A matrix population model of the type advo-cated by Caswell (1989):

n(t + 1) = D(t)n(t) + r

where n is the vector of stage populations, r =(R1(t), 0, 0, . . . , 0)T and all elements of D(t)are zero except Dii(t) = Si(t, t+1)(1−γi) fori = 1, . . . , 11 and Di,i−1 = Si−1(t, t+1)(γi−1)for i = 2, . . . 11. The interested reader shouldconsult Caswell (1989) for justification of thismodel structure. Clearly this model is a sys-tem of difference equations and hence one ofthe general class covered by this paper.

0 50 1000

10

20

30

0 50 1000

20

40

60

0 50 1000

40

80

120

0 50 1000

40

80

0 50 1000

20

40

60

0 50 1000

20

40

60

0 50 1000

50

100

0 50 1000

20

40

60

0 50 1000

20

40

0 50 1000

20

40

0 50 1000

20

40

60

I

2

3

4

5

6

9

8

7

10

11

Sta

ge P

opul

atio

n

Time

Time

Figure 2: Typical replicate copepod data set simulatedfrom the delay differential equation based structuredpopulation model given in the section Marine Copepods:a comparative example with simulated data. Stage du-rations used were: 0.75, 1.4, 4.55, 2.8, 2.5, 1.7, 3.5, 3.1,3.2, 3.7 and 4.7. The functions used to simulate werea Gaussian plus a constant for recruitment, a decayingexponential for δ1 and a constant for δ2 (see figure 3).

2. A system of ordinary differential equations:

dni

dt= ri(t)− γini − di(t)ni i = 1, . . . , 11

where r1(t) = R1(t) and ri(t) = γi−1ni−1(t)for i = 2, . . . , 11. This model describes arather extreme form of distributed stage dura-tions in which individual stage durations arerandom variables following a negative expo-nential distribution.

3. A system of delay differential equations:

dni

dt= ri(t)− ri(t− τi)Si(t− τi, t)− di(t)ni

i = 1, . . . , 11

where r1(t) = R1(t) and ri+1(t) = ri(t −τi)Si(t − τi, t). This system of equations de-scribes the other extreme of stage duration

16 SIMON N. WOOD

distribution, in which all individuals take ex-actly τi days to pass through a stage. SeeGurney and Nisbet (1998) for a clear exposi-tion of this sort of model, or see Gurney et al.(1983).

4. The McKendrick equation for an age struc-tured population:

∂f∂t

+∂f∂a

+ d(a, t)f = 0

where f(a, t) is the population (or expectedpopulation) per unit age interval at age a andtime t and ni(t) is given by the integral of fover the age range covered by stage i at timet. For this model f(0, t) = R1(t). Strictly thismodel is not in the class covered in this paper,but its numerical solution by the ‘method oflines’ (see e.g. Al-Rabeh, 1992) yields a set ofordinary differential equations which is in theclass of models considered here. Discretizingin the age direction yields a system of equa-tions (in an obvious notation) of the form:

dfj

dt= −fj+1 − fj−1

2∆a− di(j∆a, t)fj(t)

j = 1, 2, . . .

where f0(t) = R1(t) and the equation atthe upper age boundary employs a backwardsdifference rather than a centred difference.ni(t)’s are obtained by appropriate summa-tions over the fj(t)’s. For the results givenhere ∆a = 0.05.

In all cases I set up the model initial conditionsto be consistent with the initial recruitment andmortality rates, assuming that the population wasat equilibrium (subject to those rates) prior to theinitial time. I have kept these models very sim-ple: clearly a realistic model would involve a moresensible distribution of stage durations - somethingwhich is easy to achieve within the d.d.e. framework(see Blythe et al. 1984). Note that all the modelshave been defined to be as similar as possible giventheir different structures. None the less models 1and 2 are fairly closely related as are models 3 and4, but 3 and 4 differ quite markedly from 1 and 2.

In some circumstances it may be reasonable totreat the stage durations as known, the birthrateto the first stage as an unknown function, thedeath rate in the first 6 stages (nauplii) as an un-known function of time (δ1(t), say) and the deathrate in the remaining 5 stages as a different un-known function of time(δ2(t), say). This is because

0 40 80

0.01

0.02

0.03

0 40 800

0.1

0.2

0 40 800

0.1

0.2

0 40 800

0.01

0.02

0.03

0 40 800

0.1

0.2

0 40 800

0.1

0.2

0 40 80

0.01

0.02

0.03

0 40 800

0.1

0.2

0 40 800

0.1

0.2

0 40 80

0.01

0.02

0.03

0 40 800

0.1

0.2

0 40 800

0.1

0.2

d

p

o

m

R (t)1 δ 1(t) δ 2(t)

Figure 3: Reconstructed functions from 11 stage sim-ulated copepod population data as described in figure2 and Marine Copepods: a comparative example withsimulated data. The column headings relate to the func-tions being reconstructed, while the row letters refer tothe model used to fit the data: d - delay differentialequations; p - partial differential equations; o - ordinarydifferential equations; m - discrete time matrix model.The heavy line in each panel is the true function usedin simulation. The 10 fine lines in each panel are thereplicate reconstructions (same true functions for each,but different sampling errors).

of substantial physiological and behavioural differ-ences between the naupliar and copepodite stages.To demonstrate the smoothing parameter selectionmethod I simulated an 11 stage population from afully specified version of model 3 above. The simu-lated population was sampled (without error) andthe samples were then perturbed with additive nor-mal random errors (standard deviation 8). A typi-cal resulting data set is shown in figure 2. I gener-ated 10 replicates in total.

I then fitted partially specified models with 3u.f.s to the resulting data treating stage durationsas known. In this case the objective function con-sists of the sum of a weighted least squares termand three wiggliness penalties, each with its ownsmoothing parameter. The objective was subject tolinear constraints imposing the condition that thethree unknown functions must be positive. Eachsimulated data set was fitted with the 4 differentpartially specified models obtained by using the 4different modelling approaches given above. Figure3 shows the reconstructed rates superimposed on


the true rates used in simulation, for each modelfor each of the 10 replicates. All the models werefitted using iterative least squares with λ selectionby GCV. A maximum of 10 degrees of freedom wasallowed for each unknown function (this restrictionis purely pragmatic - the number is higher than theeffective degrees of freedom selected by GCV, butlow enough that the computations are still reason-ably quick, since there are only 10 parameters perunknown function).

Figure 3 illustrates several points. The recruit-ment function is well reconstructed in all cases, andthe first mortality function is also reasonable. Thefinal mortality function fares less well, particularlyusing the two models which have quite different as-sumptions to the model used for simulations. How-ever, it is worth noting that the confidence inter-vals for this last function will be correspondinglywide: since the penalized likelihood approach ap-plies a zero penalty to a straight line the modelfitting will fit a line with a slope no matter how lit-tle statistical significance can be attached to thatslope: this problem is not fundamental, one couldexamine the GCV score of a model with and with-out a slope, and select between them on this basisbut I have not done so in this example. The ten-dency to raised mortality rates in the first half ofthe time range for δ2 under models 1 (o) and 2 (m)is directly attributable to the fact that both thesemodels allow some individuals very rapid transferthrough the stages, in a way that the model usedfor simulation does not (indeed the hint of this ef-fect in the partial differential equation model (4) isprobably the result of numerical diffusion allowinga similar phenomenon). The general comparabil-ity of the reconstructed functions under substan-tially different models is encouraging: although thequantitative results depend on the formalism usedfor demographic book-keeping, the qualitative re-sults do not seem greatly sensitive in this case andreconstructions are all quite close to the truth.

This example also demonstrates the effective-ness of the smoothing parameter selection method,and one of the important advantages of partiallyspecified models. All the fitted functions have farfewer effective degrees of freedom than the max-imum allowed in the fitting (10 in this example)and the fitted functions are mostly reasonable re-constructions of the truth. The importance of effi-ciency in the smoothing parameter selection is em-phasized by the partial differential equation model -solution of this involved in excess of 600 coupled or-dinary differential equations, yet fitting and model

selection was completed in 10-20 minutes of com-puter time on a low specification Pentium II 400Mhz PC. Notice as well that reconstructions arepoorer where the most directly influential popula-tion data is low so that the signal to noise ratio ispoor: the late parts of δ1 and the early parts of δ2.Confidence intervals estimated for these rates willreflect this: indicating clearly where the inferredknowledge of rates is poor. This contrasts with thesituation that pertains for a fully specified model,where the convenient fiction that a good parame-ter sparse model is known tends to lead to muchnarrower confidence limits than can really be jus-tified by data or knowledge - this confidence beingbased almost entirely on extrapolation of the fittedmodel from data rich periods and stages, to datapoor portions of the data set.

Daphnia

The examples presented in the previous sectionused simulated data to compare structurally sim-ple partially specified models constructed using anumber of alternative formalisms. I now turn to anexample using a more complicated partially speci-fied structured model of some Daphnia populationdata kindly supplied by E. McCauley. The dataconsist of time series of adult and juvenile popu-lations from a laboratory culture. The data weremodelled using the moderately detailed structuredpopulation model given in table 2, which was sup-plied by R.M. Nisbet and is a modified version ofthe model of McCauley et al. (1996), which shouldbe referred to for a detailed explanation and justifi-cation. Whilst most of the parameters of this modelhad been independently obtained (see McCauley etal. 1996), α and τA, were left as free parameters. Tosimplify presentation, the model used here assumesthat the rate of food supply to the experimental sys-tem was kept constant (although in reality it wasintroduced in frequent pulses). There was some evi-dence that food quality had been changing through-out the experiment, although no detailed informa-tion was available about how it had changed. Theaim was to produce a model in which food qualitywas described by an unknown, non-negative, func-tion of time. In this way it could be ascertained howmuch of the model mismatch could be explained byfood quality variation and the nature of any foodquality variation could be examined. Food was pro-vided to the population 3 times a week, usually onthe basis of 2,2 and 3 day intervals. This has the po-tential to drive perceived food quality on a weekly

18 SIMON N. WOOD

State variablesMeaning Equation

Juveniles dJ/dt = RJ(t)−MJ(t)− δJ(t)J(t)Adults dA/dt = MJ(t)−MJ(t− τA)PA − δA(t)A(t)

Juvenile Survival dPJ/dt =

{

−PJδJ t < τJ

PJ(δJ(t− τJ)h(t)/h(t− τJ)− δJ(t)) t ≥ τJ

Adult Survival dPA/dt =

{

−PAδA t < τA

PA(δA(t− τA)− δA(t)) t ≥ τA

Juvenile duration dτJdt =

{

1− h/hm t < τJ

1− h(t)/h(t− τJ) t ≥ τJ

Auxiliary functionsMeaning EquationJuvenile ration ρJ(J, A, t) = F (t)V

τI (J+αA)

Adult ration ρA(J, A, t) = αF (t)VτI (J+αA)

Initial inoculum i(t) = r0e−t

Juvenile recruitment RJ = i(t) + ρANAρA0

Juvenile development h(J, A, t) = max(

hmρJρJ+ρJ0

, hmin)

Adult death rate δA = δAme−ρA/ρA1

Juvenile death rate δJ = δJme−ρJ /ρJ1

Juvenile maturation MJ(t) =

{

0 t < τJRJ (t−τJ )PJ h(t)

h(t−τ) t ≥ τJ

Other definitionsFood quality F (t)Microcosm volume VTransfer interval τI

Initial inoculum r0

Minimum development rate hmin

Maximum development rate hm

Adult Longevity τA

Maximum adult death rate δAm

Maximum Juvenile death rate δJmVRation for 1/e adult mortality cut ρA1

Ration for 1/e juvenile mortality cut ρJ1

Juvenile ration halving development ρJ0

Adult food to egg ratio ρA0

Adult to juvenile food ratio α

Table 2: Daphnia model definitions.

basis - so the unknown function in the model wasgiven a maximum number of degrees of freedom of60, in order to be able to accommodate a weeklycycle if necessary.

wi’s for the adults were set to 9 times those forthe juveniles (this weighting was selected to givethe least bad residual plots) and the model was fit-ted by constrained minimisation of a weighted leastsquares objective penalized by a smoothness con-straint on the food quality function. This was doneby iterative least squares. The smoothing parame-ter controlling the trade-off between fidelity to thepopulation data and smoothness of the food qual-ity function was chosen by GCV, as outlined in themethods section.

Figures 4a and 4b show the best fitting modeland the original data plotted together. Figure 4cshows the best fitting F (t) with asymptotic con-fidence bands (which should be treated with cau-tion, given the residuals). Notice that the model

is incapable of reproducing the initial transient inthe data, and that later fluctuations seem to bematched by allowing F (t) to exhibit considerabletemporal variability.

This example illustrates several points. Firstly,it demonstrates that the fitting framework de-scribed can be used to fit real data with multipletime series and irregular sampling and that the factthat the model is moderately structurally complexdoes not cause difficulty. Secondly, it demonstratesan important point about partially specified mod-els: the model was allowed a maximum of 63 freeparameters and still failed to fit the data overly well.The well worn adage that ‘if you give me four freeparameters I will draw you an elephant, and given5 I can make it wiggle its tail’, only contains anytruth if you have freedom to choose the model whichwill incorporate these 5 parameters. In this casethe model structure imposes very strong constraintson what the model population can do, irrespective


0 50 100 150 2000

50

100

150

200

250

300

0 50 100 150 2000

20

40

60

80

0 50 100 150 2000

2

4

6

8

10

12

a

b

c

Days since experiment start

Foo

d qu

ality

Juve

nile

pop

ulat

ion

Adu

lt po

pula

tion

Figure 4: Partially specified model fit to Daphnia pop-ulation data. In (a) and (b) the dashed line joins thepopulation data and the solid line is the model fit. (a)shows the fit to the juvenile population. (b) shows thefit to the adult population. (c) is the fitted food function(a function of time), the solid line is the best estimateand the dashed line an asymptotic 95% confidence in-terval, calculated from the quadratic approximation tothe model fitting objective.

of the values taken by its parameters. The thirdrelated point concerns smoothing parameter selec-tion: the fitted model in fact suggests a relativelyslow rate of change for food quality, so that the finalfood quality function has far fewer effective degreesof freedom than the maximum possible number of60: the GCV criterion has in effect decided that noreal improvement would be achieved by allowing thefood quality to vary more rapidly. This illustratesthat GCV itself avoids over-fitting. Indeed, if foodquality is allowed a full 60 degrees of freedom (bysetting the smoothing parameter to zero), then itvaries widely and rapidly, but the ‘improvement’ infit is marginal.

Nicholson’s Blowflies I: a fully specified example

In order to introduce simple statistical hypoth-esis testing with population models, consider com-paring two alternative hypotheses about the mech-

anisms driving the dynamics of Nicholson’s fa-mous blowflies (1954a,b) using fully specified mod-els (Kendall et al. 1999 discuss this problem in somedetail). The data to be fitted are numbers of adultblowflies Lucila cuprina in laboratory cultures. Inthe cultures examined here the adults were suppliedwith protein at a fixed and limiting rate, but allother resources were supplied in abundance. As canbe seen from figure 5, the population cycled.

The first hypothesis is that the regulatoryprocess driving the blowfly cycles is competitionamongst adults affecting adult fecundity, so that(given suitable starting conditions) the adult pop-ulation can be described by the delay differentialequation:

dAdt

= SA(t− τ)e−A(t−τ)/A0 − δA(t)

where S is a compound variable made up of adultfecundity multiplied by juvenile survival, τ is thedelay from egg laying to adulthood, A0 is the re-ciprocal of the exponential decay rate of fecunditywith adult population, and δ is the adult deathrate. This model was first proposed by Gurney etal. (1980) at the same time as a structurally equiv-alent model was produced by Readshaw and Cuff(1980)

The second hypothesis is that regulation isthrough the effect of juvenile competition on adultsize and hence fecundity: this mechanism can giverise to the following formulation (after Gurney andNisbet, 1985):

dJdt

= B(t)−∆B(t− τ)e−τδJ − δJJ

dAdt

= ∆B(t− τ)e−τδJ − δAA

dBdt

= ∆B(t− τ)e−τδJ W − δAB

dWdt

= g(t)−∆g(t− τ)

g(t) =gm

1 + J/J0

where ∆(t) is zero for t < τ and one otherwise.J and A are adult and juvenile populations, B isbirth rate, W is weight at maturation to adulthood.τ is development time, δA is adult death rate, gm

is a parameter made up from maximum juvenilegrowth rate and adult fecundity per unit weight,J0 controls the rate of decrease growth rate withjuvenile population, µ is maintenance cost and δJ

juvenile death rate. Given the experimental set up,this second hypothesis is almost certainly incorrect,

20 SIMON N. WOOD

but in the current context this is useful, since we canbe fairly sure what the correct answer should be.

Both models were fitted to Nicholson’s data byusing constrained quasi-Newton to minimise thesum of squares of differences between model tra-jectories and adult population counts; parameterswere constrained to be positive, but of course thereare no wiggliness penalties in the fitting objective.The best fits obtained for the two models are shownin figure 5. The r2 values are 0.69 and 0.60 for theadult competition model and juvenile competitionmodels respectively. It might be useful to accessthe statistical significance of these results, but sta-tistical comparison of the models is a non-standardproblem.

To motivate the approach taken here it is worthconsidering what statistical hypothesis testing does,in quite general terms. Statistical hypothesis test-ing amounts to comparing two models of how a setof data has been generated. The model constitut-ing the null hypothesis is a restricted version of themodel constituting the alternative hypothesis: weare comparing a simple model and a more com-plicated model of data. Because the complicatedmodel is an extended version of the simple modelit will always be able to fit any set of data at leastas well as the simple model. Hypothesis testingasks the question: if the simple model is true, whatdistribution should I expect for the improvement infit of the complicated model relative to the simplemodel? All that then remains is to decide whetherthe observed difference in fit is consistent with thisdistribution, and hence with the null hypothesis.

The general method of hypothesis testing is wellcovered in Silvey (1975). Hypothesis testing usingthis approach in combination with standard distri-butional results is familiar in the context of analy-sis of variance or likelihood ratio tests (in general-ized linear modelling for example, McCullagh andNelder, 1989). If the assumptions required to usestandard distributional results are not met then anobvious computer intensive approach can be taken:generate replicate data from the best fit of the sim-pler model, and build up an empirical distributionfor the difference in fit between the two models, byfitting both to each of these replicates. Paramet-ric or semi-parametric bootstrapping is the obviousway to achieve this. A detailed discussion of thisapproach in the context of linear regression can befound in Davison and Hinkley (1997 section 6.3.2).

The current example does not quite fit into thegeneral hypothesis testing framework because theworse fitting model is not obviously a restricted ver-

sion of the better fitting model. Hence we don’tknow that the better fitting model would fit moreclosely even if the worse fitting model were trueand this precludes the use of general methods suchas the likelihood ratio test for comparing thesemodels (again Silvey, 1975, provides the necessarybackground for a mathematical appreciation of thispoint). Never the less, the general question of whatsort of improvement in fit of the better fitting modelover the worse fitting model is expected if the worsefitting model is true retains exactly the same mean-ing in the current non-nested case that it holds inthe case of nested models. By the same token theanswer holds the same scientific implications in thiscase as it holds in the case of nested models. Appre-ciation of this latter point suggests that it is worthusing a computer intensive approach to obtain anapproximate p-value to attach to the improvementin fit of the adult competition model.

I therefore simulated data by assuming the bestfit juvenile growth limitation model to be true.Replicate data sets were generated by samplingwith replacement from the residuals of the best fitjuvenile growth model and adding these resampledresiduals to the model predictions of the populationat each sampling time. Both models were fitted toeach replicate and their difference in goodness offit was assessed. By generating a large number ofreplicate datasets under the null hypothesis (thatthe juvenile growth model generated the data), itis possible to obtain an approximate distribution ofthe difference in fit between the two models assum-ing that the null hypothesis is true. The propor-tion of differences in fit that are at least as largeas the difference observed when fitting the originaldata, provides a p-value to associate with testingthe null hypothesis that the juvenile growth modelgenerated the data, against the alternative that theadult competition model did so. Figure 5c gives ahistogram of the simulated distribution of badnessof fit under the null (measured by residual sum ofsquares), and the true difference.

The adult competition model clearly fits bet-ter than the juvenile growth model, and this wouldtend to lend support to the adult competition hy-pothesis as opposed to the juvenile competition hy-pothesis. Of course this support is equivocal: giventhe simple structure of the models used and thefairly arbitrary specification of the key functionalrelationships within both models it is not clear towhat extent the test results are related to the abil-ity of the models’ incidental assumptions to matchreality, rather than relating to the core ecological

PARTIALLY SPECIFIED MODELS 21P

opul

atio

n (t

hous

ands

)P

opul

atio

n (t

hous

ands

)

0

2

4

6

8

10

0 100 200 300 400Time (days)

a

0

2

4

6

8

10

0 100 200 300 400Time (days)

b

100500-50-100-150-200

40

30

20

10

0

Test statistic interval midpoints

Fre

quen

cy

c

Figure 5: Comparison of fully specified model fits toNicholson’s Blowfly data. (a) The best fit of the adultfecundity model (solid line) to the experimental data(joined by dashed line). (b) The best fit of the juvenilecompetition model (solid line) to the same data (againjoined by a dashed line). (c) A histogram of the dif-ference in mean square deviations of models from data,for each of 99 replicate non-parametrically bootstrappeddatasets under the null hypothesis that the adult fecun-dity model is as good a description of the underlyingdynamics as the juvenile competition model. The tri-angle shows the difference in mean square errors for theoriginal data.

assumptions underpinning the two models. Thisdeficiency will be partially rectified in the next sec-tion. Note also that, as in most statistical hypoth-esis testing, the question posed is quite limited -attention has been restricted to just two alterna-tive mechanisms as embodied in two simple models- hence it is only the relative merits of the two al-ternatives that are assessed. Furthermore the boot-strap analysis just described must be treated withcaution, since the replicates under the null do notpreserve the covariance structure in the residuals,

and it is not clear under what circumstances thisstructure should be preserved (also note that thedifferences in goodness of fit is unlikely to be a piv-otal quantity in the statistical sense). Hence a com-plementary model comparison should also be per-formed, by non-parametrically bootstrapping fromthe data, to obtain a bootstrap estimate of the dis-tribution of goodness of fit difference between thetwo models. An example of this approach will begiven in the next section, when partially specifiedmodels for the blowfly population are considered.

Of course, none of this hypothesis testing ma-chinery is necessary if we merely seek to find whichmodel is“best” with respect to criteria like AIC orGCV, and do not seek to attach a p-value to thechoice. AIC in particular is quite often used forselecting between non-nested models, but the va-lidity of the approach does not seem to be settledin the literature. Chapter 6 of Burnham and An-derson (1998) should probably be read critically be-fore deciding to compare non-nested models in thisway. Murata et al. (1994) give a clear statementof the concerns about the practice raised by a care-ful examination of the derivation of AIC. However,if an AIC model selection approach is taken thenBurnham and Anderson (1998) suggest some inter-esting approaches for assigning confidence levels tothe model choice.

This example demonstrates the utility of the fit-ting approach for fully specified models, which are,after all, only special cases of partially specifiedmodels. On the other hand the example can alsobe used to illustrates some of the difficulties withmodel fitting by trajectory matching. The fact thata number of cycles are to be matched, means thatthe objective function is almost certain to have sev-eral local minima. To see this, imagine doublingthe frequency of the model cycles - every other cy-cle would still line up with a cycle in the real data- to move away from this situation is bound to in-volve increasing the objective function. One wayaround this difficulty, that works well in practice,is to supplement the fitting objective with somesquared terms that penalise deviation of the modelACF from the data ACF at a number of lags. Giv-ing sufficient weight to this part of the objective of-ten results in rapid attainment of approximately theright cycle period, after which the extra terms canbe removed. Kendall et al. (1999) present severalinteresting approaches to both model fitting and in-ference with respect to this example, including theuse of stochastic population dynamic models.

22 SIMON N. WOOD

Blowflies II: partially specified model comparison

Finally, consider the blowfly example again, butthis time let the analysis proceed with partially,rather than fully, specified models. Consider theadult fecundity model. An appropriate partiallyspecified statement of this model is:

dndt

= β(nt−τ )nt−τ − δ(nt)nt

where β is a monotonically decreasing non-negativefunction of density and δ is a non-negative mono-tonically increasing function of density.

The partially specified version of the juvenilegrowth limitation model is a little more compli-cated:

dJdt

= Bt −∆Bt−τ

dAdt

= ∆Bt−τ − f(Bt)At

dBdt

= ∆Bt−τWt − f(Bt)Wt

dWdt

= g(Jt)−∆g(Jt−τ )

Now g, the juvenile growth rate, can be expected todecline monotonically with juvenile density (whileremaining non-negative), and f the adult per capitadeath rate will be a monotonically increasing func-tion of B (which is proportional to adult biomass).∆ again takes the value 1 when t > τ and 0 other-wise. For practical fitting the unknown functions ineach of these models were each allowed at most 10degrees of freedom (again, this is well above the de-grees of freedom selected by GCV, while 10 param-eters per unknown function still allows relativelyquick computation). The function domains wereselected with some experimentation to make surethat the they were matched to (but were a bit largerthan) the domains suggested by the best fit models.

When fit to Nicholson’s data these partiallyspecified models fit slightly better than their fullyspecified counterparts. r2 values show only slightimprovement to 0.71 and 0.61, for the adult fecun-dity and juvenile competition versions, respectively,but the juvenile competition model now has a bestfit trajectory that is not obviously worse than theadult fecundity model.

Note that although the models are both capableof producing the double humped peak evident inparts of Nicholson’s data, the best fitting models,do not display such dynamics. Figures 6a and 6b

show the best fits of the 2 models and the best fitforms for f and g in both models. τ and the initialadult population were also fitted as a free parame-ters for both models, and the juvenile growth limi-tation model had an additional free parameter: theinitial adult fecundity.

The significance of the goodness of fit improve-ment of one model over the other was assessed bynon-parameteric bootstrapping, in order to obtaina 98% confidence interval for the difference in modelgoodness of fit (as measured by difference in resid-ual sum of squares divided by 1 million). In all99 replicates the adult fecundity model fitted bet-ter than the juvenile growth limitation model: soif bootstrap percentile confidence intervals (Efronand Tibshirani, 1993) are used the 98% C.I. for thedifferences in model fits does not include zero. Ifbasic bootstrap confidence intervals (Davison andHinkley, 1997) are used then the 98% confidenceinterval includes zero, but the 95% C.I. does not. Itis reasonable to conclude that the adult fecunditymodel is better at the 5% level.

Note that interpretation of statistical modelcomparison is complicated by the fact that it is notobvious to what extent the models can be treated as‘nested’. To see the problem, consider nested mod-els in linear regression. Using percentile confidenceintervals, the bootstrap analysis just performedwould never reject the model E(y) = a + bx + cx2

in favour of E(y) = a + bx, no matter how spuriousthe quadratic term. Using basic bootstrap intervalsthe quadratic model could be rejected, but thereis no reason to suppose that the p-value would re-late to the probability of rejecting a correct null hy-pothesis! These considerations would tend to favouruse of the bootstrap analysis employed for the fullyspecified blowfly models: but the error structureclearly violates the assumptions underlying that ap-proach suggesting that it should not be used alone.Despite the problems the combination of the ‘boot-strap under the null’ and non-parametric bootstrapprovides a useful tool.

As well as illustrating another method of modelcomparison, this example demonstrates the utilityof partially specified models, even when a fully spec-ified model is available. In this case it was possibleto check whether the functions chosen for use in thefully specified models produced spurious constraintson the population dynamics that restricted one orboth of the models in a manner that had nothing todo with the biology that the models were attempt-ing to describe. For the juvenile competition modelthere is some evidence for this being the case, since


12

0 8,0000

0 8,0000

0.6

0 16,0000

0.3

0 3,5000

0.25

200175150125100755025

30

20

10

0

Fre

quen

cy

Difference in goodness of fit

c

0 100 200 300 4000

2

4

6

8

10

b

Pop

ulat

ion

(tho

usan

ds)

0 100 200 300 4000

2

4

6

8

10

a

Pop

ulat

ion

(tho

usan

ds)

Figure 6: Partially specified blowfly models. (a) showsthe best fitting population trajectory for the adult fe-cundity model (r2 = 0.71), in the large panel: thedashed line joins the actual data, the solid line is modelfit. The two small panels are the best fit adult fecundityas a function of adult density (left) and the best fit adultdeath rate as a function of adult density. (b) is similarto (a), but shows the best fit growth limitation model(r2 = 0.61). The left small panel shows juvenile growthrate as a function of juvenile abundance, while the rightsmall panel is adult per capita death rate as a function oftotal adult fecundity (a surrogate for biomass). (c) is ahistogram of the difference in badness of fit between thetwo models in 99 non-parametric bootstrap re-samples.Badness of fit is measured by error sum of squares over106: the range of bootstrapped values went from 15 to197, the original value was 98.

the partially specified model yields dynamics whichappear to be a more reasonable match than the fullyspecified equivalent. However, it still appears thatthe adult fecundity model is significantly better, sothat confidence in this conclusion is strengthened.In short, within the limitations of the rather sim-plistic models employed for this example, the analy-sis based on partially specified models comes closerto testing for differences between competing mech-anisms, rather than testing for differences betweenalternative sets of incidental assumptions.

DISCUSSION

The approach described in this paper providesa way of formulating ecological models that pre-cisely reflect the information about a system thatthe modeller wants to include, while avoiding as-sumptions that the modeller doesn’t wish to includein the hope of reducing reliance on biologically spu-rious assumptions. More importantly it providesa practical way of using such models by providingfitting methods, as well as model selection and in-ference techniques. In the context of statistical es-timation the benefits of an approach that avoidsarbitrary parametric models are well known fromwork on GAMs and Spline models (Hastie and Tib-shirani 1990, Wahba 1990) and the methods can beviewed as an extension of these techniques whichintroduces mechanism and non-linearity into themodels employed. In ecological contexts the ben-efits of partial specification have been recognizedand clearly demonstrated in a number of specialcases (Bjørnstad et al. 1999, 1998, Ellner et al.1998, 1997, Wood 1997, 1994, Ohman and Wood,1996 and Wood and Nisbet 1991): what this pa-per does is to move from special cases to a generalframework, as well as solving the crucial problemof choosing the optimal model complexity, withoutwhich the exercise would merely have replaced oneset of arbitrary model assumptions with another (bycomplexity I mean the complexity of the componentfunctions of the model, as controlled by their asso-ciated smoothing parameters λ).

The methods described have several practicalbenefits for fully and partially specified model fit-ting and comparison: when used for ‘trajectorymatching’ the objective function makes good use ofinformation over all available timescales, which hasdistinct advantages for short noisy datasets; missingvalues, uneven sampling and unobserved state vari-ables present no technical difficulties; the efficient,reliable and robust fitting methods make it possi-ble to employ modern computer intensive inference

24 SIMON N. WOOD

methods with many models, and to have some con-fidence in the notion that differences in model fit re-flect real differences in model performances, ratherthan differences in how much help was given to thefitting routine.

Of course the approach is far from perfect. Pro-duction of a method that allows formal inferenceand model selection by a means other than arm-waving has necessitated some restrictive choices. Ihave considered a class of models that is quite spe-cific to ecology, and have not considered stochasticdynamics. The latter omission can lead to difficultywhen analysing long time-series for cyclic systemswith considerable process noise. In this circum-stance the process noise may produce phase driftin the dynamics, which a deterministic model cannot match. This implies that some care will be re-quired in selecting the ‘data’ to be matched in theobjective function employed here: if the model isnot designed to exactly mimic phase characteristicsof the modelled system dynamics, then the ‘data’to be fitted needs to be relatively phase insensitive.The obvious way to approach this problem is totransform cyclic timeseries data to obtain, for ex-ample, a data vector to be matched which consistsof the mean, sample variance, and ACF or PACF ofthe original data. The difficulty with this approachis lack of independence of the resulting data, whichwill strip λ selection methods of theoretical justifi-cation.

Chaotic systems are also problematic: it’s fairlyobvious that they should never be fit by trajectorymatching - the possibility of entirely spurious fit tosignal and noise is ever present, and yet any chaoticfit is contingent on the ecologically nonsensical no-tion that the system’s parameters really did notchange at all over the course of the data, and thatthere was absolutely no process noise. Happily thevery fragility of a chaotic trajectory match meansthat it’s very difficult to find one - in the chaoticregime the objective function becomes far too night-marish a landscape for the methods employed here,designed as they are for relatively gentle slopes androlling valleys. Pragmatically, this means that atrial step into the chaotic region of parameter spacealmost never leads to decrease in the objective func-tion, and if the minimisation routine does get stuckthere the problem is easily diagnosed. Even withnon-chaotic systems the fitting objective will some-times be too irregular for the methods used here:Appendix C describes an effective approach whenlocal minima are relatively small scale nuisance fea-tures of the objective, but in more severe cases other

techniques, such as simulated annealing (e.g. Presset al. 1992, Brooks and Morgan, 1994) may be theonly way to make progress.

Discrete time cyclic systems can also displayproblems of their own: such systems do not nec-essarily yield smooth changes in frequency as pa-rameters change... a feature which has the advan-tage of acting against phase drift in the data itself,but the disadvantage of promoting local minima inthe fitting objective. Discrete models also display‘aliasing’ effects when the cycle period is not an in-teger multiple of the model time-step: the resultingsmall scale irregularities can also cause local min-ima. Fortunately, both of these difficulties can oftenbe overcome by the use of “bootstrap restarting”during model fitting as described in Appendix C.

In the light of figures 5 and 6 it is important notto overstate the difficulty of fitting cyclic data bythe methods suggested here, and in any case mostdata to be fitted will not be long cyclic series. Neverthe less there are alternative model fitting methodsthat avoid what problems there are. One approachis to use auto-regressive methods (e.g. Caswell andTwombly 1989, Wood 1997). At its most generalthis simply means using the model to predict thenext data point or two on the basis of current (andperhaps past) data points. The model parametersthat do the best job at this are considered to fit thedata best. Continuous models can be dealt withby the simple expedient of smoothing the data andseeing how well the model predicts the smoothedgradients when fed the smoothed data (see Ellneret al. 1997). A disadvantage of simple regres-sion approaches is the need to observe all the statevariables, although more sophisticated approachesavoid this by iterating the original model in orderto statistically build up a dynamically equivalentmodel which predicts observed data solely from pre-viously observed data (e.g. Ellner et al. 1998).A more fundamental problem with using regressiontype approaches with partially specified models isthe difficulty in selecting the flexibility of modelfunctions: it is not hard to think up ad hoc tech-niques, but an efficient objective method with a firmtheoretical basis is not yet available. Also, althoughregression methods are a sensible way of dealingwith process error, they fare badly in the face ofsampling error by focusing on a feature of the datathat is likely to have the lowest signal to samplingerror ratio. To see this, imagine a population whosedynamics are a sine wave corrupted by samplingerror of comparable magnitude to the wave ampli-tude. A regression method would clearly yield very


poor results in this case, while trajectory matchingwould give quite reasonable estimates. Inferenceis also complicated in the regression case, partlybecause both predictor and response variables aresubject to measurement error.

An appealing extension of the partially specifiedmodelling framework would be the direct inclusionof stochasticity into the modelled dynamics. Overshort timescales one approach would be to includeprocess errors directly. Many noise models can becharacterised by a single scale parameter, so that itis possible to generate a set of zero mean pseudo-random variates from the noise model assumingone value for the variance, and simply scale thesevariates in order to obtain a set of variates consis-tent with the same model with a different variance.Interpolating these variates with a smooth curve,yields a realisation of a noise process that can beincorporated into a continuous model (S.P. Ellner,pers. comm.), and the variance of which can becontrolled by a single scale parameter. By averag-ing over a fairly large number of replicates of sucha process the expected population and other usefulmoments of the population’s statistical distributioncan be obtained, given any set of model parameters.Smoothness of the fitting objective is maintained byonly changing the scale of the variates, rather thangenerating new variates for each new set of param-eters in the fitting process. This smoothness shouldallow all the methods reported here to operate cor-rectly.

A further option for modelling stochasticity indynamics is simply to add a perturbation functionof time to the model structure as an unknown tobe estimated, and to treat this as a ‘random ef-fect’ within the fitting framework. In some casesthis results in a straightforward penalized regres-sion problem in which the smoothing parameter tobe estimated is proportional to the reciprocal ofthe variance of associated with the random function(further details of this approach will be publishedelsewhere).

One non-obvious use of the current framework isto simultaneously fit dynamics and separate param-eterization data. This is usually easy to achieve:for example it is always possible to define a statevariable that simply takes the value of a parameterof interest, and the deviation of this state variablefrom independent measurements of the state vari-able is then easily included in the fitting objective.More sophisticated generalisations of this approachwould be useful. Raftery et al. (1995) have at-tempted just such a synthesis of multiple sources

of parameterisation data and population dynamicdata using simple deterministic demographic mod-els for Bowhead whales and it provides clear bene-fits in the context of prediction.

But is model fitting really a good thing at all?A model that gets it right without fitting, purelyon the basis of independent parameter estimates,is always impressive. By contrast fitting is oftenfrowned upon, as if the validity of a model is un-dermined by it. Why is this? Partly it is becausea good ecological model needs to be relatively ro-bust to parameter variation to be taken seriously:if a model can only match reality for parameters ly-ing within some very narrow band then it is prob-ably wrong - the real quantities that the model pa-rameters represent will have varied quite substan-tially over the time period of data collection. Atthe same time the parameter values required tomake the model fit ought to agree with indepen-dent measurement. A model that fits data on thebasis of independently measured parameter valuesobviously meets this last criterion, but must almostalways have met the robustness criterion as well -otherwise it would be most unlikely that a good fitwould have been obtained by simply using the in-dependent point estimates for the parameters. Somodels that fit well without parameter adjustmentare likely to be good, but this should not be takento imply that a model is likely to be wrong if itdoes have to be fitted. Dynamics can alter substan-tially within the region of parameter space that isconsistent with independent parameterisation dataand to reject a model because it did the wrongthing at a single point somewhere in the middleof that region is a nonsense. To really test models,they should be fitted to data, while checking foragreement with independent parameter estimatesand separately checking robustness. The methodspresented here make this feasible, while markedlyincreasing the scope and quality of model fittingbased inference in those applications which provokeno controversy, such as hypothesis testing and es-timation. A Windows package implementing themethods is available free of charge from the author.

ACKNOWLEDGEMENTSRoger Nisbet provided ideas and encourage-

ment over the rather lengthy period that this worktook, as well as test driving the code implementingthe methods, and commenting on the manuscript.Bruce Kendall provided many helpful suggestionson the paper and discussions with him and SteveEllner heavily influenced the discussion section.

26 SIMON N. WOOD

One anonymous referee provided a large number ofhelpful suggestions which improved the introduc-tion and the examples sections; a later referee andF.R. Adler suggested further very helpful improve-ments. Peter Turchin suggested matching ACFs.Interactions with members of the NCEAS complexpopulation dynamics group have been very use-ful: Cheryl Briggs, Steve Ellner, Bruce Kendall,Ed McCauley, Bill Murdoch, Roger Nisbet and Pe-ter Turchin. Ed McCauley and Roger Nisbet pro-vided the Daphnia data and model. Joe Horwoodprovided the initial problem, and other encourage-ment. Thanks to all of them. Parts of this workwere conducted as part of the Complex PopulationDynamics Working Group at the National Centerfor Ecological Analysis and Synthesis at Santa Bar-bara, funded by NSF DEB-94-21535, and the workwas started at the NERC Centre for Population Bi-ology.

LITERATURE CITED

Al-Rabeh, A. 1992. Towards a general integrationalgorithm for time dependent one-dimensionalsystems of parabolic partial-differential equa-tions using the method of lines. Journal of Com-putational and Applied Mathematics 42(2):187-198.

Akaike, H. 1973. Information theory as an exten-sion of the maximum likelihood principle. Pages267-281. in B.N. Petrov and F. Csaki editors.Second International Symposium on Informa-tion Theory. Akademiai Kiado, Budapest.

Asknes, D.L., C.B. Miller, M.D. Ohman and S.N.Wood. 1997. Estimation techniques used instudies of copepod population dynamics - A re-view of underlying assumptions. Sarsia 82:279-296.

Bjørnstad, O.N., M. Begon, N.C. Stenseth, W.Falck, S. M. Sait and D.J. Thompson. 1998.Population dynamics of the Indian meal moth:demographic stochasticity and delayed regula-tory mechanisms. Journal of Animal Ecology67: 110-126.

Bjørnstad, O.N., J.M. Fromentin, N.C. Stenseth,and J. Gjøsæter. 1999. A new test for density-dpendent survival: the case of coastal cod pop-ulations. Ecology 80(4):1278-1288.

Blythe, S.P., R.M. Nisbet and W.S.C. Gurney.1984. The dynamics of population- modelswith distributed maturation periods. Theoret-ical Population Biology 25(3):289-311.

Brooks, S.P. and B.J.T. Morgan. 1994. Auto-matic starting point selection for function opti-mization. Statistics and Computing 4: 173-177.

Burnham, K.P. and D.R. Anderson. 1998.Model Selection and Inference: A Practi-cal Information-Theoretic Approach. Springer-Verlag, New York.

Caswell, H. 1989. Matrix Population Models. Sin-auer, Sunderland. Mass.

Caswell, H. and S. Twombly. 1989. Estimation ofstage-specific demographic parameters for zoo-plankton populations: methods based on stage-classified matrix projection models. Pages 94-107 in L. McDonald, B. Manly, J. Lockwood andJ. Logan, editors. Estimation and analysis of In-sect Populations Springer-Verlag.

Craven, P. and G. Wahba. 1979. Smoothing noisydata with spline functions. Numerische Mathe-matik 31: 377-403.

Davison, A.C. and D.V. Hinkley. 1997. BootstrapMethods and their Application. Cambridge Uni-versity Press, Cambridge.

Efron, B. and Tibshirani. 1993. An Introductionto the Bootstrap. Chapman and Hall, London.

Ellner, S.P., B.E. Kendall, S.N. Wood, E. Mc-Cauley, C.J. Briggs. 1997. Inferring mechanismfrom time-series data: Delay-differential equa-tions. Physica D 110:182-194.

Ellner, S.P., B.A. Bailey, G.V. Bobashev, A.R.Gallant, B.T. Grenfell and D.W. Nychka.1998. Noise and nonlinearity in measles epi-demics: combining mechanistic and statisticalapproaches to population modelling. AmericanNaturalist 151(5):425-440.

Gill, P.E. , G.H. Golub, W. Murray and M.A.Saunders. 1974. Methods for Modifying Ma-trix Factorizations. Mathematics of Computa-tion 28:505-535.

Gill P.E., W. Murray and M.H. Wright. 1981.Practical Optimization. Academic Press, Lon-don.

Green, P.J. and B.W. Silverman. 1994. Nonpara-metric Regression and Generalized Linear Mod-els. Chapman and Hall, London.

Gu. C. and G. Wahba. 1991. MinimizingGCV/GML scores with multiple smoothing pa-rameters via the newton method. SIAM Journalon Scientific and Statistical Computation 12:383-398.


Gurney, W.S.C., R.M. Nisbet and S.P. Blythe.1980. Nicholson’s blowflies revisited. Nature287:17-21.

Gurney, W.S.C., R.M. Nisbet and J.H. Law-ton. 1983. The Systematic formulation oftractable single species population models incor-porating age structure. Journal of Animal Ecol-ogy 52:479-495.

Gurney, W.S.C. and R.M. Nisbet. 1985. Fluc-tuation periodicity, generation separation, andthe expression of larval competition. Theoreti-cal Population Biology 28:150-180.

Gurney, W.S.C. R.M. Nisbet and S.P. Blythe.1986. The systematic formulation of models ofstage structured populations. Pages 474-494 inJ.A.J. Metz and O. Diekmann, editors. TheDynamics of Physiologically Structured Popu-lations. Springer Verlag Berlin.

Gurney, W.S.C. and R.M. Nisbet. 1998. EcologicalDynamics. Oxford University Press.

Hairer, E., S.P. Norsett and G. Wanner.1987. Solving Ordinary Differential EquationsI. Springer-Verlag, Berlin.

Hastie, T.J. and R.J. Tibshirani. 1990. General-ized Additive Models. Chapman and Hall, Lon-don.

Higham, D.J. 1993a. Error control for initial valueproblems with discontinuities and delays. Ap-plied Numerical Analysis 12: 315-330.

. 1993b. The tolerance proportionality ofadaptive ODE solvers. Journal of Computa-tional and Applied Mathematics 45:227-236.

Kendall, B.E., C.J. Briggs, W.W. Murdoch, P.Turchin, S.P. Ellner, E. McCauley, R.M. Nis-bet and S.N. Wood. 1999. Why do populationscycle? A synthesis of statistical and mechanis-tic modeling approaches. Ecology 80(6):1789-1805.

May, R.M. 1974. Stability and Complexity inModel Ecosystems. Princeton University Press.

MacDonald, N. 1978. Time lags in biological mod-els. Springer-Verlag, Berlin.

. 1989. Biological Delay Systems. Cam-bridge University Press.

McCauley, E., R.M. Nisbet, A.M. deRoos, W.W.Murdoch and W.S.C. Gurney. 1996. StructuredPopulation Models of Herbivorous Zooplankton.Ecological Monographs 66:479-501.

McCullagh, P. and J.A. Nelder. 1989. GeneralizedLinear Models. Chapman and Hall, London.

Meyer, S.L. 1975. Data analysis for Scientists andEngineers. John wiley and Sons, New York.

Murata, N., S. Yoshizawa and S. Amari. 1994. Net-work Information Criterion - Determining theNumber of Hidden Units for an Artificial NeuralNetwork Model. IEEE Transactions on NeuralNetworks 5(6):865-871.

Nicholson, A.J. 1954a. Compensatory reactionsof populations to stresses and their evolutionarysignificance. Australian Journal of Zoology 2:1-8.

. 1954b. An outline of the dynamics of ani-mal populations. Australian Journal of Zoology2: 9-65.

Nisbet, R.M. and W.S.C. Gurney. 1983. The sys-tematic formulation of population models for in-sects with dynamically varying instar durations.Theoretical Population Biology 23:114-135.

Nisbet R.M. 1997. Delay-Differential Equations forStructured Populations. Pages 89-118 in S. Tul-japurkar and H. Caswell, editors. Structured-Population Models in Marine, Terrestrial, andFreshwater Systems. Chapman & Hall, NewYork.

Ohman, M.D. and S.N. Wood. 1996. Mortal-ity estimation for planktonic copepods: Pseu-docalanus newmani in a temperate fjord. Lim-nology and Oceanography 41(1):126-135.

Paul, C.A.H. 1992. Developing a delay differentialequation solver. Applied Numerical Mathemat-ics 9:403-414.

Press, W.H., S.A. Teukolsky, W.T. Vetterling andB.P. Flannery. 1992. Numerical Recipes in C.Cambridge University Press, Cambridge.

Raftery, A.E., G.H. Givens and J.E. Zeh. 1995. In-ference from a Deterministic Population Dynam-ics Model for Bowhead Whales. Journal of theAmerican Statistical Association 90:402-416.

Readshaw, J.L. and W.R. Cuff. 1980. A modelfor Nicholson’s blowfly cycles and its relevanceto predation theory. Journal of Animal Ecology49:105-1010.

Silvey, S.D. 1975. Statistical Inference. Chapmanand Hall, London.

Villalobos, M. and G. Wahba. 1987. Inequalityconstrained multivariate smoothing splines with

28 SIMON N. WOOD

application to the estimation of posterior proba-bilities. Journal of the American Statistical As-sociation 82:239-248.

Wahba, G. 1983. Bayesian “Confidence Interval”for the Cross-validated Smoothing Spline. Jour-nal of the Royal Statistical Society B 45:133-150.

. 1990. Spline Models of Observational data.SIAM Philadelphia.

Watkins, D.S. 1991. Fundamentals of Matrix Com-putations. John Wiley and Sons, New York.

Wood S.N. and R.M. Nisbet. 1991. Estimationof Mortality Rates in Stage-Structured Popula-tions. Springer Verlag.

Wood, S.N. and M.B. Thomas. 1999. Super sen-sitivity to structure in biological models. Pro-ceedings of the Royal Society B 266: 565-570.

Wood, S.N. 1994. Obtaining birth and mortal-ity patterns from structured population trajec-tories. Ecological Monographs 64 23-44.

. 1997 Inverse problems and structuredpopulation dynamics. in Structured-PopulationModels. Pages 555-586 in S. Tuljapurkar and H.Caswell, editors. Structured-Population Modelsin Marine, Terrestrial, and Freshwater Systems.Chapman & Hall, New York.

. 2000. Modelling and smoothing parameterestimation with multiple quadratic penalties. inpress Journal of the Royal Statistical Society B.

APPENDIX A:SPLINE FUNCTIONS

A cubic spline, f(x), say, is the smoothest curve(in the sense of minimum integrated square of sec-ond derivative) through a set of n points (xi, pi),say (so pi = f(xi)). There are many alternativeways of representing splines using different sets ofbasis functions, for example, defining parameters a,b and ci’s:

f(x) = a + bx +n

∑

i=1

ci|x− xi|3

is one representation (see, e.g. Wahba 1990), al-though extra conditions are required to ensure itsuniqueness. The representation actually used in theexamples in this paper treats the pi’s as parametersof the spline. Specifically I used:

f(x) = miζ0i(x) + mi+1ζ1i(x) + piψ0i(x)

+pi+1ψ1i(x) xi ≤ x ≤ xi+1

where mi is the second derivative of f at xi (i.e.mi = f ′′(xi)) and the functions are: ζ0i(x) =[(xi+1 − x)3/hi − hi(xi+1 − x)]/6 , ζ1i(x) = [(x −xi)3/hi − hi(x − xi)]/6, ψ0i(x) = (xi+1 − x)/hiand ψ1i(x) = (x − xi)/hi, where hi = xi+1 − xi.m1 and mn are set equal to zero to yield theso called “natural” spline. The remaining mi’s([m2,m3 . . . , mn−1]T = m) are completely deter-mined by the pi’s via the linear equation:

Bm = Hp (that is m = B−1Hp)

where the (n − 2) × (n − 2) matrix B and the(n − 2) × n matrix H have zeroes everywhere ex-cept as follows: Hi,i = 1/hi Hi,i+1 = −(1/hi +1/hi+1) Hi,i+2 = 1/hi+1 Bi,i = (hi +hi+1)/3 1 ≤i ≤ n − 2 and Bi,i+1 = Bi+1,i = hi+1/6 1 ≤ i ≤n− 3 . . . this representation is quite convenient forcomputational purposes, but it is straightforward,if tedious, to demonstrate that it can be re-writtenin the form:

f(x) =n

∑

i=1

piγi(x)

under a suitable (but rather long winded) definitionof the basis functions γi(x). Another convenientfact is that:

∫

[f ′′(x)]2 dx = pT HT B−1Hp

APPENDIX B:CONSTRAINED OPTIMIZATION

This appendix sketches the principles under-pinning constrained optimization with a quadraticmodel. Suppose that we wish to minimise the r.h.s.of (10) subject to the linear equality constraintsAfp = 0, where Af is an (m×r) matrix. To do thiswe need to find a form for p that will ensure thatit never violates the constraints. Suppose that it ispossible to find a matrix Q such that AfQ = T,where T is a matrix whose first r −m columns arezero. This means that if we write the first r − mcolumns of Q in a matrix Z, then AfZ = 0. Hencefor any r −m dimensional vector pz: AfZpz = 0.So writing p = Zpz, ensures that p will never vi-olate the constraints. It turns out that Q is quiteeasy to construct using ‘Householder’ rotations ap-plied to Af (see Watkins, 1991). Substituting forp in (10) yields:

q(pz) ≈ a + hT Zpz +12pT

z ZT GZpz


Differentiating with respect to each element of pz

and setting the results to zero yields the minimum:

pz = (ZT GZ)−1ZT h

Inequality constraints are dealt with iteratively.Start with a parameter vector that violates noneof the constraints and then find the direction inwhich to alter the parameter vector to minimise themodel. If proceeding to this minimum would vio-late an inequality constraint then a step is takento the constraint and it is subsequently treated asan equality constraint with minimisation proceed-ing as indicated above. This process is repeatediteratively, until a minimum can be reached thatviolates no constraints. By this stage several con-straints may be treated as equality constraints andtests have to be made to ensure that they all needto be retained before a minimum can be accepted.

It is always tempting to try and perform con-strained optimization by using penalty functionmethods, which add a penalty to the objective func-tion for violating constraints - such methods aresimple to implement, but tend to spoil the quadraticmodel, particularly in the vicinity of constraintboundaries and also tend to require ad hoc adjust-ment of the strength of the penalties. The methodsused here avoid making problems more non-linearthan they already are.

The practical details of how constrained opti-mization is done efficiently and robustly are quiteinvolved, and the interested reader should consultGill et al. (1981) and references therein.

APPENDIX C:BOOTSTRAP RESTARTING

An often effective approach to dealing with ir-regular objective functions that may have multiplelocal minima is to use bootstrap restarting in con-junction with a minimisation method designed forsmooth objective functions. Given the relationshipbetween bootstrapping and statistical inference, afitting objective based on a bootstrap resample fromthe original data is likely to share the statisticallyimportant features of the original objective func-tion, while differing in details, such as the locationof small scale local minima. Having applied a min-imisation method to the original fitting objectiveq(p) in order to obtain best fit parameter estimatesp, it is sometimes possible to improve these esti-mates by iterating the following simple steps:

1. Sample with replacement from the originaldata y, and use this bootstrap resampled

data, y∗ in place of the original data to con-struct a perturbed objective function q∗(p)(each resampled yi will be accompanied by allassociated information, such as sample time,life history stage etc.). Starting from p ap-ply the minimisation method to q∗ to obtainparameter estimate vector p∗.

2. Starting from p∗, apply the minimisationmethod to the real objective q(p) to obtaina parameter estimate p, if q(p) < q(p) thenset p to p.

The basic idea is that while the global minimum ofthe true and bootstrap objectives will be in muchthe same location, they will have different local min-ima - hence the bootstrap steps can free the op-timization method from local minima of the trueobjective, by moving the parameters away fromthis minimum. The approach is preferable to otherstrategies for randomly jumping out of local min-ima in that the size and direction of the step outof the minima automatically takes account of theshape of the objective. Of course for a smooth wellbehaved objective function, without local minima,the method produces no improvement. Note alsothat general convergence criteria for this approachare difficult to formulate.

partially specified ecological models

Documents