Statistical Genetics and Gaussian Stochastic Processes

Statistical Genetics and Gaussian Stochastic Processes. Part I: Statistical genetic papers employing stochastic process theory: Lange, Kirkpatrick and colleagues, Blangero, Pletcher and colleagues. Part II: Mathematical papers on Gaussian stochastic processes, particularly on the special class known as the Ornstein-Uhlenbeck process: Doob, Chaturvedi, Serfozo, DasGupta.

Upload: vpdiego

Post on 09-Oct-2014


DESCRIPTION

This is a collection of papers on statistical genetic models employing Gaussian stochastic processes and on the mathematical theory of these processes.

TRANSCRIPT

Page 1: Statistical Genetics and Gaussian Stochastic Processes

Statistical Genetics and Gaussian Stochastic Processes

Part I: Statistical genetic papers employing stochastic process theory. Lange, Kirkpatrick and colleagues, Blangero, Pletcher and colleagues.

Part II: Mathematical papers on Gaussian stochastic processes, particularly on the special class known as the Ornstein-Uhlenbeck process.

Doob, Chaturvedi, Serfozo, DasGupta

Page 2: Statistical Genetics and Gaussian Stochastic Processes

American Journal of Medical Genetics 24:483-491 (1986)

Cohabitation, Convergence, and Environmental Covariances

Kenneth Lange

Department of Biomathematics, School of Medicine, University of California, Los Angeles

Temporal variation in traits has long been a central theme in epidemiology. However, human geneticists have largely avoided this topic. Recently, several authors have shown how temporal variation in relative-to-relative covariances can be accommodated within the framework of variance components analysis. The present paper attempts to clarify the mathematics implicit in their approach. A stochastic mechanism is discussed that causes covariances to converge or diverge exponentially fast as relatives cohabit or lead separate lives.

Key words: environmental covariances, temporal variation, stochastic processes

INTRODUCTION

In a recent series of papers, Hopper and Mathews [1982, 1983] and Hopper and Culross [1983] have introduced some valuable new notions into variance components analysis of pedigree data. One of their innovations has been an explicit parameterization of how the trait covariance of two people varies as a function of their cohabitation history. The solution of these authors is surprisingly simple, intuitively appealing, and potentially widely useful. The current paper is an attempt to provide a rigorous foundation for their approach and to clarify its range of validity.

As often happens as science becomes more and more splintered, it is possible to adapt an appropriate mathematical model from another discipline. In this case, physics and communication engineering offer just the right ideas. My purpose here is not to develop new mathematics or statistics but to reinterpret an existing model. This reinterpretation involves viewing the temporal changes contributed by the environment to a quantitative trait as evolving according to an Ornstein-Uhlenbeck diffusion process. When several people are simultaneously followed, the process is both multidimensional and nonhomogeneous in time.

Received for publication June 28, 1985; revision received September 9, 1985

Address reprint requests to Kenneth Lange, Department of Biomathematics, School of Medicine, University of California, Los Angeles, CA 90024.

© 1986 Alan R. Liss, Inc.

Page 3: Statistical Genetics and Gaussian Stochastic Processes


Although the mathematics for such processes is a little esoteric, explicit and intuitively reasonable results emerge. For instance, the environmental covariance between two relatives who separate converges exponentially fast to 0. On the other hand, if they reunite, then it converges exponentially fast to a limiting positive value that depends on their degree of propinquity. In much the same vein, covariances between present and past trait values show an exponential decay to 0, thus raising some interesting possibilities for the modeling and analysis of longitudinal data.

In practice, the Ornstein-Uhlenbeck model should provide more accurate estimates of heritability and more insight into the temporal plasticity of a trait. It is relatively parsimonious in the number of necessary parameters. These parameters can be viewed as covariance components in an overall model that also includes genetic components. The number of parameters will depend on the detail with which the cohabitation histories of pairs of individuals are tracked.

The principal drawback of the model is the extra burden of data acquisition that it imposes. Whether investigators want to carry this burden will depend on the importance of the trait and on their intuitive judgment about the relevance of the model. It should be emphasized that the model is phenomenological and not mechanistic. The physical motivation in the next section offers at most an analogy to how environmental effects could operate. If the model does not accurately capture trait covariances, then it should not be applied. This might be the case in the presence of cultural inheritance, gene-environment interaction, or assortative mating [Cavalli-Sforza and Feldman, 1981; Cloninger et al, 1979; Karlin, 1979; Rao et al, 1979]. However, it is clear to me that the model offers a valuable paradigm for many traits.

In addition to the works of Hopper and Mathews [1982, 1983] and Hopper and Culross [1983], authors like Eaves et al [1978] and Province and Rao [1985] have pursued largely descriptive models of how age affects the correlations between relatives. The soon to be published paper of Eaves et al [1986] apparently presents some interesting parallels to the current paper.

MODEL FORMULATION

The fundamental units of observation for the model are pedigrees. The term pedigree should be interpreted loosely. For example, adopted children can constitute valid members of a pedigree even though they are not related to anyone else in the pedigree. The crucial distinction is that two individuals from different pedigrees always have independent realizations of the trait of interest. Let us now single out a pedigree of n people. Suppose $Z_{tj}$ represents some measurable quantitative trait for the jth person of this pedigree at time t. The actual values for some or all of the people will be observed at only a few specific times t. When the trait is under combined genetic and environmental control, $Z_{tj}$ might be decomposed as the sum

$$Z_{tj} = Y_{tj} + X_{tj},$$

where $Y_{tj}$ is the genetic contribution and $X_{tj}$ the environmental contribution. Typically, $Y_{tj}$ and $X_{tj}$ are also assumed to be uncorrelated. The remainder of this paper is primarily concerned with the random vector $X_t$ of environmental contributions.

One reasonable way of viewing the evolution of $X_t$ is to look at how $X_{t+h}$ differs from $X_t$ for h small and positive. For a diffusion process, one postulates that

Page 4: Statistical Genetics and Gaussian Stochastic Processes


$$h^{-1} E(X_{t+h} - X_t \mid X_t) \tag{1}$$

$$h^{-1} \operatorname{Cov}(X_{t+h} - X_t \mid X_t) \tag{2}$$

tend to limits as $h \to 0$. In these expressions, E denotes expectation and Cov denotes covariance. The vector expectation in (1) and the matrix covariance in (2) are calculated conditional on the current trait values $X_t$. In the physics and engineering literature, it is commonly assumed that the limit of (1) is given by $-\Lambda(t)X_t$, where $\Lambda(t)$ is an $n \times n$ matrix, and the limit of (2) is given by an $n \times n$ covariance matrix $\sigma(t)$. At this level of generality, it is possible to characterize completely the process $X_t$ starting from an initial $X_{t_0}$ at the earliest time $t_0$ [Jazwinski, 1970; Maybeck, 1979, 1982; Van Kampen, 1981]. I will review the formal mathematical results after some motivating remarks. The only case of real interest to us is when $\Lambda(t)$ is diagonal with jth diagonal entry $\lambda_j(t) > 0$. Without this diagonal assumption, many of the formulas that follow would be very formidable to evaluate. A negative value of $\lambda_j(t)$ is inconsistent with the following physical motivation.

In the traditional application, $X_{tj}$ is viewed as the velocity of a particle and $-\lambda_j(t)X_{tj}$ as a frictional or dampening force that tends to slow the particle to a 0 equilibrium velocity. The variance $\sigma_{jj}(t)$ describes the random effect of innumerable collisions with small neighboring particles. Each collision imparts a nonsystematic infinitesimal increment to $X_{tj}$. If $\sigma_{jj}(t) = 0$ and $\lambda_j(t)$ is a positive constant, $X_{tj}$ simply decays exponentially fast to 0. On the other hand, if $\lambda_j(t) = 0$ and $\sigma_{jj}(t)$ is a positive constant, then $X_{tj}$ acts like Brownian motion with no systematic tendency to return to 0 and a variance that grows to $\infty$. Positive constant values for both $\lambda_j(t)$ and $\sigma_{jj}(t)$ produce in the long run a stochastic equilibrium with mean 0 and finite variance.
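To make the stochastic equilibrium concrete, here is a minimal simulation sketch (an illustration added here, not part of the paper): it integrates a scalar Ornstein-Uhlenbeck process by the Euler-Maruyama scheme with constant decay rate $\lambda$ and infinitesimal variance $\sigma^2$, and compares the long-run sample moments against the stationary mean 0 and variance $\sigma^2/(2\lambda)$. All parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma2 = 1.5, 0.8         # decay constant lambda and infinitesimal variance sigma^2
dt, n_steps, n_paths = 0.01, 5000, 2000

x = np.full(n_paths, 3.0)      # start every path far from the 0 equilibrium
for _ in range(n_steps):
    # Euler-Maruyama step for dX = -lam * X dt + sigma dW
    x += -lam * x * dt + np.sqrt(sigma2 * dt) * rng.standard_normal(n_paths)

print("sample mean      :", x.mean())             # ~ 0
print("sample variance  :", x.var())              # ~ sigma2 / (2 lam)
print("stationary value :", sigma2 / (2 * lam))
```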

What is the relevance of this to modeling the environmental contribution $X_t$ for a trait like serum lead level [Hopper and Mathews, 1983]? It might be plausible to assume for some traits that good environments get worse and bad environments get better and that the farther away $X_{tj}$ is from its population mean the stronger the restoring force is. Superimposed on this deterministic trend are many random increments with no systematic tendency. These increments are small enough not to produce abrupt changes in $X_{tj}$ but come often enough to have a large cumulative effect.

This leads to the question of how all n components of $X_t$ evolve in concert. Here is where a deeper understanding of the covariance matrix $\sigma(t)$ becomes crucial. If $\sigma_{jk}(t) = 0$, then the small increments $X_{t+h,j} - X_{tj}$ and $X_{t+h,k} - X_{tk}$ are totally uncorrelated. At the other extreme of $\sigma_{jk}(t) = \sigma_{jj}(t)^{1/2}\sigma_{kk}(t)^{1/2}$, they are perfectly correlated. In general, $\sigma_{jk}(t)$ should capture the current cohabitation status of the pair of individuals j and k. I will expand on this point after summarizing some mathematical results.

SUMMARY OF MATHEMATICAL RESULTS

It can be shown that $X_t$ is a Gaussian process provided that some initial $X_{t_0}$ is Gaussian [Jazwinski, 1970; Maybeck, 1979, 1982; Van Kampen, 1981]. Now a Gaussian or multivariate normal random vector is uniquely determined by its mean and covariance.

Page 5: Statistical Genetics and Gaussian Stochastic Processes


To express these quantities, suppose the $n \times n$ matrix function $Q(t,s)$ furnishes the solutions to the initial value problems:

$$\frac{\partial}{\partial t} Q(t,s) = -\Lambda(t)\, Q(t,s), \qquad Q(s,s) = I \ \text{(identity matrix)}.$$

Then it can be shown that

$$E(X_t) = Q(t,t_0)\, E(X_{t_0}) \tag{3}$$

$$\operatorname{Cov}(X_t, X_t) = Q(t,t_0) \operatorname{Cov}(X_{t_0}, X_{t_0}) Q(t,t_0)^* + \int_{t_0}^{t} Q(t,s)\, \sigma(s)\, Q(t,s)^* \, ds, \tag{4}$$

where the superscript * denotes matrix transpose. Furthermore,

$$\operatorname{Cov}(X_t, X_s) = Q(t,s) \operatorname{Cov}(X_s, X_s) \tag{5}$$

for $t \geq s$. When $\Lambda(t)$ is diagonal, then $Q(t,s)$ is also diagonal with jth diagonal entry

$$\exp\Big[-\int_s^t \lambda_j(u) \, du\Big]. \tag{6}$$

If $\Lambda(t)$ is not diagonal, then $Q(t,s)$ can be very formidable to evaluate. We will assume from now on that

$$E(X_{tj}) = 0$$

for all t and j. Even if this does not hold initially, it will hold asymptotically because of the exponential decay displayed in Eqs 3 and 6. Eq 4 written in components is

$$\operatorname{Cov}(X_{tj}, X_{tk}) = e^{-\int_{t_0}^{t} [\lambda_j(u) + \lambda_k(u)]\,du} \operatorname{Cov}(X_{t_0 j}, X_{t_0 k}) + \int_{t_0}^{t} e^{-\int_s^t [\lambda_j(u) + \lambda_k(u)]\,du}\, \sigma_{jk}(s)\, ds \tag{7}$$

for $Q$ diagonal. The covariance in Eq 7 simplifies considerably when the ratio of $\sigma_{jk}(s)$ to $\lambda_j(s) + \lambda_k(s)$ is a step function. For instance, suppose $t_0 < t_1 < \cdots < t_m = t$ and

$$\frac{\sigma_{jk}(s)}{\lambda_j(s) + \lambda_k(s)} = c_i$$

whenever $t_{i-1} < s < t_i$, where $c_i$ is some constant. Setting

$$d_i = \exp\Big\{-\int_{t_i}^{t} [\lambda_j(u) + \lambda_k(u)] \, du\Big\},$$

Eq 7 becomes

$$\operatorname{Cov}(X_{tj}, X_{tk}) = d_0 \operatorname{Cov}(X_{t_0 j}, X_{t_0 k}) + \sum_{i=1}^{m} c_i (d_i - d_{i-1}). \tag{8}$$

Page 6: Statistical Genetics and Gaussian Stochastic Processes


Observe that $\sigma_{jk}(s)/[\lambda_j(s) + \lambda_k(s)]$ will be a step function whenever $\sigma_{jk}(s)$, $\lambda_j(s)$, and $\lambda_k(s)$ are all step functions. The case $m = 1$ corresponds to this ratio being constant and is of special interest. Then Eq 8 reduces to

$$\operatorname{Cov}(X_{tj}, X_{tk}) = d_0 \operatorname{Cov}(X_{t_0 j}, X_{t_0 k}) + c_1 (1 - d_0). \tag{9}$$

Because of the evident exponential decay in Eq 9,

$$\lim_{t \to \infty} \operatorname{Cov}(X_{tj}, X_{tk}) = c_1. \tag{10}$$

If

$$\operatorname{Cov}(X_{t_0 j}, X_{t_0 k}) = c_1$$

to begin with, then $\operatorname{Cov}(X_{tj}, X_{tk})$ is time invariant. These remarks particularly apply when $j = k$. Finally, observe that Eq 5 yields

$$\operatorname{Cov}(X_{tj}, X_{sk}) = \exp\Big[-\int_s^t \lambda_j(u)\, du\Big] \operatorname{Cov}(X_{sj}, X_{sk})$$

for $t \geq s$, which shows the characteristic exponential decay between present and past trait values.
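As a worked illustration of Eqs 8-10 (added here with invented constants, not from the paper), the sketch below evaluates the piecewise formula for a pair with constant, equal decay rates who cohabit on $(t_0, t_1)$ and then separate: the covariance climbs toward the limiting value $c_1$ while they live together and decays exponentially to 0 afterwards.

```python
import numpy as np

def cov_eq8(cov0, lam_sum, breakpoints, c):
    """Evaluate Eq 8 when sigma_jk/(lam_j + lam_k) is a step function.

    breakpoints: t_0 < t_1 < ... < t_m = t
    c[i]: the constant c_i on the interval (t_{i-1}, t_i)
    lam_sum: lam_j + lam_k, taken constant here for simplicity
    """
    t = breakpoints[-1]
    d = np.exp(-lam_sum * (t - np.asarray(breakpoints)))   # d_0, ..., d_m
    return d[0] * cov0 + np.sum(np.asarray(c) * (d[1:] - d[:-1]))

lam = 1.0                      # lambda_j = lambda_k
c_together = 0.6 / (2 * lam)   # c_i = sigma_jk / (lam_j + lam_k) while cohabiting
for t in (10.0, 11.0, 13.0, 20.0):
    # cohabit on (0, 10), separate on (10, t); initial covariance 0
    print(t, cov_eq8(0.0, 2 * lam, [0.0, 10.0, t], [c_together, 0.0]))
# the covariance approaches c_1 = 0.3 while together, then decays exponentially to 0
```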

SPECIFICATION OF THE COVARIANCE MATRIX

It simplifies matters to replace $\sigma(t)$ by its associated correlation matrix $\rho(t)$ with entries

$$\rho_{jk}(t) = \sigma_{jk}(t) \big[\sigma_{jj}(t)\, \sigma_{kk}(t)\big]^{-1/2}.$$

$\rho(t)$ is a covariance matrix in its own right. The problem now is to specify the current infinitesimal cohabitation status $\rho_{jk}(t)$ of j and k.

Page 7: Statistical Genetics and Gaussian Stochastic Processes


TABLE I. Possible Equivalence Relations

j ~ k if and only if:
1) j = k
2) j and k belong to the same household
3) j and k work or go to school together
4) j and k are genetically identical, eg, MZ twins
5) j and k have the same mother
6) j and k have the same father
7) j and k share both parents

It is hard to imagine circumstances under which $\rho_{jk}(t) \geq 0$ fails. An attractive definition for $\rho_{jk}(t)$ is the fraction of time that j and k spend together averaged over some relatively short time interval around the current time t. In general, this will be difficult to measure exactly, and some less detailed way of summarizing the cohabitation status would be useful.

Here is where the partition structures introduced by Lange and Boehnke [1983] come into play. Let ~ define an equivalence relation on the pedigree. Recall that ~ has three properties: 1) j ~ j for all j, 2) j ~ k implies k ~ j, and 3) j ~ k and k ~ m imply j ~ m. An equivalence relation partitions the pedigree into distinct classes or blocks of equivalent people. Table I lists some examples of equivalence relations.

Some of these relations can be a little ambiguous. For instance, someone might have two jobs or a small child might go to neither school nor work. In the second case, it might be reasonable to classify the child and his caretaker in the same block. In the list of Table I, relations 1-3 appear to be the most pertinent to the amount of time two individuals spend together. Corresponding to the ith of these equivalence relations, there is an obvious correlation matrix $Q_i$ whose entry in row j and column k is 1 or 0 depending on whether j ~ k or not. Just imagine one and the same random variable assigned to each member of a block. Random variables assigned to distinct blocks are independent. It is also true that any convex combination

$$Q = \sum_i r_i Q_i, \qquad r_i \geq 0, \qquad \sum_i r_i = 1,$$

of these matrices is a correlation matrix. This suggests writing

$$\rho_{jk}(t) = r_1 \begin{cases} 1 & j = k \\ 0 & \text{otherwise} \end{cases} \; + \; r_2 \begin{cases} 1 & \text{$j$ and $k$ in same household at time $t$} \\ 0 & \text{otherwise} \end{cases} \; + \; r_3 \begin{cases} 1 & \text{$j$ and $k$ in same school or workplace at time $t$} \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

using equivalence relations 1-3 and nonnegative constants $r_1$, $r_2$, and $r_3$. Eq 11 makes $\rho_{jk}(t)$ an easily summarized step function depending on the joint life history of j and k.
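A small sketch of Eq 11 (hypothetical household and workplace assignments, invented weights): each of equivalence relations 1-3 yields a 0-1 partition matrix, and $\rho(t)$ is their convex combination.

```python
import numpy as np

def partition_matrix(labels):
    """0-1 correlation matrix of the equivalence relation with the given block labels."""
    lab = np.asarray(labels)
    return (lab[:, None] == lab[None, :]).astype(float)

# four individuals with hypothetical block labels at some time t
household = ["h1", "h1", "h2", "h2"]     # relation 2
workplace = ["w1", "w2", "w2", "w3"]     # relation 3
r1, r2, r3 = 0.2, 0.5, 0.3               # nonnegative weights with r1 + r2 + r3 = 1

rho = (r1 * np.eye(4)                    # relation 1: j = k
       + r2 * partition_matrix(household)
       + r3 * partition_matrix(workplace))
print(rho)   # diagonal is 1; off-diagonal entries change as people move between blocks
```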

Page 8: Statistical Genetics and Gaussian Stochastic Processes

Environmental Covariances 489

It is worth noting that not every correlation matrix with nonnegative entries can be represented as a convex combination of 0-1 partition matrices. The reader can check that

$$\begin{pmatrix} 1 & a & \sqrt{1-a^2} \\ a & 1 & 0 \\ \sqrt{1-a^2} & 0 & 1 \end{pmatrix}, \qquad 0 < a < 1,$$

furnishes a counterexample when $n = 3$. Observe that there are five 0-1 partition matrices for $n = 3$.
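The counterexample can also be checked numerically. Assuming SciPy is available, the sketch below poses representability as a linear-programming feasibility problem over the five partition matrices; the solver reports infeasibility because matching the off-diagonal entries forces a negative weight on the identity partition.

```python
import numpy as np
from scipy.optimize import linprog

a = 0.5
target = np.array([[1, a, np.sqrt(1 - a**2)],
                   [a, 1, 0],
                   [np.sqrt(1 - a**2), 0, 1]])

# the five 0-1 partition matrices for n = 3, encoded by block labels
blocks = [[0, 1, 2], [0, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]]
mats = [(np.array(b)[:, None] == np.array(b)[None, :]).astype(float) for b in blocks]

# feasibility LP: weights r >= 0 with sum_i r_i * mats[i] == target entrywise
# (matching the diagonal already forces sum r_i = 1)
iu = np.triu_indices(3)                         # symmetric, so the upper triangle suffices
A_eq = np.array([m[iu] for m in mats]).T        # 6 equations, 5 unknown weights
b_eq = target[iu]

res = linprog(np.zeros(5), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 5)
print("representable as a convex combination?", res.success)   # False for 0 < a < 1
```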

The cohabitation history summarized in $\rho(t)$ puts no restrictions on the magnitude of the individual variances $\sigma_{jj}(t)$. There is flexibility here to allow individuals to react more or less strongly to the same random environmental stresses. $\sigma_{jj}(t)$ might conceivably depend on sex or age. If this is true, then heritability might vary from individual to individual.

EQUILIBRIUM AND INITIAL CONDITIONS

Appropriate stationarity conditions are

$$\operatorname{Var}(X_{tj}) = \frac{\sigma_{jj}(t)}{2\lambda_j(t)}$$

and, when $\sigma_{jk}(t)$, $\lambda_j(t)$, and $\lambda_k(t)$ are constant,

$$\operatorname{Cov}(X_{tj}, X_{tk}) = \frac{\sigma_{jk}}{\lambda_j + \lambda_k}.$$

In many models, $\sigma_{jj}(t)$ will be constant, but $\sigma_{jk}(t)$, $j \neq k$, will not. Thus, variances can remain fixed while covariances change over time.

Long-run behavior gives some clues about appropriate initial distributions. When $\Lambda(t)$ and the diagonal elements of $\sigma(t)$ are constant, the simplest assumptions are stationarity and initial independence for each person j at the time of his birth, say $s_0$. Hence

$$\operatorname{Var}(X_{s_0 j}) = \frac{\sigma_{jj}}{2\lambda_j}, \qquad \operatorname{Cov}(X_{s_0 j}, X_{s_0 k}) = 0, \quad k \neq j.$$

Another possibility is to take $X_{s_0 j} = X_{s_0 k}$ for k the mother of j. A stranger possibility yet is to assume that there is some uterine environment provided by the mother that evolves independently of her own external environment. This would make a child correlated at birth with his sibs but not with his mother. Whatever the precise assumptions about initial conditions, Eq 10 shows that they become less relevant in the presence of large decay constants $\lambda_j(t)$.

DISCUSSION

When it comes to the question of possible parameters, one is confronted with an embarrassment of riches. For instance, $\lambda_j(t)$ and $\sigma_{jj}(t)$ could be constants that depend on sex.

Page 9: Statistical Genetics and Gaussian Stochastic Processes


As another possibility, $\lambda_j(t)$ and $\sigma_{jj}(t)$ could be step functions of age. This will make $\operatorname{Var}(X_{tj})$ vary with age unless $\sigma_{jj}(t)/\lambda_j(t)$ is held constant. If aging makes a person respond more slowly to change, then $\lambda_j(t)$ should diminish with t. Other possible parameters are the coefficients $r_1$, $r_2$, and $r_3$ that enter into Eq 11. Here the constraint $r_1 + r_2 + r_3 = 1$ must be obeyed.

Hypotheses about various parameters can be tested by the likelihood ratio criterion. For example, one might wish to test whether the decay constant $\lambda_j$ is independent of sex. Testing whether a common decay constant $\lambda$ is 0 is not a reasonable procedure. As mentioned above, all variances can tend to $\infty$ in this case.

Longitudinal data afford an excellent means of estimating $\lambda_j(t)$. There is a hazard in doing this, because $X_t$ evolves continuously, and the only way to accommodate measurement error at close times is to force $\lambda_j(t)$ to be large. Perhaps a separate error variance would mitigate this problem. Longitudinal data also provide an excellent means of distinguishing assortative mating from cohabitation. Married couples should converge and separated couples diverge.
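For intuition about estimating a decay constant from longitudinal data, here is a sketch (an added illustration, not the procedure of the paper) that uses the exact Gaussian transition of an Ornstein-Uhlenbeck process with constant parameters, $X_{t+\Delta} \mid X_t \sim N\big(X_t e^{-\lambda \Delta},\ \sigma^2 (1 - e^{-2\lambda\Delta})/(2\lambda)\big)$, and maximizes the resulting log likelihood for a single individual's series. Measurement error is ignored, which is precisely the hazard noted above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, times, series):
    """Exact OU transition likelihood for one individual's longitudinal series."""
    lam, sigma2 = np.exp(params)                 # optimize on the log scale
    dt = np.diff(times)
    mean = series[:-1] * np.exp(-lam * dt)
    var = sigma2 * (1 - np.exp(-2 * lam * dt)) / (2 * lam)
    return -norm.logpdf(series[1:], mean, np.sqrt(var)).sum()

# simulate one individual observed at irregular times (illustrative truth values)
rng = np.random.default_rng(1)
times = np.insert(np.cumsum(rng.uniform(0.5, 2.0, 60)), 0, 0.0)
lam_true, sigma2_true = 0.7, 1.2
x = np.zeros(len(times))
for i, dt in enumerate(np.diff(times)):
    m = x[i] * np.exp(-lam_true * dt)
    v = sigma2_true * (1 - np.exp(-2 * lam_true * dt)) / (2 * lam_true)
    x[i + 1] = rng.normal(m, np.sqrt(v))

fit = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), args=(times, x))
print("estimated lambda, sigma2:", np.exp(fit.x))   # close to 0.7, 1.2
```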

The model lends itself nicely to maximum likelihood methods for estimating parameters [Hopper and Mathews, 1982, 1983; Lange and Boehnke, 1983; Elston and Stewart, 1971; Lange et al, 1976; Moll et al, 1979; Ott, 1979]. Within this context, genetic variance components and mean components can be added to the model. When prime candidates for direct environmental influences are suspected, these can be measured and included as mean components. The environmental and genetic variance components should then accurately capture the residual variation not explained by the mean components. Maximum likelihood techniques afford the opportunity of estimating all these parameters simultaneously. Given parameter estimates, it is then possible to look systematically for outlier pedigrees and outlier individuals and to test empirically the overall appropriateness of the model. Numerical implementation of the model is clearly feasible although harder than for models that are linear in all parameters. Note the relevance of Eq 8 to numerical implementation. Obviously, it will take more experimentation to determine the most useful ways of parameterizing the model. As usual, some balance must be struck among the competing demands of biological realism, model parsimony, and computational feasibility.

ACKNOWLEDGMENTS

Michael Boehnke, Richard Dudley, and Patricia Moll read various drafts of the manuscript and made many helpful suggestions for improving it. This research was supported by University of California, Los Angeles; Massachusetts Institute of Technology; NIH Research Career Development Award K04 HD 00307; and NIH grant AM33329.

REFERENCES

Cavalli-Sforza LL, Feldman MW (1981): "Cultural Transmission and Evolution: A Quantitative Approach." Princeton, New Jersey: Princeton University Press.

Cloninger CR, Rice J, Reich T (1979): Multifactorial inheritance with cultural inheritance and assortative mating. II. A general model of combined polygenic and cultural inheritance. Am J Hum Genet 31:176-198.

Page 10: Statistical Genetics and Gaussian Stochastic Processes


Eaves LJ, Last KA, Young PA, Martin NG (1978): Model-fitting approaches to the analysis of human behavior. Heredity 41:249-320.

Eaves LJ, Long J, Heath AC (1986): A theory of developmental change in quantitative phenotypes applied to cognitive development. Behav Genet 16:143-162.

Elston RC, Stewart J (1971): A general model for the genetic analysis of pedigree data. Hum Hered 21:523-542.

Hopper JL, Culross P (1983): Covariation between family members as a function of cohabitation history. Behav Genet 13:459-471.

Hopper JL, Mathews JD (1982): Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet 46:373-383.

Hopper JL, Mathews JD (1983): Extensions to multivariate normal models for pedigree analysis. II. Modeling the effect of shared environments in the analysis of variation in blood lead levels. Am J Epidemiol 117:344-355.

Jazwinski AH (1970): "Stochastic Processes and Filtering Theory." New York: Academic Press.

Karlin S (1979): Models of multifactorial inheritance: II. The covariance structure for a scalar phenotype under selective assortative mating and sex-dependent symmetric parental-transmission. Theor Pop Biol 15:356-393.

Lange K, Boehnke M (1983): Extensions of pedigree analysis. IV. Covariance components models for multivariate traits. Am J Med Genet 14:513-524.

Lange KL, Westlake J, Spence MA (1976): Extensions to pedigree analysis. III. Variance components by the scoring method. Ann Hum Genet 39:485-491.

Maybeck PS (1979, 1982): “Stochastic Models, Estimation and Control, Vols 1 and 2.” New York: Academic Press.

Moll PP, Powsner R, Sing CF (1979): Analysis of genetic and environmental sources of variation in serum cholesterol in Tecumseh, Michigan. V. Variance components estimated from pedigrees. Ann Hum Genet 42:343-354.

Ott J (1979): Maximum likelihood estimation by counting methods under polygenic and mixed models in human pedigrees. Am J Hum Genet 31:161-175.

Province MA, Rao DC (1985): Path analysis of family resemblance with temporal trends: Applications to height, weight, and Quetelet Index in Northeastern Brazil. Am J Hum Genet 37:178-192.

Rao DC, Morton NE, Cloninger CR (1979): Path analysis under generalized assortative mating. I. Theory. Genet Res 33:175-188.

Van Kampen NG (1981): “Stochastic Processes in Physics and Chemistry.” Amsterdam: North Holland.

Edited by James F. Reynolds

Page 11: Statistical Genetics and Gaussian Stochastic Processes

J. Math. Biol. (1989) 27:429-450

Journal of Mathematical Biology

© Springer-Verlag 1989

A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters

Mark Kirkpatrick 1 and Nancy Heckman 2

1 Department of Zoology, University of Texas, Austin, TX 78712, USA
2 Department of Statistics, University of British Columbia, Vancouver, BC V6T 1W5, Canada

Abstract. Infinite-dimensional characters are those in which the phenotype of an individual is described by a function, rather than by a finite set of measurements. Examples include growth trajectories, morphological shapes, and norms of reaction. Methods are presented here that allow individual phenotypes, population means, and patterns of variance and covariance to be quantified for infinite-dimensional characters. A quantitative-genetic model is developed, and the recursion equation for the evolution of the population mean phenotype of an infinite-dimensional character is derived. The infinite-dimensional method offers three advantages over conventional finite-dimensional methods when applied to this kind of trait: (1) it describes the trait at all points rather than at a finite number of landmarks, (2) it eliminates errors in predicting the evolutionary response to selection made by conventional methods because they neglect the effects of selection on some parts of the trait, and (3) it estimates parameters of interest more efficiently.

Key words: Quantitative genetics - Infinite-dimensional characters - Growth - Morphological shapes - Reaction norms

1. Introduction

Many phenotypic attributes of organisms can be quantified by a single measurement. These include the characters most often studied by quantitative geneticists, such as body weight in animals and crop yield in plants. But other types of characters are intrinsically more complex. One example is the growth trajectory of an organism. A growth trajectory represents an individual as a function that relates the age of an individual to some measure of its size. Since the size of the individual for each different age can be thought of as a different character, and since there are an infinite number of ages, growth trajectories can be thought of as infinite-dimensional characters.

Page 12: Statistical Genetics and Gaussian Stochastic Processes


Two other examples of infinite-dimensional characters are morphological shapes and norms of reaction. A morphological shape is a curve or a surface in space. The complete description of the shape of a clam shell, say, requires information not just on its length and width, but on the spatial locations of each of the infinite number of points on its surface. A reaction norm is the function that describes what phenotype will be produced by a given genotype in each of a number of environments. Examples include the locomotory performance of an ectotherm as a function of its temperature, and crop yield as a function of soil water potential. When the environmental variable can change in a continuous way, as temperature and water potential can, each genotype's reaction norm is a function consisting of an infinite number of points.

Many evolutionary questions implicitly involve understanding how infinite-dimensional characters evolve. Studies of evolutionary allometry (e.g., Thompson 1917; Huxley 1932; Gould 1977) are concerned with the size relations between parts of morphological shapes as they become larger or smaller. A complete theory of allometry would allow one to predict how shapes change as their overall size evolves. Ecologists and physiologists are interested in reaction norms because they describe the tradeoffs inherent in being ecologically specialized or generalized. A theory for the evolution of reaction norms would specify whether increasing adaptation to one range of environmental conditions will result in increased or decreased adaptation to other environments (e.g., Huey and Hertz 1984).

This paper will introduce methods for analyzing the evolution of infinite-dimensional characters. The approach taken is an extension of standard quantitative genetics (see, e.g., Falconer 1981; Bulmer 1985). The models are phenotypic, in that they are based on observable properties of the population but make no explicit reference to the underlying changes in allele frequencies. We first develop the mathematical notation and methods needed to describe and analyse infinite-dimensional characters. These results are then used to derive a model that predicts the evolutionary change in the mean of an infinite-dimensional character. The infinite-dimensional approach developed here is found to have several advantages over conventional quantitative-genetic methods, including a more complete description of the trait, greater accuracy in predicting the evolutionary response to selection, and increased efficiency in estimating genetic parameters. Applications of this model and methods for estimating its parameters will appear in later publications.

2. Mathematical background

The model we develop relies on concepts from functional analysis and stochastic processes. We review here the basic ideas relevant to our model for readers not familiar with those areas; others may wish to skip to Sect. 3. Introductions to these methods can be found in Reed and Simon (1980, Chaps. 1, 2, and 6) and Doob (1953).

Page 13: Statistical Genetics and Gaussian Stochastic Processes


Throughout, we will use growth trajectories as a concrete example to illustrate the ideas, but morphological shapes and reaction norms can be treated in the same framework with appropriate modifications.

2.1. Notation

The growth trajectory of an individual is a function defined by the individual's size through time. We will denote the size of the individual at age x by $z(x)$. The mean growth trajectory, written $\bar z$, is simply defined such that $\bar z(x)$ is the average size of individuals in the population at age x. The mean growth trajectory is directly related to the vector of character means used in standard finite-dimensional quantitative genetics: if we consider a finite number of ages for which $\bar{\mathbf z}$ is the vector of mean sizes, the mean size of individuals at age $x_i$ is $\bar z(x_i)$.

The variation of the growth trajectories of individuals about the mean growth trajectory can be quantified by the covariance function, $\mathcal{P}$. The value of the function $\mathcal{P}(x_1, x_2)$ specifies the covariance between the size of a randomly chosen individual at age $x_1$ and the size of the same individual at age $x_2$. The value of $\mathcal{P}(x_1, x_1)$ is the variance of body size among individuals at age $x_1$. Defined in this way, the function $\mathcal{P}$ is a phenotypic covariance function, as it describes variation and covariation of the growth trajectory phenotypes. A covariance function, which is a bivariate function (that is, a function of two continuous variables), is the analog of the covariance matrix that is widely used in multivariate quantitative genetics and statistics. If $P_{ij}$ is the phenotypic covariance between size at ages $x_i$ and $x_j$, then $P_{ij} = \mathcal{P}(x_i, x_j)$. A more rigorous definition of covariance functions is given in Sect. A.1 of the Appendix.

2.2. Algebraic operations

We begin by reviewing basic definitions regarding the multiplication, transposition, and inversion of functions. We will assume here that the arguments for these functions range over the (possibly infinite) interval from a to b, and that for any univariate function $z$ used the integral $\int_a^b z^2(x)\,dx$ is finite.

2.2.1. Multiplication. The (inner) product of two univariate functions $y$ and $z$ is the scalar

$$y^{\mathrm T} z = \int_a^b y(\xi)\, z(\xi) \, d\xi. \tag{1}$$

Two functions are said to be orthogonal if their inner product is zero. Multiplying a bivariate function $\mathcal{A}$ and a univariate function $z$ produces the univariate function $\mathcal{A}z$, with

$$(\mathcal{A}z)(x) = \int_a^b \mathcal{A}(x, \xi)\, z(\xi) \, d\xi. \tag{2}$$

Page 14: Statistical Genetics and Gaussian Stochastic Processes


2.2.2. Transposition. The transpose of a bivariate function $\mathcal{A}$ is written $\mathcal{A}^{\mathrm T}$ and is defined such that

$$\mathcal{A}^{\mathrm T}(x_1, x_2) = \mathcal{A}(x_2, x_1). \tag{3}$$

We also allow for the transpose of a univariate function, although transposition leaves the values of a univariate function unchanged.

2.2.3. Inversion. The inverse, $\mathcal{A}^{-1}$, of the operation described by Eq. (2) is a rule that associates with each univariate function $\mathcal{A}z$ its pre-image, $z$. That is,

$$\mathcal{A}^{-1} \mathcal{A} z = z \tag{4}$$

for all univariate functions $z$. Not all covariance functions have inverses; those which cannot be inverted are said to be singular. Further discussion of inverses is given in Sect. A3 of the Appendix.
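The operations of Eqs. (1)-(4) are easy to approximate numerically by representing functions by their values on a grid and integrals by weighted sums. A minimal sketch with arbitrary grid and kernel choices:

```python
import numpy as np

# functions on [a, b] = [0, 1] represented by their values on a grid
x = np.linspace(0.0, 1.0, 201)
w = np.full_like(x, x[1] - x[0])   # trapezoid quadrature weights
w[0] *= 0.5; w[-1] *= 0.5

def inner(y, z):
    """Eq (1): scalar inner product of two univariate functions."""
    return np.sum(w * y * z)

def apply_op(A, z):
    """Eq (2): (A z)(x) = integral of A(x, xi) z(xi) d xi."""
    return A @ (w * z)

A = np.exp(-2 * (x[:, None] - x[None, :]) ** 2)   # an arbitrary smooth bivariate function
z = np.sin(np.pi * x)
print(inner(z, z))            # ~ 0.5, since the integral of sin^2(pi x) over [0, 1] is 1/2
print(apply_op(A, z).shape)   # a new univariate function on the same grid
# transposition (Eq (3)) is just A.T; inversion (Eq (4)) corresponds to solving a
# linear system, though for smooth kernels the system is badly conditioned
# (compare the near-singular covariance functions mentioned above)
```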

2.3. Gaussian distributions

A central assumption of quantitative genetics is that characters are Gaussian (i.e., normally) distributed in a population. This assumption can be extended to infinite-dimensional characters in a natural way, and will be used in the following section in the development of a quantitative genetic model.

Let $z_i$ be a set of functions, for example the growth trajectories of individuals in a population, where the subscript i denotes the ith individual in the population. These functions are said to be Gaussian distributed if, when we choose any finite set of ages $x_1, x_2, \ldots, x_k$ and evaluate $z_i$ at those points, the resulting values are distributed as a k-variate normal (Parzen 1962). If the growth trajectories in a population are Gaussian distributed, then the sizes of individuals at any given age will be univariate normally distributed. (The converse, however, is not true: a normal distribution of sizes at each age taken separately is not sufficient to guarantee that the functions maintain a Gaussian distribution.) In an empirical study of an infinite-dimensional trait, the investigator is free to transform the data so that they meet the requirements of normality (see Wright 1968). A Gaussian distribution of functions is completely determined by its mean function (e.g., the mean growth trajectory $\bar z$) and its covariance function (e.g., the phenotypic covariance function $\mathcal{P}$). Section A2 of the Appendix shows that given an arbitrary $\bar z$ and $\mathcal{P}$, a corresponding Gaussian distribution exists with this mean and covariance, provided $\mathcal{P}$ satisfies some very general conditions.
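Because a Gaussian distribution of functions is determined by its mean and covariance functions, trajectories can be sampled at any finite set of ages by evaluating both there and drawing from the resulting multivariate normal. A short sketch with an invented mean trajectory and a smooth covariance kernel:

```python
import numpy as np

ages = np.linspace(0.0, 1.0, 50)

# invented mean growth trajectory and a smooth covariance kernel
zbar = 1.0 + 2.0 * ages
P = np.exp(-2 * (ages[:, None] - ages[None, :]) ** 2)

rng = np.random.default_rng(2)
trajectories = rng.multivariate_normal(zbar, P, size=5)  # five individuals
# by definition, the values at any finite set of ages are multivariate normal;
# each row approximates one individual's growth trajectory z(x) at these ages
print(trajectories.shape)   # (5, 50)
```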

2.4. Eigenfunctions

A covariance function can be decomposed into its component eigenfunctions and eigenvalues just as a covariance matrix can be written in terms of its eigenvectors and eigenvalues. The eigenfunctions of a covariance function $\mathcal{A}$ are defined

Page 15: Statistical Genetics and Gaussian Stochastic Processes


(Parzen 1962) as the functions $\psi_i$ that satisfy the relation $\mathcal{A}\psi_i = \lambda_i \psi_i$; that is,

$$\int_a^b \mathcal{A}(x, \xi)\, \psi_i(\xi) \, d\xi = \lambda_i \psi_i(x), \tag{5}$$

where $\psi_i(\xi)$ is not simultaneously zero for all values of $\xi$. The scalar number $\lambda_i$ is known as the eigenvalue associated with the eigenfunction $\psi_i$. The eigenfunctions, which are orthogonal to each other, play the same role in infinite-dimensional theory that principal components do in standard quantitative genetics and statistics. A useful fact from the spectral theorem of linear operators (Lyusternik and Sobolev 1968) is that a covariance function can be rewritten in terms of an eigenfunction expansion:

$$\mathcal{A}(x_1, x_2) = \sum_{i=1}^{\infty} \lambda_i \psi_i(x_1)\, \psi_i(x_2). \tag{6}$$

Necessary conditions on the covariance function are given in the Appendix (Sect. A3).

2.5. Orthogonal function expansions

It is often very useful for both theoretical and empirical analyses of infinite-dimensional characters to represent functions in terms of their orthogonal function expansions. This method allows infinite-dimensional data to be approximated by a finite number of data, and provides algorithms for performing the algebraic operations described in Sect. 2.2. This is accomplished by: (1) transforming the functions (e.g., mean growth trajectories and covariance functions) into matrices, (2) manipulating these matrices using the conventional methods of linear algebra, and (3) reverse-transforming the resulting matrices into functions.

The method depends on orthogonal functions. A set of functions $\phi_i(x)$ is said to be orthogonal if $\phi_i$ is orthogonal to $\phi_j$ (that is, their inner product is zero) when $i \neq j$. It is convenient to use orthogonal functions that have been normalized so that the inner product of $\phi_i$ with itself is equal to one for all i.

An orthogonal basis is a set of orthogonal functions with the property that any univariate function $z$ (such as a mean function) can be written

$$z(x) = \sum_{i=1}^{\infty} [c_z]_i\, \phi_i(x), \tag{7}$$

where the $[c_z]_i$ are coefficients (see, e.g., Abramowitz and Stegun 1965). The coefficients $[c_z]_i$ are uniquely determined once the function $z(x)$ and the set of orthogonal functions $\phi_i(x)$ have been specified. These coefficients are calculated from the relation

$$[c_z]_i = z^{\mathrm T} \phi_i = \int_a^b z(\xi)\, \phi_i(\xi) \, d\xi. \tag{8}$$

Page 16: Statistical Genetics and Gaussian Stochastic Processes


The $[c_z]_i$'s constitute the entries of a vector $c_z$ that has an infinite number of elements. The vector $c_z$ is referred to as the coefficient vector associated with the function $z$. To repeat, the coefficient vector depends on the choice of orthogonal functions as well as on the function $z$.

A covariance function, which is a function of two variables, can be expanded in a similar way:

$$\mathcal{A}(x_1, x_2) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} [C_A]_{ij}\, \phi_i(x_1)\, \phi_j(x_2). \tag{9}$$

The coefficients $[C_A]_{ij}$ are calculated using the relation

$$[C_A]_{ij} = \phi_i^{\mathrm T} \mathcal{A}\, \phi_j = \int_a^b \int_a^b \phi_i(\xi_1)\, \mathcal{A}(\xi_1, \xi_2)\, \phi_j(\xi_2) \, d\xi_1 \, d\xi_2. \tag{10}$$

The coefficients form the elements of a symmetric matrix $C_A$ which has an infinite number of elements. This is referred to as the coefficient matrix associated with the function $\mathcal{A}$. Notice that the eigenfunction expansion of Eq. (6) is a special case of the orthogonal function expansion (9) in which the off-diagonal coefficients (those for which $i \neq j$) of the coefficient matrix are zero.

These relations are useful because they allow us to perform the algebraic operations described in Sect. 2.2 by working with the vectors and matrices of coefficients. While the full expansion of a univariate or bivariate function generates a vector or matrix of infinite dimensions, the problem is made tractable by truncating the vectors and matrices to finite dimensions. A univariate function can, under quite general conditions, be approximated to arbitrary accuracy by a polynomial or a trigonometric series of finite degree (the Weierstrass and Fourier theorems; Apostol 1975). The functions therefore can be approximated by partial expansions in terms of orthogonal polynomials or trigonometric functions (the first few terms of the righthand sides of Eqs. (7) and (9)), which then generate finite matrices of coefficients that can be handled by conventional matrix methods. The results from this procedure depend not only on the functions themselves, but also on the choice of the family of orthogonal functions used in the expansion. Bounds on the size of error introduced by the choice of orthogonal functions can be determined by analytic techniques.

Several uses of this method are summarized in the following, which can be proven directly using the orthogonality property. In the following, $C_A$ will be the coefficient matrix associated with the covariance function $\mathcal{A}$, $C_B$ with $\mathcal{B}$, and so forth. We will assume that a complete set of orthogonal functions $\phi_i$ have been specified.

2.5.1. Addition and multiplication. The coefficient matrix of the sum of two functions $\mathcal{A}$ and $\mathcal{B}$ is equal to the sum of the coefficient matrix of $\mathcal{A}$ and the matrix of $\mathcal{B}$. Therefore

$$(\mathcal{A} + \mathcal{B})(x_1, x_2) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} [C_A + C_B]_{ij}\, \phi_i(x_1)\, \phi_j(x_2). \tag{11}$$

Page 17: Statistical Genetics and Gaussian Stochastic Processes


Similarly, the coefficient vector for the sum of two univariate functions is equal to the sum of their coefficient vectors. The product of two functions can be determined likewise: the coefficient matrix (or vector) of the product of two functions is equal to the matrix product of their respective coefficient matrices (or vectors).

2.5.2. Eigenfunctions. The coefficient vectors for the eigenfunctions $\psi_i$ of a function $\mathcal{A}$ are equal to the corresponding eigenvectors of the coefficient matrix of $\mathcal{A}$. If we denote the ith element of the jth eigenvector of $C_A$ as $[e_A]_{ij}$, then

$$\psi_j(x) = \sum_{i=1}^{\infty} [e_A]_{ij}\, \phi_i(x). \tag{12}$$

The corresponding eigenvalues $\lambda_i$ of the bivariate function $\mathcal{A}$ and its coefficient matrix $C_A$ are equal.

This result gives a very useful algorithm for finding approximations to the eigenfunctions of a covariance function: (1) the function is approximated by a truncated expansion using Eq. (10) to produce the coefficient matrix $C_A$, (2) the eigenvectors and eigenvalues of the coefficient matrix are determined using standard methods of linear algebra, and (3) the eigenvectors are used in Eq. (12) to produce approximations to the eigenfunctions of the original covariance function.
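The following sketch (an added illustration, not the authors' code) walks through the three steps for the smooth kernel $\exp[-2(x_1 - x_2)^2]$, which reappears as Eq. (20c) in Sect. 3.3: it builds the coefficient matrix in orthonormal shifted Legendre polynomials via Eq. (10), takes its eigenvalues, and reconstructs the leading eigenfunction via Eq. (12).

```python
import numpy as np
from numpy.polynomial import legendre

def shifted_legendre(k, x):
    """Orthonormal Legendre polynomial of degree k on [0, 1]."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(2 * x - 1, c)

# step 1: evaluate the covariance function and build the coefficient matrix, Eq. (10)
x = np.linspace(0.0, 1.0, 400)
dx = x[1] - x[0]
A = np.exp(-2 * (x[:, None] - x[None, :]) ** 2)                 # "true" covariance function
deg = 8
Phi = np.array([shifted_legendre(k, x) for k in range(deg)])    # basis values on the grid
C = (Phi * dx) @ A @ (Phi * dx).T                               # quadrature for Eq. (10)

# step 2: eigenvalues/eigenvectors of the coefficient matrix
lam, vec = np.linalg.eigh(C)
lam, vec = lam[::-1], vec[:, ::-1]                              # sort descending

# step 3: Eq. (12) turns eigenvectors back into eigenfunctions
psi_1 = vec[:, 0] @ Phi                                         # leading eigenfunction values
print("leading eigenvalues:", lam[:3])
```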

This concludes the development of the notation and methods used to describe the evolution of infinite-dimensional characters. We now turn to a model for the evolutionary change in the mean phenotype in a population.

3. A quantitative genetic model

The standard quantitative genetic theory for the simultaneous evolution of multiple characters (Magee 1965; Lande 1979; Falconer 1981) can be extended in a straightforward way to infinite-dimensional characters. Here we develop a dynamic model for the evolution of the mean function. Applications of this model to data will be considered in a later paper.

3.1. Assumptions

A fundamental concept of quantitative genetics is that the phenotype of an individual can be defined as the sum of an additive genetic component and an uncorrelated nonadditive-genetic component. Applying this concept to infinite-dimensional characters, the size of an individual in the population at age x can be written

$$z(x) = g(x) + e(x), \tag{13}$$

where g is the additive genetic component and e the residual nonadditive component of the individual's phenotype. The components g and e are assumed

Page 18: Statistical Genetics and Gaussian Stochastic Processes


to be Gaussian (normally) distributed in the population, consistent with models of polygenic inheritance (Lande 1979; Bulmer 1985). At the outset of a generation (i.e. before selection acts), g is distributed in the population with mean function $\bar g$ and covariance function $\mathcal{G}$, while e is distributed with mean function $\bar e(x) = 0$ for all x and covariance function $\mathcal{E}$. The phenotypes in the population therefore are distributed with a mean growth trajectory $\bar z$ and a phenotypic covariance function $\mathcal{P}$ such that $\mathcal{P}(x_1, x_2) = \mathcal{G}(x_1, x_2) + \mathcal{E}(x_1, x_2)$.

Equation (13) is equivalent to the statistical decomposition of an individual's phenotype that is standard in quantitative genetics. The value of g(x) is the individual's additive genetic effect or "breeding value" for size at age x; that is, its average contribution to the size of its offspring at age x if it were mated at random to a large number of other individuals in the population (Falconer 1981; Bulmer 1985). The value of e(x) is the residual component of the individual's size at age x, caused by the effects of the environment and genetic dominance. By considering an individual's size at each different age as a different character, Eq. (13) is seen to be equivalent to the standard statistical model for multiple quantitative characters (see Lande 1979, p. 405).

The description of an individual's phenotype can be applied to characters other than growth trajectories. Consider a simple morphological shape, such as the outline of an insect's wing. By establishing a landmark in the interior of the wing as the origin for a set of polar coordinates, the shape of the wing can be described by the radial distance from the origin to the wing margin as a function of angle. Angle then assumes the role of age as the argument in Eq. (13). In the case of reaction norms, an environmental variable such as temperature takes that role. While the form of selection acting on the reaction norms will depend on whether the trait is fixed during development or varies continuously throughout life, the patterns of inheritance can be described using the same framework. Thus the inheritance and evolution of growth trajectories, morphology, and reaction norms can be treated using these infinite-dimensional methods.

We will assume that the population size is sufficiently large that the effects of random genetic drift are negligible, that generations are discrete and nonoverlapping, and that within each generation reproduction occurs after selection has finished.

3.2. Evolutionary dynamics of the mean

Selection determines the set of individuals that survive and reproduce to constitute the next generation. Under the standard assumptions of quantitative genetics (specifically, that mutation and recombination do not alter the mean breeding value of a population, and that the additive genetic component is Gaussian distributed), the mean phenotype in the next generation is equal to the mean additive genetic value of the breeding individuals (see Bulmer 1985). This fact is used in the Appendix (Sect. A4) to show that the evolutionary change in the mean phenotype between generation t and generation t + 1 is

Page 19: Statistical Genetics and Gaussian Stochastic Processes


$$\Delta \bar z_t = \bar z_{t+1} - \bar z_t = \mathcal{G} \mathcal{P}^{-1} s_t, \tag{14}$$

where $\bar z_t$ is the mean growth trajectory among individuals at the start of generation t (before selection acts). The function $s_t$ is the selection differential, which is defined (Falconer 1981) as the difference in the mean phenotype of individuals before and after selection within generation t. Denoting by $\bar z_t^*$ the mean growth trajectory of individuals that survive selection and reproduce, the selection differential is

$$s_t(x) = \bar z_t^*(x) - \bar z_t(x). \tag{15}$$

Equation (14) can be thought of as the linear regression of the mean growth trajectory phenotype among the survivors onto the additive genetic value of those phenotypes. It is the infinite-dimensional analog of the standard matrix equation for the joint evolutionary change of a finite number of discrete characters (Magee 1965; Lande 1979; Falconer 1981):

$$\Delta \bar{\mathbf z}_t = G P^{-1} \mathbf s_t, \tag{16}$$

where $\Delta \bar{\mathbf z}_t$ is the vector of change between generations t and t + 1 in the means of the characters, G is the additive genetic covariance matrix, $P^{-1}$ is the inverse of the phenotypic covariance matrix, and $\mathbf s_t$ is the vector of selection differentials for the characters in generation t.
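A toy numerical instance of Eq. (16) with three landmark ages (matrices invented for illustration): selection acts only at the first age, yet the predicted response is nonzero at every age because of the genetic and phenotypic covariances. This also previews the point about correlated responses made next.

```python
import numpy as np

# hypothetical 3-age example of Eq. (16): delta z-bar = G P^{-1} s
G = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])          # additive genetic covariance matrix
E = 0.5 * np.eye(3)                      # residual (nonadditive) covariance
P = G + E                                # phenotypic covariance matrix
s = np.array([0.2, 0.0, 0.0])            # selection differential: selection at age 1 only

beta = np.linalg.solve(P, s)             # finite-dimensional selection gradient P^{-1} s
delta_zbar = G @ beta                    # response; nonzero at all three ages
print("beta        :", beta)
print("delta z-bar :", delta_zbar)
```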

The selection differential does not directly reflect the action of selection on the phenotypes because of phenotypically correlated responses. That is, even when selection is not acting directly on the size at age x, the corresponding selection differential $s_t(x)$ in general will not be zero because selection acting on the phenotypes at other ages will generate a selection differential at x through the phenotypic correlations. It is therefore convenient to use the selection gradient (Lande and Arnold 1983) to visualize the way in which selection is acting. The selection gradient for infinite-dimensional characters is defined as

$$\beta = \mathcal{P}^{-1} s;$$

that is,

$$\beta(x) = \sum_{i=1}^{\infty} (\psi_i^{\mathrm T} s / \lambda_i)\, \psi_i(x), \tag{17}$$

where the $\lambda_i$'s and $\psi_i$'s are the eigenvalues and eigenfunctions of $\mathcal{P}$. We have now dropped the subscript t for convenience. In order for this series to converge we require that $\sum_i (\psi_i^{\mathrm T} s / \lambda_i)^2$ be finite (meaning that the strength of selection is finite). The selection gradient $\beta$ is a measure of the force of directional selection. A positive value for $\beta(x)$ implies selection favors larger than average individuals at age x, whereas a negative value implies smaller individuals are favored. With this notation, the per-generation change in the mean growth trajectory can be written simply as

$$\Delta \bar z = \mathcal{G} \beta, \tag{18}$$

which is clearly analogous to the matrix expression for the finite-dimensional case, $\Delta \bar{\mathbf z} = G\beta$ (Lande and Arnold 1983). Making use of the eigenfunction

Page 20: Statistical Genetics and Gaussian Stochastic Processes


expansion (Eq. (6)), Eq. (18) can also be written:

$$\Delta \bar z = \sum_{i=1}^{\infty} \alpha_i (\varphi_i^{\mathrm T} \beta)\, \varphi_i, \tag{19}$$

where the $\alpha_i$'s and $\varphi_i$'s are the eigenvalues and eigenfunctions of the genetic covariance function $\mathcal{G}$. If the selection gradient has components that correspond to eigenfunctions of $\mathcal{G}$ for which there is very little or no genetic variance (that is, the associated eigenvalue $\alpha_i$ is very small or zero), then the inner product $\varphi_i^{\mathrm T} \beta$ that appears in Eq. (19) will be small or zero. In this situation, there will be little or no evolutionary response towards those particular deformations of the mean growth trajectory.
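A sketch of Eq. (19) by direct discretization (an added illustration): the genetic covariance function is diagonalized numerically, the selection gradient is projected onto the eigenfunctions, and the response is assembled term by term; components along eigenfunctions with negligible eigenvalues contribute essentially nothing.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 300)
dx = x[1] - x[0]
G = np.exp(-2 * (x[:, None] - x[None, :]) ** 2)   # genetic covariance function on a grid

# eigenfunctions/eigenvalues of G by discretizing the integral operator
lam, vec = np.linalg.eigh(G * dx)
lam, vec = lam[::-1], vec[:, ::-1] / np.sqrt(dx)  # descending order; phi_i^T phi_i = 1

beta = np.sin(3 * np.pi * x)                      # a hypothetical selection gradient

# Eq. (19): delta z-bar = sum_i alpha_i (phi_i^T beta) phi_i
proj = (vec.T * dx) @ beta                        # inner products phi_i^T beta
delta_zbar = vec @ (lam * proj)

print("leading eigenvalues:", lam[:3])            # spectrum decays rapidly for smooth G
print("max response:", delta_zbar.max())
```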

These results assume that $\mathcal{P}$ is not singular. The Appendix (Sect. A5) generalizes these equations to cases in which $\mathcal{P}$ is singular (and provides a similar result for the finite-dimensional case in which P is singular).

3.3. Comparison to conventional matrix methods

Geneticists who have studied infinite-dimensional characters have discretized them by taking measurements at landmark points along the continuum. For example, in their study of genetic variation in mouse growth, Riska et al. (1985) measured individuals at 9 different ages in their development. These data were then treated with standard methods designed for analysing a finite number of discrete characters. For example, in order to characterize the genetic variation in the growth trajectory, these workers calculated the eigenvectors and eigenvalues (that is, the principal components and their loadings) for the 9 × 9 genetic covariance matrix they obtained from the data. The results developed above suggest that the growth trajectories might alternatively be analysed by infinite-dimensional methods. The underlying covariance function can be estimated from the data matrix by fitting orthogonal functions to the entries in the matrix. The eigenfunctions and eigenvalues then can be calculated from the estimated covariance function using the methods described earlier.

The infinite-dimensional alternative offers three advantages over the conventional approach. The first is the obvious point that the infinite-dimensional method produces a description at every point along the continuum of the character, interpolating between the landmark points at which the data were actually obtained. This by itself is a minor advantage, since some form of curve-fitting can be used to interpolate between the points of a finite-dimensional analysis.

The second advantage appears when predicting how the mean of a population will evolve in response to selection. If selection is actually acting at all points on an infinite-dimensional character (for example, at all ages of a growth trajectory), the finite-dimensional formula for the evolution of the mean (Eq. (16)) will produce an inaccurate prediction even for the landmark points at which the measurements were taken. The problem is that there will be correlated response from selection at points along the trait that are not included in the analysis.

Page 21: Statistical Genetics and Gaussian Stochastic Processes


This introduces errors for the same reason that selection on characters left out of the analysis of multivariate selection on a finite set of characters does (Lande and Arnold 1983; Turelli 1985). Thus an approach like the one developed here seems necessary. The infinite-dimensional method predicts the evolutionary response to selection by interpolating the parameters of selection and inheritance between the landmark points, taking into account the spacing of those points (e.g., the ages at which growth measurements are taken). This should often result in a more accurate estimate of the response to selection than does the conventional, finite-dimensional approach.

The third advantage of the infinite-dimensional method involves statistical efficiency. Given data on more and more points along the continuum of the character, the finite- and infinite-dimensional analyses will asymptotically converge on the same results. An important question is which method converges more rapidly. While we have not yet completely answered this question, numerical results suggest that the infinite-dimensional method may prove to be substantially better than the finite-dimensional method in this regard. We offer the following numerical example.

Imagine that we want to describe the relative amounts of additive genetic variation associated with the first and second eigenfunctions of the growth trajectories in a population. (This is equivalent to comparing the loadings on the first and second principal components of the genetic covariance matrix.) The relative efficiency of the two methods can be compared with the following exercise. A "true" covariance function is assumed over a finite square interval. Next, a lattice of $n^2$ evenly spaced points is laid down over the square, and the covariance function is evaluated at these points. This produces an $n \times n$ data matrix which we treat as the data that would have been produced by an error-free experiment that measured the character at n evenly-spaced landmark points (for example, n evenly-spaced ages along a growth trajectory). These data then can be analysed by both the finite- and infinite-dimensional methods for comparison.

We have performed this exercise numerically using 4 particular covariance functions:

$$\mathcal{G}(x_1, x_2) = \cos(\pi |x_1 - x_2|) + 1, \tag{20a}$$

$$\mathcal{G}(x_1, x_2) = \operatorname{sech}[3(x_1 - x_2)], \tag{20b}$$

$$\mathcal{G}(x_1, x_2) = \exp[-2(x_1 - x_2)^2], \tag{20c}$$

$$\mathcal{G}(x_1, x_2) = \exp[-2(x_1 - x_2)^4], \tag{20d}$$

where $x_1$ and $x_2$ range from 0 to 1. These functions were evaluated with n = 5, 9, 17, and 33 points. In order to compare the two approaches, we examined the estimates they gave of the first two eigenvalues. For the finite-dimensional analysis, the eigenvalues of the resulting data matrices for each n were calculated using a standard computer algorithm. For the infinite-dimensional analysis, two-dimensional Legendre polynomials were fit to the data matrices to produce an estimate of $\mathcal{G}$, the covariance function. We used polynomials of degree n - 1,

Page 22: Statistical Genetics and Gaussian Stochastic Processes


[Fig. 1. Two largest eigenvalues of the covariance functions of Eqs. (20a-d) plotted against the number of data points, n, as estimated by the finite-dimensional method (open squares) and the infinite-dimensional method (closed squares).]

which allowed the estimate of $\mathcal{G}$ to pass through each of the data points. Eigenvalues were then calculated from the coefficient matrix of the polynomials, as described above (Sect. 2.5.2).

The results are shown in Fig. 1. It is clear that the infinite-dimensional method converges much more rapidly to an asymptotic value, which we presume to be the actual eigenvalue. We analysed several additional cases in which the sample points (ages) $x_i$ are unevenly spaced, using the same covariance functions.

Page 23: Statistical Genetics and Gaussian Stochastic Processes


Those results show an even greater advantage to the infinite-dimensional method. This advantage may result from the fact that the infinite-dimensional method takes into account the spacing of the sample points when the orthogonal functions are fit to the covariance function, whereas the finite-dimensional method ignores not only the spacing but even the ordering of the points. Based on these numerical analyses, we speculate that the infinite-dimensional method may be generally superior to the finite-dimensional method for estimating eigenvalues.

We also compared the two methods with regard to how rapidly they converge on estimates of the eigenfunctions. In contrast to the results from the eigenvalues, the two methods appear to estimate the eigenfunctions with similar efficiency.

These numerical results do not prove the infinite-dimensional method is always or even generally more efficient, but they are suggestive. The success of the method depends upon using a set of orthogonal functions to fit the data that span the space of functions that is spanned by the underlying covariance function. If covariance functions that appear in biological data are reasonably smooth (as seems likely), then smooth orthogonal functions, such as polynomials, may work well. Clearly this question needs further study.

4. Discussion

Having developed this model for the evolution of infinite-dimensional characters, it is now appropriate to assess the limitations of this approach. There are two general categories of questions: those general to Gaussian quantitative-genetic models, and those specific to this particular model.

This model is a direct extension of standard quantitative-genetic models for the evolution of a finite number of characters, and as such it shares their strengths and their weaknesses. Two questions are frequently raised about this class of models. The first concerns the linearity of the response to selection. An important outcome of the assumption that additive genetic effects and nonadditive effects are both normally (Gaussian) distributed is that the magnitude of the evolutionary response is a linear function of the force of selection in the preceding generations (see Eqs. (14) and (18)). The assumption of normality for the distributions of g and e is a major assumption. On the phenotypic level, it is known that the distribution of many univariate characters is either approximately normal or can be rendered so by a suitable transformation of the data (Wright 1968). Although this is consistent with the normality assumptions, it does not show directly that the constituent additive and nonadditive effects are each normal. The models are, however, supported by experimental results from short-term selection experiments on single characters that typically match the model predictions reasonably well (Falconer 1981). A more serious potential problem is the extrapolation of the normality assumption to multiple characters. The assumption of multivariate normality is far more stringent than that of univariate normality, and has not been systematically tested with suitably large data sets.



The normality assumption has been justified on theoretical as well as empirical grounds. Several models based on detailed assumptions concerning the action of mutation, recombination, and selection lead to a normal distribution of additive genetic effects at the level of the character (as opposed to the effects at individual loci) (Fisher 1918; Kimura 1965; Bulmer 1985; Lande 1980; Falconer 1981; Barton and Turelli 1987). Other kinds of genetic models, however, do not, and instead produce a nonlinear response to selection (Robertson 1977; Bulmer 1980; Barton and Turelli 1987). Even when the distribution of allelic effects at individual loci departs from normality, however, approximate normality of the overall additive genetic effects (as we have assumed here) can be maintained if there are a large number of loosely linked loci contributing to the trait (Bulmer 1980; Turelli 1986).

To conclude our discussion of the normality assumption, we note that it has some support from both empirical and theoretical studies, but certainly deserves further evaluation. Even if the normality assumption is violated, however, our model may produce a reasonable approximation to the true evolutionary dynamics.

Confusion has arisen from the criticism that some quantitative-genetic models fail to correctly account for evolution of the genetic variances and covariances. Under certain sets of assumptions, the variance-covariance structure remains approximately constant (Bulmer 1985; Lande 1980). This is not true, however, for quantitative-genetic models based on other genetic assumptions (Barton and Turelli 1987). We choose to avoid this issue, and make no claims or assumptions regarding the dynamics of the genotypic and phenotypic covariance functions $\mathfrak{G}$ and $\mathfrak{P}$. So long as the assumptions of the model are met, the dynamic equations for the evolution of the mean phenotype (Eqs. (14) and (18)) hold. Given empirical information or a model for the evolution of the covariance functions, new values for $\mathfrak{G}$ and $\mathfrak{P}$ can be used in each generation. Even if they do not remain constant, under quite general conditions the variance structure is likely to evolve slowly relative to the mean, and so assuming constancy of $\mathfrak{G}$ and $\mathfrak{P}$ may give reasonably accurate predictions for several to many generations.

A question specific to the infinite-dimensional model that naturally arises is whether it is possible to substitute conventional finite-dimensional methods for those developed in this paper. We find there are three arguments that favor using the infinite-dimensional methods whenever the trait of interest is inherently infinite-dimensional. First, these methods give a complete description of the character at all points along its continuum, rather than at fixed landmark points alone. Second, the infinite-dimensional method leads to a correct prediction of the response to selection where the finite-dimensional method does not, because the latter neglects the effects of selection on all points of the trait other than the landmark points. Third, numerical examples suggest that the infinite-dimensional method may be substantially more efficient in estimating parameters of interest from finite data sets. Since growth, shape, and other infinite-dimensional characters are of such widespread interest to biologists, the development of methods such as these for describing their variation and predicting their evolution is an important goal.



Acknowledgments. We are very grateful to M. Bulmer, J. Felsenstein, C. Pease, S. Sawyer, M. Slatkin, and B. Walsh for discussions. We thank N. Barton, D. Lofsvold, T. Nagylaki, T. Price, M. Turelli, S. Via, M. Wade, and two anonymous reviewers for comments on earlier drafts of the paper. This research was supported by N.S.F. Grant BSR-8604743 to M.K. and N.S.E.R.C. Grant A-7969 to N.H.

Appendix

This Appendix develops five points. Section A1 gives the definition of a covari- ance function. Section A2 discusses the existence and integration of gaussian processes, and Sect. A3 develops the conditions for the existence of the inverse of the operator associated with a covariance function. Section A4 applies these results to our genetic model. Finally, Sect. A5 extends the genetic model to cases in which the phenotypic covariance matrix or function is singular.

The following assumptions will be made throughout. The arguments of all functions (either univariate or bivariate) lie in some interval, which may be infinite (e.g., the positive real line). A univariate function $u$ is called an $L^2$ function if $\int u^2(x)\,dx$ is finite. The function $u$ need not be continuous. Two $L^2$ functions, $u_1$ and $u_2$, are considered to be the same if $\int (u_1(x) - u_2(x))^2\,dx = 0$, which can occur even if $u_1$ does not equal $u_2$ for, say, a finite collection of $x$'s. Thus the value of an $L^2$ function is not uniquely defined for each $x$. In the same way, we say that a sequence of $L^2$ functions, $u_n$, converges to an $L^2$ function $u$ if $\int (u_n(x) - u(x))^2\,dx$ converges to zero. This does not imply that $u_n(x)$ converges to $u(x)$ for all $x$. In what follows, we will differentiate between $L^2$ equivalence/convergence and pointwise equivalence/convergence for each $x$ through the phrases "in $L^2$", "for all $x$", and "pointwise".

A1. Definition of a continuous covariance function

A continuous covariance function $\mathfrak{P}$ is any bivariate function with

$\mathfrak{P}$ symmetric, i.e. $\mathfrak{P}(x_1, x_2) = \mathfrak{P}(x_2, x_1)$ for all $x_1, x_2$,

$$\mathfrak{P} \text{ positive semi-definite, i.e. } u^T \mathfrak{P} u \geq 0 \text{ for all univariate } L^2 \text{ functions } u, \tag{A1}$$

$\mathfrak{P}$ continuous.

Except for the continuity condition, this is analogous to the definition of a covariance matrix (Rao 1973).

A2. Existence and integration of gaussian processes

Let $\bar z$ be an $L^2$ function and $\mathfrak{P}$ a continuous covariance function as defined above. Then there exists a Gaussian process $z$ with covariance function $\mathfrak{P}$ and mean function $\bar z$. (See Doob 1953, Theorem 3.1 of Chap. 2.)



In Sect. A3 we will require that expressions of the form $u^T z = \int u(\xi) z(\xi)\,d\xi$, with $u$ in $L^2$, can be mathematically defined. There are two ways of doing this. In the first method one fixes a "realization", $z^*$, of $z$ and requires that $\int u(\xi) z^*(\xi)\,d\xi$ be finite. This will hold for all $z^*$ (except for a collection of realizations which has a zero probability of occurring) provided

$$\int \mathfrak{P}(\xi, \xi)\,d\xi \text{ is finite.} \tag{A2}$$

(See Doob 1953, Chap. 2, Theorems 2.5 and 2.7.) In the second method, one views $u^T z = \int u(\xi) z(\xi)\,d\xi$ as a random variable which "exists" provided its variance is finite. This will hold for a particular $u$ if

$$u^T \mathfrak{P} u = \iint u(\xi_1)\, \mathfrak{P}(\xi_1, \xi_2)\, u(\xi_2)\,d\xi_1\,d\xi_2 \text{ is finite,}$$

and will hold for all $u$ in $L^2$ provided

$$\iint [\mathfrak{P}(\xi_1, \xi_2)]^2\,d\xi_1\,d\xi_2 \text{ is finite.} \tag{A3}$$

(See Davis 1977, Proposition 2.3.6.)

A3. The inverse of the operator associated with a covariance function

Let $\mathfrak{P}$ be a continuous covariance function and $g$ a univariate function. The goal of this section is to specify conditions on $\mathfrak{P}$ and $g$ that guarantee that the equation

$$\mathfrak{P} u = g \tag{A4}$$

has a solution $u$ in $L^2$, with (A4) holding pointwise. First, sufficient conditions on $\mathfrak{P}$ and $g$ will be given that guarantee that (A4) holds in $L^2$, and then additional conditions will be given that guarantee that (A4) holds pointwise. The solution $u$ will be denoted $\mathfrak{P}^{-1} g$.

Assume that $\mathfrak{P}$ satisfies (A1), (A3), and

$$\text{if } \mathfrak{P} f = \text{the zero function, then } f = \text{the zero function.} \tag{A5}$$

Let $\varphi_i$ be the eigenfunctions of $\mathfrak{P}$, with corresponding eigenvalues $\lambda_i$, as defined in Sect. 2.4. Since by definition an eigenfunction is never the zero function, condition (A5) implies that none of the eigenvalues are zero. This fact alone would, in the finite-dimensional matrix case, guarantee a solution to (A4). More is needed, however, in our infinite-dimensional setting. Conditions (A1), (A3), and (A5) are sufficient to guarantee that the eigenfunctions of $\mathfrak{P}$ form an orthogonal basis for the collection of all $L^2$ functions (by Sect. 3.2 and a modification of the argument in the example on p. 131 of Lyusternik and Sobolev 1968). Thus any $L^2$ function $g$ can be written in terms of the orthogonal eigenfunctions $\varphi_i$:

$$g(x) = \sum_{i=1}^{\infty} [c_g]_i\, \varphi_i(x),$$

with convergence of the series holding in $L^2$. (Note that this is a special case of Eq. (7), with the coefficients $[c_g]_i$ given by Eq. (8).) Given that (A3) holds,

$$\mathfrak{P}(x_1, x_2) = \sum_{i=1}^{\infty} \lambda_i\, \varphi_i(x_1)\, \varphi_i(x_2) \quad \text{in } L^2,$$

where the $\lambda_i$'s are the eigenvalues of $\mathfrak{P}$ as given in Eq. (6). The solution to (A4) is then given by

$$u(x) = \sum_{i=1}^{\infty} ([c_g]_i / \lambda_i)\, \varphi_i(x) \quad \text{in } L^2. \tag{A6}$$

Unfortunately, this sum may not always converge, and we require that

$$\sum_{i=1}^{\infty} ([c_g]_i / \lambda_i)^2 \text{ be finite.} \tag{A7}$$

Under this assumption the solution to (A4) exists and is given by (A6).

For the second part of the argument, suppose that $\mathfrak{P}$ is a continuous covariance function satisfying (A1) and (A3) and that $g$ and $u$ are univariate functions with $\mathfrak{P} u = g$ in $L^2$. We seek conditions on $\mathfrak{P}$ and $g$ so that we can say that the equality holds pointwise. If we assume that $g$ is continuous, then we need only find conditions on $\mathfrak{P}$ so that the function $\mathfrak{P} u$ can be defined pointwise, with this pointwise definition being continuous. Assume that

$$\int [\mathfrak{P}(x, \xi)]^2\,d\xi \text{ is finite for all } x, \tag{A8a}$$

and, for all $x_0$, there exists a constant $C = C(x_0)$ with

$$|\mathfrak{P}(x, y)| \leq |C(x_0)\, \mathfrak{P}(x_0, y)| \tag{A8b}$$

for all $x$ in some neighborhood of $x_0$ and for all $y$. (Conditions (A8a) and (A8b) automatically hold if $\mathfrak{P}$ is continuous and we restrict ourselves to $x$ and $y$ in a closed and bounded interval.) By the Hölder inequality and condition (A8a), $\mathfrak{P}(x, y)\, u(y)$ is an integrable function of $y$, and thus we can define $\mathfrak{P} u$ pointwise in a natural way. Conditions (A8), the continuity of $\mathfrak{P}$, and the dominated convergence theorem guarantee that this natural definition of $\mathfrak{P} u$ is continuous.
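The conditions above are easiest to appreciate numerically. The following sketch (not from the original paper; the kernel, grid, right-hand side, and eigenvalue cutoff are all assumptions chosen for illustration) discretizes a covariance operator on a grid and solves $\mathfrak{P}u = g$ through the eigenfunction expansion (A6), truncating small eigenvalues in the spirit of condition (A7).

import numpy as np

x = np.linspace(0.0, 1.0, 200)                 # grid on the interval
dx = x[1] - x[0]

P = np.exp(-np.abs(x[:, None] - x[None, :]))   # assumed covariance kernel P(x1, x2)
g = np.sin(np.pi * x)                          # assumed right-hand side g(x)

# Eigenpairs of the discretized integral operator (P u)(x) ~ sum_j P(x, x_j) u(x_j) dx
lam, phi = np.linalg.eigh(P * dx)
phi = phi / np.sqrt(dx)                        # normalize so that sum phi_i(x)^2 dx = 1

c = phi.T @ g * dx                             # [c_g]_i = integral of g phi_i dx

# Condition (A7) warns that sum ([c_g]_i / lambda_i)^2 may diverge; truncating
# the smallest eigenvalues keeps the partial sums of (A6) under control.
keep = lam > 1e-6
u = phi[:, keep] @ (c[keep] / lam[keep])       # Eq. (A6), truncated

print("max residual of P u = g:", np.max(np.abs((P * dx) @ u - g)))  # small if the truncation suffices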

A4. The genetic model

Suppose that the phenotype of an individual drawn at random from the entire population is given by the Gaussian random process $z$, with

$$z(x) = g(x) + e(x),$$

where $g$ and $e$ are independent Gaussian processes. As described in Sect. 3.1, $g$ is the additive genetic component and $e$ the residual nonadditive-genetic component of the individual's phenotype. The expectation of $z$ is $\bar z$ (the population mean function), and the expectation of $e$ is the zero function. The covariance functions of $z$, $g$, and $e$ are $\mathfrak{P}$, $\mathfrak{G}$, and $\mathfrak{E}$, respectively, with $\mathfrak{P} = \mathfrak{G} + \mathfrak{E}$. Assume



that these covariance functions satisfy (A1), and in addition that $\mathfrak{G}$ satisfies (A8) and $\mathfrak{P}$ satisfies (A2), (A3), (A5), and (A8).

Let $\bar z^*$ be the average phenotype of an infinite subset of individuals from the population, and let $\bar g^*$ be the average additive genetic component of those individuals. We assume that the selected group of individuals breed randomly among themselves. A standard assumption of quantitative genetics is that the genetic processes of mutation and recombination do not alter the mean genetic component of a population from one generation to the next. The mean genetic component among the offspring therefore is equal to that of the breeding adults. Thus the mean phenotype of the offspring is simply the expected value of $g$ in the selected group when $\bar z^*(x)$ is known for all values of $x$. This expected value will be denoted $E[\bar g^*(x) \mid \bar z^*]$. To prove Eq. (14) we must show that

$$E[\bar g^*(x) \mid \bar z^*] = \bar z(x) + (\bar z^* - \bar z)^T\, \mathfrak{P}^{-1} \mathfrak{G}_x,$$

where $\mathfrak{G}_x(y) = \mathfrak{G}(x, y)$ and $\mathfrak{P}^{-1}$ denotes the inverse operator described in Sect. A3. By an argument in Parzen (1962) Sect. 3,

$$E[\bar g^*(x) \mid \bar z^*] = \bar z(x) + u_x^T(\bar z^* - \bar z), \tag{A9}$$

where $u_x$ is any $L^2$ function with

$$\mathfrak{P} u_x(y) = \mathfrak{G}_x(y) \quad \text{for all } y.$$

Different values of $x$, however, will give rise to different functions $u_x$. For notational convenience, we will drop the subscript from $u_x$ from this point on. Since $\mathfrak{G}$ satisfies (A8), $\mathfrak{G}_x$ can be viewed as a univariate $L^2$ function of $y$ when $x$ is fixed. Thus by the results of Sect. A3, for a given $x$ the required function $u$ exists provided

$$\sum_{i=1}^{\infty} ([c_{\mathfrak{G}_x}]_i / \lambda_i)^2 \text{ is finite,} \tag{A10}$$

where $\lambda_i$ are the eigenvalues of $\mathfrak{P}$ corresponding to the eigenfunctions $\varphi_i$ and

$$[c_{\mathfrak{G}_x}]_i = \int \mathfrak{G}(x, \xi)\, \varphi_i(\xi)\,d\xi.$$

Writing

$$u(y) = \sum_{i=1}^{\infty} ([c_{\mathfrak{G}_x}]_i / \lambda_i)\, \varphi_i(y) = \mathfrak{P}^{-1} \mathfrak{G}_x(y)$$

completes the proof.

Actually, if we view $E[\bar g^*(x) \mid \bar z^*]$ as a random variable, rather than as a particular realization of a random variable, assumption (A2) on $\mathfrak{P}$ can be dropped, and condition (A10) can be weakened slightly to the requirement that

$$\sum_{i=1}^{\infty} [c_{\mathfrak{G}_x}]_i^2 / \lambda_i \text{ be finite.} \tag{A11}$$

This follows since the expression $(\bar z^* - \bar z)^T\, \mathfrak{P}^{-1} \mathfrak{G}_x$, when viewed as a random variable, can be mathematically defined provided its variance is finite. This variance is simply

$$\mathrm{Var}[u^T z] = \mathrm{Var}\!\left[\int u(\xi)\, z(\xi)\,d\xi\right] = \iint u(\xi_1)\, \mathrm{Cov}[z(\xi_1), z(\xi_2)]\, u(\xi_2)\,d\xi_1\,d\xi_2.$$

Using the definition of $u$ and the fact that the $\varphi_i$'s are the eigenfunctions of $\mathfrak{P}$, this variance may be written

$$\mathrm{Var}[u^T z] = \iint u(\xi_1) \sum_{i=1}^{\infty} \mathfrak{P}(\xi_1, \xi_2)\, ([c_{\mathfrak{G}_x}]_i / \lambda_i)\, \varphi_i(\xi_2)\,d\xi_1\,d\xi_2 = \sum_{i=1}^{\infty} [c_{\mathfrak{G}_x}]_i^2 / \lambda_i,$$

which we require to be finite in (A11).

A5. Singular covariance functions and matrices

In the preceding we assumed that the covariance function $\mathfrak{P}$ (or covariance matrix P, in the finite-dimensional case) can be inverted; that is, that $\mathfrak{P}$ is nonsingular. Since there is no mathematical or biological reason to justify this as an assumption, the question naturally arises as to what the evolutionary dynamics are when $\mathfrak{P}$ is singular. We generalize the genetic model to cases in which the phenotypic covariance structure is singular. We will first work with the conventional finite-dimensional case, then sketch the analogous proof for the infinite-dimensional case.

We will establish that when the phenotypic covariance matrix P is singular, the evolutionary change in the vector of means for the traits is given by the equation

$$\Delta \bar z = G P^- s. \tag{A12}$$

This is analogous to Eq. (16), with the difference that the inverse $P^{-1}$ is replaced by $P^-$, a "generalized inverse" of P. A generalized inverse of P is defined (Rao and Mitra 1971; Rao 1973, Sects. 1b.5 and 4a.3) as a matrix which has the property

$$P P^- P = P. \tag{A13}$$

In the special case that P is non-singular, the generalized inverse $P^-$ is unique and equals the proper inverse $P^{-1}$. The generalized inverse is no longer unique whenever P is singular. From the definition (A13) it can be verified that a generalized inverse of P is

$$P^- = \sum_i (1/\lambda_i)\, p_i p_i^T, \tag{A14}$$

where $\lambda_i$ is the ith eigenvalue and $p_i$ the ith eigenvector of P, and the summation extends over only those values of $i$ for which $\lambda_i$ is nonzero. This provides an algorithm for calculating a generalized inverse $P^-$.
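As a concrete illustration of this algorithm (a sketch with made-up numbers, not taken from the paper), the following code builds a generalized inverse of a deliberately singular covariance matrix from its spectral decomposition and checks the defining property (A13).

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0], [3.0, 2.0]])
P = A @ A.T                        # a rank-2, 3 x 3 covariance matrix: singular

lam, vecs = np.linalg.eigh(P)      # eigenvalues lam[i], eigenvectors vecs[:, i]

Pminus = np.zeros_like(P)
for i in range(len(lam)):
    if lam[i] > 1e-10 * lam.max():              # skip the (numerically) zero eigenvalue
        p_i = vecs[:, i]
        Pminus += np.outer(p_i, p_i) / lam[i]   # (1/lambda_i) p_i p_i^T, as in (A14)

print(np.allclose(P @ Pminus @ P, P))           # property (A13) holds: True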

We now verify Eq. (A12). From Eq. (A9) we see that the evolutionary change in the vector of means can be written

$$\Delta \bar z = U(\bar z^* - \bar z), \tag{A15}$$

where U is any matrix that satisfies the relation

$$U P = G. \tag{A16}$$

Thus (A12) is established if it can be shown that

$$U = G P^- \tag{A17}$$

satisfies (A16). The definition of a generalized inverse given by (A13) implies that

$$P^- P = I + R, \tag{A18}$$

where I is the identity matrix and R is a matrix that is orthogonal to P (that is, $PR = 0$). Substituting (A17) into the lefthand side of (A16) and making use of (A18), we see that

$$G P^- P = G(I + R). \tag{A19}$$

The fact that R is orthogonal to P implies that it is also orthogonal to G. This can be seen by letting the vector $R_i$ equal the ith column of R. Recalling that $P = G + E$, we have

$$R_i^T P R_i = R_i^T G R_i + R_i^T E R_i. \tag{A20}$$

The lefthand side of (A20) vanishes because P and R are orthogonal. Since G and E are positive semi-definite, both terms on the righthand side of (A20) must vanish for their sum to vanish. This shows that $R_i$ is orthogonal to G for any value of $i$, and consequently the product $GR$ must vanish. Thus (A19) can be written

$$G P^- P = G, \tag{A21}$$

which shows that (A17) satisfies (A16), and so establishes (A12).

We now sketch the analogous proof for the infinite-dimensional case. The notation and assumptions of Sect. A4 hold with the exception of assumption (A5). Suppose that $\mathfrak{P}$'s eigenfunctions and associated eigenvalues have been reordered and relabelled so that $\varphi_i$ and $\lambda_i$ refer to eigenfunctions and eigenvalues with $\lambda_i$ nonzero, and the $\mathfrak{R}_j$'s denote eigenfunctions associated with zero eigenvalues.



Paralleling (A14), for each $x$ define

$$\mathfrak{P}^- \mathfrak{G}_x(t) = \sum_i (1/\lambda_i)(\varphi_i^T \mathfrak{G}_x)\, \varphi_i(t),$$

with the equality holding in the variable $t$ in $L^2$. Since the $\varphi_i$'s and $\mathfrak{R}_j$'s form a basis for $L^2$, one can expand $\mathfrak{G}_x$ in a series:

$$\mathfrak{G}_x(t) = \sum_i (\varphi_i^T \mathfrak{G}_x)\, \varphi_i(t) + \sum_j (\mathfrak{R}_j^T \mathfrak{G}_x)\, \mathfrak{R}_j(t).$$

Since $\mathfrak{E}$ is non-negative definite and $\mathfrak{P} \mathfrak{R}_j = 0$, by an argument analogous to that following (A20), $\mathfrak{R}_j^T \mathfrak{G}_x = 0$. Thus

$$\mathfrak{P}\, \mathfrak{P}^- \mathfrak{G}_x = \mathfrak{G}_x. \tag{A22}$$

This holds pointwise in $x$ and $t$, by continuity of $\mathfrak{G}$ and $\mathfrak{P}$.

Therefore, Eq. (14) can be generalized to the case of a singular phenotypic covariance function $\mathfrak{P}$ to give

$$\Delta \bar z = \mathfrak{G}\, \mathfrak{P}^- (\bar z^* - \bar z). \tag{A23}$$

Typically one wishes to focus attention on the quantity $\mathfrak{P}^- (\bar z^* - \bar z)$. This is given by

$$\mathfrak{P}^- (\bar z^* - \bar z)(y) = \beta(y) = \sum_i \left(\varphi_i^T (\bar z^* - \bar z) / \lambda_i\right) \varphi_i(y). \tag{A24}$$

For $\beta$ to be defined for a particular $\bar z^* - \bar z$, we require (as in Sect. 3.2) that $\sum_i (\varphi_i^T (\bar z^* - \bar z) / \lambda_i)^2$ be finite.

References

Abramowitz, M., Stegun, I. A.: Handbook of mathematical functions. New York: Dover 1965
Apostol, T. M.: Mathematical analysis, 2nd edn. Reading, Mass.: Addison-Wesley 1975
Barton, N. H., Turelli, M.: Adaptive landscapes, genetic distance and the evolution of quantitative characters. Genet. Res. 49, 157–173 (1987)
Bulmer, M. G.: The mathematical theory of quantitative genetics. Oxford: Oxford University Press 1985
Davis, M. H. A.: Linear estimation and stochastic control. London: Chapman and Hall 1977
Doob, J. L.: Stochastic processes. New York: Wiley 1953
Falconer, D. S.: Introduction to quantitative genetics, 2nd edn. New York: Longman 1981
Fisher, R. A.: The correlation between relatives on the supposition of Mendelian inheritance. Trans. Royal Soc. Edinburgh 52, 399–433 (1918)
Gould, S. J.: Ontogeny and phylogeny. Cambridge, Mass.: Belknap 1977
Huey, R. B., Hertz, P. E.: Is a jack-of-all-temperatures a master of none? Evolution 38, 441–444 (1984)
Huxley, J.: Problems of relative growth. London: MacVeagh 1932
Kimura, M.: A stochastic model concerning the maintenance of genetic variability in quantitative characters. Proc. Nat. Acad. Sci. 54, 731–736 (1965)
Lande, R.: Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33, 402–416 (1979)
Lande, R.: The genetic covariance between characters maintained by pleiotropic mutations. Genetics 94, 203–215 (1980)
Lande, R., Arnold, S. J.: The measurement of selection on correlated characters. Evolution 37, 1210–1226 (1983)
Lyusternik, L. A., Sobolev, V. J.: Elements of functional analysis. New York: Unger 1968
Magee, W. T.: Estimating response to selection. J. Anim. Sci. 24, 242–247 (1965)
Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32, 951–989 (1962)
Rao, C. R., Mitra, S. K.: Generalized inverse of matrices and its applications. New York: Wiley 1971
Rao, C. R.: Linear statistical inference and its applications. New York: Wiley 1973
Reed, M., Simon, B.: Methods of modern mathematical physics: I. Functional analysis, 2nd edn. New York: Academic Press 1980
Riska, B., Atchley, W. R., Rutledge, J. J.: A genetic analysis of targeted growth in mice. Genetics 107, 79–101 (1984)
Robertson, A.: The non-linearity of offspring-parent regression. In: Pollak, E., Kempthorne, O., Bailey, T. B. (eds.) Proceedings of the International Conference on Quantitative Genetics, pp. 297–306. Ames: Iowa State University Press 1977
Thompson, D. W.: On growth and form. Cambridge: Cambridge University Press 1917
Turelli, M.: Effects of pleiotropy on predictions concerning mutation-selection balance for polygenic traits. Genetics 111, 165–195 (1985)
Turelli, M.: Gaussian versus non-gaussian genetic analyses of polygenic mutation-selection balance. In: Karlin, S., Nevo, E. (eds.) Evolutionary processes and theory, pp. 607–628. New York: Academic Press 1986
Wright, S.: Evolution and the genetics of populations, vol. 1. Genetic and biometrical foundations. Chicago: University of Chicago Press 1968

Received January 18, 1988/Revised April 3, 1989


Copyright © 1990 by the Genetics Society of America

Analysis of the Inheritance, Selection and Evolution of Growth Trajectories

Mark Kirkpatrick,* David Lofsvold*,¹ and Michael Bulmer†

*Department of Zoology, University of Texas, Austin, Texas 78712, and †Department of Statistics, Oxford University, Oxford OX1 3TG, England

Manuscript received March 15, 1989

Accepted for publication December 18, 1989

ABSTRACT We present methods for estimating the parameters of inheritance and selection that appear in a quantitative genetic model for the evolution of growth trajectories and other "infinite-dimensional" traits that we recently introduced. Two methods for estimating the additive genetic covariance function are developed, a "full" model that fully fits the data and a "reduced" model that generates a smoothed estimate consistent with the sampling errors in the data. By decomposing the covariance function into its eigenvalues and eigenfunctions, it is possible to identify potential evolutionary changes in the population's mean growth trajectory for which there is (and those for which there is not) genetic variation. Algorithms for estimating these quantities, their confidence intervals, and for testing hypotheses about them are developed. These techniques are illustrated by an analysis of early growth in mice. Compatible methods for estimating the selection gradient function acting on growth trajectories in natural or domesticated populations are presented. We show how the estimates for the additive genetic covariance function and the selection gradient function can be used to predict the evolutionary change in a population's mean growth trajectory.

A predictive theory for the evolutionary response of growth trajectories to selection is an important goal of both evolutionary biologists and breeders. Evolutionary biologists are interested in growth trajectories because of their impact on morphology, size-mediated ecological interactions, and life-history characters (e.g., EBENMAN and PERSSON 1988). Animal and plant breeders are concerned with growth trajectories because of the potential to increase the economic value of domesticated species by altering growth patterns through artificial selection (e.g., FITZHUGH 1976). Since the sizes of individuals of the same age in a population typically vary in a quantitative (continuous) manner, it has long been recognized that quantitative genetics provides appropriate methods for the study of the inheritance and evolution of growth trajectories.

We have recently extended the classical quantitative model for the evolution of multiple characters to "infinite-dimensional" traits such as growth trajectories in which the phenotype of an individual is represented by a continuous function (KIRKPATRICK 1988; KIRKPATRICK and HECKMAN 1989). In those earlier studies, we assumed the parameters of inheritance and selection were known quantities. Our goal in this paper is to develop methods for estimating those parameters and to show how they can be used to

¹ Present address: Department of Biology, Franklin and Marshall College, P.O. Box 3003, Lancaster, Pennsylvania 17604.

The publication costs of this article were partly defrayed by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Genetics 124: 979-993 (April, 1990)

analyze the evolution of a population's mean growth trajectory. While the example we discuss deals with body size, the methods apply to any ontogenetic process. More generally, the infinite-dimensional method can be extended to other kinds of traits in which an individual's phenotype is a continuous function, such as reaction norms and morphological shapes, and so may be of use in a variety of biological contexts. An analysis of several data sets using these methods, and a discussion of the evolutionary implications of the results, is planned for a later publication.

The infinite-dimensional model is motivated by the fact that growth trajectories do not immediately fit into the framework of conventional quantitative genetics, which treats the evolution of a finite number of traits. This is because growth trajectories are continuous functions of time, so that a trait in an individual requires an infinite rather than finite number of measurements to fully describe. The infinite-dimensional model offers several advantages over earlier attempts to adapt quantitative genetics to growth trajectories (KIRKPATRICK and HECKMAN 1989). First, it predicts the evolution of the full growth trajectory (rather than at a set of landmark ages) without making a priori assumptions about the family of curves that are evolutionarily possible. Second, it provides a method for analyzing patterns of genetic variation that reveal potential evolutionary changes in the growth trajectory for which there is and for which there is not substantial genetic variation. Third, the method appears to have reduced biases in the estimates of the genetic variation (and therefore of the



response to selection) when compared with the alternative approaches. Two additional advantages appear from the methods presented in this paper: the spacing of the ages at which the data are collected is correctly accounted for (even when the spacing is uneven), and it allows one to project the evolution of the growth trajectory even when the data on selection and inheritance are collected at two different sets of ages.

We will begin with a brief review of the infinite-dimensional model, then turn to the problem of estimating the parameters of inheritance. To make the ideas concrete, we will illustrate the genetic methods using a subset of the data of RISKA, ATCHLEY and RUTLEDGE (1984) on the genetics of growth in ICR randombred mice. In a detailed study, these workers measured 2693 individuals at weekly intervals between ages 2 weeks and 10 weeks in a half-sib breeding design. For the sake of simplicity, we will use only their data on male body weight at ages 2, 3 and 4 weeks in the following. Next the estimation of the parameters of selection is treated. Last, we show how the estimates of the genetic and selection parameters can be used to project the evolution of the population's mean growth trajectory.

Some of the statistical methods developed in this paper can involve a substantial amount of computation. Computer programs for these operations are available from the first author.

THE INFINITE-DIMENSIONAL MODEL

The mean size of unselected individuals in a cohort through time is referred to as the cohort's mean growth trajectory and is denoted by the function $\bar y$. Thus the value of $\bar y(a)$ is simply the expected size of individuals at age $a$ in the absence of selection. Selection within a given generation generally will cause the observed mean size of individuals to differ from the mean growth trajectory and also will produce an evolutionary change in the mean growth trajectory between that generation and the next.

The evolutionary change in $\bar y$ can be determined by extending the standard theory of quantitative genetics to infinite-dimensional characters (KIRKPATRICK and HECKMAN 1989). The growth trajectory of an individual can be thought of as the sum of two continuous functions. The first of these represents the additive genetic component of the growth trajectory inherited from the individual's parents. The second component is attributable to environmental effects, such as nutrition, and to genetic dominance. The additive and nonadditive components are defined to be independent of each other and are assumed to be multivariate normally distributed in the population. This assumption is standard in quantitative genetic models of multiple characters. The normality of genetic effects is consistent with a variety of forms of genetic variation at the individual loci involved provided the number of loci is moderate to large and linkage is loose (BULMER 1985, Chap. 8; BARTON and TURELLI 1989). When genetic effects are not normal it may be possible to transform the scale of measurement to one in which they are (for example, by taking logarithms) (WRIGHT 1968, Chap. 10; FALCONER 1981, Chap. 17). Last, we assume that the growth trajectory is autosomally inherited, that the effects of random genetic drift, mutation, epistasis, and recombination on the mean growth trajectory are negligible compared with selection, and that generations are nonoverlapping.

When selection acts on the sizes of individuals, the evolutionary dynamics of the mean growth trajectory are described by the equation

$$\Delta \bar y(a) = \int_{a_{\min}}^{a_{\max}} \mathcal{G}(a, x)\, \beta(x)\,dx, \tag{1}$$

where $\Delta \bar y(a)$ is the evolutionary change in the mean size of individuals of age $a$ following a single generation of selection, $\mathcal{G}$ is the additive genetic covariance function, and $\beta$ is the selection gradient function (KIRKPATRICK and HECKMAN 1989). Equation 1 can be modified to accommodate situations in which selection acts directly on growth rate rather than size per se; see LYNCH and ARNOLD (1988).
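For readers who want to see Equation 1 in action, here is a minimal numerical sketch; the covariance function and selection gradient used below are invented placeholders, and the integral is approximated by the trapezoid rule.

import numpy as np

ages = np.linspace(2.0, 4.0, 101)      # the age range, in weeks

def G(a1, a2):                         # placeholder additive genetic covariance function
    return 800.0 * np.exp(-(a1 - a2) ** 2)

def beta(a):                           # placeholder selection gradient function
    return 0.001 * (a - 3.0)

# Eq. (1): Delta ybar(a) = integral of G(a, x) beta(x) dx over the age range
dy = np.array([np.trapz(G(a, ages) * beta(ages), ages) for a in ages])
print(np.round(dy[::25], 4))           # predicted change at ages 2, 2.5, 3, 3.5, 4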

The additive genetic covariance function $\mathcal{G}$ plays the same role in the evolution of growth trajectories that the additive genetic covariance matrix does in the standard theory of quantitative characters (see LANDE 1979). The value of $\mathcal{G}(a_1, a_2)$ is the additive genetic covariance for size between individuals measured at age $a_1$ and those same individuals measured at age $a_2$. The selection gradient function $\beta$ is a measure of the forces of directional selection acting on body size (LANDE and ARNOLD 1983). The magnitude of $\beta(a)$ reflects the strength of directional selection acting on body size at age $a$. A negative value of $\beta(a)$ indicates selection favors smaller size, while a positive value indicates the converse.

Equation 1 predicts the evolutionary change across only a single generation. In general, it is possible that both the strength of selection and the genetic variation will change from generation to generation. This does not present a problem for Equation 1, however, since new values can be used in each generation. This information can come either from direct estimation of the parameters or from genetic and ecological models that predict how they will change through time. We discuss methods for direct estimation below; theoretical approaches are reviewed by BARTON and TURELLI (1989) and BULMER (1989).

Predicting the evolutionary dynamics of the mean growth trajectory thus requires estimating the parameters of inheritance, described by $\mathcal{G}$, and of selection, described by $\beta$. In the next three sections, we discuss



estimation of $\mathcal{G}$, the analysis of $\mathcal{G}$, and the estimation of $\beta$. Before proceeding, we pause here to describe the notation conventions used throughout the paper. Continuous functions, such as the mean growth trajectory $\bar y$ and the additive genetic covariance function $\mathcal{G}$, are denoted with a script font. Vectors and matrices are written in bold. We use a hat or a tilde to signify estimates of quantities; for example, the estimate of an additive genetic covariance matrix is written Ĝ.

ESTIMATING THE ADDITIVE GENETIC COVARIANCE FUNCTION $\mathcal{G}$

To estimate the additive genetic covariance function we begin with the additive genetic covariance matrix G familiar from standard quantitative genetics. The sizes of an individual at two ages $a_i$ and $a_j$ are considered to be two different characters, and the value of $G_{ij}$ is equal to the additive genetic covariance for the sizes of an individual at those two ages. Methods for estimating genetic variances and covariances of multiple characters have been extensively developed by animal and plant breeders (FALCONER 1981; BULMER 1985), and have more recently been applied to natural populations by evolutionary biologists (e.g., ARNOLD 1981; PRICE, GRANT and BOAG 1984; LOFSVOLD 1986). Given measurements of size at $n$ ages, an $n \times n$ estimated additive genetic covariance matrix Ĝ can be calculated. We refer to the vector of $n$ ages at which these measurements have been taken as the age vector, denoted a.

The entries in the matrix Ĝ provide direct estimates of the additive genetic covariance function $\mathcal{G}$ at $n^2$ points, since $\hat G_{ij} = \hat{\mathcal{G}}(a_i, a_j)$. The relationship between the covariance matrix Ĝ and the covariance function $\mathcal{G}$ is illustrated in Figures 1 and 2. The values of $\mathcal{G}$ between the measured ages can be estimated by interpolation using smooth curves. By using smooth curves, we make the implicit assumption that the genetic variances and covariances do not change in a discontinuous fashion. (Our method can be modified to accommodate discontinuities produced, for example, by metamorphosis by dividing the growth trajectory into pre- and post-metamorphosis periods, and determining the covariances within and between the two periods.)

A variety of techniques could be used to estimate a continuous covariance function $\mathcal{G}$ from an observed covariance matrix Ĝ. We have chosen to use a family of methods that involve fitting orthogonal functions to the data. The motivation for using this approach for fitting smooth functions to the data rather than some other (such as splines) is that the coefficients derived from fitting orthogonal functions are very useful for analyzing patterns of genetic variation in the growth trajectory, as we describe below.

A pair of functions $\phi_i$ and $\phi_j$ are said to be normalized and orthogonal over the interval $[u, v]$ if

$$\int_u^v \phi_i(x)\, \phi_j(x)\,dx = 0 \quad\text{and}\quad \int_u^v \phi_i^2(x)\,dx = 1.$$

Many families of functions that meet these criteria are available. We will analyze the mouse data using the well-studied Legendre polynomials. The choice of which family of orthogonal functions to use does not affect the estimates for the covariance function at the ages at which the data were taken (the points in Ĝ). The choice does, however, affect the interpolation and therefore can affect conclusions regarding ages other than those at which the data were collected. (All families of orthogonal polynomials, however, will produce the same estimate for $\mathcal{G}$ if the maximum degree of the polynomials is held constant.) We favor polynomials over series of sines and cosines (Fourier functions), for example, because on biological grounds we expect a covariance function for growth to be relatively smooth rather than oscillatory. In any event, the element of arbitrariness introduced by the choice of orthogonal functions decreases as the number of ages at which data were sampled increases.

The jth normalized Legendre polynomial, $\phi_j$, is given by the formula

$$\phi_j(x) = \sqrt{\frac{2j+1}{2}}\; \frac{1}{2^j} \sum_{k=0}^{[j/2]} (-1)^k \binom{j}{k} \binom{2j-2k}{j}\, x^{\,j-2k}, \tag{2}$$

where $[\,\cdot\,]$ indicates that fractional values are rounded down to the nearest integer (BEYER 1976, p. 439). These polynomials are defined over the interval $[-1, 1]$, and so $u = -1$ and $v = 1$. From Equation 2, we find that the first three polynomials are:

$$\phi_0(x) = 1/\sqrt{2}, \tag{3a}$$

$$\phi_1(x) = \sqrt{3/2}\; x, \tag{3b}$$

and

$$\phi_2(x) = \sqrt{5/2}\; (3x^2 - 1)/2. \tag{3c}$$
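Equations 2 and 3a–c can be checked with a few lines of code. The sketch below (an illustration, not part of the original text) rescales numpy's Legendre polynomials to unit norm on $[-1, 1]$ and evaluates them at the adjusted ages $-1$, $0$, and $1$ used for the mouse data.

import numpy as np
from numpy.polynomial import legendre

def phi(j, x):
    # Standard Legendre P_j rescaled by sqrt((2j + 1)/2) so that the integral
    # of phi_j^2 over [-1, 1] is one, matching Eqs. (2) and (3a-c).
    c = np.zeros(j + 1)
    c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

xs = np.array([-1.0, 0.0, 1.0])        # the adjusted ages for the mouse data
for j in range(3):
    print(j, np.round(phi(j, xs), 4))
# expected: 0 [ 0.7071  0.7071  0.7071]
#           1 [-1.2247  0.      1.2247]
#           2 [ 1.5811 -0.7906  1.5811]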

The additive genetic covariance function $\mathcal{G}$ can be approximated to any specified degree of precision using a complete set of orthogonal functions such as Legendre polynomials (COURANT and HILBERT 1953, p. 65). In this form, the covariance between body size at ages $a_1$ and $a_2$ is

$$\mathcal{G}(a_1, a_2) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} [C_{\mathcal{G}}]_{ij}\, \phi_i(a_1^*)\, \phi_j(a_2^*), \tag{4}$$

where

$$a_i^* = \frac{2a_i - a_{\min} - a_{\max}}{a_{\max} - a_{\min}}, \tag{5}$$

and $a_{\min}$ and $a_{\max}$ are respectively the first (smallest) and last (largest) elements of the age vector. The adjusted age vector a*, calculated from the age vector a using (5), rescales the ages at which the data were taken to the range of the orthogonal functions. In the case of the mouse data, the age vector is $\mathbf{a} = [2, 3, 4]^T$. Thus $a_{\min} = 2$ and $a_{\max} = 4$, and so the adjusted age vector is $\mathbf{a}^* = [-1, 0, 1]^T$.

The matrix $C_{\mathcal{G}}$ in Equation 4 is the coefficient matrix associated with the covariance function $\mathcal{G}$. Its elements are constants that depend both on $\mathcal{G}$ and on the family of orthogonal functions $\phi$ being used (Legendre polynomials, in this example). The full expansion of Equation 4 involves an infinitely large coefficient matrix which can only be estimated with an infinite amount of data. Given measurements on the sizes of individuals at $n$ ages, however, an $n \times n$ truncated version of $C_{\mathcal{G}}$ can be estimated. We previously found that using the truncated estimate $\hat C_{\mathcal{G}}$ consisting of relatively few dimensions often produces a good approximation (KIRKPATRICK and HECKMAN 1989), and this is our present goal.

We have developed two methods for estimating the coefficient matrix $C_{\mathcal{G}}$. These correspond to two different ways to estimate the additive genetic covariance function $\mathcal{G}$. The first method yields what we refer to as a "full" estimate of $\mathcal{G}$. This approach estimates the coefficient matrix in such a way that the corresponding covariance function exactly reproduces the estimated additive genetic variances and covariances at the ages that were measured (that is, Ĝ). Our second method produces a "reduced" estimate of $\mathcal{G}$. The motivation for this approach is the fact that any estimate of G includes sampling error. Fitting a function through every point in Ĝ causes the sampling error to be included in the full estimate of $\mathcal{G}$. This noise makes the full estimate of $\mathcal{G}$ somewhat less smooth than the actual covariance function is. The reduced method finds a smoother and simpler estimate of $\mathcal{G}$ using information about the sampling error of Ĝ: the reduced estimate is the lowest-order polynomial that is statistically consistent with the data. A drawback of this method is that it excludes higher-order terms from the estimate of $\mathcal{G}$ even when they actually exist if the experiment is not sufficiently powerful to prove their presence. Because of this, we recommend investigators consider both the full and reduced estimates of $\mathcal{G}$.

The full estimate of $\mathcal{G}$: The full estimate of the additive genetic covariance function, denoted $\hat{\mathcal{G}}$, is found by calculating the coefficient matrix $\hat C_{\mathcal{G}}$ whose corresponding covariance function exactly reproduces the estimated additive genetic covariance matrix Ĝ. We can write the observed covariance matrix in terms of the orthogonal functions using Equation 4:

$$\hat G = \Phi\, \hat C_{\mathcal{G}}\, \Phi^T, \tag{6}$$

where the matrix $\Phi$ is defined such that $[\Phi]_{ij} = \phi_j(a_i^*)$. The matrix $\hat C_{\mathcal{G}}$ is the estimate of the coefficient matrix appearing in Equation 4. It is truncated to dimensions $n \times n$ by the finite number ($n$) of ages represented in the data matrix Ĝ. Rearranging Equation 6, we find an expression that can be used to calculate the estimated coefficient matrix:

$$\hat C_{\mathcal{G}} = \Phi^{-1}\, \hat G\, [\Phi^T]^{-1}. \tag{7}$$

The matrix $\hat C_{\mathcal{G}}$ obtained from this calculation can be substituted into Equation 4 to give a continuous estimate of the covariance function $\mathcal{G}$ for all ages between the earliest and latest at which the data were taken.

To illustrate, the study of RISKA et al. produced an estimate for the additive genetic covariance matrix of the log of male body weight at 2, 3 and 4 weeks:

$$\hat G = \begin{bmatrix} 436.0 & 522.3 & 424.2 \\ 522.3 & 808.0 & 664.7 \\ 424.2 & 664.7 & 558.0 \end{bmatrix}.$$

The elements of $\Phi$ are calculated by evaluating the first three Legendre polynomials (Equations 3a–c) at the three points of the adjusted age vector a*:

$$\Phi = \begin{bmatrix} \phi_0(-1) & \phi_1(-1) & \phi_2(-1) \\ \phi_0(0) & \phi_1(0) & \phi_2(0) \\ \phi_0(1) & \phi_1(1) & \phi_2(1) \end{bmatrix} = \begin{bmatrix} 0.7071 & -1.2247 & 1.5811 \\ 0.7071 & 0 & -0.7906 \\ 0.7071 & 1.2247 & 1.5811 \end{bmatrix}.$$

The full estimate of the additive genetic covariance function, $\hat{\mathcal{G}}$, is found by plugging these matrices into Equation 7 to obtain $\hat C_{\mathcal{G}}$:

$$\hat C_{\mathcal{G}} = \begin{bmatrix} 1348.2 & 66.5 & -112.0 \\ 66.5 & 24.3 & -14.0 \\ -112.0 & -14.0 & 14.5 \end{bmatrix}.$$

Finally, the full estimate of $\mathcal{G}$ is obtained by substituting $\hat C_{\mathcal{G}}$ into Equation 4. This gives

$$\hat{\mathcal{G}}(a_1, a_2) = 808 + 71.2(a_1^* + a_2^*) + 36.4\, a_1^* a_2^* - 40.7(a_1^{*2} a_2^* + a_1^* a_2^{*2}) - 215.0(a_1^{*2} + a_2^{*2}) + 81.6\, a_1^{*2} a_2^{*2},$$

which is valid for ages between $a = 2$ and $a = 4$. The result can be verified by checking that indeed $\hat G_{ij} = \hat{\mathcal{G}}(a_i, a_j)$. The full estimate of the additive covariance function for the mouse data calculated in this way is shown in Figure 2.
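The worked example above can be reproduced numerically. The following sketch (an illustration using the published matrix; the helper phi is the normalized Legendre polynomial of Equations 2–3) forms $\Phi$, applies Equation 7, and confirms that Equation 6 recovers Ĝ exactly.

import numpy as np
from numpy.polynomial import legendre

def phi(j, x):                         # normalized Legendre polynomial (Eq. 2)
    c = np.zeros(j + 1)
    c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

a_star = np.array([-1.0, 0.0, 1.0])    # adjusted age vector for the mouse data
Phi = np.column_stack([phi(j, a_star) for j in range(3)])

G = np.array([[436.0, 522.3, 424.2],   # published estimate of the G matrix
              [522.3, 808.0, 664.7],
              [424.2, 664.7, 558.0]])

C_G = np.linalg.inv(Phi) @ G @ np.linalg.inv(Phi.T)   # Eq. (7)
print(np.round(C_G, 1))                # close to the printed matrix; tiny differences reflect rounding
print(np.allclose(Phi @ C_G @ Phi.T, G))              # Eq. (6) is reproduced: True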

The reduced estimate of $\mathcal{G}$: Our second approach, that of finding a reduced estimate for $\mathcal{G}$, seeks to fit a set of $k$ orthogonal functions to Ĝ, where $k < n$. We denote a reduced estimate of $\mathcal{G}$ as $\tilde{\mathcal{G}}$ and the corresponding reduced estimate of the coefficient matrix as $\tilde C_{\mathcal{G}}$. The method, which is described in detail in APPENDIX A, consists of two steps. First, a candidate estimate of $\mathcal{G}$ is constructed using weighted least squares to fit the simplest possible orthogonal function, that in which $\tilde{\mathcal{G}}$ is constant for all ages. Second, this candidate estimate is tested for statistical consistency with Ĝ. To perform this test we have developed a procedure that produces an approximate $\chi^2$ statistic for the goodness of fit of the reduced estimate to Ĝ. If this test shows that $\tilde{\mathcal{G}}$ is consistent with (that is, it does not differ significantly from) Ĝ, then it is accepted. If $\tilde{\mathcal{G}}$ differs significantly from Ĝ, we then consider a more complex reduced estimate by fitting the first two orthogonal functions to the data. The fit is again tested using the $\chi^2$ test. The procedure is iterated with successively more orthogonal functions until reduced estimates $\tilde C_{\mathcal{G}}$ and $\tilde{\mathcal{G}}$ are obtained that are consistent with Ĝ. If no simpler combination of orthogonal functions will successfully fit the data, the full estimate consisting of $n$ orthogonal functions will always fit the data perfectly.

Using this method on the mouse data (see APPENDIX A), we find that the least-squares estimate for $\tilde{\mathcal{G}}$ that consists of the first Legendre polynomial, $\phi_0$, alone is $\tilde{\mathcal{G}}(a_1, a_2) = 324$. This estimate is rejected because the test statistic $\chi^2 = 57.3$ with 5 degrees of freedom shows the estimate is inconsistent with the data ($P \ll 0.01$). The least squares estimate of $\mathcal{G}$ produced by the first two Legendre polynomials (a constant and a linear term) is

$$\tilde{\mathcal{G}}(a_1, a_2) = 312.2 - 11.9(a_1^* + a_2^*) + 24.5\, a_1^* a_2^*.$$

This estimate is also inconsistent with Ĝ ($\chi^2 = 38.7$, 3 d.f., $P \ll 0.01$). Consequently, it is not possible to find a reduced estimate of $\mathcal{G}$ for this data set: only the full estimate consisting of the first three Legendre polynomials, shown in Figure 2, is statistically consistent with Ĝ. In contrast, other data sets (particularly cases in which the number of individuals is smaller and the number of ages is larger than in this example) will often result in a reduced estimate that is consistent with the data.
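The idea behind the reduced estimate can be sketched in code, with one important caveat: APPENDIX A's weighted least squares and $\chi^2$ test require the sampling errors of Ĝ, which are not reproduced here, so the toy fit below is unweighted and its numbers differ from the paper's (for example, it will not return the weighted value 324 for $k = 1$).

import numpy as np
from numpy.polynomial import legendre

def phi(j, x):                         # normalized Legendre polynomial (Eq. 2)
    c = np.zeros(j + 1)
    c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

G = np.array([[436.0, 522.3, 424.2],
              [522.3, 808.0, 664.7],
              [424.2, 664.7, 558.0]])
a_star = np.array([-1.0, 0.0, 1.0])

def reduced_fit(k):
    # Unweighted least-squares fit of a k x k coefficient matrix: the choice
    # C = pinv(Phi_k) G pinv(Phi_k)^T minimizes || G - Phi_k C Phi_k^T ||.
    Phi_k = np.column_stack([phi(j, a_star) for j in range(k)])
    Pinv = np.linalg.pinv(Phi_k)
    C = Pinv @ G @ Pinv.T
    return Phi_k @ C @ Phi_k.T

for k in (1, 2, 3):
    resid = float(np.sum((G - reduced_fit(k)) ** 2))
    print(k, round(resid, 1))          # the residual shrinks; k = 3 fits exactly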

Analysis of the additive genetic covariance function. The major motivation for using orthogonal functions to estimate $\mathcal{G}$ is that the coefficient matrix $C_{\mathcal{G}}$ can be used to analyze the patterns of inheritance (KIRKPATRICK and HECKMAN 1989). In particular, the coefficient matrix can be used to calculate the eigenfunctions and eigenvalues of $\mathcal{G}$.

Eigenfunctions are analogous to the eigenvectors (principal components) familiar from the analysis of covariance matrices. Each eigenfunction is a continuous function that represents a possible evolutionary deformation of the mean growth trajectory. Any mean growth trajectory can be thought of as the sum of a population's current mean growth trajectory plus a combination of the eigenfunctions of its additive genetic covariance function. Paired with each eigenfunction is a number known as its eigenvalue. The eigenvalue is proportional to the amount of genetic variation in the population corresponding to that eigenfunction. Eigenvalues (and the eigenfunctions associated with them) are conventionally numbered in order of decreasing size, beginning with the largest.

Eigenfunctions with large eigenvalues are deformations for which the population has substantial genetic variation. The shape of the mean growth trajectory will therefore evolve rapidly along these deformations if they are favored by selection. Eigenfunctions with very small (or zero) eigenvalues, on the other hand, represent deformations for which there is little (or no) additive genetic variation. If selection favors a new mean growth trajectory that is obtained from the current trajectory by some combination of these deformations, there will be very slow (or no) evolutionary progress towards it. The eigenfunctions and eigenvalues therefore contain information that is of great value in understanding the evolutionary potential of growth trajectories. The ith eigenfunction and eigenvalue are denoted $\psi_i$ and $\lambda_i$, respectively.

In principle, a covariance function has an infinite number of eigenfunctions and eigenvalues. (Many of the eigenvalues may, however, be zero.) In practice, we are able to estimate only a few of them because experiments give information about the covariance function at only a finite number of points (ages). The number of eigenfunctions and eigenvalues that can be estimated equals the dimensionality of the estimated coefficient matrix, which will be equal to the number of ages at which size was measured when dealing with a full estimate of $\mathcal{G}$ but will be smaller when using a reduced estimate.

Estimates of the eigenfunctions $\psi_i$ and eigenvalues $\lambda_i$ are calculated from the coefficient matrix $\hat C_{\mathcal{G}}$. The ith eigenfunction is constructed from the relation

$$\psi_i(a) = \sum_{j=0}^{n-1} [c_{\psi_i}]_j\, \phi_j(a^*), \tag{8}$$

where $[c_{\psi_i}]_j$ is the jth element of the ith eigenvector of $C_{\mathcal{G}}$ (KIRKPATRICK and HECKMAN 1989). The ith eigenvalue of $\mathcal{G}$ is identical to the ith eigenvalue of $C_{\mathcal{G}}$. Eigenfunctions are adjusted to a norm of unity by convention in order to allow meaningful comparisons between the eigenvalues. This is conveniently done by requiring that the norms of the eigenvectors $c_{\psi_i}$ equal unity. (Virtually all software packages which compute eigenvalues and eigenvectors do this as a matter of course.) Thus to obtain estimates of the eigenfunctions and eigenvalues, we determine the eigenvectors and eigenvalues of our estimate of the coefficient matrix $\hat C_{\mathcal{G}}$, then use these in Equation 8. The method can be applied using either the full coefficient matrix $\hat C_{\mathcal{G}}$ or a reduced coefficient matrix $\tilde C_{\mathcal{G}}$.

Sampling errors in the estimate of the genetic covariance matrix Ĝ produce biases in the estimates of the eigenvalues (HILL and THOMPSON 1978). Although the estimate of the arithmetic mean of the eigenvalues (i.e., $(1/n) \sum \lambda_i$) is unbiased, the larger eigenvalues are consistently overestimated while the smaller eigenvalues are consistently underestimated. This problem, which is general to all multivariate quantitative genetic studies, becomes particularly obvious in data sets that produce one or more eigenvalue

estimates that are negative. (Covariance matrices are by definition positive semidefinite, and so have no negative eigenvalues.) HAYES and HILL (1981) proposed transforming the estimate of Ĝ using a method they term "bending" in order to remedy this problem. Their method can be applied to Ĝ whenever negative eigenvalues are encountered if an estimate of the phenotypic covariance matrix P is available.

Often one would like to know the sampling distributions of the eigenvalues estimated for the additive genetic covariance function. We have developed two methods and describe them in detail in APPENDIX C. The first method constructs separate confidence limits for each eigenvalue by numerical simulation. The approach is to generate a simulated covariance matrix whose expectation is Ĝ but that includes random deviations in the elements that correspond to the sampling error. The eigenvalues for the coefficient matrix corresponding to each simulated Ĝ are calculated. This procedure is iterated many times, and the distribution for each eigenvalue is constructed empirically with the results. The second method uses a chi-squared statistic to test hypotheses about one or more of the eigenvalues. Typically, the hypothesis of interest is whether or not the observed eigenvalues are statistically distinguishable from zero.

We will now illustrate the methods for analyzing genetic covariance functions with the full estimate of $\mathcal{G}$ from the mouse data. All three eigenvalues of $\hat C_{\mathcal{G}}$ are positive, and so bending the data matrix is unnecessary. Using a standard computer package, we find that the first (largest) eigenvalue of $\hat C_{\mathcal{G}}$ is $\hat\lambda_1 = 1361$, and the eigenvector associated with it is

$$\hat c_{\psi_1} = [0.995,\; 0.0504,\; -0.0831]^T.$$

By substituting this into Equation 8, we obtain the full estimate for the first eigenfunction of $\mathcal{G}$:

$$\hat\psi_1(a) = 0.7693 + 0.0617\, a^* - 0.1971\, a^{*2}.$$

The second and third eigenfunctions are obtained in the same way. The three eigenfunctions are shown in Figure 3. The eigenvalues associated with the eigenfunctions are $\hat\lambda_1 = 1361$, $\hat\lambda_2 = 24.5$ and $\hat\lambda_3 = 1.5$ (Figure 4).
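The eigenanalysis is easily reproduced. In the sketch below (an illustration; recall the sign of any computed eigenvector, and hence of an eigenfunction, is arbitrary), the eigenvalues of the coefficient matrix estimate those of $\mathcal{G}$, and Equation 8 turns each eigenvector into an eigenfunction.

import numpy as np
from numpy.polynomial import legendre

def phi(j, x):                             # normalized Legendre polynomial (Eq. 2)
    c = np.zeros(j + 1)
    c[j] = 1.0
    return np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c)

C_G = np.array([[1348.2, 66.5, -112.0],    # full coefficient matrix from the text
                [66.5, 24.3, -14.0],
                [-112.0, -14.0, 14.5]])

lam, vecs = np.linalg.eigh(C_G)
order = np.argsort(lam)[::-1]              # number eigenvalues in decreasing size
lam, vecs = lam[order], vecs[:, order]
print(np.round(lam, 1))                    # approx [1361.   24.5    1.5]

def psi(i, a_star):
    # Eq. (8): psi_i(a) = sum_j [c_psi_i]_j phi_j(a*)
    return sum(vecs[j, i] * phi(j, a_star) for j in range(3))

a_star = np.linspace(-1.0, 1.0, 5)         # adjusted ages between 2 and 4 weeks
print(np.round(psi(0, a_star), 4))         # first eigenfunction, cf. Figure 3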

Any conceivable evolutionary change in a population's mean growth trajectory can be written in terms of a weighted sum of the eigenfunctions. The rate at which a population will evolve from its current mean trajectory to some new trajectory favored by selection is determined by the eigenvalues associated with the eigenfunctions responsible for that change. A large eigenvalue indicates that a change corresponding to that eigenfunction will happen rapidly, while a small (or zero) eigenvalue indicates that the change will be slow (or will not happen at all).

[FIGURE 3.—Estimates of the three eigenfunctions and their eigenvalues for the additive genetic covariance function $\mathcal{G}$, plotted against AGE (weeks) from 2 to 4.]

The first eigenfunction is a deformation involving an overall increase or decrease of size at all ages (Figure 3). The large size of the first eigenvalue indicates that selection will produce rapid changes if this kind of alteration in the mean growth trajectory is favored. The second eigenfunction corresponds to genetic changes that increase (or decrease) size between ages 2 to 3 weeks, and decrease (or increase) size after 3 weeks of age. The third eigenfunction shows a more complex pattern. The second and third eigenvalues, however, reveal that the amount of genetic variation associated with these eigenfunctions is small in comparison with the variation associated with the first eigenfunction. These eigenvalues indicate that the evolutionary response to selection would be two or more orders of magnitude slower for changes involving the second and third eigenfunctions than for those involving the first eigenfunction.

The 95% confidence regions for each of the eigenvalues constructed by the numerical simulation method (described in APPENDIX C) are [1100, 1700] for $\lambda_1$, [17, 33] for $\lambda_2$, and [-2.7, 5.1] for $\lambda_3$ (Figure 4). We are therefore quite confident that the large differences between the estimate of the first eigenvalue compared with the second and third are real. The conclusion that the estimate for $\lambda_3$ is not statistically different from zero is confirmed by the chi-squared test (also described in APPENDIX C). The hypothesis that $\lambda_3$ equals zero gives $\chi^2 = 0.65$, which is not significant ($P > 0.1$). The hypothesis that both $\lambda_2$ and $\lambda_3$ are zero, however, is rejected ($\chi^2 = 40.4$, $P \ll 0.01$).

A qualitatively similar picture of the pattern of genetic variation for mouse growth emerges from an analysis of the full data set for ages 2–10 weeks. This analysis and its evolutionary implications will be discussed in a later publication.

[FIGURE 4.—The three eigenvalues of the additive genetic covariance function $\mathcal{G}$ and their 95% confidence limits (determined by the numerical simulation method) on linear (above) and logarithmic (below) scales. The confidence limits for $\lambda_3$ include zero.]

ESTIMATING THE SELECTION GRADIENT FUNCTION $\beta$

Having developed the methods for estimating the additive genetic covariance function $\mathcal{G}$, we now turn to methods for estimating the selection gradient function $\beta$. The techniques are extensions of the results of LANDE (1979) and LANDE and ARNOLD (1983). Applications and difficulties with these methods are discussed by ARNOLD and WADE (1984a, b) and MITCHELL-OLDS and SHAW (1987).

Our strategy here is the same as is used to estimate $\mathcal{G}$. The values of $\beta$ at a finite number of ages are estimated, and then a continuous function is estimated by interpolating between these points. The selection gradient acting on any trait is defined as the partial regression of relative fitness onto the phenotypic value of that character, holding the phenotypic values for other traits constant (LANDE and ARNOLD 1983). In the context of growth trajectories, the partial regression coefficients of relative fitness onto size at each of several ages form an estimated selection gradient vector b̂. The continuous selection gradient function can then be estimated by fitting orthogonal functions to b̂.

A selection gradient function can be written in terms of the same orthogonal functions that were used to describe the additive genetic covariance function:

$$\beta(a) = \sum_i [c_\beta]_i\, \phi_i(a^*) \tag{9}$$

(KIRKPATRICK and HECKMAN 1989). In (9), $c_\beta$ is the coefficient vector associated with the selection gradient function $\beta$. The full estimate of $c_\beta$ that passes



through every point in b̂ is found using the relation

$$\hat c_\beta = \Phi^{-1}\, \hat b. \tag{10}$$

The continuous selection gradient function is estimated by substituting $\hat c_\beta$ into (9). Alternatively, given information on the errors in the estimates of the elements of b̂, a reduced estimate of $\beta$ can be calculated using weighted least squares as described in APPENDIX A.

Estimating the selection gradient function $\beta$ thus requires an estimate of the selection gradient vector b. The basic methodologies for estimating b are discussed by LANDE and ARNOLD (1983), ARNOLD and WADE (1984a, b), and MITCHELL-OLDS and SHAW (1987). The methods can be applied to growth trajectories in two ways. The first requires data on the sizes of individuals at each of several ages and a measure of their lifetime relative fitnesses. The selection gradient vector can then be estimated directly as the partial regressions of relative fitness onto size at those ages. This is the preferable approach, but is limited to cases in which there is data on the lifetime fitnesses of individuals.

In the absence of such data, an indirect method that makes use of data on the effects of size on fecundity and mortality can be used if relative fitnesses are constant in time. Under that assumption, a result from LANDE (1979) can be extended to show that

$$\beta(a) = \frac{\delta}{\delta \bar y(a)} \ln(\bar W), \tag{11}$$

where $\bar W$ is the population's mean fitness and $(\delta/\delta \bar y)\ln(\bar W)$ represents the first variation of $\ln(\bar W)$ with respect to $\bar y$ (see COURANT and HILBERT 1953, p. 184; R. GOMULKIEWICZ, in preparation). Equation 11 is analogous to the equation for a finite number of quantitative characters, $\beta = \nabla \ln(\bar W)$ (LANDE 1979; LANDE and ARNOLD 1983).

We can make use of Equation 11 if we have some understanding of how size affects fitness. If, for example, fecundity and mortality rates are functions only of size and age, then the relationship between the selection gradient and these life history attributes is

$$\beta(a) = f_1(a)\, \bar m'(a) - f_2(a)\, \bar\mu'(a), \tag{12}$$

where

$$f_1(a) = l(a)/\bar W$$

and

$$f_2(a) = \frac{1}{\bar W} \int_a^{a_{\max}} l(x)\, \bar m(x)\,dx.$$

Here, $l(a)$ is the probability a newborn survives to age $a$, $\bar m(a)$ and $\bar\mu(a)$ are, respectively, the average birth and mortality rates among surviving individuals at age $a$, and primes denote derivatives taken with respect to $\bar y^*(a)$, the mean size of individuals alive at age $a$ (KIRKPATRICK 1984, 1988). Fitness, on the other hand, may be determined in part by factors other than size and age, such as growth rate. In these cases, Equation 12 can be modified to account for the way in which these other factors determine fitness (LYNCH and ARNOLD 1988).

Using the indirect method to estimate the selection gradient function β depends on evaluating the components of Equation 12 (or its analog, if fitness depends on more than size and age alone). The term m̄′(a) is the rate at which the average birth rate of individuals alive at age a changes per unit increase in the mean size of individuals. Given census data about a cohort of individuals at ages aᵢ and aᵢ₊₁, this term can be estimated using the regression of fecundity on body size, divided by the duration of the interval:

\[ \bar{m}'(\bar{a}_i) = \frac{\operatorname{Cov}[m,\,\bar{z}^*]}{\sigma^{2*}(\bar{a}_i)\,(a_{i+1} - a_i)}, \tag{13} \]

where āᵢ = (aᵢ + aᵢ₊₁)/2 is the midpoint of the interval between aᵢ and aᵢ₊₁. Equation 13 is a linear interpolation that attributes the effects of size on birth rate to the midpoint of the interval being measured. The term Cov[m, z̄*] is the covariance between the number of births over the interval and body size among those individuals that survived through the entire interval. The average of an individual's size at the beginning and at the end of the interval should be used for this purpose. The term σ²*(āᵢ) is the mean of the variance in size at the start of the interval and the variance in size at the end of the interval among those individuals that survived throughout the period. Only individuals that survive are used in the calculation because the fecundity of individuals that died in the interval is reduced by the reduced time they had in which to reproduce.

The term μ̄′(a) in Equation 12 represents the effect of a change in the mean body size on the average mortality rate at age a. This is estimated from the relation:

\[ \bar{\mu}'(\bar{a}_i) = -\,\frac{\bar{z}^{\dagger}(a_i) - \bar{z}(a_i)}{\sigma^{2*}(a_i)\,(a_{i+1} - a_i)}. \tag{14} \]

In (14), z̄†(aᵢ) is the mean size at age aᵢ of individuals that survive to reach age aᵢ₊₁, z̄(aᵢ) is the mean size of all individuals alive at age aᵢ, and σ²*(aᵢ) is their variance in size. Equations 13 and 14 follow from Equations 11, 12, and the results of ROBERTSON (1966) and PRICE (1970).

The interpolations of Equations 13 and 14 become increasingly accurate as the amount of growth that occurs over the interval becomes small relative to the variation in size among individuals present at the start


of the interval. The remaining terms involved in Equation 12, which are the survivorships and mean birth rates at different ages, can be estimated directly from census data.

Given census data from n times during the ontogeny of a cohort, this method will estimate the selection gradient function at n − 1 ages. These n − 1 point estimates form a selection gradient vector b̂ which can then be used to estimate the continuous selection gradient function β via Equations 9 and 10.
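For readers who want to experiment, here is a rough sketch of per-interval estimators in the spirit of Equations 13 and 14. All input names are hypothetical, and the pooling of variances across the interval endpoints is simplified relative to the text, so this should be read as an illustration rather than a reference implementation.

```python
import numpy as np

def mbar_prime(births, size_avg, dt):
    """Effect of size on birth rate over one interval (cf. Equation 13).
    births: births per individual surviving the whole interval;
    size_avg: each survivor's size averaged over the interval's endpoints;
    dt: duration of the interval."""
    cov = np.cov(births, size_avg, ddof=1)[0, 1]
    return cov / (np.var(size_avg, ddof=1) * dt)

def mubar_prime(size_start, survived, dt):
    """Effect of size on mortality rate over one interval (cf. Equation 14):
    survivors larger than average suggest size reduces mortality, hence the sign.
    size_start: sizes of all individuals alive at the start of the interval;
    survived: boolean mask of those alive at the next census."""
    diff = size_start[survived].mean() - size_start.mean()
    return -diff / (np.var(size_start, ddof=1) * dt)
```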

PREDICTING THE EVOLUTIONARY DYNAMICS OF THE GROWTH TRAJECTORY

The estimates of the additive genetic covariance function 𝒢 and the selection gradient function β can be used directly in Equation 1 to predict Δz̄, the evolutionary change in the mean growth trajectory. Using Equation 1 directly is awkward, however, because the integration in (1) must be performed for each age a at which Δz̄(a) is to be evaluated. A method making use of C_G, the coefficient matrix for the additive genetic covariance function, and c_β, the coefficient vector for the selection gradient function, circumvents this difficulty. Using a result from KIRKPATRICK and HECKMAN (1989), the evolutionary change in the mean growth trajectory is

\[ \Delta\bar{z}(a) = \sum_i \left[\mathbf{c}_{\Delta\bar{z}}\right]_i \phi_i(a), \tag{15} \]

where the coefficients c_{Δz̄} are given by the matrix equation

\[ \mathbf{c}_{\Delta\bar{z}} = \mathbf{C}_G\,\mathbf{c}_\beta. \tag{16} \]

The summation in (15) extends over all i for which [c_{Δz̄}]ᵢ is nonzero.

These formulas allow us to estimate the evolutionary change in the mean growth trajectory following one generation of selection. The full or reduced estimates of the coefficient matrix C_G and coefficient vector c_β are determined using the methods described in the last two sections. These are used in Equation 16 to estimate c_{Δz̄}. The result is then substituted into (15) to give an estimate of the evolutionary change.

Equation 16 can be applied regardless of whether or not the additive genetic covariance function and the selection gradient function were estimated at the same ages: transforming the measurements into loadings on orthogonal functions puts the measurements on the same basis. In the event that the number of ages used to estimate the covariance function differs from the number used to estimate the selection gradient function, C_G and c_β will be of different dimensions. Equation 16 can be applied in such cases by truncating the dimensions of the larger one to match those of the smaller.
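A short sketch of Equations 9, 15, and 16 follows, assuming normalized Legendre polynomials as the orthogonal functions (consistent with the worked example in APPENDIX A) and truncating mismatched dimensions as just described. The numerical values of c_β are hypothetical; C_G is borrowed from the reduced fit in APPENDIX A purely for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

def phi(i, a):
    """Normalized Legendre polynomial of degree i (phi_0 = 0.7071, etc.)."""
    c = np.zeros(i + 1)
    c[i] = 1.0
    return np.sqrt((2 * i + 1) / 2.0) * legendre.legval(a, c)

def delta_zbar(a, C_G, c_beta):
    """Predicted change in the mean trajectory at age a (Equations 15 and 16)."""
    k = min(C_G.shape[0], len(c_beta))      # truncate the larger to match
    c_dz = C_G[:k, :k] @ c_beta[:k]         # Equation 16
    return sum(c_dz[i] * phi(i, a) for i in range(k))   # Equation 15

C_G = np.array([[624.3, -13.8], [-13.8, 16.3]])  # reduced fit from APPENDIX A
c_beta = np.array([0.10, -0.05])                 # hypothetical gradient loadings
ages = np.linspace(-1.0, 1.0, 5)                 # ages rescaled to [-1, 1]
print([round(delta_zbar(a, C_G, c_beta), 2) for a in ages])
```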

A difficulty that arises when studying natural populations is that ongoing selection makes it impossible to directly observe the unselected distribution of individuals at any given age, since mortality at earlier ages can alter the distribution that will appear at later ages. The observed mean size of individuals surviving to age a, for example, will generally differ from z̄(a) because of selection at earlier ages. The same problem appears when one tries to estimate the additive genetic covariance function from data on a population experiencing selection. The quantities can, however, be estimated if selection is weak by calculating what the cumulative effects of selection at earlier ages have been on the distribution of sizes among the survivors. The basic methodology has been outlined by LYNCH and ARNOLD (1988).

DISCUSSION

The infinite-dimensional method for analyzing the evolution of growth trajectories joins two alternative methods in current use. Previous workers either have treated the sizes of individuals at a set of landmark ages as a finite number of traits or have fit parametric families of curves to the growth trajectories. Our alternative offers several types of advantages over those methods, including the ability to treat the full, continuous growth trajectory without making restrictive assumptions about the form of growth curves that a population's genetic variation will allow (KIRKPATRICK and HECKMAN 1989).

Two additional benefits of the infinite-dimensional method appear from the techniques introduced in this paper. First, the method explicitly accounts for the spacing of the ages at which the data were taken. There are advantages to designing breeding experiments with unequally spaced sample intervals. Genetic variances and covariances change rapidly during certain periods of ontogeny, often corresponding to critical events such as weaning (see Figure 2; see also HERBERT, KIDWELL and CHASE 1979; CHEVERUD, RUTLEDGE and ATCHLEY 1983; ATCHLEY 1984). Periods in which the variance structure is changing rapidly should receive a greater sampling effort. (Ideally, the frequency at which data are collected should be proportional to how rapidly the variances are changing at that point in development.) The infinite-dimensional approach allows an investigator to concentrate effort on the critical periods, but also to give these measurements the appropriate weights when estimating the population's response to selection.

A second additional benefit to using this approach is that the ages at which the genetic parameters are estimated and the ages at which the strength of selection is evaluated need not be the same. It may often be the case that logistical reasons make it hard or impossible to take these data at the same ages. This will immediately eliminate the possibility of using conventional quantitative-genetic methods, since they require that the characters on which the genetic and selection parameters are measured be homologous.

The price paid for these advantages is that our method relies on an assumption of infinite-dimensional normality in the distribution of the additive genetic component of the growth trajectories. The normality assumption is basic to classical quantitative genetics, and has support from both empirical studies and several kinds of models for the effect of genes at the underlying loci (FALCONER 1981; BULMER 1985; BARTON and TURELLI 1989). The genetic effects for even a single trait can, however, depart from normality (e.g., ROBERTSON 1977). Thus an important question in quantitative genetics is the extent to which multiple quantitative traits (including growth trajectories) conform to multivariate normality. This is an empirical question, since at present it seems unlikely that it can be resolved by theory (TURELLI 1988). Models such as ours that are based on a normality assumption, however, may provide reasonable approximations for the evolution of the mean phenotypes even when this assumption is violated if the departures are small.

We are grateful to R. GOMULKIEWICZ and F. H. C. MARRIOTT for important suggestions regarding the mathematical analysis. We thank B. RISKA for help in analyzing his mouse data. M. LYNCH, M. TURELLI, and two anonymous reviewers made numerous helpful comments on an earlier draft. This work was supported by National Science Foundation grants BSR-8604743 and BSR-8657521 to M. Kirkpatrick.

LITERATURE CITED

ANDERSON, T. W., 1958 An Introduction to Multivariate Analysis. Wiley, New York.

ARNOLD, S. J., 1981 Behavioral variation in natural populations. I. Phenotypic, genetic, and environmental correlations between chemoreceptive responses to prey in the garter snake, Thamnophis elegans. Evolution 35: 489-509.

ARNOLD, S. J., and M. J. WADE, 1984a On the measurement of natural and sexual selection: theory. Evolution 38: 709-720.

ARNOLD, S. J., and M. J. WADE, 1984b On the measurement of natural and sexual selection: applications. Evolution 38: 720-734.

ATCHLEY, W. R., 1984 Ontogeny, timing of development, and genetic variance-covariance structure. Am. Nat. 123: 519-540.

BARTON, N. H., and M. TURELLI, 1989 Evolutionary quantitative genetics: how little do we know? Annu. Rev. Genet. 23: 337-370.

BECKER, W., 1984 Manual of Quantitative Genetics, Ed. 4. Academic Enterprises, Pullman, Wash.

BEYER, W. H., 1976 Handbook of Standard Mathematical Tables, Ed. 25. CRC Press, Boca Raton, Fla.

BOHREN, B. B., H. E. MCKEAN and Y. YAMADA, 1961 Relative efficiencies of heritability estimates based on regression of offspring on parent. Biometrics 17: 481-491.

BULMER, M. G., 1985 The Mathematical Theory of Quantitative Genetics. Oxford University Press, Oxford.

BULMER, M., 1989 Maintenance of genetic variability by mutation-selection balance: a child's guide through the jungle. Genome 31: 761-767.

CHEVERUD, J. M., J. J. RUTLEDGE and W. R. ATCHLEY, 1983 Quantitative genetics of development: genetic correlations among age-specific trait values and the evolution of ontogeny. Evolution 37: 895-905.

COURANT, R., and D. HILBERT, 1953 Methods of Mathematical Physics, Vol. 1. Wiley, New York.

DRAPER, N. R., and H. SMITH, 1966 Applied Regression Analysis. Wiley, New York.

EBENMAN, B., and L. PERSSON (Editors), 1988 The Dynamics of Size-Structured Populations. Springer-Verlag, Berlin.

FALCONER, D. S., 1981 Introduction to Quantitative Genetics, Ed. 2. Longman, London.

FITZHUGH, H. A., 1976 Analysis of growth curves and strategies for altering their shapes. J. Anim. Sci. 33: 1036-1051.

HAYES, J. F., and W. G. HILL, 1981 Modification of estimates of parameters in the construction of genetic selection indices ('bending'). Biometrics 37: 483-493.

HERBERT, J. G., J. F. KIDWELL and H. B. CHASE, 1979 The inheritance of growth and form in the mouse. IV. Changes in the variance components of weight, tail length, and tail width during growth. Growth 43: 36-46.

HILL, W. G., and R. THOMPSON, 1978 Probabilities of non-positive definite between-group or genetic covariance matrices. Biometrics 34: 429-439.

KEMPTHORNE, O., and O. B. TANDON, 1953 The estimation of heritability by regression of offspring on parent. Biometrics 9: 90-100.

KIRKPATRICK, M., 1984 Demographic models based on size, not age, for organisms with indeterminate growth. Ecology 65: 1874-1884.

KIRKPATRICK, M., 1988 The evolution of size in size-structured populations, pp. 13-28 in The Dynamics of Size-Structured Populations, edited by B. EBENMAN and L. PERSSON. Springer-Verlag, Berlin.

KIRKPATRICK, M., and N. HECKMAN, 1989 A quantitative genetic model for growth, shape, and other infinite-dimensional characters. J. Math. Biol. 27: 429-450.

LANDE, R., 1979 Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33: 402-416.

LANDE, R., and S. J. ARNOLD, 1983 The measurement of selection on correlated characters. Evolution 37: 1210-1226.

LOFSVOLD, D., 1986 Quantitative genetics of morphological differentiation in Peromyscus. I. Tests of the homogeneity of genetic covariance structure among species and subspecies. Evolution 40: 559-573.

LYNCH, M., and S. J. ARNOLD, 1988 The measurement of selection on size and growth, pp. 47-59 in The Dynamics of Size-Structured Populations, edited by B. EBENMAN and L. PERSSON. Springer-Verlag, Berlin.

MITCHELL-OLDS, T., and R. G. SHAW, 1987 Regression analysis of natural selection: statistical inference and biological interpretation. Evolution 41: 1149-1161.

PRICE, G. R., 1970 Selection and covariance. Nature 227: 520-521.

PRICE, T. D., P. R. GRANT and P. T. BOAG, 1984 Genetic changes in the morphological differentiation of Darwin's ground finches, pp. 49-66 in Population Biology and Evolution, edited by K. WOHRMANN and V. LOESCHKE. Springer-Verlag, Berlin.

RISKA, B., W. R. ATCHLEY and J. J. RUTLEDGE, 1984 A genetic analysis of targeted growth in mice. Genetics 107: 79-101.

ROBERTSON, A., 1966 A mathematical model of the culling process in dairy cattle. Anim. Prod. 8: 93-108.

ROBERTSON, A., 1977 The non-linearity of offspring-parent regression, pp. 297-304 in Proceedings of the International Conference on Quantitative Genetics, edited by E. POLLAK, O. KEMPTHORNE and T. B. BAILEY. Iowa State University Press, Ames.


TURELLI, M., 1988 Phenotypic evolution, constant covariances, and the maintenance of additive variance. Evolution 42: 1342-1347.

WRIGHT, S., 1968 Evolution and the Genetics of Populations, Vol. 1: Genetic and Biometric Foundations. University of Chicago Press, Chicago.

Communicating editor: M. TURELLI

APPENDIX A

Here we present a method for fitting a reduced estimate of 𝒢 and testing for its consistency with the data. We then illustrate the procedure using the data on the log of male mouse weight at ages 2, 3, and 4 weeks from RISKA, ATCHLEY and RUTLEDGE (1984).

Finding a reduced estimate: Recall that a reduced estimate is one consisting of k orthogonal functions, where k is smaller than n (the dimensionality of Ĝ, which equals the number of ages at which measurements of body size were obtained). Given a set S of k orthogonal functions, we use the method of weighted least squares to fit the k × k reduced coefficient matrix Ĉ_G. This produces the most statistically efficient estimate of the coefficient matrix that can be obtained from a linear function of the elements of Ĝ (DRAPER and SMITH 1966, p. 80). To apply weighted least squares, we begin by forming the vector ĝ by stacking the successive columns of Ĝ:

\[ \hat{\mathbf{g}} = (\hat{G}_{11}, \ldots, \hat{G}_{n1}, \hat{G}_{12}, \ldots, \hat{G}_{n2}, \ldots, \hat{G}_{nn})^T. \]

This vector has dimension n². The vector ĉ = (ĉ₀₀, …, ĉ_{k−1,0}, ĉ₀₁, …, ĉ_{k−1,1}, …, ĉ_{k−1,k−1})ᵀ is formed in the same way from the (as yet undetermined) coefficient matrix Ĉ_G, and has dimension k². In this notation, the relation between the undetermined coefficients and the observed genetic covariances is given by the regression equation

\[ \hat{\mathbf{g}} = \mathbf{X}_S\,\hat{\mathbf{c}} + \mathbf{e}, \tag{A1} \]

where e is a vector of errors and the matrix X_S is determined by the set S of orthogonal functions. The matrix X_S is calculated by first forming Φ_S, the n × k matrix obtained by deleting the columns of Φ corresponding to those φᵢ not in S, then taking the Kronecker product of Φ_S with itself:

\[ \mathbf{X}_S = \boldsymbol{\Phi}_S \otimes \boldsymbol{\Phi}_S. \tag{A2} \]

This is a matrix of dimensions n² × k². Calculating ĉ also requires the covariance matrix V of the errors in the estimates of ĝ: V_{ij,kl} = Cov[Ĝᵢⱼ, Ĝₖₗ]. V can be estimated given the particular design of the breeding experiment used to estimate G. We present the general method for calculating V̂, the estimate of V, and apply it to three widely used experimental designs (parent-offspring regression, half sibs, and full sibs) in APPENDIX B.

In typical regression applications, a least-squares estimate of the coefficients in ĉ would follow directly from the linear form of Equation A1 and the specification of X_S and V. The symmetry of Ĝ, however, produces redundancies in the vector ĝ that cause V to be singular and so prevents us from calculating ĉ from Equation A1 immediately. We therefore make the following modifications:

1. Delete from V̂ those columns and rows corresponding to elements of ĝ whose entry Ĝᵢⱼ has i < j.

2. Delete from X_S the rows corresponding to those elements of ĝ for which Ĝᵢⱼ has i < j.

3. For each element of ĉ for which ĉᵢⱼ has i < j, add the corresponding column of X_S to the column corresponding to ĉⱼᵢ, then delete the former column.

4. Delete from ĝ the elements for which Ĝᵢⱼ has i < j.

5. Delete from ĉ the elements for which ĉᵢⱼ has i < j. Following these operations, V̂ has dimensions n(n + 1)/2 × n(n + 1)/2, X_S becomes n(n + 1)/2 × k(k + 1)/2, ĝ becomes n(n + 1)/2 × 1, and ĉ becomes k(k + 1)/2 × 1.

We now can calculate ĉ using standard weighted least squares procedures [see, e.g., DRAPER and SMITH (1966, pp. 77-81) and BULMER (1985, pp. 60-61)]:

\[ \hat{\mathbf{c}} = (\mathbf{X}_S^T \hat{\mathbf{V}}^{-1} \mathbf{X}_S)^{-1}\mathbf{X}_S^T \hat{\mathbf{V}}^{-1} \hat{\mathbf{g}}. \tag{A3} \]

The reduced coefficient matrix Ĉ_G is then constructed from ĉ. First, form a matrix by restoring the elements deleted in Step 5 above, and "unstacking" the columns. Then insert a row and a column of zeroes in the positions corresponding to those orthogonal functions not included in Φ_S to obtain Ĉ_G. (For example, if the first-order orthogonal function φ₁ has been omitted, a row of zeroes would be inserted into Ĉ_G between the 0th and 2nd rows, and a column of zeroes between the 0th and 2nd columns.) The reduced estimate 𝒢̂ of the additive genetic covariance function is then obtained by substituting Ĉ_G into Equation 3.

Having produced the reduced estimate 𝒢̂ using the set of orthogonal functions S, we want to test the goodness of fit of 𝒢̂ to Ĝ. We have adopted a procedure that approximates the distribution of errors in the estimated Ĝᵢⱼ by a multivariate normal. Using this approximation, the consistency of 𝒢̂ and Ĝ can be determined using the standard test for the fit of a regression model [see DRAPER and SMITH (1966, pp. 77-81) and BULMER (1985, pp. 60-61)]. We test the chi-squared statistic

\[ \chi^2 = (\hat{\mathbf{g}} - \mathbf{X}_S\hat{\mathbf{c}})^T \hat{\mathbf{V}}^{-1} (\hat{\mathbf{g}} - \mathbf{X}_S\hat{\mathbf{c}}), \tag{A4} \]


which is referred to a chi-squared distribution with m − p degrees of freedom, where m = n(n + 1)/2 is the number of degrees of freedom in Ĝ and p = k(k + 1)/2 is the number of parameters being fit. A significant result indicates that the model is inconsistent with the data, in which case we attempt to fit a model consisting of a larger number of orthogonal functions.

Because we are approximating the errors in the Ĝᵢⱼ's as multivariate normal, the chi-squared test does not produce exact probability values. We expect it, however, to be a reasonable guide that discriminates between candidate covariance functions that fit the data reasonably well and those that do not. More accurate tests could be developed with numerical simulation.

In summary, the algorithm for finding the reduced estimate of the covariance function is as follows. Estimates of the additive covariance function are obtained by fitting orthogonal functions in a stepwise manner using weighted least squares (Equation A1). Each estimate is tested against Ĝ using an approximate statistical test given by Equation A4. The reduced estimate is the simplest set of orthogonal functions (e.g., the polynomial of lowest degree) which when fit produces an estimate of 𝒢 that is not statistically different from Ĝ.
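The algorithm is compact enough to sketch directly. The function below is a minimal rendering of Steps 1-5 together with Equations A2-A4, assuming V̂ is supplied in the same (row-major) order used to stack Ĝ; it returns the fitted coefficients, the chi-squared statistic, and its m − p degrees of freedom.

```python
import numpy as np

def reduced_fit(G_hat, V_hat, Phi_S):
    """G_hat: n x n estimated genetic covariance matrix.
    V_hat: n^2 x n^2 error covariance of the stacked elements of G_hat,
           assumed indexed in the same row-major order used for g below.
    Phi_S: n x k matrix of the retained orthogonal functions."""
    n, k = Phi_S.shape
    X = np.kron(Phi_S, Phi_S)          # Equation A2: row i*n+j corresponds to G_ij
    g = G_hat.reshape(-1)              # stack G_hat in the matching order
    keep = [i * n + j for i in range(n) for j in range(n) if i >= j]
    # Step 3: fold each column for c_pq with p < q into the column for c_qp.
    for p in range(k):
        for q in range(p + 1, k):
            X[:, q * k + p] += X[:, p * k + q]
    drop = [p * k + q for p in range(k) for q in range(k) if p < q]
    X = np.delete(X, drop, axis=1)[keep]             # Steps 2 and 3
    g = g[keep]                                      # Step 4
    V = V_hat[np.ix_(keep, keep)]                    # Step 1
    Vi = np.linalg.inv(V)
    c = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ g)  # Equation A3
    r = g - X @ c
    chi2 = r @ Vi @ r                                # Equation A4
    df = len(g) - len(c)                             # m - p degrees of freedom
    return c, chi2, df
```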

A worked example: To illustrate the method, consider a reduced estimate for the mouse data consisting of the first two Legendre polynomials (that is, the polynomials of degrees 0 and 1). The data matrix Ĝ is given in the text (following Equation 7). Following the steps outlined above, we have

\[ \hat{\mathbf{g}} = [436.0,\ 522.3,\ 424.2,\ 522.3,\ 808.0,\ 664.7,\ 424.2,\ 664.7,\ 558.0]^T. \]

The matrix Φ_S is found by deleting from the matrix Φ (given in the text, following Equation 7) the third column, corresponding to the missing 2nd-degree polynomial. This gives:

\[ \boldsymbol{\Phi}_S = \begin{bmatrix} 0.7071 & -1.2247 \\ 0.7071 & 0 \\ 0.7071 & 1.2247 \end{bmatrix}. \]

From Equation A2 and the steps listed above we find that

\[ \mathbf{X}_S = \begin{bmatrix}
0.5 & -0.866 & -0.866 & 1.5 \\
0.5 & 0.0 & -0.866 & 0.0 \\
0.5 & 0.866 & -0.866 & -1.5 \\
0.5 & -0.866 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.866 & 0.0 & 0.0 \\
0.5 & -0.866 & 0.866 & -1.5 \\
0.5 & 0.0 & 0.866 & 0.0 \\
0.5 & 0.866 & 0.866 & 1.5
\end{bmatrix}. \]

Using the method described in APPENDIX B, we find that V̂, the estimated covariance matrix of errors in ĝ, is

\[ \hat{\mathbf{V}} = \begin{bmatrix}
2752 & 3187 & 2541 & 3187 & 3692 & 2944 & 2541 & 2944 & 2347 \\
3187 & 4527 & 3504 & 4527 & 6210 & 4830 & 3584 & 4830 & 3754 \\
2541 & 3504 & 3057 & 3504 & 4708 & 4058 & 3057 & 4058 & 3477 \\
3187 & 4527 & 3504 & 4527 & 6210 & 4830 & 3584 & 4830 & 3754 \\
3692 & 6210 & 4708 & 6210 & 10453 & 7921 & 4708 & 7921 & 6005 \\
2944 & 4830 & 4058 & 4830 & 7921 & 6673 & 4058 & 6673 & 5562 \\
2541 & 3584 & 3057 & 3584 & 4708 & 4058 & 3057 & 4058 & 3477 \\
2944 & 4830 & 4058 & 4830 & 7921 & 6673 & 4058 & 6673 & 5562 \\
2347 & 3754 & 3477 & 3754 & 6005 & 5562 & 3477 & 5562 & 5155
\end{bmatrix}. \]

We now follow the steps prescribed above. Step 1, which deletes rows and columns from V̂, produces

\[ \hat{\mathbf{V}} = \begin{bmatrix}
2752 & 3187 & 2541 & 3692 & 2944 & 2347 \\
3187 & 4527 & 3504 & 6210 & 4830 & 3754 \\
2541 & 3504 & 3057 & 4708 & 4058 & 3477 \\
3692 & 6210 & 4708 & 10453 & 7921 & 6005 \\
2944 & 4830 & 4058 & 7921 & 6673 & 5562 \\
2347 & 3754 & 3477 & 6005 & 5562 & 5155
\end{bmatrix}. \]

Deleting rows from X_S (Step 2 above) yields

\[ \mathbf{X}_S = \begin{bmatrix}
0.5 & -0.866 & -0.866 & 1.5 \\
0.5 & 0.0 & -0.866 & 0.0 \\
0.5 & 0.866 & -0.866 & -1.5 \\
0.5 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.866 & 0.0 & 0.0 \\
0.5 & 0.866 & 0.866 & 1.5
\end{bmatrix}. \]

The vector of coefficients, ĉ = [ĉ₀₀, ĉ₁₀, ĉ₀₁, ĉ₁₁]ᵀ, contains the element ĉ₀₁ for which i < j. In Step 3 we therefore add the third column of X_S to the second, then delete the third column. This leaves the matrix in its final form:

\[ \mathbf{X}_S = \begin{bmatrix}
0.5 & -1.732 & 1.5 \\
0.5 & -0.866 & 0.0 \\
0.5 & 0.0 & -1.5 \\
0.5 & 0.0 & 0.0 \\
0.5 & 0.866 & 0.0 \\
0.5 & 1.732 & 1.5
\end{bmatrix}. \]

Removing the redundant elements in ĝ (Step 4) gives

\[ \hat{\mathbf{g}} = [436.0,\ 522.3,\ 424.2,\ 808.0,\ 664.7,\ 558.0]^T, \]

and doing the same for ĉ (Step 5) produces ĉ = [ĉ₀₀, ĉ₁₀, ĉ₁₁]ᵀ.


The reduced coefficient vector ĉ is calculated using Equation A3. This gives

\[ \hat{\mathbf{c}} = [624.3,\ -13.8,\ 16.3]^T, \]

and so

\[ \hat{\mathbf{C}}_G = \begin{bmatrix} 624.3 & -13.8 & 0.0 \\ -13.8 & 16.3 & 0.0 \\ 0.0 & 0.0 & 0.0 \end{bmatrix}. \]

By using these coefficients in Equation 3, we arrive at the reduced estimate of 𝒢 that consists of the 0 and 1st-degree Legendre polynomials:

\[ \hat{\mathcal{G}}(a_1, a_2) = 312.2 - 11.9(a_1 + a_2) + 24.5\,a_1 a_2. \]

The reduced estimate 𝒢̂ can be tested against the observed genetic covariance matrix Ĝ using the chi-squared statistic of Equation A4. We find χ² = 38.68. Since Ĝ has m = 6 degrees of freedom and we have estimated p = 3 coefficients, we test the statistic with 3 degrees of freedom and find that the difference between the reduced estimate 𝒢̂ and the observed values Ĝ is highly significant. We therefore reject the reduced estimate consisting only of the 0 and 1st-degree Legendre polynomials.

Following the same procedure for all other combinations of 0, 1st-, and 2nd-degree Legendre polynomials shows that only the full estimate consisting of all three is consistent with Ĝ. The error variance of the Ĝᵢⱼ's in these data is therefore sufficiently small that no reduced model is acceptable, although this may often not be the case for smaller data sets. The full estimate 𝒢̂ is shown in Figure 2.

APPENDIX B

This appendix describes methods to calculate V, the covariance matrix of errors in Ĝ, the estimated additive genetic covariance matrix. We use the notation v_{ij,kl} to denote the covariance of Ĝᵢⱼ and Ĝₖₗ. Below we present formulae for estimating V from three widely used breeding designs: half sibs, full sibs, and parent-offspring regression.

In the following calculations, we will often need an expression for the covariance of two mean cross products. From classical statistical theory we have the result

\[ \operatorname{Cov}(M_{ij}, M_{kl}) = (M_{ik}M_{jl} + M_{il}M_{jk})/f, \tag{B1} \]

where Mᵢⱼ is the mean cross product of variables i and j, and f is the number of degrees of freedom (ANDERSON 1958, p. 161; BULMER 1985, p. 94). Replacing each of the M's with its estimate M̂ and dividing by (f + 2) rather than f yields v̂_{ij,kl}, an unbiased estimator of the covariance.
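In code, the unbiased sample version of Equation B1 is a one-liner; the sketch below assumes an estimated mean cross-product matrix M_hat and its degrees of freedom f.

```python
def cov_mean_crossproducts(M_hat, f, i, j, k, l):
    """Unbiased sample version of Equation B1: dividing by (f + 2) rather than
    f removes the bias that results from replacing each M with its estimate."""
    return (M_hat[i][k] * M_hat[j][l] + M_hat[i][l] * M_hat[j][k]) / (f + 2)
```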

Half-sib analysis: In the classic half-sib analysis, s sires are each mated to n dams, and one offspring is measured from each mating. An analysis of variance and covariance partitions the observed variation among the offspring into components among sires and a residual [see FALCONER (1981, p. 140) and BECKER (1984, pp. 45-54, 113-118)]. The additive genetic component of variance is estimated as

\[ \hat{G}_{ij} = 4(\hat{M}_{s,ij} - \hat{M}_{e,ij})/n, \tag{B2} \]

where M̂_{s,ij} is the mean crossproduct among sires, M̂_{e,ij} is the residual mean crossproduct, and n is the number of offspring per sire in a balanced design. (The mean crossproducts M̂_{s,ij} and M̂_{e,ij} are defined so as to be independent.) The sampling covariance is therefore

\[ \hat{v}_{ij,kl} = \frac{16}{n^2}\left[\operatorname{Cov}(\hat{M}_{s,ij}, \hat{M}_{s,kl}) + \operatorname{Cov}(\hat{M}_{e,ij}, \hat{M}_{e,kl})\right], \tag{B3} \]

where the covariances of the M̂'s are given by Equation B1.

We often want to estimate V from data summaries in the literature that do not include the estimated mean cross products. These quantities can, however, be back-calculated from the estimated additive genetic and phenotypic covariance matrices Ĝ and P̂ that frequently are reported. In a half-sib analysis, the necessary relations are

\[ \hat{M}_{e,ij} = \hat{P}_{ij} - \tfrac{1}{4}\hat{G}_{ij} \tag{B4a} \]

and

\[ \hat{M}_{s,ij} = \frac{n-1}{4}\,\hat{G}_{ij} + \hat{P}_{ij}. \tag{B4b} \]

Substituting (B4) into (B3) then gives an estimate of v̂_{ij,kl}.

Full sib analysis: In this breeding design, each of s sires is mated to d dams, and n progeny are measured per dam [see FALCONER (1981, pp. 140-141) and BECKER (1984, pp. 55-65, 119-127)]. The resulting nested analysis of variance and covariance yields two estimates of the genetic covariance:


\[ \hat{G}'_{ij} = 4(\hat{M}_{s,ij} - \hat{M}_{d,ij})/nd, \tag{B5a} \]

and

\[ \hat{G}''_{ij} = 4(\hat{M}_{d,ij} - \hat{M}_{e,ij})/n, \tag{B5b} \]

where M̂_{s,ij}, M̂_{d,ij}, and M̂_{e,ij} are, respectively, the estimated among-sires, among-dams, and residual mean crossproducts. The two estimates of the G's give rise to two estimates for the V's:

V" - -y,kl - - [cOV(k~,,j, M s , k l ) cOV(fid,q, f i d , ~ ) ] (B6a) 16

n2d2

and

16 n

@qj,kl = 7 [COV(Me,q, h e , k l ) + COV(M~,~, f i d , k l ) ] ? (B6b)


where the covariances are again calculated using Equation B1. The two estimates of V obtained from (B6a) and (B6b) can be averaged to give a single composite estimate.

The M̂'s that appear in (B6a,b) can be obtained from reported values of Ĝ_s, Ĝ_d, and P̂ using

\[ \hat{M}_{e,ij} = \hat{P}_{ij} - \tfrac{1}{4}(\hat{G}_{s,ij} + \hat{G}_{d,ij}) \tag{B7a} \]

and

\[ \hat{M}_{d,ij} = \frac{n}{4}\,\hat{G}_{d,ij} + \hat{M}_{e,ij}, \qquad \hat{M}_{s,ij} = \frac{nd}{4}\,\hat{G}_{s,ij} + \hat{M}_{d,ij}. \tag{B7b} \]

Parent-offspring regression: When parent-offspring regression [see FALCONER (1981, pp. 136-140) and BECKER (1984, pp. 103-106, 133-134)] is used, the additive genetic covariance of trait i with trait j can be estimated using

\[ \hat{G}_{ij} = (\hat{M}_{ij} + \hat{M}_{ji})/2, \tag{B8} \]

where M̂ᵢⱼ is the estimated crossproduct for trait i in the offspring and trait j in the parents. That is,

\[ \hat{M}_{ij} = \frac{1}{f}\sum_k \left(\bar{z}_{ik} - \bar{z}_i^{\,O}\right)\left(z'_{jk} - \bar{z}_j^{\,P}\right), \]

where z̄ᵢₖ is the mean of trait i among the offspring of family k, z̄ᵢᴼ is the overall mean of zᵢ in the offspring, z′ⱼₖ is the midparent value of trait j in family k, z̄ⱼᴾ is the overall mean of trait j in the parents, and f is the degrees of freedom. Our estimate of the sample covariances of the genetic covariances is then readily obtained from Equation B1 as

\[ \hat{v}_{ij,kl} = \tfrac{1}{4}\left[\operatorname{Cov}(\hat{M}_{ij}, \hat{M}_{kl}) + \operatorname{Cov}(\hat{M}_{ij}, \hat{M}_{lk}) + \operatorname{Cov}(\hat{M}_{ji}, \hat{M}_{kl}) + \operatorname{Cov}(\hat{M}_{ji}, \hat{M}_{lk})\right]. \]

Variation in family size can be taken into account using a form of weighted regression (KEMPTHORNE and TANDON 1953). Doing so results in each mean crossproduct, M̂ᵢⱼ, being multiplied by a weight, wᵢ, which is the reciprocal of the variance of the offspring means about the regression line. The weight of trait zᵢ is

where ρᵢ is the intraclass correlation of trait zᵢ in the offspring (= h²/2 for midparent regression in the absence of dominance and environmental correlations between sibs), βᵢⱼ is the slope of the parent-offspring regression, Pᵢᵢ is the phenotypic variance of zᵢ, and n is the number of offspring per family (KEMPTHORNE and TANDON 1953; BOHREN, MCKEAN and YAMADA 1961; BULMER 1985, p. 79). If family size varies, weighted regression should be used to estimate the genetic parameters. ρᵢ and βᵢⱼ can either be guessed, or estimated from the data and used to iteratively calculate the regression coefficients (cf. BULMER 1985, pp. 83-84). Note, however, that the latter method yields biased estimates of the parameters (BOHREN, MCKEAN and YAMADA 1961).

APPENDIX C

This appendix describes in detail two methods for testing hypotheses about the estimated additive genetic covariance function 𝒢̂. The first is a numerical method for constructing the confidence limits of the eigenvalues of 𝒢̂. The second tests whether one or more of the eigenvalues of 𝒢̂ are statistically indistinguishable from 0. In this appendix we make use of the notation and results of APPENDIXES A and B.

To find confidence limits on the estimates of the eigenvalues of 𝒢̂, we begin by forming the n(n + 1)/2-dimensional vector ĝ from the diagonal and subdiagonal elements of Ĝ (as described in APPENDIX A) and the n(n + 1)/2 × n(n + 1)/2 error matrix V̂ (as described in APPENDIX B). The elements of an additive genetic covariance matrix simulated with error are calculated as

\[ \mathbf{g}' = \hat{\mathbf{g}} + \hat{\mathbf{V}}^{1/2}\mathbf{e}, \tag{C1} \]

where V̂^{1/2} is the matrix square root of V̂ and e is an n(n + 1)/2-dimensional vector of uncorrelated, normally distributed random variates with expectation 0 and variance 1. The simulated covariance matrix G′ is then reconstructed from the elements of g′. The corresponding coefficient matrix C_G′ is determined using Equation 5, and its eigenvalues are calculated. The values are recorded, and the entire procedure is reiterated. We have been using 1000 iterations in our analyses.

The α-percent confidence limits for each eigenvalue can then be determined directly by the range included by 1 − α of the values. Confidence regions for the values of the eigenfunctions at any specified points (ages) of interest can be determined at the same time.
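The simulation is easy to reproduce. The sketch below assumes ĝ, V̂ (taken to be positive definite so that a Cholesky factor can serve as the matrix square root), and the matrix Φ of orthogonal-function values; the coefficient matrix is recovered by inverting Equation C3.

```python
import numpy as np

def eigenvalue_limits(g_hat, V_hat, Phi, n, alpha=0.05, n_iter=1000, seed=1):
    """g_hat: lower-triangle elements of G_hat stacked column by column;
    V_hat: their error covariance; Phi: n x n orthogonal-function matrix."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(V_hat)            # one choice of matrix square root
    Pinv = np.linalg.inv(Phi)
    idx = [(i, j) for j in range(n) for i in range(j, n)]
    sims = []
    for _ in range(n_iter):
        g_sim = g_hat + L @ rng.standard_normal(len(g_hat))  # Equation C1
        G_sim = np.zeros((n, n))
        for (i, j), v in zip(idx, g_sim):
            G_sim[i, j] = G_sim[j, i] = v                    # unstack, symmetrize
        C_sim = Pinv @ G_sim @ Pinv.T        # coefficient matrix (inverting Eq. C3)
        sims.append(np.sort(np.linalg.eigvalsh(C_sim)))
    sims = np.asarray(sims)
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return np.percentile(sims, [lo, hi], axis=0)   # limits per ordered eigenvalue
```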

Our second method tests the hypothesis that one or more of the estimated eigenvalues of 𝒢̂ are statistically indistinguishable from 0. We can write the estimated coefficient matrix Ĉ_G in terms of its eigenvalues and eigenvectors:

\[ \hat{\mathbf{C}}_G = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T, \tag{C2} \]

where Λ is a diagonal matrix whose elements are the eigenvalues of Ĉ_G and U is a matrix whose columns


are the corresponding eigenvectors. We then generate a coefficient matrix C*_G by setting one or more of the eigenvalues in Λ in Equation C2 equal to 0. The genetic covariance matrix G* is constructed using

\[ \mathbf{G}^* = \boldsymbol{\Phi}\,\mathbf{C}_G^*\,\boldsymbol{\Phi}^T, \tag{C3} \]

from which the vector g* is formed from the lower diagonal elements of G* in the same way that ĝ was. The hypothesis of zero eigenvalues is then tested with the chi-squared statistic

\[ \chi^2 = (\hat{\mathbf{g}} - \mathbf{g}^*)^T \hat{\mathbf{V}}^{-1} (\hat{\mathbf{g}} - \mathbf{g}^*) \tag{C4} \]

with t(t + 1)/2 degrees of freedom, where t is the number of eigenvalues that have been set to zero. If this reaches a significant value, then the hypothesis that those eigenvalues are zero is rejected. The same procedure can be used to test a hypothesis that one or more eigenvalues are equal to some specified values other than zero.
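A minimal rendering of this test follows, under the same assumptions as the previous sketch; here the t smallest-magnitude eigenvalues are the ones set to zero, though any chosen subset could be substituted.

```python
import numpy as np

def test_zero_eigenvalues(g_hat, V_hat, Phi, n, n_zero):
    idx = [(i, j) for j in range(n) for i in range(j, n)]
    G = np.zeros((n, n))
    for (i, j), v in zip(idx, g_hat):
        G[i, j] = G[j, i] = v
    Pinv = np.linalg.inv(Phi)
    C = Pinv @ G @ Pinv.T                    # coefficient matrix (inverting Eq. C3)
    lam, U = np.linalg.eigh(C)               # Equation C2: C = U diag(lam) U'
    order = np.argsort(np.abs(lam))
    lam[order[:n_zero]] = 0.0                # zero out the t smallest eigenvalues
    G_star = Phi @ (U @ np.diag(lam) @ U.T) @ Phi.T          # Equation C3
    g_star = np.array([G_star[i, j] for (i, j) in idx])
    d = g_hat - g_star
    chi2 = d @ np.linalg.inv(V_hat) @ d      # Equation C4
    return chi2, n_zero * (n_zero + 1) // 2  # statistic and degrees of freedom
```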


Human Biology, December 1993, v. 65, no. 6, pp. 941–966. Copyright © 1993 Wayne State University Press, Detroit, Michigan 48202

key words: genotype-environment interaction, complex segregation analysis, quantitative genetics, quantitative traits, statistical methods.

Statistical Genetic Approaches to Human Adaptability

J. Blangero1

Abstract The genetic determinants of physiological and developmental responses to environmental stress are poorly understood. This has been primarily due to the difficulty of direct measurement of response and the lack of appropriate statistical genetic methods. Here, I present a unified statistical genetic methodology for human adaptability studies that permits evaluation of the inheritance of quantitative trait response to environmental stressors. The foundation of this approach is the mathematical relationship between genotype-environment interaction and the genetic variance of response to environmental challenge. I describe two basic methods that can be used for either discrete or continuous environments. Each method allows for major loci, residual polygenic variation, and genotype-environment interaction at both the major genic and the polygenic levels. The first method is based on multivariate segregation analysis and is appropriate for situations in which data are available for each individual in each environment. The second method is appropriate for the more common case when response to the environment cannot be observed directly. This method is based on an extension of a mixed major locus/variance component model and can be used when singly measured related individuals are observed in different environments. Three example applications using data on lipoprotein variation in pedigreed baboons are provided to show the utility of these methods.

The study of human adaptability examines the biological responses of individuals to environmental change and the variability in such reaction norms both within and between populations. Past research in this field has focused primarily on the assessment of the environmental components of basic homeostatic mechanisms involved in normal physiological and developmental processes in stressful environments (Baker 1976; Frisancho 1979). Genetic inferences have usually been limited to indirect quasi-experimental between-population comparisons using classical migration designs (Harrison 1966). Because of the general lack of family studies and of an appropriate statistical methodology, relatively little is known about the underlying genetic basis of quantitative physiological responses to environmental change.

1 Department of Genetics, Southwest Foundation for Biomedical Research, PO Box 28147, San Antonio, TX 78228-0147.


The genetic determinants of physiological response are likely complex, involving both major genes with large effects and minor genes (polygenes) with small effects, and their elucidation will require new analytical methods that explicitly incorporate genotype-environment interaction. In this article I present a general statistical genetic approach for human adaptability studies that permits evaluation of the inheritance of quantitative traits involved in adaptation to environmental stresses.

The approach that I advocate for the dissection of the genetic architecture of response to environmental stress uses genotype-environment (G × E) interaction as the focal concept. In most cases, significant G × E interaction is interpretable as evidence for a heritable basis of biological response to environmental change. This relationship can be exploited to make inferences about the genetic mediation of physiological or developmental responses to environmental stresses. The advantage of using G × E interaction as an analytical focus is that, given an appropriate pedigree-based sampling design, it can be evaluated even when direct measurement of response is not possible. Therefore the statistical analysis of G × E interaction can provide a useful framework in which to make inferences about the genetic determinants of response.

I present a model of quantitative trait variation that includes the effects of a single unknown major gene (MG), polygenes (PG), known (and therefore measured) environmental factors (E), and random environmental factors (e). Although the model easily generalizes to multiple major loci (Blangero et al. 1990), I limit the present exposition to a single locus. Instead of (or in addition to) the unknown major gene, the model can include a known genetic polymorphism at a candidate locus (a measured gene, noted as mG). Using this basic model, I discuss how G × E interaction can be examined at several levels: (1) polygenotype-environment interaction (PG × E), (2) major genotype-environment interaction (MG × E), and (3) major genotype-random environment interaction (MG × e). The direct analogues for the case when we have information on a known genetic polymorphism are termed measured genotype-environment interaction (mG × E) and measured genotype-random environment interaction (mG × e).

The problem is also divided on the basis of experimental design into two types: (1) complete data situations in which each individual can be measured in each environment and (2) missing data situations in which each individual can be measured in only a single environment. When each individual can be measured in each environment, the examination of G × E interaction and the genetic analysis of response to environmental change is straightforward, because response can be directly observed. For such a situation multivariate genetic analysis of the trait's expression in the different environments can be used to examine the genetic architecture of response (Falconer 1952). Complete data situations most likely occur when the environment can be manipulated easily (e.g., cold stressor tests, exercise tests) or when individuals routinely encounter multiple environments (e.g., diurnal variation, seasonal variation).

When each individual can be measured only in a single environment, the analysis of G × E interaction and of the genetic determinants of response is


greatly complicated. Such missing-data situations occur when environments are difficult to manipulate (e.g., smokers versus nonsmokers), are exclusive (e.g., males versus females), or are continuous (e.g., years spent at high altitude, dietary indexes). However, I show that, when related individuals are measured in different environments, genetic analysis of the response to environmental change is possible.

Models for G × E Interaction

Each Individual Measured in Each Environment. The simplest situation in which to evaluate G × E interaction is when each individual can be measured in each environment. In this section I develop a general framework for discussing G × E interaction by examining the bivariate case of quantitative trait expression in two discrete environments. Given knowledge about the major locus genotype, an individual's phenotype measured in the first environment can be written as

\[ p_1 = \mu_{m1} + \boldsymbol{\beta}_1'\mathbf{x}_1 + g_1 + e_1, \tag{1} \]

where m represents the major locus genotype, μ_m is the mean of the mth genotype, β is a vector of regression coefficients corresponding to the vector of fixed effects x, g is the polygenotypic effect, and e is the random environmental deviation. For the current one-locus problem, m can be AA, AB, or BB, which are assumed to be in Hardy-Weinberg equilibrium with frequencies ψ = [q², 2q(1 − q), (1 − q)²]′, where q is the frequency of the A allele.

We can model the phenotypic value in the second environment as a linear function of the phenotypic expression in the first environment:

\[
\begin{aligned}
p_2 &= \mu_{m2} + \boldsymbol{\beta}_2'\mathbf{x}_2 + g_2 + e_2 \\
    &= p_1 + \Delta \\
    &= (\mu_{m1} + \Delta\mu_m) + (\boldsymbol{\beta}_1 + \Delta\boldsymbol{\beta})'(\mathbf{x}_1 + \Delta\mathbf{x}) + (g_1 + \Delta g) + (e_1 + \Delta e),
\end{aligned} \tag{2}
\]

where

\[ \Delta = \Delta\mu_m + \boldsymbol{\beta}_2'\mathbf{x}_2 - \boldsymbol{\beta}_1'\mathbf{x}_1 + \Delta g + \Delta e \tag{3} \]

refers to the response (p₂ − p₁) to the environment and the subscripted Δ's are the component-specific changes in parameters (or random variables) between environments. Now let the complete bivariate phenotype be represented as p = [p₁, p₂].

Assuming that there is no correlation between the polygenotypic vector and the vector of random environmental deviations (i.e., no PG × E correlation, Cov[g, e] = 0), the conditional phenotypic covariance matrix for p is given by

\[ \operatorname{Var}(\mathbf{p} \mid m) = \mathbf{G} + \mathbf{E}_m, \tag{4} \]


where G is a within-genotype additive genetic covariance matrix and E_m is a within-genotype random environmental covariance matrix. The within-genotype additive genetic covariance matrix is assumed to be constant across genotypes (i.e., there is no MG × PG epistasis). G has the form

\[ \mathbf{G} = \begin{bmatrix} \sigma_{G1}^2 & \rho_G\,\sigma_{G1}\sigma_{G2} \\ \rho_G\,\sigma_{G1}\sigma_{G2} & \sigma_{G2}^2 \end{bmatrix}, \tag{5} \]

where σ²_{Gi} is the residual additive genetic variance of the trait in the ith environment and ρ_G is the additive genetic correlation between trait expressions in the two environments. The environmental covariance matrices E_m have analogous forms.

The total phenotypic covariance matrix of p can be decomposed into its three constituent parts (Blangero and Konigsberg 1991):

\[ \operatorname{Var}(\mathbf{p}) = \mathbf{M} + \mathbf{G} + \mathbf{E} \tag{6a} \]
\[ \hphantom{\operatorname{Var}(\mathbf{p})} = \boldsymbol{\mu}\mathbf{W}\mathbf{C}\boldsymbol{\mu}' + \mathbf{G} + \mathbf{E}, \tag{6b} \]

where M is the genetic covariance matrix resulting from the major locus and E is the pooled within-genotype environmental covariance matrix. In Eq. (6b), μ is the 2 × 3 matrix of genotype-specific means, W = diag(ψ), and C = (I − 1ψ′). Equation (6b) shows that the covariance matrix resulting from the major locus is strictly a function of the genotypic means and the genotypic frequencies. In the univariate case this formula reduces to

\[ \sigma_M^2 = \sum_i \psi_i\Bigl(\mu_i - \sum_j \psi_j\mu_j\Bigr)^2. \tag{7} \]

Given this variance decomposition of the bivariate phenotype, we can completely specify the variance components of the response. Because Δ is a linear transformation of p (Δ = kp, where k = [−1, 1]), the conditional variance of the response given the major locus genotype is

\[
\begin{aligned}
\operatorname{Var}(\Delta \mid m) &= \mathbf{k}(\mathbf{G} + \mathbf{E}_m)\mathbf{k}' \\
&= \mathbf{k}\mathbf{G}\mathbf{k}' + \mathbf{k}\mathbf{E}_m\mathbf{k}' \\
&= \left(\sigma_{G1}^2 + \sigma_{G2}^2 - 2\rho_G\sigma_{G1}\sigma_{G2}\right) + \left(\sigma_{E_m1}^2 + \sigma_{E_m2}^2 - 2\rho_{E_m}\sigma_{E_m1}\sigma_{E_m2}\right).
\end{aligned} \tag{8}
\]

As with the original phenotypes, the total phenotypic variance of the response to the second environment can be partitioned into three components:

\[ \operatorname{Var}(\Delta) = \mathbf{k}(\mathbf{M} + \mathbf{G} + \mathbf{E})\mathbf{k}' = \sigma_{M\Delta}^2 + \sigma_{G\Delta}^2 + \sigma_{E\Delta}^2. \tag{9} \]

The heritability of the response can be broken into two components, that resulting from the major locus (h²_{MΔ}) and that resulting from the polygenes (h²_{GΔ}):


\[ h_{M\Delta}^2 = \frac{\sigma_{M\Delta}^2}{\sigma_{M\Delta}^2 + \sigma_{G\Delta}^2 + \sigma_{E\Delta}^2}, \tag{10a} \]

\[ h_{G\Delta}^2 = \frac{\sigma_{G\Delta}^2}{\sigma_{M\Delta}^2 + \sigma_{G\Delta}^2 + \sigma_{E\Delta}^2}. \tag{10b} \]

The total heritability of response is simply the sum of these two component-specific heritabilities: h²_Δ = h²_{MΔ} + h²_{GΔ}.
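Because Δ is just a linear contrast of the bivariate phenotype, Eqs. (8)-(10) reduce to a few matrix products. The sketch below (not from the paper) computes the response variance components and heritabilities from user-supplied 2 × 2 matrices.

```python
import numpy as np

def response_components(M, G, E_pooled):
    """M, G, E_pooled: 2 x 2 covariance matrices for the trait expressed in
    the two environments (major locus, polygenic, pooled environmental)."""
    k = np.array([-1.0, 1.0])
    var_M = k @ M @ k          # sigma^2_{M Delta}
    var_G = k @ G @ k          # sigma^2_{G Delta} (Robertson's Eq. 11)
    var_E = k @ E_pooled @ k   # sigma^2_{E Delta}
    total = var_M + var_G + var_E
    return {"h2_M_Delta": var_M / total,       # Eq. (10a)
            "h2_G_Delta": var_G / total,       # Eq. (10b)
            "h2_Delta": (var_M + var_G) / total}

# Example: equal variances but genetic correlation < 1 still yields PG x E.
G = np.array([[1.0, 0.6], [0.6, 1.0]])
M = np.array([[0.5, 0.5], [0.5, 0.5]])         # no genotype-specific response
E = np.eye(2)
print(response_components(M, G, E))
```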

PG × E Interaction. PG × E interaction occurs when there is a significant polygenic component of variance in response. The additive polygenic variance in response is a function of the additive genetic variance expressed in the two environments and the additive genetic correlation between the trait’s expression in the two environments:

\[
\begin{aligned}
\sigma_{G\Delta}^2 &= \sigma_{G1}^2 + \sigma_{G2}^2 - 2\rho_G\sigma_{G1}\sigma_{G2} \\
&= (\sigma_{G1} - \sigma_{G2})^2 + 2\sigma_{G1}\sigma_{G2}(1 - \rho_G).
\end{aligned} \tag{11}
\]

The absence of PG × E interaction implies that there is no polygenic variance for the response to the environment (i.e., σ²_{GΔ} = 0). Equation (11), which was first derived by Robertson (1959), shows that there is no PG × E interaction when σ²_{G1} = σ²_{G2} and ρ_G = 1. The first condition requires that the polygenic variance be constant across environments. Observed heteroscedasticity of genetic variances across environments can arise simply because of scaling. For example, if g₂ = cg₁, where c is a constant, σ²_{G2} will be equal to c²σ²_{G1} and (σ_{G1} − σ_{G2})² will equal (1 − c)²σ²_{G1}. For the second condition (ρ_G = 1) to hold, the same polygenes must influence the phenotype in both environments and have similar effects in each environment. The second condition is therefore the requirement of complete pleiotropy. In the absence of complete pleiotropy (ρ_G < 1), the polygenotypes may exhibit different ranks in different environments: one genotype may express the highest quantitative trait mean in one environment, but a different genotype may have the highest mean in a second environment.

MG × E Interaction. Significant MG × E interaction requires that there be a significant major locus component for the biological response to the environment. Similar to the case for PG × E, the major locus component of variance in response is given by

\[ \sigma_{M\Delta}^2 = (\sigma_{M1} - \sigma_{M2})^2 + 2\sigma_{M1}\sigma_{M2}(1 - \rho_M). \tag{12} \]

The absence of MG × E interaction requires σ²_{MΔ} = 0. This will occur only when Δ_{AA} = Δ_{AB} = Δ_{BB} (i.e., when there is no major genotype-specific response).

MG × e Interaction. A more subtle form of G × E interaction involves interaction between a gene and the random environment, that part of the unmeasured


environment that is specific to each individual. This type of interaction, which I denote MG × e interaction, is closely related to the concepts of physiological stability and developmental stability (Mather 1953; Bradshaw 1965). If some genotypes are more environmentally labile, they will exhibit increased random environmental variance. Therefore MG × e interaction can be said to exist when the environmental variance is a function of major locus genotype. In terms of our current bivariate model, the null hypothesis of no MG × e interaction requires that E_AA = E_AB = E_BB. It is important to note that this type of interaction can be examined without reference to any measured environment. Unlike PG × E and MG × E interaction, MG × e interaction is not directly related to the genetic variance of the response of a quantitative trait to an environmental change.

Each Individual Measured in a Single Environment. The missing data situation in which individuals can be measured in only a single environment is considerably more complex. In this section I extend the model to the examination of trait expression as a function of a continuous environmental index. Although I focus on this continuous case, the proposed model also can be formulated for discrete environments.

For the case of a measured continuous environment indexed by z, an indi-vidual’s phenotype can be modeled as the linear function

\[ p = \mu_m + \alpha_m z + \boldsymbol{\beta}_z'\mathbf{x} + g + e, \tag{13} \]

where α_m is a genotype-specific regression on the environmental index, which is assumed to be scaled so that the basal environment exhibits a value of 0. The response to the measured environment z relative to the basal environment can be defined as Δ_{mz} = α_m z. The regression coefficients β_z are subscripted to allow them to be a function of the measured environment if necessary.

PG × E Interaction. For the continuous environment case the presence of PG × E interaction implies that the polygenic variance is a function of z:

\[ \operatorname{Var}(g \mid z) = f_G(z, \boldsymbol{\theta}) = \sigma_{Gz}^2, \tag{14} \]

where f_G(z, θ) is a nonnegative function and θ is a vector of parameters. Such variance functions can take many possible forms (Carroll and Ruppert 1988). For example, we can assume that the additive genetic standard deviation is a linear parametric function of the measured environment:

\[ \sigma_{Gz} = \sigma_{G0}(1 + \gamma_G z), \tag{15} \]

where σ_{G0} is the expected additive genetic standard deviation when z = 0 and γ_G determines the rate of change in σ_G. These two parameters must be constrained


so that σ_{Gz} ≥ 0. Another possible variance function model is to let the logarithm of the additive genetic standard deviation be a linear function of the environment, which leads to

\[ \sigma_{Gz} = \exp(\kappa_G + \gamma_G z), \tag{16} \]

which guarantees that σ_{Gz} > 0. Similarly, the genetic correlation between an individual's polygenotypic value

expressed in environment zᵢ with that expressed in environment zⱼ can be written

\[ \rho_G(g_{z_i}, g_{z_j}) = f_G(z_i, z_j), \qquad -1 \le f_G(z_i, z_j) \le 1. \tag{17} \]

If there is no PG × E interaction, Var(g | z) = σ²_G and ρ_G(gᵢ, gⱼ) = 1. Again, the parametric function f_G(zᵢ, zⱼ) can take any number of forms. One simple yet plausible one is to let the genetic correlation be an exponential function of the difference between the two environmental indexes:

\[ \rho_G(g_{z_i}, g_{z_j}) = \exp\left(-\lambda_G\,\lvert z_i - z_j\rvert\right), \tag{18} \]

where λ_G is a parameter that determines the rate of exponential decline in the genetic correlation as the difference between environmental indexes increases. If λ_G = 0, the genetic correlation between trait expressions in any two environments is 1. More elaborate functions can be specified to take into account the environments individuals may have previously encountered (Hopper and Mathews 1983) when such historical environmental data are available.

For the variance functions listed, a statistical test of the null hypothesis of no PG × E interaction or no polygenic variance in response can be based on the expectation that γ_G = 0 and λ_G = 0. Once the variance and correlation functions are known, the genetic variance for the response between any two environments can be obtained. For example, plugging the functions given in Eqs. (15) and (18) into Eq. (11) yields the following prediction for the polygenic variance in response to a change in environmental index relative to the basal environment:

\[ \sigma_{G\Delta z}^2 = \sigma_{G0}^2\left\{\gamma_G^2 z^2 + 2(1 + \gamma_G z)\left[1 - \exp(-\lambda_G z)\right]\right\}. \tag{19} \]
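The variance and correlation functions of Eqs. (15) and (18), and the response variance they imply, can be coded directly; the following sketch simply evaluates Eq. (11) with those functions, which reproduces Eq. (19).

```python
import numpy as np

def sigma_G(z, sigma_G0, gamma_G):
    return sigma_G0 * (1.0 + gamma_G * z)            # Eq. (15); must stay >= 0

def rho_G(z_i, z_j, lambda_G):
    return np.exp(-lambda_G * abs(z_i - z_j))        # Eq. (18)

def var_G_response(z, sigma_G0, gamma_G, lambda_G):
    """Polygenic variance of response to environment z (basal environment z = 0)."""
    s0 = sigma_G(0.0, sigma_G0, gamma_G)
    sz = sigma_G(z, sigma_G0, gamma_G)
    r = rho_G(0.0, z, lambda_G)
    return (s0 - sz) ** 2 + 2 * s0 * sz * (1 - r)    # Eq. (11), equivalently Eq. (19)

# gamma_G = lambda_G = 0 recovers the null of no PG x E interaction:
assert var_G_response(2.0, 1.0, 0.0, 0.0) == 0.0
```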

MG × E Interaction. For the continuous environment a quantitative phenotype will exhibit significant MG × E interaction when the genetic variance resulting from the major locus varies as a function of the environmental index:

\[ \sigma_{Mz}^2 = f_M(z, \alpha_{AA}, \alpha_{AB}, \alpha_{BB}), \tag{20} \]

which requires that σ²_{MΔz} > 0 for at least some values of z. This will occur only when there is heterogeneity in the genotype-specific regressions on the environmental index, because


\[ \sigma_{M\Delta z}^2 = z^2\sum_i \psi_i\Bigl(\alpha_i - \sum_j \psi_j\alpha_j\Bigr)^2. \tag{21} \]

MG × e Interaction. The residual random environmental variance can also be modeled as a function of the measured environmental index. The same variance functions used for the polygenic variance can be used for the random environmental variance. To allow for MG × e interaction, the parameters of the environmental variance function have to be major genotype specific:

\[ \operatorname{Var}(e \mid z, m) = f_E(z, m, \boldsymbol{\theta}) = \sigma_{E_m z}^2. \tag{22} \]

The analogous environmental function to Eq. (15) is

\[ \sigma_{E_m z} = \sigma_{E_m 0}(1 + \gamma_{E_m} z). \tag{23} \]

The null hypothesis of no MG × e interaction requires that σ_{E_{AA}0} = σ_{E_{AB}0} = σ_{E_{BB}0} and γ_{E_{AA}} = γ_{E_{AB}} = γ_{E_{BB}}. If there is evidence of MG × e interaction, then genotypes significantly vary in their environmental stabilities.

Environmental Variance of Response. When individuals can be measured in only a single environment, the random environmental correlation between the expression of the trait in different measured environments is statistically unidentifiable. Therefore the random environmental variance of the response to environmental change is also undefined. However, if we assume that the correlation between random environmental deviations is 0, the expected pooled environmental variance of response to environment zᵢ will be

\[ \sigma_{E\Delta z_i}^2 = (\sigma_{E0} - \sigma_{Ez_i})^2 + 2\sigma_{E0}\sigma_{Ez_i} = \sigma_{E0}^2 + \sigma_{Ez_i}^2. \tag{24} \]

The assumption of ρ_E = 0 is plausible for many physiological traits that are influenced by time-specific local conditions. Therefore Eq. (24) may be useful to help gauge the relative importance of major genic and polygenic determinants when individuals can be measured only in single environments.

Statistical Methods

A variety of statistical genetic methods can be used to assess G × E interaction, depending on the genetic determinants to be considered and the experimental design of the study.

Complete Data Situations: Multivariate Genetic Analysis. Statistical detection of G × E interaction is uncomplicated when information is available on each individual in each environment. Given sufficient pedigree data, the parameters of the model for the bivariate complete-data situation described in Eqs. (1), (2), and


(4)–(6) can be estimated using standard likelihood methods for pedigrees. For a pedigree of size n, let P be the n × 2 matrix of phenotypes. Assuming multivariate normality within genotypes, the conditional density of P given a vector of major locus genotypes m and the matrix of covariates X is

\[ f(\mathbf{P} \mid \mathbf{m}, \mathbf{X}) = (2\pi)^{-n}\,\lvert\boldsymbol{\Omega}\rvert^{-1/2} \exp\!\left\{-\tfrac{1}{2}\,\operatorname{vec}(\mathbf{P} - \mathbf{F}\boldsymbol{\mu} - \mathbf{X}\boldsymbol{\beta})'\,\boldsymbol{\Omega}^{-1}\,\operatorname{vec}(\mathbf{P} - \mathbf{F}\boldsymbol{\mu} - \mathbf{X}\boldsymbol{\beta})\right\}, \tag{25} \]

where F is an n × 3 indicator matrix of 0's and 1's, mapping each individual to the appropriate genotype-specific means in the matrix μ. The phenotypic covariance matrix for the whole pedigree is denoted Ω, which is given by

\[ \boldsymbol{\Omega} = 2\boldsymbol{\Phi} \otimes \mathbf{G} + \boldsymbol{\Upsilon}, \tag{26} \]

where ⊗ is the Kronecker product operator, Φ is a matrix of kinship coefficients, and Υ is a block diagonal matrix with n genotype-specific random environmental covariance matrices (E_m) along the diagonal, given by

\[ \boldsymbol{\Upsilon} = \bigoplus_{i=1}^{n} \mathbf{E}_{m_i}, \tag{27} \]

where ⊕ is a block diagonal matrix summation operator and mᵢ is the ith individual's genotype.
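As a concrete illustration, the pedigree covariance matrix of Eqs. (26)-(27) can be assembled as follows. The sketch assumes phenotypes are stacked individual by individual (so the Kronecker factors appear in the order shown); the example kinship values and covariance matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag

def pedigree_covariance(Phi, G, E_by_genotype, genotypes):
    """Phi: n x n kinship matrix; G: 2 x 2 polygenic covariance matrix;
    E_by_genotype: dict mapping genotype -> 2 x 2 environmental matrix;
    genotypes: length-n sequence of major locus genotypes."""
    Omega = np.kron(2.0 * Phi, G)                                 # Eq. (26), genetic part
    Upsilon = block_diag(*[E_by_genotype[m] for m in genotypes])  # Eq. (27)
    return Omega + Upsilon

# Example: a parent-offspring pair (kinship 1/4), one AA and one AB individual.
Phi = np.array([[0.5, 0.25], [0.25, 0.5]])
G = np.array([[1.0, 0.8], [0.8, 1.2]])
E = {"AA": np.eye(2), "AB": 2 * np.eye(2)}
print(pedigree_covariance(Phi, G, E, ["AA", "AB"]))
```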

Quantitative Genetic Analysis. For the case of no major gene effects and no measured genotypic effects (i.e., μ_AA = μ_AB = μ_BB), Eq. (25) is the likelihood of the polygenic model commonly used in quantitative genetic analyses (Hopper and Mathews 1982; Lange and Boehnke 1983). For these polygenic models the maximum-likelihood estimates of the parameters can be obtained by maximizing the ln-likelihood function ln[f(P | X)] using numerical optimization methods. However, in the present context, inference based on such a simple model is limited to the examination of PG × E interaction.

Measured Genotype Analysis. If information is available on a polymorphic candidate locus that may be involved in the physiological, developmental, or metabolic pathway of the trait being studied, several additional types of G × E interaction can be detected. Without modification, Eq. (25) provides the likelihood of a multivariate measured genotype model (Boerwinkle et al. 1986; Blangero et al. 1992). Parameter estimation for measured genotype models can be performed by maximizing the function ln[f(P | m, X)]. Using this technique, PG × E, mG × E, and mG × e interaction can all be tested.

Complex Segregation Analysis. For models that include a major gene component, the likelihood is more complex and parameter estimation is more difficult. The detection of major genes affecting quantitative traits is accomplished using a


set of methods known as complex segregation analysis (Elston and Stewart 1971; Morton and MacLean 1974). A standard set of hypotheses is tested before the hypothesis of major gene involvement is accepted (Lalouel et al. 1983). In recent years these methods have been extended to allow for multivariate phenotypes (Lalouel 1983; Bonney et al. 1988; Blangero and Konigsberg 1991). This extension to multiple traits has great implications for the joint examination of MG × E and PG × E interaction because it allows the simultaneous analysis of a single trait measured in multiple environments (Blangero et al. 1990). Because response is a linear function of the original phenotypes, results from a multivariate segregation analysis can be formulated in terms of the variance components of response, as detailed in Eqs. (8)–(12).

The likelihood of a multivariate mixed major gene and polygene model can be written (Blangero and Konigsberg 1991)

\[ L(\boldsymbol{\mu}, \boldsymbol{\beta}, \mathbf{G}, \mathbf{E}, q \mid \mathbf{P}, \mathbf{X}) = \sum_{j=1}^{3^n} f(\mathbf{m}_{\cdot j}) \cdot f(\mathbf{P} \mid \mathbf{m}_{\cdot j}, \mathbf{X}), \tag{28} \]

where M is an n × 3ⁿ matrix containing all possible genotypic combinations, m·ⱼ is the jth column of M, and f(m·ⱼ) is the probability of observing the jth genotypic vector, which is a function of the pedigree structure and the rules of Mendelian transmission (Elston 1981). In Eq. (28), the summation is over all possible genotypic vectors, a potentially enormous number. For a Mendelian model this number can be reduced by eliminating consideration of impossible genotypic combinations (e.g., father = AA, mother = BB, child = AA). However, this requirement to sum over all possible genotypes leads to the computational complexity of segregation analysis. Fortunately, some efficient methods for recursive probability calculations on pedigrees are available (Elston and Stewart 1971; Cannings et al. 1978). These methods exploit the systematic pattern of the residual phenotypic covariance matrix that occurs when there is a standard additive polygenic residual component. In such cases the polygenotypes of offspring are independent, given the polygenotypes of their parents (Elston and Stewart 1971). This assumption greatly reduces the numerical burden of calculating the likelihood of a mixed major locus/polygenic model by permitting efficient analytical integration over multivariate polygenotypes (Hasstedt 1982; Blangero and Konigsberg 1991) or by allowing the rapid inversion of small patterned residual covariance matrices (Bonney 1984).
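To make Eq. (28) concrete, the sketch below evaluates the mixed-model likelihood for a father-mother-child trio by brute-force enumeration of all 3³ genotype vectors, with one phenotype per individual for brevity. The Elston-Stewart recursion cited above replaces this enumeration in real applications; all function and variable names here are illustrative.

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

GENO = ("AA", "AB", "BB")
ALLELES = {"AA": ("A", "A"), "AB": ("A", "B"), "BB": ("B", "B")}

def hw_freq(g, q):
    """Hardy-Weinberg founder frequencies (the vector psi in the text)."""
    return {"AA": q * q, "AB": 2 * q * (1 - q), "BB": (1 - q) ** 2}[g]

def transmission(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    prob = 0.0
    for a in ALLELES[father]:
        for b in ALLELES[mother]:
            if tuple(sorted((a, b))) == tuple(sorted(ALLELES[child])):
                prob += 0.25
    return prob

def trio_likelihood(p, mu, q, Omega_for):
    """p: observed phenotype vector (father, mother, child); mu: genotype -> mean;
    Omega_for: callable mapping a genotype vector to a covariance matrix
    (e.g., built along the lines of Eq. (26))."""
    total = 0.0
    for m in itertools.product(GENO, repeat=3):          # all 3^n genotype vectors
        f_m = hw_freq(m[0], q) * hw_freq(m[1], q) * transmission(m[2], m[0], m[1])
        if f_m == 0.0:
            continue                                     # skip impossible vectors
        mean = np.array([mu[g] for g in m])
        total += f_m * multivariate_normal.pdf(p, mean, Omega_for(m))
    return total
```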

Missing-Data Situations. When each individual is measured in only one environment, assessment of MG × E and PG × E interaction is possible if related individuals are measured in different environments. This situation is made statistically tractable by means of missing-data theory (Little and Rubin 1987) because the measurements that are lacking can be considered missing. In missing-data situations examination of MG × E and PG × E interaction (and therefore the genetics of response to environmental changes) is still possible if the missing data can be considered to be missing at random (MAR). If the probability of observing an individual in an environment is independent of genotype, then the missing data are MAR. In the present context MAR means that there is no correlation between genotype and environment (i.e., genotypes are distributed randomly across environments). The assumption of MAR is therefore unlikely to hold if there is strong natural selection acting on the trait such that allele frequencies are markedly different in the contrasting environments. For most traits of interest this is unlikely to be a problem because there are few examples of such selection. Note that the MAR assumption does not depend on the randomness of environments within pedigrees (sets of related individuals). Even if there is familial aggregation for the environmental measure, the MAR assumption will hold so long as genotypes (not phenotypes) and environmental measures are uncorrelated (i.e., genotypes and environmental measures are not jointly transmitted within families).

Given that the MAR assumption holds, the marginal distribution of the observed phenotypes p_obs can be obtained by integrating out the missing data p_mis:

$$f(\mathbf{p}_{\mathrm{obs}} \mid \theta) = \int_{-\infty}^{+\infty} f(\mathbf{p}_{\mathrm{obs}}, \mathbf{p}_{\mathrm{mis}} \mid \theta)\, d\mathbf{p}_{\mathrm{mis}}, \qquad (29)$$

where θ represents the parameters to be estimated. It can be shown that the resulting ln-likelihood function is given by

$$L(\theta \mid \mathbf{p}_{\mathrm{obs}}) = \ln f(\mathbf{p}_{\mathrm{obs}} \mid \theta) + c, \qquad (30)$$

where c is a constant. Equation (30) shows that standard likelihood inference can proceed based solely on the observed data.
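For multivariate normal phenotypes this observed-data likelihood is easy to compute, because the integral in Eq. (29) reduces to dropping the missing rows and columns of the mean vector and covariance matrix. A small sketch with made-up numbers:

```python
# Sketch of Eqs. (29)-(30) for MVN phenotypes: marginalizing over missing
# components reduces to subsetting the mean vector and covariance matrix
# to the observed entries. Toy values, for illustration only.
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([100.0, 110.0, 105.0])
Sigma = np.array([[25.0, 10.0,  5.0],
                  [10.0, 30.0,  8.0],
                  [ 5.0,  8.0, 20.0]])
observed = np.array([True, False, True])      # individual 2 unmeasured (MAR)

p_obs = np.array([98.0, 102.0])
logL = multivariate_normal.logpdf(
    p_obs,
    mean=mean[observed],
    cov=Sigma[np.ix_(observed, observed)])    # observed-data ln-likelihood
```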

Conditional Distribution of Phenotypes Given Major Locus Genotypes. Assuming that the missing data are MAR, the conditional density of phenotypes in a pedigree, given the vector of major locus genotypes m and the matrix of covariates X, is

$$f(\mathbf{p} \mid \mathbf{m}, \mathbf{X}) \sim \mathrm{MVN}(\mathbf{F}\mathbf{m} + \mathbf{X}\boldsymbol{\beta},\, \boldsymbol{\Omega}), \qquad (31)$$

where MVN() denotes a multivariate normal density with mean vector (Fm + Xβ) and phenotypic covariance matrix Ω. This density is identical in structure to the one shown in Eq. (25), except for dimensional differences. However, the phenotypic covariance matrix in Eq. (31) is different from that used in the complete-data situation. When individuals can be measured in only a single environment,

$$\boldsymbol{\Omega} = \mathrm{Var}(\mathbf{p} \mid \mathbf{m}, \mathbf{X}) = \mathbf{R} \odot 2\boldsymbol{\Phi} \odot \boldsymbol{\Xi} + \boldsymbol{\Upsilon}, \qquad (32)$$

where ⊙ is the Hadamard product operator and Φ is the n × n kinship matrix. The elements of the matrices R (r_ij), Ξ (ξ_ij), and Υ (υ_ij) are given by

$$r_{ij} = \begin{cases} 1, & \forall\, z_i = z_j \\ \rho_G(z_i, z_j), & \forall\, z_i \neq z_j, \end{cases} \qquad (33a)$$

$$\xi_{ij} = \sigma_{Gz_i}\, \sigma_{Gz_j}, \qquad (33b)$$

$$\upsilon_{ij} = \begin{cases} \sigma^2_{Ez_i}, & \forall\, i = j \\ 0, & \forall\, i \neq j. \end{cases} \qquad (33c)$$

The matrix R can be considered a correction to the kinship matrix because the presence of PG × E interaction alters the expected genetic correlations among relatives.
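The construction of Ω in Eqs. (32)–(33) is straightforward to express with elementwise (Hadamard) products; here is a sketch for a parent-offspring pair measured in two different environments, with illustrative parameter values loosely modeled on the diet example below.

```python
# Sketch of the conditional covariance matrix of Eqs. (32)-(33) when each
# individual is measured in a single environment. z[i] is the environment
# of individual i; sigmaG/sigmaE are environment-specific SDs and rhoG is
# the cross-environment genetic correlation. All values are illustrative.
import numpy as np

Phi = np.array([[0.50, 0.25], [0.25, 0.50]])   # kinship for a parent-offspring pair
z = [0, 1]                                     # environments (e.g. basal vs HCSF diet)
sigmaG = {0: 13.7, 1: 35.9}
sigmaE = {0: 18.0, 1: 20.0}
rhoG = 0.56

n = len(z)
R = np.array([[1.0 if z[i] == z[j] else rhoG for j in range(n)] for i in range(n)])
Xi = np.array([[sigmaG[z[i]] * sigmaG[z[j]] for j in range(n)] for i in range(n)])
Upsilon = np.diag([sigmaE[zi] ** 2 for zi in z])

Omega = R * (2.0 * Phi) * Xi + Upsilon         # '*' is the Hadamard product in NumPy
```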

Statistical Genetic Analysis in Missing-Data Situations. Maximum-likelihood estimation of the parameters of the G × E models can be based on the likelihood implied by Eq. (31). Quantitative genetic analysis of PG × E interaction and measured genotype analysis of mG × e, mG × E, and PG × E interactions can proceed using the same techniques that are used for the complete-data situation, with the only difference being that the random environmental correlation between a trait's expression in different environments is undefined. All other relevant parameters can be estimated using standard likelihood methods.

Complex Segregation Analysis. For the mixed major locus/polygenic model the analysis of G × E interaction is significantly more complex than that observed for the complete-data situation. Most of the complication is due to the residual PG × E interaction component. Incorporating PG × E interaction violates the assumption of conditional independence of offspring's polygenotypes given parental polygenotypes. This leads to increased complexity of pedigree likelihood calculations by making analytical integration over polygenotypes cumbersome. To avoid this problem, I have adapted Hasstedt's (1991) variance component/major locus likelihood approximation to allow for G × E interaction. This method requires that the n × n conditional phenotypic covariance matrix be formed and inverted at each iteration. The method has been shown to be computationally feasible and to generate unbiased parameter estimates (Blangero 1991). Because of the computational requirements (the covariance matrix would have to be inverted separately for each possible genotypic vector), it is unlikely that the evaluation of MG × e interaction is currently feasible.

Assessing MG × E interaction in the missing-data situation poses no analytical difficulties because it requires only a simple genotype-specific regression model, as shown in Eq. (13). In fact, several investigators have developed major gene models that allow for MG × E interaction (Eaves 1984; Moll et al. 1984; Konigsberg et al. 1991; Pérusse et al. 1991; Gordeuk et al. 1992). Unlike the current method, none of these earlier methods allowed for the simultaneous evaluation of MG × E and PG × E interaction.


Hypothesis Testing. Using likelihood-based inference, we can compare competing hypotheses regarding the presence or absence of different G × E components. Formal statistical tests of the null hypothesis of no G × E interaction can be performed using likelihood ratio statistics (Kendall and Stuart 1961). Such a test compares the likelihood of a general model with the likelihood of a nested submodel. For example, to test the hypothesis that there is no interaction between a major locus and a continuous environmental index, we would compare a model in which the genotype-specific regressions on the environment were held equal to one another (i.e., α = α_AA = α_AB = α_BB) with the more general model in which α_AA, α_AB, and α_BB are each estimated. The likelihood ratio statistic for such a test is

$$\Lambda = 2[\ln L_i(\alpha_{AA}, \alpha_{AB}, \alpha_{BB}, \theta_i) - \ln L_j(\alpha, \theta_j)], \qquad (34)$$

where the vector θ_i denotes all other estimated parameters of the ith model. Λ is distributed approximately as a chi-square variate with degrees of freedom equal to the difference in the number of estimated parameters between the two compared models. In the given example there are two degrees of freedom. If the null hypothesis of no MG × E interaction is rejected, there is evidence for a significant major locus component of the response to the environment. Similar tests can be specified for each of the G × E interaction terms.

The chi-square approximation to the asymptotic distribution of Λ does not hold when the constrained parameter is located on the boundary of its acceptable parameter space. Such tests often occur when testing whether a particular variance component is greater than 0. In this case Λ is distributed as a mixture of a chi-square distribution and a density with all its point mass at 0 (Chernoff 1954; Hopper and Mathews 1982). For such a test with one degree of freedom, the p value obtained from the χ²₁ distribution should be halved.
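A short sketch of both situations, using placeholder ln-likelihoods; the only twist is halving the χ²₁ tail probability for the one-degree-of-freedom boundary test.

```python
# Sketch of the likelihood ratio tests described above: an ordinary 2-d.f.
# test of equal genotype-specific slopes, and a 1-d.f. variance-component
# test whose chi-square p value is halved because the null lies on a
# boundary. The fitted ln-likelihoods are placeholders, not real results.
from scipy.stats import chi2

lnL_general, lnL_null = -512.3, -515.6
Lambda = 2.0 * (lnL_general - lnL_null)        # Eq. (34)

p_interior = chi2.sf(Lambda, df=2)             # H0: alpha_AA = alpha_AB = alpha_BB
p_boundary = 0.5 * chi2.sf(Lambda, df=1)       # H0: a variance component equals 0
```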

Applications

In this section I present three examples of applications of these methods. All examples are taken from ongoing work at the Southwest Foundation for Biomedical Research on the genetics of lipoprotein metabolism using pedigreed baboons.

Complete Data, Discrete Environment: mG × E, PG × E, and mG × e Interaction. The first example uses the data of Hixson et al. (1989), who examined the relationship between low-density lipoprotein cholesterol (LDL-C) levels and a DNA polymorphism in an important candidate gene for LDL-C metabolism, the LDL receptor (LDLR) gene, in a group of pedigreed baboons. The 253 baboons (Papio hamadryas anubis, P. h. cynocephalus, and their hybrids) were members of 21 pedigrees. Serum concentrations of LDL-C were measured on each baboon in each of two dietary environments: (1) a basal diet and (2) a high-cholesterol, high-saturated-fat (HCSF) diet. Therefore this is an example of a complete-data situation. The baboons were also typed for an AvaII restriction fragment length polymorphism (RFLP) in intron 17 of the LDLR gene (Hixson et al. 1989). Two alleles (A, B) were found, with the observed frequency of the more common B allele estimated at 0.79. Hixson et al. (1989) detected a significant association between LDLR genotype and quantitative levels of LDL-C on both diets. However, they did not examine G × E interaction or the genetic determinants of LDL-C response to the HCSF diet.

I reanalyzed Hixson's data using the methods for mG × E, PG × E, and mG × e interaction. Complete data (i.e., LDLR genotype, LDL-C basal level, and LDL-C HCSF level) were available for 203 animals. The quantitative genetic analysis program Fisher (Lange et al. 1988) was used for this analysis. Special subroutines were written to estimate the parameters described in Eqs. (1)–(4). The effects of several significant covariates (sex, male age, nursery-reared versus maternal-reared, and subspecies) were simultaneously estimated in all analyses.

Table 1 shows the results of hypothesis testing using likelihood ratio statistics. As found by Hixson et al. (1989) in their univariate analyses, there is strong evidence (p = 0.007) for an effect of the LDLR gene on quantitative LDL-C variation. There is also significant evidence for mG × E interaction (Λ = 6.26, p = 0.044), which indicates that there is a significant effect of the LDLR polymorphism on the response to the HCSF diet. This can be seen in Figure 1, which shows the genotypic means in each dietary environment. The solid lines indicate the observed response, and the dashed lines show the expected response assuming that there is no mG × E interaction. In the absence of mG × E interaction, each genotype would be expected to increase by 57 mg/dl. The estimated genotype-specific responses were Δ_AA = 38.31, Δ_AB = 44.10, and Δ_BB = 62.92. In particular, the AA genotype appears to be less influenced by the dietary challenge.

Table 1 also shows that there is significant PG × E interaction (p < 0.001). This suggests that additional genetic factors influence dietary response. The PG × E interaction effect was further partitioned into the two components described by Eq. (11). The hypothesis that the additive genetic standard deviations were equal in the two environments can be rejected (p < 0.001). The maximum-likelihood estimates of the genetic standard deviations were σ_G1 = 13.68 ± 2.05 for the basal

Table 1. Analysis of mG × E, PG × E, and mG × e Interaction in LDL-C Concentrations in 203 Pedigreed Baboons

Model                      d.f.   Λ       p
No mG effects              4      14.17   0.007
No mG × E interaction      2      6.26    0.044
No PG × E interaction      2      22.06   <0.001
  σ_G1 = σ_G2              1      20.12   <0.001
  ρ_G = 1                  1      7.81    0.003
No mG × e interaction      6      17.75   0.007

The measured genotype (mG) refers to an AvaII LDLR RFLP, E refers to diet (basal versus HCSF), and e refers to random environment.


Figure 1. Diet-specific LDLR genotypic means for LDL-C levels showing mG × E interaction. Solid lines indicate estimated values; dashed lines indicate values expected in the absence of mG × E interaction.

diet and σ_G2 = 35.88 ± 4.53 for the HCSF diet. As mentioned previously, such a difference can be due to scaling phenomena. The hypothesis of complete pleiotropy (ρ_G = 1) is also rejected (p = 0.003), which suggests the possibility that polygenotypes can change ranks in different environments. The observed genetic correlation is relatively low: 0.563 ± 0.158.


538 / blangero

There is also evidence for significant mG × e interaction (p = 0.007). The random environmental covariance matrices varied significantly among measured genotypes. This can be seen in Figure 2, which shows genotype-specific bivariate ellipses. The area within the ellipse indicates the magnitude of variability within measured genotypes. The orientation of the ellipse reflects the environmental correlation between LDL-C serum levels on the two diets. For example, the ellipse for the BB genotype is much larger than those of the other two genotypes, reflecting the overall greater variability. Therefore the BB genotype appears to have decreased environmental stability compared with the other two genotypes. This genotype may exhibit reduced capacity for physiological buffering against other unknown environmental factors.

Figure 2. Bivariate ellipses showing LDLR genotype-specific environmental variation (mG × e interaction). Ellipses contain 68% of each genotypic distribution (approximately 1 SD on either side of the mean). Genotype-specific centroids are shown as plus signs.


Statistical Genetics of Human Adaptability / 539

Missing Data, Discrete Environment: MG × E and PG × E Interaction. Whereas the first example used a known candidate polymorphism in a complete-data situation, the second example involves an unknown major gene and a missing-data situation. The quantitative trait is apolipoprotein AI (apo AI) serum concentration. apo AI is the main protein in high-density lipoprotein (HDL) and is considered a protective factor against heart disease. We have shown that apo AI serum levels are influenced by two separate major genes in baboons (Blangero et al. 1990) and that both genes exhibit genotype-diet interaction. For the current example I examine the role of genotype-sex interaction in a single dietary environment using just one of the two major loci. In this case the environment is sex, which primarily marks differences in the endogenous sex hormonal microenvironment. Sex is a good example of an obligate missing-data situation that is important for many physiological traits. Elsewhere, the role of genotype-sex interaction in determining apo AI levels has been considered using standard quantitative genetic methods (Towne et al. 1992).

The sample includes 617 baboons in 23 pedigrees and is similar to the sample that we previously analyzed (Blangero et al. 1990). The computer program PAP (Hasstedt 1989) was adapted to allow for segregation analysis incorporating both MG × E and PG × E interaction in a missing-data situation. As with the LDL-C analysis, the effects of several significant covariates (age and age² in females and nursery-rearing versus maternal-rearing) were simultaneously estimated in each analysis.

Table 2 shows the results of the apo AI segregation analyses. The model incorporating both MG × E and PG × E interaction is a significant improvement (p = 0.003) over the classical mixed model, which ignores G × E interaction. The null hypothesis of no MG × E interaction (Δ_AA = Δ_AB = Δ_BB) can be unequivocally rejected (p = 0.006). Therefore there is evidence for a major genic component of variance in response (or, more appropriately, sexual dimorphism). This is clearly indicated in Figure 3, which shows the sex-specific genotypic means. Females show exaggerated between-genotype variability relative to males. In females approximately 41% of the total phenotypic variance is accounted for by the major locus, whereas only 13% of the variance is attributable to this locus in males. Table 2 also shows that there is significant PG × E interaction for this trait (p = 0.001). This interaction component is purely due to heterogeneity of residual genetic standard deviations between the two sexes because the genetic correlation between the expression of apo AI in the two sexes is estimated as 1.00.

Missing Data, Continuous Environment: PG × E Interaction. The final application examines a case of G × E interaction involving a continuous environment. The trait is apolipoprotein B (apo B) serum concentration. apo B is one of the primary proteins of LDL. The continuous environmental index is the average ambient temperature (°F) of the month when blood samples were drawn. A number of human studies have documented the existence of seasonal variation in lipoprotein levels (Buxtorf et al. 1988; Gordon et al. 1988), even after controlling for seasonal differences in diet. Because ambient temperatures are known to influence some enzymatic activities, one potential cause of seasonal variation is temperature variation. The data consisted of apo B serum concentrations in 614 pedigreed baboons. Each individual was represented by a single measurement. The measurements were taken over several years across all months. Additional significant covariates (sex, sex-specific age, and age²) were included (but not shown) in all subsequent analyses.

The data were analyzed using the computer program PAP (Hasstedt 1989), modified by specialized penetrance subroutines that I have developed. Because there was no evidence of a major locus influencing apo B, the analysis was limited to the assessment of PG × E interaction. For the analysis the environmental index (temperature) was scaled so that the observed minimum average temperature (50°F) had a score of 0.

Table 3 shows the results of the analysis. apo B levels exhibit a significant negative relationship with average temperature [β(temp) = −0.298 ± 0.078]. The variance function used for the analysis is described by Eq. (15), and the genetic correlation function used is given by Eq. (18). As judged by the likelihood ratio statistic comparing the PG × E model with the classical polygenic model, there is evidence of significant PG × E interaction (p = 0.01). This is due to a significant decrease in additive genetic variance as temperature increases (γ_G = −0.011 ± 0.005, p = 0.049), which is shown in Figure 4. There was no analogous effect of temperature on the random environmental variance (data not shown). There is also significant evidence for incomplete pleiotropy among environments (λ = 0.03 ± 0.02, p = 0.044).

Table 2. Analysis of MG × E and PG × E Interaction in apo AI Concentrations in 617 Pedigreed Baboons: Maximum-Likelihood Estimates and Likelihood Ratio Statistics

                          Model
Parameter     MG × E + PG × E   No MG × E   No PG × E   Classical Mixed
q_A           0.668             0.757       0.666       0.712
μ_AA          117.03            114.10      112.19      113.94
μ_AB          125.21            131.40      129.38      129.90
μ_BB          147.68            172.27      154.92      169.29
Δ_AA          –10.15            –5.49       –3.95       –5.69
Δ_AB          –4.06             –5.49       –9.22       –5.69
Δ_BB          17.21             –5.49       8.35        –5.69
σ_GM          14.63             12.24       7.74        7.61
σ_GF          5.60              5.12        7.74        7.61
σ_EM          18.17             16.29       19.21       19.88
σ_EF          19.94             21.55       20.30       19.88
ρ_G(M,F)      1.000             1.000       (1)         (1)
Λ             –                 10.22       11.12       16.44
d.f.          –                 2           1           4
p             –                 0.006       0.001       0.003

MG refers to an inferred major gene and E refers to sex (male versus female).


Figure 3. Sex-specific genotypic means for apo AI concentration showing MG × E interaction. Error bars indicate 1 SE.

Table 3. Analysis of PG × E Interaction in apo B Concentrations in 614 Pedigreed Baboons: Maximum-Likelihood Estimates and Likelihood Ratio Statistics

                          Model
Parameter     PG × E     λ = 0     γ_G = 0    Classical Polygenic
μ             51.16      50.28     50.48      49.52
β(temp)       –0.298     –0.245    –0.286     –0.232
σ_G50         15.60      14.49     12.13      10.84
σ_E           16.76      17.47     16.77      17.55
γ_G           –0.0114    –0.0125   (0)        (0)
λ             0.0299     (0)       0.0299     (0)
Λ             –          2.90      3.88       6.56
d.f.          –          1         1          2
p             –          0.044     0.049      0.010

E represents average monthly temperature at time of sample.


This can be seen in Figure 5, which shows how the genetic correlation between the trait's expression in different temperature environments is a function of the absolute temperature difference between any two environments. The finding of significant PG × E interaction can be interpreted as evidence for a significant genetic component in the response of serum apo B levels to temperature change in baboons.

Discussion

The methods presented here can be used to decompose the underlying genetic determinants of physiological and developmental response to environmental stresses in human populations. Using some basic mathematical relationships, I have shown that the genetics of response can be studied through the examination of G × E interaction even when response itself cannot be measured in any single individual. However, utilization of this approach requires a shift in sampling designs from individuals to pedigrees. More specifically, the successful implementation of the proposed methodology depends on the exploitation of situations in which relatives can be measured in different environments (i.e., there are environmental contrasts among relatives).

Figure 4. Predicted additive genetic variance in apo B concentration as a function of average monthly temperature. Significant decline in variance is indicative of PG × E interaction.

Other investigators have called for and/or employed similar strategies for a variety of physiological traits and environmental stresses (Mueller et al. 1980; Ward and Prior 1980; Chakraborty et al. 1983; Ward 1985). However, most previous applications of family studies have focused on classical quantitative genetic models of inheritance [but see Ward (1985) for an application allowing for a major gene]. Such simple models are unlikely to hold for many of the traits that are examined in studies of human adaptability.

Figure 5. Predicted additive genetic correlations between apo B polygenotypes as a function of temperature difference. Nonunit correlations are indicative of PG × E interaction.


The genetic architecture of physiological traits involved in the response to environmental stresses is complex, involving both major genes with relatively large effects and numerous genes with small effects (polygenes). The evidence for major genes influencing critical physiological pathways is immense (as a random perusal of the genetic-epidemiological literature will reveal). Therefore it is highly likely that some of the focal traits in human adaptability studies are also influenced by major genes. We have recently confirmed this expectation by demonstrating that a single major locus accounts for nearly 40% of the phenotypic variance in % O2 saturation of arterial hemoglobin in Tibetan high-altitude dwellers (Beall et al. 1994). The analyses of response to environmental stress in such traits will require methods, such as the ones presented here, that include the potential for both MG × E and PG × E interaction. Only recently have the necessary statistical genetic tools become available for the detection of the effects of G × E interaction on complex quantitative traits (Eaves 1984; Moll et al. 1984; Bonney et al. 1988; Blangero et al. 1990; Blangero and Konigsberg 1991; Konigsberg et al. 1991; Pérusse et al. 1991; Blangero et al. 1992).

The methods described here will be particularly useful for studies that can incorporate a continuous environmental measure. In this regard the choice of environments to examine can be broad, encompassing both endogenous and exogenous factors that are not normally considered environments. For example, the age at which an individual is measured can be considered a measure of the endogenous developmental environment. Thus these methods will also be applicable to studies of growth and development. For example, using these methods Williams-Blangero et al. (1992) recently found evidence for a major gene influencing head breadth that showed distinct major genotype-age interaction, suggesting that genotypes exhibit differential growth patterns.

In summary, the statistical genetic methodology presented here can be used for a wide variety of problems regarding the genetic basis of human adaptability. Ultimately, such knowledge of the genetic components of intrapopulation variation will help us to understand the evolutionary dynamics of between-population differentiation.

Acknowledgments This research was supported by the National Institutes of Health under grants HL28972, HL45522, GM31575, DK44297, and contract HV5303. I thank Tom Dyer for providing expert computer programming assistance; Sarah Williams-Blangero, Lyle Konigsberg, and Brad Towne for helpful discussions; Glen Mott for performing the LDL-C, apo B, and apo AI measurements; and Jim Hixson for allowing me to reanalyze his LDLR RFLP data.

The specialized Fortran subroutines used in the examples will be made available to interested individuals who already have official copies of PAP (version 3.0) and/or Fisher.

Received 9 November 1992; revision received 1 February 1993.


Literature Cited

Baker, P. 1976. Research strategies in population biology and environmental stress. In The Measures of Man: Methodologies in Biological Anthropology, E. Giles and J. S. Friedlaender, eds. Cambridge, MA: Peabody Museum Press, 230–259.

Beall, C. M., J. Blangero, S. Williams-Blangero, and M. C. Goldstein. 1994. A major gene for saturation of arterial hemoglobin in Tibetan highlanders. Am. J. Phys. Anthropol. (in press).

Blangero, J. 1991. Complex segregation analysis incorporating genotype-environment interaction. Am. J. Hum. Genet. S49:465.

Blangero, J., and L. W. Konigsberg. 1991. Multivariate segregation analysis using the mixed model. Genet. Epidemiol. 8:299–316.

Blangero, J., S. Williams-Blangero, and J. E. Hixson. 1992. Assessing the effects of candidate genes on quantitative traits in primate populations. Am. J. Primatol. 27:119–132.

Blangero, J., J. W. MacCluer, C. M. Kammerer, G. E. Mott, T. D. Dyer, and H. C. McGill Jr. 1990. Genetic analysis of apolipoprotein A-I in two dietary environments. Am. J. Hum. Genet. 47:414–428.

Boerwinkle, E., R. Chakraborty, and C. F. Sing. 1986. The use of measured genotype information in the analysis of quantitative phenotypes in man. I. Models and analytical methods. Ann. Hum. Genet. 50:181–194.

Bonney, G. E. 1984. On the statistical determination of major gene mechanisms in continuous human traits: Regressive models. Am. J. Med. Genet. 18:731–749.

Bonney, G. E., G. M. Lathrop, and J. M. Lalouel. 1988. Combined linkage and segregation analysis using regressive models. Am. J. Hum. Genet. 43:29–37.

Bradshaw, A. D. 1965. Evolutionary significance of phenotypic plasticity in plants. Adv. Genet. 13:115–155.

Buxtorf, J. C., M. F. Baudet, C. Martin, J. L. Richard, and B. Jacotot. 1988. Seasonal variation of serum lipids and apoproteins. Ann. Nutr. Metab. 32:68–74.

Cannings, C., E. A. Thompson, and M. H. Skolnick. 1978. Probability functions on complex pedigrees. Adv. Appl. Probability 10:26–61.

Carroll, R. J., and D. Ruppert. 1988. Transformation and Weighting in Regression. New York: Chapman and Hall.

Chakraborty, R., J. Clench, R. E. Ferrell, S. A. Barton, and W. J. Schull. 1983. Genetic components of variations of red cell glycolytic intermediates at two altitudes among the South American Aymara. Ann. Hum. Biol. 10:173–184.

Chernoff, H. 1954. On the distribution of the likelihood ratio. Ann. Math. Stat. 25:573–578.

Eaves, L. J. 1984. The resolution of genotype × environment interaction in segregation analysis of nuclear families. Genet. Epidemiol. 1:215–228.

Elston, R. C. 1981. Segregation analysis. Adv. Hum. Genet. 11:63–120.

Elston, R. C., and J. Stewart. 1971. A general model for the genetic analysis of pedigree data. Hum. Hered. 21:523–542.

Falconer, D. S. 1952. The problem of environment and selection. Am. Natur. 86:293–298.

Frisancho, A. R. 1979. Human Adaptation: A Functional Interpretation. St. Louis, MO: Mosby.

Gordeuk, V., J. Mukiibi, S. J. Hasstedt, W. Samowitz, C. Q. Edwards, G. West, S. Ndambire, J. Emmanual, N. Nkanza, Z. Chapanduka et al. 1992. Iron overload in Africa: Interaction between a gene and dietary iron content. New Engl. J. Med. 326:95–100.

Gordon, D. J., J. Hyde, D. C. Trost, F. S. Whaley, P. J. Hannan, D. R. Jacobs, and L.-G. Ekelund. 1988. Cyclic seasonal variation in plasma lipid and lipoprotein levels: The Lipid Research Clinics Coronary Primary Prevention Trial Placebo Group. J. Clin. Epidemiol. 41:679–689.

Harrison, G. A. 1966. Human adaptability with reference to the IBP proposals for high-altitude research. In The Biology of Human Adaptability, P. T. Baker and J. S. Weiner, eds. Oxford, England: Clarendon Press, 509–520.

Hasstedt, S. J. 1982. A mixed model likelihood approximation for large pedigrees. Computers Biomed. Res. 15:295–307.


Hasstedt, S. J. 1989. Pedigree Analysis Package, V3.0. Salt Lake City, UT: Department of Human Genetics.

Hasstedt, S. J. 1991. A variance components/major locus likelihood approximation on quantitative data. Genet. Epidemiol. 8:113–125.

Hixson, J. E., C. M. Kammerer, L. A. Cox, and G. E. Mott. 1989. Identification of an LDL receptor gene marker associated with altered levels of LDL-cholesterol and apolipoprotein B in baboons. Arteriosclerosis 9:829–835.

Hopper, J. L., and J. D. Mathews. 1982. Extensions to multivariate normal models for pedigree analysis. Ann. Hum. Genet. 46:373–383.

Hopper, J. L., and J. D. Mathews. 1983. Extensions to multivariate normal models for pedigree analysis. II. Modeling the effect of shared environments in the analysis of variation in blood lead levels. Am. J. Epidemiol. 117:344–355.

Kendall, M. G., and A. Stuart. 1961. The Advanced Theory of Statistics, v. 2. London, England: Charles Griffin.

Konigsberg, L. W., J. Blangero, C. M. Kammerer, and G. E. Mott. 1991. Mixed model segregation analysis of LDL-C concentration with genotype-covariate interaction. Genet. Epidemiol. 8:69–80.

Lalouel, J. M. 1983. Segregation analysis of familial data. In Methods in Genetic Epidemiology, N. E. Morton, D. C. Rao, and J. M. Lalouel, eds. Basel, Switzerland: Springer Karger, 75–97.

Lalouel, J. M., D. C. Rao, N. E. Morton, and R. C. Elston. 1983. A unified model for complex segrega-tion analysis. Am. J. Hum. Genet. 35:816–826.

Lange, K., and M. Boehnke. 1983. Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am. J. Med. Genet. 14:513–524.

Lange, K., D. Weeks, and M. Boehnke. 1988. Programs for pedigree analysis: Mendel, Fisher, and dGene. Genet. Epidemiol. 5:471–472.

Little, R. J. A., and D. B. Rubin. 1987. Statistical Analysis with Missing Data. New York: Wiley.

Mather, K. 1953. Genetical control of stability in development. Heredity 7:297–336.

Moll, P. P., C. F. Sing, S. Lussier-Cacan, and J. Davignon. 1984. An application of a model for a genotype-dependent relationship between a concomitant (age) and a quantitative trait (LDL cholesterol) in pedigree data. Genet. Epidemiol. 1:301–314.

Morton, N. E., and C. J. MacLean. 1974. Analysis of familial resemblance. III. Complex segregation analysis of quantitative traits. Am. J. Hum. Genet. 26:489–503.

Mueller, W. H., R. Chakraborty, S. A. Barton, F. Rothhammer, and W. J. Schull. 1980. Genes and epidemiology in anthropological adaptation studies: Familial correlations in lung function in populations residing at different altitudes in Chile. Med. Anthropol. 4:367–384.

Pérusse, L., P. P. Moll, and C. F. Sing. 1991. Evidence that a single gene with gender- and age-dependent effects influences systolic blood pressure determination in a population-based sample. Am. J. Hum. Genet. 49:94–105.

Robertson, A. 1959. The sampling variance of the genetic correlation coefficient. Biometrics 15:469–485.

Towne, B., J. Blangero, and G. E. Mott. 1992. Genetic analysis of sexual dimorphism in serum apo AI and HDL-C concentrations in baboons. Am. J. Primatol. 27:107–117.

Ward, R. H. 1985. Isolates in transition: A research paradigm for genetic epidemiology. In Diseases of Complex Etiology in Small Populations, E. Szathmary and R. Chakraborty, eds. New York: Alan R. Liss, 147–177.

Ward, R., and I. Prior. 1980. Genetic and sociocultural factors in the response of blood pressure to migration of the Tokelau population. Med. Anthropol. 4:339–366.

Williams-Blangero, S., J. Blangero, and M. C. Mahaney. 1992. Segregation analysis of craniometric traits incorporating genotype-specific growth patterns. Am. J. Hum. Genet. 51:A163.


Genet. Res., Camb. (1994), 64, pp. 57–69. With 5 text-figures. Copyright © 1994 Cambridge University Press

Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle

MARK KIRKPATRICK*†, WILLIAM G. HILL‡ AND ROBIN THOMPSON§
*Department of Zoology, University of Texas, Austin TX 78712, USA. ‡Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, U.K. §BBSRC Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, U.K.

(Received 20 October 1993 and in revised form 13 May 1994)

Summary

Quantitative variation in traits that change with age is important to both evolutionary biologists and breeders. We present three new methods for estimating the phenotypic and additive genetic covariance functions of a trait that changes with age, and illustrate them using data on daily lactation records from British Holstein-Friesian dairy cattle. First, a new technique is developed to fit a continuous covariance function to a covariance matrix. Secondly, this technique is used to estimate and correct for a bias that inflates estimates of phenotypic variances. Thirdly, we offer a numerical method for estimating the eigenvalues and eigenfunctions of covariance functions. Although the algorithms are moderately complex, they have been implemented in a software package that is made freely available.

Analysis of lactation shows the advantages of the new methods over earlier ones. Results suggest that phenotypic variances are inflated by as much as 39 % above the underlying covariance structure by measurement error and short term environmental effects. Analysis of additive genetic variation indicates that about 90 % of the additive genetic variation for lactation during the first 10 months is associated with an eigenfunction that corresponds to increased (or decreased) production at all ages. Genetic tradeoffs between early and late milk yield are seen in the second eigenfunction, but it accounts for less than 8 % of the additive variance. This illustrates that selection is expected to increase production throughout lactation.

1. Introduction

An individual's phenotype changes with age. A trait that changes with age can be represented as a trajectory, that is, a function of time. Because each character takes on a value at each of an infinite number of ages, and its value at each age can be considered as a distinct trait, such trajectories are referred to as 'infinite-dimensional' characters.

Many problems of interest to breeders and evolutionary biologists involve selection on this type of trait. The traditional way of analysing the quantitative genetics of infinite-dimensional traits involves focusing on the phenotypic values at a small number of landmark ages, making discrete what is intrinsically a continuous process. Recently, the methods of quantitative genetics have been extended to infinite-dimensional traits to overcome this deficiency

† Corresponding author.

(Kirkpatrick & Heckman, 1989; Kirkpatrick et al. 1990; Kirkpatrick & Lofsvold, 1992; Gomulkiewicz & Kirkpatrick, 1992).

The infinite-dimensional approach can provide more accurate estimates of variation in the traits and improve estimates of their response to natural or artificial selection as compared to conventional methods. Improved estimates of phenotypic and genetic covariances can be realized using the fact that the measurements are ordered in time. The situation is analogous to the classical statistical problem of predicting the value of a dependent variable y as a function of an independent variable x. A standard approach is to regress observed values of y onto x. Then, given a value x*, a prediction for the corresponding value y* is determined by the regression equation. Alternatively, one might use the observed value of y corresponding to the observed value of x that is closest to x*. In many situations the regression prediction will be superior because measurement error in y makes prediction from a single pair of observed x and y unreliable, while the regression approach gains power by using information from all the observations.

An estimate of the covariance between the values of a trait at two ages can likewise be improved by using information about the covariances at other ages. The classical approach of treating the value at each age as a discrete trait without regard for its place in the sequence of ages loses substantial information. In contrast, the infinite-dimensional approach seeks to retain this information by using, in effect, a regression of covariance on age. Given the notoriously large sampling errors inherent in estimates of covariances, any gain in the power of estimation is welcome.

This paper extends the recently-developed methods for the analysis of infinite-dimensional traits and demonstrates them using data on lactation records from British Holstein-Friesian dairy cattle. We begin by briefly reviewing the framework for estimating covariance functions that was introduced by Kirkpatrick et al. (1990). We then introduce three new methods within this framework. The first is a technique for estimating covariance functions referred to as the method of asymmetric coefficients. This method is illustrated with a simple worked example. The second is a technique for correcting the bias that appears along the diagonal of an estimated phenotypic covariance function or matrix. This bias arises because date-specific measurement errors inflate the pheno­typic variances (the diagonal elements), but have no such bias on the phenotypic covariances (the off­diagonal elements) or any of the additive genetic parameters. Our strategy here is to use the unbiased off-diagonal elements to estimate the diagonal ele­ments. The algorithm is demonstrated using a simple example. The third method is a numerical approach to calculate the eigenvalues and eigenfunctions of a covariance function which is useful to describe the patterns of variation. After these new methods are introduced, they are applied to lactation records from British Holstein-Friesian dairy cattle.

2. Estimating covariance functions

For any trait that changes in time, the phenotype of an individual at age t can be written x(t). Variation in the population for this function is characterized by a covariance function. A covariance function is the infinite-dimensional analogue of a covariance matrix. The value of the phenotypic covariance function 𝒫(t₁, t₂) gives the phenotypic covariance between the value of the trait at ages t₁ and t₂. The phenotypic variance at age t₁ is written 𝒫(t₁, t₁). Likewise, the additive genetic covariance structure of a population is described by the additive genetic covariance function 𝒢.

For any practical application, these covariance functions are estimated from breeding data. The approach advocated by Kirkpatrick et al. (1990) starts with measurements of individuals at each of n ages, denoted a₁ through aₙ. Standard quantitative-genetic methods are used to obtain an estimate of the n × n covariance matrix for the measurements at these ages.

The goal now is to estimate the underlying covariance function from this matrix. In general, this is done by interpolating between the values of the covariance matrix, perhaps smoothing them in order to damp out the sampling error in the elements of the matrix. A variety of functions can be used for the interpolation. The approach we developed earlier is based on orthogonal functions (Kirkpatrick & Heckman, 1989; Kirkpatrick et af. 1990), and we will again use that method here.

We begin by briefly reviewing the approach, which is referred to as the method of 'symmetric coefficients' in this paper. It starts with the fact that any continuous covariance function can be represented as a weighted sum of orthogonal functions. That is, given a set of functions φᵢ, i = 0, 1, ..., that are orthogonal over the interval [a₁, aₙ], we can write the covariance function 𝒫 as

$$\mathcal{P}(t_1, t_2) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} C_{ij}\, \phi_i(t_1)\, \phi_j(t_2), \qquad (1)$$

where the C_ij are constants. These constants form a symmetric matrix, C_ij = C_ji (whence the term 'symmetric coefficients'), which guarantees that 𝒫 is symmetric as required by the definition of a covariance function. The strategy developed by Kirkpatrick et al. (1990) is to use an estimated covariance matrix P̂

based on measurements taken at n ages to estimate a truncated set of the weighting coefficients C_ij. Our estimate of the covariance function 𝒫, based on the first k orthogonal functions, is then

$$\hat{\mathcal{P}}(t_1, t_2) = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \hat{C}_{ij}\, \phi_i(t_1)\, \phi_j(t_2), \qquad (2)$$

where k ≤ n. The statistical problem, then, is to estimate the matrix of coefficients C so that they can be substituted into eqn (2) to yield an estimate of the covariance function. As discussed by Kirkpatrick et al. (1990), we can obtain a 'full fit', in which k = n, such that the value of the estimate exactly equals the corresponding value of P̂ when t₁ and t₂ equal two of the ages at which the data were taken. Alternatively, we can seek a 'reduced fit', in which k < n. Under a reduced fit, there will generally be discrepancies between the estimated function and P̂. The rationale for favouring a reduced fit is that the estimate P̂ includes sampling error, and we might prefer an estimate that smooths out the fluctuations that these errors introduce.

The methods for both the full and reduced estimates of a covariance function that were developed earlier lead to a symmetric coefficient matrix C. In the next section we introduce a new method that leads to an asymmetric coefficient matrix, and show its advantages.
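As a computational aside, evaluating eqn (2) once the coefficients are in hand takes only a few lines; this sketch assumes the normalized Legendre polynomials used by Kirkpatrick et al. (1990), with ages rescaled to [−1, 1].

```python
# Minimal sketch of evaluating eqn (2): reconstruct the covariance-function
# estimate from a (k x k) coefficient matrix C using normalized Legendre
# polynomials, with ages rescaled to [-1, 1].
import numpy as np

def phi(i, x):
    """Legendre polynomial of order i, normalized to unit norm on [-1, 1]."""
    c = np.zeros(i + 1)
    c[i] = 1.0
    return np.sqrt((2 * i + 1) / 2.0) * np.polynomial.legendre.legval(x, c)

def cov_est(t1, t2, C, a1, an):
    """Symmetric-coefficient estimate of the covariance function, eqn (2)."""
    x1 = 2.0 * (t1 - a1) / (an - a1) - 1.0   # rescale age to [-1, 1]
    x2 = 2.0 * (t2 - a1) / (an - a1) - 1.0
    k = C.shape[0]
    return sum(C[i, j] * phi(i, x1) * phi(j, x2)
               for i in range(k) for j in range(k))
```

For example, with the coefficient matrix of eqn (9) below, cov_est(10, 11, C, 10, 11) returns 2, matching the data matrix of eqn (5).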


3. The method of asymmetric coefficients

There are three reasons for developing the new method based on asymmetric coefficients. First, an estimate of a covariance function based on our earlier method has continuous first derivatives everywhere. This may be undesirable along the diagonal of the covariance function, where we might want to allow for the possibility of a crease, or discontinuous first derivative. A discontinuous first derivative along the diagonal is found in the covariance functions of several simple stochastic processes, including Brownian motion, and so it seems desirable to allow for this possibility. The method of asymmetric coefficients makes no assumption about the continuity of first derivatives of the estimated covariance function along the diagonal.

Secondly, estimates of a covariance function based on asymmetric coefficients may be somewhat better behaved than those based on symmetric coefficients. The reason lies in the fact that estimates using symmetric coefficients involve the products of higher-order terms that result in functions that are less smooth than their asymmetric counterparts. The coefficient matrix C derived using symmetric coefficients generally will have all non-zero elements. When substituted into eqn (2), this produces terms of order φ_{k−1}(t₁) φ_{k−1}(t₂). With orthogonal polynomials as the φs, for example, this corresponds to the product of two (k−1)th order polynomials, which will often result in a quite 'wiggly' function. By contrast, the coefficient matrix C derived using asymmetric coefficients has zeros in all elements C_ij for which i + j ≥ k. Thus the terms of highest order to appear in eqn (2)

are of the same order as φ_{k−1}. Hence asymmetric coefficients often lead to smoother estimates.

Fig. 1. Fits using the methods of symmetric coefficients (top) and asymmetric coefficients (bottom) with the example of eqn (5) discussed in the text. The solid circles show the original data points.

Thirdly, the asymmetric method can be used to correct for a bias in the diagonal elements of phenotypic covariance functions. We discuss this problem further in a later section ('Extrapolating to the diagonal').

These attractions of asymmetric coefficients are mitigated by the fact that some of the techniques developed earlier under the method of symmetric coefficients do not carry over to the new method. In particular, the algebraic technique for estimating the eigenfunctions and eigenvalues of the covariance matrix directly from the coefficients cannot be applied to asymmetric coefficients. It is still possible, however, to estimate these quantities by numerical methods using the methods we discuss in a later section ('Analysis of genetic variation').

The method of asymmetric coefficients seeks an estimate of the covariance function 𝒫 that is of the form

$$\hat{\mathcal{P}}(t_1, t_2) = \begin{cases} \displaystyle\sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \hat{C}_{ij}\, \phi_i(t_1)\, \phi_j(t_2), & t_1 \ge t_2 \\[1ex] \displaystyle\sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \hat{C}_{ij}\, \phi_i(t_2)\, \phi_j(t_1), & t_1 < t_2. \end{cases} \qquad (3)$$

Unlike the earlier method, there is no requirement that C_ij = C_ji because the form of eqn (3) guarantees that the estimate will be symmetric, so we refer to this as the method of 'asymmetric coefficients'. The data matrix P̂ contains n(n+1)/2 parameters, and we can estimate no more than this number of coefficients. We choose to fit the coefficients C_ij with i + j ≤ k − 1, that is, the upper left half of the matrix C. This choice tends to result in a smoother estimate than if higher-order coefficients were fitted, as discussed above.

The strategy we use to fit C is to transform the problem into a standard least-squares formulation. By stacking the columns of the data matrix P̂ to form a vector p, and similarly transforming the coefficient matrix C into a vector c, the statistical model can be written:

p = Xc+e, (4)

where p is a vector of observations (the estimated variances and covariances), X is a matrix defined by the values of the orthogonal functions evaluated at the measured ages, c is a vector of coefficients, and e is a vector of error terms. Our goal is to solve for the vector c that minimizes the error vector according to the weighted sum of squares criterion.

Algorithms for this calculation are described in Appendix A for the cases of both a full and a reduced fit. The method has been implemented as a computer program in a Mathematica® notebook (Wolfram, 1991). The program (which also performs other analyses and displays them graphically) is available from the senior author.
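The following is a condensed sketch of this least-squares step (my own reading of the method, not the Mathematica notebook): stack the lower-triangular entries of P̂ into p, build X from products of the orthogonal functions at the measured ages, and solve eqn (4).

```python
# Sketch of the least-squares formulation of eqn (4) for the asymmetric
# method: fit the coefficients C_ij with i + j <= k - 1 to the diagonal
# and subdiagonal entries of the data matrix P.
import numpy as np

def phi(i, x):
    """Legendre polynomial of order i, normalized to unit norm on [-1, 1]."""
    c = [0.0] * i + [1.0]
    return np.sqrt((2 * i + 1) / 2.0) * np.polynomial.legendre.legval(x, c)

def fit_asymmetric(P, ages, k):
    a1, an = ages[0], ages[-1]
    x = [2.0 * (t - a1) / (an - a1) - 1.0 for t in ages]
    fitted = [(i, j) for i in range(k) for j in range(k) if i + j <= k - 1]
    rows, targets = [], []
    for r in range(len(ages)):        # lower triangle: t1 = ages[r] >= t2 = ages[s]
        for s in range(r + 1):
            targets.append(P[r][s])
            rows.append([phi(i, x[r]) * phi(j, x[s]) for (i, j) in fitted])
    c, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    C = np.zeros((k, k))
    for (i, j), cij in zip(fitted, c):
        C[i, j] = cij
    return C

# Worked example of eqn (5): recovers the matrix of eqn (7) up to rounding.
C = fit_asymmetric(np.array([[3.0, 2.0], [2.0, 3.0]]), [10.0, 11.0], k=2)
```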


To illustrate our approach, consider the problem of fitting the covariance matrix

$$\hat{P} = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}, \qquad (5)$$

based on measurements taken at the ages a = (10, 11)ᵀ, as shown in Figure 1. We will find the full estimate of 𝒫, and so k = n = 2. We choose to use normalized Legendre polynomials as the orthogonal functions. The first two of these polynomials are:

$$\phi_0(x) = \frac{1}{\sqrt{2}}, \qquad \phi_1(x) = \sqrt{\tfrac{3}{2}}\, x, \qquad (6)$$

where x = 2(t − a₁)/(aₙ − a₁) − 1 is age rescaled to the interval [−1, 1]

(see Kirkpatrick et al. 1990). Calculation of the coefficient matrix is described in detail for this example in Appendix A. It leads to the result

$$\hat{C} = \begin{pmatrix} 6 & \sqrt{3}/3 \\ -\sqrt{3}/3 & 0 \end{pmatrix}. \qquad (7)$$

Notice that the matrix Ĉ is asymmetric, and that elements below the antidiagonal are zero. These two properties distinguish the asymmetric coefficient matrix from the symmetric matrix approach described by Kirkpatrick et al. (1990).

Substituting these coefficients into eqn (A 14) gives our estimate of the covariance function:

$$\hat{\mathcal{P}}(t_1, t_2) = \begin{cases} 3 + t_1 - t_2, & 10 \le t_1 \le t_2 \le 11 \\ 3 + t_2 - t_1, & 10 \le t_2 \le t_1 \le 11, \end{cases} \qquad (8)$$

that is, 3 − |t₁ − t₂|.

While the coefficient matrix from which it was calculated is asymmetric, the covariance function itself is symmetric (as required by the definition of a covariance function). Checking, we confirm that the entries in the original matrix P̂ are recovered when we substitute t₁, t₂ = 10, 11 into eqn (8). A perfect fit of the estimated covariance function to the data matrix results whenever a full fit is calculated.

The method of symmetric coefficients developed previously (Kirkpatrick et al. 1990) leads to somewhat different results. Using that method, the coefficient matrix for a full fit is

$$\hat{C} = \begin{pmatrix} 5 & 0 \\ 0 & 1/3 \end{pmatrix}. \qquad (9)$$

Unlike the coefficient matrix (7), this matrix is symmetric. (The off-diagonal coefficients are zero in this example, but that is not generally true.) The corresponding estimate of the covariance function is

$$\hat{\mathcal{P}}(t_1, t_2) = \frac{5}{2} + \frac{(2t_1 - 21)(2t_2 - 21)}{2}. \qquad (10)$$

As with the earlier estimate using asymmetric coefficients, the original data matrix of eqn (5) is recovered when we substitute t₁, t₂ = 10, 11 into this equation.

The symmetric and asymmetric estimates of the covariance function are quite different. The differences are seen clearly in Fig. 1. A conspicuous and diagnostic discrepancy is that the symmetric coefficient estimate is smooth along the diagonal while the asymmetric coefficient estimate is not. The symmetric coefficient estimate also has more curvature.

4. Extrapolating to the diagonal

Estimates of phenotypic variances for the values of traits at specific ages are often inflated by factors that do not affect estimates of the covariances between ages. One source of this inflation is measurement error. A second source involves environmental factors that have effects over periods much shorter than the between-measurement intervals, such as weather, health, food quality, and hormonal state. This second type of factor tends to increase covariances close to the diagonal of the covariance function. For example, estimates of the phenotypic correlations of lactation test day records one day apart were 0.84, declining only to 0.82 for records five days apart (Pander et al. 1993). Thus we can view the diagonal elements of a phenotypic covariance matrix or function as being biased upwards, relative to a smoother underlying pattern that we expect on biological grounds. The upward bias appears as a ridge along the diagonal of estimated phenotypic covariance matrices and covariance functions. This bias distorts our picture of the covariance structure of the trait, and has practical implications in breeding programs that are based on age-specific variances.

We would therefore like to correct for the bias. Two strategies are available. A direct approach would be to estimate the measurement error directly, for example through repeated measures. A second, indirect approach is available when the characters of interest are age-specific measurements of the same trait through time. A familiar example is a growth trajectory, in which the data are measurements of the sizes of each individual at a series of ages. In this situation the basic phenotype of interest is a continuous function (the growth trajectory) that is an infinite-dimensional trait. Here we show how the phenotypic covariances for an infinite-dimensional trait can be used to estimate the variances (that is, the diagonal elements of the covariance matrix). These estimates are free of measurement error bias, and may lead to selection indices with increased efficiency.

Our strategy is as follows. On intuitive grounds, we expect the covariance function for growth processes to be continuous. (This is a biological rather than mathematical argument, since there is nothing in the definition of a covariance function that requires it to be continuous.) Using the unbiased estimates for the phenotypic covariances (that is, 𝒫(t₁, t₂) where t₁ ≠ t₂), we can extrapolate estimates of the phenotypic variances (that is, 𝒫(t₁, t₁)). The algorithm begins with an estimated phenotypic covariance matrix of the sizes of individuals at the n ages aᵢ. We first estimate the phenotypic covariance function using only the n(n−1)/2 unbiased subdiagonal elements of P̂. The method of asymmetric coefficients described above produces an estimate of the phenotypic covariance function in terms of a weighted sum of orthogonal functions. Because the diagonal elements of P̂ were omitted, this estimate interpolates the values of the covariance function over the ranges t₁ ∈ [a₂, aₙ] and t₂ ∈ [a₁, aₙ₋₁], where t₁ > t₂. Secondly, the coefficients are used to extrapolate the estimated covariance function: the range of t₁ is extended downward from age a₂ to a₁ and the range of t₂ upward from age aₙ₋₁ to aₙ, giving us the full range t₁, t₂ ∈ [a₁, aₙ]. This extrapolation gives us an unbiased estimate of the diagonal of 𝒫 along with the rest of the covariance function.

Fig. 2. Fit using the method of extrapolating to the diagonal using the example discussed in the text. The solid circles show the original data points; the open circles are the extrapolated values for the diagonal elements (the variances).

A detailed description of the algorithm is given in Appendix B. It has been implemented in a Mathematica® notebook, which is available from the senior author. To illustrate, consider the estimated phenotypic covariance matrix

$$\hat{P} = \begin{pmatrix} 7 & 3 & 2 \\ 3 & 8 & 3 \\ 2 & 3 & 9 \end{pmatrix}, \qquad (11)$$

based on measurements of some character taken at ages a = (10, 11, 15)ᵀ, plotted in Fig. 2. This example will illustrate how the method naturally accommodates uneven intervals between the measured ages. The variances along the diagonal of P̂ have been inflated by the biases described earlier, and our aim is to obtain corrected estimates for them. In this approach the diagonal elements are not used in the estimate of the covariance function, and so a full fit uses k = n − 1 = 2 orthogonal functions.

Appendix B shows that using the method of extrapolating to the diagonal, we obtain an estimated phenotypic covariance function

$$\hat{\mathcal{P}}(t_1, t_2) = \begin{cases} -17/4 + t_1 - t_2/4, & 10 \le t_1 \le t_2 \le 15 \\ -17/4 + t_2 - t_1/4, & 10 \le t_2 \le t_1 \le 15. \end{cases} \qquad (12)$$

Evaluating this function at the measured ages (t = 10, 11, 15), we obtain the matrix

$$\hat{P} = \begin{pmatrix} 3.25 & 3 & 2 \\ 3 & 4 & 3 \\ 2 & 3 & 7 \end{pmatrix}. \qquad (13)$$

61

The results suggest that the variances shown along the diagonal of eqn (11) are overestimated by as much as 115% (7 vs. 3.25 for the variance at age 10). These results are illustrated in Fig. 2.
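The following sketch reproduces this example numerically: it fits the asymmetric coefficients to the three strictly subdiagonal entries of eqn (11) and then evaluates the fitted function on the diagonal, recovering the corrected variances of eqn (13).

```python
# Sketch of 'extrapolating to the diagonal' for the example above: fit the
# asymmetric coefficients to the strictly subdiagonal entries of P only
# (k = n - 1 = 2), then evaluate the fitted function on the diagonal.
import numpy as np

def phi(i, x):
    """Legendre polynomial of order i, normalized to unit norm on [-1, 1]."""
    c = [0.0] * i + [1.0]
    return np.sqrt((2 * i + 1) / 2.0) * np.polynomial.legendre.legval(x, c)

ages = [10.0, 11.0, 15.0]
P = np.array([[7.0, 3.0, 2.0],
              [3.0, 8.0, 3.0],
              [2.0, 3.0, 9.0]])     # eqn (11), diagonal inflated
k = len(ages) - 1                   # diagonal omitted, so k = n - 1 = 2
a1, an = ages[0], ages[-1]
x = [2.0 * (t - a1) / (an - a1) - 1.0 for t in ages]

fitted = [(i, j) for i in range(k) for j in range(k) if i + j <= k - 1]
rows, targets = [], []
for r in range(len(ages)):          # strictly subdiagonal entries only: t1 > t2
    for s in range(r):
        targets.append(P[r, s])
        rows.append([phi(i, x[r]) * phi(j, x[s]) for (i, j) in fitted])
c, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)

def p_hat(t1, t2):
    """Evaluate the fitted asymmetric estimate, eqn (3), at any pair of ages."""
    if t1 < t2:
        t1, t2 = t2, t1             # use the t1 >= t2 branch
    x1 = 2.0 * (t1 - a1) / (an - a1) - 1.0
    x2 = 2.0 * (t2 - a1) / (an - a1) - 1.0
    return sum(cij * phi(i, x1) * phi(j, x2) for (i, j), cij in zip(fitted, c))

corrected = [p_hat(t, t) for t in ages]   # approx. [3.25, 4.0, 7.0], as in eqn (13)
```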

5. Analysis of variation

The covariance function is an important descriptor of variation in the trajectory of a character that changes through time, and a substantial amount can be learned from its analysis. The spectrum, or eigenvalues and eigenfunctions, of a covariance function is particularly useful. The leading eigenvalues and eigenfunctions visualize major patterns of variation, and describe these patterns with many fewer parameters than the full covariance function. One important application involves the additive covariance function. Its leading eigenfunctions identify the types of evolutionary changes for which the population has substantial genetic variation available. Conversely, its spectrum also shows the types of changes for which there is not appreciable genetic variation, and which therefore will occur slowly if at all under selection.

The method of symmetric coefficients was specifi­cally devised with this objective in mind. Calculations based on a symmetric coefficient matrix can be used to obtain estimates of eigenfunctions and eigenvalues directly (Kirkpatrick & Heckman, 1989; Kirkpatrick et al. 1990), The method of asymmetric coefficients, on the other hand, cannot be adapted to these calculations. We therefore propose an alternative using a numerical approach.

An estimate of a covariance function based on asvmmetric coefficients can be evaluated on a square la~tice of a moderate to large number of points. These values form a matrix whose spectrum (eigenvectors and eigenvalues) can then be calculated by standard methods. As the number of points on the lattice increases, the estimates of the eigenvalues will con­verge on those of the underlying covariance function (see Kirkpatrick & Heckman, 1989). The points of the eigenvectors can be interpolated to give estimates of the corresponding eigenfunctions (within a constant factor that is a function of the number of points in the lattice1-

Thi~ may seem like a rather baroque method of estimating quantities that could be obtained much more directly by simply calculating the eigenvalues and eigenvectors of the original covariance matrix. The incentive for performing the less direct algorithm just described is that it is expected to give more accurate estimates (Kirkpatrick & Heckman, 1989). The reason for this seems to lie in the fact that simply calculating the spectrum of the original matrix discards all information about the ordering in time of the ages at which the measurements were taken. The methods for symmetric coefficients developed by Kirkpatrick et al. (1990) make use of this information; the indirect

Page 77: Statistical Genetics and Gaussian Stochastic Processes

M. Kirkpatrick, W. G. Hill and R. Thompson 62

Data matrix Symmetric Full fit (k = 10)

8

Asyrrunetric Extrapolated Full fit (k = 9)

8 8

4

Extrapolated Discrepancy

8

10

Fig. 3. Estimates of the phenotypic covariance function for lactation in British Holstein-Friesian dairy cattle. The original data (top left) show a ridge corresponding to upward biases in the diagonal elements (variances). The full fit using symmetric coefficients (top right) overfits rhe data; the plot has been truncated in the vertical dimension. The full fit with asymmetric coefficients (middle left) is much smoother, but reproduces the inflated diagonal. The extrapolated full fit (middle right) is poorly behaved along the diagonal. An extrapolated reduced fit with k = 7 (bottom left) is well behaved everywhere. The discrepancy between this estimate and the original data (bottom right) is substantial along the diagonal, corresponding to the bias, but very small elsewhere. Variances and covariances are in units of kg".

algorithm just described for use with asymmetric coefficients also does so.

6. Analysis of lactation in dairy cattle

\Ve will illustrate the three methods developed above using lactation records of British Holstein-Friesian dairy cattle. The data are described in detail by

Pander et al. (1992), and comprise their data set 2. Briefly, these were records of daily milk yield (' test day records ') of 34029 heifers of known parents. The first record for each heifer was taken between day 5 and day 35 after the start of lactation, and successive records at monthly intervals for a total of 10 monthly records per individual. The data were analysed as if each measurement was taken at the midpoint of the

Page 78: Statistical Genetics and Gaussian Stochastic Processes

infinite-dimensional con's

month interval. Additive genetic and phenotypic parameters were calculated from the sire component of covariances using restricted maximum likelihood (REML; see Patterson & Thompson, 1971; Meyer, 1985).

We analysed these data with the methods described above using Legendre polynomials. When reduced estimates are computed, the matrix V of the error covariances between the estimated covariances is required (Kirkpatrick et al. 1990). Since the REML program used does not estimate V, we used the following approximation. The phenotypic and ad­ditive genetic covariances were viewed as if they had been estimated using a standard balanced half-sib breeding design with 700 random sires, each with 10 half-sib offspring. There was a total of 16000 residual degrees of freedom because daughters of anum ber of additional selected sires were used to increase con­nexions between herds. These sample sizes are a reasonable approximation to the more complex pedigree actually used to estimate the genetic para­meters (see Pander et a/. 1992). The V matrix 'Was then estimated using the formulae in Appendix C of Kirkpatrick et at. (1990).

(i) Estimating the phenotypic covariance function

The phenotypic covariance matrix is plotted in three dimensions in Fig. 3 (top left). We begin by calculating full estimates (k = n = 10) of the phenotypic co­variance function. The estimates based on symmetric coefficients and on asymmetric coefficients are com­pared in Fig. 3. Both show a conspicuous ridge running along the diagonaL corresponding to the date-specific measurement error described in the introduction. A secondary effect of the spike along the diagonal is to produce a series of parallel harmonic ridges in the symmetric coefficient estimate. These are not seen in the original data, but rather reflect side effects of how the polynomials used to construct :j accommodate the large elements along the diagonaL The estimate based on asymmetric coefficients is substantially smoother, as anticipated for the reasons discussed earlier, but still captures the diagonal ridge corresponding to the inflated variance estimates.

The method of extrapolating to the diagonal is used to eliminate the upward bias in the estimates of the diagonal of P. The full fit (k = 9 polynomials) gives an unsatisfactory estimate for f!jJ (Fig. 3, middle right). The extrapolation based on these high-order poly­nomials causes the estimated covariance function to take on extremely small values along the diagonal. In fact, this estimate is not positive semidefinite, and so does not qualify as a covariance function. The reduced fit with k = 8 suffers the same problem.

A reduced estimate with k = 7 (Fig. 3, bottom left), however, shows a covariance function that is both well-behaved and in keeping with our intuitive

63

expectation based on the original data. The function rises smoothly to the diagonal. It fits the off-diagonal elements of P very well: all of the differences are less than 2 % in magnitude. In contrast, there is a large discrepancy between the extrapolated estimates of the diagonal elements of P and those from the original matrix (Fig. 3, bottom right). The differences are, in facL our estimates of the upward biases in the diagonal elements. They are substantial. Our extrapolated values differ by as TIl uch as 36 % from the values of the diagonal elements of the original matrix P.

An even simpler estimate of the covariance function would be one that depended only on the difference between any pair of ages. Inspection of the data, however, shows for example that the phenotypic correlation between the first and second month of lactation is different from that between the seventh and eighth (r p = 0·64 v. r p = 0·76, respectively; Pander et af. 1992 Appendix Table 1), and so this alternative is not appropriate in this case.

(ii) Estimation and analysis of the additive genetic covariance function

To illustrate the method of asymmetric coefficients, we will again use the data of Pander e tal. (1992). Their estimate of the lOx 10 additive genetic co­variance matrix G and our estimates of the continuous covariance function are shown in Fig. 4. W-e first calculated full (k = n = 10) estimates r§ of the co­variance function using both symmetric and asym­metric coefficients. Both show severe fluctuations. The symmetric estimate takes on values that range from less than - 3 to more than 7 kg2 (Fig. 4, top right) even though the original matrix elements only span the range from 1·5 to 3·5 kg2

• The asymmetric estimate is again considerably better behaved, as expected, but it nevertheless takes on values as small as - 0.12 kg~ (Fig. 4, middle left).

\Ve than calculated the symmetric and asymmetric estimates using reduced fits using k = 9 polynomials (Fig. 4, middle right and bottom left). Both are far better behaved than the full estimates. The goodness­of-fit tests give X2 (10 D.F.) = 36·4 for the symmetric fit and X2 (10 D.F.) = 30·8 for the asymmetric fits, indicating that the asymmetric tit is somewhat better. Both tests, however, show there are statistically significant discrepancies between the smoothed co­variance function and the original data. \Ve never­theless prefer these reduced estimates because they are smoother and because the discrepancies between them and the original data matrix are small (less than 7 % for both the symmetric and asymmetric fits).

We estimated the eigenvalues and eigenfunctions of <§ using three different methods for comparison. First, we analysed the symmetric estimate with the algebraic method described by Kirkpatrick et af. (1990) using the reduced estimate with k = 9. Secondly, we carried

Page 79: Statistical Genetics and Gaussian Stochastic Processes

M. Kirkpatrick, W. G. Hill and R. Thompson

Data matrix

4

Asymmetric

4

2

Asymmetric

4

10

Symmetric

4

Symmetric

4

4

o

Full fit (k = 10)

Asymmetric (k = 7) v. data

Fig. 4. Estimates of the additive genetic covariance function for lactation in British Holstein-Friesian dairy cattle. The original data are shown at top left. The full fit using symmetric coefficients (top right) overfits the data; the plot has been truncated in the vertical dimension. The full fit with asymmetric coefficients (middle left) is much smoother, but is poorly behaved in the off-diagonal comers. Reduced fits with k = 9 using symmetric coefficients (middle right) and asymmetric coefficients (bottom lefI) arc much smoother. The discrepancy betv.:een the reduced asymmetric fit and the original data (bottom right) is very small everywhere. Variances and covariances are in units of kg".

64

out an analysis of the asymmetric estimate again with k = 9 using the numerical approach outlined above wi th a 31 x 31 matrix reconstructed from the es timated covariance function. Thirdly, we calculated the eigen­values and eigenvectors of the original matrix G. The

eigenvalues from the last two methods were renorma­lized to make them comparable to those from the first method. (The eigenvalues of a covariance function are defined by an integration rather than a summation. Eigenvalues calculated by the second and third

Page 80: Statistical Genetics and Gaussian Stochastic Processes

Infinite-dimensional cows

I 04 ~

04

O+-------"....----Cl----------l

20

-04

o

4 7 10 Momh

Fig. 5. Estimates of the first and second eigenfunctions (top and bottom panels, respectively) of additive genetic variation for lactation in British Holstein-Friesian dairy cattle. Each panel compares the estimates obtained via symmerric coefficients, asymmetric coefficients, and the corresponding eigenvector from the original additive genetic covariance matrix. Estimates using symmetric and asymmetric coefficients are reduced fits with k = 9 (see Fig. 4). Estimates of the corresponding eigenvalues are shown in the insets; note that estimates of A1 are about an order of magnitude greater than those for "\2' Eigenvalues are in units of kg2.

methods, in contrast, are defined by a matrix product that involves a summation whose value depends on the number of ages sampled. Renormalization accounts for the number of ages so that eigenvalues estimated by these different methods can be compared.)

The results are present in Fig. 5. The first eigenfunction is positive everywhere, showing that the principal axis of genetic variation corresponds to simultaneous increases or decreases in lactation at all ages. The leading eigenvalue shows that this eigen­function accounts for about 90% of all additive genetic variation (ill = 228 and the sum of all eigenvalues = 254, using the method of asymmetric coefficients). Thus there appears to be substantial genetic variation for enhanced milk production throughout the entire lactation pcriod. A consequence of practical importance is that there are not strong tradeoffs between early and late lactation: genetic improvement at one age will tend to improve all ages. The second eigenfunction, which accounts for less than 8 % of the genetic variation, shows a tradeoff between performance before v. after the fourth month of lactation.

65

All three methods give similar results in this example. Our experience with other examples, hmN­ever, shows that this is not always so. It is likely that the agreement between the methods in this case results from the high precision of the parameter estimates in this very large data set and the relatively large number of measured ages. In other applications, we expect the infinite-dimensional methods will have substantially greater power than the conventional matrix-based ones (see Kirkpatrick & Heckman, 1989). Further­more, the infinite-dimensional methods estimate the full eigenfunctions rather than a series of points along them.

7. DiscUlssion

The methods developed here complement those developed earlier for estimating and analysing the structure of variation in traits that change with age. The tcchnique of asymmetric coefficients may lead to smoother and more accurate estimates of the co­variance and correlation functions whenever we are willing to allow there to be a crease (that is, a discontinuous first derivative) along its diagonal. The technique of extrapolating to the diagonal allows one to correct for biases that inflate the diagonal elements of a covariance matrix (the variances). The eigenvalues and eigenfunctions of covariance functions estimated by either method can be calculated numerically to reveal dominant components of variation and trade­offs.

A question common to all of these methods is how to decide the appropriate degree of smoothing for the estimate of the covariance function. Kirkpatrick et af.

(1990) developed a goodness-of-fit test, and suggested we choose the smoothest function (the one with the smallest number of orthogonal polynomials) that does not differ significantly from the original data matrix. Our analysis of phenotypic variation in lactation shows that this criterion is not ahvays adequate. The function that it chooses may not be positive semi­definite, and therefore not qualify as a covariance function. Our preference in this case is for a smoother estimate of the covariance function that is better behaved (Fig. 3). Although it does differ significantly from the original data, the discrepancies are small.

An issue related to smoothing involves the positive definiteness of the covariance functions estimated by these methods. A covariance function, by definition, must be positive semidefinite. Estimates of covariance functions, like estimates of covariance matrices, can violate this requirement. Even if the matrix on which it is based is positive semi-definite, there is no certainty that the covariance function estimated by inter­polating between the points of the matrix will be. The problem is illustrated in Fig. 3 by the estimates with k = 9. One might choose to use positive semi­definiteness as one of the criteria for choosing among estimates of the covariance function that differ in the

GRH 64

Page 81: Statistical Genetics and Gaussian Stochastic Processes

M. Kirkpatrick, W. G. Hill and R. Thompson

degree of smoothing. One might expect smoother fits generally to be less prone to violate positive semi­definiteness if the original data matrix does not.

Alternative methods for fitting functions might lead to better-behaved estimates of covariance functions, for example that are smoother, that conform better to the data, and that are less likely to violate the requirement of positive semidefmiteness. Polynomials, which are the basis for the estimates in this paper, are very often wiggly and can be poorly behaved when used for extrapolation. Other methods such as two dimensional splines are available (Lancaster & Salkauskas, 1986). They might lead to improved estimates.

One clear opportunity for alternative methods involves our method for extrapolating to the diagonal of the covariance function. The algorithm we de­veloped produces a covariance function estimate that has a crease (a discontinuous first derivative) along its diagonal. Cnfortunately, trajectories corresponding to covariance functions of this sort are not smooth: they are continuous, but do not have continuous derivatives (Soong, 1973, chapter 4). We usually expect on biological grounds that a growth process will be smooth. \Ve therefore would prefer an estimate with continuous derivatives everywhere. It may be possible to extend the approach described here to cure this weakness in our method. In any event, we suspect that this extension typical1y would make only small changes to the quantitative results. The data analysed in Section 6 below suggest that further changes in the variance estimates produced by smoothing the di­agonal crease will be small relative to the corrections made by the algorithm presented here.

Lactation in dairy cattle is an excellent candidate for infinite-dimensional analyses because changes in rate of production throughout lactation are of interest. \Ve would like to maximize production over the whole lactation, and need to be able to predict lactation yield from a small number of records early in lactation, both at the phenotypic level so as to make early culling decisions, and at the genetic level to make early selection decisions. Previous analyses of lactation curves and yield prediction (reviewed by Danell, 1990) have not considered the underlying continuous co­variance structure of the records. The deviation of an individual from the population mean at one or two early ages can be used to estimate its performance at any later age or set of ages by the standard methods of part-whole correlation (see, e.g. Falconer, 1989; VanRaden el al. 1991). Given the economic incentives, the relatively small amount of additional computation required by the infinite-dimensional method seems a small price to pay.

Our results show that allowance for inflation of the phenotypic variance by measurement error and date­specific effects (such as illness and weather) needs to be made in computing the underlying phenotypic covariance structure. Analyses of daily milk records

66

have shown that almost all of the increase is associated with the variance of the daily record., but some residual effects span a few days. For example, the phenotypic correlations of r<!cords 1, 2, 10 and 30 days apart were 0'84, 0'82, 0-79 and 0-75 in a small data set (Pander et aI. 1993). Test day records used in the present analysis were approximately 30 d apart, and the increase in diagonal elements of 30 % or so (Fig. 3) correspond to these figures. In the repeata bility model commonly used in the analysis of quantitative traits with multiple records (e.g. Falconer, 1989, chap. 8) it is assumed that #(11' '2) = rVp for all t1 =!= [2' and that 2I'(t1 , ( 1 ) = Vp , where r is the repeatability. In our analysis we allow for this inflation of the variance but also for continuous changes in the covariance over the lactation.

The genetic analysis shows that, although there is substantial additive genetic variation, about 90 % of it is associated with the first eigenfunction, which is positive at all ages (Fig. 5). Tradeoffs are seen in the second eigenfunction, which shows opposite effects on production before and after the fourth month of testing. This eigenfunctions, however, accounts for less than 7 % of the genetic variation. The present analysis therefore formalizes what is known from examination of the genetic correlation matrix of test day records, which shows high positive values through­out (e.g. Pander et al. 1992), that selection on records from the first few months of production will have little negative consequences on performance in later lac­tation. A further development of the methods would involve defining joint covariance functions of yield of milk and, for example, of proportion of fat in the first lactation and (requiring more change in structure) of milk yield in the first and in later lactations.

Two directions for future work are suggested by this work. First, covariance functions would be better fitted assuming that the error structure of the estimated variances and covariances followed a multivariate Wishart distribution using likelihood to evaluate the fit. This approach requires numerical iteration. A covariance function estimate based on the methods from this paper (using least squares, assuming normally-distributed errors) would be a logical initial point for the iterations.

The ultimate extension would be a method in which the covariance function is estimated directly from the original observations, without the intermediate of a covariance matrix. In the analyses above, measure­ments records for individuals were grouped into 1-month categories for analysis. These pooled data were than analysed to give estimated covariance matrices, which in turn were analysed by the infinite­dimensional methods. A more direct approach would avoid the pooling entirely and instead treat each record according to the individual's actual age (or number of days since lactation began). Such a method might well increase the precision of covariance function estimates.

Page 82: Statistical Genetics and Gaussian Stochastic Processes

b!finite-dimensional cows

We thank Nick Barton and two anonymous referees for comments. This work was supported by NlH grant 45226-01 and NSF grant 9107140 to M.K.

Appendix A

The appendix describes the method of asymmetric coefficients. Programs for this analysis have been implemented in a lvlathematicdJpy notebook that is available on request from the senior author.

Here we 'will follow Kirkpatrick et al. (1990) by fitting orthogonal functions. These functions, denoted 9i (i = 0,1, ... ), are orthogonal over the interval [u, u]. In the examples discussed in the text and below we use Legendre polynomials, in which case u = -1 and v = 1.

The method of asymmetric coefficients then pro­ceeds according to the following seven steps.

(i) Form the vector p by stacking the successive columns of the lower left diagonal part of the phenotypic covariance matrix:

(A 1)

(ii) Form the coefficient vector c by stacking the successive columns of upper left diagonal part of the matrix c:

c = (Coo, C 10' ••• , Ck - 1 ,O' COl. Cll"'" C If - 1 . P ..• ,

CO. k lr". (A 2)

This contrasts with the method of symmetric coef­ficients, in which the coefficient vector is formed from the lower diagonal parts of the columns of C, in the same way that p is formed according to eqn (A 1). Note that the subscripts run from 0 to k -1 rather from 1 to k in order to conform with the conventional numbering of the orthogonal functions.

(iii) The estimated phenotypic covariance matrix P is based on measurements at n ages; these ages form the age vector a. We use a to calculate the adjusted age vector a*:

(A 3)

j = 1,2, ... , n. This operation rescales the range of the measured ages to the range of the orthogonal functions.

(iv) The next step is to form the matrix X from the orthogonal functions. The way in which the vectors p and c are formed makes the notation for specifying the elements of X somewhat awkward. \Ve begin by defining four "index functions', which are integer­valued functions that generate appropriate subscripts for the orthogonal functions and the adjusted age vector. The first two index functions are used to generate the subscripts for the matrix C as they appear on the right hand side of (A 2). The index function 11 (i, k) is based on the sequence

0, 1, ... ,k-l,0, 1, ... ,k-2, ... ,0. (A 4)

67

The value of 11 (i, k) is given by the ith element of (A 4). For example, 11(5,3) = 1. The function Iz(i,k) is based on the sequence

0,0, ... ,0,1,1, ... ,1, ... ,k-l, (A 5)

in which there are k Os, (k-l) Is, etc. The value of 12(i,k) is given by the ith element of A 5; thus 12(6,3) = 2.

The third index function is 13 (i, k), and is based on the sequence

1,2, ... , k, 2, 3, ... , k, ... , k. (A 6)

The value of Iii, k) is given by the ith element of (A 6). For example, 1a(4, 3) = 2. The last two index functions are simply an incremented and decremented version of 12 and 13:

14(i,k) = 1k,k)-;- 1,

I5U, k) = T3U, k)-1.

(A 7)

(A 8)

For all five index functions, the argument i can take on the values i = 1,2, ... , k(k+ 1)/2.

With these definitions in hand, we now calculate the matrix X:

(A 9)

where i= ,2, ... ,n(n+1)/2 and j= 1,2, ... , k(k + 1)/2. By way of comparison, the previous method of symmetric coefficients calls for X to be of the form

Xij = 915U,k)(a1,u, n) ¢I,(j . .,)(ai',(i,n)

+ 91.(j, k)(a;3(i, n») 915U, ",)(at(i. H) / I + J[I2(j, k),

15(j, k)], (A 10)

where o[s, t] = 1 if s = t and is 0 otherwise. (v) For a full estimate of &, we solve for c using the

relation

(A 11)

Alternatively, we may be interested in a reduced estimate of &, in which the number of orthogonal coefficients that are fit is smaller than the number of observations (k < n). To do so, we first calculate the matrix V that is the error covariance matrix cor­responding to the vector p; details are given in Kirkpatrick et al. (1990). We than calculate c using weighted least squares:

c = (XT V-I X)-I X T V- 1 p. (A 12)

(vi) We now form the estimated coefficient matrix C by unstacking the vector c (that is, performing the reverse of the operation described by eqn (A 2). Using the notation of the index functions, the elements of C are found by plugging successive values of i into the relation

(A 13)

where i = 1, 2, ... , k(k - 1) /2; all other elements of C are O.

5-2

Page 83: Statistical Genetics and Gaussian Stochastic Processes

M. Kirkpatrick, W. G. Hill and R. Thompson

(vii) The coefficient matrix generates our estimate for the covariance function:

(A 14)

where

(A 15)

The form ofeqn (A 14) guarantees that.#is symmetric, as reg uired by one of the defining criteria of covariance functions. In the case of a reduced fit (k < 11), the consistency with the original data matrix can be tested for statistically significant deviation from the data using the goodness-of-fit test described in Kirkpatrick et a!. (1990, Appendix C).

0) A worked example

The method will be illustrated by the worked example presented by the covariance matrix of text eqn (5). We will use Legendre polynomials for the fit, which are defined over the interval [- 1, 1], so 11 = - 1 and v = 1.

Following Step (i) above, stack successive columns of the subdiagona1 part of P to form the vector

p = (3,2, 3V. (A 16)

From Step (ii), the (unknown) vector of coefficients is

(A 17)

From Step (iii), the adjusted age vector is

a* = (-1, (A 18)

\Ve form the matrix X from the orthogonal functions as described in Step (iv):

c¢"CCn 9o(a;t')] [91(a;") 9o(ai)]

X [9o(ap 9o(an] [91 (a;) 90(a;")J [9ua; ) 90(a:)] [91(a:J ¢o(a~)]

2 ( \,'3\ \-2)

( _ \/3) \ 2 j

1 \,3 (-~) 2 2 \ 2

(A 19)

1 \3 \ 3 -

2 2 2

The orthogonal coefficients for the full fit are calculated as described in Step (v):

c = X-I P = (6, - v'3/3,\,3/3)T. (A 20)

By unstacking this vector according to Step (vi), we find the estimated coefficients matrix given by text eqn

68

(7). Finally, using that result in Step (vii) we arrive at the estimated covariance function given by text eqn (8).

Appendix B

The technique of extrapolating to the diagonal makes use of the asymmetric coefficient fit described in Appendix A. Programs that run this analysis have been developed in a Mathematica@) notebook that is available Crom the senior author on request.

As with the earlier methods, we are interested in fitting the data using k orthogonal functions. But because we are not using the diagonal elements of Pin the fit, we now require k < n rather than k ~ n, The algorithm proceeds by the following steps.

(i) Form the vector p by stacking the subdiagonal parts of the columns of P:

This vector is of length nn-l)/2. (ii) Form the coefficient vector c according to eqn

(A 2). (iii) We calculate two adjusted age vectors a* and

b*:

(B 2a)

and

(B 2b)

i = 1, 2, ... , n - 1. (iv) Form the matrix X:

(B 3)

where

Is(i,n) = 13 (i,n-l)+1, (B4)

i= 1,2, ... ,I1(n-1)/2, andj= 1,2, ... ,k(k+1)/2. v) As in the previous section, the coefficients are

calculated using eqn (A 11) for a full estimate, or eqn (A 12) for a reduced estimate. Note, however, that because only the off-diagonal elements of P are being used, a full fit implies k = n - 1 rather than k = n polynomials (and likewise a reduced fit implies k < n -1). A reduced fit can be tested for consistency with the original data using the goodness-of-fit test de­scribed in Kirkpatrick et al. (1990, Appendix C). When extrapolating to the diagonal, however, the values along the diagonal are omitted from the test.

(vi) The coefficient matrix C is formed from the vector c via eqn (A 13).

(vii) Finally, the estimated covariance function is obtained by substituting C into eqn (A 14) using

(B Sa)

Page 84: Statistical Genetics and Gaussian Stochastic Processes

Infinite-dimensional cmvs

and

where t 1 and ! 2 range over the interval [a" az]'

(i) A Ivorked example

\Ve will demonstrate the method of extrapolating to the diagonal using the covariance matrix given in text eqn (11), based on measurements taken at the ages

a = (l0, 11, 15)1'.

(This age vector will illustrate how the infinite­dimensional method naturally accommodates unequal spacing of ages.) \Ve will calculate a full estimate oL~ (that is k = 11 - 1 = 2), again using Legendre poly­nomials.

The vectors p, c, a*, and h* arc

and

a* = h* = ( - 1, 1 r. Steps (iii-(v) produce the same values for X, c, and C found in the example of Section 2. Last, we follow Step (vi) to obtain the estimate of the covariance function given by text eqn 12).

References

Danell, B. (1990). Genetic aspects of different parts of the lactation. Proceedings of' the 4th Vvorld Congress on

69

Genetic Applications to Lh'estock Production, Edinburgh, 14,114-117.

Falconer, D. (1989). Introductioll to Quantitati1'e Genetics, 3rd edition. Kew York: Longman.

Gomulkiewicz, R. & Kirkpatrick, ~1. (1992). Quantitative genetics and the evolution of reaction norms. EL'o/ution 46. 390-411.

Kirkpatrick, M. & Hel:kman, K. (1989). A quantitative genetic model for growth, shape and other infinite­dimensional characters. JOllrnal of'il4athematical Biologr 27,429-450.

Kirkpatrick, M., Lofsvold, D. & Bulmer, M. (1990). Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124, 979-993.

Kirkpatrick, M. & Lofsvold, D. (1992). Measuring selection and constraint in the evolution of growth. El'olution 46, 954-971.

Lancaster, P. & Salkauskas, K. (1986). Cune and Surface Fitting: An Introduction. London: Academic Press.

Meyer, K. (1985). Maximum likelihood estimation of variance components for a multivariate mixed model with equal design matrices. Biometrics 41, 153-165.

Pander, B. L., Hill, W. G. & Thompson. R. (1992). Genetic parameters of test day records of British Holstein-Friesian heifers. Animal Production 55, 11-21.

Pander, B. L., Thompson, R. & HilL W. G. (1993). The effect of increasing the interval between recordings on genetic parameters of test day yields of British Holstcin­Friesian heifers. Anima! Production 56,159-164.

Pander, B. L., Thompson, R. & Hill, W. G. (1993). Phenotypic correlations among daily records of milk yields. Indian Journal of Animal Sciences 63, 1282-1286.

Patterson, H. D. & Thompson, R. (1971). Recovery of interblock information when block sizes are unequal. Biomerrika 58, 545-554.

Soong, T. T. (1973). Random DifFerentia! Equations in Science and Engineering. New York: Academic Press.

Van Raden, P. M., Wiggans, G. R. & Ernst, C. A. (1991). Expansion of projected lactation yield to sta bilize genetic variance. Journal of Dair:r Scif:'rlce 74, 4344-4349.

\Volfram, S. (1991). Mathematica: A System for Doing lHathematics h)' COrrJpUler, Second edition. Redwood CilY: Addison-Wesley.

Page 85: Statistical Genetics and Gaussian Stochastic Processes

Copyright 1999 by the Genetics Society of America

The Genetic Analysis of Age-Dependent Traits:Modeling the Character Process

Scott D. Pletcher*,† and Charles J. Geyer†

*Department of Ecology, Evolution and Behavior and †School of Statistics, University of Minnesota, Saint Paul, Minnesota 55108

Manuscript received March 15, 1999Accepted for publication June 22, 1999

ABSTRACTThe extension of classical quantitative genetics to deal with function-valued characters (also called

infinite-dimensional characters) such as growth curves, mortality curves, and reaction norms, was begunby Kirkpatrick and co-workers. In this theory, the analogs of variance components for single traits arecovariance functions for function-valued traits. In the approach presented here, we employ a variety ofparametric models for covariance functions that have a number of desirable properties: the functions (1)are positive definite, (2) can be estimated using procedures like those currently used for single traits, (3)have a small number of parameters, and (4) allow simple hypotheses to be easily tested. The methodsare illustrated using data from a large experiment that examined the effects of spontaneous mutationson age-specific mortality rates in Drosophila melanogaster. Our methods are shown to work better than astandard multivariate analysis, which assumes the character value at each age is a distinct character.Advantages over existing methods that model covariance functions as a series of orthogonal polynomialsare discussed.

SINCE the introduction of quantitative genetics the- function of some independent and continuous variable.ory and methods to the study of evolution, a tremen- More specifically, a function-valued trait is a function

dous body of literature has developed, documenting x(t). In all of the work that has been done on function-patterns of quantitative genetic variation within and be- valued traits, including ours, both the independent vari-tween species for a wide variety of continuous characters able t and the dependent variable x(t) are single valued.(Barton and Turelli 1989; Falconer 1989; Lynch These traits have also been called infinite-dimensionaland Walsh 1998). Evolutionary biologists use this infor- traits (Kirkpatrick and Heckman 1989) because themation to predict how a population might respond to character can take on a value at an infinite number ofnatural or artificial selection and to provide insight into ages. In principle, there is no reason why our methodsthe contributions of the various evolutionary processes or those of other workers in this area cannot be ex-to the levels of genetic variation seen in natural popula- tended to allow t or x(t) or both to be multivariate. Fortions (Lande 1979, 1982; Houle 1992). Empirical esti- the case of univariate t and x(t), we think “functionmates of genetic variances in single traits and genetic valued” is the more descriptive term. It avoids confusioncovariances between traits have contributed greatly to with characters that are described by a multidimensionalour knowledge of the evolution of biological characters. t or x(t). For specificity, we always refer to the indepen-

Classical quantitative genetics theory covers the analy- dent variable t as time or age, although there is nosis of a single quantitative trait, such as bristle number reason why it cannot be any continuous variable.in Drosophila, or at most a few traits. However, many In cases where the functional nature of the trait isinteresting characters are inherently too complex to be of interest, classical methods are often employed bydescribed by classical theory. Most often this is because treating arbitrary, discrete age intervals as unique char-it is difficult to describe the character of interest by a acters in a multivariate analysis (Hughes and Charles-single value. Examples can be found in the field of life worth 1994; Promislow et al. 1996; Tatar et al. 1996;history evolution, where traits change over the lifetime Pletcher et al. 1998). This approach is problematic.of an individual. In fact, in many cases it is the change As the number of ages of interest increases the abilityof the character with age that is the primary interest to produce precise estimates of statistical parameters(Hughes and Charlesworth 1994; Promislow et al. is rapidly lost (Shaw 1987, 1991). In addition, when1996; Pletcher et al. 1998). measurements are taken at irregular intervals, one

Function-valued traits are characters that change as a might reasonably expect the trait to be more similarbetween ages separated by a short time as compared withmore disparate ages. A standard variance component

Corresponding author: Scott D. Pletcher, Max Planck Institute for analysis ignores this type of information.Demographic Research, Doberaner Str. 114, D-18057 Rostock, Germany.E-mail: [email protected] Recognizing the limits of the classical approach, Kirk-

Genetics 151: 825–835 ( October 1999)

Page 86: Statistical Genetics and Gaussian Stochastic Processes

826 S. D. Pletcher and C. J. Geyer

patrick and Heckman (1989) formulated a quantita- Cov(eij, ekl) 5 dikεjl, (2a)tive genetic model for function-valued traits, which has

where dik are the elements of the identity matrix (dik 5since served as the foundation for numerous theoretical

1 if i 5 k, and dik 5 0 otherwise), andand experimental investigations in this area. On thetheoretical side, age-specific selection on a character Cov(gij, gkl) 5 rikgjl, (2b)and its interactions with genetic constraints have re-

where the rij are the coefficients of relationship (ele-ceived considerable attention (Kirkpatrick et al. 1990;

ments of the A matrix) and the gjl and εjl are parametersKirkpatrick and Lofsvold 1992). The evolution of

to be estimated. Making matrices G and E with elementsreaction norms over continuous environments has also

gjl and εjl allows us to write the matrix equationbeen studied (Gomulkiewicz and Kirkpatrick 1992).On the experimental side, estimates of genetic variation Var(x) 5 A ^ G 1 I ^ E, (3)for age-dependent growth patterns in birds (Gebhardt-

where ^ denotes the Kronecker product of matricesHenrich and Marks 1993; Bjorklund 1997), mice

(Searle et al. 1992, pp. 443 ff.) and x is a vector con-(Kirkpatrick et al. 1990; Meyer and Hill 1997), and

taining all data on all individuals in the order x11, x12,livestock (Kirkpatrick 1997) have been published.. . . , x21, x22, . . . . The matrices G and E are symmetric

Moreover, the recent interest in age-specific compo-m 3 m matrices if there are m traits, and each has m(m 1

nents of genetic variation for other life-history charac-1)/2 independent parameters. Statistical inference

ters (Engstrom et al. 1989; Houle et al. 1994; Hughesabout the G matrix and the constraints it imposes on

and Charlesworth 1994; Promislow et al. 1996;the dynamics of phenotypic evolution is the primary

Pletcher et al. 1998) suggests that interest in function-interest in these analyses (Lande 1979, 1982).

valued traits is growing.Function-valued traits add an additional level of com-

A quantitative genetics theory for function-valuedplication. Now for individual i the trait is a function

traits is a straightforward extension to standard method-xi(t) of the continuous variable t. Equations 2a and 2b

ology. Classical quantitative genetics partitions an ob-are replaced by

servable trait asCovei(s), ek(t) 5 dikE(s, t) (4a)

x 5 m 1 g 1 e, (1)Covgi(s), gk(t) 5 rikG(s, t). (4b)

where m is the mean (fixed effect) and g and e areThe primary interest in analyses of function-valued traitsthe genetic and environmental components (randomis statistical inference about the “G function,” G(s, t),effects). Assuming no gene-environment interaction, galso called the additive genetic covariance function. The “Eand e are independent, hencefunction,” E(s, t), also called the environmental covariance

Var(x) 5 Var(g) 1 Var(e). function, is of lesser interest.In practice, data are only observed at a finite set of

If xi, etc. denote the effects for individual i, the simplest times t1, . . . , tm, rather than a continuum, so we haveassumptions are that ei and ej are uncorrelated if i ≠ j only a finite set of data on each individual, which weand that Cov(gi, gj) is proportional to the coefficient of can consider as a multivariate trait vector xi(t1), . . . ,relationship of i and j (Falconer 1989, pp. 111 ff., xi(tm). Although in theory the trait has a continuous Gespecially p. 156). Making a matrix A of the coefficients function, in practice the covariance structure is de-of relationship (the so-called numerator relationship ma- scribed by a “G matrix.” The elements of the G matrixtrix) allows us to write the matrix equation are genetic covariances between the trait measured at

different ages. The key idea here is that the elementsVar(x) 5 s2gA 1 s2

eI,of the G matrix do not consist of unique parameters

where I is the identity matrix and s2g and s2

e are two for all variances and covariances. Instead, all elementsparameters to be estimated, the genetic and environ- of this matrix are obtained from a single G function.mental variances. Thus, the finite dimensional G matrix for the character

More complex genetic models partition the genetic process model has elements defined by gjl 5 G(tj, tl). Aeffect into additive, dominance, and other effects similar argument applies for the “E matrix.” Given the(Lynch and Walsh 1998). All the theory and examples new parameterization of the G and E matrices, Equationin this article consider only additive models. Extension 3 again describes the variance of the observed pheno-of our methods to include dominance and other effects type considered as a multivariate trait vector xi(tj).is theoretically straightforward (though no doubt some Is that all there is to function-valued traits? It appearspractical difficulties will arise). as though we have simply redefined the problem. Al-

When more than one trait is modeled, we have covari- though in principle there is a G function G(s, t), inances among traits as well as among individuals (Shaw practice there is only a G matrix G(tj, tl). Is anything1987, 1991). If xij, etc., now denote the effects for individ- new introduced by talking about function-valued traits?

The answer is “yes,” because classical multivariate meth-ual i and trait j, the simplest assumptions are now

Page 87: Statistical Genetics and Gaussian Stochastic Processes

827Genetics of Function-Valued Traits

ods run into intractable difficulties when there are many oN

i51oN

j51

bibjrX(ti, tj) $ 0. (7)traits. Even five traits are trouble (Shaw et al. 1995;Shaw and Geyer 1997). Function-valued traits are often

Most quantitative genetics theory is based on the as-observed at many times (or many values of t if t is notsumption that the character of interest or some transfor-time), too many for classical multivariate quantitativemation of it is normally distributed (Lynch and Walshgenetics to cope with.1998). This assumption can be extended to a characterSome new idea has to be added to manage the param-process by utilizing the theory of Gaussian processeseter explosion, m(m 1 1) parameters to estimate in the(Hoel et al. 1972; Kirkpatrick and Heckman 1989).genetic covariance matrix alone if data are observed atA stochastic process X(t), t P T, is called a Gaussianm times. In the theory of function-valued characters,process if the vector (X(t1), X(t2), . . . , X(tm)) has athe number of parameters in the finite dimensional Gmultivariate normal distribution for every choice ofmatrix is equal to the number of parameters in the Gtimes t1, . . . , tm (Hoel et al. 1972). As with any Gaussianfunction—this is independent of the number of agesrandom variable, the distribution of a Gaussian processexamined, and the task is to model and estimate the Gis completely determined by its mean and covariancefunction. There are two possible approaches: paramet-function.ric and nonparametric. This article explores the use of

Using the language of Gaussian processes, we canparametric models for the G function. Kirkpatrick andnow complete our description of quantitative geneticsco-workers and followers use an approach that is non-for function-valued traits. We assume the observed phe-parametric in spirit, although for most experimentalnotypic character process X(t) is a Gaussian process anddesigns it is missing some important features that onecan be decomposed analogous to (1) asexpects in a nonparametric statistical method.

In the following sections we provide a brief review of X(t) 5 m(t) 1 g(t) 1 e(t), (8)the seminal work in this area, while focusing on the

where m(t) is a nonrandom function, the mean functiondifferences between previous work and our own. Weof X(t), and g(t) and e(t) are mean-zero Gaussian pro-present representative examples from an extensive se-cesses that are independent of each other and haveries of simulations in which we compared our approachcovariance functions G(s, t) and E(s, t), respectively.with those suggested previously. We then illustrate theBy the independence of g(t) and e(t), the covariancevarious techniques using real data on mortality rates infunction of X(t) is given by P(s, t) asfemale Drosophila. Last, we summarize some of the

benefits of our character process model over previous P(s, t) 5 G(s, t) 1 E(s, t). (9)methods and suggest promising avenues for future theo-retical development. Each individual has a different realization of the charac-

ter processes X(t), g(t), and e(t). The covariance of theprocesses for different individuals we have already de-

GENERAL CONSIDERATIONSrived as (4a) and (4b).

The probabilistic framework for modeling a function- Thus the character process approach, also called func-valued trait is based on the theories of stochastic pro- tion-valued quantitative genetics, can be simply butcesses. A stochastic process can be defined as a set of briefly described as replacing the Gaussian random vari-random variables X(t), t P T, where T is a subset of the ables or random vectors of classical quantitative geneticsreal line and termed the time parameter set (Hoel et by Gaussian stochastic processes and proceeding mutatisal. 1972). A specific realization of a process (i.e., the mutandis. What we have described so far includes allvalues of the random variables at each t) is called a sample approaches to function-valued quantitative genetics:path of that process. We are interested in processes with that of Kirkpatrick and co-workers, that of Meyer andfinite variance, i.e., for which EX(t)2 , ∞, the so-called Hill (1997), and ours. The differences are in how thesecond-order processes. In such cases, we can define a G and E functions are modeled and in how the modelsmean function of the process by are fitted to data.

mX(t) 5 EX(t), t P T (5)

and a covariance function of the process by NONPARAMETRICS AND ORTHOGONALPOLYNOMIALS

rX(s, t) 5 CovX(s), X(t), s, t P T. (6)In the approaches of Kirkpatrick and co-workers and

Equation 5 is the function describing how the expected of Meyer and Hill, the G and E functions are modeledvalue of the character changes with age, and (6) de- by a linear combination of orthogonal Legendre polyno-scribes the covariance between the character at two sepa- mialsrate ages. The covariance function must be nonnegativedefinite, that is, for any finite set of times (t1 . . . tN) and G(s, t) 5 o

m

i50om

j50

φi(s)φj(t)kij, (10)any real numbers (b1 . . . bN),

Page 88: Statistical Genetics and Gaussian Stochastic Processes

828 S. D. Pletcher and C. J. Geyer

where G is the covariance function, m determines the have a large number of parameters, most of whichnumber of polynomial terms used in the model, kij are have no simple interpretation. Specific age-depen-unknown parameters to be estimated (the coefficients dent hypotheses are not easily tested.of the linear combination), and φi is the ith Legendre

We avoid these problems by using parametric modelspolynomial (Kirkpatrick and Heckman 1989; Kirk-for the G and E functions. We discuss a large familypatrick et al. 1990). A similar model is used for the Eof parametric models, each with a small number offunction.interpretable parameters, that satisfy theoretical re-Kirkpatrick and co-workers used fitting proceduresquirements and that as a group exhibit a wide varietythat are no longer recommended, being supersededof behaviors. We (like Meyer and Hill 1997) use MLby the methods of Meyer and Hill (1997), who usedto estimate parameters. C code, implementing theserestricted maximum likelihood (REML). Meyer and Hillprocedures, is available from the first author.estimated the parameters of the model (i.e., the kij in

Equation 10) for each model with a fixed set of Le-gendre polynomials, which corresponds to fixing m in(10). They then used likelihood-ratio tests to determine PARAMETRIC CHARACTER PROCESS MODELSa value of m that adequately fits the data.

Useful parametric models for covariance functionsWe have no argument with model fitting by maximumare limited by several theoretical requirements. First,likelihood (ML) or REML, but we propose a differentcovariance functions must be positive semidefinite, i.e.,way of modeling G and E functions. Covariance func-satisfy Equation 7. Second, biological processes are ex-tions modeled with Legendre polynomials (or otherpected to be reasonably smooth, requiring their covari-orthogonal polynomials) have a number of potentialance functions to be smooth as well (Hoel et al. 1972).drawbacks.If a Gaussian stochastic process is to be considered

1. They are not automatically positive semidefinite. Al- smooth, it will have differentiable sample paths, and sothough constrained ML or REML can be used to must its covariance function. In general the covarianceimpose this condition, this greatly complicates hy- function has twice as many derivatives as the processpothesis testing and other statistical procedures. itself (Hoel et al. 1972). Thus, because we expect biolog-

2. Legendre polynomials have no theoretical justifica- ical processes to be relatively smooth, we choose covari-tion other than being one among many sets of or- ance function models that are highly differentiable.thogonal basis functions. Third, it is desirable for the covariance function to have

3. Polynomials do not fit covariance functions well. parameters with biologically meaningful interpretationsPolynomials of high degree are extremely “wiggly” so that interesting hypotheses can be easily tested.and do not have asymptotes. Sensible covariance With these considerations in mind, we first concen-functions are extremely smooth and typically trate on a simple model of a character process that

nevertheless may adequately represent many age-depen-G(s, t) → 0, as |s 2 t| → ∞dent traits. We assume each process X(t) is second-order

(an asymptote). stationary, which means4. For the majority of genetic studies, trying to be non-

parametric about the covariance function of an un- mX(t) is independent of t andobservable stochastic process may be optimistic. In rX(s, t) is a function of s 2 ttime-series analysis and spatial statistics, where the

(Hoel et al. 1972). This stationarity assumption is neces-stochastic process is observed directly, the most suc-sary for several fundamental results, but it is relaxedcessful methods use parametric models [e.g., autore-later. Second-order stationarity requires that the meangressive integrated moving average (ARIMA) model-value of the trait must not change with age and that theing of time series and variogram estimation in spatialcovariance between the value of the character at twostatistics]. Experience in spatial statistics shows thatdifferent ages depends only on the time distance be-the behavior of the covariance function at pointstween the age classes.closely related in time determines most of the behav-

For stationary models, the choice of a covariance func-ior of the process, and it is difficult to distinguishtion is greatly simplified by Bochner’s theorem (Hoel etdifferent behaviors in the tails of the covariance func-al. 1972), which asserts that a strictly positive covariancetion (Cressie 1993, section 3.2.1). It is even morefunction is necessarily proportional to the characteristicdifficult if the stochastic process is unobserved likefunction of some probability distribution. Thus, imme-the genetic and environmental processes in quantita-diately we have a long menu of potential covariancetive genetics. For realistic experimental designs,functions from which to choose, as any real-valued char-there is not enough information in the data for goodacteristic function of a probability distribution is al-nonparametric estimation.

5. Polynomial models for covariance functions often lowed. A number of satisfactory functions are presented

Page 89: Statistical Genetics and Gaussian Stochastic Processes

829Genetics of Function-Valued Traits

TABLE 1 (rather than covariance) stationarity. This relaxationallows variance to change with age. If rX(s 2 t) is theCovariance functions for the character process modelcorrelation function of a second-order stationary pro-cess and v(t) is an arbitrary function, thenName Covariance function

rX(s, t) 5 v(s)v(t)rX(s 2 t) (11)Standard normal u0 exp(2uct 2)

Cauchy is a valid covariance function. Thus we can choose rX(t)u0

1 1 uct 2 to be any of the functions in Table 1 with the additionalrestriction that u0 5 1 [so that the correlation of X(t)Bilateral exponential u0 exp(2uc|t|)with itself is 1] and choose v(t) completely arbitrarilyHyperbolic cosine u0

cosh(puct/2) and still obtain a reasonable model. Although the modelhas stationary correlation, the variance

Characteristic function of a u0sin(uct)uctuniform distribution VarX(t) 5 v(t)2

is not stationary and can be specified as we please.Characteristic function of a u0[1 2 cos(uct)]u2

ct2triangular distribution Hypotheses concerning the pattern of change in age-specific variances (genetic and otherwise) for a given

Characteristic function of a u0 exp(2uc|t|a) character can be examined using this model.general stable distribution The parameters of the model are estimated straight-

forwardly using ML or REML. The reason, as mentionedValid covariance functions derived from the characteristicfunctions of various probability distributions. The parameters in the Introduction, is that the character process is onlysatisfy u0 . 0, uc . 0, and 0 , a # 2. Characteristic functions observed at a finite set of times; hence the observationswere taken from Feller (1968). More general covariance form a multivariate normal random vector with meanfunctions can be obtained by replacing u0 with a more general

and covariance that are specified by the models for thevariance function (see text).mean function and G and E covariance functions. Inprinciple the estimation procedure is no different fromclassical quantitative genetics of multivariate traits. Onlyin Table 1. In many cases the characteristic function

of one probability distribution is proportional to the the model specification is new. In practice, however,the ideas of the character process model use reasonableprobability density function of another. In such cases

we refer to the covariance function by the name of assumptions to reduce the dimension of the parameterspace and make an age-dependent quantitative analysisthe distribution with the proportional density. In cases

where there is no such distribution, the covariance func- of the trait possible.tion is specifically referred to as the characteristic func-tion of its parent distribution. The available functions

EXAMPLESexhibit a wide variety of behaviors, and some can benegative in sign. Simulation study: We investigated the behavior of the

character process and orthogonal polynomial (OP)Although the assumptions of stationarity are ratherstrict, we can use the results for stationary processes models through extensive simulations. Three represen-

tative examples are provided in this section. For eachto formulate models that account for age-dependentchanges in the mean value of the character and that example, a single data set was generated assuming a

standard half-sib design (Lynch and Walsh 1998) inallow for more general covariance functions. The sim-plest way to achieve first-order stationarity (i.e., a con- which 20 sires were each mated to three dams and three

progeny were measured from each dam. We assumedstant mean over time) is to model the mean separatelyas in (8), where g(t) and e(t) have mean zero for all the character of interest was measured at 10 regularly

spaced ages denoted 1, . . . , 10. It is important to notet, hence are first-order stationary. The nonstochasticfunction m(t), analogous to fixed effects in classical that such a balanced design is not required for applying

these methods. Unequal family structure, as well as irreg-quantitative genetics, models the mean behavior. Analternative to modeling the mean function directly is to ularly spaced measurements, are perfectly accept-

able, although different designs will contain differentuse methods analogous to those used to remove trendsin time series (Box et al. 1994), such as differencing amounts of genetic information (Shaw 1987). Details

of the simulation procedure are available from the firstthe series (replacing the value at time t by Xt11 2 Xt),and more generally using “integrated” models, such as author.

Because they are unobserved, we have no way of know-ARIMA.A relaxation of second-order stationarity—the condi- ing what a typical genetic covariance function might

look like. Therefore, these examples are rather arbitrarytion that requires the covariance of the process betweenages t1 and t2 to be only a function of |t1 2 t2|—that still and serve mainly to illustrate the relationship between

the character process and OP models. We present threegives relatively simple models is second-order correlation

Page 90: Statistical Genetics and Gaussian Stochastic Processes

830 S. D. Pletcher and C. J. Geyer

Figure 2.—(A) Actual genetic covariance surface for simu-Figure 1.—(A) Actual genetic covariance surface for simu-lated data from case II: constant genetic variance and slowlylated data from case I: constant genetic variance and rapidlydeclining covariance. The form of the covariance function isdeclining covariance. The form of the covariance function isG(t1, t2) 5 0.5e20.01(t12t2)2. (B) Lack of fit of an estimated geneticG(t1, t2) 5 0.5e20.7(t12t2)2. (B) Lack of fit of an estimated geneticcovariance surface for a model consisting of three orthogonalcovariance surface for a model consisting of five orthogonalpolynomials. Lack of fit is defined as the absolute differencepolynomials. Lack of fit is defined as the absolute differencebetween the estimated surface and the actual surface. Darkerbetween the estimated surface and the actual surface. Darkerregions indicate greater lack of fit.regions indicate greater lack of fit.

cally uncorrelated (or nearly so), OP models provide arelatively simple cases: case I, genetic variance is con-poor estimate of the covariance function (Figure 1).stant across all ages, and genetic covariance declinesThe five-polynomial model was determined to providevery quickly between adjacent ages; case II, genetic vari-an adequate fit to the data via likelihood-ratio tests (aance is constant across all ages, and genetic correlationsix-polynomial model did not fit significantly better),declines very slowly; case III, the genetic covarianceand although the fit is quite poor, genetic variances arefunction is composed of four OPs (giving a covarianceestimated more accurately than covariances (Figure 1b).function of degree three).In our experience this is to be expected when covari-Figures 1–3 present the actual covariance functionsances decline asymptotically toward zero within thefor each of the three cases along with contour plotsrange of the data. The wiggly nature of the polynomialdescribing the fit of different models to the simulatedmodel has difficulty reproducing such a structure. Thedata. The contour plots display the absolute differenceOP model does a much better job of describing thebetween the fitted surface and the actual surface, withcovariance structure when genetic correlations are highdarker regions indicating regions of poor fit and lighterbetween all ages in the data (Figure 2). In this case, theregions indicating regions of better fit. Contour shadingthree-OP model was determined as the best fit, andis constant over all figures, allowing comparisons be-it does a reasonable job of estimating the covariancetween them.

When character values at different ages are geneti- structure. The fits of the character process models are

Page 91: Statistical Genetics and Gaussian Stochastic Processes

831Genetics of Function-Valued Traits

not presented for these two examples. They are ex- structure of the genetic covariances. Nevertheless, thefit of the character process model is not terrible, andpected to fit well (and do) because they were used to

generate the data. essentially smooths over the undulations in the actualfunction. Surprisingly, the OP model has some difficultyFigure 3 presents a genetic covariance function gener-

ated directly from a four-OP model. In this case, it was reproducing the covariance structure. This is likely dueto the number of parameters in the model (10) andthe character process model (a linear variance model

with normal correlation) that had trouble capturing the the size of the simulated experiment. Even when theform of the underlying covariance function is knownprecisely, most experiments will not provide enoughinformation to accurately estimate even a moderatenumber of parameters.

In summary, OP models do not accurately describethe structure of the genetic covariance function whenthe genetic correlation is expected to decline signifi-cantly with age. We argued (see above) that it is thesetypes of covariance functions that one might expectfrom natural stochastic processes. For relatively simplecovariance structures, however, the OP models accu-rately estimate the surfaces (Figure 2). Flexibility fromthe range of allowable character process models allowsa reasonable approximation to the actual covariancestructure even when it is very irregular (Figure 3). More-over, Figures 1–3 suggest that a significant strength ofthe character process model is its separation of variancefunctions from correlation functions. In all the exam-ples, the majority of lack of fit is in the covariance (notvariance) structure, suggesting the overall fit of themodel is determined primarily by estimates of age-spe-cific variances.

Age-specific mortality rates in Drosophila: In this ex-ample, our goal is to estimate the genetic covariancestructure for age-specific mortality rates in lines of Dro-sophila melanogaster allowed to accumulate spontaneousmutations for 19 generations (Pletcher et al. 1998).The data are mortality rate estimates (5-day intervals) for29 mutation-accumulation lines. For each accumulationline there are four mortality observations at each age,and mortality rates are presented for six different ages.A logarithmic transformation was used to normalize thedata (Promislow et al. 1996; Pletcher et al. 1998). Inthis example, log-mortality rates are examined throughage 30 days posteclosion. Data from the oldest ages wereexcluded because estimates of genetic variances andcovariances among these ages were extremely imprecisewhen estimated using standard methods, and often this

Figure 3.—(A) Actual genetic covariance surface for simu-lated data from case III: genetic covariance function basedon four orthogonal polynomials. (B) Lack of fit of an estimatedgenetic covariance surface using a character process modelwith a linear variance and normal correlation function. (C)Lack of fit of an estimated genetic covariance surface for amodel consisting of four orthogonal polynomials (the sameform used to generate the data). For both B and C, lack offit is defined as the absolute difference between the estimatedsurface and the actual surface. Darker regions indicate greaterlack of fit.

Page 92: Statistical Genetics and Gaussian Stochastic Processes

832 S. D. Pletcher and C. J. Geyer

TABLE 2hindered our ability to compare estimation methods.Estimates of the mutational covariance structure based Comparison of age-specific genetic variance matriceson the complete data set are presented in a companion estimated by various methodsarticle (Pletcher et al. 1999).

The data set was analyzed using three approaches. Age interval (days)MethodFirst, the genetic covariance structure was estimated(days) 0–4 5–9 10–14 15–19 20–24 25–29completely nonparametrically (i.e., using standard mul-SMtivariate techniques) by specifying a separate parameter

0–4 0.55 0.50 0.48 0.39 0.26 0.11for each age-specific variance and each covariance. Our5–9 0.50 0.51 0.40 0.30 0.17sample size was far too small to estimate all 21 parame-10–14 0.54 0.47 0.40 0.21ters in the 6 3 6 covariance matrix simultaneously, and15–19 0.53 0.49 0.26

we were forced to construct the matrix piecewise—by 20–24 0.51 0.29examining ages two at a time. Pairwise covariances were 25–29 0.16obtained using ML implemented in the program QUER- OPCUS (Shaw 1987; Shaw and Shaw 1992). Second, a 0–4 0.62 0.60 0.52 0.39 0.23 0.04genetic covariance function composed of four Legendre 5–9 0.58 0.51 0.38 0.23 0.05

10–14 0.50 0.46 0.35 0.15polynomials (giving a polynomial of degree three) was15–19 0.52 0.48 0.26estimated using ML procedures similar to those of20–24 0.51 0.30Meyer and Hill (1997). Third, we used the character25–29 0.20process approach to estimate a genetic covariance func-

CPtion based on a quadratic variance function and normal0–4 0.57 0.59 0.55 0.45 0.31 0.17correlation function (see Table 1).5–9 0.66 0.65 0.56 0.42 0.24

The estimated genetic covariance matrices for the 10–14 0.67 0.62 0.49 0.29various methods are presented in Table 2. Although all 15–19 0.60 0.50 0.32procedures appear to capture the dominant aspects of 20–24 0.45 0.31

25–29 0.22the covariance structure, several issues make the charac-ter process approach desirable. First, using standard Genetic covariance (generated by spontaneous mutations)multivariate methods, covariances and their asymptotic for age-specific mortality rates in female Drosophila melanogasterstandard errors were estimated pairwise and are too estimated by standard multivariate methods (SM), orthogonal

polynomials (OP), and the character process model (CP).small when considering the matrix as a whole. DespiteThe SM matrix was estimated “piecewise,” by estimating vari-the small standard errors there is insufficient statisticalances and covariances between pairs of ages. The OP matrixpower to detect a significant change in covariance as was based on a model of four orthogonal polynomials, and

ages become further separated in time (analysis not the CP matrix was based on a quadratic variance and normalshown). Second, because data from each age are consid- correlation function.ered separately, systematic relationships among thecharacters are ignored. Third, the sample size prohibitsestimating the entire 6 3 6 covariance matrix simultane- guaranteed to be positive definite, and data from all

ages are analyzed simultaneously. Standard errors forously, and as a result the “piecewise” matrix (Table 2)is not even positive definite. the parameters of the model are obtained from the

maximization procedure and error estimates on the in-The genetic matrix produced by the four-polynomialmodel is quite similar to that produced by the standard dividual age measures can be easily calculated. Most

covariance functions have relatively few parameters,methods. However, a primary concern remains the num-ber of parameters in the model; we are estimating 10 which are estimated with high precision. Finally, and

perhaps most importantly, the parameters of the modelparameters for the genetic matrix alone. As with thestandard methods, the number of parameters demands have useful interpretations, which allow simple hypothe-

ses to be easily tested.a large sample size for accurate estimation, but unlikethese methods, none of the parameters have a clear To further investigate the behavior of the character

process models, we fit several different covariance func-interpretation. Although we may have asymptotic vari-ance estimates for the coefficients of the OP (as is the tions to the data. In all models, we estimated a nonpara-

metric mean function—average mortality rates at eachcase when ML is used), it is difficult to establish simpletests of interesting hypotheses. For example, the rate of age were estimated simultaneously—to account for the

increase in mortality rates with age. For both the geneticdecline in covariance as ages become further separatedin time is described by a complicated combination of and environmental effects, we examined the fit of covari-

ance functions composed of (in all combinations) threethe coefficients of the polynomial.Many of the problems inherent in the standard and variance functions, the v(t)2 from Equation 11 (con-

stant, linear, and quadratic) and three correlation func-OP methods are alleviated under the character processmodel. The estimated genetic covariance functions are tions, the rX(s 2 t) from Equation 11 (normal, Cauchy,

Page 93: Statistical Genetics and Gaussian Stochastic Processes

833Genetics of Function-Valued Traits

TABLE 3

Character process model estimates for genetic covariance functions

Correlation VarianceFunction Function u0 u1 u2 uc Likelihood

Normal Constant 0.38 — — 0.05 280.65(0.09) (0.02)

Linear 0.92 20.13 — 0.04 276.29(0.26) (0.04) (0.02)

Quadratic 0.40 0.21 20.04 0.03 273.14(0.25) (0.16) (0.02) (0.01)

Cauchy Constant 0.39 — — 0.06 280.93(0.10) (0.03)

Linear 0.94 20.13 — 0.04 276.51(0.26) (0.04) (0.02)

Quadratic 0.47 0.10 20.02 0.04 274.71(0.25) (0.14) (0.02) (0.02)

Uniform Constant 0.38 — — 0.53 280.24(0.09) (0.09)

Linear 0.90 20.12 — 0.47 276.01(0.26) (0.04) (0.10)

Quadratic 0.46 0.10 20.02 0.45 274.03(0.24) (0.14) (0.02) (0.10)

Parameter estimates (standard errors) and the log likelihoods for nine character process models composedof all combinations of three variance and three correlation functions. Variance functions are as follows: constant(u0), linear (u0 1 u1), and quadratic (u0 1 u1t 1 u2t 2). Correlation functions are taken from Table 1 with u0 51.Estimates were obtained using maximum likelihood.

and characteristic function of a uniform). For all analy- is little statistical power to detect subtle differences inthe shapes of the underlying genetic correlation func-ses the constant variance and Cauchy correlation func-

tions were chosen for modeling the environmental co- tion.Hypothesis tests concerning age-specific genetic vari-variance—more complicated covariance functions did

not provide a significantly better fit (details not shown). ance for mortality are easily conducted. ML estimatesare asymptotically normally distributed, and thereforeParameter estimates for the genetic covariance func-

tions are given in Table 3. The dynamics of age-specific their estimated standard errors can be used to constructconfidence intervals and test statistics (Searle et al.genetic variance can be determined using likelihood-

ratio tests. Given a specific correlation function, twice 1992). Further, the significantly improved fit of the qua-dratic variance function over the constant and linearthe difference in log likelihoods between a more general

variance model (e.g., quadratic variance) and a more functions provides strong evidence for interestingchanges in mutational properties across ages, althoughconstrained model (e.g., linear variance) has a chi-

square distribution with degrees of freedom equal to the the low variance at ages 25–29 days may be driving thisresult. Such statements could not be made from thenumber of additional parameters in the more general

model. The P-values for the test that a quadratic variance results of standard methods or from the fit of OPs.The hypothesis that most mutations affect mortalityfunction fits better than a linear one are 0.01 for the

normal correlation function, 0.06 for the Cauchy, and equally at all ages can be tested by asking if the correla-tion in mortality rates between various ages is different0.05 for the characteristic function of the uniform (the

deviances being 6.3, 3.6, and 3.96, respectively, all from unity. Because, for all character process models,uc (see Table 1) is the rate of decrease in correlation withasymptotically chi-square on 1 d.f.). A cubic variance

function did not provide a significantly better fit to the time, testing whether this value is significantly differentfrom zero directly addresses this hypothesis. The param-data.

Given a particular model for the variance function, eter is significantly greater than zero in all models (P ,0.05), providing strong evidence that the majority ofthere is little difference between the fits of the correla-

tion functions. For example, the log-likelihood values measured mutations exhibit some form of age speci-ficity.for the normal, Cauchy, and uniform correlation func-

tions with a quadratic variance function are 273.14, Despite the twofold increase in the number of param-eters, a covariance function based on four OP did not274.71, and 274.03, respectively. Although a rigorous

test of non-nested hypotheses such as these is rather provide a significantly better fit than the best-fit functionfrom the character process model. Using two popularcomplicated (see Cox 1961, 1962), it is clear that there

Page 94: Statistical Genetics and Gaussian Stochastic Processes

834 S. D. Pletcher and C. J. Geyer

criteria, Akaike information criterion (AIC) and Bayes- Many of the problems with OPs were recognized bythe original authors, and it has been suggested thatian information criterion (BIC; Schwarz 1978; Stone

1979), any of the character process models with a qua- more advanced “smoothing” techniques, such as cubic-splines or wavelets, might be more well behaved (Kirk-dratic variance function would be chosen over the best

OP model (data not shown). patrick et al. 1994). This is a promising avenue forfuture research. Good parametric and nonparametricapproaches complement one another. The strengths of

DISCUSSIONthe parametric approach are its great efficiency and itsease of interpretation. Unfortunately, if the assumedThe quantitative genetic analysis of function-valued

traits, such as growth and mortality curves, starts with model is grossly incorrect, inferences can be misleading(Simonoff 1996). Good nonparametrics are less reliantthe fundamental recognition by Kirkpatrick and

Heckman (1989) that the genetic and environmental on assumptions about the formal structure of the data.They do, however, require large sample sizes, muchcomponents of such traits should be modeled as

Gaussian stochastic processes. It continues with the rec- larger than many of the most ambitious quantitativegenetic studies. If there is insufficient information inognition by Meyer and Hill (1997) that ML or REML

can be used to fit such models, just as it can be used the data to support the accurate estimation of manyparameters, one is essentially left with a bad parametricfor all other quantitative genetics models. Our contribu-

tion to the subject is a method of finding valid paramet- model.An equally promising direction for the future mightric models for covariance functions of these Gaussian

processes from theory in spatial and time-series statistics, be the extension of our techniques to examine the rela-tionship between multiple character processes. Two-where it is widely used (Cressie 1993, section 2.5.1).

These parametric models for covariance functions character processes can be examined by estimatingco-variance functions for each character and a cross-have many virtues. They are assured to be positive defi-

nite, hence valid covariance functions. They can be cho- covariance function between the two (Kirkpatrick 1988).The approach is analogous to estimating the geneticsen to be highly differentiable, implying the character

process itself is smooth, which we expect from a biologi- covariance between two different characters, except inthis case the covariance is estimated for the value of thecal process. They have a small number of parameters,

and models can be chosen to address specific biological two characters at every combination of the two ages.In this way age-dependent genetic constraints on thehypotheses. Moreover, the flexibility of the approach

means reasonable fits are obtained even when the actual independent evolution of the two traits can be explored.covariance function is highly irregular (Figure 3). Comments provided by J. Curtsinger, R. Shaw, G. Oehlert, R. Lande,

It is important to recognize that parametric models M. Kirkpatrick, A. Clark, and an anonymous reviewer greatly improvedthe quality and clarity of the manuscript. M. Kirkpatrick generouslyhave certain limitations. Although we have argued thatprovided creative discussion throughout the development of this work.our covariance functions are reasonable models, veri-This work was supported by National Institutes of Health grants AG-fying the assumptions of the models, particularly sta-0871 and Ag-11722 to J. Curtsinger and by the University of Minnesota

tionarity in correlation, is exceedingly difficult (Math- Graduate School.eron 1988). Stationarity will, however, often be a goodapproximation; and as George Box asserted, all modelsare wrong, but some are useful (Box 1976). Kirkpatrick LITERATURE CITEDand colleagues often focus on characterizing the domi-

Barton, N. H., and M. Turelli, 1989 Evolutionary quantitativenant eigenfunctions of the genetic covariance function, genetics: how little do we know? Annu. Rev. Genet. 23: 337–370.

Bjorklund, M., 1997 Variation in growth in the blue tit (Paruswhich are thought to summarize patterns of geneticcaeruleus). J. Evol. Biol. 10: 139–155.variation (Kirkpatrick et al. 1990). Although we have

Box, G. E. P., 1976 Science and statistics. J. Am. Stat. Assoc. 71:not pursued it here, it is likely that for a particular covari- 791–802.

Box, G. E. P., G. Jenkins and G. C. Reinsel, 1994 Time Series Analysis:ance function, the eigenfunctions are somewhat limitedForecasting and Control, Ed. 3. Prentice Hall, Englewood Cliffs, NJ.in their range of behaviors. One may argue, however,

Cox, D. R., 1961 Tests of separate families of hypotheses. Proc. 4ththat the process of choosing a good model in effect Berkeley Symp. 1: 105–123.

Cox, D. R., 1962 Further results on tests of separate families ofsearches a large space of possible eigenfunctions.hypotheses. J. R. Stat. Soc. B 24: 406–424.Implementing a nonparametric approach using Le-

Cressie, N. A., 1993 Statistics for Spatial Data. John Wiley and Sons,gendre polynomials (Kirkpatrick and Heckman 1989) New York.

Engstrom, G., L. E. Lilijedahl, M. Rasmuson and T. Bjorklund,is problematic. Subsequent covariance functions are not1989 Expression of genetic and environmental variation duringnecessarily positive definite. Simple simulations showageing: 1. Estimation of variance components for number of

that polynomials of low degree do not closely approxi- adult offspring in Drosophila melanogaster. Theor. Appl. Genet. 77:119–122.mate reasonable covariance functions unless character

Falconer, D. S., 1989 Introduction to Quantitative Genetics, Ed. 3.values at all measured ages are highly correlated (Fig-Longman, New York.

ures 1 and 2). Polynomials of high degree have many Feller, W., 1968 An Introduction to Probability Theory and its Applica-tions, Vol. 1, Ed. 3. John Wiley and Sons, New York.parameters, more than are necessary to fit data.

Page 95: Statistical Genetics and Gaussian Stochastic Processes

835Genetics of Function-Valued Traits

Gebhardt-Henrich, S. G., and H. L. Marks, 1993 Heritabilities of Matheron, G., 1988 Estimating and Choosing: An Essay on Probabilitygrowth curve parameters and age-specific expression of genetic in Practice. Springer-Verlag, New York.variation under two different feeding regimes in Japanese quail Meyer, K., and W. G. Hill, 1997 Estimation of genetic and pheno-(Coturnix coturnix japonica). Genet. Res. 62: 42–55. typic covariance functions for longitudinal or ‘repeated’ records

Gomulkiewicz, R., and M. Kirkpatrick, 1992 Quantitative genetics by restricted maximum likelihood. Livest. Prod. Sci. 47: 185–200.and the evolution of reaction norms. Evolution 46: 390–411. Pletcher, S. D., D. Houle and J. W. Curtsinger, 1998 Age-specific

Hoel, P. G., S. C. Port and C. Stone, 1972 Introduction to Stochastic properties of spontaneous mutations affecting mortality in Dro-Processes. Houghton Mifflin, Boston. sophila melanogaster. Genetics 148: 287–303.

Houle, D., 1992 Comparing evolvability and variability of quantita- Pletcher, S. D., D. Houle and J. W. Curtsinger, 1999 The evolu-tive traits. Genetics 130: 195–204. tion of age-specific mortality rates in Drosophila melanogaster: ge-

Houle, D., K. A. Hughes, D. K. Hoffmaster, J. Ihara, S. Assima- netic divergence among unselected lines. Genetics 153: 813–823.copoulos et al., 1994 The effects of spontaneous mutation on Promislow, D. E. L., M. Tatar, A. A. Khazaeli and J. W. Curtsinger,quantitative traits. I. Variances and covariances of life history 1996 Age-specific patterns of genetic variation in Drosophila mela-traits. Genetics 138: 773–785. nogaster. I. Mortality. Genetics 143: 839–848.

Hughes, K. A., and B. Charlesworth, 1994 A genetic analysis of Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat.senescence in Drosophila. Nature 367: 64–66. 6: 461–464.Kirkpatrick, M., 1988 The evolution of size in size-structured popu-

Searle, S. R., G. Casella and C. E. McCulloch, 1992 Variancelations, pp. 13–28 in The Dynamics of Size-Structured Populations,Components. Wiley and Sons, New York.edited by B. Ebenman and L. Persson. Springer-Verlag, Heidel-

Shaw, F. H., and C. J. Geyer, 1997 Estimation and testing in con-berg, Germany.strained covariance component models. Biometrika 84: 95–102.Kirkpatrick, M., 1997 Genetic improvement of livestock growth

Shaw, R. G., 1987 Maximum-likelihood approaches applied to quan-using infinite-dimensional analysis. Anim. Biotech. 8: 55–56.titative genetics of natural populations. Evolution 41: 812–826.Kirkpatrick, M., and N. Heckman, 1989 A quantitative genetic

Shaw, R. G., 1991 The comparison of quantitative genetic parame-model for growth, shape, reaction norms, and other infinite-ters between populations. Evolution 45: 143–151.dimensional characters. J. Math. Biol. 27: 429–450.

Shaw, R. G., and F. H. Shaw, 1992 QUERCUS: programs for quanti-Kirkpatrick, M., and D. Lofsvold, 1992 Measuring selection andtative genetic analysis using maximum likelihood.constraint in the evolution of growth. Evolution 46: 954–971.

Shaw, R. G., G. A. J. Platenkamp, F. H. Shaw and R. H. Podolsky,Kirkpatrick, M., D. Lofsvold and M. Bulmer, 1990 Analysis of1995 Quantitative genetics of response to competitors in Ne-the inheritance, selection and evolution of growth trajectories.mophila menziesii: a field experiment. Genetics 139: 397–406.Genetics 124: 979–993.

Simonoff, J. S., 1996 Smoothing Methods in Statistics. Springer-Verlag,Kirkpatrick, M., W. G. Hill and R. Thompson, 1994 Estimatingthe covariance structure of traits during growth and ageing, illus- New York.trated with lactation in dairy cattle. Genet. Res. 64: 57–69. Stone, M., 1979 Comments on model selection criteria of Akaike

Lande, R., 1979 Quantitative genetic analysis of multivariable evolu- and Schwarz. J. R. Stat. Soc. Ser. B 41: 276–278.tion, applied to brain:body size allometry. Evolution 33: 402–416. Tatar, M., D. E. L. Promislow, A. A. Khazaeli and J. W. Curtsinger,

Lande, R., 1982 A quantitative genetic theory of life history evolu- 1996 Age-specific patterns of genetic variation in Drosophila mela-tion. Ecology 63: 607–615. nogaster. II. Fecundity and its genetic correlation with mortality.

Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Genetics 143: 849–858.Traits. Sinauer Associates, Sunderland, MA.

Communicating editor: A. G. Clark

Page 96: Statistical Genetics and Gaussian Stochastic Processes
Page 97: Statistical Genetics and Gaussian Stochastic Processes

Copyright 2000 by the Genetics Society of America

Statistical Models for Estimating the Genetic Basis of Repeated Measuresand Other Function-Valued Traits

Florence Jaffrezic* and Scott D. Pletcher†

*Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, Scotland and †Max PlanckInstitute of Demographic Research, D-18057 Rostock, Germany

Manuscript received February 18, 2000Accepted for publication June 26, 2000

ABSTRACTThe genetic analysis of characters that are best considered as functions of some independent and

continuous variable, such as age, can be a complicated matter, and a simple and efficient procedure isdesirable. Three methods are common in the literature: random regression, orthogonal polynomialapproximation, and character process models. The goals of this article are (i) to clarify the relationshipsbetween these methods; (ii) to develop a general extension of the character process model that relaxescorrelation stationarity, its most stringent assumption; and (iii) to compare and contrast the techniquesand evaluate their performance across a range of actual and simulated data. We find that the characterprocess model, as described in 1999 by Pletcher and Geyer, is the most successful method of analysis forthe range of data examined in this study. It provides a reasonable description of a wide range of differentcovariance structures, and it results in the best models for actual data. Our analysis suggests geneticvariance for Drosophila mortality declines with age, while genetic variance is constant at all ages forreproductive output. For growth in beef cattle, however, genetic variance increases linearly from birth,and genetic correlations are high across all observed ages.

Asimple and efficient procedure for the genetic anal- 1998). Third, the character process model was recentlyproposed by Pletcher and Geyer (1999) and is basedysis of characters that change as a function of age

(or some other independent and continuous variable) on theories of stochastic processes. We develop andconsider a general extension of the process model tois desirable for researchers in several fields of biology

and genetics. Plant and animal breeders are often faced take advantage of new methods for estimating compli-cated correlation structures. Each of these methods haswith the genetic analysis of “repeated measures” data,

such as lactation in dairy cows or growth rates in impor- been implemented in relatively easy to use computersoftware packages, and they are freely available.tant agricultural species. Biologists interested in the

evolution of life histories study the genetic basis of The aim of this article is to compare and contrastrandom regression, orthogonal polynomials, and char-age-specific fitness components, such as survival or re-

productive output, while evolutionary ecologists often acter process models and evaluate their performance.We focus first on examining the underlying assumptionsexamine the genetic relationship between values of a

single character expressed over a continuous range of of the three methods, while emphasizing fundamentalsimilarities and differences when appropriate. Next, weenvironmental variables.

Recent conceptual and computational advancements explore a variety of simulated data sets and describe thetypes of covariance structures (genetic, environmental,have made the genetic analysis of such function-valuedand otherwise) accommodated by each method. Last,traits readily accessible. Three methods have been ad-using empirical data on age-specific mortality and repro-vanced in the literature. First, random regression mod-ductive output in the fruit fly Drosophila melanogaster andels have been widely used for the analysis of longitudinalon growth in beef cattle, we evaluate the ability of eachdata in the traditional statistical literature (Diggle etmodel to adequately fit empirical data.al. 1994) and recently have been applied in the animal

breeding context (Jamrozik et al. 1997b). Second, theuse of orthogonal polynomials to approximate covari- THE GENETIC ANALYSIS OFance matrices was initially suggested by Kirkpatrick FUNCTION-VALUED TRAITSand Heckman (1989) and is closely related to the ran-

Detailed descriptions of the extension of classicaldom regression models (Meyer and Hill 1997; Meyerquantitative genetics to the analysis of function-valuedtraits is given in Kirkpatrick and Heckman (1989)and Pletcher and Geyer (1999). In short, the method

Corresponding author: Scott D. Pletcher, Department of Biology, Wolf-assumes the observed character is best described by ason House, 4 Stephenson Way, University College, London NW1 2HE,

England. E-mail: [email protected] function (or stochastic process) of some independent

Genetics 156: 913–922 (October 2000)

Page 98: Statistical Genetics and Gaussian Stochastic Processes

914 F. Jaffrezic and S. D. Pletcher

and continuous variable. Although any continuous vari- acter process model (Pletcher and Geyer 1999). Allthree methods are based on likelihood estimation—able is acceptable (e.g., the level of some environmental

factor), the most common is age, and all of the examples although the orthogonal polynomial approach was orig-inally published as a least squares estimation (Kirkpat-in this article focus on characters that change with age.

Further, it is assumed that the character values at each rick et al. 1990).Random regression: Random regression (RR) modelsage constitute a multivariate normal distribution on

some scale. employ parametric forms for the unobserved functionsin (1). Although traditionally a parametric mean curveAs with traditional quantitative genetics, it is assumed

that the observed phenotypic trajectory of the character is often used to estimate m(t), this is not essential. How-ever, the individual deviations from this curve [i.e., theis random and influenced by one or more unobservable

factors. In the simplest case one might consider the g(t) and e(t)] are assumed to be parametric functionsof time, and polynomials are often used. For example,additive contribution of many genes along with unpre-

dictable environmental effects. More complicated mod- the age-dependent deviations from the populationmean due to an individual’s genotype might be linearels involving interactions among different genes or spe-

cific environmental effects (e.g., maternal effects) are in time, such thatstraightforward, although computational difficulties will

g(t) 5 a1 1 a2t,likely arise. For the additive model, we assume the ob-served phenotype can be decomposed as where the ai are random genetic regression coefficients.

The regression coefficients are unobservable randomX(t) 5 m(t) 1 g(t) 1 e(t) 1 ε, (1)effects; they have a specific value for each individual;

where m(t) is a nonrandom function, the genotypic and they are assumed to be multivariate normally distrib-mean function of X(t), and g(t) and e(t) are Gaussian uted. The environmental deviations, e(t), are assumedrandom functions, which are independent of one an- independent of the genetic effects, and they are mod-other and have an expected value of zero at each age eled similarly.(Kirkpatrick and Heckman 1989; Pletcher and Genetic and environmental covariances as a functionGeyer 1999). They represent the age-dependent ge- of age are determined by the variances and covariancesnetic and environmental deviations, respectively. In this among the regression coefficients. Following the exam-context, e(t) is often referred to as the permanent envi- ple presented above, the genetic covariance betweenronmental effect and ε is the residual variation—-ε is ages s and t isassumed normally distributed with constant and un-

G(s, t) 5 Cov(g(s), g(t))known variance over time. The original developmentof the character process (CP) model did not include a 5 Cov(a1 1 a2s, a1 1 a2t)residual variance term (Pletcher and Geyer 1999).

5 Var(a1) 1 (s 1 t)Cov(a1, a2) 1 st Var(a2).Recently, however, we have found that data sets thatexhibit a great deal of measurement error support a (3)residual variance.

The primary objective in these models is to chooseThe goal of the analysis is to decompose the observedthe most appropriate parametric functions for the ge-variation in X(t) into its genetic and environmental con-netic and the permanent environmental deviations. Intributions by estimating covariance functions for g(t) andmany cases the parametric functions are nested ande(t). A covariance function, r(s, t), is a bivariate continu-likelihood-ratio testing can be used. Since this involvesous function that describes the covariance between anytesting the significance of parameters on the boundarytwo ages, r(s, t) 5 Cov X (s), X (t). By the independenceof their feasible parameter space, the test statistics areof g(t) and e(t), the phenotypic covariance function ofoften mixtures of chi-square distributions (Stram andX(t) is given by P(s, t) asLee 1994).

P(s, t) 5 G(s, t) 1 E(s, t), (2) Character process model: In contrast to the RR mod-els, the character process model does not attempt towhere G(s, t) is the genetic covariance function, and E(s, t)model the forms of the g(t) or e(t) functions. Instead,the environmental covariance function, which also includesparametric models for the covariance functions them-the residual variance. These functions are estimable viaselves [i.e., G(s, t) and E(s, t) in Equation 2] are themaximum likelihood (ML) or restricted maximum like-target of analysis (Pletcher and Geyer 1999).lihood (REML) when there are data on individuals of

Again taking the genetic covariance function as anvarious relatedness (Lynch and Walsh 1998; Pletcherexample, the covariance function can be decomposedand Geyer 1999).intoThere have been at least three different methods sug-

gested for estimating the desired covariance functions: G(s, t) 5 vG(s)vG(t)rG(|s 2 t|), (4)orthogonal polynomials (Kirkpatrick and Heckman1989), random regression (Meyer 1998), and the char- where vG(t)2 describes how the genetic variance changes

Page 99: Statistical Genetics and Gaussian Stochastic Processes

915Genetic Analysis of Function-Valued Traits

with age and rG(|s 2 t|) describes the genetic correlation G(s, t) 5 om

i50om

j50

φi(s)φj(t)kij, (6)between two ages. There are no restrictions on the formof vG(·), and it is often modeled using simple polynomi- where m determines the number of polynomial termsals (linear, quadratic, etc.). As presented in Pletcher used in the model, kij are the m(m 1 1)/2 unknownand Geyer (1999), the character process model assumes parameters to be estimated (the coefficients of the lin-correlation stationarity; i.e., the correlation between two ear combination), and φi is the ith Legendre polynomialages is assumed to be a function only of the time distance (Kirkpatrick et al. 1990). The environmental covari-(|s 2 t|) between them. Although strictly speaking this ance function is modeled similarly. Meyer and Hillassumption is almost surely wrong, experience suggests (1997) present a method for estimating covariance func-that it is expected to provide a reasonable approxima- tions such as (6) directly from the data using REML.tion in most cases (Pletcher and Geyer 1999). The As originally presented, the orthogonal polynomialbenefit of correlation stationarity is that it allows numer- approach is similar in spirit to the CP model, and bothous choices for r(·), all of which satisfy several theoreti- differ in principle from the RR approach. In the RRcal requirements (Pletcher and Geyer 1999). methods, the primary model development occurs at the

We suggest an extension of the character process level of individual deviations (Equation 1). The analystmodel for nonstationary correlations using a method begins by considering the behavior of individual age-proposed by Nunez-Anton (1998) and Nunez-Anton specific deviations. The resulting covariance structureand Zimmerman (2000) in what they term structured is a consequence of these deviations. For the CP andantedependence models. The idea is to implement a OP models, the situation is reversed. The analyst beginsnonlinear transformation upon the time axis, f(t), such by considering the structure of the covariance matrixthat correlation stationarity holds on the transformed (Equation 2), and the shapes of the individual devia-scale—on the original scale the correlation is nonsta- tions are a consequence of this structure. In some casestionary. The correlation function is then defined as it may be possible to expose a duality between the two, asr(s, t) 5 r(|f (s) 2 f (t)|), and the functions suggested Meyer (1998) has done for certain RR and OP models.by Pletcher and Geyer (1999) remain valid. Ideally When the data are collected at equally spaced intervals,the transformation function should contain a small CP models with a constant variance and an absolutenumber of parameters with interpretable effects. exponential correlation (r(s, t) 5 uc

|s2t|) function areNunez-Anton and Zimmerman (2000) suggest a Box- equivalent to an autoregressive model of order 1. At

Cox power transformation such that present, however, analytical difficulties preclude moregeneral results for the character process models.

f l(t) 5

(tl 2 1)/l if l ? 0

log t if l 5 0, (5)EXAMPLES AND ANALYSES

where l is a parameter to be estimated. ConsideringEstimation procedures: All models were estimated us-an absolute exponential correlation function, r(s, t) 5

ing REML. In all cases a nonparametric mean functionu|f(s)2f(t)|, the correlations on the subdiagonals are mono-was used (i.e., a separate mean was fitted for each distincttone increasing if l , 1 or monotone decreasing if l .age in the data), which ensures a consistent estimate of1. If l 5 1 the nonstationary model reduces to a station-the covariance structure (Diggle et al. 1994). Compari-ary one. Thus, a likelihood-ratio test of the null hypothe-son among models was based on the Bayesian informa-sis H0: l 5 1.0 can be used to quantitatively examinetion criterion (BIC; Schwarz 1978), which provides forthe extent of nonstationarity in the data. Additionallikelihood-based comparison among nonnested mod-flexibility in the nonstationary pattern might beels. BIC isachieved by considering more than one parameter l.

log-likelihood 2 1⁄2 3 number of parameters in the model 3 log n*,For example, one might incorporate distinct li for dif-ferent values of |s 2 t|, which is equivalent to a separate where n* 5 n 2 p when using REML with n the numberli for each subdiagonal of the covariance structure. of observations in the data set and p the number of

Orthogonal polynomials: Kirkpatrick and Heckman fixed effects. The model selected is the one that maxi-(1989) originally presented the use of orthogonal poly- mizes the criterion.nomials (OPs) as a nonparametric way of “smoothing” To determine the best-fitting model under each tech-previously estimated covariance matrices. This was the nique, a large number of models were fit to each datafirst attempt to formalize the estimation of covariance set. For the character process method, .100 differentfunctions in a genetic context. As with the CP model, models (i.e., different combinations of polynomial vari-the shapes of the individual age-dependent deviations ance functions and stationary and nonstationary correla-were not considered, and models for the structure of tion functions) were investigated, and the best modelthe variance-covariance matrix itself were the focus of was chosen according to the BIC criterion. We choseattention. Kirkpatrick and Heckman (1989) suggest to examine a large number of CP models for reasons

of thoroughness. The CP models are relatively new,that the genetic covariance function be represented as

Page 100: Statistical Genetics and Gaussian Stochastic Processes

916 F. Jaffrezic and S. D. Pletcher

and the behavior of these models is not well known. In with genetic variance function identical to that in thepractice, such an exhaustive search is not required, as stationary CP data, but with an arbitrary nonstationarystandard model selection procedures (e.g., sequential correlation structure (Figure 1B). The environmentaladdition of polynomial terms to the variance function) covariance was assumed identical to that in the station-result in identical conclusions (results not presented). ary CP data. This data set is the nonstationary CP data.For both random regression and orthogonal polynomial The third data set was simulated according to a ran-methods, the appropriate polynomials of increasing de- dom regression model with linear deviations for bothgree were fit until an increase in degree no longer the genetic and environmental parts. The chosen pa-resulted in a significant increase in the log-likelihood rameter values resulted in genetic and environmentalat the a 5 0.05 level (Meyer and Hill 1997). We find correlations that remained quite high over all ages inthat a reasonable approach to model selection requires the data (Figure 1C).on the order of 5–10 model fits for each method. The last data set that we present was simulated ac-

Estimates of the covariance structure based on ran- cording to an OP model, with quadratic Legendre poly-dom regression and orthogonal polynomial methods nomials for the genetic and environmental parts (i.e.,were obtained using the software package ASREML m 5 2 in Equation 6). The shapes of the covariance(Gilmour et al. 1997), while estimates of the character functions were rather undulating, as is expected fromprocess model (and certain orthogonal polynomial functions based on orthogonal polynomials. Parametermodels) were obtained using computer software devel- values were chosen such that the environmental correla-oped by one of the authors (S. Pletcher; C code and tion remained quite high over time while the geneticexecutable files freely available). A series of exploratory correlation was highly nonstationary (Figure 1D).analyses were conducted to ensure the two software To compare the fit of the models we calculated good-packages produced comparable log-likelihoods. A small ness-of-fit statistics for the estimated variance and corre-number of covariance structures could be fitted by both lation functions under each model with respect to thepackages (models of constant variance and correlation simulated structure. Goodness of fit was quantified byacross ages, and small orthogonal polynomial models) the concordance correlation coefficient, rc, describedand these structures were fitted to several data sets. In by Vonesh et al. (1996; see appendix). The possibleall cases, identical log-likelihoods were reported by each values of rc are in the range 21 # rc # 1, with a perfectpackage. fit corresponding to a value of 1 and a lack of fit to

Simulated data: Many data sets were simulated ac- values #0.cording to various covariance structures from CP, RR, Empirical data: Drosophila reproduction and mortality:and OP models. All were built assuming a standard sire

Age-specific measurements of reproduction and mortal-design (i.e., groups of half-sibs) in which 12 offspring

ity rates were obtained from 56 different recombinantfrom each of 70 sires were measured at five differentinbred (RI) lines of D. melanogaster, which are expectedages (Lynch and Walsh 1998). Under such a design,to exhibit genetically based variation in longevity andthe estimated between-sire covariance function is di-reproduction (J. W. Curtsinger and A. A. Khazaeli,rectly proportional to the genetic covariance function.unpublished results). Age-specific measures of mortalityThe environmental covariance function and residualand average female reproductive output were collectederror are estimated based on the within-sire and thesimultaneously from two replicate cohorts for each ofwithin-animal variation. We present the results of four56 RI lines. Deaths were observed every day, while eggrepresentative data sets. Because the magnitudes of thecounts were made every other day. For both mortalityvariance and covariances were different among the sim-and reproduction the data were pooled into 11 5-dayulations, we set the residual variance for all simulationsintervals for analysis. Mortality rates were log trans-to z10% of the total variance at age 0.formed and reproductive measures were square-rootThe first data set was simulated according to a station-transformed to insure the age-specific measures wereary CP covariance structure, the purpose of which wasnormally distributed.to assess the behavior of RR and OP models when the

Growth in beef cattle: These data come from the Wo-genetic correlation decreases to zero within the rangekalup selection experiment in Western Australia andof the data. The genetic covariance function was com-correspond to January weights of 436 beef cows fromposed of a quadratic variance [i.e., a quadratic v2(·)77 sires. Weights were recorded between 19 and 82from Equation 4] and “normal” correlation (r(ti, tj) 5months of age, with up to six records per cow. Analysesexp(20.8(ti 2 tj)2)) (Figure 1A). The environmentalwere carried out within 83 contemporary groups (year-covariance function was composed of a linear variancepaddock-age of weighing subclasses), fitted as fixed ef-and “Cauchy” correlation function (r(ti, tj) 5 1/(1 1fects. Additional information, along with access to the0.05(ti 2 tj)2)) (Pletcher and Geyer 1999). We referdata, can be obtained from Dr. Karin Meyer’s web pageto this data set as the stationary CP data.at the Animal Genetics unit of the University of NewTo examine a well-behaved covariance function with a

somewhat nonstationary correlation, we simulated data England, Australia (http://agbu.une.edu.au/zmeyer).

Page 101: Statistical Genetics and Gaussian Stochastic Processes

917Genetic Analysis of Function-Valued Traits

Figure 1.—Contour plotsof the simulated genetic co-variance structures for (A)data generated accordingto a stationary character pro-cess (CP) model, (B) datasimulated according to aCP model with arbitraryand nonstationary correla-tion (this is a discrete valuematrix rather than a contin-uous function), (C) datagenerated under a randomregression (RR) model withlinear deviations, and (D)data simulated assuming anorthogonal polynomial (OP)model of degree 2.

RESULTS correlation patterns that decrease asymptotically to zerowithin the range of the data, and the correlation obtainedSimulations: For the stationary CP data, the best ran-by both models goes negative (Figure 2).dom regression model according to the BIC criterion

The aim of the second simulated data set was to investi-was characterized by quadratic and linear deviationsgate the behavior of these models in the case of a ratherfor the genetic and environmental parts, respectively.simple nonstationary genetic correlation structure. TheHigher-order polynomials did not converge to a maxi-best RR and OP models were the same as for the stationarymum and could not be considered. The best OP modelCP data detailed in the previous paragraph. The RR modelcontained a cubic polynomial for the genetic covariancedealt very poorly with the nonstationary pattern of theand a quadratic for the environmental part. As expected,genetic correlation (rc 5 0.10); the correlation was esti-the simulated structure was accurately recovered by themated to be very high over all ages. Again, the greaterstationary character process model. Concordance co-number of parameters in the best-fitting OP model overefficients rc describing the goodness of fit for the vari-the regression model provided a better fit to the correla-ance and correlation functions are given in Table 1. Fortion structure (rc 5 0.70). Surprisingly, the CP modelthe RR and OP models, the environmental covariancefailed to accurately estimate the nonstationary correlationstructure (including both the variance and correlation)pattern (Table 1). Our nonstationary extension did notwas very well fitted (rc ≈ 1). The genetic variance wassignificantly improve the goodness of fit (BIC 5 24454also well modeled, but both models had trouble dealingand 24456 for stationary and nonstationary models, re-with the rapidly decreasing genetic correlation function.spectively; P 5 0.052 for a likelihood-ratio test of l 5 1.0).Although the OP model could better estimate the geneticHowever, the goodness of fit of the fitted nonstationarycorrelation (rc 5 0.61 for OP compared to 0.36 for RR),correlation (rc 5 0.55) is substantially better than thatit contains significantly more parameters than the regres-of the stationary model (rc 5 0.03), which provides ansion model (17 vs. 10), and both models exhibit similar

behavior. The polynomial structures are unable to handle interesting commentary on model selection criteria. In

Page 102: Statistical Genetics and Gaussian Stochastic Processes

918 F. Jaffrezic and S. D. Pletcher

TABLE 1

Goodness-of-fit values for covariance functions estimated from threedifferent methods on simulated data

Simulated covariancestructure Model VarG CorrG VarE CorrE BIC

Stationary CP CP 0.98 1.0 1.0 1.0 24591RR 0.96 0.36 0.93 0.87 27414OP 0.98 0.61 0.98 0.98 26605

Nonstationary CP CP 0.91 0.03 0.99 1.0 24454RR 0.95 0.10 0.94 0.81 27397OP 0.84 0.70 0.98 0.97 26628

Random regression CPa 1.0 0.93 0.96 0.93 23817RR 1.0 0.94 0.99 1.0 23803OP 1.0 0.94 0.99 1.0 23803

Orthogonal polynomial CPa 0.86 0.10 0.69 0.94 214334RR 0.30 0.15 0.94 0.90 214371OP 0.99 0.83 0.99 1.0 214272

Concordance values (see appendix) for covariance functions estimated by three different methods on fourrepresentative covariance structures. The methods are CP, the character process model; RR, the randomregression model; and OP, the orthogonal polynomial model. VarG represents the fit to age-specific geneticvariances; CorrG refers to the fit to genetic correlations between ages; VarE represents the fit to environmentalvariances; and CorrE shows the fit to environmental correlations between ages. See text for details of thesimulated covariance structures and details of the best-fitting models for each approach.

a The best-fitting correlation function was a nonstationary CP model.

retrospect, the nonstationarity in this data set was predominantly between extreme ages (ages 1 and 5). It is possible that more observations per individual are needed to detect small to moderate levels of nonstationarity (see fly reproduction data). The genetic variance function and environmental covariance structure were identical to that for the stationary CP data and were well fit by all the methods (Table 1).

All methods did a reasonable job of estimating the genetic and environmental covariance structures generated according to a random regression model with linear deviations. Under this model the correlations (both genetic and environmental) remained quite high over time. Our nonstationary extension of the CP model was successful in providing a good fit to the data. The genetic covariance structure was described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution (Pletcher and Geyer 1999), and the environmental variance function was linear with a Cauchy correlation. The goodness of fit for the genetic correlation structure was improved substantially over a stationary model (r_c = 0.74, BIC = -3819 and r_c = 0.93, BIC = -3817 for the stationary and nonstationary CP models, respectively).

Although we have essentially no idea what a typical age-dependent genetic covariance function might look like, the data set simulated with an OP structure might be considered pathological in that the genetic covariance structure is highly irregular. In fact, the genetic correlation is negative between early ages but highly positive between late ages (Figure 1D). Such a structure is, however, typical for OP models (Kirkpatrick et al. 1994). Convergence problems hindered our ability to obtain estimates of high-dimensional random regression models, and the best RR model was not able to accommodate either the simulated genetic variance or correlation (r_c = 0.30 and r_c = 0.15, respectively). Both the genetic and environmental covariance structures were described by a quadratic variance and nonstationary correlation given by the characteristic function of the Uniform distribution. When compared to random regression, the CP model is much better at estimating the genetic variance function but is slightly worse at approximating the correlation structure (Table 1).

Figure 2.—Genetic correlations between age 1 and other ages for the simulated stationary character process data and fitted genetic correlations obtained from the random regression model with linear deviations and orthogonal polynomial of degree 3.

Page 103: Statistical Genetics and Gaussian Stochastic Processes


TABLE 2

Results of covariance function estimation on empirical data

Method   Genetic         Environmental    NPCov   Log-likelihood   BIC

Fly mortality (N = 1109... see below) -- fly mortality (N = 955), 11 fixed effects
  CP     Quad-Cauchy     Lin-Cauchy         7        -186.0       -247.7
  OP     Cubic           Quadratic         17        -242.1       -338.0
  RR     Quadratic       Quadratic         13        -298.2       -380.4

Fly reproduction (N = 1109), 11 fixed effects
  CP     Const-Exp(a)    Quad-Cauchy(a)     8         494.1        427.5
  OP     Cubic           Quadratic         17         451.4        353.4
  RR     Quadratic       Linear            10         374.0        300.5

Beef cattle growth (N = 1626), 24 fixed effects
  CP     Lin-Exp         Lin-Exp            7       -6895.6      -7010.0
  RR     Constant        Linear             6       -6910.7      -7021.4
  OP     Linear          Linear             8       -6908.3      -7026.4

The best-fitting genetic and environmental covariance functions for three different methods using empirical data on fruit fly mortality and reproduction and growth in beef cattle. Also presented is the log-likelihood of the models at their maximum and the BIC model selection criterion. NPCov represents the number of estimated parameters in the covariance structure for each model. The number of fixed effects reflects the number of different ages at which observations were obtained, and N is the total number of observations. Quad, quadratic; Const, constant; Exp, exponential; Lin, linear.

(a) The best-fitting correlation function was a nonstationary CP model.
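The BIC values reported in Tables 1 and 2 follow the Schwarz (1978) criterion on the scale log L - (p/2) ln N, so larger values indicate better models. A minimal sketch of the computation, under the assumption (consistent with the table to rounding) that p counts the covariance parameters (NPCov) plus the fixed effects:

```python
import math

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Schwarz (1978) criterion in the larger-is-better form:
    log L - (p / 2) * ln(N)."""
    return log_lik - 0.5 * n_params * math.log(n_obs)

# Fly mortality, CP model: p = 7 covariance parameters + 11 fixed effects.
print(bic(-186.0, 7 + 11, 955))  # about -247.8; Table 2 reports -247.7
                                 # from unrounded log-likelihood inputs
```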

The environmental covariance is better behaved and much less of a problem. As seen with the random regression simulations, the strong positive correlations across all ages are well fit by all the methods.

Empirical: Drosophila reproduction and mortality: For age-specific mortality and reproduction in Drosophila, the character process model provided a significantly better fit, according to the BIC criterion, than either the orthogonal polynomial or random regression methods (Table 2). In fact, the CP models achieved higher likelihoods despite containing significantly fewer parameters than the OP or RR models. For age-specific mortality, the best-fitting model for the genetic covariance was a quadratic variance with a Cauchy correlation function (ρG(t_i, t_j) = 1/(1 + θ(t_i − t_j)²)). For fly reproduction the best character process model was a constant variance at all ages coupled with a nonstationary correlation function described by the absolute exponential, ρG(t_i, t_j) = θ^|f(t_i)−f(t_j)| (see text following Equation 5). Parameter estimates and their standard errors for the CP model are presented in Table 3, and the fitted genetic covariance structures are presented in Figure 3, A and B.

The simplicity of the character process model allows quantitative statements about the predominant attributes of the genetic covariance function. Genetic variance for Drosophila mortality declines significantly with age, while genetic variance is constant at all ages for reproductive output. For mortality, the parameter in the genetic correlation function was significantly different from zero (P < 0.0001), suggesting that mortality rates become less genetically correlated as ages become further separated in time. This is true for reproductive output as well, and the significant nonstationary parameter in the genetic correlation provides evidence for an increase in the correlation between two equidistant ages with increasing age.

Beef cattle: Although differences in fit among the methods are less dramatic for beef cattle than for Drosophila, the character process model again provides a significantly better fit (as determined by the BIC criterion) than either random regression or orthogonal polynomial methods (Table 2). The best-fitting model for the genetic part was a linear variance (increasing with age) and an absolute exponential correlation (ρG(t_i, t_j) = θ^|t_i−t_j|). There was no evidence for nonstationarity in the data. Parameter estimates and their standard errors for the CP model are presented in Table 3, and the fitted genetic covariance structure is shown in Figure 3C.

DISCUSSION

The quantitative genetic analysis of repeated measures and other function-valued traits requires the estimation of continuous covariance functions for each source of variation in a particular statistical model. Traditionally, statistical geneticists interested in characters that change gradually along some continuous scale have had to settle for models that are either overparameterized (i.e., standard multivariate methods) or oversimplified (e.g., composite character analysis; Meyer 1998; Pletcher and Geyer 1999). In recent years, however, the introduction and development of random regression models, orthogonal polynomial models, and models based on stochastic

Page 104: Statistical Genetics and Gaussian Stochastic Processes


TABLE 3

Character process model estimates of genetic and environmental covariance functions for empirical data

Parameters   Genetic             Environmental       Residual

Fly mortality
  θ0         0.28 (0.12)         0.53 (0.05)         None
  θ1         0.35 (0.08)        -0.03 (0.007)
  θ2        -0.03 (0.007)        —
  θC         0.10 (0.02)         1.76 (0.29)

Fly reproduction
  θ0         0.18 (0.03)         0.10 (0.02)         None
  θ1         —                  -0.01 (0.01)
  θ2         —                  -0.002 (0.001)
  θC         0.26 (0.15)         4.0 (2.0)
  λ         -0.63 (0.30)         0.51 (0.13)

Beef cattle growth
  θ0         0.0001(a) (186.3)   0.0001(a) (257.8)   1000.8 (85.35)
  θ1         4.12 (6.95)         38.94 (7.77)
  θC         0.99 (0.02)         0.99 (0.003)

Parameter estimates (and standard errors) for the best-fitting character process models for empirical data on fruit fly mortality and reproduction and growth in beef cattle. θ0, θ1, and θ2 represent parameters of the variance function such that a quadratic variance is represented as v²(t) = θ0 + θ1t + θ2t². In cases where the best-fitting model was constant or linear, the appropriate θi are omitted. θC and λ are parameters of the correlation function. A residual term is not always added in the model.

(a) Parameter estimate is at the lower boundary and asymptotic standard errors may not be reliable.

process theory (i.e., the character process model) have provided important alternatives. Other types of random regression models (e.g., nonlinear models as suggested by Lindstrom and Bates 1990 and Davidian and Giltinan 1995) may prove useful, but they are currently difficult to implement.

Through extensive investigation of a variety of simulated covariance structures and empirical data, we find that under most conditions the CP models provide the best description of the underlying covariance structure. It is clear from the simulation results that the CP model is the only method that adequately captures a correlation that declines rapidly to zero as character values become further separated in time. Both random regression models and orthogonal polynomials have noticeable problems approximating such a structure (Table 1, stationary CP data; Figure 2). Polynomials do not have asymptotes, and the rapid decline in correlation tends to force both methods to estimate correlations that are strongly negative within the range of the data. Although the characteristics of covariance functions for natural organisms remain generally unknown, this is a serious limitation, as asymptotic behavior in covariances/correlations is to be expected (Pletcher and Geyer 1999). Other parameterizations of the RR models (e.g., using orthogonal polynomials in the regression) may prove more useful in this regard. On the other hand, RR and OP models work quite well when the correlation structure remains high over time (see Table 1, environmental correlation in CP simulated data).

A further advantage of the CP models appears to be the ability to model the variance and correlation separately. As mentioned previously, for random regression models the entire covariance structure is implicitly determined by the shapes of the regression polynomials, and covariance surfaces described by orthogonal polynomials have a fixed relationship between variance and correlation. This limitation is exemplified in the analysis of growth in beef cattle. For the genetic deviation, the best-fitting RR model included only a random intercept. This implies not only that the variance is considered constant over time but also that the correlation is constant and equal to 1 across all ages, which is probably not appropriate (Figure 3C). Applying the same argument to the fertility data in Drosophila, the best-fitting CP model for the genetic part was a constant variance with a rather rapid decline in correlation between increasingly separated ages (Table 3). Such a combination is simply not possible under the RR or OP methods. It is also likely that the separation of variance and correlation was a major factor contributing to the ability of the CP model to reasonably estimate the genetic variation with a much smaller number of parameters (4 parameters) than random regression (10 parameters) or orthogonal polynomial (17 parameters) models (Table 2).

The data sets we examined were small in comparison to those commonly analyzed in agricultural and breeding contexts. Using extremely large data sets, complicated covariance and correlation models may be of greater use, and the random regression and orthogonal polynomial methods may begin to show an advantage. Large data sets would also relieve the convergence problems we experienced with high-order random regres-

Page 105: Statistical Genetics and Gaussian Stochastic Processes


sion and orthogonal polynomial models. Unfortunately, most quantitative genetic studies of natural and experimental populations are extremely labor intensive, and sample sizes will often be similar to those reported here. For these situations, the properties of the character process models (e.g., easy hypothesis testing, few and interpretable parameters) make it a useful option.

Despite their apparent success in this study, there are several important limitations of the process models that suggest avenues for further development. First, additional ways of relaxing the stationarity assumption (Pletcher and Geyer 1999) without greatly increasing the number of parameters are needed. Although not appropriate in all situations, a promising direction proposed by Nunez-Anton and Zimmerman (2000) has been studied here and seems to offer reasonable flexibility in practice. Second, CP models require the manipulation (inversion, factorization, etc.) of matrices whose dimensions are proportional to the number of ages in the data set, regardless of the size of the model itself (Meyer 1998). A method of reparameterization, similar to that used for RR and OP models (Meyer 1998), would be useful. Third, a method for estimating the eigenfunctions of covariance functions used by the process models would provide insight into patterns of genetic constraints across ages (Kirkpatrick et al. 1990; Kirkpatrick and Lofsvold 1992).

Last, the genetic analysis of two or more function-valued traits is an important goal. Generalization of regression models to multitrait analyses is straightforward and has already been used, for instance, to analyze age-dependent milk production, fat, and protein content in dairy cattle (Jamrozik et al. 1997a). Bivariate character process models might be implemented by defining a parametric cross-covariance function between the two traits, but appropriate forms for this function are yet to be discovered.

W. Hill, N. Barton, and two anonymous reviewers provided valuable comments on the manuscript. Thanks to J. Curtsinger and A. Khazaeli for generously providing published and unpublished data. F.J. thanks the INRA for support during this project.

Figure 3.—Contour plots of genetic covariance functions fitted by the character process model. (A) Age-specific mortality in the fruit fly, Drosophila melanogaster; (B) age-specific reproduction in D. melanogaster; (C) age-specific growth in beef cattle.

LITERATURE CITED

Davidian, M., and D. M. Giltinan, 1995 Nonlinear Models for Repeated Measurement Data. Chapman and Hall, London.
Diggle, P. J., K. Y. Liang and S. L. Zeger, 1994 Analysis of Longitudinal Data. Oxford University Press, Oxford.
Gilmour, A. R., R. Thompson, B. R. Cullis and S. J. Welham, 1997 ASREML Manual. New South Wales Department of Agriculture, Orange, 2800, Australia.
Jamrozik, J., L. Schaeffer, Z. Liu and G. Jansen, 1997a Multiple trait random regression test day model for production traits. Proceedings of 1997 Interbull Meeting, Vol. 16, pp. 43–47.
Jamrozik, J., L. R. Schaeffer and J.-C. M. Dekkers, 1997b Genetic evaluation of dairy cattle using test day yields and random regression model. J. Dairy Sci. 80: 1217–1226.
Kirkpatrick, M., and N. Heckman, 1989 A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. J. Math. Biol. 27: 429–450.
Kirkpatrick, M., and D. Lofsvold, 1992 Measuring selection and constraint in the evolution of growth. Evolution 46: 954–971.
Kirkpatrick, M., D. Lofsvold and M. Bulmer, 1990 Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124: 979–993.
Kirkpatrick, M., W. G. Hill and R. Thompson, 1994 Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genet. Res. 64: 57–69.
Lindstrom, M. J., and D. M. Bates, 1990 Non-linear mixed effects models for repeated measures data. Biometrics 46: 673–687.
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Meyer, K., 1998 Estimating covariance functions for longitudinal data using a random regression model. Genet. Sel. Evol. 30: 221–240.

Page 106: Statistical Genetics and Gaussian Stochastic Processes


Meyer, K., and W. G. Hill, 1997 Estimation of genetic and phenotypic covariance functions for longitudinal or ‘repeated’ records by restricted maximum likelihood. Livest. Prod. Sci. 47: 185–200.
Nunez-Anton, V., 1998 Longitudinal data analysis: non-stationary error structures and antedependent models. Appl. Stochastic Models Data Anal. 13: 279–287.
Nunez-Anton, V., and D. L. Zimmerman, 2000 Modeling nonstationary longitudinal data. Biometrics 56 (in press).
Pletcher, S. D., and C. J. Geyer, 1999 The genetic analysis of age-dependent traits: modeling a character process. Genetics 153: 825–833.
Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461–464.
Stram, D. O., and J. W. Lee, 1994 Variance components testing in the longitudinal and mixed effects model. Biometrics 50: 1171–1177.
Vonesh, E., V. Chinchilli and K. Pu, 1996 Goodness-of-fit in generalized nonlinear mixed-effects models. Biometrics 52: 572–587.

Communicating editor: C. Haley

APPENDIX: GOODNESS OF FIT OF THE COVARIANCE STRUCTURE

The concordance correlation coefficient r_c described by Vonesh et al. (1996) was used in the simulation study to evaluate the goodness of fit for both the variance and correlation functions estimated by the models when compared to the simulated structure. For the correlation structure, for instance, we consider

r_c = 1 - \frac{\sum_{i=1}^{T-1} \sum_{j=i+1}^{T} (y_{ij} - \hat{y}_{ij})^2}{\sum_{i,j} (y_{ij} - \bar{y})^2 + \sum_{i,j} (\hat{y}_{ij} - \bar{\hat{y}})^2 + T(T - 1)(\bar{y} - \bar{\hat{y}})^2 / 2},   (A1)

where \hat{y}_{ij} represents the estimated correlation between times t_i and t_j given by the model and y_{ij} is the correlation between times t_i and t_j in the simulated data. T represents the total number of times at which measurements were taken. \bar{y} and \bar{\hat{y}} are the means of the correlation values for the simulated data and for the model, respectively. The concordance coefficient for the variance estimate is much simpler and given by

r_c = 1 - \frac{\sum_{i=1}^{T} (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2 + \sum_i (\hat{y}_i - \bar{\hat{y}})^2 + T(\bar{y} - \bar{\hat{y}})^2},   (A2)

where the y's now refer to the actual and estimated variances rather than correlations.

The coefficient r_c is directly interpretable as a concordance coefficient between observed and predicted values. It directly measures the level of agreement (concordance) between y_{ij} and \hat{y}_{ij}, and its value is reflected in how well a scatter plot of y_{ij} vs. \hat{y}_{ij} falls about the line of identity. The possible values of r_c are in the range -1 ≤ r_c ≤ 1, with a perfect fit corresponding to a value of 1 and a lack of fit to values ≤ 0.
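Both (A1) and (A2) reduce to a single computation once the T(T − 1)/2 pairwise correlations, or the T age-specific variances, are flattened into vectors, since the last denominator term is just the number of summands times the squared difference of the two means. A minimal sketch of this calculation (the function name and interface are ours):

```python
import numpy as np

def concordance(y, yhat):
    """Concordance coefficient of Vonesh et al. (1996); see (A1) and (A2).
    y, yhat: simulated and fitted values (pairwise correlations or
    age-specific variances), flattened into equal-length arrays."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    ybar, yhatbar = y.mean(), yhat.mean()
    num = np.sum((y - yhat) ** 2)
    den = (np.sum((y - ybar) ** 2) + np.sum((yhat - yhatbar) ** 2)
           + len(y) * (ybar - yhatbar) ** 2)
    return 1.0 - num / den

print(concordance([0.9, 0.7, 0.5], [0.9, 0.7, 0.5]))  # perfect fit -> 1.0
```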

Page 107: Statistical Genetics and Gaussian Stochastic Processes

BIOMETRICS 58, 157-162 March 2002

Generalized Character Process Models: Estimating the Genetic Basis of Traits That Cannot Be Observed and That Change with Age or Environmental Conditions

Scott D. Pletcher, Department of Biology, Galton Laboratory, University College London NW1 2HE, U.K.

email: [email protected]

and

Florence Jaffrezic, Institute of Animal, Cell, and Population Biology, Edinburgh University, Edinburgh, Scotland

SUMMARY. The genetic analysis of characters that change as a function of some independent and continuous variable has received increasing attention in the biological and statistical literature. Previous work in this area has focused on the analysis of normally distributed characters that are directly observed. We propose a framework for the development and specification of models for a quantitative genetic analysis of function-valued characters that are not directly observed, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid Markov chain Monte Carlo algorithm involving a Monte Carlo EM algorithm coupled with a Markov chain approximation to the likelihood, which is quite robust and provides accurate estimates of the parameters in our models. The methods are investigated using simulated data and are applied to a large data set measuring mortality rates in the fruit fly, Drosophila melanogaster.

KEY WORDS: Age-specific mortality; Character process models; Genetic variation; Infinite dimensional traits; Quantitative genetics; Repeated measures.

1. Introduction

Function-valued quantitative genetics (Pletcher and Geyer, 1999) or the genetics of infinite-dimensional characters (Kirkpatrick and Heckman, 1989) is concerned with estimating the genetic contribution to observed variation in characters that change as a function of age or some other continuous variable. Taking advantage of observations from related individuals, observed variation in the function-valued character is decomposed into genetic and nongenetic contributions by estimating continuous, bivariate covariance functions (Kirkpatrick and Heckman, 1989). These models have been shown to be effective when applied to a variety of characters from age-dependent patterns of reproductive output in fruit flies to growth and lactation curves in cattle (Diggle, Liang, and Zeger, 1994; Kirkpatrick, Hill, and Thompson, 1994; Jaffrezic and Pletcher, 2000).

In this article, we present theory and implementation of the genetic analysis of survival and other threshold characters thought to be influenced by a continuously distributed underlying trait, commonly termed frailty or liability, that is unobserved and changes as a function of some continuous variable, such as age. An important example is inference concerning age-specific mortality rates, which are genetically influenced but unobserved (Shaw et al., 1999). Other applications include estimating the genetic component of variation in the appearance of an environmentally induced phenotype across different environmental conditions (Roff and Bradford, 2000) or in the expression of an ordered categorical character across age and space (Wright, 1934). As a foundation for our development of a generalized function-valued quantitative genetics, we have chosen the character process model (Pletcher and Geyer, 1999). It has several desirable properties; most important for us is its improved efficiency: this model fits many observed covariance structures better and with fewer parameters than other popular models such as random regression and other repeated measures-type analyses (Jaffrezic and Pletcher, 2000).

2. Generalized Process Models

We are interested in inferring the genetic basis of some character Y, which is not observed, given a series of measurements on an observed trait, which is denoted by X. We assume that some reasonable model for the relationship between X and Y is available and that all genetic and shared environmental effects are modeled with respect to the Y value. This is in keeping with the standard interpretation of threshold characters (Wright, 1934) and of correlated frailty (Yashin, Iachine, and Harris, 1999).


Page 108: Statistical Genetics and Gaussian Stochastic Processes


When considering function-valued traits, it is assumed that the trajectory (over some continuous variable) of the character is random and influenced by one or more unobservable factors. For the additive model, we assume the unobserved character can be decomposed as

y(t) = μ(t) + g(t) + e(t) + ε,   (1)

where t is some continuous measure and g(t) and e(t) are Gaussian random functions, which are independent of one another and have an expected value of zero at each age (Kirkpatrick and Heckman, 1989; Pletcher and Geyer, 1999). These represent genetic and environmental deviations at each value of t. μ(t) is the mean function, and ε is the residual variation.

In practice, a finite number of observations (each associated with a particular value of the continuous variable) are made on a number of individuals i of varying relatedness. Thus, let y_i(t_j), etc., denote the effects for individual i at point t_j and y be a vector containing all data on all individuals in the order y_1(t_1), y_1(t_2), ..., y_2(t_1), ...; then the distribution of y, f_θ1(y), is multivariate normal with density

f_θ1(y) = (2π)^{−N/2} |V|^{−1/2} exp{−(y − Xβ)′ V^{−1} (y − Xβ)/2},   (2)

with covariance matrix

V = A ⊗ Z + I_k ⊗ R + σ²_ε I_N,   (3)

where ⊗ denotes the Kronecker product, A is a matrix containing coefficients of relatedness, I_k is the identity matrix of size k, n is the number of measurements on each individual, and N is the total number of observations in the data. The remaining matrices, Z and R, are discrete representations of the covariance functions for the genetic and environmental processes given in (1). If G(s, t) = cov(g(s), g(t)) and E(s, t) = cov(e(s), e(t)), then Z[i, j] = G(t_i, t_j) and R[i, j] = E(t_i, t_j) (Pletcher and Geyer, 1999). The residual variance is σ²_ε. The vector β describes the mean function nonparametrically by specifying a unique parameter for each value of t in the data set, with design matrix X mapping each observation to its age-specific mean parameter.

Parametric forms for the covariance functions are based on the character process model where, taking G(s, t) as an example, the functions are written as

G(s, t) = v_G(s) v_G(t) ρ_G(|s − t|),   (4)

where v_G(t)² describes how the genetic variance changes with age and ρ_G(|s − t|) describes the genetic correlation between two ages. There are no restrictions on the form of v_G(·), and it is often modeled using simple polynomials (linear, quadratic, etc.) either on the natural or the log scale. If the correlation between two ages is a function only of the time distance |s − t| between them (correlation stationarity), then numerous choices for ρ(·) are available, all of which satisfy several theoretical requirements (for a list, see Pletcher and Geyer, 1999). Strict correlation stationarity can be relaxed by implementing a nonlinear transformation on the time axis, f(t) (Nunez-Anton, 1998; Jaffrezic and Pletcher, 2000). The correlation function is then defined as ρ(s, t) = ρ(|f(s) − f(t)|), and the functions suggested by Pletcher and Geyer (1999) remain valid.
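In discrete form, equation (4) yields the matrix Z[i, j] = G(t_i, t_j) used in (3). A minimal sketch of this construction, with illustrative variance and correlation choices of the kinds listed by Pletcher and Geyer (1999) (the function names and parameter values are ours):

```python
import numpy as np

def cp_covariance(ages, var_fn, corr_fn, f=None):
    """Discretize the character process covariance of equation (4):
    G(s, t) = v(s) v(t) rho(|f(s) - f(t)|), where f is an optional
    transformation of the time axis that relaxes correlation stationarity."""
    t = np.asarray(ages, dtype=float)
    ft = f(t) if f is not None else t
    v = np.sqrt(var_fn(t))                    # v(t), from the variance v(t)^2
    lag = np.abs(ft[:, None] - ft[None, :])   # |f(s) - f(t)| for all age pairs
    return np.outer(v, v) * corr_fn(lag)

# Linear variance, normal (Gaussian) correlation, log-time transformation.
ages = np.arange(1.0, 11.0)
Z = cp_covariance(ages,
                  var_fn=lambda t: 0.1 + 0.05 * t,
                  corr_fn=lambda d: np.exp(-0.1 * d ** 2),
                  f=np.log)
```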

The elements of the observed vector x are conditionally independent given y, and

f_θ2(x | y) = ∏_{i=1}^{N} f_θ2(x_i | y_i).   (5)

The likelihood associated with the observed data is

f_θ(x) = ∫ ∏_{i=1}^{N} f_θ2(x_i | y_i) f_θ1(y) dy,   (6)

where θ2 is a vector of parameters describing the relationship between X and Y and θ1 contains parameters describing the distribution of Y, which includes parameters of the variance functions, mean function, and potential fixed effects (Meyer and Hill, 1997; Pletcher and Geyer, 1999).

3. Likelihood Maximization

The likelihood was maximized using a hybrid algorithm composed of Markov chain Monte Carlo EM (MCEM) (McCulloch, 1997) and Markov chain Monte Carlo integration/maximization (MCMLE) (Shaw et al., 1999; Geyer, 1995). The computational cost of the MCEM algorithm is much lower than that of the MCMLE. However, parameter estimates obtained from MCEM show a good deal of variation (McCulloch, 1997; S. Pletcher, unpublished results) and confidence intervals are not easily obtained. The MCMLE provides accurate parameter estimates and confidence intervals, but it is computationally expensive and requires a reference point in the parameter space of θ = (θ1, θ2) that is close to the MLE (Shaw et al., 1999). We found the following three-step procedure combines the strengths of both methods. First, the MCEM is used to determine the reference point, call it θ0, for the MCMLE. Second, a single chain of random deviations from f_θ0(y | x) is obtained using a Metropolis algorithm (Shaw et al., 1999). These deviates are used to approximate the likelihood function (6) through a Monte Carlo evaluation of the integral (Geyer, 1995). Third, the approximation is maximized, and estimates and standard errors of the parameters are obtained. Details of the computational algorithms and relevant computer code are available from the first author (or see http://www.ucl.ac.uk/biology/goldstein/scott-index.htm).
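A minimal sketch of the second step, a random-walk Metropolis chain targeting f_θ0(y | x) ∝ f_θ2(x | y) f_θ1(y); the two log densities (equations (5) and (2)) are assumed supplied as functions, and this is our illustration rather than the authors' code:

```python
import numpy as np

def metropolis_latent(x, log_f_x_given_y, log_f_y, n_iter=10_000, step=0.1, seed=0):
    """Sample the latent vector y from f(y | x), proportional to
    f(x | y) f(y), by random-walk Metropolis."""
    rng = np.random.default_rng(seed)
    y = np.zeros(len(x))                       # start at the prior mean
    logp = log_f_x_given_y(x, y) + log_f_y(y)
    draws = []
    for _ in range(n_iter):
        prop = y + step * rng.standard_normal(len(x))
        logp_prop = log_f_x_given_y(x, prop) + log_f_y(prop)
        if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject step
            y, logp = prop, logp_prop
        draws.append(y.copy())
    return np.asarray(draws)
```

The saved draws can then be used for the Monte Carlo approximation of the integral in (6).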

4. Example

For the following examples, the character we are interested in, y(t), is the age-specific mortality function for a specific cohort of genetically identical individuals. The observed character x(t) is the number of individuals dying in that cohort at age t. Shaw and colleagues assumed parametric forms for the unobserved mortality curves using Gompertz and logistic functions (Shaw et al., 1999), which is analogous to a random regression on the age-dependent trajectories (Jaffrezic and Pletcher, 2000). Because the character process models have been shown to perform better than random regression models for observed function-valued characters (Jaffrezic and Pletcher, 2000), we extend the Shaw model to the generalized character process theory.

Measurements are taken at a finite number of ages, and therefore we observe a census vector x_ij, which contains the number of individuals alive in cohort i at census number j. Similarly, we assume each cohort has a log-mortality rate y_ij at census number j, and t_ij is the age at which census number j was taken. We estimate genetic and environmental

Page 109: Statistical Genetics and Gaussian Stochastic Processes


Table 1
Estimated genetic and environmental covariance functions for simulated data(a)

                           VG                          VE
Data                       θ            θC             θ            θC
y-values (unobserved)      0.15 (0.03)  0.095 (0.041)  0.20 (0.006)  0.402 (0.006)
x-values (observed)        0.17 (0.17)  0.081 (0.150)  0.20 (0.032)  0.403 (0.012)

(a) Covariance functions are composed of a constant variance across ages (v(t)² = θ) and a normal correlation function ρ(s, t) = e^{−θC(s−t)²}. Asymptotic standard errors of the estimates are in parentheses. y-values indicate results from a function-valued analysis directly on the unobserved frailty. x-values represent the results of the Markov chain Monte Carlo models on the observed ages at death. Parameter estimates from the two methods are nearly identical, but standard errors are higher for the MCMC analysis.

covariance functions associated with y as well as a separate mean mortality rate (over all cohorts) for each t_ij (call these parameters μ_j). The mean parameters are incorporated into the conditional distribution f(x | y), resulting in the unobserved variable y(t) representing the cohort-specific deviation from this mean at each age, which has mean function equal to zero (β = 0 in equation (2)).

Assuming a piecewise constant hazard function, the probability of an individual alive at the start of census j − 1 surviving the interval [t_{i(j−1)}, t_ij) is p_ij = exp{−μ_j(t_ij − t_{i(j−1)})}. The number of deaths in the interval is binomially distributed with frequency 1 − p_ij and number of trials x_{i(j−1)}. Writing x_i = (x_i1, x_i2, ...) and y_i similarly, the conditional probability of observing a specific census vector for a specific cohort is

f_θ2(x_i | y_i) = ∏_{j=2}^{J} f(x_ij | x_{i(j−1)}, y_ij),  with  x_ij | x_{i(j−1)}, y_ij ∼ Binomial(x_{i(j−1)}, p_ij),

where J is the number of census times (Shaw et al., 1999). This distribution is substituted into (5) and combined with (2) to yield the likelihood (6) for use in the Metropolis algorithm and in likelihood maximization.
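A sketch of this generative model, of the kind used to produce the simulated data of Section 4.1. How the mean rate μ_j and the cohort deviation y_ij combine into an interval hazard is our assumption here (we scale the mean rate by e^{y_ij}); the article folds the mean parameters into f(x | y):

```python
import numpy as np

def simulate_census(n0, ages, mu, y, seed=1):
    """One cohort's census vector under a piecewise constant hazard:
    survivors over [t_{j-1}, t_j) are Binomial(x_{j-1}, p_j) with
    p_j = exp(-lambda_j (t_j - t_{j-1})) and lambda_j = mu_j * exp(y_j)."""
    rng = np.random.default_rng(seed)
    x = [n0]
    for j in range(1, len(ages)):
        lam = mu[j] * np.exp(y[j])             # assumed form of the hazard
        p_surv = np.exp(-lam * (ages[j] - ages[j - 1]))
        x.append(rng.binomial(x[-1], p_surv))
    return np.array(x)

# A cohort of 500 individuals censused at 10 equally spaced ages.
ages = np.arange(10.0)
mu = np.full(10, 0.05)                                   # mean mortality rates
y = 0.4 * np.random.default_rng(2).standard_normal(10)   # cohort deviations
print(simulate_census(500, ages, mu, y))
```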

4.1 Simulated Data

Simulated ages at death were generated for 600 distinct cohorts (20 replicate cohorts from each of 30 genetically distinct lines) of 500 individuals each. The data were simulated using a covariance function with a constant variance (i.e., v²(t) = 0.2 in equation (4)) and standard normal correlation function (i.e., ρ(s, t) = e^{−θC(s−t)²}) for both the genetic and environmental parts (θC = 0.1 and θC = 0.4 for the genetic and environmental correlations, respectively). Similar results were obtained using other covariance functions and experimental designs.

The small number of lines in the simulated data leaves open the possibility that the realized genetic variance and covariance among lines may deviate significantly from the target values. To compensate for this, we estimated covariance functions for the realized y-values themselves (i.e., the unobserved age-dependent mortality rates), which are saved during the course of the simulation, and we used these estimates as metrics for determining the accuracy of the covariance functions estimated from the x-values (the observed ages at death).

The MCEM routines provided an excellent θ0 for the MCMLE routines. The sample paths for the four covariance parameters (two for both the genetic and environmental covariance functions) show a rapid convergence to the neighborhood of the simulated value, with the genetic variance converging less quickly than the others (data not presented). θ0 for the MCMLE was obtained by averaging the values from the last 200 (of a total of 500) iterations.

The MCMLE routines were then used to obtain estimates and confidence intervals for the genetic and environmental covariance functions and for the mean mortality trajectory. The approximation of the likelihood was based on an MCMC sample size of 1000 random deviates sampled from the chain every 1000 steps. The genetic covariance function obtained from this analysis is in complete agreement with that obtained when standard methods are used on the unobserved y-values themselves (Table 1). The environmental covariance functions, which are estimated accurately with smaller sample sizes, are essentially identical. As expected, the asymptotic standard errors on the parameter estimates are much larger (up to five times larger) when obtained from the observed data (Table 1). It may be that increasing the MCMC sample size would reduce this difference. It is more likely, however, that there is simply more uncertainty in the estimates.

4.2 Mortality in Drosophila melanogaster

The Drosophila data are taken from a large mortality experiment composed of 29 genetically distinct lines of flies. The lines were created using an experimental mutagenesis technique whereby single mutational events were initiated in a genetically homogeneous background (S. D. Pletcher, unpublished data). Experimental populations differed among themselves genetically via one mutational event. Genetic variation in mortality rates as a function of age and genetic covariation in mortality between ages provide important insights into the age-specific properties of these mutations (Pletcher, Houle, and Curtsinger, 1998). Ages at death were recorded for four replicate cohorts (each of approximately 300 males) from each line and were pooled into 3-day intervals for analysis.

Exploratory analyses, including an examination of the phenotypic covariance structure and an estimate of the genetic variogram cloud (Jaffrezic, Pletcher, and Hill, 2001), suggested the use of genetic and environmental covariance functions that are composed of a linear variance function (i.e., v²(t) = γ0 + γ1t in equation (4)) and a normal correlation function (i.e., ρ(s, t) = e^{−γC(s−t)²}). The θ0 value

Page 110: Statistical Genetics and Gaussian Stochastic Processes


Figure 1. Contour plots of the genetic and environmental covariance functions estimated from a large mortality experiment using the fruit fly, Drosophila melanogaster. Functions represent age-dependent covariance in log-mortality rates. Both genetic and environmental covariance functions are described by a linear variance function v²(t) = γ0 + γ1t and normal correlation function ρ(s, t) = e^{−γC(s−t)²}. A. Estimated genetic covariance function: γ0 < 0.0001, γ1 = 0.049, γC = 0.072. γ0 was estimated at its lower boundary (≈ 0). B. Estimated environmental covariance function: γ0 = 0.21, γ1 = 0.022, γC = 0.60.

for the MCMLE procedures was obtained by averaging 500 consecutive iterations of the MCEM algorithm after it was determined to have converged to a stable region for each parameter. The MCMLE routines were then executed with an MCMC sample size of 2000, and the chain was sampled every 1000 steps.

We found that both genetic and environmental variance for age-specific mortality increased with age. Early in life, environmental variance was very high, γ0 = 0.21 and γ0 < 0.0001 for the environmental and genetic variances, respectively, but the rate of increase in genetic variance with age was faster (Figure 1). The correlation parameter was much higher in the environmental correlation function than it was in the genetic function (0.59 versus 0.075), implying that environmental covariance decreases much more rapidly as ages become more and more separated in time. This suggests a rather high degree of pleiotropy (single genes affecting mortality at more than one age) and a relatively transient influence of the environment. The degree of uncertainty in the parameter estimates of the genetic function is illustrated by profile likelihoods (Figure 2).
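The profiles of Figure 2 are obtained by fixing one covariance parameter at each point of a grid and re-maximizing the likelihood over the remaining parameters. A generic sketch, assuming a callable log_lik(theta) evaluating the model log-likelihood (ours, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def profile_likelihood(log_lik, start, fixed_index, grid):
    """Profile log-likelihood for the parameter at position fixed_index:
    maximize over the other parameters at each grid value."""
    free = [i for i in range(len(start)) if i != fixed_index]
    profile = []
    for value in grid:
        def neg(theta_free):
            theta = np.array(start, dtype=float)
            theta[fixed_index] = value
            theta[free] = theta_free
            return -log_lik(theta)
        res = minimize(neg, np.array(start, dtype=float)[free],
                       method="Nelder-Mead")
        profile.append(-res.fun)
    return np.asarray(profile)
```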

5. Discussion

We present a flexible approach for examining the genetic basis of function-valued characters that are unobserved but that influence the expression of an observed character through some arbitrary, hypothesized form. The complexity of the models necessitated the use of stochastic methods for model specification, and we rely heavily on Markov chain Monte Carlo methods, which can be troublesome and difficult to implement. To alleviate some of the difficulties, we implemented a composite algorithm consisting of a Markov chain EM algorithm (MCEM) followed by a Markov chain approximation to the actual likelihood (MCMLE). This combination was found to work well for generalized linear mixed models (McCulloch, 1997), and many of the properties of convergence discussed in McCulloch (1997) apply to the models we developed here. The MCEM algorithm robustly provided excellent reference values (i.e., θ0) from a wide range of starting points, which were then used in the MCMLE to estimate parameters and to obtain likelihood statistics and confidence intervals. Results obtained through the analysis of simulated data and of mortality rates in the fruit fly, Drosophila melanogaster, show that variation accumulated through heterogeneity of starting values and through the stochastic nature of the Markov chain algorithms is surprisingly small (essentially inconsequential) in comparison with the support of the parameter estimates provided by the data (data not shown).

Although the algorithms are successful in recovering the underlying genetic structure in simulated data sets and in capturing the variation in Drosophila mortality rates, some limitations are apparent. Despite the large number of individuals in our data sets, the asymptotic standard errors (and profile likelihood functions) of the estimates in our analyses are considerable. This suggests that large sample sizes may be required for inference regarding the genetic basis of unobserved characters. In addition, our choice of covariance model was based on exploratory algorithms that will not apply in all situations (Jaffrezic et al., 2001). The development of model selection criteria similar to those used for observed function-valued traits is an important issue, and work is currently underway in this area.

Our examples have focussed exclusively on age-specific mortality rates. However, precisely the same theory applies to any nonnormally distributed phenotype that is thought to be determined by an unobserved, normally distributed

Page 111: Statistical Genetics and Gaussian Stochastic Processes



Figure 2. Likelihood profiles for the parameters of the genetic covariance function estimated from a large mortality experiment in Drosophila. The estimated genetic covariance function is described by a linear variance function v²(t) = γ0 + γ1t and normal correlation function ρ(s, t) = e^{−γC(s−t)²}. Estimated values are γ0 < 0.0001, γ1 = 0.049, and γC = 0.072. γ0 was estimated at its lower boundary (≈ 0). Insets focus on a narrow range of parameter values and provide guidance for the construction of 99% confidence intervals on the estimates.

character (Wright, 1934). An example may be the expression of a threshold character, such as the occurrence of a disease, over space or time. The distribution of the observed trait given the unobserved liability, f_θ2(x | y), is the only aspect of the theory and computer code that requires change. Furthermore, although we prefer the character process model for describing the covariance structure of the unobserved character, random regression or orthogonal polynomial models could be implemented with equally small modifications.

ACKNOWLEDGEMENTS

We thank W. Hill, D. Commenges, and three reviewers for comments on the manuscript. We are indebted to F. Shaw for his clarification of many of the MCMC methods and A. Yashin for pointing out the problems of identifiability in survival models.

RÉSUMÉ

The genetic analysis of characters whose expression is modified by a continuous independent variable is the subject of growing interest in the biological and statistical literature. Previous work in this area has concentrated on the analysis of normally distributed and directly observable traits. We propose a methodological framework for the development and specification of models for the quantitative genetic analysis of this type of character when it is not directly observable, such as genetic variation in age-specific mortality rates or complex threshold characters. We employ a hybrid MCMC algorithm (combining a Monte Carlo EM algorithm and a Markov chain approximation to the likelihood) that is robust and provides precise estimates of the parameters of our models. These methods are studied in the context of simulated data and are applied to a large real data set measuring mortality rates in the fruit fly, Drosophila melanogaster.

REFERENCES

Diggle, P. J., Liang, K. Y., and Zeger, S. L. (1994). Analysis of Longitudinal Data. Oxford: Oxford University Press.

Geyer, C. J. (1995). Estimation and optimization of functions. In Markov Chain Monte Carlo in Practice, W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (eds), 241-258. London: Chapman and Hall.

Jaffrezic, F. and Pletcher, S. D. (2000). Statistical models for estimating the genetic basis of repeated measures and other function-valued traits. Genetics 156, 913-922.

Jaffrezic, F., Pletcher, S. D., and Hill, W. G. (2001). Non- parametric estimation of covariance structure for genetic analysis of repeated measures and other function-valued traits. Genetical Research, in press.

Kirkpatrick, M. and Heckman, N. (1989). A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters. Journal of Mathematical Biology 27, 429-450.

Kirkpatrick, M., Hill, W. G., and Thompson, R. (1994). Estimating the covariance structure of traits during growth and ageing, illustrated with lactation in dairy cattle. Genetical Research 64, 57-69.

Page 112: Statistical Genetics and Gaussian Stochastic Processes

162 Biometracs, March 2002

McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association 92, 162-170.

Meyer, K. and Hill, W. G. (1997). Estimation of genetic and phenotypic covariance functions for longitudinal or ‘repeated’ records by restricted maximum likelihood. Livestock Production Science 47, 185-200.

Nunez-Anton, V. (1998). Longitudinal data analysis: Non- stationary error structures and antedependent models. Applied Stochastic Models and Data Analysis 13, 279- 287.

Pletcher, S. D. and Geyer, C. J. (1999). The genetic analysis of age-dependent traits: Modeling a character process. Genetics 153, 825-833.

Pletcher, S. D., Houle, D., and Curtsinger, J. W. (1998). Age- specific properties of spontaneous mutations affecting mortality in Drosophila melanogaster. Genetics 148, 287-303.

Roff, D. A. and Bradford, M. J. (2000). A quantitative genetic analysis of phenotypic plasticity of diapause induction in the cricket Allonemobius socius. Heredity 84, 193-200.

Shaw, F., Promislow, D. E. L., Tatar, M., Hughes, K., and Geyer, C. J. (1999). Towards reconciling inferences concerning genetic variation in senescence. Genetics 152, 553-566.

Wright, S. (1934). An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19, 506-536.

Yashin, A. A., Iachine, I. A., and Harris, J. R. (1999). Half of variation in susceptibility to mortality is genetic: Findings from Swedish twin survival data. Behavior Genetics 29, 11-19.

Received November 2000. Revised September 2001. Accepted September 2001.

Page 113: Statistical Genetics and Gaussian Stochastic Processes

Copyright 2004 by the Genetics Society of America
DOI: 10.1534/genetics.103.019554

Multivariate Character Process Models for the Analysis of Two or More Correlated Function-Valued Traits

Florence Jaffrezic,*,1 Robin Thompson†,‡ and Scott D. Pletcher§

*INRA Quantitative and Applied Genetics, 78352 Jouy-en-Josas Cedex, France; †Rothamsted Research, Harpenden, Herts AL5 2JQ, United Kingdom; ‡Roslin Institute (Edinburgh), Roslin, Midlothian EH25 9PS, United Kingdom; and §Huffington Center on Aging and Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030

Manuscript received July 1, 2003. Accepted for publication May 17, 2004.

ABSTRACT

Various methods, including random regression, structured antedependence models, and character process models, have been proposed for the genetic analysis of longitudinal data and other function-valued traits. For univariate problems, the character process models have been shown to perform well in comparison to alternative methods. The aim of this article is to present an extension of these models to the simultaneous analysis of two or more correlated function-valued traits. Analytical forms for stationary and nonstationary cross-covariance functions are studied. Comparisons with the other approaches are presented in a simulation study and in an example of a bivariate analysis of genetic covariance in age-specific fecundity and mortality in Drosophila. As in the univariate case, bivariate character process models with an exponential correlation were found to be quite close to first-order structured antedependence models. The simulation study showed that the choice of the most appropriate methodology is highly dependent on the covariance structure of the data. The bivariate character process approach proved to be able to deal with quite complex nonstationary and nonsymmetric cross-correlation structures and was found to be the most appropriate for the real data example of the fruit fly Drosophila melanogaster.

THE need for a rigorous method of analysis for biological characters that are best considered as functions of some independent and continuous variable is rapidly growing. Important examples of these so-called function-valued traits include growth curves (Meyer 2001), age-specific components of organismal fitness such as survival or reproductive output (Pletcher et al. 1998), lactation curves in dairy cattle (Meuwissen and Pool 2001; Jaffrezic et al. 2002), and gene expression profiles across age or environmental treatments (DeRisi et al. 1997; Pletcher et al. 2002).

Several techniques have been proposed for single-trait (univariate) analyses. These include random regression models, which are based on a parametric modeling of individual curves (Diggle et al. 1994), character process models, which focus on parametric modeling of the covariance structure (Pletcher and Geyer 1999), and structured antedependence models (SAD; Nunez-Anton and Zimmerman 2000; Jaffrezic et al. 2003), where an observation at time t is modeled via a regression over the preceding observations. The number of parameters is considerably reduced in the SAD approach compared to the traditional antedependence models (Gabriel 1962), thanks to a parametric modeling of the antedependence coefficients and innovation variances. A comparison among these methods revealed that, in many cases, character process models performed well in comparison to alternative methods, especially random regression, often providing a better fit to the covariance structure (genetic and nongenetic) with fewer parameters (Jaffrezic and Pletcher 2000).

A parsimonious method for the analysis of two or more correlated function-valued traits is needed. Although a multivariate extension of random regression models is straightforward, their sometimes poor performance in the univariate case argues for the development of alternative methods. Moreover, the nature of the parameterization results in a dramatic increase in the number of parameters required to describe complicated covariance structures, which is often problematic. The data sets that are generated in experimental sciences, such as genetics, and that are used to estimate different types of covariance structures (e.g., genetic and nongenetic) are often too small to support the estimation of many parameters (Pletcher et al. 1998). This would also preclude the use of other models such as spline functions.

The aim of this article is to investigate an extension of the character process (CP) models (Pletcher and Geyer 1999) to the multivariate case. The advantages that apply to the CP models in the univariate setting, i.e., a small number of parameters to model the covariance structure and a high degree of flexibility, are crucial for developing practical multivariate models. Several

1Corresponding author: INRA-SGQA, 78352 Jouy-en-Josas Cedex, France. E-mail: [email protected]

Genetics 168: 477–487 (September 2004)

Page 114: Statistical Genetics and Gaussian Stochastic Processes


cross-correlation and cross-covariance functions are studied, and their behavior is compared to multivariate random regression and structured antedependence models in a simulation study and in an example for the genetic analysis of age-specific fecundity and mortality in the fruit fly, Drosophila melanogaster.

MATERIALS AND METHODS

Bivariate character process models: A detailed description of the quantitative genetic model for univariate function-valued traits is given by Jaffrezic and Pletcher (2000) and Pletcher and Geyer (1999). In the genetic analysis of two correlated function-valued traits, it is assumed that the observed phenotypic characters can be decomposed as

Y(t) = μ(t) + g(t) + e(t),   (1)

where Y(t) = (Y1(t), Y2(t))′ represents the observed phenotypic trajectories for the two characters Y1(t) and Y2(t), t represents any continuous independent variable, which for clarity we assume is time, μ(t) = (μ1(t), μ2(t))′ are nonrandom functions that correspond to the genotypic mean functions of Y1(t) and Y2(t), respectively, and g(t) = (g1(t), g2(t))′ represents the genetic deviations for the two characters. Both deviations are correlated over time and g(t) is a bivariate Gaussian process. Similarly, e(t) = (e1(t), e2(t))′ are the environmental deviations. Processes g(t) and e(t) are assumed independent of one another, with mean zero at each age and with covariance functions G(t, s) and E(t, s). Focus is on the modeling of these covariance functions.

In the univariate character process approach, there is only one function-valued trait, Y(t), and its covariance functions (genetic and environmental) are modeled as

G(t, s) = v(t)v(s)ρ(t, s),   (2)

where v²(t) represents the variance function and is usually a parametric function of the continuous variable such as a polynomial and ρ(t, s) is the correlation function. Assuming stationarity in the correlations, Pletcher and Geyer (1999) proposed parametric forms for the correlation function including an exponential (ρ(t, s) = exp(−θ|t − s|)), a Gaussian (ρ(t, s) = exp(−θ(t − s)²)), and a Cauchy (ρ(t, s) = 1/(1 + θ(t − s)²)) function. Jaffrezic and Pletcher (2000) suggested a nonstationary extension of the models based on a nonlinear transformation of the timescale, f(t) (Nunez-Anton and Zimmerman 2000). Correlation stationarity is assumed to hold on the transformed scale: ρ(t, s) = ρ(|f(t) − f(s)|).

Models for bivariate Gaussian processes have been investigated previously (Sy et al. 1997) as, for example, the bivariate Ornstein-Uhlenbeck process. It corresponds to a continuous-time extension of a first-order autoregressive process [AR(1)], which is also equivalent to a CP model with an exponential correlation and a constant variance. We adapt these ideas to extend the character process methodology.

Let the continuous variable of interest be time and the object of analysis be the genetic covariance function. In the bivariate case, let g(t) = (g1(t), g2(t))′ be the genetic character process, where g1(t) is associated with trait 1 and g2(t) with trait 2. The bivariate covariance function of the process can be written as

Cov(g(t), g(s)) = V(t)^{1/2} ρ(t − s) (V(s)^{1/2})′   (3)

(for 0 ≤ s ≤ t), where

Cov(g(t), g(s)) = | Cov(g1(t), g1(s))   Cov(g1(t), g2(s)) |
                  | Cov(g2(t), g1(s))   Cov(g2(t), g2(s)) |.   (4)

As the covariance function has to be symmetric, it is required that

Cov(g(s), g(t)) = Cov(g(t), g(s))′.   (5)
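Numerically, Equation 3 requires only a square root of the symmetric matrix V(t) and, for the exponential correlation defined in the next subsection, a matrix exponential of −Θ|t − s|; both reduce to scalar operations on eigenvalues, the route described in appendix a. A minimal sketch (the values of Θ and V are illustrative, not estimates from this article):

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential via its series expansion

def sqrtm_spd(V):
    """Square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(V)
    return (Q * np.sqrt(w)) @ Q.T

def cross_covariance(V_fn, Theta, t, s):
    """Cov(g(t), g(s)) = V(t)^{1/2} rho(t - s) (V(s)^{1/2})' of Equation 3,
    with the bivariate exponential correlation rho(u) = exp(-Theta |u|),
    for 0 <= s <= t; Equation 5 gives the other ordering by transposition."""
    rho = expm(-Theta * abs(t - s))
    return sqrtm_spd(V_fn(t)) @ rho @ sqrtm_spd(V_fn(s)).T

# Illustrative: constant within-time covariance; Theta is not symmetric
# but has positive eigenvalues (0.7 and 0.4 here).
V = lambda t: np.array([[1.0, 0.3], [0.3, 2.0]])
Theta = np.array([[0.5, 0.1], [0.2, 0.6]])
print(cross_covariance(V, Theta, t=3.0, s=1.0))
```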

Definition of matrix ρ(t − s): In the bivariate case, matrix ρ(t − s) is of dimension 2 × 2. The requirements on this matrix are that it is positive definite, equal to the identity matrix when t = s, and should verify the symmetry property ρ(t − s) = ρ(s − t)′. It corresponds to a bivariate extension of the correlation functions proposed for univariate character process models by Pletcher and Geyer (1999). All the functions proposed in their article can be extended. Among them, however, the most commonly used are the exponential, the Gaussian, and the Cauchy correlations. These functions are defined as follows:

Exponential: ρ(t − s) = exp(−(Θ|t − s|)).
Gaussian: ρ(t − s) = exp(−(Θ(t − s)²)).
Cauchy: ρ(t − s) = (I + Θ(t − s)²)^{−1}.

In the bivariate case, I is the 2 × 2 identity matrix and Θ is a 2 × 2 matrix, not necessarily symmetric, with positive eigenvalues. The matrix exponentiation corresponds to a series expansion and can be calculated using an eigenvalue decomposition as shown in appendix a. The bivariate exponential function is also used in the statistical literature for the Ornstein-Uhlenbeck process (Sy et al. 1997).

Further extension to this framework includes a relaxation of stationarity of the correlation function. The nonstationary extension of the CP models proposed by Jaffrezic and Pletcher (2000) is implemented by replacing time lags (t − s) by a transformation (f(t) − f(s)). Considering a Box-Cox transformation, as suggested by Nunez-Anton and Zimmerman (2000), and an exponential CP model, the correlation function can be written as

ρ(t, s) = exp(−(Θ|(t^λ − s^λ)/λ|))   (6)

for λ ≠ 0 and

ρ(t, s) = exp(−(Θ|log(t) − log(s)|))   (7)

when λ = 0.

Definition of matrix V(t): In the bivariate case, matrix V(t) is also of dimension 2 × 2. The requirements for this matrix are that it is symmetric and positive definite. It in fact corresponds to the covariance of the process at a given time t, as matrix ρ(t − s) is the identity matrix when t = s:

V(t) = Var(g(t)) = | Var(g1(t))          Cov(g1(t), g2(t)) |
                   | Cov(g1(t), g2(t))   Var(g2(t))        |.   (8)

We present here two possible ways of modeling matrix V(t).

It is possible to use a polynomial of time to model function V(t). That would correspond to a direct bivariate extension of the variance function of the character process model (Pletcher and Geyer 1999). When considering, for example, a quadratic function of time, the bivariate variance function can be written as

ln(V(t)) = A + Bt + Ct²,   (9)

where A, B, and C are 2 × 2 symmetric matrices. The ln() of the variance again corresponds to a series expansion and can be calculated as the exponential in the matrix by using an eigenvalue decomposition as explained in appendix a.

The covariance matrix V(t) can also be decomposed in terms of variance and correlation functions such as

Page 115: Statistical Genetics and Gaussian Stochastic Processes


TABLE 1

Likelihood values for the simulated data sets based on unstructured covariance matrices

Model            NPCov   Example 1   Example 2   Example 3

US                 55      2746.7      2401.8      3801.9
CP Quad-Exp        13       551.0       799.4       588.1
CP Quad-ExpNS      14       566.3      1478.9       703.0
SAD(1)             12       262.2      1008.4       545.0
SAD(2)             14       430.8      1380.2       864.4
RR1                13       980.0       200.7       472.6

US, unstructured covariance matrix; CP Quad-ExpNS, quadratic polynomial used to model V(t), exponential function for ρ(t − s) with the nonstationary extension (Equation 6); RR1, linear random regression model with three additional parameters for the residual structure; NPCov, number of parameters in the covariance structure.

V(t) = | v1²(t)              v1(t)v2(t)ρ12(t) |
       | v1(t)v2(t)ρ12(t)    v2²(t)           |.   (10)

Variance functions can be modeled as for univariate character process models with polynomial functions of time. For a quadratic function, for instance, v1²(t) = Var(g1(t)) = exp(a1 + b1t + c1t²) and v2²(t) = Var(g2(t)) = exp(a2 + b2t + c2t²). Function ρ12(t) represents the cross-correlation between the two traits at a given time t. A possible parametric modeling for this cross-correlation function is

Corr(g1(t), g2(t)) = ρ12(t) = exp(−λ1t) − exp(−λ2t)   (11)

for λ1, λ2 ≥ 0. For practical purposes, it is interesting to note that this correlation function is equal to 0 at t = 0, increases to a maximum at t = [ln(λ2/λ1)]/(λ2 − λ1), and then decreases to 0 at infinity.

A likelihood-ratio test can be used to examine specific hypotheses about the parameters. For example, testing if the cross-correlation between the two processes at all times t is equal to zero is equivalent to testing if λ1 = λ2. The cross-correlation function ρ12(t) can also be assumed constant: ρ12(t) = r, which would imply that the cross-correlations are equal for all t.

Estimation procedure: Parameters of these bivariate character process models can be estimated with REML procedures, using, for example, the OWN function of ASREML (Gilmour et al. 2002) as presented in appendix a. The nonstationary parameter λ (Equation 6) is estimated at the same time as the other covariance parameters with standard REML procedures. The properties of the proposed bivariate covariance function are studied in appendix b.

EXAMPLE

Simulation study: A simulation study was performed to understand better the analogies between the different methodologies: the bivariate CP model proposed here, the bivariate structured antedependence models presented in Jaffrezic et al. (2003), and the random regression models. In a first set of simulations, data were generated according to a bivariate CP model, with an exponential "correlation" function (exp(−Θ(t − s))) and a V(t) structure defined as ln V(t) = A + Bt + Ct². Different assumptions on parameters of Θ, A, B, and C were investigated, setting some elements to zero or giving various values to these parameters. It was found that the first-order bivariate structured antedependence model [SAD(1)] was well able to capture the covariance structures simulated under all these different assumptions (results not shown). The similarity between these two approaches had already been pointed out in the univariate case for SAD(1) models and CP with an exponential correlation function (Jaffrezic et al. 2003). On the other hand, random regression models dealt poorly with all the different covariance structures considered here, even when a cubic polynomial was used (involving 36 parameters for the covariance structure).

Simulations with unstructured covariance models: To understand better the abilities and limitations of the different models, several patterns of covariance structures were investigated. To avoid favoring any of the methodologies, data were simulated with unstructured covariance matrices. A total of 2000 animals were considered with five observations for each trait. As focus was on the cross-correlation modeling, quite simple structures for the variances and correlations of both variables were chosen. Three examples are presented here.

In the first case, the data were generated using a cross-correlation that was stationary, symmetric, with quite high values. With regard to the likelihood value (see Table 1), a simple bivariate linear random regression model was found to be the most appropriate, followed by the bivariate CP models and then the SAD models (all models had about the same number of parameters: from 12 to 14). Estimated cross-correlations obtained with the unstructured model and the bivariate linear random regression model are presented in Figure 1.

In the second example, the cross-correlation was more complex. Although the correlations between the traits were still quite high, they were nonstationary and nonsymmetric. The bivariate quadratic random regression model did not converge and, on the other hand, the linear bivariate model was not able to deal adequately with this cross-correlation pattern. It was found for the character process model that the nonstationary extension, using only one extra parameter (parameter λ in Equation 6), considerably improved the fit as shown in

Page 116: Statistical Genetics and Gaussian Stochastic Processes

480 F. Jaffrezic, R. Thompson and S. D. Pletcher

Figure 1.—Estimated cross-correlations for example 1 of the simulation study for the unstructured model (US) and a bivariatelinear random regression model (RR).

Table 1. The likelihood value was then higher than that early ages and then increasing and decreasing for lateages. The likelihood value was higher for SAD(2) thanfor the second-order SAD model with the same number

of parameters. Figure 2 gives the estimated cross-correla- for all the other models. It can be seen, however, inFigure 3, that this model was not able to adequately fittions obtained with the unstructured model and with

the chosen bivariate CP model. the diagonal cross-correlation terms. On the other hand,although the likelihood value was a little lower thanIn the third example, the data were also generated

with nonsymmetric and nonstationary cross-correla- that with the second-order SAD model, the characterprocess model was better able to capture the diagonaltions, with lower values than those for the first two exam-

ples. The diagonal cross-correlations were lower for cross-correlation pattern. These figures do show, how-

Figure 2.—Estimated cross-correlations for example 2 of the simulation study for the unstructured model (US) and the chosenbivariate CP model: quadratic polynomial used to model V(t), exponential function for (t s) with the nonstationary extension(Equation 6).
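As a quick numerical illustration of the cross-correlation function in Equation 11 (a sketch of ours, not part of the original analysis; the rate values λ1 and λ2 are arbitrary), the following Python snippet evaluates ρ12(t) and confirms the stated location of its maximum:

```python
import numpy as np

def cross_corr(t, lam1, lam2):
    # Equation 11: rho_12(t) = exp(-lam1 * t) - exp(-lam2 * t)
    return np.exp(-lam1 * t) - np.exp(-lam2 * t)

lam1, lam2 = 0.5, 2.0                  # arbitrary illustrative rates, lam2 > lam1 > 0
t = np.linspace(0.0, 10.0, 20001)
rho = cross_corr(t, lam1, lam2)

t_star = np.log(lam2 / lam1) / (lam2 - lam1)   # stated maximiser
print(rho[0])                          # 0.0 at t = 0
print(t[np.argmax(rho)], t_star)       # numerical maximum matches the formula
print(cross_corr(1e3, lam1, lam2))     # -> 0 as t -> infinity
```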

[Figure 3.—Estimated cross-correlations for example 3 of the simulation study, with data simulated with an unstructured covariance matrix. US, unstructured covariance matrix; CP, quadratic polynomial used to model V(t), exponential function for ρ(t − s) with the nonstationary extension (Equation 6); SAD, second-order bivariate structured antedependence model; RR, linear random regression model.]

These figures do show, however, that even for the chosen models, there is still scope for improving the fit, although this might be difficult while keeping the number of parameters reasonably low.

Empirical data—joint analysis of fecundity and mortality in Drosophila: Age-specific measurements of reproduction and mortality rates were obtained from 56 different recombinant inbred (RI) lines of D. melanogaster, which are expected to exhibit genetically based variation in longevity and reproduction (J. W. Curtsinger and A. A. Khazaeli, unpublished results). Age-specific measures of mortality and average female reproductive output were collected simultaneously from two replicate cohorts for each of the 56 RI lines. Deaths were observed every day, while egg counts were made every other day. For both mortality and reproduction, the data were pooled into 11 5-day intervals for analysis. Mortality rates were log transformed and reproductive measures were square-root transformed so that the age-specific measures were approximately normally distributed.

Parameter estimates for the different methodologies were obtained with ASREML using the OWN function (Gilmour et al. 2002). Models were compared using the BIC criterion (Schwarz 1978; Jaffrezic and Pletcher 2000): BIC = ln L − 0.5 nc ln(N − p), where ln L is the REML likelihood value, nc is the number of covariance parameters in the model, p is the number of fixed effects, and N is the total number of observations. Standard likelihood-ratio tests could be used for nested models. Specific cases include testing if certain parameters in matrices V(t) or Θ are equal to zero. A nonparametric mean function was used for both traits (i.e., a separate mean was fitted for each distinct age in the data), which ensures a consistent estimate of the covariance structure (Diggle et al. 1994).

TABLE 2

Likelihood values and BIC criterion (Schwarz 1978) for univariate and bivariate genetic analyses of fecundity and mortality in Drosophila

                       Genetic                     Environmental
                       Corr.       Var.            Corr.       Var.          NPCov  Log L  BIC
Univariate
  Mortality            Cauchy      Quad.           Cauchy      Lin.
  Fecundity            Exp. NS     Const.          Cauchy NS   Quad.          15    329.0  186.7
  Mortality            Cauchy NS   Quad.           Cauchy NS   Quad.
  Fecundity            Cauchy NS   Quad.           Cauchy NS   Quad.          20    337.2  175.6
Bivariate              Cauchy NS   Quad-Const.     Cauchy NS   Lin-Quad.      23    377.9  204.8
                       Cauchy NS   Quad.           Cauchy NS   Quad.          28    380.6  188.2
                       Exp. NS     Quad.           Cauchy NS   Quad.          28    370.2  177.8
                       Cauchy      Quad.           Cauchy      Quad.          26    352.9  168.2
                       Exp. NS     Quad.           Exp. NS     Quad.          28    354.6  162.2

In both cases the logarithms of the variances were modeled, such as ln v²(t) = a + bt + ct² and ln(V(t)) = A + Bt + Ct², with A, B, and C 2 × 2 symmetric matrices. Corr., correlation; Var., variance; Quad., quadratic; Lin., linear; Exp., exponential; Const., constant.
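For concreteness, here is a minimal Python sketch of the BIC formula just defined; the sample size N and number of fixed effects p below are hypothetical placeholders, not values taken from the Drosophila data:

```python
import numpy as np

def bic(log_l, n_cov, n_obs, n_fixed):
    # BIC as defined in the text: ln L - 0.5 * nc * ln(N - p)
    return log_l - 0.5 * n_cov * np.log(n_obs - n_fixed)

# Hypothetical comparison; under this definition a larger BIC is better.
N, p = 2464, 22                       # assumed, for illustration only
for name, log_l, nc in [("bivariate CP", 377.9, 23),
                        ("quadratic RRM", 134.7, 42)]:
    print(name, round(bic(log_l, nc, N, p), 1))
```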

The best models chosen in the univariate analyses are given in the first part of Table 2. For the genetic part, a Cauchy correlation with quadratic variance was chosen for mortality, and a nonstationary exponential correlation with a constant variance was chosen for fecundity. Many different correlation and variance functions were investigated for the bivariate analysis, and the best ones regarding the likelihood value and BIC criterion are given in Table 2. In the bivariate model, the correlation function has to be the same for the two variables and was chosen here to be a nonstationary Cauchy correlation (with parameter ν of the nonstationary extension as in Equation 6). For the variance function, more flexibility can be achieved in the choice of the function by setting some parameters of matrices A, B, and C to zero. In the bivariate model, the chosen function was, as in the univariate case, quadratic for mortality and constant for fecundity. Estimates obtained for the variance and correlation functions for fecundity and mortality were very similar with the univariate and bivariate models (although their analytical forms were different, as shown in the methodology section). The main improvement of the bivariate model lies in its ability to model the cross-covariance structure. The likelihood value of the bivariate model (Log L = 377.9) was indeed much higher than that for the two univariate analyses (Log L = 329.0). Therefore, taking into account the correlation function between the two variables fits the actual process much better. Estimates obtained for the chosen bivariate model are given in Table 3, and the first graph of Figure 4 gives the genetic cross-correlation estimates. They were found to be negative at all ages, nonstationary and nonsymmetric. Fecundity and mortality were more strongly negatively correlated at a similar age (diagonal terms), and the correlation intensity decreased when ages became farther apart.

As they allow a simple and straightforward extension to the multivariate case, random regression models (RRM) are most often used for multivariate analyses of longitudinal data. They may not always, however, be the most appropriate methodology. In this example, for instance, the likelihood value was much higher for the character process approach (Log L = 377.9) than for a bivariate quadratic random regression model (Log L = 134.7), despite the latter having far more parameters (42 for the RRM compared to 23 for the CP model). Moreover, increasing the order of the polynomials dramatically increases the number of parameters (for instance, from quadratic to cubic: 42 to 72 parameters).

Although the difference was not as important as for random regression models, the likelihood value was also higher, in this example, for the bivariate CP model than for a bivariate structured antedependence model (Jaffrezic et al. 2003; Log L = 322.8, 24 parameters).


TABLE 3

Parameter estimates (and standard errors) for the bivariate genetic analysis of fecundity and mortality in Drosophila with the best-fitting bivariate character process model, for the BIC criterion, given in Table 2

Parameter   Genetic         Environmental     Parameter   Genetic         Environmental
θ1          0.49 (0.22)     5.82 (2.45)       b1          14.91 (2.14)    0.04 (0.19)
θ2          1.20 (0.61)     18.17 (8.57)      b2          0.0             0.04 (0.70)
λ1          0.71 (0.44)     1.16 (0.30)       b3          0.0             2.22 (0.44)
λ2          0.18 (0.31)     2.32 (0.83)       c1          16.64 (2.19)    0.0
a1          2.71 (0.46)     0.92 (0.13)       c2          0.0             0.75 (0.56)
a2          1.99 (0.14)     2.40 (0.18)       c3          0.0             1.46 (0.36)
a3          0.59 (0.07)     0.41 (0.12)       ν           0.37 (0.14)     0.43 (0.11)

The variance functions are defined by ln(V(t)) = A + Bt + Ct², where t = age/10 and

A = ( a1  a3
      a3  a2 )

and similarly for matrices B and C. Parameters θ1, θ2, λ1, and λ2 define matrix Θ as specified in appendix a for the Cauchy correlation function, and ν is the nonstationary parameter (Equation 6).

The estimated genetic cross-correlations obtained with the three methodologies are presented in Figure 4. Their patterns were found to be very different, even between the bivariate CP and SAD models, although there was only a small difference in their likelihood values. As the true genetic cross-correlations are not known, it is difficult, however, to know which pattern is the closest to reality and how much discrepancy still remains compared to the actual values.

[Figure 4.—Estimated genetic cross-correlations between fecundity and mortality obtained with the chosen CP model, a bivariate SAD(1) model, and a quadratic random regression model.]

To address these issues, a phenotypic analysis was performed on these data, which allows us to obtain estimates for an unstructured covariance matrix (22 × 22). This was not possible in the genetic study due to the very large number of parameters to be estimated. Estimated phenotypic cross-correlations obtained with the different models are presented in Figure 5, and the unstructured estimates were considered as the reference model. Once again, the four estimated patterns were found to be very different. As in the genetic analysis, the likelihood value was the highest for the character process model (Log L = 197.1 with a nonstationary Cauchy correlation function and quadratic V(t) function, with 14 parameters, BIC = 58.6), compared to a bivariate SAD(1) model (Log L = 183.8, with 12 parameters, BIC = 53.0), a bivariate SAD(2) model (Log L = 185.9, with 14 parameters, BIC = 47.4), and a quadratic bivariate random regression model (Log L = 67.7, 21 parameters, BIC = −97.7). The highest likelihood value, obtained here with the bivariate CP model, is still, however, quite far away from that of the unstructured model (Log L = 535.6). But as the number of parameters in the unstructured model is very large (253), its BIC value is extremely low (−522.4), and the best model with regard to the BIC criterion here, therefore, is the bivariate CP model.

To have a measure of the discrepancy between the estimated phenotypic cross-correlations and the unstructured estimates, the Vonesh concordance coefficient (Vonesh et al. 1996) was used, as presented by Jaffrezic and Pletcher (2000), considering the unstructured estimates as the correct values. The concordance coefficients were 0.77 for the CP model, 0.52 for the SAD model, and 0.73 for the RR model (a perfect fit being at 1.0). As shown with the likelihood value, the bivariate character process model fit the phenotypic cross-correlation structure best. On the other hand, the goodness of fit was found to be higher for the bivariate random regression model than for the structured antedependence model (0.73 compared to 0.52), although the likelihood value was much higher for the SAD model (Log L = 183.8) than for the RR model (Log L = 67.7). The SAD models were therefore in this case better able to model the covariance structure for each trait separately, as in univariate analysis, whereas the random regression models were better able to fit the cross-correlation structure. The choice of the model should therefore not be made regarding the likelihood value only, but also depends on the priorities of the study. In any case, in this particular study, the character process model was more appropriate than the other two methodologies.

Figure 5 shows, however, that the obtained cross-correlation patterns were still all quite different from the unstructured phenotypic estimates and that there is still, therefore, scope for improvement.

DISCUSSION

The character process model, originally proposed by Pletcher and Geyer (1999) to analyze function-valued traits, is based on a parametric modeling of the variance and correlation functions of a stochastic process. It models the covariance structure with a small number of interpretable parameters. A special case of these models has been independently proposed in the statistical literature, namely the Ornstein-Uhlenbeck process (Taylor et al. 1994). It is equivalent to a character process model with an exponential correlation function and constant variances and represents a continuous-time extension of a first-order autoregressive model.

We proposed an extension of the univariate character process model to the multivariate case. Our goal was to develop a method of analysis for two or more correlated function-valued traits that would retain all the desirable properties of the univariate character process approach and simultaneously allow a parametric modeling of the cross-covariance structure. The proposed extension was based on an idea presented by Sy et al. (1997) for the Ornstein-Uhlenbeck process and was generalized to other kinds of correlation functions, including those that are nonstationary.

Models were presented here in the bivariate case, but extension to the analysis of more than two correlated function-valued traits is straightforward and accomplished by increasing the dimensions of matrices V and Θ in accord with the number of traits analyzed.

[Figure 5.—Estimated phenotypic cross-correlations between fecundity and mortality obtained with the unstructured model (US); a character process model CP Quad-CauchyNS (quadratic polynomial used to model V(t), Cauchy function for ρ(t − s) with the nonstationary extension); a bivariate SAD(1) model; and a quadratic random regression model.]

The first part of the simulation study highlighted the similarities between the bivariate CP models with an exponential correlation and bivariate first-order SAD models (Jaffrezic et al. 2003), as in the univariate case. Further differences between the two approaches appear when higher orders of antedependence are considered or when other parametric correlation functions are used in the CP models.

It was found in the second part of the simulation study that the choice of the most appropriate methodology is highly dependent on the covariance structure of the data and that the three models (random regression, structured antedependence, or character process) can be worthwhile depending on the particular biological phenomenon studied. When the cross-covariance structure is symmetric and stationary with quite high correlations, the most appropriate model to use might be a simple random regression model. When the cross-correlation structure becomes more complex, it should be either structured antedependence or character process models, especially because the number of parameters required in a more complex random regression model

dramatically increases. For the Drosophila analysis, the bivariate character process model proved to be the most appropriate.

The multivariate extension of the character process models represents a flexible and powerful technique for the genetic analysis of two or more function-valued traits. Although the observed measurements are available only on a discrete timescale, this approach can model the fact that the underlying process is continuous and therefore can deal with highly unbalanced data. As variance parameters are assumed to change with time, other environmental factors of heterogeneity could be included in the variance modeling, as suggested by Foulley and Quaas (1995). Further research might extend these multivariate models to include the genetic analysis of nonnormally distributed traits, as studied by Pletcher and Jaffrezic (2002) in the univariate case.

We are most grateful to Jean-Louis Foulley, William G. Hill, Nancy Heckman, Jay Beder, and two anonymous referees for very interesting comments and ideas. Thanks go to J. Curtsinger and A. Khazaeli for generously providing published and unpublished data.

LITERATURE CITED

DeRisi, J. L., V. R. Iyer and P. O. Brown, 1997 Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680–686.
Diggle, P. J., K. Y. Liang and S. L. Zeger, 1994 Analysis of Longitudinal Data. Oxford University Press, Oxford.
Foulley, J. L., and R. L. Quaas, 1995 Heterogeneous variances in Gaussian linear mixed models. Genet. Sel. Evol. 27: 211–228.
Gabriel, K. R., 1962 Ante-dependence analysis of an ordered set of variables. Ann. Math. Stat. 33: 201–212.
Gilmour, A. R., B. J. Gogel, B. R. Cullis, S. J. Welham and R. Thompson, 2002 ASREML User Guide Release 1.0. VSN International, Hemel Hempstead, UK.
Jaffrezic, F., and S. D. Pletcher, 2000 Statistical models for estimating the genetic basis of repeated measures and other function-valued traits. Genetics 156: 913–922.
Jaffrezic, F., I. M. S. White, R. Thompson and P. M. Visscher, 2002 Contrasting models for lactation curve analysis. J. Dairy Sci. 84: 968–975.
Jaffrezic, F., R. Thompson and W. G. Hill, 2003 Structured antedependence models for genetic analysis of multivariate repeated measures in quantitative traits. Genet. Res. 82: 55–65.
Meuwissen, T. H. E., and M. H. Pool, 2001 Autoregressive versus random regression test-day models for prediction of milk yields. Interbull Bull. 27: 172–178.
Meyer, K., 2001 Estimating genetic covariance functions assuming a parametric correlation structure for environmental effects. Genet. Sel. Evol. 33: 557–585.
Nunez-Anton, V., and D. L. Zimmerman, 2000 Modeling non-stationary longitudinal data. Biometrics 56: 699–705.
Pletcher, S. D., and C. J. Geyer, 1999 The genetic analysis of age-dependent traits: modeling a character process. Genetics 153: 825–833.
Pletcher, S. D., and F. Jaffrezic, 2002 Generalized character process models: estimating the genetic basis of traits that cannot be observed and that change with age or environmental conditions. Biometrics 58: 157–162.
Pletcher, S. D., D. Houle and J. W. Curtsinger, 1998 Age-specific properties of spontaneous mutations affecting mortality in Drosophila melanogaster. Genetics 148: 287–303.
Pletcher, S. D., S. J. Macdonald, R. Marguerie, U. Certa, S. C. Stearns et al., 2002 Genome-wide transcript profiles in aging and calorically restricted Drosophila melanogaster. Curr. Biol. 12(9): 712–723.
Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461–464.
Sy, J. P., J. M. G. Taylor and W. G. Cumberland, 1997 A stochastic model for the analysis of bivariate longitudinal AIDS data. Biometrics 53: 542–555.
Taylor, J. M. G., W. G. Cumberland and J. P. Sy, 1994 A stochastic model for analysis of longitudinal AIDS data. J. Am. Stat. Assoc. 89: 727–736.
Vonesh, E., V. Chinchilli and K. Pu, 1996 Goodness-of-fit in generalized nonlinear mixed-effects models. Biometrics 52: 572–587.

Communicating editor: M. K. Uyenoyama

APPENDIX A: IMPLEMENTATION

As suggested by Sy et al. (1997), to calculate the matrix exponentiation used in the correlation functions, diagonalization of matrix Θ is used,

Θ = Φ Λ Φ⁻¹,          (A1)

where Λ is a diagonal matrix of the distinct eigenvalues λ1 and λ2 of Θ, and Φ is a 2 × 2 matrix whose columns are the right eigenvectors. The matrix exponential is then written and evaluated as

e^(−Θ(t−s)) = Φ e^(−Λ(t−s)) Φ⁻¹.          (A2)

For the exponential correlation,

exp(−Θ(t − s)) = Φ ( e^(−λ1(t−s))    0
                     0               e^(−λ2(t−s)) ) Φ⁻¹,          (A3)

where parameters λ1 and λ2 are the elements of matrix Λ (Sy et al. 1997). The Gaussian case is similar, with (t − s) replaced by (t − s)². For the Cauchy correlation, taking advantage of the fact that Θ⁻¹ = Φ Λ⁻¹ Φ⁻¹, it follows that

(I + Θ(t − s)²)⁻¹ = Φ ( 1/(1 + λ1(t−s)²)    0
                        0                   1/(1 + λ2(t−s)²) ) Φ⁻¹.

For the variance functions V(t), an eigenvalue decomposition, ln V(t) = P(t)Δ(t)P(t)ᵀ, can also be used. It follows that V(t)^(1/2) = P(t) exp(½Δ(t)) P(t)ᵀ.

Parameter estimates were obtained using the OWN function of ASREML (Gilmour et al. 2002), which requires us to provide the first derivatives of the covariance matrix with respect to each parameter. The nonstationary parameter ν of Equation 6 is obtained at the same time as the other parameters of the covariance matrix.

APPENDIX B: PROPERTIES OF THE DEFINED BIVARIATE CHARACTER PROCESS COVARIANCE FUNCTION

When J times of measurement are available for the two variables Y1 and Y2, for each individual i observations are ordered as yᵢ = (yᵢ11, yᵢ21, . . . , yᵢ1J, yᵢ2J). The whole genetic covariance matrix G, of dimension (2J × 2J), can be written as G = V R Vᵀ, where R denotes the (2J × 2J) correlation matrix defined below. By construction (Equation 5), matrix G will be symmetric. Matrix V is block diagonal, V = diag(Vⱼ), j = 1, . . . , J, where the Vⱼ are 2 × 2 matrices defined by Vⱼ = (V(tⱼ))^(1/2), with ln V(tⱼ) = A + Btⱼ + Ctⱼ², or specified as in Equation 10. In both cases, the matrices Vⱼ, j = 1, . . . , J, are positive definite. Matrix R is a 2J × 2J symmetric matrix defined, for i, j = 1, . . . , J, by R(2(i−1)+1:2i, 2(j−1)+1:2j) = Rᵢⱼ, where Rᵢⱼ = (exp(−Θ(tᵢ − tⱼ))) 1_(j≤i) and Rⱼᵢ = Rᵢⱼᵀ, if an exponential function is considered. In this case, matrix Θ is defined as for the bivariate Ornstein-Uhlenbeck process (Sy et al. 1997) and therefore satisfies the positive definiteness property. When considering other functions as proposed in the univariate case by Pletcher and Geyer (1999), such as Gaussian or Cauchy, the property is maintained. Therefore, the proposed function for the bivariate CP model satisfies the theoretical requirements of a covariance function, as it is symmetric and positive definite.


GAUSSIAN STOCHASTIC PROCESSES

S. Chaturvedi

School of Physics, University of Hyderabad

Hyderabad - 500134, India

Stochastic Processes: Formalism and Applications, Lecture Notes in Physics, 1983, v. 184, pp. 19-29

Introduction

Among all possible stochastic processes, Gaussian stochastic processes constitute a very important class. These occur in many areas of physics. A historically important example of a Gaussian stochastic process is that of Brownian motion. The intensity of the light emitted by a thermal source is another example of such a process. The main reason why Gaussian stochastic processes have been studied so extensively is that they are completely specified by the first two moments. This makes them particularly easy to handle.

We shall begin our discussion of Gaussian stochastic processes by studying Gaussian random variables. This, as we shall see later, will enable us to define Gaussian stochastic processes and to discuss some of their important properties.

Gaussian Random Variables

Let us briefly recapitulate what a Gaussian random variable is. A random variable X is defined by specifying
(i) the range of values x it can take and
(ii) a probability distribution over this range.

A random variable is said to be Gaussian if the range of values it can take extends from −∞ to +∞ and if the probability distribution over this range is a Gaussian distribution

P(x) = (2πσ²)^(−1/2) exp[−(x − ⟨x⟩)²/2σ²].          (1)

If instead of a single variable we have a vector X having n components, then (1) generalizes to

P(x1, . . . , xn) = (det A)^(1/2)/(2π)^(n/2) exp[−½ (x − ⟨X⟩)ᵀ A (x − ⟨X⟩)],          (2)

where A is a positive definite symmetric matrix. The probability distribution (2) is known as a multivariate Gaussian distribution. We shall now discuss some of its properties.

(a) Let us first check that ⟨X⟩ appearing on the R.H.S. of (2) is indeed the mean value of X and that the distribution is correctly normalised. We have

⟨X⟩ = (det A)^(1/2)/(2π)^(n/2) ∫ dx x exp[−½ (x − ⟨X⟩)ᵀ A (x − ⟨X⟩)]
    = (det A)^(1/2)/(2π)^(n/2) ∫ dy (y + ⟨X⟩) exp[−½ yᵀ A y].

Since A is a symmetric matrix, it can be diagonalized by an orthogonal matrix S:

Sᵀ A S = Λ,   Sᵀ S = 1.

Putting y = Sz in the expression above and making use of the fact that the Jacobian of the transformation is unity, we obtain

⟨X⟩ = (det A)^(1/2)/(2π)^(n/2) ∫ dz (Sz + ⟨X⟩) exp[−½ zᵀ Λ z].

The first term, being odd in z, vanishes, so that

⟨X⟩ = ⟨X⟩ (det A)^(1/2)/(2π)^(n/2) ∫ dz1 ⋯ dzn exp[−½ Σᵢ λᵢ zᵢ²] = ⟨X⟩ (det A)^(1/2)/(λ1 ⋯ λn)^(1/2) = ⟨X⟩.

Here the λᵢ are the eigenvalues of A. This also shows that (2) is correctly normalised.

(b) We now want to show that

⟨(Xᵢ − ⟨Xᵢ⟩)(Xⱼ − ⟨Xⱼ⟩)⟩ = (A⁻¹)ᵢⱼ.          (3)

Indeed,

⟨(Xᵢ − ⟨Xᵢ⟩)(Xⱼ − ⟨Xⱼ⟩)⟩ = (det A)^(1/2)/(2π)^(n/2) ∫ dy yᵢ yⱼ exp[−½ yᵀ A y]
  = (det A)^(1/2)/(2π)^(n/2) ∫ dz (Sz)ᵢ (Sz)ⱼ exp[−½ zᵀ Λ z]
  = (det A)^(1/2)/(2π)^(n/2) Σₖ Sᵢₖ Sⱼₖ ∫ dz zₖ² exp[−½ Σₗ λₗ zₗ²]
  = Σₖ Sᵢₖ λₖ⁻¹ Sⱼₖ = (S Λ⁻¹ Sᵀ)ᵢⱼ = (A⁻¹)ᵢⱼ.

We thus see that a Gaussian distribution is completely determined by the mean values of the variables and the second moments.

Henceforth we shall assume for convenience that the variables X have zero mean, i.e. ⟨X⟩ = 0. All the results for the case ⟨X⟩ ≠ 0 can easily be obtained by replacing X in the following by X − ⟨X⟩.

(c) We now establish a very useful property of the multivariate Gaussian distribution:

⟨Xᵢ f(X)⟩ = Σⱼ ⟨XᵢXⱼ⟩ ⟨∂f(X)/∂Xⱼ⟩,          (4)

where f(X) is a polynomial in the Xᵢ's.

To prove (4) we rewrite it using (3) as

⟨Xᵢ f(X)⟩ = Σⱼ (A⁻¹)ᵢⱼ ⟨∂f(X)/∂Xⱼ⟩,

or

Σⱼ Aᵢⱼ ⟨Xⱼ f(X)⟩ = ⟨∂f(X)/∂Xᵢ⟩.

Now

Σⱼ Aᵢⱼ ⟨Xⱼ f(X)⟩ = (det A)^(1/2)/(2π)^(n/2) ∫ dx f(x) Σⱼ Aᵢⱼ xⱼ exp[−½ Σₗₘ xₗ Aₗₘ xₘ]
  = −(det A)^(1/2)/(2π)^(n/2) ∫ dx f(x) (∂/∂xᵢ) exp[−½ xᵀ A x],

and on integrating by parts

Σⱼ Aᵢⱼ ⟨Xⱼ f(X)⟩ = ⟨∂f(X)/∂Xᵢ⟩.

Hence the proof.

(d) Repeated use of (4) enables us to show that all the even moments of a Gaussian distribution with zero mean factorize pairwise into the second moments. (The odd moments of such a distribution of course vanish, as is easily seen.) Consider for instance the fourth moment ⟨XᵢXⱼXₖXₗ⟩. From (4) we have

⟨XᵢXⱼXₖXₗ⟩ = Σₘ ⟨XᵢXₘ⟩ ⟨∂(XⱼXₖXₗ)/∂Xₘ⟩
            = ⟨XᵢXⱼ⟩⟨XₖXₗ⟩ + ⟨XᵢXₖ⟩⟨XⱼXₗ⟩ + ⟨XᵢXₗ⟩⟨XⱼXₖ⟩.          (5)

The R.H.S. of (5) can be written compactly as

Σ_pairs ⟨XₚX_q⟩⟨XᵣXₛ⟩,

where the indices p, q, r, s are the same as i, j, k, l and the summation extends over all different ways in which i, j, k, l can be divided into pairs. Proceeding in a similar fashion, we have, in general,

⟨XᵢXⱼXₖXₗ ⋯⟩ = Σ_pairs ⟨XₚX_q⟩⟨XᵣXₛ⟩ ⋯ .          (6)

Thus the even moments of a Gaussian with zero mean factorise as in (6). For a moment of order 2k, there are (2k)!/(2ᵏ k!) terms on the R.H.S. of (6). Conversely, one can show that if the moments of a probability distribution factorise as in (6) then the distribution is a Gaussian. It therefore follows that (6) is both necessary and sufficient for a distribution to be a Gaussian and is called the moment theorem for Gaussian distributions.
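The pairwise factorization is easy to confirm by simulation. A small Monte Carlo sketch (ours; the covariance matrix is arbitrary) checks the fourth-moment identity (5):

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[2.0, 0.8, 0.3, 0.1],
                [0.8, 1.5, 0.4, 0.2],
                [0.3, 0.4, 1.0, 0.3],
                [0.1, 0.2, 0.3, 1.2]])   # arbitrary positive definite matrix
x = rng.multivariate_normal(np.zeros(4), cov, size=1_000_000)

mc = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
# Equation (5): sum over the three pairings of the four indices
pairs = cov[0, 1]*cov[2, 3] + cov[0, 2]*cov[1, 3] + cov[0, 3]*cov[1, 2]
print(mc, pairs)   # the two values agree up to Monte Carlo error
```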

(e) A convenient way of calculating the moments of a probability distribution is to work out its characteristic function

C(h) = ⟨exp(i h·X)⟩ = Σ_{mᵢ=0}^∞ [(ih1)^(m1)/m1!] [(ih2)^(m2)/m2!] ⋯ ⟨X1^(m1) X2^(m2) ⋯⟩.          (7)

Given the characteristic function, an arbitrary moment can be worked out by differentiating it an appropriate number of times w.r.t. the hᵢ's and then setting h = 0. For a Gaussian distribution with zero mean, C(h) has a very simple form:

C(h) = exp[−½ hᵀ A⁻¹ h],          (8)

as is easily seen:

C(h) = (det A)^(1/2)/(2π)^(n/2) ∫ dx exp[−½ xᵀ A x + i hᵀ x]
     = (det A)^(1/2)/(2π)^(n/2) exp(−½ hᵀ A⁻¹ h) ∫ dy exp[−½ yᵀ A y]
     = exp(−½ hᵀ A⁻¹ h),

where the square has been completed by shifting x = y + iA⁻¹h. Using (3) we may write (8) as

C(h) = exp[−½ Σᵢⱼ hᵢ ⟨XᵢXⱼ⟩ hⱼ].          (9)

Similarly, for a Gaussian with ⟨X⟩ ≠ 0, C(h) is found to be

C(h) = exp[−½ hᵀ A⁻¹ h + i hᵀ⟨X⟩] = exp[i Σᵢ hᵢ⟨Xᵢ⟩ − ½ Σᵢⱼ hᵢ ⟨⟨XᵢXⱼ⟩⟩ hⱼ].          (10)

Another useful quantity is the logarithm of the characteristic function, the cumulant generating function:

K(h) = ln C(h) = Σ_{mᵢ=0}^∞ [(ih1)^(m1)/m1!] [(ih2)^(m2)/m2!] ⋯ ⟨⟨X1^(m1) X2^(m2) ⋯⟩⟩,          (11)

where ⟨⟨⋯⟩⟩ denotes the cumulants. For a Gaussian we find that the cumulant generating function

K(h) = i hᵀ⟨X⟩ − ½ hᵀ A⁻¹ h = i Σᵢ hᵢ⟨Xᵢ⟩ − ½ Σᵢⱼ hᵢ ⟨⟨XᵢXⱼ⟩⟩ hⱼ          (12)

is at most quadratic in the auxiliary variables hᵢ, and hence all cumulants higher than the second vanish.
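This vanishing of the higher cumulants can be checked empirically; the sketch below (ours) uses SciPy's unbiased k-statistics on simulated Gaussian data with arbitrary mean and variance:

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)

# Sample cumulants: for a Gaussian, every order above the second vanishes.
for n in (1, 2, 3, 4):
    print(n, kstat(x, n))   # ~ 1.0, ~ 4.0, ~ 0.0, ~ 0.0
```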

We mention here a theorem due to Marcinkiewicz [1-3], which states that the cumulant generating function of a probability distribution cannot be a polynomial in the hᵢ's of degree greater than 2. In other words, either K(h) is at most quadratic in h or it contains all powers of h. This in turn implies that either all but the first two cumulants of a probability distribution vanish or there are an infinite number of non-vanishing cumulants.

With this background on Gaussian random variables we now go over to defining Gaussian stochastic processes.

Gaussian Stochastic Processes

A stochastic process is a function of two variables: the time t and a random variable ω,

X_ω(t) = f(t, ω).          (13)

We may look upon (13) in two ways:
(i) For each value ω the random variable takes, X_ω(t) is just an ordinary function of time and is called a realisation of the stochastic process X_ω(t). The stochastic process is thus an ensemble of all such realisations.
(ii) For a fixed t, X_ω(t) is a stochastic variable, being a function of a random variable. The stochastic process X_ω(t) may be regarded as a continuum of random variables, one for each t.

From the second point of view it therefore logically follows that in order to define a stochastic process completely we need to specify an infinite number of joint probabilities,

P1(x, t)
P2(x2, t2; x1, t1)
P3(x3, t3; x2, t2; x1, t1)
. . . . . . . . . . . .

which say what is the probability that X(t) has a value x at t, what is the probability that X(t) has a value x1 at t1 and x2 at t2, etc. Given this infinite (over-complete) set of joint probabilities, the stochastic process is completely defined.

A stochastic process is said to be Gaussian if all these joint probabilities are Gaussian:

Pn(xn, tn; . . . ; x2, t2; x1, t1) = (det A)^(1/2)/(2π)^(n/2) exp[−½ Σᵢⱼ (xᵢ − ⟨X(tᵢ)⟩) Aᵢⱼ (xⱼ − ⟨X(tⱼ)⟩)],          (14)

where the matrix Aᵢⱼ is the inverse of the matrix with elements

(A⁻¹)ᵢⱼ = ⟨(X(tᵢ) − ⟨X(tᵢ)⟩)(X(tⱼ) − ⟨X(tⱼ)⟩)⟩ ≡ ⟨⟨X(tᵢ) X(tⱼ)⟩⟩,          (15)

in analogy with (3). ⟨⟨X(tᵢ) X(tⱼ)⟩⟩ is known as the autocorrelation function.

Thus a Gaussian stochastic process is completely characterized by ⟨X(t)⟩ and the autocorrelation function. All the formulae we had derived previously for Gaussian distributions can now be generalised to stochastic processes by replacing partial derivatives by functional derivatives, summation over i by integration over t, etc. We list them below.

(a) Novikov's theorem: For a functional f[X] of the stochastic process X(t) we have

⟨X(t) f[X]⟩ = ∫ dt′ ⟨X(t) X(t′)⟩ ⟨δf[X]/δX(t′)⟩.          (16)

(b) Moment theorem: For a Gaussian stochastic process with ⟨X(t)⟩ = 0, the odd moments vanish and the even moments factorise pairwise:

⟨X(tᵢ)X(tⱼ) ⋯⟩ = Σ_pairs ⟨X(tₚ)X(t_q)⟩ ⟨X(tᵣ)X(tₛ)⟩ ⋯ .          (17)

(c) The characteristic functional

C[h] = ⟨exp(i ∫ dt h(t) X(t))⟩ = Σ_{m=0}^∞ (iᵐ/m!) ∫dt1 ⋯ ∫dtm h(t1) ⋯ h(tm) ⟨X(t1) ⋯ X(tm)⟩          (18)

for a Gaussian stochastic process with zero mean is given by

C[h] = exp[−½ ∫dt1 ∫dt2 h(t1) h(t2) ⟨X(t1) X(t2)⟩],          (19)

and for ⟨X(t)⟩ ≠ 0 by

C[h] = exp[i ∫dt1 h(t1)⟨X(t1)⟩ − ½ ∫dt1 ∫dt2 h(t1)h(t2) ⟨⟨X(t1)X(t2)⟩⟩],          (20)

where

⟨⟨X(t1) X(t2)⟩⟩ = ⟨(X(t1) − ⟨X(t1)⟩)(X(t2) − ⟨X(t2)⟩)⟩.          (21)

(d) The cumulant generating functional

K[h] = ln C[h] = Σ_{m=0}^∞ (iᵐ/m!) ∫dt1 ⋯ ∫dtm h(t1) ⋯ h(tm) ⟨⟨X(t1) ⋯ X(tm)⟩⟩          (22)

for a Gaussian distribution with ⟨X(t)⟩ = 0 reads

K[h] = −½ ∫dt1 ∫dt2 h(t1) h(t2) ⟨X(t1) X(t2)⟩,          (23)

and for ⟨X(t)⟩ ≠ 0 as

K[h] = i ∫dt1 h(t1)⟨X(t1)⟩ − ½ ∫dt1 ∫dt2 h(t1)h(t2) ⟨⟨X(t1)X(t2)⟩⟩,

implying that all the cumulants of a Gaussian stochastic process higher than the second vanish. The Marcinkiewicz theorem holds for stochastic processes as well.

Of special interest to physicists and mathematicians is a class of stochastic processes known as Markov processes [4]. A Markov process is fully determined by a single-time distribution P1(x, t) and a conditional probability defined as

P(x1, t1 | x2, t2) ≡ P2(x1, t1; x2, t2)/P1(x2, t2),          (24)

satisfying
(i) the Chapman-Kolmogorov equation

P(x3, t3 | x1, t1) = ∫ dx2 P(x3, t3 | x2, t2) P(x2, t2 | x1, t1)   for t3 > t2 > t1,          (25)

and
(ii) P1(x2, t2) = ∫ dx1 P(x2, t2 | x1, t1) P1(x1, t1).          (26)

Another class of stochastic processes which are of special relevance to physics are the stationary processes. A stochastic process is stationary if all the joint probabilities are invariant under a shift in time:

Pn(xn, tn + τ; . . . ; x2, t2 + τ; x1, t1 + τ) = Pn(xn, tn; . . . ; x2, t2; x1, t1).          (27)

This necessarily implies that the single-time probability P1(x, t) is independent of time. Equation (27) in turn implies that

⟨X(tn + τ) ⋯ X(t2 + τ) X(t1 + τ)⟩ = ⟨X(tn) ⋯ X(t2) X(t1)⟩.          (28)

Having thus defined these three important classes of stochastic processes, viz. Gaussian, Markovian and stationary stochastic processes, a natural question to ask is as follows: among all Gaussian stochastic processes, what characterizes those which have the additional attributes of being stationary and Markovian? The answer to this question is provided by Doob's theorem:

A stationary Gaussian process is Markovian only if the autocorrelation function is an exponential,

⟨⟨X(t1)X(t2)⟩⟩ ∝ exp(−γ|t1 − t2|).          (29)

(For a multicomponent stochastic process, (29) is to be replaced by its obvious generalisation

⟨⟨X(t1)Xᵀ(t2)⟩⟩ ∝ exp(−Γ|t1 − t2|),          (30)

where Γ is a constant matrix.)

We now briefly outline the proof of this important theorem.

For a Gaussian stochastic process the joint probabilities have the form given in (14). (For simplicity we shall consider Gaussian processes with ⟨X(t)⟩ = 0.) Substituting for P2(x1, t1; x2, t2) and P1(x2, t2) in (24), we find that the conditional probability for a Gaussian process has the following general form:

P(x1, t1 | x2, t2) = [2π σ²(t1)(1 − ρ²(t1, t2))]^(−1/2) exp[ −(x1 − ρ(t1, t2)(σ(t1)/σ(t2)) x2)² / (2σ²(t1)(1 − ρ²(t1, t2))) ],          (31)

where

σ²(t) = ⟨X(t) X(t)⟩          (32)

and

ρ(t1, t2) = ⟨X(t1) X(t2)⟩ / (σ(t1) σ(t2)).          (33)

ρ(t1, t2) is known as the correlation coefficient.

From (31) it follows that the conditional average of X(t1), given that it had a value x3 at time t3,

⟨X(t1)⟩_{X(t3)=x3} ≡ ∫ dx1 x1 P(x1, t1 | x3, t3),          (34)

is given by

⟨X(t1)⟩_{X(t3)=x3} = ρ(t1, t3) (σ(t1)/σ(t3)) x3.          (35)

For a Gauss-Markov process we have, on using (35) and the Chapman-Kolmogorov equation (25),

⟨X(t1)⟩_{X(t3)=x3} = ∫ dx1 x1 P(x1, t1 | x3, t3)
  = ∫∫ dx1 dx2 x1 P(x1, t1 | x2, t2) P(x2, t2 | x3, t3),   t1 ≥ t2 ≥ t3,
  = ρ(t1, t2) σ(t1) ∫ dx2 (x2/σ(t2)) P(x2, t2 | x3, t3)
  = ρ(t1, t2) (σ(t1)/σ(t2)) ρ(t2, t3) (σ(t2)/σ(t3)) x3.          (36)

From (35) and (36) we have

ρ(t1, t3) = ρ(t1, t2) ρ(t2, t3),   t1 ≥ t2 ≥ t3.          (37)

Thus we find that a necessary condition for a Gaussian process to be Markovian is that the correlation coefficients must satisfy (37). In fact this condition turns out to be both necessary and sufficient [5].

Let us now consider a stationary Gaussian process. Stationarity implies that σ(t) is independent of time and that ⟨X(t1) X(t2)⟩, and hence ρ(t1, t2), depends only on t1 − t2. From (37) it follows that for such a process to be Markovian we must have

ρ(t1 − t3) = ρ(t1 − t2) ρ(t2 − t3).          (38)

This functional equation is satisfied only if ρ(t1 − t3) is an exponential,

ρ(t1 − t3) = exp[−γ(t1 − t3)],

i.e.

⟨X(t1) X(t2)⟩ ∝ exp(−γ|t1 − t2|).          (39)

Hence the proof.
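The multiplicative condition (37)/(38) is what singles out the exponential; a small Python check of ours (the rate γ and the time points are arbitrary) contrasts an exponential correlation coefficient with a Gaussian-shaped one:

```python
import numpy as np

gamma = 1.3
t1, t2, t3 = 5.0, 3.2, 1.1            # t1 >= t2 >= t3, arbitrary

rho_exp = lambda t: np.exp(-gamma * t)          # exponential: satisfies (38)
rho_gau = lambda t: np.exp(-gamma * t**2)       # non-exponential: violates it

print(np.isclose(rho_exp(t1 - t3), rho_exp(t1 - t2) * rho_exp(t2 - t3)))  # True
print(np.isclose(rho_gau(t1 - t3), rho_gau(t1 - t2) * rho_gau(t2 - t3)))  # False
```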

Examples of Gaussian Stochastic Processes

In conclusion we list some important Gaussian stochastic processes which one frequently encounters in physics. The list contains examples of Gaussian stochastic processes which have either the Markov property or stationarity, or both, or neither of them.

1. Gaussian white noise: The Gaussian "stochastic process" ξ(t) characterized by

⟨ξ(t)⟩ = 0,          (40)
⟨ξ(t) ξ(t′)⟩ = δ(t − t′)          (41)

is usually referred to as Gaussian white noise. Such a stochastic process was first introduced by Langevin in the context of Brownian motion. Gaussian white noise is not a stochastic process in a strict mathematical sense. However, in physics it is often used as a model for very rapid fluctuations.

2. Wiener process: The Wiener process W(t) is an example of a Gaussian Markovian nonstationary stochastic process and is characterised by

⟨W(t)⟩ = 0,          (42)
⟨W(t) W(t′)⟩ = min(t, t′).          (43)

That it is a nonstationary process is clear from (43). From (43) it also follows that σ²(t) and ρ(t1, t2) for this process are given by

σ²(t) = ⟨W(t) W(t)⟩ = t,          (44)
ρ(t1, t2) = ⟨W(t1) W(t2)⟩/(σ(t1) σ(t2)) = (t2/t1)^(1/2),   t1 > t2.          (45)

The single-time probability P1(w, t) is therefore given by

P1(w, t) = (2πt)^(−1/2) exp[−w²/2t].          (46)

Substituting from (44) and (45) in (31), we find that the conditional probability P(w1, t1 | w2, t2) for this process is given by

P(w1, t1 | w2, t2) = [2π(t1 − t2)]^(−1/2) exp[−(w1 − w2)²/2(t1 − t2)].          (47)

That this process is a Markov process is easily checked by verifying that ρ(t1, t2) satisfies (37).

We can regard the Wiener process as an integral of Gaussian white noise,

W(t) = ∫₀ᵗ dt′ ξ(t′),          (48)

in the sense that (48) together with (40) and (41) reproduces (42) and (43). We can also write (48) formally as a differential equation,

dW/dt = ξ(t).          (49)
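Relation (48) suggests the standard way of simulating a Wiener process: cumulatively summing independent N(0, dt) increments. A minimal Python sketch of ours (step size and sample times arbitrary) checks the covariance (43):

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n_steps, n_paths = 0.01, 1000, 50_000

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)              # discretized version of (48)

i, j = 299, 799                        # times s = 3.0 and t = 8.0
print(np.mean(W[:, i] * W[:, j]))      # ~ min(s, t) = 3.0, as in (43)
```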

3. Ornstein-Uhlenbeck process: This is an example of a Gaussian Markovian stationary stochastic process. It is characterised by

⟨Y(t)⟩ = 0,          (50)
⟨Y(t1) Y(t2)⟩ = exp[−|t1 − t2|].          (51)

From (50) and (51) we can construct P1(y, t) and P(y1, t1 | y2, t2) just as in the case of the Wiener process. These are given by

P1(y, t) = (2π)^(−1/2) exp[−½ y²],          (52)

P(y1, t1 | y2, t2) = [2π(1 − e^(−2(t1−t2)))]^(−1/2) exp[ −(y1 − y2 e^(−(t1−t2)))² / 2(1 − e^(−2(t1−t2))) ].          (53)

Again, as in the case of the Wiener process, we may express the Ornstein-Uhlenbeck process in terms of Gaussian white noise as

Y(t) = ∫_{−∞}ᵗ dt′ e^(−(t−t′)) ξ(t′),          (54)

in the sense that (54) reproduces (50) and (51). Equation (54) may be written as a differential equation,

dY/dt = −Y + ξ(t).          (55)
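The conditional density (53) gives an exact one-step update for simulating the Ornstein-Uhlenbeck process. The following Python sketch of ours (all numerical settings arbitrary) starts paths in the stationary law (52) and verifies the autocorrelation (51):

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n_steps, n_paths = 0.05, 400, 20_000

# Exact update implied by (53): Y(t+dt) | Y(t) ~ N(Y(t) e^{-dt}, 1 - e^{-2 dt})
a, s = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))
y = rng.normal(0.0, 1.0, size=n_paths)          # stationary start, cf. (52)
path = np.empty((n_paths, n_steps))
for k in range(n_steps):
    y = a * y + s * rng.normal(size=n_paths)
    path[:, k] = y

lag = 40                                        # lag * dt = 2.0
emp = np.mean(path[:, 100] * path[:, 100 + lag])
print(emp, np.exp(-lag * dt))                   # both ~ exp(-2), as in (51)
```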

4. An example of a Gaussian stochastic process which is stationary but non-Markovian is easy to construct: any Gaussian stationary process with a non-exponential autocorrelation function is, according to Doob's theorem, non-Markovian.

5. Finally, an example of a Gaussian stochastic process which is neither Markovian nor stationary is the stochastic process defined to be the integral of the Ornstein-Uhlenbeck process,

Z(t) = ∫₀ᵗ Y(t′) dt′.          (56)

For this process we can deduce, using (50) and (51), that

⟨Z(t)⟩ = 0.          (57)

⟨Z(t1)Z(t2)⟩ = e^(−t1) + e^(−t2) − 1 − e^(−|t1−t2|) + 2 min(t1, t2).          (58)

This process is not stationary, as is evident from (58). It is also easy to check that (37) is not satisfied for this process, and hence it is non-Markovian. We may write (56) as a differential equation,

dZ/dt = Y(t),          (59)

where Y(t) obeys (55). With Z(t) and Y(t) identified with the position and velocity of a Brownian particle, (59) and (55) are the Langevin equations for a free Brownian particle. Although Z(t) is non-Markovian, Z(t) and Y(t) together constitute a Markov process.

The material presented above can be found in some form or another in any good textbook on stochastic processes. See for instance [6] and [7].
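The covariance (58) can be checked by integrating simulated Ornstein-Uhlenbeck paths as in (56); the rough Euler-type sketch below is ours (time step and horizons arbitrary, so a small discretization bias remains):

```python
import numpy as np

rng = np.random.default_rng(4)
dt, n_steps, n_paths = 0.01, 500, 20_000

a, s = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))
y = rng.normal(0.0, 1.0, size=n_paths)    # stationary OU start
z = np.zeros(n_paths)
Z = np.empty((n_paths, n_steps))
for k in range(n_steps):
    y = a * y + s * rng.normal(size=n_paths)
    z = z + y * dt                        # Z(t) = integral of Y, cf. (56)
    Z[:, k] = z

t1, t2 = 2.0, 4.0
emp = np.mean(Z[:, 199] * Z[:, 399])
theory = np.exp(-t1) + np.exp(-t2) - 1.0 - np.exp(-abs(t1 - t2)) + 2.0 * min(t1, t2)
print(emp, theory)                        # both close to the value from (58)
```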

References

1. J. Marcinkiewicz, Math. Z. 44, 612 (1939).
2. D.W. Robinson, Comm. Math. Phys. 1, 89 (1965).
3. A.K. Rajagopal and E.C.G. Sudarshan, Phys. Rev. A10, 1852 (1974).
4. See the lectures by R. Vasudevan in these proceedings.
5. W. Feller, "Introduction to Probability Theory and its Applications", Vol. 2, Wiley, New York (1966).
6. N.G. Van Kampen, "Stochastic Processes in Physics and Chemistry", North-Holland, Amsterdam, New York, Oxford (1981).
7. C.W. Gardiner, "A Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences", Springer, to be published (1983).


Chapter 5

Brownian Motion

Brownian motion processes originated with the study by the botanist Brown in 1827 of the movements of particles suspended in water. As a particle is occasionally hit by the surrounding water molecules, it moves continuously in three dimensions. Assuming the infinitesimal displacements of the particle are independent and identically distributed, the central limit theorem would imply that the size of a typical displacement (being the sum of many small ones) is normally distributed. Then the continuous trajectory of the particle in R³ would have increments that are stationary, independent and normally distributed. These are the defining properties of Brownian motion. This diffusion phenomenon, commonly encountered in other contexts as well, gave rise to the theory of Brownian motion and more general diffusion processes.

Brownian motion is one of the most prominent stochastic processes. Its importance is due in part to the central limit phenomenon that sums of random variables such as random walks, considered as processes in time, converge to a Brownian motion or to functions of it. Moreover, Brownian motion plays a key role in stochastic calculus involving integration with respect to Brownian motion and semimartingales. This calculus is used to study dynamical systems modeled by stochastic differential equations. For instance, in the area of stochastic finance, stochastic differential equations are the basis for pricing of options by Black-Scholes and related models. Brownian motion is an important example of a diffusion process and it is a Gaussian process as well. Several variations of Brownian motion arise in specific applications, such as Brownian bridge in statistical hypothesis testing. In operations research, the major applications of Brownian motion have been in approximations for queueing systems, and there have also been applications in various areas such as financial models and supply chains.

This chapter begins by introducing a Brownian motion as a Markov process that satisfies the strong Markov property, and then characterizes a Brownian motion as a Gaussian process. The second part of the chapter is a study of hitting times of Brownian motion and its cumulative maximum process. This includes a reflection principle for Brownian sample paths, and an introduction to martingales and the optional stopping theorem for them.

R. Serfozo, Basics of Applied Stochastic Processes, Probability and its Applications. © Springer-Verlag Berlin Heidelberg 2009.

The next major results are limit theorems: a strong law of large numbers for Brownian motion and its maximum process, a law of the iterated logarithm for Brownian motion, and Donsker's functional limit theorem showing that Brownian motion is an approximation to random walks. Applications of Donsker's theorem yield similar Brownian approximations for Markov chains, renewal and regenerative-increment processes, and G/G/1 queueing systems.

Other topics include peculiarities of Brownian sample paths, geometric Brownian motion, Brownian bridge processes, multidimensional Brownian motion, the Brownian/Poisson particle process, and Brownian motion in a random environment.

5.1 Definition and Strong Markov Property

Recall that a random walk in discrete time on the integers is a Markov chain with stationary independent increments. An analogous process in continuous time on R is a Brownian motion. This section introduces Brownian motion as a real-valued Markov process on the nonnegative time axis. Its distinguishing features are that it has stationary, independent, normally-distributed increments and continuous sample paths. It also satisfies the strong Markov property.

We begin by describing a "standard" Brownian motion, which is also called a Wiener process.

Definition 1. A real-valued stochastic process B = {B(t) : t ∈ R+} is a Brownian motion if it satisfies the following properties:
(i) B(0) = 0 a.s.
(ii) B has independent increments and, for s < t, the increment B(t) − B(s) has a normal distribution with mean 0 and variance t − s.
(iii) The paths of B are continuous a.s.

Property (ii) says that a Brownian motion B has stationary, independent increments. From this one can show that B is a Markov process. Consequently, a Brownian motion is a diffusion process — a Markov process with continuous sample paths. The next section establishes the existence of Brownian motion as a special type of Gaussian process. An introduction to Brownian motion in Rᵈ is in Section 5.14.

Because the increments of a Brownian motion B are stationary, independent and normally distributed, its finite-dimensional distributions are tractable. The normal density of B(t) with mean 0 and variance t is

f_{B(t)}(x) = (1/√(2πt)) e^(−x²/2t).

Denoting this density by ϕ(x; t), it follows by induction and properties (i) and (ii) that, for 0 = t0 < t1 < · · · < tn and x0 = 0, the joint density of B(t1), . . . , B(tn) is

f_{B(t1),...,B(tn)}(x1, . . . , xn) = ∏_{m=1}^n ϕ(xm − xm−1; tm − tm−1).          (5.1)
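Formula (5.1) is straightforward to evaluate. The Python sketch below (ours, with arbitrary times and points) cross-checks the product of increment densities against the equivalent multivariate normal density whose covariance is s ∧ t, cf. (5.2):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def brownian_density(times, xs):
    # Product form (5.1): independent normal increments, variance t_m - t_{m-1}
    t_prev, x_prev, dens = 0.0, 0.0, 1.0
    for t, x in zip(times, xs):
        dens *= norm.pdf(x - x_prev, scale=np.sqrt(t - t_prev))
        t_prev, x_prev = t, x
    return dens

times = np.array([0.5, 1.3, 2.0])
xs = np.array([0.2, -0.4, 0.9])
cov = np.minimum.outer(times, times)    # Cov(B(s), B(t)) = s ^ t
print(brownian_density(times, xs),
      multivariate_normal(mean=np.zeros(3), cov=cov).pdf(xs))   # equal
```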

Another nice feature is that the covariance between B(s) and B(t) is

E[B(s)B(t)] = s ∧ t.          (5.2)

This follows since, for s < t,

Cov(B(s), B(t)) = E[B(s)B(t)] = E[B(s)[(B(t) − B(s)) + B(s)]] = E[B(s)²] = s.

Several elementary functions of a Brownian motion are also Brownian mo-tions; see Exercise 1. Here is an obvious example.

Example 2. Symmetry Property. The process −B(t), which is B reflectedabout 0, is a Brownian motion (i.e., −B d= B).

As a generalization of a standard Brownian motion B, consider the process

X(t) = x + μt + σB(t),   t ≥ 0.

Any process equal in distribution to X is a Brownian motion with drift: x is its initial value, μ is its drift coefficient, and σ > 0 is its variation. Many properties of a Brownian motion with drift readily follow from properties of a standard Brownian motion. For instance, X has stationary, independent increments and X(t + s) − X(s) is normally distributed with mean μt and variance σ²t. Drift and variability parameters may be useful in Brownian models for representing certain trends and volatilities.

Now, let us see how Brownian motions are related to diffusion processes. Generally speaking, a diffusion process is a Markov process with continuous paths. Most diffusion processes in applications, however, have the following form. Suppose that {X(t) : t ≥ 0} is a real-valued Markov process with continuous paths a.s. that satisfies the following properties: for each x ∈ R, t ≥ 0, and ε > 0,

lim_{h↓0} h⁻¹ P{|X(t + h) − X(t)| > ε | X(t) = x} = 0,

lim_{h↓0} h⁻¹ E[X(t + h) − X(t) | X(t) = x] = μ(x, t),

lim_{h↓0} h⁻¹ E[(X(t + h) − X(t))² | X(t) = x] = σ(x, t),

where μ and σ are functions on R × R+. The X is a diffusion process on R with drift parameter μ(x, t) and diffusion parameter σ(x, t).

As a prime example, a Brownian motion with drift X(t) = μt + σB(t) is a diffusion process whose drift and diffusion parameters μ and σ are independent of x and t. Many functions of Brownian motions are also diffusions (e.g., the Ornstein-Uhlenbeck and Bessel processes in Examples 8 and 64).

We end this introductory section with the strong Markov property for Brownian motion. Suppose that B is a Brownian motion on a probability space (Ω, F, P). Let F^B_t ⊆ F be the σ-field generated by {B(s) : s ∈ [0, t]}, and assume F^B_0 includes all sets of P-probability 0 to make it complete. A stopping time of the filtration {F^B_t} is a random time τ, possibly infinite, such that {τ ≤ t} ∈ F^B_t, t ∈ R+. The σ-field of events up to time τ is

F^B_τ = {A ∈ F : A ∩ {τ ≤ t} ∈ F_t, t ∈ R+}.

If τ1 and τ2 are two F^B_t-stopping times and τ1 ≤ τ2, then F^B_{τ1} ⊆ F^B_{τ2}.

Theorem 3. If τ is an a.s. finite stopping time for a Brownian motion B, then the process B′(t) = B(τ + t) − B(τ), t ∈ R+, is a Brownian motion independent of F^B_τ.

Proof. We will prove this only for a stopping time τ that is a.s. bounded (τ ≤ u a.s. for some u > 0). Clearly B′ has continuous sample paths a.s. It remains to show that the increments of B′ are independent and independent of F^B_τ, and that B′(s + t) − B′(s) is normally distributed with mean 0 and variance t. These properties will follow upon showing that, for any 0 ≤ t0 < · · · < tn and u1, . . . , un in R+,

E[e^{S_n} | F^B_τ] = e^{(1/2) Σ_{i=1}^n u_i²(t_i − t_{i−1})}   a.s.,          (5.3)

where S_n = Σ_{i=1}^n u_i[B′(t_i) − B′(t_{i−1})].

The proof of (5.3) will be by induction. First note that

E[e^{S_{n+1}} | F^B_τ] = E[ e^{S_n} E[e^{S_{n+1} − S_n} | F^B_{τ+t_n}] | F^B_τ ].          (5.4)

Now, since τ + t_n is a bounded stopping time, using Example 26 below,

E[e^{S_{n+1} − S_n} | F^B_{τ+t_n}] = E[e^{u_{n+1}[B(τ+t_{n+1}) − B(τ+t_n)]} | F^B_{τ+t_n}] = e^{(1/2) u_{n+1}²(t_{n+1} − t_n)}.

This expression with n = 0 and S_0 = 0 proves (5.3) for n = 1. Next, assuming (5.3) is true for some n, using the last display and (5.3) in (5.4) yields (5.3) for n + 1.


5.2 Brownian Motion as a Gaussian Process

This section shows that Brownian motion is a special type of Gaussian process. Included is a proof of the existence of Gaussian processes, which leads to the existence of Brownian motion.

We begin with a discussion of multivariate normal distributions. Suppose that X1, . . . , Xn are normally distributed (not necessarily independent) random variables with means m1, . . . , mn. Then clearly, for u1, . . . , un ∈ R,

E[Σ_{i=1}^n uᵢXᵢ] = Σᵢ uᵢmᵢ,   Var[Σ_{i=1}^n uᵢXᵢ] = Σᵢ Σⱼ uᵢuⱼcᵢⱼ,          (5.5)

where cᵢⱼ = Cov(Xᵢ, Xⱼ). The vector (X1, . . . , Xn) is said to have a multivariate normal (or Gaussian) distribution if Σ_{i=1}^n uᵢXᵢ has a normal distribution for any u1, . . . , un in R. In light of (5.5), (X1, . . . , Xn) has a multivariate normal distribution if and only if its moment generating function has the form

E[e^{Σ_{i=1}^n uᵢXᵢ}] = exp{Σᵢ uᵢmᵢ + ½ Σᵢ Σⱼ uᵢuⱼcᵢⱼ},   uᵢ ≥ 0.          (5.6)

The vector (or distribution) associated with the moment generating function (5.6) is called nondegenerate if the n × n matrix C = {cᵢⱼ} has rank n. In this case, the joint density of (X1, . . . , Xn) is

f(x1, . . . , xn) = (1/√((2π)ⁿ|C|)) exp{−½ Σᵢ Σⱼ cⁱʲ(xᵢ − mᵢ)(xⱼ − mⱼ)},          (5.7)

where {cⁱʲ} denotes the inverse of C and |C| its determinant.

It turns out that any multivariate normal vector can be represented by a nondegenerate one as follows. When C does not have rank n, it follows by a property of symmetric matrices that there exists a k × n matrix A with transpose Aᵗ, where k ≤ n, such that C = AᵗA. Let X denote the multivariate normal vector as a 1 × n matrix with mean vector m. Suppose that Y is a 1 × k nondegenerate multivariate vector of i.i.d. random variables Y1, . . . , Yk that are normally distributed with mean 0 and variance 1. Then the multivariate normal vector has the representation

X =ᵈ m + Y A.          (5.8)

This equality in distribution follows because the moment generating function of m + Y A is equal to (5.6). Indeed,

E[exp{Σ_{i=1}^n uᵢmᵢ + Σ_{i=1}^n uᵢ(Σ_{j=1}^k aⱼᵢYⱼ)}] = exp{Σᵢ uᵢmᵢ + v/2},

where, interchanging the summations and using C = AᵗA,

v = Var[Σ_{j=1}^k (Σ_{i=1}^n uᵢaⱼᵢ) Yⱼ] = Σ_{j=1}^k (Σ_{i=1}^n uᵢaⱼᵢ)²
  = Σ_{j=1}^k (Σ_{i=1}^n uᵢaᵗᵢⱼ)(Σ_{ℓ=1}^n uℓaⱼℓ) = Σᵢ Σⱼ uᵢuⱼcᵢⱼ.
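The representation (5.8) is also the standard recipe for sampling a multivariate normal vector. A short Python sketch of ours (the covariance C is arbitrary) uses a Cholesky factor so that C = AᵗA:

```python
import numpy as np

rng = np.random.default_rng(5)
C = np.array([[1.0, 0.6, 0.2],
              [0.6, 2.0, 0.5],
              [0.2, 0.5, 1.5]])        # arbitrary positive definite covariance
m = np.array([1.0, -2.0, 0.5])

A = np.linalg.cholesky(C).T            # C = L L^t, so A = L^t gives C = A^t A
Y = rng.normal(size=(1_000_000, 3))    # i.i.d. N(0, 1) row vectors
X = m + Y @ A                          # representation (5.8)

print(X.mean(axis=0))                  # ~ m
print(np.cov(X, rowvar=False))         # ~ C
```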

A major characteristic of a Brownian motion is that its finite-dimensional distributions are multivariate normal. Other stochastic processes with this property are as follows.

Definition 4. A stochastic process X = {X(t) : t ∈ R+} is a Gaussian process if (X(t1), . . . , X(tn)) has a multivariate normal distribution for any t1, . . . , tn in R+. Discrete-time Gaussian processes are defined similarly.

A process X is Gaussian, of course, if and only if Σ_{i=1}^n uᵢX(tᵢ) has a normal distribution for any t1, . . . , tn in R+ and u1, . . . , un in R. A Gaussian process X is called nondegenerate if its covariance matrix cᵢⱼ = Cov(X(tᵢ), X(tⱼ)) has rank n, for any t1, . . . , tn in R+. In that case, (X(t1), . . . , X(tn)) has a multivariate normal density as in (5.7).

The next result establishes the existence of Gaussian processes. It also shows that the distribution of a Gaussian process is determined by its mean and covariance functions. So two Gaussian processes are equal in distribution if and only if their mean and covariance functions are equal. Let c(s, t) be a real-valued function on R²₊ that satisfies the following properties:

c(s, t) = c(t, s),   s, t ∈ R+.   (Symmetric)

For any finite set I ⊂ R+ and uₜ ∈ R,

Σ_{t∈I} Σ_{s∈I} uₛuₜc(s, t) ≥ 0.   (Nonnegative-definite)

Theorem 5. For any real-valued function m(t) and the function c(s, t) described above, there exists a Gaussian process {X(t) : t ∈ R+} defined on a probability space (Ω, F, P) = (R^{R+}, B^{R+}, P), with E[X(t)] = m(t) and

Cov(X(s), X(t)) = c(s, t),   s, t ∈ R+.

Furthermore, the distribution of this process is determined by the functions m(t) and c(s, t).


Proof. We begin by defining finite-dimensional probability measures μI for a process on (R^{R+}, B^{R+}, P). For any finite subset I in R+, let μI be the probability measure specified by μI(×_{t∈I} At), At ∈ B, t ∈ I, that has the joint normal moment generating function

G(uI) = exp{∑_{t∈I} ut m(t) + (1/2) ∑_{s∈I} ∑_{t∈I} us ut c(s, t)},  uI = (ut : t ∈ I).

Note that for I ⊆ J ,

G(uJ) = G(uI), if ut = 0, t ∈ J\I.

Consequently, the joint normal distributions μI satisfy the consistency condition that, for any I ⊆ J with J finite and At ∈ B, t ∈ J,

μJ(×_{t∈J} At) = μI(×_{t∈I} At), if At = R for t ∈ J\I.  (5.9)

Then it follows by Kolmogorov's extension theorem (Theorem 5 in the Appendix) that there exists a stochastic process {X(t) : t ∈ R+} defined on the probability space (Ω, F, P) = (R^{R+}, B^{R+}, P), whose finite-dimensional probability measures are given by μI. Since the μI are determined by m(t) and c(s, t), so is the distribution of X. Moreover, from the moment generating function for the μI, it follows that

E[X(t)] = m(t), Cov(X(s), X(t)) = c(s, t).
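To make the construction concrete — a hypothetical numerical sketch, not from the text — a Gaussian process can be realized on any finite grid t1, . . . , tn by sampling a multivariate normal vector with mean (m(ti)) and covariance matrix (c(ti, tj)); below this is done with the Brownian functions m(t) = 0 and c(s, t) = s ∧ t.

    import numpy as np

    def gaussian_process_on_grid(ts, m, c, rng=np.random.default_rng(1)):
        """Sample (X(t1), ..., X(tn)) from mean function m and covariance c."""
        mean = np.array([m(t) for t in ts])
        C = np.array([[c(s, t) for t in ts] for s in ts])
        return rng.multivariate_normal(mean, C)

    ts = np.linspace(0.01, 1.0, 200)          # positive times keep C nonsingular
    path = gaussian_process_on_grid(ts, lambda t: 0.0, min)   # c(s,t) = min(s,t)
    print(path[:5])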

Brownian motion is a quintessential example of a Gaussian process.

Proposition 6. A Brownian motion with drift X(t) = μt + σB(t), t ≥ 0, is a Gaussian process with continuous sample paths a.s. starting at X(0) = 0, and its mean and covariance functions are

E[X(t)] = μt,  Cov(X(s), X(t)) = σ²(s ∧ t),  s, t ∈ R+.

Proof. For any 0 = t0 < t1 < · · · < tn, letting Yi = X(ti) − X(ti−1), we have

∑_{i=1}^n ui X(ti) = ∑_{i=1}^n ui ∑_{k=1}^i Yk = ∑_{k=1}^n (∑_{i=k}^n ui) Yk,  u1, . . . , un ∈ R.

Now the increments Yi are independent, normally distributed random variables with mean μ(ti − ti−1) and variance σ²(ti − ti−1). Then the last double-sum term has a normal distribution, and so (X(t1), . . . , X(tn)) has a multivariate normal distribution. Hence X is a Gaussian process, and its mean and covariance functions are clearly as shown.

The preceding characterization is useful for verifying that a process is a Brownian motion, especially when the multivariate normality condition is easy to verify (as in Exercise 2). There are other interesting Gaussian processes that do not have stationary independent increments; see Example 8 below and Exercise 10.

One approach for establishing the existence of a Brownian motion is to construct it as a Gaussian process as follows.

Theorem 7. There exists a stochastic process {B(t) : t ≥ 0} defined on a probability space (Ω, F, P) = (R^{R+}, B^{R+}, P) such that B is a Brownian motion.

Sketch of Proof. Let {B(t) : t ≥ 0} be a Gaussian process as constructed in the proof of Theorem 5 with the special Brownian functions m(t) = 0 and c(s, t) = s ∧ t. A major result (whose proof is omitted) says that this process has stationary independent increments, and B(t) − B(s), for s < t, is normally distributed with mean 0 and variance t − s. A second step is needed, however, to justify that such a process has continuous sample paths.

Since B(t) − B(s) d= (t − s)^{1/2} B(1), for s < t, the process satisfies

E[|B(t) − B(s)|^a] = (t − s)^{a/2} E[|B(1)|^a] < ∞,  a > 0.

Using this property, another major result shows that B can be chosen so that its sample paths are continuous, and hence it is a Brownian motion. The results that complete the preceding two steps are proved in [64].

A Brownian motion is an example of a Markov process with continuous paths that is a Gaussian process. Are there Markov processes with continuous paths (i.e., diffusion processes), other than Brownian motions, that are Gaussian? Yes there are — here is an important example.

Example 8. An Ornstein-Uhlenbeck Process is a stationary Gaussian process {X(t) : t ≥ 0} with continuous sample paths whose mean function is 0 and whose covariance function is

Cov(X(s), X(t)) = (σ²/(2α)) e^{−α|s−t|},  s, t ≥ 0,

where α and σ are positive. This process, as proved in [61], is the only stationary Gaussian process with a continuous covariance function that is a Markov process. (Exercise 9 shows that a Gaussian process X is stationary if and only if its mean function is a constant and its covariance function Cov(X(s), X(t)) only depends on |t − s|.)

It is interesting that the process X is also a function of a Brownian motion B in that X is equal in distribution to the process

Y(t) = σ e^{−αt} B(e^{2αt}/(2α)),  t ≥ 0.

To see this, note that Y has continuous sample paths and clearly it is Gaussian since B is. In addition, E[Y(t)] = 0 for each t and, for s < t,


Cov(Y(s), Y(t)) = σ² e^{−α(s+t)} E[B(e^{2αs}/(2α)) B(e^{2αt}/(2α))] = (σ²/(2α)) e^{−α(t−s)}.

Consequently, Y is a stationary Gaussian process and it has the same mean and covariance functions as X. Hence Y d= X.

The Ornstein-Uhlenbeck process X defined above satisfies the stochastic differential equation

dX(t) = −αX(t) dt + σ dB(t).  (5.10)

The example above assumes that X(0) has a normal distribution with mean 0 and variance σ²/(2α). The solution of this equation is

X(t) = X(0) e^{−αt} + σ ∫₀ᵗ e^{−α(t−s)} dB(s).

The stochastic differential equation and the integral with respect to Brownian motion, which are beyond the scope of this work, are discussed in [61, 64].
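Numerically — a minimal sketch under the stationarity assumption above, not part of the text — the solution implies the exact recursion X(t + h) = e^{−αh} X(t) + ξ, where ξ is normal with mean 0 and variance (σ²/(2α))(1 − e^{−2αh}); this simulates a stationary OU path without discretization error:

    import numpy as np

    def ou_path(alpha, sigma, h, n, rng=np.random.default_rng(2)):
        var_stat = sigma**2 / (2 * alpha)           # stationary variance
        rho = np.exp(-alpha * h)
        step_sd = np.sqrt(var_stat * (1 - rho**2))  # innovation std deviation
        x = np.empty(n)
        x[0] = rng.normal(0.0, np.sqrt(var_stat))   # X(0) ~ N(0, sigma^2/(2 alpha))
        for i in range(1, n):
            x[i] = rho * x[i - 1] + rng.normal(0.0, step_sd)
        return x

    x = ou_path(alpha=1.0, sigma=1.0, h=0.01, n=200_000)
    lag = 50                                        # lag 50 * h = 0.5
    print(x[:-lag] @ x[lag:] / (len(x) - lag))      # sample autocovariance
    print(0.5 * np.exp(-0.5))                       # (sigma^2/(2 alpha)) e^{-0.5 alpha}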

Scientists realized that the Brownian motion representation for a particle moving in a medium was an idealized model in that it does not account for friction in the medium. To incorporate friction in the model, Langevin 1908 proposed that the Ornstein-Uhlenbeck process X could represent the velocity of a particle undergoing a Brownian motion subject to friction. He assumed that the rate of change in the velocity satisfies (5.10), where −αX(t)dt models the change due to friction; the friction works in the opposite direction to the velocity and α is the coefficient of friction divided by the mass of the particle. The stochastic process for this model was formalized later by Ornstein and Uhlenbeck 1930 and Doob 1942.

5.3 Maximum Process and Hitting Times

For a Brownian motion B, its cumulative maximum process is

M(t) = max_{s≤t} B(s),  t ≥ 0.

This process is related to the hitting times

τa = inf{t > 0 : B(t) = a},  a ∈ R.

Namely, for each a ≥ 0 and t,

{τa ≤ t} = {M(t) ≥ a}.  (5.11)


In other words, the distribution of the maximum process is determined by that of the hitting times and vice versa. This section presents expressions for these distributions and an important property of the hitting times.

We begin with a preliminary fact.

Remark 9. The hitting time τa is an a.s. finite stopping time of B.

The τa is a stopping time since B has continuous paths a.s. Its finiteness follows by Theorem 32 below, which is proved by the martingale optional stopping theorem in the next section. The finiteness also follows by the consequence (5.30) of the law of the iterated logarithm in Theorem 38 below.

The first result is a reflection principle that an increment B(t) − B(τ) after a stopping time τ has the same distribution as the reflected increment −(B(t) − B(τ)). This is basically the symmetry property B d= −B in Example 2 manifested at the stopping time τ. A version of this principle for stochastic processes is in Exercises 20 and 21.

Proposition 10. (Reflection Principle) If τ is an a.s. finite stopping time of B, then, for any a and t,

P{B(t) − B(τ) ≤ a, τ ≤ t} = P{B(t) − B(τ) ≥ −a, τ ≤ t}.

Proof. Letting B′(t) = B(τ + t) −B(τ), t ≥ 0, we can write

B(t) −B(τ) = B′(t− τ), for τ ≤ t. (5.12)

By the strong Markov property in Theorem 3, B′ is a Brownian motion independent of Fτ. Using this and (5.12) along with the symmetry property B′ d= −B′ and {τ ≤ t} ∈ Fτ,

P{B(t) − B(τ) ≤ a, τ ≤ t} = E[P{B′(t − τ) ≤ a | τ ≤ t, Fτ}] P{τ ≤ t}
 = E[P{−B′(t − τ) ≤ a | τ ≤ t, Fτ}] P{τ ≤ t}
 = P{B′(t − τ) ≥ −a, τ ≤ t}.

Then using (5.12) in the last probability completes the proof.

We will now apply the reflection principle to obtain an expression for the joint distribution of B(t) and M(t).

Theorem 11. For x < y and y ≥ 0,

P{B(t) ≤ x, M(t) ≥ y} = P{B(t) ≥ 2y − x},  (5.13)

P{M(t) ≥ y} = 2P{B(t) ≥ y}.

Furthermore, M(t) d= |B(t)| for each t, and the density of M(t) is


fM(t)(x) = (2/√(2πt)) e^{−x²/(2t)},  x ≥ 0.

Hence

E[M(t)] = √(2t/π),  Var[M(t)] = (1 − 2/π)t.  (5.14)

Proof. Assertion (5.13) follows since by (5.11) and Proposition 10 with τ = τy, B(τ) = y, and a = x − y, we have, for x ≤ y and y ≥ 0,

P{B(t) ≤ x, M(t) ≥ y} = P{B(t) ≤ x, τy ≤ t}
 = P{B(t) ≥ 2y − x, τy ≤ t}
 = P{B(t) ≥ 2y − x}.

The last equality is because 2y − x ≥ y and

{B(t) ≥ 2y − x} ⊆ {B(t) ≥ y} ⊆ {τy ≤ t}.

Next, using what we just proved with x = y, we have

P{M(t) ≥ y} = P{B(t) ≤ y, M(t) ≥ y} + P{B(t) ≥ y, M(t) ≥ y} = 2P{B(t) ≥ y}.

Taking the derivative of this with respect to y yields the density of M(t). In addition, 2P{B(t) ≥ y} = P{|B(t)| ≥ y} implies M(t) d= |B(t)|. Exercise 11 proves (5.14).
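The identity P{M(t) ≥ y} = 2P{B(t) ≥ y} is easy to check by simulation — a rough sketch, not from the text, in which the discretized maximum is biased slightly low:

    import numpy as np
    from math import erf, sqrt

    def Phi(x):                                  # standard normal cdf
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    rng = np.random.default_rng(3)
    t, y, steps, batches, per = 1.0, 1.0, 1000, 50, 1000
    hits = 0
    for _ in range(batches):                     # 50,000 paths in batches
        dB = rng.normal(0.0, sqrt(t / steps), size=(per, steps))
        hits += (np.cumsum(dB, axis=1).max(axis=1) >= y).sum()
    print(hits / (batches * per))                # simulated P{M(t) >= y}
    print(2 * (1 - Phi(y / sqrt(t))))            # theory: about 0.3173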

Even though M(t) d= |B(t)| for each t, the processes M and |B| are not equal in distribution; M is nondecreasing while |B| is not. Exercise 19 points out the interesting equality in distribution M d= M − B for the processes.

Because of the reflection property −B d= B of a Brownian motion B, its minima are reflections of its maxima.

Remark 12. The minimum process for B is

m(t) = min_{s≤t} B(s),  t ≥ 0.

It is related to the maximum process M by m d= −M. Hence

P{m(t) ≤ a} = 2P{B(t) ≥ −a},  a < 0.

That m d= −M follows by m(t) = −max_{s≤t} (−B(s)) and the reflection property −B d= B. Also, Theorem 11 yields the distribution of m(t).

We will now obtain the distribution of the hitting time τa from Theorem 11. This result is also a special case of Theorem 32 for hitting times for a Brownian motion with drift, which also contains the Laplace transform of τa and shows that E[τa] = ∞.


Corollary 13. For any a ≥ 0,

P{τa ≤ t} = 2[1 − Φ(a/√t)],  t > 0,

where Φ is the standard normal distribution. Hence, the density of τa is

fτa(t) = (a/√(2πt³)) e^{−a²/(2t)},  t > 0.

Proof. From (5.11), Theorem 11, and B(t) d= √t B(1), we have

P{τa ≤ t} = P{M(t) ≥ a} = 2P{B(t) ≥ a} = 2[1 − Φ(a/√t)].

Taking the derivative of this yields the density of τa.

The family of hitting times {τa : a ≥ 0} for B is an important process in its own right. It is the non-decreasing left-continuous inverse process of the maximum process M since

τa = inf{t : B(t) = a} = inf{t : M(t) = a}.

By Corollary 13, we know the density of τa and that E[τa] = ∞. Here is more information about these hitting times.

Proposition 14. The process {τa : a ≥ 0} has stationary independent increments and, for a < b, the increment τb − τa is independent of Fτa and it is equal in distribution to τb−a.

Proof. Since τb − τa = inf{t : B(τa + t) − B(τa) = b − a}, it follows by the strong Markov property at τa that τb − τa is independent of Fτa and it is equal in distribution to τb−a. Also, it follows by an induction argument that {τa : a ≥ 0} has stationary independent increments.

5.4 Special Random Times

In this section, we derive arc sine and arc cosine probabilities for certain random times of Brownian motion by applying properties of the maximum process.

We first consider two random times associated with a Brownian motion B on the interval [0, 1] and its maximum process M(t) = max_{s≤t} B(s). These times have the same arc sine distribution, which is discussed in Exercise 14.

Theorem 15. (Lévy Arc Sine Law) For a Brownian motion B on [0, 1], the time τ = inf{t ∈ [0, 1] : B(t) = M(1)} has the arc sine distribution


P{τ ≤ t} = (2/π) arcsin √t,  t ∈ [0, 1].  (5.15)

In addition, the time τ′ = sup{t ∈ [0, 1] : B(t) = 0} has the same distribution.

Proof. First note that, for t ≤ 1,

τ ≤ t ⟺ max_{s≤t} [B(s) − B(t)] ≥ max_{t≤s≤1} [B(s) − B(t)].

Denote the last inequality as Y1 ≥ Y2 and note that these random variables are independent since B has independent increments. Now, by the translation and symmetry properties of B and Theorem 11,

Y1 d= M(t) d= |B(t)| d= t^{1/2} |Z1|,
Y2 d= M(1 − t) d= |B(1 − t)| d= (1 − t)^{1/2} |Z2|,

where Z1 and Z2 are normal random variables with mean 0 and variance 1. From these observations, we have

P{τ ≤ t} = P{Y1 ≥ Y2} = P{t Z1² ≥ (1 − t) Z2²} = P{Z2²/(Z1² + Z2²) ≤ t},  (5.16)

where we may take Z1 and Z2 to be independent. Then (5.15) follows since the last probability, due to the symmetry property of the normal distribution, is the arc sine distribution by Exercise 14.

Next, note that by Remark 12 on the minimum process, we have

P{τ′ < t} = P{max_{t≤s≤1} B(s) < 0} + P{min_{t≤s≤1} B(s) > 0} = 2P{Y < −B(t)},

where Y = max_{t≤s≤1} B(s) − B(t) is independent of B(t). By the symmetry of B and Theorem 11, we have

−B(t) d= B(t) d= t^{1/2} Z1,
Y d= M(1 − t) d= |B(1 − t)| d= (1 − t)^{1/2} |Z2|,

where Z1 and Z2 are normal random variables with mean 0 and variance 1. Assuming Z1 and Z2 are independent, the preceding observations and (5.16) yield

P{τ′ < t} = 2P{(1 − t)^{1/2} |Z2| < t^{1/2} Z1} = P{(1 − t) Z2² < t Z1²} = P{τ ≤ t}.

This proves that τ ′ also has the arc sine distribution.
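A quick Monte Carlo check of (5.15) — an illustrative sketch, not part of the text — simulates the time of the maximum of a discretized Brownian path on [0, 1] and compares its empirical distribution with (2/π) arcsin √t:

    import numpy as np
    from math import asin, pi, sqrt

    rng = np.random.default_rng(4)
    n, steps = 20_000, 1000
    tau = np.empty(n)
    for i in range(n):                        # time of the max on [0, 1]
        B = np.cumsum(rng.normal(0.0, sqrt(1.0 / steps), steps))
        tau[i] = (B.argmax() + 1) / steps
    for t in (0.1, 0.5, 0.9):
        print(t, (tau <= t).mean(), 2 / pi * asin(sqrt(t)))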


Next, we consider the event that a Brownian motion returns to the origin 0 in a future time interval.

Theorem 16. The event A that a Brownian motion B hits 0 in a time interval [t, u] has the probability

P(A) = (2/π) arccos √(t/u),  where 0 < t < u.  (5.17)

Proof. For t < u and u = 1, it follows by Theorem 15 that

P(A) = P{τ′ ≥ t} = 1 − (2/π) arcsin √t = (2/π) arccos √t.

The proof for u ≠ 1 is Exercise 15.

5.5 Martingales

A martingale is a real-valued stochastic process defined by the property that the conditional mean of an "increment" of the process conditioned on past information is 0. A random walk and a Brownian motion whose mean step sizes are 0 have this property. However, the increments of a martingale are generally dependent, unlike the independent increments of a random walk or Brownian motion. Martingales are used for proving convergence theorems, analyzing hitting times of processes, finding optimal stopping rules, and providing bounds for processes. Moreover, they are key tools in the theory of stochastic differential equations.

In this section, we introduce martingales and discuss several examples associated with Brownian motion and compound Poisson processes. We also present the important submartingale convergence theorem. The next two sections cover the optional stopping theorem for martingales and its applications to Brownian motion.

Throughout this section, X = {X(t) : t ≥ 0} will denote a real-valued continuous-time stochastic process that has right-continuous paths and E[|X(t)|] < ∞, t ≥ 0. Associated with the underlying probability space (Ω, F, P) for the process X, there is a filtration {Ft : t ≥ 0}, which is a family of σ-fields contained in F that is increasing (Fs ⊆ Ft, s ≤ t) and right-continuous (Ft = ∩_{u>t} Fu), and F0 contains all events with P-probability 0. Furthermore, the process X is adapted to the filtration Ft in that {X(t) ≤ x} ∈ Ft, for each t and x.

Definition 17. The process X is a martingale with respect to Ft if

E[X(t)|Fs] = X(s) a.s.,  0 ≤ s < t.  (5.18)

The process X is a submartingale if


E[X(t)|Fs] ≥ X(s) a.s. 0 ≤ s < t.

If the inequality is reversed, then X is a supermartingale.

Taking the expectation of (5.18) yields the characteristic of a martingale that

E[X(t)] = E[X(s)], s ≤ t.

The martingale condition (5.18) is equivalent to

E[X(t) −X(s)|Fs] = 0,

which says that the conditional mean of an increment conditioned on the past is 0.

A classic illustration of a martingale is the value X(t) of an investment (or the fortune of a gambler) at time t in a marketplace described by the events in Ft. The martingale property (5.18) says that the investment is subject to a "fair market" in that its expected value at any time t conditioned on the environment Fs up to some time s < t is the same as the value X(s).

On the other hand, the submartingale property implies that the market is biased toward "upward" movements of the value X, resulting in E[X(t)|Fs] ≥ X(s) a.s., for s ≤ t. Similarly, the supermartingale property implies "downward" movements resulting in E[X(t)|Fs] ≤ X(s) a.s.

In typical applications, Ft = Ft^Y, which is the σ-field generated by the events of a right-continuous process {Y(s) : s ≤ t} on a general state space. In this setting, we say that X is a martingale with respect to Y. In some instances, it is natural that X is a martingale with respect to the filtration Ft = Ft^X of its own history.

Martingales in discrete time are defined similarly. In particular, real-valued random variables Xn with E[|Xn|] < ∞ form a martingale with respect to increasing σ-fields Fn if

E[Xn+1|Fn] = Xn,  n ≥ 0.

The Xn is a submartingale or supermartingale if the equality is replaced by ≥ or ≤, respectively. Standard examples are sums and products of independent random variables; see Exercise 30.

Note that a Brownian motion B is a martingale with respect to itself since, for s ≤ t,

E[B(t)|Fs^B] = E[B(t) − B(s)|Fs^B] + B(s) = B(s).

Similarly, if X(t) = x + μt + σB(t) is a Brownian motion with drift, then

E[X(t)|Fs^B] = μ(t − s) + X(s),  s ≤ t.


Therefore, X is a martingale, submartingale, or supermartingale with respect to B according as μ is = 0, > 0, or < 0.

We will also encounter several functions of Brownian motion that are martingales of the following type.

Example 18. Martingales for Processes with Stationary Independent Increments. Suppose that Y is a real-valued process that has stationary independent increments and, for simplicity, assume that Y(0) = 0. Suppose that the moment generating function ψ(α) = E[e^{αY(1)}] exists for α in a neighborhood of 0, and that E[e^{αY(t)}] as a function of t is continuous at 0, for fixed α.

Then by Exercise 7, E[e^{αY(t)}] = ψ(α)^t and

E[Y(t)] = at,  Var[Y(t)] = bt,

where a = E[Y(1)] and b = Var[Y(1)]. For instance, Y may be a Brownian motion with drift, a Poisson process or a compound Poisson process.

An easy check shows that two martingales with respect to Y are

Y(t) − at, and (Y(t) − at)² − bt,  t ≥ 0.

The means of these martingales are 0. Next, consider the process

Z(t) = e^{αY(t)}/ψ(α)^t,  t ≥ 0.

Clearly Z(t) is a deterministic, nonnegative function of {Y(s) : s ≤ t}, and E[Z(t)] = 1. Then Z is a martingale (sometimes called an exponential martingale) with respect to Y. Indeed,

E[Z(t)|Fs^Y] = Z(s) E[e^{α(Y(t)−Y(s))}|Fs^Y]/ψ(α)^{t−s} = Z(s).

Example 19. Martingales for Brownian Motion. For a Brownian motion with drift Y(t) = x + μt + σB(t), the preceding example justifies that the following functions of Y are martingales with respect to B:

(Y(t) − x − μt)² − σ²t,  e^{c[Y(t)−x−μt] − c²σ²t/2},  t ≥ 0, c ≠ 0.

In particular, B(t)² − t and e^{cB(t)−c²t/2} are martingales with respect to B.
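As a sanity check on the exponential martingale — a small sketch, not from the text — one can verify numerically that E[e^{cB(t) − c²t/2}] = 1 and E[B(t)² − t] = 0 for fixed c and t:

    import numpy as np

    rng = np.random.default_rng(5)
    c, t = 1.5, 2.0
    B_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)   # B(t) ~ N(0, t)
    print(np.mean(np.exp(c * B_t - c**2 * t / 2)))      # about 1
    print(np.mean(B_t**2 - t))                          # about 0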

Having a constant mean suggests that a martingale should also have a nice limiting behavior. According to the next major theorem, many submartingales as well as martingales converge a.s. to a limit. This result for discrete-time processes also holds in continuous time.

Theorem 20. (Submartingale Convergence) If Xn is a martingale, or a submartingale that satisfies sup_n E[Xn⁺] < ∞, then there exists a random variable X with E[|X|] < ∞ such that Xn → X a.s. as n → ∞.


This convergence can be viewed as an extension of the fact that a nondecreasing sequence of real numbers that is bounded converges to a finite limit. For a submartingale, the nondecreasing tendency is E[Xn+1|Fn] ≥ Xn, and a bound on E[Xn⁺] is enough to ensure convergence a.s. — the submartingale itself need not be nondecreasing.

The theorem establishes the existence of the limit X, but it does not specify its distribution. Properties of X can sometimes be derived in specific cases depending on characteristics of Xn.

In addition to the convergence Xn → X a.s., it follows by Theorem 15 in the Appendix that E[|Xn − X|] → 0 as n → ∞ when the Xn are uniformly integrable.

Although Theorem 20 is a major result, we will only give the following example since it is not needed for our results. For a proof and other applications, see for instance [37, 61, 62, 64].

Example 21. Doob Martingale. Let Z be a random variable with E[|Z|] < ∞, and let Fn be an increasing filtration on the underlying probability space for Z. The conditional expectation

Xn = E[Z |Fn], n ≥ 1,

is a martingale with respect to Fn. Then by Theorem 20,

X = lim_{n→∞} E[Z|Fn] exists a.s.

That Xn is a martingale follows since

E[|Xn|] ≤ E[E[|Z| |Fn]] = E[|Z|] < ∞,

E[Xn+1|Fn] = E[E[Z|Fn+1]|Fn] = E[Z|Fn] = Xn.

Consider the case Xn = E[Z|Y1, . . . , Yn] in which Xn is a martingale with respect to Yn. For instance, Xn could be an estimate for the mean of Z based on observations Y1, . . . , Yn associated with Z. By an additional argument it follows that the limit of Xn is X = E[Z|Y1, Y2, . . .]. Therefore,

E[Z|Y1, . . . , Yn] → E[Z|Y1, Y2, . . .] a.s. as n → ∞.

In particular, if Z is the indicator of an event A in the σ-field generated by Y1, Y2, . . ., then

P(A|Y1, . . . , Yn) → 1(A) a.s. as n → ∞.


5.6 Optional Stopping of Martingales

This section presents the optional stopping theorem for martingales. It was instrumental in the proof of the strong Markov property for Brownian motion in Theorem 3; the next section uses it to analyze hitting times of Brownian motion.

For the following discussion, suppose that X is a martingale with respect to a filtration Ft, and that τ is a stopping time of the filtration: {τ ≤ t} ∈ Ft, for each t.

We will now address the following questions: Is the martingale property E[X(t)] = E[X(0)] also true when t is a stopping time? More generally, is E[X(σ)] = E[X(τ)] true for any stopping times σ and τ?

The optional stopping theorem below says that E[X(τ)] = E[X(0)] is indeed true for a bounded stopping time τ. A corollary is that the equality is also true for a finite stopping time when X satisfies certain bounds. This would imply, for instance, that the expected value of an investment in a fair market at the stopping time is the same as the initial value. In other words, in this fair market, there would be no benefit for the investor to choose to stop and freeze his investment at a time depending only on the past history of the market (independent of the future).

There are several optional stopping theorems with slightly different assumptions. For our purposes, we will use the following version from [61]. Its discrete-time version is Theorem 28 below.

Theorem 22. Associated with the martingale X, assume that σ and τ are stopping times of Ft such that τ is bounded a.s. Then

X(σ ∧ τ) = E[X(τ)|Fσ ] a.s. (5.19)

Hence E[X(τ)] = E[X(0)]. If σ is also bounded, then E[X(σ)] = E[X(τ)].

Proof. Our proof for this continuous-time setting uses an approximation based on the analogous discrete-time result in Theorem 28 below. For a fixed n ∈ Z+, let Xk = X(k2^{−n}), k ∈ Z+. Clearly Xk is a discrete-time martingale with respect to Fk = F_{k2^{−n}}. Define

σn = (⌊2^n σ⌋ + 1)/2^n,  τn = (⌊2^n τ⌋ + 1)/2^n.

Now σ′m = 2^n σm for fixed m, and τ′n = 2^n τn are integer-valued stopping times of Fk. Then by Theorem 28 below,

X_{σ′m ∧ τ′n} = E[X_{τ′n}|F_{σ′m}] a.s.

This expression in terms of the preceding definitions is

X(σm ∧ τn) = E[X(τn)|F_{σm}] a.s.


Letting m → ∞ in this expression results in σm → σ and

X(σ ∧ τn) = E[X(τn)|Fσ] a.s.

Then letting n → ∞ in this expression yields (5.19). The justifications of the last two limit statements, which are in [61], will not be covered here.

The assertion E[X(0)] = E[X(τ)] follows by taking expectations in (5.19) with σ = 0. Finally, when σ as well as τ is bounded, then (5.19) and this expression with the roles of σ and τ reversed yield

E[X(τ)|Fσ] = X(σ ∧ τ) = E[X(σ)|Fτ] a.s.

Then expectations of these terms give E[X(σ)] = E[X(τ)].

Theorem 22 can also be extended to stopping times that are a.s. finite, but not necessarily bounded. To see this, suppose that τ is an a.s. finite stopping time of Ft. The key idea is that, for fixed s and t, the τ ∧ s and τ ∧ t are a.s. bounded stopping times of Ft. Then by Theorem 22,

X(τ ∧ s) = E[X(τ ∧ t)|F_{τ∧s}],  s < t.

This property justifies the following fact, which is used in the proof below.

Remark 23. Stopped Martingales. The stopped process X(τ ∧ t) is a martingale with respect to Ft.

Corollary 24. Associated with the martingale X, suppose that τ is an a.s. finite stopping time of Ft, and that either one of the following conditions is satisfied:
(i) E[sup_{t≤τ} |X(t)|] < ∞.
(ii) E[|X(τ)|] < ∞, and lim_{u→∞} E[|X(u)| 1(τ > u)] = 0.
Then E[X(τ)] = E[X(0)].

Proof. Since X(τ ∧ t) is a martingale with respect to Ft, Theorem 22 implies E[X(τ ∧ u)] = E[X(0)], for u > 0. Now, we can write

|E[X(τ)] − E[X(0)]| = |E[X(τ)] − E[X(τ ∧ u)]| ≤ E[|X(τ) − X(u)| 1(τ > u)].

If (i) holds, then |X(τ) − X(u)| ≤ 2Z, where Z = sup_{t≤τ} |X(t)|. Since τ is finite a.s., 1(τ > u) → 0 a.s. as u → ∞, and so by the dominated convergence theorem,

|E[X(τ)] − E[X(0)]| ≤ 2E[Z 1(τ > u)] → 0.

On the other hand, if (ii) holds, then

|E[X(τ)] − E[X(0)]| ≤ E[(|X(τ)| + |X(u)|) 1(τ > u)] → 0.

Thus, E[X(τ)] = E[X(0)] if either (i) or (ii) is satisfied.


The next proposition and example illustrate computations involving optional stopping.

Proposition 25. (Wald Identities) Let X be a process with stationary independent increments as in Example 18, with E[|X(1)|] < ∞ and ψ(α) = E[e^{αX(1)}]. Suppose τ is an a.s. finite stopping time of X. Then

E[X(τ)] = E[X(1)] E[τ].

If in addition τ is bounded or E[sup_{t≤τ} |X(t)|] < ∞, then

E[e^{αX(τ)} ψ(α)^{−τ}] = 1,  for any α with ψ(α) ≥ 1.  (5.20)

Proof. Example 18 establishes that X(t) − E[X(1)]t is a martingale with respect to X. Now, τ ∧ t is a bounded stopping time of X, and so by the optional stopping theorem, E[X(τ ∧ t) − E[X(1)](τ ∧ t)] = 0. Letting t → ∞ in this expression, the dominated and monotone convergence theorems yield E[X(τ)] = E[X(1)] E[τ].

Similarly, Z(t) = e^{αX(t)} ψ(α)^{−t} is a martingale with respect to X, and under the assumptions the optional stopping theorem or Corollary 24 imply that E[Z(τ)] = E[Z(0)] = 1, which gives (5.20).

Example 26. Brownian Optional Stopping. For a Brownian motion B, we know by Example 19 that the processes B(t) and B(t)² − t are martingales with respect to B with zero means. Then as in the preceding proposition, we have the following result.

If τ is an a.s. finite stopping time of B, then E[B(τ)] = 0. In addition, E[τ] = E[B(τ)²] if τ is bounded a.s.

Example 19 also noted that X(t) = e^{cB(t)−c²t/2} is a martingale with respect to B with mean 1. If τ is an a.s. bounded stopping time of B, then E[X(τ)] = 1 by the optional stopping theorem. Consequently, the conditional moment generating function for an increment of B following τ is

E[e^{c[B(τ+u)−B(τ)]}|Fτ] = e^{c²u/2} = E[e^{cB(u)}].

This was the key step in proving the strong Markov property of B for bounded stopping times.

The rest of this section is devoted to proving the discrete-time optional stopping theorem used in the proof of Theorem 22. We begin with a preliminary result.

Proposition 27. Let X and Y be random variables on a probability space, and let F and G be two σ-fields on the space. Suppose there is an event A ∈ F ∩ G such that X = Y a.s. on A and F = G on A (A ∩ F = A ∩ G). Then E[X|F] = E[Y|G] a.s. on A.


Proof. Let Z = E[X|F] − E[Y|G] and C = A ∩ {Z > 0}. Under the hypotheses, C ∈ F ∩ G and

E[Z 1_C] = E[E[X|F] 1_C − E[Y|G] 1_C] = E[X 1_C − Y 1_C] = 0.

Here 1_C is the random variable 1(ω ∈ C). Because a nonnegative random variable V is 0 a.s. if and only if E[V] = 0, it follows that Z 1_C = 0 a.s., which implies Z ≤ 0 a.s. on A. A similar argument with C = A ∩ {Z < 0} shows Z ≥ 0 a.s. on A. This proves the assertion.

Theorem 28. Suppose that {Xn : n ∈ Z+} is a martingale with respect to Fn, and that σ and τ are stopping times of Fn such that τ is bounded a.s. Then

X_{σ∧τ} = E[Xτ|Fσ] a.s.

Hence E[Xτ] = E[X0]. If σ is also bounded, then E[Xσ] = E[Xτ].

Proof. For m ≤ n, one can show that Fτ = Fm on {τ = m}. Then by Proposition 27 and the martingale property,

E[Xn|Fτ] = E[Xn|Fm] = Xm = Xτ, a.s. on {τ = m}.

Since this is true for each m ≤ n, we have

E[Xn|Fτ] = Xτ, a.s. if τ ≤ n a.s.  (5.21)

Now, consider the case σ ≤ τ ≤ n a.s. Then Fσ ⊆ Fτ. Using this and (5.21) for τ and for σ, we get

E[Xτ|Fσ] = E[E[Xn|Fτ]|Fσ] = E[Xn|Fσ] = Xσ a.s.

In addition, E[Xτ|Fσ] = Xτ a.s. if τ ≤ σ ∧ n.

For the general case, similar reasoning using the preceding two results and Proposition 27 yields

E[Xτ|Fσ] = E[Xτ|F_{σ∧τ}] = X_{σ∧τ} a.s. on {σ ≤ τ},
E[Xτ|Fσ] = E[X_{σ∧τ}|Fσ] = X_{σ∧τ} a.s. on {σ > τ}.

This proves X_{σ∧τ} = E[Xτ|Fσ] a.s. The other assertions follow as in the proof of Theorem 22.

5.7 Hitting Times for Brownian Motion with Drift

We will now address the following questions for a Brownian motion with drift. What is the probability that the process hits b before it hits a, for a < b?


What is the distribution and mean of the time for the process to hit b? We answer these questions by applications of the material in the preceding section on martingales and optional stopping.

Consider a Brownian motion with drift X(t) = x + μt + σB(t), where B is a standard Brownian motion. For a < x < b, let τa and τb denote the times at which X hits the respective states a and b. In addition, let τ = τb ∧ τa, which is the time at which X escapes from the open strip (a, b). Our focus will be on properties of these hitting times.

Remark 29. Finiteness of Hitting Times. If μ ≥ 0, then τb is finite a.s. since, using Remark 9,

τb = inf{t ≥ 0 : X(t) = b} ≤ inf{t ≥ 0 : B(t) = (b − x)/σ} < ∞ a.s.

Similarly, τa is finite a.s. if μ ≤ 0. Also, τ is finite since either τa or τb is necessarily finite.

We begin with a result for a Brownian motion with no drift.

Theorem 30. The probability that the process X(t) = x + σB(t) hits b before a is

P{τb < τa} = (x − a)/(b − a).  (5.22)

Also, E[τ] = (x − a)(b − x)/σ².

Proof. By Example 19, X is a martingale with respect to B with mean x. Also, E[sup_{t≤τ} |X(t)|] is finite since X(t) ∈ (a, b) for t ≤ τ. Then by the optional stopping theorem (Theorem 22) for τ,

E[X(τ)] = E[X(0)] = x.  (5.23)

Now, since τ = τa ∧ τb, we can write

X(τ) = a 1(τa ≤ τb) + b 1(τb < τa).  (5.24)

Then

E[X(τ)] = a[1 − P{τb < τa}] + b P{τb < τa}.

Substituting this in (5.23) and solving for P{τb < τa}, we obtain (5.22).

Next, we know by Example 19 that Z(t) = (X(t) − x)² − σ²t is a martingale with respect to B. Then the optional stopping theorem for the bounded stopping time τ ∧ t yields E[Z(τ ∧ t)] = E[Z(0)] = 0. That is,

σ² E[τ ∧ t] = E[(X(τ ∧ t) − x)²].

Now since τ ∧ t ↑ τ and X(t) is bounded for t ≤ τ, it follows by the monotone and bounded convergence theorems that

σ² E[τ] = E[(X(τ) − x)²].


Then using (5.24) in the last expectation followed by (5.22), we have

σ² E[τ] = (a − x)² P{τa ≤ τb} + (b − x)² P{τb < τa} = (x − a)(b − x).

The preceding result for a Brownian motion with no drift has the following analogue for a Brownian motion X(t) = x + μt + σB(t) with drift μ ≠ 0.

Theorem 31. The probability that the process X hits b before a is

P{τb < τa} = (e^{αx} − e^{αa})/(e^{αb} − e^{αa}),  (5.25)

where α = −2μ/σ². In addition,

E[τ] = μ^{−1}[(a − x) + (b − a) P{τb < τa}].  (5.26)

Proof. As in Example 19, Z(t) = exp{cX(t) − cx − (cμ + c²σ²/2)t} is a martingale with respect to B. Letting c = α, this martingale reduces to Z(t) = e^{αX(t)−αx}.

Now, E[sup_{t≤τ} |Z(t)|] is finite, since X(t) ∈ [a, b] for t ≤ τ. Then by Corollary 24 on optional stopping,

1 = E[Z(0)] = E[Z(τ)] = e^{−αx} E[e^{αX(τ)}].

Now, using X(τ) = a 1(τa ≤ τb) + b 1(τb < τa) in this expression yields

e^{αx} = e^{αa}(1 − P{τb < τa}) + e^{αb} P{τb < τa}.

This proves (5.25).

To determine E[τ], we apply the optional stopping theorem to the martingale B(t) = σ^{−1}[X(t) − x − μt] and the bounded stopping time τ ∧ t to get 0 = E[B(τ ∧ t)]. That is,

μ E[τ ∧ t] + x = E[X(τ ∧ t)].

Letting t → ∞ in this expression, we have τ ∧ t ↑ τ, and so the monotone and bounded convergence theorems yield

μ E[τ] + x = E[X(τ)] = b P{τb < τa} + a(1 − P{τb < τa}).

This proves (5.26).
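Formulas (5.22) and (5.25) are easy to test by simulation — a minimal Euler-scheme sketch, not from the text; the step size dt and sample size are arbitrary illustrative choices, and the scheme has a small discretization bias at the boundaries:

    import numpy as np

    def exit_prob(x, a, b, mu, sigma, n=50_000, dt=1e-3,
                  rng=np.random.default_rng(6)):
        """Estimate P{tau_b < tau_a} for X(t) = x + mu*t + sigma*B(t)."""
        y = np.full(n, x)
        alive = np.ones(n, dtype=bool)
        while alive.any():
            k = int(alive.sum())
            y[alive] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(k)
            alive &= (y > a) & (y < b)
        return (y >= b).mean()

    x, a, b, mu, sigma = 0.0, -1.0, 1.0, 0.5, 1.0
    alpha = -2 * mu / sigma**2                 # as in (5.25)
    exact = (np.exp(alpha*x) - np.exp(alpha*a)) / (np.exp(alpha*b) - np.exp(alpha*a))
    print(exit_prob(x, a, b, mu, sigma), exact)    # both about 0.73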

The last result of this section characterizes the distribution of the hitting time τb for a Brownian motion X(t) = μt + σB(t) with drift μ.

Theorem 32. Let τb denote the time at which the Brownian motion X hits b > 0. If μ ≥ 0, then E[τb] = b/μ (which is ∞ if μ = 0) and the Laplace transform and density of τb are


E[e^{−λτb}] = exp{−bσ^{−2}[√(μ² + 2σ²λ) − μ]},  (5.27)

fτb(x) = (b/(σ√(2πx³))) exp{−(b − μx)²/(2σ²x)},  x > 0.  (5.28)

If μ < 0, then τb may be infinite and P{τb < ∞} = e^{2bμ/σ²}.

Proof. For the case μ < 0, it follows by (5.25) (with x = 0 and α > 0) that

P{τb < ∞} = lim_{a→−∞} P{τb < τa} = e^{2bμ/σ²}.

Next, consider the case μ ≥ 0. For positive constants α and λ, consider the process Z(t) = e^{αX(t)−λt}. This is a martingale (see Example 19) and it reduces to Z(t) = e^{cB(t)−c²t/2}, where

c = ασ,  α = σ^{−2}[√(μ² + 2σ²λ) − μ].

This choice of α ensures that α²σ²/2 + αμ − λ = 0.

Now, applying the optional stopping theorem to the martingale Z(t) and the bounded stopping time τb ∧ t, we obtain

1 = E[Z(0)] = E[Z(τb ∧ t)] = E[e^{αX(τb∧t)−λ(τb∧t)}].

Since X is continuous a.s., we have

lim_{t→∞} [αX(τb ∧ t) − λ(τb ∧ t)] = αb − λτb a.s.

Then by the preceding displays and the bounded convergence theorem,

1 = E[lim_{t→∞} Z(τb ∧ t)] = e^{αb} E[e^{−λτb}].

This proves (5.27). Inverting this Laplace transform yields the density formula (5.28). Finally, the derivative of this transform at λ = 0 yields E[τb] = b/μ.

5.8 Limiting Averages and Law of the Iterated Logarithm

This section contains strong laws of large numbers for Brownian motion and its maximum process, and a law of the iterated logarithm for Brownian motion.

As usual, B will denote a standard Brownian motion. The strong law of large numbers for it is as follows.

Theorem 33. A Brownian motion B has the limiting average


lim_{t→∞} t^{−1} B(t) = 0 a.s.

Proof. Since B has stationary independent increments, it has regenerative increments with respect to the deterministic times Tn = n. Then the assertion will follow by the SLLN in Theorem 54 in Chapter 2 for processes with regenerative increments upon showing that n^{−1}B(n) → 0 a.s., and

E[max_{n−1≤t≤n} |B(t)|] < ∞.  (5.29)

Now, the SLLN for i.i.d. random variables ensures that

n^{−1}B(n) = n^{−1} ∑_{m=1}^n [B(m) − B(m−1)] → E[B(1)] = 0, a.s.

Also, (5.29) follows since E[M(1)] < ∞ and

max_{n−1≤t≤n} |B(t)| d= max_{0≤t≤1} |B(t)| ≤ M(1) − m(1),

where m(t) = min_{s≤t} B(s) d= −M(t) by Remark 12.

If a real-valued process X, such as a Brownian motion or a functional of a Markov process, has a limiting average t^{−1}X(t) → c a.s., you might wonder if its maximum M(t) = sup_{s≤t} X(s) also satisfies t^{−1}M(t) → c a.s. Wonder no longer. The answer is given by the next property, which is analogous to the elementary fact that if n^{−1}c_n → c, then n^{−1} ∑_{k=1}^n c_k → c.

Proposition 34. Let x(t) and a(t) be real-valued functions on R+ such that

0 ≤ a(t) → ∞,  a(t)^{−1} x(t) → c, as t → ∞.

Then the maximum m(t) = sup_{s≤t} x(s) satisfies lim_{t→∞} a(t)^{−1} m(t) = c.

Proof. For any ε > 0, let t′ be such that a(t)^{−1} x(t) < c + ε, for t ≥ t′. Then for t ≥ t′,

a(t)^{−1} x(t) ≤ a(t)^{−1} m(t) = max{a(t)^{−1} m(t′), a(t)^{−1} sup_{t′≤s≤t} x(s)}
 ≤ max{a(t)^{−1} m(t′), c + ε}.

Letting t → ∞ in this display, it follows that

c ≤ lim inf_{t→∞} a(t)^{−1} m(t) ≤ lim sup_{t→∞} a(t)^{−1} m(t) ≤ c + ε.

Since this is true for any ε, we have a(t)^{−1} m(t) → c.

We will now apply this result to the maximum process M(t) = max_{s≤t} X(s) for a Brownian motion with drift X(t) = x + μt + σB(t).


Proposition 35. The Brownian motion with drift X and its maximum process have the limiting averages

t^{−1}X(t) → μ,  t^{−1}M(t) → μ a.s. as t → ∞.

Proof. This follows since the SLLN t^{−1}B(t) → 0 implies that

t^{−1}X(t) = t^{−1}x + μ + σt^{−1}B(t) → μ a.s.,

and then t^{−1}M(t) → μ a.s. follows by Proposition 34.

The preceding result implies that M(t) → ∞ or −∞ a.s. according as the drift μ is positive or negative. This tells us something about the maximum

M(∞) = sup_{t∈R+} X(t)

on the entire time axis. First, we have the obvious result

M(∞) = lim_{t→∞} M(t) = ∞ a.s. when μ > 0.

Second, M(∞) = ∞ a.s. when μ = 0 by the law of the iterated logarithm in (5.30) below.

For the remaining case of μ < 0, we have the following result.

Theorem 36. If μ < 0 and X(0) = 0, then M(∞) has an exponential distribution with rate −2μ/σ².

Proof. The assertion follows, since letting t → ∞ in {M(t) > b} = {τb < t} and using Theorem 32,

P{M(∞) > b} = P{τb < ∞} = e^{2μb/σ²}.
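A simulation sketch of Theorem 36 (illustrative only, not from the text; σ = 1 and μ = −1/2, so the rate is −2μ/σ² = 1): the all-time maximum is approximated by the maximum over a long finite horizon, which slightly underestimates it:

    import numpy as np

    rng = np.random.default_rng(7)
    mu, dt, horizon, n = -0.5, 0.01, 40.0, 20_000
    x = np.zeros(n)
    M = np.zeros(n)
    for _ in range(int(horizon / dt)):
        x += mu * dt + np.sqrt(dt) * rng.standard_normal(n)
        np.maximum(M, x, out=M)
    # M(infinity) should be exponential with rate 1 (mean 1):
    print(M.mean(), (M > 1.0).mean(), np.exp(-1.0))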

We will now consider fluctuations of Brownian motions that are described by a law of the iterated logarithm. Knowing that the limiting average of Brownian motion B is 0 as t → ∞, a follow-on issue is to characterize its fluctuations about 0. These fluctuations, of course, can be described for a "fixed" t by the normal distribution of B(t); e.g., P{|B(t)| ≤ 2√t} ≈ .95.

However, to get a handle on rare fluctuations as t → ∞, it is of interest to find constants h(t) such that

lim sup_{t→∞} B(t)/h(t) = 1 a.s.

In other words, h(t) is the maximum height of the fluctuations of B(t) above 0, and B(t) gets near h(t) infinitely often (i.o.) as t → ∞ in that

P{B(t) ∈ [h(t) − ε, h(t)] i.o.} = 1,  ε > 0.


Since the reflection −B is a Brownian motion, the preceding would also yield

lim inf_{t→∞} B(t)/h(t) = −1 a.s.

These fluctuations are related to those as t ↓ 0 as follows.

Remark 37.

lim sup_{t→∞} B(t)/h(t) = 1 a.s. ⟺ lim sup_{t↓0} B(t)/(t h(1/t)) = 1 a.s.

This is because the time-inversion process X(t) = tB(1/t) is a Brownian motion by Exercise 2. Indeed, the equivalence is true since, using s = 1/t,

lim sup_{t→∞} B(t)/h(t) = lim sup_{s↓0} X(s)/(s h(1/s)).

Remark 37 says that h(t) is the height function for fluctuations of B as t → ∞ if and only if t h(1/t) is the height function for fluctuations as t ↓ 0. The height functions for both of these cases are as follows. The proof, due to Khintchine 1924, is in [37, 61, 64].

Theorem 38. (Law of the Iterated Logarithm)

lim sup_{t↓0} B(t)/√(2t log log(1/t)) = 1,  lim sup_{t→∞} B(t)/√(2t log log t) = 1 a.s.,

lim inf_{t↓0} B(t)/√(2t log log(1/t)) = −1,  lim inf_{t→∞} B(t)/√(2t log log t) = −1 a.s.

Note that the lim sup_{t↓0} result implies that B(t) > 0 i.o. near 0, and so

inf{t > 0 : B(t) > 0} = 0 a.s.

Similarly, the lim inf_{t↓0} result implies that B(t) < 0 i.o. near 0 a.s. Consequently, B(t) = 0 i.o. near 0 a.s. because B has continuous paths a.s.

The other results for t → ∞ imply that, for any fixed a > 0, we have B(t) > a and B(t) < −a i.o. a.s. as t → ∞, and so B passes through [−a, a] i.o. a.s. Furthermore, the extremes of B are

sup_{t∈R+} B(t) = ∞ a.s.,  inf_{t∈R+} B(t) = −∞ a.s.  (5.30)


5.9 Donsker’s Functional Central Limit Theorem

By the classical central limit theorem (Theorem 63 in Chapter 2), we know that a random walk under an appropriate normalization converges in distribution to a normal random variable. This section extends this result to stochastic processes. In particular, viewing a random walk as a process in continuous time, if the time and space parameters are rescaled appropriately, then the random walk process converges in distribution to a Brownian motion. This result, called Donsker's functional central limit theorem, also establishes that many functionals of random walks can be approximated by corresponding functionals of Brownian motion.

Throughout this section Sn = ∑_{k=1}^n ξk will denote a random walk in which the step sizes ξn are i.i.d. with mean 0 and variance 1. For each n, consider the stochastic process

Xn(t) = n^{−1/2} S_{⌊nt⌋},  t ∈ [0, 1].

That is,

Xn(t) = n^{−1/2} Sk if k/n ≤ t < (k + 1)/n for some k < n.

This process is a continuous-time representation of the random walk Sk in which the location Sk is rescaled (or shrunk) to the value n^{−1/2} Sk, and the time scale is rescaled such that the walk takes ⌊nt⌋ steps in time t. Then as n becomes large the steps become very small and frequent and, as we will show, Xn converges in distribution to a standard Brownian motion B as n → ∞.
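In code — an illustrative sketch, not from the text — Xn is just a rescaled cumulative sum; for ±1 steps (mean 0, variance 1) the value Xn(1) should be approximately N(0, 1) for large n:

    import numpy as np

    def donsker_path(n, t_grid, rng):
        """X_n(t) = n^{-1/2} S_{floor(nt)} evaluated on a grid of times."""
        S = np.concatenate(([0.0], np.cumsum(rng.choice([-1.0, 1.0], size=n))))
        return S[(n * t_grid).astype(int)] / np.sqrt(n)

    rng = np.random.default_rng(8)
    t = np.linspace(0.0, 1.0, 101)
    ends = np.array([donsker_path(10_000, t, rng)[-1] for _ in range(2000)])
    print(ends.mean(), ends.var())            # about 0 and about 1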

We begin with the preliminary observation that the finite-dimensional distributions of Xn converge in distribution to those of B. That is, for any fixed t1 < · · · < tk,

(Xn(t1), . . . , Xn(tk)) d→ (B(t1), . . . , B(tk)), as n → ∞.  (5.31)

In particular, for each fixed t, we have Xn(t) d→ B(t), as n → ∞.

The latter follows since n^{−1/2} Sn d→ B(1) by the classical central limit theorem, and so

Xn(t) = (⌊nt⌋/n)^{1/2} ⌊nt⌋^{−1/2} S_{⌊nt⌋} d→ t^{1/2} B(1) d= B(t).

Similarly, (5.31) follows by a multivariate central limit theorem.

Expression (5.31) only provides a partial description of the convergence in distribution of Xn to B; we will now give a complete description of the convergence that includes sample path information.

Throughout this section, D = D[0, 1] will denote the set of all functions x : [0, 1] → R that are right-continuous with left-hand limits. Assume that the σ-field associated with D is the smallest σ-field under which the projection


map x → x(t) is measurable, for each t. Almost every sample path of Xn is a function in D, and so the process Xn is a D-valued random variable (or a random element in D).

We will consider D as a metric space in which the distance between two functions x and y is ‖x − y‖, based on the uniform or supremum norm

‖x‖ = sup_{t≤1} |x(t)|.

Other metrics for D are discussed in [11, 115]. Convergence in distribution of random elements in D, as in other metric spaces, is as follows. Random elements Xn in a metric space S converge in distribution to X in S as n → ∞, denoted by Xn d→ X in S, if

lim_{n→∞} E[f(Xn)] = E[f(X)],

for any bounded continuous function f : S → R. The convergence Xn d→ X is equivalent to the weak convergence of their distributions

P{Xn ∈ ·} w→ P{X ∈ ·}.  (5.32)

Several criteria for this convergence are in the Appendix.

An important consequence of Xn d→ X in S is that it readily leads to the convergence in distribution of a variety of functionals of the Xn as follows.

Theorem 39. (Continuous Mapping) Suppose that Xn d→ X in S as n → ∞, and f : S → S′ is a measurable mapping, where S′ is another metric space. If C ⊆ S is in the σ-field of S such that f is continuous on C and X ∈ C a.s., then f(Xn) d→ f(X) in S′ as n → ∞.

Proof. Recall that Xn d→ X is equivalent to (5.32), which we will denote by μn w→ μ. Then f(Xn) d→ f(X) is equivalent to μn f^{−1} w→ μ f^{−1} since

P{f(Xn) ∈ A} = P{Xn ∈ f^{−1}(A)} = μn f^{−1}(A).

Also note that by Theorem 10 in the Appendix, μn w→ μ is equivalent to

lim inf_{n→∞} μn(G) ≥ μ(G), for any open G ⊆ S.

Now using this characterization, for any open set G ⊆ S′,

lim inf_{n→∞} μn f^{−1}(G) ≥ lim inf_{n→∞} μn((f^{−1}(G))°) ≥ μ((f^{−1}(G))°).

Here A° is the interior of the set A. Clearly (f^{−1}(G))° ⊃ C ∩ f^{−1}(G), and μ(C) = 1 by the assumption X ∈ C a.s. Then μ((f^{−1}(G))°) = μ f^{−1}(G). Using


this in the preceding display yields lim inf_{n→∞} μn f^{−1}(G) ≥ μ f^{−1}(G), which proves μn f^{−1} w→ μ f^{−1}, and hence f(Xn) d→ f(X).

We are now ready to present the functional central limit theorem proved by Donsker in 1951 for the continuous-time random walk process

Xn(t) = n^{−1/2} S_{⌊nt⌋},  t ∈ [0, 1].

Theorem 40. (Donsker's FCLT) For the random walk process Xn defined above, Xn d→ B in D as n → ∞, where B is a standard Brownian motion.

The proof of this theorem will follow after a few observations and preliminary results. Donsker's theorem is called a "functional central limit theorem" because, under the continuous-mapping theorem, many functionals of the random walk also converge in distribution to the corresponding functionals of the limiting Brownian motion. Two classic examples are as follows; we cover other examples later.

Example 41. If Xn d→ B in D, then, for t1 < · · · < tk ≤ 1,

(n^{−1/2} S_{⌊nt1⌋}, . . . , n^{−1/2} S_{⌊ntk⌋}) d→ (B(t1), . . . , B(tk)).  (5.33)

This convergence is equivalent to (5.31). Now (5.33) says f(Xn) d→ f(B), where f : D → R^k is defined, for fixed t1 < · · · < tk, by

f(x) = (x(t1), . . . , x(tk)).

Clearly f is continuous on the set C of continuous functions in D and B ∈ C a.s. Then (5.33) follows from the continuous-mapping theorem.

Example 42. The convergence Xn d→ B implies

n^{−1/2} max_{m≤n} Sm d→ max_{s≤1} B(s).

The distribution of the limit is given in Theorem 11. The convergence follows by the continuous-mapping theorem since the function f : D → R+ defined by f(x) = max_{s≤1} x(s) is continuous in that ‖xn − x‖ → 0 implies max_{s≤1} xn(s) → max_{s≤1} x(s).

Donsker's FCLT is also called an invariance principle because in the convergence Xn d→ B, the Brownian motion limit B is the same for "any" distribution of the step size of the random walk, provided it has a finite mean and variance. When the mean and variance are not 0 and 1, respectively, the result applies with the following change in notation.


Remark 43. If ξn are i.i.d. random variables with finite mean μ and variance σ², then (ξk − μ)/σ are i.i.d. with mean 0 and variance 1, and hence Donsker's theorem holds for

Xn(t) = n^{−1/2} ∑_{k=1}^{⌊nt⌋} (ξk − μ)/σ,  t ∈ [0, 1].

Consequently, the random walk Sn = ∑_{k=1}^n ξk, for large n, is approximately equal in distribution to a Brownian motion with drift. In particular, using n^{1/2} B(t) d= B(nt),

S_{⌊nt⌋} d≈ μnt + σB(nt),  Sn d≈ μn + σB(n).

Does the convergence in distribution in Donsker's theorem hold for processes defined on the entire time axis R+? To answer this, consider the space D[0, T] of all functions x : [0, T] → R that are right-continuous with left-hand limits, for fixed T > 0. Similarly to D[0, 1], the D[0, T] is a metric space with the supremum norm. Now let D(R+) denote the space of all functions x : R+ → R that are right-continuous with left-hand limits. Consider D(R+) as a metric space in which convergence xn → x in D(R+) holds if xn → x in D[0, T] holds for each T that is a continuity point of x.

Remark 44. Convergence in D(R+). Donsker's convergence Xn d→ B holds in D[0, T], for each T, and in D(R+) as well. The proof for D[0, T] is exactly the same as that for D[0, 1]. The convergence also holds in D(R+), since B is continuous a.s.

Donsker's approach for proving Theorem 40 is to prove the convergence (5.31) of the finite-dimensional distributions and then establish a certain tightness condition. This proof is described in Billingsley 1967; his book and one by Whitt 2002 cover many fundamentals of functional limit theorems and weak convergence of probability measures on metric spaces.

Another approach for proving Theorem 40, which we will now present, is by applying Skorohod's embedding theorem. The gist of this approach is that one can construct a Brownian motion B and stopping times τn for it such that {Sn} d= {B(τn)}. Then further analysis of Xn and B defined on the same probability space establishes ‖Xn − B‖ P→ 0, which yields Xn d→ B.

The key embedding theorem for this analysis is as follows. It says that any random variable ξ with mean 0 and variance 1 can be represented as B(τ) for an appropriately defined stopping time τ. Furthermore, any i.i.d. sequence ξn of such variables can be represented as an embedded sequence B(τn) − B(τn−1) in a Brownian motion B for appropriate stopping times τn. The proof is in [37, 61].

Theorem 45. (Skorohod Embedding) Associated with the random walk Sn, there exists a standard Brownian motion B with respect to a filtration and stopping times 0 = τ0 ≤ τ1 ≤ . . . such that the increments τn − τn−1 are i.i.d. with mean 1, and {Sn} d= {B(τn)}.
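For ±1 steps the embedding is transparent (a hypothetical illustration, not the book's general construction): take τ1 to be the first time B leaves (−1, 1), so B(τ1) = ±1 with probability 1/2 each and E[τ1] = 1 by Theorem 30, and iterate:

    import numpy as np

    def embed_simple_walk(n_steps, dt=1e-4, rng=np.random.default_rng(9)):
        """Embedded walk B(tau_k): run B until it moves +-1 from its level.
        The small overshoot at exit is ignored (an O(sqrt(dt)) error)."""
        walk, level, b = [0.0], 0.0, 0.0
        for _ in range(n_steps):
            while abs(b - level) < 1.0:
                b += np.sqrt(dt) * rng.standard_normal()
            level += 1.0 if b > level else -1.0
            b = level
            walk.append(level)
        return np.array(walk)

    print(embed_simple_walk(10))              # a +-1 simple random walk path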

Another preliminary leading to Donsker's theorem is the following Skorohod approximation result that the uniform difference between the random walk and a Brownian motion on [0, t] is o(t^{1/2}) a.s. as t → ∞. This material and the proof of Donsker's theorem below is from Kallenberg 2004.

Theorem 46. (Skorohod Approximation of Random Walks) There exists a standard Brownian motion B on the same probability space as the random walk Sn such that

t^{−1/2} sup_{s≤t} |S_{⌊s⌋} − B(s)| P→ 0, as t → ∞.  (5.34)

Proof. Let B and τn be as in Theorem 45, and define them on the same probability space as Sn (which is possible) so Sn = B(τn) a.s. Define

D(t) = t^{−1/2} sup_{s≤t} |B(τ_{⌊s⌋}) − B(s)|.

Then (5.34) is equivalent to P{D(t) > ε} → 0 for ε > 0.

To prove this convergence, let δt = sup_{s≤t} |τ_{⌊s⌋} − s|, t ≥ 0. For a fixed γ > 0, consider the inequality

γ > 0, consider the inequality

PD(t) > ε ≤ PD(t) > ε, t−1δt ≤ γ + Pt−1δt > γ. (5.35)

Note that n−1τn → 1 a.s. by the strong law of large numbers for i.i.d. randomvariables, and so t−1|τt − t| → 0 a.s. Then the limiting average of thesupremum of these differences is t−1δt → 0 a.s. by Proposition 34.

Next, consider the modulus of continuity of f : R+ → R, which is

w(f, t, γ) = supr,s≤t, |r−s|≤γ

|f(r) − f(s)|, t ≥ 0.

ClearlyD(t) ≤ w(B, t+ tγ, tγ), when t−1δt ≤ γ.

Using this observation in (5.35) and t−1/2B(r) : r ≥ 0 d= B(rt) : r ≥ 0from the scaling property in Exercise 2, we have

PD(t) > ε ≤ Pt−1/2w(B, t+ tγ, tγ) > ε + Pt−1δt > γ= Pw(B, 1 + γ, γ) > ε + Pt−1δt > γ.

Letting t → ∞ (t−1δt → 0 a.s.), and then letting γ → 0 (B has continuouspaths a.s.), the last two probabilities tend to 0. Thus PD(t) > ε → 0,which proves (5.34).

We will now obtain Donsker’s theorem by applying Theorem 46.


Proof of Donsker’s Theorem. Let B and Sn = B(τn) a.s. be as in the proofof Theorem 46, and define Bn(t) = n−1/2B(nt). Clearly

‖Xn −Bn‖ = n−1/2 supt≤1

|Snt −B(nt)| = n−1/2 sups≤n

|Ss −B(s)|.

Then ‖Xn −Bn‖ P→ 0 by Theorem 46.Next, note that, by Exercise 1, the scaled process Bn is a Brownian motion.

Now, as in [61], one can construct Xn and a Brownian motion B on the sameprobability space such that (Xn, B) d= (Xn, Bn). Then we have

‖Xn −B‖ d= ‖Xn − B‖ d= ‖Xn −Bn‖ P→ 0.

This proves Xnd→ B.

5.10 Regenerative and Markov FCLTs

This section presents an extension of Donsker's FCLT for processes with regenerative increments. This in turn yields FCLTs for renewal processes and ergodic Markov chains in discrete and continuous time.

For this discussion, suppose that {Z(t) : t ≥ 0} is a real-valued process with Z(0) = 0 that is defined on the same probability space as a renewal process N(t) whose renewal times are denoted by 0 = T0 < T1 < . . . The increments of the two-dimensional process (N(t), Z(t)) in the interval [Tn−1, Tn) are denoted by

ζn = (Tn − Tn−1, {Z(t) − Z(Tn−1) : t ∈ [Tn−1, Tn)}).

Recall from Section 2.10 that Z(t) has regenerative increments over Tn if the ζn are i.i.d.

Theorem 65 in Chapter 2 is a central limit theorem for processes with regenerative increments. An analogous FCLT is as follows. Assuming they are finite, let

μ = E[T1],  a = E[Z(T1)]/μ,  σ² = Var[Z(T1) − aT1],

and assume σ > 0. In addition, let

Mn = sup_{Tn<t≤Tn+1} |Z(t) − Z(Tn)|,  n ≥ 0,

and assume E[M1] and E[T1²] are finite. For r > 0, consider the process


Xr(t) = (Z(rt) − art)/(σ√(r/μ)),  t ∈ [0, 1].

This is the regenerative-increment process Z with space-time scale changes analogous to those for random walks. A real-valued parameter r instead of an integer is appropriate since Z is a continuous-time process.

Theorem 47. (Regenerative Increments) For the normalized regenerative-increment process Xr defined above, Xr d→ B as r → ∞, where B is a standard Brownian motion.

The proof uses the next two results. Let D1 denote the subspace of functions x in D that are nondecreasing with x(0) = 0 and x(t) ↑ 1 as t → 1. The composition mapping from the product space D × D1 to D, denoted by (x, y) → x ∘ y, is defined by x ∘ y(t) = x(y(t)), t ∈ [0, 1]. Let C and C1 denote the subspaces of continuous functions in D and D1, respectively.

Proposition 48. The composition mapping from D × D1 to D is continuous on the subspace C × C1.

Proof. Suppose (xn, yn) → (x, y) in D × D1 such that (x, y) ∈ C × C1. Using the sup norm and the triangle inequality,

‖xn ∘ yn − x ∘ y‖ ≤ ‖xn ∘ yn − x ∘ yn‖ + ‖x ∘ yn − x ∘ y‖.

Now, the last term tends to 0 since x ∈ C is uniformly continuous. Also,

‖xn ∘ yn − x ∘ yn‖ ≤ ‖xn − x‖ → 0.

Thus xn ∘ yn → x ∘ y in D, which proves the assertion.

The continuity of composition mappings under weaker assumptions is discussed in [11, 115]. The importance of the composition mapping is illustrated by the following result. In the setting of Theorem 47, the regenerative-increment property of Z implies that

ξn = Z(Tn) − Z(Tn−1) − a(Tn − Tn−1)

are i.i.d. with mean 0 and variance σ².

Lemma 49. Under the preceding assumptions, define the process

X′r(t) = (1/(σ√(r/μ))) ∑_{k=1}^{N(rt)} ξk,  t ∈ [0, 1].

Then X′r d→ B as r → ∞.


Proof. Letting X̄r(t) = (1/(σ√(r/μ))) ∑_{k=1}^{⌊rt⌋} ξk, it follows by Donsker's theorem that X̄r d→ μ^{1/2}B as r → ∞. With no loss in generality, assume μ^{−1} < 1. Consider the process

Yr(t) = N(rt)/r if N(r)/r ≤ μ^{−1}, and Yr(t) = t/μ if N(r)/r > μ^{−1}.

Note that

X̄r ∘ Yr(t) = (1/(σ√(r/μ))) ∑_{k=1}^{rYr(t)} ξk.

This equals X′r(t) when N(r)/r ≤ μ^{−1}, and so for any ε > 0,

P{‖X′r − X̄r ∘ Yr‖ > ε} ≤ P{N(r)/r > μ^{−1}} → 0.

The convergence follows since N(r)/r → μ^{−1} a.s. by the SLLN for renewal processes (Corollary 11 in Chapter 2). This proves X′r − X̄r ∘ Yr d→ 0. Then to prove X′r d→ B, it suffices by Exercise 53 to show that X̄r ∘ Yr d→ B.

Letting I(t) = t, t ∈ [0, 1], note that

‖Yr − μ^{−1}I‖ ≤ sup_{t≤1} |N(rt)/r − μ^{−1}t| = r^{−1} sup_{s≤r} |N(s) − μ^{−1}s| → 0 a.s.

The convergence follows by Proposition 34 since the SLLN for N implies r^{−1}|N(r) − μ^{−1}r| → 0 a.s. Now, we have (X̄r, Yr) d→ (μ^{1/2}B, μ^{−1}I), where the limit functions are continuous. Then Proposition 48 and Exercise 1 yield

X̄r ∘ Yr d→ μ^{1/2} B ∘ (μ^{−1}I) d= B.

Thus X̄r ∘ Yr d→ B, which completes the proof.

Remark 50. The assertion in Lemma 49 implies that

X′r(1) = (1/(σ√(r/μ))) ∑_{k=1}^{N(r)} ξk d→ B(1),

which is Anscombe's result in Theorem 64 in Chapter 2.

We now establish the convergence of Xr(t) = (Z(rt) − art)/(σ√(r/μ)).

Proof of Theorem 47. We can write

Xr(t) = X′r(t) + (√μ/σ) Vr(t),  (5.36)

where


X′r(t) = (Z(T_{N(rt)}) − aT_{N(rt)})/(σ√(r/μ)),

Vr(t) = r^{−1/2}[Z(rt) − Z(T_{N(rt)}) − a(rt − T_{N(rt)})].

Recognizing that X′r is the process in Lemma 49, we have X′r d→ B. Then the proof of Xr d→ B will be complete upon showing that Vr d→ 0.

Letting

ξ̃n = sup_{Tn<t≤Tn+1} |Z(t) − Z(Tn)| + a(Tn+1 − Tn),

it follows that

‖Vr‖ ≤ r^{−1/2} sup_{t≤1} ξ̃_{N(rt)} = √(N(r)/r) (N(r)^{−1/2} sup_{k≤N(r)} ξ̃k).

The regenerative-increment property of Z implies that the ξ̃n are i.i.d. Then

n^{−1/2} ξ̃n d= n^{−1/2} ξ̃1 → 0 a.s.

Now N(r)^{−1/2} sup_{k≤N(r)} ξ̃k P→ 0 by Proposition 34. Applying this to the preceding display and using N(r)/r → μ^{−1} a.s., we get ‖Vr‖ d→ 0.

Since renewal processes and ergodic Markov chains are regenerative processes, FCLTs for them are obtainable by Theorem 47. To see this, first note that a renewal process N(t) has regenerative increments over its renewal times Tn, and the parameters above are Mn = 1,

a = E[N(T1)]/μ = μ^{−1},  Var[N(T1) − μ^{−1}T1] = μ^{−2} Var[T1].

Then the following is an immediate consequence of Theorem 47.

Corollary 51. (Renewal Process) Suppose N(t) is a renewal process whose inter-renewal times have mean μ and variance σ², and define

Xr(t) = (N(rt) − rt/μ)/(σ√(r/μ³)),  t ∈ [0, 1].

Then Xr →d B as r → ∞.

The particular case Xr(1) →d B(1) is the classical central limit theorem for renewal processes, which we saw in Example 67 in Chapter 2; namely,

(N(r) − r/μ)/(σ√(r/μ³)) →d B(1).
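As a numerical companion to Corollary 51 at t = 1, the following sketch (ours, assuming NumPy; the gamma inter-renewal choice is arbitrary, chosen because μ and σ are then known in closed form) checks that the normalized count is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)

shape, scale, r, n_rep = 2.0, 0.5, 300.0, 5000
mu = shape * scale                        # mean inter-renewal time = 1
sigma = np.sqrt(shape) * scale            # std dev of inter-renewal time
T = np.cumsum(rng.gamma(shape, scale, (n_rep, 800)), axis=1)
N = (T <= r).sum(axis=1)                  # N(r), the number of renewals in [0, r]
Z = (N - r / mu) / (sigma * np.sqrt(r / mu**3))

print(round(Z.mean(), 3), round(Z.std(), 3))   # approximately 0 and 1
```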

For the next result, suppose that Y is an ergodic CTMC on a countable state space S with stationary distribution p. For a fixed state i, assume that Y(0) = i and let 0 = T0 < T1 < ... denote the times at which Y enters state i. Assume Ei[T1²] < ∞ and let μ = Ei[T1]. For f : S → R, assuming the following integral exists, consider the process

Z(t) = ∫₀ᵗ f(Y(s)) ds,  t ≥ 0.

This has regenerative increments over the Tn and, assuming the sum is absolutely convergent, Corollary 40 in Chapter 4 yields

a = Ei[Z(T1)]/μ = Σ_j f(j) pj.

Assume Ei[M1] and σ² = Var[Z(T1) − aT1] are finite, and σ > 0. Then Theorem 47 for the CTMC functional Z is as follows. An analogous result for discrete-time Markov chains is in Exercise 48.

Corollary 52. (CTMC) Under the preceding assumptions, for r > 0, define the process

Xr(t) = ( ∫₀^{rt} f(Y(s)) ds − art ) / (σ√(r/μ)),  t ∈ [0, 1].

Then Xr →d B as r → ∞.
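A minimal simulation of Corollary 52 at t = 1 (ours, assuming NumPy; the two-state chain, rates, and f = 1(state = 1) are our choices): the normalized occupation time of state 1 should be approximately normal. For this chain, a stationary autocovariance calculation — an assumption we supply, not from the text — gives the limiting variance σ²/μ = 2p₀p₁/(q01 + q10) = 4/27, i.e., a standard deviation near 0.385.

```python
import numpy as np

rng = np.random.default_rng(2)

def occupation_clt(r=400.0, q01=1.0, q10=2.0, n_rep=4000):
    """Sample (int_0^r 1(Y(s)=1) ds - p1*r)/sqrt(r) for a two-state CTMC."""
    p1 = q01 / (q01 + q10)                 # stationary probability of state 1
    out = np.empty(n_rep)
    for i in range(n_rep):
        t, state, occ = 0.0, 0, 0.0
        while t < r:
            hold = rng.exponential(1.0 / (q01 if state == 0 else q10))
            hold = min(hold, r - t)        # truncate the final sojourn at time r
            if state == 1:
                occ += hold
            t += hold
            state = 1 - state
        out[i] = (occ - p1 * r) / np.sqrt(r)
    return out

x = occupation_clt()
print(round(x.mean(), 3), round(x.std(), 3))  # mean near 0, std near 0.385
```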

5.11 Peculiarities of Brownian Sample Paths

While sample paths of a Brownian motion are continuous a.s., they are extremely erratic. This section describes their erratic behavior.

Continuous functions are typically monotone on certain intervals, but this is not the case for Brownian motion paths.

Proposition 53. Almost every sample path of a Brownian motion B is monotone on no interval.

Proof. For any a < b in R₊, consider the event A = {B is nondecreasing on [a, b]}. Clearly A = ∩_{n=1}^∞ An, where

An = ∩_{i=1}^n {B(ti) − B(ti−1) ≥ 0}

and ti = a + i(b − a)/n. The event A is measurable since each An is. Because P{B(ti) − B(ti−1) ≥ 0} = 1/2 and the increments of B are independent, we have P(An) = 2⁻ⁿ, and so P(A) ≤ lim_{n→∞} P(An) = 0. This conclusion is also true for the event A = {B is nonincreasing on [a, b]}. Since every interval of monotonicity would contain an interval with rational endpoints, and there are only countably many of these, B is monotone on no interval a.s.


For the next result, we say that for a Brownian motion B on a closed interval I, its local maximum is sup_{t∈I} B(t), and its local minimum is inf_{t∈I} B(t). There are processes that have local maxima on two disjoint intervals that are equal with positive probability, but this is not the case for Brownian motion.

Proposition 54. The local maxima and minima of a Brownian motion B are a.s. distinct.

Proof. It suffices to show that, for disjoint closed intervals I and J in R₊,

MI ≠ MJ a.s.,

where each of the quantities MI and MJ is either a local minimum or a local maximum.

First, suppose MI and MJ are both local maxima. Let u denote the right endpoint of I and v > u denote the left endpoint of J. Then

MJ − MI = sup_{t∈J} [B(t) − B(v)] − sup_{t∈I} [B(t) − B(u)] + B(v) − B(u).

The three terms on the right are independent, and the last one is nonzero a.s. (since the increments are normally distributed). Therefore, MI ≠ MJ a.s.

This result is also true by similar arguments when MI and MJ are both local minima, or when one is a local minimum and the other is a local maximum.

We now answer the question: how much time does a Brownian motion spend in a particular state?

Proposition 55. The amount of time that a Brownian motion B spends in a fixed state a over the entire time horizon is the Lebesgue measure La of the time set {t ∈ R₊ : B(t) = a}, and La = 0 a.s.

Proof. Since La is nonnegative, it suffices to show E[La] = 0. For n ∈ Z₊, consider the process Xn(t) = B(⌊nt⌋/n), t ≥ 0. Clearly Xn(t) → B(t) a.s. as n → ∞ for each t. Then by Fubini's theorem and Fatou's lemma,

E[La] = ∫_{R₊} P{B(t) = a} dt = ∫_{R₊} lim_{n→∞} P{Xn(t) = a} dt ≤ lim inf_{n→∞} ∫_{R₊} P{Xn(t) = a} dt.

The last integral (of a piecewise-constant function of t) is 0 since Xn(t) has a normal distribution, and so E[La] = 0.

Proofs of the next two results are in [61, 64].


Theorem 56. (Dvoretzky, Erdős, and Kakutani 1961) Almost every sample path of a Brownian motion B does not have a point of increase: for positive t and δ,

P{B(s) ≤ B(t) ≤ B(u) : (t − δ)⁺ ≤ s < t < u ≤ t + δ} = 0.

Analogously, almost every sample path of B does not have a point of decrease.

Theorem 57. (Paley, Wiener and Zygmund 1933) Almost every sample path of a Brownian motion is nowhere differentiable.

More insight into the wild behavior of a Brownian motion path is given by its linear and quadratic variations. The (linear) variation of a real-valued function f on an interval [a, b] is

V_a^b(f) = sup{ Σ_{k=1}^n |f(tk) − f(tk−1)| : a = t0 < t1 < ··· < tn = b }.

If this variation is finite, then f has the following properties:
• It can be expressed as the difference f(t) = f1(t) − f2(t) of two increasing functions, where f1(t) = V_a^t(f).
• f has a derivative at almost every point in [a, b].
• Riemann–Stieltjes integrals of the form ∫_{[a,b]} g(t) df(t) exist.

In light of these observations, Theorem 57 implies that almost every sample path of a Brownian motion has an "unbounded" variation on any finite interval of positive length. Further insight into the behavior of Brownian paths in terms of their quadratic variation is in Exercise 33.
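The contrast between linear and quadratic variation is easy to see on a simulated path. The sketch below (ours, assuming NumPy) samples one Brownian path on a fine grid and evaluates both variations over successively finer partitions obtained by subsampling: the linear variation grows without bound while the quadratic variation settles near t = 1 (compare Exercise 33).

```python
import numpy as np

rng = np.random.default_rng(3)

# One Brownian path on [0, 1] sampled on a fine grid; coarser partitions are
# obtained by subsampling, so all partitions refer to the same path.
n_max = 2**20
dB = rng.normal(0.0, np.sqrt(1.0 / n_max), n_max)
B = np.concatenate(([0.0], np.cumsum(dB)))

for step in (2**14, 2**10, 2**6, 2**2, 1):
    incr = np.diff(B[::step])
    print(f"n = {n_max // step:7d}  linear var = {np.abs(incr).sum():10.2f}  "
          f"quadratic var = {(incr**2).sum():.4f}")
# The linear variation blows up as the mesh shrinks; the quadratic variation
# stabilizes near t = 1.
```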

Because the sample paths of a Brownian motion B have unbounded variation a.s., a stochastic integral ∫_{[a,b]} X(t) dB(t) cannot, for almost every sample path, be defined as a classical Riemann–Stieltjes integral. Another approach is used for defining stochastic integrals with respect to a Brownian motion or with respect to a martingale. Such integrals are the basis of the theory of stochastic differential equations.

5.12 Brownian Bridge Process

We will now study a special Gaussian process called a Brownian bridge. Such a process is equal in distribution to a Brownian motion on [0, 1] that is restricted to hit 0 at time 1. An important application is its use in the nonparametric Kolmogorov–Smirnov statistical test that a random sample comes from a specified distribution. In particular, for large samples, the normalized maximum difference between the empirical distribution and the true distribution is approximately the supremum of the absolute value of a Brownian bridge.

Throughout this section, {X(t) : t ∈ [0, 1]} will denote a stochastic process on R, and B(t) will denote a standard Brownian motion. The process X is a Brownian bridge if it is a Gaussian process with mean 0 and covariance function

E[X(s)X(t)] = s(1 − t),  0 ≤ s ≤ t ≤ 1.

Such a process is equal in distribution to the following Brownian motion "tied down" at 1.

Proposition 58. The process X(t) = B(t) − tB(1), t ∈ [0, 1], is a Brownian bridge.

Proof. This follows since X is clearly a Gaussian process with zero mean and

E[X(s)X(t)] = E[B(s)B(t) − tB(s)B(1)] − sE[B(1)B(t) − tB(1)²] = s(1 − t),  s < t.

The last equality uses E[B(u)B(v)] = u, for u ≤ v.
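Proposition 58 gives a direct way to simulate a Brownian bridge and check its covariance. A minimal sketch (ours, assuming NumPy; the grid size and the test points s = 0.25, t = 0.5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

n, n_rep = 500, 8000
t = np.arange(1, n + 1) / n
dB = rng.normal(0.0, np.sqrt(1.0 / n), (n_rep, n))
B = np.cumsum(dB, axis=1)            # B(t) on the grid t = 1/n, ..., 1
X = B - B[:, [-1]] * t               # bridge: X(t) = B(t) - t*B(1)

s_i, t_i = n // 4 - 1, n // 2 - 1    # grid points s = 0.25, t = 0.5
print(round(np.mean(X[:, s_i] * X[:, t_i]), 4))  # empirical E[X(s)X(t)]
print(0.25 * (1 - 0.5))                           # s*(1 - t) = 0.125
```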

Because of its relation to Brownian motion, many basic properties of a Brownian bridge X can be related to those of Brownian motion. For instance, X has continuous paths that are nowhere differentiable. Note that the negation −X(t) and the time reversal X(1 − t) are also Brownian bridges; related ideas are in Exercises 49 and 50.

We will now show how a Brownian bridge is a fundamental process related to empirical distributions. Suppose that ξ1, ξ2, ... are i.i.d. random variables with distribution F. The empirical distribution associated with ξ1, ..., ξn is

Fn(x) = n⁻¹ Σ_{k=1}^n 1(ξk ≤ x),  x ∈ R, n ≥ 1.

This function is an estimator of F based on n samples from it. The estimator is unbiased since clearly E[Fn(x)] = F(x). It is also a consistent estimator since, by the classical SLLN,

Fn(x) → F(x) a.s. as n → ∞.  (5.37)

This convergence is also uniform in x, as follows.

Proposition 59. (Glivenko–Cantelli) The empirical distributions satisfy

sup_x |Fn(x) − F(x)| → 0 a.s. as n → ∞.

Proof. Consider any −∞ = x1 < x2 < ··· < xm = ∞, and note that since F and Fn are nondecreasing, for x ∈ [xk−1, xk],

Fn(xk−1) − F(xk) ≤ Fn(x) − F(x) ≤ Fn(xk) − F(xk−1).

Then

sup_x |Fn(x) − F(x)| ≤ max_k |Fn(xk−1) − F(xk)| + max_k |Fn(xk) − F(xk−1)|.

Letting n → ∞ and letting the differences xk − xk−1 tend to 0, and then applying (5.37) to the preceding display, proves the assertion for continuous F. Exercise 40 proves the assertion when F is not continuous.

An important application of empirical distributions concerns the following nonparametric test that a sample comes from a specified distribution.

Example 60. Kolmogorov–Smirnov Statistic. Suppose that ξ1, ξ2, ... are i.i.d. random variables with a distribution F that is unknown. As mentioned above, the empirical distribution Fn(x) is a handy unbiased, consistent estimator of F. Now, suppose we want to test the simple hypothesis H0 that the sample is from a specified distribution F, versus the alternative hypothesis H1 that the sample is not from this distribution. One approach is to use the classical chi-square test.

Another approach is to use a test based on the Kolmogorov–Smirnov statistic defined by

Dn = sup_x |Fn(x) − F(x)|.

This is a measure of the distance between the empirical distribution Fn and F (which for simplicity we assume is continuous). The test rejects H0 if Dn > c, and accepts it otherwise. The constant c is determined by the probability P{Dn > c | H0} = α, for a specified level of significance α. The conditioning on H0 means conditioning on F being the true distribution.

When n is large, one can compute c by using the approximation

P{n^{1/2} Dn ≤ x | H0} ≈ P{ sup_{0≤t≤1} |B(t) − tB(1)| ≤ x } = 1 − 2 Σ_{k=1}^∞ (−1)^{k+1} e^{−2k²x²}.

This approximation follows from Theorem 61 below, and the summation formula is from [37].
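The series above is all one needs to compute Kolmogorov–Smirnov critical values. A small sketch (ours; the function name kolmogorov_cdf is an assumption, and the series is truncated at a fixed number of terms):

```python
import numpy as np

def kolmogorov_cdf(x, terms=100):
    """P{sup_t |B(t) - t*B(1)| <= x} via the truncated alternating series."""
    k = np.arange(1, terms + 1)
    return 1.0 - 2.0 * np.sum((-1.0) ** (k + 1) * np.exp(-2.0 * k**2 * x**2))

# Critical value c for the level-0.05 test: reject H0 when sqrt(n)*D_n > c.
lo, hi = 0.5, 3.0
for _ in range(60):                       # bisection on the increasing CDF
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if kolmogorov_cdf(mid) < 0.95 else (lo, mid)
print(round(0.5 * (lo + hi), 4))          # ~1.358, the familiar KS constant
```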

We will now establish the limiting distribution of the Kolmogorov–Smirnov statistic.

Theorem 61. The empirical distribution Fn associated with a sample from the distribution F satisfies

n^{1/2} sup_x |Fn(x) − F(x)| →d sup_{0≤t≤1} |X(t)|,  (5.38)

where X is a Brownian bridge.


Proof. From Exercise 40, we know that Fn = Gn(F(·)) and

sup_x |Fn(x) − F(x)| = sup_{0≤t≤1} |Gn(t) − t|,

where Gn(t) = n⁻¹ Σ_{k=1}^n 1(Uk ≤ t) is the empirical distribution of U1, ..., Un, which are i.i.d. with a uniform distribution on [0, 1]. The ξn and Un are defined on the same probability space.

In light of this observation, assertion (5.38) is equivalent to

n^{−1/2} ‖Yn‖ →d ‖X‖,

where Yn(t) = Σ_{k=1}^n (1(Uk ≤ t) − t), 0 ≤ t ≤ 1, and ‖x‖ = sup_{t≤1} |x(t)| for x ∈ D. To prove this convergence, it suffices by the continuous-mapping theorem to show that n^{−1/2} Yn →d X in D, since the map x → ‖x‖ from D to R is continuous (in the uniform topology).

Let κn be a Poisson random variable with mean n that is independent of the Uk. We will prove n^{−1/2} Yn →d X based on Exercise 53 by verifying

n^{−1/2} Yκn →d X,  (5.39)

n^{−1/2} ‖Yn − Yκn‖ →P 0.  (5.40)

Letting Nn(t) = Σ_{k=1}^{κn} 1(Uk ≤ t), where Nn(1) = κn, we can write

n^{−1/2} Yκn(t) = n^{−1/2}(Nn(t) − nt) − t n^{−1/2}(Nn(1) − n).

Now Nn is a Poisson process on [0, 1] with rate n by the mixed-sample representation of Poisson processes in Theorem 26 of Chapter 3. Then, from the functional central limit theorem for renewal processes in Corollary 51, the process n^{−1/2}(Nn(t) − nt) converges in distribution in D to a Brownian motion B.

Applying this to the preceding display, it follows that the process n^{−1/2} Yκn(t) converges in distribution in D to the process B(t) − tB(1), which is a Brownian bridge. This proves (5.39).

Next, note that

n^{−1/2} ‖Yn − Yκn‖ =d n^{−1/2} sup_{0≤t≤1} | Σ_{k=1}^{|n−κn|} (1(Uk ≤ t) − t) | = n^{−1/2} |κn − n| Zn,  (5.41)

where Zn = sup_{0≤t≤1} |G_{|κn−n|}(t) − t|. Since κn is the sum of n i.i.d. Poisson random variables with mean 1, it follows by the classical central limit theorem that n^{−1/2}|κn − n| →d |B(1)|. This convergence also implies |κn − n| →P ∞. Now sup_{0≤t≤1} |Gn(t) − t| →P 0 by Proposition 59, and so this convergence is also true with n replaced by |κn − n|; that is, Zn →P 0. Applying these observations to (5.41) verifies (5.40), which completes the proof.

5.13 Geometric Brownian Motion

This section describes geometric Brownian motion and related processes that are used for modeling stock prices or values of investments.

Let X(t) denote the price of a stock (commodity or other financial instrument) at time t. Suppose the value of the stock has many small up and down movements due to continual trading. One possible model is a Brownian motion with drift X(t) = x + μt + σB(t). This might be appropriate as a crude model for local or short-time behavior. It is not very good, however, for medium or long term behavior, since the stationary increment property is not realistic (e.g., a change in price for the stock when it is $50 should be different from the change when the value is $200).

A more appropriate model for the stock price, which is used in practice, is

X(t) = x e^{μt + σB(t)}.  (5.42)

Any process equal in distribution to X is a geometric Brownian motion with drift μ and volatility σ. Since E[e^{αB(t)}] = e^{α²t/2}, the moments of X(t) are given by

E[X(t)^k] = x^k e^{kμt + k²tσ²/2},  k ≥ 1.

For instance,

E[X(t)] = x e^{μt + tσ²/2} = x[1 + (μ + σ²/2)t] + o(t) as t ↓ 0.

The process X is a diffusion that satisfies the differential property

dX(t) = (μ + σ²/2)X(t)dt + σX(t)dB(t).

We will not prove this characterization, but only note that, by the moment formula above, the instantaneous drift and diffusion parameters for X are

μ(x, t) = (μ + σ²/2)x,  σ²(x, t) = σ²x².

Although the geometric Brownian motion X does not have stationary independent increments, it does have a nice property of ratios of increments. In particular, the ratio of values at the end and beginning of any time interval [s, s + t] is

X(s + t)/X(s) = e^{μt + σ(B(s+t) − B(s))} =d e^{μt + σB(t)},

so its distribution is independent of s. Also, these ratios over disjoint, equal-length time intervals are i.i.d. This means that as a model for a stock price, one cannot anticipate any upward or downward movements in the price "ratios". So in this sense, the market is equitable (or not biased).

Does this also mean that the market is fair in the martingale sense that X(t) is a martingale with respect to B? The answer is generally no. However, X is a martingale if and only if μ + σ²/2 = 0 (a very special condition). This follows since e^{−t(μ+σ²/2)}X(t) is a martingale with respect to B with mean x by Example 19 (and E[X(t)] = x when X(t) is such a martingale).
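These moment and martingale facts are easy to confirm by Monte Carlo. A minimal sketch (ours, assuming NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

x0, mu, sigma, t = 1.0, 0.05, 0.3, 2.0
n_rep = 400000
# Geometric Brownian motion X(t) = x0 * exp(mu*t + sigma*B(t)).
X_t = x0 * np.exp(mu * t + sigma * rng.normal(0.0, np.sqrt(t), n_rep))

print(round(X_t.mean(), 4))                            # Monte Carlo mean
print(round(x0 * np.exp((mu + sigma**2 / 2) * t), 4))  # exact: x*exp((mu+s^2/2)t)
# Discounting by exp(-t*(mu + sigma^2/2)) restores the martingale mean x0.
print(round((np.exp(-t * (mu + sigma**2 / 2)) * X_t).mean(), 4))
```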

The geometric Brownian model (5.42) has continuous paths and so does not account for large discrete jumps in stock prices. To incorporate such jumps, another useful model is as follows.

Example 62. Prices with Jumps. Suppose the price of a stock at time t is given by X(t) = e^{Y(t)}, where Y(t) is a real-valued stochastic process with stationary independent increments (e.g., a compound Poisson or Lévy process). These properties of Y also ensure that the price ratios are i.i.d. over disjoint, fixed-length intervals.

Assume as in Exercise 7 that the moment generating function ψ(α) = E[e^{αY(1)}] exists for α in a neighborhood of 0, and E[e^{αY(t)}] is continuous at t = 0 for each α. Then it follows that

E[X(t)^k] = ψ(k)^t,  k ≥ 1.

In particular, if Y(t) is a compound Poisson process with rate λ and its jumps have the moment generating function G(α), then ψ(α) = e^{−λ(1−G(α))}. Consequently,

E[X(t)^k] = e^{−λt(1−G(k))},  k ≥ 1.

Other possibilities are that Y is the sum of a Brownian motion and an independent compound Poisson process, or that X is the sum of a geometric Brownian motion and an independent compound Poisson process.

We will not get into advanced investment models using geometric Brownian motion, such as Black–Scholes option pricing. However, the following illustrates an elementary computation for an option.

Example 63. Stock Option. Suppose that the price of one unit of a stock at time t is given by a geometric Brownian motion X(t) = e^{B(t)}. A customer has the option of buying one unit of the stock at a fixed time T at a price K, but the customer need not make the purchase. The value of the option to the customer is (X(T) − K)⁺, since the customer will not buy the stock if X(T) < K. We will disregard any fee that the customer would pay in order to obtain the option.

The expectation of the option's value is

E[(X(T) − K)⁺] = ∫₀^∞ P{X(T) − K > x} dx = ∫₀^∞ P{B(T) > log(x + K)} dx.


This integral can be evaluated numerically by using an approximation for the normal distribution of B(T). A variation of this option is in Exercise 39.
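For instance, the following sketch (ours; it uses only the standard library, writing P{B(T) > y} via erfc) evaluates the option value by a midpoint rule on a truncated range, which is adequate since the integrand decays rapidly:

```python
import math

def option_value(T=1.0, K=1.0, n=20000, upper=200.0):
    """Midpoint-rule estimate of E[(X(T)-K)^+] = int_0^inf P{B(T) > log(x+K)} dx."""
    def tail(x):  # P{B(T) > log(x + K)} with B(T) ~ N(0, T)
        return 0.5 * math.erfc(math.log(x + K) / math.sqrt(2.0 * T))
    h = upper / n
    return h * sum(tail((i + 0.5) * h) for i in range(n))

print(round(option_value(), 4))   # about 0.89 for T = 1, K = 1
```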

5.14 Multidimensional Brownian Motion

Brownian motions in the plane and in multidimensional spaces are natural models for phenomena driven by several independent (or dependent) one-dimensional Brownian motions. This section gives some insight into these multidimensional processes.

A stochastic process B(t) = (B1(t), ..., Bd(t)), t ≥ 0, in R^d is a multidimensional Brownian motion if B1, ..., Bd are independent Brownian motions on R. Many basic properties of this process follow from results in one dimension. For instance, the multidimensional integral formula

∫_{R^d} P{x + B(t) ∈ A} dx = |A|,

the Lebesgue measure of A, follows from the analogous formula for d = 1 in Exercise 6. The preceding integral is used in Section 5.15 for particle systems.

Applications of Brownian motions in R^d typically involve intricate functions of the one-dimensional components whose distributions determine system parameters (e.g., Exercise 54). Here is another classical application.

Example 64. Bessel Processes. Associated with a Brownian motion B(t) in R^d, consider its radial distance to the origin defined by

R(t) = (B1(t)² + ··· + Bd(t)²)^{1/2},  t ≥ 0.

Any process equal in distribution to R is a Bessel process of order d. When d = 1, we have the familiar reflected Brownian motion process R(t) = |B(t)|. Exercise 19 mentioned that this is a Markov process and specified its distribution (also recall Theorem 11).

The process R(t) is also a Markov process on R₊ for general d. Its transition probability P{R(t) ∈ A | R(0) = x} = ∫_A p_t(x, y) dy has the density

p_t(x, y) = t⁻¹ (xy)^{1−d/2} y^{d−1} e^{−(x²+y²)/2t} I_{d/2−1}(xy/t),

where Iβ is the modified Bessel function of order β > −1 defined by

Iβ(u) = Σ_{k=0}^∞ (u/2)^{2k+β} / (k! Γ(k + β + 1)),  u ∈ R.

This is proved in [61]. We will only derive the distribution of R(t) when R(0) = 0. To this end, consider

R(t)²/t = (B1(t)² + ··· + Bd(t)²)/t =d B1(1)² + ··· + Bd(1)².

The last sum of squares of d independent standard normal random variables is known to have a chi-squared density f with d degrees of freedom. This f is a gamma density with parameters α = d/2 and λ = 1/2 (see the Appendix). Therefore, knowing that R(0) = 0,

P{R(t) ≤ r} = P{R(t)²/t ≤ r²/t} = ∫₀^{r²/t} f(x) dx.  (5.43)

The density of R(t) is given in Exercise 55. Although the hitting times of R(t) are complicated, we can evaluate their means. Consider the time τa = inf{t ≥ 0 : R(t) = a} to hit a value a > 0. This is a stopping time of R(t), and τa ≤ inf{t ≥ 0 : |B1(t)| = a} < ∞ a.s., since the latter stopping time is finite a.s. as noted in Theorem 11. Now, Exercise 56 shows that R(t)² − dt is a martingale with mean 0. Then the optional stopping result in Corollary 24 yields E[R(τa)² − d τa] = 0. Therefore E[τa] = a²/d.
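The identity E[τa] = a²/d can be checked by simulating the driving Brownian motion on a fine grid. A rough sketch (ours, assuming NumPy; the Euler discretization detects the hit only at grid times, biasing the estimate slightly upward):

```python
import numpy as np

rng = np.random.default_rng(6)

def mean_hit_time(a=1.0, d=3, dt=1e-3, n_rep=3000):
    """Monte Carlo estimate of E[tau_a] for R(t) = |B(t)| in R^d, R(0) = 0."""
    total = 0.0
    for _ in range(n_rep):
        pos = np.zeros(d)
        t = 0.0
        while pos @ pos < a * a:
            pos += rng.normal(0.0, np.sqrt(dt), d)   # Brownian increment
            t += dt
        total += t
    return total / n_rep

print(round(mean_hit_time(), 3))   # near a^2/d = 1/3 for a = 1, d = 3
```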

We will now consider a multidimensional process whose components are "dependent" one-dimensional Brownian motions with drift. Let B(t) be a Brownian motion in R^k, and let C = {cij} be a d × d matrix of real numbers that is symmetric (cij = cji) and nonnegative-definite (Σ_i Σ_j ui uj cij ≥ 0, for u ∈ R^d). As in the representation (5.8) of a multivariate normal vector, let A be a k × d matrix, with transpose Aᵗ and k ≤ d, such that AᵗA = C. Consider the process {X(t) : t ≥ 0} in R^d defined by

X(t) = x + μt + B(t)A,

where x and μ are in R^d. Any process equal in distribution to X is a generalized Brownian motion in R^d with initial value x, drift μ and covariance matrix C = AᵗA.

A major use for multidimensional Brownian motions is in approximating multidimensional random walks. The following result is an analogue of Donsker's Brownian motion approximation for one-dimensional random walks in Theorem 40.

Suppose that ξk, k ≥ 1, are i.i.d. random vectors in R^d with mean vector μ = (μ1, ..., μd) and covariances cij = E[(ξk,i − μi)(ξk,j − μj)], 1 ≤ i, j ≤ d. Define the processes {Xn(t) : t ≥ 0} in R^d, for n ≥ 1, by

Xn(t) = n^{−1/2} Σ_{k=1}^{⌊nt⌋} (ξk − μ),  t ≥ 0.

Theorem 65. Under the preceding assumptions, Xn →d X as n → ∞, where X is a generalized Brownian motion on R^d starting at 0, with no drift, and with covariance matrix {cij}.


Sketch of Proof. Consider Xn,i(t) = n^{−1/2} Σ_{k=1}^{⌊nt⌋} (ξk,i − μi), which is the ith component of Xn. By Donsker's theorem, Xn,i →d Xi for each i. Now, the Cramér–Wold theorem states that (Xn,1, ..., Xn,d) →d (X1, ..., Xd) in R^d if and only if Σ_{i=1}^d ai Xn,i →d Σ_{i=1}^d ai Xi in R for any a ∈ R^d. However, the latter holds by another application of Donsker's theorem. Therefore, the finite-dimensional distributions of Xn converge to those of X. To complete the proof that Xn →d X, it suffices to verify a certain tightness condition on the distributions of the processes Xn, which we omit.
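A quick numerical companion to Theorem 65 (ours, assuming NumPy): generate correlated steps ξk = ZA with Z standard normal, so C = AᵗA, and compare the empirical covariance of Xn(1) with C.

```python
import numpy as np

rng = np.random.default_rng(7)

A = np.array([[1.0, 0.5],
              [0.0, 0.8]])
C = A.T @ A                          # target covariance matrix of the steps

n, n_rep = 500, 4000
Z = rng.normal(size=(n_rep, n, 2))   # standard normal innovations
steps = Z @ A                        # mean-0 steps with covariance C
X1 = steps.sum(axis=1) / np.sqrt(n)  # X_n(1) for each replication

print(np.round(np.cov(X1.T), 3))     # empirical covariance of X_n(1)
print(C)                             # should be close
```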

5.15 Brownian/Poisson Particle System

This section describes a system in which particles occasionally enter a Euclidean space, move about independently as Brownian motions, and eventually exit. The system data and dynamics are represented by a marked Poisson process like those in Chapter 2. The focus is on characterizing certain Poisson processes describing particle locations over time and departures as intricate functions of the arrival process and particle trajectories. The Brownian motion structure of the trajectories leads to tractable probabilities.

Consider a system of discrete particles that move about in the space R^d as follows. The locations and entry times of the particles are represented by the space-time Poisson process N = Σ_n δ_{(Xn,Tn)} on R^d × R, where Xn is the location in R^d at which the nth particle enters at time Tn. This Poisson process is homogeneous in that

E[N(A × I)] = α|A| λ|I|,

where |A| is the Lebesgue measure of the Borel set A. Here λ is the arrival rate of particles per unit time in any unit area, and α is the arrival rate per unit area in any unit time period. Note that, for bounded sets A and I, N(A × I) is finite, but N(R^d × I) and N(A × R) are infinite a.s. (because their Poisson means are infinite).

We assume that each particle moves in R^d independently as a d-dimensional Brownian motion B(t), t ≥ 0, for a length of time V with distribution G, and then exits the space.

More precisely, let Vn, n ∈ Z, be independent with Vn =d V, and let Bn, n ∈ Z, be independent with Bn =d B. Assume {Bn} and {Vn} are independent and independent of N. Viewing the Bn and Vn as independent marks of (Xn, Tn), the data for the entire system are defined formally by the marked Poisson process

M = Σ_n δ_{(Xn, Tn, Bn, Vn)},  on S = R^d × R × C(R, R^d) × R₊.

Here C(R, R^d) denotes the set of continuous functions from R to R^d. The mean measure of M is given by

E[M(A × I × F × J)] = α|A| λ|I| P{B ∈ F} P{V ∈ J}.

The interpretation is that the nth particle has a space-time entry at (Xn, Tn) and its location at time t is given by Xn + Bn(t − Tn), where t − Tn ≤ Vn. At the end of its sojourn time Vn, it exits the system at time Tn + Vn from the location Xn + Bn(Vn).

Let us see where the particles are at any time t. It is not feasible to account for all the particles that arrive up to time t, since N(R^d × (−∞, t]) = ∞. So we consider particles that enter in a bounded time interval I prior to t, which is t − I (e.g., t − [a, b] = [t − b, t − a]).

Now, the number of particles that enter R^d in a time interval I prior to t and are in A at time t is

Nt(I × A) = Σ_n δ_{(t−Tn, Xn+Bn(t−Tn))}(I × A) 1(Vn > t − Tn).

The Nt is a point process on R₊ × R^d.

Proposition 66. The family of point processes {Nt : t ∈ R} is stationary in t, and each Nt is a Poisson process on R₊ × R^d with mean measure

E[Nt(I × A)] = αλ|A| ∫_I (1 − G(u)) du.  (5.44)

Proof. By the form of its mean measure, the Poisson process M with its time axis shifted by an amount t satisfies

StM = Σ_n δ_{(Xn, Tn−t, Bn, Vn)} =d M,  t ∈ R.

Therefore, M is stationary in the time axis. To prove that Nt is stationary in t, it suffices by Proposition 104 in Chapter 3 to show that Nt = f(StM) for some function f.

Accordingly, for a locally finite counting measure ν = Σ_n δ_{(xn, tn, bn, vn)} on S, define the counting measure f(ν) on R₊ × R^d by

f(ν) = Σ_n δ_{(−tn, xn + bn(−tn))} 1(vn > −tn).

Then clearly Nt = f(StM), which proves that Nt is stationary.

Next, note that Nt is a deterministic map of the Poisson process M restricted to the subspace {(x, s, b, v) ∈ S : s ≤ t, v > t − s}, in which any point (x, s, b, v) of this subspace is mapped to (t − s, x + b(t − s)). Then by Theorem 32 in Chapter 2, Nt is a Poisson process with mean measure given by

E[Nt(I × A)] = αλ ∫_{t−I} ( ∫_{R^d} P{x + B(t − s) ∈ A} dx ) P{V > t − s} ds.

Because B is a Brownian motion, the integral in parentheses reduces to |A| by Exercise 6. Therefore, using the change of variable u = t − s in the last expression yields (5.44).

Next, let us consider departures from the system. The number of particles that enter R^d during the time set I and depart from A during the time set J is N(I × A × J), where N is a point process of the form

N = Σ_n δ_{(Tn, Xn+Bn(Vn), Tn+Vn)}  on {(s, x, u) ∈ R × R^d × R : s ≤ u}.

Proposition 67. The point process of departures N is a Poisson process with mean measure given by

E[N(I × A × J)] = αλ|A| ∫_{R₊} |I ∩ (J − v)| dG(v).  (5.45)

Proof. By its definition, N is a deterministic map g of the Poisson process M, where g(Xn, Tn, Bn, Vn) = (Tn, Xn + Bn(Vn), Tn + Vn). Then by Theorem 32 in Chapter 2, N is a Poisson process with mean measure

E[N(I × A × J)] = αλ ∫_I ∫_{R₊} 1(s + v ∈ J) ( ∫_{R^d} P{x + B(v) ∈ A} dx ) dG(v) ds.

The integral in parentheses reduces to |A| by Exercise 6. Then an interchange of the order of integration in the last expression yields (5.45).

There are several natural generalizations of the preceding model with more dependencies among the marks, entry times, and entry points; e.g., see Exercise 51. Although the processes Nt and N may still be Poisson, their mean values would be more complicated.

5.16 G/G/1 Queues in Heavy Traffic

Section 4.20 of Chapter 4 showed that the waiting times Wn for successive items in a G/G/1 queueing system are a function of a random walk. This suggests that the asymptotic behavior of these times can be characterized by the Donsker Brownian motion approximation of a random walk, and that is what we shall do now. We first describe the limit of Wn when the traffic intensity ρ = 1, and then present a more general FCLT for the Wn when the system is in heavy traffic: the traffic intensity is approximately 1.

Consider a G/G/1 queueing system, as in Section 4.20 of Chapter 4, in which items arrive at times that form a renewal process with inter-arrival times Un, and the service times are i.i.d. nonnegative random variables Vn that are independent of the arrival times. The service discipline is first-come-first-served with no preemptions. The inter-arrival and service times have finite means and variances, and the traffic intensity of the system is ρ = E[V1]/E[U1]. For simplicity, assume the system is empty at time 0.

Our interest is in the length of time Wn that the nth arrival waits in the queue before being processed. Section 4.20 of Chapter 4 showed that these waiting times satisfy the Lindley recursion

Wn = (Wn−1 + Vn−1 − Un)⁺,  n ≥ 1,

and consequently,

Wn = max_{0≤m≤n} Σ_{k=m+1}^n (Vk−1 − Uk).  (5.46)

Under the assumptions on the Un and Vn, it follows that

Wn =d max_{0≤m≤n} Sm,  (5.47)

where Sn = Σ_{m=1}^n ξm and ξm = Vm − Um.

In case ρ < 1, Theorem 118 of Chapter 4 noted that

Wn →d max_{0≤m<∞} Sm.

In this section, we consider the limiting behavior of the waiting times Wn when ρ equals or approaches 1, meaning that the system is in heavy traffic. We begin with the case ρ = 1 and describe the asymptotic behavior of the waiting times via the process

Wn(t) = W_{⌊nt⌋}/(σ√n),  t ≥ 0.

Theorem 68. Suppose the G/G/1 system defined above has ρ = 1 and σ² = Var(ξ1) > 0. Then

Wn →d M in D(R₊) as n → ∞,

where M(t) = max_{s≤t} B(s) is the maximum process for a standard Brownian motion B. Hence

Wn/(σ√n) →d M(1) =d |B(1)|.

Proof. Note that

Wn(t) =d (1/(σ√n)) max_{m≤⌊nt⌋} Sm = (1/(σ√n)) sup_{s≤t} S_{⌊ns⌋}.

That is, Wn(t) =d f(Xn)(t), t ≥ 0, where

Xn(t) = S_{⌊nt⌋}/(σ√n),  t ≥ 0,

and f : D(R₊) → D(R₊) is the supremum map defined by

f(x)(t) = sup_{0≤s≤t} x(s),  x ∈ D(R₊).

Now the random walk Sn has steps with mean E[ξ1] = 0, since ρ = 1, and with variance σ² = Var(ξ1). Then Xn →d B by Donsker's theorem.

Next, it is clear that if ‖xn − x‖ → 0 in D[0, T], then

‖f(xn) − f(x)‖ ≤ ‖xn − x‖ → 0 in D[0, T].

Then since ‖Xn − B‖ →P 0 in D[0, T] for each T, it follows that

‖f(Xn) − f(B)‖ →P 0 in D(R₊).

This, along with Wn =d f(Xn) and f(B) = M, proves Wn →d M.

In particular, Wn(1) →d M(1) =d |B(1)|, which proves the second assertion; that M(1) =d |B(1)| follows by Theorem 11.
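Theorem 68 can be illustrated directly from the distributional identity (5.47). The sketch below (ours, assuming NumPy) uses exponential inter-arrival and service times with ρ = 1, so Var(ξ1) = 2, and compares the mean of Wn/(σ√n) with E|B(1)| = √(2/π) ≈ 0.798:

```python
import numpy as np

rng = np.random.default_rng(8)

n, n_rep = 2000, 2000
U = rng.exponential(1.0, (n_rep, n))   # inter-arrival times, mean 1
V = rng.exponential(1.0, (n_rep, n))   # service times, mean 1, so rho = 1
S = np.cumsum(V - U, axis=1)           # random walk with steps xi = V - U
W = np.maximum(S.max(axis=1), 0.0)     # W_n via the identity (5.47)
sigma = np.sqrt(2.0)                   # Var(xi) = Var(V) + Var(U) = 2

x = W / (sigma * np.sqrt(n))
print(round(x.mean(), 3), round(np.sqrt(2.0 / np.pi), 3))  # both near 0.798
```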

The preceding result suggests that for any G/G/1 system in which ρ ≈ 1, the approximation Wn ≈d M would be valid. A formal statement to this effect is as follows.

Consider a family of G/G/1 systems indexed by a parameter r, with inter-arrival times U_n^r and service times V_n^r. Denote the other quantities by ρr, W_n^r, S_n^r = Σ_{m=1}^n ξ_m^r, etc., and consider the process

Wr(t) = W^r_{⌊rt⌋}/(σ√r),  t ≥ 0.

Theorem 69. Suppose the family of G/G/1 systems is such that ρr → 1,

sup_r E[(ξ_1^r − E[ξ_1^r])^{2+ε}] < ∞, for some ε > 0,

r^{1/2} E[ξ_1^r] → 0,  Var(ξ_1^r) → σ² > 0,  as r → ∞.

Then Wr →d M in D(R₊) as r → ∞.


Proof. As in the proof of Theorem 68, Wr = f(Xr), where f is the supremum map and Xr(t) = S^r_{⌊rt⌋}/(σ√r), t ≥ 0. Then to prove the assertion, it suffices to show that Xr →d B as r → ∞.

Now, we can write

Xr(t) = Yr(t) + (⌊rt⌋/r) r^{1/2} E[ξ_1^r]/σ,

where

Yr(t) = (1/(σ√r)) Σ_{m=1}^{⌊rt⌋} (ξ_m^r − E[ξ_1^r]).

Under the hypotheses, Yr →d B by a theorem of Prokhorov (1956), and since the drift term vanishes in the limit, Xr →d B.

The preceding results are typical of many heavy-traffic limit theorems that one can obtain for queueing and related processes within the framework presented by Whitt [115]. In particular, when a system parameter, such as the waiting time above, can be expressed as a function of the system data (cumulative input and output processes), and that data under an appropriate normalization converges in distribution, then under further technical conditions the system parameter also converges in distribution to the function of the limits of the data. Here is one of the general models in [115].

Example 70. Generalized G/G/1 System. Consider a generalization of the G/G/1 systems above in which the inter-arrival times U_n^r and service times V_n^r (the system data) are general random variables that may be dependent. Then the waiting times W_n^r can still be expressed as a function of the system data as in (5.46). In other words,

W_n^r = S_n^r − min_{0≤m≤n} S_m^r,

where S_n^r = Σ_{k=1}^n (V_{k−1}^r − U_k^r). As above, consider the processes

Wr(t) = W^r_{⌊rt⌋}/(σ√r),  Xr(t) = S^r_{⌊rt⌋}/(σ√r),  t ≥ 0.

Then we can write Wr = h(Xr), where h : D(R₊) → D(R₊) is the one-sided reflection map defined by

h(x)(t) = x(t) − inf_{0≤s≤t} x(s),  t ≥ 0.

The reflection map h (like the supremum map above) is continuous in the uniform topology on D[0, T] since

‖h(x) − h(y)‖ ≤ 2‖x − y‖,  x, y ∈ D[0, T].


Then the continuous-mapping theorem yields the following result.

Convergence Criterion. If Xr →d X in D(R₊), then Wr →d W in D(R₊) as r → ∞, where

W(t) = X(t) − inf_{0≤s≤t} X(s),  t ≥ 0.

To apply this in a particular situation, one would use properties of the inter-arrival times and service times (as in Theorem 69) to verify Xr →d X. There are a variety of conditions under which the limit X is a Brownian motion, a process with stationary independent increments, or an infinitely divisible process; and other topologies on D(R₊) are often appropriate [115].

5.17 Brownian Motion in a Random Environment

Section 3.14 describes a Poisson process with a random intensity measure, called a Cox process. The random intensity might represent a random environment or field that influences the locations of points. This section describes an analogous randomization for Brownian motions, in which the time scale is determined by a stochastic process.

Let {X(t) : t ∈ R₊} and η = {η(t) : t ∈ R₊} be real-valued stochastic processes defined on the same probability space, such that η is a.s. nondecreasing with η(0) = 0 and η(t) → ∞ a.s. as t → ∞. The process X is a Brownian motion directed by η if the increments of X are conditionally independent given η and, for any s < t, the increment X(t) − X(s) has a conditional normal distribution with mean 0 and variance η(t) − η(s). These conditions, in terms of the moment generating functions of the increments of X, say that, for 0 = t0 < t1 < ··· < tn and u1, ..., un in R₊,

E[ exp{ Σ_{i=1}^n ui[X(ti) − X(ti−1)] } | η ] = exp{ (1/2) Σ_{i=1}^n ui²[η(ti) − η(ti−1)] }  a.s.  (5.48)

A directed Brownian motion is equal in distribution to a standard Brownian motion with a random time parameter, as follows.

Remark 71. A process X is a Brownian motion directed by η if and only if X =d B ∘ η′, where B and η′ are defined on a common probability space such that B is a standard Brownian motion independent of η′, and η′ =d η. This follows by the definition above and consideration of the moment generating functions of the increments of the processes. The process B ∘ η′ is a Brownian motion subordinated to η′ (like a Markov chain subordinated to a Poisson process, which we saw in Chapter 4). In case η is strictly increasing, Exercise 43 shows that X = B ∘ η a.s., where B is defined on the same probability space as X and η.

A Brownian motion X directed by η inherits many properties of standard Brownian motions. The proofs usually follow by conditioning on η and using properties of that process. Here are some examples.

Example 72. E[X(t)] = 0, and Var[X(t)] = E[η(t)].

Example 73. Consider τ_a^X = inf{t : X(t) ≥ a}. Then

P{τ_a^X ≤ t} = ∫_{R₊} P{η(t) ≥ u} P{τa ∈ du},

where τa = inf{t : B(t) = a}.

Example 74. Suppose that X1, ..., Xm are Brownian motions directed by η1, ..., ηm, respectively, and (X1, η1), ..., (Xm, ηm) are independent. Then X(t) = X1(t) + ··· + Xm(t) is a Brownian motion directed by η(t) = η1(t) + ··· + ηm(t).

Example 75. FCLT. For a Brownian motion X directed by η, define

Xr(t) = b_r^{−1/2} X(rt),  t ≥ 0,

where the constants b_r → ∞. By Remark 71 and the scaling property b_r^{−1/2} B(b_r ·) =d B, we can write Xr =d B ∘ Yr, where Yr(t) = η′(rt)/b_r and the Brownian motion B and η′ are independent. Then by the property of the composition mapping in Proposition 48, we obtain the following result, where I is the identity function: if Yr →d I in D(R₊), then Xr →d B in D(R₊).
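Example 72 is easy to verify for a concrete director. The sketch below (ours, assuming NumPy) takes η to be a gamma subordinator with E[η(t)] = t, and samples X(T) = B(η(T)) by first drawing η(T):

```python
import numpy as np

rng = np.random.default_rng(10)

T, n_rep = 10.0, 200000
eta_T = rng.gamma(T, 1.0, n_rep)        # gamma subordinator: E[eta(T)] = T
X_T = rng.normal(0.0, np.sqrt(eta_T))   # given eta, X(T) ~ N(0, eta(T))

print(round(X_T.mean(), 3), round(X_T.var(), 3))  # near 0 and T (Example 72)
```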

5.18 Exercises

For the following exercises, B will denote a standard Brownian motion.

Exercise 1. Show that each of the following processes is also a Brownian motion:
(a) B(t + s) − B(s), t ≥ 0 (translation by a fixed time s);
(b) −B(t) (reflection);
(c) c^{−1/2}B(ct) (scaling, for c > 0);
(d) B(T) − B(T − t), t ∈ [0, T] (time reversal on [0, T] for a fixed T).


Exercise 2. The time inversion of a Brownian motion B is the process X(0) = 0,

X(t) = tB(1/t),  t > 0.

Prove that X is a Brownian motion. First show that X(t) → 0 a.s. as t ↓ 0.

Exercise 3. Suppose that h : R₊ → R₊ is a continuous, strictly increasing function with h(0) = 0 and h(t) ↑ ∞. Find the mean and covariance functions of the process B(h(t)). Show that B(h(t)) =d X(t) for each t, where X(t) = (h(t)/t)^{1/2} B(t). Are the processes B(h(·)) and X equal in distribution?

Exercise 4. For 0 < s < t, show that the conditional density of B(s) given B(t) = b is normal with conditional mean and variance

E[B(s)|B(t) = b] = bs/t,  Var[B(s)|B(t) = b] = s(t − s)/t.

For t1 < s < t2, show that

P{B(s) ≤ x | B(t1) = a, B(t2) = b} = P{B(s − t1) ≤ x − a | B(t2 − t1) = b − a}.

Using these properties, prove that the conditional density of B(s) given B(t1) = a, B(t2) = b is normal with conditional mean and variance

a + (b − a)(s − t1)/(t2 − t1)  and  (s − t1)(t2 − s)/(t2 − t1).

Exercise 5. Consider the process X(t) = ae^{−αt} + σ²B(t), t ≥ 0, where a, α ∈ R and σ > 0. Find the mean and covariance functions of this process. Show that X is a Gaussian process by applying Theorem 5. Does X have independent increments, and are these increments stationary? Is X a martingale, submartingale or supermartingale with respect to B?

Exercise 6. For a density f on R that is symmetric (f(x) = f(−x), x ∈ R), show that

∫_R ( ∫_{A−x} f(y) dy ) dx = |A|  (the Lebesgue measure of A).

Use this to verify that, for a Brownian motion B(t),

∫_R P{x + μt + σB(t) ∈ A} dx = |A|,

independent of t, μ and σ.

Exercise 7. Let Y(t) be a real-valued process with stationary independent increments. Assume that the moment generating function ψ(α) = E[e^{αY(1)}] exists for α in a neighborhood of 0, and that g(t) = E[e^{αY(t)}] is continuous at t = 0 for each α. Show that g(t) is continuous in t for each α, and that g(t) = ψ(α)^t. Use the fact that g(t + u) = g(t)g(u), t, u ≥ 0, and that the only continuous solution to this equation has the form g(t) = e^{tc}, for some c that depends on α (because g(t) depends on α). Show that

E[Y(t)] = tE[Y(1)],  Var[Y(t)] = tVar[Y(1)].

Exercise 8. Let X(t) = μt + σB(t) be a Brownian motion with drift, where μ > 0. Suppose that a signal is triggered whenever M(t) = sup_{s≤t} X(s) reaches the levels 0, 1, 2, .... So the nth signal is triggered at time τn = inf{t : M(t) = n}, n ≥ 0. Show that these times of the signals form a renewal process, and find the mean, variance and Laplace transform of the times between signals.

In addition, obtain this information under the variation in which a signal is triggered whenever M(t) reaches the levels 0 = L0 < L1 < L2 < ..., where the Ln − Ln−1 are independent exponential random variables with rate λ.

Exercise 9. When is a Gaussian Process Stationary? Recall that a stochastic process is stationary if its finite-dimensional distributions are invariant under shifts in time; this is sometimes called strong stationarity. A related notion is that a real-valued process {X(t) : t ≥ 0} is weakly stationary if its mean function E[X(t)] is a constant and its covariance function Cov(X(s), X(t)) depends only on |t − s|. Weak stationarity does not imply strong stationarity. However, if a real-valued process is strongly stationary and its mean and covariance functions are finite, then the process is weakly stationary. Show that a Gaussian process is strongly stationary if and only if it is weakly stationary.

Exercise 10. Suppose that Yn, n ∈ Z, are independent normally distributed random variables with mean μ and variance σ². Consider the moving-average process

Xn = a0Yn + a1Yn−1 + ··· + amYn−m,  n ∈ Z,

where a0, ..., am are real numbers. Show that {Xn : n ∈ Z} is a Gaussian process that is stationary, and specify its mean and covariance functions. Justify that this process is not Markovian and does not have stationary independent increments.

Exercise 11. Derive the mean and variance formulas in (5.14) for M(t).

Exercise 12. Equal Cruise-Control Settings. Two autos side by side on a highway moving at 65 mph attempt to move together at the same speed by setting their cruise-control devices at 65 mph. As in many instances, nature does not always correspond to one's wishes, and the actual cruise-control settings are independent normally distributed random variables V1 and V2 with mean μ = 65 and standard deviation σ = 0.4. Find the probability that the autos move at the same speed. Find P{|V1 − V2| < 0.3}.


Exercise 13. Letting τa = inf{t : B(t) = a}, find the probability that B hits 0 in the time interval (τa, τb), where 0 < a < b.

Exercise 14. Arc sine Distribution. Let U = sin²θ, where θ has a uniform distribution on [0, 2π]. Verify that P{U ≤ u} = (2/π) arcsin √u, u ∈ [0, 1], which is the arc sine distribution.

Let X1, X2 be independent normally distributed random variables with mean 0 and variance 1. Show that X1²/(X1² + X2²) =d U. Hint: In the integral representation for P{X1²/(X1² + X2²) ≤ u}, use polar coordinates, where (x1, x2) is mapped to r = (x1² + x2²)^{1/2} and θ = arctan(x2/x1).

Is it true that U =d 1 − U?

Exercise 15. Prove Theorem 15 for u = 1. Using this result, find the distribution of η = sup{t ∈ [0, u] : B(t) = 0} for u = 1.

Exercise 16. Suppose that B and B̃ are independent Brownian motions. Find the moment generating function of B̃(τa), where τa = inf{t : B(t) = a} is the time at which B hits a. Show that {B̃(τa) : a ∈ R₊}, considered as a stochastic process, has stationary independent increments.

Exercise 17. For the exit time τ = inf{t > 0 : B(t) ∉ (−a, a)}, where a > 0, prove that its Laplace transform is

E[e^{−λτ}] = 1/cosh(a√(2λ)).

Mimic the proof of Theorem 32, using the facts that B(τ) is independent of τ and P{B(τ) = −a} = P{B(τ) = a} = 1/2.

Exercise 18. Continuation. In the context of the preceding exercise, verify that E[τ] = a² and E[τ²] = 5a⁴/3.

Exercise 19. Let M(t) = sup_{s≤t} B(s), and consider the process X(t) = M(t) − B(t), t ≥ 0. Show that

X(t) =d M(t) =d |B(t)|,  t ≥ 0.

(The processes X and |B(·)| are Markov processes on R₊ with the same transition probabilities, and hence they are equal in distribution. However, they are not equal in distribution to M, since the latter is nondecreasing.)

Show that

P{X(t) ≤ z | X(s) = x} = ∫₀^z [ϕ(y − x; t − s) + ϕ(y + x; t − s)] dy,

where ϕ(x; t) = e^{−x²/2t}/√(2πt). In addition, verify that

P{M(t) > a | X(t) = 0} = e^{−a²/2t}.


Exercise 20. Reflection Principle for Processes. Suppose that τ is an a.s. finite stopping time for a Brownian motion B, and define

X(t) = B(t ∧ τ) − (B(t) − B(t ∧ τ)),  t ≥ 0.

Prove that X is a Brownian motion. Hint: Show that X =d B by using the strong Markov property along with the process B′(t) = B(τ + t) − B(τ), t ≥ 0, and the representations

B(t) − B(t ∧ τ) = B′((t − τ)⁺),  B(t) = B(t ∧ τ) + B′((t − τ)⁺).

Exercise 21. Continuation. For the hitting time τa = inf{t > 0 : B(t) = a}, show that the reflected process

X(t) = B(t)1(t ≤ τa) + (2a − B(t))1(t > τa)

is a Brownian motion. Use the result in the preceding exercise.

Exercise 22. Use the reflection principle to find an expression for

P{B(t) > y, min_{s≤t} B(s) > 0}.

Exercise 23. The value of an investment is modeled as a Brownian motion with drift X(t) = x + μt + σB(t), with an upward drift μ > 0. Find the distribution of m(t) = min_{s≤t} X(s). Use this to find the distribution of the lowest value m(∞) = inf_{t∈R₊} X(t) when x = 0. In addition, find

P{X(t) − m(t) > a},  a > 0.

Exercise 24. The values of two stocks evolve as independent Brownian motions X1 and X2 with drifts, where Xi(t) = xi + μit + σiBi(t), and x1 < x2. Find the probability that X2 will stay above X1 for at least s time units. Let τ denote the first time that the two values are equal. Find E[τ] when μ1 < μ2 and when μ1 > μ2.

Exercise 25. Show that

P{B(1) ≤ x | B(s) ≥ 0, s ∈ [0, 1]} = 1 − e^{−x²/2}.

Hint: Consider B̃(t) = B(1) − B(1 − t), and show that the conditional probability is equal to P{B̃(1) ≤ x | M̃(1) = B̃(1)}, where M̃(t) = sup_{s≤t} B̃(s).

Exercise 26. Consider a compound Poisson process Y(t) = Σ_{n=1}^{N(t)} ξn, where N(t) is a Poisson process with rate λ and the ξn are i.i.d. and independent of N. Suppose ξ1 has mean μ, variance σ² and moment generating function ψ(α) = E[e^{αξ1}]. Show that the following are martingales with respect to Y:

X1(t) = Y(t) − λμt,  X2(t) = (Y(t) − λμt)² − tλ(μ² + σ²),
X3(t) = e^{αY(t) − λt(ψ(α)−1)},  t ≥ 0.

Find the mean E[Xi(t)] for each i.

Exercise 27. Suppose X(t) denotes the stock level of a certain product at time t and the holding cost up to time t is Y(t) = h ∫₀ᵗ X(s) ds, where h is the cost per unit time of holding one unit in inventory. Show that if X is a Brownian motion B, then the mean and covariance functions of Y are

E[Y(t)] = 0,  Cov(Y(s), Y(t)) = h²s²(t/2 − s/6),  s ≤ t.

Find the mean and covariance functions of Y if X(t) = x + μt + σB(t), a Brownian motion with drift, or if X is a compound Poisson process as in the preceding problem.

Exercise 28. Prove that Y(t) = ∫₀ᵗ B(s) ds, t ≥ 0, is a Gaussian process with mean 0 and E[Y(t)²] = t³/3.

Hint: Show that Z = Σ_{i=1}^n ui Y(ti) has a normal distribution for any t1, ..., tn in R₊ and u1, ..., un in R. Since a Riemann integral is the limit of sums of rectangles, we know that Z = lim_{n→∞} Zn, where

Zn = Σ_{i=1}^n ui Σ_{k=1}^n (ti/n) B(kti/n).

Justify that each Zn is normally distributed, and that its limit (using moment generating functions) must also be normally distributed.

Exercise 29. Continuation. Suppose X(t) = exp{∫₀ᵗ B(s) ds}, t ≥ 0. Verify that E[X(t)] = e^{t³/6}.

Exercise 30. Let Y = {Yn : n ≥ 0} be independent random variables (that need not be identically distributed) with finite means. Suppose X0 is a deterministic function of Y0 with finite mean. Define

Xn = X0 + Σ_{i=1}^n Yi,  X′n = X0 Π_{i=1}^n Yi,  n ≥ 1.

Show that Xn is a discrete-time martingale with respect to Y if E[Yi] = 0. How about if E[Yi] ≥ 0? Is Xn a martingale with respect to itself? What can you say about X′n if the Yi are positive with E[Yi] = 1? Or ≥ 1?

Exercise 31. Wald Equation for Discounted Sums. Suppose that ξ0, ξ1, ... are costs incurred at discrete times and they are i.i.d. with mean μ. Consider the discounted cost process Zn = Σ_{m=0}^n α^m ξm, where α ∈ (0, 1) is a discount factor. Suppose that τ is a stopping time of the process {ξn} such that E[τ] < ∞ and E[α^τ] exists for some 0 < α < 1. Prove that

E[Zτ] = μ(1 − αE[α^τ])/(1 − α).

Do this by finding a convenient martingale and applying the optional stopping theorem; there is also a direct proof without the use of martingales.

Next, consider the process Sn = Σ_{m=0}^n ξm, n ≥ 0, and show that

E[ Σ_{m=0}^τ α^m Sm ] = μE[τ]/(1 − α) − αμ(1 − αE[α^τ])/(1 − α)².

Exercise 32. Continuation. In the preceding problem, are the results true under the weaker assumption that ξ0, ξ1, ... are such that E[ξ0] = μ and E[ξn | ξ0, ..., ξn−1] = μ, n ≥ 1?

Exercise 33. Quadratic Variation. Consider the quadratic increment sum

V(t) = Σ_i (B(ti) − B(ti−1))²

over a partition 0 = t0 < t1 < ··· < tk = t of [0, t]. Verify that E[V(t)] = t and

Var[V(t)] = Σ_i (ti − ti−1)² Var[B(1)²].

Next, for each n ≥ 1, let Vn(t) denote a similar quadratic increment sum for a partition 0 = t_{n0} < t_{n1} < ··· < t_{nk_n} = t, where max_k (t_{nk} − t_{n,k−1}) → 0. Show that E[(Vn(t) − t)²] → 0 (which says Vn(t) converges in mean square to t). The function t is the quadratic variation of B, in that it is the unique nondecreasing function (called a compensator) such that B(t)² − t is a martingale.

One can also show that Vn(t) → t a.s. when the partitions are nested.

Exercise 34. Random Time Change of a Martingale. Suppose that X is a martingale with respect to Ft, and that {τt : t ≥ 0} is a nondecreasing process of stopping times of Ft that are bounded a.s. Verify that X(τt) is a martingale with respect to Ft, and that its mean is E[X(τt)] = E[X(0)].

Exercise 35. Optional Switching. Suppose that Xn and Yn are two martingales with respect to Fn that represent values of an investment in a fair market that evolve under two different investment strategies. Suppose an investor begins with the X-strategy and then switches to the Y-strategy at an a.s. finite stopping time τ of Fn, and that Xτ = Yτ. Then the investment value is

Zn = Xn1(n < τ) + Yn1(n ≥ τ),

where Xτ is the value carried forward at time τ. Show that there is no benefit for the investor to switch at τ by showing that Zn is a martingale. Use the representation

Zn+1 = Xn+11(n < τ) + Yn+11(n ≥ τ) − (Xτ − Yτ)1(τ = n + 1).

Exercise 36. Prove that if σ and τ are stopping times of Ft, then so are σ ∧ τ and σ + τ.

Exercise 37. Prove that if X(t) and Y(t) are submartingales with respect to Ft, then so is X(t) ∨ Y(t).

Exercise 38. Consider a geometric Brownian motion X(t) = xe^{B(t)}. Find the mean and distribution of τa = inf{t : X(t) = a}.

Exercise 39. Recall the investment option in Example 63, in which a customer may purchase a unit of a stock at a price K at time T. Consider this option with the additional stipulation that the customer "must" purchase a unit of the stock before time T if its price reaches a prescribed level a, and consequently the other purchase at time T is not allowed. In this setting, the customer must purchase the stock at the price a prior to time T if max_{s≤T} X(s) = e^{M(T)} > a, where M(t) = max_{s≤t} B(s). Otherwise, the option of a purchase at the price K is still available at time T. In this case, the value of the option is

Z = (1 − a)1(M(T) > log a) + (X(T) − K)⁺ 1(M(T) ≤ log a).

Prove that

E[Z] = 2(1 − a)[1 − Φ(log a/√T)] + ∫₀^{log a} ∫₀^y (e^y − K) f_T(x, y) dx dy,

where f_t(x, y) is the joint density of (B(t), M(t)) and Φ is the standard normal distribution. Verify that

f_t(x, y) = (2(2y − x)/√(2πt³)) e^{−(2y−x)²/2t},  x ≤ y, y ≥ 0.

Verify that E[Z] is minimized at the value a at which the integral term equals the preceding term. This would be the worst scenario for the customer.

Exercise 40. Prove Proposition 59 when F is not continuous. Use the fact from Exercise 11 in Chapter 1 that ξn =d F⁻¹(Un), where the Un are i.i.d. with the uniform distribution on [0, 1]. By Theorem 16 in the Appendix, you can assume the ξn and Un are on the same probability space. Then the empirical distribution Gn(t) = n⁻¹ Σ_{k=1}^n 1(Uk ≤ t) of the Un satisfies Fn = Gn(F(·)). Conclude by verifying that

sup_x |Fn(x) − F(x)| = sup_{t≤1} |Gn(t) − t| → 0 a.s. as n → ∞,

where the limit is due to Proposition 59 for a continuous distribution.

Exercise 41. Let X be a Brownian motion directed by η. Suppose the process η has stationary independent nonnegative increments and E[e^{−αη(t)}] = ψ(α)^t, where ψ(α) = E[e^{−αη(1)}]. Determine the moment generating function of X(1) (as a function of ψ) and show that X has stationary independent increments.

Exercise 42. Show that if Xn is a nonnegative supermartingale, then the limit X = lim_{n→∞} Xn exists a.s. and E[X] ≤ E[X0]. Use the submartingale convergence theorem and Fatou's lemma.

Exercise 43. Let X be a Brownian motion directed by η, where the paths of η are strictly increasing a.s. Show that X(t) = B(η(t)), t ∈ R₊, where B is a Brownian motion (on the same probability space as X and η) that is independent of η.

Hint: Define B(t) = X(η̂(t)), where η̂(t) = inf{s ≥ 0 : η(s) = t}. Argue that η(η̂(t)) = t and X(t) = B(η(t)) for each t, and that

E[ exp{ Σ_{i=1}^n ui[B(ti) − B(ti−1)] } | η ] = exp{ (1/2) Σ_{i=1}^n ui²(ti − ti−1) }  a.s.

Thus B is a Brownian motion, and it is independent of η since the last expression is not random.

Exercise 44. As a variation of the model in Section 5.17, a real-valued process X is a Brownian motion with drift μ and variability σ directed by η if, for 0 = t0 < t1 < ··· < tn and u1, ..., un in R₊,

E[ exp{ Σ_{i=1}^n ui[X(ti) − X(ti−1)] } | η ]
= exp{ Σ_{i=1}^n ui μ[η(ti) − η(ti−1)] + (1/2) Σ_{i=1}^n ui² σ²[η(ti) − η(ti−1)] }  a.s.

Show that if t⁻¹η(t) → c a.s. for some c > 0, then

t⁻¹X(t) → cμ a.s.,  t⁻¹ max_{s≤t} X(s) → cμ a.s.

Exercise 45. Prove Theorem 39 on continuous mappings for separable metric spaces by applying the coupling result for the a.s. representation of convergence in distribution (Theorem 16 in the Appendix).


Exercise 46. Use Donsker's theorem to prove that

P{n^{−1/2}(Sn − min_{k≤n} Sk) > x} → P{|B(1)| > x} = 2(1 − Φ(x)).

Exercise 47. In the context of Donsker's theorem, consider the range

Yn = max_{k≤n} Sk − min_{k≤n} Sk

of the random walk. Show that n^{−1/2} Yn →d Y, where E[Y] = 2√(2/π). Express Y as a functional of a Brownian motion.

Exercise 48. FCLT for Markov Chains. Let Yn be an ergodic Markov chain on a countable state space S with stationary distribution π. For a function f : S → R, consider the process

Xn(t) = (1/(σ√n)) Σ_{k=1}^{⌊nt⌋} [f(Yk) − a],  t ∈ [0, 1].

Specify assumptions (and a, σ) under which Xn →d B as n → ∞, and prove it.

Exercise 49. Show that if B is a Brownian motion, then (1 − t)B(t/(1 − t)) and tB((1 − t)/t) are Brownian bridges. In addition, show that if X is a Brownian bridge, then (1 + t)X(t/(1 + t)) and (1 + t)X(1/(1 + t)) are Brownian motions. Hint: Take advantage of the Gaussian property.

Exercise 50. For a Brownian bridge X, find expressions for the distributions of m(1) = min_{t≤1} X(t) and M(1) = max_{t≤1} X(t).

Exercise 51. Consider the Brownian/Poisson model in Section 5.15, with the difference that the Poisson input process N is no longer time-homogeneous and its mean measure is

E[N(A × I)] = α|A| Λ(I),

where Λ is a measure on the time axis R. As in Section 5.15, let Nt(I × A) denote the number of particles that enter in the time interval t − I and are in A at time t. Verify that each Nt is a Poisson process with

E[Nt(I × A)] = α|A| ∫_{t−I} P{V > t − s} Λ(ds).

Is the family {Nt : t ∈ R} stationary as it is in Section 5.15?


Exercise 52. Continuity of Addition in D × D. Assume that (Xn, Yn) →d (X, Y) in D × D and Disc(X) ∩ Disc(Y) is empty a.s., where Disc(x) denotes the discontinuity set of x. Prove that Xn + Yn →d X + Y.

Exercise 53. Show that X′n →d X in D if Xn →d X in D and X′n − Xn →d 0. Do this by proving and applying the property that if Xn →d X in D and Yn →d y in D for non-random y, then (Xn, Yn) →d (X, y) in D² and Xn + Yn →d X + y in D, when X has continuous paths a.s.

Exercise 54. Suppose that (B1(t), B2(t)) is a Brownian motion in R², and define τa = inf{t : B1(t) = a}. Then X(a) = B2(τa) is the value of B2 when B1 hits a. The process {X(a) : a ≥ 0} is, of course, a Brownian motion directed by {τa : a ≥ 0}. Show that X has stationary independent increments and that X(a) has a Cauchy distribution with density

f(x) = 1/[aπ(1 + (x/a)²)],  x ∈ R.

Hint: Find the characteristic function of X(a).

Exercise 55. Consider the Bessel process R(t) = (B1(t)² + ··· + Bd(t)²)^{1/2} as in (5.43). Show that its density is

f_{R(t)}(r) = (2/((2t)^{d/2} Γ(d/2))) r^{d−1} e^{−r²/2t},  r ≥ 0.

Evaluate this for d = 3 by using the fact that Γ(α) = (α − 1)Γ(α − 1). Show that Γ(1/2) = √π from its definition Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx and the property of the normal distribution that √2 ∫₀^∞ e^{−t²/2} dt = √π.

Exercise 56. Continuation. For the Bessel process R(t) in the preceding exercise, show that R(t)² − dt is a martingale with mean 0.

Exercise 57. Suppose that X(t) is a Brownian bridge. Find an expression in terms of normal distributions for pt = P{|X(1/2) − X(t)| > 0}. Is pt strictly increasing to 1 on [1/2, 1]?

Exercise 58. Let X(t) denote a standard Brownian motion in R³ and let A denote the unit ball. Find the distribution of the hitting time τ = inf{t : X(t) ∈ Aᶜ}. Is τ =d inf{t : |B(t)| > 1}?

Exercise 59. For the G/G/1 system described in Section 5.16, consider the waiting times

Wn = max_{0≤m≤n} Σ_{ℓ=m+1}^n (Vℓ−1 − Uℓ).

Show that if ρ > 1, then n⁻¹Wn → E[V1 − U1] a.s. as n → ∞.


Chapter 12 Brownian Motion and Gaussian Processes

We started this text with discussions of a single random variable. We then proceededto two and more generally, a finite number of random variables. In the last chapter,we treated the random walk, which involved a countably infinite number of randomvariables, namely the positions of the random walk Sn at times n D 0; 1; 2; 3; : : :.The time parameter n for the random walks we discussed in the last chapter belongsto the set of nonnegative integers, which is a countable set. We now look at a specialcontinuous time stochastic process, which corresponds to an uncountable family ofrandom variables, indexed by a time parameter t belonging to a suitable uncountabletime set T . The process we mainly treat in this chapter is Brownian motion, althoughsome other Gaussian processes are also treated briefly.

Brownian motion is one of the most important continuous-time stochastic processes, and it has earned its special status because of its elegant theoretical properties, its numerous important connections to other continuous-time stochastic processes, and its real applications and physical origin. If we look at the path of a random walk when we run the clock much faster, and the steps of the walk are also suitably smaller, then the random walk converges to Brownian motion. This is an extremely important connection, and it is made precise later in this chapter. Brownian motion arises naturally in some form or other in numerous statistical inference problems. It is also used as a realistic model for stock market behavior.

The process owes its name to the Scottish botanist Robert Brown, who noticed under a microscope that pollen particles suspended in fluid engaged in a zigzag and eccentric motion. It was, however, Albert Einstein who in 1905 gave Brownian motion a formal physical formulation. Einstein showed that Brownian motion of a large particle visible under a microscope could be explained by assuming that the particle gets ceaselessly bombarded by invisible molecules of its surrounding medium. The theoretical predictions made by Einstein were later experimentally verified by various physicists, including Jean Baptiste Perrin, who was awarded the Nobel prize in physics for this work. In particular, Einstein's work led to the determination of Avogadro's constant, perhaps the first major use of what statisticians call a moment estimate. The existence and construction of Brownian motion was first explicitly established by Norbert Wiener in 1923, which accounts for the other name, Wiener process, for a Brownian motion.


There are numerous excellent references at various technical levels on the topics of this chapter. Comprehensive and lucid mathematical treatments are available in Freedman (1983), Karlin and Taylor (1975), Breiman (1992), Resnick (1992), Revuz and Yor (1994), Durrett (2001), Lawler (2006), and Bhattacharya and Waymire (2009). An elegant and unorthodox treatment of Brownian motion is given in Morters and Peres (2010). Additional specific references are given in the sections.

12.1 Preview of Connections to the Random Walk

We remarked in the introduction that random walks and Brownian motion are interconnected in a suitable asymptotic paradigm. It would be helpful to understand this connection in a conceptual manner before going into technical treatments of Brownian motion.

Consider then the usual simple symmetric random walk defined by S_0 = 0, S_n = X_1 + X_2 + ··· + X_n, n ≥ 1, where the X_i are iid with common distribution P(X_i = ±1) = 1/2. Consider now a random walk that makes its steps at much smaller time intervals, but the jump sizes are also smaller. Precisely, with the X_i, i ≥ 1, still as above, define

S_n(t) = S_⌊nt⌋ / √n, 0 ≤ t ≤ 1,

where ⌊x⌋ denotes the integer part of a nonnegative real number x. This amounts to joining the points

(0, 0), (1/n, X_1/√n), (2/n, (X_1 + X_2)/√n), ...

by linear interpolation, thereby obtaining a curve. The simulated plot of S_n(t) for n = 1000 in Fig. 12.1 shows the zigzag path of the scaled random walk. We can see that the plot is rather rough, and the function takes the value zero at t = 0; that is, S_n(0) = 0, and S_n(1) ≠ 0.
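The scaled walk is easy to simulate. The following minimal sketch (NumPy assumed; the seed and the choice n = 1000 are arbitrary) generates the points (k/n, S_k/√n), whose linear interpolation gives a path like the one in Fig. 12.1.

    import numpy as np

    rng = np.random.default_rng(7)                  # arbitrary seed
    n = 1000
    steps = rng.choice([-1.0, 1.0], size=n)         # P(X_i = +1) = P(X_i = -1) = 1/2
    S = np.concatenate(([0.0], np.cumsum(steps)))   # S_0, S_1, ..., S_n
    t = np.arange(n + 1) / n                        # time points k/n
    Sn_t = S / np.sqrt(n)                           # S_n(k/n) = S_k / sqrt(n)
    print(Sn_t[0], Sn_t[-1])                        # S_n(0) = 0; S_n(1) is typically nonzero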

It turns out that in a suitable precise sense, the graph of S_n(t) on [0, 1] for large n should mimic the graph of a random function called Brownian motion on [0, 1]. Brownian motion is a special stochastic process, which is a collection of infinitely many random variables, say W(t), 0 ≤ t ≤ 1, each W(t) for a fixed t being a normally distributed random variable, with other additional properties for their joint distributions. They are introduced formally and analyzed in greater detail in the next sections.

The question arises of why the connection between a random walk and Brownian motion is of any use or interest to us. A short nontechnical answer is that because S_n(t) acts like a realization of a Brownian motion, by using known properties of Brownian motion we can approximately describe properties of S_n(t) for large n. This is useful, because the stochastic process S_n(t) arises in numerous problems of interest to statisticians and probabilists. By simultaneously using the connection between S_n(t) and Brownian motion, and known properties of Brownian motion, we can assert useful things concerning many problems in statistics and probability that would be nearly impossible to assert in a simple direct manner. That is why the connections are not just mathematically interesting, but also tremendously useful.

Fig. 12.1 Simulated plot of a scaled random walk

12.2 Basic Definitions

Our principal goal in the subsequent sections is to study Brownian motion and the Brownian bridge, due to their special importance among Gaussian processes.

The Brownian bridge is closely related to Brownian motion, and shares many of the same properties. They both arise in many statistical applications. It should also be understood that the Brownian motion and bridge are of enormous independent interest in the study of probability theory, regardless of their connections to problems in statistics.

We caution the reader that it is not possible to make all the statements in this chapter mathematically rigorous without using measure theory. This is because we are now dealing with uncountable collections of random variables, and problems of measure-zero sets can easily arise. However, the results are accurate, and they can be practically used without knowing exactly how to fix the measure-theoretic issues.

We first give some general definitions for future use.

Definition 12.1. A stochastic process is a collection of random variables {X(t), t ∈ T} taking values in some finite-dimensional Euclidean space R^d, 1 ≤ d < ∞, where the indexing set T is a general set.


Definition 12.2. A real-valued stochastic process {X(t), −∞ < t < ∞} is called weakly stationary if

(a) E(X(t)) = μ is independent of t;
(b) E[X(t)²] < ∞ for all t, and Cov(X(t), X(s)) = Cov(X(t + h), X(s + h)) for all s, t, h.

Definition 12.3. A real-valued stochastic process {X(t), −∞ < t < ∞} is called strictly stationary if for every n ≥ 1, t_1, t_2, ..., t_n, and every h, the joint distribution of (X_{t_1}, X_{t_2}, ..., X_{t_n}) is the same as the joint distribution of (X_{t_1+h}, X_{t_2+h}, ..., X_{t_n+h}).

Definition 12.4. A real-valued stochastic process {X(t), −∞ < t < ∞} is called a Markov process if for every n ≥ 1 and t_1 < t_2 < ··· < t_n,

P(X_{t_n} ≤ x_{t_n} | X_{t_1} = x_{t_1}, ..., X_{t_{n−1}} = x_{t_{n−1}}) = P(X_{t_n} ≤ x_{t_n} | X_{t_{n−1}} = x_{t_{n−1}});

that is, the distribution of the future values of the process given the entire past depends only on the most recent past.

Definition 12.5. A stochastic process {X(t), t ∈ T} is called a Gaussian process if for every n ≥ 1 and t_1, t_2, ..., t_n, the joint distribution of (X_{t_1}, X_{t_2}, ..., X_{t_n}) is multivariate normal.

This is often stated as: a process is a Gaussian process if all its finite-dimensional distributions are Gaussian.

With these general definitions at hand, we now define Brownian motion and the Brownian bridge. Brownian motion is intimately linked to the simple symmetric random walk and to partial sums of iid random variables. The Brownian bridge is intimately connected to the empirical process of iid random variables. We focus on the properties of Brownian motion in this chapter, and postpone discussion of the empirical process and the Brownian bridge to a later chapter. However, we define both Brownian processes right now.

Definition 12.6. A stochastic process W(t), t ∈ [0, ∞), defined on a probability space (Ω, A, P), is called a standard Wiener process or the standard Brownian motion starting at zero if:

(i) W(0) = 0 with probability one.
(ii) For 0 ≤ s < t < ∞, W(t) − W(s) ~ N(0, t − s).
(iii) Given 0 ≤ t_0 < t_1 < ... < t_k < ∞, the random variables W(t_{j+1}) − W(t_j), 0 ≤ j ≤ k − 1, are mutually independent.
(iv) The sample paths of W(·) are almost all continuous; that is, except for a set of sample points of probability zero, W(t, ω) is a continuous function of t.

Remark. Property (iv) actually can be proved to follow from the other three properties. But it is helpful to include it in the definition to emphasize the importance of the continuity of Brownian paths. Property (iii) is the celebrated independent increments property and lies at the heart of numerous further properties of Brownian motion. We often just omit the word standard when referring to standard Brownian motion.

Definition 12.7. If W(t) is a standard Brownian motion, then X(t) = x + W(t), x ∈ R, is called Brownian motion starting at x, and Y(t) = σW(t), σ > 0, is called Brownian motion with scale coefficient or diffusion coefficient σ.

Definition 12.8. Let W(t) be a standard Wiener process on [0, 1]. The process B(t) = W(t) − tW(1) is called a standard Brownian bridge on [0, 1].

Remark. Note that the definition implies that B(0) = B(1) = 0 with probability one. Thus, the Brownian bridge on [0, 1] starts and ends at zero; hence the name tied-down Wiener process. The Brownian bridge on [0, 1] can be defined in various other equivalent ways. The definition we adopt here is convenient for many calculations.

Definition 12.9. Let 1 ≤ d < ∞, and let W_i(t), 1 ≤ i ≤ d, be independent Brownian motions on [0, ∞). Then W_d(t) = (W_1(t), ..., W_d(t)) is called d-dimensional Brownian motion.

Remark. In other words, if a particle performs independent Brownian movements along d different coordinates, then we say that the particle is engaged in d-dimensional Brownian motion. Figure 12.2 demonstrates the case d = 2. When the dimension is not explicitly mentioned, it is understood that d = 1.

Example 12.1 (Some Illustrative Processes). We take a few stochastic processes, and try to understand some of their basic properties. The processes we consider are the following.

(a) X_1(t) ≡ X, where X ~ N(0, 1).
(b) X_2(t) = tX, where X ~ N(0, 1).
(c) X_3(t) = A cos λt + B sin λt, where λ is a fixed positive number, t ≥ 0, and A, B are iid N(0, 1).
(d) X_4(t) = ∫_0^t W(u) du, t ≥ 0, where W(u) is standard Brownian motion on [0, ∞), starting at zero.
(e) X_5(t) = W(t + 1) − W(t), t ≥ 0, where W(t) is standard Brownian motion on [0, ∞), starting at zero.

Fig. 12.2 State visited by a planar Brownian motion

Each of these processes is a Gaussian process on the time domain on which it is defined. The mean function of each process is the zero function.

Coming to the covariance functions, for s ≤ t,

Cov(X_1(s), X_1(t)) ≡ 1;
Cov(X_2(s), X_2(t)) = st;
Cov(X_3(s), X_3(t)) = cos λs cos λt + sin λs sin λt = cos λ(s − t);

Cov(X_4(s), X_4(t)) = E[∫_0^s W(u) du ∫_0^t W(v) dv]
= ∫_0^t ∫_0^s E[W(u)W(v)] du dv = ∫_0^t ∫_0^s min(u, v) du dv
= ∫_0^s ∫_0^s min(u, v) du dv + ∫_s^t ∫_0^s min(u, v) du dv
= ∫_0^s ∫_0^v min(u, v) du dv + ∫_0^s ∫_v^s min(u, v) du dv + ∫_s^t ∫_0^s min(u, v) du dv
= s³/6 + (s³/2 − s³/3) + (s²/2)(t − s)
= s²t/2 − s³/6 = (s²/6)(3t − s);

Cov(X_5(s), X_5(t)) = Cov(W(s + 1) − W(s), W(t + 1) − W(t))
= s + 1 − min(s + 1, t) − s + s
= s − t + 1 if t − s ≤ 1, and = 0 if t − s > 1.

The two cases are combined into the single formula Cov(W(s + 1) − W(s), W(t + 1) − W(t)) = (s − t + 1)⁺. The covariance functions of X_1(t), X_3(t), and X_5(t) depend only on s − t, and these processes are stationary.
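As a quick numerical check on the X_4 computation, the following sketch (grid size, replication count, and seed are all arbitrary choices) estimates Cov(X_4(s), X_4(t)) by Monte Carlo and compares it with s²(3t − s)/6.

    import numpy as np

    # Approximate X_4(t) = integral of W over [0, t] by a Riemann sum on a grid.
    rng = np.random.default_rng(0)
    s, t, n, reps = 0.4, 0.9, 900, 5000
    dt = t / n
    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(reps, n)), axis=1)
    X4 = np.cumsum(W, axis=1) * dt                 # X_4 evaluated on the grid
    i_s = int(round(s / dt)) - 1                   # grid index corresponding to time s
    emp = np.cov(X4[:, i_s], X4[:, -1])[0, 1]      # empirical covariance
    print(emp, s**2 * (3 * t - s) / 6)             # both should be near 0.061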

12.2.1 Condition for a Gaussian Process to be Markov

We show a simple and useful result characterizing Gaussian processes that are Markov. It turns out that there is a simple way to tell whether a given Gaussian process is Markov by simply looking at its correlation function. Because we only need to consider finite-dimensional distributions to decide whether a stochastic process is Markov, it is only necessary for us to determine when a finite sequence of jointly normal variables has the Markov property. We start with that case.

Definition 12.10. Let X_1, X_2, ..., X_n be n jointly distributed continuous random variables. The n-dimensional random vector (X_1, ..., X_n) is said to have the Markov property if for every k ≤ n, the conditional distribution of X_k given X_1, ..., X_{k−1} is the same as the conditional distribution of X_k given X_{k−1} alone.

Theorem 12.1. Let (X_1, ..., X_n) have a multivariate normal distribution with means zero and correlations ρ_{X_j, X_k} = ρ_{jk}. Then (X_1, ..., X_n) has the Markov property if and only if for 1 ≤ i ≤ j ≤ k ≤ n, ρ_{ik} = ρ_{ij} ρ_{jk}.

Proof. We may assume that each X_i has variance one. If (X_1, ..., X_n) has the Markov property, then for any k, E(X_k | X_1, ..., X_{k−1}) = E(X_k | X_{k−1}) = ρ_{k−1,k} X_{k−1} (see Chapter 5). Therefore, X_k − ρ_{k−1,k} X_{k−1} = X_k − E(X_k | X_1, ..., X_{k−1}) is independent of the vector (X_1, ..., X_{k−1}). In particular, each covariance Cov(X_k − ρ_{k−1,k} X_{k−1}, X_i) must be zero for all i ≤ k − 1. This leads to ρ_{ik} = ρ_{i,k−1} ρ_{k−1,k}, and to the claimed identity ρ_{ik} = ρ_{ij} ρ_{jk} by simply applying ρ_{ik} = ρ_{i,k−1} ρ_{k−1,k} repeatedly.

Conversely, suppose the identity ρ_{ik} = ρ_{ij} ρ_{jk} holds for all 1 ≤ i ≤ j ≤ k ≤ n. Then, it follows from the respective formulas for E(X_k | X_1, ..., X_{k−1}) and Var(X_k | X_1, ..., X_{k−1}) (see Chapter 5) that E(X_k | X_1, ..., X_{k−1}) = ρ_{k−1,k} X_{k−1} = E(X_k | X_{k−1}), and Var(X_k | X_1, ..., X_{k−1}) = Var(X_k | X_{k−1}). All conditional distributions for a multivariate normal distribution are also normal; therefore it must be the case that the distribution of X_k given X_1, ..., X_{k−1} and the distribution of X_k given just X_{k−1} are the same. This being true for all k, the full vector (X_1, ..., X_n) has the Markov property. ∎

Because the Markov property for a continuous-time stochastic process is defined in terms of finite-dimensional distributions, the above theorem yields the following simple and useful corollary.

Corollary. A Gaussian process X(t), t ∈ R, is Markov if and only if ρ_{X(s), X(u)} = ρ_{X(s), X(t)} ρ_{X(t), X(u)} for all s, t, u, s ≤ t ≤ u.
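The corollary is easy to test numerically. In the sketch below, rho_bm and rho_x4 are hypothetical helper names (not from the text): the first is the correlation function of standard Brownian motion, min(s, t)/√(st), which satisfies the factorization; the second is the correlation function of the integrated process X_4 from Example 12.1, computed from Cov(X_4(s), X_4(t)) = s²(3t − s)/6 and Var(X_4(t)) = t³/3, which does not.

    import numpy as np

    def rho_bm(s, t):
        # correlation of W(s), W(t): Cov = min(s, t), Var(W(t)) = t
        return min(s, t) / np.sqrt(s * t)

    def rho_x4(s, t):
        # correlation of X_4(s), X_4(t): Cov = s^2 (3t - s)/6 for s <= t
        s, t = min(s, t), max(s, t)
        return (s**2 * (3 * t - s) / 6) / np.sqrt((s**3 / 3) * (t**3 / 3))

    s, t, u = 0.3, 1.1, 2.5
    print(np.isclose(rho_bm(s, u), rho_bm(s, t) * rho_bm(t, u)))   # True: Markov
    print(np.isclose(rho_x4(s, u), rho_x4(s, t) * rho_x4(t, u)))   # False: not Markov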

12.2.2 Explicit Construction of Brownian Motion

It is not a priori obvious that an uncountable collection of random variables with the defining properties of Brownian motion can be constructed on a common probability space (a measure-theory terminology). In other words, that Brownian motion exists requires a proof. Various proofs of the existence of Brownian motion can be given. We provide two explicit constructions, of which one is more classic in nature. But the second construction is also useful.

Theorem 12.2 (Karhunen–Loève Expansion). (a) Let Z_1, Z_2, ... be an infinite sequence of iid standard normal variables. Then, with probability one, the infinite series

W(t) = √2 Σ_{m=1}^∞ [sin((m − 1/2)πt) / ((m − 1/2)π)] Z_m

converges uniformly in t on [0, 1], and the process W(t) is a Brownian motion on [0, 1].

The infinite series B(t) = √2 Σ_{m=1}^∞ [sin(mπt) / (mπ)] Z_m converges uniformly in t on [0, 1], and the process B(t) is a Brownian bridge on [0, 1].

(b) For n ≥ 0, let I_n denote the set of odd integers in [0, 2^n]. Let {Z_{n,k}, n ≥ 0, k ∈ I_n} be a double array of iid standard normal variables. Let {H_{n,k}(t), n ≥ 0, k ∈ I_n} be the sequence of Haar wavelets defined as

H_{n,k}(t) = 0 if t ∉ [(k − 1)/2^n, (k + 1)/2^n]; H_{n,k}(t) = 2^{(n−1)/2} if t ∈ [(k − 1)/2^n, k/2^n); and −2^{(n−1)/2} if t ∈ [k/2^n, (k + 1)/2^n].

Let S_{n,k}(t) be the sequence of Schauder functions defined as S_{n,k}(t) = ∫_0^t H_{n,k}(s) ds, 0 ≤ t ≤ 1, n ≥ 0, k ∈ I_n. Then the infinite series W(t) = Σ_{n=0}^∞ Σ_{k∈I_n} Z_{n,k} S_{n,k}(t) converges uniformly in t on [0, 1], and the process W(t) is a Brownian motion on [0, 1].

Remark. See Bhattacharya and Waymire (2007, p. 135) for a proof. Both constructions of Brownian motion given above can be heuristically understood by using ideas of Fourier theory. If the sequence f_0(t) = 1, f_1(t), f_2(t), ... forms an orthonormal basis of L²[0, 1], then we can expand a square integrable function, say w(t), as an infinite series Σ_i c_i f_i(t), where c_i equals the inner product ∫_0^1 w(t) f_i(t) dt. Thus, c_0 = 0 if the integral of w(t) is zero. The Karhunen–Loève expansion can be heuristically explained as a random orthonormal expansion of W(t). The basis functions f_i(t) chosen do depend on the process W(t), specifically on the covariance function. The inner products ∫_0^1 W(t) f_i(t) dt, i ≥ 1, form a sequence of iid standard normals. This is very far from a proof, but it provides a heuristic context for the expansion. The second construction is based similarly on expansions using a wavelet basis instead of a Fourier basis.
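A minimal simulation sketch of part (a) of the theorem follows, truncating the series at M terms (M and the seed are arbitrary choices); the truncated sum is an approximation to a Brownian path, not an exact one.

    import numpy as np

    rng = np.random.default_rng(1)
    M = 500                                        # truncation level
    t = np.linspace(0.0, 1.0, 1001)
    Z = rng.standard_normal(M)
    freq = (np.arange(1, M + 1) - 0.5) * np.pi     # (m - 1/2) * pi
    # W(t) ~ sqrt(2) * sum_m sin((m - 1/2) pi t) / ((m - 1/2) pi) * Z_m
    W = np.sqrt(2.0) * (np.sin(np.outer(t, freq)) / freq) @ Z
    print(W[0], W[-1])                             # W(0) = 0; W(1) is approximately N(0, 1)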

12.3 Basic Distributional Properties

Distributional properties and formulas are always useful in doing further calculations and for obtaining concrete answers to questions. The most basic distributional properties of the Brownian motion and bridge are given first.

Throughout this chapter, the notation W(t) and B(t) means a (standard) Brownian motion and a (standard) Brownian bridge. The word standard is often deleted for brevity.

Proposition. (a) Cov(W(s), W(t)) = min(s, t); Cov(B(s), B(t)) = min(s, t) − st.

(b) (The Markov Property). For any given n and t_0 < t_1 < ··· < t_n, the conditional distribution of W(t_n) given that W(t_0) = x_0, W(t_1) = x_1, ..., W(t_{n−1}) = x_{n−1} is the same as the conditional distribution of W(t_n) given W(t_{n−1}) = x_{n−1}.

(c) Given s < t, the conditional distribution of W(t) given W(s) = w is N(w, t − s).

(d) Given t_1 < t_2 < ··· < t_n, the joint density of W(t_1), W(t_2), ..., W(t_n) is given by the function

f(x_1, x_2, ..., x_n) = p(x_1, t_1) p(x_2 − x_1, t_2 − t_1) ··· p(x_n − x_{n−1}, t_n − t_{n−1}),

where p(x, t) is the density of N(0, t); that is, p(x, t) = (1/√(2πt)) e^{−x²/(2t)}.
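Part (d) translates directly into an exact simulation recipe. A minimal sketch (the time points and seed are arbitrary choices) draws (W(t_1), ..., W(t_n)) by cumulatively summing independent N(0, t_j − t_{j−1}) increments.

    import numpy as np

    rng = np.random.default_rng(2)
    times = np.array([0.1, 0.4, 0.5, 1.3, 2.0])    # t_1 < ... < t_n
    gaps = np.diff(times, prepend=0.0)             # t_j - t_{j-1}, with t_0 = 0
    increments = rng.normal(0.0, np.sqrt(gaps))    # W(t_j) - W(t_{j-1}), independent
    W = np.cumsum(increments)                      # one exact draw of (W(t_1), ..., W(t_n))
    print(list(zip(times, W)))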

Each part of this proposition follows by simple and direct calculation using the definition of a Brownian motion and Brownian bridge. It is worth mentioning that the Markov property is extremely important and is a consequence of the independent increments property. Alternatively, one can simply use our previous characterization that a Gaussian process is Markov if and only if its correlation function satisfies ρ_{X(s), X(u)} = ρ_{X(s), X(t)} ρ_{X(t), X(u)} for all s ≤ t ≤ u.

The Markov property can be strengthened to a very useful property, known as the strong Markov property. For instance, suppose you are waiting for the process to reach a level b for the first time. The process will reach that level at some random time, say τ. At this point, the process will simply start over, and W(τ + t) − b will act like a path of a standard Brownian motion from that point onwards. For the general description of this property, we need a definition.

Definition 12.11. A nonnegative random variable τ is called a stopping time for the process W(t) if, for any s > 0, whether τ ≤ s depends only on the values of W(t) for t ≤ s.

Example 12.2. For b > 0, consider the first passage time T_b = inf{t > 0 : W(t) = b}. Then T_b > s if and only if W(t) < b for all t ≤ s. Therefore, T_b is a stopping time for the process W(t).

Example 12.3. Let X be a U[0, 1] random variable independent of the process W(t). Then the nonnegative random variable τ = X is not a stopping time for the process W(t).

Theorem 12.3 (Strong Markov Property). If τ is a stopping time for the process W(t), then W(τ + t) − W(τ) is also a Brownian motion on [0, ∞) and is independent of {W(s), s ≤ τ}.

See Bhattacharya and Waymire (2007, p. 153) for its proof.


12.3.1 Reflection Principle and Extremes

It is important in applications to be able to derive the distribution of special functionals of Brownian processes. They can be important because a Brownian process is used directly as a statistical model in some problem, or because the functional arises as the limit of some suitable sequence of statistics in a seemingly unrelated problem. Examples of the latter kind are seen in applications of the so-called invariance principle. For now, we provide formulas for the distribution of certain extremes and first passage times of a Brownian motion. The following notation is used:

M(t) = sup_{0<s≤t} W(s); T_b = inf{t > 0 : W(t) = b}.

Theorem 12.4 (Reflection Principle). (a) For b > 0, P(M(t) > b) = 2P(W(t) > b).

(b) For t > 0, M(t) = sup_{0<s≤t} W(s) has the density

√(2/(πt)) e^{−x²/(2t)}, x > 0.

(c) For b > 0, the first passage time T_b has the density

(b / t^{3/2}) φ(b/√t),

where φ denotes the standard normal density function.

(d) (First Arcsine Law). Let τ be the point of maxima of W(t) on [0, 1]. Then τ is almost surely unique, and P(τ ≤ t) = (2/π) arcsin(√t).

(e) (Reflected Brownian Motion). Let X(t) = sup_{0≤s≤t} |W(s)|. Then X(1) = sup_{0≤s≤1} |W(s)| has the CDF

G(x) = (4/π) Σ_{m=0}^∞ [(−1)^m / (2m + 1)] e^{−(2m+1)²π²/(8x²)}, x ≥ 0.

(f) (Maximum of a Brownian Bridge). Let B(t) be a Brownian bridge on [0, 1]. Then sup_{0≤t≤1} |B(t)| has the CDF

H(x) = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} e^{−2k²x²}, x ≥ 0.

(g) (Second Arcsine Law). Let L = sup{t ∈ [0, 1] : W(t) = 0}. Then P(L ≤ t) = (2/π) arcsin(√t).

(h) Given 0 < s < t, P(W(t) has at least one zero in the time interval (s, t)) = (2/π) arccos(√(s/t)).
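Before turning to the proof, part (a) can be checked by simulation; the sketch below (discretization level, seed, and replication count are arbitrary choices) compares an empirical estimate of P(M(1) > 1) with 2P(W(1) > 1), keeping in mind that a discrete grid slightly underestimates the running maximum.

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(3)
    t, b, n, reps = 1.0, 1.0, 2000, 5000
    incr = rng.normal(0.0, sqrt(t / n), size=(reps, n))
    M = np.cumsum(incr, axis=1).max(axis=1)                 # discretized running maximum
    p_sim = np.mean(M > b)
    p_theory = 2 * (1 - 0.5 * (1 + erf(b / sqrt(2 * t))))   # 2 P(W(t) > b)
    print(p_sim, p_theory)                                  # both near 0.317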


Proof of the Reflection Principle: The reflection principle is of paramount importance, and we provide a proof of it. The reflection principle follows from two observations, the first of which is obvious, and the second needs a clever argument. The observations are:

P(T_b < t) = P(T_b < t, W(t) > b) + P(T_b < t, W(t) < b),

and

P(T_b < t, W(t) > b) = P(T_b < t, W(t) < b).

Because P(T_b < t, W(t) > b) = P(W(t) > b) (because W(t) > b implies that T_b < t), if we accept the second identity above, then we immediately have the desired result P(M(t) > b) = P(T_b < t) = 2P(W(t) > b). Thus, only the second identity needs a proof. This is done by a clever argument.

The event {T_b < t, W(t) < b} happens if and only if at some point τ < t the process reaches the level b, and then at time t drops to a lower level l, l < b. However, once at level b, the process could as well have taken the path reflected along the level b, which would have caused the process to end up at level b + (b − l) = 2b − l at time t. We now observe that 2b − l > b, meaning that corresponding to every path in the event {T_b < t, W(t) < b}, there is a path in the event {T_b < t, W(t) > b}, and so P(T_b < t, W(t) < b) must be equal to P(T_b < t, W(t) > b).

This is the famous reflection principle for Brownian motion. An analytic proof of the identity P(T_b < t, W(t) < b) = P(T_b < t, W(t) > b) can be given by using the strong Markov property of Brownian motion.

Note that both parts (b) and (c) of the theorem are simply restatements of part (a). Many of the remaining parts follow from calculations that also use the reflection principle. Detailed proofs can be seen, for example, in Karlin and Taylor (1975, pp. 345–354). ∎

Example 12.4 (Density of Last Zero Before T). Consider standard Brownian motion W(t) on [0, ∞) starting at zero, and fix a time T > 0. We want to find the density of the last zero of W(t) before the time T. Formally, let τ = τ_T = sup{t < T : W(t) = 0}. Then we want to find the density of τ.

By using part (g) of the previous theorem,

P(τ > s) = P(there is at least one zero of W in (s, T)) = (2/π) arccos(√(s/T)).

Therefore, the density of τ is

f_τ(s) = −(d/ds) [(2/π) arccos(√(s/T))] = 1 / [π √(s(T − s))], 0 < s < T.

Notice that the density is symmetric about T/2, and therefore E(τ) = T/2. A calculation shows that E(τ²) = (3/8)T², and therefore Var(τ) = (3/8)T² − T²/4 = (1/8)T².
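A Monte Carlo sketch of this example with T = 1 (grid size and seed are arbitrary, and the last sign change of the discretized path is used as a proxy for the last zero) reproduces the mean 1/2 and variance 1/8.

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 4000, 4000
    incr = rng.normal(0.0, np.sqrt(1.0 / n), size=(reps, n))
    paths = np.cumsum(incr, axis=1)
    flips = np.diff(np.sign(paths), axis=1) != 0     # sign changes of each path
    last_zero = np.array([(np.flatnonzero(f)[-1] + 1) / n if f.any() else 0.0
                          for f in flips])
    print(last_zero.mean(), last_zero.var())         # approximately 0.5 and 0.125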


12.3.2 Path Properties and Behavior Near Zero and Infinity

A textbook example of a nowhere differentiable and yet everywhere continuous function is Weierstrass's function f(t) = Σ_{n=0}^∞ 2^{−n} cos(b^n t), −∞ < t < ∞, for b > 2 + 3π. Constructing another example of such a function is not trivial. A result of notoriety is that almost all sample paths of Brownian motion are functions of this kind; that is, as a function of t, W(t) is continuous at every t, and differentiable at no t! The paths are extremely crooked. The Brownian bridge shares the same property. The sample paths show other evidence of extreme oscillation; for example, in any arbitrarily small interval containing the starting time t = 0, W(t) changes its sign infinitely often. The various important path properties of Brownian motion are described and discussed below.

Theorem 12.5. Let W(t), t ≥ 0, be a Brownian motion on [0, ∞). Then:

(a) (Scaling). For c > 0, X(t) = c^{−1/2} W(ct) is a Brownian motion on [0, ∞).
(b) (Time Reciprocity). X(t) = tW(1/t), with the value defined to be zero at t = 0, is a Brownian motion on [0, ∞).
(c) (Time Reversal). Given 0 < T < ∞, X_T(t) = W(T) − W(T − t) is a Brownian motion on [0, T].

Proof. Only part (b) requires a proof, the others being obvious. First note that for s ≤ t, the covariance function is

Cov(sW(1/s), tW(1/t)) = st min(1/s, 1/t) = st (1/t) = s = min{s, t}.

It is obvious that X(t) − X(s) ~ N(0, t − s). Next, for s < t < u, Cov(X(t) − X(s), X(u) − X(t)) = t − s − t + s = 0, and the independent increments property holds. The sample paths are continuous (including at t = 0) because W(t) has continuous sample paths, and X(0) = 0. Thus, all the defining properties of a Brownian motion are satisfied, and hence X(t) must be a Brownian motion. ∎

Part (b) leads to the following useful property.

Proposition. With probability one, W(t)/t → 0 as t → ∞.

The behavior of Brownian motion near t = 0 is quite a bit more subtle, and we postpone its discussion till later. We next describe a series of classic results that illustrate the extremely rough nature of the paths of a Brownian motion. The results essentially tell us that at any instant, it is nearly impossible to predict what a particle performing a Brownian motion will do next. Here is a simple intuitive explanation for why the paths of a Brownian motion are so rough.

Take two time instants s, t, s < t. We then have the simple moment formula E[(W(t) − W(s))²] = t − s. Writing t = s + h, we get

E[(W(s + h) − W(s))²] = h ⟺ E[((W(s + h) − W(s)) / h)²] = 1/h.

If the time instants s, t are close together, then h ≈ 0, and so 1/h is large. We can see that the increment quotient (W(s + h) − W(s))/h is blowing up in magnitude. Thus, differentiability is going to be a problem. In fact, not only is the path of a Brownian motion guaranteed to be nondifferentiable at any prespecified t, it is guaranteed to be nondifferentiable simultaneously at all values of t. This is a much stronger roughness property than lack of differentiability at a fixed t.

The next theorem is regarded as one of the most classic ones in probability theory. We first need a few definitions.

Definition 12.12. Let f be a real-valued continuous function defined on some open subset T of R. The upper and the lower Dini right derivatives of f at t ∈ T are defined as

D⁺f(t) = limsup_{h↓0} [f(t + h) − f(t)] / h; D₊f(t) = liminf_{h↓0} [f(t + h) − f(t)] / h.

Definition 12.13. Let f be a real-valued function defined on some open subset T of R. The function f is said to be Hölder continuous of order α > 0 at t if for some finite constant C (possibly depending on t), |f(t + h) − f(t)| ≤ C|h|^α for all sufficiently small h. If f is Hölder continuous of order α at every t ∈ T with a universal constant C, it is called Hölder continuous of order α in T.

Theorem 12.6 (Crooked Paths and Unbounded Variation). (a) Given any T > 0, P(sup_{t∈[0,T]} W(t) > 0, inf_{t∈[0,T]} W(t) < 0) = 1. Hence, with probability one, in any nonempty interval containing zero, W(t) changes sign at least once, and therefore infinitely often.

(b) (Nondifferentiability Everywhere). With probability one, W(t) is (simultaneously) nondifferentiable at all t > 0; that is,

P(for each t > 0, W(t) is not differentiable at t) = 1.

(c) (Unbounded Variation). For every T > 0, with probability one, W(t) has unbounded total variation as a function of t on [0, T].

(d) With probability one, on no nonempty time interval can W(t) be monotone increasing or monotone decreasing.

(e) P(for all t > 0, D⁺W(t) = +∞ or D₊W(t) = −∞ or both) = 1.

(f) (Hölder Continuity). Given any finite T > 0 and 0 < α < 1/2, with probability one, W(t) is Hölder continuous on [0, T] of order α.

(g) For any α > 1/2, with probability one, W(t) is nowhere Hölder continuous of order α.

(h) (Uniform Continuity in Probability). Given any ε > 0 and 0 < T < ∞, P(sup_{t,s: 0≤t,s≤T, |t−s|<h} |W(t) − W(s)| > ε) → 0 as h → 0.

Proof. Each of parts (c) and (d) would follow from part (b), because of results in real analysis that monotone functions or functions of bounded variation must be differentiable almost everywhere. Part (e) is a stronger version of the nondifferentiability result in part (b); see Karatzas and Shreve (1991, pp. 106–111) for parts (e)–(h). Part (b) itself is proved in many standard texts on stochastic processes; the proof involves quite a bit of calculation. We show here that part (a) is a consequence of the reflection principle.

Clearly, it is enough just to show that for any T > 0, P(sup_{t∈[0,T]} W(t) > 0) = 1. This will imply that P(inf_{t∈[0,T]} W(t) < 0) = 1, because −W(t) is a Brownian motion if W(t) is, and hence it will imply all the other statements in part (a). Fix c > 0. Then,

P(sup_{t∈[0,T]} W(t) > 0) ≥ P(sup_{t∈[0,T]} W(t) > c) = 2P(W(T) > c) (reflection principle)

→ 1 as c → 0, and therefore P(sup_{t∈[0,T]} W(t) > 0) = 1. ∎

Remark. It should be noted that the set of points at which the path of a Brownian motion is Hölder continuous of order 1/2 is not empty, although in some sense such points are rare.

The oscillation properties of the paths of a Brownian motion are further illustrated by the laws of the iterated logarithm for Brownian motion. The path of a Brownian motion is a random function. Can we construct suitable deterministic functions, say u(t) and l(t), such that for large t the Brownian path W(t) will be bounded by the envelopes l(t), u(t)? What are the tightest such envelope functions? Similar questions can be asked about small t. The law of the iterated logarithm answers these questions precisely. However, it is important to note that in addition to the intellectual aspect of just identifying the tightest envelopes, the iterated logarithm laws have other applications.

Theorem 12.7 (LIL). Let f(t) = √(2t log |log t|), t > 0. With probability one,

(a) limsup_{t→∞} W(t)/f(t) = 1; liminf_{t→∞} W(t)/f(t) = −1.
(b) limsup_{t→0} W(t)/f(t) = 1; liminf_{t→0} W(t)/f(t) = −1.

Remark on Proof: Note that the liminf statement in part (a) follows from the limsup statement because −W(t) is also a Brownian motion if W(t) is. On the other hand, the two statements in part (b) follow from the corresponding statements in part (a) by the time reciprocity property that tW(1/t) is also a Brownian motion if W(t) is. For a proof of part (a), see Karatzas and Shreve (1991), or Bhattacharya and Waymire (2007, p. 143). ∎

12.3.3 Fractal Nature of Level Sets

For a moment, let us consider a general question. Suppose T is a subset of the real line, and X(t), t ∈ T, a real-valued stochastic process. Fix a number u, and ask how many times the path of X(t) cuts the line drawn at level u; that is, consider N_T(u) = #{t ∈ T : X(t) = u}. It is not a priori obvious that N_T(u) is finite. Indeed, for Brownian motion, we already know that in any nonempty interval containing zero, the path hits zero infinitely often with probability one. One might guess that this lack of finiteness is related to the extreme oscillatory nature of the paths of a Brownian motion. Indeed, that is true. If the process X(t) is a bit more smooth, then the number of level crossings will be finite. However, investigations into the distribution of N_T(u) will still be a formidable problem. For the Brownian motion, it is not the number of level crossings, but the geometry of the set of times at which it crosses a given level u that is of interest. In this section, we describe the fascinating properties of these level sets of the path of a Brownian motion. We also give a very brief glimpse into what we can expect for processes whose paths are more smooth, to draw the distinction from the case of Brownian motion.

Given b ∈ R, let

C_b = {t ≥ 0 : W(t) = b}.

Note that C_b is a random set, in the sense that different sample paths will hit the level b at different sets of times. We only consider the case b = 0 here, although most of the properties of C_0 extend in a completely evident way to the case of a general b.

Theorem 12.8. With probability one, C_0 is an uncountable, unbounded, closed set of Lebesgue measure zero, and it has no isolated points; that is, in any neighborhood of an element of C_0, there is at least one other element of C_0.

Proof. It follows from an application of the reflection principle that P(sup_{t≥0} W(t) = ∞, inf_{t≥0} W(t) = −∞) = 1 (check it!). Therefore, given any T > 0, there must be a time instant t > T such that W(t) = 0. For if there were a finite last time that W(t) = 0, then for such a sample path, the supremum and the infimum cannot simultaneously be infinite. This means that the zero set C_0 is unbounded. It is closed because the paths of Brownian motion are continuous. We have not defined what Lebesgue measure means, therefore we cannot give a rigorous proof that C_0 has zero Lebesgue measure. Think of the Lebesgue measure of a set C as its total length λ(C). Then, by Fubini's theorem,

E[λ(C_0)] = E[∫_{C_0} dt] = E[∫_{[0,∞)} I_{{W(t)=0}} dt] = ∫_{[0,∞)} P(W(t) = 0) dt = 0.

If the expected length is zero, then the length itself must be zero with probability one. That C_0 has no isolated points is entirely nontrivial to prove, and we omit the proof. Finally, by a result in real analysis that any closed set with no isolated points must be uncountable unless it is empty, we have that C_0 is an uncountable set. ∎

Remark. The implication is that the set of times at which Brownian motion returns to zero is a topologically large set marked by holes, and collectively the holes are big enough that the zero set, although uncountable, has length zero. Such sets in one dimension are commonly called Cantor sets. Corresponding sets in higher dimensions often go by the name fractals.

12.4 The Dirichlet Problem and Boundary Crossing Probabilities

The Dirichlet problem on a domain in R^d, 1 ≤ d < ∞, was formulated by Gauss in the mid-nineteenth century. It is a problem of special importance in the area of partial differential equations with boundary value constraints. The Dirichlet problem can also be interpreted as a problem in the physical theory of diffusion of heat in a d-dimensional domain with controlled temperature at the boundary points of the domain. According to the laws of physics, the temperature as a function of the location in the domain would have to be a harmonic function. The Dirichlet problem thus asks for a function u(x) such that

u(x) = g(x) (specified), x ∈ ∂U; u(·) harmonic in U,

where U is a specified domain in R^d. In this generality, solutions to the Dirichlet problem need not exist. We need the boundary value function g as well as the domain U to be sufficiently nice. The interesting and surprising thing is that solutions to the Dirichlet problem have connections to the d-dimensional Brownian motion. Solutions to the Dirichlet problem can be constructed by solving suitable problems (which we describe below) about d-dimensional Brownian motion. Conversely, these problems on the Brownian motion can be solved if we can directly find solutions to a corresponding Dirichlet problem, perhaps by inspection, or by using standard techniques in the area of partial differential equations. Thus, we have a mutually beneficial connection between a special problem on partial differential equations and a problem on Brownian motion. It turns out that these connections are more than intellectual curiosities. For example, these connections were elegantly exploited in Brown (1971) to solve certain otherwise very difficult problems in the area of statistical decision theory.

We first provide the necessary definitions. We remarked before that the Dirichlet problem is not solvable on arbitrary domains. The domain must be such that it does not contain any irregular boundary points. These are points x ∈ ∂U such that a Brownian motion starting at x immediately falls back into U. A classic example is that of a disc from which the center has been removed; the center is then an irregular boundary point of the domain. We refer the reader to Karatzas and Shreve (1991, pp. 247–250) for the exact regularity conditions on the domain.

Definition 12.14. A set U ⊆ R^d, 1 ≤ d < ∞, is called a domain if U is connected and open.

Definition 12.15. A twice continuously differentiable real-valued function u(x) defined on a domain U ⊆ R^d is called harmonic if its Laplacian Δu(x) = Σ_{i=1}^d ∂²u(x)/∂x_i² ≡ 0 for all x = (x_1, x_2, ..., x_d) ∈ U.

Definition 12.16. Let U be a bounded regular domain in R^d, and g a real-valued continuous function on ∂U. The Dirichlet problem on the domain U ⊆ R^d with boundary value constraint g is to find a function u : Ū → R such that u is harmonic in U, and u(x) = g(x) for all x ∈ ∂U, where ∂U denotes the boundary of U and Ū the closure of U.

Theorem 12.9. Let U ⊆ R^d be a bounded regular domain. Fix x ∈ U. Consider X_d(t), t ≥ 0, d-dimensional Brownian motion starting at x, and let τ = τ_U = inf{t > 0 : X_d(t) ∉ U} = inf{t > 0 : X_d(t) ∈ ∂U}. Define the function u pointwise on U by

u(x) = E_x[g(X_d(τ))], x ∈ U; u = g on ∂U.

Then u is continuous on U and is the unique solution, continuous on Ū, to the Dirichlet problem on U with boundary value constraint g.

Remark. When X_d(t) exits from U having started at a point inside U, it can exit through different points on the boundary ∂U. If it exits at the point y ∈ ∂U, then g(X_d(τ)) will equal g(y). The exit point y is determined by chance. If we average over y, then we get a function that is harmonic inside U and equals g on ∂U. We omit the proof of this theorem, and refer the reader to Karatzas and Shreve (1991, p. 244) and Korner (1986, p. 55).

Example 12.5 (Dirichlet Problem on an Annulus). Consider the Dirichlet problem on the d-dimensional annulus U = {z : r < ||z|| < R}, where 0 < r < R < ∞. Specifically, suppose we want a function u such that

u is harmonic on {z : r < ||z|| < R}; u = 1 on {z : ||z|| = R}; u = 0 on {z : ||z|| = r}.

A continuous solution to this can be found directly. The solution is

u(z) = (|z| − r) / (R − r) for d = 1;
u(z) = (log ||z|| − log r) / (log R − log r) for d = 2;
u(z) = (r^{2−d} − ||z||^{2−d}) / (r^{2−d} − R^{2−d}) for d > 2.


We now relate this solution to the Dirichlet problem on U with d-dimensional Brownian motion. Consider then X_d(t), d-dimensional Brownian motion started at a given point x inside U, r < ||x|| < R. Because the function g corresponding to the boundary value constraint in this example is g(z) = I_{{z : ||z|| = R}}, by the above theorem, u(x) equals

u(x) = E_x[g(X_d(τ))] = P_x(X_d(t) reaches the sphere ||z|| = R before it reaches the sphere ||z|| = r).

For now, let us consider the case d = 1. Fix positive numbers r, R and suppose a one-dimensional Brownian motion starts at a number x between r and R, 0 < r < x < R < ∞. Then the probability that it will hit the line z = R before hitting the line z = r is (x − r)/(R − r). The closer the starting point x is to R, the larger is the probability that it will first hit the line z = R. Furthermore, the probability is a very simple linear function of x. We revisit the case d > 1 when we discuss recurrence and transience of d-dimensional Brownian motion in the next section.
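The d = 1 case invites a quick simulation. In the sketch below (step size, seed, and replication count are arbitrary, and the discrete step introduces a small overshoot bias), a Brownian path started at x = 1.5 exits the interval (1, 3) through the upper end with probability close to (x − r)/(R − r) = 0.25.

    import numpy as np

    rng = np.random.default_rng(11)
    r, R, x, dt, reps = 1.0, 3.0, 1.5, 1e-3, 4000
    sd = np.sqrt(dt)
    hits_R = 0
    for _ in range(reps):
        z = x
        while r < z < R:                   # run until one barrier is crossed
            z += rng.normal(0.0, sd)
        hits_R += z >= R
    print(hits_R / reps, (x - r) / (R - r))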

12.4.1 Recurrence and Transience

We observed during our discussion of the lattice random walk (Chapter 11) that it is recurrent in dimensions d = 1, 2 and transient for d > 2. That is, in one and two dimensions the lattice random walk returns to any integer value x at least once (and hence infinitely often) with probability one, but for d > 2, the probability that the random walk returns at all to its starting point is less than one. For the Brownian motion, when the dimension is more than one, the correct question is not to ask whether it returns to particular points x. The correct question to ask is whether it returns to any fixed neighborhood of a particular point, however small. The answers are similar to the case of the lattice random walk; that is, in one dimension, Brownian motion returns to any point x infinitely often with probability one, and in two dimensions, Brownian motion returns to any given neighborhood of a point x infinitely often with probability one. But when d > 2, it diverges off to infinity. We can see this by using the connection of Brownian motion to the Dirichlet problem on discs. We first need two definitions.

Definition 12.17. For d > 1, a d-dimensional stochastic process X_d(t), t ≥ 0, is called neighborhood recurrent if, with probability one, it returns to any given ball B(x, ε) infinitely often.

Definition 12.18. For any d, a d-dimensional stochastic process X_d(t), t ≥ 0, is called transient if, with probability one, X_d(t) diverges to infinity.

We now show how the connection of the Brownian motion to the solution of the Dirichlet problem helps us establish that Brownian motion is transient for d > 2. That is, if we let B be the event that lim_{t→∞} ||W_d(t)|| ≠ ∞, then we show that P(B) must be zero for d > 2. Indeed, to be specific, take d = 3, pick a point x ∈ R³ with ||x|| > 1, suppose that our Brownian motion is now sitting at the point x, and ask what the probability is that it will reach the unit ball B_1 before it reaches the sphere ||z|| = R. Here R > ||x||. We have derived this probability. The Markov property of Brownian motion gives this probability to be exactly equal to

1 − (1 − 1/||x||) / (1 − 1/R).

This clearly converges to 1/||x|| as R → ∞. Imagine now that the process has evolved for a long time, say T, and that it is now sitting at a very distant x (i.e., ||x|| is large). The LIL for Brownian motion guarantees that we can pick such a large T and such a distant x. Then the probability of ever returning from x to the unit ball would be the small number ε = 1/||x||. We can make ε arbitrarily small by choosing ||x|| sufficiently large, and what that means is that the probability of the process returning infinitely often to the unit ball B_1 is zero. The same argument works for B_k, the ball of radius k, for any k ≥ 1, and therefore P(B) = P(∪_{k=1}^∞ B_k) = 0; that is, the process drifts off to infinity with probability one. The same argument works for any d > 2, not just d = 3. The case of d = 1, 2 is left as a chapter exercise. We then have the following theorem. ∎

Theorem 12.10. Brownian motion visits every real x infinitely often with probability one if d = 1, is neighborhood recurrent if d = 2, and is transient if d > 2. Moreover, by its neighborhood recurrence for d = 2, the graph of a two-dimensional Brownian path on [0, ∞) is dense in the two-dimensional plane.

12.5 The Local Time of Brownian Motion

For the simple symmetric random walk in one dimension, we derived the distribution of the local time ℓ(x, n), which is the total time the random walk spends at the integer x up to the time instant n. It would not be interesting to ask exactly the same question about Brownian motion, because the number of time points t up to some time T at which the Brownian motion W(t) equals a given x is zero or infinity. Paul Lévy gave the following definition for the local time of a Brownian motion. Fix a set A in the real line and a general time instant T, T > 0. Now ask what is the total size of the set of times t up to T at which the Brownian motion has resided in the given set A. That is, denoting Lebesgue measure on R by λ, look at the following kernel:

H(A, T) = λ{t ≤ T : W(t) ∈ A}.

Using this, Lévy formulated the local time of the Brownian motion at a given x as

ℓ(x, T) = lim_{ε↓0} H([x − ε, x + ε], T) / (2ε),

where the limit is supposed to mean a pointwise almost sure limit. It is important to note that the existence of the almost sure limit is nontrivial.

Instead of the clumsy notation ℓ(x, T), we eventually simply use ℓ(x, t), thereby obtaining a new stochastic process indexed simultaneously by the two parameters x and t. We can regard (x, t) together as a vector-valued time parameter, and call ℓ(x, t) a random field. This is called the local time of one-dimensional Brownian motion. The local time of Brownian motion is generally regarded to be an analytically difficult process to study. We give a relatively elementary exposition of the local time of Brownian motion in this section.

Recall now the previously introduced maximum process of standard Brownian motion, namely M(t) = sup_{0≤s≤t} W(s). The following major theorem on the distribution of the local time of Brownian motion at zero was proved by Paul Lévy.

Theorem 12.11. Let W(s), s ≥ 0, be standard Brownian motion starting at zero. Consider the two stochastic processes {ℓ(0, t), t ≥ 0} and {M(t), t ≥ 0}. These two processes have the same distribution.

In particular, for any given fixed t and y > 0,

P(ℓ(0, t)/√t ≤ y) = √(2/π) ∫_0^y e^{−z²/2} dz = 2Φ(y) − 1
⟺ P(ℓ(0, t) ≤ y) = √(2/(πt)) ∫_0^y e^{−z²/(2t)} dz.

For a detailed proof of this theorem, we refer to Morters and Peres (2010, p. 160). A sketch of the proof can be seen in Revesz (2005).

For a general level x, the corresponding result is as follows, and it follows from the case x = 0 treated above.

Theorem 12.12.

P(ℓ(x, t) ≤ y) = 2Φ((y + |x|)/√t) − 1, −∞ < x < ∞, t, y > 0.

It is important to note that if the level x ≠ 0, then the local time ℓ(x, t) can actually be exactly equal to zero with a positive probability; this probability is simply the probability that Brownian motion does not reach x within time t, and it equals 2Φ(|x|/√t) − 1. This is not the case if the level is zero, in which case the local time ℓ(0, t) possesses a density function.

The theorem above also says that the local time of Brownian motion grows at the rate of √t for any level x. The expected value follows easily by evaluating the integral ∫_0^∞ [1 − P(ℓ(x, t) ≤ y)] dy, and one gets

E[ℓ(x, t)] = 2√t φ(|x|/√t) − 2|x| [1 − Φ(|x|/√t)].

The limit of this as x → 0 equals √(2/π) √t, which agrees with E[ℓ(0, t)]. The expected local time is plotted in Fig. 12.3.

Fig. 12.3 Plot of the expected local time as a function of (x, t)
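The expected value formula can be checked against the occupation-time definition of local time. In the sketch below (ε, grid size, seed, and replication count are arbitrary, and the estimate is rough because both ε and the grid spacing must be small), ℓ(x, t) is approximated by λ{s ≤ t : |W(s) − x| ≤ ε}/(2ε).

    import numpy as np
    from math import erf, exp, pi, sqrt

    rng = np.random.default_rng(13)
    x, t, eps, n, reps = 0.5, 1.0, 0.02, 5000, 2000
    dt = t / n
    W = np.cumsum(rng.normal(0.0, sqrt(dt), size=(reps, n)), axis=1)
    occ = np.mean(np.abs(W - x) <= eps, axis=1) * t / (2 * eps)  # occupation-time estimate
    a = abs(x) / sqrt(t)
    theory = 2 * sqrt(t) * exp(-a * a / 2) / sqrt(2 * pi) \
             - 2 * abs(x) * 0.5 * (1 - erf(a / sqrt(2)))
    print(occ.mean(), theory)              # both near 0.40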

12.6 Invariance Principle and Statistical Applications

We remarked in the first section of this chapter that scaled random walks mimic the Brownian motion in a suitable asymptotic sense. As a matter of fact, if X_1, X_2, ... is any iid sequence of one-dimensional random variables satisfying some relatively simple conditions, then the sequence of partial sums S_n = Σ_{i=1}^n X_i, n ≥ 1, when appropriately scaled, mimics Brownian motion in a suitable asymptotic sense. Why is this useful? This is useful because in many concrete problems of probability and statistics, suitable functionals of the sequence of partial sums arise as the objects of direct importance. The invariance principle allows us to conclude that if the sequence of partial sums S_n mimics W(t), then any nice functional of the sequence of partial sums will also mimic the same functional of W(t). So, if we can figure out how to deal with the distribution of the needed functional of the W(t) process, then we can use it in practice to approximate the much more complicated distribution of the original functional of the sequence of partial sums. It is a profoundly useful fact in the asymptotic theory of probability that all of this is indeed a reality. This section treats the invariance principle for the partial sum process of one-dimensional iid random variables. We recommend Billingsley (1968), Hall and Heyde (1980), and Csorgo and Revesz (1981) for detailed and technical treatments; Erdos and Kac (1946), Donsker (1951), Komlos et al. (1975, 1976), Major (1978), Whitt (1980), and Csorgo and Hall (1984) for invariance principles for the partial sum process; and Pyke (1984) and Csorgo (2002) for lucid reviews. Also, see DasGupta (2008) for references to various significant extensions, such as the multidimensional and dependent cases.


Although the invariance principle for partial sums of iid random variables is usually credited to Donsker (1951), Erdos and Kac (1946) contained the basic idea behind the invariance principle and also worked out the asymptotic distribution of a number of key and interesting functionals of the partial sum process. Donsker (1951) provided the full generalization of the Erdos–Kac technique by providing explicit embeddings of the discrete sequence S_k/√n, k = 1, 2, ..., n, into a continuous-time stochastic process S_n(t) and by establishing the limiting distribution of a general continuous functional h(S_n(t)). In order to achieve this, it is necessary to use a continuous mapping theorem for metric spaces, as consideration of Euclidean spaces is no longer enough. It is also useful to exploit a property of the Brownian motion known as the Skorohod embedding theorem. We first describe this necessary background material.

Define

C[0, 1] = the class of all continuous real-valued functions on [0, 1], and
D[0, 1] = the class of all real-valued functions on [0, 1] that are right continuous and have a left limit at every point in [0, 1].

Given two functions f, g in either C[0, 1] or D[0, 1], let ρ(f, g) = sup_{0≤t≤1} |f(t) − g(t)| denote the supremum distance between f and g. We refer to ρ as the uniform metric. Both C[0, 1] and D[0, 1] are (complete) metric spaces with respect to the uniform metric ρ.

Suppose X_1, X_2, ... is an iid sequence of real-valued random variables with mean zero and variance one. Two common embeddings of the discrete sequence S_k/√n, k = 1, 2, ..., n, into a continuous-time process are the following:

S_{n,1}(t) = (1/√n) [S_⌊nt⌋ + {nt} X_{⌊nt⌋+1}],

and

S_{n,2}(t) = (1/√n) S_⌊nt⌋,

0 ≤ t ≤ 1. Here, ⌊·⌋ denotes the integer part and {·} the fractional part of a positive real.

The first one simply continuously interpolates between the values S_k/√n by drawing straight lines, but the second one is only right continuous, with jumps at the points t = k/n, k = 1, 2, ..., n. For certain specific applications, the second embedding is more useful. It is because of these jump discontinuities that Donsker needed to consider weak convergence in D[0, 1]. It led to some additional technical complexities.

The main idea from this point on is not difficult. One can produce a version of S_n(t), say S̃_n(t), such that S̃_n(t) is close to a sequence of Wiener processes W_n(t). Because S̃_n(t) ≈ W_n(t), if h(·) is a continuous functional with respect to the uniform metric, then one can expect that h(S̃_n(t)) ≈ h(W_n(t)) = h(W(t)) in distribution. S̃_n(t) being a version of S_n(t), h(S_n(t)) = h(S̃_n(t)) in distribution, and so h(S_n(t)) should be close to the fixed Brownian functional h(W(t)) in distribution, which is the question we wanted to answer.

The results leading to Donsker’s theorem are presented below.

Theorem 12.13 (Skorohod Embedding). Given a random variable X with mean zero and a finite variance σ², we can construct (on the same probability space) a standard Brownian motion W(t) starting at zero, and a stopping time τ with respect to W(t), such that E(τ) = σ² and X and the stopped Brownian motion W(τ) have the same distribution.

Theorem 12.14 (Convergence of the Partial Sum Process to Brownian Motion). Let S_n(t) = S_{n,1}(t) or S_{n,2}(t) as defined above. Then there exists a common probability space on which one can define Wiener processes W_n(t) starting at zero, and a sequence of processes {S̃_n(t)}, n ≥ 1, such that

(a) For each n, S_n(t) and S̃_n(t) are identically distributed as processes.
(b) sup_{0≤t≤1} |S̃_n(t) − W_n(t)| →_P 0.

We prove the last theorem, assuming the Skorohod embedding theorem. A proof of the Skorohod embedding theorem may be seen in Csorgo and Revesz (1981), or in Bhattacharya and Waymire (2007, p. 160).

Proof. We treat only the linearly interpolated process S_{n,1}(t), and simply call it S_n(t). To reduce notational clutter, we write as if the version S̃_n of S_n is S_n itself; the S̃_n notation is dropped in the proof of the theorem. Without loss of generality, we take E(X_1) = 0 and Var(X_1) = 1. First, by using the Skorohod embedding theorem, construct a stopping time τ_1 with respect to the process W(t), t ≥ 0, such that E(τ_1) = 1 and such that W(τ_1) =_L X_1. Using the strong Markov property of Brownian motion, W(t + τ_1) − W(τ_1) is also a Brownian motion on [0, ∞), independent of (τ_1, W(τ_1)), and we can now pick a stopping time, say τ'_2, with respect to this process, with the two properties E(τ'_2) = 1 and W(τ'_2) =_L X_2. Therefore, if we define τ_2 = τ_1 + τ'_2, then we have obtained a stopping time with respect to the original Brownian motion, with the properties that its expectation is 2, and τ_2 − τ_1 and τ_1 are independent. Proceeding in this way, we can construct an infinite sequence of stopping times 0 = τ_0 ≤ τ_1 ≤ τ_2 ≤ τ_3 ≤ ···, such that the τ_k − τ_{k−1} are iid with mean one, and the two discrete-time processes S_k and W(τ_k) have the same distribution. Moreover, by the usual SLLN,

τ_n / n = (1/n) Σ_{k=1}^n [τ_k − τ_{k−1}] →_{a.s.} 1,

from which it follows that

max_{0≤k≤n} |τ_k − k| / n →_P 0.


Set W_n(t) = W(nt)/√n, n ≥ 1. Therefore, in this notation, W(τ_k) = √n W_n(τ_k/n). Now fix ε > 0 and consider the event B_n = {sup_{0≤t≤1} |S_n(t) − W_n(t)| > ε}. We need to show that P(B_n) → 0.

Now, because S_n(t) is defined by linear interpolation, in order that B_n happen, at some t in [0, 1] we must have one of

|S_k/√n − W_n(t)| and |S_{k−1}/√n − W_n(t)|

larger than ε, where k is the unique k such that (k − 1)/n ≤ t < k/n. Our goal is to show that the probability of the union of these two events is small. Now use the fact that in distribution, S_k = W(τ_k) = √n W_n(τ_k/n), and so it will suffice to show that the probability of the union of the two events {|W_n(τ_k/n) − W_n(t)| > ε} and {|W_n(τ_{k−1}/n) − W_n(t)| > ε} is small. However, the union of these two events can only happen if either W_n differs by a large amount in a small interval, or one of the two time instants τ_k/n and τ_{k−1}/n is far from t. The first possibility has a small probability by the uniform continuity of the paths of a Brownian motion (on any compact interval), and the second possibility has a small probability by our earlier observation that max_{0≤k≤n} |τ_k − k|/n →_P 0. This implies that P(B_n) is small for all large n, as we wanted to show. ∎

This theorem implies the following important result by an application of the continuous mapping theorem, continuity being defined through the uniform metric on the space C[0, 1].

Theorem 12.15 (Donsker's Invariance Principle). Let h be a continuous functional with respect to the uniform metric on C[0, 1], and let S_n(t) be defined as either S_{n,1}(t) or S_{n,2}(t). Then h(S_n(t)) →_L h(W(t)) as n → ∞.

Example 12.6 (CLT Follows from Invariance Principle). The central limit theorem for iid random variables having a finite variance follows as a simple consequence of Donsker's invariance principle. Suppose X_1, X_2, ... are iid random variables with mean zero and variance 1. Let S_k = Σ_{i=1}^k X_i, k ≥ 1. Define the functional h(f) = f(1) on C[0, 1]. This is obviously a continuous functional on C[0, 1] with respect to the uniform metric ρ(f, g) = sup_{0≤t≤1} |f(t) − g(t)|. Therefore, with S_n(t) as the linearly interpolated partial sum process, it follows from the invariance principle that

h(S_n) = S_n(1) = Σ_{i=1}^n X_i / √n →_L h(W) = W(1) ~ N(0, 1),

which is the central limit theorem.

Example 12.7 (Maximum of a Random Walk). We apply the Donsker invariance principle to the problem of determining the limiting distribution of a functional of a random walk. Suppose $X_1, X_2, \ldots$ are iid random variables with mean zero and variance 1. Let $S_k = \sum_{i=1}^{k} X_i,\ k \geq 1$. We want to derive the limiting distribution of $G_n = \frac{\max_{1 \leq k \leq n} S_k}{\sqrt{n}}$. To derive its limiting distribution, define the functional $h(f) = \sup_{0 \leq t \leq 1} f(t)$ on $C[0, 1]$. This is a continuous functional on $C[0, 1]$ with respect to the uniform metric $\rho(f, g) = \sup_{0 \leq t \leq 1} |f(t) - g(t)|$. Further, notice that our statistic $G_n$ can be represented as $G_n = h(S_n)$, where $S_n$ is the linearly interpolated partial sum process. Therefore, by Donsker's invariance principle, $G_n = h(S_n) \stackrel{\mathcal{L}}{\Longrightarrow} h(W) = \sup_{0 \leq t \leq 1} W(t)$, where $W(t)$ is standard Brownian motion on $[0, 1]$. We know its CDF explicitly, namely, for $x > 0$, $P(\sup_{0 \leq t \leq 1} W(t) \leq x) = 2\Phi(x) - 1$. Thus, $P(G_n \leq x) \to 2\Phi(x) - 1$ for all $x > 0$.
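
The limit in this example is easy to see in a quick Monte Carlo experiment. The sketch below (sample sizes are illustrative; $\Phi$ is computed from math.erf) compares the empirical CDF of $G_n$ for Rademacher steps with $2\Phi(x) - 1$ at a few points.

```python
# Monte Carlo check of Example 12.7 (illustrative choices: Rademacher steps,
# n = 1000, 5000 replicates): compare P(G_n <= x) with 2*Phi(x) - 1.
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1000, 5000
X = rng.choice([-1.0, 1.0], size=(reps, n))
G = np.cumsum(X, axis=1).max(axis=1) / np.sqrt(n)

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
for x in [0.5, 1.0, 2.0]:
    print(f"x = {x}: empirical {(G <= x).mean():.3f}, limit {2 * Phi(x) - 1:.3f}")
```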

Example 12.8 (Sums of Powers of Partial Sums). Consider once again iid random variables $X_1, X_2, \ldots$ with zero mean and a unit variance. Fix a positive integer $m$ and consider the statistic $T_{m,n} = n^{-1-m/2} \sum_{k=1}^{n} S_k^m$. By direct integration of the polygonal curve $[S_n(t)]^m$, we find that $T_{m,n} = \int_0^1 [S_n(t)]^m\, dt$. This guides us to the functional $h(f) = \int_0^1 f^m(t)\, dt$. Because $[0, 1]$ is a compact interval, it is easy to verify that $h$ is a continuous functional on $C[0, 1]$ with respect to the uniform metric. Indeed, the continuity of $h(f)$ follows from simply the algebraic identity $|x^m - y^m| = |x - y|\,|x^{m-1} + x^{m-2} y + \cdots + y^{m-1}|$. It therefore follows from Donsker's invariance principle that $T_{m,n} \stackrel{\mathcal{L}}{\Longrightarrow} \int_0^1 W^m(t)\, dt$. At first glance it seems surprising that a nondegenerate limit distribution for partial sums of $S_k^m$ can exist with only two moments.
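
For $m = 2$, the limit $\int_0^1 W^2(t)\, dt$ has mean $\int_0^1 t\, dt = 1/2$ and, from integrating $\mathrm{Cov}(W^2(s), W^2(t)) = 2\min(s, t)^2$, variance $1/3$; a simulation of $T_{2,n}$ can be checked against these two values. The sketch below uses illustrative sample sizes.

```python
# Simulation of T_{m,n} = n^(-1-m/2) * sum_k S_k^m for m = 2, Rademacher steps.
# The Donsker limit int_0^1 W^2(t) dt has mean 1/2 and variance 1/3
# (the variance from integrating Cov(W^2(s), W^2(t)) = 2*min(s,t)^2).
# Sample sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
m, n, reps = 2, 1000, 5000
S = np.cumsum(rng.choice([-1.0, 1.0], size=(reps, n)), axis=1)
T = (S ** m).sum(axis=1) / n ** (1 + m / 2)
print("mean:", T.mean(), "(theory 1/2)   variance:", T.var(), "(theory 1/3)")
```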

12.7 Strong Invariance Principle and the KMT Theorem

In addition to the weak invariance principle described above, there are also strong invariance principles. The first strong invariance principle for partial sums was obtained in Strassen (1964). Since then, a large literature has developed, including for the multidimensional case. Good sources for information are Strassen (1967), Komlós et al. (1976), Major (1978), Csörgő and Révész (1981), and Einmahl (1987).

It would be helpful to first understand exactly what a strong invariance principle is meant to achieve. Suppose $X_1, X_2, \ldots$ is a zero mean, unit variance iid sequence of random variables. For $n \geq 1$, let $S_n$ denote the partial sum $\sum_{i=1}^{n} X_i$, and $S_n(t)$ the interpolated partial sum process with the special values $S_n(\frac{k}{n}) = \frac{S_k}{\sqrt{n}}$ for each $n$ and $1 \leq k \leq n$. In the process of proving Donsker's invariance principle, we have shown that we can construct (on a common probability space) a process $\tilde{S}_n(t)$ (which is equivalent to the original process $S_n(t)$ in distribution) and a single Wiener process $W(t)$ such that $\sup_{0 \leq t \leq 1} |\tilde{S}_n(t) - \frac{1}{\sqrt{n}} W(nt)| \stackrel{P}{\longrightarrow} 0$. Therefore,

$$\left| \tilde{S}_n(1) - \frac{1}{\sqrt{n}} W(n) \right| \stackrel{P}{\longrightarrow} 0 \;\Longrightarrow\; \frac{|\tilde{S}_n - W(n)|}{\sqrt{n}} \stackrel{P}{\longrightarrow} 0.$$


The strong invariance principle asks if we can find suitable functions $g(n)$ such that we can make the stronger statement $\frac{|\tilde{S}_n - W(n)|}{g(n)} \stackrel{a.s.}{\longrightarrow} 0$, and, as a next step, what the best possible choice for such a function $g$ is.

The exact statements of the strong invariance principle results require us to say that we can construct an equivalent process $\tilde{S}_n(t)$ and a Wiener process $W(t)$ on some probability space such that $\frac{|\tilde{S}_n - W(n)|}{g(n)} \stackrel{a.s.}{\longrightarrow} 0$ for some suitable function $g$. Due to the clumsiness of repeatedly having to mention these qualifications, we drop the $\tilde{S}_n$ notation and simply say $S_n(t)$, and we also do not mention that the processes have all been constructed on some new probability space. The important thing for applications is that we can use the approximations on the original process itself, by simply adopting the equivalent process on the new probability space.

Paradoxically, the strong invariance principle does not imply the weak invariance principle (i.e., Donsker's invariance principle) in general. This is because under the assumption of just the finiteness of the variance of the $X_i$, the best possible $g(n)$ increases faster than $\sqrt{n}$. On the other hand, if the common distribution of the $X_i$ satisfies more stringent moment conditions, then we can make $g(n)$ grow a lot more slowly, even more slowly than $\sqrt{n}$. The array of results that is available is bewildering, and they are all difficult to prove. We prefer to report a few results of great importance, including in particular the KMT theorem, due to Komlós et al. (1976).

Theorem 12.16. Let $X_1, X_2, \ldots$ be an iid sequence with $E(X_i) = 0,\ \mathrm{Var}(X_i) = 1$.

(a) There exists a Wiener process $W(t),\ t \geq 0$, starting at zero such that $\frac{S_n - W(n)}{\sqrt{n \log\log n}} \stackrel{a.s.}{\longrightarrow} 0$.

(b) The $\sqrt{n \log\log n}$ rate of part (a) cannot be improved, in the sense that given any nondecreasing sequence $a_n \to \infty$ (however slowly), there exists a CDF $F$ with zero mean and unit variance, such that with probability one, $\limsup_n a_n \frac{S_n - W(n)}{\sqrt{n \log\log n}} = \infty$, for any iid sequence $X_i$ following the CDF $F$, and any Wiener process $W(t)$.

(c) If we make the stronger assumption that $X_i$ has a finite mgf in some open neighborhood of zero, then the statement of part (a) can be improved to $|S_n - W(n)| = O(\log n)$ with probability one.

(d) (KMT Theorem) Specifically, if we make the stronger assumption that $X_i$ has a finite mgf in some open neighborhood of zero, then we can find suitable positive constants $C, K, \lambda$ such that for any real number $x$ and any given $n$,

$$P\left( \max_{1 \leq k \leq n} |S_k - W(k)| \geq C \log n + x \right) \leq K e^{-\lambda x},$$

where the constants $C, K, \lambda$ depend only on the common CDF of the $X_i$.

Remark. The KMT theorem is widely regarded as one of the major advances in the area of invariance principles and central limit problems. One should note that the inequality given in the above theorem has a qualitative nature attached to it, as we can only use the inequality with constants $C, K, \lambda$ that are known to exist, depending on the underlying $F$. Refinements of the version of the inequality given above are available. We refer to Csörgő and Révész (1981) for such refinements and a detailed general treatment of the strong invariance principle.
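
The KMT coupling itself rests on a dyadic conditional-quantile construction that does not reduce to a few lines of code. What can be watched numerically is how much worse the Skorohod coupling of Theorem 12.14 is: for Rademacher steps, the a.s. error $\max_{k \leq n} |S_k - W(k)|$ of that embedding is classically known to grow at roughly the $n^{1/4}$ rate (up to logarithmic factors), far above the $O(\log n)$ of the KMT construction. The sketch below (grid step, horizon, and search window are illustrative choices) tracks this error.

```python
# Tracks max_{k <= n} |S_k - W(k)| under the explicit Skorohod coupling for
# Rademacher steps (NOT the KMT coupling).  Expected growth is roughly n^(1/4)
# up to log factors.  Grid step, horizon, and search window are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def coupling_error(n_steps, dt=1e-3, window_time=50.0):
    n_grid = int(2 * n_steps / dt)             # simulate W on [0, 2 * n_steps]
    W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n_grid))))
    win = int(window_time / dt)                # a unit-band exit occurs well within this
    idx, s, err = 0, 0.0, 0.0
    for k in range(1, n_steps + 1):
        base = W[idx]
        seg = np.abs(W[idx + 1: idx + 1 + win] - base) >= 1.0
        hit = int(np.argmax(seg))
        if not seg[hit]:
            raise RuntimeError("no exit in window; enlarge window_time")
        idx += hit + 1                         # idx * dt is tau_k
        s += np.sign(W[idx] - base)            # S_k = W(tau_k)
        err = max(err, abs(s - W[int(round(k / dt))]))
    return err

for n in [100, 400, 1600]:
    e = coupling_error(n)
    print(f"n = {n}: max coupling error {e:.2f}, error / n^0.25 = {e / n ** 0.25:.2f}")
```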

12.8 Brownian Motion with Drift and Ornstein–Uhlenbeck Process

We finish with two special processes derived from standard Brownian motion. Both are important in applications.

Definition 12.19. Let $W(t),\ t \geq 0$, be standard Brownian motion starting at zero. Fix $\mu \in \mathbb{R}$ and $\sigma > 0$. Then the process $X(t) = \mu t + \sigma W(t),\ t \geq 0$, is called Brownian motion with drift $\mu$ and diffusion coefficient $\sigma$. It is clear that it inherits the major path properties of standard Brownian motion, such as nondifferentiability at all $t$ with probability one, the independent increments property, and the Markov property. Also, clearly, for fixed $t$, $X(t) \sim N(\mu t, \sigma^2 t)$.

12.8.1 Negative Drift and Density of Maximum

There are, however, also some important differences when a drift is introduced. For example, unless $\mu = 0$, the reflection principle no longer holds, and consequently one cannot derive the distribution of the running maximum $M(t) = \sup_{0 \leq s \leq t} X(s)$ by using symmetry arguments. If $\mu \geq 0$, then it is not meaningful to ask for the distribution of the maximum over all $t > 0$. However, if $\mu < 0$, then the process has a tendency to drift off towards negative values, and in that case the maximum in fact does have a nontrivial distribution. We derive the distribution of the maximum when $\mu < 0$ by using a result on a particular first passage time of the process.

Theorem 12.17. Let $X(t),\ t \geq 0$, be Brownian motion starting at zero, with drift $\mu < 0$ and diffusion coefficient $\sigma$. Fix $a < 0 < b$, and let

$$T_{a,b} = \min\left[ \inf\{t > 0 : X(t) = a\},\ \inf\{t > 0 : X(t) = b\} \right],$$

the first time $X(t)$ reaches either $a$ or $b$. Then,

$$P(X_{T_{a,b}} = b) = \frac{e^{-2\mu a/\sigma^2} - 1}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}.$$

A proof of this theorem can be seen in Karlin and Taylor (1975, p. 361). By using this result, we can derive the distribution of $\sup_{t>0} X(t)$ in the case $\mu < 0$.

Theorem 12.18 (The Maximum of Brownian Motion with a Negative Drift). If $X(t),\ t \geq 0$, is Brownian motion starting at zero, with drift $\mu < 0$ and diffusion coefficient $\sigma$, then $\sup_{t>0} X(t)$ is distributed as an exponential with mean $\frac{\sigma^2}{2|\mu|}$.

Proof. In the theorem stated above, by letting $a \to -\infty$, we get

$$P(X(t) \text{ ever reaches the level } b > 0) = e^{2\mu b/\sigma^2} = e^{-2|\mu| b/\sigma^2}.$$

But this is the probability that an exponential variable with mean $\frac{\sigma^2}{2|\mu|}$ is larger than $b$. On the other hand, $P(X(t) \text{ ever reaches the level } b > 0)$ is the same as $P(\sup_{t>0} X(t) \geq b)$. Therefore, $\sup_{t>0} X(t)$ must have an exponential distribution with mean $\frac{\sigma^2}{2|\mu|}$. ∎

Example 12.9 (Probability That Brownian Motion Does Not Hit a Line). Consider standard Brownian motion $W(t)$ starting at zero on $[0, \infty)$, and consider a straight line $L$ with the equation $y = a + bt,\ a, b > 0$. Because $W(0) = 0 < a$, and paths of $W(t)$ are continuous, the probability that $W(t)$ does not hit the line $L$ is the same as $P(W(t) < a + bt\ \forall\, t > 0)$. However, if we define a new Brownian motion (with drift) $X(t)$ as $X(t) = W(t) - bt$, then

$$P(W(t) < a + bt\ \forall\, t > 0) = P\left( \sup_{t>0} X(t) < a \right) = 1 - e^{-2ab},$$

by our theorem above on the maximum of a Brownian motion with a negative drift. We notice that the probability that $W(t)$ does not hit $L$ is monotone increasing in each of $a, b$, as it should be.
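
Theorem 12.18 and this example lend themselves to a direct Monte Carlo check; the sketch below simulates $W$ on a fine grid over a finite horizon (illustrative parameters), which biases the estimated hitting probability slightly downward.

```python
# Monte Carlo check of Example 12.9: P(W ever hits y = a + b*t) = e^(-2ab).
# The finite horizon T and the grid step dt both bias the estimate slightly
# downward; all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(4)
a, b, T, dt, reps = 1.0, 1.0, 20.0, 1e-3, 2000
n = int(T / dt)
t = dt * np.arange(1, n + 1)
hits = sum(bool(np.any(np.cumsum(np.sqrt(dt) * rng.standard_normal(n)) >= a + b * t))
           for _ in range(reps))
print("simulated hit probability:", hits / reps, "  theory:", np.exp(-2 * a * b))
```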

12.8.2 Transition Density and the Heat Equation

If we consider Brownian motion starting at some number $x$, and with drift $\mu$ and diffusion coefficient $\sigma$, then by simple calculations, the conditional distribution of $X(t)$ given that $X(0) = x$ is $N(x + \mu t, \sigma^2 t)$, which has the density

$$p_t(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma\sqrt{t}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}}.$$

This is called the transition density of the process. The transition density satisfies a very special partial differential equation, which we now prove.

By direct differentiation,

$$\frac{\partial}{\partial t} p_t(x, y) = \frac{(x - y)^2 - \mu^2 t^2 - \sigma^2 t}{2\sqrt{2\pi}\,\sigma^3 t^{5/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}};$$

$$\frac{\partial}{\partial y} p_t(x, y) = \frac{x - y + \mu t}{\sqrt{2\pi}\,\sigma^3 t^{3/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}};$$

$$\frac{\partial^2}{\partial y^2} p_t(x, y) = \frac{(x - y + \mu t)^2 - \sigma^2 t}{\sqrt{2\pi}\,\sigma^5 t^{5/2}}\, e^{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}}.$$

On using these three expressions, it follows that the transition density $p_t(x, y)$ satisfies the partial differential equation

$$\frac{\partial}{\partial t} p_t(x, y) = -\mu \frac{\partial}{\partial y} p_t(x, y) + \frac{\sigma^2}{2} \frac{\partial^2}{\partial y^2} p_t(x, y).$$

This is the drift-diffusion equation in one dimension. In the particular case that $\mu = 0$ (no drift) and $\sigma = 1$, the equation reduces to the celebrated heat equation

$$\frac{\partial}{\partial t} p_t(x, y) = \frac{1}{2} \frac{\partial^2}{\partial y^2} p_t(x, y).$$
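
The drift-diffusion equation can be verified numerically in a few lines by central finite differences; all parameter values in the sketch below are illustrative (setting mu = 0 and sigma = 1 checks the heat equation as a special case).

```python
# Finite-difference check of the drift-diffusion equation for the transition
# density p_t(x, y); parameter values are illustrative.
import numpy as np

mu, sigma, x, t = -0.7, 1.3, 0.2, 0.9

def p(tt, y):
    return np.exp(-(y - x - mu * tt) ** 2 / (2 * sigma ** 2 * tt)) / (sigma * np.sqrt(2 * np.pi * tt))

y = np.linspace(-3.0, 3.0, 601)
h, k = 1e-5, 1e-4                   # time step and space step for central differences
dpdt = (p(t + h, y) - p(t - h, y)) / (2 * h)
dpdy = (p(t, y + k) - p(t, y - k)) / (2 * k)
d2pdy2 = (p(t, y + k) - 2 * p(t, y) + p(t, y - k)) / k ** 2
rhs = -mu * dpdy + 0.5 * sigma ** 2 * d2pdy2
print("max |d/dt p - rhs|:", np.abs(dpdt - rhs).max())   # small, at discretization level
```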

Returning to the drift-diffusion equation for the transition density in general, if we now take a general function $f(x, y)$ that is twice continuously differentiable in $y$ and is bounded in absolute value by $K e^{c|y|}$ for some finite $K, c > 0$, then integration by parts in the drift-diffusion equation produces the following expectation identity, which we state as a theorem.

Theorem 12.19. Let $x, \mu$ be any real numbers, and $\sigma > 0$. Suppose $Y \sim N(x + \mu t, \sigma^2 t)$, and $f(x, y)$ is twice continuously differentiable in $y$ such that for some $0 < K, c < \infty$, $|f(x, y)| \leq K e^{c|y|}$ for all $y$. Then,

$$\frac{\partial}{\partial t} E_x[f(x, Y)] = \mu\, E_x\left[ \frac{\partial}{\partial y} f(x, Y) \right] + \frac{\sigma^2}{2}\, E_x\left[ \frac{\partial^2}{\partial y^2} f(x, Y) \right].$$

This identity and a multidimensional version of it have been used in Brown et al. (2006) to derive various results in statistical decision theory.
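
The identity can be checked numerically with Gauss-Hermite quadrature; the sketch below uses the illustrative test function $f(y) = y^4$, which satisfies the growth condition, and a central difference in $t$.

```python
# Numerical check of the expectation identity of Theorem 12.19 using
# Gauss-Hermite quadrature; the test function f(y) = y^4 is an illustrative
# choice satisfying the growth condition.
import numpy as np

mu, sigma, x, t = 0.4, 1.1, -0.3, 0.8
nodes, weights = np.polynomial.hermite.hermgauss(80)

def expect(g, tt):
    """E[g(Y)] for Y ~ N(x + mu*tt, sigma^2 * tt) via Gauss-Hermite quadrature."""
    m, s = x + mu * tt, sigma * np.sqrt(tt)
    return np.sum(weights * g(m + np.sqrt(2.0) * s * nodes)) / np.sqrt(np.pi)

f = lambda y: y ** 4
fy = lambda y: 4 * y ** 3
fyy = lambda y: 12 * y ** 2

h = 1e-6
lhs = (expect(f, t + h) - expect(f, t - h)) / (2 * h)
rhs = mu * expect(fy, t) + 0.5 * sigma ** 2 * expect(fyy, t)
print(lhs, rhs)   # the two sides should agree to several decimal places
```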

12.8.3 The Ornstein–Uhlenbeck Process

The covariance function of standard Brownian motion $W(t)$ is $\mathrm{Cov}(W(s), W(t)) = \min(s, t)$. Therefore, if we scale by $\sqrt{t}$, and let $X(t) = \frac{W(t)}{\sqrt{t}},\ t > 0$, we get that $\mathrm{Cov}(X(s), X(t)) = \sqrt{\frac{\min(s,t)}{\max(s,t)}} = \sqrt{\frac{s}{t}}$, if $s \leq t$. Therefore, the covariance is a function of only the time lag on a logarithmic time scale. This motivates the definition of the Ornstein–Uhlenbeck process as follows.

Definition 12.20. Let $W(t)$ be standard Brownian motion starting at zero, and let $\alpha > 0$ be a fixed constant. Then $X(t) = e^{-\alpha t/2}\, W(e^{\alpha t}),\ -\infty < t < \infty$, is called the Ornstein–Uhlenbeck process. The most general Ornstein–Uhlenbeck process is defined as

$$X(t) = \mu + \frac{\sigma}{\sqrt{\alpha}}\, e^{-\alpha t/2}\, W(e^{\alpha t}), \qquad -\infty < \mu < \infty,\ \alpha, \sigma > 0.$$

In contrast to the Wiener process, the Ornstein–Uhlenbeck process has a locally time-dependent drift. If the present state of the process is larger than $\mu$, the global mean, then the drift drags the process back towards $\mu$, and if the present state of the process is smaller than $\mu$, then it does the reverse. The $\alpha$ parameter controls this tendency to return to the grand mean. The third parameter $\sigma$ controls the variability.

Theorem 12.20. Let $X(t)$ be a general Ornstein–Uhlenbeck process. Then, $X(t)$ is a stationary Gaussian process with $E[X(t)] = \mu$, and $\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}|s - t|}$.

Proof. It is obvious that $E[X(t)] = \mu$. By definition of $X(t)$,

$$\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}(s + t)} \min(e^{\alpha s}, e^{\alpha t}) = \frac{\sigma^2}{\alpha}\, e^{-\frac{\alpha}{2}(s + t)}\, e^{\alpha \min(s, t)}$$

$$= \frac{\sigma^2}{\alpha}\, e^{\frac{\alpha}{2} \min(s, t)}\, e^{-\frac{\alpha}{2} \max(s, t)} = \frac{\sigma^2}{\alpha}\, e^{-(\alpha/2)|s - t|},$$

and inasmuch as $\mathrm{Cov}(X(s), X(t))$ is a function of only $|s - t|$, it follows that it is stationary. ∎
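
Both the time-changed representation of Definition 12.20 and the covariance formula just proved can be exercised in a short simulation: the pair $(X(s), X(t))$ can be sampled exactly through $W$, since $W(u)$ and $W(v)$ for $u < v$ are built from two independent normals. Parameter values in the sketch below are illustrative.

```python
# Sample (X(s), X(t)) exactly via the time-changed Brownian representation of
# Definition 12.20 and compare the empirical covariance with
# (sigma^2/alpha) * exp(-alpha*|s - t|/2).  Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(5)
mu, alpha, sigma = 2.0, 1.5, 0.8
s, t, reps = 0.7, 1.9, 200_000
u, v = np.exp(alpha * s), np.exp(alpha * t)           # time-changed instants, u < v
Wu = np.sqrt(u) * rng.standard_normal(reps)           # W(u)
Wv = Wu + np.sqrt(v - u) * rng.standard_normal(reps)  # W(v) = W(u) + indep. increment
Xs = mu + sigma / np.sqrt(alpha) * np.exp(-alpha * s / 2) * Wu
Xt = mu + sigma / np.sqrt(alpha) * np.exp(-alpha * t / 2) * Wv
print("empirical:", np.cov(Xs, Xt)[0, 1],
      "  theory:", sigma ** 2 / alpha * np.exp(-alpha * abs(t - s) / 2))
```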

and inasmuch as Cov.X.s/; X.t// is a function of only js t j, it follows that it isstationary. utExample 12.10 (Convergence of Integrated Ornstein–Uhlenbeck to Brownian Mo-tion). Consider an Ornstein–Uhlenbeck process X.t/ with parameters ; ’, and2. In a suitable asymptotic sense, the integrated Ornstein–Uhlenbeck processconverges to a Brownian motion with drift and an appropriate diffusion coef-ficient; the diffusion coefficient can be adjusted to be one. Towards this, defineY.t/ D R t

0X.u/du. This is clearly also a Gaussian process. We show in this ex-

ample that if 2; ’ ! 1 in such a way that 42

’2 ! c2; 0 < c < 1, thenCov.Y.s/; Y.t// ! c2 min.s; t/. In other words, in the asymptotic paradigm where; ’ ! 1, but are of comparable order, the integrated Ornstein–Uhlenbeck processY.t/ is approximately the same as a Brownian motion with some drift and somediffusion coefficient, in the sense of distribution.

We directly calculate Cov.Y.s/; Y.t//. There is no loss of generality in taking

to be zero. Take 0 < s t < 1. Then

Cov.Y.s/; Y.t// DZ t

0

Z s

0

EŒX.u/X.v/dudv D 2

Z t

0

Z s

0

e ’2 juvjdudv

D 2

Z s

0

Z v

0

e ’2 juvjdudv C

Z s

0

Z s

ve ’

2 juvjdudv

CZ t

s

Z s

0

e ’2 juvjdudv

D 2

Z s

0

Z v

0

e ’2

.vu/dudv CZ s

0

Z s

ve ’

2.uv/dudv

CZ t

s

Z s

0

e ’2

.vu/dudv

D 2

4

’2

h’s C e’s=2 C e’t=2 e’.ts/=2

i;

on doing the three integrals in the line before, and on adding them.

Page 248: Statistical Genetics and Gaussian Stochastic Processes

Exercises 431

If ’; 2 ! 1, and 42

’2 converges to some finite and nonzero constant c2, thenfor any s; t; 0 < s < t , the derived expression for Cov.Y.s/; Y.t// ! c2s Dc2min.s; t/, which is the covariance function of Brownian motion with diffusioncoefficient c.
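
The limit can also be watched numerically by evaluating the closed-form covariance along the asymptotic regime $4\sigma^2/\alpha^2 = c^2$; the values in the sketch below are illustrative.

```python
# Evaluate the closed-form Cov(Y(s), Y(t)) of Example 12.10 for growing alpha
# with 4*sigma^2/alpha^2 = c^2 held fixed; it approaches c^2 * min(s, t).
# The choices c = 1, s = 1, t = 2 are illustrative.
import numpy as np

def cov_Y(s, t, alpha, sigma2):
    return 4 * sigma2 / alpha ** 3 * (alpha * s + np.exp(-alpha * s / 2)
        + np.exp(-alpha * t / 2) - np.exp(-alpha * (t - s) / 2) - 1)

c, s, t = 1.0, 1.0, 2.0
for alpha in [1.0, 10.0, 100.0, 1000.0]:
    sigma2 = c ** 2 * alpha ** 2 / 4
    print(alpha, cov_Y(s, t, alpha, sigma2))   # tends to c^2 * min(s, t) = 1
```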

The Ornstein–Uhlenbeck process enjoys another important property besides stationarity: it is also a Markov process. It is essentially the only stationary Gaussian process that is also Markov and has continuous paths. This property explains some of the popularity of the Ornstein–Uhlenbeck process in fitting models to real data.

Exercises

Exercise 12.1 (Simple Processes). Let $X_0, X_1, X_2, \ldots$ be a sequence of iid standard normal variables, and $W(t),\ t \geq 0$, a standard Brownian motion independent of the $X_i$ sequence, starting at zero. Determine which of the following processes are Gaussian, and which are stationary.

(a) $X(t) \equiv \frac{X_1 + X_2}{\sqrt{2}}$.
(b) $X(t) \equiv \left| \frac{X_1 + X_2}{\sqrt{2}} \right|$.
(c) $X(t) = \frac{t X_1 X_2}{\sqrt{X_1^2 + X_2^2}}$.
(d) $X(t) = \sum_{j=0}^{k} [X_{2j} \cos jt + X_{2j+1} \sin jt]$.
(e) $X(t) = t^2 W(\frac{1}{t^2}),\ t > 0$, and $X(0) = 0$.
(f) $X(t) = W(t |X_0|)$.

Exercise 12.2. Let $X(t) = \sin(tU)$, where $U \sim U[0, 2\pi]$.

(a) Suppose the time parameter $t$ belongs to $T = \{1, 2, \ldots\}$. Is $X(t)$ stationary?
(b) Suppose the time parameter $t$ belongs to $T = [0, \infty)$. Is $X(t)$ stationary?

Exercise 12.3. Suppose $W(t),\ t \geq 0$, is a standard Brownian motion starting at zero, and $Y \sim N(0, 1)$, independent of the $W(t)$ process. Let $X(t) = Y f(t) + W(t)$, where $f$ is a deterministic function. Is $X(t)$ stationary?

Exercise 12.4 (Increments of Brownian Motion). Suppose $W(t),\ t \geq 0$, is a standard Brownian motion starting at zero, and $Y$ is a positive random variable independent of the $W(t)$ process. Let $X(t) = W(t + Y) - W(t)$. Is $X(t)$ stationary?

Exercise 12.5. Suppose $W(t),\ t \geq 0$, is a standard Brownian motion starting at zero. Let $X(n) = W(1) + W(2) + \cdots + W(n),\ n \geq 1$. Find the covariance function of the process $X(n),\ n \geq 1$.

Exercise 12.6 (Moments of the Hitting Time). Suppose $W(t),\ t \geq 0$, is a standard Brownian motion starting at zero. Fix $a > 0$ and let $T_a$ be the first time $W(t)$ hits $a$. Characterize all $\alpha$ such that $E[T_a^\alpha] < \infty$.

Exercise 12.7 (Hitting Time of the Positive Quadrant). Suppose $W(t),\ t \geq 0$, is a standard Brownian motion starting at zero. Let $T = \inf\{t > 0 : W(t) > 0\}$. Show that with probability one, $T = 0$.

Exercise 12.8. Suppose $W(t),\ t \geq 0$, is standard Brownian motion starting at zero. For $z > 0$, let $T_z$ be the first time $W(t)$ hits $z$. Let $0 < a < b < \infty$. Find $E(T_b \mid T_a = t)$.

Exercise 12.9 (Running Maximum of Brownian Motion). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \leq s \leq t} W(s)$. Evaluate $P(M(1) = M(2))$.

Exercise 12.10. Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$. Let $T > 0$ be a fixed finite time instant. Find the density of the first zero of $W(t)$ after the time $t = T$. Does it have a finite mean?

Exercise 12.11 (Integrated Brownian Motion). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\, ds$. Identify explicit positive constants $K, \alpha$ such that for any $t, c > 0$, $P(|X(t)| \geq c) \leq \frac{K t^\alpha}{c}$.

Exercise 12.12 (Integrated Brownian Motion). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\, ds$. Prove that for any fixed $t$, $X(t)$ has a finite mgf everywhere, and use it to derive the fourth moment of $X(t)$.

Exercise 12.13 (Integrated Brownian Motion). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$. Let $X(t) = \int_0^t W(s)\, ds$. Find

(a) $E(X(t) \mid W(t) = w)$.
(b) $E(W(t) \mid X(t) = x)$.
(c) The correlation between $X(t)$ and $W(t)$.
(d) $P(X(t) > 0, W(t) > 0)$ for a given $t$.

Exercise 12.14 (Application of Reflection Principle). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \leq s \leq t} W(s)$. Prove that $P(W(t) \leq w, M(t) \geq x) = 1 - \Phi\left( \frac{2x - w}{\sqrt{t}} \right),\ x \geq w,\ x \geq 0$. Hence, derive the joint density of $W(t)$ and $M(t)$.

Exercise 12.15 (Current Value and Current Maximum). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \leq s \leq t} W(s)$. Find $P(W(t) = M(t))$ and find its limit as $t \to \infty$.

Exercise 12.16 (Current Value and Current Maximum). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$ and $M(t) = \sup_{0 \leq s \leq t} W(s)$. Find $E(M(t) \mid W(t) = w)$.

Exercise 12.17 (Predicting the Next Value). Let $W(t),\ t \geq 0$, be standard Brownian motion on $[0, \infty)$ and let $\bar{W}(t) = \frac{1}{t} \int_0^t W(s)\, ds$ be the current running average.

(a) Find $\hat{W}(t) = E(W(t) \mid \bar{W}(t) = w)$.
(b) Find the prediction error $E[|W(t) - \hat{W}(t)|]$.

Exercise 12.18 (Zero-Free Intervals). Let $W(t),\ t \geq 0$, be standard Brownian motion, and $0 < s < t < u < \infty$. Find the conditional probability that $W(t)$ has no zeroes in $[s, u]$ given that it has no zeroes in $[s, t]$.

Exercise 12.19 (Application of the LIL). Let $W(t),\ t \geq 0$, be standard Brownian motion. Let $X(t) = \frac{W(t)}{\sqrt{t}},\ t > 0$. Let $K, M$ be any two positive numbers. Show that, with probability one, infinitely often $X(t) > K$, and infinitely often $X(t) < -M$.

Exercise 12.20. Let $W(t),\ t \geq 0$, be standard Brownian motion, and $0 < s < t < u < \infty$. Find the conditional expectation of $W(t)$ given $W(s) = x,\ W(u) = y$.

Hint: Consider first the conditional expectation of $W(t)$ given $W(0) = W(1) = 0$.

Exercise 12.21 (Reflected Brownian Motion Is Markov). Let $W(t),\ t \geq 0$, be standard Brownian motion starting at zero. Show that $|W(t)|$ is a Markov process.

Exercise 12.22 (Adding a Function to Brownian Motion). Let $W(t)$ be standard Brownian motion on $[0, 1]$ and $f$ a general continuous function on $[0, 1]$. Show that with probability one, $X(t) = W(t) + f(t)$ is everywhere nondifferentiable.

Exercise 12.23 (No Intervals of Monotonicity). Let $W(t),\ t \geq 0$, be standard Brownian motion, and $0 < a < b < \infty$ two fixed positive numbers. Show, by using the independent increments property, that with probability one, $W(t)$ is nonmonotone on $[a, b]$.

Exercise 12.24 (Two-Dimensional Brownian Motion). Show that two-dimensional standard Brownian motion is a Markov process.

Exercise 12.25 (An Interesting Connection to Cauchy Distribution). Let $W_1(t), W_2(t)$ be two independent standard Brownian motions on $[0, \infty)$ starting at zero. Fix a number $a > 0$ and let $T_a$ be the first time $W_1(t)$ hits $a$. Find the distribution of $W_2(T_a)$.

Exercise 12.26 (Time Spent in a Nonempty Set). Let $W(t),\ t \geq 0$, be a two-dimensional standard Brownian motion starting at zero, and let $C$ be a nonempty open set of $\mathbb{R}^2$. Show that with probability one, the Lebesgue measure of the set of times $t$ at which $W(t)$ belongs to $C$ is infinite.

Exercise 12.27 (Difference of Two Brownian Motions). Let $W_1(t), W_2(t),\ t \geq 0$, be two independent Brownian motions, and let $c_1, c_2$ be two constants. Show that $X(t) = c_1 W_1(t) + c_2 W_2(t)$ is another Brownian motion. Identify any drift and diffusion parameters.

Exercise 12.28 (Intersection of Brownian Motions). Let $W_1(t), W_2(t),\ t \geq 0$, be two independent standard Brownian motions starting at zero. Let $C = \{t > 0 : W_1(t) = W_2(t)\}$.

(a) Is $C$ nonempty with probability one?
(b) If $C$ is nonempty, is it a finite set, or is it an infinite set with probability one?
(c) If $C$ is an infinite set with probability one, is its Lebesgue measure zero or greater than zero?
(d) Does $C$ have accumulation points? Does it have accumulation points with probability one?

Exercise 12.29 (The $L_1$ Norm of Brownian Motion). Let $W(t),\ t \geq 0$, be standard Brownian motion starting at zero. Show that with probability one, $I = \int_0^\infty |W(t)|\, dt = \infty$.

Exercise 12.30 (Median Local Time). Find the median of the local time at the level $x$ up to time $t$ of a standard Brownian motion on $[0, \infty)$ starting at zero.

Caution: For $x \neq 0$, the local time has a mixed distribution.

Exercise 12.31 (Monotonicity of the Mean Local Time). Give an analytical proof that the expected value of the local time of a standard Brownian motion starting at zero is strictly decreasing in the spatial coordinate $x$.

Exercise 12.32 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with the common distribution $P(X_i = \pm 1) = \frac{1}{2}$. Let $S_k = \sum_{i=1}^{k} X_i,\ k \geq 1$, and $\Pi_n = \frac{1}{n} \#\{k \leq n : S_k > 0\}$. Find the limiting distribution of $\Pi_n$ by applying Donsker's invariance principle.

Exercise 12.33 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^{k} X_i,\ k \geq 1$, and $M_n = n^{-1/2} \max_{1 \leq k \leq n} S_k$. Find the limiting distribution of $M_n$ by applying Donsker's invariance principle.

Exercise 12.34 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^{k} X_i,\ k \geq 1$, and $A_n = n^{-1/2} \max_{1 \leq k \leq n} |S_k|$. Find the limiting distribution of $A_n$ by applying Donsker's invariance principle.

Exercise 12.35 (Application of Invariance Principle). Let $X_1, X_2, \ldots$ be iid variables with zero mean and a finite variance $\sigma^2$. Let $S_k = \sum_{i=1}^{k} X_i,\ k \geq 1$, and $T_n = n^{-3/2} \sum_{k=1}^{n} |S_k|$. Find the limiting distribution of $T_n$ by applying Donsker's invariance principle.

Exercise 12.36 (Distributions of Some Functionals). Let $W(t),\ t \geq 0$, be standard Brownian motion starting at zero. Find the density of each of the following functionals of the $W(t)$ process:

(a) $\sup_{t>0} W^2(t)$;
(b) $\frac{\int_0^1 W(t)\, dt}{W(\frac{1}{2})}$;

Hint: The terms in the quotient are jointly normal with zero means.

(c) $\sup_{t>0} \frac{W(t)}{a + bt},\ a, b > 0$.

Exercise 12.37 (Ornstein–Uhlenbeck Process). Let $X(t)$ be a general Ornstein–Uhlenbeck process and $s < t$ two general times. Find the expected value of $|X(t) - X(s)|$.

Exercise 12.38. Let $X(t)$ be a general Ornstein–Uhlenbeck process and $Y(t) = \int_0^t X(u)\, du$. Find the correlation between $Y(s)$ and $Y(t)$ for $0 < s < t < \infty$, and find its limit when $\sigma, \alpha \to \infty$ and $\frac{\alpha}{\sigma} \to 1$.

Exercise 12.39. Let $W(t),\ t \geq 0$, be standard Brownian motion starting at zero, and $0 < s < t < \infty$ two general times. Find an expression for $P(W(t) > 0 \mid W(s) > 0)$, and its limit when $s$ is held fixed and $t \to \infty$.

Exercise 12.40 (Application of the Heat Equation). Let $Y \sim N(0, \sigma^2)$ and $f(Y)$ a twice continuously differentiable convex function of $Y$. Show that $E[f(Y)]$ is increasing in $\sigma$, assuming that the expectation exists.

References

Bhattacharya, R.N. and Waymire, E. (2007). A Basic Course in Probability Theory, Springer, New York.
Bhattacharya, R.N. and Waymire, E. (2009). Stochastic Processes with Applications, SIAM, Philadelphia.
Billingsley, P. (1968). Convergence of Probability Measures, John Wiley, New York.
Breiman, L. (1992). Probability, Addison-Wesley, New York.
Brown, L. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems, Ann. Math. Statist., 42, 855-903.
Brown, L., DasGupta, A., Haff, L.R., and Strawderman, W.E. (2006). The heat equation and Stein's identity: Connections, applications, J. Statist. Plann. Inference, 136, 2254-2278.
Csörgő, M. (2002). A glimpse of the impact of Pál Erdős on probability and statistics, Canad. J. Statist., 30, 4, 493-556.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics, Academic Press, New York.
Csörgő, S. and Hall, P. (1984). The KMT approximations and their applications, Austr. J. Statist., 26, 2, 189-218.
DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer, New York.
Donsker, M. (1951). An invariance principle for certain probability limit theorems, Mem. Amer. Math. Soc., 6.
Durrett, R. (2001). Essentials of Stochastic Processes, Springer, New York.
Einmahl, U. (1987). Strong invariance principles for partial sums of independent random vectors, Ann. Prob., 15, 4, 1419-1440.
Erdős, P. and Kac, M. (1946). On certain limit theorems of the theory of probability, Bull. Amer. Math. Soc., 52, 292-302.
Freedman, D. (1983). Brownian Motion and Diffusion, Springer, New York.
Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Applications, Academic Press, New York.
Karatzas, I. and Shreve, S. (1991). Brownian Motion and Stochastic Calculus, Springer, New York.
Karlin, S. and Taylor, H. (1975). A First Course in Stochastic Processes, Academic Press, New York.
Komlós, J., Major, P., and Tusnády, G. (1975). An approximation of partial sums of independent rvs and the sample df: I, Zeit. für Wahr. Verw. Geb., 32, 111-131.
Komlós, J., Major, P., and Tusnády, G. (1976). An approximation of partial sums of independent rvs and the sample df: II, Zeit. für Wahr. Verw. Geb., 34, 33-58.
Körner, T. (1986). Fourier Analysis, Cambridge University Press, Cambridge, UK.
Lawler, G. (2006). Introduction to Stochastic Processes, Chapman and Hall, New York.
Major, P. (1978). On the invariance principle for sums of iid random variables, J. Mult. Anal., 8, 487-517.
Mörters, P. and Peres, Y. (2010). Brownian Motion, Cambridge University Press, Cambridge, UK.
Pyke, R. (1984). Asymptotic results for empirical and partial sum processes: A review, Canad. J. Statist., 12, 241-264.
Resnick, S. (1992). Adventures in Stochastic Processes, Birkhäuser, Boston.
Révész, P. (2005). Random Walk in Random and Nonrandom Environments, World Scientific Press, Singapore.
Revuz, D. and Yor, M. (1994). Continuous Martingales and Brownian Motion, Springer, Berlin.
Strassen, V. (1964). An invariance principle for the law of the iterated logarithm, Zeit. Wahr. Verw. Geb., 3, 211-226.
Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales, Proc. Fifth Berkeley Symp., 1, 315-343, University of California Press, Berkeley.
Whitt, W. (1980). Some useful functions for functional limit theorems, Math. Oper. Res., 5, 67-85.