
Population Dynamics and Random Genealogies¹

Amaury Lambert²

July 2008

—This work is dedicated to the victims of the violent repressions that took place in Mexico in 2006, first in San Salvador Atenco on the 3rd and 4th of May, and then in Oaxaca from June to December—

¹This document was intended primarily to support attendants of the ‘IX Simposio de Probabilidad y Procesos Estocásticos’ held on November 20–24, 2006, in Guanajuato, Mexico (CIMAT).

²Université Paris 6 ‘Pierre et Marie Curie’ and École Normale Supérieure, Paris.


Abstract. We start with stochastic models of genealogies in discrete time, distinguishing models where the population size is fixed (models of Cannings, Wright–Fisher and Moran; compositions of bridges) from models where the population size fluctuates randomly (processes of Bienaymé–Galton–Watson, Jirina; birth–death processes, logistic branching process).

Scaling limits of these models can be seen as genealogies of continuous populations. The continuous analogue of models with fixed size is the stochastic flow of bridges of Bertoin and Le Gall. That of branching models is the continuous-state branching process. Both processes have diffusion versions, called respectively the Fisher–Wright diffusion and the Feller diffusion. Connections between the two kinds of models are also studied, and special attention is given to extinction/fixation (probability, expected time, conditioning).

When (sub)populations are bound to extinction, the distribution of their size conditional on extinction, either not to have occurred yet or to occur in the distant future, converges in many situations. The limits are called quasi-stationary distributions and provide a rigorous notion for the dynamics of populations that seem stable, at least on the human timescale. We display quasi-stationarity for all models introduced previously, tackling the problem of uniqueness of quasi-stationary distributions, and comparing different conditionings (Q-process).

In a slightly different setting, we prove that the contour process of splitting trees with general lifetime is a killed Lévy process, and use this observation to derive a certain number of properties of these trees, including connections between Lévy processes and branching processes (one by Le Gall and Le Jan and another one by Bertoin and Le Gall), and a generalization of the coalescent point process of Aldous and Popovic. These two kinds of results open up to more general problems, such as Ray–Knight type theorems on the one hand, and coalescent processes on the other hand.

Running head. Population Dynamics and Random Genealogies.

MSC Subject Classification (2000). Primary 92D25; secondary 60-01, 60G09, 60G51, 60G55, 60J05, 60J25, 60J55, 60J60, 60J70, 60J80, 60J85, 60K20, 92D10.


Foreword

A short story

The first quantitative study of populations appeared with the book of Reverend Thomas Robert Malthus in 1798 [92], in which the growth of human populations was computed to be geometric, while that of resources only arithmetic. This idea of limited growth was one of the first notches against the prevailing dogma of pre-established harmony of divine origin. Unfortunately, his computation led Malthus to predict the collapse of humanity and, by the way, to urge for birth control policies and... the suppression of welfare!

Shortly after, this novel idea that resources come in limited amounts inspired the young Charles Darwin. More wisely than Malthus, Darwin was not willing to apply this idea to human populations, but to all other (extant and extinct) species, and these first thoughts are at the origin of the theory of natural selection published in 1859 [29].

A few years later, Sir Francis Galton (who was proudly a half-cousin of Darwin) was wondering whether the observed demographic decay of British noble families was due to their diminished fertility or to the law of chances. Galton was unaware of the earlier writings of Irénée-Jules Bienaymé on that subject [60], and, in that matter, Malthus's and Darwin's works on populations were useless to him. It was his friend Reverend Henry William Watson who reinvented what is now known as the Bienaymé–Galton–Watson (BGW) process [111], and solved (erroneously) his question.

Surprisingly, it was not until the 1920s that the BGW process made its third appearance, in the early works of Ronald Aylmer Fisher [46, 47] and John Burdon Sanderson Haldane [55]. At that point, it is important to recall that Fisher was not only the famous statistician, but also one of the three great pioneers of population genetics, along with J.B.S. Haldane and Sewall Wright. For further details on the historical aspects of branching processes and their applications to biology, see [56, 70].

More recent progress

There are two things we can learn from this naïve story. First, although one should admit that later applications of branching processes would also include physical models, it is not anecdotal that one of the most popular objects in probability theory —for the phrase 'branching processes', a multidisciplinary academic research engine provides more than one million grouped citations— has its roots in population biology. Indeed, the complexity of biological interactions is responsible for numerous phenomena that have no obvious visible causes, like fluctuations of population sizes and allele frequencies, mutant fixations, species invasions, epidemics, extinctions, to name but a few, and, as Henri Poincaré pointed out, 'we say they are due to chance'. This suggests that population biology is a great natural source of inspiration for probabilists.

Second, it is known (and unfortunate) that human intuition handles stochasticity poorly [48, 66], so it is no wonder that our little story began with a deterministic object, a geometric sequence. And indeed, this scenario has repeated itself various times in the last century, with the mathematical theories of epidemics (Kermack–McKendrick), ecological interactions (Lotka–Volterra), spatial diffusion (Kolmogorov–Petrovsky–Piscounov), morphogenesis (Turing), ... all making extensive use of deterministic tools like dynamical systems and partial differential equations.

Therefore, an interesting feature of recent mathematical research in biology is its propensity to request help from probability theory more and more spontaneously. Without even speaking of tools from the 'omics' era, there are at least two examples of recent, active fields of probability theory related to population biology: genealogy, or phylogeny, modelling and reconstruction [12, 36], whose toy model is the Kingman coalescent [75], and social networks [35], whose toy model is the Erdős–Rényi random graph [37]. In other words, the interplay between probability theory and population biology is not only still in motion, it is accelerating.

Outline

This document comprises four chapters.

In the first chapter, we define and study various models of exchangeable discrete populations. We distinguish genealogical models where the total population size is fixed from those where the population size fluctuates randomly. The former models are used in mathematical population genetics [28, 43], whereas the latter, usually called 'branching models', are more popular in mathematical ecology [77, 98] and adaptive dynamics [22, 23, 24].

Fixed-size models include those of Cannings, Wright–Fisher (discrete time) and Moran (continuous time); branching models include Bienaymé–Galton–Watson processes (discrete time) and (some) birth–death processes (continuous time).

In the second chapter, we define scaling limits of these models, to be seen as genealogies of continuous populations. In discrete time, these continuous genealogies can be modelled by successive compositions of bridges (fixed population size) or by successive compositions of subordinators (branching population size).

In continuous time, the continuous analogue of models with fixed size is the stochastic flow of bridges of Bertoin and Le Gall [14], also called generalized Fleming–Viot process, or GFV-process. That of branching models is the continuous-state branching process, or CB-process. Both processes have diffusion versions, called respectively the Fisher–Wright diffusion and the Feller diffusion. Connections between the two kinds of models are also studied, and special attention is given to extinction and fixation (probability, expected time, conditioning).

When (sub)populations are bound to extinction, there may exist distributions that are invariant conditional on extinction not having occurred. These distributions are called quasi-stationary distributions (QSD). The third chapter is devoted to the study of quasi-stationarity.

Special examples are as follows. The limiting distribution, as t → ∞, of Zt conditional on {Zt ≠ 0} is called the Yaglom distribution, whereas the limit, as s → ∞, of (Zu; 0 ≤ u ≤ t) conditional on {Zt+s ≠ 0} is called the Q-process. For all models introduced in the first two chapters, we display QSD's (including Yaglom distributions) and Q-processes. We compare QSD's with the invariant distributions of the Q-process, and tackle the question of uniqueness of the QSD.

In the fourth chapter, we consider splitting trees, which are those real trees (trees with edge lengths) where all individuals have i.i.d. lifespans (the edge lengths) during which they give birth, at constant rate, to copies of themselves. The number of individuals alive at time t is a (generally not Markovian) branching process called the homogeneous binary Crump–Mode–Jagers process.

We prove that the contour process of splitting trees with general lifetime, truncated up to some fixed level τ, is a Lévy process reflected below τ and killed upon hitting 0, which we call the jumping chronological contour process (JCCP). We use this observation to derive a certain number of properties of these trees, including spine decompositions, a generalization of the coalescent point process of Aldous and Popovic [2, 97], and connections between Lévy processes and branching processes previously discovered by Le Gall and Le Jan [88] and by Bertoin and Le Gall [13]. These connections can be used to provide various generalizations of the well-known results, called Ray–Knight theorems, that give the law of the local time process of killed Brownian motions.

Notation. In models where the total population size is fixed, the letter Y will denote either the absolute or the relative random size of a subpopulation. In models where the total population size is random, we will denote this size by Z.

Secondary references. Books on fixed size models and mathematical population genetics include [28, 36, 43, 51]. On branching models and random trees, book references include [6, 7, 33, 40, 54, 56, 63].


Contents

1 Discrete Populations
    1.1 Discrete populations in discrete time
        1.1.1 Fixed size models
        1.1.2 Branching models
        1.1.3 Two relations between fixed size and branching
    1.2 Discrete populations in continuous time
        1.2.1 Basic reminders on generators
        1.2.2 Fixed size models
        1.2.3 Branching models
        1.2.4 Birth–death processes

2 Scaling Limits
    2.1 Discrete time
        2.1.1 Fixed size: composition of bridges
        2.1.2 Stochastic size: Jirina process
    2.2 Continuous time
        2.2.1 Generators, absorption times and conditionings
        2.2.2 Fixed size: generalized Fleming–Viot processes
        2.2.3 Stochastic size: continuous-state branching process
        2.2.4 A relation between GFV-processes and CB-processes
    2.3 Diffusions
        2.3.1 Fisher–Wright diffusion
        2.3.2 CB-diffusions
        2.3.3 A relation between Fisher–Wright and Feller diffusions

3 Quasi-Stationarity
    3.1 What is quasi-stationarity?
    3.2 Markov chains with finite state-space
        3.2.1 Perron–Frobenius theory
        3.2.2 Application to population genetics
    3.3 Markov chains with countable state-space
        3.3.1 R-theory
        3.3.2 Application to the BGW model
    3.4 Birth–death processes
    3.5 Kimmel's branching model
    3.6 The CB-process
        3.6.1 Quasi-stationary distributions
        3.6.2 The Q-process
    3.7 Diffusions
        3.7.1 Fisher–Wright diffusion
        3.7.2 CB-diffusion
        3.7.3 More general diffusions

4 Splitting Trees
    4.1 Preliminaries on trees
        4.1.1 Discrete trees
        4.1.2 Chronological trees
    4.2 The exploration process
        4.2.1 Definition
        4.2.2 Properties of the JCCP
    4.3 Splitting trees
        4.3.1 Definition
        4.3.2 Law of the JCCP
        4.3.3 New properties of the Crump–Mode–Jagers process
        4.3.4 The coalescent point process
    4.4 Spine decomposition of infinite trees
        4.4.1 The supercritical infinite tree
        4.4.2 Conditioned (sub)critical trees
    4.5 Ray–Knight type theorems
        4.5.1 Ray–Knight theorems
        4.5.2 A new viewpoint on the Le Gall–Le Jan theorem
        4.5.3 JCCP of the first infinite subtree
        4.5.4 JCCP of the truncated tree

A Lévy processes


Chapter 1

Discrete populations

1.1 Discrete populations in discrete time

1.1.1 Fixed size models

To stick to standard notation, we set the constant population size to 2N, where N is a positive integer. This is due to the fact that in population genetics, it is handy to think of a population as a basket of gametes, or 'gametic urn' (one gamete per homologous gene, at some locus specified in advance, for each individual, so that each diploid individual theoretically yields two possibly different gametes).

Cannings model

At each time step, the 2N individuals are randomly labelled i = 1, . . . , 2N. The dynamics of the Cannings model are given by the following rules (see Fig. 1.1).

• generation n + 1 is made up of the offspring of individuals from generation n

• for any i, individual i from generation n begets a number ηi of offspring, so that ∑i ηi = 2N

• the law of the 2N-tuple (η1, . . . , η2N) is exchangeable, that is, invariant under permutation of labels:

(η1, . . . , η2N) =ᵈ (ηπ(1), . . . , ηπ(2N)),

for any permutation π of {1, . . . , 2N}.

Start with a subpopulation of size Y0 = y, and let Yn denote the number of descendants of this subpopulation at time n. In the Cannings model, (Yn; n ≥ 0) is a discrete-time Markov chain, with two absorbing states 0 and 2N. For any integer 0 ≤ y ≤ 2N, we write Py for the conditional probability measure P(· | Y0 = y). Let τ denote the absorption time

τ = inf{n : Yn = 0 or 2N}.

If we exclude the trivial case where each individual begets exactly one child (η1 = 1 a.s.), then it is easily seen that τ < ∞ a.s.


Definition 1.1.1 In total generality, the event {Yτ = 0} is called extinction and denoted {Ext}, whereas {Yτ = 2N} is called fixation and denoted {Fix}.

To understand what is meant by 'fixation', imagine that the subpopulation under focus is the set of individuals of a certain type which is transmitted faithfully from parent to offspring.

Figure 1.1: Cannings model for 2N = 6 run on 9 time steps. The subpopulation started from the first 3 individuals fixates at time 7.

Proposition 1.1.2 The Markov chain (Yn; n ≥ 0) is a martingale, and the fixation probability Py(Fix) equals y/2N.

Proof. By exchangeability, the ηi's are equally distributed, and since ∑i ηi = 2N, we must have E(η1) = 1. Then conditional on Yn, and thanks to exchangeability, Yn+1 =ᵈ ∑_{i=1}^{Yn} ηi. Taking (conditional) expectations, we get E(Yn+1 | Yn, Yn−1, . . . , Y0) = Yn. Now observe that τ is a stopping time and apply the stopping theorem to the bounded martingale Y:

y = Ey(Yτ) = Ey(0 · 1{Yτ = 0} + 2N · 1{Yτ = 2N}) = 2N Py(Fix).

Another way of proving that Py(Fix) = y/2N consists in giving a different type to each individual in the initial population. By exchangeability, each type has the same probability to become fixed, which thus has to be equal to 1/2N. □
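As a numerical sanity check of Proposition 1.1.2, one can simulate the subpopulation size to absorption. The minimal Python sketch below (function and variable names are ours) uses the Wright–Fisher offspring law of the next subsection as a concrete instance of the Cannings model, for which one step of Y is a Bin(2N, Yn/2N) draw:

```python
import random

def wright_fisher_fixation(two_n, y, rng):
    """Run one Wright-Fisher trajectory of the subpopulation size Y
    until absorption at 0 or 2N; return True on fixation."""
    while 0 < y < two_n:
        # Y_{n+1} ~ Bin(2N, Y_n / 2N): each of the 2N children of the
        # next generation descends from a marked parent with prob Y_n/2N.
        y = sum(1 for _ in range(two_n) if rng.random() < y / two_n)
    return y == two_n

rng = random.Random(42)
two_n, y0, trials = 6, 3, 20000
fix = sum(wright_fisher_fixation(two_n, y0, rng) for _ in range(trials))
print(fix / trials)  # should be close to y/2N = 0.5
```

With 2N = 6 and Y0 = 3, the empirical fixation frequency should be close to 1/2, in agreement with Py(Fix) = y/2N.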

Now let P denote the transition matrix of the Markov chain Y. In other words, P is the square matrix of order 2N + 1 with generic element

Pyz = Py(Y1 = z),  0 ≤ y, z ≤ 2N.

Since Py(Yn = z) = ey′ Pⁿ ez (where ′ denotes transposition and ex is the vector with zeros everywhere except a 1 at level x), it can be useful to get some information about the eigenvalues of P. Recall that the dominant eigenvalue (eigenvalue with maximum modulus) of a transition matrix is always 1, and notice that here e0 and e2N are left eigenvectors of P associated with 1, so that the eigenvalue 1 has multiplicity at least 2.

Theorem 1.1.3 (Cannings [20]) The eigenvalues of P, ranked in nonincreasing order, are λ0 = λ1 = 1 and

λj = E(η1 η2 · · · ηj),  1 ≤ j ≤ 2N.


Remark 1.1 In particular, writing σ² := Var(η1), it is elementary to compute

λ2 = E(η1 η2) = 1 − σ²/(2N − 1).

As a consequence, the multiplicity of the eigenvalue 1 is exactly 2, and the probabilities P(τ > n) = P(Yn ∉ {0, 2N}) decrease with n like a geometric sequence with ratio 1 − σ²/(2N − 1). For large populations, this decay can be quite slow.
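The rate λ2 can be illustrated numerically. In the Wright–Fisher special case of the next subsection (multinomial offspring), σ² = 1 − 1/2N, so λ2 = 1 − 1/2N, and the 'heterozygosity' y(2N − y) is an eigenfunction of the chain for λ2, so that E[Yn(2N − Yn)] = λ2ⁿ y(2N − y) exactly. A minimal Monte Carlo sketch (Python, variable names ours):

```python
import random

rng = random.Random(2)
two_n, y0, n, trials = 6, 3, 5, 100000
sigma2 = 1 - 1 / two_n            # Var(eta_1) for Wright-Fisher offspring
lam2 = 1 - sigma2 / (two_n - 1)   # = 1 - 1/2N = 5/6 here

# Estimate E[Y_n (2N - Y_n)]; for Wright-Fisher, y(2N - y) is an
# eigenfunction of the chain associated with lam2.
acc = 0.0
for _ in range(trials):
    y = y0
    for _ in range(n):
        y = sum(1 for _ in range(two_n) if rng.random() < y / two_n)
    acc += y * (two_n - y)
print(round(acc / trials, 2), round(lam2 ** n * y0 * (two_n - y0), 2))
```

The two printed values should agree up to Monte Carlo error.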

Proof. Let Z be the square matrix of order 2N + 1 with generic element Zij = i^j, 0 ≤ i, j ≤ 2N (with the convention 0^0 = 1):

    1    0      0       0      . . .   0
    1    1      1       1      . . .   1
    1    2      2^2     2^3    . . .   2^{2N}
Z = 1    3      3^2     3^3    . . .   3^{2N}
    .    .      .       .              .
    1    2N    (2N)^2  (2N)^3  . . .  (2N)^{2N}

It is standard that Z is invertible. We are going to show that there is a triangular square matrix A such that P = ZAZ^{−1}, whose diagonal elements are a00 = 1 and

ajj = E(η1 η2 · · · ηj),  1 ≤ j ≤ 2N.

Since the diagonal elements of the (triangular) matrix A are the common eigenvalues of P and A, the proof will then be over. Rewriting P = ZAZ^{−1} as PZ = ZA, we are looking for coefficients (akj)_{k≤j} such that

∑_{k=0}^{2N} Pik k^j = ∑_{k=0}^{j} i^k akj.

Recalling that P is the transition matrix of Y, we can rephrase the question as finding polynomials Rj(X) = ∑_{k=0}^{j} akj X^k of degree j = 0, 1, . . . , 2N, such that

E(Y1^j | Y0 = i) = Rj(i).

Now since

E(Y1^j | Y0 = i) = E((η1 + · · · + ηi)^j),

we deduce from the exchangeability property that indeed E(Y1^j | Y0 = i) = Rj(i), where Rj is the polynomial of degree j given by

Rj(X) = ∑_{k=1}^{j} X(X − 1) · · · (X − k + 1) ∑_{α1, . . . , αk ≥ 1 : α1 + · · · + αk = j} E(η1^{α1} · · · ηk^{αk}).

The result follows, since ajj is the dominant coefficient of Rj, which is easily seen to equal E(η1 η2 · · · ηj). □


Exercise 1.1 For any 1 ≤ z ≤ 2N − 1, define Nz as the total number of visits to z:

Nz := Card{n : Yn = z}.

Establish a relation between Py(Yn = z | Fix) and Py(Yn = z), and deduce a relation between Ey(Nz | Fix) and Ey(Nz). Conclude that

Ey(τ | Fix) = y^{−1} ∑_{z=1}^{2N−1} z Ey(Nz).

Wright–Fisher model

The Wright–Fisher model (WF) is a particular instance of the Cannings model, where (η1, . . . , η2N) follows the multinomial distribution with parameters (2N; 1/2N, . . . , 1/2N). As for the associated Markov chain Y,

Pyz = C(2N, z) (y/2N)^z (1 − y/2N)^{2N−z},

where C(2N, z) denotes the binomial coefficient. In other words, conditional on Yn = y,

Yn+1 =ᵈ Bin(2N, y/2N),

that is, Yn+1 follows the binomial distribution with number of trials 2N and success probability y/2N. Yet another way of defining the model is to say that

• each individual from generation n + 1 picks its one parent at random, uniformly among the individuals of generation n

• these 2N samplings are independent.
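The equivalence between the parent-picking description and the binomial transition can be checked empirically. The sketch below (Python, names ours) simulates one generation by parent picking and compares the empirical law of Y1 with the binomial transition Pyz:

```python
import random
from math import comb

rng = random.Random(1)
two_n, y, trials = 6, 3, 40000
counts = [0] * (two_n + 1)
for _ in range(trials):
    # Each of the 2N children picks its parent uniformly at random and
    # independently; count the children of the y marked parents.
    y1 = sum(1 for _ in range(two_n) if rng.randrange(two_n) < y)
    counts[y1] += 1

# Compare with P_{yz} = C(2N, z) (y/2N)^z (1 - y/2N)^{2N-z}.
for z in range(two_n + 1):
    pyz = comb(two_n, z) * (y / two_n) ** z * (1 - y / two_n) ** (two_n - z)
    print(z, round(counts[z] / trials, 3), round(pyz, 3))
```

Each empirical frequency should match the binomial probability up to Monte Carlo error.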

Results for the Cannings model apply; in particular λ0 = λ1 = 1, and [44]

λj = ∏_{i=1}^{j−1} (1 − i/2N),  2 ≤ j ≤ 2N.
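This product formula can be confirmed numerically by diagonalizing the (2N + 1) × (2N + 1) transition matrix; a minimal sketch (Python with NumPy, names ours):

```python
import numpy as np
from math import comb

two_n = 6
# Wright-Fisher transition matrix P_{yz} = C(2N,z)(y/2N)^z (1-y/2N)^{2N-z}.
p = np.array([[comb(two_n, z) * (y / two_n) ** z * (1 - y / two_n) ** (two_n - z)
               for z in range(two_n + 1)] for y in range(two_n + 1)])
eig = np.sort(np.real(np.linalg.eigvals(p)))[::-1]

# lambda_0 = 1 and lambda_j = prod_{i=1}^{j-1} (1 - i/2N) for 1 <= j <= 2N.
lam = [1.0] + [float(np.prod([1 - i / two_n for i in range(1, j)]))
               for j in range(1, two_n + 1)]
ok = np.allclose(eig, np.sort(lam)[::-1], atol=1e-6)
print(ok)  # True
```

For 2N = 6 the spectrum is 1, 1, 5/6, 5/9, and so on down to the tiny last eigenvalue, exactly as the product formula predicts.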

In classical textbooks, treatment of the WF model usually goes through two further steps. The first one is computing asymptotic equivalents, as N grows, of quantities such as the mean time to absorption E(τ), thanks to diffusion approximations. This will be done in Chapter 2. The second step is to introduce the Kingman coalescent, which will hopefully be done in a future version of this work.

1.1.2 Branching models

Bienaymé–Galton–Watson model

First, assume we are given the law of a random integer ξ,

pk := P(ξ = k),  k ≥ 0,

where p0 and p1 will always be assumed to be both different from 0 and 1. The population size at time n will be denoted by Zn. Assume that at each time n, individuals in the population are randomly labelled i = 1, . . . , Zn. The dynamics of the BGW model are given by the following rules (see Fig. 1.2).


• generation n + 1 is made up of the offspring of individuals from generation n

• conditional on Zn, for any 1 ≤ i ≤ Zn, individual i from generation n begets a number ξi of offspring

• the ξi's are independent and all distributed as ξ.

The Markov chain (Zn; n ≥ 0) is called the BGW process. It contains less information than the whole BGW model, which provides complete genealogical information. If Z(x) denotes a BGW process started with Z0 = x individuals, then it is straightforward to check that the following branching property holds:

Z(x + y) =ᵈ Z(x) + Z′(y),   (1.1)

where Z′ is an independent copy of Z. In general, stochastic processes that satisfy (1.1) are called branching processes.

Figure 1.2: A Bienaymé–Galton–Watson tree through 6 generations, starting from one ancestor (crosses mean death with zero offspring; the vertical axis has no meaning).

It is convenient to consider the generating function f of ξ (see Fig. 1.3),

f(s) := E(s^ξ) = ∑_{k≥0} pk s^k,  s ∈ [0, 1],

as well as its expectation

m := E(ξ) = f′(1−) ∈ (0, +∞].


Figure 1.3: Graph of the probability generating function f (a) for a subcritical BGW model, (b) for a supercritical BGW model, with extinction probability q shown.

As in the previous subsection, we write Pz for the conditional probability measure P(· | Z0 = z). Unless otherwise specified, P will denote P1.

Proposition 1.1.4 The generating function of Zn is given by

Ez(s^{Zn}) = fn(s)^z,  s ∈ [0, 1],   (1.2)

where fn is the n-th iterate of f with itself. In particular, E(Zn | Z0 = z) = mⁿ z.

Proof. One can compute the generating function of Zn+1 conditional on Zn = z as

E(s^{Zn+1} | Zn = z) = E(s^{∑_{i=1}^{z} ξi}) = ∏_{i=1}^{z} E(s^{ξi}) = f(s)^z.

Iterating the last displayed equation then yields (1.2). Differentiating (1.2) w.r.t. s and letting s tend to 1 gives the formula for the expectation. □
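Formula (1.2) is easy to exploit numerically: iterating f costs n evaluations, whatever the population size. The sketch below (Python, names ours) compares fn(s)^z with a Monte Carlo estimate of Ez(s^{Zn}) for a binary offspring law chosen for concreteness:

```python
import random

P = (0.25, 0.25, 0.5)  # a Binary(p0, p1, p2) offspring law, m = 1.25

def f(s):
    """Generating function f(s) = sum_k p_k s^k of the offspring law."""
    return sum(pk * s ** k for k, pk in enumerate(P))

def f_iter(s, n):
    """n-th iterate fn of f, so that Ez(s^{Zn}) = fn(s)^z by (1.2)."""
    for _ in range(n):
        s = f(s)
    return s

def sample_offspring(rng):
    """Draw one offspring number from the law P by inversion."""
    u, acc = rng.random(), 0.0
    for k, pk in enumerate(P):
        acc += pk
        if u < acc:
            return k
    return len(P) - 1

rng = random.Random(3)
z0, n, s = 2, 4, 0.7
mc = 0.0
for _ in range(20000):
    z = z0
    for _ in range(n):
        z = sum(sample_offspring(rng) for _ in range(z))
    mc += s ** z
mc /= 20000
print(round(f_iter(s, n) ** z0, 2), round(mc, 2))  # the two values should agree
```

The exact value fn(s)^z and the Monte Carlo estimate should coincide up to sampling error.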

As in the previous subsection, we say that extinction occurs if Z hits 0, and denote this event by {Ext}. Before stating the next result, recall that f is an increasing, convex function such that f(1) = 1. As a consequence, f has at most 2 fixed points in [0, 1]. More specifically, 1 is the only fixed point of f in [0, 1] if m ≤ 1, and if m > 1, f has another, distinct fixed point traditionally denoted by q (see Fig. 1.3).

Theorem 1.1.5 If extinction does not occur, then limn→∞ Zn = +∞ a.s. In addition,

Pz(Ext) = q^z.

Proof. Notice that the state-space of Z has two communication classes. Since {0} is an accessible, absorbing state, the class {1, 2, 3, . . .} is transient, and the first part of the theorem is proved.


To get the second part, observe that {Ext} is the increasing union, as n ↑ ∞, of the events {Zn = 0}, so that

P(Ext) = limn→∞ ↑ P(Zn = 0).

Thanks to (1.2), Pz(Zn = 0) = fn(0)^z, so that P(Ext) is the limit of the sequence (qn)n defined recursively as q0 = 0 and qn+1 = f(qn). By continuity of f, this limit is a fixed point of f, so it belongs to {q, 1}. But 0 = q0 < q, so taking images by the increasing function f and iterating, one gets the double inequality qn < q ≤ 1, which ends the proof. □
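The proof is constructive: iterating q0 = 0, qn+1 = f(qn) converges monotonically to the extinction probability. A minimal numerical sketch (Python; names ours, and the two binary laws are chosen only for concreteness):

```python
def extinction_probability(f, tol=1e-12, max_iter=100000):
    """Smallest fixed point of f in [0, 1], obtained as the increasing
    limit of q0 = 0, q_{n+1} = f(q_n), as in the proof above."""
    q = 0.0
    for _ in range(max_iter):
        q_next = f(q)
        if q_next - q < tol:
            return q_next
        q = q_next
    return q

# Supercritical binary law p0 = 0.25, p1 = 0.25, p2 = 0.5 (m = 1.25):
# the fixed points of f in [0, 1] are q = p0/p2 = 0.5 and 1.
f_binary = lambda s: 0.25 + 0.25 * s + 0.5 * s * s
print(round(extinction_probability(f_binary), 6))  # 0.5

# Subcritical binary law p0 = 0.5, p1 = 0.25, p2 = 0.25 (m = 0.75):
# extinction is certain, q = 1.
f_sub = lambda s: 0.5 + 0.25 * s + 0.25 * s * s
print(round(extinction_probability(f_sub), 6))  # 1.0
```

The supercritical case returns the nontrivial fixed point q < 1, while the (sub)critical case returns 1, as Theorem 1.1.5 predicts.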

Definition 1.1.6 A BGW model is said to be subcritical if m < 1, critical if m = 1, and supercritical if m > 1.

Exercise 1.2 Assuming that σ² := Var(ξ) is finite, prove that

Var(Zn | Z0 = 1) = σ² m^{n−1} (mⁿ − 1)/(m − 1) if m ≠ 1, and nσ² if m = 1.
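For readers who want a numerical check before proving the formula, the following sketch (Python, names ours) estimates Var(Zn | Z0 = 1) by simulation for a binary offspring law with m = 1.25:

```python
import random

def sample_binary(rng):
    """Binary offspring law P(0) = 0.25, P(1) = 0.25, P(2) = 0.5."""
    u = rng.random()
    return 0 if u < 0.25 else (1 if u < 0.5 else 2)

rng = random.Random(11)
n, trials = 5, 50000
m, sigma2 = 1.25, 0.6875  # mean and variance of this offspring law
samples = []
for _ in range(trials):
    z = 1
    for _ in range(n):
        z = sum(sample_binary(rng) for _ in range(z))
    samples.append(z)
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
predicted = sigma2 * m ** (n - 1) * (m ** n - 1) / (m - 1)
print(round(var, 1), round(predicted, 1))  # the two should be close
```

The empirical variance should match the claimed formula up to Monte Carlo error.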

Exercise 1.3 Assume m > 1. Show that, conditional on {Ext}, Z has the same law as the subcritical branching process Z⋆ with offspring distribution p⋆k = q^{k−1} pk, whose generating function is

f⋆(s) = q^{−1} f(qs),  s ∈ [0, 1].

This subcritical branching process is called the dual process. Note that a similar result holds for the subtree of infinite lines of descent conditional on {Ext}ᶜ.

Examples–exercises

Binary. Assume pk = 0 for all k ≥ 3, and call this model Binary(p0, p1, p2). The process is supercritical iff p2 > p0, and in that case the extinction probability is q = p0/p2. The dual process is Binary(p2, p1, p0).

Geometric. Assume pk = (1 − a) a^k, and call this model Geometric(a). We have m = a/(1 − a). The process is supercritical iff a > 1/2, and in that case the extinction probability is q = (1 − a)/a. The dual process is Geometric(1 − a).

Poisson. Assume pk = e^{−a} a^k/k!, and call this model Poisson(a). The process is supercritical iff a > 1, and in that case we only get the inequality q < 1/a. The dual process is Poisson(qa).

Linear fractional. Assume p0 = b and pk = (1 − b)(1 − a) a^{k−1} for k ≥ 1. Call this model LF(a, b). We have m = (1 − b)/(1 − a). The process is supercritical iff a > b, and in that case the extinction probability is q = b/a. The dual process is LF(b, a). This example has very interesting features, see e.g. Section I.4 of [6].


BGW process with immigration

Assume that, in addition to the law of a random integer ξ (the offspring distribution) with generating function f, we are also given the law of a random integer ζ (the immigration law) with generating function g. The dynamics of the BGW model with immigration are given by the following rules:

• generation n + 1 is made up of the offspring of individuals from generation n and of a random number ζ_{n+1} of immigrants, where the ζ_i's are independent and all distributed as ζ

• conditional on Z_n, for any 1 ≤ i ≤ Z_n, individual i from generation n begets a number ξ_i of offspring

• the ξi’s are independent and all distributed as ξ.

It is important to understand that each immigrant is given an independent BGW descendant tree with the same offspring distribution. The population size process (Z_n; n ≥ 0) of this model is a discrete-time Markov chain called the BGW process with immigration. It is straightforward that

E_z(s^{Z_1}) = g(s) f(s)^z.

Iterating this last equation yields

E_z(s^{Z_n}) = f_n(s)^z ∏_{k=0}^{n−1} g ∘ f_k(s), s ∈ [0, 1]. (1.3)
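Formula (1.3) can be sanity-checked by simulation. A sketch in Python, with an arbitrarily chosen binary offspring law and Poisson immigration (the helper names are ours, not from the text):

```python
import math
import random

random.seed(1)

# Binary offspring Binary(0.3, 0.4, 0.3) and Poisson(rho) immigration.
p0, p1, p2, rho = 0.3, 0.4, 0.3, 1.0

def f(s):                      # offspring generating function
    return p0 + p1 * s + p2 * s * s

def g(s):                      # immigration generating function (Poisson)
    return math.exp(-rho * (1.0 - s))

def poisson(lam):              # Knuth's Poisson sampler
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def offspring():
    u = random.random()
    return 0 if u < p0 else (1 if u < p0 + p1 else 2)

def simulate(z, n):            # one run of the BGW process with immigration
    Z = z
    for _ in range(n):
        Z = sum(offspring() for _ in range(Z)) + poisson(rho)
    return Z

def rhs_of_1_3(s, z, n):       # f_n(s)^z * prod_{k=0}^{n-1} g(f_k(s))
    prod, fk = 1.0, s
    for _ in range(n):
        prod *= g(fk)          # g(f_k(s)), starting from f_0(s) = s
        fk = f(fk)
    return fk ** z * prod

s, z, n, runs = 0.5, 1, 5, 50000
mc = sum(s ** simulate(z, n) for _ in range(runs)) / runs
exact = rhs_of_1_3(s, z, n)
```

The two quantities mc and exact agree up to Monte Carlo error.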

The following theorems concern the asymptotic growth of BGW processes with immigration.

Theorem 1.1.7 (Heathcote [57]) Assume m < 1. Then the following dichotomy holds

E(log⁺ ζ) < ∞ ⇒ (Z_n) converges in distribution
E(log⁺ ζ) = ∞ ⇒ (Z_n) converges in probability to +∞.

Theorem 1.1.8 (Seneta [104]) Assume m > 1. Then the following dichotomy holds

E(log⁺ ζ) < ∞ ⇒ lim_{n→∞} m^{−n} Z_n exists and is finite a.s.

E(log⁺ ζ) = ∞ ⇒ lim sup_{n→∞} c^{−n} Z_n = ∞ for any positive c, a.s.

To prove the last two theorems, we will need the following

Lemma 1.1.9 Let ζ_1, ζ_2, . . . be i.i.d. random variables distributed as ζ. Then for any c > 1,

E(log⁺ ζ) < ∞ ⇒ ∑_{n≥1} c^{−n} ζ_n < ∞ a.s.

E(log⁺ ζ) = ∞ ⇒ lim sup_{n→∞} c^{−n} ζ_n = ∞ a.s.

Proof. By the Borel–Cantelli lemma, one can prove that for any sequence of i.i.d. nonnegative r.v.'s X, X_1, X_2, . . ., lim sup_{n→∞} X_n/n = 0 or ∞ according to whether E(X) is finite or not. Then the lemma follows from taking X = log⁺(ζ). Indeed, writing log c =: a > 0, one then gets c^{−n} ζ_n = exp(−n(a − X_n/n)) (as soon as ζ_n ≠ 0), which ends the proof. □


Proof of Theorem 1.1.7. (inspired by [91], itself inspired by [4]) Let Y_k have the law of the pure BGW process (without immigration) at generation k, when started at a r.v. distributed as a pack of immigrants, which we will denote by ζ_k. Then observe that the BGW process with immigration Z started at 0 and evaluated at generation n satisfies the following equality in distribution

Z_n =_L ∑_{k=0}^{n} Y_k,

where the Y_k's are independent, and for each k, Y_k stands for the contribution to generation n from the immigrants of generation n − k. As a consequence, it suffices to determine whether

Z_∞ := ∑_{k=0}^{∞} Y_k

is finite or infinite, where the Y_k's are all independent, and the law of Y_k is that of the pure BGW process (without immigration) at generation k, when started at a random state denoted ζ_k and distributed as a pack of immigrants. Notice that thanks to Kolmogorov's zero–one law, Z_∞ is finite a.s. or infinite a.s. Let G be the σ-field generated by all these r.v.'s ζ_0, ζ_1, ζ_2, . . .

First assume that E(log⁺ ζ) < ∞. Then

E(Z_∞ | G) = ∑_{k=0}^{∞} ζ_k m^k,

so by Lemma 1.1.9, this conditional expectation is finite a.s., which entails that Z_∞ itself is finite a.s.

Now, assume that Z_∞ is finite a.s. and write Z_∞ as

Z_∞ = ∑_{k=0}^{∞} ∑_{i=1}^{ζ_k} Z_k^{(i)},

where for each k ≥ 1, conditional on ζ_k, the r.v.'s Z_k^{(i)}, i = 1, . . . , ζ_k, are i.i.d. and distributed as the value at generation k of the pure BGW process starting from one individual. Thanks to the Borel–Cantelli lemma conditional on G, we cannot have

∑_{k=0}^{∞} ∑_{i=1}^{ζ_k} P(Z_k^{(i)} ≥ 1) = ∞,

because otherwise we would get

Z_∞ ≥ ∑_{k=0}^{∞} ∑_{i=1}^{ζ_k} 1_{Z_k^{(i)} ≥ 1} = ∞.

As a consequence,

∑_{k=0}^{∞} ζ_k P_1(Z′_k ≥ 1) < ∞ a.s.,


where Z′ is the pure BGW process (without immigration). Next observe that for Z′_k to be nonzero, it is necessary that at each of the first k generations, (at least) one individual has at least one descendant, so that P_1(Z′_k ≥ 1) ≥ P(ξ ≠ 0)^k, which entails

∑_{k=0}^{∞} ζ_k P(ξ ≠ 0)^k < ∞ a.s.

Again, conclude with Lemma 1.1.9. □

Proof of Theorem 1.1.8. (also inspired by [91], itself inspired by [4]) First, consider the case when E(log⁺ ζ) = ∞. Thanks to Lemma 1.1.9, lim sup_{n→∞} c^{−n} ζ_n = ∞ for any c > 1. Since Z_n ≥ ζ_n, the result follows.

Now consider the case when E(log⁺ ζ) < ∞. Let F_n stand for the σ-algebra generated by the r.v.'s Z_0, Z_1, . . . , Z_n as well as all the r.v.'s ζ_0, ζ_1, ζ_2, . . . Then

E(Z_{n+1}/m^{n+1} | F_n) = m^{−n−1} E(∑_{i=1}^{Z_n} ξ_i + ζ_{n+1} | F_n) = Z_n/m^n + ζ_{n+1}/m^{n+1}.

Two consequences stem from this last equation. The first one is that (Z_n/m^n) is a submartingale w.r.t. the filtration (F_n). The second one is that, by an immediate induction,

E(Z_n/m^n | F_0) = Z_0 + ∑_{k=1}^{n} ζ_k/m^k, n ≥ 1.

Now thanks to Lemma 1.1.9, (Z_n/m^n)_n is a submartingale with bounded expectations, so it converges a.s. to a finite r.v. □

Kesten–Stigum theorem

Assume that 1 < m < ∞ and set W_n := m^{−n} Z_n. It is elementary to check that (W_n; n ≥ 0) is a nonnegative martingale (see e.g. the proof of Theorem 1.1.8), so it converges a.s. to a nonnegative random variable W:

W := lim_{n→∞} Z_n/m^n.

To be sure that the geometric growth at rate m is the correct asymptotic growth for the BGW process, one has to make sure that W = 0 (if and) only if extinction occurs.

Theorem 1.1.10 (Kesten–Stigum [72]) Either P(W = 0) = q or P(W = 0) = 1. The following are equivalent:

(i) P(W = 0) = q
(ii) E(W) = 1
(iii) E(ξ log⁺ ξ) < ∞.

For a recent generalization of this theorem to higher dimensions, see [5].


Remark 1.2 It is worth noting with C.C. Heyde [59] that even when E(ξ log⁺ ξ) = ∞, there is a deterministic sequence (C_n) such that lim_n C_{n+1}/C_n = m, lim_n Z_n/C_n = W exists and is finite a.s., and P(W = 0) = q.

On the other hand, if E(ξ²) < ∞, the fact that E(W) = 1 comes directly from the fact that (W_n) is bounded in L². This was proved by Kolmogorov in 1938 [76].
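The dichotomy of Theorem 1.1.10 is visible in simulation. Below is a sketch for the supercritical Binary(0.2, 0.3, 0.5) model, where m = 1.3, q = 0.4 and (iii) holds trivially (offspring numbers are bounded): the empirical mean of W_n stays near E(W) = 1 and the fraction of runs with W_n = 0 approaches q.

```python
import random

random.seed(2)

# Supercritical Binary(0.2, 0.3, 0.5): m = 1.3, q = p0/p2 = 0.4,
# and E(xi log+ xi) < infinity since offspring numbers are bounded.
p0, p1, p2 = 0.2, 0.3, 0.5
m = p1 + 2 * p2                      # mean offspring number, 1.3

def offspring():
    u = random.random()
    return 0 if u < p0 else (1 if u < p0 + p1 else 2)

def Z_at(n):                         # population at generation n, Z_0 = 1
    Z = 1
    for _ in range(n):
        if Z == 0:
            return 0
        Z = sum(offspring() for _ in range(Z))
    return Z

n, runs = 15, 10000
samples = [Z_at(n) for _ in range(runs)]
W_mean = sum(z / m ** n for z in samples) / runs   # E(W_n) = 1 exactly
frac_zero = sum(z == 0 for z in samples) / runs    # P(Z_n = 0) -> q = 0.4
```

Up to Monte Carlo error, W_mean is close to 1 and frac_zero is close to 0.4.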

Proof. (following [90]) Let us prove that P(W = 0) ∈ {q, 1}. Conditional on Z_1, the descendant trees of all individuals from generation 1 are independent BGW trees, so that, with obvious notation,

W = lim_{n→∞} Z_{n+1}/m^{n+1} = m^{−1} lim_{n→∞} ∑_{i=1}^{Z_1} (Z_n^{(i)}/m^n) = m^{−1} ∑_{i=1}^{Z_1} W_i,

where the W_i's are independent r.v.'s, independent of Z_1, all distributed as W. As a consequence,

P(W = 0) = E(P(∑_{i=1}^{Z_1} W_i = 0 | Z_1)) = E(P(W = 0)^{Z_1}) = f(P(W = 0)).

In conclusion, P(W = 0) is a fixed point of the generating function f of ξ, so it is either q or 1.

As (W_n; n ≥ 0) is a martingale, we can define a new probability on chains, say P↑, with the following martingale change of measure (Doob's harmonic transform)

P↑_x(A) = x^{−1} E_x(W_n, A), A ∈ F_n,

where F_n is σ(Z_0, Z_1, . . . , Z_n). Taking x = 1, this absolute continuity relationship becomes, for a general event A ∈ F_∞,

P↑(A) = E(W, A) + P↑(A, lim sup_{n→∞} W_n = ∞). (1.4)

Indeed, let P̄ denote the probability measure defined by P̄ := (P + P↑)/2. Since both P and P↑ are absolutely continuous w.r.t. P̄, the associated Radon–Nikodym derivatives (U_n) and (V_n) are P̄-martingales satisfying

P(A) = Ē(U_n, A) and P↑(A) = Ē(V_n, A), A ∈ F_n. (1.5)

These two nonnegative P̄-martingales converge P̄-a.s. and in L¹(P̄) to r.v.'s U and V, respectively. Then adding the two equalities in (1.5) yields P̄(U_n + V_n = 2) = 1, so that P̄(U_n = V_n = 0) = 0. In addition, for any A ∈ F_n, computing two different expressions for Ē(U_n W_n, A) yields P̄(V_n = U_n W_n) = 1. Letting n → ∞ in the last two results, we find that

P̄(U = 0, V ≠ 0, lim_{n→∞} W_n = ∞) + P̄(U ≠ 0, lim_{n→∞} W_n = V/U) = 1.

By absolute continuity, this equality still holds when replacing P̄ with P or P↑. Therefore, for any A ∈ F_∞,

P↑(A, lim sup_{n→∞} W_n < ∞) = P↑(A, U ≠ 0) = Ē(V, A, U ≠ 0) = Ē(UW, A, U ≠ 0) = E(W, A),


where the second equality follows from taking limits in (1.5). Thus, we have proved (1.4).

Next, since E↑_z(s^{Z_n − 1}) = E_z(Z_n s^{Z_n − 1})/(z m^n) = f_n(s)^{z−1} f′_n(s)/m^n, it is easy to check by induction that

E↑_z(s^{Z_n − 1}) = f_n(s)^{z−1} ∏_{k=0}^{n−1} (f′ ∘ f_k(s)/m).

Identifying this with (1.3) proves that under P↑, (Z_n − 1; n ≥ 0) is a BGW process with immigration, where the immigration law is given by

P(ζ = k − 1) = k p_k/m, k ≥ 1,

which has generating function g(s) = f′(s)/m. For deeper insight into this size-biased immigration, see [90], as well as the forthcoming chapters on quasi-stationary distributions and branching processes conditioned to never become extinct (Chapter 3), and on spine decompositions of splitting trees (Chapter 4).

Notice that

E(ξ log⁺ ξ) < ∞ ⇔ E(log⁺ ζ) < ∞.

Then we know, thanks to Theorem 1.1.8, that

E(ξ log⁺ ξ) < ∞ ⇒ P↑(lim_{n→∞} W_n exists and is finite) = 1
E(ξ log⁺ ξ) = ∞ ⇒ P↑(lim sup_{n→∞} W_n = ∞) = 1.

Conclude using (1.4). □

Results for (sub)critical processes on the asymptotic decay of P(Z_n ≠ 0), as n → ∞, will be displayed in Chapter 3. They can (and will) be proved thanks to a similar comparison with branching processes with immigration.

1.1.3 Two relations between fixed size and branching

By the branching property, a BGW model which is conditioned to have constant size 2N is a particular instance of the Cannings model. A further result in that direction is the following.

Proposition 1.1.11 A BGW model with Poisson offspring distribution conditioned to have constant size has the same law as the Wright–Fisher model.

Proof. Let ξ_i^{(n)} be the offspring number of individual i from generation n − 1 in the BGW model. Since these r.v.'s are i.i.d., for any integers K′ ≥ K ≥ 1, the law of (ξ_i^{(n)}; 1 ≤ i ≤ 2N, 1 ≤ n ≤ K) conditioned on {∑_{i=1}^{2N} ξ_i^{(n)} = 2N, 1 ≤ n ≤ K′} is independent of K′ ≥ K. As a consequence, it makes sense to condition the BGW model on {∑_{i=1}^{2N} ξ_i^{(n)} = 2N, n ≥ 1}, which we will then term the BGW model conditioned to have constant size 2N.

If η_i^{(n)} denotes the offspring number of individual i from generation n − 1 in the BGW model conditioned to have constant size 2N, then the 2N-tuples (η_i^{(n)}; 1 ≤ i ≤ 2N), for n ≥ 1, are i.i.d., and for any 2N-tuple of integers (k_1, . . . , k_{2N}) summing up to 2N, their common law is


given by

P(η_1 = k_1, . . . , η_{2N} = k_{2N}) = P(ξ_1 = k_1, . . . , ξ_{2N} = k_{2N} | ∑_{i=1}^{2N} ξ_i = 2N) = ∏_{i=1}^{2N} P(ξ_i = k_i) / P_{2N}(Z_1 = 2N).

Now let f be the generating function of the Poisson distribution with mean, say, m. Since the generating function of Z_1 under P_{2N} is f(s)^{2N}, one gets

E_{2N}(s^{Z_1}) = (exp(−m(1 − s)))^{2N} = exp(−2Nm(1 − s)),

so that under P_{2N}, Z_1 follows a Poisson distribution with mean 2Nm. As a consequence,

P(η_1 = k_1, . . . , η_{2N} = k_{2N}) = ∏_{i=1}^{2N} (e^{−m} m^{k_i}/k_i!) / (e^{−2Nm} (2Nm)^{2N}/(2N)!) = ((2N)!/(k_1! · · · k_{2N}!)) (1/2N)^{2N},

which is the multinomial distribution with parameters (2N; 1/2N, . . . , 1/2N). □
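This identity can be verified exactly in code for a small population: the conditioned law does not depend on the Poisson mean m and matches the multinomial probability. A sketch (the helper functions are ours):

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Left-hand side: Poisson(m) offspring numbers conditioned on summing to 2N;
# the denominator is P_{2N}(Z_1 = 2N), where Z_1 ~ Poisson(2Nm).
def conditioned_prob(ks, m):
    twoN = len(ks)
    assert sum(ks) == twoN
    num = 1.0
    for k in ks:
        num *= poisson_pmf(k, m)
    return num / poisson_pmf(twoN, twoN * m)

# Right-hand side: Multinomial(2N; 1/2N, ..., 1/2N).
def multinomial_prob(ks):
    twoN = len(ks)
    p = float(math.factorial(twoN))
    for k in ks:
        p /= math.factorial(k)
    return p * (1.0 / twoN) ** twoN

ks = [3, 0, 1, 0, 2, 0]            # 2N = 6 offspring numbers summing to 6
lhs1 = conditioned_prob(ks, 0.7)   # any Poisson mean m gives the same answer
lhs2 = conditioned_prob(ks, 1.5)
rhs = multinomial_prob(ks)
```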

An alternative relationship is mentioned in [102], where an exchangeable genealogy is obtained by repeating the following scheme at each generation. The N individuals give birth to N i.i.d. numbers of children, each greater than or equal to 1 (branching scheme), and the next generation is obtained by sampling exactly N individuals uniformly among these offspring.

Conversely, branching processes can be obtained from Wright–Fisher genealogies in the limit of large populations. To keep track of the population size, let Y^{(N)} be the Markov chain associated to a Wright–Fisher model starting from a subpopulation of size Y_0^{(N)} = k, independent of the total population size N.

Theorem 1.1.12 The Markov chains (Y_n^{(N)}; n ≥ 0) converge weakly, as N → ∞, to a BGW process with Poisson offspring distribution of mean 1.

Proof. It is sufficient to prove the convergence of finite-dimensional marginals. This convergence can be obtained easily by recalling that Y_1^{(N)} is binomial with parameters (2N, k/2N), which converges in law to a Poisson variable with parameter k. □

1.2 Discrete populations in continuous time

1.2.1 Basic reminders on generators

We would like to define continuous-time Markov chains with values in {0, 1, 2, . . .} from their infinitesimal generator, that is, a rate matrix Q with nonnegative off-diagonal elements, which satisfy

−q_{ii} =: q_i ≥ ∑_{k≠i} q_{ik}, i ≥ 0,


where we always assume that q_i < ∞. If the above inequality is strict, then the chain starting in state i can reach (and remain in) a cemetery state ∂ after an exponential time of parameter q_i − ∑_{k≠i} q_{ik}. If instead equality holds for all i, the rate matrix Q is said to be conservative.

It is always possible to build a (right-continuous) Markov chain (F_t; t ≥ 0) with a given rate matrix Q, iterating the following rule.

• starting from F_0 = i, wait an exponential time S_1 with parameter q_i

• given S_1 = t, make a jump to state j with probability q_{ij}/q_i (j = ∂ with probability 1 − ∑_{k≠i} q_{ik}/q_i)

• given F_t = j, start afresh from state j.

The chain thus obtained is called the minimal process. The exponential durations S_1, S_2, . . . are called holding times, or sojourn times.
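The rule above translates directly into a simulation routine for the minimal process. A sketch for a finite rate matrix (the toy conservative matrix Q below, with state 0 absorbing, is our own choice for illustration):

```python
import random

random.seed(3)

def minimal_process(Q, i0, t_max):
    """Run the minimal chain with (conservative) rate matrix Q from i0; return the state at t_max."""
    i, t = i0, 0.0
    while True:
        qi = -Q[i][i]
        if qi == 0.0:                      # absorbing state: stay forever
            return i
        t += random.expovariate(qi)        # holding time Exp(q_i) in state i
        if t > t_max:
            return i
        target, acc, nxt = random.random() * qi, 0.0, i
        for j, rate in enumerate(Q[i]):    # jump to j with probability q_ij / q_i
            if j != i and rate > 0.0:
                acc += rate
                nxt = j
                if target < acc:
                    break
        i = nxt

# Toy conservative rate matrix on {0, 1, 2}, state 0 absorbing.
Q = [[0.0,  0.0,  0.0],
     [1.0, -3.0,  2.0],
     [0.0,  4.0, -4.0]]

runs = 5000
absorbed = sum(minimal_process(Q, 1, 50.0) == 0 for _ in range(runs)) / runs
```

For this Q, starting from 1, absorption at 0 is certain and the mean absorption time works out to 1.5, so absorbed should be essentially 1 with t_max = 50.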

The minimal process can have a finite lifetime in two cases. The first case occurs when the cemetery state is reached, which can only happen if the rate matrix is not conservative. Then we say that the chain is killed. The killing time will be denoted T_∂. The second case occurs when

T_∞ := ∑_{k≥1} S_k < ∞.

Since T_∞ < ∞ obviously implies that lim_{t↑T_∞} F_t = +∞, we then say that the chain blows up.

Transition functions (P_{ij}(t); i, j ∈ N, t ≥ 0) are nonnegative functions satisfying P_{ij}(0) = δ_{ij} and ∑_j P_{ij}(t) ≤ 1, as well as the Chapman–Kolmogorov equations (semigroup property) P_{ij}(t + s) = ∑_k P_{ik}(t) P_{kj}(s). They are said to be Q-functions iff P′(0+) = Q. There is some ambiguity about the definition of the Markov chain from its rate matrix, since there may be Markov chains other than the minimal process whose transition functions are Q-functions. Actually, these other Markov chains can be obtained by resurrecting the minimal process after T_∂ ∧ T_∞ (see the forthcoming Theorem 1.2.1).

Recall the backward Kolmogorov equations (associated with Q)

P_{ij}(t) = δ_{ij} e^{−q_i t} + ∫_0^t ds e^{−q_i s} ∑_{k≠i} q_{ik} P_{kj}(t − s), i, j ≥ 0, t > 0,

as well as the forward Kolmogorov equations

P_{ij}(t) = δ_{ij} e^{−q_j t} + ∫_0^t ds e^{−q_j s} ∑_{k≠j} P_{ik}(t − s) q_{kj}, i, j ≥ 0, t > 0.

In terms of the paths of a Markov chain (X_t; t ≥ 0), the former equation can equivalently be read as

P_i(X_t = j) = δ_{ij} P(S > t) + ∫_0^t P(S ∈ ds) ∑_{k≠i} (q_{ik}/q_i) P_k(X_{t−s} = j),

where S is the first jump time of X. Considering the last jump before t provides a similar interpretation of the forward equation. In the case when the minimal process blows up, notice


that there can fail to be a last jump, so the forward equation may not always hold, even in the conservative case.

The following theorem fixes the ideas as far as uniqueness is concerned. The statement is relatively clear without entering the details, and we refer the reader to standard textbooks such as [6] for more precise results and proofs.

Theorem 1.2.1 If T_∞ = ∞ a.s., then the minimal process is the unique solution to the backward equations. If in addition T_∂ = ∞, the minimal process is the unique Markov chain with rate matrix Q.

If T_∞ < ∞ with positive probability, then for any i.i.d. sequence of random integers U_1, U_2, . . ., the minimal process resurrected in state U_i at each blow-up time T_∞^{(i)} satisfies the backward equations.

If T_∂ < ∞ with positive probability and the minimal process has an entrance law at ∞, that is, lim_{i→∞} P_i(F_t = j) exists for some j, then the minimal process resurrected at ∞ at each killing time satisfies the forward equations.

In what follows, it will be implicit that, whenever we specify a rate matrix, the Markov chain we consider is the minimal process associated with it. In any case, we will only deal with conservative rate matrices, so that T_∂ = ∞, and most of the time, the minimal process will have T_∞ = ∞ a.s., so that there will remain no ambiguity about which process we consider.

1.2.2 Fixed size models

Start with a population of fixed size 2N. Let η be a r.v. with values in {1, . . . , 2N}. A general version of the Cannings model in continuous time can be defined as follows. Each individual gives birth at constant rate to a random number of children distributed as η, so that no two birth events can occur simultaneously. Then at each birth time S, conditional on η = k, draw randomly k individuals from the population at time S−, and replace them with the newborns. A continuous-state space version of this general model is studied in the next chapter. In what follows, we focus on the Moran model, which is the particular case of this model associated to η = 1 a.s.

Moran model

Label randomly the individuals of the population. To each ordered pair (i, j) of individuals, attach an exponential clock with parameter c. If the first clock that rings is that of, say, (i_0, j_0), then individual i_0 gives birth to a new individual who takes the place of individual j_0. Then relabel the individuals and start afresh new exponential clocks (see Fig. 1.4).

Since the mean number of birth–death events per unit time is 4N²c, and we want to measure time in generations as in the Cannings model, that is, we would like to see a mean number of 2N birth–death events per unit time, we fix c = 1/2N.

Start with a subpopulation of size Y_0 = y, and let Y_t denote the number of descendants of this subpopulation at time t. In the Moran model, (Y_t; t ≥ 0) is a continuous-time Markov

Page 23: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 22


Figure 1.4: A realization of the Moran model with 2N = 6. Births (arrows) and deaths (crosses) are meant to be simultaneous. Letters (not to be confused with 'labels') allow one to identify the ancestor of each individual.

chain, again with two absorbing states 0 and 2N. The transition rates of this chain are given after the following lemma.

Lemma 1.2.2 If X_1, X_2, . . . are independent exponential variables with parameters a_1, a_2, . . ., then min_i(X_i) is an exponential variable with parameter ∑_i a_i.

The first time at which a birth of an individual from a subpopulation of size y occurs is the minimum of all exponential variables attached to pairs (i, j), where i ≤ y and j > y. There are y(2N − y) such pairs, so thanks to the previous lemma, this random time is an exponential variable with parameter y(2N − y)/2N. As a consequence, the transition rates for the Moran model are

y → y + 1 at rate y(2N − y)/2N
y → y − 1 at rate y(2N − y)/2N.

In particular, notice that the embedded Markov chain associated with the Moran model is the simple random walk stopped upon hitting 0 or 2N.

Recall that τ is the absorption time, that is, the first time Y hits {0, 2N}. Then let

T_x := inf{t : Y_t = x}, 0 ≤ x ≤ 2N.

As usual, {τ = T_0} is the extinction event, and {τ = T_{2N}} the fixation event. As in the Cannings model, the exchangeability property assures that Y is a martingale, so that P_y(Fix) = y/2N.
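Since, as noted above, the embedded chain is the simple random walk stopped at 0 and 2N, the fixation probability y/2N is easy to confirm by simulating the walk to absorption (a sketch with arbitrary small values of 2N and y):

```python
import random

random.seed(4)

def fixes(y, twoN):
    """Run the embedded simple random walk until absorption; True means fixation at 2N."""
    while 0 < y < twoN:
        y += 1 if random.random() < 0.5 else -1
    return y == twoN

twoN, y0, runs = 20, 6, 20000
p_fix = sum(fixes(y0, twoN) for _ in range(runs)) / runs   # close to y0/2N = 0.3
```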

Expectations of hitting times

Now let us compute the expectation of the absorption time, as is done in [36]. With

S_y := ∫_0^∞ 1_{Y_t = y} dt,

we have τ = ∑_{y=1}^{2N−1} S_y. In addition, since each sojourn time in state y is exponential with parameter y(2N − y)/N (the total jump rate out of y),

E(S_y) = N E(N_y)/(y(2N − y)),

Page 24: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 23

where N_y is the number of visits to y. By the Markov property, for any k ≥ 1,

P(N_y = k) = P(T_y < τ)(1 − ρ(y))^{k−1} ρ(y),

where ρ(y) denotes the probability of not returning to y starting from y. In particular,

E(N_y) = P(T_y < τ)/ρ(y).

Applying the optional stopping theorem to the martingale Y at time T_0 ∧ T_y, we get

P_x(T_y < T_0) = x/y, x ≤ y,

and by symmetry

P_x(T_y < T_{2N}) = (2N − x)/(2N − y), x ≥ y.

Then, since the embedded Markov chain is a simple random walk,

ρ(y) = (1/2) P_{y+1}(T_{2N} < T_y) + (1/2) P_{y−1}(T_0 < T_y),

and using the two preceding equations, we get

ρ(y) = N/(y(2N − y)).

Putting this together,

E_x(N_y) = x(2N − y)/N if x ≤ y, and y(2N − x)/N if x ≥ y,

which finally yields

E_x(S_y) = x/y if x ≤ y, and (2N − x)/(2N − y) if x ≥ y.

Adding up all these expressions gives the result for E_x(τ).

Theorem 1.2.3 The absorption time τ has expectation

E_x(τ) = ∑_{y=1}^{x−1} (2N − x)/(2N − y) + ∑_{y=x}^{2N−1} x/y,

and conditional on fixation,

E_x(τ | Fix) = 2N − x + ∑_{y=1}^{x−1} (2N − x)y/(x(2N − y)).
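The first formula can be checked by Monte Carlo: the chain leaves state y at total rate y(2N − y)/2N + y(2N − y)/2N = y(2N − y)/N, and the embedded steps are ±1 with probability 1/2 each. A sketch with arbitrary small parameters:

```python
import random

random.seed(5)

def absorption_time(x, twoN):
    """Simulate the Moran subpopulation chain until absorption; return the elapsed time."""
    t, y, N = 0.0, x, twoN / 2.0
    while 0 < y < twoN:
        t += random.expovariate(y * (twoN - y) / N)   # holding time in state y
        y += 1 if random.random() < 0.5 else -1       # embedded random-walk step
    return t

def expected_tau(x, twoN):      # the formula of Theorem 1.2.3
    return (sum((twoN - x) / (twoN - y) for y in range(1, x)) +
            sum(x / y for y in range(x, twoN)))

twoN, x, runs = 10, 4, 20000
mc = sum(absorption_time(x, twoN) for _ in range(runs)) / runs
th = expected_tau(x, twoN)      # about 6.26 for these values
```

The empirical mean mc agrees with th up to Monte Carlo error.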

Page 25: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 24

Proof. It only remains to prove the conditional expectation. The proof is the same as in the exercise ending the subsection on the Cannings model, and relies on the following application of Fubini's theorem and the Markov property:

E_x(S_y, Fix) = E_x ∫_0^∞ 1_{Y_t = y} 1_{Fix} dt = ∫_0^∞ dt P_x(Y_t = y) P_y(Fix) = E_x(S_y) P_y(Fix).

Recalling that P_y(Fix) = y/2N, one gets

E_x(τ | Fix) = ∑_{y=1}^{2N−1} E_x(S_y) y/x,

which finishes the proof. □

Elementary computations provide the following

Corollary 1.2.4 For a subpopulation starting from one single individual, the absorption time satisfies

E_1(τ) = ∑_{y=1}^{2N−1} y^{−1} ∼ log(N) and E_1(τ | Fix) = 2N − 1.

For a subpopulation starting from 2N − 1 individuals,

E_{2N−1}(τ | Fix) = (2N/(2N − 1)) ∑_{y=2}^{2N} y^{−1} ∼ log(N).

For an initial subpopulation representing a proportion p of a large population (N → ∞ and x/2N → p),

E(τ) ∼ −2N(p log(p) + (1 − p) log(1 − p)),
E(τ | Fix) ∼ −2N(1 − p) log(1 − p)/p.

A typical result to remember is that the time to fixation, when starting from half the total population, has mean 2 log(2)N ≈ 1.4N (generations).

The asymptotic expressions given in the last corollary can be obtained directly by diffusion approximation methods, which will be done in detail in the next chapter.

1.2.3 Branching models

Markov branching process

Consider a discrete population where individuals give birth independently at constant rate b, to k ≥ 1 new offspring individuals with probability p_k, and die independently at constant rate d > 0. Let π_k := b p_k stand for the birth rate per individual of k-sized clutches, and a := b + d the total birth–death rate per individual (see Fig. 1.5). Since each individual is replaced by 0 individuals with probability d/a and by k + 1 individuals with probability π_k/a, the genealogy associated with this birth–death scheme is that of a BGW process with offspring generating function

f(s) = (1/a) (d + ∑_{k≥1} π_k s^{k+1}), s ∈ [0, 1].

Page 26: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 25

Check that the process is critical if b ∑_{k≥1} k p_k = d, and supercritical (resp. subcritical) if b ∑_{k≥1} k p_k > d (resp. b ∑_{k≥1} k p_k < d). Set m := f′(1) as well as

u(s) := a(f(s) − s), s ∈ [0, 1].

Then recall that the extinction probability q is the smallest root of u. The Markov chain (Z_t; t ≥ 0) counting the number of individuals at time t trivially satisfies the branching property (1.1), and thus is called a Markov branching process. Thanks to Lemma 1.2.2, its transition rates are given by

n → n + k at rate nπ_k
n → n − 1 at rate nd.

Remark 1.3 When π_1 = b, Z is called a binary branching process, or linear birth–death process. If in addition d = 0, it is called a binary fission process, or linear birth process, or Yule–Furry process, or Yule process.

Remark 1.4 Another interpretation is the following. Each birth event of a clutch of size k can be considered alternatively as the birth of k + 1 individuals simultaneously with the death of the mother. In other words, each individual lives an exponential lifespan with parameter a, at the end of which she gives birth to k individuals with probability π_k/a, with π_0 := d. In that case, nothing prevents us from giving the lifespans a more general distribution than the mere exponential. At this degree of generality, the process Z is called a Bellman–Harris process.

Remark 1.5 Processes that satisfy the branching property (1.1), but are not necessarily Markovian, are called general branching processes, or Crump–Mode–Jagers processes.

Since we assume a conservative rate matrix, recall that either Z blows up in finite time T_∞ with positive probability, or it has infinite lifetime a.s.

Theorem 1.2.5 The branching process has infinite lifetime a.s. iff m < ∞, or m = ∞ and

∫^1 ds/u(s) = −∞.

Proof. We exclude the trivial case when q = 1. Let h_i(t) := P_i(T_∞ > t) = ∑_{j≥0} P_i(Z_t = j). Notice that h(t) := h_1(t) is nonincreasing in t, that h(0+) = 1 and that h(t) ∈ (q, 1]. Next, by the branching property, observe that h_i(t) = h(t)^i. Summing up the Kolmogorov backward equations yields

h(t) = e^{−at} + ∫_0^t ds e^{−as} ∑_{k≠1} q_{1k} h(t − s)^k, t > 0.

Differentiating this last equation and integrating by parts, this becomes

h′(t) = u(h(t)), t > 0.

Assume that the branching process has finite lifetime with positive probability, so that h(t_0) < 1 for some t_0. Fixing ε ∈ (q, 1) and setting F(x) := ∫_ε^x ds/u(s), we get that t − F(h(t)) = t_0 − F(h(t_0)), so letting t → 0+ entails that F(1−) has a finite value. As a consequence, if h is not identically 1, then ∫^1 ds/u(s) is finite (which also forces m = ∞).

Page 27: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 26

Figure 1.5: A Markov branching process in continuous time; particles give birth at rate b and die at rate d.

Conversely, assume that m = ∞ and that ∫^1 ds/u(s) converges, which allows us to define F(x) := ∫_1^x ds/u(s) for x ∈ (q, 1]. Integrating h′ = u(h) with this new F entails t − F(h(t)) = 0, which implies h(t) < 1 as soon as t > 0. □

Next, set

q(t) := P_1(T < t), t ∈ (0, +∞],

so that q(0+) = 0 and q(∞) = q, the extinction probability.

Theorem 1.2.6 The law of the extinction time is given implicitly by

∫_0^{q(t)} ds/u(s) = t, t ≥ 0.

Proof. The idea is the same as previously. Since P_i(Z_t = 0) = q(t)^i, the Kolmogorov backward equations yield

q(t) = ∫_0^t ds e^{−as} (∑_{k≥1} π_k q(t − s)^{k+1} + d), t > 0,

which becomes, after differentiation and integration by parts,

q′(t) = u(q(t)), t > 0.

Recalling that q(t) ∈ [0, q), we set F(x) := ∫_0^x ds/u(s), and integrate the last displayed equation to finally get F(q(t)) = t. □

Page 28: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 27

Binary case. In the binary case (π_1 = b), check that the process is critical if b = d and supercritical (resp. subcritical) if b > d (resp. b < d). Here u(s) = d − (b + d)s + bs², the extinction probability is q = min(d/b, 1), and with r := b − d,

q(t) = d(e^{rt} − 1)/(b e^{rt} − d) if b ≠ d, and q(t) = bt/(1 + bt) if b = d.

In Subsection 1.2.4 (birth–death processes), we will see in addition that the expected time to extinction equals

E_1(T) = (1/b) log(1/(1 − b/d))

if b < d, and is infinite if b = d.
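The explicit formula for q(t) in the case b ≠ d can be cross-checked against the differential equation q′ = u(q) from the proof of Theorem 1.2.6, using a crude Euler scheme (the rates b = 2, d = 1 are an arbitrary supercritical choice):

```python
import math

b, d = 2.0, 1.0
r = b - d

def u(s):                       # u(s) = d - (b + d)s + b s^2
    return d - (b + d) * s + b * s * s

def q_formula(t):               # q(t) = d(e^{rt} - 1)/(b e^{rt} - d)
    return d * (math.exp(r * t) - 1.0) / (b * math.exp(r * t) - d)

# Euler integration of q' = u(q), q(0) = 0.
t_end, dt = 2.0, 1e-5
q = 0.0
for _ in range(int(round(t_end / dt))):
    q += dt * u(q)
```

Both values agree up to the O(dt) Euler error, and both approach the extinction probability q = d/b = 0.5 as t grows.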

Time change

From the transition rates of the branching process, notice that the holding time in state n is exponential with parameter an, which means that birth–death events occur at a rate which is linear w.r.t. the population size. Then 'decelerating the clock' proportionally to the population size will give a Markov chain whose rates are independent of its current state, that is, a random walk. Specifically, we define X implicitly as the solution to

Z_t = X(∫_0^t Z_s ds), t > 0.

More rigorously, recall that T is the (possibly infinite) extinction time of the branching process and set

θ_t := ∫_0^t Z_s ds, t > 0.

Since θ is strictly increasing on [0, T), we let κ be its inverse on [0, θ_T). Next define

X_t := Z ∘ κ_t, t < θ_T.

The following theorem is a classical result on random time changes (see e.g. [39, Chapter 6], or [78, pp. 25–26]).

Theorem 1.2.7 The process (X_t; t ≥ 0) is a random walk killed upon hitting 0. Its transition rates are, for n ≥ 1,

n → n + k at rate π_k
n → n − 1 at rate d.

Remark 1.6 Because jumps of this random walk to the left are of absolute size at most 1, it is sometimes called a left-continuous random walk.

Of course, the converse can easily be stated. In other words, one can recover the branching process Z from the random walk X killed at T_0, its (possibly infinite) first hitting time of 0. Indeed, for t < T_0,

∫_0^t ds/X_s = ∫_0^{κ_t} dθ_u/X(θ_u) = ∫_0^{κ_t} Z_u du/Z_u = κ_t.

Page 29: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 1. DISCRETE POPULATIONS 28

Check that if θ′ is the inverse of κ on [0, κ(T_0−)), then

θ_t = θ′_t if t < κ(T_0−), and θ_t = T_0 if t ≥ κ(T_0−).

In conclusion, Z = X ∘ θ. Notice that this procedure can be achieved without reference to the initial process Z, by considering a left-continuous random walk X straight from the beginning, defining κ and θ as previously, and proving that X ∘ θ has the same transition rates as Z.

Applications of this time change are numerous, and will be studied in more detail for branching processes with continuous state space. In that setting, the time change is usually referred to as the Lamperti transform.

Extensions

Here, we want to give the definitions of two useful additional models: the branching process with immigration, and the branching process with logistic growth. Both models are simple extensions of the branching process, with the interesting feature that the former never becomes extinct, and the latter never goes to ∞.

Immigration. Let ν be a positive, finite measure on the nonnegative integers, and set ρ := ∑_{k≥0} ν_k. In a branching model with immigration,

• at rate ρ, packs of immigrating individuals enter the population

• each pack comprises k individuals with probability ν_k/ρ

• all individuals present in the population die and reproduce independently according to the branching scheme.

Then the branching process with immigration has transition rates

n → n + k at rate nπ_k + ν_k
n → n − 1 at rate nd.

Exercise 1.4 Prove that the branching process with immigration has infinite lifetime a.s. iff the pure branching process associated with it does.

Logistic growth. Let c be a positive real number, called the competition intensity. In a branching model with logistic growth,

• all individuals present in the population reproduce independently according to the branching scheme

• in addition to natural deaths at constant rate d, each individual kills every other individual at rate c.

Then the branching process with logistic growth, or logistic branching process, has transition rates

n → n + k at rate nπ_k
n → n − 1 at rate nd + cn(n − 1).

Recall that T stands for the extinction time.


CHAPTER 1. DISCRETE POPULATIONS 29

Theorem 1.2.8 ([81]) If d = 0, then the logistic branching process (Zt; t ≥ 0) converges in distribution, and if d ≠ 0, it becomes extinct with probability 1.

Provided that ∑_k πk log(k) < ∞, the logistic branching process comes down from infinity, that is,

lim_{i↑∞} P(Zt = j | Z0 = i) exists for all j ≥ 0, t > 0.

In addition, E∞(T) < ∞.

In the binary case, a proof can be found in the next subsection.

1.2.4 Birth–death processes

A birth–death process (BDP) is a continuous-time Markov chain whose jumps can only equal ±1. Its transition rates are written

n → n + 1 at rate λn
n → n − 1 at rate µn,

with µ0 = 0.

Examples

The Yule process corresponds to λn = λn, µn = 0; the linear BDP to λn = λn, µn = µn; the BDP with immigration to λn = λn + ρ, µn = µn; and the logistic BDP to λn = λn, µn = µn + cn(n − 1).

Boundaries

Theorems II.2.2 and II.2.3 in [3] on uniqueness of the Kolmogorov equations associated with BDP rate matrices can be rephrased in terms of blowing up (backward equation) and coming down from infinity (forward equation).

Theorem 1.2.9 Assume that λn > 0 for all n ≥ 1. Then the (minimal) BDP has infinite lifetime a.s. iff

R := ∑_{n≥1} ( 1/λn + µn/(λn λn−1) + · · · + (µn · · · µ2)/(λn · · · λ2 λ1) )

is infinite.

Exercise 1.5 Prove that all four BDPs mentioned as examples have infinite lifetime a.s.

Theorem 1.2.10 Assume that µn > 0 for all n ≥ 1. Then the BDP comes down from infinity iff

S := ∑_{n≥1} (1/µ_{n+1}) ( 1 + λn/µn + · · · + (λn · · · λ1)/(µn · · · µ1) )

is finite.


Exercise 1.6 Let α ∈ (1, +∞) and define the generalized logistic BDP as the Markov chain with rates

n → n + 1 at rate λn
n → n − 1 at rate µn^α.

Prove that any generalized logistic BDP comes down from infinity (for α = 2, this is Theorem 1.2.8 in the binary case).
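The series criteria above are easy to probe numerically. The sketch below (a hypothetical helper with illustrative parameter values) computes partial sums of the series S of Theorem 1.2.10: for the binary logistic BDP (λn = n, µn = n + n(n − 1), i.e. λ = d = c = 1) the partial sums stabilize, consistent with coming down from infinity, whereas for a subcritical linear BDP (λn = n/2, µn = n) they keep growing like a harmonic series.

```python
def S_partial(lam, mu, N):
    """Partial sum up to n = N of the series S of Theorem 1.2.10:
    S = sum_{n>=1} (1/mu_{n+1}) (1 + lam_n/mu_n + ... + lam_n..lam_1/(mu_n..mu_1))."""
    total = 0.0
    for n in range(1, N + 1):
        inner, prod = 1.0, 1.0
        for j in range(n, 0, -1):        # accumulate the products down to j = 1
            prod *= lam(j) / mu(j)
            inner += prod
        total += inner / mu(n + 1)
    return total

logistic = (lambda n: float(n), lambda n: n + n * (n - 1.0))   # lambda = d = c = 1
linear = (lambda n: 0.5 * n, lambda n: float(n))               # lambda = 1/2, mu = 1

s_log_1k, s_log_2k = S_partial(*logistic, 1000), S_partial(*logistic, 2000)
s_lin_1k, s_lin_2k = S_partial(*linear, 1000), S_partial(*linear, 2000)
print(s_log_2k - s_log_1k)   # tiny: S converges, comes down from infinity
print(s_lin_2k - s_lin_1k)   # of order 2 log 2: S diverges
```

This is only a numerical hint, of course: convergence of partial sums is suggested, not proved, by the computation.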

Extinction

Assume that λ0 = 0, so that 0 is absorbing for the BDP, and let

un := P(Ext | Z0 = n)

be the probability that 0 is hit in finite time. Further, recall that T is the extinction time and let

θn := E(T,Ext | Z0 = n)

be the mean time to extinction, on {Ext}. Then the following recursions can be obtained, relying on the fact that jumps of a BDP have amplitude at most 1:

λn u_{n+1} − (λn + µn) un + µn u_{n−1} = 0
λn θ_{n+1} − (λn + µn) θn + µn θ_{n−1} = −un,   n ≥ 1.

The only non-trivial cases occur either when all rates λn, µn are nonzero (n ≥ 1), or when they are nonzero up to a certain level N for which λN = µN = 0. But this amounts to considering u_n^{(N)} := Pn(T < TN) in the general case. Then set

UN := ∑_{k=1}^{N−1} (µ1 · · · µk)/(λ1 · · · λk).

Elementary computations show that

u_n^{(N)} = (1 + UN)^{−1} ∑_{k=n}^{N−1} (µ1 · · · µk)/(λ1 · · · λk),   n ∈ {1, . . . , N − 1}.

In particular, u_1^{(N)} = UN/(1 + UN).

Theorem 1.2.11 If UN tends to infinity as N → ∞, then all extinction probabilities are equal to 1. If it converges to a finite limit U∞, then

un = (1 + U∞)^{−1} ∑_{k≥n} (µ1 · · · µk)/(λ1 · · · λk),   n ≥ 1.


Application to the branching process (linear BDP). When λ ≤ µ, the process is (sub)critical, so that extinction occurs with probability 1; if λ > µ, then un = (µ/λ)^n.
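This closed form is easy to confirm numerically from the truncated formula for u_n^{(N)}; the sketch below uses arbitrary supercritical rates λ = 2, µ = 1 and a large truncation level N as illustrative assumptions.

```python
def extinction_probs(lam, mu, N):
    """u_n^{(N)} = P_n(T < T_N), n = 1..N-1, from the explicit formula
    u_n^{(N)} = (1 + U_N)^{-1} sum_{k=n}^{N-1} mu_1..mu_k/(lam_1..lam_k)."""
    w, prod = [], 1.0
    for k in range(1, N):                # w[k-1] = mu_1..mu_k / lam_1..lam_k
        prod *= mu(k) / lam(k)
        w.append(prod)
    U_N = sum(w)
    tails, tail = [], 0.0
    for k in range(N - 2, -1, -1):       # tails[n-1] = sum_{k=n}^{N-1} w_k
        tail += w[k]
        tails.append(tail)
    tails.reverse()
    return [t / (1.0 + U_N) for t in tails]

u = extinction_probs(lambda n: 2.0 * n, lambda n: 1.0 * n, N=200)
print(u[0], u[4])   # compare with (mu/lam)^1 = 0.5 and (mu/lam)^5 = 0.03125
```

For the linear BDP, µ1 · · · µk/(λ1 · · · λk) = (µ/λ)^k, so the truncation error is geometrically small.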

Next turn to the expected time to extinction. First, set

θ_n^{(N)} := En(T, T < TN),   n ≤ N.

By Beppo Levi's theorem, θ_n^{(N)} converges to θn = En(T, Ext) as N → ∞. Set

ρk := (λ1 · · · λ_{k−1})/(µ1 · · · µk).

Theorem 1.2.12 On {Ext}, the expected time to extinction is finite iff ∑_k ρk uk² < ∞. Then

θn = un ∑_{k=1}^{n−1} (1 + Uk) ρk uk + (1 + Un) ∑_{k≥n} ρk uk²,   n ≥ 1,

and in particular,

E1(T, Ext) = ∑_{k≥1} ρk uk².

Corollary 1.2.13 When P(Ext) = 1, the expected time to extinction is finite iff ∑_k ρk < ∞. Then

θn = ∑_{k=1}^{n−1} (1 + Uk) ρk + (1 + Un) ∑_{k≥n} ρk,   n ≥ 1,

and in particular,

E1(T) = ∑_{k≥1} ρk.

Application to the branching process (linear BDP). It can be shown that when the branching process with birth rate λ and death rate µ is supercritical (λ > µ) and conditioned on extinction, it has the same law as a branching process with birth rate µ and death rate λ. This allows us to concentrate on the case when extinction occurs a.s. Since in that case (λ ≤ µ), ρk = λ^{k−1}/(k µ^k),

E1(T) = λ^{−1} ∑_{k≥1} k^{−1} (λ/µ)^k = −λ^{−1} log(1 − λ/µ)

if λ < µ, and E1(T) = ∞ if λ = µ (critical case).

Proof of the theorem. The recursion satisfied by θ_n^{(N)} is

λn θ_{n+1}^{(N)} − (λn + µn) θ_n^{(N)} + µn θ_{n−1}^{(N)} = −u_n^{(N)},   n ≤ N.

Elementary manipulations show that

θ_n^{(N)} = (1 + Un) θ_1^{(N)} − ∑_{k=1}^{n−1} σ_k^{(N)},


where

σ_k^{(N)} := ∑_{i=1}^{k} (u_i^{(N)}/λi) ∏_{j=i+1}^{k} (µj/λj),

an empty product being equal to 1 by convention. It is not harder to get

∑_{k=1}^{n−1} σ_k^{(N)} = ∑_{k=1}^{n−1} ρk (Un − Uk) u_k^{(N)},   n ≤ N.

In addition, because θ_N^{(N)} = 0, we get

θ_1^{(N)} = (1 + UN)^{−1} ∑_{k=1}^{N−1} ρk (UN − Uk) u_k^{(N)} = ∑_{k=1}^{N−1} ρk (u_k^{(N)})²,

which, by Beppo Levi's theorem, yields

θ1 = ∑_{k≥1} ρk uk²,

where both sides might be infinite. Taking the same limit for θ_n^{(N)} gives

θn = En(T, Ext) = (1 + Un) θ1 − ∑_{k=1}^{n−1} ρk (Un − Uk) uk.

Replacing with the expression we just got for θ1, and using the fact that when U∞ < ∞, then uk = (U∞ − Uk)/(1 + U∞), we get

θn = un ∑_{k=1}^{n−1} ρk uk (1 + Uk) + (1 + Un) ∑_{k≥n} ρk uk²,

and this expression also holds when the extinction probabilities are equal to 1. □


Chapter 2

Scaling Limits

In this chapter, we will consider real-valued stochastic processes in continuous time that arise as scaling limits from the models seen in the last chapter.

2.1 Discrete time

2.1.1 Fixed size: composition of bridges

Before dealing with continuous populations, recall the Cannings model for a population of fixed size 2N, and label its individuals randomly as i = 1, 2, . . . , 2N. For each i, let ηn(i) be the number of descendants of individual i at generation n, and set

Yn(i) := ∑_{j=1}^{i} ηn(j),

so that Yn(i) is the number of descendants, at generation n, of the subpopulation formed of the first i individuals. Observe that in the Cannings model, the (ηn(i); i = 1, . . . , 2N) are exchangeable, so that for each fixed n, the process (Yn(i); i = 0, 1, . . . , 2N) is a nondecreasing process with exchangeable increments satisfying Yn(0) = 0 and Yn(2N) = 2N. Such a process is called a discrete bridge (see Fig. 2.1).

Consistent labelling assumption. The labelling of individuals at each generation is supposed to be consistent with the genealogy, that is, for any 1 ≤ i < j ≤ 2N, the offspring of individual i at the next generation all have smaller labels than those of j (see Fig. 2.2).

Under the consistent labelling assumption, and by time homogeneity of the Cannings model, we can iterate the sampling of the first generation and assert that there are i.i.d. random maps (Bn)n≥1 from {0, 1, . . . , 2N} onto itself, all distributed as Y1, such that Y_{n+1} = B_{n+1} ◦ Yn. In conclusion, we have obtained a representation of the genealogy of a Cannings model by successive compositions of i.i.d. discrete bridges.

To define the Cannings model in continuous state-space, we can analogously consider a chain (Yn(·); n ≥ 0) of mappings from [0, 1] to [0, 1] with the following interpretation. For any 0 ≤ x ≤ 1, Yn(x) is the relative size, at generation n, of the subpopulation descending from an initial subpopulation that had relative size x at generation 0.

Figure 2.1: Discrete bridge associated to the Cannings model on one time step.

By analogy with the discrete case, the law of the chain (Yn(·); n ≥ 0) is given as follows. There are i.i.d. bridges (Bn)n≥1 such that

Y_{n+1} = B_{n+1} ◦ Yn,

where

Definition 2.1.1 A bridge B is a right-continuous process from [0, 1] to [0, 1] such that
(i) B(0) = 0 and B(1) = 1;
(ii) B has a.s. nondecreasing paths and exchangeable increments.

It has been known since [67] that any bridge B can be represented as follows:

B(x) = (1 − D)x + ∑_i ∆i 1_{si ≤ x},   x ∈ [0, 1],

where the jump times (si)_i form a sequence of i.i.d. uniform random variables, and the jump sizes (∆i)_i form an independent sequence of positive random variables such that D := ∑_i ∆i ≤ 1 a.s.

Figure 2.2: The consistent labelling assumption is equivalent to avoiding crossing-overs.


In the next subsection, a similar construction of branching models is given. Namely, branching processes in continuous state-space and discrete time, called Jirina processes, can be defined thanks to compositions of subordinators. Properties of both models, as well as relations between them, will be developed in the next section, concerning continuous populations in continuous time. The interested reader may consult [13, 14, 15, 16].

2.1.2 Stochastic size: Jirina process

Definition

Recall the BGW process (Zn(x); n ≥ 0), where x ∈ N stands for the initial condition Z0. From the branching property (1.1), we can construct a doubly indexed process (Zn(x); n, x ≥ 0) such that, for each fixed integer n, (Zn(x); x = 0, 1, 2, . . .) is a Markov chain whose increments are i.i.d. with common law that of Zn(1). Indeed, Zn(x + y) − Zn(x) is the descendance at generation n of individuals x + 1, . . . , x + y, so it is independent of Zn(x) and has the law of Zn(y). Therefore, it is an integer-valued increasing random walk, or renewal process, starting from 0 (see Fig. 2.3). Recall that the equivalent in continuous state-space of an increasing random walk is a subordinator, that is, an increasing Lévy process (see Appendix).

Figure 2.3: Renewal process associated to the branching model on one time step.

The Jirina process is a branching process in discrete time but continuous state-space. Precisely, the Jirina process is a time-homogeneous Markov chain (Zn; n ≥ 0) with values in [0, +∞) satisfying the branching property (1.1). As in the discrete case, writing Z0 = x ∈ [0, +∞), then for each integer n, (Zn(x); x ≥ 0) has i.i.d. nonnegative increments. In particular, (Z1(x); x ≥ 0) is a subordinator, which we prefer to denote S. Let F be its Laplace exponent:

E(exp(−λS(x))) = exp(−xF(λ)),   λ, x ≥ 0.


By time homogeneity, the descendants at generation n + 1 of individuals 1, . . . , x are the descendants, at generation 1, of their descendants at generation n. Rigorously, there are i.i.d. subordinators (Sn)n≥1 distributed as S such that, conditional on Z0, Z1, . . . , Zn,

Z_{n+1} = S_{n+1} ◦ Zn.

In particular, by Bochner's subordination, the process x ↦ Zn(x) is a subordinator with Laplace exponent Fn, the n-th iterate of F, so that

Ex(exp(−λZn)) = exp(−xFn(λ)),   λ ≥ 0.

Definition 2.1.2 We say that Z is a Jirina process with branching mechanism F .

Interpretation in the pure-jump case

In what follows, we use the word lifetime for a time interval whose width is called the lifespan. Assume that the subordinator S is a compound Poisson process, that is, it has no drift and its Lévy measure Λ is finite. In other words, St = ∑_{s≤t} ∆s, where (∆s; s ≥ 0) is a Poisson point process with intensity dt Λ(dx). Let b denote the mass of the measure Λ. Now we consider a random tree where each individual is given a lifespan in (0, +∞) (see Fig. 2.4):

• generation n+ 1 is made up of the offspring of individuals from generation n

• each individual from generation n gives birth at rate b during her lifetime, to one offspring at a time, and the lifespans of her offspring are i.i.d. with distribution Λ(·)/b

• conditional on the lifespans and the number of individuals from generation n, the birth processes are independent.

The sum Zn of all lifespans of individuals from generation n is the value of the Jirina process at time n. The number of individuals from generation n, which we denote 𝒵n to distinguish it from Zn, is equal to the number of jumps of the compound Poisson process Sn on the interval [0, Z_{n−1}]. The chain (𝒵n; n ≥ 0) is a BGW process whose offspring distribution is the mixed Poisson law

pk = ∫_0^∞ Λ(dx) b^{−1} e^{−bx} (bx)^k/k!.

The Jirina process Z and the BGW process 𝒵 coincide if Λ = bδ1, where δ1 is the Dirac measure at 1. Trees constructed as previously are called splitting trees, and Chapter 4 will be devoted to their study.
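The mixed Poisson law is easy to evaluate for a toy lifespan measure; the choice Λ = 2δ_{0.5} + δ_2 below (so b = 3) is an illustrative assumption, not taken from the text. The pk should sum to 1, and the offspring mean should equal ∫ x Λ(dx), since a lifespan of length x produces on average bx children under the Λ/b mixing.

```python
import math

def offspring_pmf(atoms, kmax):
    """p_k = sum over atoms (x, m) of Lambda of (m/b) e^{-bx} (bx)^k / k!,
    where b is the total mass of Lambda (here an atomic measure)."""
    b = sum(m for _, m in atoms)
    return [sum((m / b) * math.exp(-b * x) * (b * x) ** k / math.factorial(k)
                for x, m in atoms)
            for k in range(kmax + 1)]

atoms = [(0.5, 2.0), (2.0, 1.0)]   # Lambda = 2*delta_{0.5} + delta_2, so b = 3
p = offspring_pmf(atoms, kmax=60)
mass = sum(p)
mean = sum(k * pk for k, pk in enumerate(p))
print(mass, mean)   # ~1 and ~ int x Lambda(dx) = 2*0.5 + 2 = 3
```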

Immigration

Thanks to the previous interpretation, the natural construction of a Jirina process with immigration is as follows. Let (In)n≥1 be a sequence of i.i.d. nonnegative random variables. The variable In embodies the immigration at generation n, and in the interpretation given earlier, In is to be seen as the lifespan of a single immigrating individual. Consequently, the Jirina process with immigration can be defined as the Markov process (Zn; n ≥ 0) such that, conditional on Z0, Z1, . . . , Zn,

Z_{n+1} = I_{n+1} + S_{n+1} ◦ Zn,


Figure 2.4: A splitting tree starting from one ancestor with lifespan x, and the birth process of the ancestor, a subordinator on [0, x].

where (Sn)n≥1 are i.i.d. subordinators. Let F be the common Laplace exponent of these subordinators, and G the common Laplace transform of the I's:

G(λ) := E(exp(−λI1)),   λ ≥ 0.

Then if Fn is the n-th iterate of F, it is easy to show by induction that

Ex(exp(−λZn)) = exp(−xFn(λ)) ∏_{k=0}^{n−1} G ◦ Fk(λ),   λ ≥ 0. (2.1)

Note the similarity with (1.3).

Definition 2.1.3 We say that Z is a Jirina process with branching mechanism F and immi-gration mechanism G.

2.2 Continuous time

In this section, we consider real-valued stochastic processes that are the analogues in continuous time of the models seen earlier. Except for a whole subsection on diffusions, we define these Markov processes directly from their transition semigroup, so we will not need to say too much about infinitesimal generators. However, in the next subsection, we say a word about the effect on generators of conditionings on extinction or fixation.

2.2.1 Generators, absorption times and conditionings

Consider a continuous-time Markov process (Xt; t ≥ 0) with values in I ⊆ [0, ∞) and transition semigroup (Pt; t ≥ 0) defined for any bounded measurable f by

Pt f(x) := Ex(f(Xt)),   x ∈ I.


Roughly speaking, the infinitesimal generator of X is a linear operator L which is the analogue of the rate matrix in the discrete setting. It is defined on a vector space of sufficiently smooth functions called its domain. For each f in the domain of L, Lf is a function satisfying

Lf(x) = lim_{t↓0} (1/t)(Pt f(x) − f(x)),   x ∈ I.

In the continuous setting, the Kolmogorov backward equations read

Pt f(x) = f(x) + ∫_0^t ds L Ps f(x),

whereas the Kolmogorov forward equations read

Pt f(x) = f(x) + ∫_0^t ds Ps L f(x).

Next set τ to be the first hitting time To of some absorbing point o in I such that for any x ∈ I,

u(x) := Px(τ < ∞) > 0,

that is, o is also accessible. Set also

f(x) := Ex(τ, τ < ∞).

To get a sense of what we are doing, think of

• populations with fixed size, where o = 1, so that {τ < ∞} is the event of fixation

• general populations with no immigration, where o = 0, so that {τ < ∞} is the event of extinction, or more precisely, extinction with absorption, since for continuous populations, extinction is said to occur as soon as Xt → 0.

The next statement provides harmonic equations satisfied by

U(t, x) := Px(τ > t) and F(t, x) := Ex(τ, τ < t).

The proof relies on the use of the Kolmogorov equations applied to the function

g(x) := 1_{x≠o},

which is assumed to be in the domain of the generator; this means in particular that τ has a density w.r.t. the Lebesgue measure.

Theorem 2.2.1 Assume that g is in the domain of L. Then for any x ≠ o and t ≥ 0,

LU(t, x) = ∂U/∂t (t, x) and Lu(x) = 0.

In addition,

LF(t, x) = U(t, x) − 1 and Lf(x) = −u(x).


Proof. Observe that U(t, x) = Pt g(x) and, by Fubini's theorem,

F(t, x) = Ex ∫_0^t 1_{Xs≠o} ds = ∫_0^t Ps g(x) ds.

An application of the backward equations to g yields

U(t, x) = Pt g(x) = g(x) + ∫_0^t ds L Ps g(x),

so that U(t, x) = 1 + ∫_0^t ds LU(s, x), since g(x) = 1 when x ≠ o. Differentiating this last equation yields the first equation of the theorem. For the second one, notice that by the Markov property, u(Xt) = P(τ < ∞ | Ft), so that

Pt u(x) = Ex(P(τ < ∞ | Ft)) = Px(τ < ∞) = u(x),

which implies Lu(x) = 0.

For the second part of the theorem, apply L to F, which yields

LF(t, x) = L ∫_0^t Ps g(x) ds = ∫_0^t L Ps g(x) ds = ∫_0^t LU(s, x) ds = U(t, x) − 1,

where swapping L and the integral can be justified as follows. First, thanks to Fubini's theorem and the semigroup property, we get

Pu ∫_0^t Ps g(x) ds = Ex ∫_0^t Ps g(Xu) ds = ∫_0^t ds Ex Ps g(Xu) = ∫_0^t ds Pu Ps g(x) = ∫_0^t ds P_{u+s} g(x),

so that

(1/u) ( Pu ∫_0^t Ps g(x) ds − ∫_0^t Ps g(x) ds ) = (1/u) ∫_0^t (P_{u+s} g(x) − Ps g(x)) ds
= (1/u) ∫_0^t (Px(τ > s + u) − Px(τ > s)) ds
= −(1/u) ∫_0^t ds ∫_u^{u+s} dv w(v),

where w stands for the density −∂U/∂t of τ. By Fubini's theorem again, we finally get

LF(t, x) = lim_{u↓0} −(1/u) ∫_0^{t+u} dv w(v)(v ∧ u),

and the result follows by dominated convergence. As for the last equation Lf = −u, the same proof rigorously applies with t set to +∞. □

In the next statement, we characterize the law of the process X conditioned by absorption at o via a harmonic change of measure.


Theorem 2.2.2 The process (u(Xt); t ≥ 0) is a positive martingale and P⋆ := P(· | τ < ∞) is obtained by the following h-transform:

P⋆_x(Θ) = Ex( 1_Θ u(Xt)/u(x) ),   x ∈ I,

for any event Θ in the σ-field Ft generated by (Xs; s ≤ t). As a consequence, the generator L⋆ of the process X conditioned by {τ < ∞} is given by

L⋆f(x) = L(uf)(x)/u(x),   x ∈ I.

Proof. Since u(Xt) = P(τ < ∞ | Ft), it is trivially a martingale. Then for any Θ ∈ Ft,

P⋆_x(Θ) = (1/u(x)) Ex(1_Θ 1_{τ<∞}) = (1/u(x)) Ex(1_Θ P(τ < ∞ | Ft)),

which shows the first part of the theorem. As a consequence,

P⋆_t f(x) = Ex( f(Xt) u(Xt)/u(x) ) = (1/u(x)) Pt(uf)(x),

so that

(1/t)(P⋆_t f(x) − f(x)) = (1/(t u(x))) (Pt(uf)(x) − (uf)(x)),

and the result follows letting t ↓ 0. □
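On a finite state space the h-transform can be exhibited directly. The sketch below (a hypothetical linear BDP on {0, . . . , 4}, with both 0 and N = 4 treated as absorbing, and arbitrary rates λn = 2n, µn = n) computes the harmonic function u(x) = Px(hit 0 before N) and forms L⋆f = L(uf)/u entrywise; the result should again be a conservative rate matrix, and the conditioned process should never jump to N.

```python
def h_transform_bdp(lam, mu, N):
    """BDP on {0..N} with 0 and N absorbing. Returns u(n) = P_n(hit 0 before N)
    and the h-transformed rate matrix Lstar(x, y) = u(y) L(x, y) / u(x)."""
    w, prod = [1.0], 1.0                 # w_k = mu_1..mu_k/lam_1..lam_k, w_0 = 1
    for k in range(1, N):
        prod *= mu(k) / lam(k)
        w.append(prod)
    tot = sum(w)
    u = [sum(w[n:]) / tot for n in range(N)] + [0.0]   # harmonic: Lu = 0 inside

    L = [[0.0] * (N + 1) for _ in range(N + 1)]        # rate matrix, rows 1..N-1
    for n in range(1, N):
        L[n][n + 1], L[n][n - 1] = lam(n), mu(n)
        L[n][n] = -(lam(n) + mu(n))
    Lstar = [[u[y] * L[x][y] / u[x] if u[x] > 0 else 0.0 for y in range(N + 1)]
             for x in range(N + 1)]
    return u, Lstar

u, Lstar = h_transform_bdp(lambda n: 2.0 * n, lambda n: 1.0 * n, N=4)
rows = [sum(Lstar[x]) for x in range(1, 4)]
print(rows)          # each ~0: L* is a conservative rate matrix on {1, 2, 3}
print(Lstar[3][4])   # 0.0: the conditioned process never reaches N
```

Because u is harmonic, ∑_y u(y)L(x, y) = 0, so the rows of L⋆ sum to zero with the original diagonal; this is the Doob h-transform in matrix form.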

2.2.2 Fixed size: generalized Fleming–Viot processes

At the very beginning of the present chapter, we showed that the genealogy of the Cannings model can be represented as follows. Define a discrete bridge B as an increasing sequence (B(j); 0 ≤ j ≤ 2N) that has exchangeable increments and satisfies B(0) = 0 and B(2N) = 2N. The interpretation is that B(j) is the number of descendants, after one generation, of the first j individuals of the initial population. Now let Bn be i.i.d. discrete bridges, and for any integers m < n, set B_{m,m} := Id and

B_{m,n} := Bn ◦ · · · ◦ B_{m+1}.

Then B_{m,n}(j) is the size, at generation n, of the subpopulation descending from the first j individuals of the population at generation m (this interpretation requires the consistent labelling of individuals mentioned in the first subsection). Equivalently, B_{m,n}(j) − B_{m,n}(j − 1) is the descendance at generation n of individual j belonging to generation m. In particular,

Yn := B_{0,n}

gives the genealogical structure of the descendance at generation n of the initial population.

We have then shown that this construction can be generalized to continuous populations by considering bridges as in Definition 2.1.1, that is, nondecreasing right-continuous processes B from [0, 1] to [0, 1] with exchangeable increments such that B(0) = 0 and B(1) = 1. All interpretations given above carry over to continuous populations provided that the phrase 'population size' is replaced with 'relative size' (i.e., fraction, proportion, frequency). A slight


difference is that the descendance at generation n of individual x belonging to generation m is now given by B_{m,n}(x) − B_{m,n}(x−).

Thus, a discrete flow of (non-discrete) bridges can be constructed directly by generating i.i.d. bridges and composing them as previously, which yields a collection of bridges (B_{m,n}; 0 ≤ m ≤ n) such that

B_{m,n} ◦ B_{ℓ,m} = B_{ℓ,n},   ℓ ≤ m ≤ n,

where the law of a bridge B_{m,n} depends solely on n − m, and for any n1 ≤ · · · ≤ nk, the bridges B_{n1,n2}, . . . , B_{nk−1,nk} are independent. One of the tasks of [14] was to construct a flow of bridges in continuous time.

Definition 2.2.3 A flow of bridges is a collection of bridges (B_{s,t}; 0 ≤ s ≤ t) such that
(i) for any s < t < u, B_{t,u} ◦ B_{s,t} = B_{s,u};
(ii) the law of a bridge B_{s,t} solely depends on t − s, and for any t1 ≤ · · · ≤ tk, the bridges B_{t1,t2}, . . . , B_{tk−1,tk} are independent;
(iii) the bridge B_{0,0} is the identity, and for every x ∈ [0, 1], B_{0,t}(x) converges to x in probability as t ↓ 0.

Such a construction can be achieved via a Poissonian scheme that relies on simple bridges. More precisely, to every u ∈ [0, 1] and r ∈ [0, 1], we can associate (see Fig. 2.5) the simple bridge b_{u,r} for which u is the unique jump time and r the size of this jump:

b_{u,r}(x) := (1 − r)x + r 1_{u≤x},   x ∈ [0, 1].

Figure 2.5: Graph of the simple bridge x ↦ b_{u,r}(x).
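To make the construction below concrete, here is a sketch (window length, atom rate, and jump-size law are all illustrative assumptions): it implements the simple bridge b_{u,r}, draws Poisson atoms (ti, ui, ri) on a window [0, T], and checks that the composed map is again a bridge, i.e. nondecreasing with B(0) = 0 and B(1) = 1.

```python
import random

def simple_bridge(u, r):
    """The simple bridge b_{u,r}(x) = (1 - r) x + r 1{u <= x}."""
    return lambda x: (1.0 - r) * x + (r if u <= x else 0.0)

def compose(maps):
    """Compose bridges in time order: returns b_(j) o ... o b_(1)."""
    def composed(x):
        for b in maps:
            x = b(x)
        return x
    return composed

rng = random.Random(7)
T, rate = 5.0, 3.0                    # window and atom rate (arbitrary choices)
atoms, t = [], rng.expovariate(rate)
while t < T:                          # Poisson atoms (t_i, u_i, r_i)
    atoms.append((t, rng.random(), 0.1 + 0.8 * rng.random()))
    t += rng.expovariate(rate)

B = compose([simple_bridge(u, r) for _, u, r in atoms])
xs = [i / 100.0 for i in range(101)]
ys = [B(x) for x in xs]
print(ys[0], ys[-1])                            # ~0 and ~1: endpoints are fixed
print(all(a <= b for a, b in zip(ys, ys[1:])))  # True: nondecreasing
```

The check uses only the fact, quoted in the proof of Theorem 2.2.4 below, that a composition of independent bridges is a bridge.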

Here comes the Poissonian construction of the flow. Let ν be a positive measure on (0, 1] such that ∫_0^1 x² ν(dx) < ∞, and let (ti, ui, ri; i ∈ N) be the atoms of a Poisson measure with intensity dt ⊗ du ⊗ νn(dr) on [0, ∞) × (0, 1) × (0, 1], where

νn(dr) := 1_{r>1/n} ν(dr).


Since we have truncated the measure ν to (1/n, 1], this Poisson measure is finite on any bounded time interval, and we can assume that its atoms are ranked in increasing order of their first component: t1 < t2 < · · ·. Next define the simple bridge b^{(i)} := b_{ui,ri}, and for any 0 ≤ s < t, set

B^n_{s,t} := b^{(j)} ◦ · · · ◦ b^{(i+1)}

whenever there are i < j such that ti ≤ s < ti+1 and tj ≤ t < tj+1; otherwise B^n_{s,t} := Id.

Theorem 2.2.4 (Bertoin–Le Gall [14]) The collection (B^n_{s,t}; 0 ≤ s ≤ t) is a flow of bridges, called the νn-simple flow. It converges weakly, as n → ∞, to a flow of bridges (B_{s,t}; 0 ≤ s ≤ t) called the ν-simple flow.

Actually, the primary highlight of [14] is not the previous construction of continuous flows of bridges, but a one-to-one correspondence between these flows and Λ-coalescents. Despite the complexity of the subject, we think it might be useful to the reader to get a flavour of this area.

A Λ-coalescent is a Markov process (Πt; t ≥ 0) with values in the partitions of the integers, associated with a finite measure Λ on (0, 1]. The construction is roughly as follows. First, generate a Poisson point process on [0, ∞) × (0, 1] with intensity dt ⊗ r^{−2} Λ(dr). Then at each atom (t, r), mark each block of the current partition independently with probability r, and merge together all marked blocks (to get a coarser partition).

Now go back to the ν-simple flow (B_{s,t}; 0 ≤ s ≤ t). Consider a collection (Vi; i ∈ N) of independent uniform r.v.'s on (0, 1), and fix t > 0. Then the equivalence relation

i ∼_s j ⇔ B_{s,t}^{−1}(Vi) = B_{s,t}^{−1}(Vj)

induces a random partition Π_{s,t} of the integers. In words, i ∼_s j if Vi and Vj 'fall into' the same jump of B_{s,t} (see Fig. 2.6). By construction, the partition Π_{s,t} gets finer as s grows, ending at the all-singleton partition Π_{t,t}.
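For a single simple bridge the partition can be computed by hand from the generalized inverse B^{−1}(v) = inf{x : B(x) ≥ v}. The sketch below uses assumed values u = 0.5, r = 0.4 and five fixed numbers standing in for the uniforms Vi; the three values landing inside the jump of b_{u,r} end up in one block, producing the partition {1, 4, 5}, {2}, {3} of Fig. 2.6.

```python
def bridge_inverse(u, r):
    """Generalized inverse B^{-1}(v) = inf{x : b_{u,r}(x) >= v} of the simple
    bridge b_{u,r}(x) = (1 - r) x + r 1{u <= x}."""
    def inv(v):
        if v <= (1.0 - r) * u:           # below the jump
            return v / (1.0 - r)
        if v <= (1.0 - r) * u + r:       # inside the jump: all map to u
            return u
        return (v - r) / (1.0 - r)       # above the jump
    return inv

inv = bridge_inverse(u=0.5, r=0.4)
V = [0.35, 0.10, 0.90, 0.50, 0.60]       # stand-ins for i.i.d. uniforms V_1..V_5
blocks = {}
for i, v in enumerate(V, start=1):       # i ~ j iff B^{-1}(V_i) = B^{-1}(V_j)
    blocks.setdefault(round(inv(v), 12), []).append(i)
print(sorted(blocks.values()))           # [[1, 4, 5], [2], [3]]
```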

The main theorem in [14] states that (Π_{t−s,t}; 0 ≤ s ≤ t) is Markovian and has the same transitions as the Λ-coalescent (Πs; 0 ≤ s ≤ t) started at the all-singleton partition, with

x^{−2} Λ(dx) = ν(dx).

A beautiful consequence of this correspondence is the following formula (equation (8) in [15]):

E(Yt(x)^n) = E(x^{#Π^n_t}), (2.2)

where #Π^n_t is the number of blocks at time t of the Λ-coalescent restricted to {1, . . . , n}.

The two sides of this equation can be seen as two different ways of expressing the probability that n individuals drawn at random in the population at time t have ancestors in the same subpopulation of relative size x. This is a generalization of a formerly known duality between the Kingman coalescent and the Fisher–Wright diffusion (see forthcoming Theorem 2.3.2).

Proof of the theorem. The fact that (B^n_{s,t}; 0 ≤ s ≤ t) is a flow follows essentially from the fact that the composition of independent bridges is a bridge. The weak convergence relies on the one-to-one correspondence with Λ-coalescents mentioned in the foregoing paragraph, and the associated weak convergence proved by J. Pitman [96]. □


Figure 2.6: Random partition of {1, 2, 3, 4, 5} associated to a simple bridge: {1, 4, 5}, {2}, {3}.

Now set

Yt := B_{0,t}.

Definition 2.2.5 The real number Yt(x) ∈ [0, 1] is the relative size, at time t, of the subpopulation descending from an initial subpopulation of relative size x.

The process (Yt(x); t ≥ 0, x ∈ [0, 1]) is called a generalized Fleming–Viot process, or GFV-process. This terminology is justified by the fact that Yt(·) is a random distribution function on [0, 1], and hence dYt(·) can be viewed as a random population density on [0, 1].

Another result of Bertoin and Le Gall is that for any p-tuple (x1, . . . , xp) of [0, 1], the càdlàg process (Yt(x1), . . . , Yt(xp); t ≥ 0) is a Markov process. By exchangeability, E(Yt(x)) = x, so (Yt(x); t ≥ 0) is a martingale.

Theorem 2.2.6 (Bertoin–Le Gall [14]) The limit of the martingale (Yt(x); t ≥ 0) is 0 or 1 a.s., and for any x ∈ (0, 1),

P(lim_{t→∞} Yt(x) = 1) = x.

By monotonicity of bridges, there is a random uniform point e ∈ (0, 1), called the primitive eve, such that lim_{t→∞} Yt(x) equals 0 for any x < e and 1 for any x ≥ e. In addition,

lim_{t→∞} (Yt(e) − Yt(e−)) = 1.

Other interesting results in this direction are displayed in Subsection 2.2.4.

2.2.3 Stochastic size: continuous-state branching process

In this subsection, we introduce and study the continuous-state branching process, or CSBP, or CB-process, which is a strong Markov process (Zt; t ≥ 0) with values in [0, ∞] and càdlàg


paths satisfying the branching property (1.1). In particular, 0 is an absorbing state for Z. CB-processes were first considered in [64]. Fundamental properties of the CB-process were discovered in [17, 53, 89]. One might like to consult [33], [78, Chapter 10], or [68].

Definition 2.2.7 We will say that extinction occurs if lim_{t→∞} Zt = 0, and that absorption occurs if there is some t such that Zt = 0. The first event will be denoted {Ext} as usual, and the second one by {Abs}.

Lamperti transform

By analogy with the continuous-time, discrete state-space setting, we will see that there is a one-to-one correspondence between CB-processes and the continuous analogue of left-continuous random walks, namely Lévy processes with no negative jumps. This correspondence can be seen thanks to various different bijections, but the simplest one is the Lamperti transform, which is exactly the same time-change as that given in the previous chapter. Similarly as in the discrete setting, define implicitly X as the solution to

Zt = X( ∫_0^t Zs ds ),   t ≥ 0.

More rigorously, let T be the (possibly infinite) absorption time of the branching process and set

θt := ∫_0^t Zs ds,   t > 0.

Since θ is continuous and increasing on [0, T), we let κ be its inverse on [0, θT). Next define

Xt := Z ◦ κt,   t < θT.

To contrast with the law Px of the CB-process Z started at x ≥ 0, we will denote by ℙx the law of X started at x ∈ R, with associated expectation 𝔼x. For the same reason, T0 will denote the (possibly infinite) first hitting time of 0 by X.

Theorem 2.2.8 (Lamperti [89]) The process (Xt; t ≥ 0) is a Lévy process with no negative jumps killed upon hitting 0. Let ψ be its Laplace exponent (when not killed), defined for any x by

𝔼x(exp(−λXt)) = 𝔼0(exp(−λ(Xt + x))) = exp(−λx + tψ(λ)). (2.3)

Then the Laplace transform of the one-dimensional marginal of Z is given by

Ex(exp(−λZt)) = exp(−x ut(λ)),   t, x, λ ≥ 0,

where t ↦ ut(λ) is the unique nonnegative solution of the integral equation

v(t) + ∫_0^t ψ(v(s)) ds = λ. (2.4)

Remark 2.1 Exactly as in the discrete setting, the Lamperti transform can be performed in both directions, starting with a Lévy process X with no negative jumps, noticing that for t < T0,

κt = ∫_0^t ds/Xs,


and that θ is the inverse of κ, stopped at κ(T0−). Conclude with Z = X ◦ θ. In particular, this procedure can be achieved without reference to the initial process Z, so it proves at the same time the existence of CB-processes, and their one-to-one correspondence with Lévy processes with no negative jumps.

Surprisingly, the proof of this theorem is more difficult than the discrete version would suggest, and to my knowledge, no complete proof has been published so far. In a joint work with María Emilia Caballero and Gerónimo Uribe (UNAM, Mexico), we provide two proofs of this theorem [19].

Main properties

The Laplace exponent ψ of X is also called the branching mechanism of Z, which in turn is sometimes denoted CB(ψ). The branching mechanism can be specified [11, 17, 89, 107] by the Lévy–Khinchin formula (A.1) displayed in the Appendix. Recall that ψ is a convex function such that ψ(0) = 0 and

ρ := ψ′(0+) ∈ [−∞, +∞).

From now on, we also discard the case when X is a subordinator, which corresponds to nondecreasing paths of both X and Z. This is consistent with the assumption made in the discrete setting that p0 ≠ 0 (discrete time), or d ≠ 0 (continuous time).

Recall that η is the largest root of ψ.

Lemma 2.2.9 For λ < η (resp. > η, = η), t ↦ ut(λ) increases (resp. decreases, remains constant equal) to η. The following equation gives an implicit characterization of ut(λ):

∫_{ut(λ)}^λ ds/ψ(s) = t, t, λ ≥ 0. (2.5)

Proof. For clarity, write y(t) instead of ut(λ) and recall from Theorem 2.2.8 that y′ = −ψ(y), with y(0) = λ. Since ψ(η) = 0 and ψ′(η) ≥ 0, η is globally attractive for y (recall that if η ≠ 0, then ψ′(0) < 0). As a consequence, integrating the differential equation as in the theorem is possible because y(t) and y(0), i.e. ut(λ) and λ, are always in the same connected component of [0,∞)\{η}. Specifically, if we set

Gλ(v) = ∫_λ^v ds/ψ(s),

then y′ = −ψ(y), with boundary condition y(0) = λ, integrates as Gλ ◦ y(t) = −t. □
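The proof's characterization y′ = −ψ(y), y(0) = λ, also gives a direct numerical scheme for ut(λ). A minimal Python sketch, assuming for concreteness the quadratic mechanism ψ(λ) = λ²/2 (the Feller case σ = 1, r = 0 of Section 2.3.2, where ut(λ) = λ/(1 + λt/2) in closed form):

```python
def psi(lam):
    # Branching mechanism psi(lambda) = lambda^2 / 2 (Feller case, sigma = 1, r = 0).
    return 0.5 * lam * lam

def u(t, lam, steps=10000):
    """Approximate u_t(lambda) by integrating y' = -psi(y), y(0) = lambda (RK4)."""
    y, h = lam, t / steps
    for _ in range(steps):
        k1 = -psi(y)
        k2 = -psi(y + 0.5 * h * k1)
        k3 = -psi(y + 0.5 * h * k2)
        k4 = -psi(y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

t, lam = 2.0, 3.0
closed_form = lam / (1 + lam * t / 2)  # u_t(lambda) for this psi
assert abs(u(t, lam) - closed_form) < 1e-8
```

Any other branching mechanism can be swapped into psi; only the closed-form check is specific to the quadratic case.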

Corollary 2.2.10 For any x ≥ 0, Px(Ext) = exp(−xη).

Proof. It is easy to deduce from Theorem 2.2.8 that either Zt goes to infinity (including blow-up) or it goes to 0. Since for any positive λ,

lim_{t→∞} Ex(exp −λZt) = Px(lim_{t→∞} Zt = 0),

and u∞(λ) = η, the result follows. □


Theorem 2.2.11 (Grey [53]) The CB-process Z blows up with positive probability iff ρ = −∞ and

∫_0 ds/ψ(s) > −∞.

Absorption at 0 occurs with positive probability iff

∫^∞ ds/ψ(s) < ∞.

Proof. Thanks to Theorem 2.2.8, it is elementary to show that if T∞ denotes the blow-up time,

Px(T∞ > t) = exp(−xut(0)).

If blow-up occurs with positive probability, then ut(0) > 0 for all t > 0, and thanks to (2.5),

∫_{ut(0)}^{0+} ds/ψ(s) = t,

which proves that ∫_0 ds/ψ(s) > −∞ (and actually implies that ρ = −∞). Conversely, assume that this integral converges. Then we can take limits in (2.5) as λ ↓ 0, and the last displayed equation holds, showing that ut(0) is positive as soon as t is.

For the absorption time, again thanks to Theorem 2.2.8,

Px(T < t) = exp(−xut(∞)),

so that absorption occurs with positive probability iff ut(∞) > 0. The rest of the proof is identical to that for the blow-up time. □

Corollary 2.2.12 If ρ > −∞, then Z has integrable marginals and

Ex(Zt) = x exp(−ρt) t ≥ 0.

Definition 2.2.13 A CB-process is said to be subcritical if ρ > 0, critical if ρ = 0, and supercritical if ρ < 0.

Proof of the corollary. Set

f(t) := ∂ut(λ)/∂λ |_{λ=0}.

Since there is no blow-up (ρ is finite), ut(0) = 0, and differentiating Ex(exp −λZt) = exp(−xut(λ)), we get Ex(Zt) = xf(t). Then differentiating (2.4) yields

f(t) + ρ ∫_0^t f(s) ds = 1,

which implies that f(t) = exp(−ρt). □


Corollary 2.2.14 Assume ∫^∞ 1/ψ converges, and put

φ(t) := ∫_t^∞ ds/ψ(s), t > η.

The mapping φ : (η,∞) → (0,∞) is decreasing and bijective, and we write ϕ for its inverse mapping. Then

ut(λ) = ϕ(t + φ(λ)), λ > η,

and furthermore,

Px(T < t) = exp(−xϕ(t)).

In particular Px(Abs) = exp(−ηx) = Px(Ext).

Proof. Because ∫^∞ ds/ψ(s) converges, (2.5) entails

∫_{ut(λ)}^∞ ds/ψ(s) − ∫_λ^∞ ds/ψ(s) = t,

which reads φ(ut(λ)) − φ(λ) = t, or equivalently ut(λ) = ϕ(t + φ(λ)). Now from the proof of the previous theorem,

Px(T < t) = exp(−xut(∞)),

and because φ vanishes at +∞, we get ut(∞) = ϕ(t). The last line of the statement comes from the fact that Px(Abs) = Px(T < ∞) and Corollary 2.2.10. □
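For the quadratic mechanism ψ(s) = σs²/2 (so η = 0), one has φ(t) = ∫_t^∞ 2 ds/(σs²) = 2/(σt), which happens to be an involution, so ϕ = φ. The representation ut(λ) = ϕ(t + φ(λ)) can then be checked against the closed form of Section 2.3.2 in a few lines:

```python
sigma = 1.0
phi = lambda t: 2.0 / (sigma * t)  # phi(t) = 2/(sigma t); its own inverse here
t, lam = 2.0, 3.0
u = phi(t + phi(lam))              # u_t(lambda) = varphi(t + phi(lambda))
assert abs(u - lam / (1 + sigma * lam * t / 2)) < 1e-12
```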

Remark 2.2 When ∫^∞ 1/ψ converges, the sample paths of Z have infinite variation a.s. (see Appendix). When ∫^∞ 1/ψ diverges, absorption is impossible, but with positive probability lim_{t→∞} Zt = 0.

Proposition 2.2.15 Assume η > 0. Then the supercritical CB-process with branching mechanism ψ conditioned on {Ext} is the subcritical CB-process with branching mechanism ψ♮, where ψ♮(λ) = ψ(λ + η).

Exercise 2.1 Prove the previous statement.

Extensions

Here, we give the definitions of the CB-process with immigration, and of the CB-process with logistic growth.

Immigration. Recall from the previous chapter that for discrete branching processes with immigration, the total number of immigrants up until time t is a compound Poisson process with intensity measure ν only charging nonnegative integers, that is, a renewal process. In the continuous setting, this role is played by a subordinator, which is characterized by its Laplace exponent, denoted by χ. Then the continuous-state branching process with immigration, denoted CBI(ψ, χ), is a strong Markov process characterized by its Laplace transform

Ex(exp −λZt) = exp(−xut(λ) − ∫_0^t χ(us(λ)) ds), λ ≥ 0, (2.6)


where ut(λ) is given by (2.5). The last equation is the analogue of (1.3) and (2.1).

As a consequence, a CBI(ψ, χ) has infinitesimal generator B whose action on the exponential functions x ↦ eλ(x) = exp(−λx) is given by

Beλ(x) = (xψ(λ) − χ(λ)) eλ(x), x ≥ 0.

Seminal papers on CBI-processes are [69, 95].

Logistic growth. For simplicity, we can say that the logistic branching process in continuous state space [81], or LB-process, is the Markov process with generator U given by

Uf(x) = xAf(x) − cx² f′(x), x ≥ 0,

where c ≥ 0 and A is the generator of a Levy process X with no negative jumps. When c = 0, note that we recover the generator of a standard CB-process.

Actually, we prefer not to define a process from its generator, and we can rigorously define Z as

Rt = Z(∫_0^t ds/Rs), t < T0,

where R is the Ornstein–Uhlenbeck type process, strong solution to

dRt = dXt − cRt dt, t > 0.

The analogue of R in the discrete setting is the Markov chain with rates

n → n + k at rate πk
n → n − 1 at rate d + c(n − 1).

Then change (accelerate) time so as to multiply the rates by n, and get the logistic branching process.
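For illustration, the embedded jump chain of this logistic chain is straightforward to simulate; note that the time acceleration by n changes the jump rates but not the embedded chain. The sketch below assumes binary births only (π1 = b, πk = 0 for k ≥ 2) and hypothetical parameter values:

```python
import random

random.seed(2006)
b, d, c = 1.0, 0.5, 0.1   # birth, death, competition rates (hypothetical values)

def step(n):
    """One jump of the logistic branching chain:
    n -> n+1 at total rate b*n, n -> n-1 at total rate n*(d + c*(n-1))."""
    up, down = b * n, n * (d + c * (n - 1))
    return n + 1 if random.random() < up / (up + down) else n - 1

traj = [10]
while traj[-1] > 0 and len(traj) < 10000:
    traj.append(step(traj[-1]))

# Steps are +-1 only, and the trajectory stays nonnegative until absorption at 0.
assert all(abs(traj[i + 1] - traj[i]) == 1 for i in range(len(traj) - 1))
assert min(traj) >= 0
```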

Since the properties of the LB-process reviewed hereafter have rather technical proofs, we merely state them and refer the reader to [81] for details. From now on, we assume that E(log(X1)) < ∞, which is equivalent to ∫^∞ log(r) Λ(dr) < ∞.

In the first statement, we consider the case when X is a subordinator. We then denote by δ ≥ 0 its drift coefficient, so that (see Appendix)

ψ(λ) = −δλ − ∫_0^∞ Λ(dr)(1 − e^{−λr}), λ ≥ 0.

We introduce Condition (∂), where ρ is defined as

ρ := ∫_0^∞ Λ(dr) ≤ ∞.

We say that (∂) holds iff (at least) one of the following holds

• δ ≠ 0

• ρ =∞


• c < ρ <∞.

Theorem 2.2.16 ([81]) Assume X is a subordinator. Then the LB-process Z oscillates in (δ/c, ∞) and

(i) If (∂) holds, then it is positive-recurrent in (δ/c, ∞).
(ii) If (∂) does not hold, then it is null-recurrent in (0, ∞) and converges to 0 in probability.

Now X is assumed not to be a subordinator. In the next theorem, note that the criterion for absorption does not depend on c and is the same as for the CB-process (c = 0).

Theorem 2.2.17 ([81]) Assume X is not a subordinator. Then the LB-process goes to 0 a.s., and if T denotes the absorption time, then P(T < ∞) = 1 or 0 according to whether ∫^∞ 1/ψ converges or diverges.

The next statement asserts that the LB-process comes down from infinity (still under the integrability condition given earlier).

Theorem 2.2.18 ([81]) The probabilities (Px, x ≥ 0) converge weakly, as x → ∞, to the law P∞ of the so-called logistic branching process starting from infinity. In addition, if ∫^∞ 1/ψ converges, then under P∞ the absorption time T is a.s. finite and has finite expectation.

2.2.4 A relation between GFV-processes and CB-processes

Recall from Subsection 2.1.2 how we constructed the Jirina process (Zn(x), n ≥ 0), where x is the initial population size Z0 (a nonnegative real number). Merely invoking the branching property of Jirina processes, we have shown that there were i.i.d. subordinators (Sn; n ≥ 0) such that Zn = Sn ◦ · · · ◦ S1(x). Now for any integers m < n, set Sm,m := Id and

Sm,n := Sn ◦ · · · ◦ Sm+1.

The rough interpretation is that Sm,n(y) is the size, at generation n, of the subpopulation descending from the subpopulation [0, y] at generation m. Equivalently, Sm,n(y) − Sm,n(y−) is the progeny at generation n of individual y belonging to generation m. A more specific interpretation was in terms of splitting trees and lifespans, when the subordinators have no drift. We interpret x as the lifespan of the ancestor, and say that each jump (t, ∆t) of S1 = Z1 corresponds to the birth time t of a newborn whose lifespan is ∆t. Then S1(x) is interpreted as the sum of all lifespans of the ancestor's offspring. More generally, think of [0, y] as the concatenation of all lifetimes of the 'first' individuals of a given generation whose total lifespan amounts to y. Then Sm,n(y) = Sn ◦ · · · ◦ Sm+1(y) is seen to be the sum of all lifespans of individuals of generation n descending from the subpopulation [0, y] at generation m. For more details, see Chapter 4.

Notice that, by Bochner's subordination, (Sm,n; 0 ≤ m ≤ n) is a discrete flow of subordinators, such that

Sm,n ◦ Sℓ,m = Sℓ,n, ℓ ≤ m ≤ n,

where the law of a subordinator Sm,n depends solely on n − m and for any n1 ≤ · · · ≤ nk, the subordinators Sn1,n2, . . . , Snk−1,nk are independent. The flow of subordinators in continuous time was constructed in [13] as follows.


Recall that if (Zt(x); t ≥ 0) stands for a CB-process starting from x, the branching property reads, for all t,

Zt(x + y) = Zt(x) + Z′t(y),

where Z′ is an independent copy of Z. Then Kolmogorov's existence theorem ensures that one can build on the same probability space a doubly indexed process (Zt(x); t, x ≥ 0) such that, for each fixed time t, Zt(·) is a subordinator.

Thanks to Theorem 2.2.8, the subordinator x ↦ Zt(x) has Laplace exponent ut(·), which satisfies the semigroup property ut = ut−s ◦ us. This is due to the following relationship in terms of Bochner subordination:

Zt(x) = Z′t−s ◦ Zs(x),

where Z′ again is an independent copy of Z. Thanks to this last equation, we can invoke once again Kolmogorov's existence theorem and show that on the same probability space there is a flow of subordinators Ss,t satisfying

(i) for any s < t < u, St,u ◦ Ss,t = Ss,u;

(ii) the subordinator Ss,t has Laplace exponent ut−s(·) and for any t1 ≤ · · · ≤ tk, the subordinators St1,t2, . . . , Stk−1,tk are independent. In particular, (S0,t(x); t, x ≥ 0) has the same law as the CB-process (Zt(x); t, x ≥ 0).

This representation is extremely similar to that given in Definition 2.2.3, in terms of flows of bridges, for exchangeable genealogies of continuous populations in continuous time with fixed population size. Indeed, recall from Subsection 2.2.2 that a flow of bridges can be constructed thanks to a Poisson point process with intensity dt ⊗ du ⊗ ν(dr) on [0,∞] × (0, 1) × (0, 1], where ν is a positive measure on (0, 1] such that ∫_0^1 x² ν(dx) < ∞. To each atom (t, u, r) is associated a simple bridge with one single jump of size r occurring at u. Composing these bridges gives rise to a flow of bridges (Bs,t; 0 ≤ s ≤ t) called a ν-simple flow.

Then the generalized Fleming–Viot process (Yt(x); t, x) := (B0,t(x); t, x) is to be interpreted as the relative size at time t of an initial subpopulation of size x. It is thus tempting to establish a link between the GFV-process Y and the CB-process Z via a relation between bridges and subordinators. The idea is that the growth of a very small subpopulation Yt(εx) is blind to the constraint of constant population size, and so must resemble a CB-process with branching mechanism involving the measure ν.

Specifically, for ε > 0, let νε be a positive measure on (0, 1] such that ∫_0^1 x² νε(dx) < ∞. Then let Yt be the GFV-process associated to νε and starting from εx, and set

Y^ε_t := ε⁻¹ Y_{t/ε}, t ≥ 0,

so that in particular Y^ε starts from x.

Theorem 2.2.19 (Bertoin–Le Gall [16]) Let ν̃ε denote the image of νε by the dilation r ↦ r/ε. If the measures (r² ∧ r) ν̃ε(dr) converge weakly as ε ↓ 0 to a finite measure on (0,∞), which we may write in the form (r² ∧ r) π(dr), then the rescaled GFV-process Y^ε_t converges in distribution to the CB-process Z starting from x, with branching mechanism ψ given by

ψ(λ) = ∫_0^∞ (e^{−λr} − 1 + λr) π(dr), λ ≥ 0.

In particular, Z is a critical CB-process.


2.3 Diffusions

In this section, we study the processes defined in the last section in the special case when their sample paths are continuous a.s. The processes under focus will be the Fisher–Wright diffusion in the case of fixed-size populations, and the Feller diffusion in the branching case.

2.3.1 Fisher–Wright diffusion

Definition

A slightly different definition of the Moran model from that given in the previous chapter is as follows. In a population of constant size 2N, each individual has a constant birth rate equal to N (instead of 1 in the earlier definition), and at each birth event, an individual chosen uniformly among the existing ones is simultaneously killed. Then the Markov chain Y^{(N)}_t counting the proportion, at time t, of descendants from some fixed initial subpopulation has transition rates

y → y + 1/2N at rate 2N² y(1 − y)
y → y − 1/2N at rate 2N² y(1 − y).

The process Y^{(N)} takes its values in {0, 1/2N, . . . , (2N − 1)/2N, 1} and its generator LN is given by

LNf(x) = 2N² x(1 − x)(f(x + 1/2N) − f(x)) + 2N² x(1 − x)(f(x − 1/2N) − f(x)).

As a consequence, if f is of class C² on [0, 1], the last quantity converges to

lim_{N→∞} LNf(x) = (1/2) x(1 − x) f″(x), x ∈ [0, 1].
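This convergence is easy to observe numerically. For the test function f(x) = x⁴ (chosen here purely for illustration), a direct computation gives LNf(x) = x(1 − x)(6x² + 1/4N²), so the distance to the limit is exactly x(1 − x)/4N²:

```python
def LN(f, x, N):
    """Moran generator: jumps of +-1/(2N), each at rate 2 N^2 x (1 - x)."""
    h = 1.0 / (2 * N)
    rate = 2 * N * N * x * (1 - x)
    return rate * (f(x + h) - f(x)) + rate * (f(x - h) - f(x))

f = lambda x: x ** 4                     # a smooth test function
x = 0.3
limit = 0.5 * x * (1 - x) * 12 * x ** 2  # (1/2) x (1-x) f''(x)
errs = [abs(LN(f, x, N) - limit) for N in (10, 100, 1000)]
assert errs[0] > errs[1] > errs[2]       # error decreases like 1/N^2
assert errs[2] < 1e-6
```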

Theorem 2.3.1 The sequence of Markov processes (Y^{(N)}) converges weakly on the Skorokhod space of cadlag processes with values in [0, 1] to the diffusion Y, strong solution to the following SDE

dYt = √(Yt(1 − Yt)) dBt, t ≥ 0,

where B is a standard Brownian motion. This diffusion is the so-called Fisher–Wright diffusion.

In this document, we do not want to spend time proving convergence theorems, but a rigorous proof of the previous statement can be found in [18]. In passing, this proof uses a well-known duality relationship, which has very interesting consequences and is worth being stated as follows.

Theorem 2.3.2 The n-th moment of the Fisher–Wright diffusion is given by

Ey(Yt^n) = En(y^{Nt}),

where Nt is a pure death process with transition rate from k to k − 1 equal to k(k − 1)/2.


The pure death process mentioned in the theorem is the number of blocks in the Kingman coalescent. Since it comes down from infinity (check this thanks to Theorem 1.2.10), the Lebesgue dominated convergence theorem yields

Py(T1 < t) = E∞(y^{Nt}),

where T1 is the (possibly infinite) first hitting time of 1 by the Fisher–Wright diffusion. Letting t → ∞, one recovers Py(Fix) = y. As was mentioned in Subsection 2.2.2, equation (2.2), a generalization of this duality relationship to Λ-coalescents was obtained recently by Bertoin and Le Gall [15].
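For n = 2 the duality can be checked by hand: the death chain started from 2 blocks jumps to 1 at rate 2·1/2 = 1 and then stays there, while applying the generator to f(x) = x² gives (1/2)x(1 − x)·2 = x − x², hence d/dt Ey(Yt²) = y − Ey(Yt²). Both computations give the same answer:

```python
import math

y, t = 0.3, 1.5
# Moment side: solve m2' = y - m2, m2(0) = y^2 (using E_y(Y_t) = y for all t).
m2 = y + (y * y - y) * math.exp(-t)
# Duality side: P(N_t = 2) = e^{-t}, P(N_t = 1) = 1 - e^{-t}, so
# E_2(y^{N_t}) = e^{-t} y^2 + (1 - e^{-t}) y.
dual = math.exp(-t) * y ** 2 + (1 - math.exp(-t)) * y
assert abs(m2 - dual) < 1e-12
```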

Approximations

Note that we might like to speed up time otherwise than at a rate exactly equal to the population size, so that for example, each individual has a birth rate of Nσ. Then of course this new scaling will result in a generator equal to σ times the one we have obtained previously, that is, Lf(x) = (σ/2) x(1 − x) f″(x).

Actually, if we come back to the initial model where each individual gives birth at rate 1, we can approximate the dynamics of the proportion of descendants from a fixed initial subpopulation in a total population of constant size 2N, as N gets large, by the Fisher–Wright diffusion slowed down at rate σ = 1/N. This approximation is very convenient, since it allows us to use the corresponding generator, say AN, given by

ANf(x) = (1/2N) x(1 − x) f″(x), x ∈ [0, 1],

and the associated Kolmogorov equations to derive such quantities as the expected time to absorption. Indeed, set

fN(x) = Ex(τ, Fix),

where τ is the absorption time, that is, the first hitting time of the boundary {0, 1} by the diffusion with generator AN. Now, recall that in Subsection 2.2.1 we gave a few applications of generators; in particular, thanks to Theorem 2.2.1, we have ANfN(x) = −uN(x), where uN(x) = x is the fixation probability and fN(0) = fN(1) = 0. It takes a straightforward calculation to deduce that for N large,

Ex(τ, Fix) ≈ −2N(1 − x) log(1 − x),

whereas

Ex(τ) ≈ −2N(1 − x) log(1 − x) − 2Nx log(x).

Since absorption is certain, the last quantity, say gN, was computed thanks to ANgN = −1. Note that we have recovered very quickly the results obtained laboriously for the Moran model in Subsection 1.2.2.
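These approximations can be reproduced by solving AN g = −1 with g(0) = g(1) = 0 numerically. A pure-Python sketch (finite differences plus the Thomas tridiagonal algorithm; the value N = 100 is arbitrary), compared with the closed form, which equals 2N log 2 at x = 1/2:

```python
import math

N, M = 100, 1000                      # population parameter, grid resolution
h = 1.0 / M
# (1/2N) x(1-x) g'' = -1 on (0,1), i.e. g''(x_i) = -2N / (x_i (1 - x_i)).
x = [i * h for i in range(1, M)]
rhs = [-2 * N / (xi * (1 - xi)) * h * h for xi in x]
# Tridiagonal system: -(g_{i-1} - 2 g_i + g_{i+1}) = -rhs_i (Thomas algorithm).
a, b, c = -1.0, 2.0, -1.0
d = [-r for r in rhs]
cp, dp = [0.0] * (M - 1), [0.0] * (M - 1)
cp[0], dp[0] = c / b, d[0] / b
for i in range(1, M - 1):
    m = b - a * cp[i - 1]
    cp[i] = c / m
    dp[i] = (d[i] - a * dp[i - 1]) / m
g = [0.0] * (M - 1)
g[-1] = dp[-1]
for i in range(M - 3, -1, -1):
    g[i] = dp[i] - cp[i] * g[i + 1]

g_mid = g[M // 2 - 1]                 # numerical E_x(tau) at x = 1/2
closed = 2 * N * math.log(2)          # -2N[(1-x)log(1-x) + x log x] at x = 1/2
assert abs(g_mid - closed) < 1.0
```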


Selection (and mutation)

A natural way of modifying this model is to add selection. Specifically, we can assume that the individuals of the subpopulation we are following through time are of a special type w.r.t. the rest of the population, which confers on them an increased (positive selection) or decreased (negative selection) birth rate, say NσN(1 + rN). Then Y^{(N)} has the following transition rates

y → y + 1/2N at rate 2N² σN(1 + rN) y(1 − y)
y → y − 1/2N at rate 2N² σN y(1 − y).

The new generator LN of Y^{(N)} is given by

LNf(x) = 2N² σN(1 + rN) x(1 − x)(f(x + 1/2N) − f(x)) + 2N² σN x(1 − x)(f(x − 1/2N) − f(x)).

Assuming again that f is of class C², we see that

LNf(x) = 2N² σN x(1 − x) ((rN/2N) f′(x) + ((2 + rN)/8N²) f″(x) + o(N⁻²)).

Now there are three possibilities

• if rN = O(1), then the diffusive motion, called genetic drift, vanishes in the limit compared to the action of selection, and the correct time scaling is given by σN = 1/2N. The limiting motion is driven by the ordinary differential equation ẏ = ry(1 − y) (whether fixation or extinction occurs depends solely on the sign of r)

• if rN = O(1/N), we can keep the initial time scaling by setting σN = σ, and then with r := lim_N NσN rN, the limiting generator is

Lf(x) = rx(1 − x) f′(x) + (σ/2) x(1 − x) f″(x), x ∈ [0, 1].

• if rN = o(1/N), then selection has a negligible effect compared to genetic drift, and we are back to the situation of the last theorem.

As a conclusion, we give a definition.

Definition 2.3.3 The solution to the following SDE

dYt = rYt(1 − Yt) dt + √(σYt(1 − Yt)) dBt

is called Fisher–Wright diffusion with selection.

Remark 2.3 Most species are diploid. An individual has two copies of each gene, and her phenotype depends on the type (allele) of each of these copies and their interaction. So the growth rate of a subpopulation of individuals bearing a certain allele, say A, will not depend solely on the frequencies of each allele A, B, C, . . . in the population, but also on how pairs of alleles interact to confer a certain fitness (propensity to propagate one's genes, i.e. reproduce) to their bearer. A classical way of accounting for diploidy is to replace the drift function a(y) = ry(1 − y) with a(y) = ry(1 − y)(y + h(1 − 2y)), where h is the so-called dominance coefficient. This drift function arises assuming that, conditional on being a bearer of the allele A, a newborn


1. is homozygote AA with probability y and heterozygote AB,AC, . . . with probability 1− y

2. has marginal birth rate, or fitness, r(1 − h) if she is homozygote, and rh if she is heterozygote (the fitness of all others is 0 by definition).

Thinking of the deterministic scaling limit (σ = 0), the following terminology appears natural

• when h ∈ [0, 1], selection is said to be directional (no deterministic equilibrium). If h = 0 or 1, dominance is complete (a heterozygote has the same fitness as a homozygote); if h = 1/2, dominance is absent; if h ∈ (0, 1), dominance is incomplete

• when h < 0, selection is said to be disruptive (one deterministic unstable equilibrium). One speaks of underdominance (the heterozygote is less fit than the homozygote and all others)

• when h > 1, selection is said to be stabilizing (one deterministic stable equilibrium). One speaks of overdominance (the heterozygote is fitter than the homozygote and all others).

Remark 2.4 Although we will not treat that subject in detail, we want to point out that it is standard to assume that mutations from the focal type to other types (and possibly conversely) occur with a certain probability θ/2N at each birth event. This adds to the drift function a(y) an extra term equal to

θ0(1 − y) − θ1 y,

where θ0 (resp. θ1) is the (rescaled) mutation rate towards (resp. from) the focal type from (resp. towards) any other type.

If both θ0 and θ1 are nonzero, the fixation probability is zero. On the other hand, the same tricks can be used to compute the stationary distribution, say π(x). In particular Lπ(x) = 0.

Conditioning

Recall again Subsection 2.2.1 and apply Theorem 2.2.2 to the Fisher–Wright diffusion Y (with or without selection) with generator L. Then the Fisher–Wright diffusion conditioned on fixation has generator L* given by

L*f(x) = L(uf)(x)/u(x), x ∈ [0, 1],

where u(x) := Px(Fix). Recall that Lu(x) = 0, so it is easy to get, in the selection case (r ≠ 0),

u(x) = (1 − exp(−2rx/σ))/(1 − exp(−2r/σ)), x ∈ [0, 1],

while as usual u(x) = x in the neutral case (r = 0).
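It is easy to check numerically that this u is indeed harmonic for L, i.e. Lu = 0, with the boundary values u(0) = 0 and u(1) = 1 (the parameter values below are hypothetical):

```python
import math

r, sigma = 0.8, 0.5

def u(x):
    """Fixation probability u(x) = (1 - e^{-2rx/sigma}) / (1 - e^{-2r/sigma})."""
    return (1 - math.exp(-2 * r * x / sigma)) / (1 - math.exp(-2 * r / sigma))

def Lu(x, h=1e-5):
    """L = r x(1-x) d/dx + (sigma/2) x(1-x) d^2/dx^2, by central differences."""
    du = (u(x + h) - u(x - h)) / (2 * h)
    d2u = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
    return r * x * (1 - x) * du + 0.5 * sigma * x * (1 - x) * d2u

assert abs(u(0.0)) < 1e-12 and abs(u(1.0) - 1) < 1e-12
for x in (0.2, 0.5, 0.8):
    assert abs(Lu(x)) < 1e-4
```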

Exercise 2.2 Prove the following statement.

Proposition 2.3.4 The Fisher–Wright diffusion with selection (r ≠ 0) conditioned on fixation satisfies the SDE

dYt = rYt(1 − Yt) coth(rYt/σ) dt + √(σYt(1 − Yt)) dBt,

which becomes, in the absence of selection (r = 0),

dYt = σ(1 − Yt) dt + √(σYt(1 − Yt)) dBt.


2.3.2 CB-diffusions

Observe that if a CB-process Z has continuous paths, then by Lamperti's time-change, it is also the case for the associated Levy process, so that the branching mechanism must be of the form ψ(λ) = σλ²/2 − rλ. Using again Lamperti's time-change θt := ∫_0^t Zs ds and its right-inverse κ, X := Z ◦ κ is a (killed) Levy process with continuous paths, namely Xt = √σ βt + rt, where β is a standard Brownian motion. As a consequence, X(θt) − rθt is a local martingale with increasing process σθt, or equivalently, Zt − r∫_0^t Zs ds is a local martingale with increasing process σ∫_0^t Zs ds. This entails

Definition 2.3.5 The CB-diffusion with branching mechanism ψ(λ) = σλ²/2 − rλ satisfies

dZt = rZt dt + √(σZt) dBt, t > 0,

where B is a standard Brownian motion. Such a diffusion is generally called a Feller diffusion (a denomination sometimes reserved for the case r = 0), and when r = 0 and σ = 4, a squared Bessel process with dimension 0.

Since ∫^∞ 1/ψ converges, we can define φ, and by elementary calculus, check that if r = 0, then for any t > 0,

φ(t) = ϕ(t) = 2/(σt),

so that

ut(λ) = λ/(1 + σλt/2),

whereas if r ≠ 0,

φ(t) = −r⁻¹ log(1 − 2r/(σt)) and ϕ(t) = (2r/σ) e^{rt}/(e^{rt} − 1),

so that

ut(λ) = (2r e^{rt}/σ)/(e^{rt} − 1 + 2r/(σλ)).

Note that ρ = ψ′(0+) = −r. A CB-diffusion is subcritical if r < 0, critical if r = 0, and supercritical if r > 0.
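The closed forms above can be cross-checked against the defining ODE u′t(λ) = −ψ(ut(λ)) of Theorem 2.2.8. A short sketch with the (arbitrary) values r = 1/2, σ = 1:

```python
import math

r, sigma = 0.5, 1.0

def psi(lam):
    return 0.5 * sigma * lam * lam - r * lam

def u_ode(t, lam, steps=20000):
    """Integrate y' = -psi(y), y(0) = lambda (classical Runge-Kutta)."""
    y, h = lam, t / steps
    for _ in range(steps):
        k1 = -psi(y)
        k2 = -psi(y + 0.5 * h * k1)
        k3 = -psi(y + 0.5 * h * k2)
        k4 = -psi(y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

def u_closed(t, lam):
    """u_t(lambda) = (2r e^{rt}/sigma) / (e^{rt} - 1 + 2r/(sigma*lambda))."""
    e = math.exp(r * t)
    return (2 * r * e / sigma) / (e - 1 + 2 * r / (sigma * lam))

t, lam = 2.0, 3.0
assert abs(u_ode(t, lam) - u_closed(t, lam)) < 1e-9
```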

In the supercritical case, the probability of extinction in t units of time is

Px(T < t) = exp(−(2rx/σ)/(1 − e^{−rt})),

so that in particular

Px(Ext) = exp(−2rx/σ).

The following statement can be derived either from Theorem 2.2.2 or from Proposition 2.2.15.

Proposition 2.3.6 A supercritical Feller diffusion with parameters (r, σ) conditioned on its ultimate extinction is distributed as a subcritical Feller diffusion with parameters (−r, σ).


2.3.3 A relation between Fisher–Wright and Feller diffusions

We want to display the same kind of relation that was shown in the first chapter, namely, that a BGW model (with Poisson offspring) conditioned to have constant size has the same law as the Wright–Fisher model. In contrast with the discrete case, a CB-process does not provide us with a genealogy, but only with a random population size. As a consequence, we will not display a relationship between microscopic genealogies, but only between genealogies associated to macroscopic subpopulations.

Theorem 2.3.7 Let Z^{(1)}, Z^{(2)} be two independent CB-diffusions with parameters (r1, σ1) and (r2, σ2). Then conditional on Z^{(1)} + Z^{(2)} = z at all times, the frequency Y := Z^{(1)}/(Z^{(1)} + Z^{(2)}) is a diffusion on [0, 1] satisfying the following SDE

dYt = s Yt(1 − Yt) dt + √(Yt(1 − Yt)) √((σ2/z) Yt + (σ1/z)(1 − Yt)) dBt,

where the resulting selection coefficient s equals

s = r1 − r2 + (1/z)(σ2 − σ1).

In particular if σ1 = σ2 =: σ,

dYt = (r1 − r2) Yt(1 − Yt) dt + √((σ/z) Yt(1 − Yt)) dBt.

The last displayed equation in the theorem is a Fisher–Wright diffusion with selection coefficient r1 − r2. If one compares this diffusion with that given in the subsection on large population approximations in the last section, one can identify the quantity z/σ with the quantity N, which is usually termed in population genetics the effective population size (as far as stochasticity is concerned), that is, the constant size that a population would have if its demographic stochasticity was to be compared to that of a Wright–Fisher population. The theorem gives a more rigorous interpretation of this effective population size, as the ratio of the census size z (the real size) to the offspring variance σ. For applications of the last theorem, see e.g. [82].

Exercise 2.3 Prove the last theorem applying Ito's formula to the bivariate mapping (x, y) ↦ x/(x + y), and a classical representation of Brownian martingales [99, Proposition V.3.8].

A somewhat reverse relation between CB-diffusions and Fisher–Wright diffusions can be derived from the ideas developed in Section 2.2.2 on stochastic flows of bridges. Indeed, if one looks at the growth of a very small subpopulation of a Fisher–Wright diffusion model, it is likely that this growth will be blind to the constraint of constant population size, so that it will resemble that of a Feller diffusion.

Proposition 2.3.8 Let Y be a Fisher–Wright diffusion solving

dYt = √(Yt(1 − Yt)) dBt

and starting from εx. Rescale this subpopulation as Z^ε_t := ε⁻¹ Y_{εt}. Then Z^ε converges in distribution to the Feller diffusion

dZt = √Zt dBt

starting from x.


Chapter 3

Quasi-stationary distributions and the Q-process

3.1 What is quasi-stationarity?

Let X be a Markov chain on the integers or a Markov process on [0,∞) for which 0 is one (and the only) accessible absorbing state. Then the only stationary probability is the Dirac mass at 0. A quasi-stationary distribution (QSD) is a positive measure ν satisfying

Pν(Xt ∈ A | Xt ≠ 0) = ν(A), t ≥ 0. (3.1)

Hereafter, we will only consider quasi-stationary probabilities. A quasi-stationary distribution may not be unique, but a specific candidate is defined (if it exists) as the law of Υ, where

P(Υ ∈ A) := lim_{t→∞} Px(Xt ∈ A | Xt ≠ 0),

for some Dirac initial condition x ≠ 0. The r.v. Υ is sometimes called the Yaglom limit, in reference to the proof of this result for BGW processes, attributed to A.M. Yaglom [113].

If ν is a QSD, then by application of the simple Markov property,

Pν(T > t+ s) = Pν(T > s)Pν(T > t),

so that the extinction time T under Pν has a geometric distribution in discrete-time models and an exponential distribution in continuous-time models.

Other conditional limiting distributions include

lim_{t→∞} Px(Xt ∈ A | Xt+s ≠ 0),

for fixed s, but we shall not consider such conditionings here, and rather focus on

P↑x(Θ) := lim_{s→∞} Px(Θ | Xt+s ≠ 0),

defined, if it exists, for any Θ ∈ Ft. The resulting law P↑ is that of a (possibly dishonest) Markov process X↑, which we call the Q-process.



A natural question is to compare the stationary distributions of X↑ (if any) with the quasi-stationary distribution Υ, that is, to compare the asymptotic distribution of Xt conditioned on not yet being absorbed with the asymptotic distribution of Xt conditioned on not being absorbed in the distant future. Intuitively, the second conditioning is more stringent than the first one, and should thus charge more heavily the paths that stay away from 0 than the first conditioning. The following statement gives a mathematical formulation of this intuition, in terms of stochastic domination.

Proposition 3.1.1 If for any t > 0 the mapping x ↦ Px(Xt ≠ 0) is nondecreasing, then for any starting point x ≠ 0, and for any t, s > 0,

Px(Xt > a | Xt+s ≠ 0) ≥ Px(Xt > a | Xt ≠ 0), a > 0.

Then, if there exists a Q-process X↑, by letting s → ∞,

Px(X↑t > a) ≥ Px(Xt > a | Xt ≠ 0), a > 0.

If there is a Yaglom limit Υ, and in addition the Q-process converges in distribution to a r.v. X↑∞, then by letting t → ∞,

X↑∞ ≥_{stoch} Υ.

Remark 3.1 By a standard coupling argument, the monotonicity condition for the probabilities x ↦ Px(Xt ≠ 0) is satisfied for any strong Markov process with no negative jumps.

Proof. It takes a standard application of Bayes' theorem to get that for any a, x, s, t > 0,

Px(Xt > a | Xt+s ≠ 0) ≥ Px(Xt > a | Xt ≠ 0)
⇕
Px(Xt+s ≠ 0 | Xt > a) ≥ Px(Xt+s ≠ 0 | Xt ≠ 0),

so it remains to prove the last displayed inequality. Next, for any m ≥ 0, set Wm the r.v. defined as

P(Wm ∈ dr) = Px(Xt ∈ dr | Xt > m), r > 0.

For any m ≤ m′ and u ≥ 0, check that

P(Wm′ > u) ≥ P(Wm > u),

which means Wm′ ≥_{stoch} Wm, so in particular Wa ≥_{stoch} W0. Finally, observe that

Px(Xt+s ≠ 0 | Xt > a) ≥ Px(Xt+s ≠ 0 | Xt ≠ 0) ⇔ E(f(Wa)) ≥ E(f(W0)),

where f(x) := Px(Xs ≠ 0). Since f is nondecreasing, the proof is complete. □


3.2 Markov chains with finite state-space

3.2.1 Perron–Frobenius theory

A comprehensive account of the applications of Perron–Frobenius theory to Markov chains is the book by E. Seneta [103].

Let X be a Markov chain on {0, 1, . . . , 2N} that has two communication classes, namely{0} and {1, . . . , 2N}. We assume further that 0 is absorbing and accessible. Next, let P bethe transition matrix, that is, the matrix with generic element pij := Pi(X1 = j) (row i andcolumn j), and let Q be the square matrix of order 2N obtained from P by deleting its firstrow and its first column. In particular,

Pi(Xn = j) = qij(n), i, j ≥ 1,

where qij(n) is the generic element of the matrix Qn (row i and column j). Everything that follows still holds if there are two absorbing states, {0} and {2N}, instead of one, provided one also deletes the last row and column of P and replaces the event {Xn = 0} with the event {Xn = 0 or 2N}.

Recall that the eigenvalue with maximal modulus of a matrix with nonnegative entries is real and nonnegative, and is called the dominant eigenvalue. The dominant eigenvalue of P is 1, but that of Q is strictly less than 1. Now because we have assumed that all nonzero states communicate, Q is regular, so thanks to the Perron–Frobenius theorem, its dominant eigenvalue, say λ ∈ (0, 1), has multiplicity 1. We write v for its right eigenvector (column vector with positive entries) and u for its left eigenvector (row vector with positive entries), normalized so that

∑_{i≥1} ui = 1 and ∑_{i≥1} uivi = 1.

Theorem 3.2.1 Let (Xn; n ≥ 0) be a Markov chain in {0, 1, . . . , 2N} absorbed at 0, such that 0 is accessible and all nonzero states communicate. Then X has a Yaglom limit Υ given by

P(Υ = j) = uj, j ≥ 1,

and there is a Q-process X↑ whose transition probabilities are given by

Pi(X↑n = j) = (vj/vi) λ^{−n} Pi(Xn = j), i, j ≥ 1.

In addition, the Q-process converges in distribution to the r.v. X↑∞ with law

P(X↑∞ = j) = ujvj, j ≥ 1.

Exercise 3.1 Prove the previous statement using the following key result in the Perron–Frobenius theorem:

lim_{n→∞} λ^{−n} qij(n) = ujvi, i, j ≥ 1.
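The theorem and the exercise can be checked numerically on a toy example. In the sketch below, Q is a hypothetical irreducible sub-stochastic matrix on three nonzero states (any such matrix would do); the code extracts the dominant eigenvalue λ with its normalized left and right eigenvectors u and v, and verifies both the Yaglom limit and the limit λ^{−n}qij(n) → ujvi.

```python
import numpy as np

# Hypothetical sub-stochastic matrix Q on the nonzero states of an absorbed
# chain (rows sum to less than 1: the missing mass leaks to state 0).
Q = np.array([[0.4, 0.3, 0.1],
              [0.2, 0.5, 0.2],
              [0.1, 0.3, 0.5]])

# Dominant eigenvalue and right eigenvector of Q; left eigenvector via Q^T.
vals, vecs = np.linalg.eig(Q)
k = np.argmax(vals.real)
lam = vals[k].real
v = np.abs(vecs[:, k].real)                 # right eigenvector (positive)
tvals, tvecs = np.linalg.eig(Q.T)
u = np.abs(tvecs[:, np.argmax(tvals.real)].real)  # left eigenvector

u /= u.sum()                                # normalization sum(u) = 1
v /= u @ v                                  # normalization sum(u_i v_i) = 1

# Yaglom limit: the conditional law of X_n given non-absorption tends to u.
p = np.linalg.matrix_power(Q, 200)[0]       # law of X_200 started from state 1
assert np.allclose(p / p.sum(), u, atol=1e-8)

# Key Perron-Frobenius limit: lambda^{-n} q_ij(n) -> u_j v_i.
assert np.allclose(np.linalg.matrix_power(Q, 200) / lam**200,
                   np.outer(v, u), atol=1e-8)
```

The same computation, run on the matrix Q of any absorbed finite chain, produces the Yaglom limit u and the stationary law (ujvj) of the Q-process of Theorem 3.2.1.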


3.2.2 Application to population genetics

It would be straightforward to apply the last theorem to the Wright–Fisher model if we knew how to express the eigenvectors associated to the dominant eigenvalue of Q, which is λ2 = (2N − 1)/2N. Unfortunately, this is not the case, but nice expressions are available for the Moran model.

To stick to the preceding framework, we could consider the discrete-time Markov chain associated to the Moran model, namely, the chain with transition probabilities

pi,i−1 = pi,i+1 = (i/2N)(1 − i/2N) and pi,i = (i/2N)² + (1 − i/2N)².

This model is studied in detail in [43], so we prefer to provide the result in continuous time, which is extremely similar to that in discrete time. In addition, it will give a first flavour of what happens in the other models we consider.

Recall the transition rates for the Moran model:

i → i + 1 at rate ai(2N − i)
i → i − 1 at rate ai(2N − i),

where the individual birth rate a was taken equal to 1/2N in Chapter 1, and to 1 in Chapter 2. Let (Qt) be the semigroup of the Moran model killed upon absorption (i.e. the analogue of the matrix Qn in the discrete setting), and R its rate matrix, that is, the square matrix of order 2N − 1 with generic element

rij = ai(2N − i) if j = i ± 1, and rii = −2ai(2N − i), for i, j ∈ {1, . . . , 2N − 1}.

It is known that Qt = exp(tR), and it is easy to check that

u = (1, . . . , 1) and v = (1·(2N − 1), . . . , j(2N − j), . . . , (2N − 1)·1)ᵀ

are resp. left and right eigenvectors of R for the eigenvalue −2a, and hence of Qt for the eigenvalue exp(−2at). The latter value can be proved to be the dominant eigenvalue of Qt, so the following result is straightforward.

Theorem 3.2.2 Let (Yt; t ≥ 0) denote the Moran model with individual birth rate a. It has a Yaglom limit Υ which is uniform on {1, . . . , 2N − 1}:

P(Υ = j) := lim_{t→∞} P(Yt = j | Yt ∉ {0, 2N}) = 1/(2N − 1), j ∈ {1, . . . , 2N − 1}.

The Moran model conditioned on never being absorbed, or Q-process Y↑, has transitions

Pi(Y↑t = j) := lim_{s→∞} Pi(Yt = j | Yt+s ∉ {0, 2N}) = e^{2at} (j(2N − j)/(i(2N − i))) Pi(Yt = j).


In addition, the Q-process converges in distribution to Y↑∞, given by

P(Y↑∞ = j) = c_N^{−1} j(2N − j),

where cN is a normalizing constant equal to (2N − 1)(2N)(2N + 1)/6.
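These formulas are easy to confirm numerically. The sketch below builds the killed rate matrix R for the illustrative choice N = 5, a = 1, and checks that the constant vector and (j(2N − j))j are left and right eigenvectors for −2a, which is also the dominant eigenvalue of R.

```python
import numpy as np

# Killed Moran rate matrix R on {1, ..., 2N-1}; N and a are toy values.
N, a = 5, 1.0
n = 2 * N
R = np.zeros((n - 1, n - 1))
for row in range(n - 1):
    i = row + 1                      # state corresponding to this row
    rate = a * i * (n - i)           # up- and down-jump rate from state i
    R[row, row] = -2 * rate
    if row > 0:
        R[row, row - 1] = rate
    if row < n - 2:
        R[row, row + 1] = rate

u = np.ones(n - 1)                                       # uniform left eigenvector
v = np.array([i * (n - i) for i in range(1, n)], float)  # right eigenvector

assert np.allclose(u @ R, -2 * a * u)      # u R = -2a u (Yaglom limit uniform)
assert np.allclose(R @ v, -2 * a * v)      # R v = -2a v (h-function of the Q-process)
# -2a is the dominant (largest) eigenvalue of R.
assert np.isclose(np.linalg.eigvals(R).real.max(), -2 * a)
```

Note that both eigenvector identities hold exactly here, since j ↦ j(2N − j) is a quadratic whose discrete second difference is constant.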

3.3 Markov chains with countable state-space

In this section, we will consider the case when X is a Markov chain with integer values, having 0 as an accessible absorbing state, and all nonzero states communicating. A more general setting is studied in [52], but it roughly amounts to the situation just described.

3.3.1 R-theory

In his pioneering papers [108, 109], D. Vere-Jones extended the Perron–Frobenius theory to infinite matrices Q with generic element qij, i, j ≥ 1, where it is implicit that

qij = Pi(X1 = j), i, j ≥ 1.

The results on the applications of R-theory to Markov chains with one absorbing state can be found in [105]. Let us also mention that a similar theory has been set up for processes with uncountable state-space by P. Tuominen and R.L. Tweedie [110].

Under the assumptions on X made above, the following result holds, where R^{−1} is the analogue of the dominant eigenvalue in the finite case.

Theorem 3.3.1 (Vere-Jones [108]) The power series z ↦ ∑_{n≥1} qij(n)z^n all have the same radius of convergence, say R, and the following dichotomy holds.

Either the series ∑_{n≥1} qij(n)R^n all converge, and Q is said to be R-transient, or they all diverge, and Q is said to be R-recurrent.

In the latter case, either the sequences (qij(n)R^n) all converge to 0, and Q is said to be R-null, or they all converge to a positive limit, and Q is said to be R-positive.

Remark 3.2 This theorem also applies to stochastic matrices corresponding to recurrent chains (here we consider a killed chain or a transient chain). Then the transition matrix is either 1-null (when the chain is null-recurrent), or 1-positive (when the chain is positive-recurrent).

Theorem 3.3.2 (Vere-Jones [108]) The value R is the greatest value of r for which there exist nonzero r-subinvariant vectors (i.e. ruQ ≤ u, or rQv ≤ v, where u and v have nonnegative entries). The infinite matrix Q is R-recurrent iff there is a unique pair (u, v) (up to a constant factor) such that the row vector u and the column vector v both have positive entries and

RuQ = u and RQv = v.

In addition, if Q is R-recurrent, then it is R-positive iff uv < ∞ (where uv is the usual dot product). In that case, R^n qij(n) → ujvi/uv.

The link with quasi-stationarity is the following


Theorem 3.3.3 (Seneta–Vere-Jones [105]) Let ai be the probability that X hits 0 starting from i. Then the three following limits

lim_{n→∞} Pi(Zn = j | Zn ≠ 0), lim_{n→∞} Pi(Zn = j | Zn+k ≠ 0) and lim_{n→∞} Pj(Zn ≠ 0)/Pi(Zn ≠ 0)

all exist and are honest, iff Q is R-positive with R > 1 and the left eigenvector u satisfies ∑_{i≥1} aiui < ∞.

3.3.2 Application to the BGW model

R-theory and quasi-stationarity

The results from the previous subsection can be applied to the BGW process (Zn, n ≥ 0) with offspring distribution (pk, k ≥ 0) and associated p.g.f. f. Recall that m stands for the mean offspring number and that we always assume p0p1 ≠ 0. From now on, m ≤ 1. Note that everything that follows can also be stated for the supercritical process provided it is conditioned on extinction, since such a conditioned process has the same law as a subcritical BGW process (see Exercise 1.3). A slight difference is that the condition L log L that will appear below is always fulfilled in the latter case.

To compute the radius of convergence R defined in the previous subsection for the BGW model, note that

q11(n) = P1(Zn = 1) = f′n(0),

where fn is the n-th iterate of f, so that

q11(n + 1)/q11(n) = f′n+1(0)/f′n(0) = f′ ◦ fn(0),

which converges to m (since fn(0) converges to the extinction probability 1). This proves that R = m^{−1}. Also, one can prove that vj = j (up to a positive constant factor), since then

(Qv)j = ∑_{k≥1} k Pj(Z1 = k) = (d/ds) f(s)^j |_{s=1} = mj = mvj.

One of the contributions of [105] is to prove that Q is m^{−1}-positive iff ∑_k pk(k log k) < ∞. This ensures that this condition is equivalent to the joint existence of the Yaglom limit and the Q-process. Indeed, when Q is m^{−1}-positive, ∑_k kuk < ∞, so in particular ∑_k uk < ∞, which proves that the condition ∑_k akuk < ∞ in Theorem 3.3.3 is fulfilled (recall ak = 1 in the subcritical case).

The computation of u is not possible in general; the knowledge of v will be exploited for the Q-process.

The most refined result for the Yaglom limit is the following

Theorem 3.3.4 (Yaglom [113], Sevast'yanov [106], Heathcote–Seneta–Vere-Jones [58]) In the subcritical case, there is a r.v. Υ with probability distribution (uj, j ≥ 1) such that uQ = mu and

lim_{n→∞} P(Zn = j | Zn ≠ 0) = uj, j ≥ 1.


The following dichotomy holds.
If ∑_k pk(k log k) = ∞, then m^{−n}P(Zn ≠ 0) goes to 0 and Υ has infinite expectation.
If ∑_k pk(k log k) < ∞, then m^{−n}P(Zn ≠ 0) has a positive limit, and Υ has finite expectation such that

lim_{n→∞} Ex(Zn | Zn ≠ 0) = E(Υ).

Remark 3.3 The functional version of the equation uQ = mu is g(f(s)) − g(f(0)) = mg(s), where g is the probability generating function of Υ. Taking s = 1 implies that g(f(0)) = 1 − m, so that

1 − g(f(s)) = m(1 − g(s)), s ∈ [0, 1],

which can be read as

E(1 − s^{Z1} | Z0 = Υ, Z1 ≠ 0) = E(1 − s^{Z1} | Z0 = Υ)/m = E(1 − s^Υ).

Remark 3.4 As pointed out in [105], for any α ∈ (0, 1),

gα(s) := 1 − (1 − g(s))^α, s ∈ [0, 1],

is the generating function of an honest probability distribution which is a QSD associated to the rate of mass decay m^α (instead of m), since

1 − gα(f(s)) = m^α (1 − gα(s)), s ∈ [0, 1].

This statement is to be related to forthcoming Theorem 3.6.2.

We wish to provide here the so-called 'conceptual proof' of [90]. To do this, we will need the following lemma.

Lemma 3.3.5 Let (νn) be a sequence of probability measures on the positive integers, with finite means an, and let ν̂n(k) := kνn(k)/an denote the size-biased measures. If (ν̂n) is tight, then (an) is bounded, while if ν̂n → ∞ in distribution, then an → ∞.

Proof of Theorem 3.3.4 (Joffe [65], Lyons–Pemantle–Peres [90]). Let µn be the law of Zn conditioned on Zn ≠ 0. For any planar embedding of the tree, we let un be the leftmost child of the root that has descendants at generation n, and Hn the number of such descendants. If Zn = 0, we put Hn = 0. Then check that

P(Hn = k) = P(Zn = k | Zn ≠ 0, Z1 = 1) = P(Zn−1 = k | Zn−1 ≠ 0).

Since Hn ≤ Zn, (µn) increases stochastically. Then consider the functions Gn defined by

Gn(s) := E1(1 − s^{Zn} | Zn ≠ 0) = (1 − fn(s))/(1 − fn(0)), s ∈ [0, 1].

The previously mentioned stochastic monotonicity implies that the sequence (Gn(s))n is nondecreasing. Let G(s) be its limit. Then G is nonincreasing, and −G is convex, so it is continuous on (0, 1). Then notice that Gn(f(s)) = Γ(fn(0)) Gn+1(s), where

Γ(s) := (1 − f(s))/(1 − s), s ∈ [0, 1].


Since Γ(s) goes to m as s → 1 and fn(0) goes to 1 as n → ∞, we get that G(f(s)) = mG(s). This entails G(1−) = mG(1−), and since m < 1, G(1−) = 0, which ensures that 1 − G is the generating function of a proper random variable.

Now since E(Zn) = E(Zn, Zn ≠ 0),

P(Zn ≠ 0) = E(Zn)/E(Zn | Zn ≠ 0) = a_n^{−1} m^n,

where an := ∑_{k≥1} kµn(k). This shows that (m^{−n}P(Zn ≠ 0)) decreases and that its limit is nonzero iff the means of the µn, that is, the an, are bounded.

Now consider the BGW process with immigration Z↑, where the immigrants come in packs of random size ζ, with P(ζ = k) = (k + 1)pk+1/m. The associated generating function is g(s) = f′(s)/m. Then the law of Z↑n is given by (1.3):

E0(s^{Z↑n}) = ∏_{k=0}^{n−1} (f′ ◦ fk(s)/m), s ∈ [0, 1].

An immediate recursion shows that

E0(s^{1+Z↑n}) = m^{−n} s f′n(s), s ∈ [0, 1].

Now a straightforward calculation provides

a_n^{−1} ∑_{k≥1} k µn(k) s^k = m^{−n} s f′n(s), s ∈ [0, 1].

We deduce that the size-biased distribution µ̂n is the law of 1 + Z↑n, where the BGW process with immigration Z↑ is started at 0. Now thanks to Theorem 1.1.7, this distribution converges to a proper distribution or to +∞, according to whether ∑_{k≥1} P(ζ = k) log k is finite or infinite, that is, according to whether ∑_k pk(k log k) is finite or infinite. Conclude thanks to the previous lemma. □

Exercise 3.2 In the linear-fractional case, where p0 = b and pk = (1 − b)(1 − a)a^{k−1} for k ≥ 1, prove that the Yaglom limit is geometric with parameter min(a/b, b/a).
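Rather than solving the exercise, one can check the claim by exact numerics: iterating the (truncated) offspring law gives the distribution of Zn, whose conditional law can be compared with the geometric law. The parameter values below are illustrative (a = 0.3 < b = 0.5, so the process is subcritical with m = (1 − b)/(1 − a) < 1 and min(a/b, b/a) = a/b).

```python
import numpy as np

# Linear-fractional offspring law: p0 = b, pk = (1-b)(1-a) a^(k-1) for k >= 1.
a, b = 0.3, 0.5                 # illustrative subcritical choice (a < b)
K, n = 150, 40                  # truncation level and number of generations
p = np.zeros(K + 1)
p[0] = b
p[1:] = (1 - b) * (1 - a) * a ** np.arange(K)

# conv[i] = law of the sum of i independent offspring numbers, truncated at K.
conv = np.zeros((K + 1, K + 1))
conv[0, 0] = 1.0
for i in range(1, K + 1):
    conv[i] = np.convolve(conv[i - 1], p)[:K + 1]

# Exact distribution of Z_n started from Z_0 = 1.
d = np.zeros(K + 1)
d[1] = 1.0
for _ in range(n):
    d = d @ conv

# Conditional law of Z_n given Z_n != 0 versus Geometric(a/b) on {1, 2, ...}.
cond = d[1:] / d[1:].sum()
q = a / b
geom = (1 - q) * q ** np.arange(K)
assert np.abs(cond[:50] - geom[:50]).max() < 1e-4
```

In fact, in the linear-fractional case the conditional law is exactly geometric at every n, with a parameter converging to a/b, which is why the agreement is already excellent at n = 40.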

By the previous theorem, we know that in the subcritical case, under the L log L condition, the probabilities P(Zn ≠ 0) decrease geometrically with ratio m. The following statement gives their rate of decay in the critical case.

Theorem 3.3.6 (Kesten–Ney–Spitzer [71]) Assume σ := Var(Z1) < ∞. Then we have
(i) Kolmogorov's estimate [76]:

lim_{n→∞} nP(Zn ≠ 0) = 2/σ;

(ii) Yaglom's universal limit law [113]:

lim_{n→∞} P(Zn/n ≥ x | Zn ≠ 0) = exp(−2x/σ), x > 0.

For a modern proof, see [49, 91].
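Kolmogorov's estimate can be witnessed on the standard critical example with geometric(1/2) offspring, for which f(s) = 1/(2 − s), m = 1 and σ = 2, by iterating the survival probability exactly:

```python
# Critical BGW with geometric(1/2) offspring: p_k = 2^{-(k+1)}, so that
# f(s) = 1/(2 - s), mean m = 1 and variance sigma = 2.
n = 10_000
q = 0.0                      # q = f_k(0) = P(Z_k = 0) after k iterations
for _ in range(n):
    q = 1.0 / (2.0 - q)
survival = 1.0 - q           # P(Z_n != 0)
# Kolmogorov's estimate: n * P(Z_n != 0) -> 2 / sigma = 1.
assert abs(n * survival - 1.0) < 1e-3
```

For this particular offspring law the iteration is exactly fn(0) = n/(n + 1), so P(Zn ≠ 0) = 1/(n + 1) and the convergence is plain to see.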


Q-process

Now we state without proof the results concerning the conditioning of Z on being non-extinct in the distant future (they can be found in [6, pp. 56–59]).

Theorem 3.3.7 The Q-process Z↑ can be properly defined as

Pi(Z↑n = j) := lim_{k→∞} Pi(Zn = j | Zn+k ≠ 0) = (j/i) m^{−n} Pi(Zn = j), i, j ≥ 1. (3.2)

It is transient if m = 1, and if in addition σ := Var(Z1) < ∞, then

lim_{n→∞} P(2Z↑n/σn ≥ x) = ∫_x^∞ y exp(−y) dy, x > 0.

If m < 1, it is positive-recurrent iff

∑_{k≥1} pk(k log k) < ∞.

In the latter case, the stationary law is the size-biased distribution (kuk/µ, k ≥ 1) of the Yaglom limit u from the previous theorem, where µ := ∑_{k≥1} kuk.

Observe that the generating function of the transition probabilities of the Q-process is given by

∑_{j≥0} Pi(Z↑1 = j)s^j = ∑_{j≥0} Pi(Z1 = j)(j/i)m^{−1}s^j = (s f′(s)/m) f(s)^{i−1},

where the foregoing equality is reminiscent of the BGW process with immigration. More precisely, it provides a useful recursive construction for a Q-process tree, called the size-biased tree (see also [90], as well as the section in Chapter 1 dedicated to BGW trees, and Chapter 4). At each generation, a particle is marked. Give to the others independent BGW descendant trees with offspring distribution p, which is (sub)critical. Give to the marked particle k children with probability µk, where

µk = kpk/m, k ≥ 1,

and mark one of these children at random.

This construction shows that the Q-process tree contains one infinite branch and one only, that of the marked particles, and that (Z↑n − 1, n ≥ 0) is a BGW process with branching mechanism f and immigration mechanism f′/m (by construction, Z↑n is the total number of particles belonging to generation n; just remove the marked particle at each generation to recover the process with immigration).
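The h-transform formula (3.2) and the size-biasing of the Yaglom limit can be illustrated together by exact numerics in the linear-fractional setting of Exercise 3.2 (illustrative parameters a = 0.3, b = 0.5, for which the Yaglom limit is Geometric(a/b)):

```python
import numpy as np

# Check of (3.2): with linear-fractional offspring, the Q-process law
# P_1(Z^up_n = j) = j m^{-n} P_1(Z_n = j) should approach the size-biased
# Yaglom law j (1-q)^2 q^{j-1}, q = a/b (illustrative parameters).
a, b = 0.3, 0.5
m = (1 - b) / (1 - a)
K, n = 150, 40
p = np.zeros(K + 1)
p[0] = b
p[1:] = (1 - b) * (1 - a) * a ** np.arange(K)

conv = np.zeros((K + 1, K + 1))     # conv[i] = i-fold convolution of p
conv[0, 0] = 1.0
for i in range(1, K + 1):
    conv[i] = np.convolve(conv[i - 1], p)[:K + 1]

d = np.zeros(K + 1)                 # law of Z_n started from Z_0 = 1
d[1] = 1.0
for _ in range(n):
    d = d @ conv

j = np.arange(K + 1)
q_proc = j * d / m ** n             # law of the Q-process at time n, from (3.2)
q = a / b
stationary = j * (1 - q) ** 2 * q ** np.maximum(j - 1, 0)
assert abs(q_proc.sum() - 1) < 1e-6        # (3.2) defines a probability
assert np.abs(q_proc - stationary).max() < 1e-3
```

The first assertion reflects the martingale property E(Zn) = m^n; the second is the positive recurrence of Theorem 3.3.7 (the L log L condition is trivially satisfied by the geometric tail).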

3.4 Birth–death processes

In this section, we state a (slight refinement of a) result on birth–death processes due to E. A. van Doorn [30].


Let Y be a birth–death process with birth rate λn and death rate µn when in state n. Assume that λ0 = µ0 = 0 and that extinction (absorption at 0) occurs with probability 1. Let

S = ∑_{i≥1} ρi + ∑_{n≥1} (λnρn)^{−1} ∑_{i≥n+1} ρi,

where

ρn = (λ1λ2 · · · λn−1)/(µ1µ2 · · · µn).

Theorem 3.4.1 For a birth–death process Y absorbed at 0 with probability 1, the following are equivalent:

(i) Y comes down from infinity;

(ii) there is one and only one QSD;

(iii) lim_{n↑∞} ↑ En(T0) < ∞;

(iv) S < ∞.

Proof. In [30, Theorem 3.2], it is stated that either S = ∞ and there are either no or infinitely many QSDs, or S < ∞ and there is a unique QSD; that is, (ii) and (iv) are equivalent. Let us now examine how this criterion is related to the nature of the boundary at +∞. Set

Un = ∑_{k=1}^{n−1} (µ1 · · · µk)/(λ1 · · · λk).

Recall from Chapter 1, Subsection 1.2.4, that extinction has probability 1 if and only if the sequence (Un)n converges to +∞. Also extinction times have finite first-order moment if and only if the sequence (ρn)n is summable. In addition, thanks to Theorem 1.2.12, the expected time to extinction starting from n equals

En(T0) = ∑_{k≥1} ρk(1 + Un∧k).

Then by Beppo Levi's theorem, this quantity converges as n → ∞ to ∑_{k≥1} ρk(1 + Uk), which after elementary transformations can be seen to equal S:

S = lim_{n↑∞} ↑ En(T0).

Hence (iv) and (iii) are equivalent. It is clear that (iii) implies (i), which shows in particular that if S < ∞ then Y comes down from infinity.

Now other elementary transformations on the expression given for S yield

S = ∑_{n≥1} (1/µn+1)(1 + λn/µn + · · · + (λn · · · λ1)/(µn · · · µ1)).

Furthermore, Theorem II.2.3 in [3] states that the solutions to the Kolmogorov forward equations associated with birth–death rate matrices are not unique if and only if S is finite. This is precisely the case when the birth–death process comes down from infinity (and the rate matrix is non-conservative), since in that case both the minimal process and the minimal process resurrected at infinity at each killing time have transition functions which solve the forward equations. Hence (i) implies (iv) and the proof is complete. Note that this provides in passing a proof of Theorem 1.2.10. □
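The criterion S < ∞ is easy to explore numerically. The sketch below truncates the series S for two illustrative rate choices: quadratic (logistic-type) death rates, for which the truncated sums stabilize (the process comes down from infinity), and critical linear rates, for which they grow without bound.

```python
import numpy as np

def S_trunc(lam, mu, N):
    # Truncation of S = sum_i rho_i + sum_n (lam_n rho_n)^{-1} sum_{i>n} rho_i,
    # with rho_n = lam_1...lam_{n-1} / (mu_1...mu_n), computed recursively.
    rho = np.zeros(N + 2)
    rho[1] = 1.0 / mu(1)
    for k in range(2, N + 1):
        rho[k] = rho[k - 1] * lam(k - 1) / mu(k)
    tail = np.zeros(N + 2)            # tail[k] = sum_{i >= k} rho_i
    for k in range(N, 0, -1):
        tail[k] = tail[k + 1] + rho[k]
    total = tail[1]
    for n in range(1, N + 1):
        if rho[n] > 0:                # skip underflowed (negligible) terms
            total += tail[n + 1] / (lam(n) * rho[n])
    return total

# Quadratic death rates (logistic type): comes down from infinity, S finite.
S_logistic = S_trunc(lambda n: 2.0 * n, lambda n: float(n) ** 2, 2000)
# Critical linear rates: does not come down from infinity, S infinite.
S_linear_2000 = S_trunc(lambda n: float(n), lambda n: float(n), 2000)
S_linear_4000 = S_trunc(lambda n: float(n), lambda n: float(n), 4000)

assert abs(S_trunc(lambda n: 2.0 * n, lambda n: float(n) ** 2, 500) - S_logistic) < 1e-9
assert S_linear_4000 > 1.5 * S_linear_2000   # truncated sums keep growing
```

In the linear critical case the truncated value grows roughly like the truncation level N, while in the logistic-type case ρn decays superexponentially and the sum has already converged to machine precision at N = 500.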

Remark 3.5 The case of general continuous-time Markov chains with integer values is investigated in [45] (under the assumption that the chain does not come down from infinity). The authors use an elegant and compelling trick which can supposedly be extended to more general absorbed Markov processes, as follows. Thanks to the renewal theorem, one can associate to any probability measure µ such that 0 < Eµ(T) < ∞ the unique stationary probability measure of the process resurrected at each absorption time, in a random state drawn independently according to µ. Denote this probability measure by Φ(µ). If T has exponential moments under Pµ, then one can define the iterates Φn(µ) of Φ applied to µ. It is easily seen that a fixed point of Φ is a QSD. To get the existence of a QSD, the authors prove that a subsequence of Φn(δx) converges to a probability measure ν such that Pν(T > t) = e^{−γt} for some positive γ and all t. Since Φ is continuous on the convex, compact (nonempty) set of probability measures µ such that Pµ(T > t) = e^{−γt}, the Schauder–Tychonov fixed point theorem yields the conclusion.

3.5 Kimmel’s branching model

We consider a deterministic binary tree, interpreted as the genealogical tree of a cell population. Each cell is represented by a finite sequence i = (i1, . . . , in), where n is the depth, or generation, of the cell in the tree, and the ik's are elements of {0, 1}. Each cell i contains an integer number Zi of parasites. The dynamics are given by the following rules:

• When the cell i divides, it gives birth to two daughter cells i0 and i1.

• Upon division, each parasite that was contained in the cell at its birth has proliferated into Z(0) + Z(1) parasites, Z(0) of them going to daughter cell i0 and Z(1) to daughter cell i1.

• All parasites proliferate independently and with the same law.

As a consequence, conditional on Zi = n, the pair (Zi0, Zi1) is distributed as

∑_{k=1}^{n} (Y(0)k, Y(1)k),

where the r.v.'s (Y(0)k, Y(1)k) are i.i.d. with common distribution (Z(0), Z(1)). This model is due to M. Kimmel and was first studied in [73]. Note that here, time is discrete (generations), and the law of (Z(0), Z(1)) is not necessarily symmetric.

Observe that in this model, the total number of parasites (Zn; n ≥ 0) is a BGW process with offspring distribution ξ = Z(0) + Z(1). In addition, the number of parasites (Zn; n ≥ 0) along a random line of descent is a BGW process in random environment, where there are two environments (law Z(0) and law Z(1)) chosen with equal probabilities.


We use the following notation:

m0, m1 = E(Z(0)), E(Z(1))
Gn = generation n
G∗n = set of contaminated cells in Gn
∂T = set of infinite lines of descent
∂T∗ = set of infinite contaminated lines of descent

Theorem 3.5.1 (Bansaye [8]) The following limit exists a.s.:

L := lim_{n→∞} 2^{−n} #G∗n.

If m0m1 ≤ 1, then P(L = 0) = 1 (dilution).
If m0m1 > 1, then {L = 0} = {Ext}.

In words, L is the asymptotic proportion of contaminated cells. In the case m0m1 > 1, the only way for the organism to heal up, that is, for this asymptotic proportion to be zero, is that all parasites die out.

Now assume m0 + m1 > max(m0² + m1², 1), so that in particular:

• m0m1 ≤ 1: the organism heals up (L = 0) and parasites become extinct along any random line of descent (the BGW process in random environment (Zn; n ≥ 0) is subcritical);

• m0 + m1 > 1: parasites (may) grow overall (the BGW process (Zn; n ≥ 0) is supercritical).

Theorem 3.5.2 (Bansaye [8]) Let

Fk(n) := #{i ∈ G∗n : Zi = k} / #G∗n.

Then conditionally on {Ext}ᶜ,

lim_{n→∞} Fk(n) = P(Υ = k) in probability,

where Υ is the Yaglom limit of the number of parasites along a random line of descent.

In addition, let

Fk(n, p) := #{i ∈ G∗n+p : Zi|n = k} / #G∗n+p.

Then conditionally on {Ext}ᶜ,

lim_{n→∞} lim_{p→∞} Fk(n, p) = kP(Υ = k)/E(Υ) in probability,

which is the stationary probability of the Q-process along a random line of descent.

Remark 3.6 The QSD Υ is a distribution conditional on asymptotically evanescent events. This is not the case for the limit of Fk(n), which therefore provides a realization of the QSD.


Remark 3.7 As p → ∞, one roughly has

Fk(n, p) ≈ P(a line of descent uniformly picked in ∂T∗ contained k parasites at generation n).

Because the descendances of parasites separate into disjoint lines of descent with high probability, a uniform pick in ∂T∗ roughly amounts to a size-biased pick at generation n. Roughly speaking, if there are two cells at generation n, the first one containing 1 parasite and the second one containing k parasites, then the probability that a uniform line in ∂T∗ descends from a parasite (this makes sense because of separation) in the second cell is k times greater than the complementary probability (size-biasing of Υ).

This provides a conceptual explanation for the link (size-biasing) between the Yaglom limit and the stationary measure of the Q-process.

3.6 The CB-process

In this section, we consider a CB-process (Zt; t ≥ 0) with branching mechanism ψ. Everything that follows is done under the assumptions that

ρ := ψ′(0+) ≥ 0 and φ(t) := ∫_t^∞ ds/ψ(s) < ∞,

that is, Z is (sub)critical and absorbed at 0 with probability 1. If we dropped the first assumption and assumed ρ < 0, the rest of the section would remain unchanged provided that we conditioned Z on eventual absorption. Indeed, thanks to Proposition 2.2.15, this amounts to considering a subcritical CB-process.

Actually, one could also drop the second assumption [94] by conditioning Zt on events of the type {Zt+s > ε} or {S(ε) > t + s}, where S(ε) is the last hitting time of ε.

Before continuing further, we state a technical lemma, whose proof, as well as those of most other statements, can readily be found in [83]. Earlier works on this topic were [89, 100].

Recall that ϕ is the inverse mapping of φ.

Lemma 3.6.1 Assume ρ ≥ 0 and let G(λ) := exp(−ρφ(λ)). Then for any positive λ,

lim_{t→∞} ut(λ)/ϕ(t) = G(λ),

and for any nonnegative s,

lim_{t→∞} ϕ(t + s)/ϕ(t) = e^{−ρs}.

When ρ > 0, the following statements are equivalent:
(i) G′(0+) < ∞;
(ii) ∫^∞ r log r Λ(dr) < +∞;
(iii) there is a positive constant c such that ϕ(t) ∼ c exp(−ρt), as t → ∞.
In that case, G′(0+) = c^{−1}.


3.6.1 Quasi-stationary distributions

Recall that a QSD ν is defined by (3.1), and that the extinction time T under Pν has an exponential distribution with parameter, say, γ. Then γ can be seen as the constant rate of mass decay of (0, ∞) under Pν. It is a natural question to characterize all the quasi-stationary probabilities associated to a mass decay rate γ.

Theorem 3.6.2 Assume ρ > 0 (subcritical case). For any γ ∈ (0, ρ] there is a unique QSD νγ associated to the mass decay rate γ. It is characterized by its Laplace transform

∫_{(0,∞)} νγ(dr) e^{−λr} = 1 − e^{−γφ(λ)}, λ ≥ 0.

There is no QSD associated to γ > ρ.

In addition, the minimal QSD νρ is the so-called Yaglom distribution, in the sense that for any starting point x ≥ 0, and any Borel set A,

lim_{t→∞} Px(Zt ∈ A | T > t) = νρ(A).

From now on, we will denote by Υ the r.v. with distribution νρ. Since the Laplace transform of Υ is 1 − G, Υ is integrable iff ∫^∞ r log r Λ(dr) < ∞, and if this holds then

E(Υ) = c^{−1},

where c is defined in Lemma 3.6.1.

Proof. There are multiple ways of proving this theorem. The most straightforward way is the following:

1 − e^{−γt} = Pνγ(T < t) = ∫_{(0,∞)} νγ(dr) e^{−rϕ(t)},

so that, writing t = φ(λ), one gets

1 − e^{−γφ(λ)} = ∫_{(0,∞)} νγ(dr) e^{−λr}, λ ≥ 0.

Another way of getting this consists in proving that νγQ = −γνγ + γδ0, where Q is the infinitesimal generator of the Feller process Z and δ0 is the Dirac measure at 0. Taking Laplace transforms then leads to the differential equation

γ(1 − χγ(λ)) = −ψ(λ)χ′γ(λ), λ ≥ 0,

where χγ stands for the Laplace transform of νγ. Solving this equation with the boundary condition χγ(0) = 1 yields the same result as given above.

Next recall that φ(λ) = ∫_λ^∞ du/ψ(u), so that φ′(λ) ∼ −1/(ρλ) and φ(λ) ∼ −ρ^{−1} log λ, as λ ↓ 0. This entails

∫_{(0,∞)} rνγ(dr) e^{−λr} ∼ C(λ)λ^{γ/ρ−1} as λ ↓ 0,

where C is slowly varying at 0+, which would yield a contradiction if γ > ρ.


Before proving that 1 − G^{γ/ρ} is indeed a Laplace transform, we display the Yaglom distribution of Z. Observe that

Ex(1 − e^{−λZt} | T > t) = Ex(1 − e^{−λZt})/Px(T > t) = (1 − e^{−xut(λ)})/(1 − e^{−xϕ(t)}),

so that, by Lemma 3.6.1,

lim_{t→∞} Ex(e^{−λZt} | T > t) = 1 − G(λ), λ > 0.

Since G(0+) = 0, this proves indeed that 1 − G is the Laplace transform of some probability measure νρ on (0, ∞). It just remains to show that when γ ∈ (0, ρ), 1 − G^{γ/ρ} is indeed the Laplace transform of some probability measure νγ on (0, ∞). Actually this stems from the following result applied to g = G and α = γ/ρ: if 1 − g is the Laplace transform of some probability measure on (0, ∞), then so is 1 − g^α, for any α ∈ (0, 1). □

It is not difficult to get a result similar to the last theorem in the critical case. Assume ρ = 0 and σ := ψ′′(0+) < +∞ (Z has second-order moments). Variations on the arguments of the proof of Lemma 3.6.1 then show that ϕ(t) ∼ 2/(σt) as t → ∞, and

lim_{t→∞} ut(λ/t)/ϕ(t) = 1/(1 + 2/(σλ)), λ > 0.

Since

Ex(1 − e^{−λZt/t} | T > t) = (1 − e^{−xut(λ/t)})/(1 − e^{−xϕ(t)}), λ > 0,

the following statement follows, which displays the usual 'universal' exponential limiting distribution of the rescaled conditioned critical process.

Theorem 3.6.3 Assume ρ = 0 and σ := ψ′′(0+) < +∞. Then

lim_{t→∞} Px(Zt/t > z | T > t) = exp(−2z/σ), z ≥ 0.

3.6.2 The Q-process

The next theorem states the existence, in some special sense, of the branching process conditioned to be never extinct, or Q-process.

Theorem 3.6.4 Let x > 0.

(i) The conditional laws Px(· | T > t) converge as t → ∞ to a limit denoted by P↑x, in the sense that for any t ≥ 0 and Θ ∈ Ft,

lim_{s→∞} Px(Θ | T > s) = P↑x(Θ).

(ii) The probability measures P↑ can be expressed as h-transforms of P based on the (P, (Ft))-martingale

Dt = Zt e^{ρt},

that is,

dP↑x|Ft = (Dt/x) · dPx|Ft.

(iii) The process Z↑ which has law P↑x is a CBI(ψ, χ) started at x, where χ is (the Laplace transform of a subordinator) defined by

χ(λ) = ψ′(λ) − ψ′(0+), λ ≥ 0.

Now we investigate the asymptotic properties of this Q-process Z↑. The symbol P↑ denotes the law of the Q-process, whereas P ↑ is that of the Lévy process conditioned to stay positive (see Appendix).

Also recall that the Yaglom r.v. Υ displayed in Theorem 3.6.2 is integrable as soon as ∫^∞ r log r Λ(dr) < ∞.

Theorem 3.6.5 (i) (Lamperti transform) If ρ = 0, then

lim_{t→∞} Z↑t = +∞ a.s.

Moreover, set

θt = ∫_0^t Z↑s ds, t ≥ 0,

and let κ be its right inverse; then for x > 0, the process Z↑ ◦ κ under Px has law P ↑x.

(ii) If ρ > 0, the following dichotomy holds.

(a) If ∫^∞ r log r Λ(dr) = ∞, then

lim_{t→∞} Z↑t = +∞ in probability.

(b) If ∫^∞ r log r Λ(dr) < ∞, then Z↑t converges in distribution as t → ∞ to a positive r.v. Z↑∞ which has the size-biased Yaglom distribution

P(Z↑∞ ∈ dr) = rP(Υ ∈ dr)/E(Υ), r > 0.

3.7 Diffusions

3.7.1 Fisher–Wright diffusion

Recall the Fisher–Wright diffusion Y, which is the scaling limit of the Moran model:

dYt = √(Yt(1 − Yt)) dBt.

Works of M. Kimura ([74], but see also [43, Chapter 5] and the references therein) are at the origin of the following theorem.


Theorem 3.7.1 The Yaglom limit Υ of the Fisher–Wright diffusion exists and is uniform on the open interval (0, 1).

Its Q-process Y↑ can be obtained via the following h-transform:

Px(Y↑t ∈ dy) := lim_{s→∞} Px(Yt ∈ dy | Yt+s ∉ {0, 1}) = e^t (y(1 − y)/(x(1 − x))) Px(Yt ∈ dy).

In addition, the Q-process converges in distribution to Y↑∞, given by

P(Y↑∞ ∈ dy) = 6y(1 − y) dy, y ∈ (0, 1).
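The theorem can be checked with a small finite-difference computation: discretizing the killed generator (1/2)y(1 − y)d²/dy² on a grid with Dirichlet conditions at {0, 1}, the uniform density is a left eigenvector and y(1 − y) a right eigenvector, both for the dominant eigenvalue −1 (and exactly so, since the coefficients involved are quadratic and the discrete second difference of a quadratic is exact). The grid size below is arbitrary.

```python
import numpy as np

# Discretized killed Fisher-Wright generator A f ≈ (1/2) y(1-y) f'' on a grid
# of M-1 interior points, with killing (Dirichlet) boundary at y = 0 and 1.
M = 200
h = 1.0 / M
y = np.arange(1, M) * h                       # interior grid points
D2 = (np.diag(-2.0 * np.ones(M - 1))
      + np.diag(np.ones(M - 2), 1)
      + np.diag(np.ones(M - 2), -1)) / h**2   # second-difference operator
A = np.diag(0.5 * y * (1 - y)) @ D2

v = y * (1 - y)                               # right eigenvector (h-function)
u = np.ones(M - 1)                            # left eigenvector (uniform Yaglom limit)
assert np.allclose(A @ v, -v)
assert np.allclose(u @ A, -u)
# -1 is the dominant eigenvalue of the discretized generator.
assert np.isclose(np.linalg.eigvals(A).real.max(), -1.0)
```

Normalizing the right eigenvector against the uniform left one reproduces the limiting density 6y(1 − y) of Y↑∞.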

3.7.2 CB-diffusion

In this subsection, we briefly translate our results to the case when the CB-process is a diffusion. Recall that Z is absorbed with positive probability, and is the solution to

dZt = rZt dt + √(σZt) dBt, t > 0,

where B is a standard Brownian motion. Recall ρ = ψ′(0+) = −r, as well as the expressions given for φ and ϕ in Subsection 2.3.2.

The quasi-stationary distributions

Here we assume that r < 0 (subcritical case), so that ρ = −r > 0. Then from Theorem 3.6.2, for any γ ∈ (0, ρ], the Laplace transform of the QSD νγ is

∫_0^∞ νγ(dx) e^{−λx} = 1 − (λ/(λ + 2ρ/σ))^{γ/ρ}.

In particular, whenever γ < ρ, νγ has infinite expectation, and it takes only elementary calculations to check that for any γ < ρ, νγ has a density fγ given by

fγ(t) = (2ρ/σ)/(Γ(1 − γ/ρ)Γ(γ/ρ)) ∫_0^1 ds s^{γ/ρ}(1 − s)^{−γ/ρ} e^{−2ρts/σ}, t > 0.

This can also be expressed as

νγ((t, ∞)) = E(exp(−2ρBt/σ)), t > 0,

where B is a random variable with law Beta(γ/ρ, 1 − γ/ρ).

Finally, for γ = ρ, the Laplace transform is easier to invert and provides the Yaglom distribution. The Yaglom r.v. Υ with distribution νρ is an exponential variable with parameter 2ρ/σ:

P(Υ ∈ dx) = (2ρ/σ)e^{−2ρx/σ} dx, x ≥ 0.
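This exponential Yaglom limit can be verified numerically from first principles: the cumulant ut(λ) solves the Riccati equation ∂t u = −ψ(u) with ψ(u) = σu²/2 + ρu, which linearizes under w = 1/u, so ut(λ) is explicit; the conditional Laplace transform can then be compared with that of the exponential law with parameter 2ρ/σ. The parameter values below are illustrative.

```python
import numpy as np

# Subcritical Feller diffusion: psi(u) = sigma*u^2/2 + rho*u (illustrative values).
rho, sigma, x, t = 1.0, 2.0, 3.0, 15.0

def u_t(lam):
    # u_t solves du/dt = -psi(u), u_0 = lam; w = 1/u satisfies the linear ODE
    # w' = rho*w + sigma/2, whence the explicit formula below.
    return 1.0 / ((1.0 / lam + sigma / (2 * rho)) * np.exp(rho * t)
                  - sigma / (2 * rho))

phi = 1.0 / ((sigma / (2 * rho)) * (np.exp(rho * t) - 1.0))   # phi(t) = u_t(+inf)

for lam in (0.3, 1.0, 4.0):
    # E_x(e^{-lam Z_t} | T > t) = 1 - (1 - e^{-x u_t(lam)})/(1 - e^{-x phi(t)})
    cond_LT = 1 - (1 - np.exp(-x * u_t(lam))) / (1 - np.exp(-x * phi))
    yaglom_LT = (2 * rho / sigma) / (lam + 2 * rho / sigma)   # Exp(2 rho/sigma)
    assert abs(cond_LT - yaglom_LT) < 1e-4
```

At t = 15 the finite-time error is already of order e^{−ρt}, far below the tolerance, and the limit does not depend on the starting point x, as Theorem 3.6.2 asserts.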


The Q-process

Here we assume that r ≤ 0. From Theorem 3.6.4, the Q-process is a CBI-process with branching mechanism ψ and immigration mechanism χ = ψ′ − ρ. Now recall that the infinitesimal generator B of a CBI(ψ, χ) acts on the exponential functions eλ(x) := exp(−λx) as

Beλ(x) = (xψ(λ) − χ(λ)) eλ(x),   x ≥ 0.

In the present case, ψ(λ) = σλ²/2 − rλ and χ(λ) = σλ, so that

Beλ(x) = (xσλ²/2 − xrλ − σλ) eλ(x),   x ≥ 0,

which yields, for any twice differentiable function f,

Af(x) = (σ/2) x f″(x) + rx f′(x) + σ f′(x),   x ≥ 0,

and this can equivalently be read as

dZ↑t = rZ↑t dt + √(σZ↑t) dBt + σ dt,

where Z↑ stands for the Q-process. Note that the immigration can readily be seen in the additional deterministic term σ dt.

Now if r < 0, according to Theorem 3.6.5, the Q-process converges in distribution to the r.v. Z↑∞, which is the size-biased Υ. But Υ is an exponential r.v. with parameter 2ρ/σ, so that

P(Z↑∞ ∈ dx) = (2ρ/σ)² x e^{−2ρx/σ} dx,   x ≥ 0,

or equivalently,

Z↑∞ (d)= Υ1 + Υ2,

where Υ1 and Υ2 are two independent copies of the Yaglom r.v. Υ.
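The identity above simply says that size-biasing an exponential law gives a Gamma(2) law, whose tail is P(Z↑∞ > x) = (1 + 2ρx/σ) e^{−2ρx/σ}. A quick seeded Monte Carlo check (parameter values are arbitrary):

```python
import random, math

rng = random.Random(0)
rho, sigma = 1.0, 2.0
scale = sigma / (2 * rho)        # Υ ~ Exp(2ρ/σ), mean σ/2ρ = 1.0

n = 100000
sums = [rng.expovariate(1 / scale) + rng.expovariate(1 / scale) for _ in range(n)]
mean = sum(sums) / n             # should be 2·σ/2ρ = σ/ρ = 2.0

# tail of the size-biased exponential: P(Z↑∞ > x) = (1 + 2ρx/σ) e^{-2ρx/σ}
x = 1.5
tail_hat = sum(s > x for s in sums) / n
tail_theory = (1 + 2 * rho * x / sigma) * math.exp(-2 * rho * x / sigma)
```

The empirical tail of the sum of two independent copies of Υ matches the Gamma(2) tail of the size-biased law.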

3.7.3 More general diffusions

Let (ZN)N be a sequence of birth–death processes on {0, N⁻¹, 2N⁻¹, ...}, with birth and death rates from state x respectively equal to λN(x) and µN(x). More visually,

x → x + 1/N at rate λN(x)
x → x − 1/N at rate µN(x),

where we assume that λN(0) = µN(0) = 0, ensuring that the state 0 is absorbing. Let us further assume that supN λN(x) ≤ B(x), x ≥ 0, where B is at most linear, and that there exist a nonnegative constant γ and a function h of class C¹ on [0,+∞) such that h(0) = 0, and for any x ∈ [0,+∞),

lim_{N→∞} (1/N)(λN(x) − µN(x)) = h(x)   and   lim_{N→∞} (1/(2N²))(λN(x) + µN(x)) = γx. (3.3)


If γ = 0, the asymptotic behaviour of ZN (as N gets large) is close to that of the dynamical system ż = h(z). If γ is strictly positive, the sequence (ZN)N converges in law to a generalized Feller diffusion, defined as the solution to the stochastic differential equation

dZt = √(γZt) dBt + h(Zt) dt. (3.4)

In particular, this is the case if λN(x) = (γN² + λN)x and µN(x) = (γN² + µN)x + cNx². The limiting diffusion is then the Feller diffusion with logistic growth [38, 81], already defined in the last chapter, that is, h(z) = rz − cz², where we have set r = λ − µ.
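The scaling can be illustrated with a seeded Gillespie simulation of the chain, with rates chosen (our choice) so that the drift limit is the logistic h(z) = rz − cz²: for moderate N, the empirical mean at time t stays close to the solution of ż = h(z). All parameter values below (N, γ, λ, µ, c, horizon, seed) are arbitrary:

```python
import random, math

rng = random.Random(99)
N, gamma, lam, mu, c = 100, 0.1, 2.0, 1.0, 1.0   # r = lam - mu = 1
r = lam - mu

def one_path(t_end=1.0, z0=0.5):
    """Gillespie simulation of the birth-death chain on {0, 1/N, 2/N, ...}."""
    x, t = z0, 0.0
    while True:
        if x < 1e-9:
            return 0.0                     # absorbed at 0
        birth = (gamma * N**2 + lam * N) * x
        death = (gamma * N**2 + mu * N) * x + c * N * x**2
        total = birth + death
        t += rng.expovariate(total)        # time to next event
        if t > t_end:
            return x
        if rng.random() < birth / total:
            x += 1.0 / N
        else:
            x -= 1.0 / N

reps = 400
mean_end = sum(one_path() for _ in range(reps)) / reps
# logistic ODE z' = r z - c z^2, z(0) = 1/2, evaluated at t = 1:
z_ode = r * 0.5 * math.e / (r + c * 0.5 * (math.e - 1))   # about 0.731
```

The empirical mean sits slightly below the ODE value because of the concavity of the logistic drift and the possibility of early absorption, an effect that vanishes as γ → 0.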

The goal of [21] is to display sufficient conditions on h for the existence of QSDs. Thus, we make the following additional assumptions on h (recall that h is of class C¹ on [0,+∞) and h(0) = 0), denoted Assumption (HH):

(i) lim_{x→∞} h(x)/√x = −∞,   (ii) lim_{x→∞} x h′(x)/h(x)² = 0.

Notice that Assumption (ii) holds for most classical functions (polynomial, exponential, logarithmic, ...). In particular, (HH) holds for any subcritical branching diffusion (h(x) = −rx, with r > 0) and any logistic branching diffusion (h(x) = rx − cx²). Assumption (i) ensures at least that the process is absorbed at 0 with probability 1. However, we can modify it so as to have a limit equal to +∞ instead of −∞, provided the process is further conditioned on extinction. Indeed, the following statement ensures that conditioning on extinction a diffusion satisfying the last assumption roughly amounts to replacing h with −h.

Proposition 3.7.2 Assume that Z is given by (3.4), where h satisfies (HH)(ii), along with

lim_{x→∞} h(x) = +∞.

Let u(x) := Px(Ext), and X the diffusion Z conditioned on extinction. Then X is given by

dXt = √(γXt) dBt + (h(Xt) + γXt u′(Xt)/u(Xt)) dt.

In addition,

h(x) + γx u′(x)/u(x) ∼ −h(x)   as x → ∞.

Remark 3.8 Recall from Proposition 2.3.6 that when h(x) = rx (pure branching diffusion), the conditioning on extinction exactly turns h into −h.

Proof. Recall from Subsection 2.2.1 that the generator L* of X is given by

L*f(x) = (1/u(x)) L(uf)(x),   x ≥ 0,

where L is the generator of Z. Because Lf(x) = (γx/2)f″(x) + h(x)f′(x) and Lu = 0, it is easy to get

L*f(x) = (γ/2) x f″(x) + (h(x) + γx u′(x)/u(x)) f′(x),   x ≥ 0,

which proves the first part of the proposition. The second part relies on technical tricks using Lu = 0 and Assumption (HH)(ii). □


Theorem 3.7.3 (Cattiaux et al. [21]) Let Z be the diffusion given by (3.4) and assume (HH). Then for all initial laws with bounded support, the law of Zt conditioned on {Zt ≠ 0} converges exponentially fast to a probability measure ν, called the Yaglom limit.

The law of the process Z started from x and conditioned to be never extinct exists (Q-process) and converges in distribution, as t → ∞, to its unique invariant probability measure. This probability measure is absolutely continuous w.r.t. ν with a nondecreasing Radon–Nikodym derivative.

If in addition ∫^∞ 1/h > −∞, then Z comes down from infinity, and the convergence of the conditional one-dimensional distributions holds for all initial laws, so that the Yaglom limit ν is the unique quasi-stationary distribution.

Compare the ‘uniqueness’ part of the statement with Theorem 3.4.1.


Chapter 4

Random trees and Ray–Knight type theorems

In this chapter, we start by introducing splitting trees, namely those random trees where individuals give birth at constant rate, during a lifetime with general distribution, to i.i.d. copies of themselves. We show that the contour process of such trees is a Lévy process, which allows us to derive new properties of splitting trees, including the coalescent point process, and to recover well-known connections between Lévy processes and branching processes. These connections open up to more general problems, such as Ray–Knight type theorems.

Unless otherwise specified, the results mentioned in this chapter are to be found in [84, 85].

4.1 Preliminaries on trees

4.1.1 Discrete trees

We consider (locally finite) rooted trees [33, 93]. Let U be the set of finite sequences of integers.

A discrete tree T is a subset of U such that each vertex of T is represented according to the so-called Ulam–Harris–Neveu labelling, as follows. The root of the tree is ∅, and the j-th child of u = (u1, ..., un) ∈ ℕⁿ is uj, where vw stands for the concatenation of the sequences v and w. Then we define |u| = n, the generation, or genealogical height or depth, of u.

In addition, we write u ≺ v if u is an ancestor of v, that is, if there is a sequence w such that v = uw. For any u = (u1, ..., un), u|k denotes the ancestor (u1, ..., uk) of u at generation k. Finally, we denote by u ∧ v the most recent common ancestor, in short mrca, of u and v, that is, the longest sequence w such that w ≺ u and w ≺ v.
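These labelling conventions are easy to manipulate on integer tuples; the following minimal sketch (the function names are ours, not from the text) implements |u|, u ≺ v, u|k and u ∧ v:

```python
# Ulam-Harris-Neveu labels as tuples of integers; the root is the empty tuple ().

def generation(u):
    """|u|: generation (depth) of the vertex u."""
    return len(u)

def is_ancestor(u, v):
    """u ≺ v: u is an ancestor of v iff v = uw for some (possibly empty) sequence w."""
    return len(u) <= len(v) and v[:len(u)] == u

def restrict(u, k):
    """u|k: the ancestor of u at generation k."""
    return u[:k]

def mrca(u, v):
    """u ∧ v: the longest common prefix of u and v."""
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return u[:k]

# e.g. the mrca of (1, 2, 1) and (1, 2, 3) is (1, 2), at generation 2
```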

4.1.2 Chronological trees

We consider particular instances of real trees, as defined e.g. in [40, 41, 87]. The real trees we consider here can roughly be seen as the set of edges, embedded in the plane, of some discrete tree, where each edge length is a lifespan.

Specifically, each individual of the underlying discrete tree possesses a birth level α and a death level ω, both nonnegative real numbers such that α < ω, and (possibly zero) offspring whose birth times are distinct from one another and belong to the interval (α, ω). We think


of a chronological tree as the set of all so-called existence points of individuals (vertices) of the discrete tree. See Fig. 4.1 and 4.2 for graphical representations of a chronological tree.

Definition. More rigorously, let

𝕌 = U × [0,+∞),

and let p1 and p2 stand respectively for the canonical projections on U and [0,+∞).

A chronological tree 𝕋 is a subset of 𝕌 such that T := p1(𝕋) is a discrete tree and, for any u ∈ T, there are 0 ≤ α(u) < ω(u) ≤ ∞ such that (u, σ) ∈ 𝕋 if and only if σ ∈ (α(u), ω(u)]. We write ρ := (∅, 0) for the root of 𝕋.

For any u ∈ T, α(u) is the birth level of u, ω(u) her death level, and we denote by ζ(u) her lifespan, ζ(u) := ω(u) − α(u). We will always assume that α(∅) = 0. Also, for any u ∈ T and j ∈ ℕ such that uj ∈ T, the birth time α(uj) of u's daughter uj is in (α(u), ω(u)). Points of the type (u, α(uj)) are called branching points.

Because the construction is rather obvious, and for the sake of conciseness, we do not mention all other requirements needed to properly define a chronological tree.

The number of individuals alive at the chronological level τ is denoted by Ξτ:

Ξτ = Card{v ∈ T : α(v) < τ ≤ ω(v)} = Card{x ∈ T : p2(x) = τ},

and (Ξτ; τ ≥ 0) is usually called the width process.


Figure 4.1: A representation of a chronological tree, showing the lifetime of the root ∅ of the associated discrete tree, and that of an individual u. The root of the chronological tree is ρ, the birth time of u is α(u) and its death time ω(u).

Genealogical structure. A chronological tree can naturally be equipped with the following genealogical structure. For any x, y ∈ T such that x = (u, σ) and y = (v, τ), we will say that x is an ancestor of y, and write x ≺ y as for discrete trees, if u ≺ v and

Page 80: Population Dynamics and Random Genealogies1uctuates randomly (processes of Bienaym e{Galton{Watson, Jirina; birth{death processes, logistic branching process). Scaling limits of these

CHAPTER 4. SPLITTING TREES 79

• if u = v, then σ ≤ τ;

• if u ≠ v, then σ ≤ α(uj), where j is the unique integer such that uj ≺ v.

For y = (v, τ), the segment [ρ, y] is the set of ancestors of y, that is,

[ρ, y] := {(v, σ) : α(v) < σ ≤ τ} ∪ {(u, σ) : ∃k, u = v|k, α(v|k) < σ ≤ α(v|k+1)}.

For any x, y ∈ T, it is not difficult to see that there is a unique z ∈ T such that [ρ, x] ∩ [ρ, y] = [ρ, z]. The existence point z is the point of highest level in T such that z ≺ x and z ≺ y. In particular, notice that p1(z) = p1(x) ∧ p1(y) (i.e. p1(z) is the mrca of p1(x) and p1(y)). The level p2(z) is called the coalescence level of x and y, and z the coalescence point (or most recent common ancestor) of x and y, denoted as for discrete trees by z = x ∧ y.

Total order ‘≤’. There is a total order ≤ on T, and for any x, y ∈ T, we say that x is to the left-hand side of y (and y is to the right-hand side of x) if x ≤ y. This order is defined as follows:

• if y ≺ x, then x ≤ y;

• if neither x ≺ y nor y ≺ x, then x ∧ y = (u, σ) is a branching point and there is an integer j such that σ = α(uj). Notice that u ≺ p1(x) and u ≺ p1(y), but either uj ≺ p1(x) or uj ≺ p1(y). If uj ≺ p1(y), then x ≤ y; otherwise y ≤ x.

It is important to notice that if T is not reduced to the root, then for any x ∈ T,

(∅, ω(∅)) ≤ x ≤ (∅, α(∅)) = ρ.

Miscellaneous. We define the total length λ(T) of the tree as the sum of all lifespans:

λ(T) = Σ_{u∈T} ζ(u) ≤ ∞.

Actually, the Borel σ-field of U can be defined as the σ-field generated by sets of the type {u} × A, for u ∈ U and A any Borel set of [0,∞]. Then λ can be seen as the Lebesgue measure on T, and for any Borel subset S of T, it makes sense to define its length λ(S). We will abusively say that S is finite if it has finite length and finite discrete part p1(S).

4.2 The exploration process

4.2.1 Definition

Real trees are usually defined as abstract metric spaces with specific properties (such as the so-called four-point condition), see [41]. Most examples of real trees are constructed thanks to a real function coding the genealogy, called its contour function (see [32, 42]). Here we do the opposite, that is, we define the contour of the tree after having defined the tree on a ‘concrete’ space.

Hereafter, T denotes a finite chronological tree, with total length ℓ = λ(T).



Figure 4.2: Illustration of a chronological tree. We have y ≺ x and x ≤ y ≤ z. The heights (generations in the discrete tree) of the points x, y, z are respectively 3, 0, 2.

Definition 4.2.1 For any x ∈ T, the set

S(x) := {y ∈ T : y ≤ x}

is measurable, so we can define the mapping ϕ from T to the real interval [0, ℓ] as

ϕ(x) := λ(S(x)).

Then in particular, ϕ(∅, ω(∅)) = 0 and ϕ(ρ) = ℓ. The process (ϕ⁻¹(t); t ∈ [0, ℓ]) is called the exploration process. Its second projection will be denoted by (Xt; t ∈ [0, ℓ]) and called the JCCP, standing for jumping chronological contour process.

The JCCP is a càdlàg function taking the values of all levels of all points in T, once and once only, starting at the death level of the progenitor and following this rule: when the visit of an individual v with lifespan (α(v), ω(v)] begins, the value of the JCCP is ω(v). The JCCP then visits lower chronological levels of v's lifespan at linear speed −1. If v has no child, then this visit lasts exactly the lifespan ζ(v) of v; if v has at least one child, then the visit is interrupted each time a birth level of one of v's daughters, say w, is encountered (youngest child first, since the visit started at the death level). At this point, the JCCP jumps from α(w) to ω(w) and starts the visit of the existence levels of w. Since the tree is finite, the visit of v has to terminate: it does so at the chronological level α(v), and the JCCP then continues the exploration of the existence levels of v's mother, at the level where it had been interrupted. This procedure goes on recursively until level 0 is encountered (0 = α(∅) = birth level of the root).

Fig. 4.3 shows a chronological tree and the associated JCCP.
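The verbal exploration rule above translates into a short recursive routine. A minimal sketch, assuming the tree is encoded as a dictionary from Ulam–Harris tuples to (birth level, death level) pairs (an encoding of our choosing):

```python
def jccp(tree):
    """Compute the JCCP of a finite chronological tree.

    tree maps Ulam-Harris tuples to (alpha, omega). Returns the contour as a
    list of (jump time, level jumped to) breakpoints: between breakpoints the
    process decreases at speed -1. Also returns the total exploration time,
    which equals the total length lambda(T) of the tree.
    """
    path = []
    t = 0.0

    def visit(v):
        nonlocal t
        alpha, omega = tree[v]
        path.append((t, omega))           # the visit of v starts at her death level
        # daughters of v, visited youngest (highest birth level) first
        daughters = sorted((w for w in tree if len(w) == len(v) + 1 and w[:len(v)] == v),
                           key=lambda w: tree[w][0], reverse=True)
        level = omega
        for w in daughters:
            t += level - tree[w][0]       # linear descent to the daughter's birth level
            visit(w)
            level = tree[w][0]
        t += level - alpha                # descent down to v's own birth level

    visit(())
    return path, t

# Toy example: root lives on (0, 3]; daughters (1,) on (2.0, 2.6] and (2,) on
# (0.8, 1.4]; granddaughter (2, 1) on (1.0, 1.2].
T = {(): (0.0, 3.0), (1,): (2.0, 2.6), (2,): (0.8, 1.4), (2, 1): (1.0, 1.2)}
path, length = jccp(T)
```

On this toy tree, the contour jumps at times 0, 1.0, 2.8 and 3.2, each jump size being the lifespan of the individual whose visit begins, and the total exploration time is 4.4, the sum of all lifespans.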



Figure 4.3: A chronological tree (top) and the associated jumping chronological contour process (JCCP), with jumps in full line (bottom).

Remark 4.1 The JCCP has another interpretation [88, Fig. 1, p. 230] in terms of queues. Each jump ∆t is interpreted as a customer of a one-server queue, arrived at time t with a load ∆t. This server treats the customers' loads at constant speed 1 and has priority LIFO (last in – first out). The tree structure is derived from the following rule: each customer is the mother of all customers who interrupted her while she was being served. Then the value Xt of the JCCP is the remaining load in the server at time t.
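This queueing rule can be tested directly: given the arrival times and loads read off a JCCP, a LIFO stack simulation recovers each customer's mother. A sketch under our own encoding assumptions (the first customer plays the role of the progenitor):

```python
def lifo_mothers(arrivals):
    """arrivals: list of (time, load) pairs sorted by time.

    Returns mother[i] = index of the customer being served when customer i
    arrives (None for the first customer, the progenitor).
    """
    mother = [None] * len(arrivals)
    stack = []                          # indices of interrupted customers, LIFO
    rem = [load for _, load in arrivals]
    clock = arrivals[0][0]
    for i, (t, _) in enumerate(arrivals):
        dt = t - clock                  # serve the stack at unit speed until t
        while stack and dt > 0:
            j = stack[-1]
            served = min(rem[j], dt)
            rem[j] -= served
            dt -= served
            if rem[j] <= 1e-12:
                stack.pop()             # customer j has been fully served
        if stack:
            mother[i] = stack[-1]       # customer i interrupts the one on top
        stack.append(i)
        clock = t
    return mother

# Toy example: loads are lifespans; customer 0 is the progenitor.
arr = [(0.0, 3.0), (1.0, 0.6), (2.8, 0.6), (3.2, 0.2)]
mothers = lifo_mothers(arr)
```

For this toy input, customers 1 and 2 interrupt the progenitor, while customer 3 interrupts customer 2, reproducing the genealogy of the corresponding chronological tree.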

4.2.2 Properties of the JCCP

Actually, the chronological tree itself can be recovered from the JCCP, modulo the labelling of siblings. In the next statement, we give useful applications of this correspondence.

For each t ∈ [0, ℓ], set

t̲ := sup{s ≤ t : Xs < Xt} ∨ 0.

Theorem 4.2.2 Let x = (u, σ) and y = (v, τ) denote any two points in T, and set s = ϕ(x) and t = ϕ(y). Then the following hold.

(i) The first visit to v is t̲:

ϕ(v, ω(v)) = t̲.

If t is a jump time of X, then t̲ = t, and the first visit to the mother u of v in T is given by

ϕ(u, ω(u)) = sup{s ≤ t : Xs < Xt−}.


(ii) Ancestry between x and y:

y ≺ x ⇔ t̲ ≤ s ≤ t.

(iii) Coalescence level between x and y (assume e.g. s ≤ t):

p2(x ∧ y) = inf_{s≤r≤t} Xr.

For any t ∈ [0, ℓ], we define the process X(t) on [0, t] as

X(t)r := Xt− − X(t−r)−,   r ∈ [0, t],

with the convention that X0− = 0. The following corollary states that the record times of X(t) are the first visits to v's ancestors, and displays the process (Ht; t ≥ 0) of genealogical heights, or height process, as a functional of the path of the JCCP.

Corollary 4.2.3 For v ∈ T, set n = |v| and tk the first visit to vk = v|k (the ancestor of v belonging to generation k), that is,

tk := ϕ(vk, ω(vk)),   0 ≤ k ≤ n.

(i) Define recursively the record times of X(t) by s1 = t − t̲ and

sk+1 = inf{s ≥ sk : X(t)s > X(t)sk},   1 ≤ k ≤ n.

Then

tk = t − sn−k+1,   0 ≤ k ≤ n.

(ii) Define Ht as the genealogical height of ϕ⁻¹(t) in T. Then Ht is given by

Ht := |p1(ϕ⁻¹(t))| = Card{0 ≤ s ≤ t : X(t)s = sup_{0≤r≤s} X(t)r} = Card{0 ≤ s ≤ t : Xs− < inf_{s≤r≤t} Xr}. (4.1)

Quantities defined in the previous two statements are represented in Fig. 4.4.

4.3 Splitting trees

In this section, we consider random chronological trees, called splitting trees, corresponding to (binary homogeneous) Crump–Mode–Jagers processes.

4.3.1 Definition

The Crump–Mode–Jagers process, or CMJ process, denotes the general, non-Markovian branching process. The random genealogy associated to it satisfies very loose rules, which can be described as follows.



Figure 4.4: Illustration of a JCCP with jumps in full line. Set u = p1 ◦ ϕ⁻¹(t) to be the individual visited at time t. The first visit t̲ of u is shown. The first visits to the Ht′ = 3 ancestors of u′ are t′1, t′2, t′3.

• given her lifetime (α, ω], each individual gives birth at times α + σ1 < α + σ2 < ··· < ω, to clutches of respective sizes ξ1, ξ2, ...;

• the law of this reproduction scheme only depends on the joint distribution of the lifespan ζ := ω − α and of the point process (σi, ξi) on (0, ζ) × ℕ;

• the reproduction schemes of all individuals are i.i.d.

However, in what follows, we will always make the following additional assumptions:

1. all clutch sizes (the ξi's) are a.s. equal to 1 (binary splitting);

2. conditional on the lifespan ζ, the point process (σi) is a Poisson point process on (0, ζ) with intensity b (homogeneous reproduction scheme);

3. the common distribution of lifespans is Λ(·)/b, where Λ is some positive measure on (0,∞] with mass b.

Trees satisfying the previous assumptions fit in the framework given in the preliminaries on trees, so they are random chronological trees. They could also be called general binary trees with constant birth rate, or homogeneous binary Crump–Mode–Jagers trees, but we will prefer the terminology ‘splitting trees’ from the literature [50]. On the other hand, this terminology is unfortunate because it evokes binary fission.

We assume further that the tree starts with one ancestor individual ∅, called the progenitor, with deterministic lifetime (0, χ].

Observe that for any v ∈ T, in agreement with the definition, the pairs (α(vi), ζ(vi))i≥1 made of the birth levels and lifespans of v's offspring are the atoms of a Poisson measure on (α(v), ω(v)) × (0,+∞) with intensity measure Leb ⊗ Λ, where Leb stands for the Lebesgue measure. In words, each individual gives birth at rate b during her lifetime (α, ω), to independent


copies of herself whose lifespans have common distribution Λ(·)/b. In particular, the total offspring number of v, conditional on ζ(v) = z, is a Poisson random variable with parameter bz.

Two branching processes. Recall that Ξτ denotes the number of individuals alive at level τ (the width of T at level τ):

Ξτ = Card{v ∈ T : α(v) < τ ≤ ω(v)},

and set Zn the number of individuals of generation n:

Zn = Card{v ∈ T : |v| = n}.

From the definition, it is easy to see that (Zn; n ≥ 0) is a BGW process started at 1, with offspring generating function f:

f(s) = ∫_{(0,∞)} b⁻¹Λ(dz) e^{−bz(1−s)},   s ∈ [0, 1],

so the mean number of offspring per individual is

m := ∫_{(0,∞)} zΛ(dz).

As for the width process (Ξτ; τ ≥ 0), it is a homogeneous binary Crump–Mode–Jagers process. Unless Λ is exponential, this branching process is not Markovian.
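Everything is explicit for exponential lifespans: taking Λ(dz) = b² e^{−bz} dz (which has mass b, and m = 1, so the critical case), the integral defining f reduces to f(s) = 1/(2 − s). A quick numerical sanity check of the two displays above (the quadrature grid is an arbitrary choice):

```python
import math

b = 2.0

def Lam_density(z):
    """Density of the lifespan measure Λ(dz) = b² e^{-bz} dz (total mass b)."""
    return b * b * math.exp(-b * z)

def f(s, n=100000, zmax=20.0):
    """Offspring g.f. f(s) = ∫ b⁻¹ Λ(dz) e^{-bz(1-s)}, by the midpoint rule."""
    h = zmax / n
    return sum(Lam_density((k + 0.5) * h) / b * math.exp(-b * (k + 0.5) * h * (1 - s)) * h
               for k in range(n))

def m(n=100000, zmax=20.0):
    """Mean offspring number m = ∫ z Λ(dz), by the midpoint rule."""
    h = zmax / n
    return sum((k + 0.5) * h * Lam_density((k + 0.5) * h) * h for k in range(n))
```

With these lifespans, f(s) = 1/(2 − s) is the generating function of a geometric(1/2) offspring law, and m = 1 confirms criticality.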

4.3.2 Law of the JCCP

From now on, we consider a splitting tree T with lifespan measure Λ, and to keep in mind that the lifespan ζ(∅) of the progenitor is equal to χ > 0, we denote the law of the tree by Pχ.

As usual, we say that extinction occurs when T is finite, and denote this event by {Ext}. Recall that the jumping chronological contour process (JCCP) of a chronological tree is well defined as soon as this tree is finite. For any positive real number τ, we define Cτ(T) as the tree obtained after cutting all branches of T above level τ:

Cτ(T) := {x ∈ T : p2(x) ≤ τ}.

Observe that Cτ(T) is a finite chronological tree.

We denote by Y the compensated compound Poisson process t ↦ Yt := −t + Σ_{s≤t} ∆s, where (∆t, t ≥ 0) is a Poisson point process with intensity measure Leb ⊗ Λ. Then Y is a Lévy process with no negative jumps, whose Laplace exponent (see Appendix) will be denoted by ψ:

ψ(λ) := λ − ∫₀^∞ (1 − e^{−λr}) Λ(dr),   λ ≥ 0.

The following statement is the fundamental result of this section. It is a little bit surprising at first sight, in the sense that, even though (ϕ⁻¹(t); t ≥ 0) is not Markovian, its second projection is. Recall that TA is the first hitting time of A.


Theorem 4.3.1 Denote by (Xt, t ≥ 0) the JCCP of T whenever T is finite, and by (X(τ)t, t ≥ 0) the JCCP of Cτ(T).

(i) Define recursively t0 = 0 and ti+1 = inf{t > ti : X(τ)t ∈ {0, τ}}. Then under Pχ, the killed paths ei := (X(τ)ti+t, 0 ≤ t < ti+1 − ti), i ≥ 0, form a sequence of i.i.d. excursions, distributed as the Lévy process Y killed at T0 ∧ T(τ,+∞), ending at the first excursion hitting 0 before (τ,+∞). These excursions all start at τ, except the first one, which starts at min(χ, τ). In other words, X(τ) has the law of Y reflected below τ and killed upon hitting 0.

(ii) Under Pχ(· | Ext), X has the law of the Lévy process Y, started at χ, conditioned on, and killed upon, hitting 0.

Proof. We only provide a hand-waving proof, in the (sub)critical case, which is sufficient to understand the main point.

The JCCP visits every individual starting from her date of death, going backwards through chronological levels (at unit speed), and interrupting the visit by the visits of her children (youngest child first, since the visit started at the death level). The JCCP X is then the sum of a pure-jump process and of the linear process t ↦ −t, started at χ and killed when it hits 0. Since the sizes of the jumps are exactly the lifespans of the individuals of the CMJ tree, the jumps of X have i.i.d. sizes. Since individuals give birth at constant rate b during their lifetimes, the interarrival times of births are exponential with parameter b and independent. Because the exploration is done at unit speed, and by the lack-of-memory property of the exponential distribution, this also holds for the interarrival times of the jumps of X. As a consequence, X has the law of Y killed when it hits 0. □

4.3.3 New properties of the Crump–Mode–Jagers process

Recall that Pχ denotes the law of the splitting tree with lifespan measure Λ, conditional on ζ(∅) = χ. From now on, we will also denote by Pχ the law of the Lévy process Y with Laplace exponent ψ, when it is started at χ. The scale function W is the positive function with Laplace transform 1/ψ, and η is the largest root of ψ (see Appendix). Check that

ψ′(0+) = 1 − m,

so that η > 0 corresponds to the supercritical case, and η = 0 either to the subcritical case (if ψ′(0+) > 0) or to the critical case (if ψ′(0+) = 0). Also note that, because ψ is convex, ψ(0) = 0 and ψ(λ) ≥ λ − b, one has

η < b.

Corollary 4.3.2 The probability of extinction is Pχ(Ext) = e^{−ηχ}. In addition, the one-dimensional marginal of the homogeneous CMJ process Ξτ = Card{t : Xt = τ} (number of individuals alive at level τ) is given by

Pχ(Ξτ = 0) = Pχ(T0 < T(τ,+∞)) = W(τ − χ)/W(τ),

and conditional on being nonzero, Ξτ has a geometric distribution with success probability

Pτ(T0 > T(τ,+∞)) = 1 − 1/W(τ).

In particular, Eχ(Ξτ | Ξτ ≠ 0) = W(τ).
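Corollary 4.3.2 can be probed by exact simulation of Y: between jumps the path decreases at unit speed, so it can hit 0 between jumps but can only enter (τ, +∞) at a jump. For the critical exponential example Λ(dz) = b² e^{−bz} dz one has W(x) = 1 + bx, and a seeded Monte Carlo sketch (all parameter values arbitrary) recovers Pχ(Ξτ = 0) = W(τ − χ)/W(τ):

```python
import random

def hits_zero_first(chi, tau, b, rng):
    """One path of Y from chi: drift -1, jumps at rate b with Exp(b) sizes.

    Returns True iff Y hits 0 before exceeding tau.
    """
    y = chi
    while True:
        e = rng.expovariate(b)           # time until the next jump
        if y - e <= 0.0:
            return True                  # level 0 is reached between jumps
        y = y - e + rng.expovariate(b)   # jump of Exp(b) size
        if y > tau:
            return False                 # exits through (tau, +infinity)

rng = random.Random(42)
b, chi, tau, n = 1.0, 1.0, 3.0, 20000
p_hat = sum(hits_zero_first(chi, tau, b, rng) for _ in range(n)) / n
p_theory = (1 + b * (tau - chi)) / (1 + b * tau)   # W(tau - chi)/W(tau) = 0.75
```

The empirical frequency of {Ξτ = 0} agrees with the scale-function formula to within Monte Carlo error.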


Corollary 4.3.3 In the supercritical case, set P♮ := P(· | Ext). Then the JCCP under P♮ is a Lévy process with Laplace exponent ψ♮:

ψ♮(λ) = ψ(λ + η) = λ − ∫₀^∞ e^{−ηr}(1 − e^{−λr}) Λ(dr),   λ ≥ 0.

As a consequence, the supercritical splitting tree conditioned on its extinction has the same law as the subcritical splitting tree with lifespan measure e^{−ηr}Λ(dr). In particular, its birth rate equals b − η.

Corollary 4.3.4 The asymptotic behaviour of the CMJ process is as follows.

(i) (Yaglom's distribution) In the subcritical case,

lim_{τ→∞} P(Ξτ = n | Ξτ ≠ 0) = m^{n−1}(1 − m),   n ≥ 1.

(ii) In the critical case, provided that ∫ r²Λ(dr) < ∞,

lim_{τ→∞} P(Ξτ/τ > x | Ξτ ≠ 0) = exp(−ψ″(0+) x/2),   x ≥ 0.

(iii) In the supercritical case, provided that ∫ r log(r) Λ(dr) < ∞, conditional on {Extc},

lim_{τ→∞} e^{−ητ} Ξτ = ξ a.s.,

where ξ is an exponential variable with parameter 1/ψ′(η).

Because of the last result, η is called the Malthusian parameter.

4.3.4 The coalescent point process

Fix τ > 0. For any chronological tree T, we let (xi(τ); 1 ≤ i ≤ Ξτ ) denote the ranked pointsx1 ≤ x2 ≤ · · · of T such that p2(xi) = τ . In particular, the vertices p1(xi) of T are exactly theindividuals alive at level τ .

Theorem 4.3.5 Under P, the coalescence level between xj(τ) and xk(τ) (j ≤ k) is given by

p2(xj(τ) ∧ xk(τ)) = min{ai : j ≤ i < k},

where the ai := p2(xi ∧ xi+1), 1 ≤ i ≤ Ξτ, form a sequence of i.i.d. r.v., killed at the first negative one, and distributed as τ − inf_t Yt, where Y is the Lévy process with Laplace exponent ψ started at τ and killed upon exiting (0, τ]. As a consequence, the duration ci = τ − ai elapsed since the coalescence of the points xi(τ) and xi+1(τ) satisfies

P(ci ≤ σ) = (1 − 1/W(σ))/(1 − 1/W(τ)),   σ ≤ τ.

Coalescence levels can be seen straightforwardly in Fig. 4.5.

Remark 4.2 Taking Λ(dz) = b² e^{−bz} dz (exponential lifespans), one can recover Lemma 3 in [94]. Namely, since an elementary calculation yields W(x) = 1 + bx,

P(c1 ∈ dσ) = ((1 + bτ)/τ) (1 + bσ)^{−2} dσ,   σ ≤ τ.
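The law of the coalescence times can be checked by simulating the killed Lévy path of Theorem 4.3.5 and keeping the trials that exit (0, τ] upward; again Λ(dz) = b² e^{−bz} dz, so jumps arrive at rate b with Exp(b) sizes and W(x) = 1 + bx (a seeded sketch with arbitrary parameters):

```python
import random

def coalescence_time(tau, b, rng):
    """Run Y from tau until it exits (0, tau].

    If the exit is upward (through a jump above tau), return c = tau - inf Y,
    a sample of c_i given that the lineage sequence is not killed; else None.
    """
    y, inf_y = tau, tau
    while True:
        e = rng.expovariate(b)        # time to the next jump; drift -1 meanwhile
        if y - e <= 0.0:
            return None               # killed: the path reaches 0 first
        y -= e
        inf_y = min(inf_y, y)
        y += rng.expovariate(b)       # positive jump; may exit above tau
        if y > tau:
            return tau - inf_y

rng = random.Random(7)
b, tau, sigma = 1.0, 3.0, 1.0
samples = [c for c in (coalescence_time(tau, b, rng) for _ in range(40000)) if c is not None]
p_hat = sum(c <= sigma for c in samples) / len(samples)
W = lambda x: 1 + b * x
p_theory = (1 - 1 / W(sigma)) / (1 - 1 / W(tau))   # = (1/2)/(3/4) = 2/3
```

The fraction of non-killed trials also estimates 1 − 1/W(τ) = 3/4, the probability that the lineage list continues.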



Figure 4.5: Illustration of a splitting tree showing the durations c1, c2, c3 elapsed since coalescence for each of the three consecutive pairs (x1, x2), (x2, x3) and (x3, x4) of the Ξτ = 4 individuals alive at level τ. The heights (generations in the discrete tree) of the points x1, x2, x3, x4 are respectively 3, 3, 2, 4.

4.4 Spine decomposition of infinite trees

In this section, we rely on the properties of the JCCP to give a decomposition of infinite trees into an infinite skeleton together with the finite trees grafted on it. We treat two cases.

The first case is that of the supercritical splitting tree, where the infinite skeleton is a Yule tree and the finite trees are distributed according to P♮ = P(· | Ext).

The second case is that of (sub)critical splitting trees conditioned on not being extinct at generation n. As n → ∞, the limiting tree has one infinite spine, and the finite trees are distributed according to P.

Because those trees are infinite, we cannot define the JCCP of the whole tree as previously, but only the JCCP of the first infinite subtree F1, defined as follows. Rigorously, the exploration process is always defined for the truncation at level τ of an infinite tree. We denote its inverse by ϕτ. Then F1 is the set of existence points x such that ϕτ(x) converges to a finite limit as τ → ∞. In other terms, F1 is the subtree coming just before the leftmost infinite branch, that is,

F1 := {x ∈ T : λ(S(x)) < ∞},

where we remind the reader that S(x) = {y ∈ T : y ≤ x}. By construction, F1 admits an exploration process and hence a JCCP, both having infinite lifetime on {Extc} (the subtree F1 has infinite length). See Fig. 4.6.


4.4.1 The supercritical infinite tree

Recall that Pχ(Extc) = 1 − e^{−ηχ}, where η is the Malthusian parameter. Then set

p := P(Extc) = ∫₀^∞ b⁻¹Λ(dr) Pr(Extc) = η/b.
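For exponential lifespans of parameter c with birth rate b > c, i.e. Λ(dr) = bc e^{−cr} dr (mass b, m = b/c > 1), one finds η = b − c and hence p = 1 − c/b in closed form. A numerical sketch recovering η as the largest root of ψ (the quadrature and bisection ranges are arbitrary choices):

```python
import math

b, c = 2.0, 1.0          # birth rate b, lifespan parameter c; supercritical since b > c

def psi(lam, n=5000, rmax=40.0):
    """psi(lam) = lam - ∫ (1 - e^{-lam r}) Λ(dr) with Λ(dr) = b c e^{-c r} dr."""
    h = rmax / n
    integral = sum((1 - math.exp(-lam * (k + 0.5) * h)) * b * c * math.exp(-c * (k + 0.5) * h) * h
                   for k in range(n))
    return lam - integral

# psi is convex with psi(0) = 0; it is negative on (0, eta) and positive above,
# so the largest root eta can be found by bisection on the sign of psi.
lo, hi = 1e-6, 50.0
for _ in range(60):
    mid = (lo + hi) / 2
    if psi(mid) > 0:
        hi = mid
    else:
        lo = mid
eta = (lo + hi) / 2
p = eta / b              # probability of non-extinction, p = eta/b
```

With b = 2 and c = 1, the bisection returns η ≈ 1 = b − c and p ≈ 1/2 = 1 − c/b.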

Under P(· | Extc), define recursively u0 = ∅, and for i ≥ 0,

u0 ··· ui ui+1 is the youngest daughter with infinite descendance of u0 ··· ui.

Then u := u0u1u2 ··· is the first infinite branch of the discrete tree T in the lexicographical order associated to the ‘youngest-first’ labelling of offspring. For k ≥ 0, let Ak be the age at which the individual u|k gives birth to u|k+1, and let Rk be her residual lifetime at that level:

Ak := α(u|k+1) − α(u|k)   and   Rk := ω(u|k) − α(u|k+1).

Modulo the labelling of individuals, the sequence (Ak, Rk)k≥0 characterizes the leftmost backbone, or spine, B1 of the real tree:

B1 := {x ∈ T : p1(x) = u|n for some n}.

Finally, let A1 be the leftmost infinite branch,

A1 := {x ∈ T : p1(x) = u|n for some n, p2(x) ≤ α(u|n+1)},

and R1 the thorns of the backbone,

R1 := B1 \ A1.

Notice that the ages characterize A1 and the residual lifetimes characterize R1. For a graphical representation of the infinite tree, see Fig. 4.6.

Theorem 4.4.1 Assume that η > 0 (supercritical case) and recall that P♮ = P(· | Ext). Under P(· | Extc), the pairs (Ak, Rk), k ≥ 0, are independent copies of (A, R), where

P(A + R ∈ dz, R ∈ dr) = e^{−ηr} dr Λ(dz),   0 < r < z.

Next let B′ = A′ ∪ R′ be a copy of B1 = A1 ∪ R1 (characterized by the pairs (Ak, Rk)), and let

• (xi)i∈ℕ be the atoms of a Poisson measure with intensity b(1 − p) on B′;

• (yi)i∈ℕ be the atoms of a Poisson measure with intensity bp on A′.

Conditionally on B′ and the foregoing Poisson measures, graft i.i.d. finite subtrees with common law P♮ at all the points (xi), and define recursively X as the tree thus obtained with independent copies of itself grafted at all the points (yi). Then X has the law of T under P(· | Extc). In particular, the subtree of points with infinite descendance is a Yule tree with birth rate bp = η.
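The law of (A, R) can be sampled by a rejection scheme that mirrors the proof below: draw a lifespan z from Λ(·)/b and an independent Exp(η) variable r, and accept when r < z; the acceptance probability is then exactly p = η/b, and one sets R = r, A = z − r. In the exponential-lifespan example (b = 2, c = 1, so η = 1 — our arbitrary choice), the marginal of R under the target law is Exp(η + c), so E R = 1/2. A seeded sketch:

```python
import random

rng = random.Random(123)
b, c = 2.0, 1.0
eta = b - c                    # Malthusian parameter for Exp(c) lifespans, rate b

def sample_AR():
    """Rejection sampler for (A, R): z ~ Λ(.)/b = Exp(c), r ~ Exp(eta), accept if r < z."""
    trials = 0
    while True:
        trials += 1
        z = rng.expovariate(c)       # lifespan of the spine daughter
        r = rng.expovariate(eta)     # candidate residual lifetime
        if r < z:
            return z - r, r, trials  # (age A, residual lifetime R, #trials used)

n = 40000
out = [sample_AR() for _ in range(n)]
mean_R = sum(r for _, r, _ in out) / n
accept_rate = n / sum(tr for _, _, tr in out)   # should be close to p = eta/b = 1/2
```

Both the empirical mean of R (target 1/2) and the acceptance rate (target p = η/b = 1/2) match the theory.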



Figure 4.6: An ‘infinite’ chronological tree. The backbone B1 is the subtree made up of the vertical edges in bold, decomposed into the leftmost infinite branch A1 (bold arrows, whose sizes are the ages) and the thorns R1 (superior parts of the bold edges, whose sizes are the residual lifetimes). The first infinite subtree F1 is the collection of finite subtrees grafted on R1, together with R1 itself. The horizontal dashed lines indicate the points where trees on A1 are grafted.

Proof. By the branching property, it suffices to characterize the joint law of the first infinite subtree F1 and that of the leftmost infinite branch B1. From Theorem 4.3.1, we can deduce that the law under P(· | Ext^c) of the JCCP of F1 is that of a Lévy process with Laplace exponent ψ, starting from a r.v. with law Λ(·)/b and conditioned not to hit 0. As a consequence, setting J the future infimum of X

Jt := inf{Xs : s ≥ t},

and Uk the k-th jump time of J, we get

(Ak, Rk) = (J(Uk) − X(Uk−), X(Uk) − J(Uk)).

Moreover, conditional on these ages and residual lifetimes, the excursions (X_{Uk+t} − J_{Uk}; 0 ≤ t < U_{k+1} − Uk) of X above its future infimum are independent and distributed as X started at Rk under P♮ and killed upon hitting zero. Since F1 is a collection of finite subtrees whose JCCPs are exactly the excursions of X above its future infimum, we have proved that, conditionally on B1, all these finite subtrees are i.i.d. with law P♮.

It only remains to prove that the ages and residual lifetimes (Ak, Rk) have the law displayed in the theorem. To that end, consider any jump of X occurring at time, say, S. By the Markov property, conditional on X(S−) = x and X(S) − X(S−) = z, the event that this jump is a jump of the future infimum, and the value of that jump on this event, are independent of the past. This yields recursively the independence of the pairs (Ak, Rk). Now let I be the infimum of {Xt : t ≥ S}. It is known that x + z − I is exponential with parameter η, so that, since p = P(S is a jump time of J), we get

P(A + R ∈ dz, R ∈ dr) = p^{−1} ∫_0^∞ P(X(S−) ∈ dx, X(S) − X(S−) ∈ dz, x + z − I ∈ dr)

= p^{−1} b^{−1} Λ(dz) η e^{−ηr} dr.

The proof ends by recalling that p = η/b. □

4.4.2 Conditioned (sub)critical trees

As in the previous subsection, we want to give a spine decomposition of the splitting tree, but here we are interested in the (sub)critical case, when the tree is conditioned to reach large heights. This is the exact analogue of the Q-process applied to splitting trees.

We define Pn as the law P of a splitting tree conditioned on the event of reaching height n

Pn := P(· | Zn ≠ 0).

Under Pn, define recursively u0 = ∅, and for i ≥ 0,

u0 · · · ui u_{i+1} is the youngest daughter of u0 · · · ui who has living descendants at generation n.

Then u := u0 · · · un is the first lineage of the discrete tree T reaching generation n in the lexicographical order associated to the ‘youngest-first’ labelling. Note that u depends on n.

The ages and residual lifetimes (Ak, Rk) are defined as previously, as well as the backbone or spine Bn

Bn := {x ∈ T : p1(x) = u|k for some k}.

In the following statement, when we say that the laws Pn converge for finite truncations as n → ∞, we mean convergence for events that are measurable w.r.t. the truncation Cσ(T) = {x ∈ T : p2(x) ≤ σ} defined earlier, for some fixed σ.

Theorem 4.4.2 Assume that ∫^∞ Λ(dr) r log r < ∞. As n → ∞, the law of Bn under Pn converges for finite truncations to an infinite spine B′ characterized by the pairs (Ak, Rk), k ≥ 0, which form a sequence of independent copies of (UD, (1 − U)D), where

P(D ∈ dz) = m^{−1} z Λ(dz), z > 0,

and U is an independent r.v. uniform on (0, 1).

Next let (xi)_{i∈N} be the atoms of a Poisson measure with intensity b on B′. Then as n → ∞, the law of T under Pn converges for finite truncations to the infinite tree with one infinite branch B′ obtained by grafting, at all points (xi) on B′, i.i.d. finite subtrees with common law P.

Remark 4.3 One could think that conditioning on reaching large heights amounts asymptotically to conditioning on reaching high levels. Actually, in the latter case, the limiting JCCP would be a Lévy process conditioned to hit all positive numbers before zero. In the critical case, such a process exists and is known as a Lévy process conditioned to stay positive (see Appendix). But in the subcritical case, such a process does not exist without additional assumptions on the existence of exponential moments [61]. In a work in progress, we will show that the former conditioning leads to a new definition of the Lévy process conditioned to stay positive, which coincides, in the critical case, with the usual definition.

Sketch of proof. Let σ > 0, and set Nn(σ) the number of individuals at level σ having living descendants at generation n. Since P(Zn ≠ 0) vanishes as n → ∞, P(Nn(σ) = 1 | Zn ≠ 0) → 1 as n → ∞. This ensures that the limiting tree under Pn has only one infinite branch, which must therefore have the law of B′. To get the law of the spine B′, first observe that

Pn(Rk < r, Ak + Rk ∈ dz) = ρ_{n−k}^{−1} b^{−1} Λ(dz) P_r(Z_{n−k} ≠ 0),

where

ρ_j := b^{−1} ∫_0^∞ Λ(dz) P_z(Z_j ≠ 0), j ∈ N.

Now under Pz, (Zn; n ≥ 0) is a BGW process with mean m, starting from a Poisson variable with mean z. More precisely, its offspring distribution is given by

pk := b^{−1} ∫_0^∞ Λ(dz) e^{−bz} (bz)^k / k!.

It is well-known that since ∫^∞ Λ(dr) r log r < ∞, we have ∑_k pk k log k < ∞. Then thanks to Theorem 3.3.4, there is a positive constant c such that

lim_{n→∞} m^{−n} P(Zn ≠ 0 | Z0 = j) = cj,

and since P(Zn ≠ 0 | Z0 = j) ≤ j P(Zn ≠ 0 | Z0 = 1), the dominated convergence theorem ensures that m^{−n} ρ_n → cm/b, and

lim_{n→∞} Pn(Rk < r, Ak + Rk ∈ dz) = m^{−1} r Λ(dz), 0 < r < z.

The theorem follows from a conditional (on the spine) application of the branching property. We leave to the reader the task of concluding rigorously. □
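As a numerical sanity check on the offspring distribution pk appearing in the sketch above, one can take the hypothetical lifespan measure Λ(dz) = bc e^{−cz} dz (exponential lifespans; this choice and the parameter values are ours, not the text's). The Poisson-mixture integral then has the closed form pk = c b^k/(b+c)^{k+1}, a geometric distribution with mean m = ∫ z Λ(dz) = b/c, which the sketch below verifies by quadrature.

```python
import math

b, c = 2.0, 3.0  # hypothetical birth rate and inverse mean lifespan (ours)

def p(k, n_grid=200000, z_max=40.0):
    """Offspring probability p_k = b^{-1} int Λ(dz) e^{-bz} (bz)^k / k!,
    computed by trapezoidal quadrature for Λ(dz) = b c e^{-cz} dz."""
    h = z_max / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z = i * h
        w = 0.5 if i in (0, n_grid) else 1.0
        lam = b * c * math.exp(-c * z)  # lifespan density
        total += w * lam * math.exp(-b * z) * (b * z) ** k / math.factorial(k)
    return h * total / b

# Closed form for exponential lifespans: geometric p_k = c b^k / (b+c)^{k+1}
for k in range(5):
    assert abs(p(k) - c * b**k / (b + c) ** (k + 1)) < 1e-6

# Mean offspring number m = int z Λ(dz) = b/c
m = sum(k * c * b**k / (b + c) ** (k + 1) for k in range(200))
assert abs(m - b / c) < 1e-9
print("checks passed; m =", m)
```

The geometric form also makes the x log x condition of the theorem trivially satisfied in this example, since the tail of pk decays exponentially.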

4.5 Ray–Knight type theorems

4.5.1 Ray–Knight theorems

Let H be any real stochastic process whose occupation measure at time t admits a density (L^a_t; a ∈ R) w.r.t. some (deterministic) reference measure, called the local time. For example, when the reference measure is Lebesgue measure, this means that for any bounded measurable f, a.s.

∫_0^t f(Hs) ds = ∫_{−∞}^{+∞} f(a) L^a_t da.

When H is transient or killed at some a.s. finite time, one can consider the total local time (L^a_∞; a ∈ R), which we will call the local time process, as indexed by a.


Theorems displaying stochastic processes whose local time process is Markovian are usually called Ray–Knight type theorems, since the pioneering works of D. Ray and F.B. Knight on Brownian motion. In particular, there are two very well-known such theorems [99], which we will call the first and second Ray–Knight theorems.

1. The first Ray–Knight theorem states that the local time process of a reflecting Brownian motion starting from 0 and killed upon summing up x units of local time at 0, is a squared Bessel process of dimension 0, starting from x. Note that this Markov process is the CB-process with branching mechanism ψ(λ) = λ²/2.

2. The second Ray–Knight theorem, also called the Ray–Knight–Williams theorem, states that the local time process on [0, a] of a Brownian motion starting from a and killed upon hitting 0, is a squared Bessel process of dimension 2 starting from 0. Note that this Markov process is the CBI-process with branching mechanism ψ(λ) = λ²/2 and immigration mechanism χ(λ) = λ/2.

Another version of this theorem uses the theorem due to J. Pitman stating that after time reversal, the Brownian motion starting from a and killed upon hitting 0 becomes a Bessel process of dimension 3 starting from 0 and killed at its last hitting time of a. Since local times are invariant under time reversal, the Ray–Knight–Williams theorem can even be stated for a = ∞ as follows: the local time process of a Bessel process of dimension 3 starting from 0 is a squared Bessel process of dimension 2 starting from 0.

4.5.2 A new viewpoint on Le Gall–Le Jan theorem

The finite variation case

The next statement is a slight refinement of Proposition 3.2 in [88] and sheds light on the genealogy defined in [13] by composing subordinators.

For background on Jirina processes, see Subsection 2.1.2.

Theorem 4.5.1 Let X be a Lévy process with Laplace exponent ψ(λ) = λ − ∫_0^∞ (1 − e^{−λr}) Λ(dr) such that ψ′(0+) ≥ 0, started at χ and killed when it hits 0. Then define the height process

Ht = Card{0 ≤ s ≤ t : X_{s−} < inf_{s≤r≤t} Xr}, t < T0,

and Z the local time process of H

Zn := ∫_0^{T0} dt 1_{Ht=n}, n ≥ 0.

Then (Zn; n ≥ 0) is a Jirina process with branching mechanism F starting from Z0 = χ, where

F(λ) := ∫_0^∞ (1 − e^{−λr}) Λ(dr), λ ≥ 0.

Remark 4.4 Recall that a Jirina process with branching mechanism F is such that for any integer n ≥ 1,

Zn = Sn ◦ · · · ◦ S1(χ),

where the Si are i.i.d. subordinators with Laplace exponent F. It satisfies

Eχ(exp(−λZn)) = exp(−χ Fn(λ)), λ ≥ 0,

where Fn is the n-th iterate of F (see Subsection 2.1.2). Note that, after Ξ and Z, we have obtained with Z a third branching process.

Proof. Recall from Theorem 4.2.2 and equation (4.1) that Ht = |p1 ◦ ϕ^{−1}(t)|, that is, Ht is the genealogical height of the individual visited at time t by the exploration process ϕ^{−1}, so that Zn is the sum of lifespans of all individuals of generation n, that is,

Zn = ∑_{v : |v| = n} ζ(v).

Next recall that, conditional on Zn−1 = z, the number of individuals of generation n is a Poisson variable with mean bz, and all their lifespans are independent with law Λ(·)/b. This amounts to saying that Zn = S(z), where S is a compound Poisson process with intensity measure Λ. The theorem follows from a recursive application of this observation. □
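The recursion in this proof translates directly into a simulation: conditional on Z_{n−1} = z, draw a Poisson(bz) number of daughters and give each an independent lifespan with law Λ(·)/b. The sketch below does this for the hypothetical choice Λ = b × Exponential(c); both the measure and the parameter values are our own, not the text's.

```python
import random

def jirina_path(chi, b, c, n_gen, rng):
    """One trajectory of the Jirina process of Theorem 4.5.1:
    Z_0 = chi, and given Z_{n-1} = z, Z_n is the compound Poisson sum
    of Poisson(b z) many i.i.d. lifespans drawn from Λ(·)/b = Exp(c)."""
    Z = [chi]
    for _ in range(n_gen):
        k = poisson(b * Z[-1], rng)  # number of daughters in generation n
        Z.append(sum(rng.expovariate(c) for _ in range(k)))
    return Z

def poisson(mean, rng):
    """Sample a Poisson variable by counting unit-rate arrivals in [0, mean]."""
    t, k = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > mean:
            return k
        k += 1

rng = random.Random(0)
Z = jirina_path(chi=1.0, b=1.0, c=2.0, n_gen=30, rng=rng)

assert len(Z) == 31 and Z[0] == 1.0
assert all(z >= 0 for z in Z)
# 0 is absorbing: once the population is extinct, it stays extinct
for prev, nxt in zip(Z, Z[1:]):
    if prev == 0.0:
        assert nxt == 0.0
print("final generation mass:", Z[-1])
```

With these parameters the mean offspring number is m = b/c = 1/2, so the simulated mass typically dies out within a few generations, consistently with the (sub)critical regime.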

Remark 4.5 Actually, the previous theorem even holds if Λ is no longer finite, provided that ∫_0 r Λ(dr) < ∞ (and ∫^∞ Λ(dr) < ∞). Indeed, this assumption ensures that the corresponding Lévy process X has finite variation (see Appendix), so that, even if birth events ‘rain down’ on each lifetime, the jumps of the past supremum of X are discrete. Indeed, recall that X^{(t)} is the time-reversed of X at time t

X^{(t)}_s = X_{t−} − X_{(t−s)−}, s ∈ [0, t],

and set S^{(t)} its past supremum

S^{(t)}_s := sup{X^{(t)}_r : 0 ≤ r ≤ s}.

Since X and X^{(t)} have the same law, the jumps of S^{(t)} are also discrete and we can define the height process H of X exactly as in (4.1), that is,

Ht := Card{0 ≤ s ≤ t : X^{(t)}_s = S^{(t)}_s}.

The infinite variation case

Note that the last theorem can be stated in the form of a Ray–Knight theorem as follows.

‘The local time process of the height process H of X is Markovian, where X is a Lévy process with no negative jumps and finite variation, which does not drift to +∞ and is killed upon hitting zero. In addition, the law of this Markov process can be specified as that of a (sub)critical Jirina process.’

Actually, most of what has been done in this chapter associating Lévy processes with finite variation to splitting trees can be adapted to general Lévy processes with no negative jumps, associated to continuous trees called Lévy trees [32, 34]. We do not define Lévy trees, but they will be underlying the discussion, since we want to take advantage of the genealogies encoded by Lévy processes thanks to the intuition we get from the study of splitting trees with finite variation.

The generalization of Theorem 4.5.1 to the infinite variation case is due to J.F. Le Gall and Y. Le Jan [88]. Let X be any Lévy process with no negative jumps. When X has infinite variation, S^{(t)} − X^{(t)} is a strong Markov process for which 0 is regular, so H cannot be defined as the cardinality of a finite set, but as the local time of S^{(t)} − X^{(t)} at 0.

Theorem 4.5.2 (Le Gall–Le Jan [88]) Assume that X has infinite variation and does not drift to +∞. Then the local time process of the height process H of the Lévy process X started at x and killed upon hitting 0, is a CB-process with branching mechanism ψ started at x.

Remark 4.6 This last theorem is not only a generalization of Theorem 4.5.1 to the infinite variation case, but also includes as a special case, and hence can be seen as an extension of, the first Ray–Knight theorem. Indeed, if X is a standard Brownian motion started at x and killed upon hitting 0, then ψ(λ) = λ²/2, and by continuity of the paths, Ht = Xt − It, where I is the past infimum of X.

Now thanks to Lévy’s equivalence theorem, H has the law of a reflecting Brownian motion killed upon summing up x units of local time at 0. According to the first Ray–Knight theorem, the local time process of H is indeed the CB-process with branching mechanism ψ, that is, the squared Bessel process of dimension 0 (see Subsection 4.5.1).

If we try to push further the analysis of the relations between Lévy processes and trees to get new Ray–Knight type theorems, we can follow two tracks, having in common the focus on infinite trees.

1. the first possibility is to consider the JCCP of the first infinite subtree defined in the previous section, either in the supercritical case, or in the (sub)critical case when the tree is conditioned on reaching large heights;

2. the second track relies on the JCCP of the truncated tree.

4.5.3 JCCP of the first infinite subtree

The supercritical case

In Subsection 4.4.1, we have seen that in the supercritical case under P(· | Ext^c), the JCCP of the first infinite subtree F1 is a Lévy process conditioned to stay positive, and Theorem 4.4.1 has provided us with a genealogical characterization of this tree. Indeed, as pointed out in the proof of this theorem, F1 is a collection of finite subtrees starting from Rk at generation k (see Fig. 4.6), and, conditional on Rk, these subtrees are i.i.d. with law P♮. Moreover, the (Rk) are i.i.d. with common distribution

P(R ∈ dr) = Λ̄(r) e^{−ηr} dr,

where Λ̄(r) = ∫_{(r,∞)} Λ(dz).

This can be recorded in the following statement, where X is assumed to be a Lévy process with finite variation and Laplace exponent ψ(λ) = λ − ∫_0^∞ (1 − e^{−λr}) Λ(dr) such that ψ′(0+) < 0 (i.e., X drifts to +∞), started at χ and conditioned not to hit 0.


The height process of X is defined as usual, and its local time process Z as in the beginning of this section, that is,

Zn := ∫_0^∞ dt 1_{Ht=n}, n ≥ 0,

since we work under P(· | T0 = ∞). Recall from Theorem 4.5.1 that F is the Laplace exponent of a subordinator, given by F(λ) = ∫_0^∞ (1 − e^{−λr}) Λ(dr).

Theorem 4.5.3 Under Pχ(· | T0 = ∞), the local time process (Zn; n ≥ 0) of H is a Jirina process with immigration. It has branching mechanism F♮

F♮(λ) := F(λ + η), λ ≥ 0,

and immigration mechanism G

G(λ) := F(λ + η)/(λ + η), λ ≥ 0.

It is started at the first residual lifetime R0, which is an exponential variable with parameter η conditioned to be smaller than χ.

Proof. We prove the theorem in the case when Λ is finite. Set In the n-th thorn of R1, which has length Rn. From Theorem 4.4.1, for any x ∈ In, the height of x is |p1(x)| = n. Then adapting the proof of Theorem 4.5.1, we get that

Zn = Rn + ∑_{v : ∃σ, (v,σ) ∈ F1\R1, |v| = n} ζ(v).

Next, thanks to Theorem 4.4.1 again, all vertices v with height n such that (v, σ) ∈ F1\R1 for some σ, descend from the thorns with indices strictly lower than n and have lifespan measure e^{−ηr} Λ(dr) (thanks to Corollary 4.3.3), so that, conditional on Zn−1 = z, we have Zn = Rn + S(z), where S is a subordinator independent of Rn and with Laplace exponent

F♮(λ) = ∫_0^∞ e^{−ηr} (1 − e^{−λr}) Λ(dr) = F(λ + η), λ ≥ 0.

As a consequence, Z is a Jirina process with immigration, where the branching mechanism is F♮ and the immigration is given by the common distribution of the (Rn), so that

G(λ) = E(exp(−λR)) = ∫_0^∞ Λ̄(r) e^{−(λ+η)r} dr = F(λ + η)/(λ + η), λ ≥ 0,

where the last equality follows from an elementary integration by parts. □

Actually, a generalization of this theorem can be found in [79]. Indeed, let X be a general Lévy process with no negative jumps and infinite variation, and Laplace exponent ψ. As in the previous subsection, it is possible to define its height Ht as the local time at 0 of S^{(t)} − X^{(t)}.


Theorem 4.5.4 ([79]) Assume that X has infinite variation and drifts to +∞ (ψ′(0+) < 0). Then the local time process of the height process H of the Lévy process X started at χ and conditioned not to hit 0, is a CBI-process with branching mechanism ψ♮

ψ♮(λ) = ψ(λ + η), λ ≥ 0,

and immigration mechanism γ

γ(λ) := ψ(λ + η)/(λ + η), λ ≥ 0.

It is started from an exponential r.v. with parameter η conditioned to be smaller than χ.

Remark 4.7 The similarity between both theorems is obvious. However, note that in the finite variation case, F♮ is the Laplace exponent of a subordinator and G is the Laplace transform of a positive r.v., whereas in the infinite variation case, ψ♮ is the Laplace exponent of a general Lévy process with no negative jumps and γ is the Laplace exponent of a subordinator. Of course, the biggest difference lies in the proofs of these theorems. In the infinite variation case, the proof relies heavily on excursion theory.

The critical case

Exactly as in the supercritical case, we will state a result that uses the spine decomposition given in the previous section for a splitting tree conditioned to reach large heights. We have seen that the limiting JCCP of the first infinite subtree is a Lévy process conditioned to stay positive. In the critical case, this process is defined by a harmonic transform of the initial law of the Lévy process X via the martingale Xt 1_{t<T0} (see Appendix). The proof of the next statement relies on Theorem 4.4.2. Since this proof mimics that for the supercritical case, we will not display it.

Let X be a Lévy process with finite variation and Laplace exponent ψ(λ) = λ − ∫_0^∞ (1 − e^{−λr}) Λ(dr) such that ψ′(0+) = 0, started at χ and conditioned to stay positive. Its law is denoted P↑χ. As usual, H denotes the height process of X.

Theorem 4.5.5 Under P↑χ, the local time process (Zn; n ≥ 0) of H is a Jirina process with immigration. It has branching mechanism F and immigration mechanism G such that G(0) = 0 and

G(λ) := F(λ)/λ, λ > 0.

It is started at the r.v. R0, which is uniform on (0, χ).

Remark 4.8 One might like to take better advantage of Theorem 4.4.2, since the previous statement only takes into account the first infinite subtree, which is the subtree to the left-hand side of the spine. It would indeed be possible to state a bivariate version of this theorem using the knowledge we have of the joint law of the left-hand and right-hand sides of the spine, but this would require first defining a bivariate version of the Lévy process conditioned to stay positive. The interested reader will consult [31].


As in the supercritical case, we state the infinite variation counterpart of the previous theorem. Let X be a Lévy process with no negative jumps and with infinite variation such that ψ′(0+) = 0, started at χ and conditioned to stay positive, in the aforementioned sense (h-transform). Its law is denoted by P↑χ. Note that by local absolute continuity, it is still possible to define its height Ht as the local time at 0 of S^{(t)} − X^{(t)}.

Theorem 4.5.6 ([79]) Under P↑χ, the local time process of the height process H of X is a CBI-process with branching mechanism ψ and immigration mechanism γ such that γ(0) = 0 and

γ(λ) := ψ(λ)/λ, λ > 0.

It is started from a uniform r.v. on (0, χ) (from 0 if χ = 0).

Remark 4.9 This last theorem is not only a generalization of Theorem 4.5.5 to the infinite variation case, but also includes as a special case, and hence can be seen as an extension of, the second Ray–Knight theorem. Indeed, if X is a standard Brownian motion started at 0 and conditioned to stay positive, then ψ(λ) = λ²/2, and it is known [112] that X is a Bessel process of dimension 3 started at 0. By continuity of the paths, H = X, so by the second Ray–Knight theorem, the local time process of H is indeed a CBI-process with branching mechanism ψ and immigration mechanism γ(λ) = ψ(λ)/λ = λ/2, that is, the squared Bessel process of dimension 2 (see Subsection 4.5.1).

4.5.4 JCCP of the truncated tree

Truncations

Let T be a (possibly infinite, in the supercritical case) splitting tree associated to the finite lifespan measure Λ, and recall that Cτ(T) denotes the tree truncated at level τ

Cτ(T) = {x ∈ T : p2(x) ≤ τ}.

Recall from Section 4.2 that the JCCP (X^{(τ)}_t; t ≥ 0) of the truncated splitting tree Cτ(T) is well defined. Next, if we define H^{(τ)} as the height process of X^{(τ)}, and Z^{(τ)} as the local time process of H^{(τ)}, that is, as usual,

Z^{(τ)}_n := ∫_0^{T0} dt 1_{H^{(τ)}_t = n}, n ≥ 0,

then, by a similar proof as that of Theorem 4.5.1,

Z^{(τ)}_n = ∑_{v : |v| = n, ω(v) ≤ τ} ζ(v) + ∑_{v : |v| = n, α(v) ≤ τ < ω(v)} (τ − α(v)).

The integers n (depending on τ) for which the second sum in the last displayed equation is not empty are called incomplete generations.

The presence of incomplete generations prevents us from obtaining a new Ray–Knight type theorem for a fixed τ; we only get one in the limit τ → ∞. Indeed, it is easily seen that the lowest incomplete generation tends to ∞ as τ → ∞, so that (Z^{(τ)}_n; n ≥ 0) converges, for finite-dimensional distributions, to the Jirina process with branching mechanism F(λ) = ∫_0^∞ (1 − e^{−λr}) Λ(dr). To state this in a useful way, we construct the process (X^{(τ)}_t; t, τ ≥ 0) without further reference to the tree.


Construction of doubly indexed JCCP

Let ψ be defined as the usual Laplace exponent λ − F(λ) and χ be the lifespan of the progenitor. As seen in Theorem 4.3.1, the JCCP (X^{(τ)}_t; t ≥ 0) of the truncated splitting tree Cτ(T) is the concatenation of a sequence of i.i.d. excursions, distributed as the Lévy process with Laplace exponent ψ killed at T0 ∧ T(τ,+∞), and ending at the first excursion hitting 0 before (τ,+∞). These excursions all start at τ, except the first one, which starts at min(χ, τ).

One can see X^{(τ)} as a Lévy process reflected below τ. Actually, there are various ways of reflecting a real-valued path below or above a constant barrier. One of the possibilities is as follows.

We will use the same notation Cτ for the map which associates to a real-valued path ε the path Cτ(ε) obtained after cutting the excursions of ε above τ. Rigorously, set

A^τ_t(ε) = ∫_0^t ds 1_{ε_s < τ}, t ≥ 0,

and let a^τ(ε) be its right inverse

a^τ_t(ε) = inf{s : A^τ_s(ε) > t}, t ≥ 0.

Then the truncation of ε at level τ is defined as

Cτ(ε) = ε ◦ a^τ(ε).
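On a discretized path, the time change above amounts to deleting the sampling instants at which the path sits at or above τ and closing up the gaps. A minimal sketch (the toy path and level are our own, for illustration only):

```python
def truncate(path, tau):
    """Discrete analogue of C_tau: the time change a^tau skips every instant
    at which the path is >= tau, so we simply keep the values below tau."""
    return [x for x in path if x < tau]

# Toy piecewise path that rises above tau = 2 and comes back down
eps = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 1.0, 0.0]
assert truncate(eps, 2.0) == [0.0, 1.0, 1.0, 1.0, 0.0]

# C_tau is a projection: truncating twice at the same level changes nothing
assert truncate(truncate(eps, 2.0), 2.0) == truncate(eps, 2.0)
# Truncations are consistent: for tau < tau', C_tau after C_tau' equals C_tau
assert truncate(truncate(eps, 3.0), 2.0) == truncate(eps, 2.0)
```

The last assertion is the discrete counterpart of the consistency relation Cτ(X^{(τ′)}) = X^{(τ)} used below.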

A problem with this procedure arises for transient paths. Indeed, if εt goes to ∞ as t → ∞, then a^τ(ε) is only defined on [0, Sτ), where Sτ is the last hitting time of τ, and so is Cτ(ε). On the contrary, here we aim at constructing paths which end up at 0, and are reflected as many times as necessary to hit 0.

The right way to do this is exactly as in the theorem, by concatenating independent excursions below τ of the Lévy process X with Laplace exponent ψ. In the infinite variation case, one has to use Itô’s synthesis theorem [62]. But in fact, we can do this in the same fashion for both the finite and infinite variation cases, by considering S^{(τ)}, the past supremum of a general Lévy process X with no negative jumps after it has overshot level τ

S^{(τ)}_t := sup{Xs : s ≤ t} ∨ τ.

Definition 4.5.7 The process X^{(τ)} defined as

X^{(τ)}_t := Xt − S^{(τ)}_t + τ, t ≥ 0,

will be called the Lévy process X reflected below τ.
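This definition is straightforward to implement on a sampled path: subtract the running supremum, capped below by τ. A sketch on a hypothetical toy path (the values and the level are ours):

```python
def reflect_below(path, tau):
    """X^{(tau)}_t = X_t - S^{(tau)}_t + tau,
    where S^{(tau)}_t = max(sup_{s<=t} X_s, tau)."""
    out, running_sup = [], float("-inf")
    for x in path:
        running_sup = max(running_sup, x)
        out.append(x - max(running_sup, tau) + tau)
    return out

X = [0.0, 1.0, 3.0, 2.0, 4.0, 1.0]  # toy path with positive jumps (ours)
R = reflect_below(X, tau=2.0)

assert all(r <= 2.0 for r in R)   # the reflected path never exceeds tau
assert R[:2] == X[:2]             # it agrees with X before X first reaches tau
assert R == [0.0, 1.0, 2.0, 1.0, 2.0, -1.0]
```

Each time the path attempts a new supremum above τ, the excess is cut off, which is exactly the effect of truncating the underlying tree at level τ.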

Actually, since we want to let τ → ∞, we need a joint construction of the Lévy process reflected below τ for all positive τ. In particular, notice that by construction of the truncated trees in the beginning of this subsection, we had for any τ < τ′,

Cτ(X^{(τ′)}) = X^{(τ)}.

Proposition 4.5.8 There is a doubly indexed process (X^{(τ)}_t; t, τ ≥ 0) such that for each τ, X^{(τ)} has the law of the Lévy process X reflected below τ, and for any τ < τ′,

Cτ(X^{(τ′)}) = X^{(τ)}.


Proof. By standard results on Lévy processes, for any τ < τ′, Cτ(X^{(τ′)}) has the same law as X^{(τ)}. Then the proposition stems from Kolmogorov’s existence theorem. □

Theorem 4.5.9 In the finite variation case, the processes (X^{(τ)}_t; t ≥ 0), killed upon hitting 0, have the same law, as a family indexed by τ, as the family of JCCPs (X^{(τ)})τ of the splitting tree with lifespan measure Λ truncated at level τ.

If Z^{(τ)}_n stands for the time spent at level n by the height process H^{(τ)} of X^{(τ)}, then as τ → ∞, (Z^{(τ)}_n; n ≥ 0) converges for finite-dimensional distributions to the Jirina process with branching mechanism F.


Appendix A

Lévy processes

We denote by (Xt; t ≥ 0) a Lévy process (i.e. a process with independent and homogeneous increments and a.s. càdlàg paths) with no negative jumps, and by Px its distribution conditional on X0 = x. The following facts can all be found in [11, 78, 101].

The (convex) Laplace exponent ψ of X is defined by

E0(exp(−λXt)) = exp(tψ(λ)), t, λ ≥ 0,

and is specified by the Lévy–Khinchin formula

ψ(λ) = αλ + βλ² + ∫_0^∞ (e^{−λr} − 1 + λr 1_{r<1}) Λ(dr), λ ≥ 0, (A.1)

where α ∈ R, β ≥ 0 denotes the Gaussian coefficient, and Λ is a Lévy measure on (0,∞), that is, a positive measure such that ∫_0^∞ (r² ∧ 1) Λ(dr) < ∞.


Figure A.1: Graph of the Laplace exponent ψ of a Lévy process with no negative jumps, (a) drifting downwards (subcritical), (b) drifting upwards (supercritical).


APPENDIX A. LEVY PROCESSES 101

The paths of X have finite variation a.s. if and only if β = 0 and ∫_0^1 r Λ(dr) < ∞. Otherwise its paths have infinite variation a.s.

When X has increasing paths a.s., it is called a subordinator. In that case, ψ(λ) < 0 for any positive λ, and we will prefer to define its Laplace exponent as −ψ. Indeed, since a subordinator has finite variation, its Laplace exponent can be written as

−ψ(λ) = dλ + ∫_0^∞ (1 − e^{−λr}) Λ(dr), λ ≥ 0,

where d ≥ 0 is called the drift coefficient. If X is not a subordinator, since ψ is convex,

lim_{λ→∞} ψ(λ) = +∞.

Set St := sup_{s≤t} Xs, the past supremum of X. If X is not a subordinator but has finite variation, the zero set of S − X is discrete. In addition, ψ(λ) = aλ − ∫_0^∞ (1 − e^{−λr}) Λ(dr) for some positive a, so that ψ(λ) ≤ aλ, and ∫^∞ dλ/ψ(λ) diverges (from this observation stems the fact that CB-processes that hit 0 have infinite variation).

When X has infinite variation, S − X is a strong Markov process for which 0 is regular, that is,

inf{t > 0 : St − Xt = 0} = 0, P-a.s.

When X is not a subordinator, denote by η the largest root of ψ

η := sup{λ : ψ(λ) = 0}.

If η > 0, ψ has exactly two roots (0 and η); otherwise ψ has a unique root η = 0. The right inverse of ψ is denoted by φ : [0,∞) → [η,∞), with φ(0) = η.
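For a concrete ψ, the root η is easy to locate numerically from the convexity just described: ψ is nonpositive on [0, η] and positive beyond it. The sketch below does this by bisection for the hypothetical finite-variation example ψ(λ) = λ − bλ/(λ+c) (unit drift, birth rate b, exponential(c) lifespans; the parameters are our own choice), for which η = b − c when b > c.

```python
def eta(psi, hi=1e6, tol=1e-12):
    """Largest root of a convex Laplace exponent psi by bisection:
    psi <= 0 on [0, eta] and psi > 0 on (eta, infinity)."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

b, c = 3.0, 1.0  # supercritical: psi'(0+) = 1 - b/c < 0 (our parameters)
psi = lambda lam: lam - b * lam / (lam + c)

assert abs(eta(psi) - (b - c)) < 1e-9  # analytic root: eta = b - c = 2
```

The same bisection evaluated on ψ(· + η) − q would give the right inverse φ(q) for q > 0, since φ is the inverse of ψ on [η, ∞).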

We write

TA = inf{t ≥ 0 : Xt ∈ A}

for the first entrance time into a Borel set A. It is known that

E0(e^{−q T_{x}}) = e^{φ(q)x}, q ≥ 0, x ≤ 0.

There exists a unique continuous function W : [0,+∞) → [0,+∞), with Laplace transform

∫_0^∞ e^{−λx} W(x) dx = 1/ψ(λ), λ > η,

such that for any 0 ≤ x ≤ a,

Px(T0 < T(a,+∞)) = W(a − x)/W(a). (A.2)
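For the hypothetical finite-variation example ψ(λ) = λ − bλ/(λ+c) with b < c (unit drift, exponential lifespans, η = 0; the parameters are ours), partial fractions invert the Laplace transform explicitly: W(x) = (c − b e^{−(c−b)x})/(c − b). The sketch below checks the transform identity for the function W by quadrature.

```python
import math

b, c = 1.0, 2.0  # subcritical: eta = 0 (our hypothetical parameters)

def psi(lam):
    return lam - b * lam / (lam + c)

def W(x):
    # Explicit scale function obtained by partial fractions of 1/psi
    return (c - b * math.exp(-(c - b) * x)) / (c - b)

def laplace_W(lam, n=200000, x_max=60.0):
    """Trapezoidal quadrature of int_0^infinity e^{-lam x} W(x) dx."""
    h = x_max / n
    s = sum((0.5 if i in (0, n) else 1.0) * math.exp(-lam * i * h) * W(i * h)
            for i in range(n + 1))
    return h * s

assert abs(W(0.0) - 1.0) < 1e-12   # W(0) = 1 / drift coefficient = 1
assert W(2.0) > W(1.0) > W(0.0)    # W is strictly increasing
for lam in (0.5, 1.0, 2.0):
    assert abs(laplace_W(lam) - 1.0 / psi(lam)) < 1e-3
```

Plugging this W into (A.2) then gives closed-form two-sided exit probabilities for this example.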

The function W is strictly increasing and is called the scale function. We introduce two laws connected with P.

1. The probability measure P↑x is the law of the Lévy process started at x > 0 and conditioned to stay positive. When X drifts to +∞, the conditioning is taken in the usual sense, since X stays positive with positive probability. When X oscillates, as X reaches 0 continuously, the process (Xt 1_{t<T0}; t ≥ 0) is a martingale. Then P↑x is defined by local absolute continuity w.r.t. Px with density x^{−1} Xt 1_{t<T0} on Ft (t ≥ 0).

It is known that the probability measures P↑x converge weakly as x → 0+ to a Markovian law P↑0. For details, see [9, 10, 25, 26, 27].


2. The law P♮ is that of a spectrally positive Lévy process with Laplace exponent ψ♮ : λ ↦ ψ(λ + η). When η > 0, the path of X a.s. drifts to +∞, and if I∞ denotes its overall infimum (Lemma VII.7(i) in [11]), then for any t ≥ 0,

lim_{x→∞} P(Θ | I∞ > −x) = P♮(Θ), Θ ∈ Ft.

This process is thus called the Lévy process conditioned to drift to −∞, and for every real number x, P♮x is defined by local absolute continuity w.r.t. Px with density exp(−η(Xt − x)) on Ft (t ≥ 0).


Bibliography

[1] Aldous, D.J. (1999) Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists. Bernoulli 5(1) 3–48.

[2] Aldous, D.J., Popovic, L. (2004) A critical branching process model for biodiversity. Adv. Appl. Prob. 37(4) 1094–1115.

[3] Anderson, W.J. (1991) Continuous Time Markov Chains: An Applications-Oriented Approach. Springer-Verlag, New York.

[4] Asmussen, S., Hering, H. (1983) Branching Processes. Birkhäuser, Boston.

[5] Athreya, K.B. (2000) Change of measures for Markov chains and the L log L theorem for branching processes. Bernoulli 6 323–338.

[6] Athreya, K.B., Ney, P.E. (1972) Branching Processes. Springer-Verlag, New York.

[7] Axelrod, D.E., Kimmel, M. (2002) Branching Processes in Biology. Springer-Verlag, New York.

[8] Bansaye, V. (2008) Proliferating parasites in dividing cells: Kimmel’s branching model revisited. Ann. Appl. Prob. 18(3) 967–996.

[9] Bertoin, J. (1991) Sur la décomposition de la trajectoire d’un processus de Lévy spectralement positif en son infimum. Ann. Inst. H. Poincaré 27 537–547.

[10] Bertoin, J. (1992) An extension of Pitman’s theorem for spectrally positive Lévy processes. Ann. Probab. 20 1464–1483.

[11] Bertoin, J. (1996) Lévy Processes. Cambridge U. Press.


BIBLIOGRAPHY 104

[12] Bertoin, J. (2006) Random Fragmentation and Coagulation Processes. Cambridge U. Press.

[13] Bertoin, J., Le Gall, J.F. (2000) The Bolthausen–Sznitman coalescent and the genealogy of continuous-state branching processes. Probab. Theory Relat. Fields 117(2) 249–266.

[14] Bertoin, J., Le Gall, J.F. (2003) Stochastic flows associated to coalescent processes. Probab. Theory Relat. Fields 126(2) 261–288.

[15] Bertoin, J., Le Gall, J.F. (2005) Stochastic flows associated to coalescent processes II. Stochastic differential equations. Ann. Inst. Henri Poincaré 41 307–333.

[16] Bertoin, J., Le Gall, J.F. (2006) Stochastic flows associated to coalescent processes III. Limit theorems. Illinois J. Math. 50 147–181.

[17] Bingham, N.H. (1976) Continuous branching processes and spectral positivity. Stoch. Proc. Appl. 4 217–242.

[18] Birkner, M. (2005) Stochastic models from population biology. Lecture notes for a course at TU Berlin. http://www.wias-berlin.de/people/birkner/smpb-30.6.05.pdf

[19] Caballero, M.E., Lambert, A., Uribe Bravo, G. (2008) Proof(s) of the Lamperti representation of continuous-state branching processes. Preprint arXiv:0802.2693.

[20] Cannings, C. (1974) The latent roots of certain Markov chains arising in genetics: a new approach. I. Haploid models. Adv. Appl. Prob. 6 260–290.

[21] Cattiaux, P., Collet, P., Lambert, A., Martínez, S., Méléard, S., San Martín, J. (2008) Quasi-stationary distributions and diffusion models in population dynamics. Preprint arXiv:math/0703781.

[22] Champagnat, N. (2006) A microscopic interpretation for adaptive dynamics trait substitution sequence models. Stoch. Proc. Appl. 116 1127–1160.

[23] Champagnat, N., Ferrière, R., Méléard, S. (2006) Unifying evolutionary dynamics: from individual stochastic processes to macroscopic models via timescale separation. Theor. Popul. Biol. 69 297–321.

[24] Champagnat, N., Lambert, A. (2006) Evolution of discrete populations and the canonical diffusion of adaptive dynamics. Ann. Appl. Prob. 17 102–155.

[25] Chaumont, L. (1994) Sur certains processus de Lévy conditionnés à rester positifs. Stochastics Stoch. Reports 47 1–20.

[26] Chaumont, L. (1996) Conditionings and path decompositions for Lévy processes. Stoch. Proc. Appl. 64 39–54.

[27] Chaumont, L., Doney, R.A. (2005) On Lévy processes conditioned to stay positive. Electron. J. Probab. 10 948–961.

[28] Crow, J.F., Kimura, M. (1970) An Introduction to Population Genetics Theory. Harper & Row, New York.

[29] Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London.

[30] van Doorn, E.A. (1991) Quasi–stationary distributions and convergence to quasi–stationarity of birth–death processes. Adv. Appl. Prob. 23 683–700.

[31] Duquesne, T. (2007) Continuum random trees and branching processes with immigration. Stoch. Proc. Appl. In press. Preprint arXiv:math/0509519.

[32] Duquesne, T. (2007) The coding of compact real trees by real valued functions. Preprint arXiv:math/0604106.

[33] Duquesne, T., Le Gall, J.F. (2002) Random Trees, Lévy Processes and Spatial Branching Processes. Astérisque 281.

[34] Duquesne, T., Winkel, M. (2007) Growth of Lévy trees. Probab. Theory Relat. Fields 139 313–371.

[35] Durrett, R. (2006) Random Graph Dynamics. Cambridge U. Press.

[36] Durrett, R. (2008) Probability Models for DNA Sequence Evolution. Springer–Verlag, Berlin. 2nd revised ed.

[37] Erdős, P., Rényi, A. (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5 17–61.

[38] Etheridge, A.M. (2004) Survival and extinction in a locally regulated population. Ann. Appl. Prob. 14 188–214.

[39] Ethier, S.N., Kurtz, T.G. (1986) Markov Processes, Characterization and Convergence. John Wiley & Sons, New York.

[40] Evans, S.N. (2008) Probability and real trees. Lectures from the 35th Summer School on Probability Theory held in Saint-Flour, July 6–23, 2005. Lecture Notes in Mathematics, 1920. Springer, Berlin.

[41] Evans, S.N., Pitman, J., Winter, A. (2006) Rayleigh processes, real trees, and root growth with re-grafting. Probab. Theory Relat. Fields 134 81–126.

[42] Evans, S.N., Winter, A. (2006) Subtree prune and regraft: a reversible tree-valued Markov process. Ann. Probab. 34 918–961.

[43] Ewens, W.J. (2005) Mathematical Population Genetics. 2nd edition, Springer–Verlag, Berlin.

[44] Feller, W. (1951) Diffusion processes in genetics. Proc. Second Berkeley Symp. Math. Statist. Prob. 227–246.

[45] Ferrari, P.A., Kesten, H., Martínez, S., Picco, P. (1995) Existence of quasi-stationary distributions. A renewal dynamical approach. Ann. Probab. 23(2) 501–521.

[46] Fisher, R.A. (1922) On the dominance ratio. Proc. Roy. Soc. Edin. 42 321–341.

[47] Fisher, R.A. (1930) The distribution of gene ratios for rare mutations. Proc. Roy. Soc. Edin. 50 205–220.

[48] Galton, F. (1894) A plausible paradox in chances. Nature 49 365–366.

[49] Geiger, J. (1999) Elementary new proofs of classical limit theorems for Galton–Watson processes. J. Appl. Prob. 36 301–309.

[50] Geiger, J., Kersting, G. (1997) Depth–first search of random trees, and Poisson point processes. In Classical and modern branching processes (Minneapolis, 1994) IMA Math. Appl. Vol. 84. Springer–Verlag, New York.

[51] Gillespie, J.H. (1998) Population Genetics: A Concise Guide. Johns Hopkins U. Press, Baltimore, MD.

[52] Gosselin, F. (2001) Asymptotic behavior of absorbing Markov chains conditional on nonabsorption for applications in conservation biology. Ann. Appl. Prob. 11 261–284.

[53] Grey, D.R. (1974) Asymptotic behaviour of continuous–time, continuous state–space branching processes. J. Appl. Prob. 11 669–677.

[54] Haccou, P., Jagers, P., Vatutin, V.A. (2005) Branching Processes. Variation, Growth, and Extinction of Populations. Series: Cambridge Studies in Adaptive Dynamics (No. 5). Cambridge U. Press.

[55] Haldane, J.B.S. (1927) A mathematical theory of natural and artificial selection. Part V: Selection and mutation. Proc. Camb. Phil. Soc. 23 838–844.

[56] Harris, T.E. (1963) The theory of branching processes. Springer–Verlag, Berlin.

[57] Heathcote, C.R. (1966) Corrections and comments on the paper “A branching process allowing immigration”. J. Roy. Statist. Soc. Ser. B 28 213–217.

[58] Heathcote, C.R., Seneta, E., Vere-Jones, D. (1967) A refinement of two theorems in the theory of branching processes. Teor. Verojatnost. i Primenen. 12 341–346.

[59] Heyde, C.C. (1970) Extension of a result of Seneta for the super–critical Galton–Watson process. Ann. Math. Statist. 41 739–742.

[60] Heyde, C.C., Seneta, E. (1972) The simple branching process, a turning point test and a fundamental inequality: a historical note on I.J. Bienaymé. Biometrika 59 680–683.

[61] Hirano, K. (2001) Lévy processes with negative drift conditioned to stay positive. Tokyo J. Math. 24 291–308.

[62] Itô, K. (1971) Poisson point processes attached to Markov processes. Proc. Sixth Berkeley Symp. vol. III 225–240. U. of California Press.

[63] Jagers, P. (1975) Branching processes with biological applications. John Wiley & Sons, London, New York, Sydney.

[64] Jiřina, M. (1958) Stochastic branching processes with continuous state space. Czech. Math. J. 8 292–312.

[65] Joffe, A. (1967) On the Galton–Watson branching processes with mean less than one. Ann. Math. Stat. 38 264–266.

[66] Kahneman, D., Slovic, P., Tversky, A. (1993) Judgment Under Uncertainty: Heuristics and Biases. Cambridge U. Press.

[67] Kallenberg, O. (1973) Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrscheinlichkeitstheorie verw. Gebiete 27 23–36.

[68] Kallenberg, P.J.M. (1979) Branching processes with continuous state space. Mathematisch Centrum, Amsterdam.

[69] Kawazu, K., Watanabe, S. (1971) Branching processes with immigration and related limit theorems. Teor. Verojatnost. i Primenen. 16 34–51.

[70] Kendall, D.G. (1975) The genealogy of genealogy: branching processes before (and after) 1873. Bull. London Math. Soc. 7 part 3 (21) 225–253.

[71] Kesten, H., Ney, P., Spitzer, F. (1966) The Galton–Watson process with mean one and finite variance. Teor. Verojatnost. i Primenen. 11 579–611.

[72] Kesten, H., Stigum, B.P. (1966) A limit theorem for multidimensional Galton–Watson processes. Ann. Math. Statist. 37 1211–1223.

[73] Kimmel, M. (1997) Quasistationarity in a branching model of division–within–division. In Classical and modern branching processes (Minneapolis, 1994) IMA Math. Appl. Vol. 84. Springer–Verlag, New York.

[74] Kimura, M. (1957) Some problems of stochastic processes in genetics. Ann. Math. Statist. 28 882–901.

[75] Kingman, J.F.C. (1982) The coalescent. Stochastic Process. Appl. 13(3) 235–248.

[76] Kolmogorov, A.N. (1938) Zur Lösung einer biologischen Aufgabe. Comm. Math. Mech. Chebyshev Univ. Tomsk 2 1–6.

[77] Kot, M. (2001) Elements of Mathematical Ecology. Cambridge U. Press.

[78] Kyprianou, A.E. (2006) Introductory lectures on fluctuations of Lévy processes with applications. Springer–Verlag, Berlin Heidelberg.

[79] Lambert, A. (2002) The genealogy of continuous–state branching processes with immigration. Probab. Theory Relat. Fields 122(1) 42–70.

[80] Lambert, A. (2003) Coalescence times for the branching process. Adv. Appl. Prob. 35(4) 1071–1089.

[81] Lambert, A. (2005) The branching process with logistic growth. Ann. Appl. Prob. 15 1506–1535.

[82] Lambert, A. (2006) Probability of fixation under weak selection: a branching process unifying approach. Theor. Popul. Biol. 69 419–441.

[83] Lambert, A. (2007) Quasi–stationary distributions and the continuous–state branching process conditioned to be never extinct. Electron. J. Prob. 12 420–446.

[84] Lambert, A. (2007) The contour of splitting trees is a Lévy process. Preprint arXiv:0704.3098.

[85] Lambert, A. (2009) Spine decompositions of Lévy trees. In preparation.

[86] Lamperti, J. (1967) Continuous–state branching processes. Bull. Amer. Math. Soc. 73 382–386.

[87] Le Gall, J.F. (2005) Random trees and applications. Probab. Surv. 2 245–311.

[88] Le Gall, J.F., Le Jan, Y. (1998) Branching processes in Lévy processes: the exploration process. Ann. Probab. 26 213–252.

[89] Li, Z.-H. (2000) Asymptotic behavior of continuous time and state branching processes. J. Aust. Math. Soc. Series A 68 68–84.

[90] Lyons, R., Pemantle, R., Peres, Y. (1995) Conceptual proofs of L log L criteria for mean behavior of branching processes. Ann. Probab. 23 1125–1138.

[91] Lyons, R., Peres, Y. Probability on trees and networks, a book in progress. To be published by Cambridge U. Press. http://mypage.iu.edu/~rdlyons/prbtree/prbtree.html

[92] Malthus, T.R. (1798) An Essay on the Principle of Population. J. Johnson, London.

[93] Neveu, J. (1986) Arbres et processus de Galton–Watson. Ann. Inst. H. Poincaré 22 199–207.

[94] Pakes, A.G. (1998) Conditional limit theorems for continuous time and state branching processes. Technical report. To appear in Records and Branching Processes (M. Ahsanullah and G. Yanev Eds) Nova Science Publishers Inc.

[95] Pinsky, M.A. (1972) Limit theorems for continuous state branching processes with immigration. Bull. Amer. Math. Soc. 78 242–244.

[96] Pitman, J. (1999) Coalescents with multiple collisions. Ann. Probab. 27 1870–1902.

[97] Popovic, L. (2004) Asymptotic genealogy of a critical branching process. Ann. Appl. Prob. 14(4) 2120–2148.

[98] Renshaw, E. (1991) Modelling biological populations in space and time. Cambridge Studies in Mathematical Biology, 11. Cambridge U. Press.

[99] Revuz, D., Yor, M. (1999) Continuous Martingales and Brownian Motion. (3rd revised ed.) Springer–Verlag, Berlin Heidelberg New York.

[100] Roelly, S., Rouault, A. (1989) Processus de Dawson–Watanabe conditionné par le futur lointain. C. R. Acad. Sci. Paris Sér. I 309 867–872.

[101] Sato, K.I. (1999) Lévy processes and infinitely divisible distributions. Cambridge studies in advanced mathematics Vol. 68. Cambridge U. Press.

[102] Schweinsberg, J. (2003) Coalescent processes obtained from supercritical Galton–Watson processes. Stoch. Proc. Appl. 106 107–139.

[103] Seneta, E. (2006) Nonnegative matrices and Markov chains. (3rd revised ed.) Springer Series in Statistics, Springer.

[104] Seneta, E. (1970) On the supercritical Galton–Watson process with immigration. Math. Biosci. 7 9–14.

[105] Seneta, E., Vere-Jones, D. (1966) On quasi–stationary distributions in discrete–time Markov chains with a denumerable infinity of states. J. Appl. Prob. 3 403–434.

[106] Sevast'yanov, B.A. (1971) Branching processes. Nauka, Moscow.

[107] Silverstein, M.L. (1967–68) A new approach to local times. J. Math. Mech. 17 1023–1054.

[108] Vere-Jones, D. (1967) Ergodic properties of nonnegative matrices. I. Pacific J. Math. 22 361–386.

[109] Vere-Jones, D. (1968) Ergodic properties of nonnegative matrices. II. Pacific J. Math. 26 601–620.

[110] Tuominen, P., Tweedie, R.L. (1979) Exponential decay and ergodicity of general Markov processes and their discrete skeletons. Adv. Appl. Probab. 11(4) 784–803.

[111] Watson, H.W., Galton, F. (1874) On the probability of the extinction of families. J. Anthropol. Inst. Great Britain and Ireland 4 138–144.

[112] Williams, D. (1974) Path decomposition and continuity of local time for one dimensional diffusions. Proc. London Math. Soc. 28(4) 738–768.

[113] Yaglom, A.M. (1947) Certain limit theorems of the theory of branching stochastic processes (in Russian). Dokl. Akad. Nauk SSSR (n.s.) 56 795–798.