
Chapter 5

Branching processes

Branching processes arise naturally in the study of stochastic processes on trees and locally tree-like graphs. After a review of the basic extinction theory of branching processes, we give a few classical examples of applications in discrete probability.

5.1 Background

We begin with a review of the extinction theory of Galton-Watson branching processes.

5.1.1 Basic definitions

Recall the definition of a Galton-Watson process.

Definition 5.1 (Galton-Watson process). A Galton-Watson branching process is a Markov chain of the following form:

• Let $Z_0 := 1$.

• Let $X(i,t)$, $i \geq 1$, $t \geq 1$, be an array of i.i.d. $\mathbb{Z}_+$-valued random variables with finite mean $m = \mathbb{E}[X(1,1)] < +\infty$, and define inductively
\[ Z_t := \sum_{1 \leq i \leq Z_{t-1}} X(i,t). \]

To avoid trivialities we assume $\mathbb{P}[X(1,1) = i] < 1$ for all $i \geq 0$.


In words, $Z_t$ models the size of a population at time (or generation) $t$. The random variable $X(i,t)$ corresponds to the number of offspring of the $i$-th individual (if there is one) in generation $t-1$. Generation $t$ is formed of all offspring of the individuals in generation $t-1$.

We denote by $\{p_k\}_{k \geq 0}$ the law of $X(1,1)$. We also let $f(s) := \mathbb{E}[s^{X(1,1)}]$ be the corresponding probability generating function.

By tracking genealogical relationships, that is, who is whose child, we obtain a tree $T$ rooted at the single individual in generation 0 with a vertex for each individual in the progeny and an edge for each parent-child relationship. We refer to $T$ as a Galton-Watson tree.

A basic observation about Galton-Watson processes is that their growth is exponential in $t$.

Lemma 5.2 (Exponential growth I). Let $M_t := m^{-t} Z_t$. Then $(M_t)$ is a nonnegative martingale with respect to the filtration $\mathcal{F}_t = \sigma(Z_0, \ldots, Z_t)$. In particular, $\mathbb{E}[Z_t] = m^t$.

Proof. Recall the following measure-theoretic lemma (see e.g. [Dur10, Exercise 5.1.1]).

Lemma 5.3. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. If $Y_1 = Y_2$ a.s. on $B \in \mathcal{F}$ then $\mathbb{E}[Y_1 \mid \mathcal{F}] = \mathbb{E}[Y_2 \mid \mathcal{F}]$ a.s. on $B$.

Returning to the proof, observe that on $\{Z_{t-1} = k\}$,
\[ \mathbb{E}[Z_t \mid \mathcal{F}_{t-1}] = \mathbb{E}\Bigg[\sum_{1 \leq j \leq k} X(j,t) \,\Bigg|\, \mathcal{F}_{t-1}\Bigg] = mk = m Z_{t-1}. \]
This is true for all $k$. Rearranging shows that $(M_t)$ is a martingale. For the second claim, note that $\mathbb{E}[M_t] = \mathbb{E}[M_0] = 1$.

In fact, the martingale convergence theorem gives the following.

Lemma 5.4 (Exponential growth II). We have $M_t \to M_\infty < +\infty$ a.s. for some nonnegative random variable $M_\infty \in \sigma(\cup_t \mathcal{F}_t)$ with $\mathbb{E}[M_\infty] \leq 1$.

Proof. This follows immediately from the martingale convergence theorem for nonnegative martingales (Corollary 3.35) and Fatou's lemma.
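To make Lemmas 5.2 and 5.4 concrete, here is a minimal simulation sketch in Python (not part of the text; the Poisson offspring law and the parameter values are arbitrary illustration choices). It estimates $\mathbb{E}[M_T]$ for $M_t = m^{-t} Z_t$, which by Lemma 5.2 equals 1:

```python
import math
import random

rng = random.Random(0)

def poisson(lam: float) -> int:
    """Sample Poi(lam) by inversion (product of uniforms vs. exp(-lam))."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        p *= rng.random()
        k += 1
    return k

def gw_generation_sizes(lam: float, T: int) -> list:
    """Generation sizes Z_0, ..., Z_T with i.i.d. Poi(lam) offspring."""
    sizes = [1]
    for _ in range(T):
        sizes.append(sum(poisson(lam) for _ in range(sizes[-1])))
    return sizes

m, T, trials = 1.5, 10, 20000
# Lemma 5.2 gives E[M_T] = E[m^{-T} Z_T] = 1; the Monte Carlo average
# below should be close to 1 (while Z_T itself is either 0 or large,
# consistent with Lemma 5.5 below).
avg = sum(gw_generation_sizes(m, T)[-1] for _ in range(trials)) / (trials * m**T)
print(f"estimated E[M_T] = {avg:.3f}")
```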


5.1.2 Extinction

Observe that 0 is a fixed point of the process. The event
\[ \{Z_t \to 0\} = \{\exists t : Z_t = 0\} \]
is called extinction. Establishing when extinction occurs is a central question in branching process theory. We let $\eta$ be the probability of extinction. Throughout, we assume that $p_0 > 0$ and $p_1 < 1$. Here is a first observation about extinction.

Lemma 5.5. A.s. either $Z_t \to 0$ or $Z_t \to +\infty$.

Proof. The process $(Z_t)$ is integer-valued and 0 is the only fixed point of the process under the assumption that $p_1 < 1$. From any state $k > 0$, the probability of never coming back to $k$ is at least $p_0^k > 0$, so every state $k > 0$ is transient. The claim follows.

In the critical case, that immediately implies almost sure extinction.

Theorem 5.6 (Extinction: critical case). Assume $m = 1$. Then $Z_t \to 0$ a.s., i.e., $\eta = 1$.

Proof. When $m = 1$, $(Z_t)$ itself is a nonnegative martingale, so by Lemma 5.4 it converges a.s. to a finite limit. By Lemma 5.5, that limit must be 0.

We address the general case using generating functions. Let $f_t(s) = \mathbb{E}[s^{Z_t}]$. Note that, by monotonicity,
\[ \eta = \mathbb{P}[\exists t \geq 0 : Z_t = 0] = \lim_{t \to +\infty} \mathbb{P}[Z_t = 0] = \lim_{t \to +\infty} f_t(0). \tag{5.1} \]
Moreover, by the Markov property, $f_t$ has a natural recursive form
\[ f_t(s) = \mathbb{E}[s^{Z_t}] = \mathbb{E}[\mathbb{E}[s^{Z_t} \mid \mathcal{F}_{t-1}]] = \mathbb{E}[f(s)^{Z_{t-1}}] = f_{t-1}(f(s)) = \cdots = f^{(t)}(s), \tag{5.2} \]
where $f^{(t)}$ is the $t$-th iterate of $f$.

Theorem 5.7 (Extinction: subcritical and supercritical cases). The probability of extinction $\eta$ is given by the smallest fixed point of $f$ in $[0,1]$. Moreover:

• (Subcritical regime) If $m < 1$ then $\eta = 1$.

• (Supercritical regime) If $m > 1$ then $\eta < 1$.

Proof. The case $p_0 + p_1 = 1$ is straightforward: the process dies almost surely after a geometrically distributed time. So we assume $p_0 + p_1 < 1$ for the rest of the proof.

We first summarize some properties of $f$.

Lemma 5.8. On $[0,1]$, the function $f$ satisfies:

(a) $f(0) = p_0$, $f(1) = 1$;

(b) $f$ is indefinitely differentiable on $[0,1)$;

(c) $f$ is strictly convex and increasing;

(d) $\lim_{s \uparrow 1} f'(s) = m < +\infty$.

Proof. See e.g. [Rud76] for the relevant power series facts. Observe that (a) is clear by definition. The function $f$ is a power series with radius of convergence $R \geq 1$. This implies (b). In particular,
\[ f'(s) = \sum_{i \geq 1} i p_i s^{i-1} \geq 0, \quad \text{and} \quad f''(s) = \sum_{i \geq 2} i(i-1) p_i s^{i-2} > 0, \]
because we must have $p_i > 0$ for some $i > 1$ by assumption. This proves (c). Since $m < +\infty$, $f'(1) = m$ is well defined and $f'$ is continuous on $[0,1]$, which implies (d).

We first characterize the fixed points of $f$. See Figure 5.1 for an illustration.

Lemma 5.9. We have:

• If $m > 1$ then $f$ has a unique fixed point $\eta_0 \in [0,1)$.

• If $m \leq 1$ then $f(t) > t$ for $t \in [0,1)$. Let $\eta_0 := 1$ in that case.

Proof. Assume $m > 1$. Since $f'(1) = m > 1$, there is $\delta > 0$ s.t. $f(1-\delta) < 1-\delta$. On the other hand $f(0) = p_0 > 0$, so by continuity of $f$ there must be a fixed point in $(0, 1-\delta)$. Moreover, by strict convexity and the fact that $f(1) = 1$, if $x \in (0,1)$ is a fixed point then $f(y) < y$ for $y \in (x,1)$, proving uniqueness.

The second part follows by strict convexity and monotonicity.

It remains to prove convergence of the iterates to the appropriate fixed point. See Figure 5.2 for an illustration.

Figure 5.1: Fixed points of $f$ in subcritical (left) and supercritical (right) cases.

Figure 5.2: Convergence of iterates to a fixed point.

Lemma 5.10. We have:

• If $x \in [0, \eta_0)$, then $f^{(t)}(x) \uparrow \eta_0$.

• If $x \in (\eta_0, 1)$, then $f^{(t)}(x) \downarrow \eta_0$.

Proof. We only prove the first claim; the argument for the second one is similar. By monotonicity, for $x \in [0, \eta_0)$, we have $x < f(x) < f(\eta_0) = \eta_0$. Iterating,
\[ x < f^{(1)}(x) < \cdots < f^{(t)}(x) < f^{(t)}(\eta_0) = \eta_0. \]
So $f^{(t)}(x) \uparrow L \leq \eta_0$. By continuity of $f$ we can take the limit inside of
\[ f^{(t)}(x) = f(f^{(t-1)}(x)) \]
to get $L = f(L)$. So by definition of $\eta_0$ we must have $L = \eta_0$.

The result then follows from the above lemmas together with Equations (5.1) and (5.2).

Example 5.11 (Poisson branching process). Consider the offspring distribution $X(1,1) \sim \mathrm{Poi}(\lambda)$ with $\lambda > 0$. We refer to this case as the Poisson branching process. Then
\[ f(s) = \mathbb{E}[s^{X(1,1)}] = \sum_{i \geq 0} e^{-\lambda} \frac{\lambda^i}{i!} s^i = e^{\lambda(s-1)}. \]
So the process goes extinct with probability 1 when $\lambda \leq 1$. For $\lambda > 1$, the probability of extinction $\eta_\lambda$ is the smallest solution in $[0,1]$ to the equation
\[ e^{-\lambda(1-x)} = x. \]
The survival probability $\zeta_\lambda := 1 - \eta_\lambda$ satisfies $1 - e^{-\lambda \zeta_\lambda} = \zeta_\lambda$.
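The extinction probability in Example 5.11 can be computed numerically via (5.1) and (5.2): the iterates $f^{(t)}(0)$ increase to $\eta$. A minimal sketch in Python (not part of the text; the sampled values of $\lambda$ and the iteration count are arbitrary):

```python
import math

def poisson_extinction_prob(lam: float, iters: int = 200) -> float:
    """Iterate s -> f(s) = exp(lam * (s - 1)) starting from s = 0.
    By (5.1)-(5.2), f^(t)(0) = P[Z_t = 0] increases to eta."""
    s = 0.0
    for _ in range(iters):
        s = math.exp(lam * (s - 1.0))
    return s

for lam in [0.5, 1.0, 1.5, 2.0]:
    eta = poisson_extinction_prob(lam)
    print(f"lambda={lam}: eta ~ {eta:.4f}, survival zeta ~ {1 - eta:.4f}")
# For lambda <= 1 the iterates approach 1; for lambda > 1 they converge
# to the smallest fixed point in [0, 1), as in Lemma 5.10.
```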

We can use the extinction results to obtain more information on the limit in Lemma 5.4. Of course, conditioned on extinction, $M_\infty = 0$ a.s. On the other hand:

Lemma 5.12 (Exponential growth III). Conditioned on nonextinction, either $M_\infty = 0$ a.s. or $M_\infty > 0$ a.s. In particular, $\mathbb{P}[M_\infty = 0] \in \{\eta, 1\}$.

Proof. A property of rooted trees is said to be inherited if all finite trees satisfy this property and whenever a tree satisfies the property then so do all the descendant trees of the children of the root. The property $\{M_\infty = 0\}$ is inherited. The result then follows from the following 0-1 law.


Lemma 5.13 (0-1 law for inherited properties). For a Galton-Watson tree $T$, an inherited property $A$ has, conditioned on nonextinction, probability 0 or 1.

Proof. Let $T(1), \ldots, T(Z_1)$ be the descendant subtrees of the children of the root. We use the notation $T \in A$ to mean that the tree $T$ satisfies $A$. Then, by independence,
\[ \mathbb{P}[A] = \mathbb{E}[\mathbb{P}[T \in A \mid Z_1]] \leq \mathbb{E}[\mathbb{P}[T(i) \in A,\ \forall i \leq Z_1 \mid Z_1]] = \mathbb{E}[\mathbb{P}[A]^{Z_1}] = f(\mathbb{P}[A]), \]
so $\mathbb{P}[A] \in [0, \eta] \cup \{1\}$. Also $\mathbb{P}[A] \geq \eta$ because $A$ holds for finite trees.

That concludes the proof.

A further moment assumption provides a more detailed picture.

Lemma 5.14 (Exponential growth: finite second moment). Let $(Z_t)$ be a branching process with $m = \mathbb{E}[X(1,1)] > 1$ and $\sigma^2 = \mathrm{Var}[X(1,1)] < +\infty$. Then $(M_t)$ converges in $L^2$ and, in particular, $\mathbb{E}[M_\infty] = 1$. Further, $\mathbb{P}[M_\infty = 0] = \eta$.

Proof. We bound $\mathbb{E}[M_t^2]$ by computing it explicitly by induction. From the orthogonality of increments (Lemma 3.39), it holds that
\[ \mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + \mathbb{E}[(M_t - M_{t-1})^2]. \]
On $\{Z_{t-1} = k\}$,
\[ \mathbb{E}[(M_t - M_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}[(Z_t - m Z_{t-1})^2 \mid \mathcal{F}_{t-1}] = m^{-2t}\, \mathbb{E}\Bigg[\Bigg(\sum_{i=1}^k X(i,t) - mk\Bigg)^{\!2} \,\Bigg|\, \mathcal{F}_{t-1}\Bigg] = m^{-2t} k \sigma^2 = m^{-2t} Z_{t-1} \sigma^2. \]
Hence
\[ \mathbb{E}[M_t^2] = \mathbb{E}[M_{t-1}^2] + m^{-t-1} \sigma^2. \]
Since $\mathbb{E}[M_0^2] = 1$,
\[ \mathbb{E}[M_t^2] = 1 + \sigma^2 \sum_{i=2}^{t+1} m^{-i}, \]
which is uniformly bounded when $m > 1$. Therefore $(M_t)$ converges in $L^2$ (see e.g. [Dur10, Theorem 5.4.5]). Finally, by Fatou's lemma,
\[ \mathbb{E}|M_\infty| \leq \sup_t \|M_t\|_1 \leq \sup_t \|M_t\|_2 < +\infty, \]
and
\[ |\mathbb{E}[M_t] - \mathbb{E}[M_\infty]| \leq \|M_t - M_\infty\|_1 \leq \|M_t - M_\infty\|_2, \]
which implies the convergence of expectations. The last statement follows from Lemma 5.12.

Remark 5.15. The Kesten-Stigum Theorem gives a necessary and sufficient condition for $\mathbb{E}[M_\infty] = 1$ to hold [KS66b]. See e.g. [LP, Chapter 12].

5.1.3 Bond percolation on Galton-Watson trees

Let $T$ be a Galton-Watson tree for an offspring distribution with mean $m > 1$. Perform bond percolation on $T$ with density $p$.

Theorem 5.16 (Bond percolation on Galton-Watson trees). Conditioned on nonextinction,
\[ p_c(T) = \frac{1}{m} \quad \text{a.s.} \]

Proof. Let $\mathcal{C}_0$ be the cluster of the root in $T$ with density $p$. We can think of $\mathcal{C}_0$ as being generated by a Galton-Watson branching process where the offspring distribution is the law of $\sum_{i=1}^{X(1,1)} I_i$, where the $I_i$s are i.i.d. $\mathrm{Ber}(p)$ and $X(1,1)$ is distributed according to the offspring distribution of $T$. In particular, by conditioning on $X(1,1)$, the offspring mean of $\mathcal{C}_0$ is $mp$. If $mp \leq 1$ then
\[ 1 = \mathbb{P}_p[|\mathcal{C}_0| < +\infty] = \mathbb{E}[\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]], \]
and we must have $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] = 1$ a.s. In other words, $p_c(T) \geq \frac{1}{m}$ a.s.

On the other hand, the property of trees $\{\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] = 1\}$ is inherited. So by Lemma 5.13, conditioned on nonextinction, it has probability 0 or 1. That probability is of course 1 on extinction. So by
\[ \mathbb{P}_p[|\mathcal{C}_0| < +\infty] = \mathbb{E}[\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]], \]
if the probability is 1 conditioned on nonextinction then it must be that $mp \leq 1$. In other words, for any fixed $p$ such that $mp > 1$, conditioned on nonextinction, $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T] = 0$ a.s. By monotonicity of $\mathbb{P}_p[|\mathcal{C}_0| < +\infty \mid T]$ in $p$, taking a limit $p_n \to 1/m$ proves the result.

5.1.4 Random walk on Galton-Watson trees

To be written. See [LP, Theorem 3.5 and Corollary 5.10].*

*Requires: Sections 2.2.3 and 3.1.1.


5.2 Random-walk representation

In this section, we develop a useful random-walk representation of the Galton-Watson process.

5.2.1 Exploration process

Consider the following exploration process of a Galton-Watson tree $T$. The exploration process, started at the root 0, has three types of vertices:

- $\mathcal{A}_t$: active vertices,

- $\mathcal{E}_t$: explored vertices,

- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{0\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ containing all other vertices in $T$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick an element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := \mathcal{A}_{t-1} \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\} \setminus \{a_t\}$,

- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,

- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in T\}$.

To be concrete, we choose $a_t$ in breadth-first search (or first-come-first-serve) manner: we exhaust all vertices in generation $t$ before considering vertices in generation $t+1$.

We imagine revealing the edges of $T$ as they are encountered in the exploration process and we let $(\mathcal{F}_t)$ be the corresponding filtration. In words, starting with 0, the Galton-Watson tree $T$ is progressively grown by adding to it at each time a child of one of the previously explored vertices and uncovering its children in $T$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$ is the set of vertices that are known to belong to $T$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let
\[ \tau_0 := \inf\{t \geq 0 : A_t = 0\} \]
(which by convention is $+\infty$ if there is no such $t$). The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \leq \tau_0$, as exactly one vertex is explored at each time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of the vertex set of $T$; in particular, when $T$ has $n$ vertices,
\[ A_t + t + N_t = n, \quad \forall t \leq \tau_0. \]

Lemma 5.17 (Total progeny). Let $W$ be the total progeny. Then $W = \tau_0$.

The random-walk representation is the following. Observe that the process $(A_t)$ admits a simple recursive form. Recall that $A_0 := 1$. Conditioning on $\mathcal{F}_{t-1}$:

- If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$.

- Otherwise, (a) one active vertex becomes an explored vertex and (b) its neutral neighbors become active vertices. That is,
\[ A_t = \begin{cases} A_{t-1} + \underbrace{(-1)}_{\text{(a)}} + \underbrace{X_t}_{\text{(b)}}, & t - 1 < \tau_0, \\ 0, & \text{o.w.}, \end{cases} \]
where $X_t$ is distributed according to the offspring distribution.

We let $Y_t = X_t - 1 \geq -1$ and
\[ S_t := 1 + \sum_{i=1}^t Y_i, \]
with $S_0 := 1$. Then
\[ \tau_0 = \inf\{t \geq 0 : S_t = 0\} = \inf\{t \geq 0 : 1 + [X_1 - 1] + \cdots + [X_t - 1] = 0\} = \inf\{t \geq 0 : X_1 + \cdots + X_t = t - 1\}, \]
and $(A_t)$ is a random walk started at 1 with steps $(Y_t)$ stopped when it hits 0 for the first time:
\[ A_t = S_{t \wedge \tau_0}. \]
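Before turning to the applications, here is a minimal sketch in Python of the representation above (not part of the text; the subcritical Poisson offspring law, the parameter values, and the cap on the walk length are arbitrary illustration choices). It samples the total progeny $W = \tau_0$ by running the stopped walk and compares the empirical mean with the standard identity $\mathbb{E}[W] = \sum_{t \geq 0} m^t = 1/(1-m)$ for subcritical mean $m$:

```python
import math
import random

rng = random.Random(1)

def poisson(lam: float) -> int:
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        p *= rng.random()
        k += 1
    return k

def total_progeny(lam: float, cap: int = 10**6):
    """W = tau_0: run S_t = 1 + sum_{i<=t} (X_i - 1) until it hits 0."""
    s = 1
    for t in range(1, cap + 1):
        s += poisson(lam) - 1
        if s == 0:
            return t
    return None  # walk still positive after `cap` steps

lam, trials = 0.7, 50000
finite = [w for w in (total_progeny(lam) for _ in range(trials)) if w is not None]
print(f"empirical E[W] = {sum(finite)/len(finite):.3f}, "
      f"1/(1 - lam) = {1/(1 - lam):.3f}")
```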

We give two applications.


5.2.2 Duality principle

The random-walk representation above is useful to prove the following duality principle.

Theorem 5.18 (Duality principle). Let $(Z_t)$ be a branching process with offspring distribution $\{p_k\}_{k \geq 0}$ and extinction probability $\eta < 1$. Let $(Z_t')$ be a branching process with offspring distribution $\{p_k'\}_{k \geq 0}$ where
\[ p_k' = \eta^{k-1} p_k. \]
Then $(Z_t)$ conditioned on extinction has the same distribution as $(Z_t')$, which is referred to as the dual branching process.

Proof. We use the random-walk representation. Let $H = (X_1, \ldots, X_{\tau_0})$ and $H' = (X_1', \ldots, X_{\tau_0'}')$ be the histories of the processes $(Z_t)$ and $(Z_t')$ respectively. (Under breadth-first search, the process $(Z_t)$ can be reconstructed from $H$.) In the case of extinction, the history of $(Z_t)$ has finite length. We call $(x_1, \ldots, x_t)$ a valid history if $x_1 + \cdots + x_i - (i-1) > 0$ for all $i < t$ and $x_1 + \cdots + x_t - (t-1) = 0$. By definition of the conditional probability, for a valid history $(x_1, \ldots, x_t)$ with a finite $t$,
\[ \mathbb{P}[H = (x_1, \ldots, x_t) \mid \tau_0 < +\infty] = \frac{\mathbb{P}[H = (x_1, \ldots, x_t)]}{\mathbb{P}[\tau_0 < +\infty]} = \eta^{-1} \prod_{i=1}^t p_{x_i}. \]
Because $x_1 + \cdots + x_t = t - 1$,
\[ \eta^{-1} \prod_{i=1}^t p_{x_i} = \eta^{-1} \prod_{i=1}^t \eta^{1-x_i} p_{x_i}' = \prod_{i=1}^t p_{x_i}' = \mathbb{P}[H' = (x_1, \ldots, x_t)]. \]
Note that
\[ \sum_{k \geq 0} p_k' = \sum_{k \geq 0} \eta^{k-1} p_k = \eta^{-1} f(\eta) = 1, \]
because $\eta$ is a fixed point of $f$. So $\{p_k'\}_{k \geq 0}$ is indeed a probability distribution. Note further that
\[ \sum_{k \geq 0} k p_k' = \sum_{k \geq 0} k \eta^{k-1} p_k = f'(\eta) < 1, \]
since $f'$ is strictly increasing, $f(\eta) = \eta < 1$ and $f(1) = 1$. So the dual branching process is subcritical.


Example 5.19 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 1$. Then the dual probability distribution is given by
\[ p_k' = \eta^{k-1} p_k = \eta^{k-1} e^{-\lambda} \frac{\lambda^k}{k!} = \eta^{-1} e^{-\lambda} \frac{(\lambda\eta)^k}{k!}, \]
where recall that $e^{-\lambda(1-\eta)} = \eta$, so
\[ p_k' = e^{\lambda(1-\eta)} e^{-\lambda} \frac{(\lambda\eta)^k}{k!} = e^{-\lambda\eta} \frac{(\lambda\eta)^k}{k!}. \]
That is, the dual branching process has offspring distribution $\mathrm{Poi}(\lambda\eta)$.
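A quick Monte Carlo check of the duality principle, as a sketch (not part of the text; the parameter values and the run cap used to approximate the conditioning are illustration choices). Conditioned on extinction, the total progeny of a $\mathrm{Poi}(\lambda)$ process should match that of the subcritical dual, whose mean is $\sum_{t \geq 0} (\lambda\eta)^t = 1/(1 - \lambda\eta)$:

```python
import math
import random

rng = random.Random(2)

def poisson(lam: float) -> int:
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        p *= rng.random()
        k += 1
    return k

def total_progeny(lam: float, cap: int):
    s = 1
    for t in range(1, cap + 1):
        s += poisson(lam) - 1
        if s == 0:
            return t
    return None

lam = 2.0
eta = 0.0
for _ in range(200):             # eta = lim f^(t)(0) with f(s) = e^{lam(s-1)}
    eta = math.exp(lam * (eta - 1.0))

cap, trials = 1000, 5000         # W <= cap approximates the event "extinction"
finite = [w for w in (total_progeny(lam, cap) for _ in range(trials)) if w]
print(f"fraction extinct ~ {len(finite)/trials:.3f} vs eta ~ {eta:.3f}")
print(f"E[W | extinction] ~ {sum(finite)/len(finite):.3f} "
      f"vs dual mean 1/(1 - lam*eta) = {1/(1 - lam*eta):.3f}")
```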

5.2.3 Hitting-time theorem

The random-walk representation also gives a formula for the distribution of the size of the progeny.

We start with a combinatorial lemma of independent interest. Let $u_1, \ldots, u_t \in \mathbb{R}$ and define $r_0 := 0$ and $r_i := u_1 + \cdots + u_i$ for $1 \leq i \leq t$. We say that $j$ is a ladder index if $r_j > r_0 \vee \cdots \vee r_{j-1}$. Consider the cyclic permutations of $u = (u_1, \ldots, u_t)$: $u^{(0)} = u$, $u^{(1)} = (u_2, \ldots, u_t, u_1)$, \ldots, $u^{(t-1)} = (u_t, u_1, \ldots, u_{t-1})$. Define the corresponding partial sums $r_j^{(\beta)} := u_1^{(\beta)} + \cdots + u_j^{(\beta)}$ for $j = 1, \ldots, t$ and $\beta = 0, \ldots, t-1$. Observe that
\[ (r_1^{(\beta)}, \ldots, r_t^{(\beta)}) = (r_{\beta+1} - r_\beta,\ r_{\beta+2} - r_\beta,\ \ldots,\ r_t - r_\beta,\ [r_t - r_\beta] + r_1,\ [r_t - r_\beta] + r_2,\ \ldots,\ [r_t - r_\beta] + r_\beta) \]
\[ = (r_{\beta+1} - r_\beta,\ r_{\beta+2} - r_\beta,\ \ldots,\ r_t - r_\beta,\ r_t - [r_\beta - r_1],\ r_t - [r_\beta - r_2],\ \ldots,\ r_t - [r_\beta - r_{\beta-1}],\ r_t). \tag{5.3} \]

Lemma 5.20 (Spitzer’s combinatorial lemma). Assume rt > 0. Let ` be the num-ber of cyclic permutations such that t is a ladder index. Then ` � 1. Moreover,each such cyclic permutation has exactly ` ladder indices.

Proof. We first show that $\ell \geq 1$, i.e., there is at least one cyclic permutation where $t$ is a ladder index. Let $\beta$ be the smallest index achieving the maximum of $r_1, \ldots, r_t$, i.e.,
\[ r_\beta > r_1 \vee \cdots \vee r_{\beta-1} \quad \text{and} \quad r_\beta \geq r_{\beta+1} \vee \cdots \vee r_t. \]
From (5.3),
\[ r_{\beta+i} - r_\beta \leq 0 < r_t, \quad \forall i = 1, \ldots, t - \beta, \]
and
\[ r_t - [r_\beta - r_j] < r_t, \quad \forall j = 1, \ldots, \beta - 1. \]
Moreover, $r_t > 0 = r_0$ by assumption. So, in $u^{(\beta)}$, $t$ is a ladder index.

Since $\ell \geq 1$, we can assume w.l.o.g. that $u$ is such that $t$ is a ladder index. Then $\beta$ is a ladder index in $u$ if and only if
\[ r_\beta > r_0 \vee \cdots \vee r_{\beta-1}, \]
if and only if
\[ r_t > r_t - r_\beta \quad \text{and} \quad r_t - [r_\beta - r_j] < r_t, \quad \forall j = 1, \ldots, \beta - 1. \]
Moreover, because $r_t > r_j$ for all $j < t$, these conditions are in turn equivalent to
\[ r_{\beta+i} - r_\beta < r_t, \quad \forall i = 1, \ldots, t - \beta, \quad \text{and} \quad r_t - [r_\beta - r_j] < r_t, \quad \forall j = 1, \ldots, \beta - 1. \]
By (5.3), that is, $t$ is a ladder index in the $\beta$-th cyclic permutation.
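Spitzer's lemma is easy to sanity-check by brute force. A small sketch in Python (not part of the text; the range of sequence lengths and entries is arbitrary):

```python
import random

rng = random.Random(3)

def ladder_indices(u):
    """Return the j in {1,...,t} with r_j > r_0 v ... v r_{j-1}, r_0 = 0."""
    out, r, running_max = [], 0, 0
    for j, x in enumerate(u, start=1):
        r += x
        if r > running_max:
            out.append(j)
            running_max = r
    return out

def check_spitzer(u):
    t = len(u)
    if sum(u) <= 0:
        return True  # the lemma assumes r_t > 0
    cyclic = [u[b:] + u[:b] for b in range(t)]
    good = [v for v in cyclic if t in ladder_indices(v)]
    ell = len(good)
    # Lemma 5.20: ell >= 1, and each such permutation has exactly ell ladder indices.
    return ell >= 1 and all(len(ladder_indices(v)) == ell for v in good)

assert all(
    check_spitzer([rng.randint(-2, 2) for _ in range(rng.randint(1, 8))])
    for _ in range(1000)
)
print("Lemma 5.20 verified on 1000 random sequences")
```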

We are now ready to prove the hitting-time theorem.

Theorem 5.21 (Hitting-time theorem). Let $(Z_t)$ be a Galton-Watson branching process with total progeny $W$. In the random-walk representation of $(Z_t)$,
\[ \mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1], \]
for all $t \geq 1$.

Proof. Let $R_i := 1 - S_i$ and $U_i := 1 - X_i$ for all $i = 1, \ldots, t$ and let $R_0 := 0$. Then
\[ \{X_1 + \cdots + X_t = t - 1\} = \{R_t = 1\}, \]
and
\[ \{W = t\} = \{t \text{ is the first ladder index in } R_1, \ldots, R_t\}. \]
By symmetry, for all $\beta$,
\[ \mathbb{P}[t \text{ is the first ladder index in } R_1, \ldots, R_t] = \mathbb{P}[t \text{ is the first ladder index in } R_1^{(\beta)}, \ldots, R_t^{(\beta)}]. \]
Let $E_\beta$ be the event on the last line. Hence
\[ \mathbb{P}[W = t] = \mathbb{E}[\mathbf{1}_{E_1}] = \frac{1}{t}\, \mathbb{E}\Bigg[\sum_{\beta=1}^t \mathbf{1}_{E_\beta}\Bigg]. \]
By Spitzer's combinatorial lemma, there is at most one cyclic permutation where $t$ is the first ladder index. In particular, $\sum_{\beta=1}^t \mathbf{1}_{E_\beta} \in \{0, 1\}$. So
\[ \mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}\Big[\textstyle\bigcup_{\beta=1}^t E_\beta\Big]. \]
Finally observe that, because $R_0 = 0$ and $U_i \leq 1$ for all $i$, the partial sum at the $j$-th ladder index must take value $j$. So the event $\bigcup_{\beta=1}^t E_\beta$ implies $\{R_t = 1\}$, because the last partial sum of all cyclic permutations is $R_t$. Similarly, because there is at least one cyclic permutation such that $t$ is a ladder index, the event $\{R_t = 1\}$ implies $\bigcup_{\beta=1}^t E_\beta$. Therefore,
\[ \mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[R_t = 1], \]
which concludes the proof.

Note that the formula in the hitting-time theorem is somewhat remarkable, as the probability on the l.h.s. is $\mathbb{P}[S_i > 0, \forall i < t \text{ and } S_t = 0]$ while the probability on the r.h.s. is $\frac{1}{t}\mathbb{P}[S_t = 0]$.

Example 5.22 (Poisson branching process). Let $(Z_t)$ be a Galton-Watson branching process with offspring distribution $\mathrm{Poi}(\lambda)$ where $\lambda > 0$. Let $W$ be its total progeny. By the hitting-time theorem, for $t \geq 1$,
\[ \mathbb{P}[W = t] = \frac{1}{t}\, \mathbb{P}[X_1 + \cdots + X_t = t - 1] = \frac{1}{t}\, e^{-\lambda t} \frac{(\lambda t)^{t-1}}{(t-1)!} = e^{-\lambda t} \frac{(\lambda t)^{t-1}}{t!}, \]
where we used that a sum of independent Poissons is Poisson.
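The formula in Example 5.22 (the Borel distribution) can be checked against simulation; a sketch (not part of the text; $\lambda = 0.8$ and the trial count are arbitrary):

```python
import math
import random

rng = random.Random(4)

def poisson(lam: float) -> int:
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        p *= rng.random()
        k += 1
    return k

def total_progeny(lam: float, cap: int = 10**5):
    s = 1
    for t in range(1, cap + 1):
        s += poisson(lam) - 1
        if s == 0:
            return t
    return None

lam, trials = 0.8, 100000
counts = {}
for _ in range(trials):
    w = total_progeny(lam)
    counts[w] = counts.get(w, 0) + 1

for t in range(1, 6):
    exact = math.exp(-lam * t) * (lam * t) ** (t - 1) / math.factorial(t)
    print(f"P[W={t}]: empirical {counts.get(t, 0)/trials:.4f}, formula {exact:.4f}")
```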


5.3 Comparison to branching processes

We begin with an example whose connection to branching processes is clear: percolation on trees. Translating standard branching process results into their percolation counterpart immediately gives a more detailed picture of the behavior of the process than was derived in Section 2.2.3. We then tackle the phase transition of Erdos-Renyi graphs using a comparison to branching processes.

5.3.1 Percolation on trees: critical exponents

In this section, we use branching processes to study bond percolation on the infinite $b$-ary tree $\widehat{T}_b$. The same techniques can be adapted to $T_d$ with $d = b + 1$ in a straightforward manner.

We denote the root by 0. We think of the open cluster of the root, $\mathcal{C}_0$, as the progeny of a branching process as follows. Denote by $\partial_n$ the $n$-th level of $\widehat{T}_b$, that is, the vertices of $\widehat{T}_b$ at graph distance $n$ from the root. In the branching process interpretation, we think of the immediate descendants in $\mathcal{C}_0$ of a vertex $v$ as the "children" of $v$. By construction, $v$ has at most $b$ children, independently of all other vertices in the same generation. In this branching process, the offspring distribution is binomial with parameters $b$ and $p$; $Z_n := |\mathcal{C}_0 \cap \partial_n|$ represents the size of the progeny at generation $n$; and $W := |\mathcal{C}_0|$ is the total progeny of the process. In particular, $|\mathcal{C}_0| < +\infty$ if and only if the process goes extinct. Because the mean number of offspring is $bp$, by Theorem 5.7, this leads immediately to a (second) proof of:

Claim 5.23.
\[ p_c(\widehat{T}_b) = \frac{1}{b}. \]

The generating function of the offspring distribution is $\phi(s) := ((1-p) + ps)^b$. So, by Theorems 5.6 and 5.7, the percolation function
\[ \theta(p) = \mathbb{P}_p[|\mathcal{C}_0| = +\infty] \]
is 0 on $[0, 1/b]$, while on $(1/b, 1]$ the quantity $\eta(p) := 1 - \theta(p)$ is the unique solution in $[0,1)$ of the fixed point equation
\[ s = ((1-p) + ps)^b. \tag{5.4} \]

For $b = 2$, for instance, we can compute the fixed point explicitly by noting that
\[ 0 = ((1-p) + ps)^2 - s = p^2 s^2 + [2p(1-p) - 1]s + (1-p)^2, \]
whose solution for $p \in (1/2, 1]$ is
\[ s^* = \frac{-[2p(1-p) - 1] \pm \sqrt{[2p(1-p) - 1]^2 - 4p^2(1-p)^2}}{2p^2} = \frac{-[2p(1-p) - 1] \pm \sqrt{1 - 4p(1-p)}}{2p^2} = \frac{-[2p(1-p) - 1] \pm (2p-1)}{2p^2} = \frac{2p^2 + [(1-2p) \pm (2p-1)]}{2p^2}. \]
So, rejecting the fixed point 1,
\[ \theta(p) = 1 - \frac{2p^2 + 2(1-2p)}{2p^2} = \frac{2p-1}{p^2}. \]

We have proved:

Claim 5.24. For $b = 2$,
\[ \theta(p) = \begin{cases} 0, & 0 \leq p \leq \frac12, \\ \frac{2(p - \frac12)}{p^2}, & \frac12 < p \leq 1. \end{cases} \]

The expected size of the population at generation $n$ is $(bp)^n$, so for $p \in [0, \frac{1}{b})$,
\[ \mathbb{E}|\mathcal{C}_0| = \sum_{n \geq 0} (bp)^n = \frac{1}{1 - bp}. \]
For $p \in (\frac{1}{b}, 1)$, the expected total progeny is infinite, but it is of interest to compute the expected cluster size conditioned on $|\mathcal{C}_0| < +\infty$. We use the duality principle, Theorem 5.18. For $0 \leq k \leq b$, let
\[ p_k' := [\eta(p)]^{k-1} p_k = [\eta(p)]^{k-1} \binom{b}{k} p^k (1-p)^{b-k} = \frac{[\eta(p)]^k}{((1-p) + p\,\eta(p))^b} \binom{b}{k} p^k (1-p)^{b-k} = \binom{b}{k} \left(\frac{p\,\eta(p)}{(1-p) + p\,\eta(p)}\right)^{\!k} \left(\frac{1-p}{(1-p) + p\,\eta(p)}\right)^{\!b-k} = \binom{b}{k} \hat{p}^k (1 - \hat{p})^{b-k}, \]
where we used (5.4) and implicitly defined the dual density
\[ \hat{p} := \frac{p\,\eta(p)}{(1-p) + p\,\eta(p)}. \tag{5.5} \]
In particular $\{p_k'\}$ is indeed a probability distribution. In fact it is binomial with parameters $b$ and $\hat{p}$. The corresponding generating function is
\[ \hat{\phi}(s) := ((1 - \hat{p}) + \hat{p}s)^b = \eta(p)^{-1} \phi(s\,\eta(p)), \]
where the second expression can be seen directly from the definition of $\{p_k'\}$. Moreover,
\[ \hat{\phi}'(s) = \eta(p)^{-1} \phi'(s\,\eta(p))\,\eta(p) = \phi'(s\,\eta(p)), \]
so $\hat{\phi}'(1^-) = \phi'(\eta(p)) < 1$ by the proof of Theorem 5.7, confirming that percolation with density $\hat{p}$ is subcritical. Summarizing:

Claim 5.25. Conditioned on $|\mathcal{C}_0| < +\infty$, (supercritical) percolation on $\widehat{T}_b$ with density $p \in (\frac{1}{b}, 1)$ has the same distribution as (subcritical) percolation on $\widehat{T}_b$ with density $\hat{p}$ defined by (5.5).

Therefore:

Claim 5.26.
\[ \chi_f(p) := \mathbb{E}_p\big[|\mathcal{C}_0|\,\mathbf{1}_{\{|\mathcal{C}_0| < +\infty\}}\big] = \begin{cases} \frac{1}{1 - bp}, & p \in [0, \frac{1}{b}), \\ \frac{\eta(p)}{1 - b\hat{p}}, & p \in (\frac{1}{b}, 1). \end{cases} \]

For $b = 2$, $\eta(p) = 1 - \theta(p) = \left(\frac{1-p}{p}\right)^2$, so
\[ \hat{p} = \frac{p\left(\frac{1-p}{p}\right)^2}{(1-p) + p\left(\frac{1-p}{p}\right)^2} = \frac{(1-p)^2}{p(1-p) + (1-p)^2} = 1 - p, \]
and

Claim 5.27. For $b = 2$,
\[ \chi_f(p) = \begin{cases} \frac{1/2}{\frac12 - p}, & p \in [0, \frac12), \\ \frac{\frac12\left(\frac{1-p}{p}\right)^2}{p - \frac12}, & p \in (\frac12, 1). \end{cases} \]


In fact, the hitting-time theorem, Theorem 5.21, gives an explicit formula for the distribution of $|\mathcal{C}_0|$. Namely, because $|\mathcal{C}_0| \stackrel{d}{=} \tau_0$ for $S_t = \sum_{\ell \leq t} X_\ell - (t-1)$, where $S_0 = 1$ and the $X_\ell$s are i.i.d. binomial with parameters $b$ and $p$, and further
\[ \mathbb{P}[\tau_0 = t] = \frac{1}{t}\,\mathbb{P}[S_t = 0], \]
we have
\[ \mathbb{P}_p[|\mathcal{C}_0| = \ell] = \frac{1}{\ell}\,\mathbb{P}\Bigg[\sum_{i \leq \ell} X_i = \ell - 1\Bigg] = \frac{1}{\ell} \binom{b\ell}{\ell - 1} p^{\ell-1} (1-p)^{b\ell - (\ell-1)}, \tag{5.6} \]
where we used that a sum of independent binomials with the same $p$ is still binomial. In particular, at criticality, using Stirling's formula it can be checked that
\[ \mathbb{P}_{p_c}[|\mathcal{C}_0| = \ell] \sim \frac{1}{\ell} \cdot \frac{1}{\sqrt{2\pi p_c(1 - p_c) b\ell}} = \frac{1}{\sqrt{2\pi(1 - p_c)\ell^3}} \]

as $\ell \to +\infty$.

Close to criticality, physicists predict that many quantities behave according to power laws of the form $|p - p_c|^\gamma$, where the exponent $\gamma$ is referred to as a critical exponent. The critical exponents are believed to satisfy certain "universality" properties. But even proving the existence of such exponents in general remains a major open problem. On trees, though, we can simply read off the critical exponents from the above formulas. For $b = 2$, Claims 5.24 and 5.27 imply for instance that, as $p \to p_c$,
\[ \theta(p) \sim 8(p - p_c)\,\mathbf{1}_{\{p > 1/2\}}, \]
and
\[ \chi_f(p) \sim \frac{1}{2}|p - p_c|^{-1}. \]
In fact, as can be seen from Claim 5.26, the critical exponent of $\chi_f(p)$ does not depend on $b$. The same holds for $\theta(p)$. See Exercise 5.5. Using (5.6), the higher moments of $|\mathcal{C}_0|$ can also be studied around criticality. See Exercise 5.6.
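The closed form in Claim 5.24 can be cross-checked numerically; a sketch (not part of the text; the sampled densities and the iteration count are arbitrary). Following Lemma 5.10, iterating $\phi(s) = ((1-p) + ps)^b$ from $s = 0$ converges to the smallest fixed point $\eta(p)$:

```python
def theta(p: float, b: int, iters: int = 10**5) -> float:
    """Percolation function via fixed-point iteration of phi (Lemma 5.10)."""
    s = 0.0
    for _ in range(iters):
        s = ((1.0 - p) + p * s) ** b
    return 1.0 - s

for p in [0.55, 0.6, 0.75, 0.9]:
    print(f"p={p}: iterated theta = {theta(p, 2):.6f}, "
          f"closed form (2p-1)/p^2 = {(2*p - 1)/p**2:.6f}")
```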

5.3.2 Random binary search trees: height

To be written. See [Dev98, Section 2.1].


5.3.3 Erdos-Renyi graph: the phase transition

A compelling way to view Erdos-Renyi graphs as the density varies is the following coupling or "evolution."† For each pair $\{i,j\}$, let $U_{\{i,j\}}$ be independent uniform random variables in $[0,1]$ and set $G(p) := ([n], E(p))$ where $\{i,j\} \in E(p)$ if and only if $U_{\{i,j\}} \leq p$. Then $G(p)$ is distributed according to $G_{n,p}$. As $p$ varies from 0 to 1, we start with an empty graph and progressively add edges until the complete graph is obtained. We showed in Section 2.2.4 that $\frac{\log n}{n}$ is a threshold function for connectivity. Before connectivity occurs in the evolution of the random graph, a quantity of interest is the size of the largest connected component. As we show in this section, this quantity itself undergoes a remarkable phase transition: when $p = \frac{\lambda}{n}$ with $\lambda < 1$, the largest component has size $\Theta(\log n)$; as $\lambda$ crosses 1, many components quickly merge to form a so-called "giant component" of size $\Theta(n)$.

This celebrated result of Erdos and Renyi, which is often referred to as "the" phase transition of the Erdos-Renyi graph, is related to the phase transition in percolation. That should be clear from the similarities between the proofs, specifically the branching process approach to percolation on trees (Section 5.3.1). Although the proof is quite long, it is well worth studying in detail. It employs most tools we have seen up to this point: first and second moment methods, the Chernoff-Cramer bound, martingale techniques, coupling and stochastic domination, and branching processes. It is quintessential discrete probability.

Statements and proof sketch Before stating the main theorems, we recall a basic result from Chapter 2.

- (Poisson tail) Let $S_n$ be a sum of $n$ i.i.d. $\mathrm{Poi}(\lambda)$ variables. Recall from (2.28) and (2.29) that for $a > \lambda$,
\[ -\frac{1}{n} \log \mathbb{P}[S_n \geq an] \geq a \log\Big(\frac{a}{\lambda}\Big) - a + \lambda =: I_\lambda^{\mathrm{Poi}}(a), \tag{5.7} \]
and similarly for $a < \lambda$,
\[ -\frac{1}{n} \log \mathbb{P}[S_n \leq an] \geq I_\lambda^{\mathrm{Poi}}(a). \tag{5.8} \]
To simplify the notation, we let
\[ I_\lambda := I_\lambda^{\mathrm{Poi}}(1) = \lambda - 1 - \log \lambda \geq 0, \tag{5.9} \]
where the inequality follows from the convexity of $I_\lambda$ and the fact that it attains its minimum at $\lambda = 1$ where it is 0.

†Requires: Sections 2.2.1, 2.3.1, 4.1 and 4.3.1

230

We let $p = \frac{\lambda}{n}$ and denote by $\mathcal{C}_{\max}$ the largest connected component. In the subcritical case, that is, when $\lambda < 1$, we show that the largest connected component has logarithmic size in $n$.

Theorem 5.28 (Subcritical case: upper bound on the largest cluster). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda \in (0,1)$. For all $\kappa > 0$,
\[ \mathbb{P}_{n,p_n}\big[|\mathcal{C}_{\max}| > (1+\kappa) I_\lambda^{-1} \log n\big] = o(1), \]
where $I_\lambda$ is defined in (5.9).

(We also give a matching logarithmic lower bound on the size of $\mathcal{C}_{\max}$ in Theorem 5.36.) In the supercritical case, that is, when $\lambda > 1$, we prove the existence of a unique connected component of size linear in $n$, which is referred to as the giant component.

Theorem 5.29 (Supercritical regime: giant component). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 1$. For any $\gamma \in (1/2, 1)$ and $\delta < 2\gamma - 1$,
\[ \mathbb{P}_{n,p_n}\big[\big||\mathcal{C}_{\max}| - \zeta_\lambda n\big| \geq n^\gamma\big] = O(n^{-\delta}). \]
In fact, with probability $1 - o(1)$, there is a unique largest component and the second largest cluster has size $O(\log n)$.

See Figure 5.3 for an illustration. At a high level, the proof goes as follows (a short simulation sketch follows the outline):

• (Subcritical regime) In the subcritical case, we use an exploration process and a domination argument to approximate the size of the connected components with the progeny of a branching process. The result then follows from the hitting-time theorem and the Poisson tail above.

• (Supercritical regime) In the supercritical case, a similar argument gives a bound on the expected size of the giant component, which is related to the survival of the branching process. Chebyshev's inequality gives concentration. The hard part there is to bound the variance.
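Here is that sketch in Python (not part of the text; $n$ and the values of $\lambda$ are arbitrary). It grows clusters exactly as in the exploration process defined next, revealing each potential edge on demand:

```python
import random

rng = random.Random(5)

def largest_cluster(n: int, lam: float) -> int:
    """|C_max| in G(n, lam/n): clusters are grown by the exploration
    process, testing each potential edge at most once."""
    p = lam / n
    neutral = set(range(n))
    best = 0
    while neutral:
        v = neutral.pop()          # start a new cluster at v
        active, size = [v], 1
        while active:
            active.pop()           # explore one active vertex
            found = [x for x in neutral if rng.random() < p]
            neutral.difference_update(found)
            active.extend(found)
            size += len(found)
        best = max(best, size)
    return best

n = 2000
for lam in [0.5, 0.9, 1.1, 1.5]:
    # Theorems 5.28-5.29: Theta(log n) for lam < 1, about zeta_lam * n for lam > 1.
    print(f"lambda={lam}: |C_max| = {largest_cluster(n, lam)}  (n={n})")
```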

Figure 5.3: Illustration of the phase transition.

Exploration process For a vertex $v \in [n]$, let $\mathcal{C}_v$ be the connected component containing $v$, also referred to as the cluster of $v$. To analyze the size of $\mathcal{C}_v$, we introduce a natural procedure to explore $\mathcal{C}_v$ and show that it is dominated above and below by branching processes. This procedure is similar to the exploration process defined in Section 5.2.1.

The exploration process started at $v$ has three types of vertices:

- $\mathcal{A}_t$: active vertices,

- $\mathcal{E}_t$: explored vertices,

- $\mathcal{N}_t$: neutral vertices.

We start with $\mathcal{A}_0 := \{v\}$, $\mathcal{E}_0 := \emptyset$, and $\mathcal{N}_0$ containing all other vertices in $G_n$. At time $t$, if $\mathcal{A}_{t-1} = \emptyset$ we let $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t) := (\mathcal{A}_{t-1}, \mathcal{E}_{t-1}, \mathcal{N}_{t-1})$. Otherwise, we pick a random element, $a_t$, from $\mathcal{A}_{t-1}$ and set:

- $\mathcal{A}_t := (\mathcal{A}_{t-1} \setminus \{a_t\}) \cup \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$,

- $\mathcal{E}_t := \mathcal{E}_{t-1} \cup \{a_t\}$,

- $\mathcal{N}_t := \mathcal{N}_{t-1} \setminus \{x \in \mathcal{N}_{t-1} : \{x, a_t\} \in G_n\}$.

We imagine revealing the edges of $G_n$ as they are encountered in the exploration process and we let $(\mathcal{F}_t)$ be the corresponding filtration. In words, starting with $v$, the cluster of $v$ is progressively grown by adding to it at each time a vertex adjacent to one of the previously explored vertices and uncovering its neighbors in $G_n$. In this process, $\mathcal{E}_t$ is the set of previously explored vertices and $\mathcal{A}_t$ — the frontier of the process — is the set of vertices that are known to belong to $\mathcal{C}_v$ but whose full neighborhood is waiting to be uncovered. The rest of the vertices form the set $\mathcal{N}_t$. See Figure 5.4.


Figure 5.4: Exploration process for $\mathcal{C}_v$. The green edges are in $\mathcal{F}_t$. The red ones are not.

Let $A_t := |\mathcal{A}_t|$, $E_t := |\mathcal{E}_t|$, and $N_t := |\mathcal{N}_t|$. Note that $(E_t)$ is non-decreasing while $(N_t)$ is non-increasing. Let
\[ \tau_0 := \inf\{t \geq 0 : A_t = 0\}. \]
The process is fixed for all $t > \tau_0$. Notice that $E_t = t$ for all $t \leq \tau_0$, as exactly one vertex is explored at each time until the set of active vertices is empty. Moreover, for all $t$, $(\mathcal{A}_t, \mathcal{E}_t, \mathcal{N}_t)$ forms a partition of $[n]$ so
\[ A_t + t + N_t = n, \quad \forall t \leq \tau_0. \tag{5.10} \]

Hence, in tracking the size of the exploration process, we can work alternatively with $A_t$ or $N_t$. Specifically, the size of the cluster of $v$ can be characterized as follows.

Lemma 5.30.
\[ \tau_0 = |\mathcal{C}_v|. \]

Proof. Indeed a single vertex of $\mathcal{C}_v$ is explored at each time until all of $\mathcal{C}_v$ has been visited. At that point, $\mathcal{A}_t$ is empty.

The processes $(A_t)$ and $(N_t)$ admit a simple recursive form. Conditioning on $\mathcal{F}_{t-1}$:

- (Active vertices) If $A_{t-1} = 0$, the exploration process has finished its course and $A_t = 0$. Otherwise, (a) one active vertex becomes an explored vertex and (b) its neutral neighbors become active vertices. That is,
\[ A_t = A_{t-1} + \mathbf{1}_{\{A_{t-1} > 0\}}\Big(\underbrace{-1}_{\text{(a)}} + \underbrace{Z_t}_{\text{(b)}}\Big), \tag{5.11} \]
where $Z_t$ is binomial with parameters $N_{t-1} = n - (t-1) - A_{t-1}$ and $p_n$. For the coupling arguments below, it will be useful to think of $Z_t$ as a sum of independent Bernoulli variables. That is, let $(I_{t,j} : t \geq 1, j \geq 1)$ be an array of independent, identically distributed $\{0,1\}$-variables with $\mathbb{P}[I_{1,1} = 1] = p_n$. We write
\[ Z_t = \sum_{i=1}^{N_{t-1}} I_{t,i}. \tag{5.12} \]

- (Neutral vertices) Similarly, if $A_{t-1} > 0$, i.e. $N_{t-1} < n - (t-1)$, then $Z_t$ neutral vertices become active vertices. That is,
\[ N_t = N_{t-1} - \mathbf{1}_{\{N_{t-1} < n-(t-1)\}}\, Z_t. \tag{5.13} \]

Branching process arguments With these observations, we now relate the cluster size of $v$ to the total progeny of a certain branching process. This is the key lemma.

Lemma 5.31 (Cluster size: branching process approximation). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$ and let $\mathcal{C}_v$ be the connected component of $v \in [n]$. Let $W_\lambda$ be the total progeny of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. Then, for $k_n = o(\sqrt{n})$,
\[ \mathbb{P}[W_\lambda \geq k_n] - O\Big(\frac{k_n^2}{n}\Big) \leq \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \geq k_n] \leq \mathbb{P}[W_\lambda \geq k_n]. \]

Before proving the lemma, recall the following simple domination results from Chapter 4:

- (Binomial domination) We have
\[ n \geq m \implies \mathrm{Bin}(n,p) \succeq \mathrm{Bin}(m,p). \tag{5.14} \]
The binomial distribution is also dominated by the Poisson distribution in the following way:
\[ \lambda \in (0,1) \implies \mathrm{Poi}(\lambda) \succeq \mathrm{Bin}\Big(n-1, \frac{\lambda}{n}\Big). \tag{5.15} \]
For the proofs, see Examples 4.24 and 4.28.
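As a quick numerical sanity check of (5.15) (a sketch, not part of the text; $n$, $\lambda$, and the truncation at $k < 30$ are arbitrary), recall that stochastic domination is equivalent to the dominating CDF lying below the dominated one pointwise:

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(k: int, lam: float) -> float:
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

n, lam = 50, 0.9
ok = all(poisson_cdf(k, lam) <= binom_cdf(k, n - 1, lam / n) for k in range(30))
print(f"Poi({lam}) dominates Bin({n - 1}, {lam}/{n}) on k < 30: {ok}")
```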


We use these domination results to relate the size of the connected components to the progeny of the branching process.

Proof of Lemma 5.31. We start with the upper bound.

Upper bound: Because $N_{t-1} = n - (t-1) - A_{t-1} \leq n-1$, conditioned on $\mathcal{F}_{t-1}$, the following stochastic domination relations hold:
\[ \mathrm{Bin}\Big(N_{t-1}, \frac{\lambda}{n}\Big) \preceq \mathrm{Bin}\Big(n-1, \frac{\lambda}{n}\Big) \preceq \mathrm{Poi}(\lambda), \]
by (5.14) and (5.15). Observe that the r.h.s. does not depend on $N_{t-1}$. Let $(Z_t^\lambda)$ be a sequence of independent $\mathrm{Poi}(\lambda)$. Using the coupling in Example 4.28, we can couple the processes $(I_{t,j})_j$ and $(Z_t^\lambda)$ in such a way that $Z_t^\lambda \geq \sum_{j=1}^{n-1} I_{t,j}$ a.s. for all $t$. Then by induction on $t$, for all $t$, $0 \leq A_t \leq A_t^\lambda$ a.s., where we define
\[ A_t^\lambda := A_{t-1}^\lambda + \mathbf{1}_{\{A_{t-1}^\lambda > 0\}}\big(-1 + Z_t^\lambda\big), \tag{5.16} \]
with $A_0^\lambda := 1$. (In fact, this is a domination of Markov transition matrices, as defined in Definition 4.36.) In words, $(A_t)$ is stochastically dominated by the exploration process of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. As a result, letting
\[ \tau_0^\lambda := \inf\{t \geq 0 : A_t^\lambda = 0\} \]
be the total progeny of the branching process, we immediately get
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \geq k_n] = \mathbb{P}_{n,p_n}[\tau_0 \geq k_n] \leq \mathbb{P}[\tau_0^\lambda \geq k_n] = \mathbb{P}[W_\lambda \geq k_n]. \]

Lower bound: In the other direction, we proceed in two steps. We first show that, up to a certain time, the process is bounded from below by a branching process with binomial offspring distribution. In a second step, we show that this binomial branching process can be approximated by a Poisson branching process.

1. (Domination from below) Let $A_t^-$ be defined as
\[ A_t^- := A_{t-1}^- + \mathbf{1}_{\{A_{t-1}^- > 0\}}\big(-1 + Z_t^-\big), \tag{5.17} \]
with $A_0^- := 1$, where
\[ Z_t^- := \sum_{i=1}^{n-k_n} I_{t,i}. \tag{5.18} \]
Note that $(A_t^-)$ is the size of the active set in the exploration process of a branching process with offspring distribution $\mathrm{Bin}(n-k_n, p_n)$. Let
\[ \tau_0^- := \inf\{t \geq 0 : A_t^- = 0\} \]
be the total progeny of this branching process. We claim that $A_t$ is bounded from below by $A_t^-$ up to time
\[ \sigma_{n-k_n} := \inf\{t \geq 0 : N_t \leq n - k_n\}. \]
Indeed, for all $t \leq \sigma_{n-k_n}$, $N_{t-1} > n - k_n$. Hence, by (5.12) and (5.18), $Z_t \geq Z_t^-$ for all $t \leq \sigma_{n-k_n}$ and as a result, by induction on $t$,
\[ A_t \geq A_t^-, \quad \forall t \leq \sigma_{n-k_n}. \]
Because the inequality between $A_t$ and $A_t^-$ holds only up to time $\sigma_{n-k_n}$, we cannot compare directly $\tau_0$ and $\tau_0^-$. However, observe that the size of the cluster of $v$ is at least the total number of active and explored vertices at any time $t$; in particular, when $\sigma_{n-k_n} < +\infty$,
\[ |\mathcal{C}_v| \geq A_{\sigma_{n-k_n}} + E_{\sigma_{n-k_n}} = n - N_{\sigma_{n-k_n}} \geq k_n. \]
On the other hand, when $\sigma_{n-k_n} = +\infty$, $N_t > n - k_n$ for all $t$ — in particular for all $t \geq \tau_0$ — and therefore $|\mathcal{C}_v| < k_n$. Moreover in that case, because $A_t \geq A_t^-$ for all $t$, it holds in addition that $\tau_0^- \leq \tau_0 < k_n$. To sum up, we have proved the implications
\[ \tau_0^- \geq k_n \implies \sigma_{n-k_n} < +\infty \implies \tau_0 \geq k_n. \]
In particular,
\[ \mathbb{P}[\tau_0^- \geq k_n] \leq \mathbb{P}_{n,p_n}[\tau_0 \geq k_n]. \tag{5.19} \]

2. (Poisson approximation) By Theorem 5.21,
\[ \mathbb{P}[\tau_0^- = t] = \frac{1}{t}\, \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^- = t-1\Bigg], \tag{5.20} \]
where the $Z_i^-$s are independent $\mathrm{Bin}(n-k_n, p_n)$. Note that $\sum_{i=1}^t Z_i^- \sim \mathrm{Bin}(t(n-k_n), p_n)$. Recall the definition of $(Z_t^\lambda)$ from (5.16). By Example 4.10, Theorem 4.16, and the triangle inequality for total variation distance,
\[ \Bigg|\mathbb{P}\Bigg[\sum_{i=1}^t Z_i^- = t-1\Bigg] - \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda = t-1\Bigg]\Bigg| \leq \frac{1}{2}\, t(n-k_n)(-\log(1-p_n))^2 + \big|t\lambda - t(n-k_n)(-\log(1-p_n))\big| \]
\[ \leq \frac{1}{2}\, tn\Big(\frac{\lambda}{n} + O(n^{-2})\Big)^{\!2} + \Big|t\lambda - t(n-k_n)\Big(\frac{\lambda}{n} + O(n^{-2})\Big)\Big| = O\Big(\frac{t k_n}{n}\Big). \]
So by (5.20),
\[ \mathbb{P}[\tau_0^- \geq k_n] = 1 - \mathbb{P}[\tau_0^- < k_n] = 1 - \mathbb{P}[\tau_0^\lambda < k_n] + O\Big(\frac{k_n^2}{n}\Big) = \mathbb{P}[\tau_0^\lambda \geq k_n] + O\Big(\frac{k_n^2}{n}\Big). \]
Plugging this approximation back into (5.19) gives
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \geq k_n] = \mathbb{P}_{n,p_n}[\tau_0 \geq k_n] \geq \mathbb{P}[\tau_0^\lambda \geq k_n] - O\Big(\frac{k_n^2}{n}\Big) = \mathbb{P}[W_\lambda \geq k_n] - O\Big(\frac{k_n^2}{n}\Big). \]

Remark 5.32. In fact one can get a slightly better lower bound. See Exercise 5.7.

When $k_n$ is large, the branching process approximation above is not as accurate because of the saturation effect: an Erdos-Renyi graph has a finite pool of vertices from which to draw edges; as the number of neutral vertices decreases, so does the expected number of uncovered edges at each time. Instead we use the following lemma.

Lemma 5.33 (Cluster size: saturation). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$ and let $\mathcal{C}_v$ be the connected component of $v \in [n]$. Let $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Then, for any $t$,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}[Y_t = t-1]. \]

Proof. We work with neutral vertices. By Lemma 5.30 and Equation (5.10), for any $t$,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] = \mathbb{P}_{n,p_n}[\tau_0 = t] \leq \mathbb{P}_{n,p_n}[N_t = n-t]. \tag{5.21} \]
Recall that $N_0 = n-1$ and
\[ N_t = N_{t-1} - \mathbf{1}_{\{N_{t-1} < n-(t-1)\}} \sum_{i=1}^{N_{t-1}} I_{t,i}. \tag{5.22} \]
It is easier to consider the process without the indicator, as it has a simple distribution. Define $N_0' := n-1$ and
\[ N_t' := N_{t-1}' - \sum_{i=1}^{N_{t-1}'} I_{t,i}, \tag{5.23} \]
and observe that $N_t \geq N_t'$ for all $t$, as the two processes agree up to time $\tau_0$, at which point $N_t$ stays fixed. The interpretation of $N_t'$ is straightforward: starting with $n-1$ vertices, at each time each remaining vertex is discarded with probability $p_n$. Hence, the number of surviving vertices at time $t$ has distribution
\[ N_t' \sim \mathrm{Bin}(n-1, (1-p_n)^t), \]
by the independence of the steps. Arguing as in (5.21),
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}_{n,p_n}[N_t' = n-t] = \mathbb{P}_{n,p_n}[(n-1) - N_t' = t-1] = \mathbb{P}[Y_t = t-1], \]
which concludes the proof.

Combining the previous lemmas we get:

Lemma 5.34 (Bound on the cluster size). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda > 0$ and let $\mathcal{C}_v$ be the connected component of $v \in [n]$.

- (Subcritical case) Assume $\lambda \in (0,1)$. For all $\kappa > 0$,
\[ \mathbb{P}_{n,p_n}\big[|\mathcal{C}_v| > (1+\kappa) I_\lambda^{-1} \log n\big] = O(n^{-(1+\kappa)}). \]

- (Supercritical case) Assume $\lambda > 1$. Let $\zeta_\lambda$ be the unique solution in $(0,1)$ to the fixed point equation
\[ 1 - e^{-\lambda\zeta} = \zeta. \]
Note that, by Example 5.11, $\zeta_\lambda$ is the survival probability of a branching process with offspring distribution $\mathrm{Poi}(\lambda)$. For any $\kappa > 0$,
\[ \mathbb{P}_{n,p_n}\big[|\mathcal{C}_v| > (1+\kappa) I_\lambda^{-1} \log n\big] = \zeta_\lambda + O\Big(\frac{\log^2 n}{n}\Big). \]
Moreover, for any $\alpha < \zeta_\lambda$ and any $\beta > 0$, there exists $\kappa_{\lambda,\alpha} > 0$ large enough so that
\[ \mathbb{P}_{n,p_n}\big[(1 + \kappa_{\lambda,\alpha}) I_\lambda^{-1} \log n \leq |\mathcal{C}_v| \leq \alpha n\big] = O(n^{-(1+\beta)}). \tag{5.24} \]


Proof. In both cases we use Lemma 5.31. To apply the lemma we need to bound the tail of the progeny $W_\lambda$ of a Poisson branching process. Using the notation of Lemma 5.31, by Theorem 5.21,
\[ \mathbb{P}[W_\lambda > k_n] = \mathbb{P}[W_\lambda = +\infty] + \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda = t-1\Bigg], \tag{5.25} \]
where the $Z_i^\lambda$s are i.i.d. $\mathrm{Poi}(\lambda)$. Both terms on the r.h.s. depend on whether the mean $\lambda$ is smaller or larger than 1. We start with the first term. When $\lambda < 1$, the Poisson branching process goes extinct with probability 1. Hence $\mathbb{P}[W_\lambda = +\infty] = 0$. When $\lambda > 1$ on the other hand, $\mathbb{P}[W_\lambda = +\infty] = \zeta_\lambda$, where $\zeta_\lambda > 0$ is the survival probability of the branching process. As to the second term, the sum of the $Z_i^\lambda$s is $\mathrm{Poi}(\lambda t)$. When $\lambda < 1$, using (5.7),
\[ \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda = t-1\Bigg] \leq \sum_{t > k_n} \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda \geq t-1\Bigg] \leq \sum_{t > k_n} \exp\Big(-t\, I_\lambda^{\mathrm{Poi}}\Big(\frac{t-1}{t}\Big)\Big) \leq \sum_{t > k_n} \exp\big(-t(I_\lambda - O(t^{-1}))\big) \leq \sum_{t > k_n} C' \exp(-t I_\lambda) \leq C \exp(-I_\lambda k_n), \tag{5.26} \]
for some constants $C, C' > 0$, where we assume that $k_n = \omega(1)$. When $\lambda > 1$,
\[ \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda = t-1\Bigg] \leq \sum_{t > k_n} \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda \leq t\Bigg] \leq \sum_{t > k_n} \exp(-t I_\lambda) \leq C \exp(-I_\lambda k_n), \tag{5.27} \]
for a possibly different $C > 0$.

Subcritical case: Assume $0 < \lambda < 1$ and let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 5.31,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c\log n] \leq \mathbb{P}[W_\lambda > c\log n]. \]
By (5.25) and (5.26),
\[ \mathbb{P}[W_\lambda > c\log n] = O(\exp(-I_\lambda c \log n)), \tag{5.28} \]
which proves the claim.

Supercritical case: Now assume $\lambda > 1$ and again let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. By Lemma 5.31,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c\log n] = \mathbb{P}[W_\lambda > c\log n] + O\Big(\frac{\log^2 n}{n}\Big). \tag{5.29} \]
By (5.25) and (5.27),
\[ \mathbb{P}[W_\lambda > c\log n] = \zeta_\lambda + O(\exp(-c I_\lambda \log n)) = \zeta_\lambda + O(n^{-(1+\kappa)}). \tag{5.30} \]
Combining (5.29) and (5.30), for any $\kappa > 0$,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| > c\log n] = \zeta_\lambda + O\Big(\frac{\log^2 n}{n}\Big). \tag{5.31} \]

Next, we show that in the supercritical case, when $|\mathcal{C}_v| > c\log n$, the cluster size is in fact linear in $n$ with high probability. By Lemma 5.33,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \mathbb{P}[Y_t = t-1] \leq \mathbb{P}[Y_t \leq t], \]
where $Y_t \sim \mathrm{Bin}(n-1, 1-(1-p_n)^t)$. Roughly, the r.h.s. is negligible until the mean $\mu_t := (n-1)(1-(1-\lambda/n)^t)$ is of the order of $t$. Let $\zeta_\lambda$ be the unique solution in $(0,1)$ to the fixed point equation
\[ f(\zeta) := 1 - e^{-\lambda\zeta} = \zeta. \]
The solution is unique because $f(0) = 0$, $f(1) < 1$, and $f$ is increasing, strictly concave, and has derivative $\lambda > 1$ at 0. Note in particular that, when $t = \zeta_\lambda n$, $\mu_t \approx t$. Let $\alpha < \zeta_\lambda$. For any $t \in [c\log n, \alpha n]$, by a Chernoff bound for Poisson trials (Theorem 2.31),
\[ \mathbb{P}[Y_t \leq t] \leq \exp\Bigg(-\frac{\mu_t}{2}\Big(1 - \frac{t}{\mu_t}\Big)^{\!2}\Bigg). \tag{5.32} \]


For $t/n \leq \alpha < \zeta_\lambda$, using $1 - x \leq e^{-x}$ for $x \in (0,1)$, there is $\beta_\alpha > 1$ such that
\[ \mu_t \geq (n-1)(1 - e^{-\lambda(t/n)}) = t \cdot \frac{n-1}{n} \cdot \frac{1 - e^{-\lambda(t/n)}}{t/n} = t \cdot \frac{n-1}{n} \cdot \frac{f(t/n)}{t/n} \geq t \cdot \frac{n-1}{n} \cdot \frac{1 - e^{-\lambda\alpha}}{\alpha} \geq \beta_\alpha t, \]
for $n$ large enough, by the properties of $f$ mentioned above. Plugging this back into (5.32), we get
\[ \mathbb{P}[Y_t \leq t] \leq \exp\Bigg(-t\,\Bigg\{\frac{\beta_\alpha}{2}\Big(1 - \frac{1}{\beta_\alpha}\Big)^{\!2}\Bigg\}\Bigg). \]
Therefore
\[ \sum_{t = c\log n}^{\alpha n} \mathbb{P}_{n,p_n}[|\mathcal{C}_v| = t] \leq \sum_{t = c\log n}^{\alpha n} \mathbb{P}[Y_t \leq t] \leq \sum_{t = c\log n}^{+\infty} \exp\Bigg(-t\,\Bigg\{\frac{\beta_\alpha}{2}\Big(1 - \frac{1}{\beta_\alpha}\Big)^{\!2}\Bigg\}\Bigg) = O\Bigg(\exp\Bigg(-c\log n\,\Bigg\{\frac{\beta_\alpha}{2}\Big(1 - \frac{1}{\beta_\alpha}\Big)^{\!2}\Bigg\}\Bigg)\Bigg). \]
Taking $\kappa > 0$ large enough proves (5.24).

Let $\mathcal{C}_{\max}$ be the largest connected component of $G_n$ (choosing the component containing the lowest label if there is more than one such component). Our goal is to characterize the size of $\mathcal{C}_{\max}$. Let
\[ X_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| > k\}} \]
be the number of vertices in clusters of size larger than $k$. There is a natural connection between $X_k$ and $\mathcal{C}_{\max}$, namely,
\[ |\mathcal{C}_{\max}| > k \iff X_k > 0 \iff X_k > k. \]


A first moment argument on $X_k$ and the previous lemma immediately imply an upper bound on the size of $\mathcal{C}_{\max}$ in the subcritical case.

Proof of Theorem 5.28. Let $c = (1+\kappa) I_\lambda^{-1}$ for $\kappa > 0$. We use the first moment method on $X_k$. By symmetry and the first moment method (Corollary 2.5),
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c\log n] = \mathbb{P}_{n,p_n}[X_{c\log n} > 0] \leq \mathbb{E}_{n,p_n}[X_{c\log n}] = n\, \mathbb{P}_{n,p_n}[|\mathcal{C}_1| > c\log n]. \tag{5.33} \]
By Lemma 5.34,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_{\max}| > c\log n] = O(n \cdot n^{-(1+\kappa)}) = O(n^{-\kappa}) \to 0, \]
as $n \to +\infty$.

In fact we prove below that the largest component is of size roughly $I_\lambda^{-1} \log n$. But first we turn to the supercritical regime.

Second moment arguments To characterize the size of the largest cluster in the supercritical case, we need a second moment argument. Assume $\lambda > 1$. For $\beta > 0$ and $\alpha < \zeta_\lambda$, let $\kappa_{\lambda,\alpha}$ be as defined in Lemma 5.34. Set
\[ \underline{k}_n := (1 + \kappa_{\lambda,\alpha}) I_\lambda^{-1} \log n \quad \text{and} \quad \overline{k}_n := \alpha n. \]
We call a vertex $v$ such that $|\mathcal{C}_v| \leq \underline{k}_n$ a small vertex. Let
\[ Y_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| \leq k\}}. \]
Then $Y_{\underline{k}_n}$ is the number of small vertices. Observe that by definition $Y_k = n - X_k$. Hence by Lemma 5.34, the expectation of $Y_{\underline{k}_n}$ is
\[ \mathbb{E}_{n,p_n}[Y_{\underline{k}_n}] = n(1 - \mathbb{P}_{n,p_n}[|\mathcal{C}_v| > \underline{k}_n]) = (1 - \zeta_\lambda)n + O(\log^2 n). \tag{5.34} \]

Using Chebyshev’s inequality (Theorem 2.13), we prove that Ykn is close to itsexpectation:

Lemma 5.35 (Concentration of Ykn). For any � 2 (1/2, 1) and � < 2� � 1,

Pn,pn [|Ykn � (1� ⇣�)n| � n� ] O(n��).


Lemma 5.35, which is proved below, leads to our main result in the supercritical case: the existence of the giant component, a unique cluster of size linear in $n$.

Proof of Theorem 5.29. Take $\alpha \in (\zeta_\lambda/2, \zeta_\lambda)$ and let $\underline{k}_n$ and $\overline{k}_n$ be as above. Let $B_{1,n} := \{|X_{\underline{k}_n} - \zeta_\lambda n| \geq n^\gamma\}$. Because $\gamma < 1$, for $n$ large enough, the event $B_{1,n}^c$ implies that $X_{\underline{k}_n} \geq 1$ and, in particular, that
\[ |\mathcal{C}_{\max}| \leq X_{\underline{k}_n}. \]
Let $B_{2,n} := \{\exists v,\ |\mathcal{C}_v| \in [\underline{k}_n, \overline{k}_n]\}$. If, in addition to $B_{1,n}^c$, $B_{2,n}^c$ also holds, then
\[ |\mathcal{C}_{\max}| \leq X_{\underline{k}_n} = X_{\overline{k}_n}. \]
There is equality in the last display if there is a unique cluster of size greater than $\overline{k}_n$. This is indeed the case under $B_{1,n}^c \cap B_{2,n}^c$: if there were two distinct clusters of size greater than $\overline{k}_n$, then since $2\alpha > \zeta_\lambda$ we would have, for $n$ large enough,
\[ X_{\underline{k}_n} = X_{\overline{k}_n} > 2\overline{k}_n = 2\alpha n > \zeta_\lambda n + n^\gamma, \]
a contradiction. Hence we have proved that, under $B_{1,n}^c \cap B_{2,n}^c$,
\[ |\mathcal{C}_{\max}| = X_{\underline{k}_n} = X_{\overline{k}_n}. \]
Applying Lemmas 5.34 and 5.35 concludes the proof.

It remains to prove Lemma 5.35.

Proof of Lemma 5.35. The main task is to bound the variance of $Y_{\underline{k}_n}$. Note that
\[ \mathbb{E}_{n,p_n}[Y_k^2] = \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k] = \sum_{u,v \in [n]} \Big( \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \leftrightarrow v] + \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \nleftrightarrow v] \Big), \tag{5.35} \]
where $u \leftrightarrow v$ indicates that $u$ and $v$ are in the same connected component.


To bound the first term in (5.35), we note that $u \leftrightarrow v$ implies that $\mathcal{C}_u = \mathcal{C}_v$. Hence,
\[ \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \leftrightarrow v] = \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, v \in \mathcal{C}_u] = \sum_{u,v \in [n]} \mathbb{E}_{n,p_n}[\mathbf{1}_{\{|\mathcal{C}_u| \leq k\}}\mathbf{1}_{\{v \in \mathcal{C}_u\}}] = \sum_{u \in [n]} \mathbb{E}_{n,p_n}[|\mathcal{C}_u|\,\mathbf{1}_{\{|\mathcal{C}_u| \leq k\}}] = n\, \mathbb{E}_{n,p_n}[|\mathcal{C}_1|\,\mathbf{1}_{\{|\mathcal{C}_1| \leq k\}}] \leq nk. \tag{5.36} \]

To bound the second term in (5.35), we sum over the size of $\mathcal{C}_u$ and note that, conditioned on $\{|\mathcal{C}_u| = \ell, u \nleftrightarrow v\}$, the size of $\mathcal{C}_v$ has the same distribution as the unconditional size of $\mathcal{C}_1$ in a $G_{n-\ell, p_n}$ random graph, that is,
\[ \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \leq k \mid |\mathcal{C}_u| = \ell, u \nleftrightarrow v] = \mathbb{P}_{n-\ell, p_n}[|\mathcal{C}_1| \leq k]. \]
Observe that this probability is increasing in $\ell$. Hence
\[ \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \leq k, u \nleftrightarrow v] = \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, u \nleftrightarrow v]\, \mathbb{P}_{n,p_n}[|\mathcal{C}_v| \leq k \mid |\mathcal{C}_u| = \ell, u \nleftrightarrow v] \]
\[ \leq \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell]\, \mathbb{P}_{n-k, p_n}[|\mathcal{C}_v| \leq k] = \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k]\, \mathbb{P}_{n-k, p_n}[|\mathcal{C}_v| \leq k]. \]

To get a bound on the variance of $Y_k$, we need to relate this last expression to $(\mathbb{E}_{n,p_n}[Y_k])^2$. For this purpose we define
\[ \Delta_k := \mathbb{P}_{n-k, p_n}[|\mathcal{C}_1| \leq k] - \mathbb{P}_{n,p_n}[|\mathcal{C}_1| \leq k]. \]
Then, plugging this back above, we get
\[ \sum_{u,v \in [n]} \sum_{\ell \leq k} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| = \ell, |\mathcal{C}_v| \leq k, u \nleftrightarrow v] \leq \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k]\,(\mathbb{P}_{n,p_n}[|\mathcal{C}_v| \leq k] + \Delta_k) \leq (\mathbb{E}_{n,p_n}[Y_k])^2 + n^2|\Delta_k|,
\]

and it remains to bound $\Delta_k$. We use a coupling argument. Let $H \sim G_{n-k, p_n}$ and construct $H' \sim G_{n, p_n}$ in the following manner: let $H'$ coincide with $H$ on the first $n-k$ vertices, then pick the rest of the edges independently. Then clearly $\Delta_k \geq 0$ since the cluster of 1 in $H'$ includes the cluster of 1 in $H$. In fact, $\Delta_k$ is the probability that under this coupling the cluster of 1 has at most $k$ vertices in $H$ but not in $H'$. That implies in particular that at least one of the vertices in the cluster of 1 in $H$ is connected to a vertex in $\{n-k+1, \ldots, n\}$. Hence, by a union bound over those edges,
\[ \Delta_k \leq k^2 p_n, \]
and
\[ \sum_{u,v \in [n]} \mathbb{P}_{n,p_n}[|\mathcal{C}_u| \leq k, |\mathcal{C}_v| \leq k, u \nleftrightarrow v] \leq (\mathbb{E}_{n,p_n}[Y_k])^2 + \lambda k^2 n. \tag{5.37} \]
Combining (5.36) and (5.37), we get
\[ \mathrm{Var}[Y_k] \leq 2\lambda k^2 n. \]
The result follows from Chebyshev's inequality (Theorem 2.13) and Equation (5.34).

A similar second moment argument also gives a lower bound on the size of the largest component in the subcritical case. We proved in Theorem 5.28 that, when $\lambda < 1$, the probability of observing a connected component of size significantly larger than $I_\lambda^{-1}\log n$ is vanishingly small. In the other direction, we get:

Theorem 5.36 (Subcritical regime: lower bound on the largest cluster). Let $G_n \sim G_{n,p_n}$ where $p_n = \frac{\lambda}{n}$ with $\lambda \in (0,1)$. For all $\kappa \in (0,1)$,
\[ \mathbb{P}_{n,p_n}\big[|\mathcal{C}_{\max}| \leq (1-\kappa) I_\lambda^{-1} \log n\big] = o(1). \]

Proof. Recall that
\[ X_k := \sum_{v \in [n]} \mathbf{1}_{\{|\mathcal{C}_v| > k\}}. \]
It suffices to prove that with probability $1 - o(1)$ we have $X_k > 0$ when $k = (1-\kappa) I_\lambda^{-1} \log n$. To apply the second moment method, we need an upper bound on the second moment of $X_k$ and a lower bound on its first moment. The following lemma is closely related to Lemma 5.35. Exercise 5.8 asks for a proof.

Lemma 5.37 (Second moment of $X_k$). Assume $\lambda < 1$. There is $C > 0$ such that
\[ \mathbb{E}_{n,p_n}[X_k^2] \leq (\mathbb{E}_{n,p_n}[X_k])^2 + Cnke^{-kI_\lambda}, \quad \forall k \geq 0. \]


Lemma 5.38 (First moment of $X_k$). Let $k_n = (1-\kappa) I_\lambda^{-1} \log n$. Then, for any $\delta \in (0, \kappa)$, we have
\[ \mathbb{E}_{n,p_n}[X_{k_n}] = \Omega(n^\delta), \]
for $n$ large enough.

Proof. By Lemma 5.31,
\[ \mathbb{E}_{n,p_n}[X_{k_n}] = n\, \mathbb{P}_{n,p_n}[|\mathcal{C}_1| > k_n] = n\, \mathbb{P}[W_\lambda > k_n] - O(k_n^2). \tag{5.38} \]
Once again, we use the random-walk representation of the total progeny of a branching process (Theorem 5.21). Using the notation of Lemma 5.31,
\[ \mathbb{P}[W_\lambda > k_n] = \sum_{t > k_n} \frac{1}{t}\, \mathbb{P}\Bigg[\sum_{i=1}^t Z_i^\lambda = t-1\Bigg] = \sum_{t > k_n} \frac{1}{t}\, e^{-\lambda t} \frac{(\lambda t)^{t-1}}{(t-1)!}. \]
Using Stirling's formula, we note that
\[ e^{-\lambda t} \frac{(\lambda t)^{t-1}}{t!} = e^{-\lambda t} \frac{(\lambda t)^t}{\lambda t\,(t/e)^t \sqrt{2\pi t}(1 + o(1))} = \frac{1 + o(1)}{\lambda t \sqrt{2\pi t}} \exp(-t\lambda + t\log\lambda + t) = \frac{1 + o(1)}{\lambda\sqrt{2\pi t^3}} \exp(-t I_\lambda). \]
For any $\varepsilon > 0$, for $n$ large enough,
\[ \mathbb{P}[W_\lambda > k_n] \geq \lambda^{-1} \sum_{t > k_n} \exp(-t(I_\lambda + \varepsilon)) = \Omega(\exp(-k_n(I_\lambda + \varepsilon))). \]
For any $\delta \in (0, \kappa)$, plugging the last line back into (5.38) and taking $\varepsilon$ small enough gives
\[ \mathbb{E}_{n,p_n}[X_{k_n}] = \Omega(n\exp(-k_n(I_\lambda + \varepsilon))) = \Omega\big(\exp\big(\{1 - (1-\kappa)I_\lambda^{-1}(I_\lambda + \varepsilon)\}\log n\big)\big) = \Omega(n^\delta). \]


We return to the proof of Theorem 5.36. Let $k_n = (1-\kappa) I_\lambda^{-1} \log n$. By the second moment method (Theorem 2.18) and Lemmas 5.37 and 5.38,
\[ \mathbb{P}_{n,p_n}[X_{k_n} > 0] \geq \frac{(\mathbb{E} X_{k_n})^2}{\mathbb{E}[X_{k_n}^2]} \geq \Bigg(1 + \frac{O(nk_ne^{-k_nI_\lambda})}{\Omega(n^{2\delta})}\Bigg)^{\!-1} = \Bigg(1 + \frac{O(k_ne^{\kappa\log n})}{\Omega(n^{2\delta})}\Bigg)^{\!-1} \to 1, \]
for $\delta$ close enough to $\kappa$. That proves the claim.

Critical regime via martingales To be written. See [NP10].

5.4 Further applications

5.4.1 Uniform random trees: local limit

To be written. See [Gri81].

Exercises

Exercise 5.1 (Galton-Watson process: geometric offspring). Let $(Z_t)$ be a Galton-Watson branching process with geometric offspring distribution (started at 0), i.e., $p_k = p(1-p)^k$ for all $k \geq 0$, for some $p \in (0,1)$. Let $q := 1-p$, let $m$ be the mean of the offspring distribution, and let $M_t = m^{-t} Z_t$.

a) Compute the probability generating function $f$ of $\{p_k\}_{k \geq 0}$ and the extinction probability $\eta := \eta_p$ as a function of $p$.

b) If $G$ is a $2 \times 2$ matrix, define
\[ G(s) := \frac{G_{11}s + G_{12}}{G_{21}s + G_{22}}. \]
Show that $G(H(s)) = (GH)(s)$.

c) Assume $m \neq 1$. Use b) to derive
\[ f_t(s) = \frac{pm^t(1-s) + qs - p}{qm^t(1-s) + qs - p}. \]
Deduce that when $m > 1$,
\[ \mathbb{E}[\exp(-\lambda M_\infty)] = \eta + (1-\eta)\frac{(1-\eta)}{\lambda + (1-\eta)}. \]

d) Assume $m = 1$. Show that
\[ f_t(s) = \frac{t - (t-1)s}{t + 1 - ts}, \]
and deduce that
\[ \mathbb{E}[e^{-\lambda Z_t/t} \mid Z_t > 0] \to \frac{1}{1 + \lambda}. \]

Exercise 5.2 (Supercritical branching process: infinite line of descent). Let $(Z_t)$ be a supercritical Galton-Watson branching process with offspring distribution $\{p_k\}_{k \geq 0}$. Let $\eta$ be the extinction probability and define $\zeta := 1 - \eta$. Let $Z_t^\infty$ be the number of individuals in the $t$-th generation with an infinite line of descent, i.e., whose descendant subtree is infinite. Denote by $S$ the event of nonextinction of $(Z_t)$. Define $p_0^\infty := 0$ and
\[ p_k^\infty := \zeta^{-1} \sum_{j \geq k} \binom{j}{k} \eta^{j-k} \zeta^k p_j. \]

a) Show that $\{p_k^\infty\}_{k \geq 0}$ is a probability distribution and compute its expectation.

b) Show that for any $k \geq 0$,
\[ \mathbb{P}[Z_1^\infty = k \mid S] = p_k^\infty. \]
[Hint: Condition on $Z_1$.]

c) Show by induction on $t$ that, conditioned on nonextinction, the process $(Z_t^\infty)$ has the same distribution as a Galton-Watson branching process with offspring distribution $\{p_k^\infty\}_{k \geq 0}$.

Exercise 5.3 (Hitting-time theorem: nearest-neighbor walk). Let $X_1, X_2, \ldots$ be i.i.d. random variables taking value $+1$ with probability $p$ and $-1$ with probability $q := 1-p$. Let $S_t := \sum_{i=1}^t X_i$ with $S_0 := 0$ and $M_t := \max\{S_i : 0 \leq i \leq t\}$.

a) For $r \geq 1$, use the reflection principle to show that
\[ \mathbb{P}[M_t \geq r, S_t = b] = \begin{cases} \mathbb{P}[S_t = b], & b \geq r, \\ (q/p)^{r-b}\,\mathbb{P}[S_t = 2r - b], & b < r. \end{cases} \]

b) Use a) to give a proof of the hitting-time theorem in the following special case: letting $\tau_b$ be the first time $S_t$ hits $b > 0$, show that for all $t \geq 1$,
\[ \mathbb{P}[\tau_b = t] = \frac{b}{t}\,\mathbb{P}[S_t = b]. \]
[Hint: Consider the probability $\mathbb{P}[M_{t-1} = S_{t-1} = b-1, S_t = b]$.]

Exercise 5.4 (Percolation on bounded-degree graphs). Let $G = (V, E)$ be a countable graph such that all vertices have degree bounded by $b+1$ for $b \geq 2$. Let 0 be a distinguished vertex in $G$. For bond percolation on $G$, prove that
\[ p_c(G) \geq p_c(\widehat{T}_b), \]
by bounding the expected size of the cluster of 0. [Hint: Consider self-avoiding paths started at 0.]

Exercise 5.5 (Percolation on $\widehat{T}_b$: critical exponent of $\theta(p)$). Consider bond percolation on the rooted infinite $b$-ary tree $\widehat{T}_b$ with $b > 2$. For $\varepsilon \in [0, 1 - \frac{1}{b}]$ and $u \in [0,1]$, define
\[ h(\varepsilon, u) := u - \Big[1 - \Big(\big(1 - \tfrac{1}{b} - \varepsilon\big) + \big(\tfrac{1}{b} + \varepsilon\big)(1 - u)\Big)^{\!b}\Big]. \]

a) Show that there is a constant $C > 0$ not depending on $\varepsilon, u$ such that
\[ \Big|h(\varepsilon, u) + b\varepsilon u - \frac{b-1}{2b}u^2\Big| \leq C(u^3 \vee \varepsilon u^2). \]

b) Use a) to prove that
\[ \lim_{p \downarrow p_c(\widehat{T}_b)} \frac{\theta(p)}{p - p_c(\widehat{T}_b)} = \frac{2b^2}{b-1}. \]

Exercise 5.6 (Percolation on $\widehat{T}_2$: higher moments of $|\mathcal{C}_0|$). Consider bond percolation on the rooted infinite binary tree $\widehat{T}_2$. For density $p < \frac12$, let $Z_p$ be an integer-valued random variable with distribution
\[ \mathbb{P}_p[Z_p = \ell] = \frac{\ell\,\mathbb{P}_p[|\mathcal{C}_0| = \ell]}{\mathbb{E}_p|\mathcal{C}_0|}, \quad \forall \ell \geq 1. \]

a) Using the explicit formula for $\mathbb{P}_p[|\mathcal{C}_0| = \ell]$ derived in Section 5.3.1, show that for all $0 < a < b < +\infty$,
\[ \mathbb{P}_p\Bigg[\frac{Z_p}{(1/4)(\frac12 - p)^{-2}} \in [a, b]\Bigg] \to C \int_a^b x^{-1/2} e^{-x}\,\mathrm{d}x, \]
as $p \uparrow \frac12$, for some constant $C > 0$.

b) Show that for all $k \geq 2$ there is $C_k > 0$ such that
\[ \lim_{p \uparrow p_c(\widehat{T}_2)} \frac{\mathbb{E}_p|\mathcal{C}_0|^k}{(p_c(\widehat{T}_2) - p)^{-1-2(k-1)}} = C_k. \]

c) What happens when $p \downarrow p_c(\widehat{T}_2)$?

Exercise 5.7 (Branching process approximation: improved bound). Let $p_n = \frac{\lambda}{n}$ with $\lambda > 0$. Let $W_{n,p_n}$, respectively $W_\lambda$, be the total progeny of a branching process with offspring distribution $\mathrm{Bin}(n, p_n)$, respectively $\mathrm{Poi}(\lambda)$.

a) Show that
\[ |\mathbb{P}[W_{n,p_n} \geq k] - \mathbb{P}[W_\lambda \geq k]| \leq \max\{\mathbb{P}[W_{n,p_n} \geq k, W_\lambda < k],\ \mathbb{P}[W_{n,p_n} < k, W_\lambda \geq k]\}. \]

b) Couple the two processes step-by-step and use a) to show that
\[ |\mathbb{P}[W_{n,p_n} \geq k] - \mathbb{P}[W_\lambda \geq k]| \leq \frac{\lambda^2}{n} \sum_{i=1}^{k-1} \mathbb{P}[W_\lambda \geq i]. \]

Exercise 5.8 (Subcritical Erdos-Renyi: Second moment). Prove Lemma 5.37.

Bibliographic remarks

Section 5.1 See [Dur10, Section 5.3.4] for a quick introduction to branching processes. A more detailed overview relating to its use in discrete probability can be found in [vdH14, Chapter 3]. The classical reference on branching processes is [AN04]. The Kesten-Stigum theorem is due to Kesten and Stigum [KS66a, KS66b, KS67]. Our proof of a weaker version with an $L^2$ condition follows [Dur10, Example 5.4.3]. Spitzer's combinatorial lemma is due to Spitzer [Spi56]. The proof presented here follows [Fel71, Section XII.6]. The hitting-time theorem was first proved by Otter [Ott49]. Several proofs of a generalization can be found in [Wen75]. The critical percolation threshold for percolation on Galton-Watson trees is due to R. Lyons [Lyo90].

Section 5.3 The presentation in Section 5.3.1 follows [vdH10]. See also [Dur85]. For much more on the phase transition of Erdos-Renyi graphs, see e.g. [vdH14, Chapter 4], [JLR11, Chapter 5] and [Bol01, Chapter 6]. In particular a central limit theorem for the giant component, proved by several authors including Martin-Lof [ML98], Pittel [Pit90], and Barraez, Boucheron, and de la Vega [BBFdlV00], is established in [vdH14, Section 4.5].
