
A COURSE IN PROBABILITY THEORY

THIRD EDITION

Kai Lai Chung
Stanford University

ACADEMIC PRESS
A Harcourt Science and Technology Company

San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo

This book is printed on acid-free paper.

COPYRIGHT © 2001, 1974, 1968 BY ACADEMIC PRESS

ALL RIGHTS RESERVED.

NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida 32887-6777.

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com

ACADEMIC PRESS
Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK
http://www.academicpress.com

Library of Congress Cataloging in Publication Data: 00-106712

International Standard Book Number: 0-12-174151-6

PRINTED IN THE UNITED STATES OF AMERICA

00 01 02 03 IP 9 8 7 6 5 4 3 2 1

Contents

Preface to the third edition
Preface to the second edition
Preface to the first edition

1  Distribution function
   1.1  Monotone functions
   1.2  Distribution functions
   1.3  Absolutely continuous and singular distributions

2  Measure theory
   2.1  Classes of sets
   2.2  Probability measures and their distribution functions

3  Random variable. Expectation. Independence
   3.1  General definitions
   3.2  Properties of mathematical expectation
   3.3  Independence

4  Convergence concepts
   4.1  Various modes of convergence
   4.2  Almost sure convergence; Borel-Cantelli lemma
   4.3  Vague convergence
   4.4  Continuation
   4.5  Uniform integrability; convergence of moments

5  Law of large numbers. Random series
   5.1  Simple limit theorems
   5.2  Weak law of large numbers
   5.3  Convergence of series
   5.4  Strong law of large numbers
   5.5  Applications
   Bibliographical Note

6  Characteristic function
   6.1  General properties; convolutions
   6.2  Uniqueness and inversion
   6.3  Convergence theorems
   6.4  Simple applications
   6.5  Representation theorems
   6.6  Multidimensional case; Laplace transforms
   Bibliographical Note

7  Central limit theorem and its ramifications
   7.1  Liapounov's theorem
   7.2  Lindeberg-Feller theorem
   7.3  Ramifications of the central limit theorem
   7.4  Error estimation
   7.5  Law of the iterated logarithm
   7.6  Infinite divisibility
   Bibliographical Note

8  Random walk
   8.1  Zero-or-one laws
   8.2  Basic notions
   8.3  Recurrence
   8.4  Fine structure
   8.5  Continuation
   Bibliographical Note

9  Conditioning. Markov property. Martingale
   9.1  Basic properties of conditional expectation
   9.2  Conditional independence; Markov property
   9.3  Basic properties of smartingales
   9.4  Inequalities and convergence
   9.5  Applications
   Bibliographical Note

Supplement: Measure and Integral
   1  Construction of measure
   2  Characterization of extensions
   3  Measures in R
   4  Integral
   5  Applications

General Bibliography

Index

Preface to the third edition

In this new edition, I have added a Supplement on Measure and Integral. The subject matter is first treated in a general setting pertinent to an abstract measure space, and then specified in the classic Borel-Lebesgue case for the real line. The latter material, an essential part of real analysis, is presupposed in the original edition published in 1968 and revised in the second edition of 1974. When I taught the course under the title "Advanced Probability" at Stanford University beginning in 1962, students from the departments of statistics, operations research (formerly industrial engineering), electrical engineering, etc. often had to take a prerequisite course given by other instructors before they enlisted in my course. In later years I prepared a set of notes, lithographed and distributed in the class, to meet the need. This forms the basis of the present Supplement. It is hoped that the result may as well serve in an introductory mode, perhaps also independently for a short course in the stated topics.

The presentation is largely self-contained with only a few particular references to the main text. For instance, after (the old) §2.1 where the basic notions of set theory are explained, the reader can proceed to the first two sections of the Supplement for a full treatment of the construction and completion of a general measure; the next two sections contain a full treatment of the mathematical expectation as an integral, of which the properties are recapitulated in §3.2. In the final section, application of the new integral to the older Riemann integral in calculus is described and illustrated with some famous examples. Throughout the exposition, a few side remarks, pedagogic, historical, even judgmental, of the kind I used to drop in the classroom, are approximately reproduced.

In drafting the Supplement, I consulted Patrick Fitzsimmons on several occasions for support. Giorgio Letta and Bernard Bru gave me encouragement for the uncommon approach to Borel's lemma in §3, for which the usual proof always left me disconsolate as being too devious for the novice's appreciation.

A small number of additional remarks and exercises have been added to the main text.

Warm thanks are due: to Vanessa Gerhard of Academic Press who deciphered my handwritten manuscript with great ease and care; to Isolde Field of the Mathematics Department for unfailing assistance; to Jim Luce for a mission accomplished. Last and evidently not least, my wife and my daughter Corinna performed numerous tasks indispensable to the undertaking of this publication.


Preface to the second edition

This edition contains a good number of additions scattered throughout the book as well as numerous voluntary and involuntary changes. The reader who is familiar with the first edition will have the joy (or chagrin) of spotting new entries. Several sections in Chapters 4 and 9 have been rewritten to make the material more adaptable to application in stochastic processes. Let me reiterate that this book was designed as a basic study course prior to various possible specializations. There is enough material in it to cover an academic year in class instruction, if the contents are taken seriously, including the exercises. On the other hand, the ordering of the topics may be varied considerably to suit individual tastes. For instance, Chapters 6 and 7 dealing with limiting distributions can be easily made to precede Chapter 5 which treats almost sure convergence. A specific recommendation is to take up Chapter 9, where conditioning makes a belated appearance, before much of Chapter 5 or even Chapter 4. This would be more in the modern spirit of an early weaning from the independence concept, and could be followed by an excursion into the Markovian territory.

Thanks are due to many readers who have told me about errors, obscurities, and inanities in the first edition. An incomplete record includes the names below (with apology for forgotten ones): Geoff Eagleson, Z. Govindarajulu, David Heath, Bruce Henry, Donald Iglehart, Anatole Joffe, Joseph Marker, P. Masani, Warwick Millar, Richard Olshen, S. M. Samuels, David Siegmund, T. Thedeen, A. Gonzalez Villalobos, Michel Weil, and Ward Whitt. The revised manuscript was checked in large measure by Ditlev Monrad. The galley proofs were read by David Kreps and myself independently, and it was fun to compare scores and see who missed what. But since not all parts of the old text have undergone the same scrutiny, readers of the new edition are cordially invited to continue the fault-finding. Martha Kirtley and Joan Shepard typed portions of the new material. Gail Lemmond took charge of the final page-by-page revamping and it was through her loving care that the revision was completed on schedule.

In the third printing a number of misprints and mistakes, mostly minor, are corrected. I am indebted to the following persons for some of these corrections: Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward Korn, Pierre van Moerbeke, David Siegmund.

In the fourth printing, an oversight in the proof of Theorem 6.3.1 is corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification made in (VII) of Section 9.5. A number of minor misprints are also corrected. I am indebted to several readers, including Asmussen, Robert, Schatte, Whitley and Yannaros, who wrote me about the text.


Preface to the first edition

A mathematics course is not a stockpile of raw material nor a random selection of vignettes. It should offer a sustained tour of the field being surveyed and a preferred approach to it. Such a course is bound to be somewhat subjective and tentative, neither stationary in time nor homogeneous in space. But it should represent a considered effort on the part of the author to combine his philosophy, conviction, and experience as to how the subject may be learned and taught. The field of probability is already so large and diversified that even at the level of this introductory book there can be many different views on orientation and development that affect the choice and arrangement of its content. The necessary decisions being hard and uncertain, one too often takes refuge by pleading a matter of "taste". But there is good taste and bad taste in mathematics just as in music, literature, or cuisine, and one who dabbles in it must stand judged thereby.

It might seem superfluous to emphasize the word "probability" in a book dealing with the subject. Yet on the one hand, one used to hear such specious utterance as "probability is just a chapter of measure theory"; on the other hand, many still use probability as a front for certain types of analysis such as combinatorial, Fourier, functional, and whatnot. Now a properly constructed course in probability should indeed make substantial use of these and other allied disciplines, and a strict line of demarcation need never be drawn. But PROBABILITY is still distinct from its tools and its applications not only in the final results achieved but also in the manner of proceeding. This is perhaps best seen in the advanced study of stochastic processes, but will already be abundantly clear from the contents of a general introduction such as this book.

Although many notions of probability theory arise from concrete models in applied sciences, recalling such familiar objects as coins and dice, genes and particles, a basic mathematical text (as this pretends to be) can no longer indulge in diverse applications, just as nowadays a course in real variables cannot delve into the vibrations of strings or the conduction of heat. Incidentally, merely borrowing the jargon from another branch of science without treating its genuine problems does not aid in the understanding of concepts or the mastery of techniques.

A final disclaimer: this book is not the prelude to something else and does not lead down a strait and righteous path to any unique fundamental goal. Fortunately nothing in the theory deserves such single-minded devotion, as apparently happens in certain other fields of mathematics. Quite the contrary, a basic course in probability should offer a broad perspective of the open field and prepare the student for various further possibilities of study and research. To this aim he must acquire knowledge of ideas and practice in methods, and dwell with them long and deeply enough to reap the benefits.

A brief description will now be given of the nine chapters, with some suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A synopsis of the requisite "measure and integration" is given in Chapter 2, together with certain supplements essential to probability theory. Chapter 1 is really a review of elementary real variables; although it is somewhat expendable, a reader with adequate background should be able to cover it swiftly and confidently, with something gained from the effort. For class instruction it may be advisable to begin the course with Chapter 2 and fill in from Chapter 1 as the occasions arise. Chapter 3 is the true introduction to the language and framework of probability theory, but I have restricted its content to what is crucial and feasible at this stage, relegating certain important extensions, such as shifting and conditioning, to Chapters 8 and 9. This is done to avoid overloading the chapter with definitions and generalities that would be meaningless without frequent application. Chapter 4 may be regarded as an assembly of notions and techniques of real function theory adapted to the usage of probability. Thus, Chapter 5 is the first place where the reader encounters bona fide theorems in the field. The famous landmarks shown there serve also to introduce the ways and means peculiar to the subject. Chapter 6 develops some of the chief analytical weapons, namely Fourier and Laplace transforms, needed for challenges old and new. Quick testing grounds are provided, but for major battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been called the "central problem" of classical probability theory. Time has marched on and the center of the stage has shifted, but this topic remains without doubt a crowning achievement. In Chapters 8 and 9 two different aspects of (discrete parameter) stochastic processes are presented in some depth. The random walks in Chapter 8 illustrate the way probability theory transforms other parts of mathematics. It does so by introducing the trajectories of a process, thereby turning what was static into a dynamic structure. The same revolution is now going on in potential theory by the injection of the theory of Markov processes. In Chapter 9 we return to fundamentals and strike out in major new directions. While Markov processes can be barely introduced in the limited space, martingales have become an indispensable tool for any serious study of contemporary work and are discussed here at length. The fact that these topics are placed at the end rather than the beginning of the book, where they might very well be, testifies to my belief that the student of mathematics is better advised to learn something old before plunging into the new.

A short course may be built around Chapters 2, 3, 4, selections from Chapters 5, 6, and the first one or two sections of Chapter 9. For a richer fare, substantial portions of the last three chapters should be given without skipping any one of them. In a class with solid background, Chapters 1, 2, and 4 need not be covered in detail. At the opposite end, Chapter 2 may be filled in with proofs that are readily available in standard texts. It is my hope that this book may also be useful to mature mathematicians as a gentle but not so meager introduction to genuine probability theory. (Often they stop just before things become interesting!) Such a reader may begin with Chapter 3, go at once to Chapter 5 with a few glances at Chapter 4, skim through Chapter 6, and take up the remaining chapters seriously to get a real feeling for the subject.

Several cases of exclusion and inclusion merit special comment. I chose to construct only a sequence of independent random variables (in Section 3.3), rather than a more general one, in the belief that the latter is better absorbed in a course on stochastic processes. I chose to postpone a discussion of conditioning until quite late, in order to follow it up at once with varied and worthwhile applications. With a little reshuffling Section 9.1 may be placed right after Chapter 3 if so desired. I chose not to include a fuller treatment of infinitely divisible laws, for two reasons: the material is well covered in two or three treatises, and the best way to develop it would be in the context of the underlying additive process, as originally conceived by its creator Paul Lévy. I took pains to spell out a peripheral discussion of the logarithm of characteristic function to combat the errors committed on this score by numerous existing books. Finally, and this is mentioned here only in response to a query by Doob, I chose to present the brutal Theorem 5.3.2 in the original form given by Kolmogorov because I want to expose the student to hardships in mathematics.

There are perhaps some new things in this book, but in general I have not striven to appear original or merely different, having at heart the interests of the novice rather than the connoisseur. In the same vein, I favor as a rule of writing (euphemistically called "style") clarity over elegance. In my opinion the slightly decadent fashion of conciseness has been overwrought, particularly in the writing of textbooks. The only valid argument I have heard for an excessively terse style is that it may encourage the reader to think for himself. Such an effect can be achieved equally well, for anyone who wishes it, by simply omitting every other sentence in the unabridged version.

This book contains about 500 exercises consisting mostly of special cases and examples, second thoughts and alternative arguments, natural extensions, and some novel departures. With a few obvious exceptions they are neither profound nor trivial, and hints and comments are appended to many of them. If they tend to be somewhat inbred, at least they are relevant to the text and should help in its digestion. As a bold venture I have marked a few of them with * to indicate a "must", although no rigid standard of selection has been used. Some of these are needed in the book, but in any case the reader's study of the text will be more complete after he has tried at least those problems.

Over a span of nearly twenty years I have taught a course at approximately the level of this book a number of times. The penultimate draft of the manuscript was tried out in a class given in 1966 at Stanford University. Because of an anachronism that allowed only two quarters to the course (as if probability could also blossom faster in the California climate!), I had to omit the second halves of Chapters 8 and 9 but otherwise kept fairly closely to the text as presented here. (The second half of Chapter 9 was covered in a subsequent course called "stochastic processes".) A good fraction of the exercises were assigned as homework, and in addition a great majority of them were worked out by volunteers. Among those in the class who cooperated in this manner and who corrected mistakes and suggested improvements are: Jack E. Clark, B. Curtis Eaves, Susan D. Horn, Alan T. Huckleberry, Thomas M. Liggett, and Roy E. Welsch, to whom I owe sincere thanks. The manuscript was also read by J. L. Doob and Benton Jamison, both of whom contributed a great deal to the final revision. They have also used part of the manuscript in their classes. Aside from these personal acknowledgments, the book owes of course to a large number of authors of original papers, treatises, and textbooks. I have restricted bibliographical references to the major sources while adding many more names among the exercises. Some oversight is perhaps inevitable; however, inconsequential or irrelevant "name-dropping" is deliberately avoided, with two or three exceptions which should prove the rule.

It is a pleasure to thank Rosemarie Stampfel and Gail Lemmond for their superb job in typing the manuscript.

1  Distribution function

1.1 Monotone functions

We begin with a discussion of distribution functions as a traditional way of introducing probability measures. It serves as a convenient bridge from elementary analysis to probability theory, upon which the beginner may pause to review his mathematical background and test his mental agility. Some of the methods as well as results in this chapter are also useful in the theory of stochastic processes.

In this book we shall follow the fashionable usage of the words "positive", "negative", "increasing", "decreasing" in their loose interpretation. For example, "x is positive" means "x ≥ 0"; the qualifier "strictly" will be added when "x > 0" is meant. By a "function" we mean in this chapter a real finite-valued one unless otherwise specified.

Let then f be an increasing function defined on the real line $(-\infty, +\infty)$. Thus for any two real numbers $x_1$ and $x_2$,

(1)  $x_1 < x_2 \implies f(x_1) \le f(x_2)$.

We begin by reviewing some properties of such a function. The notation "$t \uparrow x$" means "$t < x,\ t \to x$"; "$t \downarrow x$" means "$t > x,\ t \to x$".

(i) For each x, both unilateral limits

(2)  $\lim_{t \uparrow x} f(t) = f(x-)$ and $\lim_{t \downarrow x} f(t) = f(x+)$

exist and are finite. Furthermore the limits at infinity

$\lim_{t \downarrow -\infty} f(t) = f(-\infty)$ and $\lim_{t \uparrow +\infty} f(t) = f(+\infty)$

exist; the former may be $-\infty$, the latter may be $+\infty$. This follows from monotonicity; indeed

$f(x-) = \sup_{-\infty < t < x} f(t), \qquad f(x+) = \inf_{x < t < +\infty} f(t).$

(ii) For each x, f is continuous at x if and only if

$f(x-) = f(x) = f(x+).$

To see this, observe that the continuity of a monotone function f at x is equivalent to the assertion that

$\lim_{t \uparrow x} f(t) = f(x) = \lim_{t \downarrow x} f(t).$

By (i), the limits above exist as $f(x-)$ and $f(x+)$ and

(3)  $f(x-) \le f(x) \le f(x+),$

from which (ii) follows.

In general, we say that the function f has a jump at x iff the two limits in (2) both exist but are unequal. The value of f at x itself, viz. $f(x)$, may be arbitrary, but for an increasing f the relation (3) must hold. As a consequence of (i) and (ii), we have the next result.

(iii) The only possible kind of discontinuity of an increasing function is a jump. [The reader should ask himself what other kinds of discontinuity there are for a function in general.]

If there is a jump at x, we call x a point of jump of f and the number $f(x+) - f(x-)$ the size of the jump or simply "the jump" at x.

It is worthwhile to observe that points of jump may have a finite point of accumulation and that such a point of accumulation need not be a point of jump itself. Thus, the set of points of jump is not necessarily a closed set.


Example 1. Let $x_0$ be an arbitrary real number, and define a function f as follows:

$f(x) = \begin{cases} 0 & \text{for } x < x_0 - 1; \\ 1 - \dfrac{1}{n} & \text{for } x_0 - \dfrac{1}{n} \le x < x_0 - \dfrac{1}{n+1},\ n = 1, 2, \ldots; \\ 1 & \text{for } x \ge x_0. \end{cases}$

The point $x_0$ is a point of accumulation of the points of jump $\{x_0 - 1/n,\ n \ge 1\}$, but f is continuous at $x_0$.
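The following short Python sketch is ours, not the book's; it evaluates the f of Example 1 with the arbitrary choice $x_0 = 0$ and exhibits numerically that the jump sizes shrink to zero as the points of jump accumulate at $x_0$, while f is continuous at $x_0$ itself.

```python
import math

# Example 1 with x0 = 0 (our choice): f = 0 left of -1, f = 1 - 1/n on
# [-1/n, -1/(n+1)), and f = 1 from 0 onward.
def f(x, x0=0.0):
    if x >= x0:
        return 1.0
    if x < x0 - 1:
        return 0.0
    # here x0 - 1 <= x < x0, and x lies in [x0 - 1/n, x0 - 1/(n+1)) for n below
    n = math.floor(1 / (x0 - x))
    return 1.0 - 1.0 / n

eps = 1e-9
for n in range(2, 6):
    xj = -1.0 / n                      # a point of jump
    print(n, f(xj) - f(xj - eps))      # jump size 1/(n(n-1)), shrinking to 0
print(1.0 - f(-1e-12))                 # ~0: f is continuous at x0 = 0 itself
```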

Before we discuss the next example, let us introduce a notation that will be used throughout the book. For any real number t, we set

(4)  $\delta_t(x) = \begin{cases} 0 & \text{for } x < t, \\ 1 & \text{for } x \ge t. \end{cases}$

We shall call the function $\delta_t$ the point mass at t.

Example 2. Let {a" , n > 1) be any given enumeration of the set of all rationalnumbers, and let {b,,, n > 1) be a set of positive (>0) numbers such that b,, < 00 .

For instance, we may take b„ = 2" . Consider now

00(5)

f (x) =

b,, 8,,, (x) .

Since 0 < 8 a „ (x) < 1 for every n and x, the series in (5) is absolutely and uniformlyconvergent. Since each 5a , is increasing, it follows that if x 1 < x,,

.f (x2) - .f (xi) =

b„ [Sa„ (x2) - ba n (x1 )l >- 0 -

Hence f is increasing . Thanks to the uniform convergence (why?) we may deducethat for each x,

00(6 )

f (x+) - f (x-)

b„ [Sa,, (x+) - (San (x- )1 .„ =1

But for each n, the number in the square brackets above is 0 or I according as x ,A a,,or x = a" . Hence if x is different from all the a,, 's, each term on the right side of (6)vanishes ; on the other hand if x = a k , say, then exactly one term, that correspondingto n = k, does not vanish and yields the value bk for the whole series . This provesthat the function f has jumps at all the rational points and nowhere else .
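A hedged numerical sketch (ours): with a concrete finite enumeration of rationals standing in for $\{a_n\}$ and $b_n = 2^{-(n+1)}$ (indices from 0), the partial sums of (5) already show a jump of size $b_k$ at the rational $a_k$.

```python
from fractions import Fraction

# Enumerate rationals with denominators up to 5, in a fixed order without
# duplicates, as a stand-in for the full enumeration {a_n}.
seen, a = set(), []
for q in range(1, 6):
    for p in range(-2 * q, 2 * q + 1):
        r = Fraction(p, q)
        if r not in seen:
            seen.add(r)
            a.append(r)

def f(x):
    # partial sum of (5): sum of b_n over those n with a_n <= x
    return sum(2.0 ** -(n + 1) for n in range(len(a)) if x >= a[n])

k, eps = 10, Fraction(1, 10**9)
# the jump at a_k equals b_k (up to float rounding):
print(float(a[k]), f(a[k]) - f(a[k] - eps), 2.0 ** -(k + 1))
```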

This example shows that the set of points of jump of an increasing function may be everywhere dense; in fact the set of rational numbers in the example may be replaced by an arbitrary countable set without any change of the argument. We now show that the condition of countability is indispensable. By "countable" we mean always "finite (possibly empty) or countably infinite".

(iv) The set of discontinuities of f is countable.

We shall prove this by a topological argument of some general applicability. In Exercise 3 after this section another proof based on an equally useful counting argument will be indicated. For each point of jump x consider the open interval $I_x = (f(x-), f(x+))$. If x' is another point of jump and x < x', say, then there is a point $\tilde{x}$ such that $x < \tilde{x} < x'$. Hence by monotonicity we have

$f(x+) \le f(\tilde{x}) \le f(x'-).$

It follows that the two intervals $I_x$ and $I_{x'}$ are disjoint, though they may abut on each other if $f(x+) = f(x'-)$. Thus we may associate with the set of points of jump in the domain of f a certain collection of pairwise disjoint open intervals in the range of f. Now any such collection is necessarily a countable one, since each interval contains a rational number, so that the collection of intervals is in one-to-one correspondence with a certain subset of the rational numbers and the latter is countable. Therefore the set of discontinuities is also countable, since it is in one-to-one correspondence with the set of intervals associated with it.

(v) Let $f_1$ and $f_2$ be two increasing functions and D a set that is (everywhere) dense in $(-\infty, +\infty)$. Suppose that

$\forall x \in D:\ f_1(x) = f_2(x).$

Then $f_1$ and $f_2$ have the same points of jump of the same size, and they coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let $t_n \in D$, $t'_n \in D$, $t_n \uparrow x$, $t'_n \downarrow x$. Such sequences exist since D is dense. It follows from (i) that

(7)  $f_1(x-) = \lim_n f_1(t_n) = \lim_n f_2(t_n) = f_2(x-),$
     $f_1(x+) = \lim_n f_1(t'_n) = \lim_n f_2(t'_n) = f_2(x+).$

In particular

$\forall x:\ f_1(x+) - f_1(x-) = f_2(x+) - f_2(x-).$

The first assertion in (v) follows from this equation and (ii). Furthermore if $f_1$ is continuous at x, then so is $f_2$ by what has just been proved, and we have

$f_1(x) = f_1(x-) = f_2(x-) = f_2(x),$

proving the second assertion.

How can $f_1$ and $f_2$ differ at all? This can happen only when $f_1(x)$ and $f_2(x)$ assume different values in the interval $(f_1(x-), f_1(x+)) = (f_2(x-), f_2(x+))$. It will turn out in Chapter 2 (see in particular Exercise 21 of Sec. 2.2) that the precise value of f at a point of jump is quite unessential for our purposes and may be modified, subject to (3), to suit our convenience. More precisely, given the function f, we can define a new function $\tilde{f}$ in several different ways, such as

$\tilde{f}(x) = f(x-), \qquad \tilde{f}(x) = f(x+), \qquad \tilde{f}(x) = \frac{f(x-) + f(x+)}{2},$

and use one of these instead of the original one. The third modification is found to be convenient in Fourier analysis, but either one of the first two is more suitable for probability theory. We have a free choice between them and we shall choose the second, namely, right continuity.

(vi) If we put

$\forall x:\ \tilde{f}(x) = f(x+),$

then $\tilde{f}$ is increasing and right continuous everywhere.

Let us recall that an arbitrary function g is said to be right continuous at x iff $\lim_{t \downarrow x} g(t)$ exists and the limit, to be denoted by $g(x+)$, is equal to $g(x)$. To prove the assertion (vi) we must show that

$\forall x:\ \lim_{t \downarrow x} f(t+) = f(x+).$

This is indeed true for any f such that $f(t+)$ exists for every t. For then: given any $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that

$\forall s \in (x, x+\delta):\ |f(s) - f(x+)| \le \epsilon.$

Let $t \in (x, x + \delta)$ and let $s \downarrow t$ in the above, then we obtain

$|f(t+) - f(x+)| \le \epsilon,$

which proves that $\tilde{f}$ is right continuous. It is easy to see that it is increasing if f is so.

Let D be dense in $(-\infty, +\infty)$, and suppose that f is a function with the domain D. We may speak of the monotonicity, continuity, uniform continuity, and so on of f on its domain of definition if in the usual definitions we restrict ourselves to the points of D. Even if f is defined in a larger domain, we may still speak of these properties "on D" by considering the "restriction of f to D".

(vii) Let f be increasing on D, and define $\tilde{f}$ on $(-\infty, +\infty)$ as follows:

$\forall x:\ \tilde{f}(x) = \inf_{x < t \in D} f(t).$

Then $\tilde{f}$ is increasing and right continuous everywhere.

This is a generalization of (vi). $\tilde{f}$ is clearly increasing. To prove right continuity let an arbitrary $x_0$ and $\epsilon > 0$ be given. There exists $t_0 \in D$, $t_0 > x_0$, such that

$f(t_0) - \epsilon \le \tilde{f}(x_0) \le f(t_0).$

Hence if $t \in D$, $x_0 < t \le t_0$, we have

$0 \le f(t) - \tilde{f}(x_0) \le f(t_0) - \tilde{f}(x_0) \le \epsilon.$

This implies by the definition of $\tilde{f}$ that for $x_0 < x \le t_0$ we have

$0 \le \tilde{f}(x) - \tilde{f}(x_0) \le \epsilon.$

Since $\epsilon$ is arbitrary, it follows that $\tilde{f}$ is right continuous at $x_0$, as was to be shown.
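A small computational sketch of (vii) follows; it is ours, not the book's, and uses a finite grid as a stand-in for the dense set D, with an increasing f that steps at 0.

```python
# f~(x) = inf{ f(t) : t in D, t > x }, approximated over a finite grid D.
D = [k / 1000 for k in range(-2000, 2001)]
f = {t: (0.0 if t < 0 else 1.0) for t in D}

def f_tilde(x):
    tail = [f[t] for t in D if t > x]
    return min(tail) if tail else 1.0   # inf over t > x (f increasing, so min works)

print(f_tilde(-0.25), f_tilde(0.0))     # 0.0 and 1.0: right continuity at the step
```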

EXERCISES

1. Prove that for the f in Example 2 we have

$f(-\infty) = 0, \qquad f(+\infty) = \sum_{n=1}^{\infty} b_n.$

2. Construct an increasing function on $(-\infty, +\infty)$ with a jump of size one at each integer, and constant between jumps. Such a function cannot be represented as $\sum_{n=1}^{\infty} b_n \delta_{a_n}(x)$ with $b_n = 1$ for each n, but a slight modification will do. Spell this out.

*3. Suppose that f is increasing and that there exist real numbers A and B such that $\forall x: A \le f(x) \le B$. Show that for each $\epsilon > 0$, the number of jumps of size exceeding $\epsilon$ is at most $(B - A)/\epsilon$. Hence prove (iv), first for bounded f and then in general.

* indicates specially selected exercises (as mentioned in the Preface).



1.2 Distribution functions

If $\{a_j\}$ denotes the (countable) set of points of jump of the d.f. F and $b_j$ the size of the jump at $a_j$, then

$F(a_j) - F(a_j-) = b_j$

since $F(a_j+) = F(a_j)$.

Fd(x) =

(x)J

which represents the sum of all the jumps of F in the half-line (-00, x) . It is .clearly increasing, right continuous, with

Fd (-oc) = 0, Fd (+oo) =i

(3)

Hence Fd is a bounded increasing function . It should constitute the "jumpingpart" of F, and if it is subtracted out from F, the remainder should be positive,contain no more jumps, and so be continuous . These plausible statements willnow be proved -they are easy enough but not really trivial .

Theorem 1.2.1. Let

$F_c(x) = F(x) - F_d(x);$

then $F_c$ is positive, increasing, and continuous.

PROOF. Let x < x', then we have

(4)  $F_d(x') - F_d(x) = \sum_{x < a_j \le x'} b_j = \sum_{x < a_j \le x'} [F(a_j) - F(a_j-)] \le F(x') - F(x).$

It follows that both $F_d$ and $F_c$ are increasing, and if we put $x = -\infty$ in the above, we see that $F_d \le F$ and so $F_c$ is indeed positive. Next, $F_d$ is right continuous since each $\delta_{a_j}$ is and the series defining $F_d$ converges uniformly in x; the same argument yields (cf. Example 2 of Sec. 1.1)

$F_d(x) - F_d(x-) = \begin{cases} b_j & \text{if } x = a_j, \\ 0 & \text{otherwise}. \end{cases}$

Now this evaluation holds also if $F_d$ is replaced by F according to the definition of $a_j$ and $b_j$, hence we obtain for each x:

$F_c(x) - F_c(x-) = F(x) - F(x-) - [F_d(x) - F_d(x-)] = 0.$

This shows that $F_c$ is left continuous; since it is also right continuous, being the difference of two such functions, it is continuous.


Theorem 1.2.2. Let F be a d.f. Suppose that there exist a continuous function $G_c$ and a function $G_d$ of the form

$G_d(x) = \sum_j b'_j \delta_{a'_j}(x)$

[where $\{a'_j\}$ is a countable set of real numbers and $\sum_j |b'_j| < \infty$], such that

$F = G_c + G_d,$

then

$G_c = F_c, \qquad G_d = F_d,$

where $F_c$ and $F_d$ are defined as before.

PROOF. If $F_d \ne G_d$, then either the sets $\{a_j\}$ and $\{a'_j\}$ are not identical, or we may relabel the $a'_j$ so that $a'_j = a_j$ for all j but $b'_j \ne b_j$ for some j. In either case we have for at least one j, and $\bar{a} = a_j$ or $a'_j$:

$[F_d(\bar{a}) - F_d(\bar{a}-)] - [G_d(\bar{a}) - G_d(\bar{a}-)] \ne 0.$

Since $F_c - G_c = G_d - F_d$, this implies that

$F_c(\bar{a}) - G_c(\bar{a}) - [F_c(\bar{a}-) - G_c(\bar{a}-)] \ne 0,$

contradicting the fact that $F_c - G_c$ is a continuous function. Hence $F_d = G_d$ and consequently $F_c = G_c$.

DEFINITION. A d.f. F that can be represented in the form

$F = \sum_j b_j \delta_{a_j},$

where $\{a_j\}$ is a countable set of real numbers, $b_j > 0$ for every j and $\sum_j b_j = 1$, is called a discrete d.f. A d.f. that is continuous everywhere is called a continuous d.f.

Suppose $F_c \not\equiv 0$, $F_d \not\equiv 0$ in Theorem 1.2.1, then we may set $\alpha = F_d(\infty)$ so that $0 < \alpha < 1$,

$F_1 = \frac{1}{\alpha} F_d, \qquad F_2 = \frac{1}{1-\alpha} F_c,$

and write

(5)  $F = \alpha F_1 + (1 - \alpha) F_2.$

Now $F_1$ is a discrete d.f., $F_2$ is a continuous d.f., and F is exhibited as a convex combination of them. If $F_c \equiv 0$, then F is discrete and we set $\alpha = 1$, $F_1 \equiv F$, $F_2 \equiv 0$; if $F_d \equiv 0$, then F is continuous and we set $\alpha = 0$, $F_1 \equiv 0$, $F_2 \equiv F$; in either extreme case (5) remains valid. We may now summarize the two theorems above as follows.

Theorem 1.2.3. Every d.f. can be written as the convex combination of a discrete and a continuous one. Such a decomposition is unique.
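As a concrete, hedged illustration (ours, not the book's) of Theorem 1.2.3, take F to be an equal mixture of a two-point discrete d.f. and the uniform d.f. on [0, 1]; the jumps of F recover $\alpha F_1$ and the remainder is continuous.

```python
def F1(x):    # discrete d.f.: point mass 1/2 at 0 and 1/2 at 1 (our choice)
    return 0.0 if x < 0 else (0.5 if x < 1 else 1.0)

def F2(x):    # continuous d.f.: uniform distribution on [0, 1]
    return 0.0 if x < 0 else (x if x < 1 else 1.0)

alpha = 0.5
def F(x):     # convex combination as in (5)
    return alpha * F1(x) + (1 - alpha) * F2(x)

eps = 1e-12
for x in (0.0, 0.5, 1.0):
    print(x, F(x) - F(x - eps))   # jumps: 0.25 at 0, ~0 at 0.5, 0.25 at 1
```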

EXERCISES

1. Let F be a d.f. Then for each x,

$\lim_{\epsilon \downarrow 0} [F(x+\epsilon) - F(x-\epsilon)] = 0$

unless x is a point of jump of F, in which case the limit is equal to the size of the jump.

*2. Let F be a d.f. with points of jump $\{a_j\}$. Prove that the sum

$\sum_{x-\epsilon < a_j < x} [F(a_j) - F(a_j-)]$

converges to zero as $\epsilon \downarrow 0$, for every x. What if the summation above is extended to $x - \epsilon < a_j \le x$ instead? Give another proof of the continuity of $F_c$ in Theorem 1.2.1 by using this problem.

3. A plausible verbal definition of a discrete d.f. may be given thus: "It is a d.f. that has jumps and is constant between jumps." [Such a function is sometimes called a "step function", though the meaning of this term does not seem to be well established.] What is wrong with this? But suppose that the set of points of jump is "discrete" in the Euclidean topology, then the definition is valid (apart from our convention of right continuity).

4. For a general increasing function F there is a similar decomposition $F = F_c + F_d$, where both $F_c$ and $F_d$ are increasing, $F_c$ is continuous, and $F_d$ is "purely jumping". [HINT: Let a be a point of continuity, put $F_d(a) = F(a)$, add jumps in $(a, \infty)$ and subtract jumps in $(-\infty, a)$ to define $F_d$. Cf. Exercise 2 in Sec. 1.1.]

5. Theorem 1.2.2 can be generalized to any bounded increasing function. More generally, let f be the difference of two bounded increasing functions on $(-\infty, +\infty)$; such a function is said to be of bounded variation there. Define its purely discontinuous and continuous parts and prove the corresponding decomposition theorem.

*6. A point x is said to belong to the support of the d.f. F iff for every $\epsilon > 0$ we have $F(x + \epsilon) - F(x - \epsilon) > 0$. The set of all such x is called the support of F. Show that each point of jump belongs to the support, and that each isolated point of the support is a point of jump. Give an example of a discrete d.f. whose support is the whole line.


7. Prove that the support of any d.f. is a closed set, and the support of any continuous d.f. is a perfect set.

1.3 Absolutely continuous and singular distributions

Further analysis of d.f.'s requires the theory of Lebesgue measure. Throughout the book this measure will be denoted by m; "almost everywhere" on the real line without qualification will refer to it and be abbreviated to "a.e."; an integral written in the form $\int \ldots \,dt$ is a Lebesgue integral; a function f is said to be "integrable" in (a, b) iff

$\int_a^b f(t)\,dt$

is defined and finite [this entails, of course, that f be Lebesgue measurable]. The class of such functions will be denoted by $L^1(a, b)$, and $L^1(-\infty, \infty)$ is abbreviated to $L^1$. The complement of a subset S of an understood "space" such as $(-\infty, +\infty)$ will be denoted by $S^c$.

DEFINITION. A function F is called absolutely continuous [in $(-\infty, \infty)$ and with respect to the Lebesgue measure] iff there exists a function f in $L^1$ such that we have for every x < x':

(1)  $F(x') - F(x) = \int_x^{x'} f(t)\,dt.$

It follows from a well-known proposition (see, e.g., Natanson [3]*) that such a function F has a derivative equal to f a.e. In particular, if F is a d.f., then

(2)  $f \ge 0$ a.e. and $\int_{-\infty}^{\infty} f(t)\,dt = 1.$

Conversely, given any f in $L^1$ satisfying the conditions in (2), the function F defined by

(3)  $\forall x:\ F(x) = \int_{-\infty}^{x} f(t)\,dt$

is easily seen to be a d.f. that is absolutely continuous.
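A quick numerical check (ours, not the book's) of (1)-(3), for the assumed density $f(t) = e^{-t}$ on $(0, \infty)$ whose d.f. is $F(x) = 1 - e^{-x}$:

```python
import math

def f(t):
    return math.exp(-t) if t > 0 else 0.0    # an assumed density in L1

def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def integral(a, b, n=100_000):
    # midpoint Riemann sum, standing in for the Lebesgue integral of f over (a, b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

x, x2 = 0.3, 2.0
print(F(x2) - F(x), integral(x, x2))   # the two values agree to many decimals
```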

DEFINITION. A function F is called singular iff it is not identically zero and F' (exists and) equals zero a.e.

* Numbers in brackets refer to the General Bibliography.


The next theorem summarizes some basic facts of real function theory; see, e.g., Natanson [3].

Theorem 1.3.1. Let F be bounded increasing with $F(-\infty) = 0$, and let F' denote its derivative wherever existing. Then the following assertions are true.

(a) If S denotes the set of all x for which F'(x) exists with $0 \le F'(x) < \infty$, then $m(S^c) = 0$.

(b) This F' belongs to $L^1$, and we have for every x < x':

(4)  $\int_x^{x'} F'(t)\,dt \le F(x') - F(x).$

(c) If we put

$\forall x:\ F_{ac}(x) = \int_{-\infty}^{x} F'(t)\,dt, \qquad F_s(x) = F(x) - F_{ac}(x),$

then $F'_{ac} = F'$ a.e. so that $F'_s = F' - F'_{ac} = 0$ a.e. and consequently $F_s$ is singular if it is not identically zero.

DEFINITION. Any positive function f that is equal to F' a.e. is called a density of F. $F_{ac}$ is called the absolutely continuous part, $F_s$ the singular part of F. Note that the previous $F_d$ is part of $F_s$ as defined here.

It is clear that $F_{ac}$ is increasing and $F_{ac} \le F$. From (4) it follows that if x < x':

(5)  $F_s(x') - F_s(x) = F(x') - F(x) - \int_x^{x'} F'(t)\,dt \ge 0.$

Hence $F_s$ is also increasing and $F_s \le F$. We are now in a position to announce the following result, which is a refinement of Theorem 1.2.3.

Theorem 1.3.2. Every d.f. F can be written as the convex combination of a discrete, a singular continuous, and an absolutely continuous d.f. Such a decomposition is unique.

EXERCISES

1. A d.f. F is singular if and only if $F = F_s$; it is absolutely continuous if and only if $F \equiv F_{ac}$.

2. Prove Theorem 1.3.2.

*3. If the support of a d.f. (see Exercise 6 of Sec. 1.2) is of measure zero, then F is singular. The converse is false.



*4. Suppose that F is a d.f. and (3) holds with a continuous f. Then $F' = f \ge 0$ everywhere.

5. Under the conditions in the preceding exercise, the support of F is the closure of the set $\{t \mid f(t) > 0\}$; the complement of the support is the interior of the set $\{t \mid f(t) = 0\}$.

6. Prove that a discrete distribution is singular. [Cf. Exercise 13 of Sec. 2.2.]

7. Prove that a singular function as defined here is (Lebesgue) measurable but need not be of bounded variation even locally. [HINT: Such a function is continuous except on a set of Lebesgue measure zero; use the completeness of the Lebesgue measure.]

The remainder of this section is devoted to the construction of a singular continuous distribution. For this purpose let us recall the construction of the Cantor (ternary) set (see, e.g., Natanson [3]). From the closed interval [0,1], the "middle third" open interval $(\frac{1}{3}, \frac{2}{3})$ is removed; from each of the two remaining disjoint closed intervals the middle third, $(\frac{1}{9}, \frac{2}{9})$ and $(\frac{7}{9}, \frac{8}{9})$, respectively, are removed and so on. After n steps, we have removed

$1 + 2 + \cdots + 2^{n-1} = 2^n - 1$

disjoint open intervals and are left with $2^n$ disjoint closed intervals each of length $1/3^n$. Let these removed ones, in order of position from left to right, be denoted by $J_{n,k}$, $1 \le k \le 2^n - 1$, and their union by $U_n$. We have

(6)  $m(U_n) = \frac{1}{3} + \frac{2}{3^2} + \frac{4}{3^3} + \cdots + \frac{2^{n-1}}{3^n} = 1 - \left(\frac{2}{3}\right)^n.$

As $n \uparrow \infty$, $U_n$ increases to an open set U; the complement C of U with respect to [0,1] is a perfect set, called the Cantor set. It is of measure zero since

$m(C) = 1 - m(U) = 1 - 1 = 0.$

Now for each n and k, $n \ge 1$, $1 \le k \le 2^n - 1$, we put

$c_{n,k} = \frac{k}{2^n};$

and define a function F on U as follows:

(7)  $F(x) = c_{n,k}$ for $x \in J_{n,k}$.

This definition is consistent since two intervals, $J_{n,k}$ and $J_{n',k'}$, are either disjoint or identical, and in the latter case so are $c_{n,k} = c_{n',k'}$. The last assertion becomes obvious if we proceed step by step and observe that

$J_{n+1,2k} = J_{n,k}, \qquad c_{n+1,2k} = c_{n,k} \qquad \text{for } 1 \le k \le 2^n - 1.$

The value of F is constant on each $J_{n,k}$ and is strictly greater on any other $J_{n',k'}$ situated to the right of $J_{n,k}$. Thus F is increasing and clearly we have

$\lim_{x \downarrow 0} F(x) = 0, \qquad \lim_{x \uparrow 1} F(x) = 1.$

Let us complete the definition of F by setting

$F(x) = 0$ for $x < 0$, $F(x) = 1$ for $x > 1$.

F is now defined on the domain $D = (-\infty, 0) \cup U \cup (1, \infty)$ and increasing there. Since each $J_{n,k}$ is at a distance $\ge 1/3^n$ from any distinct $J_{n,k'}$ and the total variation of F over each of the $2^n$ disjoint intervals that remain after removing $J_{n,k}$, $1 \le k \le 2^n - 1$, is $1/2^n$, it follows that

$0 \le x' - x \le \frac{1}{3^n} \implies 0 \le F(x') - F(x) \le \frac{1}{2^n}.$

Hence F is uniformly continuous on D. By Exercise 5 of Sec. 1.1, there exists a continuous increasing $\tilde{F}$ on $(-\infty, +\infty)$ that coincides with F on D. This $\tilde{F}$ is a continuous d.f. that is constant on each $J_{n,k}$. It follows that $\tilde{F}' = 0$ on U and so also on $(-\infty, +\infty) - C$. Thus $\tilde{F}$ is singular. Alternatively, it is clear that none of the points in D is in the support of $\tilde{F}$, hence the latter is contained in C and of measure 0, so that $\tilde{F}$ is singular by Exercise 3 above. [In Exercise 13 of Sec. 2.2, it will become obvious that the measure corresponding to $\tilde{F}$ is singular because there is no mass in U.]
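A hedged numerical sketch (ours, anticipating the ternary-digit formula of Exercise 9 below): $\tilde{F}(x)$ can be computed by reading off ternary digits of x, halving the digits 0/2 into binary digits, and stopping at the first digit 1, which signals that x lies in one of the removed intervals $J_{n,k}$.

```python
def cantor_F(x, depth=40):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        d = int(x)                       # next ternary digit of x
        x -= d
        if d == 1:                       # x is inside a removed middle third
            return value + scale         # F~ takes the constant value c_{n,k} there
        value += scale * (d // 2)        # ternary digit 0/2 becomes binary 0/1
        scale /= 2
    return value

print(cantor_F(0.25), cantor_F(0.5), cantor_F(0.75))   # 1/3, 1/2, 2/3 (approx.)
```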

EXERCISES

The F in these exercises is the $\tilde{F}$ defined above.

8. Prove that the support of F is exactly C.

*9. It is well known that any point x in C has a ternary expansion without the digit 1:

$x = \sum_{n=1}^{\infty} \frac{a_n}{3^n}, \qquad a_n = 0 \text{ or } 2.$

Prove that for this x we have

$F(x) = \sum_{n=1}^{\infty} \frac{a_n}{2^{n+1}}.$


10. For each $x \in [0, 1]$, we have

$2F\left(\frac{x}{3}\right) = F(x), \qquad 2F\left(\frac{x}{3} + \frac{2}{3}\right) - 1 = F(x).$

11. Calculate

$\int_0^1 x^2\,dF(x), \qquad \int_0^1 e^{itx}\,dF(x).$

[HINT: This can be done directly or by using Exercise 10; for a third method see Exercise 9 of Sec. 5.3.]
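To indicate the method of Exercise 10 without spoiling Exercise 11, here is a hedged sketch of ours for the analogous first-moment computation: splitting [0, 1] according to the two halves of the self-similarity gives

$m := \int_0^1 x\,dF(x) = \int_0^1 \frac{y}{3}\,d\!\left(\frac{F(y)}{2}\right) + \int_0^1 \left(\frac{y}{3} + \frac{2}{3}\right) d\!\left(\frac{1 + F(y)}{2}\right) = \frac{m}{6} + \frac{m}{6} + \frac{1}{3},$

whence $m = 1/2$, as the symmetry of F about 1/2 also suggests. The integrals asked for in Exercise 11 yield to the same device.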

12. Extend the function F on [0,1] trivially to $(-\infty, \infty)$. Let $\{r_n\}$ be an enumeration of the rationals and

$G(x) = \sum_{n=1}^{\infty} \frac{1}{2^n} F(r_n + x).$

Show that G is a d.f. that is strictly increasing for all x and singular. Thus we have a singular d.f. with support $(-\infty, \infty)$.

*13. Consider F on [0,1]. Modify its inverse $F^{-1}$ suitably to make it single-valued in [0,1]. Show that $F^{-1}$ so modified is a discrete d.f. and find its points of jump and their sizes.

14. Given any closed set C in $(-\infty, +\infty)$, there exists a d.f. whose support is exactly C. [HINT: Such a problem becomes easier when the corresponding measure is considered; see Sec. 2.2 below.]

*15. The Cantor d .f. F is a good building block of "pathological"examples. For example, let H be the inverse of the homeomorphic map of [0,1 ]onto itself : x ---> 2[F +x] ; and E a subset of [0,1] which is not Lebesguemeasurable. Show that

1H(E)°H = lE

O0 1

where H(E) is the image of E, 1B is the indicator function of B, and o denotesthe composition of functions . Hence deduce: (1) a Lebesgue measurable func-tion of a strictly increasing and continuous function need not be Lebesguemeasurable ; (2) there exists a Lebesgue measurable function that is not Borelmeasurable .

2  Measure theory

2.1 Classes of sets

Let Q be an "abstract space", namely a nonempty set of elements to becalled "points" and denoted generically by co. Some of the usual opera-tions and relations between sets, together with the usual notation, are givenbelow .

Containing (for subsets of $\Omega$ as well as for collections thereof):

$E \subset F$, $F \supset E$  (not excluding $E = F$)
$\mathscr{A} \subset \mathscr{B}$, $\mathscr{B} \supset \mathscr{A}$  (not excluding $\mathscr{A} = \mathscr{B}$)

Union: $E \cup F$, $\bigcup_j E_j$
Intersection: $E \cap F$, $\bigcap_j E_j$
Complement: $E^c = \Omega \setminus E$
Difference: $E \setminus F = E \cap F^c$
Symmetric difference: $E \,\triangle\, F = (E \setminus F) \cup (F \setminus E)$
Singleton: $\{\omega\}$

Belonging (for elements as well as for sets):

$\omega \in E$, $E \in \mathscr{A}$

Empty set: $\emptyset$

The reader is supposed to be familiar with the elementary properties of these operations.

A nonempty collection $\mathscr{A}$ of subsets of $\Omega$ may have certain "closure properties". Let us list some of those used below; note that j is always an index for a countable set and that commas as well as semicolons are used to denote "conjunctions" of premises.

(i) $E \in \mathscr{A} \Rightarrow E^c \in \mathscr{A}$.
(ii) $E_1 \in \mathscr{A},\ E_2 \in \mathscr{A} \Rightarrow E_1 \cup E_2 \in \mathscr{A}$.
(iii) $E_1 \in \mathscr{A},\ E_2 \in \mathscr{A} \Rightarrow E_1 \cap E_2 \in \mathscr{A}$.
(iv) $\forall n \ge 2:\ E_j \in \mathscr{A},\ 1 \le j \le n \Rightarrow \bigcup_{j=1}^{n} E_j \in \mathscr{A}$.
(v) $\forall n \ge 2:\ E_j \in \mathscr{A},\ 1 \le j \le n \Rightarrow \bigcap_{j=1}^{n} E_j \in \mathscr{A}$.
(vi) $E_j \in \mathscr{A};\ E_j \subset E_{j+1},\ 1 \le j < \infty \Rightarrow \bigcup_{j=1}^{\infty} E_j \in \mathscr{A}$.
(vii) $E_j \in \mathscr{A};\ E_j \supset E_{j+1},\ 1 \le j < \infty \Rightarrow \bigcap_{j=1}^{\infty} E_j \in \mathscr{A}$.
(viii) $E_j \in \mathscr{A},\ 1 \le j < \infty \Rightarrow \bigcup_{j=1}^{\infty} E_j \in \mathscr{A}$.
(ix) $E_j \in \mathscr{A},\ 1 \le j < \infty \Rightarrow \bigcap_{j=1}^{\infty} E_j \in \mathscr{A}$.
(x) $E_1 \in \mathscr{A},\ E_2 \in \mathscr{A},\ E_1 \subset E_2 \Rightarrow E_2 \setminus E_1 \in \mathscr{A}$.

It follows from simple set algebra that under (i): (ii) and (iii) are equivalent; (vi) and (vii) are equivalent; (viii) and (ix) are equivalent. Also, (ii) implies (iv) and (iii) implies (v) by induction. It is trivial that (viii) implies (ii) and (vi); (ix) implies (iii) and (vii).

DEFINITION. A nonempty collection $\mathscr{A}$ of subsets of $\Omega$ is called a field iff (i) and (ii) hold. It is called a monotone class (M.C.) iff (vi) and (vii) hold. It is called a Borel field (B.F.) iff (i) and (viii) hold.
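For a finite $\Omega$ these defining properties can be checked mechanically. The sketch below is ours, not the book's; it brute-forces properties (i) and (ii) for a given finite collection.

```python
def is_field(omega, collection):
    A = {frozenset(s) for s in collection}
    if not A:
        return False                                         # a field is nonempty
    closed_c = all(frozenset(omega - s) in A for s in A)     # property (i)
    closed_u = all((s | t) in A for s in A for t in A)       # property (ii)
    return closed_c and closed_u

omega = frozenset({1, 2, 3, 4})
A = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
print(is_field(omega, A))   # True: the field generated by the partition {1,2}, {3,4}
```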

Theorem 2.1.1. A field is a B.F. if and only if it is also an M.C.

PROOF . The "only if" part is trivial ; to prove the "if" part we show that(iv) and (vi) imply (viii) . Let E1 E c/ for 1 < j < oo, then

2.1 CLASSES OF SETS 1 17

Page 32: A Course in Probability Theory KL Chung

18 1 MEASURE THEORY

by (iv), which holds in a field, Fn C F,,+1 and

00

00

U Ej = U Fj ;j=1

j=1

hence U~_ 1 Ej E cJ by (vi) .

The collection $\mathscr{J}$ of all subsets of $\Omega$ is a B.F. called the total B.F.; the collection of the two sets $\{\emptyset, \Omega\}$ is a B.F. called the trivial B.F. If A is any index set and if for every $\alpha \in A$, $\mathscr{F}_\alpha$ is a B.F. (or M.C.) then the intersection $\bigcap_{\alpha \in A} \mathscr{F}_\alpha$ of all these B.F.'s (or M.C.'s), namely the collection of sets each of which belongs to all $\mathscr{F}_\alpha$, is also a B.F. (or M.C.). Given any nonempty collection $\mathscr{C}$ of sets, there is a minimal B.F. (or field, or M.C.) containing it; this is just the intersection of all B.F.'s (or fields, or M.C.'s) containing $\mathscr{C}$, of which there is at least one, namely the $\mathscr{J}$ mentioned above. This minimal B.F. (or field, or M.C.) is also said to be generated by $\mathscr{C}$. In particular if $\mathscr{F}_0$ is a field there is a minimal B.F. (or M.C.) containing $\mathscr{F}_0$.

Theorem 2.1.2. Let $\mathscr{F}_0$ be a field, $\mathscr{G}$ the minimal M.C. containing $\mathscr{F}_0$, $\mathscr{F}$ the minimal B.F. containing $\mathscr{F}_0$; then $\mathscr{F} = \mathscr{G}$.

PROOF. Since a B.F. is an M.C., we have $\mathscr{F} \supset \mathscr{G}$. To prove $\mathscr{F} \subset \mathscr{G}$ it is sufficient to show that $\mathscr{G}$ is a B.F. Hence by Theorem 2.1.1 it is sufficient to show that $\mathscr{G}$ is a field. We shall show that it is closed under intersection and complementation. Define two classes of subsets of $\mathscr{G}$ as follows:

$\mathscr{C}_1 = \{E \in \mathscr{G} : E \cap F \in \mathscr{G} \text{ for all } F \in \mathscr{F}_0\},$

$\mathscr{C}_2 = \{E \in \mathscr{G} : E \cap F \in \mathscr{G} \text{ for all } F \in \mathscr{G}\}.$

The identities

$F \cap \bigcup_{j=1}^{\infty} E_j = \bigcup_{j=1}^{\infty} (F \cap E_j), \qquad F \cap \bigcap_{j=1}^{\infty} E_j = \bigcap_{j=1}^{\infty} (F \cap E_j)$

show that both $\mathscr{C}_1$ and $\mathscr{C}_2$ are M.C.'s. Since $\mathscr{F}_0$ is closed under intersection and contained in $\mathscr{G}$, it is clear that $\mathscr{F}_0 \subset \mathscr{C}_1$. Hence $\mathscr{G} \subset \mathscr{C}_1$ by the minimality of $\mathscr{G}$,

and so $\mathscr{G} = \mathscr{C}_1$. This means for any $F \in \mathscr{F}_0$ and $E \in \mathscr{G}$ we have $F \cap E \in \mathscr{G}$, which in turn means $\mathscr{F}_0 \subset \mathscr{C}_2$. Hence $\mathscr{G} = \mathscr{C}_2$ and this means $\mathscr{G}$ is closed under intersection.

Next, define another class of subsets of $\mathscr{G}$ as follows:

$\mathscr{C}_3 = \{E \in \mathscr{G} : E^c \in \mathscr{G}\}.$

The (De Morgan) identities

$\left(\bigcup_{j=1}^{\infty} E_j\right)^c = \bigcap_{j=1}^{\infty} E_j^c, \qquad \left(\bigcap_{j=1}^{\infty} E_j\right)^c = \bigcup_{j=1}^{\infty} E_j^c$

show that $\mathscr{C}_3$ is an M.C. Since $\mathscr{F}_0 \subset \mathscr{C}_3$, it follows as before that $\mathscr{G} = \mathscr{C}_3$, which means $\mathscr{G}$ is closed under complementation. The proof is complete.

Corollary. Let $\mathscr{F}_0$ be a field, $\mathscr{F}$ the minimal B.F. containing $\mathscr{F}_0$; $\mathscr{C}$ a class of sets containing $\mathscr{F}_0$ and having the closure properties (vi) and (vii), then $\mathscr{C}$ contains $\mathscr{F}$.

The theorem above is one of a type called monotone class theorems. They are among the most useful tools of measure theory, and serve to extend certain relations which are easily verified for a special class of sets or functions to a larger class. Many versions of such theorems are known; see Exercises 10, 11, and 12 below.

EXERCISES

*1. $(\bigcup_j A_j) \setminus (\bigcup_j B_j) \subset \bigcup_j (A_j \setminus B_j)$. $(\bigcap_j A_j) \setminus (\bigcap_j B_j) \subset \bigcup_j (A_j \setminus B_j)$. When is there equality?

*2. The best way to define the symmetric difference is through indicators of sets as follows:

$1_{A \triangle B} = 1_A + 1_B \pmod 2,$

where we have arithmetical addition modulo 2 on the right side. All properties of $\triangle$ follow easily from this definition, some of which are rather tedious to verify otherwise. As examples:

$(A \triangle B) \triangle C = A \triangle (B \triangle C),$
$(A \triangle B) \triangle (B \triangle C) = A \triangle C,$
$(A \triangle B) \triangle (C \triangle D) = (A \triangle C) \triangle (B \triangle D),$
$A \triangle B = C \Leftrightarrow A = B \triangle C,$
$A \triangle B = C \triangle D \Leftrightarrow A \triangle C = B \triangle D.$
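These identities are easily machine-checked; the following brute-force sketch (ours) verifies the first two over all subsets of a three-point set, using the fact that Python's set operator ^ is exactly the indicator addition mod 2 described above.

```python
from itertools import product

U = [frozenset(i for i, bit in enumerate(bits) if bit)
     for bits in product((0, 1), repeat=3)]            # all 8 subsets of {0,1,2}

sd = lambda X, Y: X ^ Y                                # symmetric difference

assert all(sd(sd(A, B), C) == sd(A, sd(B, C)) for A in U for B in U for C in U)
assert all(sd(sd(A, B), sd(B, C)) == sd(A, C) for A in U for B in U for C in U)
print("both identities hold for every A, B, C")
```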

3. If $\Omega$ has exactly n points, then $\mathscr{J}$ has $2^n$ members. The B.F. generated by n given sets "without relations among them" has $2^{2^n}$ members.

4. If $\Omega$ is countable, then $\mathscr{J}$ is generated by the singletons, and conversely. [HINT: All countable subsets of $\Omega$ and their complements form a B.F.]

5. The intersection of any collection of B.F.'s $\{\mathscr{F}_\alpha, \alpha \in A\}$ is the maximal B.F. contained in all of them; it is indifferently denoted by $\bigcap_{\alpha \in A} \mathscr{F}_\alpha$ or $\bigwedge_{\alpha \in A} \mathscr{F}_\alpha$.

*6. The union of a countable collection of B.F.'s $\{\mathscr{F}_j\}$ such that $\mathscr{F}_j \subset \mathscr{F}_{j+1}$ need not be a B.F., but there is a minimal B.F. containing all of them, denoted by $\bigvee_j \mathscr{F}_j$. In general $\bigvee_{\alpha \in A} \mathscr{F}_\alpha$ denotes the minimal B.F. containing all $\mathscr{F}_\alpha$, $\alpha \in A$. [HINT: $\Omega$ = the set of positive integers; $\mathscr{F}_j$ = the B.F. generated by those up to j.]

7. A B.F. is said to be countably generated iff it is generated by a countable collection of sets. Prove that if each $\mathscr{F}_j$ is countably generated, then so is $\bigvee_{j=1}^{\infty} \mathscr{F}_j$.

*8. Let $\mathscr{F}$ be a B.F. generated by an arbitrary collection of sets $\{E_\alpha, \alpha \in A\}$. Prove that for each $E \in \mathscr{F}$, there exists a countable subcollection $\{E_{\alpha_j}, j \ge 1\}$ (depending on E) such that E belongs already to the B.F. generated by this subcollection. [HINT: Consider the class of all sets with the asserted property and show that it is a B.F. containing each $E_\alpha$.]

9. If $\mathscr{F}$ is a B.F. generated by a countable collection of disjoint sets $\{A_n\}$, such that $\bigcup_n A_n = \Omega$, then each member of $\mathscr{F}$ is just the union of a countable subcollection of these $A_n$'s.

10. Let $\mathscr{D}$ be a class of subsets of $\Omega$ having the closure property (iii); let $\mathscr{A}$ be a class of sets containing $\Omega$ as well as $\mathscr{D}$, and having the closure properties (vi) and (x). Then $\mathscr{A}$ contains the B.F. generated by $\mathscr{D}$. (This is Dynkin's form of a monotone class theorem which is expedient for certain applications. The proof proceeds as in Theorem 2.1.2 by replacing $\mathscr{F}_0$ and $\mathscr{G}$ with $\mathscr{D}$ and $\mathscr{A}$ respectively.)

11. Take $\Omega = \mathscr{R}^n$ or a separable metric space in Exercise 10 and let $\mathscr{D}$ be the class of all open sets. Let $\mathscr{H}$ be a class of real-valued functions on $\Omega$ satisfying the following conditions.

(a) $1 \in \mathscr{H}$ and $1_D \in \mathscr{H}$ for each $D \in \mathscr{D}$;
(b) $\mathscr{H}$ is a vector space, namely: if $f_1 \in \mathscr{H}$, $f_2 \in \mathscr{H}$ and $c_1$, $c_2$ are any two real constants, then $c_1 f_1 + c_2 f_2 \in \mathscr{H}$;
(c) $\mathscr{H}$ is closed with respect to increasing limits of positive functions, namely: if $f_n \in \mathscr{H}$, $0 \le f_n \le f_{n+1}$ for all n, and $f = \lim_n \uparrow f_n < \infty$, then $f \in \mathscr{H}$.

Then $\mathscr{H}$ contains all Borel measurable functions on $\Omega$, namely all finite-valued functions measurable with respect to the topological Borel field (= the minimal B.F. containing all open sets of $\Omega$). [HINT: let $\mathscr{C} = \{E \subset \Omega : 1_E \in \mathscr{H}\}$; apply Exercise 10 to show that $\mathscr{C}$ contains the B.F. just defined. Each positive Borel measurable function is the limit of an increasing sequence of simple (finitely-valued) functions.]

12. Let $\mathscr{C}$ be an M.C. of subsets of $\mathscr{R}^n$ (or a separable metric space) containing all the open sets and closed sets. Prove that $\mathscr{C} \supset \mathscr{B}^n$ (the topological Borel field defined in Exercise 11). [HINT: Show that the minimal such class is a field.]

2.2 Probability measures and their distribution functions

Let $\Omega$ be a space, $\mathscr{F}$ a B.F. of subsets of $\Omega$. A probability measure $\mathscr{P}(\cdot)$ on $\mathscr{F}$ is a numerically valued set function with domain $\mathscr{F}$, satisfying the following axioms:

(i) $\forall E \in \mathscr{F}: \mathscr{P}(E) \ge 0$.
(ii) If $\{E_j\}$ is a countable collection of (pairwise) disjoint sets in $\mathscr{F}$, then

$\mathscr{P}\left(\bigcup_j E_j\right) = \sum_j \mathscr{P}(E_j).$

(iii) $\mathscr{P}(\Omega) = 1$.

The abbreviation "p.m." will be used for "probability measure". These axioms imply the following consequences, where all sets are members of $\mathscr{F}$.

(iv) $\mathscr{P}(E) \le 1$.
(v) $\mathscr{P}(\emptyset) = 0$.
(vi) $\mathscr{P}(E^c) = 1 - \mathscr{P}(E)$.
(vii) $\mathscr{P}(E \cup F) + \mathscr{P}(E \cap F) = \mathscr{P}(E) + \mathscr{P}(F)$.
(viii) $E \subset F \Rightarrow \mathscr{P}(E) = \mathscr{P}(F) - \mathscr{P}(F \setminus E) \le \mathscr{P}(F)$.
(ix) Monotone property. $E_n \uparrow E$ or $E_n \downarrow E \Rightarrow \mathscr{P}(E_n) \to \mathscr{P}(E)$.
(x) Boole's inequality. $\mathscr{P}(\bigcup_j E_j) \le \sum_j \mathscr{P}(E_j)$.


Axiom (ii) is called "countable additivity"; the corresponding axiom restricted to a finite collection $\{E_j\}$ is called "finite additivity".

The following proposition

(1)  $E_n \downarrow \emptyset \Rightarrow \mathscr{P}(E_n) \to 0$

is called the "axiom of continuity". It is a particular case of the monotone property (ix) above, which may be deduced from it or proved in the same way as indicated below.

PROOF. Let E„ ~ . We have the obvious identity :00

00

En = U (Ek\Ek+1) U n Ek.k=n

k=1

If E, 4, 0, the last term is the empty set . Hence if (ii) is assumed, we have

00Vn > 1 : J (En ) _

-,~)"(Ek\Ek+1)~k=n

the series being convergent, we have limn,,,,-,"P(En) = 0 . Hence (1) is true .Conversely, let {Ek , k > 1) be pairwise disjoint, then

00

U Ek ~ok=n+1

(why?) and consequently, if (1) is true, then

00lim :~, U Ek = 0 .

)t -4 00(k=n+1

Now if finite additivity is assumed, we haveC,

~'

=Ono+1U Ek = ?P U Ek + JP U Ek

\k=1

k=1

k=n+1

1

00

?11(Ek) + ~) U Ek -k=1

k=n+1

This shows that the infinite series Ek° 1 J'(Ek) converges as it is bounded bythe first member above . Letting n -+ oo, we obtain

Page 37: A Course in Probability Theory KL Chung

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 23

Hence (ii) is true .

Remark. For a later application (Theorem 3.3.4) we note the following extension. Let $\mathscr{P}$ be defined on a field $\mathscr{F}$ which is finitely additive and satisfies axioms (i), (iii), and (1). Then (ii) holds whenever $\bigcup_k E_k \in \mathscr{F}$. For then $\bigcup_{k=n+1}^{\infty} E_k$ also belongs to $\mathscr{F}$, and the second part of the proof above remains valid.

The triple $(\Omega, \mathscr{F}, \mathscr{P})$ is called a probability space (triple); $\Omega$ alone is called the sample space, and $\omega$ is then a sample point.

Let $\Delta \subset \Omega$, then the trace of the B.F. $\mathscr{F}$ on $\Delta$ is the collection of all sets of the form $\Delta \cap F$, where $F \in \mathscr{F}$. It is easy to see that this is a B.F. of subsets of $\Delta$, and we shall denote it by $\Delta \cap \mathscr{F}$. Suppose $\Delta \in \mathscr{F}$ and $\mathscr{P}(\Delta) > 0$; then we may define the set function $\mathscr{P}_\Delta$ on $\Delta \cap \mathscr{F}$ as follows:

$\forall E \in \Delta \cap \mathscr{F}:\ \mathscr{P}_\Delta(E) = \frac{\mathscr{P}(E)}{\mathscr{P}(\Delta)}.$

It is easy to see that $\mathscr{P}_\Delta$ is a p.m. on $\Delta \cap \mathscr{F}$. The triple $(\Delta, \Delta \cap \mathscr{F}, \mathscr{P}_\Delta)$ will be called the trace of $(\Omega, \mathscr{F}, \mathscr{P})$ on $\Delta$.

Example 1. Let $\Omega$ be a countable set: $\Omega = \{\omega_j, j \in J\}$, where J is a countable index set, and let $\mathscr{F}$ be the total B.F. of $\Omega$. Choose any sequence of numbers $\{p_j, j \in J\}$ satisfying

(2)  $\forall j \in J:\ p_j \ge 0; \qquad \sum_{j \in J} p_j = 1;$

and define a set function $\mathscr{P}$ on $\mathscr{F}$ as follows:

(3)  $\forall E \in \mathscr{F}:\ \mathscr{P}(E) = \sum_{\omega_j \in E} p_j.$

In words, we assign p j as the value of the "probability" of the singleton {wj }, andfor an arbitrary set of wj's we assign as its probability the sum of all the probabilitiesassigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied . Hence ~/, sodefined is a p.m .

Conversely, let any such ?T be given on ",T . Since (a)j} E :-'7 for every j, :~,,)({wj})is defined, let its value be pj . Then (2) is satisfied. We have thus exhibited all the

Page 38: A Course in Probability Theory KL Chung

24 1 MEASURE THEORY

possible p.m.'s on Q, or rather on the pair (S2, J) ; this will be called a discrete samplespace . The entire first volume of Feller's well-known book [13] with all its rich contentis based on just such spaces .

Example 2. Let

(0, 1 ], C the collection of intervals :

C'={(a,b] :0<a<b<1} ;

73 the minimal B.F. containing

m the Borel-Lebesgue measure on 3 . ThenA, m) is a probability space .Let Jo be the collection of subsets of °l/ each of which is the union of a finite

number of members of e. Thus a typical set B in ~0o is of the formn

B = U (aj ,b i ]

where a, < b 1 < a2 < b2 < . . . < a„ < bn,j=1

It is easily seen that 3o is a field that is generated by -P and in turn generates ~ .If we take ?/ = [0, 1] instead, then Ao is no longer a field since 2( 0 _'010, but

A and m may be defined as before. The new . is generated by the old _` 14 and thesingleton {0} .

Example 3 . Let 3' = (-oo, +oo), ' the collection of intervals of the form (a, b] .-oo < a < b < + oo. The field ?Zo generated by i' consists of finite unions of disjointsets of the form (a, b], (-oo, a] or (b, oo). The Euclidean B .F . .1 on 9.1 is the B .F .generated by or 7jo . A set in 7A' will be called a (linear) Borel set when there is nodanger of ambiguity . However, the Borel-Lebesgue measure m on R1 is not a p.m . ;indeed m(7') = +oo so that m is not a finite measure but it is a-finite on A0, namely :there exists a sequence of sets E„ E 730, E n f 91 with m(E n ) < oo for each n .

EXERCISES

1 . For any countably infinite set Q, the collection of its finite subsetsand their complements forms a field ~ . If we define OP(E) on ~ to be 0 or 1according as E is finite or not, then 9' is finitely additive but not countably so .

*2. Let Q be the space of natural numbers. For each E C Q let Nn (E)be the cardinality of the set E n [0, n] and let e' be the collection of E's forwhich the following limit exists :

;~'(E) = limN„ (E)

n oo

n

?/1 is finitely additive on (," and is called the "asymptotic density" of E. Let E _{all odd integers), F = {all odd integers in .[22n 22n+1] and all even integersin [22n+]22n+2] for n > 0} . Show that E E , F E -E, but E n F { . Hence

is not a field .

Page 39: A Course in Probability Theory KL Chung

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 25

3. In the preceding example show that for each real number a in [0, 1 ]there is an E in -C such that ^E) = a . Is the set of all primes in f ? Givean example of E that is not in -r .

4. Prove the nonexistence of a p.m. on (Q, ./'), where (S2, J) is as inExample 1, such that the probability of each singleton has the same value .Hence criticize a sentence such as : "Choose an integer at random" .

5. Prove that the trace of a B .F. 3A on any subset A of Q is a B .F .Prove that the trace of (S2, ~T, ,,~P) on any A in ~ is a probability space, ifP7(A) > 0 .

*6. Now let A 0 3 be such that

ACFE~'F= P(F)=1 .

Such a set is called thick in (S2, ~, 91) . If E = A f1 F, F E ~~7 , define ~P* (E) =~P(F) . Then GJ'* is a well-defined (what does it mean?) p .m. on (A, A l 3).This procedure is called the adjunction of A to (Q, ZT,

7. The B .F . A1 on ~1 is also generated by the class of all open intervalsor all closed intervals, or all half-lines of the form (-oo, a] or (a, oo), or theseintervals with rational endpoints . But it is not generated by all the singletonsof J l nor by any finite collection of subsets of R1 .

8. A1 contains every singleton, countable set, open set, closed set, Gsset, Fa set. (For the last two kinds of sets see, e.g ., Natanson [3] .)

* 9. Let 6 be a countable collection of pairwise disjoint subsets {Ej , j > 1 }of R1 , and let ~ be the B .F. generated by ,. Determine the most generalp.m. on zT- and show that the resulting probability space is "isomorphic" tothat discussed in Example 1 .

10. Instead of requiring that the Ed 's be pairwise disjoint, we may makethe broader assumption that each of them intersects only a finite number inthe collection . Carry through the rest of the problem .

The question of probability measures on J31 is closely related to thetheory of distribution functions studied in Chapter 1 . There is in fact a one-to-one correspondence between the set functions on the one hand, and the pointfunctions on the other . Both points of view are useful in probability theory .We establish first the easier half of this correspondence .

Lemma. Each p.m. ,a on J3 1 determines a d .f. F through the correspondence

(4)

Vx E ,W 1 :

x]) = F(x) .

Page 40: A Course in Probability Theory KL Chung

26 1 MEASURE THEORY

As a consequence, we have for -oo < a < b < +oo :

µ((a, b]) = F(b) - F(a),

(5)

µ((a, b)) = F(b-) - F(a),

IL([a, b)) = F(b-) - F(a-),

µ([a, b]) = F(b) - F(a-) .

Furthermore, let D be any dense subset of R.1 , then the correspondence isalready determined by that in (4) restricted to x E D, or by any of the fourrelations in (5) when a and b are both restricted to D .

PROOF . Let us write

dx E ( 1 : IX = (- 00, x] .

Then IX E A1 so that µ(Ix) is defined ; call it F(x) and so define the functionF on R 1 . We shall show that F is a d .f. as defined in Chapter 1 . First of all, Fis increasing by property (viii) of the measure . Next, if x, 4, x, then IX, 4 Ix,hence we have by (ix)

(6)

F(x,,) = µ(Ix„) 4 h(Ix) = F (x) .

Hence F is right continuous . [The reader should ascertain what changes shouldbe made if we had defined F to be left continuous .] Similarly as x ,[ -oo, Ix ,-0; as x T +oc, Ix T J? 1 . Hence it follows from (ix) again that

lim F(x) = lim µ(Ix ) = µ(Q1) = 0 ;x~-oo

x,j-00

x11n F(x) = xlim /-t (Ix ) = µ(S2) = 1 .

This ends the verification that F is a d .f. The relations in (5) follow easilyfrom the following complement to (4) :

g (( - oc, x)) = F(x-).

To see this let x„ < x and x„ T x. Since IX „ T (-oc, x), we have by (ix) :

F(x-) = lim F(x„) =

x,,)) T µ((-oo, x)) .11 -). 00

To prove the last sentence in the theorem we show first that (4) restrictedto x E D implies (4) unrestricted . For this purpose we note that µ((-oc, x]),as well as F(x), is right continuous as a function of x, as shown in (6) .Hence the two members of the equation in (4), being both right continuousfunctions of x and coinciding on a dense set, must coincide everywhere . Now

Page 41: A Course in Probability Theory KL Chung

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 27

suppose, for example, the second relation in (5) holds for rational a and b . Foreach real x let an , b„ be rational such that a„ ,(, -oc and b„ > x, b,, ,4, x. Thenµ((a,,, b„ )) -- µ((-oo, x]) and F(b"-) - F(a") -* F(x) . Hence (4) follows .

Incidentally, the correspondence (4) "justifies" our previous assumptionthat F be right continuous, but what if we have assumed it to be left continuous?

Now we proceed to the second-half of the correspondence .

Theorem 2.2.2. Each d .f. F determines a p .m . µ on 0 through any one ofthe relations given in (5), or alternatively through (4) .

This is the classical theory of Lebesgue-Stieltjes measure ; see, e.g .,Halmos [4] or Royden [5] . However, we shall sketch the basic ideas as animportant review. The d.f. F being given, we may define a set function forintervals of the form (a, b] by means of the first relation in (5) . Such a functionis seen to be countably additive on its domain of definition . (What does thismean?) Now we proceed to extend its domain of definition while preservingthis additivity. If S is a countable union of such intervals which are disjoint :

S = U(ai, bl ]

we are forced to define µ(S), if at all, by

µ(S) =

µ((ay, bi]) _

{F(bt) - F(ai)} .i

But a set S may be representable in the form above in different ways, sowe must check that this definition leads to no contradiction: namely that itdepends really only on the set S and not on the representation . Next, we noticethat any open interval (a, b) is in the extended domain (why?) and indeed theextended definition agrees with the second relation in (5) . Now it is wellknown that any open set U in JR1 is the union of a countable collection ofdisjoint open intervals [there is no exact analogue of this in R " for n > 1], sayU = U; (c; , d,); and this representation is unique . Hence again we are forcedto define µ(U), if at all, by

Lt(U) =

µ((ci, d,)) =

i

{F(d, - ) - F(c,)} .

Having thus defined the measure for all open sets, we find that its values forall closed sets are thereby also determined by property (vi) of a probabilitymeasure . In particular, its value for each singleton {a} is determined to beF(a) - F(a-), which is nothing but the jump of F at a. Now we also know itsvalue on all countable sets, and so on -all this provided that no contradiction

Page 42: A Course in Probability Theory KL Chung

28 1 MEASURE THEORY

is ever forced on us so far as we have gone . But even with the class of openand closed sets we are still far from the B .F . :- 9 . The next step will be the G8sets and the FQ sets, and there already the picture is not so clear . Althoughit has been shown to be possible to proceed this way by transfinite induction,this is a rather difficult task . There is a more efficient way to reach the goalvia the notions of outer and inner measures as follows . For any subset S ofj)l consider the two numbers :

µ*(S) =

inf µ(U),U open, UDS

A* (S) =

sup WC) .C closed, CcS

A* is the outer measure, µ* the inner measure (both with respect to the givenF) . It is clear that g* (S) > µ* (S) . Equality does not in general hold, but whenit does, we call S "measurable" (with respect to F) . In this case the commonvalue will be denoted by µ(S) . This new definition requires us at once tocheck that it agrees with the old one for all the sets for which p. has alreadybeen defined. The next task is to prove that : (a) the class of all measurablesets forms a B.F., say i; (b) on this I, the function ,a is a p .m. Details ofthese proofs are to be found in the references given above . To finish: since' is a B .F ., and it contains all intervals of the form (a, b], it contains theminimal B .F. A i with this property. It may be larger than A i , indeed it is(see below), but this causes no harm, for the restriction of µ to 3i is a p.m .whose existence is asserted in Theorem 2 .2 .2 .

Let us mention that the introduction of both the outer and inner measuresis useful for approximations . It follows, for example, that for each measurableset S and E > 0, there exists an open set U and a closed set C such thatUDS3C and

(7)

AM- E < /.c(S) < µ(C) + E .

There is an alternative way of defining measurability through the use of theouter measure alone and based on Caratheodory's criterion .

It should also be remarked that the construction described above for(J,i J3 1 µ) is that of a "topological measure space", where the B .F. is gener-ated by the open sets of a given topology on 9 1 , here the usual Euclideanone. In the general case of an "algebraic measure space", in which there is notopological structure, the role of the open sets is taken by an arbitrary field A,and a measure given on Z)b

be extended to the minimal B .F . ?~T containingin a similar way. In the case of M. l , such an 33 is given by the field Z4o of

sets, each of which is the union of a finite number of intervals of the form (a,b], (-oo, b], or (a, oo), where a E oRl, b E Mi . Indeed the definition of the

Page 43: A Course in Probability Theory KL Chung

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 29

outer measure given above may be replaced by the equivalent one :

(8)

A* (E) = inf

a(Un ) .n

where the infimum is taken over all countable unions Un Un such that eachUn E Mo and Un Un D E. For another case where such a construction isrequired see Sec . 3 .3 below .

There is one more question : besides the µ discussed above is there anyother p.m. v that corresponds to the given F in the same way? It is importantto realize that this question is not answered by the preceding theorem . It isalso worthwhile to remark that any p .m. v that is defined on a domain strictlycontaining A1 and that coincides with µ on ?Z1 (such as the µ on f asmentioned above) will certainly correspond to F in the same way, and strictlyspeaking such a v is to be considered as distinct from u . Hence we shouldphrase the question more precisely by considering only p .m.'s on A1 . Thiswill be answered in full generality by the next theorem .

Theorem 2.2.3 . Let µ and v be two measures defined on the same B .F. 97,which is generated by the field A. If either µ or v is a-finite on A, andµ(E) = v(E) for every E E A, then the same is true for every E E 97, andthus µ = v .

PROOF. We give the proof only in the case where µ and v are both finite,leaving the rest as an exercise . Let

C = {E E 37: µ(E) = v(E)},

then C 3 by hypothesis . But e is also a monotone class, for if En E forevery n and En T E or En 4, E, then by the monotone property of µ and v,respectively,

µ(E) = lim µ(En ) = lim v(E„) = v(E) .n

n

It follows from Theorem 2 .1 .2 that -C D 97, which proves the theorem .

Remark . In order that µ and v coincide on A, it is sufficient that theycoincide on a collection ze~ such that finite disjoint unions of members ofconstitute A .

Corollary . Let µ and v be a-finite measures on A1 that agree on all intervalsof one of the eight kinds : (a, b], (a, b), [a, b), [a, b], (-oc, b], (-oo, b), [a, oc),(a, oc) or merely on those with the endpoints in a given dense set D, thenthey agree on A' .

Page 44: A Course in Probability Theory KL Chung

30 1 MEASURE THEORY

PROOF . In order to apply the theorem, we must verify that any of thehypotheses implies that µ and v agree on a field that generates J3 . Let us takeintervals of the first kind and consider the field z~ 713o defined above . If µ and vagree on such intervals, they must agree on Ao by countable additivity . Thisfinishes the proof .

Returning to Theorems 2 .2.1 and 2.2.2, we can now add the followingcomplement, of which the first part is trivial .

Theorem 2.2.4 . Given the p .m . µ on A1 , there is a unique d .f . F satisfying(4) . Conversely, given the d .f. F, there is a unique p .m. satisfying (4) or anyof the relations in (5) .

We shall simply call µ the p.m. of F, and F the d.f. of ic .Instead of (R1 , ^ we may consider its restriction to a fixed interval [a,

b] . Without loss of generality we may suppose this to be 9l = [0, 1] so that weare in the situation of Example 2 . We can either proceed analogously or reduceit to the case just discussed, as follows . Let F be a d .f. such that F = 0 for x <0 and F = 1 for x > 1 . The probability measure u of F will then have supportin [0, 1], since u((-oo, 0)) = 0 =,u((1, oo)) as a consequence of (4) . Thusthe trace of (9.1 , ?A3 1 , µ) on J« may be denoted simply by (9I, A, µ), whereA is the trace of -7,3 1 on 9/. Conversely, any p .m. on J3 may be regarded assuch a trace . The most interesting case is when F is the "uniform distribution"on 9,/ :

0

for x < 0,F(x)= x

for 0<x< 1,1

for x > 1 .

The corresponding measure m on ",A is the usual Borel measure on [0, 1 ],while its extension on J as described in Theorem 2.2.2 is the usual Lebesguemeasure there . It is well known that / is actually larger than °7; indeed (J, m)is the completion of (.7, m) to be discussed below .

DEFINITION . The probability space (Q,

is said to be complete iffany subset of a set in -T with l(F) = 0 also belongs to ~T .

Any probability space (Q, T , J/') can be completed according to the nexttheorem. Let us call a set in J7T with probability zero a null set. A propertythat holds except on a null set is said to hold almost everywhere (a.e .), almostsurely (a.s.), or for almost every w .

Theorem 2.2.5 . Given the probability space (Q, 3 , U~'), there exists acomplete space (S2, , %/') such that :~i C ?T and /) = J~ on r~ .

Page 45: A Course in Probability Theory KL Chung

(9)

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 31

PROOF . Let "1 " be the collection of sets that are subsets of null sets, andlet ~~7 be the collection of subsets of Q each of which differs from a set in z~;7by a subset of a null set . Precisely :

~T={ECQ:EAFEA"

for some FE~7} .

It is easy to verify, using Exercise 1 of Sec . 2.1, that ~~ is a B .F. Clearly itcontains 27. For each E E 7, we put

o'(E) = -,:;P(F'),

where F is any set that satisfies the condition indicated in (7) . To show thatthis definition does not depend on the choice of such an F, suppose that

E A F 1 E .1l',

E A F2 E A'.

Then by Exercise 2 of Sec. 2 .1,

(E A Fl) o(E o F2) = (F1 A F2) o(E o E) = F1 A F2 .

Hence F 1 A F2 E J" and so 2P(F 1 A F2 ) = 0. This implies 7)(F 1 ) = GJ'(F 2 ),as was to be shown. We leave it as an exercise to show that 0 1 is a measureon o . If E E 97, then E A E = 0 E .A', hence GJ'(E) = GJ'(E) .

Finally, it is easy to verify that if E E 3 and 9171(E) = 0, then E E -A'' .

Hence any subset of E also belongs to . A l^ and so to JT . This proves that(Q, ~, J') is complete .

What is the advantage of completion? Suppose that a certain property,such as the existence of a certain limit, is known to hold outside a certain setN with J'(N) = 0 . Then the exact set on which it fails to hold is a subsetof N, not necessarily in 97 , but will be in k with 7(N) = 0 . We need themeasurability of the exact exceptional set to facilitate certain dispositions, suchas defining or redefining a function on it; see Exercise 25 below .

EXERCISES

In the following, g is a p.m. on %3 1 and F is its d .f.*11. An atom of any measure g on A1 is a singleton {x} such that

g({x}) > 0. The number of atoms of any a-finite measure is countable . Foreach x we have g({x}) = F(x) - F(x-).

12. g is called atomic iff its value is zero on any set not containing anyatom. This is the case if and only if F is discrete . g is without any atom oratomless if and only if F is continuous .

13. g is called singular iff there exists a set Z with m(Z) = 0 suchthat g (Z`') = 0 . This is the case if and only if F is singular . [Hurt: One halfis proved by using Theorems 1 .3 .1 and 2 .1 .2 to get fB F(x) dx < g(B) forB E A 1 ; the other requires Vitali's covering theorem .]

Page 46: A Course in Probability Theory KL Chung

32 1 MEASURE THEORY

14. Translate Theorem 1 .3 .2 in terms of measures .* 15 . Translate the construction of a singular continuous d .f. in Sec. 1 .3 in

terms of measures . [It becomes clearer and easier to describe!] Generalize theconstruction by replacing the Cantor set with any perfect set of Borel measurezero . What if the latter has positive measure? Describe a probability schemeto realize this .

16. Show by a trivial example that Theorem 2 .2.3 becomes false if thefield ~~ is replaced by an arbitrary collection that generates 27 .

17. Show that Theorem 2 .2.3 may be false for measures that are a-finiteon ~T . [HINT: Take Q to be 11, 2, . . . , oo} and ~~ to be the finite sets excludingoc and their complements, µ(E) = number of points in E, ,u,(oo) :A v(oo) .]

18. Show that the Z in (9) is also the collection of sets of the formF U N [or F\N] where F E and N E . 1' .

19. Let -1 - be as in the proof of Theorem 2 .2 .5 and Ao be the set of allnull sets in (Q, ZT, ~?). Then both these collections are monotone classes, andclosed with respect to the operation "\" .

*20. Let (Q, 37, 9') be a probability space and 91 a Borel subfield oftJ . Prove that there exists a minimal B .F. ~ satisfying 91 C S c ~ and1o c 3~, where J6 is as in Exercise 19 . A set E belongs to ~ if and only

if there exists a set F in 1 such that EAF E A0 . This O~ is called theaugmentation of j with respect to (Q, 97,9).

21. Suppose that F has all the defining properties of a d .f. except that itis not assumed to be right continuous . Show that Theorem 2.2 .2 and Lemmaremain valid with F replaced by F, provided that we replace F(x), F(b), F(a)in (4) and (5) by F(x+), F(b+), F(a+), respectively . What modification isnecessary in Theorem 2.2.4?

22. For an arbitrary measure ?/-' on a B .F. T, a set E in 3 is called anatom of J7' iff .(E) > 0 and F C E, F E T imply q'(F) = 9(E) or 9'(F) =0 . JT is called atomic iff its value is zero over any set in 3 that is disjointfrom all the atoms . Prove that for a measure µ on ?/3 1 this new definition isequivalent to that given in Exercise 11 above provided we identify two setswhich differ by a JI-null set .

23. Prove that if the p.m . JT is atomless, then given any a in [0, 1]there exists a set E E zT with JT(E) = a. [HINT : Prove first that there exists Ewith "arbitrarily small" probability . A quick proof then follows from Zorn'slemma by considering a maximal collection of disjoint sets, the sum of whoseprobabilities does not exceed a . But an elementary proof without using anymaximality principle is also possible .]

*24. A point x is said to be in the support of a measure µ on A" iffevery open neighborhood of x has strictly positive measure . The set of allsuch points is called the support of µ . Prove that the support is a closed setwhose complement is the maximal open set on which µ vanishes. Show that

Page 47: A Course in Probability Theory KL Chung

2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS 1 33

the support of a p .m. on A' is the same as that of its d.f., defined in Exercise 6of Sec. 1 .2 .

*25. Let f be measurable with respect to ZT, and Z be contained in a nullset. Define

_f t f

on Zc,1 K

on Z,

where K is a constant . Then f is measurable with respect to 7 provided that(Q, , Y) is complete . Show that the conclusion may be false otherwise .

Page 48: A Course in Probability Theory KL Chung

3 Random variable .Expectation . Independence

3 .1 General definitions

Let the probability space (Q, ~ , °J') be given . Ri = (-oo, -boo) the (finite)real line, A* = [-oc, +oo] the extended real line, A1 = the Euclidean Borelfield on A", A* = the extended Borel field . A set in U, * is just a set in Apossibly enlarged by one or both points +oo .

DEFINITION OF A RANDOM VARIABLE . A real, extended-valued random vari-able is a function X whose domain is a set A in ~ and whose range iscontained in M* = [-oo, +oc] such that for each B in A*, we have

(1) {N: X(60) E B} E A n :-T

where A n ~T is the trace of ~T- on A . A complex-valued random variable isa function on a set A in ~T to the complex plane whose real and imaginaryparts are both real, finite-valued random variables .

This definition in its generality is necessary for logical reasons in manyapplications, but for a discussion of basic properties we may suppose A = Qand that X is real and finite-valued with probability one . This restricted meaning

Page 49: A Course in Probability Theory KL Chung

3.1 GENERAL DEFINITIONS 1 35

of a "random variable", abbreviated as "r.v.", will be understood in the bookunless otherwise specified. The general case may be reduced to this one byconsidering the trace of (Q, ~T, ~') on A, or on the "domain of finiteness"A0 = {co: IX(w)J < oc), and taking real and imaginary parts .

Consider the "inverse mapping" X-1 from M,' to Q, defined (as usual)as follows :

VA C R' : X-1 (A) = {co: X (w) E A} .

Condition (1) then states that X -1 carries members of A1 onto members of .~:

(2)

dB E 9,3 1 : X-1 (B) E `T ;

or in the briefest notation :

X-11 )C 3;7 .

Such a function is said to be measurable (with respect to ) . Thus, an r .v. isjust a measurable function from S2 to R' (or R*) .

The next proposition, a standard exercise on inverse mapping, is essential .

Theorem 3.1 .1 . For any function X from Q to g1 (or R*), not necessarilyan r.v ., the inverse mapping X -1 has the following properties :

X-1 (A` ) _ (X-' (A))` .

X-1 (UAa) = UX-1 (A«),

X-1 nAa = nX- '(Ac,) .a

«

where a ranges over an arbitrary index set, not necessarily countable .

Theorem 3.1.2 . X is an r.v. if and only if for each real number x, or eachreal number x in a dense subset of cal, we have

{(0:X(0)) < x} E ;7-T-

PROOF . The preceding condition may be written as

(3)

Vx : X-1((-00,x]) E .

Consider the collection .~/ of all subsets S of R1 for which X -1 (S) E i. FromTheorem 3 .1 .1 and the defining properties of the Borel field f , it follows thatif S E c/, then

X-' (S`) = (X -1 (S))` E :'- ;

Page 50: A Course in Probability Theory KL Chung

36 1 RANDOM VARIABLE. EXPECTATION . INDEPENDENCE

if V j : Sj E /, then

µ /U B„ _

X -1 (us)j = UX-1 (Sj) E -~ .

l

Thus S` E c/ and U, Sj E c'/ and consequently c/ is a B .F. This B .F. containsall intervals of the form (-oc, x], which generate 73 1 even if x is restricted toa dense set, hence ,</ D 73 1 , which means that X-1 (B) E ~W- for each B E '3 1 .Thus X is an r.v. by definition . This proves the "if' part of the theorem ; the"only if' part is trivial .

Since 7'( •) is defined on 7T, the probability of the set in (1) is definedand will be written as

9'{X(co) E B} or (?I'{X E B) .

The next theorem relates the p.m. T to a p .m. on (W 1 , A 1 ) as discussedin Sec. 2.2 .

Theorem 3 .1.3 . Each r.v. on the probability space (S2, , ~T) induces aprobability space (7{ 1 , .73 1 , µ) by means of the following correspondence :

(4)

dB E A, 1 : µ(B) = T{X -1 (B)) =,'flX E B) .

PROOF . Clearly g(B) > 0 . If the B,,'s are disjoint sets in .73 1 , then theX-1 (Bn )'s are disjoint by Theorem 3 .1 .1 . Hence

X-1 (uB))n

= J/, UX-1 (B,l )n

n

9'(X-1(Bn)) =

g (Bn) .n n

Finally X-1 (J/' 1 ) = Q, hence µ(M. 1 ) = 1. Thus µ is a p.m .

The collection of sets {X-1 (S), S C 41} is a B .F. for any function X. If

X is a r .v . then the collection {X-1 (B), B E A') is called the B.F. generatedby X. It is the smallest Bore] subfield of T which contains all sets of the form{co: X (co) < x}, where x E :~? 1 . Thus (4) is a convenient way of representingthe measure .'7' when it is restricted to this subfield ; symbolically we maywrite it as follows :

IL P° X-1 .

This µ is called the "probability distribution measure" or p .m . of X, and itsassociated d.f. F according to Theorem 2 .2.4 will be called the d .f. of X .

Page 51: A Course in Probability Theory KL Chung

Specifically, F is given by

F(x) = A((- oo, x]) = 91{X < x) .

While the r .v. X determines lc and therefore F, the converse is obviouslyfalse . A family of r .v.'s having the same distribution is said to be "identicallydistributed" .

Example 1 . Let (S2, J) be a discrete sample space (see Example 1 of Sec . 2.2) .Every numerically valued function is an r .v .

Example 2 . (91, A, m) .In this case an r .v. is by definition just a Borel measurable function . According to

the usual definition, f on 91 is Borel measurable iff f -' (A3') C ?Z . In particular, thefunction f given by f (co) - w is an r.v. The two r .v .'s co and 1 - co are not identicalbut are identically distributed ; in fact their common distribution is the underlyingmeasure m.

Example 3 . (9' A' µ)The definition of a Borel measurable function is not affected, since no measure

is involved; so any such function is an r.v ., whatever the given p.m . µ may be . Asin Example 2, there exists an r .v . with the underlying µ as its p.m. ; see Exercise 3below .

We proceed to produce new r.v .'s from given ones .

Theorem 3.1.4 . IfX is an r.v ., f a Borel measurable function [on (R1 A 1 )],

then f (X) is an r.v .

PROOF . The quickest proof is as follows. Regarding the function f (X) ofco as the "composite mapping" :

f ° X: co -* f (X (w)),

we have (f ° X) -1 = X -1 ° f -1 and consequently

(f ° X)-1 0)= X-1(f-1 W)) C X-1 (A1) C

The reader who is not familiar with operations of this kind is advised to spellout the proof above in the old-fashioned manner, which takes only a littlelonger .

We must now discuss the notion of a random vector. This is just a vectoreach of whose components is an r .v . It is sufficient to consider the case of twodimensions, since there is no essential difference in higher dimensions apartfrom complication in notation .

3.1 GENERAL DEFINITIONS 1 37

Page 52: A Course in Probability Theory KL Chung

38

1 RANDOM VARIABLE

. EXPECTATION. INDEPENDENCE

We recall first that in the 2-dimensional Euclidean space : 2, or the plane,the Euclidean Borel field _32 is generated by rectangles of the form

{(x,y):a<x<b,c<y<d} .

A fortiori, it is also generated by product sets of the form

B1 xB2={(x,y):xEB1,yEB2},

where B1 and B2 belong to A1 . The collection of sets, each of which is a finiteunion of disjoint product sets, forms a field A2 . A function from g2 into M.1is called a Borel measurable function (of two variables) iff f-1( 1) C 2 .Written out, this says that for each 1-dimensional Borel set B, viz ., a memberof .A1, the set

{(x, y) : f (x, y) E B}

is a 2-dimensional Borel set, viz . a member of ',~62 .Now let X and Y be two r .v.'s on (Q, .J~ , ~P) . The random vector (X, Y)

induces a probability v on J.32 as follows :

(5)

VA E J.32 : v(A) = -':fl(X, Y) E A},

the right side being an abbreviation of J'({w : (X(w), Y(w)) E A}) . This vis called the (2-dimensional, probability) distribution or simply the p.m. of

(X, Y) .Let us also define, in imitation of X-1, the inverse mapping (X, Y)-1 by

the following formula :

VA E :%i32 : (X, Y)-1(A) _ {w : (X, Y) E A} .

This mapping has properties analogous to those of X-1 given inTheorem 3.1 .1, since the latter is actually true for a mapping of any twoabstract spaces . We can now easily generalize Theorem 3 .1 .4 .

Theorem 3.1.5. If X and Y are r.v.'s and f is a Borel measurable functionof two variables, then f (X, Y) is an r .v .

PROOF.

[f ° (X, Y)]-'( .:A1) = (X, Y)-1 ° f -'(?13') C (X, Y)-'(A') C ~7 .

The last inclusion says the inverse mapping (X, Y)-1 carries each 2-dimensional Borel set into a set in . This is proved as follows. If A =B1 x B2, where B1 E %31, B2 E A1, then it is clear that

(X Y)-'(A) = X-1(B1) f1 Y-' (B2) E ~T

Page 53: A Course in Probability Theory KL Chung

3 .1 GENERAL DEFINITIONS 1 39

by (2) . Now the collection of sets A in ?i2 for which (X, Y)- '(A) E T formsa B .F. by the analogue of Theorem 3 .1 .1 . It follows from what has just beenshown that this B .F. contains %~o hence it must also contain A2 . Hence eachset in /3 2 belongs to the collection, as was to be proved .

Here are some important special cases of Theorems 3 .1 .4 and 3 .1 .5 .Throughout the book we shall use the notation for numbers as well as func-tions :

(6)

x v y= max(x, y), x A y= min(x, y) .

Corollary . If X is an r.v . and f is a continuous function on R 1 , then f (X)is an r.v . ; in particular X r for positive integer r, 1XIr for positive real r, e-U,e"x for real A and t, are all r .v .'s (the last being complex-valued). If X and Yare r, v.' s, then

XvY, XAY, X+Y, X-Y, X •Y , X/Y

are r.v.'s, the last provided Y does not vanish .

Generalization to a finite number of r .v.'s is immediate . Passing to aninfinite sequence, let us state the following theorem, although its analogue inreal function theory should be well known to the reader .

Theorem 3.1 .6 . If {Xj , j > 1} is a sequence of r.v.'s, then

infX j , supXj, lim inf X j , lim supXjJ

J

J

j

are r.v.'s, not necessarily finite-valued with probability one though everywheredefined, and

lim X jj -+00

is an r.v . on the set A on which there is either convergence or divergence to+0O.

PROOF . To see, for example, that sup j X j is an r.v ., we need only observethe relation

vdx E Jai': {supX j < x} = n{Xj < x}J

J

and use Theorem 3 .1 .2. Since

lim sup Xj = inf (supX j ),I

n j>n

Page 54: A Course in Probability Theory KL Chung

40 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

and limj_,,,, Xj exists [and is finite] on the set where lim sup s Xj = lim infi Xj[and is finite], which belongs to ZT, the rest follows .

Here already we see the necessity of the general definition of an r .v .given at the beginning of this section .

DEFINITION . An r.v. X is called discrete (or countably valued) iff thereis a countable set B C R.1 such that 9'(X E B) = 1 .

It is easy to see that X is discrete if and only if its d .f. is . Perhaps it isworthwhile to point out that a discrete r .v. need not have a range that is discretein the sense of Euclidean topology, even apart from a set of probability zero .Consider, for example, an r.v. with the d .f. in Example 2 of Sec. 1 .1 .

The following terminology and notation will be used throughout the bookfor an arbitrary set 0, not necessarily the sample space .

DEFINTTION . For each A C Q, the function 1 ° (•) defined as follows :

b'w E S2 : IA(W) _ 1,

if w E A,- 0,

if w E Q\0,

is called the indicator (function) of 0 .

Clearly 1 ° is an r.v. if and only if A E 97 .A countable partition of 0 is a countable family of disjoint sets {Aj},

with Aj E zT for each j and such that Q = U; Aj . We have then

1=1i

More generally, let bj be arbitrary real numbers, then the function co definedbelow :

Vw E Q: cp(w) =

bj 1 A, (w),

is a discrete r.v . We shall call cp the r.v . belonging to the weighted partition{A j ; bj } . Each discrete r.v . X belongs to a certain partition . For let {bj } bethe countable set in the definition of X and let A j = {w: X (o)) = b1 }, then Xbelongs to the weighted partition {Aj ; bj } . If j ranges over a finite index set,the partition is called finite and the r.v. belonging to it simple .

EXERCISES

1. Prove Theorem 3 .1 .1 . For the "direct mapping" X, which of theseproperties of X -1 holds?

Page 55: A Course in Probability Theory KL Chung

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION ( 41

2. If two r.v.'s are equal a.e ., then they have the same p .m .*3. Given any p .m . µ on

define an r.v. whose p.m. is µ. Canthis be done in an arbitrary probability space?

* 4. Let 9 be uniformly distributed on [0,1] . For each d.f. F, define G(y) _sup{x : F(x) < y} . Then G(9) has the d.f. F .

*5. Suppose X has the continuous d .f . F, then F(X) has the uniformdistribution on [0,I]. What if F is not continuous?

6. Is the range of an r .v. necessarily Borel or Lebesgue measurable?7. The sum, difference, product, or quotient (denominator nonvanishing)

of the two discrete r.v.'s is discrete .8. If S2 is discrete (countable), then every r .v. is discrete. Conversely,

every r .v . in a probability space is discrete if and only if the p .m. is atomic .[Hi' r: Use Exercise 23 of Sec . 2.2.]

9. If f is Borel measurable, and X and Y are identically distributed, thenso are f (X) and f (Y) .

10. Express the indicators of A, U A2, A 1 fl A2, A 1 \A 2 , A 1 A A2,lim sup A„ , lim inf A, in terms of those of A 1 , A 2 , or A, [For the definitionsof the limits see Sec . 4.2 .]

*11 . Let T {X} be the minimal B .F. with respect to which X is measurable .Show that A E f {X} if and only if A = X -1 (B) for some B E A' . Is this Bunique? Can there be a set A ~ A1 such that A = X -1 (A)?

12. Generalize the assertion in Exercise 11 to a finite set of r .v.'s . [It ispossible to generalize even to an arbitrary set of r.v .'s.]

3 .2 Properties of mathematical expectation

The concept of "(mathematical) expectation" is the same as that of integrationin the probability space with respect to the measure ' . The reader is supposedto have some acquaintance with this, at least in the particular case (V, A, m)or %31 , in) . [In the latter case, the measure not being finite, the theoryof integration is slightly more complicated.] The general theory is not muchdifferent and will be briefly reviewed . The r.v.'s below will be tacitly assumedto be finite everywhere to avoid trivial complications .

For each positive discrete r.v. X belonging to the weighted partition{Aj ; bj }, we define its expectation to be

(1)

J`(X) = >bj {A;} .J

This is either a positive finite number or +oo . It is trivial that if X belongsto different partitions, the corresponding values given by (1) agree . Now let

Page 56: A Course in Probability Theory KL Chung

42 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

X be an arbitrary positive r.v. For any two positive integers m and n, the set

n

n + 1A mn = w:

2m< X (O)) < 2m

belongs to :A. For each m, let Xn, denote the r.v. belonging to the weightedpartition {An,n ; n /2"' ) ; thus Xm = n /2m if and only if n/2'n < X < (n +1)/2m . It is easy to see that we have for each m :

1` w: Xm ((0 ) < X.+i (w) ; 0 < X ( (0) - X m (w) < 2M

.

Consequently there is monotone convergence :

dw: lim Xm (cv) = X(w) .M-+00

The expectation Xm has just been defined ; it is00 n

n<X < n+1

2m

2m -

2mn=0

If for one value of m we have cf (Xm ) = +oo, then we define o (X) = +oc;otherwise we define

(X) = lim (,"(Xm ),m~ 00

the limit existing, finite or infinite, since cP(X,,,) is an increasing sequence ofreal numbers . It should be shown that when X is discrete, this new definitionagrees with the previous one .

For an arbitrary X, put as usual

(2)

X = X+ -X - where X+ = X v 0, X- = (-X) v 0.

Both X+ and X- are positive r.v .'s, and so their expectations are defined .Unless both c` (X+) and c'(X- ) are +oo, we define

(3)

(`(X) _ (X+) - "(X )

with the usual convention regarding oc . We say X has a finite or infiniteexpectation (or expected value) according as ~'(X) is a finite number or ±oo .In the expected case we shall say that the expectation of X does not exist . Theexpectation, when it exists, is also denoted by

[X(w)(dw) .JP

c~(Xn,)

More generally, for each A in

we define

(4)

f X (w)'"(dw) = ((XJA

1A)

Page 57: A Course in Probability Theory KL Chung

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION 1 43

and call it "the integral of X (with respect to 91) over the set A" . We shallsay that X is integrable with respect to JP over A iff the integral above existsand is finite .

In the case of (:T 1 , ?P3 1 , A), if we write X = f , (o = x, the integral

I X(w)9'(dw) = f f (x)li(dx)n

is just the ordinary Lebesgue-Stieltjes integral of f with respect to p . If Fis the d.f. of µ and A = (a, b], this is also written as

f (x) dF(x) .(a b]

This classical notation is really an anachronism, originated in the days when apoint function was more popular than a set function. The notation above mustthen amend to

1b+0 b+0 b-0 b-0

a+0 1-0 1+0

a-0

to distinguish clearly between the four kinds of intervals (a, b], [a, b], (a, b),[a, b) .

In the case of (2l, A, m), the integral reduces to the ordinary Lebesgueintegral

b

bf (x)m(dx) =

ff (x) dx .

a

a

Here m is atomless, so the notation is adequate and there is no need to distin-guish between the different kinds of intervals .

The general integral has the familiar properties of the Lebesgue integralon [0,1] . We list a few below for ready reference, some being easy conse-quences of others. As a general notation, the left member of (4) will beabbreviated to fn X dJ/' . In the following, X, Y are r.v.'s ; a, b are constants ;A is a set in ,~ .

(i) Absolute integrability . fn X d J~' is finite if and only if

IX I dJ' < oc .JA

(ii) Linearity .

f(aX+bY)cl=afXd+bf.°~' YdJA

provided that the right side is meaningful, namely not +oo - oc or -oo + oc.

Page 58: A Course in Probability Theory KL Chung

44 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

(iii) Additivity over sets . If the An 's are disjoint, then

JXd9'=~ f Xd~'X

u n An

n An

(iv) Positivity . If X > 0 a .e. on A, then

fn

X d °J' > 0 .

f(v) Monotonicity . If X1 < X < X2 a.e. on A, then

X i dam'< f XdGJ'< f X2d?P.n

n

n

Mean value theorem . If a < X < b a.e. on A, then

aGJ'(A) < f X d°J' < b°J'(A) .n

(vii) Modulus inequality .

JAX d°J'

n

?7J

(viii) Dominated convergence theorem. If lim n, X n = X a.e. or merelyin measure on A and `dn : IXn I < Y a.e. on A, with fA Y dY' < oc, then

(5)

lim f Xn dGJ' =f

X dJ' = f lim Xn d-OP .n moo A

JA

A n-'°°

(ix) Bounded convergence theorem . If lim n, X n = X a.e . or merelyin measure on A and there exists a constant M such that `'n : IX„ I < M a.e .on A, then (5) is true .

(x) Monotone convergence theorem . If X, > 0 and Xn T X a.e . on A,then (5) is again true provided that +oo is allowed as a value for eithermember. The condition "X,, > 0" may be weakened to : "6'(X„) > -oo forsome n".

(xi) Integration term by term . If

Ef IXn I dJ' < oo,n

fIXId .

then En IXn I < oc a.e. on A so that En Xn converges a.e. on A and

'Xn dJ' =

f Xn d?P.n

Page 59: A Course in Probability Theory KL Chung

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION 1 45

(xii) Fatou's lemma. If X, > 0 a.e. on A, then

Af ( lim X,) &l' < lim

X, 1 d:-Z'.n-+oc

n-+00 A

Let us prove the following useful theorem as an instructive example .

Theorem 3.2.1 . We have00

00

(6)

E (IXI >- n) < "( IXI) < 1 +

'( IXI >_ n)I1=1 n=1

so that (`(IXI) < oo if and only if the series above converges .

PROOF . By the additivity property (iii), if A 11 = { n < IXI < n + 1),

00

(`°(IXI) = Y,'f IXI dGJ' .n=0 nn

Hence by the mean value theorem (vi) applied to each set An00

(7)

ng'(An) < (IXI)

(n + 1)~P(An) = 1 +

n~P(An)jn=0

Thus we haveN

n=1

n=1

N

n=1

n=1

n=0

It remains to show00

(8)

n_,~P(An) _

( IXI > n),n=0

00

n=0

finite or infinite . Now the partial sums of the series on the left may be rear-ranged (Abel's method of partial summation!) to yield, for N > 1,

N

(9)

En{J~'(IXI > n) -'(IXI > n + 1)}n=0

- - 1)}"P(IXI >- n) -N~(IXI > N+ 1)

'(IXI > n) - NJ'(IXI > N + 1) .

N

00

n=0

(10) Ln'~,)(An) <

° (IXI > n)

.~/(An)+N,I'(IXI > N+ 1) .n=1

Page 60: A Course in Probability Theory KL Chung

46 I RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

Another application of the mean value theorem gives

N:~(IX I > N + 1) < f

IXI dT.(IXI?N+1)

Hence if c`(IXI) < oo, then the last term in (10) converges to zero as N --* 00and (8) follows with both sides finite . On the other hand, if c1(IXI) = o0, thenthe second term in (10) diverges with the first as N -* o0, so that (8) is alsotrue with both sides infinite .

Corollary. If X takes only positive integer values, then

~(X) =J'(X > n) .n

EXERCISES

1. If X > 0 a .e. on A and fA X dY = 0, then X = 0 a .e. on A .*2. If c (IXI) < oo and limn,x "J(Af ) = 0, then limn,oo fA X d.°J' = 0 .

In particular

lim

XdA=0Xn->oo f(IXI>n)

3. Let X > 0 and f 2 X d~P = A, 0 < A < oo . Then the set function vdefined on J;7 as follows :

v(A)= A fn

X dJT,

is a probability measure on ~ .4. Let c be a fixed constant, c > 0 . Then ct(IXI) < o0 if and only if

x

~ftXI > cn) < oo .

n=1

In particular, if the last series converges for one value of c, it converges forall values of c .

5. For any r > 0, (, (IX Ir) < oc if and only if

xnr-1 "T(IXI > n) < oc .

n=1

Page 61: A Course in Probability Theory KL Chung

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION ( 47

*6. Suppose that sup, IX„ I < Y on A with fA Y d?/' < oc. Deduce fromFatou's lemma :

Af (lim X,,) d'/' > lim [Xd'P .„ :

11+00

- n-+00 A

Show this is false if the condition involving Y is omitted .*7. Given the r.v. X with finite c< (X), and c > 0, there exists a simple r .v .

XE (see the end of Sec. 3 .1) such that

d ( IX - X E I) < E.

Hence there exists a sequence of simple r.v.'s Xm such that

lim ((IX - X,n 1)-0.M_+ 00

We can choose IX,,) so that IXm I < IX I for all m .

* 8. For any two sets A 1 and A 2 in ~T, define

p(A1, A2) _ 9'(A1 A A2) ;

then p is a pseudo-metric in the space of sets in 3~ 7; call the resulting metricspace M(27, J') . Prove that for each integrable r.v. X the mapping of M(J~ , ~')to t given by A -f fA X d2' is continuous . Similarly, the mappings onM(T, ?P) x M(3 , 7') to M(7 , P) given by

(A1, A2) --- A1 U A2, A1 fl A2, A1\A2, A1 A A2

are all continuous . If (see Sec. 4.2 below)

lim sup A„ = lim inf A„it

n

modulo a null set, we denote the common equivalence class of these two setsby lim„ A,, . Prove that in this case [A,,) converges to lim„ A,, in the metricp . Deduce Exercise 2 above as a special case .

/There is a basic relation between the abstract integral with respect to

' over sets in : T on the one hand, and the Lebesgue-Stieltjes integral withrespect to µ over sets in 1 on the other, induced by each r .v. We give theversion in one dimension first .

Theorem 3.2.2. Let X on (Q, ;~i , .T) induce the probability space%3 1 , µ) according to Theorem 3 .1 .3 and let f be Borel measurable . Then

we have

(11)

f f (X (cw));'/'(dw) =

f (x)A(dx)

provided that either side exists .

Page 62: A Course in Probability Theory KL Chung

48 1 RANDOM VARIABLE. EXPECTATION . INDEPENDENCE

PROOF. Let B E .%1 1 , and f = 1B, then the left side in (11) is ' X E B)and the right side is µ(B) . They are equal by the definition of p. in (4) ofSec. 3 .1 . Now by the linearity of both integrals in (11), it will also hold if

f= bj1 Bj ,

namely the r .v . on (MI, A) belonging to an arbitrary weighted partition{B j ; bj} . For an arbitrary positive Borel measurable function f we can define,as in the discussion of the abstract integral given above, a sequence f f, m >1 } of the form (12) such that f . .. T f everywhere. For each of them we have

(13)

[fm(X)d~'= f .fordµ ;

hence, letting in -+ oo and using the monotone convergence theorem, weobtain (11) whether the limits are finite or not . This proves the theorem forf > 0, and the general case follows in the usual way .

We shall need the generalization of the preceding theorem in severaldimensions. No change is necessary except for notation, which we will givein two dimensions . Instead of the v in (5) of Sec. 3 .1, let us write the "masselement" as it2 (dx, d y) so that

v(A) = ff 2 dxµ( , d y) .

A

Theorem 3.2.3 . Let (X, Y) on (Q, f, J') induce the probability space( , ,,2 , ;,~32 , µ2) and let f be a Borel measurable function of two variables .Then we have

( 14)

f f (X(w), Y(w)).o(dw) = ff f (x, y)l.2 (dx, dy) .s~

Note that f (X, Y) is an r.v . by Theorem 3 .1 .5 .

As a consequence of Theorem 3 .2 .2, we have : if µx and Fx denote,respectively, the p.m. and d .f. induced by X, then we have

~( (X) = f xµx (dx) = f x d Fx (x) ;~~

ocand more generally

(15)

(`(.f (X)) = f, . f (x)l.x(dx) = f .f (x) dFx(x)00

with the usual proviso regarding existence and finiteness .

Page 63: A Course in Probability Theory KL Chung

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION 1 49

Another important application is as follows : let µ2 be as in Theorem 3 .2.3and take f (x, y) to be x + y there . We obtain

(16)

i(X + Y) = f f (x + Y)Ii2(dx, dy)

= ff xµ2(dx, dy) + ff yµ2(dx, dy)

o~? z

JR~

On the other hand, if we take f (x, y) to be x or y, respectively, we obtain

(` (X) = JfxIL2(dx, dy), (`` (Y) = ff yh .2(dx, dy)

R2

9?2

and consequently

(17)

(-r,(X + Y) _ (-"(X) + e(Y) .

This result is a case of the linearity of oc' given but not proved here ; the proofabove reduces this property in the general case of (0, 97, ST) to the corre-sponding one in the special case (R2, ?Z2, µ2) . Such a reduction is frequentlyuseful when there are technical difficulties in the abstract treatment .

We end this section with a discussion of "moments" .Let a be real, r positive, then cf(IX - air) is called the absolute moment

of X of order r, about a. It may be +oo ; otherwise, and if r is an integer,~`((X - a)r) is the corresponding moment . If µ and F are, respectively, the

p.m. and d.f. of X, then we have by Theorem 3 .2 .2 :

('(IX - air) = f Ix - al rµ(dx) = f Ix - air dF(x),

cc.

00`((X - a)r) =

(x - a)rµ(dx) = f (x - a)r dF(x) .fJi

00

For r = 1, a = 0, this reduces to (`(X), which is also called the mean of X .The moments about the mean are called central moments . That of order 2 is

particularly important and is called the variance, var (X) ; its positive squareroot the standard deviation . 6(X):

var (X) = Q2 (X ) = c`'{ (X - `(X))2) = (,`(X2) { (X ))2 .

We note the inequality a2(X) < c`(X2), which will be used a good dealin Chapter 5 . For any positive number p, X is said to belong to LP =LP(Q, :~ , :J/') iff c (jXjP) < oo .

Page 64: A Course in Probability Theory KL Chung

50 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

The well-known inequalities of Holder and Minkowski (see, e.g .,Natanson [3]) may be written as follows . Let X and Y be r.v.'s, 1 < p < 00and l / p + l /q = 1, then

(18)

I f (XY)I < <`( IXYI) < Z(IXIP) 1 /PL(IYlq) 11q ,

(19)

{c(IX + YIP)} 1 IP < ( IX I p ) l/p + (`'( I YI P) 1 /P .

If Y - 1 in (18), we obtain

(20)

(-,C,-(IXI)- (r(IXI P ) 1/P ;

for p = 2, (18) is called the Cauchy-Schwarz inequality . Replacing IXI byIXI', where 0 < r < p, and writing r' = pr in (20) we obtain

(21)

,F(IXl r)l,r < <qIXI'') 11r, 0 < r < r' < oo .

The last will be referred to as the Liapounov inequality . It is a special case ofthe next inequality, of which we will sketch a proof .

Jensen's inequality. If cp is a convex function on

,i L l , and X and (p (X) areintegrable r.v.'s, then

(22)

PUP(X))) < €((p(X )) .

PROOF . Convexity means : for every positive ? .1, . . . , ~, 2 with sum 1 wehave

n

(23)

~0

jyj

jco(yj) .j=1 j=1

This is known to imply the continuity of gyp, so that cp(X) is an r.v . We shallprove (22) for a simple r .v. and refer the general case to Theorem 9 .1 .4. Letthen X take the value yj with probability A j, 1 < j < n . Then we have bydefinition (1) :

n

nF(X) _

jyj,

(co(X)) =

j~o(yj)1j=1 j=1

Thus (22) follows from (23) .

Finally, we prove a famous inequality that is almost trivial but veryuseful .

Chebyshev inequality . If ~p is a strictly positive and increasing functionon (0, oc), cp(u) = co(-u), and X is an r .v. such that 7{cp(X)} < oc, then for

Page 65: A Course in Probability Theory KL Chung

each u > 0 :

3.2 PROPERTIES OF MATHEMATICAL EXPECTATION 1 51

7{IXI >- u} <{~P(X)}

cp(u)

PROOF. We have by the mean value theorem :

flco(X )} =in

~o(X )dam' >_jX1>U)

I~o(X)d-,~P >_ ~o(u)'' f IXI >- u}

from which the inequality follows .

The most familiar application is when co(u) = I uI P for 0 < p < oo, sothat the inequality yields an upper bound for the "tail" probability in terms ofan absolute moment .

EXERCISES

9. Another proof of (14) : verify it first for simple r.v.'s and then useExercise 7 of Sec. 3.2 .

10. Prove that if 0 < r < r' and (,,1(IX I r') < oo, then (,(IX I r) < oo. Alsothat (r ( IX I r) < oo if and only if AIX - a I r ) < oo for every a .

*11 . If (r(X2 ) = 1 and cP(IXI) > a > 0, then UJ'{IXI > Xa} > (1 - X)2a2

for 0<A<1 .*12. If X > 0 and Y > 0, p_> 0, then e { (X + Y )P } < 2P { AXP) +

AYP)} . If p > 1, the factor 2P may be replaced by 2P -1 . If 0 < p < 1, itmay be replaced by 1 .

*13. If X j > 0, then

n

P

n

c~

< or c~ (XP

)

j=1

j=1

according as p < 1 or p > 1 .*14. If p > 1, we have

and so

P n1

<-

IXjP

cF'(IX1IP) ;

Page 66: A Course in Probability Theory KL Chung

52 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

we have also

Compare the inequalities .15. If p > 0, ( (iXi ) < oc, then xp?7-1{iXI > x} = o(l) as x -± oo .

Conversely, if xPJ' J IX I > x) = o(1), then ((IX IP-E) < oc for 0 < E < p.* 16. For any d.f. and any a > 0, we have

f00[F(x + a) - F(x)] dx = a.

00

17. If F is a d .f. such that F(0-) = 0, then00f/

{1 - F(x)} dx =J ~

x dF(x) < +oo .0

0

Thus if X is a positive r .v ., then we have00

c` (X)_ I

~ 3A{X > x} dx =fo

GJ'{X > x} dx .0

18. Prove that f"0,,, l x i d F(x) < oc if and only if

0F(x) dx < oc and

00

0

*19. If {X,} is a sequence of identically distributed r .v.'s with finite mean,then

1lim - c`'{ max iXj I } = 0 .n fl

I<j<n

[HINT : Use Exercise 17 to express the mean of the maximum .]20. For r > 1, we have

~~ 1 ((XAu r)du= r c`(Xl fi r ) .0 ur

r - 1

[HINT : By Exercise 17,

u'

uc (X A ur) =

f./'(X > x) dx =

°l'(X llr > v)rvr-1 dv,0

0

substitute and invert the order of the repeated integrations .]

[1 - F(x)] dx < oo .

Page 67: A Course in Probability Theory KL Chung

3.3 Independence

We shall now introduce a fundamental new concept peculiar to the theory ofprobability, that of "(stochastic) independence" .

DEFINITION OF INDEPENDENCE. The r.v.'s {X j, 1 < j < n} are said to be(totally) independent iff for any linear Borel sets {Bj, 1 < j < n } we have

n

n

(1)

°' n(Xj E Bj) = fl ?)(Xj E Bj) .j=1

j=1

The r.v.'s of an infinite family are said to be independent iff those in everyfinite subfamily are. They are said to be pairwise independent iff every twoof them are independent .

Note that (1) implies that the r.v.'s in every subset of {X j , 1 < j < n)are also independent, since we may take some of the Bj 's as 9. 1 . On the otherhand, (1) is implied by the apparently weaker hypothesis : for every set of realnumbers {xj , 1 < j < n }

{~(Xi<xi)

n

(2)

_=fl (Xj <xj).j=1

j=1

The proof of the equivalence of (1) and (2) is left as an exercise . In termsof the p .m . µn induced by the random vector (X 1 , . . . , X„) on (zRn , A"), andthe p.m.'s {µj , 1 < j < n } induced by each X j on (?P0, A'), the relation (1)may be written as

n

n

(3)

µn X Bj = 11µj(Bj),j=1

j=1

where X i=1 Bj is the product set B1 x . . . x Bn discussed in Sec . 3.1 . Finally,we may introduce the n-dimensional distribution function corresponding toµn , which is defined by the left side of (2) or in alternative notation :

nF(xi, . . .,xn)= .%fXj<xj,l<j<n}=µ" X(-oo,xj})

j=1

then (2) may be written asn

F(x1, . . .,xn)=11j=1

Fj(xj) .

3.3 INDEPENDENCE 1 53

Page 68: A Course in Probability Theory KL Chung

54 1 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

From now on when the probability space is fixed, a set in `T will also becalled an event . The events {E j , 1 < j < n } are said to be independent iff theirindicators are independent; this is equivalent to : for any subset {j 1, . . . j f } of11, . . . , n }, we have

(4)

Theorem 3 .3.1 . If {X j , 1 < j < n} are independent r.v .'s and {f j , 1 < j <n) are Borel measurable functions, then { f j (Xj ), 1 < j < n } are independentr.v .'s .

PROOF . Let A j E 231 , then f l 1 (Aj ) E ~ 1 by the definition of a Borelmeasurable function . By Theorem 3 .1 .1, we have

n

n

n{fj(Xj)EA1)=n{XjE f~ 1 (Aj)) .j=1

j=1

Hence we have

n

n

nJ~ n[f j(Xj ) E Aj

n{[x1 E f7 1 (A j )] _ f 2'{Xj E f ;1 (Aj)}j=1

j=1

j=1

11

11

f j(Xj) E Aj} .j=1

This being true for every choice of the A j's, the f j(X j)'s are independentby definition .

The proof of the next theorem is similar and is left as an exercise .

Theorem 3 .3.2 . Let 1 < n 1 < n2 < . . . < n k = n ; f 1 a Borel measurablefunction of n 1 variables, f2 one of n2 - n 1 variables, . . . , fk one of n k - n k_ 1

variables. If {X j , 1 < j < n} are independent r .v .'s then the k r.v.'s

fI(X1, . . .,X, ),f2(X,1+1, . . .,X11,), . . .,fk(Xnk_1+1, . . .,X, )

are independent.

Theorem 3.3.3 . If X and Y are independent and both have finite expecta-tions, then

(5)

(` (X Y) = (` (X) ~` (Y )

e

en E j k. = H ?1') (Ej k ) .k=1

k=1

Page 69: A Course in Probability Theory KL Chung

PROOF. We give two proofs in detail of this important result to illustrate the methods. Cf. the two proofs of (14) in Sec. 3.2, one indicated there, and one in Exercise 9 of Sec. 3.2.

First proof. Suppose first that the two r.v.'s $X$ and $Y$ are both discrete, belonging respectively to the weighted partitions $\{\Lambda_j; c_j\}$ and $\{M_k; d_k\}$ such that $\Lambda_j = \{X = c_j\}$, $M_k = \{Y = d_k\}$. Thus

$$\mathscr{E}(X) = \sum_j c_j\, \mathscr{P}(\Lambda_j), \qquad \mathscr{E}(Y) = \sum_k d_k\, \mathscr{P}(M_k).$$

Now we have

$$\Omega = \Big( \bigcup_j \Lambda_j \Big) \cap \Big( \bigcup_k M_k \Big) = \bigcup_{j,k} (\Lambda_j M_k)$$

and

$$X(\omega)Y(\omega) = c_j d_k \quad \text{if } \omega \in \Lambda_j M_k.$$

Hence the r.v. $XY$ is discrete and belongs to the superposed partition $\{\Lambda_j M_k; c_j d_k\}$ with both the $j$ and the $k$ varying independently of each other. Since $X$ and $Y$ are independent, we have for every $j$ and $k$:

$$\mathscr{P}(\Lambda_j M_k) = \mathscr{P}(X = c_j;\ Y = d_k) = \mathscr{P}(X = c_j)\,\mathscr{P}(Y = d_k) = \mathscr{P}(\Lambda_j)\,\mathscr{P}(M_k);$$

and consequently by definition (1) of Sec. 3.2:

$$\mathscr{E}(XY) = \sum_{j,k} c_j d_k\, \mathscr{P}(\Lambda_j M_k) = \Big( \sum_j c_j\, \mathscr{P}(\Lambda_j) \Big)\Big( \sum_k d_k\, \mathscr{P}(M_k) \Big) = \mathscr{E}(X)\,\mathscr{E}(Y).$$

Thus (5) is true in this case.

Now let $X$ and $Y$ be arbitrary positive r.v.'s with finite expectations. Then, according to the discussion at the beginning of Sec. 3.2, there are discrete r.v.'s $X_m$ and $Y_m$ such that $\mathscr{E}(X_m) \uparrow \mathscr{E}(X)$ and $\mathscr{E}(Y_m) \uparrow \mathscr{E}(Y)$. Furthermore, for each $m$, $X_m$ and $Y_m$ are independent. Note that for the independence of discrete r.v.'s it is sufficient to verify the relation (1) when the $B_j$'s are their possible values and "$\in$" is replaced by "$=$" (why?). Here we have

$$\mathscr{P}\left\{ X_m = \frac{n}{2^m};\ Y_m = \frac{n'}{2^m} \right\} = \mathscr{P}\left\{ \frac{n}{2^m} \le X < \frac{n+1}{2^m};\ \frac{n'}{2^m} \le Y < \frac{n'+1}{2^m} \right\}$$
$$= \mathscr{P}\left\{ \frac{n}{2^m} \le X < \frac{n+1}{2^m} \right\} \mathscr{P}\left\{ \frac{n'}{2^m} \le Y < \frac{n'+1}{2^m} \right\} = \mathscr{P}\left\{ X_m = \frac{n}{2^m} \right\} \mathscr{P}\left\{ Y_m = \frac{n'}{2^m} \right\}.$$

The independence of $X_m$ and $Y_m$ is also a consequence of Theorem 3.3.1, since $X_m = [2^m X]/2^m$, where $[X]$ denotes the greatest integer in $X$. Finally, it is clear that $X_m Y_m$ is increasing with $m$ and

$$0 \le XY - X_m Y_m = X(Y - Y_m) + Y_m(X - X_m) \downarrow 0.$$

Hence, by the monotone convergence theorem, we conclude that

$$\mathscr{E}(XY) = \lim_{m \to \infty} \mathscr{E}(X_m Y_m) = \lim_{m \to \infty} \mathscr{E}(X_m)\,\mathscr{E}(Y_m) = \lim_{m \to \infty} \mathscr{E}(X_m) \lim_{m \to \infty} \mathscr{E}(Y_m) = \mathscr{E}(X)\,\mathscr{E}(Y).$$

Thus (5) is true also in this case. For the general case, we use (2) and (3) of Sec. 3.2 and observe that the independence of $X$ and $Y$ implies that of $X^+$ and $Y^+$; $X^-$ and $Y^-$; and so on. This again can be seen directly or as a consequence of Theorem 3.3.1. Hence we have, under our finiteness hypothesis:

$$\mathscr{E}(XY) = \mathscr{E}((X^+ - X^-)(Y^+ - Y^-)) = \mathscr{E}(X^+Y^+ - X^+Y^- - X^-Y^+ + X^-Y^-)$$
$$= \mathscr{E}(X^+Y^+) - \mathscr{E}(X^+Y^-) - \mathscr{E}(X^-Y^+) + \mathscr{E}(X^-Y^-)$$
$$= \mathscr{E}(X^+)\mathscr{E}(Y^+) - \mathscr{E}(X^+)\mathscr{E}(Y^-) - \mathscr{E}(X^-)\mathscr{E}(Y^+) + \mathscr{E}(X^-)\mathscr{E}(Y^-)$$
$$= \{\mathscr{E}(X^+) - \mathscr{E}(X^-)\}\{\mathscr{E}(Y^+) - \mathscr{E}(Y^-)\} = \mathscr{E}(X)\,\mathscr{E}(Y).$$

The first proof is completed.

Second proof. Consider the random vector $(X, Y)$ and let the p.m. induced by it be $\mu^2(dx, dy)$. Then we have by Theorem 3.2.3:

$$\mathscr{E}(XY) = \int_\Omega XY\, d\mathscr{P} = \iint_{\mathscr{R}^2} xy\, \mu^2(dx, dy).$$

By (3), the last integral is equal to

$$\iint xy\, \mu_1(dx)\,\mu_2(dy) = \int x\, \mu_1(dx) \int y\, \mu_2(dy) = \mathscr{E}(X)\,\mathscr{E}(Y),$$

finishing the proof! Observe that we are using here a very simple form of Fubini's theorem (see below). Indeed, the second proof appears to be so much shorter only because we are relying on the theory of "product measure" $\mu^2 = \mu_1 \times \mu_2$ on $(\mathscr{R}^2, \mathscr{B}^2)$. This is another illustration of the method of reduction mentioned in connection with the proof of (17) in Sec. 3.2.


Corollary. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s with finite expectations, then

$$(6) \qquad \mathscr{E}\Big( \prod_{j=1}^{n} X_j \Big) = \prod_{j=1}^{n} \mathscr{E}(X_j).$$

This follows at once by induction from (5), provided we observe that the two r.v.'s

$$\prod_{j=1}^{k} X_j \quad \text{and} \quad \prod_{j=k+1}^{n} X_j$$

are independent for each $k$, $1 \le k \le n - 1$. A rigorous proof of this fact may be supplied by Theorem 3.3.2.
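The corollary invites a quick numerical sanity check. The following is a minimal Monte Carlo sketch (not part of the text; the sample size and the particular distributions are arbitrary choices) estimating both sides of (5), and of (6) with three factors:

```python
import random

random.seed(7)
N = 200_000

# Independent draws: X uniform on (0,1), Y exponential(1), Z uniform on {-1,+1}.
xs = [random.random() for _ in range(N)]
ys = [random.expovariate(1.0) for _ in range(N)]
zs = [random.choice((-1, 1)) for _ in range(N)]

def mean(a):
    return sum(a) / len(a)

# (5): E(XY) versus E(X)E(Y); both are close to 1/2 here.
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))

# (6) with n = 3: E(XYZ) versus E(X)E(Y)E(Z); both are close to 0 here.
print(mean([x * y * z for x, y, z in zip(xs, ys, zs)]),
      mean(xs) * mean(ys) * mean(zs))
```

Replacing `zs` by a function of `xs` destroys the independence and, in general, the agreement of the two sides.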

Do independent random variables exist? Here we can take the cue from the intuitive background of probability theory, which not only has given rise historically to this branch of mathematical discipline, but remains a source of inspiration, inculcating a way of thinking peculiar to the discipline. It may be said that no one could have learned the subject properly without acquiring some feeling for the intuitive content of the concept of stochastic independence, and through it, certain degrees of dependence. Briefly then: events are determined by the outcomes of random trials. If an unbiased coin is tossed and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it takes these two values with probability $\frac{1}{2}$ each. Repeated tossing will produce a sequence of outcomes. If now a die is cast, the outcome may be similarly represented by an r.v. taking the six values 1 to 6; again this may be repeated to produce a sequence. Next we may draw a card from a pack or a ball from an urn, or take a measurement of a physical quantity sampled from a given population, or make an observation of some fortuitous natural phenomenon, the outcomes in the last two cases being r.v.'s taking some rational values in terms of certain units; and so on. Now it is very easy to conceive of undertaking these various trials under conditions such that their respective outcomes do not appreciably affect each other; indeed it would take more imagination to conceive the opposite! In this circumstance, idealized, the trials are carried out "independently of one another" and the corresponding r.v.'s are "independent" according to definition. We have thus "constructed" sets of independent r.v.'s in varied contexts and with various distributions (although they are always discrete on realistic grounds), and the whole process can be continued indefinitely.

Can such a construction be made rigorous? We begin by an easy special case.


Example 1. Let $n \ge 2$ and $(\Omega_j, \mathscr{F}_j, \mathscr{P}_j)$ be $n$ discrete probability spaces. We define the product space

$$\Omega^n = \Omega_1 \times \cdots \times \Omega_n \quad (n \text{ factors})$$

to be the space of all ordered $n$-tuples $\omega^n = (\omega_1, \ldots, \omega_n)$, where each $\omega_j \in \Omega_j$. The product B.F. $\mathscr{F}^n$ is simply the collection of all subsets of $\Omega^n$, just as $\mathscr{F}_j$ is that for $\Omega_j$. Recall that (Example 1 of Sec. 2.2) the p.m. $\mathscr{P}_j$ is determined by its value for each point of $\Omega_j$. Since $\Omega^n$ is also a countable set, we may define a p.m. $\mathscr{P}^n$ on $\mathscr{F}^n$ by the following assignment:

$$(7) \qquad \mathscr{P}^n(\{\omega^n\}) = \prod_{j=1}^{n} \mathscr{P}_j(\{\omega_j\}),$$

namely, to the $n$-tuple $(\omega_1, \ldots, \omega_n)$ the probability to be assigned is the product of the probabilities originally assigned to each component $\omega_j$ by $\mathscr{P}_j$. This p.m. will be called the product measure derived from the p.m.'s $\{\mathscr{P}_j, 1 \le j \le n\}$ and denoted by $\times_{j=1}^{n} \mathscr{P}_j$. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if $S_j \in \mathscr{F}_j$, $1 \le j \le n$, then

$$\mathscr{P}^n\Big( \underset{j=1}{\overset{n}{\times}} S_j \Big) = \prod_{j=1}^{n} \mathscr{P}_j(S_j).$$

To see this, we observe that the left side is, by definition, equal to

$$(8) \qquad \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \mathscr{P}^n(\{\omega_1, \ldots, \omega_n\}) = \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \prod_{j=1}^{n} \mathscr{P}_j(\{\omega_j\}) = \prod_{j=1}^{n} \Big( \sum_{\omega_j \in S_j} \mathscr{P}_j(\{\omega_j\}) \Big) = \prod_{j=1}^{n} \mathscr{P}_j(S_j),$$

the second equation being a matter of simple algebra.

Now let $X_j$ be an r.v. (namely an arbitrary function) on $\Omega_j$; $B_j$ be an arbitrary Borel set; and $S_j = X_j^{-1}(B_j)$, namely:

$$S_j = \{\omega_j \in \Omega_j : X_j(\omega_j) \in B_j\},$$

so that $S_j \in \mathscr{F}_j$. We have then by (8):

$$(9) \qquad \mathscr{P}^n\Big( \underset{j=1}{\overset{n}{\times}} \{X_j \in B_j\} \Big) = \mathscr{P}^n\Big( \underset{j=1}{\overset{n}{\times}} S_j \Big) = \prod_{j=1}^{n} \mathscr{P}_j(S_j) = \prod_{j=1}^{n} \mathscr{P}_j\{X_j \in B_j\}.$$

To each function $X_j$ on $\Omega_j$ let correspond the function $\widetilde{X}_j$ on $\Omega^n$ defined below, in which $\omega = (\omega_1, \ldots, \omega_n)$ and each "coordinate" $\omega_j$ is regarded as a function of the point $\omega$:

$$\forall \omega \in \Omega^n: \quad \widetilde{X}_j(\omega) = X_j(\omega_j).$$


Then we have

$$\bigcap_{j=1}^{n} \{\omega : \widetilde{X}_j(\omega) \in B_j\} = \underset{j=1}{\overset{n}{\times}} \{\omega_j : X_j(\omega_j) \in B_j\},$$

since

$$\{\omega : \widetilde{X}_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j : X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n.$$

It follows from (9) that

$$\mathscr{P}^n\Big( \bigcap_{j=1}^{n} \{\widetilde{X}_j \in B_j\} \Big) = \prod_{j=1}^{n} \mathscr{P}^n\{\widetilde{X}_j \in B_j\}.$$

Therefore the r.v.'s $\{\widetilde{X}_j, 1 \le j \le n\}$ are independent.
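Example 1 can be carried out concretely on a machine. Here is a minimal sketch (an illustration only; the two spaces and the functions playing the roles of $X_1$, $X_2$ are arbitrary choices) that builds the product measure (7) for $n = 2$ and verifies the factorization (9):

```python
from itertools import product

# Two discrete probability spaces: a biased coin and a three-point space.
P1 = {"H": 0.3, "T": 0.7}
P2 = {0: 0.2, 1: 0.5, 2: 0.3}

# Product measure on ordered pairs, as in (7).
Pn = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}
assert abs(sum(Pn.values()) - 1.0) < 1e-12  # it is indeed a p.m.

# r.v.'s on the factors, lifted to the product space via the coordinates.
def X1(w): return 1 if w[0] == "H" else 0   # depends on the first coordinate
def X2(w): return w[1] % 2                  # depends on the second coordinate

# P{X1 = a; X2 = b} factors into the product of the marginals, as in (9).
for a in (0, 1):
    for b in (0, 1):
        joint = sum(p for w, p in Pn.items() if X1(w) == a and X2(w) == b)
        m1 = sum(p for w, p in Pn.items() if X1(w) == a)
        m2 = sum(p for w, p in Pn.items() if X2(w) == b)
        assert abs(joint - m1 * m2) < 1e-12
print("factorization (9) verified")
```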

Example 2. Let $\mathscr{U}^n$ be the $n$-dimensional cube (immaterial whether it is closed or not):

$$\mathscr{U}^n = \{(x_1, \ldots, x_n) : 0 \le x_j \le 1;\ 1 \le j \le n\}.$$

The trace on $\mathscr{U}^n$ of $(\mathscr{R}^n, \mathscr{B}^n, m^n)$, where $\mathscr{R}^n$ is the $n$-dimensional Euclidean space, $\mathscr{B}^n$ and $m^n$ the usual Borel field and measure, is a probability space. The p.m. $m^n$ on $\mathscr{U}^n$ is a product measure having the property analogous to (8). Let $\{f_j, 1 \le j \le n\}$ be $n$ Borel measurable functions of one variable, and

$$X_j((x_1, \ldots, x_n)) = f_j(x_j).$$

Then $\{X_j, 1 \le j \le n\}$ are independent r.v.'s. In particular if $f_j(x_j) \equiv x_j$, we obtain the $n$ coordinate variables in the cube. The reader may recall the term "independent variables" used in calculus, particularly for integration in several variables. The two usages have some accidental rapport.

Example 3. The point of Example 2 is that there is a ready-made product measure there. Now on $(\mathscr{R}^n, \mathscr{B}^n)$ it is possible to construct such a one based on given p.m.'s on $(\mathscr{R}^1, \mathscr{B}^1)$. Let these be $\{\mu_j, 1 \le j \le n\}$; we define $\mu^n$ for product sets, in analogy with (8), as follows:

$$\mu^n\Big( \underset{j=1}{\overset{n}{\times}} B_j \Big) = \prod_{j=1}^{n} \mu_j(B_j).$$

It remains to extend this definition to all of $\mathscr{B}^n$, or, more logically speaking, to prove that there exists a p.m. $\mu^n$ on $\mathscr{B}^n$ that has the above "product property". The situation is somewhat more complicated than in Example 1, just as Example 3 in Sec. 2.2 is more complicated than Example 1 there. Indeed, the required construction is exactly that of the corresponding Lebesgue-Stieltjes measure in $n$ dimensions. This will be subsumed in the next theorem. Assuming that it has been accomplished, then sets of $n$ independent r.v.'s can be defined just as in Example 2.


Example 4. Can we construct r.v.'s on the probability space $(\mathscr{U}, \mathscr{B}, m)$ itself, without going to a product space? Indeed we can, but only by imbedding a product structure in $\mathscr{U}$. The simplest case will now be described and we shall return to it in the next chapter.

For each real number $x$ in $(0, 1]$, consider its binary digital expansion

$$x = .\epsilon_1 \epsilon_2 \cdots \epsilon_n \cdots = \sum_{n=1}^{\infty} \frac{\epsilon_n}{2^n}, \qquad \text{each } \epsilon_n = 0 \text{ or } 1.$$

This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes. For the sake of definiteness, let us agree that only expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ of $x$ is a function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let $\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set

$$\{x : \epsilon_j(x) = c_j,\ 1 \le j \le n\} = \bigcap_{j=1}^{n} \{x : \epsilon_j(x) = c_j\}$$

is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s, thus

$$x = .c_1 c_2 \cdots c_n \epsilon_{n+1} \epsilon_{n+2} \cdots$$

with the digits from the $(n+1)$st on completely arbitrary. It is clear that this set is just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand for each $j$, the set $\{x : \epsilon_j(x) = c_j\}$ has probability $\frac{1}{2}$ for a similar reason. We have therefore

$$\mathscr{P}\{\epsilon_j = c_j,\ 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^{n} \frac{1}{2} = \prod_{j=1}^{n} \mathscr{P}\{\epsilon_j = c_j\}.$$

This being true for every choice of the $c_j$'s, the r.v.'s $\{\epsilon_j, j \ge 1\}$ are independent. Let $\{f_j, j \ge 1\}$ be arbitrary functions with domain the two points $\{0, 1\}$; then $\{f_j(\epsilon_j), j \ge 1\}$ are also independent r.v.'s.

This example seems extremely special, but easy extensions are at hand (see Exercises 13, 14, and 15 below).
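For readers who like to experiment, here is a minimal simulation sketch of Example 4 (not part of the text; floating-point sampling stands in for the uniform measure, and the dyadic rationals, which carry probability zero, are ignored). It reads off the first three binary digits of uniform samples and tabulates the eight patterns, each of which should appear with frequency near $1/8 = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2}$:

```python
import random
from collections import Counter

random.seed(1)

def digits(x, n):
    """First n binary digits epsilon_1, ..., epsilon_n of x in (0, 1)."""
    ds = []
    for _ in range(n):
        x *= 2
        d = int(x)     # the next digit, 0 or 1
        ds.append(d)
        x -= d
    return tuple(ds)

N = 100_000
counts = Counter(digits(random.random(), 3) for _ in range(N))

# Each of the 8 patterns (c1, c2, c3) should have frequency near 1/8 = 0.125.
for pattern in sorted(counts):
    print(pattern, round(counts[pattern] / N, 3))
```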

We are now ready to state and prove the fundamental existence theorem of product measures.

Theorem 3.3.4. Let a finite or infinite sequence of p.m.'s $\{\mu_j\}$ on $(\mathscr{R}^1, \mathscr{B}^1)$, or equivalently their d.f.'s, be given. There exists a probability space $(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of independent r.v.'s $\{X_j\}$ defined on it such that for each $j$, $\mu_j$ is the p.m. of $X_j$.

PROOF. Without loss of generality we may suppose that the given sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ be a probability space


in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m. Indeed this is possible if we take $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ to be $(\mathscr{R}^1, \mathscr{B}^1, \mu_n)$ and $X_n$ to be the identical function of the sample point $x$ in $\mathscr{R}^1$, now to be written as $\omega_n$ (cf. Exercise 3 of Sec. 3.1).

Now define the infinite product space

$$\Omega = \underset{n=1}{\overset{\infty}{\times}} \Omega_n$$

on the collection of all "points" $\omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$, where for each $n$, $\omega_n$ is a point of $\Omega_n$. A subset $E$ of $\Omega$ will be called a "finite-product set" iff it is of the form

$$(11) \qquad E = \underset{n=1}{\overset{\infty}{\times}} F_n,$$

where each $F_n \in \mathscr{F}_n$ and all but a finite number of these $F_n$'s are equal to the corresponding $\Omega_n$'s. Thus $\omega \in E$ if and only if $\omega_n \in F_n$, $n \ge 1$, but this is actually a restriction only for a finite number of values of $n$. Let the collection of subsets of $\Omega$, each of which is the union of a finite number of disjoint finite-product sets, be $\mathscr{F}_0$. It is easy to see that the collection $\mathscr{F}_0$ is closed with respect to complementation and pairwise intersection, hence it is a field. We shall take the $\mathscr{F}$ in the theorem to be the B.F. generated by $\mathscr{F}_0$. This $\mathscr{F}$ is called the product B.F. of the sequence $\{\mathscr{F}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathscr{F}_n$.

We define a set function $\mathscr{P}$ on $\mathscr{F}_0$ as follows. First, for each finite-product set such as the $E$ given in (11) we set

$$(12) \qquad \mathscr{P}(E) = \prod_{n=1}^{\infty} \mathscr{P}_n(F_n),$$

where all but a finite number of the factors on the right side are equal to one. Next, if $E \in \mathscr{F}_0$ and

$$E = \bigcup_{k=1}^{n} E^{(k)},$$

where the $E^{(k)}$'s are disjoint finite-product sets, we put

$$(13) \qquad \mathscr{P}(E) = \sum_{k=1}^{n} \mathscr{P}(E^{(k)}).$$

If a given set $E$ in $\mathscr{F}_0$ has two representations of the form above, then it is not difficult to see though it is tedious to explain (try drawing some pictures!) that the two definitions of $\mathscr{P}(E)$ agree. Hence the set function $\mathscr{P}$ is uniquely


defined on $\mathscr{F}_0$; it is clearly positive with $\mathscr{P}(\Omega) = 1$, and it is finitely additive on $\mathscr{F}_0$ by definition. In order to verify countable additivity it is sufficient to verify the axiom of continuity, by the remark after Theorem 2.2.1. Proceeding by contraposition, suppose that there exist a $\delta > 0$ and a sequence of sets $\{C_n, n \ge 1\}$ in $\mathscr{F}_0$ such that for each $n$, we have $C_n \supset C_{n+1}$ and $\mathscr{P}(C_n) \ge \delta > 0$; we shall prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$. Note that each set $E$ in $\mathscr{F}_0$, as well as each finite-product set, is "determined" by a finite number of coordinates in the sense that there exists a finite integer $k$, depending only on $E$, such that if $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega' = (\omega_1', \omega_2', \ldots)$ are two points of $\Omega$ with $\omega_j = \omega_j'$ for $1 \le j \le k$, then either both $\omega$ and $\omega'$ belong to $E$ or neither of them does. To simplify the notation and with no real loss of generality (why?) we may suppose that for each $n$, the set $C_n$ is determined by the first $n$ coordinates. Given $\omega_1^0$, for any subset $E$ of $\Omega$ let us write $(E \mid \omega_1^0)$ for the set $E_1$ of points $(\omega_2, \omega_3, \ldots)$ in $\times_{n=2}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2, \omega_3, \ldots) \in E$. If $\omega_1^0$ does not appear as first coordinate of any point in $E$, then $(E \mid \omega_1^0) = \emptyset$. Note that if $E \in \mathscr{F}_0$, then $(E \mid \omega_1^0) \in \mathscr{F}_0$ for each $\omega_1^0$. We claim that there exists an $\omega_1^0$ such that for every $n$, we have $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$. To see this we begin with the equation

$$(14) \qquad \mathscr{P}(C_n) = \int_{\Omega_1} \mathscr{P}((C_n \mid \omega_1))\, \mathscr{P}_1(d\omega_1).$$

This is trivial if $C_n$ is a finite-product set by definition (12), and follows by addition for any $C_n$ in $\mathscr{F}_0$. Now put $B_n = \{\omega_1 : \mathscr{P}((C_n \mid \omega_1)) \ge \delta/2\}$, a subset of $\Omega_1$; then it follows that

$$\delta \le \int_{B_n} 1 \cdot \mathscr{P}_1(d\omega_1) + \int_{B_n^c} \frac{\delta}{2}\, \mathscr{P}_1(d\omega_1),$$

and so $\mathscr{P}_1(B_n) \ge \delta/2$ for every $n \ge 1$. Since $B_n$ is decreasing with $C_n$, we have $\mathscr{P}_1(\bigcap_{n=1}^{\infty} B_n) \ge \delta/2$. Choose any $\omega_1^0$ in $\bigcap_{n=1}^{\infty} B_n$. Then $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$. Repeating the argument for the set $(C_n \mid \omega_1^0)$, we see that there exists an $\omega_2^0$ such that for every $n$, $\mathscr{P}((C_n \mid \omega_1^0, \omega_2^0)) \ge \delta/4$, where $(C_n \mid \omega_1^0, \omega_2^0) = ((C_n \mid \omega_1^0) \mid \omega_2^0)$ is the set of points $(\omega_3, \omega_4, \ldots)$ in $\times_{n=3}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2^0, \omega_3, \omega_4, \ldots) \in C_n$; and so forth by induction. Thus for each $k \ge 1$, there exists $\omega_k^0$ such that

$$\forall n: \quad \mathscr{P}((C_n \mid \omega_1^0, \ldots, \omega_k^0)) \ge \frac{\delta}{2^k}.$$

Consider the point $\omega^0 = (\omega_1^0, \omega_2^0, \ldots, \omega_k^0, \ldots)$. Since $(C_k \mid \omega_1^0, \ldots, \omega_k^0) \ne \emptyset$, there is a point in $C_k$ whose first $k$ coordinates are the same as those of $\omega^0$; since $C_k$ is determined by the first $k$ coordinates, it follows that $\omega^0 \in C_k$. This is true for all $k$, hence $\omega^0 \in \bigcap_{k=1}^{\infty} C_k$, as was to be shown.


We have thus proved that $\mathscr{P}$ as defined on $\mathscr{F}_0$ is a p.m. The next theorem, which is a generalization of the extension theorem discussed in Theorem 2.2.2 and whose proof can be found in Halmos [4], ensures that it can be extended to $\mathscr{F}$ as desired. This extension is called the product measure of the sequence $\{\mathscr{P}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathscr{P}_n$, with domain the product B.F. $\times_{n=1}^{\infty} \mathscr{F}_n$.

Theorem 3.3.5. Let $\mathscr{F}_0$ be a field of subsets of an abstract space $\Omega$, and $\mathscr{P}$ a p.m. on $\mathscr{F}_0$. There exists a unique p.m. on the B.F. generated by $\mathscr{F}_0$ that agrees with $\mathscr{P}$ on $\mathscr{F}_0$.

The uniqueness is proved in Theorem 2.2.3. Returning to Theorem 3.3.4, it remains to show that the r.v.'s $\{\omega_j\}$ are independent. For each $k \ge 2$, the independence of $\{\omega_j, 1 \le j \le k\}$ is a consequence of the definition in (12); this being so for every $k$, the independence of the infinite sequence follows by definition.

We now give a statement of Fubini's theorem for product measures, which has already been used above in special cases. It is sufficient to consider the case $n = 2$. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathscr{F} = \mathscr{F}_1 \times \mathscr{F}_2$ and $\mathscr{P} = \mathscr{P}_1 \times \mathscr{P}_2$ be the product space, B.F., and measure respectively.

Let $(\overline{\mathscr{F}_1}, \overline{\mathscr{P}_1})$, $(\overline{\mathscr{F}_2}, \overline{\mathscr{P}_2})$ and $(\overline{\mathscr{F}_1 \times \mathscr{F}_2}, \overline{\mathscr{P}_1 \times \mathscr{P}_2})$ be the completions of $(\mathscr{F}_1, \mathscr{P}_1)$, $(\mathscr{F}_2, \mathscr{P}_2)$ and $(\mathscr{F}_1 \times \mathscr{F}_2, \mathscr{P}_1 \times \mathscr{P}_2)$, respectively, according to Theorem 2.2.5.

Fubini's theorem. Suppose that $f$ is measurable with respect to $\overline{\mathscr{F}_1 \times \mathscr{F}_2}$ and integrable with respect to $\overline{\mathscr{P}_1 \times \mathscr{P}_2}$. Then

(i) for each $\omega_1 \in \Omega_1 \backslash N_1$ where $N_1 \in \overline{\mathscr{F}_1}$ and $\overline{\mathscr{P}_1}(N_1) = 0$, $f(\omega_1, \cdot)$ is measurable with respect to $\overline{\mathscr{F}_2}$ and integrable with respect to $\overline{\mathscr{P}_2}$;

(ii) the integral

$$\int_{\Omega_2} f(\cdot, \omega_2)\, \overline{\mathscr{P}_2}(d\omega_2)$$

is measurable with respect to $\overline{\mathscr{F}_1}$ and integrable with respect to $\overline{\mathscr{P}_1}$;

(iii) the following equation between a double and a repeated integral holds:

$$(15) \qquad \iint_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\, \overline{\mathscr{P}_1 \times \mathscr{P}_2}(d\omega) = \int_{\Omega_1} \Big[ \int_{\Omega_2} f(\omega_1, \omega_2)\, \overline{\mathscr{P}_2}(d\omega_2) \Big] \overline{\mathscr{P}_1}(d\omega_1).$$

Furthermore, suppose $f$ is positive and measurable with respect to $\overline{\mathscr{F}_1 \times \mathscr{F}_2}$; then if either member of (15) exists finite or infinite, so does the other also and the equation holds.


Finally, if we remove all the completion signs "‾" in the hypotheses as well as the conclusions above, the resulting statement is true with the exceptional set $N_1$ in (i) empty, provided the word "integrable" in (i) be replaced by "has an integral."

The theorem remains true if the p.m.'s are replaced by $\sigma$-finite measures; see, e.g., Royden [5].

The reader should readily recognize the particular cases of the theoremwith one or both of the factor measures discrete, so that the correspondingintegral reduces to a sum . Thus it includes the familiar theorem on evaluatinga double series by repeated summation .
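As a toy instance of that familiar theorem, the following sketch (illustrative only; the summand and the truncation depth are arbitrary choices) sums an absolutely convergent double series in both orders:

```python
# a(j, k) = 2**-(j + k) for j, k >= 1; the full double sum equals 1, and
# the discrete case of Fubini's theorem says the two repeated sums agree.
J = K = 40  # truncation depth; the neglected tail is below 1e-11
rows_first = sum(sum(2.0 ** -(j + k) for k in range(1, K + 1))
                 for j in range(1, J + 1))
cols_first = sum(sum(2.0 ** -(j + k) for j in range(1, J + 1))
                 for k in range(1, K + 1))
print(rows_first, cols_first)   # both approximately 1.0
```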

We close this section by stating a generalization of Theorem 3.3.4 to the case where the finite-dimensional joint distributions of the sequence $\{X_j\}$ are arbitrarily given, subject only to mutual consistency. This result is a particular case of Kolmogorov's extension theorem, which is valid for an arbitrary family of r.v.'s.

Let $m$ and $n$ be integers: $1 \le m < n$, and define $\pi_{mn}$ to be the "projection map" of $\mathscr{B}^m$ onto $\mathscr{B}^n$ given by

$$\forall B \in \mathscr{B}^m: \quad \pi_{mn}(B) = \{(x_1, \ldots, x_n) : (x_1, \ldots, x_m) \in B\}.$$

Theorem 3.3.6. For each $n \ge 1$, let $\mu^n$ be a p.m. on $(\mathscr{R}^n, \mathscr{B}^n)$ such that

$$(16) \qquad \forall m < n: \quad \mu^n \circ \pi_{mn} = \mu^m.$$

Then there exists a probability space $(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of r.v.'s $\{X_j\}$ on it such that for each $n$, $\mu^n$ is the $n$-dimensional p.m. of the vector

$$(X_1, \ldots, X_n).$$

Indeed, the $\Omega$ and $\mathscr{F}$ may be taken exactly as in Theorem 3.3.4 to be the product space

$$\underset{j=1}{\overset{\infty}{\times}} \Omega_j \quad \text{and} \quad \underset{j=1}{\overset{\infty}{\times}} \mathscr{F}_j,$$

where $(\Omega_j, \mathscr{F}_j) = (\mathscr{R}^1, \mathscr{B}^1)$ for each $j$; only $\mathscr{P}$ is now more general. In terms of d.f.'s, the consistency condition (16) may be stated as follows. For each $m \ge 1$ and $(x_1, \ldots, x_m) \in \mathscr{R}^m$, we have if $n > m$:

$$\lim_{\substack{x_j \to +\infty \\ m < j \le n}} F^n(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) = F^m(x_1, \ldots, x_m).$$

For a proof of this first fundamental theorem in the theory of stochastic processes, see Kolmogorov [8].


EXERCISES

1. Show that two r.v.'s on $(\Omega, \mathscr{F})$ may be independent according to one p.m. $\mathscr{P}$ but not according to another!

*2. If $X_1$ and $X_2$ are independent r.v.'s each assuming the values $+1$ and $-1$ with probability $\frac{1}{2}$, then the three r.v.'s $\{X_1, X_2, X_1X_2\}$ are pairwise independent but not totally independent. Find an example of $n$ r.v.'s such that every $n - 1$ of them are independent but not all of them. Find an example where the events $A_1$ and $A_2$ are independent, $A_1$ and $A_3$ are independent, but $A_1$ and $A_2 \cup A_3$ are not independent.

*3. If the events $\{E_\alpha, \alpha \in A\}$ are independent, then so are the events $\{F_\alpha, \alpha \in A\}$, where each $F_\alpha$ may be $E_\alpha$ or $E_\alpha^c$; also if $\{A_\beta, \beta \in B\}$, where $B$ is an arbitrary index set, is a collection of disjoint countable subsets of $A$, then the events

$$\bigcup_{\alpha \in A_\beta} E_\alpha, \qquad \beta \in B,$$

are independent.

4. Fields or B.F.'s $\mathscr{F}_\alpha$ $(\subset \mathscr{F})$ of any family are said to be independent iff any collection of events, one from each $\mathscr{F}_\alpha$, forms a set of independent events. Let $\mathscr{F}_\alpha^0$ be a field generating $\mathscr{F}_\alpha$. Prove that if the fields $\mathscr{F}_\alpha^0$ are independent, then so are the B.F.'s $\mathscr{F}_\alpha$. Is the same conclusion true if the fields are replaced by arbitrary generating sets? Prove, however, that the conditions (1) and (2) are equivalent. [HINT: Use Theorem 2.1.2.]

5. If $\{X_\alpha\}$ is a family of independent r.v.'s, then the B.F.'s generated by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this proposition.]

6. The r.v. $X$ is independent of itself if and only if it is constant with probability one. Can $X$ and $f(X)$ be independent where $f \in \mathscr{B}^1$?

7. If $\{E_j, 1 \le j < \infty\}$ are independent events, then

$$\mathscr{P}\Big( \bigcap_{j=1}^{\infty} E_j \Big) = \prod_{j=1}^{\infty} \mathscr{P}(E_j),$$

where the infinite product is defined to be the obvious limit; similarly

$$\mathscr{P}\Big( \bigcup_{j=1}^{\infty} E_j \Big) = 1 - \prod_{j=1}^{\infty} (1 - \mathscr{P}(E_j)).$$

8. Let $\{X_j, 1 \le j \le n\}$ be independent with d.f.'s $\{F_j, 1 \le j \le n\}$. Find the d.f. of $\max_j X_j$ and $\min_j X_j$.


*9. If $X$ and $Y$ are independent and $\mathscr{E}(X)$ exists, then for any Borel set $B$, we have

$$\int_{\{Y \in B\}} X\, d\mathscr{P} = \mathscr{E}(X)\,\mathscr{P}(Y \in B).$$

*10. If $X$ and $Y$ are independent and for some $p > 0$: $\mathscr{E}(|X + Y|^p) < \infty$, then $\mathscr{E}(|X|^p) < \infty$ and $\mathscr{E}(|Y|^p) < \infty$.

11. If $X$ and $Y$ are independent, $\mathscr{E}(|X|^p) < \infty$ for some $p \ge 1$, and $\mathscr{E}(Y) = 0$, then $\mathscr{E}(|X + Y|^p) \ge \mathscr{E}(|X|^p)$. [This is a case of Theorem 9.3.2; but try a direct proof!]

12. The r.v.'s $\{\epsilon_j\}$ in Example 4 are related to the "Rademacher functions":

$$r_j(x) = \operatorname{sgn}(\sin 2^j \pi x).$$

What is the precise relation?

13. Generalize Example 4 by considering any $s$-ary expansion where $s \ge 2$ is an integer:

$$x = \sum_{n=1}^{\infty} \frac{\epsilon_n}{s^n}, \qquad \text{where } \epsilon_n = 0, 1, \ldots, s - 1.$$

*14. Modify Example 4 so that to each $x$ in $[0, 1]$ there corresponds a sequence of independent and identically distributed r.v.'s $\{\epsilon_n, n \ge 1\}$, each taking the values 1 and 0 with probabilities $p$ and $1 - p$, respectively, where $0 < p < 1$. Such a sequence will be referred to as a "coin-tossing game (with probability $p$ for heads)"; when $p = \frac{1}{2}$ it is said to be "fair".

15. Modify Example 4 further so as to allow $\epsilon_n$ to take the values 1 and 0 with probabilities $p_n$ and $1 - p_n$, where $0 < p_n < 1$, but $p_n$ depends on $n$.

16. Generalize (14) to

$$\mathscr{P}(C) = \int_{C(1,\ldots,k)} \mathscr{P}((C \mid \omega_1, \ldots, \omega_k))\, \mathscr{P}_1(d\omega_1) \cdots \mathscr{P}_k(d\omega_k) = \int_{\Omega} \mathscr{P}((C \mid \omega_1, \ldots, \omega_k))\, \mathscr{P}(d\omega),$$

where $C(1, \ldots, k)$ is the set of $(\omega_1, \ldots, \omega_k)$ which appears as the first $k$ coordinates of at least one point in $C$, and in the second equation $\omega_1, \ldots, \omega_k$ are regarded as functions of $\omega$.

*17. For arbitrary events $\{E_j, 1 \le j \le n\}$, we have

$$\mathscr{P}\Big( \bigcup_{j=1}^{n} E_j \Big) \ge \sum_{j=1}^{n} \mathscr{P}(E_j) - \sum_{1 \le j < k \le n} \mathscr{P}(E_j E_k).$$

If $\forall n$: $\{E_j^{(n)}, 1 \le j \le n\}$ are independent events, and

$$\mathscr{P}\Big( \bigcup_{j=1}^{n} E_j^{(n)} \Big) \to 0 \quad \text{as } n \to \infty,$$

then

$$\mathscr{P}\Big( \bigcup_{j=1}^{n} E_j^{(n)} \Big) \sim \sum_{j=1}^{n} \mathscr{P}(E_j^{(n)}).$$

18. Prove that $\mathscr{B} \times \mathscr{B} \ne \overline{\mathscr{B}} \times \overline{\mathscr{B}}$, where $\overline{\mathscr{B}}$ is the completion of $\mathscr{B}$ with respect to the Lebesgue measure; similarly $\overline{\mathscr{B}} \times \overline{\mathscr{B}} \ne \overline{\mathscr{B} \times \mathscr{B}}$.

19. If $f \in \overline{\mathscr{F}_1 \times \mathscr{F}_2}$ and

$$\iint_{\Omega_1 \times \Omega_2} |f|\, d(\mathscr{P}_1 \times \mathscr{P}_2) < \infty,$$

then

$$\int_{\Omega_1} \Big[ \int_{\Omega_2} f\, d\mathscr{P}_2 \Big] d\mathscr{P}_1 = \int_{\Omega_2} \Big[ \int_{\Omega_1} f\, d\mathscr{P}_1 \Big] d\mathscr{P}_2.$$

20. A typical application of Fubini's theorem is as follows. If $f$ is a Lebesgue measurable function of $(x, y)$ such that $f(x, y) = 0$ for each $x \in \mathscr{R}^1$ and $y \notin N_x$, where $m(N_x) = 0$ for each $x$, then we have also $f(x, y) = 0$ for each $y \notin N$ and $x \notin N_y$, where $m(N) = 0$ and $m(N_y) = 0$ for each $y \notin N$.


4 Convergence concepts

4.1 Various modes of convergence

As numerical-valued functions, the convergence of a sequence of r.v.'s $\{X_n, n \ge 1\}$, to be denoted simply by $\{X_n\}$ below, is a well-defined concept. Here and hereafter the term "convergence" will be used to mean convergence to a finite limit. Thus it makes sense to say: for every $\omega \in \Delta$, where $\Delta \in \mathscr{F}$, the sequence $\{X_n(\omega)\}$ converges. The limit is then a finite-valued r.v. (see Theorem 3.1.6), say $X(\omega)$, defined on $\Delta$. If $\Omega = \Delta$, then we have "convergence everywhere", but a more useful concept is the following one.

DEFINITION OF CONVERGENCE "ALMOST EVERYWHERE" (a.e.). The sequence of r.v.'s $\{X_n\}$ is said to converge almost everywhere [to the r.v. $X$] iff there exists a null set $N$ such that

$$(1) \qquad \forall \omega \in \Omega \backslash N: \quad \lim_{n \to \infty} X_n(\omega) = X(\omega) \text{ finite}.$$

Recall that our convention stated at the beginning of Sec. 3.1 allows each r.v. a null set on which it may be $\pm\infty$. The union of all these sets being still a null set, it may be included in the set $N$ in (1) without modifying the conclusion. This type of trivial consideration makes it possible, when dealing with a countable set of r.v.'s that are finite a.e., to regard them as finite everywhere.

The reader should learn at an early stage to reason with a single sample point $\omega_0$ and the corresponding sample sequence $\{X_n(\omega_0), n \ge 1\}$ as a numerical sequence, and translate the inferences on the latter into probability statements about sets of $\omega$. The following characterization of convergence a.e. is a good illustration of this method.

Theorem 4.1.1. The sequence $\{X_n\}$ converges a.e. to $X$ if and only if for every $\epsilon > 0$ we have

$$(2) \qquad \lim_{m \to \infty} \mathscr{P}\{|X_n - X| \le \epsilon \text{ for all } n \ge m\} = 1;$$

or equivalently

$$(2') \qquad \lim_{m \to \infty} \mathscr{P}\{|X_n - X| > \epsilon \text{ for some } n \ge m\} = 0.$$

PROOF. Suppose there is convergence a.e. and let $\Omega_0 = \Omega \backslash N$ where $N$ is as in (1). For $m \ge 1$ let us denote by $A_m(\epsilon)$ the event exhibited in (2), namely:

$$(3) \qquad A_m(\epsilon) = \bigcap_{n=m}^{\infty} \{|X_n - X| \le \epsilon\}.$$

Then $A_m(\epsilon)$ is increasing with $m$. For each $\omega_0$, the convergence of $\{X_n(\omega_0)\}$ to $X(\omega_0)$ implies that given any $\epsilon > 0$, there exists $m(\omega_0, \epsilon)$ such that

$$(4) \qquad n \ge m(\omega_0, \epsilon) \implies |X_n(\omega_0) - X(\omega_0)| \le \epsilon.$$

Hence each such $\omega_0$ belongs to some $A_m(\epsilon)$ and so $\Omega_0 \subset \bigcup_{m=1}^{\infty} A_m(\epsilon)$. It follows from the monotone convergence property of a measure that $\lim_{m \to \infty} \mathscr{P}(A_m(\epsilon)) = 1$, which is equivalent to (2).

Conversely, suppose (2) holds; then we see above that the set $A(\epsilon) = \bigcup_{m=1}^{\infty} A_m(\epsilon)$ has probability equal to one. For any $\omega_0 \in A(\epsilon)$, (4) is true for the given $\epsilon$. Let $\epsilon$ run through a sequence of values decreasing to zero, for instance $\{1/n\}$. Then the set

$$\Lambda = \bigcap_{n=1}^{\infty} A\left( \frac{1}{n} \right)$$

still has probability one since

$$\mathscr{P}(\Lambda) = \lim_{n \to \infty} \mathscr{P}\left( A\left( \frac{1}{n} \right) \right).$$

If $\omega_0$ belongs to $\Lambda$, then (4) is true for all $\epsilon = 1/n$, hence for all $\epsilon > 0$ (why?). This means $\{X_n(\omega_0)\}$ converges to $X(\omega_0)$ for all $\omega_0$ in a set of probability one.

A weaker concept of convergence is of basic importance in probability theory.

DEFINITION OF CONVERGENCE "IN PROBABILITY" (in pr.). The sequence $\{X_n\}$ is said to converge in probability to $X$ iff for every $\epsilon > 0$ we have

$$(5) \qquad \lim_{n \to \infty} \mathscr{P}\{|X_n - X| > \epsilon\} = 0.$$

Strictly speaking, the definition applies when all $X_n$ and $X$ are finite-valued. But we may extend it to r.v.'s that are finite a.e. either by agreeing to ignore a null set or by the logical convention that a formula must first be defined in order to be valid or invalid. Thus, for example, if $X_n(\omega) = +\infty$ and $X(\omega) = +\infty$ for some $\omega$, then $X_n(\omega) - X(\omega)$ is not defined and therefore such an $\omega$ cannot belong to the set $\{|X_n - X| > \epsilon\}$ figuring in (5).

Since (2') clearly implies (5), we have the immediate consequence below .

Theorem 4.1.2. Convergence a.e. [to $X$] implies convergence in pr. [to $X$].

Sometimes we have to deal with questions of convergence when no limit is in evidence. For convergence a.e. this is immediately reducible to the numerical case where the Cauchy criterion is applicable. Specifically, $\{X_n\}$ converges a.e. iff there exists a null set $N$ such that for every $\omega \in \Omega \backslash N$ and every $\epsilon > 0$, there exists $m(\omega, \epsilon)$ such that

$$n' > n \ge m(\omega, \epsilon) \implies |X_n(\omega) - X_{n'}(\omega)| \le \epsilon.$$

The following analogue of Theorem 4.1.1 is left to the reader.

Theorem 4.1.3. The sequence $\{X_n\}$ converges a.e. if and only if for every $\epsilon > 0$ we have

$$(6) \qquad \lim_{m \to \infty} \mathscr{P}\{|X_n - X_{n'}| > \epsilon \text{ for some } n' > n \ge m\} = 0.$$

For convergence in pr., the obvious analogue of (6) is

$$(7) \qquad \lim_{n, n' \to \infty} \mathscr{P}\{|X_n - X_{n'}| > \epsilon\} = 0.$$

It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a finite r.v. $X$ such that $X_n \to X$ in pr.


DEFINITION OF CONVERGENCE IN $L^p$, $0 < p < \infty$. The sequence $\{X_n\}$ is said to converge in $L^p$ to $X$ iff $X_n \in L^p$, $X \in L^p$ and

$$(8) \qquad \lim_{n \to \infty} \mathscr{E}(|X_n - X|^p) = 0.$$

In all these definitions above, $X_n$ converges to $X$ if and only if $X_n - X$ converges to 0. Hence there is no loss of generality if we put $X \equiv 0$ in the discussion, provided that any hypothesis involved can be similarly reduced to this case. We say that $X$ is dominated by $Y$ if $|X| \le Y$ a.e., and that the sequence $\{X_n\}$ is dominated by $Y$ iff this is true for each $X_n$ with the same $Y$. We say that $X$ or $\{X_n\}$ is uniformly bounded iff the $Y$ above may be taken to be a constant.

Theorem 4.1.4. If $X_n$ converges to 0 in $L^p$, then it converges to 0 in pr. The converse is true provided that $\{X_n\}$ is dominated by some $Y$ that belongs to $L^p$.

Remark. If $X_n \to X$ in $L^p$, and $\{X_n\}$ is dominated by $Y$, then $\{X_n - X\}$ is dominated by $Y + |X|$, which is in $L^p$. Hence there is no loss of generality to assume $X \equiv 0$.

PROOF. By the Chebyshev inequality with $\varphi(x) \equiv |x|^p$, we have

$$(9) \qquad \mathscr{P}\{|X_n| > \epsilon\} \le \frac{\mathscr{E}(|X_n|^p)}{\epsilon^p}.$$

Letting $n \to \infty$, the right member $\to 0$ by hypothesis, hence so does the left, which is equivalent to (5) with $X = 0$. This proves the first assertion. If now $|X_n| \le Y$ a.e. with $\mathscr{E}(Y^p) < \infty$, then we have

$$\mathscr{E}(|X_n|^p) = \int_{\{|X_n| < \epsilon\}} |X_n|^p\, d\mathscr{P} + \int_{\{|X_n| \ge \epsilon\}} |X_n|^p\, d\mathscr{P} \le \epsilon^p + \int_{\{|X_n| \ge \epsilon\}} Y^p\, d\mathscr{P}.$$

Since $\mathscr{P}(|X_n| \ge \epsilon) \to 0$, the last-written integral $\to 0$ by Exercise 2 in Sec. 3.2. Letting first $n \to \infty$ and then $\epsilon \to 0$, we obtain $\mathscr{E}(|X_n|^p) \to 0$; hence $X_n$ converges to 0 in $L^p$.

As a corollary, for a uniformly bounded sequence $\{X_n\}$ convergence in pr. and in $L^p$ are equivalent. The general result is as follows.

Theorem 4.1.5. $X_n \to 0$ in pr. if and only if

$$(10) \qquad \mathscr{E}\left( \frac{|X_n|}{1 + |X_n|} \right) \to 0.$$

Furthermore, the functional $\rho(\cdot, \cdot)$ given by

$$\rho(X, Y) = \mathscr{E}\left( \frac{|X - Y|}{1 + |X - Y|} \right)$$


is a metric in the space of r .v.'s, provided that we identify r .v.'s that areequal a.e .

PROOF. If p(X, Y) = 0, then (``(IX - Y j) = 0, hence X = Y a.e. byExercise 1 of Sec. 3 .2. To show that p( •, •)

is metric it is sufficient to show that

IX - YI

C',IXI+

lYlI (

1+IX-YI

1+IXI

1+IYI

For this we need only verify that for every real x and y :

(11) Ix + yl<Ixl

+IYI

1+lx+yl - I+IxI

1+IY1

B y symmetry we may suppose that I Y I < IxI ; then

(12) Ix + A _ lxl _

Ix + A -IxI1 + Ix + yl

I +IxI

(1 + Ix + yI)(1 +IxI)

< I Ix + YI - IxI I <IYI1+IxI

1+lyl

For any X the r.v. IXI/(1 + IXI) is bounded by 1, hence by the second

part of Theorem 4.1 .4 the first assertion of the theorem will follow if we showthat IX„ 1 0 in pr. if and only if J X, I / (1 + I X n 1) - 0 in pr. But Ix I < E isequivalent to Ix l / (1 + lx I) < E/ (1 + E) ; hence the proof is complete .
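The elementary inequality (11) is easy to spot-check by machine; a minimal sketch (illustrative only, on an arbitrary finite grid):

```python
# Check |x+y|/(1+|x+y|) <= |x|/(1+|x|) + |y|/(1+|y|) on a grid of reals.
def f(t):
    return abs(t) / (1 + abs(t))

grid = [i / 10 for i in range(-100, 101)]
assert all(f(x + y) <= f(x) + f(y) + 1e-12 for x in grid for y in grid)
print("inequality (11) holds at all", len(grid) ** 2, "grid points")
```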

Example 1. Convergence in pr. does not imply convergence in $L^p$, and the latter does not imply convergence a.e.

Take the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ to be $(\mathscr{U}, \mathscr{B}, m)$ as in Example 2 of Sec. 2.2. Let $\varphi_{kj}$ be the indicator of the interval

$$\left( \frac{j-1}{k}, \frac{j}{k} \right], \qquad k \ge 1,\ 1 \le j \le k.$$

Order these functions lexicographically, first according to $k$ increasing, and then for each $k$ according to $j$ increasing, into one sequence $\{X_n\}$, so that if $X_n = \varphi_{k_n j_n}$, then $k_n \to \infty$ as $n \to \infty$. Thus for each $p > 0$:

$$\mathscr{E}(X_n^p) = \frac{1}{k_n} \to 0,$$

and so $X_n \to 0$ in $L^p$. But for each $\omega$ and every $k$, there exists a $j$ such that $\varphi_{kj}(\omega) = 1$; hence there exist infinitely many values of $n$ such that $X_n(\omega) = 1$. Similarly there exist infinitely many values of $n$ such that $X_n(\omega) = 0$. It follows that the sequence $\{X_n(\omega)\}$ of 0's and 1's cannot converge for any $\omega$. In other words, the set on which $\{X_n\}$ converges is empty.


Now if we replace $\varphi_{kj}$ by $k^{1/p}\varphi_{kj}$, where $p > 0$, then $\mathscr{P}\{X_n > 0\} = 1/k_n \to 0$ so that $X_n \to 0$ in pr., but for each $n$ we have $\mathscr{E}(X_n^p) = 1$. Consequently $\lim_{n \to \infty} \mathscr{E}(|X_n - 0|^p) = 1$ and $X_n$ does not $\to 0$ in $L^p$.
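Example 1 is easy to reproduce numerically. A minimal sketch (not part of the text; the truncation at $k \le 200$ and the sample point $\omega = 0.37$ are arbitrary choices):

```python
# The "sliding indicator" sequence of Example 1 on (0, 1].
def phi(k, j, w):
    """Indicator of the interval ((j-1)/k, j/k] evaluated at w."""
    return 1 if (j - 1) / k < w <= j / k else 0

seq = [(k, j) for k in range(1, 201) for j in range(1, k + 1)]

# E(X_n^p) = 1/k_n -> 0 for every p > 0: the L^p norms shrink.
print([1 / k for k, j in seq[:5]], "...", 1 / seq[-1][0])

# Yet at a fixed omega the sample sequence equals 1 once for every k,
# so it takes both values 0 and 1 infinitely often and cannot converge.
w = 0.37
values = [phi(k, j, w) for k, j in seq]
print("ones:", sum(values), "zeros:", len(values) - sum(values))
```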

Example 2. Convergence a .e. does not imply convergence in LP .In (Jl/, .l, m) define

X" (w) =12",

if WE 0,- ;n0,

otherwise .

Then ("- (IX,, JP) = 2"P/n -+ +oo for each p > 0, but X, --* 0 everywhere .
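A corresponding sketch for Example 2 (again illustrative only): at any fixed $\omega$ the sample values vanish from some $n$ on, while the moments explode.

```python
# X_n = 2**n on (0, 1/n) and 0 elsewhere: X_n(w) -> 0 pointwise,
# but E(X_n) = 2**n / n -> infinity.
w = 0.01
for n in (1, 5, 10, 50, 200):
    X_n_at_w = 2 ** n if w < 1 / n else 0   # zero once n >= 1/w = 100
    moment = 2 ** n / n                     # E(X_n), i.e. the case p = 1
    print(n, X_n_at_w, moment)
```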

The reader should not be misled by these somewhat "artificial" examples to think that such counterexamples are rare or abnormal. Natural examples abound in more advanced theory, such as that of stochastic processes, but they are not as simple as those discussed above. Here are some culled from later chapters, tersely stated. In the symmetrical Bernoullian random walk $\{S_n, n \ge 1\}$, let $X_n = 1_{\{S_n = 0\}}$. Then $\lim_n \mathscr{E}(X_n^p) = 0$ for every $p > 0$, but $\mathscr{P}\{\lim_n X_n \text{ exists}\} = 0$ because of intermittent return of $S_n$ to the origin (see Sec. 8.3). This kind of example can be formulated for any recurrent process such as a Brownian motion. On the other hand, if $\{\xi_n, n \ge 1\}$ denotes the random walk above stopped at 1, then $\mathscr{E}(\xi_n) = 0$ for all $n$ but $\mathscr{P}\{\lim_{n \to \infty} \xi_n = 1\} = 1$. The same holds for the martingale that consists in tossing a fair coin and doubling the stakes until the first head (see Sec. 9.4). Another striking example is furnished by the simple Poisson process $\{N(t), t > 0\}$ (see Sec. 5.5). If $\xi(t) = N(t)/t$, then $\mathscr{E}(\xi(t))$ is the same strictly positive constant (the rate of the process) for all $t > 0$; but $\mathscr{P}\{\lim_{t \downarrow 0} \xi(t) = 0\} = 1$ because for almost every $\omega$, $N(t, \omega) = 0$ for all sufficiently small values of $t$. The continuous parameter may be replaced by a sequence $t_n \downarrow 0$.

exists) = 0 because of intermittent return of S, to the origin (seeSec . 8 .3) . This kind of example can be formulated for any recurrent processsuch as a Brownian motion . On the other hand, if {~„ , n > 1) denotes therandom walk above stopped at 1, then 0 for all n but g{lim„, co ~„ _11 = 1 . The same holds for the martingale that consists in tossing a faircoin and doubling the stakes until the first head (see Sec . 9.4) . Anotherstriking example is furnished by the simple Poisson process {N(t), t > 01 (seeSect . 5 .5) . If ~(t) = N(t)/t, then c (~(t)) = ? for all t > 0 ; but °J'{limao 0) =01 = 1 because for almost every co, N(t, w) = 0 for all sufficiently small valuesof t . The continuous parameter may be replaced by a sequence t, , . 0 .

Finally, we mention another kind of convergence which is basic in functional analysis, but confine ourselves to $L^1$. The sequence of r.v.'s $\{X_n\}$ in $L^1$ is said to converge weakly in $L^1$ to $X$ iff for each bounded r.v. $Y$ we have

$$\lim_{n \to \infty} \mathscr{E}(X_n Y) = \mathscr{E}(XY), \text{ finite}.$$

It is easy to see that $X \in L^1$ and is unique up to equivalence (take, e.g., $Y = 1_{\{X > X'\}}$ and $Y = 1_{\{X < X'\}}$ if $X'$ is another candidate). Clearly convergence in $L^1$ defined above implies weak convergence; hence the former is sometimes referred to as "strong". On the other hand, Example 2 above shows that convergence a.e. does not imply weak convergence; whereas Exercises 19 and 20 below show that weak convergence does not imply convergence in pr. (nor even in distribution; see Sec. 4.4).


EXERCISES

1 . X„ -~ +oc a .e. if and only if VM > 0: 91{X, < M i.o.) = 0 .2. If 0 < X,, X, < X E L1 and X, -* X in pr., then X, -~ X in L1 .3. If Xn -+ X, Y„ -* Y both in pr., then X, ± Y„ -+ X + Y, X„ Y n

X Y, all in pr.*4. Let f be a bounded uniformly continuous function in

Then X,0 in pr. implies (`if (X,,)) - f (0) . [Example :

f (x)_1XI

1+1XI

as in Theorem 4.1 .5 .]5. Convergence in LP implies that in L' for r < p.6. If X, ->. X, Y,

Y, both in LP, then X, ± Yn --* X ± Y in LP. IfX, --* X in LP and Y, -~ Y in Lq, where p > 1 and 1 / p + l 1q = 1, thenXn Yn -+ XY in L 1 .

7. If Xn

X in pr. and Xn -->. Y in pr., then X = Y a .e .8. If X, -+ X a.e. and µn and A are the p.m.'s of X, and X, it does not

follow that /.c„ (P) - µ(P) even for all intervals P.9. Give an example in which e(Xn ) -)~ 0 but there does not exist any

subsequence Ink) - oc such that Xnk -* 0 in pr.* 10. Let f be a continuous function on M.1 . If X, -+ X in pr., then

f (X n ) - f (X) in pr. The result is false if f is merely Borel measurable .[HINT : Truncate f at +A for large A .]

11. The extended-valued r.v. X is said to be bounded in pr. iff for eachE > 0, there exists a finite M(E) such that T{IXI < M(E)} > 1 - E . Prove thatX is bounded in pr. if and only if it is finite a .e .

12. The sequence of extended-valued r.v. {Xn } is said to be bounded inpr. iff supra Wn I is bounded in pr . ; {Xn ) is said to diverge to +oo in pr . iff foreach M > 0 and E > 0 there exists a finite n o (M, c) such that if n > n o , thenJT{ jX„ j > M } > 1 - E . Prove that if {X„ } diverges to +00 in pr. and {Y„ } isbounded in pr ., then {Xn + Y„ } diverges to +00 in pr .

13 . If sup„ X n = +oc a.e ., there need exist no subsequence {X nk } thatdiverges to +oo in pr .

14. It is possible that for each co, limnXn (co) = +oo, but there doesnot exist a subsequence {n k } and a set A of positive probability such thatlimkXnk (c)) = +00 on A . [HINT : On (//, A) define Xn (co) according to thenth digit of co .]

-a


*15. Instead of the $\rho$ in Theorem 4.1.5 one may define other metrics as follows. Let $\rho_1(X, Y)$ be the infimum of all $\epsilon > 0$ such that

$$\mathscr{P}(|X - Y| > \epsilon) \le \epsilon.$$

Let $\rho_2(X, Y)$ be the infimum of $\mathscr{P}\{|X - Y| > \epsilon\} + \epsilon$ over all $\epsilon > 0$. Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

*16. Convergence in pr. for arbitrary r.v.'s may be reduced to that of bounded r.v.'s by the transformation

$$X' = \arctan X.$$

In a space of uniformly bounded r.v.'s, convergence in pr. is equivalent to that in the metric $\rho_0(X, Y) = \mathscr{E}(|X - Y|)$; this reduces to the definition given in Exercise 8 of Sec. 3.2 when $X$ and $Y$ are indicators.

17. Unlike convergence in pr., convergence a.e. is not expressible by means of metric. [HINT: Metric convergence has the property that if $\rho(x_n, x) \nrightarrow 0$, then there exist $\epsilon > 0$ and $\{n_k\}$ such that $\rho(x_{n_k}, x) \ge \epsilon$ for every $k$.]

18. If $X_n \downarrow X$ a.s., each $X_n$ is integrable and $\inf_n \mathscr{E}(X_n) > -\infty$, then $X_n \to X$ in $L^1$.

19. Let $f_n(x) = 1 + \cos 2\pi n x$, $f(x) = 1$ in $[0, 1]$. Then for each $g \in L^1[0, 1]$ we have

$$\int_0^1 f_n g\, dx \to \int_0^1 f g\, dx,$$

but $f_n$ does not converge to $f$ in measure. [HINT: This is just the Riemann-Lebesgue lemma in Fourier series, but we have made $f_n \ge 0$ to stress a point.]

20. Let $\{X_n\}$ be a sequence of independent r.v.'s with zero mean and unit variance. Prove that for any bounded r.v. $Y$ we have $\lim_{n \to \infty} \mathscr{E}(X_n Y) = 0$. [HINT: Consider $\mathscr{E}\{[Y - \sum_{k=1}^{n} \mathscr{E}(X_k Y)X_k]^2\}$ to get Bessel's inequality $\mathscr{E}(Y^2) \ge \sum_{k=1}^{n} \mathscr{E}(X_k Y)^2$. The stated result extends to the case where the $X_n$'s are assumed only to be uniformly integrable (see Sec. 4.5), but now we must approximate $Y$ by a function of $(X_1, \ldots, X_m)$; cf. Theorem 8.1.1.]

4.2 Almost sure convergence; Borel-Cantelli lemma

An important concept in set theory is that of the "lim sup" and "lim inf" of a sequence of sets. These notions can be defined for subsets of an arbitrary space $\Omega$.


DEFINITION. Let $E_n$ be any sequence of subsets of $\Omega$; we define

$$\limsup_n E_n = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n, \qquad \liminf_n E_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} E_n.$$

Let us observe at once that

$$(1) \qquad \liminf_n E_n = \Big( \limsup_n E_n^c \Big)^c,$$

so that in a sense one of the two notions suffices, but it is convenient to employ both. The main properties of these sets will be given in the following two propositions.

(i) A point belongs to $\limsup_n E_n$ if and only if it belongs to infinitely many terms of the sequence $\{E_n, n \ge 1\}$. A point belongs to $\liminf_n E_n$ if and only if it belongs to all terms of the sequence from a certain term on. It follows in particular that both sets are independent of the enumeration of the $E_n$'s.

PROOF. We shall prove the first assertion. Since a point belongs to infinitely many of the $E_n$'s if and only if it does not belong to all $E_n^c$ from a certain value of $n$ on, the second assertion will follow from (1). Now if $\omega$ belongs to infinitely many of the $E_n$'s, then it belongs to

$$F_m = \bigcup_{n=m}^{\infty} E_n \quad \text{for every } m;$$

hence it belongs to

$$\bigcap_{m=1}^{\infty} F_m = \limsup_n E_n.$$

Conversely, if $\omega$ belongs to $\bigcap_{m=1}^{\infty} F_m$, then $\omega \in F_m$ for every $m$. Were $\omega$ to belong to only a finite number of the $E_n$'s there would be an $m$ such that $\omega \notin E_n$ for $n \ge m$, so that

$$\omega \notin \bigcup_{n=m}^{\infty} E_n = F_m.$$

This contradiction proves that $\omega$ must belong to an infinite number of the $E_n$'s.

In more intuitive language: the event $\limsup_n E_n$ occurs if and only if the events $E_n$ occur infinitely often. Thus we may write

$$\mathscr{P}(\limsup_n E_n) = \mathscr{P}(E_n \text{ i.o.}),$$


where the abbreviation "i.o." stands for "infinitely often". The advantage of such a notation is better shown if we consider, for example, the events "$|X_n| > \epsilon$" and the probability $\mathscr{P}\{|X_n| > \epsilon \text{ i.o.}\}$; see Theorem 4.2.3 below.

(ii) If each $E_n \in \mathscr{F}$, then we have

$$(2) \qquad \mathscr{P}(\limsup_n E_n) = \lim_{m \to \infty} \mathscr{P}\Big( \bigcup_{n=m}^{\infty} E_n \Big);$$

$$(3) \qquad \mathscr{P}(\liminf_n E_n) = \lim_{m \to \infty} \mathscr{P}\Big( \bigcap_{n=m}^{\infty} E_n \Big).$$

PROOF. Using the notation in the preceding proof, it is clear that $F_m$ decreases as $m$ increases. Hence by the monotone property of p.m.'s:

$$\mathscr{P}\Big( \bigcap_{m=1}^{\infty} F_m \Big) = \lim_{m \to \infty} \mathscr{P}(F_m),$$

which is (2); (3) is proved similarly or via (1).

Theorem 4.2.1. We have for arbitrary events $\{E_n\}$:

$$(4) \qquad \sum_n \mathscr{P}(E_n) < \infty \implies \mathscr{P}(E_n \text{ i.o.}) = 0.$$

PROOF. By Boole's inequality for p.m.'s, we have

$$\mathscr{P}(F_m) \le \sum_{n=m}^{\infty} \mathscr{P}(E_n).$$

Hence the hypothesis in (4) implies that $\mathscr{P}(F_m) \to 0$, and the conclusion in (4) now follows by (2).

As an illustration of the convenience of the new notions, we may restate Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the point being stressed here.

Theorem 4.2.2. $X_n \to 0$ a.e. if and only if

$$(5) \qquad \forall \epsilon > 0: \quad \mathscr{P}\{|X_n| > \epsilon \text{ i.o.}\} = 0.$$

PROOF. Using the notation $A_m = \bigcap_{n=m}^{\infty} \{|X_n| \le \epsilon\}$ as in (3) of Sec. 4.1 (with $X = 0$), we have

$$\{|X_n| > \epsilon \text{ i.o.}\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|X_n| > \epsilon\} = \bigcap_{m=1}^{\infty} A_m^c.$$


According to Theorem 4.1.1, $X_n \to 0$ a.e. if and only if for each $\epsilon > 0$, $\mathscr{P}(A_m^c) \to 0$ as $m \to \infty$; since $A_m^c$ decreases as $m$ increases, this is equivalent to (5) as asserted.

Theorem 4.2.3. If X, -* X in pr., then there exists a sequence ink) of inte-gers increasing to infinity such that Xnk -± X a.e. Briefly stated: convergencein pr. implies convergence a .e . along a subsequence .

PROOF . We may suppose X - 0 as explained before . Then the hypothesismay be written as

tik > 0 :

nlim~ ~ 1Xn I >Zk

I = 0 .

It follows that for each k we can find nk such that

GT C 1Xnk I >2k

and consequently

1

- 2k

' (IxflkI> 2kk

k

Having so chosen Ink), we let Ek be the event "iX nk I > 1/2k" . Then we haveby (4) :

1(IxlZkI>

2ki .o . = 0.

[Note: Here the index involved in "i.o ." is k; such ambiguity being harmlessand expedient .] This implies Xnk -* 0 a.e. (why?) finishing the proof .
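Theorem 4.2.3 can be watched in action. A minimal simulation sketch (not from the text; the independent events and the subsequence $n_k = 2^k$ are illustrative choices anticipating Theorem 4.2.4 below):

```python
import random

random.seed(3)

# Independent X_n with P{X_n = 1} = 1/n, else 0: then X_n -> 0 in pr.,
# but (divergence part of Borel-Cantelli, Theorem 4.2.4 below) X_n = 1
# i.o. a.e., so there is no a.e. convergence along the full sequence.
N = 2 ** 15
path = [1 if random.random() < 1 / n else 0 for n in range(1, N + 1)]
print("ones along the full path:", sum(path))        # grows like log N

# Along n_k = 2**k the probabilities 2**-k are summable, so by
# Theorem 4.2.1 the subsequence converges to 0 a.e.
ones_sub = sum(path[2 ** k - 1] for k in range(1, 16))
print("ones along the subsequence:", ones_sub)       # typically 0 or 1
```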

EXERCISES

1. Prove that

$$\mathscr{P}(\limsup_n E_n) \ge \limsup_n \mathscr{P}(E_n), \qquad \mathscr{P}(\liminf_n E_n) \le \liminf_n \mathscr{P}(E_n).$$

2. Let $\{B_n\}$ be a countable collection of Borel sets in $\mathscr{U}$. If there exists a $\delta > 0$ such that $m(B_n) \ge \delta$ for every $n$, then there is at least one point in $\mathscr{U}$ that belongs to infinitely many $B_n$'s.

3. If $\{X_n\}$ converges to a finite limit a.e., then for any $\epsilon$ there exists $M(\epsilon) < \infty$ such that $\mathscr{P}\{\sup_n |X_n| \le M(\epsilon)\} \ge 1 - \epsilon$.


*4. For any sequence of r.v.'s $\{X_n\}$ there exists a sequence of constants $\{A_n\}$ such that $X_n/A_n \to 0$ a.e.

*5. If $\{X_n\}$ converges on the set $C$, then for any $\epsilon > 0$, there exists $C_0 \subset C$ with $\mathscr{P}(C \backslash C_0) < \epsilon$ such that $X_n$ converges uniformly in $C_0$. [This is Egorov's theorem. We may suppose $C = \Omega$ and the limit to be zero. Let $F_{mk} = \bigcap_{n=k}^{\infty} \{\omega : |X_n(\omega)| \le 1/m\}$; then $\forall m$, $\exists k(m)$ such that

$$\mathscr{P}(F_{m,k(m)}) > 1 - \epsilon/2^m.$$

Take $C_0 = \bigcap_{m=1}^{\infty} F_{m,k(m)}$.]

6. Cauchy convergence of $\{X_n\}$ in pr. (or in $L^p$) implies the existence of an $X$ (finite a.e.), such that $X_n$ converges to $X$ in pr. (or in $L^p$). [HINT: Choose $n_k$ so that

$$\sum_k \mathscr{P}\left\{ |X_{n_{k+1}} - X_{n_k}| > \frac{1}{2^k} \right\} < \infty;$$

cf. Theorem 4.2.3.]

*7. $\{X_n\}$ converges in pr. to $X$ if and only if every subsequence $\{X_{n_k}\}$ contains a further subsequence that converges a.e. to $X$. [HINT: Use Theorem 4.1.5.]

8. Let $\{X_n, n \ge 1\}$ be any sequence of functions on $\Omega$ to $\mathscr{R}^1$ and let $C$ denote the set of $\omega$ for which the numerical sequence $\{X_n(\omega), n \ge 1\}$ converges. Show that

$$C = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \left\{ |X_n - X_{n'}| \le \frac{1}{m} \right\} = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \Lambda(m, n, n'),$$

where

$$\Lambda(m, n, n') = \left\{ \omega : \max_{n \le j < k \le n'} |X_j(\omega) - X_k(\omega)| \le \frac{1}{m} \right\}.$$

Hence if the $X_n$'s are r.v.'s, we have

$$\mathscr{P}(C) = \lim_{m \to \infty} \lim_{n \to \infty} \lim_{n' \to \infty} \mathscr{P}(\Lambda(m, n, n')).$$

9. As in Exercise 8 show that

$$\{\omega : \lim_n X_n(\omega) = 0\} = \bigcap_{m=1}^{\infty} \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} \left\{ |X_n| \le \frac{1}{m} \right\}.$$

10. Suppose that for $a < b$, we have

$$\mathscr{P}\{X_n < a \text{ i.o. and } X_n > b \text{ i.o.}\} = 0;$$


then $\lim_n X_n$ exists a.e. but it may be infinite. [HINT: Consider all pairs of rational numbers $(a, b)$ and take a union over them.]

Under the assumption of independence, Theorem 4.2.1 has a striking complement.

Theorem 4.2.4. If the events $\{E_n\}$ are independent, then

$$(6) \qquad \sum_n \mathscr{P}(E_n) = \infty \implies \mathscr{P}(E_n \text{ i.o.}) = 1.$$

PROOF. By (3) we have

$$(7) \qquad \mathscr{P}(\liminf_n E_n^c) = \lim_{m \to \infty} \mathscr{P}\Big( \bigcap_{n=m}^{\infty} E_n^c \Big).$$

The events $\{E_n^c\}$ are independent as well as $\{E_n\}$, by Exercise 3 of Sec. 3.3; hence we have if $m' > m$:

$$\mathscr{P}\Big( \bigcap_{n=m}^{m'} E_n^c \Big) = \prod_{n=m}^{m'} \mathscr{P}(E_n^c) = \prod_{n=m}^{m'} (1 - \mathscr{P}(E_n)).$$

Now for any $x \ge 0$, we have $1 - x \le e^{-x}$; it follows that the last term above does not exceed

$$\prod_{n=m}^{m'} e^{-\mathscr{P}(E_n)} = \exp\Big( - \sum_{n=m}^{m'} \mathscr{P}(E_n) \Big).$$

Letting $m' \to \infty$, the right member above $\to 0$, since the series in the exponent $\to +\infty$ by hypothesis. It follows by the monotone property that

$$\mathscr{P}\Big( \bigcap_{n=m}^{\infty} E_n^c \Big) = \lim_{m' \to \infty} \mathscr{P}\Big( \bigcap_{n=m}^{m'} E_n^c \Big) = 0.$$

Thus the right member in (7) is equal to 0, and consequently so is the left member in (7). This is equivalent to $\mathscr{P}(E_n \text{ i.o.}) = 1$ by (1).

Theorems 4.2.1 and 4.2.4 together will be referred to as the Borel-Cantelli lemma, the former the "convergence part" and the latter the "divergence part". The first is more useful since the events there may be completely arbitrary. The second has an extension to pairwise independent r.v.'s; although the result is of some interest, it is the method of proof to be given below that is more important. It is a useful technique in probability theory.
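Both halves of the lemma can be seen numerically. A minimal sketch (illustrative only; independent events with the stated probabilities, window sizes arbitrary):

```python
import random

random.seed(0)

def hits_in_window(p_of_n, lo, hi, reps=200):
    """Fraction of runs in which some E_n with lo <= n < hi occurs."""
    count = 0
    for _ in range(reps):
        if any(random.random() < p_of_n(n) for n in range(lo, hi)):
            count += 1
    return count / reps

N = 20_000
# Convergence part: sum 1/n**2 < infinity, so late E_n stop occurring.
print(hits_in_window(lambda n: 1 / n ** 2, N // 2, N))   # near 0
# Divergence part: sum 1/n = infinity; for the window [N/2, N) the chance
# of a hit stays near 1/2 however large N is, consistent with E_n i.o.
print(hits_in_window(lambda n: 1 / n, N // 2, N))        # near 1/2
```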


Theorem 4.2.5. The implication (6) remains true if the events $\{E_n\}$ are pairwise independent.

PROOF. Let $I_n$ denote the indicator of $E_n$, so that our present hypothesis becomes

$$(8) \qquad \forall m \ne n: \quad \mathscr{E}(I_m I_n) = \mathscr{E}(I_m)\,\mathscr{E}(I_n).$$

Consider the series of r.v.'s: $\sum_{n=1}^{\infty} I_n(\omega)$. It diverges to $+\infty$ if and only if an infinite number of its terms are equal to one, namely if $\omega$ belongs to an infinite number of the $E_n$'s. Hence the conclusion in (6) is equivalent to

$$(9) \qquad \mathscr{P}\Big\{ \sum_{n=1}^{\infty} I_n = +\infty \Big\} = 1.$$

What has been said so far is true for arbitrary $E_n$'s. Now the hypothesis in (6) may be written as

$$\sum_{n=1}^{\infty} \mathscr{E}(I_n) = +\infty.$$

Consider the partial sum $J_k = \sum_{n=1}^{k} I_n$. Using Chebyshev's inequality, we have for every $A > 0$:

$$(10) \qquad \mathscr{P}\{|J_k - \mathscr{E}(J_k)| \le A\,\sigma(J_k)\} \ge 1 - \frac{\sigma^2(J_k)}{A^2 \sigma^2(J_k)} = 1 - \frac{1}{A^2},$$

where $\sigma^2(J)$ denotes the variance of $J$. Writing

$$p_n = \mathscr{E}(I_n) = \mathscr{P}(E_n),$$

we may calculate $\sigma^2(J_k)$ by using (8), as follows:

$$\mathscr{E}(J_k^2) = \mathscr{E}\Big( \sum_{n=1}^{k} I_n^2 + 2 \sum_{1 \le m < n \le k} I_m I_n \Big) = \sum_{n=1}^{k} \mathscr{E}(I_n^2) + 2 \sum_{1 \le m < n \le k} \mathscr{E}(I_m)\,\mathscr{E}(I_n)$$
$$= \sum_{n=1}^{k} \{\mathscr{E}(I_n^2) - \mathscr{E}(I_n)^2\} + \sum_{n=1}^{k} \mathscr{E}(I_n)^2 + 2 \sum_{1 \le m < n \le k} \mathscr{E}(I_m)\,\mathscr{E}(I_n)$$
$$= \sum_{n=1}^{k} (p_n - p_n^2) + \Big( \sum_{n=1}^{k} p_n \Big)^2.$$

Hence

$$\sigma^2(J_k) = \mathscr{E}(J_k^2) - \mathscr{E}(J_k)^2 = \sum_{n=1}^{k} (p_n - p_n^2) = \sum_{n=1}^{k} \sigma^2(I_n).$$

This calculation will turn out to be a particular case of a simple basic formula; see (6) of Sec. 5.1. Since $\sum_{n=1}^{k} p_n = \mathscr{E}(J_k) \to \infty$, it follows that

$$\sigma(J_k) \le \mathscr{E}(J_k)^{1/2} = o(\mathscr{E}(J_k))$$

in the classical "$o$, $O$" notation of analysis. Hence if $k \ge k_0(A)$, (10) implies

$$\mathscr{P}\left\{ J_k > \frac{1}{2}\,\mathscr{E}(J_k) \right\} \ge 1 - \frac{1}{A^2}$$

(where $\frac{1}{2}$ may be replaced by any constant $< 1$). Since $J_k$ increases with $k$ the inequality above holds a fortiori when the first $J_k$ there is replaced by $\lim_k J_k$; after that we can let $k \to \infty$ in $\mathscr{E}(J_k)$ to obtain

$$\mathscr{P}\left\{ \lim_k J_k = +\infty \right\} \ge 1 - \frac{1}{A^2}.$$

Since the left member does not depend on $A$, and $A$ is arbitrary, this implies that $\lim_k J_k = +\infty$ a.e., namely (9).

Corollary. If the events $\{E_n\}$ are pairwise independent, then

$$\mathscr{P}(\limsup_n E_n) = 0 \text{ or } 1$$

according as $\sum_n \mathscr{P}(E_n) < \infty$ or $= \infty$.

This is an example of a "zero-or-one" law to be discussed in Chapter 8, though it is not included in any of the general results there.

EXERCISES

Below $X_n$, $Y_n$ are r.v.'s, $E_n$ events.

11. Give a trivial example of dependent $\{E_n\}$ satisfying the hypothesis but not the conclusion of (6); give a less trivial example satisfying the hypothesis but with $\mathscr{P}(\limsup_n E_n) = 0$. [HINT: Let $E_n$ be the event that a real number in $[0, 1]$ has its $n$-ary expansion begin with 0.]

*12. Prove that the probability of convergence of a sequence of independent r.v.'s is equal to zero or one.

13. If $\{X_n\}$ is a sequence of independent and identically distributed r.v.'s not constant a.e., then $\mathscr{P}\{X_n \text{ converges}\} = 0$.


*14. If $\{X_n\}$ is a sequence of independent r.v.'s with d.f.'s $\{F_n\}$, then $\mathscr{P}\{\lim_n X_n = 0\} = 1$ if and only if $\forall \epsilon > 0$: $\sum_n \{1 - F_n(\epsilon) + F_n(-\epsilon)\} < \infty$.

15. If $\sum_n \mathscr{P}(|X_n| > n) < \infty$, then

$$\limsup_n \frac{|X_n|}{n} \le 1 \quad \text{a.e.}$$

*16. Strengthen Theorem 4.2.4 by proving that

$$\lim_{n \to \infty} \frac{J_n}{\mathscr{E}(J_n)} = 1 \quad \text{a.e.}$$

[HINT: Take a subsequence $\{n_k\}$ such that $\mathscr{E}(J_{n_k}) \sim k^2$; prove the result first for this subsequence by estimating $\mathscr{P}\{|J_{n_k} - \mathscr{E}(J_{n_k})| > \delta\,\mathscr{E}(J_{n_k})\}$; the general case follows because if $n_k \le n \le n_{k+1}$,

$$J_{n_k}/\mathscr{E}(J_{n_{k+1}}) \le J_n/\mathscr{E}(J_n) \le J_{n_{k+1}}/\mathscr{E}(J_{n_k}).]$$

17. If $\mathscr{E}(X_n) = 1$ and $\mathscr{E}(X_n^2)$ is bounded in $n$, then

$$\mathscr{P}\Big\{ \limsup_{n \to \infty} X_n \ge 1 \Big\} > 0.$$

[This is due to Kochen and Stone. Truncate $X_n$ at $A$ to $Y_n$ with $A$ so large that $\mathscr{E}(Y_n) > 1 - \epsilon$ for all $n$; then apply Exercise 6 of Sec. 3.2.]

*18. Let $\{E_n\}$ be events and $\{I_n\}$ their indicators. Prove the inequality

$$(11) \qquad \mathscr{P}\Big( \bigcup_{k=1}^{n} E_k \Big) \ge \frac{\Big( \sum_{k=1}^{n} \mathscr{P}(E_k) \Big)^2}{\mathscr{E}\Big( \big( \sum_{k=1}^{n} I_k \big)^2 \Big)}.$$

Deduce from this that if (i) $\sum_n \mathscr{P}(E_n) = \infty$ and (ii) there exists $c > 0$ such that we have

$$\forall m < n: \quad \mathscr{P}(E_m E_n) \le c\,\mathscr{P}(E_m)\,\mathscr{P}(E_{n-m});$$

then $\mathscr{P}\{\limsup_n E_n\} > 0$.

19. If $\sum_n \mathscr{P}(E_n) = \infty$ and

$$\lim_n \frac{\Big( \sum_{k=1}^{n} \mathscr{P}(E_k) \Big)^2}{\sum_{j=1}^{n} \sum_{k=1}^{n} \mathscr{P}(E_j E_k)} = 1,$$

then $\mathscr{P}\{\limsup_n E_n\} = 1$. [HINT: Use (11) above.]


20. Let $\{E_n\}$ be arbitrary events satisfying

(i) $\lim_n \mathscr{P}(E_n) = 0$;

(ii) $\sum_n \mathscr{P}(E_n E_{n+1}^c) < \infty$;

then $\mathscr{P}\{\limsup_n E_n\} = 0$. [This is due to Barndorff-Nielsen.]

4.3 Vague convergence

If a sequence of r.v.'s $\{X_n\}$ tends to a limit, the corresponding sequence of p.m.'s $\{\mu_n\}$ ought to tend to a limit in some sense. Is it true that $\lim_n \mu_n(A)$ exists for all $A \in \mathscr{B}^1$ or at least for all intervals $A$? The answer is no from trivial examples.

Example 1. Let $X_n = c_n$ where the $c_n$'s are constants tending to zero. Then $X_n \to 0$ deterministically. For any interval $I$ such that $0 \notin \bar{I}$, where $\bar{I}$ is the closure of $I$, we have $\lim_n \mu_n(I) = 0 = \mu(I)$; for any interval such that $0 \in I^\circ$, where $I^\circ$ is the interior of $I$, we have $\lim_n \mu_n(I) = 1 = \mu(I)$. But if $\{c_n\}$ oscillates between strictly positive and strictly negative values, and $I = (a, 0)$ or $(0, b)$, where $a < 0 < b$, then $\mu_n(I)$ oscillates between 0 and 1, while $\mu(I) = 0$. On the other hand, if $I = (a, 0]$ or $[0, b)$, then $\mu_n(I)$ oscillates as before but $\mu(I) = 1$. Observe that $\{0\}$ is the sole atom of $\mu$ and it is the root of the trouble.

Instead of the point masses concentrated at $c_n$, we may consider, e.g., r.v.'s $\{X_n\}$ having uniform distributions over intervals $(c_n, c_n')$ where $c_n < 0 < c_n'$ and $c_n \to 0$, $c_n' \to 0$. Then again $X_n \to 0$ a.e. but $\mu_n((a, 0))$ may not converge at all, or converge to any number between 0 and 1.
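The atom trouble in Example 1 is visible in a few lines of code (a sketch only; the alternating sequence $c_n = (-1)^n/n$ is an arbitrary choice):

```python
# Point masses mu_n concentrated at c_n = (-1)**n / n, n = 2, 3, ...
cs = [(-1) ** n / n for n in range(2, 13)]

def mass(c, a, b):
    """Mass assigned by the point mass at c to the open interval (a, b)."""
    return 1.0 if a < c < b else 0.0

print([mass(c, -1.0, 1.0) for c in cs])  # 0 interior to (-1, 1): all 1.0
print([mass(c, 0.0, 1.0) for c in cs])   # (0, 1): oscillates 1, 0, 1, 0, ...
```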

Next, even if $\{\mu_n\}$ does converge in some weaker sense, is the limit necessarily a p.m.? The answer is again no.

Example 2. Let $X_n = c_n$ where $c_n \to +\infty$. Then $X_n \to +\infty$ deterministically. According to our definition of an r.v., the constant $+\infty$ indeed qualifies. But for any finite interval $(a, b)$ we have $\lim_n \mu_n((a, b)) = 0$, so any limiting measure must also be identically zero. This example can be easily ramified; e.g., let $a_n \to -\infty$, $b_n \to +\infty$ and

$$X_n = \begin{cases} a_n & \text{with probability } \alpha, \\ 0 & \text{with probability } 1 - \alpha - \beta, \\ b_n & \text{with probability } \beta. \end{cases}$$

Then $X_n \to X$ where

$$X = \begin{cases} -\infty & \text{with probability } \alpha, \\ 0 & \text{with probability } 1 - \alpha - \beta, \\ +\infty & \text{with probability } \beta. \end{cases}$$

For any finite interval $(a, b)$ containing 0 we have

$$\lim_n \mu_n((a, b)) = \lim_n \mu_n(\{0\}) = 1 - \alpha - \beta.$$


In this situation it is said that masses of amount $\alpha$ and $\beta$ "have wandered off to $-\infty$ and $+\infty$ respectively." The remedy here is obvious: we should consider measures on the extended line $\mathscr{R}^* = [-\infty, +\infty]$, with possible atoms at $\{+\infty\}$ and $\{-\infty\}$. We leave this to the reader but proceed to give the appropriate definitions which take into account the two kinds of troubles discussed above.

DEFINITION. A measure $\mu$ on $(\mathscr{R}^1, \mathscr{B}^1)$ with $\mu(\mathscr{R}^1) \le 1$ will be called a subprobability measure (s.p.m.).

DEFINITION OF VAGUE CONVERGENCE. A sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely to an s.p.m. $\mu$ iff there exists a dense subset $D$ of $\mathscr{R}^1$ such that

$$(1) \qquad \forall a \in D,\ b \in D,\ a < b: \quad \mu_n((a, b]) \to \mu((a, b]).$$

This will be denoted by

$$(2) \qquad \mu_n \xrightarrow{v} \mu$$

and $\mu$ is called the vague limit of $\{\mu_n\}$. We shall see below that it is unique.

For brevity's sake, we will write $\mu((a, b])$ as $\mu(a, b]$ below, and similarly for other kinds of intervals. An interval $(a, b)$ is called a continuity interval of $\mu$ iff neither $a$ nor $b$ is an atom of $\mu$; in other words iff $\mu(a, b) = \mu[a, b]$. As a notational convention, $\mu(a, b) = 0$ when $a \ge b$.

Theorem 4.3.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. The following propositions are equivalent.

(i) For every finite interval $(a, b)$ and $\epsilon > 0$, there exists an $n_0(a, b, \epsilon)$ such that if $n \ge n_0$, then

$$(3) \qquad \mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu_n(a, b) \le \mu(a - \epsilon, b + \epsilon) + \epsilon.$$

Here and hereafter the first term is interpreted as 0 if $a + \epsilon > b - \epsilon$.

(ii) For every continuity interval $(a, b]$ of $\mu$, we have

$$\mu_n(a, b] \to \mu(a, b].$$

(iii) $\mu_n \xrightarrow{v} \mu$.

PROOF. To prove that (i) $\implies$ (ii), let $(a, b)$ be a continuity interval of $\mu$. It follows from the monotone property of a measure that

$$\lim_{\epsilon \downarrow 0} \mu(a + \epsilon, b - \epsilon) = \mu(a, b) = \mu[a, b] = \lim_{\epsilon \downarrow 0} \mu(a - \epsilon, b + \epsilon).$$

Letting $n \to \infty$, then $\epsilon \downarrow 0$ in (3), we have

$$\mu(a, b) \le \liminf_n \mu_n(a, b) \le \limsup_n \mu_n[a, b] \le \mu[a, b] = \mu(a, b),$$


which proves (ii); indeed $(a, b]$ may be replaced by $(a, b)$ or $[a, b)$ or $[a, b]$ there. Next, since the set of atoms of $\mu$ is countable, the complementary set $D$ is certainly dense in $\mathscr{R}^1$. If $a \in D$, $b \in D$, then $(a, b)$ is a continuity interval of $\mu$. This proves (ii) $\implies$ (iii). Finally, suppose (iii) is true so that (1) holds. Given any $(a, b)$ and $\epsilon > 0$, there exist $a_1, a_2, b_1, b_2$ all in $D$ satisfying

$$a - \epsilon < a_1 < a < a_2 < a + \epsilon, \qquad b - \epsilon < b_1 < b < b_2 < b + \epsilon.$$

By (1), there exists $n_0$ such that if $n \ge n_0$, then

$$|\mu_n(a_i, b_j] - \mu(a_i, b_j]| \le \epsilon$$

for $i = 1, 2$ and $j = 1, 2$. It follows that

$$\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu(a_2, b_1] - \epsilon \le \mu_n(a_2, b_1] \le \mu_n(a, b) \le \mu_n(a_1, b_2] \le \mu(a_1, b_2] + \epsilon \le \mu(a - \epsilon, b + \epsilon) + \epsilon.$$

Thus (iii) $\implies$ (i). The theorem is proved.

As an immediate consequence, the vague limit is unique. More precisely, if besides (1) we have also

$$\forall a \in D',\ b \in D',\ a < b: \quad \mu_n(a, b] \to \mu'(a, b],$$

then $\mu \equiv \mu'$. For let $A$ be the set of atoms of $\mu$ and of $\mu'$; then if $a \in A^c$, $b \in A^c$, we have by Theorem 4.3.1, (ii):

$$\mu(a, b] \leftarrow \mu_n(a, b] \to \mu'(a, b]$$

so that $\mu(a, b] = \mu'(a, b]$. Now $A^c$ is dense in $\mathscr{R}^1$, hence the two measures $\mu$ and $\mu'$ coincide on a set of intervals that is dense in $\mathscr{R}^1$ and must therefore be identical (see Corollary to Theorem 2.2.3).

Another consequence is: if $\mu_n \xrightarrow{v} \mu$ and $(a, b)$ is a continuity interval of $\mu$, then $\mu_n(I) \to \mu(I)$, where $I$ is any of the four kinds of intervals with endpoints $a$, $b$. For it is clear that we may replace $\mu_n(a, b)$ by $\mu_n[a, b]$ in (3). In particular, we have $\mu_n(\{a\}) \to 0$, $\mu_n(\{b\}) \to 0$.

The case of strict probability measures will now be treated .

Theorem 4.3.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then (i), (ii), and (iii) in the preceding theorem are equivalent to the following "uniform" strengthening of (i).

(i') For any $\delta > 0$ and $\epsilon > 0$, there exists $n_0(\delta, \epsilon)$ such that if $n \ge n_0$, then we have for every interval $(a, b)$, possibly infinite:

$$(4) \qquad \mu(a + \delta, b - \delta) - \epsilon \le \mu_n(a, b) \le \mu(a - \delta, b + \delta) + \epsilon.$$


PROOF. The employment of two numbers $\delta$ and $\epsilon$ instead of one $\epsilon$ as in (3) is an apparent extension only (why?). Since (i') $\implies$ (i), the preceding theorem yields at once (i') $\implies$ (ii) $\iff$ (iii). It remains to prove (ii) $\implies$ (i') when the $\mu_n$'s and $\mu$ are p.m.'s. Let $A$ denote the set of atoms of $\mu$. Then there exist an integer $\ell$ and $a_j \in A^c$, $1 \le j \le \ell$, satisfying:

$$a_j < a_{j+1} < a_j + \delta, \qquad 1 \le j \le \ell - 1;$$

and

$$(5) \qquad \mu((a_1, a_\ell)^c) < \frac{\epsilon}{4}.$$

By (ii), there exists $n_0$ depending on $\epsilon$ and $\ell$ (and so on $\epsilon$ and $\delta$) such that if $n \ge n_0$, then

$$(6) \qquad \sup_{1 \le j \le \ell - 1} |\mu(a_j, a_{j+1}] - \mu_n(a_j, a_{j+1}]| \le \frac{\epsilon}{4\ell}.$$

It follows by additivity that

$$|\mu(a_1, a_\ell] - \mu_n(a_1, a_\ell]| \le \frac{\epsilon}{4},$$

and consequently by (5):

$$(7) \qquad \mu_n((a_1, a_\ell)^c) < \frac{\epsilon}{2}.$$

From (5) and (7) we see that by ignoring the part of $(a, b)$ outside $(a_1, a_\ell)$, we commit at most an error $< \epsilon/2$ in either one of the inequalities in (4). Thus it is sufficient to prove (4) with $\delta$ and $\epsilon/2$, assuming that $(a, b) \subset (a_1, a_\ell)$. Let then $a_j \le a < a_{j+1}$ and $a_k \le b < a_{k+1}$, where $0 \le j \le k \le \ell - 1$. The desired inequalities follow from (6), since when $n \ge n_0$ we have

$$\mu_n(a + \delta, b - \delta) - \frac{\epsilon}{4} \le \mu_n(a_{j+1}, a_k) - \frac{\epsilon}{4} \le \mu(a_{j+1}, a_k) \le \mu(a, b) \le \mu(a_j, a_{k+1}) \le \mu_n(a_j, a_{k+1}) + \frac{\epsilon}{4} \le \mu_n(a - \delta, b + \delta) + \frac{\epsilon}{4}.$$

The set of all s.p.m.'s on $\mathcal{R}^1$ bears close analogy to the set of all real numbers in $[0, 1]$. Recall that the latter is sequentially compact, which means: given any sequence of numbers in the set, there is a subsequence which converges, and the limit is also a number in the set. This is the fundamental Bolzano-Weierstrass theorem. We have the following analogue, which states: the set of all s.p.m.'s is sequentially compact with respect to vague convergence. It is often referred to as "Helly's extraction (or selection) principle".

Theorem 4.3.3. Given any sequence of s.p.m.'s, there is a subsequence that converges vaguely to an s.p.m.

PROOF. Here it is convenient to consider the subdistribution function (s.d.f.) $F_n$ defined as follows:

$\forall x:\ F_n(x) = \mu_n(-\infty, x]$.

If $\mu_n$ is a p.m., then $F_n$ is just its d.f. (see Sec. 2.2); in general $F_n$ is increasing, right continuous with $F_n(-\infty) = 0$ and $F_n(+\infty) = \mu_n(\mathcal{R}^1) \le 1$.

Let $D$ be a countable dense set of $\mathcal{R}^1$, and let $\{r_k, k \ge 1\}$ be an enumeration of it. The sequence of numbers $\{F_n(r_1), n \ge 1\}$ is bounded, hence by the Bolzano-Weierstrass theorem there is a subsequence $\{F_{1k}, k \ge 1\}$ of the given sequence such that the limit

$\displaystyle\lim_{k \to \infty} F_{1k}(r_1) = \ell_1$

exists; clearly $0 \le \ell_1 \le 1$. Next, the sequence of numbers $\{F_{1k}(r_2), k \ge 1\}$ is bounded, hence there is a subsequence $\{F_{2k}, k \ge 1\}$ of $\{F_{1k}, k \ge 1\}$ such that

$\displaystyle\lim_{k \to \infty} F_{2k}(r_2) = \ell_2$,

where $0 \le \ell_2 \le 1$. Since $\{F_{2k}\}$ is a subsequence of $\{F_{1k}\}$, it converges also at $r_1$ to $\ell_1$. Continuing, we obtain

$F_{11}, F_{12}, \ldots, F_{1k}, \ldots$ converging at $r_1$;
$F_{21}, F_{22}, \ldots, F_{2k}, \ldots$ converging at $r_1, r_2$;
$\cdots\cdots$
$F_{j1}, F_{j2}, \ldots, F_{jk}, \ldots$ converging at $r_1, r_2, \ldots, r_j$;
$\cdots\cdots$

Now consider the diagonal sequence $\{F_{kk}, k \ge 1\}$. We assert that it converges at every $r_j$, $j \ge 1$. To see this let $r_j$ be given. Apart from the first $j - 1$ terms, the sequence $\{F_{kk}, k \ge 1\}$ is a subsequence of $\{F_{jk}, k \ge 1\}$, which converges at $r_j$, and hence $\lim_{k \to \infty} F_{kk}(r_j) = \ell_j$, as desired.

We have thus proved the existence of an infinite subsequence $\{n_k\}$ and a function $G$ defined and increasing on $D$ such that

$\forall r \in D:\ \displaystyle\lim_{k \to \infty} F_{n_k}(r) = G(r)$.

From $G$ we define a function $F$ on $\mathcal{R}^1$ as follows:

$\forall x \in \mathcal{R}^1:\ F(x) = \displaystyle\inf_{x < r \in D} G(r)$.


By Sec. 1.1, (vii), $F$ is increasing and right continuous. Let $C$ denote the set of its points of continuity; $C$ is dense in $\mathcal{R}^1$ and we show that

(8) $\forall x \in C:\ \displaystyle\lim_{k \to \infty} F_{n_k}(x) = F(x)$.

For, let $x \in C$ and $\epsilon > 0$ be given; there exist $r$, $r'$, and $r''$ in $D$ such that $r < r' < x < r''$ and $F(r'') - F(r) < \epsilon$. Then we have

$F(r) \le G(r') \le F(x) \le G(r'') \le F(r'') \le F(r) + \epsilon$;

and

$F_{n_k}(r') \le F_{n_k}(x) \le F_{n_k}(r'')$.

From these (8) follows, since $\epsilon$ is arbitrary. To $F$ corresponds a (unique) s.p.m. $\mu$ such that

$F(x) - F(-\infty) = \mu(-\infty, x]$

as in Theorem 2.2.2. Now the relation (8) yields, upon taking differences:

$\forall a \in C,\ b \in C,\ a < b:\ \displaystyle\lim_{k \to \infty} \mu_{n_k}(a, b] = \mu(a, b]$.

Thus $\mu_{n_k} \overset{v}{\to} \mu$, and the theorem is proved.
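The diagonal bookkeeping in the proof can be imitated numerically. The sketch below is an editorial illustration, not part of the text; it assumes Python 3, and the Bolzano-Weierstrass extraction is caricatured by a finite-horizon bisection of $[0,1]$ that keeps a half still containing many of the values $F_n(r_k)$. The sequence used, the d.f.'s of point masses at $(-1)^n$, has two vague limit points, and the refinement settles on the indices of one of them.

```python
# A finite-horizon sketch (not from the text) of the diagonal argument in
# Theorem 4.3.3.  With finite data the bisection step is only a caricature of
# the true Bolzano-Weierstrass extraction, but the bookkeeping is the same.

def dfs(n, x):
    """F_n = d.f. of the point mass at (-1)**n: a sequence with two vague limits."""
    return 1.0 if x >= (-1) ** n else 0.0

rationals = [0.0, -0.5, 0.5, -1.5, 1.5]     # stand-in for an enumeration of D
indices = list(range(1, 2001))              # finite horizon for the demo

for k, r in enumerate(rationals):
    lo, hi = 0.0, 1.0
    for _ in range(20):                     # bisect until the half-width is tiny
        mid = (lo + hi) / 2
        left = [n for n in indices if lo <= dfs(n, r) <= mid]
        if len(left) >= len(indices) // 2:  # keep a half with many survivors
            indices, hi = left, mid
        else:
            indices = [n for n in indices if mid < dfs(n, r) <= hi]
            lo = mid
    print(f"after refining at r_{k} = {r:+.1f}: {len(indices)} indices remain")

# Along the surviving indices (here: all even n) F_n converges at every r in
# the list; this is the "diagonal subsequence" of the proof.
print("sample of surviving indices:", indices[:5])
```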

We say that $F_n$ converges vaguely to $F$ and write $F_n \overset{v}{\to} F$ for $\mu_n \overset{v}{\to} \mu$, where $\mu_n$ and $\mu$ are the s.p.m.'s corresponding to the s.d.f.'s $F_n$ and $F$.

The reader should be able to confirm the truth of the following proposition about real numbers. Let $\{x_n\}$ be a sequence of real numbers such that every subsequence that tends to a limit ($\pm\infty$ allowed) has the same value for the limit; then the whole sequence tends to this limit. In particular a bounded sequence such that every convergent subsequence has the same limit is convergent to this limit.

The next theorem generalizes this result to vague convergence of s.p.m.'s. It is not contained in the preceding proposition but can be reduced to it if we use the properties of vague convergence; see also Exercise 9 below.

Theorem 4.3.4. If every vaguely convergent subsequence of the sequence of s.p.m.'s $\{\mu_n\}$ converges to the same $\mu$, then $\mu_n \overset{v}{\to} \mu$.

PROOF. To prove the theorem by contraposition, suppose $\mu_n$ does not converge vaguely to $\mu$. Then by Theorem 4.3.1, (ii), there exists a continuity interval $(a, b)$ of $\mu$ such that $\mu_n(a, b)$ does not converge to $\mu(a, b)$. By the Bolzano-Weierstrass theorem there exists a subsequence $\{n_k\}$ tending to infinity such that the numbers $\mu_{n_k}(a, b)$ converge to a limit, say $L \ne \mu(a, b)$. By Theorem 4.3.3, the sequence $\{\mu_{n_k}, k \ge 1\}$ contains a subsequence, say $\{\mu_{n_{k_\ell}}, \ell \ge 1\}$, which converges vaguely, hence to $\mu$ by hypothesis of the theorem. Hence again by Theorem 4.3.1, (ii), we have

$\mu_{n_{k_\ell}}(a, b) \to \mu(a, b)$.

But the left side also $\to L$, which is a contradiction.

EXERCISES

1. Perhaps the most logical approach to vague convergence is as follows. The sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely iff there exists a dense subset $D$ of $\mathcal{R}^1$ such that for every $a \in D$, $b \in D$, $a < b$, the sequence $\{\mu_n(a, b], n \ge 1\}$ converges. The definition given before implies this, of course, but prove the converse.

2. Prove that if (1) is true, then there exists a dense set $D'$ such that $\mu_n(I) \to \mu(I)$ where $I$ may be any of the four intervals $(a, b)$, $(a, b]$, $[a, b)$, $[a, b]$ with $a \in D'$, $b \in D'$.

3. Can a sequence of absolutely continuous p.m.'s converge vaguely to a discrete p.m.? Can a sequence of discrete p.m.'s converge vaguely to an absolutely continuous p.m.?

4. If a sequence of p.m.'s converges vaguely to an atomless p.m., then the convergence is uniform for all intervals, finite or infinite. (This is due to Pólya.)

5. Let $\{f_n\}$ be a sequence of functions increasing in $\mathcal{R}^1$ and uniformly bounded there: $\sup_{n,x} |f_n(x)| \le M < \infty$. Prove that there exists an increasing function $f$ on $\mathcal{R}^1$ and a subsequence $\{n_k\}$ such that $f_{n_k}(x) \to f(x)$ for every $x$. (This is a form of Theorem 4.3.3 frequently given; the insistence on "every $x$" requires an additional argument.)

6. Let $\{\mu_n\}$ be a sequence of finite measures on $\mathcal{B}^1$. It is said to converge vaguely to a measure $\mu$ iff (1) holds. The limit $\mu$ is not necessarily a finite measure. But if $\mu_n(\mathcal{R}^1)$ is bounded in $n$, then $\mu$ is finite.

7. If $\mathscr{P}_n$ is a sequence of p.m.'s on $(\Omega, \mathscr{F})$ such that $\mathscr{P}_n(E)$ converges for every $E \in \mathscr{F}$, then the limit is a p.m. $\mathscr{P}$. Furthermore, if $f$ is bounded and $\mathscr{F}$-measurable, then

$\displaystyle\int_\Omega f\,d\mathscr{P}_n \to \int_\Omega f\,d\mathscr{P}$.

(The first assertion is the Vitali-Hahn-Saks theorem and rather deep, but it can be proved by reducing it to a problem of summability; see A. Rényi, [24].)


8. If $\mu_n$ and $\mu$ are p.m.'s and $\mu_n(E) \to \mu(E)$ for every open set $E$, then this is also true for every Borel set. [HINT: Use (7) of Sec. 2.2.]

9. Prove a convergence theorem in metric space that will include both Theorem 4.3.3 for p.m.'s and the analogue for real numbers given before the theorem. [HINT: Use Exercise 9 of Sec. 4.4.]

4.4 Continuation

We proceed to discuss another kind of criterion, which is becoming ever more popular in measure theory as well as functional analysis. This has to do with classes of continuous functions on $\mathcal{R}^1$:

$C_K$ = the class of continuous functions $f$ each vanishing outside a compact set $K(f)$;

$C_0$ = the class of continuous functions $f$ such that $\lim_{|x| \to \infty} f(x) = 0$;

$C_B$ = the class of bounded continuous functions;

$C$ = the class of continuous functions.

We have $C_K \subset C_0 \subset C_B \subset C$. It is well known that $C_0$ is the closure of $C_K$ with respect to uniform convergence.

An arbitrary function $f$ defined on an arbitrary space is said to have support in a subset $S$ of the space iff it vanishes outside $S$. Thus if $f \in C_K$, then it has support in a certain compact set, hence also in a certain compact interval. A step function on a finite or infinite interval $(a, b)$ is one with support in it such that $f(x) = c_j$ for $x \in (a_j, a_{j+1})$ for $1 \le j \le \ell - 1$, where $\ell$ is finite, $a = a_1 < \cdots < a_\ell = b$, and the $c_j$'s are arbitrary real numbers. It will be called $D$-valued iff all the $a_j$'s and $c_j$'s belong to a given set $D$. When the interval $(a, b)$ is $\mathcal{R}^1$, $f$ is called just a step function. Note that the values of $f$ at the points $a_j$ are left unspecified to allow for flexibility; frequently they are defined by right or left continuity. The following lemma is basic.

Approximation Lemma. Suppose that $f \in C_K$ has support in the compact interval $[a, b]$. Given any dense subset $D$ of $\mathcal{R}^1$ and $\epsilon > 0$, there exists a $D$-valued step function $f_\epsilon$ on $(a, b)$ such that

(1) $\displaystyle\sup_{x \in \mathcal{R}^1} |f(x) - f_\epsilon(x)| \le \epsilon$.

If $f \in C_0$, the same is true if $(a, b)$ is replaced by $\mathcal{R}^1$.

This lemma becomes obvious as soon as its geometric meaning is grasped. In fact, for any $f$ in $C_K$, one may even require that either $f_\epsilon \le f$ or $f_\epsilon \ge f$.


The problem is then that of the approximation of the graph of a plane curve by inscribed or circumscribed polygons, as treated in elementary calculus. But let us remark that the lemma is also a particular case of the Stone-Weierstrass theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such a sledgehammer approach has its merit, as other kinds of approximation soon to be needed can also be subsumed under the same theorem. Indeed, the discussion in this section is meant in part to introduce some modern terminology to the relevant applications in probability theory. We can now state the following alternative criterion for vague convergence.
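The constructive content of the Approximation Lemma can be seen in the following editorial sketch (not from the text; Python 3 assumed). It refines a dyadic mesh until a step function built on it is within $\epsilon$ of a given $f \in C_K$; the error is checked only at subinterval midpoints, a proxy that works here because $f$ is uniformly continuous, and in the lemma proper the values would also be rounded into the dense set $D$ (omitted for brevity).

```python
# A sketch (not from the text) of the Approximation Lemma: a continuous f
# with support in [a, b] is approximated within eps by a step function whose
# jump points come from a dense set (here: the dyadic rationals).

import math

def f(x):                                  # an example f in C_K, support [0, 1]
    return math.sin(math.pi * x) if 0.0 <= x <= 1.0 else 0.0

def step_approx(f, a, b, eps):
    """Return jump points and left-endpoint values of an approximating step fn."""
    n = 1
    while True:                            # refine the dyadic mesh until the
        pts = [a + (b - a) * k / 2**n for k in range(2**n + 1)]
        vals = [f(p) for p in pts[:-1]]    # value carried on each [p_k, p_{k+1})
        # midpoint error as a proxy for the sup over each subinterval:
        err = max(abs(f((p + q) / 2) - v) for p, q, v in zip(pts, pts[1:], vals))
        if err < eps:
            return pts, vals
        n += 1

pts, vals = step_approx(f, 0.0, 1.0, 0.05)
print(f"{len(vals)} steps suffice for eps = 0.05")
```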

Theorem 4.4.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. Then $\mu_n \overset{v}{\to} \mu$ if and only if

(2) $\forall f \in C_K$ [or $C_0$]: $\displaystyle\int f(x)\,\mu_n(dx) \to \int f(x)\,\mu(dx)$.

PROOF. Suppose $\mu_n \overset{v}{\to} \mu$; (2) is true by definition when $f$ is the indicator of $(a, b]$ for $a \in D$, $b \in D$, where $D$ is the set in (1) of Sec. 4.3. Hence by the linearity of integrals it is also true when $f$ is any $D$-valued step function. Now let $f \in C_0$ and $\epsilon > 0$; by the approximation lemma there exists a $D$-valued step function $f_\epsilon$ satisfying (1). We have

(3) $\displaystyle\left|\int f\,d\mu_n - \int f\,d\mu\right| \le \left|\int (f - f_\epsilon)\,d\mu_n\right| + \left|\int f_\epsilon\,d\mu_n - \int f_\epsilon\,d\mu\right| + \left|\int (f_\epsilon - f)\,d\mu\right|$.

By the modulus inequality and mean value theorem for integrals (see Sec. 3.2), the first term on the right side above is bounded by

$\displaystyle\int |f - f_\epsilon|\,d\mu_n \le \epsilon \int d\mu_n \le \epsilon$;

similarly for the third term. The second term converges to zero as $n \to \infty$ because $f_\epsilon$ is a $D$-valued step function. Hence the left side of (3) is bounded by $2\epsilon$ as $n \to \infty$, and so converges to zero since $\epsilon$ is arbitrary.

Conversely, suppose (2) is true for $f \in C_K$. Let $A$ be the set of atoms of $\mu$ as in the proof of Theorem 4.3.2; we shall show that vague convergence holds with $D = A^c$. Let $g = 1_{(a,b]}$ be the indicator of $(a, b]$ where $a \in D$, $b \in D$. Then, given $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that $a + \delta < b - \delta$, and such that $\mu(U) < \epsilon$, where

$U = (a - \delta, a + \delta) \cup (b - \delta, b + \delta)$.

Now define $g_1$ to be the function that coincides with $g$ on $(-\infty, a] \cup [a + \delta, b - \delta] \cup [b, \infty)$ and that is linear in $(a, a + \delta)$ and in $(b - \delta, b)$; $g_2$ to be the function that coincides with $g$ on $(-\infty, a - \delta] \cup [a, b] \cup [b + \delta, \infty)$ and


that is linear in $(a - \delta, a)$ and $(b, b + \delta)$. It is clear that $g_1 \le g \le g_2 \le g_1 + 1$ and consequently

(4) $\displaystyle\int g_1\,d\mu_n \le \int g\,d\mu_n \le \int g_2\,d\mu_n$,

(5) $\displaystyle\int g_1\,d\mu \le \int g\,d\mu \le \int g_2\,d\mu$.

Since $g_1 \in C_K$, $g_2 \in C_K$, it follows from (2) that the extreme terms in (4) converge to the corresponding terms in (5). Since

$\displaystyle\int g_2\,d\mu - \int g_1\,d\mu \le \int_U 1\,d\mu = \mu(U) < \epsilon$,

and $\epsilon$ is arbitrary, it follows that the middle term in (4) also converges to that in (5), proving the assertion.

Corollary. If $\{\mu_n\}$ is a sequence of s.p.m.'s such that for every $f \in C_K$,

$\displaystyle\lim_{n \to \infty} \int f(x)\,\mu_n(dx)$

exists, then $\{\mu_n\}$ converges vaguely.

For by Theorem 4.3.3 a subsequence converges vaguely, say to $\mu$. By Theorem 4.4.1, the limit above is equal to $\int f(x)\,\mu(dx)$. This must then be the same for every vaguely convergent subsequence, according to the hypothesis of the corollary. The vague limit of every such sequence is therefore uniquely determined (why?) to be $\mu$, and the corollary follows from Theorem 4.3.4.

Theorem 4.4.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then $\mu_n \overset{v}{\to} \mu$ if and only if

(6) $\forall f \in C_B$: $\displaystyle\int f(x)\,\mu_n(dx) \to \int f(x)\,\mu(dx)$.

PROOF. Suppose $\mu_n \overset{v}{\to} \mu$. Given $\epsilon > 0$, there exist $a$ and $b$ in $D$ such that

(7) $\mu((a, b]^c) = 1 - \mu((a, b]) < \epsilon$.

It follows from vague convergence that there exists $n_0(\epsilon)$ such that if $n \ge n_0(\epsilon)$, then

(8) $\mu_n((a, b]^c) = 1 - \mu_n((a, b]) < \epsilon$.

Let $f \in C_B$ be given and suppose that $|f| \le M < \infty$. Consider the function $f_\epsilon$, which is equal to $f$ on $[a, b]$, to zero on $(-\infty, a - 1) \cup (b + 1, \infty)$,


and which is linear in $[a - 1, a)$ and in $(b, b + 1]$. Then $f_\epsilon \in C_K$ and $|f - f_\epsilon| \le 2M$. We have by Theorem 4.4.1

(9) $\displaystyle\int f_\epsilon\,d\mu_n \to \int f_\epsilon\,d\mu$.

On the other hand, we have

(10) $\displaystyle\int |f - f_\epsilon|\,d\mu_n \le \int_{(a,b]^c} 2M\,d\mu_n \le 2M\epsilon$

by (8). A similar estimate holds with $\mu$ replacing $\mu_n$ above, by (7). Now the argument leading from (3) to (2) finishes the proof of (6) in the same way. This proves that $\mu_n \overset{v}{\to} \mu$ implies (6); the converse has already been proved in Theorem 4.4.1.

Theorems 4.3.3 and 4.3.4 deal with s.p.m.'s. Even if the given sequence $\{\mu_n\}$ consists only of strict p.m.'s, the sequential vague limit may not be so. This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that such a limit be a p.m. The following criterion is not deep, but applicable.

Theorem 4.4.3. Let a family of p.m.'s $\{\mu_\alpha, \alpha \in A\}$ be given on an arbitrary index set $A$. In order that every sequence of them contains a subsequence which converges vaguely to a p.m., it is necessary and sufficient that the following condition be satisfied: for any $\epsilon > 0$, there exists a finite interval $I$ such that

(11) $\displaystyle\inf_{\alpha \in A} \mu_\alpha(I) > 1 - \epsilon$.

PROOF. Suppose (11) holds. For any sequence $\{\mu_n\}$ from the family, there exists a subsequence $\{\mu_{n_k}\}$ such that $\mu_{n_k} \overset{v}{\to} \mu$. We show that $\mu$ is a p.m. Let $J$ be a continuity interval of $\mu$ which contains the $I$ in (11). Then

$\mu(\mathcal{R}^1) \ge \mu(J) = \displaystyle\lim_k \mu_{n_k}(J) \ge \liminf_k \mu_{n_k}(I) \ge 1 - \epsilon$.

Since $\epsilon$ is arbitrary, $\mu(\mathcal{R}^1) = 1$. Conversely, suppose the condition involving (11) is not satisfied; then there exist $\epsilon > 0$, a sequence of finite intervals $I_n$ increasing to $\mathcal{R}^1$, and a sequence $\{\mu_n\}$ from the family such that

$\forall n:\ \mu_n(I_n) < 1 - \epsilon$.

Let $\{\mu_{n_k}\}$ and $\mu$ be as before and $J$ any continuity interval of $\mu$. Then $J \subset I_n$ for all sufficiently large $n$, so that

$\mu(J) = \displaystyle\lim_k \mu_{n_k}(J) \le \limsup_k \mu_{n_k}(I_{n_k}) \le 1 - \epsilon$.

Thus $\mu(\mathcal{R}^1) \le 1 - \epsilon$ and $\mu$ is not a p.m. The theorem is proved.

A family of p.m.'s satisfying the condition above involving (11) is said to be tight. The preceding theorem can be stated as follows: a family of p.m.'s is relatively compact if and only if it is tight. The word "relatively" purports that the limit need not belong to the family; the word "compact" is an abbreviation of "sequentially vaguely convergent to a strict p.m." Extension of the result to p.m.'s in more general topological spaces is straightforward but plays an important role in the convergence of stochastic processes.
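The tightness condition (11) can be read off numerically, as in the following editorial sketch (not from the text; Python 3 assumed). The two families below are toy examples chosen for this illustration: point masses at $1/n$ (tight) and point masses at $n$ (not tight, since mass escapes to infinity).

```python
# A numerical reading (not from the text) of the tightness condition (11):
# the infimum over the family of the mass given to [-A, A] must approach 1.

def tight(n, A):                 # mass of [-A, A] under the point mass at 1/n
    return 1.0 if A >= 1.0 / n else 0.0

def escaping(n, A):              # mass of [-A, A] under the point mass at n
    return 1.0 if A >= n else 0.0

for A in (10.0, 100.0, 1000.0):
    inf_tight = min(tight(n, A) for n in range(1, 10_001))
    inf_esc = min(escaping(n, A) for n in range(1, 10_001))
    print(f"A = {A:6}: inf mass, tight family = {inf_tight}, "
          f"escaping family = {inf_esc}")

# For the escaping family the infimum is 0 for every A (take n > A), so (11)
# fails; indeed delta_n converges vaguely to the zero measure, not a p.m.
```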

The new definition of vague convergence in Theorem 4.4.1 has the advantage over the older ones in that it can be at once carried over to measures in more general topological spaces. There is no substitute for "intervals" in such a space but the classes $C_K$, $C_0$ and $C_B$ are readily available. We will illustrate the general approach by indicating one more result in this direction.

Recall the notion of a lower semicontinuous function on $\mathcal{R}^1$ defined by:

(12) $\forall x \in \mathcal{R}^1:\ f(x) \le \displaystyle\liminf_{y \to x,\ y \ne x} f(y)$.

There are several equivalent definitions (see, e.g., Royden [5]) but the following characterization is most useful: $f$ is bounded and lower semicontinuous if and only if there exists a sequence of functions $f_k \in C_B$ which increases to $f$ everywhere. We call $f$ upper semicontinuous iff $-f$ is lower semicontinuous. Usually $f$ is allowed to be extended-valued; but to avoid complications we will deal with bounded functions only and denote by $L$ and $U$ respectively the classes of bounded lower semicontinuous and bounded upper semicontinuous functions.

Theorem 4.4.4. If $\{\mu_n\}$ and $\mu$ are p.m.'s, then $\mu_n \overset{v}{\to} \mu$ if and only if one of the two conditions below is satisfied:

(13) $\forall f \in L:\ \displaystyle\liminf_n \int f(x)\,\mu_n(dx) \ge \int f(x)\,\mu(dx)$;

$\forall g \in U:\ \displaystyle\limsup_n \int g(x)\,\mu_n(dx) \le \int g(x)\,\mu(dx)$.

PROOF. We begin by observing that the two conditions above are equivalent by putting $f = -g$. Now suppose $\mu_n \overset{v}{\to} \mu$ and let $f_k \in C_B$, $f_k \uparrow f$. Then we have

(14) $\displaystyle\liminf_n \int f(x)\,\mu_n(dx) \ge \liminf_n \int f_k(x)\,\mu_n(dx) = \int f_k(x)\,\mu(dx)$

by Theorem 4.4.2. Letting $k \to \infty$, the last integral above converges to $\int f(x)\,\mu(dx)$ by monotone convergence. This gives the first inequality in (13). Conversely, suppose the latter is satisfied and let $\varphi \in C_B$; then $\varphi$ belongs to


both $L$ and $U$, so that

$\displaystyle\int \varphi(x)\,\mu(dx) \le \liminf_n \int \varphi(x)\,\mu_n(dx) \le \limsup_n \int \varphi(x)\,\mu_n(dx) \le \int \varphi(x)\,\mu(dx)$,

which proves

$\displaystyle\lim_n \int \varphi(x)\,\mu_n(dx) = \int \varphi(x)\,\mu(dx)$.

Hence $\mu_n \overset{v}{\to} \mu$ by Theorem 4.4.2.

Remark. (13) remains true if, e.g., $f$ is lower semicontinuous, with $+\infty$ as a possible value but bounded below.

Corollary. The conditions in (13) may be replaced by the following:

for every open $O$: $\displaystyle\liminf_n \mu_n(O) \ge \mu(O)$;

for every closed $C$: $\displaystyle\limsup_n \mu_n(C) \le \mu(C)$.

We leave this as an exercise.

Finally, we return to the connection between the convergence of r.v.'s and that of their distributions.

DEFINITION OF CONVERGENCE "IN DISTRIBUTION" (in dist.). A sequence of r.v.'s $\{X_n\}$ is said to converge in distribution to $F$ iff the sequence $\{F_n\}$ of corresponding d.f.'s converges vaguely to the d.f. $F$.

If $X$ is an r.v. that has the d.f. $F$, then by an abuse of language we shall also say that $\{X_n\}$ converges in dist. to $X$.

Theorem 4.4.5. Let $\{F_n\}$, $F$ be the d.f.'s of the r.v.'s $\{X_n\}$, $X$. If $X_n \to X$ in pr., then $F_n \overset{v}{\to} F$. More briefly stated, convergence in pr. implies convergence in dist.

PROOF. If $X_n \to X$ in pr., then for each $f \in C_K$, we have $f(X_n) \to f(X)$ in pr. as easily seen from the uniform continuity of $f$ (actually this is true for any continuous $f$; see Exercise 10 of Sec. 4.1). Since $f$ is bounded the convergence holds also in $L^1$ by Theorem 4.1.4. It follows that

$E\{f(X_n)\} \to E\{f(X)\}$,

which is just the relation (2) in another guise, hence $\mu_n \overset{v}{\to} \mu$.
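A quick simulation makes the theorem concrete. The sketch below is an editorial illustration, not part of the text; it assumes Python 3, takes $X$ standard normal and $X_n = X + 1/n$ (so $X_n \to X$ in pr.), and checks that the empirical d.f.'s converge at the continuity point $0$.

```python
# A simulation sketch (not from the text) of Theorem 4.4.5: X_n = X + 1/n
# converges to X in pr., and F_n(0) converges to F(0) (0 is a continuity point).

import random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]   # a sample of X

def empirical_df(sample, t):
    return sum(1 for v in sample if v <= t) / len(sample)

for n in (1, 10, 100):
    xn = [x + 1.0 / n for x in xs]                      # X_n = X + 1/n
    print(f"n = {n:4}: F_n(0) ~ {empirical_df(xn, 0.0):.4f}")
print(f"limit   : F(0)  ~ {empirical_df(xs, 0.0):.4f}")  # about 0.5
```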

Convergence of r.v.'s in dist. is merely a convenient turn of speech; it does not have the usual properties associated with convergence. For instance, if $X_n \to X$ in dist. and $Y_n \to Y$ in dist., it does not follow by any means that $X_n + Y_n$ will converge in dist. to $X + Y$. This is in contrast to the true convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. 4.1. But if $X_n$ and $Y_n$ are independent, then the preceding assertion is indeed true as a property of the convergence of convolutions of distributions (see Chapter 6). However, in the simple situation of the next theorem no independence assumption is needed. The result is useful in dealing with limit distributions in the presence of nuisance terms.

Theorem 4.4.6. If $X_n \to X$ in dist. and $Y_n \to 0$ in dist., then

(a) $X_n + Y_n \to X$ in dist.
(b) $X_n Y_n \to 0$ in dist.

PROOF. We begin with the remark that for any constant $c$, $Y_n \to c$ in dist. is equivalent to $Y_n \to c$ in pr. (Exercise 4 below). To prove (a), let $f \in C_K$, $|f| \le M$. Since $f$ is uniformly continuous, given $\epsilon > 0$ there exists $\delta$ such that $|x - y| \le \delta$ implies $|f(x) - f(y)| \le \epsilon$. Hence we have

$E\{|f(X_n + Y_n) - f(X_n)|\}$

$\le \epsilon\,P\{|f(X_n + Y_n) - f(X_n)| \le \epsilon\} + 2M\,P\{|f(X_n + Y_n) - f(X_n)| > \epsilon\}$

$\le \epsilon + 2M\,P\{|Y_n| > \delta\}$.

The last-written probability tends to zero as $n \to \infty$; it follows that

$\displaystyle\lim_{n \to \infty} E\{f(X_n + Y_n)\} = \lim_{n \to \infty} E\{f(X_n)\} = E\{f(X)\}$

by Theorem 4.4.1, and (a) follows by the same theorem.

To prove (b), for given $\epsilon > 0$ we choose $A_0$ so that both $\pm A_0$ are points of continuity of the d.f. of $X$, and so large that

$\displaystyle\lim_{n \to \infty} P\{|X_n| > A_0\} = P\{|X| > A_0\} < \epsilon$.

This means $P\{|X_n| > A_0\} < \epsilon$ for $n \ge n_0(\epsilon)$. Furthermore we choose $A \ge A_0$ so that the same inequality holds also for $n < n_0(\epsilon)$. Now it is clear that

$P\{|X_n Y_n| > \epsilon\} \le P\{|X_n| > A\} + P\left\{|Y_n| > \dfrac{\epsilon}{A}\right\} \le \epsilon + P\left\{|Y_n| > \dfrac{\epsilon}{A}\right\}$.

The last-written probability tends to zero as $n \to \infty$, and (b) follows.

Corollary. If $X_n \to X$, $\alpha_n \to a$, $\beta_n \to b$, all in dist., where $a$ and $b$ are constants, then $\alpha_n X_n + \beta_n \to aX + b$ in dist.
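Theorem 4.4.6 is easily checked by simulation; the sketch below is an editorial illustration, not part of the text (Python 3 assumed). Here $X_n$ is standard normal and $Y_n$ is an exponential variable divided by $n$, a choice made for this demo; no independence between $X_n$ and $Y_n$ is needed or used by the theorem.

```python
# A simulation sketch (not from the text) of Theorem 4.4.6: if X_n -> X in
# dist. and Y_n -> 0 in dist., then X_n + Y_n -> X and X_n * Y_n -> 0 in dist.

import random

random.seed(1)
N = 200_000

def tail_prob(sample, t):
    return sum(1 for v in sample if v > t) / len(sample)

for n in (2, 20, 200):
    pairs = [(random.gauss(0, 1), random.expovariate(1.0) / n) for _ in range(N)]
    sums = [x + y for x, y in pairs]       # Y_n = Exp(1)/n -> 0 in pr.
    prods = [abs(x * y) for x, y in pairs]
    print(f"n = {n:3}: P(X_n + Y_n > 1) ~ {tail_prob(sums, 1.0):.4f}, "
          f"P(|X_n Y_n| > 0.1) ~ {tail_prob(prods, 0.1):.4f}")

# P(X_n + Y_n > 1) approaches P(X > 1) ~ 0.1587; the product tail tends to 0.
```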


EXERCISES

*1. Let $\mu_n$ and $\mu$ be p.m.'s such that $\mu_n \overset{v}{\to} \mu$. Show that the conclusion in (2) need not hold if (a) $f$ is bounded and Borel measurable and all $\mu_n$ and $\mu$ are absolutely continuous, or (b) $f$ is continuous except at one point and every $\mu_n$ is absolutely continuous. (To find even sharper counterexamples would not be too easy, in view of Exercise 10 of Sec. 4.5.)

2. Let $\mu_n \overset{v}{\to} \mu$ when the $\mu_n$'s are s.p.m.'s. Then for each $f \in C$ and each finite continuity interval $I$ we have $\int_I f\,d\mu_n \to \int_I f\,d\mu$.

*3. Let $\mu_n$ and $\mu$ be as in Exercise 1. If the $f_n$'s are bounded continuous functions converging uniformly to $f$, then $\int f_n\,d\mu_n \to \int f\,d\mu$.

*4. Give an example to show that convergence in dist. does not imply that in pr. However, show that convergence to the unit mass $\delta_a$ does imply that in pr. to the constant $a$.

5. A set $\{\mu_\alpha\}$ of p.m.'s is tight if and only if the corresponding d.f.'s $\{F_\alpha\}$ converge uniformly in $\alpha$ as $x \to -\infty$ and as $x \to +\infty$.

6. Let the r.v.'s $\{X_\alpha\}$ have the p.m.'s $\{\mu_\alpha\}$. If for some real $r > 0$, $E\{|X_\alpha|^r\}$ is bounded in $\alpha$, then $\{\mu_\alpha\}$ is tight.

7. Prove the Corollary to Theorem 4.4.4.

8. If the r.v.'s $X$ and $Y$ satisfy

$P\{|X - Y| > \epsilon\} \le \epsilon$

for some $\epsilon$, then their d.f.'s $F$ and $G$ satisfy the inequalities:

(15) $\forall x \in \mathcal{R}^1:\ F(x - \epsilon) - \epsilon \le G(x) \le F(x + \epsilon) + \epsilon$.

Derive another proof of Theorem 4.4.5 from this.

*9. The Lévy distance of two s.d.f.'s $F$ and $G$ is defined to be the infimum of all $\epsilon > 0$ satisfying the inequalities in (15). Prove that this is indeed a metric in the space of s.d.f.'s, and that $F_n$ converges to $F$ in this metric if and only if $F_n \overset{v}{\to} F$ and $\int dF_n \to \int dF$.

10. Find two sequences of p.m.'s $\{\mu_n\}$ and $\{\nu_n\}$ such that

$\forall f \in C_K:\ \displaystyle\int f\,d\mu_n - \int f\,d\nu_n \to 0$;

but for no finite $(a, b)$ is it true that

$\mu_n(a, b) - \nu_n(a, b) \to 0$.

[HINT: Let $\mu_n = \delta_{r_n}$, $\nu_n = \delta_{s_n}$, and choose $\{r_n\}$, $\{s_n\}$ suitably.]

11. Let $\{\mu_n\}$ be a sequence of p.m.'s such that for each $f \in C_B$, the sequence $\int f\,d\mu_n$ converges; then $\mu_n \overset{v}{\to} \mu$, where $\mu$ is a p.m. [HINT: If the hypothesis is strengthened to include every $f$ in $C$, and convergence of real numbers is interpreted as usual as convergence to a finite limit, the result is easy by taking an $f$ going to $\infty$. In general one may proceed by contradiction using an $f$ that oscillates at infinity.]

*12. Let $F_n$ and $F$ be d.f.'s such that $F_n \overset{v}{\to} F$. Define $G_n(\theta)$ and $G(\theta)$ as in Exercise 4 of Sec. 3.1. Then $G_n(\theta) \to G(\theta)$ a.e. [HINT: Do this first when $F_n$ and $F$ are continuous and strictly increasing. The general case is obtained by smoothing $F_n$ and $F$ by convoluting with a uniform distribution in $[-\delta, +\delta]$ and letting $\delta \downarrow 0$; see the end of Sec. 6.1 below.]

4.5 Uniform integrability; convergence of moments

The function $|x|^r$, $r > 0$, is in $C$ but not in $C_B$, hence Theorem 4.4.2 does not apply to it. Indeed, we have seen in Example 2 of Sec. 4.1 that even convergence a.e. does not imply convergence of any moment of order $r > 0$. For, given $r$, a slight modification of that example will yield $X_n \to X$ a.e., $E(|X_n|^r) = 1$ but $E(|X|^r) = 0$.

It is useful to have conditions to ensure the convergence of moments when $X_n$ converges a.e. We begin with a standard theorem in this direction from classical analysis.

Theorem 4.5.1. If $X_n \to X$ a.e., then for every $r > 0$:

(1) $E(|X|^r) \le \displaystyle\liminf_{n \to \infty} E(|X_n|^r)$.

If $X_n \to X$ in $L^r$, and $X \in L^r$, then $E(|X_n|^r) \to E(|X|^r)$.

PROOF. (1) is just a case of Fatou's lemma (see Sec. 3.2):

$\displaystyle\int_\Omega |X|^r\,dP = \int_\Omega \liminf_n |X_n|^r\,dP \le \liminf_n \int_\Omega |X_n|^r\,dP$,

where $+\infty$ is allowed as a value for each member with the usual convention. In case of convergence in $L^r$, $r \ge 1$, we have by Minkowski's inequality (Sec. 3.2), since $X = X_n + (X - X_n) = X_n - (X_n - X)$:

$|E(|X_n|^r)^{1/r} - E(|X_n - X|^r)^{1/r}| \le E(|X|^r)^{1/r} \le E(|X_n|^r)^{1/r} + E(|X_n - X|^r)^{1/r}$.

Letting $n \to \infty$ we obtain the second assertion of the theorem. For $0 < r < 1$, the inequality $|x + y|^r \le |x|^r + |y|^r$ implies that

$E(|X_n|^r) - E(|X - X_n|^r) \le E(|X|^r) \le E(|X_n|^r) + E(|X - X_n|^r)$,

whence the same conclusion.

The next result should be compared with Theorem 4.1.4.


Theorem 4.5.2. If $\{X_n\}$ converges in dist. to $X$, and for some $p > 0$, $\sup_n E\{|X_n|^p\} = M < \infty$, then for each $r < p$:

(2) $\displaystyle\lim_{n \to \infty} E(|X_n|^r) = E(|X|^r) < \infty$.

If $r$ is a positive integer, then we may replace $|X_n|^r$ and $|X|^r$ above by $X_n^r$ and $X^r$.

PROOF. We prove the second assertion since the first is similar. Let $F_n$, $F$ be the d.f.'s of $X_n$, $X$; then $F_n \overset{v}{\to} F$. For $A > 0$ define $f_A$ on $\mathcal{R}^1$ as follows:

(3) $f_A(x) = \begin{cases} x^r, & \text{if } |x| \le A; \\ A^r, & \text{if } x > A; \\ (-A)^r, & \text{if } x < -A. \end{cases}$

Then $f_A \in C_B$, hence by Theorem 4.4.2 the "truncated moments" converge:

$\displaystyle\int_{-\infty}^{\infty} f_A(x)\,dF_n(x) \to \int_{-\infty}^{\infty} f_A(x)\,dF(x)$.

Next we have

$\displaystyle\int_{-\infty}^{\infty} |f_A(x) - x^r|\,dF_n(x) \le \int_{|x|>A} |x|^r\,dF_n(x) = \int_{|X_n|>A} |X_n|^r\,dP$

$\le \dfrac{1}{A^{p-r}} \displaystyle\int_\Omega |X_n|^p\,dP \le \dfrac{M}{A^{p-r}}$.

The last term does not depend on $n$, and converges to zero as $A \to \infty$. It follows that as $A \to \infty$, $\int_{-\infty}^{\infty} f_A\,dF_n$ converges uniformly in $n$ to $\int_{-\infty}^{\infty} x^r\,dF_n$. Hence by a standard theorem on the inversion of repeated limits, we have

(4) $\displaystyle\int_{-\infty}^{\infty} x^r\,dF = \lim_{A \to \infty} \int_{-\infty}^{\infty} f_A\,dF = \lim_{A \to \infty} \lim_{n \to \infty} \int_{-\infty}^{\infty} f_A\,dF_n$

$= \displaystyle\lim_{n \to \infty} \lim_{A \to \infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n \to \infty} \int_{-\infty}^{\infty} x^r\,dF_n$.
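The truncation function $f_A$ of (3) is simple to realize numerically; the following editorial sketch (not from the text; Python 3 assumed) codes it for $r = 2$ and shows the key point of the proof: the truncated moment approximates the full moment once $A$ is large, with an error bound that does not depend on the sample.

```python
# A sketch (not from the text) of the truncation function f_A in (3), r = 2:
# f_A agrees with x^2 on [-A, A] and is frozen outside, so f_A is in C_B.

import random

def f_A(x, A, r=2):
    if x > A:
        return A ** r
    if x < -A:
        return (-A) ** r
    return x ** r

random.seed(2)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
for A in (0.5, 1.0, 2.0, 4.0, 8.0):
    trunc = sum(f_A(x, A) for x in sample) / len(sample)
    print(f"A = {A:3}: truncated 2nd moment ~ {trunc:.4f}")   # -> 1.0 = E(X^2)
```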

We now introduce the concept of uniform integrability, which is of basic importance in this connection. It is also an essential hypothesis in certain convergence questions arising in the theory of martingales (to be treated in Chapter 9).

DEFINITION OF UNIFORM INTEGRABILITY. A family of r.v.'s $\{X_t\}$, $t \in T$, where $T$ is an arbitrary index set, is said to be uniformly integrable iff

(5) $\displaystyle\lim_{A \to \infty} \int_{|X_t|>A} |X_t|\,dP = 0$

uniformly in $t \in T$.


Theorem 4.5.3. The family $\{X_t\}$ is uniformly integrable if and only if the following two conditions are satisfied:

(a) $E(|X_t|)$ is bounded in $t \in T$;
(b) For every $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that for any $E \in \mathscr{F}$:

$P(E) < \delta(\epsilon) \Longrightarrow \displaystyle\int_E |X_t|\,dP < \epsilon$ for every $t \in T$.

PROOF. Clearly (5) implies (a). Next, let $E \in \mathscr{F}$ and write $E_t$ for the set $\{\omega: |X_t(\omega)| > A\}$. We have by the mean value theorem

$\displaystyle\int_E |X_t|\,dP = \left(\int_{E \cap E_t} + \int_{E \setminus E_t}\right) |X_t|\,dP \le \int_{E_t} |X_t|\,dP + A\,P(E)$.

Given $\epsilon > 0$, there exists $A = A(\epsilon)$ such that the last-written integral is less than $\epsilon/2$ for every $t$, by (5). Hence (b) will follow if we set $\delta = \epsilon/2A$. Thus (5) implies (b).

Conversely, suppose that (a) and (b) are true. Then by the Chebyshev inequality we have for every $t$,

$P\{|X_t| > A\} \le \dfrac{E(|X_t|)}{A} \le \dfrac{M}{A}$,

where $M$ is the bound indicated in (a). Hence if $A > M/\delta$, then $P(E_t) < \delta$ and we have by (b):

$\displaystyle\int_{E_t} |X_t|\,dP < \epsilon$.

Thus (5) is true.

Theorem 4.5.4. Let $0 < r < \infty$, $X_n \in L^r$, and $X_n \to X$ in pr. Then the following three propositions are equivalent:

(i) $\{|X_n|^r\}$ is uniformly integrable;
(ii) $X_n \to X$ in $L^r$;
(iii) $E(|X_n|^r) \to E(|X|^r) < \infty$.

PROOF. Suppose (i) is true; since $X_n \to X$ in pr., by Theorem 4.2.3 there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ a.e. By Theorem 4.5.1 and (a) above, $X \in L^r$. The easy inequality

$|X_n - X|^r \le 2^r \{|X_n|^r + |X|^r\}$,

valid for all $r > 0$, together with Theorem 4.5.3 then implies that the sequence $\{|X_n - X|^r\}$ is also uniformly integrable. For each $\epsilon > 0$, we have


(6) $\displaystyle\int_\Omega |X_n - X|^r\,dP = \int_{|X_n - X| > \epsilon} |X_n - X|^r\,dP + \int_{|X_n - X| \le \epsilon} |X_n - X|^r\,dP$

$\le \displaystyle\int_{|X_n - X| > \epsilon} |X_n - X|^r\,dP + \epsilon^r$.

Since $P\{|X_n - X| > \epsilon\} \to 0$ as $n \to \infty$ by hypothesis, it follows from (b) above that the last written integral in (6) also tends to zero. This being true for every $\epsilon > 0$, (ii) follows.

Suppose (ii) is true; then we have (iii) by the second assertion of Theorem 4.5.1.

Finally suppose (iii) is true. To prove (i), let $A > 0$ and consider a function $f_A$ in $C_K$ satisfying the following conditions:

$f_A(x) = |x|^r$ for $|x|^r \le A$; $\quad f_A(x) \le |x|^r$ for $A \le |x|^r \le A + 1$; $\quad f_A(x) = 0$ for $|x|^r \ge A + 1$;

cf. the proof of Theorem 4.4.1 for the construction of such a function. Hence we have

$\displaystyle\liminf_{n \to \infty} \int_{|X_n|^r \le A+1} |X_n|^r\,dP \ge \liminf_{n \to \infty} E\{f_A(X_n)\} = E\{f_A(X)\} \ge \int_{|X|^r \le A} |X|^r\,dP$,

where the inequalities follow from the shape of $f_A$, while the limit relation in the middle is as in the proof of Theorem 4.4.5. Subtracting from the limit relation in (iii), we obtain

$\displaystyle\limsup_{n \to \infty} \int_{|X_n|^r > A+1} |X_n|^r\,dP \le \int_{|X|^r > A} |X|^r\,dP$.

The last integral does not depend on $n$ and converges to zero as $A \to \infty$. This means: for any $\epsilon > 0$, there exists $A_0 = A_0(\epsilon)$ and $n_0 = n_0(A_0(\epsilon))$ such that we have

$\displaystyle\sup_{n > n_0} \int_{|X_n|^r > A+1} |X_n|^r\,dP < \epsilon$

provided that $A \ge A_0$. Since each $|X_n|^r$ is integrable, there exists $A_1 = A_1(\epsilon)$ such that the supremum above may be taken over all $n \ge 1$ provided that $A \ge A_0 \vee A_1$. This establishes (i), and completes the proof of the theorem.
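The defining condition (5) can be probed numerically; the sketch below is an editorial illustration, not part of the text (Python 3 assumed). It contrasts a uniformly integrable family (i.i.d. standard normals) with the classical non-example $X_n = n$ with probability $1/n$ and $0$ otherwise, whose tail contribution $E(|X_n|;\,|X_n| > A)$ equals $1$ for every $n > A$.

```python
# A numerical sketch (not from the text) of uniform integrability (5):
# sup_t E(|X_t|; |X_t| > A) must tend to 0 as A -> infinity.

import random

random.seed(3)

def tail_integral(sample, A):
    return sum(abs(x) for x in sample if abs(x) > A) / len(sample)

# UI family: ten independent standard normal samples.
# Non-UI family: X_n = n with probability 1/n, else 0 (tail term computed exactly).
normal = [[random.gauss(0, 1) for _ in range(20_000)] for _ in range(10)]
for A in (1.0, 2.0, 4.0, 8.0):
    sup_ui = max(tail_integral(s, A) for s in normal)
    sup_bad = max((n if n > A else 0.0) * (1.0 / n) for n in range(1, 101))
    print(f"A = {A}: sup tail (normal family) ~ {sup_ui:.4f}, "
          f"sup tail (X_n = n w.p. 1/n) = {sup_bad:.1f}")
```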

In the remainder of this section the term "moment" will be restricted to a moment of positive integral order. It is well known (see Exercise 5 of Sec. 6.6) that on $[0, 1]$ any p.m., or equivalently its d.f., is uniquely determined by its moments of all orders. Precisely, if $F_1$ and $F_2$ are two d.f.'s such that


$F_i(0) = 0$, $F_i(1) = 1$ for $i = 1, 2$; and

$\forall n \ge 1:\ \displaystyle\int_0^1 x^n\,dF_1(x) = \int_0^1 x^n\,dF_2(x)$,

then $F_1 \equiv F_2$. The corresponding result is false in $(\mathcal{R}^1, \mathcal{B}^1)$ and a further condition on the moments is required to ensure uniqueness. The sufficient condition due to Carleman is as follows:

$\displaystyle\sum_{r=1}^{\infty} \frac{1}{m_{2r}^{1/2r}} = +\infty$,

where $m_r$ denotes the moment of order $r$. When a given sequence of numbers $\{m_r, r \ge 1\}$ uniquely determines a d.f. $F$ such that

(7) $m_r = \displaystyle\int_{-\infty}^{\infty} x^r\,dF(x)$,

we say that "the moment problem is determinate" for the sequence. Of course an arbitrary sequence of numbers need not be the sequence of moments for any d.f.; a necessary but far from sufficient condition, for example, is that the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these questions here but shall content ourselves with the useful result below, which is often referred to as the "method of moments"; see also Theorem 6.4.5.

Theorem 4.5.5. Suppose there is a unique d.f. $F$ with the moments $\{m^{(r)}, r \ge 1\}$, all finite. Suppose that $\{F_n\}$ is a sequence of d.f.'s, each of which has all its moments finite:

$m_n^{(r)} = \displaystyle\int_{-\infty}^{\infty} x^r\,dF_n$.

Finally, suppose that for every $r \ge 1$:

(8) $\displaystyle\lim_{n \to \infty} m_n^{(r)} = m^{(r)}$.

Then $F_n \overset{v}{\to} F$.

PROOF. Let $\mu_n$ be the p.m. corresponding to $F_n$. By Theorem 4.3.3 there exists a subsequence of $\{\mu_n\}$ that converges vaguely. Let $\{\mu_{n_k}\}$ be any subsequence converging vaguely to some $\mu$. We shall show that $\mu$ is indeed a p.m. with the d.f. $F$. By the Chebyshev inequality, we have for each $A > 0$:

$\mu_{n_k}(-A, +A) \ge 1 - A^{-2} m_{n_k}^{(2)}$.

Since $m_{n_k}^{(2)} \to m^{(2)} < \infty$, it follows that as $A \to \infty$, the left side converges uniformly in $k$ to one. Letting $A \to \infty$ along a sequence of points such that both $\pm A$ belong to the dense set $D$ involved in the definition of vague convergence, we obtain as in (4) above:

$\mu(\mathcal{R}^1) = \displaystyle\lim_{A \to \infty} \mu(-A, +A) = \lim_{A \to \infty} \lim_{k \to \infty} \mu_{n_k}(-A, +A) = \lim_{k \to \infty} \lim_{A \to \infty} \mu_{n_k}(-A, +A) = \lim_{k \to \infty} \mu_{n_k}(\mathcal{R}^1) = 1$.

Now for each $r$, let $p$ be the next larger even integer. We have

$\displaystyle\int_{-\infty}^{\infty} x^p\,d\mu_{n_k} = m_{n_k}^{(p)} \to m^{(p)}$,

hence $m_{n_k}^{(p)}$ is bounded in $k$. It follows from Theorem 4.5.2 that

$\displaystyle\int_{-\infty}^{\infty} x^r\,d\mu_{n_k} \to \int_{-\infty}^{\infty} x^r\,d\mu$.

But the left side also converges to $m^{(r)}$ by (8). Hence by the uniqueness hypothesis $\mu$ is the p.m. determined by $F$. We have therefore proved that every vaguely convergent subsequence of $\{\mu_n\}$, or equivalently $\{F_n\}$, has the same limit $\mu$, or equivalently $F$. Hence the theorem follows from Theorem 4.3.4.
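As a concrete instance of the method of moments, the following editorial sketch (not part of the text; Python 3 assumed) computes the moments of the binomial distribution with parameters $(n, 1/n)$ and watches them converge to those of the Poisson law with mean 1, a determinate moment problem, so the theorem yields the classical binomial-to-Poisson convergence.

```python
# A sketch (not from the text) of the method of moments (Theorem 4.5.5):
# moments of Binomial(n, 1/n) converge to those of Poisson(1), whose moment
# problem is determinate, so the d.f.'s converge vaguely.

from math import comb, exp, factorial

def binom_moment(n, p, r):
    return sum(k ** r * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))

def poisson_moment(lam, r, terms=200):
    return sum(k ** r * exp(-lam) * lam ** k / factorial(k)
               for k in range(terms))

for r in (1, 2, 3, 4):
    m_n = [binom_moment(n, 1.0 / n, r) for n in (10, 100, 1000)]
    print(f"r = {r}: binomial moments " + ", ".join(f"{m:.4f}" for m in m_n)
          + f" -> Poisson {poisson_moment(1.0, r):.4f}")
```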

EXERCISES

1. If $\sup_n |X_n| \in L^p$ and $X_n \to X$ a.e., then $X \in L^p$ and $X_n \to X$ in $L^p$.

2. If $\{X_n\}$ is dominated by some $Y$ in $L^p$, and converges in dist. to $X$, then $E(|X_n|^p) \to E(|X|^p)$.

3. If $X_n \to X$ in dist., and $f \in C$, then $f(X_n) \to f(X)$ in dist.

*4. Exercise 3 may be reduced to Exercise 10 of Sec. 4.1 as follows. Let $F_n$, $1 \le n \le \infty$, be d.f.'s such that $F_n \overset{v}{\to} F_\infty$. Let $\theta$ be uniformly distributed on $[0, 1]$ and put $X_n = F_n^{-1}(\theta)$, $1 \le n \le \infty$, where $F_n^{-1}(y) = \sup\{x: F_n(x) \le y\}$ (cf. Exercise 4 of Sec. 3.1). Then $X_n$ has d.f. $F_n$ and $X_n \to X_\infty$ in pr.

5. Find the moments of the normal d.f. $\Phi$ and the positive normal d.f. $\Phi^+$ below:

$\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{x} e^{-y^2/2}\,dy; \qquad \Phi^+(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-y^2/2}\,dy, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$

Show that in either case the moments satisfy Carleman's condition.

6. If $\{X_t\}$ and $\{Y_t\}$ are uniformly integrable, then so is $\{X_t + Y_t\}$.

7. If $\{X_n\}$ is dominated by some $Y$ in $L^1$ or if it is identically distributed with finite mean, then it is uniformly integrable.

*8. If $\sup_n E(|X_n|^p) < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.

9. If $\{X_n\}$ is uniformly integrable, then the sequence $\left\{\dfrac{1}{n}\displaystyle\sum_{j=1}^{n} X_j,\ n \ge 1\right\}$ is uniformly integrable.

*10. Suppose the distributions of $\{X_n, 1 \le n \le \infty\}$ are absolutely continuous with densities $\{g_n\}$ such that $g_n \to g_\infty$ in Lebesgue measure. Then $g_n \to g_\infty$ in $L^1(-\infty, \infty)$, and consequently for every bounded Borel measurable function $f$ we have $E\{f(X_n)\} \to E\{f(X_\infty)\}$. [HINT: $\int (g_\infty - g_n)^+\,dx = \int (g_\infty - g_n)^-\,dx$ and $(g_\infty - g_n)^+ \le g_\infty$; use dominated convergence.]


5 Law of large numbers. Random series

5.1 Simple limit theorems

The various concepts of Chapter 4 will be applied to the so-called "law of large numbers" - a famous name in the theory of probability. This has to do with partial sums

$S_n = \displaystyle\sum_{j=1}^{n} X_j$

of a sequence of r.v.'s. In the most classical formulation, the "weak" or the "strong" law of large numbers is said to hold for the sequence according as

(1) $\dfrac{S_n - E(S_n)}{n} \to 0$

in pr. or a.e. This, of course, presupposes the finiteness of $E(S_n)$. A natural generalization is as follows:

$\dfrac{S_n - a_n}{b_n} \to 0$,

where $\{a_n\}$ is a sequence of real numbers and $\{b_n\}$ a sequence of positive numbers tending to infinity. We shall present several stages of the development,

even though they overlap each other to some extent, for it is just as importantto learn the basic techniques as the results themselves .

The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to which if $Z_n$ is any sequence of r.v.'s, then $E(Z_n^2) \to 0$ implies that $Z_n \to 0$ in pr. and $Z_{n_k} \to 0$ a.e. for a subsequence $\{n_k\}$. Applied to $Z_n = S_n/n$, the first assertion becomes

(2) $E(S_n^2) = o(n^2) \Longrightarrow \dfrac{S_n}{n} \to 0$ in pr.

Now we can calculate $E(S_n^2)$ more explicitly as follows:

(3) $E(S_n^2) = \displaystyle\sum_{j=1}^{n} E(X_j^2) + 2 \sum_{1 \le j < k \le n} E(X_j X_k)$.

Observe that there are $n^2$ terms above, so that even if all of them are bounded by a fixed constant, only $E(S_n^2) = O(n^2)$ will result, which falls critically short of the hypothesis in (2). The idea then is to introduce certain assumptions to cause enough cancellation among the "mixed terms" in (3). A salient feature of probability theory and its applications is that such assumptions are not only permissible but realistic. We begin with the simplest of its kind.

DEFINITION. Two r.v.'s $X$ and $Y$ are said to be uncorrelated iff both have finite second moments and

(4) $E(XY) = E(X)E(Y)$.

They are said to be orthogonal iff (4) is replaced by

(5) $E(XY) = 0$.

The r.v.'s of any family are said to be uncorrelated [orthogonal] iff every two of them are.

Note that (4) is equivalent to

$E\{(X - E(X))(Y - E(Y))\} = 0$,

which reduces to (5) when $E(X) = E(Y) = 0$. The requirement of finite second moments seems unnecessary, but it does ensure the finiteness of $E(XY)$ (Cauchy-Schwarz inequality!) as well as that of $E(X)$ and $E(Y)$, and without it the definitions are hardly useful. Finally, it is obvious that pairwise independence implies uncorrelatedness, provided second moments are finite.

If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E(X_n)\}$ is orthogonal, and for the latter (3) reduces to the fundamental relation below:

(6) $\sigma^2(S_n) = \displaystyle\sum_{j=1}^{n} \sigma^2(X_j)$,

which may be called the "additivity of the variance". Conversely, the validity of (6) for $n = 2$ implies that $X_1$ and $X_2$ are uncorrelated. There are only $n$ terms on the right side of (6), hence if these are bounded by a fixed constant we have now $\sigma^2(S_n) = O(n) = o(n^2)$. Thus (2) becomes applicable, and we have proved the following result.

Theorem 5.1.1. If the $X_j$'s are uncorrelated and their second moments have a common bound, then (1) is true in $L^2$ and hence also in pr.

This simple theorem is actually due to Chebyshev, who invented his famous inequalities for its proof. The next result, due to Rajchman (1932), strengthens the conclusion by proving convergence a.e. This result is interesting by virtue of its simplicity, and serves well to introduce an important method, that of taking subsequences.

Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds also a.e.

PROOF. Without loss of generality we may suppose that $E(X_j) = 0$ for each $j$, so that the $X_j$'s are orthogonal. We have by (6):

(7) $E(S_n^2) \le Mn$,

where $M$ is a bound for the second moments. It follows by Chebyshev's inequality that for each $\epsilon > 0$ we have

$P\{|S_n| > n\epsilon\} \le \dfrac{Mn}{n^2\epsilon^2} = \dfrac{M}{n\epsilon^2}$.

If we sum this over $n$, the resulting series on the right diverges. However, if we confine ourselves to the subsequence $\{n^2\}$, then

$\displaystyle\sum_n P\{|S_{n^2}| > n^2\epsilon\} \le \sum_n \frac{M}{n^2\epsilon^2} < \infty$.

Hence by Theorem 4.2.1 (Borel-Cantelli) we have

$P\{|S_{n^2}| > n^2\epsilon \text{ i.o.}\} = 0$;

and consequently by Theorem 4.2.2

(8) $\dfrac{S_{n^2}}{n^2} \to 0$ a.e.

We have thus proved the desired result for a subsequence; and the "method of subsequences" aims in general at extending to the whole sequence a result proved (relatively easily) for a subsequence. In the present case we must show that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference.

Put for each $n \ge 1$:

$D_n = \displaystyle\max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|$.

Then we have

$E(D_n^2) \le 2n\,E(|S_{(n+1)^2-1} - S_{n^2}|^2) = 2n \displaystyle\sum_{j=n^2+1}^{(n+1)^2-1} \sigma^2(X_j) \le 4n^2 M$

and consequently by Chebyshev's inequality

$P\{D_n > n^2\epsilon\} \le \dfrac{4M}{\epsilon^2 n^2}$.

It follows as before that

(9) $\dfrac{D_n}{n^2} \to 0$ a.e.

Now it is clear that (8) and (9) together imply (1), since

$\dfrac{|S_k|}{k} \le \dfrac{|S_{n^2}| + D_n}{n^2}$

for $n^2 \le k < (n + 1)^2$. The theorem is proved.
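The subsequence mechanism of the proof can be watched in a simulation; the sketch below is an editorial illustration, not part of the text (Python 3 assumed). With i.i.d. $\pm 1$ summands, it tracks $S_{n^2}/n^2$ along squares and computes the bridging maximum $D_n$ for one block.

```python
# A simulation sketch (not from the text) of Theorem 5.1.2 and the method of
# subsequences: S_(n^2)/n^2 settles down, and D_n bridges the gaps between
# consecutive squares.

import random

random.seed(4)
N = 100 ** 2
xs = [random.choice([-1.0, 1.0]) for _ in range(N)]     # bounded, mean 0

partial, s = [], 0.0
for x in xs:
    s += x
    partial.append(s)                                    # partial[k-1] = S_k

for n in (10, 30, 100):
    print(f"S_(n^2)/n^2 at n = {n:3}: {partial[n * n - 1] / (n * n):+.5f}")

n = 30                                                   # D_n for one block
block = partial[n * n - 1:(n + 1) * (n + 1) - 1]         # S_{n^2}..S_{(n+1)^2-1}
D_n = max(abs(v - partial[n * n - 1]) for v in block)
print(f"D_30 = {D_n}, with only {2 * n} summands in the block")
```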

The hypotheses of Theorems 5.1.1 and 5.1.2 are certainly satisfied for a sequence of independent r.v.'s that are uniformly bounded or that are identically distributed with a finite second moment. The most celebrated, as well as the very first case of the strong law of large numbers, due to Borel (1909), is formulated in terms of the so-called "normal numbers." Let each real number in $[0, 1]$ be expanded in the usual decimal system:

(10) $\omega = .x_1 x_2 \cdots x_n \cdots$.

Except for the countable set of terminating decimals, for which there are two distinct expansions, this representation is unique. Fix a $k$: $0 \le k \le 9$, and let


$\nu_k^{(n)}(\omega)$ denote the number of digits among the first $n$ digits of $\omega$ that are equal to $k$. Then $\nu_k^{(n)}(\omega)/n$ is the relative frequency of the digit $k$ in the first $n$ places, and the limit, if existing:

(11) $\displaystyle\lim_{n \to \infty} \frac{\nu_k^{(n)}(\omega)}{n} = \varphi_k(\omega)$,

may be called the frequency of $k$ in $\omega$. The number $\omega$ is called simply normal (to the scale 10) iff this limit exists for each $k$ and is equal to $1/10$. Intuitively all ten possibilities should be equally likely for each digit of a number picked "at random". On the other hand, one can write down "at random" any number of numbers that are "abnormal" according to the definition given, such as $.1111\cdots$, while it is a relatively difficult matter to name even one normal number in the sense of Exercise 5 below. It turns out that the number

$.1\,2\,3\,4\,5\,6\,7\,8\,9\,10\,11\,12\,13\cdots$,

which is obtained by writing down in succession all the natural numbers in the decimal system, is a normal number to the scale 10 even in the stringent definition of Exercise 5 below, but the proof is not so easy. As for determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are normal, the problem seems beyond the reach of our present capability for mathematics. In spite of these difficulties, Borel's theorem below asserts that in a perfectly precise sense almost every number is normal. Furthermore, this striking proposition is merely a very particular case of Theorem 5.1.2 above.

Theorem 5.1.3. Except for a Borel set of measure zero, every number in $[0, 1]$ is simply normal.

PROOF. Consider the probability space $(\mathcal{U}, \mathcal{B}, m)$ in Example 2 of Sec. 2.2. Let $Z$ be the subset of the form $m/10^n$ for integers $n \ge 1$, $m \ge 1$; then $m(Z) = 0$. If $\omega \in \mathcal{U} \setminus Z$, then it has a unique decimal expansion; if $\omega \in Z$, it has two such expansions, but we agree to use the "terminating" one for the sake of definiteness. Thus we have

$\omega = .\xi_1 \xi_2 \cdots \xi_n \cdots$,

where for each $n \ge 1$, $\xi_n(\cdot)$ is a Borel measurable function of $\omega$. Just as in Example 4 of Sec. 3.3, the sequence $\{\xi_n, n \ge 1\}$ is a sequence of independent r.v.'s with

$P\{\xi_n = k\} = \dfrac{1}{10}, \qquad k = 0, 1, \ldots, 9$.

Indeed according to Theorem 5.1.2 we need only verify that the $\xi_n$'s are uncorrelated, which is a very simple matter. For a fixed $k$ we define the r.v. $X_n$ to be the indicator of the set $\{\omega: \xi_n(\omega) = k\}$; then $E(X_n) = 1/10$, $E(X_n^2) = 1/10$, and

$\dfrac{S_n}{n} = \dfrac{1}{n} \displaystyle\sum_{j=1}^{n} X_j$

is the relative frequency of the digit $k$ in the first $n$ places of the decimal for $\omega$. According to Theorem 5.1.2, we have then

$\dfrac{S_n}{n} \to \dfrac{1}{10}$ a.e.

Hence in the notation of (11), we have $P\{\varphi_k = 1/10\} = 1$ for each $k$ and consequently also

$P\left\{\displaystyle\bigcap_{k=0}^{9} \left[\varphi_k = \frac{1}{10}\right]\right\} = 1$,

which means that the set of normal numbers has Borel measure one. Theorem 5.1.3 is proved.
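The empirical content of simple normality is easy to inspect by machine; the sketch below is an editorial illustration, not part of the text (Python 3 assumed). It tallies digit frequencies in the initial segment of the number $.123456789101112\cdots$ mentioned above.

```python
# A sketch (not from the text): digit frequencies in the first n decimals of
# .123456789101112..., which the text remarks is normal to the scale 10.

from collections import Counter
from itertools import count

def leading_digits(n):
    digits = []
    for k in count(1):                     # concatenate 1, 2, 3, ...
        digits.extend(str(k))
        if len(digits) >= n:
            return digits[:n]

for n in (1000, 100_000):
    freq = Counter(leading_digits(n))
    probs = [freq[str(d)] / n for d in range(10)]
    print(f"n = {n:6}: min {min(probs):.4f}, max {max(probs):.4f}  (target 0.1)")
```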

The preceding theorem makes a deep impression (at least on the older generation!) because it interprets a general proposition in probability theory at a most classical and fundamental level. If we use the intuitive language of probability such as coin-tossing, the result sounds almost trite. For it merely says that if an unbiased coin is tossed indefinitely, the limiting frequency of "heads" will be equal to $\frac{1}{2}$, that is, its a priori probability. A mathematician who is unacquainted with and therefore skeptical of probability theory tends to regard the last statement as either "obvious" or "unprovable", but he can scarcely question the authenticity of Borel's theorem about ordinary decimals. As a matter of fact, the proof given above, essentially Borel's own, is a lot easier than a straightforward measure-theoretic version, deprived of the intuitive content [see, e.g., Hardy and Wright, An Introduction to the Theory of Numbers, 3rd ed., Oxford University Press, New York, 1954].

EXERCISES

1. For any sequence of r.v.'s $\{X_n\}$, if $E(X_n^2) \to 0$, then (1) is true in pr. but not necessarily a.e.

*2. Theorem 5.1.2 may be sharpened as follows: under the same hypotheses we have $S_n/n^\alpha \to 0$ a.e. for any $\alpha > \frac{3}{4}$.

3. Theorem 5.1.2 remains true if the hypothesis of bounded second moments is weakened to: $\sigma^2(X_n) = O(n^\theta)$ where $0 \le \theta < \frac{1}{2}$. Various combinations of Exercises 2 and 3 are possible.

*4. If $\{X_n\}$ are independent r.v.'s such that the fourth moments $E(X_n^4)$ have a common bound, then (1) is true a.e. [This is Cantelli's strong law of large numbers. Without using Theorem 5.1.2 we may operate with $E(S_n^4/n^4)$ as we did with $E(S_n^2/n^2)$. Note that the full strength of independence is not needed.]

5. We may strengthen the definition of a normal number by considering blocks of digits. Let $r \ge 1$, and consider the successive overlapping blocks of $r$ consecutive digits in a decimal; there are $n - r + 1$ such blocks in the first $n$ places. Let $\nu^{(n)}(\omega)$ denote the number of such blocks that are identical with a given one; for example, if $r = 5$, the given block may be "21212". Prove that for a.e. $\omega$, we have for every $r$:

$\displaystyle\lim_{n \to \infty} \frac{\nu^{(n)}(\omega)}{n} = \frac{1}{10^r}$.

[HINT: Reduce the problem to disjoint blocks, which are independent.]

*6. The above definition may be further strengthened if we consider different scales of expansion. A real number in $[0, 1]$ is said to be completely normal iff the relative frequency of each block of length $r$ in the scale $s$ tends to the limit $1/s^r$ for every $s$ and $r$. Prove that almost every number in $[0, 1]$ is completely normal.

7. Let $\alpha$ be completely normal. Show that by looking at the expansion of $\alpha$ in some scale we can rediscover the complete works of Shakespeare from end to end without a single misprint or interruption. [This is Borel's paradox.]

*8. Let $X$ be an arbitrary r.v. with an absolutely continuous distribution. Prove that with probability one the fractional part of $X$ is a normal number. [HINT: Let $N$ be the set of normal numbers and consider $P\{X - [X] \in N\}$.]

9. Prove that the set of real numbers in $[0, 1]$ whose decimal expansions do not contain the digit 2 is of measure zero. Deduce from this the existence of two sets $A$ and $B$ both of measure zero such that every real number is representable as a sum $a + b$ with $a \in A$, $b \in B$.

*10. Is the sum of two normal numbers, modulo 1, normal? Is the product? [HINT: Consider the differences between a fixed abnormal number and all normal numbers: this is a set of probability one.]

5.2 Weak law of large numbers

The law of large numbers in the form (1) of Sec. 5.1 involves only the first moment, but so far we have operated with the second. In order to drop any assumption on the second moment, we need a new device, that of "equivalent sequences", due to Khintchine (1894-1959).

DEFINITION. Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff

(1) $\displaystyle\sum_n P\{X_n \ne Y_n\} < \infty$.

In practice, an equivalent sequence is obtained by "truncating" in various ways, as we shall see presently.

Theorem 5.2.1. If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then

$\displaystyle\sum_n (X_n - Y_n)$ converges a.e.

Furthermore if $a_n \uparrow \infty$, then

(2) $\dfrac{1}{a_n} \displaystyle\sum_{j=1}^{n} (X_j - Y_j) \to 0$ a.e.

PROOF. By the Borel-Cantelli lemma, (1) implies that

$P\{X_n \ne Y_n \text{ i.o.}\} = 0$.

This means that there exists a null set $N$ with the following property: if $\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that

$n \ge n_0(\omega) \Longrightarrow X_n(\omega) = Y_n(\omega)$.

Thus for such an $\omega$, the two numerical sequences $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ only in a finite number of terms (how many depending on $\omega$). In other words, the series

$\displaystyle\sum_n (X_n(\omega) - Y_n(\omega))$

consists of zeros from a certain point on. Both assertions of the theorem are trivial consequences of this fact.

Corollary. With probability one, the expression

$\displaystyle\sum_n X_n \qquad \text{or} \qquad \frac{1}{a_n} \sum_{j=1}^{n} X_j$

converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as

$\displaystyle\sum_n Y_n \qquad \text{or} \qquad \frac{1}{a_n} \sum_{j=1}^{n} Y_j$,

respectively. In particular, if

$\dfrac{1}{a_n} \displaystyle\sum_{j=1}^{n} X_j$

converges to $X$ in pr., then so does

$\dfrac{1}{a_n} \displaystyle\sum_{j=1}^{n} Y_j$.

To prove the last assertion of the corollary, observe that by Theorem 4.1.2 the relation (2) holds also in pr. Hence if

$\dfrac{1}{a_n} \displaystyle\sum_{j=1}^{n} X_j \to X$ in pr.,

then we have

$\dfrac{1}{a_n} \displaystyle\sum_{j=1}^{n} Y_j = \frac{1}{a_n} \sum_{j=1}^{n} (Y_j - X_j) + \frac{1}{a_n} \sum_{j=1}^{n} X_j \to 0 + X = X$ in pr.

(see Exercise 3 of Sec. 4.1).

The next law of large numbers is due to Khintchine. Under the stronger hypothesis of total independence, it will be proved again by an entirely different method in Chapter 6.

Theorem 5.2.2. Let $\{X_n\}$ be pairwise independent and identically distributed r.v.'s with finite mean $m$. Then we have

(3) $\dfrac{S_n}{n} \to m$ in pr.

PROOF. Let the common d.f. be $F$ so that

$m = E(X_n) = \displaystyle\int_{-\infty}^{\infty} x\,dF(x), \qquad E(|X_n|) = \int_{-\infty}^{\infty} |x|\,dF(x) < \infty$.

By Theorem 3.2.1 the finiteness of $E(|X_1|)$ is equivalent to

$\displaystyle\sum_n P\{|X_1| > n\} < \infty$.

Hence we have, since the $X_n$'s have the same distribution:

(4) $\displaystyle\sum_n P\{|X_n| > n\} < \infty$.

We introduce a sequence of r.v.'s $\{Y_n\}$ by "truncating at $n$":

$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le n; \\ 0, & \text{if } |X_n(\omega)| > n. \end{cases}$

This is equivalent to $\{X_n\}$ by (4), since $P(|X_n| > n) = P(X_n \ne Y_n)$. Let

$T_n = \displaystyle\sum_{j=1}^{n} Y_j$.

By the corollary above, (3) will follow if (and only if) we can prove $T_n/n \to m$ in pr. Now the $Y_n$'s are also pairwise independent by Theorem 3.3.1 (applied to each pair), hence they are uncorrelated, since each, being bounded, has a finite second moment. Let us calculate $\sigma^2(T_n)$; we have by (6) of Sec. 5.1,

$\sigma^2(T_n) = \displaystyle\sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} E(Y_j^2) = \sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x)$.

The crudest estimate of the last term yields

$\displaystyle\sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x) \le \sum_{j=1}^{n} j \int_{|x| \le j} |x|\,dF(x) \le \frac{n(n+1)}{2} \int_{-\infty}^{\infty} |x|\,dF(x)$,

which is $O(n^2)$, but not $o(n^2)$ as required by (2) of Sec. 5.1. To improve on it, let $\{a_n\}$ be a sequence of integers such that $0 < a_n < n$, $a_n \to \infty$ but $a_n = o(n)$. We have

$\displaystyle\sum_{j=1}^{n} \int_{|x| \le j} x^2\,dF(x) \le \sum_{j \le a_n} a_n \int_{|x| \le a_n} |x|\,dF(x) + \sum_{a_n < j \le n} \left\{ a_n \int_{|x| \le a_n} |x|\,dF(x) + n \int_{a_n < |x| \le n} |x|\,dF(x) \right\}$

$\le n a_n \displaystyle\int_{-\infty}^{\infty} |x|\,dF(x) + n^2 \int_{|x| > a_n} |x|\,dF(x)$.

The first term is $O(n a_n) = o(n^2)$; and the second is $n^2 o(1) = o(n^2)$, since the set $\{x: |x| > a_n\}$ decreases to the empty set and so the last-written integral above converges to zero. We have thus proved that $\sigma^2(T_n) = o(n^2)$ and consequently, by (2) of Sec. 5.1,

$\dfrac{T_n - E(T_n)}{n} \to 0$ in pr.

Now it is clear that as $n \to \infty$, $E(Y_n) \to E(X_1) = m$; hence also

$\dfrac{1}{n} \displaystyle\sum_{j=1}^{n} E(Y_j) \to m$.

It follows that

$\dfrac{T_n}{n} \to m$ in pr.,

as was to be proved.

For totally independent r.v.'s, necessary and sufficient conditions for the weak law of large numbers in the most general formulation, due to Kolmogorov and Feller, are known. The sufficiency of the following criterion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and Kolmogorov [12]).

Theorem 5.2.3. Let $\{X_n\}$ be a sequence of independent r.v.'s with d.f.'s $\{F_n\}$; and $S_n = \sum_{j=1}^{n} X_j$. Let $\{b_n\}$ be a given sequence of real numbers increasing to $+\infty$. Suppose that we have

(i) $\displaystyle\sum_{j=1}^{n} \int_{|x| > b_n} dF_j(x) = o(1)$;

(ii) $\dfrac{1}{b_n^2} \displaystyle\sum_{j=1}^{n} \int_{|x| \le b_n} x^2\,dF_j(x) = o(1)$;

then if we put

(5) $a_n = \displaystyle\sum_{j=1}^{n} \int_{|x| \le b_n} x\,dF_j(x)$,

we have

(6) $\dfrac{1}{b_n}(S_n - a_n) \to 0$ in pr.

Next suppose that the $F_n$'s have the property that there exists a $\lambda > 0$ such that

(7) $\forall n:\ F_n(0) \ge \lambda, \quad 1 - F_n(0-) \ge \lambda$.

Then if (6) holds for the given $\{b_n\}$ and any sequence of real numbers $\{a_n\}$, the conditions (i) and (ii) must hold.

Remark. Condition (7) may be written as

$P\{X_n \le 0\} \ge \lambda, \qquad P\{X_n \ge 0\} \ge \lambda$;

when $\lambda = \frac{1}{2}$ this means that 0 is a median for each $X_n$; see Exercise 9 below. In general it ensures that none of the distributions is too far off center, and it is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below.

It is possible to replace the $a_n$ in (5) by

$\displaystyle\sum_{j=1}^{n} \int_{|x| \le b_j} x\,dF_j(x)$

and maintain (6); see Exercise 8 below.

PROOF OF SUFFICIENCY. Define for each $n \ge 1$ and $1 \le j \le n$:

$Y_{n,j} = \begin{cases} X_j, & \text{if } |X_j| \le b_n; \\ 0, & \text{if } |X_j| > b_n; \end{cases}$

and write

$T_n = \displaystyle\sum_{j=1}^{n} Y_{n,j}$.

Then condition (i) may be written as

$\displaystyle\sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1)$,

and it follows from Boole's inequality that

(8) $P\{T_n \ne S_n\} \le \displaystyle\sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1)$.

Next, condition (ii) may be written as

$\displaystyle\sum_{j=1}^{n} E\left\{\left(\frac{Y_{n,j}}{b_n}\right)^2\right\} = o(1)$;

from which it follows, since $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s:

$\sigma^2\left(\displaystyle\sum_{j=1}^{n} \frac{Y_{n,j}}{b_n}\right) \le \sum_{j=1}^{n} E\left\{\left(\frac{Y_{n,j}}{b_n}\right)^2\right\} = o(1)$.

Hence as in (2) of Sec. 5.1,

(9) $\dfrac{T_n - E(T_n)}{b_n} \to 0$ in pr.

It is clear (why?) that (8) and (9) together imply

$\dfrac{S_n - E(T_n)}{b_n} \to 0$ in pr.

Since

$E(T_n) = \displaystyle\sum_{j=1}^{n} E(Y_{n,j}) = \sum_{j=1}^{n} \int_{|x| \le b_n} x\,dF_j(x) = a_n$,

(6) is proved.

As an application of Theorem 5.2.3 we give an example where the weak but not the strong law of large numbers holds.

Example. Let $\{X_n\}$ be independent r.v.'s with a common d.f. $F$ such that

$P\{X_n = n\} = P\{X_n = -n\} = \dfrac{c}{n^2 \log n}, \qquad n = 3, 4, \ldots$,

where $c$ is the constant making the total probability equal to one. We have then, for large values of $n$,

$\displaystyle\sum_{j=1}^{n} \int_{|x| > n} dF(x) = n \int_{|x| > n} dF(x) = 2cn \sum_{k > n} \frac{1}{k^2 \log k} \sim \frac{2c}{\log n} = o(1)$;

$\dfrac{1}{n^2} \displaystyle\sum_{j=1}^{n} \int_{|x| \le n} x^2\,dF(x) = \frac{1}{n} \int_{|x| \le n} x^2\,dF(x) = \frac{2c}{n} \sum_{3 \le k \le n} \frac{1}{\log k} \sim \frac{2c}{\log n} = o(1)$.

Thus conditions (i) and (ii) are satisfied with $b_n = n$; and we have $a_n = 0$ by (5), by symmetry. Hence $S_n/n \to 0$ in pr., in spite of the fact that $E(|X_1|) = +\infty$. On the other hand, we have

$P\{|X_1| > n\} = 2c \displaystyle\sum_{k > n} \frac{1}{k^2 \log k} \ge \frac{C}{n \log n}$

for a positive constant $C$, so that, since $X_1$ and $X_n$ have the same d.f.,

$\displaystyle\sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} = \infty$.

Hence by Theorem 4.2.4 (Borel-Cantelli),

$P\{|X_n| > n \text{ i.o.}\} = 1$.


But $|S_n - S_{n-1}| = |X_n| > n$ implies $|S_n| > n/2$ or $|S_{n-1}| > n/2$; it follows that

$P\left\{|S_n| > \dfrac{n}{2} \text{ i.o.}\right\} = 1$,

and so it is certainly false that $S_n/n \to 0$ a.e. However, we can prove more. For any $A > 0$, the same argument as before yields

$P\{|X_n| > An \text{ i.o.}\} = 1$,

and consequently

$P\left\{|S_n| > \dfrac{An}{2} \text{ i.o.}\right\} = 1$.

This means that for each $A$ there is a null set $Z(A)$ such that if $\omega \in \Omega \setminus Z(A)$, then

(10) $\displaystyle\limsup_{n \to \infty} \frac{|S_n(\omega)|}{n} \ge \frac{A}{2}$.

Let $Z = \bigcup_{m=1}^{\infty} Z(m)$; then $Z$ is still a null set, and if $\omega \in \Omega \setminus Z$, (10) is true for every $A$, and therefore the upper limit is $+\infty$. Since $X_1$ is "symmetric" in the obvious sense, it follows that

$\displaystyle\liminf_{n \to \infty} \frac{S_n}{n} = -\infty, \qquad \limsup_{n \to \infty} \frac{S_n}{n} = +\infty \quad \text{a.e.}$
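The two series that drive the Example can be checked numerically. The sketch below is an editorial illustration, not part of the text (Python 3 assumed; the infinite tail is truncated at a finite horizon K, so the figures are approximations). It shows $n\,P\{|X_1| > n\} \to 0$, which feeds conditions (i)-(ii), while the partial sums of $\sum_n P\{|X_n| > n\}$ keep growing, in accordance with the Borel-Cantelli argument; the divergence is like $\log\log n$ and hence very slow.

```python
# A numerical check (not from the text) of the Example: with
# P(X = +-k) = c/(k^2 log k), k >= 3, the tail P(|X_1| > n) ~ const/(n log n).

from math import log

K = 1_000_000                                      # truncation horizon (demo only)
w = [0.0, 0.0, 0.0] + [1.0 / (k * k * log(k)) for k in range(3, K + 1)]
tail = [0.0] * (K + 2)
for k in range(K, 2, -1):                          # tail[k] = sum_{j >= k} w[j]
    tail[k] = tail[k + 1] + w[k]
c = 1.0 / (2 * tail[3])                            # normalizing constant

for n in (10, 100, 1000, 10_000, 100_000):
    p = 2 * c * tail[n + 1]                        # P(|X_1| > n)
    print(f"n = {n:7}: n * P(|X_1| > n) = {n * p:.4f}")   # slowly -> 0

acc = 0.0                                          # partial sums of sum_n P(|X_n| > n)
for n in range(3, K + 1):
    acc += 2 * c * tail[min(n + 1, K + 1)]
    if n in (100, 10_000, K):
        print(f"sum to n = {n:7}: {acc:.3f}   (diverges ~ log log n)")
```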

EXERCISES

1. For any sequence of r.v.'s $\{X_n\}$, with $S_n = \sum_{j=1}^{n} X_j$, and any $p \ge 1$:

$X_n \to 0$ a.e. $\Longrightarrow \dfrac{S_n}{n} \to 0$ a.e.;

$X_n \to 0$ in $L^p$ $\Longrightarrow \dfrac{S_n}{n} \to 0$ in $L^p$.

The second result is false for $p < 1$.

2. Even for a sequence of independent r.v.'s $\{X_n\}$,

$X_n \to 0$ in pr. $\not\Longrightarrow \dfrac{S_n}{n} \to 0$ in pr.

[HINT: Let $X_n$ take the values $2^n$ and 0 with probabilities $n^{-1}$ and $1 - n^{-1}$.]

3. For any sequence $\{X_n\}$:

$\dfrac{S_n}{n} \to 0$ in pr. $\Longrightarrow \dfrac{X_n}{n} \to 0$ in pr.

More generally, this is true if $n$ is replaced by $b_n$, where $b_{n+1}/b_n \to 1$.


*4. For any $\delta > 0$, we have

$\displaystyle\lim_{n \to \infty} \sum_{|k - np| > n\delta} \binom{n}{k} p^k (1-p)^{n-k} = 0$

uniformly in $p$: $0 < p < 1$.

*5. Let $P\{X_1 = 2^n\} = 1/2^n$, $n \ge 1$; and let $\{X_n, n \ge 1\}$ be independent and identically distributed. Show that the weak law of large numbers does not hold for $b_n = n$; namely, with this choice of $b_n$ no sequence $\{a_n\}$ exists for which (6) is true. [This is the St. Petersburg paradox, in which you win $2^n$ if it takes $n$ tosses of a coin to obtain a head. What would you consider as a fair entry fee? and what is your mathematical expectation?]

*6. Show on the contrary that a weak law of large numbers does hold for $b_n = n \log n$ and find the corresponding $a_n$. [HINT: Apply Theorem 5.2.3.]

7. Conditions (i) and (ii) in Theorem 5.2.3 imply that for any $\delta > 0$,

$\displaystyle\sum_{j=1}^{n} \int_{|x| > \delta b_n} dF_j(x) = o(1)$,

and that $a_n = o(nb_n)$.

8. They also imply that

$\dfrac{1}{b_n} \displaystyle\sum_{j=1}^{n} \int_{b_j < |x| \le b_n} x\,dF_j(x) = o(1)$.

[HINT: Use the first part of Exercise 7 and divide the interval of integration $b_j < |x| \le b_n$ into parts of the form $b_k < |x| \le b_{k+1}$ with $k \ge 1$.]

9. A median of the r.v. $X$ is any number $\alpha$ such that

$P\{X \le \alpha\} \ge \dfrac{1}{2}, \qquad P\{X \ge \alpha\} \ge \dfrac{1}{2}$.

Show that such a number always exists but need not be unique.

*10. Let $\{X_n, 1 \le n \le \infty\}$ be arbitrary r.v.'s and for each $n$ let $m_n$ be a median of $X_n$. Prove that if $X_n \to X_\infty$ in pr. and $m_\infty$ is unique, then $m_n \to m_\infty$. Furthermore, if there exists any sequence of real numbers $\{c_n\}$ such that $X_n - c_n \to 0$ in pr., then $X_n - m_n \to 0$ in pr.

11. Derive the following form of the weak law of large numbers from Theorem 5.2.3. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds but condition (i) does not.


12. Theorem 5.2.2 may be slightly generalized as follows. Let $\{X_n\}$ be pairwise independent with a common d.f. $F$ such that

(i) $\displaystyle\int_{|x| \le n} x\,dF(x) = o(1)$;  (ii) $n \displaystyle\int_{|x| > n} dF(x) = o(1)$;

then $S_n/n \to 0$ in pr.

13. Let $\{X_n\}$ be a sequence of identically distributed strictly positive random variables. For any $\varphi$ such that $\varphi(n)/n \to 0$ as $n \to \infty$, show that $P\{S_n > \varphi(n) \text{ i.o.}\} = 1$, and so $S_n \to \infty$ a.e. [HINT: Let $N_n$ denote the number of $k \le n$ such that $X_k \le \varphi(n)/n$. Use Chebyshev's inequality to estimate $P\{N_n > n/2\}$ and so conclude $P\{S_n > \varphi(n)/2\} \ge 1 - 2F(\varphi(n)/n)$. This problem was proposed as a teaser and the rather unexpected solution was given by Kesten.]

14. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds, but condition (i) does not hold. Thus condition (7) cannot be omitted.

5.3 Convergence of series

If the terms of an infinite series are independent r.v .'s, then it will be shownin Sec. 8.1 that the probability of its convergence is either zero or one . Herewe shall establish a concrete criterion for the latter alternative . Not only isthe result a complete answer to the question of convergence of independentr.v.'s, but it yields also a satisfactory form of the strong law of large numbers .This theorem is due to Kolmogorov (1929) . We begin with his two remarkableinequalities . The first is also very useful elsewhere ; the second may be circum-vented (see Exercises 3 to 5 below), but it is given here in Kolmogorov'soriginal form as an example of true virtuosity .

Theorem 5.3.1 . Let {X,, } be independent r .v .'s such that

Vn : (-`(X,,) = 0, c`(XZ) = U2 (Xn ) < oo .

Then we have for every c > 0 :

(1)

max jSj I > F} < a2 (Sn)

1<j<n

EZ

Remark. If we replace the maxi< j<„ jS j 1 in the formula by IS„ I, thisbecomes a simple case of Chebyshev's inequality, of which it is thus anessential improvement.

5.3 CONVERGENCE OF SERIES 1 121

Page 136: A Course in Probability Theory KL Chung

1 22 1 LAW OF LARGE NUMBERS. RANDOM SERIES

PROOF . Fix c > 0 . For any co in the set

A = {co : max ISj (w)I > E},1<j-<n

let us definev(w) = min{j : 1 < j < n, ISj(w)i > E) .

Clearly v is an r.v. with domain A . Put

Ak = {co: v(co) = k} _ {w : max ISj(w)I < E, ISk(w)I > E1,1<j<k-1

where for k = 1, maxi <j<o I Sj (w) I is taken to be zero . Thus v is the "firsttime" that the indicated maximum exceeds E, and Ak is the event that thisoccurs "for the first time at the kth step" . The Ak's are disjoint and we have

n

A = U A k .k=1

It follows thatn

(2)

Sd?'n

_A

k=1

k

n

n

I [Sk + (Sn - Sk)] 2 d9k=1 Ak

f [Sk + 2Sk (Sn - SO + (Sn - Sk ) 2 ] d _6P.k=1

k

fALet cOk denote the indicator of Ak , then the two r.v .'s cpkSk and Sn - Sk areindependent by Theorem 3 .3 .2, and consequently (see Exercise 9 of Sec . 3 .3)

Sk(Sn - SO d = f (cOkSk)(Sn - SO dJ'k

S2

f cOkSk d_1:1P

f (Sn - SO) d ~' = 0,S2

c2

since the last-written integral isn

(Sn - SO =

cP(Xj ) = 0.j=k+l

Using this in (2), we obtain

a2 (Sn) = f Sn dGJ' > f Sn dJ'A

k=1

kA

n ,(Ak) = E2 ~(A),k=1

f . SkdA

Page 137: A Course in Probability Theory KL Chung

where the last inequality is by the mean value theorem, since I Sk I > E on Ak bydefinition . The theorem now follows upon dividing the inequality above by E2 .

Theorem 5.3.2 . Let {Xn } be independent r.v.'s with finite means and sup-pose that there exists an A such that

(3)

Vn: IX,, - e(Xn )I < A < oo .

Then for every E > 0 we have

(4)

~'{ max 13 j I< E} < ( +4E)2

1 < j<n

Q2 (,Sn )

PROOF . Let Mo = S2, and for 1 < k < n

Mk = {w: max IS j I <- E},1<j <k

Ak = Mk-1 - Mk-

We may suppose that GJ'(Mn ) > 0, for otherwise (4) is trivial . Furthermore,let So = 0 and fork > 1,

k

Xk = Xk - e(Xk), Sk = ~Xjj=1

Define numbers ak , 0 < k < n, as follows :1

ak =

Sk d°J',_0P(Mk) Mk

so that

(5)

f(Sk - ak ) dOP = 0.

MA

Now we write

(6) f (Sk+1 - ak+1)2 d °J' _

(S'k - ak + ak - ak+1 + Xk+1)2 dJ'Mk+,

Mk

(Sk - ak + ak - ak+1 + xk+1 )2 dam'

and denote the two integrals on the right by I1 and 12, respectively . Using thedefinition of Mk and (3), we have

Sk - a k I =

5.3 CONVERGENCE OF SERIES 1 123

Sk - c"(Sk)- '(Mk) JMk

- (6 (Sk )} dA

Sk 1f Sk d?,71(Mk) Mk

< ISkI+E ;

Page 138: A Course in Probability Theory KL Chung

124 1 LAW OF LARGE NUMBERS. RANDOM SERIES

1

1lak - ak+1 1 =

Sk dJ' -Sk d?,/19'(Mk) Mk

'(Mk+l) fM1

1Xk+1 dJ~ < 2e +A .7~(Mk+l ) Mk+1

It follows that, since I Sk I < E on Qk+l,

12 < fo

(I SkI +,e + 2E +A +A)2 dJ' < (4+ 2A)2~(Qk+1 ) •k+1

On the other hand, we have

1l =fk

{(Sk - ak)2 + (ak - ak+1 )2 + I'k+1 + 2(S'k - ak)(ak - ak+l)M

+ 2(Sk - ak)Xk+1 + 2(ak - ak+1)Xk+1 } d i .

The integrals of the last three terms all vanish by (5) and independence, hence

11 ?Mk

(Sk - ak)2 d°J' .+

Xk+l dGJ'fMk= f (Sk - ak)2 d°J' + - (Mk)a 2 (Xk+l ) .Mk

Substituting into (6), and using Mk M, we obtain for 0 < k < n - 1 :

(Sk+l - ak+1 )2 d~ -M

k

(Sk - ak)2Mk+,

dam'

> ~'(Mn)62(Xk+l) - (4E + 2A)2J'(Qk+l ) •Summing over k and using (7) again for k = n :

4E2 (Mn) > f (ISn I + E)2 d~ >

'

2 JM„

,

(Sn - an ) dM„

n> ~'(M")

- (4E + 2A)2 GJ'(Q\Mn ),j=1

hencen

(2A + 46)2 > 91(ml)j=1

which is (4) .

Page 139: A Course in Probability Theory KL Chung

We can now prove the "three series theorem" of Kolmogorov (1929) .

Theorem 5.3.3 . Let IX,,) be independent r.v.'s and define for a fixed con-stant A > 0 :

Yn (w) - Xn (w),

if IX,, (w) I < A0,

if IXn (w)I > A.

Then the series >n Xn converges a .e. if and only if the following three seriesall converge :

(i) E„ -OP{IXnI > A} _ En 9~1{Xn :A Yn},(11)

E17 cr(Yll ),(iii) En o 2 (Yn ) .

PROOF . Suppose that the three series converge . Applying Theorem 5 .3 .1to the sequence {Yn - d (Y,, )}, we have for every m > 1 :

If we denote the probability on the left by /3(m, n, n'), it follows from theconvergence of (iii) that for each m :

lim lim _?(m, n, n') = 1 .n-aoo n'-oo

This means that the tail of >n {Yn - 't(Y,1 )} converges to zero a .e ., so that theseries converges a.e. Since (ii) converges, so does En Yn . Since (i) converges,{X„ } and {Y n } are equivalent sequences; hence En X„ also converges a .e. byTheorem 5 .2 .1 . We have therefore proved the "if' part of the theorem .

Conversely, suppose that En Xn converges a .e. Then for each A > 0 :

°l'(IXn I

> A i.o .) = 0 .

It follows from the Borel-Cantelli lemma that the series (i) must converge .Hence as before En Y,, also converges a .e . But since I Yn - (r (Yn ) I < 2A, wehave by Theorem 5.3 .2

maxn<k<n'

5.3 CONVERGENCE OF SERIES 1 125

(4A + 4)2ll '

a (Yj)j=n

j=n

Were the series (iii) to diverge, the probability above would tend to zero asn' --+ oc for each n, hence the tail of En Y„ almost surely would not be

Page 140: A Course in Probability Theory KL Chung

126 1 LAW OF LARGE NUMBERS . RANDOM SERIES

bounded by 1, so the series could not converge. This contradiction proves that(iii) must converge . Finally, consider the series En {Yn - (1"'(Y,)} and applythe proven part of the theorem to it . We have ?PJIY n - (5'(Yn )I > 2A) = 0 ande` (Yn - (`(Y,)) = 0 so that the two series corresponding to (i) and (ii) with2A for A vanish identically, while (iii) has just been shown to converge . Itfollows from the sufficiency of the criterion that En {Yn - ([(Y„ )} convergesa .e ., and so by equivalence the same is true of En {Xn - (r(Yn )} . Since E„ Xnalso converges a .e. by hypothesis, we conclude by subtraction that the series(ii) converges . This completes the proof of .the theorem .

The convergence of a series of r.v.'s is, of course, by definition the sameas its partial sums, in any sense of convergence discussed in Chapter 4 . Forseries of independent terms we have, however, the following theorem due toPaul Levy.

Theorem 5.3 .4 . If {X,} is a sequence of independent r .v.'s, then the conver-gence of the series En Xn in pr. is equivalent to its convergence a .e .

PROOF . By Theorem 4.1 .2, it is sufficient to prove that convergence ofEll X„ in pr. implies its convergence a.e. Suppose the former; then, givenE: 0 < E < 1, there exists m0 such that if n > m > m0 , we have

(8)

{Isrn,nI > E} < E,

wheren

Sin, _j=m+1

It is obvious that for in < k < n we haven

(9)

U { in ax ISin, j I < 2E ; ISm, I > 2E; I Sk,n I :!S E} C { ISin,,, I > E}M< J_

k=m+ 1

where the sets in the union are disjoint . Going to probabilities and usingindependence, we obtain

nJl max I Sin, j 1

26 ; I Sin, k l > 2E} J~'{ ISk,n I < E} <f ISm,n I > E} .m< j<k-1

k=m+ 1

If we suppress the factors

ISk,,, I < E), then the sum is equal to

Y{ max I S, n ,j I > 2E}

,n< j<n

(cf. the beginning of the proof of Theorem 5 .3 .1). It follows that

~{ max ISmJ I > 2E} min J/'{ I Sk,n I _< E}

{ I Sm,n I > 6}m< j<n

m<k<n

Page 141: A Course in Probability Theory KL Chung

5.3 CONVERGENCE OF SERIES 1 127

This inequality is due to Ottaviani . By (8), the second factor on the left exceeds1 -c, hence if m > m 0 ,

(10)

GJ'{ max ISm,jl > 2E} < 1 1 E-Opt ISm,nI > E} < 1 E E .m< j<n

Letting n --> oc, then m -± oo, and finally c --* 0 through a sequence ofvalues, we see that the triple limit of the first probability in (10) is equalto zero. This proves the convergence a .e. of >n X, by Exercise 8 of Sec . 4.2 .

It is worthwhile to observe the underlying similarity between the inequal-ities (10) and (1) . Both give a "probability upper bound" for the maximum ofa number of summands in terms of the last summand as in (10), or its varianceas in (1) . The same principle will be used again later .

Let us remark also that even the convergence of S, in dist. is equivalentto that in pr. or a.e. as just proved ; see Theorem 9 .5.5 .

We give some examples of convergence of series .

Example. >n ±1/n .This is meant to be the "harmonic series" with a random choice of signs in each

term, the choices being totally independent and equally likely to be + or - in eachcase. More precisely, it is the series

n

where {Xn , n > 1) is a sequence of independent, identically distributed r .v.'s takingthe values ±1 with probability z each .

We may take A = 1 in Theorem 5 .3 .3 so that the two series (i) and (ii) vanish iden-tically . Since u'(Xn ) = u 2 (Yn ) = 1/n'- , the series (iii) converges . Hence, 1:„ ±1/nconverges a .e. by the criterion above . The same conclusion applies to En ±1/n 8 if2 < 0 < 1 . Clearly there is no absolute convergence . For 0 < 0 < 1 , the probabilityof convergence is zero by the same criterion and by Theorem 8 .1 .2 below .

EXERCISES

1 . Theorem 5 .3 .1 has the following "one-sided" analogue . Under thesame hypotheses, we have

0 1 max Sj > E} <(S" )

1<j<n

E2 + a2 (Sn)

[This is due to A . W. Marshall .]

Page 142: A Course in Probability Theory KL Chung

128 1 LAW OF LARGE NUMBERS . RANDOM SERIES

*2. Let {X„ } be independent and identically distributed with mean 0 andvariance 1 . Then we have for every x :

PJ max S j > x} < 2,91'{Sn > x - 2n } .1<j_<<n

[HINT : LetAk = { max Sj < x ; Sk > x}

1<j<k

then Ek-1 '{Ak ; S„ - Sk > -,/2-n } < ~P{Sn > x - 2n} .]3. Theorem 5 .3.2 has the following companion, which is easier to prove .

Under the joint hypotheses in Theorems 5 .3 .1 and 5 .3 .2, we have

?71 { max JSj I <- E} <(A+ E)2

1<j<n

U2.

(Sn)

4. Let {X,,, Xn, n > 1} be independent r .v .'s such that Xn and X' havethe same distribution . Suppose further that all these r .v.'s are bounded by thesame constant A . Then

E(Xn - Xn)n

converges a .e. if and only if

1: 62 (Xn ) < 00 .

n

Use Exercise 3 to prove this without recourse to Theorem 5 .3 .3, and so finishthe converse part of Theorem 5 .3 .3 .

*5. But neither Theorem 5.3 .2 nor the alternative indicated in the prece-ding exercise is necessary ; what we need is merely the following result, whichis an easy consequence of a general theorem in Chapter 7 . Let {X,) be asequence of independent and uniformly bounded r .v.'s with a2 (Sn ) --* -hoc .Then for every A > 0 we have

lim l'{ ISn I < A} = 0 .n-+ oc

Show that this is sufficient to finish the proof of Theorem 5 .3 .3 .*6. The following analogue of the inequalities of Kolmogorov and Otta-

viani is due to P . Levy . Let Sn be the sum of n independent r .v.'s andS° = Sn - m0 (Sn ), where mo (S,) is a median of S„ . Then we have

max I S. l > E} < 3J'

E{ IS° I >

2} .

1< <n

[HINT: Try "4" in place of "3" on the right .]

Page 143: A Course in Probability Theory KL Chung

7. For arbitrary {X,, }, if

n

then E„ Xn converges absolutely a .e .8. Let {Xn }, where n = 0, ±1, ±2, . . ., be independent and identically

distributed according to the normal distribution c (see Exercise 5 of Sec . 4.5) .Then the series of complex-valued r .v .'s

00 einxXn

n=1

5.4 STRONG LAW OF LARGENUMBERS 1 129

E (-f' '(IX,, I) < 00,

where i =

and x is real, converges a .e. and uniformly in x . (This isWiener's representation of the Brownian motion process .)

*9. Let {X n } be independent and identically distributed, taking the values0 and 2 with probability 2 each; then

00 Xn

3nn=1

converges a .e. Prove that the limit has the Cantor d .f. discussed in Sec . 1 .3 .Do Exercise 11 in that section again ; it is easier now .

*10. If En ±Xn converges a .e. for all choices of ±1, where the Xn 's arearbitrary r.v.'s, then >n Xn 2 converges a.e . [HINT : Consider >n rn (t)X, (w)where the rn 's are coin-tossing r .v .'s and apply Fubini's theorem to the spaceof (t, cw) .]

5 .4 Strong law of large numbers

To return to the strong law of large numbers, the link is furnished by thefollowing lemma on "summability" .

Kronecker's lemma. Let {xk} be a sequence of real numbers, {ak} asequence of numbers > 0 and T oo . Then

nxn- < converges -

n an J=

PROOF . For 1 < n < oc letn

xj --* 0 .

Page 144: A Course in Probability Theory KL Chung

1 30 1 LAW OF LARGE NUMBERS. RANDOM SERIES

If we also write a0 - 0, bo = 0, we have

and

(Abel's method of partial summation) . Since aj+ l - aj > 0,

(aj+l - aj) = 1,

and b„ --+ b,,,, we have

xj-*b,,, -b,,~b~=

then

(3)

xn = an (bn - bn-1)

The lemma is proved .

Now let cp be a positive, even, and continuous function on -R,' such thatas Ixl increases,

(1)

co(x)T ,

cp(x)

IxI

x2

Theorem 5.4.1 . Let {Xn } be a sequence of independent r.v.'s with ((Xn ) _0 for every n ; and 0 < a,, T oo. If ~o satisfies the conditions above and

(2)

E(t(~O(Xn))

< 00,W(an)

n

Xn converges a.e .n a,

PROOF . Denote the d.f. Of Xn by Fn . Define for each n

(4)

Yn (w) _ f Xn (w),

if IXn (w) I < an,0,

if IXn (w) I > an .

Then2Q)Y„n

a n n

n -

. - bj_1) = b n -

bj(aj+l - aj)j=0

x22 dF„(x).

fxj<a,, a n

Page 145: A Course in Probability Theory KL Chung

Thus, for the r.v.'s {Yn - AY,)}/an, the series (iii) in Theorem 5 .3.3 con-verges, while the two other series vanish for A = 2, since I Yn - ct(Yn ) I < 2a, ;hence

(5)

2 Yn

~ Yn

(P(X)Q - <

< dFn (x)2n

an

n

an

n

Ixl <a n ~p (an )

C (~O(Xn ))< 00 .

n

Next we have

IL (Yn)

n

an

n

n

I{Yn - c(Yn )} converges a.e .

n

J~{Xn :0 Yn}n

n

where the second equation follows from f 0xdFn (x) = 0. By the first hypo-thesis in (1), we have

1XI_<

~P(x)

for Ix1 > an .an

~o(an)

It follows that

E I (( Yn)I

f

(p(x)dFn (x)

n an n I>a„ W(an)

This and (5) imply that En (Y„ /a„) converges a .e . Finally, since cp T, we have

n

'I

5.4 STRONG LAW OF LARGENUMBERS 1 131

By the second hypothesis in (1), we have2X< 'p(x) for

Ix I< an .

a~(an )n

It follows that

(p(an )

x dFn (x)

- dFn (x),

n

n

xdF„(x)~xI >a,

([('(X,, )) < oo .(p(an)

dFn (x) <

J

'p(x)dFn (x)

ixI >a„

n

ixl>a„ p(an )

~~ (~P(Xn ) )< 00 .

~o(an)

Thus, {Xn } and {Yn } are equivalent sequences and (3) follows .

Page 146: A Course in Probability Theory KL Chung

1 32 1 LAW OF LARGE NUMBERS . RANDOM SERIES

Applying Kronecker's lemma to (3) for each c) in a set of probabilityone, we obtain the next result .

Corollary. Under the hypotheses of the theorem, we have

1

n(6)

an j=1

Particular cases . (i) Let ~p(x) = Ix I P , I < p < 2 ; an - n . Then we haven

(7)

nP ~(IXn I,) < 00 ~ 1

-+ 0 a.e .n

j=1

For p - 2, this is due to Kolmogorov ; for 1 < p < 2, it is due toMarcinkiewicz and Zygmund .

(ii) Suppose for some 8, 0 < 8 < 1 and M < oc we have

Vn : cO(IXn I 1+s) < M .

Then the hypothesis in (7) is clearly satisfied with p = 1 + 8 . This case isdue to Markov . Cantelli's theorem under total independence (Exercise 4 ofSec. 5 .1) is a special case .

(iii) By proper choice of {an }, we can considerably sharpen the conclu-sion (6) . Suppose

ndn : Q 2 (Xn) = 6n < 00, a2 (Sn) = sn -

Q~ -~ oo .

- * 0 a.e .

Choose co(x) = x2 and an = Sn (log sJ 1 /2)+E,

Theorem 5 .4.1 . Then

((Xn )a2

n

n n

by Dini's theorem, and consequently

Sn

2 Orn < 00

sn (log Sn )1+2c

-+ 0 a.e .

E > 0, in the corollary to

Sn (log Sn )(1/2)+E

In case all on = 1 so that sn = n, the above ratio has a denominator that isclose to n 1 / 2 . Later we shall see that n 1 / 2 is a critical order of magnitudefor Sn .

Page 147: A Course in Probability Theory KL Chung

We now come to the strong version of Theorem 5.2.2 in the totally inde-pendent case. This result is also due to Kolmogorov .

Theorem 5.4.2 . Let {Xn } be a sequence of independent and identically distri-buted r.v.'s . Then we have

(8)

c`(IX i I) < 00 ==~ Sn

((X1 ) a.e .,n

(9)

(-r(1X11) = 00 =:~ lim is" I= +oo a.e .

n-+oo nPROOF . To prove (8) define {Y n } as in (4) with an = n . Since

E -'-'P{Xn 0 Y,)n

00

n

=C(t(IX1I) < 00.

In the above we have used the elementary estimate E,O°_j n -2 < Cj-1 forsome constant C and all j > 1 . Thus the first sum in (10) converges, and weconclude by (7) that

n

n

by Theorem 3 .2.1, {Xn } and {Yn } are equivalent sequences. Let us apply (7)to {Y„

with ~p(x) = x2 . We have

02 (Yn)

"(I2n)

2(10)

n 2

n

x dF(x).

n

n

n

IxI<<n

We are obliged to estimate the last written second moment in terms of the firstmoment, since this is the only one assumed in the hypothesis . The standardtechnique is to split the interval of integration and then invert the repeatedsummation, as follows :

00 1En2

x2 dF(x)n=1

j=1I

-1 <IxI<j

x 2 dF(x)I 1 Ixl :!~j

n=j

5.4 STRONG LAW OF LARGENUMBERS 1 133

0P{1XnI > n} =E°'{IX1I > n} < 00

n

- "(Yj )} -+ 0 a.e .

! IxII- <I<

Ix I dF(x)

Page 148: A Course in Probability Theory KL Chung

1 34 1 LAW OF LARGE NUMBERS. RANDOM SERIES

Clearly cr(Y,,) -+ (`(X 1 ) as n -+ oc ; hence also

and consequentlyn

-- . e(X I ) a.e .j=1

By Theorem 5 .2.1, the left side above may be replaced by (1/n) I:;=1 Xj ,

proving (8) .To prove (9), we note that ~(IX 1 1) = oc implies AIX1 I/A) = oo for each

A > 0 and hence, by Theorem 3 .2.1,

G.1'(IX 1 I > An) = +oo .n

Since the r .v.'s are identically distributed, it follows that

E ~(lXn I > An) = +oo .n

Now the argument in the example at the end of Sec . 5.2 may be repeatedwithout any change to establish the conclusion of (9) .

Let us remark that the first part of the preceding theorem is a special caseof G. D. Birkhoff's ergodic theorem, but it was discovered a little earlier andthe proof is substantially simpler .

N. Etemadi proved an unexpected generalization of Theorem 5 .4.2: (8)is true when the (total) independence of {Xn } is weakened to pairwiseindependence (An elementary proof of the strong law of large numbers,Z. Wahrscheinlichkeitstheorie 55 (1981), 119-122) .

Here is an interesting extension of the law of large numbers when themean is infinite, due to Feller (1946) .

Theorem 5.4 .3 . Let {Xn } be as in Theorem 5 .4.2 with ct (IX 11) = oc. Let{a„ } be a sequence of positive numbers satisfying the condition an /n T . Thenwe have

(11)

lim is" I = 0 a.e ., or= oo a.e .n an

according as

(12)

t'{1X n I > a n } -

dF(x) < oo, or = oc .n

PROOF . Writing

IXI >an

f (Y 1 ) -+ (%'(X 1),

n

cc

k=n Jak < Ixl <ak+,

dF(x),

Page 149: A Course in Probability Theory KL Chung

5.4 STRONG LAW OF LARGENUMBERS 1 135

substituting into (12) and rearranging the double series, we see that the seriesin (12) converges if and only if

(13)

1:k [

dF(x) < oo .k

k_1<Ixl<ak

Assuming, this is the case, we put

µn =

xdF(x);Ixl<a„

and so

17

Yn{Xn - An

- /-fin

Thus (Yn ) = 0. We have by (12),

(14)

'{Yn Xn - µn } < 00 -n

Next, with a0 = 0 :v y2n < 1

fx2 dF(x)

an

n an Ixl<a„

Since an /n > ak/k for n > k, we have

if Wn I < an ,

if 1Xn I > an .

°O 1X2 dF(x)

a2`k=1

fk_ 1 <Ixl<ak

n

00

k=1 Ja k _ 1 <IxI<a k

O° 1

k2

1

2ka2 Q2 n 22 - a2 '

n=k n

k n=k

k

Y2

00

<E2k

dF(x)<oon

an

k=1

ak_i <IxI<ak

by (13) . Hence E Yn /an converges (absolutely) a .e. by Theorem 5 .4.1, andso by Kronecker's lemma :

n1(15)

Yk ± 0 a.e .k=1

Page 150: A Course in Probability Theory KL Chung

136 1 LAW OF LARGE NUMBERS. RANDOM SERIES

We now estimate the quantity

1(16)

1

a

Ik = --

k=1 JxI<aAxdF(x)

" k=1 lIk

as n -~ oc. Clearly for any N < n, it is bounded in absolute value by

n (aN +

Ix I dF(x) .an

aN<IxI<a„

Since (r(IX1I) = oc, the series in (12) cannot converge if an /n remains boun-ded (see Exercise 4 of Sec . 3 .2). Hence for fixed N the term (n/an )aN in (17)tends to 0 as n -* oc . The rest is bounded by

n

11n(18)

aj

dF(x)

dF(x)an

j=N+1

aj-i<Ixj<a1

j=N+1

a1_i< xf <aj

because naj /an < j for j < n . We may now as well replace the n in theright-hand member above by oo ; as N --a oo, it tends to 0 as the remainder ofthe convergent series in (13) . Thus the quantity in (16) tends to 0 as n --> oo ;combine this with (14) and (15), we obtain the first alternative in (11) .

The second alternative is proved in much the same way as in Theorem 5 .4 .2and is left as an exercise . Note that when a" = n it reduces to (9) above .

n

Corollary. Under the conditions of the theorem, we have

( 1 9)

911 IS, I > a, i .o.} _ -OP{ IX„ I > an i.o .} .

This follows because the second probability above is equal to 0 or 1according as the series in (12) converges or diverges, by the Borel-Cantellilemma. The result is remarkable in suggesting that the nth partial sum andthe n th individual term of the sequence {X" } have comparable growth in acertain sense . This is in contrast to the situation when e(IXi I) < oo and (19)is false for an = n (see Exercise 12 below) .

EXERCISES

The X„'s are independent throughout ; in Exercises 1, 6, 8, 9, and 12 they arealso identically distributed ; S„ = I~=I Xj .

*1 . If (`(X+) = + o0, F (X i) < oc, then S" /n -- -boo a .e .

*2. There is a complement to Theorem 5 .4 .1 as follows . Let {an } and cpbe as there except that the conditions in (1) are replaced by the condition thatcp(x) T and 'p(x)/IxI ~ . Then again (2) implies (3) .

Page 151: A Course in Probability Theory KL Chung

3. Let {X„ } be independent and identically distributed r .v.'s such thatc`(IX1IP) < od for some p : 0 < p < 2 ; in case p > 1, we assume also thatC`(X 1 ) = 0 . Then S„n-('In)_E --* 0 a.e. For p = 1 the result is weaker thanTheorem 5 .4.2 .

4. Both Theorem 5.4 .1 and its complement in Exercise 2 above are "bestpossible" in the following sense . Let {a, 1 ) and (p be as before and suppose thatb11 > 0,

bn

_ ~.

n~T(all) -

Then there exists a sequence of independent and identically distributed r .v.'s{Xn } such that cr-(Xn ) = 0, (r(Cp(Xn )) = b, and

Xnan

converges = 0.

[HINT : Make En '{ JX, I > an } = oo by letting each X, take two or threevalues only according as bn /Wp(an ) _< I or >1 .1

5. Let Xn take the values ±n o with probability 1 each. If 0 < 0 <

z , then S11 /n --k 0 a.e. What if 0 > 2 ? [HINT: To answer the question, useTheorem 5 .2 .3 or Exercise 12 of Sec . 5 .2 ; an alternative method is to considerthe characteristic function of S71 /n (see Chapter 6) .]

6. Let t(X 1 ) = 0 and {c, 1 } be a bounded sequence of real numbers . Then

j --}0 a. e .

[HINT : Truncate X„ at n and proceed as in Theorem 5 .4 .2 .]

7. We have S„ /n --). 0 a.e. if and only if the following two conditionsare satisfied :

(i) S 11 /n -+ 0 in pr .,

(ii) S2"/2" -± 0 a .e . ;

an alternative set of conditions is (i) and

(ii') VC > 0 : En JP(1S2 ,'+1- S2,' ( > 211E < cc .

*8. If ("(`X11) < oo, then the sequence {S,1 /n } is uniformly integrable and

Sn /n -* ( (X 1 ) in L 1 as well as a .e .

5.4 STRONG LAW OF LARGENUMBERS1 137

Page 152: A Course in Probability Theory KL Chung

1 38 I LAW OF LARGE NUMBERS . RANDOM SERIES

9. Construct an example where i(X i)

= + c and S„ /n -+00 a.e . [HINT : Let 0 < a < fi < 1 and take a d .f. F such that I - F(x) x-"as x --+ 00, and f°0, IxI ~ dF(x) < 00 . Show that

/'( max X < n'/c") < 00n

1 <.7<n J

for every a' > a and use Exercise 3 for En=, X- . This example is due toDerman and Robbins . Necessary and sufficient condition for S„In --> +00has been given recently by K . B . Erickson .

10. Suppose there exist an a, 0 < a < 2, a ; 1, and two constants A 1and A2 such that

bn, b'x > 0 :X" <

J'{IXnI > x} < A2

If a > 1, suppose also that (1'(Xn ) = 0 for each n . Then for any sequence {an }increasing to infinity, we have

z'{ Is, I > an i-0-1 _ 5 01

if > '~ 1 <an a„ -

[This result, due to P . Levy and Marcinkiewicz, was stated with a superfluouscondition on {a„} . Proceed as in Theorem 5 .3 .3 but truncate Xn at an ; directestimates are easy .]

11. Prove the second alternative in Theorem 5 .4.3 .12. If c~(X1 ) ; 0, then maxi <k<„ IXk I I ISn I -). 0 a.e . [HINT : IX„ I /n -a

0 a.e .]13. Under the assumptions in Theorem 5 .4.2, if S„ /n converges a .e . then

((IXII) < o0 . [Hint : X, In converges to 0 a.e ., hence Gl'{IXn I > n i.o .) = 0 ;use Theorem 4 .2 .4 to get E„ /'{ IX 1 I > n } < oc.]

5.5 ApplicationsThe law of large numbers has numerous applications in all parts of proba-bility theory and in other related fields such as combinatorial analysis andstatistics . We shall illustrate this by two examples involving certain importantnew concepts .

The first deals with so-called "empiric distributions" in sampling theory .

Let {X„, n > l} be a sequence of independent, identically distributed r .v.'s

with the common d .f. F . This is sometimes referred to as the "underlying"or "theoretical distribution" and is regarded as "unknown" in statistical lingo .

For each w, the values X,(&)) are called "samples" or "observed values", andthe idea is to get some information on F by looking at the samples . For each

Page 153: A Course in Probability Theory KL Chung

5.5 APPLICATIONS 1 1 39

n, and each cv E S2, let the n real numbers {X j (co), 1 _< j < n } be arranged inincreasing order as

(1)

Now define a discrete d .f. Fn ( •,

cv) as follows

:

In other words, for each x, nFn (x, c)) is the number of values of j, 1 < j < n,for which X j (co) < x ; or again Fn (x, w) is the observed frequency of samplevalues not exceeding x . The function Fn ( •,

co) is called the empiric distribution

function based on n samples from F .For each x, Fn (x, •)

is an r

.v., as we can easily check. Let us introducealso the indicator r .v .'s {~j(x), j > 1) as follows :

We have then

Yn1(w) < Yn2((0) < . . . < Ynn(w)-

~1(x, w) = 10

nFn (x, co) = -

~ j (x, cv) .

j=1

For each x, the sequence {~j(x)} is totally independent since {X j} is, byTheorem 3 .3 .1 . Furthermore they have the common "Bernoullian distribution",taking the values 1 and 0 with probabilities p and q = 1 - p, where

p = F(x), q = 1 - F(x) ;

thus c_'(~ j (x)) = F (x) . The strong law of large numbers in the formTheorem 5.1 .2 or 5 .4.2 applies, and we conclude that

(2)

F,, (x, c)) --). F(x) a.e .

Matters end here if we are interested only in a particular value of x, or a finitenumber of values, but since both members in (2) contain the parameter x,which ranges over the whole real line, how much better it would be to makea global statement about the functions Fn ( •,

(o) and F(

•). We shall do this inthe theorem below. Observe first the precise meaning of (2) : for each x, there

exists a null set N(x) such that (2) holds for co E Q\N(x) . It follows that (2)

also holds simultaneously for all x in any given countable set Q, such as the

if X j (c)) < x,if X j (co) > x .

Fn (x, cv) = 0, if x < Yn1(w),

kF, (x, cv) = if Ynk(co) < X < Yn,k+l(w), 1 < k < n - 1,

n -

Fn (x, co) = 1, if x > Y„n (c)) .

Page 154: A Course in Probability Theory KL Chung

1 40 1 LAW OF LARGE NUMBERS. RANDOM SERIES

set of rational numbers, for co E Q\N, where

N= UN(x)XEQ

is again a null set . Hence by the definition of vague convergence in Sec . 4 .4,we can already assert that

F,(-,w)4 F( •) for a.e . co .

This will be further strengthened in two ways : convergence for all x anduniformity . The result is due to Glivenko and Cantelli .

Theorem 5.5.1 . We have as n -± oc

sup IF, (x, co) - F(x)I -3 0 a.e .-00<X<00

PROOF . Let J be the countable set of jumps of F . For each x e J, define

_ 1,

if Xj (w) x;~~ (x, cv) - 1 0,

if Xj (c)) ,-~ x .

Then for X E J :n

Fn (x-f-, (o) - Fn (x-, co) = - 17j (X, 0)),

and it follows as before that there exists a null set N(x) such that if co E

S2\N(x), then

(3)

Fn (x+, c)) - F n (x-, c)) -- F (x+) - F (x-) .

Now let N1 = UXEQuJ N(x), then N1 is a null set, and if co E Q\N1, then (3)holds for every x E J and we have also

(4)

Fn (x, cv) -+ F(x)

for every x E Q. Hence the theorem will follow from the following analyticalresult .

Lemma. Let Fn and F be (right continuous) d.f.'s, Q and J as before .Suppose that we have

Vx E Q: Fn (x) -~ F(x) ;

Vx E J : F n (x) - Fn (x-) --* F(x) - F(x-) .

Then Fn converges uniformly to F in JL1 .

Page 155: A Course in Probability Theory KL Chung

PROOF . Suppose the contrary, then there exist E > 0, a sequence {nk} ofintegers tending to infinity, and a sequence {xk} in ~~1 such that for all k :

(5) IFnk(xk) - F(xk)I > E > 0.

5.5 APPLICATIONS 1 1 4 1

This is clearly impossible if xk -+ +oc or Xk -+ -oo . Excluding these cases,we may suppose by taking a subsequence that xk -~ ~ E ~:-41 . Now considerfour possible cases and the respective inequalities below, valid for all suffi-ciently large k, where r 1 E Q, r2 E Q, r l < 4 < r2 .

Case 1 . xk t 4, xk < i :

< Fnk (xk) - F(xk) < Fnk (4- ) - F(rl )

< Fnk (4-) - Fnk (44) + Fnk (r2) - F(r2) + F(r2) - F(rl ) .

Case 2. xk t 4, xk < 4 :

< F(Xk) - Fnk(xk) < F(4- ) - Fnk(rl)

= F(4-) - F(rl) + F(r1) - F,,, (rl ) .

Case 3. xk ~ 4, xk > 4

< F(xk) - Fnk(xk) < F(r2) - Fnk(4)

< F(r2) - F(r1) + F(r1) - Fnk (rl ) + Fnk (4-) - Fnk (4) .

Case 4. xk ,j, 4, xk > 4

< F„k (xk) - F(xk) < Fn k (r2) - F(4)

= Fnk (r2) - Fnk (ri ) + F„ k (r j ) - F(ri) + F(ri) - F(4) .

In each case let first k -a oo, then r1 T 4, r2 ~ 4 ; then the last member ofeach chain of inequalities does not exceed a quantity which tends to 0 and acontradiction is obtained .

Remark . The reader will do well to observe the way the proof aboveis arranged. Having chosen a set of c) with probability one, for each fixedc) in this set we reason with the corresponding sample functions F„ ( ., (0)

and F(., co) without further intervention of probability . Such a procedure isstandard in the theory of stochastic processes .

Our next application is to renewal theory . Let IX,,, n > 1 } again be asequence of independent and identically distributed r.v .'s . We shall furtherassume that they are positive, although this hypothesis can be dropped, andthat they are not identically zero a .e. It follows that the common mean is

Page 156: A Course in Probability Theory KL Chung

142 1 LAW OF LARGE NUMBERS. RANDOM SERIES

strictly positive but may be +oo . Now the successive r .v.'s are interpreted as"lifespans" of certain objects undergoing a process of renewal, or the "returnperiods" of certain recurrent phenomena. Typical examples are the ages of asuccession of living beings and the durations of a sequence of services . Thisraises theoretical as well as practical questions such as : given an epoch intime, how many renewals have there been before it? how long ago was thelast renewal? how soon will the next be?

Let us consider the first question. Given the epoch t > 0, let N(t, w)be the number of renewals up to and including the time t . It is clear thatwe have

(6)

{w: N(t, w) = n } = {w: S„ (w) < t < S„+1(w)}

valid for n > 0, provided So =- 0 . Summing over n < m - 1, we obtain

(7)

{w: N(t, (o) < m} = {w : Sm (w) > t} .

This shows in particular that for each t > 0, N(t) = N(t, .) is a discrete r.v .whose range is the set of all natural numbers. The family of r .v.'s {N(t)}indexed by t E [0, oo) may be called a renewal process . If the common distri-bution F of the X„'s is the exponential F(x) = 1 - e- , x > 0 ; where a. > 0,then {N(t), t > 0} is just the simple Poisson process with parameter A .

Let us prove first that

(8)

lim N(t) = +oo a.e .,t-> 00

namely that the total number of renewals becomes infinite with time . This isalmost obvious, but the proof follows . Since N(t, w) increases with t, the limitin (8) certainly exists for every w . Were it finite on a set of strictly positiveprobability, there would exist an integer M such that

.P{ sup N(t, w) < M} >0 .0<t<oo

This implies by (7) that

JT{SM(w) = +oo} > 0,

which is impossible . (Only because we have laid down the convention longago that an r .v . such as X1 should be finite-valued unless otherwise specified .)

Next let us write

(9)

0<m=c`(X1)<+o0,

and suppose for the moment that m < +oo . Then, according to the strong lawof large numbers (Theorem 5 .4.2), S,/n -+ m a.e. Specifically, there exists a

Page 157: A Course in Probability Theory KL Chung

null set Z 1 such that

Yw E 2\Z1 : limX1((0) + . . . + Xn (w)

= m.n-aoo

n

We have just proved that there exists a null set Z2 such that

Vw E S2\Z2 : lim N(t, w) = +oo .t-+00

Now for each fixed coo , if the numerical sequence {an (coo ), n > 1 } convergesto a finite (or infinite) limit m and at the same time the numerical function{N(t, wo), 0 < t < oo} tends to +oo as t -~ +oo, then the very definition of alimit implies that the numerical function {aN(t,uo)(wo), 0 < t < oo} convergesto the limit m as t -+ +oo . Applying this trivial but fundamental observation to

Theorem 5.5 .2 . We have

(11)

and

n

5.5 APPLICATIONS 1 1 43

j=1

for each w in Q\(Z 1 U Z2 ), we conclude that

SN(t,W)(w)(10)

tli~N(t w)

= m a.e.

By the definition of NQ, w), the numerator on the left side should be close tot ; this will be confirmed and strengthened in the following theorem .

N(t)

1lim = - a.e .t-* oo t

m

IN(t)} _ 1lim

-,t->oo

t

m

both being true even if m = +oo, provided we take 1 /m to be 0 in that case .

PROOF . It follows from (6) that for every w :

SN(t,w) (w) -< t < SN(t,,)+1 (w)

and consequently, as soon as t is large enough to make N(t, w) > 0,

SN(t, w ) (w) <t < SN(t,w)+1 (w) N(t, (o) + 1

N(t, w) - N(t, (o)

N(t, w) + 1 N(t, w)

Letting t --~ oo and using (8) and (10) (together with Exercise 1 of Sec . 5 .4in case m = +o0), we conclude that (11) is true .

Page 158: A Course in Probability Theory KL Chung

1 44 1 LAW OF LARGE NUMBERS. RANDOM SERIES

The deduction of (12) from (11) is more tricky than might have beenthought. Since X„ is not zero a.e ., there exists 8 > 0 such that

Yn:A{X„>8)=p>0.

DefineX, (0)) _

f

8,

if X„ (w) > 8 ;0,

if X„ (w) < 8 ;

and let S;, and N' (t) be the corresponding quantities for the sequence {X ;, , n >11 . It is obvious that Sn < S, and N'(t) > N(t) for each t . Since the r .v.'s{X;, /8} are independent with a Bernoullian distribution, elementary computa-tions (see Exercise 7 below) show that

ft 2

{N'(t)2 } = 0 (g2

as t -). oo.

Hence we have, 8 being fixed,

~N(t)) 2 <f

(Nr~t)) 2= O(1) .

Since (11) implies the convergence of N(t)/t in distribution to 8 1/,,,, an appli-cation of Theorem 4.5 .2 with X, = N(n)/n and p = 2 yields (12) with treplaced by n in (12), from which (12) itself follows at once .

Corollary. For each t, f{N(t)} < oo .

An interesting relation suggested by (10) is that f{SN(, ) } should be closeto m f{N(t)} when t is large . The precise result is as follows :

f {X 1 + . . . + XN(r)+1 } = f {X1) f{N(t) + 1 1

This is a striking generalization of the additivity of expectations when thenumber of terms as well as the summands is "random ." This follows from thefollowing more general result, known as "Wald's equation" .

Theorem 5.5.3. Let {X„ , n > 1 } be a sequence of independent and identi-cally distributed r .v.'s with finite mean . For k > 1 let A, 1 < k < oo, be theBorel field generated by {Xj , 1 < j < k} . Suppose that N is an r.v. takingpositive integer values such that

(13)

Vk>1 :{N<k)E',

and C '(N) < oc . Then we have

('(SN) = ("(X1)(°(N) .

Page 159: A Course in Probability Theory KL Chung

00(14) (r(SN) - f SNd~'=

Skd9'=

f Xj dTk=1 (N=k)

k=1 j=1 {N=k}

W

PROOF. Since So = 0 as usual, we have

Now the set {N < j - 1) and the r.v . X j are independent, hence the last writtenintegral is equal to (t(Xj ) 7{N < j - 1} . Substituting this into the above, weobtain

cc

00 00

= ~7EJ XjdOP=f

Xj dJTj=1 k=j {N k}

j=1 {N'j }

( (SN) =

(-F(Xj)z~-1{N > j} _ Y(X1)

9{N > j} = ((X1)cP(N),j=1

j=1

the last equation by the corollary to Theorem 3 .2 .1 .

It remains to justify the interchange of summations in (14), which isessential here . This is done by replacing X j with jX j I and obtaining as beforethat the repeated sum is equal to c(IX11)(-P(N) < cc.

We leave it to the reader to verify condition (13) for the r .v . N(t) + 1above. Such an r .v . will be called "optional" in Chapter 8 and will play animportant role there .

Our last example is a noted triumph of the ideas of probability theoryapplied to classical analysis . It is S . Bernstein's proof of Weierstrass' theoremon the approximation of continuous functions by polynomials .

Theorem 5.5.4 . Let f be a continuous function on [0, 1], and define theBernstein polynomials {p„) as follows :

pn (x) _

n

k=0

00

k

(--,- (Xj)-L

Xj dA .j=1

{N<_j-1}

00

k )()xin

f -

k(- x)n-kn

k

Then p, converges uniformly to f in [0, 1] .

PROOF. For each x, consider a sequence of independent Bernoullian r .v.'s{Xn , n > 1} with success probability x, namely:

1

with probability x,Y n = 0

with probability 1 - x;

5.5 APPLICATIONS 1 145

Page 160: A Course in Probability Theory KL Chung

146 1 LAW OF LARGE NUMBERS. RANDOM SERIES

and let S„ _ Ek"-1 X, as usual. We know from elementary probability theorythat

so that

Pn(x)=~~{f (SI )}.

We know from the law of large numbers that Sn /n -* x with probability one,but it is sufficient to have convergence in probability, which is Bernoulli'sweak law of large numbers . Since f is uniformly continuous in [0, 1], itfollows as in the proof of Theorem 4 .4 .5 that

l

Sn

-~ ,If (X),= f(x)f

l (n

We have therefore proved the convergence of pn (x) to f (x) for each x. Itremains to check the uniformity . Now we have for any 8 > 0:

{Sn = k} =

(16)

I Pn (x) - f (x)1 :5 LP f

f

211f NP

(1 _ x)n-k ,

Sn - xn

-f(x)

- f (x)

f (nn - f (x)

0<k<n,

n

where we have written 1 {Y; A} for fA Y d-JP . Given c > 0, there exists 8(E)such that

Ix - y I :!~ 8= I f (x) - f (y) I <- E/2 .

With this choice of 8 the last term in (16) is bounded by E/2 . The precedingterm is clearly bounded by

Now we have by Chebyshev's inequality, since o''(S 1,) = nx, a2 (Sn ) = nx(1 -x), and x(1 - x) < a for 0 < x < 1 :

This is nothing but Chebyshev's proof of Bernoulli's theorem . Hence if n >IIf II/8 2E, we get I p, (x) - f (x)I < E in (16) . This proves the uniformity .

Sn- -x >8 <

1a- 2 Sn nx(1 - x) < 1

n - 62 n 82n 2

- 482n

Page 161: A Course in Probability Theory KL Chung

5 .5 APPLICATIONS 1 1 47

One should remark not only on the lucidity of the above derivation butalso the meaningful construction of the approximating polynomials . Similarmethods can be used to establish a number of well-known analytical resultswith relative ease ; see Exercise 11 below for another example .

EXERCISES

{X„ } is a sequence of independent and identically distributed r.v .'s ;n

S,j=1

1. Show that equality can hold somewhere in (1) with strictly positiveprobability if and only if the discrete part of F does not vanish .

*2. Let Fn and F be as in Theorem 5 .5.1 ; then the distribution of

sup IFn (x, co) - F(x)-00<X<00

is the same for all continuous F . [HINT: Consider F(X), where X has thed.f. F.]

3. Find the distribution of Ynk , 1 < k < n, in (1) . [These r .v.'s are calledorder statistics .]

*4. Let Sn and N(t) be as in Theorem 5 .5 .2 . Show that00

e{N(t)} _

.G/}{Sn < t} .n=1

This remains true if X1 takes both positive and negative values .5. If 1(X1 ) > 0, then

00

lim ?~p U IS. < t] = 0 .t-* -00

n=1

*6. For each t > 0, define

v(t, w) = min{n : IS n (w)I > t}

if such an n exists, or +oo if not. If J'(X1 =A 0) > 0, then for every t > 0 andr > 0 we have

v(t) > n } < An for some k < 1 and all large n ; consequently{v(t)r} < oo. This implies the corollary of Theorem 5 .5 .2 without recourse

to the law of large numbers. [This is Charles Stein's theorem .]*7. Consider the special case of renewal where the r .v.'s are Bernoullian

taking the values 1 and 0 with probabilities p and 1 - p, where 0 < p < 1 .

Page 162: A Course in Probability Theory KL Chung

1 48 1 LAW OF LARGE NUMBERS. RANDOM SERIES

Find explicitly the d .f. of v(O) as defined in Exercise 6, and hence of v(t)for every t > 0 . Find [[{v(t)} and (, {v(t)2 ) . Relate v(t, w) to the N(t, w) inTheorem 5 .5.2 and so calculate (1{N(t)} and "{N(t)2}.

8. Theorem 5 .5.3 remains true if (r'(X1) is defined, possibly +00 or -oo .*9. In Exercise 7, find the d .f. of X„ ( , ) for a given t . (`'{X„(,)} is the mean

lifespan of the object living at the epoch t ; should it not be the same as C`{X 1 },the mean lifespan of the given species? [This is one of the best examples ofthe use or misuse of intuition in probability theory .]

10 . Let r be a positive integer-valued r.v. that is independent of the X, 's .Suppose that both r and X 1 have finite second moments, then

a2(St)= (qr)a2(X1)+u2 (r)(U(X1))2 .

Then

f (x) = -

(- 1)n-1 n nl

(n-1) (nnooo (n - 1)! C x / g

`x '

where g ( n -1) is the (n - 1)st derivative of g, uniformly in every finite interval .[xarrr : Let ) > 0, T {X 1 (a.) < t } = 1 - e - ~. . Then

L {f (Sn(1))) _ [(-1)n-1/(n _ l)!]A.ng(n-1) (.)

and S, (n /x) -a x in pr. This is a somewhat easier version of Widder's inver-sion formula for Laplace transforms .]

12. Let .{X 1 = k} = p k , 1 < k < f, ik-1 pk = 1 . Let N(n, w) be thenumber of values of j, 1 < j < n, for which X1 = k and

PH(

) = lj Pk (n w) .n, w

k=1

Prove that

* 11. Let f be continuous and belong to L ' (0, oo) for some r > 1, and00

g(A) _ f e-Xtf(t) dt .0

limIlog jj(n, w) exists a .e .

n +oc n

and find the limit . [This is from information theory .]

Bibliographical Note

Borel's theorem on normal numbers, as well as the Borel-Cantelli lemmaSecs . 4.2-4.3, is contained in

in

Page 163: A Course in Probability Theory KL Chung

5 .5 APPLICATIONS 1 149

Smile Borel, Sur les probabilites denombrables et leurs applications arithmetiques,Rend. Circ. Mat. Palermo 26 (1909), 247-271 .

This pioneering paper, despite some serious gaps (see Frechet [9] for comments),is well worth reading for its historical interest . In Borel's Jubile Selecta (Gauthier-Villars, 1940), it is followed by a commentary by Paul Levy with a bibliography onlater developments .

Every serious student of probability theory should read :

A. N . Kolmogoroff, Ober die Summen durch den Zufall bestimmten unabhangigerGrossen, Math. Annalen 99 (1928), 309-319 ; Bermerkungen, 102 (1929),484-488 .

This contains Theorems 5 .3.1 to 5 .3.3 as well as the original version of Theorem 5 .2 .3 .For all convergence questions regarding sums of independent r.v .'s, the deepest

study is given in Chapter 6 of Levy's book [11] . After three decades, this book remainsa source of inspiration .

Theorem 5 .5.2 is taken from

J. L. Doob, Renewal theory from the point of view ofprobability, Trans. Am. Math .Soc . 63 (1942), 422-438 .

Feller's book [13], both volumes, contains an introduction to renewal theory as wellas some of its latest developments .

Page 164: A Course in Probability Theory KL Chung

6 Characteristic function

6.1 General properties; convolutions

An important tool in the study of r.v.'s and their p .m.'s or d .f.'s is the char-acteristic function (ch.f.) . For any r.v . X with the p.m . 11. and d.f . F, this isdefined to be the function f on :T. 1 as follows, Vt E R1 :

t

c(eitX )

eitX(m) .1)v(dw)

eitx

00

(1) f O = (

irx_ f

=

µ(dx) =

e dF(x) .

The equality of the third and fourth terms above is a consequence ofTheorem 3 .32.2, while the rest is by definition and notation . We remind thereader that the last term in (1) is defined to be the one preceding it, where theone-to-one correspondence between µ and F is discussed in Sec . 2 .2. We shalluse both of them below. Let us also point out, since our general discussionof integrals has been confined to the real domain, that f is a complex-valuedfunction of the real variable t, whose real and imaginary parts are givenrespectively by

R f (t) = f cosxtic(dx), I f (t) = f sin xtµ(dx) .

Page 165: A Course in Probability Theory KL Chung

6.1 GENERAL PROPERTIES; CONVOLUTIONS 1 1 5 1

Here and hereafter, integrals without an indicated domain of integration aretaken over W. 1 .

Clearly, the ch .f. is a creature associated with µ or F, not with X, butthe first equation in (1) will prove very convenient for us, see e .g. (iii) and (v)below. In analysis, the ch.f. is known as the Fourier-Stieltjes transform of µor F . It can also be defined over a wider class of µ or F and, furthermore,be considered as a function of a complex variable t, under certain conditionsthat ensure the existence of the integrals in (1) . This extension is important insome applications, but we will not need it here except in Theorem 6 .6.5 . Asspecified above, it is always well defined (for all real t) and has the followingsimple properties .

(i) `dt E iw 1

If WI < 1 = f (0), f (-t) = f (t),

where z denotes the conjugate complex of z .(ii) f is uniformly continuous in R1 .To see this, we write for real t and h:

f (t + h) - f (t) = f (ei(t+b)X - e itX )l-t(dx),

I f (t + h) - f (t)I < f Ieitx IIe n,X -l IA(dx) =

fIe''ix - 1I µ(dx) .

The last integrand is bounded by 2 and tends to 0 as h --* 0, for each x. Hence,the integral converges to 0 by bounded convergence . Since it does not involvet, the convergence is surely uniform with respect to t .

(iii) If we write fx for the ch .f. of X, then for any real numbers a andb, we have

fax+b(t)= fx (at)e't b

f-x(t) = fx(t) .

This is easily seen from the first equation in (1), for

t (eit(aX+b)) = t~(ei(ta)X e itb) = c (ei(ta)X) e itb

(iv) If If,, n > 1 } are ch .f.'s, X n > 0, T11=1 X,, = 1, then

00~~,nfnn=1

is a ch .f. Briefly: a convex combination of ch .f.'s is a ch .f .

Page 166: A Course in Probability Theory KL Chung

1 52 1 CHARACTERISTIC FUNCTION

For if {µ„ , n > 1 } are the corresponding p .m.'s, then En=1 ;~ n µn is a p .m .whose ch.f. is n=1 4 fn

(v) If {fj , 1 < j < n) are ch .f.'s, thenn

is a ch .f.

By Theorem 3.3 .4, there exist independent r .v.'s {Xj , 1 < j < n} withprobability distributions {µj, 1 < j < n}, where µ j is as in (iv) . Letting

Sn

or in the notation of (iii) :

(2)

(3)

and written as

j=1

we have by the corollary to Theorem 3 .3 .3 :

(fjeitxi

n

n(t(eitsn) _ s'

= 11S(eitxJ) =

11f j (t) ;

(j=1

j=1

j=1

n

fsn = J fx,fj=1

(For an extension to an infinite number of fj 's see Exercise 4 below .)The ch.f. of Sn being so neatly expressed in terms of the ch .f.'s of the

summands, we may wonder about the d .f. Of Sn . We need the followingdefinitions .

DEFINTT1oN . The convolution of two d .f.'s F 1 and F2 is defined to be thed.f. F such that

Vx E J.1 : F(x) =J700

F, (x - y) dF2(y),

F=F1 *F2 .

It is easy to verify that F is indeed a d .f. The other basic properties ofconvolution are consequences of the following theorem .

Theorem 6 .1.1 . Let X 1 and X2 be independent r .v .'s with d .f.'s F1 and F2 ,respectively . Then X1 + X2 has the d .f . F1 * F2 .

Page 167: A Course in Probability Theory KL Chung

6.1 GENERAL PROPERTIES ; CONVOLUTIONS 1 1 53

PROOF . We wish to show that

(4)

Vx: ,ql{Xi + X2 < x) = (F 1 * F2)(x) .

For this purpose we define a function f of (x1, x2 ) as follows, for fixed x :

_ 1,

if x1 + x2 < x ;f (x1,

x2)

0,

otherwise .

f is a Borel measurable function of two variables . By Theorem 3 .3 .3 andusing the notation of the second proof of Theorem 3 .3 .3, we have

f f(X1, X2)d?T = fff(xi,x2) 2 (dxi,dx2)µ

92

= f 112 (dx2)

f (XI, x2µl (dxl )

= f 1L2(dx2) f

µl (dxl )(-oo,x-X21

= f00 dF2(x2)F1(x - X2)-00

This reduces to (4) . The second equation above, evaluating the double integralby an iterated one, is an application of Fubini's theorem (see Sec . 3 .3) .

Corollary. The binary operation of convolution * is commutative and asso-ciative .

For the corresponding binary operation of addition of independent r .v.'shas these two properties .

DEFINITION . The convolution of two probability density functions P1 andP2 is defined to be the probability density function p such that

(5)

and written as

dx E R,' : P(x) ='COO

P1(x - Y)P2(), ) dy,

P=P1*P2 •

We leave it to the reader to verify that p is indeed a density, but we willspell out the following connection .

Theorem 6.1.2. The convolution of two absolutely continuous d .f.'s withdensities p1 and P2 is absolutely continuous with density P1 * P2 .

Page 168: A Course in Probability Theory KL Chung

1 54 1 CHARACTERISTIC FUNCTION

PROOF . We have by Fubini's theorem:

J

X

X

00

fp(u) du =

fduf P1 (U - v)p2(v) dv

00

00

00

[TO0[f00 Pi_u]

(u v)dp2(v)dv00

00

~F1(x - v)p2(v) A

-oo

= fooF, (x - v) dF2(v) = (F1 * F'2)(x) •

00

This shows that p is a density of F1 * F2 .

What is the p.m., to be denoted by Al * µ2, that corresponds to F 1 * F2?For arbitrary subsets A and B of R 1 , we denote their vector sum and differenceby A + B and A - B, respectively :

(6)

A±B={x±y:xEA,yEB} ;

and write x ± B for {x} ± B, -B for 0 - B. There should be no danger ofconfusing A - B with A\B.

Theorem 6.1.3 . For each B E .A, we have

(7)

(A1 * /x2)(B) = f~~ AIM - Y)A2(dY) .

For each Borel measurable function g that is integrable with respect to µ1 *,a2,we have

(8)

f g(u)(A] * 92)(du) = f fg(x + Y)µ] (dx)/i2(dy) .

PROOF . It is easy to verify that the set function (Al * ADO defined by(7) is a p.m . To show that its d .f. is F1 * F2, we need only verify that its valuefor B = (- oo, x] is given by the F(x) defined in (3) . This is obvious, sincethe right side of (7) then becomes

f F, (x - Y)A2(dy) = f F, (x - Y) dF2(Y)-

f

Now let g be the indicator of the set B, then for each y, the function g y definedby gy (x) = g(x + y) is the indicator of the set B - y. Hence

g(x + Y)It1(dx) = /t, (B - y)I

Page 169: A Course in Probability Theory KL Chung

6.1 GENERAL PROPERTIES ; CONVOLUTIONS 1 155

and, substituting into the right side of (8), we see that it reduces to (7) in thiscase. The general case is proved in the usual way by first considering simplefunctions g and then passing to the limit for integrable functions .

As an instructive example, let us calculate the ch .f. of the convolution/11 * g2 . We have by (8)

f eitu (A] * g2)(du) = feityeuXgl(dx)g2(dy)

= f eirzg1(dx) f eityg2(dy) .

This is as it should be by (v), since the first term above is the ch .f. of X + Y,where X and Y are independent with it, and g2 as p.m.'s . Let us restate theresults, after an obvious induction, as follows .

Theorem 6.1.4 . Addition of (a finite number of) independent r .v.'s corre-sponds to convolution of their d .f.'s and multiplication of their ch .f.'s .

Corollary. If f is a ch .f., then so is I

f 12 .

To prove the corollary, let X have the ch .f . f. Then there exists on someQ (why?) an r .v. Y independent of X and having the same d.f., and so alsothe same ch.f . f . The ch .f. of X - Y is

(eit(X-Y) ) = cP(eitX)cr(e-itY) = f (t)f (-t) = I f (t) 12 .

The technique of considering X - Y and I f 12 instead of X and f willbe used below and referred to as "symmetrization" (see the end of Sec . 6 .2) .This is often expedient, since a real and particularly a positive-valued ch.f .such as If (2 is easier to handle than a general one .

Let us list a few well-known ch .f.'s together with their d .f.'s or p .d.'s(probability densities), the last being given in the interval outside of whichthey vanish .

(1) Point mass at a :d.f . 8Q ;

ch.f. e`a t .

(2) Symmetric Bernoullian distribution with mass 1 each at +1 and -1 :

d.f. 1 (8 1 +6- 1 ) ;

ch.f. cost .

(3) Bernoullian distribution with "success probability" p, and q = I - p:

d.f . q8o + p8 1 ; ch.f. q + pe't = 1 + p(e' t - 1) .

Page 170: A Course in Probability Theory KL Chung

1 56 1 CHARACTERISTIC FUNCTION

(4) Binomial distribution for n trials with success probability p:n

d.f . Y:(n

pkgn-k8k ; ch .f. (q + pe it )n

k=o(5) Geometric distribution with success probability p :

00d.f.

qn p6n ; ch .f. p(l - qe` t )-1

n =O

(6) Poisson distribution with (mean) parameter ). :

(10) Reciprocal of (9) :

d.f.00

n=0

~. nSn ;

ch. f. e" e 1)n .

(7) Exponential distribution with mean ),-1 :

p .d . ke-~ in [0, oc) ; ch.f. (1 - A-1 it)-1 .

(8) Uniform distribution in [-a, +a] :

p.d .I

in [-a, a] ; ch.f .sin

t (- 1 for t = 0) .2a

at(9) Triangular distribution in [-a, a] :

a - Ixl .

2(1 - cosat)ch .f.p.d .

a2

in [-a, a] ;a2 t 2 at

2 /

1 -cosaxi

/

Itlp.d .

n (-oo, oo) ; ch.f. f l - - v 0.7rax2

a

(11) Normal distribution N(m, a2 ) with mean m and variance a2

1

(x-m)2p.d . exp -

in (-oc, oo) ;1/27ra

2Q2

ch .f. exp Cimt -Q 2t2

2Unit normal distribution N(0, 1) = c with mean 0 and variance 1 :

p.d .

1 e -x2 /2 in (-oc, oc) ;

ch.f. e -t2 /2 .1/2Tr

(12) Cauchy distribution with parameter a > 0 :

p .d .

a 7r(a2 + x2)in (-oc, oc) ;

ch.f. e-'I t l .

Page 171: A Course in Probability Theory KL Chung

00

f nsk) (x - y)f (Y)dY

6.1 GENERAL PROPERTIES; CONVOLUTIONS 1 157

Convolution is a smoothing operation widely used in mathematical anal-ysis, for instance in the proof of Theorem 6.5 .2 below . Convolution with thenormal kernel is particularly effective, as illustrated below .

Let ns be the density of the normal distribution N(0, 82 ), namely

1 2ns ( -

= 270.exp

x282)

, -00 < x < 00 .

For any bounded measurable function f on 'V, put

00

"0(9) fs(x) = (f * ns)(x) = f f (x - Y)ns(Y)dY = f ns(x - Y)f (Y)dy .00

It will be seen below that the integrals above converge. Let CB denote theclass of functions on R1 which have bounded derivatives of all orders ; CUthe class of bounded and uniformly continuous functions on R 1 .

Theorem 6.1.5 . For each 8 > 0, we have f s E COO . Furthermore if f E CU ,then fs --> f uniformly in R 1 .

PROOF. It is easily verified that ns E CB . Moreover its kth derivative n3(k)

is dominated by Ck.an2s where ck, s is a constant depending only on k and 8 sothat

CS_ ck,sIIfII

n2S(x-Y)dy=Ck,sIIfII-

Thus the first assertion follows by differentiation under the integral of the lastterm in (9), which is justified by elementary rules of calculus . The secondassertion is proved by standard estimation as follows, for any 77 > 0:

00If (x)-fs(x)I < f If(x)-f(x-Y)Ins(Y)dy

< sup If(x)-f(x-Y)I +2IIf1I

n6(Y)dy.1),I-<q

fI>'I>77

Here is the probability idea involved. If f is integrable over 0, we maythink of f as the density of a r .v . X, and n8 as that of an independent normalr.v . Ys. Then fs is the density of X + Ys by Theorem 6 .1 .2 . As 8 4 0, Ysconverges to 0 in probability and so X + Ys converges to X likewise, hencealso in distribution by Theorem 4 .4.5. This makes it plausible that the densitieswill also converge under certain analytical conditions .

As a corollary, we have shown that the class CB is dense in the class Cuwith respect to the uniform topology on i~' 1 . This is a basic result in the theory

Page 172: A Course in Probability Theory KL Chung

158 1 CHARACTERISTIC FUNCTION

of "generalized functions" or "Schwartz distributions" . By way of application,we state the following strengthening of Theorem 4 .4.1 .

Theorem 6.1.6 . If {li ,1 } and A are s .p.m.'s such that

b f E C,', : f f (x)l- ,, (dx) -~ f f (x)Iu (dx),

then lµ„ -~µ.

This is an immediate consequence of Theorem 4.4.1, and Theorem 6 .1 .5,if we observe that Co C CU . The reduction of the class of "test functions"from Co to CB is often expedient, as in Lindeberg's method for proving centrallimit theorems ; see Sec . 7.1 below .

EXERCISES

1 . . If f is a ch .f., and G a d.f. with G(0-) = 0, then the followingfunctions are all ch .f.'s :

f f (ut) du, f00

00

f (ut)e -u du, fe_I t dG(u),lu0

0f o" e_t-u

dG(u) f00

f (ut) dG(u) .0

o

*2. Let f (u, t) be a function on (-oo, oo) x (-oo, oo) such that for eachu, f (u, •) is a ch .f. and for each t, f ( •, t) is a continuous function ; then

foo

f (u, t) dG(u)cc

is a ch .f. for any d .f. G . In particular, if f is a ch .f. such that lim t, 00 f (t)exists and G a d .f. with G(0-) = 0, then

fo

°O

tf - dG(u) is a ch .f.

u

3. Find the d .f. with the following ch .f.'s (a > 0, ,6 > 0) :

a2

1

1a2 + t2 ' (1 - ait)V (1 + of - ape" ) 1 /

[HINT : The second and third steps correspond respectively to the gamma andPolya distributions .]

Page 173: A Course in Probability Theory KL Chung

6 .1 GENERAL PROPERTIES ; CONVOLUTIONS 1 159

4. Let S1 , be as in (v) and suppose that S, --+ S,,, in pr. Prove that00flfj(t)j=1

converges in the sense of infinite product for each t and is the ch .f. of Sam .5. If F 1 and F2 are d.f.'s such that

F1 =j

and F2 has density p, show that F 1 * F2 has a density and find it .*6. Prove that the convolution of two discrete d .f.'s is discrete ; that of a

continuous d .f. with any d.f. is continuous ; that of an absolutely continuousd.f. with any d .f. is absolutely continuous .

7. The convolution of two discrete distributions with exactly m and natoms, respectively, has at least m + n - I and at most mn atoms .

8. Show that the family of normal (Cauchy, Poisson) distributions isclosed with respect to convolution in the sense that the convolution of anytwo in the family with arbitrary parameters is another in the family with someparameter(s) .

9. Find the nth iterated convolution of an exponential distribution .*10. Let {Xj , j > 1} be a sequence of independent r.v.'s having the

common exponential distribution with mean 1/),, a, > 0. For given x > 0 letv be the maximum of n such that Sn < x, where So = 0, S, = En=1 Xj asusual . Prove that the r.v. v has the Poisson distribution with mean ).x . SeeSec . 5.5 for an interpretation by renewal theory .

11. Let X have the normal distribution (D . Find the d .f., p .d ., and ch.f.of X2 .

12. Let {X j, 1 < j < n} be independent r .v.'s each having the d .f. (D .Find the ch .f. of

b- i

and show that the corresponding p.d . is 2-n 12 F(n/2) -l x (n l2)-l e -x"2 in (0, oc) .This is called in statistics the "X 2 distribution with n degrees of freedom" .

13. For any ch .f. f we have for every t :

R[1 - f (t)] > 4R[1 - f (2t)] .

14. Find an example of two r .v .'s X and Y with the same p .m. µ that arenot independent but such that X + Y has the p .m . µ * It . [HINT : Take X = Yand use ch.f.]

Page 174: A Course in Probability Theory KL Chung

160

1 CHARACTERISTIC FUNCTION

*15. For a d.f. F and h > 0, define

QF(h) = sup[F(x + h) - F(x-)] ;x

QF is called the Levy concentration function of F . Prove that the sup aboveis attained, and if G is also a d .f., we have

Vh > 0: QF•G

(h) -< QF (h) A QG (h)

.

16. If 0 < hA < 2n, then there is an absolute constant A such that

QF(h) < - f if (t)I dt,

where f is the ch.f. of F. [HINT: Use Exercise 2 of Sec . 6.2 below .]17. Let F be a symmetric d.f., with ch .f. f > 0 then

f

DO h2 h +

2 dF(x) = h 00 e-ht f(t) dtoo

x

0(Dr (h) _

is a sort of average concentration function . Prove that if G is also a d .f. withch.f. g > 0, then we have Vh > 0 :

(PF*G(h) < cOF(h) A ip0(h) ;

1 - cPF*G(h) < [1 - coF(h)] + [1 - cPG(h)] .

*18. Let the support of the p.m. a on ~RI be denoted by supp p . Provethat

supp (j.c * v) = closure of supp . + supp v ;

supp (µ i * µ2 * . . . ) = closure of (supp A l + supp / 2 + • )

where "+" denotes vector sum .

6.2 Uniqueness and inversion

To study the deeper properties of Fourier-Stieltjes transforms, we shall needcertain "Dirichlet integrals" . We begin with three basic formulas, where "sgna" denotes 1, 0 or -1, according as a > 0, = 0, or < 0 .

ry sin ax(1)

dy>0:0<(sgna)J dx</ " sin x A .o

x

0 X°° sin ax

7r(2) A = 2 sgn a .

0

x

Page 175: A Course in Probability Theory KL Chung

(3)

0°° 1 - os ax

A = 2 Icel .

The substitution ax = u shows at once that it is sufficient to prove all threeformulas for a = 1 . The inequality (1) is proved by partitioning the interval[0, oo) with positive multiples of Jr so as to convert the integral into a seriesof alternating signs and decreasing moduli. The integral in (2) is a standardexercise in contour integration, as is also that in (3) . However, we shall indicatethe following neat heuristic calculations, leaving the justifications, which arenot difficult, as exercises .

r°0 sin x

f°O

f°°

f°0 r°°o

[ J

x dx= J

sin xI

e-XU du dx= J

Je-xu sin x dx du

f00 du n1

u2

2 '

6.2 UNIQUENESS AND INVERSION 1 161

°0 1 2osxdx= 00

x2X sin udu dx =

00sinu 00 dx

du10

0

o

o

U

~O° sin u

n

J du = -.

o

u

2

We are ready to answer the question : given a ch .f. f, how can we findthe corresponding d .f. F or p.m . µ? The formula for doing this, called theinversion formula, is of theoretical importance, since it will establish a one-to-one correspondence between the class of d .f.'s or p .m .'s and the class ofch .f.'s (see, however, Exercise 12 below) . It is somewhat complicated in itsmost general form, but special cases or variants of it can actually be employedto derive certain properties of a d .f. or p .m. from its ch .f. ; see, e.g., (14) and(15) of Sec. 6 .4 .

Theorem 6.2.1 . If x1 < x2, then we have

(4 )

A((X1, X2)) + 2-~({X1 }) + 2/~({x2})

1

T e-irx1 - e-i'x2= Too 2ir

I T itf (t) dt

(the integrand being defined by continuity at t = 0) .

PROOF . Observe first that the integrand above is bounded by Ixl - x21everywhere and is O(ItI -1 ) as It! --+ oo ; yet we cannot assert that the "infiniteintegral" f0 exists (in the Lebesgue sense). Indeed, it does not in general(see Exercise 9 below) . The fact that the indicated limit, the so-called Cauchylimit, does exist is part of the assertion .

Page 176: A Course in Probability Theory KL Chung

1 62 1 CHARACTERISTIC FUNCTION

We shall prove (4) by actually substituting the definition of f into (4)and carrying out the integrations . We have

1

T e- l"' - e-itx2

00(5)

27r IT

it

f 00e' rx µ(dx) dt

~cc

T eit(x-x,) - eit(x-x2)00[~dt] µ(dx) .

T

27ritt

Leaving aside for the moment the justification of the interchange of the iteratedintegral, let us denote the quantity in square brackets above by I (T, x, x1, X2)-Trivial simplification yields

1 T sin t(x - x2)

t dt.

0

eit(x-x,) - eit(x-xz)it

Furthermore, I is bounded in T by (1) . Hence we may let T ---> oo under theintegral sign in the right member of (5) by bounded convergence, since

JI(T, x, x1, x2 )1< 2

sinxdx

7r 0

X

by (1) . The result is

xO+ Jx 2 +

i+ f 1 +

0}µ(dx)oo,

x ,z

x 2

(x2, 00

))))(

i)

{ ~ l

( i

2)

{ z?

= 2A({x1}) + µ((x1, x2)) + 2µ({x2}) .

This proves the theorem . For the justification mentioned above, we invokeFubini's theorem and observe that

xi,I

e-itu du

where the integral is taken along the real axis, andT

1? 1 IT

for x < x1,

for x = x1,

for xl < x < x2,for x = X2,

for x > x2 .

< IX l -x21,

Ixl - x2 I dt,u,(dx) < 2T Ix, -x21 < oo,

1 1 T sinI(T, x, x1, x2) =

- 0

It follows from (2) that

t(x - xl)dt -

1-

0

t

1

1_ 2 _(_ 2 )_1

10 - (- 2 ) = 21

1lim I (T, x, x1, x2) _ 2-(- 2)=1T-+oo 1

1- 0 -2

21

1 -02 -2=0

Page 177: A Course in Probability Theory KL Chung

6.2 UNIQUENESS AND INVERSION 1 163

so that the integrand on the right of (5) is dominated by a finitely integrablefunction with respect to the finite product measure dt - u(dx) on [-T, +T] x?)1 . This suffices .

Remark. If x1 and x2 are points of continuity of F, the left side of (4)is F(x2) - F(xl ) .

The following result is often referred to as the "uniqueness theorem" forthe "determining" p. or F (see also Exercise 12 below) .

Theorem 6.2.2 . If two p .m.'s or d .f.'s have the same ch .f., then they are thesame.

PROOF . If neither x 1 nor x2 is an atom of µ, the inversion formula (4)shows that the value of µ on the interval (x1, x2 ) is determined by its ch .f. Itfollows that two p .m.'s having the same ch .f. agree on each interval whoseendpoints are not atoms for either measure. Since each p.m. has only a count-able set of atoms, points of ?P,, 1 that are not atoms for either measure form adense set. Thus the two p .m.'s agree on a dense set of intervals, and thereforethey are identical by the corollary to Theorem 2 .2.3 .

We give next an important particular case of Theorem 6 .2.1 .

Theorem 6.2.3 . If f E L 1 (-oo, +oo), then F is continuously differentiable,and we have

00

(6)

F(x) =Zn

e- `X'f (t) dt .J 00

PROOF . Applying (4) for x2 = x and x1 = x - h with h > 0 and using Finstead of µ, we have

F(x) + F(x-) F(x - h) + F(x - h-)

I

O° e"~'-12

2

2n ,~ ~

it e-"X f (t) dt .

Here the infinite integral exists by the hypothesis on f, since the integrandabove is dominated by I h f (t)! . Hence we may let h --)- 0 under the integralsign by dominated convergence and conclude that the left side is 0 . Thus, Fis left continuous and so continuous in

Now we can write

F(x)-F(x-h)

1

°O e"h-1_ e-itx f(t) dt .h

27r _~ ith

The same argument as before shows that the limit exists as h -* 0. HenceF has a left-hand derivative at x equal to the right member of (6), the latterbeing clearly continuous [cf . Proposition (ii) of Sec . 6.1] . Similarly, F has a

Page 178: A Course in Probability Theory KL Chung

164 1 CHARACTERISTIC FUNCTION

right-hand derivative given by the same formula . Actually it is known for acontinuous function that, if one of the four "derivates" exists and is continuousat a point, then the function is continuously differentiable there (see, e .g .,Titchmarsh, The theory offunctions, 2nd ed., Oxford Univ. Press, New York,1939, p. 355) .

The derivative F' being continuous, we have (why?)

fF'(u)du .Vx: F(x) =

Thus F' is a probability density function . We may now state Theorem 6 .2 .3in a more symmetric form familiar in the theory of Fourier integrals .

Corollary. If f E LI , then p E L 1 , where

1

°°p(x) =

2n J e-txt f(t) dt,

and

00f (t) = f00 e1txpxdx .()

The next two theorems yield information on the atoms of µ by means off and are given here as illustrations of the method of "harmonic analysis" .

Theorem 6.2.4 . For each x0, we have

T(7)

lim 1

e-`txo f(t) dt= µ({xo}) .

T---).w 2T -T

PROOF . Proceeding as in the proof of Theorem 6 .2.1, we obtain for theintegral average on the left side of (7) :

(8)

sin T(x - x0)µ(dx) +

lµ(dx) .~~-{x } T(.,c- x0)

{xo)

The integrand of the first integral above is bounded by 1 and tends to 0 asT -). oc everywhere in the domain of integration ; hence the integral convergesto 0 by bounded convergence. The second term is simply the right memberof (7) .

Theorem 6.2.5 . We have

1

T(9)

T2T J-T I f (t)I 2 dt

XE_/? I

Page 179: A Course in Probability Theory KL Chung

~t

6.2 UNIQUENESS AND INVERSION 1 165

PROOF . Since the set of atoms is countable, all but a countable numberof terms in the sum above vanish, making the sum meaningful with a valuebounded by 1 . Formula (9) can be established directly in the manner of (5)and (7), but the following proof is more illuminating . As noted in the proofof the corollary to Theorem 6 .1 .4, 1 f

12 is the ch .f. of the r.v . X - Y there,whose distribution is g * g', where g'(B) = It (-B) for each B E A ApplyingTheorem 6 .2.4 with x0 = 0, we see that the left member of (9) is equal to

(A * g')({0}) .

By (7) of Sec. 6.1, the latter may be evaluated as

I g'({-y})g(dy) =

g({y})g({y}),yE9?'

since the integrand above is zero unless -y is an atom of g', which is thecase if and only if y is an atom of g. This gives the right member of (9) . Thereader may prefer to carry out the argument above using the r .v.'s X and -Yin the proof of the Corollary to Theorem 6 .1 .4 .

Corollary. g is atomless (F is continuous) if and only if the limit in the leftmember of (9) is zero .

This criterion is occasionally practicable .

DEFINITION . The r.v . X is called symmetric iff X and -X have the samedistribution .

For such an r.v ., the distribution g has the following property :

VB E .A : 11(B) = g(-B) .

Such a p.m. may be called symmetric ; an equivalent condition on its d .f. F isas follows :

Yx E RI : F(x) = 1 - F(-x-),

(the awkwardness of using d .f. being obvious here) .

Theorem 6.2.6. X or g is symmetric if and only if its ch .f. is real-valued(for all t) .

PROOF. If X and -X have the same distribution, they must "determine"the same ch.f. Hence, by (iii) of Sec. 6.1, we have

.f (t) = .f (t)

Page 180: A Course in Probability Theory KL Chung

1 66 1 CHARACTERISTIC FUNCTION

and so f is real-valued. Conversely, if f is real-valued, the same argumentshows that X and -X must have the same ch .f. Hence, they have the samedistribution by the uniqueness theorem (Theorem 6.2.2) .

EXERCISES

is the ch .f. of F below.1. Show that

J°O (sinx)2

7r x

dx = 2 .

f

*2. Show that for each T > 0 :1 / °O (1 -cos Tx) cos tx

00

X2dx = (T - ItI) v 0.

Deduce from this that for each T > 0, the function of t given by

(1_)vOItiT

is a ch .f. Next, show that as a particular case of Theorem 6 .2.3,

1 - cos Tx _ 1 fTX2

2J

T (T - It I)e` txdt .

Finally, derive the following particularly useful relation (a case of Parseval'srelation in Fourier analysis), for arbitrary a and T > 0 :

~'OO 1 - cos T(x - a)

1 1 fT

J

[T(x - a)]2 dF(x) = 2 Tz

JT (T - ItI)e-`t° f(t)dt.

*3. Prove that for each a > 0 :

Ja [F(x + u) - F(x - u)] du

-cosate_,tX f (t) dt .o

to

As a sort of reciprocal, we have

1

a

u

o0

/du / f (t) dt =

1 - cosax dF(x) .

0

u

x22

4. If f (t)/t c L1 (-oo, oo), then for each a > 0 such that ±a are pointsof continuity of F, we have

00

F(a) - F(-a) = 1

sin at f.(t)dt .

rr 0 t

Page 181: A Course in Probability Theory KL Chung

6.2 UNIQUENESS AND INVERSION 1 1 67

5. What is the special case of the inversion formula when f - 1? Deducealso the following special cases, where a > 0 :

1 f°O sin at sin tdt=aAl,

7r

x

t2

1

°O sin at (sin t)2

a2

-

t3

d t = a -4

for a < 2; 1 for a > 2 .-00

6. For each n > 0, we have

1

°O sin t n+2

j2 uJ",

t

dt =

f~pn (t)dtdu,

where Cp l = 2and tOn = cpn-1 * (PI for n > 2.*7. If F is absolutely continuous, then liml t l,, f (t) = 0 . Hence, if the

absolutely continuous part of F does not vanish, then lim t, I f (t)I < 1 . IfF is purely discontinuous, then lim, f (t) = 1 . [The first assertion is theRiemann-Lebesgue lemma ; prove it first when F has a density that is asimple function, then approximate . The third assertion is an easy part of theobservation that such an f is "almost periodic" .]

8. Prove that for 0 < r < 2 we have

f

00Ix I r dFx

W

00O= C(r) f 00

1ItR+I (t) dt

where°O 1 -cos a -1 F(r + 1) r7r

C(r) - Cf 00 IuIr+1 du

=

7r

sin 2 ,thus C (l) = 1/,7. [HINT :

°° 1 - cos xtIxIr = C(r)

r+1

dt.]-00

I t I

*9. Give a trivial example where the right member of (4)replaced by the Lebesgue integral

1

00

e-ttx1 - e-Itx2

27r J-c,,R

it

f (t) dt .

But it can always be replaced by the improper Rieniann integral :

1

T2 e-uxi - e-ux2R

f (t) dt .T~--x 27r Tj

itT,- +x

cannot be

Page 182: A Course in Probability Theory KL Chung

1 68 1 CHARACTERISTIC FUNCTION

10. Prove the following form of the inversion formula (due to Gil-Palaez) :

1

1

T eitxf(-t) - e-itx f (t)2 {F(x+) + F(x-)} = 2 + Jim dt.Trx a

27rit

[Hmrr: Use the method of proof of Theorem 6.2.1 rather than the result .]11. Theorem 6.2.3 has an analogue in L2 . If the ch .f. f of F belongs

to L2 , then F is absolutely continuous. [Hmrr : By Plancherel's theorem, thereexists ~0 E L2 such that

x

1

0o e-itx - 1cp(u) du = ~f (t) dt .

0

27r Jo

-it

Now use the inversion formula to show thatx

F(x) - F(O) _cp(u) du .]o2n

* 12. Prove Theorem 6.2 .2 by the Stone-Weierstrass theorem . [=: Cf.Theorem 6.6.2 below, but beware of the differences. Approximate uniformlyg l and 92 in the proof of Theorem 4.4.3 by a periodic function with "arbitrarilylarge" period .]

13. The uniqueness theorem holds as well for signed measures [or func-tions of bounded variations] . Precisely, if each i.ci, i = 1, 2, is the differenceof two finite measures such that

Vt :I

eitxA, (dx)= f

e itxµ2(dx),

then µl = /J,2-14. There is a deeper supplement to the inversion formula (4) or

Exercise 10 above, due to B . Rosen. Under the condition

(1 + log Ix 1) dF(x) < oo,00

the improper Riemann integral in Exercise 10 may be replaced by a Lebesgueintegral . [HINT : It is a matter of proving the existence of the latter . Since

00

fN

J~dF(y)

J

we have

sin(x - y)tt

00d t <f

dF(y){1 + log(1 +NIx - yl)} < oc,00

fdF(y) ~N sin(x t y)t dt = N dt 00

sin(x - y)tdF(y) .~00

Jo

Jo

j 00

Page 183: A Course in Probability Theory KL Chung

For fixed x, we have

dF(y)y¢x

f00 sin(x - y)tdt

N

t

+ Cz f

dF(y),<Ix-yl<1/N

both integrals on the right converging to 0 as N --> oo .]

6.3 Convergence theorems

For purposes of probability theory the most fundamental property of the ch .f.is given in the two propositions below, which will be referred to jointly as theconvergence theorem, due to P. Levy and H . Cramer. Many applications willbe given in this and the following chapter . We begin with the easier half.

Theorem 6.3.1 . Let {/.6n , 1 < n < oc) be p.m.'s on R.1 with ch.f.'s {fn , 1 <n < oo} . If An converges vaguely to µ ms , then fn converges to f , uniformlyin every finite interval . We shall write this symbolically as

< C1

dF(y)

Ix-yI>1/N Nix - y I

6.3 CONVERGENCE THEOREMS I 169

(1) V

uAn -~~C00 : f n-*f00 .

Furthermore, the family If„) is equicontinuous on A1 .

PROOF . Since ettx is a bounded continuous function on JT 1 , althoughcomplex-valued, Theorem 4 .4.2 applies to its real and imaginary parts andyields (1) at once, apart from the asserted uniformity . Now for every t and h,we have, as in (ii) of Sec . 6 .1 :

Ifn (t + h) - f n (t)I < f l e"' - 11µn (dx) < f

I hxJun (dx)Ix I <A

+ 2 J µn(dx) < IhIA+2 f g(dx)+EIx I>A

1xI >A

for any e > 0, suitable A and n > n0 (A, E) . The equicontinuity of { f „ } follows .This and the pointwise convergence f , 7 - fcc imply fn -4 f00 by a simplecompactness argument (the "3E argument") left to the reader .

Theorem 6.3.2 . Let {µn, 1 < n < oo} be p.m.'s on ?,,2 1 with ch.f .'s {fn , 1 <n < co} . Suppose that

(a) fn converges everywhere in a~1 and defines the limit function f,,, ;(b) f00 is continuous at t = 0 .

Page 184: A Course in Probability Theory KL Chung

170 1 CHARACTERISTIC FUNCTION

Then we have

(a)

where µ,, is a p.m . ;(f) f,, is the ch .f. of µ ms .

PROOF . Let us first relax the conditions (a) and (b) to require only conver-gence of f„ in a neighborhood (-80, 8) of t = 0 and the continuity of thelimit function f (defined only in this neighborhood) at t = 0 . We shall provethat any vaguely convergent subsequence of {µ„ } converges to a p .m . µ . Forthis we use the following lemma, which illustrates a useful technique forobtaining estimates on a p .m . from its ch .f.

Lemma. For each A > 0, we have

A - '(2)

µ([-2A, 2A]) > A - f (t) dt - 1 .A- ~

PROOF OF THE LEMMA . By (8) of Sec . 6 .2, we have

(3)

~T T f (t) dt

sin Txµ(dx) .

T

J -oo

Since the integrand on the right side is bounded by 1 for all x (it is defined to be1 at x = 0), and by ITx1 -1 < (2TA)-1 for Ix I > 2A, the integral is bounded by

1/1([-2A, 2A]) + 2TA {1 -,u([-2A, 2A]))

_ (1 - 2TA) /L([-2A, 2A]) + 2TA'

Putting T = A-1 in (3), we obtain

A A 12 f-A f (t) dt

(4)1

a

28 fs f„(t)dtwhich reduces to (2). The lemma is proved .

Now for each 8, 0 < 8 < 80, we have

1

s

28 f sf (t) dt

1

1< 2/u ([-2A, 2A]) + 2

1

s

28 fa If,, (t) - f (t)I dt .

The first term on the right side tends to I as 8 .4, 0, since f (0) = 1 and fis continuous at 0 ; for fixed 8 the second term tends to 0 as n -* oc, bybounded convergence since If,, - f I < 2. It follows that for any given c > 0,

Page 185: A Course in Probability Theory KL Chung

6.3 CONVERGENCE THEOREMS 1 171

there exist 8 = 8(E) < 8o and n o = no (E) such that if n > n o , then the leftmember of (4) has a value not less than 1 - E. Hence by (2)

(5)

µn ([-28-1 , 28-1 ]) > 2(1 - E) - 1 > 1 - 2E .

Let {/i, ) be a vaguely convergent subsequence of {,u, ), which alwaysexists by Theorem 4 .3 .3 ; and let the vague limit be a, which is always ans.p.m. For each 8 satisfying the conditions above, and such that neither -28 -1nor 28-1 is an atom of µ, we have by the property of vague convergenceand (5) :

/t( 1 ) ? /t([-23 -1 , 25-1 1)

lim µn ([-28-1 , 28-1 ]) > 1 - 2E.n-+oo

Since c is arbitrary, we conclude that µ is a p .m., as was to be shown .Let f be the ch .f. of a . Then, by the preceding theorem, f,, --* f every-

where; hence under the original hypothesis (a) we have f = f00. Thus everyvague limit ,u considered above has the same ch .f. and therefore by the unique-ness theorem is the same p .m. Rename it µ00 so that u00 is the p.m. havingthe ch .f. f,,, . Then by Theorem 4 .3 .4 we have µn-'* ,u00 . Both assertions (a)and (,8) are proved .

As a particular case of the above theorem : if (An, 1 < n < oo} and{fn, 1 < n < oc) are corresponding p .m.'s and ch .f.'s, then the converse of(1) is also true, namely :

(6)

An --+ /too '4#' fn --> fcc •

This is an elegant statement, but it lacks the full strength of Theorem 6 .3 .2,which lies in concluding that f ,,z) is a ch.f. from more easily verifiable condi-tions, rather than assuming it . We shall see presently how important this is .

Let us examine some cases of inapplicability of Theorems 6 .3 .1 and 6 .3 .2 .

Example 1 . Let µ„ have mass '-z at 0 and mass 2 at n . Then µ„ --->. /-µx, where it,,,has mass ; at 0 and is not a p .m . We have

1

1

intf„ (t) = Z + 7i e

which does not converge as n -* oo, except when t is equal to a multiple of 27r .

Example 2. Let µ„ be the uniform distribution [-n, n] . Then µ„ --* µ,x , where µ,,,is identically zero . We have

'fn(t) =

1,

if t

0;

if t=0;

Page 186: A Course in Probability Theory KL Chung

172 1 CHARACTERISTIC FUNCTION

and

f » (t) -~ f (t) = ~ 0,

if t = 0 ;1,

if t=0.

Thus, condition (a) is satisfied but (b) is not .

Later we shall see that (a) cannot be relaxed to read : f , ( t) converges inI t I < T for some fixed T (Exercise 9 of Sec . 6.5) .

The convergence theorem above settles the question of vague conver-gence of p .m.'s to a p .m. What about just vague convergence without restric-tion on the limit? Recalling Theorem 4 .4.3, this suggests first that we replacethe integrand euTX in the ch .f. f by a function in Co . Secondly, going over thelast part of the proof of Theorem 6 .3 .2, we see that the choice should be madeso as to determine uniquely an s .p .m. (see Sec . 4 .3) . Now the Fourier-Stieltjestransform of an s.p.m. is well defined and the inversion formula remains valid,so that there is unique correspondence just as in the case of a p .m. Thus anatural choice of g is given by an "indefinite integral" of a ch .f., as follows :

(7)

g(u) = fu[f e`rXµ(dx) dt

= fe' - I µ(dx).

0

I

ix

Let us call g the integrated characteristic function of the s.p.m . µ. We arethus led to the following companion of (6), the details of the proof being leftas an exercise .

Theorem 6.3.3. A sequence of s.p .m.'s {,cc,,, 1 < n < oo} converges (to p 0 )if and only if the corresponding sequence of integrated ch .f.'s {g, 1 } converges(to the integrated ch .f. of µ,,) .

Another question concerning (6) arises naturally . We know that vagueconvergence for p.m.'s is metric (Exercise 9 of Sec . 4.4) ; let the metric bedenoted by Uniform convergence on compacts (viz ., in finite intervals)for uniformly bounded subsets of C B (:% 1) is also metric, with the metricdenoted by ( •, •) 2 , defined as follows :

(f, g)2 = supIf (t) -g(t)1

tE'4l

1 + t

It is easy to verify that this is a metric on CB and that convergence in this metricis equivalent to uniform convergence on compacts ; clearly the denominatorI + t2 may be replaced by any function continuous on Mi l , bounded belowby a strictly positive constant, and tending to +oo as ItI -- oo . Since there isa one-to-one correspondence between ch .f.'s and p .rn.'s, we may transfer the

Page 187: A Course in Probability Theory KL Chung

metric ( •, .) 2 to the latter by setting

(µ, v)2 = (f u , fl,)2

in obvious notation. Now the relation (6) may be restated as follows .

Theorem 6.3 .4 . The topologies induced by the two metrics ( ) 1 and ( ) 2 onthe space of p.m.'s on Z1 are equivalent .

This means that for each u and given c > 0, there exists S(1.t, c) such that :

(u, v)1 :!S S(µ, E) = (A, v)2 < E,

(ii, v)2 < S(µ, E)

(A, v)1 < E .

Theorem 6 .3.4 needs no new proof, since it is merely a paraphrasing of (6) innew words . However, it is important to notice the dependence of S on µ (aswell as c) above. The sharper statement without this dependence, which wouldmean the equivalence of the uniform structures induced by the two metrics,is false with a vengeance ; see Exercises 10 and 11 below (Exercises 3 and 4are also relevant) .

EXERCISES

1 . Prove the uniform convergence of fn in Theorem 6 .3 .1 by an inte-gration by parts of f e` tX d F, (x) .

*2. Instead of using the Lemma in the second part of Theorem 6 .3 .2,prove that µ is a p.m. by integrating the inversion formula, as in Exercise 3of Sec. 6 .2. (Integration is a smoothing operation and a standard technique intaming improper integrals : cf. the proof of the second part of Theorem 6 .5.2below .)

3. Let F be a given absolutely continuous d .f. and let F„ be a sequenceof step functions with equally spaced steps that converge to F uniformly inJT 1 . Show that for the corresponding ch .f.'s we have

Vn : sup I f (t) - f,,(t)I = 1 .tER 1

4. Let F,,, G,, be d.f.'s with ch .f.'s f, and g,, . If f„ - g„ -+ 0 a.e .,then for each f E CK we have f f d F, - f f d G, --- 0 (see Exercise 10of Sec . 4 .4) . This does not imply the Levy distance (F, G,), -+ 0; finda counterexample . [HINT : Use Exercise 3 of Sec . 6.2 and proceed as inTheorem 4.3 .4.]

5. Let F be a discrete d .f. with points of jump {aj , j > 1 } and sizes ofjump {bj , j > 11 . Consider the approximating s .d .f.'s F„ with the same jumpsbut restricted to j < n . Show that F,-'+'F .

6.3 CONVERGENCE THEOREMS 1 173

Page 188: A Course in Probability Theory KL Chung

174 1 CHARACTERISTIC FUNCTION

* 6. If the sequence of ch .f.'s {fn } converges uniformly in a neighborhoodof the origin, then if, } is equicontinuous, and there exists a subsequence thatconverges to a ch .f. [HINT : Use Ascoli-Arzela's theorem.]

7. If F„ - F and Gn -* G, then F,, * G n 4 F * G . [A proof of this simpleresult without the use of ch.f.'s would be tedious .]

*8. Interpret the remarkable trigonometric identity

sin t

°O

tt = H cos n

n=1

in terms of ch .f.'s, and hence by addition of independent r.v.'s . (This is anexample of Exercise 4 of Sec . 6.1 .)

9. Rewrite the preceding formula as

sin t

00

t

00

tt = 11 Cos 22k-1

H COs 22kk=1

k=1

Prove that either factor on the right is the ch .f. of a singular distribution . Thusthe convolution of two such may be absolutely continuous . [xiwr : Use thesame r.v.'s as for the Cantor distribution in Exercise 9 of Sec . 5.3 .]

10 . Using the strong law of large numbers, prove that the convolution oftwo Cantor d .f.'s is still singular . [Hmrr : Inspect the frequency of the digits inthe sum of the corresponding random series ; see Exercise 9 of Sec . 5 .3 .]

*11. Let Fn , G n be the d .f.'s of µ„ , v n , and fn , gn their ch .f.'s . Even ifsupxEg? , IF,, (x) - Gn (x) I - * 0, it does not follow that (f It, gn) 2 -+ 0; indeedit may happen that (f,,, g„)2 = 1 for every n . [HINT : Take two step functions"out of phase" .]

12. In the notation of Exercise 11, even if sup t.R , I f n (t) - g,, (t) I -* 0,it does not follow that (F,,, Gn ) --> 0 ; indeed it may - k 1 . [HINT : Let f be anych.f. vanishing outside (-1, 1), fj (t) = e-initf (mjt), g j (t) = e'n i t f (mj t), andFj , G j be the corresponding d.f .'s . Note that if mjn,. 1 0, then F j (x) -k 1,Gj (x) -a 0 for every x, and that fj - g j vanishes outside (-mi 1 , mi 1 ) andis 0(sin n j t) near t = 0. If mj = 2j2 and n j = jmj then Ej (f j - gj) is

uniformly bounded in t : for nk+1 < t < nk 1 consider j > k, j = k, j < k sepa-rately. Let

It

j=1

n*

-1gn = n

~gj,j=1

then sup I fn - gn I = O(n-1 ) while F ; - Gn --* 0. This example is due to

Katznelson, rivaling an older one due to Dyson, which is as follows . For

Page 189: A Course in Probability Theory KL Chung

b > a > 0, let

for x < 0 and = 1 for x > 0; G(x) = 1 - F(-x) . Then,

t [e-° l t l - e-' ]f (t) - g(t) _ -7ri ~tl

blog

-/a

If a is large, then (F, G) is near 1 . If b/a is large, then (f, g) is near 0.]

6.4 Simple applications

A common type of application of Theorem 6.3.2 depends on power-seriesexpansions of the ch .f., for which we need its derivatives .

Theorem 6 .4.1 . If the d .f. has a finite absolute moment of positive integralorder k, then its ch.f. has a bounded continuous derivative of order k given by

a

x2log

+ b2F(x) _ - (x2 +a2

2 a logb

6.4 SIMPLE APPLICATIONS I 175

(1) f (k) (t) = f(jxe1tx)k dF(x) .

Conversely, if f has a finite derivative of even order k at t = 0, then F hasa finite moment of order k .

PROOF . For k = 1, the first assertion follows from the formula :

f (t + h) - f (t)

00 ei(t+h)x - eitxdF(x) .

h

_00

h

An elementary inequality already used in the proof of Theorem 6 .2.1 showsthat the integrand above is dominated by txj . Hence if f Jxl dF(x) < oo, wemay let h 0 under the integral sign and obtain (1) . Uniform continuity ofthe integral as a function of t is proved as in (ii) of Sec . 6 .1 . The case of ageneral k follows easily by induction .

To prove the second assertion, let k = 2 and suppose that f "(0) existsand is finite . We have

f1/ (0)-hof(h)-2f(0)+f(-h)

h2

Page 190: A Course in Probability Theory KL Chung

176 1 CHARACTERISTIC FUNCTION

ihx _

e ihz= ;io e

h2 dF(x)

_

1 cos hx-21o

h2 dF(x).

As h -* 0, we have by Fatou's lemma,1 - cos hx

1 - cos hxfx 2 dF(x) = 2 f ho h

dF(x) < lim 2f

dF(x)2

h~O

h2

_ - f"(0).

Thus F has a finite second moment, and the validity of (1) for k = 2 nowfollows from the first assertion of the theorem .

The general case can again be reduced to this by induction, as follows .Suppose the second assertion of the theorem is true for 2k - 2, and thatf(2k) (0) is finite. Then f(2k-2) (t) exists and is continuous in the neighborhoodof t = 0, and by the induction hypothesis we have in particular

(- I )k-1 f x2k-2 dF(x) = f(2k-2) (0)

Put G(x) = fX" y2k-2 dF(y) for every x, then G(•)/G(oo) is a d.f. with the

(2)

Hence 1/r" exists, and by the case k = 2 proved above, we have

-V(2)(0)

oG(o) fx2dG(x) G(«~) fx2k dFx( )

Upon cancelling G(oo), we obtain

(_ 1)k f(2k) (O) = fx2k dF(x),

which proves the finiteness of the 2kth moment . The argument above failsif G(oc) = 0, but then we have (why?) F = 60 , f = 1, and the theorem istrivial .

Although the next theorem is an immediate corollary to the precedingone, it is so important as to deserve prominent mention .

Theorem 6.4.2 . If F has a finite absolute moment of order k, k an integer> 1, then f has the following expansion in the neighborhood of t = 0 :

k Jl ,n (i) t.i + o(I tI k ),I

I•

ch.f.t

( )1

= f e itxx2k-2 dF (x) = (-1)k-i f (2k-2) (t)G(oc) G(oo)

Page 191: A Course in Probability Theory KL Chung

(3')

f (t) =

k-1 j m (j) t i+ k~ A

(k) Itl k ;-j_0

where m (j ) is the moment of order j, / .t (k) is the absolute moment of order k,and l9kI<1 .

PROOF . According to a theorem in calculus (see, e.g ., Hardy [1], p . 290]),if f has a finite kth derivative at the point t = 0, then the Taylor expansionbelow is valid :

FU)(4)

f (t) =

'0) tj + o(Itlk) .j=0

J!

If f has a finite kth derivative in the neighborhood of t = 0, thenk-1 U) (0)

f (k)(4')

f (t) -

0)tj +iet) tk,

I9l < 1 .j=0

~ .

k .

Since the absolute moment of order j is finite for 1 < j < k, andf(j)(0) = ijm(j),

lf (k) (0t)I < 1_t (k)from (1), we obtain (3) from (4), and (3') from (4') .

It should be remarked that the form of Taylor expansion given in (4)is not always given in textbooks as cited above, but rather under strongerassumptions, such as "f has a finite kth derivative in the neighborhood of0" . [For even k this stronger condition is actually implied by the weaker onestated in the proof above, owing to Theorem 6.4 .1 .] The reader is advisedto learn the sharper result in calculus, which incidentally also yields a quickproof of the first equation in (2) . Observe that (3) implies (3') if the last termin (3') is replaced by the more ambiguous O(Itl k ), but not as it stands, sincethe constant in "0" may depend on the function f and not just on µ(k) .

By way of illustrating the power of the method of ch .f.'s without theencumbrance of technicalities, although anticipating more elaborate develop-ments in the next chapter, we shall apply at once the results above to prove twoclassical limit theorems : the weak law of large numbers (cf. Theorem 5 .2 .2),and the central limit theorem in the identically distributed and finite variancecase. We begin with an elementary lemma from calculus, stated here for thesake of clarity .

Lemma. If the complex numbers c 12 have the limit c, then

(5)

6.4 SIMPLE APPLICATIONS I 177

lim (1 +Cn)

n= e`

n-*oo

n

(For real c,,'s this remains valid for c = +oo .)

Page 192: A Course in Probability Theory KL Chung

178 1 CHARACTERISTIC FUNCTION

Now let {X„ , n > 1 } be a sequence of independent r .v .'s with the commond.f. F, and S, = En=1 X j , as in Chapter 5 .

Theorem 6.4.3 . If F has a finite mean m, then

S n-f m in pr .

n

PROOF. Since convergence to the constant m is equivalent to that in dist .to Sn, (Exercise 4 of Sec . 4.4), it is sufficient by Theorem 6 .3 .2 to prove thatthe ch .f. of S„ /n converges to etmt (which is continuous) . Now, by (2) ofSec. 6.1 we have

E(eit(Sn/n) ) = E(ei(t/n)Sn

f

t

n) =

-n

By Theorem 6.4 .2, the last term above may be written as

t

t1 ~-im-+ o

n

n

n

for fixed t and n --* oo. It follows from (5) that this converges to ell' asdesired .

Theorem 6.4.4. If F has mean m and finite variance a 2 > 0, then

Sn - mn-* (D in dist .

a In-

where c is the normal distribution with mean 0 and variance 1 .

PROOF . We may suppose m = 0 by considering the r.v.'s X j - m, whosesecond moment is Q2 . As in the preceding proof, we have

Sn

n

C,Cexp it= f

t

1+ i2a2 t2+ o

(ft,

)

(t2

2

n

-= j 1-

+ o t

--+ e r~/2

l

2n

n

The limit being the ch .f. of c, the proof is ended .

The convergence theorem for ch .f.'s may be used to complete the methodof moments in Theorem 4 .5 .5, yielding a result not very far from what ensuesfrom that theorem coupled with Carleman's condition mentioned there .

Page 193: A Course in Probability Theory KL Chung

Theorem 6 .4.5 . In the notation of Theorem 4 .5 .5, if (8) there holds togetherwith the following condition :

m (k) t k(6)

Vt E'W1 : lim

= 0,k-+ oo k !

then F n - 17). F.

PROOF . Let f n be the ch .f. of Fn . For fixed t and an odd k we have bythe Taylor expansion for e`tt with a remainder term :

f n (t) = f ee`tX dFn (x) _

k

(ltx)j +9

l itxl k+1

dFn(x)

j=0 J .

(k + 1)'.

(lt)jm (j) + emnk+l)tk+l

j=o Ji

n

(k +1)!

'

where 0 denotes a "generic" complex number of modulus <1, not necessarilythe same in different appearances . (The use of such a symbol, to be furtherindulged in the next chapter, avoids the necessity of transposing terms andtaking moduli of long expressions . It is extremely convenient, but one mustoccasionally watch the dependence of 9 on various quantities involved .) Itfollows that

j

k+l

(7)

fn(t)-f(t)=

(~i(m,(,i-)-m(~))+

(k+1)! (mnk+1)+m(k+1))+

j=0

Given E > 0, by condition (6) there exists an odd k = k(E) such that for thefixed t we have

(2m (k+l)+1 )tk+l

E(8)

(k + 1) !

< 2

Since we have fixed k, there exists n o = n 0 (E) such that if n > n 0 , then

and moreover,

j=

m(k+l) < m (k+l) + 1n

-

'

max I m (') - MW I < e-Itl E1<j<k

2

Then the right side of (7) will not exceed in modulus :k

ItI'e-Itl 6 + tk+l (2m(k+l) + 1) <

_ E .J!

2

(k+ 1)!

6.4 SIMPLE APPLICATIONS 1 179

Page 194: A Course in Probability Theory KL Chung

180 1 CHARACTERISTIC FUNCTION

Hence f „ (t) -+ f (t) for each t, and since f is a ch .f., the hypotheses ofTheorem 6 .3 .2 are satisfied and so F„ -4F.

As another kind of application of the convergence theorem in which alimiting process is implicit rather than explicit, let us prove the followingcharacterization of the normal distribution .

Theorem 6.4.6 . Let X and Y be independent, identically distributed r .v .'swith mean 0 and variance 1 . If X + Y and X - Y are independent then thecommon distribution of X and Y is (D .

PROOF . Let f be the ch.f., then by (1), f'(0) = 0, f"(0) = -1 . The ch.f.of X + Y is f (t)2 and that of X - Y is f (t) f (-t) . Since these two r .v .'s areindependent, the ch .f . f (2t) of their sum 2X must satisfy the following relation :

(9)

f (2t) = f (t) 3f ( -t) .

It follows from (9) that the function f never vanishes. For if it did at t0 , thenit would also at t0/2, and so by induction at t0 /2" for every n > 1 . This isimpossible, since limn,, f (t0/2n) = f (0) = 1 . Setting for every t :

Wp(t) = f

((t)'we obtain

(10)

p(2t) = p(t) 2 .

Hence we have by iteration, for each t :

t

t

/

2nP(t)=p

21

= 1-{ 0211 }

1Cby Theorem 6.4 .2 and the previous lemma . Thus p(t) - 1, f (t) - f (-t), and(9) becomes

(11)

f (2t) = f (t)4 .

Repeating the argument above with (11), we have"

t 4

1

t )2

t

42

f(t)=f 211

=

22n +O C21~ e

_t2/2

This proves the theorem without the use of "logarithms" (see Sec . 7.6) .

EXERCISES

*1. If f is the ch .f. of X, and

f(t)- 1 _Ez

lim

=

> - 00,r~0

t 2

2

Page 195: A Course in Probability Theory KL Chung

6.4 SIMPLE APPLICATIONS 1 181

then c` (X) = 0 and c`(X 2 ) = a2. In particular, if f (t) = 1 + 0(t 2 ) as t -+ 0,then f - 1 .

*2. Let {Xn } be independent, identically distributed with mean 0 and vari-ance a2 , 0 < a2 < oo . Prove that

ISn 1

S.+

2lira ` /~ = 2 lim c~

= -an-oo

v"

n--4oo

-,fn-

7i

[If we assume only A{X 1 0 0} > 0, c_~(jX11) < oo and €(X 1 ) = 0, then wehave c`(ISn 1) > C.,fn- for some constant C and all n ; this is known asHornich's inequality .] [Hu rr: In case a 2 = oo, if lim n (-r(1 S,/J) < oo, thenthere exists Ink} such that Sn / nk converges in distribution; use an extensionof Exercise 1 to show I f (t/ /,i)I2n -* 0 . This is due to P. Matthews .]

3. Let GJ'{X = k} = Pk, 1 < k < f < oc, Ek=1 pk = 1 . The sum Sn ofn independent r.v .'s having the same distribution as X is said to have amultinomial distribution . Define it explicitly. Prove that [Sn - c``(S n )]/a(Sn )converges to 4) in distribution as n -± oc, provided that a(X) > 0 .

*4. Let Xn have the binomial distribution with parameter (n, pn ), andsuppose that npn --+ a, > 0. Prove that Xn converges in dist. to the Poisson d.f.with parameter A . (In the old days this was called the law of small numbers .)

5. Let XA have the Poisson distribution with parameter A . Prove thatconverges in dist. to' as k --* oo .

*6. Prove that in Theorem 6 .4.4, Sn /6.,fn- does not converge in proba-bility . [HINT : Consider S,/a/ and Stn /o 2n .]

7. Let f be the ch .f. of the d .f. F . Suppose that as t ---> 0,

f (t) - 1 = O(ItI« )

where 0 < a < 2, then as A --> 00,

dF(x) = O(A-").IxI >A

[HINT : Integrate fxI>A (1 - cos tx) dF(x) < Ct" over t in (0, A) .]

8. If 0 < a < 1 and f IxI" dF(x) < oc, then f (t) - 1 = o(jtI') as t -+ 0 .For 1 < a < 2 the same result is true under the additional assumption thatf xdF(x) = 0 . [HIT: The case 1 < a < 2 is harder . Consider the real andimaginary parts of f (t) - 1 separately and write the latter as

sin tx dF(x) +J

sin tx dF(x) .IXI <E/l

IXI>E 1 ItI

Page 196: A Course in Probability Theory KL Chung

182 1 CHARACTERISTIC FUNCTION

The second is bounded by (Its/E)« f i, 1 ,11 , 1 Jxj« dF(x) = o(ltl«) for fixed c . Inthe first integral use sin tx = tx + O(ItxI 3 ),

txdF(x) = tJ

xdF(x),IXI<E/Itl

IXI>E/Iti

00(x) _ < E3-« f

.I

IXI<E/Ititx 3dF

00I tx I« dF(x)

9. Suppose that e- `I`la , where c > 0, 0 < a < 2, is a ch .f. (Theorem 6.5 .4below). Let {Xj , j > 11 be independent and identically distributed r.v.'s witha common ch .f. of the form

1 - $jti« + o0ti«)

as t -a 0. Determine the constants b and 9 so that the ch .f. of S, /bne convergesto e -ltl°

10. Suppose F satisfies the condition that for every q > 0 such that asA --± oo,

Then all moments of F are finite, and condition (6) in Theorem 6 .4.5 is satisfied .11 . Let X and Y be independent with the common d .f . F of mean 0 and

variance 1 . Suppose that (X + Y)1,12- also has the d .f . F. Then F =- 4) . [HINT :

Imitate Theorem 6 .4.5 .]*12. Let {Xj , j > 11 be independent, identically distributed r .v .'s with

mean 0 and variance 1 . Prove that both11 n

converges uniformly in t.]

fX>AdF(x) = O(e-"A ) .

EXj

,ln-j=1

n

and nj=1

converge in dist . t o (D . [Hint: Use the law of large numbers .]13 . The converse part of Theorem 6 .4 .1 is false for an odd k. Example .

F is a discrete symmetric d .f. with mass C/n 2 log n for integers n > 3, whereO is the appropriate constant, and k = 1 . [HINT : It is well known that the series

sin ntln og n

Page 197: A Course in Probability Theory KL Chung

6.4 SIMPLE APPLICATIONS 1 183

We end this section with a discussion of a special class of distributionsand their ch .f.'s that is of both theoretical and practical importance .

A distribution or p.m. on 31 is said to be of the lattice type iff itssupport is in an arithmetical progression -that is, it is completely atomic(i .e ., discrete) with all the atoms located at points of the form {a + jd}, wherea is real, d > 0, and j ranges over a certain nonempty set of integers . Thecorresponding lattice d.f. is of form :

00F(x) =

a+jd (x),j=-00

where pj > 0 and

pj = 1 . Its ch .f. is

00

(12)

f (t) = e ait

ejdit

j=-00

which is an absolutely convergent Fourier series . Note that the degenerate d .f .Sa with ch.f. e' is a particular case. We have the following characterization .

Theorem 6.4.7 . A ch.f. is that of a lattice distribution if and only if thereexists a to :A 0 such that I f (to ) I = 1 .

PROOF . The "only if" part is trivial, since it is obvious from (12) that I fis periodic of period 2rr/d. To prove the "if" part, let f (to ) = ei9°, where 00is real; then we have

1 = e-i00 f(to) = Iei(t°X-00)g (dx )

and consequently, taking real parts and transposing :

(13)

0 = f[1 - cos(tox - 00)]g(dx) .

The integrand is positive everywhere and vanishes if and only if for someinteger j,

xBO+>C to )

It follows that the support of g must be contained in the set of x of this formin order that equation (13) may hold, for the integral of a strictly positivefunction over a set of strictly positive measure is strictly positive . The theoremis therefore proved, with a = 00/t0 and d = 27r/to in the definition of a latticedistribution .

Page 198: A Course in Probability Theory KL Chung

184 1 CHARACTERISTIC FUNCTION

It should be observed that neither "a" nor "d" is uniquely determinedabove; for if we take, e .g ., a' = a + d' and d' a divisor of d, then the supportof z is also contained in the arithmetical progression {a' + jd'} . However,unless ,u is degenerate, there is a unique maximum d, which is called the"span" of the lattice distribution. The following corollary is easy .

Corollary. Unless I f I - 1, there is a smallest to > 0 such that f (to) = 1 .The span is then 27r/to.

Of particular interest is the "integer lattice" when a = 0, d = 1 ; that is,when the support of ,u is a set of integers at least two of which differ by 1 . Wehave seen many examples of these . The ch .f. f of such an r.v. X has period27r, and the following simple inversion formula holds, for each integer j:

(14)

9'(X = j) = Pj = - ~ f (t)e-Pu dt,I

7r

2,7 -

where the range of integration may be replaced by any interval of length2,7. This, of course, is nothing but the well-known formula for the "Fouriercoefficients" of f. If {Xk} is a sequence of independent, identically distributedr.v.'s with the ch .f. f, then S„ Xk has the ch.f. (f)', and the inversionformula above yields :

(15)

~'(S" = j) = 2n

[f (t)]"edt.-n

This may be used to advantage to obtain estimates for S, (see Exercises 24to 26 below) .

EXERCISES

f or f, is a ch .f. below .14 . If I f (t) I = 1, 1 f (t')I = 1 and t/t' is an irrational number, then f is

degenerate. If for a sequence {tk } of nonvanishing constants tending to 0 wehave I f (0 1 = 1, then f is degenerate .

*15 . If l f „ (t) l -)~ 1 for every t as n --* oo, and F" is the d .f. corre-sponding to f ", then there exist constants a,, such that F" (x + a" )48o . [HINT :

Symmetrize and take a,, to be a median of F" .]* 16. Suppose b,, > 0 and I f (b" t) I converges everywhere to a ch .f. that

is not identically 1, then b" converges to a finite and strictly positive limit .[HINT: Show that it is impossible that a subsequence of b" converges to 0 orto +oo, or that two subsequences converge to different finite limits .]

* 17 . Suppose c" is real and that e`'^`` converges to a limit for every tin a set of strictly positive Lebesgue measure . Then c" converges to a finite

Page 199: A Course in Probability Theory KL Chung

limit . [HINT : Proceed as in Exercise 16, and integrate over t . Beware of anyargument using "logarithms", as given in some textbooks, but see Exercise 12of Sec. 7 .6 later .]

* 18. Let f and g be two nondegenerate ch .f.'s. Suppose that there existreal constants a„ and bn > 0 such that for every t :

.f n (t) -* f (t) and eltan /b„ fn t -* g(t) .bn

Then an -- a, bn --* b, where a is finite, 0 < b < oo, and g(t) = eitalb f(t/b) .[HINT : Use Exercises 16 and 17 .]

*19. Reformulate Exercise 18 in terms of d .f.'s and deduce the followingconsequence . Let Fn be a sequence of d .f.'s an , a'n real constants, bn > 0,b;, > 0. If

Fn (bn x + an ) -v+ F(x) and

where F is a nondegenerate d.f., then

bn -~ 1 and anb

an ~ 0.n

n

[Two d.f.'s F and G such that G(x) = F(bx + a) for every x, where b > 0and a is real, are said to be of the same "type" . The two preceding exercisesdeal with the convergence of types .]

20. Show by using (14) that I cos tj is not a ch .f. Thus the modulus of ach.f. need not be a ch .f., although the squared modulus always is .

21. The span of an integer lattice distribution is the greatest commondivisor of the set of all differences between points of jump .

22. Let f (s, t) be the ch .f. of a 2-dimensional p.m. v. If I f (so , to) I = 1for some (so, t o ) :0 (0, 0), what can one say about the support of v?

*23. If {Xn } is a sequence of independent and identically distributed r .v.'s,then there does not exist a sequence of constants {c n } such that En (X„ - c„ )converges a .e ., unless the common d .f. is degenerate .In Exercises 24 to 26, let S n

Xj , where the X's are independent r.v.'swith a common d .f. F of the integer lattice type with span 1, and taking both>0 and <0 values .

*24. If f xdF(x) = 0, f x2 dF(x) = 62 , then for each integer j

n 112 jp-~1{& = j } 6 2n

[HINT : Proceed as in Theorem 6 .4.4, but use (15) .]

6.4 SIMPLE APPLICATIONS 1 185

Fn (bnx + an ) -4 F(x),

Page 200: A Course in Probability Theory KL Chung

186 1 CHARACTERISTIC FUNCTION

25. If F 0 60, then there exists a constant A such that for every j :

5{S, = j} < An -1/2 .

[HINT : Use a special case of Exercise 27 below .]26 . If F is symmetric and f Ix I dF(x) < oo, then

n-13111S, = j } --+ oc .

[HINT : 1 -f(t) =o(ItI) as t-+ 0.127. If f is any nondegenerate ch . f, then there exist constants A > 0

and 8 > 0 such that

f (t)I < 1 -At 2 for I t I < 8 .

[HINT: Reduce to the case where the d .f. has zero mean and finite variance bytranslating and truncating .]

28. Let Q,, be the concentration function of S, = En j-1 Xj , where theXD's are independent r.v.'s having a common nondegenerate d .f . F. Then forevery h > 0,

Qn (h) < An -1/2

[HINT: Use Exercise 27 above and Exercise 16 of Sec . 6.1 . This result is dueto Levy and Doeblin, but the proof is due to Rosen .]In Exercises 29 to 35, it or Ak is a p.m. on u41 = (0, 1] .

29. Define for each n :

fu(n) = fe2mnxA(dx).

9r

Prove by Weierstrass's approximation theorem (by trigonometrical polyno-mials) that if f A , (n) = f ,,, (n) for every n > 1, then µ 1 - µ2. The conclusionbecomes false if =?,l is replaced by [0, 1] .

30. Establish the inversion formula expressing µ in terms of the f µ (n )'s .Deduce again the uniqueness result in Exercise 29 . [HINT : Find the Fourierseries of the indicator function of an interval contained in W.]

31. Prove that I f µ (n) I = 1 if and only if . has its support in the set{eo + jn -1 , 0 < j < n - 1} for some 00 in (0, n -1 ] .

*32. ,u, is equidistributed on the set {jn -1 , 0 < j < n - 1 } if and only iffA(i) = 0 or 1 according to j t n or j I n .

* 33. itk- v µ if and only if f:-, k ( ) -,, f ( ) everywhere .34. Suppose that the space 9/ is replaced by its closure [0, 1 ] and the

two points 0 and I are identified ; in other words, suppose 9/ is regarded as the

Page 201: A Course in Probability Theory KL Chung

(2)

circumference

of a circle; then we can improve the result in Exercise 33as follows . If there exists a function g on the integers such that f , , A ( .) -+ g( .)

such that g = fµ and µk-µeve where, then there exists a p.m. µ onon 'l .

35. Random variables defined on are also referred to as defined"modulo 1" . The theory of addition of independent r .v .'s on 0 is somewhatsimpler than on 0, as exemplified by the following theorem . Let {Xj , j > 1}be independent and identically distributed r .v.'s on 0 and let Sk = Ek j=1 Xj .Then there are only two possibilities for the asymptotic distributions of Sk .Either there exist a constant c and an integer n > I such that Sk - kc convergesin dist. to the equidistribution on {jn -1 , 0 < j < n - 1} ; or Sk converges indist. to the uniform distribution on [HINT : Consider the possible limits of(f N, (n) )k as k -+ oo, for each n . ]

6.5 Representation theorems

A ch.f. is defined to be the Fourier- Stieltjes transform of a p .m. Can it be char-acterized by some other properties? This question arises because frequentlysuch a function presents itself in a natural fashion while the underlying measureis hidden and has to be recognized . Several answers are known, but thefollowing characterization, due to Bochner and Herglotz, is most useful . Itplays a basic role in harmonic analysis and in the theory of second-orderstationary processes .

A complex-valued function f defined on R1 is called positive definite ifffor any finite set of real numbers t j and complex numbers zj (with conjugatecomplex zj), 1 < j < n, we have

n

n

(1)

E T~ f (tj - tk)ZjZk ? 0 -i=1 k=1

Let us deduce at once some elementary properties of such a function .

Theorem 6.5.1. If f is positive definite, then for each t E0 :

f(-t) = f (t),

I f (t)I << f (0) .

If f is continuous at t = 0, then it is uniformly continuous in R, 1 . Inthis case, we have for every continuous complex-valued function ~ on .1 andevery T > 0 :

6.5 REPRESENTATIVE THEOREMS 1 187

f (s - t)~(s)~(t) ds dt > 0 .

Page 202: A Course in Probability Theory KL Chung

188 1 CHARACTERISTIC FUNCTION

PROOF. Taking n = 1, t 1 = 0, zl = 1 in (1), we see that

f(0)>0.

Taking n = 2, t1 = 0, t2 = t, ZI = Z2 = 1, we have

2 f (0) + f (t) + f (-t) - 0 ;

changing z2 to i, we have

f (0 ) + f (t)i - f ( -t)i + f (0) > 0 .

Hence f(t) + f (-t) is real and f (t) - f (-t) is pure imaginary, which implythat f (t) = f (-t) . Changing zl to f (t) and z2 to -If (t)I, we obtain

2f (0) If (t) 12 - 21f (t)1 3 > 0 .

Hence f (0) > I f (t) 1, whether I f (t) I = 0 or > 0. Now, if f (0) = 0, thenf ( .) = 0 ; otherwise we may suppose that f (0) = 1. If we then take n = 3,t l = 0, t2 = t, t3 = t + h, a well-known result in positive definite quadraticforms implies that the determinant below must have a positive value :

f (0)

f (-t)

f (-t - h)f (t)

f (0)

f (-h)f(t+h)

f(h)

f(0)

= 1 - If (t)1 2 - I f (t + h) 12 - I f (h) 12 + 2R{f (t)f (h)f (t + h)} ? 0 .

It follows that

if (t)-f(t+h)12 =1f(t)12 +if(t+h)1 2 -2R{f(t)f(t+h)}

< 1 - I f (h) 1 2 + 2R{f (t)f (t + h)[f (h) - 11)

< 1 - If (h) 12 + 211 - f (h)I < 411 - f (h)I .

Thus the "modulus of continuity" at each t is bounded by twice the squareroot of that at 0; and so continuity at 0 implies uniform continuity every-where. Finally, the integrand of the double integral in (2) being continuous,the integral is the limit of Riemann sums, hence it is positive because thesesums are by (1) .

Theorem 6.5.2 . f is a ch .f. if and only if it is positive definite and continuousat 0 with f (0) = 1 .

Remark. It follows trivially that f is the Fourier-Stieltjes transform

eltX v(dx)

Page 203: A Course in Probability Theory KL Chung

6.5 REPRESENTATIVE THEOREMS 1 189

of a finite measure v if and only if it is positive definite and finite continuousat 0: then v(A 1 ) = f (0) .

PROOF . If f is the ch .f. of the p.m. g, then we need only verify that it ispositive definite. This is immediate, since the left member of (1) is then

2

(4)

Now observe that for a > 0,

1 fa

1 a 2 sin Pt

2(1 - cos at)o dj3f e

-itX dx =- f d fi =a

P

a

t

ate

(where at t = 0 the limit value is meant, similarly later) ; it follows that

1

a

T

It

1- cos ata f d~ PT (x) dx =

7r

l1 - 1 f (t) atedt

0

T

-cos at dt71

. ff T(t)

at2

-

7r

f°°

~a

~ 1 -tcos tf T

z

d t,

where

(5)

fT(t) =

x

e"'zje` rkzkµ(dx) _n

j=1e`t) zj

T

PT(x)= 2n f

TCl

- ~tJ f (t)e- ' 'x dt .

,u (dx) > 0 .

Conversely, if f is positive definite and continuous at 0, then byTheorem 6.5.1, (2) holds for ~(t) = e-"x . Thus

T T

(3)

12f f f (s - t)e-`(s-r)X ds dt > 0.0

o

Denote the left member by PT (x) ; by a change of variable and partial integra-tion, we have

C1 - ItI

) f(t),

if ItI < T ;

0,

if I t I > T.

Note that this is the product of f by a ch .f. (see Exercise 2 of Sec . 6 .2)and corresponds to the smoothing of the would-be density which does notnecessarily exist .

Page 204: A Course in Probability Theory KL Chung

190 1 CHARACTERISTIC FUNCTION

Since I fT (t)I < If(t) I < 1 by Theorem 6 .5 .1, and (1 - cost)/t2 belongs toL 1 (-oc, oc), we have by dominated convergence :

1~

1(6) Jim a of d,Bf PT (x) dx = 7r

- 17r

(7)

(8)

Here the second equation follows from the continuity of f T at 0 withf T(0) = 1, and the third equation from formula (3) of Sec . 6.2 . Since PT ? 0,

the integral f ~ PT (x)dx is increasing in ,B, hence the existence of the limit ofits "integral average" in (6) implies that of the plain limit as f -+ oo, namely:

f 00

fiPT (x) dx = lim

PT (x) dx = 1 .00

6-+00 f8

Therefore PT is a probability density functio n . Returning to (3) and observingthat for real z :

e'iXe-`txdx = 2 sin $(z - t)

z-t

we obtain, similarly to (4) :

1

a J_ 18

7rIrX

1 fco

- cos a(z - t)

ao d~

e PT (x)dx =-~~ fT(t) a(z - t)2

dt

- 1 f00T

t 1 -costz--

dt.TC

oo

a

t2

Note that the last two integrals are in reality over finite intervals. Lettinga -* oc, we obtain by bounded convergence as before :

( 00

1 -lim fT (

t-~

costdt

00 a-* °°

a

t2f0 1 - cost

t2

dt = 1 .

'COOe

= fT(z),-00

the integral on the left existing by (7) . Since equation (8) is valid for each r,we have proved that fT is the ch .f. of the density function PT . Finally, sincef T (z) --+ f (z) as T -~ oc for every z, and f is by hypothesis continuous atz = 0, Theorem 6 .3 .2 yields the desired conclusion that f is a ch.f.

As a typical application of the preceding theorem, consider a family of(real-valued) r .v.'s {X1 , t E +}, where °X+ _ [0, oc), satisfying the followingconditions, for every s and t in :W+ :

Page 205: A Course in Probability Theory KL Chung

(i) f' (X2 ) = 1 ;(ii) there exists a function r(.) on R1 such that $ (X SX t ) = r(s - t) ;(iii) limo cF((X0 -Xt )2 ) = 0.

A family satisfying (i) and (ii) is called a second-order stationary process orstationary process in the wide sense ; and condition (iii) means that the processis continuous in L2(Q, 3~7, 'q) .

For every finite set of t j and z j as in the definitions of positive definite-ness, we have

tj Zj

_}

n n

j=1 k=1

Thus r is a positive definite function . Next, we have

r(0) - r(t) = 9(Xo (Xo - X t )),

hence by the Cauchy-Schwarz inequality,

~r(t) - r(0)1 2 < G (X 2 ) g ((Xo - Xt)2 ) .

It follows that r is continuous at 0, with r(O) = 1 . Hence, by Theorem 6 .5.2,r is the ch.f. of a uniquely determined p .m. R:

r(t) = I e` txR(dx) .

This R is called the spectral distribution of the process and is essential in itsfurther analysis .

Theorem 6 .5.2 is not practical in recognizing special ch .f.'s or inconstructing them . In contrast, there is a sufficient condition due to Polyathat is very easy to apply .

Theorem 6.5.3 . Let f on %, . 1 satisfy the following conditions, for each t :

(9)

f (0) = 1, f (t) > 0, f (t) = f (-t),

f is decreasing and continuous convex in R+ _ [0, oc) . Then f is a ch .f.

PROOF . Without loss of generality we may suppose that

f (oc) = lim f (t) = 0 ;t +00

otherwise we consider [f (t) - f (oc)]/[f (0) - f (oc)] unless f (00) = 1, inwhich case f - 1 . It is well known that a convex function f has right-hand

e(X tjX tk )ZjZk =

tj - tk)ZjZk1k=1

6.5 REPRESENTATIVE THEOREMS 1 191

Page 206: A Course in Probability Theory KL Chung

192 1 CHARACTERISTIC FUNCTION

and left-hand derivatives everywhere that are equal except on a countable set,that f is the integral of either one of them, say the right-hand one, which willbe denoted simply by f', and that f is increasing . Under the conditions ofthe theorem it is also clear that f is decreasing and f is negative in R+ .Now consider the f T as defined in (5) above, and observe that

-fT'(t) = -(1-T f'(t)+ f(t),

if 0<t<T;-

0,

if t > T .

Thus -f' T is positive and decreasing in ?4+ . We have for each x :A 0 :

°~

°°

2

°°e-`tx f T(t) dt = 2 f cos tx f T(t) dt = f sin tx(- f T(t)) dt

o

x o00

(k+l)ir/xsin tx(- f T (t)) dt .

fnlX

The terms of the series alternate in sign, beginning with a positive one, anddecrease in magnitude, hence the sum > 0 . [This is the argument indicatedfor formula (1) of Sec. 6.2.] For x = 0, it is trivial that

J-~fT(t) dt > 0 .

We have therefore proved that the PT defined in (4) is positive everywhere,and the proof there shows that f is a ch .f. (cf. Exercise 1 below) .

Next we will establish an interesting class of ch.f.'s which are naturalextensions of the ch .f.'s corresponding to the normal and the Cauchy distri-butions .

Theorem 6.5 .4. For each a in the range (0, 2],

f a (t) = eHtl°

is a ch .f.

PROOF. For 0 < a < 1, this is a quick consequence of Polya's theoremabove. Other conditions there being obviously satisfied, we need only checkthat f a is convex in [0, oc) . This is true because its second derivative isequal to

e-ta lot2t2a-2 - a(a - 1)t'-2} > 0

for the range of a in question . No such luck for 1 < a < 2, and there areseveral different proofs in this case. Here is the one given by Levy which

Page 207: A Course in Probability Theory KL Chung

works for 0 < a < 2 . Consider the density functiona

p(x) = 2lxla+iif lxl > 1,

0

if lxl<1 ;

and compute its ch .f. f as follows, using

f(t)

symmetry:

f°O1-

=J~(1-e"X)p(x)dx=a

cos txJ

x«+i dx

a

°O 1- cos u

` 1- cos u= al tl

o

ua+1 du - fo

ua+1 du} '

after the change of variables tx = u . Since 1 - cos u 2 u2 near u = 0, the firstintegral in the last member above is finite while the second is asymptoticallyequivalent to

1

u2

_

1 2-a2 J ua+1 du - 2(2 - a) t

as t 4, 0. Therefore we obtain

f (t) = 1 - ca ltla + 0(t 2)

where ca is a positive constant depending on a.It now follows that

f

t

n =

caltla

t2(n is

1 -n

+ O (n 2/« }

n

is also a ch .f. (What is the probabilistic meaning in terms of r.v.'s?) For eacht, as n -± oc, the limit is equal to e`° I `l a (the lemma in Sec . 6 .4 again!) . This,being continuous at t = 0, is also a ch .f. by the basic Theorem 6 .3 .2, and theconstant ca may be absorbed by a change of scale . Finally, for a = 2, fa isthe ch .f. of a normal distribution . This completes the proof of the theorem .

Actually Levy, who discovered these ch .f.'s around 1923, proved alsothat there are complex constants ya such that e-Ya I 1 I a is a ch .f., and determinedthe exact form of these constants (see Gnedenko and Kolmogorov [12]) . Thecorresponding d .f.'s are called stable distributions, and those with real posi-tive ya the symmetric stable ones. The parameter a is called the exponent.These distributions are part of a much larger class called the infinitely divisibledistributions to be discussed in Chapter 7 .

Using the Cauchy ch.f. e- I'I we can settle a question of historical interest .Draw the graph of this function, choose an arbitrary T > 0, and draw thetangents at +T meeting the abscissa axis at +T', where T' > T . Now define

6 .5 REPRESENTATIVE THEOREMS I 193

Page 208: A Course in Probability Theory KL Chung

194 1 CHARACTERISTIC FUNCTION

the function f T to be f in [-T, T], linear in [-T', -T] and in [T, T'], andzero in (-oo, -T') and (T', oo). Clearly f T also satisfies the conditions ofTheorem 6 .5 .3 and so is a ch .f. Furthermore, f = fT in [-T, T] . We havethus established the following theorem and shown indeed how abundant thedesired examples are .

Theorem 6.5.5 . There exist two distinct ch .f.'s that coincide in an intervalcontaining the origin .

That the interval of coincidence can be "arbitrarily large" is, of course,trivial, for if f 1 = f2 in [-8, 8], then gl = $2 in [-n8, n8], where

g1(t)=f1Cn , g2(t)=f2Cn/

.

Corollary. There exist three ch .f.'s, f 1, f 2, f 3, such that f 1 f 3- f 2 f 3 butfl # f2.

To see this, take f I and f2 as in the theorem, and take f 3 to be anych.f. vanishing outside their interval of coincidence, such as the one describedabove for a sufficiently small value of T . This result shows that in the algebraof ch .f.'s, the cancellation law does not hold .

We end this section by another special but useful way of constructingch.f.'s .

Theorem 6.5.6 . If f is a ch .f., then so is e"(f-1) for each A > 0 .

PROOF . For each A > 0, as soon as the integer n > A, the function

1--+~'f=1+k(f-1)n n

n

is a ch .f., hence so is its nth power; see propositions (iv) and (v) of Sec .As n -* oc,

1 + ), (f -1) n --+ eA(f-1)n

and the limit is clearly continuous . Hence it is a ch .f. by Theorem 6 .3 .2 .Later we shall see that this class of ch .f.'s is also part of the infinitely

divisible family . For f (t) = e", the corresponding

e;Je,00 e-,lAn

_

eitn

n=0 n !

is the ch .f. of the Poisson distribution which should be familiar to the reader .

Page 209: A Course in Probability Theory KL Chung

EXERCISES

1 . If f is continuous in :W' and satisfies (3) for each x E

and eachT > 0, then f is positive definite .

2. Show that the following functions are ch .f.'s :

1

_

1 - Itl",

if Iti < 1 ;1 -+ Itl'

f (t)

0,

if Itl > 1 ;

0 < a < l,

1- Itl,

if 0 < Itl < if(t) =

4i

if Iti > z .

3. If {X,} are independent r .v.'s with the same stable distribution ofexponent a, then Ek=l Xk/n 'I' has the same distribution. [This is the originof the name "stable" .]

4. If F is a symmetric stable distribution of exponent a, 0 < a < 2, thenf

00 Ix J' dF(x) < oo for r < a and = oo for r > a . [HINT : Use Exercises 7and 8 of Sec . 6.4 .]

*5. Another proof of Theorem 6 .5.3 is as follows . Show that

and define the d .f. G on R+ by

G(u) = f t d f'(t) .[0 u]

Next show that

(1 - Itl dG(u) = f (t) .Jo

U

Hence if we set

f(u,t)=(1_u) v0u

(see Exercise 2 of Sec . 6.2), then

f (t) = f

f (u, t) dG(u) .[O,oc)

Now apply Exercise 2 of Sec . 6.1 .6 . Show that there is a ch .f. that has period 2m, m an integer > 1, and

that is equal to 1 - Itl in [-1, +1] . [HINT : Compute the Fourier series of sucha function to show that the coefficients are positive .]

1000

6.5 REPRESENTATIVE THEOREMS I 195

tdf'(t) = 1

Page 210: A Course in Probability Theory KL Chung

196 1 CHARACTERISTIC FUNCTION

7. Construct a ch.f. that vanishes in [-b, -a] and [a, b], where 0 < a <b, but nowhere else . [HINT : Let f,,, be the ch.f. in Exercise 6 and consider

1: p,,, f ,,, , where p

0,

, = 1,m M

and the pm 's are strategically chosen .]8. Suppose f (t, u) is a function on M2 such that for each u, f ( ., u) is a

ch.f. ; and for each t, f (t, .) is continuous . Then for any d .f . G,00

lexpf

[f (t, u) - l] dG(u) }J

)))~

is a ch .f.9. Show that in Theorem 6.3 .2, the hypothesis (a) cannot be relaxed to

require convergence of if,) only in a finite interval It j < T .

6.6 Multidimensional case; Laplace transforms

We will discuss very briefly the ch .f. of a p.m. in Euclidean space of more thanone dimension, which we will take to be two since all extensions are straight-forward . The ch .f. of the random vector (X, Y) or of its 2-dimensional p .m .µ is defined to be the function f ( ., .) on M2 :

(1)

f (s, t) = f (X,Y)(s , t) =c`(ei(sx+tf)) =

ei(sX+ty),i(dx, dy)

Propositions (i) to (v) of Sec . 6.1 have their obvious analogues . The inversionformula may be formulated as follows . Call an "interval" (rectangle)

{(x, Y) : x1 < x < x2, Yl _< Y <_ Y2}

an interval of continuity iff the tc-measure of its boundary (the set of pointson its four sides) is zero. For such an interval I, we have

1

T

T e-` SXl - e-isx, e-'t)` - e-itY2It(I) = T

limc (27)2 -T -T

is

it

f (s, t)dsdt.

The proof is entirely similar to that in one dimension . It follows, as there, thatf uniquely determines p . Only the following result is noteworthy .

Theorem 6.6.1. Two r.v .'s X and Y are independent if and only if

(2)

Vs, Vt :

f(X,Y)(s, t) = fX(s)fY(t),

where fx and f y are the ch .f.'s of X and Y, respectively .

Page 211: A Course in Probability Theory KL Chung

6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 1 197

The condition (2) is to be contrasted with the following identify in onevariable :

Vt: fX+Y (t) = fX(Of Y (t),

where fx+Y is the ch .f. of X + Y. This holds if X and Y are independent, butthe converse is false (see Exercise 14 of Sec . 6.1) .

PROOF OF THEOREM 6 .6.1 . If X and Y are independent, then so are eisx ande"Y for every s and t, hence

(~(ei (sx+tY)) = cf(e'sX , eitY) = e(eisx) (t(eitY ) ,

which is (2) . Conversely, consider the 2-dimensional product measure µ i x µ2,where ILl and µ2 are the 1-dimensional p .m.'s of X and Y, respectively, andthe product is defined in Sec . 3.3 . Its ch .f. is given by definition as

ffei+t (ILi(sXy)x A2)(dx, dy) = ffeisxg?2

°Rz

e `t) /L l (dx)IL2 (dy)

= f eisXµi(dx) f eity 92(dy) = fx(s)f y(t)

(Fubini's theorem!). If (2) is true, then this is the same as f (x, Y) (s, t), so thatµ l x µ2 has the same ch .f. as µ, the p.m. of (X, Y) . Hence, by the uniquenesstheorem mentioned above, we have I,c l x i 2 = /t . This is equivalent to theindependence of X and Y.

The multidimensional analogues of the convergence theorem, and someof its applications such as the weak law of large numbers and central limittheorem, are all valid without new difficulties . Even the characterization ofBochner has an easy extension . However, these topics are better pursued withcertain objectives in view, such as mathematical statistics, Gaussian processes,and spectral analysis, that are beyond the scope of this book, and so we shallnot enter into them .

We shall, however, give a short introduction to an allied notion, that ofthe Laplace transform, which is sometimes more expedient or appropriate thanthe ch .f., and is a basic tool in the theory of Markov processes .

Let X be a positive (>O) r .v. having the d.f. F so that F has support in[0, oc), namely F(0-) = 0 . The Laplace transform of X or F is the functionF on

[0, oc) given by

(3)

F(X) = -P(e -;'X ) = f

e-~`X dF(x) .[o,oo)

It is obvious (why?) that

F(0) = hm F(A) = 1, F(oo) = lim F(A) = F(0) .x J,o

x-+oo

Page 212: A Course in Probability Theory KL Chung

198 1 CHARACTERISTIC FUNCTION

More generally, we can define the Laplace transform of an s .d .f. or a functionG of bounded variation satisfying certain "growth condition" at infinity . Inparticular, if F is an s.d.f., the Laplace transform of its indefinite integral

X

G(x) = / F(u) du0

is finite for ~ . > 0 and given by

G(~ ) _

e--xF(x) dx =f

e-~ dxf

/.c(d y)4[O,oo)

[O,oo)

[O,x]

_f

a(dy)f

e-Xx dx = - f e_ ~,cc(dy)[O, )

)

[O oo)

A

whereµ is the s .p.m. of F. The calculation above, based on Fubini's theorem,replaces a familiar "integration by parts" . However, the reader should bewareof the latter operation . For instance, according to the usual definition, asgiven in Rudin [1] (and several other standard textbooks!), the value of theRiemann-Stieltjes integral

0e-~'x d6o(x)

is 0 rather than 1, but

or e-~ d8o (x) = Jim 80 (x)e- ; x IIE + f00

80 (x)Ae-~ dxAToc

is correct only if the left member is taken in the Lebesgue-Stieltjes sense, asis always done in this book .

There are obvious analogues of propositions (i) to (v) of Sec . 6.1 .However, the inversion formula requiring complex integration will be omittedand the uniqueness theorem, due to Lerch, will be proved by a different method(cf. Exercise 12 of Sec . 6 .2) .

Theorem 6.6.2 . Let Fj be the Laplace transform of the d .f . F j with support

100

in °+,j=1,2.If F1 = F2, then F1=F2 .

PROOF . We shall apply the Stone-Weierstrass theorem to the algebragenerated by the family of functions {e- ~` X, A > 0}, defined on the closedpositive real line : 7+ = [0, oo], namely the one point compactification ofM,+ = [0, oo). A continuous function of x on R + is one that is continuousin ~+ and has a finite limit as x --* oo . This family separates points on A' +and vanishes at no point of ~f? + (at the point +oo, the member e-0x = 1 of thefamily does not vanish!) . Hence the set of polynomials in the members of the

Page 213: A Course in Probability Theory KL Chung

6.6 MULTIDIMENSIONAL CASE ; LAPLACE TRANSFORMS I 199

family, namely the algebra generated by it, is dense in the uniform topology,in the space CB04+ ) of bounded continuous functions on o+ . That is to say,given any g E CB(:4), and e > 0, there exists a polynomial of the form

n

g, (x) =

cje_;, 'X ,

j=1where c j are real and Aj > 0, such that

sup 1g(X) - gE(x)I < E.XE~+

Consequently, we have

f Ig(x) - gE(x)I dF j (x) < E, j = 1, 2 .

By hypothesis, we have for each a . > 0 :

fedFi(x)-"x

= fedF2(x)-~-- ,

and consequently,

f gE (x)dF1(x) = fg€ (x)dF2(x).

It now follows, as in the proof of Theorem 4 .4.1, first that

f g(x) dF1(x) = f g(x) dF2(x)

for each g E CB (J2+ ) ; second, that this also holds for each g that is theindicator of an interval in R+ (even one of the form (a, oc] ) ; third, thatthe two p.m.'s induced by F 1 and F2 are identical, and finally that F 1 = F2as asserted .

Remark. Although it is not necessary here, we may also extend thedomain of definition of a d .f. F to 7+ ; thus F(oc) = 1, but with the newmeaning that F(oc) is actually the value of F at the point oo, rather thana notation for limX,,,,F(x) as previously defined . F is thus continuous atoc . In terms of the p.m. µ, this means we extend its domain to ~+ but setit(fool) = 0 . On other occasions it may be useful to assign a strictly positivevalue to u({oo}) .

Passing to the convergence theorem, we shall prove it in a form closestto Theorem 6 .3 .2 and refer to Exercise 4 below for improvements .

Theorem 6.6.3 . Let IF,, 1 < n < oo} be a sequence of s .d.f.'s with supportsin ?,o+ and IF, } the corresponding Laplace transforms. Then Fn4 F,,,,,, whereF,,, is a d.f., if and only if:

Page 214: A Course in Probability Theory KL Chung

200 1 CHARACTERISTIC FUNCTION

(4)

(a) lim, , ~ oc F„ (,~) exists for every k . > 0 ;(b) the limit function has the limit 1 as k ., 0 .

Remark . The limit in (a) exists at k = 0 and equals 1 if the Fn 's ared.f.'s, but even so (b) is not guaranteed, as the example Fn = 6, shows .

PROOF . The "only if" part follows at once from Theorem 4 .4.1 and theremark that the Laplace transform of an s .d .f. is a continuous function in R+ .Conversely, suppose lim F,, (a,) = GQ.), A > 0 ; extended G to M+ by settingG(0) = 1 so that G is continuous in R+ by hypothesis (b) . As in the proofof Theorem 6 .3 .2, consider any vaguely convergent subsequence Ffk with thevague limit F,,,, necessarily an s .d.f. (see Sec . 4.3) . Since for each A > 0,e-XX E C0, Theorem 4 .4.1 applies to yield P,, (A) -± F,(A) for A > 0, whereF,, is the Laplace transform of F, Thus F,, (A) = GQ.) for A > 0, andconsequently for ~. > 0 by continuity of F~ and G at A = 0 . Hence everyvague limit has the same Laplace transform, and therefore is the same F~ byTheorem 6 .6 .2. It follows by Theorem 4 .3 .4 that Fn ~ F00 . Finally, we haveFoo (oo) = Fc (0) = G(0) = 1, proving that F. is a d.f .

There is a useful characterization, due to S . Bernstein, of Laplace trans-forms of measures on R+. A function is called completely monotonic in aninterval (finite or infinite, of any kind) iff it has derivatives of all orders theresatisfying the condition :

(- 1)n f (n) (A) > 0

1?+

for each n > 0 and each a. in the domain of definition .

Theorem 6.6.4 . A function f on (0, oo) is the Laplace transform of a d .f. F :

(5)

f (.) =

e-~XdF(x),

if and only if it is completely monotonic in (0, oo) with f (0+) = 1 .

Remark . We then extend f to M+ by setting f (0) = 1, and (5) willhold for A > 0 .

PROOF. The "only if" part is immediate, since

f (n) ('~) = f (-x)n e-ax dF(x) .'

Turning to the "if" part, let us first prove that f is quasi-analytic in (0, oo),namely it has a convergent Taylor series there . Let 0 < Ao < A < it, then, by

Page 215: A Course in Probability Theory KL Chung

Taylor's theorem, with the remainder term in the integral form, we havek-I f(j)(µ)

J(6)

f (~) =

(~ - ~)j=0

J '

+(~- µ)k

1 (1 - t)k If (k) (µ + (~ - µ)t) dt .(k-1)! 0

Because of (4), the last term in (6) is positive and does not exceed~- k 1

( µ)I (1 - t)k-1f(1)(µ + (Xo - µ)t)dt .

(k - 1) ! o

For if k is even, then f (k) 4, and (~ - µ)k > 0, while if k is odd then f (k) Tand (~ - µ) k < 0 . Now, by (6) with ). replaced by ) .o , the last expression isequal to

(7)

;~

_µ µ/

k

4 -

6.6 MULTIDIMENSIONAL CASE ; LAPLACE TRANSFORMS 1 201

f (4)-iµ)

(4 - A)JJ •j=0

F, (x) =[nx]

n (-1)Jf (j) (n).j=0 j!

~_µ k

f (, 0 ),(X0 -µ

where the inequality is trivial, since each term in the sum on the left is positiveby (4) . Therefore, as k -a oc, the remainder term in (6) tends to zero and theTaylor series for f (A) converges .

Now for each n > 1, define the discrete s .d .f. Fn by the formula :

This is indeed an s .d.f., since for each c > 0 and k > 1 we have from (6) :k-1 f(j)( )

1 = f (0+) > f (E) >n

(E - n )JT, j Ij=0

Letting E ~ 0 and then k T oo, we see that F,, (oc) < 1 . The Laplace transformof F„ is plainly, for X > 0 :

\°O`

Je - 'xdF 7,(x) = Le

-~(jl~i) ( '~) f(j)(n )j=0

~~

-.l/») - n)J f(J) (n) = f(n (1 - e-Zl11 )),=L li (n(1 - ej=0

k- f(j)

the last equation from the Taylor series . Letting n --a oc, we obtain for thelimit of the last term f (A), since f is continuous at each A . It follows from

Page 216: A Course in Probability Theory KL Chung

202 ( CHARACTERISTIC FUNCTION

Theorem 6 .6 .3 that IF, } converges vaguely, say to F, and that the Laplacetransform of F is f . Hence F(oo) = f (0) = 1, and F is a d .f. The theoremis proved .

EXERCISES

1 . If X and Y are independent r .v .'s with normal d .f.'s of the samevariance, then X + Y and X - Y are independent .

*2. Two uncorrelated r .v.'s with a joint normal d .f. of arbitrary parametersare independent . Extend this to any finite number of r.v .'s .

*3. Let F and G be s.d.f.'s . If a. 0 > 0 andPQ.) = G(a,) for all A > A0, thenF = G. More generally, if F(nA0) = G(n.10) for integer n > 1, then F = G.[HINT : In order to apply the Stone-Weierstrass theorem as cited, adjoin theconstant 1 to the family {e - ~` -x, .k > A0} ; show that a function in Co can actuallybe uniformly approximated without a constant.]

*4. Let {F„ } be s.d.f.'s . If A0 > 0 and limnwFn (;,) exists for all ~ . > a.0,then IF,J converges vaguely to an s .d.f .

5. Use Exercise 3 to prove that for any d .f. whose support is in a finiteinterval, the moment problem is determinate .

6. Let F be an s .d.f. with support in R+ . Define G0 = F,

Gn (x) = IX

Gn-1 (u) du

for n > 1 . Find G, (A) in terms of F (A. ) .

7. Let f (A) = f°° e`` f (x) dx where f E L 1 (0, oo). Suppose that f hasa finite right-hand derivative f'(0) at the origin, then

f (0) = lim J,f-00

f'(0) = ~ A[Af (A) - f (0)] . ). 00

*8. In the notation of Exercise 7, prove that for every a,, µ E M+:

(µ - )

C,*,

0C

e-(~s+ur) f(s+t)fdsdt0

0

9. Given a function a on W+ that is finite, continuous, and decreasingto zero at infinity, find a a-finite measure µ on A'+ such that

`dt > 0 : 1 6(t - s)~i(ds) = 1 .[0,t]

[HINT : Assume a(0) = 1 and consider the d .f. 1 - a.]

Page 217: A Course in Probability Theory KL Chung

*10. If f > 0 on (0, oo) and has a derivative f' that is completely mono-tonic there, then 1/ f is also completely monotonic .

11 . If f is completely monotonic in (0, oo) with f (0+) = + oo, then fis the Laplace transform of an infinite measure /.t on ,W+ :

f (a.) = f e-;`Xµ(dx) .

[HIwr : Show that Fn (x) < e26 f (B) for each 8 > 0 and all large n, where F,is defined in (7) . Alternatively, apply Theorem 6 .6 .4 to f (A. + n -')/ f (n -1 )

for A > 0 and use Exercise 3 .]12. Let {g,1 , 1 < n < oo} on M+ satisfy the conditions : (i) for each

n, g,, ( .) is positive and decreasing ; (ii) g00 (x) is continuous ; (iii) for eachA>0,

0

lim f e-kxgn (x) dx = f e_;,Xg00

(x) dx.n-+oo o

o

Thenlim gn (x) = g. (x) for every x E + .n-+co

[HINT : For E > 0 consider the sequence O° e -EXgn (x) dx and show thatfb

b

~~li of e -EXg„ (x) dx =

e-EXg00 (x) dx, ` l gn (b) < goo (b- ),a

a

00

and so on .]

Formally, the Fourier transform f and the Laplace transform F of a p.m .with support in T+ can be obtained from each other by the substitution t = iAor A = -it in (1) of Sec . 6.1 or (3) of Sec . 6.6. In practice, a certain expressionmay be derived for one of these transforms that is valid only for the pertinentrange, and the question arises as to its validity for the other range . Interestingcases of this will be encountered in Secs . 8.4 and 8 .5. The following theoremis generally adequate for the situation.

Theorem 6.6.5 . The function h of the complex variable z given by

h(z) = f eu dF(x)

is analytic in Rz < 0 and continuous in Rz < 0 . Suppose that g is anotherfunction of z that is analytic in Rz < 0 and continuous in Rz -< 0 such that

VIE

1 : h(it) = g(it) .

6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 1 203

Page 218: A Course in Probability Theory KL Chung

204 1 CHARACTERISTIC FUNCTION

Then h(z) - g(z) in Rz < 0 ; in particular

V). E J??+ : h(-A) = g

PROOF. For each integer m > 1, the function hm defined by

°°

xn

h.(Z) _

e'dF(x) =

n

dF(x)[O,m]

n-0

[O,m ] n .

is clearly an entire function of z. We have

e7-x dF(x) <

dF(x)fm,oo)

f ,oo)

in Rz < 0; hence the sequence hm converges uniformly there to h, and his continuous in Rz < 0 . It follows from a basic proposition in the theory ofanalytic functions that h is analytic in the interior Rz < 0 . Next, the differenceh - g is analytic in Rz < 0, continuous is Rz < 0, and equal to zero on theline Rz = 0 by hypothesis . Hence, by Schwarz's reflection principle, it can beanalytically continued across the line and over the whole complex plane . Theresulting entire functions, being zero on a line, must be identically zero in theplane. In particular h - g - 0 in Rz <- 0, proving the theorem .

Bibliographical Note

For standard references on ch.f.'s, apart from Levy [7], [11], Cramer [10], Gnedenkoand Kolmogorov [12], Loeve [14], Renyi [15], Doob [16], we mention :

S . Bochner, Vorlesungen uber Fouriersche Integrale . Akademische Ver-laggesellschaft, Konstanz, 1932 .

E. Lukacs, Characteristic fiunctions . Charles Griffin, London, 1960.

The proof of Theorem 6 .6.4 is taken from

Willy Feller, Completely monotone functions and sequences, Duke J. 5 (1939),661-674 .

Page 219: A Course in Probability Theory KL Chung

7 Central limit theorem and

its ramifications

7.1 Liapounov's theorem

The name "central limit theorem" refers to a result that asserts the convergencein dist. of a "normed" sum of r.v .'s, (S 1, - a„)/b 11 , to the unit normal d .f. (D .We have already proved a neat, though special, version in Theorem 6 .4.4 .Here we begin by generalizing the set-up . If we write

(1)

(2)

we see that we are really dealing with a double array, as follows . For eachn > 1 let there be k, 1 r.v.'s {X„j , 1 < j < k11}, where kit --* oo as n - oo:

X11, X12, . . . , X1k, ;

X21, X22, . . .,X2k2 ;

. . . . . . . . . . . . . . . . .

X ,11 ,X,12, . . . , X 11k,, ,

Page 220: A Course in Probability Theory KL Chung

206 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

The r.v.'s with n as first subscript will be referred to as being in the nth row .Let F, ,j be the d.f., fn j the ch.f. of Xnj ; and put

Sn = Sn,kn

`(Xnj) = anj,kn

(- (Sn )

n j = an,(3)

j=1

e` (IXnj1 3 ) = Ynj,

In the special case of (1), we have

(4)

Xnj =Xjb,n

If we take b„ = s,, then

j=1

j=1

The particular case kn = n for each n yields a triangular array, and if, further-more, X,, j = Xj for every n, then it reduces to the initial sections of a singlesequence {Xj , j > 1) .

We shall assume from now on that the r.v.'s in each row in (2) areindependent, but those in different rows may be arbitrarily dependent as inthe case just mentioned -indeed they may be defined on different probabilityspaces without any relation to one another . Furthermore, let us introduce thefollowing notation for moments, whenever they are defined, finite or infinite :

knU2 (Xnj) = 1 .

j=1

By considering X,, j - a,, j instead of Xnj , we may suppose

(5)

`dn, bj : a„j = 0

whenever the means exist . The reduction (sometimes called "norming")leading to (4) and (5) is always available if each Xnj has a finite secondmoment, and we shall assume this in the following .

In dealing with the double array (2), it is essential to impose a hypothesisthat individual terms in the sum

o (1i'n j) = Un j ,kn

o 2 (Sn)

Un2

2j =Sn ,

j=1

j=1

Ynj .

2

a2 (Xj)6 (Xnj)= b2n

Page 221: A Course in Probability Theory KL Chung

are "negligible" in comparison with the sum itself . Historically, this arosefrom the assumption that "small errors" accumulate to cause probabilisticallypredictable random mass phenomena . We shall see later that such a hypothesisis indeed necessary in a reasonable criterion for the central limit theorem suchas Theorem 7.2.1 .

In order to clarify the intuitive notion of the negligibility, let us considerthe following hierarchy of conditions, each to be satisfied for every e > 0 :

(a)

Vj:

lim G~'{IXnjI > E} = 0 ;n-*o0(b)

lim max vi'{ IX n j I >,e) = 0;11 +00 1<j<kn

(c)

lim 1/6 max IX n j I> E = 0 ;12 +00

1<j<k„

n

(d)

lim

{IXn jI > E} = 0 .n+o0

j=1

It is clear that (d) ~ (c) = (b) = (a) ; see Exercise 1 below . It turns out that(b) is the appropriate condition, which will be given a name .

DEFINITION . The double array (2) is said to be holospoudic* iff (b) holds .

(6)

and consequently

~xI>E

Vt E 71 : lira max I f n j (t) - 11 =0 .n-*00 1<j<k„

PROOF . Assuming that (b) is true, we have

fnj (t) - 11 < f l e `tx - 11 dFn j (x) = I

+J//

J~xj >E

IxI~E

<

2dFn j (x) + It] Ix

Ix I dFnj(x)IxJ>E

Ix I <E

7.1 LIAPOUNOV'S THEOREM l 207

Theorem 7.1 .1 . A necessary and sufficient condition forholospoudic is :

< 21

dFnj (x) + EItI ;

maxifnj (t)- 11 < 2maxJ'{IXnjI > e}+Eltl .J

j

* I am indebted to Professor M . Wigodsky for suggesting this word, the only new term coinedin this book .

(2) to be

Page 222: A Course in Probability Theory KL Chung

208 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Letting n -~ oc, then e -- 0, we obtain (6) . Conversely, we have by theinequality (2) of Sec . 6 .3,

XI>E

and consequently

max ~{IXn,I > E} < E

max I1 - f,, ;(t)I dt.2 ItI<2/E J

Letting n -f oc, the right-hand side tends to 0 by (6) and bounded conver-gence; hence (b) follows . Note that the basic independence assumption is notneeded in this theorem .

We shall first give Liapounov's form of the central limit theoreminvolving the third moment by a typical use of the ch .f. From this we deducea special case concerning bounded r.v.'s . From this, we deduce the sufficiencyof Lindeberg's condition by direct arguments . Finally we give Feller's proofof the necessity of that condition by a novel application of the ch .f.

In order to bring out the almost mechanical nature of Liapounov's resultwe state and prove a lemma in calculus .

Lemma. Let {9„j , 1 < j < k,,, 1 < n} be a double array of complex numberssatisfying the following conditions as n -+ cc :

(i) maxl<;<k„ 10,,;l -j 0 ;(ii) ~ J "_ 1 I9„j I < M < oc, where M does not depend on n ;

u>

"=1 0„ j -> 0, where 9 is a (finite) complex number .()

Then we havek„

(7)

11(1 +0„i ) -~ e9 .j=1

PROOF . By (i), there exists no such that if n > no, then I9„ j I < 2 for allj, so that 1 + 9, j :A 0. We shall consider only such large values of n below,and we shall denote by log (1 + 0,, j ) the determination of logarithm with anangle in (-7r, 7r] . Thus

(8)

log(1 + 0„ j ) = 9„j + AI0', j 12 ,

where A is a complex number depending on various variables but boundedby some absolute constant not depending on anything, and is not necessarily

J d F„ j(x) < 2 - 2 f

f , ; (t)d t<ItI_2/E

< E 11 - f,,1(t)I dt ;2 't1<2/E

Page 223: A Course in Probability Theory KL Chung

the same in each appearance . In the present case we have in fact

Ilog(1 + 9nj) - OnjI =

n

(9)

„j 2 < max 10,(j=1

n

log(l+0 )=

.+Aj=1

so that the absolute constant mentioned above may be taken to be 1 . (Thereader will supply such a computation next time .) Hence

0mnj G

(This A is not the same as before, but bounded by the same l!) . It followsfrom (ii) and (i) that

and consequently we have by (iii),

kn

This is equivalent to (7) .

Theorem 7.1.2 . Assume that (4) and (5) hold for the double array (2) andthat y, , j is finite for every n and j. If

(10)

j=1

j=1

log(l + 0'j ) -* 0.

0

as n -+ oc, then S,, converges in dist. to (D .

PROOF . For each n, the range of j below will be from I to k,, . It followsfrom the assumption (10) and Liapounov's inequality that

max o _< max y„ j < I'„ -* 0 .

By (3') of Theorem 6.4.2, we have3

f,1(t) = 1 -

1

207n

2

jt2 +A

njy,,i l t l ,

where I An j I < 6 . We apply the lemma above, for a fixed t, to

011j =- 2aJ.t2 +A„jynjltj3 .

7.1 LIAPOUNOV'S THEOREM 1 209

(

I

1<j<kn

1<j<kn

m-2

(2)_ 10n.12 < 1,

Page 224: A Course in Probability Theory KL Chung

210 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Condition (i) is satisfied since

2max I9njI <- r max anj +AJtI 3 max yn j -, 0

j

2 j

j

by (11) . Condition (ii) is satisfied since

2I9n ~ I <

2+ A ItI 3 I'n

I

is bounded by (11) ; similarly condition (iii) is satisfied since

t2

t2E9„j

2 +A(tl 3rn -} 2 .

It follows thatk„

f n j (t) -> e-t`h'22

j=1

This establishes the theorem by the convergence theorem of Sec . 6.3, sincethe left member is the ch .f. of S n .

Corollary . Without supposing that AXnj ) = 0, suppose that for each n andj there is a finite constant M, j such that IX, j I < Mn j a.e., and that

max Mnj --> 0 .1< j<k„

Then S„ - c`(S n ) converges in dist . to 4) .

This follows at once by the trivial inequality

kn

( Xnj -

< 2 max Mnj>U2 (Xnj)1< j<kn

j=1

j=1

= 2 max Mn j .1 < j <kn

The usual formulation of Theorem 7 .1 .2 for a single sequence of inde-pendent r.v.'s {X j } with e (Xj) = 0, Q2(Xj) = ark < CC, A 1Xj 1 3 ) = yj < 00,

n

Sn =>Xj, Sn=

j=1

is as follows .

kn

n

j=12 ,

I'll ::::::::

n

j=1

Page 225: A Course in Probability Theory KL Chung

If

1,3 -0,

Sn

then Sn Is, converges in dist . to

(D

.This is obtained by setting Xn j = X j /sn . It should be noticed that the double

scheme gets rid of cumbersome fractions in proofs as well as in statements .We proceed to another proof of Liapounov's theorem for a single

sequence by the method of Lindeberg, who actually used it to prove hisversion of the central limit theorem, see Theorem 7 .2.1 below . In recent timesthis method has been developed into a general tool by Trotter and Feller .

The idea of Lindeberg is to approximate the sum X1 + . . . + X, in (13)successively by replacing one X at a time with a comparable normal (Gaussian)r.v. Y, as follows . Let {Y j, j > 1) be r .v.'s having the normal distributionN(0, a ); thus Y j has the same mean and variance as the corresponding X jabove; let all the X's and Y's be totally independent . Now put

Zj=Y1+ . . .+Yj-1-f-Xj+1+ . . .+Xn, 1 < j <n,

with the obvious convention that

Z1 =X2+ . . .+Xn, Zn =Y1+ . . .+Yn-1 •To compare the distribution of (X j + Z j)/sn with that of (Y j + Z j)/sn, weuse Theorem 6 .1 .6 by comparing the expectations of test functions . Namely,we estimate the difference below for a suitable class of functions f

(15)

L ~f (X 1.+} .. Xn

_ c+ f Y1 +. . . + Yn

Sn

)

Sn

_ n

SnU f (X j +Zj

c~ f (YJ+ZJ)}]Y

S11j=1

This equation follows by telescoping since Y j + Z j = X j+1 + Z j+, . We take

f in CB, the class of bounded continuous functions with three bounded contin-uous derivatives. By Taylor's theorem, we have for every x and y :

'2xf(x+Y)- f(x)+f,(x)Y+

.f ( )y2

7.1 LIAPOUNOV'S THEOREM 1 211

< MIyI36

where M = supxE , I f (3)(x)I . Hence if ~ and i are independent r .v.'s such thatcq 117131 < oo, we have by substitution followed by integration :

I C {f( +rJ)} - {f( )} - {f'(~)}~~{rl} - 2L{f~~( )}~`{r12}~

(16)

< 6 {1 13}

Page 226: A Course in Probability Theory KL Chung

212 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Note that the r .v.'s f (~), f'(4), and f"(t;) are bounded hence integrable . If ~is another r .v . independent of ~ and having the same mean and variance as 77,and c` { l1 3 } < oc, we obtain by replacing 77 with ~ in (16) and then taking thedifference :

(17)

IC `{f( +77)} - { f( +0}4 < 6 {1713 +1 1 3 } .

This key formula is applied to each term on the right side of (15), with~ = Z j/s„, i = Xi/s„ , = Yj/s„ . The bounds on the right-hand side of (17)then add up to

f y, + cad

S3

S3

where c = ,,f8/7r since the absolute third moment of N(0, Q2 ) is equal to ccr .By Liapounov's inequality (Sec . 3 .2) a < yj , so that the quantity in (18) is0(F„/s; ). Let us introduce a unit normal r .v . N for convenience of notation,so that (Y 1 + + Y„ )/s„ may be replaced by N so far as its distribution isconcerned. We have thus obtained the following estimate :

(19)

Vf E C3 : c f sn ) } - S{ .f (N)} < O (k) .n

Consequently, under the condition (14), this converges to zero as n - * oc . Itfollows by the general criterion for vague convergence in Theorem 6 .1 .6 thatSn IS, converges in distribution to the unit normal . This is Liapounov's form ofthe central limit theorem proved above by the method of ch .f.'s. Lindeberg'sidea yields a by-product, due to Pinsky, which will be proved under the sameassumptions as in Liapounov's theorem above .

Theorem 7.1 .3 . Let {x„ } be a sequence of real numbers increasing to +ocbut subject to the growth condition : for some c > 0,

2(20)

log s~l + - ( 1 + E) -* -ocn

as n --* oc . Then for this c, there exists N such that for all n > N we have2

2

(21)

exp -2 (1 + 6) < .°~'{ S„ >_ x„s„} < exp -2 (1 - E)

PROOF . This is derived by choosing particular test functions in (19) . Letf E C3 be such that

f(x)=0 forx<-z ; 0<f(x)<l for- ] <x< 2 ;

f (x) = 1 for x > i ;

Page 227: A Course in Probability Theory KL Chung

and put for all x :

fn(x)=f(x-xn-2), gn(x)=f(x-xn+2).

Thus we have, denoting by IB the indicator function of B C ~%l?' :

I[xn +1,oo) :S f n (x) <- I [Xn,oo)

gn (x) < I [Xn -1,oo) .

It follows that

(22) e, ~

) ~f n Q

< -'~PflSn > X"S. } <{gn (Sn

1whereas

(23)

-6flN > x, + 1 } < f{fn (N)} < C-019n (N)} < 7{N > xn - 11 .

Using (19) for f = f n and f = g,1 , and combining the results with (22) and(23), we obtain

9IN>xn+1}-Or3

<-9{Sn >-xnSn}Sn

7.1 LIAPOUNOV'S THEOREM I 213

rn<J'{N>xn-1}+O 3

SnNow an elementary estimate yields for x -+ +oo :

1

°°

2

1

x2T{N > x}

2n fexp C 2 / dy

2nx exp

2 ,

(see Exercise 4 of Sec . 7.4), and a quick computation shows further that

x2GJ'{N > x+ 1} =exp - 2 (1 +o(1)) ,

Thus (24) may be written as

X

rnP{Sn > xnSn} =exp -

2(1+0(1)) + O

(S3"n

Suppose n is so large that the o(1) above is strictly less than E in absolutevalue; in order to conclude (23) it is sufficient to have

(24)

rn

[4(1S3-= o exp

+,G) ,n

n --)~ oo .

This is the sense of the condition (20), and the theorem is proved .

Page 228: A Course in Probability Theory KL Chung

214 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Recalling that s„ is the standard deviation of S,,, we call the probabilityin (21) that of a "large deviation" . Observe that the two estimates of thisprobability given there have a ratio equal to eEX n 2 which is large for each e,as x„ --+ -I-oo . Nonetheless, it is small relative to the principal factor e _Xn `/2

on the logarithmic scale, which is just to say that Ex2 is small in absolutevalue compared with -x2 /2 . Thus the estimates in (21) is useful on such ascale, and may be applied in the proof of the law of the iterated logarithm inSec. 7.5 .

EXERCISES

*1. Prove that for arbitrary r .v.'s {X„j} in the array (2), the implications(d) : (c) ==~ (b) = (a) are all strict . On the other hand, if the X„j 's areindependent in each row, then (d) _ (c) .

2. For any sequence of r .v.'s {Y,}, if Y„/b, converges in dist . for anincreasing sequence of constants {bn }, then Y, /bn converges in pr . to 0 ifbn = o(b', ) . In particular, make precise the following statement : "The centrallimit theorem implies the weak law of large numbers ."

3. For the double array (2), it is possible that S„ /b n converges in dist.for a sequence of strictly positive constants b„ tending to a finite limit . Is itstill possible if b„ oscillates between finite limits?

4. Let {XJ ) be independent r.v .'s such that maxi<1< n 1X1 I /bn --* 0 inpr. and (Sn - a,, )/b„ converges to a nondegenerate d .f. Then b„ -* oc,b„+1 lbn -~ 1, and (a„+i - an )/bn --± 0 .

*5. In Theorem 7.1 .2 let the d .f. of S„ be F,, . Prove that given any E > 0,there exists a 8(E) such that F n < 8(E) = L(Fn , (D) < E, where L is Levydistance . Strengthen the conclusion to read :

sup JFn (x) - c(x)j < E .xE2A

* 6 . Prove the assertion made in Exercise 5 of Sec . 5 .3 using the methodsof this section . [HINT : use Exercise 4 of Sec . 4.3 .]

7 .2 Lindeberg-Feller theorem

We can now state the Lindeberg-Feller theorem for the double array (2) ofSec . 7 .1 (with independence in each row) .

Theorem 7.2.1 . Assume ~L < oc for each n and j and the reductionhypotheses (4) and (5) of Sec . 7 .1 . In order that as n - oc the two conclusionsbelow both hold :

Page 229: A Course in Probability Theory KL Chung

(1)

Hence

(X",,)I < f

IxI dF,l1(x) <1- f

x2 dF,1(x)IXI>n

17

IxI>n

and so by (1),

1 k"~'(S ;,

`) I < - L f x2 dF1 , j (x) > 0 .

j=1

IxI>rl

Next we have by the Cauchy-Schwarz inequality

(XI

< f

2Z

xdFn j(x) f

1 dF„j(x)'IxI> ,~

IXI>i

(i) S„ converges in dist. to (D,(ii) the double array (2) of Sec . 7.1 is holospoudic ;

it is necessary and sufficient that for each q > 0, we have

n

I=1

IXI>??

7.2 LINDEBERG-FELLER THEOREM 1 215

f x2 dF„j (x) -+ 0 .

The condition (1) is called Lindeberg's condition ; it is manifestlyequivalent to

(1')

n f

x2 dF11j (x) ~ 1 .

i=1

IXI `'?

PROOF . Sufficiency . By the argument that leads to Chebyshev's inequality,we have

(2)

J'{IXnjl > 17} < 2 f x2 dF,,j (x) .17

IxI>n

Hence (ii) follows from (1) ; indeed even the stronger form of negligibility (d)in Sec. 7 .1 follows. Now for a fixed 17, 0 < 17 < 1, we truncate X,_j as follows :

(3)

X' __ X,1j, if IX"; N <

ni

0,

otherwise .

Put S', _ ~~n 1 X;,~, a2 (S;,) = s'2 . We have, since c`(X,1j ) = 0,

IXI<n

fixI>nc`(X„j ) = f xdFn j(x) _ -

xdF„j (x) .

Page 230: A Course in Probability Theory KL Chung

216 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

and consequently

It follows by (1') that

1 = s2 > S itn - n

a2 2(X"1) _ fix x2 dFni (x) - C (X;,~ )2 > f _f }x2 dFn~ (x) .I<~

IXI<n

I>n

kn

Thus as n -+ oo, we have

sn -a 1 and ii(S',) -- 0.

Since

S' _Sn

-(P(S')

+ ~(Sn ) sn

s ,

S,

nn

n

we conclude (see Theorem 4 .4.6, but the situation is even simpler here) thatif [S' - cr(S' )] I /s' converges in dist, so will S' /s' to the same d .f.n

n

n

n nNow we try to apply the corollary to Theorem 7 .1 .2 to the double array

{X;71 } . We have IX,, j I < 77, so that the left member in (12) of Sec. 7.1 corre-sponding to this array is bounded by r) . But although q is at our disposal andmay be taken as small as we please, it is fixed with respect to n in the above .Hence we cannot yet make use of the cited corollary . What is needed is thefollowing lemma, which is useful in similar circumstances .

Lemma 1 . Le t u(ni, n) be a function of positive integers m and n such that

Ym: lim u(m, n) = 0 .n-+00

Then there exists a sequence (m, j increasing to oc such that

lim u(mn , n) = 0 .n-+ 00

PROOF . It is the statement of the lemma and its necessity in our appli-cation that requires a certain amount of sophistication ; the proof is easy . Foreach m, there is an n,n such that n > n,n = u(m, n) < 1 /m. We may chooseIn, ni > 1 } inductively so that n,, increases strictly with m . Now define(n0 = 1)

M, = m for n m < n < n n,+ 1 .

Then1

u (m,, n) < - for n,n < n < n,n+i,m

and consequently the lemma is proved .

}X2dFnj(X)-, 1 .~ I,-" 1 :!S -~

fix I-)

Page 231: A Course in Probability Theory KL Chung

We apply the lemma to (1) as follows . For each m > 1, we have

lim m

f

x 2 d Fn j (x) = 0 .n +oo

j1 IXI>1/m=

It follows that there exists a sequence {>7n } decreasing to 0 such that

x 2 dFn j (x) --)~ 0 .fj=1

XI > 1/n

Now we can go back and modify the definition in (3) by replacing 77 withi)n . As indicated above, the cited corollary becomes applicable and yields theconvergence of [S;, - ~~(S;,)]/sn in dist. to (Y, hence also that of Sn/s' asremarked .

Finally we must go from Sn to Sn . The idea is similar to Theorem 5 .2.1but simpler. Observe that, for the modified Xn j in (3) with 27 replaced by 17n,

we have

~P{Sn :A Sn } -<knU[Xnj 54Xnj ]

j=1

7.2 LINDEBERG-FELLER THEOREM 1 217

j=1-{ 1Xn j 1 > qn }

X2 d Fn j (x),f"n

I>nj= n

the last inequality from (2) . As n -- oo, the last term tends to 0 by the above,hence S„ must have the same limit distribution as Sn (why?) and the sufficiencyof Lindeberg's condition is proved . Although this method of proof is somewhatlonger than a straightforward approach by means of ch .f.'s (Exercise 4 below),it involves several good ideas that can be used on other occasions . Indeed thesufficiency part of the most general form of the central limit theorem (seebelow) can be proved in the same way .

Necessity . Here we must resort entirely to manipulations with ch .f.'s . Bythe convergence theorem of Sec . 6.3 and Theorem 7.1 .1, the conditions (i)and (ii) are equivalent to :

k„(4)

Vt: lim 11 f „j (t) = e_ 12 / 2 ;n~oc

j=1

(5)

Vt: lim max I f n j (t) - 11 = 0 .n-+ oc 1 < j<k„

n

1

By Theorem 6 .3 .1, the convergence in (4) is uniform in I t i < T for each finiteT : similarly for (5) by Theorem 7 .1 .1 . Hence for each T there exists n 0 (T)

Page 232: A Course in Probability Theory KL Chung

218 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

such that if n > no (T ), then

max max 1 f n j (t) - 11 < 2jtj<T 1< j<kn

We shall consider only such values of n below. We may take the distinguishedlogarithms (see Theorems 7 .6 .2 and 7.6 .3 below) to conclude that

(6)

n

-00

limn-*0c

n-*co

j

lim

= Jimn-* 0c

n

j=1

t2log f n j (t) = - 2 .

By (8) and (9) of Sec . 7 .1, we have

(7)

logfnj(t)=fnj(t) - 1+Alfnj(t) - 112 ;

kn(8)

If nj(t) - 112 < max 1 fnj(t) - 11

1 fnj(t) - 11 .1<j<kn

j=1

j=1

Now the last-written sum is, with some 9, 191 < 1 :

- 1)dFnj(x) =E 00

12x2

f (itx+6_-_-)2 dFnj (x)

20o

t 2< 2

x d Fn j (x) = 2 .00

Hence it follows from (5) and (9) that the left member of (8) tends to 0 asn --)~ oc. From this, (7), and (6) we obtain

t2urn

{fnj(t)-1}=-2 .n+ oc

j

Taking real parts, we have00

t2nlmo

f0C

(1-costx)dFnj (x)= 2 .

Hence for each 17 > 0, if we split the integral into two parts and transpose oneof them, we obtain

t 22

(1 - cos tx) dF nj (x)

(1 - cos tx) dF n j (x)j

~xl>n

Page 233: A Course in Probability Theory KL Chung

< limn-+oo

7.2 LINDEBERG-FELLER THEOREM 1 219

2 dFn (x)fI>7?

Qn j 2

2< lim 2

~2 = 2 'n-+ooI

77

the last inequality above by Chebyshev's inequality . Since 0 < 1 - cos 9 <02 /2 for every real 9, this implies

2lim

2

x2 dFn (x) > 0,n+oo

IxI<n

the quantity in braces being clearly positive . Thus

4

k"

> lim 1 -

x2 dFn j(x) ;t2'2

n+oo

J=1 JJX1<27

t being arbitrarily large while ?I is fixed ; this implies Lindeberg's condition(1') . Theorem 7 .2.1 is completely proved .

Lindeberg's theorem contains both the identically distributed case(Theorem 6.4.4) and Liapounov's theorem. Let us derive the latter, whichassets that, under (4) and (5) of Sec . 7.1, the condition below for any onevalue of 8 > 0 is a sufficient condition for Sn to converge in dist. to t :

k"(10)

EfIxI2+S dFnj (x) ~ 0 .

j=1 o

For 8 = 1 this condition is just (10) of Sec . 7.1 . In the general case the asser-tion follows at once from the following inequalities :

2+s

fix x 2 dF,l1(x)

IxI-

f

S dFn (x)i

I>7

IxI>??

'7

Ixl 2+s dFni. (x),

showing that (10) implies (1) .The essence of Theorem 7.2.1 is the assumption of the finiteness of the

second moments together with the "classical" norming factor sn , which is thestandard deviation of the sum Sn ; see Exercise 10 below for an interestingpossibility . In the most general case where "nothing is assumed," we have thefollowing criterion, due to Feller and in a somewhat different form to Levy .

Page 234: A Course in Probability Theory KL Chung

220 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Theorem 7.2.2. For the double array (2) of Sec . 7.1 (with independence ineach row), in order that there exists a sequence of constants {a„) such that(i) Ek"_1 Xn j - a„ converges in dist. to (D, and (ii) the array is holospoudic,it is necessary and sufficient that the following two conditions hold for everyq>0:

(a ) Ekjn=1 fxl > d F„ j (x)-+ 0;

(b) ~j=1{fxl<,~x2dF„j(x)-(fxl<,,xdFnj(x))2}--* 1 .

We refer to the monograph by Gnedenko and Kolmogorov [12] for thisresult and its variants, as well as the following criterion for a single sequenceof independent, identically distributed r .v.'s due to Levy.

Theorem 7.2.3 . Let {Xj , j > 1 } be independent r .v.'s having the commond.f. F; and Sn = E'=1 Xj. In order that there exist constants an and bn > 0(necessarily bn --,, +oo) such that (Sn - a,)/b, converges in dist. to c, it isnecessary and sufficient that we have, as y -+ +oo :

(11)

y2 f dF(x) = oCf x2 dF(x) .

Ixl>y

Ixl<y

The central limit theorem, applied to concrete cases, leads to asymptoticformulas . The best-known one is perhaps the following, for 0 < p < 1, p +q = 1, and x1 < x2, as n -a oo:(12)

(fl) pkqnk-xI,/n pq<k-n p<_x?,1n pq

~(x2) -'D(x1 ) =e `' y .,vr27r

fX2_2/2dI

This formula, due to DeMoivre, can be derived from Stirling's formula forfactorials in a rather messy way . But it is just an application of Theorem 6 .4.4(or 7 .1 .2), where each X j has the Bernoullian d.f. p81 + qSo .

More interesting and less expected applications to combinatorial analysiswill be illustrated by the following example, which incidentally demonstratesthe logical necessity of a double array even in simple situations .

Consider all n ! distinct permutations (a1, a2, . . . , a n ) of the n integers(1, 2, . . ., n). The sample space Q = Qn consists of these n ! points, and J'assigns probability 1 In! to each of the points . For each j, 1 < j < n, andeach w = (a 1 , a2 , . . . , a n ) let Xnj be the number of "inversions" caused byin w; namely X„j (w) = m if and only if j precedes exactly m of the inte-

gers 1, . . . , j - 1 in the permutation w . The basic structure of the sequence{Xnj, I < j < n) is contained in the lemma below .

Page 235: A Course in Probability Theory KL Chung

7.2 LINDEBERG-FELLER THEOREM 1 221

Lemma 2. For each n, the r .v.'s {X nj, 1 < j < n } are independent with thefollowing distributions :

1'~{Xnj=m}=- for 0<m< j-1 .

J

The lemma is a striking example that stochastic independence need notbe an obvious phenomenon even in the simplest problems and may requireverification as well as discovery . It is often disposed of perfunctorily but aformal proof is lengthier than one might think . Observe first that the values ofX, 1, . . . , X n j are determined as soon as the positions of the integers 11, . . . , j }are known, irrespective of those of the rest . Given j arbitrary positions amongn ordered slots, the number of cv's in which {1, . . . , j} occupy these positions insome order is j!(n - j)! . Among these the number of co's in which j occupiesthe (j - m)th place, where 0 < m < j - 1, (in order from left to right) of thegiven positions is (j - 1)!(n - j)! . This position being fixed for j, the integers11, . . . , j - 1 } may occupy the remaining given positions in (j - 1) ! distinctways, each corresponding uniquely to a possible value of the random vector

{Xnl, . . . , X,T,j-1} •

That this correspondence is 1 - 1 is easily seen if we note that the totalnumber of possible values of this vector is precisely 1 • 2 . . . (j - 1) =(j - 1)! . It follows that for any such value (c1, . . . , cj-,) the number ofw's in which, first, (I, . . . , j} occupy the given positions and, second,X ii 1(co) = c1, . . . , X n, j_1(w) = cj_1, Xnj(w) = m, is equal to (n -j)! . Hencethe number of co's satisfying the second condition alone is equal to ( n ) (n -j)! = n!/j! . Summing over m from 0 to j - 1, we obtain the number ofw's in which X i 1(w) = c1, . . . . Xn , j - 1 (w) = cj-1 to be jn!/j! = n!/(j - 1)! .Therefore we have

n!Jl'{Xn1 =c1, . . .,Xn,j-1 =cj-1,Xnj =m)

j!

1JJ~{X n 1 = C1, . . . , Xn, j-1 = Cj-1 }

n!

J(j-1) 1

This, as we know, establishes the lemma . The reader is urged to see for himselfwhether he can shorten this argument while remaining scrupulous .

The rest is simple algebra. We find

j-1

2

j2 -1(`(Xnj) = 2 '

~n j =

12 '

( (Sn)

For each 77 > 0, and sufficiently large n, we have

lx„ j I < j -1 <n-1 <11sn .

2

n3nS2 -

4

n

36 *

Page 236: A Course in Probability Theory KL Chung

222 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Hence Lindeberg's condition is satisfied for the double array

X,, ;1< j<n,1<n

Sn

(in fact the Corollary to Theorem 7 .1 .2 is also applicable) and we obtain thecentral limit theorem :

lim ~'n --+ co

Here for each permutation w, S n (w) _ E'=1 X n j (w) is the total number ofinversions in w ; and the result above asserts that among the n ! permutationson 11, . . . , n), the number of those in which there are <n 2/4 + x(n 3/2 )/6inversions has a proportion c (x), as n -± oo. In particular, for example, thenumber of those with <n 2/4 inversions has an asymptotic proportion of 2 .

EXERCISES

1 . Restate Theorem 7 .1 .2 in terms of normed sums of a single sequence .2. Prove that Lindeberg's condition (1) implies that

max an j -* 0 .1<j<kn

* 3. Prove that in Theorem 7.2 .1, (i) does not imply (1) . [HINT : Considerr.v .'s with normal distributions .]

*4. Prove the sufficiency part of Theorem 7 .2 .1 without usingTheorem 7 .1 .2, but by elaborating the proof of the latter . [HINT : Use theexpansion

2e"X = 1 + itx + B (t2)

for Ix I >

and2

3e'rx = 1 + itx - (t2)

+ 0' It6I

for IxI < ri .

As a matter of fact, Lindeberg's original proof does not even use ch .f.'s; seeFeller [13, vol . 2] .]

5. Derive Theorem 6.4.4 from Theorem 7 .2 .1 .6. Prove that if 8

8', then the condition (10) implies the similar onewhen 8 is replaced by 8' .

Page 237: A Course in Probability Theory KL Chung

*7. Find an example where Lindeberg's condition is satisfied butLiapounov's is not for any 8 > 0 .In Exercises 8 to 10 below {Xj , j > 1 } is a sequence of independent r.v.'s .

8. For each j let Xj have the uniform distribution in [-j, j] . Show thatLindeberg's condition is satisfied and state the resulting central limit theorem .

9. Let X,. be defined as follows for some a > 1 :

fja

1,

with probability 6 j2(a_1) each ;X~ -

10,

with probability 1 - 3 j2(a-1)

Prove that Lindeberg's condition is satisfied if and only if a < 3/2 .* 10. It is important to realize that the failure of Lindeberg's condition

means only the failure of either (i) or (ii) in Theorem 7 .2.1 with the specifiedconstants s, A central limit theorem may well hold with a different sequenceof constants . Let

Prove that Lindeberg's condition is not satisfied . Nonetheless if we take b, _11 3/18, then S„ /b„ converges in dist . to (D . The point is that abnormally largevalues may not count! [HINT : Truncate out the abnormal value .]

11 . Prove that fo x2 d F (x) < oo implies the condition (11), but notvice versa .

* 12. The following combinatorial problem is similar to that of the numberof inversions . Let Q and '~P be as in the example in the text . It is standardknowledge that each permutation

1a,

7.2 LINDEBERG-FELLER THEOREM 1 223

with probability each ;12j2

with probability 12 each ;

with probability 1 - 1 - 16 6j2

2a2

can be uniquely decomposed into the product of cycles, as follows. Considerthe permutation as a mapping n from the set (1, . . ., n) onto itself suchthat 7r(j) = aj . Beginning with 1 and applying the mapping successively,1 --* 7r(1) - 2r2 (1) -* . . ., until the first k such that rrk (1) = 1 . Thus

(1, 7r(l), 7r2(1), . . . , 7rk-1(1))

Page 238: A Course in Probability Theory KL Chung

224 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

is the first cycle of the decomposition . Next, begin with the least integer,say b, not in the first cycle and apply rr to it successively; and so on . Wesay 1 --+ r(1) is the first stepk-1 (1) --f 1 the kth step, b --k 7r(b) the(k + 1)st step of the decomposition, and so on . Now define Xnj (cv) to be equalto 1 if in the decomposition of co, a cycle is completed at the jth step ; otherwiseto be 0. Prove that for each n, {Xnj , 1 < j < n} is a set of independent r.v.'swith the following distributions :

~P{Xnj=1}=n

1

- j+1'

1~'{Xnj=O}=1-

n-j+1

Deduce the central limit theorem for the number of cycles of a permutation .

7.3 Ramifications of the central limit theorem

As an illustration of a general method of extending the central limit theoremto certain classes of dependent r.v.'s, we prove the following result. Furtherelaboration along the same lines is possible, but the basic idea, attributed toS . Bernstein, consists always in separating into blocks and neglecting small ones .

Let {Xn , n > 1 } be a sequence of r.v.'s ; let ~n be the Borel field generatedby {Xk, 1 < k < n), and ,,7 ' that by {Xk, n < k < oo}. The sequence is calledm-dependent iff there exists an integer m > 0 such that for every n the fields`gin and z~„+m are independent . When m = 0, this reduces to independence .

Theorem 7.3.1 . Suppose that {Xn } is a sequence of m-dependent, uniformlybounded r.v.'s such that

(T(Sn) - +oon 1/3

as n -->. oo. Then [Sn - f (Sn )]/6(Sn ) converges in dist. to (P .

PROOF . Let the uniform bound be M . Without loss of generality we maysuppose that f (Xn ) = 0 for each n . For an integer k > 1 let n j = [jn /k],0 < j < k, and put for large values of n :

Yj =Xnj+l +Xn j+2 + - • . +Xn ;+1-in ;

Zj = Xn jti -m+1 +Xnj+i -m+2 + . . . +Xnj+i .

We have thenk-1

Y1 +j=0

j=0

k-1+ s 11

, say.

Page 239: A Course in Probability Theory KL Chung

It follows from the hypothesis of m-dependence and Theorem 3 .3 .2 that theYd's are independent; so are the Zj's, provided nj+l - m + 1 - nj > m,which is the case if n 1k is large enough . Although S;, and S," are notindependent of each other, we shall show that the latter is comparativelynegligible so that S, behaves like S ;, . Observe that each term X, in S ;,'is independent of every term X, in Sn except at most m terms, and that('(X,X,) = 0 when they are independent, while I ci (X,X,) I < M2 otherwise .Since there are km terms in S ;;, it follows that

I(y(S;,Sn)I <km .m .M2=k(mM)2 .

We have alsok-1

(S"n) =

e(Z~) < k(mM )2j=o

From these inequalities and the identity

Gg(Sn ) = G,-(S'n ) + 2(f, (Sn Sn ) + AV n )

we obtaine`'(Sn) - (ff(S'n) I < 3km2M2 .

Now we choose k = kn = [n2/3] and write s2 = AS2) = Q2(Sn ) + s'2n

n

=n(S'n) = Q2 (Stt ) . Then we have, as n ~ oo .

(1)

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 1 225

Sn

and

S,)2(2)

( n

0.Sn

Hence, first, Sri /Sn -+ 0 in pr. (Theorem 4 .1 .4) and, second, since

Sn

Sn S;,

Sn

Sn

Sn Sn

Sn

Sn /s11 will converge in disc. to

(D if Sn /s'n does

.Since kn is a function of n, in the notation above Yj should be replaced

by Y„j to form the double array {Yni, 0 < j < kn - 1, 1 < n}, which retainsindependence in each row. We have, since each Yn j is the sum of no morethan [n/kn] + 1 of the X„'s,

IY,,i1 < 11 -+1 M=O(nV3)=o(Sn)=o(S ;,)>Ell

Page 240: A Course in Probability Theory KL Chung

226 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

the last relation from (1) and the one preceding it from a hypothesis of thetheorem . Thus for each 77 > 0, we have for all sufficiently large n

J x > 27 S'n

where F,, j is the d.f. of Y, j . Hence Lindeberg's condition is satisfied for thedouble array {Ynj/s;,}, and we conclude that S;,/sn converges in dist. to (D .This establishes the theorem as remarked above .

The next extension of the central limit theorem is to the case of a randomnumber of terms (cf. the second part of Sec . 5 .5) . That is, we shall deal withthe r .v . Svn whose value at c) is given by Svn (w) (co), where

n

S"(60 )

X - ( )j=1

as before, and {vn (c)), n > 1 } is a sequence of r.v.'s . The simplest case,but not a very useful one, is when all the r.v.'s in the "double family"{Xn , Vn , n > 1 } are independent . The result below is more interesting and isdistinguished by the simple nature of its hypothesis . The proof relies essentiallyon Kolmogorov's inequality given in Theorem 5 .3 .1 .

Theorem 7.3.2 . Let {Xj , j > 1 } be a sequence of independent, identicallydistributed r .v.'s with mean 0 and variance 1 . Let {v,,, n > 1} be a sequenceof r.v.'s taking only strictly positive integer values such that

x2 dFnj (x) = 0, 0 < j < k n - 1,

(3)

where c is a constant: 0 < c < oc. Then Svn / vn converges in dist. to (D .

PROOF . We know from Theorem 6 .4.4 that S 7, / Tn- converges in dist . toJ, so that our conclusion means that we can substitute v n for n there. Theremarkable thing is that no kind of independence is assumed, but only the limitproperty in (3) . First of all, we observe that in the result of Theorem 6 .4.4 wemay substitute [cn] (= integer part of cn) for n to conclude the convergencein dist . of S[un]/,/[cn] to c (why?) . Next we write

(4)

Vn

nin pr .,

S Vn _

S[cn]

Sv, - S[cn]

Vn

/[cn ] + f[cn ]

Sv n - S[`n] -a 0

in pr .,/[cn ]

Vn

The second factor on the right converges to 1 in pr ., by (3) . Hence a simpleargument used before (Theorem 4 .4.6) shows that the theorem will be provedif we show that

Page 241: A Course in Probability Theory KL Chung

Let E be given, 0 < E < 1 ; put

an = [(1 - E3 )[cn]], bn = [( 1 + E 3 )[cn]] - 1 .

By (3), there exists no(E) such that if n > no(E), then the set

A = { C) : an < vn (co) < bn }

has probability > 1 -,e . If co is in this set, then S„ n (W) (co) is one of the sumsS j with an < j < b n . For [cn ] < j < bn , we have

Sj - S[cn ] = X[cn ]+1 + X[cn]+2 +

+Xj ;

hence by Kolmogorov's inequality

JP

max I Sj - S[cn] I > E cn _< 62 (Sbn - S[cn]) < E3 [cn] < E .

}[cn]<j<bn

E2Cn

E2Cn

A similar inequality holds for an < j < [cn] ; combining the two, we obtain

~P{ max I Si - S[cn] I > E cn } < 2E .an -<j <bn

Now we have, if n > no(E) :

S v n - S[cn]~/[cn ]

<

an <j <bn

J

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM ( 227

lvn =j; max ISj -S[cn]I > E\[cn]} +an <j <b,,

j [an,bn]

< f~ ~ max I Sj - S[cn I I > E \/[cn ] } + UP{ vn 0 [an, bn I ]a„ -<j-<b,,

< 2E + 1 - A} < 3E .

Since c is arbitrary, this proves (4) and consequently the theorem .

As a third, somewhat deeper application of the central limit theorem, wegive an instance of another limiting distribution inherently tied up with thenormal. In this application the role of the central limit theorem, indeed inhigh dimensions, is that of an underlying source giving rise to multifariousmanifestations . The topic really belongs to the theory of Brownian motionprocess, to which one must turn for a true understanding .

Page 242: A Course in Probability Theory KL Chung

228 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Let {Xj , j > 1 } be a sequence of independent and identically distributedr.v.'s with mean 0 and variance 1 ; and

Sn =j=1

It will become evident that these assumptions may be considerably weakened,and the basic hypothesis is that the central limit theorem should be applicable .Consider now the infinite sequence of successive sums {S, 1 , n > I) . This isthe kind of stochastic process to which an entire chapter will be devoted later(Chapter 8). The classical limit theorems we have so far discussed deal withthe individual terms of the sequence [S, j itself, but there are various othersequences derived from it that are no less interesting and useful . To give afew examples :

max Sin ,

min Sin ,

max IS,, 1,

max1<m<n

1 <m_<<n

1<m<n

1<m<n

n

n

E 3. (Sin) ,

Y(Sm, Sm+1) ;M=1 m=1

where y(a, b) = 1 if ab < 0 and 0 otherwise. Thus the last two examplesrepresent, respectively, the "number of sums > a" and the "number of changesof sign" . Now the central idea, originating with Erdos and Kac, is that theasymptotic behavior of these functionals of S n should be the same regardless ofthe special properties of {Xj }, so long as the central limit theorem applies to it(at least when certain regularity conditions are satisfied, such as the finitenessof a higher moment) . Thus, in order to obtain the asymptotic distribution ofone of these functionals one may calculate it in a very particular case wherethe calculations are feasible. We shall illustrate this method, which has beencalled an "invariance principle", by carrying it out in the case of max S in ; forother cases see Exercises 6 and 7 below .

Let us therefore put, for a given x :

P„ (x) _ 3, max Sin < x~/n } .~I<m<n

For an integer k > 1 let n i = [jn Jk], 0 < j < k, and define

Rnk (x)

max Sn , < xl } .I<j<k

Let also

E_j = {to: S in ((o) < xJ, I < m < j ; S.j (w) > x/) ;

Page 243: A Course in Probability Theory KL Chung

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 1 229

and for each j, define f (j) by

nt(j)-1 < j < nt(j)-

Now we write, for 0 < E < x :

nI max Sm > x,~n- } _

°'{Ej ; ISnt(j) - Sj I < 6,1n)1<m<n

1:2

j=1

~Y {Ej ; ISnt(j) - Sj I > Eln} = L + E,j=1

1

2

say. Since Ej is independent of { ISnt(j) - Sj I > Efn- } and Q2 (S",(;) - Sj) <n 1k, we have by Chebyshev's inequality :

n

j=1

n

9(Ej)E2n -< E2k .

On the other hand, since Sj > x~ and 1 Snt(j) - Sj I < EN n imply Snt(j) >(x - E) ./n, we have

< ~P max Sn t > (x - E),.,fn} = 1 - Rnk(x - E) .1

11<t<k

It follows that

(5)

Pn(x)=1 -

>-Rnk(x-E)-2

Since it is trivial that Pn (x) < Rnk (x), we obtain from (5) the followinginequalities :

(6)

Pn (x) < Rnk (x') < Pn (x + E) +1

E2k

We shall show that for fixed x and k, limn, R,,k(x) exists . Since

Rnk(x) = 9"{Sn i < x-1n, •Sn2 < xInSnk < xfn-},

it is sufficient to show that the sequence of k-dimensional random vectors

V nSnj3

n n2+ . . .,

n

k

1

E2k

Page 244: A Course in Probability Theory KL Chung

230 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

converges in distribution as n -f oc . Now its ch.f. f (t i , . . . , tk ) is given by

F{exp(iVk/n(tjS ni + . . . + t kSnk ))) = d,`'{exp(i-\/k/n[(ti + . . . + tk )S n,

+ (t2 + . . . + tk) (Sn 2 - S'1)

+ . . . + tk(Snk - Sn k_ I )D},which converges to

(7)

exp [- i (t i + . . . + tk)2 ] exp [-21(t2 + . . .

tk )2] . . . exp (_ ztk) ,

since the ch .f.'s of

V

k

kn S17 1 9

- (Snz - Sn, ), . . . 7

n (Snk -'Snk-1 )

all converge to e_ t2/2 by the central limit theorem (Theorem 6 .4.4), n.i+l -n j being asymptotically equal to n/k for each j . It is well known that thech.f. given in (7) is that of the k-dimensional normal distribution, but for ourpurpose it is sufficient to know that the convergence theorem for ch .f.'s holdsin any dimension, and so Rnk converges vaguely to R ... k, where R00k is somefixed k-dimensional distribution .

Now suppose for a special sequence {Xj } satisfying the same conditionsas IX

j.}, the corresponding Pn can be shown to converge ("pointwise" in fact,

but "vaguely" if need be) :

(8)

Vx: hm Pn (x) = G(x) .n -± 00

Then, applying (6) with Pn replaced by Pn and letting n ---). oc, we obtain,since Rook is a fixed distribution :

1G(x) < Rook (x) < G(x + E) + Elk .

Substituting back into the original (6) and taking upper and lower limits,we have

1

1G(x - E) - 2 < Rook (x - E) -

62k < lim P n (x)Ek

Ek

n1

< Rook(x) < G(x +c) +Elk

.

Letting k - oo, we conclude that P n converges vaguely to G, since E isarbitrary .

It remains to prove (8) for a special cWoice of {Xj } and determine G . Thiscan be done most expeditiously by taking the common d.f. of the X D 's to be

Page 245: A Course in Probability Theory KL Chung

the symmetric Bernoullian 2 (8 1 + S_ 1 ) . In this case we can indeed computethe more specific probability

(9)

UT max Sm < x ; Sn = y ,1<_m<n

where x and y are two integers such that x > 0, x > y . If we observe thatin our particular case maxl<m< n Sin > x if and only if Sj = x for some1 < j < n, the probability in (9) is seen to be equal to

`'T{Sn = y} - !~71' max S in > x ; S, = yI <m<n

n

= f{Sn = y} - l . -:P{Sm < x, 1 < m < j ; Sj = x; S n = y}j=1n

GP'{Sm <x,l<m<j;Sj=x;Sn -Sj=y-x}j=1

j=1

7 .3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 1 231

{Sm <x,1<m< j;Sj =x}-6pflSn -Sj=x-y}

n

A{Sm < x, 1 < m < j ;Sj =x ;Sn -Sj =x - y}

j=1

n

,J-'{S, n < x, 1 < m < j ; S j = x; S n = 2x - y}

j=1

j,

_ 9{Sn = } -

J'{Sm < x, 1 < m < j ; Sj = xV{Sn - Sj = y - x},j=1

where the last step is by independence. Now, the r.v .n

S71 -Sj=

Xmm=j+1

being symmetric, we have ?/ {Sn - S j = y - x} = Y{Sn - Sj = x - y} .Substituting this and reversing the steps, we obtain

n

_ '{Sn = y} - J' max Sm > x; S n = 2x - y .l<n<n

Since 2x - y > x, S„ = 2x - y implies maxi<m <n Sm > x, hence the last linereduces to

(10)

-~{Sn = y} - GJ'{Sn = 2x - y},

Page 246: A Course in Probability Theory KL Chung

232 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

and we have proved that the value of the probability in (9) is given by(10) . The trick used above, which consists in changing the sign of every X jafter the first time when S, reaches the value x, or, geometrically speaking,reflecting the path { (j, Si), j > 1 ] about the line Sj = x upon first reaching it,is called the "reflection principle" . It is often attributed to Desire Andre in itscombinatorial formulation of the so-called "ballot problem" (see Exercise 5below) .

The value of (10) is, of course, well known in the Bernoullian case, andsumming over y we obtain, if n is even :

1

n

n~I max Sm <x

2n

n-Y - n-2x+y1<m<n

Y<x

2

21

n ) ( nn-y _ n+

Y<x 2n

2

21

n2n n-x

n+x J2 <~< 2

1

n l

1

n

2n

j/ + 2n n + x ,

~<2

2

where (~) = 0 if j j j > n or if j is not an integer. Replacing x by x J (or[x,jn_] if one is pedantic) in the last expression, and using the central limittheorem for the Bernoullian case in the form of (12) of Sec. 7.2 with p = q =

i , we see that the preceding probability tends to the limit

1

x

V ? Jx e_y2/2 d y

27t j e_Y2/2dy=

as n -+ oc . It should be obvious, even without a similar calculation, that thesame limit obtains for odd values of n . Finally, since this limit as a functionof x is a d .f. with support in (0, oc), the corresponding limit for x < 0 mustbe 0 . We state the final result below .

Theorem 7.3 .3 . Let {Xj , j > 0) be independent and identically distributedr.v.'s with mean 0 and variance 1, then (max, <m <n S,,,)14 converges in dist .to the "positive normal d.f." G, where

Vx: G(x) = (20(x) - 1) V 0 .

Page 247: A Course in Probability Theory KL Chung

EXERCISES

1. Let {Xj , j > 1 } be a sequence of independent r.v .'s, and f a Borelmeasurable function of m variables. Then if ~k = f (Xk+l, . . . , Xk+,n ), thesequence {~ k , k > 1) is (m - 1)-dependent .

*2. Let {Xj , j > 1} be a sequence of independent r .v .'s having theBernoullian d .f. p81 + (1 - p)8o , 0 < p < 1 . An r-run of successes in thesample sequence {Xj (w), j > 1 } is defined to be a sequence of r consecutive"ones" preceded and followed by "zeros" . Let Nn be the number of r-runs inthe first n terms of the sample sequence . Prove a central limit theorem for N n .

3. Let {Xj , vj , j > 1} be independent r .v.'s such that the vg's are integer-valued, vj --~ oc a.e ., and the central limit theorem applies to (Sn - an)/b,,where S„ _ E~=1 Xj , an , bn are real constants, bn -+ oo. Then it also appliesto (SVn - aV n )lb,,,

* 4. Give an example of a sequence of independent and identicallydistributed r.v.'s {X„ } with mean 0 and variance I and a sequence of positiveinteger-valued r .v.'s vn tending to oo a .e. such that S„n /s„, does not convergein distribution . [HINT : The easiest way is to use Theorem 8 .3.3 below.]

*5. There are a ballots marked A and b ballots marked B . Suppose thatthese a + b ballots are counted in random order . What is the probability thatthe number of ballots for A always leads in the counting?

6. If {Xn } are independent, identically distributed symmetric r.v.'s, thenfor every x > 0,

1

-~~ ~{Ix, I>x){IsnI >x} > 12 ;,~{max IXkI >x}

1<k<n

7. Deduce from Exercise 6 that for a symmetric stable r .v. X withexponent a, 0 < a < 2 (see Sec . 6 .5), there exists a constant c > 0 such thatJ~{IXI > n 1 l') > c/n . [This is due to Feller ; use Exercise 3 of Sec . 6 .5 .]

8. Under the same hypothesis as in Theorem 7 .3 .3, prove thatmaxi< n,<„ Is . . . 1/ n converges in dist. to the d.f. H, where H(x) = 0 forx < 0 and

4

(-1)k

(2k+1)2n200

H (x)_ 7T "' 2k + 1 exp -

8x2

for x > 0 .k=0

[HINT : There is no difficulty in proving that the limiting distribution is thesame for any sequence satisfying the hypothesis . To find it for the symmetricBernoullian case, show that for 0 < z < x we have

-z < min S,n < max Sn, < x - z ; Sn = y - z1 <,n<n

1 <m<n

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 1 233

Page 248: A Course in Probability Theory KL Chung

234 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

where

1I)n

00

This can be done by starting the sample path at z and reflecting at both barriers0 and x (Kelvin's method of images). Next, show that

lim :'7{-z J < Sm < (x - z),1n for 1 < m < n}n-+0C

2kx-z

(2k-1)x-z

1

(2k+1)x-z

2kx-zeye/2 d y .

Finally, use the Fourier series for the function h of period 2x:

_

1,

if - x - z < y < -z ;h(~ )

+1,

if - z < y < x - z ;

to convert the above limit to

4

1

. (2k + 1)7rz

(2k + 1)27r2

2k + 1sin

x exp -

2X2

].

=0

This gives the asymptotic joint distribution of

(f(Nn)- r !

n

n(n+2+Y_Z

kx- (n+2_Y_Z)} .kx2

2

min Sm and max Sm,1 <m<n

1 <m<n

of which that of max, <,,,<„ 1S.1 is a particular case .9 . Let {Xj > 1 } be independent r.v.'s with the symmetric Bernoullian

distribution . Let N, (w) be the number of zeros in the first n terms of thesample sequence {Sj (c)), j > 11 . Prove that N,,/.,In converges in dist . to thesame G as in Theorem 7 .3 .3 . [HINT : Use the method of moments . Show thatfor each integer r > 1 :

O<ji< . . .<jr<n/2

P21 - J~{S2j - 0}

2j

1

1-

j

22 j

P2ji P2(j2-j,). . .

P2(j.-j -1 )

as j -,- oo . To evaluate the multiple sum, say E (r), use induction on r asfollows . If

(

n lr/2

~(r) tiCr 2

Page 249: A Course in Probability Theory KL Chung

as n -~~ oo, then

(r + 1)

c,

z-1 /2 (1 - z)r12dz (2) (r+1>/2.

~ o

Thus

Finally

2r/2r r + 1

N„ r

F(r+ 1)

2e

2r12r(2+1) -

r 1(2)

This result remains valid if the common d .f. F of Xj is of the integer latticetype with mean 0 and variance 1 . If F is not of the lattice type, no S„ needever be zero-but the "next nearest thing", to wit the number of changes ofsign of S,,, is asymptotically distributed as G, at least under the additionalassumption of a finite third absolute moment .]

7 .4 Error estimation

Questions of convergence lead inevitably to the question of the "speed" ofconvergence-in other words, to an investigation of the difference betweenthe approximating expression and its limit . Specifically, if a sequence of d .f.'sF„ converge to the unit normal d .f. c, as in the central limit theorem, whatcan one say about the "remainder term" F„ (x) - c (x)? An adequate estimateof this term is necessary in many mathematical applications, as well as fornumerical computation . Under Liapounov's condition there is a neat "orderbound" due to Berry and Esseen, who improved upon Liapounov's older result,as follows .

Theorem 7.4.1 . Under the hypotheses of Theorem 7 .1 .2, there is a universalconstant A 0 such that

(1)

sup IF,, (x) - (D (x)I < A0F,x

where F„ is the d.f. of S,, .

Cr+ 1 = Cr

r(2+1)

rr+1 +12

7.4 ERROR ESTIMATION 1 235

0

00

xrdG(x) .

Page 250: A Course in Probability Theory KL Chung

236 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

In the case of a single sequence of independent and identically distributedr.v .'s {Xj, j > 1 } with mean 0, variance 62 , and third absolute moment y < oc,the right side of (1) reduces to

ny

Aoy 1Ao (n j2)3/2 =9 3 n 1/2'

H. Cramer and P. L. Hsu have shown that under somewhat stronger condi-tions, one may even obtain an asymptotic expansion of the form :

H1(x) H2(x)+

H3(x)+F" (x) _ ~(x)+ n1/2 + n

n3/2

. . .

where the H's are explicit functions involving the Hermite polynomials . Weshall not go into this, as the basic method of obtaining such an expansion issimilar to the proof of the preceding theorem, although considerable technicalcomplications arise. For this and other variants of the problem see Cramer[10], Gnedenko and Kolmogorov [12], and Hsu's paper cited at the end ofthis chapter.

We shall give the proof of Theorem 7 .4 .1 in a series of lemmas, in whichthe machinery of operating with ch .f.'s is further exploited and found to beefficient .

Lemma 1. Let F be a d.f., G a real-valued function satisfying the condi-tions below :

(i) lim1,_00 G(x) = 0, lima,+. G(x) = 1 ;(ii) G has a derivative that is bounded everywhere : supx jG'(x)i < M .

Set

1(2)

= 2M sup I F(x)- G(x)l .

Then there exists a real number a such that we have for every T > 0:

(

T° 1 - cosx(3)

2MTA { 3/

2 dx - nl o

x

cos TxJ

2 {F(x + a) - G(x + a)} dxX2

PROOF . Clearly the A in (2) is finite, since G is everywhere bounded by(1) and (ii) . We may suppose that the left member of (3) is strictly positive, forotherwise there is nothing to prove ; hence A > 0. Since F - G vanishes at+oc by (i), there exists a sequence of numbers {x„ } converging to a finite limit

Page 251: A Course in Probability Theory KL Chung

Let

(iii) G is of bounded variation in (-oo, oo) ;(iv) f00 JF(x) - G(x){ dx < oo.

00f(t) _ J e` tx dF(x), g(t) =

e`fxdG(x) .00

-00

7.4 ERROR ESTIMATION 1 237

b such that F(x„) - G(x„) converges to 2MA or -2MA . Hence either F(b) -G(b) = 2MA or F(b-) - G(b) = -2MA . The two cases being similar, weshall treat the second one . Put a = b - A ; then if Ix I < A, we have by (ii)and the mean value theorem of differential calculus :

G(x + a) > G(b) + (x - A)M

and consequently

F(x + a) - G(x + a) < F(b-) - [G(b) + (x - 0)M] _ -M(x + A) .

It follows that

° 1 - cos Tx

f° 1 - cos Tx~°x2 {F(x+a)-G(x+a)}dx < -M

I °

x2 (x+0)dx

= -2MA ~°1 - cos Tx

dx;Jo

x2

-°f

+J

00 1 - cos Tx2 {F(x+a)-G(x+a)}dx

°

x

°

1 - cos Tx

00 1 - cos Tx< 2M A ~~ + 2 dx = 4M A 2 dx.

f00)

x

°

x

Adding these inequalities, we obtain

I 1 - cos Tx

A

fO°2 {F(x+a)-G(x+a)}dx < 2MA

-[+2Jf00

x

0

°

1 - cos Tx

°

00 I - cos Txx2

dx - 2MA -30

+ 20

x2 dx.

This reduces to (3), since

f00 1 -cos Tx

7T

2dx = -

o

x

2by (3) of Sec . 6.2, provided that T is so large that the left member of (3) ispositive; otherwise (3) is trivial .

Lemma 2. In addition to the assumptions of Lemma 1, we assume that

Page 252: A Course in Probability Theory KL Chung

238 1CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Then we have

(4)

p < 1 ~T I f(t ) - g(t)1 dt + 127rM Jo

t 7rTPROOF . That the integral on the right side of (4) is finite will soon be

apparent, and as a Lebesgue integral the value of the integrand at t = 0 maybe overlooked . We have, by a partial integration, which is permissible onaccount of condition (iii) :

(5)

00f (t) - g(t) _ -it f {F(x) - G(x)}e itx dx;00

and consequently

f(t)-g(t)_1t4 = °°

-it

e

f{F(x + a)- G(x + a)}e'tx dx .

In particular, the left member above is bounded for all t ; 0 by condition (iv) .Multiplying by T - Itl and integrating, we obtainfT

f(t)-g(t) .JT

it

e-1ta(7, - Itl)dt

f-T

T °0

f {F(x0c

+ a) - G(x + a)}e` tx (T - ItI) dx A

We may invert the repeated integral by Fubini's theorem and condition (iv),and obtain (cf. Exercise 2 of Sec . 6.2) :00

{F(x + a) - G(x + a)) 1 -x2

cos Tx~ dx < TpT

I f(t)-g(t)Idt.Jo

tIn conjunction with (3), this yields

(7)

2M0 3 fTI 1x sx

dx,-n< T I f(t)-g(t)1A0 t

The quantity in braces in (7) is not less than

3f'1 -cos x

00 2f

x2 dx-3 ~ 2 dx-7r= 7r 6

JTE X2

2 TOUsing this in (7), we obtain (4) .

Lemma 2 bounds the maximum difference between two d .f.'s satisfyingcertain regularity conditions by means of a certain average difference betweentheir ch .f.'s (It is stated for a function of bounded variation, since this is needed

(6)

Page 253: A Course in Probability Theory KL Chung

for the asymptotic expansion mentioned above) . This should be compared withthe general discussion around Theorem 6 .3 .4. We shall now apply the lemmato the specific case of F„ and (D in Theorem 7 .4.1 . Let the ch.f. of F„ be f

so thatkn

f n (t)= 11 fnj (t) .j=1

Lemma 3. For It I < 1/(21"113), we have

(8)

If, (t) - e -`2/2 1 < 1' n ItI 3e_t2/2 .PROOF . We shall denote by 9 below a "generic" complex number with

101 < 1, otherwise unspecified and not necessarily the same at each appearance .By Taylor's expansion :

2t

1 -Qnj t2 +9Yni t3 .

fn' ( )

2

6

For the range of t given in the lemma, we have by Liapounov's inequality :

(9)

Ianjtl <_ (Ynj'i3ti < lrn li3 ti < 2 ,

so that2 , t2 + 9y6 t32 1

1

1<

8-~ 48 < 4 .

Using (8) of Sec . 7 .1 with A = 9/2, we may write

to

t

-a/

t2 + BY, j t3 + 9 _ a~ jt 2 + 9Ynjt -gf"jO- 2

6

2

2

6

The absolute value of the last term above is less than

a4jt4

Y,jt

(

6

a,z ;ltl

Ynjltl 3

3

1 + 1

I 13

4 + 36 c

4 + 36 ) Y"j itl - (4.2 36 .8 Y~'j t

by (9) ; hence

a- 2

1

1

1

3

°~j 2 9

3log f„i(t)_- 2 - t - +

6+8+288Ynjltl_ _-2t +2Ynjt .

Summing over j, we have

t2

9

3log f„(t) _ -

2+ 2-1',t

7.4 ERROR ESTIMATION l 239

Page 254: A Course in Probability Theory KL Chung

240 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

or explicitly :

Since l e" - 1( < I u I eH"H for all u, it follows that

(f„ (t)e2 /2 - 1( < r"2tI3 exphn2tl3

Since I'„ I t 13 /2 < 1/16 and e 1116 < 2, this implies (8) .

Lemma 4. For I t I < 1/(4r,,), we have

(10)

I f,,(t)I < e-t2/3

.

PROOF . We symmetrize (see end of Sec . 6.2) to facilitate the estimation .We have

00If„j (t)1 2 = L f

cos t(x - y)dF, j (x) dFnj (Y),

since I f „ j I 2 is real . Using the elementary inequalities

cosu-1+2

< (~ 3

Ix - YI3< 4(IxI3 + IY1 3 ) ;

we see that the double integral above does not exceed0o x

t2

2

{1- 2(x2- 2xy+Y2 )+3jti 3 (jX1 3 +IY1 3 ) dF, 1 j(x)dF„j(Y)00

0C

= 1 -621t2 4

4+ 3ynjItl3 < exp -Q2jt2 + 3y„jjt1 3

Multiplying over j, we obtain

Ifn(t)1 2 < eXp __t2+ 3rnIt13 <e-(2/3)t2

for the range of t specified in the lemma, proving (10) .

Note that Lemma 4 is weaker than Lemma 3 but valid in a wider range .We now combine them .

Lemma 5. For I t I < 1/(41,,),

, we have

(11)

e-t2/2I< 16T»ItI 3 e_t2/3 .

t 2log f n (t) + 2

< 1 2 r'„ ItI 3 .

Page 255: A Course in Probability Theory KL Chung

PROOF . If Its < 1/(21, 1/3), this is implied by (8) . If 1/(2r n 1/3 ) < I tI <1/(4I' n ), then 1 < 81, ,, jt{ 3 , and so by (10) :

If, (t) - e-" /21< If, (t)l + e _t2 /2 < 2e_t2/3 < 16F,, ItI 3 e

_t2/3.

PROOF OF THEOREM 7.4.1 . Apply Lemma 2 with F = F,, and G = (D. TheM in condition (ii) of Lemma 1 may be taken to be

2rr, since both F„ and '

have mean 0 and variance 1, it follows from Chebyshev's inequality that

1F(x) V G(x) < ,

if x < 0,x21

(1 - F(x)) V (1 - G(x)) <x2,

if x > 0 ;

and consequently

Vx: IF(x) - G(x)i <1

.X2

Thus condition (iv) of Lemma 2 is satisfied . In (4) we take T = 1/(4f,) ; wehave then from (4) and (11) :

211/(4r)

nIf,(t)-e-`2/21dt+ 96 I'sux

7r fop ~ Fn(x) - ~(x)I <

t

~/Z 7r3„

32F

1/(4F)<n

tee-t2/3dt + 96 rn7r 0

/273

:!s 1,

32

t2o-r2ls dt +

967r 0

-,/2jr3

This establishes (1) with a numerical value for A0 (which may be somewhatimproved) .

Although Theorem 7 .4.1 gives the best possible uniform estimate of theremainder F,, (x) - c (x), namely one that does not depend on x, it becomesless useful if x = x,, increases with n even at a moderate speed . For instance,we have

00

1 - F„ (x,,) = 1/ e

_y2/2dy + O(h n ),

4/27c Xn

where the first "principal" term on the right is asymptotically equal to

1 -„/2N/2rrx„

e

Hence already when x, = .~/2log(1/l,n) this will be o(Tn ) for F -+ 0 andabsorbed by the remainder . For such "large deviations", what is of interest is

7.4 ERROR ESTIMATION 1 241

Page 256: A Course in Probability Theory KL Chung

242 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

an asymptotic evaluation ofI - F,(x,)1 - 4)(x,)

as x„ -a oc more rapidly than indicated above . This requires a different typeof approximation by means of "bilateral Laplace transforms", which will notbe discussed here .

EXERCISES

1. If F and G are d .f.'s with finite first moments, then~DO

IF(x) - G(x)I dx < oo .

[HINT : Use Exercise 18 of Sec . 3 .2 .]2. If f and g are ch.f.'s such that f (t) = g(t) for Itl < T, then

00

TIF(x)-G(x)Idx< ./

- T

This is due to Esseen (Acta Math, 77(1944)) .*3. There exists a universal constant A 1 > 0 such that for any sequence

of independent, identically distributed integer-valued r .v.'s {X j} with mean 0and variance 1, we have

sup IF., (x) - (D (x) I ? 1/2'

where F t? is the d.f. of (E~,1 X j)/~. [HINT : Use Exercise 24 of Sec . 6.4.]4. Prove that for every x > 0 :

00X e-x2/2 <

e-y2/2 d < le -x2 /21 -+- x2

I

- - x

7 .5 Law of the iterated logarithm

The law of the iterated logarithm is a crowning achievement in classical proba-bility theory . It had its origin in attempts to perfect Borel's theorem on normalnumbers (Theorem 5 .1 .3) . In its simplest but basic form, this asserts : if N„ (w)denotes the number of occurrences of the digit 1 in the first n places of thebinary (dyadic) expansion of the real number w in [0, 1], then Nn (w) n/2for almost every w in Borel measure . What can one say about the devia-tion N, (a)) - n/2? The order bounds O(n(l/21+E) E > 0; O((n log n) 1 / 2 ) (cf.

Page 257: A Course in Probability Theory KL Chung

7.5 LAW OF THE ITERATED LOGARITHM 1 243

Theorem 5.4.1) ; and O((n log logn) 1 /2) were obtained successively by Haus-dorff (1913), Hardy and Littlewood (1914), and Khintchine (1922) ; but in1924 Khintchine gave the definitive answer :

for almost every w . This sharp result with such a fine order of infinity as "loglog" earned its celebrated name. No less celebrated is the following extensiongiven by Kolmogorov (1929) . Let {Xn , n > 1} be a sequence of independentr.v.'s, S, = En=, Xj ; suppose that S'(X„) = 0 for each n and

(1)

sup IX, (w)I = ° ( iogiogsn ) 'S

where s,2 = U2(S"), then we have for almost every w :

(2)

limS"(6))

= 1 .n >oo ,/2S2 log log Sn

The condition (1) was shown by Marcinkiewicz and Zygmund to be of thebest possible kind, but an interesting complement was added by Hartman andWintner that (2) also holds if the Xn 's are identically distributed with a finitesecond moment. Finally, further sharpening of (2) was given by Kolmogorovand by Erdos in the Bernoullian case, and in the general case under exactconditions by Feller; the "last word" being as follows : for any increasingsequence qpn , we have

?'Ns" (w) > S,, (p„ i.o . ) _ { 01

according as the series

00

<

ne -fin /2n

,~=1

=

We shall prove the result (2) under a condition different from (1) andapparently overlapping it . This makes it possible to avoid an intricate estimateconcerning "large deviations" in the central limit theorem and to replace it byan immediate consequence of Theorem 7 .4.1 .* It will become evident that the

00 .

* An alternative which bypasses Sec . 7 .4 is to use Theorem 7 .1 .3 ; the details are left as anexercise .

nNn (w) -

lim2

1n-+oo 12 n log log n

Page 258: A Course in Probability Theory KL Chung

244 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

proof of such a "strong limit theorem" (bounds with probability one) as thelaw of the iterated logarithm depends essentially on the corresponding "weaklimit theorem" (convergence of distributions) with a sufficiently good estimateof the remainder term .

The vital link just mentioned will be given below as Lemma 1 . In the*at .of this section "A" will denote a generic strictly positive constant, notnecessarily the same at each appearance, and A will denote a constant suchthat IA I < A . We shall also use the notation in the preceding statement ofKolmogorov's theorem, and

n

Yn = ((ox"I 3 ), rnj=1

(4)

(5)

cP(), , x) = 2X

xe- y'/2 dy

as in Sec. 7.4, but for a single sequence of r.v.'s . Let us set also

log log x, 'k > 0, x > 0 .

Lemma 1. Suppose that for some E, 0 < E < 1, we have

(3)

rn <ASn

(log Sn )1+E

Then for each b, 0 < 6 < E, we have

AgD{S„ > (W +&, SO) :!S (log sn ) 1 +s'

A{S„ > (PG - E, sn )} >

.(log sn)1_ (6/2)

PROOF . By Theorem 7.4.1, we have for each x :

(6)

J~'{S„ > xs„ } = 1 1~e_~''2 d y + A T

n

,/2n x

S3

We have as x -- oc:

(7)e-x2j2

x

(See Exercise 4 of Sec . 7 .4) . Substituting x = ,,/2(1 ± 6) log log s n , the firstterm on the right side of (6) is, by (7), asymptotically equal to

Y

.,/4n(1 ± 6) log log s n (1og sn )its

Page 259: A Course in Probability Theory KL Chung

This dominates the second (remainder) term on the right side of (6), by (3)since 0 < 8 < e. Hence (4) and (5) follow as rather weak consequences .

To establish (2), let us write for each fixed 8, 0 < 8 < E:

En+ _ {w : S n (w) > cp(1 + 8, sn)},

En = {w : S n (w) > ~0(1 - 8, s')},

and proceed by steps .1° . We prove first that

(8)

~PI {En + i .o .} = 0

in the notation introduced in Sec . 4.2, by using the convergence part ofthe Borel-Cantelli lemma there. But it is evident from (4) that the seriesEn -(En+) is far from convergent, since s, is expected to be of the orderof in-. The main trick is to apply the lemma to a crucial subsequence Ink)(see Theorem 5.1 .2 for a crude form of the trick) chosen to have two prop-erties: first, >k ?'(Enk +) converges, and second, "E„+ i.o ." already implies"Enk + i.o." nearly, namely if the given 8 is slightly decreased . This modifiedimplication is a consequence of a simple but essential probabilistic argumentspelled out in Lemma 2 below .

Given c > 1, let n k be the largest value of n satisfying Sn < Ck , so that

Snk < Ck < snk+1

Since (maxi<j< n 6 j )/sn -* 0 (why?), we have snk+l/s„k -* 1, and so

(9)

snk

Ck

as k -a oo . Now for each k, consider the range of j below:

(10)

nk < j < nk+1

and put

(11)

F> _ {w: ISnk+i - S1I< S,tk+1 } .

By Chebyshev's inequality, we have

J'(F,) > 1 -

7.5 LAW OF THE ITERATED LOGARITHM 1 245

s2 - s2nk+l

nk 1S2

--~ C2nk+1

hence GJ'(F 1 ) > A > 0 for all sufficiently large k .

Lemma 2 . Let {Ej ) and {F j }, 1 < j < n < oc, be two sequences of events .Suppose that for each j, the event F j is independent of Ei . . . EcI

_ 1Ej , and

Page 260: A Course in Probability Theory KL Chung

246

1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

that there exists a constant A > 0 such that / (F j) > A for every j . Thenwe have

n

n

(12)

UEjF1 >kJ' UEjj=1

j=1

PROOF. The left member in (12) is equal to

/ n

U[(E1F1)c

(Ej-1Fj-1)`(EjFj)]

\j=1

n

rr

n

U [J E 1 E`_1

J=1

E1FJ

j=1

)(Ei . . . Ej-,E,) A,

9(E1 . . . E>-tEj)g(F'j)

j=1

which is equal to the right member in (12) .

Applying Lemma 2 to Ej+ and the F j in (11), we obtain

114_+1-1

nk+,-1

(13)

~' U Ej+Fj > Ate' U E/

J=nk

J=nk

It is clear that the event Ej+ fl F j implies

Snk_1 > S1 - S„k+r > (W + 8, S j) - Snk .+t ,

which is, by (9) and (10), asymptotically greater than

1

3

C ~O 1 + 4 8' sr, k+,

.

Choose c so close to 1 that (1 + 3/48)/c2 > 1 + (8/2) and put

8Gk = w : Snk+i >

1 + 2' Snk+,

;

note that Gk is just E„k + with 8 replaced by 8/2 . The above implication maybe written as

Ej+F j C Gk

for sufficiently large k and all j in the range given in (10); hence we have

nk+,-1

(14)

U E+F j C Gk .

J=1t k

Page 261: A Course in Probability Theory KL Chung

It follows from (4) that

E 9)(Gk) -

A +(sl2) < A 1:1 1+(s/~~ < oo .k

(log Snk )

k (k log c)

In conjunction with (13) and (14) we conclude that

nk+, - 1U Ej + < oc;

k

j=nk

and consequently by the Borel-Cantelli lemma that

nA+, -1U E j + i.o . = 0 .j=n A

This is equivalent (why?) to the desired result (8) .2° . Next we prove that with the same subsequence Ink) but an arbitrary

c, if we put tk = S2k+I - Sn A , and

Dk = w: SnA+1 (w) - S,ik (w) > cP 1 - 2 , tk

,

then we have

(15)

UT(Dk i .o .) = 1 .

Since the differences SnA+, - S1lk , k > l, are independent r.v.'s, the divergencepart of the Borel-Cantelli lemma is applicable to them. To estimate ~ (Dk ),we may apply Lemma 1 to the sequence {X,lk+j, j > 11 . We have

7.5 LAW OF THE ITERATED LOGARITHM 1 247

Hence by (5),

(16)

/I (E, - i .o .) = 1 .

2

1

2tk

1 - C2 Snk+,

and consequently by (3) :

r, k+1- r,tk < AF,lk+, <t3

S3

0090 1+Ek

nk+,

A

A~11(Dk) >-- (log tk >- ki

and so Ek ,/'(Dk) = oo and (15) follows .3 ° . We shall use (8) and (15) together to prove

, ( I - 2 C2(k+1)C

A

Page 262: A Course in Probability Theory KL Chung

248 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

This requires just a rough estimate entailing a suitable choice of c . By (8)applied to {-X,) and (15), for almost every w the following two assertionsare true :

(i) S,jk+ , (cv) - S„ k. (w) > (p(1 - (3/2), tk ) for infinitely many k ;(ii) Silk (w) > -cp(2, silk ) for all sufficiently large k.

For such an w, we have then

S(17)

S„k.+) (w) >

2 , tk - cp(2, sil k ) for infinitely many k .

Using (9) and log log t2 log log Snk+l, we see that the expression in the rightside of (17) is asymptotically greater than

S\

1C 1 - 2 1 -C2

-C cp(1, Snk+i )> (P(1 - S, Snk+3 ),

\\

/

provided that c is chosen sufficiently large . Doing this, we have thereforeproved that

(18)

'7(Enk+l _ i.o.) = 1,

which certainly implies (16) .4° . The truth of (8) and (16), for each fixed S, 0 < S <E, means exactly

the conclusion (2), by an argument that should by now be familiar to thereader.

Theorem 7.5.1 . Under the condition (3), the lim sup and Jim inf, as n -* oo,of S„ / 1/2s2 log logs, are respectively +l and - 1, with probability one .

The assertion about lim inf follows, of course, from (2) if we apply itto {-X j , j > 1} . Recall that (3) is more than sufficient to ensure the validityof the central limit theorem, namely that Sn /sn converges in dist. to c1. Thusthe law of the iterated logarithm complements the central limit theorem bycircumscribing the extraordinary fluctuations of the sequence {Sn , n > 1} . Animmediate consequence is that for almost every w, the sample sequence S, (w)changes sign infinite&v often . For much more precise results in this directionsee Chapter 8 .

In view of the discussion preceding Theorem 7 .3.3, one may wonderabout the almost everywhere bounds for

max Sin ,

max JSmJ,

and so on .1 <r<n

1 <m<n

Page 263: A Course in Probability Theory KL Chung

7.5 LAW OF THE ITERATED LOGARITHM I 249

It is interesting to observe that as far as the lim sup,, is concerned, these twofunctionals behave exactly like Sn itself (Exercise 2 below) . However, thequestion of liminf,, is quite different. In the case of maxim,, IS,,,I, anotherlaw of the (inverted) iterated logarithm holds as follows . For almost every w,we have

Jimmaxi<n,<n IS,n(w)I = 1 ;

n->oo

~2s 2n

4 /

8 log log s„

under a condition analogous to but stronger than (3) . Finally, one may wonderabout an asymptotic lower bound for ISn I . It is rather trivial to see that thisis always o(sn ) when the central limit theorem is applicable ; but actually it iseven o(sn i ) in some general cases. Indeed in the integer lattice case, underthe conditions of Exercise 9 of 7 .3, we have "S n = 0 i.o . a.e." This kind ofphenomenon belongs really to the recurrence properties of the sequence {Sn },to be discussed in Chapter 8 .

EXERCISES

1 . Show that condition (3) is fulfilled if the X D 's have a common d .f.with a finite third moment.

*2. Prove that whenever (2) holds, then the analogous relations with S,replaced by max i< ,n<n S, n or maxi<n,<,, IS,,I also hold .

*3. Let {Xj , j > 1} be a sequence of independent, identically distributedr.v.'s with mean 0 and variance 1, and S„ _ >"=1 X j . Then

9p lim ISn((0)1 = 0 } = 1 .{n+C0

)J

[xrn-r: Consider SnA+, - SnA, with nk ^- kk . A quick proof follows fromTheorem 8 .3 .3 below .]

4. Prove that in Exercise 9 of Sec . 7.3 we have '7flSn = 0 i.o .} = 1 .*5. The law of the iterated logarithm may be used to supply certain coun-

terexamples. For instance, if the X n 's are independent and X,, = ±n'/2/log logn with probability 1 each, then S, 1n --f 0 a .e., but Kolmogorov's sufficientcondition (see case (i) after Theorem 5 .4 .1) >,, I(Xn)/n 2 < oo fails .

6. Prove that U'{ISn I >(p(1 - 8, sn )i.o .) = 1, without use of (8), as

follows . Let

ek = {(0 : Sil k (w)1 < cp(1 - 8, SnA)} ;

f k = {w:Snk+l(w)_Snk(w) > (p 1

2 , sfk.+1)}

.

Page 264: A Course in Probability Theory KL Chung

250 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Show that for sufficiently large k the event ek n fk implies the complementof ek+1 ; hence deduce

k+1

k

~~ n e j < ~(elo) 11 [1 - owj )]j=jo

j=jo

and show that the product -k 0 as k -3 00 .

7 .6 Infinite divisibilityThe weak law of large numbers and the central limit theorem are concerned,respectively, with the convergence in dist . of sums of independent r .v.'s to adegenerate and a normal d .f. It may seem strange that we should be so muchoccupied with these two apparently unrelated distributions. Let us point out,however, that in terms of ch .f.'s these two may be denoted, respectively, byeat and e°`t-b't2 - exponentials of polynomials of the first and second degreein (it) . This explains the considerable similarity between the two cases, asevidenced particularly in Theorems 6 .4.3 and 6 .4.4 .

Now the question arises : what other limiting d .f.'s are there when smallindependent r .v.'s are added? Specifically, consider the double array (2) inSec. 7.1, in which independence in each row and holospoudicity are assumed .Suppose that for some sequence of constants an ,

Sn - a

e~

j=1

converges in dist. to F . What is the class of such F's, and when doessuch a convergence take place? For a single sequence of independent r .v.'s{Xj , j > 1}, similar questions may be posed for the "normed sums" (S 7z -an )/bn .These questions have been answered completely by the work of Levy, Khint-chine, Kolmogorov, and others ; for a comprehensive treatment we refer to thebook by Gnedenko and Kolmogorov [12] . Here we must content ourselveswith a modest introduction to this important and beautiful subject .

We begin by recalling other cases of the above-mentioned limiting distri-butions, conveniently dispilayed by their ch .f.'s :

i.>0 ;

e0<a<2,c>0 .

The former is the Poisson distribution ; the latter is called the symmetricstable distribution of exponent a (see the discussion in Sec . 6 .5), includingthe Cauchy distribution fc- a = 1 . We may include the normal distribution

Page 265: A Course in Probability Theory KL Chung

7.6 INFINITE DIVISIBILITY 1 251

among the latter for a = 2. All these are exponentials and have the furtherproperty that their "n th roots" :

eait/n ,

e (1 /n)(ait-b 2 t2 )

e(x/n)(e"-1)

e-(c/n)Itl'

are also ch .f.'s. It is remarkable that this simple property already characterizesthe class of distributions we are looking for (although we shall prove onlypart of this fact here) .

DEFINITION OF INFINITE DIVISIBILITY . A ch.f. f is called infinitely divisible ifffor each integer n > 1, there exists a ch .f. fn such that

( 1 )

f = (fn)n •

In terms of d.f.'s, this becomes in obvious notation :

F=Fn* = F,*F11,k . . .,kFn .

(n factors)In terms of r.v.'s this means, for each n > 1, in a suitable probability space(why "suitable"?) : there exist r .v .'s X and Xnj , 1 < j < n, the latter beingindependent among themselves, such that X has ch.f. f, X, j has ch.f. fn , and

n(2)

X =

x1 j .

j=1

X is thus "divisible" into n independent and identically distributed parts, foreach n . That such an X, and only such a one, can be the "limit" of sums ofsmall independent terms as described above seems at least plausible .

A vital, though not characteristic, property of an infinitely divisible ch .f.will be given first .

Theorem 7.6.1. An infinitely divisible ch.f. never vanishes (for real t) .

PROOF . We shall see presently that a complex-valued ch .f. is trouble-somewhen its "nth root" is to be extracted . So let us avoid this by going to the real,using the Corollary to Theorem 6 .1 .4. Let f and fn be as in (1) and write

g=If 12 , gn=Ifnl2 .

For each t EW1, g(t) being real and positive, though conceivably vanishing,its real positive nth root is uniquely defined ; let us denote it by [g(t)] 1/" . Sinceby (1) we have

g(t) = [gn (t))n ,

and gn (t) > 0, it follows that

(3)

dt: gn (t) = [g(t)]' ln •

Page 266: A Course in Probability Theory KL Chung

252 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

But 0 < g(t) < 1, hence lim„.0 [g(t)]t/n is 0 or 1 according as g(t) = 0 org(t) :A 0 . Thus urn co g„ (t) exists for every t, and the limit function, sayh(t), can take at most the two possible values 0 and 1. Furthermore, since g iscontinuous at t = 0 with g(0) = 1, there exists a to > 0 such that g(t) =A 0 forI tl < to . It follows that h(t) = 1 for ItI < to . Therefore the sequence of ch .f.'sgn converges to the function h, which has just been shown to be continuousat the origin . By the convergence theorem of Sec . 6.3, h must be a ch .f. andso continuous everywhere . Hence h is identically equal to 1, and so by theremark after (3) we have

Vt : if (t)1 z= g(t) 0 0 .

The theorem is proved .

Theorem 7 .6.1 immediately shows that the uniform distribution on [-1, 1 ]is not infinitely divisible, since its ch .f. is sin t/t, which vanishes for some t,although in a literal sense it has infinitely many divisors! (See Exercise 8 ofSec . 6.3.) On the other hand, the ch .f.

2 + cos t3

never vanishes, but for it (1) fails even when n = 2; when n > 3, the failureof (1) for this ch .f. is an immediate consequence of Exercise 7 of Sec . 6.1, ifwe notice that the corresponding p.m. consists of exactly 3 atoms .

Now that we have proved Theorem 7 .6.1, it seems natural to go back toan arbitrary infinitely divisible ch.f. and establish the generalization of (3) :

fn (t) _ [f (t)] II n

for some "determir.::ion" of the multiple-valued nth root on the right side . Thiscan be done by a 4 -nple process of "continuous continuation" of a complex-valued function of _ real variable. Although merely an elementary exercise in"complex variables" . it has been treated in the existing literature in a cavalierfashion and then r: Bused or abused. For this reason the main propositionswill be spelled ou :n meticulous detail here .

Theorem 7.6.2 . Let a complex-valued function f of the real variable t begiven. Suppose th_ : f (0) = 1 and that for some T > 0, f is continuous in[-T, T] and does : :t vanish in the interval. Then there exists a unique (single-valued) function /' . -:f t in [-T, T] with A(0) = 0 that is continuous there andsatisfies

(4)

f (t) = e~`W,

-T < t < T .

Page 267: A Course in Probability Theory KL Chung

7.6 INFINITE DIVISIBILITY 1 253

The corresponding statement when [-T, T] is replaced by (-oo, oc) isalso true .

PROOF . Consider the range of f (t), t E [-T, T] ; this is a closed set ofpoints in the complex plane . Since it does not contain the origin, we have

inf If (t)-01 =PT > 0 .-T<t<T

Next, since f is uniformly continuous in [-T, T], there exists a 8 T , 0 < ST <PT, such that if t and t' both belong to [-T, T] and I t - t'I < 8T , then I f (t) -f (t')I < PT/2 < 2 . Now divide [-T, T] into equal parts of length less than8T , say :

-T=t_1< . . .<t_1 <to=0<tl < . . <tt =T.

For t_1 < t < t1, we define A as follows :

(5)

(6)

A(t) = A(tk) +

(7)

O0 (-1)1A(t) =

{f(t) - W .Jj=1

This is a continuous function of t in [t-1, t i ], representing that determinationof log f (t) which equals 0 for t = 0. Suppose that A has already been definedin [t-k, tk ] ; then we define ), in [tk, tk+1 ] as follows :

00 (-1)'f(t)-f(tk)

j=1

.f (tk)

'

similarly in [t_k_1 , t_k] by replacing tk with t_k everywhere on the right sideabove. Since we have, for tk < t < tk+],

f(t) - f (tk)

f (tk)the power series in (6) converges uniformly in [tk, tk+I ] and represents acontinuous function there equal to that determination of the logarithm of thefunction f (t)/ f (tk ) - 1 which is 0 for t = tk . Specifically, for the "schlichtneighborhood" Iz - 1 I < 1, let

00

(-1)~

L(z) = (z - 1)'Jj=1

be the unique determination of log z vanishing at z = 1 . Then (5) and (6)become, respectively :

A(t) = L(f (t)),

t-1 < t < t1

~-(t) _ )I(tk) + Lf(t)

,

tk < t -< tk+1f (tk ) )

Page 268: A Course in Probability Theory KL Chung

254 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

with a similar expression for t_k_ 1 < t < t_k . Thus (4) is satisfied in [t_ 1 , t 1 ],and if it is satisfied for t = tk , then it is satisfied in [tk , tk+1 ], since

e ~.(r) = eX(tk)+L(f(t)lf(tA)) = f (tk) f (t)= f (t) •

f (tk )

Thus (4) is satisfied in [-T, T] by induction, and the theorem is proved forsuch an interval . To prove it for (-oo, oc) let us observe that, having defined

in [-n, n], we can extend it to [-n - 1, n + 1] with the previous method,by dividing [n, n + 1] . for example, into small equal parts whose length mustbe chosen dependent on n at each stage (why?) . The continuity of A is clearfrom the construction .

To prove the uniqueness of A, suppose that A' has the same properties asX . Since both satisfy equation (4), it follows that for each t, there exists aninteger m(t) such that

A(t) - A'(t) = 27ri m(t) .

The left side being continuous in t, m( .) must be a constant (why?), whichmust be equal to m(0) = 0. Thus A(t) = A'(t) .

Remark. It may not be amiss to point out that A(t) is a single-valuedfunction of t but not of f (t) ; see Exercise 7 below .

Theorem 7 .6.3 . For a fixed T, let each k f , k > 1, as well as f satisfy theconditions for f in Theorem 7 .6.2, and denote the corresponding ), by kA .Suppose that k f converges uniformly to f in [-T, T], then kA convergesuniformly to ;, in [-T . T] .

PROOF . Let L be as in (7), then there exists a 6, 0 < 6 < 2, such that

1L(z)I < 1,

if 1z - 11 < 6 .

By the hypothesis of uniformity, there exists kl (T) such that if k > k1 (T),then we have

Since for each t, the .xponentials of kA(t) - a.(t) and L(k f (01f (t)) are equal,there exists an integervalued function km(t), It! < T, such that

(10)

L ( kI ) / =k A(t) - A(t) + 21ri km(t),

jtj < T.f ; :

(8)

sup kf (t) 1- < 8,iti :!~-T f (t)

and consequently

(9)

sup L (t) <1 .(kf

ItIST f (t)

Page 269: A Course in Probability Theory KL Chung

fn ( t ) = e), " W = ek(t)/n

7.6 INFINITE DIVISIBILITY 1 255

Since L is continuous in Iz - I < 8, it follows that km(') is continuous inJtl < T . Since it is integer-valued and equals 0 at t - 0, it is identically zero .Thus (10) reduces to

(11)

kA(t) - a,(t) _ L kf(t) ,

Itl < T .f (t)

The function L being continuous at z - 1, the uniform convergence of kf / fto 1 in [-T, T] implies that of k l - -k to 0, as asserted by the theorem .

Thanks to Theorem 7 .6.1, Theorem 7 .6 .2 is applicable to each infinitelydivisible ch .f. f in (-oo, oc) . Henceforth we shall call the corresponding ),the distinguished logarithm, and eI(t)/n the distinguished nth root of f . Wecan now extract the correct nth root in (1) above .

Theorem 7.6.4 . For each n, the fn in (1) is just the distinguished nth rootof f .

PROOF . It follows from Theorem 7 .6 .1 and (1) that the ch .f. f,, nevervanishes in (-oc, oc), hence its distinguished logarithm a.n is defined. Takingmultiple-valued logarithms in (1), we obtain as in (10) :

Vt : a. (t) - nAn (t) - 27r mn (t),

where Mn () takes only integer values. We conclude as before that m„ (.) = 0,and consequently

(12)

as asserted .

Corollary . If f is a positive infinitely divisible ch .f., then for every t thef n (t) in (1) is just the real positive nth root of f (t) .

PROOF . Elementary analysis shows that the real-valued logarithm of areal number x in (0, oo) is a continuous function of x. It follows that this isa continuous solution of equation (4) in (-oo, oo). The uniqueness assertionin Theorem 7 .6 .2 then identifies it with the distinguished logarithm of f, andthe corollary follows, since the real positive nth root is the exponential of 1 /ntimes the real logarithm .

As an immediate consequence of (12), we have

Vt : lim f n (t) = 1 .n-+oo

Thus by Theorem 7 .1 .1 the double array {Xn 1, 1 < j < n, 1 < n) giving riseto (2) is holospoudic . We have therefore proved that each infinitely divisible

Page 270: A Course in Probability Theory KL Chung

256 I CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

distribution can be obtained as the limiting distribution of S, = E"=1 X„j insuch an array .

It is trivial that the product of two infinitely divisible ch .f.'s is again sucha one, for we have in obvious notation :

If -2 f = (If,)' - (2fn)n = (If, -2 f,) " -

The next proposition lies deeper .

Theorem 7.6.5. Let {kf , k > 1 } be a sequence of infinitely divisible ch .f.'sconverging everywhere to the ch.f. f . Then f is infinitely divisible .

PROOF . The difficulty is to prove first that f never vanishes . Consider, asin the proof of Theorem 7 .6.1 : g = I f 12 4g = I kf I 2 . For each n > 1, let x 1 Jn

denote the real positive nth root of a real positive x . Then we have, by thehypothesis of convergence and the continuity of x11' as a function of x,

(13)

Vt: [kg(t)] 'I' --* [g(t)] lln .

By the Corollary to Theorem 7 .6.4, the left member in (13) is a ch .f. The rightmember is continuous everywhere . It follows from the convergence theoremfor ch.f.'s that [g(.)] 1 /n is a ch .f. Since g is its nth power, and this is true foreach n > 1, we have proved that g is infinitely divisible and so never vanishes .Hence f never vanishes and has a distinguished logarithm X defined every-where . Let that of kf be dl. Since the convergence of kf to f is necessarilyuniform in each finite interval (see Sec . 6 .3), it follows from Theorem 7 .6.3that d- -* ? everywhere, and consequently

(14)

exp(k~.(t)/n) -a exp(A(t)/n)

for every t . The left member in (14) is a ch .f. by Theorem 7.6 .4, and the rightmember is continuous by the definition of X . Hence it follows as before thate'*O' is a ch .f. and f is infinitely divisible .

The following alternative proof is interesting . There exists a S > 0 suchthat f does not vanish for ItI < S, hence a. is defined in this interval. Foreach n, (14) holds unifonnly in this interval by Theorem 7 .6.3 . By Exercise 6of Sec . 6.3, this is sufficient to ensure the existence of a subsequence from{exp(kX(t)/n), k > 1} converging everywhere to some ch.f. (p,, . The nth powerof this subsequence then converges to (cp„)", but being a subsequence of 1k f }it also converges to f . Hence f = (cpn ) 1l, and we conclude again that f isinfinitely divisible .

Using the preceding theorem, we can construct a wide class of ch .f.'sthat are infinitely divisible . For each a and real u, the function

(ehd _ 1q3(t ; a, u) = ee~

Page 271: A Course in Probability Theory KL Chung

is an infinitely divisible ch .f., since it is obtained from the Poisson ch .f. withparameter a by substituting ut for t . We shall call such a ch .f. a generalizedPoisson ch.f. A finite product of these :

k

jj T (t ; aj , uj ) = expj=1

j=1

is then also infinitely divisible. Now if G is any bounded increasing function,the integral fo(e't" - 1)dG(u) may be approximated by sums of the kindappearing as exponent in the right member of (16), for all t in M,1 and indeeduniformly so in every finite interval (why?) . It follows that for each such G,the function

[r_0(17)

f (t) = exp

(- 1)dG(u)

is an infinitely divisible ch .f. Now it turns out that although this falls some-what short of being the most general form of an infinitely divisible ch .f.,we have nevertheless the following qualititive result, which is a completegeneralization of (16) .

Theorem 7.6.6 . For each infinitely divisible ch .f. f, there exists a doublearray of pairs of real constants (anj , u„j ), 1 < j < k,, , 1 < n, where aj > 0,such that

k„

(18)

f (t) = li ~jj q (t ; anj, un j) .j=1

The converse is also true . Thus the class of infinitely divisible d .f.'s coincideswith the closure, with respect to vague convergence, of convolutions of a finitenumber of generalized Poisson d.f.'s .

PROOF . Let f and f „ be as in (1) and let X be the distinguished logarithmof f , F„ the d.f. corresponding to fn . We have for each t, as n -a oo :

n [fn ( t ) - 1] = n [eX(t)ln - 1] --~ A(t)

and consequently(19)

e»[f"(t)-1l -* e x(t) - f (t)

Actually the first member in (19) is a ch.f. by Theorem 6 .5.6, so that theconvergence is uniform in each finite interval, but this fact alone is neithernecessary nor sufficient for what follows . We have

0Cn [f n (t) - 1 ] = f(eitu - 1)n dF„ (u) .

-oo

7.6 INFINITE DIVISIBILITY 1 257

j(e"i - 1)

Page 272: A Course in Probability Theory KL Chung

258 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

For each n, n F, is a bounded increasing function, hence there exists

{a,,j,unj ;1 <j <kn}

where -oc < un 1 < un2 < . . . < unkn < oc and an j = n [Fn (u n,j ) - Fn (un, j-1)J,such that

(20)

supItl :~1i

(21)

supltj

n

itU n- 1)a, j - I (eitu - 1)n dFn (u),J o0

knn[fn(f)-11 -e

q3(t; an j, un j )j=1

< 1- n

(Which theorem in Chapter 6 implies this?) Taking exponentials and usingthe elementary inequality l ez - e z', < I ez l (e 1z-z 1 - 1), we conclude that asn --k oo,

This and (19) imply (18) . The converse is proved at once by Theorem 7 .6.5 .

We are now in a position to state the fundamental theorem on infinitelydivisible ch .f.'s, due to P . Levy and Khintchine .

Theorem 7.6.7 . Every infinitely divisible ch .f. f has the following canonicalrepresentation :

f (t) = exp ait + 00 e 'tu - 1 -itu

1 + u2dG(u)

-o0

u2

where a is a real constant, G is a bounded increasing function in (-c, oo),and the integrand is defined by continuity to be -t2/2 at u = 0 . Furthermore,the class of infinitely divisible ch .f.'s coincides with the class of limiting ch .f.'sof E~"_ 1 X,,j - a,i in a holospoudic double array

IX,j,l <j<kn,l <n},

where kn -a oc and for each n, the r.v.'s {X,, j , 1 < j < kn } are independent.

Note that we have proved above that every infinitely divisible ch .f. is inthe class of limiting ch .f.'s described here, although we did not establish thecanonical representation . Note also that if the hypothesis of "holospoudicity"is omitted, then every ch .f. is such a limit, trivially (why?) . For a completeproof of the theorem, various special cases, and further developments, see thebook by Gnedenko and Kolmogorov [12] .

Page 273: A Course in Probability Theory KL Chung

Let us end this section with an interesting example . Put s = o, + it, a > 1and t real ; consider the Riemann zeta function :

where p ranges over all prime numbers . Fix a > 1 and define

.f (t) + it) .

We assert that f is an infinitely divisible ch .f. For each p and every real t,the complex number 1 - p-° - ` r lies within the circle {z : 1z - 1I < 1} . Letlog z denote that determination of the logarithm with an angle in (-1r, m] . Bylooking at the angles, we see that

log 11 pp--tr - log(1 - P-U) - log(1 - p-U-tr)

00

MP 'norm=1

00

T (t ; m -1 p-mv -m log P) .m=1

Sincef (t) = "li 0 11 T3(t ; m-1 p-ma, -m log p),

P<n

it follows that f is an infinitely divisible ch .f.So far as known, this famous relationship between two "big names" has

produced no important issue .

EXERCISES

1. Is the convex combination of infinitely divisible ch .f.'s also infinitelydivisible?

2. If f is an infinitely divisible ch .f. and ), its distinguished logarithm,r > 0, then the rth power of f is defined to be e' (r) . Prove that for each r > 0it is an infinitely divisible ch .f .

* 3. Let f be a ch.f. such that there exists a sequence of positive integersnk going to infinity and a sequence of ch .f.'s ck satisfying f = ((pk)" z ; thenf is infinitely divisible .

4. Give another proof that the right member of (17) is an infinitely divis-ible ch .f. by using Theorem 6 .5 .6 .

7.6 INFINITE DIVISIBILITY 1 259

1

(e -(m log 1 it - 1)

Page 274: A Course in Probability Theory KL Chung

260

1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

5. Show that f (t) _ (1 - b)1(1 - be"), 0 < b < 1, is an infinitely divis-ible ch.f. [xmrr: Use canonical form .]

6. Show that the d.f. with density ~ar(a)-lx"-le-$X, a > 0, 8 > 0, in

(0, oo), and 0 otherwise, is infinitely divisible .7. Carry out the proof of Theorem 7 .6.2 specifically for the "trivial" but

instructive case f (t) = e'", where a is a fixed real number .*8. Give an example to show that in Theorem 7 .6.3, if the uniformity of

convergence of k f to f is omitted, then kA need not converge to A . {Hmn :k f (t) = exp{2rri(-1)kkt(1 + kt)-1 -1).]

9. Let f (t) = 1 - t, f k (t) = I - t + (_1)kitk-1, 0 < t < 2, k > 1. Then

f k never vanishes and converges uniformly to f in [0, 2] . Let If-k, denote thedistinguished square root of f k in [0, 2] . Show that / 'fk does not convergein any neighborhood of t = 1 . Why is Theorem 7 .6.3 not applicable? [Thisexample is supplied by E . Reich] .

*10. Some writers have given the proof of Theorem 7 .6.6 by apparentlyconsidering each fixed t and using an analogue of (21) without the "supjtj,there. Criticize this "quick proof'. [xmrr: Show that the two relations

Vt and Vm :

Jim umn (t) = u,n (t),M_+00

Vt:

lim u.n (t) = u(t),n-+oo

do not imply the existence of a sequence {mn } such that

Vt: nli

U,,,, (t) = U(t) .-~ cc

Indeed, they do not even imply the existence of two subsequences {,n„} and

{n„} such thatVt: 1 rn um„n (t) = u(t) .

Thus the extension of Lemma 1 in Sec . 7.2 is false .]The three "counterexamples" in Exercises 7 to 9 go to show that thecavalierism alluded to above is not to be shrugged off easily .

11. Strengthening Theorem 6 .5 .5, show that two infinitely divisiblech .f.'s may coincide in a neighborhood of 0 without being identical .

*12. Reconsider Exercise 17 of Sec . 6.4 and try to apply Theorem 7 .6 .3 .

[HINT: The latter is not immediately applicable, owing to the lack of uniformconvergence. However, show first that if e`nit converges for t E A, wherein(A) > 0, then it converges for all t . This follows from a result due to Stein-haus, asserting that the difference set A - A contains a neighborhood of 0 (see,e.g., Halrnos [4, p . 68]), and from the equation e`n"e`n`t = e`n`~t-f-t Let (b, J,

{b', } be any two subsequences of c,, then e(b„-b„ )tt --k 1 for all t . Since I is a

Page 275: A Course in Probability Theory KL Chung

7.6 INFINITE DIVISIBILITY 1 261

ch.f., the convergence is uniform in every finite interval by the convergencetheorem for ch .f .'s . Alternatively, if

co(t) = lim e`n` tn-+a0

then ~o satisfies Cauchy's functional equation and must be of the form e"t,which is a ch.f. These approaches are fancier than the simple one indicatedin the hint for the said exercise, but they are interesting . There is no knownquick proof by "taking logarithms", as some authors have done .]

Bibliographical Note

The most comprehensive treatment of the material in Secs . 7.1, 7.2, and 7.6 is byGnedenko and Kolmogorov [12] . In this as well as nearly all other existing bookson the subject, the handling of logarithms must be strengthened by the discussion inSec. 7.6 .

For an extensive development of Lindeberg's method (the operator approach) toinfinitely divisible laws, see Feller [13, vol . 2] .

Theorem 7.3 .2 together with its proof as given here is implicit in

W. Doeblin, Sur deux problemes de M. Kolmogoroff concernant les chainesdenombrables, Bull . Soc . Math. France 66 (1938), 210-220 .

It was rediscovered by F. J. Anscombe . For the extension to the case where the constantc in (3) of Sec . 7.3 is replaced by an arbitrary r.v ., see H. Wittenberg, Limiting distribu-tions of random sums of independent random variables, Z . Wahrscheinlichkeitstheorie1 (1964), 7-18 .

Theorem 7.3 .3 is contained in

P. Erdos and M . Kac, On certain limit theorems of the theory of probability, Bull .Am. Math. Soc. 52 (1946) 292-302 .

The proof given for Theorem 7 .4 .1 is based on

P. L . Hsu, The approximate distributions of the mean and variance of a sample ofindependent variables, Ann. Math . Statistics 16 (1945), 1-29 .

This paper contains historical references as well as further extensions .For the law of the iterated logarithm, see the classic

A . N. Kolmogorov, Uber das Gesetz des iterierten Logarithmus, Math. Annalen101 (1929), 126-136 .

For a survey of "classical limit theorems" up to 1945, see

W. Feller, The fundamental limit theorems in probability, Bull . Am. Math. Soc .51 (1945), 800-832 .

Page 276: A Course in Probability Theory KL Chung

262 1 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Kai Lai Chung, On the maximum partial sums of sequences of independent randomvariables . Trans . Am . Math. Soc. 64 (1948), 205-233 .

Infinitely divisible laws are treated in Chapter 7 of Levy [11] as the analyticalcounterpart of his full theory of additive processes, or processes with independentincrements (later supplemented by J . L. Doob and K . Ito) .

New developments in limit theorems arose from connections with the Brownianmotion process in the works by Skorohod and Strassen . For an exposition see references[16] and [22] of General Bibliography .

Page 277: A Course in Probability Theory KL Chung

8

(1)

Random walk

8.1 Zero-or-one laws

In this chapter we adopt the notation N for the set of strictly positive integers,and N° for the set of positive integers ; used as an index set, each is endowedwith the natural ordering and interpreted as a discrete time parameter . Simi-larly, for each n E N, N,, denotes the ordered set of integers from 1 to n (bothinclusive) ; N° that of integers from 0 to n (both inclusive) ; and Nn that ofintegers beginning with n + 1 .

On the probability triple (2, T, °J'), a sequence {Xn , n E N} where eachXn is an r.v . (defined on Q and finite a.e.), will be called a (discrete parameter)stochastic process . Various Borel fields connected with such a process will nowbe introduced . For any sub-B .F. ~ of ~T, we shall write

XE-

and use the expression "X belongs to . " or "S contains X" to mean thatX-1 (:T) c 6' (see Sec . 3 .1 for notation) : in the customary language X is saidto be "measurable with respect to ~j" . For each n E N, we define two B .F.'sas follows :

Page 278: A Course in Probability Theory KL Chung

264 1 RANDOM WALK

`7n = the augmented B .F. generated by the family of r .v .'s {Xk, k E N n } ;that is, , is the smallest B .F.' containing all Xk in the familyand all null sets ;

= the augmented B .F. generated by the family of r.v.'s {Xk, k E N'n } .

Recall that the union U,°°_1 ~ is a field but not necessarily a B .F. Thesmallest B .F. containing it, or equivalently containing every -,~?n- , n E N, isdenoted by

00moo = V Mn ;

n=1

it is the B .F. generated by the stochastic process {Xn , n E N}. On the otherhand, the intersection f n_ 1 n is a B .F. denoted also by An°= 1 0~' . It will becalled the remote field of the stochastic process and a member of it a remoteei vent .

Since 3~~ C ~,T, 9' is defined on ~& . For the study of the process {Xn , n E

N} alone, it is sufficient to consider the reduced triple (S2, 3'~, G1' Thefollowing approximation theorem is fundamental .

Theorem 8 .1.1 . Given e > 0 andsuch that

A E ~, there exists A E E Un_=1 9,Tn

(2'

-1~6(AAA E ) < c .

PROOF . Let -6' be the collection of sets A for which the assertion of thetheorem is true . Suppose Ak E 6~ for each k E N and Ak T A or Ak , A . ThenA also belongs to i, as we can easily see by first taking k large and thenapplying the asserted property to Ak . Thus ~ is a monotone class . Since it istrivial that ~ contains the field Un° l n that generates ~&, ' must contain;1 by the Corollary to Theorem 2 .1 .2, proving the theorem .

Without using Theorem 2 .1 .2, one can verify that 6' is closed with respectto complementation (trivial), finite union (by Exercise 1 of Sec . 2 .1), andcountable union (as increasing limit of finite unions) . Hence ~ is a B .F. thatmust contain ~& .

It will be convenient to use a particular type of sample space Q . In theno:ation of Sec . 3 .4, let 00

Q = X Qn,

n=1

where each St n is a "copy" of the real line R 1 . Thus S2 is just the space of allinnnite sequences of real numbers . A point w will be written as {wn , n E N},

and w, as a function of (o will be called the nth coordinate

nction) of w.

Page 279: A Course in Probability Theory KL Chung

Each S2„ is endowed with the Euclidean B .F . A 1 , and the product Borel fieldin the notation above) on Q is defined to be the B .F. generated by

the finite-product sets of the formk

(3)

n {co: wn , E B" j }j=1

where (n 1 , . . . , nk ) is an arbitrary finite subset of N and where each B" 1 E A 1 .In contrast to the case discussed in Sec . 3 .3, however, no restriction is made onthe p.m. ?T on JT . We shall not enter here into the matter of a general construc-tion of JT. The Kolmogorov extension theorem (Theorem 3 .3.6) asserts thaton the concrete space-field (S2, i ) just specified, there exists a p .m. 9' whoseprojection on each finite-dimensional subspace may be arbitrarily preassigned,subject only to consistency (where one such subspace contains the other) .Theorem 3 .3 .4 is a particular case of this theorem .

In this chapter we shall use the concrete probability space above, of theso-called "function space type", to simplify the exposition, but all the resultsbelow remain valid without this specification . This follows from a measure-preserving homomorphism between an abstract probability space and one ofthe function-space type ; see Doob [17, chap . 2] .

The chief advantage of this concrete representation of Q is that it enablesus to define an important mapping on the space .

DEFINITION OF THE SHIFT . The shift z is a mapping of 0 such that

r : w = {w", n E N} --f rw = {w,,+1, n E N} ;

in other words, the image of a point has as its nth coordinate the (n + l)stcoordinate of the original point .

Clearly r is an oo-to-1 mapping and it is from Q onto S2 . Its iteratesare defined as usual by composition: r° = identity, rk = r . rk-1 for k > 1 . Itinduces a direct set mapping r and an inverse set mapping r -1 according tothe usual definitions . Thus

r-1A`{w:rweA}

and r-" is the nth iterate of r -1 . If A is the set in (3), then

k(4)

r-1 A= n {w: w„ ;+1 E B,, ; }j=1

It follows from this that r-1 maps Ji into ~/T ; more precisely,

VA E :-~ : -r- " A E

n E N,

8 .1 ZERO-OR-ONE LAWS 1 265

Page 280: A Course in Probability Theory KL Chung

266 1 RANDOM WALK

where is the Borel field generated by {wk, k > n]. This is obvious by (4)if A is of the form above, and since the class of A for which the assertionholds is a B .F ., the result is true in general .

DEFNrrION . A set in 3 is called invariant (under the shift) iff A = r-1 A .An r.v . Y on Q is invariant iff Y(w) = Y(rw) for every w E 2 .

Observe the following general relation, valid for each point mapping zand the associated inverse set mapping -c - I, each function Y on S2 and eachsubset A of : h 1

(5)

r-1 {w: Y(w) E A} = {w : Y(rw) E A} .

This follows from z -1 ° Y -1 = (Y o z)-1 .

We shall need another kind of mapping of Q . A permutation on Nn is a1-to-1 mapping of N„ to itself, denoted as usual by

l,

2, . . .,

n61,

62, . . .,

an ) .

The collection of such mappings forms a group with respect to composition .A finite permutation on N is by definition a permutation on a certain "initialsegment" N of N . Given such a permutation as shown above, we define awto be the point in Q whose coordinates are obtained from those of w by thecorresponding permutation, namely

_ w0j,

if j E Nn ;(mow) j - wj

if j E Nn .,

As usual . o induces a direct set mapping a and an inverse set mapping a -1 ,the latter being also the direct set mapping induced by the "group inverse"07-1 of o. In analogy with the preceding definition we have the following .

DEFncrHON . A set in i is called permutable iff A = aA for every finitepermutation or on N. A function Y on Q is permutable iff Y(w) = Y(aw) for,every finite permutation a and every w E Q .

It is fairly obvious that an invariant set is remote and a remote set ispermutable: also that each of the collections : all invariant events, all remoteevents, all permutable events, forms a sub-B .F. of,- . If each of these B .F.'sis augmented (see Exercise 20 of Sec 2.2), the resulting augmented B .F.'swill be called "almost invariant", "almost remote", and "almost permutable",respectively . Finally, the collection of all sets in Ji of probability either 0 or

Page 281: A Course in Probability Theory KL Chung

8.1 ZERO-OR-ONE LAWS 1 267

1 clearly forms a B .F ., which may be called the "all-or-nothing" field . ThisB .F. will also be referred to as "almost trivial" .

Now that we have defined all these concepts for the general stochasticprocess, it must be admitted that they are not very useful without furtherspecifications of the process . We proceed at once to the particular case below .

DEFINITION . A sequence of independent r.v.'s will be called an indepen-dent process ; it is called a stationary independent process iff the r.v.'s have acommon distribution .

Aspects of this type of process have been our main object of study,usually under additional assumptions on the distributions . Having christenedit, we shall henceforth focus our attention on "the evolution of the processas a whole" -whatever this phrase may mean . For this type of process, thespecific probability triple described above has been constructed in Sec . 3.3 .Indeed, ZT= ~~, and the sequence of independent r.v.'s is just that of thesuccessive coordinate functions [a), n c- N), which, however, will also beinterchangeably denoted by {X,,, n E N}. If (p is any Borel measurable func-tion, then {cp(X„ ), n E N) is another such process .

The following result is called Kolmogorov's "zero-or-one law" .

Theorem 8.1.2 . For an independent process, each remote event has proba-bility zero or one .

PROOF . Let A E nn, 1 J~,t and suppose that DT(A) > 0; we are going toprove that ?P(A) = 1 . Since 3„ and uri are independent fields (see Exercise 5of Sec . 3.3), A is independent of every set in ~„ for each n c- N ; namely, ifM E Un 1 %,, then

(6)

P(A n m) = ~(A)?)(M).

If we set~(A n m)~A(M) _ '~(A)

Afor M E JT, then ETA ( .) is clearly a p .m . (the conditional probability relative to

; see Chapter 9) . By (6) it coincides with ~~ on Un°_1 3 „ and consequentlyalso on -~T by Theorem 2 .2.3 . Hence we may take M to be A in (6) to concludethat /-,(A) = ?/'(A)2 or ,?P(A) = 1 .

The usefulness of the notions of shift and permutation in a stationaryindependent process is based on the next result, which says that both -1 -1 andor are "measure-preserving" .

Page 282: A Course in Probability Theory KL Chung

268 1 RANDOM WALK

Theorem 8.1.3 . For a stationary independent process, if A E - and a isany finite permutation, we have

(7)

J'(z- 'A) _ 9/'(A) ;

(8)

^QA) _ .J'(A ) .

PROOF . Define a set function JT on T as follows :

~' (A) = 9)(i-'A).

Since z-1 maps disjoint sets into disjoint sets, it is clear that ~' is a p .m. Fora finite-product set A, such as the one in (3), it follows from (4) that

kA(A) _ fJ,u,(B ) = -'J'(A) .

j=1

Hence ;~P and -';~ coincide also on each set that is the union of a finite numberof such disjoint sets, and so on the B .F. 2-7 generated by them, according toTheorem 2.2 .3 . This proves (7) ; (8) is proved similarly .

The following companion to Theorem 8.1 .2, due to Hewitt and Savage,is very' useful .

Theorem 8.1.4 . For a stationary independent process, each permutable eventhas probability zero or one .

PROOF . Let A be a permutable event . Given c > 0, we may choose Ek > 0so that cc

E Ek < 6 -k=1

By Theorem 8 .1 .1, there exists Ak E '" k such that 91(A A Ak ) <may suppose that nk T oo . Let

a = ( 1, . . .,nk,nk+1, . . .,2nknk +1, . .,2n k ,

1, . .,nk

and Mk = aAk . Then clearly Mk E .~A . It follows from (8) that

^A A Mk) < Ek-

For any sequence of sets {Ek } in , we have

00 00

/sup Ek )

Ekk

m=1 k=m

00

k=1

UI'(Ek)-

Ek, and we

Page 283: A Course in Probability Theory KL Chung

Applying this to Ek = A AMk , and observing the simple identity

lim sup (A A Mk) =(A\ lim infMk) U (lira sup Mk \ A ),

we deduce that

'(A o lim sup Mk ) < J'(lim sup(A A Mk)) < E .

Since U',,Mk E nm, the set lim sup Mk belongs to nk_,,, „A. , which is seento coincide with the remote field. Thus lim sup M k has probability zero or oneby Theorem 8 .1 .2, and the same must be true of A, since E is arbitrary in theinequality above .

Here is a more transparent proof of the theorem based on the metric onthe measure space (S2, ~~ 7 , 9) given in Exercise 8 of Sec . 3 .2. Since A k andMk are independent, we have

A(Ak fl Mk) _ ,'~1(Ak)9'(Mk) .

Now Ak -,, A and Mk -* A in the metric just mentioned, hence also

Akf1Mk-+APIA

in the same sense . Since convergence of events in this metric implies conver-gence of their probabilities, it follows that J' (A fl A) = J'(A)~(A ), and thetheorem is proved .

Corollary. For a stationary independent process, the B .F.'s of almostpermutable or almost remote or almost invariant sets all coincide with theall-or-nothing field .

EXERCISES

Q and ~ are the infinite product space and field specified above .

1 . Find an example of a remote field that is not the trivial one ; to makeit interesting, insist that the r.v.'s are not identical .

2. An r.v. belongs to the all-or-nothing field if and only if it is constanta.e .

3. If A is invariant then A = rA ; the converse is false .4. An r.v. is invariant [permutable] if and only if it belongs to the

invariant [permutable] field .5. The set of convergence of an arbitrary sequence of r .v.'s {Y,,, n E N}

or of the sequence of their partial sums E"_, Yj are both permutable. Theirlimits are permutable r .v.'s with domain the set of convergence .

8.1 ZERO-OR-ONE LAWS 1 269

Page 284: A Course in Probability Theory KL Chung

270 1 RANDOM WALK

*6. If an > O, limn, a„ exists > 0 finite or infinite, and limn_o, (a,,+I /an)= 1, then the set of convergence of {an-1 E~_1 Y 1 } is invariant. If an -* + 00,the upper and lower limits of this sequence are invariant r .v.'s .

*7. The set {Y2, E A i .o .}, where A E Adl, is remote but not necessarilyinvariant; the set {E'= Yj A i.o .} is permutable but not necessarily remote .Find some other essentially different examples of these two kinds .

8. Find trivial examples of independent processes where the threenumbers JT(r-1 A), T(A), :7(rA) take the values 1, 0, 1 ; or 0, z, 1 .

9. Prove that an invariant event is remote and a remote event ispermutable .

*10. Consider the bi-infinite product space of all bi-infinite sequences ofreal numbers {co n , n E N}, where N is the set of all integers in its natural(algebraic) ordering . Define the shift as in the text with N replacing N, andshow that it is 1-to-1 on this space. Prove the analogue of (7) .

11 . Show that the conclusion of Theorem 8 .1 .4 holds true for a sequenceof independent r.v.'s, not necessarily stationary, but satisfying the followingcondition: for every j there exists a k > j such that Xk has the same distribu-tion as Xj . [This remark is due to Susan Horn .]

12. Let IX,, n > 1 } be independent r .v.'s with 1flXn = 4-n } = ~P{Xn =-4-' T } = 2 . Then the remote field of {S,,, n > 11, where S„ Xi, is nottrivial .

8.2 Basic notions

From now on we consider only a stationary independent process {X11 , n EN} on the concrete probability triple specified in the preceding section . Thecommon distribution of X„ will be denoted by p. (p .m.) or F (d.f.) ; when onlythis is involved, we shall write X for a representative X,,, thus ((X) for cr'(X n ) .

Our interest in such a process derives mainly from the fact that it under-lies another process of richer content . This is obtained by forming the succes-sive partial sums as follows :

nSn = ZX 1 , n EN .

j=1

(1)

An initial r.v. So - 0 is adjoined whenever this serves notational convenience,as in Xn = Sn - S„ _ I for n E N . The sequence IS,, n E N} is then a veryfamiliar object in this book, but now we wish to find a proper name for it . Anofficially correct one would be "stochastic process with stationary independentdifferences" ; the name "homogeneous additive process" can also be used . We

Page 285: A Course in Probability Theory KL Chung

have, however, decided to call it a "random walk (process)", although the useof this term is frequently restricted to the case when A is of the integer latticetype or even more narrowly a Bernoullian distribution .

DEFINUION OF RANDOM WALK . A random walk is the process {Sn , n E N}defined in (1) where {Xn , n E N} is a stationary independent process . Byconvention we set also So = 0 .

A similar definition applies in a Euclidean space of any dimension, butwe shall be concerned only with R1 except in some exercises later .

Let us observe that even for an independent process {Xn , n E N}, itsremote field is in general different from the remote field of {Sn , n E N}, whereS„ _ >J=1 X j . They are almost the same, being both almost trivial, for astationary independent process by virtue of Theorem 8 .1 .4, since the remotefield of the random walk is clearly contained in the permutable field of thecorresponding stationary independent process .

We add that, while the notion of remoteness applies to any process,"(shift)-invariant" and "permutable" will be used here only for the underlying"coordinate process" {con , n E N} or IX,, n E N} .

The following relation will be much used below, for m < n

mn-m

X (rrnw) =n-m

X i+. (w) = Sn (w) - Sm (w) •j=1

j=1

It follows from Theorem 8.1 .3 that S, _,n and Sn - Sm have the same distri-bution. This is obvious directly, since it is just ,u (n - nt ) * .

As an application of the results of Sec . 8 .1 to a random walk, we statethe following consequence of Theorem 8 .1 .4 .

Theorem 8.2.1 . Let B„ E %3 1 for each n e N. Then

u~{Sn E Bn i.o .}

is equal to zero or one .PROOF . If cr is a permutation on N,n , then S„ (6co) = S n (c )) for n > m,

hence the set 00Ain = U {Sn E Bn}

n =in

is unchanged under o--1 or a . Since A,n decreases as m increases, itfollows that 00

n A,Amin=]

is permutable, and the theorem is proved .

8.2 BASIC NOTIONS 1 271

Page 286: A Course in Probability Theory KL Chung

272 1 RANDOM WALK

Even for a fixed B = B n the result is significant, since it is by no meansevident that the set {Sn > 0 i.o .), for instance, is even vaguely invariantor remote with respect to {Xn , n E N} (cf. Exercise 7 of Sec . 8 .1) . Yet thepreceding theorem implies that it is in fact almost invariant . This is the strengthof the notion of permutability as against invariance or remoteness .

For any serious study of the random walk process, it is imperative tointroduce the concept of an "optional r.v." This notion has already been usedmore than once in the book (where?) but has not yet been named . Since thebasic inferences are very simple and are supposed to be intuitively obvious,it has been the custom in the literature until recently not to make the formalintroduction at all . However, the reader will profit by meeting these funda-mental ideas for the theory of stochastic processes at the earliest possible time .They will be needed in the next chapter, too .

DEFINITION OF OPTIONAL r.v . An r.v. a is called optional relative to thearbitrary stochastic process {Zn , n E N} iff it takes strictly positive integervalues or -boo and satisfies the following condition :

(2)

Vn E N U fool : {w: a(w) = n} E Win,

where „ is the B .F. generated by {Zk , k E Nn } .Similarly if the process is indexed by N° (as in Chapter 9), then the range

of a will be N° . Thus if the index n is regarded as the time parameter, thena effects a choice of time (an "option") for each sample point (0 . One maythink of this choice as a time to "stop", whence the popular alias "stoppingtime", but this is usually rather a momentary pause after which the processproceeds again: time marches on!

Associated with each optional r .v . a there are two or three importantobjects. First, the pre-a field `T is the collection of all sets in ~& of the form

(3)

U {{a = n) n An ],I <n <oo

where A n E Zx„ for each n E N U fool . This collection is easily verified to bea B .F . (how?) . If A E Via, then we have clearly A n {a = n} E for everyn . This property also characterizes the members of a (see Exercise 1 below) .Next, the post-a process is the process I {Z,+,, n E N} defined on the traceof the original probability triple on the set {a < oc}, where

(4)

Vn E N: Za+n (w) = Za(w)+n (w) .

Each Za+n is seen to be an r .v . with domain {a < oo} ; indeed it is finite a.e .there provided the original Z n 's are finite a.e. It is easy to see that Za E ~Ta .The post-a field « is the B .F. generated by the post-a process : it is a sub-B .F .of {a<oo}n1~ .

Page 287: A Course in Probability Theory KL Chung

8.2 BASIC NOTIONS 1 273

Instead of requiring a to be defined on all 0 but possibly taking the valueoc, we may suppose it to be defined on a set A in Note that a strictlypositive integer n is an optional r.v. The concepts of pre-a and post-a fieldsreduce in this case to the previous ~„ and ~~, ; .

A vital example of optional r.v. is that of the first entrance time into agiven Borel set A :

00min{n E N: Zn (w) E A}

on U {w : Zn (w) E A) ;(5)

aA(w) =

n=1

+00

elsewhere.To see that this is optional, we need only observe that for each n E N:

{w: aA(w) = n } _ {w: Zj (w) E A`, 1 < j < n - 1 ; Zn (w) E A}

which clearly belongs to ?,T, ; similarly for n = oo .Concepts connected with optionality have everyday counterparts, implicit

in phrases such as "within thirty days of the accident (should it occur)" . Histor-ically, they arose from "gambling systems", in which the gambler choosesopportune times to enter his bets according to previous observations, experi-ments, or whatnot . In this interpretation, a + 1 is the time chosen to gambleand is determined by events strictly prior to it. Note that, along with a, a + 1is also an optional r.v ., but the converse is false .

So far the notions are valid for an arbitrary process on an arbitrary triple .We now return to a stationary independent process on the specified triple andextend the notion of "shift" to an "a-shift" as follows : ra is a mapping on{a < oo} such that

(6)

racy = rnw on {w: a(co) = n} .

Thus the post-a process is just the process IX,, (taco), n E N} . Recalling thatXn is a mapping on S2, we may also write

(7)

Xa+n (w) = Xn (ra (0) = (Xn ° 'r' ) (0) )

and regard X„ ° ra, n E N, as the r.v.'s of the new process . The inverse setmapping (ra) -1 , to be written more simply as r-a, is defined as usual :

r-'A = {w : raw E A} .

Let us now prove the fundamental theorem about "stopping" a stationaryindependent process .

Theorem 8.2.2 . For a stationary independent process and an almost every-where finite optional r .v . a relative to it, the pre-a and post-a fields are inde-pendent. Furthermore the post-a process is a stationary independent processwith the same common distribution as the original one .

Page 288: A Course in Probability Theory KL Chung

274 1 RANDOM WALK

PROOF . Both assertions are summarized in the formula below. For anyA E .-;,, k E N, Bj E A1 , 1< j< k, we have

k

(8)

9'{A;X«+j E Bj, 1 < j < k} = _"P{A} 11 g(Bj) .j=1

To prove (8), we observe that it follows from the definition of a and ~a that

(9) Aft{a=n}=A n fl{a=n}E9,,

where An E 7n for each n E N. Consequently we have

°J'{A :a = n ; X,,+, E Bj , 1 < j < k} = °J'{A n ; a = n ;Xn+j E Bj , 1 < j < k}

= _IP{A ;a = n}_'{Xn+j E B j , 1 < j < k}k

_ A; a = n} 11 g(Bj),j=1

where the second equation is a consequence of (9) and the independence ofn and zT' . Summing over n E N, we obtain (8) .

An immediate corollary is the extension of (7) of Sec . 8.1 to an a-shift .

Corollary. For each A E 3~~ we have

(10)

91(r-'A) _ _'~P(A) .

Just as we iterate the shift r, we can iterate r" . Put a 1 = a, and defineak inductively by

ak+1((o) = ak

k E N .

Each ak is finite a .e. if a is . Next, define 8o = 0, and

We are now in a position to state the following result, which will beneed°,: later.

Theorem 8.2.3. Let a be an a .e. finite optional r.v. relative to a stationaryindependent process. Then the random vectors {Vk, k E N}, where

Vk(w) _ (ak (w) , X k_,+i (w), . . . , XP, (w)),

are independent and identically distributed .

Page 289: A Course in Probability Theory KL Chung

PROOF . The independence follows from Theorem 8 .2.2 by our showingthat 1 ,11, . . . , Vk_ 1 belong to the pre-,Bk_ 1 field, while V k belongs to the post-$k_1 field. The details are left to the reader ; cf. Exercise 6 below .

To prove that Vk and Vk+1 have the same distribution, we may supposethat k = 1 . Then for each n E N, and each n-dimensional Borel set A, we have

{w: a 2 (w) = n ; (Xaj+1(w), . . ., Xai+a 2 ((O)) E A}

= {co: a l (t"co) = n ; (X 1 (r"CO), . . ., Xa i (r" (0) E A),

since

Xa I (r" CJ) = Xa i(~W)(t" co) = Xa2(w)(taw)

= XaI (m)+a2((0) (co) = X,1 +,2 (w)

by the quirk of notation that denotes by X,,(.) the function whose value atw is given by Xa(,, ) (co) and by (7) with n = a2 (co) . By (5) of Sec . 8.1, thepreceding set is the r-"-image (inverse image under r') of the set

{w: a l (w) = n ; (X1(w), . . ., Xal (w)) E A),

and so by (10) has the same probability as the latter . This proves our assertion .

Corollary. The r.v .'s {Yk , k E N}, where

fYk(W) =

(P(Xn(w))+1

and co is a Borel measurable function, are independent and identicallydistributed .

For cP - 1, Yk reduces to ak . For co(x) - x, Yk = S& - S$A _, . The readeris advised to get a clear picture of the quantities ak, 13k' and Y k beforeproceeding further, perhaps by considering a special case such as (5) .

We shall now apply these considerations to obtain results on the "globalbehavior" of the random walk. These will be broad qualitative statementsdistinguished by their generality without any additional assumptions .

The optional r .v. to be considered is the first entrance time into the strictlypositive half of the real line, namely A = (0, oc) in (5) above . Similar resultshold for [0, oc) ; and then by taking the negative of each X n , we may deducethe corresponding result for (-oc, 0] or (-oo, 0) . Results obtained in this waywill be labeled as "dual" below . Thus, omitting A from the notation :

8.2 BASIC NOTIONS 1 275

(1 1)

a(co)

min{n E N: Sn > 0}

+cc

elsewhere ;

00on U {w: Sn (w) > 0} ;

n=1

Page 290: A Course in Probability Theory KL Chung

276 1 RANDOM WALK

andb'n EN :{a=n}={Sj <0for1 < j <n-1 ;Sn >0} .

Define also the r .v. M„ as follows :

(12)

do E N°: Mn (w) = OmaxT Sj (w) .

The inclusion of SO above in the maximum may seem artificial, but it doesnot affect the next theorem and will be essential in later developments in thenext section. Since each X, is assumed to be finite a.e ., so is each Sn andMn .Since Mn increases with n, it tends to a limit, finite or positive infinite, to bedenoted by

(13)

M(w) = lira M n (w) = Sup Si (w) .n +cc

0<j<00

Theorem 8.2.4 . The statements (a), (b), and (c) below are equivalent; thestatements (a'), (b'), and (c') are equivalent .

(a) ~fla < +oo} = 1 ;

(a') PJ'{a < +oo} < 1 ;

(b) Z'{ lim Sn = +oo} = 1 ; (b') _,fl lim S, = +oo} = 0;n-+oc

n-4oo

(c) _~P{M = +oo} = 1 ;

(c') °J'{M = +oo} = 0 .

PROOF . If (a) is true, we may suppose a < oc everywhere . Consider ther.v . Sa : it is strictly positive by definition and so 0 < C,','(Sa ) < +00 . By theCorollary to Theorem 8 .2.3, {S&+, - S,6,, k > 1 } is a sequence of indepen-dent and identically distributed r.v.'s . Hence the strong law of large numbers(Theorem 5 .4.2 supplemented by Exercise 1 of Sec . 5.4) asserts that, if a° = 0and S° = 0 :

SA,n

(SSA+i - S/k ) - + (, - (S,,) > 0 a.e .

This implies (b) . Since lim n,,,, Sn < M, (b) implies (c) . It is trivial that (c)implies (a) . We have thus proved the equivalence of (a), (b), and (c) . If (a')is true, then (a) is false, hence (b) is false . But the set

{ lira S n = + oo}n-oc

is clearly permutable (it is even invariant, but this requires a little more reflec-tion), hence (b') is true by Theorem 8 .1 .4. Now any numerical sequence withfinite upper limit is bounded above, hence (b) implies (c') . Finally, if (c') istrue then (c) is false, hence (a) is false, and (a') is true. Thus (a'), (b'), and(c') are also equivalent .

Page 291: A Course in Probability Theory KL Chung

Theorem 8.2.5. For the general random walk, there are four mutually exclu-sive possibilities, each taking place a.e . :

(i) `dn E N: S n = 0;(ii) S, -~ -00 ;

(Ill) Sn --~ +00;

(iv) -00 = limn-),oo Sn < lim""~ S n = +00.

PROOF. If X = 0 a.e ., then (i) happens . Excluding this, let c1 = limn Sn .Then ~o1 is a permutable r.v ., hence a constant c, possibly ±oo, a .e . byTheorem 8 .1 .4. Since

lim Sn = X 1 + lim(Sn - X1),n

n

we have (P1 = X1 +'P2, where c02 (0)) ='p l (rw) = c a.e . Since X1 $ 0, itfollows that c = +oo or -oo . This means that

either llm Sn = +oo or lim Sn = -00nn

n

By symmetry we have also

either lim Sn = -oo or lim S, = +00 .nn

These double alternatives yield the new possibilities (ii), (iii), or (iv), othercombinations being impossible .

This last possibility will be elaborated upon in the next section .

EXERCISES

In Exercises 1-6, the stochastic process is arbitrary .

* 1. a is optional if and only if V n E N: {a < n } E `fin .*2. For each optional a we have a E a and Xa E « . If a and ~ are both

optional and a < ,8, then 3 C . p7 .3. If a 1 and a2 are both optional, then so is a l A a2, a l V a2 , a 1 + a2 . If

a is optional and A E a, then ao defined below is also optional :

ao =a

on A+00

on c2\0 .

*4. If a is optional and ~ is optional relative to the post-a process, thena + ~ is optional (relative to the original process) .

5. `dk E N : a l + . . . + ak is optional . [For the a in (11), this has beencalled the kth ladder variable .]

8.2 BASIC NOTIONS 1 277

Page 292: A Course in Probability Theory KL Chung

278 1 RANDOM WALK

*6. Prove the following relations :

ak = a ° TOA-1 ,;

T~A-1 ° Ta = TO' ; XRk-i +j ° -r' = XI'

/gk+j

7. If a and 8 are any two optional r .v.'s, then

X13+j (Taw) = Xa((0)+fl(raw)+j ((0 ) ;

(TO ° Ta)(w) = z~(r"w)+a(w)(~) 0 T~+a (w) in general .

*8. Find an example of two optional r.v.'s a and fi such that a < ,B buta' ~5 ~j . However, if y is optional relative to the post-a process and ,B =

a + y, then indeed Via' D 1'. As a particular case, ~k is decreasing (while'-1 k is increasing) as k increases .

9. Find an example of two optional r.v.'s a and ,B such that a < ,B but- a is not optional .

10. Generalize Theorem 8.2.2 to the case where the domain of definitionand finiteness of a is A with 0 < ~P(A) < 1 . [This leads to a useful extensionof the notion of independence . For a given A in 9 with _`P(A) > 0, twoevents A and M, where M C A, are said to be independent relative to A iff{A n A n M} = {A n A}go {M} .]* 11. Let {X,,, n E N} be a stationary independent process and folk, k E N}

a sequence of strictly increasing finite optional r.v.'s . Then {Xa k +1, k E N} isa stationary independent process with the same common distribution as theoriginal process . [This is the gambling-system theorem first given by Doob in1936.]

12. Prove the Corollary to Theorem 8 .2.2 .13. State and prove the analogue of Theorem 8.2.4 with a replaced by

a[o,,,,) . [The inclusion of 0 in the set of entrance causes a small difference .]14. In an independent process where all X„ have a common bound,

(c{a} < oc implies (`{S a } < oc for each optional a [cf. Theorem 5 .5 .3] .

8 .3 Recurrence

A basic question about the random walk is the range of the whole process :U,OO_1 S„ (w) for a.e . w ; or, "where does it ever go?" Theorem 8 .2.5 tells usthat, ignoring the trivial case where it stays put at 0, it either goes off to -00or +oc, or fluctuates between them . But how does it fluctuate? Exercise 9below will show that the random walk can take large leaps from one end tothe other without stopping in any middle range more than a finite number oftimes. On the other hand, it may revisit every neighborhood of every point aninfinite number of times . The latter circumstance calls for a definition .

Page 293: A Course in Probability Theory KL Chung

DEFINrrION . The number x E P21 is called a recurrent value of the randomwalk {S,,, n E N}, iff for every c > 0 we have

(1)

{ISn -xI < E i .0.} = 1 .

The set of all recurrent values will be denoted by t)1 .

Taking a sequence of E decreasing to zero, we see that (1) implies theapparently stronger statement that the random walk is in each neighborhoodof x i .o. a.e .

Let us also call the number x a possible value of the random walk ifffor every e > 0, there exists n E N such that ISn - xI < E} > 0. Clearly arecurrent value is a possible value of the random walk (see Exercise 2 below) .

Theorem 8.3.1 . The set Jt is either empty or a closed additive group of realnumbers. In the latter case it reduces to the singleton {0) if and only if X = 0a.e . ; otherwise 91 is either the whole _0'2 1 or the infinite cyclic group generatedby a nonzero number c, namely {±nc: n E N° ) .

PROOF. Suppose N :A 0 throughout the proof . To prove that S}1 is a group,let us show that if x is a possible value and y E Y, then y - x E N. Supposenot; then there is a strictly positive probability that from a certain value of n

on, Sn will not be in a certain neighborhood of y - x. Let us put for z E R1 :

(2)

p€,m(z) _ 7{ISn - zI > E for all n > m} ;

so that p 2E , n,(y - x) > 0 for some e > 0 and m E N. Since x is a possiblevalue, for the same c we have a k such that P{ISk - xI < E) > 0. Now thetwo independent events I Sk - x I < E and I S„ - Sk - (y - x) I > 2E togetherimply that IS, - y I > E ; hence

(3) PE,k+m(Y) _ ='T {IS,, - Y I > E for all n > k + m)

> JY{ISk - XI < WITS, - Sk - (y - x)I > 2E for all n > k + in) .

The last-written probability is equal to p2E.n1(y - x), since S„ - Sk has thesame distribution as Sn _k. It follows that the first term in (3) is strictly positive,contradicting the assumption that y E 1. We have thus proved that 91 is anadditive subgroup of :%,' . It is trivial that N as a subset of %, 1 is closed in theEuclidean topology. A well-known proposition (proof?) asserts that the onlyclosed additive subgroups of are those mentioned in the second sentenceof the theorem . Unless X - 0 a.e ., it has at least one possible value x =A 0,and the argument above shows that -x = 0 - x c= 91 and consequently alsox = 0 - (-x) E 91 . Suppose 91 is not empty, then 0 E 91 . Hence 91 is not asingleton. The theorem is completely proved .

8.3 RECURRENCE I 279

Page 294: A Course in Probability Theory KL Chung

280 1 RANDOM WALK

It is clear from the preceding theorem that the key to recurrence is thevalue 0, for which we have the criterion below .

Theorem 8.3.2 . If for some E > 0 we have

(4)

1: -,:;P{ IS, I < E} < oc,n

then

(5)

-:flIS .

(for the same c) so that 0 ' N . If for every E > 0 we have

(6)

EJ ?;;D{IS,I < E} = oo,n

then

(7)

?;?If IS, I < E i .o .} = 1

for every E > 0 and so 0 E 9I .

Remark . Actually if (4) or (6) holds for any E > 0, then it holds forevery c > 0 ; this fact follows from Lemma 1 below but is not needed here .

PROOF . The first assertion follows at once from the convergence part ofthe Borel-Cantelli lemma (Theorem 4.2.1) . To prove the second part consider

F = lim inf{ I S, I > E} ;n

namely F is the event that IS n I < E for only a finite number of values of n .For each co on F, there is an m(c)) such that IS, (w)I > E for all n > m(60); itfollows that if we consider "the last time that IS, I < E", we have

0C~'(F) =

7{ISmI < E; IS, I > E for all n > m + 1} .M=0

Since the two independent events ISm I < E and I Sn - Sm I > 2E together implythat IS, I > e, we have

a1 > 1' (F) >

ISmI < E},'~P{IS, - SmI > 2E for all n > m+ 1}m=1

{ISmI < E}P2E,1(0)M=1

< E i.o .} = 0

Page 295: A Course in Probability Theory KL Chung

by the previous notation (2), since S,, - Sn, has the same distribution as Sn-m •Consequently (6) cannot be true unless P2,,1(0) = 0. We proceed to extendthis to show that P2E,k = 0 for every k E N. To this aim we fix k and considerthe event

A,,,={IS.I<E,ISn I>Eforall n>m+k} ;

then A, and A n,- are disjoint whenever m' > m + k and consequently (why?)

km=1

The argument above for the case k = 1 can now be repeated to yield00

k

J'{ISm I < E}P2E,k(0),m=1

and so P2E,k(0) = 0 for every E > 0 . Thus

1'(F) = +m PE,k(0) = 0,

which is equivalent to (7) .A simple sufficient condition for 0 E t, or equivalently for 91

willnow be given .

Theorem 8.3.3 . If the weak law of large numbers holds for the random walkIS, n E N} in the form that Sn /n -* 0 in pr., then N 0 0.

PROOF. We need two lemmas, the first of which is also useful elsewhere .

Lemma 1. For any c > 0 and m E N we have00

(8)

,~P{ISn I < mE} < 2m

-'~fl IS„ I < e} .

(9)

n=0

(PJ (Sn ) =

n=0

PROOF OF LEMMA 1 . It is sufficient to prove that if the right member of (8)is finite, then so is the left member and (8) is true . Put

I = (- E, E), J = [JE, (j + 1)E),

for a fixed j E N ; and denote by co1,'J the respective indicator functions .Denote also by a the first entrance time into J, as defined in (5) of Sec . 8 .2with Z„ replaced by S,, and A by J . We have

=k

00

=1

8.3 RECURRENCE 1 281

coj (Sn )d 7 .

Page 296: A Course in Probability Theory KL Chung

282 1 RANDOM WALK

The typical integral on the right side is by definition of a equal to00

f

1 +

6oj (Sn ) d ' <ft-=k)

1 +

6PI (Sn - Sk) d~,(a=k}

n=k+1

n=k+1

since {a = k) C {Sk E J} and {Sk E J} nIS, E J} C IS, - S k E I}. Now {a =k} and Sn - Sk are independent, hence the last-written integral is equal to

00°J'(a = k) 1 + f

co1(Sn - SO d °J' = J'(a = k) c1

n=k+1

n=0

since (p, (SO) = 1 . Summing over k and observing that cpj (0) = 1 only if j = 0,in which case J C I and the inequality below is trivial, we obtain for each j

00

cp (Sn)

L~

(PI (Sn )n=0

n=0

Now if we write Jj for J and sum over j from -m to m - 1, the inequality(8) ensues in disguised form .

This lemma is often demonstrated by a geometrical type of argument .We have written out the preceding proof in some detail as an example of themaxim: whatever can be shown by drawing pictures can also be set down insymbols!

Lemma 2. Let the positive numbers {un (m)}, where n E N and m is a realnumber > 1, satisfy the following conditions:

(i) V, : un (m) is increasing in m and tends to 1 as m -* oc ;(ii) 2c > 0 : En°0 un (M) < cm E00=

(iii) 'd8 > 0 : lim„ .,,) un (Sn) = 1 .

Then we have

(10)

un (1) = oo .n=0

Remark. If (ii) is true for all integer m > 1, then it is true for all realm > 1, with c doubled .

PROOF OF LEMMA 2. Suppose not ; then for every A > 0 :

00 >1

U, 0) > -n=0

cmn=0

1 [Am]

(m) > -Cm

n=0

Page 297: A Course in Probability Theory KL Chung

>

U ( )Cm

n=0

Letting m -+ oc and applying (iii) with 8 = A -1 , we obtain00

1: un (1) ? A 'Cn=0

8.3 RECURRENCE 1 283

since A is arbitrary, this is a contradiction, which proves the lemma

To return to the proof of Theorem 8 .3 .3, we apply Lemma 2 with

un (m) = 3'{ ISn I < m} .

Then condition (i) is obviously satisfied and condition (ii) with c = 2 followsfrom Lemma 1 . The hypothesis that Sn /n -- 0 in pr. may be written as

un (Sn) = GJ'Sn

n

for every S > 0 as n - oo, hence condition (iii) is also satisfied . Thus Lemma 2yields

00E -'~'P{ I Sn I < 1) _ +00.

n=0

Applying this to the "magnified" random walk with each Xn replaced by Xn /E,which does not disturb the hypothesis of the theorem, we obtain (6) for everye > 0, and so the theorem is proved .

In practice, the following criterion is more expedient (Chung and Fuchs,1951) .

Theorem 8.3.4 . Suppose that at least one of S(X+) and e(X- ) is finite . The910 0 if and only if cP(X) = 0; otherwise case (ii) or (iii) of Theorem 8 .2.5happens according as c(X) < 0 or > 0 .

PROOF . If -oo < (-f(X) < 0 or 0 < o (X) < +oo, then by the strong lawof large numbers (as amended by Exercise 1 of Sec . 5.4), we have

Sn -~ (X) a.e .,n

1 A

so that either (ii) or (iii) happens as asserted . If (f (X) = 0, then the same lawor its weaker form Theorem 5 .2.2 applies ; hence Theorem 8 .3 .3 yields theconclusion .

Page 298: A Course in Probability Theory KL Chung

284 1 RANDOM WALK

DEFINITION OF RECURRENT RANDOM WALK. A random walk will be calledrecurrent iff , :A 0 ; it is degenerate iff N _ {0} ; and it is of the lattice typeiff T is generated by a nonzero element .

The most exact kind of recurrence happens when the X, 's have a commondistribution which is concentrated on the integers, such that every integer isa possible value (for some S,,), and which has mean zero . In this case foreach integer c we have 7'{S n = c i .o .} = 1 . For the symmetrical Bernoullianrandom walk this was first proved by Polya in 1921 .

We shall give another proof of the recurrence part of Theorem 8 .3 .4,namely that ( (X) = 0 is a sufficient condition for the random walk to be recur-rent as just defined . This method is applicable in cases not covered by the twopreceding theorems (see Exercises 6-9 below), and the analytic machineryof ch.f.*s which it employs opens the way to the considerations in the nextsection .

The starting point is the integrated form of the inversion formula inExercise 3 of Sec. 6.2. Talking x = 0, u = E, and F to be the d .f. of Sn , wehave

E

pp

(11) -"On I < E) > f ~(ISn I < u) du = E f : 1 -~os Etf(t)n dt .E

Thus the series in (6) may be bounded below by summing the last expres-sion in (11). The latter does not sum well as it stands, and it is natural to resortto a summability method . The Abelian method suits it well and leads to, for0<r<1 :

00(12)

rn ~T(ISn I < E) > 1

1 - cos et R

1 dt,76 f00

t2

1 - r f (t)n=0

where R and I later denote the real and imaginary parts or a complex quan-tity. Since

1

1-r(13)

R1-rf(t)>

I1-rf(t)I2 >0

and (1 - cos Et)/t 2 > CE2 for IEtI < 1 and some constant C, it follows thatfor r~ < 1\c the right member of (12) is not less than

n(14)

m jR1 - r f (t) dt .

-

77

Now the existence of FF(JXI) implies by Theorem 6.4 .2 that 1 - f (t) = o(t)as t -* 0 . Hence for any given 8 > 0 we may choose the rl above so that

II1 - rf (t) 1 2 < (1 - r + r[1 - R f (t)]) 2 + (rlf (t)) 2

Page 299: A Course in Probability Theory KL Chung

The integral in (14) is then not less than

n

(1 - r) dt

1

rW - 6-1

ds-

2(1 - r) 2 + 3r2S2t2 dt >- 3 f_(1_r)

r71 + 8 2s2

As r T 1, the right member above tends to rr\3S ; since S is arbitrary, we haveproved that the right member of (12) tends to +oo as r T 1 . Since the seriesin (6) dominates that on the left member of (12) for every r, it follows that(6) is true and so 0 E In by Theorem 8.3 .2 .

EXERCISES

f is the ch.f. of p .

1. Generalize Theorems 8.3 .1 and 8 .3.2 to Rd . (For d > 3 the general-ization is illusory ; see Exercise 12 below.)

*2. If a random walk in R d is recurrent, then every possible value is arecurrent value .

3. Prove the Remark after Theorem 8 .3 .2 .*4. Assume that 7{X 1 = 01 < 1 . Prove that x is a recurrent value of the

random walk if and only if

00'{ ISn - x I < E) = oo for every c > 0 .

n=1

5. For a recurrent random walk that is neither degenerate nor of thelattice type, the countable set of points IS, (w), n E N) is everywhere dense inM, 1 for a .e.w. Hence prove the following result in Diophantine approximation :if y is irrational, then given any real x and c > 0 there exist integers m and nsuch that Imy + n - xj < E .

*6 . If there exists a 6 > 0 such that (the integral below being real-valued)

s

dtlim

= 00,rTl

rf (t)

then the random walk is recurrent .*7. If there exists a S > 0 such that

s

dtsup

< oo,O<r<1 f s I - rf (t)

8.3 RECURRENCE 1 285

< 2(1 - r) 2 + 2(r8t) 2 + (r8t) 2 = 2(1 - r)2 + 3r 28Y .

Page 300: A Course in Probability Theory KL Chung

286 I RANDOM WALK

then the random walk is not recurrent. [HINT : Use Exercise 3 of Sec . 6.2 toshow that there exists a constant C(E) such that

1 - cos E-lx

C(E) l/E

u

,,'(IS,, I < E) < C(E)_'w1

x2

An (dx) <21

dufu f (t)" dt,

where ftn is the distribution of S n , and use (13) . [Exercises 6 and 7 give,respectively, a necessary and a sufficient condition for recurrence . If a is ofthe integer lattice type with span one, then

7r

1J-n 1 - f (t)

dt = oo

is such a condition according to Kesten and Spitzer .]*8. Prove that the random walk with f (t) = e- I t I (Cauchy distribution) is

recurrent .*9. Prove that the random walk with f(t) = e- I tI", 0 < a < 1, (Stable

law) is not recurrent, but (iv) of Theorem 8 .2 .5 holds .10. Generalize Exercises 6 and 7 above to Rd .

*11. Prove that in g2 if the common distribution of the random vector(X, Y) has mean zero and finite second moment, namely :

C'5~(X) = 0,

(Y) = 0, 0 < C (X2 + Y2) < oo,

then the random walk is recurrent . This implies by Exercise 5 above thatalmost every Brownian motion path is everywhere dense in 2 . [mi''r : Usethe generalization of Exercise 6 and show that

1

CR>1 - f (tl, t2) - t1 + t2

for sufficiently small It l I + 1t2 1 . One can also make a direct estimate :

C ':~'(ISn I < E) > - •]n

*12. Prove that no truly 3-dimensional random walk, namely one whosecommon distribution does not have its support in a plane, is recurrent . [HINT :

There exists A > 0 such that

'If )_

Page 301: A Course in Probability Theory KL Chung

is a strictly positive quadratic form Q in (t 1 , t2, t3)- If

3

Iti I <A - ',i=l

thenR{1 - f (ti, t2, t3)} > CQ(t1, t2, t3) .]

13.. Generalize Lemma 1 in the proof of Theorem 8 .3 .3 to Rd . For d = 2the constant 2m in the right member of (8) is to be replaced by 4m2 , and"Sn < E" means S, is in the open square with center at the origin and sidelength 2E .

14. Extend Lemma 2 in the proof of Theorem 8.3 .3 as follows. Keepcondition (i) but replace (ii) and (iii) by

(ii') En=0 un (m) < cm2 En=0 un (1) ;

There exists d > 0 such that for every b > 1 and m > m(b):2

(iii') un (m) > dm for m2 < n < (bm) 2n

Then (10) is true .*15. Generalize Theorem 8 .3 .3 to 92 as follows. If the central limit

theorem applies in the form that S 12 I ,fn- converges in dist. to the unitnormal, then the random walk is recurrent . [HINT : Use Exercises 13 and 14 andExercise 4 of § 4 .3 . This is sharper than Exercise 11 . No proof of Exercise 12using a similar method is known .]

16. Suppose (X) = 0, 0 < (,,(X2) < oo, and A is of the integer latticetype, then

GJ'{Sn 2 = 0 i .o .} = 1 .

17. The basic argument in the proof of Theorem 8 .3.2 was the "last timein (-E, E)" . A harder but instructive argument using the "first time" may begiven as follows .

fn

{ISO)>Eform<j<n-1 ;ISnI <E} . gn(E)=~P{IS,I <E} .

Show that for 1 < m < M :M

M

M

E gn (E) < E fn(m) 1: gn (2E) .n=m

n=m

n=0

* This form of condition (iii') is due to Hsu Pei ; see also Chung and Lindvall, Proc. Amer. Math .Soc . Vol . 78 (1980), p. 285 .

8.3 RECURRENCE 1 287

Page 302: A Course in Probability Theory KL Chung

288 1 RANDOM WALK

It follows by a form of Lemma 1 in this section that

limin-+x n=mfnm) > 14

now use Theorem 8 .1 .4 .18. For an arbitrary random walk, if flVn E N : Sn > 0) > 0, then

P{Sn < 0, Sn+i > 01 < oo .n

Hence if in addition 9I{Vn E N : S, < 01 > 0, then

E I -~P7 {Sn > 01 - -~P{Sn+1 > O1j < oo ;n

and consequently

(-1)n,,,J'{Sn > 01 < oo .n n

[HINT : For the first series, consider the last time that S n < 0 ; for the third series,apply Du Bois-Reymond's test. Cf. Theorem 8.4.4 below; this exercise willbe completed in Exercise 15 of Sec . 8.5 .]

8 .4 Fine structure

In this section we embark on a probing in depth of the r .v . a defined in (11)of Sec . 8.2 and some related r.v.'s .

The r.v. a being optional, the key to its introduction is to break up thetime sequence into a pre-a and a post-a era, as already anticipated in theterminology employed with a general optional r.v . We do this with a sort ofcharacteristic functional of the process which has already made its appearancein the last section :

1(1)

~r

rn eitSn _

n f (t)n =1 -rf(t)

n=0 n=0

where 0 < r < 1, t is real, and f is the ch .f. of X . Applying the principle justenunciated, we break this up into two parts :

a-I<<

rn eltS„ +r

It=0

rn eltSn

Page 303: A Course in Probability Theory KL Chung

with the understanding that on the set {a = oo}, the first sum above is En°_owhile the second is empty and hence equal to zero . Now the second part maybe written as

00

00(2)

e I: ra+neitSa+n =

raeitsaLr

rn e it(S..+n -S.)

n=0

n=0

It follows from (7) of Sec . 8 .2 that

,S'a+n - Sa = Sno ra

has the same distribution as S n , and by Theorem 8 .2.2 that for each n it isindependent of Sa . [Note that the same fact has been used more than oncebefore, but for a constant a.] Hence the right member of (2) is equal to

t { ra eits" } d, rn e itsn = f{raeits,}11-rf(t)

obtain

8.4 FINE STRUCTURE 1 289

where raeitsa is taken to be 0 for a = oo. Substituting this into (1), we

1

a-1

(3) [1 - t{rae itsa } ] _

rn e itsn1 - rf (t)

n=0

We have00

(4) 1 - „ {ra e its'} = 1 -E rnfla=n)

and

(5)a-1

00 r

k-1

rn e itSn =

1

rn eitsn d-OP

n=0

k=1 {a=k} n=000= 1 +E

rn

eits~ dJ'n=1

{a>n}

by an interchange of summation . Let us record the two power series appearingin (4) and (5) as

00P(r, t) = 1 - S{raeits') = 1: rn Pn (t) ;

n=0

00Q(r, t) _

rneitsn -E rn gn(t),5G

n=0

n=0

Page 304: A Course in Probability Theory KL Chung

290 1 RANDOM WALK

where

where

two

po(t)

pn (t) _ -fla=n)

qo(t) = 1, qn (t) =

eirsn dam'= J

ei x Vn (dx) .{a>n)

i

Now Un (•) = Jf{a = n ; Sn E •} is a finite measure with support in (0, oc),while V,, ( •) = Jf{a > n ; S, E •} is a finite measure with support in (-oc, 0] .Thus each pn is the Fourier transform of a finite measure in (0, oo), whileeach qn is the Fourier transform of a finite measure in (-oc, 0] . Returning to(3), we may now write it as

(6)

1 - r f (t) P(r, t) = Q(r, t).

The next step is to observe the familiar Taylor series :

1

°O1 -

x = exp

, IxI < 1 .n=1

Thus we have

1(7)

1 - r f (t)=

eXpn=1

= exp

= f+(r, t) -1 f-(r, t),

f+ (r, t) =exp -

f -(r, t) = exp +

n J(_00,0]

eitXAn (dx) ,n=1

eitSn dJ' = - f eitXUn (dx) ;

r ' eirsn dGJ' +f

eirsn dg'{Sn >o)

{Sn _<0)

e`tX[Ln(dx) ,

and ,a () _ /'{Sn E •} is the distribution of S, . Since the convolution ofmeasures both with support in (0, oc) has support in (0, oc), and the

convolution of two measures both with support in (-oc, 0] has support in(-oc, 0], it follows by expansion of the exponential functions above and

Page 305: A Course in Probability Theory KL Chung

rearrangements of the resulting double series that

00

.f+(r, t) = 1 +

~on(t),

.f-(r, t) = 1 +

" *n(t),n=1

where each cp n is the Fourier transform of a measure in (0, oo), while eachYin is the Fourier transform of a measure in (-oo, 0] . Substituting (7) into (6)and multiplying through by f + ( r, t), we obtain

(8)

P(r, t)f -(r, t) = Q(r, t) f +(r, t) .

The next theorem below supplies the basic analytic technique for this devel-opment, known as the Wiener-Hopf technique .

Theorem 8.4.1 . Let00

P(r, t)

rn pn (t),

Q(r, t) =

n qn (t),n =O00

Pk(t)gn-k(t) _k=0 k=0

8.4 FINE STRUCTURE 1 291

n=0

00

P * (r, t) =

rnPn (t),

Q* (r, t) _n=0

n=0

00

n=1

n q,i (t),

where po (t) - qo (t) = po (t) = qo (t) = 1 ; and for n > 1, p„ and p; as func-tions of t are Fourier transforms of measures with support in (0, oc) ; q„ andq, as functions of t are Fourier transforms of measures in (-oo, 0]. Supposethat for some ro > 0 the four power series converge for r in (0, ro) and allreal t, and the identity

(9)

P(r, t)Q * (r, t) _- P * (r, t)Q(r, t)

holds there . ThenP=P* ,

The theorem is also true if (0, oc) and (-oc, 0] are replaced by [0, oc) and(-oc, 0), respectively .

PROOF. It follows from (9) and the identity theorem for power series thatfor every n > 0 :

n

n

Pk( t )gn -k(t ) -

Then for n = 1 equation (10) reduces to the following :

pl (t) - pi (t) = qi (t) - qi (t) .

Page 306: A Course in Probability Theory KL Chung

292 1 RANDOM WALK

By hypothesis, the left member above is the Fourier transform of a finite signedmeasure v 1 with support in (0, oc), while the right member is the Fouriertransform of a finite signed measure with support in (-oc, 0] . It follows fromthe uniqueness theorem for such transforms (Exercise 13 of Sec . 6.2) thatwe must have v 1 - v 2 , and so both must be identically zero since they havedisjoint supports . Thus p1 = pi and ql = q* . To proceed by induction on n,suppose that we have proved that p j - pt and qj - q~ for 0 < j < n - 1 .Then it follows from (10) that

pn (t) + qn (t) _- pn (t) + qn (t) •

Exactly the same argument as before yields that pn - pn and qn =- qn . Hencethe induction is complete and the first assertion of the theorem is proved ; thesecond is proved in the same way .

Applying the preceding theorem to (8), we obtain the next theorem inthe case A = (0, oo) ; the rest is proved in exactly the same way .

Theorem 8.4.2 . If a = aA is the first entrance time into A, where A is oneof the four sets : (0, oo), [0, oc), (-oc, 0), (-oo, 0], then we have

1 - c {rae`rs~} = exp -

e`tsnd 7 ;

(* )

rn eitsn 1 = exp +

elrsn d~P .n=1

n

{S„EA`}

From this result we shall deduce certain analytic expressions involvingthe r.v. a. Before we do that, let us list a number of classical Abelian andTauberian theorems below for ready reference .

(A) If c, > 0 and En°-o Cn rn converges for 0 < r < 1, then

00lim

cnrn =rT1

11=0 n=0

finite or infinite .(B) If Cn are complex numbers and En°_o c, rn converges for 0 < r < 1,

then (*) is true .(C) If Cn are complex numbers such that Cn = o(n _1 ) [or just O(n -1 )]

as n --+ oc, and the limit in the left member of (*) exists and is finite, then(*) is true .

Page 307: A Course in Probability Theory KL Chung

11-r

8.4 FINE STRUCTURE 1 293

(D) If c,(,' ) > 0, En°-0 c( ' )r n converges for 0 < r < 1 and diverges forr=1,i=1,2;and

n

nE Ck (l) K 1: n2)

k=O

k=0

[or more particularly cn' ) - Kc,(2) ] as n --+ oo, where 0 < K < +oc, then

00

00

cMrn _ K 1: C(2) rnn

nn=0

n=0

as rT1 .(E) If c, > 0 and

00

E Cn rnn=0

as r T 1, thenn-1

I: Ck-n .k=0

Observe that (C) is a partial converse of (B), and (E) is a partial converseof (D) . There is also an analogue of (D), which is actually a consequence of(B) : if cn are complex numbers converging to a finite limit c, then as r f 1,

00 cE Cn rnti

1 - rn=0

Proposition (A) is trivial . Proposition (B) is Abel's theorem, and proposi-tion (C) is Tauber's theorem in the "little o" version and Littlewood's theoremin the "big 0" version ; only the former will be needed below . Proposition (D)is an Abelian theorem, and proposition (E) a Tauberian theorem, the latterbeing sometimes referred to as that of Hardy-Littlewood-Karamata . All fourcan be found in the admirable book by Titchmarsh, The theory of functions(2nd ed., Oxford University Press, Inc ., New York, 1939, pp . 9-10, 224 ff.) .

Theorem 8.4.3 . The generating function of a in Theorem 8 .4.2 is given by

n(13)

(`{r') = 1 - exp -

rGJ'[Sn E A]

nn=1n

= 1 - (1 - r) exp

y-GJ'[Sn E AC] .

Page 308: A Course in Probability Theory KL Chung

294 1 RANDOM WALK

We have

(14)

~P{a < oo} = 1 if and only f

-~[Sn E A] = oc ;

in which case

(15)

1 {a) = exp En~[Sn E Ac]

n=1

PROOF . Setting t = 0 in (11), we obtain the first equation in (13), fromwhich the second follows at once through

1

00Yn

00Yn

00 Yn= exp

= exp

-OP[Sn E A] +

-"P[Sn E A`]1-r

nn=

n=

n=

Since

n=0

n=1lim 1{ra } = lim

?'{ = n =rT1

rTl

n=0

n=1

by proposition (A), the middle term in (13) tends to a finite limit, hence alsothe power series in r there (why?) . By proposition (A), the said limit may beobtained by setting r = 1 in the series. This establishes (14) . Finally, settingt = 0 in (12), we obtain

a-1

n(16)

Lrn = exp

r~P[Sn E Ac ]

nn=0

n=1

Rewriting the left member in (16) as in (5) and letting r T 1, we obtain00

(17)

urn

rn 77[a > n] =

°J' [a > n] = 1{a} < oorTI

= n} = P{a < oo}

by proposition (A) . The right member of (16) tends to the right member of(15) by the same token, proving (15).

When is (N a) in (15) finite? This is answered by the next theorem.*

O° 1

* It can be shown that S, --* +oc a.e . if and only if f{a(o,~~} < oo; see A. J . Lemoine, Annalsof Probability 2(1974) .

Theorem 8.4.4 .is finite; then

Suppose that X # 0 and at least one of 1(X+) and 1(X - )

(18) 1(X) > 0 = ct{a(o,00)} < 00;

(19) c'(X) < 0 = (P{a[o,00)) = 00-

Page 309: A Course in Probability Theory KL Chung

PROOF. If c` (X) > 0, then 91{Sn -f +oo) = 1 by the strong law of largenumbers . Hence Sn = -00} = 0, and this implies by the dualof Theorem 8 .2.4 that {a(_,,,o) < oo) < 1 . Let us sharpen this slightly toJ'{a(_,,),o] < o0} < 1 . To see this, write a' = a(_oo,o] and consider S, asin the proof of Theorem 8 .2 .4. Clearly Sin < 0, and so if JT{a' < oo} = 1,one would have J'{Sn < 0 i.o .) = 1, which is impossible. Now apply (14) toa(_o,~,o] and (15) to a(o, 00 ) to infer

O° 1f l(y(o,oo)) = exp E

< 01 < oc,n=1

n

proving (18) . Next if (X) = 0, then J'{a(_~,o) < 00} = 1 by Theorem 8.3 .4 .Hence, applying (14) to

and (15) to a[o,,,.), we infer

°O 1{a[o,~)} = exp

nJ'[Sn < 0] = 0c .

n=1

Finally, if (X) < 0, then /){a[o,~) = 00} > 0 by an argument dual to thatgiven above for a(,,,o], and so {a[o,,,3 )} = oc trivially .

Incidentally we have shown that the two r .v.'s a(o,w) and a[o, 0c ) haveboth finite or both infinite expectations . Comparing this remark with (15), wederive an analytic by-product as follows .

Corollary. We have

(20)

nJT [Sn = 0] < 00 .

n=1

This can also be shown, purely analytically, by means of Exercise 25 ofSec. 6 .4 .

The astonishing part of Theorem 8 .4 .4 is the case when the random walkis recurrent, which is the case if ((X) = 0 by Theorem 8 .3 .4 . Then the set[0, 00), which is more than half of the whole range, is revisited an infinitenumber of times. Nevertheless (19) says that the expected time for even onevisit is infinite! This phenomenon becomes more paradoxical if one reflectsthat the same is true for the other half (-oc, 0], and yet in a single step oneof the two halves will certainly be visited . Thus we have :

01(- 0,-,01 A a[o, C)O) = 1,

( {a'(_ ,o]} = c~'{a[o,oo)} = 00 .

Another curious by-product concerning the strong law of large numbersis obtained by combining Theorem 8 .4.3 with Theorem 8 .2.4 .

8.4 FINE STRUCTURE I 295

Page 310: A Course in Probability Theory KL Chung

296 1 RANDOM WALK

Theorem 8.4.5 . S, 1n -* m a.e. for a finite constant m if and only if forevery c > 0 we have

(23)

~,-, {S«} =

n=1

Sn--mn

PROOF . Without loss of generality we may suppose m = 0 . We know fromTheorem 5 .4.2 that Sn /n -> 0 a.e. if and only if (,'(IX I) < oc and (~ (X) = 0 .If this is so, consider the stationary independent process {X' , n E N}, whereXn = Xn - E, E > 0 ; and let S' and a' = ado ~> be the corresponding r .v.'sfor this modified process. Since AX') _ -E, it follows from the strong lawof large numbers that S' --> -oo a.e ., and consequently by Theorem 8 .2.4 wehave

< oo} < 1 . Hence we have by (14) applied to a' :

00 1(22)

1: -°J'[S n - nE > 0] < 00 .n

By considering Xn + E instead of Xn - E, we obtain a similar result with"Sn - nE > 0" in (22) replaced by "Sn + nE < 0" . Combining the two, weobtain (21) when m = 0 .

Conversely, if (21) is true with m = 0, then the argument above yields°J'{a' < oo} < 1, and so by Theorem 8.2.4, GJ'{limn . oo Sn = +oo} = 0. Afortiori we have

. E > 0 : _'P{S' > nE i.0 .} = -:7p{Sn > 2ne i.o .) = 0 .

Similarly we obtain YE > 0 : J'{S,, < -2ne i.o.} = 0, and the last two rela-tions together mean exactly Sn/n --* 0 a.e. (cf. Theorem 4 .2.2) .

Having investigated the stopping time a, we proceed to investigate thestopping place S a , where a < oc . The crucial case will be handled first .

Theorem 8 .4.6. If (-r(X) = 0 and 0 < e(X2) = 62 < oc, then

a exp

1-

1- - ' (Sn E A )

< 00 .

n=1n 2

PROOF . Observe that ~,,(X) = 0 implies each of the four r .v .'s a is finitea.e. by Theorem 8.3 .4 . We now switch from Fourier transform to Laplacetransform in (11), and suppose for the sake of definiteness that A = (0, oo ) .It is readily verified that Theorem 6.6.5 is applicable, which yields

0° nr(24)

1 - e`"'{r"e- ;~S°} = exp - . -

e-)'s" dGJ'n

{sn>0)n=1

Page 311: A Course in Probability Theory KL Chung

for 0 < r < 1, 0 < X < oo. Letting r T 1 in (24), we obtain an expression forthe Laplace transform of Sa , but we must go further by differentiating (24)with respect to X to obtain

(25) (r'{rae- ;~S~Sa}

I

r nS„ e-X s, dUT • exp -

-

e-xS ^ d:~ ,ii=1 n

17>0)

n=1 n

(S,, >0)

the justification for termwise differentiation being easy, since f { IS,, I } < n J'I I X I) .

If we now set ?, = 0 in (25), the result is

(26)

(-r{raSa }00 Yn

00 rn- (,{S„ } exp

> 0]n n=1

By Exercise 2 of Sec . 6 .4, we have as n --). oc,

Ln Snn }

s/27rn

so that the coefficients in the first power series in (26) are asymptotically equalto a/-~f2- times those of

°° . 1

2n(1 - r)-1/2 =

22n C n/ r

n ,n=0

since

22n (2n

2)

1nti

rrn

It follows from proposition (D) above that

Il00

r <<{S,i } ^- Q (1 - r)-1/2 = Q exp +

r n

n

-

1 1 2n

Substituting into (26), and observing that as r T 1, the left member of (26)tends to C` {Sa } < oc by the monotone convergence theorem, we obtain

/1

(27)

~` {Sa } _a

00

lim exp > • r 1[_(s~ n > 0)rT1

1 n

2I1=1

It remains to prove that the limit above is finite, for then the limit of thepower series is also finite (why? it is precisely here that the Laplace transformsaves the day for us), and since the coefficients are o(l/n) by the central

8.4 FINE STRUCTURE 1 297

Page 312: A Course in Probability Theory KL Chung

298 1 RANDOM WALK

limit theorem, and certainly O(11n) in any event, proposition (C) above willidentify it as the right member of (23) with A = (0, oc) .

Now by analogy with (27), replacing (0, cc) by (-oo, 0] and writinga(_oc,o] as 8, we have

(28)

({Sp} =v

lim exp

n 1 J'(S,, < 0)rTl

n=1n 2

Clearly the product of the two exponentials in (27) and (28) is just exp 0 = 1,hence if the limit in (27) were +oc, that in (28) would have to be 0 . Butsince c% (X) = 0 and (`(X2) > 0, we have T(X < 0) > 0, which implies atonce %/'(Sp < 0) > 0 and consequently cf{Sf } < 0. This contradiction provesthat the limits in (27) and (28) must both be finite, and the theorem is proved .

Theorem 8.4.7 . Suppose that X # 0 and at least one of (,(X+) and (-,''(X- )is finite ; and let a = a (o , oc ) , , = a(_,,o 1 .

(i) If (-r(X) > 0 but may be +oo, then ASa ) =(ii) If t(X) = 0, then (F(Sa ) and o(S p) are both finite if and only if

(~,'(X2 ) < 00.

PROOF. The assertion (i) is a consequence of (18) and Wald's equation(Theorem 5 .5.3 and Exercise 8 of Sec . 5.5) . The "if' part of assertion (ii)has been proved in the preceding theorem ; indeed we have even "evaluated"

(Sa ) . To prove the "only if' part, we apply (11) to both a and ,8 in thepreceding proof and multiply the results together to obtain the remarkableequation :

(29) [ 1 - c { . a e ttsa}]{l - C`{rIe`rs0}] = exp -

Setting r = 1, we obtain for t ; 0:

1 - f (t)

1 - c'{eitS-} 1 - (`L{el tSf}

t2

-it

+it

Letting t ,(, 0, the right member above tends to (-,"ISa}c`'{-S~} by Theorem 6 .4 .2 .Hence the left member has a real and finite limit . This implies (`(X2) < 00 byExercise 1 of Sec . 6.4 .

8 .5 Continuation

Our next study concerns the r .v .'s Mn and M defined in (12) and (13) ofSec . 8.2. It is convenient to introduce a new r.v . L, which is the first time

-f (t)n = 1 - rf (t) .

Page 313: A Course in Probability Theory KL Chung

(necessarily belonging to N,,O ) that Mn is attained :

(1)

Vn ENO : L,(co) = min{k E N° : Sk(w) = M n ((0)} ;

note that Lo = 0. We shall also use the abbreviation

a = a(O,00),

8 = a(-00,0](2)

as in Theorems 8 .4.6 and 8 .4.7 .For each n consider the special permutation below :

(3)

pn_

l,

2,

nn

n-1

1)',

which amounts to a "reversal" of the ordered set of indices in Nn . Recallingthat 8 ° pn is the r .v . whose value at w is 6(p n w), and Sk ° pn = Sn - Sn-kfor 1 < k < n, we have

n

{~ ° pn > n } = n {Sk ° pn > 0}k=1

n

n-1

= n {Sn > Sn-k) = n {Sn > Sk } _ {L n = n} .k=1

k=0

It follows from (8) of Sec . 8 .1 that

(4)

e'tsn d 7 =f

e` t(s n°Pn ) dJi = f

e'tsn dg') .4>n}

{~°p, >n}

{Ln=n}

Applying (5) and (12) of Sec . 8 .4 to fi and substituting (4), we obtain

(5)00

00`Y ,1rn

e'tsn d ;? = exp +L_

e'tsn d9' ;n =o

{L,=n}

n=1 n {Sn>0}

applying (5) and (12) of Sec . 8.4 to a and substituting the obvious relation{a > n } = {L,, = 0}, we obtain

00

00(6)

>-• rn

f

e' ts n d:%T = exp +n=0 {L,=O}

(7)

8.5 CONTINUATION 1 299

r'l

,~e't.Sn dJ-n

{sn <oi

We are ready for the main results for M„ and M, known as Spitzer's identity :

Theorem 8.5.1 . We have for 0 < r < 1 :00

00

r" (`'{e'rM„ } = expn=0

n=

nr

(e'ts+ )

n

Page 314: A Course in Probability Theory KL Chung

300 1 RANDOM WALK

M is finite a .e. if and only if00

(8)

En

-'~PflSn > 0} < oo,n=1

in which case we have00

{e"}= exp E n

[,9(eits„) - 1]n=1

(9)

PROOF . Observe the basic equation that follows at once from the meaningof Ln :

(10)

IL, = k} _ {Lk = k} n {Lf_k ° z k = 0},

where zk is the kth iterate of the shift . Since the two events on the right sideof (10) are independent, we obtain for each n E N°, and real t and u :

n

(11)

{eirMneiu(sn -Mn ) } =>

eitskeiu(sn-Sk)d Pkk=0 {Ln=k}

n=0

J_f eitsk d~

el"(sn-koTk ) d~/k=0 {Lk=k}

{Ln-k~ik=0}n

f

=

eitsk dGJ'J

eiusn-k dJ~'A

k=0 {Lk=k}

{Ln-k=0}

It follows that00

(12)

rn {eitMneiu(Sn -Mn ) }

n=0

00

00n f

eitsn dc?P . N7 r' f

eiusn d°J' .Ln=n}

n=0

{Ln=O}

Setting u = 0 in (12) and using (5) as it stands and (6) with t = 0, we obtain00

E rn {eitMn } = exp

- is eirsn dP +f

1dGJ'n=0

n=

n >0}

{Sn _0}

which reduces to (7) .Next, by Theorem 8 .2.4, M < oo a.e. if and only if J'{a < oc} < 1 or

equivalently by (14) of Sec . 8.4 if and only if (8) holds . In this case theconvergence theorem for ch .f.'s asserts that

{eitM} = lim { eitMn },n-+00

Page 315: A Course in Probability Theory KL Chung

and consequently

[ {ettM} = lim(l -rT1

00

n=O

r" c(`{e1tMn}

= lim exp - - exprTl

n

nn=1

n=

00

= lim exp

- [ '(e`tsn) - 1 ]rTl

nn=1

where the first equation is by proposition (B) in Sec . 8.4. Since

EI

I c"(ettsn) - 1

00?'P[Sn > 0] < 00

n=1

n=1

by (8), the last-written limit above is equal to the right member of (9), byproposition (B) . Theorem 8 .5 .1 . is completely proved .

By switching to Laplace transforms in (7) as done in the proof ofTheorem 8 .4 .6 and using proposition (E) in Sec. 8.4, it is possible to givean "analytic" derivation of the second assertion of Theorem 8 .5 .1 withoutrecourse to the "probabilistic" result in Theorem 8 .2.4. This kind of mathemat-ical gambit should appeal to the curious as well as the obstinate ; see Exercise 9below . Another interesting exercise is to establish the following result :

n

(13)

~(Mn)

k f(Sk)k=1

by differentiating (7) with respect to t . But a neat little formula such as (13)deserves a simpler proof, so here it is .

Proof of (13) . Writingn

n

Mn=VS;= VSj=0

i=(1

1

and dropping "dy" in the integrals below :n

C '(Mn) =JIM,,

V Sk>O} k=1

n

n

X1+ 0 V (Sk - X1) +I

Mn

Sk{M n >O;S n >O}

k=2

{M n >O;Sn <O} k=1

8.5 CONTINUATION 1 301

n

n-1

X1+J

0V(Sk-X1)+ f

V Sk .{S„ >0}

{Sn >0}

k=2

{Mn_1>O;Sn <0) k=1

Page 316: A Course in Probability Theory KL Chung

302 1 RANDOM WALK

Call the last three integrals fl , f2 , and f3 . We have on grounds of symmetry

1

nJ //

Sn .J {s n >o }

Apply the cyclical permutation

1,2, . . .,n2,3, . . ., 1

to f2 to obtain

//,

Mn-12 = J is n >0}

Obviously, we have

1 _

Mn- 1 =I Mn-1 •3

{Mn_1 >O;Sn <0)

S„ <0)

Gathering these, we obtain

(Mn ) = /~, Sn +4n>0)<0

}

{

}

= 1&(Sn) + e (Mn-1),nand (13) follows by recursion.

Another interesting quantity in the development was the "number ofstrictly positive terms" in the random walk . We shall treat this by combinatorialmethods as an antidote to the analytic skulduggery above . Let us define

vn (co) = the number of k in Nn such that Sk (w) > 0 ;

vn (w) = the number of k in Nn such that Sk(w) < 0 .

For easy reference let us repeat two previous definitions together with twonew ones below, for n E No :

Mn (w) = max S j (w) ; Ln (w) = min{j E No : Sj (w) = Mn (w)} ;O< J<n

M;T (w) = min Sj (w) ; Ln (w) = max{ j E No : S j (w) = M;, (w)} .0<<- j<<n

Since the reversal given in (3) is a 1-to-1 measure preserving mappingthat leaves S, unchanged, it is clear that for each A E ~&, the measures onJdl below are equal :

(14)

~?P{A ; Sn E .} = ~I{pn A ; Sn E •} .

Mn-1

Page 317: A Course in Probability Theory KL Chung

Lemma. For each n E N°, the two random vectors

(Ln ,Sn ) and (n - Ln ,Sn )

have the same distribution .

PROOF . This will follow from (14) if we show that

(15)

dk E N, : pn 1 {Ln = k} = {Ln = n - k} .

Now for pn cw the first n + 1 partial sums Sj(pn c)), j E N°, are

0,Wn,Wn +&)n-1, . . .,(On + . . .+Wn-j+1, . . .,(On + . . . +w1,

which is the same as

Sn - Sn, Sn - Sn-1, Sn - Sn-2, . . . , Sn - Sn_ .j, . . . , Sn - S°,

from which (15) follows by inspection .

Theorem 8.5.2 . For each n E N°, the random vectors

(16)

(Ln, Sn) and (Vn, Sn)

have the same distribution; and the random vectors

(16')

(Ln, Sn ) and (vn , Sn )

have the same distribution .

PROOF . For n = 0 there is nothing to prove; for n = 1, the assertion about(16) is trivially true since {L1 = 0} = {S1 < 0} = {v 1 = 0) ; similarly for (16') .We shall prove the general case by simultaneous induction, supposing that bothassertions have been proved when n is replaced by n - 1 . For each k E N°- 1and y E ,~W1 , let us put

G(y) = J'{L„-1 = k; S„-1 < y}, H(y) _ ?''{ Vn-1 = k; Sn-1

y} .

Then the induction hypothesis implies that G = H . Since Xn is independentof Jhn _ 1 and so of the vector (Li - 1, Sn -1), we have for each x E 3,' :

w(17)

/5{Llz -1 = k; Sn < x} = f F(x - y) dG(y)00

=1 F(x - y)dH(y)00

= ~P{Vn-1 = k ; S n < x),

8.5 CONTINUATION 1 303

Page 318: A Course in Probability Theory KL Chung

304 1 RANDOM WALK

where F is the common d .f. of each X, . Now observe that on the set {S„ < 0}we have Ln = L,,-, by definition, hence if x < 0 :

(18)

{w : L„ (w) = k; S,, (w) ::S x} = {w : Ln-1 (w) = k ; S n (w) < x} .

On the other hand, on the set {S n < 0} we have vn - 1 = vn , so that if x < 0,

(19)

{w : v„ (w) = k ; S n (w) < x} = {cv : vn - 1 (w) = k; S n (w) < x} .

Combining (17) with (19), we obtain

(20)

Vk E N°, x < 0 : ?T{Ln = k ; Sn < x } = ~{vn = k ; Sn < x } .

Next if k E N°, and x > 0, then by similar arguments :

{w : Ln (w) = n - k; Sn (w) > x} _ {w : Ln_ 1 (w) = n - k; Sn (w) > x},

{w : vn (w) = n - k; Sn (CO) > X)

10-) V'-

Using (16') when n is replaced by n - 1, we obtain the analogue of (20) :

(21) Vk E N°, x > 0 : '{Ln = n - k; Sn > x} = °J'{vn = n - k; S n > x}.

The left members in (21) and (22) below are equal by the lemma above, whilethe right members are trivially equal since vn + vn = n :

(22)

`dk E N°, x > 0 : ~{Ln = k ; S„ > x} _ °J'{v n = k ; Sn > x} .

Combining (20) and (22) for x = 0, we have

(23)

dk E N° : ~{Ln = k} = °'{vn = k} ;

subtracting (22) from (23), we obtain the equation in (20) for x > 0 ; hence itis true for every x, proving the assertion about (16) . Similarly for (16'), andthe induction is complete .

As an immediate consequence of Theorem 8 .5.2, the obvious relation(10) is translated into the by-no-means obvious relation (24) below :

Theorem 8.5.3 . We have for k E N°

(24)

% {v„ = k} = .mil'{vk = k} Z{vn_k = 0} .

If the common distribution of each X„ is symmetric with no atom at zero,then

_ 1

l(25)

Vk E N° : J'{ vn = k} = (- l )n

k2 -

n -2k

Page 319: A Course in Probability Theory KL Chung

PROOF. Let us denote the number on right side of (25), which is equal to

1 2k) (2n - 2k)22n ( k) ( n - k '

by an (k) . Then for each n E N, {an (k), k E N° } is a well-known probabilitydistribution. For n = 1 we have

9'{v1 = 0} = ~P{v1 = 1} = 2 = an (0) = an (n),

so that (25) holds trivially . Suppose now that it holds when n is replaced byn - 1 ; then for k E N,_1 we have by (24) :

1

1

GJ'{vn = k} _ (- 1)k _k2 (-1)n-k n-2k = a„ (k)

It follows that

n-1{vn=0}+°J'{vn=n}_

9~1 v„=k}k=1

n-1

1 -

an (k) = a 12 (0) + a n (n) .k=1

Under the hypotheses of the theorem, it is clear by considering the dual randomwalk that the two terms in the first member above are equal ; since the twoterms in the last member are obviously equal, they are all equal and thetheorem is proved .

Stirling's formula and elementary calculus now lead to the famous "arcsinlaw", first discovered by Paul Levy (1939) for Brownian motion .

Theorem 8.5.4 . If the common distribution of a stationary independentprocess is symmetric, then we have

J ( Vn <

2

1 ~'X

dubx E [0, 1 ] : lim .~'

x = - arc sin x _ -n->oo

n

7

TC p ,/u(l - u)

This limit theorem also holds for an independent, not necessarily stationaryprocess, in which each X n has mean 0 and variance 1 and such that theclassical central limit theorem is applicable . This can be proved by the samemethod (invariance principle) as Theorem 7 .3 .3 .

8.5 CONTINUATION 1 305

Page 320: A Course in Probability Theory KL Chung

306 1 RANDOM WALK

EXERCISES

1 . Derive (3) of Sec . 8.4 by considering

and deduce that

1

00

?~P[M = 0] = exp -

1GJ'[Sn > 0]

6. If M < oo a.e ., then it has an infinitely divisible distribution .7. Prove that

T {Ln = 0; Sn = 0} =exp -?,'[S, = 0]n=0

n= n

[HINT: One way to deduce this is to switch to Laplace transforms in (6) ofSec. 8 .5 and let a, -+ oo.]

8. Prove that the left member of the equation in Exercise 7 is equal to

00

g{raeus ° } =

11

eirsn d?' -

eirsn d ?P .{a>n-1}

{a>n}n=1

*2 . Under the conditions of Theorem 8 .4.6, show that

{Sa } _ - 11m

Sn d GJ' .n-+oo {,>n}

3. Find an expression for the Laplace transform of Sa . Is the corre-sponding formula for the Fourier transform valid?

*4. Prove (13) of Sec . 8.5 by differentiating (7) there; justify the steps .5. Prove that

100

n=1

where a' = a{0, 00 ) ; hence prove its convergence .*9. Prove that 00

E IJ'[Sn > 0] < 00

In=1

J'{a' = n ; S n = 0}

Page 321: A Course in Probability Theory KL Chung

8 .5 CONTINUATION 1 307

implies JT(M < oc) = 1 as follows . From the Laplace transform version of(7) of Sec . 8.5, show that for A > 0,

exists and is finite, say = X(A) . Now use proposition (E) in Sec . 8 .4 and applythe convergence theorem for Laplace transforms (Theorem 6.6.3) .

*10. Define a sequence of r .v .'s {Yn , n E N° ) as follows :

where {Xn , n E N) is a stationary independent sequence. Prove that for eachn, Yn and Mn have the same distribution . [This approach is useful in queuingtheory.]

11. If -oo < c,,'~' (X) < 0, then

(-,"(X)

where V = sup Sj .1<_,j<oo

[HINT : If Vn = maxi<j< n Sj, then

(e`tv°) + Ae `ty" ) - 1 = A(env ) = Aeixv°_')f(t) ;

let n -+ oo, then t ,, 0. For the case A(X) = -oo, truncate . This result is dueto S . Port .]

*12. If

< oo} < 1, then v, -+ v, L n -+ L, both limits being finitea.e . and having the generating functions :

00 n{r"} _

~~{ } = exp

r - 1Zl'[Sn > 0] -

n

[HINT : Consider lim n,,o En°_o G.Y'{v,n = n }rn and use (24) of Sec . 8.5 .]

13. If ~(X) = 0, (X2 ) = U2 , 0 < a2 < oc, then as n - oo we haveec

e-C~[Vn

= 0] ^nn , 'P[Vn = n]

nn,

where

[HINT : Consider

Yo = 0,

00lim(1 -

n n{e -; ^ }rTl

Yn+1 = (Yn + Xn+l)+ , n E N° ,

°O 1

1

lc-~n 2

-~T[S„ >0]} .n=1 JJ~

lim(1 -rT1

)1

n=0

n=1

002

n =O

"'~P'

[vn = 0]

Page 322: A Course in Probability Theory KL Chung

308 1 RANDOM WALK

as in the proof of Theorem 8 .4.6, and use the following lemma : if p„ is adecreasing sequence of positive numbers such that

n

Pk - 2n 1/z

k=1

then p„ ^- n - 1/2 .]

* 14 . Prove Theorem 8 .5 .4 .15. For an arbitrary random walk, we have

(-1)n .

1'{Sn > 0) < oc .n n

[HINT : Half of the result is given in Exercise 18 of Sec. 8 .3 . For the remainingcase, apply proposition (C) in the 0-form to equation (5) of Sec . 8 .5 with L„replaced by vn and t = 0. This result is due to D . L. Hanson and M . Katz.]

Bibliographical Note

For Sec. 8 .3, see

K. L. Chung and W. H. J . Fuchs, On the distribution of values of sums of randomvariables, Mem. Am. Math. Soc. 6 (1951), 1-12 .

K. L. Chung and D . Ornstein, On the recurrence of sums of random variables,Bull . Am. Math. Soc . 68 (1962) 30-32 .

Theorems 8 .4.1 and 8 .4.2 are due to G . Baxter; see

G. Baxter, An analytical approach to finite fluctuation problems in probability,J. d' Analyse Math. 9 (1961), 31-70 .

Historically, recurrence problems for Bernoullian random walks were first discussedby Polya in 1921 .The rest of Sec . 8.4, as well as Theorem 8.5 .1, is due largely to F . L. Spitzer; see

F. L. Spitzer, A combinatorial lemma and its application to probability theory,Trans . Am. Math. Soc. 82 (1956), 323-339 .

F. L. Spitzer, A Tauberian theorem and its probability interpretation, Trans . Am .Math. Soc . 94 (1960), 150-169 .

See also his book below-which, however, treats only the lattice case :

F. L. Spitzer, Principles of random walk . D . Van Nostrand Co., Inc ., Princeton,N.J ., 1964 .

Page 323: A Course in Probability Theory KL Chung

8.5 CONTINUATION 1 309

The latter part of Sec . 8.5 is based on

W. Feller, On combinatorial methods in fluctuation theory, Surveys in Proba-bility and Statistics (the Harald Cramer volume), 75-91 . Almquist & Wiksell,Stockholm, 1959 .

Theorem 8 .5.3 is due to E . S . Andersen . Feller [13] also contains much material onrandom walks.

Page 324: A Course in Probability Theory KL Chung

9 Conditioning . Markovproperty. Martingale

9 .1 Basic properties of conditional expectation

If A is any set in 7 with T(A)>O, we define -A (•) on ~~7 as follows :

(1)

~'~'n(E) _

Clearly ?An is a p .m. on c ; it is called the "conditional probability relative toA" . The integral with respect to this p .m. is called the "conditional expectationrelative to A" :

(2)

(-A (Y) _

Y(w)AA(dw) _ ~~~A) f Y(co)~'(dco) .

If //(A) = 0, we decree that :%$A (E) = 0 for every E E T. This convention isexpedient as in (3) and (4) below .

Let now {A,,, n > 1} be a countable measurable partition of Q, namely :

OcQ = U An,

n=1

A n E)

~' D(A)

A,EJ~,

A,n nA,=o,

if m=n .

Page 325: A Course in Probability Theory KL Chung

Then we have

(4)

(-"(Y) =

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 31 1

00

(3)

;!'//(E) =

7'(A n n E) =

(An ) JPA n (E) ;n=1

Y(& (dw) =

provided that C "(Y) is defined. We have already used such decompositionsbefore, for example in the proof of Kolmogorov's inequality (Theorem 5 .3 .1) :

n

f SndT =D(Ak ) cnA (Sn ) .A

k=1Another example is in the proof of Wald's equation (Theorem 5 .5.3), where

00

00

n=100

?/)(A n )UA„ (Y),n=1

e(SN) =k=1

Thus the notion of conditioning gives rise to a decomposition when a givenevent or r .v. is considered on various parts of the sample space, on each ofwhich some particular information may be obtained .

Let us, however, reflect for a few moments on the even more elementaryexample below . From a pack of 52 playing cards one card is drawn and seento be a spade . What is the probability that a second card drawn from theremaining deck will also be a spade? Since there are 51 cards left, amongwhich are 12 spades, it is clear that the required probability is 12/51 . But isthis the conditional probability 7'A (E) defined above, where A = "first cardis a spade" and E = "second card is a spade"? According to the definition,

13 .12:JP(A n E) - 52.51

12^A) - 13

51'

T (N = k) 0'{N=k) (SN ) .

52where the denominator and numerator have been separately evaluated byelementary combinatorial formulas . Hence the answer to the question above isindeed "yes"; but this verification would be futile if we did not have anotherway to evaluate J''A (E) as first indicated. Indeed, conditional probability isoften used to evaluate a joint probability by turning the formula (1) around asfollows :

13 12'~(A n E) = J-P(A)J~)A(E) = 52 51

In general, it is used to reduce the calculation of a probability or expectationto a modified one which can be handled more easily .

Page 326: A Course in Probability Theory KL Chung

312 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

Let be the Borel field generated by a countable partition {A n }, forexample that by a discrete r .v . X, where An = {X = a n } . Given an integrabler.v . Y, we define the function c` (Y) on 0 by:

(5)

4(Y) _

A n (Y)IA, ( •) •n

Thus c'.,- (Y) is a discrete r .v . that assumes that value en,, (Y) on the set A,for each n . Now we can rewrite (4) as follows :

n

(Y) =

f A47(Y) dJ' = [4(Y)d .9'

Furthermore, for any A E -,07 A is the union of a subcollection of the A n 's(see Exercise 9 of Sec . 2.1), and the same manipulation yields

(6)

VA E : f Y dJ' = f 4~(Y) dJ' .JA

n

In particular, this shows that 4g?(Y) is integrable. Formula (6) equates twointegrals over the same set with an essential difference : while the integrandY on the left belongs to 3 ;7, the integrand 4~(Y) on the right belongs to thesubfield -9. [The fact that 4(Y) is discrete is incidental to the nature of i'.]It holds for every A in the subfield -,6% but not necessarily for a set in 97\6 ? .Now suppose that there are two functions Cpl and ~02, both belonging to a7 ,

such thatVA E mar : f Y d~' = f gyp; dam', i = 1, 2.

JA

A

Let A = {w : cp l (w) > cp2(w)}, then A E -' and so

fn ((pi - ~O 2) dGJ' = 0 .

Hence 9'(A) = 0; interchanging cpi and cp2 above we conclude that cpi = c02

a.e. We have therefore proved that the 4(Y) in (6) is unique up to an equiv-alence . Let us agree to use 4(Y) or d (Y I ;!~`) to denote the correspondingequivalence class, and let us call any particular member of the class a "version"of the conditional expectation .

The results above are valid for an arbitrary Borel subfield ;!~' and will bestated in the theorem below .

Theorem 9.1.1 . If c ( I Y I) < oo and ;(;'is a Borel subfield of IT, then thereexists a unique equivalence class of integrable r .v.'s S(Y 1 ~6) belonging to Isuch that (6) holds .

Page 327: A Course in Probability Theory KL Chung

PROOF . Consider the set function v on ;~7 :

VA E r~' : v(A) = f Y dP.IIt is finite-valued and countably additive, hence a "signed measure" on ,<s .

If J'(A) = 0. then v(A) = 0 ; hence it is absolutely continuous with respectto ~T : v << 7'. The theorem then follows from the Radon-Nikodym theorem(see, e.g ., Royden [5] or Halmos [4]), the resulting "derivative" dv/dJ beingwhat we have denoted by (,"(Y 1 ;6' ) .

Having established the existence and uniqueness, we may repeat the defi-nition as follows .

DEFINITION OF CONDITIONAL EXPECTATION . Given an integrable r.v . Y and aBorel subfield , the conditional expectation C,',(Y 1 ~) of Y relative to - isany one of the equivalence class of r .v .'s on Q satisfying the two properties :

(a) it belongs to 6' ;(b) it has the same integral as Y over any set in .

We shall refer to (b), or equivalently formula (6) above, as the "definingrelation" of the conditional expectation . In practice as well as in theory, theidentification of conditional expectations or relations between them is estab-lished by verifying the two properties listed above. When Y = 1 0 , whereA E & , we write

°P (A I ~') = (,~`(1 o I ~)

and call it the "conditional probability of A relative to " . Specifically, J' (A If;) is any one of the equivalence class of r .v.'s belonging to ;,;' and satisfyingthe condition

(7)

9 .1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 313

VA E ;-~' : ~'(A (1 A) = f '(A I ;T) dam' .JA

It follows from the definition that for an integrable r .v . Y and a Borelsubfield .~, we have

fn [Y - (~(Y I r )] d:~ = 0,

for every A E .,'/, and consequently also

{[Y - <<(Y 1 fi)]Z} = 0

for every bounded Z E ff' (why?) . This implies the decomposition :

Y = Y' + Y" where Y' = ((Y I ~) and Y" 1 ',

Page 328: A Course in Probability Theory KL Chung

31 4 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

where "Y" 1 -" means that a (Y"Z) = 0 for every bounded Z E ,6' . In thelanguage of Banach space, Y' is the "projection" of Y on -,,~' and Y" its "orthog-onal complement" .

For the Borel field ~F{X} generated by the r .v . X, we write also (`(Y I X)for ~, (Y I ZIF{X}) ; similarly for I(Y I X 1 , . . ., X„). The next theorem clarifiescertain useful connections .

Theorem 9 .1.2 . One version of the conditional expectation 'Z'(Y I X) is givenby cp(X ), where cp is a Borel measurable function on R 1 . Furthermore, if wedefine the signed measure A on _,~V by

dB E 0 : X (B) = I YY dam',x- ' (B)

and the p.m. of X by g, then (p is one version of the Radon-Nikodym deriva-tive d),/dg .

PROOF. The first assertion of the theorem is a particular case of thefollowing lemma .

Lemma. If Z E i {X}, then Z = ~p(X) for some extended-valued Borelmeasurable function cp .

PROOF OF THE LEMMA . It is sufficient to prove this for a bounded positive Z(why?) . Then there exists a sequence of simple functions Z m which increasesto Z everywhere, and each Z m is of the form

e~ c j l njj=1

where A j E 3?7{X} . Hence A j = X-1 (Bj ) for some Bj E A1 (see Exercise 11of Sec . 3 .1) . Thus if we take

e

j=1

we have Zm = (p,, (X ) . Since corn (X) -> Z, it follows that corn converges on therange of X . But this range need not be Borel or even Lebesgue measurable(Exercise 6 of Sec . 3 .1) . To overcome this nuisance, we put,

t1x E R1 : cp(x) = hm (pm (x) .M_+00

Then Z = limm co rn (X) = cp(X ), and cp is Borel measurable, proving the lemma .To prove the second assertion: given any B E ?A1 , let A = X-1 (B), then

by Theorem 3 .2.2 we have

Ac(Y I X)dJ'= f IB(X)co(X)d"~'= / 1B(x)co(x)dg = f cp(x)dg)

n

n

Page 329: A Course in Probability Theory KL Chung

Hence by (6),

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 31 5

X(B) = / Yd°J'=1

cp(x)dµ)JA

B

This being true for every B in 73 1 , it follows that cp is a version of the derivativedX/dµ . Theorem 9 .1 .2 is proved .

As a consequence of the theorem, the function u(Y I X) of w is constanta.e. on each set on which X (w) is constant . By an abuse of notation, the co(x)above is sometimes written as S°(Y I X = x) . We may then write, for example,for each real c :

Y dGJ' =1-00,C]

~(Y I X = x) dGJ'{X < x} .{X_4 :50

Generalization to a finite number of X's is straightforward . Thus oneversion of (9(Y I X 1 , . . ., X,) is cp(X 1 , . . ., X,), where co is an n-dimensionalBorel measurable function, and by m`(Y I X1 = xl , . . . , X, = xn ) is meant(p(xl,

, xn)-It is worthwhile to point out the extreme cases of ~, (Y I ;6?) :

°(Y if) = e(Y), m I ~) = Y;

a.e .

where / is the trivial field {0, Q} . If ' is the field generated by one setA : {P , A, A', S2}, then e(Y I -6-7) is equal to &(Y I A) on A and e(Y I Ac )on A` . All these equations, as hereafter, are between equivalent classes ofr.v .'s .

We shall suppose the pair (.97, -OP) to be complete and each Borel subfieldof JF to be augmented (see Exercise 20 of Sec . 2.2) . But even if is

not augmented and ~ is its augmentation, it follows from the definition thatdt(Y I ) = i(Y I ), since an r .v . belonging to 3~ is equal to one belongingto ;6~ almost everywhere (why ?) . Finally, if ~j-o is a field generating a7 , or justa collection of sets whose finite disjoint unions form such a field, then thevalidity of (6) for each A in -fo is sufficient for (6) as it stands. This followseasily from Theorem 2.2.3 .

The next result is basic .

Theorem 9.1.3 . Let Y and YZ be integrable r .v .'s and Z E

then we have

(8)

c,`(YZ I ) = Zt(Y I -.6-7 ) a.e .

[Here "a .e ." is necessary, since we have not stipulated to regard Z as anequivalence class of r.v.'s, although conditional expectations are so regardedby definition . Nevertheless we shall sometimes omit such obvious "a.e.'s"from now on.]

Page 330: A Course in Probability Theory KL Chung

31 6 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

PROOF . As usual we may suppose Y > 0, Z > 0 (see property (ii) below) .The proof consists in observing that the right member of (8) belongs to andsatisfies the defining relation for the left member, namely :

(9)

VA E ;!~' : f ZC'(Y I -°) d P = f ZY dJ' .I

Jn

For (9) is true if Z = 1 0 , where A E f, hence it is true if Z is a simpler.v. belonging to ;6' and consequently also for each Z in § by monotoneconvergence, whether the limits are finite or positive infinite . Note that theintegrability of Z (-r'(Y 1 §) is part of the assertion of the theorem .

Recall that when § is generated by a partition {A n }, we have exhibited aspecific version (5) of (t(Y 1 §) . Now consider the corresponding ~?P(M I }

as a function of the pair (M, w) :

9(M 1 -) ( )

?(M I An)1A (w) •n

For each fixed M, as a function of w this is a specific version of °J'(M 1 -9) .

For each fixed wo, as a function of M this is a p .m. on .T given by -"P{ • I Am }for wo E A,n . Let us denote for a moment the family of p .m.'s arising in thismanner by C(wo, •) . We have then for each integrable r .v . Y and each wo in Q :

(10)

(r (Y I §)(wo)

dt'(Y I An ) 1 A„ (WO) _ f YC(wo , do)) .n

Thus the specific version of (-f'(Y I §) may be evaluated, at each COQ E Q, byintegrating Y with respect to the p.m. C(a)o, •) . In this case the conditionalexpectation 6`(- 1 - ') as a functional on integrable r.v .'s is an integral in theliteral sense. But in general such a representation is impossible (see Doob[16, Sec . 1 .9]) and we must fall back on the defining relations to deduce itsproperties, usually from the unconditional analogues . Below are some of thesimplest examples, in which X and X n are integrable r.v.'s .

(i) If X E §', then f' (X 1 §) = X a.e . ; this is true in particular if X is aconstant a.e .

(ii) (P (XI + X2 I §) _ (1V1 I §) + 6"- (X2

(iii) If X1 < X2, then (`(X i I §) < c`(X2 16') .

(iv) W (X I {)I <_(`0X1 I

(v) If X, T X, then L (Xn I

(((X I

(vi) If X„ X, then f (Xn I fi ') . (`'(X I ~) .

Page 331: A Course in Probability Theory KL Chung

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 317

(vii) If J X, I < Y where c`(Y) < oo and X„ -+ X, then ((X„ I l')~` (X 1 10.

To illustrate, (iii) is proved by observing that for each A E <:

f ~f(X i I ~)dJT=f X i d2 <fX2dT=f ~(X2 I {~)dT.

n

n

n

n

Hence if A = {~ (X1 I ) > ~`'(X2 I )}, we have 7'(A) = 0. The inequality(iv) may be proved by (ii), (iii), and the equation X = X+ - X- . To prove(v), let the limit of f (X„ 1 ~) be Z, which exists a .e. by (iii) . Then for eachA E {, we have by the monotone convergence theorem :

J Z dT = limf

' ~ (X„ I ~) dT = limf

X„ dT = f X dT.A

n A

" A

A

Thus Z satisfies the defining relation for o (X 1 6'), and it belongs to 6' withthe F (X„ 16')'s, hence Z = e(X 16') .

To appreciate the caution that is necessary in handling conditional expec-tations, let us consider the Cauchy-Schwarz inequality :

(IXYI 16)2 < e(X2 I -6')A(Y 2 I ') .

If we try to extend one of the usual proofs based on the positiveness of thequadratic form in A : (`((X + AY)2 16'), the question arises that for each ),the quantity is defined only up to a null set N;,, and the union of these overall A cannot be ignored without comment . The reader is advised to think thisdifficulty through to its logical end, and then get out of it by restricting theA's to the rationals . Here is another way out : start from the following trivialinequality :

IXIIYI

X2

y2

up - 2a2 + 2fl 2 '

where a = (X 2 1 6)12 , ,B = C~(Y2 16) 1 / 2 , and a,B>0; apply the operation{- 16} using (ii) and (iii) above to obtain

IXYIca,8

1

X2

1

Y2

< 2 c( a2 l~ + Zen~2

Now use Theorem 9 .1 .3 to infer that this can be reduced to2

2

{IXYIIf}< -a2 +2 2=1,

ufi

the desired inequality .The following theorem is a generalization of Jensen's inequality in Sec . 3.2 .

Page 332: A Course in Probability Theory KL Chung

318 1 CONDITIONING . MARKOV PROPERTY. MARTINGALE

Theorem 9.1.4 . If (p is a convex function on Ml and X and (p(X) are inte-grable r.v.'s, then for each ~ :

(11)

(p(~ (X I s)) < ((P (X) I {)

PROOF. If X is a simple r.v . taking the values {yj) on the sets {Aj ), 1 <j < n, which forms a partition of Q, we have

where E"= , ?;~(Aj I fi) = 1 a.e. Hence (11) is true in this case by the propertyof convexity. In general let {Xm } be a sequence of simple r .v.'s converging toX a.e. and satisfying 1X m I < IXI for all m (see Exercise 7 of Sec . 3 .2) . If welet m -+ oo below :

(12)

(P((-(X.1 7)) < Aco(Xm) 1 ;!!0'

the left-hand member converges to the left-hand member of (11) by the conti-nuity of (p, but we need dominated convergence on the right-hand side . To getthis we first consider (pn which is obtained from (p by replacing the graph of(p outside (-n, n) with tangential lines . Thus for each n there is a constantCn such that

b'x E W 1 : I(pn(x)I < Cn(IxI + 1 ) .

Consequently, we have

Icon(Xm)I < Cn(IX,I + 1) < Cn(IXI + 1)

and the last term is integrable by hypothesis . It now follows from property(vii) of conditional expectations that

urn C '~' (con(Xm) I f) = c (cn(X) I f)M- 00

This establishes (11) when (p is replaced by (pn . Letting n - oc we have(p„ T (p and (p„ (X) is integrable ; hence (11) follows for a general convex (p,by monotone convergence (v) .

Here is an alternative proof, slightly more elegant and more delicate . Wehave for any x and y :

P(x) -w(y) ? (p, (y)(x - y)

n

( (XI ;(;) = yj,°T(A j I )j=1

n

(`((P(X ) 1 ) _ (p(yj)~P(A j 1 _1~7)1

j=1

Page 333: A Course in Probability Theory KL Chung

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 319

where cP' is the right-hand derivative of co . Hence

(P(X) - (p((r`'(X I ~)) >_ ~P'(~`(X I ~)){X - ~`(X

)~

The right member may not be integrable; but let A = {w : I (r'(X I ) I < A}for A>0. Replace X by X1 A in the above, take expectations of both sides, andlet A T oc. Observe that

& {1P(XlA) I --67 1 = C{cP(X)lA +(p(0)1A' I ~} = ~'{cP(X) I I}ln +cP(0)lAc .

We now come to the most important property of conditional expectationrelating to changing fields . Note that when A = Q, the defining relation (6)may be written as

051((-rle(Y)) = (1-5r (Y)

(Y))-

This has an immediate generalization .

Theorem 9.1.5 . If Y is integrable and 31 C ~z, then

(13)

(Y) _

(Y)

if and only if c (Y) E ;

and

(14)

~(c (Y)) = ( (Y) _ (~5 (~ (Y))-

PROOF . Since Y satisfies trivially the defining relation for c'- (Y I 1 ), itwill be equal to the latter if and only if Y E 31 . Now if we replace our basic

by ~Z and Y by : (Y), the assertion (13) ensues . Next, since

c`;, (Y) E 1 C 2,

the second equation in (14) follows from the same observation . It remainsto prove the first equation in (14) . Let A E ~i71 , then A E 12 ; applying thedefining relation twice, we obtain

n14((`~(Y))dJT = f c~ (Y)dUl`

JA J Yd~ .JA

Hence e~, (~ (Y)) satisfies the defining relation for c% (Y) ; since it belongsto T1 , it is equal to the latter .

As a particular case, we note, for example,

(15)

{c`(Y I X1, X2) I X1} = c%~(Y I X1) = (f{(=`(Y IX1) I X1, X2}

To understand the meaning of this formula, we may think of X 1 and X2 asdiscrete, each producing a countable partition . The superimposition of both

Page 334: A Course in Probability Theory KL Chung

320 I CONDITIONING. MARKOV PROPERTY. MARTINGALE

partitions yields sets of the form {Aj fl Mk} . The "inner" expectation on theleft of (15) is the result of replacing Y by its "average" over each Ajfl Mk .Now if we replace this average r .v. again by its average over each A j , thenthe result is the same as if we had simply replaced Y by its average over eachAj . The second equation has a similar interpretation .

Another kind of simple situation is afforded by the probability triple(Jlln, J3", m") discussed in Example 2 of Sec . 3.3 . Let x1, . . . , xn be the coor-dinate r.v.'s . .. = f (x 1 , . . . , xn ), where f is (Borel) measurable and inte-grable. It is easy to see that for 1 < k < n - 1,

P1

PI

e(YIxI, . . .,xk)= J

. ..J

f(xl, . . .,xn)dxk+l . . .dxn ,0

0

while for k = n, the left side is just y (a.e .). Thus, taking conditional expec-tation with respect to certain coordinate r.v.'s here amounts to integrating outthe other r.v.'s . The first equation in (15) in this case merely asserts the possi-bility of iterated integration, while the second reduces to a banality, which weleave to the reader to write out .

EXERCISES

1 . Prove Lemma 2 in Sec . 7.2 by using conditional probabilities .2. Let [A,,) be a countable measurable partition of 0, and E E 3 with

~P(E)>O ; then we have for each m :1:"'(A. )EAm (E)

~E(A.) _E_1~6(An)r"A„(E)n

[This is Bayes' rule .]*3. If X is an integrable r.v ., Y a bounded r.v., and ' a Borel subfield,

then we have{e(X I S)Y} _ '{X (Y I )}

4. Prove Fatou's lemma and Lebesgue's dominated convergence theoremfor conditional expectations .

*5. Give an example where (-"(("(y I X 1 ) I X2)

(c (Y I X2) I X1)-[HINT: It is sufficient to give an example where (t (X I Y) =,4 ({(r'(X I Y) I X} ;consider an Q with three points .]

* 6. Prove that a 2 (c (Y)) < a2(y), where a2 is the variance .7. If the random vector has the probability density function p( •, •) and

X is integrable, then one version of ('V I X + Y = z) is given by

fxp(xz , - x) dx/I

p(x, z - x) dx .

Page 335: A Course in Probability Theory KL Chung

9 .1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 1 321

*8. In the case above, there exists an integrable function cp( •, •) with theproperty that for each B E A1 ,

fB

~p(x, y) d y

is a version of 7'{Y E B I X = x} . [This is called a "conditional density func-tion in the wide sense" and

n

fooco (x, y) d y

is the corresponding conditional distribution in the wide sense :

cp(x, n) = GJ'{Y < r? I X = x} .

The term "wide sense" refers to the fact that these are functions on R1 ratherthan on 0 ; see Doob [1, Sec . I.9] .]

9. Let the p(., •) above be the 2-dimensional normal density :

1

1

x2

2pxy y2 exp -2 2 -

+ 22rra1Q2\/1 - p2

2(1 - P) c1 910'2 0'2

where a1>0, 62>0, 0 < p < 1 . Find the qo mentioned in Exercise 8 and

foo y~p(x, y) dy •

The latter should be a version of d (Y I X = x); verify it.10 . Let ~ be a B .F., X and Y two r .v.'s such that

(Y2 I -C_/) = X2,

(Y I ) = X.

Then Y = X a.e .11. As in Exercise 10 but suppose now for any f E CK ;

{X2 I f (X)j = (,,{Y2 I f (X)}; M I f (X)} = e{Y I f (X )}

Then Y = X a.e . [mNT: By a monotone class theorem the equations hold forf = I B, B E 276 1 ; now apply Exercise 10 with = 3T{X} .]

12. Recall that X, in L 1 converges weakly in L 1 to X iff 1(X, Y) -a~(XY) for every bounded r.v . Y. Prove that this implies c`'(X„ 16') convergesweakly in L1 to (,(X 1 -,6") for any Borel subfield E of 3T .

*13. Let S be an r .v . such that 7 S > t) = e -`, t>0 . Compute d'{S I SAt}

and c`{S I S v t} for each t > 0 .

Page 336: A Course in Probability Theory KL Chung

322 1 CONDITIONING . MARKOV PROPERTY. MARTINGALE

9.2 Conditional independence ; Markov property

In this section we shall first apply the concept of conditioning to independentr.v.'s and to random walks, the two basic types of stochastic processes thathave been extensively studied in this book ; then we shall generalize to aMarkov process. Another important generalization, the martingale theory, willbe taken up in the next three sections .

All the B .F.'s (Borel fields) below will be subfields of 9 . The B .F.'s{tea , a E A), where A is an arbitrary index set, are said to be conditionally inde-pendent relative to the B .F. ;!~_, iff for any finite collection of sets A 1 , . . . , A,such that A j E : and the aj 's are distinct indices from A, we have

When is the trivial B .F., this reduces to unconditional independence .

Theorem 9 .2.1 . For each a E A let 3TM denote the smallest B .F. containingall , 8 E A - {a} . Then the 3's are conditionally independent relative tof~, ' if and only if for each a and Aa E " we have

91(Aa I

(a ) V --C/- ) = -OP(Aa I ),

where ZT (" ) V denotes the smallest B .F. containing ~7 (1) and -'6 .

PROOF . It is sufficient to prove this for two B .F.'s 1 and , since thegeneral result follows by induction (how?) . Suppose then that for each A E 1we have

(1)

~(A I J

) _ (A I )

Let M E , then

9)(AM I ) _ {~(AM I

V ) I '} = {,'~P(A I

V §)1M I }I )1M I fi~ } = ~(A I )?)'(M I ),

where the first equation follows from Theorem 9 .1 .5, the second and fourthfrom Theorem 9 .1 .3, and the third from (1) . Thus j and are conditionallyindependent relative to '. Conversely, suppose the latter assertion is true, then

{J'(A I )1M I } = J$(A I ) ~'(M I )= .--/'(AM I fl ) _ (`{U~'(A I ~~ V ~')1M I ~},

Page 337: A Course in Probability Theory KL Chung

9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY 1 323

where the first equation follows from Theorem 9 .1 .3, the second by hypothesis,and the third as shown above . Hence for every A E

we have

?-)(A I ~) d J~' _ f J/'(A I J~ v fi) dJ' = T(AMA)MD

MA

It follows from Theorem 2 .1 .2 (take ~ to be finite disjoint unions of sets likeMA) or more quickly from Exercise 10 of Sec . 2.1 that this remains true ifMA is replaced by any set in ~ v ;6 . The resulting equation implies (1), andthe theorem is proved .

When -~' is trivial and each ~ti" is generated by a single r .v., we have thefollowing corollary .

Corollary. Let {X", a E A} be an arbitrary set of r .v.'s. For each a let .~(a)denote the Borel field generated by all the r.v.'s in the set except X" . Thenthe X"'s are independent if and only if : for each a and each B E %31, we have

{X" E B 13 (")} = A{X" E B}

a.e .

An equivalent form of the corollary is as follows: for each integrable r .v .

Y belonging to the Borel field generated by X", we have

(2)

cF{Y I ~(")} = c{Y} .

This is left as an exercise .Roughly speaking, independence among r.v.'s is equivalent to the lack

of effect by conditioning relative to one another. However, such intuitivestatements must be treated with caution, as shown by the following examplewhich will be needed later .

If ~T1, 1-, and ' are three Borel fields such that ., l V .%; is independentof , then for each integrable X E

we have

(3)

{XI~2V%$}_«{XI } .

Instead of a direct verification, which is left to the reader, it is interestingto deduce this from Theorem 9 .2 .1 by proving the following proposition .

If : T v .~ is independent of

then :~-hl and /S are conditionally inde-

pendent relative to % .To see this, let A1 E T, A E . Since

/'(A1A2A3) = .P(A1A )J~(A3) = f(AiJ/' I `~z)'~'(A3)d'~',

for every A2 E . , we have

(AI A3 I '~z) = J/'( A1 I ;z )'' (A3) = °/'(A1 I

I

which proves the proposition .

Page 338: A Course in Probability Theory KL Chung

324 1 CONDITIONING . MARKOV PROPERTY. MARTINGALE

Next we ask : if X 1 and X2 are independent, what is the effect of condi-tioning X 1 + X2 by X 1 ?

Theorem 9.2.2 . Let X1 and X2 be independent r .v.'s with p .m.'s µl and µ2 ;then for each B E %3 1 :

(4)

;/'{X1 + X2 E B I X1 } = µ2(B - X 1 ) a.e .

More generally, if {X,,, n > 1) is a sequence of independent r.v.'s with p .m.'s{µn , n > 1), and Sn = 2~=1 Xj , then for each B E Al

(5) A{S n E B I S1, . . . , Sn-1 } _ pn (B - Sn-1) = G1'{Sn E B I Sn-1 } a.e .

PROOF . To prove (4), since its right member belongs to :7 {X 1 }, itis sufficient to verify that it satisfies the defining relation for its leftmember. Let A E Z{X1 }, then A = X1 1 (A) for some A E A1 . It follows fromTheorem 3 .2.2 that

fA µ2(B - X1)d9l = [2(BAµ-xi)µl(dxl) .

Writing A = Al X A2 and applying Fubini's theorem to the right side above,then using Theorem 3 .2.3, we obtain

f µl (dxl)f

A2(dx2) = ff µ(dxl, dx2)A

i+x2EBx, EA

x1 +x2EB

=

dJ' = ~{X1 E A; X, +X 2 E B} .J X~EA

X1 +X2EB

This establishes (4) .To prove (5), we begin by observing that the second equation has just been

proved. Next we observe that since {X 1 , . . . , X,) and {S1, . . . , S, } obviouslygenerate the same Borel field, the left member of (5) is just

Jq'{Sn = B I X1, . . . , Xn-1 } .

Now it is trivial that as a function of (XI, . . ., Xi_ 1 ), S, "depends on themonly through their sum S,_," . It thus appears obvious that the first term in (5)should depend only on Sn _ 1 , namely belong to ~JS,,_,) (rather than the larger:'f {S1, . . . , S i _ 1 }) . Hence the equality of the first and third terms in (5) shouldbe a consequence of the assertion (13) in Theorem 9 .1 .5 . This argument,however, is not rigorous, and requires the following formal substantiation .

Let µ(") = Al x . . . X An =A(n-1)

X An andn-1

A = n S7 1 (Bj ),1=1

Page 339: A Course in Probability Theory KL Chung

where Bn = B. Sets of the form of A generate the Borel field 7{S, . . . , S"-I } .It is therefore sufficient to verify that for each such A, we have

fn

An (Bn - Sn-1) d_,111 = '{A ; Sn E Bn } .

If we proceed as before and write s, = E j=, xj , the left side above is equal to

An (Bn - Sn-1)l(n-1)

(dx1, . . . , dxn-1)

SjEBj,I<_j_<n-1

S1EBj,1 -< j<<n

9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY 1 325

nA(n)(&,' . . .' dxn ) = ' n{{s E Bj] ,

j=1

as was to be shown. In the first equation above, we have made use of the factthat the set of (x 1 , . . . , xn ) in In for which Xn E Bn - Sn _1 is exactly the setfor which sn E Bn , which is the formal counterpart of the heuristic argumentabove. The theorem is proved .

The fact of the equality of the two extreme terms in (5) is a fundamentalproperty of the sequence {Sn } . We have indeed made frequent use of thisin the study of sums of independent r.v.'s, particularly for a random walk(Chapter 8), although no explicit mention has been made of it . It would beinstructive for the reader to review the material there and locate some instancesof its application . As an example, we prove the following proposition, wherethe intuitive picture of conditioning is particularly clear .

Theorem 9.2.3 . Let {Xn , n > 1 } be an independent (but not necessarilystationary) process such that for A>0 there exists 3>0 satisfying

inf °'{Xn > A) > S .n

Then we have

`dn > 1 : GJ'{S 1 E (0, A] for I j < n} <- (1 - S)n .

Furthermore, given any finite interval I, there exists an E>O such that

°J'{Sj E I, for 1< j<n}<(1-E)n .

PROOF . We write A n for the event that Sj E (0, A] for 1 < J < n ; then

~'{An } _ _OP( An-1 ; 0 < S n < A) .

By the definition of conditional probability and (5), the last-written probabilityis equal to

Page 340: A Course in Probability Theory KL Chung

326 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

{0 < S„ < A I S1, . . . , S„-1 } d.JT

=

[Fn (A - S„-1) - Fn (0 - Sn-1 )] d GT,

where Fn is the d.f. Of An . [A quirk of notation forbids us to write theintegrand on the right as T {-Sn _i < Xn < A - S,1 1!] Now for each coo inAn-1, Sn-1(wo)>O, hence

Fn (A - Sn-I((oo)) < :flXn < A) < 1 - 8

by hypothesis . It follows that

< f (1 - 8) dam' = (I - 8) GJ'{An-1 },12-1

and the first assertion of the theorem follows by iteration . The second is provedsimilarly by observing that A{Xn + • • • +Xn+m-1 > mA) > 8m and choosingm so that mA exceeds the length of I. The details are left to the reader .

Let N° = {0} U N denote the set of positive integers . For a given sequenceof r.v.'s {Xn , n E N° } let us denote by 1 the Borel field generated by {Xn , n EI}, where I is a subset of N° , such as [0, n], (n, oc), or {n} . Thus {n), [o,,],and

have been denoted earlier by ti{Xn }, 9n , and

respectively .

DEFINITION OF MARKOV PROCESS . The sequence of r.v.'s {Xn , n E N° ) issaid to be a Markov process or to possess the "Markov property" iff forevery n E N° and every B E A1 , we have

(6)

2{X,+1 (E B I Xo, . . . , Xn) = ~P{Xn+1 E B I Xn } .

This property may be verbally announced as : the conditional distribution(in the wide sense!) of each r .v. relative to all the preceding ones is the sameas that relative to the last preceding one. Thus if {Xn } is an independentprocess as defined in Chapter 8, then both the process itself and the processof the successive partial sums {S n } are Markov processes by Theorems 9 .2 .1and 9 .2.2. The latter category includes random walk as a particular case ; notethat in this case our notation X n rather than Sn differs from that employed inChapter 8.

Equation (6) is equivalent to the apparently stronger proposition : forevery integrable Y E ~T(n+ 1), we have

(6)

({Y I X1, . . ., Xn } = (`{Y I X n } .

It is clear that (6) implies (6) . To see the converse, let Ym be a sequence ofsimple r.v .'s belonging to :'%{n+1 ) and increasing to Y . By (6) and property (ii)

Page 341: A Course in Probability Theory KL Chung

of conditional expectation in Sec . 9.1, (6') is true when Y is replaced by Ym ;hence by property (v) there, it is also true for Y .

The following remark will be repeatedly used . If Y and Z are integrable,Z E :'1, and

for each A of the form

9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY I 327

f Yd.JA=f

Zd9'n

n

n X. 1 (Bj),j EIo

uwhere I0 is an arbitrary finite subset of I and each B j is an arbitrary Borel set,then Z = (-r'(Y I d1) . This follows from the uniqueness of conditional expec-tation and a previous remark given before Theorem 9 .1 .3, since finite disjointnions of sets of this form generate a field that generates 3 I .

If the index n is regarded as a discrete time parameter, as is usual inthe theory of stochastic processes, then 9 7[o ,,2 ] is the field of "the past and thepresent", while is that of "the future" ; whether the present is adjoinedto the past or the future is often a matter of convenience . The Markov propertyjust defined may be further characterized as follows .

Theorem 9.2.4 . The Markov property is equivalent to either one of the twopropositions below :

(7)

Vn E N, M E

'{M I ~T[O,n]} = lM I X,, 1 .

(8)

`dn E N, M1 E ~T[O,,i ], M2 E :(n , oo ) : b °{MlM2 I Xn}

= °J'{M1 I Xn }2flM2 I X n }

These conclusions remain true if (,i , oo ) is replaced by

PROOF . To prove (7) implies (8), let Y; = 1M,, i = 1, 2 . We then have

(9)

7'{M1 I Xn} 7{M2 I Xn} _ (`{Y1 I Xn}F~{Y2 I Xn}

_ "{Y1 (`(Y2 I Xn ) I X,,

_ (t {Y1 ( (Y2 I -[0,n]) I Xn}

{(°(Y1Y2 13X[0,n]) I Xn}

_ ~J{Y1Y2 I Xn} = J'{M1M2 I X, J,

where the second and fourth equations follow from Theorems 9 .1 .3, the thirdfrom assumption (7), and the fifth from Theorem 9 .1 .5 .

Conversely, to prove that (8) implies (7), let $\Lambda \in \mathscr{F}_{\{n\}}$, $M_1 \in \mathscr{F}_{[0,n]}$, $M_2 \in \mathscr{F}_{(n,\infty)}$. By the second equation in (9) applied to the fourth equation below, we have

$$\begin{aligned}
\int_{\Lambda M_1} \mathscr{P}(M_2 \mid X_n)\, d\mathscr{P} &= \int_{\Lambda M_1} \mathscr{E}(Y_2 \mid X_n)\, d\mathscr{P} = \int_{\Lambda} Y_1\,\mathscr{E}(Y_2 \mid X_n)\, d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{E}\{Y_1\,\mathscr{E}(Y_2 \mid X_n) \mid X_n\}\, d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{E}(Y_1 \mid X_n)\,\mathscr{E}(Y_2 \mid X_n)\, d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{P}(M_1 \mid X_n)\,\mathscr{P}(M_2 \mid X_n)\, d\mathscr{P} \\
&= \int_{\Lambda} \mathscr{P}(M_1 M_2 \mid X_n)\, d\mathscr{P} = \mathscr{P}(\Lambda M_1 M_2).
\end{aligned}$$

Since disjoint unions of sets of the form $\Lambda M_1$ as specified above generate the Borel field $\mathscr{F}_{[0,n]}$, the uniqueness of $\mathscr{P}(M_2 \mid \mathscr{F}_{[0,n]})$ shows that it is equal to $\mathscr{P}(M_2 \mid X_n)$, proving (7).

Finally, we prove the equivalence of the Markov property and the proposition (7). Clearly the former is implied by the latter; to prove the converse we shall operate with conditional expectations instead of probabilities and use induction. Suppose that it has been shown that for every $n \in N$, and every bounded $f$ belonging to $\mathscr{F}_{[n+1,n+k]}$, we have

(10) $\qquad \mathscr{E}(f \mid \mathscr{F}_{[0,n]}) = \mathscr{E}(f \mid \mathscr{F}_{\{n\}}).$

This is true for $k = 1$ by (6′). Let $g$ be bounded, $g \in \mathscr{F}_{[n+1,n+k+1]}$; we are going to show that (10) remains true when $f$ is replaced by $g$. For this purpose it is sufficient to consider a $g$ of the form $g_1 g_2$, where $g_1 \in \mathscr{F}_{[n+1,n+k]}$, $g_2 \in \mathscr{F}_{\{n+k+1\}}$, both bounded. The successive steps, in slow motion fashion, are as follows:

$$\begin{aligned}
\mathscr{E}\{g \mid \mathscr{F}_{[0,n]}\} &= \mathscr{E}\{\mathscr{E}(g \mid \mathscr{F}_{[0,n+k]}) \mid \mathscr{F}_{[0,n]}\} = \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{[0,n+k]}) \mid \mathscr{F}_{[0,n]}\} \\
&= \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{\{n+k\}}) \mid \mathscr{F}_{[0,n]}\} = \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{\{n+k\}}) \mid \mathscr{F}_{\{n\}}\} \\
&= \mathscr{E}\{g_1\,\mathscr{E}(g_2 \mid \mathscr{F}_{[n,n+k]}) \mid \mathscr{F}_{\{n\}}\} = \mathscr{E}\{\mathscr{E}(g_1 g_2 \mid \mathscr{F}_{[n,n+k]}) \mid \mathscr{F}_{\{n\}}\} \\
&= \mathscr{E}\{g_1 g_2 \mid \mathscr{F}_{\{n\}}\} = \mathscr{E}\{g \mid \mathscr{F}_{\{n\}}\}.
\end{aligned}$$

It is left to the reader to scrutinize each equation carefully for explanation, except that the fifth one is an immediate consequence of the Markov property $\mathscr{E}\{g_2 \mid \mathscr{F}_{\{n+k\}}\} = \mathscr{E}\{g_2 \mid \mathscr{F}_{[0,n+k]}\}$ and (13) of Sec. 9.1. This establishes (7) for $M \in \bigcup_{k \ge 1} \mathscr{F}_{[n+1,n+k]}$, which is a field generating $\mathscr{F}_{(n,\infty)}$. Hence (7) is true (why?). The last assertion of the theorem is left as an exercise.


The property embodied in (8) may be announced as follows: "The past and the future are conditionally independent given the present." In this form there is a symmetry that is not apparent in the other equivalent forms.

The Markov property has an extension known as the "strong Markov property". In the case discussed here, where the time index is $N^0$, it is an automatic consequence of the ordinary Markov property, but it is of great conceptual importance in applications. We recall the notions of an optional r.v. $\alpha$, the r.v. $X_\alpha$, and the fields $\mathscr{F}_\alpha$ and $\mathscr{F}'_\alpha$, which are defined for a general sequence of r.v.'s in Sec. 8.2. Note that here $\alpha$ may take the value 0, which is included in $N^0$. We shall give the extension in the form (7).

Theorem 9.2.5. Let $\{X_n, n \in N^0\}$ be a Markov process and $\alpha$ a finite optional r.v. relative to it. Then for each $M \in \mathscr{F}'_\alpha$ we have

(11) $\qquad \mathscr{P}\{M \mid \mathscr{F}_\alpha\} = \mathscr{P}\{M \mid \alpha, X_\alpha\}.$

PROOF. Since $\alpha \in \mathscr{F}_\alpha$ and $X_\alpha \in \mathscr{F}_\alpha$ (Exercise 2 of Sec. 8.2), the right member above belongs to $\mathscr{F}_\alpha$. To prove (11) it is then sufficient to verify that the right member satisfies the defining relation for the left, when $M$ is of the form

$$\bigcap_{j=1}^{\ell} \{X_{\alpha+j} \in B_j\}, \qquad B_j \in \mathscr{B}^1,\ 1 \le j \le \ell < \infty.$$

Put for each $n$,

$$M_n = \bigcap_{j=1}^{\ell} \{X_{n+j} \in B_j\} \in \mathscr{F}_{(n,\infty)}.$$

Now the crucial step is to show that

(12) $\qquad \sum_{n=0}^{\infty} \mathscr{P}\{M_n \mid X_n\}\, 1_{\{\alpha=n\}} = \mathscr{P}\{M \mid \alpha, X_\alpha\}.$

By the lemma in the proof of Theorem 9.1.2, there exists a Borel measurable function $\varphi_n$ such that $\mathscr{P}\{M_n \mid X_n\} = \varphi_n(X_n)$, from which it follows that the left member of (12) belongs to the Borel field generated by the two r.v.'s $\alpha$ and $X_\alpha$. Hence we shall prove (12) by verifying that its left member satisfies the defining relation for its right member, as follows. For each $m \in N$ and $B \in \mathscr{B}^1$, we have

$$\begin{aligned}
\int_{\{\alpha=m;\,X_m \in B\}} \sum_{n=0}^{\infty} \mathscr{P}\{M_n \mid X_n\}\, 1_{\{\alpha=n\}}\, d\mathscr{P} &= \int_{\{\alpha=m;\,X_m \in B\}} \mathscr{P}\{M_m \mid X_m\}\, d\mathscr{P} \\
&= \mathscr{P}\{\alpha = m;\ X_m \in B;\ M_m\} \\
&= \mathscr{P}\{\alpha = m;\ X_\alpha \in B;\ M\},
\end{aligned}$$

where the second equation follows from an application of (7), and the third from the optionality of $\alpha$, namely

$$\{\alpha = m\} \in \mathscr{F}_{[0,m]}.$$

This establishes (12). Now let $\Lambda \in \mathscr{F}_\alpha$; then [cf. (3) of Sec. 8.2] we have

$$\Lambda = \bigcup_{n=0}^{\infty} \{\{\alpha = n\} \cap \Lambda_n\},$$

where $\Lambda_n \in \mathscr{F}_{[0,n]}$. It follows that

$$\begin{aligned}
\mathscr{P}\{\Lambda M\} &= \sum_{n=0}^{\infty} \mathscr{P}\{\alpha = n;\ \Lambda_n;\ M_n\} = \sum_{n=0}^{\infty} \int_{\{\alpha=n\} \cap \Lambda_n} \mathscr{P}\{M_n \mid \mathscr{F}_{[0,n]}\}\, d\mathscr{P} \\
&= \sum_{n=0}^{\infty} \int_{\{\alpha=n\} \cap \Lambda_n} \mathscr{P}\{M_n \mid X_n\}\, 1_{\{\alpha=n\}}\, d\mathscr{P} = \int_{\Lambda} \mathscr{P}\{M \mid \alpha, X_\alpha\}\, d\mathscr{P},
\end{aligned}$$

where the third equation is by an application of (7) while the fourth is by (12). This being true for each $\Lambda$, we obtain (11). The theorem is proved.

When $\alpha$ is a constant $n$, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the $\varphi_n$ above may be chosen to be independent of $n$).

There is a very general method of constructing a Markov process with given "transition probability functions", as follows. Let $P_0(\cdot)$ be an arbitrary p.m. on $(\mathscr{R}^1, \mathscr{B}^1)$. For each $n \ge 1$ let $P_n(\cdot, \cdot)$ be a function of the pair $(x, B)$, where $x \in \mathscr{R}^1$ and $B \in \mathscr{B}^1$, having the measurability properties below:

(a) for each $x$, $P_n(x, \cdot)$ is a p.m. on $\mathscr{B}^1$;
(b) for each $B$, $P_n(\cdot, B) \in \mathscr{B}^1$.

It is a consequence of Kolmogorov's extension theorem (see Sec. 3.3) that there exists a sequence of r.v.'s $\{X_n, n \in N^0\}$ on some probability space with the following "finite-dimensional joint distributions": for each $0 \le n < \infty$, $B_j \in \mathscr{B}^1$, $0 \le j \le n$:

(13) $\qquad \mathscr{P}\left\{\bigcap_{j=0}^{n} [X_j \in B_j]\right\} = \int_{B_0} P_0(dx_0) \int_{B_1} P_1(x_0, dx_1) \cdots \int_{B_n} P_n(x_{n-1}, dx_n).$

There is no difficulty in verifying that (13) yields an $(n+1)$-dimensional p.m. on $(\mathscr{R}^{n+1}, \mathscr{B}^{n+1})$ and so also on each subspace by setting some of the $B_j$'s above to be $\mathscr{R}^1$, and that the resulting collection of p.m.'s are mutually consistent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of the process $\{X_n, n \in N^0\}$ with these given "marginal distributions" is omitted here, the procedure being similar to but somewhat more sophisticated than that given in Theorem 3.3.4, which is a particular case. Assuming the existence, we will now show that it has the Markov property by verifying (6) briefly. By Theorem 9.1.5, it will be sufficient to show that one version of the left member of (6) is given by $P_{n+1}(X_n, B)$, which belongs to $\mathscr{F}_{\{n\}}$ by condition (b) above. Let then

$$\Lambda = \bigcap_{j=0}^{n} [X_j \in B_j]$$

and let $\mu^{(n+1)}$ be the $(n+1)$-dimensional p.m. of the random vector $(X_0, \ldots, X_n)$. It follows from Theorem 3.2.3 and (13) used twice that

$$\begin{aligned}
\int_{\Lambda} P_{n+1}(X_n, B)\, d\mathscr{P} &= \int \cdots \int_{B_0 \times \cdots \times B_n} P_{n+1}(x_n, B)\, d\mu^{(n+1)} \\
&= \int \cdots \int_{B_0 \times \cdots \times B_n \times B} P_0(dx_0) \prod_{j=1}^{n+1} P_j(x_{j-1}, dx_j) \\
&= \mathscr{P}\{\Lambda;\ X_{n+1} \in B\}.
\end{aligned}$$

This is what was to be shown.

We call $P_0(\cdot)$ the initial distribution of the Markov process and $P_n(\cdot, \cdot)$ its "$n$th-stage transition probability function". The case where the latter is the same for all $n \ge 1$ is particularly important, and the corresponding Markov process is said to be "(temporally) homogeneous" or "with stationary transition probabilities". In this case we write, with $x = x_0$:

(14) $\qquad P^{(n)}(x, B) = \int \cdots \int_{\mathscr{R}^1 \times \cdots \times \mathscr{R}^1 \times B} \prod_{j=0}^{n-1} P(x_j, dx_{j+1}),$

and call it the "$n$-step transition probability function"; when $n = 1$, the qualifier "1-step" is usually dropped. We also put $P^{(0)}(x, B) = 1_B(x)$. It is easy to see that

(15) $\qquad P^{(n+1)}(x, B) = \int_{\mathscr{R}^1} P^{(n)}(y, B)\, P^{(1)}(x, dy),$

so that all $P^{(n)}$ are just the iterates of $P^{(1)}$.

It follows from Theorem 9.2.2 that for the Markov process $\{S_n, n \in N\}$ there, we have

$$P_n(x, B) = \mu_n(B - x).$$


In particular, a random walk is a homogeneous Markov process with the 1-step transition probability function

$$P^{(1)}(x, B) = \mu(B - x).$$
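For a chain with finitely many states the formulas above reduce to matrix algebra: $P^{(1)}$ becomes a stochastic matrix, (15) becomes matrix multiplication, and $P^{(n)}$ is the $n$th matrix power (cf. Exercise 11 below). The following sketch in Python is ours, not the book's; the particular matrix is an arbitrary illustration.

```python
import numpy as np

# A 3-state homogeneous chain; row x of P is the measure P(x, .).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.4, 0.6]])

def n_step(P, n):
    """n-step transition matrix, built by iterating (15):
    P^(n+1)(x, B) = sum_y P^(1)(x, {y}) P^(n)(y, B)."""
    Q = np.eye(P.shape[0])      # P^(0)(x, B) = 1_B(x) is the identity
    for _ in range(n):
        Q = P @ Q
    return Q

# Chapman-Kolmogorov equations (Exercise 6): P^(m+n) = P^(m) P^(n).
m, n = 2, 3
assert np.allclose(n_step(P, m + n), n_step(P, m) @ n_step(P, n))
```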

In the homogeneous case Theorem 9.2.4 may be sharpened to read like Theorem 8.2.2, which becomes then a particular case. The proof is left as an exercise.

Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. $\alpha$ which is optional relative to the process, the pre-$\alpha$ and post-$\alpha$ fields are conditionally independent relative to $X_\alpha$, namely:

$$\forall \Lambda \in \mathscr{F}_\alpha,\ M \in \mathscr{F}'_\alpha:\quad \mathscr{P}\{\Lambda M \mid X_\alpha\} = \mathscr{P}\{\Lambda \mid X_\alpha\}\,\mathscr{P}\{M \mid X_\alpha\}.$$

Furthermore, the post-$\alpha$ process $\{X_{\alpha+n}, n \in N\}$ is a homogeneous Markov process with the same transition probability function as the original one.

Given a Markov process $\{X_n, n \in N^0\}$, the distribution of $X_0$ is a p.m., and for each $B \in \mathscr{B}^1$ there is according to Theorem 9.1.2 a Borel measurable function $\varphi_n(\cdot, B)$ such that

$$\mathscr{P}\{X_{n+1} \in B \mid X_n = x\} = \varphi_n(x, B).$$

It seems plausible that the function $\varphi_n(\cdot, \cdot)$ would correspond to the $n$th-stage transition probability function just discussed. The trouble is that while condition (b) above may be satisfied for each $B$ by a particular choice of $\varphi_n(\cdot, B)$, it is by no means clear why the resulting collection for varying $B$ would satisfy condition (a). Although it is possible to ensure this by means of conditional distributions in the wide sense alluded to in Exercise 8 of Sec. 9.1, we shall not discuss it here (see Doob [16, chap. 2]).

The theory of Markov processes is the most highly developed branch of stochastic processes. Special cases such as Markov chains, diffusion, and processes with independent increments have been treated in many monographs, a few of which are listed in the Bibliography at the end of the book.

EXERCISES

*1. Prove that the Markov property is also equivalent to the following proposition: if $t_1 < \cdots < t_n < t_{n+1}$ are indices in $N^0$ and $B_j$, $1 \le j \le n+1$, are Borel sets, then

$$\mathscr{P}\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_1}, \ldots, X_{t_n}\} = \mathscr{P}\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_n}\}.$$

In this form we can define a Markov process $\{X_t\}$ with a continuous parameter $t$ ranging in $[0, \infty)$.


2. Does the Markov property imply the following (notation as in Exercise 1):

$$\mathscr{P}\{X_{n+1} \in B_{n+1} \mid X_1 \in B_1, \ldots, X_n \in B_n\} = \mathscr{P}\{X_{n+1} \in B_{n+1} \mid X_n \in B_n\}?$$

3. Unlike independence, the Markov property is not necessarily preserved by a "functional": $\{f(X_n), n \in N^0\}$. Give an example of this, but show that it is preserved by each one-to-one Borel measurable mapping $f$.

*4. Prove the strong Markov property in the form of (8).

5. Prove Theorem 9.2.6.

*6. For the $P^{(n)}$ defined in (14), prove the "Chapman-Kolmogorov equations":

$$\forall m \in N,\ n \in N:\quad P^{(m+n)}(x, B) = \int_{\mathscr{R}^1} P^{(m)}(x, dy)\, P^{(n)}(y, B).$$

7. Generalize the Chapman-Kolmogorov equation in the nonhomogeneous case.

*8. For the homogeneous Markov process constructed in the text, show that for each $f \ge 0$ we have

$$\mathscr{E}\{f(X_{m+n}) \mid X_m\} = \int_{\mathscr{R}^1} f(y)\, P^{(n)}(X_m, dy).$$

*9. Let $B$ be a Borel set, $f_1(x, B) = P(x, B)$, and define $f_n$ for $n \ge 2$ inductively by

$$f_n(x, B) = \int_{B^c} P(x, dy)\, f_{n-1}(y, B);$$

put $f(x, B) = \sum_{n=1}^{\infty} f_n(x, B)$. Prove that $f(X_n, B)$ is a version of the conditional probability $\mathscr{P}\{\bigcup_{j=n+1}^{\infty} [X_j \in B] \mid X_n\}$ for the homogeneous Markov process with transition probability function $P(\cdot, \cdot)$.

*10. Using the $f$ defined in Exercise 9, put

$$g(x, B) = f(x, B) - \sum_{n=1}^{\infty} \int_{B} P^{(n)}(x, dy)\,[1 - f(y, B)].$$

Prove that $g(X_n, B)$ is a version of the conditional probability

$$\mathscr{P}\{\limsup_{j} [X_j \in B] \mid X_n\}.$$

*11. Suppose that for a homogeneous Markov process the initial distribution has support in $N^0$ as a subset of $\mathscr{R}^1$, and that for each $i \in N^0$, the transition probability function $P(i, \cdot)$ also has support in $N^0$. Thus

$$\{P(i, j);\ (i, j) \in N^0 \times N^0\}$$

is an infinite matrix called the "transition matrix". Show that $P^{(n)}$ as a matrix is just the $n$th power of $P^{(1)}$. Express the probability $\mathscr{P}\{X_{t_k} = i_k, 1 \le k \le n\}$ in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]

12. A process $\{X_n, n \in N^0\}$ is said to possess the "$r$th-order Markov property", where $r \ge 1$, iff (6) is replaced by

$$\mathscr{P}\{X_{n+1} \in B \mid X_0, \ldots, X_n\} = \mathscr{P}\{X_{n+1} \in B \mid X_n, \ldots, X_{n-r+1}\}$$

for $n \ge r - 1$. Show that if $r \le s$, then the $r$th-order Markov property implies the $s$th. The ordinary Markov property is the case $r = 1$.

13. Let $Y_n$ be the random vector $(X_n, X_{n+1}, \ldots, X_{n+r-1})$. Then the vector process $\{Y_n, n \in N^0\}$ has the ordinary Markov property (trivially generalized to vectors) if and only if $\{X_n, n \in N^0\}$ has the $r$th-order Markov property.

14. Let $\{X_n, n \in N^0\}$ be an independent process. Let

$$S_n^{(1)} = \sum_{j=0}^{n} X_j, \qquad S_n^{(r)} = \sum_{j=0}^{n} S_j^{(r-1)}$$

for $r > 1$. Then $\{S_n^{(r)}, n \in N^0\}$ has the $r$th-order Markov property. For $r = 2$, give an example to show that it need not be a Markov process.

15. If $\{S_n, n \in N\}$ is a random walk such that $\mathscr{P}\{S_1 \ne 0\} > 0$, then for any finite interval $[a, b]$ there exists an $\epsilon < 1$ such that

$$\mathscr{P}\{S_j \in [a, b],\ 1 \le j \le n\} \le \epsilon^n.$$

[This is just Exercise 6 of Sec. 5.5 again.]

16. The same conclusion is true if the random walk above is replaced by a homogeneous Markov process for which, e.g., there exist $\delta > 0$ and $\eta > 0$ such that $P(x, \mathscr{R}^1 - (x - \delta, x + \delta)) \ge \eta$ for every $x$.

9.3 Basic properties of smartingales

The sequence of sums of independent r.v.'s has motivated the generalization to a Markov process in the preceding section; in another direction it will now motivate a martingale. Changing our previous notation to conform with later


usage, let $\{x_n, n \in N\}$ denote independent r.v.'s with mean zero and write $X_n = \sum_{j=1}^{n} x_j$ for the partial sum. Then we have

(1)
$$\begin{aligned}
\mathscr{E}(X_{n+1} \mid x_1, \ldots, x_n) &= \mathscr{E}(X_n + x_{n+1} \mid x_1, \ldots, x_n) \\
&= X_n + \mathscr{E}(x_{n+1} \mid x_1, \ldots, x_n) = X_n + \mathscr{E}(x_{n+1}) = X_n.
\end{aligned}$$

Note that the conditioning with respect to $x_1, \ldots, x_n$ may be replaced by conditioning with respect to $X_1, \ldots, X_n$ (why?). Historically, the equation above led to the consideration of dependent r.v.'s $\{x_n\}$ satisfying the condition

$$\mathscr{E}(x_{n+1} \mid x_1, \ldots, x_n) = 0.$$

It is astonishing that this simple property should delineate such a useful class of stochastic processes which will now be introduced. In what follows, where the index set for $n$ is not specified, it is understood to be either $N$ or some initial segment $N_m$ of $N$.

DEFINITION OF MARTINGALE. The sequence of r.v.'s and B.F.'s $\{X_n, \mathscr{F}_n\}$ is called a martingale iff we have for each $n$:

(a) $\mathscr{F}_n \subset \mathscr{F}_{n+1}$ and $X_n \in \mathscr{F}_n$;
(b) $\mathscr{E}(|X_n|) < \infty$;
(c) $X_n = \mathscr{E}(X_{n+1} \mid \mathscr{F}_n)$, a.e.

It is called a supermartingale iff the "=" in (c) above is replaced by "$\ge$", and a submartingale iff it is replaced by "$\le$". For abbreviation we shall use the term smartingale to cover all three varieties. In case $\mathscr{F}_n = \mathscr{F}_{[1,n]}$ as defined in Sec. 9.2, we shall omit $\mathscr{F}_n$ and write simply $\{X_n\}$; more frequently however we shall consider $\{\mathscr{F}_n\}$ as given in advance and omitted from the notation.

Condition (a) is nowadays referred to as: $\{X_n\}$ is adapted to $\{\mathscr{F}_n\}$. Condition (b) says that all the r.v.'s are integrable; we shall have to impose stronger conditions to obtain most of our results. A particularly important one is the uniform integrability of the sequence $\{X_n\}$, which is discussed in Sec. 4.5. A weaker condition is given by

(2) $\qquad \sup_n \mathscr{E}(|X_n|) < \infty;$

when this is satisfied we shall say that $\{X_n\}$ is $L^1$-bounded. Condition (c) leads at once to the more general relation:

(3) $\qquad n < m \Longrightarrow X_n = \mathscr{E}(X_m \mid \mathscr{F}_n).$

This follows from Theorem 9.1.5 by induction since

$$\mathscr{E}(X_m \mid \mathscr{F}_n) = \mathscr{E}(\mathscr{E}(X_m \mid \mathscr{F}_{m-1}) \mid \mathscr{F}_n) = \mathscr{E}(X_{m-1} \mid \mathscr{F}_n).$$


An equivalent form of (3) is as follows: for each $\Lambda \in \mathscr{F}_n$ and $n < m$, we have

(4) $\qquad \int_\Lambda X_n\, d\mathscr{P} = \int_\Lambda X_m\, d\mathscr{P}.$

It is often safer to use the explicit formula (4) rather than (3), because conditional expectations can be slippery things to handle. We shall refer to (3) or (4) as the defining relation of a martingale; similarly for the "super" and "sub" varieties.

Let us observe that in the form (3) or (4), the definition of a smartingale is meaningful if the index set $N$ is replaced by any linearly ordered set, with "<" as the strict order. For instance, it may be an interval or the set of rational numbers in the interval. But even if we confine ourselves to a discrete parameter (as we shall do) there are other index sets to be considered below.

It is scarcely worth mentioning that $\{X_n\}$ is a supermartingale if and only if $\{-X_n\}$ is a submartingale, and that a martingale is both. However the extension of results from a martingale to a smartingale is not always trivial, nor is it done for the sheer pleasure of generalization. For it is clear that martingales are harder to come by than the other varieties. As between the super and sub cases, though we can pass from one to the other by simply changing signs, our force of habit may influence the choice. The next proposition is a case in point.

Theorem 9.3.1. Let $\{X_n, \mathscr{F}_n\}$ be a submartingale and let $\varphi$ be an increasing convex function defined on $\mathscr{R}^1$. If $\varphi(X_n)$ is integrable for every $n$, then $\{\varphi(X_n), \mathscr{F}_n\}$ is also a submartingale.

PROOF. Since $\varphi$ is increasing, and

$$X_n \le \mathscr{E}\{X_{n+1} \mid \mathscr{F}_n\},$$

we have

(5) $\qquad \varphi(X_n) \le \varphi(\mathscr{E}\{X_{n+1} \mid \mathscr{F}_n\}).$

By Jensen's inequality (Sec. 9.1), the right member above does not exceed $\mathscr{E}\{\varphi(X_{n+1}) \mid \mathscr{F}_n\}$; this proves the theorem. As forewarned in 9.1, we have left out some "a.e." above and shall continue to do so.

Corollary 1. If $\{X_n, \mathscr{F}_n\}$ is a submartingale, then so is $\{X_n^+, \mathscr{F}_n\}$. Thus $\mathscr{E}(X_n^+)$ as well as $\mathscr{E}(X_n)$ is increasing with $n$.

Corollary 2. If $\{X_n, \mathscr{F}_n\}$ is a martingale, then $\{|X_n|, \mathscr{F}_n\}$ is a submartingale; and $\{|X_n|^p, \mathscr{F}_n\}$, $1 < p < \infty$, is a submartingale provided that every $X_n \in L^p$; similarly for $\{|X_n| \log^+ |X_n|, \mathscr{F}_n\}$ where $\log^+ x = (\log x) \vee 0$ for $x \ge 0$.


PROOF. For a martingale we have equality in (5) for any convex $\varphi$, hence we may take $\varphi(x) = |x|$, $|x|^p$ or $|x| \log^+ |x|$ in the proof above.

Thus for a martingale $\{X_n\}$, all three transmutations: $\{X_n^+\}$, $\{X_n^-\}$ and $\{|X_n|\}$ are submartingales. For a submartingale $\{X_n\}$, nothing is said about the last two.

Corollary 3. If $\{X_n, \mathscr{F}_n\}$ is a supermartingale, then so is $\{X_n \wedge A, \mathscr{F}_n\}$ where $A$ is any constant.

PROOF. We leave it to the reader to deduce this from the theorem, but here is a quick direct proof:

$$X_n \wedge A \ge \mathscr{E}(X_{n+1} \mid \mathscr{F}_n) \wedge \mathscr{E}(A \mid \mathscr{F}_n) \ge \mathscr{E}(X_{n+1} \wedge A \mid \mathscr{F}_n).$$

It is possible to represent any smartingale as a martingale plus or minus something special. Let us call a sequence of r.v.'s $\{Z_n, n \in N\}$ an increasing process iff it satisfies the conditions:

(i) $Z_1 = 0$; $Z_n \le Z_{n+1}$ for $n \ge 1$;
(ii) $\mathscr{E}(Z_n) < \infty$ for each $n$.

It follows that $Z_\infty = \lim_n \uparrow Z_n$ exists but may take the value $+\infty$; $Z_\infty$ is integrable if and only if $\{Z_n\}$ is $L^1$-bounded as defined above, which means here $\lim_n \uparrow \mathscr{E}(Z_n) < \infty$. This is also equivalent to the uniform integrability of $\{Z_n\}$ because of (i). We can now state the result as follows.

Theorem 9.3.2. Any submartingale $\{X_n, \mathscr{F}_n\}$ can be written as

(6) $\qquad X_n = Y_n + Z_n,$

where $\{Y_n, \mathscr{F}_n\}$ is a martingale, and $\{Z_n\}$ is an increasing process.

PROOF. From $\{X_n\}$ we define its difference sequence as follows:

(7) $\qquad x_1 = X_1, \qquad x_n = X_n - X_{n-1},\ n \ge 2,$

so that $X_n = \sum_{j=1}^{n} x_j$, $n \ge 1$ (cf. the notation in the first paragraph of this section). The defining relation for a submartingale then becomes

$$\mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\} \ge 0,$$

with equality for a martingale. Furthermore, we put

$$y_1 = x_1, \quad y_n = x_n - \mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\}, \quad Y_n = \sum_{j=1}^{n} y_j;$$
$$z_1 = 0, \quad z_n = \mathscr{E}\{x_n \mid \mathscr{F}_{n-1}\}, \quad Z_n = \sum_{j=1}^{n} z_j.$$

Then clearly $x_n = y_n + z_n$ and (6) follows by addition. To show that $\{Y_n, \mathscr{F}_n\}$ is a martingale, we may verify that $\mathscr{E}\{y_n \mid \mathscr{F}_{n-1}\} = 0$ as indicated a moment ago, and this is trivial by Theorem 9.1.5. Since each $z_n \ge 0$, it is equally obvious that $\{Z_n\}$ is an increasing process. The theorem is proved.

Observe that $Z_n \in \mathscr{F}_{n-1}$ for each $n$, by definition. This has important consequences; see Exercise 9 below. The decomposition (6) will be called Doob's decomposition. For a supermartingale we need only change the "+" there into "-", since $\{-Y_n, \mathscr{F}_n\}$ is a martingale. The following complement is useful.

Corollary. If $\{X_n\}$ is $L^1$-bounded [or uniformly integrable], then both $\{Y_n\}$ and $\{Z_n\}$ are $L^1$-bounded [or uniformly integrable].

PROOF. We have from (6):

$$\mathscr{E}(Z_n) \le \mathscr{E}(|X_n|) - \mathscr{E}(Y_1)$$

since $\mathscr{E}(Y_n) = \mathscr{E}(Y_1)$. Since $Z_n \ge 0$ this shows that if $\{X_n\}$ is $L^1$-bounded, then so is $\{Z_n\}$; and $\{Y_n\}$ is too because

$$\mathscr{E}(|Y_n|) \le \mathscr{E}(|X_n|) + \mathscr{E}(Z_n).$$

Next if $\{X_n\}$ is uniformly integrable, then it is $L^1$-bounded by Theorem 4.5.3, hence $\{Z_n\}$ is $L^1$-bounded and therefore uniformly integrable as remarked before. The uniform integrability of $\{Y_n\}$ then follows from the last-written inequality.

We come now to the fundamental notion of optional sampling of a smartingale. This consists in substituting certain random variables for the original index $n$ regarded as the time parameter of the process. Although this kind of thing has been done in Chapter 8, we will reintroduce it here in a slightly different way for the convenience of the reader. To begin with we adjoin a last index $\infty$ to the set $N$ and call it $N_\infty = \{1, 2, \ldots, \infty\}$. This is an example of a linearly ordered set mentioned above. Next, adjoin $\mathscr{F}_\infty = \bigvee_{n=1}^{\infty} \mathscr{F}_n$ to $\{\mathscr{F}_n\}$.

A r.v. $\alpha$ taking values in $N_\infty$ is called optional (relative to $\{\mathscr{F}_n, n \in N_\infty\}$) iff for every $n \in N_\infty$ we have

(8) $\qquad \{\alpha \le n\} \in \mathscr{F}_n.$

Since $\mathscr{F}_n$ increases with $n$, the condition in (8) is unchanged if $\{\alpha \le n\}$ is replaced by $\{\alpha = n\}$. Next, for an optional $\alpha$, the pre-$\alpha$ field $\mathscr{F}_\alpha$ is defined to be the class of all subsets $\Lambda$ of $\mathscr{F}_\infty$ satisfying the following condition: for each $n \in N_\infty$, we have

(9) $\qquad \Lambda \cap \{\alpha \le n\} \in \mathscr{F}_n,$


where again $\{\alpha \le n\}$ may be replaced by $\{\alpha = n\}$. Writing then

(10) $\qquad \Lambda_n = \Lambda \cap \{\alpha = n\},$

we have $\Lambda_n \in \mathscr{F}_n$ and

$$\Lambda = \bigcup_n \Lambda_n = \bigcup_n [\{\alpha = n\} \cap \Lambda_n],$$

where the index $n$ ranges over $N_\infty$. This is (3) of Sec. 8.2. The reader should now do Exercises 1-4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: $\mathscr{F}_\alpha$ is a B.F. and $\alpha \in \mathscr{F}_\alpha$; if $\alpha$ is optional then so is $\alpha \wedge n$ for each $n \in N$; if $\alpha \le \beta$ where $\beta$ is also optional then $\mathscr{F}_\alpha \subset \mathscr{F}_\beta$; in particular $\mathscr{F}_{\alpha \wedge n} \subset \mathscr{F}_\alpha \cap \mathscr{F}_n$, and in fact this inclusion is an equation.

Next we assume $X_\infty$ has been defined and $X_\infty \in \mathscr{F}_\infty$. We then define $X_\alpha$ as follows:

(11) $\qquad X_\alpha(\omega) = X_{\alpha(\omega)}(\omega);$

in other words,

$$X_\alpha(\omega) = X_n(\omega) \quad \text{on } \{\alpha = n\},\ n \in N_\infty.$$

This definition makes sense for any $\alpha$ taking values in $N_\infty$, but for an optional $\alpha$ we can assert moreover that

(12) $\qquad X_\alpha \in \mathscr{F}_\alpha.$

This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption $X_n \in \mathscr{F}_n$ for every $n$. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.'s for constants. For this purpose conditions must sometimes be imposed either on $\alpha$ or on the smartingale $\{X_n\}$. Let us however begin with a perfect case which turns out to be also very important.

We introduce a class of martingales as follows. For any integrable r.v. $Y$ we put

(13) $\qquad X_n = \mathscr{E}(Y \mid \mathscr{F}_n), \quad n \in N_\infty.$

By Theorem 9.1.5, if $n < m$:

(14) $\qquad X_n = \mathscr{E}\{\mathscr{E}(Y \mid \mathscr{F}_m) \mid \mathscr{F}_n\} = \mathscr{E}\{X_m \mid \mathscr{F}_n\},$

which shows $\{X_n, \mathscr{F}_n\}$ is a martingale, not only on $N$ but also on $N_\infty$. The following properties extend both (13) and (14) to optional times.


Theorem 9.3.3. For any optional $\alpha$, we have

(15) $\qquad X_\alpha = \mathscr{E}(Y \mid \mathscr{F}_\alpha).$

If $\alpha \le \beta$ where $\beta$ is also optional, then $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ forms a two-term martingale.

PROOF. Let us first show that $X_\alpha$ is integrable. It follows from (13) and Jensen's inequality that

$$|X_n| \le \mathscr{E}(|Y| \mid \mathscr{F}_n).$$

Since $\{\alpha = n\} \in \mathscr{F}_n$, we may apply this to get

$$\int_\Omega |X_\alpha|\, d\mathscr{P} = \sum_n \int_{\{\alpha=n\}} |X_n|\, d\mathscr{P} \le \sum_n \int_{\{\alpha=n\}} \mathscr{E}(|Y| \mid \mathscr{F}_n)\, d\mathscr{P} = \sum_n \int_{\{\alpha=n\}} |Y|\, d\mathscr{P} = \mathscr{E}(|Y|) < \infty.$$

Next if $\Lambda \in \mathscr{F}_\alpha$, we have, using the notation in (10):

$$\int_\Lambda X_\alpha\, d\mathscr{P} = \sum_n \int_{\Lambda_n} X_n\, d\mathscr{P} = \sum_n \int_{\Lambda_n} Y\, d\mathscr{P} = \int_\Lambda Y\, d\mathscr{P},$$

where the second equation holds by (13) because $\Lambda_n \in \mathscr{F}_n$. This establishes (15). Now if $\alpha \le \beta$, then $\mathscr{F}_\alpha \subset \mathscr{F}_\beta$ and consequently by Theorem 9.1.5,

$$X_\alpha = \mathscr{E}\{\mathscr{E}(Y \mid \mathscr{F}_\beta) \mid \mathscr{F}_\alpha\} = \mathscr{E}\{X_\beta \mid \mathscr{F}_\alpha\},$$

which proves the second assertion of the theorem.

As an immediate corollary, if $\{\alpha_n\}$ is a sequence of optional r.v.'s such that

(16) $\qquad \alpha_1 \le \alpha_2 \le \cdots \le \alpha_n \le \cdots,$

then $\{X_{\alpha_n}, \mathscr{F}_{\alpha_n}\}$ is a martingale. This new martingale is obtained by sampling the original one at the optional times $\{\alpha_j\}$. We now proceed to extend the second part of Theorem 9.3.3 to a supermartingale. There are two important cases which will be discussed separately.

Theorem 9.3.4. Let $\alpha$ and $\beta$ be two bounded optional r.v.'s such that $\alpha \le \beta$. Then for any [super]martingale $\{X_n\}$, $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ forms a [super]martingale.

PROOF. Let $\Lambda \in \mathscr{F}_\alpha$; using (10) again we have for each $k \ge j$:

$$\Lambda_j \cap \{\beta \ge k\} \in \mathscr{F}_k,$$

because $\Lambda_j \in \mathscr{F}_j \subset \mathscr{F}_k$, whereas $\{\beta \ge k\} = \{\beta < k\}^c \in \mathscr{F}_k$. It follows from the defining relation of a supermartingale that

$$\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\, d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta \ge k\}} X_{k+1}\, d\mathscr{P}$$

and consequently

$$\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\, d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta = k\}} X_\beta\, d\mathscr{P} + \int_{\Lambda_j \cap \{\beta \ge k+1\}} X_{k+1}\, d\mathscr{P}.$$

Rewriting this as

$$\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\, d\mathscr{P} - \int_{\Lambda_j \cap \{\beta \ge k+1\}} X_{k+1}\, d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta = k\}} X_\beta\, d\mathscr{P};$$

summing over $k$ from $j$ to $m$, where $m$ is an upper bound for $\beta$; and then replacing $X_j$ by $X_\alpha$ on $\Lambda_j$, we obtain

(17) $\qquad \int_{\Lambda_j \cap \{\beta \ge j\}} X_\alpha\, d\mathscr{P} - \int_{\Lambda_j \cap \{\beta \ge m+1\}} X_{m+1}\, d\mathscr{P} \ge \int_{\Lambda_j \cap \{j \le \beta \le m\}} X_\beta\, d\mathscr{P};$

since $\{\beta \ge m+1\} = \emptyset$ and $\Lambda_j \cap \{j \le \beta \le m\} = \Lambda_j$ here, this yields

$$\int_{\Lambda_j} X_\alpha\, d\mathscr{P} \ge \int_{\Lambda_j} X_\beta\, d\mathscr{P}.$$

Another summation over $j$ from 1 to $m$ yields the desired result. In the case of a martingale the inequalities above become equations.

A particular case of a bounded optional r.v. is $\alpha_n = \alpha \wedge n$ where $\alpha$ is an arbitrary optional r.v. and $n$ is a positive integer. Applying the preceding theorem to the sequence $\{\alpha_n\}$ as under Theorem 9.3.3, we have the following corollary.

Corollary. If $\{X_n, \mathscr{F}_n\}$ is a [super]martingale and $\alpha$ is an arbitrary optional r.v., then $\{X_{\alpha \wedge n}, \mathscr{F}_{\alpha \wedge n}\}$ is a [super]martingale.

In the next basic theorem we shall assume that the [super]martingale is given on the index set $N_\infty$. This is necessary when the optional r.v. can take the value $+\infty$, as required in many applications; see the typical example in (5) of Sec. 8.2. It turns out that if $\{X_n\}$ is originally given only for $n \in N$, we may take $X_\infty = \lim_{n \to \infty} X_n$ to extend it to $N_\infty$ under certain conditions; see Theorems 9.4.5 and 9.4.6 and Exercise 6 of Sec. 9.4. A trivial case occurs when $\{X_n, \mathscr{F}_n; n \in N\}$ is a positive supermartingale; we may then take $X_\infty = 0$.

Theorem 9.3.5. Let $\alpha$ and $\beta$ be two arbitrary optional r.v.'s such that $\alpha \le \beta$. Then the conclusion of Theorem 9.3.4 holds true for any supermartingale $\{X_n, \mathscr{F}_n; n \in N_\infty\}$.


Remark. For a martingale $\{X_n, \mathscr{F}_n; n \in N_\infty\}$ this theorem is contained in Theorem 9.3.3 since we may take the $Y$ in (13) to be $X_\infty$ here.

PROOF. (a) Suppose first that the supermartingale is positive with $X_\infty = 0$ a.e. The inequality (17) is true for every $m \in N$, but now the second integral there is positive so that we have

$$\int_{\Lambda_j} X_\alpha\, d\mathscr{P} \ge \int_{\Lambda_j \cap \{\beta \le m\}} X_\beta\, d\mathscr{P}.$$

Since the integrands are positive, the integrals exist and we may let $m \to \infty$ and then sum over $j \in N$. The result is

$$\int_{\Lambda \cap \{\alpha < \infty\}} X_\alpha\, d\mathscr{P} \ge \int_{\Lambda \cap \{\beta < \infty\}} X_\beta\, d\mathscr{P},$$

which falls short of the goal. But we can add the equation

$$\int_{\Lambda \cap \{\alpha = \infty\}} X_\alpha\, d\mathscr{P} = \int_{\Lambda \cap \{\alpha = \infty\}} X_\infty\, d\mathscr{P} = \int_{\Lambda \cap \{\beta = \infty\}} X_\infty\, d\mathscr{P} = \int_{\Lambda \cap \{\beta = \infty\}} X_\beta\, d\mathscr{P},$$

which is trivial because $X_\infty = 0$ a.e. This yields the desired

(18) $\qquad \int_\Lambda X_\alpha\, d\mathscr{P} \ge \int_\Lambda X_\beta\, d\mathscr{P}.$

Let us show that $X_\alpha$ and $X_\beta$ are in fact integrable. Since $X_n \ge X_\infty$, we have $X_\alpha \le \varliminf_{n \to \infty} X_{\alpha \wedge n}$ so that by Fatou's lemma,

(19) $\qquad \mathscr{E}(X_\alpha) \le \varliminf_{n \to \infty} \mathscr{E}(X_{\alpha \wedge n}).$

Since 1 and $\alpha \wedge n$ are two bounded optional r.v.'s satisfying $1 \le \alpha \wedge n$, the right-hand side of (19) does not exceed $\mathscr{E}(X_1)$ by Theorem 9.3.4. This shows $X_\alpha$ is integrable since it is positive.

(b) In the general case we put

$$X'_n = \mathscr{E}\{X_\infty \mid \mathscr{F}_n\}, \qquad X''_n = X_n - X'_n.$$

Then $\{X'_n, \mathscr{F}_n; n \in N_\infty\}$ is a martingale of the kind introduced in (13), and $X_n \ge X'_n$ by the defining property of supermartingale applied to $X_n$ and $X_\infty$. Hence the difference $\{X''_n, \mathscr{F}_n; n \in N_\infty\}$ is a positive supermartingale with $X''_\infty = 0$ a.e. By Theorem 9.3.3, $\{X'_\alpha, \mathscr{F}_\alpha; X'_\beta, \mathscr{F}_\beta\}$ forms a martingale; by case (a), $\{X''_\alpha, \mathscr{F}_\alpha; X''_\beta, \mathscr{F}_\beta\}$ forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.

The two preceding theorems are the basic cases of Doob's optional sampling theorem. They do not cover all cases of optional sampling (see e.g. Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many applications, some of which will be given later.

Martingale theory has its intuitive background in gambling. If $X_n$ is interpreted as the gambler's capital at time $n$, then the defining property postulates that his expected capital after one more game, played with the knowledge of the entire past and present, is exactly equal to his current capital. In other words, his expected gain is zero, and in this sense the game is said to be "fair". Similarly a smartingale is a game consistently biased in one direction. Now the gambler may opt to play the game only at certain preferred times, chosen with the benefit of past experience and present observation, but without clairvoyance into the future. [The inclusion of the present status in his knowledge seems to violate raw intuition, but examine the example below and Exercise 13.] He hopes of course to gain advantage by devising such a "system" but Doob's theorem forestalls him, at least mathematically. We have already mentioned such an interpretation in Sec. 8.2 (see in particular Exercise 11 of Sec. 8.2; note that $\alpha + 1$ rather than $\alpha$ is the optional time there). The present generalization consists in replacing a stationary independent process by a smartingale. The classical problem of "gambler's ruin" illustrates very well the ideas involved, as follows.

Let $\{S_n, n \in N^0\}$ be a random walk in the notation of Chapter 8, and let $S_1$ have the Bernoullian distribution $\frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$. It follows from Theorem 8.3.4, or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost certainly leave the interval $[-a, b]$, where $a$ and $b$ are strictly positive integers; and since it can move only one unit a time, it must reach either $-a$ or $b$. This means that if we set

(20) $\qquad \alpha = \min\{n \ge 1: S_n = -a\}, \qquad \beta = \min\{n \ge 1: S_n = b\},$

then $\gamma = \alpha \wedge \beta$ is a finite optional r.v. It follows from the Corollary to Theorem 9.3.4 that $\{S_{\gamma \wedge n}\}$ is a martingale. Now

(21) $\qquad \lim_{n \to \infty} S_{\gamma \wedge n} = S_\gamma \quad \text{a.e.}$

and clearly $S_\gamma$ takes only the values $-a$ and $b$. The question is: with what probabilities? In the gambling interpretation: if two gamblers play a fair coin-tossing game and possess, respectively, $a$ and $b$ units of the constant stake as initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first that the two r.v.'s $\{S_1, S_\gamma\}$ form a martingale, for then

(22) $\qquad \mathscr{E}(S_\gamma) = \mathscr{E}(S_1) = 0,$

which is to say that

$$-a\,\mathscr{P}\{S_\gamma = -a\} + b\,\mathscr{P}\{S_\gamma = b\} = 0,$$

so that the probability of ruin is inversely proportional to the initial capital of the gambler, a most sensible solution.
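This answer is easy to confirm numerically (our sketch, not part of the text): simulate the fair walk started at 0 until it hits $-a$ or $b$ and compare the observed ruin frequency with the value $\mathscr{P}\{S_\gamma = -a\} = b/(a+b)$ that the displayed equation forces.

```python
import random

def ruin_probability(a, b, trials=100_000, seed=1):
    """Fraction of fair coin-tossing walks from 0 that reach -a before b."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        s = 0
        while -a < s < b:
            s += rng.choice((-1, 1))
        ruined += (s == -a)
    return ruined / trials

a, b = 3, 7
print(ruin_probability(a, b))   # close to b/(a+b) = 0.7
```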

To show that the pair $\{S_1, S_\gamma\}$ forms a martingale we use Theorem 9.3.5, since $\{S_{\gamma \wedge n}, n \in N_\infty\}$ is a bounded martingale. The more elementary Theorem 9.3.4 is inapplicable, since $\gamma$ is not bounded. However, there is a simpler way out in this case: (21) and the boundedness just mentioned imply that

$$\mathscr{E}(S_\gamma) = \lim_{n \to \infty} \mathscr{E}(S_{\gamma \wedge n}),$$

and since $\mathscr{E}(S_{\gamma \wedge 1}) = \mathscr{E}(S_1)$, (22) follows directly.

The ruin problem belonged to the ancient history of probability theory, and can be solved by elementary methods based on difference equations (see, e.g., Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937). The approach sketched above, however, has all the main ingredients of an elaborate modern theory. The little equation (22) is the prototype of a "harmonic equation", and the problem itself is a "boundary-value problem". The steps used in the solution, to wit: the introduction of a martingale, its optional stopping, its convergence to a limit, and the extension of the martingale property to include the limit with the consequent convergence of expectations, are all part of a standard procedure now ensconced in the general theory of Markov processes and the allied potential theory.
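The "boundary-value" point of view can also be made concrete (a sketch of ours, under the same fair-coin assumptions): the function $u(x) = \mathscr{P}\{\text{walk from } x \text{ reaches } -a \text{ before } b\}$ satisfies the harmonic difference equation $u(x) = \frac{1}{2}u(x-1) + \frac{1}{2}u(x+1)$ on $-a < x < b$ with boundary values $u(-a) = 1$, $u(b) = 0$, and solving this linear system recovers $u(0) = b/(a+b)$.

```python
import numpy as np

a, b = 3, 7
interior = list(range(-a + 1, b))        # states strictly between -a and b
idx = {x: i for i, x in enumerate(interior)}
A = np.eye(len(interior))
rhs = np.zeros(len(interior))
for x, i in idx.items():
    for nb in (x - 1, x + 1):            # u(x) - (u(x-1) + u(x+1))/2 = 0
        if nb == -a:
            rhs[i] += 0.5                # boundary value u(-a) = 1
        elif nb != b:                    # boundary value u(b) = 0 adds nothing
            A[i, idx[nb]] -= 0.5
u = np.linalg.solve(A, rhs)
print(u[idx[0]])                         # 0.7 = b/(a + b)
```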

EXERCISES

1. The defining relation for a martingale may be generalized as follows. For each optional r.v. $\alpha \le n$, we have $\mathscr{E}\{X_n \mid \mathscr{F}_\alpha\} = X_\alpha$. Similarly for a smartingale.

*2. If $X$ is an integrable r.v., then the collection of (equivalence classes of) r.v.'s $\mathscr{E}(X \mid \mathscr{G})$ with $\mathscr{G}$ ranging over all Borel subfields of $\mathscr{F}$, is uniformly integrable.

3. Suppose $\{X_n^{(k)}\}$, $k = 1, 2$, are two [super]martingales, $\alpha$ is a finite optional r.v., and $X_\alpha^{(1)} = [\ge]\, X_\alpha^{(2)}$. Define $X_n = X_n^{(1)} 1_{\{n < \alpha\}} + X_n^{(2)} 1_{\{n \ge \alpha\}}$; show that $\{X_n, \mathscr{F}_n\}$ is a [super]martingale. [HINT: Verify the defining relation in (4) for $m = n + 1$.]

4. Suppose each $X_n$ is integrable and

$$\mathscr{E}\{X_{n+1} \mid X_1, \ldots, X_n\} = n^{-1}(X_1 + \cdots + X_n);$$

then $\{n^{-1}(X_1 + \cdots + X_n), n \in N\}$ is a martingale.

5. Every sequence of integrable r.v.'s is the sum of a supermartingale and a submartingale.

6. If $\{X_n, \mathscr{F}_n\}$ and $\{X'_n, \mathscr{F}_n\}$ are martingales, then so is $\{X_n + X'_n, \mathscr{F}_n\}$. But it may happen that $\{X_n\}$ and $\{X'_n\}$ are martingales while $\{X_n + X'_n\}$ is not.


[HINT: Let $x_1$ and $x'_1$ be independent Bernoullian r.v.'s; and $x_2 = x'_2 = +1$ or $-1$ according as $x_1 + x'_1 = 0$ or $\ne 0$; notation as in (7).]

7. Find an example of a positive martingale which is not uniformly integrable. [HINT: You win $2^n$ if it's heads $n$ times in a row, and you lose everything as soon as it's tails.]

8. Find an example of a martingale $\{X_n\}$ such that $X_n \to -\infty$ a.e. This implies that even in a "fair" game one player may be bound to lose an arbitrarily large amount if he plays long enough (and no limit is set to the liability of the other player). [HINT: Try sums of independent but not identically distributed r.v.'s with mean 0.]

*9. Prove that if $\{Y_n, \mathscr{F}_n\}$ is a martingale such that $Y_n \in \mathscr{F}_{n-1}$, then for every $n$, $Y_n = Y_1$ a.e. Deduce from this result that Doob's decomposition (6) is unique (up to equivalent r.v.'s) under the condition that $Z_n \in \mathscr{F}_{n-1}$ for every $n \ge 2$. If this condition is not imposed, find two different decompositions.

10. If $\{X_n\}$ is a uniformly integrable submartingale, then for any optional r.v. $\alpha$ we have

(i) $\{X_{\alpha \wedge n}\}$ is a uniformly integrable submartingale;
(ii) $\mathscr{E}(X_1) \le \mathscr{E}(X_\alpha) \le \sup_n \mathscr{E}(X_n)$.

[HINT: $|X_{\alpha \wedge n}| \le |X_\alpha| + |X_n|$.]

*11. Let $\{X_n, \mathscr{F}_n; n \in N\}$ be a [super]martingale satisfying the following condition: there exists a constant $M$ such that for every $n \ge 1$:

$$\mathscr{E}\{|X_n - X_{n-1}| \mid \mathscr{F}_{n-1}\} \le M \quad \text{a.e.},$$

where $X_0 = 0$ and $\mathscr{F}_0$ is trivial. Then for any two optional r.v.'s $\alpha$ and $\beta$ such that $\alpha \le \beta$ and $\mathscr{E}(\beta) < \infty$, $\{X_\alpha, \mathscr{F}_\alpha; X_\beta, \mathscr{F}_\beta\}$ is a [super]martingale. This is another case of optional sampling given by Doob, which includes Wald's equation (Theorem 5.5.3) as a special case. [HINT: Dominate the integrand in the second integral in (17) by $Y_\beta$, where $X_0 = 0$ and $Y_m = \sum_{n=1}^{m} |X_n - X_{n-1}|$. We have

$$\mathscr{E}(Y_\beta) = \sum_{n=1}^{\infty} \int_{\{\beta \ge n\}} |X_n - X_{n-1}|\, d\mathscr{P} \le M\,\mathscr{E}(\beta).]$$

12. Apply Exercise 11 to the gambler's ruin problem discussed in the text and conclude that for the $\alpha$ in (20) we must have $\mathscr{E}(\alpha) = +\infty$. Verify this by elementary computation.

*13. In the gambler's ruin problem take $b = 1$ in (20). Compute $\mathscr{E}(S_{\beta \wedge n})$ for a fixed $n$ and show that $\{S_0, S_{\beta \wedge n}\}$ forms a martingale. Observe that $\{S_0, S_\beta\}$ does not form a martingale and explain in gambling terms the effect of stopping $\beta$ at $n$. This example shows why in optional sampling the option may be taken even with the knowledge of the present moment under certain conditions. In the case here the present (namely $\beta \wedge n$) may leave one no choice!

14. In the gambler's ruin problem, suppose that $S_1$ has the distribution

$$p\delta_1 + (1 - p)\delta_{-1}, \qquad p \ne \tfrac{1}{2};$$

and let $d = 2p - 1$. Show that $\mathscr{E}(S_\gamma) = d\,\mathscr{E}(\gamma)$. Compute the probabilities of ruin by using difference equations to deduce $\mathscr{E}(\gamma)$, and vice versa.

15. Prove that for any $L^1$-bounded smartingale $\{X_n, \mathscr{F}_n, n \in N_\infty\}$, and any optional $\alpha$, we have $\mathscr{E}(|X_\alpha|) < \infty$. [HINT: Prove the result first for a martingale, then use Doob's decomposition.]

*16. Let $\{X_n, \mathscr{F}_n\}$ be a martingale: $x_1 = X_1$, $x_n = X_n - X_{n-1}$ for $n \ge 2$; let $v_n \in \mathscr{F}_{n-1}$ for $n \ge 1$ where $\mathscr{F}_0 = \mathscr{F}_1$; now put

$$T_n = \sum_{j=1}^{n} v_j x_j.$$

Show that $\{T_n, \mathscr{F}_n\}$ is a martingale provided that $T_n$ is integrable for every $n$. The martingale may be replaced by a smartingale if $v_n \ge 0$ for every $n$. As a particular case take $v_n = 1_{\{n \le \alpha\}}$ where $\alpha$ is an optional r.v. relative to $\{\mathscr{F}_n\}$. What then is $T_n$? Hence deduce the Corollary to Theorem 9.3.4.

17. As in the preceding exercise, deduce a new proof of Theorem 9.3.4 by taking $v_n = 1_{\{\alpha < n \le \beta\}}$.

9.4 Inequalities and convergence

We begin with two inequalities, the first of which is a generalization of Kolmogorov's inequality (Theorem 5.3.1).

Theorem 9.4.1. If $\{X_j, \mathscr{F}_j, j \in N_n\}$ is a submartingale, then for each real $\lambda$ we have

(1) $\qquad \lambda\,\mathscr{P}\{\max_{1 \le j \le n} X_j \ge \lambda\} \le \int_{\{\max_{1 \le j \le n} X_j \ge \lambda\}} X_n\, d\mathscr{P} \le \mathscr{E}(X_n^+);$

(2) $\qquad \lambda\,\mathscr{P}\{\min_{1 \le j \le n} X_j \le -\lambda\} \le \mathscr{E}(X_n) - \mathscr{E}(X_1) - \int_{\{\min_{1 \le j \le n} X_j \le -\lambda\}} X_n\, d\mathscr{P} \le \mathscr{E}(X_n^+) - \mathscr{E}(X_1).$

PROOF. Let $\alpha$ be the first $j$ such that $X_j \ge \lambda$ if there is such a $j$ in $N_n$, otherwise let $\alpha = n$ (optional stopping at $n$). It is clear that $\alpha$ is optional; since it takes only a finite number of values, Theorem 9.3.4 shows that the pair $\{X_\alpha, X_n\}$ forms a submartingale. If we write

$$M = \{\max_{1 \le j \le n} X_j \ge \lambda\},$$

then $M \in \mathscr{F}_\alpha$ (why?) and $X_\alpha \ge \lambda$ on $M$, hence the first inequality follows from

$$\lambda\,\mathscr{P}(M) \le \int_M X_\alpha\, d\mathscr{P} \le \int_M X_n\, d\mathscr{P};$$

the second is just a cruder consequence.

Similarly let $\beta$ be the first $j$ such that $X_j \le -\lambda$ if there is such a $j$ in $N_n$, otherwise let $\beta = n$. Put also

$$M_k = \{\min_{1 \le j \le k} X_j \le -\lambda\}.$$

Then $\{X_1, X_\beta\}$ is a submartingale by Theorem 9.3.4, and so

$$\mathscr{E}(X_1) \le \mathscr{E}(X_\beta) = \int_{M_{n-1}} X_\beta\, d\mathscr{P} + \int_{M_n - M_{n-1}} X_n\, d\mathscr{P} + \int_{M_n^c} X_n\, d\mathscr{P} \le -\lambda\,\mathscr{P}(M_n) + \mathscr{E}(X_n) - \int_{M_n} X_n\, d\mathscr{P},$$

which reduces to (2).

Corollary 1. If $\{X_n\}$ is a martingale, then for each $\lambda > 0$:

(3) $\qquad \mathscr{P}\{\max_{1 \le j \le n} |X_j| \ge \lambda\} \le \frac{1}{\lambda} \int_{\{\max_{1 \le j \le n} |X_j| \ge \lambda\}} |X_n|\, d\mathscr{P} \le \frac{1}{\lambda}\,\mathscr{E}(|X_n|).$

If in addition $\mathscr{E}(X_n^2) < \infty$ for each $n$, then we have also

(4) $\qquad \mathscr{P}\{\max_{1 \le j \le n} |X_j| \ge \lambda\} \le \frac{1}{\lambda^2}\,\mathscr{E}(X_n^2).$

These are obtained by applying the theorem to the submartingales $\{|X_n|\}$ and $\{X_n^2\}$. In case $X_n$ is the $S_n$ in Theorem 5.3.1, (4) is precisely the Kolmogorov inequality there.
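For the coin-tossing martingale $S_n$ one can watch (4) at work numerically (our sketch, not the book's): here $\mathscr{E}(S_n^2) = n$, so the maximal probability is bounded by $n/\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, paths = 100, 25, 100_000
S = rng.choice([-1, 1], size=(paths, n)).cumsum(axis=1)
lhs = (np.abs(S).max(axis=1) >= lam).mean()   # P{ max_j |S_j| >= lam }
print(lhs, "<=", n / lam**2)                  # roughly 0.02 <= 0.16
```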

Corollary 2. Let $1 \le m < n$, $\Lambda_m \in \mathscr{F}_m$ and $M = \{\max_{m \le j \le n} X_j \ge \lambda\}$; then

$$\lambda\,\mathscr{P}\{\Lambda_m \cap M\} \le \int_{\Lambda_m \cap M} X_n\, d\mathscr{P}.$$

This is proved just as (1) and will be needed later.


We now come to a new kind of inequality, which will be the tool for proving the main convergence theorem below. Given any sequence of r.v.'s $\{X_j\}$, for each sample point $\omega$, the convergence properties of the numerical sequence $\{X_j(\omega)\}$ hinge on the oscillation of the finite segments $\{X_j(\omega), j \in N_n\}$ as $n \to \infty$. In particular the sequence will have a limit, finite or infinite, if and only if the number of its oscillations between any two [rational] numbers $a$ and $b$ is finite (depending on $a$, $b$ and $\omega$). This is a standard type of argument used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The interesting thing is that for a smartingale, a sharp estimate of the expected number of oscillations is obtainable.

Let $a < b$. The number $\nu$ of "upcrossings" of the interval $[a, b]$ by a numerical sequence $\{x_1, \ldots, x_n\}$ is defined as follows. Set

$$\alpha_1 = \min\{j: 1 \le j \le n,\ x_j \le a\},$$
$$\alpha_2 = \min\{j: \alpha_1 < j \le n,\ x_j \ge b\};$$

if either $\alpha_1$ or $\alpha_2$ is not defined because no such $j$ exists, we define $\nu = 0$. In general, for $k \ge 2$ we set

$$\alpha_{2k-1} = \min\{j: \alpha_{2k-2} < j \le n,\ x_j \le a\},$$
$$\alpha_{2k} = \min\{j: \alpha_{2k-1} < j \le n,\ x_j \ge b\};$$

if any one of these is undefined, then all the subsequent ones will be undefined. Let $\alpha_\ell$ be the last defined one, with $\ell = 0$ if $\alpha_1$ is undefined; then $\nu$ is defined to be $[\ell/2]$. Thus $\nu$ is the actual number of successive times that the sequence crosses from $\le a$ to $\ge b$. Although the exact number is not essential, since a couple of crossings more or less would make no difference, we must adhere to a rigid way of counting in order to be accurate below.
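The rigid counting just prescribed is easy to mechanize; the following short function (our illustration in Python, not part of the text) counts upcrossings exactly as above: odd-indexed $\alpha$'s seek a term $\le a$, even-indexed ones a term $\ge b$, and $\nu = [\ell/2]$.

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] by the finite sequence xs."""
    assert a < b
    ell = 0                 # number of alpha_j's defined so far
    seeking_low = True      # odd-indexed alpha's enter (-inf, a]
    for x in xs:
        if seeking_low and x <= a:
            ell += 1
            seeking_low = False
        elif not seeking_low and x >= b:
            ell += 1
            seeking_low = True
    return ell // 2         # nu = [l / 2]

print(upcrossings([2, 0, 1, 3, -1, 5, 2], a=0, b=3))   # -> 2
```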

Theorem 9.4.2. Let $\{X_j, \mathscr{F}_j; j \in N_n\}$ be a submartingale and $-\infty < a < b < \infty$. Let $\nu^{(n)}_{[a,b]}(\omega)$ denote the number of upcrossings of $[a, b]$ by the sample sequence $\{X_j(\omega); j \in N_n\}$. We have then

(5) $\qquad \mathscr{E}\{\nu^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{(X_n - a)^+\} - \mathscr{E}\{(X_1 - a)^+\}}{b - a} \le \frac{\mathscr{E}\{X_n^+\} + |a|}{b - a}.$

PROOF. Consider first the case where $X_j \ge 0$ for every $j$ and $0 = a < b$, so that $\nu^{(n)}_{[a,b]}(\omega)$ becomes $\nu^{(n)}_{[0,b]}(\omega)$, and $X_{\alpha_j}(\omega) = 0$ if $j$ is odd, where $\alpha_j(\omega)$ is defined as above with $x_j = X_j(\omega)$. For each $\omega$, the sequence $\alpha_j(\omega)$ is defined only up to $\ell(\omega)$, where $0 \le \ell(\omega) \le n$. But now we modify the definition so that $\alpha_j(\omega)$ is defined for $1 \le j \le n$ by setting it to be equal to $n$ wherever it was previously undefined. Since for some $\omega$, a previously defined $\alpha_j(\omega)$ may also be equal to $n$, this apparent confusion will actually simplify the formulas below. In the same vein we set $\alpha_0 \equiv 1$. Observe that $\alpha_n = n$ in any case, so that

$$X_n - X_1 = X_{\alpha_n} - X_{\alpha_0} = \sum_{j=0}^{n-1} (X_{\alpha_{j+1}} - X_{\alpha_j}) = \sum_{j\ \text{even}} + \sum_{j\ \text{odd}}.$$

If $j$ is odd and $j + 1 \le \ell(\omega)$, then

$$X_{\alpha_{j+1}}(\omega) \ge b > 0 = X_{\alpha_j}(\omega);$$

if $j$ is odd and $j = \ell(\omega)$, then

$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) \ge 0 = X_{\alpha_j}(\omega);$$

if $j$ is odd and $\ell(\omega) < j$, then

$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) = X_{\alpha_j}(\omega).$$

Hence in all cases we have

(6) $\qquad \sum_{j\ \text{odd}} (X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)) \ge \sum_{\substack{j\ \text{odd} \\ j+1 \le \ell(\omega)}} (X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)) \ge \left[\frac{\ell(\omega)}{2}\right] b = \nu^{(n)}_{[0,b]}(\omega)\, b.$

Next, observe that $\{\alpha_j, 0 \le j \le n\}$ as modified above is in general of the form $1 = \alpha_0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_\ell \le \alpha_{\ell+1} = \cdots = \alpha_n = n$, and since constants are optional, this is an increasing sequence of optional r.v.'s. Hence by Theorem 9.3.4, $\{X_{\alpha_j}, 0 \le j \le n\}$ is a submartingale so that for each $j$, $0 \le j \le n - 1$, we have $\mathscr{E}\{X_{\alpha_{j+1}} - X_{\alpha_j}\} \ge 0$ and consequently

$$\mathscr{E}\left\{\sum_{j\ \text{even}} (X_{\alpha_{j+1}} - X_{\alpha_j})\right\} \ge 0.$$

Adding to this the expectations of the extreme terms in (6), we obtain

(7) $\qquad \mathscr{E}(X_n - X_1) \ge \mathscr{E}(\nu^{(n)}_{[0,b]})\, b,$

which is the particular case of (5) under consideration.

In the general case we apply the case just proved to $\{(X_j - a)^+, j \in N_n\}$, which is a submartingale by Corollary 1 to Theorem 9.3.1. It is clear that the number of upcrossings of $[a, b]$ by the given submartingale is exactly that of $[0, b - a]$ by the modified one. The inequality (7) becomes the first inequality in (5) after the substitutions, and the second one follows since $(X_n - a)^+ \le X_n^+ + |a|$. Theorem 9.4.2 is proved.


The corresponding result for a supermartingale will be given below; but after such a painstaking definition of upcrossing, we may leave the dual definition of downcrossing to the reader.

Theorem 9.4.3. Let $\{X_j, \mathscr{F}_j; j \in N_n\}$ be a supermartingale and let $-\infty < a < b < \infty$. Let $\tilde{\nu}^{(n)}_{[a,b]}$ be the number of downcrossings of $[a, b]$ by the sample sequence $\{X_j(\omega), j \in N_n\}$. We have then

(8) $\qquad \mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{X_1 \wedge b\} - \mathscr{E}\{X_n \wedge b\}}{b - a}.$

PROOF. $\{-X_j, j \in N_n\}$ is a submartingale and $\tilde{\nu}^{(n)}_{[a,b]}$ is $\nu^{(n)}_{[-b,-a]}$ for this submartingale. Hence the first part of (5) becomes

$$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}\{(-X_n + b)^+\} - \mathscr{E}\{(-X_1 + b)^+\}}{-a - (-b)} = \frac{\mathscr{E}\{(b - X_n)^+\} - \mathscr{E}\{(b - X_1)^+\}}{b - a}.$$

Since $(b - x)^+ = b - (b \wedge x)$ this is the same as in (8).

Corollary. For a positive supermartingale we have for $0 \le a < b < \infty$

$$\mathscr{E}\{\tilde{\nu}^{(n)}_{[a,b]}\} \le \frac{b}{b - a}.$$

G. Letta proved the sharper "dual":

$$\mathscr{E}\{\nu^{(n)}_{[a,b]}\} \le \frac{a}{b - a}$$

(Martingales et intégration stochastique, Quaderni, Pisa, 1984, 48-49.)

The basic convergence theorem is an immediate consequence of the upcrossing inequality.

Theorem 9.4.4. If $\{X_n, \mathscr{F}_n; n \in N\}$ is an $L^1$-bounded submartingale, then $\{X_n\}$ converges a.e. to a finite limit.

Remark. Since

$$\mathscr{E}(|X_n|) = 2\,\mathscr{E}(X_n^+) - \mathscr{E}(X_n) \le 2\,\mathscr{E}(X_n^+) - \mathscr{E}(X_1),$$

the condition of $L^1$-boundedness is equivalent to the apparently weaker one below:

(9) $\qquad \sup_n \mathscr{E}(X_n^+) < \infty.$

PROOF. Let $\nu_{[a,b]} = \lim_n \uparrow \nu^{(n)}_{[a,b]}$. Our hypothesis implies that the last term in (5) is bounded in $n$; letting $n \to \infty$, we obtain $\mathscr{E}\{\nu_{[a,b]}\} < \infty$ for every $a$ and $b$, and consequently $\nu_{[a,b]}$ is finite with probability one. Hence, for each pair of rational numbers $a < b$, the set

$$\Lambda_{[a,b]} = \{\varliminf_n X_n < a < b < \varlimsup_n X_n\}$$

is a null set; and so is the union over all such pairs. Since this union contains the set where $\varliminf_n X_n < \varlimsup_n X_n$, the limit exists a.e. It must be finite a.e. by Fatou's lemma applied to the sequence $\{|X_n|\}$.

Corollary. Every uniformly bounded smartingale converges a.e. Every positive supermartingale and every negative submartingale converge a.e.
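A classical instance of the corollary (our illustration, not the book's): in a Pólya urn the proportion of red balls after each draw-and-replacement is a martingale with values in $[0, 1]$, hence uniformly bounded, so it converges a.e.; each sample path settles down to its own random limit.

```python
import random

def polya_fraction(n_draws, seed):
    """Proportion of red balls after n_draws, starting from 1 red and 1 black;
    drawing a ball returns it together with one more of the same color."""
    rng = random.Random(seed)
    red, total = 1, 2
    for _ in range(n_draws):
        if rng.random() < red / total:
            red += 1
        total += 1
    return red / total

# Five independent paths: each has (nearly) converged, but to different limits.
print([round(polya_fraction(10_000, s), 3) for s in range(5)])
```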

It may be instructive to sketch a direct proof of Theorem 9.4.4 which is done "by hand", so to speak. This is the original proof given by Doob (1940) for a martingale.

Suppose that the set $\Lambda_{[a,b]}$ above has probability $> \eta > 0$. For each $\omega$ in $\Lambda_{[a,b]}$, the sequence $\{X_n(\omega), n \in N\}$ takes an infinite number of values $< a$ and an infinite number of values $> b$. Let $1 = n_0 < n_1 < \cdots$ and put

$$\Lambda_{2j-1} = \{\min_{n_{2j-2} \le i \le n_{2j-1}} X_i < a\}, \qquad \Lambda_{2j} = \{\max_{n_{2j-1} \le i \le n_{2j}} X_i > b\}.$$

Then for each $k$ it is possible to choose the $n_i$'s successively so that the differences $n_i - n_{i-1}$ for $1 \le i \le 2k$ are so large that "most" of $\Lambda_{[a,b]}$ is contained in $\bigcap_{i=1}^{2k} \Lambda_i$, so that

$$\mathscr{P}\left\{\bigcap_{i=1}^{2k} \Lambda_i\right\} > \eta.$$

Fixing an $n > n_{2k}$ and applying Corollary 2 to Theorem 9.4.1 to $\{-X_i\}$ as well as $\{X_i\}$, we have

$$a\,\mathscr{P}\left\{\bigcap_{i=1}^{2j-1} \Lambda_i\right\} \ge \int_{\bigcap_{i=1}^{2j-1} \Lambda_i} X_{n_{2j-1}}\, d\mathscr{P} = \int_{\bigcap_{i=1}^{2j-1} \Lambda_i} X_n\, d\mathscr{P},$$

$$b\,\mathscr{P}\left\{\bigcap_{i=1}^{2j} \Lambda_i\right\} \le \int_{\bigcap_{i=1}^{2j} \Lambda_i} X_{n_{2j}}\, d\mathscr{P} = \int_{\bigcap_{i=1}^{2j} \Lambda_i} X_n\, d\mathscr{P},$$

where the equalities follow from the martingale property. Upon subtraction we obtain

$$(b - a)\,\mathscr{P}\left\{\bigcap_{i=1}^{2j} \Lambda_i\right\} - |a|\,\mathscr{P}\left\{\bigcap_{i=1}^{2j-1} \Lambda_i \setminus \bigcap_{i=1}^{2j} \Lambda_i\right\} \le -\int_{\bigcap_{i=1}^{2j-1} \Lambda_i\, \setminus\, \bigcap_{i=1}^{2j} \Lambda_i} X_n\, d\mathscr{P},$$

and consequently, upon summing over $1 \le j \le k$:

$$k(b - a)\eta - |a| \le \mathscr{E}(|X_n|).$$

This is impossible if $k$ is large enough, since $\{X_n\}$ is $L^1$-bounded.


Once Theorem 9.4.4 has been proved for a martingale, we can extend it easily to a positive or uniformly integrable supermartingale by using Doob's decomposition. Suppose $\{X_n\}$ is a positive supermartingale and $X_n = Y_n - Z_n$ as in Theorem 9.3.2. Then $0 \le Z_n \le Y_n$ and consequently

$$\mathscr{E}(Z_\infty) = \lim_{n \to \infty} \mathscr{E}(Z_n) \le \mathscr{E}(Y_1);$$

next we have

$$\mathscr{E}(|Y_n|) = \mathscr{E}(Y_n) = \mathscr{E}(X_n) + \mathscr{E}(Z_n) \le \mathscr{E}(X_1) + \mathscr{E}(Z_\infty).$$

Hence $\{Y_n\}$ is an $L^1$-bounded martingale and so converges to a finite limit as $n \to \infty$. Since $Z_n \uparrow Z_\infty < \infty$ a.e., the convergence of $\{X_n\}$ follows. The case of a uniformly integrable supermartingale is just as easy by the corollary to Theorem 9.3.2.

It is trivial that a positive submartingale need not converge, since the sequence $\{n\}$ is such a one. The classical random walk $\{S_n\}$ (coin-tossing game) is an example of a martingale that does not converge (why?). An interesting and not so trivial consequence is that both $\mathscr{E}(S_n^+)$ and $\mathscr{E}(|S_n|)$ must diverge to $+\infty$! (Cf. Exercise 2 of Sec. 6.4.) Further examples are furnished by "stopped random walk". For the sake of concreteness, let us stay with the classical case and define $\gamma$ to be the first time the walk reaches $+1$. As in our previous discussion of the gambler's-ruin problem, the modified random walk $\{\tilde{S}_n\}$, where $\tilde{S}_n = S_{\gamma \wedge n}$, is still a martingale, hence in particular we have for each $n$:

$$\mathscr{E}(\tilde{S}_n) = \mathscr{E}(\tilde{S}_1) = \int_{\{\gamma=1\}} S_1\, d\mathscr{P} + \int_{\{\gamma>1\}} S_1\, d\mathscr{P} = \mathscr{E}(S_1) = 0.$$

As in (21) of Sec. 9.3 we have, writing $\tilde{S}_\infty = S_\gamma = 1$,

$$\lim_n \tilde{S}_n = \tilde{S}_\infty \quad \text{a.e.},$$

since $\gamma < \infty$ a.e., but this convergence now also follows from Theorem 9.4.4, since $\tilde{S}_n \le 1$. Observe, however, that

$$\mathscr{E}(\tilde{S}_n) = 0 < 1 = \mathscr{E}(\tilde{S}_\infty).$$

Next, we change the definition of $\gamma$ to be the first time ($\ge 1$) the walk "returns" to 0, as usual supposing $S_0 = 0$. Then $\tilde{S}_\infty = 0$ and we have indeed $\mathscr{E}(\tilde{S}_n) = \mathscr{E}(\tilde{S}_\infty)$. But for each $n$,

$$\int_{\{\tilde{S}_n > 0\}} \tilde{S}_n\, d\mathscr{P} > 0 = \int_{\{\tilde{S}_n > 0\}} \tilde{S}_\infty\, d\mathscr{P},$$

so that the "extended sequence" $\{\tilde{S}_1, \ldots, \tilde{S}_n, \ldots, \tilde{S}_\infty\}$ is no longer a martingale. These diverse circumstances will be dealt with below.


9.4 INEQUALITIES AND CONVERGENCE 1 353

Theorem 9.4.5 . The three propositions below are equivalent for a sub-martingale {X,,,

11 E N} :

(a) it is a uniformly integrable sequence ;(b) it converges in L 1 ;(c) it converges a .e. to an integrable X,,,, such that {X,,,

n E N,,} isa submartingale and ("(X,) converges to (t (X .. . ) .

PROOF. (a) $\Rightarrow$ (b): under (a) the condition in Theorem 9.4.4 is satisfied so that $X_n \to X_\infty$ a.e. This together with uniform integrability implies $X_n \to X_\infty$ in $L^1$ by Theorem 4.5.4 with $r = 1$.

(b) $\Rightarrow$ (c): under (b) let $X_n \to X_\infty$ in $L^1$, then $\mathscr{E}(|X_n|) \to \mathscr{E}(|X_\infty|) < \infty$ and so $X_n \to X_\infty$ a.e. by Theorem 9.4.4. For each $\Lambda \in \mathscr{F}_n$ and $n < n'$, we have

$$\int_\Lambda X_n\, d\mathscr{P} \le \int_\Lambda X_{n'}\, d\mathscr{P}$$

by the defining relation. The right member converges to $\int_\Lambda X_\infty\, d\mathscr{P}$ by $L^1$-convergence and the resulting inequality shows that $\{X_n, \mathscr{F}_n; n \in N_\infty\}$ is a submartingale. Since $L^1$-convergence also implies convergence of expectations, all three conditions in (c) are proved.

(c) $\Rightarrow$ (a): under (c), $\{X_n^+, \mathscr{F}_n; n \in N_\infty\}$ is a submartingale; hence we have for every $\lambda > 0$:

(10) $\qquad \int_{\{X_n^+ > \lambda\}} X_n^+\, d\mathscr{P} \le \int_{\{X_n^+ > \lambda\}} X_\infty^+\, d\mathscr{P},$

which shows that $\{X_n^+, n \in N\}$ is uniformly integrable. Since $X_n^+ \to X_\infty^+$ a.e., this implies $\mathscr{E}(X_n^+) \to \mathscr{E}(X_\infty^+)$. Since by hypothesis $\mathscr{E}(X_n) \to \mathscr{E}(X_\infty)$, it follows that $\mathscr{E}(X_n^-) \to \mathscr{E}(X_\infty^-)$. This and $X_n^- \to X_\infty^-$ a.e. imply that $\{X_n^-\}$ is uniformly integrable by Theorem 4.5.4 for $r = 1$. Hence so is $\{X_n\}$.

Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above are equivalent to (c′) or (d) below:

(c′) it converges a.e. to an integrable $X_\infty$ such that $\{X_n, \mathscr{F}_n; n \in N_\infty\}$ is a martingale;

(d) there exists an integrable r.v. $Y$ such that $X_n = \mathscr{E}(Y \mid \mathscr{F}_n)$ for each $n \in N$.

PROOF. (b) $\Rightarrow$ (c′) as before; (c′) $\Rightarrow$ (a) as before if we observe that $\mathscr{E}(X_n) = \mathscr{E}(X_\infty)$ for every $n$ in the present case, or more rapidly by considering $|X_n|$ instead of $X_n^-$ as below. (c′) $\Rightarrow$ (d) is trivial, since we may take the $Y$ in (d) to be the $X_\infty$ in (c′). To prove (d) $\Rightarrow$ (a), let $n < n'$, then by


Theorem 9.1.5:

$$\mathscr{E}(X_{n'} \mid \mathscr{F}_n) = \mathscr{E}\{\mathscr{E}(Y \mid \mathscr{F}_{n'}) \mid \mathscr{F}_n\} = \mathscr{E}(Y \mid \mathscr{F}_n) = X_n,$$

hence $\{X_n, \mathscr{F}_n, n \in N; Y, \mathscr{F}\}$ is a martingale by definition. Consequently $\{|X_n|, \mathscr{F}_n, n \in N; |Y|, \mathscr{F}\}$ is a submartingale, and we have for each $\lambda > 0$:

$$\int_{\{|X_n| > \lambda\}} |X_n|\, d\mathscr{P} \le \int_{\{|X_n| > \lambda\}} |Y|\, d\mathscr{P},$$

$$\lambda\,\mathscr{P}\{|X_n| > \lambda\} \le \mathscr{E}(|X_n|) \le \mathscr{E}(|Y|),$$

which together imply (a).

Corollary. Under (d), $\{X_n, \mathscr{F}_n, n \in N; X_\infty, \mathscr{F}_\infty; Y, \mathscr{F}\}$ is a martingale, where $X_\infty$ is given in (c′).

Recall that we have introduced martingales of the form in (d) earlier in (13) of Sec. 9.3. Now we know this class coincides with the class of uniformly integrable martingales.

We have already observed that the defining relation for a smartingale is meaningful on any linearly ordered (or even partially ordered) index set. The idea of extending the latter to a limit index is useful in applications to continuous-time stochastic processes, where, for example, a martingale may be defined on a dense set of real numbers in $(t_1, t_2)$ and extended to $t_2$. This corresponds to the case of extension from $N$ to $N_\infty$. The dual extension corresponding to that to $t_1$ will now be considered. Let $-N$ denote the set of strictly negative integers in their natural order, let $-\infty$ precede every element in $-N$, and denote by $-N_\infty$ the set $\{-\infty\} \cup (-N)$ in the prescribed order. If $\{\mathscr{F}_n, n \in -N\}$ is a decreasing (with decreasing $n$) sequence of Borel fields, their intersection $\bigcap_{n \in -N} \mathscr{F}_n$ will be denoted by $\mathscr{F}_{-\infty}$.

The convergence results for a submartingale on $-N$ are simpler because the right side of the upcrossing inequality (5) involves the expectation of the r.v. with the largest index, which in this case is the fixed $-1$ rather than the previous varying $n$. Hence for mere convergence there is no need for an extra condition such as (9).

Theorem 9.4.7. Let $\{X_n, n \in -N\}$ be a submartingale. Then

(11) $\qquad \lim_{n \to -\infty} X_n = X_{-\infty}, \quad \text{where } -\infty \le X_{-\infty} < \infty \text{ a.e.}$

The following conditions are equivalent, and they are automatically satisfied in case of a martingale with "submartingale" replaced by "martingale" in (c):

(a) $\{X_n\}$ is uniformly integrable;
(b) $X_n \to X_{-\infty}$ in $L^1$;
(c) $\{X_n, n \in -N_\infty\}$ is a submartingale;
(d) $\lim_{n \to -\infty} \mathscr{E}(X_n) > -\infty$.

PROOF. Let $\nu^{(n)}_{[a,b]}$ be the number of upcrossings of $[a, b]$ by the sequence $\{X_{-n}, \ldots, X_{-1}\}$. We have from Theorem 9.4.2:

$$\mathscr{E}\{\nu^{(n)}_{[a,b]}\} \le \frac{\mathscr{E}(X_{-1}^+) + |a|}{b - a}.$$

Letting $n \to \infty$ and arguing as in the proof of Theorem 9.4.4, we conclude (11) by observing that

$$\mathscr{E}(X_{-\infty}^+) \le \varliminf_n \mathscr{E}(X_{-n}^+) \le \mathscr{E}(X_{-1}^+) < \infty.$$

The proofs of (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are entirely similar to those in Theorem 9.4.5. (c) $\Rightarrow$ (d) is trivial, since $-\infty < \mathscr{E}(X_{-\infty}) \le \mathscr{E}(X_n)$ for each $n$. It remains to prove (d) $\Rightarrow$ (a). Letting $C$ denote the limit in (d), we have for each $\lambda > 0$:

(12) $\qquad \lambda\,\mathscr{P}\{|X_n| > \lambda\} \le \mathscr{E}(|X_n|) = 2\,\mathscr{E}(X_n^+) - \mathscr{E}(X_n) \le 2\,\mathscr{E}(X_{-1}^+) - C < \infty.$

It follows that $\mathscr{P}\{|X_n| > \lambda\}$ converges to zero uniformly in $n$ as $\lambda \to \infty$. Since

$$\int_{\{X_n > \lambda\}} X_n^+\, d\mathscr{P} \le \int_{\{X_n > \lambda\}} X_{-1}^+\, d\mathscr{P},$$

this implies that $\{X_n^+\}$ is uniformly integrable. Next if $n < m$, then

$$\begin{aligned}
0 \ge \int_{\{X_n < -\lambda\}} X_n\, d\mathscr{P} &= \mathscr{E}(X_n) - \int_{\{X_n \ge -\lambda\}} X_n\, d\mathscr{P} \\
&\ge \mathscr{E}(X_n) - \int_{\{X_n \ge -\lambda\}} X_m\, d\mathscr{P} \\
&= \mathscr{E}(X_n - X_m) + \mathscr{E}(X_m) - \int_{\{X_n \ge -\lambda\}} X_m\, d\mathscr{P} \\
&= \mathscr{E}(X_n - X_m) + \int_{\{X_n < -\lambda\}} X_m\, d\mathscr{P}.
\end{aligned}$$

By (d), we may choose $-m$ so large that $\mathscr{E}(X_n - X_m) \ge -\epsilon$ for any given $\epsilon > 0$ and for every $n < m$. Having fixed such an $m$, we may choose $\lambda$ so large that

$$\sup_n \int_{\{X_n < -\lambda\}} |X_m|\, d\mathscr{P} \le \epsilon$$

by the remark after (12). It follows that $\{X_n^-\}$ is also uniformly integrable, and therefore (a) is proved.

The next result will be stated for the index set $\overline{N}$ of all integers in their natural order:

$$\overline{N} = \{\ldots, -n, \ldots, -2, -1, 0, 1, 2, \ldots, n, \ldots\}.$$

Let $\{\mathscr{F}_n\}$ be increasing B.F.'s on $\overline{N}$, namely: $\mathscr{F}_n \subset \mathscr{F}_m$ if $n < m$. We may "close" them at both ends by adjoining the B.F.'s below:

$$\mathscr{F}_{-\infty} = \bigwedge_n \mathscr{F}_n, \qquad \mathscr{F}_{\infty} = \bigvee_n \mathscr{F}_n.$$

Let $\{Y_n\}$ be r.v.'s indexed by $\overline{N}$. If the B.F.'s and r.v.'s are only given on $N$ or $-N$, they can be trivially extended to $\overline{N}$ by putting $\mathscr{F}_n = \mathscr{F}_1$, $Y_n = Y_1$ for all $n \le 0$, or $\mathscr{F}_n = \mathscr{F}_{-1}$, $Y_n = Y_{-1}$ for all $n \ge 0$. The following convergence theorem is very useful.

Theorem 9.4.8. Suppose that the $Y_n$'s are dominated by an integrable r.v. $Z$:

(13) $\qquad \sup_n |Y_n| \le Z,$

and $\lim_n Y_n = Y_\infty$ or $Y_{-\infty}$ as $n \to \infty$ or $-\infty$. Then we have

(14a) $\qquad \lim_{n \to \infty} \mathscr{E}\{Y_n \mid \mathscr{F}_n\} = \mathscr{E}\{Y_\infty \mid \mathscr{F}_\infty\};$

(14b) $\qquad \lim_{n \to -\infty} \mathscr{E}\{Y_n \mid \mathscr{F}_n\} = \mathscr{E}\{Y_{-\infty} \mid \mathscr{F}_{-\infty}\}.$

In particular for a fixed integrable r.v. $Y$, we have

(15a) $\qquad \lim_{n \to \infty} \mathscr{E}\{Y \mid \mathscr{F}_n\} = \mathscr{E}\{Y \mid \mathscr{F}_\infty\};$

(15b) $\qquad \lim_{n \to -\infty} \mathscr{E}\{Y \mid \mathscr{F}_n\} = \mathscr{E}\{Y \mid \mathscr{F}_{-\infty}\},$

where the convergence holds also in $L^1$ in both cases.

PROOF. We prove (15) first. Let $X_n = \mathscr{E}\{Y \mid \mathscr{F}_n\}$. For $n \in N$, $\{X_n, \mathscr{F}_n\}$ is a martingale already introduced in (13) of Sec. 9.3; the same is true for $n \in -N$. To prove (15a), we apply Theorem 9.4.6 to deduce (c′) there. It remains to identify the limit $X_\infty$ with the right member of (15a). For each $\Lambda \in \mathscr{F}_n$, we have

$$\int_\Lambda Y\, d\mathscr{P} = \int_\Lambda X_n\, d\mathscr{P} = \int_\Lambda X_\infty\, d\mathscr{P}.$$

Hence the equations hold also for $\Lambda \in \mathscr{F}_\infty$ (why?), and this shows that $X_\infty$ has the defining property of $\mathscr{E}(Y \mid \mathscr{F}_\infty)$, since $X_\infty \in \mathscr{F}_\infty$. Similarly, the limit $X_{-\infty}$ in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there, for each $\Lambda \in \mathscr{F}_{-\infty}$:

$$\int_\Lambda X_{-\infty}\, d\mathscr{P} = \int_\Lambda X_n\, d\mathscr{P} = \int_\Lambda Y\, d\mathscr{P}.$$

This shows that $X_{-\infty}$ is equal to the right member of (15b).

We can now prove (14a). Put for $m \in N$:

$$W_m = \sup_{n \ge m} |Y_n - Y_\infty|;$$

then $|W_m| \le 2Z$ and $\lim_{m \to \infty} W_m = 0$ a.e. Applying (15a) to $W_m$ we obtain

$$\varlimsup_{n \to \infty} \mathscr{E}\{|Y_n - Y_\infty| \mid \mathscr{F}_n\} \le \lim_{n \to \infty} \mathscr{E}\{W_m \mid \mathscr{F}_n\} = \mathscr{E}\{W_m \mid \mathscr{F}_\infty\}.$$

As $m \to \infty$, the last term above converges to zero by dominated convergence (see (vii) of Sec. 9.1). Hence the first term must be zero and this clearly implies (14a). The proof of (14b) is completely similar.

Although the corollary below is a very special case we give it for histor-ical interest. It is called Paul Levy's zero-or-one law (1935) and includesTheorem 8.1 .1 as a particular case .

Corollary . If A E 3 , then

(16)

lim -OP(A I On ) = I n a.e .n-+oc

The reader is urged to ponder over the intuitive meaning of this result andjudge for himself whether it is "obvious" or "incredible" .

EXERCISES

*1 . Prove that for any smartingale, we have for each k >O :

~,Z'P{suP IXn I> ?,} < 3 sup (F(IXn D ) .n

n

For a martingale or a positive or negative smartingale the constant 3 may bereplaced by 1 .

2. Let {X,) be a positive supermartingale. Then for almost every w,Xk (w) = 0 implies X,, (w) = 0 for all n > k. [This is the analogue of aminimum principle in potential theory .]

3. Generalize the upcrossing inequality for a submartingale {Xn ,~} asfollows :

°{V[a b] I

} <{(Xn - a)+ I ) - (X i - a)+

b-a

Page 372: A Course in Probability Theory KL Chung

358 1 CONDITIONING . MARKOV PROPERTY . MARTINGALE

Similarly, generalize the downcrossing inequality for a positive supermartin-gale {X„ ,

as follows :

X 1 Abv[Q b] I h } < b - a

*4. As a sharpening of Theorems 9 .4 .2 and 9.4 .3 we have, for a positivesupermartingale {X,,,

n E N} :

X A a

k-Iv

(Q,b} > k} < (b

) ( b )'7(X1 A b)

(a)k

-1(n)

b

bJ~{v{a b} > k} <

These inequalities are due to Dubins . Derive Theorems 9.3 .6 and 9 .3 .7 fromthem . [HIN :

bJ'{a2 j < n } < fa,1<n}

2 d- -

Xa2j dJ'{a2j_, <n}

<

X j-1 d A < ate'{a2j_i < n}{all ' <n}

since {a2j_I < n } E .~a2 . .]

*5. Every L 1 -bounded martingale is the difference of two positive L 1 -bounded martingales. This is due to Krickeberg . [HINT: Take one of them tobe limk, x ({X,+ I . Tn} •]

*6. A smartingale {X,,, „ ; n E N} is said to be closable [on the right]iff there exists a r .v . X,,, such that {X,,, n E Nom } is a smartingale of thesame kind. Prove that if so then we can always take X,, = lim n ,, X,, . Thissupplies a missing link in the literature . [HINT : For a supermartingale considerX„ _ ~`(Xx :,, ) + Y,,, then {Y,,, Jan } is a positive supermartingale so wemay apply the convergence theorems to both terms of the decomposition .]

7. Prove a result for closability [on the left] which is similar to Exercise 6but for the index set -N . Give an example to show that in case of N we mayhave lim„_-, x (% (X„) 0 ((X,,,), whereas in case of -N closability implieslim",

8. Let

11 E N} be a submartingale and let a be a finite optionalr.v . satisfying the conditions : (a) i (IXa I) < oc, and (b)

lim

IX, I dai = 0 .a--+OO {a>n}

Then {Xa,,,, .~iaA ,, ; n E Nom } is a submartingale . [HINT : for A E

boundfA (Xa - XI,,,, ) d:9' below by interposing Xan ,,, where n < m .]

Page 373: A Course in Probability Theory KL Chung

9.4 INEQUALITIES AND CONVERGENCE 1 359

9 . Let IX,, : h'7-,, ; n E N} be a supermartingale satisfying the conditionlimn -0C (` (X„) > -oc. Then we have the representation X„ = X,, + X,7 where{X,,, c „} is a martingale and {X„, %,} is a positive supennartingale such thatlim„, 0 X„ = 0 in L 1 as well as a.e. This is the analogue of F . Riesz'sdecomposition of a superharmonic function, X ;, being the harmonic part andX;, the potential part . [HINT : Use Doob's decomposition X„ = Y„ - Z„ andput X„=Y„-~`(Z" I~'/n) •]

10. Let IX,,, ?X„ } be a potential ; namely a positive supermartingale suchthat Jim, ,--+,, = 0; and let X„ = Y„ - Z„ be the Doob decomposition[cf. (6) of Sec . 9 .3]. Show that

Xn =I(Z,, I~T-n ) - Zn .

* 11 . If {X„} is a martingale or positive submartingale such thatSUN_

(Xn) < oc, then {X„ } converges in L2 as well as a.e .12 . Let {~,,, n E N} be a sequence of independent and identically

distributed r.v.'s with zero mean and unit variance; and S„ _ ~~= I ~j .

Then for any optional r .v. a relative to {~,, } such that c(ja-) < oc, wehave c (ISa I) < /2-C`(.,Aa-) and c(Sa ) = 0 . This is an extension of Wald'sequation due to Louis Gordon . [HINT : Truncate a and put 77k = (S2 /,/) -

(Sn_ 1 /,/k - 1) ; then

77k dJ'k-1 fit k)

00

k=1k}/-1k- < 2(r(,,fc-e ) ;

now use Schwarz's inequality followed by Fatou's lemma .]The next two problems are meant to give an idea of the passage from

discrete parameter martingale theory to the continuous parameter theory .13 . Let {X t , `At ; t E [0, 1 ]} be a continuous parameter supermartingale .

For each t E [0, 1 ] and sequence It,j decreasing tot, {X t „ } converges a .e. andin L 1 . For each t E [0, 1 ] and sequence {t,, } increasing to t, {X t, } converges a .e .but not necessarily in L 1 . [HINT : In the second case consider X t„ - (1'7{X t I ?It,, } .]

*14. In Exercise 13 let Q be the set of rational numbers in [0, 1] . Foreach t E (0, 1) both limits below exist a.e . :

lilnXs , limXs .'Tt

,i t

,cQ

sEQ

[HINT : Let {Q,,, n > 1 } be finite subsets of Q such that Q,, T Q ; and apply theupcrossing inequality to {X,•, S E Q,, }, then let n ---> oo .]

Page 374: A Course in Probability Theory KL Chung

360 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

9.5 Applications

Although some of the major successes of martingale theory lie in the field ofcontinuous-parameter stochastic processes, which cannot be discussed here, ithas also made various important contributions within the scope of this work.We shall illustrate these below with a few that are related to our previoustopics, and indicate some others among the exercises .

(I) The notions of "at least once" and "infinitely often"

These have been a recurring theme in Chapters 4, 5, 8, and 9 and play impor-tant roles in the theory of random walk and its generalization to Markovprocesses. Let {X,I , n E N° ) be an arbitrary stochastic process ; the notationfor fields in Sec . 9 .2 will be used . For each n consider the events :

00An =U{Xj EB1 ),

j=n

00

M= I I An = {Xj E Bj i.o .},n=1

where Bn are arbitrary Borel sets .

Theorem 9 .5.1 . We have

(1)

lim ,?I{An+1 13[0,n)} = 1M a.e .,

where [o .n] may be replaced by {n } or X n if the process is Markovian .

PROOF . By Theorem 9.4.8, (14a), the limit is

~'{M I ~io,00>} = 1 M .

The next result is a "principle of recurrence" which is useful in Markovprocesses ; it is an extension of the idea in Theorem 9 .2.3 (see also Exercises 15and 16 of Sec . 9.2) .

Theorem 9.5.2 . Let {Xn , n E N°} be a Markov process and A, Bn Borelsets . Suppose that there exists 8>0 such that for every n,

00(2)

){ U [X j E Bj] I X n } > 8 a.e. on the set {Xn E An } ;

j=n+1

then we have

(3)

J'{[X j E A j i .o.]\[Xj E Bj i.o .]} = 0 .

Page 375: A Course in Probability Theory KL Chung

9.5 APPLICATIONS 1 361

PROOF. Let A _ {Xj E Aj i .o .} and use the notation An and M above .We may ignore the null sets in (1) and (2) . Then if w E A, our hypothesisimplies that

ZII{An+l I X„}((o) > 8 i.o.

In view of (1) this is possible only if w E M. Thus A C M, which implies (3) .

The intuitive meaning of the preceding theorem has been given byDoeblin as follows : if the chance of a pedestrian's getting run over is greaterthan 8 > 0 each time he crosses a certain street, then he will not be crossingit indefinitely (since he will be killed first)! Here IX, E A,,) is the event ofthe nth crossing, {Xn E Bn} that of being run over at the nth crossing .

(II) Harmonic and superharmonic functions for a Markov process

Let {X,,, n E N°} be a homogeneous Markov process as discussed in Sec . 9.2with the transition probability function P( ., . ). An extended-valued function

f on R1 is said to be harmonic (with respect to P) iff it is integrable withrespect to the measure P(x, .) for each x and satisfies the following "harmonic

equation" ;

(4)

Vx E ~W1 : f (x) = f P(x, dy)f(y) .

It is superharmonic (with respect to P) iff the "_" in (4) is replaced byin this case f may take the value +oo .

Lemma. If f is [super]harmonic, then { f (X,), n e N° }, where X0 - x° forsome given x° in

is a [super]martingale .

PROOF. We have, recalling (14) of Sec . 9.2,

{f(Xn)} = f P" (xo,dy)f(y) < oc,

as is easily seen by iterating (4) and applying an extended form of Fubini'stheorem (see, e .g., Neveu [6]). Next we have, upon substituting X„ for x in (4):

f (X,) = f P(Xn, d y)f (y) = (`{ .f (Xn+l) I Xn} = ("If (Xn+1) 1 ~[°,nl},

where the second equation follows by Exercise 8 of Sec . 9.2 and the third byMarkov property. This proves the lemma in the harmonic case ; the other caseis similar . (Why not also the "sub" case?)

The most important example of a harmonic function is the g( ., B) of

Exercise 10 of Sec . 9.2 for a given B; that of a superharmonic function is the

Page 376: A Course in Probability Theory KL Chung

362

1 CONDITIONING

. MARKOV PROPERTY . MARTINGALE

f ( •,

B) of Exercise 9 there

. These assertions follow easily from their proba-bilistic meanings given in the cited exercises, but purely analytic verificationsare also simple and instructive . Finally, if for some B we have

7r(x) =

P(') (x, B) < oc11=0

for every x, then 7r( •)

is superharmonic and is called the "potential" of the

set B .

Theorem 9 .5.3 . Suppose that the remote field of {Xn, n E N°} is trivial .Then each bounded harmonic function is a constant a .e. with respect to each

where A„ is the p .m. of Xn .

PROOF. By Theorem 9.4 .5, f (Xn) converges a.e. to Z such that

{ f (Xn), [0,n1 Z, x[0,00)}

is a martingale . Clearly Z belongs to the remote field and so is a constant ca.e. Since

f (X n) = (~IJZ ( n }

each f (X„) is the same constant c a .e. Mapped into 31, the last assertionbecomes the conclusion of the theorem .

(III) The supremum of a submartingale

The first inequality in (1) of Sec . 9 .4 is of a type familiar in ergodic theoryand leads to the result below, which has been called the "dominated ergodictheorem" by Wiener. In the case where X„ is the sum of independent r .v.'swith mean 0, it is due to Marcinkiewicz and Zygmund. We write IIXIIP forthe U-norm of X : IIXIIn = (IXIp) .

Theorem 9.5.4. Let 1 < p < oo and 1 / p + 1 /q = 1 . Suppose that {Xn , n EN) is a positive submartingale satisfying the condition

(5)

sup ("(XP} < oo .n

Then sup,IENX,1 E LP and

(6)

I I supXn I Ir < q sup I IX,, I IP .11

/t

PROOF. The condition (5) implies that IX,,) is uniformly integrable

(Exercise 8 of Sec. 4 .5), hence by Theorem 9 .4 .5, Xn --~ X,,, a .e. and {Xn, n ENom} is a submartingale. Writing Y for sup X,,, we have by an obvious

Page 377: A Course in Probability Theory KL Chung

9.5 APPLICATIONS 1 363

extension of the first equation in (1) of Sec . 9 .4 :

(7)

W .>0: A~{Y > A} < / X,0 dT.J{Y?~}

Now it turns out that such an inequality for any two r .v.'s Y and X"' impliesthe inequality I IY I I p ::S g I IXoo I I p, from which (6) follows by Fatou's lemma .This is shown by the calculation below, where G Q,) _ 91{Y > a.} .

00

((Yp) _ -J

A dG(A) <or

pAp-1 G(A,)da,0

0

< ~ plp-1 1

X,, &G7 d,o

A {Y>A}

Y_

X" fpAp-2 da~ d'

s~

o

-17IX,)Yp-1 d~P < gIIXooiIpIIYP-' 11q

= glfXoollp{('(Yp)j 11q .

Since we do not yet know (ff (Yp) < 00, it is necessary to replace Y firstwith Y AM, where m is a constant, and verify the truth of (7) after thisreplacement, before dividing through in the obvious way . We then let m T o0and conclude (6) .

The result above is false for p = 1 and is replaced by a more complicatedone (Exercise 7 below) .

(IV) Convergence of sums of independent r.v.'s

We return to Theorem 5 .3 .4 and complete the discussion there by showingthat the convergence in distribution of the series En X, already implies itsconvergence a .e. This can also be proved by an analytic method based onestimation of ch.f.'s, but the martingale approach is more illuminating .

Theorem 9.5 .5 . If {Xn , n E N) is a sequence of independent r .v.'s such thatSn = E'=, Xj converges in distribution as n ---> oo, then S, converges a .e .

PROOF . Let f; be the ch .f. of Xj , so that

n

(Pit = 11 fjj=1

Page 378: A Course in Probability Theory KL Chung

364 1 CONDITIONING . MARKOV PROPERTY . MARTINGALE

is the ch .f. of S,, . By the convergence theorem of Sec . 6.3, cn convergeseverywhere to co, the ch .f. of the limit distribution of S,, . We shall need onlythis fact for (tI < to, where to is so small that cp(t) 0 0 for ItI < to ; then thisis also true of c ,, for all sufficiently large n . For such values of n and a fixedt with I tt < to , we define the complex-valued r .v. Z, as follows :

(8)eits,

Zn

cpn (t)

Then each Z„ is integrable; indeed the sequence {Z n ) is uniformly bounded .We have for each n, if ~n denotes the Borel field generated by S1, . . . , S n

eitsn

eitXn+ ,

(n (t) f n+1(t)

eitS n

e itX„+ ,~

eitSn f n+1 (t) =Z

cn (t)

f n+1 (t)n

~n (t) f n+1(t)

Z/1

where the second equation follows from Theorem 9 .1 .3 and the third fromindependence . Thus {Zn , :~„ } is a martingale, in the sense that its real andimaginary parts are both martingales . Since it is uniformly bounded, it followsfrom Theorem 9 .4.4 that Zn converges a .e. This means, for each t with I t I < to,

there is a set Sg t with °J'(S2 t ) = 1 such that if w E Q, then the sequence ofcomplex numbers eits^ (W ) /cp„ (t) converges and so also does eirsn (w) . But howdoes one deduce from this the convergence of S, (w)? The argument belowmay seem unnecessarily tedious, but it is of a familiar and indispensable kindin certain parts of stochastic processes .

Consider eirsn (w) as a function of (t, w) in the product space T x Q, whereT = [-to, to], with the product measure nZ x where n is the Lebesguemeasure on T. Since this function is measurable in (t, w) for each n, theset C of (t, w) for which limn~~ eits„(m) exists is measurable with respect tom x ~. Each section of C by a fixed t has full measure ?~ (Qt ) = 1 as justshown, hence Fubini's theorem asserts that almost every section of C by afixed w must also have full measure m(T) = 2to . This means that there existsan S2 with '~'(Q) = 1, and for each w E S2 there is a subset T,,, of T withm(T.) = m(T), such that if t E T, then lirn n,~ eits" (' ) exists . Now we arein a position to apply Exercise 17 of Sec . 6 .4 to conclude the convergence ofSn (w) for w E S2, thus finishing the proof of the theorem .

According to the preceding proof, due to Doob, the hypothesis ofTheorem 9 .5 .5 may be further weakened to that the sequence of ch .f.'s ofS„ converges on a set of t of strictly positive Lebesgue measure . In particular,if an infinite product II„ f „ of ch .f.'s converges on such a set, then it convergeseverywhere .

("{Zn+1

Page 379: A Course in Probability Theory KL Chung

(V) The strong law of large numbers

Our next example is a new proof of the classical strong law of large numbersin the form of Theorem 5 .4 .2, (8) . This basically different approach, whichhas more a measure-theoretic than an analytic flavor, is one of the strikingsuccesses of martingale theory . It was given by Doob (1949) .

Theorem 9.5.6 . Let IS,, n e N} be a random walk (in the sense ofChapter 8) with c`{IS i 1} < oc. Then we have

lim Sn

n-+oo n

PROOF. Recall that S n = Ejn=1 Xj and consider for 1 < k < n :

(9)

(F{Xk I Sn, Sn+1 . . . . } = c fXk I fin

where n is the Borel field generated by {S j , j > n} . Thus

= '{Si } a.e .

nEN

as n increases. By Theorem 9.4 .8, (15b), the right side of (9) converges toc`{Xk 1 6] . Now ;!~;, is also generated by S n and {Xj , j > n + 1}, hence itfollows from the independence of the latter from the pair (X k , S,,) and (3) ofSec. 9 .2 that we have

(10)

C {Xk Sn, Xn+l, Xn+2, . . .} = ( {Xk S, } = c`IX 1 Sn},

the second equation by reason of symmetry (proof?) . Summing over k from1 to n and taking the average, we infer that

S" _ (rfXi I n }n

so that if Y_,, = S,,/n for n E N, {Y,,, n E -NJ is a martingale. In particular,

limS" = Jim cc{X1 I n} = (``{XI I _},

n -oo n

17__>00

where the second equation follows from the argument above . On the otherhand, the first limit is a remote (even invariant) r .v. in the sense of Sec . 8 .1,since for every m > 1 we have

n

lim Sn (w) = limEj=m Xj ((0) ,

n-+o0 n

n-+oo

n

9.5 APPLICATIONS 1 365

Page 380: A Course in Probability Theory KL Chung

366 1 CONDITIONING . MARKOV PROPERTY. MARTINGALE

hence it must be a constant a .e. by Theorem 8 .1 .2 . [Alternatively we may useTheorem 8 .1 .4, the limit above being even more obviously a permutable r.v.]This constant must be f { {X 1 1 -,0-11 _ ~ {X 1 }, proving the theorem .

(VI) Exchangeable events

The method used in the preceding example can also be applied to the theoryof exchangeable events . The events {E,, n E NJ are said to be exchangeableiff for every k > 1, the probability of the joint occurrence of any k of them isthe same for every choice of k members from the sequence, namely we have

(11)

;~1'{En , f1 . . . fl Enk } = Wk,

k E N ;

for any subset {n 1 , . , n k } of N. Let us denote the indicator of En by en ,and put

Nnj=1

then Nn is the number of occurrences among the first n events of the sequence .Denote by , the B .F. generated by {Nj , j > n}, and

(~eni(12)

f

I ,j=1

fnk =

nEN

Then the definition of exchangeability implies that if nj < n for 1k, then

k

11 en i ,(ni, . .,nk) J=1

k11 (eni

I Nnj=1

and that this conditional expectation is the same for any subset (n 1, . . . , n k )

of (1, . . . , n ) . Put then fno = 1 and

1<k

where the sum is extended over all (") choices ; this is the "elementarysymmetric function" of degree k formed by e1, . . ., e n . Introducing an inde-terminate z we have the formal identity in z :

n

n

1 , f n jzj = 11(1 + e jz) .j=0

j=1

n,

< J <

Page 381: A Course in Probability Theory KL Chung

But it is trivial that 1 + ejz = ( 1 + Z)ej since ej takes only the values 0 and1, hence'

n

n

f,tjz'' _ 11(1 +Z)ej _ (1 +Z)N"j=0

j=1

From this we obtain by comparing the coefficients :

(13)

fnk=Nn),

0<k<n .It follows that the right member of (12) is equal to

Nn

(n )k

kLetting n -* oc and using Theorem 9 .4.8 (15b) in the left member of (12),we conclude that

k

Nk

( 14)

e I 11 enj I JC = lim "(j=1

n+ cc n

This is the key to the theory. It shows first that the limit below exists almostsurely :

lim Nn =27,

n->oo nand clearly q is a r.v. satisfying 0 < q < 1 . Going back to (14) we haveestablished the formula(14')

A(E, , n . . . n E„ A I ,l) = rJ k , k E N;

and taking expectations we have identified the w k in (11) :

Wk = c~"(Yjk ) .

Thus {wk , k E N} is the sequence of moments of the distribution of 17 . This isde Finetti's theorem, as proved by D . G. Kendall. We leave some easy conse-quences as exercises below . An interesting classical example of exchangeableevents is Polya's urn scheme, see Renyi [24], and Chung [25] .

(VII) Squared variation

Here is a small sample of the latest developments in martingale theory . LetX = {X,,

be a martingale ; using the notation in (7) of Sec. 9.3, we put11

Qn = Qn(X)= > 41j=1

I owe this derivation to David Klarner .

9.5 APPLICATIONS 1 367

Page 382: A Course in Probability Theory KL Chung

368 1 CONDITIONING . MARKOV PROPERTY . MARTINGALE

The sequence {Q2 (X), n E N} associated with X is called its squared varia-tion process and is useful in many applications. We begin with the algebraicidentity :

X;,17

j=1

2

xj

xj + 2

Xj-lxj .

If X„ E L2 for every n, then all terms above are integrable, and we have foreach j :

( 16)

(c(Xj-l xj) = cn (Xj_1 cn(xj Jj-1 )) = 0 .

It follows thatn

L (Xn )

XJj=1

When X„ is the nth partial sum of a sequence of independent r .v.'s with zeromean and finite variance, the preceding formula reduces to the additivity ofvariances ; see (6) of Sec . 5.1 .

Now suppose that {X,} is a positive bounded supermartingale such that0 < X, < a. for all n, where A is a constant . Then the quantity of (16) isnegative and bounded below by

((a.(F(xj I ' -I ) ) _ Acr(xj) < 0 .

In this case we obtain from (15) :

> (f (X,,) > c` (Q2 ) + 2A

so that

n

j=2

n

j=1

n

j=2

C`(xj) _ c (Q1) + 2ALV(X„) = c`(XI

(17)

(,c (Q') < 2A6`(X1) < 2a.2 .

If X is a positive martingale, then XAA, is a supermartingale of the kindjust considered so that (17) is applicable to it . Letting X* = supl<„ <,,,X,,,

c~I'{Qn (X) >_ A} < ~11{X* > A} + ./){X* < A. ; Q,~ (X AA) >_ A} .

By Theorem 9 .4.1, the first term on the right is bounded by Thesecond term may be estimated by Chebyshev's inequality followed by (17)applied to XAA :

1'{Q, (X AA) >_ A} <_ -2 ~` {Q (X A), )} <_ 2 X 1) .

Page 383: A Course in Probability Theory KL Chung

We have therefore established the inequality :

(18)

Z'1 1Q,, (X)

<`(X1)

for any positive martingale. Letting n --+ oo and then A --a oc, we obtain00

x~ = nl Qn (X) < oc a.e .j=1

Using Krickeberg's decomposition (Exercise 5 of Sec . 9.4) the last resultextends at once to any L 1 -bounded martingale . This was first proved by D . G .Austin. Similarly, the inequality (18) extends to any L 1 -bounded martingaleas follows :

(19)

~{Qn (X) ? A} < 6 suP(0(IXn j ) .

The details are left as an exercise. This result is due to D. Burkholder. Thesimplified proofs given above are due to A . Garsia .

(VIII) Derivation

Our final example is a feedback to the beginning of this chapter, namely touse martingale theory to obtain a Radon-Nikodym derivative . Let Y be anintegrable r.v. and consider, as in the proof of Theorem 9 .1 .1, the countablyadditive set function v below :

(20)

v(A) = [YdZ/ .JA

For any countable measurable partition {A~"~, j E N} of S2, let `'T„ be the Borelfield generated by it . Define the approximating function X„ as follows :

v(A~n) )tJWA J„>o 'J)

where the fraction is taken to be zero if the denominator vanishes . Accordingto the discussion of Sec . 9.1, we have

X ,1

X,, = ("IY I

Now suppose that the partitions become finer as n increases so that {J7;;, n E N}is an increasing sequence of Borel fields . Then we obtain by Theorem 9.4.8 :

limX„Y1 :'7U .11-+00

9.5 APPLICATIONS 1 369

Page 384: A Course in Probability Theory KL Chung

370 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

In particular if Y E ~~ we have obtained the Radon-Nikodym derivativeY = d v/d 7 as the almost everywhere limit of "increment ratios" over a "net" :

(n)

Y(w) = lim v( ~' ( c' ))n-aoo ~(A(n) )'

i(w)

where j(w) is the unique j such that w E Ann ) .If (0, ZT, ~P) is (ill, ~A, m) as in Example 2 of Sec. 3 .1, and v is an

arbitrary measure which is absolutely continuous with respect to the Lebesguemeasure m, we may take the nth partition to be 0 = i

(01n) < . . . < 4(n) = 1

such thato<max 1 (~kk+l -

n))- 0 .

For in this case ~& will contain each open interval and so also _10 . If v is notabsolutely continuous, the procedure above will lead to the derivative of itsabsolutely continuous part (see Exercise 14 below) . In particular, if F is thed.f. associated with the p .m. v, and we put

fn(x)=2nF k+1

-F( ) ]k

for <x<k+1 ,

2n

2n

2n

2n

where k ranges over all integers, then we have

Jim f n (x) = F' (x)n-* oo

for almost all x with respect to m ; and F' is the density of the absolutelycontinuous part of F ; see Theorem 1 .3 .1 . So we have come around to thebeginning of this course, and the book is hereby ended .

EXERCISES

* 1 . Suppose that (Xn , n E N} is a sequence of integer-valued r .v.'s havingthe following property . For each n, there exists a function pn of n integerssuch that for every k E N, we have

9'{Xk+J =x>, 1 < j < n} = Pn(XI, . . .,xn ) .

Define for a fixed xo :

Pn+1(xo,X1, . . .,Xn )Zn =

Pn(X1, . . .,Xn)

if the denominator >0 ; otherwise Zn = 0 . Then {Zn , n E N} is a martingalethat converges a .e. and in L' . [This is from information theory .]

Page 385: A Course in Probability Theory KL Chung

9.5 APPLICATIONS ( 371

2. Suppose that for each n, {Xj , 1 < j < n} and {X', 1 < j < n) haverespectively the n-dimensional probability density functions p, and qn . Define

Yn _ gn(X1, . . .,Xn )

pn(X1, . . .,X,)

if the denominator >0 and = 0 otherwise . Then {Yn , n E N} is a super-martingale that converges a .e. [This is from statistics .]

3. Let {Zn , n E N°} be positive integer-valued r .v.'s such that Z° - 1and for each n > 1, the conditional distribution of Zn given Z0 , . . . , Z n _ 1 isthat of Zn _ 1 independent r.v.'s with the common distribution {pk , k E N°),where p1 < 1 and

000<m= kpk<oo .

k=0

Then {Wn , n E N°}, where Wn = Zn /mn , is a martingale that converges, thelimit being zero if m < 1 . [This is from branching process .]

4. Let {Xn , n E N} be an arbitrary stochastic process and let ~' be as inSec. 8.1 . Prove that the remote field is almost trivial if and only if for eachA E ~& we have

lim sup 19(AM) - _,P(A)GJ'(M)j = 0 .n-±oo MEN'

n

[HINT : Consider 9'(A I Zx' ) and apply 9 .4.8 . This is due to Blackwell andFreedman .]

*5. In the notation of Theorem 9 .6 .2, suppose that there exists 8>0 suchthat

'flXj E B j i .o . I Xn } < 1 - 8 a.e. on {Xn E An } ;

then we have

Z'{Xj E Aj i.o . and Xj E Bj i .o .) - 0.

6. Let f be a real bounded continuous function on R1 and µ a p.m. onl such that

dx E !~4 1 : f (x) = I f (x + y)µ(dy) .I

Then f (x + s) = f (x) for each s in the support of ,u . In particular, if p is not ofthe lattice type, then f is constant everywhere . [This is due to G . A. Hunt, whoused it to prove renewal limit theorems . The approach was later rediscoveredby other authors, and the above result in somewhat more general context isnow referred to as Choquet-Deny's theorem .]

Page 386: A Course in Probability Theory KL Chung

372 1 CONDITIONING. MARKOV PROPERTY. MARTINGALE

*7. The analogue of Theorem 9.5.4 for p = 1 is as follows : if Xn > 0 forall n, then

(''{supXn } <e

[1 + sup i`{Xn log+ Xn}],n

e - 1

n

where log+ x = log x if x > 1 and 0 if x < 1 .*8 . As an example of a more sophisticated application of the martingale

convergence theorem, consider the following result due to Paul Levy . LetIX, n E N°) be a sequence of uniformly bounded r.v.'s, then the two series

~7Xn and

G{Xn I X1, . . .,Xn-1}n

n

converge or diverge together . [HINT : Letn

Yn = Xn - L {Xn I X1, . . . , Xn-1 } and Zn =j=1

Define a to be the first time that Zn > A and show that e (ZAn ) is boundedin n . Apply Theorem 9.4 .4 to {ZaAn } for each A to show that Zn converges onthe set where lim„ Z„ < oo ; similarly also on the set where lim71 Zn > -oc .The situation is reminiscent of Theorem 8 .2.5 .]

9. Let {Yk, 1 < k < n} be independent r.v.'s with mean zero and finitevariances ak ;

kSk=~Y i .

j=1 j=1a~ > 0, Zk = Sk - Sk

Prove that {Zk, 1 < k < n } is a martingale. Suppose now all Yk are boundedby a constant A, and define a and M as in the proof of Theorem 9 .4.1, withthe Xk there replaced by the Sk here . Prove that

Sna'(M`) < (F(Sa) _ (1(Sa) < (~.+A)2 .

Thus we obtain

:/'{ max ISk I < A} <(), 2A )z .1<k<n

11

an improvement on Exercise 3 of Sec . 5.3 . [This is communicated by Doob .]10 . Let {X,,, n E N} be a sequence of independent, identically distributed

r.v.'s with (' (IX 1 I) < oo ; and S„ = En=, Xj . Define a = inf{n > 1 : IXn I >

n}. Prove that if (f ((ISaI/a)l{a<.)) < oo, then f'(IX1I log+ IX 1 I) < oo. This

Page 387: A Course in Probability Theory KL Chung

is due to McCabe and Shepp . [HINT :

n

Cn = 11?/-){1X11 < j} -~ C>0 ;

j=1

°O 1 -1L

IXn I

C,~d'/ =

1X1 I dJ~ ;n=1 n {a=n}

n=1 n

{1X1I>n}

n=1n {an }

~ 1

- -I'sn-1 I d, < oc .]

9.5 APPLICATIONS 1 373

11. Deduce from Exercise 10 that c~(sup n ISn 1 /n) < oc if and onlyif c`(IX1I log+ IXi1) < oo . [HINT: Apply Exercise 7 to the martingaleI . . . . S„ /n, . . . , S2/2, S 1 } in Example (V).]

12. In Example (VI) show that (i) -r,- is generated by rl ; (ii) the events{En , n E NJ are conditionally independent given 17 ; (iii) for any l events Enj ,1 < j < l and any k < l we have

f1~'{E,, , n . . . n E,lA n En L+I n . . . n En } = J

Xk (l - x)l-kG(dx)

0

where G is the distributions of q .13. Prove the inequality (19) .

*14. Prove that if v is a measure on ~~ that is singular with respect to./ , then the X,'s in (21) converge a .e. to zero . [HINT : Show that

v(A) = Xn dZ' for A E

m < n,n

and apply Fatou's lemma . {X" } is a supermartingale, not necessarily a martin-gale!]

15. In the case of (V, A, m), suppose that v = 6 1 and the nth partitionis obtained by dividing °l/ into 2" equal parts : what are the X„'s in (21)? Usethis to "explain away" the St . Peterburg paradox (see Exercise 5 of Sec . 5 .2) .

Bibliographical Note

Most of the results can be found in Chapter 7 of Doob [17] . Another useful account isgiven by Meyer [20] . For an early but stimulating account of the connections betweenrandom walks and partial differential equations, see

A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung . Springer-Verlag, Berlin, 1933 .

Page 388: A Course in Probability Theory KL Chung

374 1 CONDITIONING . MARKOV PROPERTY . MARTINGALE

Theorems 9 .5.1 and 9 .5 .2 are contained in

Kai Lai Chung, The general theory of Markov processes according to Doeblin,Z. Wahrscheinlichkeitstheorie 2 (1964), 230-254 .

For Theorem 9 .5.4 in the case of an independent process, see

J. Marcinkiewicz and A. Zygmund, Sur les fonctions independants, Fund . Math .29 (1937), 60-90,

which contains other interesting and useful relics .The following article serves as a guide to the recent literature on martingale

inequalities and their uses :

D. L. Burkholder, Distribution function inequalities for martingales, Ann. Proba-bility 1 (1973), 19-42 .

Page 389: A Course in Probability Theory KL Chung

Supplement: Measure andIntegral

For basic mathematical vocabulary and notation the reader is referred to § 1 .1and §2.1 of the main text .

1 Construction of measure

Let Q be an abstract space and / its total Borel field, then A E J meansAc Q .

DEFINITION 1 . A function A* with domain / and range in [0, oc] is anouter measure iff the following properties hold :

(a) A* (0) = 0 ;(b) (monotonicity) if A I C A2, then /-t* (A I ) < ~~ * 02) ;(c) (subadditivity) if {A j} is a countable sequence of sets in J, then

(UA j

i<

µ*(A j ).

DEFINITION 2 . Let - be a field in Q . A function µ with domain ~'Ao andrange in [0, oo] is a measure on A% iff (a) and the following property hold :

Page 390: A Course in Probability Theory KL Chung

376 1 SUPPLEMENT: MEASURE AND INTEGRAL

(d) (additivity) if {BJ } is a countable sequence of disjoint sets in tandUJ BJ E :%, then

(1)

Let us show that the properties (b) and (c) for outer measure hold for a measureg, provided all the sets involved belong to A.

If A 1

A2 E z~b, and A 1 C A2 , then A 'A2 E A because ~5t- is a field ;A2 =A 1 UA~A2 and so by (d) :

g(A2) = /,t(A1)+ g(A'A2) ? A00-

Next, if each AJ E ~b , and furthermore if U J A J E A (this must be assumedfor a countably infinite union because it is not implied by the definition of afield!), then

-A 1 UAlA2 UAlA2A3 U . . .J

and so by (d), since each member of the disjoint union above belongs to : 3 :

g UAJ = A(Ai) + g(A,A2) + A(AlA2A3) + . . .I

< g(A1)+g(A2)+A03)+ . . .

by property (b) just proved .The symbol N denotes the sequence of natural numbers (strictly positive

integers) ; when used as index set, it will frequently be omitted as understood .For instance, the index j used above ranges over N or a finite segment of N.

Now let us suppose that the field :b is a Borel field to be denoted byand that g is a measure on it . Then if A„ E '- for each n E N, the countableunion U, A„ and countable intersection nn A„ both belong to ~T- . In this casewe have the following fundamental properties .

(e) (increasing limit) if A„ C A, + 1 for all n and A, T A - Un A,,, then

lim T g(A„) = A(A) .17

(f) (decreasing limit) if A„ D A„ + 1 for all n, A„ 4, A = nn A,,, and forsome n we have g (A„) < oo, then

1im ~ A(An) = A(A) .

It (UBi) -i )

11

Jg(Bj) .

Page 391: A Course in Probability Theory KL Chung

I CONSTRUCTION OF MEASURE 1 377

The additional assumption in (f) is essential . For a counterexample letA„ _ (n, oo) in R, then A„ 4, 4 the empty set, but the measure (length!) of A„is +oc for all n, while 0 surely must have measure 0 . See §3 for formalitiesof this trivial example. It can even be made discrete if we use the countingmeasure # of natural numbers : let A„ _ {n, n + 1, n + 2, . . .} so that #(A„) _+oc, #(n n An) = 0 .

Beginning with a measure µ on a field A, not a Bore] field, we proceedto construct a measure on the Borel field 7 generated by A, namely theminimal Borel field containing % (see §2 .1). This is called an extension ofji from A to Y , when the notation it is maintained . Curiously, we do thisby first constructing an outer measure A* on the total Borel field J and thenshowing that µ* is in truth a measure on a certain Borel field to be denotedby z~;7* that contains A. Then of course 3T* must contain the minimal ~, andso it* restricted to Z is an extension of the original µ from A to 7. But wehave obtained a further extension to ~A * that is in general "larger" than 3T andpossesses a further desirable property to be discussed .

DEFINITION 3 . Given a measure A on a field A in Q, we define µ* onJ as follows, for any A E J:

(2)

A*(A) = inf

,u(BJ)J

BJ E A for all j and U BJ D A .

A countable (possibly finite) collection of sets {BJ } satisfying the conditionsindicated in (2) will be referred to below as a "covering" of A. The infimumtaken over all such coverings exists because the single set c2 constitutes acovering of A, so that

0 < u* (A) < i.c * (Q) < +oo .

It is not trivial that µ* (A) = ,u (A) if A E %, which is part of the next theorem .

Theorem 1 . We have µ* = I, on A; u* on J is an outer measure .

PROOF . Let A E ~b, then the single set A serves as a covering of A ; henceµ* (A) < µ(A). For any covering {BJ } of A, we have AB j E A and

UABJ =AEI ;J

hence by property (c) of µ on 3~ followed by property (b) :

µ(A) = µ (UABJ) <

A(ABJ) <

ti (BJ) .J J

Page 392: A Course in Probability Theory KL Chung

378 1 SUPPLEMENT : MEASURE AND INTEGRAL

It follows from (2) that p(A) < ,u*(A) . Thus µ* _ /I. on A .

To prove u* is an outer measure, the properties (a) and (b) are trivial .To prove (c), let E > 0. For each j, by the definition of ,u* (A j ), there exists acovering {Bjk} of Aj such that

~A(Bjk) < A* (4 (4j)+ 2jik

The double sequence {Bjk} is a covering of Uj A j such that

Hence for any e > 0 :

,u UAj <

,u* (Aj ) +c

that establishes (c) for A*, since E is arbitrarily small .With the outer measure E.t*, a class of sets ,* is associated as follows .

DEFINITION 4. A set A c Q belongs to ~T* iff for every Z c S2 we have

(3)

A* (Z) _ It* (AZ) + µ * (A`Z) .

If in (3) we change "_" into "<", the resulting inequality holds by (c) ; hence(3) is equivalent to the reverse inequality when "=" is changed into ">" .

Theorem 2. Y * is a Borel field and contains ~ . On 97*, It* is a measure .

PROOF . Let A E A . For any Z c 0 and any c > 0, there exists a covering(Bj } of Z such that

E It(Bik) <j

jI*(Aj)+E .

(4) ~µ(Bj) < µ* (Z) +j

Since AB j E ~;b, JAB j } is a covering ofAZ; {AcBj } is a covering of A Z, hence

(5)

A*(AZ) _<

µ(ABj), It* (A`Z) <

It(ACBj).

Since A is a measure on A, we have for each j :

'(6)

I(ABj) + I(AcBj) - A(Bj ) .

Page 393: A Course in Probability Theory KL Chung

1 CONSTRUCTION OF MEASURE 1 379

It follows from (4), (5), and (6) that

A*(AZ) + µ*(A`Z) < µ*(Z) + E.

Letting E ~ 0 establishes the criterion (3) in its ">" form . Thus A E ~T*, andwe have proved that 3 C ~*.

To prove that 3;7* is a Borel field, it is trivial that it is closed undercomplementation because the criterion (3) is unaltered when A is changedinto A` . Next, to show that -IT* is closed under union, let A E 9,-* and B E 3~7* .Then for any Z C Q, we have by (3) with A replaced by B and Z replaced byZA or ZA` :

A*(ZA) = A*(ZAB) + µ*(ZAB');

A* (ZA`) _ A * (ZA`B) + It* (ZA`B` ) .

Hence by (3) again as written :

A* (Z) = ,u* (ZAB) + A* (ZAB`) + A * (ZA`B) + µ* (ZA`B`) .

Applying (3) with Z replaced by Z(A U B), we have

A*(Z(A U B)) = µ*(Z(A U B)A) + p*(Z(A U B)A`)

= ,u * (ZA) + A* (ZA`B)= µ*(ZAB) + It* (ZAB`) + µ*(ZA`B) .

Comparing the two preceding equations, we see that

/J-* (Z) =,u*(Z(A U B)) + µ*(Z(A U B)`).

Hence A U B E *, and we have proved that * is a field .Now let {Aj } be an infinite sequence of sets in * ; put

j-1B I =A 1 , Bj =Aj \ UA for j > 2.

(i=1

Then {Bj } is a sequence of disjoint sets in * (because * is a field) and hasthe same union as {Aj} . For any Z C Q, we have for each n > 1 :

nJ _ g (Z(~B1)Bfl) + /G* (Z(~B J)B)n

j=1

j=1

=µ*(ZBn)+A* (Z flU: BJ)j=1 )

Page 394: A Course in Probability Theory KL Chung

380 1 SUPPLEMENT : MEASURE AND INTEGRAL

because B,, E T * . It follows by induction on n that

n

(7)

~l Z U B j =

µ*(ZBj ) .J=1

j=1

Since U". ] B j E :7*, we have by (7) and the monotonicity of 1U* :

n

n

~(.( * (Z )

( j=1

j=1U Bj -}- ~(.l.* Z U Bj

cn

pp

,u *(ZBj) + g* Z U Bj

>=1

J=1

Letting n f oo and using property (c) of /f , we obtain

00

,u * (Z )> µ* Z U Bj + ,u * Z ( 00U Bj

j=1

j=1

that establishes U~° 1 Bj E * . Thus J7* is a Borel field .Finally, let {Bj } be a sequence of disjoint sets in * . By the property (b)

of ,u* and (7) with Z = S2, we have

The proof of Theorem 2 is complete .

2 Characterization of extensions

We have proved that

where some of the "n" may turn out to be "=" . Since we have extended themeasure µ from moo to .f * in Theorem 2, what for :J7h-? The answer will appearin the sequel .

cc

n

n

o0

,u U Bj > lim sup /f U Bj = 1 nm µ (B1) = /f (B j ) .

j=1

n j=1

j=1

j=

Combined with the property (c) of /f , we obtain the countable additivity ofµ* on *, namely the property (d) for a measure :

/*(Bj)

j=1

Page 395: A Course in Probability Theory KL Chung

2 CHARACTERIZATION OF EXTENSIONS 1 381

The triple (S2, zT, µ) where ~ is a Borel field of subsets of S2, and A is ameasure on will be called a measure space . It is qualified by the adjective"finite" when ,u(S2) < oc, and by the noun "probability" when ,u(Q) = l .

A more general case is defined below .

DEFINITION 5. A measure ,u on a field 33 (not necessarily Borel field) issaid to be a-finite iff there exists a sequence of sets IQ,, n E N} in J~ suchthat µ (Q,,) < oc for each n, and U 17 Stn = S2 . In this case the measure space(Q, :h , p), where JT is the minimal Borel field containing , is said to be"a-finite on so".

Theorem 3 . Let ~~ be a field and z the Borel field generated by r~ . Let µland µ2 be two measures on J that agree on ~ . If one of them, hence bothare a-finite on `b-, then they agree on 1 .

PROOF . Let {S2„ } be as in Definition 5 . Define a class 6F of subsets of S2as follows :

F = {A C S2 : I,1(S2 nA) = A2 (S2nA) for all n E N} .

Since Q,, E A, for any A E 3 we have QA E - for all n ; hence 6' D :3 .Suppose Ak E l A k C Ak+I for all k E N and Ak T A. Then by property (e)of µ1 and /L2 as measures on

we have for each n

µi (S2„ A) = 1km t ,uI (Q,,Ak) = liken T A2(S2nAk) = ,2(S2nA) .

Thus A E 6'. Similarly by property (f), and the hypothesis (S2n ) _/L2(Q,,) < oo, if Ak E f and Ak 4, A, then A E (,-- . Therefore { is closed underboth increasing and decreasing limits; hence D : by Theorem 2.1 .2 of themain text. This implies for any A E

ii1(A) = lira T /L1 (2nA) = lim T /2(QnA) _ U2 (A)n

n

by property (e) once again . Thus ,u, and lµ2 agree on .It follows from Theorem 3 that under the a-finite assumption there, the

outer measure µ* in Theorem 2 restricted to the minimal Borel fieldcontaining -% is the unique extension of A from 3 to 3 . What about themore extensive extension to ~T*? We are going to prove that it is also uniquewhen a further property is imposed on the extension . We - begin by definingtwo classes of special sets in . .

DEFINITION 6. Given the field of sets in Q, let <'i5 be the collectionof all sets of the form n,n° I U,°°_ 1 Bn,n where each B,,,, E A, and %sa be thecollection of all sets of the form Un,° I nnc) I B,nn where each Bn,n E •'iO .

Page 396: A Course in Probability Theory KL Chung

382 1 SUPPLEMENT : MEASURE AND INTEGRAL

Both these collections belong to ~T because the Borel field is closed undercountable union and intersection, and these operations may be iterated, heretwice only, for each collection . If B E .~b , then B belongs to both %os andz~bba because we can take B,,,,, = B. Finally, A E Avb if and only if A` E As,because

/nU Bmn = Un BmnIn n

m n

Theorem 4. Let A E T * . There exists B E &b such that

A CB ; It* (A) = A* (B).

If u is a-finite on ~, then there exists C E Aba such that

C C A ; ,u*(C) = A*(A) .

PROOF . For each m, there exists {Bmn } in ~T such that

A C U Bmnn

n

PutB,n = U Bmn ; B = nBm

n

m

then A C B and B E A~b . We have

h* (B) < p * (Bm) <

*(Bmn) < l-t * (A)+ 1M

A* (Bmn) :!~ A* (A) + 1 .m

n

Letting m T oo we see that It*(B) < p*(A) ; hence ,a*(B) =,cc*(A) . The firstassertion of the theorem is proved .

To prove the second assertion, let 2, be as in Definition 5 . Applying thefirst assertion to S2„AC , we have Bn E ~o,b such that

Q,,A` C B,, ; µ * (Q A`) =,u * (B„) .

Hence we have

Q,A` C QnBn ; h * ( Q A c ) = li * (QnBn) .

Taking complements with respect to SZ n , we have since µ * (Q,,) < oo :

Q,A D SZnB; ;

W (QnA) - /t * (Qn) - l-t * (Q Ac ) = lt * (Qn) - li* (QnB,,) = A* (QnBC) .

Page 397: A Course in Probability Theory KL Chung

2 CHARACTERIZATION OF EXTENSIONS 1 383

Since St n E ~ and Bn E aQ , it is easy to verify that S2„Bn E .7u s, by thedistributive law for the intersection with a union . Put

C = U Q n B'nnn

It is trivial that C E %SQ and

A = U S21A D C.n

Consequently, we have

µ* (A) > µ*(C) > Jiminf/c * (S2nBn)n

= lim inf µ* (Q,,A) -,a* (A),n

the last equation owing to property (e) of the measure A* . Thus A* (A) =A* (C), and the assertion is proved .

The measure A* on 3T* is constructed from the measure A on the field. The restriction of [c* to the minimal Borel field ~ containing ~?'b7- will

henceforth be denoted by A instead of µ* .In a general measure space (Q, v), let us denote by ..4"(fi', v) the class

of all sets A in ~ with v(A) = 0. They are called the null sets when -,6' andv are understood, or v-null sets when ry is understood . Beware that if A C Band B is a null set, it does not follow that A is a null set because A may notbe in ! This remark introduces the following definition .

DEFINITION 7 . The measure space (Q, ~', v) is called complete iff anysubset of a null set is a null set .

Theorem 5. The following three collections of subsets of Q are idential :

(i) A C S2 and the outer measure /.c * (A) = 0;

(ii) A E ~T * and µ*(A) = 0 ;

(iii) A C B where B E :% and µ(B) = 0 .

It is the collection ( "( *, W) .

PROOF . If µ* (A) = 0, we will prove A E :?X-7* by verifying the criterion(3) . For any Z C Q, we have by properties (a) and (b) of µ* :

0 < µ*(ZA) < µ* (A) = 0 ; u* (ZA`) < /.t* (Z) ;

Page 398: A Course in Probability Theory KL Chung

384 1 SUPPLEMENT : MEASURE AND INTEGRAL

and consequently by property (c) :

g*(Z) = g* (ZA U ZA`) < g*(ZA) + g*(ZA`) < g* (Z) .

Thus (3) is satisfied and we have proved that (i) and (ii) are equivalent .

Next, let A E T * and g* (A) = 0 . Then we have by the first assertion inTheorem 4 that there exists B E 3;7 such that A C B and g* (A) = g (B). ThusA satisfies (iii) . Conversely, if A satisfies (iii), then by property (b) of outermeasure : g* (A) < g* (B) = g(B) = 0, and so (i) is true .

As consequence, any subset of a (ZT*, g*)-null set is a (*, g*)-null set .This is the first assertion in the next theorem .

Theorem 6. The measure space (Q, *, g*) is complete . Let (Q, 6-7 , v) be acomplete measure space; §' D A and v = g on A. If g is a-finite on A then

;J' D3* and v=g* on :'* .

PROOF . Let A E *, then by Theorem 4 there exists B E O and C Esuch that

(8)

C C A C B ; g(C) = g*(A) = A(B) .

Since v =,u on zoo , we have by Theorem 3, v = g on 3T . Hence by (8) wehave v(B - C) = 0. Since A - C C B - C and B - C E ;6', and (Q, ;6', v) iscomplete, we have A - C E -6- and so A = C U (A - C) E .

Moreover, since C, A, and B belong to , it follows from (8) that

A(C) = v(C) < v(A) < v(B) = g(B)

and consequently by (8) again v(A) = g(A). The theorem is proved .

To summarize the gist of Theorems 4 and 6, if the measure g on the field~To is a-finite on A, then (3;7 , g) is its unique extension to ~ , and ( *, g* )is its minimal complete extension . Here one is tempted to change the notationgtogoonJ%!

We will complete the picture by showing how to obtain (J~ *, g*) from(~?T , g), reversing the order of previous construction . Given the measure space(Q, ZT, g), let us denote by the collection of subsets of 2 as follows: A E Ciff there exists B E I _(:T, g) such that A C B . Clearly has the "hereditary"property : if A belongs to C, then all subsets of A belong to ~ ; is also closedunder countable union. Next, we define the collection

(9)

3 ={ACQIA =B-C where BE`,CEC} .

where the symbol "-" denotes strict difference of sets, namely B - C = BC`where C C B . Finally we define a function µ on :-F as follows, for the A

Page 399: A Course in Probability Theory KL Chung

2 CHARACTERIZATION OF EXTENSIONS 1 385

shown in (9) :

(10)

A(A) = µ(B) .

We will legitimize this definition and with the same stroke prove the mono-tonicity of µ. Suppose then

(11) B1 - C1CB2 - C2, BiE9,CiE ,i=1,2 .

Let C 1 C D E J '(Z,,7 , µ) . Then B 1 C B2 U D and so µ(B1 ) < µ(B2 U D) =µ(B2)- When the C in (11) is "=", we can interchange B 1 and B2 to concludethat µ(B1) = µ(B2), so that the definition (10) is legitimate .

Theorem 7. ~F is a Borel field and µ is a measure on 7 .

PROOF . Let An E 3T, n E N ; so that An = BJ1 Cn as in (9) . We have then

00

( '

'n

n Bn n U Cn .n=1

n=1

n=1

Since the class -C is closed under countable union, this shows that ~ is closedunder countable intersection . Next let C C D, D E At(3~7 , µ) ; then

A`=B`UC=B`UBC=B`U(B(D-(D-C)))

=(B`UBD)-B(D-C).

Since B(D - C) C D, we have B(D - C) E f; hence the above shows thatis also closed under complementation and therefore is a Borel field. Clearly~T D ~ because we may take C = 0 in (9) .

To prove J is countably additive on 7, let JA„) be disjoint sets in7. Then

All = Bn -C11 , Bn E ~J , Cn E .

There exists D in

µ) containing U1 Cn . Then {Bn - D) aredisjoint and

00

00

cc

U (Bn - D) C U An C U Bn .n=1

n=1

n=1

All these sets belong to ?T and so by the monotonicity of µ:

A < A (uAn < µ (uB) .nn

n

Page 400: A Course in Probability Theory KL Chung

386 1 SUPPLEMENT: MEASURE AND INTEGRAL

Since µ = µ on ~T, the first and third members above are equal to, respec-tively :

A (u(B n - D) = l-, (Bn -

l-t(Bn) µ(A,,);n n n n

n

Therefore we have

Since µ(q) = µ(O - q) = /.t(O) = 0, µ is a measure on 9- .

Corollary. In truth: 2 = * and µ = µ* .

PROOF . For any A E *, by the first part of Theorem 4, there exists B E

such thatA=B-(B-A), µ*(B-A)=0 .

C

Hence by Theorem 5, B - A E and so A E ~T by (9) . Thus 97* c . Since* and { E * by Theorem 6, we have 9 7 C 97* by (9) . Hence =

*. It follows from the above that µ*(A) = µ(B) = µ(A) . Hence µ* = µ on

The question arises naturally whether we can extend µ from 5 todirectly without the intervention of * . This is indeed possible by a method oftransfinite induction originally conceived by Borel ; see an article by LeBlancand G.E. Fox: "On the extension of measure by the method of Borel", Cana-dian Journal of Mathematics, 1956, pp. 516-523 . It is technically lengthierthan the method of outer measure expounded here .

Although the case of a countable space 2 can be treated in an obviousway, it is instructive to apply the general theory to see what happens .

Let Q = N U w ; A is the minimal field (not Borel field) containing eachsingleton n in N, but not w. Let Nf denote the collection of all finite subsetsof N; then `~o consists of Nf and the complements of members of Nf (withrespect to Q), the latter all containing w. Let 0 < µ(n) < oc for all n E N ; ameasure µ is defined on alb as follows :

µ(A) = / Jµ(n) if A E Nf ; µ(A`) = µ(S2) - µ(A) .

nEA

We must still define µ(Q). Observe that by the properties of a measure, wehave µ(S2) > >fEN lc(n) = s, say .

n

µ ~UAn =n

It (u)Bn ) :!~

1i(Bn)

lT(An)n

n710n)-

Page 401: A Course in Probability Theory KL Chung

Now we use Definition 3 to determine the outer measure µ* . It is easyto see that for any A C N, we have

A* (A) =

a (n)EA

In particular A* (N) = s . Next we have

µ* (w) = inf /-t (A`) = µ(0) - sup µ(A) = µ(2) - sAENf

AENf

provided s < oc ; otherwise the inf above is oo . Thus we have

µ* (w) _ A(Q) - s if µ(2) < oc ; µ* (a)) = oc if µ(2) = cc .

It follows that for any A C S2 :

µ* (A) =

µ* (n)nEA

where µ* (n) = g (n) for n E N. Thus µ* is a measure on,/, namely, ~?T* = J .But it is obvious that ~T- = J since 3 contains N as countable union and

so contains co as complement . Hence ~T = ~ * = J .If µ(S2) oc and s = oc, the extension µ* of a to J is not unique,

because we can define µ((o) to be any positive number and get an exten-sion . Thus µ is not 6-finite on A, by Theorem 3. But we can verify thisdirectly when µ(S2) = oc, whether s = oc or s < oc . Thus in the latter case,A(c)) = cc is also the unique extension of µ from A to J. This means thatthe condition of a-finiteness on A is only a sufficient and not a necessarycondition for the unique extension .

As a ramification of the example above, let Q = N U co, U cot , with twoextra points adjoined to N, but keep J,O as before. Then (_ *) is strictlysmaller than J because neither w l nor 0)2 belongs to it . From Definition 3 weobtain

µ*(c) 1 U cot) = Lc*(w1) = A*(w2)

Thus µ* is not even two-by-two additive on .' unless the three quantitiesabove are zero. The two points coi and cot form an inseparable couple . Weleave it to the curious reader to wonder about other possibilities .

3 Measures in R

Let R = (- oc, +oo) be the set of real members, alias the real line, with itsEuclidean topology. For -oo < a < b < +oc,

(12)

(a,b]={xER:a<x<b)

3 MEASURES IN R 1 387

Page 402: A Course in Probability Theory KL Chung

388 1 SUPPLEMENT : MEASURE AND INTEGRAL

is an interval of a particular shape, namely open at left end and closed at rightend. For b = +oc, (a, +oo] = (a, +oo) because +oc is not in R. By choiceof the particular shape, the complement of such an interval is the union oftwo intervals of the same shape :

(a, b]` = (-oc, a] U (b, oo] .

When a = b, of course (a, a] = q5 is the empty set. A finite or countablyinfinite number of such intervals may merge end to end into a single one asillustrated below :

°°

1

1(13)

(0,21=(0,11U(1,21 ; (0,11=Un+1' nn=1

Apart from this possibility, the representation of (a, b] is unique .The minimal Borel field containing all (a, b] will be denoted by 3 and

called the Borel field of R. Since a bounded open interval is the countableunion of intervals like (a, b], and any open set in R is the countable unionof (disjount) bounded open intervals, the Borel field A contains all opensets; hence by complementation it contains all closed sets, in particular allcompact sets . Starting from one of these collections, forming countable unionand countable intersection successively, a countable number of times, one canbuild up A3 through a transfinite induction .

Now suppose a measure m has been defined on J6, subject to the soleassumption that its value for a finite (alias bounded) interval be finite, namelyif -oc < a < b < + oo, then

(14)

0 < m((a, b]) < oc .

We associate a point function F on R with the set function m on

as follows :

(15) F(0) = 0; F(x) = m((0, x]) for x > 0; F(x) = -m((x, 0]) for x < 0 .

This function may be called the "generalized distribution" for m. We see thatF is finite everywhere owing to (14), and

(16)

m((a, b1) = F(b) - F(a).

F is increasing (viz. nondecreasing) in R and so the limits

F(+oc) = lim F(x) < +oo, F(-oo) = lim F(x) > -oox- +oo

x-+ -oo

both exist. We shall write oc for +oc sometimes . Next, F has unilateral limitseverywhere, and is right-continuous :

F(x-) < F(x) = F(x+) .

Page 403: A Course in Probability Theory KL Chung

The right-continuity follows from the monotone limit properties (e) and (f )of m and the primary assumption (14) . The measure of a single point x isgiven by

m(x) = F(x) - F(x-).

We shall denote a point and the set consisting of it (singleton) by the samesymbol .

The simplest example of F is given by F(x) __ x . In this case F is continu-ous everywhere and (14) becomes

m((a, b]) = b - a .

We can replace (a, b] above by (a, b), [a, b) or [a, b] because m(x) = 0 foreach x . This measure is the length of the line-segment from a to b. It wasin this classic case that the following extension was first conceived by SmileBorel (1871-1956) .

We shall follow the methods in §§1-2, due to H . Lebesgue andC. Caratheodory. Given F as specified above, we are going to construct ameasure m on _1Z and a larger Borel field _1~6* that fulfills the prescription (16) .

The first step is to determine the minimal field Ao containing all (a, b] .Since a field is closed under finite union, it must contain all sets of the form

n

(17)

B- U I j, Ij = (a j, bj ], 1< j< n ; n E N .j=1

Without loss of generality, we may suppose the intervals I j to be disjoint, bymerging intersections as illustiated by

(1, 31 U (2, 41 = (1, 41 .

Then it is clear that the complement BC is of the same form. The union oftwo sets like B is also of the same form. Thus the collection of all sets likeB already forms a field and so it must be Ao . Of course it contains (includes)the empty set ¢ = (a, a] and R. However, it does not contain any (a, b) exceptR, [a, b), [a, b], or any single point!

Next we define a measure m on Ao satisfying (16) . Since the condition(d) in Definition 2 requires it to be finitely additive, there is only one way :for the generic B in (17) with disjoint Ij we must put

n n

(18)

m(B) =

ii) =

(F(bj) - F(aj)) .j=1 j=1

3 MEASURES IN R 1 389

Having so defined m on Ao, we must now prove that it satisfies the condi-tion (d) in toto, in order to proclaim it to be a measure on o . Namely, if

Page 404: A Course in Probability Theory KL Chung

390 1 SUPPLEMENT : MEASURE AND INTEGRAL

{Bk , 1 < k < I < oc} is a finite or countable sequence of disjoint sets in Ao,we must prove

1

(19)

m U Bk =k=1

whenever I is finite, and moreover when I = oc and the union Uk_ 1 Bk happensto be in : 3o .

The case for a finite 1 is really clear. If each Bk is represented as in(17), then the disjoint union of a finite number of them is represented in asimilar manner by pooling together all the disjoint I1's from the Bk's. Then theequation (19) just means that a finite double array of numbers can be summedin two orders .

If that is so easy, what is the difficulty when l = oo? It turns out, asBore] saw clearly, that the crux of the matter lies in the following fabulous"banality ."

Borel's lemma. If -oo < a < b < +oc and

00(20)

(a, b] = U(aj, b1],j=1

where aj < bj for each j, and the intervals (a j , b j ] are disjoint, then we have

(21)

F(b) - F(a)00

j=1

k=m(Bk) .

(F(bj) - F(aj )) .

PROOF . We will first give a transfinite argument that requires knowledgeof ordinal numbers . But it is so intuitively clear that it can be appreciatedwithout that prerequisite. Looking at (20) we see there is a unique indexj such that hj = b ; name that index k and rename ak as c1 . By removing(ak , bk] = (c 1 , b] from both sides of (20) we obtain

00(22)

(a, c1] = U(aj , bj ]J-1j#k

This small but giant step shortens the original (a, b] to (a, c1] . Obviously wecan repeat the process and shorten it to (a, c2] where a < c2 < c1 = b, and soby mathematical induction we obtain a sequence a < c„ < . . . < c2 < c1 = b .

Needless to say, if for some n we have c„ = a, then we have accom-plished our purpose, but this cannot happen under our specific assumptionsbecause we have not used up all the infinite number of intervals in the union .

Page 405: A Course in Probability Theory KL Chung

Therefore the process must go on ad infinitum . Suppose then c„ > c„+1 forall 17 E N, so that cu, = lim„ ~ c„ exists, then cu, > a. If cu, = a (which caneasily happen, see (13)), then we are done and (21) follows, although the termsin the series have been gathered step by step in a (possibly) different order .What if cu, > a? In this case there is a unique j such that bj = cw ; renamethe corresponding aj as c,,1 . We have now

(23)

(a, c.] =

where the (a', b'J's are the leftovers from the original collection in (20) afteran infinite number of them have been removed in the process . The interval(cc 1, cu,] is contained in the reduced new collection and we can begin a newprocess by first removing it from both sides of (23), then the next, to bedenoted by [cu,2, c,,,], and so on. If for some n we have cu,,, = a, then (21)is proved because at each step a term in the sum is gathered . Otherwise thereexists the limit lim ~ cu,,, = cww > a . If c,,, = a, then (21) follows in the limit .Otherwise c,,,,,, must be equal to some bj (why?), and the induction goes on .Let us spare ourselves of the cumbersome notation for the successive well-ordered ordinal numbers . But will this process stop after a countable numberof steps, namely, does there exist an ordinal number a of countable cardinalitysuch that ca = a? The answer is "yes" because there are only countably manyintervals in the union (20) .

The preceding proof (which may be made logically formal) reveals thepossibly complex structure hidden in the "order-blind" union in (20) . Borel inhis These (1894) adopted a similar argument to prove a more general resultthat became known as his Covering Theorem (see below) . A proof of the lattercan be found in any text on real analysis, without the use of ordinal numbers .We will use the covering theorem to give another proof of Borel's lemma, forthe sake of comparison (and learning) .

This second proof establishes the equation (21) by two inequalities inopposite direction. i he first inequality is easy by considering the first n termsin the disjoint union (20) :

n

F(b) - F(a)

(F(bj) - F(aj)) .j=1

As n goes to infinity we obtain (21) with the "=" replaced by ">".The other half is more subtle : the reader should pause and think why?

The previous argument with ordinal numbers tells the story .

00

U (0j, bj],j=1

3 MEASURES IN R 1 391

Page 406: A Course in Probability Theory KL Chung

392 1 SUPPLEMENT : MEASURE AND INTEGRAL

Borel's covering theorem . Let [a, b] be a compact interval, and (aj , bj ),j E N, be bounded open intervals, which may intersect arbitrarily, such that

[a, b] C U (aj, bj) .j=1

(24)

Then there exists a finite integer 1 such that when 1 is substituted for oo inthe above, the inclusion remains valid .

In other words, a finite subset of the original infinite set of open inter-vals suffices to do the covering . This theorem is also called the Heine-BorelTheorem; see Hardy [1] (in the general Bibliography) for two proofs by Besi-covitch .

To apply (24) to (20), we must alter the shape of the intervals (aj , bj] tofit the picture in (24) .

Let -oo < a < b < oc ; and E > 0. Choose a' in (a, b), and for each jchoose bj > bj such that

(25)

F(a') - F(a) < 2 ; F(b') - F(bj )< 2j+ 1 '

These choices are possible because F is right continuous ; and now we have

00[a', b] C U (a j , b')

j=1

as required in (24) . Hence by Borel's theorem, there exists a finite l such that

t

(26)

[a', b] C U (aj , b') .j=1

From this it follows "easily" that

(27)

F(b) - F(a')

(F(bj ) - F(aj )) .j=1

We will spell out the proof by induction on 1 . When l = 1 it is obvious .Suppose the assertion has been proved for l - 1, l > 2 . From (26) as written,there is k, 1 < k < 1, such that ak < a' < bk and so

(28)

F(bk) - F(d) < F'(bk) - F(ak) .

Page 407: A Course in Probability Theory KL Chung

If we intersect both sides of (26) with the complement of (ak , bk ), we obtain

i[bk , b] C U(aj, bj) .

j=1j#k

Here the number of intervals on the right side is 1 - 1 ; hence by the inductionhypothesis we have

F(b) - F(bk ) -

(F(bj ) - F(aj))j=1j#k

Adding this to (28) we obtain (27), and the induction is complete . It followsfrom (27) and (25) that

F(b) - F(a)

(F(bj ) - F(aj )) + E.

j=1

Beware that the 1 above depends on E . However, if we change l to 00 (back toinfinity!) then the infinite series of course does not depend on E . Therefore wecan let c -* 0 to obtain (21) when the "=" there is changed to " _<", namelythe other half of Borel's lemma, for finite a and b.

It remains to treat the case a = -oo and/or b = +oo . Let00

(-oo, b] C U (aj, b j] .j=1

Then for any a in (-oc, b), (21) holds with "=" replaced by "<<". Lettinga -± -oc we obtain the desired result . The case b = +oo is similar. Q.E.D.

In the following, all I with subscripts denote intervals of the shape (a, b] ;denotes union of disjoint sets . Let B E 730 ; B j E ?,/3o, j E N. Thus

Suppose

so that

(29)

!1

B = >' I; ;

jk .1=1

00

j=1

11

i=1 j=1 k=1

11j

k=1

jk •

3 MEASURES IN R 1 393

Page 408: A Course in Probability Theory KL Chung

394 1 SUPPLEMENT : MEASURE AND INTEGRAL

We will prove

0C

I

(30)

m(Ii) E

m(Ijk) .r=1

j=1 k=1

For n = 1, (29) is of the form (20) since a countable set of sets can beordered as a sequence . Hence (30) follows by Borel's lemma . In general,simple geometry shows that each I i in (29) is the union of a subcollection ofthe 1jk's . This is easier to see if we order the Ii 's in algebraic order and, aftermerging where possible, separate them at nonzero distances. Therefore (30)follows by adding n equations, each of which results from Borel's lemma .

This completes the proof of the countable additivity of m on ?A0, namely(19) is true as stipulated there for 1 = oo as well as l < oc .

The general method developed in § 1 can now be applied to (R, J3o, m) .Substituting J3o for zTb, m for t in Definition 3, we obtain the outer measurem* . It is remarkable that the countable additivity of m on A0, for which twopainstaking proofs were given above, is used exactly in one place, at the begin-ning of Theorem 1, to prove that m* = m on .730 . Next, we define the Borelfield 73* as in Definition 4 . By Theorem 6, (R, ?Z*, m*) is a complete measurespace. By Definition 5, m is 6-finite on Ao because (-n, n] f (-oo, oo) asn T oc and m((-n, n]) is finite by our primary assumption (14) . Hence byTheorem 3, the restriction of m* to J3 is the unique extension of m from J'oto J,3 .

In the most important case where F(x) _- x, the measure m on J30 is thelength : m((a, b]) = b - a . It was Borel who, around the turn of the twentiethcentury, first conceived of the notion of a countably additive "length" on anextensive class of sets, now named after him : the Borel field J3. A member ofthis class is called a Borel set . The larger Borel field J3* was first constructedby Lebesgue from an outer and an inner measure (see pp . 28-29 of maintext). The latter was later bypassed by Caratheodory, whose method is adoptedhere . A member of A)* is usually called Lebesgue-measurable . The intimaterelationship between J3 and J3* is best seen from Theorem 7 .

The generalization to a generalized distribution function F is sometimesreferred to as Borel-Lebesgue-Stieltjes . See §2.2 of the main text for thespecial case of a probability distribution .

The generalization to a Euclidean space of higher dimension presents nonew difficulty and is encumbered with tedious geometrical "baggage" .

It can be proved that the cardinal number of all Borel sets is that of thereal numbers (viz. all points in R), commonly denoted by c (the continuum).On the other hand, if Z is a Borel set of cardinal c with m(Z) = 0, suchas the Cantor ternary set (p . 13 of main text), then by the remark precedingTheorem 6, all subsets of Z are Lebesgue-measurable and hence their totality

n

Page 409: A Course in Probability Theory KL Chung

4 INTEGRAL 1 395

has cardinal 2C which is strictly greater than c (see e .g . [3]) . It follows thatthere are incomparably more Lebesgue-measurable sets than Borel sets .

It is however not easy to exhibit a set in A* but not in A ; see ExerciseNo. 15 on p. 15 of the main text for a clue, but that example uses a non-Lebesgue-measurable set to begin with .

Are there non-Lebesgue-measurable sets? Using the Axiom of Choice,we can "define" such a set rather easily ; see example [3] or [5]. However, PaulCohen has proved that the axiom is independent of the other logical axiomsknown as Zermelo-Fraenkel system commonly adopted in mathematics ; andRobert Solovay has proved that in a certain model without the axiom ofchoice, all sets of real numbers are Lebesgue-measurable . In the notation ofDefinition 1 in § 1 in this case, ?Z* = J and the outer measure m* is a measureon J .

N.B . Although no explicit invocation is made of the axiom of choice inthe main text of this book, a weaker version of it under the prefix "countable"must have been casually employed on the q .t. Without the latter, allegedly it isimpossible to show that the union of a countable collection of countable setsis countable . This kind of logical finesse is beyond the scope of this book .

4 Integral

The measure space (Q, ~ , µ) is fixed. A function f with domain Q andrange in R* = [-oo, +oo] is called ~-measurable iff for each real number cwe have

{f <C}={wEQ :f(w)<c}EZT.

We write f E ' in this case . It follows that for each set A E A, namely aBorel set, we have

{f EA} c-~ ;

and both {f = -boo} and { f = - oo} also belong to ZT. Properties of measur-able functions are given in Chapter 3, although the measure there is a proba-bility measure .

A function f E : with range a countable set in [0, oc] will be calleda basic function . Let {aj } be its range (which may include "00"), and Aj

{ f = a j } . Then the Ad 's are disjoint sets with union Q and

(31)

f = L aj lAJ

.i

where the sum is over a countable set of j .We proceed to define an integral for functions in ~T, in three stages,

beginning with basic functions .

Page 410: A Course in Probability Theory KL Chung

396

1 SUPPLEMENT

: MEASURE AND INTEGRAL

DEFINITION 8(a). For the basic function f in (31), its integral is definedto be

(32)

E(f) =

aju(Aj)

and is also denoted by

f fdµ = f f(w)µ(dw).

s~

If a term in (32) is O.oo or oo.0, it is taken to be 0. In particular if f - 0, thenE(O) = 0 even if µ(S2) = oo . If A E ~ and µ(A) = 0, then the basic function

oc .lA + O .lAc

has integral equal tooo .0 + 0.µ(A`) = 0 .

We list some of the properties of the integral .(i) Let {Bj} be a countable set of disjoint sets in ?~`, with union S2 and {bj)

arbitrary positive numbers or oc, not necessarily distinct . Then the function

(33)

bj 1B;

is basic, and its integral is equal to

bj µ (Bj)i

PROOF. Collect all equal bb's into aj and the corresponding Bb's into Ajas in (31) . The result follows from the theorem on double series of positiveterms that it may be summed in any order to yield a unique sum, possibly +oc .

(ii) If f and g are basic and f < g, then

E(f) < E(g) .

In particular if E(f) = +oo, then E(g) = +oo .

PROOF. Let f be as in (31) and g as in (33) . The doubly indexed set{Ajfl Bk} are disjoint and their union is Q . We have using (i) :

E(f) =

aj µ(Ajfl Bk) ;

E(g) =

bkµ(Ai fl Bk) .

i

Page 411: A Course in Probability Theory KL Chung

The order of summation in the second double series may be reversed, and theresult follows by the countable additivity of lt .

(iii) If f and g are basic functions, a and b positive numbers, then a f +bg is basic and

E(a f + bg) = aE(f) + bE(g) .

PROOF . It is trivial that a f is basic and

E(af) = aE(f) .

Hence it is sufficient to prove the result for a = b = 1 . Using the doubledecomposition in (ii), we have

E(f + g) =

( J + bk)A(Aj n Bk).

J

Splitting the double series in two and then summing in two orders, we obtainthe result .

It is good time to state a general result that contains the double seriestheorem used above and some other version of it that will be used below .

Double Limit Lemma . Let {CJk ; j E N, k E N} be a doubly indexed arrayof real numbers with the following properties :

(a) for each fixed j, the sequence {CJk ; k E N) is increasing in k ;(b) for each fixed k, the sequence {CJk ; j E N} is increasing in j .

Then we have

lim T lira f CJk = lim T lim T CJk < + oo .j

k

k

j

The proof is surprisingly simple. Both repeated limits exist by funda-mental analysis . Suppose first that one of these is the finite number C . Thenfor any E > 0, there exist jo and ko such that CJok0 > C - E . This impliesthat the other limit > C - e. Since c is arbitrary and the two indices areinterchangeable, the two limits must be equal . Next if the C above is +oo,then changing C - E into c-1 finishes the same argument .

As an easy exercise, the reader should derive the cited theorem on doubleseries from the Lemma .

Let A E '~ and f be a basic function . Then the product lA f is a basicfunction and its integral will be denoted by

(34)

E(A; f) = f f (w)lt(dw) = f f dµ .A

A

4 INTEGRAL 1 397

Page 412: A Course in Probability Theory KL Chung

398 1 SUPPLEMENT : MEASURE AND INTEGRAL

(iv) Let An E h , An C An+1 for all n and A = U„A n . Then we have

(35)

limE(A,, ; f) = E(A ; f ) .n

PROOF. Denote f by (33), so that IAf = >j bJ l ABj . By (i),

E (A ; f) = b j µ (ABA)

with a similar equation where A is replaced by A,, . Since p.(A„B j ) T u.(ABj)as n T oo, and as m f oo, (35) follows by the double limittheorem.

Consider now an increasing sequence {fn } of basic functions, namely,fn< f n+1 for all n . Then f = limn T f n exists and f E , but of course fneed not be basic ; and its integral has yet to be defined . By property (ii), thenumerical sequence E(fn ) is increasing and so limn f E(fn ) exists, possiblyequal to +oo . It is tempting to define E(f) to be that limit, but we need thefollowing result to legitimize the idea .

Theorem 8 . Let {fn} and 19n) be two increasing sequences of basic func-tions such that

(36)

lira T fn =lira T9,n

n

(everywhere in Q). Then we have

(37)

lim T E(f n ) = Jim T E(gn ) .n

n

PROOF. Denote the common limit function in (36) by f and put

A = {w E Q : f (w) > 0),

then A E ZT. Since 0 < gn < f, we have lAcgn = 0 identically; hence by prop-erty (iii) :

(38)

E(g,) = E(A ; g,,) -}- E(A` ; gn) = E(A ; gn ) .

Fix an n and put for each k E N:

Ak = CO E Q : f k (w) > n- 1

gn (w) .n

Since fk < f k+1, we have Ak C Ak+1 for all k . We are going to prove that

(39)00U Ak =A .k=1

Page 413: A Course in Probability Theory KL Chung

If W E Ak , then f (w) > fk(w) > [(n - In ]g, (w) > 0; hence w E A . On theother hand, if w E A then

l m T fk (w) = f (w) -> gn (w)

and f (w) > 0; hence there exists an index k such that

f k (w) > n- 1

gn (w)n

and so w E Ak . Thus (39) is proved. By property (ii), since

n-1fk>- IA,fk >- IA,

n gn

we haven-1

E(fk) > E(Ak ;fk) ?

E(Ak;gn) .n

Letting k T oo, we obtain by property (iv) :

n-1lkm T E(fk) > n

lkmE(Ak ;gn)

n-1

n-1E (A ; gn) =n

n

where the last equation is due to (38) . Now let n T oo to obtain

hmf E(fk)>lnmTE(gn) .

Since { f , } and {g,) are interchangeable, (37) is proved .

Corollary. Let f , and f be basic functions such that f „ f f , then E (fn ) TE(f ).

PROOF. Take gn = f for all n in the theorem .The class of positive :T-measurable functions will be denoted by i+ .

Such a function can be approximated in various ways by basic functions . It isnice to do so by an increasing sequence, and of course we should approximateall functions in J~+ in the same way . We choose a particular mode as follows .

Define a function on [0, oo] by the (uncommon) symbol ) ] :

)0] = 0 ; )oo] = oo ;

)x]=n-l for xE(n-1,n},nEN .

E (g,l )

4 INTEGRAL 1 399

Page 414: A Course in Probability Theory KL Chung

400 1 SUPPLEMENT: MEASURE AND INTEGRAL

Thus )Jr] = 3, )4] = 3 . Next we define for any f E :'~+ the approximatingsequence { f (""), III E N, by

(40)

f(m)(w) _ )2mf (w)]2m

Each f ("') is a basic function with range in the set of dyadic (binary) numbers:{k/2"'} where k is a nonnegative integer or oo. We have f (m) < f (m+1) forall m, by the magic property of bisection. Finally f(in ) T f owing to theleft-continuity of the function x -*)x] .

DEFINITION 8(b) . For f E `+, its integral is defined to be

(41)

E(f) = lim f E(f 'm )) .m

When f is basic, Definition 8(b) is consistent with 8(a), by Corollary toTheorem 8 . The extension of property (ii) of integrals to 3+ is trivial, becausef < g implies f (m) < g(m) . On the contrary, (f + g) (m ) is not f (m) + g (m ) , butsince f ('n) + g(m) T (f + g), it follows from Theorem 8 that

lim T E(f (m) + g(m) ) = lim f E((f + g) (m) )in

m

that yields property (iii) for ~_+, together with E(a f (m) ) T aE(f), for a > 0.

Property (iv) for ~+ will be given in an equivalent form as follows .(iv) For f E ~+, the function of sets defined on .T by

A--* E(A ;f)

is a measure .

PROOF . We need only prove that if A = Un°_ 1An where the An 's aredisjoint sets in .-T, then

00E(A;f)=

E(An ; .f);n=1

For a basic f, this follows from properties (iii) and (iv). The extension to ~L'~+can be done by the double limit theorem and is left as an exercise .

There are three fundamental theorems relating the convergence of func-tions with the convergence of their integrals . We begin with Beppo Levi'stheorem on Monotone Convergence (1906), which is the extension of Corol-lary to Theorem 8 to :'i+ .

Page 415: A Course in Probability Theory KL Chung

Then we have

(44)

Put for n E N:PROOF .

(45)

limE(fn ) = 0-n

gn = Sup fknk>n

lim T E(gi - gn) = E(gl ) .n

4 INTEGRAL 1 401

Theorem 9. Let If, } be an increasing sequence of functions in ~T- + withlimit f : f n T f . Then we have

lim T E(f n) = E(f) < + oo.n

PROOF . We have f E ~+ ; hence by Definition 8(b), (41) holds . For eachf,, we have, using analogous notation :

(42)

lim f E(fnm) ) = E(fn ) .M

Since fn T f , the numbers )2m f n (a))] T)2' f (w)] as n t oo, owing to theleft continuity of x -p )x] . Hence by Corollary to Theorem 8,

(43)

urn f E(fnm) ) = E(f (m) ) .

It follows that

lmm T 1nm T E(f nm) ) = lmm f E(f (m)) = E(f ) .

On the other hand, it follows from (42) that

lim f lim f E(f nm)) = lim t E(fn ) .n

m

n

Therefore the theorem is proved by the double limit lemma .

From Theorem 9 we derive Lebesgue's theorem in its pristine positiveguise .

Theorem 10. Let fn E F+ , n E N. Suppose

(a) limn f n = 0 ;(b) E(SUpn f n ) < 00.

Then gn E ~T+ , and as n t oc, gn ,. lim sup„ fn = 0 by (a) ; and gi = sup,, f„

so that E(gl) < oc by (b) .

Now consider the sequence {g l - gn }, n E N . This is increasing with limitg i . Hence by Theorem 9, we have

Page 416: A Course in Probability Theory KL Chung

402 1 SUPPLEMENT: MEASURE AND INTEGRAL

By property (iii) for F+ ,

E(g1 - gn) + E(9n) = E(gi ) .

Substituting into the preceding relation and cancelling the finite E($1), weobtain E(g n ) J, 0 . Since 0 < f n < gn , so that 0 < E(fn) < E(gn ) by property(ii) for F+, (44) follows .

The next result is known as Fatou's lemma, of the same vintage 1906as Beppo Levi's . It has the virtue of "no assumptions" with the consequentone-sided conclusion, which is however often useful .

Theorem 11 . Let If,) be an arbitrary sequence of functions in f + . Thenwe have

(46)

E(lim inf fn ) < lim infE(fn ) .n

nPROOF . Put for n E N:

gn = inf f k,k>n

thenlim inf f n = lim T gn .

n

n

Hence by Theorem 9,

(47)

E (lim inf fn ) = lim f E (gn ) .n

n

Since g, < fn, we have E(g„) < E(fn ) and

lim infE(9n) < lim infE(f n ) .n

n

The left member above is in truth the right member of (47) ; therefore (46)follows as a milder but neater conclusion .

We have derived Theorem 11 from Theorem 9 . Conversely, it is easy togo the other way . For if f „ f f , then (46) yields E(f) < lim n T E(fn ) . Sincef > f,,, E(f) > lim„ T E(fn ) ; hence there is equality .

We can also derive Theorem 10 directly from Theorem 11 . Using thenotation in (45), we have 0 < gi - f n < gi . Hence by condition (a) and (46),

E(g1) = E(liminf(gi - fn)) < liminf(E(gl) - E(fn))n

n

= E(g1) - lim sup E(f)n

that yields (44).

Page 417: A Course in Probability Theory KL Chung

4 INTEGRAL I 403

The three theorems 9, 10, and 11 are intimately woven .We proceed to the final stage of the integral . For any f E 3~`with range

in [-oo, +ool, put

f+ __ f on {f >- 0}, f- _

f on {f < 0},0

on

If < 0} ;

0

on

If > 0} .

Then f+ E 9+ , f - E~+ , and

f = f+ - f - ; If I = f+ + f-

By Definition 8(b) and property (iii) :

(48)

E(I f I) = E(f +) + E(f ).

DEFINITION 8(c) . For f E 3~7 , its integral is defined to be

(49)

E(f) = E(f +) - E(f ),

provided the right side above is defined, namely not oc - oc . We say f isintegrable, or f E L', iff both E (f +) and E (f -) are finite; in this case E (f )is a finite number . When E(f) exists but f is not integrable, then it must beequal to +oo or -oo, by (49) .

A set A in 0 is called a null set iff A E .' and it (A) = 0 . A mathematicalproposition is said to hold almost everywhere, or a.e . iff there is a null set Asuch that it holds outside A, namely in A` .

A number of important observations are collected below .

Theorem 12 . (i) The function f in ~ is integrable if and only if I f I isintegrable ; we have

(50)

IE(f )I < E(If I) .

(ii) For any f E 3 and any null set A, we have

(51)

E(A;f)= f fdlc=0; E(f)=E(A` ;f)=fC fdlL.JJA

fAC

If f E L1 , then the set {N E S2 : I f (w)I = oo} is a null set .(iv) If f E L', g E T, and IgI < If I a.e ., then g E L' .(v) If f E OT, g E .T, and g = f a.e ., then E(g) exists if and only if E(f )

exists, and then E(g) = E(f ) .(vi) If µ(Q) < oc, then any a .e. bounded _37-measurable function is inte-

grable .

PROOF. (i) is trivial from (48) and (49) ; (ii) follows from

1A If I < lA .oc

Page 418: A Course in Probability Theory KL Chung

404 1 SUPPLEMENT : MEASURE AND INTEGRAL

so that0 < E(lAIf I) < E(lA .oo) = µ(A).00 = 0 .

This implies (51) .To prove (iii), let

A(n) = {If I >_ n} .

Then A (n) E ?~` and

nµ(A(n)) = E(A(n) ; n) < E(A(n) ; IfI) < E(I f 1) .

Hence

(52)

l-t(A(n)) < 1 E(I f 1) .n

Letting n T oc, so that A(n) 4 {I f I = oo} ; since µ(A(1)) < E(I f I) < oo, wehave by property (f) of the measure µ :

,a({If I = oo}) = Jim ~ µ(A(n)) = 0.n

To prove (iv), let IgI < If I on Ac, where µ(A) = 0. Then

191 <_ IA-00 + lAc .If I

and consequently

E(IgI) < p(A) .oc + E(Ac; If I) _< 0.oo + E(I f I) = E(I f 1) .

Hence g E L 1 if f E L' .The proof of (v) is similar to that of (iv) and is left as an exercise . The

assertion (vi) is a special case of (iv) since a constant is integrable whenµ(S2) < oo .

Remark. A case of (52) is known as Chebyshev's inequality ; see p. 48of the main text. Indeed, it can be strengthened as follows :

(53)

limnµ(A(n)) < limE(A(n) ; I f I) = 0.n

n

This follows from property (f) of the measure

A --* E(A ; If I) ;

see property (iv) of the integral for ~7+ .There is also a strengthening of (ii), as follows .If Bk E ~ and µ (Bk) -* 0 as k --* oo, then

limE(Bk ; f) = 0-k

Page 419: A Course in Probability Theory KL Chung

4 INTEGRAL 1 405

To prove this we may suppose, without loss of generality, that f E i+ . Wehave then

E(Bk ; f) = E(Bk n A(n ); f) +- E(Bk n A(n )` ; f )

< E(A(n ) ; f) + E(Bk )n .

Hencelim sup E(Bk ; P < E(A(n ); f )

k

and the result follows by letting n -+ oo and using (53) .It is convenient to define, for any f in ~T, a class of functions denoted

by C(f ), as follows : g E C(f) iff g = f a.e. When (Q, ?T, it) is a completemeasure space, such a g is automatically in 4T . To see this, let B = {g 0 f } .Our definition of "a.e ." means only that B is a subset of a null set A ; in plainEnglish this does not say whether g is equal to f or not anywhere in A - B.However if the measure is complete, then any subset of a null set is also anull set, so that not only the set B but all its subsets are null, hence in ~T, .Hence for any real number c,

{g-<c}={g=f;g-<c}U{g0f ;g :5 c}

belongs to A , and so g E Zh7 .

A member of C(f) may be called a version of f, and may be substitutedfor f wherever a null set "does not count" . This is the point of (iv) and (v) inTheorem 12. Note that when the measure space is complete, the assumption"g E 7,T" there may be omitted. A particularly simple version of f is thefollowing finite version

on

{ { .f I < oo},f

0

on

11f I = 001 ;

where 0 may be replaced by some other number, e .g., by 1 in E(log f) .In functional analysis, it is the class C(f) rather than an individual f

that is a member of L 1 .As examples of the preceding remarks, let us prove properties (ii) and

(iii) for integrable functions .(ii) if f E L1 , g E L', and f < g a.e ., then

E(f) < E(g) .

PROOF . We have, except on a null set :f+_ f- < g+_ g-

but we cannot transpose terms that may be +oo! Now substitute finite versionsof f and g in the above (without changing their notation) and then transpose

Page 420: A Course in Probability Theory KL Chung

406 1 SUPPLEMENT: MEASURE AND INTEGRAL

as follows :f++g- < g+ + f -

Applying properties (ii) and (iii) for +, we obtain

E(f +) + E(g ) < E(g+ ) + E(f ).

By the assumptions of L', all the four quantities above are finite numbers .Transposing back we obtain the desired conclusion .

(iii) if f E L', g E L', then f + g E L 1 , and

E(f + g) = E(f) + E(g) .

Let us leave this as an exercise . If we assume only that both E(f) andE(g) exist and that the right member in the equation above is defined, namelynot (+oe) + (-oc) or (-oo) + (+oc), does E(f + g) then exist and equalto the sum? We leave this as a good exercise for the curious, and return toTheorem 10 in its practical form .

Theorem 10 ° . Let f , E 7; suppose(a) lim„ f, = f a.e . ;(b) there exists ~O E L 1 such that for all n :

If,, I < cp

a.e .

Then we have(c)lim„E(If„- f1)=0 .

PROOF. observe first that

I lim f„ I < sup If n 1 ;n

, 1

Ifn-fI < Ifn1+IfI <2suplfnl,n

provided the left members are defined . Since the union of a countable collec-tion of null sets is a null set, under the hypotheses (a) and (b) there is a null setA such that on Q - A, we have sup„ If,, I < cp hence by Theorem 12 (iv), allIfn I, I f 1, If, - f I are integrable, and therefore we can substitute their finiteversions without affecting their integrals, and moreover lim,, I f n - f I = 0 onS2 - A. (Remember that f,, - f need not be defined before the substitutions!) .By using Theorem 12 (ii) once more if need be, we obtain the conclusion (c)from the positive version of Theorem 10 .

This theorem is known as Lebesgue's dominated convergence theorem,vintage 1908. When µ(S2) < oo, any constant C is integrable and may be usedfor cp ; hence in this case the result is called bounded convergence theorem .

Page 421: A Course in Probability Theory KL Chung

5 APPLICATIONS 1 407

Curiously, the best known part of the theorem is the corollary below with afixed B.

Corollary . We have

linmJ

fa dµ=f f dua

auniformly in B E ~-T .

This is trivial from (c), because, in alternative notation :

IE(B; f n) - E(B; f)I < E(B ;Ifn - fl)<-E(Ifn - f l) .

In the particular case where B = S7, the Corollary contains a number of usefulresults such as the integration term by term of power series or Fourier series .A glimpse of this is given below .

5 Applications

The general theory of integration applied to a probability space is summarizedin §§3 .1-3 .2 of the main text . The specialization to R expounded in §3 abovewill now be described and illustrated .

A function f defined on R with range in [-oo, +oc] is called a Borelfunction iff f E A; it is called a Lebesgue-measurable function iff f E A* .The domain of definition f may be an arbitrary Borel set or Lebesgue-measurable set D. This case is reduced to that for D = R by extending thedefinition of f to be zero outside D. The integral of f E A* correspondingto the measure m* constructed from F is denoted by

00E(f) = f-00 f (x) dF(x) .

In case F(x) - x, this is called the Lebesgue integral of f ; in this case theusual notation is, for A E A* :

JAf (x) dx = E(A ; f ) .

Below are some examples of the application of preceding theorems to classicanalysis .

Example 1 . Let I be a bounded interval in R ; {Uk} a sequence of functions on I ; andfor X E I :

11

S" (x) = I Uk (x), n EN .

Page 422: A Course in Probability Theory KL Chung

408 I SUPPLEMENT : MEASURE AND INTEGRAL

Suppose the infinite series >k uk(x) converges I ; then in the usual notation :

limn

k=1

U

exists and is finite. Now suppose each ukk is Lebesgue-integrable, then so is each s,,,by property (iii) of the integral ; and

I s„ (x) dx =k=1

uk (x) dx .IQuestion: does the numerical series above converge? and if so is the sum of integralsequal to the integral of the sum :

E fUk(X)dX =

uk (x) dx = J s(x) dx?k=1

/ k=1

/

This is the problem of integration term by term .A very special but important case is when the interval I = [a, b] is compact

and the functions Uk are all continuous in I . If we assume that the series Ek_ 1 u k (x)converges uniformly in I, then it follows from elementary analysis that the sequenceof partial sums {s„ (x)} is totally bounded, that is,

sup sup IS, (x) I = sup sup IS, (x) I < 00 .n

x

x

n

Since m(I) < oc, the bounded convergence theorem applies to yield

lim / s„ (x) dx = / lim s,, (x) dx.n

/

/ "

The Taylor series of an analytic function always converges uniformly and abso-lutely in any compact subinterval of its interval of convergence . Thus the result aboveis fruitful .

Another example of term-by-term integration goes back to Theorem 8 .

Example 2. Let u k > 0, u k E L', then

(54)

that is (54) .

x

(x) =

uk(x) = s(x)k=1

a

b 00

~ 1 fb

~

E uk

uk dlt .k=1

k=

Let f „ = EL 1 uk, then f „ E L', f ,, T f

uk . Hence by monotone conver-gence

E(f) = limE(f„)1r

Page 423: A Course in Probability Theory KL Chung

When Uk is general the preceding result may be applied to Iuk I to obtain

C

l uk l/ dp

fIukId/L .k=1

k=1

If this is finite, then the same is true when Iuk I is replaced by uA and u- . It thenfollows by subtraction that (54) is also true . This result of term-by-term integrationmay be regarded as a special case of the Fubini-Tonelli theorem (pp . 63-64), whereone of the measures is the counting measure on N .

For another perspective, we will apply the Borel-Lebesgue theory of integral tothe older Riemann context .

Example 3 . Let (I, A*, m) be as in the preceding example, but let I = [a, b] becompact. Let f be a continuous function on I . Denote by P a partition of I as follows :

a=xo<x1<x2< . . .<xn=b;

and put8(P) = max (Xk - Xk_1) .

I<k<_n

For each k, choose a point ~k in [xk_ 1 , xk ], and define a function fp as follows :

fP(x) =k,

for x E [xo , x 1 ],for xE(xk_ I ,xk],2<k<n .

Particular choices of ~k are : ~k = f (xk_1) ; ~k = f (xk) ;

(55)

~k = min f (x) ; 'k = max f (x) .Xk-I <X<Xk

Xk-1 <X<Xk

The fp is called a step function ; it is an approximant of f . It is not basic by Defini-tion 8(a) but fP and fP are. Hence by Definitions 8(a) and 8(c), we have

n

E(fp) =

f (W(xk - Xk_1)1k=1

The sum above is called a Riemann sum ; when the ~k are chosen as in (55), they arecalled lower and upper sums, respectively .

Now let {P(n), n E N) be a sequence of partitions such that E(P(n)) -a 0 asn -~ oo. Since f is continuous on a compact set, it is bounded . It follows that thereis a constant C such that

sup sup I fP(n)

WI < C.nEN XE/

Since I is bounded, we can apply the bounded convergence theorem to conclude that

limE(fp(,,)) = E(f ) .n

5 APPLICATIONS 1 409

Page 424: A Course in Probability Theory KL Chung

41 0 1 SUPPLEMENT: MEASURE AND INTEGRAL

The finite existence of the limit above signifies the Riemann-integrability of f, and thelimit is then its Rienwnn-integral fb f (x) dx . Thus we have proved that a continuousfunction on a compact interval is Riemann-integrable, and its Riemann-integral is equalto the Lebesgue integral. Let us recall that in the new theory, any bounded measurablefunction is integrable over any bounded measurable set . For example, the function

1sin -, x E (0, 1]

X

being bounded by 1 is integrable . But from the strict Riemannian point of view ithas only an "improper" integral because (0, 1] is not closed and the function is notcontinuous on [0, 1], indeed it is not definable there . Yet the limit

1

lEm sin 1 dxE

X

exists and can be defined to be f sin (1/x) dx, As a matter of fact, the Riemann sumsdo converge despite the unceasing oscillation of f between 0 and 1 as x J, 0 .

Example 4. The Riemann integral of a function on (0, oo) is called an "infiniteintegral" and is definable as follows :00

f

f (x) dx = lim

f (x) dxo

n-'°° o

n

when the limit exists and is finite . A famous example is

sinx(56)

f (x) =, x E (0, oo) .X

This function is bounded by 1 and is continuous . It can be extended to [0, oo) bydefining f (0) = 1 by continuity . A cute calculation (see §6.2 of main text) yields theresult (useful in Optics) :

"sin x

nlim dx =

*

n fn

o x

2

By contrast, the function I f I is not Lebesgue-integrable . To show this, we usetrigonometry :

sinx > 1

1

-

3n=

x ) - / (2n + 1)ir = Cn for X E (2n7r + 4 , 2nJr + 4

I n .

Thus for x > 0 :

x

n

n=1

C n l,n (x) .

The right member above is a basic function, with its integral :

Y: Cnm(I n )11

00

(2n + 1)2=

Page 425: A Course in Probability Theory KL Chung

Example 5. The square of the function f in (56) :

sinx 2f(x)2=

( x

, XER

I ~2

d ~2xdx=

,

-~;2

d2

f~ - dx = log ,d~

log ~ _ .

5 APPLICATIONS 1 4 1 1

It follows that E(f +) _ +oo. Similarly E(f -) = +oo; therefore by Definition 8(c)E(f) does not exist! This example is a splendid illustration of the followingNon-Theorem .

Let f E-,0 and f n = f 1 (o,,, ), n E N. Then f, E-10 and f, --*f as n-+ 00 .Even when the f n s are "totally bounded", it does not follow that

(57)

limE(f n ) = E(f) ;n

indeed E(f) may not exist .On the other hand, if we assume, in addition, either (a) f > 0 ; or (b) E(f)

exists, in particular f E L' ; then the limit relation will hold, by Theorems 9 and 10,respectively . The next example falls in both categories .

is integrable in the Lebesgue sense, and is also improperly integrable in the Riemannsense .

We have2

1.f (x) < 1(-1,+I)

1~-~,-i)ut+i,+~)x2

and the function on the right side is integrable, hence so is f2 .Incredibly, we have

°° smx 2

7t

°° sinx

C- !dx = - = (RI)

dx,o

x

2

o

x

where we have inserted an "RI" to warn against taking the second integral as aLebesgue integral. See §6.2 for the calculation . So far as I know, nobody has explainedthe equality of these two integrals .

Example 6 . The most notorious example of a simple function that is not Riemann-integrable and that baffled a generation of mathematicians is the function 1 Q, where Qis the set of rational numbers. Its Riemann sums can be made to equal any real numberbetween 0 and 1, when we confine Q to the unit interval (0, 1). The function is sototally discontinuous that the Riemannian way of approximating it, horizontally so tospeak, fails utterly . But of course it is ludicrous even to consider this indicator functionrather than the set Q itself. There was a historical reason for this folly : integration wasregarded as the inverse operation to differentiation, so that to integrate was meant to"find the primitive" whose derivative is to be the integrand, for example,

Page 426: A Course in Probability Theory KL Chung

412 1 SUPPLEMENT : MEASURE AND INTEGRAL

A primitive is called "indefinite integral", and f2 (1/x)dx e .g . is called a "definiteintegral." Thus the unsolvable problem was to find fo'~ I Q (x) dx, 0 < ~ < 1 .

The notion of measure as length, area, and volume is much more ancient thanNewton's fluxion (derivative), not to mention the primitive measure of counting withfingers (and toes) . The notion of "countable additivity" of a measure, although seem-ingly natural and facile, somehow did not take hold until Borel saw that

m(Q)

m(q) =

=0.qEQ

qEQ

There can be no question that the "length" of a single point q is zero. Euclid gave it"zero dimension" .

This is the beginning of MEASURE. An INTEGRAL is a weighted measure,as is obvious from Definition 8(a). The rest is approximation, vertically as in Defini-tion 8(b), and convergence, as in all analysis .

As for the connexion with differentiation, Lebesgue made it, and a clue is givenin §1 .3 of the main text .

Page 427: A Course in Probability Theory KL Chung

General bibliography

[The five divisions below are merely a rough classification and are not meant to bemutually exclusive.]

1 . Basic analysis

[1] G. H. Hardy, A course of pure mathematics, 10th ed. Cambridge UniversityPress, New York, 1952 .

[2] W. Rudin, Principles of mathematical analysis, 2nd ed. McGraw-Hill BookCompany, New York, 1964 .

[3] I. P. Natanson, Theory of functions of a real variable (translated from theRussian) . Frederick Ungar Publishing Co ., New York, 1955 .

2. Measure and integration

[4] P. R. Halmos, Measure theory . D . Van Nostrand Co., Inc., Princeton, N.J .,1956 .

[5] H. L. Royden, Real analysis . The Macmillan Company, New York, 1963 .[6] J. Neveu, Mathematical foundations of the calculus of probability . Holden-

Day, Inc., San Francisco, 1965 .

3. Probability theory

[7] Paul Levy, Calcul des probabilites . Gauthier-Villars, Paris, 1925 .[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung . Springer-

Verlag, Berlin, 1933 .

Page 428: A Course in Probability Theory KL Chung

414 ( GENERAL BIBIOGRAPHY

[9] M . Frechet, Generalizes sur les probabilites. Variables aleatoires. Gauthier-Villars, Paris, 1937 .

[10] H. Cramer, Random variables and probability distributions, 3rd ed. Cam-bridge University Press, New York, 1970 [1st ed ., 1937] .

[11] Paul Levy, Theorie de l'addition des variables aleatoires, 2nd ed. Gauthier-Villars, Paris, 1954 [1st ed., 1937] .

[12] B . V. Gnedenko and A . N. Kolmogorov, Limit distributions for sums of inde-pendent random variables (translated from the Russian) . Addison-Wesley PublishingCo., Inc., Reading, Mass ., 1954 .

[13] William Feller, An introduction to probability theory and its applications,vol . 1 (3rd ed .) and vol . 2 (2nd ed .) . John Wiley & Sons, Inc., New York, 1968 and1971 [1 st ed . of vol. 1, 1950] .

[14] Michel Loeve, Probability theory, 3rd ed. D. Van Nostrand Co ., Inc ., Prince-ton, N.J., 1963 .

[15] A. Renyi, Probability theory . North-Holland Publishing Co., Amsterdam,1970 .

[16] Leo Breiman, Probability . Addison-Wesley Publishing Co ., Reading, Mass .,1968 .

4. Stochastic processes

[17] J . L . Doob, Stochastic processes . John Wiley & Sons, Inc ., New York, 1953 .[18] Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd

ed. Springer-Verlag, Berlin, 1967 [1st ed ., 1960] .[19] Frank Spitzer, Principles of random walk . D. Van Nostrand Co ., Princeton,

N.J., 1964 .[20] Paul-Andre Meyer, Probabilites et potential . Herman (Editions Scienti-

fiques), Paris, 1966 .[21] G. A. Hunt, Martingales et processus de Markov . Dunod, Paris, 1966 .[22] David Freedman, Brownian motion and diffusion . Holden-Day, Inc ., San

Francisco, 1971 .

5 . Supplementary reading

[23] Mark Kac, Statistical independence in probability, analysis and numbertheory, Carus Mathematical Monograph 12. John Wiley & Sons, Inc ., New York,1959 .

[24] A. Renyi, Foundations of probability . Holden-Day, Inc ., San Francisco,1970 .

[25] Kai Lai Chung, Elementary probability theory with stochastic processes .

Springer-Verlag, Berlin, 1974 .

Page 429: A Course in Probability Theory KL Chung

Index

A

Abelian theorem, 292Absolutely continuous part of d.f., 12Adapted, 335Additivity of variance, 108Adjunction, 25a.e ., see Almost everywhereAlgebraic measure space, 28Almost everywhere, 30Almost invariant, permutable, remote, trivial,

266Approximation lemmas, 91, 264Arcsin law, 305Arctan transformation, 75Asymptotic density, 24Asymptotic expansion, 236Atom, 31, 32Augmentation, 32Axiom of continuity, 22

B

moo, /3, ', 389Ballot problem, 232Basic function, 395Bayes' rule, 320Belonging (of function to B .F .), 263Beppo Levi's theorem, 401

Bernstein's theorem, 200Berry-Esseen theorem, 235B.F ., see Borel fieldBi-infinite sequence, 270Binary expansion, 60Bochner-Herglotz theorem, 187Boole's inequality, 21Borel-Cantelli lemma, 80Borel field, 18

generated, 18Borel field on R, 388, 394Borel-Lebesgue measure, 24, 389Borel-Lebesgue-Stiltjes measure, 394Borel measurable function, 37Borel's Covering Theorem, 392Borel set, 24, 38, 394Borel's Lemma, 390Borel's theorem on normal numbers, 109Boundary value problem, 344Bounded convergence theorem, 44Branching process, 371Brownian motion, 129, 227

C

CK, Co, CB, C, 91CB, Cu, 157

Page 430: A Course in Probability Theory KL Chung

416 1 INDEX

Canonical representation of infinitelydivisible laws, 258

Cantor distribution, 13-15, 129, 174Caratheodory criterion, 28, 389, 394Carleman's condition, 103Cauchy criterion, 70Cauchy distribution, 156Cauchy-Schwartz inequality, 50, 317Central limit theorem, 205

classical form, 177for dependent r .v .'s, 224for random number of terms, 226general form, 211Liapounov form, 208Lindeberg-Feller form, 214Lindeberg's method, 211with remainder, 235

Chapman-Kolmogorov equations, 333Characteristic function, 150

list of, 155-156multidimensional, 197

Chebyshev's inequality, 50strengthened, 404

ch.f., see Characteristic functionCoin-tossing, 111Complete probability space, 30Completely monotonic, 200Concentration function, 160Conditional density, 321Conditional distribution, 321Conditional expectation, 313as integration, 316as martingale limit, 369

Conditional independence, 322Conditional probability, 310, 313Continuity interval, 85, 196Convergence a.e ., 68

in dist., 96in LP, 71in pr ., 70weak in L 1 , 73

Convergence theorem :for ch .f., 169for Laplace transform, 200

Convolution, 152, 157Coordinate function, 58, 264Countable, 4

D

de Finetti's theorem, 367De Moivre's formula, 220

Density of d .f., 12d.f., see Distribution functionDifferentiation :

of ch .f., 175of d.f., 163

Diophantine approximation, 285Discrete random variable, 39Distinguished logarithm, nth root, 255Distribution function, 7continuous, 9degenerate, 7discrete, 9n-dimensional, 53of p.m., 30

Dominated convergence theorem, 44Dominated ergodic theorem, 362Dominated r.v ., 71Doob's martingale convergence, 351Doob's sampling theorem, 342Double array, 205Double limit lemma, 397Downcrossing inequality, 350

E

Egorov's theorem, 79Empiric distribution, 138Equi-continuity of ch .f.'s, 169Equivalent sequences, 112Event, 54Exchangeable, event, 366Existence theorem for independent r.v.'s, 60Expectation, 41Extension theorem, see Kolmogorov's

extension theorem

F

*, 380,T* -measurability, 378

approximation of -IF* -measurable set, 382Fatou's lemma, 45, 402Feller's dichotomy, 134Field, 17Finite-product set, 61First entrance time, 273Fourier series, 183Fourier-Stieltjes transform, 151

of a complex variable, 203Fubini's theorem, 63Functionals of S,,, 228Function space, 265

Page 431: A Course in Probability Theory KL Chung

G

Gambler's ruin, 343Gambling system, 273, 278, see also Optional

samplingGamma distribution, 158Generalized Poisson distribution, 257Glivenko-Cantelli theorem, 140

H

Harmonic analysis, 164Harmonic equation, 344Harmonic function, 361Helley's extraction principle, 88Holder's inequality, 50Holospoudicity, 207Homogeneous Markov chain, process,

333

I

Identically distributed, 37Independence, 53

characterizations, 197, 322Independent process, 267Indicator, 40In dist., see In distributionIn distribution, 96Induced measure, 36Infinitely divisible, 250Infinitely often, 77, 360Infinite product space, 61Information theory, 148, 414Inner measure, 28In pr ., see In probabilityIn probability, 70Integrable random variable, 43Integrable function, 403Integral, 396, 400, 403

additivity, 406convergence theorems, 401, 406-408

Integrated ch .f., 172Integration term by term, 44, 408Invariance principle, 228Invariant, 266Inverse mapping, 35Inversion formula for ch .f., 161, 196Inversions, 220i .o ., see Infinitely often

J

Jensen's inequality, 50, 317Jump, 3Jumping part of d.f., 7

K

Kolmogorov extension theorem, 64, 265Kolmogorov's inequalites, 121, 125Kronecker's lemma, 129

L

Laplace transform, 196Large deviations, 214, 241Lattice d.f., 183Law of iterated logarithm, 242, 248Law of large numbers

converse, 138for pairwise independent r.v .'s, 134for uncorrelated r.v .'s, 108identically distributed case, 133, 295, 365strong, 109, 129weak, 114, 177

Law of small numbers, 181Lebesgue measurable, 394Lebesgue's dominated and bounded

convergence, 401, 406Levi's monotone convergence, 401Levy-Cramer theorem, 169Levy distance, 98Levy's formula, see Canonical representationLevy's theorem on series of independent

r.v.'s, 126strengthened, 363

Liapounov's central limit theorem, 208Liapounov's inequality, 50Liminf, Limsup, 75Lindeberg's method, 211Lindeberg-Feller theorem, 214Lower semicontinuous function, 95

M

Marcinkiewicz-Zygmund inequality, 362Markov chain, homogeneous, 333Markov process, 326homogeneous, 330or rth order, 333

Markov property, 326strong, 327

INDEX 1 417

Page 432: A Course in Probability Theory KL Chung

418 1 INDEX

Martingale, 335, see also Submartingale,Supermartingale

difference sequence, 337Krickeberg's decomposition, 358of conditional expectations, 338stopped, 341

Maximum of S,,, 228, 299Maximum of submartingale, 346, 362M.C ., see Monotone classm-dependent, 224Mean, 49Measurable with respect to 3 , 35MeasureCompletion, 384Extension, 377, 380a-finite, 381Transfinite construction, 386Uniqueness, 381

Measure space, 381discrete, 386

Median, 117Method of moments, 103, 178Metricfor ch .f.'s, 172for convergence in pr ., 71for d .f.'s, 98for measure space, 47for p.m.'s, 172

Minimal B.F ., 18Minkowski's inequality, 50Moment, 49Moment problem, 103Monotone class theorems, 19Monotone convergence theorem, 44, 401Monotone property of measure, 21Multinomial distribution, 181

N

n-dimensional distribution, 53Negligibility, 206Non-measurable set, 395Normal d.f., 104

characterization of, 180, 182convergence to, see Central limit theoremkernel, 157positive, 104, 232

Normal number, 109, 112Norming, 206Null set, 30, 383Number of positive terms, 302Number of zeros, 234

0

Optional r.v ., 259, 338Optional sampling, 338, 346Order statistics, 147Orthogonal, 107Outcome, 57Outer measure, 28, 375, 377-378

P

Pairwise independence, 53Partition, 40Permutable, 266p.m ., see Probability measurePoisson d.f., 194Poisson process, 142P61ya criterion, 191P61ya distribution, 158Positive definite, 187Positive normal, see NormalPost-a, 259Potential, 359, 361Pre-a, 259Probability distribution, 36

on circle, 107Probability measure, 20

of d.f., 30Probability space, triple, 23Product B .F. measure, space, 58Projection, 64

Q

Queuing theory, 307

R

Radon-Nikodym theorem, 313, 369Random variable, 34, see also Discrete,

integrable, simple, symmetric randomvariable

Random vector, 37Random walk, 271

recurrent, 284Recurrence principle, 360Recurrent value, 278Reflection principle, 232Remainder term, 235Remote event, field, 264Renewal theory, 142Riemann integral, 409

Page 433: A Course in Probability Theory KL Chung

improper, infinite, 410primitive, 411

Riemann-Lebesgue lemma, 167Riemann-Stieltjes integral, 198Riemann zeta function, 259Riesz decomposition, 359Right continuity, 5r .v ., see Random variable

S

Sample function, 141Sample space, 23

discrete, 23s .d .f., see Subdistribution functionSecond-order stochastic process, 191Sequentially vaguely compact, 88Set of convergence, 79Shift, 265optional, 273

Simple random variable, 40Singleton, 16Singular d .f., 12

measure, 31Singular part of d.f., 12Smartingale, 335

closeable, 359Span, 184Spectral distribution, 191St. Petersburg paradox, 120, 373Spitzer's identity, 299Squared variation, 367Stable distribution, 193Standard deviation, 49Stationary independent process, 267Stationary transition probabilities, 331Step function, 92Stochastic process, with discrete parameter,

263Stone -Weierstrass theorem, 92, 198Stopped (super) martingale, 340-341Stopping time, see Optional samplingSubdistribution function, 88Submartingale, 335

convergence theorems for, 350convex transformation of, 336Doob decomposition, 337inequalities, 346

Subprobability measure, 85Subsequences, method of, 109Superharmonic function, 361

T

Tauberian theorem, 292Taylor's series, 177Three series theorem, 125Tight, 95Topological measure space, 28Trace, 23Transition probability function, 331Trial, 57Truncation, 115Types of d .f., 184

U

Uncorrelated, 107Uniform distribution, 30Uniform integrability, 99Uniqueness theorem

for ch .f., 160, 168for Laplace transform, 199, 201for measure, 30

Upcrossing inequality, 348

V

Vague convergence, 85, 92general criterion, 96

Variance, 49

W

Wald's equation, 144, 345Weak convergence, 73Weierstrass theorem, 145Wiener-Hopf technique, 291

Z

Zero-or-one lawHewitt-Savage, 268Kolmogorov, 267L6vy, 357

INDEX 1 419

Supermartingale, 335optional sampling, 340

Support of d.£, 10Support of p.m ., 32Symmetric difference (o), 16Symmetric random variable, 165Symmetric stable law, 193, 250Symmetrization, 155System theorem, see Gambling system,

Submartingale