
Probability with Measure

Dr Nic Freeman

September 22, 2020


©Nic Freeman, University of Sheffield, 2020.

Contents

0 Introduction
  0.1 Organization
  0.2 Preliminaries

1 Measure Spaces and Measure
  1.1 What is Measure?
  1.2 Sigma Fields
  1.3 Measure
  1.4 The Borel σ-field and Lebesgue Measure
  1.5 An example of a non-measurable set (?)
  1.6 Two Useful Theorems About Measure
  1.7 Product Measures
  1.8 Exercises 1

2 Measurable Functions
  2.1 Liminf and Limsup
  2.2 Measurable Functions - Basic Concepts
  2.3 Examples of Measurable Functions
  2.4 Algebra of Measurable Functions
  2.5 Simple Functions
  2.6 Extended Real Functions
  2.7 Exercises 2

3 Lebesgue Integration
  3.1 Introduction
  3.2 The Lebesgue Integral for Simple Functions
  3.3 The Lebesgue Integral for Non-negative Measurable Functions
  3.4 The Monotone Convergence Theorem
  3.5 Lebesgue Integrability and Dominated Convergence
  3.6 Fubini's Theorem and Function Spaces (?)
  3.7 Riemann Integration (?)
  3.8 Exercises 3

4 Probability and Measure
  4.1 Introduction
  4.2 Basic Concepts of Probability Theory
  4.3 The Borel-Cantelli lemmas
  4.4 Kolmogorov's 0-1 law
  4.5 Convergence of Random Variables
  4.6 Laws of Large Numbers
  4.7 Characteristic Functions and Weak Convergence
  4.8 The Central Limit Theorem
  4.9 Exercises 4

5 Product Measures and Fubini's Theorem (∆)
  5.1 Dynkin's π-λ Lemma (∆)
  5.2 Product Measure (∆)
  5.3 Fubini's Theorem (∆)
  5.4 Exercises 5

A Solutions to exercises


Chapter 0

Introduction

0.1 Organization

Syllabus

These notes are for three courses: MAS350, MAS451 and the Spring semester of MAS6352. Some sections of the course are included in MAS451/6352 but not in MAS350. These sections are marked with a (∆) symbol. We will not cover these sections in lectures. Students taking MAS451/6352 should study these sections independently.

Some parts of the notes are marked with a (?) symbol, which means they are off-syllabus. These are often cases where detailed connections can be made to and from other parts of mathematics.

These notes are heavily based on the earlier notes of Prof. David Applebaum.

Problem sheets

The exercises are divided up according to the chapters of the course. Some exercises are marked as 'challenge questions' – these are intended to offer a serious, time-consuming challenge to the best students.

Aside from challenge questions, it is expected that students will attempt all exercises (for the version of the course they are taking) and review their own solutions using the typed solutions provided at the end of these notes.

At three points during each semester, an assignment of additional exercises will be set. About one week later, a mark scheme will be posted, and you should self-mark your solutions.

Examination

The course will be examined in the summer sitting. Parts of the course marked with a (∆) are examinable for MAS451/6352 but not for MAS350. Parts of the course marked with a (?) will not be examined (for anyone).

Website

Further information, including the timetable, can be found on

http://nicfreeman.staff.shef.ac.uk/MASx50/.


0.2 Preliminaries

This section contains lots of definitions, from earlier courses, that we will use in MAS350. Most of the material here should be familiar to you. There may be one or two minor extensions of ideas you have seen before.

1. Set Theory.

Let S be a set and A, B, C, . . . be subsets. Ac is the complement of A in S, so that

Ac = {x ∈ S ; x ∉ A}.

Union: A ∪ B = {x ∈ S ; x ∈ A or x ∈ B}.
Intersection: A ∩ B = {x ∈ S ; x ∈ A and x ∈ B}.
Set-theoretic difference: A − B = A ∩ Bc.

We have finite unions and intersections: if A1, A2, . . . , An are subsets of S then

⋃_{i=1}^n Ai = A1 ∪ A2 ∪ . . . ∪ An,    ⋂_{i=1}^n Ai = A1 ∩ A2 ∩ . . . ∩ An.

We will also need infinite unions and intersections. So let (An) be a sequence of subsets of S and let x ∈ S. We say that x ∈ ⋃_{i=1}^∞ Ai if x ∈ Ai for at least one value of i. We say that x ∈ ⋂_{i=1}^∞ Ai if x ∈ Ai for all values of i.

Note that de Morgan's laws hold in this context:

(⋂_{i=1}^∞ Ai)c = ⋃_{i=1}^∞ (Ai)c,    (⋃_{i=1}^∞ Ai)c = ⋂_{i=1}^∞ (Ai)c.

The Cartesian product S × T of sets S and T is

S × T = {(s, t) ; s ∈ S, t ∈ T}.
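The finite versions of these identities are easy to experiment with. The short Python snippet below is an illustration added here (not part of the original notes); the ambient set S and the subsets A, B are arbitrary examples. It checks de Morgan's laws and the definition of set-theoretic difference:

    # Illustration only: de Morgan's laws and A - B = A ∩ Bc for finite sets,
    # using Python's built-in set operations.
    S = set(range(10))                      # the ambient set S = {0, 1, ..., 9}
    A = {0, 1, 2, 3}
    B = {2, 3, 4, 5}

    def complement(X):
        """Complement of X inside S."""
        return S - X

    # de Morgan's laws (finite case):
    assert complement(A | B) == complement(A) & complement(B)
    assert complement(A & B) == complement(A) | complement(B)

    # Set-theoretic difference written via complements:
    assert A - B == A & complement(B)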

2. Sets of Numbers

• Natural numbers N = {1, 2, 3, . . .}.
• Non-negative integers Z+ = N ∪ {0} = {0, 1, 2, 3, . . .}.
• Integers Z.
• Rational numbers Q.
• Real numbers R.
• Complex numbers C.

A set X is countable if there exists an injection from X into N. A set is uncountable if it fails to be countable. N, Z+, Z and Q are countable. R and C are uncountable. All finite sets are countable.

3. Images and Preimages.

Suppose that S1 and S2 are two sets and that f : S1 → S2 is a mapping (or function). Suppose that A ⊆ S1. The image of A under f is the set f(A) ⊆ S2 defined by

f(A) = {y ∈ S2 ; y = f(x) for some x ∈ A}.

If B ⊆ S2 the inverse image of B under f is the set f−1(B) ⊆ S1 defined by

f−1(B) = {x ∈ S1 ; f(x) ∈ B}.

Note that f−1(B) makes sense irrespective of whether the mapping f is invertible. Key properties are, with A, A1, A2 ⊆ S1 and B, B1, B2 ⊆ S2:

f−1(B1 ∪ B2) = f−1(B1) ∪ f−1(B2),
f−1(B1 ∩ B2) = f−1(B1) ∩ f−1(B2),
f−1(Bc) = (f−1(B))c,
f(A1 ∪ A2) = f(A1) ∪ f(A2),
f(A1 ∩ A2) ⊆ f(A1) ∩ f(A2),
f(f−1(B)) ⊆ B,
(f ∘ g)−1(A) = g−1(f−1(A)),
A ⊆ f−1(f(A)),
B1 ⊆ B2 ⇒ f−1(B1) ⊆ f−1(B2).
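The contrast between images and preimages is worth seeing concretely. The sketch below is an added illustration (not from the notes), using the arbitrary non-injective map f(x) = x^2 on a small finite domain: preimages respect intersections exactly, while images only satisfy the inclusion f(A1 ∩ A2) ⊆ f(A1) ∩ f(A2).

    # Illustration only: image vs preimage for the non-injective map f(x) = x^2.
    S1 = {-2, -1, 0, 1, 2}
    f = lambda x: x * x

    def image(A):
        return {f(x) for x in A}

    def preimage(B):
        return {x for x in S1 if f(x) in B}

    A1, A2 = {-2, -1, 0}, {0, 1, 2}
    B1, B2 = {0, 1}, {1, 4}

    assert preimage(B1 & B2) == preimage(B1) & preimage(B2)   # equality holds
    assert image(A1 & A2) <= image(A1) & image(A2)            # only ⊆ in general
    print(image(A1 & A2), image(A1) & image(A2))              # {0} vs {0, 1, 4}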

4. Extended Real Numbers

We will often find it convenient to work with ∞ and −∞. These are not real numbers, but we find it convenient to treat them a bit like real numbers. To do so we specify the extra arithmetic rules, for all x ∈ R,

∞ + x = x + ∞ = ∞,
x − ∞ = −∞ + x = −∞,
∞.x = x.∞ = ∞ for x > 0,
∞.x = x.∞ = −∞ for x < 0,
∞.0 = 0.∞ = 0.

Note that ∞ − ∞, ∞.∞ and ∞/∞ are undefined. We also specify that, for all x ∈ R,

−∞ < x < ∞.

We write R∗ = {−∞} ∪ R ∪ {∞}, which is known as the extended real numbers.
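As an aside (not part of the notes), IEEE floating-point arithmetic, e.g. Python's math.inf, follows most of these rules automatically, but it does not adopt the convention ∞.0 = 0, so a small helper function is needed for that; the snippet below is a hedged illustration of this point.

    import math

    inf = math.inf

    # IEEE floating-point infinity matches most of the rules above ...
    assert inf + 5.0 == inf and 5.0 - inf == -inf
    assert inf * 2.0 == inf and inf * (-2.0) == -inf

    # ... but NOT the measure-theoretic convention ∞.0 = 0:
    print(inf * 0.0)          # nan in IEEE arithmetic

    def etimes(x, y):
        """Multiplication on [-inf, inf] using the convention ∞.0 = 0.∞ = 0."""
        if x == 0.0 or y == 0.0:
            return 0.0
        return x * y

    assert etimes(inf, 0.0) == 0.0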


5. Analysis.

• sup and inf. If A is a bounded set of real numbers, we write sup(A) and inf(A) for the real numbers that are its least upper bound and greatest lower bound (respectively). If A fails to be bounded above, we write sup(A) = ∞, and if A fails to be bounded below we write inf(A) = −∞. Note that inf(A) = − sup(−A) where −A = {−x ; x ∈ A}. If f : S → R is a mapping, we write sup_{x∈S} f(x) = sup{f(x) ; x ∈ S}. A very useful inequality is

sup_{x∈S} |f(x) + g(x)| ≤ sup_{x∈S} |f(x)| + sup_{x∈S} |g(x)|.

• Sequences and Limits. Let (an) = (a1, a2, a3, . . .) be a sequence of real numbers. It converges to the real number a if given any ε > 0 there exists a natural number N so that whenever n > N we have |a − an| < ε. We then write a = lim_{n→∞} an.

A sequence (an) which is monotonic increasing (i.e. an ≤ an+1 for all n ∈ N) and bounded above (i.e. there exists K > 0 so that an ≤ K for all n ∈ N) converges to sup_{n∈N} an.

A sequence (an) which is monotonic decreasing (i.e. an+1 ≤ an for all n ∈ N) and bounded below (i.e. there exists L > 0 so that an ≥ −L for all n ∈ N) converges to inf_{n∈N} an.

A subsequence of a sequence (an) is itself a sequence, of the form (a_{n_k}) where n_{k_1} < n_{k_2} whenever k_1 < k_2.

• Series. If the sequence (sn) converges to a limit s, where sn = a1 + a2 + · · · + an, we write s = ∑_{n=1}^∞ an and call it the sum of the series. If each an ≥ 0 then the sequence (sn) is either convergent to a limit or properly divergent to infinity. In the latter case we write s = ∞ and interpret this in the sense of extended real numbers.

• Continuity. A function f : R → R is continuous at a ∈ R if given any ε > 0 there exists δ > 0 so that |x − a| < δ ⇒ |f(x) − f(a)| < ε. Equivalently, f is continuous at a if given any sequence (an) that converges to a, the sequence (f(an)) converges to f(a). f is a continuous function if it is continuous at every a ∈ R.


Chapter 1

Measure Spaces and Measure

1.1 What is Measure?

Measure theory is the abstract mathematical theory that underlies all models of measurement in the real world. This includes measurement of length, area, volume and mass, but also of chance/probability. Measure theory is on the one hand a branch of pure mathematics, but it also plays a key role in many applied areas such as physics and economics. In particular it provides a foundation for both the modern theory of integration and also the theory of probability. It is one of the milestones of modern analysis and is an invaluable tool for functional analysis.

To motivate the key definitions, suppose that we want to measure the lengths of several line segments. We represent these as closed intervals of the real number line R, so a typical line segment is [a, b] where b > a. We all agree that its length is b − a. We write this as

m([a, b]) = b − a

and interpret this as telling us that the measure m of length of the line segment [a, b] is the number b − a. We might also agree that if [a1, b1] and [a2, b2] are two non-overlapping line segments and we want to measure their combined length, then we want to apply m to the set-theoretic union [a1, b1] ∪ [a2, b2] and

m([a1, b1] ∪ [a2, b2]) = (b2 − a2) + (b1 − a1) = m([a1, b1]) + m([a2, b2]).    (1.1)

An isolated point c has zero length and so

m({c}) = 0,

and if we consider the whole real line in its entirety then it has infinite length, i.e.

m(R) = ∞.

We have learned so far that if we try to abstract the notion of a measure of length, then we should regard it as a mapping m defined on subsets of the real line and taking values in the extended non-negative real numbers [0, ∞].

Question Does it make sense to consider m on all subsets of R?


Example 1.1.1 (The Cantor Set.) Start with the interval [0, 1] and remove the middle third to create the set C1 = [0, 1/3) ∪ (2/3, 1]. Now remove the middle third of each remaining piece to get C2 = [0, 1/9) ∪ (2/9, 1/3) ∪ (2/3, 7/9) ∪ (8/9, 1]. Iterate this process, so for n > 2, Cn is obtained from Cn−1 by removing the middle third of each set within that union. The Cantor set is C = ⋂_{n=1}^∞ Cn. It turns out that C is uncountable. Does m(C) make sense?

We'll see later that m(C) does make sense and is a finite number (can you guess what it is?). But it turns out that there are even wilder sets in R than C which have no length. These are quite difficult to construct (they require the axiom of choice) so we won't try to describe them here.

Conclusion. The set of all subsets of R is its power set P(R). We've just learned that the power set is too large to support a good theory of measure of length. So we need to find a smaller class of subsets that we can work with.


1.2 Sigma Fields

So far we have only discussed length, but now we want to be more ambitious. Let S be an arbitrary set. We want to define mappings from subsets of S to [0, ∞], which we will continue to denote by m. These will be called measures and they will share some of the properties that we've just been looking at for measures of length. Now, on what type of subset of S can m be defined? The power set of S is P(S) and we have just argued that this could be too large for our purposes, as it may contain sets that can't be measured.

Suppose that A and B are subsets of S that we can measure. Then we should surely be able to measure the complement Ac, the union A ∪ B and the whole set S. Note that we can then also measure A ∩ B = (Ac ∪ Bc)c. This leads to a definition.

Definition 1.2.1 Let S be a set. A Boolean algebra B is a set of subsets of S that has the following properties:

B(i) S ∈ B,

B(ii) If A,B ∈ B then A ∪B ∈ B,

B(iii) If A ∈ B then Ac ∈ B.

Note that B is a set, and each element of B is a subset of S. In other words, B is a subset of the power set P(S). In this course we will frequently work with sets whose elements are sets. It's important to get used to working with these objects; don't forget the difference between {1, 2} and {{1, 2}}.

Boolean algebras are named after the British mathematician George Boole (1815-1864), who introduced them in his book The Laws of Thought, published in 1854. They are well-studied mathematical objects that are extremely useful in logic and digital electronics. It turns out that they are inadequate for our own purposes – we need a little more sophistication.

If we use induction on B(ii) then we can show that if A1, A2, . . . , An ∈ B then A1 ∪ A2 ∪ · · · ∪ An ∈ B. This is left for you to prove, in Problem 1.1. But we need to be able to do analysis, and this requires us to be able to handle infinite unions. The next definition gives us what we need:

Definition 1.2.2 Let S be a set. A σ-field Σ is a set of subsets of S that has the following properties:

S(i) S ∈ Σ,

S(ii) If (An) is a sequence of sets with An ∈ Σ for all n ∈ N then ⋃_{n=1}^∞ An ∈ Σ,

S(iii) If A ∈ Σ then Ac ∈ Σ.

The terms σ-field and σ-algebra have the same meaning (this is an unfortunate accident of history!). Often you will find that 'σ-field' is used in advanced texts and 'σ-algebra' is used within lecture courses. I prefer σ-field; you may use either.

Lastly, a piece of terminology.

Definition 1.2.3 Given a σ-field Σ on S, we say that a set A ⊂ S is measurable if A ∈ Σ.


Facts about σ-fields

• By S(i) and S(iii), ∅ = Sc ∈ Σ.

• We have seen in S(ii) that infinite unions of sets in Σ are themselves in Σ. The same is true of finite unions. To see this let A1, . . . , Am ∈ Σ and define the sequence (A′n) by A′n = An if 1 ≤ n ≤ m and A′n = ∅ if n > m. Now apply S(ii) to get the result. We can deduce from this that every σ-field is a Boolean algebra.

• Σ is also closed under infinite (or finite) intersections. To see this use de Morgan's law to write

⋂_{i=1}^∞ Ai = (⋃_{i=1}^∞ (Ai)c)c.

• Σ is closed under set theoretic differences A−B, since (by definition) A−B = A ∩Bc.

Examples of σ-fields

1. P(S) is a σ-field. If S is finite with n elements then P(S) has 2^n elements (Problem 1.2).

2. For any set S, {∅, S} is a σ-field, which is called the trivial σ-field. It is the basic tool for modelling logic circuitry where ∅ corresponds to “OFF” and S to “ON”.

3. If S is any set and A ⊂ S then {∅, A, Ac, S} is a σ-field.

4. The most important σ-field for studying the measure of length is the Borel σ-field of R, which is denoted B(R). It is named after the French mathematician Émile Borel (1871-1956), who was one of the founders of measure theory. It is defined rather indirectly and we postpone this definition until after the next section.

A pair (S, Σ), where S is a set and Σ is a σ-field of subsets of S, is called a measurable space. There are typically many possible choices of Σ to attach to S. For example, we can always take Σ to be trivial or the power set. The choice of Σ is determined by what we want to measure.


1.3 Measure

Definition 1.3.1 Let (S, Σ) be a measurable space. A measure on (S, Σ) is a mapping m : Σ → [0, ∞] which satisfies

M(i) m(∅) = 0,

M(ii) (σ-additivity) If (An)_{n∈N} is a sequence of sets where each An ∈ Σ and if these sets are mutually disjoint, i.e. An ∩ Ak = ∅ whenever n ≠ k, then

m(⋃_{n=1}^∞ An) = ∑_{n=1}^∞ m(An).

M(ii) may appear to be rather strong. Our earlier discussion about length led us to m(A ∪ B) = m(A) + m(B), and straightforward induction then extends this to finite additivity: m(A1 ∪ A2 ∪ · · · ∪ An) = m(A1) + m(A2) + · · · + m(An). But if we were to replace M(ii) by this weaker finite additivity condition, we would not have an adequate tool for use in analysis, and this would make our theory much less powerful.

The key point here is, of course, limits. Limits are how we rigorously justify that approximations work – consequently we need them if we are to create a theory that will, ultimately, be useful to experimentalists and modellers.

Basic Properties of Measures

1. (Finite additivity) If A1, A2, . . . , Ar ∈ Σ and are mutually disjoint then

m(A1 ∪ A2 ∪ · · · ∪ Ar) = m(A1) +m(A2) + · · ·+m(Ar).

To see this define the sequence (A′n) by A′n = An if 1 ≤ n ≤ r and A′n = ∅ if n > r. Then

m(⋃_{i=1}^r Ai) = m(⋃_{i=1}^∞ A′i) = ∑_{i=1}^∞ m(A′i) = ∑_{i=1}^r m(Ai),

where we used M(ii) and then M(i) to get the last two expressions.

2. If A,B ∈ Σ with B ⊆ A and either m(A) <∞, or m(A) =∞ but m(B) <∞, then

m(A−B) = m(A)−m(B). (1.2)

To prove this write the disjoint union A = (A − B) ∪ B and then use the result of (1) (with r = 2).

3. (Monotonicity) If A, B ∈ Σ with B ⊆ A then m(B) ≤ m(A). If m(A) < ∞ this follows from (1.2), using the fact that m(A − B) ≥ 0. If m(A) = ∞, the result is immediate.

4. If A,B ∈ Σ are arbitrary (i.e. not necessarily disjoint) then

m(A ∪B) +m(A ∩B) = m(A) +m(B). (1.3)

The proof of this is Problem 1.4 part (a). Note that if m(A ∩B) <∞ we have

m(A ∪B) = m(A) +m(B)−m(A ∩B).
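These properties are easy to sanity-check for counting measure on a finite set. The snippet below is an added illustration (the example sets are arbitrary); it verifies (1.3), the inclusion-exclusion form above, and (1.2) applied to A and the subset A ∩ B ⊆ A:

    # Illustration only: checking m(A ∪ B) + m(A ∩ B) = m(A) + m(B) for
    # counting measure m(A) = #(A) on a finite set.
    A = {1, 2, 3, 4}
    B = {3, 4, 5}

    m = len   # counting measure on finite sets

    assert m(A | B) + m(A & B) == m(A) + m(B)     # property (1.3): 5 + 2 == 4 + 3
    assert m(A | B) == m(A) + m(B) - m(A & B)     # inclusion-exclusion
    assert m(A - B) == m(A) - m(A & B)            # (1.2) applied to A and A ∩ B ⊆ A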


Now some concepts and definitions. First, let us define the setting that we will work in for all of Chapters 1-3.

Definition 1.3.2 A triple (S, Σ, m), where S is a set, Σ is a σ-field on S, and m : Σ → [0, ∞] is a measure, is called a measure space.

The extended real number m(S) is called the total mass of m. The measure m is said to be finite if m(S) < ∞.

We will start to think about probability in Chapter 4. A finite measure is called a probability measure if m(S) = 1. When we have a probability measure, we use a slightly different notation.

We write Ω instead of S and call it a sample space.
We write F instead of Σ. Elements of F are called events.
We use P instead of m.
The triple (Ω, F, P) is called a probability space.


Examples of Measures

1. Counting Measure

Let S be a finite set and take Σ = P(S). For each A ⊆ S define

m(A) = #(A), i.e. the number of elements in A.

2. Dirac Measure

This measure is named after the famous British physicist Paul Dirac (1902-84). Let (S, Σ) be an arbitrary measurable space and fix x ∈ S. The Dirac measure δx at x is defined by

δx(A) = 1 if x ∈ A,    δx(A) = 0 if x ∉ A.

Note that we can write counting measure in terms of Dirac measure, so if S is finite and A ⊆ S,

#(A) = ∑_{x∈S} δx(A).
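The first two examples translate directly into code. The sketch below is an added illustration (the set S and the set A are arbitrary choices): each Dirac measure δx is represented as a function of sets, and counting measure is recovered as the sum ∑_{x∈S} δx.

    # Illustration only: Dirac measures and counting measure on a finite set S.
    S = {'a', 'b', 'c', 'd'}

    def dirac(x):
        """The Dirac measure δ_x, viewed as a function of the set A."""
        return lambda A: 1 if x in A else 0

    def counting(A):
        """Counting measure written as a sum of Dirac measures: #(A) = Σ_{x∈S} δ_x(A)."""
        return sum(dirac(x)(A) for x in S)

    A = {'a', 'c'}
    assert counting(A) == len(A) == 2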

3. Discrete Probability Measures

Let Ω be a countable set and take F = P(Ω). Let {pω ; ω ∈ Ω} be a set of real numbers which satisfies the conditions

pω ≥ 0 for all ω ∈ Ω and ∑_{ω∈Ω} pω = 1.

Now define the discrete probability measure P by

P(A) = ∑_{ω∈A} pω = ∑_{ω∈Ω} pω δω(A),

for each A ∈ F. For example, if #(Ω) = n + 1 and 0 < p < 1, we can obtain the binomial distribution as a probability measure by taking pr = C(n, r) p^r (1 − p)^{n−r} for r = 0, 1, . . . , n, where C(n, r) denotes the binomial coefficient.
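As an added illustration (not from the notes), the binomial example can be built explicitly: with arbitrary parameters n and p, the weights pr determine P(A) = ∑_{ω∈A} pω for every subset A of Ω = {0, 1, . . . , n}.

    from math import comb

    # Illustration only: the binomial distribution as a discrete probability
    # measure on Ω = {0, 1, ..., n}, with arbitrary parameters n and p.
    n, p = 10, 0.3
    weights = {r: comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)}

    def P(A):
        """P(A) = Σ_{ω ∈ A} p_ω, defined for every subset A of Ω."""
        return sum(weights[w] for w in A)

    assert abs(P(set(range(n + 1))) - 1.0) < 1e-12     # total mass 1
    print(P({0, 1, 2}))                                # P(X ≤ 2)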

4. Measures via Integration

Let (S, Σ, m) be an arbitrary measure space and f : S → [0, ∞) be a function that takes non-negative values. In Chapter 3, we will meet a powerful integration theory that allows us to cook up a new measure If from m and f (provided that f is suitably well-behaved, which we will think about in Chapter 2) by the prescription:

If(A) = ∫_A f(x) m(dx),

for all A ∈ Σ.


1.4 The Borel σ-field and Lebesgue Measure

In this section we take S to be the real number line R. We want to describe a measure λ that captures the notion of length, as we discussed at the beginning of this chapter. So we should have λ((a, b)) = b − a. The first question is: which σ-field should we use? We have already argued that the power set P(R) is too big. Our σ-field should contain open intervals, and also unions, intersections and complements of these.

Definition 1.4.1 The Borel σ-field of R, denoted B(R), is the smallest σ-field that contains all open intervals (a, b) where −∞ ≤ a < b ≤ ∞. Sets in B(R) are called Borel sets.

Note that B(R) also contains all isolated points {a} where a ∈ R. To see this, first observe that (a, ∞) ∈ B(R) and also (−∞, a) ∈ B(R). Now by S(iii), (−∞, a] = (a, ∞)c ∈ B(R) and [a, ∞) = (−∞, a)c ∈ B(R). Finally, as σ-fields are closed under intersections, {a} = [a, ∞) ∩ (−∞, a] ∈ B(R). You can show that B(R) also contains all closed intervals (see Problem 1.8).

We make two observations:

1. B(R) is defined quite indirectly and there is no “formula” that can be used to give the most general element in it. However it is very hard to find a subset of R that isn't in B(R) – we will give an example of one in Section 1.5.

2. B(S) makes sense on any set S for which there are subsets that can be called “open” in a sensible way. In particular this works for metric spaces. The most general type of S for which you can form B(S) is a topological space.

The measure that precisely captures the notion of length is called Lebesgue measure, in honour of the French mathematician Henri Lebesgue (1875-1941), who founded the modern theory of integration. We will denote it by λ. First we need a definition.

Let A ∈ B(R) be arbitrary. A covering of A is a finite or countable collection of open intervals (an, bn), n ∈ N, so that

A ⊆ ⋃_{n=1}^∞ (an, bn).

Definition 1.4.2 Let CA be the set of all coverings of the set A ∈ B(R). The Lebesgue measure λ on (R, B(R)) is defined by the formula

λ(A) = inf_{CA} ∑_{n=1}^∞ (bn − an),    (1.4)

where the inf is taken over all possible coverings of A.

It would take a long time to prove that λ really is a measure, and it wouldn't help us understand λ any better if we did it, so we'll omit that from the course. For the proof, see the standard text books e.g. Cohn, Schilling or Tao.

Let’s check that the definition (1.4) agrees with our intuitive ideas about length.

1. If A = (a, b) then λ((a, b)) = b − a as expected, since (a, b) is a covering of itself and any other cover has total length at least b − a.


2. If A = {a} then choose any ε > 0. Then (a − ε/2, a + ε/2) is a cover of {a} and so λ({a}) ≤ (a + ε/2) − (a − ε/2) = ε. But ε is arbitrary and so we conclude that λ({a}) = 0.

From (1) and (2), and using M(ii), we deduce that for a < b,

λ([a, b)) = λ({a} ∪ (a, b)) = λ({a}) + λ((a, b)) = b − a.

3. If A = [0, ∞) write A = ⋃_{n=1}^∞ [n − 1, n). Then by M(ii), λ([0, ∞)) = ∞. By a similar argument, λ((−∞, 0)) = ∞ and so λ(R) = λ((−∞, 0)) + λ([0, ∞)) = ∞.

4. If A ∈ B(R), and for some x ∈ R we define Ax = {x + a ; a ∈ A}, then λ(A) = λ(Ax). In words, if we take a set A and translate it (by x), we do not change its measure. This is easily seen from (1.4), because any cover of A can be translated by x to be a cover of Ax.

In simple practical examples on Lebesgue measure, it is generally best not to try to use (1.4), but to just apply the properties (1) to (4) above: e.g. to find λ((−3, 10) − (−1, 4)), use (1.2) to obtain

λ((−3, 10) − (−1, 4)) = λ((−3, 10)) − λ((−1, 4)) = (10 − (−3)) − (4 − (−1)) = 13 − 5 = 8.

If I is a closed interval (or in fact any Borel set) in R we can similarly define B(I), the Borel σ-field of I, to be the smallest σ-field containing all open intervals in I. Then Lebesgue measure on (I, B(I)) is obtained by restricting the sets A in (1.4) to be in B(I).

Sets of measure zero play an important role in measure theory. Here are some interesting examples of quite “large” sets that have Lebesgue measure zero.

1. Countable Subsets of R have Lebesgue Measure Zero

Let A ⊂ R be countable. Write A = {a1, a2, . . .} = ⋃_{n=1}^∞ {an}. Since A is an infinite union of point sets, it is in B(R). Then

λ(A) = λ(⋃_{n=1}^∞ {an}) = ∑_{n=1}^∞ λ({an}) = 0.

It follows that

λ(N) = λ(Z) = λ(Q) = 0.

The last of these is particularly intriguing as it tells us that the only contribution to length of sets of real numbers comes from the irrationals.

2. The Cantor Set has Lebesgue Measure Zero

Recall the construction of the Cantor set C = ⋂_{n=1}^∞ Cn given earlier in this chapter. Recall also that the Cn are decreasing, that is Cn+1 ⊆ Cn, and hence also C ⊆ Cn for all n.

Since Cn is a union of intervals, Cn ∈ B(R) for all n ∈ N. Hence C ∈ B(R). We easily see that λ(C1) = 1 − 1/3 and λ(C2) = 1 − 1/3 − 2/9. Iterating, we deduce that λ(Cn) = 1 − ∑_{r=1}^n 2^{r−1}/3^r, and since λ(C) ≤ λ(Cn) we thus have

λ(C) ≤ λ(Cn) = 1 − ∑_{r=1}^n 2^{r−1}/3^r.

Letting n → ∞, and using that limits preserve weak inequalities, we obtain λ(C) ≤ 0. But by definition of a measure we have λ(C) ≥ 0. Hence λ(C) = 0.
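Numerically, 1 − ∑_{r=1}^n 2^{r−1}/3^r simplifies to (2/3)^n, which decreases to 0. The following snippet (an added illustration, not part of the notes) checks the agreement:

    # Illustration only: λ(C_n) = 1 - Σ_{r=1}^{n} 2^{r-1} / 3^r shrinks to 0,
    # in agreement with the closed form (2/3)^n.
    def lam_Cn(n):
        return 1 - sum(2**(r - 1) / 3**r for r in range(1, n + 1))

    for n in (1, 2, 5, 20):
        print(n, lam_Cn(n), (2/3)**n)   # the two columns agree; both -> 0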


1.5 An example of a non-measurable set (?)

Note that this section has a (?), meaning that it is off-syllabus. It is included for interest. We might wonder: why go to all the trouble of defining the Borel σ-field? In other words, why can't we measure (the 'size' of) every possible subset of R? We will answer these questions by constructing a strange looking set V ⊆ R; we will then show that it is not possible to define the Lebesgue measure of V.

As usual, let Q denote the rational numbers. For any x ∈ R we define

Qx = {x + q ; q ∈ Q}.    (1.5)

Note that different x values may give the same Qx. For example, an exercise for you is to prove that Q_{√2} = Q_{1+√2}. You can think of Qx as the set Q translated by x.

It is easily seen that Qx ∩ [0, 1] is non-empty; just pick some rational q that is slightly less than x and note that x + (−q) ∈ Qx ∩ [0, 1]. Now, for each set Qx, we pick precisely one element r ∈ Qx ∩ [0, 1] (it does not matter which element we pick). We write this number r as r(Qx). Define

V = {r(Qx) ; x ∈ R},

which is a subset of [0, 1]. For each q ∈ Q define

Vq = {q + m ; m ∈ V}.

Clearly V = V0, and Vq is precisely the set V translated by q. Now, let us record some facts about Vq.

Lemma 1.5.1 It holds that

1. If q1 ≠ q2 then V_{q1} ∩ V_{q2} = ∅.

2. R = ⋃_{q∈Q} Vq.

3. [0, 1] ⊆ ⋃_{q∈Q∩[−1,1]} Vq ⊆ [−1, 2].

Before we prove this lemma, let us use it to show that V cannot have a Lebesgue measure. We will do this by contradiction: assume that λ(V) is defined.

Since V and Vq are translations of each other, they must have the same Lebesgue measure. We write c = λ(V) = λ(Vq), which does not depend on q. Let us write Q ∩ [−1, 1] = {q1, q2, . . .}, which we may do because Q is countable. By parts (1) and (3) of Lemma 1.5.1 and property M(ii) we have

λ(⋃_{q∈Q∩[−1,1]} Vq) = ∑_{i=1}^∞ λ(V_{qi}) = ∑_{i=1}^∞ c.

Using the monotonicity property of measures (see Section 1.3) and part (3) of Lemma 1.5.1 we thus have

1 ≤ ∑_{i=1}^∞ c ≤ 3.

However, there is no value of c which can satisfy these inequalities: if c = 0 the sum is 0, and if c > 0 the sum is ∞. So it is not possible to make sense of the Lebesgue measure of V.

The set V is known as a Vitali set. In higher dimensions even stranger things can happen with non-measurable sets; you might like to investigate the Banach-Tarski paradox.


Proof: [Of Lemma 1.5.1.] We prove the three claims in turn.

(1) Let q1, q2 ∈ Q be unequal. Suppose that some x ∈ V_{q1} ∩ V_{q2} exists – and we now look for a contradiction. By definition of Vq we have

x = q1 + r(Q_{x1}) = q2 + r(Q_{x2}).    (1.6)

By definition of Qx we may write r(Q_{x1}) = x1 + q′1 for some q′1 ∈ Q, and similarly for x2, so we obtain x = q1 + x1 + q′1 = q2 + x2 + q′2 where q′1, q′2 ∈ Q. Hence, setting q = q1 − q2 + q′1 − q′2 ∈ Q, we have x1 + q = x2, which by (1.5) means that Q_{x1} = Q_{x2}. Thus r(Q_{x1}) = r(Q_{x2}), so going back to (1.6) we obtain that q1 = q2. But this contradicts our assumption that q1 ≠ q2. Hence x does not exist and V_{q1} ∩ V_{q2} = ∅.

(2) We will show ⊇ and ⊆. The first is easy: since Vq ⊆ R it is immediate that R ⊇ ⋃_{q∈Q} Vq. Now take some x ∈ R. Since we may take q = 0 in (1.5) we have x ∈ Qx. By definition of r(Qx) we have r(Qx) = x + q′ for some q′ ∈ Q. By definition of V we have r(Qx) ∈ V and since x = r(Qx) − q′ we have x ∈ V_{−q′}. Hence x ∈ ⋃_{q∈Q} Vq.

(3) Since V ⊆ [0, 1], we have Vq ∩ [0, 1] = ∅ whenever q ∉ [−1, 1]. Hence, from part (2) and set algebra we have

[0, 1] = R ∩ [0, 1] = (⋃_{q∈Q} Vq) ∩ [0, 1] = ⋃_{q∈Q} (Vq ∩ [0, 1]) = ⋃_{q∈Q∩[−1,1]} (Vq ∩ [0, 1]) ⊆ ⋃_{q∈Q∩[−1,1]} Vq.

This proves the first ⊆ of (3). For the second, simply note that V ⊆ [0, 1] so Vq ⊆ [−1, 2] whenever q ∈ [−1, 1].

Remark 1.5.2 We used the axiom of choice to define the function r(·).


1.6 Two Useful Theorems About Measure

In this section we return to the consideration of arbitrary measure spaces (S, Σ, m). Let (An) be a sequence of sets in Σ. We say that it is increasing if An ⊆ An+1 for all n ∈ N, and decreasing if An+1 ⊆ An. When (An) is increasing, it is easily seen that the sequence of complements ((An)c) is decreasing.

When (An) is increasing, a useful technique is the disjoint union trick, whereby we can write ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn where the Bn are all mutually disjoint, by defining B1 = A1 and, for n > 1, Bn = An − An−1. E.g. R = ⋃_{n=1}^∞ [−n, n] and here B1 = [−1, 1], B2 = [−2, −1) ∪ (1, 2], etc.

Theorem 1.6.1 Let An ∈ Σ for all n. It holds that:

1. If (An) is increasing and A = ⋃_{n=1}^∞ An then m(A) = lim_{n→∞} m(An).

2. If (An) is decreasing and A = ⋂_{n=1}^∞ An, and m is a finite measure, then m(A) = lim_{n→∞} m(An).

Proof: For the second claim, see Problem 1.9. We will prove the first claim here. We use the disjoint union trick and M(ii) to find that

m(A) = m(⋃_{n=1}^∞ Bn) = ∑_{n=1}^∞ m(Bn) = lim_{N→∞} ∑_{n=1}^N m(Bn) = lim_{N→∞} m(⋃_{n=1}^N Bn) = lim_{N→∞} m(AN).

Here we use that AN = B1 ∪ B2 ∪ · · · ∪ BN.

Theorem 1.6.2 If (An) is an arbitrary sequence of sets with An ∈ Σ for all n ∈ N then

m(⋃_{n=1}^∞ An) ≤ ∑_{n=1}^∞ m(An).

Proof: From Problem 1.4, we have m(A1 ∪ A2) + m(A1 ∩ A2) = m(A1) + m(A2), from which we deduce that m(A1 ∪ A2) ≤ m(A1) + m(A2). By induction we then obtain, for all N ≥ 2,

m(⋃_{n=1}^N An) ≤ ∑_{n=1}^N m(An).

Now define XN = ⋃_{n=1}^N An. Then XN ⊆ XN+1 and so (XN) is increasing to ⋃_{n=1}^∞ Xn = ⋃_{n=1}^∞ An. By Theorem 1.6.1 we have

m(⋃_{n=1}^∞ An) = m(⋃_{n=1}^∞ Xn) = lim_{N→∞} m(XN) = lim_{N→∞} m(⋃_{n=1}^N An) ≤ lim_{N→∞} ∑_{n=1}^N m(An) = ∑_{n=1}^∞ m(An).


1.7 Product Measures

We calculate areas of rectangles by multiplying together the lengths of their sides. This suggests trying to formulate a theory of products of measures. Let (S1, Σ1, m1) and (S2, Σ2, m2) be two measure spaces. Form the Cartesian product S1 × S2. We can similarly try to form a product of σ-fields

Σ1 × Σ2 = {A × B ; A ∈ Σ1, B ∈ Σ2},

but it turns out that Σ1 × Σ2 is not a σ-field (or even a Boolean algebra), e.g. take S1 = S2 = R and consider ((0, 1) × (0, 1))c. Instead we need Σ1 ⊗ Σ2, which is defined to be the smallest σ-field which contains all the sets in Σ1 × Σ2. We state but do not prove:

Theorem 1.7.1 There exists a measure m1 × m2 on (S1 × S2, Σ1 ⊗ Σ2) so that for all A ∈ Σ1, B ∈ Σ2,

(m1 ×m2)(A×B) = m1(A)m2(B).

Definition 1.7.2 The measure m1 × m2 is called the product measure of m1 and m2.

For example, consider R^2 = R × R. We equip it with the Borel σ-field, B(R^2) = B(R) ⊗ B(R). Then the product Lebesgue measure is λ2 = λ × λ. It has the property that

λ2((a, b) × (c, d)) = (b − a)(d − c).

Of course, (b − a)(d − c) is the area of the rectangle (a, b) × (c, d). In fact, from a mathematical point of view the measure λ2 is the definition of area. Similarly, λ3 = λ × λ × λ is how we define volume, in three dimensions.

Remark 1.7.3 After thinking about λ × λ × λ, we might ask if, given measures m1, m2, m3, we have (m1 × m2) × m3 = m1 × (m2 × m3). It is true, but we won't prove it. Consequently we write both of these as simply m1 × m2 × m3, without any ambiguity.

We can go beyond 3 dimensions. Given n measure spaces (S1, Σ1, m1), (S2, Σ2, m2), . . . , (Sn, Σn, mn), we can iterate the above procedure to define the product σ-field Σ1 ⊗ Σ2 ⊗ · · · ⊗ Σn and the product measure m1 × m2 × · · · × mn, so that for Ai ∈ Σi, 1 ≤ i ≤ n,

(m1 × m2 × · · · × mn)(A1 × A2 × · · · × An) = m1(A1) m2(A2) · · · mn(An).

In particular, n-dimensional Lebesgue measure on R^n may be defined in this way.

Of course there are many measures that one can construct on (S1 × S2, Σ1 ⊗ Σ2), and not all of these will be product measures. For probability spaces, product measures are closely related to the notion of independence, as we will see later.


1.8 Exercises

1.1 Give a careful proof by induction of the fact that if B is a Boolean algebra and A1, A2, . . . , An ∈ B, then A1 ∪ A2 ∪ · · · ∪ An ∈ B.

1.2 Show that if S is a set containing n elements, then the power set P(S) contains 2^n elements. Hint: How many subsets are there of size r, for a fixed 1 ≤ r ≤ n? The binomial theorem may also be of some use.

1.3 Let Σ1 and Σ2 be σ-fields of subsets of a set S. Define

Σ1 ∩ Σ2 = {A ⊆ S ; A ∈ Σ1 and A ∈ Σ2}.

Show that Σ1 ∩ Σ2 is a σ-field. Define Σ1 ∪ Σ2 = {A ⊆ S ; A ∈ Σ1 or A ∈ Σ2} (where “or” is inclusive). Why is Σ1 ∪ Σ2 not in general a σ-field?

1.4 If (S, Σ, m) is a measure space, show that for all A, B ∈ Σ

(a) m(A ∪ B) + m(A ∩ B) = m(A) + m(B),
(b) m(A ∪ B) ≤ m(A) + m(B).

Hence prove that if A1, A2, . . . , An ∈ Σ,

m(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n m(Ai).

1.5 (a) If m is a measure on (S, Σ) and k > 0, show that km is also a measure on (S, Σ), where for all A ∈ Σ,

(km)(A) = k m(A).

Hence show that if m is a finite measure and m(S) ≠ 0, then P is a probability measure where P(A) = m(A)/m(S) for all A ∈ Σ.

(b) Let [a, b] be a finite closed interval in R. Write down a formula for the uniform distribution as a probability measure on ([a, b], B([a, b])), using the above considerations and Lebesgue measure. Hint: Recall that the uniform distribution has the property that subintervals of [a, b] which have the same length will have the same probability.

(c) If m and n are measures on (S, Σ), deduce that m + n is a measure on (S, Σ), where (m + n)(A) = m(A) + n(A) for all A ∈ Σ.

1.6 (a) If m is a measure on (S, Σ) and B ∈ Σ is fixed, show that mB(A) = m(A ∩ B) for A ∈ Σ defines another measure on (S, Σ).

(b) If m is a finite measure and m(B) > 0, deduce that PB is a probability measure where

PB(A) = mB(A)/m(B).

How does this relate to the notion of conditional probability?


1.7 Let S = {x1, x2, . . . , xn} be a finite set and let c1, c2, . . . , cn be non-negative numbers. Deduce that m is a measure on (S, P(S)) where m = ∑_{i=1}^n ci δ_{xi}. What condition should be imposed on c1, c2, . . . , cn for m to be a probability measure?

1.8 Show that B(R) contains all closed intervals [a, b], where −∞ < a < b <∞.

1.9 (a) Let m be a finite measure on (S, Σ).

(i) Show that, for any A ∈ Σ,

m(Ac) = m(S) − m(A).

(ii) Let (An)_{n∈N} be a decreasing sequence of sets in Σ. Show that m(An) → m(⋂_j Aj) as n → ∞.

(b) Give an example of a (not finite!) measure m, and sets An and A = ⋂_{n=1}^∞ An, such that m(A) ≠ lim_{n→∞} m(An).

Challenge Questions

1.10 Let S be a finite set and Σ be a σ-field on S. Consider the set

Π = {A ∈ Σ ; if B ∈ Σ and B ⊆ A then either B = A or B = ∅}. (?)

(a) Show that Π is a finite set.

(b) Using (a), let us enumerate the elements of Π as Π = {Π1, Π2, · · · , Πk}, where each Πi is distinct from the others.

(i) Show that Πi ∩ Πj = ∅ for i ≠ j. Hint: Could Πi ∩ Πj be an element of Π?
(ii) Show that ⋃_{i=1}^k Πi = S. Hint: If C = S \ ⋃_{i=1}^k Πi is non-empty, is C ∈ Π?
(iii) Let A ∈ Σ. Show that

A = ⋃_{i∈I} Πi

where I = {i = 1, . . . , k ; A ∩ Πi ≠ ∅}.


Chapter 2

Measurable Functions

We now restrict ourselves to studying a particular kind of function, known as a measurable function. For measure theory, this is an important step, because it allows us to exclude some very strangely behaved examples (in the style of Section 1.5) that would disrupt our theory.

We will see in Chapter 4 that measurable functions also play a huge role in probability theory, where they provide a mechanism for identifying which random variables depend on which information.

2.1 Liminf and Limsup

This section introduces some important tools from analysis which there wasn't time to cover in MAS221. Let (an) be a sequence of real numbers. It may or may not converge. For example the sequence whose nth term is (−1)^n fails to converge, but it does have two convergent subsequences, corresponding to a_{2n−1} = −1 and a_{2n} = 1. This is a very special case of a general phenomenon that we'll now describe.

Assume that the sequence (an) is bounded, i.e. there exists K > 0 so that |an| ≤ K for all n ∈ N. Define a new sequence (bn) by bn = inf_{k≥n} ak. Then you can check that (bn) is monotonic increasing and bounded above (by K). Hence it converges to a limit.

In fact, we are fine if (an) is not bounded. In this case (bn) is still monotone increasing, but we now have the extra possibilities that bn → ∞ or bn → −∞. Indeed, we might even have bn = ±∞ for some or all finite n.

We define lim inf_{n→∞} an = lim_{n→∞} bn, where bn = inf_{k≥n} ak. We call this the limit inferior, or liminf, of the sequence (an). So we have

lim inf_{n→∞} an = lim_{n→∞} inf_{k≥n} ak = sup_{n∈N} inf_{k≥n} ak.    (2.1)

Similarly, the sequence (cn), where cn = sup_{k≥n} ak, is monotonic decreasing and bounded below. So it also converges to a limit, which we call the limit superior, or limsup for short. We denote lim sup_{n→∞} an = lim_{n→∞} cn. Then we have

lim sup_{n→∞} an = lim_{n→∞} sup_{k≥n} ak = inf_{n∈N} sup_{k≥n} ak.    (2.2)
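As an added numerical illustration (not part of the notes), lim inf and lim sup can be approximated from a long finite prefix of a sequence. Here the arbitrary example is an = (−1)^n (1 + 1/n), whose liminf is −1 and whose limsup is 1.

    # Illustration only: approximating lim inf and lim sup from a finite prefix
    # of a_n = (-1)^n (1 + 1/n), which has lim inf = -1 and lim sup = 1.
    N = 10_000
    a = [(-1)**n * (1 + 1/n) for n in range(1, N + 1)]

    # b_n = inf_{k >= n} a_k and c_n = sup_{k >= n} a_k, computed over the tail of the prefix.
    tail_inf = min(a[N // 2:])   # ≈ lim inf, taking n = N/2 as "large"
    tail_sup = max(a[N // 2:])   # ≈ lim sup

    print(tail_inf, tail_sup)    # ≈ -1.0002, ≈ 1.0002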

Clearly we have lim inf_{n→∞} an ≤ lim sup_{n→∞} an. In Problem 2.2, you can investigate some other properties of lim sup and lim inf. It can be shown that the smallest limit of any convergent subsequence of (an) is lim inf_{n→∞} an, and the largest limit is lim sup_{n→∞} an. The next theorem is very useful:

Theorem 2.1.1 A bounded sequence of real numbers (an) converges to a limit if and only if lim inf_{n→∞} an = lim sup_{n→∞} an. In this case we have

lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an.

Proof: If (an) converges to a limit, then all of its subsequences also converge to the same limit and it follows that lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an. Conversely, suppose that we don't know that (an) converges but we do know that lim inf_{n→∞} an = lim sup_{n→∞} an. Then for all n ∈ N,

0 ≤ an − inf_{k≥n} ak ≤ sup_{k≥n} ak − inf_{k≥n} ak.

But

lim_{n→∞} (sup_{k≥n} ak − inf_{k≥n} ak) = lim sup_{n→∞} an − lim inf_{n→∞} an = 0

and so

lim_{n→∞} (an − inf_{k≥n} ak) = 0

by the sandwich rule. But since

an = (an − inf_{k≥n} ak) + inf_{k≥n} ak,

and lim_{n→∞} inf_{k≥n} ak = lim inf_{n→∞} an, we can use the algebra of limits to deduce that (an) converges to the common value of lim inf_{n→∞} an and lim sup_{n→∞} an.


2.2 Measurable Functions - Basic Concepts

We begin with some motivation from probability. Let (Ω, F, P) be a probability space. When we first study probability, we learn that random variables should be considered as mappings from Ω to R. But is this enough for a rigorous mathematical theory? In practice we are interested in calculating probabilities such as Prob(X > a) where a ∈ R. What does this mean in terms of the measure P? We must have

Prob(X > a) = P({ω ∈ Ω ; X(ω) ∈ (a, ∞)}) = P(X−1((a, ∞))).

Now X−1((a, ∞)) ⊆ Ω; however, P only makes sense when applied to sets in F. So we conclude that P(X > a) only makes sense if we impose an additional condition on the mapping X, namely that X−1((a, ∞)) ∈ F for all a ∈ R. This property is precisely what we mean by measurability.

In fact, let (S, Σ) be an arbitrary measurable space. A mapping f : S → R is said to be measurable if f−1((a, ∞)) ∈ Σ for all a ∈ R. So in particular, we define a random variable on a probability space to be a measurable mapping from Ω to R.

Theorem 2.2.1 Let f : S → R be a mapping. The following are equivalent:

(i) f−1((a,∞)) ∈ Σ for all a ∈ R.

(ii) f−1([a,∞)) ∈ Σ for all a ∈ R.

(iii) f−1((−∞, a)) ∈ Σ for all a ∈ R.

(iv) f−1((−∞, a]) ∈ Σ for all a ∈ R.

Proof: (i) ⇔ (iv) as (f−1(A))c = f−1(Ac) and Σ is closed under taking complements. (ii) ⇔ (iii) is proved similarly. (i) ⇒ (ii) uses [a, ∞) = ⋂_{n=1}^∞ (a − 1/n, ∞) and so

f−1([a, ∞)) = ⋂_{n=1}^∞ f−1((a − 1/n, ∞))

and the result follows since Σ is closed under countable intersections. (ii) ⇒ (i) uses

f−1((a, ∞)) = ⋃_{n=1}^∞ f−1([a + 1/n, ∞))

and the fact that Σ is closed under countable unions.

It follows that f is measurable if any of (i) to (iv) in Theorem 2.2.1 is established for all a ∈ R. In Problem 2.4 you can show that f is measurable if and only if f−1((a, b)) ∈ Σ for all −∞ ≤ a < b ≤ ∞.

A set O in R is open if for every x ∈ O there is an open interval I containing x for which I ⊆ O. It follows that every open interval in R is an open set. We might ask what other kinds of open set there are. The following result gives a surprisingly clear answer.

Proposition 2.2.2 Every open set O in R is a countable union of disjoint open intervals.


Proof: Note that a 'countable union' includes the case where we only need finitely many intervals.

For x ∈ O, let Ix be the largest open interval containing x for which Ix ⊆ O. If x, y ∈ O and x ≠ y then Ix and Iy are either disjoint or identical, for if they have a non-empty intersection their union is an open interval, contained in O, that contains both x and y; by maximality this forces Ix = Ix ∪ Iy = Iy, so they coincide. Clearly O = ⋃_{x∈O} Ix. We now select a rational number r(x) in every interval Ix and rewrite O as the countable disjoint union over intervals Ix labelled by distinct rationals r(x).

From Proposition 2.2.2 we see that if O is an open set in R then O ∈ B(R).

Theorem 2.2.3 The mapping f : S → R is measurable if and only if f−1(O) ∈ Σ for all open sets O in R.

Proof: Suppose that f−1(O) ∈ Σ for all open sets O in R. Then in particular f−1((a, ∞)) ∈ Σ for all a ∈ R and so f is measurable. Conversely, assume that O is open in R and use Proposition 2.2.2 to write O = ⋃_{n=1}^∞ (an, bn). Then

f−1(O) = ⋃_{n=1}^∞ f−1((an, bn)).

If f is measurable, then f−1((an, bn)) ∈ Σ for all n ∈ N by Problem 2.4, and so f−1(O) ∈ Σ since Σ is closed under countable unions.

We now present an equivalent, but more general result than Theorem 2.2.3.

Theorem 2.2.4 The mapping f : S → R is measurable if and only if f−1(A) ∈ Σ for all A ∈ B(R).

Proof: Suppose that f is measurable and let A = {E ⊆ R ; f−1(E) ∈ Σ}. We first show that A is a σ-field.

S(i). R ∈ A as S = f−1(R).
S(iii). If E ∈ A then Ec ∈ A since f−1(Ec) = (f−1(E))c ∈ Σ.
S(ii). If (En) is a sequence of sets in A then ⋃_{n∈N} En ∈ A since f−1(⋃_{n∈N} En) = ⋃_{n∈N} f−1(En) ∈ Σ.

By Problem 2.4, f−1((a, b)) ∈ Σ for all −∞ ≤ a < b ≤ ∞, and so A is a σ-field of subsets of R that contains all the open intervals. But by definition, B(R) is the smallest such σ-field. It follows that B(R) ⊆ A and so f−1(A) ∈ Σ for all A ∈ B(R).

The converse is easy (e.g. just allow A to range over open sets, and use Theorem 2.2.3).

Theorem 2.2.4 leads to the following important extension of the idea of a measurable function: Let (S1, Σ1) and (S2, Σ2) be measurable spaces. The mapping f : S1 → S2 is measurable if f−1(A) ∈ Σ1 for all A ∈ Σ2.

Let (S, Σ, m) be a measure space and f : S → R be a measurable function. It is easy to see that the mapping mf = m ∘ f−1 is a measure on (R, B(R)). Indeed mf(∅) = 0 is obvious, and if (An) is a sequence of disjoint sets in B(R) we have

mf(⋃_{n=1}^∞ An) = m(f−1(⋃_{n=1}^∞ An)) = m(⋃_{n=1}^∞ f−1(An)) = ∑_{n=1}^∞ m(f−1(An)) = ∑_{n=1}^∞ mf(An),

where we use the fact that for n ≠ k, f−1(An) ∩ f−1(Ak) = f−1(An ∩ Ak) = f−1(∅) = ∅.

The measure mf is called the pushforward of m by f. In the case of a probability space (Ω, F, P) and a random variable X : Ω → R, the pushforward is usually denoted pX. It is a probability measure on (R, B(R)) (you should check that it has total mass 1) and is called the probability law or probability distribution of the random variable X.
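For discrete measures the pushforward can be computed directly. The sketch below is an added illustration (the probability space and the random variable X are arbitrary choices, not from the notes): it evaluates pX(A) = P(X−1(A)) for a fair die and X(ω) = ω mod 2.

    # Illustration only: the pushforward (law) p_X(A) = P(X^{-1}(A)) of a
    # discrete probability measure under a random variable X.
    Omega = {1, 2, 3, 4, 5, 6}                 # a fair die
    P = {w: 1/6 for w in Omega}
    X = lambda w: w % 2                        # X = 1 if the roll is odd, else 0

    def prob(E):
        return sum(P[w] for w in E)

    def law_X(A):
        """p_X(A) = P({ω ∈ Ω : X(ω) ∈ A})."""
        return prob({w for w in Omega if X(w) in A})

    assert abs(law_X({0, 1}) - 1.0) < 1e-12    # total mass 1
    print(law_X({1}))                          # ≈ 0.5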


2.3 Examples of Measurable Functions

We first consider the case where S = R (equipped with its Borel σ-field) and look for classes of measurable functions. In fact we will prove that

{continuous functions on R} ⊆ {measurable functions on R}.

First we present a result that is well-known (in the wider context of continuous functions on metric spaces) to those who have taken MAS331.

Proposition 2.3.1 A mapping f : R → R is continuous if and only if f−1(O) is open for every open set O in R.

Proof: First suppose that f is continuous. Choose an open set O and let a ∈ f−1(O), so that f(a) ∈ O. Then there exists ε > 0 so that (f(a) − ε, f(a) + ε) ⊆ O. By definition of continuity of f, for such an ε there exists δ > 0 so that x ∈ (a − δ, a + δ) ⇒ f(x) ∈ (f(a) − ε, f(a) + ε). But this tells us that (a − δ, a + δ) ⊆ f−1((f(a) − ε, f(a) + ε)) ⊆ f−1(O). Since a is arbitrary we conclude that f−1(O) is open. Conversely, suppose that f−1(O) is open for every open set O in R. Choose a ∈ R and let ε > 0. Then since (f(a) − ε, f(a) + ε) is open, so is f−1((f(a) − ε, f(a) + ε)). Since a ∈ f−1((f(a) − ε, f(a) + ε)) there exists δ > 0 so that (a − δ, a + δ) ⊆ f−1((f(a) − ε, f(a) + ε)). From here you can see that whenever |x − a| < δ we must have |f(x) − f(a)| < ε. But then f is continuous at a and the result follows.

Corollary 2.3.2 Every continuous function on R is measurable.

Proof: Let f : R → R be continuous and O be an arbitrary open set in R. Then by Proposition 2.3.1, f−1(O) is an open set in R. Then f−1(O) is in B(R) by the remark after Proposition 2.2.2. Hence f is measurable by Theorem 2.2.3.

There are many discontinuous functions on R that are also measurable. Let's look at an important class of examples in a wider context. Let (S, Σ) be a general measurable space. Fix A ∈ Σ and define the indicator function 1A : S → R by

1A(x) = 1 if x ∈ A,    1A(x) = 0 if x ∉ A.

To see that it is measurable, it's enough to check that

(1A)−1((c, ∞)) = ∅ ∈ Σ if c ≥ 1,
(1A)−1((c, ∞)) = A ∈ Σ if 0 ≤ c < 1,
(1A)−1((c, ∞)) = S ∈ Σ if c < 0.

If (S, Σ) = (R, B(R)), or indeed if S is any metric space, then 1A is a measurable function which is, in general, discontinuous.

A particularly interesting example is obtained by taking (S, Σ) = (R, B(R)) and A = Q. Then 1A is called Dirichlet's jump function. We have already seen that Q is measurable (it is a countable union of points). As there is a rational number between any pair of irrationals and an irrational number between any pair of rationals, we see that in this case 1A is measurable, but discontinuous at every point of R.

A measurable function from (R,B(R)) to (R,B(R)) is sometimes called Borel measurable.


2.4 Algebra of Measurable Functions

One of our goals in this (and the next) section is to show that sums, products, limits etc. of measurable functions are themselves measurable. Throughout this section, (S, Σ) is a measurable space.

Let f and g be functions from S to R and define, for all x ∈ S,

(f ∨ g)(x) = max{f(x), g(x)},    (f ∧ g)(x) = min{f(x), g(x)}.

Proposition 2.4.1 If f and g are measurable then so are f ∨ g and f ∧ g.

Proof: This follows immediately from the facts that for all c ∈ R,

(f ∨ g)−1((c,∞)) = f−1((c,∞)) ∪ g−1((c,∞))

and (f ∧ g)−1((c,∞)) = f−1((c,∞)) ∩ g−1((c,∞))

Let −f be the function (−f)(x) = −f(x) for all x ∈ S. If f is measurable it is easily checked that −f also is (take k = −1 in Problem 2.5).

Let 0 denote the zero function that maps every element of S to zero, i.e. 0 = 1∅. Then 0 is measurable since it is the indicator function of a measurable set (or use Problem 2.3).

Define f+ = f ∨ 0 and f− = (−f) ∨ 0, so that for all x ∈ S,

f+(x) = f(x) if f(x) ≥ 0, f+(x) = 0 if f(x) < 0;    f−(x) = −f(x) if f(x) ≤ 0, f−(x) = 0 if f(x) > 0.

Corollary 2.4.2 If f is measurable then so are f+ and f−.

Now define the set {f > g} = {x ∈ S ; f(x) > g(x)}.

Proposition 2.4.3 If f and g are measurable then {f > g} ∈ Σ.

Proof: Let {rn ; n ∈ N} be an enumeration of the rational numbers. Then

{f > g} = ⋃_{n∈N} {f > rn > g} = ⋃_{n∈N} ({f > rn} ∩ {g < rn}) = ⋃_{n∈N} (f−1((rn, ∞)) ∩ g−1((−∞, rn))) ∈ Σ.

Theorem 2.4.4 If f and g are measurable then so is f + g.

Proof: By Problem 2.5, we see that a − g is measurable for all a ∈ R. Now

(f + g)−1((a, ∞)) = {f + g > a} = {f > a − g} ∈ Σ,

by Proposition 2.4.3, and this establishes the result.


You can use induction to show that if f1, f2, . . . , fn are measurable and c1, c2, . . . , cn ∈ R then f is also measurable, where f = c1 f1 + c2 f2 + · · · + cn fn. So the set of measurable functions from S to R forms a real vector space. Of particular interest are the simple functions, which take the form f = ∑_{i=1}^n ci 1_{Ai} where Ai ∈ Σ (1 ≤ i ≤ n). We will learn more about this highly useful class of functions in the next section, and in Chapter 3 we will see that they play an important role in integration theory.

Theorem 2.4.5 If f : S → R is measurable and G : R → R is continuous then G ∘ f is measurable from S to R.

Proof: For all a ∈ R let Oa = G−1((a, ∞)). Then since G is continuous, Oa is an open set in R. Then since for any subset A of R, (G ∘ f)−1(A) = f−1(G−1(A)), we have

(G ∘ f)−1((a, ∞)) = f−1(G−1((a, ∞))) = f−1(Oa) ∈ Σ,

by Theorem 2.2.3. The result follows.

Theorem 2.4.6 If f and g are measurable then so is fg.

Proof: Apply Theorem 2.4.5 with G(x) = x^2 to deduce that h^2 is measurable whenever h is. But

fg = (1/4)[(f + g)^2 − (f − g)^2]

and the result follows by using Theorem 2.4.4.


Limits of Measurable Functions

Let (f_n) be a bounded sequence of functions from S to R, by which we mean that sup_{n∈N} sup_{x∈S} |f_n(x)| < ∞. Define inf_{n∈N} f_n and sup_{n∈N} f_n by

(inf_{n∈N} f_n)(x) = inf_{n∈N} f_n(x)   and   (sup_{n∈N} f_n)(x) = sup_{n∈N} f_n(x)

for all x ∈ S.¹

Proposition 2.4.7 If f_n is measurable for all n ∈ N then inf_{n∈N} f_n and sup_{n∈N} f_n are both measurable.

Proof: For all c ∈ R,

{inf_{n∈N} f_n ≥ c} = ⋂_{n∈N} {f_n ≥ c} ∈ Σ   and   {sup_{n∈N} f_n > c} = ⋃_{n∈N} {f_n > c} ∈ Σ,

and the result follows since, by Theorem 2.2.1, measurability may be checked using preimages of sets of the form [c,∞) as well as (c,∞).

We define lim inf_{n→∞} f_n and lim sup_{n→∞} f_n by

(lim inf_{n→∞} f_n)(x) = lim inf_{n→∞} f_n(x)   and   (lim sup_{n→∞} f_n)(x) = lim sup_{n→∞} f_n(x)

for all x ∈ S.

Theorem 2.4.8 If f_n is measurable for all n ∈ N then lim inf_{n→∞} f_n and lim sup_{n→∞} f_n are both measurable.

Proof: By Proposition 2.4.7, inf_{k≥n} f_k and sup_{k≥n} f_k are measurable for each n ∈ N. Then by Proposition 2.4.7 again, lim inf_{n→∞} f_n = sup_{n∈N} inf_{k≥n} f_k and lim sup_{n→∞} f_n = inf_{n∈N} sup_{k≥n} f_k are measurable.

We say that the sequence (f_n) converges pointwise to f as n → ∞ if lim_{n→∞} f_n(x) = f(x) for all x ∈ S.

Theorem 2.4.9 If f_n is measurable for all n ∈ N and (f_n) converges pointwise to f as n → ∞, then f is measurable.

Proof: By Theorem 2.1.1, f(x) = lim inf_{n→∞} f_n(x) for all x ∈ S, and so f is measurable by Theorem 2.4.8.

¹ We can drop the boundedness requirement if we work with functions taking values in [−∞,∞]. See Section 2.6.


2.5 Simple Functions

Recall the definition of indicator functions 1_A where A ∈ Σ. A mapping f : S → R is said to be simple if it takes the form

f = ∑_{i=1}^n c_i 1_{A_i}   (2.3)

where c_1, c_2, . . . , c_n ∈ R and A_1, A_2, . . . , A_n ∈ Σ with ⋃_{i=1}^n A_i = S and A_i ∩ A_j = ∅ when i ≠ j.

In other words, a simple function is a (finite) linear combination of indicator functions of non-overlapping sets. It follows from Theorem 2.4.4 that every simple function is measurable. It is straightforward to prove that sums and scalar multiples of simple functions are themselves simple, so the set of all simple functions forms a vector space.

We now prove a key result that shows that simple functions are powerful tools for approximating measurable functions. Recall that a mapping f : S → R is non-negative if f(x) ≥ 0 for all x ∈ S, which we write for short as f ≥ 0. We write f ≤ g when g − f ≥ 0. It is easy to see that a simple function of the form (2.3) (with A_i ≠ ∅ for all i = 1, . . . , n) is non-negative if and only if c_i ≥ 0 (1 ≤ i ≤ n).

Theorem 2.5.1 Let f : S → R be measurable and non-negative. Then there exists a sequence (s_n) of non-negative simple functions on S with s_n ≤ s_{n+1} ≤ f for all n ∈ N so that (s_n) converges pointwise to f as n → ∞. Moreover, if f is bounded then the convergence is uniform.

Proof: We split this into three parts.

Step 1: Construction of (s_n).
Divide the interval [0, n) into n2^n subintervals I_j, 1 ≤ j ≤ n2^n, each of length 1/2^n, by taking I_j = [(j−1)/2^n, j/2^n). Let E_j = f^{-1}(I_j) and F_n = f^{-1}([n,∞)). Then S = ⋃_{j=1}^{n2^n} E_j ∪ F_n. We define, for all x ∈ S,

s_n(x) = ∑_{j=1}^{n2^n} ((j−1)/2^n) 1_{E_j}(x) + n 1_{F_n}(x).

Step 2: Properties of (s_n).
For x ∈ E_j, s_n(x) = (j−1)/2^n and (j−1)/2^n ≤ f(x) < j/2^n, so s_n(x) ≤ f(x). For x ∈ F_n, s_n(x) = n and f(x) ≥ n. So we conclude that s_n ≤ f for all n ∈ N.
To show that s_n ≤ s_{n+1}, fix an arbitrary j and consider I_j = [(j−1)/2^n, j/2^n). For convenience, we write I_j as I and observe that I = I_1 ∪ I_2, where I_1 = [(2j−2)/2^{n+1}, (2j−1)/2^{n+1}) and I_2 = [(2j−1)/2^{n+1}, 2j/2^{n+1}). Let E = f^{-1}(I), E_1 = f^{-1}(I_1) and E_2 = f^{-1}(I_2). Then s_n(x) = (j−1)/2^n for all x ∈ E, s_{n+1}(x) = (j−1)/2^n for all x ∈ E_1, and s_{n+1}(x) = (2j−1)/2^{n+1} for all x ∈ E_2. It follows that s_n ≤ s_{n+1} on E. A similar (easier) argument can be used on F_n.

Step 3: Convergence of (s_n).
Fix any x ∈ S. Since f(x) ∈ R there exists n_0 ∈ N so that f(x) ≤ n_0. Then for each n > n_0, f(x) ∈ I_j for some 1 ≤ j ≤ n2^n, i.e. (j−1)/2^n ≤ f(x) < j/2^n. But s_n(x) = (j−1)/2^n and so |f(x) − s_n(x)| < 1/2^n, and the result follows. If f is bounded we can find n_0 ∈ N so that f(x) ≤ n_0 for all x ∈ S. Then the argument just given yields |f(x) − s_n(x)| < 1/2^n for all x ∈ S and all n > n_0, from which we can deduce the uniformity of the convergence.
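The construction above is easy to experiment with numerically. Here is a minimal Python sketch (not part of the notes) of the dyadic approximation s_n from the proof, for the illustrative choice S = [0, 2] with f(x) = x² evaluated on a grid; it checks that s_n ≤ s_{n+1} ≤ f and that sup|f − s_n| shrinks (f is bounded here, so the convergence is uniform).

```python
# Minimal sketch of the dyadic approximation s_n from Theorem 2.5.1,
# for the illustrative choice f(x) = x^2 on [0, 2].
import numpy as np

def s_n(f_vals, n):
    """s_n = floor(2^n f)/2^n where f < n, and n where f >= n."""
    return np.minimum(np.floor(f_vals * 2**n) / 2**n, n)

x = np.linspace(0.0, 2.0, 1001)
f_vals = x**2

prev = None
for n in range(1, 8):
    sn = s_n(f_vals, n)
    assert np.all(sn <= f_vals + 1e-12)        # s_n <= f
    if prev is not None:
        assert np.all(prev <= sn + 1e-12)      # s_n <= s_{n+1}
    prev = sn
    print(n, float(np.max(f_vals - sn)))       # sup |f - s_n| decreases to 0
```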


2.6 Extended Real Functions

Let (S,Σ,m) be a measure space. An extended function on S is a mapping f : S → R*, where R* = {−∞} ∪ R ∪ {∞} is the extended real number line. Theorem 2.2.1 extends easily to this context and we have f^{-1}((a,∞]) ∈ Σ for all a ∈ R, if and only if f^{-1}([a,∞]) ∈ Σ for all a ∈ R, if and only if f^{-1}([−∞, a)) ∈ Σ for all a ∈ R, if and only if f^{-1}([−∞, a]) ∈ Σ for all a ∈ R. We then say that f is measurable if it satisfies any one (and hence all) of these conditions. Now suppose that (f_n) is a sequence of measurable functions from S to [0,∞). If the functions are not bounded, then there may exist a set A ∈ Σ with m(A) > 0 so that lim inf_{n→∞} f_n(x) = ∞ for all x ∈ A. We may then regard lim inf_{n→∞} f_n as an extended measurable function in the sense just given. This will be used implicitly in the integration theory that we'll describe in the next chapter.


2.7 Exercises

2.1 Let (S,Σ) be a measurable space. Show that for all A,B ∈ Σ

(a) 1_{A∪B} = 1_A + 1_B − 1_{A∩B},
(b) 1_{A^c} = 1 − 1_A,
(c) 1_{A−B} = 1_A − 1_B, if B ⊆ A,
(d) 1_{A∩B} = 1_A 1_B.

Furthermore, if (A_n) is a sequence of disjoint sets in Σ and A = ⋃_{n=1}^∞ A_n, show that 1_A = ∑_{n=1}^∞ 1_{A_n}.
Hint: In (a) consider what happens to both sides in each of the four cases: x ∈ A, x ∈ B; x ∈ A, x ∉ B, etc.

2.2 Let (an) and (bn) be bounded sequences of real numbers. Show that

(a) lim sup_{n→∞} a_n = − lim inf_{n→∞}(−a_n),
(b) lim sup_{n→∞}(a_n + b_n) ≤ lim sup_{n→∞} a_n + lim sup_{n→∞} b_n,
(c) lim inf_{n→∞}(a_n + b_n) ≥ lim inf_{n→∞} a_n + lim inf_{n→∞} b_n,
(d) if a_n, b_n ≥ 0 for all n ∈ N, lim sup_{n→∞}(a_n b_n) ≤ (lim sup_{n→∞} a_n)(lim sup_{n→∞} b_n),
(e) if a_n, b_n ≥ 0 for all n ∈ N, lim inf_{n→∞}(a_n b_n) ≥ (lim inf_{n→∞} a_n)(lim inf_{n→∞} b_n),
(f) lim sup_{n→∞} |a_n| = 0 ⇒ (a_n) converges to 0.

2.3 Let (S,Σ) be a measurable space and f : S → R be a constant function, i.e. there exists c ∈ R so that f(x) = c for all x ∈ S. Show that f is measurable.

2.4 Let (S,Σ) be a measurable space and f : S → R. Show that f is measurable if and only if f^{-1}((a, b)) ∈ Σ for all −∞ ≤ a < b ≤ ∞.

2.5 Let (S,Σ) be a measurable space and f : S → R be a measurable function.

(a) Show that g = f + c is measurable, where c ∈ R is fixed,
(b) Show that g = kf is measurable, where k ∈ R is fixed.

2.6 Let (S,Σ) be a measurable space and f : S → R be a measurable function. If g : R → R is Borel measurable, show that g ∘ f is measurable from S to R. What does this result tell us about functions of random variables in probability theory?

2.7 Let f : R → R be Borel measurable. Show that the mapping h : R → R is measurable,where h(x) = f(x+ y) for all x ∈ R, and where y ∈ R is fixed.

2.8 Let (S,Σ) be a measurable space and f : S → R be a function. Define the function |f| : S → R by |f|(x) = |f(x)| for all x ∈ S. Show that

(a) f = f^+ − f^-,
(b) |f| = f^+ + f^-,
(c) if f is measurable then |f| is also measurable.


2.9 Let f : R → R be a differentiable function. Explain why both f and its derivative f′ are measurable functions.

2.10 Let f : R→ R be monotonic increasing. Show that it is measurable.

2.11 Let (S,Σ,m) be a measure space and (f_n) be a sequence of measurable functions from S to R. Let f : S → R be measurable. We say that f_n → f almost everywhere, or (a.e.) for short, if lim_{n→∞} f_n(x) = f(x) for all x ∈ S − A where m(A) = 0. Show that if f_n → f (a.e.) and g_n → g (a.e.) then

(a) f_n² → f² (a.e.),
(b) f_n + g_n → f + g (a.e.),
(c) f_n g_n → fg (a.e.)

Challenge questions

2.12 A function f : R → R is said to be upper-semicontinuous at x ∈ R if, given any ε > 0, there exists δ > 0 so that f(y) < f(x) + ε whenever |x − y| < δ.

(a) Show that f = 1_{[a,∞)} (where a ∈ R) is upper-semicontinuous for all x ∈ R,
(b) Deduce that the floor function f(x) = ⌊x⌋, which is equal to the greatest integer less than or equal to x, is upper-semicontinuous at all x ∈ R,
(c) Show that if f is upper-semicontinuous for all x ∈ R then f is measurable.


Chapter 3

Lebesgue Integration

3.1 Introduction

The concept of integration as a technique that both acts as an inverse to the operation of differentiation and also computes areas under curves goes back to the origin of the calculus and the work of Isaac Newton (1643-1727) and Gottfried Leibniz (1646-1716). It was Leibniz who introduced the ∫ · · · dx notation. The first rigorous attempt to understand integration as a limiting operation within the spirit of analysis was due to Bernhard Riemann (1826-1866). The approach to Riemann integration that is often taught (as in MAS221) was developed by Jean-Gaston Darboux (1842-1917). At the time it was developed, this theory seemed to be all that was needed, but as the 19th century drew to a close, some problems appeared:

• One of the main tasks of integration is to recover a function f from its derivative f′. But some functions were discovered for which f′ existed and was bounded, but was not Riemann integrable.

• Suppose (f_n) is a sequence of functions converging pointwise to f. The Riemann integral could not be used to find conditions under which

∫ f(x)dx = lim_{n→∞} ∫ f_n(x)dx.   (3.1)

Problem 3.16 illustrates some of the difficulties here; it gives an example of f_n, f such that f_n(x) → f(x) for all x, but in which (3.1) fails.

• Riemann integration was limited to computing integrals over R^n with respect to Lebesgue measure. Although it was not yet apparent, the emerging theory of probability would require the calculation of expectations of random variables X: E(X) = ∫_Ω X(ω)dP(ω).

A new approach to integration was needed. In this chapter we'll study Lebesgue integration, which allows us to investigate ∫_S f(x)dm(x), where f : S → R is a "suitable" measurable function defined on a general measure space (S,Σ,m).¹ If we take m to be Lebesgue measure on (R,B(R)) we recover the familiar integral ∫_R f(x)dx, but we will now be able to integrate many more functions (at least in principle) than Riemann and Darboux could. If we take X to be a random variable on a probability space, we get its expectation E(X).

¹ We may also integrate extended measurable functions, but will not develop that here.


Notation. For simplicity we usually write ∫_S f dm instead of ∫_S f(x)dm(x). To simplify even further we'll sometimes write I(f) = ∫_S f dm. Note that many authors use ∫_S f(x)m(dx), with the same meaning. In French textbooks they often write ∫_S dm f.


3.2 The Lebesgue Integral for Simple Functions

We'll present the construction of the Lebesgue integral in four steps: Step 1: Indicator functions, Step 2: Simple functions, Step 3: Non-negative measurable functions, Step 4: Integrable functions. The first two steps begin here.

Step 1. Indicator Functions
This is very easy and yet it is very important: if f = 1_A where A ∈ Σ, then

∫_S 1_A dm = m(A).   (3.2)

For example, in a probability space we get E(1_A) = P(A).

Step 2. Simple Functions
Let f = ∑_{i=1}^n c_i 1_{A_i} be a non-negative simple function, so that c_i ≥ 0 for all 1 ≤ i ≤ n. We extend (3.2) by linearity, i.e. we define

∫_S f dm = ∑_{i=1}^n c_i m(A_i),   (3.3)

and note that ∫_S f dm ∈ [0,∞].
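As a quick illustration of definition (3.3), here is a small Python sketch (not part of the notes) that computes ∫_S f dm for a simple function on R, taking m to be Lebesgue measure and each A_i a finite disjoint union of intervals; the particular function, sets and helper name below are illustrative choices.

```python
# Sketch of definition (3.3) for a simple function on R with Lebesgue measure.

def lebesgue_measure(intervals):
    """Lebesgue measure of a disjoint union of intervals [(a, b), ...]."""
    return sum(b - a for a, b in intervals)

# f = 2 * 1_[0,1) + 5 * 1_[1,3)  (and 0 elsewhere)
simple_f = [
    (2.0, [(0.0, 1.0)]),   # (c_i, A_i)
    (5.0, [(1.0, 3.0)]),
]

integral = sum(c * lebesgue_measure(A) for c, A in simple_f)
print(integral)   # 2*1 + 5*2 = 12.0
```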

Theorem 3.2.1 If f and g are non-negative simple functions and α, β ≥ 0 then

1. ("Linearity") ∫_S(αf + βg)dm = α ∫_S f dm + β ∫_S g dm,

2. (Monotonicity) If f ≤ g then ∫_S f dm ≤ ∫_S g dm.

Proof:

1. Let f = ∑_{i=1}^n c_i 1_{A_i} and g = ∑_{j=1}^m d_j 1_{B_j}. Since ⋃_{i=1}^n A_i = ⋃_{j=1}^m B_j = S, we have

f = ∑_{i=1}^n c_i 1_{A_i ∩ S} = ∑_{i=1}^n c_i 1_{A_i ∩ ⋃_{j=1}^m B_j} = ∑_{i=1}^n ∑_{j=1}^m c_i 1_{A_i ∩ B_j}.

Here, the last equality follows by Problem 2.1 part (d). It follows that

αf + βg = ∑_{i=1}^n ∑_{j=1}^m (αc_i + βd_j) 1_{A_i ∩ B_j}.

Thus

I(αf + βg) = ∑_{i=1}^n ∑_{j=1}^m (αc_i + βd_j) m(A_i ∩ B_j)
= α ∑_{i=1}^n c_i ∑_{j=1}^m m(A_i ∩ B_j) + β ∑_{j=1}^m d_j ∑_{i=1}^n m(A_i ∩ B_j)
= α ∑_{i=1}^n c_i m(A_i ∩ ⋃_{j=1}^m B_j) + β ∑_{j=1}^m d_j m(⋃_{i=1}^n A_i ∩ B_j)
= α ∑_{i=1}^n c_i m(A_i ∩ S) + β ∑_{j=1}^m d_j m(B_j ∩ S)
= α ∑_{i=1}^n c_i m(A_i) + β ∑_{j=1}^m d_j m(B_j)
= α I(f) + β I(g).

Here, to deduce the third line we again use Problem 2.1, together with the additivity of m over the disjoint sets B_j (respectively A_i).

2. By (1), I(g) = I(f) + I(g − f), but g − f is a non-negative simple function and so I(g − f) ≥ 0. The result follows.

Notation. If A ∈ Σ, whenever ∫_S f dm makes sense for some "reasonable" measurable function f : S → R we define

I_A(f) = ∫_A f dm = ∫_S 1_A f dm.

Of course there is no guarantee that I_A(f) makes sense, and this needs checking at each stage. In Problem 3.2 you can check that it makes sense when f is non-negative and simple.

3.3 The Lebesgue Integral for Non-negative Measurable Functions

We haven't done any analysis yet, and at some stage we surely need to take some sort of limit! If f is measurable and non-negative, it may seem attractive to try to take advantage of Theorem 2.5.1 and define "∫_S f dm = lim_{n→∞} ∫_S s_n dm". But there are many different choices of simple functions that we could take to make an approximating sequence, and this risks making the limiting integral depend on that choice, which is undesirable. Instead, Lebesgue used the weaker notion of the supremum to "approximate f from below", as follows:

Step 3. Non-negative measurable functions

∫_S f dm = sup { ∫_S s dm ; s simple, 0 ≤ s ≤ f }.   (3.4)

With this definition, ∫_S f dm ∈ [0,∞]; we allow the possibility that this integral may be equal to +∞. The set over which we take the supremum is non-empty by Theorem 2.5.1.

The use of the sup makes it harder to prove key properties, and we'll have to postpone a full proof of linearity until the next section, when we have some more powerful tools. Here are some simple properties that can be proved fairly easily.

Theorem 3.3.1 If f, g : S → R are non-negative measurable functions, then:

1. (Monotonicity) If f ≤ g then ∫_S f dm ≤ ∫_S g dm,

2. I(αf) = αI(f) for all α > 0,

3. If A, B ∈ Σ with A ⊆ B then I_A(f) ≤ I_B(f),

4. If A ∈ Σ with m(A) = 0 then I_A(f) = 0.

Proof: For (1), note that every simple function s with 0 ≤ s ≤ f also satisfies 0 ≤ s ≤ g, so the first supremum below is taken over a subset of the second:

∫_S f dm = sup { ∫_S s dm ; s is simple, 0 ≤ s ≤ f } ≤ sup { ∫_S s dm ; s is simple, 0 ≤ s ≤ g } = ∫_S g dm.


Parts (2), (3) and (4) are Problem 3.3.

Lemma 3.3.2 (Markov's inequality) If f : S → R is a non-negative measurable function and c > 0, then

m({x ∈ S; f(x) ≥ c}) ≤ (1/c) ∫_S f dm.

Proof: Let E = {x ∈ S; f(x) ≥ c}. Note that E = f^{-1}([c,∞)) ∈ Σ as f is measurable (see Theorem 2.2.1 (ii)). By Theorem 3.3.1 (3) and (2),

∫_S f dm ≥ ∫_E f dm ≥ ∫_E c dm = ∫_S c 1_E dm = c m(E),

and the result follows.
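Markov's inequality is easy to check by simulation. The following Python sketch (not part of the notes) takes the measure to be the law of an Exponential(1) random variable, approximated by the empirical measure of a large sample, and compares m({f ≥ c}) with (1/c) ∫_S f dm for a few values of c; the distribution and sample size are illustrative choices.

```python
# Numerical sanity check of Markov's inequality with f = X, X ~ Exponential(1), E(X) = 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10**6)   # samples from the measure

for c in (0.5, 1.0, 2.0, 5.0):
    lhs = np.mean(x >= c)          # m({f >= c}) under the empirical measure
    rhs = np.mean(x) / c           # (1/c) * integral of f (sample mean)
    print(c, lhs, rhs, lhs <= rhs)
```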

Definition 3.3.3 Let f, g : S → R be measurable. We say that f = g almost everywhere, and write this for short as f = g a.e., if

m({x ∈ S; f(x) ≠ g(x)}) = 0.

In Problem 3.9 you can show that this gives rise to an equivalence relation on the set of all measurable functions. In probability theory we use the terminology almost surely for two random variables X and Y that agree almost everywhere, and we write X = Y (a.s.).

Corollary 3.3.4 If f is a non-negative measurable function and ∫_S f dm = 0 then f = 0 (a.e.).

Proof: Let A = {x ∈ S; f(x) ≠ 0} and, for each n ∈ N, A_n = {x ∈ S; f(x) ≥ 1/n}. Since A = ⋃_{n=1}^∞ A_n, we have m(A) ≤ ∑_{n=1}^∞ m(A_n) by Theorem 1.5.2, and it is sufficient to show that m(A_n) = 0 for all n ∈ N. But by Markov's inequality, m(A_n) ≤ n ∫_S f dm = 0.

In Chapter 1 we indicated that we would be able to use integration to cook up new examples of measures. Let f : S → R be non-negative and measurable and define I_A(f) = ∫_A f dm for A ∈ Σ. We have ∫_∅ f dm = 0 by Theorem 3.3.1 part (4). To prove that A → I_A(f) is a measure, we then need only prove that it is σ-additive, i.e. that I_A(f) = ∑_{n=1}^∞ I_{A_n}(f) whenever we have a disjoint union A = ⋃_{n=1}^∞ A_n.

Theorem 3.3.5 If f : S → R is a non-negative measurable function, the mapping from Σ to [0,∞] given by A → I_A(f) is σ-additive.

Proof: First assume that f = 1_B for some B ∈ Σ. Then by (3.2),

I_A(f) = m(B ∩ A) = m(B ∩ ⋃_{n=1}^∞ A_n) = ∑_{n=1}^∞ m(B ∩ A_n) = ∑_{n=1}^∞ I_{A_n}(f),

so the result holds in this case. You can then use linearity to show that it is true for non-negative simple functions.

Now let f be measurable and non-negative. Then by definition of the supremum, for any ε > 0 there exists a simple function s with 0 ≤ s ≤ f so that I_A(f) ≤ I_A(s) + ε. The result holds for simple functions and so by monotonicity we have

I_A(s) = ∑_{n=1}^∞ I_{A_n}(s) ≤ ∑_{n=1}^∞ I_{A_n}(f).


Combining this with the earlier inequality we find that

I_A(f) ≤ ∑_{n=1}^∞ I_{A_n}(f) + ε.

But ε was arbitrary and so we conclude that

I_A(f) ≤ ∑_{n=1}^∞ I_{A_n}(f).

The second half of the proof will aim to establish the opposite inequality. First let A_1, A_2 ∈ Σ be disjoint. Given any ε > 0 we can, as above, find simple functions s_1, s_2 with 0 ≤ s_j ≤ f, so that I_{A_j}(s_j) ≥ I_{A_j}(f) − ε/2 for j = 1, 2. Let s = s_1 ∨ s_2 = max{s_1, s_2}. Then s is simple (check this), 0 ≤ s ≤ f and s_1 ≤ s, s_2 ≤ s. So by monotonicity, I_{A_j}(s) ≥ I_{A_j}(f) − ε/2 for j = 1, 2. Add these two inequalities to find that

I_{A_1}(s) + I_{A_2}(s) ≥ I_{A_1}(f) + I_{A_2}(f) − ε.

But the result is true for simple functions and so we have

I_{A_1∪A_2}(s) ≥ I_{A_1}(f) + I_{A_2}(f) − ε.

By the definition (3.4), I_{A_1∪A_2}(f) ≥ I_{A_1∪A_2}(s) and so we have that

I_{A_1∪A_2}(f) ≥ I_{A_1}(f) + I_{A_2}(f) − ε.

But ε was arbitrary and so we conclude that

I_{A_1∪A_2}(f) ≥ I_{A_1}(f) + I_{A_2}(f),

which is the required inequality for unions of two disjoint sets. By induction we have

I_{A_1∪A_2∪···∪A_n}(f) ≥ ∑_{i=1}^n I_{A_i}(f),

for any n ≥ 2. But as A_1 ∪ A_2 ∪ · · · ∪ A_n ⊆ A, we can use Theorem 3.3.1 (3) to find that

I_A(f) ≥ ∑_{i=1}^n I_{A_i}(f).

Now take the limit as n → ∞ to deduce that

I_A(f) ≥ ∑_{i=1}^∞ I_{A_i}(f),

as was required.

Example 3.3.6 The famous Gaussian measure on R is obtained in this way by taking

f(x) = (1/√(2π)) e^{−x²/2} for x ∈ R,

with m being Lebesgue measure. In this case, ∫_R f(x)dx = 1. To connect more explicitly with probability theory, let (Ω,F,P) be a probability space and X : Ω → R be a random variable. Equip (R,B(R)) with Lebesgue measure. In Chapter 2 we introduced the probability law p_X of X as a probability measure on (R,B(R)). If for all A ∈ B(R),

p_X(A) = I_A(f_X) (= I(f_X 1_A))

for some non-negative measurable function f_X : R → R, then f_X is called the probability density function or pdf of X. So for all A ∈ B(R),

P(X ∈ A) = p_X(A) = ∫_A f_X(x)dx.

We say that X is a standard normal if p_X is Gaussian measure.
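The identity P(X ∈ A) = ∫_A f_X(x)dx can be checked numerically for the standard normal. The Python sketch below (not part of the notes) approximates the integral of the Gaussian pdf over the illustrative Borel set A = [−1, 2] by a Riemann sum and compares it with the closed form obtained from the error function.

```python
# P(X in A) as the integral of the Gaussian pdf over A, checked against the erf formula.
import numpy as np
from math import erf, sqrt

def gaussian_pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

a, b = -1.0, 2.0                       # A = [a, b], an illustrative Borel set
n = 200000
x = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)   # midpoints
p_numeric = np.sum(gaussian_pdf(x)) * (b - a) / n               # Riemann sum over A
p_exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))           # normal cdf difference
print(p_numeric, p_exact)              # both approx 0.8186
```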

We present two useful corollaries to Theorem 3.3.5:

Corollary 3.3.7 Let f : S → R be a non-negative measurable function and (E_n) be a sequence of sets in Σ with E_n ⊆ E_{n+1} for all n ∈ N. Set E = ⋃_{n=1}^∞ E_n. Then

∫_E f dm = lim_{n→∞} ∫_{E_n} f dm.

Proof: This is in fact an immediate consequence of Theorems 3.3.5 and 1.5.1, but it might be helpful to spell out the proof in a little detail, so here goes. We use the "disjoint union trick": write A_1 = E_1, A_2 = E_2 − E_1, A_3 = E_3 − E_2, . . .. Then the A_n are mutually disjoint, ⋃_{n=1}^∞ A_n = E and ⋃_{i=1}^n A_i = E_n for all n ∈ N. Then by Theorem 3.3.5,

∫_E f dm = ∑_{i=1}^∞ ∫_{A_i} f dm = lim_{n→∞} ∑_{i=1}^n ∫_{A_i} f dm = lim_{n→∞} ∫_{A_1∪A_2∪···∪A_n} f dm = lim_{n→∞} ∫_{E_n} f dm.

Corollary 3.3.8 If f and g are non-negative measurable functions and f = g (a.e.) then I(f) = I(g).

Proof: Let A_1 = {x ∈ S; f(x) = g(x)} and A_2 = {x ∈ S; f(x) ≠ g(x)}. Then A_1, A_2 ∈ Σ with A_1 ∪ A_2 = S, A_1 ∩ A_2 = ∅ and m(A_2) = 0. So by Theorem 3.3.1 (4), ∫_{A_2} f dm = ∫_{A_2} g dm = 0. But ∫_{A_1} f dm = ∫_{A_1} g dm as f = g on A_1, and so by Theorem 3.3.5,

∫_S f dm = ∫_{A_1} f dm + ∫_{A_2} f dm = ∫_{A_1} g dm + ∫_{A_2} g dm = ∫_S g dm.


3.4 The Monotone Convergence Theorem

We haven't yet proved that ∫_S(f + g)dm = ∫_S f dm + ∫_S g dm. Nor have we extended the integral beyond non-negative measurable functions. Before we can do either of these, we need to establish the monotone convergence theorem. This is the first of two important results that show the superiority of Lebesgue integration over Riemann integration.

We say that a sequence (f_n) of measurable functions is monotone increasing if f_n ≤ f_{n+1} for all n ∈ N. Note that if, in addition, each f_n is non-negative, then the pointwise limit f = lim_{n→∞} f_n automatically exists, is non-negative and measurable (by Theorem 2.4.9), and f may take values in [0,∞]. Similarly, we say that (f_n) is monotone decreasing if f_n ≥ f_{n+1}.

Theorem 3.4.1 (Monotone Convergence Theorem) Let f_n, f be measurable functions from S to R. Suppose that:

1. (f_n) is monotone increasing, and each f_n is non-negative;
2. f_n(x) → f(x) almost everywhere.

Then ∫_S f_n dm → ∫_S f dm as n → ∞.

Proof: Since (f_n) is increasing, the pointwise limit lim_{n→∞} f_n(x) exists (in [0,∞]) for all x ∈ S, and by our second assumption it equals f(x) almost everywhere, which means that ∫_S (lim_{n→∞} f_n) dm = ∫_S f dm. Hence we may in fact assume (without loss of generality, replacing f by the pointwise limit) that f_n(x) → f(x) for all x.

As f = sup_{n∈N} f_n, by monotonicity (Theorem 3.3.1(1)) we have

∫_S f_1 dm ≤ ∫_S f_2 dm ≤ · · · ≤ ∫_S f dm.

Hence, by monotonicity of the integrals, lim_{n→∞} ∫_S f_n dm exists (as an extended real number) and

lim_{n→∞} ∫_S f_n dm ≤ ∫_S f dm.

We must now prove the reverse inequality. To simplify notation, let a = lim_{n→∞} ∫_S f_n dm; so we need to show that a ≥ ∫_S f dm. Let s be a simple function with 0 ≤ s ≤ f and choose c ∈ R with 0 < c < 1. For each n ∈ N let E_n = {x ∈ S; f_n(x) ≥ cs(x)}, and note that E_n ∈ Σ for all n ∈ N by Proposition 2.3.3. Since (f_n) is increasing, it follows that E_n ⊆ E_{n+1} for all n ∈ N. Also we have ⋃_{n=1}^∞ E_n = S. To verify this last identity, note that if x ∈ S with s(x) = 0 then x ∈ E_n for all n ∈ N, and if x ∈ S with s(x) ≠ 0 then f(x) ≥ s(x) > cs(x) and so, since f_n(x) → f(x) as n → ∞, for some n we have f_n(x) ≥ cs(x), i.e. x ∈ E_n. By Theorem 3.3.1(3) and (1), we have for each n ∈ N

a = lim_{n→∞} ∫_S f_n dm ≥ ∫_S f_n dm ≥ ∫_{E_n} f_n dm ≥ ∫_{E_n} cs dm.

As this is true for all n ∈ N, we find that

a ≥ lim_{n→∞} ∫_{E_n} cs dm.


But by Corollary 3.3.7 (since (E_n) is increasing) and Theorem 3.3.1(2),

lim_{n→∞} ∫_{E_n} cs dm = ∫_S cs dm = c ∫_S s dm,

and so we deduce that a ≥ c ∫_S s dm. But 0 < c < 1 is arbitrary, so taking e.g. c = 1 − 1/k with k = 2, 3, 4, . . . and letting k → ∞, we find that

a ≥ ∫_S s dm.

But the simple function s with 0 ≤ s ≤ f was also arbitrary, so now take the supremum over all such s and apply (3.4) to get

a ≥ ∫_S f dm,

and the proof is complete.

Corollary 3.4.2 Let f : S → R be measurable and non-negative. There exists an increasing sequence of simple functions (s_n) converging pointwise to f so that

lim_{n→∞} ∫_S s_n dm = ∫_S f dm.   (3.5)

Proof: Apply the monotone convergence theorem to the sequence (s_n) constructed in Theorem 2.5.1.

Theorem 3.4.3 Let f, g : S → R be measurable and non-negative. Then

∫_S(f + g)dm = ∫_S f dm + ∫_S g dm.

Proof: By Theorem 2.5.1 we can find an increasing sequence of simple functions (s_n) that converges pointwise to f and an increasing sequence of simple functions (t_n) that converges pointwise to g. Hence (s_n + t_n) is an increasing sequence of simple functions that converges pointwise to f + g. So by Theorem 3.4.1, Theorem 3.2.1(1) and then Corollary 3.4.2,

∫_S(f + g)dm = lim_{n→∞} ∫_S(s_n + t_n)dm = lim_{n→∞} ∫_S s_n dm + lim_{n→∞} ∫_S t_n dm = ∫_S f dm + ∫_S g dm.

Another more delicate convergence result can be obtained as a consequence of the monotone convergence theorem. We present this as a theorem, although it is always called Fatou's lemma after the French mathematician (and astronomer) Pierre Fatou (1878-1929). It tells us how lim inf and ∫ interact.

Theorem 3.4.4 (Fatou's Lemma) If (f_n) is a sequence of non-negative measurable functions from S to R then

lim inf_{n→∞} ∫_S f_n dm ≥ ∫_S lim inf_{n→∞} f_n dm.


Proof: Define g_n = inf_{k≥n} f_k. Then (g_n) is an increasing sequence which converges to lim inf_{n→∞} f_n. Now as f_l ≥ inf_{k≥n} f_k for all l ≥ n, by monotonicity (Theorem 3.3.1(1)) we have that for all l ≥ n,

∫_S f_l dm ≥ ∫_S inf_{k≥n} f_k dm,

and so

inf_{l≥n} ∫_S f_l dm ≥ ∫_S inf_{k≥n} f_k dm.

Now take limits on both sides of this last inequality and apply the monotone convergence theorem to obtain

lim inf_{n→∞} ∫_S f_n dm ≥ lim_{n→∞} ∫_S inf_{k≥n} f_k dm = ∫_S lim_{n→∞} inf_{k≥n} f_k dm = ∫_S lim inf_{n→∞} f_n dm.

Note that we do not require (f_n) to be a bounded sequence, so lim inf_{n→∞} f_n should be interpreted as an extended measurable function, as discussed at the end of Chapter 2. The corresponding result for lim sup, in which case the inequality is reversed, is known as the reverse Fatou lemma and can be found as Problem 3.12.
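The inequality in Fatou's lemma can be strict. A standard example (it appears as Problem 3.10) is f_n = n 1_{(0,1/n)} on ([0,1], B([0,1]), λ): each ∫ f_n dλ = 1, while lim inf_{n→∞} f_n = 0 pointwise. The short Python sketch below (not part of the notes) illustrates this numerically on a fine grid; the grid size is an illustrative choice.

```python
# Fatou's lemma can be strict: f_n = n * 1_{(0, 1/n)} on [0,1] with Lebesgue measure.
import numpy as np

N = 10**6
x = np.linspace(0.0, 1.0, N, endpoint=False) + 0.5 / N   # midpoints of a fine grid
dx = 1.0 / N

def f_n(n, x):
    return n * ((x > 0) & (x < 1.0 / n))

integrals = [np.sum(f_n(n, x)) * dx for n in (1, 10, 100, 1000)]
print(integrals)                          # each approx 1, so lim inf of the integrals is 1
pointwise_liminf = np.zeros_like(x)       # f_n(x) -> 0 for every fixed x
print(np.sum(pointwise_liminf) * dx)      # integral of lim inf f_n = 0 < 1
```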


3.5 Lebesgue Integrability and Dominated Convergence

At last we are ready for the final step in the construction of the Lebesgue integral: the extension from non-negative measurable functions to a class of measurable functions that are real-valued.

Step 4. Integrable functions.
For the final step we first take f to be an arbitrary measurable function. We define the positive and negative parts of f, which we denote by f^+ and f^- respectively, by

f^+(x) = max{f(x), 0},   f^-(x) = max{−f(x), 0}.

Both f^+ and f^- are measurable (by Corollary 2.4.2) and non-negative. We have

f = f^+ − f^-,

and using Step 3 we can construct both ∫_S f^+ dm and ∫_S f^- dm. Provided these are not both infinite, we define

∫_S f dm = ∫_S f^+ dm − ∫_S f^- dm.

With this definition, ∫_S f dm ∈ [−∞,∞]. We say that f is integrable if ∫_S f dm ∈ (−∞,∞). Clearly f is integrable if and only if each of f^+ and f^- is. Define |f|(x) = |f(x)| for all x ∈ S. As f is measurable, it follows from Problem 2.8 that |f| is too. Since

|f| = f^+ + f^-,

it is not hard to see that f is integrable if and only if |f| is. Using this last fact, the condition for integrability of f is often written

∫_S |f| dm < ∞.

We also have the useful inequality (whose proof is Problem 3.8 part (a)):

|∫_S f dm| ≤ ∫_S |f| dm,   (3.6)

for all integrable f.

Theorem 3.5.1 Suppose that f and g are integrable functions from S to R. Then:

1. If c ∈ R then cf is integrable and ∫_S cf dm = c ∫_S f dm,

2. f + g is integrable and ∫_S(f + g)dm = ∫_S f dm + ∫_S g dm,

3. (Monotonicity) If f ≤ g then ∫_S f dm ≤ ∫_S g dm.

Proof: (1) and (3) are Problem 3.7. For (2), we may assume that neither f nor g is identically 0. The fact that f + g is integrable if f and g are follows from the triangle inequality (Problem 3.8 part (b)). To show that the integral of the sum is the sum of the integrals, we first consider six different cases (writing h = f + g): (i) f ≥ 0, g ≥ 0, h ≥ 0; (ii) f ≤ 0, g ≤ 0, h ≤ 0; (iii) f ≥ 0, g ≤ 0, h ≥ 0; (iv) f ≤ 0, g ≥ 0, h ≥ 0; (v) f ≥ 0, g ≤ 0, h ≤ 0; (vi) f ≤ 0, g ≥ 0, h ≤ 0.


Case (i) is Theorem 3.4.3. We'll just prove (iii); the others are similar. If h = f + g then f = h + (−g), and this reduces the problem to case (i). Indeed we then have

∫_S f dm = ∫_S(f + g)dm + ∫_S(−g)dm,

and so by (1),

∫_S(f + g)dm = ∫_S f dm − ∫_S(−g)dm = ∫_S f dm + ∫_S g dm.

Now write S = S_1 ∪ S_2 ∪ S_3 ∪ S_4 ∪ S_5 ∪ S_6, where S_i is the set of all x ∈ S for which case (i) holds, for i = 1, 2, . . . , 6. These sets are disjoint and measurable and so, by a slight extension of Theorem 3.3.5,²

∫_S(f + g)dm = ∑_{i=1}^6 ∫_{S_i}(f + g)dm = ∑_{i=1}^6 ∫_{S_i} f dm + ∑_{i=1}^6 ∫_{S_i} g dm = ∫_S f dm + ∫_S g dm,

as was required.

We now present the last of our convergence theorems, the famous Lebesgue dominated convergence theorem, an extremely powerful tool in both the theory and applications of modern analysis:

Theorem 3.5.2 (Dominated Convergence Theorem) Let f_n, f be measurable functions from S to R. Suppose that:

1. There is an integrable function g : S → R so that |f_n| ≤ g for all n ∈ N;
2. f_n(x) → f(x) almost everywhere.

Then f is integrable and ∫_S f_n dm → ∫_S f dm as n → ∞.

Proof: For the same reason as in the proof of Theorem 3.4.1, we may assume that in fact f_n(x) → f(x) for all x. Note that we didn't assume explicitly that f_n is integrable; this fact follows immediately from the first assumption by Theorem 3.3.1 part (1), using that ∫_S |f_n| dm ≤ ∫_S g dm < ∞.

Since f_n(x) → f(x) almost everywhere, |f_n(x)| → |f(x)| almost everywhere. By Fatou's lemma (Theorem 3.4.4) and monotonicity (Theorem 3.5.1 part (3)), we have

∫_S |f| dm = ∫_S lim inf_{n→∞} |f_n| dm ≤ lim inf_{n→∞} ∫_S |f_n| dm ≤ ∫_S g dm < ∞,

and so f is integrable.

Also, for all n ∈ N, g + f_n ≥ 0, so by Fatou's lemma again,

∫_S lim inf_{n→∞}(g + f_n)dm ≤ lim inf_{n→∞} ∫_S(g + f_n)dm.

² This works since we only need finite additivity here.


But lim inf_{n→∞}(g + f_n) = g + lim_{n→∞} f_n = g + f and (using Theorem 3.5.1(2)) lim inf_{n→∞} ∫_S(g + f_n)dm = ∫_S g dm + lim inf_{n→∞} ∫_S f_n dm. Since ∫_S g dm is finite, we may cancel it from both sides and conclude that

∫_S f dm ≤ lim inf_{n→∞} ∫_S f_n dm.   (3.7)

Repeat this argument with g + f_n replaced by g − f_n, which is also non-negative for all n ∈ N. We then find that

−∫_S f dm ≤ lim inf_{n→∞} (−∫_S f_n dm) = − lim sup_{n→∞} ∫_S f_n dm,

and so

∫_S f dm ≥ lim sup_{n→∞} ∫_S f_n dm.   (3.8)

Combining (3.7) and (3.8) we see that

lim sup_{n→∞} ∫_S f_n dm ≤ ∫_S f dm ≤ lim inf_{n→∞} ∫_S f_n dm,   (3.9)

but we always have lim inf_{n→∞} ∫_S f_n dm ≤ lim sup_{n→∞} ∫_S f_n dm, and so lim inf_{n→∞} ∫_S f_n dm = lim sup_{n→∞} ∫_S f_n dm. Then by Theorem 2.1.1, lim_{n→∞} ∫_S f_n dm exists, and from (3.9) we deduce that ∫_S f dm = lim_{n→∞} ∫_S f_n dm.

Example 3.5.3 Suppose that (S,Σ,m) is a finite measure space and (f_n) is a sequence of measurable functions from S to R which converge pointwise to f and are uniformly bounded, i.e. there exists K > 0 so that |f_n(x)| ≤ K for all x ∈ S, n ∈ N. Then f is integrable. To see this, just take g = K in the dominated convergence theorem and show that it is integrable, which follows from the fact that ∫_S g dm = K m(S) < ∞.

For a concrete example, work in the measure space ([0, 1],B([0, 1]), λ) and consider the sequence of functions (f_n) where f_n(x) = nx²/(nx + 5) for all x ∈ [0, 1], n ∈ N. Each f_n is continuous, hence measurable by Corollary 2.3.1. It is straightforward to check that lim_{n→∞} f_n(x) = x for all x ∈ [0, 1] and that |f_n(x)| ≤ 1 for all n ∈ N, x ∈ [0, 1]. So in this case we can take K = 1, and apply Lebesgue's dominated convergence theorem to deduce that f(x) = x is integrable, and (writing dx as is traditional, rather than λ(dx), in the integrals)

lim_{n→∞} ∫_{[0,1]} nx²/(nx + 5) dx = ∫_{[0,1]} x dx.

You may want to go further and write

∫_{[0,1]} x dx = [x²/2]₀¹ = 1/2,

but we can't at this stage, as that is a consequence of integration in the Riemann sense, not the Lebesgue one. It is true though, and follows from the next result.
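A quick numerical check of the concrete example above: the following Python sketch (not part of the notes) approximates ∫_{[0,1]} nx²/(nx+5) dx by a midpoint rule for several n and watches it approach ∫_{[0,1]} x dx = 1/2; the grid size is an illustrative choice.

```python
# Numerical check of the dominated convergence example: integral of n x^2/(n x + 5) -> 1/2.
import numpy as np

N = 10**5
x = np.linspace(0.0, 1.0, N, endpoint=False) + 0.5 / N   # midpoint rule on [0,1]
dx = 1.0 / N

for n in (1, 10, 100, 10000):
    fn = n * x**2 / (n * x + 5)
    print(n, np.sum(fn) * dx)     # approaches 0.5 as n grows

print("limit", np.sum(x) * dx)    # approx 0.5
```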

We will not prove the next theorem, which shows that the Lebesgue integral on R is at least as powerful as the Riemann one, when we integrate over finite intervals. You can find a proof in Section 3.7 (which is off-syllabus).

Theorem 3.5.4 Let f : [a, b] → R be bounded and Riemann integrable. Then it is also Lebesgue integrable and the two integrals have the same value.


In fact, we can integrate many more functions using Lebesgue integration than we could using Riemann integration. For example, with Riemann integration we could not conclude that ∫_{[a,b]} 1_{R−Q}(x)dx = b − a, but with Lebesgue integration we can.

Example 3.5.5 We aim to show that f(x) = x^{−α} is integrable on [1,∞) for α > 1.
For each n ∈ N define f_n(x) = x^{−α} 1_{[1,n]}(x). Then (f_n(x)) increases to f(x) as n → ∞. We have

∫_1^∞ f_n(x)dx = ∫_1^n x^{−α}dx = (1/(α−1))(1 − n^{1−α}).

By the monotone convergence theorem,

∫_1^∞ x^{−α}dx = (1/(α−1)) lim_{n→∞}(1 − n^{1−α}) = 1/(α−1).
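Here is a short Python sketch (not part of the notes) of Example 3.5.5 with the illustrative choice α = 2.5: the truncated integrals over [1, n] increase towards 1/(α − 1) = 2/3, exactly as the monotone convergence theorem predicts.

```python
# Truncated integrals of x^(-alpha) over [1, n] increase to 1/(alpha - 1).
import numpy as np

alpha = 2.5
for n in (2, 5, 10, 100, 1000):
    x = np.linspace(1.0, float(n), 10**6)
    dx = x[1] - x[0]
    approx = np.sum(x**(-alpha)) * dx            # crude Riemann sum over [1, n]
    print(n, approx)
print("limit 1/(alpha-1) =", 1.0 / (alpha - 1))  # = 2/3
```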

Example 3.5.6 We aim to show that f(x) = x^α e^{−x} is integrable on [0,∞) for α > 0.
We use the fact that for any M ≥ 0, lim_{x→∞} x^M e^{−x} = 0, so that given any ε > 0 there exists R > 0 so that x > R ⇒ x^M e^{−x} < ε; we choose M so that M − α > 1. Now write

x^α e^{−x} = x^α e^{−x} 1_{[0,R]}(x) + x^α e^{−x} 1_{(R,∞)}(x).

The first term on the right hand side is clearly integrable. For the second term we use the fact that for all x > R,

x^α e^{−x} = x^M e^{−x} · x^{α−M} < ε x^{α−M},

and the last term on the right hand side is integrable by Example 3.5.5. So the result follows by monotonicity (Theorem 3.3.1 (1)).

As a result of Example 3.5.6, we know that the Gamma function

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

exists for all α > 1. It can also be extended to the case α > 0, with a little more work. In the next example, we deduce one of its properties.

Example 3.5.7 We aim to show that

Γ(α) = lim_{n→∞} n! n^α / (α(α + 1) · · · (α + n))

for α > 1. Let P_n^α = n! n^α / (α(α+1)···(α+n)). You can check (e.g. by induction and integration by parts) that

P_n^α / n^α = ∫_0^1 (1 − t)^n t^{α−1} dt.

Make the change of variable x = tn to find that

P_n^α = ∫_0^n (1 − x/n)^n x^{α−1} dx = ∫_0^∞ (1 − x/n)^n x^{α−1} 1_{[0,n]}(x) dx.

Now the sequence whose nth term is (1 − x/n)^n x^{α−1} 1_{[0,n]}(x) comprises non-negative measurable functions, and is monotone increasing to e^{−x} x^{α−1} as n → ∞. The result then follows by the monotone convergence theorem.


Example 3.5.8 Summation of series is also an example of Lebesgue integration. Suppose for simplicity that we are interested in ∑_{n=1}^∞ a_n, where a_n ≥ 0 for all n ∈ N. We consider the sequence (a_n) as a function a : N → [0,∞). We work with the measure space (N, P(N), m), where m is counting measure. Then every sequence (a_n) gives rise to a non-negative measurable function a (why is it measurable?) and

∑_{n=1}^∞ a_n = ∫_N a(n) dm(n).

The integration theory that we've developed tells us that this either converges, or diverges to +∞, as we would expect from previous work.

Remark 3.5.9 (?) It is also possible to define Lebesgue integration of complex-valued functions. Let (S,Σ,m) be a measure space and f : S → C. We can always write f = f_1 + i f_2, where the real and imaginary parts are f_i : S → R (i = 1, 2). We say that f is measurable/integrable if both f_1 and f_2 are. When f is integrable, we may define

∫_S f dm = ∫_S f_1 dm + i ∫_S f_2 dm.

You can check that f is integrable if and only if ∫_S |f| dm < ∞, where we now have |f| = √(f_1² + f_2²). See Problems 3.17-3.21 for applications to the Fourier transform.


3.6 Fubini’s Theorem and Function Spaces (?)

This section is included for interest. It is marked with a (?) and it is off-syllabus. However, the first topic (Fubini's theorem) is covered in greater detail for MAS451/6352 within Chapter 5, and that more extensive treatment is examinable for MAS451/3562.

Fubini’s Theorem (?)

Let (S_i, Σ_i, m_i), i = 1, 2, be two measure spaces³ and consider the product space (S_1 × S_2, Σ_1 ⊗ Σ_2, m_1 × m_2) as discussed in Section 1.7. We can consider integration of measurable functions f : S_1 × S_2 → R by the procedure that we've already discussed, and there is nothing new to say about the definition and properties of ∫_{S_1×S_2} f d(m_1 × m_2) when f is either measurable and non-negative (so the integral may be an extended real number) or integrable (so the integral is a real number). However, from a practical point of view we would always like to calculate a double integral by writing it as a repeated integral, so that we first integrate with respect to m_1 and then with respect to m_2 (or vice versa). Fubini's theorem, which we will state without proof, tells us that we can do this provided that f is integrable with respect to the product measure. It is named in honour of the Italian mathematician Guido Fubini (1879-1943).

Theorem 3.6.1 (Fubini's Theorem) Let f be integrable on (S_1 × S_2, Σ_1 ⊗ Σ_2, m_1 × m_2), so that ∫_{S_1×S_2} |f(x, y)| (m_1 × m_2)(dx, dy) < ∞. Then

1. The mapping f(x, ·) is m_2-integrable, almost everywhere with respect to m_1,

2. The mapping f(·, y) is m_1-integrable, almost everywhere with respect to m_2,

3. The mapping x → ∫_{S_2} f(x, y) m_2(dy) is equal almost everywhere to an integrable function on S_1,

4. The mapping y → ∫_{S_1} f(x, y) m_1(dx) is equal almost everywhere to an integrable function on S_2,

5. ∫_{S_1×S_2} f(x, y) (m_1 × m_2)(dx, dy) = ∫_{S_1} (∫_{S_2} f(x, y) m_2(dy)) m_1(dx) = ∫_{S_2} (∫_{S_1} f(x, y) m_1(dx)) m_2(dy).
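As a sanity check of the conclusion of Fubini's theorem, the following Python sketch (not part of the notes) takes S_1 = S_2 = [0, 1] with Lebesgue measure and the illustrative integrand f(x, y) = x e^{xy}, and compares the double integral with the two repeated integrals; all three should be close to e − 2.

```python
# Double integral vs. repeated integrals of f(x,y) = x*exp(x*y) on [0,1]x[0,1].
import numpy as np

n = 1000
u = (np.arange(n) + 0.5) / n                  # midpoints in [0,1]
X, Y = np.meshgrid(u, u, indexing="ij")
f = X * np.exp(X * Y)
w = 1.0 / n

double_integral = np.sum(f) * w * w
repeated_xy = np.sum(np.sum(f, axis=1) * w) * w    # integrate over y first, then x
repeated_yx = np.sum(np.sum(f, axis=0) * w) * w    # integrate over x first, then y
print(double_integral, repeated_xy, repeated_yx)   # all approx e - 2 ≈ 0.71828
```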

Function Spaces (?)

An important application of Lebesgue integration is to the construction of the Banach spaces L^p(S,Σ,m) of equivalence classes of real-valued⁴ functions that agree a.e. and which satisfy the requirement

||f||_p = (∫_S |f|^p dm)^{1/p} < ∞,

where 1 ≤ p < ∞. In fact || · ||_p is a norm on L^p(S,Σ,m), but only if p ≥ 1. This is the reason why, in Section 4.5, we will only define L^p convergence for p ≥ 1.

³ Technically speaking, the measures should have an additional property called σ-finiteness for the main result below to be valid.
⁴ The complex case also works and is important.


When p = 2 we obtain a Hilbert space with inner product

⟨f, g⟩ = ∫_S fg dm.

There is also a Banach space L^∞(S,Σ,m), where

||f||_∞ = inf{M ≥ 0; |f(x)| ≤ M a.e.}.

These spaces play important roles in functional analysis and its applications, including partial differential equations, probability theory and quantum mechanics.


3.7 Riemann Integration (?)

In this section, our aim is to show that if a bounded function f : [a, b] → R is Riemann integrable, then it is measurable and Lebesgue integrable. Moreover, in this case the Riemann and Lebesgue integrals of f are equal. We begin by briefly revising the Riemann integral.

Note that this whole section is marked with a (?), meaning that it is off-syllabus. It will be discussed briefly in lectures.

The Riemann Integral (?)

A partition P of [a, b] is a set of points x_0, x_1, . . . , x_n with a = x_0 < x_1 < · · · < x_{n−1} < x_n = b. Define m_j = inf_{x_{j−1} ≤ x ≤ x_j} f(x) and M_j = sup_{x_{j−1} ≤ x ≤ x_j} f(x). We underestimate by defining

L(f, P) = ∑_{j=1}^n m_j (x_j − x_{j−1}),

and overestimate by defining

U(f, P) = ∑_{j=1}^n M_j (x_j − x_{j−1}).

A partition P′ is said to be a refinement of P if P ⊂ P′. We then have

L(f, P) ≤ L(f, P′),   U(f, P′) ≤ U(f, P).   (3.10)

A sequence of partitions (P_n) is said to be increasing if P_{n+1} is a refinement of P_n for all n ∈ N.

Now define the lower integral L_{a,b}f = sup_P L(f, P), and the upper integral U_{a,b}f = inf_P U(f, P). We say that f is Riemann integrable over [a, b] if L_{a,b}f = U_{a,b}f, and we then write the common value as ∫_a^b f(x)dx. In particular, every continuous function on [a, b] is Riemann integrable. The next result is very useful:

Theorem 3.7.1 The bounded function f is Riemann integrable on [a, b] if and only if for every ε > 0 there exists a partition P for which

U(f, P) − L(f, P) < ε.   (3.11)

If (3.11) holds for some P, it also holds for all refinements of P. A useful corollary is:

Corollary 3.7.2 If the bounded function f is Riemann integrable on [a, b], then there exists an increasing sequence (P_n) of partitions of [a, b] for which

lim_{n→∞} U(f, P_n) = lim_{n→∞} L(f, P_n) = ∫_a^b f(x)dx.

Proof: This follows from Theorem 3.7.1 by successively choosing ε = 1, 1/2, 1/3, . . . , 1/n, . . .. If the sequence (P_n) is not increasing, then just replace P_n with P_n ∪ P_{n−1} and observe that this can only improve the inequality (3.11).


The Connection (?)

We have already stated the connection between Riemann and Lebesgue integration as Theorem 3.5.4. We re-state it here for convenience, now with a proof.

Theorem If f : [a, b] → R is Riemann integrable, then it is Lebesgue integrable, and the two integrals coincide.

Proof: We use the notation λ for Lebesgue measure in this section. We also write M = sup_{x∈[a,b]} |f(x)| and m = inf_{x∈[a,b]} f(x).

Let P be a partition as above and define simple functions

g_P = ∑_{j=1}^n m_j 1_{(x_{j−1}, x_j]},   h_P = ∑_{j=1}^n M_j 1_{(x_{j−1}, x_j]}.

Consider the sequences (g_n) and (h_n) which correspond to the partitions of Corollary 3.7.2, and note that

L_n(f) = ∫_{[a,b]} g_n dλ,   U_n(f) = ∫_{[a,b]} h_n dλ,

where U_n(f) = U(f, P_n) and L_n(f) = L(f, P_n). Clearly we also have, for each n ∈ N,

g_n ≤ f ≤ h_n.   (3.12)

Since (g_n) is increasing (by (3.10)) and bounded above by M, it converges pointwise to a measurable function g. Similarly (h_n) is decreasing and bounded below by m, so it converges pointwise to a measurable function h. By (3.12) we have

g ≤ f ≤ h.   (3.13)

Again, since max_{n∈N} {|g_n|, |h_n|} ≤ M, we can use dominated convergence to deduce that g and h are both integrable on [a, b], and by Corollary 3.7.2,

∫_{[a,b]} g dλ = lim_{n→∞} L_n(f) = ∫_a^b f(x)dx = lim_{n→∞} U_n(f) = ∫_{[a,b]} h dλ.

Hence we have

∫_{[a,b]} (h − g) dλ = 0,

and so by Corollary 3.3.4, h(x) = g(x) (a.e.). Then by (3.13), f = g (a.e.) and so f is measurable⁵ and also integrable. So ∫_{[a,b]} f dλ = ∫_{[a,b]} g dλ, and hence we have

∫_{[a,b]} f dλ = ∫_a^b f(x)dx.

⁵ I'm glossing over a subtlety here. It is not true in general that a function that is almost everywhere equal to a measurable function is measurable. It works in this case due to a special property of Lebesgue measure called "completeness".


Discussion (?)

An important caveat is that Theorem 3.5.4 only applies to finite closed intervals. On infinite intervals, there are examples of functions that are Riemann integrable⁶ but not Lebesgue integrable. One such example is ∫_0^∞ (sin x)/x dx. Crucially, (sin x)/x oscillates above and below 0 as x → ∞, and the Riemann integral only exists because these oscillations cancel each other out. In Lebesgue integration this isn't allowed to happen, and (sin x)/x fails to be Lebesgue integrable because ∫_0^∞ |(sin x)/x| dx = ∞.

You might think of this as analogous to something you've already seen with infinite series. When infinite series are absolutely convergent (∑|a_n| < ∞) they are much better behaved. The following result, which we won't prove, makes this point very clearly. A 're-ordering' of a series simply means arranging its terms in a different order.

Theorem Let (a_n) be a real sequence.

1. Suppose that ∑ a_n converges but ∑|a_n| = ∞. Then, for any α ∈ R, there is a re-ordering b_n = a_{p(n)} such that ∑ b_n = α.

2. Suppose ∑|a_n| < ∞. Then, for any re-ordering b_n = a_{p(n)}, we have ∑ a_n = ∑ b_n.

Imagine if we allowed something similar to happen in integration. It would mean that re-ordering the x-axis (i.e. permuting it around) could change the value of ∫ f(x) dx! This would be nonsensical, and mean that integration no longer had anything to do with 'area under the curve'. Similarly, when integrals satisfy ∫_0^∞ |f(x)| dx < ∞, as Lebesgue integration requires, they are much better behaved, and we can then prove important convergence results like the MCT/DCT, both of which fail to be true if we use Riemann integration.

⁶ Strictly, we should say 'improperly' Riemann integrable.


3.8 Exercises

3.1 Let f : R → R be defined as follows:

f(x) = 0 if x < −2,
       1 if −2 ≤ x < −1,
       0 if −1 ≤ x < 0,
       2 if 0 ≤ x < 1,
       1 if 1 ≤ x < 2,
       0 if x ≥ 2.

Write f explicitly as a simple function and calculate ∫_R f(x)dx.

3.2 Let (S,Σ,m) be a measure space, A ∈ Σ and f be a real-valued simple function defined on S. Show that f 1_A is also a simple function, which is non-negative if f is. If f is non-negative, what constraint can you impose to ensure that I_A(f) = I(f 1_A) is finite?

3.3 Prove Theorem 3.3.1 parts (2) to (4).

3.4 Prove the following version of Chebychev's inequality. If f : S → R is a measurable function and c > 0, then

m({x ∈ S; |f(x)| ≥ c}) ≤ (1/c²) ∫_S f² dm.

Formulate and prove a similar inequality where c² is replaced by c^p for p ≥ 1.
Hint: Imitate the method of proof for Markov's inequality.

3.5 Extend Corollary 3.3.4 as follows. Show that if f is a real-valued measurable function for which ∫_S |f|^p dm = 0 for some p ≥ 1, then f = 0 (a.e.).

3.6 Let f : R → R be defined as follows:

f(x) = 0 if x < −2,
       −1 if −2 ≤ x < −1,
       1 if −1 ≤ x < 0,
       −2 if 0 ≤ x < 1,
       3 if 1 ≤ x < 2,
       0 if x ≥ 2.

Write down f^+ and f^- and confirm that they are non-negative simple functions. Calculate ∫_R f^+(x)dx and ∫_R f^-(x)dx, and hence also ∫_R f(x)dx.

3.7 Prove Theorem 3.5.1 parts (1) and (3).
Hint: For (1), consider the cases c ≥ 0, c = −1 and c < 0 (c ≠ −1) separately.

3.8 Show that if f and g are integrable functions then

(a) |∫_S f dm| ≤ ∫_S |f| dm,
(b) ∫_S |f + g| dm ≤ ∫_S |f| dm + ∫_S |g| dm.

3.9 Show that f = g (a.e.) defines an equivalence relation on the set of all real-valued measurable functions defined on (S,Σ,m).


3.10 Consider the sequence (f_n) on the measure space (R,B(R), λ) where f_n = n 1_{(0,1/n)}. Show that (f_n) converges pointwise to zero, but that ∫_R f_n dλ = 1 for all n ∈ N.

3.11 Let (S,Σ,m) be a measure space and (A_n) be a sequence of disjoint sets with A_n ∈ Σ for each n ∈ N. Define A = ⋃_{n=1}^∞ A_n. Let f : S → R be measurable. Show that f 1_A is integrable if and only if f 1_{A_n} is integrable for each n ∈ N and ∑_{n=1}^∞ ∫_{A_n} |f| dm < ∞.
Hint: Use the monotone convergence theorem.

3.12 Prove the reverse Fatou lemma, i.e. if (f_n) is a sequence of non-negative measurable functions for which f_n ≤ f for all n ∈ N, where f is integrable, then

lim sup_{n→∞} ∫_S f_n dm ≤ ∫_S lim sup_{n→∞} f_n dm.

Hint: Apply Fatou's lemma to f − f_n.

3.13 Show that if f : R → R is integrable then so are the mappings x → cos(αx)f(x) and x → sin(βx)f(x), where α, β ∈ R. Deduce that

lim_{n→∞} ∫_R cos(x/n)f(x)dx = ∫_R f(x)dx.

3.14 Let (S,Σ,m) be a measure space and f : [a, b] × S → R be a measurable function for which

(i) the mapping x → f(t, x) is integrable for all t ∈ [a, b],
(ii) the mapping t → f(t, x) is continuous for all x ∈ S,
(iii) there exists a non-negative integrable function g : S → R so that |f(t, x)| ≤ g(x) for all t ∈ [a, b], x ∈ S.

Use the dominated convergence theorem to show that the mapping t → ∫_S f(t, x)dm(x) is continuous on [a, b].
Hint: Use continuity in terms of sequences: show that lim_{n→∞} ∫_S f(t_n, x)dm(x) = ∫_S f(t, x)dm(x) for any sequence (t_n) satisfying lim_{n→∞} t_n = t.

3.15 Let (S,Σ,m) be a measure space and f : [a, b] × S → R be a measurable function for which

(i) the mapping x → f(t, x) is integrable for all t ∈ [a, b],
(ii) the mapping t → f(t, x) is differentiable for all x ∈ S,
(iii) there exists a non-negative integrable function h : S → R so that |∂f(t, x)/∂t| ≤ h(x) for all t ∈ [a, b], x ∈ S.

Show that the mapping t → ∫_S f(t, x)dm(x) is differentiable on (a, b) and that

(d/dt) ∫_S f(t, x)dm(x) = ∫_S (∂f(t, x)/∂t) dm(x).

Hint: Use the mean value theorem.


3.16 (?) Let

f(x) = −2x e^{−x²},   f_n(x) = ∑_{r=1}^n (−2r²x e^{−r²x²} + 2(r + 1)²x e^{−(r+1)²x²})

for all x ∈ R.

(a) Show that f(x) = lim_{n→∞} f_n(x) for all x ∈ R.
(b) Let a > 0. Show that f and f_n are Riemann integrable over [0, a] for all n ∈ N, but that

∫_0^a f(x)dx ≠ lim_{n→∞} ∫_0^a f_n(x)dx.

Neither the monotone nor the dominated convergence theorem can be used here (follow-up exercise: explain why not). This example illustrates that things can go badly wrong without them, even when f_n(x) → f(x) for all x.


Additional questions (?)

These questions explore properties of the Fourier transform. They are off-syllabus, but you may find them interesting.

3.17 If f is an integrable function on (R,B(R), λ), where λ is Lebesgue measure, define its Fourier transform f̂(y), for each y ∈ R, by

f̂(y) = ∫_R e^{−ixy} f(x)dx = ∫_R cos(xy)f(x)dx − i ∫_R sin(xy)f(x)dx.

Prove that |f̂(y)| < ∞, so that f̂ is a well-defined function from R to C. Show also that the Fourier transformation Ff = f̂ is linear, i.e. for all integrable f, g and a, b ∈ R we have

F(af + bg) = aFf + bFg.

3.18 Recall Dirichlet's jump function 1_Q. Does it make sense to write down the Fourier coefficients a_n = (1/π) ∫_{−π}^π 1_Q(x) cos(nx)dx and b_n = (1/π) ∫_{−π}^π 1_Q(x) sin(nx)dx as Lebesgue integrals? If so, what values do they have? Can you associate a Fourier series to 1_Q? If so (and if it is convergent), what does it converge to?

3.19 Fix a ∈ R and define the shifted function f_a(x) = f(x − a). If f is integrable, show that f_a is too, and deduce that f̂_a(y) = e^{−iay} f̂(y) for all y ∈ R.

3.20 Show that the mapping y → f̂(y) is continuous from R to C.
Hint: In this question, and the next one, you may use the fact that Lebesgue's dominated convergence theorem continues to hold for complex-valued functions, where | · | is interpreted as the usual modulus of complex numbers.

3.21 Suppose that the mappings x → f(x) and x → xf(x) are both integrable. Show that y → f̂(y) is differentiable and that for all y ∈ R,

(f̂)′(y) = −i ĝ(y),

where g(x) = xf(x) for all x ∈ R.
Hint: Use the inequality |e^{ib} − 1| ≤ |b| for b ∈ R.

Useful note: Analogues of the results of Problems 3.17-3.21, with slight modifications, also hold for the Laplace transform Lf(y) = ∫_{[0,∞)} e^{−yx} f(x)dx, where y ≥ 0 and f is assumed to be integrable on [0,∞).


Chapter 4

Probability and Measure

4.1 Introduction

In this chapter we will examine probability theory from the measure-theoretic perspective. The realisation that measure theory is the foundation of probability is due to the Russian mathematician A. N. Kolmogorov (1903-1987), who in 1933 published the hugely influential "Grundbegriffe der Wahrscheinlichkeitsrechnung" (in English: Foundations of the Theory of Probability). Since that time, measure theory has underpinned all mathematically rigorous work in probability theory and has been a vital tool in enabling the theory to develop both conceptually and in applications.

We have already seen that probability is a measure, random variables are measurable functions and expectation is a Lebesgue integral, but it is not true that "probability theory" can be reduced to a subset of "measure theory". This is because there are important probabilistic concepts, such as independence and conditioning, that do not appear naturally from within measure theory itself. The Polish mathematician Mark Kac (1914-1984) remarked that "Probability theory is measure theory with a soul." By the end of the course, you should be able to decide for yourself if you agree with this statement.

4.2 Basic Concepts of Probability Theory

Probability as Measure

Let us review what we know so far. In this chapter we will work with general probability spaces of the form (Ω,F,P), where the probability P is a finite measure on (Ω,F) having total mass 1. So

P(Ω) = 1 and 0 ≤ P(A) ≤ 1 for all A ∈ F.

P(A) is the probability that the event A ∈ F takes place. Since A ∪ A^c = Ω and A ∩ A^c = ∅, by M(ii) we have 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c), so that

P(A^c) = 1 − P(A).

A random variable X is a measurable function from (Ω,F) to (R,B(R)). If A ∈ B(R), it is standard to use the notation (X ∈ A) to denote the event X^{-1}(A) ∈ F. The law or distribution of X is the induced probability measure on (R,B(R)) given by p_X(B) = P(X^{-1}(B)) for B ∈ B(R). So

p_X(B) = P(X ∈ B) = P(X^{-1}(B)) = P({ω ∈ Ω; X(ω) ∈ B}).


The expectation of X is the Lebesgue integral

E(X) = ∫_Ω X(ω)dP(ω) = ∫_R x dp_X(x)

(see Problem 4.10), which makes sense and yields a finite quantity if and only if X is (Lebesgue) integrable, i.e. E(|X|) < ∞. In this case we write µ_X = E(X) and call it the mean of X. Note that for all A ∈ F,

P(A) = E(1_A).

By the result of Problem 2.6, any Borel measurable function f from R to R enables us to construct a new random variable f(X), for which f(X)(ω) = f(X(ω)) for all ω ∈ Ω. For example we may take f(x) = x^n for n ∈ N. Then the nth moment E(X^n) will exist and be finite if |X|^n is integrable. If X has a finite second moment then its variance Var(X) = E((X − µ)²) always exists (see Problem 4.12). It is common to use the notation σ²_X = Var(X). The standard deviation of X is σ_X = √Var(X). When it is clear which random variable we mean, we write simply µ and σ in place of µ_X, σ_X.

Here's some useful notation. If X and Y are random variables defined on the same probability space and A_1, A_2 ∈ B(R), it is standard to write

P(X ∈ A_1, Y ∈ A_2) = P((X ∈ A_1) ∩ (Y ∈ A_2)).

Continuity of Probabilities

Recall from Section 1.6 that a sequence of sets (An) with An ∈ F for all n ∈ N is increasing if An ⊆ An+1 for all n ∈ N. Similarly, we say that a sequence (Bn) with Bn ∈ F for all n ∈ N is decreasing if Bn ⊇ Bn+1 for all n ∈ N.

Theorem 4.2.1 We have:

1. Suppose (An) is increasing and A =⋃nAn. Then P[A] = limn→∞ P[An].

2. Suppose (Bn) is decreasing and B =⋂nBn. Then P[B] = limn→∞ P[Bn].

Proof: This is just Theorem 1.6.1 of Chapter 1 applied to probability measures.

The intuition here should be clear. The set An gets bigger as n → ∞ and, in doing so, gets ever closer to A; the same is true of their probabilities. Similarly for Bn, which gets smaller and closer to B. This result is a probabilistic analogue of the well known fact that bounded monotone increasing (resp. decreasing) sequences of real numbers converge.

The Cumulative Distribution Function

Let X : Ω → R be a random variable. Its cumulative distribution function or cdf is the mapping FX : R → [0, 1] defined for each x ∈ R by

FX(x) = P(X ≤ x) = pX((−∞, x]).

When X is understood we will just denote FX by F. The next result gathers together some useful properties of the cdf. Recall that if f : R → R, the left limit at x is lim_{y↑x} f(y) = lim_{y→x, y<x} f(y), and the right limit at x is lim_{y↓x} f(y) = lim_{y→x, y>x} f(y). In general, left and right limits may not exist, but they do exist at every point when the function f is monotone increasing (or decreasing).


Theorem 4.2.2 Let X be a random variable having cdf F .

1. P(X > x) = 1− F (x),

2. P(x < X ≤ y) = F (y)− F (x) for all x < y.

3. F is monotone increasing, i.e. F(x) ≤ F(y) for all x < y,

4. P(X = x) = F (x)− limy↑x F (y),

5. The mapping x→ F (x) is right continuous, i.e. F (x) = limy↓x F (y), for all x ∈ R,

6. limx→−∞ F (x) = 0, limx→∞ F (x) = 1.

Proof: (1), (2) and (3) are easy exercises, left for you to do. (6) is Problem 4.4.

(4) Let (an) be a sequence of positive numbers that decreases to zero. Let x ∈ R be arbitrary and for each n ∈ N define Bn = (x − an < X ≤ x). Then (Bn) decreases to the event (X = x), and using (2) and Theorem 4.2.1 (2),

P(X = x) = lim_{n→∞} P(Bn) = F(x) − lim_{n→∞} F(x − an),

and the result follows.

(5) Let x and (an) be as in (4) and for each n ∈ N define An = (X > x + an). The sets (An) are increasing to (X > x), and using (1) and Theorem 4.2.1 (1) we find that

1 − F(x) = lim_{n→∞} P(An) = 1 − lim_{n→∞} F(x + an),

and the result follows.

Discrete and Continuous Random Variables

You will probably recall that many useful random variables are found in two special cases. Formally, we say that a random variable X is a:

1. continuous random variable if its cdf FX is continuous at every point x ∈ R;

2. discrete random variable if FX has jump discontinuities at a countable set of points and is constant between these jumps.

Note that if FX is continuous at x then P(X = x) = 0 by Theorem 4.2.2(4). In particular, this applies to all x ∈ R for continuous random variables. Many random variables are neither discrete nor continuous.

We now point out a technicality that is often forgotten in less rigorous courses: a continuous random variable does not need to have a probability density function! Strictly speaking, those that do have a special name; we say X is a

3. absolutely continuous random variable if there exists an integrable function fX : R → R such that FX(x) = ∫_{−∞}^{x} fX(y) dy for all x ∈ R.


In this case X is certainly a continuous random variable. The function fX is called the probability density function or pdf of X. Clearly fX ≥ 0 (a.e.) and by Theorem 4.2.2 (6) we have ∫_{−∞}^{∞} fX(y) dy = 1.

We have already seen the example of the Gaussian random variable, which is absolutely continuous. Most useful continuous random variables are absolutely continuous; other examples that you may have encountered previously include the uniform, exponential, Student t, gamma and beta distributions. Typical examples of discrete random variables are the binomial, geometric and Poisson distributions.
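For intuition, here is a small numerical sketch of the relationship FX(x) = ∫_{−∞}^{x} fX(y) dy in the absolutely continuous case. It uses Python with NumPy, which is not part of these notes, and takes X to be exponentially distributed with rate 1, so that FX(x) = 1 − e^{−x} for x ≥ 0; the grid size and the crude Riemann sum are arbitrary choices made purely for illustration.

    import numpy as np

    # pdf of the Exp(1) distribution: f(y) = e^{-y} for y >= 0 and 0 otherwise
    def pdf(y):
        return np.where(y >= 0, np.exp(-y), 0.0)

    x = 2.0
    grid, dy = np.linspace(-1.0, x, 300001, retstep=True)
    F_numeric = np.sum(pdf(grid)) * dy      # crude Riemann sum for the integral of the pdf up to x
    F_exact = 1 - np.exp(-x)                # known closed form of the cdf at x
    print(F_numeric, F_exact)               # the two values agree to several decimal places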

Independence

In this subsection we consider the meaning of independence for events, random variables andσ-fields.

A useful heuristic is: independence means multiply. We say that two events A1, A2 ∈ F are independent if

P(A1 ∩ A2) = P(A1)P(A2).

We extend this by induction to n events. But for many applications, we want to discuss independence of infinitely many events, or to be precise a sequence (An) of events with An ∈ F for all n ∈ N. The definition of independence is extended from the finite case by considering all finite subsets of the sequence.

Formally: we say that the events in the sequence (An) are independent if the finite set {Ai1, Ai2, . . . , Aim} is independent for all finite subsets {i1, i2, . . . , im} of the natural numbers, i.e.

P(Ai1 ∩ Ai2 ∩ · · · ∩ Aim) = P(Ai1)P(Ai2) · · · P(Aim).

We recall that two random variables X and Y are said to be independent if P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B) for all A, B ∈ B(R). In other words, the events (X ∈ A) and (Y ∈ B) are independent for all A, B ∈ B(R). Again this is easily extended to finite collections of random variables. Now suppose we are given a sequence of random variables (Xn). We say that the Xn's are independent if every finite subset Xi1, Xi2, . . . , Xim of random variables is independent, i.e.

P(Xi1 ∈ Ai1 , Xi2 ∈ Ai2 , . . . , Xim ∈ Aim) = P(Xi1 ∈ Ai1)P(Xi2 ∈ Ai2) · · ·P(Xim ∈ Aim)

for all Ai1, Ai2, . . . , Aim ∈ B(R) and for all finite subsets {i1, i2, . . . , im} ⊂ N.

In the case where there are two random variables, we may consider the random vector Z = (X, Y) as a measurable function from (Ω, F) to (R², B(R²)), where the Borel σ-field B(R²) is the smallest σ-field containing all open rectangles of the form (a, b) × (c, d). The law of Z is, as usual, pZ = P ∘ Z−1, and the joint law of X and Y is precisely pZ(A × B) = P(X ∈ A, Y ∈ B) for A, B ∈ B(R). Then X and Y are independent if and only if

pZ(A×B) = pX(A)pY (B),

i.e. the joint law factorises as the product of the marginals.

Theorem 4.2.3 Let X and Y be independent integrable random variables. Then

E(XY ) = E(X)E(Y ).


Proof: By the two-dimensional version of Problem 4.10,

E(XY) = ∫_{R²} xy pZ(dx, dy) = (∫_R x pX(dx)) (∫_R y pY(dy)) = E(X)E(Y),

where we have used Fubini’s theorem to write the integral over R2 as a repeated integral.
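As a quick numerical illustration of Theorem 4.2.3 (a sketch only, using Python with NumPy, which is outside these notes), one can compare a Monte Carlo estimate of E(XY) with the product of the estimated means for two independently simulated random variables; the choice of Exp(1) and N(2, 1) below is arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10**6
    X = rng.exponential(scale=1.0, size=n)       # independent samples of X ~ Exp(1)
    Y = rng.normal(loc=2.0, scale=1.0, size=n)   # independent samples of Y ~ N(2, 1)

    print((X * Y).mean())                        # Monte Carlo estimate of E(XY)
    print(X.mean() * Y.mean())                   # product of the estimated means; both are close to 2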

Let's go back to measure theory and consider a measurable space (S, Σ). We say that Σ′ ⊆ Σ is a sub-σ-field if it is itself a σ-field. For example the trivial σ-field {∅, S} is a sub-σ-field of any σ-field defined on S. If (An) is a sequence of sets in Σ then σ(A1, A2, . . .) is defined to be the smallest sub-σ-field of Σ that contains An for all n ∈ N.

Sub-σ-fields play an important role in probability theory. For example let X be a random variable defined on (Ω, F, P). Then σ(X) is the smallest sub-σ-field of F that contains all the events X−1(A) for A ∈ B(R). For example let X describe a simple coin toss, so that

X = 0 if the coin shows tails, and X = 1 if the coin shows heads.

If A = X−1({0}) then Ac = X−1({1}) and σ(X) = {∅, A, Ac, Ω}.

Two sub-σ-fields G1 and G2 are said to be independent if

P(A ∩B) = P(A)P(B)

for all A ∈ G1, B ∈ G2. In Section 4.4 we will need the following proposition, which we state without proof.

Proposition 4.2.4 Let (An) and (Bn) be sequences of events in F which are such that the combined sequence (An, Bn) is independent (i.e. any finite subset containing both An's and Bm's is independent). Then the two sub-σ-fields σ(A1, A2, . . .) and σ(B1, B2, . . .) are independent.

This result is very natural! It is hopefully not hard to believe that it is true. A proof can be found in e.g. Rosenthal pp.28–29, but be aware that this requires the application of a rather deep result about σ-fields that we have also not proved in this course.


4.3 The Borel-Cantelli lemmas

The Borel-Cantelli lemmas are a tool for understanding the tail behaviour of a sequence (En) of events. The key definitions are

{En i.o.} = {En, infinitely often} = ⋂_m ⋃_{n≥m} En = {ω : ω ∈ En for infinitely many n},

{En e.v.} = {En, eventually} = ⋃_m ⋂_{n≥m} En = {ω : ω ∈ En for all sufficiently large n}.

The set {En i.o.} is the event that infinitely many of the individual events En occur. The set {En e.v.} is the event that, for some (random) N, all the events En for which n ≥ N occur.

For example, we might take an infinite sequence of coin tosses and choose En to be the event that the nth toss is a head. Then {En i.o.} is the event that infinitely many heads occur, and {En e.v.} is the event that, after some point, all remaining tosses show heads.

Note that by straightforward set algebra,

Ω \ {En i.o.} = {Ω \ En  e.v.}.     (4.1)

In our coin tossing example, Ω \ En is the event that the nth toss is a tail. So (4.1) says that 'there are not infinitely many heads' if and only if 'eventually, we see only tails'.

The Borel-Cantelli lemmas, respectively, give conditions under which the probability of {En i.o.} is either 0 or 1.

Lemma 4.3.1 (First Borel-Cantelli Lemma) Let (En)n∈N be a sequence of events and suppose that ∑_{n=1}^{∞} P[En] < ∞. Then P[En i.o.] = 0.

Proof: We have

P[⋂_N ⋃_{n≥N} En] = lim_{N→∞} P[⋃_{n≥N} En] ≤ lim_{N→∞} ∑_{n=N}^{∞} P[En] = 0.

Here, the first step follows by applying Theorem 4.2.1 to the decreasing sequence of events (BN) where BN = ⋃_{n≥N} En. The second step follows by Theorem 1.6.2 and the fact that limits preserve weak inequalities. The final step follows because ∑_{n=1}^{∞} P[En] < ∞.

For example, suppose that (Xn) are random variables that take the values 0 and 1, and that P[Xn = 1] = 1/n² for all n. Then ∑_n P[Xn = 1] = ∑_n 1/n² < ∞ so, by Lemma 4.3.1, P[Xn = 1 i.o.] = 0, which by (4.1) means that P[Xn = 0 e.v.] = 1. So, almost surely, beyond some (randomly located) point in our sequence (Xn), we will see only zeros. Note that we did not require the (Xn) to be independent.
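A simulation sketch of this example (Python with NumPy, used here purely as an illustration; for simplicity the Xn are simulated independently, although the lemma does not require this):

    import numpy as np

    rng = np.random.default_rng(1)
    n = np.arange(1, 100001)                 # indices 1, ..., 10^5
    X = rng.random(n.size) < 1.0 / n**2      # independent X_n with P[X_n = 1] = 1/n^2
    ones = np.flatnonzero(X) + 1             # the indices n at which X_n = 1
    print(ones)                              # typically a very short list, concentrated near the start
    print(ones.max() if ones.size else 0)    # beyond this (random) point, every X_n = 0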

Lemma 4.3.2 (Second Borel-Cantelli Lemma) Let (En)n∈N be a sequence of independent events and suppose that ∑_{n=1}^{∞} P[En] = ∞. Then P[En i.o.] = 1.

Proof: Write En^c = Ω \ En. We will show that P[En^c e.v.] = 0, which by (4.1) implies our stated result. Note that

P[En^c e.v.] = P[⋃_N ⋂_{n≥N} En^c] ≤ ∑_{N=1}^{∞} P[⋂_{n≥N} En^c]     (4.2)


by Theorem 1.6.2. Moreover, since the (En) are independent, so are the (En^c), so

P[⋂_{n≥N} En^c] = ∏_{n=N}^{∞} P[En^c] = ∏_{n=N}^{∞} (1 − P[En]) ≤ ∏_{n=N}^{∞} e^{−P[En]} = exp(−∑_{n=N}^{∞} P[En]) = 0.

Here, the first step follows by Problem 4.3. The second step is immediate and the third step uses that 1 − x ≤ e^{−x} for x ∈ [0, 1]. The fourth step is immediate and the final step holds because ∑_n P[En] = ∞. By (4.2) we thus have P[En^c e.v.] = 0.

For example, suppose that (Xn) are i.i.d. random variables such that P[Xn = 1] = 1/2 and P[Xn = −1] = 1/2. Then ∑_n P[Xn = 1] = ∞ and, by Lemma 4.3.2, P[Xn = 1 i.o.] = 1. By symmetry, we also have P[Xn = −1 i.o.] = 1. So, if we look along our sequence, almost surely we will see infinitely many 1s and infinitely many −1s.
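A corresponding sketch for this example (again Python with NumPy, illustration only): in a long run of fair ±1 tosses, both values keep recurring right to the end of the sample.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.choice([-1, 1], size=10**6)       # i.i.d. fair +/-1 tosses
    print((X == 1).sum(), (X == -1).sum())    # both counts are large and keep growing with n
    print(np.flatnonzero(X == 1)[-5:])        # 1s still occur near the end of the sample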

Since both the Borel-Cantelli lemmas come down to summing a series, a useful fact to remember from real analysis is that, for p ∈ R,

∑_{n=1}^{∞} n^{−p} < ∞  ⇔  p > 1.

This follows from the integral test for convergence of series. The proof is left as an exercise for you.
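A crude numerical illustration of this dichotomy (Python with NumPy, illustration only): partial sums with p = 2 settle down near π²/6 ≈ 1.645, while partial sums with p = 1 keep growing like log n.

    import numpy as np

    n = np.arange(1, 10**6 + 1)
    print((1.0 / n**2).sum())   # about 1.6449, close to pi^2/6: the p = 2 series converges
    print((1.0 / n).sum())      # about 14.39, and this keeps growing as more terms are added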


4.4 Kolmogorov’s 0-1 law

Let (An) be a sequence of events in F . The tail σ-field associated to (An) is

τ = ⋂_{n=1}^{∞} σ(An, An+1, . . .).

The intuition here is that τ contains events which depend only on the behaviour 'in the tail' of the sequence (An). That is, if we have an event E ∈ τ, we could tell whether E occurred by looking only at the events An for large n.

The next result may appear quite surprising. It is called the Kolmogorov zero-one law after A. N. Kolmogorov.

Theorem 4.4.1 (Kolmogorov's 0-1 law.) Let (An) be a sequence of independent events in F and let τ be the tail σ-field that they generate. If A ∈ τ then either P(A) = 0 or P(A) = 1.

Proof: If A ∈ τ then A ∈ σ(An, An+1, . . .) for all n ∈ N. Then by Proposition 4.2.4, A is independent of A1, A2, . . . , An−1 for all n = 2, 3, 4, . . .. Since independence is only defined in terms of finite subcollections of sets, it follows that A is independent of A1, A2, . . .. But A ∈ τ ⊆ σ(A1, A2, . . .). Hence A is independent of itself. So P(A) = P(A ∩ A) = P(A)², and hence P(A) = 0 or 1.

From the definitions at the top of Section 4.3, it is clear that {An i.o.} ∈ τ and {An e.v.} ∈ τ. In the light of the Kolmogorov 0-1 law, the second Borel-Cantelli lemma should no longer seem so surprising.


4.5 Convergence of Random Variables

Let (Xn) be a sequence of random variables, all of which are defined on the same probability space (Ω, F, P). There are various different ways in which we can examine the convergence of this sequence to a random variable X (which is also defined on (Ω, F, P)). They are called modes of convergence.

When we talk about convergence of real numbers an → a we only have one mode of convergence, which we might think of as convergence of the value of an to the value of a. Random variables are much more complicated objects; they take many different values with different probabilities. For this reason, there are multiple different modes of convergence of random variables.

We say that (Xn) converges to X

• in probability if given any a > 0, we have P(|Xn −X| > a)→ 0 as n→∞,

• almost surely if P[Xn(ω)→ X(ω), as n→∞] = 1,

• in Lp, where p ∈ [1,∞), if E(|Xn −X|p)→ 0 as n→∞.

When (Xn) converges to X almost surely we sometimes write Xn → X a.s. as n → ∞. We may also write the type of convergence above the arrow, e.g. Xn →^{L²} X or Xn →^{a.s.} X.

For Lp convergence we are usually only interested in the cases p = 1 and p = 2. The case p = 2 is sometimes known as convergence in mean square. Note that Lp convergence is only defined for p ≥ 1.
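For intuition about how these modes differ, here is a small simulation sketch (Python with NumPy, not part of these notes) of the 'sliding indicator' sequence from Problem 4.14, Xn = 1_{An}: P(|Xn − 0| > a) = λ(An) → 0, so Xn → 0 in probability (and in every Lp), yet for a fixed ω the values Xn(ω) return to 1 infinitely often, so Xn does not converge to 0 almost surely.

    import numpy as np

    # Build the intervals A_1 = [0,1/2], A_2 = [1/2,1], A_3 = [0,1/4], A_4 = [1/4,1/2], ...
    intervals = []
    k = 1
    while len(intervals) < 1000:
        for j in range(2**k):
            intervals.append((j / 2**k, (j + 1) / 2**k))
        k += 1

    omega = 0.3                                   # one fixed sample point in [0, 1]
    X = np.array([1.0 if a <= omega <= b else 0.0 for (a, b) in intervals])
    lengths = np.array([b - a for (a, b) in intervals])

    print(lengths[-5:])            # P(X_n = 1) = length of A_n -> 0 : convergence in probability
    print(np.flatnonzero(X)[-3:])  # but X_n(omega) = 1 for arbitrarily large n : no a.s. convergence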

Happily, there are some relationships between these different modes of convergence.

Theorem 4.5.1 We have:

1. If q ≥ p ≥ 1, then convergence in Lq implies convergence in Lp.

2. Convergence in Lp implies convergence in probability.

3. Convergence almost surely implies convergence in probability.

Proof:

1. (?) This part is non-examinable because we need an inequality that is (just) outside of our own syllabus. It is true that E[|X|]^p ≤ E[|X|^p] for all p ≥ 1 and all random variables X. To find a proof of this fact, investigate Jensen's inequality or Hölder's inequality. Applying the inequality to |Xn − X|^p, with q/p ≥ 1 in place of p, we obtain (E[|Xn − X|^p])^{q/p} ≤ E[|Xn − X|^q]. The result follows.

2. Thanks to (1), it suffices to prove that L¹ convergence implies convergence in probability. This follows from Markov's inequality (Lemma 3.3.2), since for any a > 0,

P(|Xn − X| > a) ≤ E(|Xn − X|)/a.

If Xn → X in L¹ then the right hand side tends to zero as n → ∞, hence so does the left.


3. Let ε > 0 be arbitrary and let An = ⋃_{m=n}^{∞} (|Xm − X| > ε). Then (An) is a decreasing sequence of events. Let A = ⋂_{n=1}^{∞} An. If ω ∈ A then Xn(ω) cannot converge to X(ω) as n → ∞, and so, writing Ω′ = {ω : Xn(ω) → X(ω)} (which has P(Ω′) = 1 by assumption),

P(A) ≤ P(Ω − Ω′) = 0.

By Theorem 4.2.1 part (2), lim_{n→∞} P(An) = P(A) = 0. But then by monotonicity,

P(|Xn − X| > ε) ≤ P(An) → 0 as n → ∞.

We have a partial converse to Theorem 4.5.1 (3).

Theorem 4.5.2 If Xn → X in probability as n → ∞, then there is a subsequence of (Xn) that converges to X almost surely.

Proof: If (Xn) converges in probability to X, then for all c > 0, given any ε > 0, there exists N(c) ∈ N so that for all n > N(c),

P(|Xn −X| > c) < ε.

In order to find our subsequence, first choose c = 1 and ε = 1/2; then for n > N(1),

P(|Xn − X| > 1) < 1/2.

Next choose c = 1/2 and ε = 1/4; then for n > N(2),

P(|Xn − X| > 1/2) < 1/4,

and for r ≥ 3, choose c = 1/r and ε = 1/2^r; then for n > N(r),

P(|Xn − X| > 1/r) < 1/2^r.

Now choose the numbers kr = N(r) + 1, for r ∈ N, to obtain a subsequence (X_{kr}) so that for all r ∈ N,

P(|X_{kr} − X| > 1/r) < 1/2^r.

Since ∑_{r=1}^{∞} 1/2^r < ∞, by the first Borel-Cantelli lemma (Lemma 4.3.1) we have

P(|X_{kr} − X| > 1/r  i.o.) = 0,

and so

P(|X_{kr} − X| ≤ 1/r  e.v.) = 1.

This means that

P(⋃_{n∈N} ⋂_{r>n} {|X_{kr} − X| ≤ 1/r}) = 1,

and so, with probability 1, at least one of the events ⋂_{r>n} {|X_{kr} − X| ≤ 1/r} occurs. Then in the definition of almost sure convergence, we can take Ω′ = {|X_{kr} − X| ≤ 1/r  e.v.}, and so for any ω ∈ Ω′, given ε > 0, we can find N ∈ N such that for r > N, |X_{kr}(ω) − X(ω)| ≤ 1/r < ε, and the result follows.

Remark 4.5.3 There is no simple relationship between a.s. convergence and convergence in Lp.


4.6 Laws of Large Numbers

Let (Xn) be a sequence of random variables, all defined on the same probability space, that have the following properties:

• they are independent (see section 4.2)

• they are identically distributed, i.e. pXn = pXm for all n ≠ m. In other words, for all A ∈ B(R),

P(X1 ∈ A) = P(X2 ∈ A) = · · · = P(Xn ∈ A) = · · ·

Such a sequence is said to be 'i.i.d.'. They are very important in modelling (consider the steps of a random walk) and also statistics (consider a sequence of idealised experiments carried out under identical conditions). We can form a new sequence of random variables (X̄n), where X̄n is the empirical mean

X̄n = (1/n)(X1 + X2 + · · · + Xn).

If Xn is integrable for some (and hence all) n ∈ N then E(Xn) = µ is finite. It also follows that X̄n is integrable, and by linearity E(X̄n) = µ. If E(Xn²) < ∞ then (see Problem 4.12) Var(Xn) = σ² < ∞ for some (and hence all) n ∈ N, and it follows by elementary properties of the variance that Var(X̄n) = σ²/n. It is extremely important to learn about the asymptotic behaviour of X̄n as n → ∞. Two key results are the weak law of large numbers or WLLN and the strong law of large numbers or SLLN. In fact the second of these implies the first, but it is much harder to prove. Later in this chapter we will study the central limit theorem or CLT.

Theorem 4.6.1 (WLLN) Let (Xn) be a sequence of integrable i.i.d. random variables with E(Xn) = µ for all n ∈ N. Suppose also that E(Xn²) < ∞ for all n ∈ N. Then X̄n → µ in probability as n → ∞.

Proof: Let σ² = Var(Xn) for all n ∈ N. Then by Chebychev's inequality, for all a > 0,

P(|X̄n − µ| > a) ≤ Var(X̄n)/a² = σ²/(na²) → 0 as n → ∞.
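A simulation sketch of the WLLN (Python with NumPy, illustration only): for i.i.d. Exp(1) samples, so that µ = 1, the proportion of simulated runs in which |X̄n − µ| > 0.1 shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, a, runs = 1.0, 0.1, 1000

    for n in [10, 100, 1000, 10000]:
        means = rng.exponential(scale=1.0, size=(runs, n)).mean(axis=1)   # the empirical mean in each run
        print(n, np.mean(np.abs(means - mu) > a))   # estimate of P(|mean - mu| > a); decreases with n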

Theorem 4.6.2 (SLLN) Let (Xn) be a sequence of integrable i.i.d. random variables with E(Xn) = µ for all n ∈ N. Suppose also that E(Xn²) < ∞ for all n ∈ N. Then X̄n → µ almost surely as n → ∞.

Before we discuss the proof we make an observation: SLLN ⇒ WLLN, by Theorem 4.5.1 (3). The full proof of the SLLN is a little difficult for this course (see e.g. Rosenthal pp.47-9). We'll give a manageable proof by making an assumption on the fourth moments of the sequence (Xn).

Assumption 4.6.3 E((Xn − µ)4) = b <∞ for all n ∈ N.


Proof: [Of Theorem 4.6.2 under Assumption 4.6.3.] Assume that µ = 0. If not we can just replace Xn throughout the proof with Yn = Xn − µ. Let Sn = X1 + X2 + · · · + Xn, so that Sn = nX̄n for all n ∈ N. Consider E(Sn⁴). It contains many terms of the form E(XjXkXlXm) (with distinct indices) and these all vanish by independence. A similar argument disposes of terms of the form E(XjXk³) and E(XjXkXl²). The only terms with non-vanishing expectation are the n terms of the form Xi⁴ and the (n choose 2)·(4 choose 2) = 3n(n − 1) terms of the form Xi²Xj² with i ≠ j. Now by Problem 4.5, Xi² and Xj² are independent for i ≠ j, and so

E(Xi²Xj²) = E(Xi²)E(Xj²) = Var(Xi)Var(Xj) = σ⁴.

We then have

E(Sn⁴) = ∑_{i=1}^{n} E(Xi⁴) + ∑_{i≠j} E(Xi²Xj²) = nb + 3n(n − 1)σ⁴ ≤ Kn²,

where K = b + 3σ⁴. Then for all a > 0, by Markov's inequality (Lemma 3.3.1),

P(|X̄n| > a) = P(Sn⁴ > a⁴n⁴) ≤ E(Sn⁴)/(a⁴n⁴) ≤ Kn²/(a⁴n⁴) = K/(a⁴n²).

But ∑_{n=1}^{∞} 1/n² < ∞, and so by the first Borel-Cantelli lemma, P(lim sup_{n→∞} |X̄n| > a) = 0, and so P(lim sup_{n→∞} |X̄n| ≤ a) = 1. By a similar argument to the last part of Theorem 4.5.2 we deduce that X̄n → 0 a.s., as required.

Notes

1. The last part of the proof skated over some details. In fact you can show that for any sequence (Yn) of random variables, P(lim sup_{n→∞} |Yn − Y| ≥ a) = 0 for all a > 0 implies that Yn → Y (a.s.) as n → ∞. See Lemma 5.2.2 in Rosenthal p.45.

2. The proof in the general case without Assumption 4.6.3 uses a truncation argument and defines Yn = Xn 1_{Xn ≤ n}. Then Yn ≤ n for all n, and so E(Yn^k) ≤ n^k for all k. If Xn ≥ 0 for all n, then E(Yn) → µ by monotone convergence. Roughly speaking we can prove a SLLN for the Yn's. We then need a clever probabilistic argument to transfer this to the Xn's. The assumption in Theorem 4.6.2 that all the random variables have a finite second moment may also be dropped.

4.7 Characteristic Functions and Weak Convergence

In this section, we introduce two tools that we will need to prove the central limit theorem.

Characteristic Functions

Let (S, Σ, m) be a measure space and f : S → C be a complex-valued function. Then we can write f = f1 + if2, where f1 and f2 are real-valued functions. We say that f is measurable/integrable if both f1 and f2 are. Define |f|(x) = |f(x)| = √(f1(x)² + f2(x)²) for each x ∈ S. It is not difficult to see that |f| is measurable, using e.g. Problem 2.4. In Problem 4.15, you can prove that f is integrable if and only if |f| is integrable. The Lebesgue dominated convergence theorem continues to hold for sequences of measurable functions from S to C.

Now let X be a random variable defined on a probability space (Ω, F, P). Its characteristic function φX : R → C is defined, for each u ∈ R, by

φX(u) = E(e^{iuX}) = ∫_R e^{iuy} pX(dy).

Note that y → e^{iuy} is measurable since e^{iuy} = cos(uy) + i sin(uy), and integrability holds since |e^{iuy}| ≤ 1 for all y ∈ R; in fact we have |φX(u)| ≤ 1 for all u ∈ R.

Example X ∼ N(µ, σ²) means that X has a normal or Gaussian distribution with mean µ and variance σ², so that for all x ∈ R,

FX(x) = (1/(σ√(2π))) ∫_{−∞}^{x} exp(−(1/2)((y − µ)/σ)²) dy.

In Problem 4.16, you can show for yourself that in this case, for all u ∈ R,

φX(u) = exp(iµu − (1/2)σ²u²).
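A numerical sanity check of this formula (a sketch in Python with NumPy, outside these notes): a Monte Carlo estimate of E(e^{iuX}) for simulated N(µ, σ²) samples is close to exp(iµu − σ²u²/2); the particular values of µ, σ and u below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma, u = 1.0, 2.0, 0.7
    X = rng.normal(mu, sigma, size=10**6)

    phi_mc = np.mean(np.exp(1j * u * X))                     # Monte Carlo estimate of E(e^{iuX})
    phi_exact = np.exp(1j * mu * u - 0.5 * sigma**2 * u**2)  # closed form of the characteristic function
    print(phi_mc, phi_exact)                                 # the two complex numbers agree closely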

Characteristic functions have many interesting properties. Here is one of the most useful. It is another instance of the "independence means multiply" philosophy.

Theorem 4.7.1 If X and Y are independent random variables then for all u ∈ R,

φX+Y (u) = φX(u)φY (u).

Proof:

φX+Y (u) = E(eiu(X+Y )) = E(eiuXeiuY ) = E(eiuX)E(eiuY ) = φX(u)φY (u),

by Problem 4.5.

The following result is also important, but we omit the proof. It tells us that the probability law of a random variable is uniquely determined by its characteristic function.

Theorem 4.7.2 If X and Y are two random variables for which φX(u) = φY(u) for all u ∈ R, then pX = pY.

The characteristic function is the Fourier transform of the law pX of the random variable X, and we have seen that it always exists. In elementary probability theory courses we often meet the Laplace transform E(e^{uX}) of X, which is called the moment generating function. This exists in some nice cases (e.g. when X is Gaussian), but will not do so in general, as y → e^{uy} may not be integrable since it becomes unbounded as y → ∞ (when u > 0) and as y → −∞ (when u < 0).

We will now develop an important inequality that we will need to prove the central limit theorem. Let x ∈ R and let Rn(x) be the remainder term at n ∈ N ∪ {0} of the series expansion of e^{ix}, i.e.

Rn(x) = e^{ix} − ∑_{k=0}^{n} (ix)^k / k!.


Note that

R0(x) = e^{ix} − 1 = ∫_0^x i e^{iy} dy if x > 0,   and   R0(x) = −∫_x^0 i e^{iy} dy if x < 0.

From these identities we have |R0(x)| ≤ min{|x|, 2}. Then you should check that

Rn(x) = ∫_0^x i R_{n−1}(y) dy if x > 0,   and   Rn(x) = −∫_x^0 i R_{n−1}(y) dy if x < 0.

Finally, using induction, we can deduce the useful inequality:

|Rn(x)| ≤ min{ 2|x|^n / n! , |x|^{n+1} / (n+1)! }.

Now let X be a random variable with characteristic function φX for which E(|X|^n) < ∞ for some given n ∈ N. Then integrating the last inequality yields, for all y ∈ R,

| φX(y) − ∑_{k=0}^{n} (iy)^k E(X^k) / k! | ≤ E[ min{ 2|yX|^n / n! , |yX|^{n+1} / (n+1)! } ].     (4.3)

When we prove the CLT we will want to apply this in the case n = 2 to a random variable that has E(X) = 0. Then, writing E(X²) = σ², we deduce that for all y ∈ R,

| φX(y) − 1 + (1/2)σ²y² | ≤ θ(y),     (4.4)

where θ(y) = y² E[ min{ |X|², |y||X|³/6 } ]. Note that

min{ |X|², |y||X|³/6 } ≤ |X|²,

which is integrable by assumption. Also we have

lim_{y→0} min{ |X|², |y||X|³/6 } = 0,

and so by the dominated convergence theorem we can deduce the following important property of θ, which is that

lim_{y→0} θ(y)/y² = 0.     (4.5)

Weak Convergence

A sequence (µn) of probability measures on R is said to converge weakly to a probability measure µ if

lim_{n→∞} ∫_R f(x) µn(dx) = ∫_R f(x) µ(dx)

for all bounded continuous functions f defined on R. In the case where there is a sequence of random variables (Xn) and instead of µn we have pXn, and also µ is the law pX of a random variable X, we say that (Xn) converges in distribution to X; so convergence in distribution means the same thing as weak convergence of the sequence of laws. It can be shown that (Xn) converges to X in distribution if and only if lim_{n→∞} FXn(x) = FX(x) at every continuity point x of the cdf FX.

It can be shown that convergence in probability implies convergence in distribution (and so, by Theorem 4.5.1 (3), almost sure convergence also implies convergence in distribution). For a proof, see Proposition 10.0.3 on p.98 in Rosenthal.


There is an important link between the concepts of weak convergence and characteristic functions, which we present next.

Theorem 4.7.3 Let (Xn) be a sequence of random variables, where each Xn has characteristic function φn, and let X be a random variable having characteristic function φ. Then (Xn) converges to X in distribution if and only if lim_{n→∞} φn(u) = φ(u) for all u ∈ R.

Proof: We'll only do the easy part here. Suppose (Xn) converges to X in distribution. Then

φn(u) = ∫_R cos(uy) pXn(dy) + i ∫_R sin(uy) pXn(dy) → ∫_R cos(uy) pX(dy) + i ∫_R sin(uy) pX(dy) = φ(u),

as n → ∞, since both y → cos(uy) and y → sin(uy) are bounded continuous functions. See e.g. Rosenthal pp.108-9 for the converse.

4.8 The Central Limit Theorem

Let (Xn) be a sequence of i.i.d. random variables having finite mean µ and finite variance σ². We have already met the SLLN, which tells us that X̄n converges to µ a.s. as n → ∞. Note that the standard deviation (i.e. the square root of the variance) of X̄n is σ/√n, which also converges to zero as n → ∞. Now consider the sequence (Yn) of standardised random variables defined by

Yn = (X̄n − µ)/(σ/√n) = (Sn − nµ)/(σ√n).     (4.6)

Then E(Yn) = 0 and Var(Yn) = 1 for all n ∈ N.

It is difficult to overestimate the importance of the next result. It shows that the normal distribution has a universal character as the attractor of the sequence (Yn). From a modelling point of view, it tells us that when you combine many different i.i.d. observations, they aggregate to give a normal distribution. This is of vital importance in applied probability and statistics. Note however that if we drop our standing assumption that all the Xn's have a finite variance, then this would no longer be true.

Theorem 4.8.1 (Central Limit Theorem) Let (Xn) be a sequence of i.i.d. random variables, each having finite mean µ and finite variance σ². Then the corresponding sequence (Yn) of standardised random variables converges in distribution to the standard normal Z ∼ N(0, 1), i.e. for all a ∈ R,

lim_{n→∞} P(Yn ≤ a) = (1/√(2π)) ∫_{−∞}^{a} e^{−y²/2} dy.

Before we give the proof, we state a known fact from elementary analysis. We know that for all y ∈ R,

lim_{n→∞} (1 + y/n)^n = e^y.

Now for all y ∈ R, let (αn(y)) be a sequence of real (or complex) numbers for which lim_{n→∞} αn(y) = 0. Then we also have that for all y ∈ R,

lim_{n→∞} (1 + (y + αn(y))/n)^n = e^y.     (4.7)

1This reference is to the first edition. You’ll find it on pp. 132-3 in the second edition


You may want to try your hand at proving this rigorously.

Proof: For convenience we assume that µ = 0 and σ = 1. Indeed if this is not the case we can just replace Xn everywhere by (Xn − µ)/σ. Let ψ be the common characteristic function of the Xn's, so that in particular ψ(u) = E(e^{iuX1}) for all u ∈ R. Let φn be the characteristic function of Yn for each n ∈ N. Then for all u ∈ R, using Theorem 4.7.1 we find that

φn(u) = E(e^{iuSn/√n}) = E(e^{iu(1/√n)(X1+X2+···+Xn)}) = E(e^{i(u/√n)X1})^n = ψ(u/√n)^n
      = (1 + i(u/√n)E(X1) − (u²/(2n))E(X1²) + θn(u)/n)^n,

where by (4.4) and the same argument we used to derive (4.5),

|θn(u)| ≤ u² E[ min{ |X1|², |u||X1|³/(6√n) } ] → 0

as n → ∞, for all u ∈ R. Now we use (4.7) to find that

φn(u) = (1 − (u²/2 − θn(u))/n)^n → e^{−u²/2} as n → ∞.

The result then follows by Theorem 4.7.3.
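A simulation sketch of Theorem 4.8.1 (Python with NumPy, illustration only): for i.i.d. Exp(1) variables, so that µ = σ = 1, the empirical value of P(Yn ≤ a) is already close to the standard normal cdf Φ(a) for moderate n.

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(5)
    n, runs = 1000, 10000
    S = rng.exponential(scale=1.0, size=(runs, n)).sum(axis=1)
    Y = (S - n) / np.sqrt(n)              # standardised sums; for Exp(1) we have mu = sigma = 1

    a = 1.0
    Phi = 0.5 * (1 + erf(a / sqrt(2)))    # standard normal cdf at a
    print(np.mean(Y <= a), Phi)           # empirical P(Y_n <= a) is close to Phi(a), about 0.841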

Further discussion (?)

The CLT may be extensively generalised. We mention just two results here. If the i.i.d. sequence (Xn) is such that µ = 0 and E(|Xn|³) = ρ³ < ∞, the Berry-Esseen theorem gives a useful bound for the difference between the cdf of the normalised sum and the cdf Φ of the standard normal. To be precise, we have that for all x ∈ R, n ∈ N:

| P(Sn/(σ√n) ≤ x) − Φ(x) | ≤ Cρ³/(σ³√n),

where C > 0.

We can also relax the requirement that the sequence (Xn) be i.i.d.. Consider the triangular array (Xnk, k = 1, . . . , n, n ∈ N) of random variables, which we may list as follows:

X11
X21  X22
X31  X32  X33
 .     .     .
Xn1  Xn2  Xn3  . . .  Xnn
 .     .     .              .


We assume that each row comprises independent random variables. Assume further that E(Xnk) = 0 and σ²nk = E(X²nk) < ∞ for all k, n. Define the row sums Sn = Xn1 + Xn2 + · · · + Xnn for all n ∈ N, and define τn by τn² = Var(Sn) = ∑_{k=1}^{n} σ²nk. Lindeberg's central limit theorem states that if we have the asymptotic tail condition

lim_{n→∞} ∑_{k=1}^{n} (1/τn²) ∫_{|Xnk| ≥ ετn} X²nk(ω) dP(ω) = 0

for all ε > 0, then Sn/τn converges in distribution to a standard normal as n → ∞.

The highlights of this last chapter have been the proofs of the law of large numbers and the central limit theorem. There is a third result that is often grouped together with the other two as one of the key results about sums of i.i.d. random variables. It is called the law of the iterated logarithm and it gives bounds on the fluctuations of Sn for an i.i.d. sequence with µ = 0 and σ = 1. The result is quite remarkable. It states that

lim sup_{n→∞} Sn / √(2n log log(n)) = 1 a.s.     (4.8)

This means that (with probability one) if c > 1 then only finitely many of the events {Sn > c√(2n log log(n))} occur, but if c < 1 then infinitely many such events occur. This gives a very precise description of the long-term behaviour of Sn.

You should be able to deduce from (4.8) that

lim inf_{n→∞} Sn / √(2n log log(n)) = −1 a.s.


4.9 Exercises

4.1 Write down probabilistic versions of the monotone convergence theorem, Fatou's lemma and the dominated convergence theorem, using random variables in place of measurable functions and expectation in place of the integral.

4.2 Let A and B be independent events. Show that their complements Ac and Bc are also independent.

4.3 (a) Let (An) be a sequence of independent events. Show that

P[⋂_{n∈N} An] = ∏_{n=1}^{∞} P[An].     (4.9)

(b) Recall that we define independence of a sequence of events (An) in terms of finite subsequences (e.g. as in Section 4.2). An 'obvious' alternative definition might be to use (4.9) instead. Why is this not a sensible idea?

4.4 Establish Theorem 4.2.2 part (6).

4.5 Let X and Y be independent random variables and f, g : R → R be Borel measurable. Deduce that f(X) and g(Y) are also independent.

4.6 Let X be a random variable and a ∈ R. Prove that

E(max{X, a}) ≥ max{E(X), a}.

Hint: Write E(max{X, a}) as an integral.

4.7 (a) Let X be a random variable that takes values in N. Explain why X = ∑_{i=1}^{∞} 1_{X≥i} and hence show that E(X) = ∑_{i=1}^{∞} P(X ≥ i).

(b) Let X be a non-negative random variable. For i ∈ N, let Ai = {i − 1 ≤ X < i}. Show that

∑_{i=1}^{∞} (i − 1) 1_{Ai} ≤ X < ∑_{i=1}^{∞} i 1_{Ai}

and hence deduce that

∑_{k=1}^{∞} P(X ≥ k) ≤ E(X) < 1 + ∑_{k=1}^{∞} P(X ≥ k).

4.8 Let k ∈ N. Prove that in a sequence of independent coin tosses, infinitely many runs of k consecutive heads will occur.

4.9 Let (Ω,F ,P) be a probability space. Let (An) be a sequence of events.

(a) Show that {An e.v.} ⊆ {An i.o.}.

(b) Show that {An i.o.}^c = {An^c e.v.} and deduce that P[An i.o.] = 1 − P[An^c e.v.].

(c) Show that

P[An e.v.] ≤ lim inf_{n→∞} P[An] ≤ lim sup_{n→∞} P[An] ≤ P[An i.o.].


4.10 Let X be a real-valued random variable with law pX, defined on a probability space (Ω, F, P). Show that for all bounded measurable functions f : R → R,

∫_Ω f(X(ω)) dP(ω) = ∫_R f(x) dpX(x).

What can you say about these integrals when f is non-negative but not necessarily bounded?
Hint: Begin with f an indicator function, then extend to simple, bounded non-negative and general bounded measurable functions.

4.11 Let X be a real-valued random variable defined on a probability space (Ω, F, P). Use the result of question 3.4 to derive the "probabilist's" version of Chebychev's inequality:

P[|X − µ| ≥ c] ≤ var(X)/c²,

where var(X) and E(X) = µ are both assumed to be finite.

4.12 (a) Suppose that X and Y are random variables and both X² and Y² are integrable. Prove the Cauchy-Schwarz inequality:

|E[XY]| ≤ (E[X²])^{1/2} (E[Y²])^{1/2}.

Hint: Consider g(t) = E[(X + tY)²] as a quadratic function of t ∈ R.

(b) Deduce that if X² is integrable then so is X, and in fact |E[X]|² ≤ E[X²].

(c) Let X be any random variable with a finite mean E[X] = µ. Show that E[X²] < ∞ if and only if var(X) < ∞.

4.13 Let X be a random variable for which E(|X|^n) < ∞. Show that E(|X|^m) < ∞ for all 1 ≤ m < n.
Hint: Write X = X 1_{X≤1} + X 1_{X>1}.

4.14 Show that the converse to Theorem 4.5.1 part (3) is false by taking Ω = [0, 1], F = B([0, 1]) and P to be Lebesgue measure. Take X = 0 and define Xn = 1_{An}, where A1 = [0, 1/2], A2 = [1/2, 1], A3 = [0, 1/4], A4 = [1/4, 1/2], A5 = [1/2, 3/4], A6 = [3/4, 1], A7 = [0, 1/8], A8 = [1/8, 1/4], etc.

4.15 Let (S, Σ) be a measurable space and f : S → C be a (complex-valued) measurable function. Deduce that f is integrable if and only if |f| is.

4.16 If X is a normally distributed random variable with mean µ and variance σ², deduce that its characteristic function φX is given for each u ∈ R by φX(u) = exp(iµu − (1/2)σ²u²).
Hint: First show that it is sufficient to establish the case µ = 0 and σ = 1 by writing Y = (1/σ)(X − µ). Then show that u → φY(u) is differentiable and deduce that φY′(u) = −uφY(u). Now solve the initial value problem using what you know about φY(0).

4.17 Suppose that X is a random variable for which E(|X|^n) < ∞ for some n. Explain carefully why

E(X^n) = i^{−n} (d^n/du^n) φX(u) |_{u=0}.


4.18 (a) Let X be a non-negative random variable and a > 0. Show that E(e^{−aX}) ≤ 1.

(b) A random variable is said to have an exponential moment if E(e^{a|X|}) < ∞ for some a > 0. Show that if X ∼ N(0, 1) then it has exponential moments for all a > 0.

(c) If X has an exponential moment, deduce that it has moments of all orders, i.e. that E(|X|^n) < ∞ for all n ∈ N.

4.19 Show that the conclusion of the weak law of large numbers continues to hold if the requirement that the random variables (Xn) are i.i.d. is replaced by the weaker condition that they are identically distributed and uncorrelated, i.e. E(XmXn) = E(Xm)E(Xn) whenever m ≠ n.

4.20 The first central limit theorem (CLT) to be established was due to de Moivre and Laplace. In this case each Xn takes only two values, 1 with probability p and 0 with probability 1 − p, where 0 < p < 1 (i.e. the Xn's are i.i.d. Bernoulli random variables). Write down the form of the CLT in this case (writing the standardised random variable Yn in terms of Sn = X1 + X2 + · · · + Xn), and explain its relation to the "normal approximation to the binomial distribution".


Chapter 5

Product Measures and Fubini's Theorem (∆)

Note that this chapter is marked with a (∆). This means it is only included in MAS452/6352. It is not included in MAS350. It extends Sections 1.7 and 3.6.

In this chapter, we give brief notes which are intended as a summary of additional reading that you are expected to do outside the lectures. They emphasise the main ideas and concepts, but you will need to work carefully through all the relevant proofs. The recommended source is Adams and Guillemin, section 2.5, pp.89-102. This material is examinable. It can be studied straight after Chapter 3.

The aim of this section is to learn more about product measures, and how the theory of Lebesgue integration deals with multiple (double, triple, etc.) integrals. Before delving further into the details of these ideas, we'll need an additional tool from the theory of σ-fields.

5.1 Dynkin’s π − λ Lemma (∆)

Let (S, Σ) be a measurable space. A collection P of sets in S is called a π-system if A, B ∈ P ⇒ A ∩ B ∈ P (i.e. P is closed under intersections).

A collection L of sets in S is called a λ-system if

(L1) S ∈ L.

(L2) If (En) is an increasing sequence of sets in L (so En ⊆ En+1 for all n ∈ N), then ⋃_{n=1}^{∞} En ∈ L.

(L3) If E,F ∈ L and F ⊂ E then E − F ∈ L.

Note that by (L1) and (L3), λ-systems are closed under complements.

Proposition 5.1.1 If L is a λ-system that is also a π-system, then it is a σ-field.

Proof: Since L is closed under complements and finite intersections, it is closed under finite unions by de Morgan's laws. To show that L is a σ-field, we need to prove that if (An) is an arbitrary sequence of sets in L, then ⋃_{n∈N} An ∈ L. This follows by writing ⋃_{n∈N} An = ⋃_{n∈N} Bn and using (L2), where

B1 = A1,  B2 = B1 ∪ (A2 − B1),  B3 = B2 ∪ (A3 − B2), . . .


Recall that if A is a collection of sets in S, then σ(A) is the smallest σ-field which contains A.

Lemma 5.1.2 (Dynkin's π − λ Lemma) If P is a π-system and L is a λ-system with P ⊆ L, then σ(P) ⊆ L.

Proof: It suffices to prove that L(P), which is the smallest λ-system that contains P, is a σ-field. By Proposition 5.1.1, it's enough to prove that L(P) is closed under intersections.

Step 1. Fix A ∈ L(P) and define

GA = {B ⊆ S : A ∩ B ∈ L(P)}.

You should check that GA is a λ-system.

Step 2. If A, B ∈ P, then A ∩ B ∈ P ⊆ L(P). Hence B ∈ GA. So we've shown that P ⊆ GA

when A ∈ P. But in Step 1, we proved that GA is a λ-system, and so, since L(P) is the smallest such, we deduce that L(P) ⊆ GA. This means in particular that if A ∈ P, B ∈ L(P), then A ∩ B ∈ L(P).

Step 3. If A ∈ L(P), then by Step 2 we have A ∩ B = B ∩ A ∈ L(P) when B ∈ P. This shows that P ⊆ GA when A ∈ L(P). Now using Step 1 again, we find that L(P) ⊆ GA, and the result then follows.

Corollary 5.1.3 If P is a π-system and F is a σ-field with P ⊆ F, then σ(P) ⊆ F.

This result can be very useful for proving that a function is measurable.

5.2 Product Measure (∆)

Let (S1, Σ1, m1) and (S2, Σ2, m2) be measure spaces.

The set S1 × S2 := {(s1, s2) : s1 ∈ S1, s2 ∈ S2} is the usual Cartesian product of sets. If A ⊆ S1, B ⊆ S2, the subset A × B of S1 × S2 is called a product set.

The σ-field Σ1 ⊗ Σ2 is defined to be the smallest σ-field of subsets of S1 × S2 which contains all the product sets.

If E ⊂ S1 × S2 and x ∈ S1, then Ex ⊂ S2 is called the x-slice of E, where

Ex := {y ∈ S2 : (x, y) ∈ E}.

Proposition 5.2.1 If E ∈ Σ1 ⊗ Σ2 then Ex ∈ Σ2 for all x ∈ S1.

Corollary 5.2.2 Let f : S1 × S2 → R be measurable and fix x ∈ S1. Define fx : S2 → R by fx(y) = f(x, y) for all y ∈ S2. Then fx is a measurable function.

Proof: For all a ∈ R, we need to show that fx^{−1}((a,∞)) ∈ Σ2. Define E = f^{−1}((a,∞)). Since f is measurable, E ∈ Σ1 ⊗ Σ2. By Proposition 5.2.1, Ex ∈ Σ2. But Ex = fx^{−1}((a,∞)), and so the result follows.

We can similarly define y-slices of sets in S1 × S2 and show that fy : S1 → R is measurable, for any y ∈ S2, where fy(x) = f(x, y) for all x ∈ S1.


We seek to define the product measure m1 × m2 on (S1 × S2, Σ1 ⊗ Σ2). We need to make an assumption about the measures that we are using. Let (S, Σ, m) be a measure space. We say that the measure m is σ-finite if there exists a sequence (An) of subsets of S with An ∈ Σ for all n ∈ N such that S = ⋃_{n=1}^{∞} An and m(An) < ∞ for all n ∈ N. Clearly any finite measure (and hence every probability measure) is σ-finite. It is easy to see that Lebesgue measure on R is σ-finite. A measure space (S, Σ, m) is said to be σ-finite if m is a σ-finite measure.

From now on, let (S1, Σ1, m1) and (S2, Σ2, m2) be σ-finite measure spaces. Let E ∈ Σ1 ⊗ Σ2 and define φE : S1 → R by

φE(x) = m2(Ex),

for all x ∈ S1. By using the Dynkin π−λ lemma, you can show that φE is measurable. Then we define the product measure of E by

(m1 × m2)(E) = ∫_{S1} φE(x) dm1(x).

Again using the Dynkin π−λ lemma, you can show that this definition is consistent, in that

(m1 × m2)(E) = ∫_{S2} ψE(y) dm2(y),

where ψE : S2 → R is the measurable function defined by ψE(y) = m1(Ey) for all y ∈ S2.

In Problem 3 you can check that if E = A × B is a product set, then

(m1 × m2)(A × B) = m1(A)m2(B).

It can be shown that B(R2) = B(R)⊗ B(R) and Lebesgue measure on R2 is precisely λ× λ.

5.3 Fubini’s Theorem (∆)

We give two versions of this important result: one for general non-negative measurable functions and the other for integrable functions.

Theorem 5.3.1 (Fubini's Theorem 1) Let f : S1 × S2 → R be a non-negative measurable function. Then the mappings

x → ∫_{S2} f(x, y) dm2(y)   and   y → ∫_{S1} f(x, y) dm1(x)

are both measurable. Furthermore,

∫_{S1×S2} f d(m1 × m2) = ∫_{S1} (∫_{S2} f(x, y) dm2(y)) dm1(x) = ∫_{S2} (∫_{S1} f(x, y) dm1(x)) dm2(y).     (5.1)

The proof works by first establishing the result for indicator functions, and then extending by linearity to simple functions. The next step is to take an arbitrary non-negative measurable function, approximate it by simple functions as in Theorem 2.4.1 and use the monotone convergence theorem. Note that all three integrals in (5.1) may be infinite. The next result is more useful for applications.


Theorem 5.3.2 (Fubini's Theorem 2) Let f : S1 × S2 → R be an integrable function. Then the mappings

x → ∫_{S2} f(x, y) dm2(y)   and   y → ∫_{S1} f(x, y) dm1(x)

are both equal (a.e.) to integrable functions. Furthermore,

∫_{S1×S2} f d(m1 × m2) = ∫_{S1} (∫_{S2} f(x, y) dm2(y)) dm1(x) = ∫_{S2} (∫_{S1} f(x, y) dm1(x)) dm2(y).     (5.2)

The proof works by writing f = f⁺ − f⁻ and applying Theorem 5.3.1 to f⁺ and f⁻ separately.

All of the results of this chapter extend in a straightforward way to products of finitely many measure spaces.
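As a small numerical sketch of Theorem 5.3.2 (Python with NumPy, not part of these notes), one can compute both iterated integrals of the integrable function f(x, y) = e^{−x−y} over [0, 2] × [0, 3] with respect to Lebesgue measure and compare them with the exact value (1 − e^{−2})(1 − e^{−3}); the grids and midpoint sums below are arbitrary choices for illustration.

    import numpy as np

    nx, ny = 2000, 3000
    dx, dy = 2.0 / nx, 3.0 / ny
    x = (np.arange(nx) + 0.5) * dx              # midpoints in the x direction
    y = (np.arange(ny) + 0.5) * dy              # midpoints in the y direction
    f = np.exp(-x[:, None] - y[None, :])        # grid of values f(x_i, y_j)

    inner_dy = f.sum(axis=1) * dy               # x -> approximate value of the integral of f(x, .) over [0, 3]
    iter1 = inner_dy.sum() * dx                 # integral over x of the inner integral
    inner_dx = f.sum(axis=0) * dx               # y -> approximate value of the integral of f(., y) over [0, 2]
    iter2 = inner_dx.sum() * dy                 # integral over y of the inner integral

    exact = (1 - np.exp(-2.0)) * (1 - np.exp(-3.0))
    print(iter1, iter2, exact)                  # all three agree to several decimal places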


5.4 Exercises (∆)

Throughout these problems (S1, Σ1, m1) and (S2, Σ2, m2) are measure spaces. From Problem 3 onwards they are always σ-finite.

5.1 If E,F ⊂ S1 × S2 and x ∈ S1, show that

(a) (E ∩ F)x = Ex ∩ Fx,

(b) (E^c)x = (Ex)^c,

(c) (⋃_{n=1}^{∞} En)x = ⋃_{n=1}^{∞} (En)x, where (En) is a sequence of subsets of S1 × S2.

5.2 If m1 and m2 are σ-finite measures, show that the product measure m1×m2 is also σ-finite.

5.3 If A ∈ Σ1 and B ∈ Σ2, prove that (m1 ×m2)(A×B) = m1(A)m2(B).

5.4 A product set A1 × A2 is said to be finite if mi(Ai) < ∞ for i = 1, 2. Show that the product measure m1 × m2 is the unique measure µ on (S1 × S2, Σ1 ⊗ Σ2) for which

µ(A1 × A2) = m1(A1)m2(A2)

for all finite product sets.
Hint: Use Dynkin's π − λ lemma.

5.5 (a) Let f : S1 → R and g : S2 → R be measurable functions. Define h : S1 × S2 → R by

h(x, y) = f(x)g(y),

for all x ∈ S1, y ∈ S2. Show that h is measurable.

(b) If f and g are integrable, show that h is also integrable and that

∫_{S1×S2} h d(m1 × m2) = (∫_{S1} f dm1)(∫_{S2} g dm2).

5.6 Let aij ≥ 0 for all i, j ∈ N. Show that

∑_{i=1}^{∞} ∑_{j=1}^{∞} aij = ∑_{j=1}^{∞} ∑_{i=1}^{∞} aij,

(in the sense that both double series converge to the same limit, or diverge together).

5.7 Let (S, Σ, m) be a σ-finite measure space and f : S → R be a non-negative measurable function. Define Af = {(x, t) ∈ S × R : 0 ≤ t ≤ f(x)}. Show that Af ∈ Σ ⊗ B(R) and that

(m × λ)(Af) = ∫_S f dm.

5.8 Use Fubini's theorem to prove that

lim_{T→∞} ∫_0^T (sin(x)/x) dx = π/2.

Hint: Write 1/x = ∫_0^∞ e^{−xy} dy.


5.9 (a) Show that for the function f : R² → R defined by f(x, y) = xy/(x² + y²)², the iterated integrals

∫_{−1}^{1} (∫_{−1}^{1} f(x, y) dy) dx   and   ∫_{−1}^{1} (∫_{−1}^{1} f(x, y) dx) dy

exist and are equal.

(b) Show that f is not integrable over the square −1 ≤ x ≤ 1, −1 ≤ y ≤ 1.

5.10 (?) This problem carries on from Problems 3.17–3.21. Like those problems, it deals with properties of the Fourier transform.
Assume that f and g are integrable functions on R, and that g is bounded. Define the convolution f ∗ g of f with g by

(f ∗ g)(x) = ∫_R f(x − y) g(y) dy,

for all x ∈ R. Show that |(f ∗ g)(x)| < ∞, and so f ∗ g is a well-defined function from R to R. Show further that f ∗ g is both measurable and integrable, and that the Fourier transform of the convolution is the product of the Fourier transforms, i.e. that for all y ∈ R,

(f ∗ g)^(y) = f̂(y)ĝ(y).


Appendix A

Solutions to exercises

We include solutions to most exercises, but some are left entirely for you.

Chapter 1

1.1 The case n = 2 is B(ii). Now use induction: suppose the result holds for some n. Then

A1 ∪ A2 ∪ · · · ∪ An ∪ An+1 = (A1 ∪ A2 ∪ · · · ∪ An) ∪ An+1 = Bn ∪ An+1.

Now Bn = A1 ∪ A2 ∪ · · · ∪ An ∈ B by the inductive hypothesis and An+1 ∈ B by assumption. Hence Bn ∪ An+1 ∈ B by B(ii), and the result follows.

1.2 There are (n choose r) subsets of size r for 0 ≤ r ≤ n, and so the total number of subsets is

∑_{r=0}^{n} (n choose r) = (1 + 1)^n = 2^n.

Here we used the binomial theorem (x + y)^n = ∑_{r=0}^{n} (n choose r) x^r y^{n−r}.

1.3 To show Σ1 ∩ Σ2 is a σ-field we must verify S(i) to S(iii).

S(i) Since S ∈ Σ1 and S ∈ Σ2, S ∈ Σ1 ∩ Σ2.

S(ii) Suppose (An) is a sequence of sets in Σ1 ∩ Σ2. Then An ∈ Σ1 for all n ∈ N and so ⋃_{n=1}^{∞} An ∈ Σ1. But also An ∈ Σ2 for all n ∈ N and so ⋃_{n=1}^{∞} An ∈ Σ2. Hence ⋃_{n=1}^{∞} An ∈ Σ1 ∩ Σ2.

S(iii) If A ∈ Σ1 ∩ Σ2, then Ac ∈ Σ1 and Ac ∈ Σ2. Hence Ac ∈ Σ1 ∩ Σ2.

Note that the same argument can be used to show that if (Σn, n ∈ N) are all σ-fields of subsets of S then so is ⋂_{n=1}^{∞} Σn.

Σ1 ∪ Σ2 is not in general a σ-field, for if A ∈ Σ1 and B ∈ Σ2 there is no good reason why A ∪ B ∈ Σ1 ∪ Σ2. For example let S = {1, 2, 3}, Σ1 = {∅, {1}, {2, 3}, S}, Σ2 = {∅, {2}, {1, 3}, S}, A = {1}, B = {2}. Then A ∪ B = {1, 2} is neither in Σ1 nor Σ2.

1.4 (a) A ∪ B = [A − (A ∩ B)] ∪ [B − (A ∩ B)] ∪ (A ∩ B) is a disjoint union, hence using finite additivity and (1.3.2),

m(A ∪ B) = m(A − A ∩ B) + m(B − A ∩ B) + m(A ∩ B).

Then

m(A ∪ B) + m(A ∩ B) = m(A − A ∩ B) + m(B − A ∩ B) + 2m(A ∩ B)
= [m(A − A ∩ B) + m(A ∩ B)] + [m(B − A ∩ B) + m(A ∩ B)]
= m(A) + m(B),

where we use the fact that A is the disjoint union of A − A ∩ B and A ∩ B, and the analogous result for B. Note that the possibility that m(A ∩ B) = ∞ is allowed for within this proof.

(b) m(A ∪ B) ≤ m(A ∪ B) + m(A ∩ B) = m(A) + m(B) follows immediately from (a) as m(A ∩ B) ≥ 0. The general case is proved by induction. We've just established n = 2. Now suppose the result holds for some n. Then

m(⋃_{i=1}^{n+1} Ai) = m((⋃_{i=1}^{n} Ai) ∪ An+1) ≤ m(⋃_{i=1}^{n} Ai) + m(An+1) ≤ ∑_{i=1}^{n} m(Ai) + m(An+1) = ∑_{i=1}^{n+1} m(Ai).

1.5 (a) We have that (km)(∅) = km(∅) = 0 because m(∅) = 0.

If (An)n∈N is a sequence of disjoint measurable sets then

∑_{n=1}^{∞} (km)(An) = k ∑_{n=1}^{∞} m(An) = km(⋃_{n=1}^{∞} An) = (km)(⋃_{n=1}^{∞} An).

For the second equality we use that m is σ-additive. Thus km is σ-additive, and so km is a measure.

If m is a finite measure, then by taking k = 1/m(S) it follows immediately that P(·) = m(·)/m(S) is a measure. Noting that P(S) = m(S)/m(S) = 1, P is a probability measure.

(b) The uniform distribution m on ([a, b], B([a, b])) is given by

m(A) = λ(A)/(b − a),

where λ denotes Lebesgue measure.

(c) We have that (m + n)(∅) = m(∅) + n(∅) = 0 + 0 = 0.

If (Aj)j∈N is a sequence of disjoint measurable sets then

∑_{j=1}^{∞} (m + n)(Aj) = lim_{J→∞} ∑_{j=1}^{J} (m(Aj) + n(Aj))
= lim_{J→∞} (∑_{j=1}^{J} m(Aj) + ∑_{j=1}^{J} n(Aj))
= ∑_{j=1}^{∞} m(Aj) + ∑_{j=1}^{∞} n(Aj)
= m(⋃_{j=1}^{∞} Aj) + n(⋃_{j=1}^{∞} Aj)
= (m + n)(⋃_{j=1}^{∞} Aj).

Here, the second line follows because the sums are finite, and the third line follows because both series are increasing (and hence their limits both exist). The fourth line follows by σ-additivity of m and n. Thus m + n is a measure.

1.6 (a) We have mB(∅) = m(∅ ∩ B) = m(∅) = 0.

If (An)n∈N is a sequence of disjoint measurable sets then (An ∩ B)n∈N are also disjoint and measurable, hence

∑_{n=1}^{∞} mB(An) = ∑_{n=1}^{∞} m(An ∩ B) = m(⋃_{n=1}^{∞} (An ∩ B)) = m((⋃_{n=1}^{∞} An) ∩ B) = mB(⋃_{n=1}^{∞} An).

Here to deduce the second equality we use the σ-additivity of m. Thus mB is a measure.


(b) Applying 1.5 part (a) to mB, it is immediate that PB is a probability measure. If m itself is a probability measure, say we write m = P, then PB is the conditional distribution of P given that the event B occurs.

1.7 The easiest way to see that m is a measure is to first use 1.5 (a) and (c) and induction to show that if m1, m2, . . . , mn are measures and c1, c2, . . . , cn are non-negative numbers then c1m1 + c2m2 + · · · + cnmn is a measure. Now apply this with mj = δxj (1 ≤ j ≤ n). To get a probability measure we need ∑_{j=1}^{n} cj = 1, for then, as δxj is a probability measure for all 1 ≤ j ≤ n, we have

m(S) = ∑_{j=1}^{n} cj δxj(S) = ∑_{j=1}^{n} cj = 1.

a ∪ (a, b) ∪ b ∈ B(R).1.9 (a) (i) We have that A ∩ Ac = ∅ and A ∪ Ac = S, so m(A) +m(Ac) = m(S) = M . Because m(S) <∞

we have also that m(A) <∞, hence we may subtract m(A) and obtain m(Ac) = M −m(A).(ii) Let (An) be a decreasing sequence of sets. Then Bn = S \ An defines an increasing sequence of

sets, so by the first part of Theorem 1.6.1 we have m(Bn)→ m(B) where B = ∪jBj .By part (a) we have

m(Bn) = m(S \An) = m(S)−m(An)m(B) = m(∪jS \An) = m(S \ ∩jAj) = m(S)−m(∩jAj)

Thus m(S) − m(An) → m(S) − m(∩jAj). Since m(S) < ∞ we may subtract it, and aftermultiplying by −1 we obtain that m(An)→ m(∩jAj).

(b) Let S = R, Σ = B(R) and m = λ be Lebesgue measure on R. Set An = (−∞,−n]. Note that∩nAn = ∅ so λ(∩nAn) = 0. However, m(An) =∞ for all n, so m(An) 9 m(∩nAn) in this case.

1.10 (a) Note that each element of Π is a subset of S. Hence Π itself is a subset of the power set P(S) of S.Since S is a finite set, P(S) is also a finite set, hence Π is also finite.Part (b) requires you to keep a very clear head. To solve a question like this you have to explore whatyou have deduce from what else, with lots of thinking ‘if I knew this then I would also know that’and then trying to fit a bigger picture together, connecting your start point to your desired end point.Analysis can often be like this.

(b) (i) Suppose Πi ∩Πj 6= ∅. Note that Πi ∩Πj is a subset of both Πi and Πj .By definition of Π, any subset of Πi is either equal to Πi or is equal to ∅. Since we assume thatΠi ∩Πj 6= ∅, we therefore have Πi = Πi ∩Πj . Similarly, Πj = Πi ∩Πj .Hence Πi = Πj , but this contradicts the fact that the Πi are distinct from each other. Thus wehave a contradiction and in fact we must have Πi ∩Πj = ∅.

(ii) By definition of Π we have ∪ki=1Πi ⊆ S. Suppose ∪ki=1Πi 6= S. Then C = S \ ∪ki=1Πi is anon-empty set in Σ.Since C is disjoint from all the Πi, we must have C /∈ Π. Noting that C ∈ Σ, by definition of Πthis implies that there is some1 B1 ⊂ C such that B1 6= ∅.We have that B1 is disjoint from all the Πi, so we must have B1 /∈ Π. Thus by the same reasoning(as we gave for C) there exists B2 ⊂ B1 such that B2 6= ∅. Iterating, we construct an infinitedecreasing sequence of sets C ⊃ B1 ⊃ B2 ⊃ B3 . . . each strictly smaller than the previous one,none of which are empty. However, this is impossible because C ⊆ S is a finite set.

(iii) Let i ∈ I. So Πi ∩A 6= ∅. Noting that Πi ∩A ⊆ Πi, by definition of Π we must have Πi ∩A = Πi.That is, Πi ⊆ A. Since we have this for all i ∈ I, we have ∪i∈IΠi ⊆ A.Now suppose that A \ ∪i∈IΠi 6= ∅. Since by (ii) we have S = ∪ki=1Πi, and the union is disjointby (i), this means that there is some Πj with j /∈ I such that A ∩Πj 6= ∅. However A ∩Πj ⊆ Πj

so by definition of Π we must have Πj ∩A = Πj . That is Πj ⊆ A, but then we would have j ∈ I,which is a contraction.Thus A \ ∪i∈IΠi must be empty, and we conclude that A = ∪i∈IΠi.

1X ⊂ Y means that X ⊆ Y and X 6= Y i.e. X is strictly smaller than the set Y


Chapter 2

2.1 (a) If $x \in A$ and $x \in B$, lhs $= 1$ and rhs $= 1 + 1 - 1 = 1$.
If $x \in A$ and $x \notin B$, lhs $= 1$ and rhs $= 1 + 0 - 0 = 1$.
If $x \notin A$ and $x \in B$, lhs $= 1$ and rhs $= 0 + 1 - 0 = 1$.
If $x \notin A$ and $x \notin B$, lhs $= 0$ and rhs $= 0 + 0 - 0 = 0$, and so we have equality of lhs and rhs in all possible cases.

(b) If $x \in A$ then $x \notin A^c$, so lhs $= 1$ and rhs $= 1 - 0 = 1$; if $x \notin A$ then $x \in A^c$, so lhs $= 0$ and rhs $= 1 - 1 = 0$.

(c) Since $A = B \cup (A - B)$ and $B \cap (A - B) = \emptyset$, we can apply (a) to find that $1_A = 1_B + 1_{A-B}$.

(d) The lhs and rhs are both non-zero only in the case where $x \in A$ and $x \in B$, when both lhs and rhs are $1$.
For the last part, if $x \notin A$ then $x \notin A_n$ for all $n \in \mathbb{N}$ and so lhs $=$ rhs $= 0$. If $x \in A$ then $x \in A_n$ for one and only one $n \in \mathbb{N}$ and so lhs $=$ rhs $= 1$.

2.2 (a) Since for all $n \in \mathbb{N}$, $\sup_{k\ge n} a_k = -\inf_{k\ge n}(-a_k)$, we have
$$\limsup_{n\to\infty} a_n = \lim_{n\to\infty} \sup_{k\ge n} a_k = \lim_{n\to\infty}\Big(-\inf_{k\ge n}(-a_k)\Big) = -\liminf_{n\to\infty}(-a_n).$$

(b) Since for all $n \in \mathbb{N}$, $\sup_{k\ge n}(a_k + b_k) \le \sup_{k\ge n} a_k + \sup_{k\ge n} b_k$, the result is obtained similarly to (a) by taking limits on both sides.

(c) Argue as in (b), noting that the inequality is reversed for inf, or use (a) and (b) to argue that
$$\liminf_{n\to\infty}(a_n + b_n) = -\limsup_{n\to\infty}(-a_n - b_n) \ge -\limsup_{n\to\infty}(-a_n) - \limsup_{n\to\infty}(-b_n) = \liminf_{n\to\infty} a_n + \liminf_{n\to\infty} b_n.$$

(d) Use the fact that for all $n \in \mathbb{N}$, $\sup_{k\ge n}(a_k b_k) \le \big(\sup_{k\ge n} a_k\big)\big(\sup_{k\ge n} b_k\big)$ and argue as in (b).

(e) Use the fact that for all $n \in \mathbb{N}$, $\inf_{k\ge n}(a_k b_k) \ge \big(\inf_{k\ge n} a_k\big)\big(\inf_{k\ge n} b_k\big)$ and argue as in (d).

(f) Since $0 \le \liminf_{n\to\infty} |a_n| \le \limsup_{n\to\infty} |a_n| = 0$, we must have $\liminf_{n\to\infty}|a_n| = 0$ and so $0 = \liminf_{n\to\infty}|a_n| = \limsup_{n\to\infty}|a_n|$, from which it follows that $\lim_{n\to\infty}|a_n| = 0$ and hence $\lim_{n\to\infty} a_n = 0$.
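The inequality in (c) can be strict; as a small illustration added here (not part of the notes), take $a_n = (-1)^n$ and $b_n = (-1)^{n+1}$, for which $\liminf_n(a_n+b_n) = 0$ while $\liminf_n a_n + \liminf_n b_n = -2$. The sketch below inspects finite truncations, using the minimum over a long tail as a stand-in for $\liminf$ for these eventually periodic sequences.

```python
# Illustration (not from the notes): lim inf(a_n + b_n) can exceed
# lim inf a_n + lim inf b_n.  Take a_n = (-1)^n and b_n = (-1)^(n+1).
N = 10_000
a = [(-1) ** n for n in range(1, N + 1)]
b = [(-1) ** (n + 1) for n in range(1, N + 1)]
s = [x + y for x, y in zip(a, b)]            # identically zero

# For these periodic sequences the tail infimum stabilises, so the minimum
# over a long tail is a faithful stand-in for the lim inf.
tail = slice(N // 2, None)
print(min(s[tail]))                  # 0  = lim inf (a_n + b_n)
print(min(a[tail]) + min(b[tail]))   # -2 = lim inf a_n + lim inf b_n
```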

2.3 If $a < c$, $f^{-1}((a,\infty)) = S \in \Sigma$, and if $a \ge c$, $f^{-1}((a,\infty)) = \emptyset \in \Sigma$.

2.4 If $f$ is measurable then, writing $(a,b) = \mathbb{R} \setminus \big((-\infty,a] \cup [b,\infty)\big)$ and using the properties of pre-images,
$$f^{-1}((a,b)) = f^{-1}(\mathbb{R}) \setminus \big(f^{-1}((-\infty,a]) \cup f^{-1}([b,\infty))\big),$$
which by Theorem 2.2.1 shows that $f^{-1}((a,b)) \in \Sigma$. Note that if $a = -\infty$ or $b = \infty$, the corresponding half-interval above is the empty set (which has empty pre-image).
Conversely, suppose that we have $f^{-1}((a,b)) \in \Sigma$ for all $-\infty \le a < b \le \infty$. Taking $b = \infty$, we have that $f^{-1}((a,\infty)) \in \Sigma$, which shows that $f$ is measurable.

2.5 (a) For any $a \in \mathbb{R}$, we have $(f+c)^{-1}((a,\infty)) = \{x \in \mathbb{R}\,;\, f(x) + c > a\} = \{x \in \mathbb{R}\,;\, f(x) > a - c\} = f^{-1}((a-c,\infty)) \in \Sigma$ by measurability of $f$.

(b) First note that if $k = 0$ then $(kf)(x) = 0$ for all $x$, so in this case $kf$ is measurable by 2.3.
Consider when $k > 0$. For any $a \in \mathbb{R}$, we have $(kf)^{-1}((a,\infty)) = \{x \in \mathbb{R}\,;\, kf(x) > a\} = \{x \in \mathbb{R}\,;\, f(x) > a/k\} = f^{-1}((a/k,\infty)) \in \Sigma$ by measurability of $f$.
For $k < 0$, we can write $kf = -(-kf)$. The function $-kf$ is measurable by the above, because $-k > 0$. Multiplying by $-1$ to obtain $-(-kf)$ preserves measurability by Theorem 2.4.6, where we use that the constant function $g \equiv -1$ is measurable.
Follow-up exercise: prove the $k < 0$ case without using Theorem 2.4.6.

2.6 $(g \circ f)^{-1}((a,\infty)) = f^{-1}\big(g^{-1}((a,\infty))\big)$. Now $g$ is Borel measurable and so $g^{-1}((a,\infty)) = A \in \mathcal{B}(\mathbb{R})$. Hence by Theorem 2.2.3, $f^{-1}(A) \in \Sigma$. So we conclude that $(g \circ f)^{-1}((a,\infty)) \in \Sigma$ and so $g \circ f$ is measurable.
If $X : \Omega \to \mathbb{R}$ is a random variable then it is a measurable function from $(\Omega,\mathcal{F})$ to $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. If $g : \mathbb{R} \to \mathbb{R}$ is Borel measurable then $g(X) = g \circ X$ is again a random variable by what we have just shown. If $g$ is not Borel measurable then we must be wary of interpreting $g(X)$ as a random variable, unless we can directly prove that it is measurable using some other technique.


2.7 For any $a \in \mathbb{R}$, we have
$$h^{-1}((a,\infty)) = \{x \in \mathbb{R}\,;\, f(x+y) > a\} = \{z - y\,;\, z \in \mathbb{R},\ f(z) > a\} = \big(f^{-1}((a,\infty))\big)_{-y}.$$
Here we use the notation $A_y = \{a + y\,;\, a \in A\}$ from Section 1.4. Using that $A_y \in \mathcal{B}(\mathbb{R})$ whenever $A \in \mathcal{B}(\mathbb{R})$, we have that $h^{-1}((a,\infty)) \in \mathcal{B}(\mathbb{R})$, and hence $h$ is measurable.
Alternative: write $h = f \circ \tau_y$ where $\tau_y(x) = x + y$. The mapping $\tau_y$ is continuous and hence measurable, and so $h$ is measurable by Theorem 2.4.5.

2.8 (a) If $f(x) > 0$ then $f^+(x) = f(x)$ and $f^-(x) = 0$. If $f(x) < 0$ then $f^+(x) = 0$ and $f^-(x) = -f(x)$. If $f(x) = 0$ then $f^+(x) = f^-(x) = 0$. In all cases we have $f(x) = f^+(x) - f^-(x)$.

(b) Using the same cases as in (a), in all cases we have $|f(x)| = f^+(x) + f^-(x)$.

(c) By Corollary 2.4.2, $f^+$ and $f^-$ are measurable whenever $f$ is. By Theorem 2.4.4, the sum of measurable functions is measurable, hence $|f| = f^+ + f^-$ is measurable.

2.9 If $f$ is differentiable then it is continuous and so measurable by Corollary 2.3.1. For each $x \in \mathbb{R}$, $f'(x) = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$, and it suffices to take this limit along the sequence $h = 1/n$. Now $x \mapsto f(x + 1/n)$ is measurable by Problem 2.7, and $x \mapsto \frac{f(x+1/n)-f(x)}{1/n}$ is measurable by Theorem 2.3.1 and Problem 2.6(b). Finally $f'$ is measurable by Theorem 2.3.5.

2.10 There are four possibilities to consider for a given $c \in \mathbb{R}$: (i) if $f(x) = c$ for some $x$ then by monotonicity $f^{-1}((-\infty,c])$ is an interval of the form $(-\infty,\alpha]$ or $(-\infty,\alpha)$, where $\alpha = \sup\{x \in \mathbb{R}\,;\, f(x) \le c\}$, so it lies in $\mathcal{B}(\mathbb{R})$; (ii) if $f(x) < c$ for all $x$ then $f^{-1}((-\infty,c]) = \mathbb{R} \in \mathcal{B}(\mathbb{R})$; (iii) if $f$ has a discontinuity so that $c$ is not in its range, let $\alpha = \sup\{x \in \mathbb{R}\,;\, f(x) \le c\}$; then $f^{-1}((-\infty,c])$ is $(-\infty,\alpha)$ or $(-\infty,\alpha]$, again in $\mathcal{B}(\mathbb{R})$; (iv) if $f(x) > c$ for all $x$ then $f^{-1}((-\infty,c]) = \emptyset \in \mathcal{B}(\mathbb{R})$.

2.11 (a) Let $A$ be the set of measure zero on which $f_n$ fails to converge to $f$. Then $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x \in S - A$. But then by the algebra of limits $\lim_{n\to\infty} f_n(x)^2 = f(x)^2$ for all $x \in S - A$.

(b) Let $A$ be the set of measure zero on which $f_n$ fails to converge to $f$, and let $B$ be the set of measure zero on which $g_n$ fails to converge to $g$. Now $m(A \cup B) \le m(A) + m(B) = 0$, and by the algebra of limits $\lim_{n\to\infty}\big(f_n(x) + g_n(x)\big) = f(x) + g(x)$ for all $x \in S - (A \cup B)$.

(c) This follows by writing $f_n g_n = \tfrac14\big[(f_n + g_n)^2 - (f_n - g_n)^2\big]$ and using the results of (a) and (b).

2.12 (a) It's sufficient to consider the case where $x = a$. Then for any $\epsilon > 0$ and arbitrary $\delta$, $f(a - \delta) = 0 < f(a) + \epsilon = 1 + \epsilon$ and $f(a + \delta) = f(a) < f(a) + \epsilon = 1 + \epsilon$.

(b) It's sufficient to consider the case $x = n$ for some integer $n$. Again for any $\epsilon > 0$ and arbitrary $\delta$, $f(n - \delta) = n - 1 < f(n) + \epsilon = n + \epsilon$ and $f(n + \delta) = n < f(n) + \epsilon = n + \epsilon$.

(c) Let $U = f^{-1}((-\infty,a))$. We will show that $U$ is open; then it is a Borel set and $f$ is measurable. Fix $x \in U$ and let $\epsilon = a - f(x)$. Then there exists $\delta > 0$ so that $|x - y| < \delta \Rightarrow f(y) < f(x) + \epsilon = a$, and so $y \in U$. We have shown that for each $x \in U$ there exists an open interval (of radius $\delta$) such that if $y$ is in this interval then $y \in U$. Hence $U$ is open.

Chapter 3

3.1 $f = 1_{[-2,-1)} + 2 \cdot 1_{[0,1)} + 1_{[1,2)}$, so $\int_{\mathbb{R}} f(x)\,dx = 1 + 2 + 1 = 4$.

3.2 If $f = \sum_{i=1}^n c_i 1_{A_i}$, then
$$f 1_A = \sum_{i=1}^n c_i 1_{A_i} 1_A = \sum_{i=1}^n c_i 1_{A_i \cap A} = \sum_{i=1}^n c_i 1_{A_i \cap A} + 0 \cdot 1_{S \setminus A},$$
by Problem 2.1(d). Note that $A \cap A_1, \ldots, A \cap A_n, S \setminus A$ are disjoint sets, with union $S$.
If $f \ge 0$ then $c_i \ge 0$ ($1 \le i \le n$) and so $f 1_A \ge 0$.
If we assume that $m(A) < \infty$, then $I_A(f) = \sum_{i=1}^n c_i\, m(A_i \cap A) < \infty$, because in this case $m(A_i \cap A) \le m(A) < \infty$ for all $1 \le i \le n$.

3.3 We have already proved part (1) of Theorem 3.3.1. It remains to prove parts (2)–(4).

(2) Let $\alpha > 0$. We have
$$\int_S \alpha f\,dm = \sup\Big\{\int_S s\,dm \,;\, s \text{ is simple, } 0 \le s \le \alpha f\Big\} = \sup\Big\{\int_S s\,dm \,;\, s \text{ is simple, } 0 \le \tfrac{1}{\alpha}s \le f\Big\}$$
$$= \sup\Big\{\int_S \alpha r\,dm \,;\, r \text{ is simple, } 0 \le r \le f\Big\} = \alpha \sup\Big\{\int_S r\,dm \,;\, r \text{ is simple, } 0 \le r \le f\Big\} = \alpha \int_S f\,dm.$$
Here, to pass from the second supremum to the third, we use that $s$ is simple if and only if $r = \frac{1}{\alpha}s$ is simple.

(3) Since $A \subseteq B$ we have $1_A \le 1_B$, which means $1_A f \le 1_B f$. This part now follows from part (1).

(4) We have $m(A) = 0$. Suppose $s$ is a non-negative simple function such that $0 \le s \le 1_A f$, and write $s = \sum_{i=1}^n c_i 1_{A_i}$. Hence, for any given $i$, either $c_i = 0$ or we must have $A_i \subseteq A$, implying $m(A_i) = 0$. Thus $\int_S s\,dm = \sum_{i=1}^n c_i\, m(A_i) = 0$, and therefore $\int_S 1_A f\,dm = 0$.

3.4 Imitating the proof of Lemma 3.3.2, let $A = \{x \in S\,;\, |f(x)| \ge c\}$. Then
$$\int_S f^2\,dm \ge \int_A f^2\,dm \ge c^2\, m(A),$$
and so $m(A) \le \frac{1}{c^2}\int_S f^2\,dm$.
The generalisation to $p \ge 1$ is
$$m\big(\{x \in S\,;\, |f(x)| \ge c\}\big) \le \frac{1}{c^p}\int_S |f|^p\,dm,$$
and it is proved similarly. Note that when $p$ is odd, we need to replace $f$ by $|f|$ inside the integral to ensure non-negativity.

3.5 We apply Corollary 3.3.4 to the function $g = |f|^p$, which is clearly non-negative and is measurable by Theorem 2.4.5 (compose $f$ with the continuous function $x \mapsto |x|^p$). Thus $|f|^p = 0$ a.e., which implies that $f = 0$ a.e.

3.6 We have $f^+ = 1_{[-1,0)} + 3 \cdot 1_{[1,2)}$ and $f^- = 1_{[-2,-1)} + 2 \cdot 1_{[0,1)}$, so
$$\int_{\mathbb{R}} f(x)\,dx = \int_{\mathbb{R}} f^+(x)\,dx - \int_{\mathbb{R}} f^-(x)\,dx = (1 + 3) - (1 + 2) = 1.$$

3.7 (1) Using Theorem 3.3.1 (2), if $c \ge 0$,
$$\int_S cf\,dm = \int_S cf^+\,dm - \int_S cf^-\,dm = c\int_S f^+\,dm - c\int_S f^-\,dm = c\int_S f\,dm.$$
If $c = -1$, then $(-f)^+ = f^-$ and $(-f)^- = f^+$, and so
$$\int_S (-f)\,dm = \int_S f^-\,dm - \int_S f^+\,dm = -\Big(\int_S f^+\,dm - \int_S f^-\,dm\Big) = -\int_S f\,dm.$$
Finally, if $c < 0$ (with $c \neq -1$), write $c = -d$ where $d > 0$ and use the two cases we've just proved.

(3) If $f \le g$ then $g - f \ge 0$, so by Theorem 3.3.1 (1), $\int_S (g - f)\,dm \ge 0$. But by (1) and (2) this is equivalent to $\int_S g\,dm - \int_S f\,dm \ge 0$, i.e. $\int_S g\,dm \ge \int_S f\,dm$, as required.

3.8 (a) Noting that $f^+$ and $f^-$ are both non-negative, with non-negative integrals, we have
$$\Big|\int_S f\,dm\Big| = \Big|\int_S f^+\,dm - \int_S f^-\,dm\Big| \le \int_S f^+\,dm + \int_S f^-\,dm = \int_S |f|\,dm.$$

(b) By the triangle inequality we have $|f(x) + g(x)| \le |f(x)| + |g(x)|$. Thus by monotonicity and linearity (from Theorem 3.5.1) we obtain
$$\int_S |f + g|\,dm \le \int_S \big(|f| + |g|\big)\,dm = \int_S |f|\,dm + \int_S |g|\,dm.$$


3.9 Reflexivity is obvious as $f(x) = f(x)$ for all $x \in S$. So is symmetry, because $f(x) = g(x)$ almost everywhere if and only if $g(x) = f(x)$ almost everywhere. For transitivity, let $A = \{x \in S\,;\, f(x) \neq g(x)\}$, $B = \{x \in S\,;\, g(x) \neq h(x)\}$ and $C = \{x \in S\,;\, f(x) \neq h(x)\}$. Then $C \subseteq A \cup B$ and so $m(C) \le m(A) + m(B) = 0$. Thus if $f = g$ a.e. and $g = h$ a.e. we have $f = h$ a.e.

3.10 Let $x \in \mathbb{R}$ be arbitrary. If $x \le 0$ then $f_n(x) = 0$ for every $n$. If $x > 0$ then we can find $n_0 \in \mathbb{N}$ so that $\frac{1}{n_0} < x$, and then for all $n \ge n_0$, $f_n(x) = n 1_{(0,1/n)}(x) = 0$. So we have proved that $\lim_{n\to\infty} f_n(x) = 0$. But for all $n \in \mathbb{N}$,
$$\int_{\mathbb{R}} |f_n(x) - 0|\,dx = n\int_{\mathbb{R}} 1_{(0,1/n)}(x)\,dx = n \cdot \frac{1}{n} = 1,$$
and so we cannot find any function in the sequence that gets arbitrarily close to $0$ in the $L^1$ sense.
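A numerical illustration (added here, not part of the solution): evaluating $f_n = n\,1_{(0,1/n)}$ at a fixed point shows the values are eventually $0$, while the exact $L^1$ norm $n \cdot \tfrac1n$ stays equal to $1$.

```python
# Illustration (not from the notes): f_n = n * 1_{(0, 1/n)} tends to 0
# pointwise, but its L^1 distance from 0 is 1 for every n.

def f(n, x):
    return n if 0.0 < x < 1.0 / n else 0.0

x = 0.01
print([f(n, x) for n in (10, 50, 100, 200, 1000)])   # [10, 50, 0, 0, 0]: zero once 1/n <= x

for n in (10, 100, 1000):
    l1_norm = n * (1.0 / n)        # exact value of the integral of |f_n|
    print(n, l1_norm)              # always 1.0
```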

3.11 First suppose that $f 1_A$ is integrable. Then for all $n \in \mathbb{N}$, $|f| 1_{A_n} \le |f| 1_A$ and so $f 1_{A_n}$ is integrable by monotonicity. It follows that
$$\sum_{r=1}^n \int_S |f| 1_{A_r}\,dm = \int_S |f| 1_{\bigcup_{r=1}^n A_r}\,dm < \infty.$$
Now $|f| 1_{\bigcup_{r=1}^n A_r}$ increases to $|f| 1_A$ as $n \to \infty$ and so, by the monotone convergence theorem,
$$\sum_{r=1}^{\infty} \int_S |f| 1_{A_r}\,dm = \lim_{n\to\infty} \int_S |f| 1_{\bigcup_{r=1}^n A_r}\,dm = \int_S |f| 1_A\,dm < \infty.$$
Conversely, if $f 1_{A_n}$ is integrable for each $n \in \mathbb{N}$ and $\sum_{n=1}^{\infty} \int_{A_n} |f|\,dm < \infty$, we have by Theorem 3.3.2 that
$$\int_S |f| 1_A\,dm = \int_S |f| 1_{\bigcup_{n=1}^{\infty} A_n}\,dm = \sum_{n=1}^{\infty} \int_{A_n} |f|\,dm < \infty.$$

3.12 $f - f_n \ge 0$ for all $n \in \mathbb{N}$, so by Fatou's lemma
$$\liminf_{n\to\infty} \int_S (f - f_n)\,dm \ge \int_S \liminf_{n\to\infty} (f - f_n)\,dm,$$
i.e.
$$\int_S f\,dm + \liminf_{n\to\infty} \int_S (-f_n)\,dm \ge \int_S f\,dm + \int_S \liminf_{n\to\infty} (-f_n)\,dm,$$
and so
$$\liminf_{n\to\infty} \Big(-\int_S f_n\,dm\Big) \ge \int_S \liminf_{n\to\infty} (-f_n)\,dm.$$
Multiplying both sides by $-1$ reverses the inequality to yield
$$-\liminf_{n\to\infty} \Big(-\int_S f_n\,dm\Big) \le \int_S \Big(-\liminf_{n\to\infty} (-f_n)\Big)\,dm.$$
But then by the definition of $\limsup_{n\to\infty}$ we have
$$\limsup_{n\to\infty} \int_S f_n\,dm \le \int_S \limsup_{n\to\infty} f_n\,dm.$$

3.13 Integrability follows easily from the facts that $|\cos(\alpha x)| \le 1$ and $|\sin(\beta x)| \le 1$ for all $x \in \mathbb{R}$. As $|\cos(x/n) f(x)| \le |f(x)|$ for all $x \in \mathbb{R}$ and $f$ is integrable, we may use the dominated convergence theorem to deduce that
$$\lim_{n\to\infty} \int_{\mathbb{R}} \cos(x/n) f(x)\,dx = \int_{\mathbb{R}} \lim_{n\to\infty} \cos(x/n) f(x)\,dx = \int_{\mathbb{R}} f(x)\,dx,$$
since $\lim_{n\to\infty} \cos(x/n) = \cos(0) = 1$ for all $x \in \mathbb{R}$.
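As an added illustration (the choice $f(x) = e^{-x^2}$ is just one convenient integrable function, not the $f$ of the problem), a crude Riemann sum shows $\int_{\mathbb{R}}\cos(x/n)e^{-x^2}\,dx$ approaching $\int_{\mathbb{R}} e^{-x^2}\,dx = \sqrt{\pi}$ as $n$ grows:

```python
# Illustration (not from the notes): dominated convergence for cos(x/n) * f(x)
# with the integrable example f(x) = exp(-x^2); the limit is sqrt(pi).
import numpy as np

x = np.linspace(-30.0, 30.0, 600_001)    # f decays so fast that [-30, 30] suffices
dx = x[1] - x[0]
f = np.exp(-x ** 2)

for n in (1, 5, 50, 500):
    print(n, np.sum(np.cos(x / n) * f) * dx)

print(np.sqrt(np.pi))                    # limiting value, ~1.7724538509
```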


3.14 Define $f_n(x) = f(t_n, x)$ for each $n \in \mathbb{N}$, $x \in S$. Then $|f_n(x)| \le g(x)$ for all $x \in S$. Since $g$ is integrable, by dominated convergence
$$\lim_{n\to\infty} \int_S f(t_n, x)\,dm(x) = \int_S \lim_{n\to\infty} f_n(x)\,dm(x) = \int_S \lim_{n\to\infty} f(t_n, x)\,dm(x) = \int_S f(t, x)\,dm(x),$$
where we used the continuity assumption (ii) in the last step.

3.15 Let $(h_n)$ be an arbitrary sequence such that $h_n \to 0$ and define
$$a_{n,t}(x) = \frac{f(t + h_n, x) - f(t, x)}{h_n}.$$
Since $\frac{\partial f}{\partial t}$ exists, we have $a_{n,t}(x) \to \frac{\partial f}{\partial t}(t, x)$ as $n \to \infty$ for all $x$. By the mean value theorem there exists $\theta_n \in [0,1]$ such that $a_{n,t}(x) = \frac{\partial f}{\partial t}(t + \theta_n h_n, x)$, hence $|a_{n,t}(x)| \le h(x)$. Thus by dominated convergence $\int_S a_{n,t}(x)\,dm(x) \to \int_S \frac{\partial f}{\partial t}(t, x)\,dm(x)$.
By linearity of the integral we have
$$\frac{\partial}{\partial t}\int_S f(t, x)\,dm(x) = \lim_{n\to\infty} \frac{1}{h_n}\Big(\int_S f(t + h_n, x)\,dm(x) - \int_S f(t, x)\,dm(x)\Big) = \lim_{n\to\infty} \int_S a_{n,t}(x)\,dm(x),$$
and the result follows.

3.16 (a) For each $x \in \mathbb{R}$, $n \in \mathbb{N}$, the expression for $f_n(x)$ is a telescoping sum. If you begin to write it out, you see that terms cancel in pairs and you obtain
$$f_n(x) = -2x e^{-x^2} + 2(n+1)^2 x e^{-(n+1)^2 x^2}.$$
Using the fact that $\lim_{N\to\infty} N^2 e^{-yN^2} = 0$ for all $y > 0$ (and noting that the second term vanishes when $x = 0$), we find that
$$\lim_{n\to\infty} f_n(x) = f(x) = -2x e^{-x^2}.$$

(b) The functions $f$ and $f_n$ are continuous and so Riemann integrable over the closed interval $[0, a]$. We can calculate (which is left for you) that $\int_0^a f(x)\,dx = -2\int_0^a x e^{-x^2}\,dx = e^{-a^2} - 1$. But on the other hand
$$\int_0^a f_n(x)\,dx = \sum_{r=1}^n \int_0^a \big[-2r^2 x e^{-r^2 x^2} + 2(r+1)^2 x e^{-(r+1)^2 x^2}\big]\,dx = \sum_{r=1}^n \big(e^{-r^2 a^2} - e^{-(r+1)^2 a^2}\big) = e^{-a^2} - e^{-(n+1)^2 a^2} \to e^{-a^2}$$
as $n \to \infty$. So we conclude that $\int_0^a f(x)\,dx \neq \lim_{n\to\infty}\int_0^a f_n(x)\,dx$.
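A numerical check of part (b), added here as an illustration (with the arbitrary choice $a = 1$): the integrals $\int_0^a f_n$ approach $e^{-a^2} \approx 0.3679$, while $\int_0^a f = e^{-a^2} - 1 \approx -0.6321$.

```python
# Illustration (not from the notes): for f_n(x) = -2x e^{-x^2} + 2(n+1)^2 x e^{-(n+1)^2 x^2},
# the integrals over [0, a] converge to e^{-a^2}, not to the integral of f, which is e^{-a^2} - 1.
import numpy as np

a = 1.0
x = np.linspace(0.0, a, 200_001)
dx = x[1] - x[0]
f = -2 * x * np.exp(-x ** 2)

def fn(n):
    return f + 2 * (n + 1) ** 2 * x * np.exp(-((n + 1) ** 2) * x ** 2)

print(np.sum(f) * dx, np.exp(-a ** 2) - 1)    # both about -0.6321
for n in (1, 5, 50):
    print(n, np.sum(fn(n)) * dx)              # approaches exp(-1) ~ 0.3679
print(np.exp(-a ** 2))
```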

3.17 Using the fact that $|e^{-ixy}| \le 1$, we get by Theorem 3.5.1,
$$|\hat{f}(y)| \le \int_{\mathbb{R}} |e^{-ixy}| \cdot |f(x)|\,dx \le \int_{\mathbb{R}} |f(x)|\,dx < \infty.$$
For the linearity, we have
$$\widehat{af + bg}(y) = \int_{\mathbb{R}} e^{-ixy}\big(a f(x) + b g(x)\big)\,dx = a\int_{\mathbb{R}} e^{-ixy} f(x)\,dx + b\int_{\mathbb{R}} e^{-ixy} g(x)\,dx = a\hat{f}(y) + b\hat{g}(y).$$


3.18 $x \mapsto 1_{\mathbb{Q}}(x)\cos(nx)$ is integrable, as $|1_{\mathbb{Q}}(x)\cos(nx)| \le |\cos(nx)|$ for all $x \in \mathbb{R}$ and $x \mapsto \cos(nx)$ is integrable on $[-\pi,\pi]$. Similarly $x \mapsto 1_{\mathbb{Q}}(x)\sin(nx)$ is integrable. So the Fourier coefficients $a_n$ and $b_n$ are well-defined as Lebesgue integrals. As $|\cos(nx)| \le 1$, we have $a_n = 0$ for all $n \in \mathbb{Z}_+$ since
$$|a_n| \le \frac{1}{\pi}\int_{-\pi}^{\pi} 1_{\mathbb{Q}}(x)|\cos(nx)|\,dx \le \frac{1}{\pi}\int_{-\pi}^{\pi} 1_{\mathbb{Q}}(x)\,dx = 0.$$
By a similar argument, $b_n = 0$ for all $n \in \mathbb{N}$. So it is possible to associate a Fourier series to $1_{\mathbb{Q}}$, but this Fourier series converges to zero!
This illustrates that pointwise convergence is not the right tool for examining convergence of Fourier series!

3.19 First observe that $f_a$ is measurable, by Problem 2.7 (take $y = -a$ there). For integrability, we use
$$\int_{\mathbb{R}} |f_a(x)|\,dx = \int_{\mathbb{R}} |f(x - a)|\,dx = \int_{\mathbb{R}} |f(x)|\,dx < \infty.$$
Then
$$\hat{f_a}(y) = \int_{\mathbb{R}} e^{-ixy} f(x - a)\,dx,$$
and the result follows on making the change of variable $u = x - a$.

3.20 Let $y \in \mathbb{R}$ and let $(y_n)$ be an arbitrary sequence converging to $y$ as $n \to \infty$. We need to show that the sequence $(\hat{f}(y_n))$ converges to $\hat{f}(y)$. We have
$$|\hat{f}(y_n) - \hat{f}(y)| = \Big|\int_{\mathbb{R}} e^{-ixy_n} f(x)\,dx - \int_{\mathbb{R}} e^{-ixy} f(x)\,dx\Big| \le \int_{\mathbb{R}} |e^{-ixy_n} - e^{-ixy}| \cdot |f(x)|\,dx.$$
Now $|e^{-ixy_n} - e^{-ixy}| \le |e^{-ixy_n}| + |e^{-ixy}| = 2$ and the function $x \mapsto 2|f(x)|$ is integrable. Also the mapping $y \mapsto e^{-ixy}$ is continuous, and so $\lim_{n\to\infty} |e^{-ixy_n} - e^{-ixy}| = 0$. The result follows from these two facts and Lebesgue's dominated convergence theorem.

3.21 To prove that $y \mapsto \hat{f}(y)$ is differentiable, we need to show that $\lim_{h\to 0}\big(\hat{f}(y+h) - \hat{f}(y)\big)/h$ exists for each $y \in \mathbb{R}$. We have
$$\frac{\hat{f}(y+h) - \hat{f}(y)}{h} = \frac{1}{h}\int_{\mathbb{R}} \big(e^{-ix(y+h)} - e^{-ixy}\big) f(x)\,dx = \int_{\mathbb{R}} e^{-ixy}\Big(\frac{e^{-ihx} - 1}{h}\Big) f(x)\,dx.$$
Since $|e^{-ixy}| \le 1$, and using the hint with $b = hx$, we get
$$\Big|\frac{\hat{f}(y+h) - \hat{f}(y)}{h}\Big| \le \int_{\mathbb{R}} \Big|\frac{e^{-ihx} - 1}{h}\Big| \cdot |f(x)|\,dx \le \int_{\mathbb{R}} |x|\,|f(x)|\,dx < \infty.$$
Then we can use Lebesgue's dominated convergence theorem to get
$$\lim_{h\to 0}\frac{\hat{f}(y+h) - \hat{f}(y)}{h} = \int_{\mathbb{R}} e^{-ixy} \lim_{h\to 0}\Big(\frac{e^{-ihx} - 1}{h}\Big) f(x)\,dx = -i\int_{\mathbb{R}} e^{-ixy}\, x f(x)\,dx = -i\hat{g}(y),$$
and the result is proved. In the last step we used
$$\lim_{h\to 0}\frac{e^{-ihx} - 1}{h} = \frac{d}{dy}e^{-ixy}\Big|_{y=0} = -ix.$$


Chapter 4

4.1 Monotone Convergence Theorem. Let $(X_n)$ be an increasing sequence of non-negative random variables which converges pointwise to a random variable $X$, i.e. $\lim_{n\to\infty} X_n(\omega) = X(\omega)$ for all $\omega \in \Omega$ (*). Then
$$\lim_{n\to\infty} E(X_n) = E(X).$$
Fatou's Lemma. Let $(X_n)$ be a sequence of non-negative random variables. Then
$$\liminf_{n\to\infty} E(X_n) \ge E\Big(\liminf_{n\to\infty} X_n\Big).$$
Dominated Convergence Theorem. Let $(X_n)$ be a sequence of random variables which converges pointwise (*) to a random variable $X$. Suppose that there exists an integrable, non-negative random variable $Y$ such that $|X_n(\omega)| \le Y(\omega)$ for all $n \in \mathbb{N}$ and all $\omega \in \Omega$. Then $X$ is integrable and
$$\lim_{n\to\infty} E(X_n) = E(X).$$
(*) We can in fact replace pointwise convergence by convergence almost everywhere, i.e. $\lim_{n\to\infty} X_n(\omega) = X(\omega)$ for all $\omega \in \Omega - A$, where $A \in \mathcal{F}$ and $P(A) = 0$.

4.2 We have $P[A \cap B] = P[A]\,P[B]$. Noting that $A^c \cap B^c = (A \cup B)^c$, we have
$$P[A^c \cap B^c] = P[(A \cup B)^c] = 1 - P[A \cup B] = 1 - P[A] - P[B] + P[A \cap B] = 1 - P[A] - P[B] + P[A]P[B] = (1 - P[A])(1 - P[B]) = P[A^c]\,P[B^c].$$
Hence $A^c$ and $B^c$ are independent.

4.3 (a) Define $B_n = \cap_{i=1}^n A_i$. Then $(B_n)$ is a decreasing sequence of sets and, since $P$ is a finite measure, by Theorem 1.6.1 we have $P[B_n] \to P[\cap_{i=1}^{\infty} B_i]$ as $n \to \infty$. Since $\cap_{i=1}^{\infty} A_i = \cap_{i=1}^{\infty} B_i$, we thus have $P[\cap_{i=1}^{\infty} A_i] = \lim_{n\to\infty} P[\cap_{i=1}^n A_i]$. Using independence on the right hand side, we obtain
$$P\Big[\bigcap_{i=1}^{\infty} A_i\Big] = \lim_{n\to\infty} P[A_1]P[A_2]\cdots P[A_n] = \prod_{i=1}^{\infty} P[A_i]$$
as required. Note that the limit on the right hand side exists because $P[A_1]P[A_2]\cdots P[A_n]$ is decreasing as $n$ increases.

(b) If $P(A_n) < 1 - \kappa$ for infinitely many $n$, where $\kappa > 0$ does not depend on $n$, then $\prod_{n=1}^{\infty} P(A_n) = 0$, so $P\big(\bigcap_{n=1}^{\infty} A_n\big) = \prod_{n=1}^{\infty} P(A_n)$ would hold in, for example, the case where all the $(A_n)$ were disjoint. Disjoint events are always dependent (because if one occurs then all the others do not!), so clearly this 'alternative' definition is not what we want.

4.4 Let $(a_n)$ be a sequence diverging to $\infty$. We may without loss of generality assume that it is monotone increasing. Define $A_n = \{\omega \in \Omega\,;\, X(\omega) \le a_n\}$. Then $(A_n)$ increases to $\Omega$ and by Theorem 4.2.1,
$$\lim_{x\to\infty} F(x) = \lim_{n\to\infty} P(A_n) = P(\Omega) = 1.$$
Next let $B_n = \{\omega \in \Omega\,;\, X(\omega) \le -a_n\}$. Then $(B_n)$ decreases to $\emptyset$ and by Theorem 4.2.1,
$$\lim_{x\to-\infty} F(x) = \lim_{n\to\infty} P(B_n) = P(\emptyset) = 0.$$

4.5 If $A, B \in \mathcal{B}(\mathbb{R})$ and $f, g$ are Borel measurable, then $f^{-1}(A), g^{-1}(B) \in \mathcal{B}(\mathbb{R})$ and so
$$P(f(X) \in A,\ g(Y) \in B) = P(X \in f^{-1}(A),\ Y \in g^{-1}(B)) = P(X \in f^{-1}(A))\,P(Y \in g^{-1}(B)) = P(f(X) \in A)\,P(g(Y) \in B).$$


4.6 Using the usual notation $a \vee b = \max\{a, b\}$,
$$E(\max\{X, a\}) = \int_{\mathbb{R}} (x \vee a)\,dp_X(x).$$
Since $x \vee a \ge x$ and $x \vee a \ge a$, by monotonicity $\int_{\mathbb{R}} (x \vee a)\,dp_X(x) \ge \int_{\mathbb{R}} x\,dp_X(x) = E(X)$ and $\int_{\mathbb{R}} (x \vee a)\,dp_X(x) \ge \int_{\mathbb{R}} a\,dp_X(x) = a$, and so
$$E(\max\{X, a\}) \ge \max\{E(X), a\}.$$

4.7 (a) Let $A_k = \{\omega \in \Omega\,;\, X(\omega) = k\}$. If $\omega \in A_k$ then $X(\omega) = k$ and
$$\sum_{i=1}^{\infty} 1_{\{X \ge i\}}(\omega) = 1_{\{X \ge 1\}}(\omega) + 1_{\{X \ge 2\}}(\omega) + \cdots + 1_{\{X \ge k\}}(\omega) = k.$$
The result follows since we have the disjoint union $\Omega = \bigcup_{k=1}^{\infty} A_k$.
By monotone convergence,
$$E(X) = E\Big(\sum_{i=1}^{\infty} 1_{\{X \ge i\}}\Big) = \sum_{i=1}^{\infty} E(1_{\{X \ge i\}}) = \sum_{i=1}^{\infty} P(X \ge i).$$

(b) Let $\omega \in A_i$; then $i - 1 \le X(\omega) < i$ and so
$$\sum_{i=1}^{\infty} (i-1) 1_{A_i}(\omega) \le X(\omega) < \sum_{i=1}^{\infty} i\, 1_{A_i}(\omega),$$
from which the first result follows. For the second result, first observe that
$$\sum_{i=1}^{\infty} i\, 1_{A_i}(\omega) = \sum_{i=1}^{\infty} (i-1) 1_{A_i}(\omega) + \sum_{i=1}^{\infty} 1_{A_i}(\omega) = \sum_{i=1}^{\infty} (i-1) 1_{A_i}(\omega) + 1,$$
since $\sum_{i=1}^{\infty} 1_{A_i} = 1_{\bigcup_{i=1}^{\infty} A_i} = 1_{\Omega} = 1$. Thus
$$\sum_{k=1}^{\infty} (k-1) P(A_k) \le E(X) < 1 + \sum_{k=1}^{\infty} (k-1) P(A_k).$$
But $P(A_k) = P(k-1 \le X < k)$ for each $k \in \mathbb{N}$ and
$$\sum_{k=1}^{N} (k-1) P(k-1 \le X < k) = P(1 \le X < 2) + 2 P(2 \le X < 3) + 3 P(3 \le X < 4) + \cdots + (N-1) P(N-1 \le X < N)$$
$$= P(1 \le X < N) + P(2 \le X < N) + P(3 \le X < N) + \cdots + P(N-1 \le X < N)$$
$$= \sum_{k=1}^{N-2} P(k \le X < N) + P(N-1 \le X < N) \to \sum_{k=1}^{\infty} P(X \ge k) \quad \text{as } N \to \infty.$$
In the last step we used the fact that, as $N \to \infty$,
$$P(N-1 \le X < N) = P(X < N) - P(X < N-1) \to 1 - 1 = 0.$$
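A quick numerical illustration of part (a), added here (the geometric distribution on $\{1,2,\dots\}$ is just one example of a non-negative integer-valued $X$): both $E(X)$ and $\sum_{i\ge 1} P(X \ge i)$ come out as $1/p$.

```python
# Illustration (not from the notes): for a geometric random variable X on {1, 2, ...}
# with P(X = k) = p(1-p)^(k-1), both E(X) and sum_i P(X >= i) equal 1/p.
p = 0.3
K = 2000                                   # truncation; the tail beyond K is negligible
pmf = [p * (1 - p) ** (k - 1) for k in range(1, K + 1)]

mean = sum(k * pk for k, pk in zip(range(1, K + 1), pmf))
tail_sum = sum(sum(pmf[i - 1:]) for i in range(1, K + 1))   # sum over i of P(X >= i)

print(mean, tail_sum, 1 / p)               # all approximately 3.3333
```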

4.8 Let $E_m$ be the event that, starting at the $m$th toss, $k$ consecutive heads appear. Then $P[E_m] = 1/2^k$. Set $A_n = E_{m+kn}$; then the $(A_n)$ are independent. Moreover, $\sum_{n=1}^{\infty} P[A_n] = \infty$, so by the second Borel–Cantelli lemma $P[A_n \text{ i.o.}] = 1$.
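The following simulation sketch (an illustration added here, not part of the solution) samples many disjoint blocks of $k$ fair tosses, mirroring the independent events $A_n$ above; roughly a fraction $2^{-k}$ of the blocks are all heads, so all-heads blocks keep occurring as the number of tosses grows.

```python
# Illustration (not from the notes): disjoint blocks of k tosses are independent,
# and each block is all-heads with probability 2^{-k}; over many blocks a lot of
# all-heads blocks occur, in line with the second Borel-Cantelli lemma.
import random

random.seed(0)
k = 4
n_blocks = 100_000
hits = 0
for _ in range(n_blocks):
    block = [random.randint(0, 1) for _ in range(k)]   # one block of k fair tosses
    hits += all(block)                                  # is the block all heads?

print(hits, n_blocks * 2 ** -k)   # observed count vs expected count n_blocks / 16
```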


4.9 (a) You might reasonably think that this is obvious: if $(A_n)$ occurs eventually then it occurs for all $n$ after some $N$, and of course there are infinitely many such $n$, so then $(A_n)$ occurs infinitely often. Let's give a proof anyway.
Suppose $\omega \in \{A_n \text{ e.v.}\} = \bigcup_m \bigcap_{n\ge m} A_n$. Then, for at least one value of $m$, we have $\omega \in A_n$ for all $n \ge m$. Hence $\omega \in \bigcup_{n\ge k} A_n$ for all $k$, which implies $\omega \in \bigcap_k \bigcup_{n\ge k} A_n = \{A_n \text{ i.o.}\}$.

(b) By De Morgan's laws we have
$$\Omega \setminus \{A_n \text{ i.o.}\} = \Omega \setminus \Big(\bigcap_m \bigcup_{n\ge m} A_n\Big) = \bigcup_m \Big(\Omega \setminus \Big(\bigcup_{n\ge m} A_n\Big)\Big) = \bigcup_m \bigcap_{n\ge m} (\Omega \setminus A_n) = \{A_n^c \text{ e.v.}\}.$$
It follows immediately that $1 - P[A_n \text{ i.o.}] = P[A_n^c \text{ e.v.}]$.

(c) Define $B_m = \bigcap_{n\ge m} A_n$ and note that $(B_m)$ is increasing and that $P[B_m] \le P[A_m]$ because $B_m \subseteq A_m$. Thus by Theorem 4.2.1 we have
$$P[A_n \text{ e.v.}] = P\Big[\bigcup_m B_m\Big] = \lim_{m\to\infty} P[B_m] = \liminf_{m\to\infty} P[B_m] \le \liminf_{m\to\infty} P[A_m].$$
Note that we must switch from $\lim$ to $\liminf$ before using $P[B_m] \le P[A_m]$, because we cannot be sure whether $\lim_m P[A_m]$ exists (and in general it will not).
Using (b), we then have
$$P[A_n \text{ i.o.}] = 1 - P[A_n^c \text{ e.v.}] \ge 1 - \liminf_{m\to\infty} P[A_m^c] = 1 - \liminf_{m\to\infty}\big(1 - P[A_m]\big) = -\liminf_{m\to\infty}\big(-P[A_m]\big) = \limsup_{m\to\infty} P[A_m].$$

Putting our two equations together gets the result.

4.10 If $f$ is an indicator function, $f = 1_A$ for some $A \in \mathcal{B}(\mathbb{R})$:
$$\int_{\Omega} 1_A(X(\omega))\,dP(\omega) = P(X \in A) = p_X(A) = \int_{\mathbb{R}} 1_A(x)\,p_X(dx),$$
and so the result holds in this case. It extends to simple functions by linearity. If $f$ is non-negative and bounded,
$$\int_{\Omega} f(X(\omega))\,dP(\omega) = \sup\Big\{\int_{\Omega} g(\omega)\,dP(\omega)\,;\, g \text{ simple on } \Omega,\ 0 \le g \le f \circ X\Big\}$$
$$= \sup\Big\{\int_{\Omega} h(X(\omega))\,dP(\omega)\,;\, h \text{ simple on } \mathbb{R},\ 0 \le h \circ X \le f \circ X\Big\}$$
$$= \sup\Big\{\int_{\mathbb{R}} h(x)\,p_X(dx)\,;\, h \text{ simple},\ 0 \le h \le f\Big\} = \int_{\mathbb{R}} f(x)\,dp_X(x).$$
In the general case write $f = f^+ - f^-$. If $f$ is non-negative but not necessarily bounded, the result still holds, but both integrals may be (simultaneously) infinite.

4.11 This follows immediately from the result of Problem 3.4, when you replace $f$ by $X - \mu$.

4.12 (a) By linearity, the quadratic function $g(t) = E(X^2) + 2tE(XY) + t^2 E(Y^2) \ge 0$ for all $t \in \mathbb{R}$. A non-negative quadratic function has at most one real root, and hence has a non-positive discriminant (i.e. $b^2 - 4ac \le 0$). Hence $4E(XY)^2 - 4E(X^2)E(Y^2) \le 0$ and the result follows.

(b) Put $Y = 1$ in the Cauchy–Schwarz inequality from (a) to get $E(|X|) \le E(X^2)^{1/2} < \infty$. So $X$ is integrable. By Problem 3.8 part (a), $|E(X)| \le E(|X|)$. Combining our two inequalities gives $|E(X)|^2 \le E(X^2)$.

(c) If $E[X^2] < \infty$ then by part (b) $X$ is also integrable, so by linearity we have
$$\operatorname{var}(X) = E[(X - \mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$$
Hence $\operatorname{var}(X) < \infty$.
Conversely, suppose that $\operatorname{var}(X) < \infty$, and note that by assumption we also have $E[X] < \infty$. We can write $X^2 = (X - E[X])^2 + 2XE[X] - E[X]^2$ and note that all terms here are integrable by our assumptions, thus
$$E[X^2] = \operatorname{var}(X) + 2E[X]E[X] - E[X]^2 = \operatorname{var}(X) + E[X]^2.$$
Hence $E[X^2]$ is finite.


4.13 Write $|X| = |X| 1_{\{|X|\le 1\}} + |X| 1_{\{|X|>1\}}$, so that $|X|^m = |X|^m 1_{\{|X|\le 1\}} + |X|^m 1_{\{|X|>1\}}$. Using the facts that $|x|^m \le 1$ whenever $|x| \le 1$ and $|x|^m \le |x|^n$ whenever $|x| \ge 1$, we find by linearity and monotonicity that
$$E(|X|^m) = E\big(|X|^m 1_{\{|X|\le 1\}}\big) + E\big(|X|^m 1_{\{|X|>1\}}\big) \le 1 + E\big(|X|^n 1_{\{|X|>1\}}\big) \le 1 + E(|X|^n) < \infty.$$

4.14 $(X_n)$ converges to $X$ in probability since, given any $\epsilon > 0$ and $c > 0$, we can find $n_0 \in \mathbb{N}$, which we write as $n_0 = 2^m + r$ for some natural number $m$ and $r = 0, 1, 2, \ldots, 2^m - 1$, so that $\frac{1}{2^m c} < \epsilon$. Then for all $n > n_0$, by Markov's inequality,
$$P(|X_n - X| > c) = P(1_{A_n} > c) \le \frac{E(1_{A_n})}{c} < \frac{1}{2^m c} < \epsilon.$$
On the other hand, $(X_n)$ cannot converge to $X$ almost surely since, given any $n \in \mathbb{N}$, no matter how large, we can find $m > n$ so that $A_m$ and $A_n$ are disjoint (with $P(A_n) > 0$), and so $1_{A_n}(\omega) - 1_{A_m}(\omega) = 1 - 0 = 1$ for all $\omega \in A_n$.

4.15 Suppose $f = f_1 + i f_2$ is integrable. Then both $f_1$ and $f_2$ are integrable. The integrability of $|f| = \sqrt{f_1^2 + f_2^2}$ follows immediately from the inequality $\sqrt{f_1^2 + f_2^2} \le |f_1| + |f_2|$. For the converse use $|f_1| \le \sqrt{f_1^2 + f_2^2}$ and $|f_2| \le \sqrt{f_1^2 + f_2^2}$.

4.16 First suppose that we have established the result for $Y$, i.e. we know that $\Phi_Y(u) = e^{-\frac12 u^2}$ for all $u \in \mathbb{R}$. Then since $X = \mu + \sigma Y$, we have
$$\Phi_X(u) = E\big(e^{iu(\mu+\sigma Y)}\big) = e^{iu\mu} E\big(e^{i(u\sigma)Y}\big) = e^{i\mu u - \frac12 \sigma^2 u^2},$$
as was required. To establish the result for $Y$ we write
$$\Phi_Y(u) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{iuy} e^{-\frac12 y^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \cos(uy)\, e^{-\frac12 y^2}\,dy + i\,\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} \sin(uy)\, e^{-\frac12 y^2}\,dy.$$
As $|\cos(uy)\,y e^{-\frac12 y^2}| \le |y| e^{-\frac12 y^2}$ and $|\sin(uy)\,y e^{-\frac12 y^2}| \le |y| e^{-\frac12 y^2}$, and $y \mapsto |y| e^{-\frac12 y^2}$ is integrable on $\mathbb{R}$, we may apply Problem 3.15 to deduce that $u \mapsto \Phi_Y(u)$ is differentiable and its derivative at $u \in \mathbb{R}$ is
$$\Phi_Y'(u) = \frac{i}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{iuy}\, y e^{-\frac12 y^2}\,dy.$$
Now integrate by parts to find that
$$\Phi_Y'(u) = \frac{i}{\sqrt{2\pi}}\Big[-e^{iuy} e^{-\frac12 y^2}\Big]_{-\infty}^{\infty} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u\, e^{iuy} e^{-\frac12 y^2}\,dy = -u\,\Phi_Y(u).$$
So we have the initial value problem $\frac{d\Phi_Y(u)}{du} = -u\,\Phi_Y(u)$ with initial condition $\Phi_Y(0) = 1$, and the result follows by using the standard separation of variables technique.
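As a numerical sanity check added here, evaluating the defining integral for $\Phi_Y(u)$ by quadrature against the $N(0,1)$ density reproduces $e^{-u^2/2}$ (the grid and the values of $u$ are arbitrary choices):

```python
# Illustration (not from the notes): Phi_Y(u) = E[exp(iuY)] for Y ~ N(0,1)
# matches exp(-u^2 / 2), here via numerical integration of the defining integral.
import numpy as np

y = np.linspace(-12, 12, 200_001)
dy = y[1] - y[0]
density = np.exp(-0.5 * y ** 2) / np.sqrt(2 * np.pi)

for u in (0.0, 0.5, 1.0, 2.0):
    phi = np.sum(np.exp(1j * u * y) * density) * dy
    print(u, phi.real, np.exp(-0.5 * u ** 2))   # imaginary part is ~0 by symmetry
```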

4.17 First note that by Problem 4.12, for all $1 \le m \le n$, $E(|X|^m)$ is finite and so the mapping $y \mapsto y^m$ is $p_X$-integrable. We also have that for all $u, y \in \mathbb{R}$, $|i^m y^m e^{iuy}| \le |y|^m$. Hence we can apply Problem 3.15 to differentiate up to and including $n$ times under the integral sign to obtain
$$\frac{d^n}{du^n}\Phi_X(u) = \int_{\mathbb{R}} i^n y^n e^{iuy}\,dp_X(y).$$
Now let $u = 0$ to find that
$$\frac{d^n}{du^n}\Phi_X(u)\Big|_{u=0} = i^n \int_{\mathbb{R}} y^n\,dp_X(y) = i^n E(X^n).$$


4.18 (a) Since $e^{-ax} \le 1$ for all $x \ge 0$ we have
$$E(e^{-aX}) = \int_0^{\infty} e^{-ax}\,dp_X(x) \le \int_0^{\infty} dp_X(x) = 1.$$

(b)
$$E(e^{a|X|}) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{a|y|} e^{-\frac12 y^2}\,dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^0 e^{-ay} e^{-\frac12 y^2}\,dy + \frac{1}{\sqrt{2\pi}}\int_0^{\infty} e^{ay} e^{-\frac12 y^2}\,dy$$
$$= 2\cdot\frac{1}{\sqrt{2\pi}}\int_0^{\infty} e^{ay} e^{-\frac12 y^2}\,dy = 2 e^{\frac12 a^2}\,\frac{1}{\sqrt{2\pi}}\int_0^{\infty} e^{-\frac12(y-a)^2}\,dy = 2 e^{\frac12 a^2}\,\frac{1}{\sqrt{2\pi}}\int_{-a}^{\infty} e^{-\frac12 y^2}\,dy = 2 e^{\frac12 a^2}\,P(X > -a) < \infty.$$
[Note that the same argument can be used to establish that $X$ has the moment generating function $E(e^{aX}) = e^{\frac12 a^2}$.]

(c) Using the fact that $e^{a|x|} = \sum_{n=0}^{\infty} \frac{a^n |x|^n}{n!}$ for all $x \in \mathbb{R}$, we see that for each $n \in \mathbb{N}$, $|x|^n \le \frac{n!}{a^n} e^{a|x|}$ and so by monotonicity:
$$E(|X|^n) \le \frac{n!}{a^n} E(e^{a|X|}) < \infty.$$

4.19 It's sufficient to assume that $E(X_n) = 0$ for all $n \in \mathbb{N}$; indeed, if this is not the case, just replace $X_n$ with $X_n - \mu$. The proof proceeds in exactly the same way as when the random variables are independent, once we have made the following calculation:
$$\operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\,E\Big(\Big(\sum_{i=1}^n X_i\Big)^2\Big) = \frac{1}{n^2}\sum_{i=1}^n \sum_{j=1}^n E(X_i X_j) = \frac{1}{n^2}\sum_{i=1}^n E(X_i^2) = \frac{\sigma^2}{n}.$$

4.20 In this case $\mu = p$ and $\sigma = \sqrt{p(1-p)}$, and so we can write
$$\lim_{n\to\infty} P\Big(\frac{S_n - np}{\sqrt{np(1-p)}} \le a\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{-\frac12 y^2}\,dy.$$
The random variable $S_n$ is the sum of $n$ i.i.d. Bernoulli random variables and so is binomial with mean $np$ and variance $np(1-p)$, and so for large $n$ it is approximately normal in the precise sense given above.
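A short simulation sketch (illustration only; $p = 0.3$, $n = 500$, $a = 1$ and the number of repetitions are arbitrary choices) comparing the empirical value of $P\big((S_n - np)/\sqrt{np(1-p)} \le a\big)$ with the standard normal value $\Phi(a) \approx 0.8413$:

```python
# Illustration (not from the notes): the standardised binomial S_n is
# approximately standard normal for large n.
import math
import random

random.seed(1)
p, n, a, reps = 0.3, 500, 1.0, 10_000
count = 0
for _ in range(reps):
    s = sum(1 for _ in range(n) if random.random() < p)   # S_n ~ Binomial(n, p)
    z = (s - n * p) / math.sqrt(n * p * (1 - p))
    count += z <= a

phi_a = 0.5 * (1 + math.erf(a / math.sqrt(2)))             # standard normal CDF at a
print(count / reps, phi_a)                                  # both close to 0.84
```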

Chapter 5

5.1 (a)
$$(E \cap F)_x = \{y \in S_2\,;\, (x,y) \in E \cap F\} = \{y \in S_2\,;\, (x,y) \in E\} \cap \{y \in S_2\,;\, (x,y) \in F\} = E_x \cap F_x.$$
$$(E^c)_x = \{y \in S_2\,;\, (x,y) \in E^c\} = \{y \in S_2\,;\, (x,y) \notin E\} = (E_x)^c.$$
$$\Big(\bigcup_{n=1}^{\infty} E_n\Big)_x = \Big\{y \in S_2\,;\, (x,y) \in \bigcup_{n=1}^{\infty} E_n\Big\} = \bigcup_{n=1}^{\infty}\{y \in S_2\,;\, (x,y) \in E_n\} = \bigcup_{n=1}^{\infty}(E_n)_x.$$

5.2 We can write $S_1 = \bigcup_{n=1}^{\infty} A_n$ where $m_1(A_n) < \infty$ for all $n \in \mathbb{N}$, and $S_2 = \bigcup_{r=1}^{\infty} B_r$ where $m_2(B_r) < \infty$ for all $r \in \mathbb{N}$. We then have
$$S_1 \times S_2 = \bigcup_{n=1}^{\infty}\bigcup_{r=1}^{\infty} A_n \times B_r,$$
and for all $r, n \in \mathbb{N}$,
$$(m_1 \times m_2)(A_n \times B_r) = m_1(A_n)\,m_2(B_r) < \infty.$$
(You can, of course, write $S_1 \times S_2$ as just a single union, by using the countability of $\mathbb{N} \times \mathbb{N}$.)

5.3 Let $E = A \times B$. Then if $x \in S_1$,
$$E_x = \begin{cases} B & \text{if } x \in A, \\ \emptyset & \text{if } x \notin A. \end{cases}$$
So $\phi_E(x) = m_2(B) 1_A(x)$, and hence
$$(m_1 \times m_2)(A \times B) = \int_{S_1} \phi_E(x)\,dm_1(x) = m_2(B)\int_{S_1} 1_A(x)\,dm_1(x) = m_1(A)\,m_2(B).$$

5.4 Suppose that $\mu$ is a measure that takes the same value as $m_1 \times m_2$ on finite product sets. Define
$$\mathcal{E} = \{E \in \Sigma_1 \otimes \Sigma_2\,;\, \mu(E) = (m_1 \times m_2)(E)\}.$$
By definition of $\mu$, the collection $\mathcal{P}$ of all finite product sets is in $\mathcal{E}$. Since
$$(A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2),$$
it follows that $\mathcal{P}$ is a $\pi$-system. Using basic properties of measures, it is not hard to show that $\mathcal{E}$ is a $\lambda$-system (use the solution to Problem 5.1 to establish (L1)). By $\sigma$-finiteness, it follows that $\sigma(\mathcal{P}) = \Sigma_1 \otimes \Sigma_2$, and by Dynkin's $\pi$–$\lambda$ lemma, $\sigma(\mathcal{P}) \subseteq \mathcal{E}$. The result follows.

5.5 (a) Method 1. First let $f = 1_A$ and $g = 1_B$. Since $h(x,y) = 1_A(x) 1_B(y) = 1_{A\times B}(x,y)$, it is clear that $h$ is measurable in this case. Next use linearity to extend to the case where $f$ and $g$ are non-negative simple functions. Next let $f$ and $g$ be arbitrary non-negative measurable functions. Then by Theorem 2.4.1 there is a sequence of non-negative simple functions $(s_n)$ converging pointwise to $f$, and a corresponding sequence $(t_m)$ converging pointwise to $g$. Taking limits as $m$ and $n$ go to infinity (for instance along $m = n$) proves that $h$ is measurable in this case. Finally let $f$ and $g$ be arbitrary measurable functions. Write $f = f^+ - f^-$ and $g = g^+ - g^-$. Then
$$fg = (f^+ g^+ + f^- g^-) - (f^- g^+ + f^+ g^-)$$
is measurable, as it is a sum of products of measurable functions.
Method 2. For $B \in \Sigma_2$, define $f_B(x,y) = f(x) 1_B(y)$ for all $x \in S_1$, $y \in S_2$. Taking $B = S_2$, the mapping $f_{S_2} : S_1 \times S_2 \to \mathbb{R}$ is measurable, since for all $a \in \mathbb{R}$ we have $f_{S_2}^{-1}((a,\infty)) = f^{-1}((a,\infty)) \times S_2 \in \Sigma_1 \otimes \Sigma_2$; moreover $f_{S_2}(x,y) = f(x)$ for all $x \in S_1$, $y \in S_2$. Similarly $g_{S_1}(x,y) = g(y)$ defines a measurable function, and so $h = f_{S_2}\, g_{S_1}$ is a product of measurable functions, hence measurable.

(b) Follows easily from Fubini's theorem (2).

5.6 Let $m$ be counting measure on $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$. Then $a(i,j) = a_{ij}$ defines a non-negative measurable function from $(\mathbb{N}^2, \mathcal{P}(\mathbb{N}^2))$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, where we note that $\mathcal{P}(\mathbb{N}^2) = \mathcal{P}(\mathbb{N}) \otimes \mathcal{P}(\mathbb{N})$. We have
$$\int_{\mathbb{N}^2} a\,d(m \times m) = \sum_{(i,j)\in\mathbb{N}^2} a_{ij}, \qquad \int_{\mathbb{N}}\Big(\int_{\mathbb{N}} a(i,j)\,dm(i)\Big)dm(j) = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{ij}, \qquad \int_{\mathbb{N}}\Big(\int_{\mathbb{N}} a(i,j)\,dm(j)\Big)dm(i) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{ij},$$
and the result follows by Fubini's theorem (1).
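A small numerical illustration added here (the non-negative summand $a_{ij} = 2^{-i-j}$ is an arbitrary example): both iterated sums and the sum over $\mathbb{N}^2$ agree, with value $1$.

```python
# Illustration (not from the notes): for non-negative a_{ij} the two iterated
# sums agree with the sum over N^2 (Fubini/Tonelli for counting measure).
# Example: a_{ij} = 2^{-i-j}, truncated at a large N.
N = 60
a = lambda i, j: 2.0 ** (-(i + j))

joint = sum(a(i, j) for i in range(1, N + 1) for j in range(1, N + 1))
rows_first = sum(sum(a(i, j) for j in range(1, N + 1)) for i in range(1, N + 1))
cols_first = sum(sum(a(i, j) for i in range(1, N + 1)) for j in range(1, N + 1))

print(joint, rows_first, cols_first)   # all approximately 1.0
```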

5.7 (a)
$$A_f^c = \{(x,t) \in S \times \mathbb{R}\,;\, 0 \le f(x) < t\} = \bigcup_{q\in\mathbb{Q}}\{(x,t) \in S \times \mathbb{R}\,;\, 0 \le f(x) < q,\ t \ge q\} = \bigcup_{q\in\mathbb{Q}} f^{-1}([0,q)) \times [q,\infty),$$
which is a countable union of measurable sets, and so is measurable. Hence $A_f = (A_f^c)^c$ is measurable.

(b) We use the definition (as a Lebesgue integral) of product measure. Fix $x \in S$. Then the $x$-slice $(A_f)_x$ is just the interval $[0, f(x)]$. Its Lebesgue measure is $f(x)$ and so
$$(m \times \lambda)(A_f) = \int_S \lambda[(A_f)_x]\,dm(x) = \int_S f(x)\,dm(x).$$

5.8 First fix $T > 0$ and use the hint:
$$\int_0^T \frac{\sin(x)}{x}\,dx = \int_0^T \sin(x)\Big(\int_0^{\infty} e^{-xy}\,dy\Big)dx.$$
Now $f(x,y) = e^{-xy}\sin(x)$ is continuous, and so Riemann integrable (and hence Lebesgue integrable) on $[0,T] \times [0,N]$. By Fubini's theorem:
$$\int_0^T \sin(x)\Big(\int_0^N e^{-xy}\,dy\Big)dx = \int_0^N\Big(\int_0^T e^{-xy}\sin(x)\,dx\Big)dy = -\int_0^N\Big(\frac{y}{1+y^2}e^{-yT}\sin(T) + \frac{1}{1+y^2}\big(e^{-yT}\cos(T) - 1\big)\Big)dy,$$
using integration by parts.
On the other hand, $\int_0^N e^{-xy}\,dy = \frac{1}{x}\big(1 - e^{-Nx}\big)$ and so
$$\Big|\int_0^N e^{-xy}\,dy\Big| \le \frac{2}{x}.$$
Since $x \mapsto \frac{\sin(x)}{x}$ is continuous, and hence integrable, on $[0,T]$, we can use dominated convergence to assert that
$$\int_0^T \frac{\sin(x)}{x}\,dx = \int_0^T \sin(x)\Big(\int_0^{\infty} e^{-xy}\,dy\Big)dx = -\int_0^{\infty}\Big(\frac{y}{1+y^2}e^{-yT}\sin(T) + \frac{1}{1+y^2}\big(e^{-yT}\cos(T) - 1\big)\Big)dy.$$
Now use monotonicity in the first integral (since $\frac{y}{1+y^2} \le 1$), and dominated convergence in the second (since $|e^{-yT}\cos(T) - 1| \le 2$), to deduce that
$$\lim_{T\to\infty}\int_0^T \frac{\sin(x)}{x}\,dx = \int_0^{\infty}\frac{1}{1+y^2}\,dy = \frac{\pi}{2}.$$
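A numerical illustration added here of the conclusion: Riemann sums for $\int_0^T \frac{\sin x}{x}\,dx$ creep towards $\pi/2$ as $T$ grows (the convergence is slow, of order $1/T$).

```python
# Illustration (not from the notes): the integral of sin(x)/x over [0, T]
# approaches pi/2 as T grows.
import numpy as np

for T in (10.0, 100.0, 1000.0):
    x = np.linspace(1e-9, T, 2_000_001)    # avoid x = 0; sin(x)/x -> 1 there anyway
    dx = x[1] - x[0]
    print(T, np.sum(np.sin(x) / x) * dx)

print(np.pi / 2)                            # 1.5707963...
```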

5.9 (a) Both integrals vanish by elementary calculus arguments.


(b) Let $S = \{(x,y) \in \mathbb{R}^2\,;\, -1 \le x \le 1,\ -1 \le y \le 1\}$ and $A = \{(x,y) \in \mathbb{R}^2\,;\, 0 \le x \le 1,\ 0 \le y \le 1\}$. We require $\int_S |f(x,y)|\,dx\,dy < \infty$. Note that $\int_S |f(x,y)|\,dx\,dy \ge \int_A |f(x,y)|\,dx\,dy$. Now if $f$ were integrable over $A$, we could use Fubini's theorem to write its integral as a repeated integral. But consider
$$\int_0^1 x\Big(\int_0^1 \frac{y}{(x^2+y^2)^2}\,dy\Big)dx = \frac12\int_0^1\Big(\frac{1}{x} - \frac{x}{x^2+1}\Big)dx.$$
Since $x \mapsto \frac{1}{x}$ is not integrable over $[0,1]$, the result follows.

5.10 First observe that by Problems 2.7 and 1.6 part (a) the mapping $(x,y) \mapsto f(x-y)g(y)$ is measurable. Let $K = \sup_{x\in\mathbb{R}}|g(x)| < \infty$, since $g$ is bounded. Then, since $f$ is integrable,
$$|(f * g)(x)| \le \int_{\mathbb{R}} |f(x-y)|\cdot|g(y)|\,dy \le K\int_{\mathbb{R}} |f(x-y)|\,dy = K\int_{\mathbb{R}} |f(y)|\,dy < \infty.$$
We also have, by Fubini's theorem,
$$\int_{\mathbb{R}}\int_{\mathbb{R}} |f(x-y)g(y)|\,dy\,dx \le \int_{\mathbb{R}}\Big(\int_{\mathbb{R}} |f(x-y)|\cdot|g(y)|\,dy\Big)dx = \int_{\mathbb{R}}\Big(\int_{\mathbb{R}} |f(x-y)|\,dx\Big)|g(y)|\,dy = \int_{\mathbb{R}} |f(x)|\,dx \int_{\mathbb{R}} |g(y)|\,dy < \infty,$$
from which it follows that $f * g$ is both measurable and integrable.
By a similar argument using Fubini's theorem, we have that
$$\widehat{f * g}(y) = \int_{\mathbb{R}} e^{-ixy}\int_{\mathbb{R}} f(x-z)g(z)\,dz\,dx = \int_{\mathbb{R}}\Big(\int_{\mathbb{R}} e^{-iy(u+z)} f(u)\,du\Big)g(z)\,dz = \int_{\mathbb{R}} e^{-iyu} f(u)\,du \cdot \int_{\mathbb{R}} e^{-iyz} g(z)\,dz = \hat{f}(y)\,\hat{g}(y),$$
where we used the change of variable $x = u + z$.
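As a final numerical illustration added here (standard normal densities are an arbitrary choice of bounded, integrable $f$ and $g$), one can check $\widehat{f*g}(y) = \hat f(y)\hat g(y)$ on a grid, computing both the convolution and the transforms (with the convention $\hat f(y) = \int e^{-ixy} f(x)\,dx$ used above) by quadrature; each side should be close to $e^{-y^2}$.

```python
# Illustration (not from the notes): the Fourier transform turns convolution
# into a product.  Here f = g = standard normal density, so both sides are exp(-y^2).
import numpy as np

x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def ft(h, y):
    return np.sum(np.exp(-1j * x * y) * h) * dx      # transform of h at y, by quadrature

# (f * f)(x_k) = integral of f(x_k - t) f(t) dt, again by quadrature on the same grid.
conv = np.array([np.sum(f * np.exp(-0.5 * (xk - x) ** 2) / np.sqrt(2 * np.pi)) * dx
                 for xk in x])

for y in (0.0, 0.5, 1.0):
    lhs = ft(conv, y)
    rhs = ft(f, y) * ft(f, y)
    print(y, lhs.real, rhs.real, np.exp(-y ** 2))
```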
