Source: Harvard University, scholar.harvard.edu › files › charlescywang › files...




Table of Contents Preliminary Definitions: .................................................................................................................................................................. 7

a. Sup and Inf ....................................................................................................................................................................... 7

b. Lim Inf and Lim Sup of a Sequence of reals ..................................................................................................................... 7

c. Lim Inf and Lim Sup of a Set ............................................................................................................................................ 7

d. Metric Spaces .................................................................................................................................................................. 7

e. Normed Spaces................................................................................................................................................................ 7

f. Inner Product Spaces ....................................................................................................................................................... 7

g. Stochastic Order .............................................................................................................................................................. 7

Chapter 1: Probability Measure and Integration ............................................................................................................................. 8

I. Definitions: .......................................................................................................................................................................... 8

a. Probability Space ............................................................................................................................................................. 8

b. (D.1.1.1)Sigma‐Algebra: (We want to impose some structure on F) .............................................................................. 8

c. (D.1.1.2)Measurable Space: A pair (Ω, F), with F a sigma-field, is called a measurable space. ....................................................... 8

d. (D.1.1.2)Probability Measure: ......................................................................................................................................... 8

e. (D.1.1.5)Generated Sigma Fields: .................................................................................................................................... 8

f. (D.1.1.7/E.1.1.8)Borel Sigma‐Field on R: ......................................................................................................................... 8

g. (D.1.2.8)Sigma Field Generated by a RV: ........................................................................................................................ 9

h. (D.1.2.8)Sigma Field Generated by a Sequence of R.V.s: ................................................................................................ 9

i. (D.1.2.1)Random Variable: .............................................................................................................................................. 9

j. (E.1.2.2)Indicators: .......................................................................................................................................................... 9

k. (E.1.2.3)Simple Functions: ............................................................................................................................................... 9

l. (D.1.2.7) X,Y Almost Surely the same: We say X, Y defined on the same (Ω, F, P) are almost surely the same if P({ω : X(ω) ≠ Y(ω)}) = 0 .... 9

m. (D.1.2.11)Borel Measurable Function: ........................................................................................................................ 9

n. (D.1.2.18)Mathematical Expectation of a R.V.: ............................................................................................................... 9

o. (D.1.2.24) Positive (X+) and Negative (X‐) Parts of a RV.................................................................................................. 9

p. (D.1.2.25)Integrable R.V.: .............................................................................................................................................. 10

q. (D.1.4.21,L.1.4.25)Uniform Integrability of a Collection of R.V.s .................................................................................. 10

r. Sufficient Conditions for Uniform Integrability of a Collection of R.V.’s ....................................................................... 10

s. (D.1.3.1,D.1.3.2,D.1.4.9,P.1.4.10)Convergences of R.V.s .............................................................................................. 10

t. (D.1.3.4)Complete Probability Space ............................................................................................................................ 11

u. (D.1.3.2) Lq space ........................................................................................................................................................... 11

v. (D.1.4.1)Law of a R.V. .................................................................................................................................................... 11


w. (D.1.4.4.)Distribution Function of a R.V. ....................................................................................................................... 11

x. (D.1.2.22,P.1.4.7)Probability Density Function (PDF): .................................................................................................. 11

y. (D.1.4.32,D.1.3.33,D.1.4.35) Independence of Events, Sigma‐Fields, and Random Variables ..................................... 12

II. Theorems/Propositions: ........................................................................................................................................................ 13

a. (E.1.1.3)Properties of Probability Measures ................................................................................................................. 13

b. (E.1.1.6) Properties of Sigma‐Fields (Unions and Intersections) ................................................................................... 13

c. (P.1.1.19) Borel set does not contain all subsets of R ................................................................................................... 13

d. (E.1.2.3) Properties of Indicator R.V.’s: ......................................................................................................................... 13

e. (P.1.2.6)RV as (pt‐wise) limit of SF ................................................................................................................................ 13

f. (D.1.2.7)X=Y A.S. : If Var(X‐Y) = 0 then X = Y a.s. ........................................................................................................... 13

g. (E.1.2.10,P.1.2.13,P.1.2.14,T.1.2.16)Closure Properties of R.V.’s: ................................................................................ 13

h. (P.1.2.31)Properties of Expectation .............................................................................................................................. 14

i. (P.1.2.34,T.1.2.36,P.1.2.38) Inequalities (Jensen’s, Markov’s, Chebychev’s, Cauchy‐Schwartz) .................................. 14

j. (T.1.3.6,P.1.3.21,C.1.3.15) A.S., Q.M. P, D Convergence Relationships ........................................................................ 14

k. (P.1.3.12)Property of the || . ||q Norm: ||X||q is nondecreasing in q ....................................................................... 15

l. (L.1.3.10,L.1.3.11) Borel Cantelli Lemmas ..................................................................................................................... 15

m. (P.1.3.22) Typical Application of BCI: E(Xn²) ≤ 1/n² ∀n ⇒ Xn(ω) → 0 a.s. ...................................................................... 15

n. (T.1.4.26,T.1.4.27,C.1.4.28,T.1.4.29) (Dominated/Bounded/Monotone) Convergence Theorems to Show lim E(Xn) = E(lim Xn) .............................................................................................................................................................. 15

o. (T.1.4.22) Uniform Integrability + Convergence in Probability ⇒ q.m. Convergence ............................................ 15

p. (P.1.4.3) Change of Variables (We can measure a RV either in the input space or in the output space!) ................... 16

q. (P.1.4.5) Distribution Function FX Uniquely Determines the Law PX of X (Just as Characteristic Function) .................. 16

r. (P.1.4.3) X =D Y ⇔ E[h(X)] = E[h(Y)] for all h bounded and Borel measurable. .............................................. 16

Chapter 2: Conditional Expectation and Hilbert Spaces ............................................................................................................... 17

I. Definitions ......................................................................................................................................................................... 17

a. Expression for Conditional Expectation for Discrete R.V. Y: ......................................................................................... 17

b. Definition of E(X|Y) when X ∈ L²(Ω, F, P), i.e. E(|X|²) < ∞: ............................................................................................ 17

c. General Definition of E(X|Y) .......................................................................................................................................... 17

II. Theorems/Propositions ...................................................................................................................................................... 18

a. (P.2.1.2)Projection Theorem: ........................................................................................................................................ 18

b. (T.2.1.6)Consistency of L2 and General Definition of E(X|Y): ....................................................................................... 18

c. (E.2.3.1,E.2.3.3.,P.2.3.4,P.2.3.5) Properties of Conditional Expectations ..................................................................... 18

d. (P.2.3.10) Jensen’s Inequality for Conditional Expectations ......................................................................................... 18

e. (T.2.3.13,T.2.3.14) Monotone and Dominated Convergence for Conditional Expectations ......................................... 18


f. (E.2.3.8) Conditional Variance Identity ......................................................................................................................... 18

Chapter 3: General Theory of Stochastic Processes ...................................................................................................................... 19

I. Definitions: ........................................................................................................................................................................ 19

a. (D.3.1.1) Stochastic Process .......................................................................................................................................... 19

b. (D.3.1.2) Random Walk ................................................................................................................................................. 19

c. (E.3.1.4) S.P. with Independent Increments ................................................................................................................. 19

d. (D.3.1.5)Finite Dimensional Distribution Function: ....................................................................................................... 19

e. (D.1.3.12) Consistency of FDD ....................................................................................................................................... 19

f. (D. 3.1.14) Sigma‐Field of a S.P...................................................................................................................................... 19

g. (D.1.3.7) Versions: ......................................................................................................................................................... 19

h. (D.1.3.8) Modifications: ................................................................................................................................................. 19

i. (E.3.3.7) Indistinguishable sample path (a.s.): .............................................................................................................. 19

j. (D.3.2.1) Characteristic Function of a Random Vector .................................................................................................. 20

k. (D.3.2.4) PDF of a Random Vector ................................................................................................................................ 20

l. (D.3.2.6) Positive Semi‐definite Matrix ......................................................................................................................... 20

m. (D.3.2.7) Gaussian Distribution ................................................................................................................................ 20

n. (P.3.2.5) Gaussian Density ............................................................................................................................................. 21

o. (D.3.2.14) Gaussian Stochastic Process ......................................................................................................................... 21

p. (D.3.2.19) Strong and Covariance/Weak Stationary Process ........................................................................................ 21

II. Theorems/Propositions ...................................................................................................................................................... 22

a. Modification ⇒ Version ............................................................................................................................... 22

b. X(t) , Y(t) modifications of each other does not imply that they’re indistinguishable. ................................................. 22

c. If X(t), Y(t) are modifications of each other and are either a) discrete time or b) right-continuous continuous time, then they are also indistinguishable. ............................................................................................................................................. 22

d. (P.3.2.2) Characteristic Function Uniquely Determines the Law PX of a Random Vector ............................................. 22

e. (P.3.2.5) Properties of Characteristic Functions ............................................................................................................ 22

f. (P.3.2.9,P.3.2.10,E.3.2.11,P.3.2.12,P.3.2.13) Properties of Gaussian Random Vectors ................................................ 22

g. (C.3.2.15,P.3.2.16)Properties of Gaussian Processes .................................................................................................... 22

h. (T.3.3.16) Fubini’s Theorem .......................................................................................................................................... 23

Chapter 4: Martingales and Stopping Times ................................................................................................................................. 24

Definitions ................................................................................................................................................................................. 24

a. (D.4.1.1) Filtration: A (discrete time) filtration is a non‐decreasing family of sub‐sigma‐fields {Fn} of our

measurable space (Ω, F) ...................................................................................................................................................... 24

That is, F0 ⊆ F1 ⊆ ... ⊆ Fn ⊆ ... where Fn is a sigma-field for each n. ................................................................................... 24


b. (D.4.1.2) S.P. Adapted to Filtration: A (discrete time) S.P. Xn, n = 0, 1, ... is adapted to a filtration {Fn} if ω ↦ Xn(ω) is

a R.V. on (Ω, Fn) for each n, i.e. σ(Xn) ⊆ Fn for each n. ........................................................................................................ 24

c. (D.4.1.3) Minimal Filtration / Canonical Filtration ........................................................................................................ 24

d. (D.4.1.4,D.4.2.1) Martingale .......................................................................................................................................... 24

e. (D.3.1.14) L2 Martingale ............................................................................................................................... 24

f. Martingale Differences .................................................................................................................................................. 24

g. (D.4.1.11) Previsible / Predictable Process ................................................................................................................... 24

h. (D.4.1.16) Orthogonal Sequence of R.V.’s ..................................................................................................................... 25

i. (D.4.1.18) Super‐martingales and Sub‐martingales ...................................................................................................... 25

j. (D.4.2.8) Last Element of a subMG or supMG ............................................................................................................... 25

k. (D.4.2.9) Right‐Continuous Filtration ............................................................................................................................ 25

l. (D.4.3.1, D.4.3.11) Stopping Time τ for a filtration {Ft} ................................................................................. 25

m. (D.4.3.5) Stopped Process ......................................................................................................................................... 26

n. (D.4.4.2) Innovation Process ......................................................................................................................................... 26

o. (D.4.4.8) Increasing Part of a M.G. Mt ........................................................................................................................ 26

p. (D.4.6.1) Branching Process (We use MGs to study the extinction probabilities of branching processes) .................. 26

q. Probability of Extinction ................................................................................................................................................ 26

Theorems ................................................................................................................................................................................... 27

a. {Xn} is adapted to a filtration {Fn} iff σ(X0, ..., Xn) ⊆ Fn ∀n .................................................................................. 27

b. (P. 4.1.7): ....................................................................................................................................................................... 27

c. (T.4.1.12) : Martingale Transform ................................................................................................................................. 27

d. (P.4.1.15,P.4.1.17) Alternative and Equivalent Characterizations of M.G. S.P. ............................................................ 27

e. (R.4.1.21) SubMGs (SupMGs) have non‐decreasing (non‐increasing) expectation E(Xn) ................................................ 27

f. (T.4.3.6) Preservation of (sub/sup)MG of Stopped Processes ...................................................................................... 27

g. (P.4.3.13) Checking Stopping Time for a Continuous‐Time Process ............................................................................. 28

h. (T.4.3.8, T.4.3.16) Doob’s Optional Stopping Theorem................................................................................................. 28

i. (E.4.3.9) Typical Example of Application of Doob’s Optional Stopping ........................................................................ 28

j. (E.4.3.17) A Proof for why τ < ∞ .................................................................................... 29

k. (T.4.4.1) Doob’s Decomposition .................................................................................................................................... 29

l. (E.4.4.3) Doob’s Decomposition for SubMG (An non‐decreasing) and SupMG (An non‐increasing) ............................. 30

m. (T.4.4.7) Doob‐Meyer Decomposition (continuous time analog of Doob’s decomposition and fundamental to stochastic integration) ........................................................................................................................................................... 31

n. (T.4.4.11) Doob’s Inequality .......................................................................................................................................... 31

o. (T. 4.5.1) Doob’s Martingale Convergence Theorem .................................................................................................... 31


p. (P.4.5.3) L2 and A.S. convergence of MGs (A Stronger Convergence Theorem than Doob’s) ...................................... 32

q. (P.4.6.2) S.P. Xn = m^(−n) Zn is a MG for the filtration Fn, Zn a Branching Process ................................................ 32

r. (P.4.6.3) Sub‐Critical Process Dies Off / Extinction Probability is One when m < 1 ...................................................... 32

s. (P.4.6.5) Critical Process Dies Off / Extinction Probability is One when m = 1 ............................................................. 32

Chapter 5: Brownian Motion ......................................................................................................................................................... 33

Definitions ................................................................................................................................................................................. 33

a. (D.5.1.1, T.5.2.1) Brownian Motion ............................................................................................................................... 33

b. Comparison between Variation of Nice Calculus and Stochastic Calculus: .................................................................. 33

c. (D.5.3.1) || π || (Length of Longest Interval) and Q‐th Variation of f(.) on π (CALCULUS DEFINITION) .................... 33

d. (D.5.3.2) Q‐th variation of a S.P. X(t) on the interval [a,b] ........................................................................................ 33

e. Lipschitz Sample Path (w.p.1) ........................................................................................................................................ 33

f. (D.5.3.3) Quadratic Variation ........................................................................................................................................ 34

Theorems ................................................................................................................................................................................... 35

a. (P.5.1.2) Brownian Motion has Independent Increments of Zero Mean ..................................................................... 35

b. (E.5.1.4) Interesting Properties of a BM ........................................................................................................................ 35

c. (T.5.2.1) Levy’s Martingale Characterization of BM ...................................................................................................... 35

g. (E.5.1.5,E.5.1.6) BM‐Related Processes and their properties ....................................................................................... 35

(SEE PS5 for Work) ................................................................................................................................................................ 36

d. (P.5.2.2) tX a BMtW Wτ τ+= − (USEFUL FOR REFLECTION PRINCIPLE) ................................................................................ 36

e. 0: ( ) : max ( ) : ( )T s T sW W Tαω ω α ω ω α ω τ ω≤ ≤≥ ⊆ ≥ = ≤ where inf( 0 : )tt Wατ α= > = is the First Hitting Time of BM ........ 36

f. Reflection Principle (pp 106‐107) .................................................................................................................................. 36

g. (P.5.3.9) Total Variation of BM W(t) is infinite w.p.1 .................................................................................................... 36

h. (D.5.3.4) For BM W(t), as ||π|| → 0 we have that V_π^(2)(W) → (b − a) in 2-mean ...................................................................................... 37

i. Quadratic Variation of a BM W(t) is t, since its Doob‐Meyer Decomposition is W(t)² = [W(t)² − t] + t ............................ 37

Chapter 6: Markov, Poisson, and Jump Processes ........................................................................................................................ 38

Definitions ................................................................................................................................................................................. 38

a. (D.6.1.1) Markov Chain ................................................................................................................................................. 38

b. (D.6.1.2) (Time) Homogeneous Markov Chain .............................................................................................................. 38

c. (D.6.1.3, D.6.1.8) Stationary Transition Probabilities (Determines distribution of Homogeneous MCs) ..................... 38

d. (D.6.1.4) Initial Distribution of a Markov Chain ............................................................................................................ 39

e. 4 Useful Conditions for Defining Poisson Processes: .................................................................................................... 39

f. (D.6.2.1) Counting Process (C0) ..................................................................................................................................... 40

g. (D.6.2.1) Jump Times ..................................................................................................................................................... 40

h. Poisson R.V. ................................................................................................................................................................... 40

Page 6: Table of Contents 7 7 - Harvard Universityscholar.harvard.edu › files › charlescywang › files › notes_on...1 | Page Table of Contents Preliminary Definitions: .....7 a. Sup

6 | P a g e

i. S.P. with Independent Increments ................................................................................................................................ 40

j. (D. 6.2.3) S.P. with Stationary Increments .................................................................................................................... 40

k. (P.6.2.5) Memoryless Property of Exponential Law ...................................................................................................... 40

l. (D.6.2.1, P.6.2.4) Poisson Process ................................................................................................................................. 40

Theorems................................................................................................................................................................................... 41

a. Showing that a SP is a Markov SP ................................................................................................................................. 41

b. Showing that a Markov SP is Time‐Homogeneous ....................................................................................................... 41

c. (Lemma 1) If X(t) is a S.P. with independent increments, then X(t) is a Markov SP. ............................................... 41

d. (Lemma 2) If X(t) is a stationary process and a Markov Process, then it is Homogeneous. ....................................... 41

e. (Extra) If X(t) a Markov Process, then Y(t) = ft(Xg(t)) is also a Markov Process for f invertible and g strictly increasing. 41

f. (P.6.1.13) Every Continuous Time SP X(t) with Stationary and Independent Increments is a Homogenous Markov Process. ................................................................................................................................................................................. 42

g. (P.6.1.5) Strong Markov Property ................................................................................................................................. 42

h. (P.6.2.4) Poisson Process is the only S.P. with stationary independent increments that satisfies condition C0 (counting process). ................................................................................................................................................................ 42

i. (P.6.2.5) Memoryless Property of Exponential Law ...................................................................................................... 42

j. (P.6.2.6) A S.P. N(t) that satisfies C0 is a Poisson process of rate λ iff it satisfies C3. ................................................... 42

k. (P.6.2.8) Relationship between Poisson Process and Uniform Measure ...................................................................... 42

l. (T.6.2.10) Poisson Approximation ................................................................................................................................. 42

m. (P.6.2.12) N1(t) + N2(t) Poisson if N1(t), N2(t) independent Poisson ....................................................................... 43


Preliminary Definitions: a. Sup and Inf

Sup is the least upper bound and inf is the greatest lower bound. Note: By completeness of R, Sup and Inf always exist (possibly ±∞) for any nonempty subset of R. Note: Sups are weakly increasing as sets get larger, and Infs are weakly decreasing as sets get larger.

b. Lim Inf and Lim Sup of a Sequence of reals

limsup_n X_n = lim_{n→∞} sup_{k>n} X_k and liminf_n X_n = lim_{n→∞} inf_{k>n} X_k. Note: Again, these are always defined. Note: If the limit of the sequence exists, then it can be shown that limsup and liminf must be the same (i.e. equal to the limit).

c. Lim Inf and Lim Sup of a Set

Let {A_i : i ≥ 1} be a sequence of events. Then,

limsup_i A_i ≡ (A_i i.o.) = {ω : A_i occurs infinitely often} = ⋂_{n≥1} ⋃_{m≥n} A_m
= {ω : ∀n, ∃m(ω) ≥ n s.t. ω ∈ A_m}
= {ω : ω ∈ A_n for infinitely many n}
(Intuition: For each n ≥ 1, ω ∈ limsup is in at least ONE of the tail A_m's, m ≥ n. This is true for all n.)
Idea: If ω ∈ limsup A_i, then ω ∈ ⋃_{m≥n} A_m for every n, so ω lands in some A_{n₁} AND some A_{n₂} with n₂ > n₁ AND ... That is, ω is in ∞-many A_k's. So limsup collects exactly those ω s.t. the A_k's occur i.o.

liminf_i A_i ≡ (A_i ev.) = {ω : A_i occurs eventually} = ⋃_{n≥1} ⋂_{m≥n} A_m
= {ω : ∃m s.t. ∀n > m, ω ∈ A_n}
= {ω : ω ∈ A_n for all large n}
(Intuition: ω ∈ liminf is in ALL of the A_n's from some index m on.)
Idea: If ω ∈ liminf A_i, then ω ∈ ⋂_{m≥n} A_m for some n, so ω ∈ A_n AND A_{n+1} AND ... That is, ω is in all the A_k's eventually. So liminf collects exactly those ω s.t. all but finitely many of the A_k's occur.
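On a finite universe these definitions can be checked mechanically. Below is a minimal sketch (not from the notes; the events and helper names are invented): the A_n alternate between {0} and {0, 1}, so outcome 0 occurs eventually while outcome 1 occurs only infinitely often.

```python
# Illustrative sketch of
#   limsup A_n = ∩_n ∪_{m≥n} A_m   (outcomes occurring infinitely often)
#   liminf A_n = ∪_n ∩_{m≥n} A_m   (outcomes occurring eventually)
# computed over a finite horizon large enough to capture the periodic tail.

def limsup_sets(A, horizon):
    # intersection over n of the tail unions ∪_{m≥n} A_m
    return set.intersection(*[set.union(*A[n:]) for n in range(horizon)])

def liminf_sets(A, horizon):
    # union over n of the tail intersections ∩_{m≥n} A_m
    return set.union(*[set.intersection(*A[n:]) for n in range(horizon)])

# A_n = {0} for even n, {0, 1} for odd n: 0 occurs eventually,
# 1 occurs infinitely often but NOT eventually.
A = [{0} if n % 2 == 0 else {0, 1} for n in range(100)]
print(limsup_sets(A, 50))   # {0, 1}
print(liminf_sets(A, 50))   # {0}
```

Note that liminf ⊆ limsup, as the intuition demands: occurring eventually is stronger than occurring infinitely often.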

(See notes)
d. Metric Spaces
e. Normed Spaces
f. Inner Product Spaces
g. Stochastic Order

O(.): As x → x₀ (the relevant limit point, e.g. ∞ or 0), f(x) = O(g(x)) iff ∃C such that eventually |f(x)| ≤ C|g(x)|. Or: the ratio f(x)/g(x) is eventually bounded as x approaches the limit.

o(.): As x → x₀, f(x) = o(g(x)) iff lim_{x→x₀} f(x)/g(x) = 0 (i.e. f(x) goes to 0 faster than g(x) as x approaches the limit).

O_p(.), o_p(.): the in-probability analogues: X_n = O_p(a_n) iff ∀ε > 0, ∃M, N s.t. P(|X_n/a_n| > M) < ε ∀n ≥ N (bounded in probability); X_n = o_p(a_n) iff X_n/a_n →P 0.

Manipulating this notation: Most of what we have here is common sense. Some useful ideas:
O(g₁(x)) + O(g₂(x)) = O(max(g₁(x), g₂(x)))
O(g₁(x))·O(g₂(x)) = O(g₁(x)·g₂(x))

We'd like to be able to say that for reasonable functions w, w(O(g(x))) = O(w(g(x))). There's no problem with saying (O(h))² = O(h²), but e^{O(ln x)} isn't well defined. You have to be careful there.

You should avoid dividing by big-O or little-o. However, one can make sense of something like 1/(2 + O(x)) by long division: 1/(2 + O(x)) = 1/2 + O(x) as x → 0.
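These order claims can be eyeballed numerically. A minimal sketch (not from the notes; the particular f, g, h below are invented for illustration): the ratio f(x)/g(x) stays bounded when f = O(g) and decays to 0 when f = o(g).

```python
# Checking f = O(g) (ratio bounded) and h = o(g) (ratio → 0) along a
# growing grid as x → ∞. This is numerical evidence, not a proof.

def ratios(f, g, xs):
    return [abs(f(x) / g(x)) for x in xs]

xs = [10.0 ** k for k in range(1, 7)]      # x → ∞

f = lambda x: 3 * x**2 + 5 * x             # claim: f(x) = O(x^2)
r_big = ratios(f, lambda x: x**2, xs)
print(max(r_big))                          # 3.5 — bounded, settling near 3

h = lambda x: x                            # claim: h(x) = o(x^2)
r_little = ratios(h, lambda x: x**2, xs)
print(r_little[-1])                        # 1e-06 — heading to 0
```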


Chapter 1: Probability Measure and Integration

I. Definitions:
a. Probability Space
A probability space is a triple (Ω, F, P), where:
Ω = sample space = the set of all possible outcomes of some random experiment or phenomenon.
F = event space = a subset of 2^Ω (the set of all subsets), consisting of all "allowed events", i.e. the events to which we shall assign probabilities. (It represents both the amount of information available and the events of possible interest to us.)
P = probability measure = a set function P: F → [0, 1].

b. (D.1.1.1)Sigma-Algebra: (We want to impose some structure on F)

We say that F ⊆ 2^Ω is a σ-field/σ-algebra if
(a) Ω ∈ F
(b) A ∈ F ⇒ Ω\A ∈ F (Complement Rule)
(c) A_i ∈ F ⇒ ⋃_i A_i ∈ F (Countable Union Rule)
(⇒ by De Morgan's Law¹: A_i ∈ F ⇒ ⋂_i A_i ∈ F (Countable Intersection Rule))
(⇒ ∅, Ω always in the σ-field)

Note on 2^Ω: We denote the "set of subsets" as 2^Ω because for a finite set it can be shown that the power set has 2^|Ω| elements. Pf: Let |Ω| = N. Counting all subsets with 0, 1, 2, 3, ..., N elements, we get
(N choose 0) + (N choose 1) + ... + (N choose N) = Σ_{i=0}^{N} (N choose i) 1^i 1^{N−i} = (1 + 1)^N = 2^N by the binomial th.
Recall, the binomial th. says (a + b)^N = Σ_{i=0}^{N} (N choose i) a^i b^{N−i}.
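The count is easy to confirm by brute force for a small Ω — a sketch (not from the notes; the helper name and example set are invented):

```python
# Enumerate 2^Ω for |Ω| = 4 and confirm |2^Ω| = 2^4 = 16, with the per-size
# subset counts C(4, r) matching the binomial-theorem argument above.
from itertools import combinations
from math import comb

def power_set(omega):
    omega = list(omega)
    return [set(c) for r in range(len(omega) + 1)
            for c in combinations(omega, r)]

subsets = power_set({"a", "b", "c", "d"})
print(len(subsets))                                        # 16
sizes = [sum(1 for s in subsets if len(s) == r) for r in range(5)]
print(sizes)                                               # [1, 4, 6, 4, 1]
```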

c. (D.1.1.2)Measurable Space: A pair (Ω, F), with F a sigma-field, is called a measurable space.

d. (D.1.1.2)Probability Measure:

Given a measurable space (Ω, F), a probability measure is a function P: F → [0, 1] s.t.
(a) 0 ≤ P(A) ≤ 1
(b) P(Ω) = 1
(c) P(A) = Σ_n P(A_n) whenever A = ⋃_n A_n is a countable union of disjoint sets A_n ∈ F

e. (D.1.1.5)Generated Sigma Fields:

Given a collection of subsets A_α ⊆ Ω (α ∈ Γ, where Γ is not necessarily a countable index set), we denote by σ(A_α) or σ(A_α, α ∈ Γ) the smallest sigma-field F s.t. A_α ∈ F ∀α ∈ Γ, and call it the sigma-field generated by the collection:
σ(A_α) = ⋂ {G : G ⊆ 2^Ω is a sigma-field, A_α ∈ G ∀α ∈ Γ}
(the intersection of all possible sigma-fields that contain the A_α yields the smallest one)

(Note: This definition works because a (possibly uncountable) intersection of sigma-fields is a sigma-field)
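For a finite Ω the generating construction can be run directly: start from the given sets plus ∅ and Ω, then close under complements and unions until nothing new appears. A small sketch (not from the notes; the helper name and example sets are invented):

```python
# Compute σ(C) on a finite Ω by closing C under complement and pairwise
# union until stable. (On a finite Ω, countable unions reduce to finite ones.)

def generated_sigma_field(omega, collection):
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(A) for A in collection}
    while True:
        new = set(F)
        new |= {omega - A for A in F}           # close under complements
        new |= {A | B for A in F for B in F}    # close under unions
        if new == F:
            return F
        F = new

F = generated_sigma_field({1, 2, 3, 4}, [{1}])
print(sorted(sorted(A) for A in F))
# [[], [1], [1, 2, 3, 4], [2, 3, 4]] — the smallest σ-field containing {1}
```

The result is visibly the smallest closed family: adding any other subset would force in its complement and unions, giving a strictly larger σ-field.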

f. (D.1.1.7/E.1.1.8)Borel Sigma-Field on R: B = σ({(a, b) : a < b ∈ R}) = σ({[a, b] : a, b ∈ R}) = σ({(−∞, b) : b ∈ R}) = σ({(−∞, b] : b ∈ Q}) = σ({O ⊆ R : O open})

Note on Borel and σ(X): It turns out that B has all the sets/events that we could possibly ever want. Now, we generally assume this sigma-field on the output space of a random variable. So, we want to be able to measure the Borel events on the output space. To do so, we want to make sure that the sigma-field on the input space corresponds to this (since our probability measure is defined on the input space, and the measure on the output space is induced by the random variable), and thus we require {ω : X(ω) ≤ α} ∈ F ∀α ∈ R, so that in the output space we have all the Borel sets (because then we have in the output space σ((−∞, α], α ∈ R), which is just Borel).

¹ De Morgan's Laws: ¬(A ∧ B) = (¬A) ∨ (¬B) and ¬(A ∨ B) = (¬A) ∧ (¬B), or in set notation: (A ∩ B)^c = A^c ∪ B^c and (A ∪ B)^c = A^c ∩ B^c.

Thus, ⋂_i A_i = (⋃_i A_i^c)^c: A_i ∈ F ⇒ A_i^c ∈ F by the Complement Rule ⇒ ⋃_i A_i^c ∈ F by Countable Union ⇒ ⋂_i A_i = (⋃_i A_i^c)^c ∈ F by Complement.


g. (D.1.2.8)Sigma Field Generated by a RV: Given RV X, we denote by σ(X) the smallest σ-field G ⊆ F s.t. X(ω) is measurable on (Ω, G). We call σ(X) the σ-field generated by X, sometimes denoting it F_X. It can be shown that:
σ(X) = σ({ω : X(ω) ≤ α}, α ∈ R)

Note on σ(X): Naturally, we may want to consider σ̂(X) = σ({ω : X(ω) ∈ B}, B Borel). But it can be shown that σ̂(X) = σ(X).

Note on σ(X) and information: σ(X) is used to produce a rigorous mathematical theory. But it also has the crucial role of quantifying the amount of information we have. For example, σ(X) contains exactly those events A for which we can say whether ω ∈ A or not. So, if we knew whether each set in σ(X) happened, then we'd know the value of X.

h. (D.1.2.8)Sigma Field Generated by a Sequence of R.V.s:

i. (D.1.2.1)Random Variable: A R.V. on (Ω, F) is a function X: Ω → R s.t. ∀α ∈ R, the set {ω : X(ω) ≤ α} ∈ F (called an F-measurable function)
(Equivalently: {ω : X(ω) ∈ A} ∈ F ∀ Borel A)

Note on F-Measurability: Why do we care about this? So that we can induce a probability measure P_X on R:
i.e. ∀ Borel A ⊆ R, P_X(A) = P({ω : X(ω) ∈ A}) is well-defined because {ω : X(ω) ∈ A} ∈ F, and thus we can assign it a probability!

Note on Choice of F: In defining a RV, F is implicit (i.e. X on (Ω, F, P) is by def. F-measurable). Note on X as Limit of SF: Any RV X can be expressed as a limit of a sequence of SF's.

j. (E.1.2.2)Indicators: We call I_A(ω) = {1 if ω ∈ A; 0 o.w.} an indicator function; it is a R.V. (easily verified). Note on Use: Indicators are very useful because they allow us to use simple functions, and all R.V.'s can be expressed as limits of SF's.

k. (E.1.2.3)Simple Functions:
We call the R.V. X(ω) = Σ_{n=1}^{N} c_n I_{A_n}(ω) (for finite N, non-random c_n ∈ R, and sets A_n ∈ F) a simple function, denoted X ∈ SF.

l. (D.1.2.7) X,Y Almost Surely the same: We say X, Y defined on the same (Ω, F, P) are almost surely the same if P({ω : X(ω) ≠ Y(ω)}) = 0
(Can show by showing E(X − Y) = 0 and Var(X − Y) = 0)

m. (D.1.2.11)Borel Measurable Function: A function g: R → R is called a Borel (measurable) function if g(.) is a R.V. on (R, B), i.e. {x ∈ R : g(x) ≤ α} ∈ B ∀α.

Note: Every continuous function and every piecewise constant (at most countably many jump points between which it is constant) function is Borel measurable.

n. (D.1.2.18)Mathematical Expectation of a R.V.:

The mathematical expectation of a RV X(ω) is denoted EX. For X ≥ 0 (general X via X⁺ − X⁻, see below),
EX = E( lim_n [ n·I_{X>n} + Σ_{k=0}^{n2ⁿ−1} k2⁻ⁿ·I_{(k2⁻ⁿ,(k+1)2⁻ⁿ]}(X) ] ) = lim_n ( n·P(X > n) + Σ_k k2⁻ⁿ·P({ω : X(ω) ∈ (k2⁻ⁿ, (k+1)2⁻ⁿ]}) ) (a Lebesgue integral)

Note: The above formula is derived using the fact that any random variable is a limit of simple functions, together with a convergence theorem to switch the order of E and the limit.
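For a discrete, bounded, nonnegative X the Lebesgue sums above can be evaluated exactly. A sketch (not from the notes; the two-point law below is invented), showing E[f_n(X)] climbing to EX with error at most 2⁻ⁿ once n exceeds the bound on X:

```python
# EX as a limit of expectations of the dyadic simple functions
#   f_n(x) = n·1{x>n} + Σ_k k·2^{-n}·1{x ∈ (k2^{-n}, (k+1)2^{-n}]}
# evaluated exactly (with Fractions) for a two-point law.
import math
from fractions import Fraction as Fr

def f_n(x, n):
    if x > n:
        return Fr(n)
    k = math.ceil(x * 2**n) - 1    # x ∈ (k2^{-n}, (k+1)2^{-n}] ↦ value k2^{-n}
    return Fr(max(k, 0), 2**n)

law = {Fr(1, 3): Fr(1, 2), Fr(5, 2): Fr(1, 2)}   # P(X=1/3) = P(X=5/2) = 1/2
EX = sum(x * p for x, p in law.items())
print(EX)                                         # 17/12

for n in [3, 6, 10]:
    E_fn = sum(f_n(x, n) * p for x, p in law.items())
    # f_n rounds down within its dyadic cell, so 0 ≤ EX − E[f_n(X)] ≤ 2^{-n}
    assert Fr(0) <= EX - E_fn <= Fr(1, 2**n)
```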

(Completing h above:) Given X_1, ..., X_n on the same probability space (Ω, F, P), denote by σ(X_k, k ≤ n) the smallest σ-field F′ s.t. the X_k(ω), k ≤ n, are measurable on (Ω, F′). Thus,
σ(X_k, k ≤ n) = ⋂ {H : H a σ-field containing σ(X_k) ∀k}
(i.e. the smallest σ-field containing σ(X_k), k = 1, ..., n)

o. (D.1.2.24) Positive (X⁺) and Negative (X⁻) Parts of a RV


For a general R.V. X, consider the non-negative R.V.'s X⁺ = max(X, 0) and X⁻ = −min(X, 0). Then,
X = X⁺ − X⁻, |X| = X⁺ + X⁻, and EX = EX⁺ − EX⁻ (provided either EX⁺ < ∞ or EX⁻ < ∞)

Note on Expectation Being Undefined: If EX⁺ = ∞ and EX⁻ = ∞, then the expectation is undefined. Whereas if only one of them is infinite, then the expectation is defined (as ±∞), but it is not finite for that random variable. Furthermore, when many theorems require the integrability condition E|X| < ∞ ⇔ EX⁺ < ∞ and EX⁻ < ∞, it is precisely to avoid the undefined-expectation problem. E|X| is always defined (since ∞ + ∞ = ∞), whereas EX is not always defined (since ∞ − ∞ is undefined: e.g. lim_n 2n − lim_n n "=" ∞ while lim_n (n+1) − lim_n n "=" 1, so no single value can consistently be assigned).

p. (D.1.2.25)Integrable R.V.:

A R.V. X is integrable (or has finite expectation) if E|X| < ∞ (i.e. both EX⁺ < ∞ and EX⁻ < ∞)
⇔ E[|X| I_{|X|>M}] → 0 as M → ∞ (See HW2)

q. (D.1.4.21,L.1.4.25)Uniform Integrability of a Collection of R.V.s

A collection of RV's {X_α, α ∈ I} is called Uniformly Integrable (U.I.) if lim_{M→∞} sup_α E[|X_α| I_{|X_α|>M}] = 0

OR EQUIVALENTLY

A collection of RV's {X_α, α ∈ I} is U.I. iff
(a) sup_α E|X_α| < ∞
(b) ∀ε > 0, ∃δ > 0 s.t. E[|X_α| I_A] < ε ∀α ∈ I and ∀ events A s.t. P(A) < δ

Note on Intuition: This says that the tails of the collection of RV's are uniformly bounded in some sense (i.e. the largest tail expectation of the collection goes to 0 as we go farther and farther out in the tail). Note on Use: In our class, this becomes useful because if X_n →P X and {|X_n|^q} is uniformly integrable, then X_n →q.m. X.

r. Sufficient Conditions for Uniform Integrability of a Collection of R.V.’s

(Lecture 4 pg.8)

a) If X ∈ L¹, then {X} is U.I. (a single X that is integrable is trivially U.I.)
b) If ∃C < ∞ s.t. |X_n| ≤ C a.s. ∀n, then {X_n} is U.I.
c) If X_i ∈ L¹ for i = 1, ..., n (a finite collection of R.V.'s), then {X_1, ..., X_n} is U.I.
d) If sup_n E(|X_n|^{1+ε}) < ∞ for some ε > 0, then {X_n} is U.I.
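The canonical family that fails U.I. while satisfying (a) is X_n = n with probability 1/n, else 0. A sketch (not from the notes; helper names are invented): every E|X_n| = 1, yet the tail expectation E[|X_n| I_{|X_n|>M}] equals 1 whenever n > M, so the sup over n never decays.

```python
# X_n = n w.p. 1/n, else 0: sup_n E|X_n| = 1 < ∞, but
# sup_n E[|X_n| 1{|X_n| > M}] = 1 for EVERY truncation level M,
# so the defining limit is 1, not 0 — the family is not U.I.
from fractions import Fraction as Fr

def abs_moment(n):              # E|X_n| = n · (1/n) = 1
    return Fr(n) * Fr(1, n)

def tail_expectation(n, M):     # E[|X_n| 1{|X_n| > M}]
    return Fr(n) * Fr(1, n) if n > M else Fr(0)

for M in [10, 100, 1000]:
    sup_tail = max(tail_expectation(n, M) for n in range(1, 10 * M))
    print(M, sup_tail)          # sup is 1 at every M
```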

s. (D.1.3.1,D.1.3.2,D.1.4.9,P.1.4.10)Convergences of R.V.s

(0) Pointwise: X_n →ptwise X if ∀ω ∈ Ω, X_n(ω) → X(ω)

(1) Almost Sure: X_n →a.s. X if ∃A ∈ F w/ P(A) = 1 s.t. X_n(ω) → X(ω) for each ω ∈ A
⇔ P({ω : lim_n X_n(ω) = X(ω)}) = 1
⇔ ∀ε > 0, P(⋃_n ⋂_{k≥n} {ω : |X_k(ω) − X(ω)| ≤ ε}) = 1 ⇔ ∀ε > 0, lim_n P(⋂_{k≥n} {ω : |X_k(ω) − X(ω)| ≤ ε}) = 1
(i.e. eventually it's within ε, which is just the def of convergence)
⇔ ∀ε > 0, P(⋂_n ⋃_{k≥n} {ω : |X_k(ω) − X(ω)| > ε}) = 0 ⇔ ∀ε > 0, lim_n P(⋃_{k≥n} {ω : |X_k(ω) − X(ω)| > ε}) = 0
(i.e. this is just an equivalent statement of the above: the event that we're outside an ε-neighborhood of the limit infinitely often must have measure 0)

(2) Probability: X_n →P X if P({ω : |X_n(ω) − X(ω)| > ε}) → 0 as n → ∞, ∀ fixed ε > 0

(3) Lq/q-Mean: X_n →q.m. X if X_n, X ∈ Lq and ||X_n − X||_q → 0 (i.e. E|X_n − X|^q → 0)

(4) Distribution/Law/Weakly: X_n →D X if F_{X_n}(α) → F_X(α) for each α that is a continuity point of F_X
(or equivalently) X_n →D X iff ∀h continuous and bounded (on the range of X), E h(X_n) → E h(X)

(5) Expectation: if E(X_n) → E(X)

Note: Convergence in expectation depends on the monotone/dominated convergence theorems. Note on complete probability space: In principle, when dealing with a.s. convergence, we need to also check that the limit X is also a RV. If we assume that the probability space is complete (i.e. contains all subsets of null sets, so a.s. limits are RV's), which we always do, then we can ignore this technical point. (See below)
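The gap between (1) and (2) deserves a concrete example. A sketch (not from the notes; the construction is the standard "typewriter" sequence on Ω = [0, 1) with uniform P): block m consists of 2^m indicators sweeping across [0, 1), so P(X_n ≠ 0) → 0 (convergence in probability, and in q-mean), yet every fixed ω is hit once per block, so X_n(ω) equals 1 infinitely often and never converges pointwise.

```python
# Typewriter sequence: X_n = indicator of a dyadic interval; the intervals
# shrink (→ convergence in probability) but keep sweeping all of [0,1)
# (→ no pointwise/a.s. convergence).
from fractions import Fraction as Fr

def interval(n):
    # n-th interval: block m = ⌊log2(n+1)⌋, offset j, width 2^{-m}
    m = (n + 1).bit_length() - 1
    j = n + 1 - 2**m
    return Fr(j, 2**m), Fr(j + 1, 2**m)

def X(n, w):                 # X_n(ω)
    a, b = interval(n)
    return 1 if a <= w < b else 0

def prob_nonzero(n):         # P(X_n = 1) = interval length → 0
    a, b = interval(n)
    return b - a

# interval lengths 1, 1/2, 1/4, 1/8, 1/16 (printed as Fractions)
print([prob_nonzero(n) for n in [0, 1, 3, 7, 15]])

w = Fr(1, 3)                 # any fixed ω behaves the same way
hits = [n for n in range(63) if X(n, w) == 1]
print(len(hits))             # 6 — exactly one hit in each of blocks 0..5
```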


Note on the 2nd definition of convergence in probability: This equivalent definition (P.1.4.10) applies to more general R.V.'s whose range is not R (for example, random vectors with values in Rⁿ). This condition is harder to check.

t. (D.1.3.4)Complete Probability Space: We say that (Ω, F, P) is a complete probability space if ANY subset N of any B ∈ F with P(B) = 0 is also in F.

Note: Any probability space can be completed by adding to F all the subsets of sets of probability 0. Note on why we care: Completeness guarantees that an a.s. limit of a RV is itself a RV.

It is possible to show that if X_n are RV's s.t. X_n(ω) → X̂(ω) for each ω ∈ A with P(A) = 1, then there exists a RV X s.t. N = {ω : X(ω) ≠ X̂(ω)} is a subset of B = A^c ∈ F and X_n →a.s. X. By assuming that the probability space is complete, we guarantee that N is in F, and therefore X̂ (= X a.s.) is necessarily a random variable.

u. (D.1.3.2) Lq space

Fixing 1 ≤ q < ∞, we denote by Lq(Ω, F, P) the collection of RV's X on the measurable space (Ω, F) for which E|X|^q < ∞.

Note on convergence: Convergence in Lq is the convergence of RV's in the Lq space (defined above), with the ||.||_q metric. Examples: L¹ denotes the space of all integrable random variables. L² denotes the space of all square-integrable random variables. Important structural properties of Lq (P.1.3.16):
Lq(Ω, F, P) is a complete, normed (topological) vector space (i.e. αX + βY ∈ Lq whenever X, Y ∈ Lq, α, β ∈ R) with the norm ||X||_q = [E|X|^q]^{1/q}. If ||X_n − X_m||_q → 0 as n, m → ∞, then X_n →q.m. X for some X in Lq (i.e. complete).

v. (D.1.4.1)Law of a R.V.: The law of a RV X, denoted P_X, is the probability measure on (R, B) s.t. P_X(U) = P({ω : X(ω) ∈ U}) ∀U Borel
(In other words, P_X is the measure induced by X on R)

w. (D.1.4.4.)Distribution Function of a R.V.: The distribution function F_X of a real-valued R.V. X is F_X(α) = P({ω : X(ω) ≤ α}) = P_X((−∞, α]) ∀α

Note on Existence of F_X: Since by definition X must be measurable w.r.t. σ(X) ≡ σ({ω : X(ω) ≤ α}, ∀α), we know the measure of these sets is always well defined. Thus, the distribution function always exists! But the PDF does not always exist, because (as shown below) it exists only for continuous and (almost everywhere) differentiable CDF's. Note on Law and F_X: The distribution function uniquely determines the law P_X of X (just as the characteristic function does).

x. (D.1.2.22,P.1.4.7)Probability Density Function (PDF):

A R.V. X(ω) has a PDF f_X if P(a ≤ X ≤ b) = ∫_a^b f_X(x)dx ∀a < b ∈ R. Such a density must be a non-negative function with ∫_R f_X(x)dx = 1.

OR Equivalently

A R.V. X has a PDF f_X iff its distribution function F_X can be expressed as
F_X(α) = ∫_{−∞}^α f_X(x)dx ∀α [where f_X ≥ 0 and ∫ f_X(x)dx = 1].

Note on Non-Existence of PDF:
If a PDF exists, then it must be that F_X(b) = ∫_{−∞}^b f_X(x)dx, and thus by the Fundamental Th. of Calculus, F′_X(b) = f_X(b).
However, if F_X is not differentiable, then a PDF does not exist (by contrapositive)! To say a PDF exists is to say that we can write the CDF as a Riemann integral over R as mentioned above. But remember that the Lebesgue integral can still be written, since by definition
F_X(b) = P_X((−∞, b]) = P(X ≤ b) = ∫_{{ω : X(ω) ≤ b}} dP(ω).

Note on Existence of PDF and Property of CDF: If a PDF exists, then we know that F_X is continuous and almost-everywhere differentiable with F′_X(x) = f_X(x) for almost every x.


Note on PDF’s, Lebegue vs. Riemann Integral:

The Lebesgue integral of X wrt the probability P is denoted E(X)= ( ) ( ). It is based on splitting the range of X( ) into finitely many small

intervals and approximating X( ) by a constant on the p

X dPω ω ω

ω∫

reimage of each of the intervals. This allows us to deal with rather general domain .

In contrast, the Riemann integral splits the domain of integration into finitely many small intervals (hence we'r

Ω

e limited to R ). Even when =[0,1]it allows us to deal with measures for which P for which ([0, ]) is not smooth (and hence Riemann integral fails to exist). But if the Riemannintegral exists, the

d

Pω ωΩ

→n it necessarily coincides with the Lebesgue Integral.

Thus, if PDF exists, then (Riemann) ( ) ( ) ( ) (Lebesgue)Xxf x dx X dPω

ω ω=∫ ∫

y. (D.1.4.32,D.1.3.33,D.1.4.35) Independence of Events, Sigma-Fields, and Random Variables

(1)Independence of Events: Events A_i ∈ F are P-mutually independent if for any L < ∞ and distinct indices i_1, ..., i_L,
P(A_{i_1} ∩ ... ∩ A_{i_L}) = ∏_{j=1}^{L} P(A_{i_j})

(2)Independence of σ-Fields: Two σ-fields H, G ⊆ F are P-independent if
P(g ∩ h) = P(g)P(h) ∀g ∈ G, h ∈ H
n σ-fields H_1, ..., H_n ⊆ F are P-independent if
P(h_1 ∩ ... ∩ h_n) = P(h_1)···P(h_n) ∀h_i ∈ H_i, ∀i

(3)Independence of R.V.'s: Two RV's X, Y defined on (Ω, F, P) are independent if σ(X) ⊥ σ(Y) (where σ(X), σ(Y) ⊆ F) OR EQUIVALENTLY if P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y) ∀x, y

(4)Independence of Random Vectors: For any finite n, m ≥ 1, two random vectors (X_1, ..., X_m) and (Y_1, ..., Y_n) with values in R^m and R^n, respectively, are independent iff for all bounded Borel measurable functions h: R^m → R and g: R^n → R,
E[h(X_1, ..., X_m) g(Y_1, ..., Y_n)] = E[h(X_1, ..., X_m)] E[g(Y_1, ..., Y_n)]

Note on Proving (4): From (3), if h and g are indicators, then (4) immediately follows. Now, we use the usual trick: if (4) is true for indicators, then it's true for simple functions; and if it's true for simple functions, then it's true for all such functions (by taking limits). We impose the bounded and measurable conditions on the functions so that the expectations are defined. Note on Checking Independence Condition: When checking that all sets in two sigma-fields are mutually independent, it's useful to use the following:
A ⊥ B ⇔ A ⊥ B^c ⇔ A^c ⊥ B ⇔ A^c ⊥ B^c
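The complement equivalence is quick to verify exhaustively on a small product space. A sketch (not from the notes; the coin-die space and helper names are invented):

```python
# On Ω = coin × die with uniform P, A = "heads" and B = "die ≥ 5" are
# independent, and so are all complement combinations — the equivalence above.
from fractions import Fraction as Fr

omega = [(c, d) for c in range(2) for d in range(1, 7)]  # 12 equally likely pts

def P(E):
    return Fr(len(E & set(omega)), len(omega))

A = {w for w in omega if w[0] == 1}        # heads
B = {w for w in omega if w[1] >= 5}        # die shows 5 or 6
Ac, Bc = set(omega) - A, set(omega) - B

def indep(E, G):
    return P(E & G) == P(E) * P(G)

print([indep(A, B), indep(A, Bc), indep(Ac, B), indep(Ac, Bc)])
# [True, True, True, True]
```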


II. Theorems/Propositions: a. (E.1.1.3)Properties of Probability Measures

Let (Ω, F, P) be a probability space and A, B, A_i events in F. Then P satisfies:
(a) Monotonicity: If A ⊆ B, then P(A) ≤ P(B)
(b) Sub-additivity: If A ⊆ ⋃_i A_i, then P(A) ≤ Σ_i P(A_i)
(c) Continuity from Below: If A_i ↑ A (i.e. A_1 ⊆ A_2 ⊆ ... and ⋃_i A_i = A), then P(A_i) ↑ P(A)
(d) Continuity from Above: If A_i ↓ A (i.e. A_1 ⊇ A_2 ⊇ ... and ⋂_i A_i = A), then P(A_i) ↓ P(A)
(e) Inclusion–Exclusion Rule:
P(⋃_i A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} P(A_1 ∩ ... ∩ A_n)

b. (E.1.1.6) Properties of Sigma-Fields (Unions and Intersections)

Let A_α be a σ-field for each α ∈ Γ, an arbitrary (possibly uncountable) index set. Then ⋂_{α∈Γ} A_α is a σ-field, but A_1 ∪ A_2 is not necessarily one.

c. (P.1.1.19) Borel set does not contain all subsets of R: There exists a subset of R that is not in the Borel σ-field. Thus, not all subsets of R are Borel sets. (But for all intents and purposes, this is a technical and unimportant detail.)

d. (E.1.2.3) Properties of Indicator R.V.’s:

(a) I_∅(ω) = 0 and I_Ω(ω) = 1
(b) I_{A^c}(ω) = 1 − I_A(ω)
(c) I_A(ω) ≤ I_B(ω) ∀ω iff A ⊆ B
(d) I_{⋂_i A_i}(ω) = ∏_i I_{A_i}(ω)
(e) If the A_i are disjoint, then I_{⋃_i A_i}(ω) = Σ_i I_{A_i}(ω)
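These identities are pointwise statements, so they can be checked exhaustively on a small Ω. A sketch (not from the notes; the sets are invented):

```python
# Verify the indicator identities (b)-(e) pointwise on a finite Ω.
omega = range(10)

def I(A):
    return {w: (1 if w in A else 0) for w in omega}

A, B = {1, 2, 3}, {3, 4}
Ac = set(omega) - A

ok_b = all(I(Ac)[w] == 1 - I(A)[w] for w in omega)                 # (b)
ok_c = (A <= B) == all(I(A)[w] <= I(B)[w] for w in omega)          # (c)
ok_d = all(I(A & B)[w] == I(A)[w] * I(B)[w] for w in omega)        # (d)
D1, D2 = {0, 1}, {5, 6}                                            # disjoint
ok_e = all(I(D1 | D2)[w] == I(D1)[w] + I(D2)[w] for w in omega)    # (e)
print(ok_b, ok_c, ok_d, ok_e)    # True True True True
```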

e. (P.1.2.6)RV as (pt-wise) limit of SF: For every RV X(ω), there exists a sequence of SF's X_n(ω) s.t. X_n(ω) → X(ω) as n → ∞, for each fixed ω ∈ Ω.

Use f_n(x) = n·I_{x>n} + Σ_{k=0}^{n2ⁿ−1} k2⁻ⁿ·I_{(k2⁻ⁿ,(k+1)2⁻ⁿ]}(x)

What does this look like? It partitions (0, n] into intervals of width 1/2ⁿ:
(0, 1/2ⁿ], (1/2ⁿ, 2/2ⁿ], (2/2ⁿ, 3/2ⁿ], ..., ((n2ⁿ − 1)/2ⁿ, n], together with (n, ∞).

Then, whenever the function's value falls in one of these intervals, we approximate it by the interval's lower endpoint. (So, whenever x > n, we assign it n.) (See notes for a picture)
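The staircase f_n defined above is easy to picture once coded. A sketch (not from the notes) checking the three properties that drive the construction: f_n never overshoots, the error is at most 2⁻ⁿ once n exceeds x, and the approximations are nondecreasing in n:

```python
# Dyadic staircase f_n(x) = n·1{x>n} + Σ_k k2^{-n}·1{x ∈ (k2^{-n},(k+1)2^{-n}]}:
# rounds x down to the lower endpoint of its dyadic cell (or caps it at n).
import math

def f_n(x, n):
    if x > n:
        return float(n)
    k = math.ceil(x * 2**n) - 1
    return max(k, 0) / 2**n

x = 2.718281828
vals = [f_n(x, n) for n in range(1, 12)]
print(vals[:4])            # [1.0, 2.0, 2.625, 2.6875] — climbing toward x
```

The monotonicity f_n ≤ f_{n+1} holds because each dyadic grid refines the previous one; this is what lets the monotone convergence theorem be applied when the construction is used to define EX.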

f. (D.1.2.7)X=Y A.S.: If E(X − Y) = 0 and Var(X − Y) = 0, then X = Y a.s. (Var(X − Y) = 0 alone only gives X − Y constant a.s.)

g. (E.1.2.10,P.1.2.13,P.1.2.14,T.1.2.16)Closure Properties of R.V.'s:
1. (E.1.2.10): Limit of R.V.'s on a measurable space is a R.V. on that space:
Let (Ω, F) be a measurable space and let X_n be a sequence of random variables on it. If, ∀ω ∈ Ω, X_∞(ω) = lim_{n→∞} X_n(ω) exists and is finite, then X_∞(ω) is a RV on (Ω, F).

Note on usefulness: For a typical measurable space with uncountable Ω, it is impractical to list all possible R.V.'s that are defined on it. This closure property gives us a tool for showing that a particular function X(ω) is indeed a RV (i.e. by showing that it's a limit of a sequence of RV's on (Ω, F)).

2. (P.1.2.13) Borel function of R.V.'s on a measurable space is a R.V. on that space: If g: Rⁿ → R is a Borel measurable function and X_1, ..., X_n are R.V.'s on (Ω, F), then g(X_1, ..., X_n) is also a R.V. on (Ω, F).
3. (P.1.2.14) Borel functions of R.V.'s generate a smaller sigma field (corollary from above):
For any n < ∞, g: Rⁿ → R Borel measurable, and any R.V.'s Y_1, ..., Y_n on the same measurable space, F_{g(Y_1,...,Y_n)} ⊆ σ(Y_1, ..., Y_n).
If g is invertible, then F_{g(Y_1,...,Y_n)} = σ(Y_1, ..., Y_n).


Note on Intuition: The idea here is about information. If Y creates a particular information partition, then a function of Y can't give you any more information. g(Y) gives you exactly the same information if g is invertible, because from g(Y) you can always back out Y.

h. (P.1.2.31)Properties of Expectation

(1) (Indicator) E(I_A) = P(A) for any A ∈ F
(2) If X = Σ_{n=1}^{N} c_n I_{A_n} is a simple function, then E(X) = Σ_{n=1}^{N} c_n P(A_n)
(3) (Linearity) If X and Y are integrable R.V.'s, then for any constants α, β the R.V. αX + βY is integrable and E(αX + βY) = αE(X) + βE(Y)
(4) (Constant) E(X) = c if X(ω) = c w.p. 1
(5) (Monotonicity) If X ≥ Y a.s., then E(X) ≥ E(Y). Further, if X ≥ Y a.s. and EX = EY, then X = Y a.s.
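On a finite probability space these properties reduce to finite-sum identities. A sketch (not from the notes; the 3-point law is invented), checking (2), (3), and (5) exactly:

```python
# Linearity and monotonicity of E on a 3-point probability space.
from fractions import Fraction as Fr

p = {0: Fr(1, 4), 1: Fr(1, 2), 2: Fr(1, 4)}          # P(ω), Ω = {0, 1, 2}

def E(X):
    return sum(X[w] * p[w] for w in p)

X = {0: 1, 1: 3, 2: 5}       # a simple function: EX = 1/4 + 3/2 + 5/4 = 3
Y = {0: 0, 1: 1, 2: 2}       # EY = 1
a, b = 2, -3
lin_lhs = E({w: a * X[w] + b * Y[w] for w in p})
print(lin_lhs, a * E(X) + b * E(Y))                  # 3 3 — linearity
mono = all(X[w] >= Y[w] for w in p) and E(X) >= E(Y)
print(mono)                                          # True — monotonicity
```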

i. (P.1.2.34,T.1.2.36,P.1.2.38) Inequalities (Jensen's, Markov's, Chebychev's, Cauchy–Schwartz)
(1) Jensen's: Suppose g(.) is a convex function, i.e. g(λx + (1−λ)y) ≤ λg(x) + (1−λ)g(y) ∀x, y ∈ R, λ ∈ [0, 1]. If X is an integrable R.V. and g(X) is also integrable, then E(g(X)) ≥ g(E(X)), w/ strict inequality if g(.) is strictly convex (and X is non-degenerate).
(2) (General) Markov's: Suppose f is a non-decreasing, Borel measurable function with f(x) > 0 for x > 0. Then for any R.V. X, ∀ε > 0, P(|X(ω)| ≥ ε) ≤ E(f(|X|)) / f(ε)
(3) (Usual) Markov's: Take the above with f(x) = x. Then ∀ε > 0, P(|X| ≥ ε) ≤ E|X| / ε
(4) Chebychev's: Take the above with f(x) = x² and X = Y − E(Y). Then ∀ε > 0, P(|Y − E(Y)| ≥ ε) ≤ E[(Y − E(Y))²] / ε² = Var(Y) / ε²
(5) Cauchy–Schwartz: Suppose Y, Z are RV's on the same probability space w/ E[Y²] < ∞ and E[Z²] < ∞. Then E|YZ| ≤ √(E(Y²) E(Z²)).
Note: In particular, for X, Y with mean 0: |E(XY)| ≤ E|XY| ≤ √(E(X²) E(Y²)) (the first step by Jensen's, the second by Cauchy–Schwartz), OR |Cov(X, Y)| ≤ √(Var(X) Var(Y))
(6) Triangle: d(x, y) ≤ d(x, z) + d(z, y)
Using Euclidean d(x, y) = |x − y|: |x + y| ≤ |x| + |y| and |x − y| ≥ ||x| − |y||
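A sketch (not from the notes; the three-point law is invented) evaluating Markov, Chebychev, and Cauchy–Schwartz exactly on a small law — each bound holds with visible slack:

```python
# Markov, Chebychev, Cauchy–Schwartz on P(X=0)=1/2, P(X=2)=1/4, P(X=6)=1/4.
from fractions import Fraction as Fr

law = {0: Fr(1, 2), 2: Fr(1, 4), 6: Fr(1, 4)}

def E(f):
    return sum(f(x) * q for x, q in law.items())

EX = E(lambda x: x)                              # 2
var = E(lambda x: (x - EX) ** 2)                 # 6
eps = 3

P_X_ge = sum(q for x, q in law.items() if x >= eps)            # P(X ≥ 3)
print(P_X_ge, EX / eps)                          # Markov: 1/4 ≤ 2/3

P_dev = sum(q for x, q in law.items() if abs(x - EX) >= eps)   # P(|X−EX| ≥ 3)
print(P_dev, var / eps**2)                       # Chebychev: 1/4 ≤ 2/3

EYZ = E(lambda x: abs(x * (x - EX)))             # Y = X, Z = X − EX
# Cauchy–Schwartz compared via squares to stay in exact arithmetic: 36 ≤ 60
print(EYZ ** 2, E(lambda x: x**2) * var)
```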

j. (T.1.3.6,P.1.3.21,C.1.3.15) A.S., Q.M., P, and D Convergence Relationships

a.s. and p convergence:
(a) X_n →a.s. X ⇒ X_n →p X (and the converse is not true).
(b) (BUT) X_n →p X ⇒ ∃ a subsequence n_k s.t. X_{n_k} →a.s. X as k → ∞.

a.s. and q.m. convergence:
(c) X_n →a.s. X does not imply X_n →q.m. X, and vice versa.
(d) If X_n →q.m. X and X_n →a.s. Y, then X = Y a.s.

q.m. and r.m. convergence:
(e) If X_n →q.m. X, then X_n →r.m. X ∀ r ≤ q.

q.m. and p convergence:
(f) X_n →q.m. X ⇒ X_n →p X (converse not true except for the following).
(g) X_n →p X and {|X_n|^q} U.I. ⇒ X_n →q.m. X. (Dominated convergence is a special case of (g), with q = 1.)

other convergences and d convergence:
(h) X_n →p X ⇒ X_n →D X; X_n →q.m. X ⇒ X_n →D X; X_n →a.s. X ⇒ X_n →D X. (Converses not true, except for the special case below.)
(i) If X_n →D X and X is a nonrandom constant a.s., then X_n →p X.
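The gap between (a) and (b) can be seen concretely with the classic "typewriter" sequence of dyadic-interval indicators (an illustrative construction assumed here, not taken from the notes): it converges in probability to 0, yet at every fixed ω it equals 1 infinitely often, while the subsequence n_k = 2^k converges pointwise.

```python
import math, random

def X(n, w):
    # "typewriter" sequence: write n = 2**k + j with 0 <= j < 2**k;
    # X_n(w) = 1 on the dyadic interval [j/2**k, (j+1)/2**k), else 0
    k = int(math.log2(n))
    j = n - 2 ** k
    return 1.0 if j / 2 ** k <= w < (j + 1) / 2 ** k else 0.0

random.seed(1)
w = random.random()                 # one fixed outcome omega

# P(X_n != 0) = 2**(-k) -> 0, so X_n ->p 0 ...
probs = [2.0 ** (-int(math.log2(n))) for n in (10, 100, 1000)]

# ... but along the full sequence X_n(w) = 1 once per dyadic block, i.e. i.o.:
ones = sum(X(n, w) for n in range(1, 2 ** 12))

# ... while the subsequence n_k = 2**k converges to 0 at this w:
sub = [X(2 ** k, w) for k in range(4, 12)]
```

With n running over 12 complete dyadic blocks, the path hits 1 exactly 12 times, so the full sequence does not converge a.s.; the subsequence indicators are all 0.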

Page 15: Table of Contents 7 7 - Harvard Universityscholar.harvard.edu › files › charlescywang › files › notes_on...1 | Page Table of Contents Preliminary Definitions: .....7 a. Sup

15 | P a g e

k. (P.1.3.12) Property of the ||·||_q Norm: ||X||_q is nondecreasing in q.

l. (L.1.3.10,L.1.3.11) Borel-Cantelli Lemmas

BCI: Let A_k ∈ F and Σ_{k=1}^∞ P(A_k) < ∞. Then P(A_k i.o.) ≡ P(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = 0.

BCII (converse with independence): If the A_k are independent and Σ_{k=1}^∞ P(A_k) = ∞, then P(A_k i.o.) ≡ P(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = 1.

Note on Intuition: BCI states that a.s., A_k occurs for only finitely many values of k (i.e. A_k does not occur for infinitely many k) if the sequence P(A_k) converges to 0 fast enough (so that the sum is < ∞). BCII is the converse of BCI when we know that the A_k's are mutually independent; the converse fails when independence does not hold. Note on Usefulness: The BC lemmas allow us to prove many results ABOUT ALMOST SURE CONVERGENCE. For example, j(b) above can be proven using this. See the following for a typical application of BCI.

m. (P.1.3.22) Typical Application of BCI: E(X_n²) ≤ 1 ∀n ⇒ (1/n)X_n(ω) → 0 a.s.

Pf: WTS: P(lim_n (1/n)X_n = 0) = 1, or equivalently, ∀δ > 0, P(ω ∈ Ω : limsup_n |(1/n)X_n(ω)| ≥ δ) = 0.
Fix δ > 0. Let A_k = {ω : |(1/k)X_k(ω)| ≥ δ}. Then we want to show that the probability that A_k occurs for infinitely many k is 0.
By Chebyshev, P(A_k) = P(ω : |(1/k)X_k(ω)| ≥ δ) ≤ E[X_k²] / (k²δ²) ≤ 1/(k²δ²).
Then Σ_{k=1}^∞ P(A_k) ≤ (1/δ²) Σ_{k=1}^∞ 1/k² < ∞.
Using BCI, P(A_k i.o.) = 0. Thus P(ω ∈ Ω : limsup_n |(1/n)X_n(ω)| ≥ δ) = 0 for every δ > 0, so (1/n)X_n → 0 a.s.
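A quick simulation of this proposition, taking the X_n iid N(0,1) so that E(X_n²) = 1 (an illustrative choice satisfying the hypothesis): along a sample path, the tail suprema sup_{n ≥ N} |X_n/n| shrink as N grows, consistent with a.s. convergence to 0.

```python
import random

random.seed(2)
# X_n iid N(0,1), so E(X_n^2) = 1 and the proposition gives X_n / n -> 0 a.s.
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
ratios = [abs(x) / n for n, x in enumerate(xs, start=1)]

# tail suprema sup_{n >= N} |X_n / n| along this one sample path
tail_sup = {N: max(ratios[N - 1:]) for N in (10, 100, 1000, 10000)}
```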

n. (T.1.4.26,T.1.4.27,C.1.4.28,T.1.4.29) (Dominated/Bounded/Monotone) Convergence Theorems to Show lim_n E(X_n) = E(lim_n X_n)

(0) Fatou's Lemma: If X_n ≥ X for some X ∈ L¹ and all n, then liminf_n E(X_n) ≥ E(liminf_n X_n), where liminf_n X_n denotes lim_{n→∞} inf_{k≥n} X_k.

(1) Dominated Convergence: Suppose ∃ a RV Y s.t. E(Y) < ∞ and |X_n(ω)| ≤ Y(ω) ∀n. If X_n →p X, then E|X_n − X| → 0 [i.e. lim_n E(X_n) = E(lim_n X_n) = E(X)].

(2) Bounded Convergence: Suppose |X_n| ≤ C ∀n, for some finite constant C. If X_n →p X, then E|X_n − X| → 0 [i.e. lim_n E(X_n) = E(X)].

(3) Monotone Convergence: Suppose X_n ≥ 0 and X_n(ω) ↑ X(ω) for a.e. ω (almost sure convergence); then E(X_n) ↑ E(X). Here X_n(ω) ↑ X(ω) means X_{n+1}(ω) ≥ X_n(ω) ∀ω ∈ A s.t. P(A) = 1, and X_n(ω) → X(ω) as n → ∞.

Note on Fatou's Lemma: Fatou's is useful because it tells us something about the expectations even if we don't know that X_n →p X, or even if we don't know that the limit of E(X_n) exists. Since E(X_n) is just a sequence of reals, its liminf/limsup always exist by the completeness of R. If a limit of E(X_n) exists, then liminf E(X_n) and limsup E(X_n) must coincide. Note also that Fatou's does HALF THE WORK: if we can show the reverse inequality, then the limits are equal.
Note on Dominated vs. Bounded: Bounded convergence is just the special case of dominated convergence where the RV Y is a constant.
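The role of the dominating variable can be illustrated with the standard counterexample X_n = n·1{U < 1/n} (an assumed illustration, with U uniform on [0,1]): X_n → 0 a.s., yet E(X_n) = 1 for all n because no integrable Y dominates the sequence; the bounded variant Z_n = 1{U < 1/n} obeys the theorem.

```python
import random

random.seed(3)
N = 100_000
us = [random.random() for _ in range(N)]   # one draw of U ~ Uniform[0,1] per run

def mean_Xn(n):
    # X_n = n * 1{U < 1/n}: X_n -> 0 a.s., yet E(X_n) = 1 for all n (no dominating Y)
    return sum(n * (u < 1 / n) for u in us) / N

def mean_Zn(n):
    # Z_n = 1{U < 1/n}: dominated by the constant 1, so E(Z_n) -> 0 = E(lim Z_n)
    return sum(1.0 * (u < 1 / n) for u in us) / N

ex = [mean_Xn(n) for n in (10, 100, 1000)]   # stays near 1: limit and E do not commute
ez = [mean_Zn(n) for n in (10, 100, 1000)]   # shrinks to 0, as bounded convergence says
```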

o. (T.1.4.22) Uniform Integrability + Convergence in Prob ⇒ q.m. Convergence

If X_n →p X and {|X_n|^q} are U.I., then X_n → X in L^q (for q = 2, q.m. convergence), and in particular E(|X_n|^q) → E(|X|^q) as n → ∞.

Note on Dominated Convergence as a Special Case: Dominated convergence is the special case of [X_n →p X, {|X_n|^q} U.I. ⇒ X_n →L^q X] with q = 1. It is a special case because if ∃ a RV Y s.t. E(Y) < ∞ and |X_n| ≤ Y ∀n, then {X_n} is U.I. So if we have the domination condition, we have U.I.; together with convergence in probability this gives L¹ convergence by the theorem, which is exactly the conclusion of dominated convergence.
Note on Versions of this Theorem: Some versions of this use X_n − X in place of X_n. It does not matter, because in this case they're equivalent, by the Skorokhod representation theorem; for the theorem to hold we only need it to hold for the distribution of |X_n|^q.

p. (P.1.4.3) Change of Variables (we can measure a RV either in the input space or in the output space!)

Let X : Ω → R be a RV on (Ω, F, P), and let g be a Borel function on R. Suppose either g is nonnegative or E|g(X)| < ∞. Then,
E(g(X)) = ∫_Ω g(X(ω)) dP(ω) = ∫_R g(x) dP_X(x).

Note on Borel-Measurability: g : (R, B) → (R, B) Borel measurable ⇒ ∀α ∈ R, {x : g(x) ≤ α} ∈ B. The RV X induces a measure P_X on (R, B), so we can use that measure on R, OR we can map it back to the original probability space and measure via the original P. (All functions we will care about are going to be Borel measurable.)
Note on Usefulness: This gives us a consistent framework for taking expectations either by using the measure on the output space (i.e. cdf/pdf) or by using the measure on the input space (i.e. P).

Example: g(x) = x, the identity function. Then E(X) = ∫_R x dP_X(x) = ∫_Ω X(ω) dP(ω).

q. (P.1.4.5) The Distribution Function F_X Uniquely Determines the Law P_X of X (just as the characteristic function does)

r. (P.1.4.3) X =_D Y ⇔ E[h(X)] = E[h(Y)] for all h bounded and Borel measurable. We can show this using the delta method.


Chapter 2: Conditional Expectation and Hilbert Spaces

I. Definitions
a. Expression for the Conditional Expectation for a Discrete R.V. Y:
E(X | Y)(ω) = [Σ_{ω′ : Y(ω′) = y} X(ω′) P({ω′})] / P({ω′ : Y(ω′) = y}) for y = Y(ω),
i.e. on the event {Y = y}, E(X|Y) is the P-weighted average of X over that event: E(X | Y = y) = E[X·1{Y=y}] / P(Y = y).


II. Theorems/Propositions
a. (P.2.1.2) Projection Theorem:
There exists a unique (a.s.) optimal Z ∈ H_Y = L²(Ω, σ(Y), P) s.t. E[(X − Z)²] = inf_{W ∈ H_Y} E[(X − W)²].
Further, the optimality of Z is equivalent to the orthogonality property E[(X − Z)V] = 0 ∀V ∈ H_Y.
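The projection view can be checked numerically: for a discrete Y, the L² projection of X onto H_Y is the group mean of X on each event {Y = y}, and both optimality and orthogonality hold. A sketch under an assumed toy law (Y uniform on {0,1,2}, X = Y² + noise; both illustrative choices):

```python
import random

random.seed(4)
N = 50_000
# toy joint law (assumed for illustration): Y uniform on {0,1,2}, X = Y^2 + noise
ys = [random.randrange(3) for _ in range(N)]
xs = [y * y + random.gauss(0.0, 1.0) for y in ys]

# Z = E(X|Y): the projection of X onto H_Y, i.e. the group mean on each {Y = y}
group_mean = {y: sum(x for x, g in zip(xs, ys) if g == y) /
                 sum(1 for g in ys if g == y) for y in range(3)}
zs = [group_mean[y] for y in ys]

# orthogonality: E[(X - Z) V] = 0 for any V = h(Y) in H_Y (here h(y) = 3y + 1)
orth = sum((x - z) * (3 * y + 1) for x, z, y in zip(xs, zs, ys)) / N

# optimality: E[(X - Z)^2] beats any other candidate W = h(Y) (here W = Y)
mse_z = sum((x - z) ** 2 for x, z in zip(xs, zs)) / N
mse_w = sum((x - y) ** 2 for x, y in zip(xs, ys)) / N
```

The residual X − Z is exactly orthogonal to every function of Y (the group means zero out within-group residuals), and its mean square equals the noise variance.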

b. (T.2.1.6) Consistency of the L² and General Definitions of E(X|Y):
The C.E. of an integrable R.V. X given any σ-field G exists and is a.s. unique. That is, there exists Z measurable on G satisfying partial averaging, and if Z₁ and Z₂ are both measurable on G satisfying partial averaging, then Z₁ = Z₂ a.s.
Further, if in addition E(X²) < ∞, then such Z also satisfies E[(X − Z)V] = 0 ∀V ∈ L²(Ω, G, P). (Hence for G = F_Y, the R.V. Z coincides with the definition of E(X|Y) for X in L².)

c. (E.2.3.1,E.2.3.3,P.2.3.4,P.2.3.5) Properties of Conditional Expectations
(The following can be easily verified by checking measurability and the partial averaging condition.)
(1) (Independence) If σ(X) is independent of σ(Y), then E(X|Y) = E(X).
(2) (Trivial σ-Field) If F₀ = {Ω, ∅}, then E(X|F₀) = E(X) for any X integrable.
(3) (Linearity) Let X, Y ∈ L¹(Ω, F, P). Then E(αX + βY | G) = αE(X|G) + βE(Y|G) for any α, β ∈ R.
(4) (Tower Property) Suppose H ⊆ G ⊆ F (H contains less information than G) and X ∈ L¹(Ω, F, P). Then E[E(X|G) | H] = E[E(X|H) | G] = E(X|H) (the smaller set wins!).
(5) (Take Out What You Know) Suppose Y is bdd and measurable on G, and X ∈ L¹(Ω, F, P). Then E(XY|G) = Y·E(X|G).
Note on Tower Property: The first equality can be checked via the definitions. The second equality is trivial, since G is richer than H: knowing G gives us the value of anything H-measurable, so E(X|H) is a constant given G and can be pulled out. When we take H to be the trivial σ-field, we get the law of iterated expectations!

d. (P.2.3.10) Jensen's Inequality for Conditional Expectations: Let f : R → R be a convex function. Suppose X ∈ L¹(Ω, F, P) is s.t. E|f(X)| < ∞. Then E(f(X) | G) ≥ f(E(X | G)).

e. (T.2.3.13,T.2.3.14) Monotone and Dominated Convergence for Conditional Expectations (see notes: same applies)

f. (E.2.3.8) Conditional Variance Identity
Let Var(Y|G) = E[(Y − E(Y|G))² | G], where Y ∈ L²(Ω, F, P) and G is a σ-field. Then Var(Y) = E(Var(Y|G)) + Var(E(Y|G)).
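This identity (the law of total variance) holds in exact empirical form: splitting a sample by the value of a discrete G, the within-group and between-group pieces add up to the total variance. A sketch with an assumed toy mixture (the distributions are illustrative):

```python
import random

random.seed(5)
N = 200_000
gs = [random.randrange(2) for _ in range(N)]        # G: a two-valued coarse signal
ys = [random.gauss(2.0 * g, 1.0 + g) for g in gs]   # Y | G=g ~ N(2g, (1+g)^2), toy law

def pvar(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

var_y = pvar(ys)

y0 = [y for y, g in zip(ys, gs) if g == 0]
y1 = [y for y, g in zip(ys, gs) if g == 1]
p1 = len(y1) / N

# E(Var(Y|G)): weighted within-group variances
e_cond_var = (1 - p1) * pvar(y0) + p1 * pvar(y1)
# Var(E(Y|G)): variance of the group means
m0, m1 = sum(y0) / len(y0), sum(y1) / len(y1)
m = (1 - p1) * m0 + p1 * m1
var_cond_mean = (1 - p1) * (m0 - m) ** 2 + p1 * (m1 - m) ** 2

decomp = e_cond_var + var_cond_mean   # ANOVA identity, exact in-sample
```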


Chapter 3: General Theory of Stochastic Processes

I. Definitions:
a. (D.3.1.1) Stochastic Process
Given (Ω, F, P), a stochastic process is a collection {X_t : t ∈ I} of RVs, where the index t belongs to the index set I. If I is an interval in R₊, we say it is continuous time; if I is a subset of Z₊, we say it is discrete time. Fixing ω, we call t ↦ X_t(ω) the sample function or sample path of the S.P.
Note on the Sigma Field Generated by a S.P.: Clearly, all RVs in this collection are defined on a common probability space.

b. (D.3.1.2) Random Walk
A random walk is the sequence S_n = Σ_{i=1}^n ξ_i, where the ξ_i are iid RVs defined on the same probability space (Ω, F, P).
Special Case: If ξ_i ∈ {−1, 1}, then S_n is a simple random walk. If ξ_i ∈ Z, then S_n is a random walk on the integers.
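A minimal simulation of a simple random walk (the horizon and number of replications are arbitrary illustrative choices): increments are ±1 and the mean position stays at 0.

```python
import random

random.seed(6)

def simple_random_walk(n):
    # S_0 = 0, S_k = S_{k-1} + xi_k with iid steps xi in {-1, +1}
    s, path = 0, [0]
    for _ in range(n):
        s += random.choice((-1, 1))
        path.append(s)
    return path

path = simple_random_walk(1000)
steps_ok = all(abs(path[k + 1] - path[k]) == 1 for k in range(1000))
# E(S_n) = 0: average endpoint over many independent runs
avg_end = sum(simple_random_walk(100)[-1] for _ in range(2000)) / 2000
```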

c. (E.3.1.4) S.P. with Independent Increments: A S.P. {X_t} has independent increments if X_{t+h} − X_t is independent of σ(X_s, 0 ≤ s ≤ t).

d. (D.3.1.5) Finite Dimensional Distribution Function:
Given N < ∞ and a collection t₁, t₂, ..., t_N in I, we denote the joint distribution of X_{t₁}, ..., X_{t_N} by F_{t₁,...,t_N}(·), so
F_{t₁,...,t_N}(α₁, ..., α_N) = P(X_{t₁} ≤ α₁, ..., X_{t_N} ≤ α_N) ∀α ∈ R^N.
We call the collection of functions F_{t₁,...,t_N}(·) the finite dimensional distributions (FDD) of the S.P.

Note on the limitations of FDD: This is the S.P. analog of the distribution function, which always exist. However, FDD does not tell us about some important properties of the S.P. such as the continuity of its sample path. We can have S.P. with same FDD but one is continuous, the other is not. EX: (See notes)

e. (D.1.3.12) Consistency of FDD
We say that a collection of finite dimensional distributions is consistent if
lim_{α_k ↑ ∞} F_{t₁,...,t_N}(α₁, ..., α_N) = F_{t₁,...,t_{k−1},t_{k+1},...,t_N}(α₁, ..., α_{k−1}, α_{k+1}, ..., α_N)
for any 1 ≤ k ≤ N, t₁ < ... < t_N with t_i ∈ I, and α ∈ R^N.

Note on S.P.'s and Consistency of FDD: The FDD's of any SP must be consistent (and the converse is also true). This is intuitive because an FDD is a joint distribution function of a subset of the random variables, and thus if we integrate one variable out, it gives us the joint distribution function of the rest.

f. (D. 3.1.14) Sigma-Field of a S.P. (See notes for construction)

g. (D.1.3.7) Versions: S.P.'s {X_t}, {Y_t} are versions of one another if they have the same f.d.d.

h. (D.1.3.8) Modifications:
A S.P. {Y_t} is a modification of another S.P. {X_t} if P(ω : X_t(ω) = Y_t(ω)) = 1 ∀t ∈ I.

Note on Version vs. Modification: Modification is the stronger notion; it implies being versions. If two S.P.'s are modifications of one another, they also live on the same probability space; but two stochastic processes can be versions of each other without being defined on the same probability space.

i. (E.3.3.7) Indistinguishable Sample Paths (a.s.): {X_t}, {Y_t} have indistinguishable sample paths (a.s.) if P(ω : X_t(ω) = Y_t(ω) ∀t ∈ I) = 1.


j. (D.3.2.1) Characteristic Function of a Random Vector
A random vector X = (X₁, X₂, ..., X_n) with values in R^n has the characteristic function
Φ_X(θ) = E[e^{iθ′X}] = E[e^{i Σ_j θ_j X_j}], where θ = (θ₁, ..., θ_n) ∈ R^n and i = √(−1).
Equivalently, Φ_X(θ) = E[cos(Σ_j θ_j X_j)] + i·E[sin(Σ_j θ_j X_j)].
Note on Existence: Unlike moment generating functions, characteristic functions always exist. Φ_X : R^n → C exists for any X because cos(Σ_j θ_j X_j) and sin(Σ_j θ_j X_j) are bounded, and hence integrable, R.V.'s.
Note on Properties: Φ_X(0) = 1 and |Φ_X(θ)| ≤ 1 ∀θ ∈ R^n (by properties of sin and cos).
Note on Usefulness: The characteristic function uniquely determines the law of a random variable (IFF!).
Note on Fourier Transforms: When the R.V. X has a PDF f_X, the characteristic function is Φ_X(θ) = E[e^{iθX}] = ∫_{−∞}^∞ e^{iθx} f_X(x) dx, which is just the Fourier transform of the density function f_X.
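The definition can be checked empirically: the sample average of e^{iθX_j} approximates Φ_X(θ), and for N(0,1) the exact CF is e^{−θ²/2}. A sketch assuming standard normal draws (an illustrative choice):

```python
import cmath, math, random

random.seed(7)
N = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]   # assumed N(0,1) sample

def ecf(theta):
    # empirical characteristic function: (1/N) * sum_j e^{i * theta * x_j}
    return sum(cmath.exp(1j * theta * x) for x in xs) / N

# exact CF of N(0,1) is e^{-theta^2 / 2}
errs = [abs(ecf(t) - math.exp(-t * t / 2)) for t in (0.0, 0.5, 1.0, 2.0)]
```

At θ = 0 the empirical CF is exactly 1, and elsewhere the error is of Monte Carlo size ~1/√N.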

k. (D.3.2.4) PDF of a Random Vector
A random vector X = (X₁, X₂, ..., X_n) has a PDF f_X if for every a_i < b_i, i = 1, 2, ..., n,
P(ω : a_i ≤ X_i(ω) ≤ b_i, i = 1, 2, ..., n) = ∫_{a_n}^{b_n} ··· ∫_{a₁}^{b₁} f_X(x₁, ..., x_n) dx₁ ··· dx_n.
Note: Such a function (also called the joint density of X₁, ..., X_n) must be a non-negative Borel measurable function that integrates to 1.

l. (D.3.2.6) Positive Semi-definite Matrix
An n×n matrix A with entries A_{jk} is called p.s.d. if for every nonzero vector θ ∈ R^n, θ′Aθ ≥ 0.

m. (D.3.2.7) Gaussian Distribution
We say that a random vector X = (X₁, X₂, ..., X_n) has a Gaussian distribution if
Φ_X(θ) = E[e^{iθ′X}] = e^{iθ′μ − θ′Σθ/2}
for some p.s.d. symmetric n×n matrix Σ (the Var-Cov matrix), some μ = (μ₁, ..., μ_n) (the vector of means), and all θ ∈ R^n.
Note on R: A RV X is Gaussian if for some μ ∈ R, σ ≥ 0, and all θ ∈ R, E[e^{iθX}] = e^{iθμ − σ²θ²/2}.

Note on Non-Degeneracy (D.3.2.8): We say that X has a nondegenerate Gaussian distribution if the matrix Σ is invertible, i.e. when Σ is strictly positive definite. What's the big fuss? The definition allows for a non-invertible Var-Cov matrix so as to include non-random constants as (technically) Gaussian, even though such an X does not have a density. This is done for technical reasons, so that the set of Gaussian distributions is closed in L², i.e. contains all its limit points. (For example, a sequence of normal RVs X_n ~ N(0, 1/n) converges to the limit 0, a non-random constant. With this definition, 0 is Gaussian, and the limit point is contained in the space.)


n. (P.3.2.5) Gaussian Density
A random vector X with a nondegenerate Gaussian distribution has the density
f_X(x) = (2π)^{−n/2} (det Σ)^{−1/2} exp{ −(1/2)(x − μ)′ Σ⁻¹ (x − μ) }.
In particular, if σ > 0, a Gaussian RV has density
f_X(x) = (2πσ²)^{−1/2} exp{ −(x − μ)² / (2σ²) }.

o. (D.3.2.14) Gaussian Stochastic Process
A stochastic process {X_t, t ∈ I} is Gaussian if for all n < ∞ and all t₁, ..., t_n ∈ I, the random vector (X_{t₁}, ..., X_{t_n}) has a Gaussian distribution, i.e. all FDD's of the process are Gaussian.
Note on Distributional Properties (C.3.2.15): All distributional properties of a Gaussian process are determined by the mean μ(t) = E(X_t) of the process and its autocovariance function ρ(t,s) = E[(X_t − μ(t))(X_s − μ(s))]. This is not surprising, because the distributional properties are characterized uniquely by the characteristic function, which in this case takes the form Φ_X(θ) = E[e^{iθ′X}] = e^{iθ′μ − θ′Σθ/2}, parameterized by the mean vector and the Var-Cov matrix; the elements of the mean vector and Var-Cov matrix are exactly what μ(t) and ρ(t,s) are.

p. (D.3.2.19) Strong and Covariance/Weak Stationary Processes

Strong: A stochastic process {X_t, t ∈ R} is called (strong) stationary if its FDD's satisfy, ∀τ ∈ R and all t₁ < ... < t_N with t_i ∈ R:
F_{t₁+τ,...,t_N+τ}(α₁, ..., α_N) = P(X_{t₁+τ} ≤ α₁, ..., X_{t_N+τ} ≤ α_N) = P(X_{t₁} ≤ α₁, ..., X_{t_N} ≤ α_N) = F_{t₁,...,t_N}(α₁, ..., α_N).

Weak: A stochastic process {X_t, t ∈ R} is called (weak/covariance) stationary if μ(t) = μ (a non-random constant) and ρ(t,s) = r(t − s) (i.e. a function of the time difference only).

Note on Discrete Time Definitions:
Strictly Stationary Processes: A stochastic process/sequence of RVs (e.g. a time series) {z_i} (i = 1, 2, ...) is (strictly) stationary if F(z_t, ..., z_{t+K}) does not depend on t for all K = 0, 1, 2, ..., ⇔ F(z_t, ..., z_{t+K}) = F(z₁, ..., z_{1+K}) for all t and K. In particular, F(z_t) does not depend on t: all observations come from the same distribution (identical, though not necessarily independent).
Prop: If {z_i} (i = 1, 2, ...) is (strictly) stationary, then {g(z_i)} (i = 1, 2, ...) is (strictly) stationary for any continuous function g.
Weakly Stationary Process: A stochastic process/sequence of RVs {z_i} (i = 1, 2, ...) is weakly (or covariance) stationary if E(z_i) does not depend on i, and Cov(z_i, z_{i−j}) exists, is finite, and depends only on j but not on i (e.g. Cov(z₁, z₅) = Cov(z₁₂, z₁₆)).


II. Theorems/Propositions
a. Modification ⇒ Version.
b. X_t, Y_t being modifications of each other does not imply that they are indistinguishable.
c. If X_t, Y_t are modifications of each other and are either a) discrete time, or b) continuous time with right-continuous sample paths, then they are also indistinguishable.

d. (P.3.2.2) The Characteristic Function Uniquely Determines the Law P_X of a Random Vector
The characteristic function determines the law of a random vector. That is, if Φ_X(θ) = Φ_Y(θ) ∀θ ∈ R^n, then X has the same law as Y (i.e. the same probability measure on R^n, and thus the same distribution).
Note on MGF's: The law of a non-negative RV is uniquely determined by its MGF. However, the MGF is not defined for many RVs, and is thus not very useful.

e. (P.3.2.5) Properties of Characteristic Functions
(1) (Independence) X₁, X₂, ..., X_n are mutually independent RVs iff Φ_X(θ) = E[e^{i Σ_j θ_j X_j}] = Π_j E[e^{iθ_j X_j}] = Π_j Φ_{X_j}(θ_j) ∀θ ∈ R^n.
(2) (Linearity) For a, b ∈ R, Φ_{aX+b}(θ) = e^{iθb} Φ_X(aθ).
(3) (Gaussian) Φ_X(θ) = E[e^{iθ′X}] = e^{iθ′μ − θ′Σθ/2}.

f. (P.3.2.9,P.3.2.10,E.3.2.11,P.3.2.12,P.3.2.13) Properties of Gaussian Random Vectors
(1) Uncorrelated ⇒ Independent: If X, Y are jointly Gaussian, then Cov(X,Y) = 0 ⇒ X ⊥ Y. (The converse is not true; see 3.2.11. i.e. you can have Gaussian RVs that are not jointly Gaussian, have cov 0, and are not independent.)
(2) X is Gaussian ⇔ (Σ_{i=1}^n α_{ji} X_i), j = 1, ..., m, is an m-dimensional Gaussian random vector for all choices of the linear combinations α_{ji}. (i.e. any sub-vector composed of linear combinations of the elements of a Gaussian vector is also Gaussian.)
(3) A nondegenerate Gaussian X has the PDF f_X(x) = (2π)^{−n/2}(det Σ)^{−1/2} exp{−(1/2)(x − μ)′Σ⁻¹(x − μ)}.
(4) The distribution of a Gaussian X has parameters Σ and μ, the variance-covariance matrix and the mean vector, respectively.
(5) Closure: If a sequence of n-dimensional Gaussian random vectors X^(k), k ≥ 1, converges in L² to an n-dimensional vector X, then X is a Gaussian random vector whose parameters μ and Σ are the limits of the corresponding parameters μ^(k) and Σ^(k) of the sequence of random vectors X^(k).

g. (C.3.2.15,P.3.2.16) Properties of Gaussian Processes
(1) μ(t) and ρ(t,s) of a Gaussian SP: All distributional properties of a Gaussian process are determined by the mean μ(t) = E(X_t) of the process and its autocovariance function ρ(t,s) = E[(X_t − μ(t))(X_s − μ(s))].
(2) If the SP {X_t, t ∈ I} and the Gaussian S.P.'s {X_t^(k), t ∈ I} (a sequence of collections of RVs, i.e. a sequence of functions from I×Ω to R) are such that for each fixed t ∈ I, E[(X_t^(k) − X_t)²] → 0 as k → ∞ (each function converges pointwise in t, in the ||·||₂ metric), then {X_t} is a Gaussian S.P. whose mean and auto-covariance functions are the pointwise limits of those of {X_t^(k), t ∈ I}.
(3) Stationarity: A Gaussian SP is stationary iff it is covariance/weak stationary, i.e. μ(t) = μ (a non-random constant) and ρ(t,s) = r(t − s) (a function of the time difference only).

Note on (2): The proof here is not difficult to see, given what we know about the closure of a Gaussian random vector, and given that we know that a SP is Gaussian iff all FDD’s are Gaussian. That is, take any arbitrary index set, and look at the sequence of functions over the sub-index


(forming a vector, since the sub-index is now countable). We know that the vector is Gaussian and that it converges to the corresponding subset of the SP X, which must then be Gaussian by the closure of Gaussian random vectors in L². Since we picked an arbitrary index set, it must be true that all FDD's are Gaussian. Note on Random "Process" vs. Random Vector: When we say "process", t in I may not be finite, or even countable; when we say "vector" we are assuming countability (and often finiteness).

h. (T.3.3.16) Fubini's Theorem
If {X_t} is a measurable SP, then for a.e. ω, the function t ↦ X_t(ω) is measurable from R to R. Moreover, for any interval I s.t. ∫_I E|X_t| dt < ∞, we have ∫_I X_t dt < ∞ a.s. and E[∫_I X_t dt] = ∫_I E[X_t] dt (we can exchange the order of integration).
Further, if E|X_t| < ∞ for all t ∈ I, then the function t ↦ E[X_t] is Borel measurable (in I).


Chapter 4: Martingales and Stopping Times (martingales are a class of S.P.; the random walk and Brownian motion are examples)

Definitions

a. (D.4.1.1) Filtration: A (discrete time) filtration is a non-decreasing family of sub-σ-fields F_n of our measurable space (Ω, F). That is, F₀ ⊆ F₁ ⊆ ... ⊆ F_n ⊆ ..., where F_n is a σ-field for each n.

Note on Usefulness: A filtration represents any procedure of collecting more and more information as time goes on. Given a filtration, we’re interested in S.P. s.t. for each n the information gathered by that time suffices for evaluating the value of the n-th element of the process.

b. (D.4.1.2) S.P. Adapted to a Filtration: A (discrete time) S.P. {X_n, n = 0, 1, ...} is adapted to a filtration {F_n} if ω ↦ X_n(ω) is a R.V. on (Ω, F_n) for each n, i.e. σ(X_n) ⊆ F_n for each n.
Note: It can be shown that {X_n, n = 0, 1, ...} is adapted to a filtration {F_n} iff σ(X₀, ..., X_n) ⊆ F_n ∀n.

c. (D.4.1.3) Minimal Filtration / Canonical Filtration
The filtration G_n = σ(X₀, ..., X_n) is the minimal filtration with respect to which {X_n} is adapted. We call it the canonical filtration for the S.P. {X_n}, i.e. ∀{A_n} s.t. the S.P. {X_n} is adapted to {A_n}, G_n ⊆ A_n.

d. (D.4.1.4,D.4.2.1) Martingale

(D.4.1.4) Discrete Time
A martingale (denoted MG) is a pair (X_n, F_n), where {F_n} is a filtration and {X_n} an integrable (i.e. E|X_n| < ∞) S.P. adapted to this filtration s.t. E[X_{n+1} | F_n] = X_n ∀n, a.s.
Note: If you're a MG wrt a filtration, then you're a MG wrt any "slower" filtration. The slower a filtration grows, the easier it is for an adapted S.P. to be a martingale. That is, if H_n ⊆ F_n ∀n, the S.P. {X_n} is adapted to {H_n}, and (X_n, F_n) is a MG, then (X_n, H_n) is also a MG.
Pf: Use the Tower Property: E[X_{n+1} | H_n] = E[E(X_{n+1} | F_n) | H_n] (by Tower) = E[X_n | H_n] (since E(X_{n+1} | F_n) = X_n) = X_n.
Note on the Meaning of the Filtration: for the canonical filtration, E[X_{n+1} | F_n] = E[X_{n+1} | X_n, X_{n−1}, ..., X₀].

(D.4.2.1) Continuous Time
The pair (X_t, F_t), t ≥ 0, real-valued, is called a continuous time martingale (in short, MG) if:
(a) The σ-fields F_t, t ≥ 0, form a continuous time filtration. That is, F_t ⊆ F_{t+h} ∀t ≥ 0 and h > 0.
(b) The continuous time S.P. X_t is integrable and adapted to this filtration. That is, E|X_t| < ∞ and σ(X_t) ⊆ F_t ∀t ≥ 0.
(c) For any fixed t ≥ 0 and h > 0, the identity E(X_{t+h} | F_t) = X_t holds.

e. (D.3.1.14) L² Martingale
An L²-MG (or square-integrable MG) is a MG (X_n, F_n) s.t. E(X_n²) < ∞ ∀n.

f. Martingale Differences
D_n = X_n − X_{n−1} for n ≥ 1 and D₀ = X₀ are called the martingale differences associated with a martingale {X_n}.

g. (D.4.1.11) Previsible / Predictable Process
We call a sequence {V_n} previsible (or predictable) for the filtration {F_n} if V_n is measurable on F_{n−1} for all n ≥ 1.


h. (D.4.1.16) Orthogonal Sequence of R.V.'s
We say that D_n ∈ L²(Ω, F, P) is an orthogonal sequence of RVs if
E[D_n h(D₀, D₁, ..., D_{n−1})] = E[D_n] · E[h(D₀, D₁, ..., D_{n−1})]
for any n ≥ 1 and every Borel function h : R^n → R s.t. E[h(D₀, D₁, ..., D_{n−1})²] < ∞.
Equivalently, E[D_n | D₀, ..., D_{n−1}] = E[D_n].

i. (D.4.1.18) Super-martingales and Sub-martingales
A submartingale (denoted subMG) is an integrable S.P. {X_n}, adapted to the filtration {F_n}, s.t. E[X_{n+1} | F_n] ≥ X_n ∀n a.s., i.e. P(ω : E[X_{n+1} | F_n](ω) − X_n(ω) < 0) = 0.
A super-martingale (denoted supMG) is an integrable {X_n}, adapted to the filtration {F_n}, s.t. E[X_{n+1} | F_n] ≤ X_n ∀n a.s., i.e. P(ω : E[X_{n+1} | F_n](ω) − X_n(ω) > 0) = 0.
Alternatively: X_n is a subMG iff −X_n is a supMG.
Note on SubMG, SupMG, and MG: All the results about subMGs are dual statements for supMGs and vice versa. Further, every MG is both a subMG and a supMG by definition; therefore all the results holding for either subMGs or supMGs hold for MGs as well.

j. (D.4.2.8) Last Element of a subMG or supMG
We say that a subMG (X_t, F_t) has a last element (X_∞, F_∞) if F_t ⊆ F_∞ ∀t, the integrable RV X_∞ is measurable on F_∞, and for each t ≥ 0, a.s. X_t ≤ E(X_∞ | F_t). (Similarly for a supMG, but with X_t ≥ E(X_∞ | F_t).)
Note: Appending a last element preserves the sub/supMG property.

k. (D.4.2.9) Right-Continuous Filtration
A filtration is called right-continuous if for any t ≥ 0, F_t = ∩_{h>0} F_{t+h}.
Note1: We usually assume without proof that continuous time filtrations are right-continuous.
Note2: Note that not all filtrations are right-continuous. Consider F₀ = {∅, Ω} while F_h = F for h > 0 (where F ≠ F₀); this filtration is not right-continuous at t = 0, because ∩_{h>0} F_h = F ≠ F₀.

l. (D.4.3.1, D.4.3.11) Stopping Time τ for a Filtration F_t

Discrete Time: A RV τ taking values in {0, 1, ..., ∞} is a stopping time for the filtration {F_n} if the event {ω : τ(ω) ≤ n} ∈ F_n for each finite n ≥ 0 (the set of ω's s.t. the process stops by time n).
Note: This means that the filtration contains the information about whether or not the process has stopped.

Continuous Time: A non-negative RV τ(ω) is called a stopping time wrt the continuous filtration {F_t} if {ω : τ(ω) ≤ t} ∈ F_t for all t ≥ 0.

Note on min(M,N), max(M,N) and M+N as Stopping Times: It can be shown (E.4.3.3) that if M, N are stopping times, then M+N, min(M,N) and max(M,N) are as well.
Note on expressing {ω : τ(ω) ≤ n} to check a stopping time (DISCRETE TIME): Let τ(ω) = min{k ≥ 0 : X_k(ω) ∈ B} for some Borel set B. Then {ω : τ(ω) ≤ n} = ∪_{k=0}^n {ω : X_k(ω) ∈ B}.
Idea: The LHS is the set of outcomes such that the process is stopped by time n; the RHS is the union of the sets of all such outcomes.
Note on Showing a Stopping Time in Continuous Time: Use Prop 4.3.13 – if t ↦ X(t) is continuous a.s. and B is a closed set, then τ(ω) = inf{t > 0 : X(t) ∈ B} is a stopping time!


Note on the Difference Between Discrete & Continuous Time First Hitting Times: For discrete time, the FHT is a stopping time. For continuous time, not necessarily, especially if B is not closed or the sample path is not continuous (see bottom of pg. 83).

m. (D.4.3.5) Stopped Process
Using the notation τ∧n = min(τ(ω), n), the stochastic process stopped at τ is given by
X_{τ∧n}(ω) = X_n(ω) if n ≤ τ(ω), and X_{τ∧n}(ω) = X_{τ(ω)}(ω) if n > τ(ω).
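Stopping and the stopped process can be illustrated with a simple random walk stopped on first hitting {−5, 5} (the levels and horizon are illustrative assumptions): the stopped process stays bounded and, being a MG itself, keeps mean 0.

```python
import random

random.seed(8)

def stopped_endpoint(n, a=5, b=5):
    # tau = first hitting time of {-a, b}; returns S_{tau ^ n} for the walk S
    s = 0
    for _ in range(n):
        if s in (-a, b):          # already stopped: the path is frozen at S_tau
            return s
        s += random.choice((-1, 1))
    return s

# S_{tau ^ n} is itself a MG, so E(S_{tau ^ n}) = 0; it is also bounded by 5
draws = [stopped_endpoint(200) for _ in range(20000)]
avg = sum(draws) / len(draws)
bounded = all(abs(d) <= 5 for d in draws)
```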

n. (D.4.4.2) Innovation Process
When using Doob's decomposition X_n = Y_n + A_n for the canonical filtration σ(X_k, k ≤ n), the MG {Y_n} is called the innovation process associated with {X_n}.
Note: The reason for this name is that X_{n+1} = Y_{n+1} + A_{n+1} = (A_{n+1} + Y_n) + (Y_{n+1} − Y_n), where A_{n+1} + Y_n is measurable on σ(X_k, k ≤ n), while Y_{n+1} − Y_n describes the "NEW" part of X_{n+1}.

o. (D.4.4.8) Increasing Part of a M.G. M_t
The S.P. A_t in the Doob-Meyer decomposition of M_t² is called the increasing part, or the increasing process, associated with the MG M_t.

p. (D.4.6.1) Branching Process (we use MGs to study the extinction probabilities of branching processes)
The branching process is a discrete time S.P. {Z_n} taking non-negative integer values, s.t.
Z₀ = 1 and Z_n = Σ_{j=1}^{Z_{n−1}} N_j^{(n)} ∀n ≥ 1,
where the N_j^{(n)}, j = 1, 2, ..., are iid non-negative integer valued R.V.s with finite mean m = E(N) < ∞, and where we use the convention that if Z_{n−1} = 0 then also Z_n = 0.
Note on Interpretation: The S.P. {Z_n} is interpreted as counting the size of an evolving population, so that Z_n is the size of the nth generation. The RV N_j^{(n)} is the number of offspring of the jth individual of generation (n−1). Associated with the branching process is the family tree, with the root denoting the 0-th generation and with N_j^{(n)} edges from the vertex j at distance (n−1) from the root to vertices at distance n from the root. Random trees generated in such a fashion are called Galton-Watson trees.
We shall use throughout the filtration F_n = σ(N_j^{(k)}, k ≤ n, j = 1, 2, ...). Note that in general G_n = σ(Z_k, k ≤ n) ⊂ F_n (a STRICT subset, since we cannot in general recover the number of offspring of each individual knowing only the total population sizes at different generations).

q. Probability of Extinction: p_ex = P(Z_n = 0 for some n ≥ 0).
(Diagram omitted: the root Z₀ = 1 with offspring edges N₁^{(1)}, N₂^{(1)}, ....)
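Extinction probabilities can be estimated by direct simulation and compared with the smallest fixed point of the offspring PGF. A sketch assuming the toy offspring law P(N=0) = 1/4, P(N=1) = 1/4, P(N=2) = 1/2 (so m = 1.25, and the smaller root of s = f(s) = 1/4 + s/4 + s²/2 in [0,1] is s = 1/2); the escape threshold and generation cap are truncation assumptions for the simulation:

```python
import random

random.seed(9)

def offspring():
    # assumed toy law: P(N=0)=1/4, P(N=1)=1/4, P(N=2)=1/2, so m = E(N) = 1.25
    u = random.random()
    return 0 if u < 0.25 else (1 if u < 0.5 else 2)

def goes_extinct(max_gen=60, escape=500):
    z = 1                                        # Z_0 = 1
    for _ in range(max_gen):
        if z == 0:
            return True
        z = sum(offspring() for _ in range(z))   # Z_n = sum_{j <= Z_{n-1}} N_j^{(n)}
        if z > escape:                           # truncation: treat as non-extinct
            return False
    return z == 0

runs = 10_000
p_ex = sum(goes_extinct() for _ in range(runs)) / runs
# the PGF fixed point predicts p_ex = 1/2 for this offspring law
```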


Theorems

a. {X_n, n = 0, 1, ...} is adapted to a filtration {F_n} iff σ(X₀, ..., X_n) ⊆ F_n ∀n.

b. (P.4.1.7):
If X_n = Σ_{i=0}^n D_i, then the canonical filtration for {X_n} is the same as the canonical filtration for {D_n}. Furthermore, (X_n, F_n) is a martingale iff {D_n} is an integrable S.P., adapted to {F_n}, s.t. E[D_{n+1} | F_n] = 0 a.s. ∀n.

c. (T.4.1.12) Martingale Transform
Let $(X_n, F_n)$ be a MG and $V_n$ be a previsible sequence for the same filtration. The sequence of R.V. $Y_n = \sum_{k=1}^n V_k (X_k - X_{k-1})$, called the martingale transform of $V$ w.r.t. $X$, is then a MG with respect to the filtration $F_n$, provided $|V_n| \le C$ for some non-random constant $C < \infty$, or more generally $E|V_n|^q < \infty$ (and $E|X_n|^p < \infty$) for all $n$ and some $1 \le p, q \le \infty$ s.t. $\frac{1}{p} + \frac{1}{q} = 1$.
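A quick Monte Carlo sanity check of the transform: since $Y_n$ is a MG with $Y_0 = 0$, its mean stays 0. The random walk and the previsible sequence below are illustrative choices of mine, not taken from the notes:

```python
import random

def transform_sample_mean(n_steps=50, n_paths=20000, seed=0):
    """Monte Carlo check that the martingale transform has E(Y_n) = 0.

    X_n is a simple +/-1 random walk (a MG) and V_k = sign(X_{k-1}) is a
    bounded previsible sequence: it uses only information up to k-1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, y = 0, 0.0
        for _ in range(n_steps):
            v = 1.0 if x >= 0 else -1.0   # previsible: depends on X_{k-1} only
            d = rng.choice((-1, 1))       # martingale difference X_k - X_{k-1}
            x += d
            y += v * d                    # Y_n = sum_k V_k (X_k - X_{k-1})
        total += y
    return total / n_paths

print(transform_sample_mean())  # sample mean of Y_50; theory gives E(Y_50) = 0
```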

d. (P.4.1.15, P.4.1.17) Alternative and Equivalent Characterizations of M.G. S.P.
(P.4.1.15) A S.P. $X_n \in L^2(\Omega, F, P)$, adapted to the filtration $F_n$, is a MG iff $E[(X_{n+1} - X_n) Z] = 0$ for any $Z \in L^2(\Omega, F_n, P)$.
Note on Intuition: This result follows from the fact that $X_n$ is a MG iff $E[X_{n+1} \mid F_n] = X_n$. The idea is that no random variable living in the space $L^2(\Omega, F_n, P)$ is "correlated" (in the sense of having a non-zero inner product, i.e. not orthogonal in $L^2$) with the difference $X_{n+1} - X_n$; if some such random variable were, then knowing $F_n$ would tell us
something about $X_{n+1}$ beyond $X_n$.

(P.4.1.17) A S.P. $X_n \in L^2(\Omega, F, P)$ is a MG for its canonical filtration iff it has an orthogonal, zero-mean differences sequence $D_n = X_n - X_{n-1}$, $n \ge 1$.
Note on Intuition: This is just a simple reformulation of P.4.1.15 above, using the definition of an orthogonal sequence of R.V.
Note on Application to Gaussians:

From the above, a necessary condition for the MG property is to have $E(D_n) = 0$ and $E(D_i D_n) = 0$ for all $0 \le i < n$. With the Gaussian vector $(D_0, \ldots, D_n)$ having uncorrelated elements, we know that the var-cov matrix $\Sigma$ is diagonal and the characteristic function factors, $\Phi_{D}(\theta) = \prod_k \Phi_{D_k}(\theta_k)$. This is true iff the $D_k$'s are independent (by P.3.2.5).
Thus, for a Gaussian S.P., having independent, orthogonal or uncorrelated differences are equivalent properties, which together with each of these differences having a zero mean is also equivalent to the MG condition.

e. (R.4.1.21) SubMGs (SupMGs) have non-decreasing (non-increasing) expectation $E(X_n)$
If $X_n$ is a subMG, then necessarily $n \mapsto E(X_n)$ is non-decreasing (non-increasing for a supMG).
Pf: $E[X_n] = E[E(X_n \mid F_{n-1})] \ge E[X_{n-1}]$, since $E(X_n \mid F_{n-1})(\omega) \ge X_{n-1}(\omega)$ for all $\omega \in A$ s.t. $P(A) = 1$.
So, $\int_A E(X_n \mid F_{n-1})(\omega) \, dP(\omega) \ge \int_A X_{n-1}(\omega) \, dP(\omega)$.

f. (T.4.3.6) Preservation of (sub/sup)MG of Stopped Processes
Discrete Time
If $(X_n, F_n)$ is a subMG (or supMG or a MG), and $\tau$ is a stopping time for $F_n$, then $(X_{\tau \wedge n}, F_n)$ is also a subMG (or supMG or MG).
Pf: $X_{\tau \wedge n}(\omega) = X_{\tau(\omega)}(\omega)$ if $\tau(\omega) \le n$, and $X_n(\omega)$ if $\tau(\omega) > n$ (from the previous). Hence
$E[X_{\tau \wedge (n+1)} \mid F_n](\omega) = X_{\tau(\omega)}(\omega)$ if $\tau(\omega) \le n$, and $E[X_{n+1} \mid F_n](\omega)$ if $\tau(\omega) > n$,
which is $\ge X_{\tau \wedge n}(\omega)$ in both cases, using in the second case the fact that $X$ is a subMG.

(C.4.3.7) If $(X_n, F_n)$ is a subMG and $\tau$ is a stopping time for $F_n$, then $E[X_{\tau \wedge n} \mid F_0] \ge X_0$ for all $n$.
If in addition $(X_n, F_n)$ is a MG, then $E(X_{\tau \wedge n}) = E(X_0)$.


Continuous Time
If $\tau$ is a stopping time for the filtration $F_t$ and the S.P. $X_t$ with right-continuous sample path is a subMG (or supMG or a MG) for $F_t$, then $X_{\tau \wedge t}(\omega) = X_{\tau(\omega) \wedge t}(\omega)$ is also a subMG (or supMG or MG) for this filtration.

g. (P.4.3.13) Checking Stopping Time for a Continuous-Time Process
If the sample path $t \mapsto X_t(\omega)$ is continuous for all $\omega \in \Omega$ and $B$ is a closed set, then $\tau_B(\omega) = \inf\{t \ge 0 : X_t(\omega) \in B\}$ is a stopping time for the canonical filtration $G_t = \sigma(X_s, s \le t)$.
Note on Proof: Details on pg 83.

$\{\omega : \tau_B(\omega) \le t\} = \bigcup_{s \le t} \{\omega : X_s(\omega) \in B\}$. Now, each $\{\omega : X_s(\omega) \in B\} \in G_t$. However, we have an uncountable union, and therefore we don't know if this set is in fact in $G_t$.
It can be shown that $\bigcup_{s \le t} \{\omega : X_s(\omega) \in B\} = \bigcap_k \bigcup_{s \in Q} \{\omega : X_s(\omega) \in B_k\} \in F_t$, since each $\{\omega : X_s(\omega) \in B_k\} \in F_t$ and we have countable unions and intersections,
where $Q$ is the set of all rational numbers in $[0, t)$ together with $t$,
and $B_k = \bigcup_{y \in B} (y - \frac{1}{k}, y + \frac{1}{k})$ is an open set containing $B$ (so that $B_k \to B$ as $k \to \infty$).

h. (T.4.3.8, T.4.3.16) Doob's Optional Stopping Theorem
(T.4.3.8) Discrete Time
If 1) $(X_n, F_n)$ is a subMG, 2) $\tau < \infty$ a.s., 3) $\tau$ is a stopping time for the filtration $F_n$, and 4) the sequence $X_{\tau \wedge n}$ is uniformly integrable,
then $E(X_\tau) \ge E(X_0)$.
If in addition 5) $(X_n, F_n)$ is a MG, then $E(X_\tau) = E(X_0)$.

Pf: Observe that whenever $\tau(\omega) < \infty$, $X_{\tau \wedge n}(\omega) \to X_{\tau(\omega)}(\omega)$ as $n \to \infty$.
Since $\tau < \infty$ a.s. (by assumption), $X_{\tau \wedge n} \xrightarrow{a.s.} X_\tau$, hence $X_{\tau \wedge n} \xrightarrow{P} X_\tau$.
Also, we have $X_{\tau \wedge n}$ U.I.
Thus, $E(X_\tau) = \lim_{n \to \infty} E(X_{\tau \wedge n})$, by the a.s. convergence above and Th.1.4.22 (U.I. plus convergence in probability imply $X_{\tau \wedge n} \xrightarrow{L^1} X_\tau$),
and $E(X_{\tau \wedge n}) \ge E(X_0)$ by Corollary 4.3.7 above, since $X_{\tau \wedge n}$ is a subMG.

(T.4.3.16) Continuous Time
If 1) $(X_t, F_t)$ is a subMG with 2) right-continuous sample path ($t \mapsto X_t(\omega)$ is right continuous for all $\omega \in A$ s.t. $P(A) = 1$), 3) $\tau < \infty$ a.s., 4) $\tau$ is a stopping time for the filtration $F_t$, and 5) the sequence $X_{\tau \wedge t}$ is uniformly integrable,
then $E(X_\tau) \ge E(X_0)$.
If in addition 6) $(X_t, F_t)$ is a MG, then $E(X_\tau) = E(X_0)$.

i. (E.4.3.9) Typical Example of Application of Doob’s Optional Stopping


Consider the simple random walk $S_n = \sum_{k=1}^n \xi_k$ with $\xi_k \in \{-1, 1\}$ iid s.t. $P(\xi_k = 1) = P(\xi_k = -1) = \frac{1}{2}$, so that $E(\xi_k) = 0$.
Let $\tau_{a,b} = \inf\{n \ge 0 : S_n \notin (-a, b)\}$ (the first time that $S$ exits the interval).
We can show that...
1) $(S_n, F_n)$ is a MG
2) $S_n$ has right continuous sample paths
3) $\tau_{a,b} < \infty$ a.s.
4) $\tau_{a,b}$ is a Stopping Time
5) $|S_{\tau \wedge n}| \le \max(a, b)$, so that $S_{\tau \wedge n}$ is U.I.
Thus, by Doob's optional stopping, we know that $0 = E(S_0) = E(S_\tau)$, where $S_\tau \in \{-a, b\}$ necessarily by construction.
Thus, $E(S_\tau) = -a P(S_\tau = -a) + b P(S_\tau = b) = -a P_a + b(1 - P_a) = 0$, so
$P_a(\text{hits } -a \text{ before } b) = \frac{b}{a+b}$ and $P_b(\text{hits } b \text{ before } -a) = \frac{a}{a+b}$.
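The exit probabilities derived above can be checked by simulation; a sketch (the function name and parameters are my own):

```python
import random

def exit_probability(a, b, n_paths=20000, seed=0):
    """Estimate P(simple random walk hits -a before b) by Monte Carlo.

    Doob's optional stopping theorem (E.4.3.9) predicts b / (a + b).
    """
    rng = random.Random(seed)
    hits_minus_a = 0
    for _ in range(n_paths):
        s = 0
        while -a < s < b:            # run until the walk exits (-a, b)
            s += rng.choice((-1, 1))
        if s == -a:
            hits_minus_a += 1
    return hits_minus_a / n_paths

est = exit_probability(3, 7)
print(est)  # compare with b / (a + b) = 7 / 10 = 0.7
```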

j. (E.4.3.17) A Proof that $\tau < \infty$
Let $\tau_{a,b} = \inf\{t \ge 0 : W_t \notin (-a, b)\}$, and note $P(\tau_{a,b} \le n) \to P(\tau_{a,b} < \infty)$ as $n \to \infty$.
Now, with $\tau_b = \inf\{t \ge 0 : W_t = b\}$ we have $\tau_{a,b} \le \tau_b$, so it's sufficient to show that $\tau_b < \infty$ a.s.
By the reflection principle, $P(\tau_b \le T) = P(\max_{0 \le s \le T} W_s \ge b) = 2 P(W_T \ge b) = \frac{2}{\sqrt{2\pi T}} \int_b^\infty \exp(-x^2 / (2T)) \, dx \to 1$ as $T \to \infty$.
Thus $\tau_b < \infty$ a.s., and hence $\tau_{a,b} \le \tau_b < \infty$ a.s.

k. (T.4.4.1) Doob's Decomposition
Given an integrable S.P. $X_n$ adapted to a discrete parameter filtration $F_n$, $n \ge 0$, there exists a decomposition $X_n = Y_n + A_n$ s.t. $(Y_n, F_n)$ is a MG and $A_n$ is a previsible S.P. This decomposition is unique up to the value of $Y_0$, a R.V. measurable on $F_0$.


Pf: Let $A_0 = 0$ and, for all $n \ge 1$, let $A_n = A_{n-1} + E[X_n - X_{n-1} \mid F_{n-1}]$.
$A_n - A_{n-1} = E[X_n - X_{n-1} \mid F_{n-1}]$ is $F_{n-1}$ measurable, by def. of C.E. Also, since $F_k \subset F_{n-1}$ for all $k \le n-1$ by def. of filtration, $A_{n-1}$ is also $F_{n-1}$ measurable.
Now, $A_n = A_n \pm A_{n-1} \pm \cdots \pm A_0 = (A_n - A_{n-1}) + (A_{n-1} - A_{n-2}) + \cdots + (A_1 - A_0) + A_0 = A_0 + \sum_{k=1}^n (A_k - A_{k-1})$.
Then $A_n$ is measurable w.r.t. $F_{n-1}$ for all $n \ge 1$, and thus a previsible process by def.
To check that $Y_n = X_n - A_n$ is a martingale, we need to show that $Y_n$ is integrable (i.e. $E|Y_n| < \infty$), an S.P. adapted to the filtration $F_n$, and s.t. $E[Y_n \mid F_{n-1}] = Y_{n-1}$.
• $Y_n$ integrable: We know that $X_n$ is integrable by assumption, thus it suffices to show that $A_n$ is integrable.
$E|A_n - A_{n-1}| = E\left|E[X_n - X_{n-1} \mid F_{n-1}]\right| \le E[E(|X_n - X_{n-1}| \mid F_{n-1})]$ (by Jensen's Inequality) $= E|X_n - X_{n-1}| < \infty$, since $X_n$ is integrable for all $n$, hence so is $X_n - X_{n-1}$.
So $E|A_n| \le E|A_0| + \sum_{k=1}^n E|A_k - A_{k-1}| < \infty$, and $E|Y_n| = E|X_n - A_n| \le E|X_n| + E|A_n| < \infty$ by the triangle inequality of the $L^1$ norm. So $Y_n$ is integrable.
• $Y_n$ adapted to $F_n$, since $X_n$ is adapted and $A_n$ is previsible.
• $E[Y_n - Y_{n-1} \mid F_{n-1}] = E[(X_n - A_n) - (X_{n-1} - A_{n-1}) \mid F_{n-1}] = E[X_n - X_{n-1} \mid F_{n-1}] - (A_n - A_{n-1})$ (by the fact that $A_{n-1}$ is $F_{n-1}$ measurable and $A_n$ is previsible) $= (A_n - A_{n-1}) - (A_n - A_{n-1}) = 0$, since by construction $A_n - A_{n-1} = E[X_n - X_{n-1} \mid F_{n-1}]$.
So $E[Y_n \mid F_{n-1}] = Y_{n-1}$.
• WTS uniqueness up to the value of $Y_0$:
Suppose there exist two such decompositions, $X_n = Y_n + A_n = \tilde{Y}_n + \tilde{A}_n$. Then $Y_n - \tilde{Y}_n = \tilde{A}_n - A_n$.
$\tilde{A}_n - A_n = E[\tilde{A}_n - A_n \mid F_{n-1}]$ (by previsibility) $= E[Y_n - \tilde{Y}_n \mid F_{n-1}] = Y_{n-1} - \tilde{Y}_{n-1}$ (by M.G.) $= \tilde{A}_{n-1} - A_{n-1} = \cdots = \tilde{A}_0 - A_0 = Y_0 - \tilde{Y}_0$.
Thus, if $Y_0 = \tilde{Y}_0$ then $A_n = \tilde{A}_n$, so $Y_n = \tilde{Y}_n$: SAME DECOMPOSITION. That is, the decomposition is unique up to the value of $Y_0$.
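For a concrete instance of the decomposition, take $X_n = S_n^2$ for a simple $\pm 1$ random walk $S_n$; then $E[X_n - X_{n-1} \mid F_{n-1}] = E[2 S_{n-1} \xi_n + \xi_n^2 \mid F_{n-1}] = 1$, so $A_n = n$ (previsible) and $Y_n = S_n^2 - n$ is the MG part. A small Monte Carlo check (names are my own) that the martingale part has constant zero mean:

```python
import random

def check_martingale_part(n_steps=30, n_paths=20000, seed=0):
    """Monte Carlo check of Doob's decomposition for the subMG X_n = S_n^2.

    For a simple +/-1 random walk S_n, E[X_n - X_{n-1} | F_{n-1}] = 1, so
    A_n = n is the previsible part and Y_n = S_n^2 - n the MG part;
    in particular E(Y_n) = E(Y_0) = 0 for every n.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = 0
        for _ in range(n_steps):
            s += rng.choice((-1, 1))
        total += s * s - n_steps        # one sample of Y_n = S_n^2 - n
    return total / n_paths

print(check_martingale_part())  # sample mean of Y_30; theory gives E(Y_30) = 0
```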

l. (E.4.4.3) Doob's Decomposition for SubMG ($A_n$ non-decreasing) and SupMG ($A_n$ non-increasing)
If $(X_n, F_n)$ is a subMG, then $A_n \le A_{n+1}$ for all $n$ in Doob's decomposition.
If $(X_n, F_n)$ is a supMG, then $A_n \ge A_{n+1}$ for all $n$ in Doob's decomposition.
Pf: $(X_n, F_n)$ a subMG $\Rightarrow E[X_{n+1} \mid F_n] \ge X_n \Rightarrow E[Y_{n+1} + A_{n+1} \mid F_n] \ge Y_n + A_n \Rightarrow Y_n + A_{n+1} \ge Y_n + A_n \Rightarrow A_{n+1} \ge A_n$.
$(X_n, F_n)$ a supMG $\Rightarrow E[X_{n+1} \mid F_n] \le X_n \Rightarrow E[Y_{n+1} + A_{n+1} \mid F_n] \le Y_n + A_n \Rightarrow Y_n + A_{n+1} \le Y_n + A_n \Rightarrow A_{n+1} \le A_n$.
Note: For this reason, Doob's decomposition is very attractive and useful for subMGs and supMGs.
Note 2: Doob's decomposition is also particularly useful in connection with square-integrable martingales $X_n$, where one can relate the limit of $X_n$ as $n \to \infty$ with that of the non-decreasing sequence $A_n$ in the decomposition of $X_n^2$.


m. (T.4.4.7) Doob-Meyer Decomposition (continuous time analog of Doob's decomposition and fundamental to stochastic integration)
Suppose $F_t$ is a right-continuous filtration and the MG $(M_t, F_t)$ having continuous sample path is s.t. $E(M_t^2) < \infty$ for all $t \ge 0$. Then there exists a unique S.P. $A_t$ s.t.
(a) $A_0 = 0$
(b) $A_t$ has continuous sample path w.p. 1
(c) $A_t$ is adapted to $F_t$
(d) $t \mapsto A_t$ is non-decreasing w.p. 1
(e) $(M_t^2 - A_t, F_t)$ is a MG
Note: This is just the Doob decomposition of the subMG $X_t = M_t^2$, where (a) resolves the issue of the uniqueness of the R.V. $A_0$ measurable on $F_0$, (b) specifies the smoothness of the sample path of the continuous-time S.P. $A_t$, and (d) is an analog of the monotonicity property we saw above.
Note on "Increasing Part": The S.P. $A_t$ in the Doob-Meyer decomposition of $M_t^2$ is called the increasing part or the increasing process associated with the MG $M_t$. Note that this is a decomposition of $M_t^2$ s.t. $M_t^2 = (M_t^2 - A_t) + A_t$ for some $A_t$ non-decreasing and $M_t^2 - A_t$ a MG.
Note on Quadratic Variation and Increasing Part: Later we will see that the increasing part gives us the quadratic variation (up to $t$) of $M_t$. We know for B-M that $W_t^2 = (W_t^2 - t) + t$ is the Doob-Meyer decomposition, so that $t$ is the increasing part. We also know that the quadratic variation for $W_t$ is $t$!

n. (T.4.4.11) Doob's Inequality
(a) Suppose $X_n$ is a subMG. Then, for all $x > 0$ and $N < \infty$, $P(\{\omega : \max_{0 \le n \le N} X_n(\omega) > x\}) \le \frac{E[(X_N)_+]}{x}$.
(b) Suppose $X_n$ is a subMG with a last element $(X_\infty, F_\infty)$, with $F_t \subset F_\infty$, $X_\infty$ integrable and measurable w.r.t. $F_\infty$, and $X_t \le E[X_\infty \mid F_t]$ for all $t \ge 0$. Then, for all $x > 0$, $P(\{\omega : \sup_n X_n(\omega) > x\}) \le \frac{E[(X_\infty)_+]}{x}$.
(c) Suppose $X_t$, $t \ge 0$, is a continuous-parameter, right continuous subMG (i.e. each sample path $t \mapsto X_t(\omega)$ is right continuous). Then, for all $x > 0$, $P(\{\omega : \sup_{0 \le t \le T} X_t(\omega) > x\}) \le \frac{E[(X_T)_+]}{x}$.
Note: This theorem states that the maximum of a subMG does not grow too quickly.

Pf: Consider the stopping time $\tau_x = \min\{n \ge 0 : X_n > x\}$.
Observe that $A = \{\omega : \max_{0 \le n \le N} X_n(\omega) > x\} = \{\omega : \tau_x(\omega) \le N\} = B$.
("$\subseteq$": Take $\omega \in A$. $\max_{0 \le n \le N} X_n(\omega) > x \Rightarrow$ at least one $n \in [0, N]$ is s.t. $X_n(\omega) > x \Rightarrow \tau_x(\omega) = \min\{n \ge 0 : X_n(\omega) > x\} \le N$.
"$\supseteq$": Take $\omega \in B$. $\tau_x(\omega) = \min\{n \ge 0 : X_n(\omega) > x\} \le N \Rightarrow$ at least one $n \in [0, N]$ is s.t. $X_n(\omega) > x \Rightarrow$ the maximum of $X_n$ over $[0, N]$ is s.t. $\max_{0 \le n \le N} X_n(\omega) > x$.)
Thus, ...
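A numerical illustration of the maximal inequality, using the subMG $|S_n|$ for a simple random walk (a convex image of a MG is a subMG; this setup and the names are my own, not from the notes):

```python
import random

def doob_inequality_check(n_steps=50, x=10.0, n_paths=10000, seed=0):
    """Monte Carlo check of Doob's inequality for the subMG X_n = |S_n|.

    Since |S_n| >= 0, (X_N)_+ = |S_N|, so the bound reads
    P(max_{0<=n<=N} |S_n| > x) <= E[|S_N|] / x.
    """
    rng = random.Random(seed)
    exceed, sum_abs = 0, 0.0
    for _ in range(n_paths):
        s, running_max = 0, 0
        for _ in range(n_steps):
            s += rng.choice((-1, 1))
            running_max = max(running_max, abs(s))
        exceed += running_max > x
        sum_abs += abs(s)
    lhs = exceed / n_paths
    rhs = (sum_abs / n_paths) / x
    return lhs, rhs

lhs, rhs = doob_inequality_check()
print(lhs, rhs)  # Doob: lhs <= rhs (here typically with slack)
```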

o. (T.4.5.1) Doob's Martingale Convergence Theorem
(Idea: the fact that the maximum of a subMG does not grow too rapidly is closely related to convergence properties of subMGs, and likewise supMGs and MGs.)
Suppose $(X_t, F_t)$ is a right continuous subMG.
(a) If $\sup_{t \ge 0} E[(X_t)_+] < \infty$, then $X_\infty(\omega) = \lim_{t \to \infty} X_t(\omega)$ exists w.p.1, i.e. there exists $X_\infty$ s.t. $X_t \xrightarrow{a.s.} X_\infty$. Furthermore, in this case $E|X_\infty| \le \liminf_{t \to \infty} E|X_t| < \infty$.
(b) If $X_t$ is uniformly integrable, then also $X_t \to X_\infty$ in $L^1$. Further, $(X_t, F_t)$ is a subMG with its last element $X_\infty$, i.e. for any fixed $t \ge 0$, $X_t \le E[X_\infty \mid F_t]$ also.
Note on the Difference between (a) and (b):


To understand the difference between parts (a) and (b) of Doob's convergence, recall that if $X_t$ is U.I. then $E[(X_t)_+] \le C$ for some $C < \infty$ and all $t$: by Def 1.4.21, a necessary condition for U.I. is that $\sup_t E|X_t| = \sup_t \left[E(X_t)_+ + E(X_t)_-\right] < \infty$, which implies $\sup_t E(X_t)_+ < \infty$.
Further, by Th. 1.4.22, U.I. together with $X_t \xrightarrow{a.s.} X_\infty$ (and thus convergence in prob.) gives $X_t \xrightarrow{L^1} X_\infty$ and $E|X_t| \to E|X_\infty|$.
Thus, part (b)'s assumption implies part (a)'s assumptions, so that we have a.s. convergence; PLUS U.I., we have $L^1$ convergence. This all comes for free using theorems we have already seen.
But the content of (b), i.e. the part requiring proof, is that $X_t \xrightarrow{L^1} X_\infty \Rightarrow X_t \le E[X_\infty \mid F_t]$ for all $t \ge 0$.

Note on Non-Converging MGs: Many important martingales do not converge. (Example: Brownian Motion! It can be shown that $\limsup_t W(t) = +\infty$ and $\liminf_t W(t) = -\infty$. Indeed, Doob's convergence assumptions do not hold, because it can be shown that $E[W(t)_+] = \sqrt{t/(2\pi)}$, which is not bounded in $t$, so Doob's theorem does not apply.)

p. (P.4.5.3) L2 and A.S. Convergence of MGs (A Stronger Convergence Theorem than Doob's)
If the MG $X_n$ is s.t. $E(X_n^2) \le C$ for some $C < \infty$ and all $n$, then there exists a R.V. $X_\infty$ s.t. $X_n \xrightarrow{a.s.} X_\infty$ AND $X_n \xrightarrow{L^2} X_\infty$.
Moreover, $E(X_\infty^2) \le C < \infty$.

Note on No L1 Analog: There is no $L^1$ analog of this proposition! That is, there exists a MG $X_n$ s.t. $E|X_n| \le C$ for some finite $C$, but the $L^1$ and a.s. limits are different. Example: there exists a non-negative MG $X_n$ s.t. $E(X_n) = 1$ for all $n$ while $X_n \to 0$ a.s. So the a.s. and $L^1$ limits are different. (See P.4.6.5.)

q. (P.4.6.2) The S.P. $X_n = m^{-n} Z_n$ is a MG for the filtration $F_n$ ($Z_n$ a Branching Process)
Pf:
$E[Z_{n+1} \mid F_n] = E\left[\sum_{j=1}^{Z_n} N_j^{(n+1)} \mid F_n\right] = \sum_{j=1}^{Z_n} E[N_j^{(n+1)} \mid F_n]$ by linearity of conditional expectation and since $Z_n$ is measurable w.r.t. $F_n$
$= \sum_{j=1}^{Z_n} E(N_j^{(n+1)})$ since $N_j^{(n+1)}$ is independent of $F_n$ (the number of offspring this generation, for any individual, is independent of the past)
$= m Z_n$ since the $N_j^{(n)}$ are iid with finite mean $m = E(N) < \infty$.
Thus, $E[X_{n+1} \mid F_n] = E\left[\frac{Z_{n+1}}{m^{n+1}} \mid F_n\right] = \frac{1}{m^{n+1}} E[Z_{n+1} \mid F_n] = \frac{1}{m^{n+1}} m Z_n = \frac{Z_n}{m^n} = X_n$.

r. (P.4.6.3) Sub-Critical Process Dies Off / Extinction Probability is One when m < 1
If $m < 1$ then $p_{ex} = 1$, i.e. with probability 1 the population eventually dies off.
Pf:
Recall that $X_n = \frac{Z_n}{m^n}$ is a non-negative MG. From Ex 4.5.6, we know from part (a) of Doob's convergence theorem that if $X_t$ is a nonnegative right-continuous MG, then $X_n \xrightarrow{a.s.} X_\infty$ for some $X_\infty$ with $X_\infty < \infty$ a.s.
Then $Z_n = m^n X_n$ and $m^n \to 0$ (since $m < 1$), so $Z_n \to 0$ a.s. But $Z_n$ is integer valued, so $Z_n \to 0$ iff $Z_n = 0$ for some $n$, i.e. the population eventually dies off.

s. (P.4.6.5) Critical Process Dies Off / Extinction Probability is One when m = 1
If $m = 1$ then $p_{ex} = 1$, i.e. with probability 1 the population eventually dies off.
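The extinction results can be explored by simulation. The offspring law below ($N \in \{0, 2\}$ with $P(N = 2) = p_2$, so $m = 2 p_2$) is an illustrative assumption of mine; solving $s = f(s)$ for its PGF gives $p_{ex} = 1$ when $m \le 1$ and $p_{ex} = (1 - p_2)/p_2$ when $m > 1$:

```python
import random

def extinction_frequency(p2, generations=80, cap=1000, n_runs=2000, seed=0):
    """Fraction of Galton-Watson runs that die out, offspring N in {0, 2}.

    P(N = 2) = p2 and P(N = 0) = 1 - p2, so m = E(N) = 2 * p2.
    """
    rng = random.Random(seed)
    extinct = 0
    for _ in range(n_runs):
        z = 1
        for _ in range(generations):
            # each of the z individuals has 2 children with prob. p2
            z = sum(2 for _ in range(z) if rng.random() < p2)
            if z == 0:
                extinct += 1
                break
            if z > cap:   # a large surviving population essentially never dies out
                break
    return extinct / n_runs

print(extinction_frequency(0.4))   # m = 0.8 < 1: theory gives p_ex = 1
print(extinction_frequency(0.75))  # m = 1.5 > 1: theory gives p_ex = 1/3
```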


Chapter 5: Brownian Motion
BM is the most fundamental continuous time S.P. It is both a martingale and a Gaussian S.P. It has continuous sample paths (but is nowhere differentiable), independent increments, and the strong Markov property.

Definitions
DEFINITION AND CONSTRUCTION
a. (D.5.1.1, T.5.2.1) Brownian Motion
(D.5.1.1) Definition
A S.P. $(W_t, 0 \le t \le T)$ is called a Brownian Motion (or a Wiener Process) if:
(a) $W_t$ is a Gaussian Process
(b) $E(W_t) = 0$ and $E(W_t W_s) = \min(t, s)$
(c) For almost every $\omega$, the sample path $t \mapsto W_t(\omega)$ is continuous on $[0, T]$

(T.5.2.1) Levy's Martingale Characterization of BM
Let $G_t = \sigma(X_s, 0 \le s \le t)$. If $(X_t, G_t)$ is a MG with continuous sample path and $(X_t^2 - t, G_t)$ is also a MG (i.e. its Doob-Meyer decomposition is $X_t^2 = (X_t^2 - t) + t$), then $X_t$ is a BM.

Note on FDD: (a) and (b) completely characterize the FDD of the Brownian Motion (because Gaussian processes are characterized by their mean and auto-covariance functions by C.3.2.15). Adding (c) allows us to characterize the sample path as well. Note on Independent Increments of 0 Mean: BM has independent increments of 0 mean.

Now, $W_t$ Gaussian (all linear combinations are Gaussian) implies $W_{t+h} - W_t$ and $W_s$ are jointly Gaussian ($s < t$). So they're independent iff uncorrelated:
$E(W_{t+h} - W_t) = E(W_{t+h}) - E(W_t) = 0$
$Cov(W_{t+h} - W_t, W_s) = Cov(W_{t+h}, W_s) - Cov(W_t, W_s) = \min(t+h, s) - \min(t, s) = s - s = 0$

Note on Constructing BM: Read pg 102-104. The idea is that we can construct a Gaussian process with the distributional properties (a) and (b) of a BM. Then, by Kolmogorov's continuity theorem, there exists a modification of this Gaussian process that has continuous sample paths; this modification is the BM.

SMOOTHNESS AND VARIATION OF THE BROWNIAN SAMPLE PATH

b. Comparison between Variation of Nice Calculus and Stochastic Calculus:

                        "Nice" Calculus    "Stochastic" Calculus
Total Variation         < ∞                +∞
Quadratic Variation     0                  < ∞
p > 3 Variation         0                  0

c. (D.5.3.1) $\|\pi\|$ (Length of Longest Interval) and q-th Variation of f(.) on $\pi$ (CALCULUS DEFINITION)
For any finite partition $\pi$ of $[a, b]$, that is, $a = t_0^{(\pi)} < t_1^{(\pi)} < \cdots < t_k^{(\pi)} = b$,
let $\|\pi\| = \max_i (t_{i+1}^{(\pi)} - t_i^{(\pi)})$ denote the length of the longest interval in $\pi$,
and let $V^{(q)}(f)(\pi) = \sum_i |f(t_{i+1}^{(\pi)}) - f(t_i^{(\pi)})|^q$ denote the q-th variation of f(.) on $\pi$.
The q-th variation of f(.) on $[a, b]$ is $\lim_{\|\pi\| \to 0} V^{(q)}(f)(\pi)$.

d. (D.5.3.2) q-th Variation of a S.P. X(t) on the interval [a,b]
The q-th variation of a S.P. on the interval $[a, b]$ is the random variable $V^{(q)}(X)(\omega)$ obtained when replacing $f(t)$ by $X_t(\omega)$ in the above definition, provided the limit exists.

e. Lipschitz Sample Path (w.p.1)
We say that a S.P. X(t) has Lipschitz sample path with probability 1 if there exists a R.V. $L(\omega)$, finite a.s., s.t. $|X(t) - X(s)| \le L |t - s|$ for all $t, s \in [a, b]$.

Note on Quadratic Variation and Smoothness of the Sample Path (A Lipschitz Continuous RV has 0-Quadratic Variation):


The quadratic variation is affected by the smoothness of the sample path. Suppose a S.P. X(t) has Lipschitz sample path w.p. 1 (there exists a R.V. $L(\omega)$ s.t. $|X(t) - X(s)| \le L|t - s|$). Then
$V^{(2)}(X)(\pi) = \sum_i |X(t_{i+1}^{(\pi)}) - X(t_i^{(\pi)})|^2 \le L^2 \sum_i |t_{i+1}^{(\pi)} - t_i^{(\pi)}|^2 \le L^2 \|\pi\| \sum_i |t_{i+1}^{(\pi)} - t_i^{(\pi)}|$ (bc $|t_{i+1}^{(\pi)} - t_i^{(\pi)}| \le \|\pi\|$ for all $i$)
$= L^2 \|\pi\| (b - a) \xrightarrow{a.s.} 0$ as $\|\pi\| \to 0$. So, X(t) has 0 quadratic variation.

f. (D.5.3.3) Quadratic Variation
The quadratic variation of a S.P. X, denoted $V_t^{(2)}(X)$, is the non-decreasing, non-negative S.P. corresponding to the quadratic variation of X on the intervals $[0, t]$.


(SEE PS5 for Work)

REFLECTION PRINCIPLE AND BROWNIAN MOTION HITTING TIMES
d. (P.5.2.2) $X_t = W_{\tau + t} - W_\tau$ is a BM (USEFUL FOR REFLECTION PRINCIPLE)
$X_t = W_{\tau + t} - W_\tau$ is a BM, where $\tau$ is a $G_t$-stopping time, $G_t$ the canonical filtration for a Brownian motion. And $X_t$ is independent of the stopped $\sigma$-field $G_\tau$.
e. $\{\omega : W_T(\omega) \ge \alpha\} \subseteq \{\omega : \max_{0 \le s \le T} W_s(\omega) \ge \alpha\} = \{\omega : \tau_\alpha(\omega) \le T\}$, where $\tau_\alpha = \inf(t > 0 : W_t = \alpha)$ is the First Hitting Time of BM.
Pf: The first inclusion should be obvious. The second equality is intuitively clear, because the LHS is the set of outcomes in which the BM has hit $\alpha$ by time T, which is precisely what the RHS says.

f. Reflection Principle (pp 106-107)
A: $P(\max_{0 \le s \le T} W_s \ge \alpha,\ W_T \le \alpha) = P(\tau_\alpha \le T,\ W_T \le \alpha) = P(\tau_\alpha \le T,\ X_{T - \tau_\alpha} \le 0)$
(true by the above, bc if the first hitting time is $\le T$ and by time T we have $W_T$ still $\le \alpha$, then the difference between T and the first hitting time is nonnegative and $X_{T - \tau_\alpha} = W_T - W_{\tau_\alpha} = W_T - \alpha \le 0$).
Using the fact that $X_t$ is a BM, and thus for each t its distribution is symmetric around 0:
$P(\tau_\alpha \le T,\ X_{T - \tau_\alpha} \le 0) = P(\tau_\alpha \le T,\ X_{T - \tau_\alpha} \ge 0) = P(\max_{0 \le s \le T} W_s \ge \alpha,\ W_T \ge \alpha)$
B: $P(\max_{0 \le s \le T} W_s \ge \alpha) = P(\max_{0 \le s \le T} W_s \ge \alpha,\ W_T \ge \alpha) + P(\max_{0 \le s \le T} W_s \ge \alpha,\ W_T \le \alpha) = 2 P(\max_{0 \le s \le T} W_s \ge \alpha,\ W_T \ge \alpha)$ (by reflection principle A) $= 2 P(W_T \ge \alpha)$
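The identity $P(\max_{0 \le s \le T} W_s \ge \alpha) = 2 P(W_T \ge \alpha)$ can be checked on a discretized BM; a sketch (all names, step counts, and parameters are my own):

```python
import random

def reflection_check(alpha=1.0, T=1.0, n_steps=200, n_paths=8000, seed=0):
    """Monte Carlo check of P(max_{s<=T} W_s >= alpha) = 2 P(W_T >= alpha).

    W is approximated by summing n_steps iid Gaussian increments; the
    discretization slightly underestimates the running maximum.
    """
    rng = random.Random(seed)
    sd = (T / n_steps) ** 0.5
    hit_max = end_above = 0
    for _ in range(n_paths):
        w = running_max = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            running_max = max(running_max, w)
        hit_max += running_max >= alpha
        end_above += w >= alpha
    return hit_max / n_paths, 2 * end_above / n_paths

lhs, rhs = reflection_check()
print(lhs, rhs)  # the two estimates should be close
```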

Gaussian S.P. derived from BM:
1. (Brownian Bridge): $B_t = W_t - \min(t, 1) W_1$
(a) $B_t$ is a Gaussian S.P.
(b) $E(B_t) = 0$ and $\rho(t, s) = \min(t, s) - \min(s, 1)\min(t, 1)$
(c) Continuous sample paths a.s.
(d) Not adapted to the canonical filtration of $W_t$ and not stationary
(e) $B_t$ has the same distribution as $W_t \mid W_1 = 0$
2. (Geometric BM): $Y_t = e^{W_t}$
(a) $Y_t$ is NOT a Gaussian S.P. (its marginals are lognormal)
(b) $E(Y_t) = \exp(t/2)$ and $\rho(t, s) = \exp(\frac{1}{2}(t + s))[\exp(\min(t, s)) - 1]$
(c) Continuous sample paths a.s.
(d) Adapted to the canonical filtration of $W_t$; not stationary
3. (Ornstein-Uhlenbeck): $U_t = e^{-t/2} W_{e^t}$
(a) $U_t$ is a Gaussian S.P.
(b) $E(U_t) = 0$ and $\rho(t, s) = \exp(-\frac{1}{2}|t - s|)$
(c) Continuous sample paths a.s.
(d) Not adapted to the canonical filtration of $W_t$
(e) Stationary process with stationary increments
4. (BM with Drift): $X_t = x + \mu t + \sigma W_t$
(a) $X_t$ is a Gaussian S.P.
(b) $E(X_t) = x + \mu t$ and $\rho(t, s) = \sigma^2 \min(t, s)$
(c) Continuous sample paths a.s.
(d) Adapted to the canonical filtration of $W_t$ and not stationary
SMOOTHNESS AND VARIATIONS OF THE BROWNIAN MOTION PATH
g. (P.5.3.9) Total Variation of BM W(t) is infinite w.p.1


Pf:
Let $\alpha(h) = \sup_{a \le t \le b-h} |W(t+h) - W(t)|$ (the "largest" difference in BM values over a window of size h, for $t \in [a, b-h]$. That is, we look over the set of h-width windows whose left side does not go below a and whose right side does not go above b, and we take the sup because there are uncountably many such windows).
W.p. 1, the sample path W(t) is continuous and hence uniformly continuous on the closed and bounded interval $[a, b]$. Therefore, $\alpha(h) \xrightarrow{a.s.} 0$ as $h \to 0$.
Let $\pi_n$ divide $[a, b]$ into $2^n$ equal parts, so $\|\pi_n\| = \frac{b-a}{2^n}$.

Then,
(*) $V^{(2)}(W)(\pi_n) = \sum_{i=0}^{2^n - 1} [W(a + (i+1)\|\pi_n\|) - W(a + i\|\pi_n\|)]^2 \le \alpha(\|\pi_n\|) \sum_{i=0}^{2^n - 1} |W(a + (i+1)\|\pi_n\|) - W(a + i\|\pi_n\|)|$
From Ex.5.3.6, we know that $V^{(2)}(W)(\pi_n) \xrightarrow{a.s.} (b - a) < \infty$.
Furthermore, we have $\alpha(\|\pi_n\|) \xrightarrow{a.s.} 0$ from above.
So, in (*) the left side tends to $b - a > 0$ while $\alpha(\|\pi_n\|) \to 0$,
hence $V^{(1)}(W)(\pi_n) = \sum_{i=0}^{2^n - 1} |W(a + (i+1)\|\pi_n\|) - W(a + i\|\pi_n\|)| \xrightarrow{a.s.} \infty$,
i.e. $V^{(1)}(W) = \infty$ w.p.1.

h. (D.5.3.4) For BM W(t), as $\|\pi\| \to 0$ we have that $V^{(2)}(W)(\pi) \to (b - a)$ in 2-mean.
i. Quadratic Variation of a BM W(t) is t, since its Doob-Meyer Decomposition is $W(t)^2 = [W(t)^2 - t] + t$.
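Both variation results can be seen numerically on one simulated path: along dyadic partitions, $V^{(2)}$ stabilizes near $t$ while $V^{(1)}$ blows up. A sketch (names and parameters are my own):

```python
import random

def variations(t=1.0, levels=(6, 8, 10), seed=0):
    """V^(1) and V^(2) of one simulated BM path over dyadic partitions of [0, t].

    As ||pi_n|| -> 0, V^(2)(W)(pi_n) should approach t, while V^(1) keeps
    growing (total variation of BM is infinite w.p.1).
    """
    rng = random.Random(seed)
    n_max = 2 ** max(levels)
    w = [0.0]
    for _ in range(n_max):
        w.append(w[-1] + rng.gauss(0.0, (t / n_max) ** 0.5))
    out = {}
    for k in levels:
        stride = 2 ** (max(levels) - k)   # coarser partitions reuse the path
        pts = w[::stride]
        v1 = sum(abs(b - a) for a, b in zip(pts, pts[1:]))
        v2 = sum((b - a) ** 2 for a, b in zip(pts, pts[1:]))
        out[k] = (v1, v2)
    return out

for k, (v1, v2) in sorted(variations().items()):
    print(k, round(v1, 2), round(v2, 3))  # v2 stays near t = 1; v1 grows with k
```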


Chapter 6: Markov, Poisson, and Jump Processes These are 3 important families of stochastic processes.

Definitions


MARKOV PROCESSES
a. (D.6.1.1) Markov Chain
Discrete Time MC (We mean discrete time; the values can be discrete or continuous)
A discrete time stochastic process $X_n$, $n = 0, 1, \ldots$, with each R.V. $X_n$ taking values in a measurable space $(S, B)$, is called a Markov Chain if
for all $n \in \mathbb{N}$, $A \in B$: $P(X_{n+1} \in A \mid X_1, \ldots, X_n) = P(X_{n+1} \in A \mid X_n)$ a.s.
or, for all $n \in \mathbb{N}$, $A \in B$: $E[I(X_{n+1} \in A) \mid X_1, \ldots, X_n] = E[I(X_{n+1} \in A) \mid X_n]$ a.s.
(or $P(X_{n+1} \le x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} \le x_{n+1} \mid X_n = x_n)$ for all $x$, if B is the Borel sigma field, bc $B = \sigma(\{(-\infty, \alpha] : \alpha \in \mathbb{R}\})$)
$\Leftrightarrow$ for all $n, m \in \mathbb{N}$, $A \in B$: $P(X_{n+m} \in A \mid X_1, \ldots, X_n) = P(X_{n+m} \in A \mid X_n)$ a.s.
$\Leftrightarrow$ for all $n \in \mathbb{N}$ and any bounded measurable function $f : S \to \mathbb{R}$: $E[f(X_{n+1}) \mid X_1, \ldots, X_n] = E[f(X_{n+1}) \mid X_n]$ a.s.
The set S is called the state space of the Markov chain.

Continuous Time MC
A S.P. X(t) indexed by $t \in [0, \infty)$ and taking values in a measurable space $(S, B)$ is called a Markov Process if for any $t, u \ge 0$ and $A \in B$ we have, a.s.,
$P(X_{t+u} \in A \mid \sigma(X_s, s \le t)) = P(X_{t+u} \in A \mid X_t)$
(or $E[I(X_{t+u} \in A) \mid \sigma(X_s, s \le t)] = E[I(X_{t+u} \in A) \mid X_t]$)
$\Leftrightarrow$ for any $t, u \ge 0$ and any bounded measurable function f(.) on (S, B):
$E[f(X_{t+u}) \mid \sigma(X_s, s \le t)] = E[f(X_{t+u}) \mid X_t]$ a.s.
The set S is called the state space of the Markov Process.

Note on Markov Process and Joint Law: THE MARKOV PROPERTY IS A PROPERTY OF THE JOINT LAW. SO IF YOU KNOW ANOTHER PROCESS WITH THE SAME JOINT LAW THAT YOU CAN SHOW TO BE MARKOV, THEN YOU CAN USE THAT TO SHOW THAT THE ORIGINAL PROCESS IS MARKOV.

b. (D.6.1.2) (Time) Homogeneous Markov Chain
A homogeneous Markov chain is a Markov chain that has a modification for which, for all $A \in B$, $P(X_{n+1} \in A \mid X_n)$ does not depend on n (except via the value of $X_n$).

c. (D.6.1.3, D.6.1.8) Stationary Transition Probabilities (Determine the distribution of Homogeneous MCs)
(D.6.1.3) Discrete Time
To each homogeneous Markov chain $X_n$ with values in a closed subset S of $\mathbb{R}$ correspond its stationary transition probabilities $p(A \mid x)$, s.t. $p(\cdot \mid x)$ is a probability measure on (S, B) for any $x \in S$, $x \mapsto p(A \mid x)$ is measurable for any fixed $A \in B$, and $p(A \mid X_n) = P(X_{n+1} \in A \mid X_n)$ for all $n \ge 0$.
(D.6.1.8) Continuous Time
For each $t > s$ and fixed $x \in S$, there exists a probability measure $p_{t,s}(\cdot \mid x)$ on $(S, B)$ s.t., for each fixed $A \in B$, the function $p_{t,s}(A \mid \cdot)$ is measurable and $p_{t,s}(A \mid X_s) = E[I(X_t \in A) \mid X_s] = P(X_t \in A \mid X_s)$.
Such a collection $p_{t,s}(A \mid x)$ is called the transition probabilities for the Markov process $X_t$.
Example of stationary homogeneous MCs: Random Walk


d. (D.6.1.4) Initial Distribution of a Markov Chain
The initial distribution $\pi$ of a M.C. is the distribution of $X_0$. That is, it's the probability measure on (S, B) given by $\pi(A) = P(X_0 \in A)$.

Note on Using the Initial Distribution to Determine the Distribution of Xn:
The distribution of $X_n$ for any n is determined by the initial distribution $\pi$ and the transition probability function $p(A \mid x)$, $A \in B$, $x \in S$.
If the state space is discrete and the chain is time homogeneous, the distribution of $X_n$ given $\pi$ is $\pi' P^n$.
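The relation $\pi_n = \pi' P^n$ is a one-line computation for a discrete state space; a minimal sketch with a hypothetical two-state chain:

```python
def distribution_after_n_steps(pi0, P, n):
    """Propagate a Markov chain's distribution: pi_n = pi_0 P^n.

    pi0: list of initial probabilities; P: row-stochastic transition
    matrix with P[i][j] = P(X_{k+1} = j | X_k = i).
    """
    pi = list(pi0)
    for _ in range(n):
        pi = [sum(pi[i] * P[i][j] for i in range(len(pi)))
              for j in range(len(pi))]
    return pi

# hypothetical two-state chain
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(distribution_after_n_steps([1.0, 0.0], P, 50))  # converges to the stationary law [5/6, 1/6]
```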

Note on Initial Distribution + Transition Probabilities Determine the FDD of the M.C.: To get the FDD, we just need the initial distribution and the transition probability function (or the transition matrix P if the state space is discrete).
In particular, for each nonnegative integer k, every $0 = t_0 < t_1 < \cdots < t_k$ and $A_0, \ldots, A_k \in B$ we have that
$P(X(t_0) \in A_0, \ldots, X(t_k) \in A_k) = \int_{A_0} \cdots \int_{A_k} p_{t_k, t_{k-1}}(dx_k \mid x_{k-1}) \cdots p_{t_1, t_0}(dx_1 \mid x_0) \, \pi(dx_0)$ (using the Lebesgue Integral)
WHAT DOES THIS MEAN?!?!?
Note on Homogeneous Markov Chains: Homogeneous MCs are fully characterized by the initial distribution and the (one-step) transition probabilities $p_t(\cdot \mid \cdot)$ for all $t > 0$; these determine all distributional properties of the associated homogeneous Markov process.
In view of the Chapman-Kolmogorov relationship, using functional analysis one may often express $p_t(\cdot \mid \cdot)$ in terms of a single operator, called the "generator" of the Markov process. For example, the generator of BM is closely related to the heat equation, hence the reason that many computations can be simplified via the theory of PDE.

POISSON PROCESS, EXPONENTIAL INTER-ARRIVALS, AND ORDER STATISTICS e. 4 Useful Conditions for Defining Poisson Processes:

C0 (Counting Process):
a) Each sample path N_t(ω) is piecewise constant, nondecreasing, and right-continuous
b) N_0(ω) = 0
c) All jump discontinuities are of size 1

C1 (Independent and Stationary Poisson Increments): For any k and any 0 < t_1 < t_2 < ... < t_k,
a) the increments N_{t_1}, N_{t_2} - N_{t_1}, ..., N_{t_k} - N_{t_{k-1}} are independent R.V.'s
b) for some λ > 0 and all t > s ≥ 0, the increment N_t - N_s ~ Poisson(λ(t - s))
(so stationary and independent Poisson distributed increments)

C2 (Independent and Stationary Increments): For any k and any 0 < t_1 < t_2 < ... < t_k,
a) the increments N_{t_1}, N_{t_2} - N_{t_1}, ..., N_{t_k} - N_{t_{k-1}} are independent R.V.'s
b) for all t > s ≥ 0, the increment N_t - N_s has a law that depends only on (t - s) (i.e. stationary increments)

C3 (Jump Time Increments ~ Exponential(λ)): The gaps/increments between jump times T_k - T_{k-1} for k = 1, 2, ... are iid R.V.'s, each distributed Exp(λ)

C4:
a) The S.P. N_t has no fixed discontinuities, i.e. P(T_k = t) = 0 ∀ k and t ≥ 0
b) For any fixed k, 0 < t_1 < t_2 < ... < t_k and nonnegative integers n_1, n_2, ..., n_k,
P(N_{t_k + h} - N_{t_k} = 1 | N_{t_j} = n_j, j ≤ k) = λh + o(h)
P(N_{t_k + h} - N_{t_k} ≥ 2 | N_{t_j} = n_j, j ≤ k) = o(h)
where o(h) denotes a function f(h) s.t. f(h)/h → 0 as h ↓ 0.
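Condition C3 gives a direct simulation recipe: build the jump times from iid Exp(λ) gaps and count how many fall in [0, t]. A minimal sketch (λ, t, and the replication count are arbitrary choices for illustration); by C1 the resulting counts should look Poisson(λt):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, reps = 2.0, 5.0, 20000

def poisson_count(rng, lam, t):
    """Accumulate iid Exp(lam) inter-arrival gaps (C3); return N_t,
    the number of jump times that land in [0, t]."""
    n, total = 0, 0.0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t:
            return n
        n += 1

counts = np.array([poisson_count(rng, lam, t) for _ in range(reps)])
# Empirically, counts should have mean and variance close to lam * t.
```

This is exactly the equivalence C0 + C3 ⇔ Poisson process stated as Definition 3 below.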


f. (D.6.2.1) Counting Process (C0)

N_t is a counting process if:
a) Each sample path N_t(ω) is piecewise constant, nondecreasing, and right-continuous
b) N_0(ω) = 0
c) All jump discontinuities are of size 1
or equivalently, N_t = sup{k ≥ 0 : T_k ≤ t}, where T_k are the jump times associated with N_t.

Note on Application: We use N(t) as counting the number of discrete events / occurrences in the interval [0,t] for each t>0, with T(k) denoting the arrival or occurrence time of the k-th such event. That’s why we call N(t) a counting process.

g. (D.6.2.1) Jump Times

Associated with each sample path of a counting process, N_t(ω), are jump times 0 = T_0 < T_1 < ... such that T_k = inf{t ≥ 0 : N_t ≥ k} for each k. (T_k is the time when the k-th occurrence arrives/occurs, or, the first time when the counting process reaches k.) Thus, as stated above, the counting process is equivalently stated as: N_t = sup{k ≥ 0 : T_k ≤ t} (b/c N_t is the number of occurrences by time t).

h. Poisson R.V.

A R.V. N has the Poisson(λ) law if P(N = k) = e^{-λ} λ^k / k!, for each integer k ≥ 0.

Has E(N) = λ and Var(N) = λ.
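A quick deterministic sanity check of the pmf and both moments (the truncation point 60 is an arbitrary cutoff at which the tail of the series is negligible for this λ):

```python
import math

lam = 3.0

def pmf(k):
    """Poisson(lam) pmf: e^{-lam} lam^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Truncate the infinite series far enough that the remaining tail is negligible.
ks = range(60)
total = sum(pmf(k) for k in ks)            # should be ~1
mean = sum(k * pmf(k) for k in ks)         # should be ~lam
var = sum((k - mean)**2 * pmf(k) for k in ks)  # should be ~lam
```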

i. S.P. with Independent Increments: We say that a S.P. N_t has independent increments if N_{t+h} - N_t ⊥ σ(N_s, 0 ≤ s ≤ t) ∀ t, ∀ h.

j. (D.6.2.3) S.P. with Stationary Increments: We say that the S.P. N_t, t ≥ 0, has stationary increments if the law of N_{t+h} - N_t is independent (i.e. not a function) of t, but a function of h.

k. (P.6.2.5) Memoryless Property of Exponential Law

We say that a R.V. T has Exp(λ) law if P(T > t) = e^{-λt} for all t ≥ 0 and some λ > 0. Except for the trivial case of T = 0 w.p. 1, these are the ONLY laws for which P(T > x + y | T > y) = P(T > x) ∀ x, y ≥ 0.
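The memoryless identity follows directly from the survival function S(t) = P(T > t) = e^{-λt}, and can be checked numerically (the values of λ, x, y below are arbitrary illustrations):

```python
import math

lam, x, y = 1.5, 0.7, 2.3

def survival(t):
    """P(T > t) for T ~ Exp(lam)."""
    return math.exp(-lam * t)

# P(T > x + y | T > y) = P(T > x + y) / P(T > y), which should equal P(T > x).
cond = survival(x + y) / survival(y)
```

Algebraically this is just e^{-λ(x+y)} / e^{-λy} = e^{-λx}, so the check holds for any x, y ≥ 0.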

l. (D.6.2.1, P.6.2.4) Poisson Process

Definition 1 (D.6.2.1): C0 + C1
A Poisson process is a counting process (C0) that also satisfies C1 (independent and stationary Poisson increments): i.e. for any k and any 0 < t_1 < t_2 < ... < t_k,
a) the increments N_{t_1}, N_{t_2} - N_{t_1}, ..., N_{t_k} - N_{t_{k-1}} are independent R.V.'s
b) for some λ > 0 (rate of intensity) and all t > s ≥ 0, the increment N_t - N_s ~ Poisson(λ(t - s)) (so stationary and independent Poisson distributed increments)
In other words, among the processes satisfying C0, the Poisson Process is the unique S.P. having also the property C1.

Note on Terminology: The Poisson process has independent increments, each having a Poisson law, where the parameter of the count N(t) - N(s) is proportional to the length of the corresponding interval [s,t], with proportionality constant λ, called the rate of intensity of the Poisson process.

Note on Comparison with Brownian: Just as the Brownian S.P. is such that B(t)~N(0,t) and B(t)-B(s)~N(0,t-s), the Poisson S.P. is such that N(t)~Poi(λt) and N(t) - N(s) ~ Poi(λ(t-s)).

Definition 2 (P.6.2.4): C0 + C2
A S.P. is a Poisson process iff it is a counting process (C0) with stationary independent increments (C2).
Note on equivalence: By (P.6.2.4), the Poisson process is the only S.P. that satisfies C0 & C2.

Definition 3 (P.6.2.6): C0 + C3
A S.P. is a Poisson process with rate λ iff it is a counting process (C0) with jump time increments ~ Exp(λ) (C3).
Note on equivalence: By (P.6.2.6).

Definition 4 (P.6.2.6): C0 + C4
A S.P. is a Poisson process of rate λ iff it is a counting process (C0) that satisfies C4.


Theorems a. Showing that a SP is a Markov SP

1. Compute P[X_{t+h} ∈ A | F_t] (or P[X_{n+1} ∈ A | F_n] if discrete-time) and show it only depends on X_t, and not on X_u, u < t.

2. Show that the SP X_t has independent increments (using the lemma below). (See "c" below) 3. Show that it's a function X(t) = f_t[Y(g(t))] of a Markov Process Y(t), for f invertible and g strictly increasing. (See "e" below) 4. Since the Markov property is a property of the joint distribution, show that the SP has the same FDD / joint distribution as some other stochastic

process which is Markov.

b. Showing that a Markov SP is Time-Homogeneous 1. Use P.6.1.13: Every continuous time stochastic process with stationary and independent increments is a homogeneous Markov Process. 2. If X(t) = f[Y(g(t))] for f invertible and time independent, g(t) strictly increasing, and Y(t) homogeneous, then X(t) is also homogeneous. (See "e" below) 3. If X(t) is a Markov Process and a Stationary Process, then it's homogeneous. (See "d")

c. (Lemma 1) If X(t) is a S.P. with independent increments, then X(t) is a Markov SP.

PF: Let F_t, t ≥ 0 be the canonical filtration of X_t, t ≥ 0. Independent increments means that for any t, h ≥ 0, the random variable X_{t+h} - X_t ⊥ F_t. To show that X_t is a Markov process, from Definition 6.1.7 it suffices to show that

E[f(X_{t+h}) | F_t] = E[f(X_{t+h}) | X_t]

for any bounded measurable function f on (S,B), where S is the state space and B Borel. Let f be an arbitrary bounded measurable function. Then,

E[f(X_{t+h}) | F_t] = E[f((X_{t+h} - X_t) + X_t) | F_t] = E[f((X_{t+h} - X_t) + X_t) | X_t]

since X_{t+h} - X_t is independent of F_t and X_t is F_t-measurable.

Note on Converse: The converse is NOT necessarily true. Consider the process in Ex 6.1.14. Note on Intuition: A Markov Process is one in which the conditional distribution given the past depends only on where you were last. But that's precisely what independent increments deliver. Imagine a simple random walk: given that you're at some point, because we have independent increments, it doesn't matter where you were before, just where you are now.

d. (Lemma 2) If X(t) is a stationary process and a Markov Process, then it is Homogeneous.

Pf: Here proof only for discrete time, countable state process X_n.
X_n a stationary process ⇒ (X_0, X_1) =_D (X_n, X_{n+1}).
Thus,

P(X_{n+1} = y | X_n = x) = P(X_{n+1} = y, X_n = x) / P(X_n = x) = P(X_1 = y, X_0 = x) / P(X_0 = x) = P(X_1 = y | X_0 = x)

So the transition probabilities do not depend on n, i.e. the chain is time-homogeneous.

Note on Homogeneous Markov Process Has Nothing to Do with Stationary Increments (Except: Independent & Stationary Increments ⇒ Homogeneous Markov Process):

It can be shown that a Markov Process with stationary increments is not necessarily time-homogeneous, and also that a Homogeneous Markov Process need not have stationary increments.

e. (Extra) If X(t) is a Markov Process, then Y(t) = f_t(X_{g(t)}) is also a Markov Process for f invertible and g strictly increasing.

Theorem: If X(t) is a Markov Process, then Y(t) = f_t(X_{g(t)}) is also a Markov Process for f invertible and g strictly increasing. If further X(t) is homogeneous and f is not time dependent, then f(X_{g(t)}) is also homogeneous.


Pf: Take A ∈ F, and note that since f is invertible and g strictly increasing, σ(Y_s, s ≤ t) = σ(X_{g(s)}, s ≤ t) ≡ G_{g(t)}. Then

P(Y_{t+u} ∈ A | F_t) = P(f_{t+u}(X_{g(t+u)}) ∈ A | F_t)
= P(X_{g(t+u)} ∈ f_{t+u}^{-1}(A) | G_{g(t)})  since f invertible (and this is measurable w.r.t. F_t since the canonical filtrations are the same)
= P(X_{g(t+u)} ∈ f_{t+u}^{-1}(A) | X_{g(t)})  by X Markovian
= P(f_{t+u}(X_{g(t+u)}) ∈ A | σ(X_{g(t)}))  again b/c f invertible, σ(X_{g(t)}) = σ(f_t(X_{g(t)}))
= P(Y_{t+u} ∈ A | Y_t)

So, Y is Markovian.

Furthermore, if X is time-homogeneous and f is not time dependent, then f(X_{g(t)}) is also time-homogeneous.

f. (P.6.1.13) Every Continuous Time SP X(t) with Stationary and Independent Increments is a Homogeneous Markov Process.

g. (P.6.1.5) Strong Markov Property

h. (P.6.2.4) Poisson Process is the only S.P. with stationary independent increments that satisfies condition C0 (counting process).

i. (P.6.2.5) Memoryless Property of Exponential Law

We say that a R.V. T has Exp(λ) law if P(T > t) = e^{-λt} for all t ≥ 0 and some λ > 0. Except for the trivial case of T = 0 w.p. 1, these are the ONLY laws for which P(T > x + y | T > y) = P(T > x) ∀ x, y ≥ 0.

j. (P.6.2.6) A S.P. N(t) that satisfies C0 is a Poisson process of rate λ iff it satisfies C3.

k. (P.6.2.8) Relationship between Poisson Process and Uniform Measure

( ) ( ) 1 2

1 1

1 2 n

1 1 2 2 1 20

t

Fixing any integer n and 0 t t ... t t, we have that

! , 1,..., | , ,..., ... ...

That is, conditional on N =n, the

n

n

t t tk k t n n nn x x

nP T t k n N n P T t T t T t dx dx dxt −

≤ ≤ ≤ ≤ ≤

≤ = = = ≤ ≤ ≤ = ∫ ∫ ∫ first n arrival times : 1,..., have the distribution of the order statistic of a sample of n iid Unif[0,t] R.V.skT k n=

WHY??? Note on Application:

For example, E[Σ_{i=1}^{N_t} T_i | N_t = n] = E[Σ_{i=1}^{n} U_{(i)}] by the theorem = E[Σ_{i=1}^{n} U_i] since it doesn't matter in which order we sum = n E(U_1) = n(t/2) since each U_i ~ iid Unif[0,t].
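The order-statistics representation makes the computation above easy to check: sorting n iid Unif[0,t] draws does not change their sum, so conditional on N_t = n the expected sum of arrival times is nt/2. A Monte Carlo sketch (n, t, and the replication count are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, reps = 5, 2.0, 40000

# Conditional on N_t = n, the arrival times have the law of the order
# statistics of n iid Unif[0, t] draws; sorting leaves the sum unchanged.
u = rng.uniform(0.0, t, size=(reps, n))
sum_of_arrivals = np.sort(u, axis=1).sum(axis=1)
# Empirical mean should be close to n * t / 2.
```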

l. (T.6.2.10) Poisson Approximation

Suppose that for each n, the random variables Z_l^{(n)} are independent, nonnegative integer valued, where p_l^{(n)} = P(Z_l^{(n)} = 1) and ε_l^{(n)} = P(Z_l^{(n)} ≥ 2) are such that as n → ∞,
(i) Σ_{l=1}^n p_l^{(n)} → λ ∈ (0, ∞), (ii) Σ_{l=1}^n ε_l^{(n)} → 0, and (iii) max_{l=1,...,n} p_l^{(n)} → 0.
Then, as n → ∞, S_n = Σ_{l=1}^n Z_l^{(n)} →_D Poisson(λ).

Note on Approximating the Binomial by the Poisson:


Take Z_l^{(n)} = 1 w.p. λ/n, and 0 w.p. 1 - λ/n.
So Z_l^{(n)} ~ Bernoulli(λ/n), and S_n = Σ_{l=1}^n Z_l^{(n)} ~ Binomial(n, λ/n) (since sum of iid Bernoulli).
Here, Σ_{l=1}^n p_l^{(n)} = Σ_{l=1}^n λ/n = λ, ε_l^{(n)} = P(Z_l^{(n)} ≥ 2) = 0, and max_{l=1,...,n} p_l^{(n)} = λ/n → 0.
Thus, by the theorem above, we have S_n = Σ_{l=1}^n Z_l^{(n)} →_D Poisson(λ) as n → ∞.
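The Binomial(n, λ/n) → Poisson(λ) convergence can be seen by comparing the two pmfs directly for a large n (the values of λ and n below are illustrative only):

```python
import math

lam, n = 4.0, 10000
p = lam / n

def binom_pmf(k):
    """Binomial(n, lam/n) pmf."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k):
    """Poisson(lam) pmf."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Pointwise gap over the first few values of k; shrinks as n grows.
max_err = max(abs(binom_pmf(k) - pois_pmf(k)) for k in range(20))
```

Rerunning with a larger n shows max_err shrinking, matching the →_D statement.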

Note on Brownian Motion and Functional CLT: T.6.2.10 plays for the Poisson process the same role that the CLT plays for BM: it provides a characterization of the Poisson process that is very attractive for the purpose of modeling real world phenomena.

m. (P.6.2.12) N1(t) + N2(t) Poisson if N1(t), N2(t) independent Poisson

If N_t^{(1)} and N_t^{(2)} are two independent Poisson processes of rates λ_1 and λ_2 respectively, then N_t^{(1)} + N_t^{(2)} is a Poisson process of rate λ_1 + λ_2.

Conversely, the sub-sequence of jump times obtained by independently keeping each jump time of a Poisson process of rate λ with probability p corresponds to a Poisson process of rate pλ.
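Both directions of (P.6.2.12) can be checked by simulation: the merged count over [0, t] should be Poisson with the summed rate, and a p-thinned count should be Poisson with rate pλ (all parameters below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
lam1, lam2, t, reps = 1.0, 2.5, 4.0, 20000

# Superposition: counts of independent Poisson processes add, so the
# merged count over [0, t] should be Poisson((lam1 + lam2) * t).
n1 = rng.poisson(lam1 * t, size=reps)
n2 = rng.poisson(lam2 * t, size=reps)
merged = n1 + n2

# Thinning: keep each jump independently with probability p; the kept
# count should then be Poisson(p * (lam1 + lam2) * t).
p = 0.3
thinned = rng.binomial(merged, p)
```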