# convergence of random processes - nyu courant convergence of random processes 1 introduction in...

Post on 27-Jun-2020

1 views

Embed Size (px)

TRANSCRIPT

DS-GA 1002 Lecture notes 6 Fall 2016

Convergence of random processes

1 Introduction

In these notes we study convergence of discrete random processes. This allows to characterize phenomena such as the law of large numbers, the central limit theorem and the convergence of Markov chains, which are fundamental in statistical estimation and probabilistic modeling.

2 Types of convergence

Convergence for a deterministic sequence of real numbers x1, x2, . . . is simple to define:

lim i→∞

xi = x (1)

if xi is arbitrarily close to x as i grows. More formally, for any � > 0 there is an i0 such that for all i > i0 |xi − x| < �. This allows to define convergence for a realization of a discrete random process X̃ (ω, i), i.e. when we fix the outcome ω and X̃ (ω, i) is just a deterministic function of i. It is more challenging to define convergence of the random process to a random variable X, since both of these objects are only defined through their distributions. In this section we describe several alternative definitions of convergence for random processes.

2.1 Convergence with probability one

Consider a discrete random process X̃ and a random variable X defined on the same prob- ability space. If we fix an element ω of the sample space Ω, then X̃ (i, ω) is a deterministic

sequence and X (ω) is a constant. It is consequently possible to verify whether X̃ (i, ω) con- verges deterministically to X (ω) as i → ∞ for that particular value of ω. In fact, we can ask: what is the probability that this happens? To be precise, this would be the probability that if we draw ω we have

lim i→∞

X̃ (i, ω) = X (ω) . (2)

If this probability equals one then we say that X̃ (i) converges to X with probability one.

Definition 2.1 (Convergence with probability one). A discrete random vector X̃ converges with probability one to a random variable X belonging to the same probability space (Ω,F ,P)

0 5 10 15 20 25 30 35 40 45 50

0

0.2

0.4

0.6

0.8

1

i

D̃ (ω ,i

)

Figure 1: Convergence to zero of the discrete random process D̃ defined in Example 2.2 of Lecture Notes 5.

if

P ({ ω | ω ∈ Ω, lim

i→∞ X̃ (ω, i) = X (ω)

}) = 1. (3)

Recall that in general the sample space Ω is very difficult to define and manipulate explicitly, except for very simple cases.

Example 2.2 (Puddle (continued)). Let us consider the discrete random process D̃ defined in Example 2.2 of Lecture Notes 5. If we fix ω ∈ (0, 1)

lim i→∞

D̃ (ω, i) = lim i→∞

ω

i (4)

= 0. (5)

It turns out the realizations tend to zero for all possible values of ω in the sample space. This implies that D̃ converges to zero with probability one.

2.2 Convergence in mean square and in probability

To verify convergence with probability one we fix the outcome ω and check whether the corresponding realizations of the random process converge deterministically. An alternative viewpoint is to fix the indexing variable i and consider how close the random variable X̃ (i) is to another random variable X as we increase i.

2

A possible measure of the distance between two random variables is the mean square of their difference. Recall that if E

( (X − Y )2

) = 0 then X = Y with probability one by Chebyshev’s

inequality. The mean square deviation between X̃ (i) and X is a deterministic quantity (a number), so we can evaluate its convergence as i → ∞. If it converges to zero then we say that the random sequence converges in mean square.

Definition 2.3 (Convergence in mean square). A discrete random process X̃ converges in mean square to a random variable X belonging to the same probability space if

lim i→∞

E

(( X − X̃ (i)

)2) = 0. (6)

Alternatively, we can consider the probability that X̃ (i) is separated from X by a certain fixed � > 0. If for any �, no matter how small, this probability converges to zero as i → ∞ then we say that the random sequence converges in probability.

Definition 2.4 (Convergence in probability). A discrete random process X̃ converges in probability to another random variable X belonging to the same probability space if for any � > 0

lim i→∞

P (∣∣∣X − X̃ (i)∣∣∣ > �) = 0. (7)

Note that as in the case of convergence in mean square, the limit in this definition is deter- ministic, as it is a limit of probabilities, which are just real numbers.

As a direct consequence of Markov’s inequality, convergence in mean square implies conver- gence in probability.

Theorem 2.5. Convergence in mean square implies convergence in probability.

Proof. We have

lim i→∞

P (∣∣∣X − X̃ (i)∣∣∣ > �) = lim

i→∞ P

(( X − X̃ (i)

)2 > �2

) (8)

≤ lim i→∞

E

(( X − X̃ (i)

)2) �2

by Markov’s inequality (9)

= 0, (10)

if the sequence converges in mean square.

It turns out that convergence with probability one also implies convergence in probability. Convergence in probability one does not imply convergence in mean square or vice versa. The difference between these three types of convergence is not very important for the purposes of this course.

3

2.3 Convergence in distribution

In some cases, a random process X̃ does not converge to the value of any random variable, but the pdf or pmf of X̃ (i) converges pointwise to the pdf or pmf of another random variable X. In that case, the actual values of X̃ (i) and X will not necessarily be close, but they have

the same distribution. We say that X̃ converges in distribution to X.

Definition 2.6 (Convergence in distribution). A discrete-state discrete random process X̃ converges in distribution to a discrete random variable X belonging to the same probability space if

lim i→∞

pX̃(i) (x) = pX (x) for all x ∈ RX , (11)

where RX is the range of X.

A continuous-state discrete random process X̃ converges in distribution to a continuous ran- dom variable X belonging to the same probability space if

lim i→∞

fX̃(i) (x) = fX (x) for all x ∈ R, (12)

assuming the pdfs are well defined (otherwise we can use the cdfs1).

Note that convergence in distribution is a much weaker notion than convergence with prob- ability one, in mean square or in probability. If a discrete random process X̃ converges to a random variable X in distribution, this only means that as i becomes large the distribution of X̃ (i) tends to the distribution of X, not that the values of the two random variables are close. However, convergence in probability (and hence convergence with probability one or in mean square) does imply convergence in distribution.

Example 2.7 (Binomial converges to Poisson). Let us define a discrete random process

X̃ (i) such that the distribution of X̃ (i) is binomial with parameters i and p := λ/i. X̃ (i)

and X̃ (j) are independent for i 6= j, which completely characterizes the n-order distributions of the process for all n > 1. Consider a Poisson random variable X with parameter λ that is independent of X̃ (i) for all i. Do you expect the values of X and X̃ (i) to be close as i→∞?

No! In fact even X̃ (i) and X̃ (i+ 1) will not be close in general. However, X̃ converges in

1One can also define convergence in distribution of a discrete-state random process to a continuous random variable through the deterministic convergence of the cdfs.

4

distribution to X, as established in Example 3.7 of Lecture Notes 2:

lim i→∞

pX̃(i) (x) = limi→∞

( i

x

) px (1− p)(i−x) (13)

= λx e−λ

x! (14)

= pX (x) . (15)

3 Law of Large Numbers

Let us define the average of a discrete random process.

Definition 3.1 (Moving average). The moving or running average Ã of a discrete random

process X̃, defined for i = 1, 2, . . . (i.e. 1 is the starting point), is equal to

Ã (i) := 1

i

i∑ j=1

X̃ (j) . (16)

Consider an iid sequence. A very natural interpretation for the moving average is that it is a real-time estimate of the mean. In fact, in statistical terms the moving average is the empirical mean of the process up to time i (we will discuss the empirical mean later on in the course). The notorious law of large numbers establishes that the average does indeed converge to the mean of the iid sequence.

Theorem 3.2 (Weak law of large numbers). Let X̃ be an iid discrete random process with

mean µX̃ := µ such that the variance of X̃ (i) σ 2 is bounded. Then the average Ã of X̃

converges in mean square to µ.

Proof. First, we establish that the mean of Ã (i) is constant and equal to µ,

E ( Ã (i)

) = E

( 1

i

i∑ j=1

X̃ (j)

) (17)

= 1

i

i∑ j=1

E ( X̃ (j)

) (18)

= µ. (19)

5

Due to the independence assumption, the variance scales linearly in i. Recall that for inde- pendent random variables the variance of the sum equals the sum of the variances,

Var ( Ã (i)

) = Var

( 1

i

i∑ j=1

X̃ (j)

) (20)

= 1

i2

Recommended