
Page 1: Probability and Statistics
(source: web.stanford.edu/class/cme001/handouts/changhan/Refresher1.pdf)

Probability and Statistics
Part 1. Probability Concepts and Limit Theorems

Chang-han Rhee
Stanford University
Sep 19, 2011 / CME001

Page 2

Outline

Probability Concepts
  Probability Space
  Random Variables
  Expectation
  Conditional Probability and Expectation

Limit Theorems
  Modes of Convergence
  Law of Large Numbers
  Central Limit Theorem

Page 4

Probability of an Event (in a random experiment)

The probability of an event is the relative frequency of the event when the random experiment is repeated many times.

e.g. coin flip, dice roll, roulette
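As a quick illustration (not part of the original slides), the relative-frequency idea can be simulated; the variable names here are illustrative:

```python
import random

random.seed(0)

# Estimate P(heads) for a fair coin as the relative frequency of heads
# over many repetitions of the experiment.
n = 100_000
heads = sum(random.random() < 0.5 for _ in range(n))
rel_freq = heads / n
assert abs(rel_freq - 0.5) < 0.01  # close to 1/2 for large n
```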

Page 5

Sample Space

The set of all possible outcomes.

- Single coin flip: Ω = {H, T}
- Two coin flips: Ω = {(H,H), (H,T), (T,H), (T,T)}
- Single die roll: Ω = {1, 2, 3, 4, 5, 6}
- Two die rolls:
  Ω = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
        (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
        (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
        (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
        (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
        (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }

Page 6

Event

A subset of the sample space.

- Single coin flip: the event that the coin lands heads
  A = {H}
- Two coin flips: the event that the first coin lands heads
  A = {(H,H), (H,T)}
- Single die roll: the event that the die shows an odd number
  A = {1, 3, 5}
- Two die rolls: the event that the sum is 4
  A = {(1,3), (2,2), (3,1)}

Page 7

Sample space: Ω = {(H,H), (H,T), (T,H), (T,T)}

Event (the first coin lands heads): {(H,H), (H,T)}

Outcome (both coins land tails): (T,T)

Page 8

Probability

Definition
A set function P is called a probability if
- 0 ≤ P(A) ≤ 1 for each event A
- P(Ω) = 1 (Unitarity)
- For each sequence A1, A2, . . . of mutually disjoint events,

  P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)   (Countable Additivity)

Page 9

Back to the Examples

- Fair coin:
  P(∅) = 0, P({H}) = 1/2, P({T}) = 1/2, P({H, T}) = 1

- Biased coin (p ∈ [0, 1]):
  P(∅) = 0, P({H}) = p, P({T}) = 1 − p, P({H, T}) = 1
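A finite probability space can be sketched as a map from outcomes to weights; the helper name `prob` below is illustrative, not from the slides, and the check mirrors the defining properties on the biased-coin example:

```python
def prob(measure, event):
    """P(event) on a finite sample space, given outcome weights."""
    return sum(measure[w] for w in event)

p = 0.25  # biased coin (0.25 chosen so float sums are exact)
P = {"H": p, "T": 1 - p}

assert prob(P, set()) == 0             # P(empty set) = 0
assert prob(P, {"H", "T"}) == 1        # unitarity
# additivity on the disjoint events {H} and {T}
assert prob(P, {"H"}) + prob(P, {"T"}) == prob(P, {"H", "T"})
```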

Page 11

Random Variables

A random variable is a function from a sample space to the real numbers.
e.g.

- Winnings in a single coin flip:
  X(H) = 1, X(T) = −1

- First roll, second roll, and sum of two dice:
  X(i, j) = i, Y(i, j) = j, Z(i, j) = i + j
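The "random variable as a function on the sample space" view can be sketched directly (a minimal Python sketch, not from the slides):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two-dice sample space

X = lambda w: w[0]          # first roll
Y = lambda w: w[1]          # second roll
Z = lambda w: w[0] + w[1]   # sum

# P(Z = 4) by counting outcomes (each of the 36 is equally likely)
p_z4 = sum(1 for w in omega if Z(w) == 4) / len(omega)
assert p_z4 == 3 / 36       # the outcomes (1,3), (2,2), (3,1)
```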

Page 12

Discrete Random Variables

A discrete random variable X assumes values in a discrete subset S of R.

The distribution of a discrete random variable is completely described by a probability mass function pX : R → [0, 1] such that

P(X = x) = pX(x)

e.g.
- [Bernoulli] X ∼ Ber(p) if X ∈ {0, 1} and
  P(X = 1) = 1 − P(X = 0) = p, i.e.,
  pX(1) = p and pX(0) = 1 − p

- [Binomial] X ∼ Bin(n, p) if X ∈ {0, 1, . . . , n} and
  pX(k) = C(n, k) p^k (1 − p)^(n−k)
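The binomial pmf above translates directly into code (a sketch, not from the slides; `binom_pmf` is an illustrative name):

```python
from math import comb

def binom_pmf(n, p, k):
    """pX(k) = C(n, k) p^k (1 - p)^(n - k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.25
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12   # a pmf sums to 1 over its support
assert binom_pmf(1, p, 1) == p       # Bin(1, p) is exactly Ber(p)
```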

Page 13

Continuous Random Variables

A continuous random variable X assumes values in R.

The distribution of a continuous random variable is completely described by a probability density function fX : R → R+ such that

P(a ≤ X ≤ b) = ∫_a^b fX(x) dx

e.g.
- [Uniform] X ∼ Unif(a, b), a < b, if
  fX(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

- [Gaussian/Normal] X ∼ N(µ, σ²), µ ∈ R, σ² > 0, if
  fX(x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))
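To connect the density with probabilities, the defining integral can be checked numerically for the standard normal (a sketch, not from the slides; the midpoint rule and tolerance are arbitrary choices):

```python
from math import exp, pi, sqrt, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density fX(x) from the slide."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

# P(-1 <= X <= 1) for X ~ N(0, 1) via a midpoint Riemann sum,
# compared with the closed form erf(1/sqrt(2)) ~ 0.6827.
n = 100_000
h = 2.0 / n
approx = sum(normal_pdf(-1 + (i + 0.5) * h) for i in range(n)) * h
exact = erf(1 / sqrt(2))
assert abs(approx - exact) < 1e-6
```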

Page 14

Probability Distribution*

Each random variable induces another probability PX : 2^R → [0, 1] on the real line through the following:

PX((−∞, x]) := P(X ≤ x)

We often denote the distribution function by FX:

FX(x) := P(X ≤ x)

[NOTATION] The right-hand sides of the previous displays are shorthand notation for the following:

P(X ≤ x) := P({ω ∈ Ω : X(ω) ≤ x})

Page 15

Note: distributions can be identical even if the supporting probability spaces are different.

e.g. (a coin flip vs. a die roll)

X(H) = 1, X(T) = −1

Y(i) = 1 if i is odd, −1 if i is even

Page 16

Joint Distribution

Two random variables X and Y induce a probability PX,Y on R²:

PX,Y((−∞, x] × (−∞, y]) = P(X ≤ x, Y ≤ y)

A collection of random variables X1, X2, . . . , Xn induces a probability PX1,…,Xn on Rⁿ:

PX1,…,Xn((−∞, x1] × · · · × (−∞, xn]) = P(X1 ≤ x1, . . . , Xn ≤ xn)

Page 17

The joint distribution of two discrete random variables X and Y assuming values in SX and SY can be completely described by a joint probability mass function pX,Y : R × R → [0, 1] such that

P(X = x, Y = y) = pX,Y(x, y)

The joint distribution of two continuous random variables X and Y can be completely described by a joint probability density function fX,Y : R × R → R+ such that

P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y(u, v) dv du
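A joint pmf can be tabulated explicitly; as a sketch (not from the slides), the joint pmf of the first roll X and the sum Z of two fair dice, with the marginal of X recovered by summing out z:

```python
from itertools import product
from collections import defaultdict

# Joint pmf of X = first roll and Z = sum of two fair dice.
p_joint = defaultdict(float)
for i, j in product(range(1, 7), repeat=2):
    p_joint[(i, i + j)] += 1 / 36

# Summing the joint pmf over z recovers the marginal pX(x) = 1/6.
p_x = defaultdict(float)
for (x, z), p in p_joint.items():
    p_x[x] += p

assert all(abs(p_x[x] - 1 / 6) < 1e-12 for x in range(1, 7))
```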

Page 19

Expectation

For a discrete random variable X, the expectation of X is

E[X] = Σ_{x∈S} x pX(x)

For a continuous random variable Y, the expectation of Y is

E[Y] = ∫_{−∞}^∞ y fY(y) dy
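The discrete formula is a weighted sum over the support; for a fair die roll it gives 3.5 (a sketch, not from the slides):

```python
# E[X] = sum of x * pX(x) for a fair die roll.
pmf = {x: 1 / 6 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
assert abs(mean - 3.5) < 1e-12  # (1 + 2 + ... + 6) / 6
```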

Page 20

Computation of Expectation

We can also compute the expectation of g(X) and g(Y) as follows:

E[g(X)] = Σ_{x∈S} g(x) pX(x)

and

E[g(Y)] = ∫_{−∞}^∞ g(y) fY(y) dy
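The discrete rule applies to any g; for g(x) = x² and a fair die roll, E[X²] = 91/6 (a sketch, not from the slides):

```python
# E[g(X)] = sum of g(x) * pX(x), with g(x) = x**2 and X a fair die roll.
pmf = {x: 1 / 6 for x in range(1, 7)}
second_moment = sum(x**2 * p for x, p in pmf.items())
assert abs(second_moment - 91 / 6) < 1e-12  # (1 + 4 + ... + 36) / 6
```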

Page 21

Properties of Expectation

- Linearity: E[aX + bY] = aE[X] + bE[Y]
- Monotonicity: X ≤ Y =⇒ E[X] ≤ E[Y]

Page 22

Probability as an Expectation

[NOTATION] We denote the indicator function of A by IA(·):

IA(ω) = 1 if ω ∈ A, 0 if ω ∉ A

Probability can be written as an expectation:

PX(A) = E[IA(X)]

More generally,

P(A) = E[IA]
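The identity P(A) = E[IA] can be checked on a finite space (a sketch, not from the slides): averaging the indicator of "the die shows an odd number" against the uniform pmf gives exactly P(A) = 1/2.

```python
# P(A) = E[I_A] on a fair die roll, with A = "odd number".
omega = range(1, 7)
indicator = lambda w: 1 if w % 2 == 1 else 0
p_a = sum(indicator(w) * (1 / 6) for w in omega)  # E[I_A]
assert abs(p_a - 0.5) < 1e-12
```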

Page 23

Summary Statistics

- Mean: E[X]

- Variance:
  var(X) = E[(X − EX)²] = E[X²] − (EX)²

- Standard deviation:
  σ(X) = √var(X)
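The two variance formulas agree, as a quick check on the fair die roll shows (a sketch, not from the slides; var(X) = 35/12 here):

```python
from math import sqrt, isclose

# var(X) two ways for a fair die roll.
pmf = {x: 1 / 6 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())                  # 3.5
var1 = sum((x - mean)**2 * p for x, p in pmf.items())      # E[(X - EX)^2]
var2 = sum(x**2 * p for x, p in pmf.items()) - mean**2     # E[X^2] - (EX)^2
assert isclose(var1, var2) and isclose(var1, 35 / 12)
sigma = sqrt(var1)  # standard deviation
```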

Page 25

Conditional Probability

The conditional probability of A given B (with P(B) > 0) is defined as

P(A|B) = P(A ∩ B) / P(B)
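The definition can be exercised on the two-dice space (a sketch, not from the slides): conditioning the event "sum is 4" on "first roll is 1" gives (1/36)/(6/36) = 1/6.

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice
A = {w for w in omega if w[0] + w[1] == 4}     # sum is 4
B = {w for w in omega if w[0] == 1}            # first roll is 1

P = lambda E: len(E) / len(omega)              # uniform probability
p_a_given_b = P(A & B) / P(B)                  # P(A | B)
assert abs(p_a_given_b - 1 / 6) < 1e-12
```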

Page 26

Conditional Probability Mass and Density

If X and Y are both discrete random variables with joint probability mass function pX,Y(x, y),

P(X = x|Y = y) = pX|Y(x|y) := pX,Y(x, y) / pY(y)

If X and Y are both continuous random variables with joint density function fX,Y(x, y),

P(a ≤ X ≤ b|Y = y) = ∫_a^b fX|Y(x|y) dx

where

fX|Y(x|y) = fX,Y(x, y) / fY(y)

Page 27

Independence

Two events A and B are independent if

P(A ∩ B) = P(A)P(B)

Two random variables X and Y are independent if, for all x and y,

P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y)
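The product rule for events can be verified by enumeration (a sketch, not from the slides): the parities of two dice are independent, while "first roll odd" and "sum is 4" are not.

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
P = lambda E: len(E) / len(omega)

A = {w for w in omega if w[0] % 2 == 1}   # first roll odd
B = {w for w in omega if w[1] % 2 == 1}   # second roll odd
# Independent: P(A and B) = P(A) P(B) = 1/4
assert abs(P(A & B) - P(A) * P(B)) < 1e-12

C = {w for w in omega if w[0] + w[1] == 4}  # sum is 4
# Not independent of A: the product rule fails
assert abs(P(A & C) - P(A) * P(C)) > 1e-3
```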

Page 28

Conditional Expectation: Discrete Random Variables

For discrete random variables X and Y, the conditional expectation of X given Y = y is

E[X|Y = y] = Σ_{x∈S} x pX|Y(x|y) = Σ_{x∈S} x · P(X = x, Y = y) / P(Y = y)
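On a uniform finite space this reduces to averaging X over the outcomes consistent with the conditioning event (a sketch, not from the slides): conditioning the first roll on "sum is 4" gives E[X | Z = 4] = (1 + 2 + 3)/3 = 2.

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))
# E[X | Z = 4], X = first roll, Z = sum; outcomes are equally likely,
# so the pmf-weighted sum is a plain average over the matching outcomes.
matches = [w for w in omega if w[0] + w[1] == 4]   # (1,3), (2,2), (3,1)
e_x_given_z4 = sum(w[0] for w in matches) / len(matches)
assert e_x_given_z4 == 2.0
```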

Page 29

Conditional Expectation: Continuous Random Variables

For continuous random variables X and Y, the conditional expectation of X given Y = y is

E[X|Y = y] = ∫_{−∞}^∞ x fX|Y(x|y) dx = ∫_{−∞}^∞ x · fX,Y(x, y)/fY(y) dx

Page 30

Properties of Conditional Expectation

- Linearity: E[aX + bY|Z] = aE[X|Z] + bE[Y|Z]
- Monotonicity: X ≤ Y =⇒ E[X|Z] ≤ E[Y|Z]

Page 32

Almost Sure Convergence

Let X1, X2, . . . be a sequence of random variables. We say that Xn converges almost surely to X∞ as n → ∞ if

P(Xn → X∞ as n → ∞) = 1

We use the notation Xn →a.s. X∞ to denote almost sure convergence, or convergence with probability 1.

Page 33

Lp Convergence

[NOTATION] For p > 0, we denote the p-norm of X by ∥X∥p:

∥X∥p := (E|X|^p)^(1/p)

Let X1, X2, . . . be a sequence of random variables. For p > 0, we say that Xn converges to X∞ in pth mean if

∥Xn − X∞∥p → 0 as n → ∞.

We use the notation Xn →Lp X∞ to denote convergence in pth mean, or Lp convergence.

Page 34

Convergence in Probability

Let X1, X2, . . . be a sequence of random variables. We say that Xn converges in probability to X∞ if, for each ϵ > 0,

P(|Xn − X∞| > ϵ) → 0 as n → ∞.

We use the notation Xn →p X∞ to denote convergence in probability.

Page 35

Weak Convergence

Let X1, X2, . . . be a sequence of random variables. We say that Xn converges weakly to X∞ if

P(Xn ≤ x) → P(X∞ ≤ x) as n → ∞

for each x at which P(X∞ ≤ ·) is continuous.

We use the notation Xn ⇒ X∞ or Xn →D X∞ to denote weak convergence, also called convergence in distribution.

Page 36

Implications

Almost sure convergence =⇒ convergence in probability
Lp convergence =⇒ convergence in probability
Convergence in probability =⇒ weak convergence

Page 38

Weak Law of Large Numbers

Theorem (Weak Law of Large Numbers)
Suppose that X1, X2, · · · is a sequence of i.i.d. random variables such that E|X1| < ∞. Then,

(1/n)(X1 + · · · + Xn) →p E[X1] as n → ∞
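The law of large numbers is easy to see in simulation (a sketch, not from the slides; the seed and tolerance are arbitrary): the sample mean of i.i.d. Ber(1/2) flips settles near E[X1] = 1/2.

```python
import random

random.seed(0)

# Sample mean of n i.i.d. Ber(0.5) variables approaches E[X1] = 0.5.
n = 100_000
xs = [1 if random.random() < 0.5 else 0 for _ in range(n)]
sample_mean = sum(xs) / n
assert abs(sample_mean - 0.5) < 0.01
```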

Page 39

Strong Law of Large Numbers

Theorem (Strong Law of Large Numbers)
Suppose that X1, X2, · · · is a sequence of i.i.d. random variables such that E[X1] exists. Then,

(1/n)(X1 + · · · + Xn) →a.s. E[X1] as n → ∞

Page 41

Central Limit Theorem

Theorem
Suppose that the Xi's are i.i.d. random variables with common finite variance σ². Then, if Sn = X1 + · · · + Xn,

(Sn − nE[X1]) / √n ⇒ σ N(0, 1) as n → ∞.

From here, we can deduce the following approximation:

(1/n)Sn − E[X1] ≈D (σ/√n) N(0, 1)
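The theorem can be illustrated numerically (a sketch, not from the slides; the sample sizes, seed, and tolerance are arbitrary choices): standardized sums of i.i.d. Unif(0, 1) variables have an empirical CDF close to the standard normal CDF.

```python
import random
from math import sqrt, erf

random.seed(0)

# Standardized sums of i.i.d. Unif(0,1): (S_n - n*mu) / (sigma*sqrt(n)) -> N(0,1).
mu, sigma = 0.5, sqrt(1 / 12)
n, reps = 100, 5_000
zs = []
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    zs.append((s - n * mu) / (sigma * sqrt(n)))

# Empirical CDF at 1 should be near Phi(1) = 0.5*(1 + erf(1/sqrt(2))) ~ 0.84.
phi1 = 0.5 * (1 + erf(1 / sqrt(2)))
frac = sum(1 for z in zs if z <= 1) / reps
assert abs(frac - phi1) < 0.03
```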