
SDS 321: Introduction to Probability and Statistics

Lecture 23: Continuous random variables - Inequalities, CLT

Purnamrita Sarkar
Department of Statistics and Data Science
The University of Texas at Austin

www.cs.cmu.edu/~psarkar/teaching


Roadmap

- Examples of Markov and Chebyshev
- Weak law of large numbers and CLT
- Normal approximation to Binomial


Markov’s inequality: Example

You have 20 independent Poisson(1) random variables X_1, ..., X_20. Use Markov’s inequality to bound P(∑_{i=1}^{20} X_i ≥ 15).

- P(∑_i X_i ≥ 15) ≤ E[∑_i X_i]/15 = 20/15 = 4/3

- How useful is this? Not at all: the bound exceeds 1, so it tells us nothing beyond the trivial fact that every probability is at most 1. (A quick numerical check follows.)
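A minimal check of how loose the bound is, in Python (my own addition, not part of the original slides). It uses the fact that a sum of independent Poissons is Poisson, so the exact tail is available from scipy:

```python
# Sketch: compare the Markov bound with the exact tail probability.
# The sum S of 20 independent Poisson(1) variables is exactly Poisson(20).
from scipy.stats import poisson

markov_bound = 20 / 15           # E[sum_i X_i] / 15 = 4/3, which exceeds 1
exact = poisson.sf(14, mu=20)    # P(S >= 15) = P(S > 14), roughly 0.9

print(f"Markov bound: {markov_bound:.3f}")
print(f"Exact tail:   {exact:.3f}")
```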


Chebyshev’s inequality: Example

You have n independent Poisson(1) random variables X_1, ..., X_n. Use Chebyshev’s inequality to bound P(|X̄ − 1| ≥ 1).

- Since var(X̄) = var(X_1)/n and here ε = 1,

  P(|X̄ − 1| ≥ 1) ≤ var(X_1)/n = 1/n
                 = 1/10 when n = 10
                 = 1/100 when n = 100
                 ...

  Unlike the Markov bound above, this bound shrinks as n grows. (A simulation below shows how conservative it still is.)
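A small Monte Carlo check (the sample sizes and trial count are my arbitrary choices) comparing the empirical tail probability with the 1/n bound:

```python
# Sketch: empirical P(|Xbar - 1| >= 1) versus the Chebyshev bound 1/n.
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

for n in (10, 100):
    xbar = rng.poisson(1, size=(trials, n)).mean(axis=1)   # sample means
    empirical = np.mean(np.abs(xbar - 1) >= 1)             # empirical tail
    print(f"n = {n:3d}: empirical {empirical:.4f}  vs  bound {1/n:.4f}")
```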


Weak law of large numbers

The WLLN states that the sample mean of a large number of i.i.d. random variables is very close to the true mean with high probability.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².

- Let M_n = (X_1 + ··· + X_n)/n.

- E[M_n] = (E[X_1] + ··· + E[X_n])/n = µ

- var(M_n) = (var(X_1) + ··· + var(X_n))/n² = σ²/n

- So by Chebyshev, P(|M_n − µ| ≥ ε) ≤ σ²/(nε²).

- For large n this probability is small: for any fixed ε > 0 the bound goes to 0 as n → ∞. (The two identities above are checked numerically below.)
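A quick numerical check of the two identities in the derivation (a sketch; Uniform(0, 1) is my arbitrary choice of distribution):

```python
# Sketch: verify E[M_n] = mu and var(M_n) = sigma^2 / n by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 0.5, 1 / 12           # mean and variance of Uniform(0, 1)
n, trials = 50, 200_000

m = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)   # many copies of M_n
print(f"E[M_n]:   empirical {m.mean():.4f}    vs  mu        = {mu:.4f}")
print(f"var(M_n): empirical {m.var():.6f}  vs  sigma^2/n = {sigma2 / n:.6f}")
```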


Illustration

Consider the mean of n independent Poisson(1) random variables. For each n, we plot the distribution of the average.

[Figure not extracted: histograms of the average for increasing n, concentrating around the mean 1.]
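A sketch that reproduces this kind of figure (the particular values of n, the trial count, and the use of matplotlib are my choices):

```python
# Sketch: distribution of the mean of n Poisson(1) draws, for several n.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
trials = 20_000

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharex=True)
for ax, n in zip(axes, (5, 50, 500)):
    means = rng.poisson(1, size=(trials, n)).mean(axis=1)
    ax.hist(means, bins=40, density=True)   # concentrates around 1 as n grows
    ax.set_title(f"n = {n}")
    ax.set_xlabel("sample mean")
plt.tight_layout()
plt.show()
```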


Can we say more? Central Limit Theorem

It turns out that not only can you say that the sample mean is close to the true mean, you can actually predict its distribution, using the famous Central Limit Theorem.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².

- Let X̄_n = (X_1 + ··· + X_n)/n. Remember E[X̄_n] = µ and var(X̄_n) = σ²/n.

- Standardize X̄_n to get (X̄_n − µ)/(σ/√n).

- As n gets bigger, (X̄_n − µ)/(σ/√n) behaves more and more like a Normal(0, 1) random variable:

  P((X̄_n − µ)/(σ/√n) < z) ≈ Φ(z)
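A sketch checking the statement empirically; Exponential(1) is my choice of a skewed input distribution, to emphasize that the limit does not require symmetry:

```python
# Sketch: standardized sample means vs. the standard normal CDF.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 200, 100_000
mu, sigma = 1.0, 1.0                        # mean and sd of Exponential(1)

xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))      # standardized sample means

for q in (-1.0, 0.0, 1.0):
    print(f"P(Z < {q:+.0f}): empirical {np.mean(z < q):.4f}  vs  Phi = {norm.cdf(q):.4f}")
```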


Can we say more? Central Limit Theorem

Figure: (Courtesy: Tamara Broderick) You bet! [Image not extracted.]


Example

You have 20 independent Poisson(1) random variables X_1, ..., X_20. Use the CLT to approximate P(∑_{i=1}^{20} X_i ≥ 15).

- P(∑_i X_i ≥ 15) = P(∑_i X_i − 20 ≥ −5)
                  = P((X̄ − 1)/(1/√20) ≥ −0.25·√20)
                  ≈ P(Z ≥ −1.12) = 0.87

- How useful is this? Much better than Markov: instead of a vacuous bound we get an actual estimate of the probability. (It is compared with the exact value below.)
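A sketch placing the CLT estimate next to the exact value (again using the fact that the sum is Poisson(20)):

```python
# Sketch: CLT approximation vs. exact tail for S = sum of 20 Poisson(1)'s.
import numpy as np
from scipy.stats import norm, poisson

clt = norm.sf(-0.25 * np.sqrt(20))   # P(Z >= -1.118...), about 0.87
exact = poisson.sf(14, mu=20)        # P(S >= 15) with S ~ Poisson(20)

print(f"CLT approximation: {clt:.3f}")
print(f"Exact value:       {exact:.3f}")
```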


Example

An astronomer is interested in measuring the distance, in light-years, from his observatory to a distant star. Although the astronomer has a measuring technique, he knows that, because of changing atmospheric conditions and normal error, each measurement will not yield the exact distance, but merely an estimate. As a result, the astronomer plans to make a series of measurements and use their average as his estimate of the actual distance. If he believes that the measurements are independent and identically distributed random variables with common mean d (the actual distance) and common variance 4 (light-years squared), how many measurements does he need to make to be 95% sure that his estimate is accurate to within ±0.5 light-years?

- Let X̄_n be the mean of the measurements.

- How large does n have to be so that P(|X̄_n − d| ≤ 0.5) = 0.95?

- P(|X̄_n − d|/(2/√n) ≤ 0.25·√n) ≈ P(|Z| ≤ 0.25·√n) = 1 − 2·P(Z ≤ −0.25·√n) = 0.95

  P(Z ≤ −0.25·√n) = 0.025

  −0.25·√n = −1.96, so √n = 7.84

  n = 7.84² ≈ 61.5, so take n = 62.
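The same sample-size calculation in code (a sketch; scipy's normal quantile replaces the table lookup of 1.96):

```python
# Sketch: smallest n with P(|Xbar_n - d| <= 0.5) ~ 0.95 under the CLT.
import math
from scipy.stats import norm

sigma, half_width, conf = 2.0, 0.5, 0.95
z = norm.ppf(1 - (1 - conf) / 2)              # 97.5% quantile, about 1.96
n = math.ceil((z * sigma / half_width) ** 2)  # round up to a whole measurement

print(n)                                      # 62
```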


Normal Approximation to Binomial

The probability of selling an umbrella is 0.5 on a rainy day. If there are 400 umbrellas in the store, what's the probability that the owner will sell at least 180?

- Let X be the total number of umbrellas sold.

- X ∼ Binomial(400, 0.5)

- We want P(X ≥ 180). Computing this exactly means summing a couple of hundred binomial terms: crazy calculations.

- But can we approximate the distribution of X/n?

- X/n = (∑_i Y_i)/n where the Y_i are i.i.d. Bernoulli(0.5), so E[Y_i] = 0.5 and var(Y_i) = 0.25.

- Sure! The CLT tells us that for large n, (X/400 − 0.5)/√(0.25/400) is approximately N(0, 1).

- So P(X ≥ 180) = P((X − 200)/√100 ≥ −2) ≈ P(Z ≥ −2) = 1 − Φ(−2) ≈ 0.977
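A sketch comparing the slide's plain CLT answer with the exact binomial tail, plus the standard continuity correction (the correction is a common refinement, not something the slide uses):

```python
# Sketch: exact Binomial(400, 0.5) tail vs. normal approximations.
from scipy.stats import binom, norm

n, p = 400, 0.5
mean, sd = n * p, (n * p * (1 - p)) ** 0.5    # 200 and 10

exact = binom.sf(179, n, p)                   # P(X >= 180)
plain = norm.sf((180 - mean) / sd)            # the slide's computation, ~0.977
corrected = norm.sf((179.5 - mean) / sd)      # with continuity correction

print(f"exact {exact:.4f}, CLT {plain:.4f}, corrected {corrected:.4f}")
```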


Frequentist Statistics

- The parameter(s) θ are fixed and unknown.

- Data are generated through the likelihood function p(X; θ) (if discrete) or f(X; θ) (if continuous).

- Now we will be dealing with multiple candidate models, one for each value of θ.

- We will use E_θ[h(X)] to denote the expectation of the random variable h(X) as a function of the parameter θ.


Problems we will look at

- Parameter estimation: We want to estimate unknown parameters from data.

- Maximum likelihood estimation (Section 9.1): Select the parameter that makes the observed data most likely, i.e., maximize the probability of obtaining the data at hand. (A tiny sketch follows this list.)

- Hypothesis testing: An unknown parameter takes a finite number of values. One wants to find the best hypothesis based on the data.

- Significance testing: Given a hypothesis, figure out the rejection region, and reject the hypothesis if the observation falls within this region.
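A minimal preview of maximum likelihood, using the Poisson model from this lecture (the simulated data, the grid of candidates, and the true parameter are my invented choices):

```python
# Sketch: grid-search MLE for theta in a Poisson(theta) model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.poisson(1.0, size=20)       # stand-in for observed X_1, ..., X_20

thetas = np.linspace(0.1, 3.0, 500)    # candidate parameter values
# Poisson log-likelihood up to an additive constant: sum_i [X_i log(theta) - theta]
loglik = np.array([np.sum(data * np.log(t) - t) for t in thetas])

theta_hat = thetas[np.argmax(loglik)]
print(theta_hat, data.mean())          # close to the sample mean, the closed-form MLE
```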