
SDS 321: Introduction to Probability and Statistics

Lecture 23: Continuous random variables - Inequalities, CLT

Purnamrita Sarkar
Department of Statistics and Data Science
The University of Texas at Austin

www.cs.cmu.edu/~psarkar/teaching


Roadmap

- Examples of Markov and Chebyshev
- Weak law of large numbers and CLT
- Normal approximation to Binomial


Markov’s inequality: Example

You have 20 independent Poisson(1) random variables X_1, ..., X_20. Use Markov’s inequality to bound P(∑_{i=1}^{20} X_i ≥ 15).

- P(∑_i X_i ≥ 15) ≤ E[∑_i X_i]/15 = 20/15 = 4/3

- How useful is this? Not at all: the bound exceeds 1, so it tells us nothing beyond the trivial fact that every probability is at most 1. (A quick numerical check follows.)
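A minimal check of how loose the bound is, in Python (my own addition, not part of the original slides). It uses the fact that a sum of independent Poissons is Poisson, so the exact tail is available from scipy:

```python
# Sketch: compare the Markov bound with the exact tail probability.
# The sum S of 20 independent Poisson(1) variables is exactly Poisson(20).
from scipy.stats import poisson

markov_bound = 20 / 15           # E[sum_i X_i] / 15 = 4/3, which exceeds 1
exact = poisson.sf(14, mu=20)    # P(S >= 15) = P(S > 14), roughly 0.9

print(f"Markov bound: {markov_bound:.3f}")
print(f"Exact tail:   {exact:.3f}")
```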


Chebyshev’s inequality: Example

You have n independent Poisson(1) random variables X_1, ..., X_n. Use Chebyshev’s inequality to bound P(|X̄ − 1| ≥ 1).

- Since var(X̄) = var(X_1)/n and here ε = 1,

  P(|X̄ − 1| ≥ 1) ≤ var(X_1)/n = 1/n
                 = 1/10 when n = 10
                 = 1/100 when n = 100
                 ...

  Unlike the Markov bound above, this bound shrinks as n grows. (A simulation below shows how conservative it still is.)
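A small Monte Carlo check (the sample sizes and trial count are my arbitrary choices) comparing the empirical tail probability with the 1/n bound:

```python
# Sketch: empirical P(|Xbar - 1| >= 1) versus the Chebyshev bound 1/n.
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

for n in (10, 100):
    xbar = rng.poisson(1, size=(trials, n)).mean(axis=1)   # sample means
    empirical = np.mean(np.abs(xbar - 1) >= 1)             # empirical tail
    print(f"n = {n:3d}: empirical {empirical:.4f}  vs  bound {1/n:.4f}")
```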


Weak law of large numbers

The WLLN states that the sample mean of a large number of i.i.d. random variables is very close to the true mean with high probability.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².

- Let M_n = (X_1 + ··· + X_n)/n.

- E[M_n] = (E[X_1] + ··· + E[X_n])/n = µ

- var(M_n) = (var(X_1) + ··· + var(X_n))/n² = σ²/n

- So by Chebyshev, P(|M_n − µ| ≥ ε) ≤ σ²/(nε²).

- For large n this probability is small: for any fixed ε > 0 the bound goes to 0 as n → ∞. (The two identities above are checked numerically below.)
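A quick numerical check of the two identities in the derivation (a sketch; Uniform(0, 1) is my arbitrary choice of distribution):

```python
# Sketch: verify E[M_n] = mu and var(M_n) = sigma^2 / n by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 0.5, 1 / 12           # mean and variance of Uniform(0, 1)
n, trials = 50, 200_000

m = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)   # many copies of M_n
print(f"E[M_n]:   empirical {m.mean():.4f}    vs  mu        = {mu:.4f}")
print(f"var(M_n): empirical {m.var():.6f}  vs  sigma^2/n = {sigma2 / n:.6f}")
```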


Illustration

Consider the mean of n independent Poisson(1) random variables. For each n, we plot the distribution of the average.

[Figure not extracted: histograms of the average for increasing n, concentrating around the mean 1.]
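A sketch that reproduces this kind of figure (the particular values of n, the trial count, and the use of matplotlib are my choices):

```python
# Sketch: distribution of the mean of n Poisson(1) draws, for several n.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
trials = 20_000

fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharex=True)
for ax, n in zip(axes, (5, 50, 500)):
    means = rng.poisson(1, size=(trials, n)).mean(axis=1)
    ax.hist(means, bins=40, density=True)   # concentrates around 1 as n grows
    ax.set_title(f"n = {n}")
    ax.set_xlabel("sample mean")
plt.tight_layout()
plt.show()
```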


Can we say more? Central Limit Theorem

It turns out that not only can you say that the sample mean is close to the true mean, you can actually predict its distribution, using the famous Central Limit Theorem.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².

- Let X̄_n = (X_1 + ··· + X_n)/n. Remember E[X̄_n] = µ and var(X̄_n) = σ²/n.

- Standardize X̄_n to get (X̄_n − µ)/(σ/√n).

- As n gets bigger, (X̄_n − µ)/(σ/√n) behaves more and more like a Normal(0, 1) random variable:

  P((X̄_n − µ)/(σ/√n) < z) ≈ Φ(z)
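A sketch checking the statement empirically; Exponential(1) is my choice of a skewed input distribution, to emphasize that the limit does not require symmetry:

```python
# Sketch: standardized sample means vs. the standard normal CDF.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 200, 100_000
mu, sigma = 1.0, 1.0                        # mean and sd of Exponential(1)

xbar = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))      # standardized sample means

for q in (-1.0, 0.0, 1.0):
    print(f"P(Z < {q:+.0f}): empirical {np.mean(z < q):.4f}  vs  Phi = {norm.cdf(q):.4f}")
```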


Can we say more? Central Limit Theorem

Figure: (Courtesy: Tamara Broderick) You bet! [Image not extracted.]


Example

You have 20 independent Poisson(1) random variables X_1, ..., X_20. Use the CLT to approximate P(∑_{i=1}^{20} X_i ≥ 15).

- P(∑_i X_i ≥ 15) = P(∑_i X_i − 20 ≥ −5)
                  = P((X̄ − 1)/(1/√20) ≥ −0.25·√20)
                  ≈ P(Z ≥ −1.12) = 0.87

- How useful is this? Much better than Markov: instead of a vacuous bound we get an actual estimate of the probability. (It is compared with the exact value below.)
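A sketch placing the CLT estimate next to the exact value (again using the fact that the sum is Poisson(20)):

```python
# Sketch: CLT approximation vs. exact tail for S = sum of 20 Poisson(1)'s.
import numpy as np
from scipy.stats import norm, poisson

clt = norm.sf(-0.25 * np.sqrt(20))   # P(Z >= -1.118...), about 0.87
exact = poisson.sf(14, mu=20)        # P(S >= 15) with S ~ Poisson(20)

print(f"CLT approximation: {clt:.3f}")
print(f"Exact value:       {exact:.3f}")
```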


Example

An astronomer is interested in measuring the distance, in light-years, from his observatory to a distant star. Although the astronomer has a measuring technique, he knows that, because of changing atmospheric conditions and normal error, each measurement will not yield the exact distance, but merely an estimate. As a result, the astronomer plans to make a series of measurements and use their average as his estimate of the actual distance. If he believes that the measurements are independent and identically distributed random variables with common mean d (the actual distance) and common variance 4 (light-years squared), how many measurements does he need to make to be 95% sure that his estimate is accurate to within ±0.5 light-years?

- Let X̄_n be the mean of the measurements.

- How large does n have to be so that P(|X̄_n − d| ≤ 0.5) = 0.95?

- P(|X̄_n − d|/(2/√n) ≤ 0.25·√n) ≈ P(|Z| ≤ 0.25·√n) = 1 − 2·P(Z ≤ −0.25·√n) = 0.95

  P(Z ≤ −0.25·√n) = 0.025

  −0.25·√n = −1.96, so √n = 7.84

  n = 7.84² ≈ 61.5, so take n = 62.
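The same sample-size calculation in code (a sketch; scipy's normal quantile replaces the table lookup of 1.96):

```python
# Sketch: smallest n with P(|Xbar_n - d| <= 0.5) ~ 0.95 under the CLT.
import math
from scipy.stats import norm

sigma, half_width, conf = 2.0, 0.5, 0.95
z = norm.ppf(1 - (1 - conf) / 2)              # 97.5% quantile, about 1.96
n = math.ceil((z * sigma / half_width) ** 2)  # round up to a whole measurement

print(n)                                      # 62
```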


Normal Approximation to Binomial

The probability of selling an umbrella is 0.5 on a rainy day. If there are 400 umbrellas in the store, what's the probability that the owner will sell at least 180?

- Let X be the total number of umbrellas sold.

- X ∼ Binomial(400, 0.5)

- We want P(X ≥ 180). Computing this exactly means summing a couple of hundred binomial terms: crazy calculations.

- But can we approximate the distribution of X/n?

- X/n = (∑_i Y_i)/n where the Y_i are i.i.d. Bernoulli(0.5), so E[Y_i] = 0.5 and var(Y_i) = 0.25.

- Sure! The CLT tells us that for large n, (X/400 − 0.5)/√(0.25/400) is approximately N(0, 1).

- So P(X ≥ 180) = P((X − 200)/√100 ≥ −2) ≈ P(Z ≥ −2) = 1 − Φ(−2) ≈ 0.977
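A sketch comparing the slide's plain CLT answer with the exact binomial tail, plus the standard continuity correction (the correction is a common refinement, not something the slide uses):

```python
# Sketch: exact Binomial(400, 0.5) tail vs. normal approximations.
from scipy.stats import binom, norm

n, p = 400, 0.5
mean, sd = n * p, (n * p * (1 - p)) ** 0.5    # 200 and 10

exact = binom.sf(179, n, p)                   # P(X >= 180)
plain = norm.sf((180 - mean) / sd)            # the slide's computation, ~0.977
corrected = norm.sf((179.5 - mean) / sd)      # with continuity correction

print(f"exact {exact:.4f}, CLT {plain:.4f}, corrected {corrected:.4f}")
```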


Frequentist Statistics

- The parameter(s) θ are fixed and unknown.

- Data are generated through the likelihood function p(X; θ) (if discrete) or f(X; θ) (if continuous).

- Now we will be dealing with multiple candidate models, one for each value of θ.

- We will use E_θ[h(X)] to denote the expectation of the random variable h(X) as a function of the parameter θ.


Problems we will look at

- Parameter estimation: We want to estimate unknown parameters from data.

- Maximum likelihood estimation (Section 9.1): Select the parameter that makes the observed data most likely, i.e., maximize the probability of obtaining the data at hand. (A tiny sketch follows this list.)

- Hypothesis testing: An unknown parameter takes a finite number of values. One wants to find the best hypothesis based on the data.

- Significance testing: Given a hypothesis, figure out the rejection region, and reject the hypothesis if the observation falls within this region.
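A minimal preview of maximum likelihood, using the Poisson model from this lecture (the simulated data, the grid of candidates, and the true parameter are my invented choices):

```python
# Sketch: grid-search MLE for theta in a Poisson(theta) model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.poisson(1.0, size=20)       # stand-in for observed X_1, ..., X_20

thetas = np.linspace(0.1, 3.0, 500)    # candidate parameter values
# Poisson log-likelihood up to an additive constant: sum_i [X_i log(theta) - theta]
loglik = np.array([np.sum(data * np.log(t) - t) for t in thetas])

theta_hat = thetas[np.argmax(loglik)]
print(theta_hat, data.mean())          # close to the sample mean, the closed-form MLE
```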