18: Central Limit Theorem
Lisa Yan, CS109, November 1, 2019
Beta random variable (Review)
def: A Beta random variable $X \sim \text{Beta}(a, b)$, with $a > 0$ and $b > 0$, has support $(0, 1)$ and PDF
$$f(x) = \frac{1}{B(a, b)} x^{a-1} (1-x)^{b-1}$$
where $B(a, b) = \int_0^1 x^{a-1} (1-x)^{b-1} \, dx$ is the normalizing constant.
Expectation: $E[X] = \dfrac{a}{a+b}$    Variance: $\text{Var}(X) = \dfrac{ab}{(a+b)^2 (a+b+1)}$
Beta is a distribution for probabilities.
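A quick numerical check of these formulas, as a minimal sketch (not from the slides; assumes scipy is installed):

```python
# Sanity-check the Beta mean/variance formulas above with scipy.stats.
from scipy import stats

a, b = 8, 2
X = stats.beta(a, b)

print(X.mean(), a / (a + b))                        # E[X] = a/(a+b) = 0.8
print(X.var(), a * b / ((a + b)**2 * (a + b + 1)))  # Var(X) = ab/((a+b)^2 (a+b+1))
print(X.pdf(0.7))                                   # density f(0.7)
```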
CS109 focus: Beta where $a, b$ are both positive integers
If $a, b$ are positive integers, the Beta parameters $a, b$ could come from an experiment:
• $a$ = "successes" + 1
• $b$ = "failures" + 1
$X \sim \text{Beta}(a, b)$
[Plot: PDFs of Beta(2,8), Beta(5,5), and Beta(8,2) over $0 \le x \le 1$]
Back to flipping coins
• Start with a prior $X \sim \text{Uni}(0, 1)$ over the probability of heads.
• Observe $n = 7$ successes and $m = 1$ failure.
• Your new belief about the probability $X$ is:
$$f_{X|N}(x|n) = \frac{1}{c} x^7 (1-x)^1, \quad \text{where } c = \int_0^1 x^7 (1-x)^1 \, dx$$
Rewriting the exponents, $f_{X|N}(x|n) = \frac{1}{c} x^{8-1} (1-x)^{2-1}$, so the posterior belief $X|N$ is $\text{Beta}(a = 8, b = 2)$, i.e., $\text{Beta}(a = n + 1, b = m + 1)$.
[Plot: posterior PDF $f_{X|N}(x|n)$ over $x$]
Understanding Beta
• Start with a prior $X \sim \text{Uni}(0, 1)$ over the probability.
• Observe $n$ successes and $m$ failures.
• Your new belief about the probability $X$ is: $X|N \sim \text{Beta}(a = n + 1, b = m + 1)$
Check this out: $\text{Beta}(a = 1, b = 1)$ has PDF
$$f(x) = \frac{1}{B(a, b)} x^{a-1} (1-x)^{b-1} = \frac{1}{B(1, 1)} x^0 (1-x)^0 = \frac{1}{\int_0^1 1 \, dx} = 1, \quad 0 < x < 1$$
So our prior $X \sim \text{Uni}(0, 1)$ is exactly $X \sim \text{Beta}(a = 1, b = 1)$!
If the prior is a Beta…
Let $X$ be our random variable for the probability of success, and $N$ the number of observed successes.
• If our prior belief about $X$ is Beta: $X \sim \text{Beta}(a, b)$
• …and if we observe $n$ successes and $m$ failures (likelihood: $N|X \sim \text{Bin}(n + m, x)$):
• …then our posterior belief about $X$ is also Beta: $X|N \sim \text{Beta}(a + n, b + m)$
👉 This is the main takeaway of Beta. (A small sketch of the update follows below.)
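A minimal sketch of this update in Python (the helper name `beta_update` is mine, not from the slides):

```python
# Conjugate update: Beta(a, b) prior + n successes, m failures
# -> Beta(a + n, b + m) posterior. No integration required.
def beta_update(a, b, n, m):
    """Posterior Beta parameters after observing n successes and m failures."""
    return a + n, b + m

# Coin example from earlier: Beta(1, 1) = Uni(0, 1) prior,
# observe 7 heads and 1 tail -> Beta(8, 2) posterior.
print(beta_update(1, 1, 7, 1))  # (8, 2)
```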
Proof:
$$f_{X|N}(x|n) = \frac{p_{N|X}(n|x) \, f_X(x)}{p_N(n)} = \frac{\binom{n+m}{n} x^n (1-x)^m \cdot \frac{1}{B(a, b)} x^{a-1} (1-x)^{b-1}}{p_N(n)}$$
$$= C \cdot x^n (1-x)^m \cdot x^{a-1} (1-x)^{b-1} \quad (C \text{ absorbs constants that don't depend on } x)$$
$$= C \cdot x^{n+a-1} (1-x)^{m+b-1} \quad ✅$$
This is proportional to a $\text{Beta}(a + n, b + m)$ PDF, so $X|N \sim \text{Beta}(a + n, b + m)$.
Beta is a conjugate distribution.
• The prior and posterior parametric forms are the same.
• Practically, conjugacy means an easy update: add the number of "heads" and "tails" seen to the Beta parameters.
You can set the prior to reflect how biased you think the coin is a priori.
• This is a subjective probability!
• $X \sim \text{Beta}(a, b)$: as if we have seen $a + b - 2$ imaginary trials, where $a - 1$ were heads and $b - 1$ were tails.
• Then $\text{Beta}(1, 1) = \text{Uni}(0, 1)$ means we haven't seen any imaginary trials.
(Bayesian interpretation)
Prior: $\text{Beta}(a = n_{\text{imag}} + 1, \; b = m_{\text{imag}} + 1)$
Posterior: $\text{Beta}(a = n_{\text{imag}} + n + 1, \; b = m_{\text{imag}} + m + 1)$
The enchanted die
Let $X$ be the probability of rolling a 6 on Lisa's die.
• Prior: imagine 5 die rolls where only 6 showed up, i.e., $n_{\text{imag}} = 5$ and $m_{\text{imag}} = 0$, so $X \sim \text{Beta}(6, 1)$.
• Observation: roll it a few times…
What is the updated distribution of $X$ after our observation? Check out the demo (a sketch follows below)!
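The demo itself isn't in this transcript; here is a minimal sketch of what it could look like, assuming the die is secretly fair and using the Beta(6, 1) prior above (details are illustrative):

```python
# Sketch of the enchanted-die demo: start from the Beta(6, 1) prior
# (5 imaginary rolls, all sixes) and update on simulated rolls.
import random

a, b = 6, 1                      # prior: 5 imaginary "successes" (sixes)
for _ in range(20):              # "roll it a few times"
    roll = random.randint(1, 6)  # assumption: the die is actually fair
    if roll == 6:
        a += 1                   # success: rolled a 6
    else:
        b += 1                   # failure: rolled anything else
print(f"updated belief: X ~ Beta({a}, {b})")
```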
Medicinal Beta
• Before being tested, a medicine is believed to "work" 80% of the time.
• The medicine is tried on 20 patients.
• It "works" for 12 and "doesn't work" for 8.
What is your new belief that the drug "works"?
Frequentist: let $p$ be the probability your drug works. Then $p \approx \frac{12}{20} = 0.6$.
👉 A frequentist view will not incorporate prior/expert belief about the probability.
Bayesian: let $X$ be the probability your drug works. $X$ is a random variable.
🤔 What is the prior distribution of $X$? (select all that apply)
A. $X \sim \text{Beta}(1, 1) = \text{Uni}(0, 1)$
B. $X \sim \text{Beta}(81, 101)$
C. $X \sim \text{Beta}(80, 20)$
D. $X \sim \text{Beta}(81, 21)$
E. $X \sim \text{Beta}(5, 2)$
(Bayesian interpretation: prior $\text{Beta}(a = n_{\text{imag}} + 1, b = m_{\text{imag}} + 1)$, posterior $\text{Beta}(a = n_{\text{imag}} + n + 1, b = m_{\text{imag}} + m + 1)$)
Answers: D and E.
• D: $\text{Beta}(81, 21)$ — interpretation: 80 successes / 100 imaginary trials.
• E: $\text{Beta}(5, 2)$ — interpretation: 4 successes / 5 imaginary trials.
(You can choose either; we choose E on the next slide.)
Prior: $X \sim \text{Beta}(a = 5, b = 2)$
Posterior: $X|N \sim \text{Beta}(a = 5 + 12, b = 2 + 8) = \text{Beta}(a = 17, b = 10)$
[Plot: prior Beta(5, 2) and posterior Beta(17, 10) PDFs over $0 \le x \le 1$]
What do you report to pharmacists?
A. Expectation of posterior
B. Mode of posterior
C. Distribution of posterior
D. Nothing
$$E[X] = \frac{a}{a+b} = \frac{17}{17+10} \approx 0.63 \qquad \text{mode}(X) = \frac{a-1}{a+b-2} = \frac{16}{16+9} = 0.64$$
[Plot: prior and posterior PDFs over $x$, with the posterior mode marked]
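A tiny sketch checking both summaries with the slide's formulas (plain Python):

```python
# Posterior summaries for the medicine example, X|N ~ Beta(17, 10).
a, b = 17, 10
mean = a / (a + b)            # 17/27 ~ 0.63
mode = (a - 1) / (a + b - 2)  # 16/25 = 0.64
print(round(mean, 2), round(mode, 2))
```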
Food for thought
In this lecture, $Y \sim \text{Ber}(p)$: if we don't know the parameter $p$, Bayesian statisticians will
• treat the parameter as a random variable $X$ with a Beta prior distribution,
• perform an experiment, and
• based on the experiment outcomes, update the posterior distribution of $X$.
🤔 Food for thought: any parameter of a "parameterized" random variable can be thought of as a random variable, e.g., $\mu$ and $\sigma^2$ in $Y \sim \mathcal{N}(\mu, \sigma^2)$.
Today's plan
• Finish Beta
• Central Limit Theorem (CLT)
• CLT exercises
(silent drumroll)
Central Limit Theorem
Consider $n$ independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_n$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:
$$\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$$
The sum of $n$ i.i.d. random variables is normally distributed with mean $n\mu$ and variance $n\sigma^2$.
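A minimal simulation sketch of this statement (my own example, assuming numpy; here $X_i \sim \text{Uni}(0, 1)$, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
# Empirically, the sum of n i.i.d. Uniform(0,1) RVs has mean ~ n*mu
# and variance ~ n*sigma^2, and its histogram looks Normal for large n.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 50_000
sums = rng.random((trials, n)).sum(axis=1)  # 50,000 draws of the n-term sum

mu, sigma2 = 0.5, 1 / 12
print(sums.mean(), n * mu)     # ~ 50
print(sums.var(), n * sigma2)  # ~ 8.33
```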
True happiness
🤔 Quick check: what dimensions are the following random variables?
(Options: A. 1-D random variable; B. $n$-D random variable (a vector); C. not a random variable)
1. $X_1$ — (A)
2. $(X_1, X_2, \ldots, X_n)$ — (B) (aka a sample)
3. $\sum_{i=1}^{n} X_i$ — (A)
4. $\frac{1}{n} \sum_{i=1}^{n} X_i$ — (A) (aka the sample mean)
i.i.d. random variables
Consider $n$ variables $X_1, X_2, \ldots, X_n$. They are independent and identically distributed (i.i.d.) if
• $X_1, X_2, \ldots, X_n$ are independent, and
• all have the same PMF (if discrete) or PDF (if continuous).
⇒ $E[X_i] = \mu$ for $i = 1, \ldots, n$
⇒ $\text{Var}(X_i) = \sigma^2$ for $i = 1, \ldots, n$
Side note: multiple random variables $X_1, X_2, \ldots, X_n$ are independent iff
$$P(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k) = \prod_{i=1}^{k} P(X_i \le x_i)$$
for all subsets $X_1, \ldots, X_k$.
Same thing: i.i.d., iid, IID
🤔 Quick check: are $X_1, X_2, \ldots, X_n$ i.i.d. with the following distributions?
1. $X_i \sim \text{Exp}(\lambda)$, $X_i$ independent — ✅
2. $X_i \sim \text{Exp}(\lambda_i)$, $X_i$ independent — not identically distributed (unless the $\lambda_i$ are equal)
3. $X_i \sim \text{Exp}(\lambda)$, $X_1 = X_2 = \cdots = X_n$ — dependent: $X_1 = X_2 = \cdots = X_n$
4. $X_i \sim \text{Bin}(n_i, p)$, $X_i$ independent — not identically distributed (unless the $n_i$ are equal); note the underlying Bernoulli RVs are i.i.d.!
(demo)
Sum of dice rolls
Roll $n$ independent dice. Let $X_i$ be the outcome of roll $i$; the $X_i$ are i.i.d.
[Plots: PMFs of $\sum_{i=1}^{1} X_i$, $\sum_{i=1}^{2} X_i$, and $\sum_{i=1}^{3} X_i$ — the sums of 1, 2, and 3 die rolls. How many ways can you roll a total of 3 vs. 11?]
CLT explains a lot
Normal approximation of the Binomial: a sum of i.i.d. Bernoulli RVs ≈ Normal.
Proof: Let $X_i \sim \text{Ber}(p)$ for $i = 1, \ldots, n$, where the $X_i$ are i.i.d., so $E[X_i] = p$ and $\text{Var}(X_i) = p(1-p)$.
If $X \sim \text{Bin}(n, p)$, then $X = \sum_{i=1}^{n} X_i$.
By the CLT, as $n \to \infty$: $X \sim \mathcal{N}(n\mu, n\sigma^2)$.
Substituting the mean and variance of the Bernoulli: $X \sim \mathcal{N}(np, np(1-p))$.
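A small sketch comparing the exact Binomial CDF with this Normal approximation (assumes scipy; the 0.5 shift is the continuity correction discussed later in the lecture):

```python
# Bin(n, p) vs. its CLT approximation N(np, np(1-p)).
from scipy import stats

n, p = 100, 0.5
binom = stats.binom(n, p)
normal = stats.norm(n * p, (n * p * (1 - p)) ** 0.5)

print(binom.cdf(55))     # exact P(X <= 55) ~ 0.8644
print(normal.cdf(55.5))  # Normal approximation with continuity correction ~ 0.8643
```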
CLT explains a lot (continued)
[Plots: distribution of $X_i$ vs. distribution of $\sum_{i=1}^{15} X_i$ — take a sample of size 15 and sum the values.]
[Plots: distribution of $X_i$ vs. distribution of $\frac{1}{15} \sum_{i=1}^{15} X_i$ (the sample mean) — take a sample of size 15 and average the values.]
[Image: Galton Board, by Sir Francis Galton (1822–1911), with $n = 5$ rows of pegs and bins 0–5.]
Break for Friday / announcements
[Image: BAYESIANS meme]
• Midterm exam: it's done! Grades: Friday 11/1. Solutions: Friday 11/1.
• Problem Set 4: due Wednesday 11/6. Covers up to the Law of Total Expectation.
• Late day reminder: no late days permitted past the last day of the quarter, 12/7.
Proof of CLT
(This proof is beyond the scope of CS109.)
• The Fourier transform of a PDF is called a characteristic function.
• Take the characteristic function of the probability mass of the sample distance from the mean, divided by the standard deviation.
• Show that this approaches an exponential function in the limit as $n \to \infty$: $f(x) = e^{-x^2/2}$
• This function is in turn the characteristic function of the Standard Normal, $Z \sim \mathcal{N}(0, 1)$.
Implications of CLT
Anything that is a sum/average of independent random variables is normal, meaning that in real life, many things are normally distributed:
• Movie ratings: averages of independent viewer scores
• Polling:
  ◦ Ask 100 people if they will vote for candidate 1
  ◦ $\hat{p}_1 = \#\text{"yes"} / 100$
  ◦ Sum of Bernoulli RVs (each person independently says "yes" w.p. $p$)
  ◦ Repeat this process with different groups to get $\hat{p}_1, \ldots, \hat{p}_k$ (different sample statistics)
  ◦ Normal distribution over sample means $\hat{p}_i$
  ◦ Confidence interval: "How likely is it that an estimate for the true $p$ is close?"
What about other functions?
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:
• Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
• Average of i.i.d. RVs (sample mean): ?
• Max of i.i.d. RVs: ?
🤔 Distribution of sample mean
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:
Define the sum $Y = \sum_{i=1}^{n} X_i$ and the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{n} Y$.
By the CLT, $Y \sim \mathcal{N}(n\mu, n\sigma^2)$, and $\bar{X} = \frac{1}{n} Y$ is a linear transform of a Normal, so $\bar{X} \sim \mathcal{N}(?, ?)$:
A. $\bar{X} \sim \mathcal{N}(n\mu, n\sigma^2)$
B. $\bar{X} \sim \mathcal{N}\left(\frac{\mu}{n}, \frac{\sigma^2}{n}\right)$
C. $\bar{X} \sim \mathcal{N}(\mu, \sigma^2)$
D. $\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$
Answer: D. Since $\bar{X} = \frac{1}{n} Y$ with $Y \sim \mathcal{N}(n\mu, n\sigma^2)$ (CLT, as $n \to \infty$; linear transform of a Normal):
$$\frac{1}{n} \sum_{i=1}^{n} X_i \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$
The average of $n$ i.i.d. random variables (i.e., the sample mean) is normally distributed with mean $\mu$ and variance $\sigma^2 / n$.
Demo: http://onlinestatbook.com/stat_sim/sampling_dist/
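A simulation sketch of the sample-mean result (my own example with $X_i \sim \text{Exp}(1)$, so $\mu = 1$ and $\sigma^2 = 1$; assumes numpy):

```python
# The sample mean of n i.i.d. Exp(1) RVs concentrates around mu = 1
# with variance ~ sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 400, 50_000
xbars = rng.exponential(1.0, (trials, n)).mean(axis=1)

print(xbars.mean())  # ~ mu = 1
print(xbars.var())   # ~ sigma^2 / n = 1/400 = 0.0025
```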
What about other functions?
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:
• Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
• Average of i.i.d. RVs (sample mean): $\frac{1}{n} \sum_{i=1}^{n} X_i \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$
• Max of i.i.d. RVs: Gumbel (see the Fisher–Tippett–Gnedenko theorem)
Once upon a time…
1733: Abraham de Moivre postulates the CLT for $X \sim \text{Ber}(1/2)$.
(Pictured: Abraham de Moivre and Aubrey Drake Graham, a.k.a. Drake.)
A short history of the CLT
• 1733: CLT for $X \sim \text{Ber}(1/2)$ postulated by Abraham de Moivre
• 1823: Pierre-Simon Laplace extends de Moivre's work to approximating $\text{Bin}(n, p)$ with the Normal
• 1901: Aleksandr Lyapunov provides a precise definition and rigorous proof of the CLT
• 2018: Drake releases Scorpion — his 5th studio album, bringing his total # of songs to 190; the mean quality of subsamples of songs is normally distributed (thanks to the Central Limit Theorem)
Today's plan
• Central Limit Theorem (CLT)
• CLT exercises
Working with the CLT
Let $X_1, X_2, \ldots, X_n$ be i.i.d., where $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$. As $n \to \infty$:
• Sum of i.i.d. RVs: $\sum_{i=1}^{n} X_i \sim \mathcal{N}(n\mu, n\sigma^2)$
• Average of i.i.d. RVs (sample mean): $\frac{1}{n} \sum_{i=1}^{n} X_i \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$
⚠️ If the $X_i$ are discrete: use the continuity correction on the approximating Normal $Y$!
🤔 Dice game
You will roll 10 six-sided dice $X_1, X_2, \ldots, X_{10}$.
• Let $X = X_1 + X_2 + \cdots + X_{10}$, the total value of all 10 rolls.
• You win if $X \le 25$ or $X \ge 45$. And now the truth (according to the CLT)…
1. Define RVs and state goal: $E[X_i] = 3.5$ and $\text{Var}(X_i) = 35/12$, so $X \approx Y \sim \mathcal{N}(10 \cdot 3.5, \; 10 \cdot 35/12)$. Want: $P(X \le 25 \text{ or } X \ge 45)$.
2. Solve. Which expression approximates this probability?
A. $P(25 \le Y \le 45)$
B. $P(Y \le 25.5) + P(Y \ge 44.5)$
C. $1 - P(25 \le Y \le 45)$
D. $1 - P(25.5 \le Y \le 44.5)$
Answer: B, using the continuity correction.
$$P(X \le 25 \text{ or } X \ge 45) \approx P(Y \le 25.5) + P(Y \ge 44.5)$$
$$= \Phi\left(\frac{25.5 - 35}{\sqrt{10 \cdot 35/12}}\right) + \left(1 - \Phi\left(\frac{44.5 - 35}{\sqrt{10 \cdot 35/12}}\right)\right)$$
$$\approx \Phi(-1.76) + (1 - \Phi(1.76)) \approx (1 - 0.9608) + (1 - 0.9608) = 0.0784$$
Comparing the two answers:
• By CLT: $P(Y \le 25.5) + P(Y \ge 44.5) \approx 0.0784$
• By computer (exact): $P(X \le 25 \text{ or } X \ge 45) \approx 0.0780$
[Plot: exact PMF of the sum of 10 dice over 10–60, with the winning regions in the tails.]
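A sketch reproducing both numbers (assumes numpy and scipy; computing the exact PMF by repeated convolution is my choice of method, not the slides'):

```python
import numpy as np
from scipy import stats

# Exact: PMF of the sum of 10 fair dice, by convolving the die PMF 10 times.
die = np.ones(6) / 6
pmf = np.array([1.0])            # sum of 0 dice is 0 with probability 1
for _ in range(10):
    pmf = np.convolve(pmf, die)  # after 10 rounds: index k <-> total k + 10
sums = np.arange(10, 61)
exact = pmf[(sums <= 25) | (sums >= 45)].sum()

# Approximate: CLT with continuity correction.
Y = stats.norm(10 * 3.5, (10 * 35 / 12) ** 0.5)
approx = Y.cdf(25.5) + Y.sf(44.5)

print(round(exact, 4))   # ~ 0.0780 (by computer)
print(round(approx, 4))  # ~ 0.0784 (by CLT)
```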
Clock running time
We want to find the mean (clock) runtime of an algorithm, $\mu = t$ sec.
• Suppose the variance of the runtime is $\sigma^2 = 4$ sec².
How many trials do we need s.t. the estimated time is $t \pm 0.5$ with 95% certainty?
1. Define RVs and state goal. Run the algorithm repeatedly (i.i.d. trials):
• $X_i$ = runtime of the $i$-th run (for $1 \le i \le n$)
• Estimate the runtime as the average of $n$ trials, $\bar{X}$.
By the CLT, $\bar{X} \sim \mathcal{N}\left(t, \frac{4}{n}\right)$.
Want: $P(t - 0.5 \le \bar{X} \le t + 0.5) = 0.95$, i.e., $P(-0.5 \le \bar{X} - t \le 0.5) = 0.95$, where $\bar{X} - t \sim \mathcal{N}\left(0, \frac{4}{n}\right)$ (linear transform of a Normal).
2. Solve.
$$0.95 = F_{\bar{X} - t}(0.5) - F_{\bar{X} - t}(-0.5) = \Phi\left(\frac{0.5 - 0}{\sqrt{4/n}}\right) - \Phi\left(\frac{-0.5 - 0}{\sqrt{4/n}}\right) = 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$$
$$\Phi\left(\frac{\sqrt{n}}{4}\right) = 0.975 \quad\Rightarrow\quad \frac{\sqrt{n}}{4} = \Phi^{-1}(0.975) \approx 1.96 \quad\Rightarrow\quad n \approx 62$$
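The last step as a one-line check with scipy:

```python
# Solve 2*Phi(sqrt(n)/4) - 1 = 0.95 for n.
from scipy import stats

z = stats.norm.ppf(0.975)  # Phi^{-1}(0.975) ~ 1.96
n = (4 * z) ** 2           # sqrt(n)/4 = z  =>  n = (4z)^2
print(n)                   # ~ 61.5, so n ~ 62 trials suffice
```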
Wonderful form of cosmic order
"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the [Central Limit Theorem]. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along."
— Sir Francis Galton (of the Galton Board)
Extra slides: extra CLT exercises
Crashing website
• Let $X$ = the number of visitors to a website per minute, where $X \sim \text{Poi}(100)$.
• The server crashes if there are ≥ 120 requests/minute.
What is $P(\text{server crashes in the next minute})$?
Strategy: Poisson (exact).
• State goal: want $P(X \ge 120)$.
• Solve: $P(X \ge 120) = \sum_{k=120}^{\infty} \frac{100^k e^{-100}}{k!} \approx 0.0282$
Strategy: CLT (approximate).
• State approximate goal: $\text{Poi}(100) \sim \sum_{i=1}^{n} \text{Poi}(100/n)$ (a sum of i.i.d. Poissons is Poisson), so $X \approx Y \sim \mathcal{N}(n\mu, n\sigma^2)$ with $\mu = \sigma^2 = \frac{100}{n}$, i.e., $Y \sim \mathcal{N}(100, 100)$.
• Want: $P(X \ge 120) \approx P(Y \ge 119.5)$ (continuity correction).
• Solve: $P(Y \ge 119.5) = 1 - \Phi\left(\frac{119.5 - 100}{\sqrt{100}}\right) = 1 - \Phi(1.95) \approx 0.0256$
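A sketch reproducing both strategies (assumes scipy):

```python
# P(X >= 120) for X ~ Poi(100): exact tail vs. CLT approximation.
from scipy import stats

exact = stats.poisson(100).sf(119)      # P(X > 119) = P(X >= 120) ~ 0.0282
approx = stats.norm(100, 10).sf(119.5)  # N(100, 100) with continuity correction ~ 0.0256
print(exact, approx)
```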