
SDS 321: Introduction to Probability and Statistics

Lecture 21: Continuous random variables: max of two independent r.v.'s, iterated expectation

Purnamrita Sarkar
Department of Statistics and Data Science
The University of Texas at Austin

www.cs.cmu.edu/~psarkar/teaching


Roadmap

- Function of two independent random variables
- Conditional expectation as a random variable, law of iterated expectations
- Useful inequalities and the normal approximation to the Binomial


Functions of two random variables

- If X and Y are both random variables, then Z = g(X, Y) is also a random variable.
- In the discrete case, we could easily find the PMF of the new random variable:

  p_Z(z) = \sum_{x, y \,:\, g(x, y) = z} p_{X,Y}(x, y)

- For example, if I roll two fair dice, what is the probability that the sum is 6?
- Each possible ordered pair has probability 1/36.
- The options that sum to 6 are (1, 5), (2, 4), (3, 3), (4, 2), (5, 1)... or in other words (k, 6 − k) for k = 1, ..., 5.
- So, p_Z(6) = 5/36.
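To make the sum over {(x, y) : g(x, y) = z} concrete, here is a small Python sketch (my addition, not from the slides) that recovers p_Z(6) = 5/36 by brute-force enumeration of the joint PMF:

```python
from fractions import Fraction

# Joint PMF of two fair dice: each ordered pair has probability 1/36.
p_XY = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def pmf_of_sum(z):
    """p_Z(z) = sum of p_{X,Y}(x, y) over all pairs with g(x, y) = x + y = z."""
    return sum(p for (x, y), p in p_XY.items() if x + y == z)

print(pmf_of_sum(6))  # 5/36, one term for each pair (k, 6 - k), k = 1, ..., 5
```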


Functions of two independent continuous random variables

- I know two guests will both arrive in the next hour. The arrival times are independent random variables which follow a uniform distribution. What is the PDF of the arrival time of the last guest to arrive?
- X ~ Uniform([0, 1]) and Y ~ Uniform([0, 1]).
- In other words, what is the PDF of Z = max(X, Y)?
- Let's first think about the CDF...

  P(Z \le z) = P(\{X \le z\} \cap \{Y \le z\}) = P(X \le z) P(Y \le z) =
  \begin{cases} 0 & z < 0 \\ z^2 & 0 \le z \le 1 \\ 1 & z > 1 \end{cases}

- Differentiating,

  f_Z(z) = \begin{cases} 2z & 0 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}
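As a quick sanity check (my addition), one can simulate Z = max(X, Y) and compare the empirical CDF with z² on [0, 1]:

```python
import random

random.seed(0)
n = 200_000
z_samples = [max(random.random(), random.random()) for _ in range(n)]

# The empirical CDF P(Z <= z) should be close to z^2 for 0 <= z <= 1.
for z in (0.25, 0.5, 0.9):
    ecdf = sum(s <= z for s in z_samples) / n
    print(f"z = {z}: empirical CDF {ecdf:.4f}  vs  z^2 = {z * z:.4f}")
```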


Functions of two random variables: Summary

- If Y = g(X), in order to get the PDF of Y we first looked at the CDF, P(Y ≤ y) = P(g(X) ≤ y), and then differentiated with respect to y.
- For functions Z = g(X, Y) of two random variables, the same general idea applies:
- First we look at the CDF, P(Z ≤ z) = P(g(X, Y) ≤ z).
- Then we differentiate with respect to z.
- We looked at a special case: the maximum of two independent r.v.'s.


Conditional expectation

- Recall that the conditional expectation of a discrete random variable is given by

  E[X | Y = y] = \sum_x x P(X = x | Y = y)

- Recall that the conditional expectation of a continuous random variable is given by

  E[X | Y = y] = \int_{-\infty}^{\infty} x f_{X|Y}(x | y) \, dx

- For a given value of y, E[X | Y = y] gives a numerical value.
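For the discrete formula, here is a minimal sketch (my own illustration, with a made-up joint PMF) of computing E[X | Y = y] by restricting to Y = y and renormalizing:

```python
# Hypothetical joint PMF p_{X,Y}(x, y), chosen only for illustration.
p_XY = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

def cond_expectation_X_given(y):
    """E[X | Y = y] = sum_x x * P(X = x | Y = y)."""
    p_Y = sum(p for (x, yy), p in p_XY.items() if yy == y)  # marginal P(Y = y)
    return sum(x * p / p_Y for (x, yy), p in p_XY.items() if yy == y)

print(cond_expectation_X_given(0))  # 0.3 / 0.4 = 0.75
print(cond_expectation_X_given(1))  # 0.4 / 0.6 ≈ 0.667
```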


Example 1

A miner is trapped in a mine containing 3 doors. The first door leads to a tunnel that will take him to safety after 3 hours of travel. The second door leads to a tunnel that will return him to the mine after 5 hours of travel. The third door leads to a tunnel that will return him to the mine after 7 hours. If we assume that the miner is at all times equally likely to choose any one of the doors, what is the expected length of time until he reaches safety?

- Let X denote the amount of time (in hours) until the miner reaches safety, and let Y denote the door he initially chooses.

  E[X] = E[X | Y = 1] P(Y = 1) + E[X | Y = 2] P(Y = 2) + E[X | Y = 3] P(Y = 3)
       = (1/3) (3 + (5 + E[X]) + (7 + E[X]))
       = (1/3) (15 + 2 E[X])

  Solving, E[X] = 15.
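The self-referential argument above is easy to check by simulation; a sketch (my addition), whose average should come out near 15 hours:

```python
import random

random.seed(1)

def escape_time():
    """One realization of the miner's total travel time."""
    t = 0.0
    while True:
        door = random.choice([1, 2, 3])   # equally likely every time
        if door == 1:
            return t + 3                  # safety after 3 hours
        t += 5 if door == 2 else 7        # doors 2 and 3 return him to the mine

trials = 100_000
print(sum(escape_time() for _ in range(trials)) / trials)  # close to 15
```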


Example 2

- John's tank holds 15 gallons of gas, and he always refills his tank when he gets down to 5 gallons. Given Y = y gallons of gas in John's tank, the total number of miles he will run is X ~ N(30y, 1).
- Y ~ Uniform([5, 15])
- What's E[X]?
- E[X | Y = y] = 30y.


Conditional expectation as a random variable

- In the last example, we saw that E[X | Y = y] = 30y.
- For any number y, E[X | Y = y] is also a number.
- As y varies, so does E[X | Y = y], so we can view E[X | Y = y] as a function of y.
- Since a function of a random variable is itself a random variable, this makes the conditional expectation a random variable.
- Concretely, we will define E[X | Y] as the function of Y such that

  E[X | Y] = E[X | Y = y]  when Y = y

- In the last example, E[X | Y] = 30Y.
- Fun fact: E[X h(Y) | Y] = E[X | Y] h(Y). Why? Given Y, h(Y) is a known constant, so it factors out of the conditional expectation.


Law of iterated expectations

Since E[X | Y] is a random variable, its expectation can be calculated as E[E[X | Y]].

- E[E[X | Y]] =
  \begin{cases}
  \sum_y E[X | Y = y] \, P(Y = y) & Y \text{ discrete} \\
  \int_y E[X | Y = y] \, f_Y(y) \, dy & Y \text{ continuous}
  \end{cases}

- But we know the RHS of the above, don't we?
- Total expectation theorem! So E[E[X | Y]] = E[X].

For the last problem, where X | Y = y ~ N(30y, 1), find E[X].

- We can do it by calculating f_X(x), or we can just use the iterated expectation theorem.
- We already saw that Y ~ Uniform([5, 15]).
- E[X] = E[E[X | Y]] = 30 E[Y] = 30 × 10 = 300.
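A simulation sketch (my addition) of the gas-tank calculation, drawing Y ~ Uniform([5, 15]) and then X | Y = y ~ N(30y, 1); the sample mean of X should land near 300:

```python
import random

random.seed(2)
n = 100_000
total = 0.0
for _ in range(n):
    y = random.uniform(5, 15)         # gallons in the tank
    total += random.gauss(30 * y, 1)  # miles driven given Y = y

print(total / n)  # close to E[X] = 30 * E[Y] = 300
```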


Stick breaking

We break a stick of length ℓ at a point chosen uniformly over its length, keep the piece that contains the left end, and then repeat the process with the piece we kept. What is the expected length of the remaining stick?

- Let Y be the length of the stick after we break it for the first time, and X be the length after the second break.
- What is E[X | Y = y]? For that we need f_{X|Y}(x | y): given Y = y, X is Uniform on [0, y].
- So E[X | Y = y] = y/2, and E[X | Y] = Y/2.
- So E[X] is E[E[X | Y]] = E[Y/2].
- But Y is also Uniform, on the interval [0, ℓ].
- So E[Y] = ℓ/2, and E[X] = E[Y]/2 = ℓ/4.
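The same answer falls out of a short simulation (my addition), here with ℓ = 1 so the expected remaining length is 1/4:

```python
import random

random.seed(3)
ell = 1.0
n = 100_000
total = 0.0
for _ in range(n):
    y = random.uniform(0, ell)     # length kept after the first break
    total += random.uniform(0, y)  # length kept after the second break

print(total / n)  # close to ell / 4 = 0.25
```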


Some useful inequalities: Markov

So far we have looked at expectations and variances of sums of independent random variables. Today we will also look at their behavior as the number of random variables increases.

- For a nonnegative random variable X and t > 0,

  P(X ≥ t) ≤ E[X]/t.

- How do we show this?

  E[X] = E[X | X ≥ t] P(X ≥ t) + E[X | X < t] P(X < t)
       ≥ E[X | X ≥ t] P(X ≥ t) ≥ t P(X ≥ t)

  so P(X ≥ t) ≤ E[X]/t.

- All this comes in handy to show that a random variable cannot be too far from its expectation if the variance is small.


Some useful inequalities

- Remember Markov's inequality? For a nonnegative random variable X and some t > 0, we said that P(X ≥ t) ≤ E[X]/t.
- We can use this to bound P(|X − E[X]| ≥ c). Writing µ = E[X],

  P(|X − µ| ≥ c) = P((X − µ)² ≥ c²) ≤ E[(X − µ)²]/c² = var(X)/c².

- This is the famous Chebyshev inequality.
- All this comes in handy to show that a random variable cannot be too far from its expectation if the variance is small.
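Both inequalities are easy to see numerically; a sketch (my addition) using Exponential(1) draws, for which E[X] = var(X) = 1:

```python
import random

random.seed(4)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]  # E[X] = 1, var(X) = 1

t = c = 3.0
p_markov = sum(x >= t for x in xs) / n            # true value: e^{-3} ≈ 0.0498
p_cheby = sum(abs(x - 1.0) >= c for x in xs) / n  # true value: e^{-4} ≈ 0.0183

print(f"P(X >= {t}) = {p_markov:.4f}  <=  E[X]/t = {1 / t:.4f}")        # Markov
print(f"P(|X - 1| >= {c}) = {p_cheby:.4f}  <=  var/c^2 = {1 / c**2:.4f}")  # Chebyshev
```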


Weak law of large numbers

The WLLN basically states that the sample mean of a large number of random variables is very close to the true mean with high probability.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².
- Let M_n = (X_1 + ··· + X_n)/n.
- E[M_n] = (E[X_1] + ··· + E[X_n])/n = µ
- var(M_n) = (var(X_1) + ··· + var(X_n))/n² = σ²/n
- So, by Chebyshev, P(|M_n − µ| ≥ ε) ≤ σ²/(nε²).
- For large n this probability is small.
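A quick illustration of the bound (my addition) with Uniform(0, 1) draws, where µ = 0.5 and σ² = 1/12:

```python
import random

random.seed(5)

def sample_mean(n):
    """Mean M_n of n i.i.d. Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n

eps, trials = 0.05, 2000
for n in (10, 100, 1000):
    freq = sum(abs(sample_mean(n) - 0.5) >= eps for _ in range(trials)) / trials
    bound = (1 / 12) / (n * eps**2)  # Chebyshev: sigma^2 / (n * eps^2)
    print(f"n = {n:4d}: P(|M_n - mu| >= {eps}) ≈ {freq:.3f}   bound {bound:.3f}")
```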


Illustration

Consider the mean of n independent Poisson(1) random variables. Foreach n, we plot the distribution of the average.



Can we say more? Central Limit Theorem

Turns out that not only can you say that the sample mean is close to the true mean, you can actually predict its distribution using the famous Central Limit Theorem.

- Consider a sequence of i.i.d. random variables X_1, ..., X_n with mean µ and variance σ².
- Let X̄_n = (X_1 + ··· + X_n)/n. Remember E[X̄_n] = µ and var(X̄_n) = σ²/n.
- Standardize X̄_n to get (X̄_n − µ)/(σ/√n).
- As n gets bigger, (X̄_n − µ)/(σ/√n) behaves more and more like a Normal(0, 1) random variable:

  P( (X̄_n − µ)/(σ/√n) < z ) ≈ Φ(z)
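To see the standardized mean approaching Normal(0, 1), here is a simulation sketch (my addition) that compares empirical probabilities against Φ(z), starting from clearly non-normal Exponential(1) draws:

```python
import math
import random

random.seed(6)
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

n, trials = 50, 20_000
mu = sigma = 1.0  # Exponential(1) has mean 1 and variance 1

def standardized_mean():
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))

zs = [standardized_mean() for _ in range(trials)]
for z in (-1.0, 0.0, 1.0):
    emp = sum(s < z for s in zs) / trials
    print(f"P(standardized mean < {z:+.0f}) ≈ {emp:.3f}   Phi = {Phi(z):.3f}")
```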


Can we say more? Central Limit Theorem

Figure: (Courtesy: Tamara Broderick) You bet!


Example

An astronomer is interested in measuring the distance, in light-years, from his observatory to a distant star. Although the astronomer has a measuring technique, he knows that, because of changing atmospheric conditions and normal error, each time a measurement is made it will not yield the exact distance, but merely an estimate. As a result, the astronomer plans to make a series of measurements and then use the average value of these measurements as his estimated value of the actual distance. If the astronomer believes that the values of the measurements are independent and identically distributed random variables having a common mean d (the actual distance) and a common variance of 4 (light-years²), how many measurements does he need to make to be 95% sure that his estimated distance is accurate to within ±0.5 light-years?

- Let X̄_n be the mean of the measurements.
- How large does n have to be so that P(|X̄_n − d| ≤ 0.5) = 0.95?
- By the CLT,

  P( |X̄_n − d| / (2/√n) ≤ 0.25 √n ) ≈ P(|Z| ≤ 0.25 √n) = 1 − 2 P(Z ≤ −0.25 √n) = 0.95

  P(Z ≤ −0.25 √n) = 0.025
  −0.25 √n = −1.96, so √n = 7.84
  n ≈ 61.5, so take n = 62.
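The same arithmetic in code (my addition), solving 0.25 √n ≥ 1.96 for the smallest integer n:

```python
import math

z_975 = 1.96                  # Phi(1.96) ≈ 0.975, so P(|Z| <= 1.96) ≈ 0.95
sigma, half_width = 2.0, 0.5  # sd = sqrt(4) light-years, target accuracy ±0.5

# Need half_width / (sigma / sqrt(n)) >= z_975, i.e. sqrt(n) >= 7.84.
n = math.ceil((z_975 * sigma / half_width) ** 2)
print(n)  # 62, since 7.84^2 ≈ 61.5
```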


Normal Approximation to Binomial

The probability of selling an umbrella is 0.5 on a rainy day. If there are 400 umbrellas in the store, what's the probability that the owner will sell at least 180?

- Let X be the total number of umbrellas sold.
- X ~ Binomial(400, 0.5)
- We want P(X ≥ 180). Crazy calculations.
- But can we approximate the distribution of X/n?
- X/n = (\sum_i Y_i)/n, where the Y_i are independent Bernoulli(0.5) indicators with E[Y_i] = 0.5 and var(Y_i) = 0.25.
- Sure! The CLT tells us that for large n, (X/400 − 0.5)/√(0.25/400) is approximately N(0, 1).
- So P(X ≥ 180) = P((X − 200)/√100 ≥ −2) ≈ P(Z ≥ −2) = 1 − Φ(−2) ≈ 0.977.
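Comparing the approximation against the exact Binomial tail (my addition); the small remaining gap is what a continuity correction would absorb:

```python
import math

n, p = 400, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # 200 and 10

# Exact tail P(X >= 180) from the Binomial PMF.
exact = sum(math.comb(n, k) for k in range(180, n + 1)) / 2**n

Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
approx = 1 - Phi((180 - mu) / sd)  # P(Z >= -2)

print(f"exact {exact:.4f}   normal approximation {approx:.4f}")
```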


Summary

- We looked at conditional expectation as a random variable.
- For two r.v.'s X, Y, E[X | Y] is a random variable which is a function of Y.
- Law of iterated expectations: E[E[X | Y]] = E[X].
- We also saw Chebyshev's inequality, the WLLN and the CLT.