18.600 F2019 Lecture 23: Conditional probability, order statistics, expectations of sums
Scott Sheffield, MIT


Outline

Conditional probability densities

Order statistics

Expectations of sums



Conditional distributions

- Let's say X and Y have joint probability density function f(x, y).
- We can define the conditional probability density of X given that Y = y by f_{X|Y=y}(x) = f(x, y)/f_Y(y).
- This amounts to restricting f(x, y) to the line corresponding to the given y value (and dividing by the constant that makes the integral along that line equal to 1).
- This definition assumes that f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx < ∞ and f_Y(y) ≠ 0. This is usually safe to assume: it is true for a probability-one set of y values, so the places where the definition doesn't make sense can be ignored.
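The restrict-and-normalize recipe can be sketched numerically. This is a minimal illustration, assuming NumPy; the joint density f(x, y) = x + y on [0, 1]² is a made-up example (not from the lecture), chosen because f_Y(y) = 1/2 + y has a simple closed form.

```python
import numpy as np

def conditional_density(f, y, xs):
    """Restrict f to the line Y = y and divide by the normalizing
    constant f_Y(y) = ∫ f(x, y) dx (trapezoid rule on the grid xs)."""
    slice_ = f(xs, y)                      # f(x, y) along the line Y = y
    dx = xs[1] - xs[0]
    f_Y = float(np.sum((slice_[:-1] + slice_[1:]) * 0.5) * dx)
    return slice_ / f_Y

# Hypothetical joint density f(x, y) = x + y on [0, 1]^2, for which
# f_{X|Y=y}(x) = (x + y) / (1/2 + y).
xs = np.linspace(0.0, 1.0, 1001)
cond = conditional_density(lambda x, y: x + y, 0.25, xs)
exact = (xs + 0.25) / 0.75                 # closed form at y = 0.25

assert float(np.max(np.abs(cond - exact))) < 1e-6
```

By construction the returned slice integrates to 1 along the line, which is exactly the "dividing by the constant" step described above.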


Remarks: conditioning on a probability zero event

- Our standard definition of conditional probability is P(A|B) = P(AB)/P(B).
- This doesn't make sense if P(B) = 0. But the previous slide defines "probability conditioned on Y = y," and P{Y = y} = 0.
- When can we (somehow) make sense of conditioning on a probability zero event?
- Tough question in general.
- Consider the conditional law of X given that Y ∈ (y − ε, y + ε). If this has a limit as ε → 0, we can call that the law conditioned on Y = y.
- Precisely, define F_{X|Y=y}(a) := lim_{ε→0} P{X ≤ a | Y ∈ (y − ε, y + ε)}.
- Then set f_{X|Y=y}(a) = F′_{X|Y=y}(a). This is consistent with the definition from the previous slide.
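The limiting definition can be probed by brute force: sample from the joint law, keep only samples with Y in a small window, and compare the conditional CDF estimate with the closed form. A Monte Carlo sketch, assuming NumPy; the joint density f(x, y) = x + y on [0, 1]² is a made-up example, for which F_{X|Y=y}(a) = (a²/2 + a·y)/(y + 1/2).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(n):
    """Rejection-sample n points from f(x, y) = x + y on [0, 1]^2 (f <= 2)."""
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = rng.random((n, 2))
        keep = rng.random(n) * 2.0 < cand.sum(axis=1)   # accept w.p. f/2
        pts = np.vstack([pts, cand[keep]])
    return pts[:n]

y, a, eps = 0.5, 0.5, 0.05
X, Y = sample_joint(200_000).T

strip = np.abs(Y - y) < eps                 # condition on Y ∈ (y − eps, y + eps)
estimate = np.mean(X[strip] <= a)           # ≈ P{X ≤ a | Y ∈ (y − eps, y + eps)}
exact = (a**2 / 2 + a * y) / (y + 0.5)      # the limit F_{X|Y=y}(a) = 0.375 here

assert abs(estimate - exact) < 0.03
```

For this particular density the strip probability is linear in y, so the estimate matches the ε → 0 limit even at finite ε; in general one would shrink ε and watch the estimates converge.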


A word of caution

- Suppose X and Y are chosen uniformly on the semicircle {(x, y) : x² + y² ≤ 1, x ≥ 0}. What is f_{X|Y=0}(x)?
- Answer: f_{X|Y=0}(x) = 1 if x ∈ [0, 1] (zero otherwise).
- Let (θ, R) be (X, Y) in polar coordinates. What is f_{X|θ=0}(x)?
- Answer: f_{X|θ=0}(x) = 2x if x ∈ [0, 1] (zero otherwise).
- Both {θ = 0} and {Y = 0} describe the same probability zero event. But our interpretation of what it means to condition on this event is different in these two cases.
- Conditioning on (X, Y) belonging to a θ ∈ (−ε, ε) wedge is very different from conditioning on (X, Y) belonging to a Y ∈ (−ε, ε) strip.
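The strip-versus-wedge discrepancy shows up directly in simulation. A minimal sketch, assuming NumPy: sample the half-disk uniformly, then compare the mean of X under the two conditionings (the strip limit has density 1, so E ≈ 1/2; the wedge limit has density 2x, so E ≈ 2/3).

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform samples on {x^2 + y^2 <= 1, x >= 0} by rejection from [0,1] x [-1,1].
n = 1_000_000
x = rng.random(n)
y = rng.random(n) * 2.0 - 1.0
inside = x**2 + y**2 <= 1.0
x, y = x[inside], y[inside]

eps = 0.02
theta = np.arctan2(y, x)

strip_mean = x[np.abs(y) < eps].mean()       # condition on Y ∈ (−eps, eps)
wedge_mean = x[np.abs(theta) < eps].mean()   # condition on θ ∈ (−eps, eps)

assert abs(strip_mean - 0.5) < 0.02          # density 1 on [0, 1] ⇒ mean 1/2
assert abs(wedge_mean - 2.0 / 3.0) < 0.02    # density 2x on [0, 1] ⇒ mean 2/3
```

Same probability-zero event, two genuinely different conditional laws, depending on which shrinking neighborhood defines the limit.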



Maxima: pick five job candidates at random, choose best

- Suppose I choose n random variables X1, X2, . . . , Xn uniformly at random on [0, 1], independently of each other.
- The n-tuple (X1, X2, . . . , Xn) has a constant density function on the n-dimensional cube [0, 1]^n.
- What is the probability that the largest of the Xi is less than a?
- Answer: a^n.
- So if X = max{X1, . . . , Xn}, then what is the probability density function of X?
- Answer: F_X(a) = 0 for a < 0, F_X(a) = a^n for a ∈ [0, 1], and F_X(a) = 1 for a > 1. And f_X(a) = F′_X(a) = n·a^(n−1) on [0, 1].
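The CDF F_X(a) = a^n is easy to sanity-check by simulation. A short sketch, assuming NumPy; n = 5 and a = 0.8 are arbitrary choices. The mean n/(n + 1) follows from integrating a·n·a^(n−1) over [0, 1].

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials, a = 5, 200_000, 0.8
maxima = rng.random((trials, n)).max(axis=1)     # X = max of n uniforms, per trial

empirical_cdf = np.mean(maxima <= a)
assert abs(empirical_cdf - a**n) < 0.01          # F_X(a) = a^n

empirical_mean = maxima.mean()
assert abs(empirical_mean - n / (n + 1)) < 0.01  # E[X] = ∫ a · n a^(n−1) da = n/(n+1)
```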


General order statistics

- Consider i.i.d. random variables X1, X2, . . . , Xn with continuous probability density f.
- Let Y1 < Y2 < · · · < Yn be the list obtained by sorting the Xj.
- In particular, Y1 = min{X1, . . . , Xn} is the minimum and Yn = max{X1, . . . , Xn} is the maximum.
- What is the joint probability density of the Yi?
- Answer: f(x1, x2, . . . , xn) = n! ∏_{i=1}^{n} f(xi) if x1 < x2 < · · · < xn, zero otherwise.
- Let σ : {1, 2, . . . , n} → {1, 2, . . . , n} be the permutation such that Xj = Yσ(j).
- Are σ and the vector (Y1, . . . , Yn) independent of each other?
- Yes.
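The independence claim can be spot-checked empirically: condition on an event defined purely by the sorted values, and the sorting permutation should still be uniform over all n! possibilities. A Monte Carlo sketch, assuming NumPy; n = 3 and the event {Yn > 0.9} are arbitrary illustrative choices (np.argsort records, per trial, the permutation arranging the Xj in increasing order, which is the inverse of σ and equally good for this uniformity check).

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)

n, trials = 3, 120_000
X = rng.random((trials, n))
order = np.argsort(X, axis=1)          # per-trial sorting permutation

# Condition on an event that depends only on (Y_1, ..., Y_n): max > 0.9.
cond = X.max(axis=1) > 0.9

perms = list(itertools.permutations(range(n)))
counts = {p: 0 for p in perms}
for row in order[cond]:
    counts[tuple(row)] += 1

total = sum(counts.values())
# If the permutation is independent of the sorted vector, each of the
# 3! = 6 permutations still occurs with conditional probability 1/6.
for p in perms:
    assert abs(counts[p] / total - 1.0 / 6.0) < 0.02
```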


Example

- Let X1, . . . , Xn be i.i.d. uniform random variables on [0, 1].
- Example: say n = 10 and condition on X1 being the third largest of the Xj.
- Given this, what is the conditional probability density function for X1?
- Write p = X1. This is kind of like choosing a random p and then conditioning on 7 heads and 2 tails.
- The answer is the beta distribution with parameters (a, b) = (8, 3).
- Up to a constant, f(x) = x^7 (1 − x)^2.
- The general beta (a, b) expectation is a/(a + b) = 8/11. The mode is (a − 1)/((a − 1) + (b − 1)) = 7/9.
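The Beta(8, 3) answer can be checked by conditioning in a simulation: keep only the trials where exactly 2 of the other nine values exceed X1 (so X1 is the third largest), and compare the conditional mean with 8/11. A sketch assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)

n, trials = 10, 400_000
X = rng.random((trials, n))

# "X1 is the third largest" means exactly 2 of the other nine exceed X1
# (7 below, 2 above -- the "7 heads, 2 tails" picture).
above = (X[:, 1:] > X[:, [0]]).sum(axis=1)
samples = X[above == 2, 0]

# Conditional law should be Beta(8, 3), whose mean is 8/11.
assert abs(samples.mean() - 8.0 / 11.0) < 0.01
```

By symmetry about 1/10 of all trials survive the conditioning, so the sample is large enough for the mean to settle near 8/11 ≈ 0.727.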



I If X is discrete with mass function p(x) thenE [g(x)] =

Px p(x)g(x).

I Similarly, X if is continuous with density function f (x) thenE [g(X )] =

Rf (x)g(x)dx .

I If X and Y have joint mass function p(x , y) thenE [g(X ,Y )] =

Py

Px g(x , y)p(x , y).

I If X and Y have joint probability density function f (x , y) thenE [g(X ,Y )] =

R∞−∞

R∞−∞ g(x , y)f (x , y)dxdy .

Properties of expectation

I Several properties we derived for discrete expectations continue to hold in the continuum.

I If X is discrete with mass function p(x) then P E [X ] = p(x)x . x

I Similarly, if X is continuous with density function f (x) then R E [X ] = f (x)xdx .

48

Page 49: 18.600 F2019 Lecture 23: Conditional probability, order ... › courses › mathematics › 18-600...I Consider conditional law of X given that Y ∈(y ,y + ). If this has a limit

I Similarly, X if is continuous with density function f (x) thenE [g(X )] =

Rf (x)g(x)dx .

I If X and Y have joint mass function p(x , y) thenE [g(X ,Y )] =

Py

Px g(x , y)p(x , y).

I If X and Y have joint probability density function f (x , y) thenE [g(X ,Y )] =

R∞−∞

R∞−∞ g(x , y)f (x , y)dxdy .

Properties of expectation

I Several properties we derived for discrete expectations continue to hold in the continuum.

I If X is discrete with mass function p(x) then P E [X ] = p(x)x . x

I Similarly, if X is continuous with density function f (x) then R E [X ] = f (x)xdx .

I If X is discrete with mass function p(x) then P E [g(x)] = p(x)g(x). x

49

Page 50: 18.600 F2019 Lecture 23: Conditional probability, order ... › courses › mathematics › 18-600...I Consider conditional law of X given that Y ∈(y ,y + ). If this has a limit

I If X and Y have joint mass function p(x , y) thenE [g(X ,Y )] =

Py

Px g(x , y)p(x , y).

I If X and Y have joint probability density function f (x , y) thenE [g(X ,Y )] =

R∞−∞

R∞−∞ g(x , y)f (x , y)dxdy .

Properties of expectation

I Several properties we derived for discrete expectations continue to hold in the continuum.

I If X is discrete with mass function p(x) then P E [X ] = p(x)x . x

I Similarly, if X is continuous with density function f (x) then R E [X ] = f (x)xdx .

I If X is discrete with mass function p(x) then P E [g(x)] = p(x)g(x). x

I Similarly, X if is continuous with density function f (x) then R E [g(X )] = f (x)g(x)dx .

50

Page 51: 18.600 F2019 Lecture 23: Conditional probability, order ... › courses › mathematics › 18-600...I Consider conditional law of X given that Y ∈(y ,y + ). If this has a limit

I If X and Y have joint probability density function f (x , y) thenE [g(X ,Y )] =

R∞−∞

R∞−∞ g(x , y)f (x , y)dxdy .

Properties of expectation

I Several properties we derived for discrete expectations continue to hold in the continuum.

I If X is discrete with mass function p(x) then P E [X ] = p(x)x . x

I Similarly, if X is continuous with density function f (x) then R E [X ] = f (x)xdx .

I If X is discrete with mass function p(x) then P E [g(x)] = p(x)g(x). x

I Similarly, X if is continuous with density function f (x) then R E [g(X )] = f (x)g(x)dx .

I If X and Y have joint mass function p(x , y) then P P E [g(X , Y )] = y x g(x , y)p(x , y).

51

Page 52: 18.600 F2019 Lecture 23: Conditional probability, order ... › courses › mathematics › 18-600...I Consider conditional law of X given that Y ∈(y ,y + ). If this has a limit

Properties of expectation

I Several properties we derived for discrete expectations continue to hold in the continuum.

I If X is discrete with mass function p(x) then E[X] = Σ_x p(x)x.

I Similarly, if X is continuous with density function f(x) then E[X] = ∫ f(x)x dx.

I If X is discrete with mass function p(x) then E[g(X)] = Σ_x p(x)g(x).

I Similarly, if X is continuous with density function f(x) then E[g(X)] = ∫ f(x)g(x) dx.

I If X and Y have joint mass function p(x, y) then E[g(X, Y)] = Σ_y Σ_x g(x, y)p(x, y).

I If X and Y have joint probability density function f(x, y) then E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y)f(x, y) dx dy.
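The discrete and continuous formulas for E[g(X)] can be sketched numerically (the toy distributions and the choice g(x) = x² are mine, not the lecture's): an exact sum for a fair die, and a midpoint Riemann sum approximating the integral for X uniform on [0, 1].

```python
def g(x):
    return x * x

# Discrete: fair die, p(x) = 1/6 for x = 1, ..., 6, so
# E[g(X)] = sum_x p(x) g(x) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6
e_g_discrete = sum((1 / 6) * g(x) for x in range(1, 7))

# Continuous: f(x) = 1 on [0, 1], so E[g(X)] = integral of x^2 dx = 1/3,
# approximated here by a midpoint Riemann sum
n = 100_000
e_g_continuous = sum(g((i + 0.5) / n) / n for i in range(n))
```

Both values match the closed forms 91/6 and 1/3, illustrating that the continuum formula is the integral analogue of the discrete sum.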


Properties of expectation

I For both discrete and continuous random variables X and Y we have E[X + Y] = E[X] + E[Y].

I In both discrete and continuous settings, E[aX] = aE[X] when a is a constant. And E[Σ aᵢXᵢ] = Σ aᵢE[Xᵢ].

I But what about that delightful "area under 1 − F_X" formula for the expectation?

I When X is non-negative with probability one, do we have E[X] = ∫_0^∞ P{X > x} dx, in discrete and continuous settings?

I Yes, can prove with integration by parts or...

I Define g(y) so that 1 − F_X(g(y)) = y. (Draw a horizontal line at height y and look where it hits the graph of 1 − F_X.)

I Choose Y uniformly on [0, 1] and note that g(Y) has the same probability distribution as X.

I So E[X] = E[g(Y)] = ∫_0^1 g(y) dy, which is indeed the area under the graph of 1 − F_X.
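Both identities above can be checked numerically for a concrete distribution (my choice, not the lecture's): for X ~ Exponential(1), P{X > x} = e^(−x) and E[X] = 1, so the area under 1 − F_X should be 1; and the lecture's g satisfies 1 − F_X(g(y)) = y, which here gives g(y) = −ln(y).

```python
import math

# Area under the tail: integral of P{X > x} dx over [0, infinity),
# truncated at x = 50 (the tail beyond that is about e^-50)
dx = 1e-3
tail_area = sum(math.exp(-k * dx) * dx for k in range(int(50 / dx)))

# Change of variables: E[X] = E[g(Y)] = integral of g(y) dy over [0, 1],
# with g(y) = -ln(y), approximated by a midpoint Riemann sum
m = 100_000
g_area = sum(-math.log((k + 0.5) / m) / m for k in range(m))
```

Both sums come out very close to 1 = E[X], matching the tail-integral formula and the g(Y) construction.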


MIT OpenCourseWare
https://ocw.mit.edu

18.600 Probability and Random Variables Fall 2019

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
