Math Note3


Upload: realife2

Post on 14-Apr-2015


Page 1: Math Note3

Chapter 3. Some Special Distributions

§3.1 The Binomial and Related Distributions

(a) Bernoulli trial (p.133)

X ∼ Bernoulli(p) (0 ≤ p ≤ 1)

- pdf : f(x) = p^x (1 − p)^{1−x}, x = 0, 1

- mgf : M(t) = pe^t + q, −∞ < t < ∞ (q = 1 − p)

- mean & variance : E(X) = p, Var(X) = p(1 − p)

- Bernoulli process {X_n}_{n≥1} : X_n iid∼ Bernoulli(p)

(b) Binomial distribution (p.134-p.135)

X ∼ Bin(n, p), 0 ≤ p ≤ 1 (textbook notation: b(n, p))

- pdf : f(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, …, n

- mgf : M(t) = (pe^t + q)^n, −∞ < t < ∞ (q = 1 − p)

- mean & variance : E(X) = np, Var(X) = np(1 − p)

- X ∼ Bin(n, p) ⇔ X ≡_d X_1 + ⋯ + X_n, X_i iid∼ Bernoulli(p) (# of successes)

(c) Geometric distribution (p.137)

X ∼ Geo(p) (0 < p < 1)

- pdf : f(x) = p(1 − p)^x, x = 0, 1, 2, …

- mgf : M(t) = p(1 − qe^t)^{−1}, t < −log q (q = 1 − p)

- mean & variance : E(X) = q/p, Var(X) = q/p²

- X ∼ Geo(p) ⇔ X ≡_d W_1 − 1, W_1 = min{n : X_1 + ⋯ + X_n ≥ 1}
(# of trials before the first success in a Bernoulli process)
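A quick numerical check of the binomial formulas above (a minimal sketch in Python, not part of the original notes; `binom_pmf` is an illustrative helper):

```python
from math import comb

# Assumed example: the Bin(n, p) pmf f(x) = C(n, x) p^x (1-p)^(n-x)
# sums to 1 and has mean np and variance np(1-p), matching the
# mgf-derived moments in the notes.
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
assert abs(sum(pmf) - 1.0) < 1e-12
assert abs(mean - n * p) < 1e-12
assert abs(var - n * p * (1 - p)) < 1e-12
```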


(d) Negative binomial distribution (p.137)

X ∼ Negbin(r, p)

- pdf : f(x) = C(x + r − 1, r − 1) p^r (1 − p)^x, x = 0, 1, 2, …

- mgf : M(t) = p^r (1 − qe^t)^{−r}, t < −log q (q = 1 − p)

- binomial expansion : (1 + x)^r = Σ_{k=0}^∞ C(r, k) x^k, |x| < 1

- mean & variance : E(X) = rq/p, Var(X) = rq/p²

- X ∼ Negbin(r, p) ⇔ X ≡_d W_r − r, W_r = min{n : X_1 + ⋯ + X_n ≥ r}
(# of failures before the rth success in a Bernoulli process)

Fact : X ∼ Negbin(r, p) ⇔ X ≡_d X_1 + ⋯ + X_r, X_i iid∼ Geo(p)

Note that the inter-“arrival (occurrence)” times
W_1 − 1, W_2 − W_1 − 1, W_3 − W_2 − 1, …, W_r − W_{r−1} − 1 : IID Geo(p)
∵ P(W_1 − 1 = y_1, W_2 − W_1 − 1 = y_2, …, W_r − W_{r−1} − 1 = y_r)
= P(X_i = 0, i = 1, …, y_1, X_{y_1+1} = 1, X_{y_1+1+i} = 0, i = 1, …, y_2, X_{y_1+1+y_2+1} = 1, …)
= (q^{y_1} p) · (q^{y_2} p) ⋯ (q^{y_r} p)
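The Fact above can be checked numerically (a minimal sketch, not from the notes; `convolve` is an illustrative helper): the Negbin(r, p) pmf should equal the r-fold convolution of Geo(p) pmfs.

```python
from math import comb

# Assumed example: Negbin(r, p) as a sum of r IID Geo(p) variables.
def geo_pmf(x, p):
    return p * (1 - p) ** x

def negbin_pmf(x, r, p):
    return comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

def convolve(f, g, n):
    # pmf of the sum of two independent nonnegative-integer rv's,
    # truncated at n (exact for k <= n)
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(n + 1)]

p, r, N = 0.4, 3, 30
geo = [geo_pmf(x, p) for x in range(N + 1)]
conv = geo
for _ in range(r - 1):
    conv = convolve(conv, geo, N)
for x in range(N + 1):
    assert abs(conv[x] - negbin_pmf(x, r, p)) < 1e-12
```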

Property (p.137)

① X_1 ∼ Bin(n_1, p), X_2 ∼ Bin(n_2, p), X_1 & X_2 independent
⟹ X_1 + X_2 ∼ Bin(n_1 + n_2, p)
② X_1 ∼ Negbin(r_1, p), X_2 ∼ Negbin(r_2, p), X_1 & X_2 independent
⟹ X_1 + X_2 ∼ Negbin(r_1 + r_2, p)

P(X_1 + X_2 = y) = Σ_{x_1+x_2=y} C(n_1, x_1) p^{x_1} q^{n_1−x_1} C(n_2, x_2) p^{x_2} q^{n_2−x_2}
= Σ_{x_1+x_2=y} C(n_1, x_1) C(n_2, x_2) p^y q^{n_1+n_2−y}
= C(n_1 + n_2, y) p^y q^{n_1+n_2−y}, y = 0, 1, …, n_1 + n_2 (Vandermonde's identity)
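The combinatorial identity used in the last step, Σ_{x_1+x_2=y} C(n_1, x_1)C(n_2, x_2) = C(n_1+n_2, y), can be verified directly (a minimal sketch, not from the notes):

```python
from math import comb

# Assumed example: Vandermonde's identity for small n1, n2.
n1, n2 = 7, 5
for y in range(n1 + n2 + 1):
    lhs = sum(comb(n1, x1) * comb(n2, y - x1)
              for x1 in range(max(0, y - n2), min(n1, y) + 1))
    assert lhs == comb(n1 + n2, y)
```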


(e) Multinomial trial (p.137-p.138)

(X_1, ⋯, X_{k−1})′ ∼ Multi(n, (p_1, ⋯, p_{k−1}, p_k)′) (Σ_{i=1}^k p_i = 1, p_i > 0)

- pdf : f(x_1, ⋯, x_{k−1}) = n!/(x_1! ⋯ x_{k−1}! (n − x·)!) · p_1^{x_1} ⋯ p_{k−1}^{x_{k−1}} p_k^{n−x·},
x_i = 0, 1, ⋯, n (i = 1, ⋯, k−1), x· = Σ_{i=1}^{k−1} x_i ≤ n

- mgf : (p_1 e^{t_1} + ⋯ + p_{k−1} e^{t_{k−1}} + p_k)^n, −∞ < t_i < ∞

- mean and variance : E(X_i) = np_i, cov(X_i, X_j) = np_i(1 − p_i) (i = j), −np_ip_j (i ≠ j)

- From now on :
(X_1, ⋯, X_k)′ ∼ Multi(n, (p_1, ⋯, p_k)′) with X_k ≡ n − (X_1 + ⋯ + X_{k−1}) when k ≥ 3
mgf : (p_1 e^{t_1} + ⋯ + p_{k−1} e^{t_{k−1}} + p_k e^{t_k})^n
Var : n(diag(p) − pp′) with p = (p_1, ⋯, p_k)′.

- marginal distribution and conditional distribution (p.139)
(X_1, ⋯, X_k)′ ∼ Multi(n, (p_1, ⋯, p_k)′)
⟹ (i) X_i ∼ Bin(n, p_i)
(ii) (X_1, X_2, n − (X_1 + X_2))′ ∼ Trinomial(n, (p_1, p_2, 1 − p_1 − p_2)′)
(iii) (X_2, X_3, ⋯, X_k)′ | X_1 = x_1 ∼ Multi(n − x_1, (p_2/(1 − p_1), ⋯, p_k/(1 − p_1))′)
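Property (i) can be illustrated numerically for k = 3 (a minimal sketch, not from the notes; `trinomial_pmf` is an illustrative helper): summing the joint pmf over x_2 recovers the marginal X_1 ∼ Bin(n, p_1).

```python
from math import comb, factorial

# Assumed example: the trinomial marginal of X1 is Bin(n, p1).
def trinomial_pmf(x1, x2, n, p1, p2):
    x3 = n - x1 - x2
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return coef * p1 ** x1 * p2 ** x2 * (1 - p1 - p2) ** x3

n, p1, p2 = 8, 0.2, 0.5
for x1 in range(n + 1):
    marg = sum(trinomial_pmf(x1, x2, n, p1, p2) for x2 in range(n - x1 + 1))
    assert abs(marg - comb(n, x1) * p1 ** x1 * (1 - p1) ** (n - x1)) < 1e-12
```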

Probability (conceptualization of the relative frequency) (WLLN) (p.136)

- X_1, X_2, ⋯, X_n : iid Bernoulli(p), p̂_n = (X_1 + ⋯ + X_n)/n

lim_{n→∞} P(|p̂_n − p| ≥ ε) = 0 for all ε > 0

R code (p.135, p.140 #3.1.8)

dbinom(pdf), qbinom(quantile), pbinom(cdf), rbinom(random number)

rmultinom, dmultinom


§3.2 The Poisson Distribution

Poisson distribution

X ∼ Poisson(m)

- pdf : f(x) = m^x e^{−m}/x!, x = 0, 1, 2, …

- mgf : M(t) = exp(m(e^t − 1))

- mean and variance : E(X) = m, Var(X) = m

(exponential series : e^m = Σ_{x=0}^∞ m^x/x!)

Poisson approximation to binomial distribution

lim_{n→∞, np_n→µ} C(n, x) p_n^x (1 − p_n)^{n−x} = µ^x e^{−µ}/x!

(Recall lim_{n→∞} (1 + a/n)^n = e^a and lim_{n→∞} (1 + (a + o(1))/n)^n = e^a, Handout #2)
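The Poisson limit can be seen numerically (a minimal sketch, not from the notes): with p_n = µ/n, the Bin(n, p_n) pmf is already close to the Poisson(µ) pmf for moderate n.

```python
from math import comb, exp, factorial

# Assumed example: Bin(n, mu/n) pmf vs Poisson(mu) pmf.
mu = 2.0

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, m):
    return m ** x * exp(-m) / factorial(x)

for x in range(6):
    err = abs(binom_pmf(x, 10_000, mu / 10_000) - poisson_pmf(x, mu))
    assert err < 1e-3  # error is O(mu^2 / n)
```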

Poisson process with arrival rate λ : {N_t : t ≥ 0}

(i) (stationarity) N_{t+s} − N_s ≡_d N_t

(ii) (independent increments) (3 on p.143) For 0 < t_1 < ⋯ < t_k,
N_{t_1}, N_{t_2} − N_{t_1}, ⋯, N_{t_k} − N_{t_{k−1}} are independent.

(iii) (rareness) (2 on p.143) P(N_h ≥ 2) = o(h) as h → 0 (Handout 2)

(iv) (proportionality) (1 on p.143) P(N_h = 1) = λh + o(h) as h → 0 (Handout 2)

⟹ P(N_t = n) = (λt)^n e^{−λt}/n!, n = 0, 1, ⋯


(Derivation) (p.144)

g(n, t) ≡ P(N_t = n)

(∗) g(0, 0) = 1, ∂/∂t g(0, t) = −λ g(0, t) ⟹ g(0, t) = e^{−λt}

(∗∗) g(n, 0) = 0 for n = 1, 2, ⋯, ∂/∂t g(n, t) = −λ g(n, t) + λ g(n − 1, t)

(∗) g(0, t + h) = P(N_{t+h} = 0)
= P(N_t = 0, N_{t+h} − N_t = 0)
= P(N_t = 0) P(N_{t+h} − N_t = 0) (ii)
= P(N_t = 0) P(N_h = 0) (i)
= g(0, t)(1 − P(N_h = 1) − P(N_h ≥ 2))
= g(0, t)(1 − λh + o(h)) ((iii), (iv))

⟹ g(0, t + h) − g(0, t) = −λ g(0, t) · h + o(h)

(∗∗) g(n, t + h) = P(N_{t+h} = n)
= P(N_{t+h} = n, N_t = n) + P(N_{t+h} = n, N_t = n − 1) + P(N_{t+h} = n, N_t ≤ n − 2)
= g(n, t)(1 − λh + o(h)) + g(n − 1, t) λh + o(h)

⟹ ∂/∂t g(n, t) = −λ g(n, t) + λ g(n − 1, t)
⟹ ∂/∂t (e^{λt} g(n, t)/λ^n) = e^{λt} g(n − 1, t)/λ^{n−1}
⟹ e^{λt} g(n, t)/λ^n = t^n/n! by induction (e^{λt} g(0, t)/λ^0 = 1)


Property (p.146)

X_1, ⋯, X_n : independent Poisson(m_i) (i = 1, ⋯, n), respectively
⟹ X_1 + ⋯ + X_n ∼ Poisson(m_1 + ⋯ + m_n)

Proof (n = 2)

P(X_1 + X_2 = y) = Σ_{x_1+x_2=y} (m_1^{x_1} e^{−m_1}/x_1!)(m_2^{x_2} e^{−m_2}/x_2!), y = 0, 1, ⋯
= Σ_{x=0}^y C(y, x) m_1^x m_2^{y−x} (1/y!) e^{−(m_1+m_2)}
= (m_1 + m_2)^y e^{−(m_1+m_2)}/y!

Read (eg 3.2.3, eg 3.2.4 on p.147)


§3.3 The Gamma and Related Distributions

Fact (p.149)

① Γ(α) ≡ ∫_0^∞ y^{α−1} e^{−y} dy < ∞ for α > 0

② Γ(n) = (n − 1)! (n = 1, 2, ⋯), Γ(α) = (α − 1)Γ(α − 1), α > 1

③ Γ(1/2) = √π

Gamma distribution (p.149)

X ∼ Gamma(α, β) (α > 0, β > 0) (textbook notation: Γ(α, β))

- pdf : f(x) = 1/(Γ(α)β^α) · x^{α−1} e^{−x/β} I_{(0,∞)}(x)

- mgf : M(t) = (1 − βt)−α, t < 1/β

- mean and variance : E(X) = αβ, Var(X) = αβ²

α : shape parameter, β : scale parameter (See p.152)

Exponential distribution (p.150)

Exp(β) = Gamma(1, β) (exponential with mean β)

(textbook) Exp(λ) = Gamma(1, 1/λ)

Fact (p.150) (Waiting time in a Poisson process)

{N_t}_{t≥0} : Poisson process with occurrence rate λ > 0

W_r ≡ min{t : N_t ≥ r} : waiting time until the rth occurrence.

(a) Wr ∼ Gamma(r, 1/λ)

(b) W1, W2 − W1, · · · : IID Gamma(1, 1/λ) ≡ Exp(λ)


(a) P(W_r > t) = P(N_t ≤ r − 1) = Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k! (∵ N_t ∼ Poisson(λt))

d/dt P(W_r ≤ t) = −Σ_{k=0}^{r−1} ((−λ) e^{−λt}(λt)^k/k! + λk e^{−λt}(λt)^{k−1}/k!)
= λ (Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k! − Σ_{k=0}^{r−2} e^{−λt}(λt)^k/k!)
= λ e^{−λt}(λt)^{r−1}/(r − 1)!, t > 0

: pdf of Gamma(r, 1/λ)
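The identity P(W_r ≤ t) = 1 − Σ_{k=0}^{r−1} e^{−λt}(λt)^k/k! can be checked against a numerical integral of the Gamma(r, 1/λ) pdf (a minimal sketch, not from the notes; the trapezoidal rule and the chosen λ, r, t are illustrative):

```python
from math import exp, factorial

# Assumed example: Poisson tail sum vs Gamma(r, 1/lam) cdf.
lam, r, t = 1.5, 3, 2.0

def gamma_pdf(s):
    # pdf of Gamma(r, 1/lam): lam e^(-lam s)(lam s)^(r-1)/(r-1)!
    return lam * exp(-lam * s) * (lam * s) ** (r - 1) / factorial(r - 1)

# trapezoidal rule on [0, t]
n = 20_000
h = t / n
cdf = sum(gamma_pdf(i * h) for i in range(1, n)) * h
cdf += (gamma_pdf(0.0) + gamma_pdf(t)) * h / 2
tail = sum(exp(-lam * t) * (lam * t) ** k / factorial(k) for k in range(r))
assert abs(cdf - (1 - tail)) < 1e-6
```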

(b) (Proof for W_1 & W_2 − W_1)

P(W_1 > t_1, W_2 > t_2) = P(N_{t_1} = 0, N_{t_2} ≤ 1)
= P(N_{t_1} = 0, N_{t_2} − N_{t_1} ≤ 1)
= P(N_{t_1} = 0) P(N_{t_2} − N_{t_1} ≤ 1) (N_{t_2} − N_{t_1} ∼ Poisson(λ(t_2 − t_1)))
= e^{−λt_1} (e^{−λ(t_2−t_1)} + e^{−λ(t_2−t_1)} λ(t_2 − t_1)), t_1 < t_2

⟹ pdf_{W_1,W_2}(t_1, t_2) = λ² e^{−λt_2} I_{(0<t_1<t_2)}
⟹ pdf_{W_1, W_2−W_1}(y_1, y_2) = λe^{−λy_1} I_{(0,∞)}(y_1) · λe^{−λy_2} I_{(0,∞)}(y_2)

∴ W_1, W_2 − W_1 : IID Gamma(1, 1/λ)

Fact (p.154)

(a) X_1, ⋯, X_n : independent Gamma(α_i, β) (i = 1, ⋯, n), respectively
⇒ X_1 + ⋯ + X_n ∼ Gamma(α_1 + ⋯ + α_n, β)

(b) X ∼ Gamma(r, β) ⇔ X ≡_d Z_1 + ⋯ + Z_r, Z_i iid∼ Exp(β^{−1}) = Gamma(1, β)

(mgf technique : (1 − βt)^{−(α_1+⋯+α_n)})


Chi-square distribution (p.152)

χ²(r) ≡ Gamma(r/2, 2)

- pdf : f(x) = 1/(Γ(r/2) 2^{r/2}) · x^{r/2−1} e^{−x/2} I_{(0,∞)}(x)

- mgf : M(t) = (1 − 2t)−r/2, t < 1/2

- mean and variance : E(X) = r, Var(X) = 2r

(r : degrees of freedom)

R code (p.153) : pchisq, dchisq

Remark : It is related to the distribution of the sample variance from a

normal population (p.186) (Thm 3.6.1)

Fact (p.154)

(a) X1, · · · , Xn : independent χ2(ri) (i = 1, · · · , n), respectively

⇒ X1 + · · ·+ Xn ∼ χ2(r1 + · · ·+ rn)

(b) X ∼ χ²(r) ⇔ X ≡_d X_1 + ⋯ + X_r, X_i iid∼ χ²(1)

Beta distribution (p.155)

X ∼ Beta(α, β)

- pdf : f(x) = Γ(α + β)/(Γ(α)Γ(β)) · x^{α−1}(1 − x)^{β−1} I_{(0,1)}(x)

- mgf : (no explicit closed form)

- mean and variance : E(X) = α/(α + β), Var(X) = αβ/((α + β)²(α + β + 1))

R code (p.156) : pbeta, dbeta


Fact (p.155)

X ∼ Beta(α, β) ⇔ X ≡_d X_1/(X_1 + X_2)
where X_1 ∼ Gamma(α, 1), X_2 ∼ Gamma(β, 1), X_1, X_2 : independent

- Y_1 = X_1 + X_2, Y_2 = X_1/(X_1 + X_2) ⇔ x_1 = y_1y_2, x_2 = y_1(1 − y_2)

- 1-1 from (0, ∞) × (0, ∞) onto (0, ∞) × (0, 1)

- |det(∂x/∂y)| = |det [ y_2, y_1 ; 1 − y_2, −y_1 ]| = y_1

pdf_{Y_1,Y_2}(y_1, y_2) = pdf_{X_1,X_2}(x_1, x_2) |det(∂x/∂y)|
= 1/Γ(α) · x_1^{α−1} e^{−x_1} · 1/Γ(β) · x_2^{β−1} e^{−x_2} · y_1 I_{(0,∞)}(y_1) I_{(0,1)}(y_2)
= 1/(Γ(α)Γ(β)) · (y_1y_2)^{α−1} (y_1(1 − y_2))^{β−1} · y_1 e^{−y_1} I_{(0,∞)}(y_1) I_{(0,1)}(y_2)
= [1/Γ(α + β) · y_1^{α+β−1} e^{−y_1} I_{(0,∞)}(y_1)] · [Γ(α + β)/(Γ(α)Γ(β)) · y_2^{α−1}(1 − y_2)^{β−1} I_{(0,1)}(y_2)]
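The Beta-from-Gammas fact can be illustrated by simulation with the standard library's gamma sampler (a minimal sketch, not from the notes; the seed and tolerances are illustrative):

```python
import random

# Assumed example: X1 ~ Gamma(a, 1), X2 ~ Gamma(b, 1) independent
# gives X1/(X1 + X2) ~ Beta(a, b); check the mean a/(a + b).
random.seed(0)
a, b, n = 2.0, 3.0, 200_000
total = 0.0
for _ in range(n):
    x1 = random.gammavariate(a, 1.0)
    x2 = random.gammavariate(b, 1.0)
    total += x1 / (x1 + x2)
mean = total / n
assert abs(mean - a / (a + b)) < 0.01  # a/(a+b) = 0.4; Monte Carlo SE ~ 0.0005
```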

Dirichlet distribution (p.156)

(Y_1, ⋯, Y_k)′ ∼ Dirichlet(α_1, ⋯, α_k, α_{k+1})

- pdf : Γ(α_1 + ⋯ + α_{k+1})/(Γ(α_1) ⋯ Γ(α_{k+1})) · y_1^{α_1−1} ⋯ y_k^{α_k−1} (1 − y_1 − ⋯ − y_k)^{α_{k+1}−1},
0 < y_i, y_1 + ⋯ + y_k < 1

- mean and variance : with α· = α_1 + ⋯ + α_{k+1},
E(Y_1) = α_1/α·, Var(Y_1) = α_1(α_2 + ⋯ + α_{k+1})/(α·²(α· + 1)), cov(Y_1, Y_2) = −α_1α_2/(α·²(α· + 1))


Fact (p.156)

(W_1, ⋯, W_k)′ ∼ Dirichlet(α_1, ⋯, α_k, α_{k+1})
⇔ (W_i)_{1≤i≤k} ≡_d (X_i/(X_1 + ⋯ + X_k + X_{k+1}))_{1≤i≤k} where X_i indep∼ Gamma(α_i, 1)

(Derivation)

- Y_i ≡ X_i/(X_1 + ⋯ + X_k + X_{k+1}) (i = 1, ⋯, k), Y_{k+1} ≡ X_1 + ⋯ + X_k + X_{k+1}
: 1-1 from (0, ∞)^{k+1} onto Y = {y : 0 < y_i < 1 (i = 1, ⋯, k), Σ_1^k y_i < 1, y_{k+1} > 0}

- x_i = y_i y_{k+1} (i = 1, ⋯, k), x_{k+1} = y_{k+1}(1 − y_1 − ⋯ − y_k)

- det(∂x/∂y) = det [ y_{k+1} I_k , (y_1, ⋯, y_k)′ ; (−y_{k+1}, ⋯, −y_{k+1}) , 1 − Σ_{i=1}^k y_i ] = (y_{k+1})^k

∴ pdf_Y(y_1, ⋯, y_k, y_{k+1}) = Π_{i=1}^{k+1} (1/Γ(α_i)) x_i^{α_i−1} e^{−x_i} · |det(∂x/∂y)|
for x_i = y_i y_{k+1} (i = 1, ⋯, k), x_{k+1} = y_{k+1}(1 − y·), y· = Σ_1^k y_i
= 1/(Γ(α_1) ⋯ Γ(α_{k+1})) · y_1^{α_1−1} ⋯ y_k^{α_k−1} (1 − y·)^{α_{k+1}−1} · y_{k+1}^{α·+α_{k+1}−1} e^{−y_{k+1}},
y ∈ Y, y· = Σ_{i=1}^k y_i, α· = Σ_{i=1}^k α_i

∴ (Y_1, ⋯, Y_k)′ and Y_{k+1} are independent,
Y_{k+1} ∼ Gamma(α· + α_{k+1}, 1), and
pdf_{Y_1,⋯,Y_k}(y_1, ⋯, y_k) = Γ(α_1 + ⋯ + α_{k+1})/(Γ(α_1) ⋯ Γ(α_k)Γ(α_{k+1})) · y_1^{α_1−1} ⋯ y_k^{α_k−1} (1 − y·)^{α_{k+1}−1},
0 < y_i (i = 1, ⋯, k), y· < 1


§3.4 The Normal Distribution

The “error” function and its integral

φ(x) ≡ (1/√(2π)) e^{−x²/2}, ∫_{−∞}^∞ φ(x) dx = 1

∵ (∫_{−∞}^∞ φ(x) dx)² = (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−(x²+y²)/2} dx dy
= (1/2π) ∫_0^{2π} ∫_0^∞ e^{−r²/2} r dr dθ
= 1

Normal approximation to the binomial probability

(De Moivre-Laplace : Handout #3)

lim_{n→∞} Σ_{x : a ≤ (x−np)/√(npq) ≤ b} C(n, x) p^x (1 − p)^{n−x} = ∫_a^b (1/√(2π)) e^{−z²/2} dz

Key steps :

① (Stirling's formula)
m! = m^{m+1/2} e^{−m} √(2π) (1 + o(1)) as m → ∞

② (approximation of the density)
For x : a ≤ (x − np)/√(npq) ≤ b,
C(n, x) p^x (1 − p)^{n−x} = (1/√(npq)) φ((x − np)/√(npq)) (1 + o(1))

③ (approximation of the sum as an integral)
binomial probability = ((b − a)/(N − 1)) Σ_{j=1}^N φ(z_j)(1 + o(1)) = ∫_a^b φ(z) dz + o(1)
where z_1 = (x_min − np)/√(npq) ∼ a, ⋯, z_N = (x_max − np)/√(npq) ∼ b,
and x_min (x_max) is the smallest (largest) integer satisfying the inequality.
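The De Moivre-Laplace limit can be illustrated numerically (a minimal sketch, not from the notes; the log-space pmf avoids over/underflow for large n, and the chosen n, p, a, b are illustrative):

```python
from math import erf, exp, lgamma, log, sqrt, ceil, floor

# Assumed example: binomial sum over a <= (x - np)/sqrt(npq) <= b
# vs the standard normal integral over [a, b].
n, p = 20_000, 0.3
q = 1 - p
s = sqrt(n * p * q)
a, b = -1.0, 2.0

def binom_pmf(x):
    # computed in log space to avoid overflow/underflow
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(q))

binom_sum = sum(binom_pmf(x)
                for x in range(ceil(a * s + n * p), floor(b * s + n * p) + 1))

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

assert abs(binom_sum - (Phi(b) - Phi(a))) < 0.01
```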


Normal distribution (p.162)

X ∼ N(µ, σ²)

- pdf : f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

- mgf : M(t) = exp(µt + σ²t²/2)

- mean and variance : E(X) = µ, Var(X) = σ²

Standard normal distribution : N(0, 1)

Z ∼ N(0, 1)

- E(Z) = 0, Var(Z) = E(Z²) = 1,

- E(Z³) = 0 (skewness), E(Z⁴) = 3 (kurtosis)

Properties

(a) (affine transformation) (p.162, p.166)

(i) X ∼ N(µ, σ²) ⟹ aX + b ∼ N(aµ + b, a²σ²) (a, b : constants)

(ii) X ∼ N(µ, σ²) ⇔ X ≡_d σZ + µ, Z ∼ N(0, 1)

(b) (sum of independent normal rv’s) (p.166)

(i) X_i ∼ N(µ_i, σ_i²) (i = 1, 2), independent
⟹ X_1 + X_2 ∼ N(µ_1 + µ_2, σ_1² + σ_2²)

(ii) X_1, ⋯, X_n : IID N(µ, σ²)
⟹ X̄ ∼ N(µ, σ²/n), i.e. (X̄ − µ)/(σ/√n) ∼ N(0, 1)

(c) (square of normal rv) (p.166)

(i) Z ∼ N(0, 1) ⟹ Z² ∼ χ²(1)

(ii) Y ∼ χ²(r) ⇔ Y ≡_d Z_1² + ⋯ + Z_r², Z_i iid∼ N(0, 1)


(Derivation)

(a) (i) (a ≠ 0) pdf_{aX+b}(y) = pdf_X(x) |dx/dy|, ax + b = y

(b) Note that

pdf_{X_1+X_2}(y) = ∫_{−∞}^∞ pdf_{X_2}(y − x) · pdf_{X_1}(x) dx
= ∫_{−∞}^∞ (1/√(2πσ_2²)) exp(−(y − x − µ_2)²/(2σ_2²)) · (1/√(2πσ_1²)) exp(−(x − µ_1)²/(2σ_1²)) dx
= ∫_{−∞}^∞ exp(−(1/2)(σ_1^{−2} + σ_2^{−2}) z²) dz · (1/√(2πσ_2²)) (1/√(2πσ_1²)) · exp(−(1/2)(σ_1² + σ_2²)^{−1}(y − µ_1 − µ_2)²)
= (1/√(2π(σ_1² + σ_2²))) exp(−(1/2)(σ_1² + σ_2²)^{−1}(y − µ_1 − µ_2)²)

(convolution of pdf_{X_1} and pdf_{X_2} : p.93)

(c) (i) pdf_{Z²}(y) = Σ_{z : z²=y} pdf_Z(z) |dy/dz|^{−1}
= 2 · (2π)^{−1/2} exp(−y/2) (2√y)^{−1} I_{(0,∞)}(y)
= 1/(Γ(1/2) 2^{1/2}) · y^{1/2−1} exp(−y/2) I_{(0,∞)}(y)
: pdf of Gamma(1/2, 2) = χ²(1)

(ii) follows from the additivity property of χ² and (i).

Read Handout #4 (Facts from Linear Algebra) and Appendix II


§3.5 The Multivariate Normal Distribution

Multivariate normal distribution (p.172 (3.5.8)) (p.173 (3.5.11))

① (Equivalent definitions of MVN)

X ∼ N_n(µ, Σ), Σ : n × n real symmetric non-negative definite
⇔ X ≡_d AZ + µ with AA′ = Σ, A : n × k, rank(A) = k, and Z_1, ⋯, Z_k ∼ N(0, 1) IID
⇔ X ≡_d Σ^{1/2}Z + µ with Σ^{1/2}Σ^{1/2} = Σ, Σ^{1/2} : n × n real symmetric, and Z_1, ⋯, Z_n ∼ N(0, 1) IID
⇔ mgf_X(t) = exp(µ′t + (1/2) t′Σt), t ∈ R^n
⇔ a′X ∼ N(a′µ, a′Σa) for any a : n × 1

Note that Σ can be singular in the above definition.

IDEA : Derive E[exp(t′X)] = exp(µ′t + (1/2) t′Σt) from E[exp(s′Z)] = exp((1/2) s′s).

② (pdf of non-singular MVN)

X ∼ N_n(µ, Σ), Σ : n × n real symmetric positive definite
⇔ pdf_X(x) = det(2πΣ)^{−1/2} exp{−(1/2)(x − µ)′Σ^{−1}(x − µ)}

IDEA : (p.173 (3.5.12)) Derive the pdf of X = Σ^{1/2}Z + µ from pdf_Z(z) = |2πI|^{−1/2} exp(−(1/2) z′z).


③ (Properties of MVN)

(i) (affine invariance of the MVN family)
X ∼ N(µ, Σ) ⇒ AX + b ∼ N(Aµ + b, AΣA′)

(ii) (independence of components)
Suppose (X_1′, X_2′)′ ∼ N((µ_1′, µ_2′)′, [Σ_11, Σ_12 ; Σ_21, Σ_22]).
Then, X_1 and X_2 : independent ⇔ Σ_12 = cov(X_1, X_2) = 0

(iii) (independence of linear functions)
Suppose X ∼ N(µ, Σ).
Then, AX and BX : independent ⇔ cov(AX, BX) = AΣB′ = 0

IDEA :

(i) (p.173 Thm 3.5.1) X ≡_d Σ^{1/2}Z + µ ⟹ AX + b ≡_d AΣ^{1/2}Z + (Aµ + b),
(AΣ^{1/2})(AΣ^{1/2})′ = AΣA′

(ii) (p.175) M_{X_1,X_2}(t_1, t_2) = M_{X_1}(t_1) M_{X_2}(t_2) for all t_1, t_2,
where M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0) and M_{X_2}(t_2) = M_{X_1,X_2}(0, t_2)
⇔ 2t_1′Σ_12t_2 = 0 for all t_1, t_2

(iii) ((AX)′, (BX)′)′ = [A ; B] X ∼ N(·, ·) & apply (ii).

④ (Conditional and marginal distributions)

Suppose (X_1′, X_2′)′ ∼ N((µ_1′, µ_2′)′, [Σ_11, Σ_12 ; Σ_21, Σ_22]), Σ_22 : positive definite.

Then (i) X_1 | X_2 = x_2 ∼ N(µ_1 + Σ_12Σ_22^{−1}(x_2 − µ_2), Σ_{11·2}) with
Σ_{11·2} = Σ_11 − Σ_12Σ_22^{−1}Σ_21,

(ii) X_2 ∼ N(µ_2, Σ_22).

IDEA : Consider
(X_1 − Σ_12Σ_22^{−1}X_2 ; X_2) = [I, −Σ_12Σ_22^{−1} ; 0, I] (X_1 ; X_2),
and apply ③ (i)(ii).


⑤ (Distribution of a quadratic form)

Suppose Z ∼ N(0, I), A : real symmetric.
Then, A² = A ⇒ Z′AZ ∼ χ²(r) with r = trace(A).

Remark : The converse (⇐) is also true.

IDEA

• From the diagonalization of a real symmetric matrix, A = P diag(λ_i) P′, PP′ = I = P′P,
and λ_i = 1 or 0 since A² = A.

• May assume λ_1 = ⋯ = λ_r = 1, λ_{r+1} = ⋯ = λ_n = 0,
since r = trace(A) = trace(P diag(λ_i) P′) = trace(diag(λ_i)).

• Let W = P′Z ∼ N(0, I) by ③ (i).

• Z′AZ = W′ diag(λ_i) W = Σ_{i=1}^r W_i² ∼ χ²(r)


Fundamental Theorem in Normal Sampling (p.186)

X_1, ⋯, X_n : IID N(µ, σ²) (random sample)

⟹ (a) X̄ = Σ_1^n X_i/n ∼ N(µ, σ²/n)

(b) X̄ and S² = Σ_{i=1}^n (X_i − X̄)²/(n − 1) are independent

(c) (n − 1)S²/σ² ∼ χ²(n − 1)

(Derivation)

(a) (done before)

(b) Let X = (X_1, ⋯, X_n)′ ∼ N(µ1, σ²I)
X̄ = (1′1)^{−1}1′X
(n − 1)S² = X′(I − 1(1′1)^{−1}1′)X = ||(I − 1(1′1)^{−1}1′)X||²

Since (1′1)^{−1}1′(σ²I)(I − 1(1′1)^{−1}1′) = 0,
X̄ = (1′1)^{−1}1′X and (X_1 − X̄, ⋯, X_n − X̄)′ = (I − 1(1′1)^{−1}1′)X are independent.

∴ X̄ and Σ_{i=1}^n (X_i − X̄)² = ||(I − 1(1′1)^{−1}1′)X||² are independent.

(c) X ≡_d σZ + µ1, Z ∼ N(0, I)
(n − 1)S²/σ² ≡_d Z′(I − 1(1′1)^{−1}1′)Z, where I − 1(1′1)^{−1}1′ is idempotent with trace n − 1, so ⑤ gives χ²(n − 1)


§3.6 t and F -Distributions

(I) (Student’s) t-distribution

T ∼ t(r)

(a) representational definition

T ≡_d W/√(V/r) where W ∼ N(0, 1), V ∼ χ²(r), V, W : independent.

(b) pdf

Γ((r + 1)/2)/(Γ(1/2)Γ(r/2)) · (1 + t²/r)^{−(r+1)/2} · (1/√r), −∞ < t < ∞

mgf : does not exist.

(c) mean and variance

- E(T) = 0 for r > 1 (E(T) does not exist for r = 1)

- Var(T) = r/(r − 2) for r > 2

Property

pdf_{t(r)}(t) → (1/√(2π)) e^{−t²/2} as r → ∞ (use Stirling's formula)
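Both the pdf formula and this limit can be checked numerically (a minimal sketch, not from the notes; `lgamma` keeps the constant stable for large r, and the grid and tolerances are illustrative):

```python
from math import exp, lgamma, log, pi, sqrt

# Assumed example: the t(r) pdf integrates to 1, and t(r) at large r
# is close to the standard normal density.
def t_pdf(t, r):
    logc = lgamma((r + 1) / 2) - lgamma(r / 2) - 0.5 * log(r * pi)
    return exp(logc) * (1 + t * t / r) ** (-(r + 1) / 2)

# trapezoidal rule over [-50, 50] for r = 5 (tail beyond is negligible)
r = 5
n, lo, hi = 200_000, -50.0, 50.0
h = (hi - lo) / n
total = sum(t_pdf(lo + i * h, r) for i in range(1, n)) * h
total += (t_pdf(lo, r) + t_pdf(hi, r)) * h / 2
assert abs(total - 1.0) < 1e-3

phi0 = exp(0.0) / sqrt(2 * pi)  # standard normal density at 0
assert abs(t_pdf(0.0, 500) - phi0) < 1e-3
```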

R code (p. 183)

(Derivation of pdf) (p.182-p.183)

- T = W/√(V/r), U = V ⇔ w = t√(u/r), v = u


- |det(∂(w, v)/∂(t, u))| = |det [ √(u/r), (t/2)(1/√(ru)) ; 0, 1 ]| = √u/√r

pdf_T(t) = ∫_0^∞ pdf_{W,V}(w, v) · |det(∂(w, v)/∂(t, u))| du
= ∫_0^∞ 1/(Γ(r/2) 2^{r/2}) u^{r/2−1} e^{−u/2} · (1/√(2π)) e^{−(t√(u/r))²/2} · √(u/r) du
= 1/(√(2πr) Γ(r/2) 2^{r/2}) ∫_0^∞ u^{(r+1)/2 − 1} exp(−(1/2)(1 + t²/r) u) du

Application (p.186)

X_1, ⋯, X_n : IID from N(µ, σ²)
⟹ (X̄ − µ)/(S/√n) ∼ t(n − 1), where (n − 1)S² = Σ_1^n (X_i − X̄)².

Statement :
P(|X̄ − µ|/(S/√n) ≤ t_{α/2}(n − 1)) = 1 − α
(textbook notation: t_{α/2,n−1}) (p.257)

(II) (Fisher’s) F -distribution

F ∼ F (r1, r2)

(a) representational definition

F ≡_d (U/r_1)/(V/r_2) where U ∼ χ²(r_1), V ∼ χ²(r_2), U, V : independent.

(b) pdf

Γ((r_1 + r_2)/2)/(Γ(r_1/2)Γ(r_2/2)) · (r_1/r_2)^{r_1/2} · w^{r_1/2 − 1}/(1 + r_1w/r_2)^{(r_1+r_2)/2}, w > 0

mgf : does not exist.


(Derivation of pdf) (p.185)

- W = (U/r_1)/(V/r_2), Z = V ⇔ u = (r_1/r_2) wz, v = z

- |det(∂(u, v)/∂(w, z))| = |det [ (r_1/r_2)z, (r_1/r_2)w ; 0, 1 ]| = (r_1/r_2)z

pdf_W(w) = ∫_0^∞ pdf_{U,V}(u, v) · (r_1/r_2)z dz
= 1/(Γ(r_1/2)Γ(r_2/2) 2^{(r_1+r_2)/2}) ∫_0^∞ ((r_1/r_2)wz)^{r_1/2 − 1} z^{r_2/2 − 1} (r_1/r_2)z · exp(−(1/2)(r_1/r_2)wz − (1/2)z) dz
= the pdf stated above
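The resulting F pdf can be checked to integrate to 1 numerically (a minimal sketch, not from the notes; the log-space constant, grid, and tolerances are illustrative):

```python
from math import exp, lgamma, log

# Assumed example: integrate the F(r1, r2) pdf over (0, 200);
# the tail beyond is negligible for these degrees of freedom.
r1, r2 = 4, 8
logc = (lgamma((r1 + r2) / 2) - lgamma(r1 / 2) - lgamma(r2 / 2)
        + (r1 / 2) * log(r1 / r2))

def f_pdf(w):
    return exp(logc + (r1 / 2 - 1) * log(w)
               - ((r1 + r2) / 2) * log(1 + r1 * w / r2))

n, hi = 200_000, 200.0
h = hi / n
total = sum(f_pdf(i * h) for i in range(1, n)) * h  # f_pdf(0) = 0 here
assert abs(total - 1.0) < 1e-3
```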

Application (p.263, #5.4.25)

X_1, ⋯, X_n : IID from N(µ_1, σ_1²), Y_1, ⋯, Y_m : IID from N(µ_2, σ_2²), independent.

(n − 1)S_1² = Σ_{i=1}^n (X_i − X̄)², (m − 1)S_2² = Σ_{i=1}^m (Y_i − Ȳ)²

⟹ (m − 1)S_2²/σ_2² ∼ χ²(m − 1) and (n − 1)S_1²/σ_1² ∼ χ²(n − 1), independent

⟹ [((m − 1)S_2²/σ_2²)/(m − 1)] / [((n − 1)S_1²/σ_1²)/(n − 1)] = (S_2²/σ_2²)/(S_1²/σ_1²) ∼ F(m − 1, n − 1)

Statement :
P(F_{1−α/2}(m − 1, n − 1) ≤ (S_2²/σ_2²)/(S_1²/σ_1²) ≤ F_{α/2}(m − 1, n − 1)) = 1 − α
⟺ F_{1−α/2} · S_1²/S_2² ≤ σ_1²/σ_2² ≤ F_{α/2} · S_1²/S_2²


§3.7 The Uniform Distribution and Random Number

(§5.8 : p.288∼)

Uniform distribution

U ∼ U(a, b)

① pdf : f(u) = 1/(b − a) · I_{(a,b)}(u)

② representational definition :
U ∼ U(a, b) ⇔ U ≡_d (b − a)Z + a, Z ∼ U(0, 1)

③ mean and variance : For U ∼ U(0, 1),
E(U) = 1/2, Var(U) = 1/12

Note that U ≡_d 1 − U

Probability Integral Transformation (Thm 5.8.1 p.288)

(i) X ∼ cdf F, continuous type, F : strictly ↑ (#5.3.1 p.253)
⟹ F(X) ∼ U(0, 1)

(ii) U ∼ U(0, 1), F : cdf of a random variable (Thm 5.8.1 p.288)
⟹ F^{−1}(U) ∼ cdf F, where F^{−1}(u) ≡ inf{t : F(t) ≥ u}.

∵ (i) P(F(X) ≤ u) = P(X ≤ F^{−1}(u)) (assuming F is strictly ↑)
= F(F^{−1}(u))
= u (continuous type)

(ii) P(F^{−1}(U) ≤ x) = P(U ≤ F(x))
= F(x)


Can show F^{−1}(u) ≤ x ⇔ u ≤ F(x) for F^{−1}(u) ≡ inf{t : F(t) ≥ u}

① F(F^{−1}(u)) ≥ u, F^{−1}(F(x)) ≤ x, and u ≤ F(x) ⇔ F^{−1}(u) ≤ x

② F : continuous ⇒ F(F^{−1}(u)) = u

③ F : continuous and strictly increasing
⇒ F(F^{−1}(u)) = u, F^{−1}(F(x)) = x

∵ ① ⇐) F^{−1}(u) ≤ x ⇒ u ≤ F(x + 1/n) ↓ F(x) ⇒ u ≤ F(x)
⇒) u ≤ F(x) ⇒ F^{−1}(u) ≤ x by definition

② ⇒) F(F^{−1}(u) − 1/n) ≤ u and, by continuity, F(F^{−1}(u) − 1/n) ↑ F(F^{−1}(u)),
so F(F^{−1}(u)) ≤ u; with ①, F(F^{−1}(u)) = u

(Figure: a cdf F with a jump, levels u_1, u_2, u_3 and the corresponding F^{−1}(u_1), F^{−1}(u_2), F^{−1}(u_3); at the jump, F(F^{−1}(u_3)) ≥ u_3, illustrating F(F^{−1}(u)) ≥ u in general.)

(eg) X ∼ Exp(1) ⇔ X ≡_d −log(1 − U), U ∼ U(0, 1)

Z ∼ N(0, 1) ⇔ Z ≡_d Φ^{−1}(U), Φ : cdf of N(0, 1)
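The first example can be illustrated by simulation (a minimal sketch, not from the notes; the seed and tolerances are illustrative): inverse-cdf sampling −log(1 − U) should produce Exp(1) draws.

```python
import random
from math import log

# Assumed example: probability integral transformation for Exp(1).
random.seed(1)
n = 200_000
xs = [-log(1 - random.random()) for _ in range(n)]  # random() < 1, so log is safe
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
assert abs(mean - 1.0) < 0.02  # Exp(1) has mean 1 (Monte Carlo SE ~ 0.002)
assert abs(var - 1.0) < 0.05   # and variance 1
```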


eg 5.8.4 (p.290-p.291)

(a) X_1, X_2 : IID N(0, 1)

X_1 = R cos Θ, X_2 = R sin Θ, 0 ≤ R < ∞, 0 ≤ Θ < 2π

pdf of (R, Θ)?

(Solution)

- x_1 = r cos θ, x_2 = r sin θ (1-1)

- |det(∂(x_1, x_2)/∂(r, θ))| = |det [ cos θ, sin θ ; −r sin θ, r cos θ ]| = r

pdf_{R,Θ}(r, θ) = (1/2π) exp(−(x_1² + x_2²)/2) · r, 0 ≤ r < ∞, 0 ≤ θ < 2π

∴ pdf_{R,Θ}(r, θ) = [r e^{−r²/2} I_{(0,∞)}(r)] · [(1/2π) I_{(0,2π)}(θ)]

⟹ (1/2)R² ∼ Exp(1), Θ ∼ U(0, 2π), independent

(b) U_1, U_2 : IID U(0, 1)

X_1 = √(−2 log(1 − U_1)) cos(2πU_2), X_2 = √(−2 log(1 − U_1)) sin(2πU_2)

⟹ X_1, X_2 : IID N(0, 1)

(Solution)

R = √(−2 log(1 − U_1)), Θ = 2πU_2
⇒ (1/2)R² ∼ Exp(1), Θ ∼ U(0, 2π), independent.

pdf_{X_1,X_2}(x_1, x_2) = r e^{−r²/2} · (1/2π) · |det(∂(x_1, x_2)/∂(r, θ))|^{−1}, −∞ < x_1, x_2 < ∞
= (2π)^{−1/2} exp(−x_1²/2) · (2π)^{−1/2} exp(−x_2²/2)


§3.8 Order Statistics (§5.1 and §5.2)

Sampling and Statistics (§5.1 : p.233 ∼ p.235)

a population → (sampling) → observation : a part of the population, x_1, …, x_n → (inference) → back to the population

Sampling : to get the “good representatives”, need “random sampling”(p.234)

Inference :

① modelling : postulate a set of possible distributions
{f(·; θ) | θ ∈ Ω} (θ : parameter, Ω : parameter space)
(eg Bernoulli(p), N(µ, σ²), Exp(β), …)

② design a sampling : often

X1, · · · , Xn : random sample (r.s.) IID (p.234)

- representative of an infinite population

③ choose a function of the r.s. for an inference:
T(X_1, ⋯, X_n) (or simply T) : statistic (p.235)
(eg X̄, or √(Σ_{i=1}^n (X_i − X̄)²/(n − 1)), or med(X_i), …)

④ establish theories for distributions of T(X_1, ⋯, X_n):
sampling distribution of T(X_1, ⋯, X_n)
(eg X_1 + ⋯ + X_n ∼ Bin(n, p), X_1 + ⋯ + X_n ∼ Poisson(nµ), X̄ ∼ N(µ, σ²/n), …)

Approximation of sampling distributions (§4.2 ∼ §4.5)

Except in some trivial cases, it is difficult to derive the sampling distribution exactly
- approximate it for large sample size (n → ∞)


Order Statistics (§5.2 : p.238 ∼ p.242)

(eg) X1, X2 : IID Exp(1) r.v.’s

Let Y1 < Y2 denote the ordered X1 and X2, i.e.,

Y1 = min(X1, X2), Y2 = max(X1, X2).

(a) Find P (Y1 ≤ y1, Y2 ≤ y2) for 0 < y1 < y2 < ∞.

(b) Find the joint pdf of Y1 and Y2.

(c) Find the marginal pdf’s of Y1, Y2, respectively.

(Solution) Note that

pdf_{X_1,X_2}(x, y) = e^{−x−y} I_{(0,∞)}(x) I_{(0,∞)}(y)

(a) P(Y_1 ≤ y_1, Y_2 ≤ y_2)
= P(Y_1 ≤ y_1, Y_2 ≤ y_2, X_1 < X_2) + P(Y_1 ≤ y_1, Y_2 ≤ y_2, X_2 < X_1)
= P(X_1 ≤ y_1, X_2 ≤ y_2, X_1 < X_2) + P(X_2 ≤ y_1, X_1 ≤ y_2, X_2 < X_1)
= ∫∫_{0<x<y, x≤y_1, y≤y_2} e^{−x−y} dx dy + ∫∫_{0<y<x, y≤y_1, x≤y_2} e^{−x−y} dx dy
= ∫∫_{0<x<y, x≤y_1, y≤y_2} 2e^{−x−y} dx dy
= ∫∫_{x≤y_1, y≤y_2} 2e^{−x−y} I_{(0<x<y)} dx dy
= ∫_0^{y_2} ∫_0^{y∧y_1} 2e^{−x−y} dx dy
= 1 − 2e^{−y_2} − e^{−2y_1} + 2e^{−y_1−y_2} (0 < y_1 < y_2 < ∞)

(b) pdf_{Y_1,Y_2}(y_1, y_2) = ∂²/∂y_1∂y_2 P(Y_1 ≤ y_1, Y_2 ≤ y_2)
= 2e^{−y_1−y_2} I_{(0<y_1<y_2)}


(c) pdf_{Y_1}(y_1) = ∫_{−∞}^∞ pdf_{Y_1,Y_2}(y_1, y_2) dy_2
= ∫_{y_1}^∞ 2e^{−y_1−y_2} dy_2 · I_{(0,∞)}(y_1)
= 2e^{−2y_1} I_{(0,∞)}(y_1)

pdf_{Y_2}(y_2) = ∫_{−∞}^∞ pdf_{Y_1,Y_2}(y_1, y_2) dy_1
= ∫_0^{y_2} 2e^{−y_1−y_2} dy_1 · I_{(0,∞)}(y_2)
= 2(1 − e^{−y_2}) e^{−y_2} I_{(0,∞)}(y_2)

Order statistics (p.238)

X1, · · · , Xn : IID rv’s of continuous type

The ordered X1, · · · , Xn are denoted by X(1) < X(2) < · · · < X(n) and

called the order statistics based on X1, · · · , Xn.

Pdf of order statistics (p.238, p.241)

X_1, ⋯, X_n : IID with pdf f & cdf F of continuous type.

(a) Joint pdf of X_{(1)}, ⋯, X_{(n)} :
pdf_{X_{(1)},⋯,X_{(n)}}(y_1, y_2, ⋯, y_n) = n! Π_{i=1}^n f(y_i) I_{(y_1<y_2<⋯<y_n)}

(b) Marginal pdf of (X_{(r)}, X_{(s)}) (1 ≤ r < s ≤ n) :
pdf_{X_{(r)},X_{(s)}}(x, y) = n!/((r − 1)! · 1! · (s − 1 − r)! · 1! · (n − s)!) · (F(x))^{r−1} f(x) (F(y) − F(x))^{s−1−r} f(y) (1 − F(y))^{n−s} I_{(x<y)}

(c) Marginal pdf of X_{(r)} :
pdf_{X_{(r)}}(x) = n!/((r − 1)! · 1! · (n − r)!) · F(x)^{r−1} f(x) (1 − F(x))^{n−r}


Heuristic derivation (p.241)

(eg) (#5.2.22) (p.249) (Exponential Spacings)

- X_1, ⋯, X_n : IID Exp(1)

- X_{(1)} < ⋯ < X_{(n)} : order statistics based on X_1, ⋯, X_n

- Z_1 = nX_{(1)}, Z_2 = (n − 1)(X_{(2)} − X_{(1)}), ⋯, Z_r = (n − r + 1)(X_{(r)} − X_{(r−1)}), ⋯, Z_n = 1 · (X_{(n)} − X_{(n−1)}) (normalized spacings)

⟹ Z_1, ⋯, Z_n : IID Exp(1)

In other words,
(X_{(r)})_{1≤r≤n} ≡_d ((1/n)Z_1 + ⋯ + (1/(n − r + 1))Z_r)_{1≤r≤n}
where Z_1, ⋯, Z_n : IID Exp(1).

(Proof)

pdf_{X_{(1)},⋯,X_{(n)}}(x_{(1)}, ⋯, x_{(n)}) = n! · e^{−Σ_{i=1}^n x_{(i)}} I_{(0<x_{(1)}<⋯<x_{(n)})}

Since x_{(r)} = (1/n)z_1 + ⋯ + (1/(n − r + 1))z_r, the Jacobian matrix is lower triangular with diagonal entries 1/n, 1/(n − 1), ⋯, 1/(n − r + 1), ⋯, 1, so
det(∂(x_{(1)}, ⋯, x_{(n)})/∂(z_1, ⋯, z_n)) = 1/n!


0 < x_{(1)} < ⋯ < x_{(n)} ⇔ z_i > 0 (i = 1, ⋯, n), and Σ_{i=1}^n x_{(i)} = Σ_{i=1}^n z_i

∴ pdf_{Z_1,⋯,Z_n}(z_1, ⋯, z_n) = n! · e^{−Σ_{i=1}^n x_{(i)}} · (1/n!) Π_{i=1}^n I_{(0,∞)}(z_i)
= Π_{i=1}^n e^{−z_i} I_{(0,∞)}(z_i)
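The exponential-spacings result can be illustrated by simulation (a minimal sketch, not from the notes; the seed, sample size, and tolerance are illustrative): each normalized spacing Z_r should behave like Exp(1), so each should have mean close to 1.

```python
import random

# Assumed example: Z_r = (n - r + 1)(X_(r) - X_(r-1)) for Exp(1) samples.
random.seed(3)
n, reps = 5, 100_000
sums = [0.0] * n
for _ in range(reps):
    xs = sorted(random.expovariate(1.0) for _ in range(n))
    prev = 0.0
    for r, x in enumerate(xs, start=1):
        sums[r - 1] += (n - r + 1) * (x - prev)
        prev = x
for s in sums:
    assert abs(s / reps - 1.0) < 0.02  # Exp(1) mean (Monte Carlo SE ~ 0.003)
```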

(eg) (uniform order statistics)

U_1, ⋯, U_n : IID U(0, 1)

U_{(1)} < ⋯ < U_{(n)} : order statistics based on U_1, ⋯, U_n

(a) Y_1 = U_{(1)}, Y_2 = U_{(2)} − U_{(1)}, ⋯, Y_n = U_{(n)} − U_{(n−1)} (spacings)

⟹ pdf_{Y_1,⋯,Y_n}(y_1, ⋯, y_n) = n! · I_{(0<y_i, y_1+⋯+y_n<1)}

In other words,
(U_{(1)}, U_{(2)} − U_{(1)}, ⋯, U_{(n)} − U_{(n−1)})′ ∼ Dirichlet(1, ⋯, 1, 1) (n + 1 ones)

i.e. with U_{(0)} ≡ 0,
(U_{(r)} − U_{(r−1)})_{1≤r≤n} ≡_d (Z_r/(Z_1 + ⋯ + Z_n + Z_{n+1}))_{1≤r≤n}
where Z_1, ⋯, Z_n, Z_{n+1} iid∼ Exp(1) = Gamma(1, 1)

(b) U_{(r)} ∼ Beta(r, n − r + 1)
(∵ U_{(r)} ≡_d (Z_1 + ⋯ + Z_r)/((Z_1 + ⋯ + Z_r) + (Z_{r+1} + ⋯ + Z_{n+1})))

(c) (U_{(r)}, U_{(s)} − U_{(r)})′ ∼ Dirichlet(r, s − r, n − s + 1) (1 ≤ r < s ≤ n)
(∵ (U_{(r)}, U_{(s)} − U_{(r)})′ ≡_d ((Z_1 + ⋯ + Z_r)/(Z_1 + ⋯ + Z_{n+1}), (Z_{r+1} + ⋯ + Z_s)/(Z_1 + ⋯ + Z_{n+1}))′)
