
  • Lecture 20: Covariance / Correlation & General Bivariate Normal

    Sta230 / Mth 230

    Colin Rundel

    April 11, 2012

    6.4, 6.5 Covariance and Correlation

    Covariance

    We have previously discussed Covariance in relation to the variance of the sum of two random variables (Review Lecture 8).

    Var(X + Y ) = Var(X ) + Var(Y ) + 2Cov(X ,Y )

    Specifically, covariance is defined as

    Cov(X ,Y ) = E [(X − E (X ))(Y − E (Y ))] = E (XY ) − E (X )E (Y )

    This is a generalization of variance to two random variables and generally measures the degree to which X and Y tend to be large (or small) at the same time, or the degree to which one tends to be large while the other is small.

    Sta230 / Mth 230 (Colin Rundel) Lecture 20 April 11, 2012 1 / 33


    Covariance, cont.

    The magnitude of the covariance is not usually informative since it is affected by the magnitude of both X and Y . However, the sign of the covariance tells us something useful about the relationship between X and Y .

    Consider the following conditions:

    X > µX and Y > µY then (X − µX )(Y − µY ) will be positive.

    X < µX and Y < µY then (X − µX )(Y − µY ) will be positive.

    X > µX and Y < µY then (X − µX )(Y − µY ) will be negative.

    X < µX and Y > µY then (X − µX )(Y − µY ) will be negative.


    Properties of Covariance

    Cov(X ,Y ) = E [(X − µx)(Y − µy )] = E (XY )− µxµy

    Cov(X ,Y ) = Cov(Y ,X )

    Cov(X ,Y ) = 0 if X and Y are independent

    Cov(X , c) = 0

    Cov(X ,X ) = Var(X )

    Cov(aX , bY ) = ab Cov(X ,Y )

    Cov(X + a,Y + b) = Cov(X ,Y )

    Cov(X ,Y + Z ) = Cov(X ,Y ) + Cov(X ,Z )

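The listed properties can be checked numerically. Below is an illustrative sketch (not from the slides) verifying bilinearity, Cov(aX , bY ) = ab Cov(X ,Y ), and shift invariance, Cov(X + a, Y + b) = Cov(X ,Y ); these identities hold exactly for sample covariance as well, so the checks pass up to floating-point rounding.

```python
import random

random.seed(1)

# Simulated data: Y deliberately correlated with X.
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

def cov(u, v):
    """Sample covariance (1/n convention)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

c = cov(x, y)
a, b = 2.0, -3.0
# Cov(aX, bY) = ab Cov(X, Y)
print(abs(cov([a * xi for xi in x], [b * yi for yi in y]) - a * b * c) < 1e-9)  # True
# Cov(X + a, Y + b) = Cov(X, Y)
print(abs(cov([xi + 5 for xi in x], [yi - 7 for yi in y]) - c) < 1e-9)          # True
```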

    Correlation

    Since Cov(X ,Y ) depends on the magnitude of X and Y we would prefer to have a measure of association that is not affected by changes in the scales of the random variables.

    The most common measure of linear association is correlation, which is defined as

    ρ(X ,Y ) = Cov(X ,Y ) / (σX σY )

    −1 ≤ ρ(X ,Y ) ≤ 1

    where the magnitude of the correlation measures the strength of the linear association and the sign determines if it is a positive or negative relationship.

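The scale invariance that motivates correlation can be demonstrated directly. An illustrative sketch (not from the slides): rescaling or shifting either variable by positive constants leaves the sample correlation unchanged, exactly, since both numerator and denominator scale by the same factor.

```python
import random
from math import sqrt

random.seed(2)

n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

def corr(u, v):
    """Sample correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

r = corr(x, y)
# rho(aX + c, bY + d) = rho(X, Y) for a, b > 0 -- exact for samples too
r_scaled = corr([3 * xi + 1 for xi in x], [2 * yi - 4 for yi in y])
print(abs(r_scaled - r) < 1e-9)  # True
```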

    Correlation, cont.

    [Figure slide: example plots illustrating correlations of varying strength and sign; images not included in the transcript.]

    Correlation and Independence

    Given random variables X and Y

    X and Y are independent =⇒ Cov(X ,Y ) = ρ(X ,Y ) = 0

    Cov(X ,Y ) = ρ(X ,Y ) = 0 ⇏ X and Y are independent

    Cov(X ,Y ) = 0 is necessary but not sufficient for independence!


    Example

    Let X = {−1, 0, 1} with equal probability and Y = X². Clearly X and Y are not independent random variables, yet Cov(X ,Y ) = E (XY ) − E (X )E (Y ) = E (X³) − 0 = 0.

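This example can be verified exactly by summing over the three-point support:

```python
# Exact check: X uniform on {-1, 0, 1} and Y = X^2 are clearly dependent
# (Y is a function of X), yet their covariance is zero.
support = [-1, 0, 1]
p = 1 / 3

e_x = sum(p * x for x in support)            # E(X)  = 0
e_y = sum(p * x ** 2 for x in support)       # E(Y)  = E(X^2) = 2/3
e_xy = sum(p * x * x ** 2 for x in support)  # E(XY) = E(X^3) = 0

cov = e_xy - e_x * e_y
print(cov)  # 0.0 -> uncorrelated, even though Y is a function of X
```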

    Example - Linear Dependence

    Let X ∼ Unif(0, 1) and Y = aX + b for constants a and b. Find Cov(X ,Y ) and ρ(X ,Y ).

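A numerical sketch of this example with illustrative values of a and b (a = −2.5, b = 1 are my choices, not from the slides): for Y = aX + b, Cov(X ,Y ) = a Var(X ) and ρ(X ,Y ) = a/|a| = ±1. Sample moments satisfy the same identities, so the sample correlation is exactly ±1 up to rounding.

```python
import random
from math import sqrt

random.seed(3)

a, b = -2.5, 1.0                               # hypothetical constants
x = [random.random() for _ in range(10_000)]   # X ~ Unif(0, 1)
y = [a * xi + b for xi in x]                   # perfect linear dependence

mx, my = sum(x) / len(x), sum(y) / len(y)
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
var_x = sum((xi - mx) ** 2 for xi in x) / len(x)
var_y = sum((yi - my) ** 2 for yi in y) / len(y)
rho = cov_xy / sqrt(var_x * var_y)

print(abs(cov_xy - a * var_x) < 1e-9)  # True: Cov(X, aX + b) = a Var(X)
print(round(rho, 6))                   # -1.0, since a < 0
```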

    Multinomial Distribution

    Let X1, X2, . . . , Xk be the k random variables that reflect the number of outcomes belonging to categories 1, . . . , k in n trials with the probability of success for category i being pi , X1, . . . , Xk ∼ Multinom(n, p1, . . . , pk )

    P(X1 = x1, . . . , Xk = xk ) = f (x1, . . . , xk | n, p1, . . . , pk ) = [n! / (x1! · · · xk !)] p1^x1 · · · pk^xk

    where Σᵏᵢ₌₁ xi = n and Σᵏᵢ₌₁ pi = 1

    E(Xi ) = npi
    Var(Xi ) = npi (1 − pi )
    Cov(Xi ,Xj ) = −npi pj (for i ≠ j)

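The multinomial moment formulas can be checked exactly for a small case by enumerating every sequence of outcomes. This sketch uses a hypothetical case n = 2, k = 3 with probabilities of my choosing:

```python
from itertools import product
from math import isclose

n, p = 2, [0.2, 0.3, 0.5]   # illustrative parameters, not from the slides
ex1 = ex1sq = ex1x2 = 0.0
for outcome in product(range(3), repeat=n):  # every sequence of category labels
    prob = 1.0
    for c in outcome:
        prob *= p[c]
    x1 = outcome.count(0)   # count of category-1 outcomes
    x2 = outcome.count(1)   # count of category-2 outcomes
    ex1 += prob * x1
    ex1sq += prob * x1 * x1
    ex1x2 += prob * x1 * x2

var1 = ex1sq - ex1 ** 2
cov12 = ex1x2 - ex1 * (n * p[1])
print(isclose(ex1, n * p[0]),                 # E(X1)  = n p1
      isclose(var1, n * p[0] * (1 - p[0])),   # Var(X1) = n p1 (1 - p1)
      isclose(cov12, -n * p[0] * p[1]))       # Cov(X1, X2) = -n p1 p2
```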

    Example - Covariance of Multinomial Distribution

    Marginal distribution of Xi - consider category i a success and all other categories a failure; therefore in the n trials there are Xi successes and n − Xi failures, with probability of success pi and of failure 1 − pi , which means Xi has a binomial distribution

    Xi ∼ Binom(n, pi )

    Similarly, the distribution of Xi + Xj will also follow a Binomial distribution with probability of success pi + pj .

    Xi + Xj ∼ Binom(n, pi + pj )


    6.4, 6.5 General Bivariate Normal

    General Bivariate Normal

    Let Z1, Z2 ∼ N (0, 1), which we will use to build a general bivariate normal distribution.

    f (z1, z2) = (1/2π) exp[−(z1² + z2²)/2]

    We want to transform these unit normal distributions to have the following parameters: µX , µY , σX , σY , ρ

    X = µX + σX Z1

    Y = µY + σY (ρZ1 + √(1 − ρ²) Z2)


    General Bivariate Normal - Marginals

    First, let's examine the marginal distributions of X and Y .

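The work is left for lecture; a sketch of the standard result, using the fact that a linear combination of independent normals is normal:

```latex
X = \mu_X + \sigma_X Z_1 \sim \mathcal{N}(\mu_X, \sigma_X^2)
\qquad
\rho Z_1 + \sqrt{1-\rho^2}\, Z_2 \sim \mathcal{N}\!\left(0,\; \rho^2 + (1-\rho^2)\right) = \mathcal{N}(0, 1)
\;\Rightarrow\;
Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)
```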

    General Bivariate Normal - Cov/Corr

    Second, we can find Cov(X ,Y ) and ρ(X ,Y )

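The computation is also left for lecture; a sketch, using independence of Z1 and Z2 (so E(Z1 Z2) = 0) and E(Z1²) = 1:

```latex
\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
                   = E\!\left[\sigma_X Z_1 \cdot \sigma_Y\!\left(\rho Z_1 + \sqrt{1-\rho^2}\, Z_2\right)\right]
                   = \rho\, \sigma_X \sigma_Y
\qquad\Rightarrow\qquad
\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \rho
```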

    General Bivariate Normal - RNG

    Consequently, if we want to generate a Bivariate Normal random variable with X ∼ N (µX , σ²X ) and Y ∼ N (µY , σ²Y ) where Corr(X ,Y ) = ρ we can generate two independent unit normals Z1 and Z2 and use the transformation:

    X = µX + σX Z1

    Y = µY + σY (ρZ1 + √(1 − ρ²) Z2)

    We can also use this result to find the joint density of the Bivariate Normal using a 2d change of variables.

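The transformation on this slide translates directly into a generator. A sketch with illustrative parameter values (the means, sds, and ρ below are my choices), checking that the sample correlation comes out close to the target ρ:

```python
import random
from math import sqrt

random.seed(0)

# Target parameters (hypothetical example values).
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7

xs, ys = [], []
for _ in range(100_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)  # independent unit normals
    xs.append(mu_x + sd_x * z1)
    ys.append(mu_y + sd_y * (rho * z1 + sqrt(1 - rho ** 2) * z2))

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

print(abs(corr(xs, ys) - rho) < 0.05)  # True: sample correlation near target rho
```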

    Multivariate Change of Variables

    Let X1, . . . , Xn have a continuous joint distribution with pdf f defined on S . We can define n new random variables Y1, . . . , Yn as follows:

    Y1 = r1(X1, . . . , Xn )    · · ·    Yn = rn(X1, . . . , Xn )

    If we assume that the n functions r1, . . . , rn define a one-to-one differentiable transformation from S to T , then let the inverse of this transformation be

    x1 = s1(y1, . . . , yn )    · · ·    xn = sn(y1, . . . , yn )

    Then the joint pdf g of Y1, . . . , Yn is

    g(y1, . . . , yn ) = f (s1, . . . , sn ) |J| for (y1, . . . , yn ) ∈ T , and 0 otherwise

    where J is the determinant of the matrix of partial derivatives,

    J = det [ ∂si /∂yj ]   (i , j = 1, . . . , n)


    General Bivariate Normal - Density

    The first thing we need to find are the inverses of the transformation. If X = r1(z1, z2) and Y = r2(z1, z2), we need to find functions s1 and s2 such that Z1 = s1(X ,Y ) and Z2 = s2(X ,Y ).

    X = σX Z1 + µX   ⇒   Z1 = (X − µX )/σX

    Y = σY [ρZ1 + √(1 − ρ²) Z2] + µY
    (Y − µY )/σY = ρ (X − µX )/σX + √(1 − ρ²) Z2
    Z2 = (1/√(1 − ρ²)) [ (Y − µY )/σY − ρ (X − µX )/σX ]

    Therefore,

    s1(x , y) = (x − µX )/σX
    s2(x , y) = (1/√(1 − ρ²)) [ (y − µY )/σY − ρ (x − µX )/σX ]


    General Bivariate Normal - Density

    Next we calculate the Jacobian,

    J = det [ ∂s1/∂x   ∂s1/∂y ; ∂s2/∂x   ∂s2/∂y ]
      = det [ 1/σX , 0 ; −ρ/(σX √(1 − ρ²)) , 1/(σY √(1 − ρ²)) ]
      = 1/(σX σY √(1 − ρ²))

    The joint density of X and Y is then given by

    f (x , y) = f (z1, z2) |J|
      = (1/2π) exp[−(z1² + z2²)/2] |J|
      = [1/(2π σX σY √(1 − ρ²))] exp[−(z1² + z2²)/2]
      = [1/(2π σX σY √(1 − ρ²))] exp[ −(1/2) ( ((x − µX )/σX )² + (1/(1 − ρ²)) ((y − µY )/σY − ρ (x − µX )/σX )² ) ]
      = [1/(2π σX σY (1 − ρ²)^(1/2))] exp[ −(1/(2(1 − ρ²))) ( (x − µX )²/σ²X + (y − µY )²/σ²Y − 2ρ (x − µX )(y − µY )/(σX σY ) ) ]


    General Bivariate Normal - Density (Matrix Notation)

    Obviously, the density for the Bivariate Normal is ugly, and it only gets worse when we consider higher dimensional joint densities of normals. We can write the density in a more compact form using matrix notation,

    x = (x , y)ᵀ    µ = (µX , µY )ᵀ    Σ = [ σ²X , ρσX σY ; ρσX σY , σ²Y ]

    f (x) = (1/2π) (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]

    We can confirm our results by checking the value of (det Σ)^(−1/2) and (x − µ)ᵀ Σ⁻¹ (x − µ) for the bivariate case.

    (det Σ)^(−1/2) = (σ²X σ²Y − ρ²σ²X σ²Y )^(−1/2) = 1/(σX σY (1 − ρ²)^(1/2))


    General Bivariate Normal - Density (Matrix Notation)

    Recall for a 2 × 2 matrix,

    A = [ a , b ; c , d ]    A⁻¹ = (1/det A) [ d , −b ; −c , a ] = (1/(ad − bc)) [ d , −b ; −c , a ]

    Then,

    (x − µ)ᵀ Σ⁻¹ (x − µ)
      = [1/(σ²X σ²Y (1 − ρ²))] (x − µX , y − µY ) [ σ²Y , −ρσX σY ; −ρσX σY , σ²X ] (x − µX , y − µY )ᵀ
      = [1/(σ²X σ²Y (1 − ρ²))] ( σ²Y (x − µX ) − ρσX σY (y − µY ) , −ρσX σY (x − µX ) + σ²X (y − µY ) ) (x − µX , y − µY )ᵀ
      = [1/(σ²X σ²Y (1 − ρ²))] ( σ²Y (x − µX )² − 2ρσX σY (x − µX )(y − µY ) + σ²X (y − µY )² )
      = (1/(1 − ρ²)) ( (x − µX )²/σ²X − 2ρ (x − µX )(y − µY )/(σX σY ) + (y − µY )²/σ²Y )


    General Bivariate Normal - Examples

    [Figure slide: plots of the bivariate normal density for the parameter settings below; images not included in the transcript.]

    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = 0
    X ∼ N (0, 2), Y ∼ N (0, 1), ρ = 0
    X ∼ N (0, 1), Y ∼ N (0, 2), ρ = 0


    General Bivariate Normal - Examples

    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = 0.25
    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = 0.5
    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = 0.75


    General Bivariate Normal - Examples

    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = −0.25
    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = −0.5
    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = −0.75


    General Bivariate Normal - Examples

    X ∼ N (0, 1), Y ∼ N (0, 1), ρ = −0.75
    X ∼ N (0, 2), Y ∼ N (0, 1), ρ = −0.75
    X ∼ N (0, 1), Y ∼ N (0, 2), ρ = −0.75


    Multivariate Normal Distribution

    Matrix notation allows us to easily express the density of the multivariate normal distribution for an arbitrary number of dimensions. We express the k-dimensional multivariate normal distribution as follows,

    X ∼ Nk (µ, Σ)

    where µ is the k × 1 column vector of means and Σ is the k × k covariance matrix with {Σ}i ,j = Cov(Xi ,Xj ).

    The density of the distribution is

    f (x) = (2π)^(−k/2) (det Σ)^(−1/2) exp[ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ]


    Multivariate Normal Distribution - Cholesky

    In the bivariate case, we had a nice transformation such that we can generate two independent unit normal values and transform them into a sample from an arbitrary bivariate normal distribution.

    There is a similar method for the multivariate normal distribution that takes advantage of the Cholesky decomposition of the covariance matrix.

    The Cholesky decomposition is defined for a symmetric, positive definite matrix X as

    L = Chol(X )

    where L is a lower triangular matrix such that L Lᵀ = X .


    Multivariate Normal Distribution - RNG

    Let Z1, . . . , Zk ∼ N (0, 1) and Z = (Z1, . . . , Zk )ᵀ, then

    µ + Chol(Σ) Z ∼ Nk (µ, Σ)

    This is offered without proof in the general k-dimensional case, but we can check that it gives the same transformation we started with in the bivariate case, which should justify how we knew to use that particular transformation.


    Cholesky and the Bivariate Transformation

    We need to find the Cholesky decomposition of Σ for the general bivariate case where

    Σ = [ σ²X , ρσX σY ; ρσX σY , σ²Y ]

    We need to solve the following for a, b, c:

    [ a , 0 ; b , c ] [ a , b ; 0 , c ] = [ a² , ab ; ab , b² + c² ] = [ σ²X , ρσX σY ; ρσX σY , σ²Y ]

    This gives us three (unique) equations and three unknowns to solve for,

    a² = σ²X    ab = ρσX σY    b² + c² = σ²Y

    a = σX
    b = ρσX σY /a = ρσY
    c = √(σ²Y − b²) = σY (1 − ρ²)^(1/2)

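The factor just derived can be checked by multiplying it back out: L Lᵀ should recover Σ. A sketch with illustrative values for σX , σY , ρ (my choices, not from the slides):

```python
from math import sqrt, isclose

# Illustrative parameters.
sd_x, sd_y, rho = 2.0, 3.0, -0.4

# Lower-triangular factor from the slide: a = sd_x, b = rho*sd_y, c = sd_y*sqrt(1-rho^2)
L = [[sd_x, 0.0],
     [rho * sd_y, sd_y * sqrt(1 - rho ** 2)]]
Sigma = [[sd_x ** 2, rho * sd_x * sd_y],
         [rho * sd_x * sd_y, sd_y ** 2]]

# Compute L L^T entry by entry and compare against Sigma.
LLt = [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]
print(all(isclose(LLt[i][j], Sigma[i][j])
          for i in range(2) for j in range(2)))  # True
```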

    Cholesky and the Bivariate Transformation

    Let Z1, Z2 ∼ N (0, 1), then

    (X , Y )ᵀ = µ + Chol(Σ) Z
      = (µX , µY )ᵀ + [ σX , 0 ; ρσY , σY (1 − ρ²)^(1/2) ] (Z1, Z2)ᵀ
      = (µX , µY )ᵀ + ( σX Z1 , ρσY Z1 + σY (1 − ρ²)^(1/2) Z2 )ᵀ

    X = µX + σX Z1
    Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2]


    Conditional Expectation of the Bivariate Normal

    Using X = µX + σX Z1 and Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2] where Z1, Z2 ∼ N (0, 1) we can find E (Y |X ).

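The derivation is left for lecture; a sketch: given X , Z1 = (X − µX )/σX is fixed while Z2 remains a standard normal with mean zero, so

```latex
E(Y \mid X) = \mu_Y + \sigma_Y \left[ \rho\, \frac{X - \mu_X}{\sigma_X} + \sqrt{1-\rho^2}\; E(Z_2) \right]
            = \mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}\,(X - \mu_X)
```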

    Conditional Variance of the Bivariate Normal

    Using X = µX + σX Z1 and Y = µY + σY [ρZ1 + (1 − ρ²)^(1/2) Z2] where Z1, Z2 ∼ N (0, 1) we can find Var(Y |X ).

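Likewise left for lecture; a sketch: conditional on X , only the Z2 term is random, so

```latex
\operatorname{Var}(Y \mid X)
  = \operatorname{Var}\!\left(\sigma_Y \sqrt{1-\rho^2}\, Z_2 \;\middle|\; X\right)
  = \sigma_Y^2\,(1 - \rho^2)
```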

    Example - Husbands and Wives (Example 5.10.6, deGroot)

    Suppose that the heights of married couples can be explained by a bivariate normal distribution. The wives have a mean height of 66.8 inches and a standard deviation of 2 inches, while the heights of the husbands have a mean of 70 inches and a standard deviation of 2 inches. The correlation between the heights is 0.68. What is the probability that for a randomly selected couple the wife is taller than her husband?

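A worked check of this example: the difference D = W − H of jointly normal variables is normal with mean µW − µH and variance σ²W + σ²H − 2ρσW σH, so P(W > H) = P(D > 0) reduces to a standard normal tail probability.

```python
from math import erf, sqrt

mu_w, sd_w = 66.8, 2.0   # wives: mean height, sd (inches)
mu_h, sd_h = 70.0, 2.0   # husbands: mean height, sd
rho = 0.68

# D = W - H is normal:
mu_d = mu_w - mu_h                                    # -3.2
var_d = sd_w ** 2 + sd_h ** 2 - 2 * rho * sd_w * sd_h  # 8 - 5.44 = 2.56
sd_d = sqrt(var_d)                                     # 1.6

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(W > H) = P(D > 0) = 1 - Phi((0 - mu_d)/sd_d) = 1 - Phi(2)
p = 1 - phi((0 - mu_d) / sd_d)
print(round(p, 4))  # 0.0228
```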

    Example - Conditionals

    Suppose that X1 and X2 have a bivariate normal distribution where E(X1|X2) = 3.7 − 0.15 X2, E(X2|X1) = 0.4 − 0.6 X1, and Var(X2|X1) = 3.64.

    Find E(X1), Var(X1), E(X2), Var(X2), and ρ(X1,X2).

    Based on the given information we know the following,

    E(X1|X2) = µ1 − ρ (σ1/σ2) µ2 + ρ (σ1/σ2) X2 = 3.7 − 0.15 X2
    E(X2|X1) = µ2 − ρ (σ2/σ1) µ1 + ρ (σ2/σ1) X1 = 0.4 − 0.6 X1
    Var(X2|X1) = (1 − ρ²) σ²2 = 3.64

    From which we get the following set of equations

    (i) µ1 − ρ (σ1/σ2) µ2 = 3.7    (ii) ρ (σ1/σ2) = −0.15
    (iii) µ2 − ρ (σ2/σ1) µ1 = 0.4    (iv) ρ (σ2/σ1) = −0.6
    (v) (1 − ρ²) σ²2 = 3.64


    Example - Conditionals, cont.

    (i) µ1 − ρ (σ1/σ2) µ2 = 3.7    (ii) ρ (σ1/σ2) = −0.15
    (iii) µ2 − ρ (σ2/σ1) µ1 = 0.4    (iv) ρ (σ2/σ1) = −0.6
    (v) (1 − ρ²) σ²2 = 3.64

    From (ii) and (iv)

    ρ (σ1/σ2) × ρ (σ2/σ1) = (−0.15) × (−0.6)  →  ρ² = 0.09  →  ρ = ±0.3, and since (ii) and (iv) are negative, ρ = −0.3

    From (v): σ²2 = 3.64/(1 − ρ²) = 4

    From (ii): σ²1 = (−0.15 σ2/ρ)² = 1

    From (i) and (iii): µ1 + 0.15 µ2 = 3.7 and 0.6 µ1 + µ2 = 0.4, so

    µ1 = 4    µ2 = −2

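The solution can be verified by substituting it back into the conditional-moment formulas and confirming the given intercepts, slopes, and conditional variance are reproduced:

```python
# Solution from the example.
mu1, mu2, sd1, sd2, rho = 4.0, -2.0, 1.0, 2.0, -0.3

# E(X1|X2) = intercept12 + slope12 * X2, and similarly for E(X2|X1).
intercept12 = mu1 - rho * sd1 / sd2 * mu2
slope12 = rho * sd1 / sd2
intercept21 = mu2 - rho * sd2 / sd1 * mu1
slope21 = rho * sd2 / sd1
var_cond = (1 - rho ** 2) * sd2 ** 2

print(round(intercept12, 2), round(slope12, 2))  # 3.7 -0.15
print(round(intercept21, 2), round(slope21, 2))  # 0.4 -0.6
print(round(var_cond, 2))                        # 3.64
```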