
CL202: Introduction to Data Analysis

MB+SCP

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India - 400076

mbhushan,[email protected]

Spring 2016

MB+SCP (IIT Bombay) CL202 Spring 2016 1 / 45

This handout

Multiple Random Variables

Joint, marginal, conditional distribution and density functions

Independence


Extension of Ideas:

Multiple (Multivariate) Random Variables: jointly distributed random variables.

An event ω occurs in the sample space S. Associate many random variables X1, X2, ..., Xn with ω.

ω = ω1 × ω2 × ... × ωn

Each random variable is a valid mapping from S to R.


Bivariate Random Variables

For simplicity of notation, consider two random variables X, Y.

Special case of multiple random variables.

Examples:
- Average number of cigarettes smoked daily and the age at which an individual gets cancer,
- Height and weight of an individual,
- Height and IQ of an individual,
- Flow-rate and pressure drop of a liquid flowing through a pipe.


Jointly distributed random variables

Often interested in answering questions on X, Y taking values in a specified region D in R² (the xy plane).

The distribution functions FX(x) and FY(y) of X and Y determine their individual probabilities but not their joint probabilities. In particular, the probability of the event

{X ≤ x} ∩ {Y ≤ y} = {X ≤ x, Y ≤ y}

cannot be expressed in terms of FX(x) and FY(y).

Joint probabilities of X, Y are completely determined if the probability of the above event is known for every x and y.


Joint Probability Distribution Function or Joint Cumulative Distribution Function

For random variables (discrete or continuous) X, Y, the joint (bivariate) probability distribution function is:

FX,Y(x, y) = P{X ≤ x, Y ≤ y}

where x, y are two arbitrary real numbers. Often, the subscript X,Y is omitted.


Properties of the Joint Probability Distribution Function (Papoulis and Pillai, 2002)

1. F(−∞, y) = F(x, −∞) = 0, F(∞, ∞) = 1.

2. P(x1 < X ≤ x2, Y ≤ y) = F(x2, y) − F(x1, y)
   P(X ≤ x, y1 < Y ≤ y2) = F(x, y2) − F(x, y1)

3. P(x1 < X ≤ x2, y1 < Y ≤ y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1)


Joint Density Function

The joint density of X and Y is by definition the function

f(x, y) = ∂²F(x, y) / ∂x∂y

It follows that

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(ξ, ρ) dρ dξ

P((X, Y) ∈ D) = ∫∫_{D} f(x, y) dx dy

In particular, as ∆x → 0 and ∆y → 0,

P(x < X ≤ x + ∆x, y < Y ≤ y + ∆y) ≈ f(x, y) ∆x ∆y

Also, ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1, and f(x, y) ≥ 0 for all x, y ∈ R.


Joint Density Example: Bivariate Gaussian Random Variable

f(x, y) = α exp(−0.5 (ξ − µ)ᵀ P⁻¹ (ξ − µ))

with

ξ = [x, y]ᵀ, µ = [1, −1]ᵀ, P = [0.9 0.4; 0.4 0.3], α = 1 / (2π √|P|)
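As a numerical sanity check (my own sketch, not part of the slides), the density above can be evaluated in pure Python and summed over a wide grid; the integral should come out to 1. The grid limits and step size below are arbitrary choices wide enough to capture essentially all the probability mass.

```python
import math

# Bivariate Gaussian of the example: mu = (1, -1), P = [[0.9, 0.4], [0.4, 0.3]]
detP = 0.9 * 0.3 - 0.4 * 0.4                  # |P| = 0.11
Pinv = [[0.3 / detP, -0.4 / detP],            # P^-1 via the 2x2 adjugate formula
        [-0.4 / detP, 0.9 / detP]]
alpha = 1.0 / (2 * math.pi * math.sqrt(detP))

def f(x, y):
    """f(x, y) = alpha * exp(-0.5 (xi - mu)^T P^-1 (xi - mu))."""
    dx, dy = x - 1.0, y + 1.0
    quad = Pinv[0][0] * dx * dx + 2 * Pinv[0][1] * dx * dy + Pinv[1][1] * dy * dy
    return alpha * math.exp(-0.5 * quad)

# Riemann sum over a box wide enough to capture essentially all the mass
h = 0.05
total = sum(f(-4 + i * h, -6 + j * h) * h * h
            for i in range(200) for j in range(200))
print(round(total, 3))   # ~1.0
```

Note also that the peak value of the density, at the mean, is exactly α.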


Joint Density Visualization


Joint Distribution Visualization


Marginal Distribution or Density Functions of Individual Random Variables

Marginal Probability Distribution Functions FX(x), FY(y):
- Extract FX(x) from F(x, y) as:

  FX(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞)

- Similarly, extract FY(y) as:

  FY(y) = P(Y ≤ y) = P(X < ∞, Y ≤ y) = F(∞, y)

Marginal Probability Density Functions fX(x), fY(y):
- Extract these from f(x, y) as:

  fX(x) = ∫_{−∞}^{∞} f(x, y) dy

  fY(y) = ∫_{−∞}^{∞} f(x, y) dx
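The marginalization integral can be checked numerically for the earlier bivariate Gaussian example (an illustrative sketch with assumed grid parameters, not slide content): integrating f(x, y) over y at x = 1 should reproduce the univariate Gaussian density with mean 1 and variance 0.9.

```python
import math

# Bivariate Gaussian of the earlier example
detP = 0.9 * 0.3 - 0.4 * 0.4
Pinv = [[0.3 / detP, -0.4 / detP], [-0.4 / detP, 0.9 / detP]]
alpha = 1.0 / (2 * math.pi * math.sqrt(detP))

def f(x, y):
    dx, dy = x - 1.0, y + 1.0
    quad = Pinv[0][0] * dx * dx + 2 * Pinv[0][1] * dx * dy + Pinv[1][1] * dy * dy
    return alpha * math.exp(-0.5 * quad)

# Midpoint-rule integration of f(1, y) over y in [-7, 5]
h = 0.001
fX_at_1 = sum(f(1.0, -7 + (j + 0.5) * h) * h for j in range(12_000))
expected = 1.0 / math.sqrt(2 * math.pi * 0.9)   # N(1, 0.9) density at x = 1
print(round(fX_at_1, 4), round(expected, 4))    # both ~0.4205
```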


Marginal Probability Density

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

Makes sense, since

P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, ∞))
         = ∫_{A} ∫_{−∞}^{∞} f(x, y) dy dx
         = ∫_{A} fX(x) dx

where fX(x) is as defined above. Similarly,

fY(y) = ∫_{−∞}^{∞} f(x, y) dx


Example 4.3c from Ross

f(x, y) = 2 e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞; 0 otherwise.

Compute: (a) P(X > 1, Y < 1), (b) P(X < Y), (c) P(X < a).

P(X > 1, Y < 1) = ∫_{0}^{1} ∫_{1}^{∞} 2 e^{−x} e^{−2y} dx dy
               = ∫_{0}^{1} 2 e^{−2y} (−e^{−x} |_{1}^{∞}) dy
               = e^{−1} ∫_{0}^{1} 2 e^{−2y} dy = e^{−1} (1 − e^{−2})

P(X < Y) = ∫_{0}^{∞} ∫_{0}^{y} 2 e^{−x} e^{−2y} dx dy = 1/3

P(X < a) = ∫_{0}^{a} ∫_{0}^{∞} 2 e^{−x} e^{−2y} dy dx = 1 − e^{−a}
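These answers can be cross-checked by Monte Carlo (an illustrative sketch, not from the slides): the joint density factorizes as e^{−x} · 2e^{−2y}, so X and Y can be sampled as independent exponentials with rates 1 and 2.

```python
import math
import random

# Monte Carlo cross-check of Example 4.3c
random.seed(0)
n = 200_000
hits_a = hits_b = 0
for _ in range(n):
    x = random.expovariate(1.0)    # rate-1 exponential
    y = random.expovariate(2.0)    # rate-2 exponential
    hits_a += (x > 1 and y < 1)
    hits_b += (x < y)

p_a = hits_a / n    # ~ e^{-1}(1 - e^{-2}) ≈ 0.318
p_b = hits_b / n    # ~ 1/3
print(round(p_a, 3), round(p_b, 3))
```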


Joint Density Visualization: Exponential


Joint Distribution Visualization: Exponential


Joint Probability Mass Function (PMF)

Given two discrete random variables X and Y in the same experiment, the joint PMF of X and Y is

p(xi, yj) = P(X = xi, Y = yj)

for all pairs of values (xi, yj) that X and Y can take. p(xi, yj) is also denoted pX,Y(xi, yj).

The marginal probability mass functions for X and Y are

pX(x) = P(X = x) = Σ_y pX,Y(x, y)

pY(y) = P(Y = y) = Σ_x pX,Y(x, y)


Computation of Marginal PMF from Joint PMF

Formally:

{X = xi} = ⋃_j {X = xi, Y = yj}

All events on the RHS are mutually exclusive. Thus,

pX(xi) = P(X = xi) = Σ_j P(X = xi, Y = yj) = Σ_j p(xi, yj)

Similarly,

pY(yj) = P(Y = yj) = Σ_i p(xi, yj).

Note: P(X = xi, Y = yj) cannot, in general, be constructed from knowledge of P(X = xi) and P(Y = yj).


Example: 4.3a, Ross

3 batteries are randomly chosen from a group of 3 new, 4 used but still working, and 5 defective batteries. Let X, Y denote the number of new, and used but working, batteries that are chosen, respectively. Find p(xi, yj) = P(X = xi, Y = yj).

Solution: Let T = ¹²C₃ = 220.

p(0, 0) = ⁵C₃ / T
p(0, 1) = (⁴C₁)(⁵C₂) / T
p(0, 2) = (⁴C₂)(⁵C₁) / T
p(0, 3) = ⁴C₃ / T
p(1, 0) = (³C₁)(⁵C₂) / T
p(1, 1) = (³C₁)(⁴C₁)(⁵C₁) / T
p(1, 2) = ...
p(2, 0) = ...
p(2, 1) = ...
p(3, 0) = ...


Tabular Form

          j = 0     j = 1     j = 2    j = 3    Row sum (P(X = i))
i = 0     10/220    40/220    30/220   4/220    84/220
i = 1     30/220    60/220    18/220   0        108/220
i = 2     15/220    12/220    0        0        27/220
i = 3     1/220     0         0        0        1/220
Col sum   56/220    112/220   48/220   4/220
(P(Y = j))

i represents the row and j the column. Both the row sums and the column sums add up to 1. The marginal probabilities appear in the margins of the table.
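The whole table can be reconstructed with exact arithmetic (an illustrative sketch; the variable names are mine, not from the slides), which also verifies the marginal sums.

```python
from fractions import Fraction
from math import comb

# Joint PMF of Example 4.3a: choose 3 from 3 new, 4 used, 5 defective
T = comb(12, 3)                                   # 220 equally likely selections
p = {}
for i in range(4):                                # i new batteries chosen
    for j in range(4 - i):                        # j used-but-working batteries
        k = 3 - i - j                             # remaining picks are defective
        p[i, j] = Fraction(comb(3, i) * comb(4, j) * comb(5, k), T)

# Marginals by summing rows/columns, exactly as in the table
pX = {i: sum(v for (a, _), v in p.items() if a == i) for i in range(4)}
pY = {j: sum(v for (_, b), v in p.items() if b == j) for j in range(4)}
print(p[0, 0], pX[0], pY[1])   # 1/22 21/55 28/55
```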


n Random Variables

The joint cumulative probability distribution function F(x1, x2, ..., xn) of n random variables X1, X2, ..., Xn is defined as:

F(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)

If the random variables are discrete: joint probability mass function

p(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)

If the random variables are continuous: joint probability density function f(x1, x2, ..., xn) such that for any set C in n-dimensional space

P((X1, X2, ..., Xn) ∈ C) = ∫∫...∫_{(x1,...,xn)∈C} f(x1, x2, ..., xn) dx1 dx2 ... dxn

where

f(x1, x2, ..., xn) = ∂ⁿF(x1, x2, ..., xn) / (∂x1 ∂x2 ... ∂xn)


Obtaining Marginals

FX1(x1) = F(x1, ∞, ∞, ..., ∞)

fX1(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, x2, ..., xn) dx2 dx3 ... dxn

pX1(x1) = Σ_{x2} Σ_{x3} ... Σ_{xn} p(x1, x2, ..., xn)


Independence of Random Variables

Random variables X and Y are independent if for any two sets of real numbers A and B:

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)

i.e., the events EA = {X ∈ A} and EB = {Y ∈ B} are independent.

Example: height and IQ of an individual.

In particular, P(X ≤ a, Y ≤ b) = P(X ≤ a) P(Y ≤ b), or, in terms of the joint cumulative distribution function F of X and Y:

F(a, b) = FX(a) FY(b) for all a, b ∈ R

Random variables that are not independent are said to be dependent.


Independence: Probability Mass and Density Functions

Random variables X, Y are independent if:

Discrete random variables (probability mass function):

p(xi, yj) = pX(xi) pY(yj) for all xi, yj

Continuous random variables (probability density function):

f(x, y) = fX(x) fY(y) for all x, y


Independence: Equivalent Statements

1) P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all sets A, B in R
2) F(x, y) = FX(x) FY(y) for all x, y
3) f(x, y) = fX(x) fY(y) for all x, y (continuous RVs)
4) p(xi, yj) = pX(xi) pY(yj) for all xi, yj (discrete RVs)


Example 5.2 (Ogunnaike, 2009)

The reliability of the temperature control system for a commercial, highly exothermic polymer reactor is known to depend on the lifetimes (in years) of the control hardware electronics, X1, and of the control valve on the cooling water line, X2. If one component fails, the entire control system fails. The random phenomenon in question is characterized by the two-dimensional random variable (X1, X2) whose joint probability distribution is given as:

f(x1, x2) = (1/50) e^{−(0.2 x1 + 0.1 x2)} for 0 < x1 < ∞, 0 < x2 < ∞; 0 otherwise.

1. Establish that the above is a legitimate joint probability density function.
   To show: ∫_{0}^{∞} ∫_{0}^{∞} f(x1, x2) dx1 dx2 = 1.

   ∫_{0}^{∞} ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 dx2 = (1/50)(−5 e^{−0.2 x1} |_{0}^{∞})(−10 e^{−0.1 x2} |_{0}^{∞}) = 1


Example (Continued)

1. What is the probability of the system lasting more than 2 years?
   To find: P(X1 > 2, X2 > 2) = ∫_{2}^{∞} ∫_{2}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 dx2 = e^{−0.6} ≈ 0.549.

2. Find the marginal density function of X1.

   fX1(x1) = ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx2 = (1/5) e^{−0.2 x1}

3. Find the marginal density function of X2.

   fX2(x2) = ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 = (1/10) e^{−0.1 x2}

4. Are X1, X2 independent? Yes, since f(x1, x2) = fX1(x1) fX2(x2).
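The factorization claim and the system-lifetime probability are easy to confirm numerically (a sketch under the slide's formulas; the lambda names are my own):

```python
import math

# Quick consistency check of Example 5.2
f = lambda x1, x2: (1 / 50) * math.exp(-(0.2 * x1 + 0.1 * x2))
fX1 = lambda x1: (1 / 5) * math.exp(-0.2 * x1)    # marginal density of X1
fX2 = lambda x2: (1 / 10) * math.exp(-0.1 * x2)   # marginal density of X2

# Independence: the joint equals the product of marginals at sample points
ok = all(abs(f(a, b) - fX1(a) * fX2(b)) < 1e-12
         for a in (0.5, 1, 3, 7) for b in (0.2, 2, 5, 9))

# P(X1 > 2, X2 > 2) = e^{-0.4} * e^{-0.2} = e^{-0.6}
p_sys = math.exp(-0.4) * math.exp(-0.2)
print(ok, round(p_sys, 3))   # True 0.549
```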


Independence of n Random Variables

Random variables X1, X2, ..., Xn are said to be independent if, for all sets of real numbers A1, A2, ..., An:

P(X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An) = ∏_{i=1}^{n} P(Xi ∈ Ai)

In particular, for all a1, a2, ..., an ∈ R:

P(X1 ≤ a1, X2 ≤ a2, ..., Xn ≤ an) = ∏_{i=1}^{n} P(Xi ≤ ai), or

F(a1, a2, ..., an) = ∏_{i=1}^{n} FXi(ai)

For discrete random variables, the probability mass function factorizes:

p(x1, x2, ..., xn) = pX1(x1) pX2(x2) ... pXn(xn)

For continuous random variables, the probability density function factorizes:

f(x1, x2, ..., xn) = fX1(x1) fX2(x2) ... fXn(xn)


Independent, Repeated Trials

In statistics, one usually does not consider just a single experiment; rather, the same experiment is performed several times.

Associate a separate random variable with each of those experimental outcomes.

If the experiments are independent of each other, then we get a set of independent random variables.

Example: tossing a coin n times. The random variable Xi is the outcome (0 or 1) of the i-th toss.


Independent and Identically Distributed (IID) Variables

A collection of random variables is said to be IID if
- the variables are independent, and
- the variables have the same probability distribution.

Example 1: Tossing a coin n times. The probability of obtaining a head in a single toss does not vary, and all the tosses are independent.
- Each toss leads to a random variable with the same probability distribution function. The random variables are also independent. Thus, IID.

Example 2: Measuring the temperature of a beaker at n time instances in the day. The true water temperature changes throughout the day. The sensor is noisy.
- Each sensor reading leads to a random variable.
- The variables are independent but not identically distributed.
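Example 1 can be simulated directly (my own sketch, not slide content): each toss is a Bernoulli(1/2) variable, and the sample mean of an IID sequence settles near 1/2.

```python
import random

# IID illustration: n coin tosses as independent Bernoulli(1/2) variables
random.seed(1)
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]   # X_1, ..., X_n
mean = sum(tosses) / n
print(round(mean, 2))   # ~0.5
```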


Conditional Distributions

Recall that for two events A and B, the conditional probability of A given B is:

P(A | B) = P(A, B) / P(B)

for P(B) > 0.


Conditional Probability Mass Function

For X, Y discrete random variables, define the conditional probability mass function of X given Y = y by

pX|Y(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = p(x, y) / pY(y)

for pY(y) > 0.


Examples 4.3b,f from Ross

Question: In a community, 15% of families have no children, 20% have 1, 35% have 2 and 30% have 3 children. Each child is equally likely to be a boy or a girl. We choose a family at random. Given that the chosen family has one girl, compute the probability mass function of the number of boys in the family.

G: number of girls, B: number of boys, C: number of children.
To find: P(B = i | G = 1), i = 0, 1, 2, 3.

P(B = i | G = 1) = P(B = i, G = 1) / P(G = 1), i = 0, 1, 2, 3

First find P(G = 1):

{G = 1} = {G = 1} ∩ ({C = 0} ∪ {C = 1} ∪ {C = 2} ∪ {C = 3})

P(G = 1) = P(G = 1, C = 0) + P(G = 1, C = 1) + P(G = 1, C = 2) + P(G = 1, C = 3)

since C = 0, C = 1, C = 2, C = 3 are mutually exclusive events whose union is S. Then,

P(G = 1) = P(G = 1 | C = 0) P(C = 0) + P(G = 1 | C = 1) P(C = 1) + ...
         = 0 + (1/2) × 0.2 + ... = 0.3875


Example continued

Then,

P(B = 0 | G = 1) = P(B = 0, G = 1) / P(G = 1)

Numerator = P(G = 1 and C = 1) = P(G = 1 | C = 1) P(C = 1) = (1/2)(0.2) = 0.1. Then,

P(B = 0 | G = 1) = 0.1 / 0.3875 = 8/31

Similarly: P(B = 1 | G = 1) = 14/31, P(B = 2 | G = 1) = 9/31, P(B = 3 | G = 1) = 0.
Check: the conditional probabilities sum to 1.
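The whole calculation can be reproduced with exact fractions by conditioning on the number of children (an illustrative sketch; the helper names are mine, not from the slides):

```python
from fractions import Fraction
from math import comb

# Distribution of the number of children C in the community
pC = {0: Fraction(15, 100), 1: Fraction(20, 100),
      2: Fraction(35, 100), 3: Fraction(30, 100)}

def p_joint(g, b):
    """P(G = g, B = b): with C = g + b children, girls are Binomial(C, 1/2)."""
    c = g + b
    return pC.get(c, Fraction(0)) * comb(c, g) * Fraction(1, 2) ** c

pG1 = sum(p_joint(1, b) for b in range(4))       # P(G = 1) = 31/80 = 0.3875
cond = [p_joint(1, b) / pG1 for b in range(4)]   # 8/31, 14/31, 9/31, 0
print(pG1, sum(cond))                            # conditionals sum to 1
```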


Conditional Probability Density Function

For random variables X, Y, the conditional probability density of X given that Y = y is defined as:

fX|Y(x | y) = f(x, y) / fY(y)

for fY(y) > 0. Hence, we can make statements on the probability of X taking values in some set A given the value obtained by Y:

P(X ∈ A | Y = y) = ∫_{A} fX|Y(x | y) dx


Independence and Conditional Probabilities

If X, Y are independent, then

pX|Y(x | y) = pX(x)

fX|Y(x | y) = fX(x)


Temperature Control Example (Continued): Example 5.2 (Ogunnaike, 2009)

1. Find the conditional density function fX1|X2(x1 | x2):

   fX1|X2(x1 | x2) = f(x1, x2) / fX2(x2) = (1/5) e^{−0.2 x1}

   which is the same as fX1(x1) in this example.

2. Similarly, fX2|X1(x2 | x1) = fX2(x2) in this example.

Generic question: if fX1|X2(x1 | x2) = fX1(x1), is fX2|X1(x2 | x1) = fX2(x2)?
Answer: Yes.


Example 5.5 (Ogunnaike, 2009)

fX1,X2(x1, x2) = x1 − x2 for 1 < x1 < 2, 0 < x2 < 1; 0 otherwise.

Find: the conditional probability densities.
Answer: First compute the marginals:

fX1(x1) = x1 − 0.5 for 1 < x1 < 2; 0 otherwise

fX2(x2) = 1.5 − x2 for 0 < x2 < 1; 0 otherwise

Then compute the conditionals:

fX1|X2(x1 | x2) = (x1 − x2) / (1.5 − x2), 1 < x1 < 2

fX2|X1(x2 | x1) = (x1 − x2) / (x1 − 0.5), 0 < x2 < 1

The random variables X1, X2 are not independent.
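A conditional density must integrate to 1 over its support for every fixed value of the conditioning variable; a small numeric check of this for fX1|X2 (my own sketch, with arbitrarily chosen x2 values):

```python
# Normalization check for the conditionals of Example 5.5
def f_x1_given_x2(x1, x2):
    return (x1 - x2) / (1.5 - x2)    # valid for 1 < x1 < 2 and fixed 0 < x2 < 1

h = 1e-4
for x2 in (0.1, 0.5, 0.9):
    total = sum(f_x1_given_x2(1 + (i + 0.5) * h, x2) * h for i in range(10_000))
    assert abs(total - 1.0) < 1e-6   # midpoint rule is exact for linear integrands
print("conditional densities normalize")
```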

Plots


Independence of Transformations

If random variables X, Y are independent, then the random variables

Z = g(X), U = h(Y)

are also independent.

Proof: Let Az denote the set of points on the x-axis such that g(x) ≤ z, and Bu the set of points on the y-axis such that h(y) ≤ u. Then,

{Z ≤ z} = {X ∈ Az}; {U ≤ u} = {Y ∈ Bu}

Thus, the events {Z ≤ z} and {U ≤ u} are independent because the events {X ∈ Az} and {Y ∈ Bu} are independent.


Expected Value

By analogy with transformation of a single RV, the expected value of a transformation of multiple RVs can be defined as:

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy

For discrete RVs, the above becomes

E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)


Special Cases

g(X, Y) = X + Y. Then,

E[g(X, Y)] = E[X] + E[Y]

g(X, Y) = (X − E[X])(Y − E[Y]): the covariance of X, Y, labeled Cov(X, Y).

Correlation coefficient:

ρ = Cov(X, Y) / (σX σY)

Property: ρ is dimensionless and −1 ≤ ρ ≤ 1.
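For a concrete value, the earlier bivariate Gaussian example can be reused (a sketch on my part: reading Cov(X, Y) = 0.4 off the off-diagonal of P and the variances 0.9, 0.3 off its diagonal):

```python
import math

# Correlation coefficient of the earlier bivariate Gaussian example
cov_xy = 0.4
sigma_x, sigma_y = math.sqrt(0.9), math.sqrt(0.3)
rho = cov_xy / (sigma_x * sigma_y)
print(round(rho, 3))   # 0.77
```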


Independence versus Covariance

If X, Y are independent, then

Cov(X, Y) = 0

Independence ⟹ covariance = 0 (the variables are uncorrelated).

Covariance = 0 does NOT imply independence.
Example: (X, Y) takes the values (0, 1), (−1, 0), (0, −1), (1, 0) with equal probability (1/4). Cov(X, Y) = 0, but X, Y are not independent.
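The counterexample can be verified directly with exact arithmetic (an illustrative sketch of the four-point distribution above):

```python
from fractions import Fraction

# The four equally likely points: Cov = 0 yet the variables are dependent
pts = [(0, 1), (-1, 0), (0, -1), (1, 0)]
p = Fraction(1, 4)                           # each point equally likely

EX = sum(x * p for x, _ in pts)              # 0
EY = sum(y * p for _, y in pts)              # 0
EXY = sum(x * y * p for x, y in pts)         # 0, since x*y = 0 at every point
cov = EXY - EX * EY
print(cov)                                   # 0

# Dependence: P(X = 0, Y = 0) = 0, but P(X = 0) P(Y = 0) = 1/4
pX0 = sum(p for x, _ in pts if x == 0)       # 1/2
pY0 = sum(p for _, y in pts if y == 0)       # 1/2
print(pX0 * pY0)                             # 1/4
```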


Independence Implications

g(X, Y) = XY:
- If X, Y are independent,

  E[XY] = E[X] E[Y]

g(X, Y) = h(X) l(Y):
- If X, Y are independent,

  E[h(X) l(Y)] = E[h(X)] E[l(Y)]
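The product rule E[XY] = E[X] E[Y] can be illustrated by Monte Carlo with the independent exponentials of Example 4.3c (my own sketch: E[X] = 1, E[Y] = 1/2, so E[XY] = 1/2):

```python
import random

# Monte Carlo illustration of E[XY] = E[X] E[Y] for independent X, Y
random.seed(2)
n = 500_000
sx = sy = sxy = 0.0
for _ in range(n):
    x = random.expovariate(1.0)   # E[X] = 1
    y = random.expovariate(2.0)   # E[Y] = 1/2
    sx += x
    sy += y
    sxy += x * y
print(round(sxy / n, 2), round(sx / n * (sy / n), 2))   # both ~0.5
```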


THANK YOU
