
CL202: Introduction to Data Analysis

MB+SCP

Mani Bhushan, Sachin Patwardhan
Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India - 400076

mbhushan,[email protected]

Spring 2016

MB+SCP (IIT Bombay) CL202 Spring 2016 1 / 45

This handout

Multiple Random Variables

Joint, marginal, conditional distribution and density functions

Independence


Extension of Ideas:

Multiple (Multivariate) Random Variables: jointly distributed random variables.

An event ω occurs in the sample space S. Associate many random variables X1, X2, ..., Xn with ω.

ω = ω1 × ω2 × ... × ωn

Each random variable is a valid mapping from S to R.


Bivariate Random Variables

For simplicity of notation, consider two random variables X, Y.

Special case of multiple random variables.

Examples:
- Average number of cigarettes smoked daily and the age at which an individual gets cancer,
- Height and weight of an individual,
- Height and IQ of an individual,
- Flow-rate and pressure drop of a liquid flowing through a pipe.


Jointly distributed random variables

Often interested in answering questions on X, Y taking values in a specified region D in R² (the xy plane).

The distribution functions FX(x) and FY(y) of X and Y determine their individual probabilities but not their joint probabilities. In particular, the probability of the event

{X ≤ x} ∩ {Y ≤ y} = {X ≤ x, Y ≤ y}

cannot be expressed in terms of FX(x) and FY(y).

Joint probabilities of X, Y are completely determined if the probability of the above event is known for every x and y.


Joint Probability Distribution Function or Joint Cumulative Distribution Function

For random variables (discrete or continuous) X, Y, the joint (bivariate) probability distribution function is:

FX,Y(x, y) = P{X ≤ x, Y ≤ y}

where x, y are two arbitrary real numbers. Often, the subscript X,Y is omitted.


Properties of the Joint Probability Distribution Function (Papoulis and Pillai, 2002)

1. F(−∞, y) = F(x, −∞) = 0, F(∞, ∞) = 1.

2. P(x1 < X ≤ x2, Y ≤ y) = F(x2, y) − F(x1, y)
   P(X ≤ x, y1 < Y ≤ y2) = F(x, y2) − F(x, y1)

3. P(x1 < X ≤ x2, y1 < Y ≤ y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1)


Joint Density Function

The joint density of X and Y is by definition the function

f(x, y) = ∂²F(x, y) / ∂x∂y

It follows that

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(ξ, ρ) dρ dξ

P((X, Y) ∈ D) = ∫∫_{D} f(x, y) dx dy

In particular, as ∆x → 0 and ∆y → 0,

P(x < X ≤ x + ∆x, y < Y ≤ y + ∆y) ≈ f(x, y) ∆x ∆y

Also, ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1, and f(x, y) ≥ 0 for all x, y ∈ R.


Joint Density Example: Bivariate Gaussian Random Variable

f(x, y) = α exp(−0.5 (ξ − µ)ᵀ P⁻¹ (ξ − µ))

with

ξ = [x, y]ᵀ, µ = [1, −1]ᵀ, P = [0.9 0.4; 0.4 0.3], α = 1 / (2π √|P|)
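As a numerical sanity check (my own sketch, not part of the slides), the density above can be evaluated in pure Python and summed over a wide grid; the integral should come out to 1. The grid limits and step size below are arbitrary choices wide enough to capture essentially all the probability mass.

```python
import math

# Bivariate Gaussian of the example: mu = (1, -1), P = [[0.9, 0.4], [0.4, 0.3]]
detP = 0.9 * 0.3 - 0.4 * 0.4                  # |P| = 0.11
Pinv = [[0.3 / detP, -0.4 / detP],            # P^-1 via the 2x2 adjugate formula
        [-0.4 / detP, 0.9 / detP]]
alpha = 1.0 / (2 * math.pi * math.sqrt(detP))

def f(x, y):
    """f(x, y) = alpha * exp(-0.5 (xi - mu)^T P^-1 (xi - mu))."""
    dx, dy = x - 1.0, y + 1.0
    quad = Pinv[0][0] * dx * dx + 2 * Pinv[0][1] * dx * dy + Pinv[1][1] * dy * dy
    return alpha * math.exp(-0.5 * quad)

# Riemann sum over a box wide enough to capture essentially all the mass
h = 0.05
total = sum(f(-4 + i * h, -6 + j * h) * h * h
            for i in range(200) for j in range(200))
print(round(total, 3))   # ~1.0
```

Note also that the peak value of the density, at the mean, is exactly α.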


Joint Density Visualization


Joint Distribution Visualization


Marginal Distribution or Density Functions of Individual Random Variables

Marginal Probability Distribution Functions FX(x), FY(y):
- Extract FX(x) from F(x, y) as:

  FX(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞)

- Similarly, extract FY(y) as:

  FY(y) = P(Y ≤ y) = P(X < ∞, Y ≤ y) = F(∞, y)

Marginal Probability Density Functions fX(x), fY(y):
- Extract these from f(x, y) as:

  fX(x) = ∫_{−∞}^{∞} f(x, y) dy

  fY(y) = ∫_{−∞}^{∞} f(x, y) dx
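The marginalization integral can be checked numerically for the earlier bivariate Gaussian example (an illustrative sketch with assumed grid parameters, not slide content): integrating f(x, y) over y at x = 1 should reproduce the univariate Gaussian density with mean 1 and variance 0.9.

```python
import math

# Bivariate Gaussian of the earlier example
detP = 0.9 * 0.3 - 0.4 * 0.4
Pinv = [[0.3 / detP, -0.4 / detP], [-0.4 / detP, 0.9 / detP]]
alpha = 1.0 / (2 * math.pi * math.sqrt(detP))

def f(x, y):
    dx, dy = x - 1.0, y + 1.0
    quad = Pinv[0][0] * dx * dx + 2 * Pinv[0][1] * dx * dy + Pinv[1][1] * dy * dy
    return alpha * math.exp(-0.5 * quad)

# Midpoint-rule integration of f(1, y) over y in [-7, 5]
h = 0.001
fX_at_1 = sum(f(1.0, -7 + (j + 0.5) * h) * h for j in range(12_000))
expected = 1.0 / math.sqrt(2 * math.pi * 0.9)   # N(1, 0.9) density at x = 1
print(round(fX_at_1, 4), round(expected, 4))    # both ~0.4205
```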


Marginal Probability Density

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

Makes sense, since

P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, ∞))
         = ∫_{A} ∫_{−∞}^{∞} f(x, y) dy dx
         = ∫_{A} fX(x) dx

where fX(x) is as defined above. Similarly,

fY(y) = ∫_{−∞}^{∞} f(x, y) dx


Example 4.3c from Ross

f(x, y) = 2 e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞; 0 otherwise.

Compute: (a) P(X > 1, Y < 1), (b) P(X < Y), (c) P(X < a).

P(X > 1, Y < 1) = ∫_{0}^{1} ∫_{1}^{∞} 2 e^{−x} e^{−2y} dx dy
               = ∫_{0}^{1} 2 e^{−2y} (−e^{−x} |_{1}^{∞}) dy
               = e^{−1} ∫_{0}^{1} 2 e^{−2y} dy = e^{−1} (1 − e^{−2})

P(X < Y) = ∫_{0}^{∞} ∫_{0}^{y} 2 e^{−x} e^{−2y} dx dy = 1/3

P(X < a) = ∫_{0}^{a} ∫_{0}^{∞} 2 e^{−x} e^{−2y} dy dx = 1 − e^{−a}
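These answers can be cross-checked by Monte Carlo (an illustrative sketch, not from the slides): the joint density factorizes as e^{−x} · 2e^{−2y}, so X and Y can be sampled as independent exponentials with rates 1 and 2.

```python
import math
import random

# Monte Carlo cross-check of Example 4.3c
random.seed(0)
n = 200_000
hits_a = hits_b = 0
for _ in range(n):
    x = random.expovariate(1.0)    # rate-1 exponential
    y = random.expovariate(2.0)    # rate-2 exponential
    hits_a += (x > 1 and y < 1)
    hits_b += (x < y)

p_a = hits_a / n    # ~ e^{-1}(1 - e^{-2}) ≈ 0.318
p_b = hits_b / n    # ~ 1/3
print(round(p_a, 3), round(p_b, 3))
```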


Joint Density Visualization: Exponential


Joint Distribution Visualization: Exponential


Joint Probability Mass Function (PMF)

Given two discrete random variables X and Y in the same experiment, the joint PMF of X and Y is

p(xi, yj) = P(X = xi, Y = yj)

for all pairs of values (xi, yj) that X and Y can take. p(xi, yj) is also denoted pX,Y(xi, yj).

The marginal probability mass functions for X and Y are

pX(x) = P(X = x) = Σ_y pX,Y(x, y)

pY(y) = P(Y = y) = Σ_x pX,Y(x, y)


Computation of Marginal PMF from Joint PMF

Formally:

{X = xi} = ⋃_j {X = xi, Y = yj}

All events on the RHS are mutually exclusive. Thus,

pX(xi) = P(X = xi) = Σ_j P(X = xi, Y = yj) = Σ_j p(xi, yj)

Similarly,

pY(yj) = P(Y = yj) = Σ_i p(xi, yj).

Note: P(X = xi, Y = yj) cannot, in general, be constructed from knowledge of P(X = xi) and P(Y = yj).


Example: 4.3a, Ross

3 batteries are randomly chosen from a group of 3 new, 4 used but still working, and 5 defective batteries. Let X, Y denote the number of new, and used but working, batteries that are chosen, respectively. Find p(xi, yj) = P(X = xi, Y = yj).

Solution: Let T = ¹²C₃ = 220.

p(0, 0) = ⁵C₃ / T
p(0, 1) = (⁴C₁)(⁵C₂) / T
p(0, 2) = (⁴C₂)(⁵C₁) / T
p(0, 3) = ⁴C₃ / T
p(1, 0) = (³C₁)(⁵C₂) / T
p(1, 1) = (³C₁)(⁴C₁)(⁵C₁) / T
p(1, 2) = ...
p(2, 0) = ...
p(2, 1) = ...
p(3, 0) = ...


Tabular Form

          j = 0     j = 1     j = 2    j = 3    Row sum (P(X = i))
i = 0     10/220    40/220    30/220   4/220    84/220
i = 1     30/220    60/220    18/220   0        108/220
i = 2     15/220    12/220    0        0        27/220
i = 3     1/220     0         0        0        1/220
Col sum   56/220    112/220   48/220   4/220
(P(Y = j))

i represents the row and j the column. Both the row sums and the column sums add up to 1. The marginal probabilities appear in the margins of the table.
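The whole table can be reconstructed with exact arithmetic (an illustrative sketch; the variable names are mine, not from the slides), which also verifies the marginal sums.

```python
from fractions import Fraction
from math import comb

# Joint PMF of Example 4.3a: choose 3 from 3 new, 4 used, 5 defective
T = comb(12, 3)                                   # 220 equally likely selections
p = {}
for i in range(4):                                # i new batteries chosen
    for j in range(4 - i):                        # j used-but-working batteries
        k = 3 - i - j                             # remaining picks are defective
        p[i, j] = Fraction(comb(3, i) * comb(4, j) * comb(5, k), T)

# Marginals by summing rows/columns, exactly as in the table
pX = {i: sum(v for (a, _), v in p.items() if a == i) for i in range(4)}
pY = {j: sum(v for (_, b), v in p.items() if b == j) for j in range(4)}
print(p[0, 0], pX[0], pY[1])   # 1/22 21/55 28/55
```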


n Random Variables

The joint cumulative probability distribution function F(x1, x2, ..., xn) of n random variables X1, X2, ..., Xn is defined as:

F(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)

If the random variables are discrete: joint probability mass function

p(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn)

If the random variables are continuous: joint probability density function f(x1, x2, ..., xn) such that for any set C in n-dimensional space

P((X1, X2, ..., Xn) ∈ C) = ∫∫...∫_{(x1,...,xn)∈C} f(x1, x2, ..., xn) dx1 dx2 ... dxn

where

f(x1, x2, ..., xn) = ∂ⁿF(x1, x2, ..., xn) / (∂x1 ∂x2 ... ∂xn)


Obtaining Marginals

FX1(x1) = F(x1, ∞, ∞, ..., ∞)

fX1(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x1, x2, ..., xn) dx2 dx3 ... dxn

pX1(x1) = Σ_{x2} Σ_{x3} ... Σ_{xn} p(x1, x2, ..., xn)


Independence of Random Variables

Random variables X and Y are independent if for any two sets of real numbers A and B:

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)

i.e., the events EA = {X ∈ A} and EB = {Y ∈ B} are independent.

Example: height and IQ of an individual.

In particular, P(X ≤ a, Y ≤ b) = P(X ≤ a) P(Y ≤ b), or, in terms of the joint cumulative distribution function F of X and Y:

F(a, b) = FX(a) FY(b) for all a, b ∈ R

Random variables that are not independent are said to be dependent.


Independence: Probability Mass and Density Functions

Random variables X, Y are independent if:

Discrete random variables (probability mass function):

p(xi, yj) = pX(xi) pY(yj) for all xi, yj

Continuous random variables (probability density function):

f(x, y) = fX(x) fY(y) for all x, y


Independence: Equivalent Statements

1) P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for all sets A, B in R
2) F(x, y) = FX(x) FY(y) for all x, y
3) f(x, y) = fX(x) fY(y) for all x, y (continuous RVs)
4) p(xi, yj) = pX(xi) pY(yj) for all xi, yj (discrete RVs)


Example 5.2 (Ogunnaike, 2009)

The reliability of the temperature control system for a commercial, highly exothermic polymer reactor is known to depend on the lifetimes (in years) of the control hardware electronics, X1, and of the control valve on the cooling water line, X2. If one component fails, the entire control system fails. The random phenomenon in question is characterized by the two-dimensional random variable (X1, X2) whose joint probability distribution is given as:

f(x1, x2) = (1/50) e^{−(0.2 x1 + 0.1 x2)} for 0 < x1 < ∞, 0 < x2 < ∞; 0 otherwise.

1. Establish that the above is a legitimate joint probability density function.
   To show: ∫_{0}^{∞} ∫_{0}^{∞} f(x1, x2) dx1 dx2 = 1.

   ∫_{0}^{∞} ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 dx2 = (1/50)(−5 e^{−0.2 x1} |_{0}^{∞})(−10 e^{−0.1 x2} |_{0}^{∞}) = 1


Example (Continued)

1. What is the probability of the system lasting more than 2 years?
   To find: P(X1 > 2, X2 > 2) = ∫_{2}^{∞} ∫_{2}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 dx2 = e^{−0.6} ≈ 0.549.

2. Find the marginal density function of X1.

   fX1(x1) = ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx2 = (1/5) e^{−0.2 x1}

3. Find the marginal density function of X2.

   fX2(x2) = ∫_{0}^{∞} (1/50) e^{−(0.2 x1 + 0.1 x2)} dx1 = (1/10) e^{−0.1 x2}

4. Are X1, X2 independent? Yes, since f(x1, x2) = fX1(x1) fX2(x2).
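The factorization claim and the system-lifetime probability are easy to confirm numerically (a sketch under the slide's formulas; the lambda names are my own):

```python
import math

# Quick consistency check of Example 5.2
f = lambda x1, x2: (1 / 50) * math.exp(-(0.2 * x1 + 0.1 * x2))
fX1 = lambda x1: (1 / 5) * math.exp(-0.2 * x1)    # marginal density of X1
fX2 = lambda x2: (1 / 10) * math.exp(-0.1 * x2)   # marginal density of X2

# Independence: the joint equals the product of marginals at sample points
ok = all(abs(f(a, b) - fX1(a) * fX2(b)) < 1e-12
         for a in (0.5, 1, 3, 7) for b in (0.2, 2, 5, 9))

# P(X1 > 2, X2 > 2) = e^{-0.4} * e^{-0.2} = e^{-0.6}
p_sys = math.exp(-0.4) * math.exp(-0.2)
print(ok, round(p_sys, 3))   # True 0.549
```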


Independence of n Random Variables

Random variables X1, X2, ..., Xn are said to be independent if, for all sets of real numbers A1, A2, ..., An:

P(X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An) = ∏_{i=1}^{n} P(Xi ∈ Ai)

In particular, for all a1, a2, ..., an ∈ R:

P(X1 ≤ a1, X2 ≤ a2, ..., Xn ≤ an) = ∏_{i=1}^{n} P(Xi ≤ ai), or

F(a1, a2, ..., an) = ∏_{i=1}^{n} FXi(ai)

For discrete random variables, the probability mass function factorizes:

p(x1, x2, ..., xn) = pX1(x1) pX2(x2) ... pXn(xn)

For continuous random variables, the probability density function factorizes:

f(x1, x2, ..., xn) = fX1(x1) fX2(x2) ... fXn(xn)


Independent, Repeated Trials

In statistics, one usually does not consider just a single experiment; rather, the same experiment is performed several times.

Associate a separate random variable with each of those experimental outcomes.

If the experiments are independent of each other, then we get a set of independent random variables.

Example: tossing a coin n times. The random variable Xi is the outcome (0 or 1) of the i-th toss.


Independent and Identically Distributed (IID) Variables

A collection of random variables is said to be IID if
- the variables are independent, and
- the variables have the same probability distribution.

Example 1: Tossing a coin n times. The probability of obtaining a head in a single toss does not vary, and all the tosses are independent.
- Each toss leads to a random variable with the same probability distribution function. The random variables are also independent. Thus, IID.

Example 2: Measuring the temperature of a beaker at n time instances in the day. The true water temperature changes throughout the day. The sensor is noisy.
- Each sensor reading leads to a random variable.
- The variables are independent but not identically distributed.
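Example 1 can be simulated directly (my own sketch, not slide content): each toss is a Bernoulli(1/2) variable, and the sample mean of an IID sequence settles near 1/2.

```python
import random

# IID illustration: n coin tosses as independent Bernoulli(1/2) variables
random.seed(1)
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]   # X_1, ..., X_n
mean = sum(tosses) / n
print(round(mean, 2))   # ~0.5
```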


Conditional Distributions

Recall that for two events A and B, the conditional probability of A given B is:

P(A | B) = P(A, B) / P(B)

for P(B) > 0.


Conditional Probability Mass Function

For X, Y discrete random variables, define the conditional probability mass function of X given Y = y by

pX|Y(x | y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = p(x, y) / pY(y)

for pY(y) > 0.


Examples 4.3b,f from Ross

Question: In a community, 15% of families have no children, 20% have 1, 35% have 2 and 30% have 3 children. Each child is equally likely to be a boy or a girl. We choose a family at random. Given that the chosen family has one girl, compute the probability mass function of the number of boys in the family.

G: number of girls, B: number of boys, C: number of children.
To find: P(B = i | G = 1), i = 0, 1, 2, 3.

P(B = i | G = 1) = P(B = i, G = 1) / P(G = 1), i = 0, 1, 2, 3

First find P(G = 1):

{G = 1} = {G = 1} ∩ ({C = 0} ∪ {C = 1} ∪ {C = 2} ∪ {C = 3})

P(G = 1) = P(G = 1, C = 0) + P(G = 1, C = 1) + P(G = 1, C = 2) + P(G = 1, C = 3)

since C = 0, C = 1, C = 2, C = 3 are mutually exclusive events whose union is S. Then,

P(G = 1) = P(G = 1 | C = 0) P(C = 0) + P(G = 1 | C = 1) P(C = 1) + ...
         = 0 + (1/2) × 0.2 + ... = 0.3875


Example continued

Then,

P(B = 0 | G = 1) = P(B = 0, G = 1) / P(G = 1)

Numerator = P(G = 1 and C = 1) = P(G = 1 | C = 1) P(C = 1) = (1/2)(0.2) = 0.1. Then,

P(B = 0 | G = 1) = 0.1 / 0.3875 = 8/31

Similarly: P(B = 1 | G = 1) = 14/31, P(B = 2 | G = 1) = 9/31, P(B = 3 | G = 1) = 0.
Check: the conditional probabilities sum to 1.
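The whole calculation can be reproduced with exact fractions by conditioning on the number of children (an illustrative sketch; the helper names are mine, not from the slides):

```python
from fractions import Fraction
from math import comb

# Distribution of the number of children C in the community
pC = {0: Fraction(15, 100), 1: Fraction(20, 100),
      2: Fraction(35, 100), 3: Fraction(30, 100)}

def p_joint(g, b):
    """P(G = g, B = b): with C = g + b children, girls are Binomial(C, 1/2)."""
    c = g + b
    return pC.get(c, Fraction(0)) * comb(c, g) * Fraction(1, 2) ** c

pG1 = sum(p_joint(1, b) for b in range(4))       # P(G = 1) = 31/80 = 0.3875
cond = [p_joint(1, b) / pG1 for b in range(4)]   # 8/31, 14/31, 9/31, 0
print(pG1, sum(cond))                            # conditionals sum to 1
```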


Conditional Probability Density Function

For random variables X, Y, the conditional probability density of X given that Y = y is defined as:

fX|Y(x | y) = f(x, y) / fY(y)

for fY(y) > 0. Hence, we can make statements on the probability of X taking values in some set A given the value obtained by Y:

P(X ∈ A | Y = y) = ∫_{A} fX|Y(x | y) dx


Independence and Conditional Probabilities

If X, Y are independent, then

pX|Y(x | y) = pX(x)

fX|Y(x | y) = fX(x)


Temperature Control Example (Continued): Example 5.2 (Ogunnaike, 2009)

1. Find the conditional density function fX1|X2(x1 | x2):

   fX1|X2(x1 | x2) = f(x1, x2) / fX2(x2) = (1/5) e^{−0.2 x1}

   which is the same as fX1(x1) in this example.

2. Similarly, fX2|X1(x2 | x1) = fX2(x2) in this example.

Generic question: if fX1|X2(x1 | x2) = fX1(x1), is fX2|X1(x2 | x1) = fX2(x2)?
Answer: Yes.


Example 5.5 (Ogunnaike, 2009)

fX1,X2(x1, x2) = x1 − x2 for 1 < x1 < 2, 0 < x2 < 1; 0 otherwise.

Find: the conditional probability densities.
Answer: First compute the marginals:

fX1(x1) = x1 − 0.5 for 1 < x1 < 2; 0 otherwise

fX2(x2) = 1.5 − x2 for 0 < x2 < 1; 0 otherwise

Then compute the conditionals:

fX1|X2(x1 | x2) = (x1 − x2) / (1.5 − x2), 1 < x1 < 2

fX2|X1(x2 | x1) = (x1 − x2) / (x1 − 0.5), 0 < x2 < 1

The random variables X1, X2 are not independent.
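A conditional density must integrate to 1 over its support for every fixed value of the conditioning variable; a small numeric check of this for fX1|X2 (my own sketch, with arbitrarily chosen x2 values):

```python
# Normalization check for the conditionals of Example 5.5
def f_x1_given_x2(x1, x2):
    return (x1 - x2) / (1.5 - x2)    # valid for 1 < x1 < 2 and fixed 0 < x2 < 1

h = 1e-4
for x2 in (0.1, 0.5, 0.9):
    total = sum(f_x1_given_x2(1 + (i + 0.5) * h, x2) * h for i in range(10_000))
    assert abs(total - 1.0) < 1e-6   # midpoint rule is exact for linear integrands
print("conditional densities normalize")
```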

Plots


Independence of Transformations

If random variables X, Y are independent, then the random variables

Z = g(X), U = h(Y)

are also independent.

Proof: Let Az denote the set of points on the x-axis such that g(x) ≤ z, and Bu the set of points on the y-axis such that h(y) ≤ u. Then,

{Z ≤ z} = {X ∈ Az}; {U ≤ u} = {Y ∈ Bu}

Thus, the events {Z ≤ z} and {U ≤ u} are independent because the events {X ∈ Az} and {Y ∈ Bu} are independent.


Expected Value

By analogy with transformation of a single RV, the expected value of a transformation of multiple RVs can be defined as:

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy

For discrete RVs, the above becomes

E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)


Special Cases

g(X, Y) = X + Y. Then,

E[g(X, Y)] = E[X] + E[Y]

g(X, Y) = (X − E[X])(Y − E[Y]): the covariance of X, Y, labeled Cov(X, Y).

Correlation coefficient:

ρ = Cov(X, Y) / (σX σY)

Property: ρ is dimensionless and −1 ≤ ρ ≤ 1.
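For a concrete value, the earlier bivariate Gaussian example can be reused (a sketch on my part: reading Cov(X, Y) = 0.4 off the off-diagonal of P and the variances 0.9, 0.3 off its diagonal):

```python
import math

# Correlation coefficient of the earlier bivariate Gaussian example
cov_xy = 0.4
sigma_x, sigma_y = math.sqrt(0.9), math.sqrt(0.3)
rho = cov_xy / (sigma_x * sigma_y)
print(round(rho, 3))   # 0.77
```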


Independence versus Covariance

If X, Y are independent, then

Cov(X, Y) = 0

Independence ⟹ covariance = 0 (the variables are uncorrelated).

Covariance = 0 does NOT imply independence.
Example: (X, Y) takes the values (0, 1), (−1, 0), (0, −1), (1, 0) with equal probability (1/4). Cov(X, Y) = 0, but X, Y are not independent.
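The counterexample can be verified directly with exact arithmetic (an illustrative sketch of the four-point distribution above):

```python
from fractions import Fraction

# The four equally likely points: Cov = 0 yet the variables are dependent
pts = [(0, 1), (-1, 0), (0, -1), (1, 0)]
p = Fraction(1, 4)                           # each point equally likely

EX = sum(x * p for x, _ in pts)              # 0
EY = sum(y * p for _, y in pts)              # 0
EXY = sum(x * y * p for x, y in pts)         # 0, since x*y = 0 at every point
cov = EXY - EX * EY
print(cov)                                   # 0

# Dependence: P(X = 0, Y = 0) = 0, but P(X = 0) P(Y = 0) = 1/4
pX0 = sum(p for x, _ in pts if x == 0)       # 1/2
pY0 = sum(p for _, y in pts if y == 0)       # 1/2
print(pX0 * pY0)                             # 1/4
```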


Independence Implications

g(X, Y) = XY:
- If X, Y are independent,

  E[XY] = E[X] E[Y]

g(X, Y) = h(X) l(Y):
- If X, Y are independent,

  E[h(X) l(Y)] = E[h(X)] E[l(Y)]
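The product rule E[XY] = E[X] E[Y] can be illustrated by Monte Carlo with the independent exponentials of Example 4.3c (my own sketch: E[X] = 1, E[Y] = 1/2, so E[XY] = 1/2):

```python
import random

# Monte Carlo illustration of E[XY] = E[X] E[Y] for independent X, Y
random.seed(2)
n = 500_000
sx = sy = sxy = 0.0
for _ in range(n):
    x = random.expovariate(1.0)   # E[X] = 1
    y = random.expovariate(2.0)   # E[Y] = 1/2
    sx += x
    sy += y
    sxy += x * y
print(round(sxy / n, 2), round(sx / n * (sy / n), 2))   # both ~0.5
```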


THANK YOU
