Unit 2 : Random Variables and their Distributions
Wenyaw Chan
Division of Biostatistics, School of Public Health
University of Texas - Health Science Center at Houston
Random Variable
• Random Variable:
– A numeric function that assigns probabilities to different events in a sample space.
• Discrete Random Variable:
– A random variable that assumes only a finite or denumerable number of values.
– The probability mass function of a discrete random variable X that assumes values $x_1, x_2, \ldots$ is $p(x_1), p(x_2), \ldots$, where $p(x_i) = \Pr[X = x_i]$.
• Continuous Random Variable:
– A random variable whose possible values cannot be enumerated.
Example: Flip a coin 3 times
• Random Variable
– X = # of heads in the 3 coin tosses
• Probability Mass Function
– P(X=3) = P{HHH} = 1/8
– P(X=2) = P{HHT, HTH, THH} = 3/8
– P(X=1) = P{HTT, THT, TTH} = 3/8
– P(X=0) = P{TTT} = 1/8
• X is a discrete random variable with probability (mass) function
x         0     1     2     3
P(X=x)    1/8   3/8   3/8   1/8
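The probability mass function in the coin example can be checked by brute-force enumeration. The following is a minimal sketch, assuming a fair coin and independent tosses; the names `outcomes` and `pmf` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))

# X = number of heads; add 1/8 to the matching value of X for each outcome.
pmf = {k: Fraction(0) for k in range(4)}
for outcome in outcomes:
    pmf[outcome.count("H")] += Fraction(1, len(outcomes))

for k, prob in pmf.items():
    print(f"P(X={k}) = {prob}")
```

Exact fractions are used instead of floats so the result can be compared directly with the table.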
Random Variable
• Expected value of X: $E(X) = \mu = \sum_{i=1}^{k} x_i \Pr(X = x_i)$
• Variance of X: $Var(X) = \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \Pr(X = x_i)$
• Standard deviation of X: $\sigma = \sqrt{Var(X)}$
Random Variable
• Note: $Var(X) = E(X - \mu)^2 = E(X^2) - [E(X)]^2$
• Cumulative Distribution Function of X: $F(x) = \Pr(X \le x)$
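As a worked check of these definitions, the sketch below computes the mean, variance, standard deviation and cumulative distribution function for the three-coin example (X = number of heads). The pmf values come from that example; the variable and function names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

# pmf of X = number of heads in 3 fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X) = sum of x_i * Pr(X = x_i)
mean = sum(x * p for x, p in pmf.items())

# Var(X) from the definition: sum of (x_i - mu)^2 * Pr(X = x_i)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Var(X) from the shortcut: E(X^2) - [E(X)]^2
second_moment = sum(x**2 * p for x, p in pmf.items())
var_shortcut = second_moment - mean**2

sd = sqrt(var_definition)


def cdf(x):
    """F(x) = Pr(X <= x), summing the pmf over all values not exceeding x."""
    return sum(p for value, p in pmf.items() if value <= x)


print(mean, var_definition, var_shortcut, sd, cdf(1))
```

Both variance formulas give 3/4 here, which illustrates the identity stated in the note.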
Binomial Distribution
• Examples of the binomial distribution have a common structure:
– n independent trials
– each trial has only two possible outcomes, called “success” and “failure”.
– Pr(success) = p for all trials
Binomial Distribution
• If X = # of successful trials in these n trials, then X has a binomial distribution:
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, k = 0, 1, 2, …, n,
where $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$
• Example: Flip a coin 10 times
Properties of Binomial Distribution
• If X ~ Binomial(n, p), then
– E(X) = np
– Var(X) = np(1-p)
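The binomial formula and its mean and variance can be verified numerically for the 10-coin-flip example. This is a minimal sketch assuming a fair coin (p = 0.5); `binomial_pmf` is a helper written for this illustration, not a library function.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # 10 flips of a fair coin (fairness is an assumption)
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the pmf
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))

print(f"sum of pmf = {sum(probs):.6f}")
print(f"mean = {mean:.4f}  (np = {n * p})")
print(f"variance = {variance:.4f}  (np(1-p) = {n * p * (1 - p)})")
```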
Poisson Distribution
• $\Pr(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$, k = 0, 1, 2, …
• If X ~ Poisson(λ), then E(X) = λ and Var(X) = λ.
Poisson Process
• Assumption 1: for a very small time interval [0, Δt),
– Pr{1 event occurs in [0, Δt)} ≈ λΔt
– Pr{0 events occur in [0, Δt)} ≈ 1 - λΔt
– Pr{more than one event occurs in [0, Δt)} ≈ 0
• Assumption 2:
– The probability distribution of the number of events per unit time is the same throughout the entire time interval.
• Assumption 3:
– Pr{one event in [t1, t2) | one event in [t0, t1)} = Pr{one event in [t1, t2)}
Poisson Distribution
• X = the number of events that occur in a time period of length t for the above process with parameter λ. Then the mean is μ = λt and
$\Pr(X = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}$, where k = 0, 1, 2, … and e ≈ 2.71828
• E(X) = Var(X) = λt
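To make the role of the rate λ and the time period t concrete, here is a small sketch that evaluates the Poisson probabilities for a process and confirms numerically that the mean and variance both equal λt. The rate and time values are hypothetical, chosen only for illustration, and `poisson_pmf` is a helper defined here.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Pr(X = k) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


rate = 2.0   # hypothetical: lambda = 2 events per unit time
t = 1.5      # hypothetical: observe for 1.5 time units
mu = rate * t  # mean number of events, lambda * t

# Sum over enough terms that the neglected tail is negligible for this mu.
probs = [poisson_pmf(k, mu) for k in range(100)]
mean = sum(k * q for k, q in enumerate(probs))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(probs))

print(f"Pr(X = 0) = {probs[0]:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}, lambda*t = {mu}")
```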
Poisson approximation to Binomial
• If X ~ Binomial(n, p), n is large and p is small, then
$P(X = k) \approx \frac{e^{-np} (np)^k}{k!}$
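How close the approximation is can be seen by tabulating both probabilities side by side. The sketch below assumes n = 1000 and p = 0.003 (hypothetical values chosen to be "large" and "small"); both pmf helpers are defined here for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """Poisson probability with mean mu, used as the approximation."""
    return exp(-mu) * mu**k / factorial(k)


n, p = 1000, 0.003  # hypothetical: many trials, rare success
mu = n * p          # Poisson mean np

rows = [(k, binomial_pmf(k, n, p), poisson_pmf(k, mu)) for k in range(11)]
max_gap = max(abs(exact - approx) for _, exact, approx in rows)

for k, exact, approx in rows:
    print(f"k={k:2d}  binomial={exact:.5f}  poisson={approx:.5f}")
print(f"largest absolute difference: {max_gap:.6f}")
```

Rerunning with a larger p (for example 0.3 with n = 10) shows the two columns drifting apart, which is why the condition "n large, p small" matters.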
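Continuous Probability Distributions
• Probability density function (p.d.f.) (of a random variable):
– a curve f(x) such that the area under the curve between any two points a and b equals the probability that the random variable falls between a and b:
– $\Pr[a \le X \le b] = \int_a^b f(x)\,dx$
[Figure: density curve with the area between a and b shaded, labeled Pr(a<=X<=b)]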
Continuous Probability Distributions
• Cumulative distribution function: $F(a) = \Pr(X \le a)$
[Figure: density curve with the area to the left of a shaded, labeled Pr(X<=a)]
Continuous Probability Distributions
• The expected value of a continuous random variable X is $E(X) = \int x f(x)\,dx$, where f(x) is the p.d.f. of X.
• The definition for the variance of a continuous random variable is the same as that of a discrete random variable, i.e. $Var(X) = E(X^2) - (EX)^2 = \int (x - \mu)^2 f(x)\,dx$, where µ = E(X).
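The integrals for probability, mean and variance can be approximated numerically. The sketch below uses a hypothetical density, f(x) = 2x on [0, 1], which does not appear in the slides and was picked only because its integrals are easy to check by hand; `integrate` is a simple midpoint-rule helper written for this example.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Hypothetical p.d.f.: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total_area = integrate(f, 0.0, 1.0)                 # must equal 1 for a p.d.f.
prob = integrate(f, 0.2, 0.5)                       # Pr(0.2 <= X <= 0.5)
mean = integrate(lambda x: x * f(x), 0.0, 1.0)      # E(X)
second_moment = integrate(lambda x: x * x * f(x), 0.0, 1.0)  # E(X^2)
var_shortcut = second_moment - mean**2              # E(X^2) - (EX)^2
var_definition = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)

print(total_area, prob, mean, var_shortcut, var_definition)
```

For this density the exact values are Pr(0.2 ≤ X ≤ 0.5) = 0.21, E(X) = 2/3 and Var(X) = 1/18, and both variance formulas agree.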
The Normal Distribution (The Gaussian distribution)
• The p.d.f. of a normal distribution:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}$, $-\infty < x < \infty$
The Normal Distribution
• Figure: a bell-shaped curve symmetric about μ
• Notation: X ~ N(μ, σ²)
– μ: mean
– σ²: variance
[Figure: normal curve centered at μ, with points of inflection at μ - σ and μ + σ]
The Normal Distribution
• N(0,1) is the standard normal distribution
• If X ~ N(0,1), then $\Phi(x) = \Pr(X \le x)$
– ~ : “is distributed as”
– Φ : c.d.f. for the standard normal r.v.
• Note:
– The point of inflection is a point where the curvature of the curve changes sign, i.e. where the slope stops getting steeper and starts to flatten (or vice versa).
Properties of the N(0,1)
• 1. Φ(-x) = 1 - Φ(x)
• 2.
– About 68% of the area under the standard normal curve lies between –1 and 1.
– About 95% of the area under the standard normal curve lies between –2 and 2.
– About 99% of the area under the standard normal curve lies between –2.5 and 2.5.
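These two properties can be checked numerically. The sketch below evaluates the standard normal c.d.f. through the error function, using the standard identity Φ(x) = ½[1 + erf(x/√2)]; the helper names `phi` and `central_area` are illustrative.

```python
from math import erf, sqrt


def phi(x):
    """Standard normal c.d.f., via Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def central_area(c):
    """Area under the standard normal curve between -c and c."""
    return phi(c) - phi(-c)


# Property 1: symmetry, Phi(-x) = 1 - Phi(x)
x = 1.3
print(phi(-x), 1.0 - phi(x))

# Property 2: areas within 1, 2 and 2.5 of zero
for c in (1.0, 2.0, 2.5):
    print(f"area between -{c} and {c}: {central_area(c):.4f}")
```

The computed areas are roughly 0.6827, 0.9545 and 0.9876, consistent with the rounded figures of 68%, 95% and 99%.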
Properties of the N(0,1)
• If X ~ N(0,1) and $P(X < z_u) = u$, 0 ≤ u ≤ 1, then $z_u$ is called the 100u-th percentile of the standard normal distribution.
• 95th %tile = 1.645, 97.5th %tile = 1.96, 99th %tile = 2.33
[Figure: standard normal curve with the area u to the left of z_u shaded]
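The quoted percentiles can be reproduced with the inverse c.d.f. in Python's standard library. This is a minimal sketch using `statistics.NormalDist` (available from Python 3.8); the dictionary name `percentiles` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z_u solves P(X < z_u) = u, so it is the inverse c.d.f. evaluated at u.
percentiles = {u: std_normal.inv_cdf(u) for u in (0.95, 0.975, 0.99)}

for u, z in percentiles.items():
    print(f"{100 * u:g}th percentile: z = {z:.3f}")

# Round trip: applying the c.d.f. to z_u should give back u.
print(std_normal.cdf(percentiles[0.975]))
```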
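Properties of the N(0,1)
• If X ~ N(μ, σ²), then $\frac{X - \mu}{\sigma} \sim N(0,1)$
• This property allows us to calculate the probability of a non-standard normal random variable:
$\Pr(a < X < b) = \Pr\left( \frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma} \right) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right)$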
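As an illustration of standardizing, the sketch below computes Pr(a < X < b) for a hypothetical X ~ N(100, 15²) with a = 85 and b = 130 (values invented for this example), first by converting to z-scores and then directly from the non-standard distribution.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
a, b = 85.0, 130.0       # hypothetical interval endpoints

phi = NormalDist().cdf   # standard normal c.d.f.

# Standardize the endpoints: z = (x - mu) / sigma
z_a = (a - mu) / sigma   # -1.0
z_b = (b - mu) / sigma   #  2.0
prob_standardized = phi(z_b) - phi(z_a)

# The same probability taken directly from N(mu, sigma^2)
x_dist = NormalDist(mu, sigma)
prob_direct = x_dist.cdf(b) - x_dist.cdf(a)

print(f"{prob_standardized:.4f} {prob_direct:.4f}")
```

Both routes give about 0.8186, which is why a single standard normal table is enough for any normal distribution.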
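Other Distributions--- t distribution
• Let X1, …, Xn be a random sample from a normal population N(μ, σ²). Then
$\frac{\bar{X} - \mu}{s / \sqrt{n}}$
has a t distribution with n-1 degrees of freedom (df), where $\bar{X}$ is the sample mean and s is the sample standard deviation.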
Other Distributions---Chi-square distribution
• Let X1, …, Xn be a random sample from a normal population N(0, 1). Then
$\sum_{i=1}^{n} X_i^2$
has a chi-square distribution with n degrees of freedom (df).
Other Distributions---F distribution
• Let U and V be independent random variables having chi-square distributions with p and q degrees of freedom, respectively. Then
$\frac{U / p}{V / q}$
has an F distribution with p and q degrees of freedom (df).
Covariance and Correlation
• The covariance between two random variables is defined by $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$.
• The correlation coefficient between two random variables is defined by $\rho = Corr(X, Y) = Cov(X, Y) / (\sigma_X \sigma_Y)$.
Variance of a Linear Combination
• $Var(c_1 X_1 + c_2 X_2) = c_1^2 Var(X_1) + c_2^2 Var(X_2) + 2 c_1 c_2 Cov(X_1, X_2)$
$= c_1^2 Var(X_1) + c_2^2 Var(X_2) + 2 c_1 c_2 \sigma_{X_1} \sigma_{X_2} Corr(X_1, X_2)$
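The formula can be checked exactly on a small example. The sketch below assumes a hypothetical joint distribution for two dependent 0/1 variables and hypothetical constants c1 = 2, c2 = -3; none of these numbers come from the slides, and `expect` is a helper defined for this illustration.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint pmf of (X1, X2); the probabilities sum to 1.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(3, 8),
}


def expect(g):
    """E[g(X1, X2)] under the joint pmf."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())


c1, c2 = 2, -3  # hypothetical constants

m1, m2 = expect(lambda a, b: a), expect(lambda a, b: b)
var1 = expect(lambda a, b: (a - m1) ** 2)
var2 = expect(lambda a, b: (b - m2) ** 2)
cov = expect(lambda a, b: (a - m1) * (b - m2))

# Left side: variance of the combination computed from its own definition
m_comb = expect(lambda a, b: c1 * a + c2 * b)
direct = expect(lambda a, b: (c1 * a + c2 * b - m_comb) ** 2)

# Right side, covariance form
via_cov = c1**2 * var1 + c2**2 * var2 + 2 * c1 * c2 * cov

# Right side, correlation form (floats because of the square roots)
sd1, sd2 = sqrt(var1), sqrt(var2)
corr = float(cov) / (sd1 * sd2)
via_corr = float(c1**2 * var1 + c2**2 * var2) + 2 * c1 * c2 * sd1 * sd2 * corr

print(direct, via_cov, via_corr)
```

Because the covariance is positive here and c1 and c2 have opposite signs, the cross term reduces the variance below what it would be for independent variables.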