unit 2: review of probability - university of...
TRANSCRIPT
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 1
Unit 2: Review of Probability
Statistics 571: Statistical MethodsRamón V. León
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 2
Approaches to Probability• Approaches to probability
– Classical approach– Frequentist– Personal or subjective approach– Axiomatic approach
• Basic ideas of axiomatic approach– Sample space– Events– Union– Intersection– Complement– Disjoint or mutually exclusive events– Inclusion
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 3
Axioms of Probability• Axioms:
– P(A) ≥0– P(S) = 1 where S is the sample space– P(A ∪ B) = P(A) + P(B) if A and B are mutually
exclusive events• Theorems about probability can be proved using these
axioms• These theorems can be used in probability calculations
– E.g. assuming all elements of the sample space are equally likely
– Counting arguments used. (Take a look at Birthday Problem on Page 13.)
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 4
Conditional Probability and Independence• Conditional probability
– P(A | B) = P (A ∩ B) / P(B)• Events A and B are mutually independent if P (A | B) = P(A)
– Implies P (A ∩ B) = P(A)P(B)
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 5
Tossing Two Dice
(6,6)(6,5)(6,4)(6,3)(6,2)(6,1)6(5,6)(5,5)(5,4)(5,3)(5,2)(5,1)5(4,6)(4,5)(4,4)(4,3)(4,2)(4,1)4(3,6)(3,5)(3,4)(3,3)(3,2)(3,1)3
(2,6)(2,5)(2,4)(2,3)(2,2)(2,1)2(1,6)(1,5)(1,4)(1,3)(1,2)(1,1)1
654321
First Die Outcome
Second Die Outcome
Sample space has 6 x 6 = 36 outcomes
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 6
Conditional Probability Example
P(A)=8/36
P(B)=18/36
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 7
AIDS Example
100009900100
941094055TestNegative
59049595Test positive
Not AIDSAIDS
P(A) = 100/10000 =.01 P(+|A) = 95/100 =.95 P(-|~A) = 9405/9900 =.95P(A|+) = 95/590 =.16
The usual way of solving this problem uses Bayes Theorem
Given
Conclude
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 8
What Does a Positive HIV Test Means?
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 9
Independence Example
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 10
Random Variables
• A random variable (r.v.) associates a unique numerical value with each outcome in the sample space
• Example:
• Discrete random variables: number of possible values is finite or countably infinite: x1, x2, x3, x4, x5, x6, …
• Probability mass function (p.m.f.)– f(x) = P(X= x )
• Cumulative distribution function (c.d.f.)
– F(x) = P (X ≤ x) =
10
X =
if coin toss results in a head
if coin toss results in a tail
( )k x
f k≤∑
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 11
Discrete Random Variable Example
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 12
Graphs of Mass Function and Distribution
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 13
Continuous Random Variables
( ) 0f x ≥
•An r.v. is continuous if it can assume any value from one or more intervals of real numbers•Probability density function f(x) :
( ) 1f x dx∞
−∞
=∫
( ) ( )b
a
P a X b f x dx≤ ≤ = ∫ for any a b≤
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 14
Cumulative Distribution Function
The cumulative distribution function (c.d.f.), denoted by F(x) , for a continuous random variable is given by:
( ) ( ) ( )x
F x P X x f y dy−∞
= ≤ = ∫
It follows that ( )( ) dF xf x
dx=
x
f(x)F(x)
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 15
Exponential Distribution Example
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 16
Mean and Variance of Sum of Two Dice Tosses2( ) ( ), ( ) ( ( ))E X xp x Var X x E X= = −∑ ∑
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 17
Mean and Variance of Sum of Two Dice Tosses
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 18
Expected Value or MeanThe expected value or mean of a discrete r. v. X denoted by E(X), µX , or simply µ, is defined as:
1 1 2 2( ) ( ) ( ) ( ) ...x
E X xf x x f x x f xµ= = = + +∑The expected value of a continuous r. v. is defined as:
( ) ( )E X xf x dxµ= = ∫
0
Mean of Exponetial Distribution 1( ) xE X x e dxλλλ
∞ −= =∫
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 19
Variance and Standard Deviation: SummaryThe variance of an r.v. X, denoted by Var(X), 2
Xσ , or simply 2σis defined as
2 2( ) ( )Var X E Xσ µ= = −We can show that
( )22( ) ( ) ( )Var X E X E X= −The standard deviation (SD) is the square root of the variance
Challenge exercise: Show that for the exponential distribution the standard deviation is 1/λ
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 20
Variance of the Mean of independent, Identically Distributed Random Variables
( )( )
1
2 1
2 1
22
2
( )
1
1
1
nii
nii
nii
XVar X Var
n
Var Xn
Var Xn
nn
n
σ
σ
=
=
=
=
= = =
=
∑
∑
∑ by independence
since the r.v.’s are identically distributed
1 2
We often refer to , ,..., as
a randon samplewith replacement orfrom a very large population
nX X X
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 21
Quantiles and PercentilesFor 0 1p≤ ≤ the pth quantile (or the 100pth percentile), denoted by
pθ ,of a continuous r.v. X is defined by the following equation:
( ) ( )p pP X F pθ θ≤ = =
.5θ is called the median
pF(x)
θp
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 22
Exponential Distribution Percentiles
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 23
Jointly Distributed Random Variables
( , ) joint probability mass function
f x y =
32 0.16200
=
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 24
Marginal Distribution
Discrete: ( ) ( ) ( , )
Continuous: ( ) ( ) ( , )
y
X
g x P X x f x y
g x f x f x y dy∞
−∞
= = =
= =
∑
∫
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 25
Conditional Distribution
( , )( | ) ( | )( )
f x yf y x P Y y X xg x
= = = =
Conditional probability mass function (p.m.f.):( 4, 1) 0.005( 1| 4)
( 4) 0.315P X YP Y X
P X= =
= = = ==
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 26
Independent Random Variables and are independent r.v.'s if ( , ) ( ) ( )
( , )Note that ( | ) ( )( )
X Y f x y g x h yf x yf y x h yg x
=
= =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 27
Covariance and Correlation
If X and Y are independent then E(XY)=E(X)E(Y) so the covarianceis zero. The other direction is not true.
( , ) ( )( ) ( ) ( ) ( )XY X YCov X Y E X Y E XY E X E Yσ µ µ= = − − = −
Note that: ( ) ( , )E XY xyf x y dxdy∞ ∞
−∞ −∞= ∫ ∫
( , )( , )var( ) var( )
XYXY
X Y
Cov X Ycorr X YX Y
σρσ σ
= = =
Measures strength of linear association
1 1XYρ− ≤ ≤
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 28
Covariance Example
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 29
Correlation Example
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 30
Chebyshev’s InequalityLet c > 0 be a constant. Then, irrespective of thedistribution of X,
2
2( )P X ccσµ− ≥ ≤
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 31
Weak Law of Large Numbers
Let X be the sample mean of n i.i.d. observations from a
population with finite mean µ and variance 2σ . Then forany fixed c > 0
2
2( ) 0P X cncσµ− ≥ ≤ → as n→∞
We see that X approaches µ as n gets large.
This follows from Chebyshev’s inequality and the fact that2
( ) and ( )E X Var Xnσµ= =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 32
Selected Discrete Distributions
Bernoulli distribution:
( ) ( )1p
f x P X xp
= = = −
if x = 1if x = 0
( ) ,E X p= ( ) (1 )V ar X p p= −
Binomial distribution:
( ) ( ) (1 )x n xnf x P X x p p
x−
= = = −
for x = 0, 1, …,n
( ) ,E X np= ( ) (1 )Var X np p= −
( )5! 5!, e.g., 103! ! 3!2!
n nk k n k
= = = −
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 33
Binomial Distribution Example
Suppose that the probability of a thumbtack landing with the pin up is 0.9. If we toss the thumbtack ten times what is the probability that it lands with the pin up exactly 7 times?
7 3 7 310( 7) (.9) (1 .9) 120(.9) (.1) .057
7P X
= = − = =
Answer:
See Example 2.30, Page 43 for another application of the Binomial distribution
( ) 10 .9 9, ( ) (1 ) 10 .9 .1 .9E X np Var X np p= = × = = − = × × =
Also note:
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 34
Hypergeometric Distribution(Sampling without replacement from a small population)
A lot of 50 tables has two defective tables. A sample of five tables are selected without replacement. What is the probabilitythat none of these five tables is defective?
2 480 5
( 0) .8082505
P X
= = =
Suppose the five tables had been selected with replacement? What would then be the probability?
0 55 2 48( 0) .81540 50 50
P X = = =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 35
( ) ( ) ,!
xef x P X xx
λλ−
= = = for x = 0, 1, 2, …
( ) ,E X λ= ( )Var X λ=
Poisson Distribution:
Example: On the average five Prussian soldiers die from horsekicks in a year. What is the probability that exactly four soldiers are killed this way in a given year?
5 4(5)( 4) .1754!
eP X−
= = =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 36
Geometric Distribution
Probability of waiting time to an event in n independent trials
1
2
( ) (1 ) , 1, 2,...1 1( ) and ( )
xP X x p p xpE X Var X
p p
−= = − =−
= =
Suppose the probability of winning the jackpot in a slot machineis .01. What is the expected number of tries to win the jackpot?What the is the probability that you hit the jackpot for the first time on your fifth try?
41( ) 100, ( 5) (.99) (.01) .0096.01
E X P X= = = = =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 37
Uniform DistributionDistribution when all values in an interval are equally likely
Suppose that you select a real number at random in the interval [1,5]. What is the probability that it turns out to be between 2and 4?
4 2(2 4) 0.55 1
P X −≤ ≤ = =
−Proportion of the lengths of the intervals [2,4] to the length of the interval [1,5]
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 38
Exponential DistributionDistribution of waiting time when arrivals occur at random
0( ) , ( ) 1 for 0
xx t xf x e F x e dt e xλ λ λλ λ− − −= = = − ≥∫2
1 1( ) and ( )E X Var Xλ λ
= =
( ) 1 ( ) 1 ( ) xP X x P X x F x e λ−> = − ≤ = − =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 39
Memoryless Property of the Exponential Distribution
( ) ( )|P X s t X s P X t> + > = >
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 40
Normal DistributionA continuous r.v. X has a normal distribution with parameter µand 2σ if its p.d.f. is given by
2
2( )
21( ) for 2
x
f x e xµσ
σ π
−−
= −∞< <∞
( )E X µ= 2( )Var X σ=and
2~ ( , )X N µ σNotation:
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 41
Standard Normal Distribution = N(0,1)2If ~ ( , ) then ~ (0,1)XX N Z Nµµ σ
σ−
=
( ) X x xP X x P Z µ µ µσ σ σ− − − ≤ = = ≤ = Φ
2~ (205,5 )( 200)
205 200 2055 5
( 1) ( 1) 0.1587
X NP X
XP Z
P Z
< =
− − = < =
< − = Φ − =
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 42
Standard Normal Table
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 43
Empirical Rule
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 44
Mean of i.i.d. Normal Random Variable
1 2
21 2
2
21
Let , ,..., be independent, indentically
distributed ( , ). We say that , ,...,
is a random sample from a ( , ) population.
Then for we have ~ , .
n
n
nii
X X XN X X X
N
XX X N
n n
µ σ
µ σ
σµ= =
∑
Hint: Use this result to do homework problem 2.83
6/2/2003 Unit 2 - Stat 571 - Ramón V. León 45
Percentiles of the Normal Distribution
Suppose that the scores on a standardized test are normally distributed with mean 500 and standard deviation 100. Whatis the 75th percentile score of this test?
500 500 500 500( ) .75100 100 100 100
X x x xP X x P P Z− − − − ≤ = ≤ = ≤ = Φ =
500 0.675 500 (0.675)(100) 567.5100
x x−= ⇒ = + =
From Table A.3 (0.675) 0.75.Φ = So
For 75 percentile means that ( ) .75. Sothx P X x= ≤ =