Probability and Basic Statistics
TRANSCRIPT
Probability Bayes Theorem Random Variables Distribution Functions Expectation Variance Joint Distributions
BIN504 - Lecture I: Probability, Random Variables, and Basic Statistics
© 2012-13 Aybar C. Acar
Based on slides by Tolga Can and Jae K. Lee
Rev. 1.1 (Build 20130305220800)
BIN 504 - Probability & Basic Statistics 1 of 33
Probability
Definition
Probability is a measure of the likelihood that a random event will occur.
There are two major schools:
Frequentists: “Probability is the relative frequency of occurrence of some event after repeating a process a large number of times under similar conditions.”
Can only treat processes that are repeatable and well-defined. Formal and objective.
Subjectivists: “Probability is the degree of belief that the individual making the assessment has in the occurrence of some event.”
Can assign a probability (credence) to any statement. Not very formal and not objective at all.
Sample Space
Definition
The set of all possible outcomes of a random process is called the sample space.
Example
When tossing a coin, the sample space S is:
S = {(Heads), (Tails)}
When tossing two coins:
S = {(Heads,Heads), (Heads,Tails), (Tails,Heads), (Tails,Tails)}
Years to failure for a light bulb:
S = [0,+∞)
Event
Definition
An event is any collection of possible outcomes of a random process (experiment).
An event is a subset of the sample space for the process.
Example
Experiment: tossing two coins. Event: getting exactly one head.
{(H,T ), (T ,H)} ⊂ {(H,H), (H,T ), (T ,H), (T ,T )}
Experiment: years to failure for a light bulb. Event: the light bulb lasting more than 1 year.
(1,+∞) ⊂ [0,+∞)
Event Algebra
For all outcomes x:
Union
A ∪ B = {x : x ∈ A ∨ x ∈ B}
Intersection
A ∩ B = {x : x ∈ A ∧ x ∈ B}
Complementation
¬A = {x : x /∈ A}
Disjoint events
A ∩ B = ∅
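As a brief sketch (the two-coin sample space and the variable names below are illustrative, not from the slides), these event-algebra operations map directly onto Python's built-in `set` type:

```python
# Event algebra on a finite sample space using Python sets.
# Sample space for tossing two coins ('H' = heads, 'T' = tails).
S = {('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')}

A = {o for o in S if o[0] == 'H'}   # event: first coin is heads
B = {o for o in S if o[1] == 'H'}   # event: second coin is heads

union = A | B                       # A ∪ B: at least one relevant head
intersection = A & B                # A ∩ B: both coins are heads
complement_A = S - A                # ¬A: first coin is tails
disjoint = not (A & complement_A)   # A and ¬A share no outcomes
```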
Frequentist Event Probability
Assume U is the set of all the experiments ever done.
P(A) = |A| / |U|
P(B) = |B| / |U|
P(¬A) = |U − A| / |U|
P(¬B) = |U − B| / |U|
These are also called the prior or marginal probabilities of an event (e.g. A, B).
Sometimes very difficult to calculate: we can only observe U (“God’s dataset”) partially.
Probability Functions
Definition
Any function that satisfies these basic axioms is a probability function:
P(A) ≥ 0
P(S) = 1
If A and B are disjoint: P(A ∪ B) = P(A) + P(B)
Example
Experiment: tossing two coins. Event A: getting exactly one head.
P(A) = 0.5, since each outcome is equally likely.
Joint and Conditional Probability
Joint probability
Frequency of two events occurring together
P(A,B) = P(A ∩ B) = |A ∩ B| / |U|
Conditional probability
Frequency of one event given that the other has occurred.
“Probability of A given B”
P(A|B) = |A ∩ B| / |B|
“Probability of B given A”
P(B|A) = |A ∩ B| / |A|
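A minimal counting sketch of joint and conditional probability in the frequentist spirit above; the two-dice sample space is my own illustration, not the lecture's:

```python
from fractions import Fraction

# U is the set of 36 equally likely outcomes of rolling two dice.
U = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {o for o in U if o[0] == 6}          # event: first die shows 6
B = {o for o in U if o[0] + o[1] >= 10}  # event: sum is at least 10

P_joint = Fraction(len(A & B), len(U))      # P(A, B) = |A ∩ B| / |U|
P_A_given_B = Fraction(len(A & B), len(B))  # P(A|B)  = |A ∩ B| / |B|
```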
Independence
Definition
Event A is independent of event B if P(A|B) = P(A).
Axiom: If two events A and B are mutually independent:
P(A,B) = P(A)P(B)
Careful: Independence does not mean disjointness.
Example
Tossing a fair six-sided die. A = {the result is even}, B = {the result is > 2}.
P(A|B) = 1/2, P(B|A) = 2/3, P(A) = 1/2, P(B) = 2/3
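The die example can be verified by enumeration; this short sketch assumes the slide's events A and B:

```python
from fractions import Fraction

# Fair six-sided die: A = "result is even", B = "result is > 2".
S = set(range(1, 7))
A = {x for x in S if x % 2 == 0}
B = {x for x in S if x > 2}

P = lambda E: Fraction(len(E), len(S))

# Conditioning does not change the probabilities, so A and B are independent.
assert Fraction(len(A & B), len(B)) == P(A) == Fraction(1, 2)  # P(A|B) = P(A)
assert Fraction(len(A & B), len(A)) == P(B) == Fraction(2, 3)  # P(B|A) = P(B)
assert P(A & B) == P(A) * P(B)  # product rule holds, yet A ∩ B is not empty
```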
Bayes’ Theorem
Theorem
Bayes’ Theorem states that:
P(A|B) = P(A,B) / P(B)
Likewise:
P(B|A) = P(A,B) / P(A)
Therefore:
P(A|B) = P(B|A) P(A) / P(B)
Law of Total Probability
Theorem
Let B1, B2, ..., Bk be a partition of the sample space S (i.e. the Bi never occur together, and one of them must occur). Let A be some other event.
P(Bi|A) = P(A|Bi) P(Bi) / P(A) = P(A|Bi) P(Bi) / ∑_{j=1}^{k} P(A|Bj) P(Bj)
Example
A novel diagnostic array is 95% effective in detecting a certain disease when it is present. The test also has a 1% false positive rate. If 0.5% of the population has the disease (B), what is the probability that a person with a positive test result (A) actually has the disease?
P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|¬B) P(¬B)]
= (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × (1 − 0.005)) ≈ 0.323
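The same arithmetic as a short check (the variable names are illustrative):

```python
# Bayes' theorem with the law of total probability in the denominator,
# using the slide's numbers for the diagnostic-test example.
sensitivity = 0.95   # P(A|B): positive test given disease
false_pos = 0.01     # P(A|¬B): false positive rate
prevalence = 0.005   # P(B): disease prevalence

p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)  # P(A)
posterior = sensitivity * prevalence / p_positive                     # P(B|A)
print(round(posterior, 3))   # → 0.323
```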
Random Variables
Definition
A random variable (r.v.) associates a unique numerical value with each outcome in the sample space. It is a real-valued function from a sample space S into the real numbers.
Like events, random variables are denoted by uppercase letters(e.g., X or Y )
Particular values that are taken by an r.v. are denoted by the corresponding lowercase letter (e.g. x or y).
If the range of an r.v. is finite or countably infinite, it is called discrete.
Toss three coins. X = number of heads.
Watch a bulb until it fails. X = lifetime in minutes.
If an r.v. X is continuous, it can take any value from one or more intervals of real numbers.
Pick an Informatics Institute student. X = GPA of the student.
The Monty Hall Problem
Image: American Broadcasting Companies Inc.
The Monty Hall Problem
1. There are three doors.
2. Behind one is a prize, the other two have nothing.
3. You pick one door. You do not open it yet.
4. Monty opens one of the doors you didn’t pick and it’s empty.
5. Monty then asks if you want to stick with your original choice or switch to the other door.
Is it better to stick or switch?
Monty Hall: Random Variables
Let’s say there are three r.v.’s (C, X, M), each with the possible values {1, 2, 3}.
The r.v. denoting your choice is C
You choose at random, so P(C = c) = 1/3
The r.v. denoting the prize door is X
This too is random, so P(X = x) = 1/3.
You do not know which door the prize is behind, so:
P(C = c |X = x) = P(C = c)
The r.v. denoting the door Monty opens is M.
P(M = m|X = x, C = c) =
  0, if m = c (Monty can’t open your door)
  0, if m = x (Monty can’t open the prize door)
  1/2, if c = x (Monty can choose either remaining door)
  1, if c ≠ x (Monty has only one option)
Monty Hall: Bayes’ Theorem
The probability we are interested in is:
P(X = x|C = c, M = m) = P(M = m|X = x, C = c) P(X = x|C = c) / P(M = m|C = c)
which can be simplified to:
P(X = x|C = c, M = m) = (1/3) P(M = m|X = x, C = c) / P(M = m|C = c)
since X is independent of C, so P(X = x|C = c) = P(X = x) = 1/3. The denominator can be found by:
P(M = m|C = c) = ∑_{x=1}^{3} P(M = m, X = x|C = c)
= ∑_{x=1}^{3} P(M = m|X = x, C = c) P(X = x|C = c)
= (1/3) ∑_{x=1}^{3} P(M = m|X = x, C = c)
Monty Hall: Result
So, now we have:
P(X = x|C = c, M = m) = (1/3) P(M = m|X = x, C = c) / [(1/3) ∑_{x′=1}^{3} P(M = m|X = x′, C = c)]
Let’s say c = 1, m = 3. The chance of winning by switching (i.e. X = 2) is:
P(X = 2|C = 1, M = 3) = 1 / (1/2 + 1 + 0) = 2/3
It’s better to switch!
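A Monte Carlo sketch of the game supports the result; the `play` helper, door numbering, and seed are illustrative choices, not part of the lecture:

```python
import random

# Simulate the Monty Hall game and estimate the win rate when switching.
def play(switch, rng):
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    # Monty opens a door that is neither your choice nor the prize;
    # when choice == prize he picks randomly between the two options.
    monty = rng.choice([d for d in range(3) if d not in (choice, prize)])
    if switch:
        choice = next(d for d in range(3) if d not in (choice, monty))
    return choice == prize

rng = random.Random(0)
trials = 100_000
wins = sum(play(True, rng) for _ in range(trials))
print(wins / trials)   # close to 2/3; sticking instead wins about 1/3
```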
Probability Distributions
Random variables are defined by their probability distributions.
The probability distribution of an r.v. is the function which maps each value to the probability of the r.v. taking that value, P(X = x), for all possible values x.
Example
If X is the r.v. representing the outcome of a coin toss, it is distributed as
f (x) = 0.5 ∀x ∈ {heads, tails}
The distinction between discrete and continuous random variables is whether the domain of x is discrete (e.g. integer) or continuous (e.g. real).
Mass and Density Functions
The probability distribution function for a discrete r.v. X is called its Probability Mass Function (pmf):
fX (x) = P(X = x)
For continuous variables the situation is a bit more complex. Since the range of x is uncountably infinite, the probability of any single value is infinitesimal: P(X = x) = 0 for every individual x.
We define fX(x) in terms of the probability between two values:
P(xl ≤ X ≤ xh) = ∫_{xl}^{xh} fX(x) dx
fX(x) is then called the Probability Density Function (pdf) of X.
Cumulative Distribution Functions
We are often interested in the probability of an r.v. X having some value up to and including x.
The function FX(x) that defines this is the Cumulative Distribution Function (CDF) of X.
For discrete variables:
FX(x) = ∑_{t ≤ x} f(t)
For continuous variables:
FX(x) = ∫_{−∞}^{x} f(t) dt
Example
Assume that a lab rat is observed for a while and the number of meals it eats each day is X. The number of hours it sleeps each day is defined by the random variable Y. The rat:
eats two meals half the time; on the remaining days it will eat either one meal or no food with equal probability.
sleeps 10 ± 4 hours each day, uniformly distributed.
e.g. it is equally likely to sleep 9.88232 hours as it is to sleep 13.432323̄ hours
fX(x) =
  0.25, x = 0
  0.25, x = 1
  0.50, x = 2
FX(x) =
  0.25, x = 0
  0.50, x = 1
  1, x = 2
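A quick sketch of how the CDF arises from the pmf by cumulative summation (the variable names are illustrative):

```python
from itertools import accumulate

# The rat's meals-per-day pmf; the CDF is the running sum
# F_X(x) = sum of f_X(t) for all t <= x.
support = [0, 1, 2]
pmf = [0.25, 0.25, 0.50]

cdf = list(accumulate(pmf))
print(dict(zip(support, cdf)))   # → {0: 0.25, 1: 0.5, 2: 1.0}
```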
Example (cont.)
FY(y) =
  0, y < 6
  (y − 6)/8, 6 ≤ y ≤ 14
  1, y > 14
fY(y) = dFY(y)/dy
in other words:
fY(y) =
  0, y < 6
  1/8, 6 ≤ y ≤ 14
  0, y > 14
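The same CDF and pdf written as plain functions, a sketch assuming Y ~ Uniform(6, 14) as above:

```python
# Sleep time Y ~ Uniform(6, 14): the CDF from the slide and the pdf
# obtained as its derivative.
def F_Y(y):
    if y < 6:
        return 0.0
    if y > 14:
        return 1.0
    return (y - 6) / 8

def f_Y(y):
    return 1 / 8 if 6 <= y <= 14 else 0.0

print(F_Y(10))   # → 0.5 (half the probability mass lies below 10)
```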
Expectation
An important summary statistic of an r.v. is what we expect it will be.
This is called the Expected Value or mean of an r.v.
For discrete variables:
E[X] = μX = ∑_x x f(x) = x1 f(x1) + x2 f(x2) + ... + xk f(xk)
For continuous variables:
E[X] = μX = ∫_{−∞}^{+∞} x f(x) dx
Example
The expected number of meals for our rat:
E [X ] = (0)0.25 + (1)0.25 + (2)0.50 = 1.25 meals
The expected sleep time:
E[Y] = ∫_6^14 (y/8) dy = (14² − 6²)/(8 × 2) = 10 hours
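Both expectations can be checked numerically; the midpoint-rule integration below is an illustrative sketch, not the lecture's method:

```python
# Expectations for the rat example: a weighted sum for discrete X and
# a midpoint-rule numerical integral for continuous Y ~ Uniform(6, 14).
pmf = {0: 0.25, 1: 0.25, 2: 0.50}
E_X = sum(x * p for x, p in pmf.items())

n = 100_000                  # integration steps over [6, 14]
width = 8 / n
E_Y = sum((6 + (i + 0.5) * width) / 8 * width for i in range(n))

print(E_X, round(E_Y, 6))   # → 1.25 10.0
```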
Functions on Expectation
The expected value of any arbitrary function g(x) applied to X is:
E[g(X)] = ∑_x g(x) f(x) = g(x1) f(x1) + g(x2) f(x2) + ... + g(xk) f(xk)
or
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx
Example
The expected number of hours above 8 hours that our rat sleeps (i.e. g(Y) = Y − 8):
E[g(Y)] = ∫_6^14 ((y − 8)/8) dy = [(14 − 8)² − (6 − 8)²]/(8 × 2) = 2
Linearity and Product of Expectation
Expectation is linear over combinations of random variables:
E [c1X + c2Y ] = c1E [X ] + c2E [Y ]
The expected value of the product of two variables is defined in terms of the joint density fXY(x, y):
E[XY] = ∫_x ∫_y x y fXY(x, y) dx dy
Caution
fXY(x, y) = fX(x) fY(y) if and only if X and Y are independent.
Example
What is the expected number of meals per hour of sleep for our rat? Assuming independence (fXY(x, y) = fX(x) fY(y)):
E[X/Y] = ∫_x ∫_y (x/y) fX(x) fY(y) dx dy
E[X/Y] = ∑_x x fX(x) ∫_y (1/y) fY(y) dy
= 1.25 ∫_6^14 (fY(y)/y) dy = 1.25 × (ln(14) − ln(6))/8 ≈ 0.132
Note that it is not 0.125 (= 1.25/10)!
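A Monte Carlo sketch (the seed and sample size are arbitrary choices) confirming the value under the independence assumption:

```python
import random

# Estimate E[X/Y] for the rat: X from the meals pmf, Y ~ Uniform(6, 14),
# sampled independently.  The true value is about 0.132, not 0.125.
rng = random.Random(1)
n = 200_000
total = 0.0
for _ in range(n):
    x = rng.choices([0, 1, 2], weights=[0.25, 0.25, 0.50])[0]
    y = rng.uniform(6, 14)
    total += x / y

estimate = total / n
print(round(estimate, 3))   # close to 0.132
```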
What if we assume the rat eats one less meal than normal if it got less than 8 hours of sleep?
Variance
Definition
Variance is defined as the expected squared deviation from the mean.
Var(X) = E[(X − μX)²] = E[X²] − μX²
Standard deviation σ is defined as:
σ =√Var(X )
Example
The variance of the sleep time for the rat is:
E[Y²] − μY² = ∫_6^14 y² fY(y) dy − 100
= (14³ − 6³)/(3 × 8) − 100 = 5.333̄
σY = 2.309
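The same computation as a short check, using the closed-form integrals from the slides:

```python
# Variance of Y ~ Uniform(6, 14) via Var(Y) = E[Y^2] - E[Y]^2.
E_Y = (14**2 - 6**2) / (8 * 2)       # 10
E_Y2 = (14**3 - 6**3) / (3 * 8)      # integral of y^2 / 8 over [6, 14]
var_Y = E_Y2 - E_Y**2
sigma_Y = var_Y ** 0.5

print(round(var_Y, 3), round(sigma_Y, 3))   # → 5.333 2.309
```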
Covariance
Definition
Covariance is a measure of the dependence of two random variables on each other.
Cov(X ,Y ) = E [(X − µX )(Y − µY )]
Notice that:
E [(X − µX )(Y − µY )] = E (XY )− E (X )E (Y )
If X and Y are independent: fXY(x, y) = fX(x) fY(y)
fXY(x, y) = fX(x) fY(y) ⇒ E[XY] = E[X]E[Y]
E[XY] = E[X]E[Y] ⇒ Cov(X, Y) = 0
Covariance is therefore zero if the variables are independent (the converse does not hold in general).
Correlation
Definition
Pearson Correlation is a normalized measure of the dependence of two random variables on each other.
corr(X, Y) = Cov(X, Y) / (σX σY)
Essentially the covariance normalized by the product of the standard deviations.
Like covariance, corr(X ,Y ) = 0 if X and Y are independent.
Furthermore,
corr(X, Y) = 1 denotes perfect positive dependence
corr(X, Y) = −1 denotes perfect negative dependence
Correlation Illustration
[Figure: rows of scatter plots showing correlation values 1, 0.8, 0.4, 0, −0.4, −0.8, −1; exactly linear data with correlation ±1 regardless of slope; and strongly dependent nonlinear patterns whose correlation is 0. Illustration by: Denis Boigelot]
Distributions of More than One RV
Assume we do a survey of 100 people and ask them how many kids (rows) and how many vases (columns) they have:
          1    2    3    4   Total
  1       5    5    5    5    20
  2      15   12   10    3    40
  3      11    8    5    1    25
  4       8    5    2    0    15
Total    39   30   22    9   100
The pmf of the joint distribution, f(k,v), is N(k,v)/100.
e.g. P(K = 3, V = 4) = 0.01 whereas P(K = 1, V = 4) = 0.05.
It seems the number of kids is inversely correlated with the number of vases.
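From the table one can compute the marginal expectations and the covariance directly; this sketch hard-codes the survey counts above (variable names are illustrative):

```python
# Kids (k) vs. vases (v): joint counts from the survey of 100 people.
counts = {
    (1, 1): 5,  (1, 2): 5,  (1, 3): 5,  (1, 4): 5,
    (2, 1): 15, (2, 2): 12, (2, 3): 10, (2, 4): 3,
    (3, 1): 11, (3, 2): 8,  (3, 3): 5,  (3, 4): 1,
    (4, 1): 8,  (4, 2): 5,  (4, 3): 2,  (4, 4): 0,
}
n = sum(counts.values())                      # 100 respondents
f = {kv: c / n for kv, c in counts.items()}   # joint pmf f(k, v)

E_K = sum(k * p for (k, v), p in f.items())
E_V = sum(v * p for (k, v), p in f.items())
E_KV = sum(k * v * p for (k, v), p in f.items())
cov = E_KV - E_K * E_V

print(round(cov, 4))   # negative, matching the inverse relationship
```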
Joint and Marginal Distributions
The joint distribution of two r.v.s then satisfies:
∫_x ∫_y f(x, y) dx dy = 1
We can find f (x) by integrating (or summing) over y
f(x) = ∫_y f(x, y) dy
and vice versa, for f (y)
f (x) and f (y) are the Marginal distributions.
So, the joint distribution carries information about the marginal distributions as well.
The converse is not true unless f(x, y) = f(x) f(y), i.e. unless X and Y are independent. Independence implies Cov(X, Y) = 0, but zero covariance alone does not guarantee f(x, y) = f(x) f(y).