Probability and Basic Statistics
TRANSCRIPT
Probability Bayes Theorem Random Variables Distribution Functions Expectation Variance Joint Distributions
BIN504 - Lecture I: Probability, Random Variables, and Basic Statistics
© 2012-13 Aybar C. Acar
Based on slides by Tolga Can and Jae K. Lee
Rev. 1.1 (Build 20130305220800)
BIN 504 - Probability & Basic Statistics 1 of 33
Probability
Definition
Probability is a measure of the likelihood that a random event will occur.
There are two major schools:
Frequentists: “Probability is the relative frequency of occurrence of some event after repeating a process a large number of times under similar conditions.”
Can only treat processes that are repeatable and well-defined. Formal and objective.
Subjectivists: “Probability is the degree of belief that the individual making the assessment has in the occurrence of some event.”
Can assign a probability (credence) to any statement. Not very formal and not objective at all.
Sample Space
Definition
The set of all possible outcomes of a random process is called the sample space.
Example
When tossing a coin, the sample space S is:
S = {(Heads), (Tails)}
When tossing two coins:
S = {(Heads,Heads), (Heads,Tails), (Tails,Heads), (Tails,Tails)}
Years to failure for a light bulb:
S = [0,+∞)
Event
Definition
An event is any collection of possible outcomes of a random process (experiment).
An event is a subset of the sample space for the process.
Example
Experiment: tossing two coins. Event: getting exactly one head.
{(H,T ), (T ,H)} ⊂ {(H,H), (H,T ), (T ,H), (T ,T )}
Experiment: years to failure for a light bulb. Event: the light bulb lasting more than 1 year.
(1,+∞) ⊂ [0,+∞)
Event Algebra
For all outcomes x:
Union
A ∪ B = {x : x ∈ A ∨ x ∈ B}
Intersection
A ∩ B = {x : x ∈ A ∧ x ∈ B}
Complementation
¬A = {x : x /∈ A}
Disjoint events
A ∩ B = ∅
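As a brief sketch (the two-coin sample space and the variable names below are illustrative, not from the slides), these event-algebra operations map directly onto Python's built-in `set` type:

```python
# Event algebra on a finite sample space using Python sets.
# Sample space for tossing two coins ('H' = heads, 'T' = tails).
S = {('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')}

A = {o for o in S if o[0] == 'H'}   # event: first coin is heads
B = {o for o in S if o[1] == 'H'}   # event: second coin is heads

union = A | B                       # A ∪ B: at least one relevant head
intersection = A & B                # A ∩ B: both coins are heads
complement_A = S - A                # ¬A: first coin is tails
disjoint = not (A & complement_A)   # A and ¬A share no outcomes
```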
Frequentist Event Probability
Assume U is the set of all the experiments ever done.
P(A) = |A| / |U|
P(B) = |B| / |U|
P(¬A) = |U − A| / |U|
P(¬B) = |U − B| / |U|
These are also called the prior or marginal probabilities of an event (e.g. A, B).
Sometimes very difficult to calculate: we can only observe U (“God’s dataset”) partially.
Probability Functions
Definition
Any function that satisfies these basic axioms is a probability function:
P(A) ≥ 0
P(S) = 1
If A and B are disjoint: P(A ∪ B) = P(A) + P(B)
Example
Experiment: tossing two coins. Event A: getting exactly one head.
P(A) = 0.5, since each outcome is equally likely.
Joint and Conditional Probability
Joint probability
Frequency of two events occurring together
P(A,B) = P(A ∩ B) = |A ∩ B| / |U|
Conditional probability
Frequency of one event given that the other has occurred.
“Probability of A given B”
P(A|B) = |A ∩ B| / |B|
“Probability of B given A”
P(B|A) = |A ∩ B| / |A|
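A minimal counting sketch of joint and conditional probability in the frequentist spirit above; the two-dice sample space is my own illustration, not the lecture's:

```python
from fractions import Fraction

# U is the set of 36 equally likely outcomes of rolling two dice.
U = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {o for o in U if o[0] == 6}          # event: first die shows 6
B = {o for o in U if o[0] + o[1] >= 10}  # event: sum is at least 10

P_joint = Fraction(len(A & B), len(U))      # P(A, B) = |A ∩ B| / |U|
P_A_given_B = Fraction(len(A & B), len(B))  # P(A|B)  = |A ∩ B| / |B|
```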
Independence
Definition
Event A is independent of event B if P(A|B) = P(A).
Axiom: If two events A and B are mutually independent:
P(A,B) = P(A)P(B)
Careful: Independence does not mean disjointness.
Example
Tossing a fair six-sided die. A = {the result is even}, B = {the result is > 2}.
P(A|B) = 1/2, P(B|A) = 2/3, P(A) = 1/2, P(B) = 2/3
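The die example can be verified by enumeration; this short sketch assumes the slide's events A and B:

```python
from fractions import Fraction

# Fair six-sided die: A = "result is even", B = "result is > 2".
S = set(range(1, 7))
A = {x for x in S if x % 2 == 0}
B = {x for x in S if x > 2}

P = lambda E: Fraction(len(E), len(S))

# Conditioning does not change the probabilities, so A and B are independent.
assert Fraction(len(A & B), len(B)) == P(A) == Fraction(1, 2)  # P(A|B) = P(A)
assert Fraction(len(A & B), len(A)) == P(B) == Fraction(2, 3)  # P(B|A) = P(B)
assert P(A & B) == P(A) * P(B)  # product rule holds, yet A ∩ B is not empty
```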
Bayes’ Theorem
Theorem
Bayes’ Theorem states that:
P(A|B) = P(A,B) / P(B)
Likewise:
P(B|A) = P(A,B) / P(A)
Therefore:
P(A|B) = P(B|A) P(A) / P(B)
Law of Total Probability
Theorem
Let B1, B2, ..., Bk be a partition of the sample space S (i.e. the Bi never occur together, and one of them must occur). Let A be some other event.
P(Bi|A) = P(A|Bi) P(Bi) / P(A) = P(A|Bi) P(Bi) / ∑_{j=1}^{k} P(A|Bj) P(Bj)
Example
A novel diagnostic array is 95% effective in detecting a certain disease when it is present. The test also has a 1% false positive rate. If 0.5% of the population has the disease (B), what is the probability that a person with a positive test result (A) actually has the disease?
P(B|A) = P(A|B) P(B) / [P(A|B) P(B) + P(A|¬B) P(¬B)]
= (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × (1 − 0.005)) ≈ 0.323
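The same arithmetic as a short check (the variable names are illustrative):

```python
# Bayes' theorem with the law of total probability in the denominator,
# using the slide's numbers for the diagnostic-test example.
sensitivity = 0.95   # P(A|B): positive test given disease
false_pos = 0.01     # P(A|¬B): false positive rate
prevalence = 0.005   # P(B): disease prevalence

p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)  # P(A)
posterior = sensitivity * prevalence / p_positive                     # P(B|A)
print(round(posterior, 3))   # → 0.323
```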
Random Variables
Definition
A random variable (r.v.) associates a unique numerical value with each outcome in the sample space. It is a real-valued function from a sample space S into the real numbers.
Like events, random variables are denoted by uppercase letters(e.g., X or Y )
Particular values that are taken by an r.v. are denoted by the corresponding lowercase letter (e.g. x or y).
If the range of an r.v. is finite or countably infinite, it is called discrete.
Toss three coins. X = number of heads.
Watch a bulb until it fails. X = lifetime in minutes.
If an r.v. X is continuous, it can take any value from one or more intervals of real numbers.
Pick an Informatics Institute student. X = GPA of the student.
The Monty Hall Problem
Image: American Broadcasting Companies Inc.
The Monty Hall Problem
1. There are three doors.
2. Behind one is a prize, the other two have nothing.
3. You pick one door. You do not open it yet.
4. Monty opens one of the doors you didn’t pick and it’s empty.
5. Monty then asks if you want to stick with your original choice or switch to the other door.
Is it better to stick or switch?
Monty Hall: Random Variables
Let’s say there are three r.v.’s (C, X, M), each with the possible values {1, 2, 3}.
The r.v. denoting your choice is C
You choose at random, so P(C = c) = 1/3
The r.v. denoting the prize door is X
This too is random, so P(X = x) = 1/3.
You do not know which door the prize is behind, so:
P(C = c |X = x) = P(C = c)
The r.v. denoting the door Monty opens is M.
P(M = m|X = x, C = c) =
  0, if m = c (Monty can’t open your door)
  0, if m = x (Monty can’t open the prize door)
  1/2, if c = x (Monty can choose either remaining door)
  1, if c ≠ x (Monty has only one option)
Monty Hall: Bayes’ Theorem
The probability we are interested in is:
P(X = x|C = c, M = m) = P(M = m|X = x, C = c) P(X = x|C = c) / P(M = m|C = c)
which can be simplified to:
P(X = x|C = c, M = m) = (1/3) P(M = m|X = x, C = c) / P(M = m|C = c)
since X is independent of C, so P(X = x|C = c) = P(X = x) = 1/3. The denominator can be found by:
P(M = m|C = c) = ∑_{x=1}^{3} P(M = m, X = x|C = c)
= ∑_{x=1}^{3} P(M = m|X = x, C = c) P(X = x|C = c)
= (1/3) ∑_{x=1}^{3} P(M = m|X = x, C = c)
Monty Hall: Result
So, now we have:
P(X = x|C = c, M = m) = (1/3) P(M = m|X = x, C = c) / [(1/3) ∑_{x′=1}^{3} P(M = m|X = x′, C = c)]
Let’s say c = 1, m = 3. The chance of winning by switching (i.e. X = 2) is:
P(X = 2|C = 1, M = 3) = 1 / (1/2 + 1 + 0) = 2/3
It’s better to switch!
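A Monte Carlo sketch of the game supports the result; the `play` helper, door numbering, and seed are illustrative choices, not part of the lecture:

```python
import random

# Simulate the Monty Hall game and estimate the win rate when switching.
def play(switch, rng):
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    # Monty opens a door that is neither your choice nor the prize;
    # when choice == prize he picks randomly between the two options.
    monty = rng.choice([d for d in range(3) if d not in (choice, prize)])
    if switch:
        choice = next(d for d in range(3) if d not in (choice, monty))
    return choice == prize

rng = random.Random(0)
trials = 100_000
wins = sum(play(True, rng) for _ in range(trials))
print(wins / trials)   # close to 2/3; sticking instead wins about 1/3
```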
Probability Distributions
Random variables are defined by their probability distributions.
The probability distribution of an r.v. is the function which maps each value to the probability of the r.v. taking that value, P(X = x), for all possible values x.
Example
If X is the r.v. representing the outcome of a coin toss, it is distributed as
f (x) = 0.5 ∀x ∈ {heads, tails}
The distinction between discrete and continuous random variables is whether the domain of x is discrete (e.g. integer) or continuous (e.g. real).
Mass and Density Functions
The probability distribution function for a discrete r.v. X is called its Probability Mass Function (pmf):
fX (x) = P(X = x)
For continuous variables the situation is a bit more complex. Since the range of x is uncountably infinite, the probability of any single value is infinitesimal: P(X = x) = 0 for every individual x.
We define fX(x) in terms of the probability between two values:
P(xl ≤ X ≤ xh) = ∫_{xl}^{xh} fX(x) dx
fX(x) is then called the Probability Density Function (pdf) of X.
Cumulative Distribution Functions
We are often interested in the probability of an r.v. X having some value up to and including x.
The function FX(x) that defines this is the Cumulative Distribution Function (CDF) of X.
For discrete variables:
FX(x) = ∑_{t ≤ x} f(t)
For continuous variables:
FX(x) = ∫_{−∞}^{x} f(t) dt
Example
Assume that a lab rat is observed for a while and the number of meals it eats each day is X. The number of hours it sleeps each day is defined by the random variable Y. The rat:
eats two meals half the time; on the remaining days it will eat either one meal or no food with equal probability.
sleeps 10 ± 4 hours each day, uniformly distributed.
e.g. it is equally likely to sleep 9.88232 hours as it is to sleep 13.432323̄ hours
fX(x) =
  0.25, x = 0
  0.25, x = 1
  0.50, x = 2
FX(x) =
  0.25, x = 0
  0.50, x = 1
  1, x = 2
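A quick sketch of how the CDF arises from the pmf by cumulative summation (the variable names are illustrative):

```python
from itertools import accumulate

# The rat's meals-per-day pmf; the CDF is the running sum
# F_X(x) = sum of f_X(t) for all t <= x.
support = [0, 1, 2]
pmf = [0.25, 0.25, 0.50]

cdf = list(accumulate(pmf))
print(dict(zip(support, cdf)))   # → {0: 0.25, 1: 0.5, 2: 1.0}
```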
Example (cont.)
FY(y) =
  0, y < 6
  (y − 6)/8, 6 ≤ y ≤ 14
  1, y > 14
fY(y) = dFY(y)/dy
in other words:
fY(y) =
  0, y < 6
  1/8, 6 ≤ y ≤ 14
  0, y > 14
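The same CDF and pdf written as plain functions, a sketch assuming Y ~ Uniform(6, 14) as above:

```python
# Sleep time Y ~ Uniform(6, 14): the CDF from the slide and the pdf
# obtained as its derivative.
def F_Y(y):
    if y < 6:
        return 0.0
    if y > 14:
        return 1.0
    return (y - 6) / 8

def f_Y(y):
    return 1 / 8 if 6 <= y <= 14 else 0.0

print(F_Y(10))   # → 0.5 (half the probability mass lies below 10)
```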
Expectation
An important summary statistic of an r.v. is what we expect it will be.
This is called the Expected Value or mean of an r.v.
For discrete variables:
E[X] = μX = ∑_x x f(x) = x1 f(x1) + x2 f(x2) + ... + xk f(xk)
For continuous variables:
E[X] = μX = ∫_{−∞}^{+∞} x f(x) dx
Example
The expected number of meals for our rat:
E [X ] = (0)0.25 + (1)0.25 + (2)0.50 = 1.25 meals
The expected sleep time:
E[Y] = ∫_6^14 (y/8) dy = (14² − 6²)/(8 × 2) = 10 hours
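Both expectations can be checked numerically; the midpoint-rule integration below is an illustrative sketch, not the lecture's method:

```python
# Expectations for the rat example: a weighted sum for discrete X and
# a midpoint-rule numerical integral for continuous Y ~ Uniform(6, 14).
pmf = {0: 0.25, 1: 0.25, 2: 0.50}
E_X = sum(x * p for x, p in pmf.items())

n = 100_000                  # integration steps over [6, 14]
width = 8 / n
E_Y = sum((6 + (i + 0.5) * width) / 8 * width for i in range(n))

print(E_X, round(E_Y, 6))   # → 1.25 10.0
```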
Functions on Expectation
The expected value of any arbitrary function g(x) applied to X is:
E[g(X)] = ∑_x g(x) f(x) = g(x1) f(x1) + g(x2) f(x2) + ... + g(xk) f(xk)
or
E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx
Example
The expected number of hours above 8 hours that our rat sleeps (i.e. g(Y) = Y − 8):
E[g(Y)] = ∫_6^14 ((y − 8)/8) dy = [(14 − 8)² − (6 − 8)²]/(8 × 2) = 2
Linearity and Product of Expectation
Expectation is linear over combinations of random variables:
E [c1X + c2Y ] = c1E [X ] + c2E [Y ]
The expected value of the product of two variables is defined in terms of the joint density fXY(x, y):
E[XY] = ∫_x ∫_y x y fXY(x, y) dx dy
Caution
fXY(x, y) = fX(x) fY(y) if and only if X and Y are independent.
Example
What is the expected number of meals per hour of sleep for our rat? Assuming independence (fXY(x, y) = fX(x) fY(y)):
E[X/Y] = ∫_x ∫_y (x/y) fX(x) fY(y) dx dy
E[X/Y] = ∑_x x fX(x) ∫_y (1/y) fY(y) dy
= 1.25 ∫_6^14 (fY(y)/y) dy = 1.25 × (ln(14) − ln(6))/8 ≈ 0.132
Note that it is not 0.125 (= 1.25/10)!
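A Monte Carlo sketch (the seed and sample size are arbitrary choices) confirming the value under the independence assumption:

```python
import random

# Estimate E[X/Y] for the rat: X from the meals pmf, Y ~ Uniform(6, 14),
# sampled independently.  The true value is about 0.132, not 0.125.
rng = random.Random(1)
n = 200_000
total = 0.0
for _ in range(n):
    x = rng.choices([0, 1, 2], weights=[0.25, 0.25, 0.50])[0]
    y = rng.uniform(6, 14)
    total += x / y

estimate = total / n
print(round(estimate, 3))   # close to 0.132
```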
What if we assume the rat eats one less meal than normal if it got less than 8 hours of sleep?
Variance
Definition
Variance is defined as the expected squared deviation from the mean.
Var(X) = E[(X − μX)²] = E[X²] − μX²
Standard deviation σ is defined as:
σ =√Var(X )
Example
The variance of the sleep time for the rat is:
E[Y²] − μY² = ∫_6^14 y² fY(y) dy − 100
= (14³ − 6³)/(3 × 8) − 100 = 5.333̄
σY = 2.309
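The same computation as a short check, using the closed-form integrals from the slides:

```python
# Variance of Y ~ Uniform(6, 14) via Var(Y) = E[Y^2] - E[Y]^2.
E_Y = (14**2 - 6**2) / (8 * 2)       # 10
E_Y2 = (14**3 - 6**3) / (3 * 8)      # integral of y^2 / 8 over [6, 14]
var_Y = E_Y2 - E_Y**2
sigma_Y = var_Y ** 0.5

print(round(var_Y, 3), round(sigma_Y, 3))   # → 5.333 2.309
```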
Covariance
Definition
Covariance is a measure of the dependence of two random variables on each other.
Cov(X ,Y ) = E [(X − µX )(Y − µY )]
Notice that:
E [(X − µX )(Y − µY )] = E (XY )− E (X )E (Y )
If X and Y are independent: fXY(x, y) = fX(x) fY(y)
fXY(x, y) = fX(x) fY(y) ⇒ E[XY] = E[X]E[Y]
E[XY] = E[X]E[Y] ⇒ Cov(X, Y) = 0
Covariance is therefore zero if the variables are independent (the converse does not hold in general).
Correlation
Definition
Pearson Correlation is a normalized measure of the dependence of two random variables on each other.
corr(X, Y) = Cov(X, Y) / (σX σY)
Essentially the covariance normalized by the product of the standard deviations.
Like covariance, corr(X ,Y ) = 0 if X and Y are independent.
Furthermore,
corr(X, Y) = 1 denotes perfect positive dependence
corr(X, Y) = −1 denotes perfect negative dependence
Correlation Illustration
[Figure: rows of scatter plots showing correlation values 1, 0.8, 0.4, 0, −0.4, −0.8, −1; exactly linear data with correlation ±1 regardless of slope; and strongly dependent nonlinear patterns whose correlation is 0. Illustration by: Denis Boigelot]
Distributions of More than One RV
Assume we do a survey of 100 people and ask them how many kids (rows) and how many vases (columns) they have:
          1    2    3    4   Total
  1       5    5    5    5    20
  2      15   12   10    3    40
  3      11    8    5    1    25
  4       8    5    2    0    15
Total    39   30   22    9   100
The pmf of the joint distribution, f(k,v), is N(k,v)/100.
e.g. P(K = 3, V = 4) = 0.01 whereas P(K = 1, V = 4) = 0.05.
It seems the number of kids is inversely correlated with the number of vases.
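From the table one can compute the marginal expectations and the covariance directly; this sketch hard-codes the survey counts above (variable names are illustrative):

```python
# Kids (k) vs. vases (v): joint counts from the survey of 100 people.
counts = {
    (1, 1): 5,  (1, 2): 5,  (1, 3): 5,  (1, 4): 5,
    (2, 1): 15, (2, 2): 12, (2, 3): 10, (2, 4): 3,
    (3, 1): 11, (3, 2): 8,  (3, 3): 5,  (3, 4): 1,
    (4, 1): 8,  (4, 2): 5,  (4, 3): 2,  (4, 4): 0,
}
n = sum(counts.values())                      # 100 respondents
f = {kv: c / n for kv, c in counts.items()}   # joint pmf f(k, v)

E_K = sum(k * p for (k, v), p in f.items())
E_V = sum(v * p for (k, v), p in f.items())
E_KV = sum(k * v * p for (k, v), p in f.items())
cov = E_KV - E_K * E_V

print(round(cov, 4))   # negative, matching the inverse relationship
```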
Joint and Marginal Distributions
The joint distribution of two r.v.s then satisfies:
∫_x ∫_y f(x, y) dx dy = 1
We can find f (x) by integrating (or summing) over y
f(x) = ∫_y f(x, y) dy
and vice versa, for f (y)
f (x) and f (y) are the Marginal distributions.
So, the joint distribution carries information about the marginal distributions as well.
The converse is not true unless f(x, y) = f(x) f(y), i.e. unless X and Y are independent. Independence implies Cov(X, Y) = 0, but zero covariance alone does not guarantee f(x, y) = f(x) f(y).