Probability and Basic Statistics


BIN504 - Lecture I: Probability, Random Variables, and Basic Statistics

© 2012-13 Aybar C. Acar

Based on slides by Tolga Can and Jae K. Lee

Rev. 1.1 (Build 20130305220800)

Probability

Definition

Probability is a measure of the likelihood that a random event will occur.

There are two major schools:

Frequentists: “Probability is the relative frequency of occurrence of some event after repeating a process a large number of times under similar conditions.”

Can only treat processes that are repeatable and well-defined. Formal and objective.

Subjectivists: “Probability is the degree of belief that the individual making the assessment has in the occurrence of some event.”

Can assign a probability (credence) to any statement. Not very formal and not objective at all.

Sample Space

Definition

The set of all possible outcomes of a random process is called the sample space.

Example

When tossing a coin, the sample space S is:

S = {(Heads), (Tails)}

When tossing two coins:

S = {(Heads, Heads), (Heads, Tails), (Tails, Heads), (Tails, Tails)}

Years to failure for a light bulb:

S = [0,+∞)

Event

Definition

An event is any collection of possible outcomes of a random process (experiment).

An event is a subset of the sample space for the process.

Example

Experiment: tossing two coins. Event: getting exactly one head.

{(H,T), (T,H)} ⊂ {(H,H), (H,T), (T,H), (T,T)}

Experiment: years to failure for a light bulb. Event: the light bulb lasting more than 1 year.

(1,+∞) ⊂ [0,+∞)

Event Algebra

For all outcomes x :

Union

A ∪ B = {x : x ∈ A ∨ x ∈ B}

Intersection

A ∩ B = {x : x ∈ A ∧ x ∈ B}

Complementation

¬A = {x : x ∉ A}

Disjoint events

A ∩ B = ∅
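These definitions map directly onto set operations. A minimal sketch using Python's built-in set type; the die-roll events here are only illustrative:

```python
# Event algebra with Python sets, for a single roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}          # sample space
A = {2, 4, 6}                   # event: result is even
B = {3, 4, 5, 6}                # event: result is > 2

print(A | B)                    # union: {2, 3, 4, 5, 6}
print(A & B)                    # intersection: {4, 6}
print(S - A)                    # complement of A: {1, 3, 5}
print((A & (S - A)) == set())   # A and ¬A are disjoint: True
```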

Frequentist Event Probability

Assume U is the set of all the experiments ever done.

P(A) = |A| / |U|

P(B) = |B| / |U|

P(¬A) = |U − A| / |U|

P(¬B) = |U − B| / |U|

Also called prior or marginal probabilities of some event (e.g. A, B)

Sometimes very difficult to calculate: we can only observe U partially (“God’s dataset”).

Probability Functions

Definition

Any function that satisfies these basic axioms is a probability function:

P(A) ≥ 0

P(S) = 1

If A and B are disjoint: P(A ∪ B) = P(A) + P(B)

Example

Experiment: tossing two coins
Event A: getting exactly one head
P(A) = 0.5, since each of the four outcomes is equally likely and two of them, (H,T) and (T,H), are in A.
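Under the frequentist reading, this probability can also be estimated as a relative frequency over repeated trials. A minimal simulation sketch; the trial count is an arbitrary choice:

```python
import random

# Estimate P(exactly one head in two tosses) by relative frequency.
trials = 100_000
hits = 0
for _ in range(trials):
    tosses = [random.choice("HT") for _ in range(2)]
    if tosses.count("H") == 1:
        hits += 1

print(hits / trials)  # converges to 0.5 as trials grows
```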

Joint and Conditional Probability

Joint probability

Frequency of two events occurring together:

P(A,B) = P(A ∩ B) = |A ∩ B| / |U|

Conditional probability

Frequency of one event given that the other has occurred. “Probability of A given B”:

P(A|B) = |A ∩ B| / |B|

“Probability of B given A”

P(B|A) = |A ∩ B| / |A|

Independence

Definition

Event A is independent of event B if P(A|B) = P(A).

Equivalently, if two events A and B are mutually independent:

P(A,B) = P(A)P(B)

Careful: Independence does not mean disjointness.

Example

Tossing a fair six-sided die: A = {the result is even}, B = {the result is > 2}

P(A|B) = 1/2
P(B|A) = 2/3
P(A) = 1/2
P(B) = 2/3
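Since the die has finitely many equally likely outcomes, these values can be verified by direct enumeration; a small sketch using exact rational arithmetic:

```python
from fractions import Fraction

S = set(range(1, 7))                 # fair six-sided die
A = {x for x in S if x % 2 == 0}     # event: result is even
B = {x for x in S if x > 2}          # event: result is > 2

def P(E):
    return Fraction(len(E), len(S))

print(P(A & B) == P(A) * P(B))       # True: A and B are independent
print(P(A & B) / P(B))               # P(A|B) = 1/2 = P(A)
```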

Bayes’ Theorem

Theorem

Bayes’ Theorem states that:

P(A|B) = P(A,B) / P(B)

Likewise:

P(B|A) = P(A,B) / P(A)

Therefore:

P(A|B) = P(B|A) P(A) / P(B)

Law of Total Probability

Theorem

Let B1, B2, ..., Bk be a partition of the sample space S (i.e., the Bi never occur together, and one of them must occur). Let A be some other event.

P(Bi|A) = P(A|Bi) P(Bi) / P(A) = P(A|Bi) P(Bi) / Σ_{j=1}^{k} P(A|Bj) P(Bj)

Example

A novel diagnostic array is 95% effective in detecting a certain disease when it is present. The test also has a 1% false positive rate. If 0.5% of the population has the disease (B), what is the probability that a person with a positive test result (A) actually has the disease?

P(B|A) = P(A|B) P(B) / (P(A|B) P(B) + P(A|¬B) P(¬B))
       = (0.95 × 0.005) / (0.95 × 0.005 + 0.01 × (1 − 0.005)) = 0.323
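The same arithmetic in code; names like sens and prev are our own labels for the rates given above:

```python
# Posterior probability of disease given a positive test,
# via the law of total probability.
sens = 0.95        # P(positive | disease)
fpr = 0.01         # P(positive | no disease)
prev = 0.005       # P(disease)

p_pos = sens * prev + fpr * (1 - prev)   # P(positive), total probability
posterior = sens * prev / p_pos          # P(disease | positive)
print(round(posterior, 3))               # 0.323
```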

Random Variables

Definition

A random variable (r.v.) associates a unique numerical value with each outcome in the sample space. It is a real-valued function from a sample space S into the real numbers.

Like events, random variables are denoted by uppercase letters (e.g., X or Y).

Particular values taken by an r.v. are denoted by the corresponding lowercase letter (e.g., x or y). If the range of an r.v. is finite or countably infinite, it is called discrete:

Toss three coins. X = number of heads
Watch a bulb until it fails. X = lifetime in minutes

If an r.v. X is continuous, it can take any value from one or more intervals of real numbers:

Pick an Informatics Institute student. X = GPA of student

The Monty Hall Problem

[Image: American Broadcasting Companies Inc.]

1. There are three doors.
2. Behind one is a prize; the other two have nothing.
3. You pick one door. You do not open it yet.
4. Monty opens one of the doors you didn't pick, and it's empty.
5. Monty then asks if you want to stick with your original choice or switch to the other door.

Is it better to stick or switch?
Monty Hall: Random Variables

Let's say there are three r.v.'s (C, X, M), each with the possible values {1, 2, 3}.

The r.v. denoting your choice is C

You choose at random, so P(C = c) = 1/3

The r.v. denoting the prize door is X

This too is random, so P(X = x) = 1/3.
You do not know which door the prize is behind, so:

P(C = c | X = x) = P(C = c)

The r.v. denoting the door Monty opens is M.

P(M = m | X = x, C = c) =
  0,   if m = c (Monty can't open your door)
  0,   if m = x (Monty can't open the prize door)
  1/2, if c = x (Monty can choose either remaining door)
  1,   if c ≠ x (Monty has only one option)

Monty Hall: Bayes’ Theorem

The probability we are interested in is:

P(X = x | C = c, M = m) = P(M = m | X = x, C = c) P(X = x | C = c) / P(M = m | C = c)

which can be simplified to:

P(X = x | C = c, M = m) = (1/3) P(M = m | X = x, C = c) / P(M = m | C = c)

since X is independent of C, so P(X = x | C = c) = P(X = x) = 1/3. The denominator can be found by:

P(M = m | C = c) = Σ_{x=1}^{3} P(M = m, X = x | C = c)
                 = Σ_{x=1}^{3} P(M = m | X = x, C = c) P(X = x | C = c)
                 = (1/3) Σ_{x=1}^{3} P(M = m | X = x, C = c)

Monty Hall: Result

So, now we have:

P(X = x | C = c, M = m) = (1/3) P(M = m | X = x, C = c) / ((1/3) Σ_{x=1}^{3} P(M = m | X = x, C = c))

Let's say c = 1 and m = 3. The chance of winning by switching (i.e., X = 2) is:

P(X = 2 | M = 3, C = 1) = 1 / (1/2 + 1 + 0) = 2/3

It’s better to switch!
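The result can also be checked empirically; a minimal Monte Carlo sketch (door numbering and trial count are arbitrary choices):

```python
import random

# Estimate the win rate for sticking vs. switching by simulation.
def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randint(1, 3)
        choice = random.randint(1, 3)
        # Monty opens a door that is neither your choice nor the prize.
        monty = random.choice([d for d in (1, 2, 3)
                               if d != choice and d != prize])
        if switch:
            # Switch to the one remaining closed door.
            choice = next(d for d in (1, 2, 3)
                          if d != choice and d != monty)
        wins += (choice == prize)
    return wins / trials

print(play(switch=False))  # ~1/3
print(play(switch=True))   # ~2/3
```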

Probability Distributions

Random variables are defined by their probability distributions.

The probability distribution of an r.v. is the function that maps each possible value x to the probability of the r.v. taking that value, P(X = x).

Example

If X is the r.v. representing the outcome of a coin toss, it is distributed as

f(x) = 0.5 ∀x ∈ {heads, tails}

The distinction between discrete and continuous random variables is whether the domain of x is discrete (e.g., the integers) or continuous (e.g., the reals).

Mass and Density Functions

The probability distribution function for a discrete r.v. X is called its Probability Mass Function (pmf):

f_X(x) = P(X = x)

For continuous variables the situation is a bit more complex: since the range of x is uncountably infinite, the probability of any single value is zero, i.e., P(X = x) = 0.

We define f_X(x) in terms of the probability between two values:

P(x_l ≤ X ≤ x_h) = ∫_{x_l}^{x_h} f_X(x) dx

f_X(x) is then called the Probability Density Function (pdf) of X.

Cumulative Distribution Functions

We are often interested in the probability of an r.v. X having some value up to and including x.

The function F_X(x) that gives this is the Cumulative Distribution Function (CDF) of X.

For discrete variables:

F_X(x) = Σ_{t ≤ x} f(t)

For continuous variables:

F_X(x) = ∫_{−∞}^{x} f(t) dt

Example

Assume that a lab rat is observed for a while. The number of meals it eats each day is the random variable X, and the number of hours it sleeps each day is the random variable Y. The rat:

eats two meals half the time; on the remaining days it eats either one meal or no food, with equal probability.
sleeps 10 ± 4 hours each day, uniformly distributed.

e.g., it is equally likely to sleep 9.88232 hours as it is to sleep 13.432323 hours.

f_X(x) =
  0.25, x = 0
  0.25, x = 1
  0.50, x = 2

F_X(x) =
  0.25, x = 0
  0.50, x = 1
  1,    x = 2
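The discrete CDF is just a running sum of the pmf; a small sketch using the meal pmf above:

```python
from itertools import accumulate

# pmf of X (meals per day) from the example above.
pmf = {0: 0.25, 1: 0.25, 2: 0.50}

# CDF: cumulative sums of the pmf over increasing x.
xs = sorted(pmf)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))
print(cdf)  # {0: 0.25, 1: 0.5, 2: 1.0}
```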

Example (cont.)

F_Y(y) =
  0,           y < 6
  (y − 6) / 8, 6 ≤ y ≤ 14
  1,           y > 14

f_Y(y) = dF_Y(y) / dy

in other words:

f_Y(y) =
  0,   y < 6
  1/8, 6 ≤ y ≤ 14
  0,   y > 14

Expectation

An important summary statistic of an r.v. is what we expect it will be.

This is called the Expected Value or mean of an r.v.

For discrete variables:

E[X] = μ_X = Σ_x x f(x) = x_1 f(x_1) + x_2 f(x_2) + ... + x_k f(x_k)

For continuous variables:

E[X] = μ_X = ∫_{−∞}^{+∞} x f(x) dx

Example

The expected number of meals for our rat:

E[X] = (0)(0.25) + (1)(0.25) + (2)(0.50) = 1.25 meals

The expected sleep time:

E[Y] = ∫_{6}^{14} (y/8) dy = (14² − 6²) / (8 × 2) = 10 hours
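Both numbers can be reproduced directly; a sketch computing the discrete case exactly and approximating the continuous case by sampling (sample size is arbitrary):

```python
import random

# Discrete: E[X] as a probability-weighted sum.
pmf = {0: 0.25, 1: 0.25, 2: 0.50}
print(sum(x * p for x, p in pmf.items()))   # 1.25

# Continuous: E[Y] for Y ~ Uniform(6, 14), by Monte Carlo.
trials = 100_000
print(sum(random.uniform(6, 14) for _ in range(trials)) / trials)  # ~10
```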

Functions on Expectation

The expected value of any arbitrary function g(x) applied to X is:

E[g(X)] = Σ_x g(x) f(x) = g(x_1) f(x_1) + g(x_2) f(x_2) + ... + g(x_k) f(x_k)

or

E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx

Example

The expected number of hours above 8 hours that our rat sleeps (i.e., g(Y) = Y − 8):

E[g(Y)] = ∫_{6}^{14} ((y − 8)/8) dy = ((14 − 8)² − (6 − 8)²) / (8 × 2) = 2

Linearity and Product of Expectation

Expectation is linear:

E[c_1 X + c_2 Y] = c_1 E[X] + c_2 E[Y]

The expected value of the product of two variables is defined in terms of the joint probability f_XY(x, y):

E[XY] = ∫_x ∫_y x y f_XY(x, y) dx dy

Caution

f_XY(x, y) = f_X(x) f_Y(y) if and only if X and Y are independent.

Example

What is the expected number of meals per hour of sleep for our rat? Assuming independence (f_XY(x, y) = f_X(x) f_Y(y)):

E[X/Y] = ∫_x ∫_y x (1/y) f_X(x) f_Y(y) dx dy

E[X/Y] = Σ_x x f_X(x) ∫_y (1/y) f_Y(y) dy

       = 1.25 ∫_{6}^{14} (f_Y(y)/y) dy = 1.25 × (ln(14) − ln(6)) / 8 = 0.132

Note that it is not 0.125 (i.e., 1.25/10)!
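A quick numeric check of both the closed form and the independence assumption; the sampling scheme is our own:

```python
import math
import random

# E[X/Y] = E[X] * E[1/Y] under independence; Y ~ Uniform(6, 14).
exact = 1.25 * (math.log(14) - math.log(6)) / 8
print(round(exact, 3))                        # 0.132

# Monte Carlo cross-check, sampling X from its pmf and Y uniformly.
xs, ps = (0, 1, 2), (0.25, 0.25, 0.50)
trials = 100_000
est = sum(random.choices(xs, ps)[0] / random.uniform(6, 14)
          for _ in range(trials)) / trials
print(round(est, 3))                          # ~0.132
```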

What if we assume the rat eats one less meal than normal if it got less than 8 hours of sleep?

Variance

Definition

Variance is defined as the expected squared deviation from the mean.

Var(X) = E[(X − μ_X)²] = E[X²] − μ_X²

Standard deviation σ is defined as:

σ = √Var(X)

Example

The variance of the sleep time for the rat is:

Var(Y) = E[Y²] − μ_Y² = ∫_{6}^{14} y² f_Y(y) dy − 100
       = (14³ − 6³) / (3 × 8) − 100 = 5.333…

σ_Y = 2.309
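The same computation in code; the closed-form pieces follow directly from the uniform pdf above:

```python
# Var(Y) = E[Y^2] - E[Y]^2 for Y ~ Uniform(6, 14), computed exactly.
a, b = 6, 14
ey = (a + b) / 2                      # E[Y] = 10
ey2 = (b**3 - a**3) / (3 * (b - a))   # E[Y^2] = integral of y^2/8 over [6, 14]
var = ey2 - ey**2
print(var, var ** 0.5)                # 5.333..., 2.309...
```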

Covariance

Definition

Covariance is a measure of the dependence of two random variables on each other.

Cov(X,Y) = E[(X − μ_X)(Y − μ_Y)]

Notice that:

E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X] E[Y]

If X and Y are independent, then f_XY(x, y) = f_X(x) f_Y(y):

f_XY(x, y) = f_X(x) f_Y(y) ⇒ E[XY] = E[X] E[Y]

E[XY] = E[X] E[Y] ⇒ Cov(X,Y) = 0

Covariance is zero if the variables are independent (the converse is not true in general).

Correlation

Definition

Pearson Correlation is a normalized measure of the dependence of two random variables on each other.

corr(X,Y) = Cov(X,Y) / (σ_X σ_Y)

Essentially the covariance normalized by the product of the standard deviations.

Like covariance, corr(X,Y) = 0 if X and Y are independent.

Furthermore,

corr(X,Y) = 1 denotes perfect positive linear dependence
corr(X,Y) = −1 denotes perfect negative linear dependence
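A sample-based illustration; the linear relationship and noise levels below are arbitrary choices:

```python
import random

# Sample Pearson correlation of two sequences.
def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

xs = [random.gauss(0, 1) for _ in range(10_000)]
print(corr(xs, [2 * x + random.gauss(0, 1) for x in xs]))  # strong positive, ~0.9
print(corr(xs, [random.gauss(0, 1) for _ in xs]))          # ~0 (independent)
```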

Correlation Illustration

[Figure: scatter plots of (X, Y) samples with correlations 1, 0.8, 0.4, 0, −0.4, −0.8 and −1, including perfectly linear and independent cases. Illustration by: Denis Boigelot]

Distributions of More than One RV

Assume we do a survey of 100 people and ask them how many kids (rows) and how many vases (columns) they have:

kids \ vases    1    2    3    4   Total
 1              5    5    5    5    20
 2             15   12   10    3    40
 3             11    8    5    1    25
 4              8    5    2    0    15
Total          39   30   22    9   100

The pmf of the joint distribution, f(k,v), is N(k,v)/100.

e.g., P(K = 3, V = 4) = 0.01 whereas P(K = 1, V = 4) = 0.05.
It seems the number of kids is inversely correlated with the number of vases.
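The joint pmf and its marginals can be read straight off the table; a minimal sketch:

```python
# Joint counts N(k, v) from the survey table (rows: kids k, cols: vases v).
N = {
    1: {1: 5, 2: 5, 3: 5, 4: 5},
    2: {1: 15, 2: 12, 3: 10, 4: 3},
    3: {1: 11, 2: 8, 3: 5, 4: 1},
    4: {1: 8, 2: 5, 3: 2, 4: 0},
}
total = sum(sum(row.values()) for row in N.values())  # 100

def f(k, v):                      # joint pmf
    return N[k][v] / total

print(f(3, 4), f(1, 4))           # 0.01, 0.05
# Marginal pmf of k: sum the joint pmf over v.
print({k: sum(f(k, v) for v in N[k]) for k in N})  # {1: 0.2, 2: 0.4, 3: 0.25, 4: 0.15}
```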

Joint and Marginal Distributions

The joint distribution of two r.v.s then satisfies:

∫_x ∫_y f(x, y) dx dy = 1

We can find f(x) by integrating (or summing) over y:

f(x) = ∫_y f(x, y) dy

and vice versa for f(y).

f(x) and f(y) are the marginal distributions.

So, the joint distribution carries information about the marginal distributions as well.

The converse is not true unless f(x, y) = f(x) f(y), i.e., unless X and Y are independent.
If f(x, y) = f(x) f(y), then Cov(X,Y) = 0.
