Module #16: Probability Theory

Rosen 5th ed., ch. 5

Let’s move on to probability, ch. 5.

Terminology

• A (stochastic) experiment is a procedure that yields one of a given set of possible outcomes.

• The sample space S of the experiment is the set of possible outcomes.

• An event is a subset of the sample space.

• A random variable is a function that assigns a real value to each outcome of an experiment.

Normally, a probability is related to an experiment or a trial.

Let’s take flipping a coin as an example: what are the possible outcomes?

Either heads or tails (the front or back side of the coin) will face upwards. After a sufficient number of tosses, we can “statistically” conclude that the probability of heads is 0.5. In rolling a die, there are 6 outcomes. Suppose we want to calculate the probability of the event that the die shows an odd number. What is that probability?

Probability: Laplacian Definition

• First, assume that all outcomes in the sample space are equally likely

– This term still needs to be defined.

• Then, the probability of event E in sample space S is given by Pr[E] = |E|/|S|.

Even though there are many definitions of probability, I would like to use the one from Laplace. The expression “equally likely” may be a little vague from the perspective of pure mathematics, but from an engineering viewpoint I think that is OK.
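The Laplacian definition can be tried out on the die question above. The following is a minimal sketch, assuming a fair six-sided die; the function name `laplace_probability` is made up for illustration and is not from the textbook.

```python
from fractions import Fraction

def laplace_probability(event, sample_space):
    """Pr[E] = |E| / |S|, assuming all outcomes are equally likely."""
    sample_space = set(sample_space)
    event = set(event) & sample_space  # ignore anything outside S
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}               # sample space S for one roll
odd = {s for s in die if s % 2 == 1}   # event E: the die shows an odd number

print(laplace_probability(odd, die))   # 1/2
```

Exact fractions are used instead of floats so that the result reads the same way as the hand calculation |E|/|S| = 3/6.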

Probability of Complementary Events

• Let E be an event in a sample space S.

• Then, $\overline{E}$ represents the complementary event.

• $\Pr[\overline{E}] = 1 - \Pr[E]$

• Pr[S] = 1

Probability of Unions of Events

• Let E1, E2 ⊆ S

• Then: Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2]

– By the inclusion-exclusion principle.

Mutually Exclusive Events

• Two events E1, E2 are called mutually exclusive if they are disjoint: E1 ∩ E2 = ∅

• Note that two mutually exclusive events cannot both occur in the same instance of a given experiment.

• For mutually exclusive events, Pr[E1 ∪ E2] = Pr[E1] + Pr[E2].

Exhaustive Sets of Events

• A set E = {E1, E2, …} of events in the sample space S is exhaustive if

$$\bigcup_i E_i = S$$

• An exhaustive set of events that are all mutually exclusive with each other has the property that

$$\sum_i \Pr[E_i] = 1$$

Independent Events

• Two events E, F are independent if Pr[E ∩ F] = Pr[E]·Pr[F].

• Relates to product rule for number of ways of doing two independent tasks

• Example: Flip a coin, and roll a die.

Pr[(quarter is heads) ∩ (die is 1)] = Pr[quarter is heads] × Pr[die is 1]

Now the question is: how can we figure out whether two events are independent or not?
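One direct way to answer this for a small experiment is to enumerate the sample space and test the product condition. This is only a sketch, assuming a fair coin and a fair die so that all 12 (coin, die) pairs are equally likely; the helper `pr` is a made-up name.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all assumed equally likely.
S = set(product(["H", "T"], range(1, 7)))

def pr(event):
    """Laplacian probability |E| / |S| of an event (a subset of S)."""
    return Fraction(len(event), len(S))

heads = {s for s in S if s[0] == "H"}   # E: the coin is heads
one = {s for s in S if s[1] == 1}       # F: the die is 1

# Independence test: Pr[E ∩ F] == Pr[E] * Pr[F]
independent = pr(heads & one) == pr(heads) * pr(one)
print(pr(heads & one), pr(heads) * pr(one), independent)  # 1/12 1/12 True
```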

Conditional Probability

• Let E, F be events such that Pr[F] > 0.

• Then, the conditional probability of E given F, written Pr[E|F], is defined as Pr[E ∩ F]/Pr[F].

• This is the probability that E would turn out to be true, given just the information that F is true.

• If E and F are independent, Pr[E|F] = Pr[E].

Here is the most important part of probability: conditional probability.

With conditional probability, we can figure out whether there is a correlation or dependency between two events.

Bayes’ theorem: example

• Suppose 1% of the population has AIDS

• Probability that the test is positive for a person who has AIDS: 95%

• Probability that the test is negative for a person who does not have AIDS: 90%

• What is the probability that someone who has a positive result is actually an AIDS patient?

• H: event that a person has AIDS

• D: event of a positive result

• $P[D] = P[D|H]\,P[H] + P[D|\overline{H}]\,P[\overline{H}] = 0.95 \times 0.01 + 0.1 \times 0.99 = 0.1085$

• $P[H|D] = P[D|H]\,P[H] / P[D] = 0.95 \times 0.01 / 0.1085 = 0.0876$
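The arithmetic of this example can be checked with a few lines of code. This is a sketch only: the function name `posterior` and its parameter names are made up for illustration, and the numbers are the ones given in the example (the false-positive rate 0.10 is taken as 1 minus the 90% figure).

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """P[H|D] by Bayes' theorem, with D the event of a positive result."""
    # Total probability of a positive result, P[D].
    p_pos = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    return p_pos_given_h * prior / p_pos

# P[H] = 0.01, P[D|H] = 0.95, P[D|not H] = 1 - 0.90 = 0.10
p = posterior(prior=0.01, p_pos_given_h=0.95, p_pos_given_not_h=0.10)
print(round(p, 4))  # 0.0876
```

Even with a fairly accurate test, the posterior stays below 9% because the prior P[H] is so small.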

Expectation Values

• For a random variable X(s) having a numeric range, its expectation value or expected value or weighted average value or arithmetic mean value E[X] is defined as

$$E[X] :\equiv \sum_{s \in S} p(s)\,X(s)$$

where p(s) is the probability of outcome s.

Linearity of Expectation

• Let X1, X2 be any two random variables derived from the same sample space. Then:

• E[X1+X2] = E[X1] + E[X2]

• E[aX1 + b] = aE[X1] + b

Variance

• The variance Var[X] = σ²(X) of a random variable X is the expected value of the square of the difference between the value of X and its expectation value E[X]:

$$\mathrm{Var}[X] :\equiv \sum_{s \in S} \big(X(s) - E[X]\big)^2\, p(s)$$

• The standard deviation or root-mean-square (RMS) difference of X, σ(X) :≡ (Var[X])^(1/2).
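As a small worked example of the expectation and variance sums, here is a sketch for a fair die, where X(s) is simply the number shown. The function names and the dictionary representation of p(s) are assumptions made for illustration.

```python
from fractions import Fraction

def expectation(p, X):
    """E[X] = sum over s in S of p(s) * X(s)."""
    return sum(p[s] * X(s) for s in p)

def variance(p, X):
    """Var[X] = sum over s in S of (X(s) - E[X])^2 * p(s)."""
    mean = expectation(p, X)
    return sum((X(s) - mean) ** 2 * p[s] for s in p)

# Fair die: p(s) = 1/6 for each face, X(s) = the number shown.
p = {s: Fraction(1, 6) for s in range(1, 7)}

def X(s):
    return s

print(expectation(p, X), variance(p, X))  # 7/2 35/12
```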

Visualizing Sample Space

• 1. Listing

– S = {Head, Tail}

• 2. Venn Diagram

• 3. Contingency Table

• 4. Decision Tree Diagram

Venn Diagram

Experiment: Toss 2 Coins. Note Faces.

Sample Space: S = {HH, HT, TH, TT}

[Figure: Venn diagram of S containing the outcomes HH, HT, TH, and TT, with labels for an outcome, for the event Tail, and for S.]

Contingency Table

1st Coin \ 2nd Coin | Head | Tail | Total
Head | HH | HT | HH, HT
Tail | TH | TT | TH, TT
Total | HH, TH | HT, TT | S

Sample Space: S = {HH, HT, TH, TT}. Each cell is an outcome; each row or column total is a simple event (for example, the row total HH, HT is the event Head on 1st Coin).

Event Probability Using Contingency Table

Event | B1 | B2 | Total
A1 | P(A1 ∩ B1) | P(A1 ∩ B2) | P(A1)
A2 | P(A2 ∩ B1) | P(A2 ∩ B2) | P(A2)
Total | P(B1) | P(B2) | 1

The interior cells are joint probabilities; the row and column totals are marginal (simple) probabilities.

Marginal probability

• Let S be partitioned into m x n disjoint sets Ei and Fj where the general subset is denoted Ei ∩ Fj. Then the marginal probability of Ei is

$$\Pr[E_i] = \sum_{j=1}^{n} \Pr[E_i \cap F_j]$$
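The marginal probability is just a row sum of the joint table. The sketch below assumes two fair coin tosses, with Ei the result of the 1st coin and Fj the result of the 2nd coin; the dictionary layout and the name `marginal_first` are made up for illustration.

```python
from fractions import Fraction

# Joint probabilities Pr[Ei ∩ Fj] for two fair coin tosses:
# Ei = result of the 1st coin, Fj = result of the 2nd coin.
joint = {
    ("H", "H"): Fraction(1, 4), ("H", "T"): Fraction(1, 4),
    ("T", "H"): Fraction(1, 4), ("T", "T"): Fraction(1, 4),
}

def marginal_first(joint, e):
    """Pr[Ei] = sum over j of Pr[Ei ∩ Fj]."""
    return sum(pr for (ei, _fj), pr in joint.items() if ei == e)

print(marginal_first(joint, "H"))  # 1/2
```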

Tree Diagram

[Figure: tree diagram for tossing 2 coins. The first branch is H or T, each followed by a second branch H or T, ending in the outcomes HH, HT, TH, TT.]

Sample Space: S = {HH, HT, TH, TT}

Discrete Random Variable

– Possible values (outcomes) are discrete

• E.g., natural number (0, 1, 2, 3 etc.)

– Obtained by Counting

– Usually Finite Number of Values

• But could be infinite (must be “countable”)

Discrete Probability Distribution

1. List of All Possible [x, p(x)] Pairs

– x = Value of Random Variable (Outcome)

– p(x) = Probability Associated with Value

2. Mutually Exclusive (No Overlap)

3. Collectively Exhaustive (Nothing Left Out)

4. 0 ≤ p(x) ≤ 1

5. Σ p(x) = 1

Visualizing Discrete Probability Distributions

Listing: { (0, .25), (1, .50), (2, .25) }

Table:

# Tails | f(x) Count | p(x)
0 | 1 | .25
1 | 2 | .50
2 | 1 | .25

Graph: [Figure: bar chart of p(x) against x = 0, 1, 2, with axis ticks .00, .25, .50.]

Equation:

$$p(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$

Cumulative Distribution Function (CDF)

$$F(a) = \Pr[X \le a] = \sum_{x \le a} p(x)$$
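The listing { (0, .25), (1, .50), (2, .25) } and the cumulative distribution function can both be reproduced by counting outcomes. This is a sketch, assuming X is the number of tails in two fair coin tosses; the names `pmf` and `cdf` are chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = number of tails in two fair coin tosses.
outcomes = list(product("HT", repeat=2))              # HH, HT, TH, TT
counts = Counter(o.count("T") for o in outcomes)      # the "count" column f(x)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

def cdf(a):
    """F(a) = Pr[X <= a] = sum of p(x) over all x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(1))  # 3/4
```

Here `pmf` holds the pairs (0, 1/4), (1, 1/2), (2, 1/4), and `cdf(2)` adds up to 1 as required of a probability distribution.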

Binomial Distribution

1. Sequence of n Identical Trials

2. Each Trial Has 2 Outcomes

– ‘Success’ (Desired/Specified Outcome) or ‘Failure’

3. Constant Trial Probability

4. Trials Are Independent

5. # of successes in n trials is a binomial random variable

Binomial Probability Distribution Function

$$p(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$

p(x) = Probability of x ‘Successes’

n = Sample Size

p = Probability of ‘Success’ (q = 1 − p)

x = Number of ‘Successes’ in Sample (x = 0, 1, 2, ..., n)
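The binomial formula translates almost directly into code. The following is a minimal sketch using Python's standard `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = n! / (x! (n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Number of tails in n = 2 fair tosses (p = 0.5).
tails = [binomial_pmf(x, 2, 0.5) for x in range(3)]
print(tails)  # [0.25, 0.5, 0.25]
```

The result matches the (0, .25), (1, .50), (2, .25) listing for the number of tails in two tosses.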

Binomial Distribution Characteristics

Mean

$$\mu = E(x) = np$$

Standard Deviation

$$\sigma = \sqrt{np(1-p)}$$

[Figure: two bar charts of P(X) against X = 0, 1, ..., 5, one for n = 5, p = 0.1 and one for n = 5, p = 0.5.]
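The closed forms for the mean and standard deviation can be compared against values computed directly from the probability distribution. This is a sketch for the two cases in the figure (n = 5 with p = 0.1 and p = 0.5); the function names are made up for illustration.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_sd(n, p):
    """Mean and standard deviation computed directly from the pmf."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, sqrt(var)

for p in (0.1, 0.5):
    mean, sd = mean_and_sd(5, p)
    # Columns: p, mean from pmf, sd from pmf, np, sqrt(np(1 - p))
    print(p, round(mean, 6), round(sd, 6),
          round(5 * p, 6), round(sqrt(5 * p * (1 - p)), 6))
```

Up to floating-point rounding, the columns computed from the pmf should agree with the closed forms np and sqrt(np(1 − p)).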

Useful Observation 1

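• For any X and Y

$$\begin{aligned}
E(X+Y) &= \sum_x \sum_y (x+y)\,p(x,y) \\
&= \sum_x \sum_y x\,p(x,y) + \sum_x \sum_y y\,p(x,y) \\
&= \sum_x x \sum_y p(x,y) + \sum_y y \sum_x p(x,y) \\
&= \sum_x x\,p(x) + \sum_y y\,p(y) \\
&= E(X) + E(Y)
\end{aligned}$$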

One Binary Outcome

• Random variable X, one binary outcome

• Code success as 1, failure as 0

• P(success)=p, P(failure)=(1-p)=q

• E(X) = p

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= 1^2 \cdot p + 0^2 \cdot q - p^2 \\
&= p - p^2 = p(1-p)
\end{aligned}$$

Mean of a Binomial

• Independent, identically distributed

• X1, …, Xn; E(Xi) = p; Binomial X = Σ Xi

$$\begin{aligned}
E(X) &= E\Big(\sum_i X_i\Big) \\
&= \sum_i E(X_i) \qquad \text{(by useful observation 1)} \\
&= n\,E(X_1) \\
&= np
\end{aligned}$$


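Useful Observation 2

• For independent X and Y

$$\begin{aligned}
E(XY) &= \sum_x \sum_y xy\,p(x,y) \\
&= \sum_x \sum_y xy\,p(x)\,p(y) \\
&= \sum_x x\,p(x) \sum_y y\,p(y) = \sum_x x\,p(x)\,E(Y) \\
&= E(Y) \sum_x x\,p(x) = E(Y)\,E(X) = E(X)\,E(Y)
\end{aligned}$$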


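Useful Observation 3

• For independent X and Y

$$\begin{aligned}
\mathrm{Var}(X+Y) &= E\big((X+Y)^2\big) - \big[E(X+Y)\big]^2 \\
&= E(X^2 + 2XY + Y^2) - \big[E(X) + E(Y)\big]^2 \\
&= E(X^2) + 2E(XY) + E(Y^2) - [E(X)]^2 - 2E(X)E(Y) - [E(Y)]^2 \\
&= E(X^2) - [E(X)]^2 + E(Y^2) - [E(Y)]^2 + \underbrace{2E(XY) - 2E(X)E(Y)}_{\text{cancelled by obs. 2}} \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y)
\end{aligned}$$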

Variance of Binomial

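• Independent, identically distributed

• X1, …, Xn; E(Xi) = p; Binomial X = Σ Xi

$$\begin{aligned}
\mathrm{Var}(X) &= \mathrm{Var}\Big(\sum_i X_i\Big) \\
&= \sum_i \mathrm{Var}(X_i) \\
&= n\,\mathrm{Var}(X_1) \\
&= np(1-p)
\end{aligned}$$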
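The mean np and variance np(1 − p) can also be confirmed exactly by treating X as a sum of n independent 0/1 trials and enumerating all 2^n outcomes. This is a brute-force sketch for small n only; the function name `binomial_moments` and the choice n = 5, p = 1/10 are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

def binomial_moments(n, p):
    """Exact E(X) and Var(X) for X = X1 + ... + Xn, by enumerating 2^n outcomes."""
    q = 1 - p
    mean = Fraction(0)
    second = Fraction(0)                      # accumulates E(X^2)
    for bits in product((0, 1), repeat=n):    # one tuple of 0/1 results per outcome
        x = sum(bits)                         # X = sum of the Xi
        prob = p**x * q ** (n - x)            # independent trials: probabilities multiply
        mean += x * prob
        second += x * x * prob
    return mean, second - mean**2             # Var(X) = E(X^2) - [E(X)]^2

p = Fraction(1, 10)
moments = binomial_moments(5, p)
print(moments == (5 * p, 5 * p * (1 - p)))    # True
```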


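Useful Observation 4

• For any X

$$\begin{aligned}
\mathrm{Var}(kX) &= E(k^2X^2) - [E(kX)]^2 \\
&= k^2 E(X^2) - [k\,E(X)]^2 \\
&= k^2 E(X^2) - k^2 [E(X)]^2 \\
&= k^2\,\mathrm{Var}(X)
\end{aligned}$$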

Continuous random variable

Continuous Prob. Density Function

1. Mathematical Formula

2. Shows All Values, x, and Frequencies, f(x)

– f(x) Is Not Probability

3. Properties

– $\int_{\text{All } x} f(x)\,dx = 1$ (Area Under Curve)

– $f(x) \ge 0$, $a \le x \le b$

[Figure: curve of f(x) (Frequency) against x (Value) between a and b.]

Continuous Random Variable Probability

Probability Is Area Under Curve!

$$P(c \le x \le d) = \int_c^d f(x)\,dx$$

[Figure: curve of f(x) against X with the area between c and d shaded.]

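Uniform Distribution

1. Equally Likely Outcomes

2. Probability Density

$$f(x) = \frac{1}{d - c}$$

3. Mean & Standard Deviation

$$\mu = \frac{c + d}{2}, \qquad \sigma = \frac{d - c}{\sqrt{12}}$$

[Figure: rectangle of height 1/(d − c) over the interval from c to d on the x axis, with the mean and median marked at the center.]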

Uniform Distribution Example

• You’re the production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive.

• Suppose the amount dispensed has a uniform distribution.

• What is the probability that less than 11.8 oz. is dispensed?

Uniform Distribution Solution

P(11.5 ≤ x ≤ 11.8) = (Base)(Height) = (11.8 − 11.5)(1) = 0.30

Height:

$$\frac{1}{d - c} = \frac{1}{12.5 - 11.5} = \frac{1}{1} = 1.0$$

[Figure: uniform density f(x) of height 1.0 between 11.5 and 12.5, with the area from 11.5 to 11.8 shaded.]
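The base-times-height calculation generalizes to any point x between c and d. Here is a minimal sketch, with the made-up helper name `uniform_cdf`, applied to the bottling example.

```python
def uniform_cdf(x, c, d):
    """Pr[X <= x] for X uniform on [c, d]: base (x - c) times height 1 / (d - c)."""
    if x <= c:
        return 0.0
    if x >= d:
        return 1.0
    return (x - c) / (d - c)

# Probability that less than 11.8 oz. is dispensed when X is uniform on [11.5, 12.5].
print(round(uniform_cdf(11.8, 11.5, 12.5), 2))  # 0.3
```

Because the variable is continuous, "less than 11.8" and "at most 11.8" have the same probability.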

Normal Distribution

1. Describes Many Random Processes or Continuous Phenomena

2. Can Be Used to Approximate Discrete Probability Distributions

– Example: Binomial

3. Basis for Classical Statistical Inference

4. A.k.a. Gaussian distribution

Normal Distribution

1. ‘Bell-Shaped’ & Symmetrical

2. Mean, Median, Mode Are Equal

4. Random Variable Has Infinite Range

Mean: average; Median: middle value; Mode: most frequent value

[Figure: bell-shaped curve of f(X) against X, with the mean, median, and mode marked at the center.]

* light-tailed distribution

Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

f(x) = Frequency of Random Variable x

σ = Population Standard Deviation

π = 3.14159; e = 2.71828

x = Value of Random Variable (−∞ < x < ∞)

μ = Population Mean
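The density formula can be evaluated directly. This is a small sketch using only Python's standard `math` module; the function name `normal_pdf` is an illustrative choice.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-(1/2) * ((x - mu) / sigma)^2)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Peak of the standard normal curve (mu = 0, sigma = 1) at x = 0.
print(round(normal_pdf(0.0, 0.0, 1.0), 5))  # 0.39894
```

Note that this value is a density (a height of the curve), not a probability; probabilities come from areas under the curve.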

Effect of Varying Parameters (μ & σ)

[Figure: three normal curves labeled A, B, and C, plotted as f(X) against X.]

Normal Distribution Probability

$$P(c \le x \le d) = \int_c^d f(x)\,dx = \;?$$

Probability is area under curve!

[Figure: normal curve f(x) with the area between c and d shaded.]

Infinite Number of Tables

Normal distributions differ by mean & standard deviation.

Each distribution would require its own table.

That’s an infinite number!

Standardize the Normal Distribution

$$Z = \frac{X - \mu}{\sigma}$$

[Figure: a normal distribution of X with mean μ and standard deviation σ is mapped to the standardized normal distribution of Z with μ = 0 and σ = 1.]

One table!

Intuitions on Standardizing

• Subtracting μ from each value X just moves the curve around, so values are centered on 0 instead of on μ

• Once the curve is centered, dividing each value by σ > 1 moves all values toward 0, pressing the curve

Standardizing Example

Normal distribution: μ = 5, σ = 10, X = 6.2

$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = .12$$

[Figure: the value X = 6.2 on the normal distribution with μ = 5, σ = 10 corresponds to Z = .12 on the standardized normal distribution with μ = 0, σ = 1.]

Obtaining the Probability

Standardized Normal Probability Table (Portion)

Z | .00 | .01 | .02
0.0 | .0000 | .0040 | .0080
0.1 | .0398 | .0438 | .0478
0.2 | .0793 | .0832 | .0871
0.3 | .1179 | .1217 | .1255

Probabilities: the entry in row 0.1, column .02 gives .0478 for Z = .12.

[Figure: standardized normal curve (μ = 0, σ = 1) with the area between 0 and Z = .12 shaded; shaded area exaggerated.]
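The table lookup can be cross-checked numerically. The sketch below assumes the table entries are areas under the standard normal curve between 0 and Z (which matches the .0000 entry at Z = 0.00), and uses the standard library's `math.erf` to evaluate the normal CDF; the function names are made up for illustration.

```python
from math import erf, sqrt

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z
    return phi - 0.5

z = standardize(6.2, 5.0, 10.0)
print(round(z, 2), round(area_zero_to_z(z), 4))  # 0.12 0.0478
```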
