Probability Theory
Longin Jan Latecki, Temple University
Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop


Page 1:

Probability Theory

Longin Jan Latecki, Temple University

Slides based on slides by Aaron Hertzmann, Michael P. Frank, and Christopher Bishop

Page 2:

What is reasoning?
• How do we infer properties of the world?
• How should computers do it?

Page 3:

Aristotelian logic
• If A is true, then B is true
• A is true
• Therefore, B is true

A: My car was stolen
B: My car isn’t where I left it

Page 4:

The real world is uncertain
Problems with pure logic:
• Don’t have perfect information
• Don’t really know the model
• Model is non-deterministic

So let’s build a logic of uncertainty!

Page 5:

Beliefs
Let B(A) = “belief A is true”
    B(¬A) = “belief A is false”

e.g., A = “my car was stolen”
      B(A) = “belief my car was stolen”

Page 6:

Reasoning with beliefs
Cox Axioms [Cox 1946]
1. Ordering exists
   – e.g., B(A) > B(B) > B(C)
2. Negation function exists
   – B(¬A) = f(B(A))
3. Product function exists
   – B(A, Y) = g(B(A|Y), B(Y))
This is all we need!

Page 7:

The Cox Axioms uniquely define a complete system of reasoning: This is probability theory!

Page 8:

Principle #1:

“Probability theory is nothing more than common sense reduced to calculation.”
   - Pierre-Simon Laplace, 1814

Page 9:

Definitions

P(A) = “probability A is true” = B(A) = “belief A is true”

P(A) ∈ [0, 1]
P(A) = 1 iff “A is true”
P(A) = 0 iff “A is false”

P(A|B) = “prob. of A if we knew B”
P(A, B) = “prob. of A and B”

Page 10:

Examples
A: “my car was stolen”
B: “I can’t find my car”

P(A) = .1
P(A) = .5

P(B|A) = .99
P(A|B) = .3

Page 11:

Basic rules

Sum rule:
P(A) + P(¬A) = 1

Example:
A: “it will rain today”
p(A) = .9
p(¬A) = .1

Page 12:

Basic rules

Sum rule:
∑i P(Ai) = 1
when exactly one of the Ai must be true

Page 13:

Basic rules

Product rule:
P(A,B) = P(A|B) P(B) = P(B|A) P(A)

Page 14:

Basic rules: Conditioning

Sum rule:     ∑i P(Ai) = 1           →  ∑i P(Ai|B) = 1
Product rule: P(A,B) = P(A|B) P(B)   →  P(A,B|C) = P(A|B,C) P(B|C)

Page 15:

Summary

Product rule: P(A,B) = P(A|B) P(B)
Sum rule:     ∑i P(Ai) = 1

All derivable from the Cox axioms; must obey rules of common sense.

Now we can derive new rules.

Page 16:

Example
A = you eat a good meal tonight
B = you go to a highly recommended restaurant
¬B = you go to an unknown restaurant

Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5

What is P(A)?

Page 17:

Example, continued
Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5

1 = P(B) + P(¬B)                        (Sum rule)
1 = P(B|A) + P(¬B|A)                    (Conditioning)
P(A) = P(B|A)P(A) + P(¬B|A)P(A)
     = P(A,B) + P(A,¬B)                 (Product rule)
     = P(A|B)P(B) + P(A|¬B)P(¬B)        (Product rule)
     = .8 × .7 + .5 × (1 − .7) = .71
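A minimal Python sketch (not part of the original slides) that checks the arithmetic above; the variable names are only illustrative:

```python
# Hypothetical check of the slide's marginalization (not from the original deck).
p_B = 0.7             # P(B): go to a highly recommended restaurant
p_A_given_B = 0.8     # P(A|B)
p_A_given_notB = 0.5  # P(A|¬B)

# P(A) = P(A|B)P(B) + P(A|¬B)P(¬B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
print(round(p_A, 2))  # 0.71
```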

Page 18:

Basic rules: Marginalizing

P(A) = ∑i P(A, Bi)   for mutually exclusive Bi

e.g., p(A) = p(A, B) + p(A, ¬B)

Page 19:

Principle #2:

Given a complete model, we can derive any other probability.

Page 20:

Inference

Model: P(B) = .7, P(A|B) = .8, P(A|¬B) = .5

If we know A, what is P(B|A)? (“Inference”)

P(A,B) = P(A|B) P(B) = P(B|A) P(A)

Bayes’ Rule:
P(B|A) = P(A|B) P(B) / P(A) = .8 × .7 / .71 ≈ .79
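A short Python sketch (not from the slides) reproducing this inference step numerically:

```python
# Hypothetical check of the Bayes' rule step on this slide.
p_B = 0.7
p_A_given_B = 0.8
p_A_given_notB = 0.5

p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)  # marginalization, as on Page 17
p_B_given_A = p_A_given_B * p_B / p_A                  # Bayes' rule
print(round(p_B_given_A, 2))  # ~0.79
```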

Page 21:

Inference

Bayes’ Rule:

   P(M|D) = P(D|M) P(M) / P(D)

   P(M|D): posterior    P(D|M): likelihood    P(M): prior

Page 22:

Principle #3:

Describe your model of the world, and then compute the probabilities of the unknowns given the observations.

Page 23:

Principle #3a:

Use Bayes’ Rule to infer unknown model variables from observed data.

   P(M|D) = P(D|M) P(M) / P(D)

   P(M|D): posterior    P(D|M): likelihood    P(M): prior

Page 24:

Bayes’ Theorem

   posterior ∝ likelihood × prior

Rev. Thomas Bayes, 1702-1761

Page 25:

Page 26:

Page 27:

Page 28:

Example
Suppose a red die and a blue die are rolled. The sample space:

      1  2  3  4  5  6
  1   x  x  x  x  x  x
  2   x  x  x  x  x  x
  3   x  x  x  x  x  x
  4   x  x  x  x  x  x
  5   x  x  x  x  x  x
  6   x  x  x  x  x  x

Are the events “sum is 7” and “the blue die is 3” independent?

Page 29:

|S| = 36

      1  2  3  4  5  6
  1   x  x  x  x  x  x
  2   x  x  x  x  x  x
  3   x  x  x  x  x  x
  4   x  x  x  x  x  x
  5   x  x  x  x  x  x
  6   x  x  x  x  x  x

|sum is 7| = 6
|blue die is 3| = 6
|intersection| = 1

p(sum is 7 and blue die is 3) = 1/36
p(sum is 7) p(blue die is 3) = 6/36 × 6/36 = 1/36

Thus, p((sum is 7) and (blue die is 3)) = p(sum is 7) p(blue die is 3).
The events “sum is 7” and “the blue die is 3” are independent.
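The same counting argument can be checked by brute-force enumeration; a small Python sketch (not in the original slides, names are illustrative):

```python
from itertools import product

# Enumerate the 36 equally likely (red, blue) outcomes and test independence.
outcomes = list(product(range(1, 7), repeat=2))   # (red, blue)

sum_is_7  = {o for o in outcomes if o[0] + o[1] == 7}
blue_is_3 = {o for o in outcomes if o[1] == 3}
both      = sum_is_7 & blue_is_3

def p(event):
    return len(event) / len(outcomes)

print(p(both))                       # 1/36
print(p(sum_is_7) * p(blue_is_3))    # 6/36 * 6/36 = 1/36  -> independent
```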

Page 30:

Conditional Probability
• Let E, F be any events such that Pr(F) > 0.
• Then the conditional probability of E given F, written Pr(E|F), is defined as Pr(E|F) :≡ Pr(EF)/Pr(F).
• This is what our probability that E would turn out to occur should be, if we are given only the information that F occurs.
• If E and F are independent then Pr(E|F) = Pr(E):
  Pr(E|F) = Pr(EF)/Pr(F) = Pr(E)×Pr(F)/Pr(F) = Pr(E)

Page 31:

Visualizing Conditional Probability

• If we are given that event F occurs, then
  – our attention gets restricted to the subspace F.
• Our posterior probability for E (after seeing F) corresponds to the fraction of F where E occurs also.
• Thus, p′(E) = p(E∩F)/p(F).

[Figure: the entire sample space S, with events E and F and their intersection E∩F.]

Page 32:

Conditional Probability Example

• Suppose I choose a single letter out of the 26-letter English alphabet, totally at random.
  – Use the Laplacian assumption on the sample space {a, b, …, z}.
  – What is the (prior) probability that the letter is a vowel?
    • Pr[Vowel] = __ / __ .
• Now, suppose I tell you that the letter chosen happened to be in the first 9 letters of the alphabet.
  – Now, what is the conditional (or posterior) probability that the letter is a vowel, given this information?
    • Pr[Vowel | First9] = ___ / ___ .

[Figure: the sample space S of the 26 letters, with the vowels and the first 9 letters shown as overlapping subsets.]
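A quick Python sketch (not part of the slides) that computes the two quantities the slide leaves as blanks, under the stated Laplacian assumption:

```python
import string

S = set(string.ascii_lowercase)           # sample space of 26 letters
vowels = set("aeiou")
first9 = set(string.ascii_lowercase[:9])  # 'a'..'i'

p_vowel = len(vowels) / len(S)                              # prior: 5/26
p_vowel_given_first9 = len(vowels & first9) / len(first9)   # posterior: 3/9 = 1/3
print(p_vowel, p_vowel_given_first9)
```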

Page 33:

Example

• What is the probability that, if we flip a coin three times, we get an odd number of tails (= event E), if we know that the event F, the first flip comes up tails, occurs?

(TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH)

Given F, each of the 4 remaining outcomes has probability 1/4, so
p(E|F) = 1/4 + 1/4 = ½, where E = odd number of tails,
or p(E|F) = p(EF)/p(F) = (2/8)/(4/8) = ½.
For comparison, p(E) = 4/8 = ½.
E and F are independent, since p(E|F) = p(E).

Page 34:

Example: Two boxes with balls

• Two boxes: the first has 2 blue and 7 red balls; the second has 4 blue and 3 red balls.
• Bob selects a ball by first choosing one of the two boxes, and then one ball from this box.
• If Bob has selected a red ball, what is the probability that he selected a ball from the first box?
• Event E: Bob has chosen a red ball.
• Event F: Bob has chosen a ball from the first box.
• We want to find p(F|E).
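A sketch of the calculation this slide sets up (not shown on the slide itself), assuming Bob picks each box with probability 1/2:

```python
from fractions import Fraction as F

p_F = F(1, 2)             # choose box 1 (assumed uniform choice of box)
p_E_given_F = F(7, 9)     # red from box 1 (2 blue, 7 red)
p_E_given_notF = F(3, 7)  # red from box 2 (4 blue, 3 red)

p_E = p_E_given_F * p_F + p_E_given_notF * (1 - p_F)  # marginalization
p_F_given_E = p_E_given_F * p_F / p_E                  # Bayes' rule
print(p_F_given_E, float(p_F_given_E))  # 49/76 ≈ 0.645
```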

Page 35:

What’s behind door number three?

• The Monty Hall problem paradox
  – Consider a game show where a prize (a car) is behind one of three doors
  – The other two doors do not have prizes (goats instead)
  – After picking one of the doors, the host (Monty Hall) opens a different door to show you that the door he opened is not the prize
  – Do you change your decision?
• Your initial probability to win (i.e., pick the right door) is 1/3
• What is your chance of winning if you change your choice after Monty opens a wrong door?
• After Monty opens a wrong door, if you change your choice, your chance of winning is 2/3
  – Thus, your chance of winning doubles if you change
  – Huh?

Page 36:

Page 37:

Monty Hall Problem

Ci - the car is behind Door i, for i equal to 1, 2 or 3.

Hij - the host opens Door j after the player has picked Door i, for i and j equal to 1, 2 or 3. Without loss of generality, assume, by re-numbering the doors if necessary, that the player picks Door 1, and that the host then opens Door 3, revealing a goat. In other words, the host makes proposition H13 true.

Then the posterior probability of winning by not switching doors is P(C1|H13).

P(Ci) = 1/3 for i = 1, 2, 3.

Page 38:

The probability of winning by switching is P(C2|H13), since under our assumption switching means switching the selection to Door 2, since P(C3|H13) = 0 (the host will never open the door with the car).

P(C2|H13) = P(H13|C2) P(C2) / P(H13)
          = P(H13|C2) P(C2) / [ ∑i P(H13|Ci) P(Ci) ]
          = (1 × 1/3) / (1/2 × 1/3 + 1 × 1/3 + 0 × 1/3)
          = (1/3) / (1/2)
          = 2/3

The posterior probability of winning by not switching doors is P(C1|H13) = 1/3.
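A short Monte Carlo sketch (not part of the original slides) that reproduces the 1/3 vs. 2/3 result numerically; the function name and setup are illustrative:

```python
import random

def play(switch, trials=100_000):
    """Simulate the Monty Hall game; return the empirical win rate."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = 0  # WLOG the player picks Door 1 (index 0), as on the slide
        # Host opens a door that is neither the pick nor the car.
        goat_doors = [d for d in range(3) if d != pick and d != car]
        opened = random.choice(goat_doors)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=False))  # ~1/3: stay with Door 1
print(play(switch=True))   # ~2/3: switch to the remaining closed door
```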

Page 39:

Discrete random variables

Probabilities over discrete variables

C ∈ { Heads, Tails }

P(C = Heads) = .5
P(C = Heads) + P(C = Tails) = 1

Possible values (outcomes) are discrete,
e.g., natural numbers (0, 1, 2, 3, etc.)

Page 40:

Terminology
• A (stochastic) experiment is a procedure that yields one of a given set of possible outcomes.
• The sample space S of the experiment is the set of possible outcomes.
• An event is a subset of the sample space.
• A random variable is a function that assigns a real value to each outcome of an experiment.

Normally, a probability is related to an experiment or a trial.

Take flipping a coin, for example. What are the possible outcomes? Heads or tails (front or back side) of the coin will be shown upwards. After a sufficient number of tosses, we can “statistically” conclude that the probability of heads is 0.5.

In rolling a die, there are 6 outcomes. Suppose we want to calculate the probability of the event of odd numbers on a die. What is that probability?

Page 41:

Random Variables

• A “random variable” V is any variable whose value is unknown, or whose value depends on the precise situation.
  – E.g., the number of students in class today
  – Whether it will rain tonight (Boolean variable)
• The proposition V = vi may have an uncertain truth value, and may be assigned a probability.

Page 42:

Example
• A fair coin is flipped 3 times. Let S be the sample space of 8 possible outcomes, and let X be a random variable that assigns to an outcome the number of heads in this outcome.
• Random variable X is a function X: S → X(S), where X(S) = {0, 1, 2, 3} is the range of X, which is the number of heads, and
  S = { (TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH) }
• X(TTT) = 0
  X(TTH) = X(HTT) = X(THT) = 1
  X(HHT) = X(THH) = X(HTH) = 2
  X(HHH) = 3
• The probability distribution (pmf) of random variable X is given by P(X=3) = 1/8, P(X=2) = 3/8, P(X=1) = 3/8, P(X=0) = 1/8.
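This pmf can be tabulated directly by enumerating the 8 equally likely outcomes; a minimal Python sketch (not from the slides):

```python
from itertools import product
from collections import Counter
from fractions import Fraction as F

outcomes = list(product("HT", repeat=3))            # the 8 equally likely outcomes
num_heads = Counter(o.count("H") for o in outcomes)  # X = number of heads
pmf = {x: F(c, len(outcomes)) for x, c in sorted(num_heads.items())}
print(pmf)  # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
```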

Page 43:

Experiments & Sample Spaces
• A (stochastic) experiment is any process by which a given random variable V gets assigned some particular value, and where this value is not necessarily known in advance.
  – We call it the “actual” value of the variable, as determined by that particular experiment.
• The sample space S of the experiment is just the domain of the random variable, S = dom[V].
• The outcome of the experiment is the specific value vi of the random variable that is selected.

Page 44:

Events
• An event E is any set of possible outcomes in S…
  – That is, E ⊆ S = dom[V].
  • E.g., the event that “less than 50 people show up for our next class” is represented as the set {1, 2, …, 49} of values of the variable V = (# of people here next class).
• We say that event E occurs when the actual value of V is in E, which may be written V ∈ E.
  – Note that V ∈ E denotes the proposition (of uncertain truth) asserting that the actual outcome (value of V) will be one of the outcomes in the set E.

Page 45:

Probability of an event E

The probability of an event E is the sum of the probabilities of the outcomes in E. That is

   p(E) = ∑_{s ∈ E} p(s)

Note that, if there are n outcomes in the event E, that is, if E = {a1, a2, …, an}, then

   p(E) = ∑_{i=1}^{n} p(ai)

Page 46:

Example

• What is the probability that, if we flip a coin three times, we get an odd number of tails?

(TTT), (TTH), (THH), (HTT), (HHT), (HHH), (THT), (HTH)

Each outcome has probability 1/8, so
p(odd number of tails) = 1/8 + 1/8 + 1/8 + 1/8 = ½

Page 47:

Venn Diagram

Experiment: Toss 2 coins. Note faces.

Sample Space S = {HH, HT, TH, TT}

[Figure: Venn diagram of the sample space S with the four outcomes HH, HT, TH, TT as points and the event “Tail” as a circled subset.]

Page 48:

Discrete Probability Distribution
(also called probability mass function (pmf))

1. List of all possible [x, p(x)] pairs
   – x = value of the random variable (outcome)
   – p(x) = probability associated with that value
2. Mutually exclusive (no overlap)
3. Collectively exhaustive (nothing left out)
4. 0 ≤ p(x) ≤ 1
5. ∑ p(x) = 1

Page 49:

Visualizing Discrete Probability Distributions

Listing: { (0, .25), (1, .50), (2, .25) }

Table:
   # Tails x   Count f(x)   p(x)
   0           1            .25
   1           2            .50
   2           1            .25

Equation: p(x) = n! / (x! (n − x)!) · p^x (1 − p)^(n − x)

Graph: [bar chart of p(x) vs. x, with p(0) = .25, p(1) = .50, p(2) = .25]
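The listing, table, and graph above are the binomial pmf with n = 2 and p = 0.5 (number of tails in two fair tosses); a small Python sketch (not in the slides) that reproduces the pairs:

```python
from math import comb

def binom_pmf(x, n, p):
    # p(x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 2, 0.5
print([(x, binom_pmf(x, n, p)) for x in range(n + 1)])
# [(0, 0.25), (1, 0.5), (2, 0.25)] -- matches the listing and table above
```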

Page 50:

Joint, Marginal, and Conditional Probability

• Joint probability: p(X = xi, Y = yj) = nij / N
• Marginal probability: p(X = xi) = (∑j nij) / N
• Conditional probability: p(Y = yj | X = xi) = nij / ∑j nij

N is the total number of trials and nij is the number of instances where X = xi and Y = yj.

Page 51:

• Sum Rule: p(X) = ∑Y p(X, Y)
• Product Rule: p(X, Y) = p(Y|X) p(X)

Page 52:

The Rules of Probability

• Sum Rule: p(X) = ∑Y p(X, Y)
• Product Rule: p(X, Y) = p(Y|X) p(X)

Page 53:

Continuous variables
Probability Density Function (PDF), a.k.a. “marginal probability”

p(x):   P(a ≤ x ≤ b) = ∫_a^b p(x) dx

Notation: P(x) is a probability; p(x) is a PDF.

Page 54:

Continuous variables
Probability Density Function (PDF)

Let x ∈ ℝ.
p(x) can be any function s.t.
   ∫_{-∞}^{∞} p(x) dx = 1
   p(x) ≥ 0

Define P(a ≤ x ≤ b) = ∫_a^b p(x) dx
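A numerical illustration (not from the slides) of these two properties, using a standard Gaussian as an example of a valid pdf; the grid sizes are arbitrary:

```python
import numpy as np

def p(x):
    # standard Gaussian density, used here only as an example of a valid pdf
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

xs = np.linspace(-10.0, 10.0, 200_001)
print(np.trapz(p(xs), xs))          # ~1.0: the pdf integrates to 1

a, b = -1.0, 1.0
xs_ab = np.linspace(a, b, 20_001)
print(np.trapz(p(xs_ab), xs_ab))    # ~0.683 = P(-1 <= x <= 1)
```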

Page 55:

Continuous Prob. Density Function

1. Mathematical formula
2. Shows all values x and frequencies f(x)
   – f(x) is not a probability
3. Properties:
   ∫ f(x) dx = 1 over all x   (area under curve)
   f(x) ≥ 0, for a ≤ x ≤ b

[Figure: a density curve f(x) over values x, showing (value, frequency) pairs and the area under the curve between a and b.]

Page 56:

Continuous Random Variable Probability

Probability is area under the curve!

   P(c ≤ x ≤ d) = ∫_c^d f(x) dx

[Figure: the density f(x) with the area between c and d shaded.]

Page 57:

Probability mass function
In probability theory, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. A pmf differs from a probability density function (pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range.

[Figure: example graphs of pmfs. All the values of a pmf must be non-negative and sum up to 1. (Right) The pmf of a fair die: all the numbers on the die have an equal chance of appearing on top when the die is rolled.]

Page 58:

Suppose that X is a discrete random variable, taking values on some countable sample space S ⊆ R. Then the probability mass function fX(x) for X is given by

   fX(x) = Pr(X = x)   for x ∈ S,
   fX(x) = 0           for x ∈ R \ S.

Note that this explicitly defines fX(x) for all real numbers, including all values in R that X could never take; indeed, it assigns such values a probability of zero.

Example. Suppose that X is the outcome of a single coin toss, assigning 0 to tails and 1 to heads. The probability that X = x is 0.5 on the state space {0, 1} (this is a Bernoulli random variable), and hence the probability mass function is

   fX(x) = 1/2   for x ∈ {0, 1},
   fX(x) = 0     otherwise.

Page 59:

Uniform Distribution

1. Equally likely outcomes
2. Probability density:
   f(x) = 1 / (d − c),  c ≤ x ≤ d
3. Mean & standard deviation:
   μ = (c + d) / 2
   σ = (d − c) / √12
   Mean = Median

[Figure: the density f(x) is a rectangle of height 1/(d − c) between x = c and x = d.]

Page 60:

Uniform Distribution Example

• You’re production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive.

• Suppose the amount dispensed has a uniform distribution.

• What is the probability that less than 11.8 oz. is dispensed?

Page 61:

Uniform Distribution Solution

f(x) = 1 / (d − c) = 1 / (12.5 − 11.5) = 1.0

P(11.5 ≤ x ≤ 11.8) = (Base)(Height) = (11.8 − 11.5)(1) = 0.30

[Figure: the uniform density of height 1.0 on [11.5, 12.5], with the area between 11.5 and 11.8 shaded.]
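The same area calculation in a few lines of Python (an illustrative sketch, not part of the slides):

```python
c, d = 11.5, 12.5          # range of amounts the machine can dispense (oz.)
height = 1 / (d - c)       # f(x) = 1/(d - c) = 1.0

a, b = 11.5, 11.8          # event: less than 11.8 oz. is dispensed
prob = (b - a) * height    # area of the rectangle = base * height
print(round(prob, 2))      # 0.3
```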

Page 62:

Normal Distribution

1. Describes Many Random Processes or Continuous Phenomena

2. Can Be Used to Approximate Discrete Probability Distributions

– Example: Binomial

3. Basis for Classical Statistical Inference

4. A.k.a. Gaussian distribution

Page 63:

Normal Distribution

1. ‘Bell-shaped’ & symmetrical
2. Mean, median, mode are equal
4. Random variable has infinite range

[Figure: a bell-shaped density f(X) centered at the mean.]

* light-tailed distribution

Page 64:

Probability Density Function

   f(x) = 1 / (σ √(2π)) · e^( −(x − μ)² / (2σ²) )

f(x) = frequency of random variable x
σ = population standard deviation
π = 3.14159…; e = 2.71828…
x = value of the random variable (−∞ < x < ∞)
μ = population mean
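A minimal Python sketch (not from the slides) that evaluates this density for a given μ and σ:

```python
import math

def normal_pdf(x, mu, sigma):
    # f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

print(normal_pdf(0.0, mu=0.0, sigma=1.0))  # ~0.3989, the peak of the standard normal
```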

Page 65:

Page 66:

Effect of Varying Parameters (μ & σ)

[Figure: three normal curves A, B, and C over X, with different means and standard deviations.]
Page 67:

Normal Distribution Probability

   P(c ≤ x ≤ d) = ∫_c^d f(x) dx = ?

Probability is area under the curve!

[Figure: the normal density f(x) with the area between c and d shaded.]
Page 68:

Infinite Number of Tables

Normal distributions differ by mean & standard deviation.

Each distribution would require its own table.

That’s an infinite number!

[Figure: several normal curves f(X) with different parameters.]
Page 69:

Standardize the Normal Distribution

   Z = (X − μ) / σ

Standardizing turns any normal distribution (in X) into the standardized normal distribution (in Z), with μ = 0 and σ = 1.

One table!
Page 70:

Intuitions on Standardizing

• Subtracting μ from each value X just moves the curve around, so values are centered on 0 instead of on μ.
• Once the curve is centered, dividing each value by σ > 1 moves all values toward 0, compressing the curve.
Page 71:

Standardizing Example

Normal distribution: μ = 5, σ = 10, X = 6.2

   Z = (X − μ) / σ = (6.2 − 5) / 10 = .12
Page 72:

Standardizing Example

Normal distribution: μ = 5, σ = 10, X = 6.2

   Z = (X − μ) / σ = (6.2 − 5) / 10 = .12

Standardized normal distribution: μ = 0, σ = 1, Z = .12
Page 73:

Why use Gaussians?
• Convenient analytic properties
• Central Limit Theorem
• Works well
• Not for everything, but a good building block
• For more reasons, see [Bishop 1995, Jaynes 2003]

Page 74:

Rules for continuous PDFs

The same intuitions and rules apply:

“Sum rule”:    ∫_{-∞}^{∞} p(x) dx = 1
Product rule:  p(x,y) = p(x|y) p(y)
Marginalizing: p(x) = ∫ p(x,y) dy

… Bayes’ Rule, conditioning, etc.

Page 75:

Multivariate distributions

Uniform:  x ∼ U(dom)
Gaussian: x ∼ N(μ, Σ)
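A small Python sketch (not part of the slides) drawing samples from the two multivariate distributions named above; the domain, mean vector, and covariance matrix are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform over an assumed rectangular domain [0, 1] x [0, 1]
x_uniform = rng.uniform(low=[0.0, 0.0], high=[1.0, 1.0], size=(5, 2))

# Gaussian with a hypothetical mean vector mu and covariance matrix Sigma
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x_gauss = rng.multivariate_normal(mu, Sigma, size=5)

print(x_uniform)
print(x_gauss)
```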