Comparing Distributions II: Bayes Rule and Acceptance Sampling by Peter Woolf (pwoolf@umich.edu)...


Post on 15-Jan-2016


Page 1: Comparing Distributions II: Bayes Rule and Acceptance Sampling By Peter Woolf (pwoolf@umich.edu) University of Michigan Michigan Chemical Process Dynamics

Comparing Distributions II: Bayes Rule and Acceptance Sampling

By Peter Woolf (pwoolf@umich.edu), University of Michigan

Michigan Chemical Process Dynamics and Controls Open Textbook

version 1.0

Creative Commons

Page 2

In the last lecture, we found that variations in product yield were significantly related to runny feed.

One solution is to identify runny feed before it is fed into the process and avoid using it.

Page 3

Runnyfeedometer™

Image from http://controls.engin.umich.edu/wiki/index.php/PHandViscositySensors

You develop an offline tool to detect runny feed using a cone-and-plate viscometer. The test is inexpensive but not always accurate, because the feed is inhomogeneous.

You have a more accurate way of measuring runny feed, but it is slow and expensive. Can you get away with multiple reads on the Runnyfeedometer™ instead?

Experimental data: 100 known runny and 100 known normal samples were tested in the Runnyfeedometer™.

P(+ test | runny) = 98/100 (true positive)
P(- test | runny) = 2/100 (false negative)
P(+ test | normal) = 3/100 (false positive)
P(- test | normal) = 97/100 (true negative)

What are the odds that exactly 9 of 10 tests on a runny sample come back positive?

Page 4

P(+ test | runny) = 98/100
P(- test | runny) = 2/100

Question: What are the odds that exactly 9 of 10 tests on a runny sample come back positive?

Number of combinations (which one of the 10 tests is the negative):

$$\binom{10}{1} = \frac{10!}{1!\,(10-1)!} = 10$$

Probability of a particular outcome:

$$(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.02) = (0.98)^9\,(0.02)^1$$

Overall probability = probability of a particular outcome × number of combinations:

$$10\,(0.98)^9\,(0.02)^1 = 0.1667$$

Possible results:
{+,+,+,+,+,+,+,+,+,-}
{+,+,+,+,+,+,+,+,-,+}
{+,+,+,+,+,+,+,-,+,+}
{+,+,+,+,+,+,-,+,+,+}
{+,+,+,+,+,-,+,+,+,+}
{+,+,+,+,-,+,+,+,+,+}
{+,+,+,-,+,+,+,+,+,+}
{+,+,-,+,+,+,+,+,+,+}
{+,-,+,+,+,+,+,+,+,+}
{-,+,+,+,+,+,+,+,+,+}

Note: listing every outcome by hand quickly becomes impractical when 2 or more tests fail.
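The counting argument above can be checked by brute force. A minimal Python sketch (our own illustration, not from the slides), assuming only P(+ test | runny) = 0.98 from the experimental data: it enumerates all 2^10 ordered result sequences, keeps those with exactly 9 positives, and compares the sum with the "combinations × one outcome" shortcut. The constant names are hypothetical.

```python
from itertools import product
from math import comb

P_POS_RUNNY = 0.98  # P(+ test | runny), from the 100 known runny samples
N_TESTS = 10        # hypothetical constant names, chosen for this sketch

# All 2**10 = 1024 ordered "+"/"-" sequences; keep those with exactly 9 positives
nine_of_ten = [seq for seq in product("+-", repeat=N_TESTS) if seq.count("+") == 9]

# Every such sequence has the same probability, (0.98)^9 * (0.02)^1
p_one_sequence = P_POS_RUNNY**9 * (1 - P_POS_RUNNY)

# Brute force: add up the probability of each qualifying sequence
p_enumerated = sum(p_one_sequence for _ in nine_of_ten)
# Shortcut: number of combinations times the probability of one sequence
p_formula = comb(N_TESTS, 9) * p_one_sequence

print(len(nine_of_ten))     # 10
print(round(p_formula, 4))  # 0.1667
```

Changing `== 9` to `== 8` produces the 45 two-failure sequences that are tedious to write out by hand, which is exactly where the combinations count earns its keep.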

Page 5

In our case: P(+ test | runny) = 98/100 = p and P(- test | runny) = 2/100 = 1 - p

Binomial Distribution

Describes the probability of obtaining k events from N independent samples of a binary outcome with known probability.

Examples:
• Odds of getting 20 heads from 30 coin tosses
• Odds of finding 3 broken bolts in a box of 100

$$p_{\text{binomial}}(k, N, p) = \binom{N}{k} p^k (1-p)^{N-k} = \frac{N!}{k!\,(N-k)!}\, p^k (1-p)^{N-k}$$
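The binomial formula translates directly into code. A minimal Python sketch using the standard-library `math.comb`; the function name `p_binomial` is a hypothetical label for p_binomial(k, N, p), not something defined in the slides.

```python
from math import comb

def p_binomial(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'successes' in n independent binary trials,
    each succeeding with probability p. (Hypothetical helper name.)"""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 9 positive tests out of 10 on runny feed, p = P(+ test | runny) = 0.98
print(round(p_binomial(9, 10, 0.98), 4))  # 0.1667

# 20 heads from 30 fair coin tosses
print(p_binomial(20, 30, 0.5))
```

The broken-bolt example would work the same way once a per-bolt failure probability is known; the slides do not give one.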

Page 6

In Mathematica

[Figure: two plots, "Probability of exactly 5 heads out of 10 tosses" and "Probability of 0-5 heads out of 10 tosses"]

Probability test: What are the odds of getting 5 heads out of 10 coin tosses?
(a) 25%
(b) 50%
(c) 62%

Page 7

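[Figure: the same two plots, "Probability of exactly 5 heads out of 10 tosses" (marked 25%) and "Probability of 0-5 heads out of 10 tosses" (marked 62%); note that the plot axes are off by 1]

Probability test: What are the odds of getting 5 heads out of 10 tosses?
(a) 25%: okay, if the question means exactly 5 heads
(b) 50%: no
(c) 62%: okay, if the question means 5 or fewer heads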
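The slides compute these values in Mathematica; the original code is not preserved here. As a stand-in, a short Python sketch (our own, not the author's code) reproduces both quiz answers exactly using the standard-library `fractions.Fraction`; the names `HALF`, `p_exactly_5` and `p_0_to_5` are hypothetical.

```python
from fractions import Fraction
from math import comb

N = 10
HALF = Fraction(1, 2)  # fair coin: P(heads) = 1/2, kept exact

# Exactly 5 heads: a single binomial term
p_exactly_5 = comb(N, 5) * HALF**N
# 0 to 5 heads: cumulative sum of binomial terms k = 0..5
p_0_to_5 = sum(comb(N, k) * HALF**N for k in range(6))

print(p_exactly_5, float(p_exactly_5))  # 63/256 0.24609375
print(p_0_to_5, float(p_0_to_5))        # 319/512 0.623046875
```

The two answers differ only in whether the question is read as "exactly 5" (about 25%) or "5 or fewer" (about 62%).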

Page 8

Runnyfeedometer™

Image from http://controls.engin.umich.edu/wiki/index.php/PHandViscositySensors

P(+ test | runny) = 98/100
P(- test | runny) = 2/100
P(+ test | normal) = 3/100
P(- test | normal) = 97/100

Given these data, what acceptance sampling criterion would be required to correctly identify a normal sample with 99.99% confidence?

Example acceptance sampling criterion: accept the sample if 3 or fewer of 10 tests come back positive.
(0 of 10 positive: very likely normal; 10 of 10 positive: very likely runny)

Translation: we want P(normal | 3 or fewer positive results from 10 tests).

Using our binomial distribution, we can calculate a related quantity.

Page 9

[Figure: binomial distribution, P(x) versus x]

Using our binomial distribution we can calculate a related quantity

$$P(\text{3 or fewer positive results from 10 tests} \mid \text{normal}) = \sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^i (1-p)^{10-i}$$

where i = number of positive results and p = probability of a positive result given normal feed = 0.03.

If the feed is normal, we will get ≤3 positive tests with >99% probability (≈0.9998)!

Not the same! Translation: we want P(normal | 3 or fewer positive results from 10 tests), not P(3 or fewer positive results from 10 tests | normal).
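The binomial sum above can be evaluated directly. A minimal Python sketch, assuming p = P(+ test | normal) = 0.03 as stated on the slide; `p_at_most` is a hypothetical helper name. Note that this is the likelihood P(≤3 of 10 positive | normal), not the posterior P(normal | ≤3 of 10 positive) that we actually want.

```python
from math import comb

def p_at_most(k_max: int, n: int, p: float) -> float:
    """Cumulative binomial: P(k_max or fewer positives in n tests).
    (Hypothetical helper name.)"""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_max + 1))

P_POS_NORMAL = 0.03  # P(+ test | normal)

# Likelihood: P(3 or fewer positive results from 10 tests | normal)
likelihood = p_at_most(3, 10, P_POS_NORMAL)
print(f"{likelihood:.5f}")  # 0.99985
```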

Page 10

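Three Probability Definitions

1. Joint Probability: P(A,B)

2. Conditional Probability: P(A | B)

3. Marginalization:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$

where the B_i are mutually exclusive and together cover every possibility.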

Page 11

Three Probability Definitions

1. Joint Probability: P(A,B)

What is the probability of drawing an ace first and then a jack from a deck of 52 cards?

$$\left(\frac{4}{52}\right)\left(\frac{4}{51}\right)$$

What is the probability of a protein being highly expressed and phosphorylated?
(# highly expressed and phosphorylated proteins)/(total proteins)

What is the probability that valves A and B both fail?
(# times A and B fail)/(total observations)
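The card answer can be confirmed by counting. A minimal Python sketch (our own illustration) that builds a 52-card deck, enumerates every ordered two-card draw with `itertools.permutations`, and counts those where an ace is followed by a jack; the deck representation and names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck as (rank, suit) pairs (hypothetical representation)
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Every ordered way to draw two different cards: 52 * 51 = 2652 draws
draws = list(permutations(deck, 2))

# Joint event: first card is an ace AND second card is a jack
ace_then_jack = sum(1 for first, second in draws if first[0] == "A" and second[0] == "J")

p_joint = Fraction(ace_then_jack, len(draws))
print(p_joint)                                       # 4/663
print(p_joint == Fraction(4, 52) * Fraction(4, 51))  # True
```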

Page 12

Three Probability Definitions

2. Conditional Probability: P(A | B)

What is the probability of drawing an ace given that you just drew a jack from a deck of 52 cards?

$$\frac{4}{51}$$

What is the probability of a protein being highly expressed given that it is phosphorylated?
(# highly expressed phosphorylated proteins)/(total phosphorylated proteins)

What is the probability that valve A fails given that B has failed?
(# times A and B fail)/(total observations where B fails)
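Conditioning amounts to shrinking the set of outcomes you count over. A minimal Python sketch (our own illustration, same hypothetical deck representation as a plain 52-card list) that restricts the ordered two-card draws to those starting with a jack, then counts aces among them; it also checks the standard identity P(A | B) = P(A, B) / P(B).

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck representation: (rank, suit) pairs
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]
draws = list(permutations(deck, 2))  # all 2652 ordered two-card draws

# Condition on B: keep only the draws where the first card is a jack
jack_first = [(first, second) for first, second in draws if first[0] == "J"]
# Among those, count the draws where the second card is an ace
ace_after_jack = [pair for pair in jack_first if pair[1][0] == "A"]

p_conditional = Fraction(len(ace_after_jack), len(jack_first))
print(p_conditional)  # 4/51

# Same answer from P(A | B) = P(A, B) / P(B)
p_joint = Fraction(len(ace_after_jack), len(draws))
p_jack_first = Fraction(len(jack_first), len(draws))
print(p_joint / p_jack_first == p_conditional)  # True
```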

Page 13

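Three Probability Definitions

3. Marginalization:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$

What is the probability of drawing an ace given that you just drew one other card, whose rank you do not know, from a deck of 52 cards?

$$P(\text{Ace}) = P(\text{Ace} \mid \text{previous ace})\,P(\text{previous ace}) + P(\text{Ace} \mid \neg\text{previous ace})\,P(\neg\text{previous ace})$$

$$P(\text{Ace}) = \left(\frac{3}{51}\right)\left(\frac{4}{52}\right) + \left(\frac{4}{51}\right)\left(\frac{48}{52}\right) = \frac{204}{2652} = \frac{1}{13}$$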
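The marginalization arithmetic can be carried out with exact fractions. A minimal Python sketch (our own illustration; variable names are hypothetical) that sums over the two cases for the unseen first card and confirms the result equals the unconditional 4/52.

```python
from fractions import Fraction

# Condition on what the first (unseen) card was
p_prev_ace = Fraction(4, 52)
p_not_prev_ace = 1 - p_prev_ace             # 48/52
p_ace_given_prev_ace = Fraction(3, 51)      # one ace already removed
p_ace_given_not_prev_ace = Fraction(4, 51)  # all four aces still in the deck

# Marginalize: P(A) = sum_i P(A | B_i) P(B_i)
p_ace = (p_ace_given_prev_ace * p_prev_ace
         + p_ace_given_not_prev_ace * p_not_prev_ace)
print(p_ace)                     # 1/13
print(p_ace == Fraction(4, 52))  # True
```

Not knowing the first card leaves the chance of an ace on the second draw unchanged, which is a useful sanity check on the formula.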

Page 14

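Probability Algebra

In general, P(A,B) = P(A | B) P(B); if A and B are independent, this reduces to P(A) P(B).

$$P(A,B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

Dividing both sides by P(B) gives Bayes' Rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$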
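Bayes' Rule can be sanity-checked on the card examples. A minimal Python sketch (our own illustration; the event functions and the `prob` helper are hypothetical) with A = "second card is an ace" and B = "first card is a jack": it computes P(A | B) once from the definition and once by reversing the conditioning with Bayes' Rule.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [(rank, suit) for rank in RANKS for suit in range(4)]  # 52 cards
draws = list(permutations(deck, 2))  # all ordered two-card draws

def prob(event) -> Fraction:
    """Fraction of equally likely ordered draws for which event(draw) is true."""
    return Fraction(sum(1 for draw in draws if event(draw)), len(draws))

def first_is_jack(draw):  # event B
    return draw[0][0] == "J"

def second_is_ace(draw):  # event A
    return draw[1][0] == "A"

def both(draw):           # joint event (A, B)
    return first_is_jack(draw) and second_is_ace(draw)

p_a, p_b, p_ab = prob(second_is_ace), prob(first_is_jack), prob(both)

p_a_given_b = p_ab / p_b           # definition of conditional probability
p_b_given_a = p_ab / p_a           # the "reversed" conditional
p_bayes = p_b_given_a * p_a / p_b  # Bayes' Rule

print(p_a_given_b, p_bayes)  # 4/51 4/51
```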

Page 15

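We want P(normal | 3 or fewer positive results from 10 tests).

Bayes' Rule:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

$$P(\text{normal} \mid \text{3 or fewer positive results from 10 tests}) = \frac{P(\text{3 or fewer positive results from 10 tests} \mid \text{normal})\,P(\text{normal})}{P(\text{3 or fewer positive results from 10 tests})}$$

In the numerator, the likelihood comes from the binomial distribution and P(normal) is the prior; the denominator is found by marginalization.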

Page 16

P(3 or fewer positive results from 10 tests | normal):

$$\sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^i (1-p)^{10-i} = 0.9998, \qquad p = 0.03$$

P(normal): from prior observations, what are the odds of getting a batch of normal feed? Previous data found normal feed in 19 of 25 samples, so a first approximation is 0.76.

$$P(\text{normal} \mid \text{3 or fewer positive results from 10 tests}) = \frac{P(\text{3 or fewer positive results from 10 tests} \mid \text{normal})\,P(\text{normal})}{P(\text{3 or fewer positive results from 10 tests})}$$

Page 17

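P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal feed,

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$

$$P(\le 3 \text{ of 10 positive}) = P(\le 3 \text{ of 10 positive} \mid \text{runny})\,P(\text{runny}) + P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal})$$

P(≤3 of 10 positive | runny): since P(+ test | runny) = 98/100, a runny sample will yield ≤3 positives ~0% of the time.

P(runny) = 1 - P(normal) = 0.24

$$P(\text{normal} \mid \text{3 or fewer positive results from 10 tests}) = \frac{P(\text{3 or fewer positive results from 10 tests} \mid \text{normal})\,P(\text{normal})}{P(\text{3 or fewer positive results from 10 tests})}$$

Each conditional probability is the binomial sum

$$\sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^i (1-p)^{10-i}$$

with p = 0.98 for runny feed and p = 0.03 for normal feed.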
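To see how close "~0%" is to zero, a minimal Python sketch (our own illustration; `p_at_most` is a hypothetical helper name) evaluates the same binomial sum with p = P(+ test | runny) = 0.98, and derives P(runny) from the 19-of-25 prior.

```python
from math import comb

def p_at_most(k_max: int, n: int, p: float) -> float:
    """Cumulative binomial: P(k_max or fewer positives in n tests).
    (Hypothetical helper name.)"""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_max + 1))

P_POS_RUNNY = 0.98              # P(+ test | runny)
PRIOR_NORMAL = 19 / 25          # normal feed in 19 of 25 earlier samples = 0.76
PRIOR_RUNNY = 1 - PRIOR_NORMAL  # 0.24

# How often a runny sample slips through with 3 or fewer positives
p_pass_runny = p_at_most(3, 10, P_POS_RUNNY)
print(f"{p_pass_runny:.2e}")   # roughly 1.5e-10
print(round(PRIOR_RUNNY, 2))   # 0.24
```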

Page 18

P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal feed,

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$

$$P(\le 3 \text{ of 10 positive}) = P(\le 3 \text{ of 10 positive} \mid \text{runny})\,P(\text{runny}) + P(\le 3 \text{ of 10 positive} \mid \text{normal})\,P(\text{normal}) \approx (0)(0.24) + (0.9998)(0.76) \approx 0.75988$$

$$P(\text{normal} \mid \text{3 or fewer positive results from 10 tests}) = \frac{P(\text{3 or fewer positive results from 10 tests} \mid \text{normal})\,P(\text{normal})}{P(\text{3 or fewer positive results from 10 tests})} = \frac{(0.9998)(0.76)}{0.75988} \approx 1$$

A sample that passes this acceptance sampling criterion is essentially 100% certain to be normal (runny feed essentially never passes), so the criterion may be too strict!
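The whole chain (binomial likelihoods, marginalization, Bayes' Rule) fits in one small function. A minimal Python sketch, assuming the slide values P(+ | normal) = 0.03, P(+ | runny) = 0.98 and prior P(normal) = 0.76; `p_at_most` and `posterior_normal` are hypothetical names.

```python
from math import comb

def p_at_most(k_max: int, n: int, p: float) -> float:
    """Cumulative binomial: P(k_max or fewer positives in n tests)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_max + 1))

P_POS_NORMAL = 0.03  # P(+ test | normal)
P_POS_RUNNY = 0.98   # P(+ test | runny)
PRIOR_NORMAL = 0.76  # from 19 of 25 earlier samples
N_TESTS = 10

def evidence(k_max: int) -> float:
    """P(k_max or fewer positives), marginalized over runny and normal feed."""
    return (p_at_most(k_max, N_TESTS, P_POS_RUNNY) * (1 - PRIOR_NORMAL)
            + p_at_most(k_max, N_TESTS, P_POS_NORMAL) * PRIOR_NORMAL)

def posterior_normal(k_max: int) -> float:
    """P(normal | k_max or fewer positives) via Bayes' Rule (hypothetical name)."""
    return p_at_most(k_max, N_TESTS, P_POS_NORMAL) * PRIOR_NORMAL / evidence(k_max)

print(f"{evidence(3):.5f}")          # 0.75989
print(f"{posterior_normal(3):.6f}")  # 1.000000
```

The denominator differs from the slide's 0.75988 only in the last digit, because of rounding in the intermediate 0.9998.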

Page 19

Test different acceptance sampling criteria:

With ≤6 of 10 positive as the acceptance criterion, normal feed is identified with >99.99% confidence.

Remember:
0 of 10 positive: very likely normal
10 of 10 positive: very likely runny
0 to 10 positive: no information
→ 0 to 6 positive: likely normal
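The search over criteria can be automated by sweeping the cutoff. A minimal Python sketch (our own illustration, same assumed slide values; `posterior_normal` and `TARGET` are hypothetical names) that prints P(normal | ≤k positives) for every cutoff k and picks the loosest one that still meets 99.99%.

```python
from math import comb

def p_at_most(k_max: int, n: int, p: float) -> float:
    """Cumulative binomial: P(k_max or fewer positives in n tests)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_max + 1))

P_POS_NORMAL, P_POS_RUNNY, PRIOR_NORMAL, N_TESTS = 0.03, 0.98, 0.76, 10
TARGET = 0.9999  # required confidence that an accepted sample is normal

def posterior_normal(k_max: int) -> float:
    """P(normal | k_max or fewer positives), Bayes' Rule with marginalization."""
    like_normal = p_at_most(k_max, N_TESTS, P_POS_NORMAL)
    like_runny = p_at_most(k_max, N_TESTS, P_POS_RUNNY)
    evidence = like_runny * (1 - PRIOR_NORMAL) + like_normal * PRIOR_NORMAL
    return like_normal * PRIOR_NORMAL / evidence

for k_max in range(N_TESTS + 1):
    print(k_max, f"{posterior_normal(k_max):.6f}")

# Loosest cutoff that still meets the target
best_k = max(k for k in range(N_TESTS + 1) if posterior_normal(k) > TARGET)
print(best_k)  # 6
```

Accepting everything (k = 10) just returns the prior 0.76, which is the "no information" case.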

Page 20

Runnyfeedometer™

Image from http://controls.engin.umich.edu/wiki/index.php/PHandViscositySensors

Analysis result: if ≤6 of 10 tests report positive, then I am >99.99% sure the feed is normal.

Acceptance criterion: if ≤6 of 10 tests are positive, use the feed; otherwise, reject it.

Q: What are the odds of rejecting normal feed?

$$P(\text{normal} \mid \text{7 or more positive results from 10 tests}) = \frac{P(\text{7 or more positive results from 10 tests} \mid \text{normal})\,P(\text{normal})}{P(\text{7 or more positive results from 10 tests})}$$

Very rarely.
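"Very rarely" can be made quantitative. A minimal Python sketch (our own illustration, same assumed slide values; `p_at_least` is a hypothetical name) that computes both the chance a normal batch gives 7 or more positives and, via Bayes' Rule, the fraction of rejected batches that were actually normal.

```python
from math import comb

def p_at_least(k_min: int, n: int, p: float) -> float:
    """Binomial probability of k_min or more positives in n tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_min, n + 1))

P_POS_NORMAL = 0.03  # P(+ test | normal)
P_POS_RUNNY = 0.98   # P(+ test | runny)
PRIOR_NORMAL = 0.76

# Chance that a normal batch gives 7+ positives and is wrongly rejected
reject_given_normal = p_at_least(7, 10, P_POS_NORMAL)
reject_given_runny = p_at_least(7, 10, P_POS_RUNNY)

# Bayes' Rule: of the rejected batches, what fraction were actually normal?
evidence = reject_given_runny * (1 - PRIOR_NORMAL) + reject_given_normal * PRIOR_NORMAL
normal_given_reject = reject_given_normal * PRIOR_NORMAL / evidence

print(f"{reject_given_normal:.1e}")  # 2.4e-09
print(f"{normal_given_reject:.1e}")
```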

Page 21

Take Home Messages

• Acceptance sampling provides an easy-to-implement way to eliminate variation caused by bad (e.g., runny) feed.

• Basic probability rules such as Bayes' Rule help you rearrange expressions into quantities you can actually calculate.