
Comparing Distributions II: Bayes Rule and Acceptance Sampling

By Peter Woolf ([email protected]), University of Michigan

Michigan Chemical Process Dynamics and Controls Open Textbook

version 1.0

Creative commons

In the last lecture, we found that variations in the product yield were significantly related to runny feed.

One solution is to find a way to identify runny feed before it is fed into the process, and avoid it.

Runnyfeedometer™

Image from http://controls.engin.umich.edu/wiki/index.php/PHandViscositySensors

You develop an offline tool to detect runny feed using a cone and plate viscometer. The test is inexpensive, but not always accurate due to inhomogeneous feed.

You have a more accurate way of measuring runny feed, but it is slow and expensive, so maybe you can get away with multiple reads on the Runnyfeedometer™?

Experimental data: 100 known runny and 100 known normal samples tested in the Runnyfeedometer™

P(+ test | runny) = 98:100 (true positive)
P(- test | runny) = 2:100 (false negative)
P(+ test | normal) = 3:100 (false positive)
P(- test | normal) = 97:100 (true negative)

Question: What are the odds that exactly 9 of 10 tests on a runny sample come back positive?

P(+ test | runny) = 98:100
P(- test | runny) = 2:100

Number of combinations: 10

$$\binom{10}{1} = \frac{10!}{1!\,(10-1)!} = 10$$

Probability of a particular outcome:

$$(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.98)(0.02) = (0.98)^{9}(0.02)^{1}$$

Overall probability = (probability of a particular outcome) × (# combinations)

$$10 \times (0.98)^{9}(0.02)^{1} = 0.1667$$

Possible results:
{+,+,+,+,+,+,+,+,+,-}
{+,+,+,+,+,+,+,+,-,+}
{+,+,+,+,+,+,+,-,+,+}
{+,+,+,+,+,+,-,+,+,+}
{+,+,+,+,+,-,+,+,+,+}
{+,+,+,+,-,+,+,+,+,+}
{+,+,+,-,+,+,+,+,+,+}
{+,+,-,+,+,+,+,+,+,+}
{+,-,+,+,+,+,+,+,+,+}
{-,+,+,+,+,+,+,+,+,+}

Note: the possible results become hard to list by hand if 2 or more tests fail.
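To illustrate the counting argument, here is a minimal Python sketch that lists every possible sequence of 10 test results by brute force and compares the count with the combinations formula. The names `P_POS`, `N_TESTS` and `prob_by_enumeration` are made up for this example; the only inputs taken from the slides are the 98:100 positive rate and the 10 tests.

```python
from itertools import product
from math import comb

P_POS = 0.98   # P(+ test | runny), from the experimental data
N_TESTS = 10   # number of repeated tests on one sample


def prob_by_enumeration(num_negative):
    """Count the sequences with exactly num_negative '-' results and add up their probabilities."""
    count = 0
    total = 0.0
    for outcome in product("+-", repeat=N_TESTS):
        if outcome.count("-") == num_negative:
            count += 1
            total += P_POS ** outcome.count("+") * (1 - P_POS) ** num_negative
    return count, total


count_1, prob_1 = prob_by_enumeration(1)
count_2, prob_2 = prob_by_enumeration(2)

print(count_1, round(prob_1, 4))   # 10 0.1667
print(count_2, comb(N_TESTS, 2))   # 45 45
```

With two failures there are already 45 sequences, which is why a formula is more practical than a hand-written list.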

In our case:
P(+ test | runny) = 98:100 = p
P(- test | runny) = 2:100 = (1-p)

Binomial Distribution

Describes the probability of obtaining k events from N independent samples of a binary outcome with known probability.

Examples:
• Odds of getting 20 heads from 30 coin tosses
• Odds of finding 3 broken bolts in a box of 100

$$p_{\text{binomial}}(k, N, p) = \binom{N}{k}\, p^{k} (1-p)^{N-k} = \frac{N!}{k!\,(N-k)!}\, p^{k} (1-p)^{N-k}$$
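The binomial formula can be sketched in a few lines of Python. The function name `binomial_pmf` is an assumption made for this example, and the coin is assumed to be fair (p = 0.5), which the slide does not state explicitly.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials, each with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Example from the list above, assuming a fair coin: 20 heads from 30 tosses.
p_20_of_30 = binomial_pmf(20, 30, 0.5)

# The earlier question: exactly 9 positive results from 10 tests on a runny sample.
p_9_of_10 = binomial_pmf(9, 10, 0.98)

print(f"{p_20_of_30:.4f}")  # 0.0280
print(f"{p_9_of_10:.4f}")   # 0.1667
```

The broken-bolt example would work the same way once a per-bolt failure probability is known.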

In Mathematica

Probability test: What are the odds of getting 5 heads out of 10 coin tosses?
(a) 25%
(b) 50%
(c) 62%

[Mathematica input and plots not reproduced. Note: the plot axes are off by 1.]

Probability of exactly 5 heads out of 10 tosses: 25%, so (a) is okay if the question means exactly 5.
Probability of 0-5 heads out of 10 tosses: 62%, so (c) is okay if the question means 5 or fewer.
(b) 50%: no.
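The Mathematica input itself did not survive in this text, so the following is a Python sketch of the same two calculations rather than the original commands. The helper name `binomial_pmf` is an assumption, and the coin is taken to be fair.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials, each with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


exactly_5 = binomial_pmf(5, 10, 0.5)
at_most_5 = sum(binomial_pmf(k, 10, 0.5) for k in range(6))  # k = 0, 1, ..., 5

print(f"{exactly_5:.3f}")  # 0.246
print(f"{at_most_5:.3f}")  # 0.623
```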

Runnyfeedometer™

P(+ test | runny) = 98:100
P(- test | runny) = 2:100
P(+ test | normal) = 3:100
P(- test | normal) = 97:100

Given these data what acceptance sampling criteria would be required to correctly identify a normal sample with 99.99% confidence?

Example acceptance sampling criteria: Accept sample if from 10 samples, 3 or fewer test positive

Translation: We want the following:
P(normal | 3 or fewer positive results from 10 tests)

Using our binomial distribution we can calculate a related quantity:

P(3 or fewer positive results from 10 tests | normal)

$$= \sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^{i} (1-p)^{10-i}$$

Where
i = # of positive results
p = probability of a positive result given a normal feed = 0.03

[Plot of P(x) against x, the number of positive results. 0 in 10 positive: very likely normal; 10 in 10 positive: very likely runny.]

If the feed is normal, we will get ≤3 positive tests with more than 99.9% probability!
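As a quick numerical check of this sum, here is a small Python sketch. The names `P_POS_GIVEN_NORMAL` and `prob_at_most` are invented for the example; the value 0.03 is the false positive rate from the experimental data.

```python
from math import comb

P_POS_GIVEN_NORMAL = 0.03  # P(+ test | normal), from the experimental data


def prob_at_most(k_max, n, p):
    """Probability of k_max or fewer positive results in n independent tests."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k_max + 1))


p_le3_given_normal = prob_at_most(3, 10, P_POS_GIVEN_NORMAL)
print(f"{p_le3_given_normal:.5f}")  # 0.99985
```

Note that this is P(≤3 positive | normal), not yet the quantity the acceptance criterion needs.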

Not the same!

Translation: We want the following:
P(normal | 3 or fewer positive results from 10 tests)

Three Probability Definitions

1. Joint Probability: $P(A, B)$

2. Conditional Probability: $P(A \mid B)$

3. Marginalization: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$

1. Joint Probability

$$P(A, B)$$

What is the probability of drawing an ace first and then a jack from a deck of 52 cards?

$$\left(\frac{4}{52}\right)\left(\frac{4}{51}\right)$$

What is the probability of a protein being highly expressed and phosphorylated?

(# highly expressed and phosphorylated proteins) / (total proteins)

What is the probability that valves A and B both fail?

(# times A & B fail) / (total observations)

2. Conditional Probability

$$P(A \mid B)$$

What is the probability of drawing an ace given that you just drew a jack from a deck of 52 cards?

$$\left(\frac{4}{51}\right)$$

What is the probability of a protein being highly expressed given that it is phosphorylated?

(# highly expressed phosphorylated proteins) / (total phosphorylated proteins)

What is the probability that valve A fails given that B has failed?

(# times A & B fail) / (total observations where B fails)
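The card examples for joint and conditional probability can be checked by counting, in the same "(# cases) / (total cases)" spirit as the protein and valve examples. This is only an illustrative Python sketch; the deck representation and variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Every ordered way to draw two different cards: 52 * 51 pairs.
draws = list(permutations(deck, 2))

ace_then_jack = sum(1 for first, second in draws if first[0] == "A" and second[0] == "J")
jack_first = sum(1 for first, _ in draws if first[0] == "J")
jack_then_ace = sum(1 for first, second in draws if first[0] == "J" and second[0] == "A")

# Joint: P(ace first, then jack) = (# ace-then-jack draws) / (all draws)
joint = Fraction(ace_then_jack, len(draws))

# Conditional: P(ace | just drew a jack) = (# jack-then-ace draws) / (# draws starting with a jack)
conditional = Fraction(jack_then_ace, jack_first)

print(joint == Fraction(4, 52) * Fraction(4, 51))  # True
print(conditional)                                 # 4/51
```

The only difference between the two results is the denominator: all draws for the joint probability, and only the draws where the condition holds for the conditional one.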

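3. Marginalization

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$

What is the probability of drawing an ace given that you just drew one other card from a deck of 52 cards?

$$P(\text{Ace}) = P(\text{Ace} \mid \text{previous ace})\, P(\text{previous ace}) + P(\text{Ace} \mid \neg\,\text{previous ace})\, P(\neg\,\text{previous ace})$$

$$P(\text{Ace}) = \left(\frac{3}{51}\right)\left(\frac{4}{52}\right) + \left(\frac{4}{51}\right)\left(\frac{48}{52}\right)$$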
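The marginalization sum for the ace example can be evaluated exactly with Python's `fractions` module. This is a minimal sketch; the variable names are made up for the example.

```python
from fractions import Fraction

# The two mutually exclusive cases for the unknown first card.
p_prev_ace = Fraction(4, 52)
p_prev_not_ace = Fraction(48, 52)

# Chance that the next card is an ace in each case (51 cards remain).
p_ace_given_prev_ace = Fraction(3, 51)
p_ace_given_prev_not_ace = Fraction(4, 51)

# Marginalize over the two cases.
p_ace = (p_ace_given_prev_ace * p_prev_ace
         + p_ace_given_prev_not_ace * p_prev_not_ace)

print(p_ace)  # 1/13
```

The result, 1/13, equals 4/52: when nothing is known about the first card, the second card is an ace as often as the first one would be.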

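We want the following:
P(normal | 3 or fewer positive results from 10 tests)

Bayes' Rule

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

$$P(\text{normal} \mid \le 3 \text{ positive results from 10 tests}) = \frac{P(\le 3 \text{ positive results from 10 tests} \mid \text{normal})\, P(\text{normal})}{P(\le 3 \text{ positive results from 10 tests})}$$

P(3 or fewer positive results from 10 tests | normal): binomial distribution
P(normal): prior
P(3 or fewer positive results from 10 tests): marginalize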

P(3 or fewer positive results from 10 tests | normal), with p = 0.03:

$$\sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^{i} (1-p)^{10-i} \approx 0.99985$$

P(normal): from prior observations, what are the odds of getting a batch of normal feed?
From previous data, normal feed was found in 19 of 25 samples, so a first approximation could be 0.76.

P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$

= P(≤3 of 10 positive | runny) P(runny) + P(≤3 of 10 positive | normal) P(normal)

P(≤3 of 10 positive | runny), with p = P(+ test | runny) = 98:100:

$$\sum_{i=0}^{3} \frac{10!}{i!\,(10-i)!}\, p^{i} (1-p)^{10-i} \approx 0$$

A runny sample will yield ≤3 positive results ~0% of the time.

P(runny) = 1 - P(normal) = 0.24

P(3 or fewer positive results from 10 tests): found by marginalizing over runny and normal

= P(≤3 of 10 positive | runny) P(runny) + P(≤3 of 10 positive | normal) P(normal)

≈ (0)(0.24) + (0.99985)(0.76) ≈ 0.75989

P(normal | 3 or fewer positive results from 10 tests) = (0.99985)(0.76) / 0.75989 ≈ 1

With this acceptance sampling criterion, a feed that passes is normal essentially 100% of the time. May be too strict!
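The whole Bayes' rule calculation for the "3 or fewer positive" criterion can be put together in a short Python sketch. The function and constant names are assumptions made for this example; the rates (0.98, 0.03) and the prior (19 of 25) come from the data quoted earlier.

```python
from math import comb

P_POS_GIVEN_RUNNY = 0.98   # P(+ test | runny)
P_POS_GIVEN_NORMAL = 0.03  # P(+ test | normal)
P_NORMAL = 19 / 25         # prior: normal feed in 19 of 25 earlier samples
P_RUNNY = 1 - P_NORMAL


def prob_at_most(k_max, n, p):
    """Probability of k_max or fewer positive results in n independent tests."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k_max + 1))


# Binomial terms for each kind of feed.
p_le3_given_normal = prob_at_most(3, 10, P_POS_GIVEN_NORMAL)
p_le3_given_runny = prob_at_most(3, 10, P_POS_GIVEN_RUNNY)

# Marginalize over runny and normal to get the denominator.
p_le3 = p_le3_given_runny * P_RUNNY + p_le3_given_normal * P_NORMAL

# Bayes' rule.
p_normal_given_le3 = p_le3_given_normal * P_NORMAL / p_le3

print(f"{p_le3:.5f}")               # 0.75989
print(f"{p_normal_given_le3:.6f}")  # 1.000000
```

The runny term is not exactly zero (it is on the order of 1e-10), but it is far too small to change the result at this precision.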




Test different acceptance sampling criteria:

Acceptance sampling criteria will identify normal feeds >99.99% of the time

Remember:
0 in 10 positive: very likely normal
10 in 10 positive: very likely runny
0 to 10 positive: no information
--> 0 to 6 positive: likely normal
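One way to test different criteria is to repeat the Bayes' rule calculation for every cutoff "k or fewer positive out of 10" and keep the loosest one that still reaches the 99.99% target. The sketch below does this in Python; the function names and the `TARGET` constant are assumptions made for the example, and the rates and prior are the ones quoted earlier.

```python
from math import comb

P_POS_GIVEN_RUNNY = 0.98   # P(+ test | runny)
P_POS_GIVEN_NORMAL = 0.03  # P(+ test | normal)
P_NORMAL = 19 / 25         # prior from 19 normal feeds in 25 samples
TARGET = 0.9999            # required confidence that an accepted feed is normal


def prob_at_most(k_max, n, p):
    """Probability of k_max or fewer positive results in n independent tests."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k_max + 1))


def posterior_normal(k_max, n=10):
    """P(normal | k_max or fewer positive results from n tests), by Bayes' rule."""
    from_normal = prob_at_most(k_max, n, P_POS_GIVEN_NORMAL) * P_NORMAL
    from_runny = prob_at_most(k_max, n, P_POS_GIVEN_RUNNY) * (1 - P_NORMAL)
    return from_normal / (from_normal + from_runny)


for k_max in range(11):
    print(k_max, f"{posterior_normal(k_max):.6f}")

# Loosest cutoff that still meets the target.
loosest = max(k for k in range(11) if posterior_normal(k) >= TARGET)
print(loosest)  # 6
```

At a cutoff of 10 every sample passes, so the test adds no information and the result falls back to the prior of 0.76.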

Runnyfeedometer™

Analysis result:
If ≤6 of 10 samples report positive, then I am >99.99% sure the feed is normal.

Acceptance criteria:
If ≤6 of 10 tests are positive, use the feed; otherwise reject the feed.

Q: What are the odds of rejecting normal feed?

$$P(\text{normal} \mid \ge 7 \text{ positive results from 10 tests}) = \frac{P(\ge 7 \text{ positive results from 10 tests} \mid \text{normal})\, P(\text{normal})}{P(\ge 7 \text{ positive results from 10 tests})}$$

Very rarely.
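To put a number on "very rarely", the Python sketch below evaluates the expression above, along with the related chance that a normal batch produces 7 or more positives in the first place. All names are hypothetical; the rates and prior are the same ones used throughout.

```python
from math import comb

P_POS_GIVEN_RUNNY = 0.98   # P(+ test | runny)
P_POS_GIVEN_NORMAL = 0.03  # P(+ test | normal)
P_NORMAL = 19 / 25         # prior from 19 normal feeds in 25 samples


def prob_at_least(k_min, n, p):
    """Probability of k_min or more positive results in n independent tests."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k_min, n + 1))


# Chance that a normal batch gives 7 or more positives and is rejected.
p_reject_given_normal = prob_at_least(7, 10, P_POS_GIVEN_NORMAL)

# Chance of 7 or more positives overall, marginalized over runny and normal.
p_reject_given_runny = prob_at_least(7, 10, P_POS_GIVEN_RUNNY)
p_reject = p_reject_given_normal * P_NORMAL + p_reject_given_runny * (1 - P_NORMAL)

# Bayes' rule: chance that a rejected batch was actually normal.
p_normal_given_reject = p_reject_given_normal * P_NORMAL / p_reject

print(f"{p_reject_given_normal:.1e}")  # 2.4e-09
print(f"{p_normal_given_reject:.1e}")  # 7.7e-09
```

Both numbers are on the order of a few in a billion, so under these assumptions normal feed is almost never rejected.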

Take Home Messages

• Acceptance sampling provides an easy-to-implement way to eliminate variation.

• Basic probability rules like Bayes' Rule help to rearrange your expressions to get to things you can solve.
