Stat 20: Intro to Probability and Statistics Lecture 16: More Box Models Tessa L. Childers-Day UC Berkeley 22 July 2014


Stat 20: Intro to Probability and Statistics
Lecture 16: More Box Models

Tessa L. Childers-Day
UC Berkeley

22 July 2014


Today’s Goals | EV and SE | Normal Curve | Classifying and Counting

By the end of this lecture...

You will be able to:

Determine what we expect the sum of draws from a box to be, and how far off we will likely be

Quickly calculate the SD of a list with only two kinds of numbers

Easily calculate probabilities for sums of draws

Use a box model to address more kinds of problems, e.g. counting the number of “6”s shown in a series of throws


Recap: Box Models

Box models are useful in analyzing games of chance

Draw a box

Indicate the number and kind of tickets

Indicate the number and kind of draws

Indicate what is done with each ticket

Examined minimum and maximum of sum of draws


Example 1: Box Model

Have a box with three tickets: a “1”, a “2”, and a “3”

Draw 5 times, with replacement

Add together the values seen on each ticket

What is the sum of the draws?

How much does each draw contribute to the sum?

What can we reasonably expect the sum to be?


The Expected Value

The Expected Value (EV) for the sum of the draws from the box is

# of draws × average of the box

The sum of draws from a box (with replacement) should be somewhere around the expected value.
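As a minimal sketch, the EV formula can be applied to the box from Example 1 (tickets “1”, “2”, “3”, with 5 draws). The function name `expected_sum` is a hypothetical helper introduced here for illustration; it is not part of the lecture.

```python
from statistics import mean

def expected_sum(box, n_draws):
    """EV of the sum of draws with replacement: (# of draws) x (average of the box)."""
    return n_draws * mean(box)

box = [1, 2, 3]      # tickets from Example 1
n_draws = 5

ev = expected_sum(box, n_draws)
smallest = n_draws * min(box)   # every draw is the smallest ticket
largest = n_draws * max(box)    # every draw is the largest ticket

print(f"sum ranges from {smallest} to {largest}, expected value {ev}")
```

The sum can land anywhere from the minimum to the maximum, but the EV is the single value it should be "somewhere around".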


Example 2: Rolling Dice

You are playing a dice game. It costs $1 per play. You roll the die, and if it is an even number, you win $3. If it is odd, you win nothing. About how much do you expect to win or lose in 50 plays?

1 Draw a box model, indicating the number and kind of tickets

2 Indicate the number and kind of draws

3 Indicate what is done with each ticket

4 Answer the question above
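One possible way to set up the box for this game is sketched below. It rests on an assumption the slide does not spell out: the $1 cost is paid on every play and the $3 is the gross payout, so each even face is worth a net +$2 and each odd face a net -$1. Under a different reading of the rules the ticket values would change.

```python
from statistics import mean

# ASSUMPTION: net gain per play = payout - $1 cost.
# Even face -> 3 - 1 = +2, odd face -> 0 - 1 = -1.
box = [2 if face % 2 == 0 else -1 for face in range(1, 7)]

n_plays = 50                          # 50 draws with replacement
expected_net = n_plays * mean(box)    # (# of draws) x (average of the box)

print("box:", box)
print("expected net winnings over 50 plays:", expected_net)
```

Each ticket records what is done with the draw (the net dollars added to the running total), so the total winnings are a sum of draws.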


Example 3: Coin Flipping

You are playing a coin flipping game. It costs $1 to play. You flip the two coins, and if there is at least one head showing, you win $2. Otherwise, you win nothing.

True or False, and Explain: If you play 30 times, you will definitely win $10.


Chance Error

Variation around expected value is due to chance error

chance error = # observed – # expected

If I actually win $5, what is my chance error? What if I lose $15?

How big is my chance error likely to be?
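A small sketch of the chance error calculation follows. It assumes the questions above refer to 30 plays of the coin game from Example 3, and that the $1 cost is paid every play with $2 as the gross payout (net +$1 when at least one head shows, net -$1 otherwise). Both points are assumptions, and `chance_error` is a hypothetical helper name.

```python
from statistics import mean

def chance_error(observed, expected):
    """chance error = observed - expected"""
    return observed - expected

# ASSUMPTION: one ticket per equally likely outcome of two coins.
# HH, HT, TH -> net +1 ; TT -> net -1
box = [1, 1, 1, -1]
expected = 30 * mean(box)           # EV for the net winnings in 30 plays

print(chance_error(5, expected))    # actually win $5
print(chance_error(-15, expected))  # actually lose $15
```

A negative chance error means the observed winnings fell below the expected value.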


Standard Error

The Standard Error (SE) for the sum of the draws from the box is

$$\text{SE} = \sqrt{\text{\# of draws}} \times (\text{SD of the box})$$

The sum of draws from a box (with replacement) should be somewhere around the expected value, give or take a SE.


Example 4: Two Boxes

What kind of variability do we expect from the sum of 5 draws, with replacement, from a box with:

1 A single “1” and a single “3”?

2 A single “1” and a single “10”?

Calculate the EV and SE of both of these situations.
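The two boxes can be compared with a short sketch like the one below. `ev_and_se` is a hypothetical helper name; `statistics.pstdev` is used because the SD of the box divides by the number of tickets, not by one less.

```python
from math import sqrt
from statistics import mean, pstdev

def ev_and_se(box, n_draws):
    """Return (EV, SE) for the sum of n_draws draws with replacement."""
    ev = n_draws * mean(box)            # (# of draws) x (average of the box)
    se = sqrt(n_draws) * pstdev(box)    # sqrt(# of draws) x (SD of the box)
    return ev, se

ev_a, se_a = ev_and_se([1, 3], 5)
ev_b, se_b = ev_and_se([1, 10], 5)

print(f"box (1, 3):  EV = {ev_a}, SE = {se_a:.2f}")
print(f"box (1, 10): EV = {ev_b}, SE = {se_b:.2f}")
```

The second box has tickets that are much farther apart, so its SD, and therefore the SE of the sum, is much larger.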


SD Shortcut

We are obviously calculating a lot of SDs. The usual steps:

1 Find average

2 Find the deviations from average

3 Square the deviations from average

4 Average the squared deviations

5 Take the square root


SD Shortcut (cont.)

If there are only two types of tickets in the box (or only two types of numbers in the list):

1 Call the larger number the “big #” and the smaller number the “small #”

2 Call the fraction of larger numbers “b.f.” and the fraction of smaller numbers “s.f.”

3 Calculate

$$\text{SD} = (\text{big \#} - \text{small \#}) \times \sqrt{\text{b.f.} \times \text{s.f.}}$$


SD Shortcut (cont.)

Let’s look at a box with three “2”s and two “1”s

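avg = 1.6

$$\text{SD} = \sqrt{\frac{(2-1.6)^2 + (2-1.6)^2 + (2-1.6)^2 + (1-1.6)^2 + (1-1.6)^2}{5}} = \sqrt{\frac{(0.4)^2 + (0.4)^2 + (0.4)^2 + (-0.6)^2 + (-0.6)^2}{5}} \approx 0.49$$

$$\text{SD} = (2-1) \times \sqrt{\frac{3}{5} \times \frac{2}{5}} \approx 0.49$$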
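The agreement between the five-step calculation and the shortcut can be checked with a sketch along these lines. The function names `sd_long` and `sd_shortcut` are hypothetical, introduced only for this illustration.

```python
from math import sqrt

def sd_long(box):
    """SD by the five steps: average, deviations, squares, average, square root."""
    avg = sum(box) / len(box)
    deviations = [x - avg for x in box]
    squared = [d * d for d in deviations]
    return sqrt(sum(squared) / len(squared))

def sd_shortcut(box):
    """SD shortcut, valid only when the box holds at most two kinds of numbers."""
    big, small = max(box), min(box)
    if set(box) != {big, small}:
        raise ValueError("shortcut needs a box with only two kinds of numbers")
    bf = box.count(big) / len(box)      # fraction of big numbers
    sf = box.count(small) / len(box)    # fraction of small numbers
    return (big - small) * sqrt(bf * sf)

box = [2, 2, 2, 1, 1]                   # three "2"s and two "1"s
print(round(sd_long(box), 2), round(sd_shortcut(box), 2))
```

Both routes give the same number; the shortcut just skips the deviations.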


Lists vs. Chance Processes

List of numbers (tickets in a box), all values are known

mean = average = sum of values, divided by number of values; the typical size of an entry/ticket

SD = standard deviation = square root of the average of the squared deviations from the mean; the typical size of the deviation from the mean in a single entry

The typical entry in a list is around average, give or take a standard deviation or so.


Lists vs. Chance Processes (cont.)

Chance process (draws from a box), values are unknown

EV for sum of draws with replacement = (number of draws) times (average of box); typical size of the sum of draws with replacement

SE for sum of draws with replacement = standard error = (square root of number of draws) times (SD of box); typical size of deviation from EV in a single sum of draws

The sum of draws with replacement is around expected value, give or take a standard error or so.


Example 5: Drawing from a Box

50 draws are taken, with replacement, from a box with 1 each of the following: “1”, “2”, “3”, “6”, “8”

1 Calculate the expected value and standard error for the sum of the draws.

2 The sum of the draws will be around ____, give or take ____ or so.

3 Someone actually makes 50 draws with replacement. You are asked to guess what the sum is. Do you think your guess is off by about 2, 12, or 20?

4 You are told that 175 is the sum. Fill in the following:

(a) expected value =
(b) observed value =
(c) chance error =
(d) standard error =
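The quantities in this example can be worked out with a sketch like the following, which simply applies the EV and SE formulas from earlier in the lecture. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3, 6, 8]
n_draws = 50

ev = n_draws * mean(box)            # (# of draws) x (average of the box)
se = sqrt(n_draws) * pstdev(box)    # sqrt(# of draws) x (SD of the box)

observed = 175
chance_error = observed - ev        # observed - expected

print(f"EV = {ev}, SE = {se:.1f}")
print(f"observed = {observed}, chance error = {chance_error}")
```

Since the SE is the likely size of the chance error, the guess in part 3 should be compared against the SE computed here.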


Interesting Question:

What is the chance that using the box above (1 each of: “1”, “2”, “3”, “6”, “8”), the sum of 1000 draws is between 3900 and 4100?

Could you find a similar probability for a much smaller number of draws?

Recalling the frequency definition of probability, could you find this probability?


Interesting Question: (cont.)

Draw 1000 tickets with replacement, calculate the sum of the tickets

Do this 10 times, record the proportion of sums (out of 10) that are between 3900 and 4100

Do this 100 times, record the proportion of sums (out of 100) that are between 3900 and 4100

[Figure: Relative Proportion of Observed Sums of 1000 Draws Between 3900 and 4100. x-axis: Number of Observed Sums (0 to 10000); y-axis: Relative Proportion Between 3900 and 4100 (0.75 to 0.90)]


Interesting Question: (cont.)

Draw 1000 tickets with replacement, calculate the sum of the tickets

Do this 200 times, record the proportion of sums (out of 200) that are between 3900 and 4100

Do this ____ times, record the proportion of sums that are between 3900 and 4100

[Figure: Relative Proportion of Observed Sums of 1000 Draws Between 3900 and 4100. x-axis: Number of Observed Sums (0 to 10000); y-axis: Relative Proportion Between 3900 and 4100 (0.75 to 0.90)]


Interesting Question: (cont.)

Draw 1000 tickets with replacement, calculate the sum of the tickets

Do this 10,000 times, make a histogram of the sum

The histogram looks normal

[Figure: Histogram of 10,000 Observed Sums of 1,000 Draws From The Box. x-axis: Sum of 1,000 Draws (3700 to 4300); y-axis: Density (0.000 to 0.004)]
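The repeated-drawing experiment described on these slides can be imitated with a short simulation. This is only a sketch: the seed, the number of repetitions (2,000 rather than 10,000, to keep it quick), and the helper name `sum_of_draws` are choices made here, not details from the lecture.

```python
import random

box = [1, 2, 3, 6, 8]
rng = random.Random(20)   # fixed seed (arbitrary) so the run is repeatable

def sum_of_draws(n_draws):
    """Draw n_draws tickets with replacement and return their sum."""
    return sum(rng.choices(box, k=n_draws))

n_sums = 2000             # number of observed sums (an arbitrary choice)
hits = sum(3900 <= sum_of_draws(1000) <= 4100 for _ in range(n_sums))
proportion = hits / n_sums

print(f"proportion of sums between 3900 and 4100: {proportion:.3f}")
```

As the number of observed sums grows, the proportion settles down, which is the frequency definition of probability at work.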


Interesting Question: (cont.)

We can use the approximate normality of this curve to calculate the chance that the sum of 1,000 draws is between 3900 and 4100.

$$z = \frac{\text{value of sum} - \text{expected value of sum}}{\text{standard error of sum}}$$

Use the normal table to find the chance that the sum of 1000 draws from the box (“1”, “2”, “3”, “6”, “8”) is between 3900 and 4100.

In general, the normal curve can be used to calculate probabilities for sums of random draws with replacement from a box, when the number of draws is “large”
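A sketch of the normal-curve calculation for this box is given below. It swaps the printed normal table for Python's `statistics.NormalDist`, which plays the same role; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 2, 3, 6, 8]
n_draws = 1000

ev = n_draws * mean(box)            # expected value of the sum
se = sqrt(n_draws) * pstdev(box)    # standard error of the sum

# Convert both endpoints to standard units.
z_low = (3900 - ev) / se
z_high = (4100 - ev) / se

# Area under the standard normal curve between the two z values.
std_normal = NormalDist()
chance = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"EV = {ev}, SE = {se:.1f}, z from {z_low:.2f} to {z_high:.2f}")
print(f"approximate chance = {chance:.3f}")
```

A table lookup at z of about 1.2 should give nearly the same area, up to the rounding of z.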


Example 6: Using the Normal Curve

A fair die is thrown 200 times.

1 Calculate the expected value and standard error for the sum of the throws

2 The sum of the throws will be around ____, give or take ____ or so.

3 Find the probability that the sum of the throws is greater than 647.
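The same recipe applies to the die. The sketch below treats each throw as a draw from the box (“1”, …, “6”) and again uses `statistics.NormalDist` in place of the normal table; no continuity correction is applied, which is a simplification.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

box = [1, 2, 3, 4, 5, 6]            # one ticket per face of a fair die
n_throws = 200

ev = n_throws * mean(box)
se = sqrt(n_throws) * pstdev(box)

z = (647 - ev) / se                 # 647 in standard units
chance = 1 - NormalDist().cdf(z)    # area to the right of z

print(f"EV = {ev}, SE = {se:.2f}, z = {z:.2f}")
print(f"approximate chance that the sum exceeds 647: {chance:.3f}")
```

Because 647 is more than two SEs below the expected value, almost all of the area lies to its right.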


Example 7: Counting the Evens

A fair die is thrown 600 times.

1 The sum of the throws will be around 2100, give or take 42 or so.

2 The sum of evens thrown will be around ____, give or take ____ or so.

3 The number of evens thrown will be around ____, give or take ____ or so.


Example 7: Counting the Evens (cont.)

Let’s simplify, and assume a fair die is thrown 5 times. I roll

6 2 4 5 3

If I am making a sum of throws, I will add:

6 + 2 + 4 + 5 + 3 = 20

If I am making a sum of evens thrown, I will add:

6 + 2 + 4 + 0 + 0 = 12

If I am counting the number of evens thrown, I will add:

1 + 1 + 1 + 0 + 0 = 3

Each process can be written as a sum!


Example 7: Counting the Evens (cont.)

Strategy for adding only certain things or counting number of things:

1 Make the box describing the basic chance process

2 Formulate your desired quantity as a sum

3 Change the value of the tickets (but not the number of tickets!) to add as appropriate


Example 7: Counting the Evens (cont.)

A fair die is thrown 600 times. Box is (“1”, “2”, “3”, “4”, “5”, “6”). Have 600 throws, with replacement

1 Sum of throws is like sum of draws from box above, usual EV and SE formulas apply

2 Sum of evens: each even drawn adds itself to the sum. Each odd drawn adds 0 to the sum. Box becomes (“0”, “2”, “0”, “4”, “0”, “6”). Sum of evens is like sum of draws from this box, usual EV and SE formulas apply

3 Number of evens: each even drawn adds 1 to the count (sum). Each odd drawn adds 0 to the count (sum). Box becomes (“0”, “1”, “0”, “1”, “0”, “1”). Number of evens is like sum of draws from this box, usual EV and SE formulas apply.
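The three boxes above can be run through the same EV and SE formulas, as in this sketch. The dictionary layout and names are illustrative only.

```python
from math import sqrt
from statistics import mean, pstdev

n_throws = 600

# Same six tickets each time; only the values written on them change.
boxes = {
    "sum of throws":   [1, 2, 3, 4, 5, 6],
    "sum of evens":    [0, 2, 0, 4, 0, 6],
    "number of evens": [0, 1, 0, 1, 0, 1],
}

results = {}
for name, box in boxes.items():
    ev = n_throws * mean(box)           # (# of draws) x (average of the box)
    se = sqrt(n_throws) * pstdev(box)   # sqrt(# of draws) x (SD of the box)
    results[name] = (ev, se)
    print(f"{name}: around {ev:.0f}, give or take {se:.0f} or so")
```

The first line reproduces the "2100, give or take 42" stated in the example; the other two come from nothing more than relabeling the tickets.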


In a 0-1 Box

The “1”s represent the event(s) that we wish to count, the “0”s represent the event(s) that we do not wish to count

The EV for the sum = number of draws × average of box. But what is the average of the box?

The SE for the sum = $\sqrt{\text{number of draws}} \times \text{SD of box}$. But what is the SD of the box?

The normal curve can be used to calculate chances for sums of draws: new avg = EV for sum, new SD = SE for sum
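One way to answer the two questions above is to apply the SD shortcut to a 0-1 box. The symbols $p$ (the fraction of “1”s in the box) and $n$ (the number of draws) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
\text{average of box} &= \frac{\text{number of ``1''s}}{\text{number of tickets}} = p \\
\text{SD of box} &= (1 - 0) \times \sqrt{p \times (1 - p)} = \sqrt{p(1 - p)} \\
\text{EV for the sum} &= n \times p \\
\text{SE for the sum} &= \sqrt{n} \times \sqrt{p(1 - p)}
\end{align*}
```

For the number of evens in 600 throws of a fair die, $p = 1/2$ and $n = 600$, giving an EV of 300 and an SE of $\sqrt{600} \times 0.5 \approx 12$.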


Important Takeaways

The EV is the sum that we expect to see

The SE is the amount we expect a particular sum to be “off” from the EV, due to chance error

There is a shortcut for calculating the SD of a list with only two kinds of numbers

The normal curve can be used to calculate chances for sums of draws: new avg = EV for sum, new SD = SE for sum

Changing the box helps address a lot of problems, either to add only certain kinds of tickets, or to count the number of a certain event

Next time: Why use the normal curve to calculate probabilities?
