is the birthday problem more than a curiosity? · probability, 1976 [after developing a formula for...

13
11/9/2014 1 Steven J. Wilson Johnson County Community College AMATYC 2014, Nashville, Tennessee How many people must be in a room before the probability of at least two sharing a birthday is greater than 50%? Johnny Carson’s stab at it … http://www.cornell.edu/video/ the-tonight-show-with-johnny- carson-feb-6-1980-excerpt Also Feb 7, Feb 8 Johnny Carson, 1925-2005 Ed McMahon, 1923-2009

Upload: others

Post on 15-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

1

Steven J. Wilson

Johnson County Community College

AMATYC 2014, Nashville, Tennessee

How many people must

be in a room before the

probability of at least two

sharing a birthday is

greater than 50%?

Johnny Carson’s stab at it …

http://www.cornell.edu/video/

the-tonight-show-with-johnny-

carson-feb-6-1980-excerpt

Also Feb 7, Feb 8

Johnny Carson, 1925-2005

Ed McMahon, 1923-2009

Page 2: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

2

P(at least 2 share) = 1 – P(no one shares)

To find P(no one shares), do probabilities

of choosing a non-matching birthday:

So P(at least 2 share)

365

365

364

365

363

365

362

365

11

365

n

365 364 363 (365 ( 1))

365n

n

365365!

(365 )! 365 365

n

n n

P

n

3651365

n

n

P

P(at least 2 share)

For n = 23, P(at least 2 share) = 50.73%

3651365

n

n

P

n P(sharing)

5 2.71%

10 11.69%

15 25.29%

20 41.14%

25 56.87%

30 70.63%

35 81.44%

40 89.12%

45 94.10%

50 97.04%

The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation … … what calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories.

--- Paul Halmos (1916-2006), I Want to Be a Mathematician, 1985

Page 3: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

3

Is this only a curiosity?

How was this computed historically?

What is the underlying distribution?

Generalizations?

Applications?

[After solving the Birthday Problem]

“The next example in this section not

only possesses the virtue of giving rise

to a somewhat surprising answer, but it

is also of theoretical interest.”

- Sheldon Ross, A First Course in

Probability, 1976

[After developing a formula for the

probability that no point appears twice when sampling with replacement]

“A novel and rather surprising

application of [the formula] is the

so-called birthday problem.”

- Hoel, Port, & Stone, Introduction to

Probability Theory, 1971

Page 4: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

4

Richard von Mises (1883-1953)

often gets credit for posing it in

1939, but …

… he sought the expected

number of repetitions as

a function of the

number of people.

Ball & Coxeter included it in their 11th

edition of Mathematical Recreations

and Essays, published in 1939.

They gave credit to Harold Davenport

(1907-1969), who shared it about 1927.

But he did not think he was the

originator either.

So how did they do the computations back then?

Paul Halmos: “The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation …”

Page 5: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

5

P(no one shares)

Recall the Taylor Series:

First-order approximation: P(no one shares)

365 364 363 362 365 ( 1)

365 365 365 365 365

n

0 1 2 11 1 1 1

365 365 365 365

n

2

12 !

nx x x

e xn

0/365 1/365 2/365 ( 1)/365ne e e e (1 2 ( 1))/3651 ne

( ( 1)/2)/365n ne

Solving P(at least 2 share) > 0.50

( ( 1)/2)/365

( ( 1)/2)/365

1 0.5

0.5

( 1)ln(0.5)

2(365)

( 1) (365) ln 4 505.997

n n

n n

e

e

n n

n n

1 1 4(506) 1 202523

2 2n

2 506 0n n

2

( 1) ln 4

ln 4 0

1 1 4 ln 4

2

n n d

n n d

dn

0.5 0.25 ln 4

0.5 1.177

n d

n d

Page 6: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

6

So n is roughly proportional to the

square root of d, with

proportionality constant about 1.2.

1.2 365 22.93

0.5 1.177 365 22.99

n

n

(Pat’sBlog attributes the factor 1.2 to Persi Diaconis)

0.5 0.25 ln 4

0.5 1.177

1.2

n d

n d

n d

P(2 people not sharing) =

Among n people,

number of pairs =

oFor 23 people, there are 253 pairs!

oThat’s 253 chances of a birthday

match!

P(2 of n people sharing) ≈

oAlarm Bells! Independence assumed!

364

365

2

( 1)

2n

n nC

( 1)

23641

365

n n

P(2 of n people sharing) ≈

o Alarm Bells temporarily suppressed!

Solving:

gives n > 22.98

( 1)

23641

365

n n

( 1)

2

2

3641 0.5

365

505.3 0

n n

n n

Page 7: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

7

Discrete or Continuous?

These answer the wrong question

Discrete

Distribution

Random

Variable

Assumptions

Binomial No. of Successes Fixed n, constant p, independent

Hypergeometric No. of Successes Fixed n, p varies, dependent

Poisson No. of Successes Fixed time, constant λ,

independent

Geometric First Success n varies, constant p, independent

Negative

Binomial

k-th Success n varies, constant p, independent

Multinomial No. of Each Type Fixed n, constant p, independent

Place n indistinguishable balls

into d distinguishable bins.

Balls → People, Bins → Birthdays

When does one bin contain two

(or more) balls?

Bernoulli ???

Multinomial ???

Want “first match” !

AKA: occupancy problem,

collision counting

Page 8: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

8

How many trials to get the first match?

Variables:

x = number of trials to get the first match

d = number of equally likely options

(birthdays)

Pattern:

Not Not Not … Not Match

1 2 ( 2) 1d d d d x x

d d d d d

Pattern:

Not Not Not … Not Match

Probability:

1 2 ( 2) 1d d d d x x

d d d d d

1

( 1)( ) for 2,3,4,..., 1d xx

xP X x P x d

d

PDF CDF

When n=23,

the probability

of at least one

match first

exceeds 50%

The probability

of obtaining

the first match

is highest when

n = 20

MODE

MEAN

E(X) ≈ 24.6166

MEDIAN

Page 9: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

9

The PDF is discrete, so avoid calculus.

For which x is P(x) > P(x+1) ?

For d=365, we get x>19.61, so 19 or 20

Approximating:

1 1

1

2

1

( 1) ! !

( 1)! ( )!

( 1) ( 1)

0

1 1 4

2

d x d xx x

x x

x xP P

d d

x d x d

d d x d d x

d x x d x

x x d

dx

EXACT FORMULA!

x d for large d

Since

then

which is related to Ramanujan’s Q-function,

specifically

which is asymptotic:

and for d=365, we get:

1

( 1)( ) for 2,3,4,..., 1d xx

xP X x P x d

d

1

1

2

1( )

d

d xxx

xE X x P

d

( ) 1 ( )E X Q d

1 1 4( ) 1

2 3 12 2 135

dE X

d d

365 2 1 4( ) 24.6166

2 3 12 730 49275E X

A Jovian day = 9.9259 Earth-hours

A Jovian year = 11.86231 Earth-years

Both 1047625 and 10476P25 cause a

TI-84 overflow!

365.24 24 11 11.86231

1 1 9.9259

10476

Edays Ehrs JdayJyr Eyrs

Eyr Eday Ehrs

Jdays

10476 1

( 1)( ) for 2,3,4,...,10477

10476xx

xP X x P x

Page 10: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

10

Proportionality?

Heuristically?

1.2 10476 122.823n

ln 4( 1)

10476ln

10475

n n

( 1)

2104751 0.5

10476

n n

121.009n

Series Approximation:

Mathematica: for n = 121,

P(match) = 50.13%

( 1) (10476)ln4n n

10476 10475 10476 ( 1)0.5

10476 10476 10476

n

121.012n

Mean:

Median:

Mode:

1 1 4102.854

2

dx

2 1 4( ) 128.947

2 3 12 2 135

dE X

d d

1 1 4 ln 4121.012

2

dn

Page 11: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

11

Digitally

signed document

Your PIN

(private key)

F(Message)

Message

F

The hash function F

is not one-to-one!

Impostor Message

Alt 4

Alt 1

Alt 2

Alt 5

Alt 3

Imp 1

Imp 2

Imp 3

Imp 4 Imp 5

Birthday Attack: Find a good message

and a fraudulent message with the same value of F(Message).

F

G

Given n good messages, n fraudulent

messages, and a hash function with d

possible outputs, what is the probability

of a match between 2 sets?

Assume , so n outputs are

probably all different.

P(message 1 in set 1 doesn’t match

set 2) =

n d

1n

d

P(no message in set 1 matches

anything in set 2) =

50% probability of at least one

match between the sets:

Probability p of at least one match

between the sets:

2/ /1

nn

n d n dne e

d

2 /

2

1 0.50

ln 2

0.83

n de

n d

n d

1ln

1n d

p

Page 12: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

12

With a 16-bit hash function, there

are possible outputs

50% probability of a match with

messages:

1% probability of a match with

messages

162 65,536

0.83 65536 213n

165536ln 26

0.99n

Increase the size of the hash function:

oWith 64-bits, outputs are possible

o50% probability of a match with 3.5 billion messages (11 messages per US citizen)

o1% probability of a match with 431 million messages

642 18,446,744,073,709,551,616

Is this only a curiosity?

No, it’s part of a class of problems

How was this computed historically?

Using approximations (including Taylor Series)

What is the underlying distribution?

“First Match” Distribution

Generalizations?

Vary days in a year (other planets)

Applications?

Digital signatures

Page 13: Is the Birthday Problem More Than a Curiosity? · Probability, 1976 [After developing a formula for the probability that no point appears twice when sampling with replacement] “A

11/9/2014

13

Steven J. Wilson

Johnson County Community College

Overland Park, KS 66210

[email protected]

A PDF of the presentation is available at:

http://www.milefoot.com/about/presentations/birthday.pdf