cs#109 lecture#10 april#18th,2016 - stanford university · 2016-04-20 · • common#for#natural...

44
CS 109 Lecture 10 April 18th, 2016

Upload: others

Post on 13-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

CS  109Lecture  10

April  18th,  2016

Page 2: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• X is  a  Normal  Random  Variable:  X ~  N(µ,  σ 2)§ Probability  Density  Function  (PDF):

§

§

§ Also  called  “Gaussian”§ Note:  f(x) is  symmetric  about  µ

∞<<∞−= −− xexf x where2

1)(22 2/)( σµ

πσµ=][XE

2)( σ=XVar)(xf

x

µ

The Normal Distribution

Page 3: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Common  for  natural  phenomena:  heights,  weights,  etc.

• Often  results  from  the  sum  of  multiple  variables

• Most  noise  is  Normal

Why Use the Normal?

Page 4: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

Or  that  is  what  they  want  you  to  believe

Page 5: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Common  for  natural  phenomena:  heights,  weights,  etc.

• Often results from the sum ofmultiple variables

• Most  noise  is  Normal

Except Not

These are log-normal

Only if they are equally weighted

I wish it were that easy

Page 6: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

It  is  the  most  important  distribution

Page 7: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

Because  of  a  deeper  truth…

Page 8: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Training Data Set (x, y)

Learning Algorithm

Trained ModelTest x Predicted

y

World Model (𝜃)

Supervised Learning

Key question: How well can you generalize

Page 9: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Occam’s Razor

“The  simplest  explanation   is  usually   the  best  one”

Page 10: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Model Degrees of Freedom

Goo

dnes

s of F

it Training Set

Testing Set

Overfiting

Just right Too complex

Page 11: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Complexity is Tempting

value

prob

abilit

y

*  That  describes  the  training  data,  but  will  it  generalize?

Page 12: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

value

prob

abilit

y

µ

σ2

Simplicity is Humble

*  A  Gaussian  maximizes  entropy  for  a  given  mean  and  variance

Page 13: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Carl  Friedrich  Gauss  (1777-­1855)  was  a  remarkably  influential  German  mathematician

• Started  doing  groundbreaking  math  as  teenager§ Did  not  invent  Normal  distribution,  but  popularized   it

• He  looked  more  like  Martin  Sheen§ Who  is,  of  course,  Charlie  Sheen’s  father

Carl Friedrich Gauss

Page 14: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Anatomy of a beautiful equation

probability density at x

the distance to the mean

f(x) =1

p2⇡

e

�(x�µ)2

2�2

a constant

“exponential”

sigma shows up twice

N (µ,�2)

Page 15: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

P (a X b) =

Z 1

�1

1

p2⇡

e

�(x�µ)2

2�2dx

P (a X b) =

Z 1

�1

1

p2⇡

e

�(x�µ)2

2�2dx

Let’s try and integrate it!

*  Call  me  if  you  find  an  equation  for  this

Page 16: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

No  closed  form  for  the  integral

Page 17: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

No  closed  form  for  F(x)

Page 18: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

F (x) = �

✓x� µ

Spoiler Alert

A function that has been solved for numerically

*  We  are  going  to  spend  the  next  few  slides  getting  here

The cumulative density function of

any normal

N (µ,�2)

Page 19: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Linear Transform of Normal is NormalLet X ⇠ N (µ,�2)

Y = aX + bIf then Y is also Normal

E[Y ] = E[aX + b]

= aE[X] + b

= aµ+ b

V ar(Y ) = V ar(aX + b)

= a2V ar(X)

= a2�2

Y ⇠ N (aµ+ b, a2�2)

Page 20: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Special Linear Transform

Y ⇠ N (aµ+ b, a2�2)

Z =X � µ

�=

1

�X � µ

There is a special case of linear transform for any X:

Z ⇠ N (aµ+ b, a2�2)

⇠ N (µ

�� µ

�,�2

�2)

⇠ N (0, 1)

Y = aX + bIf then Y is also Normal

a =1

�b = �µ

Page 21: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Z is  a  Standard  (or  Unit)  Normal  RV: Z ~ N(0, 1)§ E[Z] = m = 0 Var(Z) = s 2 = 1 SD(Z) = s = 1§ CDF of Z, FZ(z) does not have closed form

§ We denote FZ(z) as Φ(z): “phi of z”

§ By symmetry: Φ(–z) = P(Z ≤ –z) = P(Z ≥ z) = 1 – Φ(z)

dxedxezZPzz

xz

x ∫∫∞−

∞−

−− ==≤= 2/2/)( 222

21

21)()Φ(

ππσσµ

Standard (Unit) Normal Variable

Page 22: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Symmetry of Phi

a-a

ɸ(-a)

1 – ɸ(a)

ɸ(-a) = 1 – ɸ(a)

µ = 0

Page 23: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Interval of Phi

dc

ɸ(c)ɸ(d)

µ = 0

Page 24: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Interval of Phi

dc

ɸ(c)ɸ(d)

ɸ(d) - ɸ(c)

µ = 0

Page 25: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Use Z to compute F(x)

Compute F(x) via Transform

FX(x) = P (X x)

= P (X � µ x� µ)

= P

✓X � µ

x� µ

= P

✓Z x� µ

= �

✓x� µ

FX(x) = P (X x)

= P (X � µ x� µ)

= P

✓X � µ

x� µ

= P

✓Z x� µ

= �

✓x� µ

FX(x) = P (X x)

= P (X � µ x� µ)

= P

✓X � µ

x� µ

= P

✓Z x� µ

= �

✓x� µ

FX(x) = P (X x)

= P (X � µ x� µ)

= P

✓X � µ

x� µ

= P

✓Z x� µ

= �

✓x� µ

FX(x) = P (X x)

= P (X � µ x� µ)

= P

✓X � µ

x� µ

= P

✓Z x� µ

= �

✓x� µ

Let X ⇠ N (µ,�2) Z =X � µ

�=

1

�X � µ

Page 26: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

And here we are

Table of Φ(z) values in textbook, p. 201 and handout

F (x) = �

✓x� µ

CDF of Standard Normal: A function that has been solved

for numerically

The cumulative density function

(CDF) of any normal

N (µ,�2)

Page 27: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Φ(0.54)  =  0.7054

Using Table of ɸ

Page 28: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical TrajectoriesKinda old  school

Page 29: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Using Programming Library

norm.cdf(x, mean, std)

= �

✓x� µ

*  Every  modern  programming   language  has  a  normal   library

= P (X < x) where X ⇠ N (µ,�2)

Page 30: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

I made one for you

Page 31: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• X  ~  N(3,  16) µ =  3          σ 2 =  16          σ =  4§ What  is  P(X  >  0)?

§ What  is  P(2  <  X  <  5)?

§ What  is  P(|X  – 3|  >  6)?

)42

41)4

3543

432 (()52( <<−<<=<< =

−−− ZPPXP X

2902.0)5987.01(6915.0))(1()()()( 41

21

41

42 =−−=Φ−−Φ=−Φ−Φ

7734.0)())(1(1)(1 43

43

43 =Φ=Φ−−=−Φ−

)431)4

3)430

43 ((()0( −≤−>=> −==

−>

− ZPZPPXP X

)439)4

33 (()9()3( −>+

−−<=>+−< ZPZPXPXP1336.0)9332.01(2))(1(2))(1()( 2

323

23 =−=Φ−=Φ−+−Φ

Get Your Gaussian On

0.1337

Page 32: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Send  voltage  of  2  or  -­2  on  wire  (to  denote  1  or  0)§ X  =  voltage  sent§ R  =  voltage  received  =  X  +  Y,  where  noise  Y  ~  N(0,  1)§ Decode  R:    if  (R  ≥  0.5)  then  1,  else  0§ What  is  P(error  after  decoding  |  original  bit  =  1)?

§ What  is  P(error  after  decoding  |  original  bit  =  0)?

0668.0)5.1(1)5.1()5.1()5.02( ≈Φ−=−Φ=−<=<+ YPYP

0062.0)5.2(1)5.2()5.02( ≈Φ−=≥=≥+− YPYP

Noisy Wires

Page 33: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

Gaussian  for  uncertainty

Page 34: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

ELO Ratings

How  do  you  model  zero  sum  games?What  is  the  probability   that  the  Warriors  beat  the  Rockets?

Page 35: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

P (Warriors win) = P (AW > AR)

⇡ 0.87

ELO RatingsHow  it  works:• Each  team  has  an  “ELO”  score  S,  calculated  based  on  their  past  performance.

• Each  game,  the  team  has  ability  A ~ N(S, 2002)• The  team  with  the  higher  sampled  ability  wins.

AW ⇠ N (1797, 2002)AR ⇠ N (1555, 2002)

17971555 ability

p

Calculated via sampling

P (Warriors win) = P (AW > AR)

⇡ 0.87

Arpad  Elo

Page 36: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Poll of polls?

Credit: fivethirtyeight.com

Democratic  Pennsylvania  Primary

What  is  the  probability   that  Hillary/Bernie  wins?

Page 37: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Poll of polls?

How  it  works:• Estimate  a  Gaussian  for  each  candidate• Mean  is  a  weighted  average  of  poll  means• Variance   is  a  weighted  average  of  poll  trust• Sample  Gaussians  to  find  out  %  win• Key:  represent  uncertainty

Credit: fivethirtyeight.com

Page 38: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Four Prototypical Trajectories

Gaussian  for  Binomial

Page 39: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

Remember this?

There  is  a  deep  reason  for  the  Binomial/Normal   approximation…

Page 40: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• X  ~  Bin(n,  p)§ E[X]  =  np Var(X)  =  np(1  – p)§ Poisson  approx.  good:  n large  (>  20),  p small  (<  0.05)§ For  large  n:    X  ≈ Y  ~  N(E[X],  Var(X))  =  N(np,  np(1  – p))§ Normal  approx.  good  :  Var(X)  =  np(1  – p)  ≥  10

§ DeMoivre-­Laplace   Limit  Theorem:o Sn:  number  of  successes  (with  prob.  p)  in  n independent  trials

( ) ⎟⎟⎠

⎞⎜⎜⎝

−−Φ−⎟

⎟⎠

⎞⎜⎜⎝

+−Φ=+<<−≈=

)1(5.0

)1(5.0)( 2

121

pnpnpk

pnpnpkkYkPkXP

)()()1(

abbpnpnpSaP nn Φ−Φ⎯⎯ →⎯⎟

⎟⎠

⎞⎜⎜⎝

⎛≤

−≤ ∞→

“Continuity  correction”

Normal Approximation of Binomial

Page 41: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

P(X  =  k)

k

Comparison when n = 100, p = 0.5

Page 42: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• 100  people  are  given  a  new  website  design§ X  =  #  people  whose  time  on  site  increases§ CEO  will  endorse  new  design   if  X  ≥  65  What  is  P(CEO  endorses  change|  it  has  no  effect)?

§ X  ~  Bin(100,  0.5)

§ Use  Normal  approximation:  Y  ~  N(50,  25)

§ Using  Binomial:

)5.64()65( >≈≥ YPXP

( ) 0019.0)9.2(1)9.2()5.64( 5505.64

550 ≈Φ−=>=>=> −− ZPPYP Y

0018.0)65( ≈≥XP

5125150 – p)np( – p) np( np ===

Website Testing

Page 43: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Stanford  accepts  2480  students§ Each  accepted  student  has  68%  chance  of  attending§ X  =  #  students  who  will  attend.      X  ~  Bin(2480,  0.68)§ What  is  P(X  >  1745)?

§ Use  Normal  approximation:  Y  ~  N(1686.4,  539.65)

§ Using  Binomial:

)5.1745()1745( >≈> YPXP

23.23165.53914.1686 – p)np( – p) np( np ≈≈=

( ) 0055.0)54.2(1)5.1745( 23.234.16865.1745

23.234.1686 ≈Φ−=>=> −−YPYP

0053.0)1745( ≈>XP

Stanford Admissions

Page 44: CS#109 Lecture#10 April#18th,2016 - Stanford University · 2016-04-20 · • Common#for#natural phenomena:#heights,#weights,# etc. • Often results from the sum of multiple variables

• Stanford  Daily,  March  28,  2014“Class  of  2018  Admit  Rates  Lowest  in  University  History”  by  Alex  Zivkovic

“Fewer  students  were  admitted  to  the  Class  of  2018  than  the  Class  of  2017,  due  to  the  increase  in  Stanford’s  yield  rate  which  has  increased  over  5  percent  in  the  past  four  years,  according  to  Colleen  Lim  M.A.  ’80,  Director  of  Undergraduate  Admission.”

Changes in Stanford Admissions