iterating over statistical models: ncaa tournament edition
TRANSCRIPT
![Page 1: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/1.jpg)
ITERATING OVER STATISTICAL MODELSNCAA TOURNAMENT EDITION
![Page 2: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/2.jpg)
Daniel Lee
WHY ARE YOU LISTENING TO ME?
▸ Stan developerhttp://mc-stan.org
▸ Researcher at Columbia
▸ Co-founder of Stan Grouptraining / statistical support / consulting
▸ email: [email protected] twitter: @djsyclik / @mcmc_stan web: http://syclik.com
![Page 3: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/3.jpg)
MARCH MADNESS
![Page 4: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/4.jpg)
WHAT IS MARCH MADNESS?
![Page 5: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/5.jpg)
WHAT IS MARCH MADNESS?
![Page 6: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/6.jpg)
CHAMPIONSHIP GAME: VILLANOVA VS NORTH CAROLINA
![Page 7: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/7.jpg)
P(VILLANOVA WIN)?CHAMPIONSHIP GAME: VILLANOVA VS NORTH CAROLINA
![Page 8: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/8.jpg)
P(VILLANOVA WIN)?
VILLANOVA WILDCATS VS. NORTH CAROLINA TAR HEELS
▸ 1:00. 70 - 69
▸ 0:35. 72 - 69
▸ 0:23. 72 - 71
![Page 9: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/9.jpg)
P(VILLANOVA WIN)?
VILLANOVA WILDCATS VS. NORTH CAROLINA TAR HEELS
▸ 1:00. 70 - 69
▸ 0:35. 72 - 69
▸ 0:23. 72 - 71
▸ 0:13. 74 - 71
![Page 10: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/10.jpg)
P(VILLANOVA WIN)?
VILLANOVA WILDCATS VS. NORTH CAROLINA TAR HEELS
▸ 1:00. 70 - 69
▸ 0:35. 72 - 69
▸ 0:23. 72 - 71
▸ 0:13. 74 - 71
▸ 0:06. 74 - 74
![Page 11: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/11.jpg)
P(VILLANOVA WIN)?
VILLANOVA WILDCATS VS. NORTH CAROLINA TAR HEELS
▸ 1:00. 70 - 69
▸ 0:35. 72 - 69
▸ 0:23. 72 - 71
▸ 0:13. 74 - 71
▸ 0:06. 74 - 74
▸ 0:00. 77 - 74. Villanova wins.
![Page 12: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/12.jpg)
![Page 13: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/13.jpg)
STATISTICAL MODELSTREAT
AS CODE
![Page 14: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/14.jpg)
WRITING CODE IS A DISCIPLINE
![Page 15: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/15.jpg)
WRITING CODE IS A DISCIPLINE
▸ design patterns
▸ testing
▸ code review
▸ maintenance
▸ modularity
▸ collaboration
CAN BE
![Page 16: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/16.jpg)
IS STATISTICAL MODELING A DISCIPLINE?
▸ Art or science?
▸ Models have names
▸ Statistical model vs implementation
▸ Collaboration on the statistical model?
![Page 17: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/17.jpg)
TREAT STATISTICAL MODELS AS CODE
WHAT DO WE NEED TO DO
▸ Elevate statistical models: first class entity
▸ Modularize
▸ Language
▸ Discuss subtle details
▸ Collaborate
![Page 18: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/18.jpg)
TREAT STATISTICAL MODELS AS CODE
STAN GETS US CLOSE
▸ statistical modeling language
▸ domain-specific languagehas its own grammar; not R or BUGS!
▸ rstanshinystan, RStudio integration, rstanarm
▸ open-sourcecore libraries are new BSD
▸ Stan program
▸ plain text
▸ plays nicely with source repositories
▸ imperative language
Note: Stan isn’t the only thing you can do this with
![Page 19: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/19.jpg)
MODELING BASKETBALL
![Page 20: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/20.jpg)
MARCH MADNESS
BASKETBALL HISTORY
▸ 1891
▸ Dr. James Naismith. Springfield, MA
▸ Non-contact conditioning
▸ 13 simple rules
▸ Peach basket
![Page 21: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/21.jpg)
MARCH MADNESS
BASKETBALL HISTORY
▸ 1891
▸ Dr. James Naismith. Springfield, MA
▸ Non-contact conditioning
▸ 13 simple rules
▸ Peach basket
10. The umpire shall be judge of the men and shall note the fouls and notify the referee when three consecutive fouls have been made. He shall have power to disqualify men according to Rule 5.
![Page 22: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/22.jpg)
MARCH MADNESS
BASKETBALL NOW
▸ 2 x 20 min half
▸ Increasing score
▸ Points increment by 2, 3, and 1
▸ 5 players, unlimited substitutions
▸ player DQ: 5th foul
▸ Bonus: 7 team foulsDouble bonus: 10 team fouls
![Page 23: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/23.jpg)
MARCH MADNESS
DATA
▸ 351 NCAA Division 1 Men’s basketball teams
▸ 33 conferences
▸ 5421 games
▸ 24 - 35 games per team
▸ Max 3 observations
![Page 24: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/24.jpg)
![Page 25: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/25.jpg)
IS THIS BIG DATA?
![Page 26: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/26.jpg)
TALL DATA VS WIDE DATA
▸ Tall datalots of replications
▸ Wide datalots of fieldsday, home, score, ot, fgm, fga, 3pm, 3pa, 3a, fta, ftm, or, dr, ast, to, stl, blk, pf
![Page 27: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/27.jpg)
THREE STEPS OF BAYESIAN DATA ANALYSIS
![Page 28: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/28.jpg)
ANDREW GELMAN IN BDA
THE THREE STEPS OF BAYESIAN DATA ANALYSIS
1. Set up full probability model
2. Condition on observed data
3. Evaluate the fit of the model
![Page 29: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/29.jpg)
ANDREW GELMAN IN BDA
THE THREE STEPS OF BAYESIAN DATA ANALYSIS
1. Set up full probability modelWrite a Stan program
2. Condition on observed data
3. Evaluate the fit of the model
![Page 30: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/30.jpg)
ANDREW GELMAN IN BDA
THE THREE STEPS OF BAYESIAN DATA ANALYSIS
1. Set up full probability modelWrite a Stan program
2. Condition on observed dataRun RStan
3. Evaluate the fit of the model
![Page 31: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/31.jpg)
ANDREW GELMAN IN BDA
THE THREE STEPS OF BAYESIAN DATA ANALYSIS
1. Set up full probability modelWrite a Stan program
2. Condition on observed dataRun RStan
3. Evaluate the fit of the modelR, ShinyStan, posterior predictive checks
![Page 32: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/32.jpg)
ITERATING OVER MODELS
![Page 33: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/33.jpg)
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #1
▸ Only 2015-2016 matters
▸ Teams have a latent ability
▸ “logistic regression”
▸ “Bradley-Terry model”
y ⇠ bernoulli(logit
�1(✓1 � ✓2))
![Page 34: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/34.jpg)
![Page 35: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/35.jpg)
P(VILLANOVA > UNC | DATA) = 0.73
![Page 36: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/36.jpg)
![Page 37: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/37.jpg)
![Page 38: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/38.jpg)
ITERATING OVER STATISTICAL MODELS
TREAT STATISTICAL MODELS AS CODE
▸ Statistical model in a separate file
▸ Git
▸ Testing
▸ Inspection of fit
▸ Backtest on historical data
▸ Priors
▸ “Model #1”
![Page 39: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/39.jpg)
IN WRITING, YOU MUST KILL ALL YOUR DARLINGS.
Willliam Faulkner
![Page 40: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/40.jpg)
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #2
▸ Home court advantage!
▸ Teams have a latent ability
▸ “logistic regression”
▸ “Bradley-Terry model”
y ⇠ bernoulli(logit
�1(↵+ ✓1 � ✓2))
![Page 41: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/41.jpg)
![Page 42: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/42.jpg)
P(VILLANOVA > UNC | DATA) = 0.52
![Page 43: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/43.jpg)
![Page 44: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/44.jpg)
ITERATING OVER STATISTICAL MODELS
STATISTICAL MODEL #3
▸ Assumptions:
▸ Only 2015-2016 matters
▸ Teams have a latent ability
▸ Model points
▸ Add a home court advantage
![Page 45: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/45.jpg)
![Page 46: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/46.jpg)
![Page 47: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/47.jpg)
![Page 48: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/48.jpg)
ITERATING OVER STATISTICAL MODELS
HOW DID WE DO?
▸ Kaggle: 87 / 608
▸ What went wrong?
▸ What’s next?
![Page 49: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/49.jpg)
STATISTICAL MODELSTREAT
AS CODE
![Page 50: Iterating over statistical models: NCAA tournament edition](https://reader034.vdocuments.mx/reader034/viewer/2022042723/587688f11a28ab1b158b7b97/html5/thumbnails/50.jpg)
THANKS▸ Collaborative effort
▸ NCAA modeling:Rob Trangucci
▸ Stan teamAndrew Gelman, Bob Carpenter, Matt Hoffman, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, Allen Riddell, Marco Ignacio, Jeff Arnold, Mitzi Morris, Rob Goedman, Brian Lau, Jonah Gabry, Alp Kucukelbir, Robert Grant, Dustin Tran, Krzysztof Sakrejda, Aki Vehtari, Rayleigh Lei, Sebastian Weber
HELP▸ http://mc-stan.org
▸ stan-users mailing list
▸ Stan Group Inc.http://stan.fit training / statistical support / consulting
▸ [email protected] / @djsyclik / @mcmc_stan