bayesian model averaging

Volodymyrk Bayesian Model Averaging Bayesian Mixer, 27.09.2016 London, UK

Upload: volodymyr-kazantsev

Post on 14-Apr-2017

Volodymyrk

Bayesian Model Averaging

Bayesian Mixer, 27.09.2016

London, UK

Bayesian Model Averaging (BMA) - 1 minute version

New project - how much is it worth?

                      CFO       VP of Growth
Model                 M1        M2
Net Present Value     $50m      $100m
CEO belief            30%       70%

CEO belief: after evaluating both models and market data

Weighted by CEO belief: $15m + $70m = $85m

K = 2
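
The arithmetic on this slide can be sketched in a few lines of Python. The dictionary layout and the function name `bma_estimate` are hypothetical; the numbers are the ones on the slide, with the CEO's beliefs playing the role of posterior model probabilities.

```python
# Numbers from the slide; the names and structure are illustrative assumptions.
models = {
    "M1 (CFO)": {"npv_musd": 50.0, "belief": 0.30},
    "M2 (VP of Growth)": {"npv_musd": 100.0, "belief": 0.70},
}


def bma_estimate(models):
    """Average the model predictions, weighted by the belief in each model."""
    total_belief = sum(m["belief"] for m in models.values())
    if abs(total_belief - 1.0) > 1e-9:
        raise ValueError("model beliefs must sum to 1")
    return sum(m["belief"] * m["npv_musd"] for m in models.values())


averaged_npv = bma_estimate(models)
print(f"BMA estimate: ${averaged_npv:.0f}m")
```

The check on the beliefs is there because the weights only behave like probabilities if they sum to one.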

Bayesian Model Averaging (BMA) - 3 minute version

VP of Growth

Sensitivity Analysis for M2 (NPV, $m)

              CLV assumptions
CAC        $10      $12      $15
$4          72      129      149
$6          62      112      133
$8          51       92      101

Average = $100.11m

DATA
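
As a quick check of the average on this slide, the grid can be averaged with equal weight on every CLV/CAC scenario. This is a sketch: the variable names are made up here, and reading CLV as customer lifetime value and CAC as customer acquisition cost is an assumption, since the slide does not expand the abbreviations.

```python
clv_assumptions = [10, 12, 15]  # columns of the slide's table, in $
cac_assumptions = [4, 6, 8]     # rows of the slide's table, in $

# npv_grid[i][j]: NPV in $m under cac_assumptions[i] and clv_assumptions[j]
npv_grid = [
    [72, 129, 149],
    [62, 112, 133],
    [51, 92, 101],
]

# Equal weights: every scenario is treated as equally plausible.
cells = [value for row in npv_grid for value in row]
average_npv = sum(cells) / len(cells)
print(f"Average = ${average_npv:.2f}m")
```

Replacing the equal weights with scenario probabilities would turn this sensitivity table into the same kind of weighted average as in the 1 minute version.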

Bayesian Model Averaging (BMA) - 5 minute version

Bayesian Model Averaging: A Tutorial. Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky

How much do you trust your VP and CFO, before you look at models?

Scary normalising term that you can ignore

Prior probability for model parameters
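
The three labels above appear to annotate the standard BMA equations from the cited tutorial. The slide's own formulas did not survive in this transcript, so the block below is a reconstruction in the tutorial's usual notation, not a copy of the slide. The symbols are assumptions of this sketch: \Delta is the quantity of interest (for example NPV), D the data, M_k the k-th of K models and \theta_k its parameters.

```latex
\begin{align}
\Pr(\Delta \mid D) &= \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D)\,\Pr(M_k \mid D) \\
\Pr(M_k \mid D) &= \frac{\Pr(D \mid M_k)\,\Pr(M_k)}{\sum_{l=1}^{K} \Pr(D \mid M_l)\,\Pr(M_l)} \\
\Pr(D \mid M_k) &= \int \Pr(D \mid \theta_k, M_k)\,\Pr(\theta_k \mid M_k)\,d\theta_k
\end{align}
```

Read against the labels: \Pr(M_k) is how much you trust your VP and CFO before looking at the models, the denominator of the second equation is the normalising term, and \Pr(\theta_k \mid M_k) is the prior for the model parameters.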

Bayesian answer to overfitting

Frequentist:
- model selection
- regularisation

Bayesian:
- BMA
- marginalisation

Case Study

You just got the best job in the galaxy

Your new boss
Business domain
Modelling case

Always test your models on synthetic data that you understand and control

Use Cases:
- Fraud Detection
- Inventory Sourcing

Data

Modelling goals

- A prediction range is needed, so that you can identify fraudulent transactions (Sand People under-reporting the real transaction size and pocketing the profit)

- The sale price should be easily explainable as a function of various droid features, so that Jabba can invest in appropriate scavenging/sourcing projects

- You want the lowest prediction error possible, so that you are not fed to the Sarlacc

Data Generation

[Diagram labels: droid classes Class-1, Class-2, Class-3, Class-4; attributes durability, circuitry, height, weight, age, ...; price]
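
The generator script itself is not part of this transcript. As a minimal sketch of the idea, assuming a linear ground truth, the code below draws a few features, lets only some of them drive the price, and adds Gaussian noise. The feature names are borrowed from the model formulas later in the talk; every coefficient, range and default here is invented for illustration.

```python
import random

# Hypothetical ground truth: only these features drive the price.
TRUE_COEFFICIENTS = {"weight": 3.0, "power": 5.0, "dents": -20.0}
# Recorded in the data but, by construction, irrelevant to the price.
NOISE_FEATURES = ["temperature", "lat", "long"]


def generate_droids(n, noise_sd=50.0, seed=0):
    """Return n synthetic droid sales as dicts with a 'credits' price column."""
    rng = random.Random(seed)  # seeded so that the data set is reproducible
    rows = []
    for _ in range(n):
        row = {name: rng.uniform(0.0, 100.0) for name in TRUE_COEFFICIENTS}
        row.update({name: rng.uniform(0.0, 100.0) for name in NOISE_FEATURES})
        signal = sum(coef * row[name] for name, coef in TRUE_COEFFICIENTS.items())
        row["credits"] = 1000.0 + signal + rng.gauss(0.0, noise_sd)
        rows.append(row)
    return rows


droids = generate_droids(200)
print(len(droids), sorted(droids[0]))
```

Because the true coefficients are known, any model fitted to this data can be judged on whether it recovers the relevant features and ignores the irrelevant ones.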

Data Collection

Model Selection - classical method

credits ~ height + weight + power + dents + rad + wheels + legs + red + blue + black + temperature + lat + long + ir_emit + dents_log + height_log + weight_log + power_log + rad_log

Adj. R2: 0.884974385182
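
The slides report adjusted R2 without showing how it is computed. A small sketch of the textbook definition follows; the sample size and raw R2 used in the example calls are hypothetical, since the slides state neither.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R^2 for an OLS fit with an intercept and n_features predictors."""
    if n_obs - n_features - 1 <= 0:
        raise ValueError("need more observations than fitted parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)


# Hypothetical inputs: same raw R^2, 19 predictors versus 13 predictors.
print(adjusted_r2(r2=0.90, n_obs=200, n_features=19))
print(adjusted_r2(r2=0.90, n_obs=200, n_features=13))
```

For the same raw fit, the model with fewer predictors is penalised less, which is why dropping weak features can raise adjusted R2.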

Model Selection - backward elimination

Final Model

credits ~ weight + power + dents + rad + wheels + blue + black + temperature + lat + dents_log + height_log + weight_log + power_log

Adj. R2: 0.903544333611

Model Evaluation (out-of-sample)

Ridge regression (L2 regularisation)
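
The slide names ridge regression without a formula. For reference, the textbook definition is sketched below. The symbols X (design matrix), y (response), \beta (coefficients) and \lambda (penalty strength) are notation assumed here, not taken from the slides, and the predictors are assumed to be centred so that the intercept is left unpenalised.

```latex
\begin{align}
\hat{\beta}^{\text{ridge}}
  &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
  &= (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda > 0
\end{align}
```

Larger values of \lambda shrink all coefficients towards zero, whereas BMA shrinks a coefficient according to how much posterior probability sits on models that exclude it.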

Bayesian Model Averaging for Linear Models - a special case

Inclusion probabilities for regression coefficients are weighted across all possible models

Number of models = combinations of all K features (include/exclude) = 2^K
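
To get a feel for the size of the model space, the count can be checked for the 19 features in the classical model formula from this talk. This is a sketch; the function names are made up for illustration.

```python
from itertools import product

# The 19 candidate features from the full model formula in this talk.
FEATURES = [
    "height", "weight", "power", "dents", "rad", "wheels", "legs",
    "red", "blue", "black", "temperature", "lat", "long", "ir_emit",
    "dents_log", "height_log", "weight_log", "power_log", "rad_log",
]


def count_models(features):
    """Each feature is either included or excluded, giving 2^K models."""
    return 2 ** len(features)


def enumerate_models(features):
    """Yield every feature subset as a tuple of the included names."""
    for mask in product([False, True], repeat=len(features)):
        yield tuple(name for name, keep in zip(features, mask) if keep)


print(count_models(FEATURES))  # 2**19 = 524288
small_space = list(enumerate_models(FEATURES[:3]))
print(len(small_space), small_space)
```

Full enumeration is still feasible at this size, but the count doubles with every added feature.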

How to actually do BMA? (in R)

- cran.r-project.org/web/packages/BMA: Mature. A.k.a. “the original”

- cran.r-project.org/web/packages/BMS: Developed during PhD research. Not maintained

- cran.r-project.org/web/packages/BAS: Newest. Maintained by the Chair of the Department of Statistical Science at Duke

BMA using BMS (R) package

Method               MSE
Model Selection      9736.49
L2 Regularisation    7782.21
BMA                  7329.44
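
To put the three numbers in proportion, the relative reduction in out-of-sample MSE achieved by BMA can be worked out directly from the values on this slide. A short sketch with hypothetical variable names:

```python
# Out-of-sample MSE values as reported on the slide.
mse = {
    "Model Selection": 9736.49,
    "L2 Regularisation": 7782.21,
    "BMA": 7329.44,
}

best_method = min(mse, key=mse.get)

# Fraction by which BMA's MSE is lower than each alternative's MSE.
reduction_vs = {
    method: 1.0 - mse["BMA"] / value
    for method, value in mse.items()
    if method != "BMA"
}

print(f"Lowest MSE: {best_method}")
for method, reduction in reduction_vs.items():
    print(f"BMA vs {method}: {100.0 * reduction:.1f}% lower MSE")
```

These figures come from a single synthetic data set, so they describe this run rather than a general ranking of the methods.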

It worked! But you can find inputs to the data generator script for which BMA will not work as well!

Nice things you get from BMA

Posterior Inclusion Probability!

How cool is that!
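
A posterior inclusion probability for a feature is the total posterior probability of the models that contain it. The sketch below shows that bookkeeping on a toy model space; the four models and their posterior probabilities are invented for illustration and are not output from the BMS package.

```python
# Hypothetical posterior model probabilities P(M_k | D); they sum to 1.
posterior = {
    ("weight", "power"): 0.50,
    ("weight", "power", "dents"): 0.30,
    ("weight",): 0.15,
    ("power", "lat"): 0.05,
}


def inclusion_probabilities(posterior):
    """Sum the posterior probability of every model that includes each feature."""
    pip = {}
    for included, prob in posterior.items():
        for name in included:
            pip[name] = pip.get(name, 0.0) + prob
    return pip


pip = inclusion_probabilities(posterior)
for name, value in sorted(pip.items(), key=lambda item: -item[1]):
    print(f"{name:<8} {value:.2f}")
```

A feature that appears only in low-probability models ends up with a low inclusion probability, which is what makes the measure useful for interpretation.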

Model ranking!

MCMC can be used if the number of features is large

Best model, according to BMA

Can we use it for more complex models?

normalising term that you can ignore

http://www.ssc.wisc.edu/~bhansen/718/NonParametrics15.pdf
http://www.ejwagenmakers.com/2004/aic.pdf

Warning: Very questionable math. Does not work
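
The slide does not show which formula was tried. One common shortcut of this kind is to turn information-criterion scores into normalised model weights, often called Akaike weights, and the sketch below assumes that this is roughly what was attempted. The function name and the AIC values are hypothetical, and the slide's own warning applies: treating these weights as posterior model probabilities is questionable.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1 (lower AIC, higher weight)."""
    best = min(aic_values.values())
    # Subtracting the best score first keeps the exponentials well scaled.
    raw = {name: math.exp(-0.5 * (aic - best)) for name, aic in aic_values.items()}
    total = sum(raw.values())  # plays the role of the normalising term
    return {name: value / total for name, value in raw.items()}


# Hypothetical AIC scores for three fitted models.
aic = {"model_a": 1002.3, "model_b": 1000.0, "model_c": 1010.8}
weights = akaike_weights(aic)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Only differences between scores matter here, so adding the same constant to every AIC value leaves the weights unchanged.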

Can we use BMA to combine complex (incl. hierarchical) models?

[Figure: model ranking, labelled 1, 3, 2]

Model order is somewhat similar. Relative probabilities are not.

We need a working Reversible-Jump MCMC or something more sophisticated.

Not available in common Bayesian MCMC packages yet.

In Summary

- BMA is a Bayesian version of ML model ensembles
- The math behind it is quite beautiful

- Model averaging is useful for interpretation, not only prediction

- Invest in synthetic data generation before applying new modelling techniques to real-world data

- Even if you are not using BMA, fit different models, and combine them if your goal is prediction

- BMA works very well for common GLMs, but does not work yet for arbitrary models

- Do try it next time you need to fit OLS, though!

Of course we are hiring!

● (Snr, Mid) Data Scientists

● Solutions Architect

● Ruby Developer

● Data Engineer

● Senior Artist

● Technical Artist

● Unity Developers

● Senior Product Manager

● Product Director

http://jobs.productmadness.com/