statistical modeling: the two cultures

39
STATISTICAL MODELING: THE TWO CULTURES Based on Leo Breiman's paper Christoph Molnar Department of Statistics, LMU Munich

Upload: christoph-molnar

Post on 26-Jan-2015

110 views

Category:

Education


3 download

DESCRIPTION

This presentation is based on ``Statistical Modeling: The two cultures'' from Leo Breiman. It compares the data modeling culture (statistics) and the algorithmic modeling culture (machine learning).

TRANSCRIPT

Page 1: Statistical Modeling: The Two Cultures

STATISTICAL MODELING: THETWO CULTURES

Based on Leo Breiman's paper

Christoph MolnarDepartment of Statistics,LMU Munich

Page 2: Statistical Modeling: The Two Cultures

..

STATISTICAL MODELING: THETWO CULTURES

Based on Leo Breiman's paper

Christoph MolnarDepartment of Statistics,LMU Munich

..20

14-01-20

Statistical Modeling: The Two Cultures

Abstract:This presentation compares two cultures of statisticalmodeling: the data modeling culture, which assumes astochastic process that produced the data. This culture isassociated with traditional statistics. The other culture is calledalgorithmic modeling culture, which can be reduced tooptimization of a loss function with an algorithm. This culture isassociated with Machine Learning. It is argued to usealgorithmic modeling more often in statistics.

Page 3: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

OUTLINE

1. Statistics: Data Modeling Culture2. Machine Learning: Algorithmic Modeling Culture3. Statistical Learning Principles4. Personal Experience5. Summary

Content heavily based on: ``Statistical Modeling: The two cultures''from Leo Breiman [1]

1 / 19

Page 4: Statistical Modeling: The Two Cultures

..

OUTLINE

1. Statistics: Data Modeling Culture2. Machine Learning: Algorithmic Modeling Culture3. Statistical Learning Principles4. Personal Experience5. Summary

Content heavily based on: ``Statistical Modeling: The two cultures''from Leo Breiman [1]

..20

14-01-20

Statistical Modeling: The Two Cultures

Outline

This presentation is based on ``Statistical Modeling: The twocultures'' from Leo Breiman [1].The first segment introduces the data modeling culture andanalogously the second segment explains the algorithmicmodeling culture together with the presentation of threealgorithms. The part about statistical learning principalspresents aspects which help to compare both of cultures.Personal experiences in both cultures are addressed. Theconclusion summarizes the message of the paper [1].

Page 5: Statistical Modeling: The Two Cultures

DATA MODELING CULTURE

Page 6: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

WORK OF A DATA ANALYST

→ Predict→ Reveal associations→ Munge data, design experiments, visualize data, …

2 / 19

Page 7: Statistical Modeling: The Two Cultures

..

WORK OF A DATA ANALYST

→ Predict→ Reveal associations→ Munge data, design experiments, visualize data, …

..20

14-01-20

Statistical Modeling: The Two Cultures

Data Modeling

Work of a Data Analyst

The work of a data analyst is very diverse. This presentationfocuses on the modeling of data, which can be reduced to twotargets: Learn a model to predict the outcome for newcovariates and get a better understanding about therelationship between covariates and outcome.

Page 8: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

SIMPLIFIED WORLDVIEW

..nature. x.y ..

3 / 19

Page 9: Statistical Modeling: The Two Cultures

..

SIMPLIFIED WORLDVIEW

..nature. x.y ..

..20

14-01-20

Statistical Modeling: The Two Cultures

Data Modeling

Simplified worldview

In a strongly simplified world an arbitrary outcome y isproduced by the nature given the covariates x. The knowledgeabout the natures true mechanisms range between entirelyunknown and established (scientific) explanations of themechanism. One example: Outcome y is the rent forapartments and covariates x are size, number of bathroomsand location.

Page 10: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

DATA MODELING CULTURE

..

LogisticRegression,Cox Model,

GEE,…

. x.y ..

Find a stochastic model of the data-generating process:y = f(x, parameters, random error)

4 / 19

Page 11: Statistical Modeling: The Two Cultures

..

DATA MODELING CULTURE

..

LogisticRegression,Cox Model,

GEE,…

. x.y ..

Find a stochastic model of the data-generating process:y = f(x, parameters, random error)

..20

14-01-20

Statistical Modeling: The Two Cultures

Data Modeling

Data Modeling Culture

The direct modeling of the mechanism in the ``box'' is labeled»Data Modeling Culture« by Leo Breiman. In this culture astochastic model for the data- generating process is assumed.A common formulation of these model is: y is a function of xwith corresponding weights and a random error. For example:Given the covariates size, number of bathrooms and location,the rent of apartments is normal distributed.

Page 12: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

TYPICAL ASSUMPTIONS AND RESTRICTIONS

→ Specific stochastic model that generated the data→ Distribution of residuals→ Linearity (e.g. linear predictor)→ Manual specification of interactions

5 / 19

Page 13: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

PROBLEMS

→ Conclusions about model, not about nature→ Assumptions often violated→ Often no model evaluation→ ⇒ can lead to irrelevant theory and questionable

statistical conclusions→ Focus not on prediction→ Data models fail in areas like image and speech

recognition

6 / 19

Page 14: Statistical Modeling: The Two Cultures

ALGORITHMIC MODELING CULTURE

Page 15: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

MACHINE LEARNING

..unknown. x.y ..

algorithm

....

Find a function f(X) that minimizes the loss: L(Y, f(X))

7 / 19

Page 16: Statistical Modeling: The Two Cultures

..

MACHINE LEARNING

..unknown. x.y ..

algorithm

....

Find a function f(X) that minimizes the loss: L(Y, f(X))

..20

14-01-20

Statistical Modeling: The Two Cultures

Algorithmic Modeling

Machine Learning

In the »Algorithmic Modeling Culture«, the true mechanism istreated as unknown. It is not the target to find the truedata-generating mechanism but to use an algorithm thatimitates the mechanism as good as possible. Modeling isreduced to a mere problem of function optimization: Given thecovariates x, outcome y and a loss function find a function f(x)which minimizes the loss for the prediction of the outcome. Thisculture is lived in the machine learning area.Summary: The data modeling culture tries to find the truedata-generating mechanism, the algorithmic modeling culturetries to imitate the true mechanism as good as possible.

Page 17: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

ALGORITHM IN MACHINE LEARNING

→ Boosting→ Support Vector Machines→ Artificial neural networks→ Random Forests→ Hidden Markov→ Bayes-Nets→ … 1

1Details and more algorithms in ``Elements of statistical learning''[2]8 / 19

Page 18: Statistical Modeling: The Two Cultures

..

ALGORITHM IN MACHINE LEARNING

→ Boosting→ Support Vector Machines→ Artificial neural networks→ Random Forests→ Hidden Markov→ Bayes-Nets→ … 1

1Details and more algorithms in ``Elements of statistical learning''[2]

..20

14-01-20

Statistical Modeling: The Two Cultures

Algorithmic Modeling

Algorithm in Machine Learning

The algorithms used in machine learning are motivateddifferently. Three algorithms are presented in short.

Page 19: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

ARTIFICIAL NEURAL NETWORKS

2

2http://commons.wikimedia.org/wiki/File:Mouse_cingulate_cortex_neurons.jpghttp://commons.wikimedia.org/wiki/File:Neural_network.svg

9 / 19

Page 20: Statistical Modeling: The Two Cultures

..

ARTIFICIAL NEURAL NETWORKS

2

2http://commons.wikimedia.org/wiki/File:Mouse_cingulate_cortex_neurons.jpghttp://commons.wikimedia.org/wiki/File:Neural_network.svg

..20

14-01-20

Statistical Modeling: The Two Cultures

Algorithmic Modeling

Artificial neural networks

Artificial neural networks are used in classification andregression. They are inspired by the brain, which consists of anetwork of brain cells (neurons). Mathematically artificialneural networks are a concatenation of weighted functions. Anexemplary application is image processing.

Page 21: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

SUPPORT VECTOR MACHINES

3

3http://commons.wikimedia.org/wiki/File:Svm_10_perceptron.JPG 10 / 19

Page 22: Statistical Modeling: The Two Cultures

..

SUPPORT VECTOR MACHINES

3

3http://commons.wikimedia.org/wiki/File:Svm_10_perceptron.JPG

..20

14-01-20

Statistical Modeling: The Two Cultures

Algorithmic Modeling

Support Vector Machines

Support Vector Machines (SVM) were originally a classificationmethod (regression is also possible). SVMs try to draw aborder between two classes in the covariate space. Thedistance between the border and the class points ismaximized. They use a mathematical trick to implicitly projectthe covariates in a space with higher dimensions (yes, it soundsa bit crazy) in order to achieve class separation. Textclassification is an exemplary usage.

Page 23: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

RANDOM FORESTS

4

4http://openclipart.org/detail/175304/forest-by-z-175304

11 / 19

Page 24: Statistical Modeling: The Two Cultures

..

RANDOM FORESTS

4

4http://openclipart.org/detail/175304/forest-by-z-175304

..20

14-01-20

Statistical Modeling: The Two Cultures

Algorithmic Modeling

Random Forests

Random Forests™(invented by Leo Breiman) are used forregression and classification. A Random Forest consists ofmany decision trees. Two random mechanisms are used totrain different trees on the data. Results from all trees areaveraged for the prediction.

Page 25: Statistical Modeling: The Two Cultures

STATISTICAL LEARNING PRINCIPLES

Page 26: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

RASHOMON EFFECT

(Often) Many different models describe a situationequally accurate.

12 / 19

Page 27: Statistical Modeling: The Two Cultures

..

RASHOMON EFFECT

(Often) Many different models describe a situationequally accurate.

..20

14-01-20

Statistical Modeling: The Two Cultures

Principles

Rashomon Effect

Rashomon is a Japanese movie in which four witnesses telldifferent versions of a crime. All versions account for the factsbut they contradict each other. In terms of statistical learningthis means, that often different models (e.g. same y butdifferent covariates) can be equally accurate. Each model hasa different interpretation which makes it difficult to find thetrue mechanism in the data modeling cultures. In thealgorithmic modeling culture the Rashomon effect is exploitedby aggregating over many models. Random forests use thiseffect by aggregating over many trees. It is also common toaverage the predictions of different algorithms.

Page 28: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

DIMENSIONALITY OF THE DATA

→ The higher the dimensionality of the data (#covariates) the more difficult is the separation ofsignal and noise

→ Common practice in data modeling: variable selection(by expert selection or data driven) and reduction ofdimensionality (e.g. PCA)

→ Common practice in algorithmic modeling:Engineering of new features (covariates) to increasepredictive accuracy; algorithms robust for manycovariates

13 / 19

Page 29: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

PREDICTION VS. INTERPRETATION

.. Interpretability.

Predictive accuracy

.

Tree

.

Random Forest

.

Logist. Regression

.

Neuronal networks

.

.

14 / 19

Page 30: Statistical Modeling: The Two Cultures

..

PREDICTION VS. INTERPRETATION

.. Interpretability.

Predictive accuracy

.

Tree

.

Random Forest

.

Logist. Regression

.

Neuronal networks

.

.

..20

14-01-20

Statistical Modeling: The Two Cultures

Principles

Prediction vs. Interpretation

There is a trade-off between interpretability and predictiveaccuracy: the models that are good in prediction are oftencomplex and models that are easy to interpret are often badpredictors. See for example trees and Random Forests: Asingle decision tree is very intuitive and easy to read fornon-professionals, but they are unstable and give weakpredictions. A complex aggregation of decision trees (RandomForest) has an excellent prediction accuracy, but it isimpossible to interpret the model structure.

Page 31: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

GOODNESS OF MODEL

→ Data modeling culture: Goodness of fit often basedon model assumptions (e.g. AIC) and calculated ontraining data.

→ Algorithmic modeling culture: Evaluation of predictiveaccuracy with an extra test set or cross validation.

How good is a statistical model if the predictive accuracyis weak? Is it legit to interpret parameters and p-values?

15 / 19

Page 32: Statistical Modeling: The Two Cultures

PERSONAL EXPERIENCES

Page 33: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

STATISTICAL CONSULTING

Stereotypical user ...

→ are e.g. veterinarians, linguists, biologists, …→ crave p-values→ want interpretability→ ignore model diagnosis

16 / 19

Page 34: Statistical Modeling: The Two Cultures

..

STATISTICAL CONSULTING

Stereotypical user ...

→ are e.g. veterinarians, linguists, biologists, …→ crave p-values→ want interpretability→ ignore model diagnosis

..20

14-01-20

Statistical Modeling: The Two Cultures

Experiences

Statistical Consulting

From my experience in the statistical consulting unit of ouruniversity, most user want to test their scientific hypothesiswith models. They want models which are easy to interpretregarding their questions. Thus it is more important to have amodel that gives parameters associated with covariates andp-values than to have a model that predicts the data well.

Page 35: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

KAGGLE

Algorithms of winners on kaggle, a platform forprediction challenges:

→ Job Salary Prediction: »I used deep neural networks«→ Observing Dark Worlds: »Bayesian analysis provided

the winning recipe for solving this problem«→ Give Me Some Credit: »In the end we only used five

supervised learning methods: a random forest ofclassification trees, a random forest of regressiontrees, a classification tree boosting algorithm, aregression tree boosting algorithm, and a neuralnetwork.«

17 / 19

Page 36: Statistical Modeling: The Two Cultures

..

KAGGLE

Algorithms of winners on kaggle, a platform forprediction challenges:

→ Job Salary Prediction: »I used deep neural networks«→ Observing Dark Worlds: »Bayesian analysis provided

the winning recipe for solving this problem«→ Give Me Some Credit: »In the end we only used five

supervised learning methods: a random forest ofclassification trees, a random forest of regressiontrees, a classification tree boosting algorithm, aregression tree boosting algorithm, and a neuralnetwork.«

..20

14-01-20

Statistical Modeling: The Two Cultures

Experiences

kaggle

Kaggle (http://www.kaggle.com) is a platform for predictionchallenges. Data are provided and the participant with the bestprediction on a test set wins the challenge. To generateinsights about the mechanisms in the data is secondarybecause the prediction is all that counts to win. That's why thealgorithmic modeling culture is superior in this field.

Page 37: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

CONCLUSION

Data analysts should:

→ Use predictive accuracy to evaluate models→ Seek the best model→ Add Machine Learning to their toolbox

18 / 19

Page 38: Statistical Modeling: The Two Cultures

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

FURTHER LITERATURE

L. BreimanStatistical modeling: The two cultures (with commentsand a rejoinder by the author)Institute of Mathematical Statistics, 2001.

T. Hastie, R. Tibshirani and J. FriedmanThe elements of statistical learningSpringer New York, 2001

19 / 19

Page 39: Statistical Modeling: The Two Cultures

»All models are wrong, but some are useful« (G. Box)