STATISTICAL MODELING: THE TWO CULTURES Based on Leo Breiman's paper Christoph Molnar Department of Statistics, LMU Munich

2014-01-20

Statistical Modeling: The Two Cultures

Abstract: This presentation compares two cultures of statistical modeling. The data modeling culture assumes a stochastic process that produced the data and is associated with traditional statistics. The algorithmic modeling culture reduces modeling to the optimization of a loss function with an algorithm and is associated with machine learning. The presentation argues that algorithmic modeling should be used more often in statistics.

Data Modeling Algorithmic Modeling Principles Experiences Conclusions

OUTLINE

1. Statistics: Data Modeling Culture
2. Machine Learning: Algorithmic Modeling Culture
3. Statistical Learning Principles
4. Personal Experience
5. Summary

Content heavily based on: ``Statistical Modeling: The two cultures'' from Leo Breiman [1]


This presentation is based on ``Statistical Modeling: The two cultures'' by Leo Breiman [1]. The first segment introduces the data modeling culture. The second segment explains the algorithmic modeling culture in the same way and presents three algorithms. The part on statistical learning principles presents aspects that help to compare both cultures. Personal experiences in both cultures are then addressed. The conclusion summarizes the message of the paper [1].

DATA MODELING CULTURE


WORK OF A DATA ANALYST

→ Predict
→ Reveal associations
→ Munge data, design experiments, visualize data, …


The work of a data analyst is very diverse. This presentation focuses on the modeling of data, which can be reduced to two targets: learn a model that predicts the outcome for new covariates, and gain a better understanding of the relationship between covariates and outcome.


SIMPLIFIED WORLDVIEW

[Diagram: x → nature → y]


In a strongly simplified world, an arbitrary outcome y is produced by nature given the covariates x. Knowledge about nature's true mechanism ranges from entirely unknown to established (scientific) explanations. One example: the outcome y is the rent of an apartment and the covariates x are its size, number of bathrooms and location.


DATA MODELING CULTURE

[Diagram: x → box labeled "Logistic Regression, Cox Model, GEE, …" → y]

Find a stochastic model of the data-generating process:
y = f(x, parameters, random error)


Leo Breiman labels the direct modeling of the mechanism in the ``box'' the »Data Modeling Culture«. In this culture a stochastic model for the data-generating process is assumed. A common formulation of such a model is: y is a function of x with corresponding weights (parameters) and a random error. For example: given the covariates size, number of bathrooms and location, the rent of an apartment is normally distributed.
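The rent example can be sketched as a small data model. The following Python sketch assumes a straight-line relationship between size and rent with a normally distributed error. The five data points and all variable names are invented for illustration and do not come from the paper [1].

```python
from math import sqrt

# Hypothetical data (assumption, not from the source): size in m^2, monthly rent
sizes = [30.0, 45.0, 60.0, 75.0, 90.0]
rents = [450.0, 610.0, 790.0, 930.0, 1120.0]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(rents) / n

# Least-squares estimates for the assumed model:
# rent = intercept + slope * size + normally distributed error
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, rents))
s_xx = sum((x - mean_x) ** 2 for x in sizes)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals and the estimated standard deviation of the assumed error term
residuals = [y - (intercept + slope * x) for x, y in zip(sizes, rents)]
sigma = sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(f"rent = {intercept:.1f} + {slope:.2f} * size, error sd = {sigma:.1f}")
```

In the data modeling culture the estimated slope would then be interpreted as the effect of size on rent, which is only justified if the assumed model holds.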


TYPICAL ASSUMPTIONS AND RESTRICTIONS

→ Specific stochastic model that generated the data
→ Distribution of residuals
→ Linearity (e.g. linear predictor)
→ Manual specification of interactions


PROBLEMS

→ Conclusions about model, not about nature
→ Assumptions often violated
→ Often no model evaluation
→ ⇒ can lead to irrelevant theory and questionable statistical conclusions
→ Focus not on prediction
→ Data models fail in areas like image and speech recognition


ALGORITHMIC MODELING CULTURE


MACHINE LEARNING

[Diagram: x → unknown → y, with "algorithm" below the box]

Find a function f(X) that minimizes the loss: L(Y, f(X))


In the »Algorithmic Modeling Culture«, the true mechanism is treated as unknown. The target is not to find the true data-generating mechanism but to use an algorithm that imitates the mechanism as well as possible. Modeling is reduced to a problem of function optimization: given the covariates x, the outcome y and a loss function, find a function f(x) that minimizes the loss for the prediction of the outcome. This culture is practiced in machine learning.

Summary: The data modeling culture tries to find the true data-generating mechanism; the algorithmic modeling culture tries to imitate the true mechanism as well as possible.
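The reduction to function optimization can be illustrated with a minimal sketch. It assumes a squared-error loss, a function of the form f(x) = w * x + b and plain gradient descent as the algorithm; the data, the step size and the function names are hypothetical choices made for this illustration only.

```python
# Hypothetical data (assumption): generated exactly by y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def loss(w, b):
    """Mean squared error L(Y, f(X)) for f(x) = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=2000, learning_rate=0.05):
    """Minimize the loss by plain gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


initial_loss = loss(0.0, 0.0)
w, b = fit()
final_loss = loss(w, b)
print(initial_loss, final_loss)
```

Nothing in this sketch claims that the data were produced by a linear process; the fitted function is judged only by how small the loss becomes.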


ALGORITHMS IN MACHINE LEARNING

→ Boosting
→ Support Vector Machines
→ Artificial neural networks
→ Random Forests
→ Hidden Markov
→ Bayes-Nets
→ … (1)

(1) Details and more algorithms in ``Elements of statistical learning'' [2]


The algorithms used in machine learning have different motivations. Three of them are presented briefly.


ARTIFICIAL NEURAL NETWORKS

Image sources:
http://commons.wikimedia.org/wiki/File:Mouse_cingulate_cortex_neurons.jpg
http://commons.wikimedia.org/wiki/File:Neural_network.svg


Artificial neural networks are used for classification and regression. They are inspired by the brain, which consists of a network of brain cells (neurons). Mathematically, an artificial neural network is a composition of weighted functions. An exemplary application is image processing.
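The idea of a composition of weighted functions can be made concrete with a tiny network. This is only a sketch: the weights are picked by hand (an assumption, no training takes place) so that the network reproduces the XOR function, and the names `layer` and `network` are illustrative.

```python
def relu(z):
    """Activation function: pass positive values, set negative values to zero."""
    return max(0.0, z)


def layer(inputs, weights, biases, activation):
    """One layer: a weighted sum of the inputs per unit, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def network(x):
    """Two layers applied one after the other: output(hidden(x))."""
    hidden = layer(x, weights=[[1.0, 1.0], [1.0, 1.0]], biases=[0.0, -1.0],
                   activation=relu)
    output = layer(hidden, weights=[[1.0, -2.0]], biases=[0.0],
                   activation=lambda z: z)
    return output[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, network(x))
```

In practice the weights are not set by hand but found by minimizing a loss function on training data.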


SUPPORT VECTOR MACHINES

Image source: http://commons.wikimedia.org/wiki/File:Svm_10_perceptron.JPG


Support Vector Machines (SVMs) were originally a classification method (regression is also possible). An SVM tries to draw a border between two classes in the covariate space such that the distance between the border and the nearest points of each class is maximized. SVMs use a mathematical trick (the kernel trick) to implicitly project the covariates into a space with more dimensions (yes, it sounds a bit crazy) in order to achieve class separation. Text classification is an exemplary usage.
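The implicit projection can be checked numerically. The sketch below is not an SVM; it only shows, for one kernel chosen here as an assumption (the polynomial kernel of degree 2 on two covariates), that the kernel value computed in the original space equals an inner product in a higher-dimensional space. The function names are illustrative.

```python
from math import sqrt


def kernel(x, z):
    """Polynomial kernel of degree 2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit projection of a 2-D point into the matching 3-D space."""
    return (x[0] ** 2, sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Ordinary inner product of two vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = kernel(x, z)                           # never leaves 2-D
explicit = dot(feature_map(x), feature_map(z))    # computed in 3-D
print(implicit, explicit)
```

Both numbers agree up to rounding, so an algorithm that only needs inner products can work in the larger space without ever constructing it.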


RANDOM FORESTS

Image source: http://openclipart.org/detail/175304/forest-by-z-175304


Random Forests™ (invented by Leo Breiman) are used for regression and classification. A Random Forest consists of many decision trees. Two random mechanisms are used to train different trees on the data. The results from all trees are averaged for the prediction.
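A strongly reduced sketch of the idea follows. It assumes that the two random mechanisms are bootstrap sampling of the observations and random selection of covariates, and it simplifies heavily: each "tree" is a stump with a single split on one randomly chosen covariate, whereas a real Random Forest grows full trees and draws a random covariate subset at every split. Data and function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Best single split on one covariate, minimizing the squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= threshold]
        right = [yi for row, yi in zip(X, y) if row[feature] > threshold]
        m_left, m_right = mean(left), mean(right)
        sse = (sum((v - m_left) ** 2 for v in left)
               + sum((v - m_right) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, m_left, m_right)
    if best is None:  # covariate is constant in this sample: predict the mean
        m = mean(y)
        return (feature, 0.0, m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=50, seed=0):
    """Train stumps on bootstrap samples, each with one random covariate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # mechanism 1: bootstrap
        feature = rng.randrange(p)                  # mechanism 2: random covariate
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps."""
    return mean(m_left if row[f] <= t else m_right
                for f, t, m_left, m_right in forest)


# Hypothetical data: the first covariate drives y, the second does not
X = [[float(i), float((i * 7) % 10)] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X, y)
print(predict(forest, [2.0, 5.0]), predict(forest, [17.0, 5.0]))
```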

STATISTICAL LEARNING PRINCIPLES


RASHOMON EFFECT

(Often) many different models describe a situation equally accurately.


Rashomon is a Japanese movie in which four witnesses tell different versions of a crime. All versions account for the facts, but they contradict each other. In terms of statistical learning this means that different models (e.g. same y but different covariates) can often be equally accurate. Each model has a different interpretation, which makes it difficult to find the true mechanism in the data modeling culture. In the algorithmic modeling culture the Rashomon effect is exploited by aggregating over many models. Random Forests use this effect by aggregating over many trees. It is also common to average the predictions of different algorithms.
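A toy illustration of the effect, under invented data: two straight-line models use different covariates, reach the same error and suggest different interpretations, while the average of their predictions has a lower error than either. The four data points and all names are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def mse(predictions, ys):
    """Mean squared error of the predictions."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


# Hypothetical data: y depends on both covariates in the same way
x1 = [0.0, 1.0, 0.0, 1.0]
x2 = [0.0, 0.0, 1.0, 1.0]
y = [0.0, 1.0, 1.0, 2.0]

a1, b1 = fit_line(x1, y)   # model A explains y with x1 only
a2, b2 = fit_line(x2, y)   # model B explains y with x2 only
pred_a = [a1 + b1 * v for v in x1]
pred_b = [a2 + b2 * v for v in x2]
pred_avg = [(p + q) / 2 for p, q in zip(pred_a, pred_b)]

mse_a, mse_b, mse_avg = mse(pred_a, y), mse(pred_b, y), mse(pred_avg, y)
print(mse_a, mse_b, mse_avg)
```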


DIMENSIONALITY OF THE DATA

→ The higher the dimensionality of the data (number of covariates), the more difficult the separation of signal and noise

→ Common practice in data modeling: variable selection (by expert selection or data-driven) and reduction of dimensionality (e.g. PCA)

→ Common practice in algorithmic modeling: engineering of new features (covariates) to increase predictive accuracy; algorithms robust to many covariates


PREDICTION VS. INTERPRETATION

[Figure: Tree, Random Forest, Logistic Regression and Neural networks placed on two axes: interpretability and predictive accuracy]


There is a trade-off between interpretability and predictive accuracy: models that predict well are often complex, and models that are easy to interpret are often bad predictors. Trees and Random Forests are an example: a single decision tree is very intuitive and easy to read for non-professionals, but it is unstable and gives weak predictions. A complex aggregation of decision trees (Random Forest) has excellent predictive accuracy, but it is impossible to interpret the model structure.


GOODNESS OF MODEL

→ Data modeling culture: Goodness of fit is often based on model assumptions (e.g. AIC) and calculated on the training data.

→ Algorithmic modeling culture: Evaluation of predictive accuracy with a separate test set or cross-validation.

How good is a statistical model if its predictive accuracy is weak? Is it legitimate to interpret its parameters and p-values?
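The difference between the two ways of evaluating a model can be shown with a small sketch. It assumes a 1-nearest-neighbor predictor, which reproduces its training data perfectly, and compares its error on the training data with its error under 5-fold cross-validation. The data, the fold assignment and the function names are invented for this illustration.

```python
# Hypothetical data: a linear trend with an alternating disturbance of +1 / -1
xs = [float(i) for i in range(20)]
ys = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]


def predict_1nn(train_x, train_y, x):
    """Predict with the outcome of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


def mse(predictions, targets):
    """Mean squared error of the predictions."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Error on the training data: every point is its own nearest neighbor
train_mse = mse([predict_1nn(xs, ys, x) for x in xs], ys)


def cross_validated_mse(k=5):
    """k-fold cross-validation: predict each fold from the remaining folds."""
    predictions, targets = [], []
    for fold in range(k):
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            predictions.append(predict_1nn(train_x, train_y, xs[i]))
            targets.append(ys[i])
    return mse(predictions, targets)


cv_mse = cross_validated_mse()
print(train_mse, cv_mse)
```

The training error is zero by construction, while the cross-validated error is not, which is why a fit measured on the training data alone says little about predictive accuracy.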


PERSONAL EXPERIENCES


STATISTICAL CONSULTING

Stereotypical users ...

→ are e.g. veterinarians, linguists, biologists, …
→ crave p-values
→ want interpretability
→ ignore model diagnosis


From my experience in the statistical consulting unit of our university, most users want to test their scientific hypotheses with models. They want models that are easy to interpret with regard to their questions. For them it is therefore more important to have a model that yields parameters associated with covariates, along with p-values, than to have a model that predicts the data well.


KAGGLE

Algorithms of winners on Kaggle, a platform for prediction challenges:

→ Job Salary Prediction: »I used deep neural networks«
→ Observing Dark Worlds: »Bayesian analysis provided the winning recipe for solving this problem«
→ Give Me Some Credit: »In the end we only used five supervised learning methods: a random forest of classification trees, a random forest of regression trees, a classification tree boosting algorithm, a regression tree boosting algorithm, and a neural network.«


Kaggle (http://www.kaggle.com) is a platform for prediction challenges. Data are provided, and the participant with the best prediction on a test set wins the challenge. Generating insights about the mechanisms in the data is secondary, because prediction is all that counts for winning. That is why the algorithmic modeling culture is superior in this field.


CONCLUSION

Data analysts should:

→ Use predictive accuracy to evaluate models
→ Seek the best model
→ Add Machine Learning to their toolbox


FURTHER LITERATURE

[1] L. Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Institute of Mathematical Statistics, 2001.

[2] T. Hastie, R. Tibshirani and J. Friedman. The elements of statistical learning. Springer New York, 2001.


»All models are wrong, but some are useful« (G. Box)