
MODERN MACHINE LEARNING: PROBABILISTIC MODELING AND FUNCTIONAL PREDICTION

Tom Dietterich
Oregon State University

Environmental Health Summit

Machine Learning Basics

- Goal: program a computer to compute some function
- Given: training data $(x_1, y_1), \ldots, (x_N, y_N)$
- Find: a function $f$ such that $y_i \approx f(x_i)$
- Typical tasks:
  - Document classification
  - Predict jet engine failure
  - Predict customer behavior

[Figure: example output “2”]

Two Main Paradigms

- Probabilistic Modeling (“Declarative”)
- Function Learning (“Algorithmic”)

Probabilistic Modeling

- Goal: predict $y$ from $x$
- Model the process that creates the data:
  - $y \sim P(y)$ (discrete)
  - $x \sim \mathcal{N}(x \mid \mu_y, \Sigma_y)$ (Gaussian)
- Learning = model fitting
- Classification requires probabilistic inference:
  - $P(y \mid x) = \dfrac{P(y)\, \mathcal{N}(x \mid \mu_y, \Sigma_y)}{\sum_{y'} P(y')\, \mathcal{N}(x \mid \mu_{y'}, \Sigma_{y'})}$

[Figure: graphical model in which $y$ generates $x$; example label “2”]
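
The inference step on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming one-dimensional inputs, so each class has a scalar mean and variance rather than a mean vector and covariance matrix. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_classes(xs, ys):
    """Model fitting: estimate P(y) and a 1-D Gaussian (mean, variance) per class."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    model = {}
    for y, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)  # maximum-likelihood variance
        model[y] = (len(vals) / len(xs), mean, var)
    return model

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(model, x):
    """Bayes rule: P(y | x) is P(y) N(x | mu_y, var_y), normalized over all classes."""
    joint = {y: prior * normal_pdf(x, mean, var) for y, (prior, mean, var) in model.items()}
    total = sum(joint.values())
    return {y: p / total for y, p in joint.items()}

# Toy data (made up for illustration): two well-separated classes.
xs = [0.9, 1.1, 1.0, 1.2, 3.0, 3.2, 2.8, 3.1]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = fit_gaussian_classes(xs, ys)
post = posterior(model, 1.3)  # posterior class probabilities for a new input
```

Fitting only estimates the parameters of the generative process; the classifier itself appears at prediction time, when Bayes rule turns those parameters into $P(y \mid x)$.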

End-to-End Function Learning

- Define a space of parameterized functions $\mathcal{F}_\Theta$
- Define a loss function $L(\hat{y}, y)$
- Solve the optimization problem:
  $\hat{\theta} := \arg\min_\theta \sum_i L(f_\theta(x_i), y_i) + \lambda \lVert \theta \rVert$
- Classify a new input $x_q$ by evaluating $f_{\hat{\theta}}(x_q)$

[Figure credit: LeCun, Bottou, Bengio, Haffner, 1998]
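
The three steps on this slide (function space, loss, optimization) can be sketched as follows. This is an illustration under stated assumptions, not the method used in the cited work: the function space is one-dimensional logistic regression, the loss is the logistic loss, the penalty is a squared L2 norm applied to both parameters, and the optimizer is plain gradient descent. The values of `lam`, `step`, and `iters` and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def objective(theta, data, lam):
    """Sum of logistic losses over the data plus the penalty lam * ||theta||^2."""
    w, b = theta
    loss = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss + lam * (w * w + b * b)

def fit(data, lam=0.1, step=0.05, iters=2000):
    """Gradient descent on the objective above, starting from theta = (0, 0)."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        gw, gb = 2 * lam * w, 2 * lam * b      # gradient of the penalty
        for x, y in data:
            err = sigmoid(w * x + b) - y       # gradient of the logistic loss w.r.t. the score
            gw += err * x
            gb += err
        w -= step * gw
        b -= step * gb
    return w, b

# Toy data (made up): label 1 for positive inputs, 0 for negative inputs.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
theta_hat = fit(data)
w, b = theta_hat
p_new = sigmoid(w * 1.5 + b)  # classify a new input by evaluating the fitted function
```

Changing the paradigm's ingredients (a deeper function class, a different loss, a different penalty) leaves the overall recipe unchanged.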

Programming Languages and Systems

- Both paradigms are now well supported by programming languages and systems
- Probabilistic programming
  - Bayesia, Stan, etc.
- Deep neural networks
  - PyTorch, TensorFlow, etc.

Outline

- Machine Learning: Two Paradigms
- Multi-Level Modeling in Stan
- Functional Prediction Methods
- Deep Neural Networks

Multilevel Modeling

Gelman & Hill (2006)
http://mc-stan.org/users/documentation/case-studies/radon.html

- Radon levels in homes (as a risk factor for lung cancer)
- Data
  - radon level measured in the basement, or on the first floor if there is no basement
  - county soil uranium level
- Goal:
  - Identify counties with high radon in homes
- Structure:
  - households are nested within counties

Plate Notation

- $j$ indexes the county
- $i$ indexes the household within the county
- $u_j$: soil uranium level
- $x_{i,j}$: floor (0 or 1)
- $y_{i,j}$: log radon level

[Figure: plate diagram. The county plate, $j = 1, \ldots, J$, contains $u_j$; the nested home plate, $i = 1, \ldots, n_j$, contains $y_{i,j}$ and $x_{i,j}$]

Alternative Models (1): Fully Pooled Model

- $y_i = \alpha + \beta x_i + \epsilon_i$
  - ignores the county uranium measurement
  - assumes each house has the same error distribution, $\epsilon_i \sim \mathrm{Normal}(0, \sigma^2)$
- Assumes all counties have the same radon level
- High variance implies poor fit to the data

[Figure: log(radon) versus floor, with the fully pooled fit]
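
A minimal sketch of fitting the fully pooled model by ordinary least squares, using the closed-form solution for a single predictor. The data values are made up for illustration and are not taken from the radon data set; the function name is hypothetical.

```python
def fit_pooled(x, y):
    """Least-squares fit of y = alpha + beta * x, using all houses together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)  # unbiased estimate of the residual variance
    return alpha, beta, sigma2

# Toy data (made up): floor indicator and log radon for eight houses.
floor = [0, 0, 0, 0, 1, 1, 0, 1]
log_radon = [1.5, 1.1, 1.9, 1.3, 0.7, 0.9, 1.7, 0.8]
alpha, beta, sigma2 = fit_pooled(floor, log_radon)
```

Because the predictor is binary, `alpha` is the mean log radon of the basement measurements and `beta` is the difference between the first-floor mean and the basement mean. County membership plays no role in the fit.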

Alternative Models (2): No Pooling

Separate intercept for each county

- $y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$
  - assumes each house has the same error distribution, $\epsilon_i \sim \mathrm{Normal}(0, \sigma^2)$
  - $j[i]$ means “the county where house $i$ is located”
- Much lower variance within county
- Some counties have very high radon levels! Are these real?

[Figure: estimated county intercepts $\alpha_j$]

- County levels (“basement”) vary widely
- Are those high levels real?
- No, they reflect small sample sizes. Most counties suffer from small samples of either $x = 0$ or $x = 1$ (most houses in some counties have basements)

[Figure: per-county fits, fully pooled versus no pooling]

Multilevel Model 1: Partially Pooled Intercepts

- Two-level model:
  - $\alpha_j \sim \mathrm{Normal}(\mu_\alpha, \sigma_\alpha^2)$
  - $\epsilon_i \sim \mathrm{Normal}(0, \sigma_y^2)$
  - $y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$
- Combines a model of $\alpha_j$ and a model of $y_i$
- All counties affect $\mu_\alpha$, but counties with more data points have more influence
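
The effect of partial pooling on a single county can be sketched with the standard precision-weighted average for a varying-intercept model. This is a simplification under stated assumptions: the floor predictor is left out, and $\mu_\alpha$, $\sigma_y$, and $\sigma_\alpha$ are treated as known rather than estimated. All numeric values are made up for illustration.

```python
def partial_pool(county_values, mu_alpha, sigma_y, sigma_alpha):
    """Precision-weighted average of the county mean and the overall mean mu_alpha."""
    n = len(county_values)
    ybar = sum(county_values) / n
    w_data = n / sigma_y ** 2        # precision of the county's sample mean
    w_prior = 1 / sigma_alpha ** 2   # precision of the county-level distribution
    return (w_data * ybar + w_prior * mu_alpha) / (w_data + w_prior)

# Toy data (made up): one county with ten houses, one with a single house.
counties = {
    "large": [1.9, 2.1, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.0],
    "small": [2.9],
}
mu_alpha, sigma_y, sigma_alpha = 1.3, 0.8, 0.3  # assumed known for this sketch
estimates = {name: partial_pool(vals, mu_alpha, sigma_y, sigma_alpha)
             for name, vals in counties.items()}
```

The single-house county is pulled most of the way toward `mu_alpha`, while the ten-house county stays closer to its own sample mean. That is the sense in which counties with more data points carry more weight.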

- Note that the fit moves toward the fully pooled model for counties with few data points
- Now the variability in radon levels is much less

[Figure: per-county fits, fully pooled versus partial pooling]

- Visualization of all of the fitted radon models
- Some counties have log radon levels near 2.0; others have log radon levels near 1.0

[Figure: fitted radon models for all counties]

Multilevel Model 2: Include County Uranium in the Intercept Model

- $\eta_j \sim \mathrm{Normal}(0, \sigma_\alpha^2)$
- $\alpha_j = \gamma_0 + \gamma_1 u_j + \eta_j$
- $\epsilon_i \sim \mathrm{Normal}(0, \sigma_y^2)$
- $y_i = \alpha_{j[i]} + \beta x_i + \epsilon_i$

- Final per-county radon estimates
- $u_j$ is a strong predictor
- But the $\alpha_j$ estimates are adjusted to reflect confounding effects of $x_i$

[Figure: estimated county intercepts $\hat{\alpha}_j$ versus county uranium $u_j$]

Stan Code

    data {
      int<lower=0> J;                          // number of counties
      int<lower=0> N;                          // number of houses
      array[N] int<lower=1, upper=J> county;   // county index j[i] of each house
      vector[N] u;                             // county soil uranium, repeated per house
      vector[N] x;                             // floor indicator (0 or 1)
      vector[N] y;                             // log radon level
    }
    parameters {
      vector[J] a;                             // county intercepts
      vector[2] b;                             // b[1]: uranium slope, b[2]: floor slope
      real mu_a;                               // mean of the county intercepts
      real<lower=0, upper=100> sigma_a;        // sd of the county intercepts
      real<lower=0, upper=100> sigma_y;        // sd of the house-level noise
    }
    transformed parameters {
      vector[N] y_hat;
      vector[N] m;
      for (i in 1:N) {
        m[i] = a[county[i]] + u[i] * b[1];     // county-level part of the mean
        y_hat[i] = m[i] + x[i] * b[2];         // add the floor effect
      }
    }
    model {
      mu_a ~ normal(0, 1);
      a ~ normal(mu_a, sigma_a);
      b ~ normal(0, 1);
      y ~ normal(y_hat, sigma_y);
    }
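
To experiment with this program without the radon data, one option is to simulate data from the multilevel model and lay it out in the shape the `data` block expects. The sketch below does only that; passing the dictionary to a Stan interface is left out. The coefficient values, the number of counties, and the function name are illustrative assumptions, not fitted values.

```python
import random

def simulate_radon(J=5, houses_per_county=4, seed=0):
    """Draw synthetic data from the multilevel model, keyed like the Stan data block."""
    rng = random.Random(seed)
    g0, g1, beta = 1.5, 0.7, -0.7      # illustrative coefficients, not fitted values
    sigma_a, sigma_y = 0.2, 0.5        # illustrative standard deviations
    u_county = [rng.gauss(0, 1) for _ in range(J)]
    a = [g0 + g1 * u_county[j] + rng.gauss(0, sigma_a) for j in range(J)]
    county, u, x, y = [], [], [], []
    for j in range(J):
        for _ in range(houses_per_county):
            floor = rng.randint(0, 1)
            county.append(j + 1)       # Stan indices start at 1
            u.append(u_county[j])      # county uranium repeated for every house
            x.append(floor)
            y.append(a[j] + beta * floor + rng.gauss(0, sigma_y))
    return {"J": J, "N": len(y), "county": county, "u": u, "x": x, "y": y}

data = simulate_radon()
```

Fitting simulated data and checking that the known coefficients are recovered is a common sanity check before running a model on real measurements.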

Summary: Why Multilevel Modeling?

- Accounts for individual- and group-level variation when estimating group-level coefficients
- Models variation among individual-level coefficients
- Gives better estimates of regression coefficients for groups with small sample sizes by “borrowing strength” from other groups

Outline

- Machine Learning: Two Paradigms
- Multi-Level Modeling in Stan
- Functional Prediction Methods
- Deep Neural Networks

Functional Prediction Methods

- Random Forests
- Support Vector Machines

- Given:
  - Training data: $(x_1, y_1), \ldots, (x_N, y_N)$
    - $x_i$: $d$-dimensional vector of predictor variables
    - $y_i$: real or discrete response value
- Find:
  - Function $f$ that can predict $\hat{y} = f(x)$ for new points $x$

Decision Tree

- Let $x_{\cdot j}$ be the value of the $j$-th predictor variable for data point $x$
- A query $x$ traverses the tree until it reaches a leaf. The corresponding $\hat{y}$ value is $f(x)$
- The tree is “grown” top-down by choosing the most informative predictor/threshold combination at each step
- $\hat{y}_\ell$ is the mean of the $y_i$ for the training points $x_i$ that arrive at leaf $\ell$

[Figure: binary decision tree. Each internal node tests $x_{\cdot j} > \theta_k$ with yes/no branches; each leaf holds a prediction $\hat{y}_\ell$]
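
Both halves of this slide, traversing a tree and choosing a split, can be sketched briefly. The sketch assumes a regression setting and measures “most informative” by the reduction in squared error, which is one common choice and an assumption here. The dictionary-based tree layout and the toy data are hypothetical.

```python
def predict(node, x):
    """Walk from the root to a leaf; each internal node tests x[j] > threshold."""
    while "leaf" not in node:
        branch = "yes" if x[node["j"]] > node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]

def best_split(X, y):
    """Return (sse, j, threshold) for the split with the smallest squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best = None
    for j in range(len(X[0])):
        # Every observed value except the largest is a candidate threshold,
        # so both sides of the split are always non-empty.
        for threshold in sorted({row[j] for row in X})[:-1]:
            yes = [yi for row, yi in zip(X, y) if row[j] > threshold]
            no = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            score = sse(yes) + sse(no)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best

# A small hand-built tree and toy data (both made up for illustration).
tree = {"j": 0, "threshold": 2.0,
        "yes": {"leaf": 5.0},
        "no": {"j": 1, "threshold": 0.5, "yes": {"leaf": 2.0}, "no": {"leaf": 1.0}}}
X = [[1.0, 0.0], [1.5, 1.0], [3.0, 0.0], [3.5, 1.0]]
y = [1.0, 2.0, 5.0, 5.0]
split = best_split(X, y)
```

Growing a full tree applies `best_split` recursively to the points on each side, and stores the mean response of the points that reach each leaf.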

Randomized Tree

- When the tree is “grown”, only a randomly chosen subset of $m$ predictor variables is considered at each node

[Figure: binary decision tree. Each internal node tests $x_{\cdot j} > \theta_k$ with yes/no branches; each leaf holds a prediction $\hat{y}_\ell$]

Random Forest

- A random forest is a collection of $T$ randomized trees
- Each tree $f_t$ is “grown” on a bootstrap replicate of the training data
- The predicted value is the mean of the predictions of the individual trees:
  $\hat{y} = \frac{1}{T} \sum_{t=1}^{T} f_t(x)$
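
The three ingredients above (bootstrap replicates, random predictor subsets, and averaging) can be sketched as follows. To stay short, the sketch uses depth-one trees (“stumps”) in place of fully grown trees, which is a simplifying assumption. The parameter names, defaults, and toy data are hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Depth-one regression tree: best (feature, threshold) among `features` by squared error."""
    def mean(v):
        return sum(v) / len(v)
    best = None
    for j in features:
        for t in sorted({row[j] for row in X})[:-1]:
            hi = [yi for row, yi in zip(X, y) if row[j] > t]
            lo = [yi for row, yi in zip(X, y) if row[j] <= t]
            m_hi, m_lo = mean(hi), mean(lo)
            err = sum((v - m_hi) ** 2 for v in hi) + sum((v - m_lo) ** 2 for v in lo)
            if best is None or err < best[0]:
                best = (err, j, t, m_hi, m_lo)
    if best is None:  # every candidate feature is constant in this sample
        m = mean(y)
        return lambda x: m
    _, j, t, m_hi, m_lo = best
    return lambda x: m_hi if x[j] > t else m_lo

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap replicate
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        features = rng.sample(range(len(X[0])), n_features)    # random predictor subset
        trees.append(fit_stump(Xb, yb, features))
    return trees

def predict_forest(trees, x):
    """Mean of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)

# Toy data (made up): the response depends on feature 0 only; feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.8, 3.0], [0.9, 5.5], [1.0, 1.5], [1.1, 4.5]]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
trees = fit_forest(X, y)
```

Individual stumps that happen to draw the noise feature predict poorly, but averaging over many randomized trees lets the informative feature dominate.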

Random Forest Advantages

- Can work with a mix of discrete and continuous predictor variables
- Can handle missing values
- Makes no assumptions about the error distribution of $y$
- Considers high-order interactions among predictors
- Generally gives excellent predictive accuracy

Random Forest Disadvantages

- Cannot be usefully inspected (“black box”)
- However:
  - Can provide estimates of variable importance (see the “randomForest” R package)
  - Can be modified to support hypothesis tests and confidence intervals (see Mentch & Hooker, 2016, 2017)
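
One common way to estimate variable importance for a black-box predictor is permutation importance: shuffle one predictor column and measure how much the error grows. The sketch below shows the general idea only; it is not the exact procedure of the randomForest package, and the stand-in model, names, and data are hypothetical.

```python
import random

def mse(predict, X, y):
    return sum((predict(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, j, n_repeats=20, seed=0):
    """Average increase in squared error after shuffling column j of X."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    increases = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        increases.append(mse(predict, X_perm, y) - base)
    return sum(increases) / n_repeats

def predict(row):
    """Hypothetical fitted model that happens to use only feature 0."""
    return 2.0 * row[0]

# Toy data (made up for illustration).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0.0, 2.0, 4.0, 6.0]
importance = [permutation_importance(predict, X, y, j) for j in range(2)]
```

A predictor the model ignores gets an importance of zero, because shuffling it cannot change any prediction.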

Support Vector Machines

- Extension of the linear classification model
  - $y = w_0 + \sum_j w_j x_j$
- New ideas:
  - Maximize the margin between the classes
  - Implicitly map to a high-dimensional feature space using kernels
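
A linear SVM can be sketched as minimizing the hinge loss plus a penalty on the weights, which rewards a wide margin. The sketch below uses batch subgradient descent, a swapped-in optimizer chosen for brevity; it is not the solver used by LibSVM. The values of `lam`, `step`, and `iters` and the toy data are assumptions.

```python
def fit_linear_svm(X, y, lam=0.01, step=0.01, iters=20000):
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss. Labels are +1 or -1."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        gw = [lam * wj for wj in w]   # gradient of the weight penalty
        gb = 0.0
        for row, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, row)) + b)
            if margin < 1:            # inside the margin or misclassified
                for j in range(d):
                    gw[j] -= yi * row[j] / n
                gb -= yi / n
        w = [wj - step * gj for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b

def decision(w, b, x):
    """Signed score w0 + sum_j w_j x_j; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Toy data (made up): two linearly separable clusters.
X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = fit_linear_svm(X, y)
```

Only points with margin below 1 contribute to the update, so the fitted boundary is determined by the points closest to it.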


Classification (Iris Species)


Decision Boundaries: Which one is best?


SVM Finds the Boundary that Maximizes the Margin


Full Iris Data is Not Separable

SVM balances sum of errors


SVMs can fit non-linear decision boundaries using “kernels”
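
A minimal sketch of how a kernel yields a non-linear boundary. It uses the Gaussian (RBF) kernel, one widely used choice and an assumption here, inside the kernel form of the decision function. The support points, coefficients, and bias are hand-picked for illustration and are not the output of SVM training.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared distance between a and b)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kernel_decision(x, support, coeffs, bias=0.0, gamma=1.0):
    """Kernel form of the decision function: sum_i coeff_i * k(x_i, x) + bias."""
    return sum(c * rbf_kernel(s, x, gamma) for s, c in zip(support, coeffs)) + bias

# Hypothetical values: one positive point flanked by two negative points.
support = [[0.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
coeffs = [1.0, -1.0, -1.0]   # signed weights, one per support point

center = kernel_decision([0.0, 0.0], support, coeffs)
right = kernel_decision([2.0, 0.0], support, coeffs)
left = kernel_decision([-2.0, 0.0], support, coeffs)
```

The score is positive in the middle and negative on both sides, a pattern no single linear boundary on these inputs can produce.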


SVM Assessment

- Strengths:
  - Excellent performance on $d \gg N$ problems (more predictor variables than training examples)
  - Good free implementations (LibSVM, wrapped for R, Python, etc.)
- Weaknesses:
  - Does not scale easily to large datasets
  - Requires tuning 2 hyperparameters

Outline

- Machine Learning: Two Paradigms
- Multi-Level Modeling in Stan
- Functional Prediction Methods
- Deep Neural Networks

ImageNet (1000 object classes): Top-5 Error Rate

[Figure: top-5 classification error (%) by year, 2010 to 2014, on a 0 to 30 scale, with “Before” and “After” regions marked]

Speech Recognition Results

[Figure: Google speech recognition results for 2013, 2014, and 2015; labeled values are 23% word error and 8%]

Credit: Fernando Pereira & Matthew Firestone, Google; Protalinski, Google

DNN Practicalities

- The structure of each DNN must be carefully chosen for the task
- There are many, many hyperparameters
  - Auto-ML tools seek to automatically adjust the network structure and hyperparameters
- Generally require lots of data and lots of compute time
  - Many groups have had success with “fine tuning” of pre-trained networks

Environmental Health Applications of DNNs

- Analyzing medical images
- Analyzing EKG and other signal data
- Analyzing spectra
- Analyzing electronic health records

Summary

- For making inferences about environmental health, the probabilistic modeling paradigm is recommended
  - Interpretable models
  - Can draw causal inferences under some conditions
- For extracting data from sensors, EHRs, images
  - Predictive models (random forests, SVMs, DNNs) excel
  - SVMs and DNNs require tuning hyperparameters
  - Tools are beginning to emerge to automate tuning

References

- Stan: http://mc-stan.org/
- Gelman & Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models.
- randomForest package in R
- Mentch, L., & Hooker, G. (2016). Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests. Journal of Machine Learning Research, 17, 1–41.
- Mentch, L., & Hooker, G. (2017). Formal Hypothesis Tests for Additive Structure in Random Forests. Journal of Computational and Graphical Statistics, 26(3), 589–597.
- LibSVM package for fitting support vector machines
- Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press.
- Zoph, B., & Le, Q. V. (2016). Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578, 1–16.