
Page 1:

LECTURE 02: EVALUATING MODELS
January 27, 2016
SDS 293 Machine Learning

Page 2:

Announcements / Questions

• Life Sciences and Technology Fair is tomorrow:

3:30-6pm in the Carroll Room
www.smith.edu/lazaruscenter/fairs_scitech.php

• Office hours: does anyone have a conflict?

Page 3:

Outline

• Evaluating Models

• Lab pt. 1 – Introduction to R:
  - Basic Commands
  - Graphics Overview
  - Indexing Data
  - Loading Data
  - Additional Graphical/Numerical Summaries

• Lab pt. 2 - Exploring other datasets (time permitting)

Page 4:

Beyond LR

Stated goal of this course:

explore methods that go beyond standard linear regression

Page 5:

One tool to rule them all…?

Question: why not just teach you the best one first?

Page 6:

Answer: it depends

• No single method dominates all others on every data set

• On a particular data set, for a particular question, one specific method may work well; on a related but not identical dataset or question, another might be better

• Choosing the right approach is arguably the most challenging aspect of doing statistics in practice

• So how do we do it?

Page 7:

Measuring “Quality of Fit”

• One question we might ask: how well do my model’s predictions actually match the observations?

• What we need: a way to measure how close the predicted response is to the true response

• Flashback to your stats training: what do we use in regression?

Page 8:

Mean Squared Error

• We take the average, over all observations, of the squared difference between the true response and the prediction our model gives:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$$

where $y_i$ is the true response for the $i$th observation and $\hat{f}(x_i)$ is the prediction our model gives for the $i$th observation.
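As a concrete illustration (this example is not from the slides; the data are simulated), a minimal R sketch that fits a simple linear model and computes its training MSE:

  # Simulated data: a linear trend plus noise
  set.seed(1)
  x <- runif(100, 0, 10)
  y <- 2 + 3 * x + rnorm(100, sd = 2)

  fit   <- lm(y ~ x)           # fit the model
  y_hat <- predict(fit)        # predictions for the training observations

  mse <- mean((y - y_hat)^2)   # average of the squared differences
  mse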

Page 9:

“Training” MSE

• This version of MSE is computed using the training data that was used to fit the model

• Reality check: is this what we care about?

Page 10:

Test MSE

• Better plan: see how well the model does on observations we didn’t train on

• Given some never-before-seen examples, we can just calculate the MSE on those using the same method

• But what if we don’t have any new observations to test?
  - Can we just use the training MSE?
  - Why or why not?

Page 11:

Example

[Figure: average training MSE vs. test MSE]

Page 12:

Training vs. Test MSE

• As the flexibility of the statistical learning method increases, we observe:
  - a monotone decrease in the training MSE
  - a U-shape in the test MSE

• Fun fact: this occurs regardless of the data set and statistical method being used

• As flexibility increases, training MSE will decrease, but the test MSE may not: this is overfitting
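To make this pattern concrete, here is an illustrative R sketch (the simulated data and model choices are my own, not from the lecture) that fits polynomials of increasing degree and records both training and test MSE:

  # Simulated illustration of training vs. test MSE as flexibility grows
  set.seed(293)
  n <- 200
  x <- runif(n, 0, 5)
  y <- sin(2 * x) + rnorm(n, sd = 0.4)      # nonlinear truth plus noise

  train <- sample(n, n / 2)                 # random train/test split

  degrees   <- 1:12
  train_mse <- numeric(length(degrees))
  test_mse  <- numeric(length(degrees))

  for (d in degrees) {
    fit <- lm(y ~ poly(x, d), subset = train)
    train_mse[d] <- mean((y[train]  - predict(fit, data.frame(x = x[train])))^2)
    test_mse[d]  <- mean((y[-train] - predict(fit, data.frame(x = x[-train])))^2)
  }

  # Training MSE decreases monotonically; test MSE typically bottoms out
  # at a moderate degree and then rises again (overfitting)
  data.frame(degree = degrees, train_mse, test_mse)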

Page 13:

Trade-off between bias and variance

• The U-shaped curve in the Test MSE is the result of two competing properties: bias and variance

• Variance refers to the amount by which the model would change if we estimated it using different training data

• Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) using a much simpler model

Page 14:

Relationship between bias and variance

• In general, more flexible methods have higher variance

Page 15:

Relationship between bias and variance

• In general, more flexible methods have lower bias

Page 16:

Trade-off between bias and variance

• It is possible to show that the expected test MSE at a given test value $x_0$ can be decomposed into three terms:

$$E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)$$

the variance of our model at the test value, plus the squared bias of our model at the test value, plus the variance of the error terms.
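The decomposition can be checked by simulation. The R sketch below is illustrative only: the true function f, the noise level sigma, the test point x0, and the choice of a cubic fit are all assumptions made for the example. It refits the model to many fresh training sets and estimates the variance and squared bias of the prediction at x0.

  # Empirical estimate of the three terms at a single test point x0
  set.seed(42)
  f     <- function(x) sin(2 * x)   # "true" function (known only in simulation)
  sigma <- 0.3                      # sd of the irreducible error
  x0    <- 2.5                      # test value of interest

  fits_at_x0 <- replicate(500, {
    x <- runif(50, 0, 5)                    # a fresh training set each time
    y <- f(x) + rnorm(50, sd = sigma)
    fit <- lm(y ~ poly(x, 3))               # a moderately flexible model
    predict(fit, data.frame(x = x0))        # its prediction at x0
  })

  variance <- var(fits_at_x0)               # Var(f_hat(x0))
  bias_sq  <- (mean(fits_at_x0) - f(x0))^2  # Bias(f_hat(x0))^2
  c(variance = variance, bias_sq = bias_sq, irreducible = sigma^2)
  # The sum of these three terms approximates the expected test MSE at x0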

Page 17:

Balancing bias and variance

• We know variance and squared bias are always nonnegative (why?)

• There’s nothing we can do about the variance of the irreducible error, which is inherent in the problem itself

• So we’re looking for a method that minimizes the sum of the first two terms… which are (in some sense) competing

Page 18:

Balancing bias and variance

• It’s easy to build a model with low variance but high bias (how?)

• Just as easy to build one with low bias but high variance (how?)

• The challenge: finding a method for which both the variance and the squared bias are low

• This trade-off is one of the most important recurring themes in this course

Page 19:

What about classification?

• So far, we’ve only talked about how to evaluate the accuracy of a regression model

• The idea of a bias-variance trade-off also translates to the classification setting, but we need some minor modifications to deal with qualitative responses

• For example: we can’t really compute MSE without numerical values, so what can we do instead?

Page 20:

Training error rate

• One common approach is to use the training error rate, where we measure the proportion of the times our model incorrectly classifies a training data point:

$$\text{training error rate} = \frac{1}{n}\sum_{i=1}^{n} I\left(y_i \neq \hat{y}_i\right)$$

Using an indicator function $I$, we tally up all the times where the model’s classification $\hat{y}_i$ was different from the actual class $y_i$, and take the average.
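In R, this indicator-and-average computation is just a vector comparison followed by mean(). The labels below are made up purely for illustration:

  # Hypothetical predicted vs. actual class labels
  actual    <- c("spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham")
  predicted <- c("spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham")

  # actual != predicted plays the role of the indicator I(y_i != y_hat_i);
  # averaging the indicators gives the training error rate
  error_rate <- mean(actual != predicted)
  error_rate   # 2 mistakes out of 8 observations = 0.25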

Page 21:

Takeaways

• Choosing the “right” level of flexibility is critical for success in both the regression and classification settings

• The bias-variance tradeoff can make this a difficult task

• In Chapter 5, we’ll return to this topic and explore various methods for estimating test error rates

• We’ll then use these estimates to find the optimal level of flexibility for a given ML method

Page 22:

Questions?

Page 23:

Lab pt. 1: Introduction to R

• Basic Commands
• Graphics
• Indexing data
• Loading external data
• Generating summaries
• Playing with real data (time permitting!)
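As a quick (hypothetical) preview of the kinds of commands the lab walks through:

  # Basic commands: creating and inspecting vectors
  x <- c(1, 3, 2, 5)
  length(x)

  # Indexing data: matrices and subsetting
  A <- matrix(1:16, nrow = 4)
  A[2, 3]       # a single element
  A[1:2, ]      # the first two rows

  # Graphics and numerical summaries
  y <- rnorm(50)
  summary(y)
  plot(y, main = "A simple scatterplot")

  # Loading external data (assumes a file named "data.csv" exists)
  # df <- read.csv("data.csv", header = TRUE)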

Pages 24-26:

Lab pt. 1: Introduction to R (continued)

Page 27:

Lab pt. 1: Introduction to R

• Today’s walkthrough (and likely many others) will be run using a tool that lets me build “notebooks” which run live R code (Python, too!) in the browser

• Hint: this is also a nice way to format your homework!

Page 28:

Lab pt. 2: Exploring Other Datasets

• More datasets from the book - ISLR package (a short loading example appears after this list)
  - Already installed on the Smith RStudio server
  - Working locally? > install.packages("ISLR")
  - Details available at: cran.r-project.org/web/packages/ISLR
  - Dataset descriptions: www.inside-r.org/packages/cran/ISLR/docs

• Real world data:
  - Olympic Athletes: goo.gl/1aUnJW
  - World Bank Indicators: goo.gl/0QdN9U
  - Airplane Bird Strikes: goo.gl/lFl5ld
  - …and a whole bunch more: goo.gl/kcbqfc
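For example, once the ISLR package is installed, loading one of the book's datasets and generating a few quick summaries might look like this (a minimal sketch, not the lab itself):

  # Load the ISLR package (install first with install.packages("ISLR") if needed)
  library(ISLR)

  # The Auto dataset ships with the package
  dim(Auto)           # number of rows and columns
  names(Auto)         # variable names
  summary(Auto$mpg)   # numerical summary of one variable

  plot(Auto$horsepower, Auto$mpg,
       xlab = "Horsepower", ylab = "MPG",
       main = "Fuel efficiency vs. horsepower")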

Page 29:

Coming Up

• Next class: Linear Regression 1: Simple and Multiple LR

• For planning purposes: Assignment 1 will be posted next week, and will be due the following Wednesday (Feb. 10th)