
Page 1:

LECTURE 02: EVALUATING MODELS
January 27, 2016
SDS 293 Machine Learning

Page 2:

Announcements / Questions

• Life Sciences and Technology Fair is tomorrow:

3:30-6pm in the Carroll Room
www.smith.edu/lazaruscenter/fairs_scitech.php

• Office hours: does anyone have a conflict?

Page 3:

Outline

• Evaluating Models

• Lab pt. 1 – Introduction to R:
  - Basic Commands
  - Graphics Overview
  - Indexing Data
  - Loading Data
  - Additional Graphical/Numerical Summaries

• Lab pt. 2 - Exploring other datasets (time permitting)

Page 4:

Beyond LR

Stated goal of this course:

explore methods that go beyond standard linear regression

Page 5:

One tool to rule them all…?

Question: why not just teach you the best one first?

Page 6:

Answer: it depends

• No single method dominates all others on every data set

• On a particular data set, for a particular question, one specific method may work well; on a related but not identical dataset or question, another might be better

• Choosing the right approach is arguably the most challenging aspect of doing statistics in practice

• So how do we do it?

Page 7:

Measuring “Quality of Fit”

• One question we might ask: how well do my model’s predictions actually match the observations?

• What we need: a way to measure how close the predicted response is to the true response

• Flashback to your stats training: what do we use in regression?

Page 8:

Mean Squared Error

• We take the average, over all observations, of the squared difference between the true response and the prediction our model gives:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{f}(x_i)\right)^2$$

where $y_i$ is the true response for the $i$th observation and $\hat{f}(x_i)$ is the prediction our model gives for the $i$th observation.
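As a concrete illustration (this example is not from the slides; the data are simulated), a minimal R sketch that fits a simple linear model and computes its training MSE:

  # Simulated data: a linear trend plus noise
  set.seed(1)
  x <- runif(100, 0, 10)
  y <- 2 + 3 * x + rnorm(100, sd = 2)

  fit   <- lm(y ~ x)           # fit the model
  y_hat <- predict(fit)        # predictions for the training observations

  mse <- mean((y - y_hat)^2)   # average of the squared differences
  mse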

Page 9:

“Training” MSE

• This version of MSE is computed using the training data that was used to fit the model

• Reality check: is this what we care about?

Page 10:

Test MSE

• Better plan: see how well the model does on observations we didn’t train on

• Given some never-before-seen examples, we can just calculate the MSE on those using the same method

• But what if we don’t have any new observations to test?
  - Can we just use the training MSE?
  - Why or why not?

Page 11:

Example

[Figure: average training MSE vs. test MSE]

Page 12:

Training vs. Test MSE

• As the flexibility of the statistical learning method increases, we observe:
  - a monotone decrease in the training MSE
  - a U-shape in the test MSE

• Fun fact: this occurs regardless of the data set and statistical method being used

• As flexibility increases, training MSE will decrease, but the test MSE may not: this is overfitting
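To make this pattern concrete, here is an illustrative R sketch (the simulated data and model choices are my own, not from the lecture) that fits polynomials of increasing degree and records both training and test MSE:

  # Simulated illustration of training vs. test MSE as flexibility grows
  set.seed(293)
  n <- 200
  x <- runif(n, 0, 5)
  y <- sin(2 * x) + rnorm(n, sd = 0.4)      # nonlinear truth plus noise

  train <- sample(n, n / 2)                 # random train/test split

  degrees   <- 1:12
  train_mse <- numeric(length(degrees))
  test_mse  <- numeric(length(degrees))

  for (d in degrees) {
    fit <- lm(y ~ poly(x, d), subset = train)
    train_mse[d] <- mean((y[train]  - predict(fit, data.frame(x = x[train])))^2)
    test_mse[d]  <- mean((y[-train] - predict(fit, data.frame(x = x[-train])))^2)
  }

  # Training MSE decreases monotonically; test MSE typically bottoms out
  # at a moderate degree and then rises again (overfitting)
  data.frame(degree = degrees, train_mse, test_mse)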

Page 13:

Trade-off between bias and variance

• The U-shaped curve in the Test MSE is the result of two competing properties: bias and variance

• Variance refers to the amount by which the model would change if we estimated it using different training data

• Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) using a much simpler model

Page 14:

Relationship between bias and variance

• In general, more flexible methods have higher variance

Page 15:

Relationship between bias and variance

• In general, more flexible methods have lower bias

Page 16:

Trade-off between bias and variance

• It is possible to show that the expected test MSE at a given test value $x_0$ can be decomposed into three terms:

$$E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon)$$

the variance of our model at the test value, plus the squared bias of our model at the test value, plus the variance of the error terms.
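The decomposition can be checked by simulation. The R sketch below is illustrative only: the true function f, the noise level sigma, the test point x0, and the choice of a cubic fit are all assumptions made for the example. It refits the model to many fresh training sets and estimates the variance and squared bias of the prediction at x0.

  # Empirical estimate of the three terms at a single test point x0
  set.seed(42)
  f     <- function(x) sin(2 * x)   # "true" function (known only in simulation)
  sigma <- 0.3                      # sd of the irreducible error
  x0    <- 2.5                      # test value of interest

  fits_at_x0 <- replicate(500, {
    x <- runif(50, 0, 5)                    # a fresh training set each time
    y <- f(x) + rnorm(50, sd = sigma)
    fit <- lm(y ~ poly(x, 3))               # a moderately flexible model
    predict(fit, data.frame(x = x0))        # its prediction at x0
  })

  variance <- var(fits_at_x0)               # Var(f_hat(x0))
  bias_sq  <- (mean(fits_at_x0) - f(x0))^2  # Bias(f_hat(x0))^2
  c(variance = variance, bias_sq = bias_sq, irreducible = sigma^2)
  # The sum of these three terms approximates the expected test MSE at x0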

Page 17:

Balancing bias and variance

• We know variance and squared bias are always nonnegative (why?)

• There’s nothing we can do about the variance of the irreducible error, which is inherent in the problem itself

• So we’re looking for a method that minimizes the sum of the first two terms… which are (in some sense) competing

Page 18:

Balancing bias and variance

• It’s easy to build a model with low variance but high bias (how?)

• Just as easy to build one with low bias but high variance (how?)

• The challenge: finding a method for which both the variance and the squared bias are low

• This trade-off is one of the most important recurring themes in this course

Page 19:

What about classification?

• So far, we’ve only talked about how to evaluate the accuracy of a regression model

• The idea of a bias-variance trade-off also translates to the classification setting, but we need some minor modifications to deal with qualitative responses

• For example: we can’t really compute MSE without numerical values, so what can we do instead?

Page 20:

Training error rate

• One common approach is to use the training error rate, where we measure the proportion of the times our model incorrectly classifies a training data point:

$$\text{training error rate} = \frac{1}{n}\sum_{i=1}^{n} I\left(y_i \neq \hat{y}_i\right)$$

Using an indicator function $I$, we tally up all the times where the model’s classification $\hat{y}_i$ was different from the actual class $y_i$, and take the average.
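In R, this indicator-and-average computation is just a vector comparison followed by mean(). The labels below are made up purely for illustration:

  # Hypothetical predicted vs. actual class labels
  actual    <- c("spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham")
  predicted <- c("spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham")

  # actual != predicted plays the role of the indicator I(y_i != y_hat_i);
  # averaging the indicators gives the training error rate
  error_rate <- mean(actual != predicted)
  error_rate   # 2 mistakes out of 8 observations = 0.25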

Page 21:

Takeaways

• Choosing the “right” level of flexibility is critical for success in both the regression and classification settings

• The bias-variance tradeoff can make this a difficult task

• In Chapter 5, we’ll return to this topic and explore various methods for estimating test error rates

• We’ll then use these estimates to find the optimal level of flexibility for a given ML method

Page 22:

Questions?

Page 23:

Lab pt. 1: Introduction to R

• Basic Commands
• Graphics
• Indexing data
• Loading external data
• Generating summaries
• Playing with real data (time permitting!)
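As a quick (hypothetical) preview of the kinds of commands the lab walks through:

  # Basic commands: creating and inspecting vectors
  x <- c(1, 3, 2, 5)
  length(x)

  # Indexing data: matrices and subsetting
  A <- matrix(1:16, nrow = 4)
  A[2, 3]       # a single element
  A[1:2, ]      # the first two rows

  # Graphics and numerical summaries
  y <- rnorm(50)
  summary(y)
  plot(y, main = "A simple scatterplot")

  # Loading external data (assumes a file named "data.csv" exists)
  # df <- read.csv("data.csv", header = TRUE)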

Pages 24-26:

Lab pt. 1: Introduction to R (continued)

Page 27:

Lab pt. 1: Introduction to R

• Today’s walkthrough (and likely many others) will be run using a tool that lets me build “notebooks” which run live R code (Python, too!) in the browser

• Hint: this is also a nice way to format your homework!

Page 28:

Lab pt. 2: Exploring Other Datasets

• More datasets from the book - ISLR package (a short loading example appears after this list)
  - Already installed on the Smith RStudio server
  - Working locally? > install.packages("ISLR")
  - Details available at: cran.r-project.org/web/packages/ISLR
  - Dataset descriptions: www.inside-r.org/packages/cran/ISLR/docs

• Real world data:
  - Olympic Athletes: goo.gl/1aUnJW
  - World Bank Indicators: goo.gl/0QdN9U
  - Airplane Bird Strikes: goo.gl/lFl5ld
  - …and a whole bunch more: goo.gl/kcbqfc
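For example, once the ISLR package is installed, loading one of the book's datasets and generating a few quick summaries might look like this (a minimal sketch, not the lab itself):

  # Load the ISLR package (install first with install.packages("ISLR") if needed)
  library(ISLR)

  # The Auto dataset ships with the package
  dim(Auto)           # number of rows and columns
  names(Auto)         # variable names
  summary(Auto$mpg)   # numerical summary of one variable

  plot(Auto$horsepower, Auto$mpg,
       xlab = "Horsepower", ylab = "MPG",
       main = "Fuel efficiency vs. horsepower")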

Page 29:

Coming Up

• Next class: Linear Regression 1: Simple and Multiple LR

• For planning purposes: Assignment 1 will be posted next week, and will be due the following Wednesday (Feb. 10th)