prediction variance in linear regression

Prediction variance in Linear Regression
• Assumptions on noise in linear regression allow us to estimate the prediction variance due to the noise at any point.
• Prediction variance is usually large when you are far from a data point.
• We distinguish between interpolation, when we are in the convex hull of the data points, and extrapolation, when we are outside.
• Extrapolation is associated with larger errors, and in high dimensions it usually cannot be avoided.

Upload: dinos

Post on 23-Feb-2016



TRANSCRIPT

4.2.1 Interpolation, extrapolation and prediction variance


When we fit a surrogate to data we typically calculate some measure of the overall accuracy of the fit, such as the standard error or the cross-validation RMS. However, when we use the surrogate for prediction, it is also helpful to have an idea of the expected accuracy at the prediction point.
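As a small sketch of one such overall-accuracy measure, a leave-one-out cross-validation RMS for a least-squares fit can be computed as below. The dataset and the linear model here are invented for illustration; they are not the data from the notes.

```python
import numpy as np

# Hypothetical 1-D dataset: noisy samples of a linear trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix for the fit y ~ b0 + b1*x.
X = np.column_stack([np.ones_like(x), x])

def loo_rms(X, y):
    """Leave-one-out cross-validation RMS of a least-squares fit."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i          # drop point i
        b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        errs.append(y[i] - X[i] @ b)           # error at the left-out point
    return np.sqrt(np.mean(np.square(errs)))

print(loo_rms(X, y))
```

Because each point is predicted by a fit that never saw it, this measure is less optimistic than the residual RMS of the full fit.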

In linear regression we typically assume that the data is contaminated with normally distributed noise. This means that if we generated the data again, we would get different data and a different fit. The assumptions on the noise allow us to estimate the variance in the prediction of the fit at any given point. This is called the prediction variance, and the square root of the variance is an estimate of the standard deviation of the prediction at that point. Even if the error in the fit is not due to noise, or not only to noise, the prediction variance often gives a good idea of the expected accuracy at a point. This is important because even if the fit is good overall, it may still have large errors at some points, especially points that are far from the data points. In general, we expect that inside the convex hull of the data points (we'll remind you of the definition of the convex hull on Slide 5) the predictions will be more accurate than outside; in other words, interpolation will be more accurate than extrapolation.

Model-based error for linear regression

Prediction variance

Linear regression model: $\hat{y}(\mathbf{x}) = \sum_i b_i \, \xi_i(\mathbf{x})$.

Define $\mathbf{x}_m^T = \left(\xi_1(\mathbf{x}), \ldots, \xi_n(\mathbf{x})\right)$; then $\hat{y} = \mathbf{x}_m^T \mathbf{b}$.

With some algebra: $\operatorname{Var}\!\left[\hat{y}(\mathbf{x})\right] = \sigma^2 \, \mathbf{x}_m^T (X^T X)^{-1} \mathbf{x}_m$.

Standard error: $s_{\hat{y}} = \hat{\sigma} \sqrt{\mathbf{x}_m^T (X^T X)^{-1} \mathbf{x}_m}$.
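These formulas can be evaluated numerically as sketched below. The dataset, basis (constant and linear terms), and noise level are invented for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical noisy linear data on [0, 1].
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 8)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

X = np.column_stack([np.ones_like(x), x])       # design matrix
b, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares coefficients
resid = y - X @ b
n, p = X.shape
sigma2_hat = resid @ resid / (n - p)            # unbiased noise-variance estimate
XtX_inv = np.linalg.inv(X.T @ X)

def std_error(xq):
    """Standard error s_yhat of the prediction at query point xq."""
    xm = np.array([1.0, xq])                    # basis functions evaluated at xq
    return np.sqrt(sigma2_hat * xm @ XtX_inv @ xm)

print(std_error(0.5))   # near the middle of the data
print(std_error(2.0))   # well outside the data range
```

The standard error at x = 2.0, outside the data range, is larger than at x = 0.5, illustrating the point made above: prediction variance grows as we move away from the data, so extrapolation carries larger expected errors.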

Interpolation, extrapolation and regression
• Interpolation is often contrasted to regression or least-squares fitting.
• As important is the contrast between interpolation and extrapolation.
• Extrapolation occurs when we are outside the convex hull of the data points.

For high dimensional spaces we must have extrapolation!

We can expect that surrogates will have larger errors at points that are far from data points, or in more general terms, when we extrapolate instead of interpolate. There is a small problem in terminology. The term interpolation is often used as the opposite of regression: if you fit the data so that the surrogate passes through the data points, you use an interpolant, and it is called interpolation whether you are inside or outside the bounds of the data. Here we contrast interpolation with extrapolation, and so we have to define what interpolation means in higher dimensions. The accepted definition is that we interpolate when we stay inside the convex hull of the data points, where the convex hull is the set of all points that can be written as a convex combination of the data points. For example, the triangle defined by three points (interior and boundary) is the convex hull of those points.
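Testing whether a query point lies in the convex hull of a set of data points can be sketched as a small feasibility linear program: we ask whether non-negative weights summing to one exist that reproduce the query point. The triangle below is our own example, not data from the slides.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, q):
    """True if q is a convex combination of the rows of `points`."""
    n = len(points)
    # Feasibility LP: find alpha >= 0 with alpha @ points = q and sum(alpha) = 1.
    A_eq = np.vstack([points.T, np.ones(n)])
    b_eq = np.append(q, 1.0)
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0.0, 1.0))
    return res.success

# Triangle = convex hull of three points, as in the example above.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(tri, np.array([0.2, 0.2])))   # interior point
print(in_convex_hull(tri, np.array([0.8, 0.8])))   # outside the triangle
```

The objective is zero because we only care whether the constraints are satisfiable, not about optimizing anything.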

Since extrapolation is associated with large errors, it would appear that we should distribute data points so as to avoid extrapolation, but in high-dimensional spaces this is often not an option. For example, if we have 20 variables, and each variable is defined in an interval, the entire design space is a box with about a million vertices (2^20 = 1,048,576). We would need more than a million data points to avoid extrapolation in that box.

2D example of convex hull
• By generating 20 points at random in the unit square we end up with a substantial region near the origin where we will need to use extrapolation.
• Using the data in the notes, give a couple of alternative sets of alphas, approximately, for the point (0.4, 0.4).
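The exercise about alternative sets of alphas can be illustrated with a hypothetical configuration (the actual 20 random points from the notes are not reproduced here). With more than three points in 2-D, the convex-combination weights for an interior point are not unique; two valid weight sets for (0.4, 0.4) are checked below.

```python
import numpy as np

# Hypothetical data points: the four corners of the unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
q = np.array([0.4, 0.4])

# Two alternative sets of alphas reproducing the same query point.
alphas_a = np.array([0.2, 0.4, 0.4, 0.0])   # mixes the first three corners
alphas_b = np.array([0.6, 0.0, 0.0, 0.4])   # mixes (0,0) and (1,1) only

for a in (alphas_a, alphas_b):
    assert np.all(a >= 0) and np.isclose(a.sum(), 1.0)   # valid convex weights
    assert np.allclose(a @ pts, q)                       # both reproduce q
print("both weight sets reproduce", q)
```

Any strict mixture of these two weight vectors is yet another valid set of alphas, so there are infinitely many.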

As a two-dimensional example of the convex hull, we generate 20 points at random in the unit square, 0 ≤ x1, x2 ≤ 1.