Ridge regression and Bayesian linear regression. Kenneth D. Harris, 6/5/15
TRANSCRIPT
Multiple linear regression
What are you predicting? Data type: Continuous; Dimensionality: 1
What are you predicting it from? Data type: Continuous; Dimensionality: p
How many data points do you have? Enough
What sort of prediction do you need? Single best guess
What sort of relationship can you assume? Linear
Multiple linear regression
What are you predicting? Data type: Continuous; Dimensionality: 1
What are you predicting it from? Data type: Continuous; Dimensionality: p
How many data points do you have? Not enough
What sort of prediction do you need? Single best guess
What sort of relationship can you assume? Linear
Multiple predictors, one predicted variable
• Choose $\mathbf{w}$ to minimize the sum-squared error:
$E = \sum_i (y_i - \mathbf{x}_i \cdot \mathbf{w})^2 = \|\mathbf{y} - X\mathbf{w}\|^2$
Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$ (in MATLAB: w = X\y)
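A minimal sketch of the least-squares solution in NumPy (the data here is made up for illustration; `lstsq` is the numerically stable equivalent of MATLAB's `X\y`):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 40, 3
X = rng.standard_normal((N, p))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(N)

# Least-squares fit; mathematically w = (X'X)^{-1} X'y
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # close to w_true
```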
Too many predictors
• If $p \geq N$, you can fit the training data perfectly
• $X\mathbf{w} = \mathbf{y}$ is $N$ equations in $p$ unknowns
• If $p > N$, the solution is underconstrained ($X^\top X$ is not invertible)
• But even if $p < N$, you can have problems with too many predictors
[Figures: simulated fits with $N=40$, $p=30$; panels show $y = x_1$ and $y = x_1 + \text{noise}$.]
Geometric interpretation
[Figure: target vector decomposed into signal and noise components relative to the predictors $\mathbf{x_1}$ and $\mathbf{x_2}$.]
The target can be fit exactly, by having a massive positive weight for $\mathbf{x_1}$ and a massive negative weight for $\mathbf{x_2}$.
It would be better to just fit $\mathbf{x_1}$.
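A tiny deterministic sketch of this geometric point: when $\mathbf{x_2}$ is nearly collinear with $\mathbf{x_1}$, the small "noise" component of the target forces an exact fit to use huge opposing weights (the numbers below are made up to keep the arithmetic exact):

```python
import numpy as np

x1 = np.array([1.0, 0.0])
x2 = np.array([1.0, 0.01])   # nearly collinear with x1
y  = np.array([1.0, 0.10])   # mostly x1 ("signal") plus a little "noise"

X = np.column_stack([x1, x2])
w = np.linalg.solve(X, y)    # exact fit: y = -9*x1 + 10*x2
print(w)                     # [-9. 10.]

# Fitting x1 alone gives a sensible weight near 1 and a small residual:
w1 = (x1 @ y) / (x1 @ x1)
print(w1)                    # 1.0
```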
Overfitting = large weight vectors
• Solution: add a weight vector penalty. Minimize
$E = \|\mathbf{y} - X\mathbf{w}\|^2 + \lambda \|\mathbf{w}\|^2$
Optimal weight vector: $\hat{\mathbf{w}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$
The inverse can always be taken (for $\lambda > 0$), even for $p > N$.
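A minimal sketch of the ridge estimate $(X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$, using made-up data with $N=40$, $p=30$ as in the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 40, 30, 3.0
X = rng.standard_normal((N, p))
y = X[:, 0] + rng.standard_normal(N)   # y = x1 + noise

# With lam > 0 the matrix X'X + lam*I is positive definite, so this
# solve succeeds even when p > N and X'X alone would be singular.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```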
Example
[Figure: fits with $\lambda=0$ and $\lambda=3$.]
Ridge regression introduces a bias
[Figure: fits with $\lambda=0$ and $\lambda=50$.]
A quick trick to do ridge regression
• Ordinary linear regression minimizes $\|\mathbf{y} - X\mathbf{w}\|^2$. Define the augmented data
$\tilde{X} = \begin{pmatrix} X \\ \sqrt{\lambda}\, I_p \end{pmatrix}, \quad \tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0} \end{pmatrix}$
Then the ordinary least-squares fit of $\tilde{\mathbf{y}}$ on $\tilde{X}$ is the solution to ridge regression. (Why? Because $\|\tilde{\mathbf{y}} - \tilde{X}\mathbf{w}\|^2 = \|\mathbf{y} - X\mathbf{w}\|^2 + \lambda\|\mathbf{w}\|^2$.)
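The augmentation trick can be sketched as follows: append $\sqrt{\lambda}\,I$ rows to $X$ and zeros to $\mathbf{y}$, then run ordinary least squares (the data here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 40, 30, 3.0
X = rng.standard_normal((N, p))
y = X[:, 0] + rng.standard_normal(N)

# Augmented design matrix and target
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

# OLS on the augmented data equals the ridge closed form
w_trick, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(w_trick, w_ridge))  # True
```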
Regression as a probability model
What are you predicting? Data type: Continuous; Dimensionality: 1
What are you predicting it from? Data type: Continuous; Dimensionality: p
How many data points do you have? Enough
What sort of prediction do you need? Probability distribution
What sort of relationship can you assume? Linear
Regression as a probability model
• Assume $y$ is random, but $\mathbf{w}$ and $\mathbf{x}$ are just numbers:
$y = \mathbf{w} \cdot \mathbf{x} + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$
Then the likelihood is
$p(\mathbf{y}|X, \mathbf{w}) \propto \exp\!\left(-\frac{\|\mathbf{y} - X\mathbf{w}\|^2}{2\sigma^2}\right)$
Maximum likelihood is the same as a least-squares fit.
Bayesian linear regression
• Now consider $\mathbf{w}$ to also be random, with prior distribution $\mathbf{w} \sim N(0, \sigma_w^2 I)$:
$p(\mathbf{w}) \propto \exp\!\left(-\frac{\|\mathbf{w}\|^2}{2\sigma_w^2}\right)$
The posterior distribution is
$p(\mathbf{w}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{w})\, p(\mathbf{w}) \propto \exp\!\left(-\frac{\|\mathbf{y} - X\mathbf{w}\|^2}{2\sigma^2} - \frac{\|\mathbf{w}\|^2}{2\sigma_w^2}\right)$
Bayesian linear regression
The log posterior is quadratic in $\mathbf{w}$, so $\mathbf{w}$ has a Gaussian posterior distribution.
Bayesian linear regression
The posterior mean of $\mathbf{w}$,
$\boldsymbol{\mu} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$ with $\lambda = \sigma^2/\sigma_w^2$,
is exactly the same as in ridge regression. But we also get a covariance matrix for $\mathbf{w}$:
$\Sigma = \sigma^2 (X^\top X + \lambda I)^{-1}$
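A sketch of the Gaussian posterior over $\mathbf{w}$ under this model; the values of $\sigma$, $\sigma_w$ and the data are made-up assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 40, 5
sigma, sigma_w = 1.0, 2.0
lam = sigma**2 / sigma_w**2            # ridge penalty implied by the prior

X = rng.standard_normal((N, p))
y = X[:, 0] + sigma * rng.standard_normal(N)

A = X.T @ X + lam * np.eye(p)
mu = np.linalg.solve(A, X.T @ y)       # posterior mean = ridge estimate
Sigma = sigma**2 * np.linalg.inv(A)    # posterior covariance of w
```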
Bayesian predictions
• Given a training set $X, \mathbf{y}$, and a new value $\mathbf{x}^*$, assume $\mathbf{w}$ is random but $X$, $\mathbf{y}$, $\mathbf{x}^*$ are fixed.
• To make a prediction of $y^*$, integrate over all possible $\mathbf{w}$:
$p(y^*|\mathbf{x}^*, X, \mathbf{y}) = \int p(y^*|\mathbf{x}^*, \mathbf{w})\, p(\mathbf{w}|X, \mathbf{y})\, d\mathbf{w}$
The mean is the same as in ridge regression, but we also get a variance:
$\mathrm{Var}(y^*) = \sigma^2 + \mathbf{x}^{*\top} \Sigma\, \mathbf{x}^*$
The variance does not depend on the training outputs $\mathbf{y}$, only on the inputs $X$. It is low when many of the training-set input vectors are collinear with $\mathbf{x}^*$.
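A sketch of the predictive mean and variance for a new input, continuing from the posterior above (all names and data here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 40, 5
sigma, sigma_w = 1.0, 2.0
lam = sigma**2 / sigma_w**2

X = rng.standard_normal((N, p))
y = X[:, 0] + sigma * rng.standard_normal(N)

A = X.T @ X + lam * np.eye(p)
mu = np.linalg.solve(A, X.T @ y)                 # posterior mean of w
Sigma = sigma**2 * np.linalg.inv(A)              # posterior covariance of w

x_star = rng.standard_normal(p)
pred_mean = x_star @ mu                          # same as the ridge prediction
pred_var = sigma**2 + x_star @ Sigma @ x_star    # noise + weight uncertainty
```

Note that `pred_var` uses only `X` and `x_star`, never `y`, matching the point above.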