04 Linear Regression
DESCRIPTION
A machine learning lecture on linear regression: a supervised learning algorithm for estimating model parameter values.
TRANSCRIPT
-
LINEAR REGRESSION
-
Credits
- Some of these slides were sourced and/or modified from:
  - Christopher Bishop, Microsoft UK
-
Relevant Problems from Murphy
- 7.4, 7.6, 7.7, 7.9
- Please do 7.9 at least. We will discuss the solution in class.
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
What is Linear Regression?
- In classification, we seek to identify the categorical class C_k associated with a given input vector x.
- In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
- y is called the dependent variable.
- x is called the independent variable.
- If y is a vector, we call this multiple regression.
- We will focus on the case where y is a scalar.
- Notation:
  - y will denote the continuous model of the dependent variable.
  - t will denote discrete noisy observations of the dependent variable (sometimes called the target variable).
-
Where is the Linear in Linear Regression?
- In regression we assume that y is a function of x. The exact nature of this function is governed by an unknown parameter vector w:

  y = y(x, w)

- The regression is linear if y is linear in w. In other words, we can express y as

  y = w^T φ(x)

  where φ(x) is some (potentially nonlinear) function of x.
-
Linear Basis Function Models
- Generally,

  y(x, w) = Σ_j w_j φ_j(x)

  where the φ_j(x) are known as basis functions.
- Typically, φ_0(x) = 1, so that w_0 acts as a bias.
- In the simplest case, we use linear basis functions: φ_d(x) = x_d (see the sketch below).
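As an illustration, here is a minimal MATLAB sketch (with made-up toy data, not from the slides) of how a design matrix stacks the basis functions column by column, so that the model output is linear in the weights:

```matlab
% Toy data, made up for illustration
x = linspace(0, 1, 10)';                 % N x 1 inputs
t = sin(2*pi*x) + 0.1*randn(10, 1);      % N x 1 noisy targets

% Design matrix: column j holds phi_j evaluated at every input.
% Here phi_0(x) = 1 (bias) and phi_1(x) = x (linear basis).
Phi = [ones(size(x)), x];

% For any weight vector w, the model output is linear in w:
w = [0; 1];                              % example weights
y = Phi * w;                             % y(x, w) = sum_j w_j * phi_j(x)
```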
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Example: Polynomial Bases
- Polynomial basis functions:

  φ_j(x) = x^j

- These are global:
  - a small change in x affects all basis functions.
  - a small change in a basis function affects y for all x.
-
Example: Polynomial Curve Fitting
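The figures for this example are not in the transcript; as a hedged MATLAB sketch, the standard setup (noisy samples of a sinusoid fit with a polynomial, matching the sinusoidal data used in the predictive-distribution slides later in this deck) looks like this:

```matlab
rng(0);                                   % reproducible noise
N = 10;  M = 3;                           % number of points, polynomial order
x = linspace(0, 1, N)';
t = sin(2*pi*x) + 0.2*randn(N, 1);        % noisy observations of sin(2*pi*x)

Phi = x .^ (0:M);                         % polynomial design matrix [1, x, ..., x^M]
w = Phi \ t;                              % least-squares fit

xq = linspace(0, 1, 200)';                % dense grid for plotting the fit
plot(x, t, 'o', xq, (xq .^ (0:M)) * w, '-');
```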
-
Sum-of-Squares Error Function
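The equation on this slide did not survive extraction; it is presumably the standard sum-of-squares error (in Bishop's notation):

```latex
E(\mathbf{w}) \;=\; \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^{2}
```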
-
1st-Order Polynomial
-
3rd-Order Polynomial
-
9th-Order Polynomial
-
Regularization
- Penalize large coefficient values, as in the regularized error below:
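The formula itself is missing from the transcript; presumably it is the standard quadratic-penalty error (as in Bishop):

```latex
\tilde{E}(\mathbf{w}) \;=\; \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^{2} \;+\; \frac{\lambda}{2} \lVert \mathbf{w} \rVert^{2}
```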
-
Regularization
!"#
%&'(& )*+,-*./0+
-
Regularization
!"#
%&'(& )*+,-*./0+
-
Regularization
!"#
%&'(& )*+,-*./0+
-
Probabilistic View of Curve Fitting
- Why least squares?
- Model noise (the deviation of the data from the model) as i.i.d. Gaussian, where β ≜ 1/σ^2 is the precision of the noise (see the noise model below).
-
Maximum Likelihood
- We determine w_ML by minimizing the squared error E(w) (see the log-likelihood reconstruction below).
- Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.
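The derivation on this slide is missing from the transcript; presumably it is the standard one: the log-likelihood of the i.i.d. Gaussian noise model decomposes so that only the squared-error term depends on w,

```latex
\ln p(\mathbf{t} \mid \mathbf{w}, \beta)
  \;=\; \frac{N}{2} \ln \beta \;-\; \frac{N}{2} \ln(2\pi) \;-\; \beta\, E(\mathbf{w})
```

so maximizing the likelihood over w is the same as minimizing E(w).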
-
Maximum Likelihood
- We determine w_ML by minimizing the squared error E(w).
- Now, given w_ML, we can estimate the variance of the noise (see below).
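The estimator itself is missing from the transcript; presumably it is the usual maximum-likelihood noise variance, the mean squared residual of the fitted model:

```latex
\frac{1}{\beta_{\mathrm{ML}}} \;=\; \sigma_{\mathrm{ML}}^{2}
  \;=\; \frac{1}{N} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n \bigr\}^{2}
```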
-
Predictive Distribution
[Figure: generating function, observed data, maximum-likelihood prediction, and posterior over t]
-
MAP: A Step towards Bayes
- Prior knowledge about probable values of w can be incorporated into the regression.
- Now the posterior over w is proportional to the product of the likelihood times the prior.
- The result is to introduce a new quadratic term in w into the error function to be minimized.
- Thus regularized (ridge) regression reflects a zero-mean isotropic Gaussian prior on the weights (see the reconstruction below).
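The slide's formulas are missing; a hedged reconstruction following Bishop's treatment:

```latex
p(\mathbf{w} \mid \alpha) = \mathcal{N}\bigl(\mathbf{w} \mid \mathbf{0},\, \alpha^{-1}\mathbf{I}\bigr),
\qquad
-\ln p(\mathbf{w} \mid \mathbf{t}) \;=\; \beta E_D(\mathbf{w}) + \frac{\alpha}{2}\,\mathbf{w}^{\top}\mathbf{w} + \mathrm{const}
```

so the MAP weights coincide with the ridge regression solution with λ = α/β.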
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Gaussian Bases
- Gaussian basis functions:

  φ_j(x) = exp(-(x - μ_j)^2 / (2 s^2))

- These are local:
  - a small change in x affects only nearby basis functions.
  - a small change in a basis function affects y only for nearby x.
- μ_j and s control location and scale (width).
- Think of these as interpolation functions (a construction sketch follows).
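A minimal MATLAB sketch of a Gaussian-basis design matrix; the centres mu and the width s are illustrative choices, not from the slides:

```matlab
x  = linspace(0, 1, 50)';            % inputs
mu = linspace(0, 1, 9);              % 9 basis-function centres (illustrative)
s  = 0.1;                            % common width / scale (illustrative)

% (x - mu) broadcasts to 50 x 9; column j is phi_j evaluated at every x
Phi = exp(-(x - mu).^2 / (2*s^2));
Phi = [ones(size(x)), Phi];          % prepend phi_0(x) = 1 as the bias
```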
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Maximum Likelihood and Linear Least Squares
- Assume observations from a deterministic function with added Gaussian noise:

  t = y(x, w) + ε,   where   p(ε | β) = N(ε | 0, β^-1)

- which is the same as saying

  p(t | x, w, β) = N(t | y(x, w), β^-1)

- where

  y(x, w) = w^T φ(x)
-
Maximum Likelihood and Linear Least Squares
- Given observed inputs X = {x_1, ..., x_N} and targets t = [t_1, ..., t_N]^T, we obtain the likelihood function

  p(t | X, w, β) = ∏_{n=1}^N N(t_n | w^T φ(x_n), β^-1)
-
Maximum Likelihood and Linear Least Squares
- Taking the logarithm, we get

  ln p(t | w, β) = (N/2) ln β - (N/2) ln(2π) - β E_D(w)

- where

  E_D(w) = (1/2) Σ_{n=1}^N (t_n - w^T φ(x_n))^2

  is the sum-of-squares error.
-
Maximum Likelihood and Least Squares
- Computing the gradient and setting it to zero yields

  0 = Σ_{n=1}^N (t_n - w^T φ(x_n)) φ(x_n)^T

- Solving for w, we get

  w_ML = Φ† t ≡ (Φ^T Φ)^-1 Φ^T t

- where Φ is the N x M design matrix with elements Φ_nj = φ_j(x_n), and Φ† ≡ (Φ^T Φ)^-1 Φ^T is the Moore-Penrose pseudo-inverse (a sketch follows).
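A minimal MATLAB sketch of the maximum-likelihood solution, using toy data for illustration; in practice the backslash operator is preferred over explicitly forming the pseudo-inverse:

```matlab
% Toy data and a cubic polynomial design matrix (illustrative)
x = linspace(0, 1, 20)';
t = sin(2*pi*x) + 0.1*randn(20, 1);
Phi = x .^ (0:3);

% Maximum-likelihood weights via the Moore-Penrose pseudo-inverse
w_ml = pinv(Phi) * t;

% Numerically preferable equivalent: QR-based least squares
w_ml = Phi \ t;
```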
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Regularized Least Squares
- Consider the error function:

  E_D(w) + λ E_W(w)   (data term + regularization term)

  where λ is called the regularization coefficient.
- With the sum-of-squares error function and a quadratic regularizer, we get

  (1/2) Σ_{n=1}^N (t_n - w^T φ(x_n))^2 + (λ/2) w^T w

- which is minimized by

  w = (λI + Φ^T Φ)^-1 Φ^T t

- Thus the name ridge regression (see the sketch below).
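A minimal MATLAB sketch of the closed-form ridge solution; the data and the value of lambda are illustrative:

```matlab
x = linspace(0, 1, 20)';
t = sin(2*pi*x) + 0.1*randn(20, 1);
Phi = x .^ (0:9);                        % deliberately flexible 9th-order basis
lambda = 1e-3;                           % regularization coefficient (illustrative)
M = size(Phi, 2);

% w = (lambda*I + Phi'*Phi)^-1 * Phi'*t, solved without forming the inverse
w_ridge = (lambda*eye(M) + Phi'*Phi) \ (Phi'*t);
```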
-
Application: Colour Restoration
[Scatter plots: green channel vs. red channel, and green channel vs. blue channel, with intensities ranging from 0 to 300]
-
Application: Colour Restoration
[Figure: original image; red and blue channels only (green removed); predicted image (green restored by regression)]
-
Regularized Least Squares
- A more general regularizer:

  (1/2) Σ_{n=1}^N (t_n - w^T φ(x_n))^2 + (λ/2) Σ_{j=1}^M |w_j|^q

- q = 1 gives the lasso (least absolute shrinkage and selection operator); q = 2 gives the quadratic (ridge) regularizer.
-
Regularized Least Squares
- Lasso generates sparse solutions.
[Figure: iso-contours of the data term E_D(w) and an iso-contour of the regularization term E_W(w), for the quadratic (q = 2) and lasso (q = 1) regularizers]
-
Solving Regularized Systems
- Quadratic regularization has the advantage that the solution is closed form.
- Non-quadratic regularizers generally do not have closed-form solutions.
- Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
- We will discuss quadratic programming when we cover SVMs.
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Multiple Outputs
- Analogous to the single-output case we have:

  y(x, W) = W^T φ(x)

- Given observed inputs X = {x_1, ..., x_N} and targets T = [t_1, ..., t_N]^T, we obtain the log-likelihood function

  ln p(T | X, W, β) = (NK/2) ln(β / 2π) - (β/2) Σ_{n=1}^N || t_n - W^T φ(x_n) ||^2

  where K is the dimensionality of the target vector t.
-
Multiple Outputs
- Maximizing with respect to W, we obtain

  W_ML = (Φ^T Φ)^-1 Φ^T T

- If we consider a single target variable, t_k, we see that

  w_k = (Φ^T Φ)^-1 Φ^T t_k

- where t_k = [t_{1k}, ..., t_{Nk}]^T, which is identical with the single-output case.
-
Some Useful MATLAB Functions
- polyfit: least-squares fit of a polynomial of specified order to given data.
- regress: a more general function that computes linear weights for a least-squares fit (usage sketch below).
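A short usage sketch with toy data; note that regress requires the Statistics and Machine Learning Toolbox:

```matlab
x = linspace(0, 1, 20)';
t = sin(2*pi*x) + 0.1*randn(20, 1);

% polyfit: cubic least-squares fit; coefficients returned highest power first
p = polyfit(x, t, 3);
y = polyval(p, x);                       % evaluate the fitted polynomial

% regress: general linear least squares against an explicit design matrix
X = [ones(size(x)), x, x.^2, x.^3];
b = regress(t, X);                       % weights b for the model t ~ X*b
```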
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression
-
Bayesian Linear Regression
Rev. Thomas Bayes, 1702-1761
-
Bayesian Linear Regression
- In least-squares, we determine the weights w that minimize the squared error between the model and the training data.
- This can result in overlearning!
- Overlearning can be reduced by adding a regularizing term to the error function being minimized.
- Under specific conditions this is equivalent to a Bayesian approach, where we specify a prior distribution over the weight vector.
-
Bayesian Linear Regression
- Define a conjugate prior over w:

  p(w) = N(w | m_0, S_0)

- Combining this with the likelihood function and matching terms, we obtain the posterior

  p(w | t) = N(w | m_N, S_N)

- where

  m_N = S_N (S_0^-1 m_0 + β Φ^T t),   S_N^-1 = S_0^-1 + β Φ^T Φ
-
Bayesian Linear Regression
- A common choice for the prior is

  p(w | α) = N(w | 0, α^-1 I)

- for which

  m_N = β S_N Φ^T t,   S_N^-1 = αI + β Φ^T Φ

- Thus m_N represents the ridge regression solution with λ = α/β (see the sketch below).
- Next we consider an example.
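A minimal MATLAB sketch of the posterior for this prior; alpha, beta, and the data are illustrative choices:

```matlab
alpha = 2.0;  beta = 25.0;               % prior and noise precisions (illustrative)
x = linspace(0, 1, 20)';
t = sin(2*pi*x) + 0.2*randn(20, 1);
Phi = x .^ (0:3);
M = size(Phi, 2);

SN_inv = alpha*eye(M) + beta*(Phi'*Phi); % posterior precision S_N^-1
mN = beta * (SN_inv \ (Phi'*t));         % posterior mean m_N = beta * S_N * Phi' * t
```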
-
Bayesian Linear Regression
Example: fitting a straight line
- 0 data points observed.
[Figure: prior over w, and samples drawn from it in data space]
-
Bayesian Linear Regression
- 1 data point observed.
[Figure: likelihood for (x_1, t_1), posterior over w, and samples in data space]
-
Bayesian Linear Regression
- 2 data points observed.
[Figure: likelihood for (x_2, t_2), posterior over w, and samples in data space]
-
Bayesian Linear Regression
- 20 data points observed.
[Figure: likelihood for (x_20, t_20), posterior over w, and samples in data space]
-
Bayesian Prediction
- In least-squares, or regularized least-squares, we determine specific weights w that allow us to predict a specific value y(x, w) for every observed input x.
- However, our estimate of the weight vector w will never be perfect! This introduces error into our prediction.
- In Bayesian prediction, we model the posterior distribution over our predictions, taking into account our uncertainty in the model parameters.
-
Predictive Distribution
- Predict t for new values of x by integrating over w:

  p(t | t, α, β) = ∫ p(t | w, β) p(w | t, α, β) dw = N(t | m_N^T φ(x), σ_N^2(x))

- where

  σ_N^2(x) = 1/β + φ(x)^T S_N φ(x)
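A minimal MATLAB sketch of the predictive mean and variance, continuing the illustrative posterior from the previous sketch:

```matlab
alpha = 2.0;  beta = 25.0;               % illustrative precisions
x = linspace(0, 1, 20)';
t = sin(2*pi*x) + 0.2*randn(20, 1);
Phi = x .^ (0:3);
M = size(Phi, 2);

SN_inv = alpha*eye(M) + beta*(Phi'*Phi); % posterior precision
SN = inv(SN_inv);                        % posterior covariance S_N
mN = beta * (SN_inv \ (Phi'*t));         % posterior mean m_N

xq   = linspace(0, 1, 100)';
Phiq = xq .^ (0:3);                      % basis vectors phi(x) at query points
mu   = Phiq * mN;                        % predictive mean m_N' * phi(x)
s2   = 1/beta + sum((Phiq * SN) .* Phiq, 2);  % predictive variance sigma_N^2(x)
```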
-
Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 1 data point.
[Figures: predictive mean E[t | t, α, β], predictive distribution p(t | t, α, β), and samples of y(x, w)]
- Notice how much bigger our uncertainty is relative to the ML method!
-
Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 2 data points.
[Figures: predictive mean E[t | t, α, β], predictive distribution p(t | t, α, β), and samples of y(x, w)]
-
Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 4 data points.
[Figures: predictive mean E[t | t, α, β], predictive distribution p(t | t, α, β), and samples of y(x, w)]
-
Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 25 data points.
[Figures: predictive mean E[t | t, α, β], predictive distribution p(t | t, α, β), and samples of y(x, w)]
-
Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression