04 Linear Regression
J. Elder, CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition


DESCRIPTION

Machine learning: supervised learning algorithms for finding parameter values.

TRANSCRIPT

Slide 1: LINEAR REGRESSION
J. Elder, CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

Slide 2: Credits
- Some of these slides were sourced and/or modified from:
  - Christopher Bishop, Microsoft UK

Slide 3: Relevant Problems from Murphy
- Problems 7.4, 7.6, 7.7, 7.9
- Please do 7.9 at least. We will discuss the solution in class.

Slide 4: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 5: What is Linear Regression?
- In classification, we seek to identify the categorical class $C_k$ associated with a given input vector $\mathbf{x}$.
- In regression, we seek to identify (or estimate) a continuous variable $y$ associated with a given input vector $\mathbf{x}$.
- $y$ is called the dependent variable.
- $\mathbf{x}$ is called the independent variable.
- If $y$ is a vector, we call this multiple regression.
- We will focus on the case where $y$ is a scalar.
- Notation:
  - $y$ will denote the continuous model of the dependent variable.
  - $t$ will denote discrete noisy observations of the dependent variable (sometimes called the target variable).

Slide 6: Where is the Linear in Linear Regression?
- In regression we assume that $y$ is a function of $\mathbf{x}$. The exact nature of this function is governed by an unknown parameter vector $\mathbf{w}$:
  $y = y(\mathbf{x}, \mathbf{w})$
- The regression is linear if $y$ is linear in $\mathbf{w}$. In other words, we can express $y$ as
  $y = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$
  where $\boldsymbol{\phi}(\mathbf{x})$ is some (potentially nonlinear) function of $\mathbf{x}$.

Slide 7: Linear Basis Function Models
- Generally,
  $y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$
  where the $\phi_j(\mathbf{x})$ are known as basis functions.
- Typically, $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias.
- In the simplest case, we use linear basis functions: $\phi_d(\mathbf{x}) = x_d$. (A MATLAB sketch of the model follows.)
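As a concrete illustration, here is a minimal MATLAB sketch of the model above, assuming a 1-D input and polynomial basis functions $\phi_j(x) = x^j$; the names x, M, Phi, w, and y are illustrative, not from the slides:

    % Build an N x M design matrix Phi with Phi(n, j+1) = phi_j(x_n) = x_n^j,
    % then evaluate y = Phi * w, which is linear in the weights w.
    x   = linspace(0, 1, 10)';   % N = 10 inputs (column vector)
    M   = 4;                     % number of basis functions (cubic + bias)
    Phi = x .^ (0:M-1);          % implicit expansion: column j+1 is x.^j
    w   = randn(M, 1);           % an arbitrary weight vector
    y   = Phi * w;               % model outputs at the N inputs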

Slide 8: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 9: Example: Polynomial Bases
- Polynomial basis functions:
  $\phi_j(x) = x^j$
- These are global:
  - a small change in $x$ affects all basis functions.
  - a small change in a basis function affects $y$ for all $x$.

Slide 10: Example: Polynomial Curve Fitting

Slide 11: Sum-of-Squares Error Function

Slide 12: 1st Order Polynomial

Slide 13: 3rd Order Polynomial

Slide 14: 9th Order Polynomial

Slide 15: Regularization
- Penalize large coefficient values

Slide 16: Regularization
- 9th Order Polynomial [figure]

Slide 17: Regularization
- 9th Order Polynomial [figure]

Slide 18: Regularization
- 9th Order Polynomial [figure]

Slide 19: Probabilistic View of Curve Fitting
- Why least squares?
- Model noise (deviation of data from model) as i.i.d. Gaussian:
  $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\left(t \mid y(x, \mathbf{w}), \beta^{-1}\right)$
  where $\beta \equiv 1/\sigma^2$ is the precision of the noise. (A simulation sketch follows.)

Slide 20: Maximum Likelihood
- We determine $\mathbf{w}_{ML}$ by minimizing the squared error $E(\mathbf{w})$.
- Thus least-squares regression reflects an assumption that the noise is i.i.d. Gaussian.

Slide 21: Maximum Likelihood
- We determine $\mathbf{w}_{ML}$ by minimizing the squared error $E(\mathbf{w})$.
- Now, given $\mathbf{w}_{ML}$, we can estimate the variance of the noise:
  $\sigma_{ML}^2 = \frac{1}{N} \sum_{n=1}^{N} \left( y(x_n, \mathbf{w}_{ML}) - t_n \right)^2$
  (see the sketch below).
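A short MATLAB sketch of this plug-in variance estimate, assuming a design matrix Phi and targets t as in the earlier sketches (hypothetical variables):

    % ML weights, then the mean squared residual as the noise-variance estimate.
    wML      = pinv(Phi) * t;
    sigma2ML = mean((t - Phi*wML).^2);    % (1/N) * sum of squared residuals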

Slide 22: Predictive Distribution
[Figure: generating function, observed data, maximum likelihood prediction, and posterior over t]

Slide 23: MAP: A Step towards Bayes
- Prior knowledge about probable values of $\mathbf{w}$ can be incorporated into the regression.
- Now the posterior over $\mathbf{w}$ is proportional to the product of the likelihood times the prior:
  $p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w})$
- The result is to introduce a new quadratic term in $\mathbf{w}$ into the error function to be minimized.
- Thus regularized (ridge) regression reflects a 0-mean isotropic Gaussian prior on the weights.

Slide 24: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 25: Gaussian Bases
- Gaussian basis functions:
  $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2s^2} \right)$
- These are local:
  - a small change in $x$ affects only nearby basis functions.
  - a small change in a basis function affects $y$ only for nearby $x$.
- $\mu_j$ and $s$ control location and scale (width).
- Think of these as interpolation functions. (A MATLAB sketch follows.)
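A minimal MATLAB sketch of such a basis bank; the centres, width, and the choice of 9 functions are assumptions for illustration (matching the later sinusoidal examples):

    % Evaluate 9 Gaussian basis functions at 100 inputs: PhiG is 100 x 9.
    mu   = linspace(0, 1, 9);               % assumed centres mu_j
    s    = 0.1;                             % assumed common width
    x    = linspace(0, 1, 100)';
    PhiG = exp(-(x - mu).^2 / (2*s^2));     % implicit expansion over centres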

Slide 26: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 27: Maximum Likelihood and Linear Least Squares
- Assume observations from a deterministic function with added Gaussian noise:
  $t = y(\mathbf{x}, \mathbf{w}) + \varepsilon$, where $p(\varepsilon \mid \beta) = \mathcal{N}\left(\varepsilon \mid 0, \beta^{-1}\right)$
- which is the same as saying
  $p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}\left( t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1} \right)$
- where
  $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$

Slide 28: Maximum Likelihood and Linear Least Squares
- Given observed inputs $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and targets $\mathbf{t} = [t_1, \ldots, t_N]^\top$, we obtain the likelihood function
  $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\left( t_n \mid \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \right)$

Slide 29: Maximum Likelihood and Linear Least Squares
- Taking the logarithm, we get
  $\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln 2\pi - \beta E_D(\mathbf{w})$
- where
  $E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2$
  is the sum-of-squares error.

Slide 30: Maximum Likelihood and Least Squares
- Computing the gradient and setting it to zero yields
  $\nabla \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right) \boldsymbol{\phi}(\mathbf{x}_n)^\top = 0$
- Solving for $\mathbf{w}$, we get
  $\mathbf{w}_{ML} = \boldsymbol{\Phi}^\dagger \mathbf{t}$
- where $\boldsymbol{\Phi}^\dagger \equiv \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top$ is the Moore-Penrose pseudo-inverse and $\boldsymbol{\Phi}$ is the $N \times M$ design matrix with $\Phi_{nj} = \phi_j(\mathbf{x}_n)$. (A MATLAB sketch follows.)
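A minimal MATLAB sketch of this solution, assuming Phi and t as in the earlier sketches; backslash is used rather than an explicit inverse for numerical stability:

    % Least-squares / ML weights via the normal equations ...
    wML  = (Phi' * Phi) \ (Phi' * t);
    % ... or, equivalently, via the Moore-Penrose pseudo-inverse.
    wML2 = pinv(Phi) * t;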

Slide 31: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 32: Regularized Least Squares
- Consider the error function (data term + regularization term):
  $E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$
  where $\lambda$ is called the regularization coefficient.
- With the sum-of-squares error function and a quadratic regularizer, we get
  $\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \mathbf{w}^\top \mathbf{w}$
- which is minimized by
  $\mathbf{w} = \left( \lambda \mathbf{I} + \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top \mathbf{t}$
- Thus the name ridge regression. (A MATLAB sketch follows.)
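A minimal MATLAB sketch of the ridge closed form, with an assumed value for lambda and with Phi, t as before:

    % Ridge regression: shrink weights by adding lambda to the Gram matrix.
    lambda = 1e-3;                                   % assumed regularization coefficient
    M      = size(Phi, 2);
    wRidge = (lambda*eye(M) + Phi'*Phi) \ (Phi'*t);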

Slide 33: Application: Colour Restoration
[Scatter plots: green channel vs. red channel, and green channel vs. blue channel, axes 0-300]

Slide 34: Application: Colour Restoration
[Figure: original image; remove green to obtain red and blue channels only; restore green to obtain predicted image]

Slide 35: Regularized Least Squares
- A more general regularizer:
  $\frac{1}{2} \sum_{n=1}^{N} \left( t_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q$
- $q = 1$ gives the lasso (least absolute shrinkage and selection operator); $q = 2$ gives the quadratic (ridge) regularizer. (A MATLAB sketch follows.)
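As a usage sketch: MATLAB's Statistics and Machine Learning Toolbox provides lasso() for the q = 1 penalty. The predictor matrix X, targets y, and the lambda value here are assumptions for illustration:

    % Lasso fit at a single regularization strength; B is typically sparse.
    [B, FitInfo] = lasso(X, y, 'Lambda', 0.1);
    k = nnz(B);    % number of surviving (nonzero) coefficients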

Slide 36: Regularized Least Squares
- Lasso generates sparse solutions.
- [Figure: iso-contours of the data term $E_D(\mathbf{w})$ and an iso-contour of the regularization term $E_W(\mathbf{w})$, for the quadratic and lasso regularizers]

Slide 37: Solving Regularized Systems
- Quadratic regularization has the advantage that the solution is closed form.
- Non-quadratic regularizers generally do not have closed-form solutions.
- Lasso can be framed as minimizing a quadratic error with linear constraints, and thus represents a convex optimization problem that can be solved by quadratic programming or other convex optimization methods.
- We will discuss quadratic programming when we cover SVMs.

Slide 38: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 39: Multiple Outputs
- Analogous to the single-output case we have:
  $\mathbf{y}(\mathbf{x}, \mathbf{W}) = \mathbf{W}^\top \boldsymbol{\phi}(\mathbf{x})$
- Given observed inputs $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and targets $\mathbf{T} = [\mathbf{t}_1, \ldots, \mathbf{t}_N]^\top$, we obtain the log likelihood function
  $\ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}\left( \mathbf{t}_n \mid \mathbf{W}^\top \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1} \mathbf{I} \right)$

Slide 40: Multiple Outputs
- Maximizing with respect to $\mathbf{W}$, we obtain
  $\mathbf{W}_{ML} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top \mathbf{T}$
- If we consider a single target variable $t_k$, we see that
  $\mathbf{w}_k = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Phi} \right)^{-1} \boldsymbol{\Phi}^\top \mathbf{t}_k = \boldsymbol{\Phi}^\dagger \mathbf{t}_k$
- where $\mathbf{t}_k = [t_{1k}, \ldots, t_{Nk}]^\top$, which is identical with the single-output case. (A MATLAB sketch follows.)
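A one-line MATLAB sketch: with the K target columns stacked in an N x K matrix T (assumed, along with Phi as before), the same pseudo-inverse solves all K regressions at once:

    % Column k of WML equals pinv(Phi) * T(:, k), the single-output solution.
    WML = pinv(Phi) * T;    % M x K weight matrix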

Slide 41: Some Useful MATLAB Functions
- polyfit: least-squares fit of a polynomial of specified order to given data
- regress: more general function that computes linear weights for a least-squares fit
- (A usage sketch follows.)
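A brief usage sketch of both built-ins on assumed toy data; regress is in the Statistics Toolbox and expects an explicit column of ones for the bias:

    % Cubic fit two ways: polyfit/polyval, and regress on a design matrix.
    x    = linspace(0, 1, 20)';
    t    = sin(2*pi*x) + 0.1*randn(20, 1);   % assumed noisy targets
    p    = polyfit(x, t, 3);                 % coefficients, highest power first
    tHat = polyval(p, x);                    % fitted values
    Phi  = [ones(20, 1), x, x.^2, x.^3];     % explicit design matrix
    b    = regress(t, Phi);                  % same fit (coefficient order reversed)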

Slide 42: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression

Slide 43: Bayesian Linear Regression
- Rev. Thomas Bayes, 1702-1761

Slide 44: Bayesian Linear Regression
- In least squares, we determine the weights $\mathbf{w}$ that minimize the least squared error between the model and the training data.
- This can result in overlearning!
- Overlearning can be reduced by adding a regularizing term to the error function being minimized.
- Under specific conditions this is equivalent to a Bayesian approach, where we specify a prior distribution over the weight vector.

Slide 45: Bayesian Linear Regression
- Define a conjugate prior over $\mathbf{w}$:
  $p(\mathbf{w}) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0 \right)$
- Combining this with the likelihood function and matching terms, we obtain the posterior
  $p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N \right)$
- where
  $\mathbf{m}_N = \mathbf{S}_N \left( \mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^\top \mathbf{t} \right), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^\top \boldsymbol{\Phi}$

Slide 46: Bayesian Linear Regression
- A common choice for the prior is
  $p(\mathbf{w}) = \mathcal{N}\left( \mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I} \right)$
- for which
  $\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^\top \mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^\top \boldsymbol{\Phi}$
- Thus $\mathbf{m}_N$ represents the ridge regression solution with $\lambda = \alpha / \beta$.
- Next we consider an example. (A MATLAB sketch of the posterior follows.)
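A minimal MATLAB sketch of this posterior, with assumed values for alpha and beta and with Phi, t as before:

    % Posterior over w for the isotropic prior: S_N^-1 = alpha*I + beta*Phi'*Phi,
    % m_N = beta * S_N * Phi' * t (the ridge solution with lambda = alpha/beta).
    alpha = 2;  beta = 25;                   % assumed hyperparameters
    M     = size(Phi, 2);
    SNinv = alpha*eye(M) + beta*(Phi'*Phi);  % posterior precision
    mN    = beta * (SNinv \ (Phi'*t));       % posterior mean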

Slide 47: Bayesian Linear Regression
- Example: fitting a straight line
- 0 data points observed
- [Figure panels: prior; data space]

Slide 48: Bayesian Linear Regression
- 1 data point observed
- [Figure panels: likelihood for (x1, t1); posterior; data space]

Slide 49: Bayesian Linear Regression
- 2 data points observed
- [Figure panels: likelihood for (x2, t2); posterior; data space]

Slide 50: Bayesian Linear Regression
- 20 data points observed
- [Figure panels: likelihood for (x20, t20); posterior; data space]

Slide 51: Bayesian Prediction
- In least squares, or regularized least squares, we determine specific weights $\mathbf{w}$ that allow us to predict a specific value $y(\mathbf{x}, \mathbf{w})$ for every observed input $\mathbf{x}$.
- However, our estimate of the weight vector $\mathbf{w}$ will never be perfect! This will introduce error into our prediction.
- In Bayesian prediction, we model the posterior distribution over our predictions, taking into account our uncertainty in model parameters.

Slide 52: Predictive Distribution
- Predict $t$ for new values of $x$ by integrating over $\mathbf{w}$:
  $p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\, d\mathbf{w} = \mathcal{N}\left( t \mid \mathbf{m}_N^\top \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}) \right)$
- where
  $\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$
  (a MATLAB sketch follows).
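Continuing the posterior sketch above (mN, SNinv, beta, and M assumed available, with a polynomial basis assumed for phi), the predictive mean and variance at a new input:

    % Predictive mean m_N' * phi and variance 1/beta + phi' * S_N * phi.
    xNew    = 0.5;                            % assumed query point
    phiNew  = xNew .^ (0:M-1)';               % phi(xNew) for the basis phi_j(x) = x^j
    predMu  = mN' * phiNew;
    predVar = 1/beta + phiNew' * (SNinv \ phiNew);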

Slide 53: Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 1 data point
- [Figure panels: $E[t \mid \mathbf{t}, \alpha, \beta]$; $p(t \mid \mathbf{t}, \alpha, \beta)$; samples of $y(x, \mathbf{w})$]
- Notice how much bigger our uncertainty is relative to the ML method!

Slide 54: Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 2 data points
- [Figure panels: $E[t \mid \mathbf{t}, \alpha, \beta]$; $p(t \mid \mathbf{t}, \alpha, \beta)$; samples of $y(x, \mathbf{w})$]

Slide 55: Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 4 data points
- [Figure panels: $E[t \mid \mathbf{t}, \alpha, \beta]$; $p(t \mid \mathbf{t}, \alpha, \beta)$; samples of $y(x, \mathbf{w})$]

Slide 56: Predictive Distribution
- Example: sinusoidal data, 9 Gaussian basis functions, 25 data points
- [Figure panels: $E[t \mid \mathbf{t}, \alpha, \beta]$; $p(t \mid \mathbf{t}, \alpha, \beta)$; samples of $y(x, \mathbf{w})$]

Slide 57: Linear Regression Topics
- What is linear regression?
- Example: polynomial curve fitting
- Other basis families
- Solving linear regression problems
- Regularized regression
- Multiple linear regression
- Bayesian linear regression