3. Linear Models for Regression


  • 7/23/2019 3.Linear Models for Regression

    1/103

    Linear Models For Regression

    (Supervised Learning)

Contents

    Deterministic linear model regression

    Line fitting

    Curve fitting

    Regularization

    Basis function

    ML-based probabilistic linear model regression

    Bayesian linear model regression

    Maximum a posteriori (MAP) estimation

    Bayesian estimation

    Evidence approximation

Deterministic Linear Model Regression

What is Line Fitting?

Linear Model : Line Fitting

Given a vector of d-dimensional inputs $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, we want to predict the target (response) $t$ using the linear model:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \cdots + w_d x_d$$

The term $w_0$ is the intercept, also called the bias term. It is convenient to include the constant variable 1 in $\mathbf{x}$ and write $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \mathbf{x}$.

We observe a training set consisting of N observations $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$, together with corresponding target values $\mathbf{t} = (t_1, t_2, \ldots, t_N)$.

Note that $\mathbf{X}$ is an $N \times (d+1)$ matrix.

How to Find (Learn) Optimal w

One option is to minimize the sum of the squares of the errors between the predictions $y(\mathbf{x}_n, \mathbf{w})$ for each data point $\mathbf{x}_n$ and the corresponding real-valued targets $t_n$:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left( y(\mathbf{x}_n, \mathbf{w}) - t_n \right)^2$$

where N is the number of training data. In matrix form, $E(\mathbf{w}) = \frac{1}{2}\lVert \mathbf{X}\mathbf{w} - \mathbf{t} \rVert^2$.

Transforming the Objective Function

Stack the data into a matrix and use the norm operation to handle the sum:

$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n^T\mathbf{w} - t_n)^2 = \frac{1}{2}\left[(\mathbf{x}_1^T\mathbf{w} - t_1)^2 + (\mathbf{x}_2^T\mathbf{w} - t_2)^2 + \cdots + (\mathbf{x}_N^T\mathbf{w} - t_N)^2\right] = \frac{1}{2}\lVert \mathbf{X}\mathbf{w} - \mathbf{t} \rVert^2 = \frac{1}{2}(\mathbf{X}\mathbf{w} - \mathbf{t})^T(\mathbf{X}\mathbf{w} - \mathbf{t})$$

where the rows of $\mathbf{X}$ are $\mathbf{x}_1^T, \ldots, \mathbf{x}_N^T$ and $\mathbf{t} = (t_1, \ldots, t_N)^T$.

How to Find (Learn) Optimal w

Matrix Derivatives (Supplementary)

Derivatives (Vectors) (Supplementary)

Vector-by-scalar, scalar-by-vector (gradient), and vector-by-vector derivatives follow the conventions of matrix calculus.

Derivatives Examples (Supplementary)

Example (scalar-by-vector): $\frac{\partial}{\partial \mathbf{w}} \mathbf{w}^T \mathbf{A} \mathbf{w} = (\mathbf{A} + \mathbf{A}^T)\mathbf{w}$

http://en.wikipedia.org/wiki/Matrix_calculus

Optimal w : Derivation

$$\begin{aligned} E(\mathbf{w}) &= \frac{1}{2}(\mathbf{X}\mathbf{w} - \mathbf{t})^T(\mathbf{X}\mathbf{w} - \mathbf{t}) \\ &= \frac{1}{2}(\mathbf{w}^T\mathbf{X}^T - \mathbf{t}^T)(\mathbf{X}\mathbf{w} - \mathbf{t}) \\ &= \frac{1}{2}\left(\mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} - \mathbf{w}^T\mathbf{X}^T\mathbf{t} - \mathbf{t}^T\mathbf{X}\mathbf{w} + \mathbf{t}^T\mathbf{t}\right) \\ &= \frac{1}{2}\left(\mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} - 2\,\mathbf{t}^T\mathbf{X}\mathbf{w} + \mathbf{t}^T\mathbf{t}\right) \end{aligned}$$

Setting the gradient to zero:

$$\frac{\partial E}{\partial \mathbf{w}} = \mathbf{X}^T\mathbf{X}\mathbf{w} - \mathbf{X}^T\mathbf{t} = \mathbf{0} \quad\Rightarrow\quad \mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{t} \quad\Rightarrow\quad \mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}$$

Here we used $\frac{\partial}{\partial\mathbf{w}}\mathbf{w}^T\mathbf{A}\mathbf{w} = (\mathbf{A} + \mathbf{A}^T)\mathbf{w} = 2\mathbf{A}\mathbf{w}$ when $\mathbf{A}$ is symmetric, and $\mathbf{X}^T\mathbf{X}$ is symmetric.
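The normal-equation solution above can be sketched numerically; this is a minimal NumPy illustration on synthetic data (the data and true weights are illustrative, not from the slides):

```python
import numpy as np

# Least-squares line fitting via the normal equations w* = (X^T X)^{-1} X^T t.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
t = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(50)   # noisy targets, true w = (1, 2)

X = np.column_stack([np.ones_like(x), x])   # prepend the constant 1 -> N x (d+1)
w = np.linalg.solve(X.T @ X, X.T @ t)       # solve X^T X w = X^T t

print(w)   # close to the true weights (1, 2)
```

Solving the linear system with `np.linalg.solve` is preferred over forming the inverse explicitly.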

Linear Model : Curve Fitting

Consider observing a training set consisting of N 1-dimensional observations $\mathbf{x} = (x_1, x_2, \ldots, x_N)$, together with corresponding real-valued targets $\mathbf{t} = (t_1, t_2, \ldots, t_N)$.

The previous model can only fit a linear relation between input and output. A polynomial model is more flexible:

$$y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M} w_j x^j$$

How to Find (Learn) Optimal w

As in the least-squares example, we can minimize the sum of the squares of the errors between the predictions $y(x_n, \mathbf{w})$ for each data point $x_n$ and the corresponding target values $t_n$. Note that $\mathbf{X}$ is now an $N \times (M+1)$ matrix; line fitting is the special case $M = 1$. If $x$ is d-dimensional, what is $\mathbf{X}$?
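Polynomial curve fitting reduces to the same least-squares machinery once the design matrix of powers is built; a minimal sketch on illustrative sinusoidal data:

```python
import numpy as np

# Polynomial curve fitting by least squares: columns of X are 1, x, ..., x^M.
rng = np.random.default_rng(1)
N, M = 20, 3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

X = np.vander(x, M + 1, increasing=True)    # N x (M+1) design matrix
w, *_ = np.linalg.lstsq(X, t, rcond=None)   # least-squares solution

mse = np.mean((X @ w - t) ** 2)             # training mean squared error
print(mse)
```

`np.linalg.lstsq` is numerically safer than inverting $\mathbf{X}^T\mathbf{X}$ when M grows.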

Various Fitting Results Depending on the Size of M

Overfitting : Why?

A high-order polynomial can match every training point while behaving badly between them. This is overfitting.

What Happens to w* When Overfitting Occurs?

Overfitting : Varying the Size of the Data

Generalization

The ultimate goal of supervised learning is to achieve good generalization by making accurate predictions for new test data that is not known during learning.

Choosing the values of parameters that minimize the loss function on the training data may not be the best option.

We would like to model the true regularities in the data and ignore the noise in the data.

Regularization

Add a penalty on the L2 norm of the weights to the error function:

$$\widetilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n^T\mathbf{w} - t_n)^2 + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2$$

How to Find (Learn) Optimal w

$$\mathbf{w}^*_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t}$$

Least squares gives $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}$; regularized least squares adds the $\lambda\mathbf{I}$ term.
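The ridge formula above can be exercised directly; a minimal sketch on illustrative data, comparing against the unregularized exact interpolant:

```python
import numpy as np

# Ridge (regularized least-squares): w_ridge = (X^T X + lam*I)^{-1} X^T t.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

M = 9                                         # high-order polynomial, N = M + 1
X = np.vander(x, M + 1, increasing=True)
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(M + 1), X.T @ t)
w_exact = np.linalg.solve(X, t)               # exact interpolation (no penalty)

# The penalty keeps the coefficient vector much smaller than the interpolant's.
print(np.linalg.norm(w_ridge), np.linalg.norm(w_exact))
```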

Optimal w : Derivation

$$\begin{aligned} \widetilde{E}(\mathbf{w}) &= \frac{1}{2}\sum_{n=1}^{N}(\mathbf{x}_n^T\mathbf{w} - t_n)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w} \\ &= \frac{1}{2}(\mathbf{X}\mathbf{w} - \mathbf{t})^T(\mathbf{X}\mathbf{w} - \mathbf{t}) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w} \\ &= \frac{1}{2}\left(\mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} - 2\,\mathbf{t}^T\mathbf{X}\mathbf{w} + \mathbf{t}^T\mathbf{t}\right) + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w} \end{aligned}$$

Setting the gradient to zero:

$$\frac{\partial \widetilde{E}}{\partial\mathbf{w}} = \mathbf{X}^T\mathbf{X}\mathbf{w} - \mathbf{X}^T\mathbf{t} + \lambda\mathbf{w} = \mathbf{0} \quad\Rightarrow\quad (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})\mathbf{w} = \mathbf{X}^T\mathbf{t} \quad\Rightarrow\quad \mathbf{w}^*_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t}$$

Geometric Interpretation of Regularization

(As $\lambda$ increases, the weights are pulled toward zero.)

How to Choose the Regularization Parameter

Cross Validation
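The cross-validation procedure for selecting the regularization parameter can be sketched as follows; the data, fold count, and $\lambda$ grid are all illustrative:

```python
import numpy as np

# Choosing lambda by K-fold cross validation on a ridge-regularized model.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
X = np.vander(x, 10, increasing=True)        # 9th-order polynomial features

def cv_error(lam, K=5):
    """Mean held-out squared error over K folds for a given lambda."""
    idx = rng.permutation(len(t))
    folds = np.array_split(idx, K)
    errs = []
    for k in range(K):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(K) if j != k])
        w = np.linalg.solve(X[trn].T @ X[trn] + lam * np.eye(X.shape[1]),
                            X[trn].T @ t[trn])
        errs.append(np.mean((X[val] @ w - t[val]) ** 2))
    return np.mean(errs)

lams = [1e-8, 1e-5, 1e-3, 1e-1, 10.0]
best = min(lams, key=cv_error)
print(best)
```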

Summary

Regression : line fitting
$\mathbf{X}$ is an $N \times (d+1)$ matrix; $\mathbf{w}$ is a $(d+1)$-dimensional vector.
Find (learn) the optimal $\mathbf{w}$ by minimizing the error.

Regression : curve fitting

$$y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M} w_j x^j$$

It is necessary to choose $M$ : cross validation.
It is necessary to choose $\lambda$ : cross validation.
What if $x$ is not 1-dimensional?

Linear Basis Function Models

Generalize the polynomial model by replacing powers of $x$ with fixed basis functions:

$$y(x, \mathbf{w}) = w_0\phi_0(x) + w_1\phi_1(x) + \cdots + w_{M-1}\phi_{M-1}(x) = \sum_{j=0}^{M-1} w_j\,\phi_j(x) = \mathbf{w}^T\boldsymbol{\phi}(x)$$

Curve fitting is the special case of polynomial basis functions $\phi_i(x) = x^i$:

$$y(x, \mathbf{w}) = w_0 + w_1 x + \cdots + w_{M-1}x^{M-1} = \sum_{j=0}^{M-1} w_j x^j$$

Linear Basis Function Models

For multi-dimensional inputs, use a multi-index $j = (j_1, j_2, \ldots, j_d)$. Example with 3-dimensional input and $M = 3$ polynomial basis functions:

$$y(\mathbf{x}, \mathbf{w}) = w_0 + w_{1,1}x_1 + w_{1,2}x_2 + w_{1,3}x_3 + w_{2,1}x_1x_2 + w_{2,2}x_1x_3 + w_{2,3}x_2x_3 + w_{3,1}x_1^2 + w_{3,2}x_2^2 + w_{3,3}x_3^2 = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x})$$

Approximately $(M-1)^D$ basis functions and weights are required for a D-dimensional input. Later, we consider only 1-dimensional input.

Popular Basis Functions

Basis Function : Gaussian

$$\phi_j(x) = \exp\left(-\frac{(x-\mu_j)^2}{2s^2}\right)$$

Curve fitting with Gaussian basis functions proceeds exactly as before, with the design matrix built from the $\phi_j$ instead of powers of $x$.
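A minimal sketch of least-squares fitting with Gaussian basis functions; the centers, width, and data are illustrative choices, not values from the slides:

```python
import numpy as np

# Gaussian basis regression: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)).
rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(25)

mu = np.linspace(0, 1, 9)                      # basis centers
s = 0.15                                       # common width
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * s ** 2))
Phi = np.column_stack([np.ones(len(x)), Phi])  # include the bias phi_0(x) = 1

w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
mse = np.mean((Phi @ w - t) ** 2)
print(mse)
```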

Regularization (Identical to the Previous Case)

Other Regularization

More generally, the penalty can take the form $\sum_{j=0}^{M-1}|w_j|^q$; the case $q = 2$ recovers the quadratic regularizer above.

ML-based Probabilistic Linear Model Regression

Probabilistic Perspective

Target values (observations) are often noisy. Model the target as the deterministic prediction plus Gaussian noise:

$$t = y(x, \mathbf{w}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \beta^{-1}), \qquad y(x, \mathbf{w}) = w_0 + \sum_{j=1}^{M} w_j x^j$$

so that $t$ is a scalar and $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(x, \mathbf{w}), \beta^{-1})$.

Maximum Likelihood Estimation (MLE)

$$\mathbf{w}_{ML} = \arg\max_{\mathbf{w}} \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta\,(y(x_n, \mathbf{w}) - t_n)^2}{2}\right)$$

Maximizing the likelihood turns out to be equivalent to minimizing the least-squares loss function.

Derivation

Using $\ln\prod_{n=1}^{N} x_n = \sum_{n=1}^{N} \ln x_n$:

$$\begin{aligned} \ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) &= \ln \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta\,(y(x_n,\mathbf{w}) - t_n)^2}{2}\right) \\ &= \sum_{n=1}^{N}\left[\ln\frac{1}{\sqrt{2\pi\beta^{-1}}} - \frac{\beta\,(y(x_n,\mathbf{w}) - t_n)^2}{2}\right] \\ &= -\frac{\beta}{2}\sum_{n=1}^{N}(y(x_n,\mathbf{w}) - t_n)^2 + \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi \end{aligned}$$

Maximum Likelihood Estimation (MLE)

$$\arg\max_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \arg\min_{\mathbf{w}}\left[-\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\right]$$

so maximizing the log-likelihood is equivalent to minimizing the least-squares loss function, and

$$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) \propto \mathbf{X}^T(\mathbf{t} - \mathbf{X}\mathbf{w}) = \mathbf{0} \quad\Rightarrow\quad \mathbf{w}_{ML} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t}$$
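The ML solution also yields an estimate of the noise precision: maximizing the log-likelihood over $\beta$ gives $1/\beta_{ML}$ equal to the mean squared residual. A minimal sketch on illustrative data:

```python
import numpy as np

# Maximum-likelihood estimates for w and the noise precision beta.
rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 200)
t = 1.0 + 2.0 * x + 0.3 * rng.standard_normal(200)   # true noise std 0.3

X = np.column_stack([np.ones_like(x), x])
w_ml = np.linalg.solve(X.T @ X, X.T @ t)             # same as least squares
beta_ml = 1.0 / np.mean((X @ w_ml - t) ** 2)         # 1/beta = mean sq. residual

print(w_ml, 1.0 / np.sqrt(beta_ml))                  # noise std estimate
```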

Linear Basis Function Models

With basis functions, replace $\mathbf{X}$ by the design matrix $\boldsymbol{\Phi}$ with entries $\Phi_{nj} = \phi_j(x_n)$:

$$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) \propto \boldsymbol{\Phi}^T(\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}) = \mathbf{0} \quad\Rightarrow\quad \mathbf{w}_{ML} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$$

Predictive Distribution

$$p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}(t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1})$$

Bayesian Linear Model Regression

Bayes' Theorem

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$

$P(Y \mid X)$ : likelihood; $P(X)$ : prior probability distribution; $P(X \mid Y)$ : posterior probability distribution; $P(Y)$ : evidence (marginal distribution).

Discrete case: $P(Y) = \sum_{x \in X} P(Y, X = x) = \sum_{x \in X} P(Y \mid x)\,P(x)$.

Equivalently $P(X \mid Y) \propto P(Y \mid X)\,P(X)$ with normalization constant $\frac{1}{P(Y)}$; the product $P(Y \mid X)\,P(X)$ by itself might not be a probability distribution.

Normalization

$$P(X \mid Y = y) = \frac{P(Y = y \mid X)\,P(X)}{P(Y = y)}, \qquad P(Y = y) = \sum_{x \in X} P(Y = y \mid x)\,P(x)$$

Example: A test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability he or she is a user?

If the domain of X is extremely large or high-dimensional, computing the normalizing sum is intractable.
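The drug-test example above works out directly with Bayes' theorem:

```python
# Bayes' theorem applied to the drug-test example from the slides.
p_user = 0.005                  # prior P(user) = 0.5%
p_pos_user = 0.99               # P(positive | user)
p_pos_clean = 1 - 0.99          # P(positive | non-user) = false-positive rate

# Evidence P(positive), summing over both cases of the hidden variable.
p_pos = p_pos_user * p_user + p_pos_clean * (1 - p_user)
p_user_pos = p_pos_user * p_user / p_pos    # posterior P(user | positive)

print(round(p_user_pos, 4))     # ≈ 0.3322 — only about 33%, despite a 99% accurate test
```

The low posterior comes from the small prior: true positives (0.99 × 0.005) are outnumbered by false positives (0.01 × 0.995).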

Applying Bayes' Theorem to Learning

$$p(\mathbf{w} \mid D) = \frac{p(D \mid \mathbf{w})\,p(\mathbf{w})}{p(D)}$$

Possible Solutions

Conjugate Prior

If the posterior distributions $p(\theta \mid x)$ are in the same family as the prior probability distribution $p(\theta)$, the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function $p(x \mid \theta)$.

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)} \propto P(Y \mid X)\,P(X)$$

With a conjugate prior we don't need to calculate $P(Y)$.

Most Probable Prediction

Prediction (classification) by MAP: output the value $v \in V$ with the highest posterior probability.

Predictive Distribution

Bayesian Estimation

Rather than a point estimate, treat $\mathbf{w}$ as a random variable: place a Gaussian prior on $\mathbf{w}$ and compute its posterior given $\mathbf{t}$ and $\mathbf{x}$, which is Gaussian with some mean $\mathbf{m}$ and covariance $\mathbf{S}$.

Normal (Gaussian) Distribution

If X is a normally distributed variable with mean $\mu$ and variance $\sigma^2$:

$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

with normalization factor $\frac{1}{\sqrt{2\pi\sigma^2}}$.

Multivariate Normal (Gaussian) Distribution

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$

$\boldsymbol{\mu}$ : k-dimensional mean vector; $\boldsymbol{\Sigma}$ : $k \times k$ covariance matrix. The symmetric covariance matrix must always be positive definite; $\boldsymbol{\Sigma}^{-1}$ is the precision matrix.

Bayesian Estimation

Assume $\alpha$ and $\beta$ are known. We seek the posterior $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta)$, a multivariate Gaussian over $\mathbf{w}$ with mean $\mathbf{m}$ and covariance $\mathbf{S}$, while the likelihood of each individual target is univariate Gaussian.

Likelihood of Multiple Univariate Gaussian Variables

If $d_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $d_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$ are independent, their joint likelihood is a multivariate Gaussian with diagonal covariance:

$$p\big((d_1, d_2) \mid \mu_1, \mu_2, \sigma_1, \sigma_2\big) = p(d_1)\,p(d_2) = \mathcal{N}\!\left(\begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \,\middle|\, \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}\right)$$

Bayesian Estimation

Prior: $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$, and we will take $\mathbf{m}_0 = \mathbf{0}$, $\mathbf{S}_0 = \alpha^{-1}\mathbf{I}$. Likelihood: $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$. By Bayes' theorem, using the independence of $\mathbf{w}$, $\mathbf{X}$, $\alpha$, and $\beta$ in the factorization:

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) = \frac{p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\,p(\mathbf{w} \mid \alpha)}{\int p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\,p(\mathbf{w} \mid \alpha)\,d\mathbf{w}} \propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\,p(\mathbf{w} \mid \alpha)$$

Posterior Distribution : Derivation

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) \propto \underbrace{\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)}_{\text{likelihood: univariate Gaussians}} \times \underbrace{\frac{1}{(2\pi)^{M/2}|\alpha^{-1}\mathbf{I}|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right)}_{\text{prior: multivariate Gaussian}}$$

using $\exp(a)\exp(b) = \exp(a + b)$ to combine the exponents.

Posterior Distribution : Derivation

Combining the exponents (dropping constants):

$$-\frac{\beta}{2}(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t})^T(\boldsymbol{\Phi}\mathbf{w} - \mathbf{t}) - \frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w} = -\frac{\beta}{2}\left(\mathbf{w}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{w} - 2\,\mathbf{t}^T\boldsymbol{\Phi}\mathbf{w} + \mathbf{t}^T\mathbf{t}\right) - \frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}$$

using $\mathbf{w}^T\boldsymbol{\Phi}^T\mathbf{t} = \mathbf{t}^T\boldsymbol{\Phi}\mathbf{w}$ (a scalar equals its own transpose, since $(\mathbf{A}\mathbf{A}^T)^T = \mathbf{A}\mathbf{A}^T$-style symmetry applies).

Posterior Distribution : Derivation

Completing the square: we are given a quadratic form $Q(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x} + \cdots$ defining the exponent terms in a Gaussian distribution, and we need to determine the corresponding mean and covariance. A Gaussian exponent has the form

$$-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) + const$$

where $const$ denotes terms independent of $\mathbf{x}$, and we have made use of the symmetry of $\boldsymbol{\Sigma}$.

Posterior Distribution : Derivation

Collect the quadratic and linear terms in $\mathbf{w}$:

$$-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi})\mathbf{w} + \beta\,\mathbf{t}^T\boldsymbol{\Phi}\mathbf{w} - \frac{\beta}{2}\mathbf{t}^T\mathbf{t}$$

Define

$$\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}, \qquad \mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$$

(using the symmetry of $\mathbf{S}_N^{-1}$), so the exponent becomes

$$-\frac{1}{2}\mathbf{w}^T\mathbf{S}_N^{-1}\mathbf{w} + \mathbf{w}^T\mathbf{S}_N^{-1}\mathbf{m}_N - \frac{\beta}{2}\mathbf{t}^T\mathbf{t}$$

Posterior Distribution : Derivation

Completing the square:

$$-\frac{1}{2}\mathbf{w}^T\mathbf{S}_N^{-1}\mathbf{w} + \mathbf{w}^T\mathbf{S}_N^{-1}\mathbf{m}_N - \frac{\beta}{2}\mathbf{t}^T\mathbf{t} = -\frac{1}{2}(\mathbf{w} - \mathbf{m}_N)^T\mathbf{S}_N^{-1}(\mathbf{w} - \mathbf{m}_N) + \frac{1}{2}\mathbf{m}_N^T\mathbf{S}_N^{-1}\mathbf{m}_N - \frac{\beta}{2}\mathbf{t}^T\mathbf{t}$$

The terms not involving $\mathbf{w}$ are constants, so

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$
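The posterior formulas $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$ translate directly into NumPy; the data, $\alpha$, and $\beta$ below are illustrative:

```python
import numpy as np

# Bayesian linear-regression posterior over w (prior N(0, I/alpha)).
rng = np.random.default_rng(6)
x = np.linspace(0, 1, 15)
t = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(15)

Phi = np.column_stack([np.ones_like(x), x])   # design matrix: bias + x
alpha, beta = 2.0, 25.0                       # prior precision, noise precision

S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t                  # posterior mean

print(m_N)   # pulled slightly from the ML estimate toward the zero prior mean
```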

Posterior : Mean and Variance

Meaning of $\boldsymbol{\Phi}^T$:

$$\boldsymbol{\Phi}^T = \begin{pmatrix} \phi_0(x_1) & \phi_0(x_2) & \cdots & \phi_0(x_N) \\ \phi_1(x_1) & \phi_1(x_2) & \cdots & \phi_1(x_N) \\ \vdots & & & \vdots \\ \phi_{M-1}(x_1) & \phi_{M-1}(x_2) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}$$

so the entries of $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ are sums over the data points:

$$(\boldsymbol{\Phi}^T\boldsymbol{\Phi})_{ij} = \sum_{n=1}^{N}\phi_i(x_n)\,\phi_j(x_n)$$

Posterior : Mean and Variance

The posterior is the product of two multivariate Gaussians over $\mathbf{w}$. The prior has precision matrix $\alpha\mathbf{I}$ (recall $\mathrm{cov}[\mathbf{w}] = E[\mathbf{w}\mathbf{w}^T] - E[\mathbf{w}]E[\mathbf{w}]^T = \alpha^{-1}\mathbf{I}$). The likelihood, rewritten in a different form as a distribution over $\mathbf{w}$:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N}\mathcal{N}\big(t_n \mid \mathbf{w}^T\boldsymbol{\phi}(x_n), \beta^{-1}\big) \;\propto\; \mathcal{N}\big(\mathbf{w} \mid \mathbf{w}_{ML},\; \beta^{-1}(\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\big)$$

with $\mathbf{w}_{ML} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$, so its precision matrix is $\beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$. The posterior is then the Gaussian with mean vector $\mathbf{m}_N$ and precision matrix $\mathbf{S}_N^{-1}$.

Posterior : Mean and Variance

$$\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$$

The more instances we have seen, the larger the posterior precision becomes: $\beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}$ grows with N, while $\alpha\mathbf{I}$ stays fixed.

$$\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t} = (\alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\beta\boldsymbol{\Phi}^T\mathbf{t}$$

With $\mathbf{w}_{ML} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$, we can write $\mathbf{m}_N = \mathbf{S}_N(\alpha\mathbf{I}\cdot\mathbf{0} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}\,\mathbf{w}_{ML}) = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}$, and as $\alpha \to 0$, $\mathbf{m}_N \to \mathbf{w}_{ML}$.

Bayesian Linear Regression : General Form

$$\mathbf{w}_{MAP} = \mathbf{m}_N$$

The posterior mean is also the MAP estimate, since the mode of a Gaussian is its mean.

Posterior : Meaning of Mean

For a general prior $\mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$,

$$\mathbf{m}_N = (\mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\left(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi}\,\mathbf{w}_{ML}\right), \qquad \mathbf{w}_{ML} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$$

This mixes between the sample mean (ML estimate) and the prior mean:
The higher the precision of the prior, the less we believe the sample mean.
The higher the precision of the instances, the more we believe the sample mean (the more instances we have seen, the more we believe the sample mean).

Effect of Varying Covariance of Prior

(Posterior $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta)$ under different prior covariances.)

MAP Estimation

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) \propto \prod_{n=1}^{N} p\big(t_n \mid \boldsymbol{\phi}(x_n), \mathbf{w}, \beta\big)\; p(\mathbf{w} \mid \alpha)$$

$$\ln p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) = \sum_{n=1}^{N}\ln p\big(t_n \mid \boldsymbol{\phi}(x_n), \mathbf{w}, \beta\big) + \ln p(\mathbf{w} \mid \alpha) + const$$

The likelihood term:

$$\sum_{n=1}^{N}\ln\left[\sqrt{\frac{\beta}{2\pi}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)\right] = \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi - \frac{\beta}{2}\sum_{n=1}^{N}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2$$

MAP Estimation

The prior term (with $\alpha\mathbf{I}$ diagonal, so $|\alpha^{-1}\mathbf{I}|^{1/2} = \alpha^{-M/2}$):

$$\ln\left[\frac{1}{(2\pi)^{M/2}|\alpha^{-1}\mathbf{I}|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right)\right] = -\frac{\alpha}{2}\mathbf{w}^T\mathbf{w} + \frac{M}{2}\ln\alpha - \frac{M}{2}\ln 2\pi$$

Ignoring terms that do not depend on $\mathbf{w}$:

$$\ln p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) = -\frac{\beta}{2}\sum_{n=1}^{N}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2 - \frac{\alpha}{2}\mathbf{w}^T\mathbf{w} + const$$

MAP Estimation = Regularized Least Squares

$$-\ln p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) \propto \frac{1}{2}\sum_{n=1}^{N}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}, \qquad \lambda = \frac{\alpha}{\beta}$$

which is exactly regularized least squares. From the Bayesian estimation,

$$\mathbf{w}_{MAP} = \mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^T\mathbf{t}, \quad \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi} \quad\Rightarrow\quad \mathbf{m}_N = \left(\frac{\alpha}{\beta}\mathbf{I} + \boldsymbol{\Phi}^T\boldsymbol{\Phi}\right)^{-1}\boldsymbol{\Phi}^T\mathbf{t} = (\lambda\mathbf{I} + \boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$$
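The identity MAP = ridge with $\lambda = \alpha/\beta$ can be checked numerically; a minimal sketch on illustrative data:

```python
import numpy as np

# Check that the MAP estimate equals the ridge solution with lam = alpha/beta.
rng = np.random.default_rng(7)
Phi = np.column_stack([np.ones(12), np.linspace(0, 1, 12)])
t = Phi @ np.array([1.0, 2.0]) + 0.2 * rng.standard_normal(12)

alpha, beta = 1.0, 25.0
lam = alpha / beta

S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
m_N = beta * np.linalg.solve(S_N_inv, Phi.T @ t)              # MAP = posterior mean
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2), Phi.T @ t)

print(np.allclose(m_N, w_ridge))   # True
```

Both solve the same linear system, since dividing $(\alpha\mathbf{I} + \beta\boldsymbol{\Phi}^T\boldsymbol{\Phi})\mathbf{m}_N = \beta\boldsymbol{\Phi}^T\mathbf{t}$ by $\beta$ gives the ridge normal equations.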

ML vs. MAP

$$\mathbf{w}_{ML} = \arg\max_{\mathbf{w}}\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right) = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$$

$$p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}\big(t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1}\big)$$

$$\mathbf{w}_{MAP} = \arg\max_{\mathbf{w}}\left[\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)\right]\frac{1}{(2\pi)^{M/2}|\alpha^{-1}\mathbf{I}|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right) = (\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$$

$$p(t \mid x, \mathbf{w}_{MAP}, \beta_{ML}) = \mathcal{N}\big(t \mid y(x, \mathbf{w}_{MAP}), \beta_{ML}^{-1}\big)$$

Bayesian Linear Regression

Linear Gaussian

If $\mathbf{x}$ is Gaussian and $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b} = f(\mathbf{x})$ plus Gaussian noise (a linear-Gaussian model), then the marginal

$$p(\mathbf{y}) = \int p(\mathbf{y} \mid \mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$

is also a Gaussian distribution.

Example: if $p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \mathbf{K})$ and $p(\mathbf{t} \mid \mathbf{y}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \sigma^2\mathbf{I})$, the marginal distribution is

$$p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{y})\,p(\mathbf{y})\,d\mathbf{y} = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, \mathbf{K} + \sigma^2\mathbf{I})$$

Predictive Distribution

Likelihood: $p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\big(t \mid \mathbf{w}^T\boldsymbol{\phi}(x), \beta^{-1}\big)$. Posterior: $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$.

This is a linear-Gaussian pair with $\mathbf{A} = \boldsymbol{\phi}(x)^T$, $\mathbf{b} = \mathbf{0}$, so the predictive distribution is Gaussian, with a variance that varies depending on the input:

$$p(t \mid x, \mathbf{t}, \mathbf{X}, \alpha, \beta) = \mathcal{N}\!\left(t \,\middle|\, \mathbf{m}_N^T\boldsymbol{\phi}(x),\; \frac{1}{\beta} + \boldsymbol{\phi}(x)^T\mathbf{S}_N\boldsymbol{\phi}(x)\right)$$

Predictive Distribution : Derivation

Predictive distribution (fixed and known $\alpha$, $\beta$): marginalize $\mathbf{w}$ out against the posterior,

$$p(t \mid x, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\,p(\mathbf{w} \mid \mathbf{t})\,d\mathbf{w}$$

which follows from the sum and product rules together with the conditional-independence structure:

$$p(t \mid x, \mathbf{t}) = \int p(t, \mathbf{w} \mid x, \mathbf{t})\,d\mathbf{w} = \int \frac{p(t, \mathbf{w}, x, \mathbf{t})}{p(x, \mathbf{t})}\,d\mathbf{w} = \int p(t \mid x, \mathbf{w})\,p(\mathbf{w} \mid \mathbf{t})\,d\mathbf{w}$$
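The predictive mean $\mathbf{m}_N^T\boldsymbol{\phi}(x)$ and variance $\beta^{-1} + \boldsymbol{\phi}(x)^T\mathbf{S}_N\boldsymbol{\phi}(x)$ can be computed directly; a minimal sketch on illustrative data showing how the variance grows away from the training inputs:

```python
import numpy as np

# Bayesian predictive distribution for a linear model with phi(x) = (1, x).
rng = np.random.default_rng(8)
x = np.linspace(0, 1, 15)
t = 1.0 + 2.0 * x + 0.2 * rng.standard_normal(15)
Phi = np.column_stack([np.ones_like(x), x])

alpha, beta = 2.0, 25.0
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predict(x_star):
    phi = np.array([1.0, x_star])
    mean = m_N @ phi
    var = 1.0 / beta + phi @ S_N @ phi   # input-dependent predictive variance
    return mean, var

m_in, v_in = predict(0.5)    # inside the data range
m_out, v_out = predict(3.0)  # far outside: much larger variance
print(v_in < v_out)          # True
```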

Predictive Distribution: ML vs. Bayes

Summary : ML, MAP and Bayesian

Maximum Likelihood (ML):

$$\mathbf{w}_{ML} = \arg\max_{\mathbf{w}}\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right), \qquad p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = \mathcal{N}\big(t \mid \mathbf{w}_{ML}^T\boldsymbol{\phi}(x), \beta_{ML}^{-1}\big)$$

Maximum a Posteriori (MAP):

$$\mathbf{w}_{MAP} = \arg\max_{\mathbf{w}}\left[\prod_{n=1}^{N}\frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)\right]\frac{1}{(2\pi)^{(M+1)/2}|\alpha^{-1}\mathbf{I}|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right), \qquad p(t \mid x, \mathbf{w}_{MAP}, \beta_{ML})$$

Bayesian:

$$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta) \propto \left[\prod_{n=1}^{N}\exp\left(-\frac{\beta}{2}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)\right]\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$

$$p(t \mid x, \mathbf{t}, \mathbf{X}, \alpha, \beta) = \int p(t \mid x, \mathbf{w}, \beta)\,p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta)\,d\mathbf{w} = \mathcal{N}\big(t \mid \mathbf{m}_N^T\boldsymbol{\phi}(x),\; \beta^{-1} + \boldsymbol{\phi}(x)^T\mathbf{S}_N\boldsymbol{\phi}(x)\big)$$

Evidence Approximation

Fully Bayesian Predictive Distribution

In the fully Bayesian treatment, $\alpha$ and $\beta$ are no longer fixed and known (as they were for the predictive distribution above); they are integrated out as well.

Evidence Approximation

The marginal likelihood (evidence) of the hyperparameters is obtained by integrating out $\mathbf{w}$:

$$p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta) = \int p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)\,p(\mathbf{w} \mid \alpha)\,d\mathbf{w}$$

By Bayes' theorem with data $D$:

$$p(\mathbf{w} \mid D) = \frac{p(D \mid \mathbf{w})\,p(\mathbf{w})}{p(D)}$$

Posterior Distribution for Hyperparameter

$$p(\alpha, \beta \mid \mathbf{t}, \mathbf{X}) = \frac{p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta)\,p(\alpha)\,p(\beta)}{\iint p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta)\,p(\alpha)\,p(\beta)\,d\alpha\,d\beta} \propto p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta)\,p(\alpha)\,p(\beta)$$

where the evidence for the hyperparameters again comes from integrating out $\mathbf{w}$:

$$p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta) = \int p(\mathbf{t} \mid \mathbf{w}, \mathbf{X}, \beta)\,p(\mathbf{w} \mid \alpha)\,d\mathbf{w}$$

With relatively flat priors over $\alpha$ and $\beta$, maximizing $p(\alpha, \beta \mid \mathbf{t}, \mathbf{X})$ amounts to maximizing the marginal likelihood $p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta)$.

Evidence Approximation

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N}p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2}\exp\left(-\frac{\beta}{2}\sum_{n=1}^{N}\big(t_n - \mathbf{w}^T\boldsymbol{\phi}(x_n)\big)^2\right)$$

$$p(\mathbf{w} \mid \alpha) = \frac{1}{(2\pi)^{M/2}|\alpha^{-1}\mathbf{I}|^{1/2}}\exp\left(-\frac{1}{2}\mathbf{w}^T(\alpha\mathbf{I})\mathbf{w}\right)$$

Combining, using $\exp(a)\exp(b) = \exp(a+b)$ and $\alpha\mathbf{I}$ diagonal:

$$p(\mathbf{t} \mid \mathbf{X}, \alpha, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2}\left(\frac{\alpha}{2\pi}\right)^{M/2}\int\exp\big(-E(\mathbf{w})\big)\,d\mathbf{w}, \qquad E(\mathbf{w}) = \frac{\beta}{2}\lVert\mathbf{t} - \boldsymbol{\Phi}\mathbf{w}\rVert^2 + \frac{\alpha}{2}\mathbf{w}^T\mathbf{w}$$

Slide 91/103: Evidence Approximation (result)

Carrying out the Gaussian integral (derivation on the following slides) gives the log marginal likelihood

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - E(\mathbf{m}_N) + \frac{1}{2}\ln|\mathbf{S}_N| - \frac{N}{2}\ln 2\pi, \qquad \mathbf{S}_N = \mathbf{A}^{-1}$$

where $\mathbf{m}_N$ and $\mathbf{S}_N$ are the mean and covariance of the posterior over $\mathbf{w}$.
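As a sanity check (not part of the slides), this formula can be evaluated numerically and compared against an independent route: marginally, $\mathbf{t} \sim \mathcal{N}(\mathbf{0},\, \beta^{-1}\mathbf{I} + \alpha^{-1}\Phi\Phi^T)$. A minimal numpy sketch, where the random matrix `Phi` is only a stand-in for a real design matrix of basis functions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 4
Phi = rng.standard_normal((N, M))   # stand-in design matrix (rows: phi(x_n)^T)
t = rng.standard_normal(N)
alpha, beta = 0.5, 2.0

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | X, alpha, beta) via the slide's formula (ln|S_N| = -ln|A|)."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi        # A = S_N^{-1}
    m_N = beta * np.linalg.solve(A, Phi.T @ t)        # posterior mean
    E_mN = beta / 2 * np.sum((t - Phi @ m_N) ** 2) + alpha / 2 * m_N @ m_N
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta) - E_mN
            - 0.5 * np.linalg.slogdet(A)[1] - N / 2 * np.log(2 * np.pi))

# Independent check: log density of t under N(0, beta^{-1} I + alpha^{-1} Phi Phi^T).
C = np.eye(N) / beta + Phi @ Phi.T / alpha
direct = -0.5 * (t @ np.linalg.solve(C, t)
                 + np.linalg.slogdet(C)[1] + N * np.log(2 * np.pi))

assert np.isclose(log_evidence(Phi, t, alpha, beta), direct)
```

The two routes agree to numerical precision, which is a useful regression test when implementing evidence maximization.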

Slide 92/103: Log Marginal Likelihood: Derivation (completing the square)

From the derivation of the posterior distribution, $E(\mathbf{w})$ is quadratic in $\mathbf{w}$, so it can be rewritten by completing the square (general pattern: $\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) + \mathrm{const}$):

$$E(\mathbf{w}) = \mathrm{const} + \frac{1}{2}(\mathbf{w} - \mathbf{m}_N)^T \mathbf{A}\, (\mathbf{w} - \mathbf{m}_N)$$

with

$$\mathbf{A} = \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\Phi^T\Phi, \qquad \mathbf{m}_N = \beta\, \mathbf{S}_N \Phi^T \mathbf{t}, \qquad \mathrm{const} = \frac{\beta}{2}\, \mathbf{t}^T\mathbf{t} - \frac{1}{2}\, \mathbf{m}_N^T \mathbf{A}\, \mathbf{m}_N$$
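The completed-square identity $E(\mathbf{w}) = E(\mathbf{m}_N) + \frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^T\mathbf{A}(\mathbf{w}-\mathbf{m}_N)$ holds for every $\mathbf{w}$, which makes it easy to verify numerically. A toy sketch (random stand-in design matrix, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 15, 3
Phi = rng.standard_normal((N, M))
t = rng.standard_normal(N)
alpha, beta = 1.0, 3.0

A = alpha * np.eye(M) + beta * Phi.T @ Phi      # A = S_N^{-1}
m_N = beta * np.linalg.solve(A, Phi.T @ t)      # m_N = beta S_N Phi^T t

def E(w):
    """E(w) = beta/2 ||t - Phi w||^2 + alpha/2 w^T w."""
    return beta / 2 * np.sum((t - Phi @ w) ** 2) + alpha / 2 * w @ w

E_mN = E(m_N)
# Completing the square: E(w) = E(m_N) + 1/2 (w - m_N)^T A (w - m_N) for any w.
for _ in range(5):
    w = rng.standard_normal(M)
    quad = 0.5 * (w - m_N) @ A @ (w - m_N)
    assert np.isclose(E(w), E_mN + quad)
```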

Slide 93/103: Log Marginal Likelihood: Derivation (Gaussian integral)

Substituting the completed square into the integral (using $\exp a \cdot \exp b = \exp(a+b)$) and pulling the constant out:

$$p(\mathbf{t} \mid X, \alpha, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2} \left(\frac{\alpha}{2\pi}\right)^{M/2} \exp(-\mathrm{const}) \int \exp\!\left(-\frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^T \mathbf{A}\, (\mathbf{w}-\mathbf{m}_N)\right) d\mathbf{w}$$

The remaining integral is an unnormalized Gaussian and equals $(2\pi)^{M/2}\, |\mathbf{A}|^{-1/2}$, so

$$p(\mathbf{t} \mid X, \alpha, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2} \left(\frac{\alpha}{2\pi}\right)^{M/2} \exp(-\mathrm{const})\, (2\pi)^{M/2}\, |\mathbf{A}|^{-1/2}$$

Slide 94/103: Log Marginal Likelihood: Derivation (taking logs)

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi + \frac{M}{2}\ln\alpha - \frac{M}{2}\ln 2\pi - \mathrm{const} + \frac{M}{2}\ln 2\pi - \frac{1}{2}\ln|\mathbf{A}|$$

The $\frac{M}{2}\ln 2\pi$ terms cancel:

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - \mathrm{const} - \frac{1}{2}\ln|\mathbf{A}| - \frac{N}{2}\ln 2\pi$$

which, with $\mathbf{S}_N = \mathbf{A}^{-1}$ (so $-\frac{1}{2}\ln|\mathbf{A}| = \frac{1}{2}\ln|\mathbf{S}_N|$), is the result quoted earlier.

Slide 95/103: Log Marginal Likelihood: Derivation (final form)

Writing the constant out explicitly, with $\mathrm{const} = \frac{\beta}{2}\mathbf{t}^T\mathbf{t} - \frac{1}{2}\mathbf{m}_N^T\mathbf{A}\,\mathbf{m}_N$:

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - \frac{1}{2}\left(\beta\, \mathbf{t}^T\mathbf{t} - \mathbf{m}_N^T \mathbf{A}\, \mathbf{m}_N\right) - \frac{1}{2}\ln|\mathbf{A}| - \frac{N}{2}\ln 2\pi$$
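The two expressions for $E(\mathbf{m}_N)$ used in this derivation, the defining form $\frac{\beta}{2}\|\mathbf{t}-\Phi\mathbf{m}_N\|^2 + \frac{\alpha}{2}\mathbf{m}_N^T\mathbf{m}_N$ and the "const" form $\frac{\beta}{2}\mathbf{t}^T\mathbf{t} - \frac{1}{2}\mathbf{m}_N^T\mathbf{A}\mathbf{m}_N$, can be checked against each other. A toy numpy sketch (random stand-in design matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 12, 3
Phi = rng.standard_normal((N, M))
t = rng.standard_normal(N)
alpha, beta = 0.7, 1.5

A = alpha * np.eye(M) + beta * Phi.T @ Phi
m_N = beta * np.linalg.solve(A, Phi.T @ t)

# Form 1: definition of E evaluated at the posterior mean.
form1 = beta / 2 * np.sum((t - Phi @ m_N) ** 2) + alpha / 2 * m_N @ m_N
# Form 2: the "const" appearing in the log marginal likelihood.
form2 = beta / 2 * t @ t - 0.5 * m_N @ A @ m_N

assert np.isclose(form1, form2)
```

The equality follows from $\Phi^T\mathbf{t} = \beta^{-1}\mathbf{A}\mathbf{m}_N$, so the cross term $\beta\,\mathbf{m}_N^T\Phi^T\mathbf{t}$ collapses to $\mathbf{m}_N^T\mathbf{A}\mathbf{m}_N$.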

Slide 96/103: Maximizing the Evidence (Bayesian learning: regression)

Setting the derivatives of $\ln p(\mathbf{t} \mid X, \alpha, \beta)$ with respect to $\alpha$ and $\beta$ to zero yields the re-estimation equations

$$\alpha = \frac{\gamma}{\mathbf{m}_N^T \mathbf{m}_N}, \qquad \gamma = \sum_{i=1}^{M} \frac{\lambda_i}{\alpha + \lambda_i}$$

$$\frac{1}{\beta} = \frac{1}{N - \gamma} \sum_{n=1}^{N} \left(t_n - \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}_n)\right)^2$$

where the $\lambda_i$ are the eigenvalues of $\beta\Phi^T\Phi$ and $\gamma$ measures the effective number of well-determined parameters.
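The quantity $\gamma$ can be computed directly from the eigenvalues; each term $\lambda_i/(\alpha+\lambda_i)$ lies in $[0,1)$, so $\gamma$ interpolates between 0 (prior dominates) and $M$ (data dominates). A toy sketch with a random stand-in design matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 25, 5
Phi = rng.standard_normal((N, M))
alpha, beta = 0.4, 2.0

# lambda_i are the eigenvalues of beta * Phi^T Phi (symmetric, so eigvalsh).
lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)
gamma = np.sum(lam / (alpha + lam))

assert 0.0 <= gamma <= M
# Limiting behavior: alpha -> 0 gives gamma -> M; alpha -> infinity gives gamma -> 0.
assert np.isclose(np.sum(lam / (1e-12 + lam)), M)
assert np.sum(lam / (1e12 + lam)) < 1e-6
```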

Slide 97/103: Maximizing the Evidence: Derivation (terms in $\alpha$)

Differentiating the log evidence with respect to $\alpha$, the first term gives

$$\frac{\partial}{\partial\alpha}\, \frac{M}{2}\ln\alpha = \frac{M}{2\alpha}$$

For the determinant term, recall $\mathbf{A} = \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\Phi^T\Phi$, where $\Phi^T\Phi$ is an $M \times M$ (square, symmetric) matrix. If $\beta\Phi^T\Phi$ has eigenvalues $\lambda_i$, then $\mathbf{A}$ has eigenvalues $\lambda_i + \alpha$, so $|\mathbf{A}| = \prod_{i=1}^{M}(\lambda_i + \alpha)$ and

$$\frac{\partial}{\partial\alpha}\ln|\mathbf{A}| = \sum_{i=1}^{M}\frac{1}{\lambda_i + \alpha}$$

Slide 98/103: Maximizing the Evidence: Derivation (the quadratic term)

Keeping only the $\alpha$-dependent terms of the log evidence:

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) \;\supset\; \frac{M}{2}\ln\alpha + \frac{1}{2}\mathbf{m}_N^T\mathbf{A}\,\mathbf{m}_N - \frac{1}{2}\ln|\mathbf{A}|$$

Substituting $\mathbf{m}_N = \beta\mathbf{A}^{-1}\Phi^T\mathbf{t}$ into the middle term:

$$\frac{1}{2}\mathbf{m}_N^T\mathbf{A}\,\mathbf{m}_N = \frac{\beta^2}{2}\, \mathbf{t}^T\Phi\, \mathbf{A}^{-1}\Phi^T\mathbf{t}$$

Using $\frac{\partial}{\partial\alpha}\mathbf{A}^{-1} = -\mathbf{A}^{-1}\left(\frac{\partial\mathbf{A}}{\partial\alpha}\right)\mathbf{A}^{-1} = -\mathbf{A}^{-1}\mathbf{A}^{-1}$ (since $\partial\mathbf{A}/\partial\alpha = \mathbf{I}$) together with $(\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1} = \mathbf{A}^{-1}$:

$$\frac{\partial}{\partial\alpha}\, \frac{1}{2}\mathbf{m}_N^T\mathbf{A}\,\mathbf{m}_N = -\frac{\beta^2}{2}\, \mathbf{t}^T\Phi\, \mathbf{A}^{-1}\mathbf{A}^{-1}\Phi^T\mathbf{t} = -\frac{1}{2}\mathbf{m}_N^T\mathbf{m}_N$$

Slide 99/103: Maximizing the Evidence: Derivation (solving for $\alpha$)

Collecting the three derivatives and setting the result to zero:

$$\frac{\partial}{\partial\alpha}\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2\alpha} - \frac{1}{2}\mathbf{m}_N^T\mathbf{m}_N - \frac{1}{2}\sum_{i=1}^{M}\frac{1}{\lambda_i + \alpha} = 0$$

Multiplying through by $2\alpha$ and rearranging:

$$\alpha\, \mathbf{m}_N^T\mathbf{m}_N = M - \sum_{i=1}^{M}\frac{\alpha}{\lambda_i + \alpha} = \sum_{i=1}^{M}\left(1 - \frac{\alpha}{\lambda_i + \alpha}\right) = \sum_{i=1}^{M}\frac{\lambda_i}{\lambda_i + \alpha} \equiv \gamma$$

so that

$$\alpha = \frac{\gamma}{\mathbf{m}_N^T\mathbf{m}_N}$$

Slide 100/103: Maximizing the Evidence: Derivation (iterative solution for $\alpha$)

$$\frac{\partial}{\partial\alpha}\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2\alpha} - \frac{1}{2}\mathbf{m}_N^T\mathbf{m}_N - \frac{1}{2}\sum_{i=1}^{M}\frac{1}{\lambda_i + \alpha} = 0 \quad\Longrightarrow\quad \alpha = \frac{\gamma}{\mathbf{m}_N^T\mathbf{m}_N}, \qquad \gamma = \sum_{i=1}^{M}\frac{\lambda_i}{\alpha + \lambda_i}$$

Note that this gives only an implicit solution for $\alpha$, as both $\gamma$ and $\mathbf{m}_N$ depend on $\alpha$.

Iterative procedure for finding the optimal $\alpha$:
- Start with an initial choice for $\alpha$.
- Use $\alpha$ to compute $\mathbf{m}_N = \beta\, \mathbf{S}_N \Phi^T \mathbf{t}$ and $\gamma$.
- Use $\mathbf{m}_N$ and $\gamma$ to re-estimate $\alpha = \gamma / (\mathbf{m}_N^T\mathbf{m}_N)$.
- Repeat until convergence.
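The iteration above can be sketched in numpy (a toy run, not from the slides; the random `Phi` stands in for a real design matrix, and $\beta$ is held fixed so the eigenvalues $\lambda_i$ need computing only once):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 30, 4
Phi = rng.standard_normal((N, M))
w_true = rng.standard_normal(M)
beta = 5.0                                       # noise precision, held fixed here
t = Phi @ w_true + rng.standard_normal(N) / np.sqrt(beta)

lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)     # independent of alpha
alpha = 1.0                                      # initial choice
for _ in range(100):
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)   # m_N = beta S_N Phi^T t
    gamma = np.sum(lam / (alpha + lam))
    alpha_new = gamma / (m_N @ m_N)              # re-estimation formula
    if abs(alpha_new - alpha) < 1e-10:           # converged
        break
    alpha = alpha_new

assert np.isfinite(alpha) and alpha > 0
```

In practice this fixed-point iteration converges in a handful of steps.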

Slide 101/103: Maximizing the Evidence: Derivation (terms in $\beta$)

Starting again from

$$\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{M}{2}\ln\alpha + \frac{N}{2}\ln\beta - \frac{1}{2}\left(\beta\, \mathbf{t}^T\mathbf{t} - \mathbf{m}_N^T \mathbf{A}\, \mathbf{m}_N\right) - \frac{1}{2}\ln|\mathbf{A}| - \frac{N}{2}\ln 2\pi$$

with $\mathbf{A} = \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\Phi^T\Phi$ and $\mathbf{m}_N = \beta\,\mathbf{S}_N\Phi^T\mathbf{t}$. The eigenvalues $\lambda_i$, defined by $\beta\Phi^T\Phi\, \mathbf{u}_i = \lambda_i \mathbf{u}_i$, are proportional to $\beta$ (if $\lambda_i = \beta b_i$ then $d\lambda_i/d\beta = b_i = \lambda_i/\beta$). Since $\ln|\mathbf{A}| = \sum_{i=1}^{M}\ln(\lambda_i + \alpha)$,

$$\frac{\partial}{\partial\beta}\ln|\mathbf{A}| = \sum_{i=1}^{M}\frac{1}{\lambda_i + \alpha}\, \frac{d\lambda_i}{d\beta} = \frac{1}{\beta}\sum_{i=1}^{M}\frac{\lambda_i}{\lambda_i + \alpha} = \frac{\gamma}{\beta}$$

Slide 102/103: Maximizing the Evidence: Derivation (solving for $\beta$)

Writing the quadratic term in its explicit $E(\mathbf{m}_N)$ form,

$$\frac{1}{2}\left(\beta\, \mathbf{t}^T\mathbf{t} - \mathbf{m}_N^T\mathbf{A}\, \mathbf{m}_N\right) = \frac{\beta}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{m}_N^T\boldsymbol{\phi}(\mathbf{x}_n)\right)^2 + \frac{\alpha}{2}\mathbf{m}_N^T\mathbf{m}_N$$

Differentiating the log evidence with respect to $\beta$ and setting the result to zero:

$$\frac{\partial}{\partial\beta}\ln p(\mathbf{t} \mid X, \alpha, \beta) = \frac{N}{2\beta} - \frac{1}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{m}_N^T\boldsymbol{\phi}(\mathbf{x}_n)\right)^2 - \frac{\gamma}{2\beta} = 0$$

which rearranges to the re-estimation equation

$$\frac{1}{\beta} = \frac{1}{N - \gamma}\sum_{n=1}^{N}\left(t_n - \mathbf{m}_N^T\boldsymbol{\phi}(\mathbf{x}_n)\right)^2$$

As with $\alpha$, this is an implicit equation ($\gamma$ and $\mathbf{m}_N$ depend on $\beta$) and is solved by the same kind of iteration.
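In practice the two re-estimation formulas are interleaved in a single loop. A minimal joint sketch (toy data; the random `Phi`, seed, and tolerances are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 50, 6
Phi = rng.standard_normal((N, M))
w_true = rng.standard_normal(M)
t = Phi @ w_true + 0.3 * rng.standard_normal(N)   # true noise std 0.3

alpha, beta = 1.0, 1.0                            # initial choices
for _ in range(200):
    lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)  # eigenvalues depend on beta
    A = alpha * np.eye(M) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)
    gamma = np.sum(lam / (alpha + lam))
    alpha_new = gamma / (m_N @ m_N)
    beta_new = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)
    converged = (abs(alpha_new - alpha) < 1e-9 and abs(beta_new - beta) < 1e-9)
    alpha, beta = alpha_new, beta_new
    if converged:
        break

# The recovered noise variance 1/beta should be on the order of 0.3^2 = 0.09.
assert np.isfinite(alpha) and np.isfinite(beta) and beta > 0
assert 0 < 1 / beta < 1.0
```

This selects both hyperparameters from the training data alone, with no validation set, which is the practical appeal of the evidence approximation.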

Slide 103/103: End