3. Linear Models for Regression
TRANSCRIPT

7/23/2019  3. Linear Models for Regression  1/103

Linear Models for Regression (Supervised Learning)
Contents

Deterministic linear model regression
  Line fitting
  Curve fitting
  Regularization
  Basis functions
ML-based probabilistic linear model regression
Bayesian linear model regression
  Maximum a posteriori (MAP) estimation
  Bayesian estimation
  Evidence approximation
Deterministic Linear Model Regression
What is Line Fitting?
Linear Model: Line Fitting

Given a vector of d-dimensional inputs x = (x_1, x_2, ..., x_d)^T, we want to
predict the target (response) t using the linear model:

    y(x, w) = w_0 + w_1 x_1 + ... + w_d x_d

The term w_0 is the intercept, also called the bias term. It will be convenient
to include the constant variable 1 in x and write:

    y(x, w) = w^T x,   x = (1, x_1, ..., x_d)^T

We observe a training set consisting of N observations X = (x_1, x_2, ..., x_N)^T,
together with corresponding target values t = (t_1, t_2, ..., t_N)^T.

Note that X is an N × (d+1) matrix.
How to Find (Learn) the Optimal w

One option is to minimize the sum of the squares of the errors between the
prediction y(x_n, w) for each data point x_n and the corresponding real-valued
target t_n:

    E(w) = (1/2) Σ_{n=1}^{N} (y(x_n, w) - t_n)^2      (N: number of training data)

* Matrix form: E(w) = (1/2) ||Xw - t||_2^2
Transforming the Objective Function

Stack the data into a matrix and use the norm operation to handle the sum:

    E(w) = (1/2) Σ_{n=1}^{N} (x_n^T w - t_n)^2
         = (1/2) [ (x_1^T w - t_1)^2 + (x_2^T w - t_2)^2 + ... + (x_N^T w - t_N)^2 ]
         = (1/2) || (x_1^T w - t_1, x_2^T w - t_2, ..., x_N^T w - t_N)^T ||_2^2
         = (1/2) || Xw - t ||_2^2
         = (1/2) (Xw - t)^T (Xw - t)
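The chain of equalities can be checked numerically. A minimal NumPy sketch (all
names and data are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])  # N x (d+1), first column = 1
t = rng.standard_normal(N)
w = rng.standard_normal(d + 1)

# Sum-of-squares form: E(w) = 1/2 * sum_n (x_n^T w - t_n)^2
E_sum = 0.5 * sum((X[n] @ w - t[n]) ** 2 for n in range(N))

# Matrix form: E(w) = 1/2 * ||Xw - t||^2
E_mat = 0.5 * np.linalg.norm(X @ w - t) ** 2

assert np.isclose(E_sum, E_mat)
```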
How to Find (Learn) the Optimal w

Minimize E(w) = (1/2)||Xw - t||_2^2 by setting its gradient with respect to w
to zero. This requires some matrix derivatives, reviewed next.
Matrix Derivatives (Supplementary)

Derivatives of Vectors (Supplementary)
  Vector-by-scalar
  Scalar-by-vector (gradient)
  Vector-by-vector

Derivatives Examples (Supplementary)
  Example: scalar-by-vector
  Example: scalar-by-matrix

http://en.wikipedia.org/wiki/Matrix_calculus
Optimal w: Derivation

    E(w) = (1/2)(Xw - t)^T (Xw - t)
         = (1/2)(w^T X^T - t^T)(Xw - t)
         = (1/2)(w^T X^T Xw - w^T X^T t - t^T Xw + t^T t)
         = (1/2)(w^T X^T Xw - 2 t^T Xw + t^T t)

using w^T X^T t = (w^T X^T t)^T = t^T Xw, since it is a scalar.

Gradient, using d(w^T A w)/dw = (A + A^T)w with A = X^T X symmetric, so
(A + A^T)w = 2Aw:

    dE/dw = X^T Xw - X^T t = 0
    X^T Xw = X^T t
    w* = (X^T X)^{-1} X^T t
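The closed form can be sketched in NumPy on synthetic line-fitting data (the
slope, intercept, and names are illustrative assumptions, not from the slides):

```python
import numpy as np

# Closed-form least squares: w* = (X^T X)^{-1} X^T t.
# Synthetic 1-D line-fitting data (slope 2, intercept -1) plus Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=50)
t = 2.0 * x - 1.0 + 0.05 * rng.standard_normal(50)

X = np.column_stack([np.ones_like(x), x])    # N x (d+1) design matrix
w_star = np.linalg.solve(X.T @ X, X.T @ t)   # solve is more stable than an explicit inverse

print(w_star)   # close to [-1, 2]
```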
Linear Model: Curve Fitting

Consider observing a training set consisting of N one-dimensional observations
x = (x_1, x_2, ..., x_N), together with corresponding real-valued targets
t = (t_1, t_2, ..., t_N).

The previous model can only fit a linear relation between input and output. To
fit a curve, use a polynomial of order M:

    y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j
How to Find (Learn) the Optimal w

As in the line-fitting example, we can minimize the sum of the squares of the
errors between the prediction y(x_n, w) for each data point x_n and the
corresponding target value t_n. Curve fitting thus reduces to line fitting in
the features (1, x, x^2, ..., x^M).

* Note that X is an N × (M+1) matrix.
* If x_i is d-dimensional, what is X?
Various Fitting Results Depending on the Size of M

Overfitting: Why?

This is overfitting.

What Happens to w* When Overfitting Occurs?

Overfitting: Varying the Size of the Data
Generalization

The ultimate goal of supervised learning is to achieve good generalization by
making accurate predictions for new test data that is not known during
learning.

Choosing the values of parameters that minimize the loss function on the
training data may not be the best option. We would like to model the true
regularities in the data and ignore the noise in the data.
Regularization

Penalize large weights by adding an L2-norm term to the objective:

    E(w) = (1/2) Σ_{n=1}^{N} (y(x_n, w) - t_n)^2 + (λ/2) ||w||_2^2
How to Find (Learn) the Optimal w

    w*_ridge = (X^T X + λI)^{-1} X^T t

Least squares: w* = (X^T X)^{-1} X^T t. Regularized least squares adds λI.
Optimal w: Derivation

    E(w) = (1/2) Σ_{n=1}^{N} (x_n^T w - t_n)^2 + (λ/2) w^T w
         = (1/2)(Xw - t)^T (Xw - t) + (λ/2) w^T w
         = (1/2)(w^T X^T - t^T)(Xw - t) + (λ/2) w^T w
         = (1/2)(w^T X^T Xw - w^T X^T t - t^T Xw + t^T t) + (λ/2) w^T w
         = (1/2)(w^T X^T Xw - 2 t^T Xw + t^T t) + (λ/2) w^T w

    dE/dw = X^T Xw - X^T t + λw = 0
    (X^T X + λI)w = X^T t
    w*_ridge = (X^T X + λI)^{-1} X^T t
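A NumPy sketch of the ridge closed form, contrasted with plain least squares on
a high-order polynomial fit (the data, order, and λ value are illustrative
assumptions; the norm comparison shows the shrinkage effect of λ):

```python
import numpy as np

# Regularized least squares (ridge): w* = (X^T X + lam*I)^{-1} X^T t.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

M = 9                                       # polynomial order
X = np.vander(x, M + 1, increasing=True)    # columns 1, x, x^2, ..., x^M

def ridge(X, t, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

w_ols, *_ = np.linalg.lstsq(X, t, rcond=None)   # plain least squares (overfits)
w_reg = ridge(X, t, 1e-3)                       # regularized: smaller weights

print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))
```

The ridge solution always has a norm no larger than the least-squares one,
which is the point of the penalty.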
Geometric Interpretation of Regularization

As the regularization parameter λ increases, the weights are shrunk toward
zero.
How to Choose the Regularization Parameter

Cross Validation

Hold out part of the training data, fit on the rest, and pick the λ (or M)
with the best held-out error.
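One way the selection could be sketched is k-fold cross validation over a grid
of λ values (the grid, fold count, data, and names are illustrative
assumptions, not from the slides):

```python
import numpy as np

# Minimal k-fold cross-validation for choosing the ridge parameter lam,
# assuming the closed form w = (X^T X + lam*I)^{-1} X^T t.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 40)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)
X = np.vander(x, 10, increasing=True)       # order-9 polynomial features

def cv_error(X, t, lam, k=5):
    idx = np.arange(len(t))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(X.shape[1]),
                            X[train].T @ t[train])
        errs.append(np.mean((X[fold] @ w - t[fold]) ** 2))
    return np.mean(errs)                     # average held-out squared error

lams = [10.0 ** p for p in range(-8, 1)]
best = min(lams, key=lambda lam: cv_error(X, t, lam))
print("best lambda:", best)
```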
Summary

Regression: line fitting
  X is an N × (d+1) matrix; w is a (d+1)-dimensional vector.
  Find (learn) the optimal w by minimizing the error.

Regression: curve fitting
  y(x, w) = Σ_{j=0}^{M} w_j x^j, with features φ_0(x) = 1, φ_1(x) = x, ...
  How to choose M: cross validation. How to choose λ: cross validation.
  What if x is not 1-dimensional?
Linear Basis Function Models

Generalize curve fitting by replacing the powers of x with basis functions
φ_j(x):

    y(x, w) = w_0 φ_0(x) + w_1 φ_1(x) + ... + w_{M-1} φ_{M-1}(x)
            = Σ_{j=0}^{M-1} w_j φ_j(x) = w^T φ(x)

Curve fitting is the special case φ_i(x) = x^i:

    y(x, w) = w_0 + w_1 x + ... + w_{M-1} x^{M-1} = Σ_{j=0}^{M-1} w_j x^j
Linear Basis Function Models

    y(x, w) = w_0 φ_0(x) + w_1 φ_1(x) + ... + w_{M-1} φ_{M-1}(x)
            = Σ_{j=0}^{M-1} w_j φ_j(x) = w^T φ(x),   with φ_i(x) = x^i for curve fitting

For multi-dimensional input, use a multi-index j = (j_1, j_2, ..., j_d).
Example with 3-dimensional input and M = 3 polynomial basis functions:

    y(x, w) = w_0 + w_{1,1} x_1 + w_{1,2} x_2 + w_{1,3} x_3
                  + w_{2,1} x_1 x_2 + w_{2,2} x_1 x_3 + w_{2,3} x_2 x_3
                  + w_{3,1} x_1^2 + w_{3,2} x_2^2 + w_{3,3} x_3^2
            = w^T φ(x)

Approximately (M-1)^D basis functions and weights are required for
D-dimensional input. Later, we consider only 1-dimensional input.
Popular Basis Functions

Common choices include polynomial, Gaussian, and sigmoidal basis functions.

Basis Function: Gaussian

Gaussian basis function for curve fitting:

    φ_j(x) = exp( -(x - μ_j)^2 / (2 s^2) )
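A sketch of curve fitting with Gaussian basis functions (the centers, width,
regularization, and data are illustrative choices, not from the slides):

```python
import numpy as np

# Fit with phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) via regularized least squares.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

mus = np.linspace(0, 1, 9)     # basis centers
s = 0.15                       # shared width

def design(x, mus, s):
    # N x (M+1) design matrix: a bias column followed by Gaussian bumps
    Phi = np.exp(-(x[:, None] - mus[None, :]) ** 2 / (2 * s ** 2))
    return np.hstack([np.ones((len(x), 1)), Phi])

Phi = design(x, mus, s)
lam = 1e-4
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)

x_test = np.linspace(0, 1, 5)
print(design(x_test, mus, s) @ w)   # approximates sin(2*pi*x_test)
```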
Regularization (Identical to the Previous Case)

With basis functions, the regularized solution has the same form, with the
design matrix Φ in place of X.

Other Regularization

More general penalties of the form (λ/2) Σ_{j=0}^{M-1} |w_j|^q can be used in
place of the L2 norm (the case q = 2).
ML-based Probabilistic Linear Model Regression

Probabilistic Perspective

Target values (observations) are often noisy. Model the target as the
deterministic prediction plus zero-mean Gaussian noise with precision β (a
scalar):

    t = y(x, w) + ε,   ε ~ N(0, β^{-1}),   y(x, w) = Σ_{j=0}^{M} w_j x^j

so that p(t | x, w, β) = N(t | y(x, w), β^{-1}).
Maximum Likelihood Estimation (MLE)

    w_ML = argmax_w Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(y(x_n, w) - t_n)^2 / (2 β^{-1}) )

The least-squares loss function appears inside the exponent.
Derivation

    p(t | x, w, β) = Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(y(x_n, w) - t_n)^2 / (2 β^{-1}) )

Using ln Π_{n=1}^{N} x_n = Σ_{n=1}^{N} ln x_n:

    ln p(t | x, w, β)
      = Σ_{n=1}^{N} ln [ (1/√(2π β^{-1})) exp( -(y(x_n, w) - t_n)^2 / (2 β^{-1}) ) ]
      = Σ_{n=1}^{N} ln exp( -(y(x_n, w) - t_n)^2 / (2 β^{-1}) )
        + Σ_{n=1}^{N} ln (1/√(2π β^{-1}))
      = -(β/2) Σ_{n=1}^{N} (y(x_n, w) - t_n)^2 + (N/2) ln β - (N/2) ln 2π
Maximum Likelihood Estimation (MLE)

    argmax_w ln p(t | x, w, β) = argmin_w [ -ln p(t | x, w, β) ]

Up to constants, -ln p(t | x, w, β) ∝ (t - Xw)^T (t - Xw): the least-squares
loss function. Therefore

    w_ML = (X^T X)^{-1} X^T t
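A sketch of MLE for this model: w_ML is the least-squares solution, and the
noise precision is estimated from the residuals (1/β_ML = mean squared
residual, a standard result assumed here; the data-generating values are
illustrative):

```python
import numpy as np

# MLE for t = w^T x + eps, eps ~ N(0, 1/beta).
rng = np.random.default_rng(3)
N = 200
x = rng.uniform(-1, 1, N)
true_beta = 25.0                             # noise std = 1/sqrt(25) = 0.2
t = 1.5 * x + 0.5 + rng.standard_normal(N) / np.sqrt(true_beta)

X = np.column_stack([np.ones(N), x])
w_ml = np.linalg.solve(X.T @ X, X.T @ t)     # same as least squares
beta_ml = 1.0 / np.mean((X @ w_ml - t) ** 2) # residual-based precision estimate

print(w_ml, beta_ml)   # w_ml near [0.5, 1.5]; beta_ml near 25
```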
Linear Basis Function Models

The same MLE applies with basis functions: y(x, w) = Σ_{j=0}^{M-1} w_j φ_j(x),
so p(t_n | x_n, w, β) = N(t_n | w^T φ(x_n), β^{-1}).

Setting the gradient of the log-likelihood to zero:

    ∇_w ln p(t | x, w, β) = β Φ^T (t - Φw) = 0
    ⇒ w_ML = (Φ^T Φ)^{-1} Φ^T t

Predictive Distribution

    p(t | x, w_ML, β_ML) = N(t | w_ML^T φ(x), β_ML^{-1})
Bayesian Linear Model Regression

Bayes' Theorem

    P(X | Y) = P(Y | X) P(X) / P(Y)

posterior = likelihood × prior / evidence (marginal distribution)

Discrete case:

    P(Y) = Σ_x P(Y, X = x) = Σ_x P(Y | X = x) P(X = x)

    P(X | Y) ∝ P(Y | X) P(X)

The product P(Y | X) P(X) alone might not be a probability distribution;
1/P(Y) is the normalization constant.
Normalization

    P(X | Y = y) = P(Y = y | X) P(X) / P(Y = y),   P(Y = y) = Σ_x P(Y = y | x) P(x)

If the domain of X is extremely large or high-dimensional, computing the
evidence P(Y = y) is intractable.

Example: A test will produce 99% true positive results for drug users and 99%
true negative results for non-drug users. Suppose that 0.5% of people are
users of the drug. If a randomly selected individual tests positive, what is
the probability that he or she is a user?
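Worked numerically (variable names are illustrative):

```python
# Bayes' theorem on the drug-test example:
# P(user | +) = P(+ | user) P(user) /
#               [ P(+ | user) P(user) + P(+ | non-user) P(non-user) ]
p_pos_user = 0.99    # sensitivity (true positive rate)
p_pos_clean = 0.01   # false positive rate = 1 - specificity
p_user = 0.005       # prior: 0.5% of people are users

evidence = p_pos_user * p_user + p_pos_clean * (1 - p_user)
posterior = p_pos_user * p_user / evidence
print(round(posterior, 4))   # 0.3322
```

So a positive test implies only about a 33% chance of being a user, because
the prior probability of being a user is so small.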
Applying Bayes' Theorem to Learning

    p(w | D) = p(D | w) p(w) / p(D)

where w are the model parameters and D is the observed data.

Possible Solutions

Conjugate Prior

If the posterior distributions p(θ | x) are in the same family as the prior
probability distribution p(θ), the prior and posterior are called conjugate
distributions, and the prior is called a conjugate prior for the likelihood
function p(x | θ). With a conjugate prior we don't need to calculate the
evidence P(Y) to identify the posterior.

Most Probable Prediction

Prediction (classification) by MAP: predict with the single most probable
hypothesis.
Predictive Distribution

Rather than predicting with a single w, average the predictions for t* over
the posterior:

    p(t* | x*, t, X) = ∫ p(t* | x*, w) p(w | t, X) dw

Bayesian Estimation

Infer the full posterior distribution over w given the targets t, with mean m
and covariance S.
Normal (Gaussian) Distribution

If X is a normally distributed variable with mean μ and variance σ²,

    p(x) = (1/√(2πσ²)) exp( -(x - μ)² / (2σ²) )

with normalization factor 1/√(2πσ²).

Multivariate Normal (Gaussian) Distribution

    N(x | μ, Σ) = (1/((2π)^{k/2} |Σ|^{1/2})) exp( -(1/2)(x - μ)^T Σ^{-1} (x - μ) )

  μ: k-dimensional mean vector
  Σ: k×k covariance matrix (symmetric; must be positive definite)
  Σ^{-1} is the precision matrix.
Bayesian Estimation

Assume α and β are known. We want the posterior p(w | t, X, α, β). Should the
prior over w be a univariate or a multivariate Gaussian? Multivariate, since w
is a vector.

Likelihood of Multiple Univariate Gaussian Variables

A product of independent univariate Gaussian likelihoods is a multivariate
Gaussian with diagonal covariance:

    p(d_1 | μ_1, σ_1²) p(d_2 | μ_2, σ_2²)
      = N(d_1 | μ_1, σ_1²) N(d_2 | μ_2, σ_2²)
      = N( (d_1, d_2)^T | (μ_1, μ_2)^T, diag(σ_1², σ_2²) )
Bayesian Estimation

Prior: p(w) = N(w | m_0, S_0).  Likelihood: p(t | X, w, β).

    p(w | t, X, α, β) = p(t, w, X, α, β) / p(t, X, α, β)
                      = p(t | w, X, β) p(w | α) p(X) p(α) p(β)
                        / ∫ p(t | w, X, β) p(w | α) p(X) p(α) p(β) dw
                      = p(t | w, X, β) p(w | α) / ∫ p(t | w, X, β) p(w | α) dw
                      ∝ p(t | w, X, β) p(w | α)
Posterior Distribution: Derivation

Likelihood (product of univariate Gaussians) times prior (multivariate
Gaussian with covariance α^{-1} I):

    p(w | t, X, α, β)
      ∝ [ Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(t_n - w^T x_n)² / (2 β^{-1}) ) ]
        × (1/((2π)^{M/2} |α^{-1} I|^{1/2})) exp( -(1/2) w^T (α I) w )

Using exp(a) exp(b) = exp(a + b), the exponent is

    -(β/2)(Xw - t)^T (Xw - t) - (1/2) w^T (α I) w
      = -(β/2)(w^T X^T Xw - w^T X^T t - t^T Xw + t^T t) - (1/2) w^T (α I) w
      = -(β/2)(w^T X^T Xw - 2 w^T X^T t + t^T t) - (1/2) w^T (α I) w

using t^T Xw = (t^T Xw)^T = w^T X^T t, since it is a scalar.
59/103
Posterior Distribution : Derivation
59
We are given a quadratic form defining the exponent terms in a Gaussian
distribution, and we need to determine the corresponding mean and
covariance
Const denotes terms which are independent of x, and we have
made use of the symmetry of .
11 ( ) ( )2
Tconst
x x
Completing the Square
( ) T
Q x x Ax
Quadratic form
P t i Di t ib ti D i ti
Bayesian linear model regression
-
7/23/2019 3.Linear Models for Regression
60/103
Posterior Distribution : Derivation
60
1( )
2 2
T T T T T
w I w w t t t
1( )T
N
S I
1 112 2
T T T T
N N N
w S w w S S t t t
T
N Nm S t1 11
2 2
T T T
N N N
w S w w S m t t
1:N symmetry
S
11 ( ) ( )
2
Tconst
x x
1( )
2 2 2
T T T T T T
w w w t t t w I w( | , , , )p w t X
P t i Di t ib ti D i ti
Bayesian linear model regression
-
7/23/2019 3.Linear Models for Regression
61/103
Posterior Distribution : Derivation
61
1 1 1 11 1 1
2 2 2 2
T TT T T
N N N N N N N N NS
w w w S m m S m m S m t t
11( ) ( )
2
T
N N NS const
w m w m
1 11
2 2
T T T
N N NS
w w w S m t t
11
2 2
TT
N N Nconst
t t m S m
11 ( ) ( )
2
Tconst
x x
( | , , , )p w t X
1 11
2 2
T T T
N N N
w S w w S m t t
1 1 11 1
2 2
TT T
N N N N N NS const
w w w S m m S m
2
111
1 1( | , , , ) exp ( )
22
NT
n n
n
p t
w t X w x
1 1
1/ 2/ 2
1 1exp ( )
2(2 )
T
M
w I w
P t i M d V i
Bayesian linear model regression
Posterior: Mean and Variance

Meaning of Φ^T Φ: with

    Φ^T = [ φ_0(x_1)     φ_0(x_2)     ...  φ_0(x_N)
            φ_1(x_1)     φ_1(x_2)     ...  φ_1(x_N)
            ...
            φ_{M-1}(x_1) φ_{M-1}(x_2) ...  φ_{M-1}(x_N) ]

each entry of Φ^T Φ is a sum over the N training points:

    (Φ^T Φ)_{jk} = Σ_{i=1}^{N} φ_j(x_i) φ_k(x_i)
Posterior: Mean and Variance

Covariance matrix: cov[w] = E[(w - E[w])(w - E[w])^T]; its inverse is the
precision matrix. Two multivariate Gaussians are combined:

  Prior: p(w | α) = N(w | 0, α^{-1} I), precision matrix α I.
  Likelihood (different form, as a multivariate Gaussian in t):

    p(t | X, w, β) = Π_{n=1}^{N} N(t_n | w^T φ(x_n), β^{-1}) = N(t | Φw, β^{-1} I)

  with precision matrix β I.

Posterior distribution: mean vector m_N, precision S_N^{-1}.
Posterior: Mean and Variance

    S_N^{-1} = α I + β Φ^T Φ

The more instances we have seen, the larger the posterior precision S_N^{-1}
becomes: each data point adds β φ(x_n) φ(x_n)^T, so the posterior sharpens.

    m_N = β S_N Φ^T t = β (α I + β Φ^T Φ)^{-1} Φ^T t

Compare with w_ML = (Φ^T Φ)^{-1} Φ^T t: since Φ^T t = Φ^T Φ w_ML,

    m_N = β S_N Φ^T Φ w_ML,   and as α → 0, m_N → w_ML.
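The posterior update can be sketched directly from these formulas (the data,
basis, and hyperparameter values are illustrative assumptions):

```python
import numpy as np

# Bayesian linear regression posterior:
# S_N^{-1} = alpha*I + beta*Phi^T Phi,  m_N = beta * S_N Phi^T t.
rng = np.random.default_rng(4)
N, alpha, beta = 25, 2.0, 25.0
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.standard_normal(N) / np.sqrt(beta)

Phi = np.vander(x, 6, increasing=True)            # M = 6 basis functions 1, x, ..., x^5

S_N_inv = alpha * np.eye(6) + beta * Phi.T @ Phi  # posterior precision
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t                      # posterior mean

w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
# m_N shrinks toward 0 relative to the ML solution; as alpha -> 0, m_N -> w_ml.
print(np.linalg.norm(m_N), np.linalg.norm(w_ml))
```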
Bayesian Linear Regression: General Form

With a general Gaussian prior N(w | m_0, S_0):

    S_N^{-1} = S_0^{-1} + β Φ^T Φ
    m_N = S_N (S_0^{-1} m_0 + β Φ^T Φ w_ML),   w_ML = (Φ^T Φ)^{-1} Φ^T t

and w_MAP = m_N (for a Gaussian posterior, the mode equals the mean).

Posterior: Meaning of Mean

m_N mixes between the sample-based estimate w_ML and the prior mean m_0:
  The higher the precision of the prior, the less we believe the sample mean.
  The higher the precision of the instances, the more we believe the sample
  mean. (The more instances we have seen, the more we believe the sample mean.)
Effect of Varying the Covariance of the Prior

(Plots of the posterior p(w | t, X, α, β) for different prior covariances.)
MAP Estimation

    p(w | t, X, α, β) ∝ [ Π_{n=1}^{N} p(t_n | φ(x_n), w, β) ] p(w | α)

    ln p(w | t, X, α, β) = Σ_{n=1}^{N} ln p(t_n | φ(x_n), w, β) + ln p(w | α) + const

Likelihood term:

    Σ_{n=1}^{N} ln [ √(β/2π) exp( -(β/2)(t_n - w^T φ(x_n))² ) ]
      = -(β/2) Σ_{n=1}^{N} (t_n - w^T φ(x_n))² + (N/2) ln β - (N/2) ln 2π

Prior term, with A = α I diagonal, so |α^{-1} I|^{-1/2} = α^{M/2}:

    ln [ (1/((2π)^{M/2} |α^{-1} I|^{1/2})) exp( -(1/2) w^T (α I) w ) ]
      = -(α/2) w^T w + (M/2) ln α - (M/2) ln 2π

Ignoring terms that do not depend on w:

    ln p(w | t, X, α, β) = -(β/2) Σ_{n=1}^{N} (t_n - w^T φ(x_n))² - (α/2) w^T w + const
MAP Estimation = Regularized Least Squares

Maximizing

    ln p(w | t, X, α, β) = -(β/2) Σ_{n=1}^{N} (t_n - w^T φ(x_n))² - (α/2) w^T w + const

is regularized least squares with λ = α/β. Bayesian estimation gives the same
point estimate, w_MAP = m_N:

    m_N = β S_N Φ^T t,   S_N^{-1} = α I + β Φ^T Φ
    ⇒ m_N = β (α I + β Φ^T Φ)^{-1} Φ^T t = (Φ^T Φ + (α/β) I)^{-1} Φ^T t
ML vs. MAP

    w_ML = argmax_w Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(β/2)(t_n - w^T φ(x_n))² )
         = (Φ^T Φ)^{-1} Φ^T t

    p(t | x, w_ML, β_ML) = N(t | w_ML^T φ(x), β_ML^{-1})

    w_MAP = argmax_w [ Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(β/2)(t_n - w^T φ(x_n))² ) ]
            × (1/((2π)^{M/2} |α^{-1} I|^{1/2})) exp( -(1/2) w^T (α I) w )
          = (Φ^T Φ + (α/β) I)^{-1} Φ^T t

    p(t | x, w_MAP, α, β) = N(t | w_MAP^T φ(x), β_ML^{-1})
Bayesian Linear Regression
Linear Gaussian

If y = Ax + b = f(x) is a linear function of a Gaussian variable x, and
p(y | x) is Gaussian, then the marginal distribution

    p(y) = ∫ p(y | x) p(x) dx

is also a Gaussian distribution.

Example: with p(y) = N(y | 0, K) and p(t | y) = N(t | y, σ² I),

    p(t) = ∫ p(t | y) p(y) dy = N(t | 0, K + σ² I)
Predictive Distribution

Likelihood: p(t | x, w, β) = N(t | w^T φ(x), β^{-1}); the variance is fixed,
but the mean varies depending on the input x.

Posterior: p(w | t, X, α, β) = N(w | m_N, S_N).

Apply the linear-Gaussian marginalization with A = φ(x)^T, b = 0, μ = m_N,
Λ^{-1} = S_N:

    p(t | x, t, X, α, β) = N(t | m_N^T φ(x), β^{-1} + φ(x)^T S_N φ(x))
Predictive Distribution: Derivation

Predictive distribution (with α and β fixed and known):

    p(t | x, t, X) = ∫ p(t, w | x, t, X) dw
                   = ∫ [ p(t, w, x, t, X) / p(x, t, X) ] dw
                   = ∫ p(t | x, w) p(w | t, X) dw

using the conditional-independence structure: t depends on w and the new input
x, while w depends only on the training data (t, X).
Predictive Distribution: ML vs. Bayes

Predictive Distribution
Summary: ML, MAP and Bayesian

Maximum likelihood (ML):

    w_ML = argmax_w Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(β/2)(t_n - w^T φ(x_n))² )
    p(t | x, w_ML, β_ML) = N(t | w_ML^T φ(x), β_ML^{-1})

Maximum a posteriori (MAP):

    w_MAP = argmax_w [ Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(β/2)(t_n - w^T φ(x_n))² ) ]
            × (1/((2π)^{M/2} |α^{-1} I|^{1/2})) exp( -(1/2) w^T (α I) w )
    p(t | x, w_MAP, α, β) = N(t | w_MAP^T φ(x), β_ML^{-1})

Bayesian:

    p(w | t, X, α, β) ∝ [ Π_{n=1}^{N} (1/√(2π β^{-1})) exp( -(β/2)(t_n - w^T φ(x_n))² ) ]
                        × (1/((2π)^{M/2} |α^{-1} I|^{1/2})) exp( -(1/2) w^T (α I) w )
                      = N(w | m_N, S_N)

    p(t | x, t, X, α, β) = ∫ p(t | x, w, β) p(w | t, X, α, β) dw
                         = N(t | m_N^T φ(x), β^{-1} + φ(x)^T S_N φ(x))
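The Bayesian predictive mean and variance can be sketched from m_N and S_N
(the setup values are illustrative assumptions):

```python
import numpy as np

# Predictive distribution: mean m_N^T phi(x), variance 1/beta + phi(x)^T S_N phi(x).
rng = np.random.default_rng(5)
alpha, beta = 2.0, 25.0
x = rng.uniform(0, 1, 20)
t = np.sin(2 * np.pi * x) + rng.standard_normal(20) / np.sqrt(beta)

Phi = np.vander(x, 6, increasing=True)
S_N = np.linalg.inv(alpha * np.eye(6) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predict(x_new):
    phi = np.vander(np.atleast_1d(x_new), 6, increasing=True)[0]
    mean = m_N @ phi
    var = 1.0 / beta + phi @ S_N @ phi   # noise floor plus parameter uncertainty
    return mean, var

m_in, v_in = predict(0.5)
print(m_in, v_in)
```

The predictive variance is never below the noise floor 1/β, and the
φ(x)^T S_N φ(x) term grows away from the training data.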
Evidence Approximation

Fully Bayesian Predictive Distribution

So far the predictive distribution assumed fixed and known α, β. A fully
Bayesian treatment also integrates over the hyperparameters:

    p(t | x, t, X) = ∫∫∫ p(t | x, w, β) p(w | t, X, α, β) p(α, β | t, X) dw dα dβ

Evidence Approximation

This integral over w, α, β is intractable, so approximate the hyperparameter
posterior p(α, β | t, X) by a point estimate at its mode: choose the α, β that
maximize the evidence p(t | X, α, β). As in

    p(w | D) = p(D | w) p(w) / p(D)

the evidence is the denominator of the corresponding Bayes' theorem for w.
Posterior Distribution for Hyperparameters

By Bayes' theorem for α and β:

    p(α, β | t, X) = p(t, X, α, β) / p(t, X)
                   = p(t | X, α, β) p(α) p(β) p(X) / p(t, X)
                   ∝ p(t | X, α, β) p(α) p(β)

where the evidence (marginal likelihood) integrates out w:

    p(t | X, α, β) = ∫ p(t | w, X, β) p(w | α) dw

If the prior p(α) p(β) is relatively flat, maximizing the posterior over
(α, β) amounts to maximizing the evidence p(t | X, α, β).
Evidence Approximation

    p(t | X, w, β) = Π_{n=1}^{N} p(t_n | x_n, w, β)
                   = (β/2π)^{N/2} exp( -(β/2) Σ_{n=1}^{N} (t_n - w^T φ(x_n))² )

    p(w | α) = (α/2π)^{M/2} exp( -(α/2) w^T w )   (α I is diagonal)

Using exp(a) exp(b) = exp(a + b) and collecting constants:

    p(t | X, α, β) = (β/2π)^{N/2} (α/2π)^{M/2} ∫ exp( -E(w) ) dw

where E(w) = (β/2) Σ_{n=1}^{N} (t_n - w^T φ(x_n))² + (α/2) w^T w.
Evidence Approximation

    p(t | X, α, β) = (β/2π)^{N/2} (α/2π)^{M/2} ∫ exp( -E(w) ) dw

After evaluating the Gaussian integral, the log marginal likelihood is

    ln p(t | X, α, β) = (M/2) ln α + (N/2) ln β - E(m_N) + (1/2) ln |S_N|
                        - (N/2) ln 2π
Log Marginal Likelihood: Derivation

Completing the square in w (from the derivation of the posterior
distribution):

    E(w) = (1/2)(w - m_N)^T A (w - m_N) + E(m_N)

with

    A = S_N^{-1} = α I + β Φ^T Φ
    m_N = β S_N Φ^T t
    E(m_N) = (β/2) t^T t - (1/2) m_N^T A m_N
Log Marginal Likelihood: Derivation

    p(t | X, α, β) = (β/2π)^{N/2} (α/2π)^{M/2} ∫ exp( -E(w) ) dw
                   = (β/2π)^{N/2} (α/2π)^{M/2} exp( -E(m_N) )
                     × ∫ exp( -(1/2)(w - m_N)^T A (w - m_N) ) dw

The remaining integral is the normalizer of a Gaussian with precision A:

    ∫ exp( -(1/2)(w - m_N)^T A (w - m_N) ) dw
      = (2π)^{M/2} |A|^{-1/2}
        × ∫ (1/((2π)^{M/2} |A|^{-1/2})) exp( -(1/2)(w - m_N)^T A (w - m_N) ) dw
      = (2π)^{M/2} |A|^{-1/2}

so

    p(t | X, α, β) = (β/2π)^{N/2} (α/2π)^{M/2} exp( -E(m_N) ) (2π)^{M/2} |A|^{-1/2}
Log Marginal Likelihood: Derivation

    ln p(t | X, α, β)
      = (N/2) ln(β/2π) + (M/2) ln(α/2π) - E(m_N) + (M/2) ln 2π - (1/2) ln |A|
      = (M/2) ln α + (N/2) ln β - E(m_N) - (1/2) ln |A| - (N/2) ln 2π
      = (M/2) ln α + (N/2) ln β - E(m_N) + (1/2) ln |S_N| - (N/2) ln 2π

using S_N = A^{-1}, so ln |S_N| = -ln |A|.
Log Marginal Likelihood: Derivation

Substituting E(m_N) = (β/2) t^T t - (1/2) m_N^T A m_N:

    ln p(t | X, α, β) = (M/2) ln α + (N/2) ln β - (β/2) t^T t
                        + (1/2) m_N^T A m_N - (1/2) ln |A| - (N/2) ln 2π
Maximizing the Evidence

Maximize ln p(t | X, α, β) with respect to α and β. The stationary conditions
give

    α = γ / (m_N^T m_N),   γ = Σ_{i=1}^{M} λ_i / (α + λ_i)

    1/β = (1/N) Σ_{n=1}^{N} (t_n - m_N^T φ(x_n))²

where the λ_i are the eigenvalues of β Φ^T Φ. The derivations follow.
Maximizing the Evidence: Derivation

Derivative of the α terms:

    d/dα [ (M/2) ln α ] = M/(2α)

For the ln |A| term, with A = S_N^{-1} = α I + β Φ^T Φ, let λ_i be the
eigenvalues of β Φ^T Φ (Φ need not be a square matrix; β Φ^T Φ is M × M), so
|A| = Π_{i=1}^{M} (α + λ_i) and

    d/dα [ (1/2) ln |A| ] = (1/2) Σ_{i=1}^{M} 1/(α + λ_i)
Maximizing the Evidence: Derivation

Derivative of the quadratic term with respect to α, using m_N = β A^{-1} Φ^T t
and dA^{-1}/dα = -A^{-1} (dA/dα) A^{-1} = -A^{-1} A^{-1} (since dA/dα = I, and
A is symmetric so (A^{-1})^T = A^{-1}):

    (1/2) m_N^T A m_N = (1/2)(β A^{-1} Φ^T t)^T A (β A^{-1} Φ^T t)
                      = (β²/2) t^T Φ A^{-1} Φ^T t

    d/dα [ (1/2) m_N^T A m_N ] = -(β²/2) t^T Φ A^{-1} A^{-1} Φ^T t
                               = -(1/2) m_N^T m_N
Maximizing the Evidence: Derivation

Setting the derivative with respect to α to zero:

    d/dα ln p(t | X, α, β)
      = M/(2α) - (1/2) m_N^T m_N - (1/2) Σ_{i=1}^{M} 1/(α + λ_i) = 0

    ⇒ α m_N^T m_N = M - Σ_{i=1}^{M} α/(α + λ_i)
                  = Σ_{i=1}^{M} [ 1 - α/(α + λ_i) ]
                  = Σ_{i=1}^{M} λ_i/(α + λ_i) = γ

    ⇒ α = γ / (m_N^T m_N)
Maximizing the Evidence: Derivation

    α = γ / (m_N^T m_N),   γ = Σ_{i=1}^{M} λ_i/(α + λ_i)

Note that this gives only an implicit solution for α, as both γ and m_N depend
on α. Iterative procedure for finding the optimal α:
  Start with an initial choice for α.
  Use α to compute m_N = β S_N Φ^T t and γ.
  Use m_N and γ to re-estimate α.
  Repeat until convergence.
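The iterative procedure can be sketched as follows (the data, basis, and
initializations are illustrative assumptions; the β update uses the slides'
1/N mean-squared-residual form):

```python
import numpy as np

# Evidence approximation re-estimation loop:
#   gamma = sum_i lambda_i / (alpha + lambda_i)
#   alpha <- gamma / (m_N^T m_N)
#   1/beta <- mean squared residual
rng = np.random.default_rng(6)
N = 50
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)
Phi = np.vander(x, 8, increasing=True)

alpha, beta = 1.0, 1.0                    # initial choices
for _ in range(100):
    A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)
    lam = np.linalg.eigvalsh(beta * Phi.T @ Phi)  # eigenvalues of beta*Phi^T Phi
    gamma = np.sum(lam / (alpha + lam))
    alpha = gamma / (m_N @ m_N)
    beta = N / np.sum((t - Phi @ m_N) ** 2)

print(alpha, beta)                         # converged hyperparameters
```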
Maximizing the Evidence: Derivation

Now maximize with respect to β, starting from

    ln p(t | X, α, β) = (M/2) ln α + (N/2) ln β - (β/2) t^T t
                        + (1/2) m_N^T A m_N - (1/2) ln |A| - (N/2) ln 2π

with A = S_N^{-1} = α I + β Φ^T Φ and m_N = β S_N Φ^T t. For the ln |A| term,
the eigenvalues λ_i defined by (β Φ^T Φ) u_i = λ_i u_i are proportional to β
(again, Φ need not be square; β Φ^T Φ is M × M), so dλ_i/dβ = λ_i/β and

    d/dβ ln |A| = d/dβ Σ_{i=1}^{M} ln(α + λ_i)
                = (1/β) Σ_{i=1}^{M} λ_i/(α + λ_i) = γ/β
Maximizing the Evidence: Derivation

For the data-dependent terms, using

    E(m_N) = (β/2) t^T t - (1/2) m_N^T A m_N
           = (β/2) Σ_{n=1}^{N} (t_n - m_N^T φ(x_n))² + (α/2) m_N^T m_N

and setting the derivative with respect to β to zero (holding m_N fixed at the
stationary point):

    d/dβ ln p(t | X, α, β)
      = N/(2β) - (1/2) Σ_{n=1}^{N} (t_n - m_N^T φ(x_n))² - γ/(2β) = 0

    ⇒ 1/β = (1/(N - γ)) Σ_{n=1}^{N} (t_n - m_N^T φ(x_n))²
          ≈ (1/N) Σ_{n=1}^{N} (t_n - m_N^T φ(x_n))²   for N >> γ
End