
Page 1: Gaussian Linear Models

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Gaussian Linear Models

Prof. Nicholas Zabaras Materials Process Design and Control Laboratory

Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall

Cornell University Ithaca, NY 14853-3801

Email: [email protected]

URL: http://mpdc.mae.cornell.edu/

January 24, 2014


Page 2: Contents

Information (canonical) form of Gaussians

Gaussian Linear Models, Covariance of the Joint Distribution, Mean of the Joint Distribution

Marginal Distribution p(y), Conditional Distribution p(x|y)

Example of Gaussian Linear Systems: Estimating the Mean of a Gaussian with a Gaussian Prior

Estimating an Unknown Vector from Noisy Data, Illustration of Bayesian inference for the mean of a 2d Gaussian

Sensor Fusion, Interpolating Noisy Data


Chris Bishop, Pattern Recognition and Machine Learning, Chapter 2.
Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 4.

Page 3: Information Form of Gaussians

The parametrization of the Gaussian we have seen up to now is the moment parametrization (i.e. in terms of the mean and covariance).

We introduce here the information (canonical) parametrization, defined using the canonical parameters
$$\Lambda = \Sigma^{-1}, \qquad \boldsymbol{\xi} = \Sigma^{-1}\boldsymbol{\mu}.$$

The canonical parametrization of the Gaussian is then
$$\mathcal{N}_c(\mathbf{x}\,|\,\boldsymbol{\xi}, \Lambda) = (2\pi)^{-D/2}\,|\Lambda|^{1/2}\exp\!\left[-\tfrac{1}{2}\left(\mathbf{x}^T\Lambda\mathbf{x} + \boldsymbol{\xi}^T\Lambda^{-1}\boldsymbol{\xi} - 2\mathbf{x}^T\boldsymbol{\xi}\right)\right].$$

We can write the marginal and conditional distributions discussed earlier in this canonical form:
$$p(\mathbf{x}_a\,|\,\mathbf{x}_b) = \mathcal{N}_c\!\left(\mathbf{x}_a\,|\,\boldsymbol{\xi}_a - \Lambda_{ab}\mathbf{x}_b,\ \Lambda_{aa}\right), \qquad
p(\mathbf{x}_b) = \mathcal{N}_c\!\left(\mathbf{x}_b\,|\,\boldsymbol{\xi}_b - \Lambda_{ba}\Lambda_{aa}^{-1}\boldsymbol{\xi}_a,\ \Lambda_{bb} - \Lambda_{ba}\Lambda_{aa}^{-1}\Lambda_{ab}\right).$$
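To make the two parametrizations concrete, here is a minimal NumPy sketch (my own illustration, not from the slides or PMTK) that converts moment parameters to canonical ones and checks the information-form conditional against the familiar moment-form conditioning formulas on a random partitioned Gaussian:

```python
# Moment <-> canonical conversion and the information-form conditional p(x_a | x_b).
import numpy as np

rng = np.random.default_rng(0)

# Build a random valid covariance and mean for a 4-dimensional Gaussian (moment form).
B = rng.standard_normal((4, 4))
Sigma = B @ B.T + 4 * np.eye(4)
mu = rng.standard_normal(4)

# Canonical (information) parameters: Lambda = Sigma^{-1}, xi = Sigma^{-1} mu.
Lam = np.linalg.inv(Sigma)
xi = Lam @ mu

# Partition into blocks a = first 2 components, b = last 2 components.
a, b = slice(0, 2), slice(2, 4)
xb = rng.standard_normal(2)              # an arbitrary conditioning value

# Information-form conditional: xi_{a|b} = xi_a - Lambda_ab x_b, Lambda_{a|b} = Lambda_aa.
xi_cond = xi[a] - Lam[a, b] @ xb
Lam_cond = Lam[a, a]

# Moment-form conditional (standard result) for comparison:
# mu_{a|b} = mu_a + Sigma_ab Sigma_bb^{-1} (x_b - mu_b),
# Sigma_{a|b} = Sigma_aa - Sigma_ab Sigma_bb^{-1} Sigma_ba.
mu_cond = mu[a] + Sigma[a, b] @ np.linalg.solve(Sigma[b, b], xb - mu[b])
Sig_cond = Sigma[a, a] - Sigma[a, b] @ np.linalg.solve(Sigma[b, b], Sigma[b, a])

# The two parametrizations must agree.
assert np.allclose(np.linalg.inv(Lam_cond), Sig_cond)
assert np.allclose(np.linalg.solve(Lam_cond, xi_cond), mu_cond)
```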

Page 4: Information/Canonical Parametrization

A useful application of this parametrization is the multiplication of Gaussians. You can show that (up to normalization)
$$\mathcal{N}_c(\boldsymbol{\xi}_f, \lambda_f)\,\mathcal{N}_c(\boldsymbol{\xi}_g, \lambda_g) \propto \mathcal{N}_c\!\left(\boldsymbol{\xi}_f + \boldsymbol{\xi}_g,\ \lambda_f + \lambda_g\right).$$

Compare with the corresponding, more complicated expression in terms of moments:
$$\mathcal{N}(\mu_f, \sigma_f^2)\,\mathcal{N}(\mu_g, \sigma_g^2) \propto \mathcal{N}\!\left(\frac{\mu_f\sigma_g^2 + \mu_g\sigma_f^2}{\sigma_f^2 + \sigma_g^2},\ \frac{\sigma_f^2\sigma_g^2}{\sigma_f^2 + \sigma_g^2}\right).$$
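As a quick sanity check (an illustrative sketch, not from the slides), the two expressions above can be compared numerically for a pair of 1-d Gaussians; in canonical form the product is just an addition of parameters:

```python
# Multiplying two 1-D Gaussians by adding canonical parameters, checked against the moment form.
import numpy as np

def to_canonical(mu, var):
    lam = 1.0 / var          # precision
    return lam * mu, lam     # (xi, lambda)

mu_f, var_f = 1.0, 2.0
mu_g, var_g = -0.5, 0.5

xi_f, lam_f = to_canonical(mu_f, var_f)
xi_g, lam_g = to_canonical(mu_g, var_g)

# Canonical form: just add the canonical parameters.
xi_prod, lam_prod = xi_f + xi_g, lam_f + lam_g
mu_prod, var_prod = xi_prod / lam_prod, 1.0 / lam_prod

# Moment form of the same result.
mu_check = (mu_f * var_g + mu_g * var_f) / (var_f + var_g)
var_check = var_f * var_g / (var_f + var_g)

assert np.isclose(mu_prod, mu_check) and np.isclose(var_prod, var_check)
print(mu_prod, var_prod)
```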

Page 5: Bayes' Theorem and Gaussian Linear Models

Consider a linear Gaussian model: a Gaussian marginal distribution p(x) and a Gaussian conditional distribution p(y|x), in which p(y|x) has a mean that is a linear function of x and a covariance that is independent of x:
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu},\ \Lambda^{-1}), \qquad p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x} + \mathbf{b},\ \mathbf{L}^{-1}).$$

Using Bayes' rule, we want to find p(y) and p(x|y).

We start with the joint distribution over z = (x, y); its logarithm is quadratic in the components of z, so p(z) is a Gaussian.

Page 6: Bayes' Theorem and Gaussian Linear Models

Restating the model,
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu},\ \Lambda^{-1}), \qquad p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x} + \mathbf{b},\ \mathbf{L}^{-1}),$$
the log of the joint distribution over $\mathbf{z} = (\mathbf{x}, \mathbf{y})$ is
$$\ln p(\mathbf{z}) = \ln p(\mathbf{x}) + \ln p(\mathbf{y}\,|\,\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\Lambda(\mathbf{x} - \boldsymbol{\mu}) - \tfrac{1}{2}(\mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{b})^T\mathbf{L}(\mathbf{y} - \mathbf{A}\mathbf{x} - \mathbf{b}) + \text{const}.$$

Page 7: Covariance of the Joint Distribution

We can immediately write down the covariance of z.

Collecting the quadratic terms in $\ln p(\mathbf{z})$,
$$-\tfrac{1}{2}\mathbf{x}^T(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A})\mathbf{x} - \tfrac{1}{2}\mathbf{y}^T\mathbf{L}\mathbf{y} + \tfrac{1}{2}\mathbf{y}^T\mathbf{L}\mathbf{A}\mathbf{x} + \tfrac{1}{2}\mathbf{x}^T\mathbf{A}^T\mathbf{L}\mathbf{y}
= -\tfrac{1}{2}\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix}^T\begin{pmatrix}\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L}\\ -\mathbf{L}\mathbf{A} & \mathbf{L}\end{pmatrix}\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix},$$
so the block matrix above is the precision of $\mathbf{z}$ and
$$\operatorname{cov}[\mathbf{z}] = \begin{pmatrix}\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L}\\ -\mathbf{L}\mathbf{A} & \mathbf{L}\end{pmatrix}^{-1} = \begin{pmatrix}\Lambda^{-1} & \Lambda^{-1}\mathbf{A}^T\\ \mathbf{A}\Lambda^{-1} & \mathbf{L}^{-1} + \mathbf{A}\Lambda^{-1}\mathbf{A}^T\end{pmatrix}.$$

Here we used the earlier result on block matrix inversion:
$$\begin{pmatrix}\mathbf{A} & \mathbf{B}\\ \mathbf{C} & \mathbf{D}\end{pmatrix}^{-1} = \begin{pmatrix}\mathbf{M}^{-1} & -\mathbf{M}^{-1}\mathbf{B}\mathbf{D}^{-1}\\ -\mathbf{D}^{-1}\mathbf{C}\mathbf{M}^{-1} & \mathbf{D}^{-1} + \mathbf{D}^{-1}\mathbf{C}\mathbf{M}^{-1}\mathbf{B}\mathbf{D}^{-1}\end{pmatrix}, \qquad \mathbf{M} = \mathbf{A} - \mathbf{B}\mathbf{D}^{-1}\mathbf{C}.$$
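The block-inversion identity is easy to verify numerically; the following short NumPy check (my own addition, with arbitrary well-conditioned test blocks) confirms it:

```python
# Numeric sanity check of the 2x2-block inversion formula on random matrices.
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # generic invertible blocks
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
D = rng.standard_normal((n, n)) + 5 * np.eye(n)

M = A - B @ np.linalg.solve(D, C)                 # M = A - B D^{-1} C
Minv, Dinv = np.linalg.inv(M), np.linalg.inv(D)

top = np.hstack([Minv, -Minv @ B @ Dinv])
bot = np.hstack([-Dinv @ C @ Minv, Dinv + Dinv @ C @ Minv @ B @ Dinv])
block_inverse = np.vstack([top, bot])

full = np.block([[A, B], [C, D]])
assert np.allclose(block_inverse, np.linalg.inv(full))
```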

Page 8: Mean of the Joint Distribution

We can immediately write down the mean of z:

Collecting the linear terms in $\ln p(\mathbf{z})$,
$$\mathbf{x}^T\Lambda\boldsymbol{\mu} - \mathbf{x}^T\mathbf{A}^T\mathbf{L}\mathbf{b} + \mathbf{y}^T\mathbf{L}\mathbf{b} = \begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix}^T\begin{pmatrix}\Lambda\boldsymbol{\mu} - \mathbf{A}^T\mathbf{L}\mathbf{b}\\ \mathbf{L}\mathbf{b}\end{pmatrix},$$
and using the general result $\mathbb{E}[\mathbf{z}] = \operatorname{cov}[\mathbf{z}]\begin{pmatrix}\Lambda\boldsymbol{\mu} - \mathbf{A}^T\mathbf{L}\mathbf{b}\\ \mathbf{L}\mathbf{b}\end{pmatrix}$, we obtain
$$\mathbb{E}[\mathbf{z}] = \begin{pmatrix}\Lambda^{-1} & \Lambda^{-1}\mathbf{A}^T\\ \mathbf{A}\Lambda^{-1} & \mathbf{L}^{-1} + \mathbf{A}\Lambda^{-1}\mathbf{A}^T\end{pmatrix}\begin{pmatrix}\Lambda\boldsymbol{\mu} - \mathbf{A}^T\mathbf{L}\mathbf{b}\\ \mathbf{L}\mathbf{b}\end{pmatrix} = \begin{pmatrix}\boldsymbol{\mu}\\ \mathbf{A}\boldsymbol{\mu} + \mathbf{b}\end{pmatrix}.$$

It remains to find the marginal p(y); we can use the results derived earlier.
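The two closed forms above can be checked numerically. This NumPy sketch (my own illustration, with random test matrices) builds the precision of z from the quadratic terms and verifies that its inverse and the mean identity match the stated expressions:

```python
# Build E[z] and cov[z] for a small linear Gaussian model and check the closed forms.
import numpy as np

rng = np.random.default_rng(2)
Dx, Dy = 3, 2
mu = rng.standard_normal(Dx)
Lam = np.linalg.inv(np.cov(rng.standard_normal((Dx, 50))))   # a valid prior precision
A = rng.standard_normal((Dy, Dx))
b = rng.standard_normal(Dy)
L = np.linalg.inv(np.cov(rng.standard_normal((Dy, 50))))     # a valid noise precision

# Precision of the joint z = (x, y), read off from the quadratic terms.
prec_z = np.block([[Lam + A.T @ L @ A, -A.T @ L],
                   [-L @ A,            L       ]])

# Claimed covariance and mean of z.
Lam_inv = np.linalg.inv(Lam)
cov_z = np.block([[Lam_inv,     Lam_inv @ A.T],
                  [A @ Lam_inv, np.linalg.inv(L) + A @ Lam_inv @ A.T]])
mean_z = np.concatenate([mu, A @ mu + b])

# Checks: cov_z is the inverse of prec_z, and cov_z times the linear-term
# coefficient vector reproduces the mean.
assert np.allclose(cov_z, np.linalg.inv(prec_z))
eta = np.concatenate([Lam @ mu - A.T @ L @ b, L @ b])
assert np.allclose(cov_z @ eta, mean_z)
```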

Page 9: Marginal Distribution p(y)

Recall our earlier result for computing the marginal:
$$p(\mathbf{x}_a) = \mathcal{N}(\mathbf{x}_a\,|\,\boldsymbol{\mu}_a,\ \Sigma_{aa}).$$

Based on our calculations,
$$\mathbb{E}[\mathbf{z}] = \begin{pmatrix}\boldsymbol{\mu}\\ \mathbf{A}\boldsymbol{\mu} + \mathbf{b}\end{pmatrix}, \qquad \operatorname{cov}[\mathbf{z}] = \begin{pmatrix}\Lambda^{-1} & \Lambda^{-1}\mathbf{A}^T\\ \mathbf{A}\Lambda^{-1} & \mathbf{L}^{-1} + \mathbf{A}\Lambda^{-1}\mathbf{A}^T\end{pmatrix},$$
we conclude:
$$\mathbb{E}[\mathbf{y}] = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \qquad \operatorname{cov}[\mathbf{y}] = \mathbf{L}^{-1} + \mathbf{A}\Lambda^{-1}\mathbf{A}^T,$$
$$p(\mathbf{y}) = \mathcal{N}\!\left(\mathbf{y}\,|\,\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{L}^{-1} + \mathbf{A}\Lambda^{-1}\mathbf{A}^T\right).$$

Note that for $\mathbf{A} = \mathbf{I}$, the convolution of the two Gaussians gives the well-known result:
$$\mathbb{E}[\mathbf{y}] = \boldsymbol{\mu} + \mathbf{b}, \qquad \operatorname{cov}[\mathbf{y}] = \mathbf{L}^{-1} + \Lambda^{-1}.$$

Page 10: Conditional Distribution p(x|y)

Recall our earlier result for computing the conditional:
$$p(\mathbf{x}_a\,|\,\mathbf{x}_b) = \mathcal{N}\!\left(\mathbf{x}_a\,|\,\boldsymbol{\mu}_{a|b},\ \Lambda_{aa}^{-1}\right), \qquad \boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a - \Lambda_{aa}^{-1}\Lambda_{ab}(\mathbf{x}_b - \boldsymbol{\mu}_b).$$

Based on our calculations,
$$\mathbb{E}[\mathbf{z}] = \begin{pmatrix}\boldsymbol{\mu}\\ \mathbf{A}\boldsymbol{\mu} + \mathbf{b}\end{pmatrix}, \qquad \operatorname{cov}[\mathbf{z}]^{-1} = \begin{pmatrix}\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A} & -\mathbf{A}^T\mathbf{L}\\ -\mathbf{L}\mathbf{A} & \mathbf{L}\end{pmatrix},$$
we conclude
$$\operatorname{cov}[\mathbf{x}\,|\,\mathbf{y}] = \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1},$$
$$\mathbb{E}[\mathbf{x}\,|\,\mathbf{y}] = \boldsymbol{\mu} + \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{L}\,(\mathbf{y} - \mathbf{A}\boldsymbol{\mu} - \mathbf{b}) = \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\left[\mathbf{A}^T\mathbf{L}(\mathbf{y} - \mathbf{b}) + \Lambda\boldsymbol{\mu}\right],$$
so that
$$p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\!\left(\mathbf{x}\,\big|\,\left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\left[\mathbf{A}^T\mathbf{L}(\mathbf{y} - \mathbf{b}) + \Lambda\boldsymbol{\mu}\right],\ \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\right).$$
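Putting the marginal and conditional results together, here is a small NumPy sketch (my own construction, not the PMTK code) that computes p(y) and p(x|y) from the closed forms and cross-checks the conditional against directly conditioning the joint Gaussian z = (x, y):

```python
# Marginal p(y) and conditional p(x|y) for the linear Gaussian system.
import numpy as np

def linear_gaussian_posterior(mu, Lam, A, b, L, y):
    """Return (mean, cov) of p(x|y) and (mean, cov) of p(y)."""
    Lam_inv = np.linalg.inv(Lam)
    # Marginal p(y) = N(A mu + b, L^{-1} + A Lam^{-1} A^T).
    y_mean = A @ mu + b
    y_cov = np.linalg.inv(L) + A @ Lam_inv @ A.T
    # Conditional p(x|y) = N(S [A^T L (y - b) + Lam mu], S), with S = (Lam + A^T L A)^{-1}.
    S = np.linalg.inv(Lam + A.T @ L @ A)
    x_mean = S @ (A.T @ L @ (y - b) + Lam @ mu)
    return (x_mean, S), (y_mean, y_cov)

rng = np.random.default_rng(3)
Dx, Dy = 3, 2
mu = rng.standard_normal(Dx)
Lam = np.linalg.inv(np.cov(rng.standard_normal((Dx, 50))))
A, b = rng.standard_normal((Dy, Dx)), rng.standard_normal(Dy)
L = np.linalg.inv(np.cov(rng.standard_normal((Dy, 50))))
y = rng.standard_normal(Dy)

(x_mean, x_cov), (y_mean, y_cov) = linear_gaussian_posterior(mu, Lam, A, b, L, y)

# Cross-check: condition the joint Gaussian z = (x, y) on y using the standard
# Gaussian conditioning formulas and the cov[z] blocks derived above.
Lam_inv = np.linalg.inv(Lam)
Sxy = Lam_inv @ A.T                      # cov(x, y) block of cov[z]
x_mean2 = mu + Sxy @ np.linalg.solve(y_cov, y - y_mean)
x_cov2 = Lam_inv - Sxy @ np.linalg.solve(y_cov, Sxy.T)
assert np.allclose(x_mean, x_mean2) and np.allclose(x_cov, x_cov2)
```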

Page 11: Example of Linear Gaussian Systems: Inferring the Mean

We revisit the Bayesian inference problem for the mean of a Gaussian. Consider
$$\mathbf{y} = \{y_1, y_2, \dots, y_N\} \sim \mathcal{N}(y\,|\,x,\ \sigma^2 = \lambda_y^{-1}), \qquad \text{with prior } x \sim \mathcal{N}(x\,|\,\mu_0,\ \sigma_0^2 = \lambda_0^{-1}).$$

To put this in the form of our linear Gaussian system
$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\boldsymbol{\mu},\ \Lambda^{-1}), \qquad p(\mathbf{y}\,|\,\mathbf{x}) = \mathcal{N}(\mathbf{y}\,|\,\mathbf{A}\mathbf{x} + \mathbf{b},\ \mathbf{L}^{-1}),$$
let
$$\mathbf{A} = \mathbf{1}_N \ (\text{a column vector of ones}), \qquad \mathbf{b} = \mathbf{0}, \qquad \Sigma_y = \mathbf{L}^{-1} = \lambda_y^{-1}\mathbf{I}_N.$$

Then, from the conditional result given earlier,
$$p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\!\left(\mathbf{x}\,\big|\,\left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\left[\mathbf{A}^T\mathbf{L}(\mathbf{y} - \mathbf{b}) + \Lambda\boldsymbol{\mu}\right],\ \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\right),$$
we obtain
$$p(x\,|\,\mathbf{y}) = \mathcal{N}\!\left(x\,\big|\,\left(\lambda_0 + \lambda_y\mathbf{1}^T\mathbf{1}\right)^{-1}\left(\lambda_y\mathbf{1}^T(\mathbf{y} - \mathbf{0}) + \lambda_0\mu_0\right),\ \left(\lambda_0 + \lambda_y\mathbf{1}^T\mathbf{1}\right)^{-1}\right).$$

This can be simplified as
$$p(x\,|\,\mathbf{y}) = \mathcal{N}\!\left(x\,\Big|\,\frac{N\lambda_y\bar{y} + \lambda_0\mu_0}{N\lambda_y + \lambda_0},\ \frac{1}{N\lambda_y + \lambda_0}\right), \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^N y_i.$$

The posterior precision is the prior precision plus N measurement precisions; the posterior mean is the precision-weighted average of the MLE $\bar{y}$ and the prior mean.
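A minimal numerical illustration of this result (assumed toy numbers, not from the slides): the posterior precision adds, and the posterior mean is a precision-weighted average of the sample mean and the prior mean:

```python
# Posterior over the mean of a Gaussian with a Gaussian prior.
import numpy as np

rng = np.random.default_rng(4)
mu0, lam0 = 0.0, 1.0          # prior mean and precision (sigma_0^2 = 1)
lam_y = 1.0 / 0.5**2          # measurement precision (sigma = 0.5)
x_true = 1.3
y = rng.normal(x_true, 1.0 / np.sqrt(lam_y), size=20)   # N = 20 observations

N, ybar = len(y), y.mean()
lam_N = lam0 + N * lam_y                        # posterior precision
mu_N = (N * lam_y * ybar + lam0 * mu0) / lam_N  # posterior mean

print(f"posterior mean {mu_N:.3f}, posterior std {lam_N**-0.5:.3f}, MLE {ybar:.3f}")
```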

Page 12: Example of Linear Gaussian Systems: Inferring the Mean

[Figure: prior, likelihood, and posterior densities; left panel: prior variance = 1.00, right panel: prior variance = 5.00.]

Inference about x given a noisy observation y = 3. (a) Strong prior N(0, 1): the posterior mean is “shrunk” towards the prior mean, which is 0. (b) Weak prior N(0, 5): the posterior mean is similar to the MLE.

gaussInferParamsMean1d from PMTK

Page 13: Example of Linear Gaussian Systems: Inferring the Mean

We can rewrite these results in terms of variances, recovering the results seen earlier in the lecture:
$$x\,|\,\mathbf{y} \sim \mathcal{N}(\mu_N, \sigma_N^2), \qquad \text{with}$$
$$\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} \ \Rightarrow\ \sigma_N^2 = \frac{\sigma^2\sigma_0^2}{N\sigma_0^2 + \sigma^2},$$
$$\mu_N = \sigma_N^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{n=1}^N y_n}{\sigma^2}\right) = \sigma_N^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{N\mu_{ML}}{\sigma^2}\right) = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}.$$

Page 14: Example of Linear Gaussian Systems: Inferring the Mean

The posterior precision is the sum of the precision of the prior plus one contribution of the data precision for each observed data point:
$$\sigma_N^2 = \frac{\sigma^2\sigma_0^2}{N\sigma_0^2 + \sigma^2}, \qquad \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{ML}.$$
For $N \to \infty$ the posterior peaks around $\mu_{ML}$ and the posterior variance goes to zero, i.e. the MLE is recovered within the Bayesian paradigm.

If we apply the data sequentially, the posterior mean after the collection of one data point ($N = 1$) can be written as
$$\mu_1 = y - \frac{\sigma^2}{\sigma^2 + \sigma_0^2}\,(y - \mu_0),$$
i.e. a shrinkage of the data $y$ towards the prior mean.

Shrinkage is also often measured with the signal-to-noise ratio
$$\mathrm{SNR} = \frac{\mathbb{E}[X^2]}{\mathbb{E}[\epsilon^2]} = \frac{\sigma_0^2 + \mu_0^2}{\sigma^2}, \qquad \text{for } y = x + \epsilon, \quad x \sim \mathcal{N}(\mu_0, \sigma_0^2)\ \text{(the observed signal)}, \quad \epsilon \sim \mathcal{N}(0, \sigma^2).$$

How about when $\sigma_0^2 \to \infty$ (a vague prior)? In this case note that
$$\sigma_N^2 \to \frac{\sigma^2}{N} \qquad \text{and} \qquad \mu_N \to \mu_{ML}.$$
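The sequential view can be checked directly: applying the N = 1 shrinkage update one observation at a time reproduces the batch posterior. A short NumPy sketch (my own illustration, with assumed toy values):

```python
# Sequential updating reproduces the batch posterior mean and variance.
import numpy as np

rng = np.random.default_rng(5)
mu0, var0 = 0.0, 2.0          # prior
var = 0.5                     # measurement variance sigma^2
y = rng.normal(1.0, np.sqrt(var), size=15)

# Sequential: after each data point, shrink it towards the current prior mean.
m, v = mu0, var0
for yi in y:
    m = yi - (var / (var + v)) * (yi - m)      # posterior mean (N = 1 update)
    v = var * v / (var + v)                    # posterior variance (precisions add)

# Batch formulas from the previous slides.
N, mu_ml = len(y), y.mean()
v_batch = var * var0 / (N * var0 + var)
m_batch = (var / (N * var0 + var)) * mu0 + (N * var0 / (N * var0 + var)) * mu_ml

assert np.isclose(m, m_batch) and np.isclose(v, v_batch)
```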

Page 15: Example of Linear Gaussian Systems: Inferring an Unknown Vector

Consider the following linear Gaussian system:
$$\mathbf{y} = \{\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_N\} \sim \mathcal{N}(\mathbf{x}, \Sigma_y), \qquad \text{with prior } \mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0).$$

Consider the effective observation $\bar{\mathbf{y}} \sim \mathcal{N}(\mathbf{x}, \tfrac{1}{N}\Sigma_y)$.*

From our earlier results with $\mathbf{A} = \mathbf{I}$ and $\mathbf{b} = \mathbf{0}$,
$$p(\mathbf{x}\,|\,\mathbf{y}) = \mathcal{N}\!\left(\mathbf{x}\,\big|\,\left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\left[\mathbf{A}^T\mathbf{L}(\mathbf{y} - \mathbf{b}) + \Lambda\boldsymbol{\mu}\right],\ \left(\Lambda + \mathbf{A}^T\mathbf{L}\mathbf{A}\right)^{-1}\right),$$
we have
$$p(\mathbf{x}\,|\,\mathbf{y}_1, \dots, \mathbf{y}_N) = \mathcal{N}\!\left(\mathbf{x}\,\big|\,\left(\Sigma_0^{-1} + N\Sigma_y^{-1}\right)^{-1}\left(N\Sigma_y^{-1}\bar{\mathbf{y}} + \Sigma_0^{-1}\boldsymbol{\mu}_0\right),\ \left(\Sigma_0^{-1} + N\Sigma_y^{-1}\right)^{-1}\right).$$

* Note that the effective observation $\bar{\mathbf{y}}$ comes with precision $N\Sigma_y^{-1}$, since $\bar{\mathbf{y}} \sim \mathcal{N}(\mathbf{x}, \tfrac{1}{N}\Sigma_y)$. One can see this by writing the likelihood of the N data as $p(\mathbf{y}\,|\,\mathbf{x}) = \prod_{i=1}^N \mathcal{N}(\mathbf{y}_i\,|\,\mathbf{x}, \Sigma_y)$, which, as a function of $\mathbf{x}$, can equivalently be written as $p(\bar{\mathbf{y}}\,|\,\mathbf{x}) = \mathcal{N}(\bar{\mathbf{y}}\,|\,\mathbf{x}, \tfrac{1}{N}\Sigma_y)$.
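A small NumPy sketch of this posterior (my own illustration; it reuses the settings of the 2d example on the next slide, i.e. x = [0.5, 0.5], Σy = 0.1[2, 1; 1, 1], prior N(0, 0.1 I), N = 10 observations):

```python
# Posterior over a 2-d mean given N vector observations with known noise covariance.
import numpy as np

rng = np.random.default_rng(6)
x_true = np.array([0.5, 0.5])
Sigma_y = 0.1 * np.array([[2.0, 1.0], [1.0, 1.0]])   # sensor noise covariance
mu0, Sigma0 = np.zeros(2), 0.1 * np.eye(2)           # prior

N = 10
Y = rng.multivariate_normal(x_true, Sigma_y, size=N)
ybar = Y.mean(axis=0)                                # effective observation

prec0, prec_y = np.linalg.inv(Sigma0), np.linalg.inv(Sigma_y)
post_cov = np.linalg.inv(prec0 + N * prec_y)
post_mean = post_cov @ (N * prec_y @ ybar + prec0 @ mu0)
print(post_mean, post_cov)
```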

Page 16: Bayesian Inference for the Mean of a 2d Gaussian

[Figure: the data, the prior, and the posterior for the mean of the 2d Gaussian; see the caption below.]

Illustration of Bayesian inference for the mean of a 2d Gaussian. (a) The data is generated from yi ∼ N(x,Σy), where x = [0.5, 0.5]T and Σy = 0.1[2, 1; 1, 1]. We assume the sensor noise covariance Σy is known but x is unknown. The black cross represents x. (b) The prior is p(x) = N(x|0, 0.1I2). (c) We show the posterior after 10 data points have been observed.

Think of this as identifying a missile location x from noisy measurements yi.

gaussInferParamsMean2d from PMTK

Page 17: Sensor Fusion

[Figure: three panels (a), (b), (c) showing the two observations and the posterior mean for the three sensor-noise settings below.]

We observe y1 = (0,−1) (red cross) and y2 = (1, 0) (green cross) and infer E(μ|y1, y2, θ) (black cross). (a) Equally reliable sensors, so the posterior mean estimate is in between the two circles. (b) Sensor 2 is more reliable, so the estimate shifts more towards the green circle. (c) Sensor 1 is more reliable in the vertical direction, Sensor 2 is more reliable in the horizontal direction. The estimate is an appropriate combination of the two measurements.

sensorFusion2d from PMTK

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\,|\,\mathbf{0},\ \Sigma_0 = 10^{10}\,\mathbf{I}_2), \qquad \mathbf{y}_1 \sim \mathcal{N}(\mathbf{x}, \Sigma_{y,1}), \qquad \mathbf{y}_2 \sim \mathcal{N}(\mathbf{x}, \Sigma_{y,2})$$
$$\text{(a)}\ \Sigma_{y,1} = \Sigma_{y,2} = 0.01\,\mathbf{I}_2; \qquad \text{(b)}\ \Sigma_{y,1} = 0.05\,\mathbf{I}_2,\ \Sigma_{y,2} = 0.01\,\mathbf{I}_2; \qquad \text{(c)}\ \Sigma_{y,1} = 0.01\begin{pmatrix}10 & 1\\ 1 & 1\end{pmatrix},\ \Sigma_{y,2} = 0.01\begin{pmatrix}1 & 1\\ 1 & 10\end{pmatrix}.$$
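The fused estimate is obtained by adding the sensor precisions; a short NumPy sketch (my own illustration, using the case (c) covariances above and the nearly flat prior):

```python
# Sensor fusion: combine two observations with different noise covariances.
import numpy as np

y1, y2 = np.array([0.0, -1.0]), np.array([1.0, 0.0])   # the two measurements
Sigma0 = 1e10 * np.eye(2)                               # nearly flat prior
Sigma1 = 0.01 * np.array([[10.0, 1.0], [1.0, 1.0]])     # sensor 1 noise (case c)
Sigma2 = 0.01 * np.array([[1.0, 1.0], [1.0, 10.0]])     # sensor 2 noise (case c)

prec0, prec1, prec2 = [np.linalg.inv(S) for S in (Sigma0, Sigma1, Sigma2)]
post_cov = np.linalg.inv(prec0 + prec1 + prec2)         # precisions add
post_mean = post_cov @ (prec1 @ y1 + prec2 @ y2)        # prior mean is zero
print(post_mean)
```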

Page 18: Interpolating Noisy Data

We revisit the interpolation example discussed earlier, but now with noisy data. We observe N noisy measurements $y_i$ of the underlying values $\mathbf{x} = (x_1, \dots, x_D)$:
$$\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \Sigma_y), \qquad \Sigma_y = \sigma^2\mathbf{I}.$$
Here A is an N×D matrix that picks the observed elements out of x.

The prior is as before:
$$p(\mathbf{x}) = \mathcal{N}\!\left(\mathbf{x}\,|\,\mathbf{0},\ (\lambda^2\mathbf{L}^T\mathbf{L})^{-1}\right), \qquad \Sigma_x^{-1} = \lambda^2\mathbf{L}^T\mathbf{L}.$$

Using the linear Gaussian system results, we can compute the needed posterior p(x|y) (see the example on the next slide).

The posterior mean can also be computed by solving the following regularized optimization problem (the second term penalizes rapid variability of the solution; it is a first-derivative Tikhonov regularizer):
$$\min_{\mathbf{x}}\ \frac{1}{2\sigma^2}\sum_{i=1}^N (x_i - y_i)^2 + \frac{\lambda}{2}\sum_{j=1}^D \left[(x_j - x_{j-1})^2 + (x_j - x_{j+1})^2\right], \qquad x_0 = x_1,\ x_{D+1} = x_D.$$

D. Calvetti and E. Somersalo, Introduction to Bayesian Scientific Computing, 2007
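A sketch of this computation in NumPy (my own construction following the setup above; the grid size, observation locations, underlying sine signal, and λ value are illustrative assumptions):

```python
# Interpolating noisy data with a first-difference smoothness prior:
# D grid points, N of them observed through the selection matrix A.
import numpy as np

rng = np.random.default_rng(7)
D, N = 100, 10
sigma, lam = 1.0, 30.0                        # noise std and prior strength

# A picks the N observed grid points out of x.
obs_idx = np.sort(rng.choice(D, size=N, replace=False))
A = np.zeros((N, D))
A[np.arange(N), obs_idx] = 1.0

# First-difference matrix L ((D-1) x D): (L x)_j = x_{j+1} - x_j.
Lmat = np.diff(np.eye(D), axis=0)
prior_prec = lam**2 * Lmat.T @ Lmat           # improper in the constant direction;
                                              # the posterior below is still proper.

y = rng.normal(np.sin(2 * np.pi * obs_idx / D), sigma)   # noisy observations

# Posterior from the linear Gaussian system (b = 0, L = I / sigma^2).
post_prec = prior_prec + A.T @ A / sigma**2
post_mean = np.linalg.solve(post_prec, A.T @ y / sigma**2)
post_var = np.diag(np.linalg.inv(post_prec))  # pointwise uncertainty
```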

Page 19: Interpolating Noisy Data

We now see that the prior precision λ affects the posterior mean as well as the posterior variance (in contrast to the case of no noise).

For a strong prior (large λ), the estimate is very smooth and the uncertainty is low; for a weak prior (small λ), the estimate is wiggly and the uncertainty (away from the data) is high.

[Figure: posterior mean and pointwise uncertainty bands over [0, 1] for two prior strengths; left panel: small λ, right panel: λ = 30.]

gaussInterpNoisyDemo, splineBasisDemo from PMTK