16. Least-Squares Estimation, MAE 546, 2018
TRANSCRIPT
Least-Squares Estimation
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2018
• Estimating unknown constants from redundant measurements
    – Least-squares
    – Weighted least-squares
• Recursive weighted least-squares estimator
Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
Perfect Measurement of a Constant Vector
• Given: measurements, y, of a constant vector, x
• Estimate x
• Assume that the output, y, is a perfect measurement and H is invertible

y = H x

y: (n × 1) output vector
H: (n × n) output matrix
x: (n × 1) vector to be estimated

• Estimate is based on the inverse transformation

\hat{x} = H^{-1} y
Imperfect Measurement of a Constant Vector
• Given: "noisy" measurements, z, of a constant vector, x
• Effects of error can be reduced if the measurement is redundant
• Noise-free output, y:

y = H x

• Measurement of output with error, z:

z = y + n = H x + n

z: (r × 1) measurement vector
n: (r × 1) error vector
y: (r × 1) output vector
H: (r × n) output matrix, r > n
x: (n × 1) vector to be estimated
Cost Function for Least-Squares Estimate
Squared measurement error = cost function, J (a quadratic norm)

What is the control parameter? \hat{x}, the estimate of x; dim(\hat{x}) = (n × 1)

Measurement-error residual:

\varepsilon = z - H\hat{x} = z - \hat{y};  dim(\varepsilon) = (r × 1)

J = \frac{1}{2}\varepsilon^T\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T\left(z - H\hat{x}\right)
  = \frac{1}{2}\left(z^T z - \hat{x}^T H^T z - z^T H\hat{x} + \hat{x}^T H^T H\hat{x}\right)
Static Minimization Provides the Least-Squares Estimate
Error cost function (scalar):

J = \frac{1}{2}\left(z^T z - \hat{x}^T H^T z - z^T H\hat{x} + \hat{x}^T H^T H\hat{x}\right)

Necessary condition for a minimum:

\frac{\partial J}{\partial \hat{x}} = 0 = \frac{1}{2}\left[0 - \left(H^T z\right)^T - z^T H + \left(H^T H\hat{x}\right)^T + \hat{x}^T H^T H\right]

\hat{x}^T H^T H = z^T H

Sufficient condition for a minimum: H^T H > 0

Static Minimization Provides the Least-Squares Estimate

\hat{x}^T \left(H^T H\right)\left(H^T H\right)^{-1} = \hat{x}^T = z^T H\left(H^T H\right)^{-1}  (row)

or

\hat{x} = \left(H^T H\right)^{-1} H^T z  (column)

The estimate employs the left pseudoinverse of H.
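As a concrete sketch of the result above, the left pseudoinverse can be formed directly with NumPy. The matrix H and the measurements below are invented for illustration, not taken from the lecture:

```python
import numpy as np

# Made-up redundant measurement setup: r = 5 measurements of an n = 2 vector
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])                  # (r x n) output matrix, r > n
x_true = np.array([3.0, -2.0])              # constant vector to be estimated
rng = np.random.default_rng(0)
z = H @ x_true + 0.1 * rng.standard_normal(5)   # noisy measurements

# Least-squares estimate via the left pseudoinverse: x_hat = (H^T H)^-1 H^T z
x_hat = np.linalg.inv(H.T @ H) @ H.T @ z

# np.linalg.lstsq solves the same problem with better numerical conditioning
x_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)
assert np.allclose(x_hat, x_lstsq)
print(x_hat)                                # close to x_true
```

In practice the explicit inverse is avoided (lstsq or a QR factorization is preferred), but the two answers coincide here.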
Example: Average Weight of a Pail of Jelly Beans
• Measurements are equally uncertain
• Express measurements as

z_i = x + n_i,  i = 1 to r
z = H x + n

• Output matrix:

H = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}

• Optimal estimate of x (scalar):

\hat{x} = \left(H^T H\right)^{-1} H^T z
Average Weight of the Jelly Beans
Optimal estimate:

\hat{x} = \left(\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix} = \left(r\right)^{-1}\left(z_1 + z_2 + \cdots + z_r\right)

\hat{x} = \frac{1}{r}\sum_{i=1}^{r} z_i  [sample mean value]

Simple average. Convexity:

H^T H = r > 0
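A quick check, with pail weights invented for the example, that the pseudoinverse formula reduces to a plain average:

```python
import numpy as np

z = np.array([10.2, 9.8, 10.5, 9.9, 10.1])    # invented pail weights
H = np.ones((z.size, 1))                      # output matrix of ones
x_hat = np.linalg.inv(H.T @ H) @ H.T @ z      # (H^T H)^-1 H^T z = (1/r) sum(z_i)
assert np.isclose(x_hat.item(), z.mean())
```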
Least-Squares Applications
• More generally, least-squares estimation is used for
    – Higher-degree curve-fitting
    – Multivariate estimation

[Figure: error "cost" function surface]
Least-Squares Linear Fit to Noisy Data
Find the trend line in noisy data:

y = a_0 + a_1 x
z = \left(a_0 + a_1 x\right) + n
z_i = \left(a_0 + a_1 x_i\right) + n_i,  i = 1, \ldots, r

Measurement vector:

\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix} = \begin{bmatrix} \left(a_0 + a_1 x_1\right) \\ \left(a_0 + a_1 x_2\right) \\ \vdots \\ \left(a_0 + a_1 x_r\right) \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_r \end{bmatrix}

z = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_r \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} + n \triangleq H a + n

Error cost function:

J = \frac{1}{2}\left(z - H\hat{a}\right)^T\left(z - H\hat{a}\right)

Least-squares coefficient estimate:

\hat{a} = \begin{bmatrix} \hat{a}_0 \\ \hat{a}_1 \end{bmatrix} = \left(H^T H\right)^{-1} H^T z

Least-squares output estimate:

\hat{y} = \hat{a}_0 + \hat{a}_1 x
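A minimal sketch of the trend-line fit on synthetic data; the true coefficients and noise level are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
a_true = np.array([2.0, 0.5])                 # [a0, a1], invented for the demo
x = np.linspace(0.0, 10.0, 20)
z = a_true[0] + a_true[1] * x + 0.3 * rng.standard_normal(x.size)

H = np.column_stack([np.ones_like(x), x])     # rows [1, x_i]
a_hat = np.linalg.inv(H.T @ H) @ H.T @ z      # least-squares coefficient estimate
y_hat = H @ a_hat                             # least-squares output estimate
print(a_hat)                                  # should be close to [2.0, 0.5]
```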
Measurements of Differing Quality
• Suppose some elements of the measurement, z, are more uncertain than others

z = H x + n

• Give the more uncertain measurements less weight in arriving at the minimum-cost estimate
• Let S = measure of uncertainty; then express the error cost in terms of S^{-1}

J = \frac{1}{2}\varepsilon^T S^{-1}\varepsilon

dim(z) = dim(n) = (r × 1); dim(H) = (r × n); dim(x) = (n × 1)
dim(\varepsilon) = (r × 1); dim(S) = (r × r)
Error Cost and Necessary Condition for a Minimum
Error cost function, J:

J = \frac{1}{2}\varepsilon^T S^{-1}\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T S^{-1}\left(z - H\hat{x}\right)
  = \frac{1}{2}\left(z^T S^{-1} z - \hat{x}^T H^T S^{-1} z - z^T S^{-1} H\hat{x} + \hat{x}^T H^T S^{-1} H\hat{x}\right)

Necessary condition for a minimum:

\frac{\partial J}{\partial \hat{x}} = 0 = \frac{1}{2}\left[0 - \left(H^T S^{-1} z\right)^T - z^T S^{-1} H + \left(H^T S^{-1} H\hat{x}\right)^T + \hat{x}^T H^T S^{-1} H\right]

Sufficient condition for a minimum: H^T S^{-1} H > 0
Weighted Least-Squares Estimate of a Constant Vector
Necessary condition for a minimum:

\left[\hat{x}^T H^T S^{-1} H - z^T S^{-1} H\right] = 0
\hat{x}^T H^T S^{-1} H = z^T S^{-1} H

The weighted left pseudoinverse provides the solution:

\hat{x} = \left(H^T S^{-1} H\right)^{-1} H^T S^{-1} z
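A hedged sketch of the weighted estimator. The diagonal S is invented here, chosen as the noise covariance (the S = S_A = R option discussed below), with larger entries marking the more uncertain measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
x_true = np.array([1.0, -1.0])
sigma = np.array([0.1, 0.1, 1.0, 1.0])        # last two measurements are noisier
z = H @ x_true + sigma * rng.standard_normal(4)

S_inv = np.diag(1.0 / sigma**2)               # inverse of the uncertainty measure S
# Weighted left pseudoinverse: x_hat = (H^T S^-1 H)^-1 H^T S^-1 z
x_hat = np.linalg.inv(H.T @ S_inv @ H) @ H.T @ S_inv @ z
print(x_hat)
```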
Return of the Jelly Beans
Error-weighting matrix:

S^{-1} \triangleq A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{rr} \end{bmatrix}

Optimal estimate of average jelly bean weight, \hat{x} = \left(H^T S^{-1} H\right)^{-1} H^T S^{-1} z:

\hat{x} = \left(\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} A \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} A \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix}

Weighted estimate of x (scalar):

\hat{x} = \frac{\sum_{i=1}^{r} a_{ii} z_i}{\sum_{i=1}^{r} a_{ii}}
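The scalar weighted-average formula can be checked in a few lines against the matrix form; the measurement values and weights below are invented (the first scale is simply assumed most trustworthy):

```python
import numpy as np

z = np.array([10.2, 9.8, 10.5])
a = np.array([4.0, 1.0, 1.0])       # diagonal of A = S^-1; first scale trusted most
x_weighted = np.sum(a * z) / np.sum(a)

H = np.ones((3, 1))
x_matrix = (np.linalg.inv(H.T @ np.diag(a) @ H) @ H.T @ np.diag(a) @ z).item()
assert np.isclose(x_weighted, x_matrix)
```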
Weighted Least-Squares ("Kriging") Estimates
(Wiener–Kolmogorov interpolation between measurement points)

[Figure: interpolated curve y(x) through measurement points]

• Curve, y(x), between measurement points, x_i, is the mean of a stationary process with covariance derived from the measurements, y(x_i), or another known source
• Curve, y(x), is a distance-weighted linear combination of all of the points

\hat{y}(x) = w^T(x)\, y = \begin{bmatrix} w_1(x) & w_2(x) & \cdots & w_k(x) \end{bmatrix}\begin{bmatrix} y(x_1) \\ y(x_2) \\ \vdots \\ y(x_k) \end{bmatrix}

w: minimum-least-squares weighting function
http://en.wikipedia.org/wiki/Kriging
How to Choose the Error Weighting Matrix
a) Normalize the cost function according to the expected measurement error, S_A:

J = \frac{1}{2}\varepsilon^T S_A^{-1}\varepsilon = \frac{1}{2}\left(z - y\right)^T S_A^{-1}\left(z - y\right) = \frac{1}{2}\left(z - H\hat{x}\right)^T S_A^{-1}\left(z - H\hat{x}\right)

b) Normalize the cost function according to the expected measurement residual, S_B:

J = \frac{1}{2}\varepsilon^T S_B^{-1}\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T S_B^{-1}\left(z - H\hat{x}\right)

dim(S_A) = dim(S_B) = (r × r)
Measurement Error Covariance, S_A
Expected value of the outer product of the measurement error vector:

S_A = E\left[\left(z - y\right)\left(z - y\right)^T\right] = E\left[\left(z - Hx\right)\left(z - Hx\right)^T\right] = E\left[n n^T\right] \triangleq R
Measurement Residual Covariance, S_B
Expected value of the outer product of the measurement residual vector,

\varepsilon = \left(z - H\hat{x}\right) = H\left(x - \hat{x}\right) + n

S_B = E\left[\varepsilon\varepsilon^T\right] = E\left[\left(z - H\hat{x}\right)\left(z - H\hat{x}\right)^T\right] = E\left[\left(H\left(x - \hat{x}\right) + n\right)\left(H\left(x - \hat{x}\right) + n\right)^T\right]

S_B = H E\left[\left(x - \hat{x}\right)\left(x - \hat{x}\right)^T\right] H^T + H E\left[\left(x - \hat{x}\right) n^T\right] + E\left[n\left(x - \hat{x}\right)^T\right] H^T + E\left[n n^T\right]
    \triangleq H P H^T + H M + M^T H^T + R

where

P = E\left[\left(x - \hat{x}\right)\left(x - \hat{x}\right)^T\right];  M = E\left[\left(x - \hat{x}\right) n^T\right];  R = E\left[n n^T\right]

Requires iteration ("adaptation") of the estimate to find S_B.
Recursive Least-Squares Estimation
• Prior unweighted and weighted least-squares estimators use a "batch-processing" approach
    – All information is gathered prior to processing
    – All information is processed at once
• Recursive approach
    – Optimal estimate has been made from a prior measurement set
    – New measurement set is obtained
    – Optimal estimate is improved by an incremental change (or correction) to the prior optimal estimate
Prior Optimal Estimate
Initial measurement set and state estimate, with S = S_A = R:

z_1 = H_1 x + n_1
\hat{x}_1 = \left(H_1^T R_1^{-1} H_1\right)^{-1} H_1^T R_1^{-1} z_1

dim(z_1) = dim(n_1) = (r_1 × 1); dim(H_1) = (r_1 × n); dim(R_1) = (r_1 × r_1)

The state estimate minimizes

J_1 = \frac{1}{2}\varepsilon_1^T R_1^{-1}\varepsilon_1 = \frac{1}{2}\left(z_1 - H_1\hat{x}_1\right)^T R_1^{-1}\left(z_1 - H_1\hat{x}_1\right)
New Measurement Set
New measurement:

z_2 = H_2 x + n_2

R_2: second measurement error covariance
dim(z_2) = dim(n_2) = (r_2 × 1); dim(H_2) = (r_2 × n); dim(R_2) = (r_2 × r_2)

Concatenation of old and new measurements:

z \triangleq \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
Cost of Estimation Based on Both Measurement Sets

J_2 = \frac{1}{2}\begin{bmatrix} \left(z_1 - H_1\hat{x}_2\right)^T & \left(z_2 - H_2\hat{x}_2\right)^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} \left(z_1 - H_1\hat{x}_2\right) \\ \left(z_2 - H_2\hat{x}_2\right) \end{bmatrix}
    = \frac{1}{2}\left[\left(z_1 - H_1\hat{x}_2\right)^T R_1^{-1}\left(z_1 - H_1\hat{x}_2\right) + \left(z_2 - H_2\hat{x}_2\right)^T R_2^{-1}\left(z_2 - H_2\hat{x}_2\right)\right]

• Cost function incorporates the estimate made after incorporating z_2
• Both residuals refer to \hat{x}_2
Optimal Estimate Based on Both Measurement Sets

\hat{x}_2 = \left\{\begin{bmatrix} H_1^T & H_2^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} H_1 \\ H_2 \end{bmatrix}\right\}^{-1}\left\{\begin{bmatrix} H_1^T & H_2^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}\right\}

Simplification occurs because the weighting matrix is block diagonal:

\hat{x}_2 = \left(H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2\right)^{-1}\left(H_1^T R_1^{-1} z_1 + H_2^T R_2^{-1} z_2\right)

The inverse can be put in more useful form using the Matrix Inversion Lemma.
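The block-diagonal simplification can be verified numerically. This sketch, with H_1, H_2, R_1, R_2, and data invented for the test, checks that the concatenated batch solution equals the information-sum form:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r1, r2 = 2, 4, 3
H1, H2 = rng.standard_normal((r1, n)), rng.standard_normal((r2, n))
R1, R2 = 0.1 * np.eye(r1), 0.2 * np.eye(r2)
x_true = np.array([1.0, 2.0])
z1 = H1 @ x_true + 0.1 * rng.standard_normal(r1)
z2 = H2 @ x_true + 0.2 * rng.standard_normal(r2)

# Batch solution with concatenated measurements and block-diagonal weighting
H = np.vstack([H1, H2])
R_inv = np.block([[np.linalg.inv(R1), np.zeros((r1, r2))],
                  [np.zeros((r2, r1)), np.linalg.inv(R2)]])
x_batch = np.linalg.inv(H.T @ R_inv @ H) @ H.T @ R_inv @ np.concatenate([z1, z2])

# Simplified information-sum form
info = H1.T @ np.linalg.inv(R1) @ H1 + H2.T @ np.linalg.inv(R2) @ H2
x_sum = np.linalg.inv(info) @ (H1.T @ np.linalg.inv(R1) @ z1
                               + H2.T @ np.linalg.inv(R2) @ z2)
assert np.allclose(x_batch, x_sum)
```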
Development of the Matrix Inversion Lemma
A and B are square, non-singular matrices:

A = \begin{bmatrix} A_1 \,(m × m) & A_2 \,(m × n) \\ A_3 \,(n × m) & A_4 \,(n × n) \end{bmatrix}

B \triangleq A^{-1}; then BA = AB = I

B = \begin{bmatrix} B_1 \,(m × m) & B_2 \,(m × n) \\ B_3 \,(n × m) & B_4 \,(n × n) \end{bmatrix} \triangleq \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}^{-1}
Development of the Matrix Inversion Lemma
The AB product is the identity matrix:

AB = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix} = \begin{bmatrix} \left(A_1 B_1 + A_2 B_3\right) & \left(A_1 B_2 + A_2 B_4\right) \\ \left(A_3 B_1 + A_4 B_3\right) & \left(A_3 B_2 + A_4 B_4\right) \end{bmatrix} = \begin{bmatrix} I_m & 0 \\ 0 & I_n \end{bmatrix}

The BA product is the identity matrix:

BA = \begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix}\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} = \begin{bmatrix} \left(B_1 A_1 + B_2 A_3\right) & \left(B_1 A_2 + B_2 A_4\right) \\ \left(B_3 A_1 + B_4 A_3\right) & \left(B_3 A_2 + B_4 A_4\right) \end{bmatrix} = \begin{bmatrix} I_m & 0 \\ 0 & I_n \end{bmatrix}
Submatrices of B
From the first column of AB:

B_3 = -A_4^{-1} A_3 B_1
A_1 B_1 - A_2 A_4^{-1} A_3 B_1 = I_m
\left(A_1 - A_2 A_4^{-1} A_3\right) B_1 = I_m
B_1 = \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1}
B_3 = -A_4^{-1} A_3 \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1}
Solutions for B
Similar solutions from the second column of AB:

B = \begin{bmatrix} \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & -A_1^{-1} A_2 \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \\ -A_4^{-1} A_3 \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \end{bmatrix}

Alternative solutions from BA:

B = \begin{bmatrix} \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & -\left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} A_2 A_4^{-1} \\ -\left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} A_3 A_1^{-1} & \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \end{bmatrix}
Matrix Inversion Lemma
Substitution and elimination in BA:

B_4 = \left[I_n - B_3 A_2\right] A_4^{-1}

For symmetric A, the blocks A_1, A_4, B_1, and B_4 are symmetric, and

A_3 = A_2^T,  B_3 = B_2^T

Matrix inversion lemma for symmetric A:

\left[A_4 - A_2^T A_1^{-1} A_2\right]^{-1} = A_4^{-1} - A_4^{-1} A_2^T \left[A_2 A_4^{-1} A_2^T - A_1\right]^{-1} A_2 A_4^{-1}
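A quick numerical check of the symmetric-case lemma, on random symmetric positive-definite blocks invented for the test:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 2
A1 = rng.standard_normal((m, m)); A1 = A1 @ A1.T + m * np.eye(m)   # SPD block
A4 = rng.standard_normal((n, n)); A4 = A4 @ A4.T + n * np.eye(n)   # SPD block
A2 = 0.5 * rng.standard_normal((m, n))

inv = np.linalg.inv
lhs = inv(A4 - A2.T @ inv(A1) @ A2)
rhs = inv(A4) - inv(A4) @ A2.T @ inv(A2 @ inv(A4) @ A2.T - A1) @ A2 @ inv(A4)
assert np.allclose(lhs, rhs)
```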
Apply the Matrix Inversion Lemma to the Optimal Estimate
First, define

P_1^{-1} \triangleq H_1^T R_1^{-1} H_1;  dim(P_1) = (n × n)

From the matrix inversion lemma,

\left(H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2\right)^{-1} = \left(P_1^{-1} + H_2^T R_2^{-1} H_2\right)^{-1}
  = P_1 - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1
Improved Estimate Incorporating the New Measurement Set
The new estimate is a correction to the old. With \hat{x}_1 = P_1 H_1^T R_1^{-1} z_1,

\hat{x}_2 = \hat{x}_1 - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 \hat{x}_1 + P_1 H_2^T \left[I_{r_2} - \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1 H_2^T\right] R_2^{-1} z_2
  = \left[I_n - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2\right] \hat{x}_1 + P_1 H_2^T \left[I_{r_2} - \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1 H_2^T\right] R_2^{-1} z_2
Simplify the Optimal Estimate Incorporating the New Measurement Set
Using I = A^{-1}A = AA^{-1}, with A \triangleq H_2 P_1 H_2^T + R_2,

\hat{x}_2 = \hat{x}_1 + P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1}\left(z_2 - H_2 \hat{x}_1\right) \triangleq \hat{x}_1 + K\left(z_2 - H_2 \hat{x}_1\right)

Estimator gain matrix:

K = P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1}
Recursive Optimal Estimate
• Prior estimate may be based on a prior incremental estimate, and so on
• Generalize to a recursive form, with sequential index i:

\hat{x}_i = \hat{x}_{i-1} + P_{i-1} H_i^T \left(H_i P_{i-1} H_i^T + R_i\right)^{-1}\left(z_i - H_i \hat{x}_{i-1}\right) \triangleq \hat{x}_{i-1} + K_i\left(z_i - H_i \hat{x}_{i-1}\right)

with

P_i = \left(P_{i-1}^{-1} + H_i^T R_i^{-1} H_i\right)^{-1}

dim(x) = n × 1; dim(P) = n × n; dim(z) = r × 1; dim(R) = r × r; dim(H) = r × n; dim(K) = n × r
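A minimal sketch of the recursive update; the function name rls_update, the diffuse prior, and the demo data are my own, invented for illustration:

```python
import numpy as np

def rls_update(x_hat, P, z, H, R):
    """One recursive weighted least-squares step:
    x_i = x_{i-1} + K (z_i - H_i x_{i-1}), K = P H^T (H P H^T + R)^-1,
    P_i = (P_{i-1}^-1 + H^T R^-1 H)^-1."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_new = x_hat + K @ (z - H @ x_hat)
    P_new = np.linalg.inv(np.linalg.inv(P) + H.T @ np.linalg.inv(R) @ H)
    return x_new, P_new

# Invented demo: repeated 1x2 measurements of a constant 2-vector
rng = np.random.default_rng(5)
x_true = np.array([1.0, -0.5])
x_hat, P = np.zeros(2), 10.0 * np.eye(2)      # diffuse prior stands in for x_1, P_1
for _ in range(50):
    H = rng.standard_normal((1, 2))
    z = H @ x_true + 0.1 * rng.standard_normal(1)
    x_hat, P = rls_update(x_hat, P, z, H, 0.01 * np.eye(1))
print(x_hat)   # approaches x_true as samples accumulate
```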
Example of Recursive Optimal Estimate

z = x + n;  H = 1;  R = 1

\hat{x}_i = \hat{x}_{i-1} + p_{i-1}\left(p_{i-1} + 1\right)^{-1}\left(z_i - \hat{x}_{i-1}\right) \triangleq \hat{x}_{i-1} + k_i\left(z_i - \hat{x}_{i-1}\right)

p_i = \left(p_{i-1}^{-1} + 1\right)^{-1} = \frac{1}{\left(p_{i-1}^{-1} + 1\right)};  k_i = p_{i-1}\left(p_{i-1} + 1\right)^{-1} = \frac{p_{i-1}}{\left(p_{i-1} + 1\right)}

index i    p_i      k_i
0          1        –
1          0.5      0.5
2          0.333    0.333
3          0.25     0.25
4          0.2      0.2
5          0.167    0.167

\hat{x}_0 = z_0
\hat{x}_1 = 0.5\,\hat{x}_0 + 0.5\, z_1
\hat{x}_2 = 0.667\,\hat{x}_1 + 0.333\, z_2
\hat{x}_3 = 0.75\,\hat{x}_2 + 0.25\, z_3
\hat{x}_4 = 0.8\,\hat{x}_3 + 0.2\, z_4
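The p_i and k_i columns of the table can be reproduced with a few lines (a sketch, taking p_0 = 1 as in the example):

```python
p = 1.0                         # p_0
for i in range(1, 6):
    k = p / (p + 1.0)           # k_i = p_{i-1} / (p_{i-1} + 1)
    p = 1.0 / (1.0 / p + 1.0)   # p_i = (p_{i-1}^-1 + 1)^-1
    print(i, round(p, 3), round(k, 3))
# prints 1 0.5 0.5, 2 0.333 0.333, 3 0.25 0.25, 4 0.2 0.2, 5 0.167 0.167
```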
Optimal Gain and Estimate-Error Covariance

P_i = \left(P_{i-1}^{-1} + H_i^T R_i^{-1} H_i\right)^{-1}
K_i = P_{i-1} H_i^T \left(H_i P_{i-1} H_i^T + R_i\right)^{-1}

• With a constant measurement error covariance matrix, R,
    – Error covariance decreases at each step
    – Estimator gain matrix, K, invariably goes to zero as the number of samples increases
• Why? Each new sample has a smaller effect on the average than the sample before.
Next Time: Propagation of Uncertainty in Dynamic Systems

Supplemental Material
Covariance Matrix is Expected Value of the Outer Product
• Transcript correlation for normal and abnormal samples: identifies possible components of biological circuits
• Sample tissue correlation for RNA expression: confirms or questions the classification of samples

[Figure: correlation matrices of gene-expression data, color scale from –1 to +1]