16. Least-Squares Estimation, MAE 546, 2018
TRANSCRIPT
Least-Squares Estimation
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2018
• Estimating unknown constants from redundant measurements
    – Least-squares
    – Weighted least-squares
• Recursive weighted least-squares estimator
Copyright 2018 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
Perfect Measurement of a Constant Vector
• Given: measurements, y, of a constant vector, x
• Estimate x
• Assume that the output, y, is a perfect measurement and H is invertible

y = H x

y: (n × 1) output vector
H: (n × n) output matrix
x: (n × 1) vector to be estimated

• Estimate is based on the inverse transformation

\hat{x} = H^{-1} y
Imperfect Measurement of a Constant Vector
• Given: "noisy" measurements, z, of a constant vector, x
• Effects of error can be reduced if the measurement is redundant
• Noise-free output, y:

y = H x

• Measurement of output with error, z:

z = y + n = H x + n

z: (r × 1) measurement vector
n: (r × 1) error vector
y: (r × 1) output vector
H: (r × n) output matrix, r > n
x: (n × 1) vector to be estimated
Cost Function for Least-Squares Estimate
Squared measurement error = cost function, J (a quadratic norm)

What is the control parameter? \hat{x}, the estimate of x; dim(\hat{x}) = (n × 1)

Measurement-error residual:

\varepsilon = z - H\hat{x} = z - \hat{y};  dim(\varepsilon) = (r × 1)

J = \frac{1}{2}\varepsilon^T\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T\left(z - H\hat{x}\right)
  = \frac{1}{2}\left(z^T z - \hat{x}^T H^T z - z^T H\hat{x} + \hat{x}^T H^T H\hat{x}\right)
Static Minimization Provides the Least-Squares Estimate
Error cost function (scalar):

J = \frac{1}{2}\left(z^T z - \hat{x}^T H^T z - z^T H\hat{x} + \hat{x}^T H^T H\hat{x}\right)

Necessary condition for a minimum:

\frac{\partial J}{\partial \hat{x}} = 0 = \frac{1}{2}\left[0 - \left(H^T z\right)^T - z^T H + \left(H^T H\hat{x}\right)^T + \hat{x}^T H^T H\right]

\hat{x}^T H^T H = z^T H

Sufficient condition for a minimum: H^T H > 0

Static Minimization Provides the Least-Squares Estimate

\hat{x}^T \left(H^T H\right)\left(H^T H\right)^{-1} = \hat{x}^T = z^T H\left(H^T H\right)^{-1}  (row)

or

\hat{x} = \left(H^T H\right)^{-1} H^T z  (column)

The estimate employs the left pseudoinverse of H.
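As a concrete sketch of the result above, the left pseudoinverse can be formed directly with NumPy. The matrix H and the measurements below are invented for illustration, not taken from the lecture:

```python
import numpy as np

# Made-up redundant measurement setup: r = 5 measurements of an n = 2 vector
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])                  # (r x n) output matrix, r > n
x_true = np.array([3.0, -2.0])              # constant vector to be estimated
rng = np.random.default_rng(0)
z = H @ x_true + 0.1 * rng.standard_normal(5)   # noisy measurements

# Least-squares estimate via the left pseudoinverse: x_hat = (H^T H)^-1 H^T z
x_hat = np.linalg.inv(H.T @ H) @ H.T @ z

# np.linalg.lstsq solves the same problem with better numerical conditioning
x_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)
assert np.allclose(x_hat, x_lstsq)
print(x_hat)                                # close to x_true
```

In practice the explicit inverse is avoided (lstsq or a QR factorization is preferred), but the two answers coincide here.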
Example: Average Weight of a Pail of Jelly Beans
• Measurements are equally uncertain
• Express measurements as

z_i = x + n_i,  i = 1 to r
z = H x + n

• Output matrix:

H = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}

• Optimal estimate of x (scalar):

\hat{x} = \left(H^T H\right)^{-1} H^T z
Average Weight of the Jelly Beans
Optimal estimate:

\hat{x} = \left(\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix} = \left(r\right)^{-1}\left(z_1 + z_2 + \cdots + z_r\right)

\hat{x} = \frac{1}{r}\sum_{i=1}^{r} z_i  [sample mean value]

Simple average. Convexity:

H^T H = r > 0
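A quick check, with pail weights invented for the example, that the pseudoinverse formula reduces to a plain average:

```python
import numpy as np

z = np.array([10.2, 9.8, 10.5, 9.9, 10.1])    # invented pail weights
H = np.ones((z.size, 1))                      # output matrix of ones
x_hat = np.linalg.inv(H.T @ H) @ H.T @ z      # (H^T H)^-1 H^T z = (1/r) sum(z_i)
assert np.isclose(x_hat.item(), z.mean())
```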
Least-Squares Applications
• More generally, least-squares estimation is used for
    – Higher-degree curve-fitting
    – Multivariate estimation

[Figure: error "cost" function surface]
Least-Squares Linear Fit to Noisy Data
Find the trend line in noisy data:

y = a_0 + a_1 x
z = \left(a_0 + a_1 x\right) + n
z_i = \left(a_0 + a_1 x_i\right) + n_i,  i = 1, \ldots, r

Measurement vector:

\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix} = \begin{bmatrix} \left(a_0 + a_1 x_1\right) \\ \left(a_0 + a_1 x_2\right) \\ \vdots \\ \left(a_0 + a_1 x_r\right) \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_r \end{bmatrix}

z = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_r \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} + n \triangleq H a + n

Error cost function:

J = \frac{1}{2}\left(z - H\hat{a}\right)^T\left(z - H\hat{a}\right)

Least-squares coefficient estimate:

\hat{a} = \begin{bmatrix} \hat{a}_0 \\ \hat{a}_1 \end{bmatrix} = \left(H^T H\right)^{-1} H^T z

Least-squares output estimate:

\hat{y} = \hat{a}_0 + \hat{a}_1 x
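A minimal sketch of the trend-line fit on synthetic data; the true coefficients and noise level are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
a_true = np.array([2.0, 0.5])                 # [a0, a1], invented for the demo
x = np.linspace(0.0, 10.0, 20)
z = a_true[0] + a_true[1] * x + 0.3 * rng.standard_normal(x.size)

H = np.column_stack([np.ones_like(x), x])     # rows [1, x_i]
a_hat = np.linalg.inv(H.T @ H) @ H.T @ z      # least-squares coefficient estimate
y_hat = H @ a_hat                             # least-squares output estimate
print(a_hat)                                  # should be close to [2.0, 0.5]
```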
Measurements of Differing Quality
• Suppose some elements of the measurement, z, are more uncertain than others

z = H x + n

• Give the more uncertain measurements less weight in arriving at the minimum-cost estimate
• Let S = measure of uncertainty; then express the error cost in terms of S^{-1}

J = \frac{1}{2}\varepsilon^T S^{-1}\varepsilon

dim(z) = dim(n) = (r × 1); dim(H) = (r × n); dim(x) = (n × 1)
dim(\varepsilon) = (r × 1); dim(S) = (r × r)
Error Cost and Necessary Condition for a Minimum
Error cost function, J:

J = \frac{1}{2}\varepsilon^T S^{-1}\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T S^{-1}\left(z - H\hat{x}\right)
  = \frac{1}{2}\left(z^T S^{-1} z - \hat{x}^T H^T S^{-1} z - z^T S^{-1} H\hat{x} + \hat{x}^T H^T S^{-1} H\hat{x}\right)

Necessary condition for a minimum:

\frac{\partial J}{\partial \hat{x}} = 0 = \frac{1}{2}\left[0 - \left(H^T S^{-1} z\right)^T - z^T S^{-1} H + \left(H^T S^{-1} H\hat{x}\right)^T + \hat{x}^T H^T S^{-1} H\right]

Sufficient condition for a minimum: H^T S^{-1} H > 0
Weighted Least-Squares Estimate of a Constant Vector
Necessary condition for a minimum:

\left[\hat{x}^T H^T S^{-1} H - z^T S^{-1} H\right] = 0
\hat{x}^T H^T S^{-1} H = z^T S^{-1} H

The weighted left pseudoinverse provides the solution:

\hat{x} = \left(H^T S^{-1} H\right)^{-1} H^T S^{-1} z
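A hedged sketch of the weighted estimator. The diagonal S is invented here, chosen as the noise covariance (the S = S_A = R option discussed below), with larger entries marking the more uncertain measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
x_true = np.array([1.0, -1.0])
sigma = np.array([0.1, 0.1, 1.0, 1.0])        # last two measurements are noisier
z = H @ x_true + sigma * rng.standard_normal(4)

S_inv = np.diag(1.0 / sigma**2)               # inverse of the uncertainty measure S
# Weighted left pseudoinverse: x_hat = (H^T S^-1 H)^-1 H^T S^-1 z
x_hat = np.linalg.inv(H.T @ S_inv @ H) @ H.T @ S_inv @ z
print(x_hat)
```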
Return of the Jelly Beans
Error-weighting matrix:

S^{-1} \triangleq A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{rr} \end{bmatrix}

Optimal estimate of average jelly bean weight, \hat{x} = \left(H^T S^{-1} H\right)^{-1} H^T S^{-1} z:

\hat{x} = \left(\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} A \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1}\begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} A \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_r \end{bmatrix}

Weighted estimate of x (scalar):

\hat{x} = \frac{\sum_{i=1}^{r} a_{ii} z_i}{\sum_{i=1}^{r} a_{ii}}
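The scalar weighted-average formula can be checked in a few lines against the matrix form; the measurement values and weights below are invented (the first scale is simply assumed most trustworthy):

```python
import numpy as np

z = np.array([10.2, 9.8, 10.5])
a = np.array([4.0, 1.0, 1.0])       # diagonal of A = S^-1; first scale trusted most
x_weighted = np.sum(a * z) / np.sum(a)

H = np.ones((3, 1))
x_matrix = (np.linalg.inv(H.T @ np.diag(a) @ H) @ H.T @ np.diag(a) @ z).item()
assert np.isclose(x_weighted, x_matrix)
```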
Weighted Least-Squares ("Kriging") Estimates
(Wiener–Kolmogorov interpolation between measurement points)

[Figure: interpolated curve y(x) through measurement points]

• Curve, y(x), between measurement points, x_i, is the mean of a stationary process with covariance derived from the measurements, y(x_i), or another known source
• Curve, y(x), is a distance-weighted linear combination of all of the points

\hat{y}(x) = w^T(x)\, y = \begin{bmatrix} w_1(x) & w_2(x) & \cdots & w_k(x) \end{bmatrix}\begin{bmatrix} y(x_1) \\ y(x_2) \\ \vdots \\ y(x_k) \end{bmatrix}

w: minimum-least-squares weighting function
http://en.wikipedia.org/wiki/Kriging
How to Choose the Error Weighting Matrix
a) Normalize the cost function according to the expected measurement error, S_A:

J = \frac{1}{2}\varepsilon^T S_A^{-1}\varepsilon = \frac{1}{2}\left(z - y\right)^T S_A^{-1}\left(z - y\right) = \frac{1}{2}\left(z - H\hat{x}\right)^T S_A^{-1}\left(z - H\hat{x}\right)

b) Normalize the cost function according to the expected measurement residual, S_B:

J = \frac{1}{2}\varepsilon^T S_B^{-1}\varepsilon = \frac{1}{2}\left(z - H\hat{x}\right)^T S_B^{-1}\left(z - H\hat{x}\right)

dim(S_A) = dim(S_B) = (r × r)
Measurement Error Covariance, S_A
Expected value of the outer product of the measurement error vector:

S_A = E\left[\left(z - y\right)\left(z - y\right)^T\right] = E\left[\left(z - Hx\right)\left(z - Hx\right)^T\right] = E\left[n n^T\right] \triangleq R
Measurement Residual Covariance, S_B
Expected value of the outer product of the measurement residual vector,

\varepsilon = \left(z - H\hat{x}\right) = H\left(x - \hat{x}\right) + n

S_B = E\left[\varepsilon\varepsilon^T\right] = E\left[\left(z - H\hat{x}\right)\left(z - H\hat{x}\right)^T\right] = E\left[\left(H\left(x - \hat{x}\right) + n\right)\left(H\left(x - \hat{x}\right) + n\right)^T\right]

S_B = H E\left[\left(x - \hat{x}\right)\left(x - \hat{x}\right)^T\right] H^T + H E\left[\left(x - \hat{x}\right) n^T\right] + E\left[n\left(x - \hat{x}\right)^T\right] H^T + E\left[n n^T\right]
    \triangleq H P H^T + H M + M^T H^T + R

where

P = E\left[\left(x - \hat{x}\right)\left(x - \hat{x}\right)^T\right];  M = E\left[\left(x - \hat{x}\right) n^T\right];  R = E\left[n n^T\right]

Requires iteration ("adaptation") of the estimate to find S_B.
Recursive Least-Squares Estimation
• Prior unweighted and weighted least-squares estimators use a "batch-processing" approach
    – All information is gathered prior to processing
    – All information is processed at once
• Recursive approach
    – Optimal estimate has been made from a prior measurement set
    – New measurement set is obtained
    – Optimal estimate is improved by an incremental change (or correction) to the prior optimal estimate
Prior Optimal Estimate
Initial measurement set and state estimate, with S = S_A = R:

z_1 = H_1 x + n_1
\hat{x}_1 = \left(H_1^T R_1^{-1} H_1\right)^{-1} H_1^T R_1^{-1} z_1

dim(z_1) = dim(n_1) = (r_1 × 1); dim(H_1) = (r_1 × n); dim(R_1) = (r_1 × r_1)

The state estimate minimizes

J_1 = \frac{1}{2}\varepsilon_1^T R_1^{-1}\varepsilon_1 = \frac{1}{2}\left(z_1 - H_1\hat{x}_1\right)^T R_1^{-1}\left(z_1 - H_1\hat{x}_1\right)
New Measurement Set
New measurement:

z_2 = H_2 x + n_2

R_2: second measurement error covariance
dim(z_2) = dim(n_2) = (r_2 × 1); dim(H_2) = (r_2 × n); dim(R_2) = (r_2 × r_2)

Concatenation of old and new measurements:

z \triangleq \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}
Cost of Estimation Based on Both Measurement Sets

J_2 = \frac{1}{2}\begin{bmatrix} \left(z_1 - H_1\hat{x}_2\right)^T & \left(z_2 - H_2\hat{x}_2\right)^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} \left(z_1 - H_1\hat{x}_2\right) \\ \left(z_2 - H_2\hat{x}_2\right) \end{bmatrix}
    = \frac{1}{2}\left[\left(z_1 - H_1\hat{x}_2\right)^T R_1^{-1}\left(z_1 - H_1\hat{x}_2\right) + \left(z_2 - H_2\hat{x}_2\right)^T R_2^{-1}\left(z_2 - H_2\hat{x}_2\right)\right]

• Cost function incorporates the estimate made after incorporating z_2
• Both residuals refer to \hat{x}_2
Optimal Estimate Based on Both Measurement Sets

\hat{x}_2 = \left\{\begin{bmatrix} H_1^T & H_2^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} H_1 \\ H_2 \end{bmatrix}\right\}^{-1}\left\{\begin{bmatrix} H_1^T & H_2^T \end{bmatrix}\begin{bmatrix} R_1^{-1} & 0 \\ 0 & R_2^{-1} \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}\right\}

Simplification occurs because the weighting matrix is block diagonal:

\hat{x}_2 = \left(H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2\right)^{-1}\left(H_1^T R_1^{-1} z_1 + H_2^T R_2^{-1} z_2\right)

The inverse can be put in more useful form using the Matrix Inversion Lemma.
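The block-diagonal simplification can be verified numerically. This sketch, with H_1, H_2, R_1, R_2, and data invented for the test, checks that the concatenated batch solution equals the information-sum form:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r1, r2 = 2, 4, 3
H1, H2 = rng.standard_normal((r1, n)), rng.standard_normal((r2, n))
R1, R2 = 0.1 * np.eye(r1), 0.2 * np.eye(r2)
x_true = np.array([1.0, 2.0])
z1 = H1 @ x_true + 0.1 * rng.standard_normal(r1)
z2 = H2 @ x_true + 0.2 * rng.standard_normal(r2)

# Batch solution with concatenated measurements and block-diagonal weighting
H = np.vstack([H1, H2])
R_inv = np.block([[np.linalg.inv(R1), np.zeros((r1, r2))],
                  [np.zeros((r2, r1)), np.linalg.inv(R2)]])
x_batch = np.linalg.inv(H.T @ R_inv @ H) @ H.T @ R_inv @ np.concatenate([z1, z2])

# Simplified information-sum form
info = H1.T @ np.linalg.inv(R1) @ H1 + H2.T @ np.linalg.inv(R2) @ H2
x_sum = np.linalg.inv(info) @ (H1.T @ np.linalg.inv(R1) @ z1
                               + H2.T @ np.linalg.inv(R2) @ z2)
assert np.allclose(x_batch, x_sum)
```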
Development of the Matrix Inversion Lemma
A and B are square, non-singular matrices:

A = \begin{bmatrix} A_1 \,(m × m) & A_2 \,(m × n) \\ A_3 \,(n × m) & A_4 \,(n × n) \end{bmatrix}

B \triangleq A^{-1}; then BA = AB = I

B = \begin{bmatrix} B_1 \,(m × m) & B_2 \,(m × n) \\ B_3 \,(n × m) & B_4 \,(n × n) \end{bmatrix} \triangleq \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}^{-1}
Development of the Matrix Inversion Lemma
The AB product is the identity matrix:

AB = \begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix}\begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix} = \begin{bmatrix} \left(A_1 B_1 + A_2 B_3\right) & \left(A_1 B_2 + A_2 B_4\right) \\ \left(A_3 B_1 + A_4 B_3\right) & \left(A_3 B_2 + A_4 B_4\right) \end{bmatrix} = \begin{bmatrix} I_m & 0 \\ 0 & I_n \end{bmatrix}

The BA product is the identity matrix:

BA = \begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix}\begin{bmatrix} A_1 & A_2 \\ A_3 & A_4 \end{bmatrix} = \begin{bmatrix} \left(B_1 A_1 + B_2 A_3\right) & \left(B_1 A_2 + B_2 A_4\right) \\ \left(B_3 A_1 + B_4 A_3\right) & \left(B_3 A_2 + B_4 A_4\right) \end{bmatrix} = \begin{bmatrix} I_m & 0 \\ 0 & I_n \end{bmatrix}
Submatrices of B
From the first column of AB:

B_3 = -A_4^{-1} A_3 B_1
A_1 B_1 - A_2 A_4^{-1} A_3 B_1 = I_m
\left(A_1 - A_2 A_4^{-1} A_3\right) B_1 = I_m
B_1 = \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1}
B_3 = -A_4^{-1} A_3 \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1}
Solutions for B
Similar solutions from the second column of AB:

B = \begin{bmatrix} \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & -A_1^{-1} A_2 \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \\ -A_4^{-1} A_3 \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \end{bmatrix}

Alternative solutions from BA:

B = \begin{bmatrix} \left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} & -\left(A_1 - A_2 A_4^{-1} A_3\right)^{-1} A_2 A_4^{-1} \\ -\left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} A_3 A_1^{-1} & \left(A_4 - A_3 A_1^{-1} A_2\right)^{-1} \end{bmatrix}
Matrix Inversion Lemma
Substitution and elimination in BA:

B_4 = \left[I_n - B_3 A_2\right] A_4^{-1}

For symmetric A, the blocks A_1, A_4, B_1, and B_4 are symmetric, and

A_3 = A_2^T,  B_3 = B_2^T

Matrix inversion lemma for symmetric A:

\left[A_4 - A_2^T A_1^{-1} A_2\right]^{-1} = A_4^{-1} - A_4^{-1} A_2^T \left[A_2 A_4^{-1} A_2^T - A_1\right]^{-1} A_2 A_4^{-1}
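A quick numerical check of the symmetric-case lemma, on random symmetric positive-definite blocks invented for the test:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 2
A1 = rng.standard_normal((m, m)); A1 = A1 @ A1.T + m * np.eye(m)   # SPD block
A4 = rng.standard_normal((n, n)); A4 = A4 @ A4.T + n * np.eye(n)   # SPD block
A2 = 0.5 * rng.standard_normal((m, n))

inv = np.linalg.inv
lhs = inv(A4 - A2.T @ inv(A1) @ A2)
rhs = inv(A4) - inv(A4) @ A2.T @ inv(A2 @ inv(A4) @ A2.T - A1) @ A2 @ inv(A4)
assert np.allclose(lhs, rhs)
```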
Apply the Matrix Inversion Lemma to the Optimal Estimate
First, define

P_1^{-1} \triangleq H_1^T R_1^{-1} H_1;  dim(P_1) = (n × n)

From the matrix inversion lemma,

\left(H_1^T R_1^{-1} H_1 + H_2^T R_2^{-1} H_2\right)^{-1} = \left(P_1^{-1} + H_2^T R_2^{-1} H_2\right)^{-1}
  = P_1 - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1
Improved Estimate Incorporating the New Measurement Set
The new estimate is a correction to the old. With \hat{x}_1 = P_1 H_1^T R_1^{-1} z_1,

\hat{x}_2 = \hat{x}_1 - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 \hat{x}_1 + P_1 H_2^T \left[I_{r_2} - \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1 H_2^T\right] R_2^{-1} z_2
  = \left[I_n - P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2\right] \hat{x}_1 + P_1 H_2^T \left[I_{r_2} - \left(H_2 P_1 H_2^T + R_2\right)^{-1} H_2 P_1 H_2^T\right] R_2^{-1} z_2
Simplify the Optimal Estimate Incorporating the New Measurement Set
Using I = A^{-1}A = AA^{-1}, with A \triangleq H_2 P_1 H_2^T + R_2,

\hat{x}_2 = \hat{x}_1 + P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1}\left(z_2 - H_2 \hat{x}_1\right) \triangleq \hat{x}_1 + K\left(z_2 - H_2 \hat{x}_1\right)

Estimator gain matrix:

K = P_1 H_2^T \left(H_2 P_1 H_2^T + R_2\right)^{-1}
Recursive Optimal Estimate
• Prior estimate may be based on a prior incremental estimate, and so on
• Generalize to a recursive form, with sequential index i:

\hat{x}_i = \hat{x}_{i-1} + P_{i-1} H_i^T \left(H_i P_{i-1} H_i^T + R_i\right)^{-1}\left(z_i - H_i \hat{x}_{i-1}\right) \triangleq \hat{x}_{i-1} + K_i\left(z_i - H_i \hat{x}_{i-1}\right)

with

P_i = \left(P_{i-1}^{-1} + H_i^T R_i^{-1} H_i\right)^{-1}

dim(x) = n × 1; dim(P) = n × n; dim(z) = r × 1; dim(R) = r × r; dim(H) = r × n; dim(K) = n × r
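A minimal sketch of the recursive update; the function name rls_update, the diffuse prior, and the demo data are my own, invented for illustration:

```python
import numpy as np

def rls_update(x_hat, P, z, H, R):
    """One recursive weighted least-squares step:
    x_i = x_{i-1} + K (z_i - H_i x_{i-1}), K = P H^T (H P H^T + R)^-1,
    P_i = (P_{i-1}^-1 + H^T R^-1 H)^-1."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_new = x_hat + K @ (z - H @ x_hat)
    P_new = np.linalg.inv(np.linalg.inv(P) + H.T @ np.linalg.inv(R) @ H)
    return x_new, P_new

# Invented demo: repeated 1x2 measurements of a constant 2-vector
rng = np.random.default_rng(5)
x_true = np.array([1.0, -0.5])
x_hat, P = np.zeros(2), 10.0 * np.eye(2)      # diffuse prior stands in for x_1, P_1
for _ in range(50):
    H = rng.standard_normal((1, 2))
    z = H @ x_true + 0.1 * rng.standard_normal(1)
    x_hat, P = rls_update(x_hat, P, z, H, 0.01 * np.eye(1))
print(x_hat)   # approaches x_true as samples accumulate
```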
Example of Recursive Optimal Estimate

z = x + n;  H = 1;  R = 1

\hat{x}_i = \hat{x}_{i-1} + p_{i-1}\left(p_{i-1} + 1\right)^{-1}\left(z_i - \hat{x}_{i-1}\right) \triangleq \hat{x}_{i-1} + k_i\left(z_i - \hat{x}_{i-1}\right)

p_i = \left(p_{i-1}^{-1} + 1\right)^{-1} = \frac{1}{\left(p_{i-1}^{-1} + 1\right)};  k_i = p_{i-1}\left(p_{i-1} + 1\right)^{-1} = \frac{p_{i-1}}{\left(p_{i-1} + 1\right)}

index i    p_i      k_i
0          1        –
1          0.5      0.5
2          0.333    0.333
3          0.25     0.25
4          0.2      0.2
5          0.167    0.167

\hat{x}_0 = z_0
\hat{x}_1 = 0.5\,\hat{x}_0 + 0.5\, z_1
\hat{x}_2 = 0.667\,\hat{x}_1 + 0.333\, z_2
\hat{x}_3 = 0.75\,\hat{x}_2 + 0.25\, z_3
\hat{x}_4 = 0.8\,\hat{x}_3 + 0.2\, z_4
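The p_i and k_i columns of the table can be reproduced with a few lines (a sketch, taking p_0 = 1 as in the example):

```python
p = 1.0                         # p_0
for i in range(1, 6):
    k = p / (p + 1.0)           # k_i = p_{i-1} / (p_{i-1} + 1)
    p = 1.0 / (1.0 / p + 1.0)   # p_i = (p_{i-1}^-1 + 1)^-1
    print(i, round(p, 3), round(k, 3))
# prints 1 0.5 0.5, 2 0.333 0.333, 3 0.25 0.25, 4 0.2 0.2, 5 0.167 0.167
```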
Optimal Gain and Estimate-Error Covariance

P_i = \left(P_{i-1}^{-1} + H_i^T R_i^{-1} H_i\right)^{-1}
K_i = P_{i-1} H_i^T \left(H_i P_{i-1} H_i^T + R_i\right)^{-1}

• With a constant measurement error covariance matrix, R,
    – Error covariance decreases at each step
    – Estimator gain matrix, K, invariably goes to zero as the number of samples increases
• Why? Each new sample has a smaller effect on the average than the sample before.
Next Time: Propagation of Uncertainty in Dynamic Systems

Supplemental Material
Covariance Matrix is Expected Value of the Outer Product
• Transcript correlation for normal and abnormal samples: identifies possible components of biological circuits
• Sample tissue correlation for RNA expression: confirms or questions the classification of samples

[Figure: correlation matrices of gene-expression data, color scale from –1 to +1]