TRANSCRIPT
Lecture 6
Principles of Modeling for Cyber-Physical Systems
Instructor: Madhur Behl
Principles of Modeling for CPS – Fall 2018 Madhur Behl [email protected] 1
Parameter estimation
But first..
• Worksheet 3 is out:
• Towards Matlab implementation of the model.
• Nominal values of parameters from EnergyPlus IDF file.
• Specify model structure in Matlab.
• Collect training data.
• Use the templates provided to save time.
• Due in 1.5 weeks: Thursday, Oct 4, by 2:00pm.
[Figure: RC-network (resistor-capacitor) thermal circuit of a building zone, with sub-circuits for the [ExternalWalls], [Ceiling], [Floor], [InternalWalls], and [Windows]. Each surface is modeled by conductances (1/U terms) and capacitances (C terms); inputs include solar, radiative, convective, and sensible heat gains (Q̇ terms), the ambient temperature Ta, and the zone temperature Tz.]
How to find the values of the parameters?
Previously..
Parameter estimation overview
• Simple Linear Regression
• Least squares
• Non-linear least squares
• State-space sum of squared errors
• Non-linear optimization (estimation) methods
• Global and local search
•MATLAB implementations
Simple Linear Regression
Suppose we collect some data and want to determine the relation between the observed values, y, and the independent variable, x:
[Figure: scatter plot of the response y (dependent variable) versus the predictor x (independent variable).]
We can model the data using a linear model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $y_i$ is the observed response, $\beta_0$ the unknown intercept, $\beta_1$ the unknown slope, and $\varepsilon_i$ the unknown random error.
Simple Linear Regression
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0$ and $\beta_1$ are the parameters of this linear model.

• We don't know the true values of the parameters.
• We estimate them using the assumed model and the observations (data).
Simple Linear Regression
• Estimate b0 and b1 to obtain the best fit of the simulated values to the observations.
• One method: Minimize sum of squared errors, or residuals.
[Figure: scatter plot with the fitted line and the residuals.]

$\hat{y} = b_0 + b_1 x$ (fitted line)

$e_i = y_i - \hat{y}_i$ (residual)

$y_i = b_0 + b_1 x_i + e_i$ (observed response)

where $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$.
Simple Linear Regression
[Figure: scatter plot with the fitted line, the observed responses $y_i$, and the simulated values $\hat{y}_i$.]

Sum of squared residuals:

$S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

To minimize: set $\frac{\partial S}{\partial b_0} = 0$ and $\frac{\partial S}{\partial b_1} = 0$.
Simple Linear Regression
This results in the normal equations:

$b_0\, n + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
• Solve these equations to obtain expressions for b0 and b1, the parameter estimates that give the best fit of the simulated and observed values.
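Solving the two normal equations by elimination can be sketched numerically. A minimal Python version (the course uses MATLAB, but the arithmetic is identical; the data points and function name below are made up for illustration):

```python
# A sketch of solving the two normal equations for b0 and b1 by
# elimination. The data points are made up for illustration.
def solve_normal_equations(x, y):
    n = len(x)
    Sx, Sy = sum(x), sum(y)
    Sxx = sum(xi * xi for xi in x)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Eliminate b0 between the two equations:
    b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
    b0 = (Sy - b1 * Sx) / n       # back-substitute into the first equation
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]     # roughly y = 2x
b0, b1 = solve_normal_equations(x, y)
print(b0, b1)                      # intercept close to 0, slope close to 2
```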
Linear Regression in Matrix Form

Linear regression model: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, \ldots, n$ $\;\Rightarrow\;$ $y = Xb + e$

$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ (vector of observed values), $\quad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$ (matrix of predictors/features),

$b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$ (vector of parameters), $\quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$ (vector of residuals)
Linear Regression in Matrix Form

Linear regression model: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, \ldots, n$ $\;\Rightarrow\;$ $y = Xb + e$

• The normal equations ($b'$ is the vector of least-squares estimates of $b$), obtained using summations and setting the derivative to 0, written in matrix notation:

$X^T X b' = X^T y \;\Rightarrow\; b' = (X^T X)^{-1} X^T y$
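The matrix form can be checked by hand for the two-parameter model, where $X^T X$ is a 2×2 matrix that can be inverted in closed form. A sketch (the data are made up; in practice one would use a linear-algebra routine such as MATLAB's backslash `X\y` or `numpy.linalg.lstsq`):

```python
# A sketch of solving the normal equations X^T X b' = X^T y for the
# two-parameter model y = b0 + b1*x, inverting the 2x2 matrix X^T X in
# closed form. Data values are made up for illustration.
def normal_equations_fit(x, y):
    n = len(x)
    # X^T X = [[n, Sx], [Sx, Sxx]],  X^T y = [Sy, Sxy]
    Sx = sum(x)
    Sxx = sum(xi * xi for xi in x)
    Sy = sum(y)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * Sxx - Sx * Sx        # determinant of X^T X
    b0 = (Sxx * Sy - Sx * Sxy) / det
    b1 = (n * Sxy - Sx * Sy) / det
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x
b0, b1 = normal_equations_fit(x, y)
print(b0, b1)                     # → 1.0 2.0
```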
Linear versus Nonlinear Models
Linear models: Sensitivities of the output are not a function of the model parameters:
$\frac{\partial \hat{y}_i}{\partial b_0} = 1$ and $\frac{\partial \hat{y}_i}{\partial b_1} = x_i$; recall $\hat{y}_i = b_0 + b_1 x_i$ and $X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$

[Figure: surface plot of $S(b)$ over two parameters $b_1$ and $b_2$.]
Linear versus Nonlinear parameters
[Figure: elliptical level sets of $S(b)$ over two parameters $b_1$ and $b_2$.]

• Linear models have elliptical objective function surfaces.
• i.e. the level sets of the objective function (sum of squared errors) are ellipses.
• With two parameters: one step to get to the minimum.
Nonlinear parametric models: Sensitivities are a function of the model parameters.
Nonlinearity is in parameter space.
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$

Elements of $A$, $B$, $C$, and $D$ could be non-linear in the parameter $\theta$.
Nonlinear Estimation
Suppose that we have collected data on the output/response Y (n samples),
◦ $(y_1, y_2, \ldots, y_n)$

corresponding to n sets of values of the independent variables/predictors/features $X_1, X_2, \ldots, X_p$:

◦ $(x_{11}, x_{21}, \ldots, x_{p1})$,
◦ $(x_{12}, x_{22}, \ldots, x_{p2})$,
◦ ... and
◦ $(x_{1n}, x_{2n}, \ldots, x_{pn})$.
Nonlinear Estimation
For possible values $\theta_1, \theta_2, \ldots, \theta_q$ of the parameters, the residual sum of squares function is

$S(\theta_1, \theta_2, \ldots, \theta_q) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - f(x_{1i}, x_{2i}, \ldots, x_{pi} \mid \theta_1, \theta_2, \ldots, \theta_q) \right]^2$

where $\hat{y}_i = f(x_{1i}, x_{2i}, \ldots, x_{pi} \mid \theta_1, \theta_2, \ldots, \theta_q)$.
Nonlinear Estimation
The least squares estimates of $\theta_1, \theta_2, \ldots, \theta_q$ are the values which minimize $S(\theta_1, \theta_2, \ldots, \theta_q)$.
Nonlinear Estimation
To find the least squares estimate we need to determine where all the derivatives of $S(\theta_1, \theta_2, \ldots, \theta_q)$ with respect to each parameter $\theta_1, \theta_2, \ldots, \theta_q$ are equal to zero.

This will involve terms with partial derivatives of the non-linear function f:

$\frac{\partial f(\ldots)}{\partial \theta_1}, \; \frac{\partial f(\ldots)}{\partial \theta_2}, \; \ldots, \; \frac{\partial f(\ldots)}{\partial \theta_q}$
Nonlinear Estimation
Closed form analytical solutions are not possible.
$\frac{\partial f(\ldots)}{\partial \theta_1}, \; \frac{\partial f(\ldots)}{\partial \theta_2}, \; \ldots, \; \frac{\partial f(\ldots)}{\partial \theta_q}$
It is usually necessary to develop an iterative technique for solving them.
Recall..
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$

$\hat{y}(k) = g(\hat{x}(k),\, u(k),\, \theta_1, \ldots, \theta_q)$
How can we compute the sum of squared error for state-space models ?
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$
Consider the LTI model
sum of squared error for state-space models
Given $x(0) = x_0$, and data $\{u(k), y(k)\}$, $k = 0, \ldots, N-1$:

$y(0) = C_\theta\, x(0) + D_\theta\, u(0)$

$x(1) = A_\theta\, x(0) + B_\theta\, u(0)$

$y(1) = C_\theta\, x(1) + D_\theta\, u(1) = C_\theta A_\theta\, x(0) + C_\theta B_\theta\, u(0) + D_\theta\, u(1)$

$x(2) = A_\theta\, x(1) + B_\theta\, u(1) = A_\theta A_\theta\, x(0) + A_\theta B_\theta\, u(0) + B_\theta\, u(1)$

$y(2) = C_\theta\, x(2) + D_\theta\, u(2) = C_\theta A_\theta A_\theta\, x(0) + C_\theta A_\theta B_\theta\, u(0) + C_\theta B_\theta\, u(1) + D_\theta\, u(2)$
sum of squared error for state-space models
$\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix} = \Gamma\, x(0) + \Phi \begin{bmatrix} u(0) \\ u(1) \\ \vdots \\ u(N-1) \end{bmatrix}$

For a given estimate of $\theta$, this is the $\hat{y}$ vector in $S(\theta_1, \theta_2, \ldots, \theta_q) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
sum of squared error for state-space models
$\Gamma = \begin{bmatrix} C_\theta \\ C_\theta A_\theta \\ \vdots \\ C_\theta A_\theta^{N-1} \end{bmatrix}, \qquad \Phi = \begin{bmatrix} D_\theta & 0 & \cdots & \\ C_\theta B_\theta & D_\theta & 0 & \cdots \\ \vdots & & \ddots & \\ C_\theta A_\theta^{N-2} B_\theta & \cdots & C_\theta B_\theta & D_\theta \end{bmatrix}$
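The sum of squared errors $S(\theta)$ for a state-space model can also be computed by simulating the model forward, rather than forming the stacked matrices explicitly. A minimal Python sketch for a scalar model (all parameter values, inputs, and names are made up for illustration):

```python
# A sketch of the sum-of-squared-errors computation for a scalar LTI
# state-space model x(k+1) = a*x(k) + b*u(k), y(k) = c*x(k) + d*u(k),
# viewed as a function of the parameter vector theta = (a, b, c, d).
def sse(theta, x0, u, y_meas):
    a, b, c, d = theta
    x, s = x0, 0.0
    for k in range(len(u)):
        y_hat = c * x + d * u[k]       # predicted output at step k
        s += (y_meas[k] - y_hat) ** 2  # accumulate squared error
        x = a * x + b * u[k]           # state update
    return s

# Generate data from a "true" parameter value, then evaluate S(theta):
theta_true = (0.9, 0.5, 2.0, 0.1)
u = [1.0] * 20
x, y_meas = 0.0, []
for k in range(20):
    y_meas.append(theta_true[2] * x + theta_true[3] * u[k])
    x = theta_true[0] * x + theta_true[1] * u[k]

print(sse(theta_true, 0.0, u, y_meas))            # zero at the true parameters
print(sse((0.8, 0.5, 2.0, 0.1), 0.0, u, y_meas))  # positive for a wrong 'a'
```

An optimizer would then search over $\theta$ to drive this quantity down, which is exactly the setting of the estimation methods below.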
Nonlinear Estimation
$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in \Theta} V_N(\theta, Z^N)$

Let $Z^N$ be the given data-set $\{u_k, y_k,\; k = 1, \ldots, N\}$.

$V_N(\theta, Z^N)$ is the squared error, i.e. $V_N(\theta, Z^N) = \sum_{k=1}^{N} \varepsilon_k(\theta)^T \varepsilon_k(\theta)$

$\varepsilon_k(\theta) = y_k - \hat{y}_k(\theta)$: measured minus predicted (for a particular value of $\theta$).
Non-linear least squares
We will cover the following methods:
1) Steepest descent (or gradient descent) and Newton's method,
2) Gauss-Newton and linearization, and
3) Levenberg-Marquardt's procedure.

1. In each case an iterative procedure is used to find the least squares estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q$.
2. That is, initial estimates $\theta_1^0, \theta_2^0, \ldots, \theta_q^0$ for these values are determined.
3. Then iteratively find better estimates, $\theta_1^i, \theta_2^i, \ldots, \theta_q^i$, that hopefully converge to the least squares estimates.
Steepest Descent
• The steepest descent method focuses on determining the values of $\theta_1, \theta_2, \ldots, \theta_q$ that minimize the sum of squares function, $S(\theta_1, \theta_2, \ldots, \theta_q)$.
• The basic idea is to determine, from an initial point $\theta_1^0, \theta_2^0, \ldots, \theta_q^0$ and the tangent plane to $S$ at this point, the vector along which the function $S$ decreases at the fastest rate.
• The method of steepest descent then moves from this initial point along the direction of steepest descent until the value of $S$ stops decreasing.
Steepest Descent
• It uses that point, $\theta_1^1, \theta_2^1, \ldots, \theta_q^1$, as the next approximation to the value that minimizes $S(\theta_1, \theta_2, \ldots, \theta_q)$.
• The procedure then continues until the successive approximations arrive at a point where the sum of squares function $S$ is minimized.
• At that point, the tangent plane to $S$ will be horizontal and there will be no direction of steepest descent.
To find a local minimum of a function using steepest descent, one takes steps proportional to the negative of the gradient of the function at the current point.
Steepest Descent
Initialize $k = 0$, choose $x_0$.

While $k < k_{\max}$:

$x_k = x_{k-1} - \lambda\, \nabla F(x_{k-1})$, where $\nabla F$ is the gradient and $\lambda$ the step size.
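The loop above can be sketched directly in code. A minimal Python version on a one-dimensional quadratic objective (the objective $F(t) = (t-3)^2$, its gradient, and the fixed step size are illustrative choices):

```python
# Steepest descent with a fixed step size lam on the one-dimensional
# objective F(t) = (t - 3)^2, whose gradient is F'(t) = 2*(t - 3).
# The minimum is at t = 3.
def steepest_descent(grad, x0, lam=0.1, kmax=200):
    x = x0
    for _ in range(kmax):
        x = x - lam * grad(x)   # x_k = x_{k-1} - lam * grad F(x_{k-1})
    return x

x_star = steepest_descent(lambda t: 2.0 * (t - 3.0), x0=0.0)
print(x_star)   # approximately 3.0
```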
Steepest Descent
• While, theoretically, the steepest descent method will converge, in practice it may do so with agonizing slowness after some rapid initial progress.
• Slow convergence is particularly likely when the $S(\theta_1, \ldots, \theta_q)$ contours are highly curved; it happens when the path of steepest descent zigzags slowly along a narrow ridge, each iteration bringing only a slight reduction in $S$.
• A further disadvantage of the steepest descent method is that it is not scale invariant.
• The steepest descent method is, on the whole, slightly less favored than the linearization method (described later) but works satisfactorily for many nonlinear problems.
Gradient descent is a local optimization method
Least squares in general
Most optimization problems can be formulated as a nonlinear least squares problem:

$x^* = \arg\min_x \frac{1}{2}\, f(x)^T f(x) = \arg\min_x \frac{1}{2} \sum_{i=1}^{m} \left( f_i(x) \right)^2$

where $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$ are given functions, and $m \geq n$.
Newton’s Method
Quadratic approximation:

$f(x + \Delta x) \approx f(x) + f'(x)\, \Delta x + \frac{1}{2} f''(x)\, \Delta x^2$

What's the minimum solution of the quadratic approximation?

$\Delta x = -\frac{f'(x)}{f''(x)}$
Newton’s Method
High dimensional case: what's the optimal direction?

$F(x + \Delta x) \approx F(x) + \nabla F(x)^T \Delta x + \frac{1}{2} \Delta x^T H(x)\, \Delta x$

$\Delta x \approx -H(x)^{-1}\, \nabla F(x)$
Newton’s Method
Initialize $k = 0$, choose $x_0$.

While $k < k_{\max}$:

$x_k = x_{k-1} - \lambda\, H(x_{k-1})^{-1}\, \nabla F(x_{k-1})$
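A one-dimensional sketch of this iteration, where the Hessian reduces to the scalar second derivative (the objective $F(x) = x^4 - 2x^2$, which has minima at $x = \pm 1$, the starting point, and $\lambda = 1$ are illustrative choices):

```python
# Newton's method in one dimension: x_{k+1} = x_k - F'(x_k)/F''(x_k),
# applied to F(x) = x^4 - 2x^2 (minima at x = +1 and x = -1).
def newton(fprime, fsecond, x0, kmax=50):
    x = x0
    for _ in range(kmax):
        x = x - fprime(x) / fsecond(x)   # Newton step
    return x

x_min = newton(lambda x: 4*x**3 - 4*x,   # F'(x)
               lambda x: 12*x**2 - 4,    # F''(x), the 1-D "Hessian"
               x0=2.0)
print(x_min)   # approximately 1.0
```

Note the quadratic convergence: starting from $x_0 = 2$, the iterates reach machine precision in a handful of steps, far faster than steepest descent on the same problem.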
Newton’s Method
• Finding the inverse of the Hessian matrix is often expensive.
• Approximation methods are often used:
• conjugate gradient method
• quasi-Newton method
Newton’s Method
$\min_x f(x): \quad x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}, \quad \text{i.e.} \quad x_{k+1} = x_k - H^{-1}\, \nabla f$

[Figure: $f(x)$ and its quadratic approximation $q(x)$ at $x_k$; the minimizer of $q(x)$ gives $x_{k+1}$.]
Newton’s Method
Let $f(x): \mathbb{R}^n \to \mathbb{R}$ be sufficiently smooth.

Taylor's approximation, for $x$ close to a point $a$:

$f(x) \approx f(a) + b^T (x - a) + \frac{1}{2} (x - a)^T H (x - a) + \text{h.o.t.}$

where $b = \nabla f(a)$ and $H = \nabla^2 f(a)$. Expanding the quadratic term, $(x-a)^T H (x-a) = x^T H x - 2 a^T H x + a^T H a$, so we can write

$q(x) = \frac{1}{2} x^T H x + c^T x + d, \quad \text{where } c = b - H a$

$\nabla q = 0 \;\Rightarrow\; Hx + c = 0 \;\Rightarrow\; x = -H^{-1} c = -H^{-1} b + a = a - H^{-1} b$

$x = a - H^{-1} b \quad \Rightarrow \quad x_{k+1} = x_k - H^{-1}\, \nabla f$
Newton’s Method
$\nabla q = 0 \Rightarrow Hx + c = 0$; for a minimum, $\nabla^2 q = H$ must be positive definite (a minimum if $H$ is PSD).

1) Initialize: $x_0$
2) Iterate: $x_{k+1} = x_k - H^{-1} g$, where $g = \nabla f(x_k)$ and $H = \nabla^2 f(x_k)$

Issues:
1) $H$ may fail to be PSD.
2) $H$ may not be invertible.
3) $H$ is difficult to compute in practice through numerical methods.
Recall: Non-linear least squares
$F(x) = \sum_{j=1}^{m} r_j(x)^2 = \lVert r(x) \rVert^2$

$r_j(x) = y_j - \hat{y}_j, \qquad r(x) = \left( r_1(x), r_2(x), \ldots, r_m(x) \right)^T$

The j-th component of the vector $r(x)$ is the residual.
Non-linear least squares
$\nabla F(x) = \sum_{j=1}^{m} r_j(x)\, \nabla r_j(x) = J(x)^T r(x)$

$\nabla^2 F(x) = \sum_{j=1}^{m} \nabla r_j(x)\, \nabla r_j(x)^T + \sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x) = J(x)^T J(x) + \sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x)$
Gauss-Newton Method
Gauss-Newton approximates the Hessian by dropping the second-order term:

$\nabla^2 F(x) \approx J(x)^T J(x)$, neglecting $\sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x)$.

This is justified because the residuals are small when close to the optimum.
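A minimal Gauss-Newton sketch for a one-parameter exponential model (the model $y = e^{\theta x}$, the data, and the function names are illustrative, not from the lecture; with one parameter, $J^T J$ is a scalar):

```python
import math

# Gauss-Newton for the one-parameter model y = exp(theta * x):
# theta_{k+1} = theta_k - (J^T J)^{-1} J^T r, where r is the residual
# vector and J the Jacobian of r with respect to theta.
def gauss_newton(x, y, theta0, kmax=50):
    theta = theta0
    for _ in range(kmax):
        r = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]  # residuals
        J = [-xi * math.exp(theta * xi) for xi in x]             # dr/dtheta
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        theta = theta - Jtr / JtJ                                # GN step
    return theta

x = [0.1, 0.5, 1.0, 1.5]
y = [math.exp(0.7 * xi) for xi in x]   # data generated with theta = 0.7
theta_hat = gauss_newton(x, y, theta0=0.2)
print(theta_hat)   # approximately 0.7
```

Because the data here are noise-free, the residuals vanish at the optimum and the $J^T J$ approximation of the Hessian is exact there, which is exactly the regime where Gauss-Newton works best.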
Let's talk about curvatures..

Nope, it's about space-time curvatures, dude.

Newton does not have a good track record of accounting for curvatures.

Newton's method cannot use negative curvature.

Are you serious right now! Let's see you try to invent calculus..
Newton’s method cannot use negative curvature
• We can progress if we use a positive definite approximation of the Hessian matrix of f(x).
• One possibility is to approximate H by the identity matrix I (always PD):
• This is the same as steepest descent: $x_{k+1} = x_k - \Delta$, with $\Delta = \lambda g$.
• Too slow, plus convergence issues.
• Instead use $\tilde{H} = H + \lambda I$:
• High value of $\lambda$ == steepest (gradient) descent.
• Low value of $\lambda$ == Newton or Gauss-Newton method.

$x_{k+1} = x_k - \tilde{H}^{-1} g$
Levenberg-Marquardt Method
Illustration of Levenberg-Marquardt gradient descent:

[Figure: contour lines of the objective, showing the steepest descent direction and the Newton direction; increasing $\lambda$ tilts the step toward steepest descent, decreasing $\lambda$ tilts it toward the Newton direction.]
Levenberg-Marquardt Method
1) Adapt the value of λ during the optimization.
2) If the iteration was successful (F(xk+1) < F(xk))
a) Decrease λ and try to use as much curvature information as possible.
3) If the previous iteration was unsuccessful (F(xk+1) > F(xk))
a) Increase λ and use only basic gradient information.
4) Trust Region Algorithm
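These λ-adaptation rules can be sketched for the same kind of one-parameter problem. The model $y = e^{\theta x}$, the data, and the halving/doubling factors are illustrative choices (MATLAB's `lsqnonlin` offers a production-quality Levenberg-Marquardt option):

```python
import math

# Levenberg-Marquardt sketch for the scalar model y = exp(theta*x):
# step = -(J^T J + lam)^(-1) J^T r. lam shrinks after a successful step
# (more Newton-like) and grows after a failed one (more gradient-like).
def sse(theta, x, y):
    return sum((yi - math.exp(theta * xi)) ** 2 for xi, yi in zip(x, y))

def levenberg_marquardt(x, y, theta0, lam=1.0, kmax=100):
    theta = theta0
    for _ in range(kmax):
        r = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]
        J = [-xi * math.exp(theta * xi) for xi in x]
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        step = -Jtr / (JtJ + lam)              # damped Gauss-Newton step
        if sse(theta + step, x, y) < sse(theta, x, y):
            theta += step                      # success: accept the step
            lam *= 0.5                         # trust the curvature more
        else:
            lam *= 2.0                         # failure: lean on the gradient
    return theta

x = [0.1, 0.5, 1.0, 1.5]
y = [math.exp(0.7 * xi) for xi in x]
theta_hat = levenberg_marquardt(x, y, theta0=0.2)
print(theta_hat)   # approximately 0.7
```

Rejected steps leave $\theta$ unchanged and only increase $\lambda$, which is what makes the method robust to a poor initial guess.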
[Figure: Levenberg-Marquardt iterations from 3 different initial points.]
Stopping Criteria

Criterion 1: reached the number of iterations specified by the user: $k > k_{\max}$

Criterion 2: the current function value is smaller than a user-specified threshold: $F(x_k) < \sigma_{user}$

Criterion 3: the change in function value is smaller than a user-specified threshold: $\lVert F(x_k) - F(x_{k-1}) \rVert < \varepsilon_{user}$
Multi-start search
• Several points are used as initial guesses for the regression, and the regression is performed for each point.
1) Choose them randomly.
2) Choose them within some neighborhood of the nominal values.
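A multi-start wrapper can be sketched as follows. The objective (which deliberately has two minima, at $t = \pm 1$), the use of plain gradient descent as the local search, and all numeric choices are illustrative:

```python
import random

# Multi-start local search sketch: run a local minimizer (here, plain
# gradient descent) from several random initial guesses, keep the best.
def f(t):
    return (t * t - 1.0) ** 2          # minima at t = -1 and t = +1

def grad(t):
    return 4.0 * t * (t * t - 1.0)     # gradient of f

def local_search(t0, lam=0.01, kmax=500):
    t = t0
    for _ in range(kmax):
        t -= lam * grad(t)             # steepest descent step
    return t

random.seed(0)
starts = [random.uniform(-2.0, 2.0) for _ in range(5)]   # random guesses
candidates = [local_search(t0) for t0 in starts]
best = min(candidates, key=f)          # keep the best local result
print(best)   # close to -1 or +1, depending on the draws
```

Each restart may land in a different basin of attraction; keeping the candidate with the smallest objective value is what gives multi-start its chance of finding the global minimum.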