TRANSCRIPT
Lecture 6
Principles of Modeling for Cyber-Physical Systems
Instructor: Madhur Behl
Principles of Modeling for CPS – Fall 2018 Madhur Behl [email protected] 1
Parameter estimation
But first..
• Worksheet 3 is out:
• Towards Matlab implementation of the model.
• Nominal values of parameters from EnergyPlus IDF file.
• Specify model structure in Matlab.
• Collect training data.
• Use the templates provided to save time.
• Due in 1.5 weeks: Thursday, Oct 4, by 2:00pm.
[Figure: RC-network (resistor-capacitor) thermal circuit of a building zone, with sub-circuits for the [ExternalWalls], [Ceiling], [Floor], [InternalWalls], and [Windows]. Each surface is modeled by conductances (1/U terms) and capacitances (C terms); inputs include solar, radiative, convective, and sensible heat gains (Q̇ terms), the ambient temperature Ta, and the zone temperature Tz.]
How to find the values of the parameters?
Previously..
Parameter estimation overview
• Simple Linear Regression
• Least squares
• Non-linear least squares
• State-space sum of squared errors
• Non-linear optimization (estimation) methods
• Global and local search
•MATLAB implementations
Simple Linear Regression
Suppose we collect some data and want to determine the relation between the observed values, y, and the independent variable, x:
[Figure: scatter plot of the response y (dependent variable) versus the predictor x (independent variable).]
We can model the data using a linear model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

where $y_i$ is the observed response, $\beta_0$ the unknown intercept, $\beta_1$ the unknown slope, and $\varepsilon_i$ the unknown random error.
Simple Linear Regression
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0$ and $\beta_1$ are the parameters of this linear model.

• We don't know the true values of the parameters.
• We estimate them using the assumed model and the observations (data).
Simple Linear Regression
• Estimate b0 and b1 to obtain the best fit of the simulated values to the observations.
• One method: Minimize sum of squared errors, or residuals.
[Figure: scatter plot with the fitted line and the residuals.]

$\hat{y} = b_0 + b_1 x$ (fitted line)

$e_i = y_i - \hat{y}_i$ (residual)

$y_i = b_0 + b_1 x_i + e_i$ (observed response)

where $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$.
Simple Linear Regression
[Figure: scatter plot with the fitted line, the observed responses $y_i$, and the simulated values $\hat{y}_i$.]

Sum of squared residuals:

$S(b_0, b_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

To minimize: set $\frac{\partial S}{\partial b_0} = 0$ and $\frac{\partial S}{\partial b_1} = 0$.
Simple Linear Regression
This results in the normal equations:

$b_0\, n + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
• Solve these equations to obtain expressions for b0 and b1, the parameter estimates that give the best fit of the simulated and observed values.
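Solving the two normal equations by elimination can be sketched numerically. A minimal Python version (the course uses MATLAB, but the arithmetic is identical; the data points and function name below are made up for illustration):

```python
# A sketch of solving the two normal equations for b0 and b1 by
# elimination. The data points are made up for illustration.
def solve_normal_equations(x, y):
    n = len(x)
    Sx, Sy = sum(x), sum(y)
    Sxx = sum(xi * xi for xi in x)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Eliminate b0 between the two equations:
    b1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx)
    b0 = (Sy - b1 * Sx) / n       # back-substitute into the first equation
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]     # roughly y = 2x
b0, b1 = solve_normal_equations(x, y)
print(b0, b1)                      # intercept close to 0, slope close to 2
```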
Linear Regression in Matrix Form

Linear regression model: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, \ldots, n$ $\;\Rightarrow\;$ $y = Xb + e$

$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ (vector of observed values), $\quad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$ (matrix of predictors/features),

$b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$ (vector of parameters), $\quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$ (vector of residuals)
Linear Regression in Matrix Form

Linear regression model: $y_i = b_0 + b_1 x_i + e_i$, $i = 1, \ldots, n$ $\;\Rightarrow\;$ $y = Xb + e$

• The normal equations ($b'$ is the vector of least-squares estimates of $b$), obtained using summations and setting the derivative to 0, written in matrix notation:

$X^T X b' = X^T y \;\Rightarrow\; b' = (X^T X)^{-1} X^T y$
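The matrix form can be checked by hand for the two-parameter model, where $X^T X$ is a 2×2 matrix that can be inverted in closed form. A sketch (the data are made up; in practice one would use a linear-algebra routine such as MATLAB's backslash `X\y` or `numpy.linalg.lstsq`):

```python
# A sketch of solving the normal equations X^T X b' = X^T y for the
# two-parameter model y = b0 + b1*x, inverting the 2x2 matrix X^T X in
# closed form. Data values are made up for illustration.
def normal_equations_fit(x, y):
    n = len(x)
    # X^T X = [[n, Sx], [Sx, Sxx]],  X^T y = [Sy, Sxy]
    Sx = sum(x)
    Sxx = sum(xi * xi for xi in x)
    Sy = sum(y)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * Sxx - Sx * Sx        # determinant of X^T X
    b0 = (Sxx * Sy - Sx * Sxy) / det
    b1 = (n * Sxy - Sx * Sy) / det
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 1 + 2x
b0, b1 = normal_equations_fit(x, y)
print(b0, b1)                     # → 1.0 2.0
```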
Linear versus Nonlinear Models
Linear models: Sensitivities of the output are not a function of the model parameters:
$\frac{\partial \hat{y}_i}{\partial b_0} = 1$ and $\frac{\partial \hat{y}_i}{\partial b_1} = x_i$; recall $\hat{y}_i = b_0 + b_1 x_i$ and $X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$

[Figure: surface plot of $S(b)$ over two parameters $b_1$ and $b_2$.]
Linear versus Nonlinear parameters
[Figure: elliptical level sets of $S(b)$ over two parameters $b_1$ and $b_2$.]

• Linear models have elliptical objective function surfaces.
• i.e. the level sets of the objective function (sum of squared errors) are ellipses.
• With two parameters: one step to get to the minimum.
Nonlinear parametric models: Sensitivities are a function of the model parameters.
Nonlinearity is in parameter space.
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$

Elements of $A$, $B$, $C$, and $D$ could be non-linear in the parameter $\theta$.
Nonlinear Estimation
Suppose that we have collected data on the output/response Y (n samples),
◦ $(y_1, y_2, \ldots, y_n)$

corresponding to n sets of values of the independent variables/predictors/features $X_1, X_2, \ldots, X_p$:

◦ $(x_{11}, x_{21}, \ldots, x_{p1})$,
◦ $(x_{12}, x_{22}, \ldots, x_{p2})$,
◦ ... and
◦ $(x_{1n}, x_{2n}, \ldots, x_{pn})$.
Nonlinear Estimation
For possible values $\theta_1, \theta_2, \ldots, \theta_q$ of the parameters, the residual sum of squares function is

$S(\theta_1, \theta_2, \ldots, \theta_q) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ y_i - f(x_{1i}, x_{2i}, \ldots, x_{pi} \mid \theta_1, \theta_2, \ldots, \theta_q) \right]^2$

where $\hat{y}_i = f(x_{1i}, x_{2i}, \ldots, x_{pi} \mid \theta_1, \theta_2, \ldots, \theta_q)$.
Nonlinear Estimation
The least squares estimates of $\theta_1, \theta_2, \ldots, \theta_q$ are the values which minimize $S(\theta_1, \theta_2, \ldots, \theta_q)$.
Nonlinear Estimation
To find the least squares estimate we need to determine where all the derivatives of $S(\theta_1, \theta_2, \ldots, \theta_q)$ with respect to each parameter $\theta_1, \theta_2, \ldots, \theta_q$ are equal to zero.

This will involve terms with partial derivatives of the non-linear function f:

$\frac{\partial f(\ldots)}{\partial \theta_1}, \; \frac{\partial f(\ldots)}{\partial \theta_2}, \; \ldots, \; \frac{\partial f(\ldots)}{\partial \theta_q}$
Nonlinear Estimation
Closed form analytical solutions are not possible.
$\frac{\partial f(\ldots)}{\partial \theta_1}, \; \frac{\partial f(\ldots)}{\partial \theta_2}, \; \ldots, \; \frac{\partial f(\ldots)}{\partial \theta_q}$
It is usually necessary to develop an iterative technique for solving them.
Recall..
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$

$\hat{y}(k) = g(\hat{x}(k),\, u(k),\, \theta_1, \ldots, \theta_q)$
How can we compute the sum of squared error for state-space models ?
$x(k+1) = A_\theta\, x(k) + B_\theta\, u(k)$

$y(k) = C_\theta\, x(k) + D_\theta\, u(k)$
Consider the LTI model
sum of squared error for state-space models
Given $x(0) = x_0$, and data $\{u(k), y(k)\}$, $k = 0, \ldots, N-1$:

$y(0) = C_\theta\, x(0) + D_\theta\, u(0)$

$x(1) = A_\theta\, x(0) + B_\theta\, u(0)$

$y(1) = C_\theta\, x(1) + D_\theta\, u(1) = C_\theta A_\theta\, x(0) + C_\theta B_\theta\, u(0) + D_\theta\, u(1)$

$x(2) = A_\theta\, x(1) + B_\theta\, u(1) = A_\theta A_\theta\, x(0) + A_\theta B_\theta\, u(0) + B_\theta\, u(1)$

$y(2) = C_\theta\, x(2) + D_\theta\, u(2) = C_\theta A_\theta A_\theta\, x(0) + C_\theta A_\theta B_\theta\, u(0) + C_\theta B_\theta\, u(1) + D_\theta\, u(2)$
sum of squared error for state-space models
$\begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(N-1) \end{bmatrix} = \Gamma\, x(0) + \Phi \begin{bmatrix} u(0) \\ u(1) \\ \vdots \\ u(N-1) \end{bmatrix}$

For a given estimate of $\theta$, this is the $\hat{y}$ vector in $S(\theta_1, \theta_2, \ldots, \theta_q) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
sum of squared error for state-space models
$\Gamma = \begin{bmatrix} C_\theta \\ C_\theta A_\theta \\ \vdots \\ C_\theta A_\theta^{N-1} \end{bmatrix}, \qquad \Phi = \begin{bmatrix} D_\theta & 0 & \cdots & \\ C_\theta B_\theta & D_\theta & 0 & \cdots \\ \vdots & & \ddots & \\ C_\theta A_\theta^{N-2} B_\theta & \cdots & C_\theta B_\theta & D_\theta \end{bmatrix}$
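The sum of squared errors $S(\theta)$ for a state-space model can also be computed by simulating the model forward, rather than forming the stacked matrices explicitly. A minimal Python sketch for a scalar model (all parameter values, inputs, and names are made up for illustration):

```python
# A sketch of the sum-of-squared-errors computation for a scalar LTI
# state-space model x(k+1) = a*x(k) + b*u(k), y(k) = c*x(k) + d*u(k),
# viewed as a function of the parameter vector theta = (a, b, c, d).
def sse(theta, x0, u, y_meas):
    a, b, c, d = theta
    x, s = x0, 0.0
    for k in range(len(u)):
        y_hat = c * x + d * u[k]       # predicted output at step k
        s += (y_meas[k] - y_hat) ** 2  # accumulate squared error
        x = a * x + b * u[k]           # state update
    return s

# Generate data from a "true" parameter value, then evaluate S(theta):
theta_true = (0.9, 0.5, 2.0, 0.1)
u = [1.0] * 20
x, y_meas = 0.0, []
for k in range(20):
    y_meas.append(theta_true[2] * x + theta_true[3] * u[k])
    x = theta_true[0] * x + theta_true[1] * u[k]

print(sse(theta_true, 0.0, u, y_meas))            # zero at the true parameters
print(sse((0.8, 0.5, 2.0, 0.1), 0.0, u, y_meas))  # positive for a wrong 'a'
```

An optimizer would then search over $\theta$ to drive this quantity down, which is exactly the setting of the estimation methods below.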
Nonlinear Estimation
$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in \Theta} V_N(\theta, Z^N)$

Let $Z^N$ be the given data-set $\{u_k, y_k,\; k = 1, \ldots, N\}$.

$V_N(\theta, Z^N)$ is the squared error, i.e. $V_N(\theta, Z^N) = \sum_{k=1}^{N} \varepsilon_k(\theta)^T \varepsilon_k(\theta)$

$\varepsilon_k(\theta) = y_k - \hat{y}_k(\theta)$: measured minus predicted (for a particular value of $\theta$).
Non-linear least squares
We will cover the following methods:
1) Steepest descent (or gradient descent) and Newton's method,
2) Gauss-Newton and linearization, and
3) Levenberg-Marquardt's procedure.

1. In each case an iterative procedure is used to find the least squares estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q$.
2. That is, initial estimates $\theta_1^0, \theta_2^0, \ldots, \theta_q^0$ for these values are determined.
3. Then iteratively find better estimates, $\theta_1^i, \theta_2^i, \ldots, \theta_q^i$, that hopefully converge to the least squares estimates.
Steepest Descent
• The steepest descent method focuses on determining the values of $\theta_1, \theta_2, \ldots, \theta_q$ that minimize the sum of squares function, $S(\theta_1, \theta_2, \ldots, \theta_q)$.
• The basic idea is to determine, from an initial point $\theta_1^0, \theta_2^0, \ldots, \theta_q^0$ and the tangent plane to $S$ at this point, the vector along which the function $S$ decreases at the fastest rate.
• The method of steepest descent then moves from this initial point along the direction of steepest descent until the value of $S$ stops decreasing.
Steepest Descent
• It uses that point, $\theta_1^1, \theta_2^1, \ldots, \theta_q^1$, as the next approximation to the value that minimizes $S(\theta_1, \theta_2, \ldots, \theta_q)$.
• The procedure then continues until the successive approximations arrive at a point where the sum of squares function $S$ is minimized.
• At that point, the tangent plane to $S$ will be horizontal and there will be no direction of steepest descent.
To find a local minimum of a function using steepest descent, one takes steps proportional to the negative of the gradient of the function at the current point.
Steepest Descent
Initialize $k = 0$, choose $x_0$.

While $k < k_{\max}$:

$x_k = x_{k-1} - \lambda\, \nabla F(x_{k-1})$, where $\nabla F$ is the gradient and $\lambda$ the step size.
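The loop above can be sketched directly in code. A minimal Python version on a one-dimensional quadratic objective (the objective $F(t) = (t-3)^2$, its gradient, and the fixed step size are illustrative choices):

```python
# Steepest descent with a fixed step size lam on the one-dimensional
# objective F(t) = (t - 3)^2, whose gradient is F'(t) = 2*(t - 3).
# The minimum is at t = 3.
def steepest_descent(grad, x0, lam=0.1, kmax=200):
    x = x0
    for _ in range(kmax):
        x = x - lam * grad(x)   # x_k = x_{k-1} - lam * grad F(x_{k-1})
    return x

x_star = steepest_descent(lambda t: 2.0 * (t - 3.0), x0=0.0)
print(x_star)   # approximately 3.0
```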
Steepest Descent
• While, theoretically, the steepest descent method will converge, in practice it may do so with agonizing slowness after some rapid initial progress.
• Slow convergence is particularly likely when the $S(\theta_1, \ldots, \theta_q)$ contours are highly curved; it happens when the path of steepest descent zigzags slowly along a narrow ridge, each iteration bringing only a slight reduction in $S$.
• A further disadvantage of the steepest descent method is that it is not scale invariant.
• The steepest descent method is, on the whole, slightly less favored than the linearization method (described later) but works satisfactorily for many nonlinear problems.
Gradient descent is a local optimization method
Least squares in general
Most optimization problems can be formulated as a nonlinear least squares problem:

$x^* = \arg\min_x \frac{1}{2}\, f(x)^T f(x) = \arg\min_x \frac{1}{2} \sum_{i=1}^{m} \left( f_i(x) \right)^2$

where $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$ are given functions, and $m \geq n$.
Newton’s Method
Quadratic approximation:

$f(x + \Delta x) \approx f(x) + f'(x)\, \Delta x + \frac{1}{2} f''(x)\, \Delta x^2$

What's the minimum solution of the quadratic approximation?

$\Delta x = -\frac{f'(x)}{f''(x)}$
Newton’s Method
High dimensional case: what's the optimal direction?

$F(x + \Delta x) \approx F(x) + \nabla F(x)^T \Delta x + \frac{1}{2} \Delta x^T H(x)\, \Delta x$

$\Delta x \approx -H(x)^{-1}\, \nabla F(x)$
Newton’s Method
Initialize $k = 0$, choose $x_0$.

While $k < k_{\max}$:

$x_k = x_{k-1} - \lambda\, H(x_{k-1})^{-1}\, \nabla F(x_{k-1})$
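A one-dimensional sketch of this iteration, where the Hessian reduces to the scalar second derivative (the objective $F(x) = x^4 - 2x^2$, which has minima at $x = \pm 1$, the starting point, and $\lambda = 1$ are illustrative choices):

```python
# Newton's method in one dimension: x_{k+1} = x_k - F'(x_k)/F''(x_k),
# applied to F(x) = x^4 - 2x^2 (minima at x = +1 and x = -1).
def newton(fprime, fsecond, x0, kmax=50):
    x = x0
    for _ in range(kmax):
        x = x - fprime(x) / fsecond(x)   # Newton step
    return x

x_min = newton(lambda x: 4*x**3 - 4*x,   # F'(x)
               lambda x: 12*x**2 - 4,    # F''(x), the 1-D "Hessian"
               x0=2.0)
print(x_min)   # approximately 1.0
```

Note the quadratic convergence: starting from $x_0 = 2$, the iterates reach machine precision in a handful of steps, far faster than steepest descent on the same problem.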
Newton’s Method
• Finding the inverse of the Hessian matrix is often expensive.
• Approximation methods are often used:
• conjugate gradient method
• quasi-Newton method
Newton’s Method
$\min_x f(x): \quad x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}, \quad \text{i.e.} \quad x_{k+1} = x_k - H^{-1}\, \nabla f$

[Figure: $f(x)$ and its quadratic approximation $q(x)$ at $x_k$; the minimizer of $q(x)$ gives $x_{k+1}$.]
Newton’s Method
Let $f(x): \mathbb{R}^n \to \mathbb{R}$ be sufficiently smooth.

Taylor's approximation, for $x$ close to a point $a$:

$f(x) \approx f(a) + b^T (x - a) + \frac{1}{2} (x - a)^T H (x - a) + \text{h.o.t.}$

where $b = \nabla f(a)$ and $H = \nabla^2 f(a)$. Expanding the quadratic term, $(x-a)^T H (x-a) = x^T H x - 2 a^T H x + a^T H a$, so we can write

$q(x) = \frac{1}{2} x^T H x + c^T x + d, \quad \text{where } c = b - H a$

$\nabla q = 0 \;\Rightarrow\; Hx + c = 0 \;\Rightarrow\; x = -H^{-1} c = -H^{-1} b + a = a - H^{-1} b$

$x = a - H^{-1} b \quad \Rightarrow \quad x_{k+1} = x_k - H^{-1}\, \nabla f$
Newton’s Method
$\nabla q = 0 \Rightarrow Hx + c = 0$; for a minimum, $\nabla^2 q = H$ must be positive definite (a minimum if $H$ is PSD).

1) Initialize: $x_0$
2) Iterate: $x_{k+1} = x_k - H^{-1} g$, where $g = \nabla f(x_k)$ and $H = \nabla^2 f(x_k)$

Issues:
1) $H$ may fail to be PSD.
2) $H$ may not be invertible.
3) $H$ is difficult to compute in practice through numerical methods.
Recall: Non-linear least squares
$F(x) = \sum_{j=1}^{m} r_j(x)^2 = \lVert r(x) \rVert^2$

$r_j(x) = y_j - \hat{y}_j, \qquad r(x) = \left( r_1(x), r_2(x), \ldots, r_m(x) \right)^T$

The j-th component of the vector $r(x)$ is the residual.
Non-linear least squares
$\nabla F(x) = \sum_{j=1}^{m} r_j(x)\, \nabla r_j(x) = J(x)^T r(x)$

$\nabla^2 F(x) = \sum_{j=1}^{m} \nabla r_j(x)\, \nabla r_j(x)^T + \sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x) = J(x)^T J(x) + \sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x)$
Gauss-Newton Method
Gauss-Newton approximates the Hessian by dropping the second-order term:

$\nabla^2 F(x) \approx J(x)^T J(x)$, neglecting $\sum_{j=1}^{m} r_j(x)\, \nabla^2 r_j(x)$.

This is justified because the residuals are small when close to the optimum.
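A minimal Gauss-Newton sketch for a one-parameter exponential model (the model $y = e^{\theta x}$, the data, and the function names are illustrative, not from the lecture; with one parameter, $J^T J$ is a scalar):

```python
import math

# Gauss-Newton for the one-parameter model y = exp(theta * x):
# theta_{k+1} = theta_k - (J^T J)^{-1} J^T r, where r is the residual
# vector and J the Jacobian of r with respect to theta.
def gauss_newton(x, y, theta0, kmax=50):
    theta = theta0
    for _ in range(kmax):
        r = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]  # residuals
        J = [-xi * math.exp(theta * xi) for xi in x]             # dr/dtheta
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        theta = theta - Jtr / JtJ                                # GN step
    return theta

x = [0.1, 0.5, 1.0, 1.5]
y = [math.exp(0.7 * xi) for xi in x]   # data generated with theta = 0.7
theta_hat = gauss_newton(x, y, theta0=0.2)
print(theta_hat)   # approximately 0.7
```

Because the data here are noise-free, the residuals vanish at the optimum and the $J^T J$ approximation of the Hessian is exact there, which is exactly the regime where Gauss-Newton works best.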
Let's talk about curvatures..

Nope, it's about space-time curvatures, dude.

Newton does not have a good track record of accounting for curvatures.

Newton's method cannot use negative curvature.

Are you serious right now! Let's see you try to invent calculus..
Newton’s method cannot use negative curvature
• We can progress if we use a positive definite approximation of the Hessian matrix of f(x).
• One possibility is to approximate H by the identity matrix I (always PD):
• This is the same as steepest descent: $x_{k+1} = x_k - \Delta$, with $\Delta = \lambda g$.
• Too slow, plus convergence issues.
• Instead use $\tilde{H} = H + \lambda I$:
• High value of $\lambda$ == steepest (gradient) descent.
• Low value of $\lambda$ == Newton or Gauss-Newton method.

$x_{k+1} = x_k - \tilde{H}^{-1} g$
Levenberg-Marquardt Method
Illustration of Levenberg-Marquardt gradient descent:

[Figure: contour lines of the objective, showing the steepest descent direction and the Newton direction; increasing $\lambda$ tilts the step toward steepest descent, decreasing $\lambda$ tilts it toward the Newton direction.]
Levenberg-Marquardt Method
1) Adapt the value of λ during the optimization.
2) If the iteration was successful (F(xk+1) < F(xk))
a) Decrease λ and try to use as much curvature information as possible.
3) If the previous iteration was unsuccessful (F(xk+1) > F(xk))
a) Increase λ and use only basic gradient information.
4) Trust Region Algorithm
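These λ-adaptation rules can be sketched for the same kind of one-parameter problem. The model $y = e^{\theta x}$, the data, and the halving/doubling factors are illustrative choices (MATLAB's `lsqnonlin` offers a production-quality Levenberg-Marquardt option):

```python
import math

# Levenberg-Marquardt sketch for the scalar model y = exp(theta*x):
# step = -(J^T J + lam)^(-1) J^T r. lam shrinks after a successful step
# (more Newton-like) and grows after a failed one (more gradient-like).
def sse(theta, x, y):
    return sum((yi - math.exp(theta * xi)) ** 2 for xi, yi in zip(x, y))

def levenberg_marquardt(x, y, theta0, lam=1.0, kmax=100):
    theta = theta0
    for _ in range(kmax):
        r = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]
        J = [-xi * math.exp(theta * xi) for xi in x]
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        step = -Jtr / (JtJ + lam)              # damped Gauss-Newton step
        if sse(theta + step, x, y) < sse(theta, x, y):
            theta += step                      # success: accept the step
            lam *= 0.5                         # trust the curvature more
        else:
            lam *= 2.0                         # failure: lean on the gradient
    return theta

x = [0.1, 0.5, 1.0, 1.5]
y = [math.exp(0.7 * xi) for xi in x]
theta_hat = levenberg_marquardt(x, y, theta0=0.2)
print(theta_hat)   # approximately 0.7
```

Rejected steps leave $\theta$ unchanged and only increase $\lambda$, which is what makes the method robust to a poor initial guess.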
[Figure: Levenberg-Marquardt iterations from 3 different initial points.]
Stopping Criteria

Criterion 1: reached the number of iterations specified by the user: $k > k_{\max}$

Criterion 2: the current function value is smaller than a user-specified threshold: $F(x_k) < \sigma_{user}$

Criterion 3: the change in function value is smaller than a user-specified threshold: $\lVert F(x_k) - F(x_{k-1}) \rVert < \varepsilon_{user}$
Multi-start search
• Several points are used as initial guesses for the regression, and the regression is performed for each point.
1) Choose them randomly.
2) Choose them within some neighborhood of the nominal values.
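A multi-start wrapper can be sketched as follows. The objective (which deliberately has two minima, at $t = \pm 1$), the use of plain gradient descent as the local search, and all numeric choices are illustrative:

```python
import random

# Multi-start local search sketch: run a local minimizer (here, plain
# gradient descent) from several random initial guesses, keep the best.
def f(t):
    return (t * t - 1.0) ** 2          # minima at t = -1 and t = +1

def grad(t):
    return 4.0 * t * (t * t - 1.0)     # gradient of f

def local_search(t0, lam=0.01, kmax=500):
    t = t0
    for _ in range(kmax):
        t -= lam * grad(t)             # steepest descent step
    return t

random.seed(0)
starts = [random.uniform(-2.0, 2.0) for _ in range(5)]   # random guesses
candidates = [local_search(t0) for t0 in starts]
best = min(candidates, key=f)          # keep the best local result
print(best)   # close to -1 or +1, depending on the draws
```

Each restart may land in a different basin of attraction; keeping the candidate with the smallest objective value is what gives multi-start its chance of finding the global minimum.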