
Lecture 6

Principles of Modeling for Cyber-Physical Systems

Instructor: Madhur Behl

Principles of Modeling for CPS – Fall 2018 Madhur Behl [email protected] 1

Parameter estimation

But first..

• Worksheet 3 is out:
  • Towards the Matlab implementation of the model.
  • Nominal values of parameters from the EnergyPlus IDF file.
  • Specify the model structure in Matlab.
  • Collect training data.
  • Use the templates provided to save time.
  • Due in 1.5 weeks: Thursday, Oct 4, by 2:00pm.

[Figure: RC-network thermal model of a building zone with components [ExternalWalls], [Ceiling], [Floor], [InternalWalls], and [Windows]. Each component is modeled with thermal resistances (1/Ugw, 1/Ugi, 1/Ugo, …) and capacitances (Cgo, Cgi, …) connecting node temperatures (Tz, Ta, Tgo, Tgi, …), driven by solar (Q.sol), radiative (Q.rad), convective (Q.conv), and sensible (Q.sens) heat gains.]

How to find the values of the parameters ?


Previously..

Parameter estimation overview

• Simple Linear Regression

• Least squares

• Non-linear least squares

• State-space sum of squared errors

• Non-linear optimization (estimation) methods

• Global and local search

• MATLAB implementations

Simple Linear Regression

Suppose we collect some data and want to determine the relation between the observed values, y, and the independent variable, x:

[Figure: scatter plot of the response y (dependent variable) versus the predictor x (independent variable).]

We can model the data using a linear model:

y_i = β0 + β1 x_i + ε_i

where y_i is the observed response, β0 the unknown intercept, β1 the unknown slope, and ε_i the unknown random error.

Simple Linear Regression

β0 and β1 are the parameters of this linear model.

• We don't know the true values of the parameters.
• We estimate them using the assumed model and the observations (the data).

Simple Linear Regression


• Estimate b0 and b1 to obtain the best fit of the simulated values to the observations.

• One method: Minimize sum of squared errors, or residuals.

[Figure: fitted line through the data, with the residuals as vertical distances to the line.]

Fitted line: y′ = b0 + b1 x

Residual: e_i = y_i − y′_i

So y_i = b0 + b1 x_i + e_i, where y_i is the observed response, b0 is the estimate of β0, b1 is the estimate of β1, and e_i is the residual.

Simple Linear Regression

y′_i are the simulated (fitted) values.

Sum of squared residuals:

S(b0, b1) = Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i − y′_i)² = Σ_{i=1}^{n} (y_i − b0 − b1 x_i)²

To minimize: set ∂S/∂b0 = 0 and ∂S/∂b1 = 0.

Simple Linear Regression

This results in the normal equations:

n b0 + b1 Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} y_i
b0 Σ_{i=1}^{n} x_i + b1 Σ_{i=1}^{n} x_i² = Σ_{i=1}^{n} x_i y_i

• Solve these equations to obtain expressions for b0 and b1, the parameter estimates that give the best fit of the simulated and observed values.

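As a quick sanity check of the two normal equations, here is a minimal NumPy sketch (the course's own implementations are in MATLAB; the data values below are made up purely for illustration):

```python
import numpy as np

# Made-up sample data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Solve the two normal equations for b0 and b1:
#   n*b0      + b1*sum(x)   = sum(y)
#   b0*sum(x) + b1*sum(x^2) = sum(x*y)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b0 = (np.sum(y) - b1 * np.sum(x)) / n

residuals = y - (b0 + b1 * x)
S = np.sum(residuals**2)   # sum of squared residuals at the estimates
print(b0, b1, S)
```

For this data the estimates come out to b0 = 0.14 and b1 = 1.96.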

Linear regression model: y_i = b0 + b1 x_i + e_i, i = 1…n  →  y = X b + e

Linear Regression in Matrix Form

y = [y_1; y_2; ⋮; y_n]   (vector of observed values)
X = [1 x_1; 1 x_2; ⋮; 1 x_n]   (matrix of predictors/features)
b = [b0; b1]   (vector of parameters)
e = [e_1; e_2; ⋮; e_n]   (vector of residuals)

Linear regression model: y_i = b0 + b1 x_i + e_i, i = 1…n  →  y = X b + e

Linear Regression in Matrix Form

• The normal equations (b′ is the vector of least-squares estimates of b), obtained using summations and setting the derivatives to 0:

n b0 + b1 Σ x_i = Σ y_i
b0 Σ x_i + b1 Σ x_i² = Σ x_i y_i

Using matrix notation:

XᵀX b′ = Xᵀy   →   b′ = (XᵀX)⁻¹ Xᵀy
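In matrix form the estimate is one linear solve. A NumPy sketch with made-up data (same illustrative data as before):

```python
import numpy as np

# Made-up data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Build X = [1 x_i]: a column of ones for the intercept b0
X = np.column_stack([np.ones_like(x), x])

# Normal equations in matrix form: (X'X) b' = X'y  =>  b' = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)   # b[0] = intercept, b[1] = slope
```

Numerically, np.linalg.lstsq(X, y, rcond=None) is preferred over forming XᵀX explicitly, but the solve above mirrors the normal equations on the slide.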

Ordinary least squares


Linear versus Nonlinear Models


Linear models: Sensitivities of the output are not a function of the model parameters:

∂y′_i/∂b0 = 1   and   ∂y′_i/∂b1 = x_i

(recall y′_i = b0 + b1 x_i, and X = [1 x_1; 1 x_2; ⋮; 1 x_n])

[Figure: surface plot of the objective S(b) over the parameters b1 and b2.]

Linear versus Nonlinear parameters


• Linear models have elliptical objective function surfaces.

• i.e., the level sets of the objective function (sum of squared errors) are ellipses.

• With two parameters, one step reaches the minimum.

Nonlinear parametric models: Sensitivities are a function of the model parameters.

Nonlinearity is in parameter space.

x(k+1) = A_θ x(k) + B_θ u(k)
y(k) = C_θ x(k) + D_θ u(k)

Elements of A_θ, B_θ, C_θ, and D_θ could be non-linear in the parameter θ.

Nonlinear Estimation


Suppose that we have collected data on the output/response Y (n samples),

◦ (y_1, y_2, …, y_n)

corresponding to n sets of values of the independent variables/predictors/features X_1, X_2, …, X_p:

◦ (x_11, x_21, …, x_p1),
◦ (x_12, x_22, …, x_p2),
◦ …, and
◦ (x_1n, x_2n, …, x_pn).

Nonlinear Estimation

For possible values θ_1, θ_2, …, θ_q of the parameters, the residual sum of squares function is

S(θ_1, θ_2, …, θ_q) = Σ_{i=1}^{n} (y_i − ŷ_i)² = Σ_{i=1}^{n} [y_i − f(x_1i, x_2i, …, x_pi | θ_1, θ_2, …, θ_q)]²

where ŷ_i = f(x_1i, x_2i, …, x_pi | θ_1, θ_2, …, θ_q).

The least squares estimates of θ_1, θ_2, …, θ_q are the values which minimize S(θ_1, θ_2, …, θ_q).

Nonlinear Estimation

To find the least squares estimates we need to determine where all the derivatives of S(θ_1, θ_2, …, θ_q) with respect to each parameter θ_1, θ_2, …, θ_q are equal to zero.

This will involve terms with the partial derivatives of the non-linear function f:

∂f(…)/∂θ_1, ∂f(…)/∂θ_2, …, ∂f(…)/∂θ_q

Nonlinear Estimation

Closed-form analytical solutions are not possible.

It is usually necessary to develop an iterative technique for solving them.

Recall..

x(k+1) = A_θ x(k) + B_θ u(k)
y(k) = C_θ x(k) + D_θ u(k)

ŷ(k) = g(x̂(k), u(k), θ_1, …, θ_q)

How can we compute the sum of squared error for state-space models ?

Consider the LTI model:

x(k+1) = A_θ x(k) + B_θ u(k)
y(k) = C_θ x(k) + D_θ u(k)

sum of squared error for state-space models

Given x(0) = x_0 and inputs u(k), k = 0, 1, …, N−1:

y(0) = C_θ x(0) + D_θ u(0)
x(1) = A_θ x(0) + B_θ u(0)
y(1) = C_θ x(1) + D_θ u(1) = C_θ A_θ x(0) + C_θ B_θ u(0) + D_θ u(1)
x(2) = A_θ x(1) + B_θ u(1) = A_θ A_θ x(0) + A_θ B_θ u(0) + B_θ u(1)
y(2) = C_θ x(2) + D_θ u(2) = C_θ A_θ A_θ x(0) + C_θ A_θ B_θ u(0) + C_θ B_θ u(1) + D_θ u(2)

sum of squared error for state-space models

Stacking the outputs over N steps:

[y(0); y(1); ⋮; y(N−1)] = G x(0) + H [u(0); u(1); ⋮; u(N−1)]

For a given estimate of θ, this is the ŷ vector used in the sum of squared errors S(θ_1, θ_2, …, θ_q) = Σ_i (y_i − ŷ_i)².

sum of squared error for state-space models

where

G = [C_θ; C_θ A_θ; ⋮; C_θ A_θ^{N−1}]

H = [D_θ 0 0 ⋯ ; C_θ B_θ D_θ 0 ⋯ ; ⋮ ; C_θ A_θ^{N−2} B_θ ⋯ C_θ B_θ D_θ]
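The rollout can also be sketched directly: simulate the model for a candidate θ and accumulate the squared output error. The parameterization below (θ entering one entry of A, with B, C, D fixed) and the step input are made up for illustration:

```python
import numpy as np

def make_ss(theta):
    # Hypothetical parameterization: theta enters A; B, C, D are fixed
    A = np.array([[theta, 0.1], [0.0, 0.5]])
    B = np.array([[1.0], [0.5]])
    C = np.array([[1.0, 0.0]])
    D = np.array([[0.0]])
    return A, B, C, D

def simulate(theta, x0, U):
    # Roll out x(k+1) = A x(k) + B u(k), y(k) = C x(k) + D u(k)
    A, B, C, D = make_ss(theta)
    x, Y = x0, []
    for u in U:
        Y.append((C @ x + D @ u).item())
        x = A @ x + B @ u
    return np.array(Y)

def sse(theta, x0, U, y_meas):
    # S(theta): sum of squared output errors for this candidate theta
    return np.sum((y_meas - simulate(theta, x0, U)) ** 2)

x0 = np.zeros(2)
U = [np.array([1.0])] * 20            # step input, N = 20
y_meas = simulate(0.8, x0, U)         # "measured" data from true theta = 0.8
print(sse(0.8, x0, U, y_meas), sse(0.5, x0, U, y_meas))
```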

Nonlinear Estimation

Let Z_N be the given data set {u(k), y(k), k = 1, …, N}.

θ̂_N = θ̂_N(Z_N) = argmin_{θ ∈ Θ} V_N(θ, Z_N)

where V_N(θ, Z_N) is the squared error, i.e. V_N(θ, Z_N) = Σ_{k=1}^{N} ε_k(θ) ε_k(θ)ᵀ, with

ε_k(θ) = y_k − ŷ_k(θ)   (measured minus predicted, for a particular value of θ)

Non-linear least squares

We will cover the following methods:

1) Steepest descent (gradient descent) and Newton's method,
2) Gauss-Newton and linearization, and
3) Levenberg-Marquardt's procedure.

In each case an iterative procedure is used to find the least squares estimators θ̂_1, θ̂_2, …, θ̂_q:

1. Initial estimates θ_1⁰, θ_2⁰, …, θ_q⁰ are chosen.
2. Better estimates θ_1ⁱ, θ_2ⁱ, …, θ_qⁱ are found iteratively, hopefully converging to the least squares estimates.

Steepest Descent

• The steepest descent method focuses on determining the values of θ_1, θ_2, …, θ_q that minimize the sum of squares function S(θ_1, θ_2, …, θ_q).

• The basic idea is to determine, from an initial point θ_1⁰, θ_2⁰, …, θ_q⁰ and the tangent plane to S at this point, the vector along which S decreases at the fastest rate.

• The method of steepest descent then moves from this initial point along the direction of steepest descent until the value of S stops decreasing.

Steepest Descent

• It uses this point, θ_1¹, θ_2¹, …, θ_q¹, as the next approximation to the value that minimizes S(θ_1, θ_2, …, θ_q).

• The procedure then continues until the successive approximations arrive at a point where the sum of squares function S is minimized.

• At that point, the tangent plane to S will be horizontal and there will be no direction of steepest descent.

To find a local minimum of a function using steepest descent, one takes steps proportional to the negative of the gradient of the function at the current point.


Steepest Descent

Initialize k = 0, choose θ⁰.
While k < kmax:
    θ_k = θ_{k−1} − λ ∇S(θ_{k−1})   ← gradient step
    k = k + 1
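This loop, sketched for an illustrative quadratic S(θ) with a fixed step size λ (a real implementation would pick λ by a line search; the objective here is made up):

```python
import numpy as np

def S(theta):
    # Illustrative objective: a quadratic bowl with minimum at (1, 2)
    return (theta[0] - 1.0) ** 2 + 4.0 * (theta[1] - 2.0) ** 2

def grad_S(theta):
    return np.array([2.0 * (theta[0] - 1.0), 8.0 * (theta[1] - 2.0)])

theta = np.array([5.0, 5.0])    # theta^0: initial estimate
lam, kmax = 0.1, 200            # step size and iteration cap
for k in range(kmax):
    theta = theta - lam * grad_S(theta)   # step along the negative gradient
print(theta)    # approaches (1, 2)
```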


Steepest Descent

• While, theoretically, the steepest descent method will converge, in practice it may do so with agonizing slowness after some rapid initial progress.

• Slow convergence is particularly likely when the S(θ_1, θ_2, …, θ_q) contours are highly curved; it happens when the path of steepest descent zigzags slowly up a narrow ridge, each iteration bringing only a slight reduction in S.

• A further disadvantage of the steepest descent method is that it is not scale invariant.

• The steepest descent method is, on the whole, slightly less favored than the linearization method (described later) but will work satisfactorily for many nonlinear problems.

Steepest Descent


Gradient descent is a local optimization method


Least squares in general

Most optimization problems can be formulated as a nonlinear least squares problem:

x* = argmin_x ½ f(x)ᵀ f(x) = argmin_x ½ Σ_{i=1}^{m} (f_i(x))²

where f_i : ℝⁿ → ℝ, i = 1, …, m are given functions, and m ≥ n.

Newton’s Method

Quadratic approximation:

f(x + Δx) ≈ f(x) + f′(x) Δx + ½ f″(x) Δx²

What is the minimum of the quadratic approximation? Setting its derivative with respect to Δx to zero:

Δx = −f′(x) / f″(x)

Newton’s Method

High-dimensional case:

F(x + Δx) ≈ F(x) + ∇F(x)ᵀ Δx + ½ Δxᵀ H(x) Δx

What is the optimal direction?

Δx ≈ −H(x)⁻¹ ∇F(x)

Newton’s Method

Initialize k = 0, choose x_0.
While k < kmax:
    x_k = x_{k−1} − λ H(x_{k−1})⁻¹ ∇F(x_{k−1})
    k = k + 1
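The same loop with the Newton step, on an illustrative (made-up) quadratic F; as the earlier ellipse discussion suggests, a single Newton step lands exactly on the minimum of a quadratic:

```python
import numpy as np

def grad_F(x):
    # Gradient of F(x) = (x1 - 1)^2 + 4 (x2 - 2)^2
    return np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0)])

def hess_F(x):
    # Hessian is constant for a quadratic
    return np.array([[2.0, 0.0], [0.0, 8.0]])

x = np.array([5.0, 5.0])
for k in range(5):
    # Newton step: x <- x - H^{-1} grad F  (solve a linear system, don't invert H)
    x = x - np.linalg.solve(hess_F(x), grad_F(x))
print(x)    # (1, 2) already after the first step
```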

Newton’s Method

• Finding the inverse of the Hessian matrix is often expensive.

• Approximation methods are often used:
    • conjugate gradient method
    • quasi-Newton methods

Newton’s Method


min_x f(x):   x_{k+1} = x_k − f′(x_k) / f″(x_k)   (in general: x_{k+1} = x_k − H⁻¹ ∇f)

[Figure: f(x) and its quadratic approximation q(x) around x_k; the minimizer of q(x) gives x_{k+1}.]

Terminology



Newton’s Method

Let f(x) : ℝⁿ → ℝ be sufficiently smooth.

Taylor's approximation, for x close to a point a:

f(x) ≈ f(a) + gᵀ(x − a) + ½ (x − a)ᵀ H (x − a) + h.o.t.,   where g = ∇f(a), H = ∇²f(a)

Collecting terms (the quadratic part expands as xᵀHx − 2aᵀHx + aᵀHa) gives

q(x) = ½ xᵀHx + bᵀx + c,   where b = g − Ha

∇q = 0 ⇒ Hx + b = 0 ⇒ x = −H⁻¹b = −H⁻¹g + a = a − H⁻¹g

x = a − H⁻¹g   ⇒   x_{k+1} = x_k − H⁻¹ ∇f

Newton’s Method

∇q = 0 ⇒ Hx + b = 0. For a minimum we need ∇²f > 0; here ∇²f = H, so we have a minimum if H is PSD.

1) Initialize: x_0
2) Iterate: x_{k+1} = x_k − H⁻¹ g,   where g = ∇f(x_k), H = ∇²f(x_k)

Issues:
1) H may fail to be PSD.
2) H may not be invertible.
3) H is difficult to compute in practice through numerical methods.

Recall: Non-linear least squares

f(x) = ½ Σ_{j=1}^{m} r_j(x)² = ½ ‖r(x)‖²

r_j(x) = y_j − ŷ_j,   r(x) = (r_1(x), r_2(x), …, r_m(x))ᵀ

The j-th component of the vector r(x) is the residual.


Non-linear least squares

∇f(x) = Σ_{j=1}^{m} r_j(x) ∇r_j(x) = J(x)ᵀ r(x)

∇²f(x) = Σ_{j=1}^{m} ∇r_j(x) ∇r_j(x)ᵀ + Σ_{j=1}^{m} r_j(x) ∇²r_j(x)
       = J(x)ᵀ J(x) + Σ_{j=1}^{m} r_j(x) ∇²r_j(x)

Gauss-Newton Method

∇²f(x) = J(x)ᵀ J(x) + Σ_{j=1}^{m} r_j(x) ∇²r_j(x) ≈ J(x)ᵀ J(x)

The residuals r_j(x) are small when close to the optimum, so the second term can be dropped; the resulting step is x_{k+1} = x_k − (JᵀJ)⁻¹ Jᵀ r.
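A Gauss-Newton sketch for a hypothetical exponential model (the model, data, and starting point are all made up for illustration; note the step solves (JᵀJ)δ = Jᵀr rather than inverting JᵀJ):

```python
import numpy as np

def model(x, th):
    # Hypothetical model: f(x | th) = th1 * (1 - exp(-th2 * x))
    return th[0] * (1.0 - np.exp(-th[1] * x))

def residual(th, x, y):
    return y - model(x, th)

def jacobian(th, x):
    # J[j, :] = d r_j / d theta, with residual r_j = y_j - f(x_j, theta)
    J = np.empty((len(x), 2))
    J[:, 0] = -(1.0 - np.exp(-th[1] * x))
    J[:, 1] = -th[0] * x * np.exp(-th[1] * x)
    return J

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = model(x, np.array([2.0, 0.5]))   # noise-free data, true theta = (2, 0.5)

th = np.array([1.5, 0.6])            # initial guess
for k in range(15):
    r = residual(th, x, y)
    J = jacobian(th, x)
    # Gauss-Newton step: solve (J'J) delta = J'r, then theta <- theta - delta
    th = th - np.linalg.solve(J.T @ J, J.T @ r)
print(th)    # converges to (2, 0.5)
```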

Let's talk about curvature…

[Meme slides: "Nope, it's about space-time curvature, dude." / "Newton does not have a good track record of accounting for curvature." / "Are you serious right now?! Let's see you try to invent calculus…"]

Newton's method cannot use negative curvature.


Newton’s method cannot use negative curvature

• We can progress if we use a positive definite approximation of the Hessian matrix of f(x).

• One possibility is to approximate H by the identity matrix I (always PD).
    • This would be the same as steepest descent: x_{k+1} = x_k − λ g.
    • Too slow, plus convergence issues.

• Instead use H̃ = H + λI:
    • A high value of λ behaves like steepest (gradient) descent.
    • A low value of λ behaves like the Newton or Gauss-Newton method.

x_{k+1} = x_k − H̃⁻¹ g   i.e.   x_{k+1} = x_k − Δx

Levenberg-Marquardt Method


Illustration of Levenberg-Marquardt gradient descent

[Figure: contour lines of the objective, showing the steepest descent direction and the Newton direction; increasing λ moves the step toward steepest descent, decreasing λ moves it toward the Newton direction.]

Levenberg-Marquardt Method

1) Adapt the value of λ during the optimization.

2) If the iteration was successful (F(xk+1) < F(xk))

a) Decrease λ and try to use as much curvature information as possible.

3) If the previous iteration was unsuccessful (F(xk+1) > F(xk))

a) Increase λ and use only basic gradient information.

4) This is a trust region algorithm.

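A minimal sketch of this λ-adaptation, reusing the same hypothetical exponential model as before; starting from a rough initial guess, the damping keeps the early steps safe:

```python
import numpy as np

def model(x, th):
    # Hypothetical model: f(x | th) = th1 * (1 - exp(-th2 * x))
    return th[0] * (1.0 - np.exp(-th[1] * x))

def residual(th, x, y):
    return y - model(x, th)

def jacobian(th, x):
    # J[j, :] = d r_j / d theta, with residual r_j = y_j - f(x_j, theta)
    J = np.empty((len(x), 2))
    J[:, 0] = -(1.0 - np.exp(-th[1] * x))
    J[:, 1] = -th[0] * x * np.exp(-th[1] * x)
    return J

x = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
y = model(x, np.array([2.0, 0.5]))   # noise-free data, true theta = (2, 0.5)

th, lam = np.array([1.0, 1.0]), 1.0  # rough initial guess, initial damping
F = np.sum(residual(th, x, y) ** 2)
for k in range(100):
    r, J = residual(th, x, y), jacobian(th, x)
    # Levenberg-Marquardt step: solve (J'J + lam*I) delta = J'r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    th_new = th - delta
    F_new = np.sum(residual(th_new, x, y) ** 2)
    if F_new < F:
        th, F, lam = th_new, F_new, lam / 10.0   # success: trust curvature more
    else:
        lam = lam * 10.0                          # failure: fall back toward gradient
print(th)    # converges to (2, 0.5)
```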


Levenberg-Marquardt from 3 initial points


Stopping Criteria

Criterion 1: reach the number of iterations specified by the user: k > kmax

Criterion 2: the current function value is smaller than a user-specified threshold: F(x_k) < σ_user

Criterion 3: the change in function value is smaller than a user-specified threshold: ‖F(x_k) − F(x_{k−1})‖ < ε_user

Multi-start search

• Several points are used as initial guesses and the regression is performed from each point.

1) Choose the points randomly, or
2) Choose them within some neighborhood of the nominal parameter values.
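A multi-start sketch on a made-up one-dimensional objective with two local minima, using plain gradient descent as the local search; the run that finds the lowest S wins:

```python
import numpy as np

def S(th):
    # Illustrative objective with two local minima; the global one is near th = -2.03
    return (th ** 2 - 4.0) ** 2 + th

def grad_S(th):
    # S'(th) = 4 th (th^2 - 4) + 1
    return 4.0 * th * (th ** 2 - 4.0) + 1.0

def local_search(th0, lr=0.01, iters=2000):
    # Simple gradient descent as the local optimization routine
    th = th0
    for _ in range(iters):
        th = th - lr * grad_S(th)
    return th

rng = np.random.default_rng(0)
starts = rng.uniform(-4.0, 4.0, size=20)   # option 1): random initial guesses
# (option 2) would instead sample around the nominal parameter values)
results = [local_search(t0) for t0 in starts]
best = min(results, key=S)
print(best)    # the global minimizer near -2.03, not the local one near +1.97
```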

NLLS in Matlab


Example: nlinfit

[Slides 69–72: MATLAB nlinfit example screenshots.]