system identification, forecasting, control neural networks - indico

© Siemens AG, Corporate Technology, CT RTC BAM

Hans Georg Zimmermann

Siemens AG, Corporate Technology

Email : [email protected]

Corporate Technology

System Identification, Forecasting, Control with Neural Networks


Neural Networks @ Siemens: 25 Years of Research, Development, Innovation

[Diagram: overview of the research areas, labels only]

Mathematics of Neural Networks: Feedforward Neural Nets, Function Approximation, Recurrent Neural Nets, State Space Modeling, Neuro-Fuzzy Networks, Prior Rule Information, Complex Systems, Forecasting & Diagnosis, Uncertainty Analysis, Optimal Control

Software Environment for Neural Networks: Topology Design, Learning Algorithms, Specification Languages, Scripted Modeling, Data Processing, Fuzzy Rules, Graphical User Interface, High Performance Computing

Technical Applications: Process Modeling, Forecasting Renewables, Process Control, Process Surveillance

Economical Applications: Customer Relation Mgt., Risk Analysis, Price Forecasting, Demand Forecasting

Decision Support



Mathematical Neural Networks

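[Figure: Linear Algebra, Calculus and Neural Networks compared along two axes: number of variables and non-linearity; further label: Complex Systems]

Nonlinear Regression: $y = W_2 f(W_1 x)$ with $f(z) = \tanh(z)$

Existence Theorem (Hornik, Stinchcombe, White 1989): 3-layer neural networks can approximate any continuous function on a compact domain.

Based on data, identify an input-output relation:

$$E = \sum_{t=1}^{T} (y_t - y_t^d)^2 \to \min_{W_1, W_2}$$

Neural networks imply a correspondence of equations, architectures and local algorithms.

[Figure: network architecture: input $x$, weight matrix $W_1$, hidden layer with tanh, weight matrix $W_2$, output $y$]

Local learning rule: $\Delta W_1 = -\eta \, \partial E / \partial W_1$, $\Delta W_2 = -\eta \, \partial E / \partial W_2$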
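The correspondence of equation, architecture and learning rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the slides: the toy data set (y = sin(2x)), the hidden layer size, the learning rate and the extra constant input used as a bias are all assumptions made for the example, and the error is divided by T to keep the step size independent of the data length.

```python
import numpy as np

def forward(W1, W2, X):
    """y = W2 f(W1 x) with f = tanh; X holds one input vector per column."""
    H = np.tanh(W1 @ X)
    return W2 @ H, H

def train(X, Yd, n_hidden=8, eta=0.05, epochs=2000, seed=0):
    """Plain gradient descent on the mean squared error E / T."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(n_hidden, X.shape[0]))
    W2 = rng.normal(scale=0.5, size=(Yd.shape[0], n_hidden))
    T = X.shape[1]
    errors = []
    for _ in range(epochs):
        Y, H = forward(W1, W2, X)
        R = Y - Yd                                   # residuals y_t - y_t^d
        errors.append(float(np.sum(R ** 2)) / T)
        grad_W2 = 2.0 * R @ H.T / T                  # gradient of E / T w.r.t. W2
        grad_W1 = 2.0 * ((W2.T @ R) * (1.0 - H ** 2)) @ X.T / T  # backpropagated through tanh
        W2 -= eta * grad_W2                          # Delta W2 = -eta dE/dW2
        W1 -= eta * grad_W1                          # Delta W1 = -eta dE/dW1
    return W1, W2, errors

# Hypothetical toy data: y = sin(2x) on [-1, 1]; the row of ones acts as a bias input.
x = np.linspace(-1.0, 1.0, 50)
X = np.vstack([x, np.ones_like(x)])
Yd = np.sin(2.0 * x)[None, :]
W1, W2, errors = train(X, Yd)
print("error before / after training:", errors[0], errors[-1])
```

Both weight updates use only quantities available at the respective layer (the backpropagated residual and the layer input), which is the sense in which the algorithm is local.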


Neural Networks are No Black Boxes

Sensitivity Analysis: Compute the first derivatives along the time series:

[Figure: NOx sensitivity over time for the inputs; model output vs. actual NOx]

A classification of input-output sensitivities:

constant over time (= linear relationship)

monotone (input can be used in 1dim. control)

non-monotone (only multi-dim control possible)

~ zero (input useless in modeling and control)

[Figure legend: $\partial\,\text{output} / \partial\,\text{input}_i > 0$ and $\partial\,\text{output} / \partial\,\text{input}_i < 0$]

Application: Modeling of a Gas Turbine

Inputs: 35 sensor measurements and control variables of the turbine

Output: NOx emission of the gas turbine
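The sensitivity analysis can be sketched for the 3-layer network $y = W_2 \tanh(W_1 x)$ with a scalar output, where the chain rule gives $\partial y / \partial x = W_2 \, \mathrm{diag}(1 - \tanh^2(W_1 x)) \, W_1$. The function names, the tolerance `tol` and the decision rules in `classify` are assumptions chosen for illustration; the slides do not specify thresholds.

```python
import numpy as np

def sensitivities(W1, W2, X):
    """d output / d input_i at every time step.

    X has shape (n_inputs, T), W2 has shape (1, n_hidden);
    the result has shape (T, n_inputs).
    """
    H = np.tanh(W1 @ X)                       # hidden activations, (n_hidden, T)
    return (W2.T * (1.0 - H ** 2)).T @ W1     # chain rule per time step

def classify(s, tol=1e-3):
    """Classify one input by its sensitivity series s (assumed thresholds)."""
    if np.max(np.abs(s)) < tol:
        return "zero"            # input useless in modeling and control
    if s.max() - s.min() < tol:
        return "constant"        # linear relationship
    if np.all(s > 0) or np.all(s < 0):
        return "monotone"        # usable in 1-dim. control
    return "non-monotone"        # only multi-dim. control possible

# Hand-built example network with 4 inputs, one per sensitivity class.
W1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1e-3, 0.0, 0.0],
               [0.0, 0.0, 2.0, 0.0],
               [0.0, 0.0, 1e-3, 0.0]])
W2 = np.array([[1.0, 1e3, 1.0, -1e3]])
X = np.tile(np.linspace(-2.0, 2.0, 50), (4, 1))
S = sensitivities(W1, W2, X)
print([classify(S[:, i]) for i in range(4)])
```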


Neural Forecasting of Wind Power Supply

Task: Forecast the wind energy supply of an entire wind field over the next 24 hours.

Solution: Deep Neural Network (DNN) with 10 hidden layers. Inputs are wind speed and direction.

Results: The DNN (RMSD: 7.20%) outperforms the analytical Jensen model (benchmark, RMSD: 10.22%).

[Figure: four panels, Forecast vs. actual Energy Supply (0h), (6h), (12h) and (18h), each split into a training and a test range]
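The RMSD figures above are percentages, so some normalization is involved. The slides do not say which one; the sketch below assumes that the root mean squared deviation is divided by a reference scale such as the installed capacity of the wind field. The function name and the `scale` parameter are hypothetical.

```python
import math

def normalized_rmsd(forecast, actual, scale):
    """Root mean squared deviation divided by a reference scale.

    `scale` is an assumption (e.g. installed capacity); the source
    does not state how the reported RMSD values were normalized.
    """
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    mse = sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)
    return math.sqrt(mse) / scale

# Made-up numbers, only to show the call.
print(normalized_rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], scale=10.0))
```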


Forecasting of Solar Power Supply

Forecast the energy supply of a PV plant with a neural network (SENN) based on weather forecasts and/or a physics based model

Data set: 2011-09-08 to 2012-01-23. Minute data (200k data points). 40k data points are randomly selected as test data.

Performance of a purely data-driven model (SENN) is comparable to the physics based model

A hybrid model (SENN & physics) improves the forecast accuracy

[Figure: total PV supply (scaled) over a forecast horizon of 24h in minute time buckets (test day, 2012-01-01); curves: PV Supply, Physics, SENN, SENN + Physics]

Model error measured on test data (normalized model error):

Physics: 5.78%
SENN: 5.19%
SENN + Physics: 4.45%

[Figure: forecast error over time for the physical model, the neural network and the combined approach]


Optimizing Sensors with Neural Networks

Improve Optical Smoke Detectors:

• Smoke-poor fires are hard to detect

• Avoid false alarms caused by e.g. steam, welding, exhaust fumes

[Figure: sensor signal, NN output and target (= 1 if fire, = 0 otherwise) for two scenarios: water steam and plastic fire]

Modeling with Neural Networks:

• Input: 3 inputs extracted from one sensor over 40 time steps. 50 fire scenarios.

• Network Design: Feedforward network including local and global modeling.

• Network Learning: Stochastic learning including cleaning noise

• Result: The neural network was able to classify all patterns correctly (time series of 2 different scenarios)


Modeling of Open Dynamical Systems with Recurrent Neural Networks (RNN)

State transition: $s_{t+1} = f(s_t, u_t)$, realized as $s_{t+1} = \tanh(A s_t + B u_t)$

Output equation: $y_t = g(s_t)$, realized as $y_t = C s_t$

Identification: $\sum_{t=1}^{T} (y_t - y_t^d)^2 \to \min_{A,B,C}$

Preprocessing: $u_t = x_t - x_{t-1}$

Finite unfolding in time transforms time into a spatial architecture. We assume that $x_t = \text{const}$ in the future.

The analysis of open systems by RNNs allows a decomposition into their autonomous and externally driven subsystems.

Long-term predictability depends on a strong autonomous subsystem.

[Figure: RNN unfolded in time from $t-3$ to $t+4$ with shared matrices: inputs $u_{t-3}, \dots, u_t$ enter the states via $B$, the states $s_{t-2}, \dots, s_{t+4}$ are linked by $A$, and the outputs $y_{t-2}, \dots, y_{t+4}$ are read out via $C$]
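A forward pass through the unfolded RNN can be sketched as follows. The function name and arguments are hypothetical; the matrices A, B, C are taken as given, no training is shown. In the future part the assumption $x_t = \text{const}$ gives $u_t = x_t - x_{t-1} = 0$, so only the autonomous part $A$ drives the state.

```python
import numpy as np

def rnn_unfold(A, B, C, s0, x_past, horizon):
    """Unfold s_{t+1} = tanh(A s_t + B u_t), y_t = C s_t over past and future.

    x_past: raw inputs of shape (T + 1, n_x); preprocessing u_t = x_t - x_{t-1}.
    Returns the outputs for the T past steps followed by `horizon` future steps.
    """
    u = np.diff(np.asarray(x_past, dtype=float), axis=0)   # u_t = x_t - x_{t-1}
    s = np.asarray(s0, dtype=float)
    outputs = []
    for u_t in u:                      # past: externally driven
        outputs.append(C @ s)
        s = np.tanh(A @ s + B @ u_t)
    for _ in range(horizon):           # future: x = const, hence u = 0
        outputs.append(C @ s)
        s = np.tanh(A @ s)
    return np.array(outputs)

# Tiny hand-made example with 2 states, 2 inputs, 2 outputs.
A = np.zeros((2, 2))
B = np.eye(2)
C = np.eye(2)
Y = rnn_unfold(A, B, C, np.zeros(2), [[0, 0], [1, 0], [1, 2]], horizon=2)
print(Y)
```

With $A = 0$ the example has no autonomous subsystem, so the forecast decays to zero as soon as the external input stops, which illustrates why long-term predictability needs a strong autonomous part.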


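Modeling Dynamical Systems with Error Correction Neural Networks (ECNN)

An error correction system considers the forecast error at the present time as a reaction to unknown external information.

To correct the forecast, this error is used as an additional input, which substitutes for the unknown external information.

State transition: $s_{t+1} = f(s_t, u_t, y_t - y_t^d)$

Output equation: $y_t = g(s_t)$

Identification: $\sum_{t=1}^{T} (y_t - y_t^d)^2 \to \min_{f,g}$

[Figure: ECNN unfolded in time with shared matrices $A$, $B$, $C$, $D$: inputs $u_{t-3}, \dots, u_t$ enter via $B$; the outputs $C s_\tau$ are compared with the observations $y_\tau^d$ (connection $-\mathrm{Id}$) to form the error layers $z_{t-2}, z_{t-1}, z_t$, which feed the next state via $D$; the future outputs $y_{t+1}, y_{t+2}$ are computed from the states via $A$ and $C$ only; a node labeled "noise + 0" is attached via $\mathrm{Id}$]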
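The error correction mechanism can be sketched with one concrete parameterization. The slide only gives the generic form $s_{t+1} = f(s_t, u_t, y_t - y_t^d)$; the specific choice $s_{t+1} = \tanh(A s_t + B u_t + D \tanh(C s_t - y_t^d))$ used below is an assumption that matches the matrix labels of the figure, and the function name is hypothetical.

```python
import numpy as np

def ecnn_forecast(A, B, C, D, s0, u_past, yd_past, horizon):
    """Run the error correction over the past, then forecast without it.

    Assumed form: z_t = tanh(C s_t - y_t^d),
                  s_{t+1} = tanh(A s_t + B u_t + D z_t).
    In the future no observations exist, so the state evolves via A only.
    """
    s = np.asarray(s0, dtype=float)
    for u_t, yd_t in zip(u_past, yd_past):
        z = np.tanh(C @ s - yd_t)              # present-time forecast error
        s = np.tanh(A @ s + B @ u_t + D @ z)   # error acts as an additional input
    forecasts = []
    for _ in range(horizon):
        forecasts.append(C @ s)                # y_{t+1}, y_{t+2}, ...
        s = np.tanh(A @ s)
    return np.array(forecasts)

# Tiny example: a zero error leaves the state update uncorrected.
A = np.zeros((2, 2))
I = np.eye(2)
u = np.array([[0.5, 0.0]])
print(ecnn_forecast(A, I, I, I, np.zeros(2), u, np.array([[0.0, 0.0]]), horizon=1))
print(ecnn_forecast(A, I, I, I, np.zeros(2), u, np.array([[-1.0, 0.0]]), horizon=1))
```

If the model output already matches the observation, $z_t = 0$ and the ECNN step reduces to the plain RNN step.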


Combining Variance - Invariance Separation with ECNN

The bottleneck autoassociator solves the variance - invariance decomposition.

The Error Correction Neural Network solves the transformed temporal problem.

The sub-networks are implicitly coupled by shared weights.

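[Figure: variance-invariance separation: dynamics at time $t$ are separated into invariants and variants at time $t$; the variants are forecast to time $t+1$, the invariants are passed on by identity, and the combination yields the dynamics at time $t+1$]

[Figure: ECNN unfolded in time with shared matrices $A$, $B$, $C$, $D$ and the additional shared matrices $E$ and $F$; inputs $u_{t-2}, u_{t-1}, u_t$, internal variables $x_t, x_{t+1}, x_{t+2}$, observations $y_{t-1}^d, y_t^d$ entering with negative sign, outputs $y_t, y_{t+1}, y_{t+2}$]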

© Siemens AG, Corporate Technology, Intelligent Systems & Control

Load Forecasting for Electrical Energy

Effective Load Forecasting with Recurrent Neural Networks

Task: Predict the upcoming energy demand on a 15 min. time grid up to 5 days ahead.

Difficulty: Incorporate the impact of external influences on the energy demand.

[Figure: Load Forecast vs. Actual Load; energy load over date]

Accuracy of the load forecast is 97.95% (benchmark: 95%).

Compression of Electrical Load Curves

Daily load is represented by a superposition of intraday sub-load curves:

$$\begin{pmatrix} y_t^1 \\ y_t^2 \\ \vdots \\ y_t^{95} \\ y_t^{96} \end{pmatrix} = M \cdot \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_8 \end{pmatrix}$$

[Figure: bottleneck network: the 96 load values $y_t^1, \dots, y_t^{96}$ of a day are mapped via a matrix $A$ to 8 logistic units $\alpha_1, \dots, \alpha_8$ and back to the 96 load values via a matrix $B > 0$]
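The compression step can be sketched as a forward pass through a bottleneck. Several assumptions are made here: the encoder is taken to be $\alpha = \mathrm{logistic}(A y)$, the matrix $M$ of the superposition is identified with the non-negative decoder matrix from the figure, and random matrices stand in for trained weights. The function names are hypothetical and no training is shown.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def compress(A, y_day):
    """Map the 96 quarter-hour load values of a day to 8 coefficients in (0, 1)."""
    return logistic(A @ y_day)

def reconstruct(M, alpha):
    """Superpose the 8 sub-load curves (columns of M, all entries >= 0)."""
    return M @ alpha

# Random stand-ins for trained weights and for one (scaled) daily load curve.
rng = np.random.default_rng(0)
A = 0.05 * rng.normal(size=(8, 96))
M = np.abs(rng.normal(size=(96, 8)))      # non-negative sub-load curves
y_day = rng.uniform(size=96)
alpha = compress(A, y_day)
y_hat = reconstruct(M, alpha)
print(alpha.shape, y_hat.shape)
```

Because both the coefficients and the columns of $M$ are non-negative, each column can be read as a sub-load curve that only adds to the daily load.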


Modeling Closed Dynamical Systems with Recurrent Neural Networks

We can only observe a fragment of the world … but to understand the dynamics of the observables, we have to reconstruct at least a part of the hidden states of the world. Forecasting is based on observables and hidden states.

State transition: $s_{t+1} = f(s_t)$, realized as $s_{t+1} = \tanh(A s_t)$ with initial state $s_0$

Output equation: $y_t = g(s_t)$, realized as $y_t = [\mathrm{Id}, 0] \, s_t$

Identification: $\sum_{t=1}^{T} (y_t - y_t^d)^2 \to \min_{A, s_0}$

[Figure: closed RNN unfolded in time from $t-2$ to $t+2$: the states $s_{t-2}, \dots, s_{t+2}$ are linked by tanh and the shared matrix $A$, the outputs $y_{t-2}, \dots, y_{t+2}$ are read out via $[\mathrm{Id}, 0]$, and the initial state $s_0$ is fed by a bias]
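A closed system has no external inputs, so a trajectory is fully determined by $A$ and $s_0$. The sketch below iterates the two equations; the function name is hypothetical and the matrix is an arbitrary example, not a trained model.

```python
import numpy as np

def closed_rnn(A, s0, n_obs, steps):
    """Iterate s_{t+1} = tanh(A s_t) and read out y_t = [Id, 0] s_t.

    The first n_obs state components are the observables,
    the remaining components are hidden states.
    """
    s = np.asarray(s0, dtype=float)
    outputs = []
    for _ in range(steps):
        outputs.append(s[:n_obs].copy())   # y_t = [Id, 0] s_t
        s = np.tanh(A @ s)
    return np.array(outputs)

# Example: 2 observables and 1 hidden state that influences observable 0.
A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.9]])
Y = closed_rnn(A, [0.0, 0.2, 0.5], n_obs=2, steps=3)
print(Y)
```

In this example the first observable starts at zero and becomes non-zero only through the hidden state, which is why the hidden part has to be reconstructed to explain the observables.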


The Identification of Dynamical Systems in Closed Form

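Embed the original architecture into a larger architecture, which is easier to learn. After the training, the extended architecture has to converge to the original model.

[Figure: extended architecture unfolded in time from $t-3$ to $t+2$: the states are linked by tanh and the shared matrix $A$ and read out via $[\mathrm{Id}, 0]$; for the past time steps the observations $y_{t-3}^d, \dots, y_t^d$ enter via $-\mathrm{Id}$, the resulting output nodes have target 0 and are connected back to the state via $[-\mathrm{Id}, 0]^T$; the initial state $s_0$ is fed by a bias; the future outputs $y_{t+1}, y_{t+2}$ are computed without this correction]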

The essential task is NOT to reproduce the past observations, but to identify related hidden variables, which make the dynamics of the observables reasonable.
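One way to read the extended architecture is as a teacher forcing scheme: along the past, the observable part of the state is corrected by the output error before the transition is applied. The sketch below follows that reading, which is an interpretation of the figure labels and not a formula stated on the slide; the function names are hypothetical.

```python
import numpy as np

def teacher_forced_step(A, s, yd):
    """One past step of the extended architecture (assumed reading).

    error = [Id, 0] s - y^d is the output node with target 0;
    it is subtracted from the observable part of the state, so the
    transition starts from the observed values.
    """
    n = len(yd)
    error = s[:n] - yd
    corrected = s.copy()
    corrected[:n] -= error          # observables replaced by the data
    return np.tanh(A @ corrected), error

def free_step(A, s):
    """Original model: s_{t+1} = tanh(A s_t)."""
    return np.tanh(A @ s)

A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.5, 0.2],
              [0.3, 0.0, 0.5]])
s = np.array([0.2, -0.1, 0.5])
s_next, err = teacher_forced_step(A, s, np.array([0.2, -0.1]))
print(err, np.allclose(s_next, free_step(A, s)))
```

When the error is zero, the correction vanishes and the extended step coincides with the original model, which is the sense in which the larger architecture converges to the original one after training.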


Approaches to Model Uncertainty in Forecasting

1 Measure uncertainty as volatility (variance) of the target series. The underlying forecast model is a constant. Thus even sin(ωt) can appear highly uncertain!

2 Build a forecast model. The error is interpreted as uncertainty in the form of additive noise. The width of the uncertainty channel is constant over time.

3 Describe uncertainty as a diffusion process (random walk). The diffusion channel widens over time, e.g. scaled by the one-step model error.

For large systems 2 & 3 fail: we have to learn to zero error, so the uncertainty channel disappears.

4 One large model does not allow an analysis of forecast uncertainty, but an ensemble forecast shows the characteristics of an uncertainty channel: given a finite set of data, there exist many perfect models of the past data, showing different future scenarios caused by different estimations of the hidden states.
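The channel shapes of approaches 2 to 4 can be compared in a short sketch. The function names are hypothetical. For the diffusion channel the usual random-walk scaling with the square root of the forecast step is assumed, and the ensemble channel is summarized here by minimum, median and maximum per step, which is one possible choice among several.

```python
import math
import statistics

def constant_channel(sigma, horizon):
    """Approach 2: additive noise, the width stays constant."""
    return [sigma] * horizon

def diffusion_channel(sigma, horizon):
    """Approach 3: random walk, the width grows with sqrt(step) (assumed scaling)."""
    return [sigma * math.sqrt(h) for h in range(1, horizon + 1)]

def ensemble_channel(forecasts):
    """Approach 4: spread of an ensemble; forecasts holds one trajectory per model."""
    steps = list(zip(*forecasts))
    lower = [min(step) for step in steps]
    center = [statistics.median(step) for step in steps]
    upper = [max(step) for step in steps]
    return lower, center, upper

# Made-up ensemble of three forecast trajectories over two steps.
print(constant_channel(2.0, 4))
print(diffusion_channel(2.0, 4))
print(ensemble_channel([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]))
```

If the one-step model error sigma goes to zero, the first two channels collapse, while the ensemble spread is still defined because it comes from the disagreement between models.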

[Figure: LME copper price: target series and forecast over the training and test ranges; x-axis: forecast horizon in days (20 days)]

[Figure: extended closed-system architecture unfolded in time: states linked by tanh and the shared matrix $A$, outputs read out via $[\mathrm{Id}, 0]$, observations $y_{t-2}^d, y_{t-1}^d, y_t^d$ entering via $-\mathrm{Id}$ with target 0, initial state $s_0$ fed by a bias]


Control of Dynamical Systems with Recurrent Neural Networks

Questions to Solve

System Identification

State Estimation

Controller Design


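System Identification, State Estimation & Optimal Control with RNNs

System identification (past, $\tau \le t$):

$$s_{\tau+1} = \tanh(A s_\tau + B u_\tau), \qquad y_\tau = C s_\tau, \qquad \sum_{\tau \le t} (y_\tau - y_\tau^d)^2 \to \min_{A,B,C}$$

Unfold the RNN into the future, given A, B, C.

Learn a linear feedback controller (future, $\tau > t$):

$$u_\tau^* = u(s_\tau) = D s_\tau, \qquad s_{\tau+1} = \tanh(A s_\tau + B u(s_\tau)), \qquad y_\tau = C s_\tau$$

Normative target: $\sum_{\tau > t} L(y_\tau) \to \min_D$

[Figure: RNN unfolded in time from $t-3$ to $t+3$ with shared matrices $A$, $B$, $C$: past inputs $u_{t-4}, \dots, u_{t-1}$ are given, the control actions $u_t^*, u_{t+1}^*, u_{t+2}^*$ are computed from the states via $D$; a node labeled "noise + 0" is attached to the unfolding]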
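The controller part can be sketched as a closed-loop rollout of the identified model. The matrices are taken as given, the function names are hypothetical, and a quadratic cost around a target value stands in for the normative target $L$, which the slide does not specify. No optimization of $D$ is shown; the example only compares two candidate feedback matrices.

```python
import numpy as np

def closed_loop_rollout(A, B, C, D, s_t, horizon):
    """Unfold into the future with the linear feedback u_tau = D s_tau."""
    s = np.asarray(s_t, dtype=float)
    outputs, actions = [], []
    for _ in range(horizon):
        u = D @ s                        # u*_tau = D s_tau
        s = np.tanh(A @ s + B @ u)       # s_{tau+1} = tanh(A s_tau + B u_tau)
        actions.append(u)
        outputs.append(C @ s)            # y_{tau+1} = C s_{tau+1}
    return np.array(outputs), np.array(actions)

def control_cost(outputs, target):
    """Assumed normative target: sum of squared deviations from `target`."""
    return float(np.sum((outputs - target) ** 2))

# Scalar example: drive the output to 0 starting from s_t = 1.
A = np.array([[0.5]])
B = np.array([[1.0]])
C = np.array([[1.0]])
s_t = np.array([1.0])
for D in (np.array([[0.0]]), np.array([[-0.5]])):
    ys, us = closed_loop_rollout(A, B, C, D, s_t, horizon=2)
    print(D.item(), control_cost(ys, target=0.0))
```

In the scalar example the feedback $D = -0.5$ cancels the autonomous term $A s$ exactly, so the cost is lower than without control; learning $D$ would amount to minimizing this cost through the unfolded network.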


Reward Maximization with a Learned Controller

[Figure: reward (0.2 to 1) over the number of 5 s time steps (0 to 1000) for random actions, no operation, and the learned controller (different algorithms)]

Challenges: High data volume (5000 var/sec)

Streaming real-time analysis

Autonomous learning

Real-time data analytics using 1000 learned models

Optimization of emissions

Optimal turbine operation

Tasks: System Identification, State Estimation, Controller Design
