symbolic regression for the derivation of mathematical ... meeting... · symbolic regression for...
TRANSCRIPT
![Page 1: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/1.jpg)
Symbolic regression for the derivation of mathematical models directly from data
A.Murari1, M.Gelfusa2, P.Gaudio2, M.Lungaroni2, E.Peluso2
1Consorzio RFX Associazione EURATOM/ENEA per la Fusione. Padua, Italy 2 Associazione EURATOM-ENEA - University of Rome “Tor Vergata”, Roma, Italy
![Page 2: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/2.jpg)
Complexity of Magnetised Plasmas
The complexity of magnetised plasmas is due to:
• A different matter state far from equilibrium
• Many variables, long range interactions
• Nonlinear phenomena, non separability
This leads to a hierarchy of descriptions of thermonuclear
plasmas (particle, kinetic, fluid) and to a plethora of ad hoc
models of limited applicability (This plurality of models is not a
fault per se but more a specific characteristic of complex
systems) Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 2
![Page 3: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/3.jpg)
Diagnostics and Data
•JET can produce up to 55 Gbytes of data per shot (potentially about 1 Terabyte per day). Total Database: more than 350 Tbytes
•ATLAS can produce up to about 10 Petabytes of data per year
•Hubble Space Telescope sent to earth up to 5 Gbytes of data per day
•Commercial DVD 4.7 Gbytes.
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 3
![Page 4: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/4.jpg)
From Data to Models
When handling complex systems, theory formulation from data
can require several steps
Raw
Data
Selection
Target Data
Preprocessing
Model
Transformed
Data
Patterns
Data Mining
Model formulation
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 4
![Page 5: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/5.jpg)
Outline
I. Symbolic regression/Genetic programming to
extract models from the data without “a priori”
assumptions about their mathematical form
and with a minimum of “human intervention”
II. Scaling laws (Pth to reach the H mode, tE)
III. Conclusions
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 5
![Page 6: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/6.jpg)
Genetic Algorithms and Programs
• GAs and GPs are computational methods able to
solve complex optimization problems.
• Inspired by the genetic processes of living organisms.
• In nature, individuals of a population compete for
basic resources.
• Those individuals that achieve better surviving capabilities
have higher probabilities to generate descendants.
• GAs and GPs emulate this behavior.
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 6
![Page 7: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/7.jpg)
• GAs and GPs work with a population of individuals.
• Each individual represents a possible solution to
a problem.
• The quality of each individual (possible solution)
is evaluated in terms of a fitness function.
• A higher probability to have descendants is
assigned to individuals with better fitness
functions.
Genetic Algorithms and Programs I
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 7
![Page 8: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/8.jpg)
Genetic Algorithms and Programs II
• Genes or Knowledge representation
(how to represent formulas)
• Fitness Function (FF)
• Criteria to validate the results
Three aspects are fundamental in Genetic
Programming:
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 8
![Page 9: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/9.jpg)
• Formulas are represented as
trees
• Btne20/sin(R)+(Bt-R)
This representation allows an easy implementation
of genetic steps: copy, mutation, cross over
Formulas
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 9
![Page 10: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/10.jpg)
Cross-over
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 10
![Page 11: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/11.jpg)
Function class List
Arithmetic constants,+,-,*,/
Trigonometric sin(xi), cos(xi),tan(xi),asin(xi),atan(xi)
Exponential exp(xi),log(xi),power(xi, xj), power(xi,c)
Squashing logistic(xi),step(xi),sign(xi),gauss(xi),tanh(xi), erf(xi),erfc(xi)
Boolean equal(xi, xj),less(xi, xj), less_or_equal(xi, xj), greater(xi, xj),
greater_or_equal(xi, xj), if(xi, xj, xk), and(xi, xj),or(xi, xj),xor(xi,
xj), not(xi)
Other min(xi, xj),max(xi, xj),mod(xi, xj),floor(xi),ceil(xi),
round(xi),abs(xi)
Genes: function nodes
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 11
![Page 12: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/12.jpg)
Fitness Function: AIC & BIC
• Akaike Information Criterion (AIC):
• Bayesian Information Criterion (BIC):
L MSE (Mean Square Error) k number of parameters n number of observations
• The preferred model for AIC (BIC) criterion is the
one with the minimum value of AIC (BIC)
kLAIC 2log2
nkLBIC loglog2
Penalty for models with a higher number of parameters
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 12
![Page 13: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/13.jpg)
Overview of the Method
• Identification of the functional form of the models with
symbolic regression via genetic programming
• Tuning of the parameters with non linear
fitting(confidence intervals)
• Validation of the results with KL divergence (AIC, BIC),
residuals, two current tests (extrapolability)
Murari A. et al Nucl. Fusion 53 (2013) 043001 (13pp)
Murari A. et al Nucl. Fusion 52 (2012) 063016 (13pp)
Computational complexity: on a simple 2-core machine (2,4GHz and
8GB RAM Linux machine) the search time takes 8 hours to
process the data of 1000 discharges of the global ITPA database.
Hundreds of thousands of models generated. Some calculations
performed also “in the cloud”. Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 13
![Page 14: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/14.jpg)
• In MCNF practically all scalings are power law
monomials
• Power laws do not present any form of saturation
and therefore tend to diverge
• Power laws are monotonic
• Power laws force the interaction between terms to
be multiplicative (strange exponents)
• Many different mechanisms can give rise to power
laws that are therefore not very informative about
the dynamics of the phenomena
Scalings and Power Laws
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 14
![Page 15: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/15.jpg)
The method has been applied to the scaling law of the energy confinement time in terms of dimensional quantities
PL1 5.62 ⋅ 10−2𝐼0.93𝐵0.15𝑛0.41𝑀0.19𝑅1.97𝜖0.58𝜅𝑎0.78𝑃−069
PL2 5.55 ⋅ 10−2𝐼0.75𝐵0.32𝑛0.35𝑀0.06𝑅2.0𝜖0.76𝜅𝑎1.14𝑃−062
NPL 0.03670.03660.0369 ⋅ 𝐼1.0060.996
1.016
𝑅1.7311.7121.751
𝜅𝑎1.4501.411
1.489
𝑃−0.735−0.744−0.727
ℎ 𝑛,𝐵
Table IV. Power laws (PLs) and Non Power Law model (NPL). PL1 is the IPB98(y,2) scaling while PL2 is
(EIV)[15]. The term h(n,B) is:
ℎ 𝑛,𝐵 = 𝑛0.4480.4360.460
⋅ 1 + 𝑒−9.403−9.694−9.112 ⋅
𝑛𝐵 −1.365−1.412
−1.318
−1
k AIC BIC KLD
PL1 10 -19416.86 -19362.86 0.0337
PL2 10 -19084.36 -19203.68. 0.0802
NPL 10 -19660.03 -19599.04 0.0254
ITPA Database of tE: dimensional quantities
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 115
![Page 16: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/16.jpg)
Difference significant at high density
k AIC BIC KLD
PLs 10 -5842.20 -6720.79 5.895
NPL 10 -6005.91 -6975.85 3.829
Extrapolation to JET
ITPA Database of tE
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 16
![Page 17: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/17.jpg)
If the main equations governing the plasma behaviour
are invariant under a certain class of transformations,
also the scaling laws for these plasmas must show the
same invariances.
Dimensionless quantities
The traditional dimensionless quantities are therefore
derived by the symmetries of the Vlasov equations
The Symbolic regression via genetic programming
can also be used to identify dimensionless
quantities in a data driven approach instead of an
hypothesis driven approach.
Symbolic regression via genetic programming allows
finding scalings with dimensionless quantities
(traditional log regression does not)
![Page 18: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/18.jpg)
To verify the trends, the same approach has been applied to the confinement time in terms of dimensionless quantities
k AIC BIC MSE KLD AdPL1 9 0.3316
AdPL2 9 . 0.1875
AdNPL 14 0.0567
AdPL1
AdPL2
AdNPL
wt
wt
tE: scalings with dimensionless quantities
𝜏 ⋅ 𝜔𝑐𝑖 = 1.13 1.111.15 ⋅ 10−6 ⋅
𝜅𝑎1.931.70
2.12
𝛽0.370.330.41
𝑀0.570.460.67
𝜌2.192.162.22
𝜈0.400.390.42
𝑞0.160.080.23 − 0.072−0.085
−0.060 ⋅ 𝜅𝑎1.180.94
1.40
+
−0.009−0.011−0.006 ⋅ 𝑞1.080.94
1.21+ 0.150.13
0.17 ⋅ 𝑀0.07−0.050.19
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 18
![Page 19: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/19.jpg)
t: extrapolation to JET
To substantiate the extrapolability of the non power law scalings, the various scalings have been obtained for the small devices and
the residuals have been calculated for JET
Murari A et al, Plasma Phys. Control. Fusion, 57,014008, doi:10.1088/0741-3335/57/1/014008, (2015)
k AIC BIC MSE KLD AdPL1 9 −1650.59 −2533.00 0.5518 0.3316
AdPL2 9 −6034.11 −6744.95. 0.1157 0.1875
AdNPL 14 −13833.82 −13758.91 0.715 ⋅ 10−2 0.0567
![Page 20: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/20.jpg)
Dimensionless quant. identification : Numerical tests
Original equation (used to generate the data)
Symbolic regression via genetic programming is provided
with the dimensional quantities necessary to identify
the traditional dimensionless quantities.
It is then applied to a database generated with the
dimensionless quantities. Noise up to 50%.
Equation identified by Symbolic regression with 30% of noise and using the dimensionless quantities
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 20
![Page 21: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/21.jpg)
Database Quality: Pareto frontier
AdPl1 Eq(8) 𝟕.𝟐𝟏𝟒 ⋅ 𝟏𝟎−𝟖𝐌𝟎.𝟗𝟔𝛜𝟎.𝟕𝟑𝐤𝟑.𝟑𝛒 ∗−𝟐.𝟕𝟎 𝛃−𝟎.𝟗𝟎𝛎 ∗−𝟎.𝟎𝟏 𝐪−𝟑.𝟎
AdPl2 Eq.(9) 𝟏.𝟖𝟑𝟔 ⋅ 𝟏𝟎−𝟖 ⋅ 𝐌𝟎.𝟒𝟗𝐤𝟑.𝟑𝟒𝛒 ∗−𝟐.𝟔𝟕 𝛃−𝟎.𝟓𝟕𝛎−𝟎.𝟏𝟒𝐪−𝟏.𝟖𝟒𝛜−𝟎.𝟏𝟗
D4AdPL1 Eq.(10) 𝟎.𝟏𝟗𝟖𝟎.𝟏𝟗𝟓𝟎.𝟐𝟎𝟎 ⋅ 𝐁𝟎.𝟕𝟕𝟐𝟎.𝟔𝟗𝟖
𝟎.𝟖𝟒𝟓𝐚𝟏.𝟐𝟗𝟖𝟏.𝟏𝟒𝟓
𝟏.𝟒𝟓𝟏𝐈𝟎.𝟖𝟓𝟏𝟎.𝟕𝟖𝟖
𝟎.𝟗𝟏𝟓𝐌−𝟎.𝟕𝟐𝟗−𝟎.𝟖𝟐𝟗
−𝟎.𝟔𝟐𝟖
D4AdPL2 Eq.(11) 𝟎.𝟏𝟎𝟔𝟎.𝟏𝟎𝟒𝟎.𝟏𝟎𝟕 ⋅ 𝐁𝟏.𝟎𝟎𝟗𝟎.𝟗𝟑𝟎
𝟏.𝟎𝟖𝟖𝐚𝟏.𝟓𝟒𝟖𝟏.𝟑𝟗𝟎
𝟏.𝟕𝟎𝟔𝐤𝟏.𝟏𝟏𝟓𝟎.𝟗𝟕𝟐
𝟏.𝟐𝟓𝟕𝐈𝟎.𝟔𝟐𝟐𝟎.𝟓𝟒𝟑
𝟎.𝟔𝟗𝟗𝐌−𝟎.𝟕𝟏𝟓−𝟎.𝟖𝟑𝟖
−𝟎.𝟓𝟗𝟏
D4AdNPL1 Eq.(12) 𝟎.𝟐𝟗𝟗𝟎.𝟐𝟕𝟑𝟎.𝟑𝟐𝟔 ⋅ 𝐁𝟎.𝟕𝟒𝟕𝟎.𝟔𝟕𝟎
𝟎.𝟖𝟐𝟑𝐚𝟎.𝟖𝟏𝟖𝟎.𝟔𝟒𝟓
𝟎.𝟗𝟗𝟏𝐈𝟎.𝟗𝟓𝟗𝟎.𝟖𝟗𝟑
𝟏.𝟎𝟐𝟔𝐟(𝐌,𝐧)
D4AdNPL2 Eq.(13) 𝟎.𝟏𝟐𝟗𝟎.𝟎𝟖𝟗𝟎.𝟏𝟔𝟗 ⋅ 𝐁𝟎.𝟗𝟔𝟖𝟎.𝟖𝟗𝟎
𝟏.𝟎𝟒𝟓𝐚𝟏.𝟎𝟕𝟕𝟎.𝟗𝟎𝟏
𝟏.𝟐𝟓𝟑𝐈𝟎.𝟕𝟑𝟖𝟎.𝟔𝟓𝟖
𝟎.𝟖𝟏𝟕𝐤𝟎.𝟗𝟖𝟗𝟎.𝟖𝟒𝟖
𝟏.𝟏𝟑𝟏𝐌−𝟎.𝟔𝟒𝟕−𝟎.𝟕𝟕𝟕
−𝟎.𝟓𝟐𝟑𝐠(𝐧)
D4AdNPL3 Eq.(14) 𝟎.𝟏𝟖𝟐𝟎.𝟏𝟕𝟓𝟎.𝟏𝟖𝟖 ⋅ 𝐁𝟎.𝟗𝟎𝟖𝟎.𝟖𝟑𝟏
𝟎.𝟗𝟖𝟓𝐚𝟎.𝟗𝟏𝟏𝟎.𝟕𝟔𝟒
𝟏.𝟎𝟓𝟖𝐈𝟎.𝟒𝟑𝟓𝟎.𝟑𝟒𝟐
𝟎.𝟓𝟐𝟖𝐤𝟎.𝟖𝟓𝟗𝟎.𝟕𝟏𝟕
𝟏.𝟎𝟎𝟎𝐌−𝟎.𝟔𝟐𝟐−𝟎.𝟕𝟒𝟗
−𝟎.𝟒𝟗𝟓𝐡(𝐖,𝐧)
D4AdNPL4 Eq.(15) 𝟎.𝟏𝟕𝟒𝟎.𝟏𝟓𝟖𝟎.𝟏𝟖𝟓 ⋅ 𝐁𝟎.𝟗𝟕𝟓𝟎.𝟖𝟖𝟖
𝟏.𝟎𝟔𝟑𝐚𝟏.𝟏𝟎𝟖𝟎.𝟗𝟐𝟒
𝟏.𝟐𝟗𝟒𝐈𝟎.𝟕𝟑𝟎𝟎.𝟔𝟒𝟓
𝟎.𝟖𝟐𝟑𝐤𝟎.𝟗𝟗𝟑𝟎.𝟖𝟐𝟏
𝟎.𝟖𝟐𝟑𝐢(𝐌,𝐧)
Table II: Best scaling laws derived from the survey of the Pareto Frontier obtained with
symbolic regression via genetic programming;the above functions f,g,h,i(x) follow:
AdNPL1 Eq.(12) 𝐟 𝐌,𝐧 = 𝐌−𝟎.𝟖𝟗𝟒−𝟏.𝟐𝟔𝟑−𝟎.𝟓𝟐𝟒
𝟏 + 𝐞−𝟎.𝟖𝟗𝟗−𝟏.𝟐𝟒𝟓−𝟎.𝟓𝟓𝟑𝐌𝟏.𝟏𝟐𝟔𝟎.𝟏𝟑𝟐
𝟐.𝟏𝟐𝟎𝐧−𝟎.𝟕𝟒𝟑−𝟏.𝟏𝟓𝟔
−𝟎.𝟑𝟑𝟎
−𝟏
AdNPL2 Eq.(13) 𝐠 𝐧 = 𝟏 + 𝐞−𝟐.𝟔𝟏𝟎—𝟓.𝟔𝟏𝟓𝟎.𝟑𝟗𝟓 𝐧−𝟎.𝟓𝟔𝟗−𝟎.𝟕𝟐𝟔
−𝟎.𝟒𝟏𝟒
−𝟏
AdNPL3 Eq.(14) 𝐡 𝐖,𝐧 = 𝟏 + 𝐞−𝟏.𝟕𝟑𝟕−𝟐.𝟎𝟑𝟓−𝟏.𝟒𝟑𝟗𝐖𝟏.𝟑𝟕𝟕𝟏.𝟎𝟓𝟏
𝟏.𝟕𝟎𝟒𝐧−𝟏.𝟒𝟒𝟑−𝟏.𝟕𝟏𝟔
−𝟏.𝟏𝟕𝟏
−𝟏
AdNPL4 Eq.(15) 𝐢 𝐌,𝐧 = 𝐌−𝟎.𝟖𝟏𝟑−𝟏.𝟏𝟔𝟗−𝟎.𝟒𝟓𝟓
𝟏 + 𝐞−𝟎.𝟗𝟑𝟒−𝟏.𝟐𝟖𝟐−𝟎.𝟓𝟖𝟔𝐌𝟎.𝟕𝟕𝟏−𝟎.𝟑𝟎𝟗
𝟏.𝟖𝟓𝟏𝐧−𝟎.𝟕𝟎𝟒−𝟏.𝟐𝟓𝟖
−𝟎.𝟏𝟓𝟏
−𝟏
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 21
![Page 22: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/22.jpg)
Models: statistical quality
Model Eq. τ @ITER s
PL1 (8) 3.66
Pl2 (9) 3.29
D4AdPL1 (10)
D4AdPL2 (11)
D4AdNPL1 (12)
D4AdNPL2 (13)
D4AdNPL3 (14)
D4AdNPL4 (15)
Model Eq. k AIC BIC MSE KLD
PL1 (8) 8 -1652.59 -2540.93 𝟎.𝟓𝟓𝟐 0.3312
Pl2 (9) 8 -6036.11 -6752.89 𝟎.𝟏𝟏𝟔 0.1875
D4AdPL1 (10) 5 -13542.346 -13540.276 0.00799 0.09598
D4AdPL2 (11) 6 -13590.778 -13597.021 0.00785 0.06858
D4AdNPL1 (12) 8 -13662.357 -13638.661 0.00764 0.06332
D4AdNPL2 (13) 8 -13690.555 -13678.387 0.00756 0.06854
D4AdNPL3 (14) 9 -13723.038 -13686.454 0.00747 0.05006
D4AdNPL4 (15) 9 -13692.845 -13675.780 0.00755 0.06945
The best models are not power laws and
they do not use the usual dimensionless
quantities.
They estimate a confinement time on ITER
between 2 and 3 s instead of above 3 s
but the best is again around 3 s.
Multiple solutions: lack of severity of the
database Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 22
![Page 23: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/23.jpg)
Conclusions I
I. A methodology has been devised to extract mathematical
models directly from databases without assumptions on
their mathematical form and with minimal human
intervention.
II. The approach is based on symbolic regression via
genetic programming
III. The proposed method covers the entire process, from
the identification of the mathematical form of the models
to the assessment of their quality. Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 23
![Page 24: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/24.jpg)
Conclusions I I
I. Symbolic regression via genetic programming allows
answering questions which cannot be even asked with
traditional tools: the best form of the equations, quality of
databases, selection of dimensionless quantities, etc.
II. Symbolic regression via genetic programming has been
used to derive scaling laws for the Pthres to access the H
mode and tE
III. The ITPA data base has been analysed and the non PL
expressions are better of about one order of magnitude (in
the Kullback-Leibler sense) compared to the power laws.
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 24
![Page 25: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/25.jpg)
I. The methodology proposed is fully general and can be
easily adjusted to the specific needs of the application
(KDE, errors, time series etc.)
Editors A.Murari and J.Vega.
Many Thanks to PPCF!!!!!
II. Since 2014, a toolbox for symbolic regression via genetic
programming is available in Matlab statistics package
Conclusions III
Andrea Murari | IAEA Fusion Data Processing | NIce | 2015 | Page 25
![Page 26: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/26.jpg)
ITPA Database Quality: Pareto frontier
FF
Model complexity
Only the best models
(diamonds) for each
level of complexity
are retained.
The database is not
selective enough
(not enough severity)
Only the dimensional quantities required to obtain the
dimensionless ones are given as input to Symbolic
Regression.
![Page 27: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/27.jpg)
• Power laws are among the most popular statistical
distributions (together with Gaussian and Poisson)
• They are typically written in form of power law monomials:
• Power laws have been applied in a variety of fields ranging
from physics to economics, to geology and social sciences
• Practically all scalings in Fusion are power laws: L-H
transition, confinement, SOL etc
Scaling laws as Power Law Monomials
![Page 28: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/28.jpg)
• Identification of the functional form of the models and
the most relevant independent variables with symbolic
regression via genetic programming without a priori
assumptions
• Proper treatment of the errors and confidence intervals
(assuming additive Gaussian noise but not necessarily)
• Checking of the parameters with non linear fitting robust
against collinearities (sophisticated gradient descent
methods)
• Validation of the results with KL divergence (AIC, BIC),
residuals
Advantages of the Method
![Page 29: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/29.jpg)
Scalings
Behaviour with density no
reproducible with a power law.
C.Maggi EPS 2012
Scalings of the power threshold Pth to reach the H mode
• Experimental
distribution not really
a power law.
![Page 30: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/30.jpg)
t dimensionless: extrapolation to ITER
Trend versus elongation
Equation
AdNPL
AdPL1
AdPL2
Again the effect is potentially significant
The estimates of NPL and AdNPL for the entries
of the ITPA database. The general agreement is
evident systematic since they do not present any
discrepancy even if they are two independent
regressions.
![Page 31: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/31.jpg)
Validation criteria: Kullback-Leibler divergence
The Kullback–Leibler divergence is always non-
negative and zero if and only if p = q
p is interpreted as the reference pdf
(data), q the model
For distributions P and Q of a continuous random variable,
KL-divergence is defined to be the integral:
where p and q denote the densities of P and Q.
![Page 32: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/32.jpg)
• To study dynamical systems, the FF can be based
on the comparison of the partial derivatives
between pairs of variables in the model (d) and the
data (D).
Dynamical systems: fitness Function
• A model structure is preferred to another if the
derivatives between the dependent and independent
variables are more similar to the corresponding discrete
differences in the database.
FF
![Page 33: Symbolic regression for the derivation of mathematical ... Meeting... · Symbolic regression for the derivation of mathematical models directly from data A.Murari1, M.Gelfusa 2, P.Gaudio](https://reader033.vdocuments.mx/reader033/viewer/2022053112/60862f10ce48153f9d7e4283/html5/thumbnails/33.jpg)
•Summarizing, Symbolic Regression (SR) is a technique
that uses GP (a genetic algorithm) to find out a pool of
mathematical individuals, so “functional forms” made up by:
•functions, e.g. sin(), cos() exp(), +, - * / ,…
•constants and variables.
•The results in this presentation have been obtained using
the following function:
Function class List
Arithmetic c (real and integer constants),+,-,*,/
Exponential exp(xi),log(xi), power(xi,c)
Squashing tanh(xi),
Trigonometric sin(xi), cos(xi), sinh(xi), cosh(xi)
List of basic elements used in the SR via GP
Types of function nodes included in the symbolic regression used to derive the
results presented in this paper, xi and xj are the generic independent variables.
Furthermore the general logistic function is defined as