„The perfect is not good enough!” (Carl Benz)
VISUALIZATION OF HIGH DIMENSIONAL DATA BY USE OF GENETIC PROGRAMMING – APPLICATION TO ON-LINE INFRARED SPECTROSCOPY BASED PROCESS MONITORING
TIBOR KULCSÁR, JÁNOS ABONYI
UNIVERSITY OF PANNONIA
DEPARTMENT OF PROCESS ENGINEERING
Preconditions
• Online analyzers are widely used in the oil industry to predict product properties such as density and cloud point.
• These properties cannot be described using linear models.
• Visualization of the high dimensional spectral database is needed for model development and process monitoring.
• A cost function and a tool for equation discovery are needed to obtain a compact and interpretable mapping of the high dimensional data.
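Task I: Estimation
[Figure: absorbance spectrum over the wavenumber range 4000 to 4800 cm^-1; absorbance axis from 0 to 0.016]
$$\mathbf{y}_j = f(\mathbf{w}_j)$$
$$\mathbf{w}_j \in \mathbb{R}^{n_w}, \quad n_w = 195, \qquad \sum_{k=1}^{n_w} w_{j,k} = 1, \quad j = 1, \ldots, N$$
$$\mathbf{y}_j \in \mathbb{R}^{n_y}, \qquad \mathbf{y}_j = [P_{j,1}, \ldots, P_{j,n_y}]^T, \quad j = 1, \ldots, N$$
with $n_y$ between 10 and 30.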
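Similar spectra - Similar property
[Figure: 2D scatter of spectra with Dmax marked]
Rsphere = 3 (percentage of Dmax corresponding to the radius of the sphere)
[Figure: scatter plot of PolyCycl against TotAro]
[Equations: a distance $d(S_x, S_j)$ between the spectra $S_x$ and $S_j$, computed over the channels $k = 1, \ldots, n$ from $w_{x,k}$ and $w_{j,k}$; a threshold $m$ on that distance; a tolerance $E_v$ relating the properties $P_{vx}$ and $P_{vj}$]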
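A minimal Python sketch of the neighborhood search suggested by this slide: spectra that fall inside a sphere around the query spectrum are treated as similar. The Euclidean metric, the reading of Dmax as the largest pairwise distance in the database, and the names `euclidean` and `neighbors_in_sphere` are assumptions made for illustration; the toy values are not measured data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long spectra (assumed metric)."""
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))


def neighbors_in_sphere(query, spectra, r_percent):
    """Indices of spectra lying within r_percent % of Dmax from `query`.

    Dmax is taken here as the largest pairwise distance in `spectra`
    (an assumption; the slide does not define it).
    """
    d_max = max(
        euclidean(a, b)
        for i, a in enumerate(spectra)
        for b in spectra[i + 1:]
    )
    radius = d_max * r_percent / 100.0
    return [j for j, s in enumerate(spectra) if euclidean(query, s) <= radius]


# Toy 3-channel "spectra" (hypothetical values).
spectra = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [3.0, 4.0, 0.0],
]
query = [0.05, 0.0, 0.0]
near = neighbors_in_sphere(query, spectra, r_percent=3)
print(near)
```

With Rsphere = 3 the radius is 3 % of Dmax, so only the spectra closest to the query are returned.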
Finding similar spectra
Prediction model
• Nearest Neighbors algorithm
• The neighborhood is the basis of the prediction
2D mapping
• Defines the range of validity for the local models
• The mapped plane should follow the original spectral space
Quality measure
• Measures the quality of the mapping
• Measures neighborhood preservation
[Figure: query point X with spectra S1 to S6 and points N1 to N6]
Property X = f ( Prop[S1, S2, S3, S4, S5, S6] )
$$\hat{P}_x = \frac{\sum_{i=1}^{n} P_i}{n}$$
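The nearest-neighbor prediction on this slide can be sketched in a few lines of Python: the property of a query spectrum is estimated as the mean property of its n nearest spectra. The Euclidean distance, the function name `knn_predict` and the toy numbers are assumptions for illustration only.

```python
import math


def knn_predict(query, spectra, properties, n=3):
    """Return (estimate, neighbor indices) for the n spectra nearest to `query`.

    The estimate is the plain mean of the neighbors' property values;
    Euclidean distance is an assumed choice of metric.
    """
    order = sorted(range(len(spectra)),
                   key=lambda j: math.dist(query, spectra[j]))
    nearest = order[:n]
    estimate = sum(properties[j] for j in nearest) / n
    return estimate, nearest


# Hypothetical two-channel spectra with one known property each.
spectra = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
properties = [10.0, 12.0, 14.0, 40.0]

p_hat, used = knn_predict([0.2, 0.1], spectra, properties, n=3)
print(p_hat, used)
```

The distant fourth spectrum does not enter the estimate, which is the point of restricting the prediction to the neighborhood.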
Chemical information – interpretable?
[Figure: absorbency against wavenumber (cm^-1) from 4000 to 4800 cm^-1, with bands labelled Aromatic, Ethylenic, Olefinic, Aromatic, Aromatic, Branched / cyclonic, Linear, Saturated, Saturated, Branched]
[Equation: aggregate $K_{ARO}$ formed from the absorbances $W_1$, $W_2$, $W_3$, $W_4$; labels: aromatic, linear, olefinic]
Aggregates – need for explicit mapping
[Figure: scatter-plot matrix of the aggregates, one panel per pair: Rsat, Karo, Kiso, Kene, Nola, Nolef, Naro, Kox, Parox, Karo3, Kcy, Ksatu, KeroH, AKaro]
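$$\frac{W_1 + W_2}{W_3 + W_4 + W_5}$$
$$\left(\frac{W_1 \cdot W_2}{W_3 \cdot W_4} - C_1\right) C_2 + C_3$$
$$\left(\frac{W_1 - W_2 + C_1}{C_2 W_1 + W_4} - C_3\right) C_4 + C_5$$
$$\left(\text{ratio of } W_1, W_2, W_3 - C_1\right) C_2 - C_3$$
$$\left(\frac{W_1 + W_2 + W_3}{W_4 + W_5} - C_1\right) C_2 + C_3$$
$$\left(\frac{C_1 W_1 + W_2}{W_3 + W_4} - C_1\right) C_2 + C_3$$
$$\left(C_1 W_1 + C_2 W_2 + C_3 W_3 + C_4 W_4 + C_5 W_5 + C_6 W_6\right) - C_7$$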
[Figure: 2D map with axes Aggregate 1 and Aggregate 2]
Two aggregates: 2D mapping
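To make the idea of an explicit mapping concrete, the sketch below evaluates two ratio-type aggregates on one spectrum and uses the pair as 2D coordinates. The form ((sum of numerator absorbances / sum of denominator absorbances) - C1) * C2 + C3 is one reading of the aggregate shapes on this slide; the channel indices, constants and function names are hypothetical.

```python
def ratio_aggregate(spectrum, num_idx, den_idx, c1=0.0, c2=1.0, c3=0.0):
    """Aggregate of the assumed form ((sum W_num / sum W_den) - c1) * c2 + c3."""
    numerator = sum(spectrum[k] for k in num_idx)
    denominator = sum(spectrum[k] for k in den_idx)
    return (numerator / denominator - c1) * c2 + c3


def map_to_2d(spectrum, aggregate_1, aggregate_2):
    """Explicit 2D mapping: each coordinate is one aggregate of the spectrum."""
    return aggregate_1(spectrum), aggregate_2(spectrum)


# Hypothetical aggregate definitions (channel indices and constants are made up).
def aggregate_1(s):
    return ratio_aggregate(s, num_idx=[0, 1], den_idx=[2, 3])


def aggregate_2(s):
    return ratio_aggregate(s, num_idx=[4], den_idx=[0, 1], c1=0.5, c2=10.0, c3=1.0)


spectrum = [1.0, 3.0, 2.0, 2.0, 4.0]  # toy absorbances W at five channels
point = map_to_2d(spectrum, aggregate_1, aggregate_2)
print(point)
```

Because the mapping is an explicit formula, a new spectrum can be placed on the 2D map without refitting anything.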
Representation of Aggregates
One of the most popular methods for representing structures is the binary tree.
$$y = x_1 - \frac{p_2 + x_2}{p_1}$$
Terminal nodes:
• Variables: x1, x2
• Parameters: p1, p2
Non-terminal nodes:
• Operators: +, -, *, /
• Functions: exp(), cos()
Tree of the example (root first, children indented):
-
    X1
    /
        +
            P2
            X2
        P1
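The binary-tree representation can be tried out with nested tuples in Python. This is only an illustrative sketch: the tuple encoding and the helper names `evaluate` and `to_infix` are assumptions, and the numeric values are arbitrary.

```python
import math
import operator

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}
FUNCTIONS = {"exp": math.exp, "cos": math.cos}


def evaluate(node, env):
    """Evaluate a tree; terminals are names looked up in `env`."""
    if isinstance(node, str):           # terminal node: variable or parameter
        return env[node]
    head, *children = node              # non-terminal node
    if head in OPERATORS:
        left, right = children
        return OPERATORS[head](evaluate(left, env), evaluate(right, env))
    (argument,) = children
    return FUNCTIONS[head](evaluate(argument, env))


def to_infix(node):
    """Render a tree as a fully parenthesised formula."""
    if isinstance(node, str):
        return node
    head, *children = node
    if head in OPERATORS:
        return f"({to_infix(children[0])} {head} {to_infix(children[1])})"
    return f"{head}({to_infix(children[0])})"


# The tree of the slide: y = x1 - (p2 + x2) / p1
tree = ("-", "x1", ("/", ("+", "p2", "x2"), "p1"))
values = {"x1": 5.0, "x2": 1.0, "p1": 2.0, "p2": 3.0}

formula = to_infix(tree)
y = evaluate(tree, values)
print(formula, "=", y)
```

Keeping variables and parameters as separate terminal names makes it possible to tune the parameters later without touching the tree structure.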
Scheme of Genetic Programming
[Flowchart: Creation of initial population → Evaluation (Parameter optimization, Fitness value) → End? → End; otherwise Selection → Direct reproduction, Crossover, Mutation → New generation → back to Evaluation]
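The loop in the flowchart can be sketched as a small, self-contained Python program. It is a toy under stated assumptions, not the authors' implementation: the cost is a sum of squared errors on a made-up target, selection is by tournament, the parameter-optimization box is omitted, and all names and rates are hypothetical.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_NODES = 60  # simple size limit against bloat (assumed value)


def random_tree(rng, depth=3):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(list(OPS)),
            random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def cost(tree, samples):
    """Sum of squared errors; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))


def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(rng, a, b):
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)


def mutate(rng, tree):
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, depth=2))


def tournament(rng, population, scores, k=3):
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=lambda i: scores[i])]


def evolve(samples, pop_size=40, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]  # initial population
    history = []
    for _ in range(generations):
        scores = [cost(t, samples) for t in population]        # evaluation
        best = min(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        if scores[best] == 0:                                   # End?
            break
        new_population = [population[best]]                     # direct reproduction
        while len(new_population) < pop_size:
            parent = tournament(rng, population, scores)        # selection
            r = rng.random()
            if r < 0.7:
                child = crossover(rng, parent, tournament(rng, population, scores))
            elif r < 0.9:
                child = mutate(rng, parent)
            else:
                child = parent
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = parent
            new_population.append(child)
        population = new_population                             # new generation
    scores = [cost(t, samples) for t in population]
    best = min(range(len(population)), key=lambda i: scores[i])
    return population[best], scores[best], history


samples = [(x, x * x + x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best_tree, best_cost, history = evolve(samples)
print(best_cost, best_tree)
```

Copying the best individual unchanged into the next generation means the best cost can never get worse from one generation to the next.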
Process of model development
Measurement
• Online spectrum
• Laboratory data
MATLAB
• Preprocessing
• Data query
MATLAB: Genetic algorithm
TOPNIR environment
Online System
Conclusion
The quality of mapping is measurable
• Neighborhood preserving (forward and backward)
• Discriminating operational regimes
Aggregate based mapping
• Interpretable chemical information
• Building aggregates needs much experience (divination)
Genetic programming
• Controlled method to make new equations
• Needs a proper cost function (measure the quality of mapping)
Visual representation of models
• Aggregate -> 2D plot -> dashboard graph
• Information about the model structure
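One way to put a number on neighborhood preservation in both directions is the pair of rank-based measures known as trustworthiness and continuity. The slides do not say which measure was used, so the Python sketch below swaps in these two as an assumption; the Euclidean distance, the function names and the toy points are also illustrative.

```python
import math


def ranks(points):
    """ranks[i][j] = rank of point j among the neighbors of i (1 = nearest)."""
    n = len(points)
    table = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        row = [0] * n
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        table.append(row)
    return table


def _score(rank_ref, rank_test, k):
    """1 minus the normalised penalty for points that are k-neighbors
    under rank_test but not under rank_ref (valid for k < n / 2)."""
    n = len(rank_ref)
    penalty = 0
    for i in range(n):
        for j in range(n):
            if j != i and rank_test[i][j] <= k and rank_ref[i][j] > k:
                penalty += rank_ref[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def trustworthiness(original, mapped, k):
    """Penalises map neighbors that are not neighbors in the original space."""
    return _score(ranks(original), ranks(mapped), k)


def continuity(original, mapped, k):
    """Penalises original neighbors that are torn apart on the map."""
    return _score(ranks(mapped), ranks(original), k)


# Toy data: six points on a line in 3D, a faithful map and a shuffled map.
original = [(float(i), 0.0, 0.0) for i in range(6)]
good_map = [(p[0], 0.0) for p in original]
bad_map = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (4.0, 0.0), (2.0, 0.0), (5.0, 0.0)]

t_good = trustworthiness(original, good_map, k=2)
t_bad = trustworthiness(original, bad_map, k=2)
c_bad = continuity(original, bad_map, k=2)
print(t_good, t_bad, c_bad)
```

A measure of this kind could serve as the cost function that the genetic programming search needs, since it scores a candidate mapping without reference to any particular property.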
Questions? …
ACKNOWLEDGMENT
The financial support of the TAMOP-4.2.2/B-10/1-2010-0025 project is acknowledged.
In case of any question or remark please contact us