1
Simple Interval Calculation (SIC-method)
theory and applications.
Rodionova Oxana
Semenov Institute of Chemical Physics RAS & Russian Chemometric Society
Moscow
2
Plan
1. Introduction
2. Main Features of SIC-method
3. Treatment of Parameter β
4. SIC-object status classification
5. Conclusions
3
First Question.
Why do we need other methods?
Classical statistical methods
Chemometric approach & projection methods
SIC-method
4
Second Question.
Why did we give our method this name?
Simple Interval Calculation (SIC-method):
1. a simple idea lies at its core;
2. well-known mathematical methods are used to implement it;
3. it gives the prediction result directly in interval form.
5
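Main Assumption of the SIC-method
All errors are limited: there exists $\beta > 0$ such that $\mathrm{Prob}\{|\varepsilon| > \beta\} = 0$.
Normal distribution: support $(-\infty, +\infty)$.
Finite distributions: support $(-\beta, +\beta)$.
The value $\beta$ is the Maximum Error Deviation (MED).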
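A small NumPy sketch of what the assumption means in practice. The MED value beta = 0.5 and the use of a uniform distribution as the "finite" example are assumptions chosen only for illustration: every uniform error stays within the MED, while a normal distribution with the same standard deviation regularly exceeds it.

```python
import numpy as np

beta = 0.5                          # hypothetical MED
rng = np.random.default_rng(0)
n = 100_000

# Finite distribution (assumed example): uniform on [-beta, beta).
eps_finite = rng.uniform(-beta, beta, size=n)

# Normal distribution with the same standard deviation (beta / sqrt(3)).
eps_normal = rng.normal(0.0, beta / np.sqrt(3.0), size=n)

print(np.abs(eps_finite).max() <= beta)      # True: every error is within the MED
print(np.mean(np.abs(eps_normal) > beta))    # fraction of normal errors beyond beta
```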
6
The Region of Possible Values (RPV)
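Let $(x_i, y_i)$, $i = 1, \dots, n$, be a calibration sample (an object).
Exact (errorless) model: $\upsilon = Xa$.
Inexact (real) model: $y = Xa + \varepsilon$, i.e. $y = \upsilon + \varepsilon$, where $X$ is an $n \times p$ matrix ($n$ samples, $p$ variables).
$\upsilon_i - \beta \le y_i \le \upsilon_i + \beta$ (1)
$y_i - \beta \le x_i^{t} a \le y_i + \beta$ (2)
$\beta$ is known. All vectors $a$ that satisfy (2) form a strip $S(x_i, y_i) \subset \mathbb{R}^p$.
RPV: $A = \bigcap_{i=1}^{n} S(x_i, y_i)$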
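A minimal membership check for the RPV, written in Python with NumPy. The function name `in_rpv` and the three-object calibration set are hypothetical; the check applies inequality (2) to every object at once, which is exactly what membership in the intersection of all strips requires.

```python
import numpy as np

def in_rpv(a, X, y, beta):
    """True if coefficient vector a lies in the RPV A for error bound beta.

    a is in A exactly when it lies in every strip S(x_i, y_i),
    i.e. y_i - beta <= x_i^t a <= y_i + beta for all i = 1..n.
    """
    residuals = y - X @ a            # y_i - x_i^t a for every calibration object
    return bool(np.all(np.abs(residuals) <= beta))

# Hypothetical calibration set with n = 3 objects and p = 2 variables.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 1.0, 2.02])
beta = 0.05

print(in_rpv(np.array([1.00, 1.00]), X, y, beta))  # True: all residuals within 0.05
print(in_rpv(np.array([1.04, 1.04]), X, y, beta))  # False: third residual is about -0.06
```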
7
The Simplest Example of RPV
Example: $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$
[Figure: the design points in the (X1, X2) plane, and panels 1–5 showing the RPV in the (a1, a2) plane, with a1 and a2 ranging from 0.9 to 1.1]
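For this two-object example, definition (2) can be worked out by hand. Each row of the identity matrix constrains only one coefficient, so each strip is a band parallel to one axis, and the RPV is a square of side 2β centred at (y1, y2):

```latex
x_1 = (1, 0)^{t}, \quad x_2 = (0, 1)^{t} \\
S(x_1, y_1) = \{ a : y_1 - \beta \le a_1 \le y_1 + \beta \}, \qquad
S(x_2, y_2) = \{ a : y_2 - \beta \le a_2 \le y_2 + \beta \} \\
A = S(x_1, y_1) \cap S(x_2, y_2) = [y_1 - \beta,\ y_1 + \beta] \times [y_2 - \beta,\ y_2 + \beta]
```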
8
Properties of the RPV A
9
SIC Prediction
[Figure: for test samples 1–4, the V prediction interval and the U test interval]
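One way to compute the two intervals in the figure, assuming that V = [v−, v+] is obtained by minimizing and maximizing xᵗa over the RPV A, and that U = [y − β, y + β] for a test sample with measured reference value y. Because A is an intersection of strips (a convex polytope), each bound of V is a linear program; the sketch below uses SciPy's `linprog` with the HiGHS solver. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def sic_prediction_interval(x, X, y, beta):
    """Return (v_minus, v_plus) = (min, max) of x^t a over the RPV A.

    A is the set of a with y_i - beta <= x_i^t a <= y_i + beta for all i,
    written for linprog as  X a <= y + beta  and  -X a <= -(y - beta).
    """
    A_ub = np.vstack([X, -X])
    b_ub = np.concatenate([y + beta, -(y - beta)])
    free = [(None, None)] * X.shape[1]  # coefficients a may have any sign

    lo = linprog(c=x, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
    hi = linprog(c=-x, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
    if lo.status != 0 or hi.status != 0:
        # status 2: A is empty for this beta; status 3: x^t a is unbounded on A
        raise ValueError(f"LP failed: {lo.message} / {hi.message}")
    return lo.fun, -hi.fun

def sic_test_interval(y_new, beta):
    """U interval [y - beta, y + beta] around a measured reference value."""
    return y_new - beta, y_new + beta

# Hypothetical example: the identity design with y = (1, 1) and beta = 0.05.
v_minus, v_plus = sic_prediction_interval(
    np.array([1.0, 1.0]), np.eye(2), np.array([1.0, 1.0]), 0.05)
print(v_minus, v_plus)   # approximately 1.9 and 2.1
```

The `bounds` argument matters: `linprog` restricts variables to be non-negative by default, whereas model coefficients can have any sign. If the calibration set does not pin down all p coefficients, A is unbounded in some directions and the solver may report status 3 instead of returning an interval.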
10
Example of SIC Prediction
[Figure: (left) scores plot, PC1 vs. PC2, of calibration samples C1–C11 and test samples T1–T4; (right) the RPV in the (a1, a2) plane, bounded by the strip borders C2±, C3+, C4±, C6−, C10−, C11±, with vertices A–E and the PCR estimate marked]
11
Treatment of Parameter β
β is known a priori;
β is an unknown parameter of the error distribution;
β is a parameter of the method, and it is unknown.
12
Unknown β. How to Find It?
The value b is used instead of β.
The RPV A depends on b, and A(b) expands monotonically as b increases.
There exists a minimum value $b_{min}$ such that $A(b) \neq \emptyset$. This minimum value may be taken as an estimator of the parameter β.
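Since A(b) grows monotonically with b, b_min is the smallest b for which all n strips still share a common point, that is, the minimax (Chebyshev, L∞) residual: the minimum over a of max_i |y_i − x_iᵗa|. A sketch that computes it as a linear program in the unknowns (a, b) with SciPy; the name `sic_b_min` is hypothetical, and this is a standard LP formulation, not necessarily the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def sic_b_min(X, y):
    """Smallest b with A(b) non-empty: min over a of max_i |y_i - x_i^t a|.

    Decision variables z = (a_1, ..., a_p, b); minimise b subject to
        x_i^t a - b <= y_i   and   -x_i^t a - b <= -y_i   for all i.
    """
    n, p = X.shape
    c = np.zeros(p + 1)
    c[-1] = 1.0                                   # objective: b only
    minus_one = -np.ones((n, 1))
    A_ub = np.vstack([np.hstack([X, minus_one]),
                      np.hstack([-X, minus_one])])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)]     # a free, b >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if res.status != 0:
        raise ValueError(res.message)
    return res.fun, res.x[:p]                     # (b_min, a vector attaining it)

# Hypothetical toy data: one variable, three objects.
b, a = sic_b_min(np.ones((3, 1)), np.array([0.0, 1.0, 2.0]))
print(b, a)   # b_min is about 1.0, attained at a of about 1.0
```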
13
β is the unknown parameter of the error distribution.
The accuracy of the estimate $b_{min}$ depends on:
1. the number of objects in the calibration set (N): $b_{min} \to \beta$ as $N \to \infty$;
2. the form of the error distribution.
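Point 1 rests on two short arguments worth spelling out: under the main assumption b_min can never exceed β, and because adding an object only adds a strip, b_min can only grow with N. Here a* denotes the true coefficient vector and A_N(b) the RPV built from the first N objects (notation introduced for this derivation):

```latex
y_i = x_i^{t} a^{*} + \varepsilon_i, \quad |\varepsilon_i| \le \beta
  \;\Rightarrow\; |y_i - x_i^{t} a^{*}| \le \beta \ \ \forall i
  \;\Rightarrow\; a^{*} \in A(\beta)
  \;\Rightarrow\; A(\beta) \neq \emptyset
  \;\Rightarrow\; b_{min} \le \beta \\
A_{N+1}(b) = A_N(b) \cap S(x_{N+1}, y_{N+1}) \subseteq A_N(b)
  \;\Rightarrow\; b_{min}(N+1) \ge b_{min}(N)
```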
14
Statistical Simulation
[Figure: distribution curves for k = 0.3, 1, 2, plotted over the range −1.2 to 1.2]
Number of objects in the calibration set: N = 10, 20, 50, 75, 100, 250
k = 0.3, 0.5, 1, 1.5, 2, 2.5, 3
Number of repeated series: m = 500 at each (N, k)
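A scaled-down version of such a simulation can be sketched in Python. The slides do not give the distribution family indexed by k, so this sketch substitutes uniform errors on (−β, β); the random design, a_true, β = 1 and m = 50 are likewise assumptions. Each series computes b_min by the minimax linear program, and the printed mean of b_min/β should stay below 1 and increase with N.

```python
import numpy as np
from scipy.optimize import linprog

def b_min(X, y):
    """Minimax residual min_a max_i |y_i - x_i^t a| via a linear program in (a, b)."""
    n, p = X.shape
    c = np.r_[np.zeros(p), 1.0]                   # minimise b
    col = -np.ones((n, 1))
    A_ub = np.block([[X, col], [-X, col]])        # +-(x_i^t a) - b <= +-y_i
    b_ub = np.r_[y, -y]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * p + [(0, None)], method="highs")
    return res.fun

rng = np.random.default_rng(1)
beta = 1.0                          # true MED of the simulated errors (hypothetical)
a_true = np.array([0.5, -0.2])      # hypothetical true coefficients, p = 2
m = 50                              # series per N (the slides use m = 500)

mean_ratio = {}
for N in (10, 20, 50, 100, 250):
    ratios = []
    for _ in range(m):
        X = rng.normal(size=(N, 2))               # hypothetical random design
        eps = rng.uniform(-beta, beta, size=N)    # assumed stand-in for the k-family
        y = X @ a_true + eps
        ratios.append(b_min(X, y) / beta)
    mean_ratio[N] = float(np.mean(ratios))
    print(N, round(mean_ratio[N], 3))
```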
15
$b_{sic}$ Calculation
$b_{sic} = b_{reg} \cdot C(N, s)$
k s0.3 0.5738861 0.53956
1.5 0.4950982 0.4398133 0.328859
N = 100 (fixed), k = 0.3, …, 3: 3500 points
Initial fit: y = 0.5471x + 0.7263
[Figure: $b_{reg}$ vs. s (initial and corrected) and $b_{conf}$ vs. s, for s from 0.2 to 0.7]
16
Octane Rating Example
[Figure: spectral data, absorbance vs. wavelength (1100–1550); short training set (1–24), long training set (1–26), short test set (A–I), long test set (A–M)]
X predictors are NIR measurements (absorbance spectra) at 226 wavelengths.
Y response is the reference measurement of octane number.
Training set = 26 samples
Test set = 13 samples
[Figure: geometrical shape of the RPV for 3 PCs, short training set]
17
Octane Rating Example
[Figure: octane number of test samples A–M; samples A–I on the left axis (86–93), samples J–M on the right axis (60–120)]
PCR and SIC prediction for 3 PCs
Points show test values with error bars and PCR estimates; bars are SIC intervals; curves are the borders of the PCR confidence intervals. Short test set.
Test set with outliers
s = 0.475, C = 1.12
18
Quality of Calibration
RMSEC
[Figure: distribution curves for k = 0.3, 1, 2 over the range −1.2 to 1.2, annotated with $b_{sic}$]
$b_{sic} \approx 1.7 \cdot RMSEC$, $b_{sic} \approx 1.9 \cdot RMSEC$, $b_{sic} \approx 2.3 \cdot RMSEC$
$b_{sic} \approx (1/s) \cdot RMSEC$
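To compare b_sic with the calibration error on your own data, RMSEC has to be computed first. A minimal NumPy sketch, assuming RMSEC is the root mean squared least-squares residual with divisor n (conventions differ; some texts use n − p), with toy data and a hypothetical b_sic value:

```python
import numpy as np

def rmsec(X, y):
    """Root mean square error of calibration for an ordinary least-squares fit.

    Assumption: the sum of squared residuals is divided by n; some texts use
    n - p (or n - p - 1) instead. For PCR, X would hold the retained PC scores.
    """
    a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ a_hat
    return float(np.sqrt(np.mean(residuals ** 2)))

def b_sic_to_rmsec_ratio(b_sic, X, y):
    """Ratio b_sic / RMSEC, to compare with the ~1.7 to ~2.3 range on the slide."""
    return b_sic / rmsec(X, y)

# Toy data (hypothetical): intercept-only model, residuals -1, -1, +1, +1.
X = np.ones((4, 1))
y = np.array([0.0, 0.0, 2.0, 2.0])
print(rmsec(X, y))                       # approximately 1.0
print(b_sic_to_rmsec_ratio(1.9, X, y))   # approximately 1.9 for a hypothetical b_sic = 1.9
```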
19
Quality of Prediction
[Figure: the RPV in the (a1, a2) plane, bounded by the strip borders C2±, C3+, C4±, C6−, C10−, C11±, with vertices A–E and the PCR estimate marked]
New object (x, y)?
20
SIC Object Status Theory
21
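SIC-leverage / SIC-residual
SIC-leverage: $h(x) = \dfrac{v^{+} - v^{-}}{2\beta}$
SIC-residual: $r(x, y) = \dfrac{2y - v^{+} - v^{-}}{2\beta}$
where $V = [v^{-}, v^{+}]$ is the SIC prediction interval for x.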
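Both statistics are easy to compute once V is known. A Python sketch of h(x) = (v+ − v−)/(2β) and r(x, y) = (2y − v+ − v−)/(2β), with hypothetical function names and inputs: h(x) is the half-width of V in units of β, and r(x, y) is the offset of y from the centre of V, also in units of β.

```python
def sic_leverage(v_minus, v_plus, beta):
    """SIC-leverage h(x) = (v+ - v-) / (2 beta): half-width of V in units of beta."""
    return (v_plus - v_minus) / (2.0 * beta)

def sic_residual(y, v_minus, v_plus, beta):
    """SIC-residual r(x, y) = (2y - v+ - v-) / (2 beta): offset of y from the centre of V."""
    return (2.0 * y - v_plus - v_minus) / (2.0 * beta)

# Hypothetical values: V = [0.5, 1.5], measured y = 1.5, beta = 1.0.
h = sic_leverage(0.5, 1.5, 1.0)
r = sic_residual(1.5, 0.5, 1.5, 1.0)
print(h, r)   # 0.5 0.5
```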
22
SIC Object Status Map: r(x, y) – SIC-residual, h(x) – SIC-leverage
[Figure: four panels plotting r(x, y) (from −2.3 to 2.2) against h(x) (from 0 to 1.5)]
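The natural borders on the status map can be derived by comparing the prediction interval V = [v−, v+] with the test interval U = [y − β, y + β], assuming h(x) = (v+ − v−)/(2β) and r(x, y) = (2y − v+ − v−)/(2β). Shifting both intervals by the centre of V and scaling by 1/β gives the conditions below, so the lines |r| = 1 − h and |r| = 1 + h separate "V inside U" from "U and V disjoint":

```latex
v = \tfrac{1}{2}(v^{+} + v^{-}), \quad u = \tfrac{1}{2}(v^{+} - v^{-})
  \;\Rightarrow\; h = u / \beta, \quad r = (y - v) / \beta \\
V \mapsto [-h,\ h], \qquad U = [y - \beta,\ y + \beta] \mapsto [r - 1,\ r + 1] \\
V \subseteq U \iff r - 1 \le -h \ \text{and} \ h \le r + 1 \iff |r| \le 1 - h \\
U \cap V = \emptyset \iff r - 1 > h \ \text{or} \ r + 1 < -h \iff |r| > 1 + h
```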
23
Octane Rating Example
[Figure: SIC object status maps, r(x, y) vs. h(x), for test samples A–I (h(x) from 0 to 1.5) and J–M (h(x) from 12 to 20), shown alongside the PCR and SIC predictions of octane number for samples A–M]
$b_{sic}$ = 0.66, 3 PCs
24 calibration samples, 10 boundary samples
24
Wheat Quality Monitoring
X predictors are NIR measurements (log-values of absorbance spectra) at 20 wavelengths.
Y response is the reference measurement of protein content.
Training set = 165 (3 × 55) wheat samples
Standard error of the reference method = 0.09
PLS model with 7 PCs
Sample 35 is an outlier.
[Figure: spectra over wavelengths 1440–2240]
25
Wheat Quality Monitoring
$b_{min}$ = 0.147
$b_{sic}$ = 0.241
[Figure: SIC object status maps, r(x, y) vs. h(x), with the border lines r = h + 1 and r = −h − 1; sample No. 35 is marked]
18 boundary samples
26
Main rules
β is known a priori: check that $A(\beta) \neq \emptyset$.
YES: calculate $b_{min}$ and $b_{sic}$.
NO: error of modeling.
Calculate prediction intervals for test samples.
A sample is inside the model: reliable prediction.
A sample is an absolute outsider: it differs from the calibration samples.
New sample: absolute outsider or not.
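The check that A(β) ≠ ∅ in the first rule is a pure feasibility problem: a linear program with a zero objective that succeeds exactly when some a lies in every strip. A SciPy sketch with a hypothetical function name and toy data:

```python
import numpy as np
from scipy.optimize import linprog

def rpv_is_nonempty(X, y, beta):
    """Feasibility check: does some a satisfy |y_i - x_i^t a| <= beta for all i?"""
    p = X.shape[1]
    res = linprog(c=np.zeros(p),                        # no objective, feasibility only
                  A_ub=np.vstack([X, -X]),
                  b_ub=np.concatenate([y + beta, -(y - beta)]),
                  bounds=[(None, None)] * p,            # coefficients may have any sign
                  method="highs")
    return res.status == 0    # status 2 means infeasible, i.e. A(beta) is empty

# Hypothetical toy data: one variable, three objects.
X = np.ones((3, 1))
y = np.array([0.0, 1.0, 2.0])
print(rpv_is_nonempty(X, y, 1.5))   # True
print(rpv_is_nonempty(X, y, 0.5))   # False: an error of modeling for this beta
```

Equivalently, A(β) is non-empty exactly when β ≥ b_min, so the same answer can be obtained by computing b_min once and comparing it with the a priori β.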
27
The Main Features of the SIC-method
SIC-METHOD
• gives the result of prediction directly in interval form;
• calculates the prediction interval regardless of the sample's position relative to the model;
• combines all errors involved in bilinear modelling and estimates the Maximum Error Deviation for the model;
• provides wide possibilities for sample classification and outlier detection.