Environmental Applications of Chemometrics
Pentti Minkkinen, Lappeenranta University of Technology
e-mail: [email protected]
WSC2, Barnaul, March 2003
General
• Many environmental data sets (problems) are a challenge to the data analyst: they are multivariate, cover long time series, contain missing values, and new analytical methods are adopted over time
• Needs: data compression, visualization, modeling
• Problem types: classification, process modeling, monitoring and detection of trends, new information from old data
• Standard chemometric methods (PCA, PLS and DPLS) can be used for many different problems
Contents
• Dependence of emission of diesel engine on its running speed and load
• Effect of exposure to vanadium dust in industrial environment
• Effects of industrial effluents in the recipient lake
• Multivariate study on urban aerosol samples
• Periodicities of the surface level fluctuation of two large Finnish lakes
Emissions of a Diesel Engine
RUN             SO4     NO3     ORG     OCO     elC     PAH     TOT
A100          11.77    4.16   20.44   24.53   33.87   0.040   73.34
A50            4.63    2.48   37.23   44.67   25.88   0.027   77.65
A25            2.34    2.88   48.05   57.66   24.58   0.036   87.45
Mean          6.247   3.173   35.24   42.29   28.11   0.034   79.48
Stand. dev.    4.92    0.88   13.91    16.7    5.03  0.0067    7.23
B100           5.39    1.83   20.38   24.46   51.52   0.026   83.21
B50            4.09    1.75   24.06   28.87   47.30   0.024   82.04
B25            1.71    1.64   41.58   49.89   35.16   0.036   88.39
Mean           3.73    1.74   28.67   34.41   44.66   0.029   84.55
Stand. dev.    1.87   0.095   11.32   13.59    8.49  0.0064    3.38
A = 1600 rpm; B = 2600 rpm; the number in the run name is the engine load.
Source: Ji Ping Shi et al., Environ. Sci. & Technol. 34 (No. 5, 2000) 748-755
Covariance and correlation matrices of X
cov(X) =
         SO4     NO3     ORG     OCO     elC     PAH     TOT
SO4    12.97    2.70  -30.34  -36.40    3.65   0.007  -18.42
NO3     2.70    0.93   -1.54   -1.85   -5.14   0.004   -3.71
ORG   -30.34   -1.54  141.68  169.99  -97.92    0.02   42.33
OCO   -36.40   -1.85  169.99  203.96 -117.49    0.03   50.80
elC     3.65   -5.14  -97.92 -117.49  121.15   -0.04    2.81
PAH    0.007   0.004    0.02    0.03   -0.04   0.000  -0.002
TOT   -18.42   -3.71   42.33   50.80    2.81  -0.002   33.18

corcoef(X) =
         SO4     NO3     ORG     OCO     elC     PAH     TOT
SO4     1.00    0.78   -0.71   -0.71    0.09    0.31   -0.89
NO3     0.78    1.00   -0.13   -0.13   -0.49    0.67   -0.67
ORG    -0.71   -0.13    1.00    1.00   -0.75    0.30    0.62
OCO    -0.71   -0.13    1.00    1.00   -0.75    0.30    0.62
elC     0.09   -0.49   -0.75   -0.75    1.00   -0.54    0.04
PAH     0.31    0.67    0.30    0.30   -0.54    1.00   -0.04
TOT    -0.89   -0.67    0.62    0.62    0.04   -0.04    1.00
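As a rough illustration, the covariance and correlation matrices can be recomputed from the six runs of the diesel emission table with NumPy. The array layout (rows in the order A100, A50, A25, B100, B50, B25) and the variable names are assumptions made for this sketch; the slides do not say which software produced the matrices.

```python
import numpy as np

# Diesel emission data; rows: A100, A50, A25, B100, B50, B25
# columns: SO4, NO3, ORG, OCO, elC, PAH, TOT
X = np.array([
    [11.77, 4.16, 20.44, 24.53, 33.87, 0.040, 73.34],
    [ 4.63, 2.48, 37.23, 44.67, 25.88, 0.027, 77.65],
    [ 2.34, 2.88, 48.05, 57.66, 24.58, 0.036, 87.45],
    [ 5.39, 1.83, 20.38, 24.46, 51.52, 0.026, 83.21],
    [ 4.09, 1.75, 24.06, 28.87, 47.30, 0.024, 82.04],
    [ 1.71, 1.64, 41.58, 49.89, 35.16, 0.036, 88.39],
])

# Variables are in columns, so rowvar=False; np.cov divides by n - 1.
cov = np.cov(X, rowvar=False)
corr = np.corrcoef(X, rowvar=False)

print(np.round(cov[0, :2], 2))   # variance of SO4 and cov(SO4, NO3)
print(np.round(corr[0, 1], 2))   # correlation of SO4 and NO3
```

The near-unit correlation between ORG and OCO reflects that the two columns are almost exactly proportional in the table, so they carry the same information for the later PCA and PLS models.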
Diesel: Variables one at a time (OAT)
[Figure: SO4, NO3, orgC, oco, elC, PAH and tot (vertical axis 0 to 250) at loads 100, 50 and 25, for 1600 rpm and 2600 rpm]
Two at a time
[Figures: scatter plots of all 21 variable pairs among SO4, NO3, org, oco, elC, PAH and tot; points labeled by run (A100, A50, A25, B100, B50, B25)]
PCA: X = T P', A = 3

Object scores:
T =
            T1      T2      T3
A100      2.80   -2.12    0.39
A50      -0.35   -0.48   -1.40
A25      -2.25   -1.29    0.15
B100      1.12    1.89    0.34
B50       0.63    1.81   -0.24
B25      -1.95    0.19    0.76

Variable loadings:
P' =
         SO4     NO3     ORG     OCO     elC     PAH     TOT
P1      0.47    0.24   -0.49   -0.49    0.24   -0.04   -0.43
P2     -0.26   -0.53   -0.21   -0.21    0.50   -0.51    0.22
P3      0.06    0.01   -0.12   -0.12    0.43    0.71    0.53

Index of determination (cumulative, %):
R2 = 52.8  90.7  99.0
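A minimal sketch of how a decomposition X = T P' with A = 3 could be computed, here through NumPy's SVD. It assumes the variables were autoscaled (mean-centered and divided by the sample standard deviation) before PCA, which the slide does not state; the function name `pca_svd` is illustrative only. The sign of each component is arbitrary, so columns may come out mirrored relative to the slide.

```python
import numpy as np

# Diesel emission data; rows: A100, A50, A25, B100, B50, B25
# columns: SO4, NO3, ORG, OCO, elC, PAH, TOT
X = np.array([
    [11.77, 4.16, 20.44, 24.53, 33.87, 0.040, 73.34],
    [ 4.63, 2.48, 37.23, 44.67, 25.88, 0.027, 77.65],
    [ 2.34, 2.88, 48.05, 57.66, 24.58, 0.036, 87.45],
    [ 5.39, 1.83, 20.38, 24.46, 51.52, 0.026, 83.21],
    [ 4.09, 1.75, 24.06, 28.87, 47.30, 0.024, 82.04],
    [ 1.71, 1.64, 41.58, 49.89, 35.16, 0.036, 88.39],
])

def pca_svd(X, n_components):
    """PCA of autoscaled X via SVD (autoscaling is an assumption here)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # object scores
    P = Vt[:n_components].T                      # variable loadings (columns)
    # cumulative percentage of the total sum of squares explained
    r2_cum = 100 * np.cumsum(s ** 2)[:n_components] / np.sum(s ** 2)
    return Z, T, P, r2_cum

Z, T, P, r2_cum = pca_svd(X, n_components=3)
print(np.round(T, 2))
print(np.round(P.T, 2))
print(np.round(r2_cum, 1))
```

The SVD route gives orthonormal loadings and orthogonal scores in one call; an iterative algorithm such as NIPALS would give the same components up to sign.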
Scores and loadings: PC1 vs. PC2
[Figure: score plot of runs A100 to B25 and loading plot of SO4, NO3, org, oco, elC, PAH and tot; axes T1 / P1 (52.8 %) and T2 / P2 (90.7 %)]
Biplot of scores and loadings
[Figure: scores and loadings overlaid on common axes T1, P1 (52.8 %) and T2, P2 (90.7 %)]
3-D graph of the scores (99 % variance explained)
[Figure: runs A100 to B25 plotted against T1, T2 and T3]
Can we make a predictive model?
Y =
  Speed (rpm)   Load
  1600          100
  1600           50
  1600           25
  2600          100
  2600           50
  2600           25
Given X (emissions) can we predict Y (engine speed and load)?
Inverse calibration problem for PLS
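PLS: Partial Least Squares or Projection to Latent Structures
[Figure: the X block (variables x1 ... xk) is projected onto scores t_i and the Y block (variables y1 ... yn) onto scores u_i; the inner relation links u_i to t_i]
X = T P' + E
Y = U Q' + F
U = T d + G
b_PLS = W (P'W)^-1 Q'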
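The relations above can be sketched with a small NIPALS-style PLS2 routine that returns the loading weights W, the loadings P and Q, and the regression matrix W (P'W)^-1 Q'. This is a minimal illustration under stated assumptions, not the software used for the slides: the function name `pls_nipals`, the convergence settings and the synthetic demo data are all hypothetical, and both blocks are assumed to be centered (or autoscaled) beforehand. The inner coefficient d is absorbed into Q here because Q is obtained by regressing Y on the X scores t.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS2 on centered X (n x k) and Y (n x m)."""
    X = np.array(X, dtype=float)   # working copies, deflated below
    Y = np.array(Y, dtype=float)
    W, P, Q = [], [], []
    for _ in range(n_components):
        # start from the Y column with the largest variance
        u = Y[:, [int(np.argmax(Y.var(axis=0)))]]
        for _iter in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)          # loading weights, unit length
            t = X @ w                       # X scores
            q = Y.T @ t / (t.T @ t)         # Y loadings
            u_new = Y @ q / (q.T @ q)       # Y scores
            done = np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = X.T @ t / (t.T @ t)             # X loadings
        X -= t @ p.T                        # deflate X
        Y -= t @ q.T                        # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.hstack(M) for M in (W, P, Q))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T    # b_PLS = W (P'W)^-1 Q'
    return B, W, P, Q

# Synthetic demo data (not from the slides): 20 objects, 3 x and 2 y variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
Y_demo = X_demo @ np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y_demo = Y_demo + 0.1 * rng.normal(size=(20, 2))
X_demo -= X_demo.mean(axis=0)
Y_demo -= Y_demo.mean(axis=0)

B, W, P, Q = pls_nipals(X_demo, Y_demo, n_components=3)
Y_hat = X_demo @ B                          # predictions for centered data
```

With as many components as X has columns, the PLS regression matrix coincides with the ordinary least-squares solution; using fewer components is what gives PLS its regularizing effect on small, collinear data sets such as the six diesel runs.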
PLS results between autoscaled LOG(X) and autoscaled Y
Percent Variance Captured by PLS Model
         -----X-Block-----    -----Y-Block-----
LV #     This LV     Total    This LV     Total
  1        54.15     54.15      43.21     43.21
  2        35.23     89.38      45.29     88.50
  3         9.03     98.41       3.89     92.39
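The preprocessing named in the slide title, autoscaled LOG(X) and autoscaled Y, might look like the sketch below. The base of the logarithm is not given on the slide; the sketch uses base 10, which is harmless because a change of base only rescales each column by a constant that autoscaling removes. The helper name `autoscale` is an assumption of this example.

```python
import numpy as np

def autoscale(A):
    """Center each column and divide by its sample standard deviation."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Diesel emission data; rows: A100, A50, A25, B100, B50, B25
# columns: SO4, NO3, ORG, OCO, elC, PAH, TOT
X = np.array([
    [11.77, 4.16, 20.44, 24.53, 33.87, 0.040, 73.34],
    [ 4.63, 2.48, 37.23, 44.67, 25.88, 0.027, 77.65],
    [ 2.34, 2.88, 48.05, 57.66, 24.58, 0.036, 87.45],
    [ 5.39, 1.83, 20.38, 24.46, 51.52, 0.026, 83.21],
    [ 4.09, 1.75, 24.06, 28.87, 47.30, 0.024, 82.04],
    [ 1.71, 1.64, 41.58, 49.89, 35.16, 0.036, 88.39],
])

# Engine speed (rpm) and load for the same six runs
Y = np.array([
    [1600, 100], [1600, 50], [1600, 25],
    [2600, 100], [2600, 50], [2600, 25],
])

Xs = autoscale(np.log10(X))   # autoscaled LOG(X); log base is an assumption
Ys = autoscale(Y)             # autoscaled Y
```

Predictions made on the autoscaled Y have to be multiplied by the column standard deviations and shifted by the column means of Y to get back to rpm and load units.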
Diesel, PLS model
[Figure: biplot of PLS loading weights (W) and Y variable loadings (Q); axes W1, Q1 and W2, Q2; X variables SO4, NO3, org, oco, elC, PAH and tot; Y variables rev and load]
Regression coefficients from PLS
[Figure: PLS regression coefficients of SO4, NO3, org, oco, elC, PAH and tot for engine revolutions and for engine load]
Diesel: PLS prediction
[Figure: predicted vs. measured engine revolutions for runs A100 to B25]
[Figure: predicted vs. measured engine load for runs A100 to B25]
Clinical effects of exposure to vanadium dust
26 clinical variables on blood serum were measured on two matched groups: a test group (18 persons exposed to vanadium dust in a V2O5 factory) and a control group (17 persons not exposed to vanadium dust).
Data measured by Lauri Pyy et al.
Exposure to vanadium: comparison by variables one at a time (OAT)
[Figure: scaled concentrations (horizontal axis) by variable number (vertical axis) for Gluc, Alb, Cl, K, Na, Crea, Urea, Urat, Ca, PI, Bil, B-cf, Afos, Alat, Asat, LD, Chol, Trig, Fe, gCT, IGE, IGA, IGG, IGM, BSP and Prot]
PCA SCORE PLOT
[Figure: scores of the persons, labeled V and C; axes T1 (17.5 %) and T2 (29.9 %)]
DESCRIPTOR MATRIX X                     DUMMY MATRIX Y

x11      x12      …   x1k               1  0    CLASS 1
x21      x22      …   x2k               1  0
…        …        …   …                 …  …
xi1      xi2      …   xik               1  0
xi+1,1   xi+1,2   …   xi+1,k            0  1    CLASS 2
…        …        …   …                 …  …
xn1      xn2      …   xnk               0  1

PLS is run between X and Y.

Construction of the dummy (indicator) matrix for DPLS (PLS discriminant analysis), which is used to find the projections of X space that best discriminate the classes of the training set.
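As a small illustration of the construction described above, the dummy matrix can be built from a list of class labels. The labels "V" (exposed) and "C" (control) and the group sizes 18 and 17 follow the vanadium example; the function name `dummy_matrix` and the ordering of the persons are assumptions of this sketch.

```python
import numpy as np

def dummy_matrix(labels):
    """Return an n x (number of classes) indicator matrix and the class order.

    Classes are ordered by first appearance, so the first class found in
    `labels` becomes the first column.
    """
    classes = list(dict.fromkeys(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes]
                  for lab in labels])
    return Y, classes

# 18 exposed persons followed by 17 controls (ordering assumed for illustration)
labels = ["V"] * 18 + ["C"] * 17
Y, classes = dummy_matrix(labels)

print(classes)      # column order of the dummy matrix
print(Y[:2])        # rows of class 1
print(Y[-2:])       # rows of class 2
```

In DPLS this Y is then used as the response block of an ordinary PLS model; a new object is commonly assigned to the class whose predicted column value is largest.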
D-PLS SCORE PLOT
[Figure: scores of the persons, labeled V and C, on axes T1 and T2]
Percent Variance Captured by PLS Model
         -----X-Block-----    -----Y-Block-----
LV #     This LV     Total    This LV     Total
  1         11.8      11.8       76.2      76.2
  2          5.6      17.4       13.3      89.5
BIPLOT OF THE D-PLS MODEL
[Figure: scores of the persons (V and C) overlaid with the loading weights of the 26 clinical variables; axes T1, W1 and T2, W2]
Effect of Industrial Effluents on Trace Element Patterns of Aquatic Plants
Field work: Jukka Särkkä
Analyses: Inkeri Yliruokanen
Study area in Lake Päijänne: aquatic plants (3 species of Nympheids) were collected in two summers, with a follow-up at Site 1. Samples were analysed for ash % and 13 minor and trace elements.
1 = Jyväsjärvi: industrial and municipal effluents, follow-up after remedial measures
2 = Tiirinselkä: heavily polluted by industrial effluents
3 = Judinsalonselkä: intermediate zone
4 = Tehinselkä: clean area
Species/Location codes in the first column; ash in %, all elements in µg/g.

Sample  Ash %      V     Mn     Fe     Cu     Zn     Rb     Sr     Ba      Y     La     Ce     Pr     Pb
A1       12.3    2.4   1200   2800     24     59     24    120    160    2.0    3.0    6.0    1.0    8.6
A2       12.9    7.7   1600   2200     20    110     28     77    130    1.8    3.8    5.1    0.4    5.1
A2       18.9   30.0   1600   5300     56    140     40     94    190    7.5     13     34    1.5    9.4
A3        9.5    1.0    290    360    9.5     66     19     87     95    1.1    1.5    1.9    0.2    1.3
A3        7.2    0.7    250    280    3.6     57     14     86     36    0.2    0.5    0.7      0    1.4
A4        9.0    0.9    650   1100    5.4     36     16     99     27    0.4    0.6    0.9    0.1    0.9
A4        7.5    3.0    540    970    6.0     22     15     97     75    0.5    3.0    3.3    0.2    3.7
B1       12.4    3.7    740   2300     12     42     32     60     93    2.4    1.2    2.4    0.2    2.4
B1        7.1    2.8    300   2000     14     30     26     36     42    1.4    2.1    2.8    0.2    3.5
B3        6.9    0.7    320    590    3.5     21     22     52     31    1.0    1.4    1.4    0.2    1.4
B3        9.7    5.8    610   1600    2.9     23     33     62     38    1.0    1.0    2.4    0.2    1.9
B4        8.1    0.8    120    340    4.1     21     30     51     36    0.6    0.6    0.4    0.1    2.4
B4        5.5    0.8    209    310    2.7     10     17     40     16    0.2    0.6    0.6   0.05    1.7
C1       13.4    2.7    250    270     20     20     53     19    180    0.3    0.1    0.3      0     21
C2       10.8    2.1    960    520     11     41     38     16    150    0.5    0.6    0.6    0.1    1.0
C2        8.4    1.1    840    400     17     45     50     15     84    0.2    0.3    0.7   0.08    1.7
C3       11.5    0.6    240    140    3.4     65     31     24    180    0.3    0.5    0.9    0.1    2.1
C3       10.1      1    300    110    3.0     40     36     23    120    0.3    0.7    0.8    0.1    0.8
C4        9.0    0.3     93     93    1.9     18     28     65     48    0.1    0.2    0.5      0    0.3
C4        8.5    0.9    140    190    2.6     12     30     29     93    0.2    0.4    0.4      0    1.4
a1       7.91    1.0   1200    240     10     27     20     79    110    0.3    1.2    1.7    0.2    1.0
a1       8.29    0.9    360    250    8.3     21     11     65     41    0.1    1.4    0.9    0.2    8.3
b1       9.17    0.9   1200    440     17     27     29     68     50    0.5    1.3    2.4    0.2    1.1
b1       6.46    1.9    340    430    6.5      9     17     34     29    0.1    1.1    1.4    0.2    1.0
c1       9.24    1.0   1200    160    5.9     19     21     13    130    0.3    1.0    1.1    0.3    1.2
c1       9.27    0.7    650    100     10     18     25     25    140    0.3    0.5    0.5    0.1    1.0
c1       9.02    0.7    330    100    4.7     14     31     12     52    0.3    0.5    0.6    0.1    1.8

SPECIES: A and a = Potamogeton natans, B and b = Polygonum amphibium, C and c = Nuphar luteum. The digit is the sampling location. Lower-case codes (a1, b1, c1) are the Site 1 follow-up samples, labeled ar, br and cr in the figures.
CLASSIFICATION ACCORDING TO SPECIES
[Figure: score plot of all samples; axes T1 (49.1 %) and T2 (69.5 %)]
Nympheids: SAMPLE SCORES
[Figure: 3-D plot of the sample scores on T1, T2 and T3]
Nympheids: VARIABLE LOADINGS
[Figure: 3-D plot of the loadings of A%, V, Mn, Fe, Cu, Zn, Rb, Sr, Ba, Y, La, Ce, Pr and Pb on P1, P2 and P3]
EFFECT OF REMEDIAL MEASURES ON JYVÄSJÄRVI (Sites 1 and r)
[Figure: score plot of all samples; axes T1 (65.3 %) and T2 (85.5 %)]
EFFECT OF REMEDIAL MEASURES ON JYVÄSJÄRVI (Site 1 and 1r): DPLS model with Site 1 and Site 4 objects, other objects fitted to this model
[Figure: T1 vs. T2 score plots in two panels: the model objects (A1, A4, B1, B4, C1, C4) and the fitted objects (A2, A3, B3, C2, C3, ar, br, cr)]
EFFECT OF REMEDIAL MEASURES ON JYVÄSJÄRVI (Site 1 and 1r): DPLS model with Site 1 and Site 4 objects, other objects fitted to this model (different scaling from the previous figure; the information is still the same)
[Figure: the same scores plotted with different axis scaling]
THANK YOU
Спасибо