Statistical testing in MSPC
ISPE PAT CoP meeting, Umetrics AB, Malmö, October 1, 2008
Frans van den Berg
University of Copenhagen, Faculty of Life Sciences
Department of Food Science, Quality and Technology
Spectroscopy and Chemometrics group
[email protected]
www.models.life.ku.dk www.odin.life.ku.dk
(080930)
Statistical Process monitoring (SPM) and Control (SPC)
[Figure: a process with input (raw material), output (product) and disturbance (environment), followed by a monitoring chart at one position along continuous time (past to future); disturbances show up as a level change or a distribution change.]
Statistical Process Control – Mean/Shewhart chart
[Figure: Shewhart chart of the observations against time, with mean µ and standard deviation σ.]
Expectation and population parameters
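Expected value of a sample statistic for n observations x_{i=1:n}:

Mean (locality):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad E(\bar{x}) = \mu$$

Variance (spread):
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad E(s_x^2) = \sigma^2$$

[Figure: distribution of the observations, e.g. Normal distribution N(µx, σx); 68.3% of the observations lie within ±1σ, 95.4% within ±2σ and 99.7% within ±3σ.]

Notice: µ and σ are population constants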
Statistical Process Control – Mean/Shewhart chart
[Figure: Shewhart chart against time, with a training period.]
M = model of the process
Some basic notions
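Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Variance:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard deviation:
$$s_x = \sqrt{s_x^2}$$

Standard error:
$$SE_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$

Relative SD, and RSD in % of the mean (coefficient of variation):
$$s_r = \frac{s_x}{\bar{x}}, \qquad s_{r\%} = 100\% \cdot \frac{s_x}{\bar{x}}$$

95% confidence interval/level:
$$\Pr\left(\mu - t_{n-1}\,SE_{\bar{x}} < \bar{x} < \mu + t_{n-1}\,SE_{\bar{x}}\right) = 95\%$$
$$\Pr\left(\bar{x} - t_{n-1}\,SE_{\bar{x}} < \mu < \bar{x} + t_{n-1}\,SE_{\bar{x}}\right) = 95\%$$

For large n, or when s_x is given, t_{n-1} = z_0 = 1.96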
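The basic notions above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the measurement values are assumptions, not part of the presentation, and z = 1.96 is used for the interval, which strictly holds only for large n.

```python
import math
import statistics

def summarize(x, z=1.96):
    """Sample statistics from the 'basic notions' slide (z = 1.96 assumes large n)."""
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.variance(x)      # divides by n - 1
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)            # standard error of the mean
    return {
        "mean": mean,
        "variance": var,
        "sd": sd,
        "se": se,
        "rsd_percent": 100.0 * sd / mean,        # coefficient of variation
        "ci95": (mean - z * se, mean + z * se),  # approximate 95% interval
    }

# Hypothetical measurements, not taken from the presentation.
data = [4.98, 5.02, 5.01, 4.97, 5.03, 5.00, 4.99, 5.00]
stats = summarize(data)
print(stats)
```

For a small training set such as this one, the t-value for n-1 degrees of freedom would replace 1.96 and widen the interval.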
Statistical Process Control – Mean/Shewhart chart
[Figure: Shewhart chart against time; the training periods set the limits below.]
CL = Xbar
UWL = Xbar + 1.96 x S (= 95%)
LWL = Xbar - 1.96 x S (= 95%)
UCL = Xbar + 3.09 x S (= 99.8%)
LCL = Xbar - 3.09 x S (= 99.8%)
M = model of the process
Assumption: Normal distribution based on ‘infinite past’
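A minimal sketch of how such limits could be set from training data, assuming normally distributed observations. The function names, the training values and the test points are hypothetical; the multipliers are taken from the standard normal distribution for two-sided coverage of 95% and 99.8%.

```python
from statistics import NormalDist, fmean, stdev

def shewhart_limits(training, warn_cov=0.95, action_cov=0.998):
    """Center line, warning and action limits from training (NOC) data."""
    xbar = fmean(training)
    s = stdev(training)                       # n - 1 denominator
    z = NormalDist()
    z_warn = z.inv_cdf(0.5 + warn_cov / 2)    # two-sided 95%   -> about 1.96
    z_action = z.inv_cdf(0.5 + action_cov / 2)  # two-sided 99.8% -> about 3.09
    return {
        "CL": xbar,
        "UWL": xbar + z_warn * s, "LWL": xbar - z_warn * s,
        "UCL": xbar + z_action * s, "LCL": xbar - z_action * s,
        "z_warn": z_warn, "z_action": z_action,
    }

def classify(x, lim):
    """Label one new observation against the limits."""
    if x > lim["UCL"] or x < lim["LCL"]:
        return "action"
    if x > lim["UWL"] or x < lim["LWL"]:
        return "warning"
    return "in control"

# Hypothetical training data, not taken from the presentation.
training = [4.98, 5.02, 5.01, 4.97, 5.03, 5.00, 4.99, 5.00]
limits = shewhart_limits(training)
for x in (5.01, 5.05, 5.07):
    print(x, classify(x, limits))
```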
Statistical Process Control – Mean/Shewhart chart
[Figure: the same chart with new process observations plotted between the training periods, against CL = Xbar, UWL/LWL = Xbar ± 1.96 x S (= 95%) and UCL/LCL = Xbar ± 3.09 x S (= 99.8%).]
M = model of the process
Classical test – Type I and II errors
Null hypothesis H0 : µ = 5.00
Alternative hypothesis H1 : µ > 5.00
[Figure: distribution if H0 is true, centered at 5.00, with upper tail area α = 5%.]
Type I error : rejecting H0 when it is true (α %); error level of the test
“The 5% is called the level of significance and is the probability of (incorrectly) rejecting the null hypothesis when it is true. The error we make in this way is called the type I error or the α-error.”
Classical test – Type I and II errors
Null hypothesis H0 : µ = 5.00
Alternative hypothesis H1 : µ > 5.00
[Figure: distribution if H1 is true; markers at 5.00, xbar = 5.01 and critical = 5.06; α = 5%, β ≈ 65%.]
Type II error : accepting H0 when it is false (β %)
“We have now made a type II error or β-error, which consists in (incorrectly) accepting that the null hypothesis is true, while in fact it is not.”
Classical test – Type I and II errors
[Figure: both distributions overlaid (if H0 is true, if H1 is true); markers at 5.00, xbar = 5.01 and critical = 5.06; α = 5%, β ≈ 65%.]
Type I error : rejecting H0 when it is true (α %); error level of the test
Type II error : accepting H0 when it is false (β %)
α : false positive, β : false negative
Power of a test = 1 – β (increase sample size, narrow distributions)
More critical test (< α) increases β (→ 100%)
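The trade-off between α, β and power can be illustrated with a short sketch for a one-sided test on a mean with a normally distributed test statistic. The standard error 0.0365 and the true mean 5.05 are assumptions chosen for illustration (they are not stated on the slide); the function name is hypothetical.

```python
from statistics import NormalDist

def one_sided_test(mu0, se, mu1, alpha=0.05):
    """Critical value, beta and power for H0: mu = mu0 against H1: mu > mu0.

    se is the standard error of the mean, mu1 the assumed true mean under H1.
    """
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    beta = NormalDist(mu1, se).cdf(critical)   # P(accept H0 | H1 is true)
    return critical, beta, 1 - beta

# Assumed values for illustration only.
critical, beta, power = one_sided_test(mu0=5.00, se=0.0365, mu1=5.05)

# Four times as many observations halves the standard error: beta drops.
_, beta_larger_n, _ = one_sided_test(mu0=5.00, se=0.0365 / 2, mu1=5.05)

# A more critical test (smaller alpha) increases beta.
_, beta_strict, _ = one_sided_test(mu0=5.00, se=0.0365, mu1=5.05, alpha=0.01)

print(round(critical, 3), beta, power, beta_larger_n, beta_strict)
```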
Average Run Length - ARL
ARL - how good is a chart?
L1 : how many observations to detect a shift – should be small
(related to the β-error; must specify the change, e.g. +3σ/√n: p = 50%, L1 = 2; +σ/√n: p = 2.28%, L1 = 44)
L0 : how many observations before a false alarm – should be large
(related to the α-error; L0 = 500, p = 0.2%)
Null hypothesis H0 : x = µ
Alternative hypothesis H1 : x > µ
[Figure: distributions if H0 is true (centered at µ) and if H1 is true (centered at the observation x), with the critical value, α = 5% and β ≈ 65%.]
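The ARL numbers quoted above follow from ARL = 1/p, where p is the probability that a single point signals. A small sketch, assuming independent, normally distributed points and an upper limit at +3 standard errors; the function names are hypothetical.

```python
from statistics import NormalDist

def arl(p_signal):
    """Average run length for independent points that each signal with probability p."""
    return 1.0 / p_signal

def p_above_upper_limit(shift, limit=3.0):
    """P(point above the upper limit) after the mean shifts by `shift` (both in SE units)."""
    return 1.0 - NormalDist().cdf(limit - shift)

l1_large_shift = arl(p_above_upper_limit(3.0))  # p = 50%   -> L1 = 2
l1_small_shift = arl(p_above_upper_limit(1.0))  # p = 2.28% -> L1 of about 44
l0 = arl(0.002)                                 # p = 0.2%  -> L0 = 500

print(l1_large_shift, round(l1_small_shift), l0)
```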
Statistical Process Control – Mean/Shewhart chart
[Figure: Shewhart chart against time with alarm situations marked.]
Alarm:
• 1 point outside U/LCL: p = 0.2%
• 2 points outside U/LWL: p = (0.025)^2 ≈ 0.06%
• 7 points (or 10 out of 11) on one side of CL: p = (0.5)^7 ≈ 0.8%; p = 11 x (0.5)^10 x 0.5 ≈ 0.5%
• 7 points in/decreasing: p = 0.1%
• Many more statistical (and heuristic) rules
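The probabilities for the first three rules are simple products, and a run rule is easy to check in code. The sketch below assumes independent points from a symmetric in-control distribution; the dictionary keys and the function name are hypothetical.

```python
# False-alarm probabilities of some run rules for an in-control process.
rule_probability = {
    "1 point outside action limits": 0.002,
    "2 points outside the same warning limit": 0.025 ** 2,
    "7 points on one given side of CL": 0.5 ** 7,
    "10 out of 11 on one given side of CL": 11 * 0.5 ** 10 * 0.5,
}

def run_on_one_side(values, cl, run=7):
    """True if `run` consecutive values fall on the same side of the center line."""
    side, count = 0, 0
    for v in values:
        s = 1 if v > cl else -1 if v < cl else 0
        if s != 0 and s == side:
            count += 1
        else:
            side, count = s, (1 if s != 0 else 0)
        if count >= run:
            return True
    return False

for rule, p in rule_probability.items():
    print(f"{rule}: {100 * p:.2f}%")
```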
Multivariate SPC
Data table/matrix (X)

     p   V   T
1    x   x   x
2    x   x   x
3    x   x   x
4    x   x   x
5    x   x   x
6    x   x   x
7    x   x   x
8    x   x   x
9    x   x   x
Multivariate data
p.V = n.R.T
[Figure: nine balloons (samples 1 to 9) as points in the space of the variables p, V and T; the same samples form the rows of the data matrix X.]
MSPC – Hotelling-statistic
Hotelling T2 statistic:
• observation x holds k variables at each time point (e.g. [p V T])
• n training observations with mean xbar (k x 1) (NOC data)
• covariance matrix S (k x k)

$$S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$
$$T^2_{new} = (x_{new} - \bar{x})^T S^{-1} (x_{new} - \bar{x})$$
$$T^2_{UCL} = \frac{k(n^2-1)}{n(n-k)}\,F_{\alpha}(k,\,n-k), \qquad T^2_{LCL} = 0$$

• T2 is a weighted or directed distance from the center (xbar) (Mahalanobis distance)
• T2UCL is a contour of equal distance (covering e.g. 95% of all observations)
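A minimal sketch of the T2 computation for k = 2 variables in plain Python, so that the matrix inverse can be written out by hand. The NOC data and the two new samples are hypothetical, and the critical value F_0.05(2, 7) = 4.74 is taken from a standard F-table and passed in as a parameter rather than computed.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance_2x2(rows, m):
    """Sample covariance matrix S (n - 1 denominator) for two variables."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return [[v / (len(rows) - 1) for v in row] for row in s]

def inverse_2x2(s):
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]

def t2(x, m, s_inv):
    """Hotelling T2: (x - xbar)' S^-1 (x - xbar)."""
    d = [x[0] - m[0], x[1] - m[1]]
    return sum(d[a] * s_inv[a][b] * d[b] for a in range(2) for b in range(2))

def t2_ucl(n, k, f_crit):
    """Upper control limit; f_crit is F_alpha(k, n - k) from a table."""
    return k * (n * n - 1) / (n * (n - k)) * f_crit

# Hypothetical NOC data: nine samples of two correlated variables.
noc = [(1.0, 2.1), (1.2, 2.3), (0.9, 1.8), (1.1, 2.2), (1.0, 1.9),
       (1.3, 2.6), (0.8, 1.7), (1.1, 2.1), (1.0, 2.0)]
xbar = mean_vector(noc)
s_inv = inverse_2x2(covariance_2x2(noc, xbar))
ucl = t2_ucl(n=len(noc), k=2, f_crit=4.74)   # assumed table value F_0.05(2, 7)

along = t2((1.2, 2.4), xbar, s_inv)    # follows the correlation structure
against = t2((1.3, 1.7), xbar, s_inv)  # breaks the correlation structure
print(ucl, along, against)
```

The second new sample lies within the univariate range of both variables, yet it falls far outside the T2 limit because it contradicts the correlation between them.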
[Figure: the nine balloons in p-V-T space with the Normal Operating Conditions region (NOC, e.g. 95% hyper-ellipse) as model M; a new sample (e.g. new time update) is compared against it.]
MSPC – Hotelling-statistic
T2 is a weighted or directed distance from the center (xbar) (Mahalanobis distance); T2UCL is a contour of equal distance (covering e.g. 95% or 99% of all observations):

$$T^2_{new} = (x_{new} - \bar{x})^T S^{-1} (x_{new} - \bar{x}), \qquad T^2_{UCL} = \frac{k(n^2-1)}{n(n-k)}\,F_{\alpha}(k,\,n-k)$$

… if the variables in x are correlated then S will be (near) singular:
• Effective rank < true rank
• Inverse does not exist or is unstable!
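How quickly the inverse becomes unstable can be seen for two standardized variables with correlation r, where S is the correlation matrix [[1, r], [r, 1]] with eigenvalues 1 + r and 1 - r. A small sketch with hypothetical function names:

```python
def condition_number(r):
    """Ratio of largest to smallest eigenvalue of [[1, r], [r, 1]]."""
    return (1 + abs(r)) / (1 - abs(r))

def inverse_corr_2x2(r):
    """Inverse of [[1, r], [r, 1]]; every element is divided by det = 1 - r^2."""
    det = 1 - r * r
    return [[1 / det, -r / det], [-r / det, 1 / det]]

for r in (0.0, 0.9, 0.999):
    print(r, condition_number(r), inverse_corr_2x2(r)[0][0])
```

As r approaches 1 the determinant approaches 0, so small errors in S are strongly amplified in the inverse, and with it in T2.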
Principal Component Analysis
$$X = t_1 p_1' + t_2 p_2' + E$$
p.V = n.R.T
PCA - First principal component
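[Figure: the nine balloons in p-V-T space with the first principal component (factor 1) drawn through the point cloud; the data matrix X (variables p, V, T) is summarized by one score per balloon on factor 1 and by the loadings for p, V and T; p.V = n.R.T.]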
PCA - First and second principal components
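[Figure: the nine balloons in p-V-T space with the first and second principal components (factors 1 and 2); the data matrix X (variables p, V, T) is summarized by scores on factors 1 and 2 and by the loadings for p, V and T. In the plane of factor 1 and factor 2 the Normal Operating Conditions (NOC, e.g. 95% interval) are drawn.]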
Process
A unit operation will record 50 to 2000 highly correlated process variables (p, V, T, etc.) …
A NIR analyzer will measure at 200 highly correlated wavelengths …
… monitor latent-variables from factor model!
MSPC - process
New sample (e.g. new time update), model M with loadings P:
tnew = P’.xnew
e’ = x’ – t’.P’
Two statistics:
D: t is the projection of each sample record on the model P, used to compute D. How close is the new observation to the center of the training set? ‘Goodness-of-fit’
Q = Σ e2, the sum of squared errors. How far is the observation from the hyper-plane spanned by the model P? ‘Residual’
As usual, what is the right complexity F: how many principal components (new basis)? Same as always in PCA, often from cross-validation. Usually few components, 50-60% explained variance.
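The projection step can be sketched in plain Python. The sketch assumes a mean-centered sample x and orthonormal loading vectors stored as the rows of `P`; the one-component model and the sample values are hypothetical.

```python
import math

def project(x, P):
    """Scores t = P'x, residual e = x - P t and Q = sum of squared residuals."""
    t = [sum(xj * pj for xj, pj in zip(x, p)) for p in P]
    x_hat = [sum(t[a] * P[a][j] for a in range(len(P))) for j in range(len(x))]
    e = [xj - xh for xj, xh in zip(x, x_hat)]
    q = sum(ej * ej for ej in e)
    return t, e, q

# Hypothetical one-component model for three variables (unit-length loading).
r = 1 / math.sqrt(2)
P = [[r, r, 0.0]]

# Hypothetical mean-centered new sample.
t, e, q = project([1.0, 1.0, 0.5], P)
print(t, e, q)
```

The score t feeds the D-statistic (position inside the model plane), while Q measures the part of the sample that the model cannot describe.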
MSPC – D-statistic
Hotelling T2 statistic on scores (‘pseudo variables’):
observation t holds f principal components (e.g. f = 2, [t1 t2])
n training observations with mean tbar (f x 1) (NOC data)
covariance matrix S (f x f)

$$S = \frac{1}{n-1}\sum_{i=1}^{n} (t_i - \bar{t})(t_i - \bar{t})^T$$
$$T^2_{new} = (t_{new} - \bar{t})^T S^{-1} (t_{new} - \bar{t})$$
$$T^2_{UCL} = \frac{f(n^2-1)}{n(n-f)}\,F_{\alpha}(f,\,n-f)$$

Notice that S is diagonal, due to the orthogonality of the scores
Using a truncated model (only f factors) we now have a residual from the f+1..k principal components …
Also known as D-statistic
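Because S is diagonal for PCA scores, the D-statistic reduces to a sum of squared scores, each divided by its score variance. The sketch below assumes the scores come from mean-centered training data (so tbar = 0); the numbers are hypothetical and the critical value F_0.05(2, 48) ≈ 3.19 is an assumed table value.

```python
def d_statistic(t_new, score_variances):
    """T2 on scores with diagonal S: sum over components of t_a^2 / var(t_a)."""
    return sum(t * t / v for t, v in zip(t_new, score_variances))

def d_ucl(n, f, f_crit):
    """Upper control limit; f_crit is F_alpha(f, n - f) from a table."""
    return f * (n * n - 1) / (n * (n - f)) * f_crit

# Hypothetical model: f = 2 components, n = 50 training samples.
score_variances = [4.0, 1.0]
d_new = d_statistic([2.0, 1.0], score_variances)
limit = d_ucl(n=50, f=2, f_crit=3.19)   # assumed table value
print(d_new, limit, d_new > limit)
```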
MSPC – Q-statistic
… the residuals
Assume multi-normal residuals with zero mean
Also known as Q-statistic

$$SPE_{new} = \sum_{i=1}^{k} \left(x_{new}(i) - \hat{x}_{new}(i)\right)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

a weighted chi-square distribution with weight g and h degrees of freedom
MSPC – Q-statistic
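$$SPE_{new} = \sum_{i=1}^{k} \left(x_{new}(i) - \hat{x}_{new}(i)\right)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

a weighted chi-square distribution with weight g and h degrees of freedom

$$SPE_{UCL} = \theta_1 \left( \frac{z_{\alpha}\sqrt{2\,\theta_2\,h_0^2}}{\theta_1} + 1 + \frac{\theta_2\,h_0\,(h_0 - 1)}{\theta_1^2} \right)^{1/h_0}$$
$$h_0 = 1 - \frac{2\,\theta_1\,\theta_3}{3\,\theta_2^2}$$
$$\theta_1 = \mathrm{trace}(V), \qquad \theta_2 = \mathrm{trace}(V^2), \qquad \theta_3 = \mathrm{trace}(V^3), \qquad V = \mathrm{covar}(E)$$

z_α is the standard normal variate for (1-α) confidence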
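A sketch of the limit built from θ1, θ2 and θ3. It uses the fact that trace(V^i) equals the sum of the i-th powers of the eigenvalues of V, so only the eigenvalues of the residual covariance are needed; the eigenvalues below are hypothetical and the function name is an assumption.

```python
import math
from statistics import NormalDist

def q_limit(residual_eigenvalues, alpha=0.05):
    """SPE upper limit from theta_i = sum of eigenvalues of V raised to the power i."""
    th1 = sum(l for l in residual_eigenvalues)
    th2 = sum(l ** 2 for l in residual_eigenvalues)
    th3 = sum(l ** 3 for l in residual_eigenvalues)
    h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)
    z = NormalDist().inv_cdf(1 - alpha)      # standard normal variate
    term = (z * math.sqrt(2 * th2 * h0 ** 2) / th1
            + 1
            + th2 * h0 * (h0 - 1) / th1 ** 2)
    return th1 * term ** (1 / h0)

# Hypothetical eigenvalues of the residual covariance V (components f+1..k).
eigenvalues = [0.5, 0.3, 0.2]
limit_95 = q_limit(eigenvalues, alpha=0.05)
limit_99 = q_limit(eigenvalues, alpha=0.01)
print(limit_95, limit_99)
```

As a plausibility check, a single residual eigenvalue of 1 gives a limit close to the 95% point of a chi-square distribution with one degree of freedom (3.84).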
MSPC – Q-statistic
… the residuals
Assume multi-normal residuals with zero mean
Also known as Q-statistic

$$SPE_{new} = \sum_{i=1}^{k} \left(x_{new}(i) - \hat{x}_{new}(i)\right)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

a weighted chi-square distribution with g and h estimated from NOC
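One common way to estimate g and h from NOC data is to match the mean m and variance v of the training SPE values: g·χ²_h has mean g·h and variance 2·g²·h, so g = v/(2m) and h = 2m²/v. The sketch below assumes this moment matching (the slide does not say which estimator is used), takes hypothetical SPE values, and approximates the chi-square quantile with the Wilson-Hilferty formula because the Python standard library has no chi-square distribution.

```python
import math
from statistics import NormalDist, fmean, variance

def spe_limit_from_noc(spe_noc, alpha=0.05):
    """Estimate g and h by moment matching and return (g, h, SPE limit)."""
    m = fmean(spe_noc)
    v = variance(spe_noc)
    g = v / (2 * m)            # weight
    h = 2 * m * m / v          # (non-integer) degrees of freedom
    z = NormalDist().inv_cdf(1 - alpha)
    # Wilson-Hilferty approximation of the (1 - alpha) chi-square quantile.
    chi2 = h * (1 - 2 / (9 * h) + z * math.sqrt(2 / (9 * h))) ** 3
    return g, h, g * chi2

# Hypothetical SPE values of the training (NOC) samples.
spe_noc = [0.8, 1.2, 0.5, 1.9, 0.7, 1.1, 0.4, 1.4, 1.0, 1.0]
g, h, spe_ucl = spe_limit_from_noc(spe_noc)
print(g, h, spe_ucl)
```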
Principal Component Analysis
$$X = t_1 p_1' + t_2 p_2' + E$$
[Figure: the PCA decomposition with its parts labelled T, P and E; the Q-statistic is computed from the residuals E (multi-normal, zero mean).]
A.K. Conlin et.al. Confidence limits for contribution plots Journal of Chemometrics 14(2000)725-736
MSPC – Contribution plots
[Figure: D-statistic and Q-statistic charts for a pump failure, with ARL L1 marked on both charts.]
MSPC – Contribution plots
Contribution of original variables in the score value
PCA: X = T.P’, so X.P = T.P’.P = T (the loadings P are orthonormal)
One sample: t = [t1 t2] = [x’.p1 x’.p2]
ci = x’.pi
[Figure: contribution plot for the pump failure.]
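A small sketch of how a score could be split into per-variable contributions. It assumes the usual breakdown in which variable j contributes x_j·p_j to the score, so that the contributions add up to t = x’.p; the variable names, sample and loading vector are hypothetical.

```python
def score_contributions(x, p):
    """Per-variable terms x_j * p_j; their sum is the score t = x'p."""
    return [xj * pj for xj, pj in zip(x, p)]

variables = ["p", "V", "T"]

# Hypothetical mean-centered sample and unit-length loading vector.
x_new = [0.5, -0.2, 2.0]
p1 = [0.6, 0.0, 0.8]

contrib = score_contributions(x_new, p1)
score = sum(contrib)
largest = variables[max(range(len(contrib)), key=lambda j: abs(contrib[j]))]
print(dict(zip(variables, contrib)), score, largest)
```

The variable with the largest absolute contribution is the first candidate to inspect when the D-statistic raises an alarm.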
MSPC – Contribution plots
Uncertainty in ci ?
$x \pm z_{\alpha/2}\,\sigma^2_x$ : $\sigma^2_x$ from NOC
$c \pm z_{\alpha/2}\,\sigma^2_c$ : $\sigma^2_c$ unknown
$\sigma^2_c = \sigma^2_{xp} = \sigma^2_x \cdot \sigma^2_p$
Estimate $\sigma^2_p$ from Bootstrap re-sampling
ci = x’.pi
[Figure: contribution plot for the pump failure.]
MSPC – Contribution plots
ci = x’.pi
[Figure: contribution plot for the pump failure.]
Thank you!
References used in this presentation
General introduction:
S.J. Wierda, Multivariate statistical process control - recent results and directions for future research, Statistica Neerlandica 48(1994)147-168
Douglas C. Montgomery, Introduction to Statistical Quality Control, Wiley (2001)
Theodora Kourti and John MacGregor, Tutorial: Process Analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems 28(1995)3-21
Theodora Kourti, Process Analytical Technology Beyond Real-Time Analyzers: The Role of Multivariate Analysis, Critical Reviews in Analytical Chemistry 36(2006)257-278

D- and Q-statistics:
J. Edward Jackson and Govind S. Mudholkar, Control Procedures for Residuals Associated With Principal Component Analysis, Technometrics 21/3(1979)341-349
Paul Nomikos and John F. MacGregor, Multivariate SPC Charts for Monitoring Batch Processes, Technometrics 37/1(1995)41-59
D. J. Louwerse and A. K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chemical Engineering Science 55(2000)1225-1235

Contribution statistics:
A. K. Conlin, E. B. Martin and A. J. Morris, Confidence limits for contribution plots, Journal of Chemometrics 14(2000)725-736
Johan A. Westerhuis, Stephen P. Gurden and Age K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemometrics and Intelligent Laboratory Systems 51(2000)95-114

Discussion and Evaluation:
Eric N. M. van Sprang, Henk-Jan Ramaker, Johan A. Westerhuis, Stephen P. Gurden and Age K. Smilde, Critical evaluation of approaches for on-line batch process monitoring, Chemical Engineering Science 57(2002)3979-3991