(080930) statistical testing in mspc



Page 1: (080930) Statistical testing in MSPC

Statistical testing in MSPC

ISPE PAT CoP meeting, Umetrics AB, Malmö, October 1, 2008

Frans van den Berg
University of Copenhagen, Faculty of Life Sciences
Department of Food Science, Quality and Technology
Spectroscopy and Chemometrics group
[email protected]

www.models.life.ku.dk www.odin.life.ku.dk

(080930)

Statistical Process monitoring (SPM) and Control (SPC)

[Figure: a process with input (raw material), output (product) and disturbance (environment), followed by a monitoring chart at one position. Chart panels over continuous time (past, future) show a disturbance as a level change and as a distribution change.]

Page 2: (080930) Statistical testing in MSPC

Statistical Process Control – Mean/Shewhart chart

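[Figure: Shewhart chart of observations against time, with µ and σ marked.]

Expectation and population parameters

Expected value -> sample statistic for n observations x(i), i = 1:n

Mean (locality):
$$E(x) = \mu \;\rightarrow\; \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$

Variance (spread):
$$E\big((x-\mu)^2\big) = \sigma^2 \;\rightarrow\; s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(x(i)-\bar{x}\big)^2$$

e.g.: Normal distribution N(µx,σx): 68.3% of the observations lie within µ ± 1σ, 95.4% within µ ± 2σ and 99.7% within µ ± 3σ

Notice: µ and σ are population constants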

Page 3: (080930) Statistical testing in MSPC

Statistical Process Control – Mean/Shewhart chart

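[Figure: Shewhart chart against time; the training period defines M = model of the process.]

Some basic notions

Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x(i)$$

Variance:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(x(i)-\bar{x}\big)^2$$

Standard deviation:
$$s_x = \sqrt{s_x^2}$$

Standard error:
$$SE = \frac{s_x}{\sqrt{n}}$$

Relative SD:
$$s_r = \frac{s_x}{\bar{x}}$$

RSD in % of the mean (coefficient of variation):
$$s_r\% = 100\% \cdot s_r$$

95% confidence interval/level:
$$\Pr\big(\bar{x} - t_{n-1}\,SE < \mu < \bar{x} + t_{n-1}\,SE\big) = 95\%$$
$$\Pr\big(\mu - t_{n-1}\,SE < \bar{x} < \mu + t_{n-1}\,SE\big) = 95\%$$

For n large or sx given: tn-1 = z0 = 1.96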
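The basic notions above can be sketched in a few lines of Python. This is only an illustration: the measurements are hypothetical, the function name `basic_notions` is made up here, and the interval uses z0 instead of t(n-1), i.e. the large-n case named on the slide.

```python
import math
from statistics import NormalDist, mean, stdev

def basic_notions(x, level=0.95):
    """Sample statistics from the slide; z0 replaces t (large-n assumption)."""
    n = len(x)
    xbar = mean(x)
    s = stdev(x)                       # square root of the n-1 variance
    se = s / math.sqrt(n)              # standard error of the mean
    rsd_percent = 100.0 * s / xbar     # coefficient of variation in %
    z0 = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    return {
        "mean": xbar,
        "sd": s,
        "se": se,
        "rsd_percent": rsd_percent,
        "ci": (xbar - z0 * se, xbar + z0 * se),
    }

# Hypothetical measurements, not taken from the presentation
x = [4.98, 5.02, 5.01, 4.99, 5.00, 5.03, 4.97, 5.00]
stats = basic_notions(x)
print(stats)
```

For a small training set the t-value would be wider than 1.96, so the interval printed here is somewhat optimistic.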

Page 4: (080930) Statistical testing in MSPC

Statistical Process Control – Mean/Shewhart chart

[Figure: Shewhart chart against time with the limits below drawn as horizontal lines. Time axis segments: Training, Process, Training.]

CL = Xbar
UCL = Xbar + 3.06xS (= 99.8%)
UWL = Xbar + 1.96xS (= 95%)
LWL = Xbar - 1.96xS (= 95%)
LCL = Xbar - 3.06xS (= 99.8%)

M = model of the process
Assumption: Normal distribution based on ‘infinite past’
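A minimal sketch of how the chart limits could be computed from a training period. The training values and the function name `shewhart_limits` are hypothetical; the multipliers 1.96 and 3.06 are the ones on the slide (the exact normal quantile for a two-sided 99.8% interval is about 3.09).

```python
from statistics import mean, stdev

def shewhart_limits(training, k_warning=1.96, k_action=3.06):
    """Center line, warning limits and control limits from training data."""
    xbar = mean(training)
    s = stdev(training)               # n-1 standard deviation
    return {
        "LCL": xbar - k_action * s,
        "LWL": xbar - k_warning * s,
        "CL": xbar,
        "UWL": xbar + k_warning * s,
        "UCL": xbar + k_action * s,
    }

# Hypothetical training observations, not taken from the presentation
training = [4.98, 5.02, 5.01, 4.99, 5.00, 5.03, 4.97, 5.00]
limits = shewhart_limits(training)
for name, value in limits.items():
    print(f"{name} = {value:.4f}")
```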

Page 5: (080930) Statistical testing in MSPC

Classical test – Type I and II errors

Null hypothesis H0 : µ = 5.00
Alternative hypo. H1 : µ > 5.00

[Figure: distribution if H0 is true, centered at 5.00, with the upper tail α = 5% marked.]

Type I error : rejecting H0 when it is true (α %); error level of the test

“The 5% is called the level of significance and is the probability of (incorrectly) rejecting the null hypothesis when it is true. The error we make in this way is called the type I error or the α-error.”

Classical test – Type I and II errors

Null hypothesis H0 : µ = 5.00
Alternative hypo. H1 : µ > 5.00

[Figure: distribution if H1 is true, with 5.00, xbar = 5.01 and critical = 5.06 marked; α = 5%, β ≈ 65%.]

Type II error : accepting H0 when it is false (β %)

“We have now made a type II error or β-error, which consists in (incorrectly) accepting the null hypothesis is true, while in fact it is not.”

Page 6: (080930) Statistical testing in MSPC

Classical test – Type I and II errors

Null hypothesis H0 : µ = 5.00
Alternative hypo. H1 : µ > 5.00

[Figure: distributions if H0 is true and if H1 is true, with 5.00, xbar = 5.01 and critical = 5.06 marked; α = 5%, β ≈ 65%.]

Type I error : rejecting H0 when it is true (α %); error level of the test
Type II error : accepting H0 when it is false (β %)

α : false positive, β : false negative
Power of a test = 1 – β (improved by a larger sample size or narrower distributions)
A more critical test (smaller α) increases β (towards 100%)
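The trade-off between α and β can be sketched for a one-sided z-test. The standard error (0.0365, chosen so that the critical value lands near 5.06) and the true mean under H1 (5.05) are assumptions of this sketch; the slide does not give them, so the β printed here is not expected to reproduce the 65% of the figure.

```python
from statistics import NormalDist

def type_ii_error(mu0, mu1, se, alpha=0.05):
    """Critical value and beta for H0: mu = mu0 against H1: mu > mu0."""
    z = NormalDist()
    critical = mu0 + z.inv_cdf(1.0 - alpha) * se   # reject H0 above this value
    beta = z.cdf((critical - mu1) / se)            # P(accept H0 | true mean mu1)
    return critical, beta

# mu0 from the slide; se and mu1 are hypothetical
critical, beta = type_ii_error(mu0=5.00, mu1=5.05, se=0.0365, alpha=0.05)
_, beta_strict = type_ii_error(mu0=5.00, mu1=5.05, se=0.0365, alpha=0.01)
print(f"critical = {critical:.3f}, beta = {beta:.2f}, power = {1 - beta:.2f}")
print(f"beta with alpha = 1%: {beta_strict:.2f}")
```

Lowering α moves the critical value to the right, so β grows, which is the last point of the slide.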

Average Run Length - ARL

ARL - how good is a chart

L1 : how many observations to detect a shift – should be small
(related to the β-error; the size of the change must be specified, e.g.
+3σ/√n: p = 50%, L1 = 2
+σ/√n: p = 2.28%, L1 = 44)

L0 : how many observations before a false alarm – should be large
(related to the α-error; L0 = 500, p = 0.2%)

Null hypothesis H0 : x = µ
Alternative hypo. H1 : x > µ

[Figure: distributions if H0 is true and if H1 is true, with µ, the observation x and the critical value marked; α = 5%, β ≈ 65%.]

Type I error : rejecting H0 when it is true (α %); error level of the test
Type II error : accepting H0 when it is false (β %)
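The ARL figures quoted for L1 follow from ARL = 1/p, where p is the probability that a single point signals. A small sketch, assuming a one-sided upper limit at 3 standard errors (that limit is an assumption made here because it reproduces the slide's numbers):

```python
from statistics import NormalDist

def arl_upper(shift_in_se, limit_in_se=3.0):
    """Signal probability and ARL for an upper limit, after a mean shift."""
    p = 1.0 - NormalDist().cdf(limit_in_se - shift_in_se)
    return p, 1.0 / p

for shift in (3.0, 1.0):
    p, arl = arl_upper(shift)
    print(f"shift = +{shift:.0f} SE: p = {100 * p:.2f}%, L1 = {arl:.0f}")

# False-alarm run length for a chart with 0.2% of points outside the limits
l0 = 1.0 / 0.002
print(f"L0 = {l0:.0f}")
```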

Page 7: (080930) Statistical testing in MSPC

Statistical Process Control – Mean/Shewhart chart

[Figure: Shewhart chart against time.]

Alarm:
• 1 point outside U/LCL
  p = 0.2%
• 2 points outside U/LWL
  p = (0.025)^2 ≈ 0.06% per side (≈ 0.1% for both sides)
• 7 points (or 10 out of 11) on one side of CL
  p = (0.5)^7 ≈ 0.8%
  p = 11·(0.5)^10·0.5 ≈ 0.5%
• 7 points in/decreasing
  p = 0.1%
• Many more statistical (and heuristic) rules
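The first three alarm rules can be sketched as a simple scan over the chart. The function name, the rule labels and the test series are hypothetical; the multipliers are the ones used for the chart limits in this presentation.

```python
def shewhart_alarms(x, cl, s, k_warning=1.96, k_action=3.06, run=7):
    """Return (index, rule) pairs for three of the alarm rules."""
    alarms = []
    for i, xi in enumerate(x):
        d = xi - cl
        # Rule 1: one point outside the control limits
        if abs(d) > k_action * s:
            alarms.append((i, "action"))
        # Rule 2: two consecutive points outside the same warning limit
        if i >= 1:
            prev = x[i - 1] - cl
            if (d > k_warning * s and prev > k_warning * s) or \
               (d < -k_warning * s and prev < -k_warning * s):
                alarms.append((i, "warning"))
        # Rule 3: `run` consecutive points on one side of the center line
        if i >= run - 1:
            window = x[i - run + 1:i + 1]
            if all(v > cl for v in window) or all(v < cl for v in window):
                alarms.append((i, "run"))
    return alarms

# Hypothetical standardized series (CL = 0, S = 1)
series = [0.2, -0.4, 3.5, 0.1, 2.0, 2.1, -0.3]
print(shewhart_alarms(series, cl=0.0, s=1.0))
```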

Multivariate SPC

Page 8: (080930) Statistical testing in MSPC

Multivariate data

Data table/matrix (X)

balloon   p   V   T
1         x   x   x
2         x   x   x
3         x   x   x
4         x   x   x
5         x   x   x
6         x   x   x
7         x   x   x
8         x   x   x
9         x   x   x

p.V = n.R.T

Multivariate data

[Figure: the nine balloons (rows of X, variables p V T) plotted as points in the variable space with axes p, V and T.]

Page 9: (080930) Statistical testing in MSPC

MSPC – Hotelling-statistic

Hotelling T2 statistic:

• observation x is k variables at each time point (e.g. [p V T])
• n training observations with mean xbar (k x 1) (NOC data)
• covariance matrix S (k x k)

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^T$$
$$T^2_{new} = (x_{new}-\bar{x})^T S^{-1}(x_{new}-\bar{x})$$
$$T^2_{UCL} = \frac{k(n^2-1)}{n(n-k)}\,F_{\alpha}(k,\,n-k), \qquad T^2_{LCL} = 0$$

• T2 is weighted or directed distance from center (xbar) (Mahalanobis distance)
• T2UCL is contour with equal distance (covering e.g. 95% of all observations)
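A minimal NumPy sketch of the T2 statistic and its upper limit. The NOC data are random numbers used only for illustration, the function names are made up here, and the F critical value is passed in as a tabulated number (F(3, 6) at α = 5% is about 4.757) instead of being computed.

```python
import numpy as np

def hotelling_t2(X_noc, x_new):
    """T2 of x_new relative to the training (NOC) data X_noc (n x k)."""
    xbar = X_noc.mean(axis=0)
    S = np.cov(X_noc, rowvar=False)          # n-1 denominator
    d = x_new - xbar
    return float(d @ np.linalg.solve(S, d))  # d' S^-1 d without explicit inverse

def t2_ucl(n, k, f_crit):
    """Upper control limit; f_crit is the tabulated F_alpha(k, n-k)."""
    return k * (n ** 2 - 1) / (n * (n - k)) * f_crit

# Hypothetical NOC data: 9 observations of 3 variables
rng = np.random.default_rng(0)
X_noc = rng.normal(size=(9, 3))
x_new = np.array([0.5, -1.0, 2.0])

t2_new = hotelling_t2(X_noc, x_new)
ucl = t2_ucl(n=9, k=3, f_crit=4.757)
print(f"T2 = {t2_new:.2f}, UCL = {ucl:.2f}, alarm = {t2_new > ucl}")
```

Solving the linear system instead of inverting S is a design choice of this sketch; it gives the same T2 when S is well conditioned.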

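MSPC – Hotelling-statistic

[Figure: the balloon observations (rows 1 to 9 of X, variables p V T) in the variable space with axes p, V and T. The model M is the Normal Operating Conditions region (NOC, e.g. 95% hyper-ellipse); a new sample (e.g. new time update) is compared with it.]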

Page 10: (080930) Statistical testing in MSPC

MSPC – Hotelling-statistic

Hotelling T2 statistic:

observation x is k variables at each time point (e.g. [p V T])
n training observations with mean xbar (k x 1) (NOC data)
covariance matrix S (k x k)

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^T$$
$$T^2_{new} = (x_{new}-\bar{x})^T S^{-1}(x_{new}-\bar{x})$$
$$T^2_{UCL} = \frac{k(n^2-1)}{n(n-k)}\,F_{\alpha}(k,\,n-k)$$

T2 is weighted or directed distance from center (xbar) (Mahalanobis distance)
T2UCL is contour with equal distance (covering e.g. 95% or 99% of all observations)

… if variables in x are correlated then S will be (near) singular
Effective rank < true rank
Inverse does not exist or is unstable!
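The near-singularity of S for correlated variables is easy to show numerically. In this sketch three hypothetical variables are driven by one latent source plus a little noise, so the covariance matrix has one dominant eigenvalue and a very large condition number.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=50)                       # one hidden source of variation

# Three hypothetical, almost perfectly correlated variables
X = np.column_stack([t, 2.0 * t, -t]) + 1e-6 * rng.normal(size=(50, 3))

S = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(S)               # ascending order
cond = np.linalg.cond(S)

print("eigenvalues of S:", eigvals)
print(f"condition number: {cond:.2e}")
print("share of variance in largest eigenvalue:", eigvals[-1] / eigvals.sum())
```

With a condition number this large, small measurement noise is strongly amplified in the inverse of S and therefore in T2.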

Principal Component Analysis

$$X = t_1 p_1' + t_2 p_2' + E$$

p.V = n.R.T
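The decomposition of X into scores, loadings and residuals can be sketched with a singular value decomposition. Mean-centering and the use of SVD are choices of this sketch (the slide does not state the algorithm), and the data are random numbers.

```python
import numpy as np

def pca(X, f):
    """Scores T (n x f), loadings P (k x f) and residuals E for f components."""
    xbar = X.mean(axis=0)
    Xc = X - xbar                              # centering is an assumption here
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:f].T                               # orthonormal loadings
    T = Xc @ P                                 # scores
    E = Xc - T @ P.T                           # residuals
    return xbar, T, P, E

# Hypothetical data: 9 samples, 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 3))
xbar, T, P, E = pca(X, f=2)

explained = 1.0 - (E ** 2).sum() / ((X - xbar) ** 2).sum()
print(f"explained variance with 2 components: {100 * explained:.1f}%")
```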

Page 11: (080930) Statistical testing in MSPC

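PCA - First principal component

p.V = n.R.T

[Figure: balloon observations in the variable space with axes p, V and T, with the direction of factor 1 drawn through the points. The data matrix X (rows 1 to 9, variables p V T) is summarized by one column of scores (factor 1) and one row of loadings (p V T).]

PCA - First and second principal components

[Figure: the same points with factor 1 and factor 2 drawn. X (rows 1 to 9, variables p V T) is summarized by two columns of scores (factors 1 and 2) and two rows of loadings (p V T).]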

Page 12: (080930) Statistical testing in MSPC

MSPC - process

[Figure: observations in the variable space (axes p, V, T) with factor 1 and factor 2 and the Normal Operating Conditions region (NOC, e.g. 95% interval).]

Process

A unit operation will record 50 to 2000 highly correlated process variables (p, V, T, etc.) …
A NIR analyzer will measure at 200 highly correlated wavelengths …
… monitor latent-variables from factor model!

Multivariate SPC

New sample (e.g. new time update), model M with loadings P:

tnew = P’.xnew
e’ = x’ – t’.P’

Two statistics:

t is the projection of each sample record on the model P, used to compute D
How close is the new observation to the center of the training-set?
‘Goodness-of-fit’

Q = Σ e2, the sum of squared errors
How far is the observation from the hyper-plane spanned by the model P?
‘Residual’

As usual, what is the right complexity F: how many principal components (new basis)?
Same as always in PCA, often from cross-validation
Usually few components, 50-60% explained variance
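Projecting a new sample on the model and computing its residual could look as follows. The training data, the new sample and the function name `project` are hypothetical, and the sample is centered with the training mean, which the slide leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical NOC data: 9 samples, 3 variables driven by 2 latent factors
latent = rng.normal(size=(9, 2))
X = latent @ np.array([[0.6, 0.0, 0.8], [0.0, 1.0, 0.0]])
X = X + 0.05 * rng.normal(size=(9, 3))

xbar = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - xbar, full_matrices=False)
P = Vt[:2].T                       # loadings, k x f with f = 2

def project(x_new, xbar, P):
    xc = x_new - xbar              # centering is an assumption of this sketch
    t_new = P.T @ xc               # tnew = P'.xnew
    e = xc - P @ t_new             # e' = x' - t'.P'
    q = float(e @ e)               # Q = sum of squared residuals
    return t_new, e, q

t_new, e, q = project(np.array([1.0, 0.5, -2.0]), xbar, P)
print("scores:", t_new, "Q:", q)
```

The scores feed the D-statistic and the residual vector feeds the Q-statistic, so one projection serves both charts.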

Page 13: (080930) Statistical testing in MSPC

MSPC – D-statistic

Hotelling T2 statistic on scores (‘pseudo variables’):

observation t is f principal components (e.g. f = 2, [t1 t2])
n training observations with mean tbar (f x 1) (NOC data)
covariance matrix S (f x f)

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(t_i-\bar{t})(t_i-\bar{t})^T$$
$$T^2_{new} = (t_{new}-\bar{t})^T S^{-1}(t_{new}-\bar{t})$$
$$T^2_{UCL} = \frac{f(n^2-1)}{n(n-f)}\,F_{\alpha}(f,\,n-f)$$

Notice that S is diagonal, due to orthogonality of the scores

Using a truncated model (only f factors) we now have a residual from the f+1..k principal components …

Also known as D-statistic
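Because the score covariance is diagonal, the D-statistic reduces to a sum of squared scores scaled by their variances. A small sketch on hypothetical data, with the full matrix form computed alongside as a check:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical NOC data: 20 samples, 4 correlated variables
X = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
T_noc = Xc @ Vt[:2].T              # training scores for f = 2 components

def d_statistic(T_noc, t_new):
    """T2 on scores, using only the diagonal of S."""
    tbar = T_noc.mean(axis=0)
    s2 = T_noc.var(axis=0, ddof=1)  # diagonal elements of S
    return float(np.sum((t_new - tbar) ** 2 / s2))

t_new = np.array([1.5, -0.5])      # hypothetical new score vector
d_diag = d_statistic(T_noc, t_new)

# Same quantity with the full covariance matrix
S = np.cov(T_noc, rowvar=False)
d = t_new - T_noc.mean(axis=0)
d_full = float(d @ np.linalg.solve(S, d))
print(d_diag, d_full)
```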

MSPC – Q-statistic

… the residuals
Assume multi-normal residuals with zero mean
Also known as Q-statistic

$$SPE_{new} = \sum_{i=1}^{k}\big(x_{new}(i)-\hat{x}_{new}(i)\big)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

weighted chi-square distribution with weight g and h degrees of freedom

Page 14: (080930) Statistical testing in MSPC

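MSPC – Q-statistic

$$SPE_{new} = \sum_{i=1}^{k}\big(x_{new}(i)-\hat{x}_{new}(i)\big)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

weighted chi-square distribution with weight g and h degrees of freedom

$$SPE_{UCL} = \theta_1\left(1 - \frac{\theta_2 h_0 (1-h_0)}{\theta_1^2} + \frac{z_{\alpha}\sqrt{2\theta_2 h_0^2}}{\theta_1}\right)^{1/h_0}$$
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}$$
$$\theta_1 = \mathrm{trace}(V), \quad \theta_2 = \mathrm{trace}(V^2), \quad \theta_3 = \mathrm{trace}(V^3)$$
$$V = \mathrm{covar}(E)$$

z_α: standard normal variate for (1-α) confidence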
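The second form of the limit (from the Jackson and Mudholkar reference) can be evaluated directly from the residual covariance. This is a sketch under stated assumptions: the function name is made up, the residuals are random numbers, and V is taken as the sample covariance of E.

```python
import numpy as np
from statistics import NormalDist

def spe_ucl(V, alpha=0.05):
    """Upper limit for SPE/Q from the residual covariance matrix V."""
    V = np.atleast_2d(np.asarray(V, dtype=float))
    theta1 = np.trace(V)
    theta2 = np.trace(V @ V)
    theta3 = np.trace(V @ V @ V)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha)      # standard normal variate
    inner = (1.0
             - theta2 * h0 * (1.0 - h0) / theta1 ** 2
             + z * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1)
    return float(theta1 * inner ** (1.0 / h0))

# Hypothetical residual matrix E (50 samples, 5 variables)
rng = np.random.default_rng(0)
E = 0.1 * rng.normal(size=(50, 5))
limit = spe_ucl(np.cov(E, rowvar=False), alpha=0.05)
print(f"SPE_UCL (95%) = {limit:.4f}")

# Sanity check: one unit-variance residual should come close to the
# 95% point of a chi-square distribution with 1 degree of freedom (3.84)
print(spe_ucl([[1.0]], alpha=0.05))
```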

MSPC – Q-statistic

… the residuals
Assume multi-normal residuals with zero mean
Also known as Q-statistic

$$SPE_{new} = \sum_{i=1}^{k}\big(x_{new}(i)-\hat{x}_{new}(i)\big)^2$$
$$SPE_{UCL} = g\,\chi^2_{h}$$

weighted chi-square distribution with g and h estimated from NOC

Page 15: (080930) Statistical testing in MSPC

Principal Component Analysis

$$X = t_1 p_1' + t_2 p_2' + E$$

(T, P: the model part; E: the residual part)

… the residuals
Assume multi-normal residuals with zero mean
Also known as Q-statistic

A.K. Conlin et.al. Confidence limits for contribution plots Journal of Chemometrics 14(2000)725-736

MSPC – Contribution plots

Page 16: (080930) Statistical testing in MSPC

MSPC – Contribution plots

[Figure: D-statistic and Q-statistic charts for a pump failure, each annotated with ARL L1.]

MSPC – Contribution plots

Contribution of original variables in the score value

PCA: X = T.P’, so X.P = T.P’.P = T (since P’.P = I)
One sample: t = [t1 t2] = [x’.p1 x’.p2]

ci = x’.pi

[Figure: contribution plot for the pump failure.]
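One way to read the contribution of the original variables is to split the score x’.p into its per-variable terms, which add up to the score. That reading, the function name and the numbers below are assumptions of this sketch, not taken from the slide.

```python
import numpy as np

def score_contributions(x, p):
    """Per-variable terms x_j * p_j; their sum is the score x'.p."""
    return x * p

# Hypothetical centered sample and loading vector for one component
x = np.array([1.0, -2.0, 0.5])
p = np.array([0.6, 0.0, 0.8])

c = score_contributions(x, p)
print("contributions:", c)
print("score:", c.sum())
print("largest contributor (variable index):", int(np.argmax(np.abs(c))))
```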

Page 17: (080930) Statistical testing in MSPC

MSPC – Contribution plots

Uncertainty in ci ?

$$x \pm z_{\alpha/2}\,\sigma_x^2 \qquad (\sigma_x^2 \text{ from NOC})$$
$$c \pm z_{\alpha/2}\,\sigma_c^2 \qquad (\sigma_c^2 \text{ unknown})$$
$$\sigma_c^2 = \sigma_{xp}^2 = \sigma_x^2 \cdot \sigma_p^2$$

Estimate σ2p from bootstrap re-sampling

ci = x’.pi

[Figure: contribution plots for the pump failure.]
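Estimating the variance of the loadings by bootstrap re-sampling could be sketched as below. This is a generic row-resampling bootstrap on hypothetical data, not necessarily the exact procedure of the cited paper; the sign alignment step is needed because the sign of a loading vector is arbitrary.

```python
import numpy as np

def first_loading(X):
    """Loading vector of the first principal component of centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def bootstrap_loading_variance(X, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ref = first_loading(X)
    reps = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        p = first_loading(X[idx])
        if p @ ref < 0:                    # align the arbitrary sign
            p = -p
        reps[b] = p
    return ref, reps.var(axis=0, ddof=1)

# Hypothetical NOC data with one dominant direction
rng = np.random.default_rng(1)
t = rng.normal(size=30)
X = 3.0 * np.outer(t, [0.6, 0.0, 0.8]) + 0.1 * rng.normal(size=(30, 3))

p_ref, var_p = bootstrap_loading_variance(X)
print("loading:", p_ref)
print("bootstrap variance per element:", var_p)
```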

Page 18: (080930) Statistical testing in MSPC

Thank you!


References used in this presentation

General introduction:
S.J. Wierda, Multivariate statistical process control - recent results and directions for future research, Statistica Neerlandica 48(1994)147–168
Douglas C. Montgomery, Introduction to Statistical Quality Control, Wiley (2001)
Theodora Kourti and John MacGregor, Tutorial: Process Analysis, monitoring and diagnosis, using multivariate projection methods, Chemometrics and Intelligent Laboratory Systems 28(1995)3-21
Theodora Kourti, Process Analytical Technology Beyond Real-Time Analyzers: The Role of Multivariate Analysis, Critical Reviews in Analytical Chemistry 36(2006)257–278

D- and Q-statistics:
J. Edward Jackson and Govind S. Mudholkar, Control Procedures for Residuals Associated With Principal Component Analysis, Technometrics 21/3(1979)341-349
Paul Nomikos and John F. MacGregor, Multivariate SPC Charts for Monitoring Batch Processes, Technometrics 37/1(1995)41-59
D. J. Louwerse and A. K. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chemical Engineering Science 55(2000)1225-1235

Contribution statistics:
A. K. Conlin, E. B. Martin and A. J. Morris, Confidence limits for contribution plots, Journal of Chemometrics 14(2000)725-736
Johan A. Westerhuis, Stephen P. Gurden and Age K. Smilde, Generalized contribution plots in multivariate statistical process monitoring, Chemometrics and Intelligent Laboratory Systems 51(2000)95–114

Discussion and Evaluation:
Eric N. M. van Sprang, Henk-Jan Ramaker, Johan A. Westerhuis, Stephen P. Gurden and Age K. Smilde, Critical evaluation of approaches for on-line batch process monitoring, Chemical Engineering Science 57(2002)3979-3991