
Post on 20-Dec-2015


Page 1: CALIBRATION Prof.Dr.Cevdet Demir cevdet@uludag.edu.tr

CALIBRATION

Prof.Dr.Cevdet Demir

cevdet@uludag.edu.tr

Page 2

LINKING TWO SETS OF DATA TOGETHER

• Peak height to concentration

• Spectra to concentrations

• Taste to chemical constituents

• Biological activity to structure

• Biological classification to chromatographic peak areas

Page 3

NORMALLY WE ARE INTERESTED IN SOME FUNDAMENTAL PARAMETER e.g. concentration or biological classification

WE TAKE SOME MEASUREMENTS e.g. spectra or chromatograms

WE WANT TO USE THESE MEASUREMENTS TO GIVE US A PREDICTION OF THE FUNDAMENTAL PARAMETER

Page 4

UNIVARIATE CALIBRATION

One measurement e.g. a peak height

MULTIVARIATE CALIBRATION

Several measurements e.g. spectra

Page 5

NOTATION

“x” block is measured data e.g. spectra, chromatograms, GCMS of biological extract, structural parameters

“c” block is what we are trying to predict e.g. concentration, species, acceptability of a product, taste

Page 6

Experimental design

• X : independent variable, e.g. concentration

• Y : response, e.g. spectroscopic

Calibration

• C : predicted parameter, e.g. concentration

• X : measurement, e.g. spectroscopic

Page 7

[Diagram: the three pairings of blocks, c with x, c with X, and C with X]

Page 8

MULTIVARIATE CALIBRATION IN ANALYTICAL CHEMISTRY

•Single component.

Example, concentration of chlorophyll a by uv/vis spectra.

•Mixture of components, all compounds known.

Example, mixture of pharmaceuticals, all pure compounds known.

Page 9

•Mixture of components, only some compounds known.

Example, coal tar pitch volatiles in industrial waste studied by spectroscopy, only some known.

•Statistical parameters.

Example, protein in wheat by NIR spectroscopy.

Page 10

UNIVARIATE CALIBRATION

“x” and “c” blocks consist of single measurements.

Traditional analytical chemistry

CLASSICAL CALIBRATION

x ≈ c . s

Unknown : s

s ≈ c+ . x

where c+ is the pseudo-inverse of c

Page 11

[Diagram: x = c . s drawn as blocks, a column x equal to a column c times a single value s]

Page 12

TREATMENT OF ERRORS IN CLASSICAL CALIBRATION

[Figure: calibration plot with axes labelled c and x]

Page 13

PROBLEMS

1. Modern lab : dilution and sample preparation errors (in “c”) are probably bigger than spectroscopic errors (in “x”). Spectra are more reproducible. This differs from the usual assumption in classical statistics.

2. We want to predict concentration from spectra etc., not vice versa.

Most classical textbooks in analytical chemistry and most spreadsheets incorrectly recommend classical calibration.

Page 14

INVERSE CALIBRATION

c ≈ x . b

Unknown : b

b ≈ x+ . c

[Figure: calibration plot with axes labelled c and x]

Page 15

[Diagram: c = x . b drawn as blocks, a column c equal to a column x times a single value b]

Page 16

COMPARING FORWARD AND INVERSE CALIBRATION

[Figure: inverse and classical calibration lines on one plot; horizontal axis 0 to 10, vertical axis 0 to 40]

Page 17

INCLUDING THE INTERCEPT : first column of “x” is 1s

c ≈ b0 + b1 x

c ≈ X . b

b ≈ X+ . c

[Diagram: c = X . b drawn as blocks, with X having two columns and b two elements]
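A minimal sketch of classical versus inverse univariate calibration with an intercept. The data values and the helper name fit_line are made up for illustration. The closed-form straight-line fit used here gives the same coefficients as the pseudo-inverse of a two-column matrix whose first column is 1s.

```python
def fit_line(u, v):
    """Least-squares fit of v ≈ a0 + a1*u. Returns (a0, a1)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return mean_v - slope * mean_u, slope


# Made-up calibration data: concentrations c and peak heights x.
c = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1]

# Classical calibration: model x ≈ s0 + s1*c, then invert it to predict c.
s0, s1 = fit_line(c, x)

# Inverse calibration: model c ≈ b0 + b1*x and predict c directly.
b0, b1 = fit_line(x, c)

x_new = 5.0  # peak height of a hypothetical unknown
c_classical = (x_new - s0) / s1
c_inverse = b0 + b1 * x_new
print(c_classical, c_inverse)
```

The two predictions are close but not identical, because each fit minimises the residuals in a different variable; they coincide only when the points lie exactly on a line.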

Page 18

HOW WELL IS THE MODEL PREDICTED?

Huge number of approaches

• Root mean square error: divide the sum of squared residuals by the degrees of freedom d (the number of samples minus 1 or 2, according to the number of parameters in the model).

Often expressed as a percentage of either the mean measurement or the standard deviation of the measurements.

$E = \sqrt{\sum_{i=1}^{I} (x_i - \hat{x}_i)^2 / d}$
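As a rough sketch, the root mean square error with a degrees-of-freedom divisor could be computed as follows. The function name rms_error and the n_params argument are illustrative choices, not part of the slides.

```python
import math


def rms_error(observed, predicted, n_params=1):
    """Root mean square error using d = (number of samples - n_params)."""
    d = len(observed) - n_params
    if d <= 0:
        raise ValueError("need more samples than model parameters")
    ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss / d)


# Made-up observed and predicted values.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error = rms_error(observed, predicted, n_params=1)

# Express the error as a percentage of the mean measurement.
percent_of_mean = 100.0 * error / (sum(observed) / len(observed))
print(error, percent_of_mean)
```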

Page 19

• Correlation coefficient of predicted versus true – has problems if the number of samples is small.

• ANOVA and replicates analysis using lack-of-fit error, as discussed in the experimental design lectures.

• Leaving samples out and predicting them : cross-validation and testing will be discussed later.

Page 20

PROBLEMS

•Outliers can be a major difficulty. Graphical ways of looking for outliers are a big area.

•Outliers have undue influence on least squares models.

Page 21

MULTIWAVELENGTH

Example : four compounds, four wavelengths.

MULTIPLE LINEAR REGRESSION (MLR)

C = X . B

Know

•X : a series of spectra

•C : concentrations

Page 22

WAYS OF PERFORMING THE CALIBRATION

1. Producing a series of mixture spectra of known concentrations by weighing different amounts and adding them together.

2. Taking a series of spectra and calibrating against an independent method e.g. HPLC.

Page 23

[Figure: spectra; horizontal axis from 220 to 400]

Page 24

EXAMPLE : UV/VIS OF PAHs AT 4 WAVELENGTHS, NO WAVELENGTH IS UNIQUE

Page 25

B = X+ . C

estimated [pyrene] = -3.870 A330 + 8.609 A335 – 5.098 A340 + 1.848 A345
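A sketch of how coefficients of this kind might be obtained with B = X+ . C. The pure spectra S, the concentrations and the noise level are simulated assumptions, not the PAH data from the slides; the only point carried over is that four compounds are measured at four overlapping wavelengths.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pure-component spectra: 4 compounds (rows) at 4 wavelengths
# (columns). Every compound absorbs at every wavelength, so no wavelength
# is unique to one compound.
S = np.array([[1.0, 0.4, 0.2, 0.1],
              [0.3, 1.0, 0.4, 0.2],
              [0.1, 0.3, 1.0, 0.4],
              [0.1, 0.2, 0.4, 1.0]])

C = rng.uniform(0.0, 1.0, size=(25, 4))             # known concentrations
X = C @ S + rng.normal(scale=1e-4, size=(25, 4))    # simulated absorbances

# Inverse calibration: B = X+ . C, one column of coefficients per compound.
B = np.linalg.pinv(X) @ C

# Estimated concentrations are linear combinations of the four absorbances.
C_hat = X @ B

# Coefficients for the first compound, one per wavelength.
print(B[:, 0])
```

Each column of B plays the same role as the four numbers in the pyrene equation: positive and negative weights on the absorbances that together cancel the contributions of the other compounds.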

Page 26

Can also use classical methods

This can be done by knowledge of the pure spectra.

This differs from calibration, where the spectra of a series of mixtures are recorded.

Ĉ = X . S+

Page 27

MULTIPLE LINEAR REGRESSION

•Why use only 4 wavelengths?

•Why not 10 or 100 wavelengths?

More information – not arbitrary choice of wavelengths.

•Number of wavelengths can be greater than number of compounds.

Page 28

[Diagram: C = X . B drawn as blocks]

Example

• 25 spectra

• 10 compounds

• 100 wavelengths

Page 29

B = X+ . C

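In this case

•B is a matrix of coefficients, 100 × 10

•X is a spectral matrix, 25 × 100

•C is a concentration matrix, 25 × 10

There are some technical problems with inverse calibration in this case, and often it does not work: with 100 wavelengths but only 25 spectra, each column of B has more coefficients than there are samples to determine them.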
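One way to see the difficulty, sketched with random numbers standing in for real spectra (all data below are made up). With 25 spectra and 100 wavelengths the rank of X cannot exceed 25, so the pseudo-inverse solution reproduces any concentration matrix exactly on the calibration set, even one unrelated to the spectra.

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(25, 100))    # stand-in for 25 spectra at 100 wavelengths
C = rng.uniform(size=(25, 10))    # concentrations unrelated to X

B = np.linalg.pinv(X) @ C         # B = X+ . C, shape 100 x 10

rank = np.linalg.matrix_rank(X)   # at most 25, far below the 100 columns
fits_exactly = np.allclose(X @ B, C)
print(B.shape, rank, fits_exactly)
```

A perfect fit to numbers that carry no chemical information indicates that the calibration set alone cannot show whether such a model will predict unknowns.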

Page 30

Better approach

1. First predict the spectra S.

•Either they are known from the calibration of the pure standards

•Or they can be predicted from the mixture spectra

S ≈ C+ . X

2. Then use these predictions in a model (e.g. of unknowns)

C ≈ X . S+
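The two steps can be sketched as follows on simulated data. The true spectra, concentrations and noise level are assumptions chosen only so that the example runs; the matrix sizes follow the 25 spectra, 10 compounds and 100 wavelengths of the earlier example.

```python
import numpy as np

rng = np.random.default_rng(2)

S_true = rng.uniform(0.0, 1.0, size=(10, 100))   # hypothetical pure spectra
C = rng.uniform(0.0, 1.0, size=(25, 10))         # calibration concentrations
X = C @ S_true + rng.normal(scale=1e-4, size=(25, 100))

# Step 1: estimate the pure spectra from the mixture spectra, S = C+ . X
S_hat = np.linalg.pinv(C) @ X                    # 10 x 100

# Step 2: predict the concentrations of unknowns, C = X . S+
C_new_true = rng.uniform(0.0, 1.0, size=(3, 10))
X_new = C_new_true @ S_true + rng.normal(scale=1e-4, size=(3, 100))
C_new_hat = X_new @ np.linalg.pinv(S_hat)        # 3 x 10

print(np.abs(C_new_hat - C_new_true).max())
```

Here C (25 × 10) has more rows than columns and S (10 × 100) has more columns than rows, so both pseudo-inverses are well determined, unlike the pseudo-inverse of the 25 × 100 matrix X.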

Page 31

MLR effectively models a spectrum as a sum of the spectra of the components, each weighted by its concentration, e.g. for a 3 component model

Observed spectrum =

conc A × spectrum A +

conc B × spectrum B +

conc C × spectrum C

Page 32

ENHANCEMENTS

• Selecting only certain variables, not all the wavelengths.

• Weighting of variables.

Page 33

ERROR ANALYSIS

This now becomes more sophisticated.

In addition to errors in the “c” block (concentration errors), there are now also errors in the “x” block (reconstruction of spectra).

Discussed later.

Page 34

LIMITATIONS AND PROBLEMS WITH MLR

• Number of experiments and number of wavelengths must never be less than number of compounds

• All significant compounds must be known. If there are still unknowns, their signals are mixed up with those of the knowns. Problems arise if there are no pure standards and no reliable reference method. THIS IS THE BIGGEST LIMITATION.

•Sometimes extra wavelengths can be bad ones e.g. noise or background.

• The classical approach assumes that concentrations are perfectly known, so that there are errors in only one variable.

Page 35

However, if information on all the significant compounds is known, then MLR is a simple and effective method.

Page 36

PRINCIPAL COMPONENTS REGRESSION (PCR)

Do not need to know all components in advance, simply "how many components", and the compounds of interest.

Overcomes a major limitation of MLR

Page 37

[Diagram: PCA decomposes X (samples × detector, e.g. wavelength) into scores T and loadings P; a regression step then relates T (samples × components) to the concentration vector c through r]

c ≈ T . r

Page 38

The first step is to perform PCA.

Obtain a scores matrix T, retaining A components.

The value of A may be a guess of the number of compounds in the mixture.

Then r = T+. c
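A minimal PCR sketch for a single concentration vector. PCA is done here through the singular value decomposition, and both blocks are mean-centred first; the centring, the function names and the simulated three-compound data are assumptions, since the slides do not specify any preprocessing.

```python
import numpy as np


def pcr_fit(X, c, n_components):
    """PCA on mean-centred X, then regression of centred c on the scores."""
    x_mean = X.mean(axis=0)
    c_mean = c.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores, samples x A
    P = Vt[:n_components]                        # loadings, A x wavelengths
    r = np.linalg.pinv(T) @ (c - c_mean)         # r = T+ . c
    return x_mean, c_mean, P, r


def pcr_predict(X_new, model):
    """Project new spectra onto the loadings and apply c = T . r."""
    x_mean, c_mean, P, r = model
    T_new = (X_new - x_mean) @ P.T
    return c_mean + T_new @ r


# Simulated example: 25 mixtures of 3 compounds measured at 50 wavelengths.
rng = np.random.default_rng(3)
S = rng.uniform(0.0, 1.0, size=(3, 50))
C = rng.uniform(0.0, 1.0, size=(25, 3))
X = C @ S + rng.normal(scale=1e-4, size=(25, 50))
c = C[:, 0]                                      # compound of interest

model = pcr_fit(X, c, n_components=3)
c_fit = pcr_predict(X, model)
print(np.abs(c_fit - c).max())
```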

Page 39

Can extend to more than one concentration:

C ≈ T . R

R = T+ . C

Page 40

Example

25 spectra taken at 100 wavelengths

We know about and want to predict 4 compounds

We think there are around 10 compounds in the mixture, 6 are unknown.

T is a matrix of dimensions 25 × 10

C is a matrix of dimensions 25 × 4

R is a matrix of dimensions 10 × 4
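These dimensions can be checked with a small simulation in which ten compounds contribute to the spectra but only four concentrations are supplied to the regression. The data are made up, and mean-centring is again an assumption not stated in the slides.

```python
import numpy as np

rng = np.random.default_rng(4)

S = rng.uniform(0.0, 1.0, size=(10, 100))        # 10 hypothetical pure spectra


def simulate(n_samples):
    """Return spectra and the concentrations of the 4 known compounds."""
    C_all = rng.uniform(0.0, 1.0, size=(n_samples, 10))
    X = C_all @ S + rng.normal(scale=1e-4, size=(n_samples, 100))
    return X, C_all[:, :4]


X, C = simulate(25)                              # calibration set
x_mean, c_mean = X.mean(axis=0), C.mean(axis=0)

U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
T = U[:, :10] * s[:10]                           # scores, 25 x 10
P = Vt[:10]                                      # loadings, 10 x 100
R = np.linalg.pinv(T) @ (C - c_mean)             # R = T+ . C, 10 x 4

# Predict the four known compounds in three new mixtures.
X_new, C_new = simulate(3)
C_new_hat = c_mean + (X_new - x_mean) @ P.T @ R
print(T.shape, C.shape, R.shape)
```

The six unknown compounds never appear in C, yet their spectral variation is carried by the ten score columns, which is why the four known compounds can still be predicted.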

Page 41

Example of the calculation of the concentration of pyrene in a set of 25 UV/vis spectra containing 10 different PAHs.

How many PCA components to use? The prediction gets better as the number of components increases.

Page 42

ERRORS – “x” block

Simply as in PCA, look at eigenvalues as more principal components are calculated

[Figure: error in the “x” block on a logarithmic scale (0.001 to 0.1) against number of principal components (1 to 15)]

Page 43

ERRORS – “c” block

Look at errors in calculation of concentrations – often different behaviour

[Figure: error in the “c” block on a logarithmic scale (0.01 to 1) against number of principal components (1 to 15)]

Page 44

Predictions for pyrene concentration using 1, 5 and 10 principal components.

[Figure: three panels, each plotting predicted concentration against observed concentration, both axes from 0 to 0.8]

Page 45

Why not use a large number of PCA components?

Then one can get perfect prediction?

FALLACY : the idea is to predict unknowns, after the knowns have been modelled. Later PCs often model noise.

Choose the number of PCs equal to the number of compounds in the mixture? Methods for determining the number of PCs when this is unknown are described later.
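The fallacy can be illustrated with a small simulation, sketched here under assumed conditions: ten compounds, made-up spectra and noise, mean-centred PCR, and a second set of mixtures standing in for the unknowns. The calibration error can only fall as components are added, so it says little on its own; the error on the separate set is the quantity of interest.

```python
import numpy as np

rng = np.random.default_rng(5)

S = rng.uniform(0.0, 1.0, size=(10, 100))        # hypothetical pure spectra


def simulate(n_samples):
    """Return noisy mixture spectra and the concentration of compound 1."""
    C = rng.uniform(0.0, 1.0, size=(n_samples, 10))
    X = C @ S + rng.normal(scale=0.02, size=(n_samples, 100))
    return X, C[:, 0]


X_cal, c_cal = simulate(25)                      # calibration set
X_test, c_test = simulate(25)                    # stands in for unknowns

x_mean, c_mean = X_cal.mean(axis=0), c_cal.mean()
U, s, Vt = np.linalg.svd(X_cal - x_mean, full_matrices=False)

cal_err, test_err = [], []
for A in range(1, 16):
    T = U[:, :A] * s[:A]                         # scores for A components
    r = np.linalg.pinv(T) @ (c_cal - c_mean)
    c_fit = c_mean + T @ r
    c_pred = c_mean + (X_test - x_mean) @ Vt[:A].T @ r
    cal_err.append(np.sqrt(np.mean((c_fit - c_cal) ** 2)))
    test_err.append(np.sqrt(np.mean((c_pred - c_test) ** 2)))

for A, (e_cal, e_test) in enumerate(zip(cal_err, test_err), start=1):
    print(A, round(float(e_cal), 4), round(float(e_test), 4))
```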

Page 46

Advantage over MLR - only partial knowledge necessary.

Disadvantage : assumption that all errors are in the "x" block.

Practical situation.

•Modern instruments very reproducible.

•Volumetrics, measuring cylinders, syringes are inaccurate.

Page 47

PARTIAL LEAST SQUARES (PLS)

This technique assumes that errors in both “x” and “c” block are equally significant.

Page 48

[Diagram: X = T . P + E and c = T . q + f drawn as blocks]

Page 49

What does this mean?

X = T.P + E

c = T.q + f

Page 50

THERE IS A COMMON SCORES MATRIX FOR BOTH “x” AND “c” BLOCKS.

In PCR we calculate the scores just for the “x” block and then use a separate step for regression.

A big difference between PCR and PLS is that in PCR there is only one scores matrix, whereas in PLS (modelling one “c” column at a time) there is a different scores matrix for each compound.

The vector q is analogous to loadings.

Page 51

PLS components have some analogies to PC components.

In PCA, each component consists of a

•scores vector

•loadings vector

•eigenvalue.

Page 52

In PLS, each component consists of a

•scores vector

• “x” loadings vector (p)

• “c” loadings vector (q) – a single number

• magnitude.

Page 53

FOR THE TECHNICALLY MINDED.

•Unlike eigenvalues, the magnitudes of successive PLS components do not necessarily decrease in size, although they do model the overall datasets.

•Unlike loadings for PCA, loadings in PLS are not orthogonal.

•In most cases PLS loadings are not normal.

•There are many algorithms for PLS and it can be confusing.

Page 54

ERROR ANALYSIS : similar principles to PCR but different curves for different compounds.

Sometimes different numbers of PLS components are used to model different compounds in one mixture.

[Figure: x errors and c errors against number of PLS components, 1 to 20; vertical axis 0 to 60]

Page 55

For a dataset consisting of 25 spectra observed at 27 wavelengths, for which 8 PLS components are calculated, there will be

•a T matrix of dimensions 25 × 8,

•a P matrix of dimensions 8 × 27,

•an E matrix of dimensions 25 × 27,

•a q vector of dimensions 8 × 1 and

•an f vector of dimensions 25 × 1.
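A sketch of one common PLS1 formulation that produces matrices of these sizes. There are many PLS algorithms, so this is an illustrative variant rather than the one used for the slides; the mean-centring, the function name pls1 and the simulated data are all assumptions.

```python
import numpy as np


def pls1(X, c, n_components):
    """Return T, P, q, E, f with Xc = T.P + E and cc = T.q + f (centred data)."""
    E = X - X.mean(axis=0)                # residual "x" block, starts as centred X
    f = c - c.mean()                      # residual "c" block, starts as centred c
    n_samples, n_vars = E.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_components, n_vars))
    q = np.zeros(n_components)
    for a in range(n_components):
        h = E.T @ f                       # weights from the x/c covariance
        t = E @ h / np.linalg.norm(h)     # scores vector
        tt = t @ t
        p = t @ E / tt                    # "x" loadings vector
        qa = f @ t / tt                   # "c" loading, a single number
        E = E - np.outer(t, p)            # remove this component from x
        f = f - qa * t                    # remove this component from c
        T[:, a], P[a], q[a] = t, p, qa
    return T, P, q, E, f


# Simulated data: 25 spectra at 27 wavelengths from 5 hypothetical compounds.
rng = np.random.default_rng(6)
S = rng.uniform(0.0, 1.0, size=(5, 27))
C = rng.uniform(0.0, 1.0, size=(25, 5))
X = C @ S + rng.normal(scale=0.01, size=(25, 27))
c = C[:, 0]

T, P, q, E, f = pls1(X, c, n_components=8)
print(T.shape, P.shape, E.shape, q.shape, f.shape)
```

In this sketch q and f are stored as one-dimensional arrays of length 8 and 25, corresponding to the 8 × 1 and 25 × 1 vectors listed above.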

Page 56

PLS2 – when more than one “c” variable

[Diagram: X = T . P + E and C = T . Q + F drawn as blocks]

Page 57

X = T.P + E

C = T.Q + F

Differences to PLS1

•C is now a matrix

•Q is also a matrix

•F is also a matrix

•Single scores for all compounds in the mixture.

Page 58

•Theoretically PLS2 should perform better than PLS1 but in practice it often performs worse.

•Computationally faster, important 10 years ago.

•Useful for non-linear problems such as QSAR where there are interactions, but not so useful in analytical chemistry, which is very linear.

Page 59

SUMMARY OF MAIN METHODS

• Univariate calibration

    • Classical

    • Inverse

• Multiple linear regression

• Principal components regression

• Partial least squares

    • PLS1

    • PLS2