Three-way analysis

Posted on 12-Jan-2016



Batch organic synthesis

Paul Geladi

Head of Research, NIR CE; Chairperson, NIR Nord

Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå; Technobothnia, Vasa

paul.geladi @ btk.slu.se, paul.geladi @ syh.fi

THREE-WAY ARRAY

[Figure: three-way array of size I x J x K; modes A = batch, B = variable, C = time]
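A three-way array can be unfolded (matricized) into an ordinary matrix along each of its modes. The sketch below uses NumPy and hypothetical small sizes. It keeps one mode as rows and spreads the other two over the columns, with the lower-numbered remaining mode varying fastest. This is one common convention and not necessarily the one used in the slides.

```python
import numpy as np

def unfold(X: np.ndarray, mode: int) -> np.ndarray:
    """Matricize a three-way array along `mode` (0 = batch, 1 = variable, 2 = time).

    Rows follow the chosen mode. The two remaining modes are spread over the
    columns, with the lower-numbered one varying fastest (Fortran-order reshape).
    """
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

# Hypothetical sizes: I batches x J variables x K times.
I, J, K = 4, 5, 3
X = np.arange(I * J * K, dtype=float).reshape(I, J, K)  # element X[i, j, k]

X_A = unfold(X, 0)  # I x JK: one row per batch
X_B = unfold(X, 1)  # J x IK: one row per variable
X_C = unfold(X, 2)  # K x IJ: one row per time
print(X_A.shape, X_B.shape, X_C.shape)
```

Element X[i, j, k] ends up in X_A[i, j + k*J]. Each unfolded matrix can then be analysed with an ordinary two-way method such as PCA.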

Literature

Geladi P. & Åberg P., Three-way modeling of a batch organic synthesis process monitored by near infrared spectroscopy, Journal of Near Infrared Spectroscopy, 9, 1-9, 2001.

Geladi P. & Forsström J., Monitoring of a batch organic synthesis by infrared spectroscopy: modeling and interpretation of three-way data, Journal of Chemometrics, 16, 329-338, 2002.

Three-way arrays

• GC-MS

• LC-UV

• Fluorescence

• Batch processing

• Many others

Properties

• Components / pseudorank

• 3 types of loadings (one per mode), not 2 as in PCA (scores and loadings)

• No orthogonality

• Parsimonious model

[Figure: All batches and times. Pseudoabsorbance vs. wavelength (nm), 400-2400 nm]

BATCH REACTION

• Ester synthesis by refluxing an alcohol with an acid

• Many batches, run according to an experimental design

• NIR spectra measured at regular intervals with a transflectance fiber-optic probe

• 400-2500 nm every 2 nm, average of 32 scans

• Reference = air

REACTION

[Figure: reaction mechanism of the acid-catalysed esterification; catalyst H+ (Amberlite), water (H2O) removed with a molecular sieve]

REACTION

C5H11OH + CH3COOH -> C5H11OCOCH3 + H2O

- Acid catalysis (H+)

- Water is removed to shift the equilibrium
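Here is a worked sketch of why removing water drives the reaction forward. It assumes ideal behaviour, equimolar starting amounts of alcohol and acid, and a hypothetical equilibrium constant K = 4, which is not a value taken from the slides. Let x be the conversion and f the fraction of the formed water that stays in the mixture. Then K = f·x²/(1 - x)², which gives x = √K/(√K + √f).

```python
import math

def equilibrium_conversion(K: float, f: float) -> float:
    """Equilibrium conversion x for a 1:1 alcohol:acid start, ideal behaviour.

    K: equilibrium constant (hypothetical; volume cancels because the number of
       moles is the same on both sides of the reaction equation).
    f: fraction of the formed water left in the mixture
       (1 = nothing removed, 0 = all removed, e.g. by a molecular sieve).
    Solves K = x * (f * x) / (1 - x)**2 for 0 <= x <= 1.
    """
    rk, rf = math.sqrt(K), math.sqrt(f)
    return rk / (rk + rf)

K = 4.0  # assumed value, for illustration only
for f in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(f"water left {f:5.2f} -> equilibrium conversion {equilibrium_conversion(K, f):.3f}")
```

If none of the water is removed, conversion levels off at √K/(√K + 1). As f approaches 0, conversion approaches 1.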

[Figure: transflectance probe: fibers in, fibers out, mirror]

[Figure: One batch, all times. Pseudoabsorbance vs. wavelength (nm), 400-2200 nm; regions labeled A, B, C and D]

Parsimony = not using too many model parameters = no overfitting

10 Stations x 13 Variables x 22 Times, 2 components

MODEL PARAMETERS

PCA1 (unfolded 10 x 286): 20 + 572 = 592

PCA2 (unfolded 13 x 220): 26 + 440 = 466

PCA3 (unfolded 22 x 130): 44 + 260 = 304

PARAFAC: 20 + 26 + 44 = 90
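The parameter counts above follow from simple arithmetic. A PCA of an unfolded matrix needs scores plus loadings for each component. PARAFAC needs one loading vector per mode per component. The function names in this sketch are hypothetical.

```python
def pca_params(n_rows: int, n_cols: int, n_comp: int) -> int:
    """PCA of an unfolded matrix: scores (n_rows x n_comp) + loadings (n_cols x n_comp)."""
    return n_comp * (n_rows + n_cols)

def parafac_params(I: int, J: int, K: int, n_comp: int) -> int:
    """PARAFAC: loadings A (I x n_comp), B (J x n_comp) and C (K x n_comp)."""
    return n_comp * (I + J + K)

I, J, K, R = 10, 13, 22, 2  # stations x variables x times, 2 components
counts = {
    "PCA1 (I x JK)": pca_params(I, J * K, R),
    "PCA2 (J x IK)": pca_params(J, I * K, R),
    "PCA3 (K x IJ)": pca_params(K, I * J, R),
    "PARAFAC": parafac_params(I, J, K, R),
}
for name, n in counts.items():
    print(f"{name:15s}{n:5d}")
```

Like the slide, these are raw counts. They ignore normalization and scaling indeterminacies, which reduce the number of truly free parameters slightly in both model types.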

IMPORTANT QUESTIONS

- Can we learn something about reaction kinetics?

- Can we see differences between batches?

- Can we interpret the spectra?

- How does it all fit together?

REACTION 1

14 x 701 x 13 array

Source of SS / % explained

Rank 3 model / 97.1

Residual / 2.9

Total / 100

Component 1 / 48.0

Component 2 / 15.3

Component 3 / 4.0
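The component percentages in this table sum to 67.3, far below the 97.1 explained by the full model. PARAFAC components are not orthogonal, so the model sum of squares is not the sum of the component sums of squares. Cross terms (a_r'a_s)(b_r'b_s)(c_r'c_s) for r ≠ s remain. The slides do not say how the component percentages were computed. The sketch below uses hypothetical random, all-positive loadings and only illustrates this identity.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 14, 50, 13, 3  # hypothetical sizes: batches x wavelengths x times

# Hypothetical all-positive (hence non-orthogonal) loadings.
A, B, C = (rng.uniform(0.5, 1.5, (n, R)) for n in (I, J, K))

X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)  # sum of the R rank-one components
ss_model = np.sum(X_hat**2)
ss_comp = [np.sum(np.einsum("i,j,k->ijk", A[:, r], B[:, r], C[:, r]) ** 2) for r in range(R)]

# ||sum_r a_r o b_r o c_r||^2 = sum_{r,s} (a_r'a_s)(b_r'b_s)(c_r'c_s)
gram = (A.T @ A) * (B.T @ B) * (C.T @ C)
# Off-diagonal (r != s) terms; a pair's term vanishes if the two components
# are orthogonal in at least one mode.
cross = gram.sum() - np.trace(gram)

print(f"model SS {ss_model:.1f}, sum of component SS {sum(ss_comp):.1f}, cross terms {cross:.1f}")
```

With positive loadings the cross terms are positive, so the model explains more than the sum of its components. This is one possible reason for the pattern in the table.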

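[Figure: PARAFAC model of the 14 batches (I) x 1050 wavelengths (J) x 13 times (K) array]

$$\underline{X} = a_1 \circ b_1 \circ c_1 + a_2 \circ b_2 \circ c_2 + \dots + \underline{E}, \qquad x_{ijk} = \sum_{f} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$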
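A minimal sketch of fitting this PARAFAC model by alternating least squares (ALS) follows. ALS is a standard algorithm for PARAFAC, but the software and settings used for the slides are not stated. The sketch has no constraints and no preprocessing, and the helper names are hypothetical. Each step solves an ordinary least-squares problem for one loading matrix while the other two stay fixed, using the unfolded form X_(1) = A (C ⊙ B)'.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X along `mode`; remaining modes on columns, lower one fastest."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

def khatri_rao(U, V):
    """Column-wise Kronecker product: column r equals np.kron(U[:, r], V[:, r])."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def parafac_als(X, R, n_iter=1000, tol=1e-12, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares.

    Returns loadings A (I x R), B (J x R), C (K x R) and the percentage of
    the total sum of squares explained by the model.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    X0, X1, X2 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
    ss_total = np.sum(X**2)
    previous = np.inf
    for _ in range(n_iter):
        # Least-squares update of each mode with the other two held fixed.
        A = X0 @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = X1 @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = X2 @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        ss_resid = np.sum((X0 - A @ khatri_rao(C, B).T) ** 2)
        if previous - ss_resid < tol * ss_total:  # stop when the fit stops improving
            break
        previous = ss_resid
    return A, B, C, 100 * (1 - ss_resid / ss_total)

# Usage on synthetic, noise-free rank-2 data (hypothetical sizes 6 x 30 x 8).
rng = np.random.default_rng(1)
A_true, B_true, C_true = (rng.standard_normal((n, 2)) for n in (6, 30, 8))
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
A, B, C, fit = parafac_als(X, R=2)
print(f"{fit:.2f} % of SS explained")
```

PARAFAC loadings are determined only up to scaling and the order of the components. For that reason they are usually normalized before being plotted or compared with pure spectra.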

[Figure: Line plots of C-loadings: PARAFAC loading vs. time for components 1, 2 and 3]

[Figure: Scatter plot of C-loadings 1 and 2, points labeled by time 1-13]

[Figure: B-loadings 1, 2 and 3 vs. wavelength (600-1800 nm), panels A, B and C; first derivative spectra]

[Figure: PARAFAC loadings b1 and mean of acid and alcohol pure spectra]

[Figure: PARAFAC loadings b1, b2 and b3 plotted together with pure spectra of alcohol, acid, ester, water, acetophenone and the reaction mixture]

[Figure: Batch mode loadings: first and second. a1 (Comp 1, 45%) vs. a2 (Comp 2, 17%); batches labeled with numeric values and grouped as SLOW and FAST]

REACTION 2

6 x 40 x 776 array

Source / %SS / SS

Component 1 / 62 / 2.73

Component 2 / 18 / 0.78

Component 3 / 16 / 0.71

Component 4 / 3.2 / 0.14

Model / 99.2 / 4.38

Residual / 0.8 / 0.038

Total / 100 / 4.42

[Figure: data shown as an image, wavelength (400-2498 nm) vs. time (0-120 min); standard deviation (absorbance) plotted vs. time and vs. wavelength]

[Figure: PARAFAC model of the 11 batches x 40 times x a few hundred wavelengths array, with loadings a1-a3 (batches), b1-b3 (times) and c1-c3 (wavelengths)]

3² design with extra center points

[Fig 10.51: a1 loading vs. batch # (1-11), showing a block effect]

[Figure: score plot t1 vs. t2 showing a block effect between early batches and late batches]

[Figure: loading p1 vs. wavelength]

[Figure: "Block" loading compared with spectra of ester, acetic acid, alcohol, water, C6H6 and the reaction mixture, and of ester, acetic acid, alcohol and the reaction mixture in C6H6]

[Figure: Dendrogram using unscaled data for batches 1-11; horizontal axis: distance to K-means nearest group]

[Figure: SGT first derivative spectra vs. wavelength (400-2400 nm)]
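A minimal NumPy sketch of a smoothing first derivative is shown below. It assumes that "SGT" refers to a Savitzky-Golay-type derivative; the slides do not spell out the abbreviation. The window length and polynomial order are hypothetical. SciPy offers the same operation as `scipy.signal.savgol_filter(..., deriv=1, delta=...)`. Here it is written out so that the least-squares idea is visible.

```python
import numpy as np

def savgol_first_derivative(y, window=11, polyorder=2, spacing=1.0):
    """Savitzky-Golay first derivative along the last axis of y.

    A polynomial of degree `polyorder` is fitted by least squares in a moving
    window of `window` points, and its slope at the window centre is returned.
    The window // 2 points at each end are set to NaN for simplicity.
    """
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, t, t^2, ...
    coeffs = np.linalg.pinv(vander)[1] / spacing  # row 1 gives the fitted slope
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    for c in range(half, y.shape[-1] - half):
        out[..., c] = y[..., c - half:c + half + 1] @ coeffs
    return out

# Usage on hypothetical spectra sampled every 2 nm (rows = spectra).
wavelengths = np.arange(600, 2002, 2.0)
spectra = np.array([[1.0], [0.5]]) * np.exp(-((wavelengths - 1450) / 60) ** 2)
d_spectra = savgol_first_derivative(spectra, window=11, polyorder=2, spacing=2.0)
print(d_spectra.shape)
```

A derivative removes constant baseline offsets, and the fitted polynomial limits the noise that a plain point-to-point difference would amplify.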

3-level Koshal design, or reduced 3² design

[Fig 10.55: design points for reactant ratio (molar) and catalyst (g); levels 1, 1.5, 2 and 0.15, 0.45, 0.75]

[Figure: component size vs. pseudorank (1-5)]

[Figure: batch loadings a2 vs. a3, points labeled by design setting (1 / 0.15, 1 / 0.45, 1 / 0.75, 1.5 / 0.45, 1.5 / 0.75, 2 / 0.75); catalyst and reagent ratio directions indicated]

[Figure: spectral loadings c1-c4 plotted together with pure spectra of C6H6, ester, CH3COOH, C5H11OH, the reaction mixture and H2O, and of ester, CH3COOH, C5H11OH and the mixture in C6H6]

[Figure: spectral loadings vs. wavelength (600-2000 nm), with features attributed to H+ on CH3COOH, C6H6 and C5H11OH in C6H6]

[Figure: time loadings b1-b4 vs. time (0-120 min); scatter plot of b1 vs. b3 with points labeled 0, 30, 60 and 90 min]

[Figure: bias and sum of squares vs. wavelength (600-1800 nm)]
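The bias and sum-of-squares plots can be read as residual diagnostics per wavelength. The sketch below assumes that "bias" means the mean residual and that both statistics are taken over batches and times, with the array laid out as batches x times x wavelengths. The slides do not define either quantity, and the function name is hypothetical.

```python
import numpy as np

def residual_bias_and_ss(X: np.ndarray, X_hat: np.ndarray, var_mode: int = 2):
    """Residual bias (mean) and sum of squares for each element of `var_mode`.

    X, X_hat: data and model of the same shape, e.g. batches x times x wavelengths.
    Both statistics are taken over all the other modes.
    """
    E = X - X_hat
    other = tuple(m for m in range(E.ndim) if m != var_mode)
    return E.mean(axis=other), np.sum(E**2, axis=other)

# Hypothetical usage: 6 batches x 40 times x 776 wavelengths, where the model
# misses a constant offset of 0.01 at wavelength index 100 only.
rng = np.random.default_rng(0)
X_hat = rng.standard_normal((6, 40, 776))
X = X_hat.copy()
X[:, :, 100] += 0.01
bias, ss = residual_bias_and_ss(X, X_hat)
print(bias.shape, ss.argmax())
```

A systematic misfit at a wavelength shows up as a non-zero bias there. Random misfit raises only the sum of squares.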

CONCLUSIONS

Three-way modeling is possible: rank 3-4

Preprocessing is needed (derivative)

Interpretation of the time mode (reaction kinetics)

Interpretation of the batch mode (experimental design)

Interpretation of the spectral mode needs pure standards

What is the mystery chemical?

Visual interpretation with line or loading plots

Plotting

Especially for 3-way analysis

Paul Geladi

Plotting techniques

• Line / bar plots

• Box plots

• Quantile plots

• Autocorrelation plots

• Two-dimensional plots

• Three-dimensional plots

• Joint plots / biplots

• Response surfaces

• Imaging and mapping

• Movies

• Correlation spectroscopy

• Dendrograms

• Advanced interactive visualization in more dimensions

What do we want to do?

• Inspect raw data

• Detect outliers / groupings

• Select a model

• Build the model = calculate parameters

• Choose a pseudorank

• Inspect and use the model parameters

• Study the residuals

• Use the model for predictions

• More?

Properties

• Rectangular shape

• Every point exists

• Projection

• Resolution?

• Distances are correct

• Angles are meaningful

Topology

Do all points have a continuum of close neighbours?

[Figure: Average NIR spectrum, absorbance vs. wavenumber (4000-10000 cm-1)]

What do we see?

• Data?

• Interpolation?

• Model?

• Are data fuzzy?

• Are models fuzzy?

The human eye is superb at detecting things, but it is also very subjective

The remedies

• Background information

• Experience

• Objective techniques

Chemometrics is poisoned by (bad) line and scatter plots

The biggest problem is with the scatter plots

Grain example

FT-NIR, 10000-4000 cm-1

112 x 1501 (samples x wavenumbers)

Flour: 5 locations, 10 cultivars

PCA after mean-centering

[Figure: three line plots of singular values; horizontal: # components, vertical: singular value; annotated "True" and "Easiest"]

%SS explained, based on eigenvalues

Component / %SS / Cumulative %SS

1 / 78.89 / 78.89

2 / 18.21 / 97.10

3 / 1.56 / 98.66

4 / 0.77 / 99.43

5 / 0.11 / 99.54

6 / 0.08 / 99.62

7 / 0.06 / 99.68
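A %SS table like this one comes from the eigenvalues of the mean-centered data. Each eigenvalue of Xc'Xc equals a squared singular value of Xc, so the share of component a is s_a² / Σ s². The sketch below applies this to hypothetical synthetic data of the same shape as the grain example (112 x 1501). It does not use the actual flour spectra.

```python
import numpy as np

def pca_percent_ss(X: np.ndarray):
    """Singular values, %SS per component and cumulative %SS after mean-centering."""
    Xc = X - X.mean(axis=0)                  # mean-center each column (variable)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    pct = 100 * s**2 / np.sum(s**2)          # eigenvalue of Xc'Xc = s**2
    return s, pct, np.cumsum(pct)

# Hypothetical data: two smooth "spectral" profiles plus a little noise.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(10000, 4000, 1501)
profiles = np.vstack([np.exp(-((wavenumbers - c) / 400) ** 2) for c in (5200, 7000)])
X = rng.uniform(0, 1, (112, 2)) @ profiles + 0.001 * rng.standard_normal((112, 1501))

s, pct, cum = pca_percent_ss(X)
for a in range(7):
    print(f"{a + 1}  {pct[a]:6.2f}  {cum[a]:6.2f}")
```

Plotting `s` against the component number gives the singular-value line plot described above.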

[Figure: score plot t2 (18%) vs. t3 (1.6%), drawn twice with different axis scaling]

[Figure: Protein in flour, PLS with 6 components: predicted vs. measured Y(:,2), range 9-17; the plot is titled "Scores Plot"]

Scatter plot requirements?

• Zero indicated?

• Orthonormal base?

• Equal scales?

• Mirroring?

PCA

• Never gives true spectra

• Never finds pure constituents

• Always rotates

• So why would scatter plots from it be useful?

• Factor analysis is much better

• Factors are chemically meaningful

• Curve resolution

• PARAFAC

Making PARAFAC loadings look good

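$$X = A\,(C \odot B)^{\mathsf T} + E = \hat{X} + E$$

Here X is the three-way array unfolded to I x JK, and $\odot$ is the Khatri-Rao (column-wise Kronecker) product.

$$X = U S V^{\mathsf T} + E$$

US is the space of A in the orthonormal basis of V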
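The following numerical check of the last statement is a sketch under two assumptions: X is the noise-free unfolded PARAFAC model (I x JK), and V keeps its first R right singular vectors. In that case US = X̂V = A (C ⊙ B)'V. The SVD scores are therefore the PARAFAC loadings A multiplied by an R x R matrix T, and both span the same space. The sizes and loadings are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: column r equals np.kron(U[:, r], V[:, r])."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

rng = np.random.default_rng(2)
I, J, K, R = 8, 20, 6, 3  # hypothetical sizes
A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))

X_hat = A @ khatri_rao(C, B).T        # unfolded PARAFAC model, I x JK
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
US = U[:, :R] * s[:R]                 # SVD scores (X_hat has exact rank R here)
V = Vt[:R].T                          # orthonormal basis, JK x R

# US = X_hat V = A (C kr B)' V, i.e. the SVD scores are A times an R x R matrix T.
T = khatri_rao(C, B).T @ V
print(np.allclose(US, A @ T))
```

T is generally invertible, so A can also be written as US T⁻¹.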
